Live Session: How Modern CPUs Execute Your Code: A Deep Dive into Performance
I hope you enjoyed the recent article on how Unix spell was designed to look up a 250kB dictionary on a 64kB machine. Writing it wore me down.
Also, I am starting work on the next chapter of the Linux Context Switching Internals book/series, which leaves me no time for a new live session this month. But we will do one in February.
Live Session Agenda
Low-level performance engineering requires familiarity with the microarchitecture of the processor: the internal implementation details of the hardware that dictate how the processor executes your code.
Most performance bottlenecks lie in this layer, where the processor is unable to utilize its execution resources to their full potential. A proper understanding of the microarchitecture gives us the insight and tools to debug, analyze, and fix complex performance issues.
What We’ll Cover
This live session will give you an overview of key microarchitectural features in modern processors, including:
Cache hierarchies and their impact
Translation Lookaside Buffers (TLB)
Data prefetching mechanisms
Branch prediction strategies
Instruction level parallelism
We’ll then connect these concepts to the performance of real-world code by:
Examining how they were leveraged in the winning solutions to the One Billion Row Challenge (1BRC)
Looking at microarchitectural optimizations in the CPython interpreter
Discussing practical applications in your own code
Learning Outcomes
After this session, you will:
Understand key microarchitectural features that impact performance
Be able to apply these concepts to your own performance problems
Recognize patterns in high-performance code
Prerequisites
Basic Hardware Knowledge:
Understanding of CPU components (registers, caches, memory)
Familiarity with memory hierarchy
Basic knowledge of paging and virtual memory
Surface-level understanding is sufficient; we’ll build on these concepts
Data Structure Implementation Experience:
Hash tables, linked lists
Concurrent Programming Background:
Thread synchronization basics (race conditions, mutexes, locks)
Experience with multithreading
Session Format
90-minute live session
Interactive Q&A throughout
Slides and reference materials will be shared afterward
Real-world examples and visualizations
Focus on practical understanding rather than theoretical concepts
What We Won’t Cover
This will not be a walkthrough of the 1BRC code
We will discuss the intersection of the hardware and software layers, but will not go down to the circuit- or gate-level implementation of the hardware.
Date & Time
February 9th, 16:30-18:00 UTC (22:00-23:30 IST)
How to Attend
The event is free for paid subscribers; you can RSVP at the Luma link below.
If you are not a paid subscriber, you can upgrade and join. If you run into errors while upgrading, reach out to me and I can provide alternate ways to upgrade.
If you don’t wish to upgrade to a paid subscription, you can instead buy a ticket to attend the session; I will add you to the list and send the invite.
RSVP
You can RSVP at the link below: