Live Session: How Modern CPUs Execute Your Code: A Deep Dive into Performance
I hope you enjoyed the recent article on how Unix spell was designed to look up a 250kB dictionary on a 64kB machine. Writing it wore me down.
Also, I am starting work on the next chapter of the Linux Context Switching Internals book/series, which leaves me no time for a new live session this month. But we will do one in February.
Live Session Agenda
Low-level performance engineering requires familiarity with the microarchitecture of the processor: the internal implementation details of the hardware that dictate how the processor executes your code.
Most performance bottlenecks lie in this layer, where the processor is unable to utilize its execution resources to their full potential. A proper understanding of the microarchitecture gives us the insight and tools to debug, analyze, and fix complex performance issues.
What We’ll Cover
This live session will give you an overview of key microarchitectural features in modern processors, including:
Cache hierarchies and their impact
Translation Lookaside Buffers (TLB)
Data prefetching mechanisms
Branch prediction strategies
Instruction level parallelism
We’ll then connect these concepts to the performance of real-world code by:
Examining how they were leveraged in the winning solutions to the One Billion Row Challenge (1BRC)
Looking at microarchitectural optimizations in the CPython interpreter
Discussing practical applications in your own code
Learning Outcomes
After this session, you will:
Understand key microarchitectural features that impact performance
Be able to apply these concepts to your own performance problems
Recognize patterns in high-performance code
Prerequisites
Basic Hardware Knowledge:
Understanding of CPU components (registers, caches, memory)
Familiarity with memory hierarchy
Basic knowledge of paging and virtual memory
Surface-level understanding is sufficient; we’ll build on these concepts
Data Structure Implementation Experience:
Hash tables, linked lists
Concurrent Programming Background:
Thread synchronization basics (race conditions, mutexes, locks)
Experience with multithreading
Session Format
90-minute live session
Interactive Q&A throughout
Slides and reference materials will be shared afterward
Real-world examples and visualizations
Focus on practical understanding rather than theoretical concepts
What We Won’t Cover
This will not be a walkthrough of the 1BRC code
We will discuss the intersection of the hardware and software layers, but will not go down to the circuit- or gate-level implementation of the hardware.
Date & Time
February 9th, 16:30-18:00 UTC (22:00-23:30 IST)
How to Attend
The event is free for paid subscribers; you can RSVP at the Luma link below.
If you are not a paid subscriber, you can upgrade and join. If you run into errors while upgrading, reach out to me and I can provide alternate ways to upgrade.
If you don’t wish to upgrade to a paid subscription, you can instead buy a ticket to attend the session; I will add you to the list and send the invite.
RSVP
You can RSVP at the link below: