(Live Session) Performance Thinking: Six Key Lessons from 1BRC
Over the past year as I’ve dived deep into systems programming, I’ve developed a strong appreciation for the finer details that drive performance optimization—something I truly enjoy discussing. In the next live session I want to revisit it.
Performance engineering is really about a systems thinking approach where you learn to connect the dots between the different layers of your stack to reason about the performance. It is both wide and deep, so the goal of this session is to discuss this approach of systems thinking.
To do this we need a concrete problem and for that I will take the help of the One Billion Row Challenge (1BRC). We will use this problem and discuss six techniques that were used by the winners of the challenge to be at the top.
Agenda
We will cover six broad things in the session:
Mechanical sympathy and a review of how caches, branch prediction, and instruction level parallelism affect performance
Reading files using the read(2) system call versus mmap
We will cover in detail the two file reading techniques, what happens under the hood in both the cases, details of read-ahead, trade-offs of the two techniques.
Concurrency: cost of shared vs private data structures
In multi-threaded code, the impact of a shared data structure versus private
Work stealing
Loop unrolling
Cost of branch misses
We will talk about branch prediction and what happens when the prediction is wrong. In the context of the 1BRC, we will discuss the technique of SWAR (SIMD within a register) to parse the newline character in strings without branching.
Design of a performant hash table
Sometimes you need to roll out your own implementations of common data structure to take advantage of the constraints of the problem. We will discuss some ideas that people used for a cache optimized hash table in the 1BRC.
Format
I intend to do this as a discussion of the underlying system concepts. We will discuss the details of the hardware and the operating system to understand the performance bottlenecks and how the specific optimization tricks help. The focus will be less on specific code implementation details because that tends to take the focus of the discussion away from the broad systems details to the minutiae of code. In my experience if you understand the why behind the techniques, the code becomes very easy to understand and you can do that on your own.
Prerequisites
Some basic background in computer architecture would be helpful but not an absolute requirement. I will try to cover some basics where required, just interrupt and ask me questions when something is not clear.
You should have some experience of writing multithreaded code and implementing common data structures such as hash tables.
No specific programming language prerequisites are there, as long as you are proficient in some programming language it should be fine.
Date and Time
September 22nd, 2024 from 16:00 to 17:30 UTC
Logistics
As always, the session is free for the paid subscribers. To attend consider upgrading your subscription.
If for some reason the Substack payment does not work for you, you can try becoming a member via buymeacoffee or GitHub sponsorships.
Not Interested in Membership?
If you don’t wish to become a regular member, then you can buy a one time ticket for the session. The membership price is heavily discounted to attract long-term subscribers; it is not fair to take advantage of it to attend just one session :)
RSVP
To get the Zoom event invite, please RSVP at the following link.