What Happens When Python Starts Up? CPython Runtime Internals
CPython JIT Internals, Part 1: What is the Runtime and How is it initialized?
I recently announced that I will be writing a series on CPython’s JIT compiler internals. But before we get to the JIT compiler, we need to cover a lot of ground about what happens before it runs, because that context is required to understand how the JIT compiler is implemented in CPython.
The following things happen in the CPython runtime before the JIT compiler actually kicks in:
When the CPython process starts up, it performs runtime initialization. The runtime contains critical data required for the operation of the rest of the virtual machine.
Next, the Python code gets parsed and compiled into bytecode instructions, called tier-1 bytecode. This compiled bytecode is interpreted by the tier-1 bytecode interpreter.
The tier-1 bytecode interpreter also profiles the instructions as it runs them. The hot paths are optimized and converted into tier-2 bytecode, which is interpreted by a tier-2 bytecode interpreter.
Finally, the JIT compiler translates tier-2 bytecode into native code.
You can clearly see the dependency between the different stages of code execution within CPython. It is possible to explain the JIT compiler independently of everything else, but then we could not go into its implementation details. So, I am starting from the beginning and will cover all of these stages.
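To make the compile-to-bytecode stage concrete, here is a minimal sketch that uses the standard library’s compile and dis to produce and list the bytecode for a one-line program. The source string and variable names are made up for illustration, and the exact opcodes differ between CPython versions. It only shows the regular bytecode that this article calls tier-1; the tier-2 form is internal to the interpreter and is not shown here.

```python
import dis

# A made-up one-line program, used only for illustration.
source = "total = price * quantity"

# Compile the source text into a code object that holds the bytecode.
code = compile(source, "<example>", "exec")

# co_code is the raw bytecode; dis decodes it into readable instructions.
instructions = list(dis.get_instructions(code))
for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)
```

Nothing is executed here: the names price and quantity never need to exist, because compiling only produces instructions for the interpreter to run later.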
This first article starts at the CPython runtime and talks about what the CPython runtime is, how it is represented and how it is initialized when we start executing a Python program.
What Happens When We Execute a Python Program?
Before we dive into the internal implementation details of CPython, let’s do a high-level overview of what happens when executing a Python program. This will help us understand why we are starting at the runtime and not directly at the bytecode interpreter.
The following figure illustrates the set of events that take place when we start the Python interpreter.
Broadly, it happens in three parts:
First the main function of the CPython implementation starts up.
The main function first performs the runtime initialization. The CPython runtime includes the main interpreter, the main execution thread, any statically initialized objects (such as small integers, free list cache), memory allocators, interned strings cache etc. Here, the main interpreter and the thread are the important bits of the runtime state which are critical for the operation of the bytecode interpreter.
Finally, once the runtime is in place, the main function triggers the code execution path: the user’s Python code is parsed and compiled to bytecode, and then the bytecode interpreter comes into the picture to interpret that bytecode.
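Two of the pieces of runtime state mentioned above, the small integers and the interned strings cache, can be observed from Python code. The following is a small sketch that relies on CPython implementation details (the sharing of small integer objects is not a language guarantee), and the variable names are chosen only for illustration.

```python
import sys

# Small integers: CPython hands out shared objects for small values,
# so two separately computed 100s are expected to be the same object.
a = int("100")
b = int("100")
small_ints_shared = a is b

# Interned strings: sys.intern returns the single shared copy of a string.
s1 = sys.intern("".join(["run", "time"]))
s2 = sys.intern("".join(["run", "time"]))
interned_shared = s1 is s2

print(small_ints_shared, interned_shared)
```

Both objects exist because the runtime set up these caches before any user code ran, which is part of why the runtime has to be initialized first.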
Since we are just getting started, we will focus on the runtime initialization part.
The CPython Runtime Representation
The whole focus of this article is around the initialization of the CPython runtime, so we must first look at how this runtime is represented in code.
The CPython runtime state is represented by the struct _PyRuntimeState, which is defined in the file Include/internal/pycore_runtime.h. It’s a very large struct, but the following figure shows some of its key fields, which will be our focus during this article series as well.
The two main things in the CPython runtime are all the interpreters and all the threads because these are the ones responsible for all the Python code execution.
The CPython process starts with one main interpreter and usually that is the only interpreter. However, user code can also start one or more subinterpreters.
The CPython runtime tracks the state of these interpreters using a linked list of interpreter state objects. When only the main interpreter exists, it is the head of this list. Every subinterpreter gets added to the list when it is created, and the runtime also keeps a separate reference to the main interpreter.
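The bookkeeping described above can be sketched as a toy model in Python. This is not CPython’s definition: the class names, the field names (head, main, next, next_id) and the choice to link each new interpreter in at the head of the list are assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class InterpreterState:
    """Toy stand-in for an interpreter state object (hypothetical)."""
    id: int
    next: Optional["InterpreterState"] = None


@dataclass
class RuntimeState:
    """Toy stand-in for the runtime's interpreter bookkeeping (hypothetical)."""
    head: Optional[InterpreterState] = None
    main: Optional[InterpreterState] = None
    next_id: int = 0

    def new_interpreter(self) -> InterpreterState:
        # Assumption of this sketch: new interpreters are linked in at the head.
        interp = InterpreterState(id=self.next_id, next=self.head)
        self.next_id += 1
        self.head = interp
        if self.main is None:
            # The first interpreter ever created is the main interpreter.
            self.main = interp
        return interp

    def iter_interpreters(self) -> Iterator[InterpreterState]:
        # Walk the linked list by following the next references.
        node = self.head
        while node is not None:
            yield node
            node = node.next


runtime = RuntimeState()
main_interp = runtime.new_interpreter()
sub_interp = runtime.new_interpreter()
ids = [interp.id for interp in runtime.iter_interpreters()]
```

Keeping a direct reference to the main interpreter means it can be reached without walking the list, regardless of how many subinterpreters have been created.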
Similarly, there is one main thread which is associated with the main interpreter. The runtime tracks the state of this thread using the main_tstate field.
Let’s now see the definition of the interpreter state and thread state objects.
Representation of the Interpreter State in CPython
Every interpreter in CPython has a state which is represented using the PyInterpreterState struct defined in the file Include/internal/pycore_interp.h. The struct is pretty large, so I cannot show its full definition; however, the following figure shows and annotates some of its most important fields.
The annotations do a pretty good job of explaining the important fields of the PyInterpreterState struct. From the point of view of the bytecode interpreter, the threads field is the most important one. It maintains a linked list of the states of all the threads within that interpreter, and it also holds a reference to the thread state of the main thread (the threads.main field).
The thread state of the main thread contains the stack frame of the function that is currently executing on the interpreter, the stack, instruction pointer etc. These details are exactly what represent the state of the interpreter.
The Thread State Representation
In the CPython runtime, each OS thread has at least one associated thread state. This state is represented by the struct PyThreadState, which is defined in the file Include/cpython/pystate.h.
Again, this struct is too big to show all of it. I’ve truncated it and shown some of the important fields which are relevant from the point of view of the bytecode interpreter code.
The most important field in the thread state is the frame pointer which points to the stack frame of the currently executing Python function on the bytecode interpreter (VM). The stack frame contains the bytecode of the function being executed, instruction pointer, the stack and all the related states. We will discuss it in more detail in the next article when we get to see the bytecode interpreter implementation.
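The frame that the thread state points to has a Python-level counterpart that can be inspected from user code. The sketch below uses sys._getframe, a CPython-specific helper, to look at the frame of the running function. It shows the frame object exposed to Python rather than the internal C structure, and the function names are made up for illustration.

```python
import sys


def describe_current_frame():
    # The frame object for this very function call.
    frame = sys._getframe()
    return {
        "function": frame.f_code.co_name,            # which code is executing
        "bytecode_size": len(frame.f_code.co_code),  # bytecode of the function
        "instruction_offset": frame.f_lasti,         # offset of the last instruction
        "caller": frame.f_back.f_code.co_name,       # previous frame on the stack
    }


def caller():
    return describe_current_frame()


info = caller()
print(info)
```

Each frame links back to its caller through f_back, which is how the chain of active function calls on a thread is represented.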
How Is the Thread State Stored and Accessed?
As there can be multiple threads in the interpreter, each with its own thread state, the interpreter needs a fast and safe mechanism to get the thread state of the currently active thread.
To enable this, each thread’s current thread state is stored in a thread local variable called _Py_tss_tstate, which is defined in the file Python/pystate.c.
Thread local storage (TLS) is a private area in the memory of each thread which is not shared with other threads in the process. When an object is stored in thread local storage, each thread has its own copy of that object, and accessing the object always yields the copy that belongs to the currently running thread.
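Python exposes the same idea through threading.local, which makes the behaviour easy to see without reading any C. In this minimal sketch, three threads write to the same attribute and each one reads back only its own value. The worker function and the variable names are made up for illustration, and the barrier is there only to make sure all writes happen before any read.

```python
import threading

local = threading.local()         # one logical variable, one copy per thread
barrier = threading.Barrier(3)    # ensures every thread has written before any reads
results = {}


def worker(name):
    local.value = name            # stored in this thread's private copy
    barrier.wait()                # by now the other threads have written too
    results[name] = local.value   # still sees only its own value


workers = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(3)]
for t in workers:
    t.start()
for t in workers:
    t.join()

# The main thread never assigned local.value, so it has no such attribute.
main_has_value = hasattr(local, "value")
print(results, main_has_value)
```

No locking is needed to keep the values apart, because no thread ever touches another thread’s copy.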
In order to get the currently active thread state, CPython implements an inline function called _PyThreadState_GET, which is used throughout the VM implementation. The following figure shows its definition.
It is also possible for one OS thread to have multiple thread states, in which case it switches between these thread states using the function _PyThreadState_Swap, defined in the file Python/pystate.c.
Summary of CPython Runtime State
This pretty much covers the important details that we need to know about CPython’s runtime state in order to dive into its bytecode interpreter implementation. Let’s summarize it quickly:
The CPython runtime state is represented by the struct _PyRuntimeState. It maintains all the global state of the runtime. The two important fields from the point of view of the bytecode interpreter are the list of interpreter states and the main thread’s thread state.
Usually there will only be one main interpreter, but it is also possible to have many subinterpreters; all of these are tracked by the runtime using a linked list. The runtime state holds a reference to the main interpreter’s state, and each interpreter state in the list holds a reference to the next one.
The interpreter state is represented by the struct PyInterpreterState. Even though it tracks a lot of things related to the execution of the interpreter, the most important field is the threads field, which is a linked list of all the thread states active in the interpreter. Usually the interpreter will only have one main thread state, but if more threads are created, their states are attached to this linked list. The threads.main field holds a reference to the thread state of the main thread.
Finally, the state of a thread is represented by the struct PyThreadState. Each thread state is associated with only one OS thread and only one interpreter. However, one OS thread may have multiple associated thread states. The currently active thread state is stored in a thread local variable and can be accessed via the function _PyThreadState_GET.
This covers what the runtime state is and what information it stores. The next part of the article covers a walk through of the CPython code to show how exactly all of this state is initialized when we start executing a Python program.