CPython Runtime Internals: Key Data Structures & Runtime Bootstrapping
What are the key data structures which form the CPython runtime and how are they initialized at startup
The runtime of a programming language is the crucial piece that orchestrates code execution by integrating components such as the virtual machine, object system, memory allocators and the garbage collector. It initializes and manages the state of these systems. To do this, the runtime maintains a few key data structures that are initialized during the startup of the Python process.
In this article, we will look at the definitions of the data structures that form the CPython runtime, how they are initialized, and what role they play in Python code execution.
What Happens When We Execute a Python Program?
Before we dive into the internal implementation details of CPython, let’s take a high-level look at what happens when a Python program is executed. This will help us understand why we are starting at the runtime and not directly at the bytecode interpreter.
The following figure illustrates the set of events that take place when we start the Python interpreter.
Broadly, it happens in three parts:
First the main function of the CPython implementation starts up.
The main function first performs the runtime initialization. The CPython runtime includes the main interpreter, the main execution thread, any statically initialized objects (such as small integers, free list cache), memory allocators, interned strings cache etc. Here, the main interpreter and the thread are the important bits of the runtime state which are critical for the operation of the bytecode interpreter.
Finally, once the runtime is in place, the main function triggers the code execution path: the user’s Python code is parsed and compiled to bytecode, and the bytecode interpreter then interprets that bytecode.
Since we are just getting started, we will focus on the runtime initialization part.
The CPython Runtime Representation
The whole focus of this article is around the initialization of the CPython runtime, so we must first look at how this runtime is represented in code.
The CPython runtime state is represented by the struct _PyRuntimeState, which is defined in the file Include/internal/pycore_runtime.h. It’s a very large struct, but the following figure shows some of its key fields, which will also be our focus throughout this article series.
The two main things in the CPython runtime are all the interpreters and all the threads because these are the ones responsible for all the Python code execution.
The CPython process starts with one main interpreter, and usually that is the only interpreter. However, user code can also start one or more subinterpreters.
The CPython runtime tracks the state of these interpreters using a linked list of interpreter state objects. The list starts out with just the main interpreter, and every subinterpreter is linked into it when it is created.
Similarly, there is one main thread, which is associated with the main interpreter. The runtime tracks the state of this thread using the main_tstate field.
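To make this shape concrete, here is a toy Python model of the bookkeeping described above. It is only a sketch and not the C definition: the class names, the `iter_interpreters` helper and the fields `name`, `next`, `head` and `main` are illustrative assumptions, and only `main_tstate` is a name taken from the text. How new nodes are positioned in the list is not the point here; the sketch only shows that every interpreter is reachable by following links from the runtime.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class InterpreterState:
    """Toy interpreter state; `name` and `next` are hypothetical fields."""
    name: str
    next: Optional["InterpreterState"] = None  # next interpreter in the list


@dataclass
class RuntimeState:
    """Toy runtime state; only `main_tstate` is a name used in the article."""
    head: Optional[InterpreterState] = None    # entry point into the list
    main: Optional[InterpreterState] = None    # the main interpreter
    main_tstate: Optional[str] = None          # stand-in for the main thread's state


def iter_interpreters(runtime: RuntimeState) -> Iterator[InterpreterState]:
    """Walk the linked list, starting from the runtime's head pointer."""
    node = runtime.head
    while node is not None:
        yield node
        node = node.next


main_interp = InterpreterState("main")
sub_interp = InterpreterState("sub-1", next=main_interp)
runtime = RuntimeState(head=sub_interp, main=main_interp, main_tstate="main-thread")

names = [interp.name for interp in iter_interpreters(runtime)]
print(names)
```

Keeping a separate `main` pointer next to `head` means the main interpreter can be found directly, without walking the list.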
Let’s now see the definition of the interpreter state and thread state objects.
Representation of the Interpreter State in CPython
Every interpreter in CPython has a state, which is represented by the PyInterpreterState struct defined in the file Include/internal/pycore_interp.h. The struct is pretty large, so I cannot show its full definition; however, the following figure shows and annotates some of its most important fields.
The annotations do a pretty good job of explaining the important fields of the PyInterpreterState struct. From the point of view of the bytecode interpreter, the threads field is the most important one. It maintains a linked list of the states of all the threads within that interpreter, and it also holds a reference to the thread state of the main thread (the threads.main field).
The thread state of the main thread contains the stack frame of the function that is currently executing on the interpreter, the stack, instruction pointer etc. These details are exactly what represent the state of the interpreter.
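We cannot read the C struct from Python, but we can get a rough Python-level view of the same idea. The sketch below uses `sys._current_frames()`, a CPython-specific function that returns the topmost frame of every thread, alongside the `threading` module. The `worker` function and the two events are hypothetical helpers that exist only to keep a second thread alive while we look at it.

```python
import sys
import threading

ready = threading.Event()
done = threading.Event()


def worker() -> None:
    ready.set()      # signal that this thread is up and running
    done.wait()      # stay alive until the main thread has inspected us


t = threading.Thread(target=worker)
t.start()
ready.wait()

frames = sys._current_frames()            # thread id -> topmost frame
main_id = threading.main_thread().ident
worker_id = t.ident

for thread_id, frame in frames.items():
    role = "main" if thread_id == main_id else "other"
    print(role, thread_id, frame.f_code.co_name)

done.set()
t.join()
```

Each entry in the returned dictionary corresponds to one thread that has a frame to report, which is a reasonable mental picture of the per-thread state the interpreter keeps track of.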
The Thread State Representation
In the CPython runtime, each OS thread that runs Python code has at least one associated thread state. This state is represented by the struct PyThreadState, which is defined in the file Include/cpython/pystate.h.
Again, this struct is too big to show all of it. I’ve truncated it and shown some of the important fields which are relevant from the point of view of the bytecode interpreter code.
The most important field in the thread state is the frame pointer which points to the stack frame of the currently executing Python function on the bytecode interpreter (VM). The stack frame contains the bytecode of the function being executed, instruction pointer, the stack and all the related states. We will discuss it in more detail in the next article when we get to see the bytecode interpreter implementation.
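Python exposes a view of the current frame through `sys._getframe()`, which is a CPython implementation detail but handy for building intuition. The sketch below is only an illustration: the attribute names belong to the Python frame object API, not to the internal C struct, and `inner` and `outer` are hypothetical functions.

```python
import sys


def inner() -> dict:
    frame = sys._getframe()   # frame of the function that is running right now
    return {
        "function": frame.f_code.co_name,        # code object being executed
        "caller": frame.f_back.f_code.co_name,   # link to the calling frame
        "last_instruction": frame.f_lasti,       # offset into the bytecode
    }


def outer() -> dict:
    return inner()


info = outer()
print(info)
```

The `f_back` link is what chains frames together into a call stack, and `f_lasti` plays the role of the instruction pointer mentioned above.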
How Is the Thread State Stored and Accessed?
As there can be multiple threads in the interpreter, each with its own thread state, the interpreter needs a fast and safe mechanism to get the thread state of the currently active thread.
To enable this, each thread’s current thread state is stored in a thread-local variable called _Py_tss_tstate, which is defined in the file Python/pystate.c.
Thread-local storage (TLS) is a private area in the memory of each thread that is not shared with other threads in the process. When an object is placed in thread-local storage, each thread gets its own copy of it, and any access to that object resolves to the copy belonging to the thread making the access.
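The same semantics can be observed from Python using `threading.local`. This is only an analogy for the C-level thread-local variable, not how CPython stores its thread state; the names `tls`, `seen` and `worker` are made up for this example.

```python
import threading

tls = threading.local()   # each thread sees its own attributes on this object
seen = {}


def worker(name: str) -> None:
    tls.state = f"state-of-{name}"   # written to this thread's private slot
    seen[name] = tls.state           # read back from the same private slot


threads = [threading.Thread(target=worker, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The main thread never assigned tls.state, so it has no such attribute.
print(seen, hasattr(tls, "state"))
```

Both workers assign to what looks like the same variable, yet neither overwrites the other, and the main thread does not see the attribute at all.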
To get the currently active thread state, CPython implements an inline function called _PyThreadState_GET, which is used throughout the VM implementation. The following figure shows its definition.
It is also possible for one OS thread to have multiple thread states, in which case it switches between them using the function _PyThreadState_Swap, defined in the file Python/pystate.c.
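The get and swap operations can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions: `thread_state_get`, `thread_state_swap`, the `ThreadState` class and its `label` field are hypothetical stand-ins, and the sketch ignores everything else the real C functions have to take care of. It only shows the idea of a per-thread "current" slot whose previous value is handed back on a swap.

```python
import threading
from typing import Optional

_tls = threading.local()   # plays the role of the C thread-local variable


class ThreadState:
    """Toy thread state; `label` is a hypothetical field for illustration."""

    def __init__(self, label: str) -> None:
        self.label = label


def thread_state_get() -> Optional[ThreadState]:
    """Return the state currently attached to the calling thread, if any."""
    return getattr(_tls, "tstate", None)


def thread_state_swap(new: Optional[ThreadState]) -> Optional[ThreadState]:
    """Attach `new` to the calling thread and return the previous state."""
    old = thread_state_get()
    _tls.tstate = new
    return old


first, second = ThreadState("first"), ThreadState("second")
thread_state_swap(first)
previous = thread_state_swap(second)
print(previous.label, thread_state_get().label)
```

Returning the old state from the swap lets a caller restore it later, which is what makes it practical for one OS thread to move between several thread states.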
Summary of CPython Runtime State
This pretty much covers the important details we need to know about CPython’s runtime state in order to dive into its bytecode interpreter implementation. Let’s summarize it quickly:
The CPython runtime state is represented by the struct _PyRuntimeState. It maintains all the global state of the runtime. From the point of view of the bytecode interpreter, its two important fields are the list of interpreter states and the main thread’s thread state.
Usually there is only the main interpreter, but it is also possible to have many subinterpreters; all of them are tracked by the runtime using a linked list. The runtime state holds a reference to the main interpreter’s state, and each interpreter state links to the next one in the list.
The interpreter state is represented by the struct PyInterpreterState. Even though it tracks a lot of things related to the execution of the interpreter, its most important field is threads, which holds a linked list of all the thread states active in the interpreter. Usually the interpreter has only the main thread’s state, but if more threads are created, their states are attached to this linked list. The threads.main field refers to the thread state of the main thread.
Finally, the state of a thread is represented by the struct PyThreadState. Each thread state is associated with only one OS thread and only one interpreter. However, one OS thread may have multiple associated thread states. The currently active thread state is stored in a thread-local variable and can be accessed via the function _PyThreadState_GET.
This covers what the runtime state is and what information it stores. The next part of the article covers a walk through of the CPython code to show how exactly all of this state is initialized when we start executing a Python program.
The CPython Runtime Initialization Process
When we execute the python command in the terminal, what happens? CPython is a giant C program, so it starts with the main function, which lives in the file Programs/python.c. The following figure shows the initial flow from the main function until something of interest happens.
The main function leads to a call to the pymain_main function in Modules/main.c, where two things happen:
First, it calls pymain_init, where the runtime initialization happens.
Then it calls Py_RunMain, where the Python code execution starts to take place.
We will stay focused on the runtime initialization and follow the trail into pymain_init. The following figure shows pymain_init.
Here, again, two things happen. First, the function _PyRuntime_Initialize, which is defined in the file Python/pylifecycle.c, is called. When this function returns, most of the runtime state is initialized, but two main things remain: the initialization of the main thread’s state and the creation of the main interpreter. These happen in the call to the function Py_InitializeFromConfig, which is also defined in pylifecycle.c. Let’s look at these next.
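The call sequence described so far can be traced with a small Python sketch. The function names mirror the C functions named in the text, but the bodies are placeholders that only record the order of calls. Any intermediate calls, argument handling and error handling in the real code are left out, so treat this as a map of the control flow rather than a port of the C code.

```python
calls = []


def runtime_initialize() -> None:
    calls.append("_PyRuntime_Initialize")    # runtime state, mostly static already


def initialize_from_config() -> None:
    calls.append("Py_InitializeFromConfig")  # main interpreter and main thread state


def pymain_init() -> None:
    calls.append("pymain_init")
    runtime_initialize()
    initialize_from_config()


def run_main() -> None:
    calls.append("Py_RunMain")               # execution of user code starts here


def pymain_main() -> None:
    calls.append("pymain_main")
    pymain_init()
    run_main()


def main() -> None:
    calls.append("main")
    pymain_main()


main()
print(" -> ".join(calls))
```

The printed trace reads top to bottom in the same order as the walkthrough: initialization always completes before `Py_RunMain` is reached.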
Runtime Initialization in pylifecycle.c
We just saw that pymain_init in main.c makes two function calls that are defined in pylifecycle.c, and that these two calls together set up the CPython runtime. So let’s spend some time understanding how that happens in pylifecycle.c.
The Global Runtime State Object Declaration
Before we look at the functions in pylifecycle.c which pymain_init is calling, I want to show how the global runtime state object is actually created because till now we have not seen it.
It turns out that CPython’s runtime state is declared as a global variable in pylifecycle.c. The following figure shows how it is done.
One interesting thing to note here is that on Linux, this object is placed in its own section in the ELF binary which is generated after the build. Doing so aids debugging a Python process even in the absence of debugging symbols. I wrote a Twitter post on it, that you can read for details.
This global runtime state object is statically initialized using the macro _PyRuntimeState_INIT, which is defined in the file Include/internal/pycore_runtime_init.h. I will not show the macro code here because it is pretty long and pretty boring. It initializes most of the fields of the runtime state, but not the main thread state and the main interpreter. I will leave the macro code for you to explore.
Remaining Initialization of Runtime in pylifecycle.c
Let’s get back to where we diverged. We saw that pymain_init calls two functions in pylifecycle.c to initialize the runtime: first _PyRuntime_Initialize, and then Py_InitializeFromConfig.
_PyRuntime_Initialize does not do anything special because the runtime is already statically initialized (but you can check it out yourself).
We will focus on Py_InitializeFromConfig because this is where the main interpreter state and the main thread state are initialized.
Quite a few things happen in Py_InitializeFromConfig, most of which are mechanical and not interesting. We will focus on the path that leads to the creation of the main interpreter. I’ve highlighted those parts and included the code of the functions called along that path.
The whole thing leads to a call to the pyinit_config function, which in turn calls pycore_create_interpreter. Let’s look at it.
As you can see, this function creates both the main interpreter and the main thread state by calling _PyInterpreterState_New and _PyThreadState_New respectively. This is the final leg of the CPython runtime initialization. Let’s take a look at these functions next.
Creation of New PyInterpreterState
The following figure shows the definition of the function _PyInterpreterState_New.
Although this function is quite big, it simply handles two cases:
We are either setting up the runtime’s main interpreter
Or, we are creating a new subinterpreter
To determine which case we are in, the function checks the head of the interpreters list in the runtime object. If it is NULL, we are setting up the runtime’s main interpreter.
The runtime state includes a field of type PyInterpreterState named _main_interpreter. This field is statically initialized along with the rest of the runtime. So, to set up the main interpreter, the function simply points the main and head pointers in the runtime’s interpreter list to this preexisting interpreter state object, and it’s done.
In the other case, when the call is for creating a new subinterpreter, the function dynamically allocates a new interpreter state object, initializes it and then sets it up as the new head of the runtime’s interpreters list.
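The two cases can be sketched as a toy Python model. This is an illustration under stated assumptions and not a translation of the C function: `interpreter_state_new`, `interp_id` and `next_id` are hypothetical names, and the real function does considerably more work. Only `_main_interpreter` and the idea of `head` and `main` pointers come from the description above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InterpreterState:
    """Toy interpreter state; `interp_id` and `next` are hypothetical fields."""
    interp_id: int = -1
    next: Optional["InterpreterState"] = None


@dataclass
class RuntimeState:
    """Toy runtime with a preallocated slot for the main interpreter."""
    _main_interpreter: InterpreterState = field(default_factory=InterpreterState)
    head: Optional[InterpreterState] = None   # head of the interpreter list
    main: Optional[InterpreterState] = None   # the main interpreter
    next_id: int = 0                          # hypothetical id counter


def interpreter_state_new(runtime: RuntimeState) -> InterpreterState:
    if runtime.head is None:
        # Case 1: main interpreter. Reuse the preallocated object.
        interp = runtime._main_interpreter
        runtime.main = interp
    else:
        # Case 2: subinterpreter. Allocate a fresh object.
        interp = InterpreterState()
    interp.interp_id = runtime.next_id
    runtime.next_id += 1
    interp.next = runtime.head   # old head (None for the main interpreter)
    runtime.head = interp        # the new interpreter becomes the head
    return interp


runtime = RuntimeState()
main_interp = interpreter_state_new(runtime)
sub_interp = interpreter_state_new(runtime)
print(main_interp is runtime._main_interpreter, runtime.head is sub_interp)
```

Reusing a preallocated object for the main interpreter means the first interpreter needs no dynamic allocation at all, while later ones pay for an allocation only when they are actually requested.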
Creation of New PyThreadState
The following figure shows the definition of the function new_threadstate, which is called by _PyThreadState_New to create a new thread state.
It is very similar to the function we just saw above for creating a new interpreter state.
The interpreter may have multiple threads, and it tracks their states using a linked list in the interpreter state. The interpreter state also has a reference to the main thread’s state (the main thread state field).
So when this function is called, either the interpreter doesn’t have the main thread state set up and that needs to be done, or the interpreter is creating a new thread and the thread state for that thread needs to be created.
In the former case, the linked list’s head will be NULL. To set up the main thread’s state, this function simply reuses a thread state that was statically created along with the interpreter state. This statically initialized thread state lives in the interpreter state in a field called _initial_thread.
In the other case when the interpreter is creating a new thread, this function dynamically allocates a new thread state object, initializes it, and adds it to the linked list of thread states.
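The same pattern can be modeled for thread states. Again, this is a hedged sketch rather than the actual C code: `thread_id`, `threads_head` and `next_unique_id` are hypothetical names, and linking each new state in at the head of the list is an assumption of this sketch. The only names taken from the text are `new_threadstate` and `_initial_thread`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ThreadState:
    """Toy thread state; `thread_id` and `next` are hypothetical fields."""
    thread_id: int = 0
    next: Optional["ThreadState"] = None


@dataclass
class InterpreterState:
    """Toy interpreter state with a preallocated first thread state."""
    _initial_thread: ThreadState = field(default_factory=ThreadState)
    threads_head: Optional[ThreadState] = None   # head of the thread state list
    next_unique_id: int = 0                      # hypothetical id counter


def new_threadstate(interp: InterpreterState) -> ThreadState:
    old_head = interp.threads_head
    if old_head is None:
        # First thread state of this interpreter: reuse the preallocated one.
        tstate = interp._initial_thread
    else:
        # Any later thread: allocate a fresh thread state object.
        tstate = ThreadState()
    interp.next_unique_id += 1
    tstate.thread_id = interp.next_unique_id
    # Assumption of this sketch: new states are linked in at the head.
    tstate.next = old_head
    interp.threads_head = tstate
    return tstate


interp = InterpreterState()
first = new_threadstate(interp)
second = new_threadstate(interp)
print(first is interp._initial_thread, second.next is first)
```

As with the interpreter state, the common single-threaded case never allocates: the first thread state is already sitting inside the interpreter state, waiting to be linked in.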
At this point the runtime is fully ready to start executing Python code, which we will get to in the next article.
Summary
If you’ve made it this far, you should have a good grasp of what the CPython runtime is and how it is initialized. Here is a quick list of the things we learned:
The definition of the CPython runtime - it contains many fields but the list of interpreter states and the main thread state are the most crucial ones for the operation of the VM.
The runtime starts with one main interpreter and one main thread. However, the user’s program can create subinterpreters, which are also tracked by the runtime. Similarly, the user’s code can start more threads.
The state of the interpreter is tracked by the interpreter state object, which is defined by the struct PyInterpreterState. One of the most important fields in this struct is the reference to the main thread state.
The thread state is represented by the struct PyThreadState. The most important field in the thread state is a pointer to the stack frame of the currently executing function, which contains the bytecode, stack, and the instruction pointer.
When the CPython process starts up, it first initializes the runtime.
The runtime state is represented by a global variable called _PyRuntime, declared in the file pylifecycle.c.
The global runtime state is statically initialized, except for the main interpreter and the main thread state.
The main interpreter and main thread state are initialized later, when the rest of the CPython runtime is initialized based on config.
In the next article in this series, we will start looking at how the bytecode interpreter executes the bytecode. We will continue our trail from the main function and see how the Python program lands in the interpreter for execution. Stay tuned!