CPython Runtime Internals: Key Data Structures & Runtime Bootstrapping

What are the key data structures that form the CPython runtime, and how are they initialized at startup?

Apr 26, 2024

The runtime of a programming language is the crucial piece which orchestrates code execution by integrating various components such as the virtual machine, object system, memory allocators and the garbage collector. It initializes and manages the state of these systems. To do this, the runtime maintains a few key data structures which are initialized during the startup of the Python process.

In this article, we will look at the definition of the data structures which form the CPython runtime, how they are initialized, and what role they play in Python code execution.

What Happens When We Execute a Python Program?

Before we dive into the internal implementation details of CPython, let’s do a high-level overview of what happens when executing a Python program. This will help us understand why we are starting at the runtime and not directly at the bytecode interpreter.

The following figure illustrates the set of events that take place when we start the Python interpreter.

Before CPython can start executing any Python code it first needs to initialize the runtime and then invoke the bytecode interpreter to execute the bytecode

Broadly, it happens in three parts:

First the main function of the CPython implementation starts up.
The main function first performs the runtime initialization. The CPython runtime includes the main interpreter, the main execution thread, any statically initialized objects (such as small integers, free list cache), memory allocators, interned strings cache etc. Here, the main interpreter and the thread are the important bits of the runtime state which are critical for the operation of the bytecode interpreter.
Finally, once the runtime is in place, then the main function triggers the code execution path where the user’s Python code is parsed and compiled to bytecode and then the bytecode interpreter comes into the picture to interpret the bytecode.
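
The last of these steps can be observed from within Python itself. The following is a minimal sketch that uses the built-in compile() and exec() functions and the dis module. It only illustrates the compile-then-interpret phases and is not how the main function drives them internally; the names source and namespace are made up for the illustration.

```python
import dis

source = "result = 6 * 7"

# Phase 1: parse and compile the source text into a code object.
code = compile(source, "<demo>", "exec")

# The raw bytecode is stored as bytes; dis prints it in readable form.
print(type(code.co_code))
dis.dis(code)

# Phase 2: hand the code object to the bytecode interpreter.
namespace = {}
exec(code, namespace)
print(namespace["result"])  # prints 42
```

Neither of the two phases can run until the runtime state described in the rest of this article is in place.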

Since we are just getting started, we will focus on the runtime initialization part.

The CPython Runtime Representation

The whole focus of this article is around the initialization of the CPython runtime, so we must first look at how this runtime is represented in code.

The CPython runtime state is represented by the struct _PyRuntimeState which is defined in the file Include/internal/pycore_runtime.h. It’s a very large struct but the following figure shows some of its key fields which will be our focus during this article series as well.

The definition of the _PyRuntimeState struct which represents CPython’s runtime state

The two most important pieces of the CPython runtime are the interpreters and the threads, because together they are responsible for executing all Python code.

The CPython process starts with one main interpreter, and usually that is the only interpreter. However, user code can also start one or more subinterpreters.

The CPython runtime tracks the state of these interpreters using a linked list of interpreter state objects. The list starts out with the main interpreter as its head, and every subinterpreter is added to the list when it is created.

Similarly, there is one main thread which is associated with the main interpreter. The runtime tracks the state of this thread using the main_tstate field.
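
To make the shape of this bookkeeping concrete, here is a small Python model of the interpreter list. It is only a sketch: the classes InterpState and Runtime and the function walk are hypothetical stand-ins invented for this illustration, not CPython's actual C structs, and they keep only the fields discussed above.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class InterpState:
    """Hypothetical stand-in for an interpreter state object."""
    name: str
    next: Optional["InterpState"] = None  # next interpreter in the list


@dataclass
class Runtime:
    """Hypothetical stand-in for the runtime state (interpreter list only)."""
    head: Optional[InterpState] = None  # first node of the linked list
    main: Optional[InterpState] = None  # the main interpreter


def walk(runtime: Runtime) -> Iterator[InterpState]:
    """Follow the next references from the head until the list ends."""
    node = runtime.head
    while node is not None:
        yield node
        node = node.next


main_interp = InterpState("main")
runtime = Runtime(head=main_interp, main=main_interp)

# A new subinterpreter is linked in front of the existing nodes.
sub = InterpState("sub-1", next=runtime.head)
runtime.head = sub

names = [interp.name for interp in walk(runtime)]
print(names)  # prints ['sub-1', 'main']
```

Keeping a separate main reference means the main interpreter stays reachable in one step even after other interpreters have been linked into the list.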

Let’s now see the definition of the interpreter state and thread state objects.

Representation of the Interpreter State in CPython

Every interpreter in CPython has a state which is represented using the PyInterpreterState struct defined in the file Include/internal/pycore_interp.h. The struct is pretty large so I cannot show its full definition. However, the following figure shows and annotates some of its most important fields.

The definition of the PyInterpreterState struct which represents the interpreter state in CPython

The annotations do a pretty good job of explaining the important fields of the PyInterpreterState struct. From the point of view of the bytecode interpreter, the threads field is the most important one. It maintains a linked list of the states of all the threads within that interpreter, and it also holds a reference to the thread state of the main thread (the threads.main field).

The thread state of the main thread contains the stack frame of the function that is currently executing on the interpreter, the stack, instruction pointer etc. These details are exactly what represent the state of the interpreter.

The Thread State Representation

In the CPython runtime, each OS thread has at least one associated thread state. This state is represented by the struct PyThreadState which is defined in the file Include/cpython/pystate.h.

The definition of the PyThreadState struct which represents the state of the thread in the CPython runtime

Again, this struct is too big to show all of it. I’ve truncated it and shown some of the important fields which are relevant from the point of view of the bytecode interpreter code.

The most important field in the thread state is the frame pointer which points to the stack frame of the currently executing Python function on the bytecode interpreter (VM). The stack frame contains the bytecode of the function being executed, instruction pointer, the stack and all the related states. We will discuss it in more detail in the next article when we get to see the bytecode interpreter implementation.
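
Python exposes a view of the current stack frame through frame objects, which gives a feel for what the frame holds. The sketch below uses sys._getframe(), a CPython-specific helper. Note that the Python-level frame object is only a window onto the interpreter's internal frame, not the C struct itself, and the functions inner and outer are made up for the illustration.

```python
import sys


def inner():
    # sys._getframe() returns the frame object of the calling function.
    frame = sys._getframe()
    return {
        "function": frame.f_code.co_name,           # code object being run
        "bytecode_len": len(frame.f_code.co_code),  # size of its raw bytecode
        "last_instr": frame.f_lasti,                # offset of the last executed instruction
        "caller": frame.f_back.f_code.co_name,      # previous frame on the call stack
    }


def outer():
    return inner()


info = outer()
print(info["function"], "was called by", info["caller"])
```

The f_back reference is what chains frames into a call stack, so following it from the current frame leads back to the frame where execution started.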

How Is the Thread State Stored and Accessed?

As there can be multiple threads in the interpreter and each of them has its own thread state, the interpreter needs a fast and safe mechanism to get the thread state of the currently active thread.

To enable this, each thread’s current thread state is stored in a thread local variable called _Py_tss_tstate and it is defined in the file Python/pystate.c.

Thread local storage (TLS) is a private area in the memory of each thread which is not shared with other threads in the process. When an object is stored in thread local storage, each thread gets its own copy of that object, and accessing the object always returns the copy that belongs to the currently running thread.
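
The same idea is available at the Python level through threading.local(), which makes for an easy demonstration. This is only an analogy: _Py_tss_tstate is a C thread-local variable, while the names tls, seen and worker below are made up for this sketch.

```python
import threading

tls = threading.local()  # every thread sees its own attributes on this object
seen = {}


def worker(name):
    tls.value = name        # written to this thread's private copy
    seen[name] = tls.value  # read back from the same private copy


threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The workers wrote only to their own copies, so the thread running this
# code never had a "value" attribute set.
main_has_value = hasattr(tls, "value")
print(seen, main_has_value)
```

No locking is needed to read or write tls.value because no two threads ever touch the same copy, which is the property that makes thread local storage a good fit for the current thread state.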

In order to get the currently active thread state, CPython implements an inline function called _PyThreadState_GET which is used throughout the VM implementation. The following figure shows its definition.

The _PyThreadState_GET() function is used to get the thread state of the current thread

It is also possible for one OS thread to have multiple thread states, in which case it switches these thread states using the function _PyThreadState_Swap defined in the file Python/pystate.c.

The definition of the _PyThreadState_Swap function which is used to swap thread states of a thread in CPython runtime.
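
The essence of a swap is to install a new current thread state and hand back the one that was current before. The following Python model sketches that contract only. The real C function also has to deal with details such as the GIL, which are left out here, and the names _tls, get_current and swap are hypothetical.

```python
import threading

_tls = threading.local()  # stand-in for the thread-local "current state" slot


def get_current():
    """Return the state currently installed for this thread, or None."""
    return getattr(_tls, "tstate", None)


def swap(new_tstate):
    """Install new_tstate as current and return the previously current one."""
    old = get_current()
    _tls.tstate = new_tstate
    return old


state_a, state_b = "tstate-A", "tstate-B"

first = swap(state_a)     # nothing was installed before
previous = swap(state_b)  # hands back state_a
current = get_current()   # state_b is now current
print(first, previous, current)  # prints None tstate-A tstate-B
```

Returning the old state lets a caller restore it later with another swap.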

Summary of CPython Runtime State

This pretty much covers the important details that we need to know about CPython’s runtime state in order to dive into its bytecode interpreter implementation. Let’s summarize it quickly:

The CPython runtime state is represented by the struct _PyRuntimeState.
It maintains all the global state of the runtime. From the point of view of the bytecode interpreter, the two important fields are the list of interpreter states and the main thread’s thread state.
Usually there will be only one main interpreter, but it is also possible to have many subinterpreters. All of these are tracked by the runtime using a linked list. The runtime state holds a reference to the main interpreter’s state, which in turn holds a reference to the next interpreter’s state.
The interpreter state is represented by the struct PyInterpreterState. Even though it tracks a lot of things related to the execution of the interpreter, the most important field is the threads field, which is a linked list of all the thread states active in the interpreter. Usually the interpreter will have only one main thread state, but if more threads are created, their states are attached to this linked list. The threads.main field holds a reference to the thread state of the main thread.
Finally, the state of a thread is represented by the struct PyThreadState. Each thread state is associated with only one OS thread and only one interpreter. However, one OS thread may have multiple associated thread states. The currently active thread state is stored in a thread local variable and can be accessed via the function _PyThreadState_GET.

This covers what the runtime state is and what information it stores. The next part of the article covers a walk through of the CPython code to show how exactly all of this state is initialized when we start executing a Python program.

The CPython Runtime Initialization Process

When we execute the python command on the terminal, what happens? CPython is a giant C program so it starts with the main function. In the case of CPython, this main function lives in the file Programs/python.c. The following figure shows the initial flow from the main function till something of interest happens.

The code path from the main function of CPython when we initially start it

The main function leads to a call to the pymain_main function in Modules/main.c where two things happen.

First, it calls pymain_init, where the runtime initialization happens.
Then it calls Py_RunMain, where the Python code execution starts to take place.

We will stay focused on the runtime initialization and follow the trail into pymain_init(). The following figure shows pymain_init.

The definition of the pymain_init function in main.c which does the full initialization of the CPython runtime

Here, again, two things happen. First, the function _PyRuntime_Initialize is called, which is defined in the file Python/pylifecycle.c. On return from this function, most of the runtime state is initialized but two main things remain: the initialization of the main thread’s state and the creation of the main interpreter. These happen in the call to the function Py_InitializeFromConfig, which is also defined in pylifecycle.c. Let’s see these next.

Runtime Initialization in pylifecycle.c

We just saw that the pymain_init() in main.c makes two function calls which are defined in pylifecycle.c and these two calls combined do the proper setup of the CPython runtime. So let’s spend some time to understand how that happens in pylifecycle.c.

The Global Runtime State Object Declaration

Before we look at the functions in pylifecycle.c which pymain_init is calling, I want to show how the global runtime state object is actually created because till now we have not seen it.

It turns out that CPython’s runtime state is declared as a global variable in pylifecycle.c. The following figure shows how it is done.

One interesting thing to note here is that on Linux, this object is placed in its own section in the ELF binary which is generated after the build. Doing so aids debugging a Python process even in the absence of debugging symbols. I wrote a Twitter post on it that you can read for details.

The CPython runtime state is declared as a global variable in pylifecycle.c and is initialized statically

This global runtime state object is statically initialized by using the macro _PyRuntimeState_INIT which is defined in the file Include/internal/pycore_runtime_init.h. I will not show the macro code here because it is pretty boring and pretty long. It initializes most of the fields of the runtime state, but not the main thread state and the main interpreter. I will leave the macro code to you to explore and understand.

Remaining Initialization of Runtime in pylifecycle.c

Let’s get back to where we diverged. We saw that pymain_init calls two functions in pylifecycle.c to initialize the runtime. First it calls _PyRuntime_Initialize, and then it calls Py_InitializeFromConfig. _PyRuntime_Initialize does not do anything special because the runtime is already statically initialized (but you can check it out yourself in pylifecycle.c).

We will focus on Py_InitializeFromConfig because this is where the main interpreter state and main thread states are initialized.

The code path from Py_InitializeFromConfig which leads to the creation of the runtime’s main interpreter

Quite a few things happen in Py_InitializeFromConfig, most of which are mechanical and not interesting. We will focus on the path that leads to the creation of the main interpreter. I’ve highlighted those parts and put the code of the functions called as part of the path.

The whole thing leads to a call to the pyinit_config function, which in turn calls pycore_create_interpreter. Let’s look at it.

The definition of the pycore_create_interpreter function which is where the runtime’s main interpreter and the main thread’s state are set up

As you can see, this function creates both the main interpreter and the main thread state by calling _PyInterpreterState_New and _PyThreadState_New respectively. This is the final leg of the CPython runtime initialization. Let’s take a look at these functions next.

Creation of New PyInterpreterState

The following figure shows the definition of the function _PyInterpreterState_New.

The definition of the _PyInterpreterState_New function in pystate.c which is called to create a new interpreter state

Although this function is quite big, it simply handles two cases:

We are either setting up the runtime’s main interpreter
Or, we are creating a new subinterpreter

To check which case we are in, the function checks the head of the interpreters list in the runtime object. If it is NULL, we are setting up the runtime’s main interpreter.

The runtime state includes a field of type PyInterpreterState with the name _main_interpreter. This field was statically initialized along with the rest of the runtime. So, to set up the main interpreter, the function simply points the main and the head pointers in the runtime’s interpreter list to this previously created interpreter state object.

In the other case, when the call is for creating a new subinterpreter, the function dynamically allocates a new interpreter state object, initializes it and then sets it up as the new head of the runtime’s interpreters list.
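
The two cases can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not a translation of the C function. The names InterpState, Runtime, static_main, next_id and interpreter_state_new are all hypothetical, and static_main plays the role of the statically initialized _main_interpreter field.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class InterpState:
    interp_id: int
    next: Optional["InterpState"] = None


@dataclass
class Runtime:
    # Pre-built object standing in for the statically initialized field.
    static_main: InterpState = field(default_factory=lambda: InterpState(0))
    head: Optional[InterpState] = None
    main: Optional[InterpState] = None
    next_id: int = 1


def interpreter_state_new(runtime: Runtime) -> InterpState:
    if runtime.head is None:
        # Case 1: no interpreter yet, so reuse the pre-built main interpreter.
        interp = runtime.static_main
        runtime.main = interp
    else:
        # Case 2: a subinterpreter gets a freshly allocated state object,
        # linked in front of the current head.
        interp = InterpState(runtime.next_id)
        runtime.next_id += 1
        interp.next = runtime.head
    runtime.head = interp
    return interp


rt = Runtime()
main_interp = interpreter_state_new(rt)  # first call sets up the main interpreter
sub_interp = interpreter_state_new(rt)   # later calls create subinterpreters
print(rt.head.interp_id, rt.main.interp_id)  # prints 1 0
```

Reusing a pre-built object in the first case means setting up the main interpreter needs no allocation at all.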

Creation of New PyThreadState

The following figure shows the definition of the function new_threadstate which is called by _PyThreadState_New to create a new thread state.

The definition of the new_threadstate function in pystate.c which is called to create the thread state of a new thread, or in this case it is being called to set up the thread state of the runtime’s main thread

It is very similar to the function we just saw above for creating a new interpreter state.

The interpreter may have multiple threads, and it tracks their states using a linked list in the interpreter state. The interpreter state also holds a reference to the main thread’s state (the threads.main field).

So when this function is called, either the interpreter doesn’t have the main thread state set up and that needs to be done, or the interpreter is creating a new thread and the thread state for that thread needs to be created.

In the former case, the linked list’s head will be NULL. To set up the main thread’s state, this function simply reuses a thread state which was statically created at the time of the creation of the interpreter state. This statically initialized thread state lives in the interpreter state in the field called _initial_thread.

In the other case when the interpreter is creating a new thread, this function dynamically allocates a new thread state object, initializes it, and adds it to the linked list of thread states.
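
The thread states themselves are not visible from Python code, but their effect is: each running thread has its own current frame. The sketch below uses sys._current_frames(), a CPython-specific helper that maps thread identifiers to each thread's topmost frame. The helper stack_names and the worker function are made up for the illustration.

```python
import sys
import threading

started = threading.Event()
release = threading.Event()


def worker():
    started.set()
    release.wait()  # stay alive so the thread can be inspected


def stack_names(frame):
    """Collect function names from the given frame back to the thread's entry point."""
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return names


t = threading.Thread(target=worker)
t.start()
started.wait()

frames = sys._current_frames()  # thread id -> that thread's current frame
current_id = threading.get_ident()
worker_id = t.ident

worker_stack = stack_names(frames[worker_id])
current_stack = stack_names(frames[current_id])

release.set()
t.join()

print("worker" in worker_stack, "worker" in current_stack)  # prints True False
```

The two threads report separate call stacks because each one executes against its own per-thread state.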

At this point the runtime is fully ready to start executing Python code, which we will get to in the next article.

Summary

If you’ve made it this far, you should have a good grasp of what the CPython runtime is and how it is initialized. This is a quick list of the things we learned:

The definition of the CPython runtime - it contains many fields but the list of interpreter states and the main thread state are the most crucial ones for the operation of the VM.
The runtime starts with one main interpreter and one main thread. However, the user’s program can create subinterpreters, which are also tracked by the runtime. Similarly, the user’s code can start more threads.
The state of the interpreter is tracked by the interpreter state object, which is represented by the struct PyInterpreterState. One of the most important fields in this struct is the reference to the main thread state.
The thread state is represented by the struct PyThreadState. The most important field in the thread state is a pointer to the stack frame of the currently executing function which contains the bytecode, stack, and the instruction pointer.
When the CPython process starts up, it first initializes the runtime.
The runtime state is represented by a global variable called _PyRuntime declared in the file pylifecycle.c.
The global runtime state is statically initialized, except the main interpreter and the main thread state.
The main interpreter and main thread state are initialized later when initializing the rest of the CPython runtime based on config.

In the next article in this series, we will start looking at how the bytecode interpreter executes the bytecode. We will continue our trail from the main function and see how the Python program lands in the interpreter for execution. Stay tuned!