Walkthrough
Why this workshop exists
If you do research computing, you’ve probably experienced both extremes:
- Python is great for exploring ideas and gluing tools together.
- Python can become painfully slow when the real work is a tight numerical loop.
Most groups end up with a split architecture:
- Python for orchestration (I/O, parameter sweeps, plotting, job scripts).
- C/C++/Fortran for kernels (the part that actually burns CPU/GPU time).
The question is not “should I use Python or C++?” It’s: where is the boundary, and how do I cross it cleanly?
pybind11 is one of the most widely used answers in modern C++ projects: it lets you build a Python module from C++ with relatively little boilerplate. Its goal is to make the C++ side feel like “normal C++” while the Python side feels like “normal Python”.
A concrete mental model: what happens on import
When you run:
import hello
Python is not doing anything magical. It searches for a module named hello:
- a hello.py file, or
- a package directory hello/, or
- a compiled extension like hello*.so (Linux), hello*.pyd (Windows), etc.
A pybind11 project produces that compiled extension.
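When Python loads it, it calls a special entry-point function that the PYBIND11_MODULE(name, m) macro defines. The macro expands to the module's initialization function (PyInit_<name> in CPython), and the block you write after it is the code that runs at import time, with m as the module object being populated.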
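The search described above can be inspected from Python itself. The following is a minimal sketch using only the standard library's importlib; the helper name describe is made up for illustration, and hello is the hypothetical module name from the text, so on most machines it will simply be reported as not found.

```python
import importlib.machinery
import importlib.util

# Filename suffixes this interpreter accepts for compiled extension modules.
# The exact strings depend on platform and Python version.
print(importlib.machinery.EXTENSION_SUFFIXES)


def describe(name):
    """Report where the import system would load `name` from, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not found on sys.path"
    return f"{name}: origin={spec.origin}, loader={type(spec.loader).__name__}"


print(describe("json"))   # a package directory from the standard library
print(describe("hello"))  # hypothetical name; not found unless you have built it
```

Once a pybind11 build has placed its extension file somewhere on sys.path, the same call should report that file as the origin.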
What you gain by combining Python and C++
Crossing between Python and C++ has a real cost, described later in this section, so the natural question is: why bother?
In research software, the usual gains are practical:
- Performance where it matters: you can move the hot loop (the kernel) into C++ and keep Python for orchestration.
- Reuse of existing C++ libraries: many mature numerical and domain libraries are in C++ (or expose C APIs).
- Cleaner research workflows: Python becomes the “glue” language for parameter sweeps, notebooks, plotting, and I/O, while C++ stays focused on the computation.
- A path to scaling: once the kernel is in C++, it’s much easier to later add OpenMP/MPI/GPU support without rewriting your analysis code.
The trick is to structure your interface so you pay the boundary overhead rarely, but get the C++ speed often.
Crossing the language boundary has overhead:
- Python objects must be checked and converted to C++ values.
- Reference counts / lifetimes must be managed.
- Arrays may get copied if the memory layout doesn’t match what C++ expects.
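The layout point can be seen without any C++ at all. This is a small sketch using the standard library's array and memoryview as stand-ins for a NumPy array and the buffer a binding would receive. How a particular binding reacts to a non-contiguous buffer (copy, reject, or handle strides) depends on how it is written, so the packing step here is an assumption about a kernel that wants a plain contiguous block.

```python
from array import array

# Six doubles stored back to back, like a 1-D C array.
data = array("d", [0.0, 1.0, 2.0, 3.0, 4.0, 5.0])

full = memoryview(data)  # no copy: a view onto the same memory
strided = full[::2]      # still no copy: every second element

print(full.c_contiguous, full.strides)        # contiguous layout
print(strided.c_contiguous, strided.strides)  # gaps between elements

# A kernel that expects a pointer plus a length cannot walk the strided
# view directly, so the elements are packed into a fresh contiguous block.
packed = array("d", strided.tolist())

# The packed block is a real copy: changing it leaves the original untouched.
packed[0] = 99.0
print(data[0], packed[0])
```

Checks and copies of this kind are paid on every call across the boundary, however little work the call itself does.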
This leads to the first performance rule you should internalize:
Call C++ once per chunk of work, not once per scalar.
Bad (boundary call repeated millions of times):
for i in range(N):
    y[i] = cpp_scale(x[i], 2.0)
Better (one call, loop moved to C++):
y = cpp_scale_vector(x, 2.0)
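The effect can be measured before any C++ is written. The sketch below is a stand-in, not pybind11: operator.add and the built-in sum are both implemented in C inside CPython, so they play the roles of a per-scalar call and a per-chunk call. The function names and the size N are chosen for illustration only, and the timings will vary by machine.

```python
import operator
import timeit

N = 200_000
xs = [float(i) for i in range(N)]


def add_per_scalar(values):
    """Cross into C once per element: N separate calls to operator.add."""
    total = 0.0
    for v in values:
        total = operator.add(total, v)
    return total


def add_per_chunk(values):
    """Cross into C once: the whole loop runs inside the built-in sum()."""
    return sum(values, 0.0)


t_scalar = timeit.timeit(lambda: add_per_scalar(xs), number=3)
t_chunk = timeit.timeit(lambda: add_per_chunk(xs), number=3)

print(f"per-scalar calls: {t_scalar:.4f} s")
print(f"one chunked call: {t_chunk:.4f} s")
```

Both functions compute the same total; only the number of times the interpreter hands control to compiled code differs.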
In Modules 4 and 5 we’ll make this “chunking” idea concrete.
References
- pybind11 “First steps / basics” (simple function bindings): https://pybind11.readthedocs.io/en/stable/basics.html
- PYBIND11_MODULE reference: https://pybind11.readthedocs.io/en/stable/reference.html