Walkthrough

Why this workshop exists

If you do research computing, you’ve probably experienced both extremes:

  • Python is great for exploring ideas and gluing tools together.
  • Python can become painfully slow when the real work is a tight numerical loop.

Most groups end up with a split architecture:

  • Python for orchestration (I/O, parameter sweeps, plotting, job scripts).
  • C/C++/Fortran for kernels (the part that actually burns CPU/GPU time).

The question is not “should I use Python or C++?” It’s: where is the boundary, and how do I cross it cleanly?

pybind11 is one of the most widely used answers in modern C++ projects: it lets you build a Python module from C++ with relatively little boilerplate. Its goal is to make the C++ side feel like “normal C++” while the Python side feels like “normal Python”.

A concrete mental model: what happens on import

When you run:

import hello

Python is not doing anything magical. It looks through the locations on sys.path for something named hello, which can be any of:

  1. a hello.py file,
  2. a package directory hello/, or
  3. a compiled extension module such as hello*.so (Linux, macOS) or hello*.pyd (Windows).

A pybind11 project produces that compiled extension.

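When Python loads it, it calls a special entry-point function (PyInit_hello for a module named hello). You don’t write that function by hand: the PYBIND11_MODULE(name, m) macro defines it for you, and the body you put inside the macro is the code that runs at import time. Here name must match the module name, and m is the module object you attach your functions and classes to.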
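The search step described above can be inspected from Python itself. The following is a minimal sketch using only the standard library; describe_module is a hypothetical helper name introduced for illustration, and the result for hello depends on whether you have already built the extension.

```python
import importlib.machinery
import importlib.util


def describe_module(name):
    """Classify what `import name` would load, without importing it.

    Intended for top-level names: for a dotted name, find_spec
    imports the parent package first.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not found"
    if spec.submodule_search_locations is not None:
        return "package"
    origin = spec.origin or ""
    if origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return "extension"
    if origin.endswith(tuple(importlib.machinery.SOURCE_SUFFIXES)):
        return "source"
    return "other"  # e.g. built-in or frozen modules


# File name endings this interpreter accepts for compiled extensions.
print(importlib.machinery.EXTENSION_SUFFIXES)

for name in ("json", "hello"):
    print(name, "->", describe_module(name))
```

If hello reports "not found" after a build, the usual suspects are that the compiled file is not in a directory on sys.path, or that its suffix does not match any entry in EXTENSION_SUFFIXES (for example, it was built against a different Python version).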

What you gain by combining Python and C++

The boundary cost is real, so the natural question is: why bother?

In research software, the usual gains are practical:

  • Performance where it matters: you can move the hot loop (the kernel) into C++ and keep Python for orchestration.
  • Reuse of existing C++ libraries: many mature numerical and domain libraries are in C++ (or expose C APIs).
  • Cleaner research workflows: Python becomes the “glue” language for parameter sweeps, notebooks, plotting, and I/O, while C++ stays focused on the computation.
  • A path to scaling: once the kernel is in C++, it’s much easier to add OpenMP/MPI/GPU support later without rewriting your analysis code.

The trick is to structure your interface so you pay the boundary overhead rarely, but get the C++ speed often.

Crossing the language boundary has overhead:

  • Python objects must be checked and converted to C++ values.
  • Reference counts / lifetimes must be managed.
  • Arrays may get copied if the memory layout doesn’t match what C++ expects.
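The layout point can be seen without any C++ at all. The sketch below uses the standard library’s array and memoryview as stand-ins for a numerical array; as_contiguous is a hypothetical helper written for this illustration, not a pybind11 function. It mimics what a binding layer has to do when a kernel expects densely packed memory but receives a strided view.

```python
from array import array


def as_contiguous(view):
    """Return (buffer, copied) for a 1-D memoryview.

    Hypothetical helper: hands back the view untouched when it is
    already densely packed, otherwise builds a packed copy.
    """
    if view.c_contiguous:
        return view, False
    return array(view.format, view.tolist()), True


data = array("d", range(10))   # ten doubles, densely packed
whole = memoryview(data)       # zero-copy view of all elements
strided = whole[::2]           # every second element, still zero-copy

print("whole:  ", whole.c_contiguous, whole.strides)
print("strided:", strided.c_contiguous, strided.strides)

_, copied_whole = as_contiguous(whole)
packed, copied_strided = as_contiguous(strided)
print("copy needed for whole?  ", copied_whole)
print("copy needed for strided?", copied_strided)
print(list(packed))
```

The strided view shares memory with data, but its elements are not adjacent, so code that assumes a packed buffer needs a copy first. Copies like this are part of what you pay on each call across the boundary.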

This leads to the first performance rule you should internalize:

Call C++ once per chunk of work, not once per scalar.

Bad (boundary call repeated millions of times):

for i in range(N):
    y[i] = cpp_scale(x[i], 2.0)

Better (one call, loop moved to C++):

y = cpp_scale_vector(x, 2.0)
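cpp_scale and cpp_scale_vector are not defined in this section, so the sketch below uses stand-ins from the standard library to show the same pattern: operator.add plays the role of a compiled function called once per scalar, and the built-in sum (whose loop runs in C inside CPython) plays the role of a single chunked call. The function names are hypothetical and the timings are only illustrative.

```python
import operator
import timeit


def total_per_scalar(xs):
    """Cross into compiled code once per element (N calls)."""
    total = 0
    for x in xs:
        total = operator.add(total, x)
    return total


def total_chunked(xs):
    """Cross into compiled code once; the loop runs inside sum()."""
    return sum(xs)


xs = list(range(100_000))

# Both versions compute the same result; only the call pattern differs.
assert total_per_scalar(xs) == total_chunked(xs)

t_scalar = timeit.timeit(lambda: total_per_scalar(xs), number=5)
t_chunk = timeit.timeit(lambda: total_chunked(xs), number=5)
print(f"per-scalar calls: {t_scalar:.4f} s")
print(f"one chunked call: {t_chunk:.4f} s")
```

The measured gap mixes Python loop overhead with per-call overhead, so treat it as a rough indication rather than a benchmark of pybind11 itself. The direction of the effect is the point: the fewer times you cross the boundary per unit of work, the less the crossing costs you.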

In Modules 4 and 5 we’ll make this “chunking” idea concrete.

References

  • pybind11 “First steps / basics” (simple function bindings): https://pybind11.readthedocs.io/en/stable/basics.html
  • PYBIND11_MODULE reference: https://pybind11.readthedocs.io/en/stable/reference.html