Walkthrough

At this point you have the basic mechanics: modules compile, Python imports them, functions work. Real projects fail for more mundane reasons: toolchains drift, libraries can’t be found, or the interface accidentally becomes slow.

This module is about building instincts that scale beyond a workshop.

Reproducibility: pin the triangle (Python, compiler, dependencies)

A compiled extension depends on:

the Python ABI (version and build flags)
the compiler and standard library (GCC/Clang, libstdc++)
linked third-party libraries (Eigen/BLAS/GSL/…)

If these change between build and run, you can get confusing ImportErrors. Practical habit: decide your environment first, then build inside it.
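One way to make "decide your environment first" concrete is to record the interpreter-side facts at build time and compare them at run time. The sketch below is illustrative only: `build_fingerprint` is a hypothetical helper name, and it covers the Python corner of the triangle, not the versions of your linked third-party libraries.

```python
import platform
import sys
import sysconfig


def build_fingerprint():
    """Collect interpreter-side facts that a compiled extension is tied to.

    Hypothetical helper for illustration; it does not inspect third-party
    libraries such as Eigen or BLAS.
    """
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        # Compiler that built the interpreter itself, which is not
        # necessarily the compiler that built your extension.
        "python_compiler": platform.python_compiler(),
        # Filename suffix this interpreter expects for extension modules.
        "ext_suffix": sysconfig.get_config_var("EXT_SUFFIX"),
        "platform": sysconfig.get_platform(),
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in build_fingerprint().items():
        print(f"{key}: {value}")
```

Running this once in the build environment and once in the runtime environment, then comparing the two outputs, is usually enough to spot a drifted interpreter.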

The three most common failures (and how to think about them)

1) “Module not found”

If Python can’t find the extension file, you get ModuleNotFoundError. This is usually a search-path issue: the compiled .so is not in any directory on sys.path, or its filename suffix was built for a different interpreter and is silently ignored.
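A search-path problem can be checked directly by scanning the search path for compiled files with the expected name. This is a minimal sketch under assumptions: `find_extension_candidates` is a hypothetical helper, and it only looks at top-level modules in plain directories (not packages or zip archives).

```python
import importlib.machinery
import sys
from pathlib import Path


def find_extension_candidates(module_name, search_path=None):
    """Return (importable, other) lists of compiled files named module_name.

    importable: files whose suffix this interpreter accepts.
    other: compiled files with the right stem but a suffix it will ignore,
           for example one built for a different Python version.
    """
    search_path = sys.path if search_path is None else search_path
    accepted_names = {
        module_name + suffix for suffix in importlib.machinery.EXTENSION_SUFFIXES
    }
    importable, other = [], []
    for entry in search_path:
        directory = Path(entry or ".")  # an empty entry means the current directory
        if not directory.is_dir():
            continue  # skip zip archives and stale path entries
        for path in sorted(directory.glob(module_name + ".*")):
            if path.name in accepted_names:
                importable.append(path)
            elif path.suffix in {".so", ".pyd"}:
                other.append(path)
    return importable, other
```

If `importable` is empty and `other` is not, the file exists but was built for a different interpreter; if both are empty, the file is simply not on the search path.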

2) “Undefined symbol” or missing libraries

If Python finds the module but the dynamic loader can’t resolve symbols, you get an ImportError complaining about undefined symbols or missing shared libraries.

The fastest diagnosis is to inspect dependencies:

Linux: ldd yourmodule*.so
macOS: otool -L yourmodule*.so

This tells you what the module needs at runtime and what is missing.
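The same inspection can be scripted, which helps when checking many modules or running in CI. The sketch below assumes `ldd` or `otool` is installed; the helper names are hypothetical. Note that the "not found" parsing applies to `ldd` output only, since `otool -L` lists the recorded library paths without checking whether they exist.

```python
import subprocess
import sys


def missing_from_ldd(ldd_output):
    """Return the names of libraries that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "=> not found" in line:
            # Lines look like: "<tab>libname.so.N => not found"
            missing.append(line.split("=>")[0].strip())
    return missing


def inspect_dependencies(path):
    """Run the platform's dependency lister and return its raw text output."""
    if sys.platform == "darwin":
        command = ["otool", "-L", path]
    else:
        command = ["ldd", path]
    result = subprocess.run(command, capture_output=True, text=True, check=False)
    return result.stdout + result.stderr
```

On Linux, `missing_from_ldd(inspect_dependencies("yourmodule.so"))` would return the unresolved library names, or an empty list when everything resolves.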

3) Template errors during compilation

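If you see enormous template errors, a common cause is a missing pybind11 header that enables a type conversion (e.g., pybind11/stl.h for STL containers, pybind11/eigen.h for Eigen). Depending on the binding, the same omission can instead compile cleanly and only surface at call time as a TypeError about a type that cannot be converted, so check the includes in both cases.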

Debugging that scales

A reliable debugging workflow is:

1) In Python, print where the module came from:

import yourmodule
print(yourmodule.__file__)

2) Inspect dependencies (ldd / otool -L).
3) If needed, rebuild from a clean build directory inside the intended environment.

This prevents “random thrashing” and usually leads you to the root cause quickly.

Performance instincts (what to optimize first)

The biggest performance win is often not micro-optimizing C++. It is avoiding a design where you cross Python↔C++ millions of times.

Good pattern:

- Python prepares arrays
- one call into C++ does a whole kernel
- Python analyzes/plots results

Risky pattern:

- Python loops and calls a small C++ function per element
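The gap between the two patterns can be felt without compiling anything. The sketch below is a stand-in, not a pybind11 benchmark: it uses built-in functions implemented in C (`sum` and `operator.add`) in place of a compiled extension, on the assumption that the per-call overhead behaves similarly. Both function names are hypothetical.

```python
import operator
import timeit


def whole_kernel(values):
    """Good pattern stand-in: one call, the loop runs in compiled code."""
    return sum(values)


def per_element(values):
    """Risky pattern stand-in: one call into compiled code per element."""
    total = 0
    for v in values:
        total = operator.add(total, v)
    return total


if __name__ == "__main__":
    data = list(range(1_000_000))
    # Same answer either way; only the number of boundary crossings differs.
    assert whole_kernel(data) == per_element(data)
    for fn in (whole_kernel, per_element):
        seconds = min(timeit.repeat(lambda: fn(data), number=1, repeat=5))
        print(f"{fn.__name__}: {seconds * 1e3:.1f} ms")
```

The absolute numbers depend on the machine; what matters is the ratio, and that it grows with the number of elements.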

Also watch for accidental copies:

- Python list ↔ std::vector involves conversion/copying
- non-contiguous NumPy views often force copies before Eigen can consume them
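Contiguity can be checked before data crosses the boundary. The sketch below uses only the standard library's `memoryview` as a stand-in for a NumPy array, so it is an illustration of the buffer-level idea and not of pybind11's Eigen handling; with NumPy the equivalent checks would be `arr.flags["C_CONTIGUOUS"]` and `np.ascontiguousarray(arr)`. The helper `as_contiguous` is hypothetical and only handles formats that `array` supports.

```python
from array import array


def as_contiguous(view):
    """Return (buffer, copied): a C-contiguous buffer and whether a copy was made."""
    view = memoryview(view)
    if view.c_contiguous:
        return view, False  # already usable in place, no copy needed
    packed = array(view.format)        # fresh storage with the same element type
    packed.frombytes(view.tobytes())   # tobytes() gathers the strided elements
    return memoryview(packed), True


if __name__ == "__main__":
    data = array("d", range(6))
    full = memoryview(data)
    every_other = full[::2]  # strided view: same memory, skips elements
    for name, view in (("full", full), ("every_other", every_other)):
        _, copied = as_contiguous(view)
        print(f"{name}: strides={view.strides} copied={copied}")
```

A consumer that requires contiguous memory has to do something like the copy branch on every call, which is why slicing with a step in Python can quietly change the cost of a C++ call.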

The right attitude is skepticism: measure on realistic inputs, then optimize what actually dominates.