Context: other Rust–C++ interop tools
When it comes to interacting with an idiomatic Rust API or idiomatic C++ API from the other language, the generally applicable approaches outside of the CXX crate are:
- Build a C-compatible wrapper around the code (expressed using extern "C" signatures, primitives, C-compatible structs, raw pointers). Translate that manually to equivalent extern "C" declarations in the other language and keep them in sync. Preferably, build a safe/idiomatic wrapper around the translated extern "C" signatures for callers to use.
- Build a C wrapper around the C++ code and use bindgen to translate that programmatically to extern "C" Rust signatures. Preferably, build a safe/idiomatic Rust wrapper on top.
- Build a C-compatible Rust wrapper around the Rust code and use cbindgen to translate that programmatically to an extern "C" C++ header. Preferably, build an idiomatic C++ wrapper.
If the code you are binding is already "effectively C", the above has you covered. You should use bindgen or cbindgen, or manually translated C signatures if there aren't too many and they seldom change.
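As a rough illustration of the first approach, here is a minimal sketch of the Rust side only. The names checksum, demo_checksum and checksum_via_ffi are hypothetical, made up for this example. The C-compatible wrapper exposes nothing but primitives and a raw pointer, and a safe wrapper puts an idiomatic signature back on top.

```rust
use std::slice;

/// Idiomatic Rust API we would like to expose (hypothetical example function).
pub fn checksum(data: &[u8]) -> u32 {
    data.iter()
        .fold(0u32, |acc, &b| acc.wrapping_mul(31).wrapping_add(u32::from(b)))
}

/// C-compatible wrapper: the signature uses only a raw pointer and primitives.
/// A real export would also carry a no_mangle attribute so that the other
/// language can link against the symbol by name; it is left out here to keep
/// the sketch independent of the Rust edition in use.
///
/// # Safety
/// `ptr` must point to `len` readable bytes, unless `len` is 0.
pub unsafe extern "C" fn demo_checksum(ptr: *const u8, len: usize) -> u32 {
    if ptr.is_null() || len == 0 {
        return checksum(&[]);
    }
    // The caller promised that `ptr` is valid for `len` bytes.
    checksum(unsafe { slice::from_raw_parts(ptr, len) })
}

/// Safe wrapper over the extern "C" signature, of the kind a caller on the
/// far side of the boundary would write to hide the raw pointer again.
pub fn checksum_via_ffi(data: &[u8]) -> u32 {
    // SAFETY: pointer and length both come from the same valid slice.
    unsafe { demo_checksum(data.as_ptr(), data.len()) }
}
```

Calling checksum_via_ffi(b"ab") goes through the raw-pointer signature and returns the same value as checksum(b"ab"). The matching declaration in the other language, and the job of keeping it in sync, is the part this sketch does not show.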
C++ vs C
Bindgen has some basic support for C++. It can reason about classes, member functions, and the layout of templated types. However, everything it does related to C++ is best-effort only. Bindgen starts from a point of wanting to generate declarations for everything, so any C++ detail that it hasn't implemented will cause a crash if you are lucky (bindgen#388) or more likely silently emit an incompatible signature (bindgen#380, bindgen#607, bindgen#652, bindgen#778, bindgen#1194) which will do arbitrary memory-unsafe things at runtime whenever called.
Thus using bindgen correctly requires not just juggling all your pointers correctly at the language boundary, but also understanding ABI details and their workarounds and reliably applying them. For example, the programmer will discover that their program sometimes segfaults if they call a function that returns std::unique_ptr<T> through bindgen. Why? Because unique_ptr, despite being "just a pointer", has a different ABI than a pointer or a C struct containing a pointer (bindgen#778) and is not directly expressible in Rust. Bindgen emitted something that looks reasonable and you will have a hell of a time in gdb working out what went wrong. Eventually people learn to avoid anything involving a non-trivial copy constructor, destructor, or inheritance, and instead stick to raw pointers and primitives and trivial structs only — in other words C.
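To see why a wrong binding for unique_ptr can "look reasonable", consider this sketch. FakeUniquePtr is a hypothetical stand-in written for this example, not bindgen's actual output. It has exactly the size of a raw pointer, so a layout check passes.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a generated binding of `std::unique_ptr<T>`:
/// a C-layout struct holding a single raw pointer.
#[repr(C)]
pub struct FakeUniquePtr<T> {
    pub ptr: *mut T,
}

/// Layout check: the stand-in is exactly one pointer wide.
/// Matching size says nothing about how a C++ compiler passes or returns
/// the real type across a function call.
pub fn same_size_as_raw_pointer() -> bool {
    size_of::<FakeUniquePtr<u8>>() == size_of::<*mut u8>()
}
```

The check returns true, and that is the trap. On ABIs such as the Itanium C++ ABI, a class with a non-trivial destructor is returned through a hidden pointer to caller-provided storage, whereas a trivial one-pointer struct is typically returned in a register. A declaration that agrees on layout can therefore still disagree on the calling convention.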
Geometric intuition for why there is so much opportunity for improvement
The CXX project attempts a different approach to C++ FFI.
Imagine Rust and C and C++ as three vertices of a scalene triangle, with the length of each edge related to how similar the two languages are when it comes to library design (a shorter edge means more similar).
The most similar pair (the shortest edge) is Rust–C++. These languages have largely compatible concepts of things like ownership, vectors, strings, fallibility, etc. that translate clearly from signatures in either language to signatures in the other language.
When we make a binding for an idiomatic C++ API using bindgen, and we fall down
to raw pointers and primitives and trivial structs as described above, what we
are really doing is coding the two longest edges of the triangle: getting from
C++ down to C, and C back up to Rust. The Rust–C edge always involves a
great deal of unsafe code, and the C++–C edge similarly requires care
just for basic memory safety. Something as basic as "how do I pass ownership of
a string to the other language?" becomes a strap-yourself-in moment,
particularly for someone not already an expert in one or both sides.
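For a sense of what the string question involves on the Rust–C edge alone, here is a minimal sketch using the standard library's CString. The function names are hypothetical. Ownership leaves Rust as a raw pointer, and the only correct way to release it is to hand the same pointer back to Rust.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hands ownership of a heap-allocated, NUL-terminated string to the caller.
/// (Hypothetical function name; a real export would also be no_mangle.)
pub extern "C" fn demo_make_greeting() -> *mut c_char {
    CString::new("hello from Rust")
        .expect("literal contains no interior NUL")
        .into_raw()
}

/// Takes ownership back and frees the string.
///
/// # Safety
/// `s` must be null or a pointer obtained from `demo_make_greeting` that has
/// not been freed yet. Calling C's free() on it instead is undefined behavior.
pub unsafe extern "C" fn demo_free_greeting(s: *mut c_char) {
    if !s.is_null() {
        // Rebuild the CString so that Rust's allocator releases the memory.
        drop(unsafe { CString::from_raw(s) });
    }
}

/// Plays the role of the foreign caller: borrow, copy out, then give back.
pub fn greeting() -> String {
    let raw = demo_make_greeting();
    // SAFETY: `raw` is a valid NUL-terminated string until it is freed below.
    let copy = unsafe { CStr::from_ptr(raw) }.to_string_lossy().into_owned();
    // SAFETY: `raw` came from demo_make_greeting and is freed exactly once.
    unsafe { demo_free_greeting(raw) };
    copy
}
```

Nothing in the types stops a caller from freeing the pointer twice, freeing it with the wrong allocator, or never freeing it at all. Every one of those rules lives in a comment.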
You should think of the cxx crate as being the midpoint of the Rust–C++
edge. Rather than coding the two long edges, you will code half the short edge
in Rust and half the short edge in C++, in both cases with the library playing
to the strengths of the Rust type system and the C++ type system to help
assure correctness.
If you've already been through the tutorial in the previous chapter, take a moment to appreciate that the C++ side really looks like we are just writing C++ and the Rust side really looks like we are just writing Rust. Anything you could do wrong in Rust, and almost anything you could reasonably do wrong in C++, will be caught by the compiler. This highlights that we are on the "short edge of the triangle".
But it all still boils down to the same thing: it's still FFI from one piece of native code to another. Nothing is getting serialized or allocated or runtime-checked in between.
Role of CXX
The role of CXX is to capture the language boundary with more fidelity than what
extern "C" is able to represent. You can think of CXX as being a replacement
for extern "C" in a sense.
From this perspective, CXX is a lower level tool than the bindgens. Just as
bindgen and cbindgen are built on top of extern "C", it makes sense to think
about higher level tools built on top of CXX. Such a tool might consume a C++
header and/or Rust module (and/or IDL like Thrift) and emit the corresponding
safe cxx::bridge language boundary, leveraging CXX's static analysis and
underlying implementation of that boundary. We are beginning to see this space
explored by the autocxx tool, though nothing is yet ready for broad use in the
way that CXX on its own is.
But note that in other ways CXX is higher level than the bindgens, with rich support for common standard library types. CXX's types serve as an intuitive vocabulary for designing a good boundary between components in different languages.