Tutorial: CXX blobstore client

This example walks through a Rust application that calls into a C++ client of a blobstore service. In fact we'll see calls going in both directions: Rust to C++ as well as C++ to Rust. For your own use case it may be that you need just one of these directions.

All of the code involved in the example is shown on this page, but it's also provided in runnable form in the demo directory of https://github.com/dtolnay/cxx. To try it out directly, run cargo run from that directory.

This tutorial assumes you've read briefly about shared structs, opaque types, and functions in the Core concepts page.

Creating the project

We'll use Cargo, which is the build system commonly used by open source Rust projects. (CXX works with other build systems too; refer to chapter 5.)

Create a blank Cargo project: mkdir cxx-demo; cd cxx-demo; cargo init.

Edit the Cargo.toml to add a dependency on the cxx crate:

# Cargo.toml
[package]
name = "cxx-demo"
version = "0.1.0"
edition = "2018"

[dependencies]
cxx = "1.0"

We'll revisit this Cargo.toml later when we get to compiling some C++ code.

Defining the language boundary

CXX relies on a description of the function signatures that will be exposed from each language to the other. You provide this description using extern blocks in a Rust module annotated with the #[cxx::bridge] attribute macro.

We'll open with just the following at the top of src/main.rs and walk through each item in detail.

// src/main.rs

#[cxx::bridge]
mod ffi {

}

fn main() {}

The contents of this module will be everything that needs to be agreed upon by both sides of the FFI boundary.

Calling a C++ function from Rust

Let's obtain an instance of the C++ blobstore client, a class BlobstoreClient defined in C++.

We'll treat BlobstoreClient as an opaque type in CXX's classification so that Rust does not need to assume anything about its implementation, not even its size or alignment. In general, a C++ type might have a move-constructor which is incompatible with Rust's move semantics, or may hold internal references which cannot be modeled by Rust's borrowing system. Though there are alternatives, the easiest way to not care about any such thing on an FFI boundary is to require no knowledge about a type by treating it as opaque.

Opaque types may only be manipulated behind an indirection such as a reference &, a Rust Box, or a UniquePtr (Rust binding of std::unique_ptr). We'll add a function through which C++ can return a std::unique_ptr<BlobstoreClient> to Rust.

// src/main.rs

#[cxx::bridge]
mod ffi {
    unsafe extern "C++" {
        include!("cxx-demo/include/blobstore.h");

        type BlobstoreClient;

        fn new_blobstore_client() -> UniquePtr<BlobstoreClient>;
    }
}

fn main() {
    let client = ffi::new_blobstore_client();
}

The nature of unsafe extern blocks is explained in more detail in the extern "C++" chapter. In brief: the programmer is not promising that the signatures they have typed in are accurate; that would be unreasonable. Instead, CXX performs static assertions that the signatures exactly match what is declared in C++. The programmer is only on the hook for things that C++'s semantics are not precise enough to capture, i.e. things that would at most be represented by comments in the C++ code. In this case, it's whether new_blobstore_client is safe or unsafe to call. If that function said something like "must be called at most once or we'll stomp yer memery", Rust would instead want to expose it as unsafe fn new_blobstore_client, this time inside a safe extern "C++" block because the programmer is no longer on the hook for any safety claim about the signature.
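To make the safe-versus-unsafe distinction concrete without involving any C++, here is a minimal plain-Rust sketch. The names Client, new_client_at_most_once, and make_client are hypothetical and exist only for this illustration; they are not part of CXX or of the demo. The sketch shows how a "call at most once" contract surfaces as an unsafe fn, and how a safe wrapper can take responsibility for that contract so its callers need no unsafe block.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical stand-in for an object owned by a C++ library.
pub struct Client {
    pub id: u32,
}

/// Hypothetical stand-in for a constructor function whose documentation says
/// "must be called at most once".
///
/// # Safety
/// The caller must ensure this function is called at most once per process.
pub unsafe fn new_client_at_most_once() -> Box<Client> {
    Box::new(Client { id: 1 })
}

/// Safe wrapper: enforces the at-most-once contract at runtime, so callers
/// do not have to write `unsafe` themselves.
pub fn make_client() -> Option<Box<Client>> {
    static CREATED: AtomicBool = AtomicBool::new(false);
    if CREATED.swap(true, Ordering::SeqCst) {
        // A client was already created; do not call the unsafe function again.
        None
    } else {
        // SAFETY: the atomic swap guarantees this branch runs at most once.
        Some(unsafe { new_client_at_most_once() })
    }
}
```

Calling make_client() a second time returns None instead of violating the documented contract.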

If you run cargo build right now, the build will fail because we haven't written a C++ implementation of new_blobstore_client, nor told Cargo how to link it into the resulting binary. You'll see an error from the linker like this:

error: linking with `cc` failed: exit code: 1
 |
 = /bin/ld: target/debug/deps/cxx-demo-7cb7fddf3d67d880.rcgu.o: in function `cxx_demo::ffi::new_blobstore_client':
   src/main.rs:1: undefined reference to `cxxbridge1$new_blobstore_client'
   collect2: error: ld returned 1 exit status

Adding in the C++ code

In CXX's integration with Cargo, all #include paths begin with a crate name by default (when not explicitly selected otherwise by a crate; see CFG.include_prefix in chapter 5). That's why we see include!("cxx-demo/include/blobstore.h") above — we'll be putting the C++ header at relative path include/blobstore.h within the Rust crate. If your crate is named something other than cxx-demo according to the name field in Cargo.toml, you will need to use that name everywhere in place of cxx-demo throughout this tutorial.

// include/blobstore.h

#pragma once
#include <memory>

class BlobstoreClient {
public:
  BlobstoreClient();
};

std::unique_ptr<BlobstoreClient> new_blobstore_client();

// src/blobstore.cc

#include "cxx-demo/include/blobstore.h"

BlobstoreClient::BlobstoreClient() {}

std::unique_ptr<BlobstoreClient> new_blobstore_client() {
  return std::unique_ptr<BlobstoreClient>(new BlobstoreClient());
}

Using std::make_unique would work too, as long as you pass -std=c++14 to the C++ compiler as described later on.

The placement in include/ and src/ is not significant; you can place C++ code anywhere else in the crate as long as you use the right paths throughout the tutorial.

Be aware that CXX does not look at any of these files. You're free to put arbitrary C++ code in here, #include your own libraries, etc. All we do is emit static assertions against what you provide in the headers.

Compiling the C++ code with Cargo

Cargo has a build scripts feature suitable for compiling non-Rust code.

We need to introduce a new build-time dependency on CXX's C++ code generator in Cargo.toml:

# Cargo.toml
[package]
name = "cxx-demo"
version = "0.1.0"
edition = "2018"

[dependencies]
cxx = "1.0"

[build-dependencies]
cxx-build = "1.0"

Then add a build.rs build script adjacent to Cargo.toml to run the cxx-build code generator and C++ compiler. The relevant arguments are the path to the Rust source file containing the cxx::bridge language boundary definition, and the paths to any additional C++ source files to be compiled during the Rust crate's build.

// build.rs

fn main() {
    cxx_build::bridge("src/main.rs")
        .file("src/blobstore.cc")
        .compile("cxx-demo");
}

This build.rs would also be where you set up C++ compiler flags, for example if you'd like to have access to std::make_unique from C++14. See the page on Cargo-based builds for more details about CXX's Cargo integration.

// build.rs

fn main() {
    cxx_build::bridge("src/main.rs")
        .file("src/blobstore.cc")
        .flag_if_supported("-std=c++14")
        .compile("cxx-demo");
}
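A build script commonly also tells Cargo when it needs to be rerun. As a hedged sketch, the helper below builds cargo:rerun-if-changed=PATH lines (a standard Cargo build-script directive) for the files used in this tutorial. The functions rerun_directives and tutorial_directives are hypothetical names for illustration, not part of cxx_build; in a real build.rs you would print each returned line with println!.

```rust
/// Hypothetical helper (not part of cxx_build): builds one Cargo
/// `rerun-if-changed` directive per input file.
pub fn rerun_directives(paths: &[&str]) -> Vec<String> {
    paths
        .iter()
        .map(|path| format!("cargo:rerun-if-changed={}", path))
        .collect()
}

/// Directives for the files this tutorial's build depends on. Printing these
/// lines from build.rs asks Cargo to rerun the script when any file changes.
pub fn tutorial_directives() -> Vec<String> {
    rerun_directives(&["src/main.rs", "src/blobstore.cc", "include/blobstore.h"])
}
```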

The project should now build and run successfully, though not do anything useful yet.

cxx-demo$  cargo run
  Compiling cxx-demo v0.1.0
  Finished dev [unoptimized + debuginfo] target(s) in 0.34s
  Running `target/debug/cxx-demo`

cxx-demo$

Calling a Rust function from C++

Our C++ blobstore supports a put operation for a discontiguous buffer upload. For example we might be uploading snapshots of a circular buffer which would tend to consist of 2 pieces, or fragments of a file spread across memory for some other reason (like a rope data structure).

We'll express this by handing off an iterator over contiguous borrowed chunks. This loosely resembles the API of the widely used bytes crate's Buf trait. During a put, we'll make C++ call back into Rust to obtain contiguous chunks of the upload (all with no copying or allocation on the language boundary). In reality the C++ client might contain some sophisticated batching of chunks and/or parallel uploading that all of this ties into.

// src/main.rs

#[cxx::bridge]
mod ffi {
    extern "Rust" {
        type MultiBuf;

        fn next_chunk(buf: &mut MultiBuf) -> &[u8];
    }

    unsafe extern "C++" {
        include!("cxx-demo/include/blobstore.h");

        type BlobstoreClient;

        fn new_blobstore_client() -> UniquePtr<BlobstoreClient>;
        fn put(&self, parts: &mut MultiBuf) -> u64;
    }
}

fn main() {
    let client = ffi::new_blobstore_client();
}

Any signature having a self parameter (the Rust name for C++'s this) is considered a method / non-static member function. If there is only one type in the surrounding extern block, it'll be a method of that type. If there is more than one type, you can disambiguate which one a method belongs to by writing self: &BlobstoreClient in the argument list.

As usual, now we need to provide Rust definitions of everything declared by the extern "Rust" block and a C++ definition of the new signature declared by the extern "C++" block.

// src/main.rs

#[cxx::bridge]
mod ffi {
    extern "Rust" {
        type MultiBuf;

        fn next_chunk(buf: &mut MultiBuf) -> &[u8];
    }

    unsafe extern "C++" {
        include!("cxx-demo/include/blobstore.h");

        type BlobstoreClient;

        fn new_blobstore_client() -> UniquePtr<BlobstoreClient>;
        fn put(&self, parts: &mut MultiBuf) -> u64;
    }
}

// An iterator over contiguous chunks of a discontiguous file object. Toy
// implementation uses a Vec<Vec<u8>> but in reality this might be iterating
// over some more complex Rust data structure like a rope, or maybe loading
// chunks lazily from somewhere.
pub struct MultiBuf {
    chunks: Vec<Vec<u8>>,
    pos: usize,
}

pub fn next_chunk(buf: &mut MultiBuf) -> &[u8] {
    let next = buf.chunks.get(buf.pos);
    buf.pos += 1;
    next.map_or(&[], Vec::as_slice)
}

fn main() {
    let client = ffi::new_blobstore_client();
}

// include/blobstore.h

#pragma once
#include <cstdint>
#include <memory>

struct MultiBuf;

class BlobstoreClient {
public:
  BlobstoreClient();
  uint64_t put(MultiBuf &buf) const;
};

std::unique_ptr<BlobstoreClient> new_blobstore_client();

In blobstore.cc we're able to call the Rust next_chunk function, exposed to C++ by a header main.rs.h generated by the CXX code generator. In CXX's Cargo integration this generated header has a path containing the crate name, the relative path of the Rust source file within the crate, and a .rs.h extension.
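As a small sketch of the two path conventions described so far, the helpers below compute the path used in include! and the path of the generated header. Both function names are hypothetical and only restate the rules in this tutorial; they assume the default include prefix (the crate name) and are not an API of CXX.

```rust
/// Path to use inside `include!(...)` for a header that lives in the crate:
/// the crate name followed by the header's path relative to the crate root.
pub fn bridge_include_path(crate_name: &str, header_rel_path: &str) -> String {
    format!("{}/{}", crate_name, header_rel_path)
}

/// Path of the header generated from a bridge module: the crate name, the
/// Rust source file's path relative to the crate root, then a `.h` suffix
/// (so `main.rs` becomes `main.rs.h`).
pub fn generated_header_path(crate_name: &str, rust_src_rel_path: &str) -> String {
    format!("{}/{}.h", crate_name, rust_src_rel_path)
}
```

For this tutorial's crate, these give cxx-demo/include/blobstore.h and cxx-demo/src/main.rs.h respectively.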

// src/blobstore.cc

#include "cxx-demo/include/blobstore.h"
#include "cxx-demo/src/main.rs.h"
#include <functional>
#include <string>

BlobstoreClient::BlobstoreClient() {}

std::unique_ptr<BlobstoreClient> new_blobstore_client() {
  return std::make_unique<BlobstoreClient>();
}

// Upload a new blob and return a blobid that serves as a handle to the blob.
uint64_t BlobstoreClient::put(MultiBuf &buf) const {
  // Traverse the caller's chunk iterator.
  std::string contents;
  while (true) {
    auto chunk = next_chunk(buf);
    if (chunk.size() == 0) {
      break;
    }
    contents.append(reinterpret_cast<const char *>(chunk.data()), chunk.size());
  }

  // Pretend we did something useful to persist the data.
  auto blobid = std::hash<std::string>{}(contents);
  return blobid;
}
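The loop in put relies on a simple protocol: keep calling next_chunk and stop at the first empty chunk. To see that protocol in isolation, here is a plain-Rust mirror of the loop. MultiBuf and next_chunk are copied from src/main.rs so the block is self-contained; drain is a hypothetical helper that exists only in this sketch.

```rust
/// Same toy type as in the tutorial's src/main.rs.
pub struct MultiBuf {
    pub chunks: Vec<Vec<u8>>,
    pub pos: usize,
}

/// Same function as in the tutorial's src/main.rs.
pub fn next_chunk(buf: &mut MultiBuf) -> &[u8] {
    let next = buf.chunks.get(buf.pos);
    buf.pos += 1;
    next.map_or(&[], Vec::as_slice)
}

/// Hypothetical Rust mirror of the loop in BlobstoreClient::put: keep asking
/// for chunks and stop at the first empty one.
pub fn drain(buf: &mut MultiBuf) -> Vec<u8> {
    let mut contents = Vec::new();
    loop {
        let chunk = next_chunk(buf);
        if chunk.is_empty() {
            break;
        }
        contents.extend_from_slice(chunk);
    }
    contents
}
```

One consequence of this protocol is that an empty chunk in the middle of the list ends the traversal early, because an empty slice is the end-of-data signal. A producer using this toy MultiBuf should avoid storing empty chunks.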

This is now ready to use. :)

// src/main.rs

#[cxx::bridge]
mod ffi {
    extern "Rust" {
        type MultiBuf;

        fn next_chunk(buf: &mut MultiBuf) -> &[u8];
    }

    unsafe extern "C++" {
        include!("cxx-demo/include/blobstore.h");

        type BlobstoreClient;

        fn new_blobstore_client() -> UniquePtr<BlobstoreClient>;
        fn put(&self, parts: &mut MultiBuf) -> u64;
    }
}

pub struct MultiBuf {
    chunks: Vec<Vec<u8>>,
    pos: usize,
}
pub fn next_chunk(buf: &mut MultiBuf) -> &[u8] {
    let next = buf.chunks.get(buf.pos);
    buf.pos += 1;
    next.map_or(&[], Vec::as_slice)
}

fn main() {
    let client = ffi::new_blobstore_client();

    // Upload a blob.
    let chunks = vec![b"fearless".to_vec(), b"concurrency".to_vec()];
    let mut buf = MultiBuf { chunks, pos: 0 };
    let blobid = client.put(&mut buf);
    println!("blobid = {}", blobid);
}

cxx-demo$  cargo run
  Compiling cxx-demo v0.1.0
  Finished dev [unoptimized + debuginfo] target(s) in 0.41s
  Running `target/debug/cxx-demo`

blobid = 9851996977040795552
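The circular-buffer case mentioned earlier can be tried with the standard library's VecDeque, whose as_slices method exposes the contents as up to two contiguous pieces. The sketch below is one possible way to feed such a snapshot into the toy MultiBuf; snapshot is a hypothetical helper, and it copies the bytes because the toy MultiBuf owns its chunks (a real implementation might borrow them instead).

```rust
use std::collections::VecDeque;

/// Same toy type as in the tutorial's src/main.rs.
pub struct MultiBuf {
    pub chunks: Vec<Vec<u8>>,
    pub pos: usize,
}

/// Hypothetical helper: snapshot a ring buffer as a MultiBuf made of its
/// (at most two) contiguous pieces.
pub fn snapshot(ring: &VecDeque<u8>) -> MultiBuf {
    let (front, back) = ring.as_slices();
    let chunks = [front, back]
        .iter()
        // Skip empty pieces: next_chunk uses an empty slice to signal the end.
        .filter(|piece| !piece.is_empty())
        .map(|piece| piece.to_vec())
        .collect();
    MultiBuf { chunks, pos: 0 }
}
```

Whether the snapshot ends up with one chunk or two depends on where the ring's contents happen to wrap in memory; either way the chunks concatenate to the ring's logical contents.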

Interlude: What gets generated?

For the curious, it's easy to look behind the scenes at what CXX has done to make these function calls work. You shouldn't need to do this during normal usage of CXX, but for the purpose of this tutorial it can be educational.

CXX comprises two code generators: a Rust one (which is the cxx::bridge attribute procedural macro) and a C++ one.

Rust generated code

It's easiest to view the output of the procedural macro by installing cargo-expand. Then run cargo expand ::ffi to macro-expand the mod ffi module.

cxx-demo$  cargo install cargo-expand
cxx-demo$  cargo expand ::ffi

You'll see some deeply unpleasant code involving #[repr(C)], #[link_name], and #[export_name].

C++ generated code

For debugging convenience, cxx_build symlinks all generated C++ code into Cargo's target directory under target/cxxbridge/.

cxx-demo$  exa -T target/cxxbridge/
target/cxxbridge
├── cxx-demo
│  └── src
│     ├── main.rs.cc -> ../../../debug/build/cxx-demo-11c6f678ce5c3437/out/cxxbridge/sources/cxx-demo/src/main.rs.cc
│     └── main.rs.h -> ../../../debug/build/cxx-demo-11c6f678ce5c3437/out/cxxbridge/include/cxx-demo/src/main.rs.h
└── rust
   └── cxx.h -> ~/.cargo/registry/src/github.com-1ecc6299db9ec823/cxx-1.0.0/include/cxx.h

In those files you'll see declarations or templates of any CXX Rust types present in your language boundary (like rust::Slice<T> for &[T]) and extern "C" signatures corresponding to your extern functions.

If it fits your workflow better, the CXX C++ code generator is also available as a standalone executable which outputs generated code to stdout.

cxx-demo$  cargo install cxxbridge-cmd
cxx-demo$  cxxbridge src/main.rs

Shared data structures

So far the calls in both directions above only used opaque types, not shared structs.

Shared structs are data structures whose complete definition is visible to both languages, making it possible to pass them by value across the language boundary. Shared structs translate to a C++ aggregate-initialization compatible struct exactly matching the layout of the Rust one.
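To illustrate what "exactly matching the layout" means, here is a hand-written sketch using Rust's #[repr(C)], which lays fields out the way a C or C++ compiler would. Sample and the helper functions are hypothetical and unrelated to BlobMetadata; this is not the code CXX generates, only an illustration of a layout that both languages can agree on.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical struct, laid out like the C++ declaration
/// `struct Sample { uint32_t a; uint8_t b; uint16_t c; };`.
#[repr(C)]
pub struct Sample {
    pub a: u32,
    pub b: u8,
    pub c: u16,
}

/// Size and alignment of `Sample` in bytes.
pub fn sample_layout() -> (usize, usize) {
    (size_of::<Sample>(), align_of::<Sample>())
}

/// Byte offset of each field from the start of the struct.
pub fn field_offsets(s: &Sample) -> [usize; 3] {
    let base = s as *const Sample as usize;
    [
        &s.a as *const u32 as usize - base,
        &s.b as *const u8 as usize - base,
        &s.c as *const u16 as usize - base,
    ]
}
```

With #[repr(C)], one padding byte follows b so that c starts on a 2-byte boundary, which is the same placement a C++ compiler would choose for the equivalent struct.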

As the last step of this demo, we'll use a shared struct BlobMetadata to pass metadata about blobs between our Rust application and C++ blobstore client.

// src/main.rs

#[cxx::bridge]
mod ffi {
    struct BlobMetadata {
        size: usize,
        tags: Vec<String>,
    }

    extern "Rust" {
        // ...
        type MultiBuf;

        fn next_chunk(buf: &mut MultiBuf) -> &[u8];
    }

    unsafe extern "C++" {
        // ...
        include!("cxx-demo/include/blobstore.h");

        type BlobstoreClient;

        fn new_blobstore_client() -> UniquePtr<BlobstoreClient>;
        fn put(&self, parts: &mut MultiBuf) -> u64;
        fn tag(&self, blobid: u64, tag: &str);
        fn metadata(&self, blobid: u64) -> BlobMetadata;
    }
}

pub struct MultiBuf {
    chunks: Vec<Vec<u8>>,
    pos: usize,
}
pub fn next_chunk(buf: &mut MultiBuf) -> &[u8] {
    let next = buf.chunks.get(buf.pos);
    buf.pos += 1;
    next.map_or(&[], Vec::as_slice)
}

fn main() {
    let client = ffi::new_blobstore_client();

    // Upload a blob.
    let chunks = vec![b"fearless".to_vec(), b"concurrency".to_vec()];
    let mut buf = MultiBuf { chunks, pos: 0 };
    let blobid = client.put(&mut buf);
    println!("blobid = {}", blobid);

    // Add a tag.
    client.tag(blobid, "rust");

    // Read back the tags.
    let metadata = client.metadata(blobid);
    println!("tags = {:?}", metadata.tags);
}

// include/blobstore.h

#pragma once
#include "rust/cxx.h"
#include <memory>

struct MultiBuf;
struct BlobMetadata;

class BlobstoreClient {
public:
  BlobstoreClient();
  uint64_t put(MultiBuf &buf) const;
  void tag(uint64_t blobid, rust::Str tag) const;
  BlobMetadata metadata(uint64_t blobid) const;

private:
  class impl;
  std::shared_ptr<impl> impl;
};

std::unique_ptr<BlobstoreClient> new_blobstore_client();

// src/blobstore.cc

#include "cxx-demo/include/blobstore.h"
#include "cxx-demo/src/main.rs.h"
#include <algorithm>
#include <functional>
#include <set>
#include <string>
#include <unordered_map>

// Toy implementation of an in-memory blobstore.
//
// In reality the implementation of BlobstoreClient could be a large
// complex C++ library.
class BlobstoreClient::impl {
  friend BlobstoreClient;
  using Blob = struct {
    std::string data;
    std::set<std::string> tags;
  };
  std::unordered_map<uint64_t, Blob> blobs;
};

BlobstoreClient::BlobstoreClient() : impl(new class BlobstoreClient::impl) {}

// Upload a new blob and return a blobid that serves as a handle to the blob.
uint64_t BlobstoreClient::put(MultiBuf &buf) const {
  // Traverse the caller's chunk iterator.
  std::string contents;
  while (true) {
    auto chunk = next_chunk(buf);
    if (chunk.size() == 0) {
      break;
    }
    contents.append(reinterpret_cast<const char *>(chunk.data()), chunk.size());
  }

  // Insert into map and provide caller the handle.
  auto blobid = std::hash<std::string>{}(contents);
  impl->blobs[blobid] = {std::move(contents), {}};
  return blobid;
}

// Add tag to an existing blob.
void BlobstoreClient::tag(uint64_t blobid, rust::Str tag) const {
  impl->blobs[blobid].tags.emplace(tag);
}

// Retrieve metadata about a blob.
BlobMetadata BlobstoreClient::metadata(uint64_t blobid) const {
  BlobMetadata metadata{};
  auto blob = impl->blobs.find(blobid);
  if (blob != impl->blobs.end()) {
    metadata.size = blob->second.data.size();
    std::for_each(blob->second.tags.cbegin(), blob->second.tags.cend(),
                  [&](auto &t) { metadata.tags.emplace_back(t); });
  }
  return metadata;
}

std::unique_ptr<BlobstoreClient> new_blobstore_client() {
  return std::make_unique<BlobstoreClient>();
}

cxx-demo$  cargo run
  Running `target/debug/cxx-demo`

blobid = 9851996977040795552
tags = ["rust"]
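For readers more at home in Rust than C++, the toy in-memory store above can be modeled in plain Rust. This is a rough sketch, not a translation of the demo: ToyStore and its methods are hypothetical names, the caller supplies the blob id instead of hashing the contents, and metadata returns a (size, tags) tuple in place of BlobMetadata. It mirrors three behaviors of the C++ code: put replaces any existing entry, tag creates an empty blob if the id is unknown (as operator[] does), and tags come back deduplicated in sorted order (as with std::set).

```rust
use std::collections::{BTreeSet, HashMap};

/// One stored blob: its bytes plus a sorted, duplicate-free set of tags.
#[derive(Default)]
struct Blob {
    data: Vec<u8>,
    tags: BTreeSet<String>,
}

/// Hypothetical Rust model of the toy in-memory blobstore.
#[derive(Default)]
pub struct ToyStore {
    blobs: HashMap<u64, Blob>,
}

impl ToyStore {
    /// Store a blob under the given id, replacing any previous entry.
    pub fn put(&mut self, blobid: u64, data: Vec<u8>) {
        self.blobs.insert(blobid, Blob { data, tags: BTreeSet::new() });
    }

    /// Add a tag; an unknown id gets an empty blob, like C++ `operator[]`.
    pub fn tag(&mut self, blobid: u64, tag: &str) {
        self.blobs.entry(blobid).or_default().tags.insert(tag.to_owned());
    }

    /// Return (size, tags); an unknown id yields zero size and no tags.
    pub fn metadata(&self, blobid: u64) -> (usize, Vec<String>) {
        match self.blobs.get(&blobid) {
            Some(blob) => (blob.data.len(), blob.tags.iter().cloned().collect()),
            None => (0, Vec::new()),
        }
    }
}
```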

You've now seen all the code involved in the tutorial. It's available in runnable form in the demo directory of https://github.com/dtolnay/cxx; to try it without working through the steps above, run cargo run from that directory.


Takeaways

The key contribution of CXX is Rust–C++ interop in which the Rust side of the code you write looks like normal Rust, and the C++ side looks like normal C++.

You've seen in this tutorial that none of the code involved feels like C or like the usual perilous "FFI glue" prone to leaks or memory safety flaws.

An expressive system of opaque types, shared types, and key standard library type bindings enables API design on the language boundary that captures the proper ownership and borrowing contracts of the interface.

CXX plays to the strengths of the Rust type system and C++ type system and the programmer's intuitions. An individual working on the C++ side without a Rust background, or the Rust side without a C++ background, will be able to apply all their usual intuitions and best practices about development in their language to maintain a correct FFI.