🔧 C/C++ API (libpiper)

For high-performance applications, Piper offers a shared library (libpiper) with a C-style API that can be used from C, C++, and other languages that support C bindings.

Building libpiper

The libpiper library is built using CMake. From the libpiper/ directory in the repository:

  1. Configure the build:

    cmake -Bbuild -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=$PWD/install

  2. Build the library:

    cmake --build build

  3. Install the library and headers:

    cmake --install build

The build automatically downloads and compiles espeak-ng and downloads the pre-compiled onnxruntime shared libraries. The final artifacts are placed in the libpiper/install directory.

To use libpiper in your project, you will need to:

  • Include the header file: install/include/piper.h
  • Link against the libpiper shared library: install/libpiper.so (or .dll/.dylib)
  • Link against the libonnxruntime shared library: install/lib/libonnxruntime.so
  • Ensure the espeak-ng-data directory (install/espeak-ng-data/) is available at runtime.

C++ Example

Here is a basic example of how to use the C API from C++:

#include <fstream>
#include "piper.h"

int main() {
    // Create the synthesizer
    piper_synthesizer *synth = piper_create("/path/to/voice.onnx",
                                            "/path/to/voice.onnx.json",
                                            "/path/to/espeak-ng-data");

    if (!synth) {
        // Handle error
        return 1;
    }

    // Open a file to write the raw audio samples
    std::ofstream audio_stream("output.raw", std::ios::binary);

    // Get and modify default synthesis options
    piper_synthesize_options options = piper_default_synthesize_options(synth);
    // options.length_scale = 1.5; // 50% slower
    // options.speaker_id = 5;

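    // Start synthesis and bail out if the request is rejected
    if (piper_synthesize_start(synth, "Welcome to the world of speech synthesis!", &options) != PIPER_OK) {
        piper_free(synth);
        return 1;
    }

    // Write each chunk as it arrives; the loop ends on PIPER_DONE or on an error code
    piper_audio_chunk chunk;
    while (piper_synthesize_next(synth, &chunk) == PIPER_OK) {
        audio_stream.write(reinterpret_cast<const char *>(chunk.samples),
                           chunk.num_samples * sizeof(float));
    }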

    // Free resources
    piper_free(synth);

    return 0;
}

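To play the output file, you can use a tool like aplay. The file contains headerless mono float samples, so the format must be given on the command line, and the rate passed to -r should match the voice's sample rate (22050 Hz in this example):

    aplay -r 22050 -c 1 -f FLOAT_LE -t raw output.raw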
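Because output.raw has no header, every player needs the format spelled out. One alternative is to wrap the samples in a WAV container. The following is a minimal sketch, not part of libpiper: float_to_pcm16, put_le, and make_wav are hypothetical helper names, and the code assumes mono audio with float samples nominally in the range [-1, 1].

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: map one float sample to signed 16-bit PCM.
// Assumes samples are nominally in [-1, 1]; values outside are clamped.
inline std::int16_t float_to_pcm16(float sample) {
    const float clamped = std::max(-1.0f, std::min(1.0f, sample));
    return static_cast<std::int16_t>(std::lround(clamped * 32767.0f));
}

// Append the low `num_bytes` bytes of `value` in little-endian order.
inline void put_le(std::string &out, std::uint32_t value, int num_bytes) {
    for (int i = 0; i < num_bytes; ++i) {
        out.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
    }
}

// Hypothetical helper: build a mono 16-bit PCM WAV file in memory.
inline std::string make_wav(const std::vector<float> &samples,
                            std::uint32_t sample_rate) {
    assert(sample_rate > 0);
    const std::uint32_t data_bytes =
        static_cast<std::uint32_t>(samples.size() * 2);
    std::string wav;
    wav += "RIFF";
    put_le(wav, 36 + data_bytes, 4);  // size of everything after this field
    wav += "WAVEfmt ";
    put_le(wav, 16, 4);               // fmt chunk size
    put_le(wav, 1, 2);                // format tag 1 = integer PCM
    put_le(wav, 1, 2);                // one channel (mono)
    put_le(wav, sample_rate, 4);
    put_le(wav, sample_rate * 2, 4);  // bytes per second
    put_le(wav, 2, 2);                // bytes per sample frame
    put_le(wav, 16, 2);               // bits per sample
    wav += "data";
    put_le(wav, data_bytes, 4);
    for (float s : samples) {
        put_le(wav, static_cast<std::uint16_t>(float_to_pcm16(s)), 2);
    }
    return wav;
}
```

Writing the returned string to a file opened with std::ios::binary should give a file that common players open without extra format flags. The sample rate would come from chunk.sample_rate.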

C API Reference

This section details the functions and structs exposed by piper.h.

Structs

piper_synthesizer

An opaque struct representing the text-to-speech synthesizer instance.

piper_audio_chunk

Contains a chunk of synthesized audio and associated metadata.

  • const float *samples: Raw floating-point audio samples.
  • size_t num_samples: The number of samples in the chunk.
  • int sample_rate: Sample rate in Hertz (e.g., 22050).
  • bool is_last: True if this is the final audio chunk for the synthesis request.
  • const char32_t *phonemes: Phoneme codepoints. See the Alignments documentation for details.
  • size_t num_phonemes: Number of phoneme codepoints.
  • const int *phoneme_ids: Phoneme IDs used by the model.
  • size_t num_phoneme_ids: Number of phoneme IDs.
  • const int *alignments: Audio sample count for each phoneme ID. Requires a patched model.
  • size_t num_alignments: Number of alignments.
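Since alignments holds an audio sample count per phoneme ID, a running sum divided by the sample rate gives approximate timestamps. The sketch below is illustrative only: PhonemeSpan and alignment_spans are hypothetical names, and pairing phoneme_ids[i] with alignments[i] index by index is an assumption based on the field descriptions above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record: the time range covered by one phoneme ID.
struct PhonemeSpan {
    int phoneme_id;
    double start_sec;
    double end_sec;
};

// Turn per-ID sample counts into cumulative start/end times in seconds.
// The pointer/count pairs correspond to the phoneme_ids and alignments
// members of piper_audio_chunk.
inline std::vector<PhonemeSpan> alignment_spans(const int *phoneme_ids,
                                                std::size_t num_phoneme_ids,
                                                const int *alignments,
                                                std::size_t num_alignments,
                                                int sample_rate) {
    assert(sample_rate > 0);
    // Only walk the entries that both arrays have (assumption: same order).
    const std::size_t count = std::min(num_phoneme_ids, num_alignments);
    std::vector<PhonemeSpan> spans;
    spans.reserve(count);
    long long elapsed_samples = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const double start = static_cast<double>(elapsed_samples) / sample_rate;
        elapsed_samples += alignments[i];
        const double end = static_cast<double>(elapsed_samples) / sample_rate;
        spans.push_back({phoneme_ids[i], start, end});
    }
    return spans;
}
```

The times are relative to the start of the chunk. If the model is not patched to emit alignments, num_alignments would presumably be zero and the function returns an empty vector.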

piper_synthesize_options

Configuration for a synthesis request.

  • int speaker_id: ID of the speaker for multi-speaker models (0 for the first speaker).
  • float length_scale: Speaking speed; larger values produce slower speech, e.g. 1.5 is 50% slower (default: 1.0).
  • float noise_scale: Audio variability (e.g., 0.667).
  • float noise_w_scale: Phoneme length variability (e.g., 0.8).

Functions

piper_synthesizer *piper_create(const char *model_path, const char *config_path, const char *espeak_data_path)

Creates and initializes a synthesizer. Returns NULL on failure.

  • model_path: Path to the .onnx voice model.
  • config_path: Path to the .onnx.json config file. If NULL, the path is assumed to be model_path with ".json" appended.
  • espeak_data_path: Path to the espeak-ng-data directory.

void piper_free(piper_synthesizer *synth)

Frees all resources associated with a synthesizer.

piper_synthesize_options piper_default_synthesize_options(piper_synthesizer *synth)

Returns the default synthesis options for a given voice model.

int piper_synthesize_start(piper_synthesizer *synth, const char *text, const piper_synthesize_options *options)

Begins the synthesis process for the given text. Call piper_synthesize_next to retrieve audio chunks. Returns PIPER_OK on success.

  • text: The UTF-8 encoded text to synthesize.
  • options: Synthesis options. If NULL, default options are used.

int piper_synthesize_next(piper_synthesizer *synth, piper_audio_chunk *chunk)

Retrieves the next chunk of synthesized audio. The pointers in chunk remain valid only until the next call to this function, so copy any data you need to keep. Returns PIPER_OK if a chunk is available, PIPER_DONE if synthesis is complete, or an error code.
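Because chunk memory is reused between calls, code that needs the whole utterance in memory has to copy each chunk before asking for the next one. A minimal sketch of that copy step follows; append_samples is a hypothetical helper, not part of piper.h, and its two input parameters stand for the samples and num_samples members of piper_audio_chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy one chunk's samples onto the end of a caller-owned buffer so the
// data survives the next piper_synthesize_next call.
inline void append_samples(std::vector<float> &buffer,
                           const float *samples,
                           std::size_t num_samples) {
    if (samples == nullptr || num_samples == 0) {
        return;  // nothing to copy
    }
    buffer.insert(buffer.end(), samples, samples + num_samples);
}
```

Inside the retrieval loop this would be called as append_samples(all_samples, chunk.samples, chunk.num_samples), where all_samples is a std::vector<float> declared before the loop.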