🐍 Python API Reference

Piper provides a clean and simple Python API for integrating text-to-speech capabilities directly into your applications.

Prerequisites

Ensure you have installed Piper and downloaded a voice model.

pip install piper-tts
python3 -m piper.download_voices en_US-lessac-medium
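
Before loading a voice, it can help to confirm that both files that make up a Piper voice (the .onnx model and its .onnx.json config) are present. This is a minimal sketch, assuming the download command above wrote them to the current working directory:

from pathlib import Path

# A Piper voice is a pair of files: the ONNX model and a JSON config next to it.
model_path = Path("en_US-lessac-medium.onnx")
config_path = Path("en_US-lessac-medium.onnx.json")

if not (model_path.exists() and config_path.exists()):
    raise FileNotFoundError("Voice files not found; re-run the download command above.")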

Basic Synthesis

The primary class for interacting with Piper is PiperVoice. You can load a voice model and synthesize text to a WAV file with just a few lines of code.

import wave
from piper import PiperVoice

# Load the voice model
voice = PiperVoice.load("./en_US-lessac-medium.onnx")

# Synthesize text to a WAV file
with wave.open("output.wav", "wb") as wav_file:
    voice.synthesize_wav("Welcome to the world of speech synthesis!", wav_file)

print("Synthesized audio saved to output.wav")

Adjusting Synthesis Parameters

You can customize the synthesis output by providing a SynthesisConfig object to the synthesize_wav method. This allows you to control speed, volume, and variability.

from piper import PiperVoice, SynthesisConfig
import wave

voice = PiperVoice.load("./en_US-lessac-medium.onnx")

# Configure synthesis options
syn_config = SynthesisConfig(
    volume=0.5,           # half as loud
    length_scale=1.5,     # 50% slower
    noise_scale=0.667,    # amount of audio variation
    noise_w_scale=0.8,    # amount of speaking variation
    normalize_audio=True  # automatically normalize volume
)

with wave.open("configured_output.wav", "wb") as wav_file:
    voice.synthesize_wav(
        "This is a customized voice.", 
        wav_file,
        syn_config=syn_config
    )

GPU Acceleration (CUDA)

To use a CUDA-enabled GPU for faster inference, you must first install the onnxruntime-gpu package:

pip install onnxruntime-gpu

Then, set the use_cuda flag to True when loading the voice:

voice = PiperVoice.load("./en_US-lessac-medium.onnx", use_cuda=True)
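
If you are unsure whether a CUDA runtime is actually available on the machine, one option is to ask onnxruntime which execution providers it can see and only enable the flag when CUDA is among them. This is a minimal sketch, assuming onnxruntime is importable in your environment (it is installed alongside Piper):

import onnxruntime
from piper import PiperVoice

# Only request CUDA if onnxruntime reports a CUDA execution provider.
use_cuda = "CUDAExecutionProvider" in onnxruntime.get_available_providers()
voice = PiperVoice.load("./en_US-lessac-medium.onnx", use_cuda=use_cuda)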

Streaming Audio

For real-time applications, you can stream audio by iterating over the chunks produced by the synthesize method. Each AudioChunk object exposes the raw audio as 16-bit PCM bytes through its audio_int16_bytes property.

from piper import PiperVoice

voice = PiperVoice.load("./en_US-lessac-medium.onnx")

# The synthesize method returns an iterator of AudioChunk objects
for chunk in voice.synthesize("This audio is being streamed chunk by chunk."):
    # chunk.audio_int16_bytes contains the raw audio data
    # You can write this to a file, play it, or send it over a network
    print(f"Received audio chunk of {len(chunk.audio_int16_bytes)} bytes.")
    # Example: my_audio_player.write(chunk.audio_int16_bytes)

Each AudioChunk also provides metadata:

  • chunk.sample_rate (e.g., 22050)
  • chunk.sample_width (bytes per sample, e.g., 2 for 16-bit)
  • chunk.sample_channels (e.g., 1 for mono)
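
As one way to put this metadata to use, the sketch below configures a WAV writer from the first chunk and appends the raw bytes as they arrive; it assumes the attribute names listed above:

import wave
from piper import PiperVoice

voice = PiperVoice.load("./en_US-lessac-medium.onnx")

with wave.open("streamed_output.wav", "wb") as wav_file:
    for i, chunk in enumerate(voice.synthesize("Streaming straight into a WAV file.")):
        if i == 0:
            # Configure the WAV header from the first chunk's metadata.
            wav_file.setframerate(chunk.sample_rate)
            wav_file.setsampwidth(chunk.sample_width)
            wav_file.setnchannels(chunk.sample_channels)
        # Append the raw 16-bit PCM audio for this chunk.
        wav_file.writeframes(chunk.audio_int16_bytes)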