🐍 Python API Reference
Piper provides a clean and simple Python API for integrating text-to-speech capabilities directly into your applications.
Prerequisites
Ensure you have installed Piper and downloaded a voice model:
pip install piper-tts
python3 -m piper.download_voices en_US-lessac-medium
Basic Synthesis
The primary class for interacting with Piper is PiperVoice. You can load a voice model and synthesize text to a WAV file with just a few lines of code:
import wave
from piper import PiperVoice
# Load the voice model
voice = PiperVoice.load("./en_US-lessac-medium.onnx")
# Synthesize text to a WAV file
with wave.open("output.wav", "wb") as wav_file:
    voice.synthesize_wav("Welcome to the world of speech synthesis!", wav_file)

print("Synthesized audio saved to output.wav")
Adjusting Synthesis Parameters
You can customize the synthesis output by providing a SynthesisConfig object to the synthesize_wav method. This allows you to control speed, volume, and variability:
from piper import PiperVoice, SynthesisConfig
import wave
voice = PiperVoice.load("./en_US-lessac-medium.onnx")
# Configure synthesis options
syn_config = SynthesisConfig(
    volume=0.5,            # half as loud
    length_scale=1.5,      # 50% slower
    noise_scale=0.667,     # amount of audio variation
    noise_w_scale=0.8,     # amount of speaking variation
    normalize_audio=True,  # automatically normalize volume
)
with wave.open("configured_output.wav", "wb") as wav_file:
    voice.synthesize_wav(
        "This is a customized voice.",
        wav_file,
        syn_config=syn_config,
    )
GPU Acceleration (CUDA)
To use a CUDA-enabled GPU for faster inference, you must first install the onnxruntime-gpu package:
pip install onnxruntime-gpu
Then, set the use_cuda flag to True when loading the voice:
voice = PiperVoice.load("./en_US-lessac-medium.onnx", use_cuda=True)
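To check whether inference is actually running on the GPU, one option is to inspect the active onnxruntime execution providers. This is a minimal sketch that assumes PiperVoice exposes its underlying InferenceSession as voice.session; that is an internal detail and may differ between Piper versions.

from piper import PiperVoice

voice = PiperVoice.load("./en_US-lessac-medium.onnx", use_cuda=True)

# ASSUMPTION: `voice.session` is the underlying onnxruntime.InferenceSession.
# This is not part of the documented API and may change between versions.
print(voice.session.get_providers())
# "CUDAExecutionProvider" should appear in the list when the GPU is in use.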
Streaming Audio
For real-time applications, you can stream audio by iterating over the chunks produced by the synthesize method. Each AudioChunk object contains raw 16-bit integer audio bytes.
from piper import PiperVoice
voice = PiperVoice.load("./en_US-lessac-medium.onnx")
# The synthesize method returns an iterator of AudioChunk objects
for chunk in voice.synthesize("This audio is being streamed chunk by chunk."):
    # chunk.audio_int16_bytes contains the raw audio data.
    # You can write it to a file, play it, or send it over a network.
    print(f"Received audio chunk of {len(chunk.audio_int16_bytes)} bytes.")
    # Example: my_audio_player.write(chunk.audio_int16_bytes)
Each AudioChunk also provides metadata:

- chunk.sample_rate (e.g., 22050)
- chunk.sample_width (bytes per sample, e.g., 2 for 16-bit)
- chunk.num_channels (e.g., 1 for mono)
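Putting these pieces together, here is a minimal sketch that streams synthesis output straight into a WAV file, using the first chunk's metadata to configure the WAV header. It relies only on the attributes documented above and the standard-library wave module.

import wave
from piper import PiperVoice

voice = PiperVoice.load("./en_US-lessac-medium.onnx")

with wave.open("streamed_output.wav", "wb") as wav_file:
    wav_params_set = False
    for chunk in voice.synthesize("Streaming straight into a WAV file."):
        if not wav_params_set:
            # WAV parameters must be set before the first writeframes() call.
            wav_file.setframerate(chunk.sample_rate)
            wav_file.setsampwidth(chunk.sample_width)
            wav_file.setnchannels(chunk.num_channels)
            wav_params_set = True
        wav_file.writeframes(chunk.audio_int16_bytes)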