🖥️ Command-Line Interface

The Piper command-line interface (piper) lets you quickly generate audio from text and experiment with different voices. It is convenient for one-off tasks but slow for repeated use, because every invocation has to load the voice model again. For high-throughput or continuous use, the HTTP API is recommended.

Running Piper

After downloading a voice model (e.g., en_US-lessac-medium), you can synthesize speech with the following command:

python3 -m piper -m /path/to/en_US-lessac-medium.onnx -f output.wav -- 'This is a test.'

  • The -m <MODEL> argument specifies the path to the .onnx voice model file.
  • The -f <FILE> argument specifies the output .wav file.
  • The text to be synthesized is passed as the final argument.
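
The same invocation can be driven from a script. The following is a minimal sketch, assuming that python3 on PATH has Piper installed; the helper names build_piper_command and synthesize_to_file are hypothetical and not part of Piper.

```python
import subprocess


def build_piper_command(model_path, output_file, text):
    """Return the argument list for the CLI call described above.

    Hypothetical helper for illustration; it is not part of Piper.
    """
    return [
        "python3", "-m", "piper",
        "-m", model_path,    # path to the .onnx voice model
        "-f", output_file,   # output .wav file
        "--", text,          # text to synthesize, passed last
    ]


def synthesize_to_file(model_path, output_file, text):
    """Run Piper once; the voice model is loaded again on every call."""
    subprocess.run(build_piper_command(model_path, output_file, text), check=True)
```

Passing the arguments as a list avoids shell quoting problems when the text contains apostrophes or other special characters.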

If your voices are located in a different directory, you can use --data-dir <DIR> to specify it. Piper will look for the required .onnx.json configuration file in that directory.
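
Since a voice needs both its model and its .onnx.json configuration file, a script may want to confirm that both are present before calling Piper. A minimal sketch, assuming the configuration file carries the model's file name with .json appended (for example en_US-lessac-medium.onnx.json); find_voice_files is a hypothetical helper, not part of Piper.

```python
from pathlib import Path


def find_voice_files(data_dir, voice_name):
    """Return (model, config) paths for a voice in data_dir, or None.

    Assumes the layout <voice>.onnx plus <voice>.onnx.json.
    Hypothetical helper for illustration; it is not part of Piper.
    """
    model = Path(data_dir) / f"{voice_name}.onnx"
    config = Path(data_dir) / f"{voice_name}.onnx.json"
    if model.is_file() and config.is_file():
        return model, config
    return None
```

A None result means at least one of the two files is missing from the directory that would be passed to --data-dir.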

Direct Audio Playback

If you have ffplay installed, you can omit the -f argument to hear the audio immediately:

python3 -m piper -m en_US-lessac-medium.onnx -- 'This will play on your speakers.'
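
A script can check for ffplay before relying on direct playback. A minimal sketch, assuming ffplay has to be discoverable on PATH; playback_args is a hypothetical helper, and falling back to a file named output.wav is an illustrative choice, not Piper behavior.

```python
import shutil


def playback_args(fallback_file="output.wav"):
    """Return extra CLI arguments depending on whether ffplay is on PATH.

    With ffplay available, -f is omitted so the audio plays immediately;
    otherwise the audio is written to fallback_file instead.
    Hypothetical helper for illustration; it is not part of Piper.
    """
    if shutil.which("ffplay") is not None:
        return []
    return ["-f", fallback_file]
```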

Command-Line Options

Here are some other useful command-line options:

  • --cuda: Enable GPU acceleration. This requires the onnxruntime-gpu package to be installed.
  • --input-file <FILE>: Read input text from one or more files. This option can be specified multiple times.
  • --sentence-silence <SECONDS>: Add a specified number of seconds of silence to the end of all but the last sentence.
  • --volume <MULTIPLIER>: Adjust the output volume. The default is 1.0. For example, 0.5 halves the volume.
  • --no-normalize: Disable the automatic volume normalization that is applied by default.
  • --output-raw: Output raw audio samples to stdout instead of playing them. Useful for piping to other applications.
  • --help: Show the full list of available commands and options.
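
When Piper is called from a script, the options above can be assembled from keyword settings. A minimal sketch that uses only the flags listed in this section; option_args and its parameter names are hypothetical and not part of Piper.

```python
def option_args(cuda=False, input_files=(), sentence_silence=None,
                volume=None, no_normalize=False, output_raw=False):
    """Translate keyword settings into the CLI flags listed above.

    Hypothetical helper for illustration; it is not part of Piper.
    """
    args = []
    if cuda:
        args.append("--cuda")            # requires onnxruntime-gpu
    for path in input_files:
        args += ["--input-file", path]   # the flag is repeated per file
    if sentence_silence is not None:
        args += ["--sentence-silence", str(sentence_silence)]
    if volume is not None:
        args += ["--volume", str(volume)]
    if no_normalize:
        args.append("--no-normalize")
    if output_raw:
        args.append("--output-raw")
    return args
```

The returned list is meant to be placed before the -- separator that precedes the text, for example after the -m <MODEL> argument.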

Injecting Raw Phonemes

You can inject raw espeak-ng phonemes directly into your text using [[ <phonemes> ]] blocks. This allows for fine-grained control over pronunciation.

For example:

I am the [[ bˈætmæn ]] not [[bɹˈuːs wˈe‍ɪn]]
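
For texts with many custom pronunciations, the [[ ... ]] blocks can be generated instead of typed by hand. A minimal sketch, assuming a case-sensitive, whole-word lookup table; inject_phonemes is a hypothetical helper, not part of Piper.

```python
import re


def inject_phonemes(text, pronunciations):
    """Replace whole words in text with [[ phonemes ]] blocks.

    pronunciations maps a word to its espeak-ng phoneme string.
    Hypothetical helper for illustration; it is not part of Piper.
    """
    if not pronunciations:
        return text
    # One pass over the text, so inserted phonemes are never re-matched.
    words = "|".join(re.escape(word) for word in pronunciations)
    pattern = re.compile(r"\b(" + words + r")\b")
    return pattern.sub(lambda m: f"[[ {pronunciations[m.group(1)]} ]]", text)
```

For example, inject_phonemes("I am the batman", {"batman": "bˈætmæn"}) returns "I am the [[ bˈætmæn ]]".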

To get the phonemes for a word, you can use the espeak-ng command-line tool:

espeak-ng -v <VOICE> --ipa=3 -q "<TEXT>"

For example, to get the phonemes for "batman" using the en-us voice:

espeak-ng -v en-us --ipa=3 -q "batman"
# Output: bˈætmæn
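
The lookup can also be scripted. A minimal sketch, assuming espeak-ng is installed and on PATH and that it prints the phonemes on stdout as UTF-8; espeak_ipa_command and phonemes_for are hypothetical helpers, not part of Piper or espeak-ng.

```python
import subprocess


def espeak_ipa_command(voice, text):
    """Return the espeak-ng argument list used in the example above."""
    return ["espeak-ng", "-v", voice, "--ipa=3", "-q", text]


def phonemes_for(voice, text):
    """Run espeak-ng and return its phoneme output, whitespace stripped.

    Requires espeak-ng on PATH; raises CalledProcessError on failure.
    Hypothetical helper for illustration.
    """
    result = subprocess.run(
        espeak_ipa_command(voice, text),
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout.strip()
```

The string returned by phonemes_for("en-us", "batman") can then be wrapped in a [[ ... ]] block.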