SupertonicTTS

This page explains how to use sherpa-onnx with SupertonicTTS.

Hint

Support for this model in sherpa-onnx was contributed by https://github.com/Wasser1462 in PR https://github.com/k2-fsa/sherpa-onnx/pull/3605.

SupertonicTTS 3 is an offline multi-speaker, multi-language TTS model supporting 31 languages. In a typical setup, you select a speaker with --sid and a language with --lang.

You can try it online at HuggingFace Spaces.

The following table lists the supported languages and links to their documentation, download instructions, and code examples.

Language      Documentation
Arabic        supertonic-3-ar
Bulgarian     supertonic-3-bg
Croatian      supertonic-3-hr
Czech         supertonic-3-cs
Danish        supertonic-3-da
Dutch         supertonic-3-nl
English       supertonic-3-en
Estonian      supertonic-3-et
Finnish       supertonic-3-fi
French        supertonic-3-fr
German        supertonic-3-de
Greek         supertonic-3-el
Hindi         supertonic-3-hi
Hungarian     supertonic-3-hu
Indonesian    supertonic-3-id
Italian       supertonic-3-it
Japanese      supertonic-3-ja
Korean        supertonic-3-ko
Latvian       supertonic-3-lv
Lithuanian    supertonic-3-lt
Polish        supertonic-3-pl
Portuguese    supertonic-3-pt
Romanian      supertonic-3-ro
Russian       supertonic-3-ru
Slovak        supertonic-3-sk
Slovenian     supertonic-3-sl
Spanish       supertonic-3-es
Swedish       supertonic-3-sv
Turkish       supertonic-3-tr
Ukrainian     supertonic-3-uk
Vietnamese    supertonic-3-vi

Download a pre-trained model

Download the released SupertonicTTS archive from https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models:

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/sherpa-onnx-supertonic-3-tts-int8-2026-05-11.tar.bz2
tar xf sherpa-onnx-supertonic-3-tts-int8-2026-05-11.tar.bz2
rm sherpa-onnx-supertonic-3-tts-int8-2026-05-11.tar.bz2
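
After extracting the archive, it can help to confirm that the directory contains every file that the command-line example on this page passes to sherpa-onnx-offline-tts. The following is a minimal sketch and not part of sherpa-onnx: the function name check_supertonic_files is hypothetical, and the file list is copied from the flags used in that example, so adjust it if a different release ships different file names.

```shell
# Hypothetical helper (not shipped with sherpa-onnx): report which of the
# model files used by the command-line example are missing from a directory.
# The file names are taken from the flags in that example.
check_supertonic_files() {
  model_dir=$1
  missing=0
  for f in \
    duration_predictor.int8.onnx \
    text_encoder.int8.onnx \
    vector_estimator.int8.onnx \
    vocoder.int8.onnx \
    tts.json \
    unicode_indexer.bin \
    voice.bin; do
    if [ ! -f "$model_dir/$f" ]; then
      # Print one line per missing file so the caller can see what to fix.
      echo "missing: $f"
      missing=1
    fi
  done
  # Exit status 0 means every file was found, 1 means at least one is missing.
  return "$missing"
}
```

For example, check_supertonic_files ./sherpa-onnx-supertonic-3-tts-int8-2026-05-11 prints nothing and returns 0 when all seven files are present.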

Run a command-line example

The following command uses the same model files as rust-api-examples/examples/supertonic_tts.rs:

./build/bin/sherpa-onnx-offline-tts \
  --supertonic-duration-predictor=./sherpa-onnx-supertonic-3-tts-int8-2026-05-11/duration_predictor.int8.onnx \
  --supertonic-text-encoder=./sherpa-onnx-supertonic-3-tts-int8-2026-05-11/text_encoder.int8.onnx \
  --supertonic-vector-estimator=./sherpa-onnx-supertonic-3-tts-int8-2026-05-11/vector_estimator.int8.onnx \
  --supertonic-vocoder=./sherpa-onnx-supertonic-3-tts-int8-2026-05-11/vocoder.int8.onnx \
  --supertonic-tts-json=./sherpa-onnx-supertonic-3-tts-int8-2026-05-11/tts.json \
  --supertonic-unicode-indexer=./sherpa-onnx-supertonic-3-tts-int8-2026-05-11/unicode_indexer.bin \
  --supertonic-voice-style=./sherpa-onnx-supertonic-3-tts-int8-2026-05-11/voice.bin \
  --sid=0 \
  --lang=en \
  --output-filename=./supertonic.wav \
  "Today as always, men fall into two groups: slaves and free men."

You can change --lang to any of the 31 supported language codes (e.g., ja for Japanese, ko for Korean, fr for French).
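
Because only --lang, --sid, the output file, and the input text change between runs, the long command can be wrapped in a small shell function. This is a sketch under stated assumptions, not a script from the sherpa-onnx repository: the function name supertonic_say and the MODEL_DIR, TTS_BIN, and DRY_RUN variables are hypothetical, while every flag is copied from the command above.

```shell
# Hypothetical wrapper (not part of sherpa-onnx). MODEL_DIR and TTS_BIN
# default to the paths used in the command-line example on this page.
MODEL_DIR=${MODEL_DIR:-./sherpa-onnx-supertonic-3-tts-int8-2026-05-11}
TTS_BIN=${TTS_BIN:-./build/bin/sherpa-onnx-offline-tts}

# Usage: supertonic_say <lang> <sid> <text> <output.wav>
supertonic_say() {
  lang=$1
  sid=$2
  text=$3
  out=$4
  # Rebuild the positional parameters as the full command line, so that
  # arguments containing spaces (such as the text) stay intact.
  set -- "$TTS_BIN" \
    "--supertonic-duration-predictor=$MODEL_DIR/duration_predictor.int8.onnx" \
    "--supertonic-text-encoder=$MODEL_DIR/text_encoder.int8.onnx" \
    "--supertonic-vector-estimator=$MODEL_DIR/vector_estimator.int8.onnx" \
    "--supertonic-vocoder=$MODEL_DIR/vocoder.int8.onnx" \
    "--supertonic-tts-json=$MODEL_DIR/tts.json" \
    "--supertonic-unicode-indexer=$MODEL_DIR/unicode_indexer.bin" \
    "--supertonic-voice-style=$MODEL_DIR/voice.bin" \
    "--sid=$sid" \
    "--lang=$lang" \
    "--output-filename=$out" \
    "$text"
  if [ -n "${DRY_RUN:-}" ]; then
    # Dry run: print one argument per line instead of running the binary.
    printf '%s\n' "$@"
  else
    "$@"
  fi
}
```

With this in place, a French run would look like supertonic_say fr 0 "Bonjour tout le monde." ./supertonic-fr.wav. Setting DRY_RUN=1 first prints the arguments without invoking the binary, which is useful for checking paths before synthesis.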

You can also use this tracked helper script:

API examples

Additional example code is available here:

Notes

  • Use --sid to choose a speaker.

  • Use --lang to select the synthesis language (e.g., en, ja, ko, fr).

  • In addition to the four ONNX files, the model directory includes tts.json, unicode_indexer.bin, and voice.bin; the command-line example passes all of them.

See also