Models for QNN
We assume you have already read Run executables on your phone with adb (using model.bin) and are familiar with setting up the environment.
This section lists the models supported by QNN in sherpa-onnx. It only shows the usage of models from the asr-models-qnn-binary GitHub release.
Caution
I am using a Xiaomi 17 Pro for testing, so the models in this section have SM8850 in their names.
Make sure to select a model that matches your own device. For instance, if you are testing on a Samsung Galaxy S23 Ultra, which uses the SM8550 SoC, you should select a model with SM8550 in its name instead of SM8850.
Since QNN does not support dynamic input shapes, we limit the maximum duration a model can handle. For example, if the limit is 10 seconds, any input shorter than 10 seconds is padded to 10 seconds, and any input longer than 10 seconds is truncated to that length.
The model name indicates the maximum duration the model can handle. We use 5-seconds models in this section as an example.
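The fixed-shape behavior can be sketched as follows. This is only an illustration of the padding/truncation rule, not sherpa-onnx's actual implementation; `fix_length` is a made-up name:

```python
def fix_length(samples, max_seconds, sample_rate=16000):
    """Zero-pad or truncate a list of audio samples to exactly max_seconds."""
    target = int(max_seconds * sample_rate)
    if len(samples) >= target:
        return samples[:target]                        # too long: truncate
    return samples + [0.0] * (target - len(samples))  # too short: zero-pad

# With a 5-second limit at 16 kHz, every input becomes 80000 samples:
print(len(fix_length([0.1] * 48000, 5)))   # 3-second input, padded -> 80000
print(len(fix_length([0.1] * 160000, 5)))  # 10-second input, truncated -> 80000
```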
sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8 (Chinese)
This model is converted from sherpa-onnx-zipformer-ctc-zh-int8-2025-12-22.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models-qnn-binary/sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8.tar.bz2
tar xvf sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8.tar.bz2
rm sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8.tar.bz2
Now copy the extracted files to your Qualcomm device. Make sure you have read Run executables on your phone with adb (using model.bin)
to copy the required *.so files from QNN SDK 2.40.0.251030 and to set up the environment variable ADSP_LIBRARY_PATH.
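Assuming adb is already set up as described in that section, copying the extracted model directory might look like the sketch below (adjust paths to your setup):

```shell
# push the extracted model directory to the device
adb push sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8 /data/local/tmp/

# verify the files arrived on the device
adb shell ls /data/local/tmp
```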
pandora:/data/local/tmp $ ls -lh
total 104K
-rw-rw-rw- 1 shell shell 2.3M 2025-11-20 11:05 libQnnHtp.so
-rw-rw-rw- 1 shell shell 71M 2025-11-20 11:05 libQnnHtpPrepare.so
-rw-rw-rw- 1 shell shell 10M 2025-11-20 11:10 libQnnHtpV81Skel.so
-rw-rw-rw- 1 shell shell 618K 2025-11-20 11:06 libQnnHtpV81Stub.so
-rw-rw-rw- 1 shell shell 2.4M 2025-11-20 11:06 libQnnSystem.so
-rw-rw-rw- 1 shell shell 15M 2025-12-10 11:43 libonnxruntime.so
-rwxrwxrwx 1 shell shell 2.1M 2025-12-22 17:31 sherpa-onnx-offline
drwxrwxr-x 3 shell shell 3.3K 2025-12-22 17:45 sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8
pandora:/data/local/tmp $ ls -lh sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8
total 351K
-rw-rw-rw- 1 shell shell 15 2025-12-22 17:29 info.txt
-rw-rw-rw- 1 shell shell 351M 2025-12-22 17:29 model.bin
drwxrwxr-x 2 shell shell 3.3K 2025-12-22 17:45 test_wavs
-rw-rw-rw- 1 shell shell 13K 2025-12-22 17:29 tokens.txt
pandora:/data/local/tmp $ export ADSP_LIBRARY_PATH="$PWD;$ADSP_LIBRARY_PATH"
pandora:/data/local/tmp $
Run it on your device:
./sherpa-onnx-offline \
--provider=qnn \
--tokens=./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8/tokens.txt \
--zipformer-ctc.qnn-backend-lib=./libQnnHtp.so \
--zipformer-ctc.qnn-system-lib=./libQnnSystem.so \
--zipformer-ctc.qnn-context-binary=./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8/model.bin \
./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8/test_wavs/0.wav
or write it in a single line:
./sherpa-onnx-offline --provider=qnn --tokens=./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8/tokens.txt --zipformer-ctc.qnn-backend-lib=./libQnnHtp.so --zipformer-ctc.qnn-system-lib=./libQnnSystem.so --zipformer-ctc.qnn-context-binary=./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8/model.bin ./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8/test_wavs/0.wav
You can find the output log below:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./sherpa-onnx-offline --provider=qnn --tokens=./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8/tokens.txt --zipformer-ctc.qnn-backend-lib=./libQnnHtp.so --zipformer-ctc.qnn-system-lib=./libQnnSystem.so --zipformer-ctc.qnn-context-binary=./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8/model.bin ./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8/test_wavs/0.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model="", qnn_config=QnnConfig(backend_lib="./libQnnHtp.so", context_binary="./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8/model.bin", system_lib="./libQnnSystem.so"), ), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8/tokens.txt", num_threads=2, debug=False, provider="qnn", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/qnn/utils.cc:CopyGraphsInfo:465 version: 3
recognizer created in 0.583 s
Started
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/qnn/offline-zipformer-ctc-model-qnn.cc:Run:89 Number of input frames 561 is too large. Truncate it to 500 frames.
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/qnn/offline-zipformer-ctc-model-qnn.cc:Run:93 Recognition result may be truncated/incomplete. Please select a model accepting longer audios.
Done!
./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": "", "timestamps": [], "durations": [], "tokens":[], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.306 s
Real time factor (RTF): 0.306 / 5.611 = 0.055
0.0ms [WARN ] QnnDsp <W> Initializing HtpProvider
0.0ms [WARN ] QnnDsp <W> m_CFBCallbackInfoObj is not initialized, return emptyList
0.0ms [WARN ] QnnDsp <W> m_CFBCallbackInfoObj is not initialized, return emptyList
Please ignore the num_threads value in the log; it is not used by QNN.
Hint
The model actually processed only 5 seconds of audio, not 5.611 seconds.
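The frame counts reported in the log can be reproduced with a little arithmetic, assuming the usual 10 ms feature-frame shift (100 frames per second) used by these models:

```python
frames_per_second = 100  # one feature frame every 10 ms

wave_seconds = 5.611     # duration of test_wavs/0.wav, from the RTF line in the log
limit_seconds = 5        # maximum duration baked into this model

input_frames = round(wave_seconds * frames_per_second)
max_frames = limit_seconds * frames_per_second
print(input_frames, max_frames)  # 561 500, matching the truncation warning
```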
sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8 (Chinese)
This model is converted from sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03 (Chinese).
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models-qnn-binary/sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8.tar.bz2
tar xvf sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8.tar.bz2
rm sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8.tar.bz2
Now copy the extracted files to your Qualcomm device. Make sure you have read Run executables on your phone with adb (using model.bin)
to copy the required *.so files from QNN SDK 2.40.0.251030 and to set up the environment variable ADSP_LIBRARY_PATH.
pandora:/data/local/tmp $ ls -lh
total 104K
-rw-rw-rw- 1 shell shell 2.3M 2025-11-20 11:05 libQnnHtp.so
-rw-rw-rw- 1 shell shell 71M 2025-11-20 11:05 libQnnHtpPrepare.so
-rw-rw-rw- 1 shell shell 10M 2025-11-20 11:10 libQnnHtpV81Skel.so
-rw-rw-rw- 1 shell shell 618K 2025-11-20 11:06 libQnnHtpV81Stub.so
-rw-rw-rw- 1 shell shell 2.4M 2025-11-20 11:06 libQnnSystem.so
-rw-rw-rw- 1 shell shell 15M 2025-12-10 11:43 libonnxruntime.so
-rwxrwxrwx 1 shell shell 2.1M 2025-12-10 11:42 sherpa-onnx-offline
drwxrwxr-x 3 shell shell 3.3K 2025-12-22 17:19 sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8
pandora:/data/local/tmp $ ls -lh sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8
total 72K
-rw-rw-rw- 1 shell shell 15 2025-12-22 17:15 info.txt
-rw-rw-rw- 1 shell shell 72M 2025-12-22 17:15 model.bin
drwxrwxr-x 2 shell shell 3.3K 2025-12-22 17:19 test_wavs
-rw-rw-rw- 1 shell shell 13K 2025-12-22 17:15 tokens.txt
pandora:/data/local/tmp $ export ADSP_LIBRARY_PATH="$PWD;$ADSP_LIBRARY_PATH"
pandora:/data/local/tmp $
Run it on your device:
./sherpa-onnx-offline \
--provider=qnn \
--tokens=./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8/tokens.txt \
--zipformer-ctc.qnn-backend-lib=./libQnnHtp.so \
--zipformer-ctc.qnn-system-lib=./libQnnSystem.so \
--zipformer-ctc.qnn-context-binary=./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8/model.bin \
./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8/test_wavs/0.wav
or write it in a single line:
./sherpa-onnx-offline --provider=qnn --tokens=./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8/tokens.txt --zipformer-ctc.qnn-backend-lib=./libQnnHtp.so --zipformer-ctc.qnn-system-lib=./libQnnSystem.so --zipformer-ctc.qnn-context-binary=./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8/model.bin ./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8/test_wavs/0.wav
You can find the output log below:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./sherpa-onnx-offline --provider=qnn --tokens=./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8/tokens.txt --zipformer-ctc.qnn-backend-lib=./libQnnHtp.so --zipformer-ctc.qnn-system-lib=./libQnnSystem.so --zipformer-ctc.qnn-context-binary=./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8/model.bin ./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8/test_wavs/0.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model="", qnn_config=QnnConfig(backend_lib="./libQnnHtp.so", context_binary="./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8/model.bin", system_lib="./libQnnSystem.so"), ), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8/tokens.txt", num_threads=2, debug=False, provider="qnn", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/qnn/utils.cc:CopyGraphsInfo:465 version: 3
recognizer created in 0.234 s
Started
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/qnn/offline-zipformer-ctc-model-qnn.cc:Run:89 Number of input frames 561 is too large. Truncate it to 500 frames.
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/qnn/offline-zipformer-ctc-model-qnn.cc:Run:93 Recognition result may be truncated/incomplete. Please select a model accepting longer audios.
Done!
./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": "对我做了介绍那么我想说的是呢大家如果对我的研究感兴趣呢", "timestamps": [0.00, 0.32, 0.48, 0.64, 0.80, 0.96, 1.08, 1.16, 1.60, 1.76, 1.92, 2.08, 2.24, 2.40, 2.56, 2.72, 3.04, 3.20, 3.36, 3.44, 3.52, 3.68, 3.76, 3.84, 4.00, 4.16, 4.32, 4.48, 4.60, 4.68, 4.80], "durations": [], "tokens":["▁ƌŕş", "▁ƍĩĴ", "▁ƌĢĽ", "▁ƋŠħ", "▁ƋšĬ", "▁Ǝ", "š", "Į", "▁Ɛģň", "▁Ƌşĩ", "▁ƍĩĴ", "▁ƍĤř", "▁ƏŕŚ", "▁ƎĽĥ", "▁ƍĻŕ", "▁ƌĴŇ", "▁ƌŊō", "▁ƌŔŜ", "▁ƌŌģ", "▁ƍŃŁ", "▁ƌŕş", "▁ƍĩĴ", "▁ƎĽĥ", "▁ƎŅķ", "▁ƎŏŜ", "▁ƍĥń", "▁ƌĦŚ", "▁Ə", "Ŝ", "ň", "▁ƌĴŇ"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.150 s
Real time factor (RTF): 0.150 / 5.611 = 0.027
0.0ms [WARN ] QnnDsp <W> Initializing HtpProvider
0.0ms [WARN ] QnnDsp <W> m_CFBCallbackInfoObj is not initialized, return emptyList
0.0ms [WARN ] QnnDsp <W> m_CFBCallbackInfoObj is not initialized, return emptyList
Please ignore the num_threads value in the log; it is not used by QNN.
Hint
The model actually processed only 5 seconds of audio, not 5.611 seconds.
sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8 (Chinese, English, Japanese, Korean, Cantonese, 中英日韩粤语)
This model is converted from sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8 (Chinese, English, Japanese, Korean, Cantonese, 中英日韩粤语).
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models-qnn-binary/sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8.tar.bz2
tar xvf sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8.tar.bz2
rm sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8.tar.bz2
Now copy the extracted files to your Qualcomm device. Make sure you have read Run executables on your phone with adb (using model.bin)
to copy the required *.so files from QNN SDK 2.40.0.251030 and to set up the environment variable ADSP_LIBRARY_PATH.
pandora:/data/local/tmp $ ls -lh
total 104K
-rw-rw-rw- 1 shell shell 2.3M 2025-11-20 11:05 libQnnHtp.so
-rw-rw-rw- 1 shell shell 71M 2025-11-20 11:05 libQnnHtpPrepare.so
-rw-rw-rw- 1 shell shell 10M 2025-11-20 11:10 libQnnHtpV81Skel.so
-rw-rw-rw- 1 shell shell 618K 2025-11-20 11:06 libQnnHtpV81Stub.so
-rw-rw-rw- 1 shell shell 2.4M 2025-11-20 11:06 libQnnSystem.so
-rw-rw-rw- 1 shell shell 15M 2025-12-10 11:43 libonnxruntime.so
-rwxrwxrwx 1 shell shell 2.1M 2025-12-10 11:42 sherpa-onnx-offline
drwxrwxr-x 3 shell shell 3.3K 2025-12-22 16:46 sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8
pandora:/data/local/tmp $ ls -lh sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8
total 242K
-rw-rw-rw- 1 shell shell 71 2025-12-09 16:05 LICENSE
-rw-rw-rw- 1 shell shell 104 2025-12-09 16:05 README.md
-rw-rw-rw- 1 shell shell 22 2025-12-09 16:05 info.txt
-rw-rw-rw- 1 shell shell 241M 2025-12-09 16:05 model.bin
drwxrwxr-x 2 shell shell 3.3K 2025-12-22 16:46 test_wavs
-rw-rw-rw- 1 shell shell 308K 2025-12-09 16:05 tokens.txt
Run it on your device:
./sherpa-onnx-offline \
--provider=qnn \
--tokens=./sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/tokens.txt \
--sense-voice.qnn-backend-lib=./libQnnHtp.so \
--sense-voice.qnn-system-lib=./libQnnSystem.so \
--sense-voice.qnn-context-binary=./sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/model.bin \
./sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/test_wavs/zh.wav
or write it in a single line:
./sherpa-onnx-offline --provider=qnn --tokens=./sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/tokens.txt --sense-voice.qnn-backend-lib=./libQnnHtp.so --sense-voice.qnn-system-lib=./libQnnSystem.so --sense-voice.qnn-context-binary=./sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/model.bin ./sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/test_wavs/zh.wav
You can find the output log below:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./sherpa-onnx-offline --provider=qnn --tokens=./sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/tokens.txt --sense-voice.qnn-backend-lib=./libQnnHtp.so --sense-voice.qnn-system-lib=./libQnnSystem.so --sense-voice.qnn-context-binary=./sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/model.bin ./sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/test_wavs/zh.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", qnn_config=QnnConfig(backend_lib="./libQnnHtp.so", context_binary="./sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/model.bin", system_lib="./libQnnSystem.so"), language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/tokens.txt", num_threads=2, debug=False, provider="qnn", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/qnn/utils.cc:CopyGraphsInfo:465 version: 3
recognizer created in 0.535 s
Started
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/qnn/offline-sense-voice-model-qnn.cc:ApplyLFR:216 Number of input frames 92 is too large. Truncate it to 83 frames.
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/qnn/offline-sense-voice-model-qnn.cc:ApplyLFR:220 Recognition result may be truncated/incomplete. Please select a model accepting longer audios.
Done!
./sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/test_wavs/zh.wav
{"lang": "<|zh|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "开饭时间早上九点至下午五点", "timestamps": [0.72, 0.96, 1.26, 1.44, 1.92, 2.10, 2.58, 2.82, 3.24, 3.90, 4.20, 4.50, 4.68], "durations": [], "tokens":["开", "饭", "时", "间", "早", "上", "九", "点", "至", "下", "午", "五", "点"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.052 s
Real time factor (RTF): 0.052 / 5.592 = 0.009
0.0ms [WARN ] QnnDsp <W> Initializing HtpProvider
0.0ms [WARN ] QnnDsp <W> m_CFBCallbackInfoObj is not initialized, return emptyList
0.0ms [WARN ] QnnDsp <W> m_CFBCallbackInfoObj is not initialized, return emptyList
Please ignore the num_threads value in the log; it is not used by QNN.
Hint
The model actually processed only 5 seconds of audio, not 5.592 seconds.