Models for QNN
We assume you have already read Run executables on your phone with adb (using model.bin) and are familiar with setting up the environment.
This section lists the models supported by QNN in sherpa-onnx. It only shows the usage of models from the asr-models-qnn-binary GitHub release.
Caution
I am using a Xiaomi 17 Pro for testing, so the models in this section have SM8850 in their names.
Make sure to select a model that matches your own device. For instance, if you are testing on a Samsung Galaxy S23 Ultra, which uses the SM8550 SoC, you should select a model with SM8550 in its name instead of SM8850.
Since QNN does not support dynamic input shapes, we limit the maximum duration a model can handle. For example, if the limit is 10 seconds, any input shorter than 10 seconds is padded to 10 seconds, and any input longer than 10 seconds is truncated to that length.
The model name indicates the maximum duration the model can handle. We use 5-seconds models in this section as an example.
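The fixed-shape behavior can be sketched as follows. This is only an illustration of the padding/truncation rule, not sherpa-onnx's actual implementation; `fix_length` is a made-up name:

```python
def fix_length(samples, max_seconds, sample_rate=16000):
    """Zero-pad or truncate a list of audio samples to exactly max_seconds."""
    target = int(max_seconds * sample_rate)
    if len(samples) >= target:
        return samples[:target]                        # too long: truncate
    return samples + [0.0] * (target - len(samples))  # too short: zero-pad

# With a 5-second limit at 16 kHz, every input becomes 80000 samples:
print(len(fix_length([0.1] * 48000, 5)))   # 3-second input, padded -> 80000
print(len(fix_length([0.1] * 160000, 5)))  # 10-second input, truncated -> 80000
```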
sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8 (Chinese)
This model is converted from sherpa-onnx-zipformer-ctc-zh-int8-2025-12-22.
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models-qnn-binary/sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8.tar.bz2
tar xvf sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8.tar.bz2
rm sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8.tar.bz2
Now copy the extracted files to your Qualcomm device. Make sure you have read Run executables on your phone with adb (using model.bin)
to copy the required *.so files from QNN SDK 2.40.0.251030 and to set up the environment variable ADSP_LIBRARY_PATH.
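Assuming adb is already set up as described in that section, copying the extracted model directory might look like the sketch below (adjust paths to your setup):

```shell
# push the extracted model directory to the device
adb push sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8 /data/local/tmp/

# verify the files arrived on the device
adb shell ls /data/local/tmp
```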
pandora:/data/local/tmp $ ls -lh
total 104K
-rw-rw-rw- 1 shell shell 2.3M 2025-11-20 11:05 libQnnHtp.so
-rw-rw-rw- 1 shell shell 71M 2025-11-20 11:05 libQnnHtpPrepare.so
-rw-rw-rw- 1 shell shell 10M 2025-11-20 11:10 libQnnHtpV81Skel.so
-rw-rw-rw- 1 shell shell 618K 2025-11-20 11:06 libQnnHtpV81Stub.so
-rw-rw-rw- 1 shell shell 2.4M 2025-11-20 11:06 libQnnSystem.so
-rw-rw-rw- 1 shell shell 15M 2025-12-10 11:43 libonnxruntime.so
-rwxrwxrwx 1 shell shell 2.1M 2025-12-22 17:31 sherpa-onnx-offline
drwxrwxr-x 3 shell shell 3.3K 2025-12-22 17:45 sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8
pandora:/data/local/tmp $ ls -lh sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8
total 351K
-rw-rw-rw- 1 shell shell 15 2025-12-22 17:29 info.txt
-rw-rw-rw- 1 shell shell 351M 2025-12-22 17:29 model.bin
drwxrwxr-x 2 shell shell 3.3K 2025-12-22 17:45 test_wavs
-rw-rw-rw- 1 shell shell 13K 2025-12-22 17:29 tokens.txt
pandora:/data/local/tmp $ export ADSP_LIBRARY_PATH="$PWD;$ADSP_LIBRARY_PATH"
pandora:/data/local/tmp $
Run it on your device:
./sherpa-onnx-offline \
--provider=qnn \
--tokens=./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8/tokens.txt \
--zipformer-ctc.qnn-backend-lib=./libQnnHtp.so \
--zipformer-ctc.qnn-system-lib=./libQnnSystem.so \
--zipformer-ctc.qnn-context-binary=./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8/model.bin \
./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8/test_wavs/0.wav
or write it in a single line:
./sherpa-onnx-offline --provider=qnn --tokens=./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8/tokens.txt --zipformer-ctc.qnn-backend-lib=./libQnnHtp.so --zipformer-ctc.qnn-system-lib=./libQnnSystem.so --zipformer-ctc.qnn-context-binary=./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8/model.bin ./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8/test_wavs/0.wav
You can find the output log below:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./sherpa-onnx-offline --provider=qnn --tokens=./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8/tokens.txt --zipformer-ctc.qnn-backend-lib=./libQnnHtp.so --zipformer-ctc.qnn-system-lib=./libQnnSystem.so --zipformer-ctc.qnn-context-binary=./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8/model.bin ./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8/test_wavs/0.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model="", qnn_config=QnnConfig(backend_lib="./libQnnHtp.so", context_binary="./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8/model.bin", system_lib="./libQnnSystem.so"), ), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8/tokens.txt", num_threads=2, debug=False, provider="qnn", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/qnn/utils.cc:CopyGraphsInfo:465 version: 3
recognizer created in 0.583 s
Started
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/qnn/offline-zipformer-ctc-model-qnn.cc:Run:89 Number of input frames 561 is too large. Truncate it to 500 frames.
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/qnn/offline-zipformer-ctc-model-qnn.cc:Run:93 Recognition result may be truncated/incomplete. Please select a model accepting longer audios.
Done!
./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-12-22-int8/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": "", "timestamps": [], "durations": [], "tokens":[], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.306 s
Real time factor (RTF): 0.306 / 5.611 = 0.055
0.0ms [WARN ] QnnDsp <W> Initializing HtpProvider
0.0ms [WARN ] QnnDsp <W> m_CFBCallbackInfoObj is not initialized, return emptyList
0.0ms [WARN ] QnnDsp <W> m_CFBCallbackInfoObj is not initialized, return emptyList
Please ignore the num_threads value in the log; it is not used by QNN.
Hint
The model actually processed only 5 seconds of audio, not 5.611 seconds.
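The frame counts reported in the log can be reproduced with a little arithmetic, assuming the usual 10 ms feature-frame shift (100 frames per second) used by these models:

```python
frames_per_second = 100  # one feature frame every 10 ms

wave_seconds = 5.611     # duration of test_wavs/0.wav, from the RTF line in the log
limit_seconds = 5        # maximum duration baked into this model

input_frames = round(wave_seconds * frames_per_second)
max_frames = limit_seconds * frames_per_second
print(input_frames, max_frames)  # 561 500, matching the truncation warning
```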
sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8 (Chinese)
This model is converted from sherpa-onnx-zipformer-ctc-zh-int8-2025-07-03 (Chinese).
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models-qnn-binary/sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8.tar.bz2
tar xvf sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8.tar.bz2
rm sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8.tar.bz2
Now copy the extracted files to your Qualcomm device. Make sure you have read Run executables on your phone with adb (using model.bin)
to copy the required *.so files from QNN SDK 2.40.0.251030 and to set up the environment variable ADSP_LIBRARY_PATH.
pandora:/data/local/tmp $ ls -lh
total 104K
-rw-rw-rw- 1 shell shell 2.3M 2025-11-20 11:05 libQnnHtp.so
-rw-rw-rw- 1 shell shell 71M 2025-11-20 11:05 libQnnHtpPrepare.so
-rw-rw-rw- 1 shell shell 10M 2025-11-20 11:10 libQnnHtpV81Skel.so
-rw-rw-rw- 1 shell shell 618K 2025-11-20 11:06 libQnnHtpV81Stub.so
-rw-rw-rw- 1 shell shell 2.4M 2025-11-20 11:06 libQnnSystem.so
-rw-rw-rw- 1 shell shell 15M 2025-12-10 11:43 libonnxruntime.so
-rwxrwxrwx 1 shell shell 2.1M 2025-12-10 11:42 sherpa-onnx-offline
drwxrwxr-x 3 shell shell 3.3K 2025-12-22 17:19 sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8
pandora:/data/local/tmp $ ls -lh sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8
total 72K
-rw-rw-rw- 1 shell shell 15 2025-12-22 17:15 info.txt
-rw-rw-rw- 1 shell shell 72M 2025-12-22 17:15 model.bin
drwxrwxr-x 2 shell shell 3.3K 2025-12-22 17:19 test_wavs
-rw-rw-rw- 1 shell shell 13K 2025-12-22 17:15 tokens.txt
pandora:/data/local/tmp $ export ADSP_LIBRARY_PATH="$PWD;$ADSP_LIBRARY_PATH"
pandora:/data/local/tmp $
Run it on your device:
./sherpa-onnx-offline \
--provider=qnn \
--tokens=./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8/tokens.txt \
--zipformer-ctc.qnn-backend-lib=./libQnnHtp.so \
--zipformer-ctc.qnn-system-lib=./libQnnSystem.so \
--zipformer-ctc.qnn-context-binary=./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8/model.bin \
./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8/test_wavs/0.wav
or write it in a single line:
./sherpa-onnx-offline --provider=qnn --tokens=./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8/tokens.txt --zipformer-ctc.qnn-backend-lib=./libQnnHtp.so --zipformer-ctc.qnn-system-lib=./libQnnSystem.so --zipformer-ctc.qnn-context-binary=./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8/model.bin ./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8/test_wavs/0.wav
You can find the output log below:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./sherpa-onnx-offline --provider=qnn --tokens=./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8/tokens.txt --zipformer-ctc.qnn-backend-lib=./libQnnHtp.so --zipformer-ctc.qnn-system-lib=./libQnnSystem.so --zipformer-ctc.qnn-context-binary=./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8/model.bin ./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8/test_wavs/0.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model="", qnn_config=QnnConfig(backend_lib="./libQnnHtp.so", context_binary="./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8/model.bin", system_lib="./libQnnSystem.so"), ), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8/tokens.txt", num_threads=2, debug=False, provider="qnn", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/qnn/utils.cc:CopyGraphsInfo:465 version: 3
recognizer created in 0.234 s
Started
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/qnn/offline-zipformer-ctc-model-qnn.cc:Run:89 Number of input frames 561 is too large. Truncate it to 500 frames.
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/qnn/offline-zipformer-ctc-model-qnn.cc:Run:93 Recognition result may be truncated/incomplete. Please select a model accepting longer audios.
Done!
./sherpa-onnx-qnn-SM8850-binary-5-seconds-zipformer-ctc-zh-2025-07-03-int8/test_wavs/0.wav
{"lang": "", "emotion": "", "event": "", "text": "对我做了介绍那么我想说的是呢大家如果对我的研究感兴趣呢", "timestamps": [0.00, 0.32, 0.48, 0.64, 0.80, 0.96, 1.08, 1.16, 1.60, 1.76, 1.92, 2.08, 2.24, 2.40, 2.56, 2.72, 3.04, 3.20, 3.36, 3.44, 3.52, 3.68, 3.76, 3.84, 4.00, 4.16, 4.32, 4.48, 4.60, 4.68, 4.80], "durations": [], "tokens":["▁ƌŕş", "▁ƍĩĴ", "▁ƌĢĽ", "▁ƋŠħ", "▁ƋšĬ", "▁Ǝ", "š", "Į", "▁Ɛģň", "▁Ƌşĩ", "▁ƍĩĴ", "▁ƍĤř", "▁ƏŕŚ", "▁ƎĽĥ", "▁ƍĻŕ", "▁ƌĴŇ", "▁ƌŊō", "▁ƌŔŜ", "▁ƌŌģ", "▁ƍŃŁ", "▁ƌŕş", "▁ƍĩĴ", "▁ƎĽĥ", "▁ƎŅķ", "▁ƎŏŜ", "▁ƍĥń", "▁ƌĦŚ", "▁Ə", "Ŝ", "ň", "▁ƌĴŇ"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.150 s
Real time factor (RTF): 0.150 / 5.611 = 0.027
0.0ms [WARN ] QnnDsp <W> Initializing HtpProvider
0.0ms [WARN ] QnnDsp <W> m_CFBCallbackInfoObj is not initialized, return emptyList
0.0ms [WARN ] QnnDsp <W> m_CFBCallbackInfoObj is not initialized, return emptyList
Please ignore the num_threads value in the log; it is not used by QNN.
Hint
The model actually processed only 5 seconds of audio, not 5.611 seconds.
sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8 (Chinese, English, Japanese, Korean, Cantonese, 中英日韩粤语)
This model is converted from sherpa-onnx-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8 (Chinese, English, Japanese, Korean, Cantonese, 中英日韩粤语).
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models-qnn-binary/sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8.tar.bz2
tar xvf sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8.tar.bz2
rm sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8.tar.bz2
Now copy the extracted files to your Qualcomm device. Make sure you have read Run executables on your phone with adb (using model.bin)
to copy the required *.so files from QNN SDK 2.40.0.251030 and to set up the environment variable ADSP_LIBRARY_PATH.
pandora:/data/local/tmp $ ls -lh
total 104K
-rw-rw-rw- 1 shell shell 2.3M 2025-11-20 11:05 libQnnHtp.so
-rw-rw-rw- 1 shell shell 71M 2025-11-20 11:05 libQnnHtpPrepare.so
-rw-rw-rw- 1 shell shell 10M 2025-11-20 11:10 libQnnHtpV81Skel.so
-rw-rw-rw- 1 shell shell 618K 2025-11-20 11:06 libQnnHtpV81Stub.so
-rw-rw-rw- 1 shell shell 2.4M 2025-11-20 11:06 libQnnSystem.so
-rw-rw-rw- 1 shell shell 15M 2025-12-10 11:43 libonnxruntime.so
-rwxrwxrwx 1 shell shell 2.1M 2025-12-10 11:42 sherpa-onnx-offline
drwxrwxr-x 3 shell shell 3.3K 2025-12-22 16:46 sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8
pandora:/data/local/tmp $ ls -lh sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8
total 242K
-rw-rw-rw- 1 shell shell 71 2025-12-09 16:05 LICENSE
-rw-rw-rw- 1 shell shell 104 2025-12-09 16:05 README.md
-rw-rw-rw- 1 shell shell 22 2025-12-09 16:05 info.txt
-rw-rw-rw- 1 shell shell 241M 2025-12-09 16:05 model.bin
drwxrwxr-x 2 shell shell 3.3K 2025-12-22 16:46 test_wavs
-rw-rw-rw- 1 shell shell 308K 2025-12-09 16:05 tokens.txt
Run it on your device:
./sherpa-onnx-offline \
--provider=qnn \
--tokens=./sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/tokens.txt \
--sense-voice.qnn-backend-lib=./libQnnHtp.so \
--sense-voice.qnn-system-lib=./libQnnSystem.so \
--sense-voice.qnn-context-binary=./sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/model.bin \
./sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/test_wavs/zh.wav
or write it in a single line:
./sherpa-onnx-offline --provider=qnn --tokens=./sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/tokens.txt --sense-voice.qnn-backend-lib=./libQnnHtp.so --sense-voice.qnn-system-lib=./libQnnSystem.so --sense-voice.qnn-context-binary=./sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/model.bin ./sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/test_wavs/zh.wav
You can find the output log below:
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 ./sherpa-onnx-offline --provider=qnn --tokens=./sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/tokens.txt --sense-voice.qnn-backend-lib=./libQnnHtp.so --sense-voice.qnn-system-lib=./libQnnSystem.so --sense-voice.qnn-context-binary=./sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/model.bin ./sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/test_wavs/zh.wav
OfflineRecognizerConfig(feat_config=FeatureExtractorConfig(sampling_rate=16000, feature_dim=80, low_freq=20, high_freq=-400, dither=0, normalize_samples=True, snip_edges=False), model_config=OfflineModelConfig(transducer=OfflineTransducerModelConfig(encoder_filename="", decoder_filename="", joiner_filename=""), paraformer=OfflineParaformerModelConfig(model=""), nemo_ctc=OfflineNemoEncDecCtcModelConfig(model=""), whisper=OfflineWhisperModelConfig(encoder="", decoder="", language="", task="transcribe", tail_paddings=-1), fire_red_asr=OfflineFireRedAsrModelConfig(encoder="", decoder=""), tdnn=OfflineTdnnModelConfig(model=""), zipformer_ctc=OfflineZipformerCtcModelConfig(model=""), wenet_ctc=OfflineWenetCtcModelConfig(model=""), sense_voice=OfflineSenseVoiceModelConfig(model="", qnn_config=QnnConfig(backend_lib="./libQnnHtp.so", context_binary="./sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/model.bin", system_lib="./libQnnSystem.so"), language="auto", use_itn=False), moonshine=OfflineMoonshineModelConfig(preprocessor="", encoder="", uncached_decoder="", cached_decoder=""), dolphin=OfflineDolphinModelConfig(model=""), canary=OfflineCanaryModelConfig(encoder="", decoder="", src_lang="", tgt_lang="", use_pnc=True), omnilingual=OfflineOmnilingualAsrCtcModelConfig(model=""), telespeech_ctc="", tokens="./sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/tokens.txt", num_threads=2, debug=False, provider="qnn", model_type="", modeling_unit="cjkchar", bpe_vocab=""), lm_config=OfflineLMConfig(model="", scale=0.5, lodr_scale=0.01, lodr_fst="", lodr_backoff_id=-1), ctc_fst_decoder_config=OfflineCtcFstDecoderConfig(graph="", max_active=3000), decoding_method="greedy_search", max_active_paths=4, hotwords_file="", hotwords_score=1.5, blank_penalty=0, rule_fsts="", rule_fars="", hr=HomophoneReplacerConfig(lexicon="", rule_fsts=""))
Creating recognizer ...
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/qnn/utils.cc:CopyGraphsInfo:465 version: 3
recognizer created in 0.535 s
Started
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/qnn/offline-sense-voice-model-qnn.cc:ApplyLFR:216 Number of input frames 92 is too large. Truncate it to 83 frames.
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/qnn/offline-sense-voice-model-qnn.cc:ApplyLFR:220 Recognition result may be truncated/incomplete. Please select a model accepting longer audios.
Done!
./sherpa-onnx-qnn-SM8850-binary-5-seconds-sense-voice-zh-en-ja-ko-yue-2024-07-17-int8/test_wavs/zh.wav
{"lang": "<|zh|>", "emotion": "<|NEUTRAL|>", "event": "<|Speech|>", "text": "开饭时间早上九点至下午五点", "timestamps": [0.72, 0.96, 1.26, 1.44, 1.92, 2.10, 2.58, 2.82, 3.24, 3.90, 4.20, 4.50, 4.68], "durations": [], "tokens":["开", "饭", "时", "间", "早", "上", "九", "点", "至", "下", "午", "五", "点"], "ys_log_probs": [], "words": []}
----
num threads: 2
decoding method: greedy_search
Elapsed seconds: 0.052 s
Real time factor (RTF): 0.052 / 5.592 = 0.009
0.0ms [WARN ] QnnDsp <W> Initializing HtpProvider
0.0ms [WARN ] QnnDsp <W> m_CFBCallbackInfoObj is not initialized, return emptyList
0.0ms [WARN ] QnnDsp <W> m_CFBCallbackInfoObj is not initialized, return emptyList
Please ignore the num_threads value in the log; it is not used by QNN.
Hint
The model actually processed only 5 seconds of audio, not 5.592 seconds.