On-device AI inference SDK for Android. Runs models locally on Qualcomm Snapdragon NPU (Hexagon) via ONNX Runtime + QNN Execution Provider, with LiteRT-LM and Genie LLM backends. No server dependency for inference.
This repository contains:
- TheStageCore.aar -- pre-built SDK binary
- onnxruntime-android.aar -- ONNX Runtime + QNN Execution Provider
- onnxruntime-genai-android.aar -- ORT-genai (LLM inference)
- Flutter plugin (plugin/qlip_sdk) -- Flutter integration via method channels
- Example apps:
  - examples/voice_transcribe -- Whisper speech-to-text with hold-to-talk mic input
  - examples/tts_app -- NeuTTS text-to-speech with on-device LLM + NeuCodec audio decoder
- Prerequisites
- Quick Start
- API Token
- Using in Your Own App
- Example Apps
- Platform Support
- Project Structure
- Troubleshooting
| Requirement | Version |
|---|---|
| Android compileSdk | 35 |
| Android minSdk | 28 |
| JDK | 17 |
| Kotlin | 2.1+ |
| Flutter | 3.22+ |
| QAIRT SDK | 2.42.0 |
- Physical Snapdragon device required. The SDK targets the Hexagon NPU via QNN; the emulator and non-Snapdragon devices fall back to CPU only.
- API token -- required for SDK initialization. Generate one at app.thestage.ai.
- Qualcomm AI Runtime SDK (QAIRT) 2.42.0 -- free download from Qualcomm. Required to run inference on Qualcomm Snapdragon devices: it provides the QNN backend libraries (libGenie.so, libQnnCpu.so, plus the HTP/NPU stack) that the SDK loads at runtime to dispatch ops to the Hexagon NPU. The shipped AARs are deliberately Qualcomm-clean and don't redistribute QAIRT binaries; each integrator installs QAIRT once locally and accepts Qualcomm's license at install time. setup.sh then copies the two libs that aren't on Maven into the plugin.
If you don't have Flutter installed yet:

    # macOS (Homebrew)
    brew install --cask flutter

    # Linux (snap)
    sudo snap install flutter --classic

    # Verify installation
    flutter doctor

Alternatively, follow the official guide: https://docs.flutter.dev/get-started/install
Make sure flutter doctor reports a working Android toolchain
(Android Studio + SDK 35 + a connected device). On first run,
accept the Android licenses:

    flutter doctor --android-licenses

Clone the repository and run the setup script:

    git clone https://github.com/TheStageAI/TheStageAI.AndroidSDK.git
    cd TheStageAI.AndroidSDK

    # Point QAIRT at your local install (adjust path).
    export QAIRT=~/Qualcomm/AIStack/QAIRT/2.42.0.251225
    ./scripts/setup.sh

setup.sh:
- Symlinks TheStageCore.aar, onnxruntime-android.aar, and onnxruntime-genai-android.aar into the Flutter plugin (single source of truth -- the example app and any consuming app reference them from there).
- Copies libGenie.so and libQnnCpu.so from $QAIRT into the plugin's src/main/jniLibs/arm64-v8a/. Library-module jniLibs are auto-merged into the consuming app's APK, so this works for the example app and any app that depends on the plugin.

Everything else -- libQnnHtp.so, skels/stubs,
libonnxruntime.so, etc. -- is pulled in automatically via the
com.qualcomm.qti:qnn-runtime Maven dependency and the bundled
ORT AARs.
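
The two setup steps described above can be sketched roughly as follows. This is a hypothetical illustration, not the shipped scripts/setup.sh: the function name `wire_plugin`, the assumption that the AARs sit at the repository root, the plugin path `plugin/qlip_sdk/android/...`, and the QAIRT subdirectory `lib/aarch64-android` are all assumptions that may not match your install.

```shell
# Hypothetical sketch of the two setup steps; NOT the real scripts/setup.sh.
# Assumed layout: AARs at REPO_ROOT, plugin Android module under
# plugin/qlip_sdk/android, QAIRT arm64 libs under lib/aarch64-android.
wire_plugin() {
  local repo qairt="$2" libs jni src f
  repo="$(cd "$1" && pwd)" || return 1   # absolute path keeps symlinks valid
  libs="$repo/plugin/qlip_sdk/android/libs"
  jni="$repo/plugin/qlip_sdk/android/src/main/jniLibs/arm64-v8a"
  src="$qairt/lib/aarch64-android"

  mkdir -p "$libs" "$jni"

  # Step 1: symlink the three AARs into the plugin.
  for f in TheStageCore.aar onnxruntime-android.aar onnxruntime-genai-android.aar; do
    if [ ! -f "$repo/$f" ]; then
      echo "missing AAR: $repo/$f" >&2
      return 1
    fi
    ln -sfn "$repo/$f" "$libs/$f"
  done

  # Step 2: copy the two QAIRT libs that are not available from Maven.
  for f in libGenie.so libQnnCpu.so; do
    if [ ! -f "$src/$f" ]; then
      echo "missing QAIRT lib: $src/$f" >&2
      return 1
    fi
    cp "$src/$f" "$jni/$f"
  done
}
```

Under those assumptions it would be called as `wire_plugin . "$QAIRT"` from the repository root. Failing early on a missing file makes a wrong QAIRT path show up at setup time rather than as a dlopen error on the device.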
Export the token in your shell -- the example apps read it via
--dart-define:

    export TOKEN="your-thestage-api-token"

Build and install the voice_transcribe example:

    cd examples/voice_transcribe
    flutter build apk --release --dart-define=QLIP_API_TOKEN="$TOKEN"
    flutter install --release

Use --debug for Flutter UI hot-reload while iterating;
--release gives realistic inference timings.
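
Because the token is baked in at compile time through --dart-define, an unset TOKEN still builds but passes an empty token to the app. A small guard along these lines can catch that before the build starts. The helper names `require_token` and `build_example` are hypothetical and not part of the repository; only the flutter command itself comes from the steps above.

```shell
# require_token: succeed only when TOKEN is set and non-empty.
require_token() {
  if [ -z "${TOKEN:-}" ]; then
    echo "TOKEN is not set; export your TheStage AI API token first" >&2
    return 1
  fi
}

# build_example DIR: hypothetical wrapper around the documented build command.
# Refuses to build when the token is missing.
build_example() {
  require_token || return 1
  (cd "$1" && flutter build apk --release --dart-define=QLIP_API_TOKEN="$TOKEN")
}
```

Usage would be `build_example examples/voice_transcribe` from the repository root.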
To pick a specific connected device:

    flutter devices
    flutter run --release --dart-define=QLIP_API_TOKEN="$TOKEN" -d <DEVICE_ID>

Launch the app, grant microphone permission, and hold the button to record. The first launch downloads the Whisper engine bundle from Hugging Face (~5 min over Wi-Fi); subsequent launches use the local cache.
An API token is required to use the SDK. Generate one at app.thestage.ai and pass it during initialization. The token is validated once on first model start; all subsequent operations run offline.
In the example apps the token is read from the Dart compile-time
environment variable QLIP_API_TOKEN (set via
--dart-define=QLIP_API_TOKEN=...).

Add the plugin to your app as a path dependency:

    # pubspec.yaml
    dependencies:
      qlip_sdk:
        path: /path/to/this-repo/plugin/qlip_sdk

Run ./scripts/setup.sh (from this repo) to populate the
plugin once -- it symlinks the AARs and copies the QAIRT
runtime libs from $QAIRT into the plugin's jniLibs/.
The Android library module can't re-export local AARs to the
consuming app, so your app must reference them on its runtime
classpath -- but it can point straight at the plugin's libs/
directory rather than keeping a second copy:

    // android/app/build.gradle.kts
    dependencies {
        val pluginLibs =
            "../../../../plugin/qlip_sdk/android/libs"
        implementation(files("$pluginLibs/TheStageCore.aar"))
        implementation(
            files("$pluginLibs/onnxruntime-android.aar")
        )
        implementation(
            files("$pluginLibs/onnxruntime-genai-android.aar")
        )
        implementation("com.qualcomm.qti:qnn-runtime:2.42.0")
    }

Adjust the pluginLibs relative path to match where the SDK
repo sits next to your app. The QAIRT native libs ride along
automatically via the plugin's jniLibs/ -- no per-app
copying needed.
Other settings your app needs:
- compileSdk = 35, minSdk = 28, targetSdk = 35
- JDK 17 toolchain
- Add ARM64 ABI filter:

      android {
          defaultConfig {
              ndk { abiFilters += "arm64-v8a" }
          }
      }

Then call the SDK from Dart:

    import 'package:qlip_sdk/qlip_sdk.dart';

    // Initialize the SDK (call once at app start).
    await QlipSdk.initialize(apiToken: 'YOUR_API_TOKEN');

    // Start a model.
    await QlipSdk.startModel(
      modelType: 'whisper',
      modelName: 'whisper',
      enginesPath: '/data/local/tmp/whisper_engines',
      device: 'npu',
    );

    // Run inference.
    final results = await QlipSdk.infer(
      modelName: 'whisper',
      inputJson: {
        'audio': pcm16kFloatSamples, // 16 kHz mono float
        'language': 'en',
      },
    );
    final transcript = results.first['transcription'] as String;

    // Stop when done.
    await QlipSdk.stopModel(modelName: 'whisper');

Rebuild and run:

    flutter clean
    flutter run --release --dart-define=QLIP_API_TOKEN="$TOKEN"

examples/voice_transcribe: hold-to-record speech-to-text. The app auto-detects the device's
SoC (Build.SOC_MODEL), downloads the matching engine bundle
from
TheStageAI/Elastic-whisper-large-v3-turbo
on Hugging Face, and runs Whisper encoder + decoder on the NPU.
Benchmarks on Samsung S25 Ultra (SM8750 / Hexagon v79):
| Stage | Time |
|---|---|
| Encoder (NPU) | 338 ms |
| Decoder (NPU, 21 tokens) | 239 ms |
| Total inference | ~0.6 s |
See examples/voice_transcribe/README.md for overrides
(local engine path, custom HF repo, ...).
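
As a quick sanity check on the benchmark table above, the arithmetic below sums the two stages and derives an average decoder cost per token. It assumes the 239 ms decoder figure covers all 21 tokens and is spread evenly across them, which the table does not state explicitly, so treat the per-token numbers as rough averages.

```shell
# Figures copied from the benchmark table (Samsung S25 Ultra).
encoder_ms=338
decoder_ms=239
decoder_tokens=21

# Encoder + decoder time; the table rounds this to ~0.6 s.
total_ms=$((encoder_ms + decoder_ms))

# Average decoder cost per token and the implied decode rate.
ms_per_token=$(awk -v ms="$decoder_ms" -v n="$decoder_tokens" 'BEGIN { printf "%.1f", ms / n }')
tokens_per_s=$(awk -v ms="$decoder_ms" -v n="$decoder_tokens" 'BEGIN { printf "%.1f", n * 1000 / ms }')

echo "total: ${total_ms} ms"
echo "decoder: ${ms_per_token} ms/token, ${tokens_per_s} tokens/s"
```

With these inputs the script reports 577 ms total, about 11.4 ms per token, and about 87.9 tokens/s.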

examples/tts_app: on-device text-to-speech built on NeuTTS. Uses the same SoC
auto-detect + HF bundle flow and streams synthesised audio from
a short text prompt. The bundle is pulled from
TheStageAI/neutts.
Toggle Boost CPU in the UI for a tokens-per-second (tps) boost.
| SoC | Variant tag | Devices |
|---|---|---|
| Snapdragon 8 Elite | qualcomm_sm8750 | Samsung S25, S25 Ultra, S25 Edge |
| Snapdragon 8 Gen 3 | qualcomm_sm8650 | Samsung S24 family, OnePlus 12 |
Other Snapdragon SKUs are detected automatically at runtime
(Build.SOC_MODEL) and request a correspondingly-tagged bundle
from Hugging Face. If the exact variant isn't published, the SDK
falls back to a cpu bundle when available.
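
The selection rule described above (exact variant first, then the cpu bundle) can be illustrated with a small sketch. The SDK does this internally. The function names here are hypothetical, and the tag format ("qualcomm_" plus the lower-cased SoC model) is an assumption inferred from the two rows in the table, not a documented rule.

```shell
# variant_tag_for_soc SOC_MODEL
# Assumption: tag = "qualcomm_" + lower-cased SoC model, which matches the
# table rows (SM8750 -> qualcomm_sm8750, SM8650 -> qualcomm_sm8650).
variant_tag_for_soc() {
  printf 'qualcomm_%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# select_bundle SOC_MODEL PUBLISHED_TAG...
# Prints the exact variant if it is published, otherwise "cpu" if that is
# published, otherwise prints nothing and returns non-zero.
select_bundle() {
  local want tag has_cpu=0
  want="$(variant_tag_for_soc "$1")"
  shift
  for tag in "$@"; do
    if [ "$tag" = "$want" ]; then
      echo "$want"
      return 0
    fi
    if [ "$tag" = "cpu" ]; then
      has_cpu=1
    fi
  done
  if [ "$has_cpu" -eq 1 ]; then
    echo "cpu"
    return 0
  fi
  return 1
}
```

To see which tag a connected device would map to under this assumption, the SoC model can be read with `adb shell getprop ro.soc.model` (the system property behind Build.SOC_MODEL on recent Android versions) and passed as the first argument.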

    .
    ├── README.md
    ├── TheStageCore.aar                 Pre-built SDK binary
    ├── onnxruntime-android.aar          ORT + QNN EP
    ├── onnxruntime-genai-android.aar    ORT-genai (LLM inference)
    │
    ├── plugin/qlip_sdk/                 Flutter plugin (Android)
    │   ├── lib/qlip_sdk.dart            Dart API
    │   └── android/                     Plugin Kotlin code
    │
    ├── examples/
    │   └── voice_transcribe/            Whisper STT (hold-to-talk)
    │
    └── scripts/setup.sh                 One-time AAR symlink wiring

| Symptom | Fix |
|---|---|
| Could not find TheStageCore.aar | Run ./scripts/setup.sh to create the AAR symlinks |
| dlopen failed: library "libQnnCpu.so" not found | Set QAIRT and re-run ./scripts/setup.sh so the QAIRT libs are copied into the plugin's src/main/jniLibs/arm64-v8a/ (see Quick Start) |
| dlopen failed: library "libGenie.so" not found | Same as above |
| INSTALL_FAILED_NO_MATCHING_ABIS | Add ndk { abiFilters += "arm64-v8a" } in your app's build.gradle.kts |
| flutter doctor complains about Android licenses | Run flutter doctor --android-licenses and accept |
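
For the two dlopen failures in the table above, it can help to confirm whether the libraries actually made it into the built APK. The sketch below is one way to do that; `missing_native_libs` is a hypothetical helper, and it assumes native libraries are packaged under lib/arm64-v8a/ inside the APK, which is the standard Android layout.

```shell
# missing_native_libs: reads APK entry names on stdin (one per line) and
# prints each required arm64 library that is absent. Returns non-zero when
# at least one library is missing.
missing_native_libs() {
  local entries lib missing=0
  entries="$(cat)"
  for lib in libGenie.so libQnnCpu.so; do
    if ! printf '%s\n' "$entries" | grep -qxF "lib/arm64-v8a/$lib"; then
      echo "$lib"
      missing=1
    fi
  done
  return "$missing"
}
```

A typical invocation, assuming Flutter's default release output path, would be `unzip -Z1 build/app/outputs/flutter-apk/app-release.apk | missing_native_libs`. No output means both libraries are present.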
The SDK code in TheStageCore.aar ships under TheStage AI's
license; see LICENSE alongside this file (if present) or
contact TheStage AI. Third-party components:
- ONNX Runtime -- MIT (redistribution permitted).
- ORT-genai -- MIT.
- QAIRT runtime libraries -- Qualcomm AI Stack Software License Agreement; each integrator installs QAIRT themselves and accepts those terms at install time.
The same SDK is available on iOS as
TheStageAI.AppleSDK.
The Flutter plugin Dart API (qlip_sdk.dart) and the platform
channel names are identical, so Dart code is portable across
both platforms.