██╗ ██╗ ██╗ ███████╗
██║ ██║ ██║ ██╔════╝
██║ ██║ ██║ ███████╗
╚██╗ ██╔╝ ██║ ╚════██║
╚████╔╝ ██║ ███████║
╚═══╝ ╚═╝ ╚══════╝
Visual Tester
VIS is an autonomous testing agent for Android that combines UIAutomator accessibility trees with multi-modal vision models. It sees the screen, understands context, and takes action — no brittle XPath selectors required.
Core Value: Test Android apps like a human would.
- Semantic understanding — finds elements by meaning, not static IDs
- Self-healing — falls back to visual analysis when standard selectors fail
- Local sovereignty — runs models via Ollama, your data stays on your machine
- Fast — optimized Go core with async capture and streaming
| Capability | Description |
|---|---|
| Launch any app | Resolves human-readable names ("Calculator", "Chrome") to Android packages automatically |
| Tap, swipe, type | Full device interaction — buttons, text fields, scroll, navigation keys |
| Describe what's on screen | Vision model reads and interprets the live display in natural language |
| Find elements visually | Locates UI components by description ("the red submit button") when IDs are unavailable |
| Wait for conditions | Polls the screen until a target element appears or a timeout is reached |
| Run structured flows | Executes multi-step YAML test plans with Maestro-compatible syntax |
| Generate reports | Produces HTML and JUnit XML reports per session, with automatic cleanup |
| Multi-device orchestration | Runs tasks across multiple connected devices in parallel |
| MCP server mode | Exposes VIS as a tool server for AI agents and IDE integrations |
| Dry-run validation | Parses and plans tasks without touching the device — verify before you execute |
VIS works with any Android app — production builds, debug builds, Expo Go, system apps. No source code access or instrumentation required.
# Install
git clone https://github.com/uelkerd/vis.git
cd vis && make build
# Ensure prerequisites are running
ollama pull moondream:latest # Lightweight vision model (recommended)
adb devices # Verify device connected
# Run your first task
./bin/vis --task "open the Settings app"

Build from source:

git clone https://github.com/uelkerd/vis.git
cd vis
make build # Binary at ./bin/vis
make install # Installs to $GOPATH/bin

Download pre-built binaries for your platform from Releases:
# macOS (Apple Silicon)
curl -L https://github.com/uelkerd/vis/releases/latest/download/vis_Darwin_arm64.tar.gz | tar xz
chmod +x vis && mv vis /usr/local/bin/
# Linux (x86_64)
curl -L https://github.com/uelkerd/vis/releases/latest/download/vis_Linux_x86_64.tar.gz | tar xz
chmod +x vis && mv vis /usr/local/bin/

| Dependency | Purpose | Install |
|---|---|---|
| Go 1.24+ | Build from source | golang.org/dl |
| ADB | Android device control | brew install android-platform-tools or Android SDK |
| Ollama | Local vision model inference | ollama.com |
| Android device | Physical or emulated | USB debugging enabled |
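With the dependencies above in place, a quick PATH check can be scripted; this is only a sketch (the `check_tool` helper is hypothetical), and `vis setup` performs a fuller check including model downloads.

```shell
#!/bin/sh
# check_tool reports whether a required tool is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: ok"
  else
    echo "$1: missing"
  fi
}

for tool in go adb ollama; do
  check_tool "$tool"
done
```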
Describe what you want in plain English. VIS parses the intent, resolves app names, and executes on the device.
# Launch apps (human-readable names resolved automatically)
vis --task "open the Settings app"
vis --task "open Calculator"
vis --task "open Chrome"
# Navigation
vis --task "scroll down"
vis --task "press back"
vis --task "go home"
# Interact with elements
vis --task "tap on the search button"
vis --task "type 'hello world' into the search field"
# With verbose logging
vis --task "open Settings" -v # DEBUG level
vis --task "open Settings" -vv # TRACE level (most detailed)
vis --task "open Settings" -q # Quiet (warnings/errors only)

Parse and plan without touching the device — useful for validating NLP parsing.
vis --task "open Calculator and type 123" --dry-run -v
# Logs: "dry-run: would execute action" with parsed intent details

Continuous screen analysis — VIS captures and describes what it sees in real time.
vis --stream # Run indefinitely (Ctrl+C to stop)
vis --stream -v # With debug output

Run structured test flows defined in YAML.
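For illustration, a minimal flow in Maestro-style syntax could look like the following; the app ID and element texts here are hypothetical, and the exact step set VIS supports may differ.

```yaml
# flows/login-test.yaml (hypothetical example)
appId: com.example.myapp
---
- launchApp
- tapOn: "Username"
- inputText: "demo@example.com"
- tapOn: "Password"
- inputText: "hunter2"
- tapOn: "Log in"
- assertVisible: "Welcome"
```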
vis --maestro flows/login-test.yaml
vis --maestro flows/checkout.yaml -v

Combine structured flows with vision-based fallbacks.
vis --hybrid flows/search-flow.yaml

Run continuous iteration cycles for stress testing.
vis --test-cycle 10 # Run 10 iterations
vis --test-cycle 50 -v # 50 iterations with debug logging

Start VIS as an MCP (Model Context Protocol) server for integration with other tools.
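MCP clients that speak stdio typically launch the server themselves from a config entry; a hypothetical registration (the exact schema varies by client):

```json
{
  "mcpServers": {
    "vis": {
      "command": "vis",
      "args": ["--server"]
    }
  }
}
```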
vis --server # Start MCP server on stdin/stdout
vis --mcp # Alias for --server

Check prerequisites and download required models.
vis setup

Target a specific device when multiple are connected.
vis --task "open Settings" --device 29021FDH2009DQ
vis --task "open Settings" --device emulator-5554

Reports are generated by default to reports/ (auto-cleaned, keeps 10 most recent).
vis --task "open Settings" --report=false # Disable report generation

| Variable | Default | Description |
|---|---|---|
| VIS_MODEL | moondream:latest | Vision model for screen analysis |
| VIS_NLU_MODEL | llama3.1:latest | NLU model for natural language parsing |
| VIS_OLLAMA_URL | http://localhost:11434/api/generate | Ollama API endpoint |
| VIS_TIMEOUT | 120 | Model timeout in seconds |
| TEST_DEVICE_ID | (none) | Specific ADB device for tests |
# Defaults work out of the box with moondream
# For higher accuracy on complex screens, upgrade the vision model:
export VIS_MODEL="llama3.2-vision:11b"
export VIS_NLU_MODEL="qwen-agentic:latest"
export VIS_TIMEOUT=180

VIS resolves human-readable app names to Android package IDs automatically:
| Name | Package |
|---|---|
| Settings | com.android.settings |
| Calculator | com.google.android.calculator |
| Chrome | com.android.chrome |
| Gmail | com.google.android.gm |
| Maps | com.google.android.apps.maps |
| Camera | com.google.android.GoogleCamera |
| Calendar | com.google.android.calendar |
| Phone | com.google.android.dialer |
| Files | com.google.android.apps.nbu.files |
| Clock | com.google.android.deskclock |
| Photos | com.google.android.apps.photos |
| Expo Go | host.exp.exponent |
Any unrecognized name is passed through as a raw package ID.
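That resolution-with-passthrough behavior can be sketched as a map lookup; this is a simplified illustration whose entries mirror a few rows of the table above, not the actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// knownApps mirrors a few rows of the table above.
var knownApps = map[string]string{
	"settings":   "com.android.settings",
	"calculator": "com.google.android.calculator",
	"chrome":     "com.android.chrome",
	"expo go":    "host.exp.exponent",
}

// resolvePackage maps a human-readable name to a package ID and passes
// unrecognized input through unchanged, so raw package IDs still work.
func resolvePackage(name string) string {
	if pkg, ok := knownApps[strings.ToLower(strings.TrimSpace(name))]; ok {
		return pkg
	}
	return name
}

func main() {
	fmt.Println(resolvePackage("Settings"))          // com.android.settings
	fmt.Println(resolvePackage("com.example.myapp")) // passed through as-is
}
```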
VIS follows a Capture-Analyze-Decide-Act (CADA) autonomous agent loop:
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ CAPTURE │───▶│ ANALYZE │───▶│ DECIDE │───▶│ ACT │
│ ADB │ │ Ollama │ │ Agent │ │ ADB │
│ screencap│ │ Vision │ │ NLP │ │ tap/ │
│ uidump │ │ Model │ │ Parser │ │ swipe │
└─────────┘ └─────────┘ └─────────┘ └─────────┘
▲ │
└────────────────────────────────────────────┘
(continuous loop)
- Capture — Screenshots via ADB with JPEG compression and caching
- Analyze — Vision models interpret screen content semantically
- Decide — NLP parser + agent logic determines the next action
- Act — ADB executes taps, swipes, inputs, key events
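The loop above can be sketched in Go with stubbed stages; this is a minimal, hypothetical illustration, while the real orchestration wiring ADB, Ollama, and the NLP parser lives in internal/agent.

```go
package main

import "fmt"

// Simplified stage types standing in for real screenshots and commands.
type Frame string  // captured screenshot + UI dump
type Action string // device command (tap, swipe, key event)

type Agent struct {
	Capture func() Frame                // ADB screencap / uidump
	Analyze func(Frame) string          // vision model describes the screen
	Decide  func(string) (Action, bool) // picks the next action; false = goal reached
	Act     func(Action)                // ADB executes the action
}

// Run drives the continuous Capture -> Analyze -> Decide -> Act cycle
// until Decide reports the task is complete, returning the step count.
func (a Agent) Run() int {
	steps := 0
	for {
		action, more := a.Decide(a.Analyze(a.Capture()))
		if !more {
			return steps
		}
		a.Act(action)
		steps++
	}
}

func main() {
	remaining := 2 // pretend the goal takes two actions
	agent := Agent{
		Capture: func() Frame { return "home screen" },
		Analyze: func(f Frame) string { return "showing " + string(f) },
		Decide: func(desc string) (Action, bool) {
			if remaining == 0 {
				return "", false
			}
			remaining--
			return Action("tap Settings on " + desc), true
		},
		Act: func(a Action) { fmt.Println("executing:", a) },
	}
	fmt.Println("completed in", agent.Run(), "steps") // completed in 2 steps
}
```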
cmd/vis/ CLI entry point
internal/
├── adb/ ADB device control (taps, swipes, inputs, key events)
├── agent/ Core CADA loop orchestration
├── capture/ Screenshot acquisition and caching
├── config/ Environment-based configuration
├── hybrid/ Hybrid selector engine
├── livefeed/ Scrcpy live feed integration
├── mcp/ Model Context Protocol server
├── nlp/ Natural language task parsing
├── reporting/ HTML and JUnit report generation
├── resilience/ Circuit breaker and retry patterns
├── selector/ Self-healing element location engine
├── setup/ Ollama environment setup
├── types/ Shared domain types
└── vis/ Vision model client (Ollama API)
scripts/ Build & test automation
e2e/ End-to-end tests (requires device + Ollama)
make build # Build binary
make test # Run unit tests
make test-cover # Run tests with coverage
make lint # Run linter
make clean # Clean build artifacts
# Physical device test suite (requires connected Android device + Ollama)
./scripts/device-test.sh

Distributed under the MIT License. See LICENSE for details.