
  ██╗   ██╗ ██╗ ███████╗
  ██║   ██║ ██║ ██╔════╝
  ██║   ██║ ██║ ███████╗
  ╚██╗ ██╔╝ ██║ ╚════██║
   ╚████╔╝  ██║ ███████║
    ╚═══╝   ╚═╝ ╚══════╝

  Visual Tester

VIS — Autonomous Android Testing Agent


VIS is an autonomous testing agent for Android that combines UIAutomator accessibility trees with multi-modal vision models. It sees the screen, understands context, and takes action — no brittle XPath selectors required.

Core Value: Test Android apps like a human would.

  • Semantic understanding — finds elements by meaning, not static IDs
  • Self-healing — falls back to visual analysis when standard selectors fail
  • Local sovereignty — runs models via Ollama, your data stays on your machine
  • Fast — optimized Go core with async capture and streaming

What VIS Can Do

Capability Description
Launch any app Resolves human-readable names ("Calculator", "Chrome") to Android packages automatically
Tap, swipe, type Full device interaction — buttons, text fields, scroll, navigation keys
Describe what's on screen Vision model reads and interprets the live display in natural language
Find elements visually Locates UI components by description ("the red submit button") when IDs are unavailable
Wait for conditions Polls the screen until a target element appears or a timeout is reached
Run structured flows Executes multi-step YAML test plans with Maestro-compatible syntax
Generate reports Produces HTML and JUnit XML reports per session, with automatic cleanup
Multi-device orchestration Runs tasks across multiple connected devices in parallel
MCP server mode Exposes VIS as a tool server for AI agents and IDE integrations
Dry-run validation Parses and plans tasks without touching the device — verify before you execute

VIS works with any Android app — production builds, debug builds, Expo Go, system apps. No source code access or instrumentation required.


Quick Start

# Install
git clone https://github.com/uelkerd/vis.git
cd vis && make build

# Ensure prerequisites are running
ollama pull moondream:latest      # Lightweight vision model (recommended)
adb devices                       # Verify device connected

# Run your first task
./bin/vis --task "open the Settings app"

Installation

From Source (recommended)

git clone https://github.com/uelkerd/vis.git
cd vis
make build          # Binary at ./bin/vis
make install        # Installs to $GOPATH/bin

From GitHub Releases

Download pre-built binaries for your platform from Releases:

# macOS (Apple Silicon)
curl -L https://github.com/uelkerd/vis/releases/latest/download/vis_Darwin_arm64.tar.gz | tar xz
chmod +x vis && mv vis /usr/local/bin/

# Linux (x86_64)
curl -L https://github.com/uelkerd/vis/releases/latest/download/vis_Linux_x86_64.tar.gz | tar xz
chmod +x vis && mv vis /usr/local/bin/

Prerequisites

Dependency Purpose Install
Go 1.24+ Build from source golang.org/dl
ADB Android device control brew install android-platform-tools or Android SDK
Ollama Local vision model inference ollama.com
Android device Physical or emulated, with USB debugging enabled

Usage

Natural Language Tasks (--task)

Describe what you want in plain English. VIS parses the intent, resolves app names, and executes on the device.

# Launch apps (human-readable names resolved automatically)
vis --task "open the Settings app"
vis --task "open Calculator"
vis --task "open Chrome"

# Navigation
vis --task "scroll down"
vis --task "press back"
vis --task "go home"

# Interact with elements
vis --task "tap on the search button"
vis --task "type 'hello world' into the search field"

# With verbose logging
vis --task "open Settings" -v      # DEBUG level
vis --task "open Settings" -vv     # TRACE level (most detailed)
vis --task "open Settings" -q      # Quiet (warnings/errors only)

Dry Run Mode (--dry-run)

Parse and plan without touching the device — useful for validating NLP parsing.

vis --task "open Calculator and type 123" --dry-run -v
# Logs: "dry-run: would execute action" with parsed intent details

Vision Streaming (--stream)

Continuous screen analysis — VIS captures and describes what it sees in real time.

vis --stream                  # Run indefinitely (Ctrl+C to stop)
vis --stream -v               # With debug output

Maestro Flows (--maestro)

Run structured test flows defined in YAML.

vis --maestro flows/login-test.yaml
vis --maestro flows/checkout.yaml -v
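
A flow file is plain YAML in Maestro-compatible syntax. The file name, app ID, and element labels below are illustrative placeholders, not taken from this repository:

```yaml
# flows/login-test.yaml — hypothetical flow for a demo app
appId: com.example.demo          # package under test (assumed)
---
- launchApp
- tapOn: "Log in"                # matches visible text or accessibility label
- inputText: "user@example.com"
- tapOn: "Submit"
- assertVisible: "Welcome"
```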

Hybrid Vision-Flows (--hybrid)

Combine structured flows with vision-based fallbacks.

vis --hybrid flows/search-flow.yaml

Test Cycles (--test-cycle)

Run continuous iteration cycles for stress testing.

vis --test-cycle 10            # Run 10 iterations
vis --test-cycle 50 -v         # 50 iterations with debug logging

MCP Server Mode (--server)

Start VIS as an MCP (Model Context Protocol) server for integration with other tools.

vis --server                  # Start MCP server on stdin/stdout
vis --mcp                     # Alias for --server
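
Most MCP clients register stdio servers through a JSON configuration; the exact file location and schema depend on the client, so treat this as a sketch assuming the vis binary is on your PATH:

```json
{
  "mcpServers": {
    "vis": {
      "command": "vis",
      "args": ["--server"]
    }
  }
}
```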

Environment Setup (setup)

Check prerequisites and download required models.

vis setup

Device Targeting (--device)

Target a specific device when multiple are connected.

vis --task "open Settings" --device 29021FDH2009DQ
vis --task "open Settings" --device emulator-5554

Report Control (--report)

Reports are generated by default to reports/ (auto-cleaned, keeps 10 most recent).

vis --task "open Settings" --report=false   # Disable report generation

Environment Variables

Variable Default Description
VIS_MODEL moondream:latest Vision model for screen analysis
VIS_NLU_MODEL llama3.1:latest NLU model for natural language parsing
VIS_OLLAMA_URL http://localhost:11434/api/generate Ollama API endpoint
VIS_TIMEOUT 120 Model timeout in seconds
TEST_DEVICE_ID (none) Specific ADB device for tests
# Defaults work out of the box with moondream
# For higher accuracy on complex screens, upgrade the vision model:
export VIS_MODEL="llama3.2-vision:11b"
export VIS_NLU_MODEL="qwen-agentic:latest"
export VIS_TIMEOUT=180

Known Apps

VIS resolves human-readable app names to Android package IDs automatically:

Name Package
Settings com.android.settings
Calculator com.google.android.calculator
Chrome com.android.chrome
Gmail com.google.android.gm
Maps com.google.android.apps.maps
Camera com.google.android.GoogleCamera
Calendar com.google.android.calendar
Phone com.google.android.dialer
Files com.google.android.apps.nbu.files
Clock com.google.android.deskclock
Photos com.google.android.apps.photos
Expo Go host.exp.exponent

Any unrecognized name is passed through as a raw package ID.
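
Conceptually, the resolution behaves like a case-insensitive map lookup with passthrough. A minimal Go sketch of that behavior (names and structure are assumed for illustration, not the actual internal/nlp code):

```go
package main

import (
	"fmt"
	"strings"
)

// knownApps maps human-readable names to package IDs
// (a subset of the table above).
var knownApps = map[string]string{
	"settings":   "com.android.settings",
	"calculator": "com.google.android.calculator",
	"chrome":     "com.android.chrome",
	"expo go":    "host.exp.exponent",
}

// resolvePackage returns the package ID for a known name,
// or passes the input through unchanged as a raw package ID.
func resolvePackage(name string) string {
	if pkg, ok := knownApps[strings.ToLower(strings.TrimSpace(name))]; ok {
		return pkg
	}
	return name
}

func main() {
	fmt.Println(resolvePackage("Settings"))           // com.android.settings
	fmt.Println(resolvePackage("com.example.custom")) // passed through unchanged
}
```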


Architecture

VIS follows a Capture-Analyze-Decide-Act (CADA) autonomous agent loop:

┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│ CAPTURE  │───▶│ ANALYZE  │───▶│ DECIDE   │───▶│ ACT      │
│ ADB      │    │ Ollama   │    │ Agent    │    │ ADB      │
│ screencap│    │ Vision   │    │ NLP      │    │ tap/     │
│ uidump   │    │ Model    │    │ Parser   │    │ swipe    │
└──────────┘    └──────────┘    └──────────┘    └──────────┘
     ▲                                               │
     └───────────────────────────────────────────────┘
                     (continuous loop)
  1. Capture — Screenshots via ADB with JPEG compression and caching
  2. Analyze — Vision models interpret screen content semantically
  3. Decide — NLP parser + agent logic determines the next action
  4. Act — ADB executes taps, swipes, inputs, key events
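
The four stages above can be sketched as a plain Go loop with stubbed functions. This is illustrative only, not the actual internal/agent code; the real agent shells out to adb and calls the Ollama API where the stubs return canned values:

```go
package main

import "fmt"

// Action is a stub for a device command chosen by the agent.
type Action struct{ Name string }

// Stubbed stages of the CADA loop.
func capture() string             { return "screenshot+uidump" }
func analyze(frame string) string { return "description of " + frame }
func decide(desc string) (Action, bool) {
	// A real agent would return done=true when the task goal is met.
	return Action{Name: "tap"}, false
}
func act(a Action) { fmt.Println("executing:", a.Name) }

func main() {
	for i := 0; i < 3; i++ { // the real loop runs until the task completes
		frame := capture()
		desc := analyze(frame)
		action, done := decide(desc)
		if done {
			break
		}
		act(action)
	}
}
```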

Project Structure

cmd/vis/              CLI entry point
internal/
├── adb/              ADB device control (taps, swipes, inputs, key events)
├── agent/            Core CADA loop orchestration
├── capture/          Screenshot acquisition and caching
├── config/           Environment-based configuration
├── hybrid/           Hybrid selector engine
├── livefeed/         Scrcpy live feed integration
├── mcp/              Model Context Protocol server
├── nlp/              Natural language task parsing
├── reporting/        HTML and JUnit report generation
├── resilience/       Circuit breaker and retry patterns
├── selector/         Self-healing element location engine
├── setup/            Ollama environment setup
├── types/            Shared domain types
└── vis/              Vision model client (Ollama API)
scripts/              Build & test automation
e2e/                  End-to-end tests (requires device + Ollama)

Development

make build          # Build binary
make test           # Run unit tests
make test-cover     # Run tests with coverage
make lint           # Run linter
make clean          # Clean build artifacts

# Physical device test suite (requires connected Android device + Ollama)
./scripts/device-test.sh

License

Distributed under the MIT License. See LICENSE for details.
