---
title: Testing
permalink: /testing
redirect_from:
---
Jaiph includes a built-in test harness for workflow testing. Test files (*.test.jh) let you mock prompt responses, stub workflows, rules, and scripts, run workflows through the same Node runtime used by jaiph run, and assert on captured output — all without calling real LLMs or depending on external state.
Workflow runs combine prompts, shell commands, and orchestration logic. Without a harness, outcomes depend on live models, timing, and the host machine — making regressions hard to catch in CI or during refactors. The test harness solves this by giving you fixed prompt responses, in-process execution, and deterministic assertions.
Test files use the .test.jh suffix (for example workflow_greeting.test.jh).
A test file supports the same top-level forms as any .jh file (import, config, workflow, etc.), but the CLI only executes test "..." { ... } blocks. Other declarations are parsed into the runtime graph — for example, a local workflow is visible to single-segment references.
Recommended style: keep test files to import statements and test blocks. Define the workflows under test in separate modules so files stay small and focused.
Import paths in import "..." as alias resolve relative to the test file's directory, with the same extension handling as ordinary modules (.jh is appended when omitted). See Grammar — Lexical notes.
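For instance (module names hypothetical), both of these forms resolve against the test file's own directory:

```
import "workflow_greeting" as w    # extension omitted, resolves to workflow_greeting.jh
import "lib/helpers.jh" as h       # explicit .jh is also accepted
```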
# All *.test.jh files under the detected workspace root (recursive)
jaiph test
# All tests under a directory (recursive)
jaiph test ./e2e
# One file
jaiph test ./e2e/workflow_greeting.test.jh
# Equivalent shorthand (a *.test.jh path is treated as jaiph test)
jaiph ./e2e/workflow_greeting.test.jh

Discovery: jaiph test walks the given directory recursively, or the workspace root when no path is passed. The workspace root is found by walking up from the current directory until a .jaiph or .git directory exists; if neither is found, the current directory is used.
If no *.test.jh files are found, the command prints an error and exits with status 1. A file must contain at least one test block; otherwise the CLI reports a parse error. Passing a plain *.jh file that is not named *.test.jh is rejected — use jaiph run for those.
Each test block is a named test case containing ordered steps:
import "workflow_greeting.jh" as w
test "runs happy path and prints PASS" {
mock prompt "e2e-greeting-mock"
const response = run w.default()
expect_contain response "e2e-greeting-mock"
expect_contain response "done"
}
Inside a test block, steps execute in order. The following step types are available.
Queues a fixed response for the next prompt call in the workflow under test. Multiple mock prompt lines queue in order — one is consumed per prompt call.
mock prompt "hello from mock"
mock prompt "second response"
The response must be a double-quoted string. Standard escape sequences (\", \n, \\) work inside double-quoted strings.
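For example, escape sequences let a single mock span multiple lines or carry embedded quotes:

```
mock prompt "line one\nline two"
mock prompt "the model said \"hello\""
```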
Dispatches different responses based on the prompt text using pattern matching. Arms are tested top-to-bottom; the first match wins.
mock prompt {
/greeting/ => "hello"
/farewell/ => "goodbye"
_ => "default response"
}
Each arm is pattern => "response". Patterns can be:
- String literal ("greeting") — exact match against the prompt text
- Regex (/greeting/) — tested against the prompt text
- Wildcard (_) — matches anything (like a default/else branch)
Without a _ wildcard arm, an unmatched prompt fails the test.
Do not combine mock prompt { ... } with inline mock prompt "..." in the same test block — when a block mock is present, inline queue entries are ignored.
Replaces a workflow body for this test case with Jaiph steps:
mock workflow w.greet() {
return "stubbed greeting"
}
The reference format is <alias>.<workflow> (preferred) or <name> for a workflow defined in the test file itself.
Same as mock workflow, but for rules (body uses Jaiph steps, not shell):
mock rule w.validate() {
return "stubbed validation"
}
Stubs a module script block:
mock script w.helper() {
echo "stubbed script"
}
The former mock function syntax is no longer accepted — the parser emits an error with migration guidance.
Runs a workflow and captures its output into a variable:
const response = run w.default()
Capture semantics match production behavior:
- If the workflow exits 0 with a non-empty explicit return value, that string is captured.
- If the workflow fails (non-zero exit), the runtime error string is captured (when present).
- Otherwise, the harness reads all *.out files in the run directory sorted by filename, or falls back to the runtime's aggregated output.
The test fails on non-zero exit unless allow_failure is specified.
Variants:
# With an argument
const response = run w.default("my input")
# Allow failure
const response = run w.default() allow_failure
# With argument and allow failure
const response = run w.default("my input") allow_failure
Runs a workflow without storing output. Still fails on non-zero exit unless allow_failure is appended:
run w.setup()
run w.setup("arg")
run w.setup() allow_failure
After capturing workflow output, use these to check the result:
expect_contain response "expected substring"
expect_not_contain response "unwanted text"
expect_equal response "exact expected value"
Expected strings must be double-quoted. Escape " inside the string with \". Failures print expected vs. actual previews.
When a workflow uses typed prompts (returns "{ ... }"), mock text must be a single line of valid JSON matching the schema so that parsing and field variables work correctly. Fields are accessed with dot notation — ${result.field} — in log, return, and other interpolation contexts. See e2e/prompt_returns_run_capture.test.jh and e2e/dot_notation.test.jh for examples.
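A minimal sketch, assuming a hypothetical workflow w.typed() whose prompt declares returns "{ greeting, name }":

```
# Hypothetical typed workflow: the mock must be one line of schema-matching JSON
mock prompt "{\"greeting\": \"hi\", \"name\": \"Ada\"}"
const out = run w.typed()
expect_contain out "Ada"
```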
Each test block runs in isolation. Assertions, shell errors, or a workflow exiting non-zero (without allow_failure) mark that case as failed.
The runner output looks like:
testing workflow_greeting.test.jh
▸ runs happy path
✓ 0s
▸ handles error case
✗ expect_contain failed: "out" does not contain "expected" 1s
✗ 1 / 2 test(s) failed
- handles error case
When all tests pass: ✓ N test(s) passed. Exit status is 0 on full success, non-zero if any test failed.
The CLI parses each test file and hands test { ... } blocks to runTestFile() in the test runner. That function:
- Calls buildRuntimeGraph(testFile) once per file to build the import closure.
- Prepares script artifacts for the workspace via buildScripts() into a temporary directory (test files are excluded from this walk).
- Sets JAIPH_SCRIPTS to that directory and runs each block with JAIPH_TEST_MODE=1.
There is no Bash transpilation of workflows on this path — only extracted script files are shell, same as production. The runtime graph is cached per file; mutating imported files on disk mid-run is not supported.
For each workflow run inside a test block, the harness builds the runtime environment from process.env plus:
| Variable | Value |
|---|---|
| JAIPH_TEST_MODE | 1 |
| JAIPH_WORKSPACE | Project root (from detectWorkspaceRoot) |
| JAIPH_RUNS_DIR | Per-block temp directory |
| JAIPH_SCRIPTS | Temp buildScripts output |
You do not set JAIPH_TEST_MODE yourself; the harness manages it.
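Since scripts run with that environment, a stub can read the harness-managed variables directly (module name hypothetical):

```
mock script w.helper() {
  echo "workspace: $JAIPH_WORKSPACE"
  echo "runs dir: $JAIPH_RUNS_DIR"
}
```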
A Given / When / Then structure works well but is not required — comments and blank lines are fine:
import "app.jh" as app
test "default workflow prints greeting" {
# Given
mock prompt "hello"
# When
const out = run app.default()
# Then
expect_contain out "hello"
}
Compiler tests verify parse and validate outcomes using a language-agnostic txtar format. Unlike the TypeScript-embedded tests in src/, these fixtures are plain text files that can be reused by alternative implementations (e.g. a Rust compiler).
Test fixture files live in compiler-tests/ as .txt files. Each file contains multiple test cases separated by === delimiters:
=== test name here
# @expect ok
--- input.jh
workflow default() {
log "hello"
}
=== another test
# @expect error E_PARSE "unterminated workflow block"
--- input.jh
workflow default() {
log "hello"
- === <name> starts a new test case. Everything until the next === (or EOF) belongs to that case.
- --- <filename> starts a virtual file within the test case. Filenames must end in .jh.
- # @expect <directive> declares the expected outcome and must appear before the first --- marker.
| Directive | Meaning |
|---|---|
| # @expect ok | Parse + validate succeed with no errors |
| # @expect error E_CODE "substring" | An error is thrown whose message contains both E_CODE and substring |
| # @expect error E_CODE "substring" @L | Same, and the error must be reported at line L (any column) |
| # @expect error E_CODE "substring" @L:C | Same, and the error must be reported at line L, column C |
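As an illustrative sketch (the error code is real, but the message text and position here are hypothetical, not actual compiler output):

```
=== reports the position of a stray token
# @expect error E_PARSE "unexpected token" @2:3
--- input.jh
workflow default() {
  ???
}
```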
- Single-file: use --- input.jh. The runner compiles input.jh.
- Single test file: use --- input.test.jh for test-specific fixtures.
- Multi-file: use --- main.jh as the entry file plus additional --- lib.jh etc. The runner compiles main.jh.
The entry file is determined by priority: main.jh if present, otherwise input.jh, otherwise input.test.jh, otherwise the first file.
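Under that priority, a multi-file case might be laid out like this (module contents hypothetical):

```
=== entry file imports a sibling module
# @expect ok
--- main.jh
import "lib.jh" as lib

workflow default() {
  log "entry"
}
--- lib.jh
workflow helper() {
  log "helper"
}
```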
npm run test:compiler

The runner discovers all .txt files in compiler-tests/, parses them, writes virtual files to a temp directory per case, runs parsejaiph + validateReferences, and asserts the expected outcome. Results are reported per test case via node:test. Compiler tests are also included in npm test.
Test cases are organized by error type and single-vs-multi-module:
| File | Cases | What it covers |
|---|---|---|
| compiler-tests/valid.txt | 103 | Success cases — source compiles without error (single-module) |
| compiler-tests/parse-errors.txt | 108 | E_PARSE error cases — syntax and grammar violations |
| compiler-tests/validate-errors.txt | 24 | E_VALIDATE, E_IMPORT_NOT_FOUND, E_SCHEMA error cases (single-module) |
| compiler-tests/validate-errors-multi-module.txt | 3 | Validation errors requiring imports (multi-file) |
The initial cases were extracted from TypeScript test files across src/parse/*.test.ts and src/transpile/*.test.ts. Additional cases were written directly as txtar fixtures to cover compiler error paths that had no prior test coverage. Only tests that verify "source in, pass/fail out" qualify — tests that check AST structure or internal APIs remain in TypeScript.
- One .txt file per category.
- Test names should be descriptive and unique within a file.
- Keep test cases minimal — only include what is necessary to trigger the expected outcome.
The format is documented in detail in compiler-tests/README.md.
Golden AST tests verify that the parser produces the expected tree shape for successful parses. While compiler tests (txtar) cover pass/fail outcomes and E2E tests cover runtime behavior, golden AST tests lock in what the parser actually produced — so refactors cannot silently change tree structure.
Each .jh fixture in golden-ast/fixtures/ is parsed and serialized to deterministic JSON (locations and file paths stripped, keys sorted). The result is compared against a checked-in .json golden file in golden-ast/expected/.
- Txtar tests = error messages and "this compiles."
- Golden AST tests = parse tree shape for successful parses.
- E2E tests = full CLI + runtime behavior.
npm run test:golden-ast

Golden AST tests are also included in npm test.
When an intentional parser change alters AST shape, regenerate the golden files:
UPDATE_GOLDEN=1 npm run test:golden-ast

Review the diff to confirm the changes are expected, then commit the updated .json files.
- Create a small, focused .jh file in golden-ast/fixtures/ (one concern per file).
- Run UPDATE_GOLDEN=1 npm run test:golden-ast to generate golden-ast/expected/<name>.json.
- Review the generated JSON and commit both files.
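For example, a fixture golden-ast/fixtures/log_basic.jh (filename hypothetical) covering the log step could be as small as:

```
workflow default() {
  log "hello"
}
```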
For concurrency-sensitive behavior (for example parallel inbox dispatch), the repository includes shell-based E2E scenarios that go beyond single native tests:
- High volume and fan-out to exercise locking and dispatch under concurrent writes.
- Soak loops to flush out intermittent failures.
- Order-insensitive checks (counts, uniqueness) when parallel work makes ordering non-deterministic.
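The order-insensitive style can be sketched in plain shell (the data below is a stand-in, not actual harness output): assert on counts and uniqueness instead of comparing a fixed sequence.

```shell
# Stand-in for lines captured from a parallel run; arrival order is arbitrary
out="$(printf 'job-%s\n' 3 1 2 5 4)"

# Order-insensitive assertion: five unique jobs completed, regardless of order
count="$(printf '%s\n' "$out" | sort -u | wc -l | tr -d ' ')"
if [ "$count" -eq 5 ]; then echo "PASS"; else echo "FAIL"; fi
```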
See e2e/tests/91_inbox_dispatch.sh, e2e/tests/93_inbox_stress.sh, and e2e/tests/94_parallel_shell_steps.sh for examples.
Shell harnesses and CI expectations for the full repo are described in Contributing — E2E testing.
E2E tests compare full CLI output and full artifact file contents by default. Use e2e::expect_stdout, e2e::expect_out, e2e::expect_file, e2e::expect_run_file, or e2e::assert_equals. Substring checks (e2e::assert_contains) require an inline comment justifying the exception. For the full policy (two surfaces, full equality, assert_contains exceptions, normalization), see Contributing — E2E testing. For the on-disk tree under .jaiph/runs/, see Architecture — Durable artifact layout.
Every .jh sample under e2e/ must be wired into at least one test. Run bash e2e/check_orphan_samples.sh to detect unreferenced fixtures. See Contributing — Orphan sample guard for details.
Similarly, every .jh and .test.jh file under examples/ must be accounted for in e2e/tests/110_examples.sh — either exercised with strict assertions or explicitly excluded with a rationale. An orphan guard in that script enforces this. See Contributing — Example matrix guard for details.
The project includes a Playwright-based test (tests/e2e-samples/landing-page.spec.ts) that verifies landing-page code samples stay in sync with real CLI behavior. Run it with npm run test:samples. See Contributing — Landing-page sample verification for details.
- Prompt mocks are inline only — no external mock config files.
- Do not combine mock prompt { ... } with mock prompt "..." in the same test block; only the block dispatch is active.
- Capture without explicit return reads stdout step artifacts (*.out files) or falls back to aggregated runtime output.
- Assertions only support double-quoted expected strings.
- Extra arguments after the test path (jaiph test <path> [extra...]) are accepted but ignored (reserved for future use).