Compare meta-expression vs similar projects + canonical fixtures (issue #26) #33
Adding .gitkeep for PR creation (default mode). This file will be removed when the task is complete. Issue: #26
The case study mirrors the issue-21 layout (README → REQUIREMENTS →
SOLUTION-PLAN → ONLINE-RESEARCH → companion docs) so future "survey
and close the gap" issues have a stable template.
Files:
- README.md: executive summary, scope, outcome, cross-references.
- REQUIREMENTS.md: R26.1–R26.18 atomic requirements across concept docs, feature docs, fixtures, case-study deliverables, library survey, and PR hygiene.
- SOLUTION-PLAN.md: six phases mapping to the requirements.
- ONLINE-RESEARCH.md: 50+ comparable projects (name, URL, license, USD pricing) plus library survey with adopt/defer/reject verdicts.
- TEST-CASES.md: canonical fixtures harvested from comparable systems, with the meta-expression expectation per row.
- data/issue-26.json + .md: raw issue snapshot for reproducibility.
Two top-level companion documents under docs/ that fulfill the deliverable from issue #26.
- COMPARISON-CONCEPTS.md: clusters meta-expression's seven concept areas (statement interpretation, formal levels + exact computation, real-world evidence, knowledge representation, uniqueness, preference profiles, reporting) with the comparable projects, licenses, and USD pricing per cluster. Closes with "Where meta-expression sits" so the positioning statement stays reviewable.
- COMPARISON-FEATURES.md: per-feature matrix (F1–F17 surfaces from the README × the seven clusters), using the documented legend (✓ / ≈ / — / ✗). A gap-analysis section calls out the intersections meta-expression uniquely ships (issue-report URL prefilled with analysis state; Lino export as a first-class surface).
Both docs carry "Last checked: 2026-05-11" and a "How this matrix is maintained" footer pointing at docs/case-studies/issue-26/.
tests/issue-26-comparable-fixtures.test.js exercises the canonical
fixtures harvested from comparable systems (Wolfram Alpha, Z3, Lean,
Metamath, Wikidata P36/P397/P398/P570, Tarski/Kripke Liar paradox).
Assertion bands follow the existing project guardrails:
- Arithmetic kernel is deterministic — correctness ∈ {0, 1},
signedConfidence ∈ {-1, +1}, value matches Wolfram/Z3.
- Real-world claims are asserted by band (0.5 < c < 1 for supported,
c < 1 for refutable) so they cannot regress into binary verdicts.
- Fixtures that require the live Wikimedia resolver are tolerated
defensively (return on null) so the suite still runs offline.
- NL → triple/SRL/AMR extraction and the ClaimReview/Snopes corpora
use it.skip with explanatory titles documenting the roadmap phase
they belong to.
Three documentation-sanity assertions verify the COMPARISON docs
ship with the documented legend symbols and a "Last checked" date.
experiments/probe-fixtures.mjs records the probe script used to
discover the actual analyzeStatement return shape per fixture (kept
alongside the existing probe scripts in experiments/).
CI fix-ups for the issue-26 comparison docs:
- `.changeset/issue-26-comparison-docs.md`: required by the "Check for Changesets" workflow (exactly one new changeset per PR).
- `.lycheeignore`: adds well-known false-positive domains referenced in `docs/COMPARISON-CONCEPTS.md` that bot-detect or self-sign their way around HEAD probes. Follows the same convention as the existing `npmjs.com` and `medium.com` entries.
Working session summary
PR #33 is ready for review with all CI checks passing. Summary of what landed (#33):
CI: 12 SUCCESS / 11 SKIPPED / 0 fail. All 198 tests pass (189 active + 9 deferred-skip).
This summary was automatically extracted from the AI working session output.
🤖 Solution Draft Log
This log file contains the complete execution trace of the AI solution draft process.
💰 Cost: $7.555892
📊 Context and tokens usage:
Claude Opus 4.7 (2 sub-sessions):
Total: (10.1K new + 255.7K cache writes + 6.5M cache reads) input tokens, 75.7K output tokens, $6.801474 cost
Claude Haiku 4.5:
Total: (399.8K new + 19.5K cache writes + 193.7K cache reads) input tokens, 14.2K output tokens, $0.754419 cost
🤖 Models used:
📎 Log file uploaded as Gist (3243KB)
The working session has ended; feel free to review and add feedback on the solution draft.
✅ Ready to merge
This pull request is now ready to be merged:
Monitored by hive-mind with --auto-restart-until-mergeable flag
This reverts commit 6b63ee2.
Summary
Closes #26. The issue asked for a deep comparison of meta-expression against similar projects (open-source and proprietary, free and paid, with pricing), plus a harvest of canonical test cases from those projects to close the feature gap. This PR delivers both as four new documents and one test file.
What changed
- `docs/COMPARISON-CONCEPTS.md`: clusters meta-expression's seven concept areas with comparable projects, licenses, and USD pricing; closes with a "Where meta-expression sits" positioning statement.
- `docs/COMPARISON-FEATURES.md`: per-feature matrix (F1–F17 surfaces × seven concept clusters) using the documented legend (✓ / ≈ / — / ✗) and a gap-analysis section.
- `docs/case-studies/issue-26/`: full case-study folder mirroring issue-21's layout: `README.md`, `REQUIREMENTS.md` (R26.1–R26.18), `SOLUTION-PLAN.md` (six phases), `ONLINE-RESEARCH.md` (50+ projects with name, URL, license, USD pricing, plus a library-survey table with `adopt`/`defer`/`reject` verdicts), `TEST-CASES.md`, and a `data/` snapshot of the raw issue body for reproducibility.
- `tests/issue-26-comparable-fixtures.test.js`: 22 assertions across arithmetic-kernel deterministic fixtures (Wolfram Alpha, Z3 sat/unsat, Lean `rfl`, Metamath `2p2e4`), Wikidata-structured public-fact bands (P36, P397, P398), P570 liveness templates, and the classic Liar paradox (Tarski/Kripke). 9 additional fixtures use `it.skip` with explanatory titles for NL→triple/SRL/AMR extraction (Stanford OpenIE, AllenNLP, Boxer/Montague, AMR) and disputed-truth corpora (Google Fact Check, Snopes, Politifact) that depend on roadmap phases not yet shipped.
- `experiments/probe-fixtures.mjs`: the probe used to confirm the actual `analyzeStatement` return shape per fixture before writing the assertions.
Requirements coverage (R26.x)
- `docs/COMPARISON-CONCEPTS.md`
- `docs/COMPARISON-FEATURES.md`
- `tests/issue-26-comparable-fixtures.test.js`, `docs/case-studies/issue-26/TEST-CASES.md`
- `docs/case-studies/issue-26/`
- `docs/case-studies/issue-26/ONLINE-RESEARCH.md` §D
Test plan
- `npm test`: 198 tests / 189 pass / 9 skipped (the 9 are the deferred fixtures, each with an explanatory title) / 0 fail.
- `npm run lint`: clean.
- `npm run format:check`: clean.
- `npm run check`: clean (lint + format + jscpd + docs:formalize:check).
- `0 < correctness < 1` per project guardrail R17.
- `correctness === 0.5`, `signedConfidence === 0` (Tarski/Kripke undetermined).
Notes
- Last checked: 2026-05-11. Both comparison docs and the case-study `ONLINE-RESEARCH.md` carry the same date and a "How this matrix is maintained" footer pointing at the case study.
- `reject` for `wikibase-sdk`, `wikipedia` (npm), `@wikimedia/codex` (GPL clash), `nock`/`msw` (`makeFetch(routes)` fixture is sufficient); `defer` for `@xenova/transformers`, `sentence-transformers`, `prov-js`. Full rationale in `ONLINE-RESEARCH.md` §D.