Launch the Assessment Tool — runs entirely in your browser, no installation needed.
An interactive web application for evaluating AI-powered academic search engines. Based on work by Carnegie Mellon University Libraries.
This repository also includes the Agent Orchestration Skill Library — a modular knowledge base for designing, building, and operating multi-agent AI systems.
This framework helps librarians, researchers, and evaluators systematically assess AI-powered academic search tools across four key dimensions:
- Retrieval - How the tool searches and retrieves content
- Generation - How the tool generates summaries and responses
- Output - Quality and accuracy of generated outputs
- Usability - Language support, accessibility, and sustainability
Open the tool in your browser — that's it. No installation, build tools, or server required. Works in Chrome, Firefox, Edge, and Safari.
Running locally from a downloaded copy
Download AI-Tool-Assessment-Framework.html and open it in any browser. You can also open it from the command line:
| OS | Command |
|---|---|
| Windows | start AI-Tool-Assessment-Framework.html |
| macOS | open AI-Tool-Assessment-Framework.html |
| Linux | xdg-open AI-Tool-Assessment-Framework.html |
Requires a modern browser and an internet connection on first load (to fetch fonts and libraries from a CDN).
When the app loads, you land on the Summary tab. Fill in the basic information:
- Tool Name (required) - The name of the AI tool you are evaluating
- Date Evaluated (required) - Auto-filled with today's date
- Evaluated by (required) - Your name or team name
- Documentation URL - Link to the tool's documentation
The remaining summary fields (Key Strengths, Main Concerns, etc.) are best filled in after completing all sections.
Use the tab bar to navigate between the four assessment sections:
| Tab | What You Evaluate |
|---|---|
| Retrieval | Content coverage, search mechanism, consistency, metadata filters |
| Generation | RAG implementation, document processing, LLM details, citation quality |
| Output | Accuracy, overgeneralization, relevance, completeness, bias |
| Usability | Multilingual support, output language, environmental impact |
Each subsection includes:
- A guiding question to focus your evaluation
- A "How to check" box with specific testing instructions
- Form fields for recording your findings (dropdowns, multi-select chips, text areas)
Fields marked with a red * are required for section completion tracking.
The Assessment Progress panel at the top shows:
- Overall Progress - Percentage of all required fields completed
- Per-section progress - Individual progress bars for each section
Each subsection header also shows:
- A green checkmark when all required fields are complete
- A yellow dot when partially complete
- A gray dot when not started
- A counter (e.g., `2/3`) showing required fields completed
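The completion indicators above come down to counting filled required fields. The sketch below illustrates that logic under stated assumptions and is not taken from the tool's source. The `Subsection` shape, the field names, and the rule that a field counts as filled when it holds non-blank text or at least one selected chip are all hypothetical.

```typescript
// Hypothetical shapes; the real app's state layout may differ.
type FieldValue = string | string[] | undefined;

interface Subsection {
  id: string;
  requiredFields: string[]; // names of the fields marked with a red *
}

type Status = "complete" | "partial" | "not-started";

// Assumed rule: non-blank text, or a multi-select with at least one chip chosen.
function isFilled(value: FieldValue): boolean {
  if (Array.isArray(value)) return value.length > 0;
  return typeof value === "string" && value.trim() !== "";
}

// Per-subsection counter (e.g. 2/3) plus the green/yellow/gray status.
function subsectionProgress(
  sub: Subsection,
  answers: Record<string, FieldValue>,
): { done: number; total: number; status: Status } {
  const total = sub.requiredFields.length;
  const done = sub.requiredFields.filter((f) => isFilled(answers[f])).length;
  const status: Status =
    done === 0 ? "not-started" : done === total ? "complete" : "partial";
  return { done, total, status };
}

// Overall progress: completed required fields / all required fields, as a percentage.
function overallProgress(
  subs: Subsection[],
  answers: Record<string, FieldValue>,
): number {
  let done = 0;
  let total = 0;
  for (const sub of subs) {
    const p = subsectionProgress(sub, answers);
    done += p.done;
    total += p.total;
  }
  return total === 0 ? 0 : Math.round((done / total) * 100);
}
```

For example, a subsection with three required fields, two of them filled, yields `{ done: 2, total: 3, status: "partial" }`, which corresponds to a yellow dot and a `2/3` counter.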
Auto-save: Your progress is automatically saved to your browser's local storage every time you make a change. If you close and reopen the file, your data will still be there.
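A minimal sketch of auto-save, assuming the whole assessment is serialized to JSON under one key on every change and read back when the page loads. The key `ai-assessment:current`, the `Assessment` type, and the `KeyValueStore` interface are hypothetical. `KeyValueStore` is just the `getItem`/`setItem` subset of the Web Storage API, so `window.localStorage` satisfies it, and the in-memory `MemoryStore` lets the sketch run outside a browser.

```typescript
// The subset of the Web Storage API used here; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical assessment shape and storage key.
type Assessment = { toolName: string; answers: Record<string, unknown> };
const CURRENT_KEY = "ai-assessment:current";

// Called after every change. Returns false instead of throwing when the write
// fails, e.g. when storage is full (QuotaExceededError) or disabled.
function autoSave(store: KeyValueStore, state: Assessment): boolean {
  try {
    store.setItem(CURRENT_KEY, JSON.stringify(state));
    return true;
  } catch {
    return false;
  }
}

// Called once on load. Missing or corrupted data falls back to a blank assessment.
function restore(store: KeyValueStore): Assessment {
  const blank: Assessment = { toolName: "", answers: {} };
  const raw = store.getItem(CURRENT_KEY);
  if (raw === null) return blank;
  try {
    return JSON.parse(raw) as Assessment;
  } catch {
    return blank;
  }
}

// In-memory store for trying the sketch outside a browser.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}
```

In a browser, a change handler would call `autoSave(window.localStorage, state)`. Returning `false` rather than throwing lets the UI warn the user when storage is full or disabled.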
Named saves: Click Save Assessment to store the current assessment as a named slot. This lets you:
- Work on multiple tool evaluations simultaneously
- Switch between saved assessments using the Saved Assessments panel
- Start fresh with + New Assessment
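Named saves can be modeled as a list of slots kept under a single storage key. This is a sketch under assumptions: the key `ai-assessment:saved`, the `Slot` shape, and the function names are hypothetical and do not describe the tool's actual storage layout. Storing an array of `{ name, savedAt, data }` entries, rather than an object keyed by slot name, avoids surprises with odd slot names such as `__proto__`.

```typescript
// Same getItem/setItem subset of the Web Storage API as localStorage.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

type Assessment = { toolName: string; answers: Record<string, unknown> };
type Slot = { name: string; savedAt: string; data: Assessment };

const SLOTS_KEY = "ai-assessment:saved"; // hypothetical key

function loadSlots(store: KeyValueStore): Slot[] {
  const raw = store.getItem(SLOTS_KEY);
  if (raw === null) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as Slot[]) : [];
  } catch {
    return []; // unreadable data is treated as "no saved assessments"
  }
}

// "Save Assessment": write a named slot, replacing any slot with the same name.
function saveSlot(store: KeyValueStore, name: string, data: Assessment): void {
  const slots = loadSlots(store).filter((s) => s.name !== name);
  slots.push({ name, savedAt: new Date().toISOString(), data });
  store.setItem(SLOTS_KEY, JSON.stringify(slots));
}

// Switching assessments: look a slot up by name (undefined if it does not exist).
function openSlot(store: KeyValueStore, name: string): Assessment | undefined {
  return loadSlots(store).find((s) => s.name === name)?.data;
}

// Names for the Saved Assessments panel, alphabetically.
function listSlots(store: KeyValueStore): string[] {
  return loadSlots(store).map((s) => s.name).sort();
}
```

In this sketch, re-saving under an existing name overwrites that slot, so repeatedly saving an in-progress evaluation does not create duplicates.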
| Button | Format | Use Case |
|---|---|---|
| Export MD | Markdown (.md) | Readable reports, sharing with colleagues, documentation |
| Export JSON | JSON (.json) | Backup, data transfer, re-importing into the tool |
| Import JSON | — | Restore a previously exported JSON assessment |
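Because Import JSON restores what Export JSON produced, a JSON backup only has to survive a round trip. The sketch below shows one way to do that, with a format tag and a basic shape check on import. The `ExportFile` envelope, its `format`, `version`, and `exportedAt` fields, and the validation rules are assumptions; the tool's real export format is not documented here, so inspect an exported file before relying on any particular structure.

```typescript
type Assessment = { toolName: string; answers: Record<string, unknown> };

// Hypothetical envelope written by the export step.
interface ExportFile {
  format: "ai-tool-assessment";
  version: number;
  exportedAt: string;
  assessment: Assessment;
}

function exportJson(assessment: Assessment): string {
  const file: ExportFile = {
    format: "ai-tool-assessment",
    version: 1,
    exportedAt: new Date().toISOString(),
    assessment,
  };
  return JSON.stringify(file, null, 2); // pretty-printed so the backup is human-readable
}

// Parse and validate an imported file; throws a readable error on bad input.
function importJson(text: string): Assessment {
  let parsed: unknown;
  try {
    parsed = JSON.parse(text);
  } catch {
    throw new Error("Not a valid JSON file");
  }
  const file = parsed as Partial<ExportFile> | null;
  if (!file || typeof file !== "object" || file.format !== "ai-tool-assessment") {
    throw new Error("Not an assessment export");
  }
  const a = file.assessment;
  if (!a || typeof a.toolName !== "string" || typeof a.answers !== "object" || a.answers === null) {
    throw new Error("Assessment data is missing or malformed");
  }
  return a;
}
```

In a browser, the exported string would typically be offered for download by wrapping it in a `Blob` and linking to it with `URL.createObjectURL`, and the import side would read the chosen file with `File.text()` before passing the result to `importJson`.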
After completing some fields, the Results Overview card appears on the Summary tab. It shows a grid of all your select/multi-select responses organized by section, giving you an at-a-glance view of the tool's characteristics.
| Key | Action |
|---|---|
| Tab | Move between form fields |
| Arrow Left/Right | Navigate between section tabs |
| Home | Jump to first tab (Summary) |
| End | Jump to last tab (Usability) |
| Space/Enter | Toggle multi-select chips, expand/collapse subsections |
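The arrow, Home, and End bindings above match the keyboard pattern the WAI-ARIA Authoring Practices describe for tab lists. A sketch of the index arithmetic, assuming the five tabs appear in the order Summary, Retrieval, Generation, Output, Usability and that the arrow keys wrap around at either end. The wrap-around and the `nextTabIndex` name are assumptions; the tool may instead stop at the first and last tab.

```typescript
const TABS = ["Summary", "Retrieval", "Generation", "Output", "Usability"] as const;

// Returns the index of the tab to activate after a key press, or null when the
// key is not a tab-navigation key (so the browser's default handling applies).
function nextTabIndex(
  current: number,
  key: string, // a KeyboardEvent.key value
  count: number = TABS.length,
): number | null {
  switch (key) {
    case "ArrowRight":
      return (current + 1) % count; // last tab wraps to first (assumed)
    case "ArrowLeft":
      return (current - 1 + count) % count; // first tab wraps to last (assumed)
    case "Home":
      return 0; // Summary
    case "End":
      return count - 1; // Usability
    default:
      return null;
  }
}
```

In a React `onKeyDown` handler, a non-null result would be passed to the state setter for the active tab, together with `event.preventDefault()` so that Home and End do not also scroll the page.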
- All data is stored locally in your browser using `localStorage`
- No data is sent to any server
- Clearing your browser data will erase saved assessments
- Use Export JSON to create backups you can re-import later
To move an assessment to another device:
1. On the source device, click Export JSON to download the file.
2. Transfer the `.json` file to the target device (email, USB, cloud storage).
3. On the target device, open the framework and click Import JSON.
If the app does not load:
- Check your internet connection. The app needs to download React and fonts from a CDN on first load.
- Try a different browser.
- Check whether your network blocks `unpkg.com` or `fonts.googleapis.com`.
If your saved assessments have disappeared:
- Your browser data was likely cleared. Use Export JSON regularly to create backups.
- Data is stored per browser and per device. It does not sync across browsers or machines.
- In incognito/private browsing mode, data is not persisted after the private window is closed.
If the app shows an error screen:
- Click Try Again to attempt recovery.
- If the error persists, click Clear Data & Reload to reset the app (this erases unsaved data).
If auto-save is not working:
- Check that JavaScript is enabled in your browser.
- Ensure `localStorage` is not disabled or full (browsers typically allow 5-10 MB).
If the exported Markdown looks wrong:
- Open the `.md` file in a Markdown viewer (VS Code, GitHub, or any Markdown app).
- The raw text contains formatting syntax such as `**bold**` and `###` headings, which render properly only in a Markdown viewer.
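To see why an exported report looks cluttered in a plain text editor, here is a sketch of how an assessment might be rendered to Markdown. The `Section` shape, the `toMarkdown` name, and the layout (one `###` heading per section, one bold label per field) are assumptions, not the tool's actual export template. The point is that `###` and `**...**` are ordinary characters until a Markdown viewer renders them.

```typescript
type Answer = string | string[];

interface Section {
  title: string;
  fields: { label: string; value: Answer }[];
}

// Hypothetical renderer: a title, then a ### heading per section and a bold label per field.
function toMarkdown(toolName: string, sections: Section[]): string {
  const lines: string[] = [`# ${toolName}`, ""];
  for (const section of sections) {
    lines.push(`### ${section.title}`, "");
    for (const { label, value } of section.fields) {
      // Multi-select answers are joined into a comma-separated list.
      const text = Array.isArray(value) ? value.join(", ") : value;
      lines.push(`**${label}:** ${text || "_Not answered_"}`, "");
    }
  }
  return lines.join("\n");
}
```

Opened in a text editor, the output shows lines such as `### Retrieval` and `**Search type:** Keyword, Semantic` verbatim; a Markdown viewer turns them into a heading and a bold label.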
| Browser | Minimum Version | Status |
|---|---|---|
| Chrome | 90+ | Fully supported |
| Firefox | 90+ | Fully supported |
| Edge | 90+ | Fully supported |
| Safari | 15+ | Fully supported |
| Opera | 76+ | Fully supported |
Internet Explorer is not supported.
The agent-orchestration/ directory contains a modular knowledge base for designing and operating multi-agent AI systems — 25,000+ lines organized into 9 skill modules.
| Skill | Purpose |
|---|---|
| formation-selection | Choose an orchestration pattern (hub-spoke, hierarchical, pipeline, mesh, swarm) |
| framework-selection | Compare SDKs and frameworks (Claude SDK, LangGraph, CrewAI, OpenAI SDK, AutoGen) |
| agent-communication | Message passing, memory architecture, consensus mechanisms |
| agent-security | Defense-in-depth, prompt injection defense, sandboxing |
| agent-observability | Monitoring, cost tracking, tracing |
| agent-debugging | Failure taxonomy, debugging tools, self-healing patterns |
| durable-workflows | Checkpointing, persistence, recovery strategies |
| operational-discipline | Token budgets, anti-patterns, session hygiene |
| cutting-edge-techniques | Mixture of Agents, swarm intelligence, evolutionary architectures |
See the agent-orchestration README for reading paths and detailed documentation.
```
ai_evaluation_framework_tool/
├── AI-Tool-Assessment-Framework.html # Interactive evaluation tool (single-file web app)
├── index.html # GitHub Pages redirect → assessment tool
├── README.md # This documentation
├── LICENSE # MIT License
└── agent-orchestration/ # Multi-agent systems knowledge base
    ├── README.md
    ├── formation-selection/
    ├── framework-selection/
    ├── agent-communication/
    ├── agent-security/
    ├── agent-observability/
    ├── agent-debugging/
    ├── durable-workflows/
    ├── operational-discipline/
    └── cutting-edge-techniques/
```
The evaluation tool is a single HTML file that uses React 18, Babel Standalone, and Google Fonts, all loaded from a CDN. No build tools or server are required.
- Framework content: Sarah Young, Alfredo Gonzalez-Espinoza, Haoyong Lan, Huajin Wang @ Carnegie Mellon University Libraries
- Adapted from: Aaron Tay's "Testing AI Academic Search Engines" series
- Interactive tool and agent orchestration library: Dom Jebbia with Claude
This project is licensed under the MIT License.