AI-Powered Tool Assessment Framework

Launch the Assessment Tool — runs entirely in your browser, no installation needed.

An interactive web application for evaluating AI-powered academic search engines. Based on work by Carnegie Mellon University Libraries.

This repository also includes the Agent Orchestration Skill Library — a modular knowledge base for designing, building, and operating multi-agent AI systems.

What This Tool Does

This framework helps librarians, researchers, and evaluators systematically assess AI-powered academic search tools across four key dimensions:

Retrieval - How the tool searches and retrieves content
Generation - How the tool generates summaries and responses
Output - Quality and accuracy of generated outputs
Usability - Language support, accessibility, and sustainability

Getting Started

Open the tool in your browser — that's it. No installation, build tools, or server required. Works in Chrome, Firefox, Edge, and Safari.

Running locally from a downloaded copy

Download AI-Tool-Assessment-Framework.html and open it in any browser. You can also open it from the command line:

OS	Command
Windows	`start AI-Tool-Assessment-Framework.html`
macOS	`open AI-Tool-Assessment-Framework.html`
Linux	`xdg-open AI-Tool-Assessment-Framework.html`

Requires a modern browser and an internet connection on first load (the fonts and libraries are loaded from a CDN).

How to Use the Framework

1. Start with the Summary Tab

When the app loads, you land on the Summary tab. Fill in the basic information:

Tool Name (required) - The name of the AI tool you are evaluating
Date Evaluated (required) - Auto-filled with today's date
Evaluated by (required) - Your name or team name
Documentation URL - Link to the tool's documentation

The remaining summary fields (Key Strengths, Main Concerns, etc.) are best filled in after completing all sections.

2. Work Through Each Section

Use the tab bar to navigate between the four assessment sections:

Tab	What You Evaluate
Retrieval	Content coverage, search mechanism, consistency, metadata filters
Generation	RAG implementation, document processing, LLM details, citation quality
Output	Accuracy, overgeneralization, relevance, completeness, bias
Usability	Multilingual support, output language, environmental impact

Each subsection includes:

A guiding question to focus your evaluation
A "How to check" box with specific testing instructions
Form fields for recording your findings (dropdowns, multi-select chips, text areas)

Fields marked with a red asterisk (*) are required and count toward section completion tracking.

3. Track Your Progress

The Assessment Progress panel at the top shows:

Overall Progress - Percentage of all required fields completed
Per-section progress - Individual progress bars for each section

Each subsection header also shows:

A green checkmark when all required fields are complete
A yellow dot when partially complete
A gray dot when not started
A counter (e.g., 2/3) showing required fields completed
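
The indicators above can be sketched as a small function. This is an illustration only, not the tool's actual code: the names `isFilled`, `subsectionStatus`, and `overallPercent`, the field shape `{ required, value }`, and the status labels are all assumptions.

```javascript
// Hypothetical sketch: derive a subsection's status from its fields.
// The field shape { required, value } is an assumption, not the tool's real data model.
function isFilled(value) {
  // Multi-select chips are modeled as arrays; everything else as strings.
  if (Array.isArray(value)) return value.length > 0;
  return typeof value === "string" && value.trim() !== "";
}

function subsectionStatus(fields) {
  const required = fields.filter((f) => f.required);
  const done = required.filter((f) => isFilled(f.value)).length;
  let status = "not-started"; // gray dot
  if (required.length > 0 && done === required.length) {
    status = "complete"; // green checkmark
  } else if (done > 0) {
    status = "partial"; // yellow dot
  }
  return { status, counter: `${done}/${required.length}` };
}

function overallPercent(subsections) {
  // Overall progress: share of all required fields that are filled in.
  const required = subsections.flat().filter((f) => f.required);
  if (required.length === 0) return 0;
  const done = required.filter((f) => isFilled(f.value)).length;
  return Math.round((done / required.length) * 100);
}
```

Optional fields are ignored in both calculations, which matches the description of progress as a share of required fields.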

4. Save Your Work

Auto-save: Your progress is automatically saved to your browser's local storage every time you make a change. If you close and reopen the file, your data will still be there.

Named saves: Click Save Assessment to store the current assessment as a named slot. This lets you:

Work on multiple tool evaluations simultaneously
Switch between saved assessments using the Saved Assessments panel
Start fresh with + New Assessment
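
As a rough sketch of how auto-save and named slots could sit on top of localStorage: the key names `assessment:autosave` and `assessment:slots`, the function names, and the in-memory stand-in are hypothetical and are not taken from the tool's source.

```javascript
// Hypothetical sketch of auto-save plus named slots on a Storage-like object.
// Key names are assumptions for illustration only.
const AUTOSAVE_KEY = "assessment:autosave";
const SLOTS_KEY = "assessment:slots";

function autoSave(storage, assessment) {
  // Called on every change; overwrites the single auto-save entry.
  storage.setItem(AUTOSAVE_KEY, JSON.stringify(assessment));
}

function loadAutoSave(storage) {
  const raw = storage.getItem(AUTOSAVE_KEY);
  return raw === null ? null : JSON.parse(raw);
}

function saveNamed(storage, name, assessment) {
  // All named slots live in one JSON object keyed by slot name.
  const slots = JSON.parse(storage.getItem(SLOTS_KEY) ?? "{}");
  slots[name] = assessment;
  storage.setItem(SLOTS_KEY, JSON.stringify(slots));
}

function listNamed(storage) {
  return Object.keys(JSON.parse(storage.getItem(SLOTS_KEY) ?? "{}"));
}

// Minimal in-memory stand-in so the sketch also runs outside a browser.
function memoryStorage() {
  const data = new Map();
  return {
    getItem: (k) => (data.has(k) ? data.get(k) : null),
    setItem: (k, v) => data.set(k, String(v)),
  };
}
```

In a browser, `window.localStorage` could be passed wherever `storage` appears; `memoryStorage()` exists only to make the sketch testable elsewhere.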

5. Export Your Results

Button	Format	Use Case
Export MD	Markdown (.md)	Readable reports, sharing with colleagues, documentation
Export JSON	JSON (.json)	Backup, data transfer, re-importing into the tool
Import JSON	—	Restore a previously exported JSON assessment
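
The export formats can be illustrated with a minimal sketch. The assessment shape (`toolName`, `dateEvaluated`, `evaluatedBy`, `sections`) and the function names are assumptions made for this example; the tool's real export layout may differ.

```javascript
// Hypothetical sketch: serialize an assessment to JSON and to a Markdown report.
// The assessment shape used here is assumed, not taken from the tool.
function exportJson(assessment) {
  return JSON.stringify(assessment, null, 2);
}

function importJson(text) {
  const data = JSON.parse(text); // throws SyntaxError on malformed input
  if (typeof data !== "object" || data === null || typeof data.toolName !== "string") {
    throw new Error("Not a recognizable assessment file");
  }
  return data;
}

function exportMarkdown(assessment) {
  const lines = [
    `# ${assessment.toolName}`,
    "",
    `- **Date Evaluated:** ${assessment.dateEvaluated}`,
    `- **Evaluated by:** ${assessment.evaluatedBy}`,
  ];
  for (const [section, answers] of Object.entries(assessment.sections ?? {})) {
    lines.push("", `### ${section}`, "");
    for (const [question, answer] of Object.entries(answers)) {
      // Multi-select answers are joined into one comma-separated line.
      const text = Array.isArray(answer) ? answer.join(", ") : answer;
      lines.push(`- **${question}:** ${text}`);
    }
  }
  return lines.join("\n");
}
```

Only the JSON form is meant to round-trip; the Markdown form is a one-way report for human readers.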

6. Review the Results Overview

After completing some fields, the Results Overview card appears on the Summary tab. It shows a grid of all your select/multi-select responses organized by section, giving you an at-a-glance view of the tool's characteristics.

Keyboard Navigation

Key	Action
Tab	Move between form fields
Arrow Left/Right	Navigate between section tabs
Home	Jump to first tab (Summary)
End	Jump to last tab (Usability)
Space/Enter	Toggle multi-select chips, expand/collapse subsections
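
The tab-bar keys in the table can be modeled as a pure function from the current tab and the pressed key to the next tab. This is a sketch under assumptions: the `TABS` array and `nextTabIndex` are hypothetical names, and wrapping around at either end is an assumed behavior that the tool may not share.

```javascript
// Hypothetical sketch of tab-bar key handling.
// Tab order follows this README; wrap-around at the ends is an assumption.
const TABS = ["Summary", "Retrieval", "Generation", "Output", "Usability"];

function nextTabIndex(current, key) {
  switch (key) {
    case "ArrowRight":
      return (current + 1) % TABS.length;
    case "ArrowLeft":
      return (current - 1 + TABS.length) % TABS.length;
    case "Home":
      return 0;
    case "End":
      return TABS.length - 1;
    default:
      return current; // other keys leave the active tab unchanged
  }
}
```

The key strings are the standard `KeyboardEvent.key` values, so a `keydown` handler on the tab bar could pass `event.key` straight through.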

Data Storage and Privacy

All data is stored locally in your browser using localStorage
No data is sent to any server
Clearing your browser data will erase saved assessments
Use Export JSON to create backups you can re-import later

Transferring Data Between Devices

On the source device: click Export JSON to download the file
Transfer the .json file to the target device (email, USB, cloud storage)
On the target device: open the framework and click Import JSON

Troubleshooting

The page shows a loading spinner that never goes away

Check your internet connection. The app needs to download React and fonts from a CDN on first load.
Try a different browser.
Check if your network blocks unpkg.com or fonts.googleapis.com.

My saved data disappeared

Browser data was likely cleared. Use Export JSON regularly to create backups.
Data is per-browser and per-device. It does not sync across browsers or machines.
If you are using incognito/private browsing mode, data is discarded when the private window is closed.

The app shows "Something went wrong"

Click Try Again to attempt recovery.
If the error persists, click Clear Data & Reload to reset (this erases unsaved data).

Form fields are not saving

Check that JavaScript is enabled in your browser.
Ensure localStorage is not disabled or full (browsers typically allow 5-10 MB).
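
One way to check both conditions from the browser console is sketched below. The function names and the probe key are made up for this example; only the standard Storage methods (`setItem`, `removeItem`, `getItem`, `key`, `length`) are relied on.

```javascript
// Sketch: probe whether a Storage-like object accepts writes.
// Pass window.localStorage in a browser; the probe key name is arbitrary.
function storageWritable(storage) {
  const probe = "__storage_probe__";
  try {
    storage.setItem(probe, "1");
    storage.removeItem(probe);
    return true;
  } catch (err) {
    // Browsers throw when the quota is exceeded and may throw when
    // storage access is blocked; either way, writes are not possible.
    return false;
  }
}

function approxUsageBytes(storage) {
  // Rough estimate: two bytes per UTF-16 code unit of every key and value.
  let units = 0;
  for (let i = 0; i < storage.length; i++) {
    const key = storage.key(i);
    units += key.length + (storage.getItem(key) ?? "").length;
  }
  return units * 2;
}
```

The byte count is only an approximation, since browsers account for stored data in their own ways. In some browsers, reading `window.localStorage` may itself throw when storage is blocked, so that access may need its own `try`/`catch`.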

Exported Markdown looks wrong

Open the .md file in a Markdown viewer (VS Code, GitHub, any Markdown app).
The raw text will have formatting syntax like **bold** and ### headings that render properly in Markdown viewers.

Browser Compatibility

Browser	Minimum Version	Status
Chrome	90+	Fully supported
Firefox	90+	Fully supported
Edge	90+	Fully supported
Safari	15+	Fully supported
Opera	76+	Fully supported

Internet Explorer is not supported.

Agent Orchestration Skill Library

The agent-orchestration/ directory contains a modular knowledge base for designing and operating multi-agent AI systems — 25,000+ lines organized into 9 skill modules.

Skill Modules

Skill	Purpose
formation-selection	Choose an orchestration pattern (hub-spoke, hierarchical, pipeline, mesh, swarm)
framework-selection	Compare SDKs and frameworks (Claude SDK, LangGraph, CrewAI, OpenAI SDK, AutoGen)
agent-communication	Message passing, memory architecture, consensus mechanisms
agent-security	Defense-in-depth, prompt injection defense, sandboxing
agent-observability	Monitoring, cost tracking, tracing
agent-debugging	Failure taxonomy, debugging tools, self-healing patterns
durable-workflows	Checkpointing, persistence, recovery strategies
operational-discipline	Token budgets, anti-patterns, session hygiene
cutting-edge-techniques	Mixture of Agents, swarm intelligence, evolutionary architectures

See the agent-orchestration README for reading paths and detailed documentation.

Project Structure

ai_evaluation_framework_tool/
├── AI-Tool-Assessment-Framework.html   # Interactive evaluation tool (single-file web app)
├── index.html                          # GitHub Pages redirect → assessment tool
├── README.md                           # This documentation
├── LICENSE                             # MIT License
└── agent-orchestration/                # Multi-agent systems knowledge base
    ├── README.md
    ├── formation-selection/
    ├── framework-selection/
    ├── agent-communication/
    ├── agent-security/
    ├── agent-observability/
    ├── agent-debugging/
    ├── durable-workflows/
    ├── operational-discipline/
    └── cutting-edge-techniques/

The evaluation tool is a single HTML file that uses React 18, Babel Standalone, and Google Fonts, all loaded from a CDN. No build tools or server are required.

Credits

Framework content: Sarah Young, Alfredo Gonzalez-Espinoza, Haoyong Lan, Huajin Wang @ Carnegie Mellon University Libraries
Adapted from: Aaron Tay's "Testing AI Academic Search Engines" series
Interactive tool and agent orchestration library: Dom Jebbia with Claude

License

This project is licensed under the MIT License.
