cmu-lib/ai_evaluation_framework_tool

AI-Powered Tool Assessment Framework

Launch the Assessment Tool: it runs entirely in your browser, with no installation needed.

An interactive web application for evaluating AI-powered academic search engines. Based on work by Carnegie Mellon University Libraries.

This repository also includes the Agent Orchestration Skill Library — a modular knowledge base for designing, building, and operating multi-agent AI systems.

What This Tool Does

This framework helps librarians, researchers, and evaluators systematically assess AI-powered academic search tools across four key dimensions:

  1. Retrieval - How the tool searches and retrieves content
  2. Generation - How the tool generates summaries and responses
  3. Output - Quality and accuracy of generated outputs
  4. Usability - Language support, accessibility, and sustainability

Getting Started

Open the tool in your browser — that's it. No installation, build tools, or server required. Works in Chrome, Firefox, Edge, and Safari.

Running locally from a downloaded copy

Download AI-Tool-Assessment-Framework.html and open it in any browser. You can also run it from the command line:

| OS | Command |
| --- | --- |
| Windows | start AI-Tool-Assessment-Framework.html |
| macOS | open AI-Tool-Assessment-Framework.html |
| Linux | xdg-open AI-Tool-Assessment-Framework.html |

The tool requires a modern browser and, on first load, an internet connection to fetch fonts and libraries from a CDN.


How to Use the Framework

1. Start with the Summary Tab

When the app loads, you land on the Summary tab. Fill in the basic information:

  • Tool Name (required) - The name of the AI tool you are evaluating
  • Date Evaluated (required) - Auto-filled with today's date
  • Evaluated by (required) - Your name or team name
  • Documentation URL - Link to the tool's documentation

The remaining summary fields (Key Strengths, Main Concerns, etc.) are best filled in after completing all sections.

2. Work Through Each Section

Use the tab bar to navigate between the four assessment sections:

| Tab | What You Evaluate |
| --- | --- |
| Retrieval | Content coverage, search mechanism, consistency, metadata filters |
| Generation | RAG implementation, document processing, LLM details, citation quality |
| Output | Accuracy, overgeneralization, relevance, completeness, bias |
| Usability | Multilingual support, output language, environmental impact |

Each subsection includes:

  • A guiding question to focus your evaluation
  • A "How to check" box with specific testing instructions
  • Form fields for recording your findings (dropdowns, multi-select chips, text areas)

Fields marked with a red * are required for section completion tracking.

3. Track Your Progress

The Assessment Progress panel at the top shows:

  • Overall Progress - Percentage of all required fields completed
  • Per-section progress - Individual progress bars for each section

Each subsection header also shows:

  • A green checkmark when all required fields are complete
  • A yellow dot when partially complete
  • A gray dot when not started
  • A counter (e.g., 2/3) showing required fields completed
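
The progress indicators described above can be sketched as a small calculation. This is an illustration only, not the tool's actual code: the data shape (a list of required field ids per subsection and an `answers` map) and the function names are assumptions made for the example.

```javascript
// Hypothetical data shape: each subsection is an array of required field ids,
// and `answers` maps a field id to its value (a string, or an array for
// multi-select chips).
function isFilled(value) {
  if (Array.isArray(value)) return value.length > 0;
  if (typeof value === "string") return value.trim() !== "";
  return value !== undefined && value !== null;
}

// Progress for one subsection: the counter (e.g. "2/3") and the status
// that would drive the checkmark or dot in the header.
function subsectionProgress(requiredIds, answers) {
  const done = requiredIds.filter((id) => isFilled(answers[id])).length;
  const total = requiredIds.length;
  let status = "not-started"; // gray dot
  if (total > 0 && done === total) status = "complete"; // green checkmark
  else if (done > 0) status = "partial"; // yellow dot
  return { done, total, status, counter: `${done}/${total}` };
}

// Overall progress: percentage of all required fields that are filled in.
function overallPercent(subsections, answers) {
  let done = 0;
  let total = 0;
  for (const ids of subsections) {
    const p = subsectionProgress(ids, answers);
    done += p.done;
    total += p.total;
  }
  return total === 0 ? 0 : Math.round((done / total) * 100);
}
```

Whitespace-only text and empty multi-select arrays are treated as unfilled here; the real tool may apply different rules.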

4. Save Your Work

Auto-save: Your progress is automatically saved to your browser's local storage every time you make a change. If you close and reopen the file, your data will still be there.

Named saves: Click Save Assessment to store the current assessment as a named slot. This lets you:

  • Work on multiple tool evaluations simultaneously
  • Switch between saved assessments using the Saved Assessments panel
  • Start fresh with + New Assessment
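
For readers curious how auto-save and named slots can coexist in localStorage, here is a minimal sketch. It is not taken from the tool's source: the key names, function names, and the assessment object are all hypothetical, and the storage object is passed in so the same code also runs outside a browser.

```javascript
// Key names are invented for this sketch; the real tool may use others.
const AUTOSAVE_KEY = "assessment:autosave";
const SLOTS_KEY = "assessment:slots";

// In-memory stand-in exposing the getItem/setItem subset of the Web Storage
// interface. In a browser, pass window.localStorage instead.
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => {
      data.set(key, String(value));
    },
  };
}

// Auto-save: overwrite a single well-known key on every change.
function autoSave(storage, assessment) {
  storage.setItem(AUTOSAVE_KEY, JSON.stringify(assessment));
}

function loadAutoSave(storage) {
  const raw = storage.getItem(AUTOSAVE_KEY);
  return raw === null ? null : JSON.parse(raw);
}

// Named saves: keep a name -> assessment map under a second key.
function saveSlot(storage, name, assessment) {
  const slots = JSON.parse(storage.getItem(SLOTS_KEY) || "{}");
  slots[name] = assessment;
  storage.setItem(SLOTS_KEY, JSON.stringify(slots));
}

function listSlots(storage) {
  return Object.keys(JSON.parse(storage.getItem(SLOTS_KEY) || "{}"));
}
```

Keeping the auto-save under its own key means that starting a new assessment never overwrites a named slot.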

5. Export Your Results

| Button | Format | Use Case |
| --- | --- | --- |
| Export MD | Markdown (.md) | Readable reports, sharing with colleagues, documentation |
| Export JSON | JSON (.json) | Backup, data transfer, re-importing into the tool |
| Import JSON | | Restore a previously exported JSON assessment |
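
As a rough illustration of the two export formats, the sketch below serializes one assessment object both ways. The object shape (`toolName`, `evaluatedBy`, `dateEvaluated`, `sections`) and the function names are assumptions for the example; the files the tool actually produces may be laid out differently.

```javascript
// Hypothetical shape:
// { toolName, evaluatedBy, dateEvaluated,
//   sections: [{ title, fields: [{ label, value }] }] }
function toMarkdown(assessment) {
  const lines = [
    `# ${assessment.toolName}`,
    "",
    `- **Evaluated by:** ${assessment.evaluatedBy}`,
    `- **Date evaluated:** ${assessment.dateEvaluated}`,
  ];
  for (const section of assessment.sections) {
    lines.push("", `### ${section.title}`, "");
    for (const field of section.fields) {
      // Multi-select answers are arrays; join them for display.
      const value = Array.isArray(field.value)
        ? field.value.join(", ")
        : field.value;
      lines.push(`- **${field.label}:** ${value}`);
    }
  }
  return lines.join("\n") + "\n";
}

// JSON export keeps the full structure so it can be imported again.
function toJson(assessment) {
  return JSON.stringify(assessment, null, 2);
}
```

The Markdown form is meant for people and discards structure (arrays become comma-separated text), which is why only the JSON form is suitable for re-import.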

6. Review the Results Overview

After completing some fields, the Results Overview card appears on the Summary tab. It shows a grid of all your select/multi-select responses organized by section, giving you an at-a-glance view of the tool's characteristics.


Keyboard Navigation

| Key | Action |
| --- | --- |
| Tab | Move between form fields |
| Arrow Left/Right | Navigate between section tabs |
| Home | Jump to first tab (Summary) |
| End | Jump to last tab (Usability) |
| Space/Enter | Toggle multi-select chips, expand/collapse subsections |
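
The tab-bar keys above map naturally onto a small index calculation. The sketch below is one possible way to do it, not the tool's implementation; the function name is hypothetical, the order of the three middle tabs is taken from the order in which this README lists the sections, and wrapping from the last tab back to the first is an assumption.

```javascript
// Tab order assumed from this README: Summary first, Usability last.
const TABS = ["Summary", "Retrieval", "Generation", "Output", "Usability"];

// Given the current tab index and a KeyboardEvent.key value,
// return the index of the tab that should receive focus.
function nextTabIndex(current, key, count = TABS.length) {
  switch (key) {
    case "ArrowRight":
      return (current + 1) % count; // wrap-around is an assumption
    case "ArrowLeft":
      return (current - 1 + count) % count;
    case "Home":
      return 0;
    case "End":
      return count - 1;
    default:
      return current; // other keys do not move between tabs
  }
}
```

In a browser this would typically be called from a `keydown` handler on the tab bar, for example `nextTabIndex(activeIndex, event.key)`.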

Data Storage and Privacy

  • All data is stored locally in your browser using localStorage
  • No data is sent to any server
  • Clearing your browser data will erase saved assessments
  • Use Export JSON to create backups you can re-import later

Transferring Data Between Devices

  1. On the source device: click Export JSON to download the file
  2. Transfer the .json file to the target device (email, USB, cloud storage)
  3. On the target device: open the framework and click Import JSON
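
Because an exported file may be edited or truncated in transit, an import step usually validates it before replacing any local data. The following is a hedged sketch of such a check. The required key names are hypothetical (loosely modeled on the required Summary fields) and may not match the tool's real export schema.

```javascript
// Hypothetical required keys; the tool's real export schema may differ.
const REQUIRED_KEYS = ["toolName", "dateEvaluated", "evaluatedBy"];

// Parse the text of an exported .json file and report problems
// instead of throwing, so the caller can show a friendly message.
function parseImportedAssessment(text) {
  let data;
  try {
    data = JSON.parse(text);
  } catch (err) {
    return { ok: false, error: "Not valid JSON" };
  }
  if (data === null || typeof data !== "object" || Array.isArray(data)) {
    return { ok: false, error: "Expected a JSON object" };
  }
  const missing = REQUIRED_KEYS.filter((key) => !(key in data));
  if (missing.length > 0) {
    return { ok: false, error: `Missing keys: ${missing.join(", ")}` };
  }
  return { ok: true, data };
}
```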

Troubleshooting

The page shows a loading spinner that never goes away

  • Check your internet connection. The app needs to download React and fonts from a CDN on first load.
  • Try a different browser.
  • Check if your network blocks unpkg.com or fonts.googleapis.com.

My saved data disappeared

  • Browser data was likely cleared. Use Export JSON regularly to create backups.
  • Data is per-browser and per-device. It does not sync across browsers or machines.
  • If you are using incognito/private browsing mode, data is discarded when the private window closes.

The app shows "Something went wrong"

  • Click Try Again to attempt recovery.
  • If the error persists, click Clear Data & Reload to reset (this erases unsaved data).

Form fields are not saving

  • Check that JavaScript is enabled in your browser.
  • Ensure localStorage is not disabled or full (browsers typically allow 5-10 MB).
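
To check the two storage conditions above by hand, a snippet along these lines can be pasted into the browser console. It is a diagnostic sketch, not part of the tool, and the function names are made up. It uses only standard Web Storage members (`setItem`, `removeItem`, `getItem`, `key`, `length`).

```javascript
// Returns true if `storage` accepts a write. Browsers throw from setItem
// when storage is disabled or the quota is exceeded, so a failed probe
// means saving cannot work.
function storageIsWritable(storage) {
  const probe = "__storage_probe__";
  try {
    storage.setItem(probe, "1");
    storage.removeItem(probe);
    return true;
  } catch (err) {
    return false;
  }
}

// Rough usage estimate in UTF-16 code units (keys plus values).
// Browsers account for quota differently, so treat this as approximate.
function approximateUsage(storage) {
  let units = 0;
  for (let i = 0; i < storage.length; i++) {
    const key = storage.key(i);
    units += key.length + (storage.getItem(key) || "").length;
  }
  return units;
}
```

In a browser console, call `storageIsWritable(window.localStorage)` and `approximateUsage(window.localStorage)`.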

Exported Markdown looks wrong

  • Open the .md file in a Markdown viewer (VS Code, GitHub, any Markdown app).
  • The raw text will have formatting syntax like **bold** and ### headings that render properly in Markdown viewers.

Browser Compatibility

| Browser | Minimum Version | Status |
| --- | --- | --- |
| Chrome | 90+ | Fully supported |
| Firefox | 90+ | Fully supported |
| Edge | 90+ | Fully supported |
| Safari | 15+ | Fully supported |
| Opera | 76+ | Fully supported |

Internet Explorer is not supported.
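
The compatibility table can also be expressed as data, which is handy if you want to check a browser inventory against it. This is only a sketch: the function name is hypothetical, and it expects a browser name and major version you already know rather than trying to parse a user-agent string.

```javascript
// Minimum major versions copied from the table above.
const MIN_VERSIONS = { Chrome: 90, Firefox: 90, Edge: 90, Safari: 15, Opera: 76 };

// True if the named browser is listed and meets its minimum major version.
// Unlisted browsers (including Internet Explorer) are reported as unsupported.
function isSupported(browser, majorVersion) {
  if (!Object.prototype.hasOwnProperty.call(MIN_VERSIONS, browser)) {
    return false;
  }
  return majorVersion >= MIN_VERSIONS[browser];
}
```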


Agent Orchestration Skill Library

The agent-orchestration/ directory contains a modular knowledge base for designing and operating multi-agent AI systems — 25,000+ lines organized into 9 skill modules.

Skill Modules

| Skill | Purpose |
| --- | --- |
| formation-selection | Choose an orchestration pattern (hub-spoke, hierarchical, pipeline, mesh, swarm) |
| framework-selection | Compare SDKs and frameworks (Claude SDK, LangGraph, CrewAI, OpenAI SDK, AutoGen) |
| agent-communication | Message passing, memory architecture, consensus mechanisms |
| agent-security | Defense-in-depth, prompt injection defense, sandboxing |
| agent-observability | Monitoring, cost tracking, tracing |
| agent-debugging | Failure taxonomy, debugging tools, self-healing patterns |
| durable-workflows | Checkpointing, persistence, recovery strategies |
| operational-discipline | Token budgets, anti-patterns, session hygiene |
| cutting-edge-techniques | Mixture of Agents, swarm intelligence, evolutionary architectures |

See the agent-orchestration README for reading paths and detailed documentation.


Project Structure

ai_evaluation_framework_tool/
├── AI-Tool-Assessment-Framework.html   # Interactive evaluation tool (single-file web app)
├── index.html                          # GitHub Pages redirect → assessment tool
├── README.md                           # This documentation
├── LICENSE                             # MIT License
└── agent-orchestration/                # Multi-agent systems knowledge base
    ├── README.md
    ├── formation-selection/
    ├── framework-selection/
    ├── agent-communication/
    ├── agent-security/
    ├── agent-observability/
    ├── agent-debugging/
    ├── durable-workflows/
    ├── operational-discipline/
    └── cutting-edge-techniques/

The evaluation tool is a single HTML file using React 18, Babel Standalone, and Google Fonts — all loaded from CDN. No build tools or server required.


Credits

  • Framework content: Sarah Young, Alfredo Gonzalez-Espinoza, Haoyong Lan, Huajin Wang @ Carnegie Mellon University Libraries
  • Adapted from: Aaron Tay's "Testing AI Academic Search Engines" series
  • Interactive tool and agent orchestration library: Dom Jebbia with Claude

License

This project is licensed under the MIT License.
