FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision Language Models

✨ Quick Start | 💻 Evaluation | 🗺️ LVLM Responses | 📜 Citation

About

FRIEDA is a benchmark designed to stress-test open-ended, multi-step cartographic reasoning in large vision-language models (LVLMs).

FRIEDA is built from real map figures collected from documents and technical reports across diverse domains (e.g., geology, urban planning, environmental assessment) and geographic regions. Grounded in GIS theory, FRIEDA spans a diverse spectrum of spatial relations: topological (border, equal, intersect, within), metric (distance), and directional (orientation). Questions are deliberately compositional: every example requires multi-hop inference, and many demand cross-map grounding, where evidence must be located and integrated across multiple maps.

There are two splits of FRIEDA:

  • direct: explicitly tests cartographic reasoning.
  • contextual: additionally tests the ability to first identify the correct map image and then answer the question.

✨ Quick Start

To get started, please first set up the environment:

# Install requirements to run FRIEDA evaluation
pip install -r requirements.txt

# Install a PyTorch build that matches your system
# Refer to: https://pytorch.org/get-started/locally/
pip install torch torchvision

# Using `flash-attn` is recommended when running open-source LVLMs
pip install packaging ninja
pip install flash-attn --no-build-isolation
# Note: if you run into installation problems, consider using the pre-built
# wheels from https://github.com/Dao-AILab/flash-attention/releases
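
After installing, a quick sanity check (a minimal sketch; it only assumes the packages installed above) confirms that PyTorch sees your GPU and that flash-attn imports cleanly:

# Confirm the PyTorch build and GPU visibility
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# Confirm flash-attn was built and imports without errors
python3 -c "import flash_attn; print(flash_attn.__version__)"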

API Keys

To use any of the proprietary models, set the corresponding API keys:

⏬ View required APIs :: click to expand ::

Access OpenAI APIs from OpenAI Console

export OPENAI_API_KEY=<your_openai_api_key>

Access Anthropic APIs from Anthropic Console

export ANTHROPIC_API_KEY=<your_anthropic_api_key>

Access Gemini APIs from Google AI Studio

export GOOGLE_API_KEY=<your_google_api_key>
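
If a run fails with an authentication error, a quick check (a minimal bash sketch; variable names as above) confirms which keys are actually exported in the current shell:

# Report which of the expected API keys are exported
for v in OPENAI_API_KEY ANTHROPIC_API_KEY GOOGLE_API_KEY; do
  if [ -n "${!v}" ]; then echo "$v is set"; else echo "$v is NOT set"; fi
done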

💻 Evaluation

python3 main.py test \
  --model meta-llama/Meta-Llama-3.1-8B-Instruct \   # LVLM to run
  --split [direct|contextual] \                     # Data split to evaluate
  --result_dir ./results \                          # Directory to store the results
  --batch_size 8 \                                  # Batch size for open-source models
  [--use_flash] \                                   # Use flash attention
  [--evaluate]                                      # Run performance evaluation
  • Model responses are stored as [model_name]--frieda-[direct|contextual].json (see the sketch below for a quick way to inspect them).
  • Per-model accuracy results are stored as [model_name]--frieda-[direct|contextual]_eval_results.txt.
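
To take a quick look at a stored response file (a minimal sketch; replace <model_name> with the actual file name produced by your run, and adjust the path if you changed --result_dir):

# Pretty-print the first lines of a stored response file
python3 -m json.tool ./results/<model_name>--frieda-direct.json | head -n 40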

🗺️ LVLM Responses

We also share the responses of the LVLMs we evaluated on both the direct and contextual splits; they are available on the results branch.
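
To pull those shared responses locally (a minimal sketch using standard git commands; branch name as stated above):

# Fetch and switch to the branch containing the shared LVLM responses
git fetch origin results
git checkout results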

📜 Citation

@misc{friedabenchmark2025,
      title={FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models}, 
      author={Jiyoon Pyo and Yuankun Jiao and Dongwon Jung and Zekun Li and Leeje Jang and Sofia Kirsanova and Jina Kim and Yijun Lin and Qin Liu and Junyi Xie and Hadi Askari and Nan Xu and Muhao Chen and Yao-Yi Chiang},
      year={2025},
      eprint={2512.08016},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.08016}, 
}
