FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision Language Models

✨ Quick Start | 💻 Evaluation | 🗺️ LVLM Responses | 📜 Citation

About

FRIEDA is a benchmark designed to stress-test open-ended, multi-step cartographic reasoning in large vision-language models (LVLMs).

FRIEDA is built from real map figures collected from documents and technical reports across diverse domains (e.g., geology, urban planning, environmental assessment) and geographic regions. Grounded in GIS theory, FRIEDA spans a diverse spectrum of spatial relations: topological (border, equal, intersect, within), metric (distance), and directional (orientation). Questions are deliberately compositional: every example requires multi-hop inference, and many demand cross-map grounding, where evidence must be located and integrated across multiple maps.

There are two splits of FRIEDA:

  • direct: explicitly tests cartographic reasoning.
  • contextual: additionally tests the ability to first identify the correct map image and then answer the question.

✨ Quick Start

To get started, please first set up the environment:

# Install requirements to run FRIEDA evaluation
pip install -r requirements.txt

# Install a PyTorch build that matches your system
# Refer to: https://pytorch.org/get-started/locally/
pip install torch torchvision

# Using `flash-attn` is recommended when running open-source LVLMs
pip install packaging ninja
pip install flash-attn --no-build-isolation
# Note: if you run into installation problems, consider using the pre-built
# wheels from https://github.com/Dao-AILab/flash-attention/releases
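
After installing, a quick sanity check (a minimal sketch; it only assumes the packages installed above) confirms that PyTorch sees your GPU and that flash-attn imports cleanly:

# Confirm the PyTorch build and GPU visibility
python3 -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# Confirm flash-attn was built and imports without errors
python3 -c "import flash_attn; print(flash_attn.__version__)"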

API Keys

To use any of the proprietary models, set the corresponding API keys:

⏬ View required APIs :: click to expand ::

Access OpenAI APIs from OpenAI Console

export OPENAI_API_KEY=<your_openai_api_key>

Access Anthropic APIs from Anthropic Console

export ANTHROPIC_API_KEY=<your_anthropic_api_key>

Access Gemini APIs from Google AI Studio

export GOOGLE_API_KEY=<your_google_api_key>
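
If a run fails with an authentication error, a quick check (a minimal bash sketch; variable names as above) confirms which keys are actually exported in the current shell:

# Report which of the expected API keys are exported
for v in OPENAI_API_KEY ANTHROPIC_API_KEY GOOGLE_API_KEY; do
  if [ -n "${!v}" ]; then echo "$v is set"; else echo "$v is NOT set"; fi
done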

💻 Evaluation

python3 main.py test \
  --model meta-llama/Meta-Llama-3.1-8B-Instruct \   # LVLM to run
  --split [direct|contextual] \                     # Data split to evaluate
  --result_dir ./results \                          # Directory to store the results
  --batch_size 8 \                                  # Batch size for open-source models
  [--use_flash] \                                   # Use flash attention
  [--evaluate]                                      # Run performance evaluation
  • Model responses are stored as [model_name]--frieda-[direct|contextual].json (see the sketch below for a quick way to inspect them).
  • Per-model accuracy results are stored as [model_name]--frieda-[direct|contextual]_eval_results.txt.
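
To take a quick look at a stored response file (a minimal sketch; replace <model_name> with the actual file name produced by your run, and adjust the path if you changed --result_dir):

# Pretty-print the first lines of a stored response file
python3 -m json.tool ./results/<model_name>--frieda-direct.json | head -n 40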

🗺️ LVLM Responses

We also share the responses of the LVLMs we evaluated on both the direct and contextual splits; they are available on the results branch.
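
To pull those shared responses locally (a minimal sketch using standard git commands; branch name as stated above):

# Fetch and switch to the branch containing the shared LVLM responses
git fetch origin results
git checkout results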

📜 Citation

@misc{friedabenchmark2025,
      title={FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models}, 
      author={Jiyoon Pyo and Yuankun Jiao and Dongwon Jung and Zekun Li and Leeje Jang and Sofia Kirsanova and Jina Kim and Yijun Lin and Qin Liu and Junyi Xie and Hadi Askari and Nan Xu and Muhao Chen and Yao-Yi Chiang},
      year={2025},
      eprint={2512.08016},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2512.08016}, 
}
