Zhen Liu

PhD Candidate in Computer Science (D-INFK), ETH Zurich
Research focus: Multimodal Large Language Models (MLLMs), Vision-Language Reasoning, and Efficient VLM Systems.

About

I build lightweight, reproducible tools for multimodal research workflows, with an emphasis on:

  • retrieval and grounding for document-centric QA
  • evaluation pipelines for VLM experiments
  • efficiency-oriented methods for visual token reduction

Growth Snapshot (2024-2026)

  • 2024: Started building compact research utilities for multimodal retrieval and evaluation.
  • 2025: Expanded to reusable CLI tools, testable pipelines, and benchmark-style experimentation.
  • 2026: Focusing on robust multimodal systems for long-context documents and efficient inference.

Selected Open-Source Projects

  • multimodal-doc-rag: a citation-aware toolkit for multimodal retrieval and context building.
  • vlm-eval-lite: minimal and reproducible multimodal QA evaluation runner.
  • sparse-vl: simulation toolkit for visual-token sparsification strategies in VLM inference.
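To make the last item concrete: visual-token sparsification usually means ranking the vision encoder's output tokens by some importance score (attention mass, for example) and keeping only a fraction before they reach the language model. The sketch below is a minimal, hypothetical illustration of top-k pruning in plain Python; it is not the actual sparse-vl API, and the function name, score source, and `keep_ratio` parameter are all illustrative assumptions.

```python
from typing import List, Tuple


def prune_visual_tokens(
    tokens: List[List[float]],
    scores: List[float],
    keep_ratio: float = 0.25,
) -> Tuple[List[List[float]], List[int]]:
    """Keep the top-`keep_ratio` fraction of visual tokens by importance.

    In a real VLM the scores might come from attention weights; here they
    are just arbitrary per-token importance values for illustration.
    Returns the kept tokens and their original indices (in original order,
    so positional information is preserved).
    """
    k = max(1, int(len(tokens) * keep_ratio))  # always keep at least one token
    # Rank indices by descending score, take the top k, restore original order.
    top = sorted(sorted(range(len(tokens)), key=lambda i: -scores[i])[:k])
    return [tokens[i] for i in top], top


# Example: 8 dummy one-dimensional "tokens", keep 25% -> 2 survive.
tokens = [[float(i)] for i in range(8)]
scores = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6]
kept, idx = prune_visual_tokens(tokens, scores, keep_ratio=0.25)
# idx == [1, 3]: the two highest-scoring tokens, in original order
```

Keeping the surviving tokens in their original order (rather than score order) matters in practice, since downstream positional embeddings assume the original sequence layout.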

Current Interests

  • grounded multimodal RAG for PDFs and technical reports
  • long-context VLM evaluation and failure analysis
  • practical methods for reducing multimodal serving cost

Last updated: February 2026.
