RAG-powered chatbot for querying Euclid mission documents...
Euclid RAG is an open-source Retrieval-Augmented Generation (RAG) system designed to provide efficient document retrieval and knowledge augmentation for the Euclid scientific community. The project aims to integrate local Large Language Models (LLMs) with a vector database to retrieve, process, and generate relevant scientific information.
This project was initially forked from the Rubin Observatory's Rubin RAG system. While we are working in consultation and knowledge-sharing with Rubin developers, Euclid RAG is evolving in a different direction to meet the specific needs of the Euclid collaboration. Key differences will include:
- A focus on local deployment without API-based LLM dependencies.
- Different document retrieval strategies tailored to Euclid's scientific workflows.
- Potential agentic capabilities to enhance automated knowledge retrieval and processing.
Install euclid_rag in development mode:
git clone https://github.com/yourusername/euclid_rag.git
cd euclid_rag
pip install -e .
euclid_rag is developed by Euclid Consortium Science Ground Segment members at https://github.com/jeipollack/euclid_rag.
The best way to start contributing to rubin_rag is by cloning this repository, creating a virtual environment, and running the make init command:
git clone https://github.com/jeipollack/euclid_rag
cd euclid_rag
python3 -m venv .venv
source .venv/bin/activate
make init
Before running the chatbot, you must ingest data and build the vector store.
The location and type of the vector store(s) are defined in app_config.yaml, for example:
vector_store:
type: "faiss"
redmine_index_dir: "redmine_vector_store"
public_data_index_dir: "public_data_vector_store"
- type — currently only "faiss" is supported.
- {prefix}_index_dir — path where the FAISS index files (index.faiss, index.pkl) will be stored.
This can be a relative path (within the repo) or an absolute path.
If the vector store is missing, the app will fail to start with:
RuntimeError: Vector store missing. Please run ingestion before launching the app.
python python/euclid/rag/ingestion/ingest_publications.py -c /path/to/config_file
By default, this uses the python/euclid/rag/app_config.yaml file in the repository.
To use a different config file, pass the -c / --config option:
python python/euclid/rag/ingestion/ingest_publications.py -c /path/to/config_file
Once the vector store has been built, you can launch the chatbot:
cd python/euclid
streamlit run rag/app.pyYou can run tests and build documentation with tox:
toxTo learn more about the individual environments:
tox -av