📚 MyBible - Research Paper Bibliography Manager

A modern, feature-rich CLI tool for managing a curated collection of research papers with beautiful terminal output, interactive visualizations, and comprehensive testing.

Overview

MyBible is a comprehensive bibliography management system designed to help researchers organize, track, and analyze their research paper collections. This repository contains a curated list of important research papers (primarily from AI research) with tools to:

✨ Add papers from arXiv or manually with beautiful CLI prompts
🔗 Add repositories from GitHub and Hugging Face
📊 Generate markdown tables for easy navigation and sharing
📖 Export to BibTeX for use in LaTeX documents
🕸️ Visualize citation networks with interactive HTML graphs
✅ Track duplicates with intelligent detection
🧪 Comprehensive test coverage for reliability

The papers are organized into categories based on their topics, with each entry including title, authors, journal, publication year, and DOI links for easy access.

Quick Start

Installation

# Clone and set up the environment
git clone git@github.com:art-test-stack/MyBible.git
cd MyBible
uv sync

Adding Papers

From arXiv

mybib add-arxiv <arxiv_url> --category <category_name>

Example:

mybib add-arxiv https://arxiv.org/abs/2401.00001 --category "Machine Learning"

Automated Google Scholar Search

mybib add-scholar --title "Attention is all you need" --category "Machine Learning"

From GitHub or Hugging Face

mybib add-repo <repo_url> --category <category_name>

Examples:

mybib add-repo https://github.com/openai/whisper --category "Speech"
mybib add-repo https://huggingface.co/google/gemma-2-2b --category "LLM"

Manual Entry

mybib add --title "<title>" --authors "<author1>, <author2>, ..." \
  --journal "<journal>" --year <year> --doi "<doi>" --category <category>

Generating Output

Markdown Tables

mybib markdown --file references.csv --output references.md

BibTeX Export

mybib bibtex --file references.csv --output references.bib

Backfill Missing BibTeX

mybib sync-bibtex --file references.csv

This stores BibTeX entries in files under bibtex_entries/ (configurable), and keeps only paths in references.csv.
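
The file-per-entry layout described above can be pictured with a small sketch. This is not MyBible's actual code: the function name `store_bibtex` and the file-naming scheme are assumptions made for illustration only.

```python
import re
from pathlib import Path


def store_bibtex(entry: str, key: str, bibtex_dir: Path) -> str:
    """Write one BibTeX entry to its own file and return the path to record in the CSV."""
    bibtex_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: turn the citation key into a safe file name.
    safe = re.sub(r"[^A-Za-z0-9_.-]+", "_", key).strip("_") or "entry"
    path = bibtex_dir / f"{safe}.bib"
    path.write_text(entry.strip() + "\n", encoding="utf-8")
    # Only this path string would go into the BibTeXPath column.
    return path.as_posix()
```

Keeping only the path in references.csv avoids multi-line BibTeX text inside CSV cells, which is the likely motivation for this layout.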

Citation Network Graph

mybib graph --file references.csv --output citation_graph.html

Features

🎨 Modern CLI User Experience

All commands feature:

Colored output: Success (✓), error (✗), warning (⚠), and info (ℹ) messages
Progress indicators: Smooth animations when fetching from APIs
Confirmation prompts: Safe defaults for destructive actions
Beautiful tables: Rich formatting for better readability

Example of adding a paper:

Title: Attention is all you need
Authors: Vaswani et al.
Journal: NeurIPS
Year: 2017
DOI: 10.48550/arXiv.1706.03762

Add 'Attention is all you need' to category 'Machine Learning'? [y/N]: y
✓ Reference added successfully to category 'Machine Learning'

📊 Markdown Table Generation

Automatically generate organized markdown tables from your bibliography:

Tables organized by category
Columns: Title, Authors, Journal, Year, DOI
Clickable DOI links
Proper author name formatting (et al. for 3+ authors)
Sorted by category and publication year
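
The author formatting rule can be sketched as follows. This is a minimal illustration, not the function in utils.py: the name `format_authors` and the rule that the family name is the last token are assumptions, and a single name (including a team name) is returned unchanged.

```python
def format_authors(authors: str) -> str:
    """Shorten a comma-separated author string for display in a table cell."""
    names = [n.strip() for n in authors.split(",") if n.strip()]
    if not names:
        return ""
    if len(names) == 1:
        # A single author or a team name is shown as-is.
        return names[0]
    # Assumption: the family name is the last whitespace-separated token.
    family = [n.split()[-1] for n in names]
    if len(family) == 2:
        return f"{family[0]} and {family[1]}"
    # Three or more authors collapse to "FirstAuthor et al."
    return f"{family[0]} et al."


print(format_authors("Ashish Vaswani, Noam Shazeer, Niki Parmar"))
print(format_authors("Albert Gu, Tri Dao"))
```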

📖 BibTeX Export

Generate standard BibTeX files for LaTeX documents with properly formatted entries including:

Author names
Publication title
Journal/Venue
Publication year
DOI formatting

When available, MyBible now fetches and stores source BibTeX entries in files:

add-arxiv fetches citation BibTeX from arXiv
add-repo fetches citation BibTeX from repository README files (GitHub/Hugging Face)
sync-bibtex backfills missing BibTeX for references already stored in CSV
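
When no source BibTeX is available, an entry has to be built from the CSV fields. The sketch below shows one plausible way to do that. The function name `to_bibtex`, the `@article` entry type, and the citation-key scheme are assumptions, not the behavior of bibtex.py.

```python
def to_bibtex(ref: dict) -> str:
    """Build a BibTeX @article entry from one reference row (CSV column names as keys)."""
    names = [a.strip() for a in ref["Authors"].split(",") if a.strip()]
    # BibTeX separates authors with " and ", not commas.
    authors = " and ".join(names)
    # Hypothetical key scheme: first author's last token plus the year.
    first = names[0].split()[-1].lower() if names else "anon"
    key = f"{first}{ref['Year']}"
    fields = {
        "author": authors,
        "title": ref["Title"],
        "journal": ref["Journal"],
        "year": ref["Year"],
        "doi": ref["DOI"],
    }
    # Skip empty fields so the entry stays valid when metadata is missing.
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items() if v)
    return f"@article{{{key},\n{body}\n}}"
```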

🕸️ Citation Network Visualization

Build interactive citation graphs showing how papers in your library cite each other:

Network building: Queries Crossref API for citation relationships
Interactive visualization: Zoom, pan, and drag nodes
Physics simulation: Automatic layout using Barnes-Hut algorithm
Metadata on hover: View paper details without clicking

Features:

Directed graph representation (A → B means A cites B)
Only includes edges between papers in your library
Color-coded visualization
Handles network errors gracefully with retry logic

Usage:

# Generate citation graph with verbose output
mybib graph --file references.csv --output my_citations.html --verbose
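
The edge rule (A → B means A cites B, and both must be in the library) can be sketched without any network access. This example assumes the cited DOIs for each paper have already been fetched, for instance from Crossref. The function name and the input shape are hypothetical, and plain Python containers stand in for the networkx graph the project uses.

```python
from __future__ import annotations


def build_citation_edges(
    library: set[str], cited_by_paper: dict[str, list[str]]
) -> list[tuple[str, str]]:
    """Return directed edges (source, target) restricted to papers in the library."""
    edges = set()
    for source, cited in cited_by_paper.items():
        if source not in library:
            continue  # ignore papers that are not part of the library
        for target in cited:
            # Keep only in-library targets and drop self-citations.
            if target in library and target != source:
                edges.add((source, target))
    return sorted(edges)
```

Filtering on both endpoints is what keeps the graph limited to the papers in references.csv, even when a paper cites hundreds of external works.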

✅ Duplicate Detection

Built-in duplicate detection when adding new papers:

DOI-based matching
Case-insensitive comparison
Whitespace normalization
Prevents accidental duplicates in your bibliography
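
A minimal sketch of the matching rules listed above, assuming duplicates are decided on the DOI alone. The names `normalize_doi` and `is_duplicate` are illustrative, not the functions in storage.py.

```python
import re


def normalize_doi(doi: str) -> str:
    """Remove all whitespace and lowercase the DOI so comparisons are stable."""
    return re.sub(r"\s+", "", doi).lower()


def is_duplicate(doi: str, existing: list[str]) -> bool:
    """True when the normalized DOI already appears among the existing DOIs."""
    normalized = normalize_doi(doi)
    # An empty DOI never counts as a duplicate; otherwise every paper
    # without a DOI would collide with every other one.
    return bool(normalized) and normalized in {normalize_doi(d) for d in existing}
```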

🎯 Recent Improvements (v2.0)

✨ Enhanced Data Quality

Authors Formatting

Proper "FirstAuthor et al." format instead of just "al."
Team name detection and display (K2 Team, DeepSeek-AI, Mistral, etc.)
Intelligently handles both individual and organizational authors

ArxivID Precision

Fixed float rounding errors (2405.10938 now displays correctly, not 2405.11)
ArxivID stored as string to preserve full precision
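
The following standalone snippet illustrates why an arXiv ID should stay a string. It uses the standard csv module rather than the project's pandas-based loader, and the ID 2401.00010 is only an illustrative value chosen for its trailing zero.

```python
import csv
import io

raw = "Title,ArxivID\nExample paper,2401.00010\n"
row = next(csv.DictReader(io.StringIO(raw)))

as_string = row["ArxivID"]        # csv yields strings, so the ID is untouched
as_float = float(row["ArxivID"])  # numeric parsing drops the trailing zero

print(as_string)                     # 2401.00010
print(as_float)                      # 2401.0001
print(f"{float('2405.10938'):.2f}")  # 2405.11, the rounding symptom described above
```

With pandas, reading the column with a string dtype has the same effect of preventing numeric inference.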

Scholar Metadata Extraction

Improved year extraction for Google Scholar articles (full 4-digit years)
Better DOI extraction with intelligent fallback to Scholar IDs
Enhanced regex patterns for robust metadata parsing
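
Full four-digit year extraction can be sketched with a word-bounded pattern. The pattern and the sample snippet are assumptions for illustration and are not the expressions used in scholar.py.

```python
from __future__ import annotations

import re

# Word boundaries keep the pattern from matching inside longer numbers
# such as page ranges.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_year(text: str) -> int | None:
    """Return the first plausible four-digit year in the text, if any."""
    match = YEAR_RE.search(text)
    return int(match.group(0)) if match else None


print(extract_year("A Vaswani, N Shazeer - Advances in neural information processing systems, 2017"))
```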

🏷️ Category Management System

ID-based categories: Each category assigned a unique ID with persistent mappings
Case-insensitive normalization: "LLM Basics" and "llm basics" treated as same category
Interactive selection: Choose categories by ID or create new ones on-the-fly
Category persistence: All mappings stored in categories.json

# Interactive category selection during add
mybib add-arxiv https://arxiv.org/abs/2301.00001
# Shows: Available categories: 1: Alignment, 2: Deep Learning, 3: Machine Learning
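
The ID-based, case-insensitive behavior can be sketched as a small registry persisted to JSON. The class name `CategoryRegistry` and the exact file layout (a flat ID-to-name object) are assumptions, so the real categories.py and categories.json may differ.

```python
from __future__ import annotations

import json
from pathlib import Path


class CategoryRegistry:
    """Maps integer IDs to category names and persists the mapping as JSON."""

    def __init__(self, path: Path):
        self.path = path
        self.by_id: dict[int, str] = {}
        if path.exists():
            stored = json.loads(path.read_text(encoding="utf-8"))
            # JSON object keys are strings, so convert them back to ints.
            self.by_id = {int(k): v for k, v in stored.items()}

    @staticmethod
    def _key(name: str) -> str:
        # Collapse whitespace and ignore case when comparing names.
        return " ".join(name.split()).casefold()

    def get_or_create(self, name: str) -> int:
        for cid, existing in self.by_id.items():
            if self._key(existing) == self._key(name):
                return cid
        cid = max(self.by_id, default=0) + 1
        self.by_id[cid] = " ".join(name.split())
        self.path.write_text(json.dumps(self.by_id, indent=2), encoding="utf-8")
        return cid
```

With this sketch, "LLM Basics" and "llm basics" resolve to the same ID, and the first spelling entered is the one kept for display.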

🗄️ Database Foundation (SQLAlchemy ORM)

Scalable SQL database support for advanced features:

New Commands:

# Initialize database
mybib db-init --db-url sqlite:///bibliography.db

# Migrate existing CSV to database
mybib db-migrate --file references.csv --db-url sqlite:///bibliography.db

# Export database back to CSV
mybib db-export --output backup.csv --db-url sqlite:///bibliography.db

Features:

SQLite default, supports any SQLAlchemy-compatible database (PostgreSQL, MySQL, etc.)
Full referential integrity with foreign keys
Indexed queries for common search patterns
Non-destructive migration (export back to CSV anytime)
Duplicate detection based on DOI
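
The integrity features above can be illustrated with the standard sqlite3 module. This sketch deliberately swaps SQLAlchemy for raw SQL to stay dependency-free. The table and column names are hypothetical and do not reflect the models in models.py.

```python
import sqlite3

# Hypothetical schema: a foreign key for categories, a unique DOI, and an
# index on a commonly filtered column.
SCHEMA = """
CREATE TABLE categories (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE refs (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    doi         TEXT UNIQUE,
    year        INTEGER,
    category_id INTEGER NOT NULL REFERENCES categories(id)
);
CREATE INDEX idx_refs_year ON refs(year);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database, enable foreign key checks, and create the tables."""
    conn = sqlite3.connect(path)
    # SQLite enforces foreign keys only when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

The UNIQUE constraint on `doi` makes the database itself reject a second row with the same DOI, which complements the application-level duplicate check.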

Benefits:

Foundation for advanced search and filtering
Ready for future enhancements (tags, annotations, full-text search)
Better performance with large reference collections
API layer ready for remote access

Architecture

Project Structure

MyBible/
├── pkg/mybib/              # Main package
│   ├── __init__.py
│   ├── cli.py              # CLI command handlers
│   ├── storage.py          # CSV storage operations
│   ├── arxiv.py            # arXiv API integration
│   ├── scholar.py          # Google Scholar integration
│   ├── citation.py         # Citation BibTeX fetching and parsing
│   ├── metadata.py         # Metadata management
│   ├── markdown.py         # Markdown generation
│   ├── bibtex.py           # BibTeX export
│   ├── graph.py            # Citation graph features
│   ├── ui.py               # Terminal UI utilities
│   ├── utils.py            # Utility functions
│   ├── categories.py       # Category management system
│   ├── models.py           # SQLAlchemy ORM models
│   └── db_storage.py       # Database storage adapter
├── tests/                  # Test suite
│   ├── test_storage.py
│   ├── test_arxiv.py
│   ├── test_markdown.py
│   ├── test_metadata.py
│   ├── test_scholar.py
│   └── __init__.py
├── references.csv          # Bibliography database (CSV)
├── categories.json         # Category ID mappings
├── pyproject.toml          # Project configuration
├── pytest.ini              # Pytest configuration
├── IMPROVEMENTS_SUMMARY.md # Detailed changelog for v2.0
└── README.md               # This file

Core Modules

cli.py: Command-line interface with rich formatting and category prompts
storage.py: CSV file handling with ArxivID support and duplicate detection
arxiv.py: arXiv metadata fetching with error handling
scholar.py: Google Scholar integration with improved metadata extraction
citation.py: BibTeX extraction from arXiv and repository README files
metadata.py: Reference metadata management
markdown.py: Markdown table generation with category support and author formatting
bibtex.py: BibTeX export functionality
graph.py: Citation network building and visualization
ui.py: Terminal UI components (colors, progress, confirmations)
categories.py: Category management with ID-based persistence
models.py: SQLAlchemy ORM models for database support
db_storage.py: Database storage adapter with migration capabilities
utils.py: Utility functions including enhanced author name formatting

Dependencies

Core dependencies (installed via uv sync):

pandas: CSV data handling
requests: HTTP requests for APIs
rich: Beautiful terminal output
networkx: Graph algorithms and data structures
pyvis: Interactive network visualization
sqlalchemy: ORM framework for database abstraction

Development dependencies:

pytest: Testing framework
pytest-cov: Code coverage reporting

Note: See tests/README.md for details on the test suite and the modules it covers.

CLI Commands

Reference Management

# View help
mybib --help

# Add reference from arXiv
mybib add-arxiv https://arxiv.org/abs/2301.00001 [--category <name>]

# Add reference from GitHub or Hugging Face repository
mybib add-repo <repo_url> [--category <name>]

# Add reference from Google Scholar (with interactive search)
mybib add-scholar --title "<article name>" [--category <name>]

# Add reference manually
mybib add --title "<title>" [--authors] [--journal] [--year] [--doi] [--category]

# Fetch and store missing BibTeX for existing references
mybib sync-bibtex [--file references.csv] [--force] [--bibtex-dir bibtex_entries]

# View help for specific commands
mybib add-arxiv --help
mybib add-repo --help
mybib add-scholar --help
mybib add --help
mybib sync-bibtex --help

Output Generation

# Generate markdown tables
mybib markdown --file references.csv --output references.md [--by-category]

# Generate BibTeX file
mybib bibtex --file references.csv --output references.bib

# Build citation network graph
mybib graph --file references.csv --output citation_graph.html [--verbose]

Database Operations (v2.0)

# Initialize database
mybib db-init [--db-url sqlite:///bibliography.db]

# Migrate CSV to database
mybib db-migrate --file references.csv [--db-url sqlite:///bibliography.db]

# Export database back to CSV
mybib db-export --output backup.csv [--db-url sqlite:///bibliography.db]

Data Format

References are stored in references.csv with the following columns:

Title: Paper title
Authors: Author names (comma-separated)
Journal: Publication venue
Year: Publication year
DOI: Digital Object Identifier
Category: Research topic category
Link: URL (optional)
ArxivID: arXiv identifier (optional)
BibTeX: Legacy inline BibTeX field (kept for backward compatibility)
BibTeXPath: Path to BibTeX file stored on disk

Categories are managed in categories.json with ID-to-name mappings for case-insensitive organization.
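
A short sketch of reading this format with the standard csv module and grouping rows by category. The function name `group_by_category` is illustrative, only a subset of the columns is used, and the project itself relies on pandas for CSV handling.

```python
import csv
import io
from collections import defaultdict


def group_by_category(csv_text: str) -> dict:
    """Group reference rows by their Category column, sorted by year within each group."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups[row["Category"]].append(row)
    for rows in groups.values():
        # Rows without a year sort first; the year is assumed to be numeric.
        rows.sort(key=lambda r: int(r["Year"] or 0))
    return dict(groups)


sample = (
    "Title,Authors,Journal,Year,DOI,Category\n"
    "Language Models are Few-Shot Learners,Brown et al.,arXiv,2020,2005.14165,LLM\n"
    "Attention is all you need,Vaswani et al.,arXiv,2017,1706.03762,LLM\n"
)
for category, rows in group_by_category(sample).items():
    print(category, [r["Year"] for r in rows])
```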

Changelog

v2.0 (Latest)

Major improvements to data quality and scalability:

✨ Improvements:

Automatic author formatting as "FirstAuthor et al." with team name detection
Fixed ArxivID display precision (no more float rounding errors)
Enhanced Scholar metadata extraction (full year extraction, better DOI finding)
New category management system with persistent ID mappings
Foundation for database support with SQLAlchemy ORM

New Features:

Database initialization and migration commands
CSV ↔ Database conversion tools
Interactive category selection by ID during reference addition

See IMPROVEMENTS_SUMMARY.md for detailed technical documentation.

v1.0

Initial release with CSV-based storage, arXiv/Scholar/manual entry, markdown/BibTeX export, and citation graph visualization.

Future Enhancements

Potential features enabled by the v2.0 database foundation:

Advanced search and filtering
Paper summaries and reading notes
Reading progress tracking
Topic clustering visualization
Export to other formats (RIS, Zotero)
Full-text search capabilities
Tag and annotation system
API layer for remote access

Contributing

Contributions are welcome! Feel free to:

Improve the CLI interface
Enhance visualization features
Expand test coverage
Report bugs or suggest improvements

Acknowledgements

This project grew out of my need for better bibliography management tools. After struggling with manual CSV files and clunky reference managers, I wanted a modern, customizable solution that fits my workflow, and MyBible is the result. If you need a tool for general use cases, Paperlib seems to be a better choice.
I started this project with "traditional" coding practices, but from commit d8f992f onward I switched to "vibe coding" with Claude Haiku 4.5, so I did not write most of the features myself.
The project is still in its early stages, so there are many rough edges and missing features. It is mainly for my personal use, which means it works best for computer science research. I am open to contributions and suggestions to make it better!

Example of output markdown table generated by `mybib markdown`

LLMs Basics

Title	Author(s)	Journal	Year	DOI
Attention is all you need	Vaswani et al.	arXiv	2017	1706.03762
Shampoo: Preconditioned Stochastic Tensor Optimization	Gupta et al.	arXiv	2018	1802.09568
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding	Devlin et al.	arXiv	2018	1810.04805
Language models are unsupervised multitask learners	Radford et al.	OpenAI	2019	unsupervised-multitask
Language Models are Few-Shot Learners	Brown et al.	arXiv	2020	2005.14165
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention	Katharopoulos et al.	arXiv	2020	2006.16236
Efficient Transformers: A Survey	Tay et al.	arXiv	2020	2009.06732
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity	Fedus et al.	ICML	2022	2101.03961
RoFormer: Enhanced Transformer with Rotary Position Embedding	Su et al.	arXiv	2021	2104.09864
LoRA: Low-Rank Adaptation of Large Language Models	Hu et al.	ICLR	2021	2106.09685
Training Compute-Optimal Large Language Models	Hoffmann et al.	arXiv	2022	2203.15556
PaLM: Scaling Language Modeling with Pathways	Chowdhery et al.	arXiv	2022	2204.02311
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness	Dao et al.	NeurIPS	2022	2205.14135
QLoRA: Efficient Finetuning of Quantized LLMs	Dettmers et al.	arXiv	2023	2305.14314
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning	Dao	arXiv	2023	2307.08691
YaRN: Efficient Context Window Extension of Large Language Models	Peng et al.	arXiv	2023	2309.00071
Effective Long-Context Scaling of Foundation Models	Xiong et al.	arXiv	2023	2309.16039
Mistral 7B	Jiang et al.	arXiv	2023	2310.06825
Mamba: Linear-Time Sequence Modeling with Selective State Spaces	Gu and Dao	NeurIPS	2023	2312.00752
How to Train Long-Context Language Models (Effectively)	Gao et al.	arXiv	2024	2410.02660
The Zamba2 Suite: Technical Report	Glorioso et al.	arXiv	2024	2411.15242
Muon is Scalable for LLM Training	Liu et al.	arXiv	2025	2502.16982
Kimi K2: Open Agentic Intelligence	Kimi Team	arXiv	2025	2507.20534

LLM Datasets

Title	Author(s)	Journal	Year	DOI
SQuAD: 100,000+ Questions for Machine Comprehension of Text	Rajpurkar et al.	arXiv	2016	1606.05250
StarCoder 2 and The Stack v2: The Next Generation	Lozhkov et al.	arXiv	2024	2402.19173

Time-Series Foundational Models

Title	Author(s)	Journal	Year	DOI
Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting	Lim et al.	arXiv	2020	1912.09363
N-HiTS: Neural Hierarchical Interpolation for Time Series Forecasting	Challu et al.	arXiv	2022	2201.12886

Diffusion

Title	Author(s)	Journal	Year	DOI
Denoising Diffusion Probabilistic Models	Ho et al.	NeurIPS	2020	2006.11239
Generative Diffusion Models on Graphs: Methods and Applications	Liu et al.	arXiv	2023	2302.02591
dLLM: Simple Diffusion Language Modeling	Zhou et al.	arXiv	2026	2602.22661

EB-Models

Title	Author(s)	Journal	Year	DOI
A tutorial on Energy-Based Learning	LeCun et al.	MIT Press	2006	eb-learning
Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One	Grathwohl et al.	arXiv	2019	1912.03263
How to Train Your Energy-Based Models	Song et al.	arXiv	2021	2101.03288
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly	Yen et al.	arXiv	2024	2410.02694
Energy-Based Transformers are Scalable Learners and Thinkers	Gladstone et al.	arXiv	2025	2507.02092

Alignment

Tsetlin Machines Articles

Title	Author(s)	Journal	Year	DOI
The Tsetlin Machine -- A Game Theoretic Bandit Driven Approach to Optimal Pattern Recognition with Propositional Logic	Ole-Christoffer Granmo	arXiv	2018	1804.01508
Label-Critic Tsetlin Machine: A Novel Self-supervised Learning Scheme for Interpretable Clustering	Abouzeid et al.	IEEE	2022	10.1109/ISTM54910.2022.00016
