The Anti-Hallucination data layer for B2B Sourcing. Deep-verified global supply chain entities designed for RAG and LLM instruction tuning.
Updated Apr 16, 2026
AI-powered Q&A system for U.S. affordable housing policy, using RAG over 2,500+ HUD documents and 24 CFR (Title 24 of the Code of Federal Regulations).
A comprehensive Python tool for extracting, processing, and analyzing RPG scenarios from the Era of the Imperial Republic (EOTIR) forums. Features automated web scraping, NLP-powered content analysis, character extraction, timeline generation, and LLM dataset preparation with an interactive HTML dashboard.
This repository aims to provide a structured and easily accessible dataset of laws in Bangladesh. The data is primarily sourced from the Bangladesh Law (BDLAW) website.
Gittxt is an AI-focused CLI and plugin tool for extracting, filtering, and packaging text from GitHub repos. Build LLM-compatible datasets, prep code for prompt engineering, and power AI workflows with structured .txt, .json, .md, or .zip outputs.
ONE SYSTEM Knowledge Dataset: structured training data from 11 AI, legal, and technology brands by David Sanker. Topics include the UAPK Business Compiler, Legal AI, and AI Governance.
Prepares the Kleister NDA dataset for LLM-based extraction. Validates labels against a Pydantic schema and delivers partitioned Parquet files with co-located PDFs.
Autonomous MCP server for machine-to-machine (M2M) patent intelligence. Delivers structured JSON datasets covering CPC sections A-H, enriched with biz_value_prop, tech stacks, and importance scoring. Supports instant autonomous data purchasing via the ROSE cryptocurrency.