PDF extraction that checks its own work. #2 reading order accuracy — zero AI, zero GPU, zero cost.
Updated Apr 16, 2026 - Python
Extract structured data from local or remote LLMs
Claude Code Skill for structured information extraction from code/docs/logs. 6-step Python pipeline (source grounding, dedup, confidence scoring, entity resolution, relation inference, KG injection). Zero dependencies, no API keys. Replaces LangExtract.
Reproducible diagnostic investigation of a fine-tuned SLM that scored 99.75% on evaluation and failed silently on 10% of production inputs. Full pipeline. Every number verified.
A simple LLM library
Collection of purpose-built MCP servers for AI agent workflows.
news-summizr extracts structured summaries from headlines, labeling key points such as announcements, products, and regions for quick insight.
Automated research paper analysis: PDF → JSON with evidence extraction using LLMs (DeepSeek, Gemma). Extracts methods, results, datasets, and claims with precise evidence grounding.
A new package designed to facilitate structured, reliable extraction of key insights from user-provided texts about cultural topics. It accepts a text input, such as an article or discussion prompt
Source content for Vstorm blog posts—carefully crafted to provide both depth and clarity, with practical insights readers can apply immediately.
AI-agent-driven venue governance database. Extracts editorial boards and program committees from journal websites using local LLMs, with entity resolution against OpenAlex.
Agent Zero plugin for structured document extraction — invoices, recipes, prep lists. Powered by google/langextract with source grounding.
Evaluate local LLM accuracy on structured data extraction. Tests models' ability to extract JSON from unstructured text with ground-truth comparison, F1 scoring, and fuzzy matching. Supports MLX and Ollama backends. Generates interactive reports with charts and per-model analysis.
Automated prompt optimization using mentor-agent architecture. Generate and refine prompts from labeled data.
Multi-dimensional extraction engine for AI conversations.
📰 Extract structured summaries from news articles easily. Highlight key points like announcements, products, and regions with minimal effort.
💡 Extract key insights from cultural texts easily with summaryxtract, a Python package powered by LLMs for reliable and structured summarization.