$ who am I?
Shubham | bihari-bhau | Gurugram, India 🇮🇳
$ cat current_role.txt
LLM Post-Training Intern @ Ethara AI
→ Benchmarking AI coding agents (Kaiju pipeline)
→ Built on the Commit0 paper (ICLR 2025)
→ Evaluating: GPT-4, Claude 3.5, Gemini, and more
$ cat stack.json
{
"languages": ["Python", "TypeScript", "JavaScript", "Java", "SQL"],
"backend": ["FastAPI", "Node.js", "PostgreSQL", "SQLAlchemy", "Alembic"],
"frontend": ["React", "Tailwind CSS", "HTML/CSS"],
"ai_ml": ["LLM Eval", "RLHF", "AST Manipulation", "Pytest Benchmarking"],
"devops": ["Docker", "Docker Compose", "Git", "GitHub Actions"],
"tools": ["n8n", "Postman", "VS Code"]
}

Kaiju benchmarks AI coding agents on reconstructing Python libraries from scratch.
Based on Commit0 (arXiv:2412.01769, ICLR 2025).
GitHub Repos (2000+ ⭐, 80%+ Python) → AST Stripper (removes function bodies) → Stubs (empty shells) → AI Agent (Claude / GPT-4 / ...) → pytest (pass rate) → Score (ethara splits)
Custom Ethara splits:
ethara → 8 libraries | ethara-lite → 4 libraries
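
A minimal sketch of the "AST Stripper → Stubs" and "pytest → pass rate" steps, using Python's standard `ast` module. It assumes a stub keeps each function's signature and docstring and replaces the body with `pass`; the helper names `BodyStripper`, `strip_function_bodies`, and `pass_rate` are hypothetical, and Kaiju's actual stripper and scoring may differ.

```python
import ast


class BodyStripper(ast.NodeTransformer):
    """Replace each function/method body with its docstring (if any) plus `pass`.

    Hypothetical helper: an illustration, not Kaiju's actual implementation.
    """

    def _strip(self, node):
        new_body = []
        if ast.get_docstring(node, clean=False) is not None:
            # Keep the docstring so the agent still sees the spec it must implement.
            new_body.append(node.body[0])
        new_body.append(ast.Pass())
        node.body = new_body
        return node

    def visit_FunctionDef(self, node):
        return self._strip(node)

    def visit_AsyncFunctionDef(self, node):
        return self._strip(node)


def strip_function_bodies(source: str) -> str:
    """Return `source` with every function body reduced to an empty stub."""
    # Methods are reached too: the default ClassDef visit recurses into class bodies.
    tree = BodyStripper().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # ast.unparse requires Python 3.9+


def pass_rate(passed: int, total: int) -> float:
    """Fraction of pytest tests that passed; 0.0 when no tests were collected."""
    return passed / total if total else 0.0


if __name__ == "__main__":
    src = '''
def add(a, b):
    """Add two numbers."""
    return a + b
'''
    # Prints the signature and docstring followed by `pass`.
    print(strip_function_bodies(src))
```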
| Project | Stack | What it does |
|---|---|---|
| 🦖 Kaiju | Python · AST · pytest | AI coding-agent benchmarking pipeline |
| 📊 rlhf-eval | React · FastAPI · PostgreSQL · Docker | Full-stack RLHF dataset builder with pairwise comparisons and JSONL export |
| 🛠️ [LLM Toolkit](https://llm-toolkit.vercel.app/) | Next.js · TypeScript · Tailwind CSS · Supabase | Modular toolkit for experimenting with LLM prompts, evaluations, and dataset workflows |
| 📦 Weather-Aware Order Checker | Node.js · OpenWeatherMap API | Makes order decisions from real-time weather data, with requests combined via `Promise.all` |
| 🎯 Lead Sniper | n8n · LLM · Slack/Discord | GitHub stargazers → enrichment → LLM-generated pitch → automated delivery |
| 🎓 Bihar Skill Hub | HTML · CSS · React | Ed-tech platform concept addressing Bihar's skill gap |
| 🍽 Meal-Buddy | Python · Django · FastAPI | Meal planning and suggestion API |
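
A minimal sketch of what a pairwise-comparison JSONL export, as in rlhf-eval, can look like: one JSON object per line, each recording which of two responses an annotator preferred. The `PairwiseComparison` fields (`prompt`, `chosen`, `rejected`, `annotator_id`) and the `to_jsonl` / `from_jsonl` helpers are hypothetical; rlhf-eval's real schema may differ.

```python
import json
from dataclasses import asdict, dataclass
from typing import Iterable


@dataclass
class PairwiseComparison:
    """One annotated preference: `chosen` was preferred over `rejected` (hypothetical schema)."""

    prompt: str
    chosen: str
    rejected: str
    annotator_id: str


def to_jsonl(records: Iterable[PairwiseComparison]) -> str:
    """Serialize records as JSON Lines: one JSON object per line, newline-terminated."""
    return "".join(json.dumps(asdict(r), ensure_ascii=False) + "\n" for r in records)


def from_jsonl(text: str) -> list[PairwiseComparison]:
    """Parse JSON Lines back into records, skipping blank lines."""
    return [PairwiseComparison(**json.loads(line)) for line in text.splitlines() if line.strip()]
```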

| Metric | Count |
|---|---|
| 🔬 LLMs Evaluated | 6+ (GPT-4, Claude, Gemini, ...) |
| 📦 Python Repos Processed | 100+ (via the `repo_finder.py` pipeline) |
| ✅ Eval Criteria | 38 per repo (automated) |
| 📝 Annotated RLHF Samples | 500+ |
| 🔀 Custom Benchmark Splits | 2 (ethara / ethara-lite) |

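
A minimal sketch of the repo-selection criteria from the Kaiju pipeline (2000+ stars, 80%+ Python). It assumes language composition is given as bytes per language, the shape GitHub's "list repository languages" REST endpoint returns; `python_share` and `meets_criteria` are hypothetical helpers, not the actual contents of `repo_finder.py`.

```python
def python_share(languages: dict[str, int]) -> float:
    """Fraction of a repo's code bytes that are Python; 0.0 for an empty mapping."""
    total = sum(languages.values())
    return languages.get("Python", 0) / total if total else 0.0


def meets_criteria(
    stars: int,
    languages: dict[str, int],
    min_stars: int = 2000,
    min_python: float = 0.8,
) -> bool:
    """Keep repos with at least `min_stars` stars and at least `min_python` Python by bytes."""
    return stars >= min_stars and python_share(languages) >= min_python


# Hypothetical candidates: (name, stars, bytes per language).
candidates = [
    ("example/pure-python-lib", 2500, {"Python": 900, "C": 100}),
    ("example/mixed-lib", 3100, {"Python": 600, "C++": 400}),
    ("example/small-lib", 1200, {"Python": 1000}),
]
selected = [name for name, stars, langs in candidates if meets_criteria(stars, langs)]
```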
education = {
    "degree": "B.Tech, Electrical & Electronics Engineering",
    "college": "Sher Shah Engineering College, Bihar",
"batch": "2025",
"training": "Java Full Stack @ JSpiders, Noida",
    "transition": "EEE → Software → AI/ML Engineering 🚀"
}

