Shubham Singh bihari-bhau

$ whoami
  Shubham | bihari-bhau | Gurugram, India 🇮🇳

$ cat current_role.txt
  LLM Post-Training Intern @ Ethara AI
  → Benchmarking AI coding agents (Kaiju pipeline)
  → Built on the Commit0 paper (ICLR 2025)
  → Evaluating: GPT-4, Claude 3.5, Gemini and more

$ cat stack.json
  {
    "languages":   ["Python", "TypeScript", "JavaScript", "Java", "SQL"],
    "backend":     ["FastAPI", "Node.js", "PostgreSQL", "SQLAlchemy", "Alembic"],
    "frontend":    ["React", "Tailwind CSS", "HTML/CSS"],
    "ai_ml":       ["LLM Eval", "RLHF", "AST Manipulation", "Pytest Benchmarking"],
    "devops":      ["Docker", "Docker Compose", "Git", "GitHub Actions"],
    "tools":       ["n8n", "Postman", "VS Code"]
  }

⚡ Kaiju — AI Coding Agent Benchmarking Pipeline

Benchmarks AI agents on reconstructing Python libraries from scratch.
Based on Commit0 (arXiv:2412.01769, ICLR 2025).

GitHub Repos  →  AST Stripper  →  Stubs  →  AI Agent  →  pytest  →  Score
(2000+ ⭐)        (function       (empty     (Claude /     (pass      (ethara
 80%+ Python)      bodies → ∅)    shells)    GPT-4 / ...)   rate)      splits)

Custom Ethara splits:
ethara → 8 libraries | ethara-lite → 4 libraries
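
The "AST Stripper" stage in the diagram above can be illustrated with a minimal sketch. This is not Kaiju's actual code: the function name `strip_function_bodies` is hypothetical, and replacing each body with `pass` while keeping the docstring is an assumption about what "function bodies → ∅" means here. It uses only the standard-library `ast` module and needs Python 3.9+ for `ast.unparse`.

```python
import ast


def strip_function_bodies(source: str) -> str:
    """Return `source` with every function body replaced by `pass`.

    Docstrings are kept so an agent still sees the intended behaviour.
    Hypothetical helper for illustration, not part of Kaiju or Commit0.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            new_body = []
            if ast.get_docstring(node, clean=False) is not None:
                # The docstring is always the first statement of the body.
                new_body.append(node.body[0])
            new_body.append(ast.Pass())
            node.body = new_body
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    example = (
        "def add(a, b):\n"
        '    """Add two numbers."""\n'
        "    return a + b\n"
    )
    print(strip_function_bodies(example))
```

The stripped module still imports cleanly, so the library's existing pytest suite can run against whatever the agent writes back into the stubs.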

🛠 Projects

Project	Stack	What it does
🦖 Kaiju	Python · AST · pytest	AI coding agent benchmarking pipeline
📊 rlhf-eval	React · FastAPI · PostgreSQL · Docker	Full-stack RLHF dataset builder with pairwise comparisons, JSONL export
🛠️ LLM Toolkit (https://llm-toolkit.vercel.app/)	Next.js · TypeScript · Tailwind CSS · Supabase	Modular toolkit for experimenting with LLM prompts, evaluations, and dataset workflows
🌦 Weather-Aware Order Checker	Node.js · OpenWeatherMap API	Order decisions based on real-time weather via `Promise.all`
🎯 Lead Sniper	n8n · LLM · Slack/Discord	GitHub stargazer → enrichment → LLM pitch → auto-delivery
📚 Bihar Skill Hub	HTML · CSS · React	Ed-tech platform concept addressing Bihar's skill gap
🍽 Meal-Buddy	Python · Django · FastAPI	Meal planning and suggestion API
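
As a rough illustration of the "pairwise comparisons, JSONL export" step listed for rlhf-eval, here is a minimal sketch. The record shape (`prompt`, `response_a`, `response_b`, `preferred`) and the `chosen`/`rejected` output keys are assumptions made for this example, not the project's actual schema.

```python
import json
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PairwiseComparison:
    """One annotator judgement between two responses (hypothetical schema)."""
    prompt: str
    response_a: str
    response_b: str
    preferred: str  # "a" or "b"


def to_preference_row(c: PairwiseComparison) -> dict:
    """Map a comparison to a prompt/chosen/rejected row."""
    if c.preferred not in ("a", "b"):
        raise ValueError(f"preferred must be 'a' or 'b', got {c.preferred!r}")
    chosen, rejected = (
        (c.response_a, c.response_b)
        if c.preferred == "a"
        else (c.response_b, c.response_a)
    )
    return {"prompt": c.prompt, "chosen": chosen, "rejected": rejected}


def to_jsonl(comparisons: Iterable[PairwiseComparison]) -> str:
    """Serialise comparisons as JSONL: one JSON object per line."""
    return "\n".join(
        json.dumps(to_preference_row(c), ensure_ascii=False) for c in comparisons
    )


if __name__ == "__main__":
    sample = [PairwiseComparison("Explain recursion.", "Short answer", "Long answer", "b")]
    print(to_jsonl(sample))
```

One object per line keeps the export easy to stream and to append to as new annotations arrive.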

📈 GitHub Stats

🧠 Internship Metrics @ Ethara AI

┌──────────────────────────────────────────────────────────────────┐
│  🔬  LLMs Evaluated          →  6+  (GPT-4, Claude, Gemini...)  │
│  📦  Python Repos Processed  →  100+ (repo_finder.py pipeline)  │
│  ✅  Eval Criteria           →  38   (per repo, automated)      │
│  📝  Annotated RLHF Samples  →  500+                            │
│  🏗   Custom Benchmark Splits →  2    (ethara / ethara-lite)     │
└──────────────────────────────────────────────────────────────────┘

🎓 Background

education = {
    "degree":     "B.Tech — Electrical & Electronics Engineering",
    "college":    "Sershah Engineering College, Bihar",
    "batch":      "2025",
    "training":   "Java Full Stack @ JSpiders, Noida",
    "transition": "EEE → Software → AI/ML Engineering 🚀"
}

🔗 Connect

"Ship it. Benchmark it. Improve it."
