bihari-bhau/README.md

$ who am I?
  Shubham | bihari-bhau | Gurugram, India 🇮🇳

$ cat current_role.txt
  LLM Post-Training Intern @ Ethara AI
  → Benchmarking AI coding agents (Kaiju pipeline)
  → Built on the Commit0 paper (ICLR 2025)
  → Evaluating: GPT-4, Claude 3.5, Gemini and more

$ cat stack.json
  {
    "languages":   ["Python", "TypeScript", "JavaScript", "Java", "SQL"],
    "backend":     ["FastAPI", "Node.js", "PostgreSQL", "SQLAlchemy", "Alembic"],
    "frontend":    ["React", "Tailwind CSS", "HTML/CSS"],
    "ai_ml":       ["LLM Eval", "RLHF", "AST Manipulation", "Pytest Benchmarking"],
    "devops":      ["Docker", "Docker Compose", "Git", "GitHub Actions"],
    "tools":       ["n8n", "Postman", "VS Code"]
  }

⚡ Kaiju: AI Coding Agent Benchmarking Pipeline

Benchmarks AI agents on reconstructing Python libraries from scratch.
Based on Commit0 (arXiv:2412.01769, ICLR 2025).

GitHub Repos  →  AST Stripper  →  Stubs  →  AI Agent  →  pytest  →  Score
(2000+ ⭐)        (function       (empty     (Claude /     (pass      (ethara
 80%+ Python)      bodies → ∅)    shells)    GPT-4 / ...)   rate)      splits)
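
The "AST Stripper → Stubs" step can be sketched with Python's standard `ast` module. This is a minimal illustration and not Kaiju's actual code: the function name `strip_function_bodies` and the choice to keep docstrings and insert `pass` are assumptions.

```python
import ast


def strip_function_bodies(source: str) -> str:
    """Return `source` with every function body replaced by a stub.

    Assumption: docstrings are kept so the agent can still read the
    intended behaviour; everything else in the body becomes `pass`.
    Requires Python 3.9+ for ast.unparse.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            new_body = []
            # If a docstring exists, it is always the first statement.
            if ast.get_docstring(node) is not None:
                new_body.append(node.body[0])
            new_body.append(ast.Pass())
            node.body = new_body
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


if __name__ == "__main__":
    original = (
        "def add(a, b):\n"
        '    """Return the sum of a and b."""\n'
        "    return a + b\n"
    )
    print(strip_function_bodies(original))
```

The stubbed module still imports cleanly, so the original test suite can be collected and run against whatever the agent writes back into the empty shells.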

Custom Ethara splits:
ethara → 8 libraries  |  ethara-lite → 4 libraries
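
A rough sketch of how a per-split score could be derived from pytest pass rates. The names `LibraryResult`, `pass_rate`, and `split_score` are hypothetical, and the unweighted mean across libraries is an assumed aggregation, not necessarily what Kaiju uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryResult:
    name: str    # hypothetical library identifier
    passed: int  # unit tests that passed
    total: int   # unit tests collected


def pass_rate(result: LibraryResult) -> float:
    """Fraction of tests passed; 0.0 when no tests were collected."""
    return result.passed / result.total if result.total else 0.0


def split_score(results: list[LibraryResult]) -> float:
    """Unweighted mean of per-library pass rates (assumed aggregation)."""
    if not results:
        return 0.0
    return sum(pass_rate(r) for r in results) / len(results)


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not real benchmark results.
    demo = [LibraryResult("lib_a", 1, 2), LibraryResult("lib_b", 4, 4)]
    print(f"split score: {split_score(demo):.2f}")
```

An unweighted mean keeps a library with a very large test suite from dominating the split; weighting by test count is the obvious alternative.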


🛠 Projects

| Project | Stack | What it does |
|---|---|---|
| 🦖 Kaiju | Python · AST · pytest | AI coding agent benchmarking pipeline |
| 📊 rlhf-eval | React · FastAPI · PostgreSQL · Docker | Full-stack RLHF dataset builder with pairwise comparisons and JSONL export |
| 🛠️ LLM Toolkit (https://llm-toolkit.vercel.app/) | Next.js · TypeScript · Tailwind CSS · Supabase | Modular toolkit for experimenting with LLM prompts, evaluations, and dataset workflows |
| 🌦 Weather-Aware Order Checker | Node.js · OpenWeatherMap API | Order decisions based on real-time weather via Promise.all |
| 🎯 Lead Sniper | n8n · LLM · Slack/Discord | GitHub stargazer → enrichment → LLM pitch → auto-delivery |
| 📚 Bihar Skill Hub | HTML · CSS · React | Ed-tech platform concept addressing Bihar's skill gap |
| 🍽 Meal-Buddy | Python · Django · FastAPI | Meal planning and suggestion API |


🧠 Internship Metrics @ Ethara AI

┌─────────────────────────────────────────────────────────────────┐
│  🔬  LLMs Evaluated          →  6+  (GPT-4, Claude, Gemini...)  │
│  📦  Python Repos Processed  →  100+ (repo_finder.py pipeline)  │
│  ✅  Eval Criteria           →  38   (per repo, automated)      │
│  📝  Annotated RLHF Samples  →  500+                            │
│  🏗   Custom Benchmark Splits →  2    (ethara / ethara-lite)     │
└─────────────────────────────────────────────────────────────────┘

🎓 Background

education = {
    "degree":     "B.Tech - Electrical & Electronics Engineering",
    "college":    "Sershah Engineering College, Bihar",
    "batch":      "2025",
    "training":   "Java Full Stack @ JSpiders, Noida",
    "transition": "EEE → Software → AI/ML Engineering 🚀"
}

🔗 Connect

LinkedIn GitHub Portfolio Email



"Ship it. Benchmark it. Improve it."

Pinned

  1. shubham-portfolio (Public)

    Personal portfolio: LLM Post-Training Engineer & Full-Stack Developer. Built with React + Vite, deployed on Vercel.

    JavaScript

  2. llm-response-evaluator (Public)

    A Streamlit tool to evaluate and compare LLM responses across 5 RLHF-inspired quality dimensions: Instruction Following, Truthfulness, Prompt Correctness, Writing Quality, and Verbosity.

    Python 1 1

  3. bihar-skill-hub (Public)

    A fully deployed educational platform targeting students in Bihar, featuring 33 courses across 11 skill categories. Built with JWT authentication, course enrollment, user profiles, success stories,…

    CSS

  4. llm-toolkit (Public)

    AI-powered LLM evaluation toolkit - Prompt Quality Scorer & Multi-turn Conversation Analyzer. Built with Next.js + Claude API.

    TypeScript 1