Thoughts on incorporating AI coding agents into real engineering workflows and what I learned so far.
I'm currently introducing AI-assisted development at the company I work at. This repository is my attempt to organize what I know, what I'm still figuring out, and what resources helped me the most.
- Why Should You Care
- The Risks Are Real
- Engineering Skills Still Matter
- Understanding LLMs
- How Agents Actually Work
- Context Management Is Everything
- Writing Good Prompts for Existing Agents
- What AI Makes Possible
- Credits and References
- License
Nolan Lawson wrote a great piece about the emotional side of this shift. He described it honestly: "The worst fact about these tools is that they work." He's not celebrating the new world, but he's not resisting it either.
Whether you like it or not, AI coding tools are changing how we work. Your junior colleagues are already using Cursor, Claude Code, Copilot. They write code faster. Not always better, but faster. And the tools keep improving.
The question isn't "should I use AI for coding?" anymore. The question is: how do I use it without making a mess?
Jake Nations wrote about this in Vibe Coding Our Way to Disaster. His argument is based on Rich Hickey's ideas about simplicity vs. ease. The short version: vibe coding (just chatting with AI and letting it write whatever) is choosing ease over simplicity. It feels productive but creates tangled, complex systems.
VIBE CODING (ease) DISCIPLINED AI CODING (simplicity)
You: "make a login page" You: research auth flow in our codebase
AI writes 200 lines AI maps existing patterns
You: "it doesn't work, fix it" You: review research, plan approach
AI rewrites 150 lines AI creates implementation plan
You: "now add validation" You: review plan, approve
AI patches on top of patches AI implements following the plan
You: "why is everything broken?" You: review code, run tests
AI apologizes, rewrites again Working code that fits the codebase
Result: tangled mess of corrections Result: clean code that follows
buried in context existing patterns
The key problems with naive AI coding:
- Context complexity becomes code complexity. When you have long conversations with AI, corrections and clarifications pile up. The AI starts making connections between unrelated parts of the conversation. Your code becomes a reflection of that mess.
- AI amplifies your approach. If you rush to code without understanding the problem, AI helps you build the wrong thing faster. If you think first, AI becomes a powerful implementation tool.
- Most critical bugs come from misunderstanding the problem, not from implementation errors. This was true before AI, and it's even more true now when AI can generate hundreds of lines of code from a vague prompt.
- The Stanford study found that AI tools often lead to rework. Code shipped with AI one week gets rewritten next week. In large established codebases, AI can actually make developers less productive.
This isn't a reason to avoid AI tools. It's a reason to use them with discipline.
AI doesn't replace the need to understand your system. You still need to:
- Know how your codebase works before asking AI to change it
- Review generated code with the same rigor as human-written code
- Design systems that are simple, not just easy to generate
- Understand when the AI is wrong (and it'll be wrong sometimes)
As Dex Horthy (HumanLayer) puts it in 12-Factor Agents: the best production AI agents are "comprised of mostly just software." The LLM is a powerful component, but it's the engineering around it that makes it reliable.
Where you spend your human attention matters. A mistake at the research level cascades into everything below it.
/\
/ \ A bad line of RESEARCH
/ re \ = misunderstanding the codebase
/search\ = thousands of bad lines of code
/--------\
/ \ A bad line of a PLAN
/ plan \ = wrong approach
/ \ = hundreds of bad lines of code
/----------------\
/ \
/ implementation \ A bad line of CODE
/ \ = a bad line of code
/________________________\
HUMAN EFFORT GOES HERE ^^^
(review research and plans, not just code)
You need to be able to read the research AI produces and tell when it's wrong. You need to be able to look at a plan and spot the flaw. The human review at research and planning stages is the highest-leverage intervention in the whole process.
Before we talk about agents, it helps to understand what an LLM actually is.
An LLM is a function. You give it text, it gives you text back. That's it.
f(input_text) → output_text
There's no memory between calls. There's no hidden state. Every time you send a message, the model sees the entire conversation from scratch. What feels like a "conversation" is actually your client re-sending the full history every single time.
Call 1: f("What is 2+2?") → "4"
Call 2: f("What is 2+2?" + "4" + "Now multiply by 3") → "12"
Call 3: f("What is 2+2?" + "4" + "Now multiply by 3" + "12"
+ "What was the original number?") → "4"
The model didn't "remember" that the original number was 4. It saw the full conversation in the input and found the answer there. If you removed the earlier messages, it would have no idea.
This has practical consequences:
- Context is everything. The model only knows what you put in the input. If you don't include it, it doesn't exist.
- Longer conversations degrade. Every message adds tokens. At some point the input is so large that the model loses focus on what matters.
- You pay for every token, every time. The full conversation is re-sent on each call. A 50-message conversation means message 1 has been sent 50 times.
When you type a message in ChatGPT or Claude, it looks like a simple chat. Behind the scenes, the API call looks more like this:
┌─────────────────────────────────────────────────────────┐
│ API Call │
│ │
│ messages: [ │
│ { role: "system", content: "You are a helpful..." │
│ }, │
│ { role: "user", content: "What is 2+2?" │
│ }, │
│ { role: "assistant", content: "4" │
│ }, │
│ { role: "user", content: "Now multiply by 3" │
│ } │
│ ] │
│ │
│ → model reads ALL of this, generates the next response │
└─────────────────────────────────────────────────────────┘
The model doesn't have a session. It doesn't "know" it already answered the first question. It receives the entire list of messages and produces the next one. The chat interface is an illusion maintained by the client.
You may have heard that LLMs are "non-deterministic." This is half true. The randomness is a design choice, not a flaw.
At each step, the model predicts the probability of every possible next token. Temperature controls how it picks from those probabilities:
Prompt: "The capital of France is"
Token probabilities:
"Paris" → 92%
"Lyon" → 3%
"a" → 2%
"the" → 1%
...
Temperature = 0: Always picks "Paris" (highest probability)
Temperature = 0.7: Usually picks "Paris", sometimes surprises
Temperature = 1.0: More random, might pick "Lyon" or "a"
For coding tasks, lower temperature is almost always better. You want predictable, correct output, not creative variation. Most coding agents run at low temperature by default.
LLMs don't read characters or words. They read tokens. A token is roughly 3-4 characters in English, but it varies.
"Hello, world!" → ["Hello", ",", " world", "!"] = 4 tokens
"def fibonacci(n):" → ["def", " fibon", "acci", "(n", "):"] = 5 tokens
"東京" → ["東", "京"] = 2 tokens
This matters because:
- Context windows are measured in tokens. When Claude says 200k context, that's 200k tokens, not characters. Roughly 150k words, or about 500 pages of text.
- You pay per token. Both input and output. Reading a 5000-line file costs more than reading a 100-line file.
- Code is token-expensive. Variable names, syntax, and whitespace all consume tokens. A 200-line function might cost more tokens than a 200-word paragraph.
Everything in the rest of this article builds on these basics. If you remember one thing from this section: the LLM doesn't know anything you didn't tell it. Everything else follows from that.
The next sections cover how agents loop around this stateless function (How Agents Actually Work) and why managing context is the most important skill you can develop (Context Management Is Everything).
I did an internal presentation at my company about how to write good agents, based on 12-Factor Agents by Dex Horthy. I didn't take all 12 factors because many of them are about building agent frameworks, which isn't what most of us do day-to-day. We use agents, we don't build runtimes for them. Claude Code and Copilot control the runtime; we can partially control the tools and fully control the prompts.
At its core, every agent is just this:
┌─────────────────────────────────────────────┐
│ │
│ ┌─────────┐ ┌──────────┐ ┌────────┐ │
│ │ │ │ │ │ │ │
│ │ Context ├───>│ LLM ├───>│ Tool │ │
│ │ window │ │ (decide │ │ call │ │
│ │ │<───┤ next │ │(execute│ │
│ │ │ │ action) │ │ action)│ │
│ └─────────┘ └──────────┘ └───┬────┘ │
│ ^ │ │
│ │ result │ │
│ └─────────────────────────────┘ │
│ │
│ Repeat until "done" │
└─────────────────────────────────────────────┘
The problem: after many iterations, the context window fills up. The agent starts looping on the same broken approach. It forgets what it tried. Even as models support longer context, focused prompts always work better.
| Component | What it is | What you control |
|---|---|---|
| Prompt | Instructions for the LLM | Fully. You write it. |
| Context | Accumulated history of steps and results | Partially. You shape what goes in. |
| Tools | Actions the agent can take (read files, run commands, etc.) | Partially. You pick which tools are available. |
| Loop | Keep going until done | Partially. You define when to pause/stop. |
From the original 12 factors, these five are most relevant when you write prompts for coding assistants:
Your words become structured tool calls. You type "find auth code," and the LLM decides: call Grep with pattern "auth", then Glob on **/auth/**. Specific tool, specific parameters, structured as JSON. Then deterministic code picks that up and runs it.
What you type: What the agent actually does:
───────────── ─────────────────────────────
"/commit" → git status
→ git diff
→ git add <files>
→ git commit -m "..."
"find auth code" → Grep: "auth"
→ Glob: **/auth/**
→ LS: src/services/auth/
"explain the login flow" → Read: src/auth/login.ts
→ Read: src/auth/middleware.ts
→ Trace calls between files
Don't outsource your prompt engineering to a black box. Some frameworks hide the actual prompt behind abstractions like "role," "goal," and "personality." Fine for prototyping. Hard to tune for production. You want to see and modify the exact tokens going to the model.
With Claude Code you can't control the internal system prompt. But you can own CLAUDE.md, custom commands in .claude/commands/, agent definitions in .claude/agents/. That's your prompt surface area. Write it, version it, test it.
"Our library gives you the best output!" ... "SHOW ME THE PROMPT."
Everything is context engineering. As covered above, LLMs are stateless functions. The only thing that affects output quality is the quality of the input. The original factor goes further: you don't have to stick with standard message-based formats. You can structure context however you want, pack more signal into fewer tokens. Andrej Karpathy popularized the term "context engineering" for this. Fill the context window with the right information, not just more information.
Context window (200k tokens):
┌──────────────────────────────────────────────────────┐
│ [system prompt] [documents] [conversation] [tools] │
│ │
│ 40% used ✓ Good │░░░░░░░░░░░░░░░░░░░░░░░░░░░░│ │
│ 60% used ~ OK │████████████░░░░░░░░░░░░░░░░│ │
│ 80% used ✗ Danger │████████████████████░░░░░░░░│ │
│ 95% used ✗ Lost │████████████████████████████│ │
│ │
│ More noise = worse output │
│ Focused context = better output │
└──────────────────────────────────────────────────────┘
Build checkpoints into prompts so the agent knows when to stop and ask.
# From implement_plan.md
# (https://github.com/humanlayer/humanlayer/blob/main/.claude/commands/implement_plan.md):
"Phase [N] Complete - Ready for Verification.
Automated checks passed:
- [x] Tests pass
- [x] Lint clean
Please perform manual verification:
- [ ] Feature works in UI
- [ ] No regressions
Let me know when complete so I can proceed to Phase [N+1]."
Instead of one big agent, create small agents that each do one specific thing.
BAD: One Universal Agent GOOD: Focused Micro Agents
┌─────────────────────────┐ ┌────────────────────┐
│ Universal Researcher │ │ codebase-locator │
│ │ │ Tools: Grep, │
│ Tools: ALL OF THEM │ │ Glob, LS │
│ │ │ Job: find files │
│ - Find files │ └────────────────────┘
│ - Analyze code │ ┌────────────────────┐
│ - Query database │ │ codebase-analyzer │
│ - Understand patterns │ │ Tools: Read, │
│ - Synthesize findings │ │ Grep, Glob, LS │
│ │ │ Job: explain code │
│ 50+ steps │ └────────────────────┘
│ Huge context │ ┌────────────────────┐
│ Gets lost │ │ web-researcher │
│ │ │ Tools: WebSearch, │
│ │ │ WebFetch, Read │
│ │ │ Job: find docs │
└─────────────────────────┘ └────────────────────┘
Each: 5-10 steps, stays focused
These patterns come from real prompt engineering experience. They're not in the 12 Factors.
Tell the agent what NOT to do. This prevents drift.
# Bad: only positive instructions
"Analyze the codebase and describe what you find."
# Good: positive + negative instructions
"Analyze the codebase and describe what you find.
DO NOT suggest improvements.
DO NOT perform root cause analysis.
DO NOT critique the implementation.
ONLY describe what exists, how it works, and how components interact."
Without negative instructions, the agent starts "helping": suggesting improvements, critiquing code, going off on tangents. With them, it stays focused. So be explicit about what you don't want.
Define exact format for consistent, parseable results.
# In codebase-analyzer.md
# (https://github.com/humanlayer/humanlayer/blob/main/.claude/agents/codebase-analyzer.md):
## Analysis: [Component Name]
### Overview
[2-3 sentence summary]
### Entry Points
- `file.ts:45` - description of what's there
### Core Implementation
#### 1. [Step name] (`file.ts:15-32`)
- What it does
- How it connects to the next step
### Data Flow
1. Request arrives at `api/routes.ts:45`
2. Routed to `handlers/webhook.ts:12`
3. Validated at `handlers/webhook.ts:15-32`
Without a template, every response looks different. But with a template, results are predictable and can be parsed by other agents.
Limit tools to limit what the agent CAN do. This is a physical constraint, not just instructions.
| Agent | Tools | What it CAN do | What it CANNOT do |
|---|---|---|---|
codebase-locator |
Grep, Glob, LS | Find files | Read file contents |
codebase-analyzer |
Read, Grep, Glob, LS | Read and analyze | Run commands, edit files |
web-researcher |
WebSearch, WebFetch, Read | Search the web | Modify local files |
If the agent doesn't have the Edit tool, it physically can't edit files. Not "please don't" but "literally impossible."
The orchestrator must understand context before delegating to sub-agents.
WRONG: RIGHT:
User asks question User asks question
│ │
├──> Spawn agent 1 ├──> READ mentioned files first
├──> Spawn agent 2 │ (understand the full context)
└──> Spawn agent 3 │
├──> Plan sub-tasks based on
Agents get vague tasks │ what you actually read
Results are unfocused │
├──> Spawn agent 1 (specific task)
├──> Spawn agent 2 (specific task)
└──> Spawn agent 3 (specific task)
Agents get precise tasks
Results are focused
Stop and ask instead of guessing. Five seconds to clarify saves hours of rework.
# From create_plan.md
# (https://github.com/humanlayer/humanlayer/blob/main/.claude/commands/create_plan.md):
"If you encounter open questions during planning, STOP.
Research or ask for clarification immediately.
Do NOT write the plan with unresolved questions."
# From implement_plan.md
# (https://github.com/humanlayer/humanlayer/blob/main/.claude/commands/implement_plan.md):
"When things don't match the plan:
Issue in Phase [N]:
Expected: [what the plan says]
Found: [actual situation]
Why this matters: [explanation]
How should I proceed?"
Dex Horthy's Advanced Context Engineering for Coding Agents explains why context management is the most important skill for working with AI coding tools. The key insight: the context window is the only lever you have to affect the quality of the output.
Context window filling up:
[system prompt ████]
[user message ██]
[grep results ████████████████████] <-- searching for files
[file contents ████████████████████████] <-- reading code
[more grep ████████████] <-- more searching
[edit attempts ████████████████] <-- trial and error
[test output ████████████████████████████] <-- build logs
[error logs ████████████████] <-- debugging
[more edits ████████████████████] <-- fixes
════════════════════════════════════════════
Context: 87% full. Agent is lost.
It forgot the original goal 40 messages ago.
Design your entire workflow around context management. Keep utilization in the 40-60% range. Split work into roughly three phases (sometimes you skip research and go straight to planning, sometimes you do multiple research passes before you're ready):
Phase 1: RESEARCH Phase 2: PLAN Phase 3: IMPLEMENT
(fresh context) (fresh context) (fresh context)
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Input: │ │ Input: │ │ Input: │
│ - ticket/issue │ │ - research.md │ │ - plan.md │
│ - codebase │ │ - ticket/issue │ │ - codebase │
│ │ │ │ │ │
│ Agent searches, │ │ Agent creates │ │ Agent follows │
│ reads, maps the │ │ step-by-step │ │ plan phase by │
│ codebase │ │ implementation │ │ phase │
│ │ │ plan │ │ │
│ Output: │ │ Output: │ │ Output: │
│ research.md │ │ plan.md │ │ working code │
└────────┬────────┘ └────────┬────────┘ └────────┬────────┘
│ │ │
v v v
┌───────────┐ ┌───────────┐ ┌───────────┐
│ HUMAN │ │ HUMAN │ │ HUMAN │
│ REVIEW │ │ REVIEW │ │ REVIEW │
│ │ │ │ │ │
│ Is the │ │ Is the │ │ Does the │
│ research │ │ plan │ │ code │
│ correct? │ │ sound? │ │ work? │
└───────────┘ └───────────┘ └───────────┘
Highest leverage! High leverage! Standard review
Each phase starts with a fresh context window. The output of one phase becomes a compact input for the next. This is the core idea: instead of one long messy conversation, you have three focused sessions.
Sub-agents aren't about role-playing. They're about using a fresh context window for searching and summarizing, so the main agent stays clean.
Main Agent (orchestrator)
Context: 35% used
┌──────────────────────────────────────────────┐
│ system prompt + user question + sub-agent │
│ results (compact summaries) │
│░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░│
└──────────────────────────────────────────────┘
Sub-agent 1 (locator) Sub-agent 2 (analyzer) Sub-agent 3 (researcher)
Uses own context Uses own context Uses own context
┌────────────────┐ ┌────────────────┐ ┌────────────────┐
│ 15 grep calls │ │ reads 8 files │ │ 5 web searches │
│ 10 glob calls │ │ traces 3 flows │ │ 3 page fetches │
│ 80% used │ │ 70% used │ │ 60% used │
└───────┬────────┘ └───────┬────────┘ └───────┬────────┘
│ │ │
v v v
Returns: 15 lines Returns: 40 lines Returns: 20 lines
(file locations) (code analysis) (documentation)
All that noise stays in sub-agent context.
Main agent only sees the compact summaries.
The prompts I reference throughout this article are from the humanlayer/humanlayer/.claude repository. You can look at the originals to understand the full picture. They are a good example of "prompts as code" that you can version control, test, and share.
| File | What it does | |
|---|---|---|
| Agents | codebase-analyzer.md | Reads and explains code |
| codebase-locator.md | Finds files (no Read tool!) | |
| codebase-pattern-finder.md | Finds code patterns | |
| web-search-researcher.md | Searches the web | |
| Commands | commit.md | Simple: analyze changes, commit |
| create_plan.md | Workflow: research, plan, iterate | |
| describe_pr.md | Simple: generate PR description | |
| implement_plan.md | Workflow: execute plan phase by phase | |
| iterate_plan.md | Workflow: update existing plans | |
| research_codebase.md | Orchestrator: spawn agents, synthesize |
Not all prompts are the same. Here's how they differ (see commit.md, implement_plan.md, research_codebase.md):
SIMPLE PROMPT (commit.md)
─────────────────────────
User: /commit
│
├─ git status + diff
├─ analyze changes
├─ present plan ─────── Human: "looks good" ──── execute commits
│
One task. Linear. One human checkpoint.
WORKFLOW PROMPT (implement_plan.md)
───────────────────────────────────
User: /implement plan.md
│
├─ read plan
├─ execute Phase 1
├─ update checkboxes
├─ ─── Human verifies Phase 1 ───
├─ execute Phase 2
├─ update checkboxes
├─ ─── Human verifies Phase 2 ───
└─ ... until all phases done
Sequential. Multiple human gates. Persistent state (plan file).
ORCHESTRATOR PROMPT (research_codebase.md)
──────────────────────────────────────────
User: /research "how does auth work?"
│
├─ READ mentioned files first
│
├─ Spawn sub-agents in parallel:
│ ├─ codebase-locator ──── finds files
│ ├─ codebase-analyzer ─── explains code
│ └─ web-researcher ────── finds docs
│
├─ WAIT for all sub-agents
│
└─ Synthesize into research document
Delegates work. Parallel execution. Synthesis focus.
| Type | Who does the work | Sub-agents | Human interaction |
|---|---|---|---|
| Simple | Agent directly | None | Confirm then execute |
| Workflow | Agent, phase by phase | Optional | Gates between phases |
| Orchestrator | Sub-agents | Core mechanism | Minimal (review synthesis) |
Rule of thumb: Start simple. Add workflow when you need human checkpoints between phases. Add orchestrator when you need parallel research.
You can't just tell the agent "use ticket NUMBER-123 and research." That's too vague. The agent won't know what to look for, what's important, or when to stop.
BAD GOOD
─── ────
"Research ticket ENG-1234" "Research the payment processing flow.
Focus on Stripe webhook handling.
I need to understand how payment
status gets updated in the database.
Relevant code: src/services/payments/
and src/api/webhooks/."
"Fix the bug" "/create_plan eng_1234.md
Think about the migration strategy.
We cannot have downtime.
Look at how we handled PR #456."
"Implement the feature" "/implement plan.md
Start with Phase 1 only.
Run tests after each change.
If something doesn't match the plan,
stop and tell me."
Every good prompt to an existing agent follows this structure:
┌──────────────────────────────────────────┐
│ 1. SCOPE: What exactly to work on │
│ "Research the payment processing │
│ flow in our codebase" │
│ │
│ 2. FOCUS: Where to look │
│ "Relevant code is probably in │
│ src/services/payments/" │
│ │
│ 3. CONTEXT: What matters and why │
│ "We need to understand this because │
│ we are migrating to Stripe v3" │
│ │
│ 4. BOUNDARIES: When to stop or ask │
│ "If you find more than 3 services │
│ involved, stop and tell me before │
│ going deeper" │
└──────────────────────────────────────────┘
The prompts in .claude/commands/ already have good structure built in (negative instructions, output templates, step-by-step strategies, human checkpoints). Your job is to give them specific context to work with, not vague directions.
Here's what makes the prompts in .claude/agents/ effective. Using codebase-analyzer.md as an example:
---
name: codebase-analyzer
tools: Read, Grep, Glob, LS # Limited tools = limited scope
model: sonnet # Cheaper model for focused tasks
---# Role (one sentence)
"You are a specialist at understanding HOW code works."
# Negative instructions (prevent drift)
"DO NOT suggest improvements"
"DO NOT critique the implementation"
"ONLY describe what exists"
# Step-by-step strategy (how to do the job)
Step 1: Read Entry Points
Step 2: Follow the Code Path
Step 3: Document Key Logic
# Output template (consistent format)
## Analysis: [Name]
### Overview
### Entry Points
- `file:line` - description
### Core Implementation
### Data Flow
# Closing reminder
"REMEMBER: You are a documentarian, not a critic."This structure works because each part prevents a specific failure mode:
- Limited tools prevent the agent from doing things outside its scope
- Negative instructions prevent it from drifting into "helpful" suggestions
- Step-by-step strategy prevents random, inconsistent analysis
- Output template prevents unparseable responses
- Closing reminder reinforces the constraints (LLMs pay attention to the end of prompts)
Some things are just hard to do without AI tools:
- Navigating unfamiliar codebases. Dex Horthy shipped a solo bug fix to BAML, a 300k LOC Rust codebase he'd never touched. Then he and Vaibhav paired for 7 hours and shipped 35k LOC (cancellation support + WASM compilation). The BAML team estimated each of those at 3-5 days for a senior engineer.
- Parallel research. You can spawn multiple focused agents to investigate different parts of the codebase at the same time. One finds files, another analyzes code, another checks the database schema. The orchestrator synthesizes everything.
- Once you have a good plan, the implementation phase is straightforward. The agent follows the spec, and the code style matches your existing codebase because the agent read it first. Consistent code generation without the usual drift.
- Onboarding. An intern at HumanLayer shipped 2 PRs on his first day and 10 on his 8th day. Research prompts let new team members get up to speed fast.
- And there's mental alignment. Instead of reading 2000 lines of code in a PR, you read 200 lines of a well-written implementation plan. You know what's being built and why.
These are real benefits. They don't make you 10x faster at everything. But they make some previously painful tasks easier.
This work is heavily based on and inspired by other people's work. I want to give proper credit.
12-Factor Agents by Dex Horthy (HumanLayer, YC24). The foundation for understanding how to build reliable AI agents. My article adapts 5 of the 12 factors for prompt engineering use. The original content is licensed under CC BY-SA 4.0.
Advanced Context Engineering for Coding Agents by Dex Horthy. The article about frequent intentional compaction and the research/plan/implement workflow.
We Mourn Our Craft by Nolan Lawson. An honest and emotional piece about accepting the AI shift in software development.
Vibe Coding Our Way to Disaster by Jake Nations. About the risks of unstructured AI coding, based on Rich Hickey's ideas about simplicity vs. ease.
Context Engineering - term popularized by Andrej Karpathy for the art of providing all the context needed for a task to be plausibly solvable by an LLM.
The .claude prompts referenced in this article are from humanlayer/humanlayer/.claude, created by Dex Horthy for use with Claude Code inside the CodeLayer IDE.
Specs Are the New Code by Sean Grove. The idea that specifications will become the real source code.
Stanford Study on AI's Impact on Developer Productivity - research showing that AI tools sometimes reduce productivity in established codebases.
Content in this repository is licensed under CC BY-SA 4.0, consistent with the 12-Factor Agents content license.
Code examples are licensed under Apache 2.0.