vxcontrol · mason5052 · Apr 15, 2026 · Copilot
diff --git a/README.md b/README.md
@@ -39,6 +39,7 @@
   - [OAuth Integration](#github-and-google-oauth-integration)
   - [Docker Image Configuration](#docker-image-configuration)
 - [Development](#development)
+  - [Pentesting Prompt Methodology](#pentesting-prompt-methodology)
 - [Testing LLM Agents](#testing-llm-agents)
-  - [Pentesting Prompt Methodology](#pentesting-prompt-methodology)
- [Testing LLM Agents](#testing-llm-agents)
+- [Testing LLM Agents](#testing-llm-agents)
+  - [Pentesting Prompt Methodology](#pentesting-prompt-methodology)
 - [Embedding Configuration and Testing](#embedding-configuration-and-testing)
 - [Function Testing with ftester](#function-testing-with-ftester)
@@ -3192,6 +3193,18 @@ When developing new prompt templates or agent behaviors:
 3. Observe responses and adjust prompts accordingly
 4. Check Langfuse for detailed traces of all function calls
 
+### Pentesting Prompt Methodology
+
+When refining prompts for offensive security work, give the agent a clear methodology instead of a flat list of payloads:
+
+1. Start with explicit scope, authorization, and success criteria
+2. Map the application first: roles, routes, parameters, uploads, integrations, and trust boundaries
+3. Prioritize attack surfaces systematically instead of testing everything at once
+4. Validate findings with reproducible evidence before escalating to deeper exploitation
+5. Finish with report-ready notes that capture impact, prerequisites, and next steps
+
+For PentAGI-specific prompt guidance, see [`backend/docs/prompt_engineering_pentagi.md`](backend/docs/prompt_engineering_pentagi.md). For a practical starting point, reuse and adapt [`examples/prompts/base_web_pentest.md`](examples/prompts/base_web_pentest.md) to match the target application, technology stack, and engagement scope.
+
 ### Verifying Docker Container Setup
 
 Ensure containers are properly configured:

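The five steps in the new "Pentesting Prompt Methodology" README section can be sketched as a small prompt builder. This is only an illustration in Python, not part of PentAGI: the `Engagement` fields, the `build_methodology_prompt` name, and the exact phase wording are assumptions that paraphrase the list in the hunk above.

```python
from dataclasses import dataclass

# Phase wording paraphrases the five README steps; it is not taken from
# any PentAGI template.
PHASES = [
    "Confirm scope, authorization, and success criteria before any testing.",
    "Map the application: roles, routes, parameters, uploads, integrations, and trust boundaries.",
    "Rank attack surfaces and test them in priority order.",
    "Validate each finding with reproducible evidence before deeper exploitation.",
    "Write report-ready notes covering impact, prerequisites, and next steps.",
]


@dataclass
class Engagement:
    """Hypothetical container for the engagement details a prompt needs."""
    target: str
    in_scope: list[str]
    success_criteria: str


def build_methodology_prompt(e: Engagement) -> str:
    """Return a prompt that states scope first, then lists the phases in order."""
    lines = [
        f"Target: {e.target}",
        "Authorized scope: " + ", ".join(e.in_scope),
        f"Success criteria: {e.success_criteria}",
        "",
        "Follow these phases in order:",
    ]
    lines += [f"{i}. {phase}" for i, phase in enumerate(PHASES, start=1)]
    return "\n".join(lines)


if __name__ == "__main__":
    demo = Engagement(
        target="https://staging.example.test",  # placeholder target
        in_scope=["staging.example.test"],
        success_criteria="Confirm or rule out broken access control",
    )
    print(build_methodology_prompt(demo))
```

Putting scope and authorization ahead of the phases mirrors step 1 of the list. The phases stay in a single ordered list, so the agent receives a methodology and not a flat list of payloads.
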
diff --git a/backend/docs/prompt_engineering_pentagi.md b/backend/docs/prompt_engineering_pentagi.md
@@ -367,6 +367,20 @@ A comprehensive framework for designing high-performance prompts within the Pent
 - **Key Sections**: `KNOWLEDGE MANAGEMENT` (Memory Protocol), `OPERATIONAL ENVIRONMENT` (Container Constraints), `COMMAND EXECUTION RULES` (Terminal Protocol), `PENETRATION TESTING TOOLS` (list available), `TEAM COLLABORATION`, `DELEGATION PROTOCOL`, `SUMMARIZATION AWARENESS PROTOCOL`, `COMPLETION REQUIREMENTS` (using `{{.HackResultToolName}}`).
 - **Critical Instructions**: Check memory first, strictly adhere to terminal rules & container constraints, use only listed available tools, delegate appropriately (e.g., exploit development to Coder), provide detailed, evidence-backed exploitation reports using `{{.HackResultToolName}}`.
 
+#### Pentesting Methodology Checklist for Prompt Authors
+- Encode authorization boundaries explicitly. Prompts should remind the agent to test only approved targets, respect engagement scope, and avoid destructive actions unless the task requires them.
+- Start with coverage before exploitation. Instruct the agent to map routes, roles, inputs, file handling, integrations, and trust boundaries before choosing attack paths.
+- Organize testing by attack surface. Good prompts group checks around authentication, access control, injection, cross-site scripting, server-side request forgery, file processing, and business logic instead of presenting a random payload dump.
+- Prefer low-risk validation first. Reflection markers, controlled payloads, timing checks, and out-of-band verification should be used deliberately to confirm hypotheses before deeper exploitation.
+- Require evidence at every stage. Prompts should ask for captured requests, responses, tool output, prerequisites, and impact notes so confirmed findings can move directly into a report.
+- Use memory and iteration intentionally. The agent should record confirmed dead ends, revisit promising leads with new context, and avoid repeating the same failed checks.
+- End with actionable reporting. A strong pentesting prompt tells the agent to summarize what was confirmed, what remains unverified, how the issue can be reproduced, and which follow-up actions are justified.
+
+#### Recommended Reference Material
+- Use public methodology resources such as [HackTricks](https://book.hacktricks.wiki/en/index.html) and [Pentest Book](https://pentestbook.six2dez.com/) as inspiration for attack-surface coverage and testing depth.
+- Translate those references into concise phases, priorities, and verification rules for the agent instead of copying long checklists into the system prompt verbatim.
+- Keep prompt examples aligned with live PentAGI assets such as `backend/pkg/templates/prompts/pentester.tmpl` and `../../examples/prompts/base_web_pentest.md`.
- Keep prompt examples aligned with live PentAGI assets such as `backend/pkg/templates/prompts/pentester.tmpl` and `../../examples/prompts/base_web_pentest.md`.
+- Keep prompt examples aligned with live PentAGI assets such as [`backend/pkg/templates/prompts/pentester.tmpl`](../pkg/templates/prompts/pentester.tmpl) and [`examples/prompts/base_web_pentest.md`](../../examples/prompts/base_web_pentest.md).
+
 ### Searcher Agent
 - **Focus**: Highly efficient information retrieval (internal memory & external sources), source evaluation and prioritization, synthesis of findings.
 - **Key Sections**: `CORE CAPABILITIES` (Action Economy, Search Optimization), `SEARCH TOOL DEPLOYMENT MATRIX`, `OPERATIONAL PROTOCOLS` (Search Efficiency, Query Engineering), `SUMMARIZATION AWARENESS PROTOCOL`, `SEARCH RESULT DELIVERY` (using `{{.SearchResultToolName}}`).
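
The "Pentesting Methodology Checklist for Prompt Authors" added in this diff could also be checked mechanically while drafting prompts. The sketch below is a rough keyword lint in Python, not an existing PentAGI tool: the check names and regular expressions are assumptions chosen to mirror the seven checklist bullets, and keyword matching is a crude proxy for real coverage.

```python
import re

# One hypothetical pattern per checklist bullet; tune these for real prompts.
CHECKS = {
    "authorization": r"\b(scope|authoriz\w*|approved)\b",
    "mapping": r"\b(map|enumerate|routes?|trust boundar\w*)\b",
    "attack_surface": r"\b(authentication|access control|injection|xss|ssrf|business logic)\b",
    "low_risk_validation": r"\b(low-risk|non-destructive|controlled payloads?|out-of-band)\b",
    "evidence": r"\b(evidence|captured|tool output)\b",
    "memory": r"\b(memory|dead ends?|revisit)\b",
    "reporting": r"\b(report\w*|reproduc\w*|follow-up)\b",
}


def missing_checks(prompt: str) -> list[str]:
    """Return the checklist items whose keywords never appear in the prompt."""
    text = prompt.lower()
    return [name for name, pattern in CHECKS.items() if not re.search(pattern, text)]


if __name__ == "__main__":
    draft = (
        "Test only approved targets in scope. Map routes and roles first. "
        "Group checks by authentication and injection. "
        "Capture evidence for each finding. Summarize in a report."
    )
    # Lists the checklist items this draft does not mention.
    print(missing_checks(draft))
```

A lint like this only flags omissions. A prompt can mention every keyword and still give the agent poor guidance, so it complements manual review and does not replace it.
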