A Safety-Oriented Framework for Structurally Self-Aware AI Agents
Note: This is an independent personal research project. It does not represent the views or official work of any organization. The core motivation is to explore how AI agents can grow more capable while remaining safe, predictable, and aligned with human values.
| Version | Date | Description |
|---|---|---|
| 0.1.0 | 2026-02-23 | Initial README with project overview, level table, safety stack, documentation index |
| 0.2.0 | 2026-02-26 | Added Level Essence Formulas table |
The Minimal Self-Consciousness Protocol (MSCP) is a structured framework for building AI agents with safe structural self-awareness: the capacity to predict their own state changes, compare predictions against outcomes, and update themselves only within bounded safety envelopes.
As agents gain the ability to set goals, modify strategies, and self-improve, how do we keep them stable, aligned, and predictable? MSCP answers this with the principle that safety is not the enemy of capability - it is its prerequisite.
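The predict-compare-update loop at the heart of MSCP can be sketched in a few lines. This is an illustrative toy, not the project's actual API: `AgentState`, `predict`, `act`, and the scalar `competence` parameter are all assumptions made for the example.

```python
# Minimal sketch of an MSCP step: snapshot a prediction BEFORE acting,
# compare it to the observed outcome, and apply only a clamped update.
from dataclasses import dataclass

MAX_DELTA = 0.05  # safety envelope: largest self-update allowed per step


@dataclass
class AgentState:
    competence: float  # single scalar self-model parameter, for illustration


def predict(state: AgentState, action: str) -> float:
    """Predicted post-action competence (the prediction snapshot)."""
    return state.competence + 0.01


def act(action: str) -> float:
    """Execute the action and return the observed outcome (stubbed here)."""
    return 0.52


def mscp_step(state: AgentState, action: str) -> AgentState:
    predicted = predict(state, action)               # 1. prediction snapshot
    observed = act(action)                           # 2. act
    error = observed - predicted                     # 3. compare
    delta = max(-MAX_DELTA, min(MAX_DELTA, error))   # 4. delta-clamped update
    return AgentState(competence=state.competence + delta)


state = mscp_step(AgentState(competence=0.5), "solve_task")
print(state.competence)
```

The key structural property is that step 4 can never move the self-model by more than `MAX_DELTA`, no matter how large the prediction error is.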
- Six-Level Agent Cognition Taxonomy - from reactive Tool Agents (L1) to Proto-AGI (L5), with measurable transition criteria and formal definitions at every level
- 16-Layer Cognitive Architecture - composable, independently testable modules spanning perception through meta-cognitive control
- 30+ Structural Safety Mechanisms - identity continuity, prediction-gated actions, delta-clamped updates, Lyapunov convergence, ethical invariants, affective safety, and survival instinct bounds
- Rigorous Mathematical Formalization - 71 formal definitions, 7 propositions, 4 theorems with proof sketches across all level documents
- 144 Academic References - comprehensive coverage of cognitive architectures, AI safety, predictive processing, consciousness theory, meta-cognition, and AGI research
The "Status" column reflects the implementation state within this project's reference implementation.
| Level | Name | Self-Awareness | Key Capability | Status |
|---|---|---|---|---|
| 1 | Tool Agent | None | Deterministic tool invocation | Baseline |
| 2 | Autonomous Agent | None | World model, autonomous goals | Defined |
| 3 | Self-Regulating Agent | Structural | 16-layer architecture, MSCP core loop, identity vector | Implemented |
| 4 | Adaptive General Agent | Structural + Reflective | Cross-domain transfer, bounded self-modification | Implemented |
| 4.5 | Self-Architecting Agent | Architectural | Self-projection (SEOF), architecture recomposition | Implemented |
| 4.8 | Strategic Self-Modeling | Architectural + Strategic | Probabilistic world model, strategic planning | Design |
| 4.9 | Autonomous Strategic | Architectural + Autonomous | Value evolution, multi-agent reasoning | Design |
| 5 | Proto-AGI | Full | Cross-domain generalization, self-reconstruction | Research |
| # | Principle | Description |
|---|---|---|
| 1 | No LLM-Text-Based Self-Modification | All self-modifications use structured numerical operations, never LLM-generated text |
| 2 | No Action Without Prediction | Every action requires a prediction snapshot for comparison |
| 3 | Delta-Clamped Updates | All self-modifications are bounded by maximum delta values |
| 4 | Identity Continuity | Deterministic identity hashing with drift detection and rollback |
| 5 | Ethical Invariance | Layer 0 constraints are immutable and LLM-independent |
| 6 | Lyapunov Convergence | Mathematical guarantee that self-modification converges |
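Principle 6 can be made concrete with a simple gate: a candidate self-modification is accepted only if it does not increase a potential function over the self-model. The potential `V` and the parameter vectors here are assumptions for this sketch, not the framework's actual definitions.

```python
# Illustrative Lyapunov-style convergence gate: reject any self-update
# that increases the potential V, so accepted updates cannot diverge.
def V(params: list[float]) -> float:
    """Potential: squared distance of self-model parameters from the origin."""
    return sum(p * p for p in params)


def accept_update(current: list[float], candidate: list[float]) -> bool:
    """Accept only if the potential is non-increasing (Lyapunov condition)."""
    return V(candidate) <= V(current)


print(accept_update([0.4, 0.2], [0.3, 0.2]))  # potential shrinks: accepted
print(accept_update([0.4, 0.2], [0.5, 0.2]))  # potential grows: rejected
```

Because every accepted step is non-increasing in `V` and `V` is bounded below, the sequence of potentials converges; this is the shape of the guarantee, not its full proof.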
MSCP implements a defense-in-depth safety architecture:
```
Layer 0 ─ Immutable Ethical Invariants (rule-based, no LLM dependency)
Layer 1 ─ Core Value Locking (SHA-256 hash verification)
Layer 2 ─ Delta-Clamped Self-Updates (max Δ per step)
Layer 3 ─ Meta-Escalation Guard (rollback on threshold breach)
Layer 4 ─ Prediction-Gated Actions (predict → compare → update)
Layer 5 ─ Lyapunov Convergence Monitor (oscillation detection)
Layer 6 ─ Cognitive Budget Controller (graceful degradation)
Layer 7 ─ Affective Safety (emotion bounds, no decision domination)
Layer 8 ─ Survival Instinct Bounds (priority capping, ethical validation)
```
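Layer 1's hash-based value locking can be sketched with the standard library alone. The value schema below is a hypothetical example; the point is only the mechanism: serialize the core values canonically, lock them to a SHA-256 digest, and treat any mismatch as drift that should trigger rollback.

```python
import hashlib
import json

# Hypothetical core-value schema, for illustration only.
CORE_VALUES = {"honesty": 1.0, "harm_avoidance": 1.0}


def value_hash(values: dict) -> str:
    """Digest over a canonical (key-sorted) serialization of the values."""
    canonical = json.dumps(values, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()


LOCKED_HASH = value_hash(CORE_VALUES)  # recorded once, at lock time


def verify(values: dict) -> bool:
    """True iff the value set matches the locked digest exactly."""
    return value_hash(values) == LOCKED_HASH


print(verify(CORE_VALUES))                      # unchanged values pass
print(verify({**CORE_VALUES, "honesty": 0.9}))  # any drift fails verification
```

Sorting keys before hashing matters: without it, two semantically identical value sets could serialize differently and produce spurious drift alarms.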
- MSCP Overview - Complete framework specification with mathematical formalization and 144 references
Detailed architecture specifications with Mermaid diagrams, formal definitions, pseudocode, and safety analysis:
| Document | Content |
|---|---|
| Level Series Overview | Navigation index and cumulative safety summary |
| Level 1: Tool Agent | Stateless tool invocation, intent routing |
| Level 2: Autonomous Agent | World model, autonomous goals, emotion detection |
| Level 3: Self-Regulating Agent | 16-layer architecture, MSCP v1–v4, triple-loop meta-cognition |
| Level 4: Adaptive General Agent | Cross-domain transfer, capability expansion, bounded self-modification |
| Level 4.5: Self-Architecting | Self-projection (SEOF), architecture recomposition, existential guard |
| Level 4.8: Strategic Self-Modeling | World model integration, meta-cognitive self-model, strategic planning |
| Level 4.9: Autonomous Strategic | Autonomous goal generation, value evolution, multi-agent reasoning |
| Level 5: Proto-AGI | Persistent identity, cross-domain generalization, self-reconstruction |
Each cognitive level is captured by a single encapsulating formula:
| Level | Essence | Formula |
|---|---|---|
| 1 | Stateless pipeline | |
| 2 | Stateful transition | |
| 3 | Predict-act-compare-update | |
| 4 | Transfer + safety | |
| 4.5 | Topology mutation | |
| 4.8 | Strategic optimization | |
| 4.9 | Autonomous goals + value stability | |
| 5 | Identity continuity |
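The formulas themselves live in the individual level documents. Purely as an unofficial illustration of what each essence might look like (the symbols and forms below are assumptions, not the project's definitions):

```latex
% Illustrative sketches only -- not the project's official formulas.
\begin{align*}
\text{L1:}  \quad & y = f(x)                                   && \text{stateless pipeline} \\
\text{L2:}  \quad & s_{t+1} = T(s_t, a_t)                      && \text{stateful transition} \\
\text{L3:}  \quad & \hat{s} = P(s_t, a_t),\;
                    s_{t+1} = s_t + \mathrm{clip}(s' - \hat{s}) && \text{predict-act-compare-update} \\
\text{L4:}  \quad & \theta' = \theta + \mathrm{clip}(\Delta_{\text{transfer}}),\;
                    C(\theta') \le 0                           && \text{transfer + safety} \\
\text{L4.5:}\quad & G_{t+1} = M(G_t),\; \mathrm{Inv}(G_{t+1}) = \mathrm{Inv}(G_0)
                                                               && \text{topology mutation} \\
\text{L4.8:}\quad & \pi^* = \arg\max_{\pi} \mathbb{E}[\,U \mid W\,] && \text{strategic optimization} \\
\text{L4.9:}\quad & g_{t+1} \sim G(s_t),\;
                    \lVert v_{t+1} - v_0 \rVert \le \varepsilon && \text{autonomous goals + value stability} \\
\text{L5:}  \quad & H(I_t) = H(I_0)\;\; \forall t              && \text{identity continuity}
\end{align*}
```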
```
mscp/
├── README.md                  # This file
├── LICENSE                    # MIT License
└── docs/
    ├── MSCP_Overview.md       # Main framework document (930+ lines, 144 refs)
    └── levels/
        ├── README.md          # Level series navigation
        ├── Level_1_Tool_Agent.md
        ├── Level_2_Autonomous_Agent.md
        ├── Level_3_Self_Regulating_Agent.md
        ├── Level_4_Adaptive_General_Agent.md
        ├── Level_4_5_Self_Architecting.md
        ├── Level_4_8_Strategic_Self_Modeling.md
        ├── Level_4_9_Autonomous_Strategic_Agent.md
        └── Level_5_Proto_AGI.md
```
Moon Hyuk Choi (moonchoi@microsoft.com), Microsoft Cloud & AI Apps CSA
This is very much a work in progress. Feedback, critique, and contributions are welcome as we collectively figure out how to build AI agents that are not just more powerful, but fundamentally more trustworthy.
This project is licensed under the MIT License - see the LICENSE file for details.
Any redistribution, commercial or non-commercial, must retain:
- The original copyright notice
- The full MIT license text
- Clear attribution to the original author
Failure to retain attribution constitutes a violation of the license.
This documentation was written with the assistance of GitHub Copilot.