# HARNESS.md

## Purpose
- Define how Claude Code, Codex, and a local LLM collaborate safely and reproducibly.
- Prioritize harness quality (context, constraints, verification, logs) over prompt size.

## Roles
- Codex: task decomposition, design decisions, review criteria, documentation updates.
- Claude Code: implementation, command execution, verification runs, patch application.
- Local LLM: lightweight drafting, refactoring suggestions, alternative implementations.

## Task Contract (required for each request)
Use exactly three lines:
1. Purpose: what outcome is needed.
2. Target: repo/file/environment.
3. Done: objective completion criteria.

## Operating Rules
- One task at a time. Do not mix implementation and broad redesign in a single request.
- Keep prompts map-first: pass file paths and priorities, not long chat logs.
- Prefer small, reversible changes. Split large changes into staged commits.
- Update docs in the same task whenever behavior or spec changes.

## Context Policy
- Never inject full conversation history.
- Provide only:
  - the current task contract,
  - the relevant file list,
  - the latest failing command plus the first 5 and last 5 log lines,
  - current assumptions.
- If context exceeds the budget, summarize it into a short state snapshot and continue.

## Safety Rails
- Destructive operations require explicit user approval.
- Never expose secrets in prompts, logs, or commits.
- Limit edits to task-related paths.
- If unexpected changes to unrelated files are detected, stop and confirm with the user.

## Verification Policy
Minimum verification per change:
- L1: static checks (format/lint/type).
- L2: focused tests for the touched behavior.
- L3: scenario/regression checks for user-facing flows.

A task is not complete until verification commands and their results are recorded.

## Quality Gates
- Build, tests, and lint must pass (or failures must be explicitly accepted by the user).
- Architecture constraints must hold (dependency direction, layering, naming).
- No silent behavior changes without test or doc updates.
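The Context Policy's log-trimming rule (latest failing command plus the first 5 and last 5 log lines) can be sketched as a small helper. The function name and the omission marker are assumptions for illustration, not part of any mandated tooling:

```python
def truncate_log(log: str, head: int = 5, tail: int = 5) -> str:
    """Keep only the first `head` and last `tail` lines of a command's log.

    Short logs are passed through unchanged; long logs get a marker
    showing how many middle lines were dropped.
    """
    lines = log.splitlines()
    if len(lines) <= head + tail:
        return log
    omitted = len(lines) - head - tail
    return "\n".join(lines[:head] + [f"... {omitted} lines omitted ..."] + lines[-tail:])
```

Applying this before injecting a failing run into a prompt keeps the context budget predictable regardless of how verbose the tool's output is.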
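The three verification levels above can be sketched as a tiered runner that records each command's result and stops at the first failing level. The structure and stop-on-failure behavior are assumptions about how a harness might apply this policy; the actual commands per level (formatter, test runner, scenario suite) are project-specific:

```python
import subprocess
from dataclasses import dataclass


@dataclass
class VerificationResult:
    level: str          # "L1", "L2", or "L3"
    command: list[str]  # the command that was run
    passed: bool
    output: str         # captured stdout + stderr, for the task record


def run_level(level: str, command: list[str]) -> VerificationResult:
    """Run one verification command and capture its outcome."""
    proc = subprocess.run(command, capture_output=True, text=True)
    return VerificationResult(
        level=level,
        command=command,
        passed=proc.returncode == 0,
        output=proc.stdout + proc.stderr,
    )


def verify(commands: dict[str, list[str]]) -> list[VerificationResult]:
    """Run configured levels in L1 -> L3 order; stop at the first failure."""
    results = []
    for level in ("L1", "L2", "L3"):
        if level not in commands:
            continue
        result = run_level(level, commands[level])
        results.append(result)
        if not result.passed:
            break  # deeper levels are meaningless on a failing base
    return results
```

The returned list doubles as the "verification commands and results" entry required by the Logging and Traceability section, so nothing has to be re-derived at record time.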
## Logging and Traceability
For each task, record:
- task_id
- input contract
- changed files
- verification commands and results
- follow-up risks

Store task records under `state/tasks/` as Markdown or JSON.

## Golden Dataset Loop
- Capture recurring failures in `tests/golden/`.
- Add a reproduction case before fixing, when possible.
- Re-run the golden set on relevant changes.

## Maintenance Cadence
- Weekly: prune stale rules/context and merge duplicates.
- Monthly: review failure patterns and update guardrails.
- After incidents: add regression tests and update this harness.

## Escalation Rules
Escalate to the user when:
- a requirement is ambiguous and risky,
- there is security or compliance impact,
- production-impacting changes are needed,
- verification cannot be executed locally.
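The JSON variant of a task record under `state/tasks/` can be sketched as follows. The field names mirror the Logging and Traceability list; the `<task_id>.json` filename scheme and the exact key spellings are assumptions, not a fixed schema:

```python
import json
from pathlib import Path


def write_task_record(
    task_id: str,
    contract: str,
    changed_files: list[str],
    verification: list[dict],
    risks: list[str],
    root: Path = Path("state/tasks"),
) -> Path:
    """Persist one task record as JSON under state/tasks/.

    `verification` is expected to hold entries like
    {"command": "...", "result": "pass"}, matching the policy's
    requirement that commands and results both be recorded.
    """
    record = {
        "task_id": task_id,
        "input_contract": contract,
        "changed_files": changed_files,
        "verification": verification,
        "follow_up_risks": risks,
    }
    root.mkdir(parents=True, exist_ok=True)
    path = root / f"{task_id}.json"  # filename scheme is an assumption
    path.write_text(json.dumps(record, indent=2))
    return path
```

Because records are plain JSON keyed by task_id, the weekly and monthly maintenance passes can grep or load them mechanically when reviewing failure patterns.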