78 lines
2.7 KiB
Markdown
78 lines
2.7 KiB
Markdown
# HARNESS.md
|
|
|
|
## Purpose
|
|
- Define how Claude Code, Codex, and local LLM collaborate safely and reproducibly.
|
|
- Prioritize harness quality (context, constraints, verification, logs) over prompt size.
|
|
|
|
## Roles
|
|
- Codex: task decomposition, design decisions, review criteria, documentation updates.
|
|
- Claude Code: implementation, command execution, verification runs, patch application.
|
|
- Local LLM: lightweight drafting, refactoring suggestions, alternative implementations.
|
|
|
|
## Task Contract (required for each request)
|
|
Use exactly 3 lines:
|
|
1. Purpose: what outcome is needed.
|
|
2. Target: repo/file/environment.
|
|
3. Done: objective completion criteria.
|
|
|
|
## Operating Rules
|
|
- One task at a time. Do not mix implementation and broad redesign in one request.
|
|
- Keep prompts map-first: pass file paths and priorities, not long chat logs.
|
|
- Prefer small reversible changes. Large changes must be split into staged commits.
|
|
- Update docs in the same task when behavior/spec changes.
|
|
|
|
## Context Policy
|
|
- Never inject full conversation history.
|
|
- Provide only:
|
|
- current task contract,
|
|
- relevant file list,
|
|
- latest failing command + first 5 and last 5 log lines,
|
|
- current assumptions.
|
|
- If context exceeds budget, summarize into a short state snapshot and continue.
|
|
|
|
## Safety Rails
|
|
- Destructive operations require explicit user approval.
|
|
- Never expose secrets in prompts, logs, or commits.
|
|
- Limit edits to task-related paths.
|
|
- If unexpected unrelated file changes are detected, stop and confirm with user.
|
|
|
|
## Verification Policy
|
|
Minimum verification per change:
|
|
- L1: static checks (format/lint/type).
|
|
- L2: focused tests for touched behavior.
|
|
- L3: scenario/regression checks for user-facing flows.
|
|
|
|
A task is not complete unless verification commands and results are recorded.
|
|
|
|
## Quality Gates
|
|
- Build/test/lint must pass (or failures explicitly accepted by user).
|
|
- Architecture constraints must pass (dependency direction, layering, naming).
|
|
- No silent behavior changes without test or doc updates.
|
|
|
|
## Logging and Traceability
|
|
For each task, record:
|
|
- task_id
|
|
- input contract
|
|
- changed files
|
|
- verification commands and results
|
|
- follow-up risks
|
|
|
|
Store task records under `state/tasks/` as markdown or JSON.
|
|
|
|
## Golden Dataset Loop
|
|
- Capture recurring failures into `tests/golden/`.
|
|
- Add a reproduction case before fixing when possible.
|
|
- Re-run golden set on relevant changes.
|
|
|
|
## Maintenance Cadence
|
|
- Weekly: prune stale rules/context and merge duplicates.
|
|
- Monthly: review failure patterns and update guardrails.
|
|
- After incidents: add regression tests and update this harness.
|
|
|
|
## Escalation Rules
|
|
Escalate to user when:
|
|
- requirement is ambiguous and risky,
|
|
- security/compliance impact exists,
|
|
- production-impacting changes are needed,
|
|
- verification cannot be executed locally.
|