akira/windmill

Akira c50082771d Add harness and index templates for agent workflow

2026-03-05 13:03:31 +09:00

HARNESS.md

Purpose

Define how Claude Code, Codex, and a local LLM collaborate safely and reproducibly.
Prioritize harness quality (context, constraints, verification, logs) over prompt size.

Roles

Codex: task decomposition, design decisions, review criteria, documentation updates.
Claude Code: implementation, command execution, verification runs, patch application.
Local LLM: lightweight drafting, refactoring suggestions, alternative implementations.

Task Contract (required for each request)

Use exactly 3 lines:

Purpose: what outcome is needed.
Target: repo/file/environment.
Done: objective completion criteria.
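A minimal sketch of how the three-line contract could be checked mechanically before a task starts. The field names come from the contract above; `TaskContract` and `parse_contract` are hypothetical names, and the strict field order is an assumption made for this example.

```python
from dataclasses import dataclass

# Field names are taken from the contract; everything else here is illustrative.
REQUIRED_FIELDS = ("Purpose", "Target", "Done")


@dataclass(frozen=True)
class TaskContract:
    purpose: str
    target: str
    done: str


def parse_contract(text: str) -> TaskContract:
    """Parse a three-line contract, rejecting missing, extra, or reordered fields."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if len(lines) != len(REQUIRED_FIELDS):
        raise ValueError(f"expected exactly 3 lines, got {len(lines)}")
    values = []
    for name, line in zip(REQUIRED_FIELDS, lines):
        # Split on the first colon only, so values may contain colons themselves.
        key, sep, value = line.partition(":")
        if not sep or key.strip() != name or not value.strip():
            raise ValueError(f"line must look like '{name}: <text>', got {line!r}")
        values.append(value.strip())
    return TaskContract(*values)
```

Rejecting a malformed contract up front keeps an underspecified request from reaching an agent at all.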

Operating Rules

One task at a time. Do not mix implementation and broad redesign in one request.
Keep prompts map-first: pass file paths and priorities, not long chat logs.
Prefer small reversible changes. Large changes must be split into staged commits.
Update docs in the same task when behavior/spec changes.

Context Policy

Never inject full conversation history.
Provide only:
- current task contract,
- relevant file list,
- latest failing command, plus the first 5 and last 5 lines of its log,
- current assumptions.
If the context exceeds its budget, summarize it into a short state snapshot and continue.
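The context rules above can be sketched as a small assembly step. This is an illustration only: `excerpt_log` and `build_context` are hypothetical helpers, and the section headings in the output are an assumed layout, not something this harness prescribes.

```python
def excerpt_log(log: str, head: int = 5, tail: int = 5) -> str:
    """Keep the first `head` and last `tail` lines, marking how many were dropped."""
    lines = log.splitlines()
    if len(lines) <= head + tail:
        return "\n".join(lines)  # short logs are passed through unchanged
    omitted = len(lines) - head - tail
    marker = f"... ({omitted} lines omitted) ..."
    return "\n".join(lines[:head] + [marker] + lines[len(lines) - tail:])


def build_context(contract: str, files: list[str], failing_command: str,
                  log: str, assumptions: list[str]) -> str:
    """Assemble only the four permitted inputs; chat history is never included."""
    sections = [
        "## Task contract\n" + contract.strip(),
        "## Relevant files\n" + "\n".join(f"- {path}" for path in files),
        "## Latest failing command\n" + failing_command + "\n" + excerpt_log(log),
        "## Assumptions\n" + "\n".join(f"- {item}" for item in assumptions),
    ]
    return "\n\n".join(sections)
```

Because the function takes no conversation argument, there is no path by which full history can leak into the prompt.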

Safety Rails

Destructive operations require explicit user approval.
Never expose secrets in prompts, logs, or commits.
Limit edits to task-related paths.
If unexpected unrelated file changes are detected, stop and confirm with user.
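One way to detect unrelated file changes is to compare the changed paths against the paths the task is allowed to touch. The sketch below assumes the caller already has a list of changed files (for instance from `git diff --name-only`) as repo-relative POSIX paths; `out_of_scope` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def out_of_scope(changed_files: list[str], allowed_paths: list[str]) -> list[str]:
    """Return the changed files that are not under any allowed path."""
    allowed = [PurePosixPath(p) for p in allowed_paths]
    stray = []
    for name in changed_files:
        path = PurePosixPath(name)
        # A file is in scope if it is an allowed path or lives beneath one.
        # Comparing path components avoids "src/authx" matching "src/auth".
        if not any(path == root or root in path.parents for root in allowed):
            stray.append(name)
    return stray
```

A non-empty result is the signal to stop and confirm with the user rather than to revert anything automatically. Paths containing `..` are not normalized here, so the caller is assumed to pass clean repo-relative paths.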

Verification Policy

Minimum verification per change:

L1: static checks (format/lint/type).
L2: focused tests for touched behavior.
L3: scenario/regression checks for user-facing flows.

A task is not complete unless verification commands and results are recorded.
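A rough sketch of recording verification commands together with their results, one entry per level. The function name `run_checks` and the result keys are assumptions; the commands in the example are stand-ins, and a real project would pass its own format, lint, type, and test commands.

```python
import shlex
import subprocess
import sys


def run_checks(checks: list[tuple[str, list[str]]]) -> list[dict]:
    """Run (level, argv) pairs in order and record each command with its outcome."""
    results = []
    for level, argv in checks:
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append({
            "level": level,
            "command": shlex.join(argv),  # the exact command, safe to copy and re-run
            "exit_code": proc.returncode,
            "passed": proc.returncode == 0,
        })
    return results


# Stand-in commands so the example runs anywhere Python does.
example = run_checks([
    ("L1", [sys.executable, "-c", "pass"]),
    ("L2", [sys.executable, "-c", "raise SystemExit(1)"]),
])
for entry in example:
    print(entry["level"], "PASS" if entry["passed"] else "FAIL", entry["command"])
```

The returned list is shaped so it can be stored directly in the task record as the verification commands and results.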

Quality Gates

Build/test/lint must pass (or failures must be explicitly accepted by user).
Architecture constraints must pass (dependency direction, layering, naming).
No silent behavior changes without test or doc updates.
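As an illustration of a dependency-direction gate, the sketch below flags imports that point from an inner layer to an outer one. It assumes a Python codebase whose top-level packages are named after layers; the layer names in `LAYERS` and the helper `layering_violations` are hypothetical, and other languages would need their own import scanner.

```python
import ast

# Hypothetical layers, ordered from innermost to outermost.
LAYERS = ["domain", "service", "api"]


def layering_violations(module_layer: str, source: str) -> list[str]:
    """Return imported layer packages that sit outside `module_layer`."""
    rank = {name: index for index, name in enumerate(LAYERS)}
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]  # absolute "from x import y" only
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            # Inner layers must not depend on outer ones.
            if top in rank and rank[top] > rank[module_layer]:
                violations.append(top)
    return violations
```

Relative imports are skipped in this sketch because they stay inside the current package.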

Logging and Traceability

For each task, record:

- task_id
- input contract
- changed files
- verification commands and results
- follow-up risks

Store task records under state/tasks/ as markdown or JSON.
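A minimal sketch of writing one task record as JSON under `state/tasks/`. The five fields mirror the list above; the JSON key spellings, the one-file-per-task naming, and the helper `write_task_record` are assumptions for this example.

```python
import json
from pathlib import Path


def write_task_record(task_id: str, contract: dict, changed_files: list[str],
                      verification: list[dict], risks: list[str],
                      root: str = "state/tasks") -> Path:
    """Write one JSON record per task and return the path that was written."""
    record = {
        "task_id": task_id,
        "input_contract": contract,
        "changed_files": changed_files,
        "verification": verification,
        "follow_up_risks": risks,
    }
    # Assumes task_id is a plain identifier that is safe to use as a file name.
    path = Path(root) / f"{task_id}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(record, indent=2, ensure_ascii=False) + "\n",
                    encoding="utf-8")
    return path
```

Keeping one file per task makes records easy to diff and review alongside the commits they describe.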

Golden Dataset Loop

Capture recurring failures into tests/golden/.
Add a reproduction case before fixing when possible.
Re-run golden set on relevant changes.
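The loop above could be driven by a small runner like the following. The case layout is an assumption: each file in `tests/golden/` is taken to be a JSON object with `input` and `expected` keys, and `run_golden` is a hypothetical helper, not part of any existing tool.

```python
import json
from pathlib import Path
from typing import Callable


def run_golden(golden_dir: str, func: Callable) -> list[str]:
    """Run `func` on every golden case and return the names of failing cases."""
    failures = []
    # Sorted order keeps runs reproducible across machines.
    for case_path in sorted(Path(golden_dir).glob("*.json")):
        case = json.loads(case_path.read_text(encoding="utf-8"))
        if func(case["input"]) != case["expected"]:
            failures.append(case_path.name)
    return failures
```

Adding the reproduction case first means the runner reports it as failing before the fix and as passing afterwards, which is what turns a recurring failure into a regression check.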

Maintenance Cadence

Weekly: prune stale rules/context and merge duplicates.
Monthly: review failure patterns and update guardrails.
After incidents: add regression tests and update this harness.

Escalation Rules

Escalate to user when:

- requirement is ambiguous and risky,
- security/compliance impact exists,
- production-impacting changes are needed,
- verification cannot be executed locally.
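The four escalation conditions can be written as an explicit check so that an agent reports why it stopped. This is only a sketch: `TaskSignals` and `escalation_reasons` are hypothetical names, and how each flag gets set is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class TaskSignals:
    ambiguous: bool = False
    risky: bool = False
    security_or_compliance_impact: bool = False
    production_impact: bool = False
    locally_verifiable: bool = True


def escalation_reasons(signals: TaskSignals) -> list[str]:
    """Return every rule that applies; an empty list means no escalation is needed."""
    reasons = []
    # The first rule needs both conditions, matching "ambiguous and risky".
    if signals.ambiguous and signals.risky:
        reasons.append("requirement is ambiguous and risky")
    if signals.security_or_compliance_impact:
        reasons.append("security/compliance impact exists")
    if signals.production_impact:
        reasons.append("production-impacting changes are needed")
    if not signals.locally_verifiable:
        reasons.append("verification cannot be executed locally")
    return reasons
```

Returning all matching reasons, rather than the first one, gives the user the full picture in a single escalation.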