HARNESS.md

Purpose

  • Define how Claude Code, Codex, and a local LLM collaborate safely and reproducibly.
  • Prioritize harness quality (context, constraints, verification, logs) over prompt size.

Roles

  • Codex: task decomposition, design decisions, review criteria, documentation updates.
  • Claude Code: implementation, command execution, verification runs, patch application.
  • Local LLM: lightweight drafting, refactoring suggestions, alternative implementations.

Task Contract (required for each request)

Use exactly 3 lines:

  1. Purpose: what outcome is needed.
  2. Target: repo/file/environment.
  3. Done: objective completion criteria.
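As an illustration only, the three-line contract could be checked mechanically before a request is accepted. The sketch below assumes each line is written as `Label: text` in the order Purpose, Target, Done; the names `TaskContract` and `parse_contract` are hypothetical and not part of this harness.

```python
from dataclasses import dataclass

# Labels in the order the contract requires them (assumed "Label: text" layout).
FIELDS = ("Purpose", "Target", "Done")


@dataclass(frozen=True)
class TaskContract:
    purpose: str
    target: str
    done: str


def parse_contract(text: str) -> TaskContract:
    """Parse a 3-line task contract, rejecting anything else."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if len(lines) != len(FIELDS):
        raise ValueError(f"expected exactly {len(FIELDS)} lines, got {len(lines)}")
    values = []
    for expected, line in zip(FIELDS, lines):
        # Split on the first colon only, so values may contain colons.
        label, sep, value = line.partition(":")
        if not sep or label.strip() != expected or not value.strip():
            raise ValueError(f"line must look like '{expected}: <text>', got {line!r}")
        values.append(value.strip())
    return TaskContract(*values)
```

Rejecting extra or missing lines keeps requests from drifting back into long free-form prompts.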

Operating Rules

  • One task at a time. Do not mix implementation and broad redesign in one request.
  • Keep prompts map-first: pass file paths and priorities, not long chat logs.
  • Prefer small reversible changes. Large changes must be split into staged commits.
  • Update docs in the same task when behavior/spec changes.

Context Policy

  • Never inject full conversation history.
  • Provide only:
    • current task contract,
    • relevant file list,
    • latest failing command plus the first 5 and last 5 lines of its log,
    • current assumptions.
  • If context exceeds budget, summarize into a short state snapshot and continue.
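The "first 5 and last 5 log lines" rule can be sketched as a small helper. This is a minimal illustration, not a prescribed implementation; the function name `log_excerpt` and the wording of the omission marker are assumptions.

```python
def log_excerpt(log_text: str, head: int = 5, tail: int = 5) -> str:
    """Keep the first `head` and last `tail` lines of a log, eliding the middle."""
    lines = log_text.splitlines()
    if len(lines) <= head + tail:
        # Short logs are passed through unchanged.
        return "\n".join(lines)
    omitted = len(lines) - head - tail
    marker = f"... ({omitted} lines omitted) ..."
    # Slice from an explicit index so tail=0 yields no trailing lines.
    return "\n".join(lines[:head] + [marker] + lines[len(lines) - tail:])
```

The head usually shows the command and setup, and the tail usually shows the error, which is why the middle is the part dropped.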

Safety Rails

  • Destructive operations require explicit user approval.
  • Never expose secrets in prompts, logs, or commits.
  • Limit edits to task-related paths.
  • If unexpected unrelated file changes are detected, stop and confirm with user.
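One way to detect unrelated file changes is to compare the changed paths against the task's allowed paths. The sketch below assumes the changed-file list is already available (for example from `git diff --name-only`) and uses POSIX-style relative paths; `out_of_scope` is a hypothetical helper name.

```python
from pathlib import PurePosixPath


def out_of_scope(changed_files, allowed_paths):
    """Return the changed files that fall outside every allowed path."""
    allowed = [PurePosixPath(p) for p in allowed_paths]
    unexpected = []
    for name in changed_files:
        path = PurePosixPath(name)
        # A file is in scope if it is an allowed path or lies beneath one.
        if not any(path == root or root in path.parents for root in allowed):
            unexpected.append(name)
    return unexpected
```

A non-empty result is the signal to stop and confirm with the user rather than continue. Comparing path components instead of string prefixes avoids treating `src/app2/` as part of `src/app/`.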

Verification Policy

Minimum verification per change:

  • L1: static checks (format/lint/type).
  • L2: focused tests for touched behavior.
  • L3: scenario/regression checks for user-facing flows.

A task is not complete unless verification commands and results are recorded.
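Recording commands together with their results could look like the following sketch. It assumes each check is an argument list tagged with its level, and it stops at the first failure, which is a design choice made here and not something this document mandates; `run_checks` is a hypothetical name.

```python
import subprocess


def run_checks(commands):
    """Run (level, argv) checks in order and return one record per command run."""
    results = []
    for level, argv in commands:
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append({
            "level": level,
            "command": " ".join(argv),
            "exit_code": proc.returncode,
            "passed": proc.returncode == 0,
        })
        if proc.returncode != 0:
            # Assumption: later levels are skipped once an earlier one fails.
            break
    return results
```

The returned list can be stored as-is in the task record, so the commands and their outcomes stay together.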

Quality Gates

  • Build/test/lint must pass (or failures explicitly accepted by user).
  • Architecture constraints must pass (dependency direction, layering, naming).
  • No silent behavior changes without test or doc updates.

Logging and Traceability

For each task, record:

  • task_id
  • input contract
  • changed files
  • verification commands and results
  • follow-up risks

Store task records under state/tasks/ as Markdown or JSON.
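A JSON task record with the fields listed above might be written as in this sketch. The key names, the one-file-per-task layout, and the function `write_task_record` are assumptions for illustration; only the field list and the state/tasks/ location come from this document.

```python
import json
from pathlib import Path


def write_task_record(task_id, contract, changed_files, verification, risks,
                      root="state/tasks"):
    """Write one task record as <root>/<task_id>.json and return its path."""
    record = {
        "task_id": task_id,
        "input_contract": contract,        # e.g. the three contract lines
        "changed_files": list(changed_files),
        "verification": list(verification),  # commands and their results
        "follow_up_risks": list(risks),
    }
    out_dir = Path(root)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{task_id}.json"
    path.write_text(json.dumps(record, indent=2) + "\n", encoding="utf-8")
    return path
```

One file per task keeps records easy to diff and to review alongside the commit that made the change.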

Golden Dataset Loop

  • Capture recurring failures into tests/golden/.
  • Add a reproduction case before fixing when possible.
  • Re-run golden set on relevant changes.

Maintenance Cadence

  • Weekly: prune stale rules/context and merge duplicates.
  • Monthly: review failure patterns and update guardrails.
  • After incidents: add regression tests and update this harness.

Escalation Rules

Escalate to user when:

  • requirement is ambiguous and risky,
  • security/compliance impact exists,
  • production-impacting changes are needed,
  • verification cannot be executed locally.