Evidence-Bound Development Workflow
Agent-driven development with enforced quality gates, adversarial reviews, and telemetry as a first-class concern.
How to Start Working
Say “next task” — Claude handles the rest:
You: "next task"
Claude: "Next task: [ARCH-2] Decompose execute_ask(). Confirm, or pick another?"
You: "confirm"
Claude: → pulls main → creates branch → spawns researcher → presents brief
You: "looks good"
Claude: → spawns coder → TDD implementation → verifier → skeptic → PR
That’s it. One prompt to start, one confirmation to approve the approach. Claude orchestrates all 6 subagents automatically.
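
A minimal sketch of the orchestration that “next task” kicks off, written as plain Python so the ordering is explicit. It assumes each subagent can be modeled as a callable that takes a task ID, returns a report string, and raises on failure. `run_task`, `Agent`, and the `approve` callback are hypothetical names, not part of Claude Code. eval-writer is omitted because the coder spawns it itself.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical model of a subagent: takes a task ID, returns a report,
# and raises an exception to signal failure (which halts the run).
Agent = Callable[[str], str]


def run_task(
    task_id: str,
    agents: Dict[str, Agent],
    approve: Callable[[str], bool],
) -> Dict[str, str]:
    """Run one task through the workflow and return each agent's report by name."""
    reports: Dict[str, str] = {"researcher": agents["researcher"](task_id)}

    # The single human checkpoint: confirm the research brief before coding.
    if not approve(reports["researcher"]):
        raise RuntimeError(f"{task_id}: research brief not approved")

    # Implementation and verification run strictly in sequence.
    for name in ("coder", "verifier"):
        reports[name] = agents[name](task_id)

    # skeptic and doc-sync review the same diff independently, so run them concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {name: pool.submit(agents[name], task_id) for name in ("skeptic", "doc-sync")}
        for name, future in futures.items():
            reports[name] = future.result()  # re-raises if a reviewer failed
    return reports
```

Committing, pushing, and opening the PR are left out of this sketch because the git hooks gate those steps separately.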
Architecture: Agents + Hooks + Commands
Subagents (.claude/agents/) — Isolated, Model-Appropriate
Main Conversation (opus) — talks to you, picks tasks, spawns agents
│
├── researcher (sonnet) ← "what should we build?"
├── coder (opus) ← "build it with TDD"
├── eval-writer (sonnet) ← "write the eval first" (LLM/retrieval only)
├── verifier (haiku) ← "did it pass?"
├── skeptic (sonnet) ← "is it safe?"
└── doc-sync (haiku) ← "are docs current?"
| Agent | Model | What it does | When |
|---|---|---|---|
| researcher | sonnet | Gathers context, patterns, risks. Returns structured brief. | Before implementing any FR/NFR (NON-NEGOTIABLE) |
| coder | opus | Implements with TDD. Follows project invariants. Returns files + tests. | After research brief approved |
| eval-writer | sonnet | Writes failing eval before AI behavior changes. | When touching retrieval/evidence/policy/verification |
| verifier | haiku | Runs lint + types + tests + telemetry grep. Returns pass/fail table. | After implementation |
| skeptic | sonnet | Adversarial review: AI failures, data leakage, security, telemetry audit. | Before PR merge (NON-NEGOTIABLE) |
| doc-sync | haiku | Checks if docs are stale after code changes. | Before PR, parallel with skeptic |
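
The table can be read as data. The sketch below derives which agents a change needs and which model each runs on, assuming that a change “touches” an area when a changed file path contains the area's name (a heuristic not stated above). `AGENT_MODELS`, `AI_BEHAVIOR_AREAS`, and `required_agents` are illustrative names.

```python
from typing import Dict, Iterable, List

# Model assignment per agent, copied from the table above.
AGENT_MODELS: Dict[str, str] = {
    "researcher": "sonnet",
    "coder": "opus",
    "eval-writer": "sonnet",
    "verifier": "haiku",
    "skeptic": "sonnet",
    "doc-sync": "haiku",
}

# Areas whose changes alter AI behavior; assumed to be detectable from file paths.
AI_BEHAVIOR_AREAS = ("retrieval", "evidence", "policy", "verification")


def required_agents(changed_paths: Iterable[str]) -> List[str]:
    """Return the agents a change needs, in the order they are spawned."""
    paths = [p.lower() for p in changed_paths]
    touches_ai = any(area in path for path in paths for area in AI_BEHAVIOR_AREAS)
    agents = ["researcher", "coder"]
    if touches_ai:
        agents.append("eval-writer")  # spawned by coder to write the failing eval first
    agents += ["verifier", "skeptic", "doc-sync"]
    return agents


for name in required_agents(["app/retrieval/ranker.py", "tests/test_ranker.py"]):
    print(f"{name} -> {AGENT_MODELS[name]}")
```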
Hooks (.claude/settings.json) — Automatic, Can’t Skip
| Hook | Trigger | Blocks? |
|---|---|---|
| Branch protection | git push to main | Yes — use feature branches |
| Pre-commit gates | git commit | Yes — ruff + mypy + pytest must pass |
| Pre-push adversarial scan | git push | Yes — agent scans diff for security issues |
| DB safety | DELETE, alembic | Yes — requires explicit approval |
| Post-edit lint | Edit/Write .py | No — informational |
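
A sketch of what the branch-protection hook could look like as a Python script. It assumes Claude Code's hook contract for `PreToolUse` on the Bash tool: the pending tool call arrives as JSON on stdin with the shell command under `tool_input.command`, and exiting with status 2 blocks the call and shows stderr to Claude. `blocked_push_reason` and `main` are hypothetical names; the real hook in .claude/settings.json may be implemented differently.

```python
import json
import shlex
import sys
from typing import Optional

PROTECTED_BRANCH = "main"


def blocked_push_reason(command: str) -> Optional[str]:
    """Return why a shell command must be blocked, or None if it may run."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        return None  # unbalanced quotes etc.; leave it to other checks
    if tokens[:2] != ["git", "push"]:
        return None
    # Drop flags; what remains is the remote followed by zero or more refspecs.
    positional = [t for t in tokens[2:] if not t.startswith("-")]
    for refspec in positional[1:]:
        # "main", "+main", "HEAD:main", and "HEAD:refs/heads/main" all target main.
        destination = refspec.split(":")[-1].lstrip("+")
        if destination in (PROTECTED_BRANCH, f"refs/heads/{PROTECTED_BRANCH}"):
            return f"Direct push to {PROTECTED_BRANCH} is blocked; push a feature branch and open a PR."
    return None


def main() -> int:
    """Hook entry point; a script would finish with sys.exit(main())."""
    payload = json.load(sys.stdin)
    command = payload.get("tool_input", {}).get("command", "")
    reason = blocked_push_reason(command)
    if reason is not None:
        print(reason, file=sys.stderr)
        return 2  # assumed Claude Code convention: exit code 2 blocks the tool call
    return 0
```

This only inspects explicit refspecs on a single `git push`. A plain `git push` while main is checked out, or a push chained after `&&`, would need extra checks such as reading the current branch.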
Commands (.claude/commands/) — Lightweight Utilities
| Command | What it does |
|---|---|
| /wsstatus | Quick STATUS.md update |
| /wsmistake | Log a mistake to CLAUDE.md |
The Flow
HOW A FEATURE GETS BUILT
─────────────────────────
1. git checkout -b feat/TASK-ID-description
┌─────────────────────────────────┐
│ researcher (sonnet) │ ← returns structured brief
│ • reads REQUIREMENTS.md │
│ • finds patterns in codebase │
│ • identifies risks + invariants │
└─────────────────────────────────┘
│
You approve the approach
│
┌──────────────────────────────────────────┐
│ coder (opus) │
│ • receives research brief │
│ • writes test first (RED) │
│ • implements (GREEN) │
│ • spawns eval-writer if LLM code │
│ • returns files changed + test results │
└──────────────────────────────────────────┘
│
┌─────────────────────────────────┐
│ verifier (haiku) │ ← lint + types + tests + telemetry check
└─────────────────────────────────┘
│
┌──────────────────────────────────────────┐
│ skeptic (sonnet) ← adversarial review │
│ doc-sync (haiku) ← doc drift check │ IN PARALLEL
└──────────────────────────────────────────┘
│
git commit ← [hook: ruff + mypy + pytest BLOCK on fail]
git push ← [hook: blocks main + adversarial scan]
gh pr create ← PR to main
merge ← auto-deploys API + Web + Docs
Git Flow
main is protected. All changes go through feature branches + PRs.
main (production — no direct push)
↑
PR required, skeptic review must pass
│
feat/TASK-ID-description (where work happens)
↑
git checkout -b feat/ARCH-2-decompose-ask
Branch naming: feat/TASK-ID-desc, fix/TASK-ID-desc, chore/desc
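
A small validator for the naming convention. The convention does not define the exact shape of TASK-ID or the description, so this sketch assumes an uppercase prefix plus a number (as in ARCH-2) and a lowercase kebab-case description. `is_valid_branch` is a hypothetical helper that a hook or CI check could call.

```python
import re

# Assumed shapes (not defined by the convention itself):
#   TASK-ID     = uppercase letters, a hyphen, digits   e.g. ARCH-2
#   description = lowercase kebab-case                  e.g. decompose-ask
_DESC = r"[a-z0-9]+(?:-[a-z0-9]+)*"
BRANCH_PATTERN = re.compile(rf"(?:feat|fix)/[A-Z]+-\d+-{_DESC}|chore/{_DESC}")


def is_valid_branch(name: str) -> bool:
    """True if name matches feat/TASK-ID-desc, fix/TASK-ID-desc, or chore/desc."""
    return BRANCH_PATTERN.fullmatch(name) is not None
```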
Auto-deploy on merge to main:
- API → Azure Container Apps (via GitHub Actions)
- Frontend → Vercel
- Docs → knowledge.bound.legal (Nextra on Vercel)
Telemetry Invariant
Telemetry is a first-class concern, enforced at 3 levels:
| Level | Agent/Hook | What it checks |
|---|---|---|
| Implementation | coder | Every LLM call uses traced_llm_call(). Every request calls record_telemetry(). All @_observe have capture_input=False. |
| Verification | verifier | Greps for raw httpx.post/httpx.get calls that bypass the telemetry wrapper. |
| Review | skeptic | Reviews for missing @_observe decorators, missing record_telemetry(), PII in logs. |
Every LLM call → traced_llm_call() wrapper
Every request → record_telemetry() (including refusals)
Every @_observe → capture_input=False, capture_output=False (PII safety)
Published Documentation — knowledge.bound.legal
Source of truth: docs/*.md in the repo. Published via Nextra on Vercel.
docs/*.md (edit these)
↓ sync-docs.sh (copies on build)
apps/docs/content/*.mdx (gitignored, generated)
↓ Nextra v4
knowledge.bound.legal (static site)
To update: edit docs/, commit, and open a PR; merging to main auto-deploys the site.
Local preview: cd apps/docs && npm run dev
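
sync-docs.sh itself is not shown here. The Python sketch below mirrors the copy step described above, assuming it is a straight rename from `.md` to `.mdx` with no content changes; `sync_docs` is a hypothetical name. MDX is stricter than Markdown (`{` opens a JavaScript expression and `<` can open JSX), so some Markdown that renders fine may not compile after a verbatim copy.

```python
from pathlib import Path
from typing import List


def sync_docs(src: Path, dest: Path) -> List[Path]:
    """Copy every src/*.md file to dest/*.mdx and return the written paths."""
    dest.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for md in sorted(src.glob("*.md")):
        target = dest / (md.stem + ".mdx")
        # Verbatim copy; the real script may also transform content.
        target.write_text(md.read_text(encoding="utf-8"), encoding="utf-8")
        written.append(target)
    return written


# Usage, from the repo root:
# sync_docs(Path("docs"), Path("apps/docs/content"))
```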
Key Files
| File | Purpose |
|---|---|
| STATUS.md | Current phase, Now/Next/Done |
| REQUIREMENTS.md | FRs/NFRs with acceptance criteria |
| CLAUDE.md | AI assistant rules + auto-trigger protocol |
| CHECKPOINT.md | Autonomous work log |
| .claude/agents/ | 6 subagent definitions |
| .claude/commands/ | 2 utility commands |
| .claude/settings.json | 5 enforced hooks |
Autonomous Work Mode
When the user says “work on this, I’ll check back”:
- Read STATUS.md → identify tasks
- For each task: researcher → coder → verifier → skeptic + doc-sync (in parallel)
- Log to CHECKPOINT.md after each task
- Stop conditions:
  - Test failures after 2 fix attempts
  - Need to modify policy.py or evidence.py
  - Architecture decision needed
  - Ambiguous requirement
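
The mechanical stop conditions can be sketched as a loop. This assumes each task's planned file list is known up front (for example from the research brief) and that tests and fixes are reachable through callbacks; `work_autonomously`, `StopForHuman`, and the callback names are hypothetical. The judgment-based conditions (architecture decisions, ambiguous requirements) are left out because they need the agent's reasoning rather than a rule.

```python
from pathlib import Path
from typing import Callable, List

PROTECTED_FILES = {"policy.py", "evidence.py"}  # modifying these needs a human
MAX_FIX_ATTEMPTS = 2


class StopForHuman(Exception):
    """Raised when autonomous work must pause until the user checks back."""


def _log(checkpoint: Path, line: str) -> None:
    with checkpoint.open("a", encoding="utf-8") as log:
        log.write(line + "\n")


def work_autonomously(
    tasks: List[str],
    files_to_change: Callable[[str], List[str]],
    run_tests: Callable[[str], bool],
    attempt_fix: Callable[[str], None],
    checkpoint: Path,
) -> None:
    """Work through tasks in order, logging each outcome to the checkpoint file."""
    for task in tasks:
        protected = sorted({Path(f).name for f in files_to_change(task)} & PROTECTED_FILES)
        if protected:
            reason = f"{task}: needs changes to {', '.join(protected)}"
            _log(checkpoint, f"- STOPPED {reason}")
            raise StopForHuman(reason)

        passed = run_tests(task)
        attempts = 0
        while not passed and attempts < MAX_FIX_ATTEMPTS:
            attempt_fix(task)
            attempts += 1
            passed = run_tests(task)
        if not passed:
            reason = f"{task}: tests still failing after {MAX_FIX_ATTEMPTS} fix attempts"
            _log(checkpoint, f"- STOPPED {reason}")
            raise StopForHuman(reason)

        _log(checkpoint, f"- {task}: done")
```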
NON-NEGOTIABLE Rules
| Rule | Enforcement |
|---|---|
| researcher before implementation | CLAUDE.md auto-trigger |
| skeptic before PR merge | CLAUDE.md auto-trigger |
| Tests pass before commit | Pre-commit hook (blocks) |
| No direct push to main | Branch protection hook (blocks) |
| Telemetry on all LLM calls | verifier grep + skeptic audit |
| No PII in logs | skeptic audit |
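
The “Tests pass before commit” rule comes down to running the three gates in order and refusing the commit on the first failure. A sketch, assuming the gates are invoked as `ruff check .`, `mypy .`, and `pytest -q` (the exact arguments the real hook uses are not stated); `first_failing_gate` is a hypothetical helper. A hook script would print the returned command and exit non-zero (2, under Claude Code's assumed blocking convention) whenever the result is not `None`.

```python
import subprocess
from typing import List, Optional, Sequence

# The three gates named in the hooks table, in order; the flags are assumptions.
DEFAULT_GATES: List[List[str]] = [
    ["ruff", "check", "."],
    ["mypy", "."],
    ["pytest", "-q"],
]


def first_failing_gate(gates: Sequence[Sequence[str]] = DEFAULT_GATES) -> Optional[str]:
    """Run each gate in order; return the first failing command line, or None if all pass."""
    for cmd in gates:
        result = subprocess.run(list(cmd), capture_output=True, text=True)
        if result.returncode != 0:
            return " ".join(cmd)  # later gates are skipped once one fails
    return None
```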