
Evidence-Bound Development Workflow

A structured approach to AI-assisted development with built-in quality gates, adversarial reviews, and autonomous work tracking.


Overview

This project uses a command-driven workflow system that enforces:

  • TDD (Test-Driven Development) — tests before code
  • Research before implementation — understand patterns first
  • Adversarial reviews — AI-specific failure mode checks
  • Phased delivery — FRs/NFRs organized into ship milestones
  • Autonomous work logging — session tracking in CHECKPOINT.md

Workflow Commands

Skills (user-invoked, require judgment)

| Command | Role | When to Use |
|---|---|---|
| /wsorchestrate | Project Manager | Starting a session, picking work, batching FRs |
| /wsresearch | Investigator | Before coding, gather context and patterns |
| /wsstart | Developer | Plan + implement with TDD |
| /wsverify | QA | Run lint, types, tests (also enforced by pre-commit hook) |
| /wsskeptic | Security Auditor | Adversarial review for AI failure modes |
| /wsedd | Eval Engineer | Write failing eval before retrieval/LLM changes |
| /wsredteam | Red Team | Full adversarial attack suite (6 vectors). For major features/pre-release. A lightweight version runs automatically on every push via hook. |
| /wsdocs | Technical Writer | Check which docs/diagrams need updating after changes |
| /wsstatus | Reporter | Update STATUS.md |
| /wsmistake | Historian | Document mistakes for future reference |

Hooks (enforced automatically, can’t be skipped)

Configured in .claude/settings.json. These fire on every matching tool call:

| Hook | Trigger | What it does |
|---|---|---|
| Pre-commit gates | git commit | Runs ruff + mypy --strict + pytest. Blocks commit on failure. |
| Pre-push adversarial scan | git push | Agent scans diff for: missing tenant_id, raw LLM calls, PII in logs, unauthed endpoints, hardcoded secrets. Blocks on failure. |
| DB safety | DELETE, alembic | Prompt hook blocks destructive DB commands without approval |
| Post-edit lint | Edit/Write .py | Auto-runs ruff after every Python file change |
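To make the pre-push scan concrete, here is a hypothetical sketch of what such a check could look like as plain regex heuristics over a diff's added lines. The pattern names and regexes are illustrative only; the real hook is an agent scan of the diff, not a fixed regex list.

```python
import re

# Illustrative red-flag patterns (not the real hook's rules).
RED_FLAGS = {
    "hardcoded secret": re.compile(
        r"(api_key|secret|password)\s*=\s*['\"][^'\"]+['\"]", re.I
    ),
    "raw LLM call": re.compile(r"\b(openai|anthropic)\.", re.I),
    "PII in logs": re.compile(r"log(ger)?\.\w+\(.*\b(email|ssn|phone)\b", re.I),
}

def scan_diff(diff: str) -> list[str]:
    """Return names of red flags found in the diff's added lines."""
    added = [line[1:] for line in diff.splitlines() if line.startswith("+")]
    return [
        name
        for name, pattern in RED_FLAGS.items()
        if any(pattern.search(line) for line in added)
    ]  # a non-empty result would block the push
```

A regex pass like this is cheap enough to run on every push; the agent-based scan then covers the cases regexes cannot, such as a missing tenant_id filter.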

Published Documentation — knowledge.bound.legal

Docs are published to knowledge.bound.legal via Nextra (Next.js docs framework) on Vercel.

How it works:

  • Source of truth: docs/*.md in the repo (edit these, not the site)
  • apps/docs/scripts/sync-docs.sh copies docs/*.md → apps/docs/content/*.mdx
  • Nextra renders them as a searchable, navigable docs site
  • Vercel auto-deploys on push to main when docs/ or apps/docs/ changes

To update published docs:

  1. Edit files in docs/ (never edit apps/docs/content/*.mdx directly)
  2. Run /wsdocs to check what else needs updating
  3. Commit and push — site auto-deploys

Local preview:

```
cd apps/docs && npm run dev   # Opens http://localhost:3000
```

Typical Development Session

```
┌────────────────────────────────────────────────────────┐
│ SESSION START                                          │
└────────────────────────────────────────────────────────┘
                            │
                            ▼
┌────────────────────────────────────────────────────────┐
│ /wsorchestrate                                         │
│ • Reads STATUS.md, REQUIREMENTS.md, CHECKPOINT.md      │
│ • Identifies current phase and available work          │
│ • Creates session plan with batched FRs                │
│ • Waits for approval before proceeding                 │
└────────────────────────────────────────────────────────┘
                            │
                            ▼
┌────────────────────────────────────────────────────────┐
│ FOR EACH FR/NFR IN BATCH:                              │
│                                                        │
│ /wsresearch → /wsstart → /wsverify → /wsskeptic        │
│   research: patterns, similar code, risks              │
│   start:    TDD cycle, RED → GREEN, implement          │
│   verify:   ruff, mypy, pytest                         │
│   skeptic:  failure modes, data leaks, citations       │
│                                                        │
│ Log to CHECKPOINT.md                                   │
└────────────────────────────────────────────────────────┘
                            │
                            ▼
┌────────────────────────────────────────────────────────┐
│ SESSION END                                            │
│ • /wsstatus → Update STATUS.md                         │
│ • /wscommit → Commit with (FR-XXX) reference, push, PR │
└────────────────────────────────────────────────────────┘
```

Command Details

/wsorchestrate — Project Manager

Purpose: Route work to the right workflows, manage phases, batch related FRs.

Input triggers:

  • “Let’s work on Phase 7”
  • “Implement FR-011, FR-014”
  • “What should I work on?”
  • “Continue where we left off”

Protocol:

  1. Assess state — Read STATUS.md, REQUIREMENTS.md, CHECKPOINT.md
  2. Determine scope — Validate phase, check dependencies, batch FRs
  3. Create plan — Present numbered plan with batches
  4. Wait for approval — Don’t proceed without “Y”
  5. Execute — Route through research → start → verify → skeptic
  6. Log — Update CHECKPOINT.md after each FR

Phase Rules:

| Phase | Contents | Prerequisite |
|---|---|---|
| 1. Core RAG | FR-010–025 | None |
| 2. Citations UI | FR-030–032 | Phase 1 |
| 3. Multi-tenancy | FR-001–004 | Phase 2 |
| 4. Provider Abstraction | NFR-032–036 | Phase 3 |
| 5. Auth | FR-050–053 | Phase 4 |
| 6. Audit | FR-040–043 | Phase 5 |
| 7. Polish | FR-011, FR-014, FR-015 | Phase 6 |
| 8. NFRs | NFR-001–046 | Phase 7 |
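The prerequisite column reduces to a simple gating rule: a phase may start only once the phase before it is complete. A minimal sketch, with phase numbers mirroring the table and the completion set supplied by the caller:

```python
# Prerequisite map from the phase table: phase N requires phase N-1,
# except phase 1, which has none.
PREREQ = {1: None, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5, 7: 6, 8: 7}

def can_start(phase: int, completed: set[int]) -> bool:
    """True if the phase's prerequisite (if any) is already complete."""
    prereq = PREREQ[phase]
    return prereq is None or prereq in completed
```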

/wsresearch — Investigator

Purpose: Gather context before coding. Prevent “code first, understand later.”

Output includes:

  • Acceptance criteria from REQUIREMENTS.md
  • Architecture patterns to follow
  • Similar existing code to reference
  • Database schema considerations
  • Evidence-Bound invariants checklist:
    • Tenant isolation (FR-001)
    • Matter isolation (FR-002)
    • LLM telemetry (NFR-030)
    • PII redaction (NFR-004)
    • Citation validation unchanged
  • TDD test outline
  • Risk assessment

Ends with: "Ready for /wsstart? [Y/n]"


/wsstart — Developer

Purpose: Plan and implement with TDD enforcement.

Protocol:

  1. Read STATUS.md → identify task
  2. Create feature branch
  3. Move task from “Next” to “Now”
  4. Enter Plan mode:
    • What files need to change?
    • What tests are needed? (write FIRST)
    • What telemetry is needed?
    • Any env vars to add?
  5. Wait for approval
  6. Implement following TDD: RED → GREEN → REFACTOR
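As a toy illustration of step 6, the test is written first and fails against a stub (RED), then the minimal implementation makes it pass (GREEN). `normalize_citation` is a hypothetical helper, not real project code:

```python
def normalize_citation(raw: str) -> str:
    # GREEN step: the minimal implementation, written only after
    # the test below already existed and was failing.
    return raw.strip().rstrip(".")

def test_normalize_citation():
    # RED step: this test came first.
    assert normalize_citation("  FR-011. ") == "FR-011"
```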

/wsverify — QA

Purpose: Run all quality gates.

Commands executed:

```
ruff check apps/              # Lint
mypy apps/api/app --strict    # Type check
pytest tests/ -v --tb=short   # Unit + integration tests
pytest evals/ -v              # Golden query evals
```

On failure: Analyze error, suggest fix, ask before applying.
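A sketch of running those gates in order and stopping at the first failure; the commands mirror the list above, and the injectable `runner` parameter is an assumption added here to make the sketch testable:

```python
import subprocess

GATES = [
    ["ruff", "check", "apps/"],
    ["mypy", "apps/api/app", "--strict"],
    ["pytest", "tests/", "-v", "--tb=short"],
    ["pytest", "evals/", "-v"],
]

def run_gates(runner=subprocess.run) -> bool:
    """Run each gate in order; a nonzero exit code blocks the rest."""
    for cmd in GATES:
        if runner(cmd).returncode != 0:
            return False  # gate failed: block
    return True
```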


/wsskeptic — Security Auditor

Purpose: Adversarial review for AI-specific failure modes.

Checklist:

  1. Failure Modes

    • Empty retrieval handling
    • Low confidence gating (threshold 0.70)
    • LLM timeout behavior
    • Malformed input handling
    • Token limit exceeded
  2. Data Leakage

    • Tenant isolation on every query
    • Prompt injection risks
    • PII in logs
    • Error message safety
  3. Citation Integrity

    • Every claim validated against chunks
    • Fabrication risk assessment
    • Validation failure = refusal
  4. Refusal Behavior

    • Explicit refusal triggers
    • No silent failures
    • Confidence bypass check
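Two of the checklist items (confidence gating at 0.70, and citation integrity) can be sketched as a single answer gate. The function and field names are hypothetical; the point is that both failure paths end in an explicit refusal, never a silent pass-through:

```python
REFUSAL = "I can't answer that reliably from the available evidence."

def gate_answer(
    answer: str,
    citations: list[str],
    retrieved_ids: set[str],
    confidence: float,
) -> str:
    """Return the answer only if it clears both gates; otherwise refuse."""
    if confidence < 0.70:
        return REFUSAL  # low confidence: explicit refusal
    if not citations or any(c not in retrieved_ids for c in citations):
        return REFUSAL  # citation not validated against retrieved chunks
    return answer
```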

Output format:

```
CRITICAL: [description]
  Location: [file:line]
  Risk: [what could go wrong]
  Fix: [how to fix]

HIGH: [description]
...

Summary: X critical, Y high, Z low
Recommendation: BLOCK / APPROVE WITH FIXES / APPROVE
```

Rule: If CRITICAL issues exist → BLOCK. No exceptions.


/wsstatus — Reporter

Purpose: Update STATUS.md with current progress.

Updates:

  • Move completed items to “Done (This Week)”
  • Update phase progress tables
  • Add decisions made
  • Note any blockers

/wscommit — Release Manager

Purpose: Commit, push, create PR.

Commit format:

```
type(scope): description (FR-NNN)

Co-Authored-By: Claude <noreply@anthropic.com>
```

Types: feat, fix, test, docs, refactor, chore
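A commit subject in that format could be validated with a regex like the one below. This is a sketch, not an enforced hook; it assumes the reference may be either FR-NNN or NFR-NNN, since both appear in this workflow:

```python
import re

COMMIT_RE = re.compile(
    r"^(feat|fix|test|docs|refactor|chore)"  # allowed types
    r"\([\w-]+\): "                          # (scope):
    r".+ \((FR|NFR)-\d{3}\)$"                # description (FR-NNN)
)

def valid_commit(subject: str) -> bool:
    return COMMIT_RE.match(subject) is not None
```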


Key Files

| File | Purpose | Updated By | Sync Requirement |
|---|---|---|---|
| STATUS.md | Current phase, Now/Next/Blocked | /wsstatus | After every FR |
| REQUIREMENTS.md | FRs/NFRs with acceptance criteria | Manual | When scope changes |
| ARCHITECTURE.md | Patterns, schemas, interfaces | Manual | When adding patterns |
| CHECKPOINT.md | Autonomous work log | /wsorchestrate | After each task |
| CLAUDE.md | AI assistant instructions | Manual | When rules change |
| docs/WORKFLOW.md | Development workflow | Manual | When process changes |

Documentation Sync Protocol

These docs are the source of truth. They must stay in sync with code.

When to Update Each Doc

| Document | Update When |
|---|---|
| REQUIREMENTS.md | FR/NFR scope changes, acceptance criteria updated |
| ARCHITECTURE.md | New pattern added, interface changed, schema modified |
| STATUS.md | Task started, completed, or blocked |
| CHECKPOINT.md | After each FR in autonomous mode |

Manual Review Checklist (Before PR)

  • Added new pattern? → Update ARCHITECTURE.md
  • Added env vars? → Update .env.example + deployment docs
  • Changed interface? → Update docs/architecture/interfaces.md
  • Changed schema? → Update docs/architecture/data-model.md
  • Shipped FR? → Update STATUS.md (move to Done)

Automated Checks (Future CI)

```yaml
# Proposed CI checks for documentation drift
- name: Check STATUS.md freshness
  run: |
    # Warn if items in "Now" older than 3 days without update
- name: Check file references
  run: |
    # Verify ARCHITECTURE.md file paths exist
- name: Check env var documentation
  run: |
    # Verify all env vars in config.py are in .env.example
```
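The third proposed check could be sketched as below: find env vars referenced in config.py and report any not documented in .env.example. File contents are passed in as strings so the check is easy to test; the lookup patterns are assumptions about how config.py reads its environment:

```python
import re

def missing_env_vars(config_src: str, env_example: str) -> set[str]:
    """Env vars used in config source but absent from .env.example."""
    used = set(re.findall(r'os\.environ\[["\'](\w+)["\']\]', config_src))
    used |= set(re.findall(r'os\.getenv\(["\'](\w+)["\']', config_src))
    documented = {
        line.split("=", 1)[0].strip()
        for line in env_example.splitlines()
        if "=" in line and not line.lstrip().startswith("#")
    }
    return used - documented
```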

Invariants (Always Enforced)

⛔ NON-NEGOTIABLE (Cannot Skip Under Any Circumstances)

| Rule | Why |
|---|---|
| /wsresearch before implementation | Prevents “code first, understand later” failures |
| /wsskeptic before commit | Catches AI-specific failure modes before they ship |

These two steps are the minimum viable process. Everything else can be adapted, but these two cannot be skipped even under deadline pressure.

Required (Should Not Skip)

| Rule | Enforcement |
|---|---|
| TDD required | CLAUDE.md: “Write failing test first” |
| Tenant isolation | Every DB query includes tenant_id |
| Citation validation | Every answer has verified citations |
| Confidence gating | < 0.70 = refuse |
| LLM telemetry | All calls through traced wrapper |
| No PII in logs | Redaction in telemetry.py |
| Documentation sync | Update docs when code patterns change |
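As an illustration of the tenant-isolation rule, a query helper can simply refuse to execute anything without a tenant_id parameter. This is a minimal sketch over plain SQL strings; the real enforcement presumably lives at the ORM or repository layer:

```python
def checked_query(sql: str, params: dict) -> tuple[str, dict]:
    """Refuse any query whose parameters omit tenant_id."""
    if "tenant_id" not in params:
        raise ValueError("query missing tenant_id: tenant isolation violated")
    return sql, params
```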

Autonomous Work Mode

When user says “work on this, I’ll check back”:

  1. Follow /wsorchestrate protocol
  2. Log every FR to CHECKPOINT.md
  3. Stop conditions:
    • Red flag triggered (see CLAUDE.md)
    • Test failures after 2 fix attempts
    • Ambiguous requirement
    • Need to modify policy.py or evidence.py
    • Architecture decision needed
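The per-FR log entry appended to CHECKPOINT.md might look like the sketch below. The field layout here is hypothetical, not the project's actual checkpoint format:

```python
from datetime import datetime, timezone

def checkpoint_entry(fr: str, summary: str, status: str = "done") -> str:
    """Render one CHECKPOINT.md entry for a completed FR."""
    ts = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return f"## {fr}: {status}\n- When: {ts}\n- Summary: {summary}\n"
```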

Quick Reference

Start a session:

```
User: "Let's work on NFR-045"
→ /wsorchestrate activates
→ Creates plan, waits for approval
→ Routes through: research → start → verify → skeptic
→ Logs to CHECKPOINT.md
→ Updates STATUS.md
→ Commits with (NFR-045) reference
```

Single FR mode:

```
/wsresearch FR-011
/wsstart
/wsverify
/wsskeptic
/wscommit
```

Check what’s next:

```
/wsorchestrate → "What should I work on?"
```

Benefits

  1. Consistency — Same process every time
  2. Quality gates — Lint, types, tests, adversarial review
  3. Traceability — Every change linked to FR/NFR
  4. Knowledge capture — Decisions logged in STATUS.md
  5. Safe autonomy — Clear stop conditions prevent runaway changes
  6. AI-specific safety — /wsskeptic catches hallucination, leakage, citation issues