How it works

Trust by design

Approval gates, audit trails, and rollback aren't friction on top of an agent — they're the product. Mock tools and a dry-run default let you prove the safety model with zero blast radius.

1
Goal

State a plain-language objective.

2
Plan

The planner proposes an ordered list of tool calls, each with a tool, args, and reasoning. In live mode, claude-opus-4-8 proposes them through real tool use, but every call is intercepted before execution and never run automatically.
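A minimal sketch of what one planned step might look like as data. The field names follow the (tool, args, reasoning) triple described in this step. The ProposedStep class, the build_plan function, and the example address are hypothetical names chosen for illustration, and the planner here is a fixed stand-in rather than a model call.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ProposedStep:
    """One planned tool call: which tool, with what args, and why."""
    tool: str
    args: dict[str, Any]
    reasoning: str


def build_plan(goal: str) -> list[ProposedStep]:
    # Hypothetical stand-in planner: returns a fixed ordered plan.
    # It only describes tool calls; it never executes them.
    return [
        ProposedStep("create_ticket_mock", {"title": goal}, "Track the request."),
        ProposedStep(
            "send_email_mock",
            {"to": "owner@example.com", "subject": goal},
            "Notify the owner.",
        ),
    ]


plan = build_plan("Follow up on overdue invoice")
for number, step in enumerate(plan, start=1):
    print(number, step.tool, step.args, step.reasoning)
```

Keeping the plan as plain, immutable records means it can be shown to a reviewer exactly as proposed.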

3
Approve

Every step starts as 'proposed'. Nothing runs until a human approves it — the load-bearing safety gate.
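The approval gate can be pictured as a small state check. The sketch below assumes a step carries a status that starts as 'proposed' and changes only through an explicit human review. The names Status, Step, review, and require_approval are illustrative, not taken from the product.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    REJECTED = "rejected"


class NotApprovedError(Exception):
    """Raised when something tries to run a step that no human approved."""


@dataclass
class Step:
    tool: str
    status: Status = Status.PROPOSED  # every step starts as 'proposed'


def review(step: Step, approve: bool) -> None:
    # Only a human decision moves a step out of 'proposed', and only once.
    if step.status is not Status.PROPOSED:
        raise ValueError(f"step already reviewed: {step.status.value}")
    step.status = Status.APPROVED if approve else Status.REJECTED


def require_approval(step: Step) -> None:
    # The gate: call this before any execution path touches the step.
    if step.status is not Status.APPROVED:
        raise NotApprovedError(f"{step.tool} is {step.status.value}, not approved")


step = Step("send_email_mock")
try:
    require_approval(step)
except NotApprovedError as exc:
    print("gate held:", exc)

review(step, approve=True)
require_approval(step)  # passes silently once a human has approved
```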

4
Execute

Only approved steps run, against mock tools. A step whose tool isn't in the allowed list is blocked and logged as a policy violation.
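One way to sketch the permission check in this step: approval alone is not enough, because the tool must also be on the allowed list at execution time. The TOOLS registry, the execute_approved function, and the log entry shape are assumptions made for this example.

```python
from typing import Any, Callable

# Hypothetical mock registry; each mock only echoes its arguments back.
TOOLS: dict[str, Callable[..., Any]] = {
    "send_email_mock": lambda **args: {"mock": True, "tool": "send_email_mock", **args},
    "create_ticket_mock": lambda **args: {"mock": True, "tool": "create_ticket_mock", **args},
}


def execute_approved(
    tool: str,
    args: dict[str, Any],
    allowed: set[str],
    log: list[dict[str, Any]],
) -> Any:
    """Run one already-approved step, unless policy blocks it."""
    if tool not in allowed:
        # Approved by a human, but not permitted by policy: block and log.
        log.append({"tool": tool, "event": "blocked", "reason": "policy violation"})
        return None
    result = TOOLS[tool](**args)
    log.append({"tool": tool, "event": "executed"})
    return result


log: list[dict[str, Any]] = []
allowed = {"create_ticket_mock"}  # send_email_mock has been unchecked

ticket = execute_approved("create_ticket_mock", {"title": "Renew license"}, allowed, log)
email = execute_approved("send_email_mock", {"to": "ops@example.com"}, allowed, log)
print(ticket)
print(email)
print(log)
```

Checking the allowed list at execution time, rather than at planning time, means a permission change made after approval still takes effect.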

5
Audit

Every transition — proposed, approved, rejected, executed, blocked, failed, rolled-back — is recorded in an append-only trail.
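An append-only trail could be sketched as a class that exposes a way to add entries and a way to read them, and nothing else. The event names are the ones listed in this step. The AuditTrail and AuditEntry names, the sequence number, and the timestamp field are assumptions for illustration, and a real trail would also need durable storage, which this in-memory sketch leaves out.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

EVENTS = frozenset(
    {"proposed", "approved", "rejected", "executed", "blocked", "failed", "rolled-back"}
)


@dataclass(frozen=True)
class AuditEntry:
    """Immutable record of one state transition."""
    seq: int
    step_id: str
    event: str
    at: str


class AuditTrail:
    """Entries can be appended and read, never edited or removed."""

    def __init__(self) -> None:
        self._entries: list[AuditEntry] = []

    def record(self, step_id: str, event: str) -> AuditEntry:
        if event not in EVENTS:
            raise ValueError(f"unknown event: {event}")
        entry = AuditEntry(
            seq=len(self._entries) + 1,
            step_id=step_id,
            event=event,
            at=datetime.now(timezone.utc).isoformat(),
        )
        self._entries.append(entry)
        return entry

    def entries(self) -> tuple[AuditEntry, ...]:
        # Return a tuple so callers cannot mutate the internal list.
        return tuple(self._entries)


trail = AuditTrail()
for event in ("proposed", "approved", "executed"):
    trail.record("step-1", event)
print([(e.seq, e.event) for e in trail.entries()])
```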

6
Recover

A failure is never auto-retried. The agent surfaces it with a rollback suggestion, and a human decides what happens next.
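The recovery behavior can be sketched as a single-attempt wrapper: the tool is called once, and on failure the error is returned with a suggestion instead of being retried. The Outcome class, the run_once function, and the wording of the suggestion are hypothetical, and the failing mock exists only to drive the example.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Outcome:
    status: str  # "executed" or "failed"
    result: Any = None
    error: Optional[str] = None
    rollback_suggestion: Optional[str] = None


def run_once(tool_name: str, fn: Callable[..., Any], args: dict[str, Any]) -> Outcome:
    """Call the tool exactly once; never retry on failure."""
    try:
        return Outcome("executed", result=fn(**args))
    except Exception as exc:
        # Surface the failure and a suggestion; the decision stays with a human.
        suggestion = (
            f"Step '{tool_name}' failed; review the steps executed before it "
            "and decide whether to roll them back."
        )
        return Outcome("failed", error=str(exc), rollback_suggestion=suggestion)


calls = {"count": 0}


def create_ticket_mock(title: str) -> dict[str, Any]:
    # Illustrative mock that always fails, so the failure path is visible.
    calls["count"] += 1
    raise RuntimeError("mock ticket backend unavailable")


outcome = run_once("create_ticket_mock", create_ticket_mock, {"title": "Renew license"})
print(outcome.status, "|", outcome.error)
print(outcome.rollback_suggestion)
print("attempts:", calls["count"])
```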

Why a manual tool-calling loop

Instead of an auto tool-runner, the planner uses a manual loop so every proposed tool call is intercepted before execution and routed to the human approval queue. That interception point is where trust lives.
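The interception point described above might look roughly like this. The sketch makes no model call. It uses a stubbed response whose blocks are assumed to resemble tool-use content blocks (a type, an id, a name, and an input), and the names stub_model_turn, intercept, and approval_queue are illustrative.

```python
from collections import deque
from typing import Any


def stub_model_turn(goal: str) -> list[dict[str, Any]]:
    """Stand-in for one model response; no network call is made."""
    return [
        {"type": "text", "text": f"Plan for: {goal}"},
        {
            "type": "tool_use",
            "id": "call_1",
            "name": "create_ticket_mock",
            "input": {"title": goal},
        },
        {
            "type": "tool_use",
            "id": "call_2",
            "name": "send_email_mock",
            "input": {"to": "owner@example.com", "subject": goal},
        },
    ]


def intercept(content: list[dict[str, Any]], approval_queue: deque) -> int:
    """Route every proposed tool call to the queue instead of running it."""
    queued = 0
    for block in content:
        if block.get("type") != "tool_use":
            continue  # plain text needs no approval
        approval_queue.append(
            {
                "id": block["id"],
                "tool": block["name"],
                "args": block["input"],
                "status": "proposed",
            }
        )
        queued += 1
    return queued


approval_queue: deque = deque()
queued = intercept(stub_model_turn("Renew license"), approval_queue)
print(queued, "calls queued for review")
for item in approval_queue:
    print(item["id"], item["tool"], item["status"])
```

Because the loop is written by hand, there is exactly one place where a tool call could run, and in this sketch that place only enqueues.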

Why mock tools only

Autonomy is the risk, so the demo proves the scaffolding — approval, permissions, audit, rollback — against mock tools (send_email_mock, create_ticket_mock, …). No real Gmail, Slack, Jira, or CRM is connected, by design.
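For a sense of what a mock tool can be, the sketch below returns a description of what would have happened and touches no external system. Only the names send_email_mock and create_ticket_mock come from the text. Their parameters, return shapes, and the MOCK_TOOLS registry are assumptions.

```python
from itertools import count
from typing import Any, Callable

_ticket_ids = count(1)  # in-memory counter; nothing is persisted


def send_email_mock(to: str, subject: str, body: str = "") -> dict[str, Any]:
    # Assumed signature. No mail is sent; the call is only described.
    return {
        "tool": "send_email_mock",
        "mock": True,
        "delivered": False,
        "to": to,
        "subject": subject,
        "body_length": len(body),
    }


def create_ticket_mock(title: str, priority: str = "normal") -> dict[str, Any]:
    # Assumed signature. No ticketing system is contacted.
    return {
        "tool": "create_ticket_mock",
        "mock": True,
        "ticket_id": f"MOCK-{next(_ticket_ids)}",
        "title": title,
        "priority": priority,
    }


# Hypothetical registry mapping tool names to their mock implementations.
MOCK_TOOLS: dict[str, Callable[..., dict[str, Any]]] = {
    "send_email_mock": send_email_mock,
    "create_ticket_mock": create_ticket_mock,
}

email_result = MOCK_TOOLS["send_email_mock"](to="ops@example.com", subject="Renewal")
ticket_result = MOCK_TOOLS["create_ticket_mock"](title="Renew license")
print(email_result)
print(ticket_result)
```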

The guardrail

The target for unauthorized tool executions is zero. Uncheck a tool in safety settings, approve a step that uses it, and watch the step get blocked and logged. The policy holds even when a human has approved the step.