Evidence collection is the process by which Critiqor transforms a live OpenClaw agent run into a structured, auditable record. During every monitored session, the Critiqor OpenClaw plugin observes the agent’s execution from two complementary layers, normalizes each event into a consistent schema, and writes the result to a single session file. That file becomes the permanent source of truth for every diagnosis Critiqor produces.

The Two Collection Layers

The Critiqor OpenClaw plugin is intentionally narrow. It does not score runs, generate diagnoses, or interact with the agent’s logic in any way. Its only job is to observe and record.

Extension API Layer

The Extension API layer attaches via OpenClaw’s api.on(...) event hooks. This layer captures the high-level agent and session lifecycle:
  • Agent and session timeline: agent_start, turn_start, turn_end, agent_end
  • Provider requests and responses: before_provider_request, after_provider_response
  • Messages and input: user messages, agent messages, input events
  • User bash events: shell command activity initiated during the session
These events give Critiqor the skeleton of the run: when each turn started and ended, how many times the provider was called, and what was exchanged at the message level.
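As a rough illustration of how that skeleton could be read back out of the recorded events, the sketch below pairs turn_start and turn_end events and counts provider requests. The event names come from the list above; the "type" and "timestamp" keys, and the use of numeric seconds for timestamps, are assumptions made for this example and not confirmed field names.

```python
from __future__ import annotations

from typing import Any, Optional


def summarize_skeleton(events: list[dict[str, Any]]) -> dict[str, Any]:
    """Pair turn_start/turn_end events and count provider requests.

    Assumes each event is a dict with a "type" string and a numeric
    "timestamp" in seconds (hypothetical field names).
    """
    turns: list[dict[str, float]] = []
    open_start: Optional[float] = None
    provider_requests = 0

    for event in events:
        kind = event["type"]
        if kind == "turn_start":
            open_start = event["timestamp"]
        elif kind == "turn_end" and open_start is not None:
            end = event["timestamp"]
            turns.append({"start": open_start, "end": end, "duration": end - open_start})
            open_start = None
        elif kind == "before_provider_request":
            provider_requests += 1

    return {"turns": turns, "provider_requests": provider_requests}
```

A turn_end with no matching turn_start is ignored here; a real consumer might prefer to flag it.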

Tool Hooks Layer

The Tool Hooks layer attaches via OpenClaw’s AgentSession.installAgentToolHooks(). This layer captures the granular, per-tool activity that exposes the most diagnostic signal:
  • Tool calls: tool name, arguments, call ID, timestamp
  • Tool results: result content, error flag, duration_ms, status
  • Memory search: memory_search events with query and status
  • Memory get: memory_get events with key and result status
  • Errors: tool-level exceptions and failure events
  • Duration: execution time per tool invocation
This is the layer that makes failure detection possible. Without it, loops, ignored outputs, and memory failures are invisible.
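To make the loop case concrete, here is a minimal sketch of how repeated tool calls might be spotted once per-tool events are available. It is not Critiqor's detector: the function name, the threshold, and the payload keys "tool_name" and "arguments" are assumptions chosen for illustration.

```python
from __future__ import annotations

import json
from collections import Counter
from typing import Any


def find_repeated_tool_calls(
    events: list[dict[str, Any]], threshold: int = 3
) -> list[dict[str, Any]]:
    """Return tool calls issued at least `threshold` times with identical arguments.

    Assumes tool_call events carry payload["tool_name"] and
    payload["arguments"] (hypothetical key names).
    """
    counts: Counter = Counter()
    for event in events:
        if event.get("type") != "tool_call":
            continue
        payload = event.get("payload", {})
        # Serialize arguments with sorted keys so equal dicts compare equal.
        key = (payload.get("tool_name"), json.dumps(payload.get("arguments"), sort_keys=True))
        counts[key] += 1

    return [
        {"tool_name": name, "arguments": json.loads(args), "count": n}
        for (name, args), n in counts.items()
        if n >= threshold
    ]
```

Without tool-level events there is nothing to count, which is why this kind of check depends on the Tool Hooks layer.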

Evidence Hierarchy

The two layers together produce the complete runtime picture Critiqor needs for diagnosis:
Runtime Traces
  ├── Tool Calls (tool name, arguments, call ID, timestamp)
  ├── Tool Outputs (result, error flag, duration_ms, status)
  ├── Memory Events (memory_search, memory_get, status)
  ├── Provider Events (before_provider_request, after_provider_response)
  ├── Session Timeline (agent_start, turn_start, turn_end, agent_end)
  └── Execution Metadata (process start/end, exit code, latency)
→ Final Response
→ Self-Report
Evidence strength decreases from top to bottom: runtime traces outrank the final response, which in turn outranks the agent’s self-report. Critiqor’s failure detectors work from the top of this hierarchy down, using raw runtime events as primary evidence and treating the final response as a secondary signal, not the other way around.
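One way to picture this precedence is a resolver that consults the final response only when the runtime trace has nothing to say. The sketch below is a deliberately simplified assumption, not Critiqor's actual logic; resolve_outcome, its return shape, and the use of error_event as the sole trace signal are all hypothetical.

```python
from __future__ import annotations

from typing import Any


def resolve_outcome(events: list[dict[str, Any]], final_response_ok: bool) -> dict[str, Any]:
    """Prefer runtime evidence over what the final response claims.

    Assumes events have a "type" key and that "error_event" marks a
    recorded runtime failure (hypothetical simplification).
    """
    errors = [e for e in events if e.get("type") == "error_event"]
    if errors:
        # The trace contradicts or pre-empts the final response: trust the trace.
        return {"outcome": "failed", "basis": "runtime_trace", "error_count": len(errors)}
    # No runtime evidence of failure: fall back to the weaker signal.
    outcome = "succeeded" if final_response_ok else "failed"
    return {"outcome": outcome, "basis": "final_response"}
```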

What a Session File Contains

All collected evidence is written to:
runs/<run_id>/session.json
The session file follows the critiqor.session.v1 schema:
Field            Description
schema_version   Always critiqor.session.v1; identifies the file format
run_id           The unique identifier for this run (e.g. run_001)
session_id       Matches run_id; used internally by the session layer
events[]         Ordered array of all normalized runtime events
metrics{}        Aggregate counts: total_events, by_event_type, by_source_layer
Each event in events[] includes an event type, a timestamp, a source_layer (either extension_api or tool_hooks), and a payload containing event-specific fields. The metrics object provides a fast summary of what was collected without requiring a full scan of the events array:
{
  "total_events": 42,
  "by_event_type": {
    "tool_call": 8,
    "tool_output": 8,
    "memory_event": 4,
    "token_usage": 3,
    "state_transition": 6,
    "error_event": 1
  },
  "by_source_layer": {
    "extension_api": 20,
    "tool_hooks": 22
  }
}
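For readers who want to cross-check a session file, a metrics object of this shape can be recomputed from the events array in a few lines. This is a sketch under stated assumptions: the key holding the event type is assumed to be "type" (the schema only says each event includes an event type), and compute_metrics is a hypothetical helper, not part of Critiqor.

```python
from __future__ import annotations

from collections import Counter
from typing import Any


def compute_metrics(events: list[dict[str, Any]]) -> dict[str, Any]:
    """Rebuild the aggregate counts from a list of normalized events.

    Assumes each event has "type" and "source_layer" keys; "type" is a
    guessed field name, "source_layer" comes from the schema description.
    """
    return {
        "total_events": len(events),
        "by_event_type": dict(Counter(e["type"] for e in events)),
        "by_source_layer": dict(Counter(e["source_layer"] for e in events)),
    }
```

Comparing the recomputed counts with the stored metrics object is a cheap consistency check: the per-type and per-layer counts should each sum to total_events.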

Immutability and Rerunnable Diagnosis

Evidence is write-once during collection. Once critiqor finalize closes the active session, session.json is sealed, and Critiqor does not modify it afterward. The derived artifact, runs/<run_id>/diagnosis.json, is written separately after the session closes. This split is deliberate: the raw evidence stays permanently auditable, while the diagnosis logic can improve over time without requiring the OpenClaw session to be re-run. If Critiqor releases an improved detector, you can re-run diagnosis against the same session.json and get a more accurate result without re-executing the agent.
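The separation between sealed evidence and derived diagnosis can be sketched as a function that only reads session.json and only writes diagnosis.json. Everything beyond the two file names and the schema_version value is an assumption for illustration: the detector signature, the layout of the diagnosis output, and the SHA-256 fingerprint (added here so a diagnosis can be tied to the exact evidence it was derived from) are not taken from Critiqor.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path
from typing import Any, Callable

# Hypothetical detector signature: takes the events array, returns findings.
Detector = Callable[[list[dict[str, Any]]], list[dict[str, Any]]]


def error_events(events: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Toy detector: report every event whose assumed "type" is error_event."""
    return [e for e in events if e.get("type") == "error_event"]


def rerun_diagnosis(run_dir: Path, detectors: dict[str, Detector]) -> dict[str, Any]:
    """Derive a fresh diagnosis.json without touching session.json."""
    raw = (run_dir / "session.json").read_bytes()  # read-only access to the evidence
    session = json.loads(raw)
    if session.get("schema_version") != "critiqor.session.v1":
        raise ValueError("unsupported session schema")

    diagnosis = {
        "run_id": session["run_id"],
        "session_sha256": hashlib.sha256(raw).hexdigest(),
        "findings": {name: detect(session["events"]) for name, detect in detectors.items()},
    }
    # Only the derived artifact is (over)written.
    (run_dir / "diagnosis.json").write_text(json.dumps(diagnosis, indent=2))
    return diagnosis
```

Swapping in a newer detector and calling rerun_diagnosis again replaces diagnosis.json while leaving the evidence file byte-for-byte unchanged.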