Evidence Collection: How Critiqor Captures Runtime Data

Evidence collection is the process by which Critiqor transforms a live OpenClaw agent run into a structured, auditable record. During every monitored session, the Critiqor OpenClaw plugin observes the agent’s execution from two complementary layers, normalizes each event into a consistent schema, and writes the result to a single session file. That file becomes the permanent source of truth for every diagnosis Critiqor produces.

The Two Collection Layers

The Critiqor OpenClaw plugin is intentionally narrow. It does not score runs, generate diagnoses, or interact with the agent’s logic in any way. Its only job is to observe and record.

Extension API Layer

The Extension API layer attaches via OpenClaw’s api.on(...) event hooks. This layer captures the high-level agent and session lifecycle:

Agent and session timeline — agent_start, turn_start, turn_end, agent_end
Provider requests and responses — before_provider_request, after_provider_response
Messages and input — user messages, agent messages, input events
User bash events — shell command activity initiated during the session

These events give Critiqor the skeleton of the run: when each turn started and ended, how many times the provider was called, and what was exchanged at the message level.
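
As a minimal sketch of how that skeleton could be read back out of recorded events, the snippet below counts completed turns and provider requests and measures turn durations. The field names `event_type` and `timestamp`, the use of seconds as the time unit, and the helper `summarize_timeline` are all assumptions made for illustration; they are not taken from the Critiqor schema.

```python
from collections import Counter


def summarize_timeline(events):
    """Summarize turns and provider calls from Extension API events.

    Assumes each event is a dict with an "event_type" string and a
    numeric "timestamp" in seconds (both names are hypothetical).
    """
    counts = Counter(e["event_type"] for e in events)
    durations = []
    turn_started_at = None
    for e in events:
        if e["event_type"] == "turn_start":
            turn_started_at = e["timestamp"]
        elif e["event_type"] == "turn_end" and turn_started_at is not None:
            # Close the currently open turn and record how long it took.
            durations.append(e["timestamp"] - turn_started_at)
            turn_started_at = None
    return {
        "turns": len(durations),  # completed turns only
        "provider_calls": counts["before_provider_request"],
        "turn_durations": durations,
    }


# Hand-written sample events, not real Critiqor output.
sample_events = [
    {"event_type": "agent_start", "timestamp": 0.0},
    {"event_type": "turn_start", "timestamp": 1.0},
    {"event_type": "before_provider_request", "timestamp": 1.5},
    {"event_type": "after_provider_response", "timestamp": 3.0},
    {"event_type": "turn_end", "timestamp": 3.5},
    {"event_type": "agent_end", "timestamp": 4.0},
]
timeline_summary = summarize_timeline(sample_events)
print(timeline_summary)
```

A turn that starts but never ends is left out of `turn_durations`, so an interrupted run shows up as fewer completed turns rather than as a bogus duration.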

Tool Hooks Layer

The Tool Hooks layer attaches via OpenClaw’s AgentSession.installAgentToolHooks(). This layer captures the granular, per-tool activity that carries the most diagnostic signal:

Tool calls — tool name, arguments, call ID, timestamp
Tool results — result content, error flag, duration_ms, status
Memory search — memory_search events with query and status
Memory get — memory_get events with key and result status
Errors — tool-level exceptions and failure events
Duration — execution time per tool invocation

This is the layer that makes failure detection possible. Without it, Critiqor could not see repeated tool-call loops, tool outputs the agent ignored, or failed memory lookups.
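
To illustrate why per-tool events matter, here is a hypothetical loop heuristic that flags identical tool calls repeated several times. It is not Critiqor's actual detector. The event shape (`event_type`, and a `payload` with `tool_name` and `arguments`), the helper names, and the threshold of 3 are assumptions chosen for the sketch.

```python
import json
from collections import Counter


def find_repeated_calls(events, threshold=3):
    """Return (tool_name, arguments_json) pairs seen at least `threshold` times."""
    seen = Counter()
    for event in events:
        if event.get("event_type") != "tool_call":
            continue
        payload = event["payload"]
        # Serialize arguments with sorted keys so equal dicts produce equal keys.
        key = (payload["tool_name"], json.dumps(payload["arguments"], sort_keys=True))
        seen[key] += 1
    return [key for key, count in seen.items() if count >= threshold]


def call(tool_name, **arguments):
    """Build a hypothetical tool_call event for the example."""
    return {
        "event_type": "tool_call",
        "payload": {"tool_name": tool_name, "arguments": arguments},
    }


# Hand-written sample events, not real Critiqor output.
loop_events = [
    call("read_file", path="a.txt"),
    call("read_file", path="a.txt"),
    call("memory_search", query="deploy steps"),
    call("read_file", path="a.txt"),
]
repeated = find_repeated_calls(loop_events)
print(repeated)
```

A check like this only works on tool-level records; the message-level events from the Extension API layer do not carry tool names or arguments.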

Evidence Hierarchy

The two layers together produce the complete runtime picture Critiqor needs for diagnosis:

Runtime Traces
  ├── Tool Calls (tool name, arguments, call ID, timestamp)
  ├── Tool Outputs (result, error flag, duration_ms, status)
  ├── Memory Events (memory_search, memory_get, status)
  ├── Provider Events (before_provider_request, after_provider_response)
  ├── Session Timeline (agent_start, turn_start, turn_end, agent_end)
  └── Execution Metadata (process start/end, exit code, latency)
→ Final Response
→ Self-Report

Each level provides stronger evidence than the level below it. Critiqor’s failure detectors work from the top of this hierarchy down — using raw runtime events as primary evidence and treating the final response as a secondary signal, not the other way around.

What a Session File Contains

All collected evidence is written to:

runs/<run_id>/session.json

The session file follows the critiqor.session.v1 schema:

Field	Description
`schema_version`	Always `critiqor.session.v1` — identifies the file format
`run_id`	The unique identifier for this run (e.g. `run_001`)
`session_id`	Matches `run_id`; used internally by the session layer
`events[]`	Ordered array of all normalized runtime events
`metrics{}`	Aggregate counts — `total_events`, `by_event_type`, `by_source_layer`

Each event in events[] includes an event type, a timestamp, a source_layer (either extension_api or tool_hooks), and a payload containing event-specific fields. The metrics object provides a fast summary of what was collected without requiring a full scan of the events array:

{
  "total_events": 42,
  "by_event_type": {
    "tool_call": 8,
    "tool_output": 8,
    "memory_event": 4,
    "token_usage": 3,
    "state_transition": 6,
    "error_event": 1
  },
  "by_source_layer": {
    "extension_api": 20,
    "tool_hooks": 22
  }
}
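
The aggregate counts can in principle be recomputed from `events[]`, which gives a cheap consistency check on a session file. The sketch below assumes each event stores its type under an `event_type` key and its layer under `source_layer`; the `event_type` key name and the helpers `compute_metrics` and `metrics_match` are illustrative assumptions, not part of the documented schema.

```python
from collections import Counter


def compute_metrics(events):
    """Rebuild the metrics object from a list of normalized events."""
    return {
        "total_events": len(events),
        "by_event_type": dict(Counter(e["event_type"] for e in events)),
        "by_source_layer": dict(Counter(e["source_layer"] for e in events)),
    }


def metrics_match(session):
    """True if the stored metrics equal the counts derived from events[]."""
    return session["metrics"] == compute_metrics(session["events"])


# Hand-written sample session, not real Critiqor output.
demo_session = {
    "schema_version": "critiqor.session.v1",
    "run_id": "run_001",
    "session_id": "run_001",
    "events": [
        {"event_type": "state_transition", "timestamp": 0.0,
         "source_layer": "extension_api", "payload": {}},
        {"event_type": "tool_call", "timestamp": 1.0,
         "source_layer": "tool_hooks", "payload": {}},
        {"event_type": "tool_output", "timestamp": 2.0,
         "source_layer": "tool_hooks", "payload": {}},
    ],
}
demo_session["metrics"] = compute_metrics(demo_session["events"])
print(demo_session["metrics"], metrics_match(demo_session))
```

Reading `metrics` first and falling back to a full scan only when the check fails keeps the fast path fast while still catching a truncated or hand-edited file.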

Immutability and Rerunnable Diagnosis

Evidence is write-once during collection. Once critiqor finalize closes the active session, session.json is sealed and Critiqor does not modify it afterward. The derived artifact, runs/<run_id>/diagnosis.json, is written separately after the session closes. This split is deliberate: the raw evidence stays permanently auditable, while the diagnosis logic can improve over time. If Critiqor releases an improved detector, you can re-run diagnosis against the same session.json and get a more accurate result without re-executing the agent.
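
The separation between sealed evidence and derived diagnosis can be sketched as follows. This is an illustration under stated assumptions, not Critiqor's implementation: the `rerun_diagnosis` and `detect_errors` helpers, the shape of `diagnosis.json`, and the SHA-256 check are all hypothetical. Only the two file paths and the `run_id` and `events` fields come from the text above.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file's bytes, used to confirm the evidence is untouched."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def rerun_diagnosis(run_dir, detectors):
    """Read session.json, run detectors, and write diagnosis.json next to it."""
    run_dir = Path(run_dir)
    session_path = run_dir / "session.json"
    digest_before = file_digest(session_path)

    session = json.loads(session_path.read_text(encoding="utf-8"))
    findings = []
    for detect in detectors:
        findings.extend(detect(session["events"]))

    # Hypothetical diagnosis layout; the real diagnosis.json format may differ.
    diagnosis = {"run_id": session["run_id"], "findings": findings}
    (run_dir / "diagnosis.json").write_text(
        json.dumps(diagnosis, indent=2), encoding="utf-8"
    )

    if file_digest(session_path) != digest_before:
        raise RuntimeError("session.json changed during diagnosis")
    return diagnosis


def detect_errors(events):
    """Toy detector: report the index of every error_event."""
    return [
        {"kind": "tool_error", "index": i}
        for i, e in enumerate(events)
        if e.get("event_type") == "error_event"
    ]


with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "runs" / "run_001"
    run_dir.mkdir(parents=True)
    sealed = {
        "schema_version": "critiqor.session.v1",
        "run_id": "run_001",
        "session_id": "run_001",
        "events": [{"event_type": "tool_call"}, {"event_type": "error_event"}],
    }
    (run_dir / "session.json").write_text(json.dumps(sealed), encoding="utf-8")
    sealed_digest = file_digest(run_dir / "session.json")

    diagnosis = rerun_diagnosis(run_dir, [detect_errors])
    session_unchanged = file_digest(run_dir / "session.json") == sealed_digest
    diagnosis_written = (run_dir / "diagnosis.json").exists()

print(diagnosis, session_unchanged, diagnosis_written)
```

Swapping in a newer detector list and calling `rerun_diagnosis` again overwrites only `diagnosis.json`; the session file is opened for reading and never written.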
