# Evidence Collection: How Critiqor Captures Runtime Data

> Critiqor captures tool calls, tool outputs, provider events, memory events, and execution metadata into a structured session file for every OpenClaw run.

Evidence collection is the process by which Critiqor transforms a live OpenClaw agent run into a structured, auditable record. During every monitored session, the Critiqor OpenClaw plugin observes the agent's execution from two complementary layers, normalizes each event into a consistent schema, and writes the result to a single session file. That file becomes the permanent source of truth for every diagnosis Critiqor produces.

## The Two Collection Layers

The Critiqor OpenClaw plugin is intentionally narrow. It does not score runs, generate diagnoses, or interact with the agent's logic in any way. Its only job is to observe and record.

### Extension API Layer

The Extension API layer attaches via OpenClaw's `api.on(...)` event hooks. This layer captures the high-level agent and session lifecycle:

* **Agent and session timeline** — `agent_start`, `turn_start`, `turn_end`, `agent_end`
* **Provider requests and responses** — `before_provider_request`, `after_provider_response`
* **Messages and input** — user messages, agent messages, input events
* **User bash events** — shell command activity initiated during the session

These events give Critiqor the skeleton of the run: when each turn started and ended, how many times the provider was called, and what was exchanged at the message level.
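As a rough illustration of what this skeleton makes possible, the sketch below derives per-turn durations and a provider request count from a list of lifecycle events. It assumes the hook names above appear unchanged as each event's `event` value and that timestamps are ISO 8601 strings. Neither is confirmed by this page, and the sample `events` list is invented for the example.

```python
from datetime import datetime

# Hypothetical sample events. Using hook names as the "event" value and
# ISO 8601 timestamps are both assumptions made for this sketch.
events = [
    {"event": "agent_start", "timestamp": "2025-01-01T10:00:00+00:00"},
    {"event": "turn_start", "timestamp": "2025-01-01T10:00:01+00:00"},
    {"event": "before_provider_request", "timestamp": "2025-01-01T10:00:01+00:00"},
    {"event": "after_provider_response", "timestamp": "2025-01-01T10:00:03+00:00"},
    {"event": "turn_end", "timestamp": "2025-01-01T10:00:04+00:00"},
    {"event": "agent_end", "timestamp": "2025-01-01T10:00:05+00:00"},
]


def summarize_timeline(events):
    """Return per-turn durations (seconds) and the number of provider requests."""
    turn_durations = []
    provider_requests = 0
    turn_started_at = None
    for ev in events:
        ts = datetime.fromisoformat(ev["timestamp"])
        if ev["event"] == "turn_start":
            turn_started_at = ts
        elif ev["event"] == "turn_end" and turn_started_at is not None:
            turn_durations.append((ts - turn_started_at).total_seconds())
            turn_started_at = None
        elif ev["event"] == "before_provider_request":
            provider_requests += 1
    return {"turn_durations_s": turn_durations, "provider_requests": provider_requests}


summary = summarize_timeline(events)
```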

### Tool Hooks Layer

The Tool Hooks layer attaches via OpenClaw's `AgentSession.installAgentToolHooks()`. This layer captures the granular, per-tool activity that exposes the most diagnostic signal:

* **Tool calls** — tool name, arguments, call ID, timestamp
* **Tool results** — result content, error flag, `duration_ms`, status
* **Memory search** — `memory_search` events with query and status
* **Memory get** — `memory_get` events with key and result status
* **Errors** — tool-level exceptions and failure events
* **Duration** — execution time per tool invocation

This is the layer that makes failure detection possible. Without it, loops, ignored outputs, and memory failures are invisible.
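To make the loop case concrete, here is a minimal sketch of how repeated tool calls could be spotted once tool-level events are available. It is not Critiqor's detector. The `tool_call` event type, the `tool_name` and `arguments` payload keys, and the threshold of three are all assumptions chosen for illustration.

```python
import json


def find_repeated_calls(events, threshold=3):
    """Flag runs of `threshold` or more consecutive identical tool calls.

    Events that are not tool calls are skipped, so interleaved tool
    outputs do not break a run of identical calls.
    """
    findings = []
    previous_key = None
    run_length = 0
    for ev in events:
        if ev.get("event") != "tool_call":  # assumed event type name
            continue
        payload = ev.get("payload", {})
        # Serialize arguments with sorted keys so equal dicts compare equal.
        key = (
            payload.get("tool_name"),  # assumed payload key
            json.dumps(payload.get("arguments"), sort_keys=True),  # assumed payload key
        )
        if key == previous_key:
            run_length += 1
        else:
            previous_key, run_length = key, 1
        if run_length == threshold:  # report each run once, when it reaches the threshold
            findings.append({"tool_name": key[0], "arguments": payload.get("arguments")})
    return findings


# Hypothetical events: the same read repeated four times.
repeated = [
    {"event": "tool_call", "payload": {"tool_name": "read_file", "arguments": {"path": "a.txt"}}}
    for _ in range(4)
]
loop_findings = find_repeated_calls(repeated)
```

Without the tool-level events, the same run would show only a sequence of provider requests, which is why this kind of check depends on the Tool Hooks layer.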

## Evidence Hierarchy

The two layers together produce the complete runtime picture Critiqor needs for diagnosis:

```
Runtime Traces
  ├── Tool Calls (tool name, arguments, call ID, timestamp)
  ├── Tool Outputs (result, error flag, duration_ms, status)
  ├── Memory Events (memory_search, memory_get, status)
  ├── Provider Events (before_provider_request, after_provider_response)
  ├── Session Timeline (agent_start, turn_start, turn_end, agent_end)
  └── Execution Metadata (process start/end, exit code, latency)
→ Final Response
→ Self-Report
```

Each level provides stronger evidence than the level below it: runtime traces outrank the final response, and the final response outranks the agent's self-report. Critiqor's failure detectors work from the top of this hierarchy down, using raw runtime events as primary evidence and treating the final response as a secondary signal, not the other way around.
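The ordering can be pictured as a simple precedence rule. The sketch below is only an illustration of that idea. The `EvidenceLevel` enum, the observation shape, and the `strongest` helper are hypothetical and do not describe Critiqor's internals.

```python
from enum import IntEnum


class EvidenceLevel(IntEnum):
    """Hypothetical ranking: a higher value means stronger evidence."""
    SELF_REPORT = 1
    FINAL_RESPONSE = 2
    RUNTIME_TRACE = 3


def strongest(observations):
    """Pick the observation backed by the strongest evidence level."""
    return max(observations, key=lambda obs: obs["level"])


# Invented example: the agent says the tests passed, but the trace disagrees.
observations = [
    {"level": EvidenceLevel.SELF_REPORT, "claim": "tests passed"},
    {"level": EvidenceLevel.FINAL_RESPONSE, "claim": "tests passed"},
    {"level": EvidenceLevel.RUNTIME_TRACE, "claim": "test tool returned an error"},
]
verdict = strongest(observations)
```

When the sources conflict, the runtime trace wins, which matches the top-down order described above.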

## What a Session File Contains

All collected evidence is written to:

```
runs/<run_id>/session.json
```

The session file follows the `critiqor.session.v1` schema:

| Field            | Description                                                           |
| ---------------- | --------------------------------------------------------------------- |
| `schema_version` | Always `critiqor.session.v1` — identifies the file format             |
| `run_id`         | The unique identifier for this run (e.g. `run_001`)                   |
| `session_id`     | Matches `run_id`; used internally by the session layer                |
| `events[]`       | Ordered array of all normalized runtime events                        |
| `metrics{}`      | Aggregate counts — `total_events`, `by_event_type`, `by_source_layer` |

Each event in `events[]` includes an `event` type, a `timestamp`, a `source_layer` (either `extension_api` or `tool_hooks`), and a `payload` containing event-specific fields.
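A consumer of the session file might sanity-check this shape before analysing it. The sketch below checks only what this page states: the `schema_version` value, the four per-event keys, and the two `source_layer` values. The helper names are hypothetical, and a real validator would likely check more.

```python
REQUIRED_EVENT_KEYS = {"event", "timestamp", "source_layer", "payload"}
KNOWN_SOURCE_LAYERS = {"extension_api", "tool_hooks"}


def check_event(ev):
    """Return a list of problems for one event; an empty list means it looks well-formed."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_EVENT_KEYS - set(ev))]
    layer = ev.get("source_layer")
    if layer is not None and layer not in KNOWN_SOURCE_LAYERS:
        problems.append(f"unknown source_layer: {layer}")
    return problems


def check_session(session):
    """Check the top-level schema version and every event in the session."""
    problems = []
    if session.get("schema_version") != "critiqor.session.v1":
        problems.append("unexpected schema_version")
    for index, ev in enumerate(session.get("events", [])):
        problems.extend(f"events[{index}]: {p}" for p in check_event(ev))
    return problems


# Hypothetical session with one valid event and one malformed event.
sample_session = {
    "schema_version": "critiqor.session.v1",
    "run_id": "run_001",
    "session_id": "run_001",
    "events": [
        {"event": "tool_call", "timestamp": "2025-01-01T10:00:00+00:00",
         "source_layer": "tool_hooks", "payload": {}},
        {"event": "tool_output", "source_layer": "unknown_layer", "payload": {}},
    ],
}
session_problems = check_session(sample_session)
```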

The `metrics` object provides a fast summary of what was collected without requiring a full scan of the events array:

```json
{
  "total_events": 42,
  "by_event_type": {
    "tool_call": 8,
    "tool_output": 8,
    "memory_event": 4,
    "token_usage": 3,
    "state_transition": 6,
    "error_event": 1
  },
  "by_source_layer": {
    "extension_api": 20,
    "tool_hooks": 22
  }
}
```
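For intuition, a summary with these three keys could be derived from the events array with two counters, as in the sketch below. This is an assumption about how the aggregation might work, not Critiqor's implementation. In the example above the per-type counts sum to 30 while `total_events` is 42, so the real aggregation may group or omit some event types. This sketch simply counts every event.

```python
from collections import Counter


def compute_metrics(events):
    """Recompute a metrics-style summary by counting every event once."""
    return {
        "total_events": len(events),
        "by_event_type": dict(Counter(ev["event"] for ev in events)),
        "by_source_layer": dict(Counter(ev["source_layer"] for ev in events)),
    }


# Hypothetical events carrying only the fields this sketch needs.
sample_events = [
    {"event": "tool_call", "source_layer": "tool_hooks"},
    {"event": "tool_output", "source_layer": "tool_hooks"},
    {"event": "state_transition", "source_layer": "extension_api"},
    {"event": "tool_call", "source_layer": "tool_hooks"},
]
metrics = compute_metrics(sample_events)
```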

## Immutability and Rerunnable Diagnosis

Evidence is **write-once during collection**. Once `critiqor finalize` closes the active session, `session.json` is sealed. Critiqor does not modify it afterward.

The derived artifact — `runs/<run_id>/diagnosis.json` — is written separately after the session closes. This split is deliberate: the raw evidence is permanently auditable, while the diagnosis logic can improve over time without requiring the OpenClaw session to be re-run. If Critiqor releases an improved detector, you can re-run diagnosis against the same `session.json` and get a more accurate result without re-executing the agent.
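The split between evidence and diagnosis can be sketched as a function that reads `session.json`, runs a detector, writes `diagnosis.json`, and confirms the session file's bytes did not change. Everything here is illustrative: the `rerun_diagnosis` and `count_errors` helpers, the diagnosis fields, and the sample session are hypothetical. The actual `diagnosis.json` format is not described on this page.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file's bytes, used to confirm the evidence is untouched."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def rerun_diagnosis(run_dir, detector):
    """Run `detector` over a sealed session and write a fresh diagnosis.json."""
    run_dir = Path(run_dir)
    session_path = run_dir / "session.json"
    digest_before = file_digest(session_path)
    session = json.loads(session_path.read_text())
    findings = detector(session["events"])
    # Hypothetical diagnosis layout; the real schema is not documented here.
    diagnosis = {"run_id": session["run_id"], "findings": findings}
    (run_dir / "diagnosis.json").write_text(json.dumps(diagnosis, indent=2))
    if file_digest(session_path) != digest_before:
        raise RuntimeError("session.json changed during diagnosis")
    return findings


def count_errors(events):
    """Toy detector: report how many error events the session contains."""
    count = sum(1 for ev in events if ev["event"] == "error_event")
    return [{"kind": "error_event", "count": count}]


# Build a throwaway run directory so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "runs" / "run_001"
    run_dir.mkdir(parents=True)
    (run_dir / "session.json").write_text(json.dumps({
        "schema_version": "critiqor.session.v1",
        "run_id": "run_001",
        "events": [{"event": "tool_call"}, {"event": "error_event"}],
    }))
    findings = rerun_diagnosis(run_dir, count_errors)
    diagnosis_written = (run_dir / "diagnosis.json").exists()
```

Swapping `count_errors` for a different detector and calling `rerun_diagnosis` again would produce a new `diagnosis.json` from the same evidence, without re-executing the agent.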
