# Evidence Types: What Critiqor Collects During Agent Runs

> Detailed guide to the seven evidence categories Critiqor records during OpenClaw runs, with JSON examples and diagnostic relevance for each type.

Every Critiqor diagnosis is grounded in evidence: raw, structured observations captured passively from a running OpenClaw agent. Rather than relying on self-reported agent summaries or LLM judgment, Critiqor records what actually happened at runtime and reasons from that record. All evidence for a given run is written incrementally to `runs/<run_id>/session.json`. The seven evidence categories below describe exactly what is collected, where each item comes from in the plugin source, and how each type flows into the final diagnosis.

***

## Tool Calls

**What it is:** A record of every tool invocation observed during the agent run. The plugin's `TOOL_EVENTS` listener captures `tool_call` and `tool_execution_start` events from OpenClaw's `tool_hooks` source layer. Each entry records the tool name, the arguments passed, a call ID that pairs it with its result, and the timestamp at which the call was issued.

**Example:**

```json theme={null}
{
  "timestamp": "2025-06-01T12:00:01.234Z",
  "event_type": "tool_call",
  "source_layer": "tool_hooks",
  "tool_name": "read_file",
  "tool_call_id": "tc_abc123",
  "status": "ok",
  "duration_ms": null,
  "payload": {
    "toolName": "read_file",
    "toolCallId": "tc_abc123"
  }
}
```

`duration_ms` is `null` on the call record because the timer has not yet stopped — it is populated on the matching tool output event once the result arrives.

**Why it matters:** The `infinite_tool_loop` detector in `openclaw.py` groups tool calls by a fingerprint of `(tool_name, arguments)`. When the same fingerprint appears three or more times in a run, a loop failure is raised. The total count of tool calls also appears in the Executive Summary and in the `tool_calls` field of `evidence_summary`.
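The fingerprint grouping described above can be sketched in a few lines of Python. This is an illustrative sketch, not the code in `openclaw.py`: the helper names (`fingerprint`, `find_loops`, `make_call`) are hypothetical, and it assumes the call arguments are stored under `payload["arguments"]`, which the example record above does not show.

```python
import json
from collections import Counter

LOOP_THRESHOLD = 3  # the text states "three or more times in a run"


def fingerprint(event: dict) -> tuple:
    """Return a hashable (tool_name, arguments) fingerprint for one tool call."""
    # Assumption: arguments are stored under payload["arguments"].
    args = event.get("payload", {}).get("arguments", {})
    # Sorted keys keep the fingerprint stable regardless of argument order.
    return event.get("tool_name", ""), json.dumps(args, sort_keys=True, default=str)


def find_loops(events: list) -> dict:
    """Map each fingerprint seen LOOP_THRESHOLD or more times to its count."""
    counts = Counter(
        fingerprint(e) for e in events if e.get("event_type") == "tool_call"
    )
    return {fp: n for fp, n in counts.items() if n >= LOOP_THRESHOLD}


def make_call(path: str) -> dict:
    """Build a minimal tool_call event for the demonstration below."""
    return {
        "event_type": "tool_call",
        "tool_name": "read_file",
        "payload": {"arguments": {"path": path}},
    }


events = [make_call("a.txt"), make_call("a.txt"), make_call("b.txt"), make_call("a.txt")]
loops = find_loops(events)  # only the a.txt fingerprint crosses the threshold
```

Serializing the arguments to a canonical JSON string is one simple way to make nested argument objects hashable; the real detector may normalize arguments differently.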

**In the dashboard:** Every tool call is listed in the Evidence panel. The tool call count from `evidence_summary.tool_calls` appears in the Executive Summary header alongside the trust score.

***

## Tool Outputs

**What it is:** The result returned by a tool after it executes. The plugin correlates a `tool_result` or `tool_execution_end` event with its originating `tool_call` using the shared `tool_call_id`. The `isError` flag on the payload becomes the `status` field (`"ok"` or `"error"`), and `duration_ms` is computed as the elapsed time between the call's recorded start timestamp and the moment the result arrives.

**Example:**

```json theme={null}
{
  "timestamp": "2025-06-01T12:00:01.890Z",
  "event_type": "tool_result",
  "source_layer": "tool_hooks",
  "tool_name": "read_file",
  "tool_call_id": "tc_abc123",
  "status": "ok",
  "duration_ms": 656,
  "payload": {
    "isError": false,
    "result": "File contents here..."
  }
}
```
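As a rough illustration of the pairing step, the sketch below matches a result to its call by `tool_call_id` and derives `status` and `duration_ms` from the two records. The function names are hypothetical, and deriving the duration from the recorded timestamps is an assumption made for the example; the plugin may use its own timer.

```python
from datetime import datetime, timedelta


def parse_ts(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp such as 2025-06-01T12:00:01.234Z."""
    # Swap the trailing "Z" for an explicit offset so older Python versions accept it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def pair_result(call: dict, result: dict) -> dict:
    """Return a copy of `result` with status and duration_ms derived from its call."""
    if call["tool_call_id"] != result["tool_call_id"]:
        raise ValueError("call and result do not share a tool_call_id")
    elapsed = parse_ts(result["timestamp"]) - parse_ts(call["timestamp"])
    is_error = result.get("payload", {}).get("isError", False)
    return {
        **result,
        "status": "error" if is_error else "ok",
        # Integer division by one millisecond avoids float rounding.
        "duration_ms": elapsed // timedelta(milliseconds=1),
    }


call = {"timestamp": "2025-06-01T12:00:01.234Z", "tool_call_id": "tc_abc123"}
result = {
    "timestamp": "2025-06-01T12:00:01.890Z",
    "tool_call_id": "tc_abc123",
    "payload": {"isError": False},
}
paired = pair_result(call, result)
```

With the timestamps from the two example records on this page, the sketch yields the same 656 ms shown in the `tool_result` example.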

**Why it matters:** The `ignoring_tool_outputs` detector looks for tool output events where `used` is `false`, `referenced` is `false`, or `status` is `"ignored"`. When the agent receives a result but does not incorporate it into subsequent decisions, those outputs are flagged and penalise the `tool_output_utilization` dimension. The `duration_ms` value is also used in cost and efficiency analysis — slow tool calls with low utilization are a strong signal of redundant execution.
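The three conditions named above amount to a simple predicate. The sketch below is a hypothetical illustration rather than the detector itself: it assumes `used` and `referenced` are top-level keys on the normalized event, and it accepts both the plugin's `tool_result` and the engine's `tool_output` event type names.

```python
OUTPUT_EVENT_TYPES = {"tool_result", "tool_output"}


def is_ignored(event: dict) -> bool:
    """True when a tool output matches any of the three 'ignored' conditions."""
    # Assumption: `used` and `referenced` are top-level boolean fields.
    # A missing field is treated as unknown, not as ignored.
    return (
        event.get("used") is False
        or event.get("referenced") is False
        or event.get("status") == "ignored"
    )


def ignored_outputs(events: list) -> list:
    """Return the tool output events that the agent appears to have ignored."""
    return [
        e for e in events
        if e.get("event_type") in OUTPUT_EVENT_TYPES and is_ignored(e)
    ]


events = [
    {"event_type": "tool_output", "tool_call_id": "tc_1", "used": True, "referenced": True},
    {"event_type": "tool_output", "tool_call_id": "tc_2", "used": False},
    {"event_type": "tool_output", "tool_call_id": "tc_3", "status": "ignored"},
    {"event_type": "tool_call", "tool_call_id": "tc_4", "used": False},
]
flagged = [e["tool_call_id"] for e in ignored_outputs(events)]
```

Using `is False` instead of a plain truthiness check keeps events that simply lack the field from being flagged.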

**In the dashboard:** Tool outputs are displayed alongside their paired calls in the Evidence panel. A `tool_result` event with `status: "error"` increments the `error_events` metric counter in `session.json`.

***

## Runtime Events

**What it is:** Agent lifecycle events that describe state transitions, retries, errors, memory operations, decisions, skill invocations, and context changes. These are emitted by the OpenClaw extension API rather than the tool hooks layer. Critiqor's plugin subscribes to all events defined in `TIMELINE_EVENTS` — including `agent_start`, `agent_end`, `turn_start`, `turn_end`, `session_start`, `session_end`, `message_received`, and `message_sent` — and records them with `source_layer: "extension_api"`.

The full set of OpenClaw event types recognized by the Python diagnosis engine (`OPENCLAW_EVENT_TYPES`) includes:

| Event type         | Description                |
| ------------------ | -------------------------- |
| `tool_call`        | Tool invocation            |
| `tool_output`      | Tool result                |
| `memory_event`     | Memory storage or recall   |
| `retry_event`      | Retry attempt              |
| `error_event`      | Runtime error              |
| `state_transition` | Agent state change         |
| `decision`         | Agent decision point       |
| `skill_event`      | OpenClaw skill invocation  |
| `token_usage`      | Provider token consumption |
| `context_event`    | Context window change      |
| `process_output`   | Raw stdout/stderr line     |
| `process_start`    | Process launch             |
| `process_end`      | Process exit               |

**Example:**

```json theme={null}
{
  "timestamp": "2025-06-01T12:00:00.100Z",
  "event_type": "retry_event",
  "source_layer": "extension_api",
  "payload": { "reason": "tool_timeout", "attempt": 2 }
}
```

**Why it matters:** Runtime events are the backbone of causal analysis. A `retry_event` following a repeated `tool_call` triggers the `infinite_tool_loop` detector. Memory events with `action` values of `recall_failed`, `ignored`, `lost`, or `miss` trigger `memory_degradation` detection. Context events with `saturation ≥ 85` or `action: compaction` trigger `context_pollution` detection. Skill events with `status: ignored`, `mismatch`, or `failed` trigger `skill_failure` detection.
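The per-event rules above can be summarized as a small lookup. This is a sketch under stated assumptions, not the engine's implementation: the function name is hypothetical, and fields such as `action`, `saturation`, and `status` are read from `payload` first and then from the top level, because the source does not say where they live. The retry rule is omitted because it depends on the preceding events, not on a single record.

```python
from typing import Optional

MEMORY_FAILURE_ACTIONS = {"recall_failed", "ignored", "lost", "miss"}
SKILL_FAILURE_STATUSES = {"ignored", "mismatch", "failed"}
CONTEXT_SATURATION_LIMIT = 85


def field(event: dict, name: str):
    """Read a field from the payload, falling back to the top level (assumption)."""
    payload = event.get("payload") or {}
    return payload.get(name, event.get(name))


def triggered_detector(event: dict) -> Optional[str]:
    """Return the detector a single event would trigger, or None."""
    kind = event.get("event_type")
    if kind == "memory_event" and field(event, "action") in MEMORY_FAILURE_ACTIONS:
        return "memory_degradation"
    if kind == "context_event":
        saturation = field(event, "saturation")
        saturated = isinstance(saturation, (int, float)) and saturation >= CONTEXT_SATURATION_LIMIT
        if saturated or field(event, "action") == "compaction":
            return "context_pollution"
    if kind == "skill_event" and field(event, "status") in SKILL_FAILURE_STATUSES:
        return "skill_failure"
    return None


sample = {"event_type": "context_event", "payload": {"saturation": 91}}
detector = triggered_detector(sample)
```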

**In the dashboard:** Runtime events appear in the Runtime Timeline section as a chronological event log. Retry and error events are highlighted and linked to the failure causes they contributed to.

***

## Provider Requests

**What it is:** Events that bracket each LLM call — `before_provider_request` when the agent sends a prompt to the model, and `after_provider_response` when the model reply arrives. The `after_provider_response` payload contains a `usage` block with `input_tokens`, `output_tokens`, and `total`. The session normalizer in `session.py` maps `after_provider_response` to the `token_usage` event type so the Python diagnosis engine can accumulate totals across all turns.

**Example:**

```json theme={null}
{
  "timestamp": "2025-06-01T12:00:02.000Z",
  "event_type": "after_provider_response",
  "source_layer": "extension_api",
  "payload": {
    "usage": { "input_tokens": 1204, "output_tokens": 387, "total": 1591 }
  }
}
```

**Why it matters:** The `cost_explosion` detector sums `usage.total` across all `token_usage` events in the session. If the cumulative total reaches 12,000 tokens and duplicate tool actions are also present, a `cost_explosion` failure is raised. At 30,000 tokens or more the severity escalates to `critical`, which directly triggers an `unsafe_for_production` readiness level regardless of the trust score. Token totals also appear in the `cost_analysis` block of the final diagnosis.
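The two thresholds can be illustrated with the following sketch. It is not the detector's source: the function name and return shape are hypothetical, the events are assumed to be already normalized to `token_usage`, and the duplicate-tool-action check is passed in as a boolean because its logic lives elsewhere. It also assumes the `critical` escalation applies only once the failure has been raised.

```python
from typing import Optional

RAISE_THRESHOLD = 12_000     # tokens at which the failure can be raised
CRITICAL_THRESHOLD = 30_000  # tokens at which severity becomes critical


def total_tokens(events: list) -> int:
    """Sum usage.total across all token_usage events."""
    return sum(
        e.get("payload", {}).get("usage", {}).get("total", 0)
        for e in events
        if e.get("event_type") == "token_usage"
    )


def cost_explosion(events: list, has_duplicate_tool_actions: bool) -> Optional[dict]:
    """Return a summary dict when the failure would be raised, else None."""
    total = total_tokens(events)
    if total < RAISE_THRESHOLD or not has_duplicate_tool_actions:
        return None
    return {"total_tokens": total, "critical": total >= CRITICAL_THRESHOLD}


def usage(total: int) -> dict:
    """Build a minimal token_usage event for the demonstration below."""
    return {"event_type": "token_usage", "payload": {"usage": {"total": total}}}


finding = cost_explosion([usage(6_000), usage(6_000)], has_duplicate_tool_actions=True)
```

In this example the cumulative total is exactly 12,000, so the failure is raised but stays below the `critical` threshold.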

**In the dashboard:** Token usage is shown in the Cost Analysis section. The total token count and the estimated token waste from redundant calls are both displayed.

***

## Execution Metadata

**What it is:** Process-level information captured when an agent process is launched and when it exits. Critiqor records a `process_start` event at launch (including the command used) and a `process_end` event on exit with the exit code and total wall-clock latency. When the agent is monitored via `monitor_openclaw_process()`, stdout and stderr are also captured line-by-line; lines that parse as JSON are recorded as typed events, and plain-text lines become `process_output` events.

**Example:**

```json theme={null}
{
  "event": "process_end",
  "timestamp": "2025-06-01T12:05:22.000Z",
  "pid": 12345,
  "exit_code": 0,
  "framework": "openclaw"
}
```
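The line-by-line handling of stdout and stderr described above might look roughly like the sketch below. It does not reproduce `monitor_openclaw_process()`: the function name `classify_line` is hypothetical, the rule that a JSON line must be an object carrying `event_type` to count as a typed event is an assumption, and so is the `{"line": ...}` payload shape.

```python
import json


def classify_line(line: str) -> dict:
    """Turn one captured output line into an event record."""
    text = line.strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        parsed = None
    # Assumption: only JSON objects that name an event_type are typed events.
    if isinstance(parsed, dict) and "event_type" in parsed:
        return parsed
    # Everything else, including bare JSON scalars, is kept as plain output.
    return {"event_type": "process_output", "payload": {"line": text}}


typed = classify_line('{"event_type": "retry_event", "payload": {"attempt": 2}}')
plain = classify_line("Loading configuration...\n")
```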

**Why it matters:** A non-zero exit code causes `monitor_openclaw_process()` to emit an `error_event` before the `process_end` record. That `error_event` is then available to every failure detector. Total latency from `process_end` is included in the run payload under `runtime_metrics.latency` and surfaced in the dashboard Agent Health view.

**In the dashboard:** Exit code and latency appear in the run summary row shown by `critiqor runs`. Non-zero exit codes are flagged in the Evidence panel header.

***

## Session Metadata

**What it is:** The top-level envelope that wraps all evidence in `session.json`. It is created by the plugin's `ensureSessionFile()` function when the first event arrives and updated on every subsequent event. The `metrics` block holds live counters (`total_events`, `by_event_type`, `by_source_layer`, and `error_events`) that `updateSessionSummary()` in the plugin maintains as each event is recorded.

**Example:**

```json theme={null}
{
  "session_id": "run_001",
  "run_id": "run_001",
  "schema_version": "critiqor.session.v1",
  "events_file": "session.json",
  "metrics": {
    "total_events": 47,
    "by_event_type": { "tool_call": 14, "tool_result": 14, "retry_event": 3 },
    "by_source_layer": { "tool_hooks": 28, "extension_api": 19 },
    "error_events": 1
  }
}
```

The `schema_version` field (`critiqor.session.v1`) is used by `session.py` to identify valid evidence files during finalization. The `run_id` ties the evidence file to the matching `runs/run_001.json` session record.
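To make the counter behaviour of the `metrics` block concrete, here is a minimal Python sketch of the same bookkeeping. It illustrates the idea only and is not the plugin's `updateSessionSummary()`: the function names are hypothetical, and it assumes `error_events` counts events whose `status` is `"error"`, the one case this page documents.

```python
from collections import Counter


def new_metrics() -> dict:
    """Create an empty metrics block with the four documented counters."""
    return {
        "total_events": 0,
        "by_event_type": Counter(),
        "by_source_layer": Counter(),
        "error_events": 0,
    }


def record_event(metrics: dict, event: dict) -> None:
    """Update the counters in place for one newly recorded event."""
    metrics["total_events"] += 1
    metrics["by_event_type"][event["event_type"]] += 1
    # Assumption: events without a source layer are grouped under "unknown".
    metrics["by_source_layer"][event.get("source_layer", "unknown")] += 1
    if event.get("status") == "error":
        metrics["error_events"] += 1


metrics = new_metrics()
for ev in [
    {"event_type": "tool_call", "source_layer": "tool_hooks", "status": "ok"},
    {"event_type": "tool_result", "source_layer": "tool_hooks", "status": "error"},
    {"event_type": "retry_event", "source_layer": "extension_api"},
]:
    record_event(metrics, ev)
```

Because each event touches every counter exactly once, `total_events` always equals the sum of `by_source_layer`, which matches the 28 + 19 = 47 split in the example above.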

**In the dashboard:** The metrics block is read during `finalize_session()` to populate the Evidence summary panel. `by_event_type` and `by_source_layer` breakdowns are shown as compact tables in the run detail view.

***

## Runtime Timeline

**What it is:** The complete, ordered audit trail of everything Critiqor observed during the run. Events from the categories above appear as entries in the `events[]` array in `session.json`, sorted by the timestamp at which they were recorded. The timeline is the primary input to `diagnose_openclaw_events()`: all six failure detectors, all dimension scores, and the causal graph are derived from sequential analysis of this array.

**Why it matters:** The causal graph built by `build_openclaw_causal_graph()` creates a `precedes` edge between every consecutive pair of events, so the temporal ordering of the timeline directly determines which events are identified as causes of later failures. An earlier `retry_event` that immediately follows a repeated `tool_call` is connected to a `loop_flagged` failure node via a `causes` edge. Repeated evidence items within a single failure cause are connected with `reinforces` edges.
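The `precedes` chain described above can be sketched as follows. This is a simplified illustration, not `build_openclaw_causal_graph()`: the function name and the edge dictionary shape (`type`, `source`, `target`) are hypothetical, nodes are identified by their position in the sorted timeline, and `causes` and `reinforces` edges are left out.

```python
def precedes_edges(events: list) -> list:
    """Link every consecutive pair of events, in timestamp order, with a precedes edge."""
    # ISO-8601 timestamps in one format and time zone sort correctly as strings.
    order = sorted(range(len(events)), key=lambda i: events[i]["timestamp"])
    return [
        {"type": "precedes", "source": a, "target": b}
        for a, b in zip(order, order[1:])
    ]


events = [
    {"timestamp": "2025-06-01T12:00:01.890Z", "event_type": "tool_result"},
    {"timestamp": "2025-06-01T12:00:00.100Z", "event_type": "retry_event"},
    {"timestamp": "2025-06-01T12:00:01.234Z", "event_type": "tool_call"},
]
edges = precedes_edges(events)
```

A timeline of n events yields n - 1 `precedes` edges, so the chain stays linear in the size of `events[]`.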

**In the dashboard:** The Runtime Timeline section renders the `events[]` array as a scrollable, chronological event log. Each entry shows the event type, source layer, timestamp, and a summary of its payload. Clicking an event that has an outgoing `causes` edge in the causal graph highlights the associated failure cause.
