# OpenClawRuntimeObserver: Python API for OpenClaw Evidence

> OpenClawRuntimeObserver and monitor_openclaw_process() record OpenClaw runtime events and build run payloads with trust scores and failure diagnosis.

`OpenClawRuntimeObserver`, `monitor_openclaw_process()`, and `diagnose_openclaw_events()` are the Python-level counterparts to the Critiqor CLI workflow for OpenClaw agents. Use them to script evaluation in CI pipelines, build custom integrations, or replay saved event traces through the diagnosis engine without invoking a live agent. All diagnosis is deterministic: no LLM judgment is used.

## Import

```python
from critiqor import OpenClawRuntimeObserver, monitor_openclaw_process, diagnose_openclaw_events
```

***

## `monitor_openclaw_process()`

```python
monitor_openclaw_process(
    command,
    agent_id="openclaw_agent",
    tenant_id="default",
    visibility="private",
    benchmark_id="openclaw_runtime_v1",
    difficulty_tier="standard",
    cwd=None,
    timeout=None,
) -> dict
```

Runs an OpenClaw command as a subprocess, captures its stdout and stderr as a structured event stream, runs the OpenClaw diagnosis engine over the collected events, and returns a complete run payload dict. JSON-structured lines in the process output are parsed as typed events automatically; plain text lines are recorded as `process_output` events.

If the subprocess exits with a non-zero return code, an `error_event` is appended before the diagnosis runs. If `timeout` is exceeded, a `"timeout"` error event is recorded and the partial event stream is diagnosed.

```python
from critiqor import monitor_openclaw_process

payload = monitor_openclaw_process(
    command=["openclaw", "chat"],
    agent_id="my_agent",
    tenant_id="default",
    visibility="private",
    benchmark_id="openclaw_runtime_v1",
    difficulty_tier="standard",
    timeout=300.0,
)
print(payload["trust_score"])
print(payload["primary_diagnosis"])
```
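
The line handling described above (JSON-structured lines become typed events, everything else becomes `process_output`) can be pictured with a small stand-alone sketch. This is not Critiqor's implementation: the helper name `classify_line`, the `"text"` field, and the rule that a JSON object needs an `"event"` key to count as typed are all assumptions made for illustration.

```python
import json


def classify_line(line: str) -> dict:
    """Turn one line of process output into an event dict (illustrative only)."""
    text = line.strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        parsed = None
    # Assumption: only JSON objects that carry an "event" key count as typed events.
    if isinstance(parsed, dict) and "event" in parsed:
        return parsed
    # Everything else is kept as plain output; the "text" key is hypothetical.
    return {"event": "process_output", "text": text}


print(classify_line('{"event": "tool_call", "tool": "read_file"}'))
print(classify_line("Loading model..."))
```

Emitting one JSON object per line from the agent is what lets a monitor of this kind recover tool calls and errors as distinct events instead of opaque text.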

### Parameters

| Parameter         | Default                 | Description                                                                                 |
| ----------------- | ----------------------- | ------------------------------------------------------------------------------------------- |
| `command`         | —                       | List of command parts to execute (e.g. `["openclaw", "run", "--task", "solve"]`).           |
| `agent_id`        | `"openclaw_agent"`      | Agent identifier written into the run payload.                                              |
| `tenant_id`       | `"default"`             | Tenant identifier written into the run payload.                                             |
| `visibility`      | `"private"`             | Evidence visibility: `"private"`, `"anonymous"`, or `"public"`.                             |
| `benchmark_id`    | `"openclaw_runtime_v1"` | Benchmark specification ID.                                                                 |
| `difficulty_tier` | `"standard"`            | Difficulty tier affecting score weighting: `"easy"`, `"standard"`, `"hard"`, or `"stress"`. |
| `cwd`             | `None`                  | Working directory for the subprocess. Defaults to the current directory.                    |
| `timeout`         | `None`                  | Optional timeout in seconds. If exceeded, a `"timeout"` error event is recorded and the partial event stream is diagnosed. |

**Returns:** A `dict` run payload containing all `OpenClawDiagnosis` fields plus agent metadata, benchmark spec, the raw event trace, and runtime metrics. Key top-level fields: `trust_score`, `readiness_level`, `scores`, `failure_causes`, `primary_diagnosis`, `causal_graph`, `cost_analysis`, `evidence_summary`, `trace`.
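
Because the return value is a plain dict, a CI gate can be written against it without any Critiqor-specific types. The sketch below reads only `trust_score` and `readiness_level`; the function name `ci_exit_code` and the threshold of 70 are hypothetical choices, not values defined by Critiqor.

```python
def ci_exit_code(payload: dict, min_trust: int = 70) -> int:
    """Return 0 if the run passes the gate, 1 otherwise.

    min_trust is an arbitrary example threshold, not a Critiqor default.
    """
    if payload.get("readiness_level") == "unsafe_for_production":
        return 1
    if payload.get("trust_score", 0) < min_trust:
        return 1
    return 0


# Stand-in payload containing only the two fields the gate reads.
example = {"trust_score": 82, "readiness_level": "review_recommended"}
print(ci_exit_code(example))  # 0
```

In a pipeline, the result could be passed to `sys.exit()` after calling `monitor_openclaw_process()`.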

***

## `OpenClawRuntimeObserver`

Use `OpenClawRuntimeObserver` to record events manually, when you want to build an event stream programmatically rather than by running a subprocess:

```python
from critiqor import OpenClawRuntimeObserver

observer = OpenClawRuntimeObserver(
    agent_id="my_agent",
    benchmark_id="openclaw_runtime_v1",
    difficulty_tier="standard",
)

# Record events manually
observer.record("tool_call", {"tool": "read_file", "args": {"path": "/tmp/x"}})
observer.record("tool_output", {"tool": "read_file", "output": "contents"})
observer.record("error_event", {"error": "timeout"})

# Build the run payload (runs diagnosis engine internally)
payload = observer.payload(visibility="private")
print(payload["trust_score"])
```

### Constructor Parameters

| Parameter         | Default                 | Description                                                       |
| ----------------- | ----------------------- | ----------------------------------------------------------------- |
| `agent_id`        | `"openclaw_agent"`      | Agent identifier.                                                 |
| `tenant_id`       | `"default"`             | Tenant identifier.                                                |
| `benchmark_id`    | `"openclaw_runtime_v1"` | Benchmark specification ID.                                       |
| `difficulty_tier` | `"standard"`            | Difficulty tier: `"easy"`, `"standard"`, `"hard"`, or `"stress"`. |

### `observer.record()`

```python
observer.record(event_type, payload=None) -> dict
```

Appends a timestamped event to the observer's internal event list. The event is returned as a dict.

If `event_type` is not one of the recognized `OPENCLAW_EVENT_TYPES`, it is coerced to `"process_output"`.

| Parameter    | Type           | Description                                                                                       |
| ------------ | -------------- | ------------------------------------------------------------------------------------------------- |
| `event_type` | `str`          | One of the recognized OpenClaw event types (see [OPENCLAW\_EVENT\_TYPES](#openclaw_event_types)). |
| `payload`    | `dict \| None` | Additional fields merged into the event dict.                                                     |

### `observer.payload()`

```python
observer.payload(visibility="private") -> dict
```

Runs the diagnosis engine over all recorded events and returns the complete run payload dict, identical in structure to the return value of `monitor_openclaw_process()`. Wall-clock latency from observer construction to this call is included in `runtime_metrics`.

| Parameter    | Type  | Description                                |
| ------------ | ----- | ------------------------------------------ |
| `visibility` | `str` | `"private"`, `"anonymous"`, or `"public"`. |

***

## `diagnose_openclaw_events()`

```python
diagnose_openclaw_events(events) -> OpenClawDiagnosis
```

Runs the OpenClaw diagnosis engine directly on a list of event dicts. Use this when you already have a saved event trace and want to run or re-run diagnosis without going through `monitor_openclaw_process()` or `OpenClawRuntimeObserver`.

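```python
from critiqor import diagnose_openclaw_events

# Illustrative trace: each event dict needs at least an "event" key.
# The extra fields mirror the observer example and are not a fixed schema.
events = [
    {"event": "tool_call", "tool": "read_file", "args": {"path": "/tmp/x"}},
    {"event": "tool_output", "tool": "read_file", "output": "contents"},
    {"event": "error_event", "error": "timeout"},
]

diagnosis = diagnose_openclaw_events(events)
print(diagnosis.trust_score)       # 0-100
print(diagnosis.readiness_level)   # "ready_for_runtime" etc.
print(diagnosis.failure_causes)    # list of OpenClawFailureCause
print(diagnosis.primary_diagnosis) # dict with root cause
```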

| Parameter | Type             | Description                                                   |
| --------- | ---------------- | ------------------------------------------------------------- |
| `events`  | `Sequence[dict]` | List of raw event dicts, each with at least an `"event"` key. |

**Returns:** [`OpenClawDiagnosis`](#openclawdiagnosis-fields)
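
For replaying a saved trace, one possible approach is to store events as JSON Lines and load them back before diagnosis. The sketch below assumes that storage format; the helper name `load_events` and the file name in the usage comment are hypothetical, and Critiqor may provide its own persistence format.

```python
import json
from pathlib import Path


def load_events(path: str) -> list[dict]:
    """Read a JSON Lines file and keep only records usable as events."""
    events = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # Each event dict needs at least an "event" key; drop anything else.
        if isinstance(record, dict) and "event" in record:
            events.append(record)
    return events


# Usage (requires critiqor and an existing trace file):
#   from critiqor import diagnose_openclaw_events
#   diagnosis = diagnose_openclaw_events(load_events("trace.jsonl"))
```

Filtering on the `"event"` key up front keeps malformed records out of the diagnosis input.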

***

## `OpenClawDiagnosis` Fields

`OpenClawDiagnosis` is a frozen dataclass returned by `diagnose_openclaw_events()`. Its fields also appear in run payloads, serialized via `.to_dict()`.

| Field               | Type                         | Description                                                                                                                                  |
| ------------------- | ---------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------- |
| `trust_score`       | `int`                        | 0–100 weighted reliability score across all six OpenClaw dimensions.                                                                         |
| `readiness_level`   | `str`                        | `"ready_for_runtime"`, `"review_recommended"`, or `"unsafe_for_production"`.                                                                 |
| `scores`            | `dict[str, int]`             | Per-dimension scores: `loop_control`, `memory_integrity`, `tool_output_utilization`, `context_health`, `cost_efficiency`, `skill_adherence`. |
| `failure_causes`    | `list[OpenClawFailureCause]` | All detected OpenClaw failure causes, each with type, severity, evidence, causal chain, impact score, and description.                       |
| `causal_graph`      | `dict`                       | `{"nodes": [...], "edges": [...]}` — structured causal graph linking runtime events to failures.                                             |
| `cost_analysis`     | `dict`                       | `total_tokens`, `token_waste`, `duplicate_calls`, `redundancy_score`, `cost_efficiency`.                                                     |
| `primary_diagnosis` | `dict`                       | `root_cause_failure_type`, `causal_chain_explanation`, `severity`, `description` — the highest-impact failure cause.                         |
| `evidence_summary`  | `dict`                       | `event_count`, `tool_calls`, `tool_outputs`, `memory_events`, `retries`, `errors`, `state_transitions`.                                      |
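
As a small example of working with the `scores` field, the sketch below ranks dimensions from weakest to strongest so a report can lead with the areas that need attention. The helper name `weakest_dimensions` and the score values are invented for illustration; only the six dimension names come from the table above.

```python
def weakest_dimensions(scores: dict[str, int], limit: int = 2) -> list[tuple[str, int]]:
    """Return the `limit` lowest-scoring dimensions, lowest first."""
    return sorted(scores.items(), key=lambda item: item[1])[:limit]


# Made-up scores using the six documented dimension names.
example_scores = {
    "loop_control": 40,
    "memory_integrity": 90,
    "tool_output_utilization": 75,
    "context_health": 85,
    "cost_efficiency": 55,
    "skill_adherence": 95,
}
print(weakest_dimensions(example_scores))
# [('loop_control', 40), ('cost_efficiency', 55)]
```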

***

## `OPENCLAW_EVENT_TYPES`

The set of recognized event type strings for `OpenClawRuntimeObserver.record()`. Events with unrecognized types are stored as `"process_output"`.

```python
OPENCLAW_EVENT_TYPES = {
    "tool_call",
    "tool_output",
    "memory_event",
    "retry_event",
    "error_event",
    "state_transition",
    "decision",
    "skill_event",
    "token_usage",
    "context_event",
    "process_output",
    "process_start",
    "process_end",
}
```
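
Since unrecognized types are silently stored as `"process_output"`, a typo such as `"tool_cal"` would not raise an error on its own. A caller who prefers to fail fast could validate the type before recording, as in this sketch. It uses a local copy of the set above so that it runs without Critiqor installed; the names `KNOWN_EVENT_TYPES` and `check_event_type` are hypothetical.

```python
# Local copy of the documented event types, so the sketch is self-contained.
KNOWN_EVENT_TYPES = frozenset({
    "tool_call", "tool_output", "memory_event", "retry_event",
    "error_event", "state_transition", "decision", "skill_event",
    "token_usage", "context_event", "process_output",
    "process_start", "process_end",
})


def check_event_type(event_type: str) -> str:
    """Return event_type unchanged, or raise if it would be coerced."""
    if event_type not in KNOWN_EVENT_TYPES:
        raise ValueError(f"unrecognized OpenClaw event type: {event_type!r}")
    return event_type


# Usage with an observer (requires critiqor):
#   observer.record(check_event_type("tool_call"), {"tool": "read_file"})
```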

***

## `OPENCLAW_FAILURE_TAXONOMY`

A reference dict mapping each OpenClaw failure type to its plain-language definition. These are the six failure modes that Critiqor's OpenClaw diagnosis engine detects:

| Failure Type            | Definition                                                             |
| ----------------------- | ---------------------------------------------------------------------- |
| `infinite_tool_loop`    | Repeated tool calls or retries without progress.                       |
| `memory_degradation`    | Stored or retrieved memory is lost, ignored, or fails recall.          |
| `ignoring_tool_outputs` | Tool outputs are available but not incorporated into decisions.        |
| `context_pollution`     | Context growth, saturation, or compaction causes useful state loss.    |
| `cost_explosion`        | Token or call waste grows without matching progress.                   |
| `skill_failure`         | Relevant OpenClaw skill is ignored, mis-selected, or fails invocation. |
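
The definitions above can be paired with the `root_cause_failure_type` key of `primary_diagnosis` to produce a readable one-line summary. The sketch below uses a local copy of the table rather than importing `OPENCLAW_FAILURE_TAXONOMY`, since this page does not show its import path; `FAILURE_DEFINITIONS` and `explain_root_cause` are hypothetical names.

```python
# Local copy of the taxonomy table above, keyed by failure type.
FAILURE_DEFINITIONS = {
    "infinite_tool_loop": "Repeated tool calls or retries without progress.",
    "memory_degradation": "Stored or retrieved memory is lost, ignored, or fails recall.",
    "ignoring_tool_outputs": "Tool outputs are available but not incorporated into decisions.",
    "context_pollution": "Context growth, saturation, or compaction causes useful state loss.",
    "cost_explosion": "Token or call waste grows without matching progress.",
    "skill_failure": "Relevant OpenClaw skill is ignored, mis-selected, or fails invocation.",
}


def explain_root_cause(primary_diagnosis: dict) -> str:
    """Format the root-cause failure type with its plain-language definition."""
    failure_type = primary_diagnosis.get("root_cause_failure_type")
    definition = FAILURE_DEFINITIONS.get(failure_type)
    if definition is None:
        return f"Unrecognized failure type: {failure_type!r}"
    return f"{failure_type}: {definition}"


print(explain_root_cause({"root_cause_failure_type": "cost_explosion"}))
```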
