OpenClawRuntimeObserver: Python API for OpenClaw Evidence

OpenClawRuntimeObserver, monitor_openclaw_process(), and diagnose_openclaw_events() are the Python-level counterparts to the Critiqor CLI workflow for OpenClaw agents. They are useful for scripting evaluation into CI pipelines, building custom integrations, or replaying saved event traces through the diagnosis engine without invoking a live agent. Diagnosis is deterministic: no LLM judgment is involved.

Import

from critiqor import OpenClawRuntimeObserver, monitor_openclaw_process, diagnose_openclaw_events

`monitor_openclaw_process()`

monitor_openclaw_process(
    command,
    agent_id="openclaw_agent",
    tenant_id="default",
    visibility="private",
    benchmark_id="openclaw_runtime_v1",
    difficulty_tier="standard",
    cwd=None,
    timeout=None,
) → dict

Runs an OpenClaw command as a subprocess, captures its stdout and stderr as a structured event stream, runs the OpenClaw diagnosis engine over the collected events, and returns a complete run payload dict. JSON-structured lines in the process output are parsed as typed events automatically; plain text lines are recorded as process_output events. If the subprocess exits with a non-zero return code, an error_event is appended before the diagnosis runs. If timeout is exceeded, a "timeout" error event is recorded and the partial event stream is diagnosed.

from critiqor import monitor_openclaw_process

payload = monitor_openclaw_process(
    command=["openclaw", "chat"],
    agent_id="my_agent",
    tenant_id="default",
    visibility="private",
    benchmark_id="openclaw_runtime_v1",
    difficulty_tier="standard",
    timeout=300.0,
)
print(payload["trust_score"])
print(payload["primary_diagnosis"])
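The line-handling rule described above (JSON-structured lines become typed events, everything else becomes process_output) can be pictured with a small stand-alone sketch. This is not Critiqor's implementation: classify_line() and the "text" key are hypothetical names chosen for illustration, and the exact rule the library uses to decide what counts as a JSON-structured line may differ.

```python
import json


def classify_line(line):
    """Turn one line of process output into an event dict (illustrative only)."""
    text = line.strip()
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        parsed = None
    # Assumption: only JSON objects that name an event type count as typed events.
    if isinstance(parsed, dict) and isinstance(parsed.get("event"), str):
        return parsed
    # Anything else is kept as plain process output ("text" is a made-up key).
    return {"event": "process_output", "text": text}


sample_output = [
    '{"event": "tool_call", "tool": "read_file"}',
    "Loading workspace...",
    "[1, 2, 3]",
]
events = [classify_line(line) for line in sample_output]
for event in events:
    print(event["event"])
```

Emitting one JSON object per line from the agent is therefore the simplest way to get typed events into the trace; free-form log lines still end up in the trace, but only as process_output.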

Parameters

Parameter	Default	Description
`command`	—	List of command parts to execute (e.g. `["openclaw", "run", "--task", "solve"]`).
`agent_id`	`"openclaw_agent"`	Agent identifier written into the run payload.
`tenant_id`	`"default"`	Tenant identifier written into the run payload.
`visibility`	`"private"`	Evidence visibility: `"private"`, `"anonymous"`, or `"public"`.
`benchmark_id`	`"openclaw_runtime_v1"`	Benchmark specification ID.
`difficulty_tier`	`"standard"`	Difficulty tier affecting score weighting: `"easy"`, `"standard"`, `"hard"`, or `"stress"`.
`cwd`	`None`	Working directory for the subprocess. Defaults to the current directory.
`timeout`	`None`	Optional timeout in seconds. If exceeded, a timeout error event is recorded and the partial event stream is diagnosed.

Returns: A dict run payload containing all OpenClawDiagnosis fields plus agent metadata, benchmark spec, the raw event trace, and runtime metrics. Key top-level fields: trust_score, readiness_level, scores, failure_causes, primary_diagnosis, causal_graph, cost_analysis, evidence_summary, trace.
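For CI use, the top-level fields can be turned into a pass/fail decision. The sketch below is one possible gate, not part of Critiqor: ci_exit_code() and the threshold of 80 are assumptions, and sample_payload is a hand-written stand-in with made-up values rather than real output from monitor_openclaw_process().

```python
def ci_exit_code(payload, min_trust=80):
    """Return 0 when the run passes the gate, 1 otherwise (hypothetical helper)."""
    ready = payload.get("readiness_level") == "ready_for_runtime"
    trusted = payload.get("trust_score", 0) >= min_trust
    return 0 if (ready and trusted) else 1


# Stand-in for a real run payload; the values are invented for illustration.
sample_payload = {
    "trust_score": 72,
    "readiness_level": "review_recommended",
    "primary_diagnosis": {"root_cause_failure_type": "infinite_tool_loop"},
}

exit_code = ci_exit_code(sample_payload)
print(f"trust_score={sample_payload['trust_score']} exit_code={exit_code}")
# A CI script would typically end with sys.exit(exit_code).
```

Requiring both the readiness level and a minimum score is a design choice; a looser gate could check only one of the two.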

`OpenClawRuntimeObserver`

Use OpenClawRuntimeObserver for manual event recording, when you want to build an event stream programmatically rather than by running a subprocess:

from critiqor import OpenClawRuntimeObserver

observer = OpenClawRuntimeObserver(
    agent_id="my_agent",
    benchmark_id="openclaw_runtime_v1",
    difficulty_tier="standard",
)

# Record events manually
observer.record("tool_call", {"tool": "read_file", "args": {"path": "/tmp/x"}})
observer.record("tool_output", {"tool": "read_file", "output": "contents"})
observer.record("error_event", {"error": "timeout"})

# Build the run payload (runs diagnosis engine internally)
payload = observer.payload(visibility="private")
print(payload["trust_score"])

Constructor Parameters

Parameter	Default	Description
`agent_id`	`"openclaw_agent"`	Agent identifier.
`tenant_id`	`"default"`	Tenant identifier.
`benchmark_id`	`"openclaw_runtime_v1"`	Benchmark specification ID.
`difficulty_tier`	`"standard"`	Difficulty tier: `"easy"`, `"standard"`, `"hard"`, or `"stress"`.

`observer.record()`

observer.record(event_type, payload=None) → dict

Appends a timestamped event to the observer’s internal event list. The event is returned as a dict. If event_type is not one of the recognized OPENCLAW_EVENT_TYPES, it is coerced to "process_output".

Parameter	Type	Description
`event_type`	`str`	One of the recognized OpenClaw event types (see OPENCLAW_EVENT_TYPES).
`payload`	`dict \| None`	Additional fields merged into the event dict.
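The coercion rule can be illustrated without the library. The following is a minimal stand-in, not the real record() method: make_event() is a hypothetical function, the "timestamp" key name and the merge precedence are assumptions, and KNOWN_EVENT_TYPES is a hand copy of the OPENCLAW_EVENT_TYPES listing in this reference.

```python
import time

# Hand copy of the event types listed under OPENCLAW_EVENT_TYPES.
KNOWN_EVENT_TYPES = {
    "tool_call", "tool_output", "memory_event", "retry_event", "error_event",
    "state_transition", "decision", "skill_event", "token_usage",
    "context_event", "process_output", "process_start", "process_end",
}


def make_event(event_type, payload=None):
    """Build a timestamped event dict, coercing unknown types (illustrative)."""
    if event_type not in KNOWN_EVENT_TYPES:
        event_type = "process_output"
    # Assumption: payload fields are merged first, so they cannot overwrite
    # the event type or the timestamp.
    return {**(payload or {}), "event": event_type, "timestamp": time.time()}


known = make_event("tool_call", {"tool": "read_file"})
unknown = make_event("custom_metric", {"value": 3})
print(known["event"], unknown["event"])
```

Because an unrecognized type is coerced rather than rejected, a typo such as "tool_calls" would not raise an error; it would simply be recorded as process_output.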

`observer.payload()`

observer.payload(visibility="private") → dict

Runs the diagnosis engine over all recorded events and returns the complete run payload dict, identical in structure to the return value of monitor_openclaw_process(). Wall-clock latency from observer construction to this call is included in runtime_metrics.

Parameter	Type	Description
`visibility`	`str`	`"private"`, `"anonymous"`, or `"public"`.

`diagnose_openclaw_events()`

diagnose_openclaw_events(events) → OpenClawDiagnosis

Runs the OpenClaw diagnosis engine directly on a list of event dicts. Use this when you already have a saved event trace and want to run or re-run diagnosis without going through monitor_openclaw_process() or OpenClawRuntimeObserver.

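from critiqor import diagnose_openclaw_events

# A saved trace: a list of event dicts, each with at least an "event" key
events = [
    {"event": "tool_call", "tool": "read_file", "args": {"path": "/tmp/x"}},
    {"event": "tool_output", "tool": "read_file", "output": "contents"},
]

diagnosis = diagnose_openclaw_events(events)
print(diagnosis.trust_score)       # 0-100
print(diagnosis.readiness_level)   # "ready_for_runtime" etc.
print(diagnosis.failure_causes)    # list of OpenClawFailureCause
print(diagnosis.primary_diagnosis) # dict with root cause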

Parameter	Type	Description
`events`	`Sequence[dict]`	List of raw event dicts, each with at least an `"event"` key.
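When replaying a saved trace, it can help to validate the events before handing them to the engine. The sketch below assumes the trace was saved as JSON Lines (one event object per line); that storage format and the load_events() helper are assumptions, not something Critiqor prescribes. The only requirement taken from this reference is that each event has an "event" key.

```python
import json


def load_events(lines):
    """Parse JSON Lines text into event dicts, skipping blank lines."""
    events = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        event = json.loads(line)
        if not isinstance(event, dict) or "event" not in event:
            raise ValueError(f"line {number}: expected an object with an 'event' key")
        events.append(event)
    return events


# Inline stand-in for the contents of a saved trace file.
saved_trace = """\
{"event": "tool_call", "tool": "read_file"}
{"event": "tool_output", "tool": "read_file", "output": "contents"}

{"event": "error_event", "error": "timeout"}
"""
events = load_events(saved_trace.splitlines())
print(len(events), "events loaded")
# With the library installed, the list can then be passed on:
# from critiqor import diagnose_openclaw_events
# diagnosis = diagnose_openclaw_events(events)
```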

Returns: OpenClawDiagnosis

`OpenClawDiagnosis` Fields

OpenClawDiagnosis is a frozen dataclass returned by diagnose_openclaw_events(). Its fields are also included in run payloads in dict form, via .to_dict().

Field	Type	Description
`trust_score`	`int`	0–100 weighted reliability score across all six OpenClaw dimensions.
`readiness_level`	`str`	`"ready_for_runtime"`, `"review_recommended"`, or `"unsafe_for_production"`.
`scores`	`dict[str, int]`	Per-dimension scores: `loop_control`, `memory_integrity`, `tool_output_utilization`, `context_health`, `cost_efficiency`, `skill_adherence`.
`failure_causes`	`list[OpenClawFailureCause]`	All detected OpenClaw failure causes, each with type, severity, evidence, causal chain, impact score, and description.
`causal_graph`	`dict`	`{"nodes": [...], "edges": [...]}` — structured causal graph linking runtime events to failures.
`cost_analysis`	`dict`	`total_tokens`, `token_waste`, `duplicate_calls`, `redundancy_score`, `cost_efficiency`.
`primary_diagnosis`	`dict`	`root_cause_failure_type`, `causal_chain_explanation`, `severity`, `description` — the highest-impact failure cause.
`evidence_summary`	`dict`	`event_count`, `tool_calls`, `tool_outputs`, `memory_events`, `retries`, `errors`, `state_transitions`.
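A common first step after diagnosis is to find the dimensions that pull the trust score down. The sketch below works on a plain dict shaped like the scores field. weakest_dimensions() is a hypothetical helper, the sample values are invented, and it assumes that per-dimension scores share the convention that higher is better.

```python
def weakest_dimensions(scores, limit=2):
    """Return the lowest-scoring (dimension, score) pairs, worst first."""
    return sorted(scores.items(), key=lambda item: item[1])[:limit]


# Invented values using the six dimension names from the table above.
sample_scores = {
    "loop_control": 40,
    "memory_integrity": 90,
    "tool_output_utilization": 85,
    "context_health": 70,
    "cost_efficiency": 55,
    "skill_adherence": 95,
}

for name, score in weakest_dimensions(sample_scores):
    print(f"{name}: {score}")
```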

`OPENCLAW_EVENT_TYPES`

The set of recognized event type strings for OpenClawRuntimeObserver.record(). Events with unrecognized types are stored as "process_output".

OPENCLAW_EVENT_TYPES = {
    "tool_call",
    "tool_output",
    "memory_event",
    "retry_event",
    "error_event",
    "state_transition",
    "decision",
    "skill_event",
    "token_usage",
    "context_event",
    "process_output",
    "process_start",
    "process_end",
}

`OPENCLAW_FAILURE_TAXONOMY`

A reference dict mapping each OpenClaw failure type to its plain-language definition. These are the six failure modes that Critiqor’s OpenClaw diagnosis engine detects:

Failure Type	Definition
`infinite_tool_loop`	Repeated tool calls or retries without progress.
`memory_degradation`	Stored or retrieved memory is lost, ignored, or fails recall.
`ignoring_tool_outputs`	Tool outputs are available but not incorporated into decisions.
`context_pollution`	Context growth, saturation, or compaction causes useful state loss.
`cost_explosion`	Token or call waste grows without matching progress.
`skill_failure`	Relevant OpenClaw skill is ignored, mis-selected, or fails invocation.
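The taxonomy pairs naturally with the root_cause_failure_type field of primary_diagnosis when producing human-readable reports. In the sketch below, FAILURE_DEFINITIONS is a hand copy of the table above rather than an import of OPENCLAW_FAILURE_TAXONOMY (this reference does not show its import path), and explain() is a hypothetical helper.

```python
# Hand copy of the failure taxonomy table in this reference.
FAILURE_DEFINITIONS = {
    "infinite_tool_loop": "Repeated tool calls or retries without progress.",
    "memory_degradation": "Stored or retrieved memory is lost, ignored, or fails recall.",
    "ignoring_tool_outputs": "Tool outputs are available but not incorporated into decisions.",
    "context_pollution": "Context growth, saturation, or compaction causes useful state loss.",
    "cost_explosion": "Token or call waste grows without matching progress.",
    "skill_failure": "Relevant OpenClaw skill is ignored, mis-selected, or fails invocation.",
}


def explain(primary_diagnosis):
    """Format the root cause of a primary_diagnosis dict with its definition."""
    failure_type = primary_diagnosis.get("root_cause_failure_type", "")
    definition = FAILURE_DEFINITIONS.get(failure_type, "No definition in the taxonomy table.")
    return f"{failure_type}: {definition}"


print(explain({"root_cause_failure_type": "cost_explosion"}))
```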
