OpenClawRuntimeObserver, monitor_openclaw_process(), and diagnose_openclaw_events() are the Python counterparts to the Critiqor CLI workflow for OpenClaw agents. Use them to script evaluation into CI pipelines, build custom integrations, or replay saved event traces through the diagnosis engine without invoking a live agent. All diagnosis is deterministic; no LLM judgment is used.
Import
monitor_openclaw_process()
Runs command as a subprocess, records its output as process_output events, and diagnoses the resulting event stream.
If the subprocess exits with a non-zero return code, an error_event is appended before the diagnosis runs. If timeout is exceeded, a "timeout" error event is recorded and the partial event stream is diagnosed.
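The subprocess handling described above can be sketched with the standard library's subprocess module. This is a hypothetical reimplementation for illustration, not Critiqor's code: capture_process_events() is an invented name, stderr is merged into stdout, and the "error" event type and the "text", "reason", and "returncode" keys are assumptions about the event schema.

```python
import subprocess


def _as_text(data):
    # subprocess.TimeoutExpired.output is bytes (or None) even when text=True was used.
    if data is None:
        return ""
    if isinstance(data, bytes):
        return data.decode(errors="replace")
    return data


def capture_process_events(command, cwd=None, timeout=None):
    """Hypothetical sketch: run `command` and turn its output into event dicts."""
    events = []
    try:
        result = subprocess.run(
            command,
            cwd=cwd,
            timeout=timeout,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the same stream
            text=True,
        )
    except subprocess.TimeoutExpired as exc:
        # Keep whatever output arrived before the deadline, then flag the timeout.
        for line in _as_text(exc.output).splitlines():
            events.append({"event": "process_output", "text": line})
        events.append({"event": "error", "reason": "timeout"})
        return events

    for line in result.stdout.splitlines():
        events.append({"event": "process_output", "text": line})
    if result.returncode != 0:
        # Non-zero exit: append an error event so the diagnosis can see it.
        events.append(
            {"event": "error", "reason": "nonzero_exit", "returncode": result.returncode}
        )
    return events
```

For example, capture_process_events([sys.executable, "-c", "print('ok')"]) returns [{"event": "process_output", "text": "ok"}]. Note that on timeout the partial output is decoded by hand, because TimeoutExpired always carries bytes.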
Parameters
| Parameter | Default | Description |
|---|---|---|
command | (required) | List of command parts to execute (e.g. ["openclaw", "run", "--task", "solve"]). |
agent_id | "openclaw_agent" | Agent identifier written into the run payload. |
tenant_id | "default" | Tenant identifier written into the run payload. |
visibility | "private" | Evidence visibility: "private", "anonymous", or "public". |
benchmark_id | "openclaw_runtime_v1" | Benchmark specification ID. |
difficulty_tier | "standard" | Difficulty tier affecting score weighting: "easy", "standard", "hard", or "stress". |
cwd | None | Working directory for the subprocess. Defaults to the current directory. |
timeout | None | Optional timeout in seconds. If exceeded, a "timeout" error event is recorded and the partial event stream is diagnosed. |
Returns a dict run payload containing all OpenClawDiagnosis fields plus agent metadata, the benchmark spec, the raw event trace, and runtime metrics. Key top-level fields: trust_score, readiness_level, scores, failure_causes, primary_diagnosis, causal_graph, cost_analysis, evidence_summary, trace.
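A common use of the returned payload is as a CI gate. The sketch below reads only documented top-level keys (trust_score, readiness_level, primary_diagnosis); ci_exit_code(), MIN_TRUST_SCORE, and the threshold of 70 are hypothetical choices, not Critiqor defaults.

```python
import sys

# Hypothetical CI policy: these values are examples, not Critiqor defaults.
MIN_TRUST_SCORE = 70
BLOCKING_LEVELS = {"unsafe_for_production"}


def ci_exit_code(run_payload, min_trust=MIN_TRUST_SCORE):
    """Return 0 if the run passes the gate, 1 otherwise."""
    trust = run_payload["trust_score"]
    level = run_payload["readiness_level"]
    if level in BLOCKING_LEVELS or trust < min_trust:
        primary = run_payload.get("primary_diagnosis") or {}
        root = primary.get("root_cause_failure_type", "n/a")
        print(
            f"FAIL: trust_score={trust}, readiness_level={level}, root cause={root}",
            file=sys.stderr,
        )
        return 1
    print(f"PASS: trust_score={trust}, readiness_level={level}")
    return 0
```

In a pipeline script you would pass it the dict returned by monitor_openclaw_process() and finish with sys.exit(ci_exit_code(payload)).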
OpenClawRuntimeObserver
Use OpenClawRuntimeObserver for manual event recording, when you want to build an event stream programmatically rather than by running a subprocess.
Constructor Parameters
| Parameter | Default | Description |
|---|---|---|
agent_id | "openclaw_agent" | Agent identifier. |
tenant_id | "default" | Tenant identifier. |
benchmark_id | "openclaw_runtime_v1" | Benchmark specification ID. |
difficulty_tier | "standard" | Difficulty tier: "easy", "standard", "hard", or "stress". |
observer.record()
Records a single event. If event_type is not one of the recognized OPENCLAW_EVENT_TYPES, it is coerced to "process_output".
| Parameter | Type | Description |
|---|---|---|
event_type | str | One of the recognized OpenClaw event types (see OPENCLAW_EVENT_TYPES). |
payload | dict \| None | Additional fields merged into the event dict. |
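The coercion rule for record() amounts to a membership check followed by a dict merge. In this sketch, build_event() and KNOWN_EVENT_TYPES are hypothetical stand-ins: the real names live in OPENCLAW_EVENT_TYPES, whose contents are not listed on this page, so the members below are guesses.

```python
# Hypothetical stand-in for OPENCLAW_EVENT_TYPES; the real set is exported by Critiqor.
KNOWN_EVENT_TYPES = frozenset(
    {"tool_call", "tool_output", "memory_write", "retry", "error", "process_output"}
)


def build_event(event_type, payload=None, known_types=KNOWN_EVENT_TYPES):
    """Build one event dict following the documented record() rules."""
    if event_type not in known_types:
        event_type = "process_output"  # unrecognized types are coerced
    event = dict(payload or {})  # extra fields are merged into the event dict
    event["event"] = event_type  # set last so payload cannot override the type
    return event
```

The sketch copies payload first and sets "event" last, so a stray "event" key in payload cannot override the type; whether record() resolves that conflict the same way is not documented.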
observer.payload()
Diagnoses the recorded events and returns a run payload in the same format as monitor_openclaw_process(). Wall-clock latency from observer construction to this call is included in runtime_metrics.
| Parameter | Type | Description |
|---|---|---|
visibility | str | "private", "anonymous", or "public". |
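For unit tests of code that drives an observer, a lightweight fake can stand in for OpenClawRuntimeObserver. FakeOpenClawObserver is hypothetical: it reproduces only the documented timing window (clock starts at construction, stops at payload()), performs no diagnosis and no event-type coercion, and its "latency_seconds" and "visibility" keys are assumptions about the payload layout.

```python
import time


class FakeOpenClawObserver:
    """Hypothetical test double mirroring only the observer's timing window."""

    def __init__(self):
        self._started = time.monotonic()  # clock starts at construction
        self._events = []

    def record(self, event_type, payload=None):
        event = dict(payload or {})
        event["event"] = event_type
        self._events.append(event)

    def payload(self, visibility="private"):
        elapsed = time.monotonic() - self._started  # clock stops at payload()
        return {
            "visibility": visibility,  # assumed key name
            "trace": list(self._events),
            "runtime_metrics": {"latency_seconds": elapsed},  # assumed key name
        }
```

time.monotonic() is used rather than time.time() so the measured interval cannot go backwards if the system clock is adjusted mid-run.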
diagnose_openclaw_events()
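Runs the diagnosis engine directly on a sequence of raw event dicts, such as a trace saved from monitor_openclaw_process() or OpenClawRuntimeObserver, and returns an OpenClawDiagnosis.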
| Parameter | Type | Description |
|---|---|---|
events | Sequence[dict] | List of raw event dicts, each with at least an "event" key. |
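Replaying a saved trace only requires turning it back into a list of dicts. The sketch assumes the trace was saved as JSON Lines (one event object per line); load_event_trace() is a hypothetical helper that also enforces the documented requirement that every event carries an "event" key.

```python
import json
from pathlib import Path


def load_event_trace(path):
    """Read a JSON Lines trace (one event object per line) for replay."""
    events = []
    with Path(path).open(encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            event = json.loads(line)
            if not isinstance(event, dict) or "event" not in event:
                raise ValueError(f"line {lineno}: expected an object with an 'event' key")
            events.append(event)
    return events
```

The result can be passed straight to the engine, e.g. diagnosis = diagnose_openclaw_events(load_event_trace("trace.jsonl")), where "trace.jsonl" is a placeholder path.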
OpenClawDiagnosis
OpenClawDiagnosis Fields
OpenClawDiagnosis is a frozen dataclass returned by diagnose_openclaw_events(). Its .to_dict() output is what run payloads contain.
| Field | Type | Description |
|---|---|---|
trust_score | int | 0–100 weighted reliability score across all six OpenClaw dimensions. |
readiness_level | str | "ready_for_runtime", "review_recommended", or "unsafe_for_production". |
scores | dict[str, int] | Per-dimension scores: loop_control, memory_integrity, tool_output_utilization, context_health, cost_efficiency, skill_adherence. |
failure_causes | list[OpenClawFailureCause] | All detected OpenClaw failure causes, each with type, severity, evidence, causal chain, impact score, and description. |
causal_graph | dict | {"nodes": [...], "edges": [...]} — structured causal graph linking runtime events to failures. |
cost_analysis | dict | total_tokens, token_waste, duplicate_calls, redundancy_score, cost_efficiency. |
primary_diagnosis | dict | root_cause_failure_type, causal_chain_explanation, severity, description — the highest-impact failure cause. |
evidence_summary | dict | event_count, tool_calls, tool_outputs, memory_events, retries, errors, state_transitions. |
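The per-dimension scores are the quickest way to see where a run lost points. weakest_dimensions() is a hypothetical helper that works on the scores dict from either an OpenClawDiagnosis instance or a run payload; the default threshold of 60 is an arbitrary example.

```python
def weakest_dimensions(scores, threshold=60):
    """Return (dimension, score) pairs below `threshold`, lowest score first.

    Ties are broken alphabetically so the output order is deterministic.
    """
    below = [(name, value) for name, value in scores.items() if value < threshold]
    return sorted(below, key=lambda item: (item[1], item[0]))
```

Sorting on (score, name) keeps reports stable across runs, which matters when the output is diffed in CI logs.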
OPENCLAW_EVENT_TYPES
The set of recognized event type strings for OpenClawRuntimeObserver.record(). Events with unrecognized types are stored as "process_output".
OPENCLAW_FAILURE_TAXONOMY
A reference dict mapping each OpenClaw failure type to its plain-language definition. These are the six failure modes that Critiqor’s OpenClaw diagnosis engine detects:
| Failure Type | Definition |
|---|---|
infinite_tool_loop | Repeated tool calls or retries without progress. |
memory_degradation | Stored or retrieved memory is lost, ignored, or fails recall. |
ignoring_tool_outputs | Tool outputs are available but not incorporated into decisions. |
context_pollution | Context growth, saturation, or compaction causes useful state loss. |
cost_explosion | Token or call waste grows without matching progress. |
skill_failure | Relevant OpenClaw skill is ignored, mis-selected, or fails invocation. |
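Because OPENCLAW_FAILURE_TAXONOMY maps failure types to definitions, it pairs naturally with primary_diagnosis when writing reports. The sketch uses a local copy of the table above so it runs on its own; in real code you would use the exported OPENCLAW_FAILURE_TAXONOMY. explain_root_cause() is a hypothetical helper.

```python
# Local copy of the taxonomy table, for illustration only.
FAILURE_TAXONOMY = {
    "infinite_tool_loop": "Repeated tool calls or retries without progress.",
    "memory_degradation": "Stored or retrieved memory is lost, ignored, or fails recall.",
    "ignoring_tool_outputs": "Tool outputs are available but not incorporated into decisions.",
    "context_pollution": "Context growth, saturation, or compaction causes useful state loss.",
    "cost_explosion": "Token or call waste grows without matching progress.",
    "skill_failure": "Relevant OpenClaw skill is ignored, mis-selected, or fails invocation.",
}


def explain_root_cause(primary_diagnosis, taxonomy=FAILURE_TAXONOMY):
    """Turn a primary_diagnosis dict into a one-line, human-readable summary."""
    failure_type = primary_diagnosis.get("root_cause_failure_type")
    if not failure_type:
        return "No root-cause failure reported."
    definition = taxonomy.get(failure_type, "Unknown failure type.")
    severity = primary_diagnosis.get("severity", "unknown")
    return f"[{severity}] {failure_type}: {definition}"
```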