OpenClawRuntimeObserver, monitor_openclaw_process(), and diagnose_openclaw_events() are the Python-level counterparts to the Critiqor CLI workflow for OpenClaw agents. They are useful for scripting evaluation into CI pipelines, building custom integrations, or replaying saved event traces through the diagnosis engine without invoking a live agent. All diagnosis is deterministic — no LLM judgment is used.

Import

from critiqor import OpenClawRuntimeObserver, monitor_openclaw_process, diagnose_openclaw_events

monitor_openclaw_process()

monitor_openclaw_process(
    command,
    agent_id="openclaw_agent",
    tenant_id="default",
    visibility="private",
    benchmark_id="openclaw_runtime_v1",
    difficulty_tier="standard",
    cwd=None,
    timeout=None,
) → dict
Runs an OpenClaw command as a subprocess, captures its stdout and stderr as a structured event stream, runs the OpenClaw diagnosis engine over the collected events, and returns a complete run payload dict. JSON-structured lines in the process output are parsed as typed events automatically; plain text lines are recorded as process_output events. If the subprocess exits with a non-zero return code, an error_event is appended before the diagnosis runs. If timeout is exceeded, a "timeout" error event is recorded and the partial event stream is diagnosed.
from critiqor import monitor_openclaw_process

payload = monitor_openclaw_process(
    command=["openclaw", "chat"],
    agent_id="my_agent",
    tenant_id="default",
    visibility="private",
    benchmark_id="openclaw_runtime_v1",
    difficulty_tier="standard",
    timeout=300.0,
)
print(payload["trust_score"])
print(payload["primary_diagnosis"])
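The line-classification rule described above can be sketched in plain Python. This is a minimal illustration, not Critiqor's implementation: the helper name events_from_output, the "text" and "error" field names, the requirement that a JSON line carry an "event" key, and the skipping of blank lines are all assumptions made for the example.

```python
import json

def events_from_output(lines, returncode):
    """Turn raw process output lines into event dicts (illustrative only)."""
    events = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # assumption: blank lines carry no event
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            parsed = None
        if isinstance(parsed, dict) and "event" in parsed:
            # JSON-structured line: keep it as a typed event
            events.append(parsed)
        else:
            # Anything else is recorded as plain process output
            events.append({"event": "process_output", "text": text})
    if returncode != 0:
        # Non-zero exit: append an error event before diagnosis would run
        events.append({"event": "error_event", "error": f"exit code {returncode}"})
    return events

sample = [
    '{"event": "tool_call", "tool": "read_file"}',
    "plain log line",
]
events = events_from_output(sample, returncode=1)
print([e["event"] for e in events])
```

Here the JSON line stays a tool_call event, the plain line becomes process_output, and the non-zero return code adds a trailing error_event.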

Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| command | (required) | List of command parts to execute (e.g. ["openclaw", "run", "--task", "solve"]). |
| agent_id | "openclaw_agent" | Agent identifier written into the run payload. |
| tenant_id | "default" | Tenant identifier written into the run payload. |
| visibility | "private" | Evidence visibility: "private", "anonymous", or "public". |
| benchmark_id | "openclaw_runtime_v1" | Benchmark specification ID. |
| difficulty_tier | "standard" | Difficulty tier affecting score weighting: "easy", "standard", "hard", or "stress". |
| cwd | None | Working directory for the subprocess. Defaults to the current directory. |
| timeout | None | Optional timeout in seconds. If exceeded, a "timeout" error event is recorded and the partial event stream is diagnosed. |
Returns: A dict run payload containing all OpenClawDiagnosis fields plus agent metadata, benchmark spec, the raw event trace, and runtime metrics. Key top-level fields: trust_score, readiness_level, scores, failure_causes, primary_diagnosis, causal_graph, cost_analysis, evidence_summary, trace.
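For CI use, the top-level fields can drive a pass/fail decision. The following is a hedged sketch: ci_exit_code and the min_trust threshold of 70 are hypothetical choices for illustration, and example_payload is a hand-written stand-in that uses only field names listed above, not real Critiqor output.

```python
def ci_exit_code(payload, min_trust=70):
    """Return 0 if the run passes the gate, 1 otherwise (hypothetical policy)."""
    blocked = payload["readiness_level"] == "unsafe_for_production"
    too_low = payload["trust_score"] < min_trust
    return 1 if blocked or too_low else 0

# Hand-written stand-in for a run payload, not real Critiqor output
example_payload = {
    "trust_score": 64,
    "readiness_level": "review_recommended",
    "primary_diagnosis": {"root_cause_failure_type": "infinite_tool_loop"},
}

code = ci_exit_code(example_payload)
print(code)
```

In a pipeline script, the returned value would typically be passed to sys.exit() so that a failing gate fails the job.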

OpenClawRuntimeObserver

Use OpenClawRuntimeObserver to record events manually, when you want to build an event stream programmatically rather than by running a subprocess:
from critiqor import OpenClawRuntimeObserver

observer = OpenClawRuntimeObserver(
    agent_id="my_agent",
    benchmark_id="openclaw_runtime_v1",
    difficulty_tier="standard",
)

# Record events manually
observer.record("tool_call", {"tool": "read_file", "args": {"path": "/tmp/x"}})
observer.record("tool_output", {"tool": "read_file", "output": "contents"})
observer.record("error_event", {"error": "timeout"})

# Build the run payload (runs diagnosis engine internally)
payload = observer.payload(visibility="private")
print(payload["trust_score"])

Constructor Parameters

| Parameter | Default | Description |
| --- | --- | --- |
| agent_id | "openclaw_agent" | Agent identifier. |
| tenant_id | "default" | Tenant identifier. |
| benchmark_id | "openclaw_runtime_v1" | Benchmark specification ID. |
| difficulty_tier | "standard" | Difficulty tier: "easy", "standard", "hard", or "stress". |

observer.record()

observer.record(event_type, payload=None) → dict
Appends a timestamped event to the observer’s internal event list. The event is returned as a dict. If event_type is not one of the recognized OPENCLAW_EVENT_TYPES, it is coerced to "process_output".
| Parameter | Type | Description |
| --- | --- | --- |
| event_type | str | One of the recognized OpenClaw event types (see OPENCLAW_EVENT_TYPES). |
| payload | dict \| None | Additional fields merged into the event dict. |
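The recording behavior described above (timestamp, type coercion, payload merge) can be approximated with a small stand-in class. This is a sketch only: SketchObserver is hypothetical, the "timestamp" field name is an assumption, and so is the choice to let the reserved keys win over same-named payload fields. The "event" key follows the event dict shape this page documents for diagnose_openclaw_events().

```python
import time

# Copied from the OPENCLAW_EVENT_TYPES listing on this page
KNOWN_TYPES = {
    "tool_call", "tool_output", "memory_event", "retry_event",
    "error_event", "state_transition", "decision", "skill_event",
    "token_usage", "context_event", "process_output",
    "process_start", "process_end",
}

class SketchObserver:
    """Hypothetical stand-in that mimics the documented record() behavior."""

    def __init__(self):
        self.events = []

    def record(self, event_type, payload=None):
        if event_type not in KNOWN_TYPES:
            event_type = "process_output"  # unrecognized types are coerced
        # Assumption: reserved keys override same-named payload fields
        event = {**(payload or {}), "event": event_type, "timestamp": time.time()}
        self.events.append(event)
        return event

observer = SketchObserver()
first = observer.record("tool_call", {"tool": "read_file"})
second = observer.record("made_up_type", {"note": "hello"})
print(first["event"], second["event"])
```

The second call shows the coercion: an unrecognized type is stored as process_output while its payload fields are kept.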

observer.payload()

observer.payload(visibility="private") → dict
Runs the diagnosis engine over all recorded events and returns the complete run payload dict, identical in structure to the return value of monitor_openclaw_process(). Wall-clock latency from observer construction to this call is included in runtime_metrics.
| Parameter | Type | Description |
| --- | --- | --- |
| visibility | str | Evidence visibility: "private", "anonymous", or "public". |

diagnose_openclaw_events()

diagnose_openclaw_events(events) → OpenClawDiagnosis
Runs the OpenClaw diagnosis engine directly on a list of event dicts. Use this when you already have a saved event trace and want to run or re-run diagnosis without going through monitor_openclaw_process() or OpenClawRuntimeObserver.
from critiqor import diagnose_openclaw_events

diagnosis = diagnose_openclaw_events(events)
print(diagnosis.trust_score)       # 0-100
print(diagnosis.readiness_level)   # "ready_for_runtime" etc.
print(diagnosis.failure_causes)    # list of OpenClawFailureCause
print(diagnosis.primary_diagnosis) # dict with root cause
| Parameter | Type | Description |
| --- | --- | --- |
| events | Sequence[dict] | List of raw event dicts, each with at least an "event" key. |
Returns: OpenClawDiagnosis
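To replay a saved trace, the events first have to be loaded into a list of dicts. The sketch below assumes the trace was stored as JSON Lines (one event object per line). That storage format and the load_trace helper are assumptions for illustration, not something Critiqor prescribes.

```python
import json
import os
import tempfile

def load_trace(path):
    """Read one JSON object per line and check each has an "event" key."""
    events = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            event = json.loads(line)
            if not isinstance(event, dict) or "event" not in event:
                raise ValueError(f"line {number}: expected an object with an 'event' key")
            events.append(event)
    return events

# Write a tiny demo trace, then read it back
with tempfile.TemporaryDirectory() as tmp:
    trace_path = os.path.join(tmp, "trace.jsonl")
    with open(trace_path, "w", encoding="utf-8") as handle:
        handle.write('{"event": "tool_call", "tool": "read_file"}\n')
        handle.write('{"event": "tool_output", "tool": "read_file", "output": "contents"}\n')
    events = load_trace(trace_path)

print(len(events))
```

The resulting list has the shape that diagnose_openclaw_events(events) is documented to accept.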

OpenClawDiagnosis Fields

OpenClawDiagnosis is a frozen dataclass returned by diagnose_openclaw_events(). Run payloads carry the same fields in dict form, produced via .to_dict().
| Field | Type | Description |
| --- | --- | --- |
| trust_score | int | 0–100 weighted reliability score across all six OpenClaw dimensions. |
| readiness_level | str | "ready_for_runtime", "review_recommended", or "unsafe_for_production". |
| scores | dict[str, int] | Per-dimension scores: loop_control, memory_integrity, tool_output_utilization, context_health, cost_efficiency, skill_adherence. |
| failure_causes | list[OpenClawFailureCause] | All detected OpenClaw failure causes, each with type, severity, evidence, causal chain, impact score, and description. |
| causal_graph | dict | {"nodes": [...], "edges": [...]}: structured causal graph linking runtime events to failures. |
| cost_analysis | dict | total_tokens, token_waste, duplicate_calls, redundancy_score, cost_efficiency. |
| primary_diagnosis | dict | root_cause_failure_type, causal_chain_explanation, severity, description. Describes the highest-impact failure cause. |
| evidence_summary | dict | event_count, tool_calls, tool_outputs, memory_events, retries, errors, state_transitions. |
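As a rough illustration of what evidence_summary contains, the counts could be derived from an event list as below. The mapping from event types to summary keys is inferred from the names alone and is an assumption, as is the summarize helper. The engine's actual counting rules may differ.

```python
from collections import Counter

# Assumed mapping from event type to evidence_summary key (inferred from names)
SUMMARY_KEYS = {
    "tool_call": "tool_calls",
    "tool_output": "tool_outputs",
    "memory_event": "memory_events",
    "retry_event": "retries",
    "error_event": "errors",
    "state_transition": "state_transitions",
}

def summarize(events):
    """Count events per type and report them under the summary key names."""
    counts = Counter(e.get("event") for e in events)
    summary = {"event_count": len(events)}
    for event_type, key in SUMMARY_KEYS.items():
        summary[key] = counts[event_type]  # Counter returns 0 for missing types
    return summary

events = [
    {"event": "tool_call", "tool": "read_file"},
    {"event": "tool_output", "tool": "read_file", "output": "contents"},
    {"event": "error_event", "error": "timeout"},
]
summary = summarize(events)
print(summary)
```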

OPENCLAW_EVENT_TYPES

The set of recognized event type strings for OpenClawRuntimeObserver.record(). Events with unrecognized types are stored as "process_output".
OPENCLAW_EVENT_TYPES = {
    "tool_call",
    "tool_output",
    "memory_event",
    "retry_event",
    "error_event",
    "state_transition",
    "decision",
    "skill_event",
    "token_usage",
    "context_event",
    "process_output",
    "process_start",
    "process_end",
}

OPENCLAW_FAILURE_TAXONOMY

A reference dict mapping each OpenClaw failure type to its plain-language definition. These are the six failure modes that Critiqor’s OpenClaw diagnosis engine detects:
| Failure Type | Definition |
| --- | --- |
| infinite_tool_loop | Repeated tool calls or retries without progress. |
| memory_degradation | Stored or retrieved memory is lost, ignored, or fails recall. |
| ignoring_tool_outputs | Tool outputs are available but not incorporated into decisions. |
| context_pollution | Context growth, saturation, or compaction causes useful state loss. |
| cost_explosion | Token or call waste grows without matching progress. |
| skill_failure | Relevant OpenClaw skill is ignored, mis-selected, or fails invocation. |
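To make the idea of deterministic, rule-based detection concrete, here is a toy check for the infinite_tool_loop pattern: it flags tool calls repeated with identical arguments. This is not Critiqor's detection rule. The repeated_tool_calls helper, the threshold of 3, and the "tool" and "args" field names (borrowed from the observer example on this page) are assumptions for illustration.

```python
import json
from collections import Counter

def repeated_tool_calls(events, threshold=3):
    """Return (tool, args_json) signatures seen at least `threshold` times."""
    signatures = Counter()
    for event in events:
        if event.get("event") == "tool_call":
            # Serialize args with sorted keys so equal dicts compare equal
            args_json = json.dumps(event.get("args", {}), sort_keys=True)
            signatures[(event.get("tool"), args_json)] += 1
    return [sig for sig, count in signatures.items() if count >= threshold]

events = [
    {"event": "tool_call", "tool": "read_file", "args": {"path": "/tmp/x"}},
    {"event": "tool_call", "tool": "read_file", "args": {"path": "/tmp/x"}},
    {"event": "tool_call", "tool": "read_file", "args": {"path": "/tmp/x"}},
    {"event": "tool_call", "tool": "list_dir", "args": {}},
]
flagged = repeated_tool_calls(events)
print(flagged)
```

Because the rule depends only on the event list, running it twice on the same trace gives the same result, which is the property the deterministic diagnosis relies on.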