Critiqor FAQ: Common Questions About Runtime Reliability

What is Critiqor?

Critiqor is a runtime reliability intelligence platform for OpenClaw agents. Instead of asking an agent to grade its own output, Critiqor observes what actually happens at runtime: every tool call made, every tool output returned, memory events, provider requests, retries, token usage, and state transitions. This observed evidence is used to detect failure modes, build causal chains, and generate a structured diagnosis — without relying on the agent’s self-reported behaviour.

How does Critiqor observe runtime behaviour?

Critiqor ships a bundled OpenClaw plugin (critiqor/clawhub/critiqor-openclaw/) that hooks into two OpenClaw collection layers:

Extension API (api.on(...)): captures agent, turn, and session lifecycle events; provider request/response events; message events; user input; and bash commands.
Tool hooks: captures tool_call, tool_result, tool_execution_start, tool_execution_update, and tool_execution_end events, including tool name, call ID, duration, and error status.

The plugin writes each normalised event to a local session.json file as it occurs. When you run critiqor finalize, Critiqor reads this structured evidence and produces a diagnosis.json with failure analysis, a causal graph, and trust scores. The dashboard then renders diagnosis.json — it does not re-compute any findings.
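As a rough illustration of what the tool hook events make possible, the sketch below pairs tool_execution_start and tool_execution_end events by call ID to derive a per-call duration and error flag. The field names used here (type, call_id, tool, ts, is_error) and the function name are assumptions made for the example. They are not Critiqor's actual session.json schema, which this FAQ does not specify.

```python
from typing import Any


def summarize_tool_calls(events: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Pair start/end events by call ID (hypothetical field names throughout)."""
    started: dict[str, dict[str, Any]] = {}
    summary: dict[str, dict[str, Any]] = {}

    for event in events:
        call_id = event.get("call_id")
        if call_id is None:
            continue  # not a tool event in this assumed shape
        kind = event.get("type")
        if kind == "tool_execution_start":
            started[call_id] = event
        elif kind == "tool_execution_end" and call_id in started:
            start = started.pop(call_id)
            summary[call_id] = {
                "tool": start.get("tool"),
                # Assumes "ts" is a numeric timestamp in milliseconds.
                "duration_ms": event["ts"] - start["ts"],
                "is_error": bool(event.get("is_error", False)),
                "finished": True,
            }

    # Calls that started but never reported an end event.
    for call_id, start in started.items():
        summary[call_id] = {
            "tool": start.get("tool"),
            "duration_ms": None,
            "is_error": None,
            "finished": False,
        }
    return summary
```

Keying on the call ID rather than on event order means interleaved or concurrent tool calls are still matched correctly, and calls with no end event stay visible instead of being silently dropped.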

Why is runtime evaluation different from regular evaluation?

Regular evaluation typically asks the agent (or a judge model) to assess the agent's output after the fact. That approach cannot see what actually happened during execution.

Runtime evaluation observes the execution directly: which tools were called, how many times, with what arguments, what outputs were returned, whether those outputs were used in the final answer, how many tokens were consumed, and whether any retries or errors occurred. An agent can produce a confident-sounding answer while having looped seven times on the same tool call, ignored the result, or hallucinated a conclusion not supported by any retrieved evidence.

Critiqor's core rule: captured execution data is stronger evidence than post-hoc explanations.
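To make the looping example concrete, here is a minimal sketch of how repeated tool calls could be spotted from captured events alone, without consulting the agent's final answer. It is not Critiqor's detection logic. The event shape (type, tool, args), the function name, and the default threshold of 3 are all assumptions chosen for illustration.

```python
import json
from collections import Counter
from typing import Any


def find_repeated_calls(
    events: list[dict[str, Any]], threshold: int = 3
) -> list[tuple[str, int]]:
    """Return (tool, count) for identical tool calls seen at least `threshold` times."""
    counts: Counter = Counter()
    for event in events:
        if event.get("type") != "tool_call":
            continue
        # Serialise arguments with sorted keys so that the same call with a
        # different key order still counts as identical.
        args_key = json.dumps(event.get("args", {}), sort_keys=True)
        counts[(event.get("tool"), args_key)] += 1
    return [(tool, n) for (tool, _args), n in counts.items() if n >= threshold]
```

A check like this works only because the calls and their arguments were recorded as they happened. A post-hoc explanation from the agent would not reveal the repetition.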

Does Critiqor collect prompts? How is data handled?

All data stays on your machine. Critiqor does not upload any evidence to a remote server.

The OpenClaw plugin writes runtime events (including provider request/response payloads, which may contain prompt content) to a local session.json file in your runs/ directory. The critiqor finalize command reads that file locally and writes diagnosis.json to the same directory. The local dashboard reads from both files. Nothing leaves your machine by default.

Critiqor also does not read your code, scan filesystem contents, or intercept unrelated processes. It only processes events explicitly emitted by the connected OpenClaw runtime.

Where are runs stored?

Runs are stored in the runs/ directory inside your current working directory by default. You can override this with the --runs-dir flag on any command.

Each completed run uses this layout:

runs/
└── <run_id>/
    ├── session.json    ← raw runtime evidence (written by the plugin)
    └── diagnosis.json  ← derived findings (written by critiqor finalize)

session.json contains the complete structured event log, tool activity, memory events, runtime metadata, and aggregate metrics. diagnosis.json contains the failure analysis, causal graph, trust score, and cost analysis. Keeping them separate means the original evidence stays auditable while the diagnosis logic can improve without rerunning the agent.

How do I revisit previous runs?

Use critiqor runs to list all completed runs with summary information:

critiqor runs

To open the dashboard for a specific historical run:

critiqor dashboard <run_id>

You can also browse previous runs from within the dashboard UI using the Run History section.

How does the dashboard work?

The Critiqor dashboard is a local web application (critiqor-core-engine) started by critiqor finalize (or critiqor dashboard) using bun or npm. It reads diagnosis.json files from your runs/ directory and serves them at http://127.0.0.1:<port>.

No cloud account, no internet connection, and no remote upload are required. The dashboard only displays precomputed data from your local diagnosis.json; it does not re-compute trust scores, failure causes, causal graphs, or cost analysis.

Dashboard sections include: Overview, Diagnosis, Cost, Evidence (full trace + tool outputs + causal graph), Why It Happened, Benchmarks, and Trust & Privacy.

How do I publish a new release?

Bump the version number in pyproject.toml
Add an entry to CHANGELOG.md describing the changes
Build the package:
```
pip install build
python -m build
```
Upload to PyPI:
```
pip install twine
twine upload dist/*
```

See the Contributing guide for the full development workflow.

How do I contribute?

Clone the repository:

git clone https://github.com/web3curtis/Critiqor.git
cd Critiqor

Install in editable mode:
```
pip install -e .
```
Run the tests to confirm everything works:
```
python -m pytest tests/
```
Make your changes on a feature branch and open a pull request on GitHub.

See the Contributing guide for full details on the workflow, CI matrix, commit conventions, and contribution guidelines.

Which frameworks are supported today vs planned?

Today: OpenClaw only. Critiqor provides a full bundled plugin, CLI integration (critiqor monitor openclaw), and local dashboard support for OpenClaw agents.

Planned: Claude Code, Hermes, and other frameworks including LangChain, CrewAI, and AutoGen are on the roadmap but not yet available.

Note: the Python core already includes a CritiqorTracer that can auto-detect and attach to LangGraph, CrewAI, OpenAI Agents SDK, PydanticAI, AutoGen, and Mastra via their event hook interfaces. The CLI-based monitor/finalize/dashboard workflow is OpenClaw-specific today.

See the Integrations Roadmap for more detail.