What is Critiqor?
What is Critiqor?
Critiqor is a runtime reliability intelligence platform for OpenClaw agents. Instead of asking an agent to grade its own output, Critiqor observes what actually happens at runtime: every tool call made, every tool output returned, memory events, provider requests, retries, token usage, and state transitions. This observed evidence is used to detect failure modes, build causal chains, and generate a structured diagnosis — without relying on the agent’s self-reported behaviour.
How does Critiqor observe runtime behaviour?
How does Critiqor observe runtime behaviour?
Critiqor ships a bundled OpenClaw plugin (
critiqor/clawhub/critiqor-openclaw/) that hooks into two OpenClaw collection layers:- Extension API (
api.on(...)): captures agent, turn, and session lifecycle events; provider request/response events; message events; user input; and bash commands. - Tool hooks: captures
tool_call,tool_result,tool_execution_start,tool_execution_update, andtool_execution_endevents, including tool name, call ID, duration, and error status.
session.json file as it occurs. When you run critiqor finalize, Critiqor reads this structured evidence and produces a diagnosis.json with failure analysis, a causal graph, and trust scores. The dashboard then renders diagnosis.json — it does not re-compute any findings.Why is runtime evaluation different from regular evaluation?
Why is runtime evaluation different from regular evaluation?
Regular evaluation typically asks the agent (or a judge model) to assess its own output after the fact. That approach cannot see what actually happened during execution.Runtime evaluation observes the execution directly: which tools were called, how many times, with what arguments, what outputs were returned, whether those outputs were used in the final answer, how many tokens were consumed, and whether any retries or errors occurred. An agent can produce a confident-sounding answer while having looped seven times on the same tool call, ignored the result, or hallucinated a conclusion not supported by any retrieved evidence.Critiqor’s core rule: captured execution data is stronger evidence than post-hoc explanations.
Does Critiqor collect prompts? How is data handled?
Does Critiqor collect prompts? How is data handled?
All data stays on your machine. Critiqor does not upload any evidence to a remote server.The OpenClaw plugin writes runtime events — including provider request/response payloads, which may contain prompt content — to a local
session.json file in your runs/ directory. The critiqor finalize command reads that file locally and writes diagnosis.json to the same directory. The local dashboard reads from both files. Nothing leaves your machine by default.Critiqor also does not read your code, scan filesystem contents, or intercept unrelated processes. It only processes events explicitly emitted by the connected OpenClaw runtime.Where are runs stored?
Where are runs stored?
Runs are stored in the
runs/ directory inside your current working directory by default. You can override this with the --runs-dir flag on any command.Each completed run uses this layout:session.json contains the complete structured event log, tool activity, memory events, runtime metadata, and aggregate metrics. diagnosis.json contains the failure analysis, causal graph, trust score, and cost analysis. Keeping them separate means the original evidence stays auditable while the diagnosis logic can improve without rerunning the agent.How do I revisit previous runs?
How do I revisit previous runs?
Use To open the dashboard for a specific historical run:You can also browse previous runs from within the dashboard UI using the Run History section.
critiqor runs to list all completed runs with summary information:How does the dashboard work?
How does the dashboard work?
The Critiqor dashboard is a local web application (
critiqor-core-engine) started by critiqor finalize (or critiqor dashboard) using bun or npm. It reads diagnosis.json files from your runs/ directory and serves them at http://127.0.0.1:<port>.No cloud account, no internet connection, and no remote upload are required. The dashboard only displays precomputed data from your local diagnosis.json — it does not re-compute trust scores, failure causes, causal graphs, or cost analysis.Dashboard sections include: Overview, Diagnosis, Cost, Evidence (full trace + tool outputs + causal graph), Why It Happened, Benchmarks, and Trust & Privacy.How do I publish a new release?
How do I publish a new release?
- Bump the version number in
pyproject.toml - Add an entry to
CHANGELOG.mddescribing the changes - Build the package:
- Upload to PyPI:
How do I contribute?
How do I contribute?
- Clone the repository:
- Install in editable mode:
- Run the tests to confirm everything works:
- Make your changes on a feature branch and open a pull request on GitHub.
Which frameworks are supported today vs planned?
Which frameworks are supported today vs planned?
Today: OpenClaw only. Critiqor provides a full bundled plugin, CLI integration (
critiqor monitor openclaw), and local dashboard support for OpenClaw agents.Planned: Claude Code, Hermes, and other frameworks including LangChain, CrewAI, and AutoGen are on the roadmap but not yet available.Note: the Python core already includes a CritiqorTracer that can auto-detect and attach to LangGraph, CrewAI, OpenAI Agents SDK, PydanticAI, AutoGen, and Mastra via their event hook interfaces. The CLI-based monitor/finalize/dashboard workflow is OpenClaw-specific today.See the Integrations Roadmap for more detail.