diagnosis.json for a given run. It starts automatically when you run critiqor finalize, and can be reopened at any time with critiqor dashboard <run_id>. Nothing is uploaded to a remote server — all data stays on your machine, and the dashboard reads the pre-computed artifacts Critiqor has already written to runs/<run_id>/.
Dashboard Sections
The dashboard is organized for progressive disclosure. New users see the simplest answer first, while engineers can drill into evidence and causal detail when needed.

| Section | Purpose | What You’ll See |
|---|---|---|
| Overview | Executive-friendly summary | Trust score (0–100), readiness level, primary failure type, recommended action, and latest run status |
| Diagnosis | Root cause explanation | Primary failure mode, causal chain, severity, impact score, and recommended fix |
| Cost | Operational waste | Total tokens, token waste, duplicate tool calls, redundancy score, and cost efficiency |
| Evidence | Technical audit trail | Tool calls (name, arguments, call ID), tool outputs (result, error, duration), memory events, retries, errors, state transitions, and full raw trace |
| Why It Happened | Causal explanation | Pre-computed causal graph, step-by-step causal chain, and root cause walkthrough |
| Benchmarks | Reliability trend | Benchmark score, difficulty tier, percentile, and reliability trend across runs |
| Trust & Privacy | Transparency | Evidence collection model, data access boundaries, visibility controls, and FAQ |
Readiness Levels
Every run receives one of three readiness levels. The level is shown prominently in the Overview section and drives the recommended action.
ready_for_runtime
Trust score of 80 or higher, and no high-severity or critical failures detected.
The agent completed the session without triggering any high-impact failure detectors. Critiqor did not observe tool loops, memory degradation, ignored outputs, context pollution, cost explosion, or skill failures at a severity that warrants review. The agent is cleared for use in production or next-stage evaluation.
review_recommended
Trust score 60–79, or any high-severity failure detected.
The agent functioned but showed one or more patterns that reduce reliability under pressure. Common triggers include a small number of ignored tool outputs, memory recall failures, or context events approaching saturation. The agent should be reviewed and the detected failures addressed before deploying to production. The Diagnosis section shows specific guidance for each detected failure.
unsafe_for_production
Trust score less than 60, or any critical-severity failure detected.
The agent showed serious reliability problems during execution. Examples include a tool call loop with 5 or more repetitions, 30,000 or more tokens consumed in a single session, or a combination of failures that together reduced the trust score below 60. Deployment is not recommended. Review the Diagnosis and Why It Happened sections for a detailed breakdown of what failed and why.
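The three rules above can be read as a simple precedence check. The sketch below is illustrative only and is not Critiqor's actual implementation: the function name `readiness_level`, the severity labels as lowercase strings, and the treatment of fractional scores between 79 and 80 are all assumptions made for the example.

```python
from typing import Iterable


def readiness_level(trust_score: float, severities: Iterable[str]) -> str:
    """Map a trust score and detected failure severities to a readiness level.

    Hypothetical helper: severity labels ("high", "critical", ...) are assumed
    to be plain strings; the thresholds (60 and 80) come from the text above.
    """
    found = {s.lower() for s in severities}

    # The most restrictive rule wins: a critical failure or a score below 60.
    if trust_score < 60 or "critical" in found:
        return "unsafe_for_production"

    # Next: a score below 80, or any high-severity failure.
    if trust_score < 80 or "high" in found:
        return "review_recommended"

    # Score of 80 or higher with no high-severity or critical failures.
    return "ready_for_runtime"
```

Checking the most restrictive level first means a run with a high trust score but a single critical failure is still classified as unsafe_for_production, which matches the "or any critical-severity failure detected" wording.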
Data Source
The dashboard reads exclusively from pre-computed artifacts:
- runs/<run_id>/diagnosis.json: trust score, readiness level, failure causes, causal graph, cost analysis, and failure analysis
- runs/<run_id>/session.json: raw event stream used in the Evidence section
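Because both artifacts are plain JSON files on disk, they can also be inspected outside the dashboard. The following is a minimal sketch, assuming only the runs/<run_id>/ layout described above; the helper name `load_run_artifacts` and the `runs_root` parameter are hypothetical, and no assumptions are made about the key names inside either file.

```python
import json
from pathlib import Path


def load_run_artifacts(run_id: str, runs_root: str = "runs") -> dict:
    """Load the two artifacts for a run from runs/<run_id>/.

    Hypothetical helper: returns a dict keyed by file name, with None
    for any artifact that is missing on disk.
    """
    run_dir = Path(runs_root) / run_id
    artifacts = {}
    for name in ("diagnosis.json", "session.json"):
        path = run_dir / name
        if path.exists():
            # Parse the pre-computed JSON exactly as written.
            artifacts[name] = json.loads(path.read_text(encoding="utf-8"))
        else:
            artifacts[name] = None
    return artifacts
```

Returning None for a missing file, rather than raising, makes it easy to tell apart a run that was never finalized from one whose artifacts are present.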
Navigating Multiple Runs
The Benchmarks section lists completed runs in order and shows how each run’s score compares to prior results. Each entry shows the run_id, trust score, benchmark percentile, and difficulty tier. Use critiqor runs to list all completed evaluations from the CLI, then open any run with critiqor dashboard <run_id> to view its full diagnosis.
For a step-by-step guide to using the dashboard after a session, see the Dashboard Guide.