The Critiqor dashboard is a local-only web interface that renders the contents of diagnosis.json for a given run. It starts automatically when you run critiqor finalize, and can be reopened at any time with critiqor dashboard <run_id>. Nothing is uploaded to a remote server — all data stays on your machine, and the dashboard reads the pre-computed artifacts Critiqor has already written to runs/<run_id>/.

Dashboard Sections

The dashboard is organized for progressive disclosure. New users see the simplest answer first, while engineers can drill into evidence and causal detail when needed.
| Section | Purpose | What You’ll See |
| --- | --- | --- |
| Overview | Executive-friendly summary | Trust score (0–100), readiness level, primary failure type, recommended action, and latest run status |
| Diagnosis | Root cause explanation | Primary failure mode, causal chain, severity, impact score, and recommended fix |
| Cost | Operational waste | Total tokens, token waste, duplicate tool calls, redundancy score, and cost efficiency |
| Evidence | Technical audit trail | Tool calls (name, arguments, call ID), tool outputs (result, error, duration), memory events, retries, errors, state transitions, and full raw trace |
| Why It Happened | Causal explanation | Pre-computed causal graph, step-by-step causal chain, and root cause walkthrough |
| Benchmarks | Reliability trend | Benchmark score, difficulty tier, percentile, and reliability trend across runs |
| Trust & Privacy | Transparency | Evidence collection model, data access boundaries, visibility controls, and FAQ |

Readiness Levels

Every run receives one of three readiness levels. The level is shown prominently in the Overview section and drives the recommended action.

ready_for_runtime

Trust score >=80, with no high-severity or critical failures. The agent completed the session without triggering any high-impact failure detectors. Critiqor did not observe tool loops, memory degradation, ignored outputs, context pollution, cost explosion, or skill failures at a severity that warrants review. The agent is cleared for use in production or next-stage evaluation.

Intermediate level: trust score 60–79, or any high-severity failure detected. The agent functioned but showed one or more patterns that reduce reliability under pressure. Common triggers include a small number of ignored tool outputs, memory recall failures, or context events approaching saturation. Review the agent and address the detected failures before deploying to production. The Diagnosis section shows specific guidance for each detected failure.

unsafe_for_production

Trust score less than 60, or any critical-severity failure detected. The agent showed serious reliability problems during execution. This includes scenarios like a tool call loop with >=5 repetitions, >=30,000 tokens consumed in a single session, or a combination of failures that together reduced the trust score below 60. Deployment is not recommended. Review the Diagnosis and Why It Happened sections for a detailed breakdown of what failed and why.
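
The thresholds above can be read as a small decision rule. The following is a minimal sketch of that rule, not Critiqor's actual implementation: the function name, the severity strings, the order in which the checks are applied, and the placeholder used for the intermediate level (whose identifier is not given in this section) are all assumptions made for illustration.

```python
from typing import Iterable

# Placeholder only: this section does not state the intermediate level's identifier.
MIDDLE_LEVEL = "<intermediate readiness level>"


def readiness_level(trust_score: int, severities: Iterable[str]) -> str:
    """Map a trust score and the detected failure severities to a readiness level.

    Assumes severities are plain strings such as "low", "high", or "critical".
    """
    found = set(severities)

    # Below 60, or any critical-severity failure: not recommended for deployment.
    if trust_score < 60 or "critical" in found:
        return "unsafe_for_production"

    # 80 or above with no high-severity (or critical) failure: cleared.
    if trust_score >= 80 and "high" not in found:
        return "ready_for_runtime"

    # Everything else: score 60-79, or a high-severity failure was detected.
    return MIDDLE_LEVEL
```

Under these assumptions, a run with a trust score of 85 and one high-severity failure lands in the intermediate level rather than ready_for_runtime, because a high-severity failure alone is enough to require review.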

Data Source

The dashboard reads exclusively from pre-computed artifacts:
  • runs/<run_id>/diagnosis.json — trust score, readiness level, failure causes, causal graph, cost analysis, and failure analysis
  • runs/<run_id>/session.json — raw event stream used in the Evidence section
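
Because these artifacts are ordinary JSON files on disk, they can also be inspected outside the dashboard. The sketch below loads diagnosis.json for a run and prints a short summary. The key names trust_score and readiness_level are assumptions about the file's schema (the source lists what the file contains, not its exact keys), and the helper names are hypothetical.

```python
import json
from pathlib import Path


def load_diagnosis(run_id: str, runs_dir: str = "runs") -> dict:
    """Read runs/<run_id>/diagnosis.json and return its parsed contents."""
    path = Path(runs_dir) / run_id / "diagnosis.json"
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def summarize(diagnosis: dict) -> str:
    """Build a one-line summary; the key names are assumed, not confirmed."""
    score = diagnosis.get("trust_score", "n/a")
    level = diagnosis.get("readiness_level", "n/a")
    return f"trust_score={score} readiness_level={level}"
```

Calling summarize(load_diagnosis(run_id)) with a real run ID prints whatever values were written at finalization time. Missing keys fall back to "n/a" instead of raising, which is useful while the actual schema is unverified.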
The dashboard does not compute trust scores, re-run detectors, or use an LLM. Everything you see was determined by the diagnosis engine at finalization time and written to disk. This means the dashboard is fast, fully offline, and reproducible: the same run will show the same results every time.

The Benchmarks section lists completed runs in order and shows how each run’s score compares to prior results. Each entry shows the run_id, trust score, benchmark percentile, and difficulty tier.

Use critiqor runs to list all completed evaluations from the CLI, then open any run with critiqor dashboard <run_id> to view its full diagnosis. For a step-by-step guide to using the dashboard after a session, see the Dashboard Guide.