EvidenceRecorder is the SDK-level instrumentation primitive for agents that do not run on OpenClaw or a supported framework adapter. Wrapping an execution block inside a monitor() context upgrades the evidence quality from response_only to fully_instrumented, which raises evaluation_confidence and unlocks more accurate failure cause detection in the resulting CritiqorResult. Use EvidenceRecorder when you are calling a custom agent, a bare LLM client, or any tool-using pipeline where automatic framework detection does not apply.

Import

from critiqor import monitor, EvidenceRecorder

monitor()

monitor(prompt="") → EvidenceRecorder
Module-level factory function that creates an EvidenceRecorder context manager scoped to a single agent execution. The prompt parameter is optional at construction time and can be supplied later when calling finish().
Parameter  Type  Description
prompt     str   The user prompt for this execution. Optional here; it can also be supplied or overridden in finish().
Returns: EvidenceRecorder — a context manager that begins capturing on __enter__ and closes on __exit__.

Usage

from critiqor import Critiqor, monitor

agent = Critiqor(your_agent)

with monitor("What is 2 + 2?") as recorder:
    recorder.record_tool_call("calculator", {"expression": "2 + 2"})
    result = your_agent.run("What is 2 + 2?")
    recorder.record_tool_output("calculator", "4")
    evidence = recorder.finish(result, "What is 2 + 2?")

critiqor_result = agent.evaluate(
    prompt="What is 2 + 2?",
    response=result,
    tool_calls=evidence.tool_calls,
    tool_outputs=evidence.tool_outputs,
    evidence_level=evidence.evidence_level,
)
The with block automatically records agent_start and agent_finish trace events, captures any unhandled exceptions as error events, and resets the recorder context variable when the block exits. You can also use the recorder without the with block in any synchronous code; in that case, call finish() manually when you are done.

Methods

record_tool_call()

recorder.record_tool_call(tool, args=None, call_id=None)
Records a tool invocation. Appends a ToolCall to the recorder’s internal list and emits a tool_start trace event.
Parameter  Type         Description
tool       str          Name of the tool being called.
args       dict | None  Arguments passed to the tool. Defaults to an empty dict if not provided.
call_id    str | None   Optional identifier used to correlate this call with its output.

record_tool_output()

recorder.record_tool_output(tool, output, call_id=None, error=None)
Records a tool result. Appends a ToolOutput and emits a tool_end trace event. If error is provided, it is also appended to the recorder’s error list.
Parameter  Type        Description
tool       str         Name of the tool that produced the output.
output     Any         The tool’s return value.
call_id    str | None  Correlates with a prior record_tool_call() call.
error      str | None  Error message string if the tool call failed.
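To illustrate why call_id matters, here is a minimal sketch of matching calls to outputs when the same tool is invoked more than once and results arrive out of order. It uses plain dicts rather than the library's ToolCall and ToolOutput types, whose attributes are not documented on this page; pair_by_call_id is a hypothetical helper.

```python
def pair_by_call_id(tool_calls, tool_outputs):
    """Match each call to the output that shares its call_id (None if absent)."""
    outputs = {
        o["call_id"]: o for o in tool_outputs if o.get("call_id") is not None
    }
    return [(call, outputs.get(call.get("call_id"))) for call in tool_calls]


# Two calls to the same tool; the outputs arrive in reverse order.
calls = [
    {"tool": "search", "args": {"q": "a"}, "call_id": "c1"},
    {"tool": "search", "args": {"q": "b"}, "call_id": "c2"},
]
outputs = [
    {"tool": "search", "output": "B", "call_id": "c2"},
    {"tool": "search", "output": "A", "call_id": "c1"},
]
pairs = pair_by_call_id(calls, outputs)
```

Without a call_id, the only way to pair the two search calls with their results would be by position, which breaks as soon as the results arrive out of order.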

record_llm_call()

recorder.record_llm_call(model=None, token_usage=None)
Records an LLM invocation for token counting and cost analysis. Token usage data is merged into the recorder’s token_usage dict and propagated to RuntimeMetrics when finish() is called.
Parameter    Type         Description
model        str | None   Model identifier (e.g. "gpt-4o", "llama3.2").
token_usage  dict | None  Token usage dict, e.g. {"prompt_tokens": 120, "completion_tokens": 80, "total_tokens": 200}.
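The page does not spell out how token usage from several LLM calls is merged. One plausible reading is a per-key sum, sketched below; merge_token_usage is a hypothetical helper, and the summing behavior is an assumption rather than documented library behavior.

```python
def merge_token_usage(total, new):
    """Return a new dict with per-key token counts added together.

    Assumption: merging means summing counts key by key.
    """
    merged = dict(total)
    for key, count in (new or {}).items():
        merged[key] = merged.get(key, 0) + count
    return merged


usage = {}
usage = merge_token_usage(
    usage, {"prompt_tokens": 120, "completion_tokens": 80, "total_tokens": 200}
)
usage = merge_token_usage(
    usage, {"prompt_tokens": 30, "completion_tokens": 10, "total_tokens": 40}
)
```

If the library overwrites instead of summing, only the last call's counts would survive, so it is worth checking the resulting RuntimeMetrics after a run with more than one LLM call.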

record_event()

recorder.record_event(name, **payload)
Records a generic named event with an arbitrary keyword-argument payload. The event is timestamped automatically and appended to the trace. Use this for framework-specific events that don’t fit the tool call or LLM call shapes.
Parameter  Type  Description
name       str   Event name (e.g. "state_transition", "decision_made").
**payload  Any   Arbitrary keyword arguments included in the trace event dict.
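As a rough picture of what a generic event might look like once it is in the trace, the sketch below builds event dicts from a name and keyword arguments. make_event is a hypothetical helper, and the "event" and "ts" keys are assumed names; the actual key names used by the library are not documented here.

```python
import time


def make_event(name, **payload):
    """Build a trace event dict; the "event" and "ts" keys are assumed names."""
    return {"event": name, "ts": time.time(), **payload}


trace = []
trace.append(
    make_event("state_transition", from_state="planning", to_state="acting")
)
trace.append(make_event("decision_made", choice="use_calculator", confidence=0.9))
```

In a layout like this, payload keys share a namespace with the event's own fields, so it is safer to pick payload names that cannot collide with them.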

wrap_tool()

recorder.wrap_tool(name, func) → callable
Returns an instrumented wrapper around a callable tool that automatically records record_tool_call() and record_tool_output() for every invocation. If the underlying function raises an exception, the error is recorded and the exception is re-raised.
Parameter  Type      Description
name       str       The tool name used in recorded evidence.
func       callable  The tool function to wrap.
Returns: A new callable with the same signature as func.
calculator = recorder.wrap_tool("calculator", raw_calculator_fn)
result = calculator("2 + 2")  # automatically recorded
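The wrapping behavior can be sketched with a plain decorator-style helper. This is not the library's implementation: wrap_tool_sketch is a hypothetical function that writes to ordinary lists instead of a recorder, and the dict shapes are assumptions.

```python
import functools


def wrap_tool_sketch(name, func, calls, outputs, errors):
    """Return a wrapper that logs every call to `func` before delegating to it."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        calls.append({"tool": name, "args": {"args": args, "kwargs": kwargs}})
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            outputs.append({"tool": name, "output": None, "error": str(exc)})
            errors.append(str(exc))
            raise  # the caller still sees the original exception
        outputs.append({"tool": name, "output": result, "error": None})
        return result

    return wrapper


def raw_calculator(expression):
    """Toy tool: only knows one expression."""
    if expression != "2 + 2":
        raise ValueError("unsupported expression")
    return "4"


calls, outputs, errors = [], [], []
calculator = wrap_tool_sketch("calculator", raw_calculator, calls, outputs, errors)
answer = calculator("2 + 2")
```

A failing call such as calculator("1 / 0") would append one more call, one output carrying the error string, and one entry in errors, and would then raise ValueError to the caller.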

finish()

recorder.finish(response="", prompt=None) → EvaluationEvidence
Closes the recorder and assembles the collected tool calls, outputs, trace events, and runtime metrics into an EvaluationEvidence object. Wall-clock latency is measured from the time the context manager was entered. The returned evidence always has evidence_level="fully_instrumented".
Parameter  Type        Description
response   str         The agent’s final response string.
prompt     str | None  Overrides the prompt set at construction time.
Returns: EvaluationEvidence

EvaluationEvidence Fields

EvaluationEvidence is a frozen dataclass returned by finish() and also accessible as CritiqorResult.evidence. It holds the complete normalized evidence snapshot used during evaluation.
Field           Type              Description
prompt          str               The input prompt for the evaluated run.
response        str               The agent’s response.
tool_calls      list[ToolCall]    All captured tool calls in order.
tool_outputs    list[ToolOutput]  All captured tool outputs in order.
trace           list[dict]        Full event trace, including agent_start, tool_start, tool_end, LLM calls, and agent_finish events.
metrics         RuntimeMetrics    Wall-clock latency, token usage, retry count, and error strings.
evidence_level  EvidenceLevel     One of "response_only", "trace_available", or "fully_instrumented". Inferred automatically if not overridden.
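Because EvaluationEvidence is described as a frozen dataclass, its fields cannot be reassigned after construction. The sketch below shows what that means in practice using a simplified stand-in; EvidenceSketch is hypothetical and carries only a subset of the fields listed above.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class EvidenceSketch:
    """Simplified stand-in with a subset of the documented fields."""

    prompt: str
    response: str
    tool_calls: tuple = ()
    evidence_level: str = "response_only"


evidence = EvidenceSketch(prompt="What is 2 + 2?", response="4")

try:
    evidence.response = "5"  # frozen: attribute assignment is rejected
    mutated = True
except FrozenInstanceError:
    mutated = False

# dataclasses.replace builds a new instance instead of editing in place.
updated = replace(evidence, evidence_level="fully_instrumented")
```

Note that freezing only blocks attribute reassignment. A list stored in a frozen dataclass can still be mutated in place, so treat tool_calls, tool_outputs, and trace as read-only by convention.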