EvidenceRecorder is the SDK-level instrumentation primitive for agents that do not run on OpenClaw or a supported framework adapter. Wrapping an execution block inside a monitor() context upgrades the evidence quality from response_only to fully_instrumented, which raises evaluation_confidence and unlocks more accurate failure cause detection in the resulting CritiqorResult.
Use EvidenceRecorder when you are calling a custom agent, a bare LLM client, or any tool-using pipeline where automatic framework detection does not apply.
Import
monitor()
Creates an EvidenceRecorder context manager scoped to a single agent execution. The prompt parameter is optional at construction time and can be supplied later when calling finish().
| Parameter | Type | Description |
|---|---|---|
| prompt | str | The user prompt for this execution. Optional here; if omitted, supply it when calling finish(). |
Returns: EvidenceRecorder, a context manager that begins capturing on `__enter__` and closes on `__exit__`.
Usage
The with block automatically records agent_start and agent_finish trace events, captures any unhandled exceptions as error events, and resets the recorder context variable when the block exits. You can also call methods on the recorder directly from any synchronous code without the with block; in that case, call finish() manually when done.
Methods
record_tool_call()
Appends a ToolCall to the recorder’s internal list and emits a tool_start trace event.
| Parameter | Type | Description |
|---|---|---|
| tool | str | Name of the tool being called. |
| args | dict \| None | Arguments passed to the tool. Defaults to an empty dict if not provided. |
| call_id | str \| None | Optional identifier used to correlate this call with its output. |
record_tool_output()
Appends a ToolOutput and emits a tool_end trace event. If error is provided, it is also appended to the recorder’s error list.
| Parameter | Type | Description |
|---|---|---|
| tool | str | Name of the tool that produced the output. |
| output | Any | The tool’s return value. |
| call_id | str \| None | Correlates with a prior record_tool_call() call. |
| error | str \| None | Error message string if the tool call failed. |
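
To show why call_id is useful, here is a minimal sketch that pairs calls with outputs when two invocations of the same tool finish out of order. The helper pair_by_call_id and the plain dicts standing in for ToolCall and ToolOutput are hypothetical; the SDK's own correlation logic is not documented here.

```python
def pair_by_call_id(calls: list, outputs: list) -> list:
    """Pair each call with the output that shares its call_id (None if absent)."""
    outputs_by_id = {
        out["call_id"]: out for out in outputs if out.get("call_id") is not None
    }
    return [(call, outputs_by_id.get(call.get("call_id"))) for call in calls]


# Two calls to the same tool; the second one finishes first.
calls = [
    {"tool": "search", "args": {"q": "alpha"}, "call_id": "c1"},
    {"tool": "search", "args": {"q": "beta"}, "call_id": "c2"},
]
outputs = [
    {"tool": "search", "output": "beta results", "call_id": "c2"},
    {"tool": "search", "output": "alpha results", "call_id": "c1"},
]

pairs = pair_by_call_id(calls, outputs)
```

Without a call_id, the two search outputs could only be matched by position, which would be wrong in this case.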
record_llm_call()
Records an LLM call. Token counts are read from the token_usage dict and propagated to RuntimeMetrics when finish() is called.
| Parameter | Type | Description |
|---|---|---|
| model | str \| None | Model identifier (e.g. "gpt-4o", "llama3.2"). |
| token_usage | dict \| None | Token usage dict, e.g. {"prompt_tokens": 120, "completion_tokens": 80, "total_tokens": 200}. |
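
If an execution makes several LLM calls, the per-call usage dicts have to be combined somehow before they reach the metrics. The text above does not say how; the sketch below assumes a simple per-key sum and uses a hypothetical helper name, total_token_usage.

```python
from collections import Counter


def total_token_usage(usages: list) -> dict:
    """Sum token counts key by key, skipping calls that reported no usage."""
    total: Counter = Counter()
    for usage in usages:
        if usage:
            total.update(usage)  # Counter.update adds counts for mappings
    return dict(total)


per_call_usage = [
    {"prompt_tokens": 120, "completion_tokens": 80, "total_tokens": 200},
    None,  # a call recorded without token_usage
    {"prompt_tokens": 30, "completion_tokens": 20, "total_tokens": 50},
]

combined = total_token_usage(per_call_usage)
```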
record_event()
| Parameter | Type | Description |
|---|---|---|
| name | str | Event name (e.g. "state_transition", "decision_made"). |
| `**payload` | Any | Arbitrary keyword arguments included in the trace event dict. |
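
As a rough picture of how keyword arguments can end up in a trace event dict, consider the sketch below. The function build_trace_event and the flat layout with an "event" key are assumptions for illustration; the SDK may nest the payload or use different key names.

```python
from typing import Any


def build_trace_event(name: str, **payload: Any) -> dict:
    """Merge the event name and keyword payload into one flat dict."""
    return {"event": name, **payload}


event = build_trace_event(
    "state_transition", from_state="planning", to_state="acting"
)
```

With a flat layout like this one, a payload key that collides with a reserved key (here "event") would overwrite it, so distinct payload key names are the safer choice.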
wrap_tool()
Wraps func so that record_tool_call() and record_tool_output() are called for every invocation. If the underlying function raises an exception, the error is recorded and the exception is re-raised.
| Parameter | Type | Description |
|---|---|---|
| name | str | The tool name used in recorded evidence. |
| func | callable | The tool function to wrap. |
Returns: a wrapped version of func.
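The wrapping behavior can be approximated in a few lines. This is a sketch under stated assumptions, not the SDK implementation: ListRecorder and wrap_tool_sketch are hypothetical names, the wrapper accepts keyword arguments only for simplicity, and a failed call is recorded with output=None.

```python
import functools
from typing import Any, Callable, Optional


class ListRecorder:
    """Hypothetical stand-in exposing the two recording methods documented above."""

    def __init__(self) -> None:
        self.calls: list = []
        self.outputs: list = []

    def record_tool_call(self, tool: str, args: Optional[dict] = None,
                         call_id: Optional[str] = None) -> None:
        self.calls.append({"tool": tool, "args": args or {}, "call_id": call_id})

    def record_tool_output(self, tool: str, output: Any,
                           call_id: Optional[str] = None,
                           error: Optional[str] = None) -> None:
        self.outputs.append(
            {"tool": tool, "output": output, "call_id": call_id, "error": error}
        )


def wrap_tool_sketch(recorder: ListRecorder, name: str,
                     func: Callable[..., Any]) -> Callable[..., Any]:
    """Return a wrapper that records the call, then the output or the error."""

    @functools.wraps(func)
    def wrapper(**kwargs: Any) -> Any:
        recorder.record_tool_call(name, args=kwargs)
        try:
            result = func(**kwargs)
        except Exception as exc:
            # Record the failure, then let the caller see the original exception.
            recorder.record_tool_output(name, output=None, error=str(exc))
            raise
        recorder.record_tool_output(name, output=result)
        return result

    return wrapper


def add(a: int, b: int) -> int:
    return a + b


def always_fails() -> None:
    raise ValueError("boom")


rec = ListRecorder()
total = wrap_tool_sketch(rec, "add", add)(a=2, b=3)

try:
    wrap_tool_sketch(rec, "always_fails", always_fails)()
    reraised = False
except ValueError:
    reraised = True
```

Both invocations leave one call entry and one output entry in the recorder, and the failing tool's exception still reaches the caller.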
finish()
Finalizes the recording and returns an EvaluationEvidence object. Wall-clock latency is measured from the time the context manager was entered. The returned evidence always has evidence_level="fully_instrumented".
| Parameter | Type | Description |
|---|---|---|
| response | str | The agent’s final response string. |
| prompt | str \| None | Overrides the prompt set at construction time. |
Returns: EvaluationEvidence
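The prompt override and the latency measurement can be illustrated with a stand-in. TimedRecorderSketch, the returned dict, and the latency_s key are hypothetical, and raising ValueError when no prompt is available is an assumption; the SDK may handle a missing prompt differently.

```python
import time
from typing import Optional


class TimedRecorderSketch:
    """Stand-in for illustration only; this is not the SDK class."""

    def __init__(self, prompt: Optional[str] = None) -> None:
        self.prompt = prompt
        self._started_at = time.perf_counter()

    def __enter__(self) -> "TimedRecorderSketch":
        self._started_at = time.perf_counter()  # latency clock starts on entry
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        return False

    def finish(self, response: str, prompt: Optional[str] = None) -> dict:
        # A prompt passed here takes precedence over the constructor value.
        final_prompt = prompt if prompt is not None else self.prompt
        if final_prompt is None:
            # Assumption: a missing prompt is an error in this sketch.
            raise ValueError("prompt must be set at construction or in finish()")
        return {
            "prompt": final_prompt,
            "response": response,
            "latency_s": time.perf_counter() - self._started_at,
            "evidence_level": "fully_instrumented",
        }


with TimedRecorderSketch(prompt="draft prompt") as rec:
    time.sleep(0.01)  # stands in for the agent doing work
    evidence = rec.finish("final answer", prompt="corrected prompt")
```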
EvaluationEvidence Fields
EvaluationEvidence is a frozen dataclass returned by finish() and also accessible as CritiqorResult.evidence. It holds the complete normalized evidence snapshot used during evaluation.
| Field | Type | Description |
|---|---|---|
| prompt | str | The input prompt for the evaluated run. |
| response | str | The agent’s response. |
| tool_calls | list[ToolCall] | All captured tool calls in order. |
| tool_outputs | list[ToolOutput] | All captured tool outputs in order. |
| trace | list[dict] | Full event trace, including agent_start, tool_start, tool_end, LLM calls, and agent_finish events. |
| metrics | RuntimeMetrics | Wall-clock latency, token usage, retry count, and error strings. |
| evidence_level | EvidenceLevel | "response_only", "trace_available", or "fully_instrumented". Inferred automatically if not overridden. |
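
For readers unfamiliar with frozen dataclasses, the sketch below mirrors the field table with a simplified stand-in. EvidenceSketch is hypothetical: ToolCall, ToolOutput, and RuntimeMetrics are replaced by plain lists and dicts, and the "response_only" default is an assumption.

```python
from dataclasses import FrozenInstanceError, dataclass, field
from typing import Literal

EvidenceLevel = Literal["response_only", "trace_available", "fully_instrumented"]


@dataclass(frozen=True)
class EvidenceSketch:
    """Simplified stand-in; nested SDK types are plain lists and dicts here."""

    prompt: str
    response: str
    tool_calls: list = field(default_factory=list)
    tool_outputs: list = field(default_factory=list)
    trace: list = field(default_factory=list)
    metrics: dict = field(default_factory=dict)
    evidence_level: EvidenceLevel = "response_only"  # assumed default


evidence = EvidenceSketch(
    prompt="hi", response="hello", evidence_level="fully_instrumented"
)

try:
    evidence.response = "edited"  # rebinding a field on a frozen dataclass fails
    rebind_allowed = True
except FrozenInstanceError:
    rebind_allowed = False

evidence.trace.append({"event": "note"})  # list contents remain mutable
```

Freezing a dataclass only prevents reassigning its fields. Lists and dicts stored in those fields can still be mutated, so treat them as read-only by convention.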