OptimizationTaskResult interface

Per-task evaluation result for a single candidate.

Properties

composite_score	Composite score combining all evaluator scores.
duration_seconds	Wall-clock seconds for this task's agent execution.
error_message	Error message if the task failed during execution.
passed	Whether the task met the pass threshold.
query	The user query / input for the task.
rationales	Per-evaluator reasoning keyed by evaluator name.
response	Raw agent response text.
run_id	Identifier of the agent run that produced this result.
scores	Per-evaluator scores keyed by evaluator name.
task_name	Task name (from the dataset).
tokens	Total tokens consumed during the agent run for this task.
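
As a quick orientation, the property signatures documented on this page can be collected into one declaration. The sketch below is a local, illustrative copy of that shape, not the package's own source; the sample values (task name, scores, evaluator names) are invented for the example.

```typescript
// Local, illustrative copy of the shape documented on this page.
// In real code, import the type from the SDK rather than redeclaring it.
interface OptimizationTaskResult {
  composite_score: number;
  duration_seconds: number;
  error_message?: string;
  passed: boolean;
  query?: string;
  rationales?: Record<string, string>;
  response?: string;
  run_id?: string;
  scores: Record<string, number>;
  task_name: string;
  tokens: number;
}

// Hypothetical sample value: every field below is made-up data.
const sample: OptimizationTaskResult = {
  task_name: "example-task",
  query: "What is the capital of France?",
  response: "Paris.",
  scores: { accuracy: 1, brevity: 0.5 },
  rationales: { accuracy: "Correct answer.", brevity: "Could be shorter." },
  composite_score: 0.75,
  passed: true,
  duration_seconds: 2,
  tokens: 150,
};

// Only the non-optional properties are needed for a valid value.
const minimal: OptimizationTaskResult = {
  task_name: "minimal-task",
  scores: {},
  composite_score: 0,
  passed: false,
  duration_seconds: 0,
  tokens: 0,
};

console.log(sample.task_name, sample.passed, Object.keys(minimal).length);
```

The optional properties (`error_message`, `query`, `rationales`, `response`, `run_id`) may be absent, so code that reads them should handle `undefined`.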

composite_score

Composite score combining all evaluator scores.

composite_score: number

Property Value

number

duration_seconds

Wall-clock seconds for this task's agent execution.

duration_seconds: number

Property Value

number

error_message

Error message if the task failed during execution.

error_message?: string

Property Value

string
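
Because `error_message` is optional, callers need to check for its presence before using it. The following is a minimal sketch of one way to report an outcome; `describeOutcome` is a hypothetical helper, and treating a present `error_message` as taking priority over `passed` is an assumption of this example, not documented behavior.

```typescript
// Structural type listing only the fields this sketch reads.
type OutcomeFields = {
  task_name: string;
  passed: boolean;
  error_message?: string;
};

// Hypothetical helper. Assumption: if error_message is set, the task is
// reported as errored, regardless of the value of `passed`.
function describeOutcome(result: OutcomeFields): string {
  if (result.error_message !== undefined) {
    return `${result.task_name}: error (${result.error_message})`;
  }
  return `${result.task_name}: ${result.passed ? "passed" : "below threshold"}`;
}

// Made-up inputs for illustration.
const erroredLine = describeOutcome({
  task_name: "task-a",
  passed: false,
  error_message: "timeout",
});
const passedLine = describeOutcome({ task_name: "task-b", passed: true });

console.log(erroredLine);
console.log(passedLine);
```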

passed

Whether the task met the pass threshold.

passed: boolean

Property Value

boolean

query

The user query / input for the task.

query?: string

Property Value

string

rationales

Per-evaluator reasoning keyed by evaluator name.

rationales?: Record<string, string>

Property Value

Record<string, string>

response

Raw agent response text.

response?: string

Property Value

string

run_id

Identifier of the agent run that produced this result.

run_id?: string

Property Value

string

scores

Per-evaluator scores keyed by evaluator name.

scores: Record<string, number>

Property Value

Record<string, number>
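
Since `scores` and `rationales` are both keyed by evaluator name, the two maps can be joined on that key. The sketch below looks up the lowest-scoring evaluator and its rationale, if any; `weakestEvaluator` is a hypothetical helper, and the evaluator names and values are invented for the example.

```typescript
// Structural type listing only the fields this sketch reads.
type ScoreFields = {
  scores: Record<string, number>;
  rationales?: Record<string, string>;
};

interface WeakestEvaluator {
  evaluator: string;
  score: number;
  rationale: string | undefined;
}

// Hypothetical helper: returns the evaluator with the lowest score,
// or undefined when `scores` is empty. Ties keep the first entry seen.
function weakestEvaluator(result: ScoreFields): WeakestEvaluator | undefined {
  let weakest: WeakestEvaluator | undefined;
  for (const [evaluator, score] of Object.entries(result.scores)) {
    if (weakest === undefined || score < weakest.score) {
      // rationales is optional, so the lookup may yield undefined.
      weakest = { evaluator, score, rationale: result.rationales?.[evaluator] };
    }
  }
  return weakest;
}

// Made-up data for illustration.
const weakest = weakestEvaluator({
  scores: { accuracy: 0.9, relevance: 0.4 },
  rationales: { relevance: "Answer drifted off topic." },
});

console.log(weakest);
```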

task_name

Task name (from the dataset).

task_name: string

Property Value

string

tokens

Total tokens consumed during the agent run for this task.

tokens: number

Property Value

number
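
The required numeric and boolean properties lend themselves to simple roll-ups across the tasks evaluated for a candidate. The following is a minimal sketch of such a roll-up; `summarize` is a hypothetical helper and not part of the documented API, and the input values are invented.

```typescript
// Structural type listing only the fields this sketch reads.
type SummaryFields = {
  passed: boolean;
  tokens: number;
  duration_seconds: number;
};

// Hypothetical helper: aggregates pass rate, token usage, and duration
// over a list of per-task results. An empty list yields a pass rate of 0.
function summarize(results: SummaryFields[]) {
  const total = results.length;
  const passedCount = results.filter((r) => r.passed).length;
  return {
    total,
    passedCount,
    passRate: total === 0 ? 0 : passedCount / total,
    totalTokens: results.reduce((sum, r) => sum + r.tokens, 0),
    totalDurationSeconds: results.reduce((sum, r) => sum + r.duration_seconds, 0),
  };
}

// Made-up data for illustration.
const summary = summarize([
  { passed: true, tokens: 1200, duration_seconds: 3 },
  { passed: false, tokens: 800, duration_seconds: 5 },
]);

console.log(summary);
```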
