OptimizationTaskResult interface

Per-task evaluation result for a single candidate.

Properties

composite_score

Composite score combining all evaluator scores.

duration_seconds

Wall-clock time, in seconds, of the agent's execution of this task.

error_message

Error message if the task failed during execution.

passed

Whether the task met the pass threshold.

query

The user query (input) for the task.

rationales

Per-evaluator reasoning keyed by evaluator name.

response

Raw agent response text.

run_id

Identifier of the agent run that produced this result.

scores

Per-evaluator scores keyed by evaluator name.

task_name

Task name (from the dataset).

tokens

Total tokens consumed during the agent run for this task.

Property Details

composite_score

Composite score combining all evaluator scores.

composite_score: number

Property Value

number

duration_seconds

Wall-clock time, in seconds, of the agent's execution of this task.

duration_seconds: number

Property Value

number

error_message

Error message if the task failed during execution.

error_message?: string

Property Value

string
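
Because error_message is optional while passed is always present, a consumer may want to tell a task that errored during execution apart from one that ran but missed the pass threshold. The following is a minimal sketch of that check. The interface is redeclared locally from the property details on this page so the snippet stands alone; classifyResult and TaskStatus are hypothetical names, and treating a present error_message as "errored" is an assumption based on the property description.

```typescript
// Local copy of the documented shape, declared here only so the sketch is
// self-contained. In real code, import the type from the SDK instead.
interface OptimizationTaskResult {
  composite_score: number;
  duration_seconds: number;
  error_message?: string;
  passed: boolean;
  query?: string;
  rationales?: Record<string, string>;
  response?: string;
  run_id?: string;
  scores: Record<string, number>;
  task_name: string;
  tokens: number;
}

// Hypothetical status labels, not part of the documented API.
type TaskStatus = "errored" | "passed" | "failed";

// Hypothetical helper: a present error_message is assumed to mean the task
// failed during execution; otherwise the passed flag decides the status.
function classifyResult(result: OptimizationTaskResult): TaskStatus {
  if (result.error_message !== undefined) {
    return "errored";
  }
  return result.passed ? "passed" : "failed";
}
```

Checking error_message before passed keeps execution failures from being counted as ordinary threshold misses.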

passed

Whether the task met the pass threshold.

passed: boolean

Property Value

boolean

query

The user query (input) for the task.

query?: string

Property Value

string

rationales

Per-evaluator reasoning keyed by evaluator name.

rationales?: Record<string, string>

Property Value

Record<string, string>
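
Since scores and rationales are both keyed by evaluator name, the two records can be joined per evaluator. The sketch below shows one way to do that. ScoredResult, EvaluatorDetail, and evaluatorDetails are hypothetical names introduced for illustration, and the sketch assumes the two records use the same evaluator names as keys; an evaluator with no rationale yields undefined.

```typescript
// Narrow local type covering only the two documented properties used here.
interface ScoredResult {
  scores: Record<string, number>;
  rationales?: Record<string, string>;
}

// Hypothetical row type for a per-evaluator view.
interface EvaluatorDetail {
  evaluator: string;
  score: number;
  rationale: string | undefined;
}

// Hypothetical helper: join each evaluator's score with its rationale, if any.
function evaluatorDetails(result: ScoredResult): EvaluatorDetail[] {
  return Object.entries(result.scores).map(([evaluator, score]) => ({
    evaluator,
    score,
    // rationales is optional, so guard the lookup with optional chaining.
    rationale: result.rationales?.[evaluator],
  }));
}
```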

response

Raw agent response text.

response?: string

Property Value

string

run_id

Identifier of the agent run that produced this result.

run_id?: string

Property Value

string

scores

Per-evaluator scores keyed by evaluator name.

scores: Record<string, number>

Property Value

Record<string, number>

task_name

Task name (from the dataset).

task_name: string

Property Value

string

tokens

Total tokens consumed during the agent run for this task.

tokens: number

Property Value

number
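
The required numeric and boolean properties lend themselves to a per-candidate roll-up across tasks. The following is a rough sketch of such an aggregation. TaskResultMetrics, CandidateSummary, and summarizeResults are hypothetical names, and the summary fields (pass rate, mean composite score, totals) are illustrative choices rather than anything the API is documented to compute.

```typescript
// Narrow local type covering only the documented properties used here.
interface TaskResultMetrics {
  composite_score: number;
  duration_seconds: number;
  passed: boolean;
  tokens: number;
}

// Hypothetical summary shape for one candidate.
interface CandidateSummary {
  taskCount: number;
  passRate: number;
  meanCompositeScore: number;
  totalTokens: number;
  totalDurationSeconds: number;
}

// Hypothetical helper: aggregate per-task results into a single summary.
function summarizeResults(results: readonly TaskResultMetrics[]): CandidateSummary {
  let passedCount = 0;
  let compositeSum = 0;
  let totalTokens = 0;
  let totalDurationSeconds = 0;

  for (const result of results) {
    if (result.passed) passedCount += 1;
    compositeSum += result.composite_score;
    totalTokens += result.tokens;
    totalDurationSeconds += result.duration_seconds;
  }

  const taskCount = results.length;
  return {
    taskCount,
    // Report 0 instead of NaN when there are no results.
    passRate: taskCount === 0 ? 0 : passedCount / taskCount,
    meanCompositeScore: taskCount === 0 ? 0 : compositeSum / taskCount,
    totalTokens,
    totalDurationSeconds,
  };
}
```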