OptimizationTaskResult interface
Per-task evaluation result for a single candidate.
Properties
| Property | Description |
| --- | --- |
| composite_score | Composite score combining all evaluator scores. |
| duration_seconds | Wall-clock seconds for this task's agent execution. |
| error_message | Error message if the task failed during execution. |
| passed | Whether the task met the pass threshold. |
| query | The user query / input for the task. |
| rationales | Per-evaluator reasoning keyed by evaluator name. |
| response | Raw agent response text. |
| run_id | Identifier of the agent run that produced this result. |
| scores | Per-evaluator scores keyed by evaluator name. |
| task_name | Task name (from the dataset). |
| tokens | Total tokens consumed during the agent run for this task. |
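For reference, the properties above correspond to the TypeScript shape below. This is a sketch assembled from the property details on this page, not the package's own declaration: the import path is not given here, so the interface is redeclared locally. The example object's task name, evaluator names (`relevance`, `groundedness`), and numbers are hypothetical and purely illustrative.

```typescript
// Local mirror of the documented OptimizationTaskResult shape
// (redeclared because the package import path is not given on this page).
interface OptimizationTaskResult {
  composite_score: number;
  duration_seconds: number;
  error_message?: string;
  passed: boolean;
  query?: string;
  rationales?: Record<string, string>;
  response?: string;
  run_id?: string;
  scores: Record<string, number>;
  task_name: string;
  tokens: number;
}

// Illustrative values only; evaluator names and numbers are hypothetical.
const example: OptimizationTaskResult = {
  task_name: "refund-policy-question",
  query: "Can I return an opened item?",
  response: "Opened items can be returned within 30 days.",
  run_id: "run-001",
  scores: { relevance: 0.9, groundedness: 0.8 },
  rationales: {
    relevance: "Answers the question directly.",
    groundedness: "Matches the policy document.",
  },
  composite_score: 0.86,
  passed: true,
  duration_seconds: 4.2,
  tokens: 1350,
};

console.log(`${example.task_name}: passed=${example.passed}, composite=${example.composite_score}`);
```

Optional properties (`error_message`, `query`, `rationales`, `response`, `run_id`) may be absent, so code reading them should handle `undefined`.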
Property Details
composite_score
Composite score combining all evaluator scores.
composite_score: number
Property Value
number
duration_seconds
Wall-clock seconds for this task's agent execution.
duration_seconds: number
Property Value
number
error_message
Error message if the task failed during execution.
error_message?: string
Property Value
string
passed
Whether the task met the pass threshold.
passed: boolean
Property Value
boolean
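`passed` and `error_message` describe different things: whether the task met the pass threshold, and whether the task failed during execution. The sketch below separates the two into one outcome label. `classifyOutcome` is a hypothetical helper, and letting an execution error take precedence over the pass/fail verdict is an assumption, not documented behavior.

```typescript
// Minimal subset of the documented fields that this helper reads.
interface TaskOutcomeFields {
  task_name: string;
  passed: boolean;
  error_message?: string;
}

type TaskOutcome = "errored" | "passed" | "failed";

// Hypothetical helper: an execution error (non-empty error_message) is
// reported as "errored" regardless of `passed` (an assumption).
function classifyOutcome(result: TaskOutcomeFields): TaskOutcome {
  if (result.error_message !== undefined && result.error_message !== "") {
    return "errored";
  }
  return result.passed ? "passed" : "failed";
}

// Illustrative inputs only.
const sample: TaskOutcomeFields[] = [
  { task_name: "a", passed: true },
  { task_name: "b", passed: false },
  { task_name: "c", passed: false, error_message: "Agent timed out" },
];

const outcomes = sample.map((r) => classifyOutcome(r));
console.log(outcomes);
```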
query
The user query / input for the task.
query?: string
Property Value
string
rationales
Per-evaluator reasoning keyed by evaluator name.
rationales?: Record<string, string>
Property Value
Record<string, string>
response
Raw agent response text.
response?: string
Property Value
string
run_id
Identifier of the agent run that produced this result.
run_id?: string
Property Value
string
scores
Per-evaluator scores keyed by evaluator name.
scores: Record<string, number>
Property Value
Record<string, number>
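Because `scores` and `rationales` are both keyed by evaluator name, they can be joined per evaluator. The sketch below is a minimal example of that join; `evaluatorBreakdown` and the evaluator names are hypothetical. Since `rationales` is optional and need not cover every evaluator, a missing rationale comes back as `undefined`.

```typescript
// Subset of the documented fields used here.
interface ScoredFields {
  scores: Record<string, number>;
  rationales?: Record<string, string>;
}

interface EvaluatorRow {
  evaluator: string;
  score: number;
  rationale: string | undefined;
}

// Hypothetical helper: pairs each evaluator's score with its rationale
// (if one exists), sorted by evaluator name for stable output.
function evaluatorBreakdown(result: ScoredFields): EvaluatorRow[] {
  return Object.entries(result.scores)
    .sort(([a], [b]) => a.localeCompare(b))
    .map(([evaluator, score]) => ({
      evaluator,
      score,
      rationale: result.rationales?.[evaluator],
    }));
}

// Illustrative input only.
const rows = evaluatorBreakdown({
  scores: { relevance: 0.9, groundedness: 0.7 },
  rationales: { relevance: "Answers the question directly." },
});

for (const row of rows) {
  console.log(`${row.evaluator}: ${row.score} (${row.rationale ?? "no rationale"})`);
}
```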
task_name
Task name (from the dataset).
task_name: string
Property Value
string
tokens
Total tokens consumed during the agent run for this task.
tokens: number
Property Value
number
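Each `OptimizationTaskResult` covers one task for a single candidate, so comparing candidates usually means rolling the per-task results up. The sketch below shows one possible roll-up; `summarizeResults` and its output fields are hypothetical, and treating the mean of `composite_score` as a candidate-level score is an assumption, not something this page specifies. The input values are illustrative.

```typescript
// Subset of the documented fields used for aggregation.
interface AggregatableFields {
  passed: boolean;
  composite_score: number;
  duration_seconds: number;
  tokens: number;
  error_message?: string;
}

interface CandidateSummary {
  taskCount: number;
  passRate: number; // fraction of tasks with passed === true
  meanCompositeScore: number; // assumption: a simple unweighted mean
  totalTokens: number;
  totalDurationSeconds: number;
  erroredTasks: number; // tasks reporting a non-empty error_message
}

// Hypothetical helper: rolls one candidate's per-task results into a summary.
// Returns zero rates for an empty list instead of dividing by zero.
function summarizeResults(results: AggregatableFields[]): CandidateSummary {
  const n = results.length;
  let passed = 0, scoreSum = 0, tokens = 0, seconds = 0, errored = 0;
  for (const r of results) {
    if (r.passed) passed++;
    if (r.error_message) errored++;
    scoreSum += r.composite_score;
    tokens += r.tokens;
    seconds += r.duration_seconds;
  }
  return {
    taskCount: n,
    passRate: n === 0 ? 0 : passed / n,
    meanCompositeScore: n === 0 ? 0 : scoreSum / n,
    totalTokens: tokens,
    totalDurationSeconds: seconds,
    erroredTasks: errored,
  };
}

// Illustrative inputs only.
const summary = summarizeResults([
  { passed: true, composite_score: 0.8, duration_seconds: 3, tokens: 1000 },
  { passed: false, composite_score: 0.4, duration_seconds: 5, tokens: 1500 },
  { passed: false, composite_score: 0, duration_seconds: 1, tokens: 200, error_message: "Tool call failed" },
  { passed: true, composite_score: 0.9, duration_seconds: 2, tokens: 800 },
]);
console.log(summary);
```

Summing `tokens` and `duration_seconds` gives the candidate's total cost across the dataset. Note that `duration_seconds` is wall-clock time per task, so the sum overstates elapsed time if tasks ran concurrently.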