QAEvaluator Class

Initialize a question-answer evaluator configured for a specific Azure OpenAI model.

Note

To align with our support of a diverse set of models, keys without the gpt_ prefix have been added.

To maintain backwards compatibility, the old keys with the gpt_ prefix are still present in the output;

however, we recommend using the new keys going forward, as the old keys will be deprecated in the future.
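As an illustration of the dual key naming described above, the sketch below builds a hypothetical result dictionary containing both legacy gpt_-prefixed keys and their unprefixed replacements, plus a small helper that prefers the new keys. The dictionary contents and the helper are illustrative assumptions, not the SDK's actual output.

```python
# Hypothetical evaluator output illustrating the dual key naming; the values
# and the exact set of keys here are assumptions, not real SDK output.
result = {
    "groundedness": 4.0,       # new, unprefixed key (preferred)
    "gpt_groundedness": 4.0,   # legacy key, kept for backwards compatibility
    "relevance": 5.0,
    "gpt_relevance": 5.0,
}


def prefer_new_keys(result: dict) -> dict:
    """Drop a legacy gpt_-prefixed key whenever its unprefixed twin exists."""
    return {
        key: value
        for key, value in result.items()
        if not (key.startswith("gpt_") and key[len("gpt_"):] in result)
    }


cleaned = prefer_new_keys(result)
# cleaned keeps only the unprefixed "groundedness" and "relevance" keys
```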

Constructor

QAEvaluator(model_config, *, groundedness_threshold: int = 3, relevance_threshold: int = 3, coherence_threshold: int = 3, fluency_threshold: int = 3, similarity_threshold: int = 3, f1_score_threshold: float = 0.5, **kwargs)

Parameters

Name Description
model_config
Required

Configuration for the Azure OpenAI model.

groundedness_threshold
Optional
int

The threshold for groundedness evaluation. Default is 3.

relevance_threshold
Optional
int

The threshold for relevance evaluation. Default is 3.

coherence_threshold
Optional
int

The threshold for coherence evaluation. Default is 3.

fluency_threshold
Optional
int

The threshold for fluency evaluation. Default is 3.

similarity_threshold
Optional
int

The threshold for similarity evaluation. Default is 3.

f1_score_threshold
Optional
float

The threshold for F1 score evaluation. Default is 0.5.

kwargs
Optional
Any

Additional arguments to pass to the evaluator.

Keyword-Only Parameters

Name Description
groundedness_threshold
Default value: 3
relevance_threshold
Default value: 3
coherence_threshold
Default value: 3
fluency_threshold
Default value: 3
similarity_threshold
Default value: 3
f1_score_threshold
Default value: 0.5

Examples

Initialize a QAEvaluator with custom thresholds and call it.


   import os
   from azure.ai.evaluation import QAEvaluator

   model_config = {
       "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
       "api_key": os.environ.get("AZURE_OPENAI_KEY"),
       "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT"),
   }

   qa_eval = QAEvaluator(
       model_config=model_config,
       groundedness_threshold=2,
       relevance_threshold=2,
       coherence_threshold=2,
       fluency_threshold=2,
       similarity_threshold=2,
       f1_score_threshold=0.5
   )
   qa_eval(query="What's the color?", response="Black", ground_truth="gray", context="gray")
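Each threshold acts as a pass/fail cutoff for its metric. As a minimal sketch of assumed behavior (not the SDK's internal logic), where a score at or above the threshold passes:

```python
def passes(score: float, threshold: float) -> str:
    """Map a metric score to a pass/fail verdict against its threshold.

    Illustrative assumption only; the evaluator computes and reports
    its own verdicts.
    """
    return "pass" if score >= threshold else "fail"


# With groundedness_threshold=2 as in the example above, a groundedness
# score of 3 passes and a score of 1 fails.
```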

Attributes

id

Evaluator identifier. Experimental; to be used only with evaluation in the cloud.

id = 'qa'