QAEvaluator Class
Initialize a question-answer evaluator configured for a specific Azure OpenAI model.
Note
To align with our support of a diverse set of models, keys without the gpt_ prefix have been added.
To maintain backwards compatibility, the old keys with the gpt_ prefix are still present in the output;
however, it is recommended to use the new keys moving forward, as the old keys will be deprecated in the future.
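As a sketch of handling the dual key naming described above (the result dict below is hypothetical; actual QAEvaluator output keys may differ), code can prefer the new keys and fall back to the legacy gpt_-prefixed ones:

```python
# Hypothetical result dict illustrating both key sets; for illustration only.
legacy_result = {
    "gpt_groundedness": 4.0,
    "groundedness": 4.0,
    "gpt_relevance": 3.0,
    "relevance": 3.0,
}

def get_score(result: dict, name: str) -> float:
    # Prefer the new un-prefixed key; fall back to the deprecated gpt_ key.
    return result.get(name, result.get(f"gpt_{name}"))

print(get_score(legacy_result, "groundedness"))  # 4.0
print(get_score({"gpt_fluency": 5.0}, "fluency"))  # 5.0 (legacy-only result)
```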
Constructor
QAEvaluator(model_config, *, groundedness_threshold: int = 3, relevance_threshold: int = 3, coherence_threshold: int = 3, fluency_threshold: int = 3, similarity_threshold: int = 3, f1_score_threshold: float = 0.5, **kwargs)
Parameters

Name | Description
---|---
model_config | Configuration for the Azure OpenAI model. Required.
groundedness_threshold | The threshold for groundedness evaluation. Default is 3.
relevance_threshold | The threshold for relevance evaluation. Default is 3.
coherence_threshold | The threshold for coherence evaluation. Default is 3.
fluency_threshold | The threshold for fluency evaluation. Default is 3.
similarity_threshold | The threshold for similarity evaluation. Default is 3.
f1_score_threshold | The threshold for F1 score evaluation. Default is 0.5.
kwargs | Additional arguments to pass to the evaluator.
Keyword-Only Parameters

Name | Description
---|---
groundedness_threshold | Default value: 3
relevance_threshold | Default value: 3
coherence_threshold | Default value: 3
fluency_threshold | Default value: 3
similarity_threshold | Default value: 3
f1_score_threshold | Default value: 0.5
Examples
Initialize with thresholds and call a QAEvaluator.

```python
import os
from azure.ai.evaluation import QAEvaluator

model_config = {
    "azure_endpoint": os.environ.get("AZURE_OPENAI_ENDPOINT"),
    "api_key": os.environ.get("AZURE_OPENAI_KEY"),
    "azure_deployment": os.environ.get("AZURE_OPENAI_DEPLOYMENT"),
}

qa_eval = QAEvaluator(
    model_config=model_config,
    groundedness_threshold=2,
    relevance_threshold=2,
    coherence_threshold=2,
    fluency_threshold=2,
    similarity_threshold=2,
    f1_score_threshold=0.5,
)

qa_eval(query="This's the color?", response="Black", ground_truth="gray", context="gray")
```
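The thresholds govern the pass/fail judgment attached to each metric. A minimal sketch of the comparison, assuming (as with the score-based metrics here) that higher scores are better; `passes` is an illustrative helper, not part of the SDK:

```python
def passes(score: float, threshold: float, higher_is_better: bool = True) -> bool:
    # A score that meets or exceeds the threshold passes when higher is better.
    return score >= threshold if higher_is_better else score <= threshold

# With groundedness_threshold=2 as in the example above:
print(passes(3.0, 2))  # True  -> metric would report a pass
print(passes(1.0, 2))  # False -> metric would report a fail
```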
Attributes
id
Evaluator identifier; experimental, to be used only with evaluation in the cloud.
id = 'qa'