Wrong output in evaluation with default questions - LLM evaluation with Azure AI Studio

ferrand 50 Reputation points
2024-02-27T21:47:34.63+00:00

Dear Microsoft community, I am following this official tutorial step by step: https://learn.microsoft.com/en-us/azure/ai-studio/tutorials/deploy-copilot-ai-studio#customize-prompt-flow-with-multiple-data-sources When I run the evaluation, the output shows the questions and answers I prepared in the test dataset. This part is correct.

However, the result with the evaluation metrics is wrong: the questions from the test dataset are not used. Instead, I always get the same default question-and-answer pair.

How can I solve this problem? I used custom evaluation instead of the built-in evaluation. When I use the built-in evaluation, I get a different issue, which I posted here: https://learn.microsoft.com/en-us/answers/questions/1598940/error-flow-runtime-not-found-llms-built-in-evaluat.

Thanks in advance for your kind support.

Tags:
Azure Machine Learning - An Azure machine learning service for building and deploying models.
Azure AI Search - An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.
Azure OpenAI Service - An Azure service that provides access to OpenAI's GPT-3 models with enterprise capabilities.
Azure AI services - A group of Azure services, SDKs, and APIs designed to make apps more intelligent, engaging, and discoverable.

1 answer

  1. ferrand 50 Reputation points
    2024-02-28T21:54:41.37+00:00

    @romungi-MSFT I changed the model to GPT-4-32k and used JSONL format instead of CSV, and that solved it. Thanks a lot for your support.
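    For anyone who hits the same problem, the key change was supplying the test dataset as JSONL (one JSON object per line) rather than CSV. Below is a minimal sketch of such a conversion; the file names and the column names "question" and "answer" are assumptions for illustration, so match them to the inputs your evaluation flow actually maps.

    ```python
    # Minimal sketch: convert a CSV test dataset to JSONL for evaluation.
    # File names and column names ("question", "answer") are assumptions;
    # use whatever columns your evaluation flow expects.
    import csv
    import json

    with open("test_dataset.csv", newline="", encoding="utf-8") as src, \
         open("test_dataset.jsonl", "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            # Each CSV row becomes one JSON object on its own line, e.g.
            # {"question": "...", "answer": "..."}
            dst.write(json.dumps(row, ensure_ascii=False) + "\n")
    ```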

