The Microsoft.Extensions.AI.Evaluation libraries (currently in preview) simplify the process of evaluating the quality and accuracy of responses generated by AI models in .NET intelligent apps. Various metrics measure aspects like relevance, truthfulness, coherence, and completeness of the responses. Evaluations are crucial in testing, because they help ensure that the AI model performs as expected and provides reliable and accurate results.
The evaluation libraries, which are built on top of the Microsoft.Extensions.AI abstractions, are distributed as a set of NuGet packages.
The libraries are designed to integrate smoothly with existing .NET apps, so you can reuse your existing testing infrastructure and familiar syntax to evaluate intelligent apps. You can use any test framework (for example, MSTest, xUnit, or NUnit) and any testing workflow (for example, Test Explorer, dotnet test, or a CI/CD pipeline). The libraries also support online evaluation of your application by publishing evaluation scores to telemetry and monitoring dashboards.
The evaluation libraries were built in collaboration with data science researchers from Microsoft and GitHub, and were tested on popular Microsoft Copilot experiences. The following table shows the built-in evaluators.
Metric | Description | Evaluator type |
---|---|---|
Relevance, truth, and completeness | How effectively a response addresses a query | RelevanceTruthAndCompletenessEvaluator |
Fluency | Grammatical accuracy, vocabulary range, sentence complexity, and overall readability | FluencyEvaluator |
Coherence | The logical and orderly presentation of ideas | CoherenceEvaluator |
Equivalence | The similarity between the generated text and its ground truth with respect to a query | EquivalenceEvaluator |
Groundedness | How well a generated response aligns with the given context | GroundednessEvaluator |
You can also add your own evaluations by implementing the IEvaluator interface or by extending base classes such as ChatConversationEvaluator and SingleNumericMetricEvaluator.
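To make the idea of a custom evaluation concrete, here is a minimal, self-contained sketch of the pattern: one contract, several small evaluators that each produce a single numeric metric. It deliberately does not use the library. `ISimpleEvaluator`, `WordCountEvaluator`, and `KeywordCoverageEvaluator` are hypothetical names introduced for illustration, and the members of the real IEvaluator interface are not reproduced here; consult the API reference before implementing it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for an evaluator contract. This is NOT the library's IEvaluator.
public interface ISimpleEvaluator
{
    string MetricName { get; }
    double Evaluate(string response);
}

// Custom metric 1: the number of whitespace-separated words in the response.
public sealed class WordCountEvaluator : ISimpleEvaluator
{
    private static readonly char[] Whitespace = { ' ', '\t', '\n', '\r' };

    public string MetricName => "Words";

    public double Evaluate(string response) =>
        response.Split(Whitespace, StringSplitOptions.RemoveEmptyEntries).Length;
}

// Custom metric 2: the fraction of required keywords that appear in the response.
public sealed class KeywordCoverageEvaluator : ISimpleEvaluator
{
    private readonly IReadOnlyList<string> _keywords;

    public KeywordCoverageEvaluator(params string[] keywords) => _keywords = keywords;

    public string MetricName => "KeywordCoverage";

    public double Evaluate(string response)
    {
        if (_keywords.Count == 0)
        {
            return 1.0; // Nothing is required, so coverage is trivially complete.
        }

        int found = _keywords.Count(
            k => response.Contains(k, StringComparison.OrdinalIgnoreCase));
        return (double)found / _keywords.Count;
    }
}

public static class Program
{
    public static void Main()
    {
        var evaluators = new ISimpleEvaluator[]
        {
            new WordCountEvaluator(),
            new KeywordCoverageEvaluator("cache", "report"),
        };

        string response = "The library can cache responses.";

        // Run every evaluator against the same response and print its metric.
        foreach (ISimpleEvaluator evaluator in evaluators)
        {
            Console.WriteLine($"{evaluator.MetricName}: {evaluator.Evaluate(response)}");
        }
    }
}
```

For the sample response, the word count is five and the keyword coverage is one half, because only "cache" appears. Keeping each evaluator to a single named metric makes it easy to combine several of them over the same response.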
The libraries support response caching: responses from the AI model are persisted in a cache. In subsequent runs, if the request parameters (prompt and model) are unchanged, responses are served from the cache, which makes execution faster and reduces cost.
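The caching behavior described above can be pictured with a small sketch. It is not the library's implementation: `ResponseCache`, `MakeKey`, `GetOrAdd`, and the SHA-256 key derivation are hypothetical choices made for illustration, and the cache here lives in memory, whereas a persisted cache would write entries to durable storage. The only detail taken from the text is that the key depends on the prompt and the model.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical illustration only. This is NOT part of Microsoft.Extensions.AI.Evaluation.
public sealed class ResponseCache
{
    // In-memory store for the sketch; a persisted cache would use disk or another durable store.
    private readonly Dictionary<string, string> _entries = new();

    public int Hits { get; private set; }
    public int Misses { get; private set; }

    // Derives a stable key from the request parameters (model and prompt).
    public static string MakeKey(string model, string prompt)
    {
        // Length-prefix the model name so ("ab", "c") and ("a", "bc") cannot collide.
        byte[] bytes = Encoding.UTF8.GetBytes($"{model.Length}:{model}|{prompt}");
        return Convert.ToHexString(SHA256.HashData(bytes));
    }

    // Returns the cached response if present; otherwise calls the model once and stores the result.
    public string GetOrAdd(string model, string prompt, Func<string, string, string> callModel)
    {
        string key = MakeKey(model, prompt);
        if (_entries.TryGetValue(key, out var cached))
        {
            Hits++;
            return cached;
        }

        Misses++;
        string response = callModel(model, prompt);
        _entries[key] = response;
        return response;
    }
}

public static class Program
{
    public static void Main()
    {
        var cache = new ResponseCache();
        int modelCalls = 0;

        // Stand-in for a real (slow, billed) model call.
        string FakeModel(string model, string prompt)
        {
            modelCalls++;
            return $"[{model}] answer to: {prompt}";
        }

        cache.GetOrAdd("model-a", "What is .NET?", FakeModel); // miss: calls the model
        cache.GetOrAdd("model-a", "What is .NET?", FakeModel); // hit: served from the cache
        cache.GetOrAdd("model-b", "What is .NET?", FakeModel); // miss: the model changed

        Console.WriteLine($"model calls: {modelCalls}, hits: {cache.Hits}, misses: {cache.Misses}");
    }
}
```

In this sketch the second request is served without calling the model, while changing the model name produces a different key and forces a fresh call. That mirrors the condition in the text: a cached response is reused only when the request parameters are unchanged.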
The library contains support for storing evaluation results and generating reports. The following image shows an example report in an Azure DevOps pipeline:
The dotnet aieval tool, which ships as part of the Microsoft.Extensions.AI.Evaluation.Console package, also includes functionality for generating reports and managing the stored evaluation data and cached responses.
The libraries are designed to be flexible. You can pick the components that you need. For example, you can disable response caching or tailor reporting to work best in your environment. You can also customize and configure your evaluations, for example, by adding customized metrics and reporting options.
For a more comprehensive tour of the functionality and APIs available in the Microsoft.Extensions.AI.Evaluation libraries, see the API usage examples (dotnet/ai-samples repo). These examples are structured as a collection of unit tests. Each unit test showcases a specific concept or API and builds on the concepts and APIs showcased in previous unit tests.