An embedding is a special format of data representation that machine learning models and algorithms can easily use. The embedding is an information-dense representation of the semantic meaning of a piece of text. Each embedding is a vector of floating-point numbers, such that the distance between two embeddings in the vector space is correlated with the semantic similarity between the two inputs in the original format. For example, if two texts are similar, then their vector representations should also be similar. Embeddings power vector similarity search in retrieval systems such as Azure AI Search (recommended) and in Azure databases such as Azure Cosmos DB for MongoDB vCore, Azure SQL Database, and Azure Database for PostgreSQL - Flexible Server.
Embeddings make it easier to do machine learning on large inputs that represent words by capturing their semantic similarities in a vector space. You can therefore use embeddings to determine whether two text chunks are semantically related or similar, and produce a score that measures that similarity.
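The following is a minimal sketch of requesting an embedding, assuming the `openai` Python package (v1.x) and an Azure OpenAI resource. The endpoint and API key environment variable names, the API version, and the deployment name "text-embedding-3-small" are placeholders for your own resource and embedding model deployment.

import os
from openai import AzureOpenAI

# Create a client for your Azure OpenAI resource.
# Endpoint and key are read from environment variables (placeholder names).
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

# Request an embedding for a piece of text.
# "text-embedding-3-small" is an assumed deployment name; use your own.
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="The food was delicious and the service was excellent.",
)

embedding = response.data[0].embedding  # a list of floating-point numbers
print(len(embedding))                   # vector dimension, for example 1536

The returned vector can be stored in a vector index (for example, Azure AI Search) or compared directly against other embeddings to score similarity.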
Azure OpenAI embeddings often rely on cosine similarity to compute similarity between documents and a query.
From a mathematical perspective, cosine similarity measures the cosine of the angle between two vectors projected in a multidimensional space. This measurement is beneficial because two documents that are far apart by Euclidean distance (for example, because of differences in document length) can still have a small angle between them, and therefore a high cosine similarity. For more information about cosine similarity equations, see Cosine similarity.
An alternative method of identifying similar documents is to count the number of words the documents have in common. This approach doesn't scale: as documents grow larger, they're likely to share more words even when they cover disparate topics. For this reason, cosine similarity offers a more effective alternative, as the sketch below illustrates.
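As an illustration only, the following sketch computes cosine similarity directly with NumPy: cos(theta) = (a · b) / (|a| |b|). The two vectors are toy values; in practice they would be embeddings returned by the Azure OpenAI embeddings API.

import numpy as np

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product of the vectors divided by the
    # product of their magnitudes. Values close to 1 indicate that the
    # vectors point in nearly the same direction.
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors for illustration; real embeddings have hundreds or
# thousands of dimensions.
doc_embedding = [0.11, 0.62, 0.27]
query_embedding = [0.10, 0.59, 0.30]
print(cosine_similarity(doc_embedding, query_embedding))  # close to 1.0

Because the score depends only on the angle between the vectors and not their magnitudes, documents of very different lengths can still score as highly similar when their content is semantically related.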
Documentation

How to generate embeddings with Azure OpenAI Service
Learn how to generate embeddings with Azure OpenAI.

Azure OpenAI Service embeddings tutorial
Learn how to use the Azure OpenAI embeddings API for document search with the BillSum dataset.

Learn more about the concepts behind customizing an LLM with Azure OpenAI.