Azure AI services in Azure Synapse Analytics
Using pretrained models from Azure AI services, you can enrich your data with artificial intelligence (AI) in Azure Synapse Analytics.
Azure AI services help developers and organizations rapidly create intelligent, cutting-edge, market-ready, and responsible applications with out-of-the-box, prebuilt, and customizable APIs and models.
There are a few ways that you can use a subset of Azure AI services with your data in Synapse Analytics:
The "Azure AI services" wizard in Synapse Analytics generates PySpark code in a Synapse notebook that connects to a with Azure AI services using data in a Spark table. Then, using pretrained machine learning models, the service does the work for you to add AI to your data. Check out Sentiment analysis wizard and Anomaly detection wizard for more details.
Synapse Machine Learning (SynapseML) allows you to build powerful and highly scalable predictive and analytical models from various Spark data sources. Synapse Spark provides built-in SynapseML libraries, including synapse.ml.cognitive.
Starting from the PySpark code generated by the wizard, or the example SynapseML code provided in the tutorial, you can write your own code to use other Azure AI services with your data. See What are Azure AI services? for more information about available services.
Get started
The tutorial, Pre-requisites for using Azure AI services in Azure Synapse, walks you through a couple of steps you need to perform before using Azure AI services in Synapse Analytics.
Usage
Vision
- Describe: provides a description of an image in human-readable language (Scala, Python); see the sketch after this list
- Analyze (color, image type, face, adult/racy content): analyzes visual features of an image (Scala, Python)
- OCR: reads text from an image (Scala, Python)
- Recognize Text: reads text from an image (Scala, Python)
- Thumbnail: generates a thumbnail of user-specified size from the image (Scala, Python)
- Recognize domain-specific content: recognizes domain-specific content (celebrity, landmark) (Scala, Python)
- Tag: identifies a list of words that are relevant to the input image (Scala, Python)
- Detect: detects human faces in an image (Scala, Python)
- Verify: verifies whether two faces belong to the same person, or a face belongs to a person (Scala, Python)
- Identify: finds the closest matches of the specific query person face from a person group (Scala, Python)
- Find similar: finds similar faces to the query face in a face list (Scala, Python)
- Group: divides a group of faces into disjoint groups based on similarity (Scala, Python)
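These Vision transformers all follow the same fluent builder pattern shown in the Computer Vision sample later in this article. As a minimal sketch (not one of the article's own samples), the following code calls DescribeImage on a public test image; it assumes the service_key and service_loc variables defined in the Shared code section below, and the dataframe and column names are illustrative.
from synapse.ml.cognitive import DescribeImage
# An illustrative one-row dataframe pointing at a public test image
describe_df = spark.createDataFrame(
    [
        (
            "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-sample-data-files/master/ComputerVision/Images/dog.jpg",
        )
    ],
    ["image"],
)
# Request up to three candidate captions per image
describe = (
    DescribeImage()
    .setSubscriptionKey(service_key)
    .setLocation(service_loc)
    .setMaxCandidates(3)
    .setImageUrlCol("image")
    .setOutputCol("descriptions")
)
# Show the raw description results next to each image URL
display(describe.transform(describe_df).select("image", "descriptions"))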
Speech
- Speech-to-text: transcribes audio streams (Scala, Python)
- Conversation Transcription: transcribes audio streams into live transcripts with identified speakers. (Scala, Python)
- Text to Speech: Converts text to realistic audio (Scala, Python)
Language
- Language detection: detects the language of the input text (Scala, Python); see the sketch after this list
- Key phrase extraction: identifies the key talking points in the input text (Scala, Python)
- Named entity recognition: identifies known entities and general named entities in the input text (Scala, Python)
- Sentiment analysis: returns a score between 0 and 1 indicating the sentiment in the input text (Scala, Python)
- Healthcare Entity Extraction: Extracts medical entities and relationships from text. (Scala, Python)
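Most of these Language transformers follow the pattern shown in the Text Analytics sample later in this article. As a minimal sketch of language detection (again assuming the service_key and service_loc variables from the Shared code section below; the dataframe and column names are illustrative):
from synapse.ml.cognitive import LanguageDetector
# An illustrative dataframe with text in several languages
lang_df = spark.createDataFrame(
    [("Hello world",), ("Bonjour tout le monde",), ("La carretera estaba atascada",)],
    ["text"],
)
detector = (
    LanguageDetector()
    .setSubscriptionKey(service_key)
    .setLocation(service_loc)
    .setTextCol("text")
    .setOutputCol("language")
)
# Show the detected language next to each input text
display(detector.transform(lang_df).select("text", "language"))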
Translation
- Translate: Translates text. (Scala, Python)
- Transliterate: Converts text in one language from one script to another script; see the sketch after this list. (Scala, Python)
- Detect: Identifies the language of a piece of text. (Scala, Python)
- BreakSentence: Identifies the positioning of sentence boundaries in a piece of text. (Scala, Python)
- Dictionary Lookup: Provides alternative translations for a word and a small number of idiomatic phrases. (Scala, Python)
- Dictionary Examples: Provides examples that show how terms in the dictionary are used in context. (Scala, Python)
- Document Translation: Translates documents across all supported languages and dialects while preserving document structure and data format. (Scala, Python)
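Plain text translation is covered in the Translator sample later in this article. As a minimal sketch of transliteration (assuming the translator_key and translator_loc variables from the Shared code section below; the dataframe and column names are illustrative), the following code converts Japanese text from Japanese script to Latin script:
from synapse.ml.cognitive import Transliterate
# An illustrative dataframe with Japanese text
ja_df = spark.createDataFrame([(["こんにちは", "さようなら"],)], ["text"])
transliterate = (
    Transliterate()
    .setSubscriptionKey(translator_key)
    .setLocation(translator_loc)
    .setLanguage("ja")
    .setFromScript("Jpan")
    .setToScript("Latn")
    .setTextCol("text")
    .setOutputCol("result")
)
# Show the romanized output next to the original text
display(transliterate.transform(ja_df).select("text", "result"))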
Document Intelligence
Document Intelligence (formerly known as Azure Form Recognizer)
- Analyze Layout: Extracts text and layout information from a given document; see the sketch after this list. (Scala, Python)
- Analyze Receipts: Detects and extracts data from receipts using optical character recognition (OCR) and our receipt model, enabling you to easily extract structured data from receipts such as merchant name, merchant phone number, transaction date, transaction total, and more. (Scala, Python)
- Analyze Business Cards: Detects and extracts data from business cards using optical character recognition (OCR) and our business card model, enabling you to easily extract structured data from business cards such as contact names, company names, phone numbers, emails, and more. (Scala, Python)
- Analyze Invoices: Detects and extracts data from invoices using optical character recognition (OCR) and our invoice understanding deep learning models, enabling you to easily extract structured data from invoices such as customer, vendor, invoice ID, invoice due date, total, invoice amount due, tax amount, ship to, bill to, line items and more. (Scala, Python)
- Analyze ID Documents: Detects and extracts data from identification documents using optical character recognition (OCR) and our ID document model, enabling you to easily extract structured data from ID documents such as first name, last name, date of birth, document number, and more. (Scala, Python)
- Analyze Custom Form: Extracts information from forms (PDFs and images) into structured data based on a model created from a set of representative training forms. (Scala, Python)
- Get Custom Model: Get detailed information about a custom model. (Scala, Python)
- List Custom Models: Get information about all custom models. (Scala, Python)
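Receipt, business card, invoice, and ID document analysis all follow the pattern shown in the Document Intelligence sample later in this article. As a minimal sketch of layout extraction (assuming the service_key and service_loc variables from the Shared code section below, and reusing the sample business card image from that sample):
from synapse.ml.cognitive import AnalyzeLayout
# Reuse the sample image from the Document Intelligence sample below
layout_df = spark.createDataFrame(
    [
        (
            "https://mmlspark.blob.core.windows.net/datasets/FormRecognizer/business_card.jpg",
        )
    ],
    ["source"],
)
analyzeLayout = (
    AnalyzeLayout()
    .setSubscriptionKey(service_key)
    .setLocation(service_loc)
    .setImageUrlCol("source")
    .setOutputCol("layout")
)
# Show the raw layout analysis result for each document
display(analyzeLayout.transform(layout_df).select("source", "layout"))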
Decision
- Anomaly status of latest point: generates a model using preceding points and determines whether the latest point is anomalous (Scala, Python)
- Find anomalies: generates a model using an entire series and finds anomalies in the series (Scala, Python)
Search
- Bing Image Search: searches the web to retrieve images related to a natural-language query (Scala, Python)
Prerequisites
- Follow the steps in Setup environment for Azure AI services to set up your Azure Databricks and Azure AI services environment. This tutorial shows you how to install SynapseML and how to create your Spark cluster in Databricks.
- After you create a new notebook in Azure Databricks, copy the following Shared code and paste it into a new cell in your notebook.
- Choose one of the following service samples and copy and paste it into a second new cell in your notebook.
- Replace any of the service subscription key placeholders with your own key.
- Choose the run button (triangle icon) in the upper right corner of the cell, then select Run Cell.
- View results in a table below the cell.
Shared code
To get started, we'll need to add this code to the project:
from pyspark.sql.functions import udf, col
from synapse.ml.io.http import HTTPTransformer, http_udf
from requests import Request
from pyspark.sql.functions import lit
from pyspark.ml import PipelineModel
from pyspark.sql.functions import col
import os
from pyspark.sql import SparkSession
from synapse.ml.core.platform import *
# Bootstrap Spark Session
spark = SparkSession.builder.getOrCreate()
from synapse.ml.core.platform import materializing_display as display
from synapse.ml.cognitive import *
# A multi-service resource key for Text Analytics, Computer Vision and Document Intelligence (or use separate keys that belong to each service)
service_key = find_secret("cognitive-api-key")
service_loc = "eastus"
# A Bing Search v7 subscription key
bing_search_key = find_secret("bing-search-key")
# An Anomaly Detector subscription key
anomaly_key = find_secret("anomaly-api-key")
anomaly_loc = "westus2"
# A Translator subscription key
translator_key = find_secret("translator-key")
translator_loc = "eastus"
# An Azure search key
search_key = find_secret("azure-search-key")
Text Analytics sample
The Text Analytics service provides several algorithms for extracting intelligent insights from text. For example, we can find the sentiment of a given input text. The service will return a score between 0.0 and 1.0, where low scores indicate negative sentiment and high scores indicate positive sentiment. This sample uses three simple sentences and returns the sentiment for each.
# Create a dataframe that's tied to its column names
df = spark.createDataFrame(
[
("I am so happy today, its sunny!", "en-US"),
("I am frustrated by this rush hour traffic", "en-US"),
("The Azure AI services on spark aint bad", "en-US"),
],
["text", "language"],
)
# Run the Text Analytics service with options
sentiment = (
TextSentiment()
.setTextCol("text")
.setLocation(service_loc)
.setSubscriptionKey(service_key)
.setOutputCol("sentiment")
.setErrorCol("error")
.setLanguageCol("language")
)
# Show the results of your text query in a table format
display(
sentiment.transform(df).select(
"text", col("sentiment.document.sentiment").alias("sentiment")
)
)
Text Analytics for Health sample
The Text Analytics for Health Service extracts and labels relevant medical information from unstructured texts such as doctor's notes, discharge summaries, clinical documents, and electronic health records.
df = spark.createDataFrame(
[
("20mg of ibuprofen twice a day",),
("1tsp of Tylenol every 4 hours",),
("6-drops of Vitamin B-12 every evening",),
],
["text"],
)
healthcare = (
AnalyzeHealthText()
.setSubscriptionKey(service_key)
.setLocation(service_loc)
.setLanguage("en")
.setOutputCol("response")
)
display(healthcare.transform(df))
Translator sample
Translator is a cloud-based machine translation service and is part of the Azure AI services family of APIs used to build intelligent apps. Translator is easy to integrate in your applications, websites, tools, and solutions. It allows you to add multi-language user experiences in 90 languages and dialects and can be used for text translation with any operating system. In this sample, we do a simple text translation by providing the sentences you want to translate and the target languages you want to translate them into.
from pyspark.sql.functions import col, flatten
# Create a dataframe including sentences you want to translate
df = spark.createDataFrame(
[(["Hello, what is your name?", "Bye"],)],
[
"text",
],
)
# Run the Translator service with options
translate = (
Translate()
.setSubscriptionKey(translator_key)
.setLocation(translator_loc)
.setTextCol("text")
.setToLanguage(["zh-Hans"])
.setOutputCol("translation")
)
# Show the results of the translation.
display(
translate.transform(df)
.withColumn("translation", flatten(col("translation.translations")))
.withColumn("translation", col("translation.text"))
.select("translation")
)
Document Intelligence sample
Document Intelligence (formerly known as Azure Form Recognizer) is a part of Azure AI services that lets you build automated data processing software using machine learning technology. Identify and extract text, key/value pairs, selection marks, tables, and structure from your documents. The service outputs structured data that includes the relationships in the original file, bounding boxes, confidence and more. In this sample, we analyze a business card image and extract its information into structured data.
from pyspark.sql.functions import col, explode
# Create a dataframe containing the source files
imageDf = spark.createDataFrame(
[
(
"https://mmlspark.blob.core.windows.net/datasets/FormRecognizer/business_card.jpg",
)
],
[
"source",
],
)
# Run the Document Intelligence service
analyzeBusinessCards = (
AnalyzeBusinessCards()
.setSubscriptionKey(service_key)
.setLocation(service_loc)
.setImageUrlCol("source")
.setOutputCol("businessCards")
)
# Show the results of recognition.
display(
analyzeBusinessCards.transform(imageDf)
.withColumn(
"documents", explode(col("businessCards.analyzeResult.documentResults.fields"))
)
.select("source", "documents")
)
Computer Vision sample
Computer Vision analyzes images to identify structure such as faces, objects, and natural-language descriptions. In this sample, we tag a list of images. Tags are one-word descriptions of things in the image like recognizable objects, people, scenery, and actions.
# Create a dataframe with the image URLs
base_url = "https://raw.githubusercontent.com/Azure-Samples/cognitive-services-sample-data-files/master/ComputerVision/Images/"
df = spark.createDataFrame(
[
(base_url + "objects.jpg",),
(base_url + "dog.jpg",),
(base_url + "house.jpg",),
],
[
"image",
],
)
# Run the Computer Vision service. Analyze Image extracts information from/about the images.
analysis = (
AnalyzeImage()
.setLocation(service_loc)
.setSubscriptionKey(service_key)
.setVisualFeatures(
["Categories", "Color", "Description", "Faces", "Objects", "Tags"]
)
.setOutputCol("analysis_results")
.setImageUrlCol("image")
.setErrorCol("error")
)
# Show the results of what you wanted to pull out of the images.
display(analysis.transform(df).select("image", "analysis_results.description.tags"))
Bing Image Search sample
Bing Image Search searches the web to retrieve images related to a user's natural language query. In this sample, we use a text query that looks for images with quotes. It returns a list of image URLs that contain photos related to our query.
# Number of images Bing will return per query
imgsPerBatch = 10
# A list of offsets, used to page into the search results
offsets = [(i * imgsPerBatch,) for i in range(100)]
# Since web content is our data, we create a dataframe with options on that data: offsets
bingParameters = spark.createDataFrame(offsets, ["offset"])
# Run the Bing Image Search service with our text query
bingSearch = (
BingImageSearch()
.setSubscriptionKey(bing_search_key)
.setOffsetCol("offset")
.setQuery("Martin Luther King Jr. quotes")
.setCount(imgsPerBatch)
.setOutputCol("images")
)
# Transformer that extracts and flattens the richly structured output of Bing Image Search into a simple URL column
getUrls = BingImageSearch.getUrlTransformer("images", "url")
# This displays the full results returned, uncomment to use
# display(bingSearch.transform(bingParameters))
# Since we have two services, they are put into a pipeline
pipeline = PipelineModel(stages=[bingSearch, getUrls])
# Show the results of your search: image URLs
display(pipeline.transform(bingParameters))
Speech-to-Text sample
The Speech-to-text service converts streams or files of spoken audio to text. In this sample, we transcribe one audio file.
# Create a dataframe with our audio URLs, tied to the column called "url"
df = spark.createDataFrame(
[("https://mmlspark.blob.core.windows.net/datasets/Speech/audio2.wav",)], ["url"]
)
# Run the Speech-to-text service to transcribe the audio into text
speech_to_text = (
SpeechToTextSDK()
.setSubscriptionKey(service_key)
.setLocation(service_loc)
.setOutputCol("text")
.setAudioDataCol("url")
.setLanguage("en-US")
.setProfanity("Masked")
)
# Show the results of the transcription
display(speech_to_text.transform(df).select("url", "text.DisplayText"))
Text-to-Speech sample
Text to speech is a service that allows one to build apps and services that speak naturally, choosing from more than 270 neural voices across 119 languages and variants.
from synapse.ml.cognitive import TextToSpeech
fs = ""
if running_on_databricks():
fs = "dbfs:"
elif running_on_synapse_internal():
fs = "Files"
# Create a dataframe with text and an output file location
df = spark.createDataFrame(
[
(
"Reading out loud is fun! Check out aka.ms/spark for more information",
fs + "/output.mp3",
)
],
["text", "output_file"],
)
tts = (
TextToSpeech()
.setSubscriptionKey(service_key)
.setTextCol("text")
.setLocation(service_loc)
.setVoiceName("en-US-JennyNeural")
.setOutputFileCol("output_file")
)
# Check to make sure there were no errors during audio creation
display(tts.transform(df))
Anomaly Detector sample
Anomaly Detector is great for detecting irregularities in your time series data. In this sample, we use the service to find anomalies in the entire time series.
# Create a dataframe with the point data that Anomaly Detector requires
df = spark.createDataFrame(
[
("1972-01-01T00:00:00Z", 826.0),
("1972-02-01T00:00:00Z", 799.0),
("1972-03-01T00:00:00Z", 890.0),
("1972-04-01T00:00:00Z", 900.0),
("1972-05-01T00:00:00Z", 766.0),
("1972-06-01T00:00:00Z", 805.0),
("1972-07-01T00:00:00Z", 821.0),
("1972-08-01T00:00:00Z", 20000.0),
("1972-09-01T00:00:00Z", 883.0),
("1972-10-01T00:00:00Z", 898.0),
("1972-11-01T00:00:00Z", 957.0),
("1972-12-01T00:00:00Z", 924.0),
("1973-01-01T00:00:00Z", 881.0),
("1973-02-01T00:00:00Z", 837.0),
("1973-03-01T00:00:00Z", 9000.0),
],
["timestamp", "value"],
).withColumn("group", lit("series1"))
# Run the Anomaly Detector service to look for irregular data
anomaly_detector = (
SimpleDetectAnomalies()
.setSubscriptionKey(anomaly_key)
.setLocation(anomaly_loc)
.setTimestampCol("timestamp")
.setValueCol("value")
.setOutputCol("anomalies")
.setGroupbyCol("group")
.setGranularity("monthly")
)
# Show the full results of the analysis with the anomalies marked as "True"
display(
anomaly_detector.transform(df).select("timestamp", "value", "anomalies.isAnomaly")
)
Arbitrary web APIs
With HTTP on Spark, any web service can be used in your big data pipeline. In this example, we use the World Bank API to get information about various countries/regions around the world.
# Use any requests from the python requests library
def world_bank_request(country):
return Request(
"GET", "http://api.worldbank.org/v2/country/{}?format=json".format(country)
)
# Create a dataframe that specifies which countries/regions we want data on
df = spark.createDataFrame([("br",), ("usa",)], ["country"]).withColumn(
"request", http_udf(world_bank_request)(col("country"))
)
# Much faster for big data because of the concurrency :)
client = (
HTTPTransformer().setConcurrency(3).setInputCol("request").setOutputCol("response")
)
# Get the body of the response
def get_response_body(resp):
return resp.entity.content.decode()
# Show the details of the country data returned
display(
client.transform(df).select(
"country", udf(get_response_body)(col("response")).alias("response")
)
)
Azure Cognitive Search sample
In this example, we show how you can enrich data using Cognitive Skills and write to an Azure Search Index using SynapseML.
search_service = "mmlspark-azure-search"
search_index = "test-33467690"
df = spark.createDataFrame(
[
(
"upload",
"0",
"https://mmlspark.blob.core.windows.net/datasets/DSIR/test1.jpg",
),
(
"upload",
"1",
"https://mmlspark.blob.core.windows.net/datasets/DSIR/test2.jpg",
),
],
["searchAction", "id", "url"],
)
tdf = (
AnalyzeImage()
.setSubscriptionKey(service_key)
.setLocation(service_loc)
.setImageUrlCol("url")
.setOutputCol("analyzed")
.setErrorCol("errors")
.setVisualFeatures(
["Categories", "Tags", "Description", "Faces", "ImageType", "Color", "Adult"]
)
.transform(df)
.select("*", "analyzed.*")
.drop("errors", "analyzed")
)
tdf.writeToAzureSearch(
subscriptionKey=search_key,
actionCol="searchAction",
serviceName=search_service,
indexName=search_index,
keyCol="id",
)
Other Tutorials
The following tutorials provide complete examples of using Azure AI services in Synapse Analytics.
- Sentiment analysis with Azure AI services - Using an example data set of customer comments, you build a Spark table with a column that indicates the sentiment of the comments in each row.
- Anomaly detection with Azure AI services - Using an example data set of time series data, you build a Spark table with a column that indicates whether the data in each row is an anomaly.
- Build machine learning applications using Microsoft Machine Learning for Apache Spark - This tutorial demonstrates how to use SynapseML to access several models from Azure AI services.
- Document Intelligence with Azure AI services - Demonstrates how to use Document Intelligence to analyze your forms and documents and extract text and data on Azure Synapse Analytics.
- Text Analytics with Azure AI services - Shows how to use Text Analytics to analyze unstructured text on Azure Synapse Analytics.
- Translator with Azure AI services - Shows how to use Translator to build intelligent, multi-language solutions on Azure Synapse Analytics.
- Computer Vision with Azure AI services - Demonstrates how to use Computer Vision to analyze images on Azure Synapse Analytics.
Available Azure AI services APIs
Bing Image Search
API Type | SynapseML APIs | Azure AI services APIs (Versions) | DEP VNet Support |
---|---|---|---|
Bing Image Search | BingImageSearch | Images - Visual Search V7.0 | Not Supported |
Anomaly Detector
API Type | SynapseML APIs | Azure AI services APIs (Versions) | DEP VNet Support |
---|---|---|---|
Detect Last Anomaly | DetectLastAnomaly | Detect Last Point V1.0 | Supported |
Detect Anomalies | DetectAnomalies | Detect Entire Series V1.0 | Supported |
Simple DetectAnomalies | SimpleDetectAnomalies | Detect Entire Series V1.0 | Supported |
Computer vision
API Type | SynapseML APIs | Azure AI services APIs (Versions) | DEP VNet Support |
---|---|---|---|
OCR | OCR | Recognize Printed Text V2.0 | Supported |
Recognize Text | RecognizeText | Recognize Text V2.0 | Supported |
Read Image | ReadImage | Read V3.1 | Supported |
Generate Thumbnails | GenerateThumbnails | Generate Thumbnail V2.0 | Supported |
Analyze Image | AnalyzeImage | Analyze Image V2.0 | Supported |
Recognize Domain Specific Content | RecognizeDomainSpecificContent | Analyze Image By Domain V2.0 | Supported |
Tag Image | TagImage | Tag Image V2.0 | Supported |
Describe Image | DescribeImage | Describe Image V2.0 | Supported |
Translator
API Type | SynapseML APIs | Azure AI services APIs (Versions) | DEP VNet Support |
---|---|---|---|
Translate Text | Translate | Translate V3.0 | Not Supported |
Transliterate Text | Transliterate | Transliterate V3.0 | Not Supported |
Detect Language | Detect | Detect V3.0 | Not Supported |
Break Sentence | BreakSentence | Break Sentence V3.0 | Not Supported |
Dictionary lookup (alternate translations) | DictionaryLookup | Dictionary Lookup V3.0 | Not Supported |
Document Translation | DocumentTranslator | Document Translation V1.0 | Not Supported |
Face
API Type | SynapseML APIs | Azure AI services APIs (Versions) | DEP VNet Support |
---|---|---|---|
Detect Face | DetectFace | Detect With Url V1.0 | Supported |
Find Similar Face | FindSimilarFace | Find Similar V1.0 | Supported |
Group Faces | GroupFaces | Group V1.0 | Supported |
Identify Faces | IdentifyFaces | Identify V1.0 | Supported |
Verify Faces | VerifyFaces | Verify Face To Face V1.0 | Supported |
Document Intelligence
API Type | SynapseML APIs | Azure AI services APIs (Versions) | DEP VNet Support |
---|---|---|---|
Analyze Layout | AnalyzeLayout | Analyze Layout Async V2.1 | Supported |
Analyze Receipts | AnalyzeReceipts | Analyze Receipt Async V2.1 | Supported |
Analyze Business Cards | AnalyzeBusinessCards | Analyze Business Card Async V2.1 | Supported |
Analyze Invoices | AnalyzeInvoices | Analyze Invoice Async V2.1 | Supported |
Analyze ID Documents | AnalyzeIDDocuments | identification (ID) document model V2.1 | Supported |
List Custom Models | ListCustomModels | List Custom Models V2.1 | Supported |
Get Custom Model | GetCustomModel | Get Custom Models V2.1 | Supported |
Analyze Custom Model | AnalyzeCustomModel | Analyze With Custom Model V2.1 | Supported |
Speech-to-text
API Type | SynapseML APIs | Azure AI services APIs (Versions) | DEP VNet Support |
---|---|---|---|
Speech To Text | SpeechToText | SpeechToText V1.0 | Not Supported |
Speech To Text SDK | SpeechToTextSDK | Using Speech SDK Version 1.14.0 | Not Supported |
Text Analytics
API Type | SynapseML APIs | Azure AI services APIs (Versions) | DEP VNet Support |
---|---|---|---|
Text Sentiment V2 | TextSentimentV2 | Sentiment V2.0 | Supported |
Language Detector V2 | LanguageDetectorV2 | Languages V2.0 | Supported |
Entity Detector V2 | EntityDetectorV2 | Entities Linking V2.0 | Supported |
NER V2 | NERV2 | Entities Recognition General V2.0 | Supported |
Key Phrase Extractor V2 | KeyPhraseExtractorV2 | Key Phrases V2.0 | Supported |
Text Sentiment | TextSentiment | Sentiment V3.1 | Supported |
Key Phrase Extractor | KeyPhraseExtractor | Key Phrases V3.1 | Supported |
PII | PII | Entities Recognition Pii V3.1 | Supported |
NER | NER | Entities Recognition General V3.1 | Supported |
Language Detector | LanguageDetector | Languages V3.1 | Supported |
Entity Detector | EntityDetector | Entities Linking V3.1 | Supported |