Quick Examples
The following snippets are ready to run and will help you get started with Cognitive Services on Spark. The samples below are in Scala.
The samples use these Cognitive Services:
- Language service - get the sentiment (or mood) of a set of sentences.
- Computer Vision - get the tags (one-word descriptions) associated with a set of images.
- Speech-to-text - transcribe audio files to extract text-based transcripts.
- Anomaly Detector - detect anomalies within time series data.
Prerequisites
- Follow the steps in Getting started to set up your Azure Databricks and Cognitive Services environment. That tutorial covers how to install MMLSpark and how to create your Spark cluster in Databricks.
- After you create a new notebook in Azure Databricks, copy the Shared code below and paste it into a new cell in your notebook.
- Choose a service sample below, then copy and paste it into a second new cell in your notebook.
- Replace any of the service subscription key placeholders with your own key.
- Choose the run button (triangle icon) in the upper right corner of the cell, then select Run Cell.
- View results in a table below the cell.
Shared code
To get started, add this code to your project:
import com.microsoft.ml.spark.cognitive._
import spark.implicits._
val serviceKey = "ADD-YOUR-SUBSCRIPTION-KEY"
val location = "eastus"
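Hard-coding a subscription key in a notebook makes it easy to leak. As a minimal sketch, the key and region could be looked up from environment variables instead, with the values from the shared code as fallbacks. The helper `ServiceSettings.setting` and the variable names `COGNITIVE_SERVICE_KEY` and `COGNITIVE_SERVICE_LOCATION` are hypothetical; neither MMLSpark nor the services define them.

```scala
object ServiceSettings {
  // Returns the named setting from `env`, or `default` when it is missing or blank.
  // `env` defaults to the process environment; pass a Map to test without touching it.
  def setting(name: String, default: String, env: Map[String, String] = sys.env): String =
    env.get(name).map(_.trim).filter(_.nonEmpty).getOrElse(default)
}
```

With this in place, the shared code could read `val serviceKey = ServiceSettings.setting("COGNITIVE_SERVICE_KEY", "ADD-YOUR-SUBSCRIPTION-KEY")` and `val location = ServiceSettings.setting("COGNITIVE_SERVICE_LOCATION", "eastus")`.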
Language service
The Language service provides several algorithms for extracting intelligent insights from text. For example, we can find the sentiment of given input text. The service returns a score between 0.0 and 1.0, where low scores indicate negative sentiment and high scores indicate positive sentiment. The sample below uses three simple sentences and returns the sentiment score for each.
import org.apache.spark.sql.functions.col
val df = Seq(
("I am so happy today, its sunny!", "en-US"),
("I am frustrated by this rush hour traffic", "en-US"),
("The cognitive services on spark aint bad", "en-US")
).toDF("text", "language")
val sentiment = new TextSentiment()
.setTextCol("text")
.setLocation(location)
.setSubscriptionKey(serviceKey)
.setOutputCol("sentiment")
.setErrorCol("error")
.setLanguageCol("language")
display(sentiment.transform(df).select(col("text"), col("sentiment")(0).getItem("score").alias("sentiment")))
Expected result
text | sentiment |
---|---|
I am so happy today, its sunny! | 0.9789592027664185 |
I am frustrated by this rush hour traffic | 0.023795604705810547 |
The cognitive services on spark aint bad | 0.8888956308364868 |
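The scores in the table above are easier to act on once they are bucketed into labels. The sketch below shows one way to do that in plain Scala. The cut-offs 0.4 and 0.6 and the name `SentimentLabels` are assumptions made for illustration; the service itself only returns the numeric score.

```scala
object SentimentLabels {
  // Hypothetical cut-offs chosen for illustration; tune them for your own data.
  val NegativeBelow: Double = 0.4
  val PositiveFrom: Double = 0.6

  // Maps a sentiment score in [0.0, 1.0] to "negative", "neutral", or "positive".
  def label(score: Double): String = {
    require(score >= 0.0 && score <= 1.0, s"score must be in [0.0, 1.0], got $score")
    if (score < NegativeBelow) "negative"
    else if (score >= PositiveFrom) "positive"
    else "neutral"
  }
}
```

Under these assumed cut-offs, the first and third sample sentences would be labeled positive and the second negative.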
Computer Vision
Computer Vision analyzes images to identify structure such as faces, objects, and natural-language descriptions. In this sample, we tag a list of images. Tags are one-word descriptions of things in the image like recognizable objects, people, scenery, and actions.
import org.apache.spark.sql.functions.col
// Create a dataframe with the image URLs
val df = Seq(
("https://raw.githubusercontent.com/Azure-Samples/cognitive-services-sample-data-files/master/ComputerVision/Images/objects.jpg"),
("https://raw.githubusercontent.com/Azure-Samples/cognitive-services-sample-data-files/master/ComputerVision/Images/dog.jpg"),
("https://raw.githubusercontent.com/Azure-Samples/cognitive-services-sample-data-files/master/ComputerVision/Images/house.jpg")
).toDF("image")
// Run the Computer Vision service. Analyze Image extracts information about the images.
val analysis = new AnalyzeImage()
.setLocation(location)
.setSubscriptionKey(serviceKey)
.setVisualFeatures(Seq("Categories","Color","Description","Faces","Objects","Tags"))
.setOutputCol("results")
.setImageUrlCol("image")
.setErrorCol("error")
// Show the results of what you wanted to pull out of the images.
display(analysis.transform(df).select(col("image"), col("results").getItem("tags").getItem("name").alias("results")))
// Uncomment for full results with all visual feature requests
//display(analysis.transform(df).select(col("image"), col("results")))
Expected result
image | tags |
---|---|
https://raw.githubusercontent.com/Azure-Samples/cognitive-services-sample-data-files/master/ComputerVision/Images/objects.jpg | ['skating' 'person' 'man' 'outdoor' 'riding' 'sport' 'skateboard' 'young' 'board' 'shirt' 'air' 'black' 'park' 'boy' 'side' 'jumping' 'trick' 'ramp' 'doing' 'flying'] |
https://raw.githubusercontent.com/Azure-Samples/cognitive-services-sample-data-files/master/ComputerVision/Images/dog.jpg | ['dog' 'outdoor' 'fence' 'wooden' 'small' 'brown' 'building' 'sitting' 'front' 'bench' 'standing' 'table' 'walking' 'board' 'beach' 'white' 'holding' 'bridge' 'track'] |
https://raw.githubusercontent.com/Azure-Samples/cognitive-services-sample-data-files/master/ComputerVision/Images/house.jpg | ['outdoor' 'grass' 'house' 'building' 'old' 'home' 'front' 'small' 'church' 'stone' 'large' 'grazing' 'yard' 'green' 'sitting' 'leading' 'sheep' 'brick' 'bench' 'street' 'white' 'country' 'clock' 'sign' 'parked' 'field' 'standing' 'garden' 'water' 'red' 'horse' 'man' 'tall' 'fire' 'group'] |
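Once each image has a list of tags, a common next step is to see which tags recur across images. The following is a minimal sketch in plain Scala that works on tag lists already collected to the driver. The name `TagCounts` is hypothetical and not part of MMLSpark.

```scala
object TagCounts {
  // Counts how many images carry each tag. A tag repeated within one image counts once.
  def countTags(tagsPerImage: Seq[Seq[String]]): Map[String, Int] =
    tagsPerImage
      .flatMap(_.distinct)
      .groupBy(identity)
      .map { case (tag, hits) => tag -> hits.size }
}
```

For the three sample images, a tag such as `outdoor` that appears in every row of the table would map to 3.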
Speech-to-text
The Speech-to-text service converts streams or files of spoken audio to text. In this sample, we transcribe two audio files. The first file is easy to understand, and the second is more challenging.
import org.apache.spark.sql.functions.col
// Create a dataframe with audio URLs, tied to the column called "url"
val df = Seq(("https://mmlspark.blob.core.windows.net/datasets/Speech/audio2.wav"),
("https://mmlspark.blob.core.windows.net/datasets/Speech/audio3.mp3")).toDF("url")
// Run the Speech-to-text service to transcribe the audio into text
val speechToText = new SpeechToTextSDK()
.setSubscriptionKey(serviceKey)
.setLocation("eastus")
.setOutputCol("text")
.setAudioDataCol("url")
.setLanguage("en-US")
.setProfanity("Masked")
// Show the results of the transcription
display(speechToText.transform(df).select(col("url"), col("text").getItem("DisplayText")))
Expected result
url | DisplayText |
---|---|
https://mmlspark.blob.core.windows.net/datasets/Speech/audio2.wav | Custom speech provides tools that allow you to visually inspect the recognition quality of a model by comparing audio data with the corresponding recognition result from the custom speech portal. You can playback uploaded audio and determine if the provided recognition result is correct. This tool allows you to quickly inspect quality of Microsoft's baseline speech to text model or a trained custom model without having to transcribe any audio data. |
https://mmlspark.blob.core.windows.net/datasets/Speech/audio3.mp3 | Add a gentleman Sir thinking visual check. |
https://mmlspark.blob.core.windows.net/datasets/Speech/audio3.mp3 | I hear me. |
https://mmlspark.blob.core.windows.net/datasets/Speech/audio3.mp3 | I like the reassurance for radio that I can hear it as well. |
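The expected result lists audio3.mp3 on three rows, which suggests that a file can yield one row per recognized utterance (an inference from the table, not something the text above states). As a sketch under that assumption, the rows can be stitched back into one transcript per file once they are collected as `(url, text)` pairs. The name `Transcripts.joinByUrl` is hypothetical.

```scala
object Transcripts {
  // Joins the text of all rows that share a URL into one transcript per URL.
  // URLs keep their first-seen order, and utterances keep their row order.
  def joinByUrl(rows: Seq[(String, String)]): Seq[(String, String)] = {
    val urlsInOrder = rows.map(_._1).distinct
    val grouped = rows.groupBy(_._1)
    urlsInOrder.map(url => url -> grouped(url).map(_._2).mkString(" "))
  }
}
```

This relies on the rows arriving in utterance order. If they are collected from a distributed DataFrame without an explicit ordering column, that order is not guaranteed.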
Anomaly Detector
Anomaly Detector is great for detecting irregularities in your time series data. In this sample, we use the service to find anomalies in the entire time series.
import org.apache.spark.sql.functions.{col, lit}
val anomalyKey = "ADD-YOUR-SUBSCRIPTION-KEY"
val df = Seq(
("1972-01-01T00:00:00Z", 826.0),
("1972-02-01T00:00:00Z", 799.0),
("1972-03-01T00:00:00Z", 890.0),
("1972-04-01T00:00:00Z", 900.0),
("1972-05-01T00:00:00Z", 766.0),
("1972-06-01T00:00:00Z", 805.0),
("1972-07-01T00:00:00Z", 821.0),
("1972-08-01T00:00:00Z", 20000.0),
("1972-09-01T00:00:00Z", 883.0),
("1972-10-01T00:00:00Z", 898.0),
("1972-11-01T00:00:00Z", 957.0),
("1972-12-01T00:00:00Z", 924.0),
("1973-01-01T00:00:00Z", 881.0),
("1973-02-01T00:00:00Z", 837.0),
("1973-03-01T00:00:00Z", 9000.0)
).toDF("timestamp", "value").withColumn("group", lit("series1"))
// Run the Anomaly Detector service to look for irregular data
val anomalyDetector = new SimpleDetectAnomalies()
.setSubscriptionKey(anomalyKey)
.setLocation("eastus")
.setTimestampCol("timestamp")
.setValueCol("value")
.setOutputCol("anomalies")
.setGroupbyCol("group")
.setGranularity("monthly")
// Show the full results of the analysis, with anomalous points marked as "True"
display(anomalyDetector.transform(df).select("timestamp", "value", "anomalies.isAnomaly"))
Expected result
timestamp | value | isAnomaly |
---|---|---|
1972-01-01T00:00:00Z | 826 | False |
1972-02-01T00:00:00Z | 799 | False |
1972-03-01T00:00:00Z | 890 | False |
1972-04-01T00:00:00Z | 900 | False |
1972-05-01T00:00:00Z | 766 | False |
1972-06-01T00:00:00Z | 805 | False |
1972-07-01T00:00:00Z | 821 | False |
1972-08-01T00:00:00Z | 20000 | True |
1972-09-01T00:00:00Z | 883 | False |
1972-10-01T00:00:00Z | 898 | False |
1972-11-01T00:00:00Z | 957 | False |
1972-12-01T00:00:00Z | 924 | False |
1973-01-01T00:00:00Z | 881 | False |
1973-02-01T00:00:00Z | 837 | False |
1973-03-01T00:00:00Z | 9000 | True |
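In practice, only the flagged rows usually matter. The sketch below keeps just those rows, using a small case class that mirrors the three columns of the table above. `AnomalyRows`, `Reading`, and `flagged` are hypothetical names for illustration, operating on results already collected to the driver.

```scala
object AnomalyRows {
  // Hypothetical local mirror of one result row: timestamp, value, and the service's isAnomaly flag.
  final case class Reading(timestamp: String, value: Double, isAnomaly: Boolean)

  // Keeps only the rows the service flagged, in their original order.
  def flagged(rows: Seq[Reading]): Seq[Reading] = rows.filter(_.isAnomaly)
}
```

Applied to the sample series, this would keep the 1972-08-01 and 1973-03-01 readings.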