How to retrieve R data visualization from Azure Machine Learning
As you may know, Azure Machine Learning can execute R scripts. You can interactively see the output console. But what about retrieving the result as part of a production call to the API generated by Azure ML?
Let’s test with a word cloud example in R. Mollie Taylor has posted one here (https://gist.github.com/mollietaylor/3671518) that we can reuse in Azure Machine Learning:
The details on how to create an Azure ML workspace, insert a dataset and an R script can be found here:
For R, just use the Execute R Script module:
The input of the Web API is set to the input dataset of the R Script and the output is set to the R Device port. As a reminder, here is how the inputs and outputs are positioned in an R Script module:
The details are available in the help documentation.
In our case, the interesting ports to publish are the input dataset port and the R Device output port.
After running the experiment, we can see the result in Azure ML Studio:
So, how can we retrieve the pictures from an API that is published that way?
Here is a sample Python script that shows how to do it. It is a modified version of the sample given on the API help page for Batch Execution. The idea is to get the base64-encoded pictures from the output file, decode them, and write them to local disk.
# -*- coding: utf-8 -*-
# How this works:
#
# 1. Assume the input is present in a local file
# 2. Upload the file to an Azure blob - you'd need an Azure storage account
# 3. Call BES to process the data in the blob.
# 4. The results get written to another Azure blob.
# 5. Download the output blob to a local file
#
# Note: You may need to download/install the Azure SDK for Python.
# See: https://azure.microsoft.com/en-us/documentation/articles/python-how-to-install/
import urllib2
import json
import time
from azure.storage import *
import sys
import base64
storage_account_name = 'a****obfuscated***4'
storage_account_key = '/aV****obfuscated***vXA76w=='
storage_container_name = 'benjguin'
input_file = ur"C:\be****obfuscated***os\WordCloud\conventions.csv"
output_file = ur'C:\be****obfuscated***os\WordCloud\myresults.csv'
input_blob_name = 'conventions.csv'
api_key = r'Cczx****obfuscated***WemQ=='
url = 'https://ussouthcentral.services.azureml.net/workspaces/a7c****obfuscated***756/services/d328e03****obfuscated***5c2/jobs'
uploadfile=True
executeBES=True
blob_service = BlobService(account_name=storage_account_name, account_key=storage_account_key)
if uploadfile:
    print("Uploading the input to blob storage...")
    data_to_upload = open(input_file, 'r').read()
    blob_service.put_blob(storage_container_name, input_blob_name, data_to_upload, x_ms_blob_type='BlockBlob')
input_blob_path = '/' + storage_container_name + '/' + input_blob_name
debug_blob = blob_service.get_blob(storage_container_name, input_blob_name)
if executeBES:
    print("Submitting the BES job...")
    connection_string = "DefaultEndpointsProtocol=https;AccountName=" + storage_account_name + ";AccountKey=" + storage_account_key
    payload = {
        "Input": {
            "ConnectionString": connection_string,
            "RelativeLocation": input_blob_path
        }
    }
    body = str.encode(json.dumps(payload))
    headers = { 'Content-Type':'application/json', 'Authorization':('Bearer ' + api_key)}
    req = urllib2.Request(url, body, headers)
    response = urllib2.urlopen(req)
    result = response.read()
    job_id = result[1:-1] # remove the enclosing double-quotes
    url2 = url + '/' + job_id
    while True:
        time.sleep(1) # wait a second
        authHeader = { 'Authorization':('Bearer ' + api_key)}
        request = urllib2.Request(url2, headers=authHeader)
        response = urllib2.urlopen(request)
        result = json.loads(response.read())
        status = result['StatusCode']
        if (status == 0):
            print("Not started...")
        elif (status == 1):
            print("Running...")
        elif (status == 2):
            print("Failed...")
            break
        elif (status == 3):
            print("Cancelled...")
            break
        elif (status == 4):
            print("Finished!")
            result_blob_location = result['Result']
            sas_token = result_blob_location['SasBlobToken']
            base_url = result_blob_location['BaseLocation']
            relative_url = result_blob_location['RelativeLocation']
            url3 = base_url + relative_url + sas_token
            response = urllib2.urlopen(url3)
            with open(output_file, 'w') as f:
                f.write(response.read())
            break
outputdata=open(output_file)
outputtxt=outputdata.read()
outputdata.close()
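# The JSON document sits inside double quotes in the output file:
# skip the opening quote and the last three characters so that only the JSON remains.
s=outputtxt.index('\"{')
e=len(outputtxt)
o1=outputtxt[s+1:e-3]
jsonresult = json.loads(o1)
# Each entry of 'Graphics Device' is one base64-encoded PNG.
i=1
for gd in jsonresult['Graphics Device']:
    fname = output_file + "." + str(i) + ".png"
    print('writing png #' + str(i) + ' to ' + fname)
    f = open(fname, 'wb')
    f.write(base64.b64decode(gd))
    f.close()
    i += 1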
print("Done!")
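The script above targets Python 2, where urllib2 is available. As a rough sketch only, the job submission could look like this on Python 3 with urllib.request. The helper name build_bes_request and every value passed to it are placeholders introduced for illustration; only the payload shape and the headers come from the script above. The request is built but not sent, so the sketch runs without a workspace.

```python
import json
import urllib.request


def build_bes_request(jobs_url, api_key, connection_string, relative_location):
    """Build (but do not send) the POST request that submits a BES job.

    Hypothetical helper: it mirrors the payload and headers of the script above.
    """
    payload = {
        "Input": {
            "ConnectionString": connection_string,
            "RelativeLocation": relative_location,
        }
    }
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    }
    # Passing data makes urllib issue a POST instead of a GET.
    return urllib.request.Request(jobs_url, data=body, headers=headers)


# Placeholder values, not real endpoints or credentials.
request = build_bes_request(
    "https://example.invalid/workspaces/WORKSPACE_ID/services/SERVICE_ID/jobs",
    "API_KEY",
    "DefaultEndpointsProtocol=https;AccountName=ACCOUNT;AccountKey=KEY",
    "/container/conventions.csv",
)
print(request.get_method())
print(request.data.decode("utf-8"))
```

Sending it would then be a matter of calling urllib.request.urlopen(request) and stripping the enclosing double quotes from the returned job id, as the original script does.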
Here is a sample execution output:
Uploading the input to blob storage...
Submitting the BES job...
Running...
Running...
Running...
Running...
Running...
Running...
Running...
Finished!
writing png #1 to C:\be***obfuscated***os\WordCloud\myresults.csv.1.png
writing png #2 to C:\be***obfuscated***os\WordCloud\myresults.csv.2.png
Done!
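The "Running..." lines above come from the polling loop. A minimal sketch of that loop, separated from the HTTP calls so it can be tried without a workspace, might look like the following. The names wait_for_job, result_blob_url and the max_polls limit are assumptions added for illustration; the status codes and the 'Result' fields are the ones the script handles. The responses are simulated.

```python
import time

# Status codes as handled by the script above.
STATUS_NAMES = {0: "Not started", 1: "Running", 2: "Failed", 3: "Cancelled", 4: "Finished"}
TERMINAL_STATUSES = {2, 3, 4}


def wait_for_job(fetch_status, interval=1.0, max_polls=600):
    """Poll fetch_status() until the job reaches a terminal state.

    fetch_status is any callable returning the decoded JSON status document
    (a dict with a 'StatusCode' key). Returns the final document.
    """
    for _ in range(max_polls):
        result = fetch_status()
        status = result["StatusCode"]
        print(STATUS_NAMES.get(status, "Unknown status %r" % status))
        if status in TERMINAL_STATUSES:
            return result
        time.sleep(interval)
    raise RuntimeError("job did not finish after %d polls" % max_polls)


def result_blob_url(result):
    """Assemble the download URL from a finished job's 'Result' entry."""
    location = result["Result"]
    return location["BaseLocation"] + location["RelativeLocation"] + location["SasBlobToken"]


# Simulated responses standing in for the real HTTP calls.
fake_responses = iter([
    {"StatusCode": 1},
    {"StatusCode": 1},
    {"StatusCode": 4, "Result": {
        "BaseLocation": "https://example.invalid/",
        "RelativeLocation": "container/output.csv",
        "SasBlobToken": "?sig=PLACEHOLDER",
    }},
])
final = wait_for_job(lambda: next(fake_responses), interval=0)
print(result_blob_url(final))
```

Unlike the original loop, this version gives up after a fixed number of polls instead of looping forever on an unexpected status code.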
The output sent back by Azure ML looks like this:
R Output JSON
"{"Standard Output":"RWorker pushed \"port1\" to R workspace.\r\nBeginning R Execute Script\n\n[1] 56000\r\nLoading objects:\r\n port1\r\n[1] \"Loading variable port1...\"\r\npng \r\n 2 \r\nnull device \r\n 1 \r\n","Standard Error":"R reported no errors.","visualizationType":"rOutput","Graphics Device":["iVBORw0K***(...)***RvX/wFzB5s8eym6ZgAAAABJRU5ErkJggg==","iVBORw0KGgo***(...)***dVorBuiQAAAABJRU5ErkJggg=="]}"
You can see the pictures in the "Graphics Device" array. Well, Python can, once it decodes the base64 strings.
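How Python "sees" the pictures can be sketched in isolation. The snippet below is a hedged variant of the decoding step: instead of fixed offsets, it keeps everything between the first opening brace and the last closing brace, which assumes the output file contains a single JSON document. The function name extract_graphics and the sample data are made up for illustration; the sample only imitates the layout shown above and carries a fake image body.

```python
import base64
import json

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def extract_graphics(output_text):
    """Return the decoded images found in the R output JSON embedded in output_text."""
    start = output_text.index("{")
    end = output_text.rindex("}") + 1
    document = json.loads(output_text[start:end])
    return [base64.b64decode(item) for item in document["Graphics Device"]]


# Stand-in for the downloaded output file: a header line, then the quoted JSON.
fake_image = PNG_SIGNATURE + b"not a real image body"
sample_json = json.dumps({
    "Standard Error": "R reported no errors.",
    "visualizationType": "rOutput",
    "Graphics Device": [base64.b64encode(fake_image).decode("ascii")],
})
sample_output = 'R Output JSON\r\n"' + sample_json + '"\r\n'

images = extract_graphics(sample_output)
for number, data in enumerate(images, start=1):
    looks_like_png = data.startswith(PNG_SIGNATURE)
    print("image #%d: %d bytes, PNG signature found: %s" % (number, len(data), looks_like_png))
```

Checking the PNG signature before writing each file is a cheap way to notice a truncated or mis-sliced payload.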
The resulting files are myresults.csv.1.png and myresults.csv.2.png.
R has tons of great data visualizations. Have a look at these blogs, for instance:
Benjamin (@benjguin)