Build Your Own R Modules in Azure ML
This post is by Roope Astala, Senior Program Manager in Microsoft’s Information Management and Machine Learning team.
Azure ML currently offers almost 100 modules to solve a wide spectrum of data science problems that our customers may encounter. Nevertheless, what if you need more, or maybe something a bit different from what we have to offer?
Custom R Modules
Custom R Modules give you a way to extend the built-in module set with your own. You can share these modules with friends or co-workers by putting them on GitHub.
Custom R modules are first-class citizens – they can be used in experiments and operationalized in web services just like built-in modules. You can use them for tasks such as:
Handling of domain-specific data formats.
Flexible data transformations.
Customized feature construction and extraction.
Within your R script, you can use hundreds of R packages preinstalled in Azure ML. You can even bundle your own packages with the module.
Example
As an example, let’s create a module that takes some JSON-formatted data and parses it into an Azure ML dataset. The module consists of 3 parts:
An R code file that defines what the module does.
Optionally, any accompanying files – e.g. configuration files or R code packages.
An XML file that defines which inputs, outputs, and parameters the module will have. In a sense, the XML is the skeleton of the module, and the R code its muscle.
The module has one input, a dataset containing a JSON-formatted string, and one output, the contents of the JSON objects as a flattened dataset. It also has one parameter: a string that specifies the null replacement value. The corresponding R script is:
parse_json.R:
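parse_json <- function(data_in, nullvalue = "NA") {
  library(RJSONIO)  # provides fromJSON
  library(plyr)     # provides ldply
  # Parse the JSON string in the first cell, then turn each parsed element into a row.
  json_text <- as.character(data_in[1, 1])
  data_out <- ldply(fromJSON(json_text, nullValue = nullvalue, simplify = TRUE))
  return(data_out)
}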
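Before packaging the module, it can help to call the function locally on a small input. The snippet below is only a sketch: it assumes the RJSONIO and plyr packages are installed on your machine, the sample JSON is made up, and the function is repeated so the snippet runs on its own.

```r
# Local smoke test; assumes the RJSONIO and plyr packages are installed.
library(RJSONIO)
library(plyr)

# Same logic as parse_json.R, repeated so this snippet is self-contained.
parse_json <- function(data_in, nullvalue = "NA") {
  json_text <- as.character(data_in[1, 1])
  data_out <- ldply(fromJSON(json_text, nullValue = nullvalue, simplify = TRUE))
  return(data_out)
}

# Made-up input: a one-cell dataset holding a JSON array of flat objects.
sample_in <- data.frame(
  json = '[{"x": 1, "y": 2}, {"x": 3, "y": 4}]',
  stringsAsFactors = FALSE
)

parsed <- parse_json(sample_in)
str(parsed)  # inspect the flattened result
```

Each object in the JSON array should come back as one row, which is the shape the module passes on as its output dataset.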
The XML description defines the name of the module, which R function to call to run the module, as well as input and output datasets, and input parameters.
parse_json.xml:
<Module name="Parse JSON Strings">
  <Owner>AzureML User</Owner>
  <Description>This is my module description. </Description>
  <Language name="R" sourceFile="parse_json.R" entryPoint="parse_json"/>
  <Ports>
    <Output id="data_out" name="Parsed dataset" type="DataTable">
      <Description>Combined Data</Description>
    </Output>
    <Input id="data_in" name="JSON formatted dataset" type="DataTable">
      <Description>Input dataset</Description>
    </Input>
  </Ports>
  <Arguments>
    <Arg id="nullvalue" name="Null replacement value" type="string" isOptional="true">
      <Description>Value used to replace JSON null value</Description>
    </Arg>
  </Arguments>
</Module>
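In this example, the ids of the Input and Arg elements (data_in, nullvalue) are the same as the argument names of parse_json. Assuming that correspondence is what ties the XML to the R function, a small local check can catch mismatches before you upload. The following is only a sketch in base R: extract_ids is a hypothetical helper, its regular expression handles just the simple attribute layout used here, and the XML is a trimmed copy of the description above.

```r
# Hypothetical consistency check (not part of Azure ML): compare the ids
# declared in the module XML with the arguments of the R entry point.
module_xml <- '
<Module name="Parse JSON Strings">
  <Language name="R" sourceFile="parse_json.R" entryPoint="parse_json"/>
  <Ports>
    <Output id="data_out" name="Parsed dataset" type="DataTable"/>
    <Input id="data_in" name="JSON formatted dataset" type="DataTable"/>
  </Ports>
  <Arguments>
    <Arg id="nullvalue" name="Null replacement value" type="string" isOptional="true"/>
  </Arguments>
</Module>'

# Stand-in with the same signature as the function in parse_json.R.
parse_json <- function(data_in, nullvalue = "NA") NULL

# Pull the id attribute out of every element with the given tag name.
extract_ids <- function(xml, tag) {
  pattern <- sprintf('<%s\\s+id="([^"]+)"', tag)
  hits <- regmatches(xml, gregexpr(pattern, xml, perl = TRUE))[[1]]
  sub(pattern, "\\1", hits, perl = TRUE)
}

xml_ids <- c(extract_ids(module_xml, "Input"), extract_ids(module_xml, "Arg"))
fn_args <- names(formals(parse_json))

# Ids declared in the XML but absent from the function, and vice versa.
missing_in_r <- setdiff(xml_ids, fn_args)
missing_in_xml <- setdiff(fn_args, xml_ids)

if (length(missing_in_r) > 0 || length(missing_in_xml) > 0) {
  warning("XML ids and R arguments differ")
}
```

For real module files you would read the XML with readLines and source the R file instead of using the inline stand-ins.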
To add the module to Azure ML, you simply put the different files into a zip package and upload the package by selecting +NEW > Module in your Azure ML Studio workspace. Once uploaded, your module appears in the “Custom” category in the module palette, alongside all the built-in modules.
You can now use the new R module to build experiments, and deploy it to production by publishing your experiment as a web service.
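The zip step mentioned above can also be scripted from R. This is a rough sketch rather than an official workflow: it writes placeholder files to a temporary folder so that it runs on its own, the file name parse_json_module.zip is made up, and keeping the files at the top level of the archive is an assumption about the expected layout. Note that R's zip() relies on an external zip program being available.

```r
# Sketch: bundle the module files into one zip package for upload.
# Placeholder files stand in for your real parse_json.R and parse_json.xml.
module_dir <- file.path(tempdir(), "parse_json_module")
dir.create(module_dir, showWarnings = FALSE)
writeLines("parse_json <- function(data_in, nullvalue = \"NA\") data_in",
           file.path(module_dir, "parse_json.R"))
writeLines("<Module name=\"Parse JSON Strings\"></Module>",
           file.path(module_dir, "parse_json.xml"))

# Hypothetical archive name; any name ending in .zip would do.
zip_path <- file.path(tempdir(), "parse_json_module.zip")

# utils::zip() shells out to an external zip program, so check for it first.
if (nzchar(Sys.which("zip"))) {
  old_wd <- setwd(module_dir)  # zip from inside the folder to keep paths flat
  zip(zipfile = zip_path, files = c("parse_json.R", "parse_json.xml"))
  setwd(old_wd)
} else {
  message("No zip program found; create the archive manually instead.")
}
```

The resulting archive is the file you would then upload through +NEW > Module.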
Summary
Custom R Modules are a great way for you to extend Azure ML’s built-in modules. Such modules can be used in experiments, operationalized in web services, and shared with your colleagues and the community. Although the example in this blog post is a simple one, custom R modules can be far more complex: they can take multiple inputs, produce multiple outputs, and accept parameters of different types. They also have access to the same user interface elements as built-in modules, such as column selectors and drop-down menus for parameters. In the future, we plan to add support for input and output types beyond datasets, such as learners and transformations.
Do give it a try and share your feedback with us below.
Roope