This post is by Roope Astala, Senior Program Manager in Microsoft’s Information Management and Machine Learning team.
Azure ML currently offers almost 100 modules to solve a wide spectrum of data science problems that our customers may encounter. Nevertheless, what if you need more, or maybe something a bit different from what we have to offer?
Custom R Modules
Custom R Modules give you a way to extend the built-in module set with your own. You can share these modules with friends or co-workers by putting them on GitHub.
Custom R modules are first-class citizens – they can be used in experiments and operationalized in web services just like built-in modules. You can use such modules for things such as:
Handling of domain-specific data formats.
Flexible data transformations.
Customized feature construction and extraction.
Within your R script, you can use hundreds of R packages preinstalled in Azure ML. You can even bundle your own packages with the module.
Example
As an example, let’s create a module that takes some JSON-formatted data and parses it into an Azure ML dataset. The module consists of 3 parts:
An R code file that defines what the module does.
Optionally, any accompanying files – e.g. configuration files or R code packages.
An XML file that defines the inputs, outputs, and parameters of the module. In a sense, the XML is the skeleton of the module, and the R code its muscle.
The module has one input, a dataset that consists of a JSON-formatted string, and one output, the contents of the JSON objects as a flattened dataset. It also has one parameter: a string that specifies the null replacement value. The corresponding R script is:
parse_json.R:
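parse_json <- function(data_in, nullvalue = "NA") {
  library(RJSONIO)  # provides fromJSON
  library(plyr)     # provides ldply
  # Parse the JSON string held in the first cell of the input dataset,
  # replace JSON nulls with `nullvalue`, and flatten the result into a data frame.
  data_out <- ldply(fromJSON(as.character(data_in[1, 1]),
                             nullValue = nullvalue, simplify = TRUE))
  return(data_out)
}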
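Before packaging, the entry point can be exercised locally. The following is a minimal sketch, assuming RJSONIO and plyr are installed on your machine; the sample JSON string and the names `sample_in`, `have_pkgs` and `parsed` are made up for illustration and are not part of the original module.

```r
# Local smoke test for the parse_json entry point (illustrative only).
# Assumption: RJSONIO and plyr are installed locally; the sample data is made up.
parse_json <- function(data_in, nullvalue = "NA") {
  library(RJSONIO)
  library(plyr)
  ldply(fromJSON(as.character(data_in[1, 1]),
                 nullValue = nullvalue, simplify = TRUE))
}

# A one-cell dataset holding a JSON array of two objects, mimicking the module input.
sample_in <- data.frame(
  json = '[{"name": "Ann", "city": "Paris"}, {"name": "Bob", "city": "Oslo"}]',
  stringsAsFactors = FALSE
)

# Only run the parse if both packages are available on this machine.
have_pkgs <- requireNamespace("RJSONIO", quietly = TRUE) &&
  requireNamespace("plyr", quietly = TRUE)

parsed <- NULL
if (have_pkgs) {
  parsed <- parse_json(sample_in)
  print(parsed)  # inspect the flattened dataset
}
```

Running the function on a small hand-written dataset like this is a cheap way to catch problems before uploading the module.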
The XML description defines the name of the module, which R function to call to run the module, as well as input and output datasets, and input parameters.
parse_json.xml:
<Module name="Parse JSON Strings">
  <Owner>AzureML User</Owner>
  <Description>This is my module description. </Description>
  <Language name="R" sourceFile="parse_json.R" entryPoint="parse_json"/>
  <Ports>
    <Output id="data_out" name="Parsed dataset" type="DataTable">
      <Description>Combined Data</Description>
    </Output>
    <Input id="data_in" name="JSON formatted dataset" type="DataTable">
      <Description>Input dataset</Description>
    </Input>
  </Ports>
  <Arguments>
    <Arg id="nullvalue" name="Null replacement value" type="string" isOptional="true">
      <Description>Value used to replace JSON null value</Description>
    </Arg>
  </Arguments>
</Module>
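In this example, the `id` values of the input port and of the argument in the XML (`data_in`, `nullvalue`) coincide with the argument names of the R function. The sketch below checks that correspondence using base R only. It assumes this matching is what ties the XML to the function, which the listings suggest but the post does not state outright. The regular expression and the variable names are illustrative, and the XML is abbreviated to the two lines that carry the ids.

```r
# Sketch: check that the ids in the XML line up with the R function's arguments.
# Assumption: ids and argument names are meant to match, as they do in this post.

# The <Input> and <Arg> lines from parse_json.xml.
module_xml <- c(
  '<Input id="data_in" name="JSON formatted dataset" type="DataTable">',
  '<Arg id="nullvalue" name="Null replacement value" type="string" isOptional="true">'
)

# The entry point's signature from parse_json.R (body omitted here).
parse_json <- function(data_in, nullvalue = "NA") NULL

# Pull the id attribute out of each line with a base R regular expression.
id_pattern <- '^<(Input|Arg) id="([^"]+)".*$'
xml_ids <- sub(id_pattern, "\\2", module_xml)

# Compare against the formal argument names of the R function.
r_args <- names(formals(parse_json))
missing_in_r <- setdiff(xml_ids, r_args)

if (length(missing_in_r) > 0) {
  message("XML ids without a matching R argument: ",
          paste(missing_in_r, collapse = ", "))
}
```

A check like this is easy to run before every upload, since a renamed argument in the R file is otherwise only noticed when the module fails inside an experiment.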
To add the module to Azure ML, you simply put the different files into a zip package and upload the package by selecting +NEW > Module in your Azure ML Studio workspace. Once uploaded, your module appears in the “Custom” category in the module palette, alongside all the built-in modules.
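The zip package can be built with any tool. As one possibility, here is a rough sketch that does it from R with `utils::zip`. It assumes a command-line `zip` program is available on your PATH, and it writes stand-in files to a temporary directory so that it can run on its own; in practice you would point `module_files` at your real `parse_json.R` and `parse_json.xml`. Storing the files without directory paths (`-j`) is a choice made here, not a requirement stated in this post.

```r
# Sketch: bundle the module files into a zip package from R (illustrative only).
# Assumptions: a command-line `zip` program is on the PATH (utils::zip calls it),
# and the stand-in files below take the place of your real module files.
module_dir <- file.path(tempdir(), "parse_json_module")
dir.create(module_dir, showWarnings = FALSE)

# Stand-ins for the real files; in practice these already exist on disk.
module_files <- file.path(module_dir, c("parse_json.R", "parse_json.xml"))
writeLines("# contents of parse_json.R go here", module_files[1])
writeLines("<!-- contents of parse_json.xml go here -->", module_files[2])

zip_path <- file.path(tempdir(), "parse_json_module.zip")
unlink(zip_path)  # start clean so the archive holds only these files

# "-j" stores the files without their directory paths.
status <- utils::zip(zipfile = zip_path, files = module_files, flags = "-j")

# A zero status means the zip program ran successfully; list what was packed.
if (status == 0) {
  print(utils::unzip(zip_path, list = TRUE)$Name)
}
```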
You can now use the new R module to build experiments, and deploy it to production by publishing your experiment as a web service.
Summary
Custom R Modules are a great way for you to extend Azure ML’s built-in modules. Such modules can be used in experiments, operationalized in web services, and shared with your colleagues and the community. Although the example provided in this blog post is a simple one, custom R modules can be far more complex and can take multiple inputs, outputs, and parameters of different types. They also have access to the same user interface elements as built-in modules, e.g. column selectors and drop-down menus for parameters. In the future, we plan to add support for input and output types beyond datasets, e.g. learners and transformations.
Do give it a try and share your feedback with us below.
Roope