Build Your Own R Modules in Azure ML

04/23/2015

This post is by Roope Astala, Senior Program Manager in Microsoft’s Information Management and Machine Learning team.

Azure ML currently offers almost 100 modules that cover a wide spectrum of data science problems our customers may encounter. But what if you need more, or something a bit different from what we offer?

Custom R Modules

Custom R Modules give you a way to extend the built-in module set with your own. You can share these modules with friends or co-workers by putting them on GitHub.

Custom R modules are first-class citizens: they can be used in experiments and operationalized in web services just like built-in modules. Typical uses include:

Handling of domain-specific data formats.
Flexible data transformations.
Customized feature construction and extraction.

Within your R script, you can use hundreds of R packages preinstalled in Azure ML. You can even bundle your own packages with the module.

Example

As an example, let’s create a module that takes some JSON-formatted data and parses it into an Azure ML dataset. The module consists of 3 parts:

An R code file that defines what the module does.
Optionally, any accompanying files – e.g. configuration files or R code packages.
An XML file that defines which inputs, outputs, and parameters the module will have. In a sense, the XML is the skeleton of the module, and the R code its muscle.

The module has one input, a dataset that consists of a JSON-formatted string, and one output, the contents of the JSON objects as a flattened dataset. It also has one parameter: a string that specifies the value used to replace nulls. The corresponding R script is:

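parse_json.R:

# Entry point called by Azure ML: data_in is the input dataset,
# nullvalue is the replacement for JSON nulls.
parse_json <- function(data_in, nullvalue = "NA") {
  library(RJSONIO)  # provides fromJSON
  library(plyr)     # provides ldply

  # Parse the JSON string held in the first cell, then stack the
  # parsed objects into a single data frame.
  data_out <- ldply(fromJSON(as.character(data_in[1, 1]),
                             nullValue = nullvalue, simplify = TRUE))
  return(data_out)
}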
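Before packaging the module, it can help to try the function locally. The following is a minimal sketch, assuming the RJSONIO and plyr packages are installed on your machine; the sample JSON string, the column name `json`, and the replacement value "unknown" are made up for illustration and are not part of the module.

```r
library(RJSONIO)
library(plyr)

# Same logic as parse_json.R, repeated so this snippet runs on its own.
parse_json <- function(data_in, nullvalue = "NA") {
  ldply(fromJSON(as.character(data_in[1, 1]),
                 nullValue = nullvalue, simplify = TRUE))
}

# Hypothetical input: a one-row, one-column dataset whose single cell
# holds a JSON array of objects. The second object contains a null.
json_text <- '[{"name": "Ann", "city": "Oslo"}, {"name": "Bob", "city": null}]'
sample_in <- data.frame(json = json_text, stringsAsFactors = FALSE)

# Parse it, replacing JSON nulls with the string "unknown".
parsed <- parse_json(sample_in, nullvalue = "unknown")
print(parsed)
```

The function only reads the first cell of the input (`data_in[1, 1]`), so any additional rows or columns in the incoming dataset are ignored.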

The XML description defines the name of the module, the R function to call when the module runs, the input and output datasets, and the input parameters.

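parse_json.xml:

<!-- The name attributes are illustrative display names. The input and
     argument ids match the arguments of the parse_json function. -->
<Module name="Parse JSON Strings">
  <Owner>AzureML User</Owner>
  <Description>This is my module description. </Description>
  <Language name="R" sourceFile="parse_json.R" entryPoint="parse_json" />
  <Ports>
    <Output id="data_out" name="Parsed dataset" type="DataTable">
      <Description>Combined Data</Description>
    </Output>
    <Input id="data_in" name="JSON formatted dataset" type="DataTable">
      <Description>Input dataset</Description>
    </Input>
  </Ports>
  <Arguments>
    <Arg id="nullvalue" name="Null replacement value" type="string">
      <Description>Value used to replace JSON null value</Description>
    </Arg>
  </Arguments>
</Module>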

To add the module to Azure ML, put the files into a zip package and upload the package by selecting +NEW > Module in your Azure ML Studio workspace. Once uploaded, your module appears in the “Custom” category of the module palette, alongside all the built-in modules.

You can now use the new R module to build experiments, and deploy it to production by publishing your experiment as a web service.
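The packaging step described above can also be scripted from R. This is a rough sketch rather than an official tool: `package_module` is a hypothetical helper name, it relies on a command-line `zip` program being available to `utils::zip`, and it assumes the module files should sit at the top level of the archive. The upload itself still happens through +NEW > Module in Azure ML Studio.

```r
# Hypothetical helper: bundle the module files into a zip archive.
package_module <- function(files, zipfile = "parse_json.zip") {
  # Fail early if any of the listed files is missing.
  missing_files <- files[!file.exists(files)]
  if (length(missing_files) > 0) {
    stop("Missing module files: ", paste(missing_files, collapse = ", "))
  }

  # -j drops directory paths so the files sit at the archive root
  # (an assumption about the expected layout); -q keeps zip quiet.
  status <- utils::zip(zipfile, files = files, flags = "-j -q")
  if (status != 0) {
    stop("zip exited with status ", status)
  }
  invisible(zipfile)
}

# Example call, assuming both files are in the working directory:
# package_module(c("parse_json.R", "parse_json.xml"))
```

Checking for missing files first gives a clearer error than the one the external zip program would report.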

Summary

Custom R Modules are a great way for you to extend Azure ML’s built-in modules. Such modules can be used in experiments, operationalized in web services, and shared with your colleagues and the community. Although the example in this blog post is a simple one, custom R modules can be far more complex and can take multiple inputs, outputs, and parameters of different types. They also have access to the same user interface elements as built-in modules, such as column selectors and drop-down menus for parameters. In the future, we plan to add support for input and output types beyond datasets, such as learners and transformations.

Do give it a try and share your feedback with us below.

Roope
