January 2018

Volume 33 Number 1

# Data Analysis, Machine Learning and Microsoft Excel

By Michael Saunders | January 2018

Take a second to think about Excel as a whole—it’s a flexible, powerful, ubiquitous tool for analyzing data, from my handy little to-do list spreadsheet, all the way to massive 100MB financial-model workbooks at the world’s top banks. And underneath all the fancy features, behind the slick charts and formatting, the real value is calculation, called “Calc” internally. Calc is what lets you create relationships between cells: You express complex models by writing simple formulas to describe the dependency trees between values (see Figure 1). And then, as soon as you make a change, Calc updates the dependent values based on those formulas.

Figure 1 Calc Lets You Create Relationships Between Cells

Excel provides tons of helpful functions to use in your formulas, from the essential AVERAGE function (bit.ly/1Tjynwl), to string-analysis functions like SEARCH for finding substrings (bit.ly/2jhcEuV), to more complex calculations like the T.TEST statistical function (bit.ly/2ipowKE). People at work and in school learn Excel functions to use in formulas, and they interact with them naturally for their day-to-day calculation needs. So if you want to provide a powerful capability that Excel doesn’t already have, consider building an Excel function for that purpose. See Figure 2 for an example: The Contoso Cryptographers Corp. wants to release a solution that helps sleuths analyze data in Excel to decode messages. And one of the handy tools for code breaking is to recognize primes quickly, so Contoso would love to have an ISPRIME function for the millions of detectives who already use Excel. Later in this article you’ll see how Contoso builds this function and the other capabilities it needs for its add-in.

Figure 2 The Contoso Function to Identify Prime Numbers

## What Can I Build?

Before you start building, consider that not all extensions to Excel functionality should use Excel formulas. To decide if yours should, ask yourself this: “Does my function need to change anything aside from the Excel value that’s being returned?” In other words, does the function have side effects? In Excel, people don’t expect formulas to change anything except the cell in which they’re entered. That change might trigger other changes, but those would also be the result of self-contained formulas without side effects. When you type “=SUM()” in cell A1, you don’t expect a chart to appear somewhere on the sheet or a new row to be added underneath or a number to be changed in a financial database on the Internet, though you can control all those behaviors as part of a solution you build, which might contain functions and other capabilities.

The type of solution you should build to create your Excel function depends on your skills and goals. If you’re a professional developer, either selling externally like Contoso Cryptographers or building for people in your own organization, an add-in is the right tool for data analytics solutions in Excel, as is explained in the “JavaScript Custom Functions in Excel Add-Ins section.” If you’re an AI developer or data scientist looking to build intelligent models for data analysts to use at your company, check out the “Azure Machine Learning Functions” section later in this article. And if you don’t have any of those skills, Microsoft will have solutions for you in the future!

## JavaScript Custom Functions in Excel Add-Ins

Excel add-ins are tools that professional developers can build to extend Excel and interact with the data in workbooks (bit.ly/2AUOsZk). Thousands of add-ins are already available and they all run across Excel platforms, including Windows, Mac, Excel Online and even iOS. Anyone familiar with Web technologies will find add-ins easy to build, because they’re written just like Web pages: Add-ins run HTML, JavaScript, and CSS, and have the ability to call external Web servers. Best of all, the add-ins don’t need any code changes to run on all the different platforms. (Excel also supports other types of add-ins, as described at bit.ly/2qsPfLe, but they can’t run across platforms or be deployed to the Store.) Figure 3 shows the key pieces of an add-in. The main piece is the manifest.xml file, which specifies the Web location of the rest of the add-in and other metadata. The rest of the code is deployed by the developer, just as with any other Web application.

Figure 3 The Key Pieces of an Add-In

Add-ins provide lots of useful ways to extend Excel (see the documentation at bit.ly/2AV47rw). They can add UI elements, like ribbon buttons, contextual menu options, and HTML task panes and dialogs. They can interact with workbooks via thousands of APIs, such as the abilities to add and remove rows, insert and edit charts, and even apply formatting to cells. And now, with the new preview release of JavaScript custom functions, it’s possible to extend Excel formulas.

Custom functions allow developers to add any JavaScript function to Excel using an add-in (bit.ly/2AYtNUW). Users can then access custom functions like any other native function in Excel (such as AVERAGE). Let’s take a closer look at the Contoso Cryptographers ISPRIME function to see how it’s written (It’s also possible to check primality using only built-in Excel functions, but that’s much more difficult for the person requesting the calculation.):

``````function isPrime(n) {
var root = Math.sqrt(n);
if (n < 2) return false;
for (var divisor = 2; divisor <= root; divisor++){
if(n % divisor == 0) return false;
}
return true;
}
``````

The function simply checks all possible divisors up to the square root to determine whether the input is prime—a few lines of JavaScript. All the actual logic is done already. But there’s more to write than just the function itself. To be an effective extension point, custom functions must look and feel just like native ones. And native functions have a bunch of customized information. Look at Figure 4 for an example with the Contoso Crypographers ISPRIME function: In addition to the name of the function itself, Excel displays a helpful description to clarify the purpose of the function.

Figure 4 Custom Functions and Native Functions Automatically Complete While You Type

You can see another example in Figure 5. After CONTOSO.ISPRIME is selected, Excel shows the name of the parameter (in this case, n) to make it easier to pick the right inputs.

Figure 5 The Contoso ISPRIME Function

``````Excel.Script.CustomFunctions["CONTOSO"]["ISPRIME"] = {
call: isPrime,
description: "Determines whether the input is prime",
helpUrl: "https://example.com/help.html",
result: {
resultType: Excel.CustomFunctionValueType.boolean,
resultDimensionality: Excel.CustomFunctionDimensionality.scalar,
},
parameters: [{
name: "n",
description: "the number to be evaluated",
valueType: Excel.CustomFunctionValueType.number,
valueDimensionality: Excel.CustomFunctionDimensionality.scalar,
}],
options: { batched: false, streaming: false }
};
``````

Both of these pieces of information and much more are provided by the developer in the JavaScript definition of the function’s metadata, as you can see in Figure 5.

You can see that the description is specified as a string and so is the name of each parameter. I won’t discuss all the metadata here, but you can see the documentation for more info. If you’re familiar with the add-in model, you might be wondering why this information is provided in JavaScript rather than hardcoded statically somewhere, like in the manifest.xml file. The reason is flexibility. In Contoso’s case, the cryptography functions are defined and well-known beforehand. But sometimes you might want the ability to enable different functions in different situations.

Contoso Cryptographers love the simplicity of their ISPRIME function, but their next goal is a little more difficult: They want to build a function to generate random numbers for encrypting text. Excel has an excellent RAND function, but the problem is that RAND isn’t ideal for encryption because its values are pseudorandom, generated procedurally. In contrast, numbers generated by random.org are safe to use for this purpose—they’re generated from atmospheric noise. Of course, it’s no good to hardcode random numbers; instead, Contoso must design a function that can call random.org to fetch random numbers via HTTP request. Thankfully, custom functions make it easy to include Web requests. Here’s what Contoso’s asynchronous RANDOM function looks like:

``````function getRandom(min, max) {
return new OfficeExtension.Promise(function(setResult, setError){
sendRandomOrgHTTP(min, max, function(result){
if(result.number) setResult(number);
else setError(result.error);
});
});
}
``````

The key difference between this function and ISPRIME is that RANDOM is asynchronous: instead of returning a value to Excel, it immediately returns a JavaScript promise, then makes an XMLHttpRequest (not visible in the function—it’s in the sendRandom­OrgHTTP method) to the random.org service. Then, once the Web service has responded, Contoso resolves the promise with the random.org number to write it to the cell. Bringing Web data into Excel is one of the key reasons companies want to extend Excel functions, so Microsoft expects asynchronous functions to be popular.

One common aspect of all asynchronous functions is that they take some amount of time to return their result, so Excel shows a #GETTING_DATA message in the cell while it waits for the function to be resolved (see Figure 6).

Figure 6 Waiting for an Asynchronous Function to Return

In this case, the HTTP request can complete quickly, in around a tenth of a second. But if necessary, the function can make many calls and take longer than that to evaluate. For JavaScript custom functions, resolving the value in JavaScript causes it to be displayed immediately in the cell. Figure 7 shows an Excel view of the same RANDOM function after the value is returned.

Figure 7 An Excel View of the RANDOM Function After the Value Is Returned

Of course, an add-in using custom functions could have many other features, as well: Contoso Cryptographers might want to create a task pane and a ribbon tab to give their customers easy access to their custom functions; settings for how they behave; and guidance on how to use them. Perhaps the JavaScript API could even be used to let customers encrypt an entire worksheet of data with a click of a button. Whatever they decide, they can be confident that the entire add-in will run across Excel platforms without code changes.

There are two main ways to deploy an add-in containing these JavaScript custom functions, depending on the audience. The public Office Store (bit.ly/2A70L5o) is available to anyone who has Excel—there’s even a button to browse the Store on the Excel ribbon. The Store supports both free and paid add-ins, and requires submission and validation through Microsoft. However, Contoso Cryptographers Corp. deploys directly to its small business customers so they don’t have to click anything to install it. Instead, it provides each customer with the manifest file. Then an IT admin can choose which users get access via the O365 admin center (see the interface in Figure 8). Those users get the add-in installed automatically.

Figure 8 Deploying a Custom Function to an Organization

Hopefully you’ve been inspired by the resourceful Contoso Cryptographers to try writing your own custom functions (use the guide and samples at aka.ms/customfunctions). Next, I’ll explore the other new way to extend Excel formulas.

## Azure Machine Learning Functions

The second type of extensible function Microsoft has announced is Azure Machine Learning functions (abbreviated Azure ML). In contrast to JavaScript custom functions, Azure ML functions are created by AI developers (often data scientists and other experts) for analysts in their organization to use. As a result, there’s no need to create an add-in package to deploy an Azure ML function.

The Azure ML function itself is based on a service that calculates or predicts values based on a machine learning (ML) model. Once the model is built, the creators can enable it for anyone they choose. Then, every time someone wants to run the function, they simply type in a cell, just like for any other Excel function. The function calls a live Web service on the company’s Azure subscription and returns the result asynchronously.

For example, a marketing analyst at a retailer might want to predict demand for new products in various geographies. The analyst has some data on that product and the target markets in Excel. The goal is to get a simple function that lets the analyst forecast demand in each market without ever leaving Excel. Figure 9 shows a sample spreadsheet with this type of data.

Figure 9 Sample Product and Market Data That Will Be Analyzed with an Azure ML Function

An AI developer at that organization uses Azure ML services (bit.ly/2nwa0WP) to create the model: She starts by preparing training data, typically from an external database, to build an ML experiment based on historical sales for new products. Azure Machine Learning Workbench is a tool that simplifies the data preparation and lets the data scientist write Python code to train and evaluate models. Figure 10 is the Run Dashboard view in Azure ML Workbench, showing statistics on an experiment.

Figure 10 Azure Machine Learning Workbench

Once the model is ready, it can be deployed as a Web service in the organization’s Azure subscription, with permissions for the right set of users to access it. Instead of defining the metadata in JavaScript that runs locally, the metadata is defined in a Swagger (RESTful API) format that’s created automatically along with the deployed service. All that’s required for the service to appear as a function in Excel is that the analyst in Excel has permission to access it. Figure 11 shows the deployed functions appearing for the data analysts in Excel. The analysts can then run the function normally. Just as with the asynchronous JavaScript custom functions, the cell shows a #GETTING_DATA message while the service computes the result.

Figure 11 The Deployed Functions Showing in Excel

## The Future of Extensible Functions in Excel

I hope you’ll try the technology previews for extensible functions in Excel—JavaScript custom functions and add-ins if you’re a software or services vendor, and Azure ML functions if you’re an AI developer or data scientist. Microsoft is listening carefully to all the feedback during the preview (post ideas to the UserVoice page at bit.ly/2jRJQsu), so in addition to getting a head start, you’ll also have the opportunity to shape the final released product. There are many more improvements coming to both of these function types, and some entirely new ones, so stay posted for future announcements!

Michael Saunders is a program manager on the Office team, where he builds Excel features for developers. He’s originally from Toronto, Canada, and he studied Management and Materials Science Engineering at the University of Pennsylvania and Wharton. In his spare time, Saunders sings with the Seattle Esoterics and develops add-ins.

Thanks to the following Microsoft technical experts for reviewing this article: Yina Arenas, Ashvini Sharma, Sandhya Vankamamidi
Yina Arenas is a principal program manager at Microsoft and lead for Microsoft Graph. She’s taking Office and Microsoft APIs from legacy and disjointed technologies to a new, unified API world.
Ashvini Sharma is the Lead Program Manager on the Excel team responsible for Excel’s Analytics vision and roadmap.
Sandhya Vankamamidi is a Sr. Program Manager in the Information and Content Consumption Experiences team. Sandhya is responsible for designing and delivering immersive experiences across the family of Bing Apps.