Twenty Years of Machine Learning at Microsoft

07/08/2014

This blog post is authored by John Platt, a Distinguished Scientist at Microsoft Research.

People may not realize that Microsoft has more than twenty years of experience creating machine learning systems and applying them to real problems. That experience long predates the recent buzz around Big Data and Deep Learning, and it gives us a good perspective on a variety of technologies and on what it takes to actually deploy ML in production.

The story of ML at Microsoft started in 1992. We started working with Bayesian networks, language modeling, and speech recognition. By 1993, Eric Horvitz, David Heckerman, and Jack Breese had started the Decision Theory Group in Research, and XD Huang had started the Speech Recognition Group. In the 1990s, we found that many problems, such as text categorization and email prioritization, were solvable through a combination of linear classification and Bayesian networks. That work produced the first content-based spam detector and a number of other prototypes and products.
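To make the spam-detection idea concrete, here is a minimal sketch of a textbook multinomial naive Bayes text classifier. Naive Bayes is about the simplest Bayesian network, and in log space its decision rule is a linear function of the word counts, which is why it fits the "linear classification plus Bayesian networks" description above. This is an assumption-laden illustration, not Microsoft's original spam detector: the function names `train` and `predict` and the tiny training set are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

def train(docs: List[Tuple[List[str], str]]):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(label for _, label in docs)
    word_counts: Dict[str, Counter] = {c: Counter() for c in class_counts}
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict(model, words: List[str]) -> str:
    """Return the class with the highest log posterior (up to a constant)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        # Log prior plus a sum of per-word log likelihoods: linear in word counts.
        score = math.log(n_c / total_docs)
        denom = sum(word_counts[c].values()) + len(vocab)  # Laplace (add-one) smoothing
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[c][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = c, score
    return best_label

if __name__ == "__main__":
    # Hypothetical toy training data.
    training = [
        ("win cash now".split(), "spam"),
        ("cheap cash prize".split(), "spam"),
        ("meeting agenda attached".split(), "ham"),
        ("lunch meeting tomorrow".split(), "ham"),
    ]
    model = train(training)
    print(predict(model, "cash prize now".split()))
    print(predict(model, "agenda for the meeting".split()))
```

Production filters use far more data and features; the point here is only that a probabilistic model and a linear decision rule can be two views of the same classifier.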

As we were working on solving specific problems for Microsoft products, we also wanted to get our tools directly into the hands of our customers. Making usable tools requires more than just clever algorithms: we need to consider the end-to-end user experience. We added predictive analytics to the Commerce Server product in order to provide recommendation services to our customers. We shipped the SQL Server Data Mining product in 2005, which allowed customers to build analytics on top of SQL Server.

As our algorithms became more sophisticated, we started solving tougher problems in fields related to ML, such as information retrieval, computer vision, and speech recognition. We blended the best ideas from ML and from these fields to make substantial forward progress. As I mentioned in my previous post, there are a number of such examples. Jamie Shotton, Antonio Criminisi, and others used decision forests to perform pixel-wise classification, both for human pose estimation and for medical imaging. Li Deng, Frank Seide, Dong Yu, and colleagues applied deep learning to speech recognition.

In addition to more sophisticated algorithms for existing problems, we have been exploring new frameworks for machine learning. The most common frameworks in ML are classification and regression. In these frameworks, ML learns a mapping from a vector of data to either a label (classification) or a value (regression). But ML can do much more than produce labels or values. There’s a whole sub-field of ML called “structured output prediction”. An early example of this was “learning to rank”, where ML produces a ranked list of items (very useful for Bing, as I mentioned before). Another interesting framework is the construction of causal models, which we have used to model our advertising system. Yet another framework is generating programs directly from data (rather than through a model).
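The difference between producing a value, a label, and a ranked list can be illustrated with a single linear scoring function. The sketch below is a minimal illustration, not a Microsoft algorithm: the weight vector `w` and the feature vectors are hypothetical and chosen by hand rather than learned, and the names `score`, `regress`, `classify`, and `rank` are invented for this example. The same score is returned directly (regression), thresholded into a label (classification), or used to order items (a ranked list).

```python
from typing import List, Sequence

def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear score: the dot product of the weights and the features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def regress(w: Sequence[float], x: Sequence[float]) -> float:
    """Regression: output the real-valued score itself."""
    return score(w, x)

def classify(w: Sequence[float], x: Sequence[float], threshold: float = 0.0) -> int:
    """Binary classification: map the score to a label in {0, 1}."""
    return 1 if score(w, x) > threshold else 0

def rank(w: Sequence[float], items: List[Sequence[float]]) -> List[int]:
    """Ranking: return item indices ordered from highest to lowest score."""
    return sorted(range(len(items)), key=lambda i: score(w, items[i]), reverse=True)

if __name__ == "__main__":
    w = [0.5, -1.0, 2.0]  # hypothetical, hand-picked weights
    docs = [[1.0, 0.0, 0.2], [0.0, 1.0, 1.0], [2.0, 0.5, 0.0]]
    print(regress(w, docs[0]))   # a value
    print(classify(w, docs[1]))  # a label
    print(rank(w, docs))         # an ordering of all items
```

In an actual learning-to-rank system the weights would be trained so that the sorted order agrees with relevance judgments over whole lists; here they are fixed to keep the focus on the shape of the output.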

As ML researchers, we are super excited about Microsoft Azure ML. Azure ML will create models that can be deployed to the cloud, rather than being restricted to one particular data management platform (such as SQL). Creating cloud services with ML should reduce the friction of getting ML into specific applications. As researchers, we would love to capture all of our experience and algorithms in the Azure ML product, so that our customers can use their creativity to build ML-based products.

In future blog posts, we will describe some of our current ML research topics. We can also go into more detail about some of the technology mentioned above. If you find a particular research topic interesting, please let us know and we will try to get guest blog posts written by the creator of the technology. Thanks for reading!

John Platt
Learn more about my research. Follow me on Twitter.

Comments

Anonymous
January 01, 2003
very good job indeed
Anonymous
July 08, 2014
I can't wait for this to be fully rolled out. I loved using the Yahoo Pipes UI, and some of the Azure ML videos seem to show a similar drag-and-drop UI. I have tested AlchemyAPI, Wolfram Alpha, PredictionWiz, Prediction.IO and Google Prediction API.

What I hope MS solves is the ability to use deep learning to prepare the data rather than requiring a strict format. Actually, even if it just worked like AdWords Editor it would be great. Excel gurus who aspire to be data scientists are not handy with writing their own scripts to modify their data. This is a big problem with Google Prediction API. They need to be able to simply upload a SQL file or a spreadsheet and have the system help them identify the column headers, so those columns can be used as classification labels and/or as part of their data set.

I also hope it solves another big issue. Can it predict an undocumented classification label?

Example Classification Model:

"1st", "2nd", "3rd", "4th", "5th", "6th", "7th", "8th"
"2nd", "3rd", "4th", "5th", "6th", "7th", "8th", "9th"
"3rd", "4th", "5th", "6th", "7th", "8th", "9th", "10th"
"4th", "5th", "6th", "7th", "8th", "9th", "10th", "1st"
"5th", "6th", "7th", "8th", "9th", "10th", "1st", "2nd"
"6th", "7th", "8th", "9th", "10th", "1st", "2nd", "3rd"
"7th", "8th", "9th", "10th", "1st", "2nd", "3rd", "4th"
"8th", "9th", "10th", "1st", "2nd", "3rd", "4th", "5th"
"9th", "10th", "1st", "2nd", "3rd", "4th", "5th", "6th"

Notice that there is no label "10th"?

This example model represents presenting an option to a user that no other user has chosen previously. Therefore the system does not know that "10th" is an expected classification.

However, when introducing new products, services or options to users, starting off on your best foot can make a huge monetary difference. If we know they started on step #1 and went all the way to step #7 what would be the most useful step to present to them next?

So, given the sequence:

"1st", "2nd", "3rd", "4th", "5th", "6th", "7th"

the algorithm should be able to see both a pattern in the horizontal data showing the sequence but also a vertical pattern predicting the next classification label.

To me this seems like a very "Basic 101" thing a predictive system should be able to do. You might call this "3d ML" or something. :) Limiting a system to horizontal pattern identification seems to be a massive flaw inherent in every system I have tested.

I think this can be solved with some basic Excel functions. When I type a date or any other kind of sequence and then drag into empty cells, Excel attempts to fill in the next values in the sequence. A super advanced version of this would certainly be extremely useful.
Anonymous
July 08, 2014
I would be interested in guest posts by various teams within Microsoft that are using ML. I'd like to understand what algorithms they are using, whether COTS (well-known public algos) or internally developed ones.
Anonymous
July 09, 2014
It would be nice if the models we create in Azure ML can be somehow transferred over to a non-connected embedded device.
Anonymous
July 12, 2014
Nice
