Exploration: Data Science... with Mario Garzia
This blog post is authored by Mario Garzia, Partner Data Sciences Architect, Technology and Research
Data Science and “Big Data” have become 21st-century buzzwords in the tech industry. Yet in many ways the term “Big Data” is relative to our ability to collect, store and process data. Big data challenges are not new; historically there have been several notable encounters with big data. One interesting example is the US census. The 1880 census took eight years to tabulate, and at the time it was estimated that, because of population growth, the 1890 census would take more than 10 years to complete. This was a Big Data problem for its time, and a man by the name of Herman Hollerith came to the rescue with an invention that tabulated the 1890 census in a single year and under budget. His company eventually went on to become IBM. Hollerith accomplished this by developing a new and efficient way to collect and store the increasing volumes of data (punch cards), along with an electric tabulating machine that could read the punch cards and compute the needed results. Other equally interesting big data challenges arose both before and after Dr. Hollerith’s time. So is today’s Big Data challenge any different from past ones?
Data volumes are growing at a rate that continues to challenge our ability to collect, store and process data, driving the development of new technologies. But now the variety of data and the speed at which we collect it are also accelerating, and these trends show no end in sight. In a 2011 report, Ericsson estimated there would be 50 billion connected devices worldwide by 2020, each generating its own data in addition to the data exhaust produced by the systems that collect and process the device data. Another big difference today, and one that presents tremendous opportunity, is the ability to collect data directly from each of our end customers and learn about their experience with a device or service to a degree never before possible. This allows us to imagine entirely new ways to help and delight customers: products and services that were previously unimaginable, that better understand what customers need now and predict what they might need next. To date, high-tech companies have been the leaders in this data space, and in some cases, such as Bing Search or social networks, the data itself is the product. But a great aspect of today’s world is that technology is democratizing data and analytics, making it possible to derive insights across the full spectrum of human endeavor. So not only Big Data leaders but also more traditional businesses and other institutions can now leverage big data to improve their services and delight their customers. We live at a fascinating point in time, when the once unimaginable is becoming possible, guided by data and analytics.
Microsoft has a very rich tradition of using data to gain insights and drive product decisions, going back many years, long before Data Sciences and Big Data became terms de rigueur. I joined Microsoft in 1997 and have seen first-hand how we have evolved and grown in the data space. One of the things I have loved most about working here is the opportunity to surround myself with, and learn from, very talented and passionate people. This is a culture where learning, gaining new understanding and striving to be the best are deeply ingrained. Because of this, data has always played an important role at Microsoft, but that role has evolved and expanded over the past decade. We have grown from focusing on a deep understanding of the product being shipped to also developing a deep understanding of customer experiences with our products and services.
In 2000 I came to the Windows team to form the Reliability group. Right from the start, Windows Reliability was a data-driven effort. For example, by the time we shipped Windows 2000 Server, we already had approximately 100 years of reliability runtime data from internal Microsoft production servers. After Windows 2000 Server shipped, we expanded data collection to other enterprises by developing a free Reliability service. Companies could sign up, use it to collect reliability data from their datacenter servers and upload that data to Microsoft. The data was then analyzed automatically, and each company could view its own results on a website showing availability and reliability results and trends, segmented by server type and computing environment. In many cases this was the first time these companies had access to such detailed data on the reliability of their datacenters. Windows could also leverage this data to gain insights into operating system (OS) reliability and failure modes, set release criteria for new versions of the OS, and prioritize and drive product fixes based on failure frequency and severity. We also used the insights from this data to develop new OS features such as diagnostic services. This data-driven approach allowed us to decide when the product was ready to ship based on actual production runtime criteria. While deep and comprehensive, this data was focused on product quality and ship readiness. Today the Windows operating system, and indeed all of our products and services, focus not only on product quality attributes but also on better understanding customer needs. There is a renewed and expanded emphasis on building a data-driven culture at the company, where service and product quality remain critical but a deep understanding of customer satisfaction, engagement and wants is just as critical.
Insights derived from data are used across all Microsoft products and services to deliver new, powerful features and capabilities.
Being a data-driven culture means that understanding product and customer data is not just for Data Scientists but for all of Microsoft; everyone at Microsoft needs to be data aware and data driven. Big data is used for product and service experimentation and improvement, and also to deliver enhanced and customized services using techniques such as Machine Learning. Bing and Bing Ads are completely data driven. Microsoft also has a deep heritage in Machine Learning spanning the past 20 years, from its beginnings in Bayesian Networks and speech recognition research to products such as SQL Server Data Mining. We now give companies the ability to build machine learning models and easily deploy them to the cloud with Microsoft Azure ML.
An exciting aspect of being a data scientist at Microsoft is the unparalleled breadth of our customer touch points: computers and tablets, phones, devices, gaming, Search and a myriad of services. This breadth allows us to better understand customer wants and experiences, and to use those insights to improve their everyday lives in new and meaningful ways. The Data Sciences disciplines are at the core of our data-driven corporate strategy. Microsoft recognizes this and offers a full engineering career path for Data Scientists, Machine Learning Scientists and Applied Scientists that can reach the most senior levels in the company. We have multiple data science groups throughout the company, which makes for a very vibrant and growing community. I believe there is no better place than Microsoft for a Data Scientist to learn, grow, have fun and make an impact.
An important event that many Microsoft Data Scientists attend each year is the Knowledge Discovery and Data Mining (KDD) conference, which takes place in August; this year it is in New York City. This is a premier conference for the data sciences. I have been attending KDD for many years now, and I am very much looking forward to this year’s conference. It is great to share in the energy and excitement, exchange ideas with colleagues and meet new people. I always come out of the conference energized by the new ideas I’ve encountered and the people I’ve met. Microsoft is a Gold Sponsor at KDD this year, and we are very excited to be there. Please make sure to stop by the Microsoft exhibitor booth to see demos from Data Scientists on the Azure Machine Learning team, the Bing team, MSR and many others. I hope to meet some of you at the conference.
Mario
If you are interested in a career at Microsoft, check out our openings: