How we know which file formats are used

A reader wrote to ask me how it is that we know what file formats are being opened by users. I can assure you that neither the Bavarian Illuminati, UFOs nor 3-letter agencies flying black helicopters have anything at all to do with this. We're also not secretly spying on you – that would be a very serious violation of our privacy policy. What we are doing is using the Microsoft Customer Experience Improvement Program.

When you install Office, it asks whether you'd like to opt in to sending us additional information. Inside Office, we know this as SQM (pronounced "squim", and no, I'm not sure what that stands for). As a developer, I think this is a really cool thing, since it lets you know when unexpected things happen in the field. It allows me to handle all the errors I'm expecting, and then to create a default error handler that lets me know when something unexpected happens. It encourages careful code, so you might see something like:

if( !ThisNormallyWillNeverFail() )
{
    // Report the unexpected failure, then bail out of MyFunc.
    NotReachedSz( "Unexpected failure in MyFunc", "asdf" );
    return false;
}
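
To make the pattern a bit more concrete, here is a minimal sketch of what a reporting hook along these lines could look like. This is not the actual Office implementation: ReportNotReached, UnexpectedFailureCounts, and the simulateFailure parameter are hypothetical names invented for illustration, and the "telemetry" is just an in-memory counter keyed by tag.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a telemetry sink: one counter per tag,
// with nothing that identifies the user.
std::map<std::string, int>& UnexpectedFailureCounts()
{
    static std::map<std::string, int> counts;
    return counts;
}

// Hypothetical reporting hook, loosely modeled on the NotReachedSz call above.
// This sketch only bumps the counter for the tag; the message is not stored.
void ReportNotReached(const char* message, const char* tag)
{
    assert(message != nullptr && tag != nullptr);
    ++UnexpectedFailureCounts()[tag];
}

// Stand-in for an operation that is expected to succeed.
// The parameter exists only so the failure path can be exercised.
bool ThisNormallyWillNeverFail(bool simulateFailure)
{
    return !simulateFailure;
}

bool MyFunc(bool simulateFailure)
{
    if (!ThisNormallyWillNeverFail(simulateFailure))
    {
        // Default handler for the case we did not expect: record it and bail out.
        ReportNotReached("Unexpected failure in MyFunc", "asdf");
        return false;
    }
    return true;
}
```

Calling MyFunc(false) leaves the counters untouched, while MyFunc(true) returns false and adds one to the count for the "asdf" tag. In this sketch, every failed check of this kind adds one to the count for its tag.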

If you start hitting a lot of these, we find out about it and can investigate further. We also use it to find out how often people do things that we think are rare – the more people use something, the more attention we give it. This is how we know that only a very small fraction of all users open really old files. It also lets us take a hard look at things like the misconception that people only use a small fraction of our overall functionality. Most individuals do use only a portion, but which portion varies strongly from one group to the next. An obvious example is that we now know how many people care about really old file formats. Another is that when I was a graduate student working on my dissertation, I used a bunch of things in Word and Excel that I don't use now. So if you spread usage out across all our customers, you find that there's a sizable number of people who use just about any given feature. We don't typically pile on features just because they sound cool to someone – we add them because someone wanted them. For example, in Office 2007, there are some features around managing references and citations that I'd have really liked to have 10 years ago.
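
The "each person uses a portion, but everything gets used by somebody" point is easy to see with a toy tally. The sketch below is only an illustration: the feature names and sample reports are made up, UsageReport, CountUsersPerFeature, and SampleReports are hypothetical names, and nothing here reflects what the real data looks like.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// One anonymous report: just the set of features used, with no identity attached.
using UsageReport = std::set<std::string>;

// Count how many reports mention each feature.
std::map<std::string, int> CountUsersPerFeature(const std::vector<UsageReport>& reports)
{
    std::map<std::string, int> usersPerFeature;
    for (const UsageReport& report : reports)
    {
        for (const std::string& feature : report)
        {
            ++usersPerFeature[feature];
        }
    }
    return usersPerFeature;
}

// Made-up sample data: each user touches only two of the four features.
std::vector<UsageReport> SampleReports()
{
    return {
        {"citations", "mail_merge"},
        {"pivot_tables", "mail_merge"},
        {"legacy_file_open", "citations"},
        {"pivot_tables", "citations"},
    };
}
```

In this sample no single report covers more than half of the features, yet the tally from CountUsersPerFeature(SampleReports()) has an entry for all four, including the one "legacy_file_open" user.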

One downside is that because we explicitly don't collect anything that would allow us to trace the feedback to any individual, we're also missing some of the context. For example, out of the users who do X, how many are home users vs. business users? Some of that would be nice to have – some features ought to work differently on a home install than on a business install.

At any rate, that's how we know things like that. And I'll say it again – just because only a few of you use something doesn't mean we should do something that breaks you without making it easy to recover.