Microsoft Office: The Mathematics of Office Compatibility

When planning an Office upgrade, simple mathematics will help you determine your appropriate compatibility-testing scenario.

Chris Jackson

“Facts are stubborn things; and whatever may be our wishes, our inclinations or the dictates of our passion, they cannot alter the state of facts and evidence.” — John Adams

So you have decided to deploy a new version of Office throughout your organization. How long should you expect it to take between signing on the dotted line and when your users are productive in their day-to-day work? For the average organization, that can be anywhere between 12 and 18 months.

Much of that delay is due to risk management. You must ensure your deployment efforts don't disrupt the business. While that may seem frustrating, you can't completely forgo risk management. How do you balance these conflicting goals?

The typical approach is to solve these problems with technology. You use tools to gather data, as if the very existence of data must somehow accelerate your process. Adding still more data does not alleviate the natural emotional experience of fear. There will be fear that your upgrade will cause some mission-critical business failure. You might apply the following equation: The cost of failure can equal infinity.

You must also contend with the fear of change and the insistence that nothing can be broken. You might choose to keep a virtual copy of the legacy version of Office available. If you experience a problem with a document or application, you can direct the user to immediately leverage this safety net and get productive again. You could also remediate the application in order to get back into the upgraded environment as quickly as is feasible.

Too many pin their hopes on the infallibility of a tool, process or partner without stopping to think about what all of these decisions mean. They attempt to avoid failure at all costs. Once you’ve reduced the cost of failure from “catastrophic” to “reasonable and prudent,” you should explore some of the mathematics behind these decisions. This will help you make better decisions and evaluate the claims others are making.

You should test your applications for compatibility when the cost of failure multiplied by the probability of failure is greater than the cost of testing.

As with any generalized formula, this requires some interpretation. Compatibility testing is neither a luxury nor a tax. Think of it as an investment in managing risk.

What does this mean for the rest of your documents and applications—the ones you didn’t test? It doesn’t mean you should ignore them. You’ve simply chosen to not invest in proactive testing. Instead, you invest in reactive testing. You fix them when a user reports they’re broken (and you should make it extremely easy for users to do that).

Imagine you have a document that is indeed important, but won’t cause undue business disruption in the event of a failure. The user will call the help desk, and with your reactive app-compatibility support, you can resolve the issue in an average of 30 minutes. That time will be less if it’s critical and you need to use virtualization and immediately get productive again.

Assuming you’re tracking document failure, you discover you have a 5 percent failure rate for Office documents in this particular migration. Assuming a user’s time is worth $250 an hour, this failure will cost you $125 times 5 percent, or $6.25. If the amortized cost of your proactive testing process exceeds $6.25, even by just a penny, you shouldn’t bother proactively testing the document at all.
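That arithmetic can be sketched directly in Python. The numbers are the article's assumptions, and the variable and function names are mine, purely for illustration:

```python
# Expected reactive cost per document, using the article's inputs.
hourly_rate = 250.0      # value of a user's time, $/hour (assumption from the text)
resolution_hours = 0.5   # average reactive fix: 30 minutes at the help desk
failure_rate = 0.05      # 5 percent of documents fail in this migration

cost_per_failure = hourly_rate * resolution_hours          # $125
expected_reactive_cost = cost_per_failure * failure_rate   # $6.25
print(f"${expected_reactive_cost:.2f}")  # $6.25

# Proactive testing only pays when its amortized per-document cost
# stays below this expected reactive cost.
def worth_testing_proactively(cost_of_testing: float) -> bool:
    return cost_of_testing < expected_reactive_cost
```

With these inputs, any proactive process costing more than $6.25 per document loses to simply waiting for the help-desk call.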

Inventory First?

The process flow of “inventory, rationalize, test and remediate” is pervasive and ingrained. Companies go through all kinds of contortions chasing an accurate inventory. Does the math really indicate this is worthwhile?

The answer, of course, depends on the inputs you bring to the equation. It’s worth doing the math, though, to determine how you should perform application discovery. You should take a complete inventory when the number of documents, multiplied by the medium hourly rate and the time needed to determine each document’s criticality, is less than the high hourly rate multiplied by the time it takes business users to list the critical documents by hand.

The first variable is how many documents you typically generate. This number has a large standard deviation depending on the nature of your business. Then you need to figure out which resources can help determine how critical a document is to your organization.

Rationalization can be a bit more difficult with documents. There are fewer obvious throwaway items than you have with applications. Allocate how much time you think it will require to determine whether a document or application is within scope, depending on the complexity of making these determinations in your environment. The alternative is to ask business users (who work at a higher cost) which documents are critical, and allocate some amount of time for them to generate this list.

Assume you have an organization with 50,000 users who generate an average of 500 documents per person stored somewhere on the network, resulting in 25 million documents. Assume the resources for doing rationalization work for a rate of $100 an hour, and that five minutes is spent investigating each document.

Here’s how the math works out:

25,000,000 x $100 x 0.08 vs. 300 x 8 x $100

$200,000,000 >> $240,000

This turns out to be an easy decision. If you were to take a complete inventory and rationalize it, it would cost $200 million. And, at this point, you haven’t even determined whether anything works yet. The alternative is to spend a couple hundred thousand dollars in opportunity cost and simply have people create the list by hand.
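Here's that comparison as a quick Python sketch. The 300-person, eight-hour figure for hand-building the list is the assumption behind the right-hand side, and all names are mine:

```python
# The inventory decision with the article's inputs.
documents = 25_000_000        # 50,000 users x 500 documents each
rate_medium = 100.0           # $/hour for rationalization staff
hours_per_doc = 0.08          # roughly five minutes per document

full_inventory_cost = documents * rate_medium * hours_per_doc
manual_list_cost = 300 * 8 * 100.0   # people x hours x $/hour, by hand

print(f"${full_inventory_cost:,.0f}")  # $200,000,000
print(f"${manual_list_cost:,.0f}")     # $240,000

take_full_inventory = full_inventory_cost < manual_list_cost
print(take_full_inventory)             # False: ask people for the list instead
```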

The law of large numbers works against you when it comes to Office rationalization. If your intent is to use tools, you need more automation than just an inventory. What you’re looking for is business impact, which is difficult for tools to deduce. One surrogate some use is the file’s Modified date, but it’s imperfect. A better surrogate is some measurement of actual usage from client computers.

New versions of Office will introduce a telemetry feature. This is integrated with the product (and installable on earlier versions of Office), and collects data from users’ computers on the documents and add-ins they’re using. When generating a document inventory, look specifically at documents that end up in a user’s Most Recently Used (MRU) list. This helps refine the inventory from “this represents every document anybody has ever managed to create” to “this represents what people are really using all of the time.”

Using this approach, you’ll be able to collect and analyze telemetry data that indicates where the real risk is more likely to exist. Today, however, there’s often no better value for getting your inventory than simply asking people to submit a list.

Vendor Support Statements

Vendor research is entrenched in many app-compatibility processes, particularly when delivered by an application-compatibility factory. Whether you need it depends on whether you’ll require vendor support for a given application. If you most likely will, doing vendor research is a business requirement.

Many of the processes and lists used by vendors help determine whether the application has ever been supported on the platform to which you’re migrating, and whether it’s actively supported today. This can be a good indicator of whether it’s likely to work on that platform, because the vendors back that claim with their own support dollars.

Note that a support statement answers whether the application was ever supported; it won’t actually answer your business question: does it work, and would you be willing to run it? Make sure you receive the answer to the question you asked.

The math comes in here to determine whether vendor support is worth the investment of time to do that research. The alternative is running the process of assessing compatibility yourself. However, many people do bad math, comparing the cost of testing one app to the cost of researching one application. This flawed assumption posits that you’ll find statements for everything, but it isn’t the statistically correct approach.

Here’s an effective equation: You should research vendor support when the cost of testing one app times the percent not found or critical plus the cost of research for one app is less than the cost of testing one app.

Once again, let’s put some numbers to this. Imagine a testing process that ends up costing $150 per application. Similarly, say there’s a vendor research process that averages $15 per application. The company can locate support statements for 12 percent of applications, and of those, you intend to test 10 percent anyway because they’re mission-critical.

$150 x 89.2 percent + $15 vs. $150

$148.80 < $150

Yes, it does make sense to do vendor research, but it isn’t the slam dunk you might have thought. After all, vendor research appears to be one-tenth the cost of application testing. On average, you end up saving only $1.20 per application.

Of course, saving anything (particularly at scale) is good, but it should warn you that you’re pretty close to the crossover point. The failure rate doesn’t need to move much before you discover you’re in negative numbers. In fact, with these assumptions, as soon as the percentage of applications for which you locate support statements drops below roughly 11 percent, it becomes more costly to look things up on a list than to simply test everything. It seems counterintuitive, but the option with the lower unit cost is actually less desirable when its hit rate is too low.
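A minimal Python sketch of this calculation, including the break-even found-rate implied by these assumptions (all names are mine):

```python
cost_test = 150.0      # testing one application
cost_research = 15.0   # researching one application
found = 0.12           # support statements located for 12 percent of apps
critical = 0.10        # of those, 10 percent get tested anyway

# You still test everything not found, plus the critical fraction of what was.
still_tested = 1 - found * (1 - critical)            # 0.892
cost_with_research = cost_test * still_tested + cost_research
print(f"${cost_with_research:.2f}")                  # $148.80 per app

# Break-even found-rate: per-app savings are
# cost_test * found * (1 - critical) - cost_research, which hits zero here:
break_even_found = cost_research / (cost_test * (1 - critical))
print(f"{break_even_found:.1%}")                     # 11.1%
```

Below that found-rate, research is pure overhead on top of the testing you'd do anyway.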

The Conversion Conundrum

The latest versions of Office have significantly improved capabilities. Many of the new document-creation and editing features require you to use the latest file formats. If you want to leverage those new features within a particular document, you should convert those files and go from there.

One of the regular questions is whether you should go through and update all of the documents to the new file formats. The process seems easy, of course, and there are tools to assist with this. However, not all documents will upgrade cleanly. The cost of upgrading includes the risk management of ensuring that the upgrade doesn’t cause an application failure. Some functionality might be lost (such as the “versions” feature), some charts might look different and links to the file using the old file extension could end up broken.

Of course, this cost also comes with a benefit. Beyond the files you’d manually update in order to use the new features of the product, every converted file benefits from the reduced file size of the new file formats. To make this determination, you once again have to look at the math to determine whether you have a positive ROI from undertaking such an activity. You should bulk-convert all of your documents when the cost of converting plus the cost of testing is less than the cost of the drive space you’d save.

Starting with the cost savings, and assuming the same numbers as before (50,000 users with 500 documents each), add an average file size of 1MB. The potential cost savings with these parameters is the cost of saving 11TB of disk space. To put that into perspective, let’s compute the cost of that disk space. Using a formula computed by Matthew Komorowski and posted in his “A History of Storage Cost,” we find that cost per gigabyte equals 10^(-0.2502 x (year - 1980) + 6.304).

This is computed based on purchase price of disk media alone. While this would suggest you could buy a hard drive to store data for around $0.02 per gigabyte in 2012, to calibrate this against total costs, use the storage costs of Windows Azure. These are currently priced at $128 per terabyte, or just shy of 6.3 times greater than the computed cost from this formula.

So to calibrate the results to attempt to predict total cost of storage, modify the formula as follows: cost per gigabyte equals 6.3 x 10^(-0.2502 x (year - 1980) + 6.304).

Assuming you keep documents for an average of 10 years, the total net savings predicted by this formula would be $3,209.53. Even if you simply assumed the storage cost on Windows Azure would remain constant, your total savings would still be $14,080.
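As a rough check of that arithmetic, here's a Python sketch. The yearly accrual of savings from 2012 through 2021 and the 6.3x calibration factor are assumptions carried over from the text, not an official formula implementation:

```python
# Komorowski's cost-per-gigabyte trend, calibrated upward to approximate
# total storage cost rather than bare media price.
def cost_per_gb(year: int, calibration: float = 6.3) -> float:
    return calibration * 10 ** (-0.2502 * (year - 1980) + 6.304)

tb_saved = 11
# Savings accrue each year as prices keep falling exponentially.
declining = sum(cost_per_gb(y) * tb_saved * 1024 for y in range(2012, 2022))
print(f"${declining:,.2f}")        # roughly $3,200 over ten years

# Flat Windows Azure pricing instead: 11 TB at $128/TB/year for ten years.
flat_azure = tb_saved * 128 * 10
print(f"${flat_azure:,}")          # $14,080
```

The declining-price sum lands within a few dollars of the article's $3,209.53 figure; small differences come down to exactly which ten years you sum.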

Therefore, if you test your documents for probable conversion success and convert them for a total cost of less than $14,080 conservatively ($3,209.53 if you assume storage costs will continue to decline at the same exponential rate), then it makes sense to convert.

If it costs you more than this to both test and run the conversion, then it doesn’t make sense to convert. You should leave your documents in the existing format. Office 2010 can still read them just fine. Convert them manually when you need to use new features.

The Right Tool

The most common pattern when addressing compatibility on any platform is looking for a tool to find—and hopefully fix—all upgrade problems. You might assume a tool associated with the compatibility problem will achieve all or part of this goal and run it everywhere. At some point down the road, you’ll realize you haven’t actually solved that problem completely.

The problem is one of constraint. A compatibility bug, after all, is just a special case of a bug that happens to manifest itself on that particular platform. This is no different from finding and fixing all of the bugs on the platform you already have. Unless you simply don’t have a help desk because all of your software works all of the time, my guess is that you have plenty of compatibility bugs with that platform. That’s not because it isn’t understood, but because the problem is intractable.

Most of you have at least a passing familiarity with Alan Turing’s 1937 proof that you can’t build a program to detect whether an application will finish running or continue to run forever. This constraint imposes far-reaching limits that affect research even today, as scientists look to build new verifiers to ensure program correctness. For example:

“Context-bounded analysis is an attractive approach to verification of concurrent programs. This approach advocates analyzing all executions of a concurrent program in which the number of contexts executed per thread is bounded by a given constant K. Bounding the number of contexts executed per thread reduces the asymptotic complexity of checking concurrent programs: while “reachability” analysis of concurrent Boolean programs is undecidable, the same analysis under a context-bound is NP-complete [18, 15]. Moreover, there is ample empirical evidence that synchronization errors, such as data races and atomicity violations, are manifested in concurrent executions with a small number of context switches [19, 16]. These two properties together make context-bounded analysis an effective approach for finding concurrency errors. At the same time, context bounding provides for a useful trade-off between the cost and coverage of verification.”

You need to define precisely what it means to be a bug. Most people agree a program crash is a bug. There are other situations where determining whether program behavior is a bug depends on context. For example, changing the color in a graph might negatively change meaning. It could also be acceptable or even desirable. A version check—which is technically a feature, because it required the developer to write code to introduce the behavior—is sometimes considered a bug by the person who falls out of compliance with version restrictions.

Knowing you can’t find all bugs, or even all bugs of a particular category, you need to ensure you focus your automation on areas where the automation makes sense. Solving the halting problem, were it possible, would give us the ability to stop all infinite loops—a common bug. That would have a huge payoff.

When you consider the concept of attempting to find and fix all bugs, not all of that work would have a payoff. There are theoretically an infinite number of ways to write a programming bug (whether empirical or contextual), and no finite set of checks can detect them all. As the incidence of a particular programming bug decreases, the payoff for automating a check for it similarly declines. This drives what we choose to automate.

The mathematics are relatively straightforward. You should write automated tests to find bugs when the cost of creating and running automation is less than the cost of debugging multiplied by the incidence of bugs.
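That rule reduces to a one-line predicate. The dollar figures below are invented purely for illustration:

```python
# Automate a check only when building and running it costs less than
# the expected debugging effort it replaces.
def should_automate(cost_of_automation: float,
                    cost_to_debug_one: float,
                    expected_incidents: float) -> bool:
    return cost_of_automation < cost_to_debug_one * expected_incidents

# A common bug class justifies a $5,000 automated check; a rare one doesn't.
print(should_automate(5000, 50, 400))  # True:  $5,000 < $20,000
print(should_automate(5000, 50, 5))    # False: $5,000 > $250
```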

This works out well for extremely common problems. In Windows applications, for example, User Account Control made running with Standard User accounts more realistic for many organizations. It also exposed a significant compliance problem with much of the software designed for times when running as an administrator was considered more acceptable. These problems were so common and so similar that creating automation to detect all of them had significant payoff when compared to individual troubleshooting.

Vendors who create verification tools for a given platform can amortize the cost of creating automation over many customers. They invest in making new test creation easier and less expensive, so they can work this math to the advantage of running far more tests. At the same time, the tests that might be easiest (and cheapest) to create might have a very low incidence. Watching for Type I (false positive) and Type II (false negative) errors becomes important.

Verifier technologies coming in new versions of Office combat some of the challenges with earlier versions of the tool, which were subject to both Type I and Type II errors. There was also confusion regarding applicability to particular scenarios.

The Office Migration Planning Manager toolset, for example, scans documents and identifies potential challenges. It also looks at the challenges of converting the document to the new file formats (which wouldn’t be a challenge if you chose to simply leave the document as is). The new verification focus will be on deployment-blocking issues, such as the use of deprecated APIs and application crashes (typically caused by add-ins).

This work in evaluating, testing and converting is really the final step in a three-step process:

  1. Alleviate emotional fears so you can think rationally.
  2. Collect the right data.
  3. Analyze that data to convert it from data to information to knowledge.

And here’s the mathematical basis behind a few guidelines:

  • You don’t have to do something just because you can.
  • You should run your application-compatibility work as if you were writing for a newspaper: Start with the most important things and work your way down. Not everyone finishes, and you want to finish the most important stuff.
  • Don’t conflate compulsory tasks with optional tasks, even if they seem related.

Ideally, this adventure in exploring the mathematics behind how to make better decisions about what to include in your Office compatibility project will inspire you to question your assumptions and improve efficiency.

Chris Jackson

Chris Jackson is “The App Compat Guy” at Microsoft. He’s a principal consultant and the worldwide lead for enterprise application compatibility. He’s a frequent speaker at IT and developer conferences, and works with customers and partners worldwide. His mission is “restoring technology agility by removing the shackles of legacy software.” Read more from Jackson on his blog and on Twitter.