Software Testing Cage Match: Amazon.com vs. Microsoft

I previously compared Amazon.com's and Microsoft's approaches to software testing in Building Services: The Death of Big Up-Front Testing (BUFT)?, and I think now is a fun and interesting time to take a deeper dive.


Before I joined Amazon.com in 2005 as an SDET, while I was interviewing for said position in fact, I was told about the “QA situation” there. I was told “it’s improving.” Improving from what, you may ask? Well, the perception was that QA got short shrift, with the 1 to 10 (or 1 to 7, or 0 to infinity) Test to Dev ratio held up as proof.

Ratio

“Improving, eh?” Did I buy that? Not necessarily, but I quickly came to a realization: I had used Amazon.com a lot, and had rarely noticed any problems…it seemed to work. Even so, after I joined the QA team there, it was still a frequent source of grousing that Amazon as a whole did not value the quality assurance profession; otherwise they would surely fund more of us. I later shared this grouse with my former director of engineering from my previous company over a game of cards. I expected sympathy, but instead he simply asked, “And is this the right thing for Amazon?” That eureka moment, plus years more of experience, taught me that it’s not about the ratio, but about what you expect from your software teams (Dev, QA, product specialists, and managers) and how you handle and mitigate risk.

Across the Lake

In 2009 I changed companies and moved to Microsoft (across Lake Washington from Amazon’s Seattle HQ). Microsoft had a reputation as a place where testers were respected and software quality was given priority. I was eager to see the “Microsoft way” of quality…the fabled 1 to 1 ratio. It turns out that’s the Office and Windows way, but a nascent realization that testing such “shrink-wrap” products differs from testing services was taking hold, and experiments in quality processes abounded. However, I think there are still fundamental differences between how Amazon.com and Microsoft approach software quality.

Head to Head

I manage a Test Team at Microsoft with a 1.5 to 1 Dev to Test ratio. At Amazon I had 1 SDET for every 7 or so Devs. So my new job must be easier, right? Nope. Ratio is not an input into the equation; it’s an output. You set your quality expectations and you employ processes to get you there. One path takes 10 SDETs and another takes 1. How can this be? Well, let’s compare and answer the question:

How does Amazon get by with so few hours spent by its QA teams relative to Microsoft?

1. At Amazon.com, whole features, services, and code paths went untested. Amazonians had to pick and choose where to apply their scarce resources. At Microsoft, other than prototypes or “garage” projects, you can expect the complete “triad” of Development-Test-Product Management teams to be engaged at every step of the way.

Excluding code from testing cuts down your need for testers. Maybe SDETs per line of code tested is a more interesting ratio than Test to Dev? If you count only the features Amazon actually tests, the test-to-dev hours ratio climbs much closer to Microsoft’s.

2. Amazon has “Quality Assurance” teams while Microsoft has “Test” teams. However, QA at Amazon almost never got involved in anything but testing. That is to say, Microsoft and Amazon should swap the names they use for these teams, since Amazon’s are much more “test”-only teams, while at Microsoft we seem to achieve more actual quality assurance.

Saving time by not reviewing the design, or not designing for testability, is not saving time at all.

3. Functional-only testing was common at Amazon.com; performance testing was either not done, done by developers, or given second-class status.

Performance testing was often done by the dev teams, so those test hours were still spent, just not by the test team.

4. A high operations cost was considered acceptable (developers carried pagers), so releasing a bug was OK because it was relatively quick to fix (a lower quality bar for release). Amazon also had better tools and processes for build and deployment, which enabled rapid deployment of hot fixes.

Essentially a form of testing in production (TiP). Again, tally up the hours and put them on Dev’s tab.

5. Better self-service analysis tools. Any issue found in production was easier to analyze and turn around quickly thanks to better tools for monitoring servers and services and sending alerts (a minimal monitoring-and-alerting sketch follows this list).

Reducing cost through automation (and tools)... this is a real savings.

6. Cheap manual testing. I am of mixed mind about listing this, since I spent a great deal of energy encouraging the manual testers to automate their tests, but Amazon employed overseas teams to bang on the product through the black-box interface and find problems before production users did. This had a decent yield for finding defects (see the smoke-test sketch after this list).

Hidden test hours. When people talk about the test to dev ratio at Amazon, they often do not count these offshore teams.
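
To make point 5 concrete, here is a minimal sketch, in Python, of the kind of self-service monitoring-and-alerting loop that lets a team turn production issues around quickly. The health-check URL, the thresholds, and the send_alert stand-in are all hypothetical placeholders, not any actual Amazon or Microsoft tool: the loop simply polls an HTTP health endpoint and raises an alert after a few consecutive failures.

    import time
    import urllib.request

    # Hypothetical endpoint and thresholds -- substitute your own service's values.
    HEALTH_URL = "https://example.com/health"
    TIMEOUT_SECONDS = 5
    POLL_INTERVAL_SECONDS = 60
    FAILURES_BEFORE_ALERT = 3

    def check_health(url: str) -> bool:
        """Return True if the service answers the health check with HTTP 200."""
        try:
            with urllib.request.urlopen(url, timeout=TIMEOUT_SECONDS) as response:
                return response.status == 200
        except OSError:
            # URLError, timeouts, and connection errors all derive from OSError.
            return False

    def send_alert(message: str) -> None:
        """Stand-in for a real alerting channel (pager, email, ticketing system)."""
        print(f"ALERT: {message}")

    def monitor() -> None:
        consecutive_failures = 0
        while True:
            if check_health(HEALTH_URL):
                consecutive_failures = 0
            else:
                consecutive_failures += 1
                if consecutive_failures >= FAILURES_BEFORE_ALERT:
                    send_alert(f"{HEALTH_URL} failed {consecutive_failures} checks in a row")
            time.sleep(POLL_INTERVAL_SECONDS)

    if __name__ == "__main__":
        monitor()

The design choice that matters is the consecutive-failure threshold: it keeps a single transient blip from paging anyone, which is what makes this kind of automation a real saving rather than a new source of noise.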
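
And for point 6, this is a minimal sketch of what automating one of those manual black-box passes might look like: a smoke test that exercises the product only through its public interface, the way an offshore manual tester (or a customer) would. The base URL and the page content it checks for are assumptions for illustration, not a real product's values.

    import unittest
    import urllib.request

    # Hypothetical base URL -- point this at the deployment under test.
    BASE_URL = "https://www.example.com"

    class BlackBoxSmokeTest(unittest.TestCase):
        """Exercise the product only through its public (black-box) interface."""

        def fetch(self, path: str):
            # Go through the front door, exactly as an end user would.
            return urllib.request.urlopen(BASE_URL + path, timeout=10)

        def test_home_page_loads(self):
            response = self.fetch("/")
            self.assertEqual(response.status, 200)

        def test_home_page_shows_expected_content(self):
            # A coarse content check; a manual tester would eyeball the same thing.
            body = self.fetch("/").read().decode("utf-8", errors="replace")
            self.assertIn("Example", body)

    if __name__ == "__main__":
        unittest.main()

Run something like this against each new deployment; every manual pass that gets captured this way stops being a hidden, recurring test hour.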

Expectations

A friend of mine who is a QA manager at Amazon recently lamented:

“The test to dev ratio [is] insanely stretched …. there's soo much more we could do, but no we just rush and rush and cut things and get [blamed] when we miss something”

So maybe my “head to head” comparison does not explain away all the differences, but the message I would like to convey is that it is about expectations. I originally wrote the above list in response to a Dev manager who asked me why we couldn’t be more like Amazon and “pay less” for QA. Amazon has one set of expectations about quality and risk, and Microsoft has another… that’s why.

Caveat

I’ve made a lot of generalizations about how things are done at Microsoft and Amazon.com, which means what I said is going to be simply wrong when applied to several teams. Feel free to let me know in the comments how I screwed up in portraying your team. But be aware I know it’s not one size fits all…hopefully I’ve captured the big picture.  

Improving?

And to close, I will say that other than the ratio, Amazon did improve while I was there. I saw the QA community come together and start interacting in positive ways. Amazon’s first ever Engineering Excellence forum was organized by the QA community. So that just leaves the final questions: Does Amazon’s ratio need to be improved, and what does improved look like? Do Microsoft’s expectations need to be changed, and what would those look like?


Technorati Tags: Software testing,Microsoft,Amazon.com,Testing,Testing in Production

Comments

  • Anonymous
    April 28, 2010
    The comment has been removed

  • Anonymous
    April 28, 2010
    Hey Rob, Thanks for your comments. I think as someone who has been an SDET at both Microsoft and Amazon your input carries a lot of weight here.

  • Anonymous
    April 28, 2010
    The comment has been removed

  • Anonymous
    May 14, 2010
    Hi Seth, I think there are other factors that are important in deciding the optimal level of test investment. You discussed how easy it is to fix a bug, and how that can be quite different depending on the type of product. Another important question is how easy it is to detect a bug. If a bug causes a server to crash, that's easily detected and reported by monitoring systems. If a bug causes the wrong amount to be deducted from a bank balance, can you detect that in production, or do you need to develop specific test cases to detect such bugs before release? Yet another variable is how the system behaves when a bug hits. If the effect is that a web page does not display correctly, but when the user refreshes the page it does, that bug is much easier to tolerate in production than a bug that corrupts the database. If a system is designed from the start to tolerate certain kinds of failures or bugs, that can have a big impact on the ultimate test budget needed to ensure the system meets its requirements. Ralph Case

  • Anonymous
    May 14, 2010
    Great point, Ralph. I think it can be summarized as risk assessment of bug impact. Hard-to-detect bugs are higher risk. Bugs without workarounds are higher risk. Also, systems like banking, aerospace, and medical are less tolerant of defects in production, even if you take steps to limit the scope of those defects (such as by using exposure control when testing in production).