A Taxonomy of Test Doubles

Many words have been written in the TDD community about the myriad ways of mocking in service of unit tests. After all this effort, there remains a great deal of confusion and ambiguity in the understanding of many--maybe even most--developers who are using mocks.

No less than the likes of the eminently wise Martin Fowler has tackled the subject. Fowler's article is indispensable, and it in large part built the foundation of my own understanding of the topic. But it is quite long, and was originally written several years ago, when mocks were almost exclusively hand-rolled, or created with the record/replay idiom that was popular in mocking frameworks before lambdas and expressions were added to C# and VB.NET with Visual Studio 2008. Add to that the fact that the article was written in the context of a long-standing argument between two different philosophies of mocking.

Unfortunately these arguments continue on even today, as can be seen in the strongly-worded post that Karl Seguin wrote last week. Looking back now, with several more years of community experience and wisdom in unit testing and mocking behind us, we can bring a bit more perspective to the discussion than what was available at that time. But we won't throw away Fowler's post completely. Within his post, there are firm foundations we can build on, in the definitions of the different types of mocks that Fowler identified.

There are four primary types of test doubles. We'll start with the simplest, and move through in order of ascending complexity.

Dummies

A dummy is probably the most common type of test double. It is a "dumb" object that has no real behavior. Methods and setters may be called without any exception, but without any side-effect, and getters will return default values. Dummies are typically used as placeholders to fill an argument or property of a specific type that won't actually be used by the test subject during the test in question. While a "real" object wouldn't actually be used, an instance of a concrete type may have strings attached, such as dependencies of its own, that would make the test setup difficult or noisy.

Dummies are most efficiently created using a mock framework. These frameworks will typically allow a mock to be created without actually configuring any of the members. Instead they will provide sensible defaults, should some innocuous behavior be necessary to satisfy the subject.
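
As a rough illustration, here is what a dummy might look like in Python, using the standard library's unittest.mock in place of the .NET frameworks I've been talking about. The AuditLog and PriceCalculator classes are hypothetical, invented only for this sketch. The point is that the double is created without configuring a single member, purely to fill a constructor argument the test never exercises.

```python
from unittest.mock import MagicMock


class AuditLog:
    """Hypothetical dependency; a real one might drag in a database connection."""

    def record(self, message):
        raise NotImplementedError


class PriceCalculator:
    """Hypothetical test subject. It demands an AuditLog, but total() never uses it."""

    def __init__(self, audit_log):
        self._audit_log = audit_log

    def total(self, prices):
        return sum(prices)


def test_total_sums_prices():
    # The dummy only fills the constructor argument; no member is configured.
    dummy_log = MagicMock(spec=AuditLog)
    calculator = PriceCalculator(dummy_log)
    assert calculator.total([1.5, 2.5]) == 4.0


test_total_sums_prices()
```

Passing spec=AuditLog is optional. It simply makes the dummy reject calls to members the real type doesn't have.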

Stubs

A stub is a test double which serves up "indirect input" to the test subject. An indirect input is information that is not provided to an object by the caller of its methods or properties, but rather in response to a method call or property access by the subject itself, to one of its dependencies. An example of this would be the result of a factory creation method. Factories are a type of dependency that is quite commonly replaced by a stub. Their whole purpose is to serve up indirect input, toward the goal of avoiding having to provide the product directly when it may not be available at the time.

Stubs tend to be quite easy to set up even with more primitive mocking frameworks. Typically, all that is needed is to specify ahead of time the value that should be returned in response to a particular call. The usual simplicity of stubs should not lull you into assuming that every stub will stay simple, however. Stubs can get quite complex if they need to yield a variety of different objects across multiple calls. The setup for this kind of scenario can get messy quickly, and that should be taken as a sign to move on to a more complex type of double.
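
A minimal sketch of a stubbed factory, again in Python with unittest.mock. The Greeter class and the clock factory with its create() and current_hour() members are hypothetical names made up for the example. The first test specifies one canned answer ahead of time; the second hints at how the setup starts to grow once successive calls need different answers.

```python
from unittest.mock import Mock


class Greeter:
    """Hypothetical test subject that obtains a clock from a factory dependency."""

    def __init__(self, clock_factory):
        self._clock_factory = clock_factory

    def greeting(self):
        # Indirect input: the hour comes from a dependency, not from the caller.
        hour = self._clock_factory.create().current_hour()
        return "Good morning" if hour < 12 else "Good afternoon"


def test_greets_in_the_morning():
    clock_factory = Mock()
    # The canned answer is specified before the subject runs.
    clock_factory.create.return_value.current_hour.return_value = 9
    assert Greeter(clock_factory).greeting() == "Good morning"


def test_successive_calls_get_successive_answers():
    clock_factory = Mock()
    # side_effect with a list hands out one value per call, in order.
    clock_factory.create.return_value.current_hour.side_effect = [9, 15]
    greeter = Greeter(clock_factory)
    assert greeter.greeting() == "Good morning"
    assert greeter.greeting() == "Good afternoon"


test_greets_in_the_morning()
test_successive_calls_get_successive_answers()
```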

Mocks

A mock is a type of test double that is designed to accept and verify "indirect output" from the subject class. An indirect output is a piece of information that is provided by the test subject to one of its dependencies, rather than as a return value to the caller. For example, a class that calls Console.WriteLine with a message for printing to the screen is providing an indirect output to that method.
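
To make that concrete, here is a small sketch in Python with unittest.mock. Notifier and its injected writer, with a write_line method standing in for Console.WriteLine, are hypothetical names of my own. The subject returns nothing to the caller, so the only way to check its work is to ask the mock what it received.

```python
from unittest.mock import Mock


class Notifier:
    """Hypothetical test subject that reports through an injected writer."""

    def __init__(self, writer):
        self._writer = writer

    def notify(self, user):
        # Indirect output: the message goes to a dependency, not back to the caller.
        self._writer.write_line(f"Hello, {user}!")


def test_notify_writes_greeting():
    writer = Mock()
    Notifier(writer).notify("Ada")
    # Verify the indirect output after the fact.
    writer.write_line.assert_called_once_with("Hello, Ada!")


test_notify_writes_greeting()
```

Note that this verifies after the act step rather than recording expectations up front, which is the style most current frameworks encourage.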

The term "mock" for a particular type of test double is in a certain way unfortunate. In the beginning there was no differentiation. All doubles were mocks. And all the frameworks that facilitated easy double creation were called mocking frameworks. The reason that "mock" has stuck as a particular type of double is that in those beginning times, most test doubles tended to take a form close to what we today still call a "mock". Mocks were used primarily to specify an expectation of a particular series of method calls and property accesses.

These "behavioral mocks", which rely on what Fowler calls behavior verification, gave birth to the record/replay idiom for mock configuration that reached its peak in the days of RhinoMocks. And due to the tendency of inexperienced developers to create complicated object interactions and temporal coupling, mocks continue to be a very popular and common form of test double. Mocking frameworks make it far easier to unit test classes that rely on these types of coupling. This has led many to call for the abolishment of mocks and mocking frameworks in a general sense, claiming that they provide a crutch that makes it too easy to leave bad code in place. I'm sympathetic to the sentiment, but I think that this is throwing the baby out with the bathwater.

Fakes

Fakes are the most complicated style of test double. A fake is an object that acts simultaneously as both a stub and a mock, providing bidirectional interaction with the test subject. Often fakes are used to provide a substantial portion of the dependency's interface, or even all of it. This can be quite useful in the case of a database dependency, for example, or a disk storage service. Properly testing an object that makes use of storage or persistence mechanisms often requires testing a full cycle of behavior which includes both pushing to and pulling from the storage. An in-memory fake implementation is often a very effective way of avoiding relying on such stateful storage in your tests.

Given their usefulness, fakes are also probably the most misused type of test double. I say this because many people create fakes using a mocking framework, thinking they are creating simple mocks. Or worse, they knowingly implement a full-fledged fake using closures around the test's local variables. Unfortunately, due to the verbosity of mocking APIs in static languages, this can very easily become longer and more complex code than an explicit test-specific implementation of the interface/base class would be. Working with very noisy, complicated, and fragile test setup is dangerous, because it's too easy to lose track of what is going on and end up with false passes. When your test's "arrange" step starts to overshadow the "act" and the "assert" steps, it's time to consider writing a "hand-rolled fake". Hand-rolled fakes not only remove brittle and probably redundant setup from your tests, but they also often can be very effectively reused throughout all the tests for a given class, or even multiple classes.
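
Here is a sketch of what I mean by a hand-rolled fake, in plain Python with no mocking framework at all. The storage interface (save and load) and the Notebook subject are hypothetical, invented for illustration. The fake is an ordinary class backed by a dictionary, and it lets the test run a full write-then-read cycle with almost no arrange step.

```python
class InMemoryNoteStore:
    """Hand-rolled fake for a hypothetical storage interface (save/load)."""

    def __init__(self):
        self._notes = {}

    def save(self, key, text):
        self._notes[key] = text

    def load(self, key):
        # Raises KeyError for unknown keys; this sketch assumes real storage does too.
        return self._notes[key]


class Notebook:
    """Hypothetical test subject that both writes to and reads from storage."""

    def __init__(self, store):
        self._store = store

    def append(self, key, line):
        try:
            existing = self._store.load(key)
        except KeyError:
            existing = ""
        self._store.save(key, existing + line + "\n")

    def read(self, key):
        return self._store.load(key)


def test_append_round_trips_through_storage():
    notebook = Notebook(InMemoryNoteStore())
    notebook.append("todo", "write tests")
    notebook.append("todo", "ship")
    assert notebook.read("todo") == "write tests\nship\n"


test_append_round_trips_through_storage()
```

The same InMemoryNoteStore can be shared by every test that touches Notebook, which is exactly the kind of reuse that closure-heavy framework setup makes difficult.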

It's not Just Academic

These are the primary categories into which nearly all, if not all, test doubles can be grouped. Fowler did a great job of identifying the categories, but I think this crucial information is buried within a lot of context-setting and illustration that doesn't necessarily offer great value today. Mocking is ubiquitous among the subset of developers that are doing unit testing. But too many people go about unit testing in an ad hoc fashion, rather than deliberately with a plan and a system for making sense of things. I believe that a simple explanation of the major types and usages of test doubles, as I've tried to provide here, can aid greatly in bringing consistency and clarity of intent to developers' unit tests. At the very least, I hope it can instill some confidence that, with a little discipline, pattern and reason can be found in the often messy and overwhelming world of unit testing.

I am not a Computer Scientist

Prepare yourselves. I have an embarrassing and melodramatic admission to make.

My career is a sham.

Although my degree and education are in a field that is typically referred to as "computer science", I am not actually a "scientist". Nor do I "practice science". But I won't be satisfied to go down alone for this charade. I'll go on record saying that I am convinced that for the vast majority of people who were educated in or work in the field of "computer science", the ubiquitous presence of the word "science" in proximity to our work or education is a tragic misnomer.

I don't know how long this has been on my mind, but I know almost precisely when I became conscious of it. It was a couple months ago. I was newly exposed to devlicio.us, and perusing the blogs hosted there, when I came across a post by Bill McCafferty about a lack of respect and discipline in our field.

Early in the post, Bill reveals an injustice he encountered during his education.

"...When I started my undergrad in this subject, I recall reading articles debating whether it should be called a science at all. Gladly, I do not see this argument thrown around much anymore."

I think I am probably not going to make the exact argument here that he disagreed with back then. The things we all studied in school are definitely part of a nebulous field of study that may rightfully be called "computer science". As Bill points out,

"From Knuth's classic work in The Art of Computer Programming to the wide-spread use of pure mathematics in describing algorithmic approaches, computer science has the proper foundations to join other respected sciences such as physics, concrete mathematics, and engineering. Like other sciences, computer science demands of its participants a high level of respect and pursuit of knowledge."

I have no argument with any of this. He's right on. Donald Knuth (who is indeed my homeboy in the sense that we share our hometown) studied and practiced computer science (which if you know anything about Knuth, you'll know is an almost tragic understatement). And thousands of people who have followed in Knuth's footsteps can lay the same claim. However, that's not me. And it's not more than 99% of all programmers in the field today.

Computer science suffers the same type of misnomer as many other disciplines that have adopted the word "science" into their name, such as political science, social science, animal science, food science, etc. And it seems that most such fields, if not all, have done so because the very validity of the field of study itself was subject to severe criticism at some point in the past. So we take on the term "science" to get it through people's heads that there is a root in formal practices and honest intellectual exploration. But to then blanket every profession that derives from this root with the term "science" is a misappropriation of the term.

I can think of a number of examples.... The programmer working for the bank to develop their website, or for the manufacturing company to manage their transaction processing system, is no more necessarily a "computer scientist" than the election commentator is necessarily a "political scientist". When someone gets an electrical engineering degree and goes to design circuits for a living, we do not say he "works in electrical science". We say he is an electrical engineer. When someone gets a technical degree in mechanics and then goes to support or produce custom machinery, we do not say he "works in mechanical science". We say he is a mechanic, or a technician. Why, then, when someone gets an education that amounts to a "programming degree", and then goes to work doing programming, do we say that he "works in computer science"? It's a uselessly vague and largely inappropriate label.

By contrast, if you have a doctorate in computer science, I'm prepared to say you deserve the label. If you write essays, papers, articles, books, etc. for use by the general practitioner, you probably deserve the label. If you do research, or work on the unexplored fringes of the field--if you are exploring the substance and nature of the information or practices that the rest of us simply consume and implement, then in all likelihood you deserve the label.

Please, please understand that I am by no means belittling the value of our work, or the nobility of our profession. Often we simply consume the information produced by true "computer scientists". But we transform it from theory into practice. We resolve the concrete instances of the abstract problems that the true scientists formally define. We take the pure thought-stuff produced by scientists and turn it into tangible benefit.

This is not trivial. It is not easy. It deserves respect, discipline, study, and care. But it is not "practicing science".

I should say in closing that I am not as upset about all this as the tone of this post might imply. I don't even really have a big problem with the use of the word "science" to refer to a field of study or work that largely does not include research-type activities. I don't like it, but I accept that it happens. But "computer science" has a problem that other similar "sciences" don't. When someone says they work in "political science" or "food science", you can make a guess as to the type of work they do, and it's hard to be significantly incorrect. Though maybe it's my outsider's naïveté that allows me to make this claim. At any rate, "computer science" as a field is so broad and vague that I don't think the term communicates a useful amount of information. But you wouldn't know that by talking to programmers, who seem only too ready to attempt to take hold of the term and own it for themselves.

I think this is one small facet of a larger and far more critical issue in our field in general, which I fully intend to write more about very soon. But until then, let's take the small step of starting to consider what we really mean when we use much of the popular but often ambiguous terminology when discussing our profession.

I work in the field of computer science. This tells you nothing except that I am unlikely to be a prime specimen of the wondrous human physiology. But.... I am a programmer. I have a degree in Computer Engineering. I am interested in programming theory. I work as a software development consultant. And now, you know something about what I know and what I do.

Now what about you?

Update: I forgot to note that in McCafferty's blog entry, he himself makes use of "trade" terminology to categorize different levels of reading materials, which betrays the uncertain nature of programming as a profession. We certainly wouldn't say that a carpenter works in the "wood sciences", would we?