Context loss via Implicit Information

A post model is not a view model. A form view model is a subset of the post model. Even when they are structurally the same, the roles are different, and it's highly likely that changes will exert a pressure to evolve the details of those roles separately.

Here's the problem: If, at the point that your view model and your post model happen to be structurally identical, you decide that it's "overkill" or "duplication" or "unnecessary complexity", and just use a single model, then you make those two roles completely implicit, and model a scenario that is only circumstantially true. A less experienced developer will come along, and absent the experience to know that there are two roles in one place, evolve the single object in-place rather than split the roles out. They don't know any better, and you have hidden from them the information that would indicate to them the better way forward.
In short a data object is incomplete without its context. Its responsibility, in terms of the Single Responsibility Principle, is naturally coupled from its context. This isn't a bad thing, as long as you recognize and respect that a data object has no standalone identity.

So how do you respect this symbiotic state of affairs? In a strongly-typed language it's very simple: by giving it its own, layer-specific type/class. This way it will become blatantly obvious when a data object has gotten too far from home and is in danger of being misused or misinterpreted.

In a loosely-typed language, you respect the coupling by cloning or mapping a data object as it moves between layers. Since there's no type system to stop you, it's extremely easy to just pass the same thing around until a clear requirement of change appears. But the very fact of the data object's implicit role means that the developer will have precious few clues to this. By always cloning or mapping into a new object the potential damage is mitigated, and the developer gets a clue that something is being protected from external modification. Hopefully this distinction of identity encourages the developer to make even small changes to structure and naming that ensure that the object is always expressive and clear of purpose relating to the context of its usage.

Recognizing this is a nuanced design skill, and it's one that people often resist in order to avoid adding more types to their programs. But it doesn't take long to reach a level of complexity where the benefits of this care and attention to expressiveness and signalling become clear.

Dependency Relationships and Injection Modes: Part 2

Last time, I wrote about primal dependencies and the constructor injection mode. I also mentioned that in Part 2 I would cover "the other mode". In contemplating this other mode, however, I came to the conclusion that there are in fact two other modes. I'll cover both of them here in Part 2, but first we'll tackle mutator injection and optional dependencies.

Mutator Injection

Let's get the terminology out of the way first. By mutator, I mean a property or method on a class or object that is used to change some bit of state in that class or object. By optional dependency, I simply mean one which is not required in order for the object to accomplish its primary purpose.

In most code you'll read, mutator injection is ubiquitous. However, I would argue that most are misused because the usage, the need, or both, do not fit the above description of an optional dependency. So what are people doing with dependency mutators that they shouldn't be? And what should they be doing instead? Most people use mutation when they are presented with an apparent hurdle to using another, preferable injection mode. Often this comes down to an inconvenient design choice external to the class that creates an environment where it is hard to do the right thing. The appropriate response is of course to fix the context, whenever possible.

Why You Should Prefer Alternatives

Often, a dependency mutator is used to handle post-construction initialization of a primal dependency, where the appropriate dependency instantce is not known or available when the consumer is constructed. I would argue that this is very rarely an inevitable state of affairs, and is commonly due to a dependency cycle between two layers of the architecture. (I.e. an object in one layer depends on an object in the layer below it, and another object in that layer depends on an object in the first layer.)

Intra-layer dependency cycles are almost always a bad idea, not just because it makes it tough to avoid mutator injection. Maintaining a single direction of intra-layer knowledge also helps to prevent unnecessary coupling, and it aids in keeping each layer cohesive. But an extra benefit of fixing this situation is that it will typically allow for the primal dependency to be created or retrieved when it is needed for the constructor injection, rather than by the consumer or a manager object.

Why does this matter so much? Most simply, because mutator injection is a state change. And state changes are big potential points of failure. It's also well-established that it's easier to reason about immutable data structures. I've found this to be true in my own experience as well. It's easier to compose behavior out of pieces that exhibit immutability because it simplifies the interaction graph. Furthermore, when things go wrong fewer state changes means a shorter list of possible places where the bad state may have been triggered. So when designing my classes, I avoid mutators wherever I can, and that means preferring constructor injection and site injection over mutator injection whenever possible.

The other way that mutator injection is mis-used is in a situation I briefly mentioned in Part 1: when only a subset of the operations on a class actually make use of the dependency. Mutators are often seen as convenient in this situation because they cut down long constructor argument lists, which are seen as messy or complex. The fact that the object has at least a limited use without the dependency makes it possible to get away with setting it later.

In Part 1 I said that the situation itself is often due to a violation of the Single Responsibility Principle. It would be facile to cite this reason for all instances of mutator injection. Sometimes it's true and sometimes not. When it really isn't true, there are a couple of other options available before you have to resort to mutator injection.

The first option is again to drop back to constructor injection, but to do so with a sort of lazy proxy in its place. The lazy proxy exists for the sole purpose of deferring the resolution of the actual service dependency until it's needed. This is its Single Responsibility. It can then either provide an instance on demand, or it can be more transparent and just implement the same interface and delegate to the underlying object.

In a static language like C#, this isn't always possible either. Sometimes there's no interface, and some of the members are not virtual, so they can't be overridden in a derived class. If you control the code of this object, I highly recommend refactoring to an interface to resolve this problem. But if you don't, it might be time to look at the third injection mode: site injection.

Temporal Dependencies and Site Injection

Site injection is a fancy name for a method argument. It describes the use of a service as an argument to the operation for which it will be used. Such a dependency, that is passed directly to an operation, and for which no reference is maintained after that operation completes, is called a "temporal dependency". The reference is fleeting, and not actually part of the state of the object that depends on it.

There are a couple consequences of this type of injection. First is that the calling object or objects become the primary dependents of the service. It may or may not be easier to manage the dependency there. Hopefully it is, but sometimes it isn't, especially if there are many different objects that may call on this operation.

The other consequence of site injection is that it can add noise to the interface/API. If there are multiple operations in the subset of the dependent object that make use of the service, the repeated passing of the service into the objects can feel like unnecessary overhead. Not performance-wise, but in terms of keeping the code clean and readable. Expression of intent is paramount here. If the extra arguments communicate the true relationship between these two pieces of your architecture, then it's not truly noise. Instead it's a signal to readers of what's going on, and who's doing what.

There are lots of places where site injection and temporal dependencies are very natural. So many that you probably use them without even thinking about it, and it's often the most appropriate thing to do.

Truly Optional Dependencies

That's probably enough about what isn't an optional dependency. So what is? I would argue very few things are. Usually an optional dependency crops up as something that can be null when you want to do nothing, or can have an instance provided when you want to do something. For example, one step of an algorithm template, or a logging or alert solution with a silent mode. Often, even these can be treated as primal dependencies from the consumer's perspective, with a bonus of avoiding null-check noise. Just provide a Null Object by default, and make a setter available to swap the instance out as necessary. If you can say with certainty that this is not appropriate or feasible, then you may have a truly optional dependency.

A Good Start

I believe these three dependency relationships, and the injection modes associated with them, cover most of the scenarios that you'll encounter. And I've found that the advice I've included for normalizing to these modes has helped improve the design of my code in a very real and tangible way. But I'm certain my coverage isn't complete. There are surely more dependency relationships that I haven't identified here, especially if you dig into the nuances. I'd love to hear about other types of dependency relationships you've encountered if you're willing to share them in the comments.

Dependency Relationships and Injection Modes: Part 1

This week I'm going to continue my trend of adding to the cacophony of people talking about IoC and DI. There's a lot of talking, and a lot of it is noise. Not inaccuracy, mind you. Just an awful lot of shallow coverage, one step beyond assumptions and first principles, with not a lot of real analysis. So let me bring a bit of deliberate consideration and analysis to the table, and hopefully improve the signal to noise ratio.

What's different this week from the recent past is that I am specifically going to talk about DI for once, rather than IoC in general. The particular topic I want to address is something that I've seen addressed a few times, but usually in passing reference in the context of some other argument. And most often it's being both taken for granted and dismissed out of hand, by the respective debaters.

In most languages there are essentially two injection modes available: via a constructor argument, or via a mutator such as a property setter or method. I've found that each option is most appropriate in different situations. A constructor argument dependency expresses a relationship that is very different than that expressed by a mutator.

Conveniently, there are also two broad types of dependencies that map fairly well to these two modes. The first is sometimes termed a "primal dependency". (I'd love to credit the inventor of this term, but all i can find in a Google search is Jimmy Bogard, Tim Barcz, and Steve Harman all using the term as if it already existed.) The second type of relationship is most often called an "optional dependency", which is a much less evocative term, but it's a descriptive one nonetheless.

The relationship that is being evoked by the term "primal dependency" is one of raw, fundamental need. A primal dependency is one that must be satisfied in order for the class to even begin to do its duty. When a primal dependency is unfilled, the dependent object is essentially incomplete and nonfunctional. It's in an invalid state and can't properly be used.... At least not if the SRP, or Single Responsibility Principle (PDF link), is being obeyed.

The SRP is one reason why I advocate using constructor injection for primal dependencies. Constructor injection makes it explicit and unavoidable that a given dependency is a requirement for proper functioning of the object. If a dependency could be provided via a mutator without affecting the post-construction usefulness of the object, then really you have one or both of two situations. Either it's not really a primal dependency, or your object is responsible for more than one thing and this dependency is only required for some of those things. Using constructor injection avoids the possibility of this signal being hidden behind the guise of post-construction initialization.

When people oppose constructor injection, it's often on the claim that it puts an undue burden on the object performing the instantiation. While it's true that a burden is being transferred, I'd argue that it's not undue. Construction and initialization of an object into a fully valid state should be considered to be two parts of the same operation, the same responsibility. Furthermore, this responsibility is separate from the primary purpose for the existence of the object itself, and so it doesn't belong to the object itself. This is what DI is meant to resolve. Yet, the logic must be taken a step further.

If construction and initialization is truly a multi-step process pulling from different domains, and/or which can't cleanly be handled in one place, then it's a responsibility that's complex enough to base on entire object around. In short, you should strongly consider setting up a factory or builder object to handle this task. The only extra burden this leaves is on the brain of the programmer who is unfamiliar with the pattern. But trust me: just like with any exercise, the burden gets lighter with repetition.

I have a confession to make: I had never actually used the builder pattern until I came to this revelation. I didn't really understand what the benefit was over a factory. And I'd only taken the most rudimentary advantage of factories, for implementing concrete implementations of abstract base classes. But now each is as clear, distinct, and obvious of purpose to me as a screwdriver and a hammer. You'll find they come in quite handy, and really do clean up the consuming code, once you acknowledge that construction and initialization should often be a proper object responsibility all on its own.

In Part 2 of this topic, I'll address optional dependencies and the mutator injection mode. Like today's topic, they're deceptively simple to describe. But there is just as much, if not more, nuance to be found in the manner and purpose of their implementation.

Data Plus Behavior: A Brief Taxonomy of Classes

When I was first learning about classes and object-oriented design in college, classes were explained to us cutely as "data plus behavior". This is oversimplified and inadequate in the same way as would be an explanation of the Cartesian Plane as "a horizontal axis plus a vertical axis". This neat package doesn't begin to encompass all that the Cartesian Plane is, especially compared with its constituent parts.

Through my own work and experience, I've found it useful to break classes down into four different categorizations that better express the continuum of data/behavior relationships. Each category serves a different class of needs, and each interacts with other classes and categories in consistent ways. The four categories, as I have identified them, are data, algorithms, stateful algorithms, and models. Let's take a look at each of these in turn. We'll go over the traits that qualify each, the roles they serve, and their typical relationships with other classes.

Data

Data classes are quite straightforward. They are "pure" data, with little to no behavior attached. Your typical "data transfer objects", primitives, structs, event argument classes, etc. fall into this category. There are few, if any, methods defined on this type of class. When a data class does have methods, they will typically be related to type conversion or projection, simple derived value calculation, or state change validation. However, methods offering complex mutability or interactions with other objects would push a class out of this category and toward the "model" category.

Classes in this category are most often composed completely of other data classes. When I was initially considering other qualifications, I thought immutability would a be a big one, but I've found that's not necessarily true. Rather, it's dependency on services (especially stateful algorithms) that almost certainly indicates that a class is a model rather than a data class. Data classes very rarely have more than a passing dependency on external behavior, though they may have multiple dependencies on external state, made primarily of other data classes. This makes sense, since the most common thing to do with algorithms, stateful or otherwise, is to compose them into more complex behavior.

Data classes are the quanta of information in your application. They will be passed around and between other classes, created and processed by them in various ways. They exist first and foremost to be handled. They are rarely seen as dependencies except indirectly via configuration layers, but rather show up most commonly as state, or as operational inputs.

Algorithms

An algorithm is a class that is essentially functional, at least in spirit. A strong indicator that you have an algorithm on your hands is when a development tool such as Resharper suggests that methods, or the class itself, be made static. This generally indicates a near-complete lack of state.

One common application of algorithm classes is as a replacement for what would be a method library in a language which allows loose methods. For example a string library, or a numeric processing library. In fact, data processing in general is one big problem space that lends itself to algorithm classes. Also, many of the methods seen on primitives in languages such as C# could very easily be extracted into algorithm classes.

Algorithms rarely have public properties and almost never have public setters. In general algorithm classes have the potential to be expressed even without private properties, i.e. as truly functional code. Usually this means handling data that would otherwise be stored in properties as arguments instead. But often the realities of memory and processing power limitations mean that this isn't actually possible. So complex algorithm classes do often have some private state.

Depending on the amount of internal state required for performant execution of the algorithm, such classes may be expressed as either static/singleton, or by instance. For algorithms requiring no internal state, I prefer a container-enforced singleton pattern (as described in my post on lifecycle control via IoC containers). Static classes encourage tight coupling, and are very difficult to sub out for testing, even with a powerful isolation tool such as Microsoft's Moles,Telerik's JustMock, or TypeMock Isolator. As such, I avoid statics like the plague.

Instanced algorithm classes are usually lightweight and treated as throw-away objects, repeatedly created and disposed. Some may also choose to implement them with a reset/clear mechanism to allow reuse of the same object for different inputs. However, given the memory and garbage collection power currently available, I view this as an optimization, and avoid it unless it provides clear, needed performance benefits. Otherwise, it simply adds complexity to the implementation and muddies the interface with methods that don't directly relate to its purpose.

Algorithm classes often have dependencies on other algorithms. Data dependencies, however, are usually just another form of argument for the algorithm. Something that affects the output of the work from one instance to another, but is likely constant within one instance. Usually this is data that simply isn't going to be available at the call site where the algorithm will be triggered. Environmental state is one example. Mode switches are another. Anything more complicated or mutable than this is an indicator that you might in reality have a stateful algorithm on your hands.

Stateful Algorithms

The simple explanation of a stateful algorithm is that it is a bundle of behavior with a limited reliance on internal or external state. It differs from a pure algorithm in that the internal state isn't necessarily directly related to the processing itself. The most common application of a stateful algorithm is a class whose primary purpose is to expose the behavior of a model, while internally managing and obscuring the state complexities of that model. The goal of this being to encapsulate in a service layer all the concerns that the other layers need not worry about explicitly.

IO services are common instances of these types of classes. Some examples include file streams, scanner services, or printer services. Any time a complex API is wrapped so as to expose a simple interface for bringing in or flushing out data in the natural format of the application, a stateful class is the most likely mechanism. Looking at the file stream example, the state involved might consist of a buffer, a handle to the endpoint of the output, and parameters regarding how the data should be accessed (such as the file access sharing mode).

Stateful algorithms may take dependencies on any other type of class, but more commonly pure or stateful algorithms. Stateful algorithms are more likely to interact with data classes and models as arguments, or internally in the course of API wrapping.

Models

Models are a diverse, messy bunch. They're probably the closest to fitting the classical "data plus behavior" description. A class in this category will have intertwined state and behavior. The state will typically be of value without the behavior, but the behavior exists only in the context of the state. Most often, a model class comes about not because extracting the behavior and state into separate classes is impossible, but rather because it muddies the expressiveness of the design. Domain models, UI classes, and device APIs are the places where the model category of classes tend to serve best. Note, however, that these spaces also tend to attract convoluted coupling, inscrutable interfaces, unintuitive layering, and a host of other design pathologies.

These warts proliferate because it is difficult to make generalizations or rules about models. The foremost rule I keep in dealing with them is to try to avoid them. Not in the "model classes are considered harmful" sort of way. If you look at a program I've written, you won't find it devoid of model classes. In fact, they aren't even particularly rare. When I say I try to avoid them, all I really mean is that before I write a model class I make sure that the role I'm trying to fill isn't better addressed by some at least relatively simple combination of the other categories.

It's common for the behavior of a model class to depend on some private aspect of the state. Separating the behavior from the state would thus also mean dividing the state into public and private portions, to be referenced by the consumers of the model, and the model's services, respectively. This can be a nightmare to maintain, and the service implementor must often make a decision between downcasting a public data interface to a compound public/private one, or internally mapping public state objects or interfaces to their associated private state objects or interfaces. You see this type of thing crop up repeatedly in ORM solutions, where the otherwise internal state of the change and identity tracking is pulled out of the model and maintained by the ORM.

This separation is difficult and messy to make. If you find yourself facing this choice, you can be fairly certain that a model solution is at least an acceptable fit for the role. There are benefits to the separation, but often the cost of implementation is high, and should be deferred until necessary. But beware as you forge ahead with the consolidated model, because every model is different. I find that the cleanest way to incorporate a model into a project is to establish a layer in which that model and its service classes will live, and make heavy use of data transfer objects and projections to inter-operate with other layers.

A model can, and often does take dependencies on any category of class. Because of the varied roles models can serve and the many ways that state and behavior may be mixed in them, it's very difficult to identify any patterns or rules as far as how dependencies typically arise and how they should be handled. Any such effort would likely hinge on further subdividing the categories into common roles and basing the rules and patterns at that level instead. I'll leave that as an exercise for the reader, for now. =)

A Guideline

As I've said, I find these categories to be a useful guideline in my own programming efforts. It's quite easy and natural to mix state and behavior together in most every class you write, simply because it's convenient on the spot to have behavior neighboring the state it's relevant to. But, it tends to be far easier to reason about the first three categories in the abstract than it is to do so about models. This is likely because the former tend to enable and encourage immutability, which minimizes and simplifies a whole host of problems from concurrency, to persistence, to testing. For this reason I try to identify places that lend themselves well to the extraction of a data class or an algorithm class.

This strategy seems to make my code more composable, and my applications more discrete in terms of purpose and dependency. Which in turn makes the divisions between the different layers and functionality regions more clear and firm. This not only discourages leakage of concerns, but also makes the design more digestible to the reader. And that is always a good thing, even if the reader is just yourself 6 months later.

Clean Injection of Individual Settings Values

The Setup

Today I again encountered a challenge that I have dealt with numerous times while working on the product I develop for my employer. It's not insurmountable an insurmountable challenge. Honestly, it's not even all that challenging. But it has been an irritant to me in the past because all the solutions that I have come up with felt unsatisfactory in some way.

That challenge is the injection of simple settings values into the classes that need to refer to them. By simple I mean single-value settings embodied by an integer, a string, or a floating-point, for example. This seems like a simple problem. And honestly, I thought this was going to be a short and sweet post, until I realized that there is value in delineating all the dead ends I've followed on the road to my current favored solution. As you'll see, I've put a fair amount of thought and analysis into it.

Being a hip programmer, I use an IoC container to constructor-inject all my complex dependencies. And my IoC container of choice, Autofac, is only too happy to auto-wire these so I don't have to worry about them. But mixed in among these are the simple settings values that guide the functionality of certain classes. A component/service model doesn't really apply to these values. No one string value is implementing the "string" service. The values can't be easily auto-wired because unlike the complex classes there is certain to be more than one important value floating around for the primitive type in question.

Our app processes image. Large files, and large volume. And worse, the working sets are very large, so we can't handle things piecemeal. We have a serious need in a few different places for disk caching of "active" entities that are still very much in flux. Having multiple caching points necessarily means that there are several parameters that may need to be tweaked to keep things optimized. For each cache we want to be able to set the disk location for the cache and the in-memory entity limit, just to start.

Taking a simple case with two cache points, we have two string settings and two integer settings. So already we are in a position where in order to inject the values, we'll need to do some sort of "switching". Some run-time decision of which string and which integer go where.

Pitfalls and Red Herrings

As I noted in my opening, I have solved this in several ways in the past. One is to hook into the IoC via Autofac's "OnPreparing" event, where I can supply values for particular constructor parameters. This is nice, because it means I can avoid setter injection. But it complicates the IoC bootstrapper by adding exceptions and special handling for particular classes. Just as undesirable, it couples the IoC bootstrapper directly to the settings mechanism.

What about setter injection? Autofac provides a post-construction OnActivated event that is perfect for setter injection, but this is subject to the exact same disadvantages as the pre-construction event. We could leave the setters alone and let some other object fill them in, but that leaves us with a couple different problems. First, there's just as much coupling, it's just outside the IoC, which may or may not be a marginal improvement depending on how it's implemented. If you end up with some class that must be aware of both the app settings mechanism and the class(es) that receive those settings then this is really not much of an improvement.

But beyond that, refraining from providing the values until after the components are obtained is undesirable for yet a few more reasons. First and foremost, it means that your services will exist in a state of incomplete initialization. The risk of getting hold of an incompletely initialized service makes calling code more brittle. And protecting against the possibility makes it more complex. Furthermore, setter injection for these particular types of values is undesirable because it implies they are variants. The truth is that the last thing you want is for some errant code to change the cache location on disk after a bunch of files have been stored there. And putting in protection against such post-initialization changes is pathologically unintuitive: it subverts the very nature and purpose of a setter.

So we've established that these direct injection routes are problematic in a number of ways. Let's move on to indirect injection. What does that mean? Basically it means putting a provider object in the middle. Our classes can take a dependency on the setting provider, which can wrap the settings mechanism itself, or act as a facade for a bundle of different mechanisms.

The option that at first appears simplest is to have a single settings provider object through which all the app's settings can be accessed. The classes can all depend on this object, which the IoC can provide with a singleton lifecycle if we desire, for maximum consistency. But now what we have essentially done is created a service locator for settings. This is another thing that's good to avoid for two reasons. For one, it creates a huge common coupling point, and for two, it violates "tell, don't ask". Why should my dependent class have to depend on a generic interface and worry about asking for the appropriate thing, when all it cares about is just that one thing?

This is especially dangerous if the app-side interface to your settings keeps them grouped just as they are in the user-side (i.e. the config file), as the build in .NET config mechanism is wont to do. The needs of the user for the purpose of managing settings individually or en masse are vastly different than the needs of the application whose behavior is driven by those settings. While a user likely thinks the most intuitive arrangement is for all the 15 of the paths to be bundled together, it's highly unlikely that any particular class in the application is going to care about more than one or two individual paths. And if the class doesn't need them, then they shouldn't be offered to it.

A Light in the Dark

So where do we go from here? If you can believe it after all this meandering and rambling, we're very close. From here we take a little tip from the DDD community: eschew primitives. If you think back to the beginning, the whole problem centers on the fact that primitives are just too darn generic. The type doesn't mean something specific enough for it to be an indicator of what exactly the dependency is. How do we fix this? Encapsulate the individual setting in a type specific to that need. Given that the explicit purpose of these classes will be to provide particular settings to the classes that need them, it is appropriate for these to couple to the configuration mechanism, whatever that may be, and more importantly, encapsulate it in useful bite-size chunks. And because the providers will themselves be injected where needed, the coupling is one-way, one level, down the layer hierarchy, which is arguably the best kind of coupling.

Show Me The Code

Enough talking. Now that I've set the stage, here's some code to furnish and light it.

First, the setting providers. As you can see, they tend to be nice and short and sweet.

Next, the caches that depend on them. Note how the setting providers are used in the constructors.

Finally, I'll show just how easy it can be to wire these up. If you bundle all your setting providers in one namespace, you can even safely auto-register them all in one fell swoop!

Objections?

There are a few things that I can anticipate people would object to. One is the potential for proliferation of tiny classes. I don't see this as a bad thing at all. I think it's fairly well established that small classes, and methods, with laser-focused responsibilities are far easier to maintain, evolve, and comprehend. I can say from personal experience that I am utterly convinced of this. And if anecdote isn't good enough for you, I'll add an appeal to authority to it =) Go read up on what the most respected programmers out there today are saying, and you'll see them express the same sentiment, and justify it very well.

Another thing I expect people to object to is that when taken as a whole, this pile of small classes looks like a bit of a heavy solution. And it is heavy in this context, where none of the actual app code is surrounding it. But nestle it inside a 50K line desktop app with hundreds of classes and it will start to look a lot better. For one, those classes and their namespace create sort of a bubble. It's a codespace that has boundaries and purpose. You know what's inside it, and you know what's not. It's a goal-oriented mental anchor to latch onto while you code, and that's a darn useful thing.