Requirements and Acceptance Criteria Are Critical

Over the past few months, I have worked on a project with precious few requirements and no acceptance criteria. Essentially, the dictate is that done means “keep coding until I say it is fine”. The problem is that this insanity never ends, as changes are placed in the queue before the coding is “done”.

There are a few myths I have seen in action in various businesses. Here they are with the reality following:

  • Agile means loose requirements and very little documentation – This is false. Agile needs requirements and documentation as much as any other methodology; the requirements are just stated in the form of user stories, and some of the documentation is in the form of unit tests.
  • Business needs to be fluid, so we should not tie down “done” with a firm set of criteria – In reality, you need to define done as explicitly as you can. You may request changes to the definition over time, but a developer without a clear path is as likely to code away from the solution as towards it.
  • Hire a bunch of really smart developers and you can get around firm specifications and acceptance criteria – While very smart developers are faster at coding their way out of a hole, and less likely to code inescapable holes, there was only one real miracle worker that ever walked the face of the earth … and it is not your developer.

    NOTE that really smart developers often over-engineer or overcomplicate solutions when left to their own devices.

We keep trying to view planning (with its outcome of requirements and a definition of done) as an optional exercise. This is generally done in the name of time, but how much time is blown having a developer rewrite code? In my experience, not defining what is being built has always turned out more expensive than spending time up front to plan. If you are truly overloaded, rather than avoid planning, you should force planning. Otherwise, you will spin at the end of the project, trying to “plan” after development, which means figuring out how to get the code to do what you want in the fewest hours. This is an exercise in low quality and relies on a lot of luck.

A couple of things I have noticed in my career:

  • “Do everything this {website} does in a {mobile application}” is not a requirement (you can change the {} types for your situation, if you like). As much as you might think it is a requirement, what is being done is up for interpretation, even if the basic business rules are the same, and the developer’s interpretation is as likely to differ from yours as to match it. This is especially true when moving from one form factor to another (ex: web to mobile) or from one coding paradigm to another (ex: webforms to MVC).
  • “Code this until I am happy with it” is a recipe for disaster. While it leaves a lot of flexibility for the person who wants to be “happy” to massage the outcome, it causes badly bolted on code as new rules are manufactured. In the lucky cases, the code can be refactored to some level of quality, but this is more of an exception than a rule. Ouch!

Let’s take a quick, recent example. The requirement, as loosely stated, was to put together a tree of categories for eCommerce and append products to the tree, then create a detailed file for the products. The acceptance criterion was “I will let you know when it is right”. Here is part of the history:

  1. Add categories from another source
  2. Exclude some categories
  3. Replace images found in data source with a dictionary of categories and images
  4. Use a template to create a tree where none existed and append to a specific spot in the tree
  5. Create a featured products file
  6. Append that file to a specific node in the main tree file
  7. Exclude categories with no children or products
  8. Scrub HTML for certain nodes in the XML document, replacing many of the characters with HTML escape sequences
  9. Search for certain “illegal XML characters” and replace them (if acceptance criteria were built, it might have been as simple as convert from UTF-8 to UTF-7 – live and learn?)
  10. Do a binary scrub of characters not caught with the loose ruleset in #9
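
Had rules 9 and 10 been written down as a criterion, they might have read “the output contains only characters XML 1.0 allows”. A minimal sketch of that one rule, assuming Python and a hypothetical function name `scrub_xml_text` (this is an illustration, not the code from the project):

```python
import re

# Everything outside the Char production of the XML 1.0 specification:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def scrub_xml_text(text: str, replacement: str = "") -> str:
    """Remove (or replace) characters that XML 1.0 does not allow."""
    return _ILLEGAL_XML_CHARS.sub(replacement, text)


if __name__ == "__main__":
    # NUL and vertical tab are illegal in XML 1.0; tab is allowed.
    print(repr(scrub_xml_text("Caf\u00e9 \x00menu\x0b\ttab")))
```

One explicit rule like this replaces both the “loose ruleset” and the follow-up binary scrub, because the definition of an illegal character is stated once instead of being discovered a character at a time.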

All in all, numerous hours (weeks?) were burned due to the lack of planning on many levels, equating to tens of thousands of dollars. If this is the rule in the organization, and you add all of the projects up, how many hundreds of thousands, perhaps millions, of dollars is the company burning through that could be avoided by setting down requirements and a definition of done?
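
To make “definition of done” concrete, here is a sketch of how rule 7 from the history above (exclude categories with no children or products) could have been stated as an executable acceptance test before coding started. The `Category` type and `prune_empty` function are hypothetical names, and the decision that a category whose only children are empty is itself empty is my assumption. That is exactly the kind of question acceptance criteria force you to answer up front.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Category:
    name: str
    products: List[str] = field(default_factory=list)
    children: List["Category"] = field(default_factory=list)


def prune_empty(category: Category) -> Optional[Category]:
    """Return a copy of the tree without empty categories.

    A category is empty when it has no products and no non-empty children.
    Returns None when the category itself is empty.
    """
    kept = [c for c in (prune_empty(child) for child in category.children)
            if c is not None]
    if not kept and not category.products:
        return None
    return Category(category.name, list(category.products), kept)


def test_empty_categories_are_excluded():
    tree = Category("Root", children=[
        Category("Shoes", products=["sku-1"]),
        Category("Empty"),
        Category("Parent of empty", children=[Category("Also empty")]),
    ])
    pruned = prune_empty(tree)
    assert [c.name for c in pruned.children] == ["Shoes"]


if __name__ == "__main__":
    test_empty_categories_are_excluded()
    print("acceptance criterion met")
```

When the test passes, the feature is done. If the business later changes its mind about “Parent of empty”, the change shows up as an edit to the test, not as a surprise in User Acceptance testing.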

What if requirements change during development? If this is not the best reason to use Agile methodologies, I don’t know what is. But, even if you use waterfall, you stop development on the feature(s) in question and refine the requirements before moving forward. To either a) let the development continue or b) continue altering output until “happy” is like dining in a really crappy fine dining establishment … a lot of money for a really bad meal.

The point here is you can’t plan a trip without knowing where you are going. By the same token, you can’t build an application without knowing what done looks like.

Peace and Grace,

Twitter: @gbworld

Why Planning is Important

I have been working on a project for quite some time that is considered time sensitive. Due to the time sensitive nature, actually creating full requirements and acceptance criteria is considered frivolous, so the concept is code and verify, based on some unknown standard. It is not quite that bad, but when someone (generally me) argues for planning, I am told the following are equally valid ways of approaching development for this project.

  1. Creating requirements and acceptance criteria (i.e. planning) and then coding to match the requirement and acceptance criteria.
  2. Coding based on a loose, one sentence, “specification” and then fixing issues based on User Acceptance testing.

In short, the group has opted for a “show me what you have and I will then tell you what is wrong with it” versus “tell me what you want and I will show you what you have told me you desire”. While we may ultimately end up with the same product, I disagree that these are “equally valid ways” of developing a new application.

To illustrate this, think back to your last vacation. If you have not had one in a long time, you should go on one soon, but if you can’t, imagine your last business trip. It went something like this:

You woke up at some time during the day (or night) and packed enough clothes for a week’s worth of time. Your spouse and children did the same thing. You then got in the car and drove to the airport and stated to the ticket agent “I have $1500 to fly and 4 people, so I need whatever ticket is the closest to $375 per person for a round trip”. Once you had your tickets, you got on the plane and flew.

When you landed, you went down to the car rental agencies and got a car, using the same logic. “I have $500 to spend on a car, so give me whatever car I can get for a week with that much money.”

You then drove off out of the airport and got on the road whichever direction fit your whim and drove until you found a hotel that seemed to be within your budget. You went up and got a room.

Once in the room, you looked at the local guidebook and figured out what types of things you could do in this town with the amount of day you had left and went out and had fun.

Does this sound even remotely like your last vacation (or trip)? No? Why not? Well, if you are a normal person, you can see all types of problems with this. Here are a few:

  1. What if $375 only buys a ticket to Idaho, in the middle of winter? Did you pack for snow or the beach?
  2. What if you can’t find a flight with 4 tickets available?
  3. What if the car rental agency does not have a car that fits your budget?
  4. What if the hotel does not have rooms available?
  5. What if the town you landed in has nothing you are interested in seeing?

I am sure I can think of more, but the point is that you would not schedule a vacation out of town without some idea of where you were going to go and where you were going to stay, right? But you would be this careless with a “mission critical” project?

There is a belief in some circles that not knowing everything about the final product means you should not nail anything down. To me, this is about as stupid as stating “I can’t book a room at the Hilton for our vacation because I am not sure what I am going to order for breakfast, lunch and dinner every day we are on vacation”. Yes, there are some parts of the project you may not be able to figure out right now, but not planning anything as a result means you are going to, at best, spend a lot more time (i.e. money) to develop the product.

What has happened on my current assignment is I am now getting back pieces to fix. They are logged as defects, but without criteria for what is right, nothing, outside of failure to work, is really a defect. And, since there are no criteria, “defects” are being found in serial. Take one step forward, find a “bug”, fix that one bug. Rinse. Repeat until all the “bugs” are washed out. The biggest issue here is we cannot, with any confidence, state when the final product is going to be ready for release, as there is no firm definition of what done looks like.

This topic has been covered ad nauseam, of course. Steve McConnell has some interesting insights in his book “Software Project Survival Guide”, which I think should be mandatory reading for every manager who needs IT work done.

From McConnell’s company site, I have found a couple of neat gems from his book. The first talks about when a defect is created and how much it costs to correct. This is shown below.

If you do not have time to write requirements and work out the architecture and design, every defect is a requirements defect. Pretty expensive proposition, yet this is precisely what is being done.

Here is a set of images from Steve McConnell’s own site that describe perception versus reality. Management often believes this is the optimal development path:

They accept that there is a certain amount of unproductive time (meaning time not moving forward towards the end of the project, whether reworking bad code or going to get coffee), but they don’t want the burden of planning, which is seen like the picture below:

If I get rid of the planning (process), I end up with more productive time. The problem is the reality looks more like this:

Without proper planning, what started as a small amount of thrashing ends up consuming a huge amount of time, as more and more “defects” are discovered and more and more code has to be reworked based on differing understandings of what is acceptable and what done looks like. In addition, more and more time gets consumed in process, including planning how to get from “what we now have” to “what is acceptable”. One thing missing from this diagram that is present in the book is the dot just before process and thrashing consume 100% of daily efforts. That dot indicates the lucky projects that get released before all of the participants drown in the sea of non-productivity.

The ideal situation is enough planning to understand both what done is and how to determine each feature is acceptable to fit that picture of done. This is shown in the final graphic linked from McConnell’s site.

In short: planning works. If you want to read McConnell’s entire treatise on this topic, click here.

My next topic, which I will post in a few days, will cover why “hiring rock stars” is not an adequate alternative to proper planning.

SOA Lessons: Insulate Your Client

In my last post, I lamented the idea of creating an external layer where all of the services sit in a single project. In this post, I would like to talk about the idea of passing data directly from internal services through an External API without mapping the data. This topic was covered a bit differently in a previous post if you would like to look at the issue in greater detail.

As a disclaimer, this post focuses on a particular set of problems I have seen with immature External APIs that may not manifest themselves in more mature SOAs.

External API consumers are valuable to business. If they aren’t, there is no reason to have an External API. Initially, the external clients may not serve much value to your business, but over time they do. This is especially true in an ECommerce situation, where your clients are using your APIs to help you sell. But this “truism” applies to all sorts of businesses.

From the standpoint of the development team, the external consumer is often “out of sight, out of mind”. Unless you are attempting a coordinated effort (a topic for another blog post?), your support and infrastructure teams will often have more of an idea of what your clients are doing than the development team.

The primary solution would seem to be “make the external client a first class citizen” and this is a dynamite idea. The problem is external clients are often not easily monetized, making it hard to place a priority on them.

Standard SOA

The title of this section is a bit misleading, as it suggests I am going to guide you through the creation of a “standard” SOA. Instead, I am going to cover a generic SOA setup I see in many organizations.

At the bottom layer of the SOA, services are created to expose persisted state (data). The services at this level may present objects that represent rows in database tables, especially in less mature systems, but more often try to distill the data a bit to serve up objects needed by various business applications.

On top of this layer, you find services that aggregate multiple services. In this layer, or these layers, of the SOA, models are created that can be consumed by user interfaces. And it is at this layer that many SOAs create the objects consumed by both internal and external clients.

The topmost layer comes in two flavors. The first is what my current client calls a “domain services layer”, which serves to map the aggregate service model to a domain model for the external world to consume. The other is a pass-through layer, in which the objects of the aggregate layer pass unchanged through the external service, which is then not much more than an externally visible endpoint.

Insulating the Client Through Process

When the external layer is a pass-through layer, breaking changes to objects on the aggregate layer will not always break the pass-through itself. When the external layer’s own code needs no changes, the service references are updated and the service is deployed, but the clients can no longer consume the changed objects.

To insulate the client from these changes requires process. A proper versioning strategy must be put in place and older versions of the internal services must be kept up until all clients have moved to the latest version. This also means external clients must be included in the testing process, even if their inclusion is only during the User Acceptance cycle.

In general, this works best if the External API team has some measure of governance over the internal services it consumes. At minimum, the external team needs the ability to issue a go – no go order when breaking changes occur in the pass through models or force the internal service to leave an older version alive while clients migrate to the new version.

If process and a versioning strategy are not in place, it is dangerous to maintain your external API as a pass through layer.
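
As a rough illustration of the “keep the older version alive until clients migrate” rule, here is a minimal in-memory sketch. The `VersionedService` class, the version labels, and the vendor names are all hypothetical; in a real SOA the same rule would live in your deployment and governance process rather than in one class.

```python
class VersionedService:
    """Keeps every published version callable until no client is pinned to it."""

    def __init__(self):
        self._handlers = {}  # version -> callable that serves requests
        self._pinned = {}    # client id -> version that client consumes

    def publish(self, version, handler):
        self._handlers[version] = handler

    def pin(self, client_id, version):
        if version not in self._handlers:
            raise KeyError(f"unknown version: {version}")
        self._pinned[client_id] = version

    def call(self, client_id, request):
        return self._handlers[self._pinned[client_id]](request)

    def retire(self, version):
        # The "go / no go" check: refuse to retire a version that is in use.
        still_using = sorted(c for c, v in self._pinned.items() if v == version)
        if still_using:
            raise RuntimeError(
                f"cannot retire {version}: still used by {still_using}")
        del self._handlers[version]


def build_example():
    service = VersionedService()
    # v2 changes the shape of "price", which is a breaking change for v1 clients.
    service.publish("v1", lambda req: {"sku": req["sku"], "price": 12.99})
    service.publish("v2", lambda req: {
        "sku": req["sku"],
        "price": {"amount": 12.99, "currency": "USD"},
    })
    service.pin("vendor-a", "v1")
    service.pin("vendor-b", "v2")
    return service
```

In this sketch, `retire("v1")` raises while vendor-a is still pinned to it and succeeds only after vendor-a has been moved to v2.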

Insulation through Domain Models

Another means of insulation is to set up a Domain Model for the external API and “map” the returns from the internal service(s) to the External Domain Model. The main negatives to this approach are a bit of overhead (performance) and a maintenance requirement, especially when the maps are hand coded.

The benefit is breaking changes are instantly seen when the service reference is updated, as the changes now break the mapping. This insulates the client from change, as this breaking change has to be fixed. As the exception can often be fixed through mapping changes alone, the external client need not know the software was even changed, so the external API service can undergo a minor, rather than a major, revision.
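
A minimal sketch of such a hand-coded map, with made-up type and field names (`InternalProduct`, `ExternalProduct`, `to_external`). In a statically typed WCF project the break would surface at compile time; in this Python version it would surface the first time a test or type checker exercises the mapper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InternalProduct:
    """Model owned by the internal aggregate service (free to change)."""
    sku: str
    title: str
    price_cents: int


@dataclass(frozen=True)
class ExternalProduct:
    """Contract published to external clients (changes only on a major version)."""
    id: str
    name: str
    price: str


def to_external(product: InternalProduct) -> ExternalProduct:
    # The only place that knows both shapes. If the internal team renames
    # "title" or changes how price is stored, this function breaks and is
    # fixed here, while ExternalProduct stays exactly as clients expect it.
    dollars, cents = divmod(product.price_cents, 100)
    return ExternalProduct(
        id=product.sku,
        name=product.title,
        price=f"{dollars}.{cents:02d}",
    )


if __name__ == "__main__":
    print(to_external(InternalProduct(sku="A1", title="Mug", price_cents=1299)))
```

The overhead is one small function per model, which is the maintenance cost mentioned above.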

Process or Domain Modeling?

The question becomes which you should use … process? domain modeling? both?

In my experience, attempting to solve a technological problem with process alone (or, expecting people to follow standards and rules) only works in an organization where software is reviewed completely prior to deployment. I have certainly worked in organizations with this commitment, but for every company I have consulted for that had governance over process, there are 10 that do not.

This leads me to the domain modeling approach and insulation through adherence to contract. The entire underlying system may introduce breaking changes, but as long as the changes can be mapped to a consistent model, the outside world has no clue these breaking changes occurred.

The Cost of Change

As a kind of summary, I want to talk about how much change costs, or more specifically, how breaking changes to external APIs can be very costly.

Initially, an External API is a cost center. Until there is adoption, there is no financial benefit to having clients consume your data. But, even at this stage, failures in the API cause clients to shy away from using it. Enough issues and the instability of your API becomes public and most sensible people avoid it. This ensures the API will never be anything other than a cost center. Worse, it can be the proverbial albatross around your neck.

Over time, the stability of the API and its contracts becomes more critical, as more and more clients begin to use the service. At this point, a breaking change is generally caught sooner, but rollbacks become extremely expensive.

While setting up mapping to an external domain model is not a cure all, it certainly adds a measure of safety for your clients to use the service.

The Point

The point here is changes that break software internally may be annoying, but they can most often be fixed without the outside world having any clue there is turmoil. Changes that break external software are more dangerous, as they cannot be hidden from view. By enforcing proper versioning strategies and treating the external API as a first class citizen, you can protect yourself from these public breakages. In cases where these processes cannot be put into place, breaking changes can be hidden from the outside world by insulating the clients by mapping to an external model.

From my experience, adding the small bit of overhead to protect the client from internal changes is the safest path, especially when you are in the early stages of creating processes to properly version change. If your organization is mature enough to have proper governance on releases and properly tests both internal AND external responses, a process driven approach is an option. And, if true performance is a showstopper, it may be your better option. Otherwise, adding a bit of insulation for clients is a very wise idea.

SOA Lessons: Don’t Put All Your Eggs in one Basket

When we think of the term “don’t put all of your eggs in one basket” in terms of IT, we more often think of not relying on a single vendor solution for everything. This is certainly a valid way to look at the term, but in SOA it also means we have to look at services and not just layers.

To put this in perspective, I am currently working on an external API solution for a large E-Comm company. In the current External API, which was released before I moved into the group, all of the services are found in a single “wrapper” project. On the plus side, it makes it easier to configure the services, as all of the configuration is found in a single web.config file. There are many negatives, however, which I will detail in this post.

Single Point of Failure

This is the largest negative in production. Recently, the solution was released and we discovered a vendor was overloading one of the services. The issue was discovered during deployment and manifested itself as failures in a wide variety of services. The reason for the systemic failure is the worker process for the failing service was the same worker process for all of the services. When the process was brought down, all suffered.

Fortunately, we knew which service was live and most likely to be the culprit, so we were able to alleviate the problem. If this had not been the case, we would have had to pore through logs to determine the offender, which could have taken hours with the current state of affairs. That is unacceptable.

If you take this scenario a bit further and assume a coding error that kills the worker process rather than a load issue, imagine the troubleshooting. Especially if the exceptional case causing the failure was intermittent and due to the lack of a patch on the server in question.

One good reason to separate out the service endpoints is so you can move the services into their own process space. This is not mandatory in normal working conditions, but if there is an issue, and instrumentation and monitoring are not catching the culprit, separating the service into its own process accomplishes two things:

  1. Helps identify the point of failure
  2. Protects other services from failure

Both of the above are worthwhile reasons to separate each service into its own process space as a rule, and then make exceptions based on needs.
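
The two benefits can be shown with a toy sketch, using plain Python processes to stand in for worker processes. The service names and bodies are invented, and the services run one after another here purely to keep the example short.

```python
import subprocess
import sys

# Each "service" is a tiny script; "orders" simulates a crashing service.
SERVICES = {
    "catalog": "print('catalog ok')",
    "orders": "import sys; sys.exit(3)",
    "search": "print('search ok')",
}


def run_isolated(services):
    """Run each service in its own process and collect the exit codes."""
    results = {}
    for name, body in services.items():
        completed = subprocess.run(
            [sys.executable, "-c", body],
            capture_output=True,
            text=True,
        )
        results[name] = completed.returncode
    return results


if __name__ == "__main__":
    for name, code in run_isolated(SERVICES).items():
        status = "healthy" if code == 0 else f"FAILED (exit code {code})"
        print(f"{name}: {status}")
```

Because each service has its own process, the failing one is identified by name and the other two finish normally. With a single shared process, the first crash would have taken the rest down with it.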

Single Point of Deployment

This negative is similar to the last negative, but focuses more on what it takes to fix a service that is in error. If all of the services exist in a single project, then all must be deployed at the same time, even if only one service has changes.

I think anyone reading this can see why this is a negative, but it is more insidious than moving pieces you should not have to move. Every time a software project is deployed, there is a potential for error. There are a variety of reasons why this is so, but we have all experienced a deployment that created a buggy condition. Often the bug shows up in code we did not even change, and often it is due to circumstances that have nothing to do with the code.

If you deploy a service that has no updates, and cause bugs due to mistakes in deployment, you have done a great disservice. Worse, the disservice was completely avoidable if you had simply segregated out the service so it did not require deployment with the other services.

Cross Contamination

When numerous service endpoints are added to a single WCF project, they will have different contracts, but often end up with the same binding rules and behaviors. This is normal and to be expected. Eventually, however, one service will be found to require more time to complete its work or a large payload (either request or response or both?) and edits will have to be made to the configuration file.

Changes that are specific to the contract and service endpoint are unlikely to have consequences outside of the service in question. But since at least parts of the configuration are shared, changes can impact the “sharing” services in a negative way. And, since we deploy at the same time, but often only test the service being updated, these negative consequences are often discovered by our consumers rather than our test team.
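
One way to picture the alternative is per-service settings derived from a default rather than edits to the shared default itself. The sketch below is not WCF configuration; the `BindingConfig` fields, values, and service names are hypothetical stand-ins for the kind of timeout and message-size settings a binding carries.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BindingConfig:
    send_timeout_seconds: int = 60
    max_message_bytes: int = 65536


# Frozen, so no service can change the defaults out from under the others.
DEFAULT_BINDING = BindingConfig()

SERVICE_BINDINGS = {
    "catalog": DEFAULT_BINDING,
    "orders": DEFAULT_BINDING,
    # Only the slow, large-payload service gets the relaxed limits.
    "bulk_export": replace(
        DEFAULT_BINDING,
        send_timeout_seconds=600,
        max_message_bytes=10 * 1024 * 1024,
    ),
}

if __name__ == "__main__":
    for name, binding in SERVICE_BINDINGS.items():
        print(name, binding)
```

Raising the limits for one service leaves the others on the limits they were tested with, so there is nothing for the consumers of those services to discover.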

The Point

The point of all this is we can easily avoid the negatives explained in this post by making sure every deployable endpoint (in this case every WCF service) has its own project. Sharing a single solution is not an issue, although you will also want to make solutions for the individual services. NOTE: Solutions are points of organization, and a single project can exist in any number of solutions.

Separating out the services has a small negative of forcing you to make changes to multiple projects for “universal” binding and behavioral standards, but this negative is outweighed by the negatives of placing all of the services in a single project.
