Automated Testing is NOT optional


Once upon a time there was a young apprentice who did not understand why automated testing was so critical. As he was under pressure to churn out magic for an evil wizard, he felt he did not have time to test his spells. In addition, he did not know how to tell whether the spells worked correctly. Thusly, he randomly Harry Pottered the incantations and hoped for the best. As he had learned not to fry off his own appendages, he felt this fire-and-observe method was adequate and saw no reason to change.

A medieval tale? Nay, I say. This is today. And I am finding that despite years of talking testing, and its variants, at conferences, on blogs and even in video courses, so many people still don’t get it. I will distill why automated testing is important and then see if I can drill into some details and get you back on the right course. And, if you are already doing everything I state in this post, then find a place with two-for-one spells and that second beer is on me. 😀

A cautionary tale

In this story, our hero has inherited an app. Well, not exactly, as the app has to be stabilized and released prior to takeover. The only thing standing in the way is testing. And, since there is precious little automated, the testing is manual. And slow. And error-ridden. And … okay, I will stop there, as I think you get the picture.

In each testing cycle, as things are fixed, the testing gets deeper and deeper, and finds more bugs. These bugs were not caught in the original runs, as the manual tests were running into surface bugs. As the surface bugs disappear, the testers have time to start hitting fringe cases. And, as these are wiped out, the testers start testing at the same time and run into cross-account bugs, meaning one person sees someone else’s data. Along the way, you see circular bugs, where fixing one brings up the other and vice versa.

Sound familiar? Every single one of these problems can be solved by automated testing. Every single one.

But, Greg, automated tests do not solve this problem. You still have to go after surface bugs (those on the happy path) and drill deeper.

True, but once I write an automated test, it is part of my library of tests which I can run over and over again. Yes, I can do this by creating test plans and checklists and manually testing, but – and here is the key – I have to rely on discipline to ensure all tests are completed each time. Plus, the automated test session is a few minutes in length (seconds, or less, in the case of unit testing) whereas my manual testing session may take hours. I have been playing this game for the past week and I have spent more than 8 hours, plus 8 hours of another employee’s time, to manually test a solution lacking automated tests.

Yes, there is the other side of the coin, which is the time necessary to write the tests. Initially, you take a productivity hit writing tests, as it takes time. They pay dividends when you get to maintaining your solution, however.

Here is what I have found in the past week (all of which could have been found through automated tests):

  • Buttons not wired up – UI testing
  • Items that do not delete – Unit testing
  • Items that appear to delete, but stay – UI testing
  • Validation failures – Unit tests (backend) and UI testing (front end)
  • Items that cross accounts – Unit testing and/or UI testing
  • Slow reports – Performance testing
  • Autosave issues – JavaScript unit testing

Every single test above could have been automated. Let’s start with the tooling, and then I will talk about maintainability and acceptability.

Lions and Tigers and Test Frameworks … Oh My!

One of the first questions I get asked is “which test framework is best?” My answer: YES.

Honestly, I don’t care what test framework you use. I most often use MSTest, which is included with Visual Studio. I sometimes augment with a BDD framework like SpecFlow. But, I have worked with NUnit, XUnit, JUnit and LMNOP (which stands for “insert your favorite here”). I can tell you strengths and weaknesses of some of them, but I have yet to find a case where one of the ones I have been using for some time just sucked. Start somewhere.

In your toolbelt, you should consider, at minimum:

  • A unit test framework – in .NET, this will likely be MSTest, NUnit or XUnit – as mentioned, I generally use MSTest. Wikipedia has a nice listing here which covers a plethora of languages.
  • A test runner – Visual Studio has one built in. Resharper adds its own, which has some nice features (you use Resharper, right?). NUnit comes with one, etc. In other words, don’t sweat the small stuff.
  • A mocking framework – I bounce between RhinoMocks and MOQ, but this has been a customer-driven decision.
  • A JavaScript unit test framework – Unless you never do web or hybrid mobile, I would definitely add this. I happen to gravitate to Jasmine, but have played a bit with JSUnit, Karma and Mocha.
  • A user interface focused test tool – Selenium is the main open source tool for this type of testing. I have picked up Katalon recently to test, as we are on Mac and PC now. Katalon sits on top of Selenium and works in Chrome, which was a requirement for us.

A few other things you might be interested in

  • A BDD Framework – I use SpecFlow in .NET, but there is a bit of a learning curve. Plus, you can write behavior-focused tests without making the BDD paradigm shift (see later).
  • An acceptance test tool – I happen to be fond of Fitnesse, as you can set it up with either Excel or a Wiki and let business add test conditions. It gets a bit strange when your tests magically fail because they have edited the wiki, but it sure shortens the feedback loop. Please note you can test for acceptance without a separate tool by making sure you identify all of the conditions.
  • Some type of static code analysis – Visual Studio contains some tooling, which you can accentuate with extensions like Microsoft Code Analysis. I also recommend having static code analysis as part of your delivery pipeline if you use Continuous Delivery disciplines.

It is also nice to have a means of getting to fringe conditions in your tests – Intellitest in Visual Studio Enterprise (the evolution of Pex) is a great tool for this (Pex and Moles were tools from Microsoft research that were included in earlier versions of Visual Studio).

Well, I don’t accept that

What are you trying to get from automated testing? When I ask this question, the general answer focuses on quality code. But what is quality code? Certainly bug free, or at least critical bug free, is a must. But is that enough? I say no.

When we look at development and maintenance, maintenance is the long pole. The better your tests are, the easier your solution is to maintain. This should be obvious, as the tests provide guides to ensure any maintenance fixes avoid creating additional bugs you now have to conquer (think about those circular, fixing A causes B and fixing B causes A, types of bugs).

But I think we have to go beyond quality alone and think in terms of acceptability. This means you have to ask questions to determine what behaviors are acceptable. More important, you have to ask the RIGHT questions. Below is a sample Q&A with a business user.

User: A person can enter a number between one and one hundred in this box.

Dev: What if they put in zero?

User: It should warn them.

Dev: Same with negative numbers?

User: Yes, and everything over one hundred as well.

Dev: Can we make it a drop down so they can only pick one to one hundred?

User: We find drop downs get too unwieldy for our users when there are 100 numbers.

Dev: And no letters, right?

User: Exactly.

Dev: Can we put in something that ignores letters typed in that field, and add hint text, like on the user name and password fields, saying 1 to 100?

User: Oh, both would be great.

From this conversation, we understand that acceptance includes a user interface that stops a user from entering anything other than the integers 1 to 100. For many developers, adding constraints to the user interface would be enough to ensure this. But one should understand the business code must also contain checks, because code should never trust user input.
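As a minimal sketch of that last point – assuming MSTest and a hypothetical QuantityValidator class, neither of which comes from the conversation above – the 1-to-100 rule lives in the business layer and gets its own test, regardless of what the UI blocks:

using Microsoft.VisualStudio.TestTools.UnitTesting;

public static class QuantityValidator
{
    // Business-layer check: never assume the UI stopped bad input.
    public static bool IsValid(int value)
    {
        return value >= 1 && value <= 100;
    }
}

[TestClass]
public class WhenValidatingAQuantityOfZero
{
    [TestMethod]
    public void ShouldBeRejected()
    {
        Assert.IsFalse(QuantityValidator.IsValid(0), "Zero should be rejected by the business layer, not just the UI.");
    }
}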

As you go through this exercise with your end users, you will find acceptability goes beyond quality alone. It deals with perception, as well. And users will help you create a more comprehensive test solution. In summary, think about both maintainability and acceptability as you build out your tests.

Software is a Bunch of BS

Some days I think of male bovine output when I see code, as some of it really smells. But when I talk about software being BS in conference talks, I am thinking about the dual aspects of code, which are behavior and state.

  • Your users are most concerned with state. When they pull out a record and change it, they want to be confident the new data is placed in the database, so they can find it in that state during their next session.
  • Your dev team is most concerned with behavior, or what they do to state (domain objects) as a user interacts with it. The behavior should result in the correct state changes.

This is why I recommend testing that focuses on behavior. I have, in the past, used a tool called SpecFlow when coding in .NET languages (SpecFlow can also be loaded directly into Visual Studio through extensions – I have the search here, as there are different extensions for different versions of Visual Studio; 2017 is here). SpecFlow requires a unit testing framework and works with many (MSTest, NUnit and xUnit are all included, although xUnit requires a separate NuGet package).

The upside of SpecFlow, for me, was when I pushed code over to teams in India. I could write the cucumber, generate the scaffolding and have them use BDD to create the code. For the first week or two, I would correct their tests, but once they got it down, I knew I could create the test scaffolds and get the desired results. The downside was the learning curve, as writing tests in cucumber takes some practice. There is also a paradigm shift in thinking from standard unit tests.

If SpecFlow is a bit too much of a curve right now, you can write behavior tests in a unit test framework by moving away from a single “UserTests” class with multiple tests to a single test per behavior. I will write a complete entry on this later, but the following pattern works well for behavior (I have the template outline in bullets and then a very simple example).

  • Class is named for the behavior being tested. In this case the class is named “WhenMultiplyingFiveTimesFive”
  • The code that runs the behavior is in the Class Initialization method
  • The variables accessed by the tests are kept in private static variables
  • The tests contain expected values and assertions

And here is the class.

[TestClass]
public class WhenMultiplyingFiveTimesFive {
    private static int _inputA = 5;
    private static int _inputB = 5;
    private static int _actual;

    [ClassInitialize]
    public static void ClassInitialize(TestContext context) {
        _actual = MathLib.Multiply(_inputA, _inputB);
    }

    [TestMethod]
    public void ShouldReturn25() //This can also be ThenIGet25 or similar (cucumber)
    {
        int expected = 25;
        Assert.AreEqual(expected, _actual, string.Format("Returned {0} instead of {1}.", _actual, expected));
    }
}

The downside here is you end up with a lot of individual test classes, many with a single test. As you start dealing with more complex objects, your _actual will be the object and you can test multiple values. On the positive side, I can generate a good portion of my tests by creating a simple program to read my domain models and auto-generate stub test classes. It can’t easily figure out the When condition, but I can stub in all of the Should results (i.e. the tests). If you opt for the SpecFlow direction, you will generate your scaffold test classes from cucumber statements, like:

    Given I am Logged In
    When I Call the Multiply method with 5 as A and 5 as B
    Then it returns 25

Or similar. This will write out annotated methods for each of the conditions above, which is much like a coding paint by numbers set.
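For illustration, the generated step bindings look roughly like the following (a sketch only – the class name and step wiring here are hypothetical, and the assertion assumes MSTest alongside SpecFlow):

using Microsoft.VisualStudio.TestTools.UnitTesting;
using TechTalk.SpecFlow;

[Binding]
public class MultiplySteps
{
    private int _result;

    [Given(@"I am Logged In")]
    public void GivenIAmLoggedIn()
    {
        // Arrange whatever login state the scenario needs.
    }

    [When(@"I Call the Multiply method with (.*) as A and (.*) as B")]
    public void WhenICallTheMultiplyMethod(int a, int b)
    {
        _result = MathLib.Multiply(a, b);
    }

    [Then(@"it returns (.*)")]
    public void ThenItReturns(int expected)
    {
        Assert.AreEqual(expected, _result, "The Multiply behavior did not return the expected value.");
    }
}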

Process This

I will have to draw this subject out in another blog entry, as testing discipline is very important. For the time being, I will cover the basics: the DOs and DO NOTs of a good testing strategy.

  • DO write tests for all behavior in your system
  • DO NOT write tests for domain objects. If the language vendor has screwed up getters and setters, you have a bigger problem than testing. One exception would be domain objects with behavior (although in my shop you would have to have a really good argument for including behavior in state objects)
  • DO ensure you have proper test coverage to match acceptability
  • DO NOT use code coverage metrics as a metric of success. 100% coverage with crap tests still yields crap software.
  • DO ensure every bug has a test. DO write the test BEFORE you fix the bug. If you cannot write a test that goes red for the condition, you do not understand the bug. If you do not understand the bug, you should not be attempting to fix it (see the sketch after this list).
  • DO ask questions of your users to better understand the user interactions. This will help you create tests focused on acceptability.
  • DO look at fringe conditions and conditions outside of the box.
  • DO add tests prior to refactoring (assuming there are not adequate tests on the code at the time).
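On the “test before fix” point, here is a minimal sketch (the bug itself is hypothetical, reusing the MathLib example from earlier): the test goes red against the broken code, proves you understand the bug, and stays in the library as a regression guard once the fix turns it green.

using Microsoft.VisualStudio.TestTools.UnitTesting;

[TestClass]
public class WhenMultiplyingByZero
{
    [TestMethod]
    public void ShouldReturnZero()
    {
        // Hypothetical bug report: Multiply returns the non-zero operand when one input is zero.
        // Write this first, watch it fail, then fix the code and watch it pass.
        int actual = MathLib.Multiply(5, 0);
        Assert.AreEqual(0, actual, "Multiplying by zero should return zero.");
    }
}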

None of this is automagic. You will have to instill this discipline in yourself and your team. There are no free lunches.

With legacy code, prioritize what needs to be tested. If you are replacing certain sections of the application completely, writing tests for them is probably a waste of time. Also focus first on the things that need to be updated, as a priority, so you have the tests completed. I should note you may have to reorganize a bit prior to adding tests. This is dangerous. But, using refactoring tools (Visual Studio or Resharper), you can lessen the risk. You need only pull things out far enough that you can add unit tests, so extract method is your best friend here.
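As a sketch of what “pull things out far enough” looks like (the names here are hypothetical, not from the app in question): extract the calculation you care about into its own method, leave the surrounding legacy code alone, and put a test on the extracted piece.

using Microsoft.VisualStudio.TestTools.UnitTesting;

public class InvoiceService
{
    // Extracted from a much larger legacy method that mixed data access and formatting.
    // Behavior is unchanged; the extraction just gives the unit test something to grab onto.
    public static decimal CalculateDiscount(decimal subtotal, int yearsAsCustomer)
    {
        if (yearsAsCustomer >= 5) return subtotal * 0.10m;
        if (yearsAsCustomer >= 2) return subtotal * 0.05m;
        return 0m;
    }
}

[TestClass]
public class WhenCalculatingDiscountForAFiveYearCustomer
{
    [TestMethod]
    public void ShouldReturnTenPercent()
    {
        Assert.AreEqual(10m, InvoiceService.CalculateDiscount(100m, 5));
    }
}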

Summary

If you have never read the book The Checklist Manifesto, I would advise considering it for your bookshelf. The book focuses on how checklists have been used to save lives. Literally! Once you see the importance of a checklist, you can start thinking about automating the checks. Then, you will understand why automated testing is so important.

In this post, I focused on the very high level, mostly aiming at what the developer can do up front to greatly increase quality AND acceptability of the solutions coded. I strongly adhere to the idea of testing as much as is possible. Once I see a pattern in the tests, I generate them (often with Excel, which is a whole story in and of itself). But I focus on ensuring every condition I can think of is put in a test. I will use code coverage as a means of discovering areas I don’t have tests in. This does not mean I will create a test for every piece of code (domain objects being a prime example), but that I use the tools to help me help myself … and my team.

Peace and Grace,
Greg

Twitter: @gbworld

Why EF Code First Sucks … and how to fix it if you must use it


NOTE: EF is a tool. As with all tools, it has pros and cons. The headline is designed to get you to examine your usage.

I am a contrarian in many cases. I look at what is in front of me and ask one of the following questions:

  • When would I use this thing I don’t like?
  • When would I not use this thing I do like?

By constantly questioning, I find myself being a good devil’s advocate. It also keeps me focused on solving problems the “right way”, which means adding in the variables as I find them and validating my knee-jerk reaction over and over again. Sometimes this means not adopting the latest fad, no matter how awesome it is, as it will cause too much disruption (read: too much risk).

As a developer, I think you should explore everything you find and constantly be curious. As an architect, and now a manager, I think you have to take into account how many moving parts you are introducing before making a decision to add something new. I am also a big believer in examining pros and cons before striking out. This does not mean I don’t make very quick decisions at times, but that I strive to always do at least some research before firing up the engine and traveling away from home.

For me, Entity Framework makes things easy and much faster to develop. But, it also puts you in a box, like all software designed to speed up development (even Visual Studio puts you in a box, just a larger one). There are some projects that fit in the box, but some that do not. As I work on mission-critical applications, in an Enterprise setting, I find the maintainability and acceptability of the solution outweigh time to market in most cases. This makes EF less attractive to me, especially since most EF developers I have met understand the basics and not much more. When you build software any idiot can use, I can guarantee you some of the people using it are idiots. The easier it is, the more idiots.

In particular, this article is focused on issues I am having with a current project I am testing. The problems I have found are not unique, however, as I have seen them in other instances. In this case, there are a few additional issues that exacerbate the problem. And that takes us to “why this post?”

To put it plainly, if you are going to use EF Code First, don’t use the defaults. Let’s dive in.

The problem with EF Code First

Let me be fair. This is not really a problem with EF Code First as much as understanding how to use it correctly. The issue I see is so many people DON’T know how to use it correctly. In fact, it seems to be a one-size-fits-all tool for so many people. And a great number of them are ignorant of the pros and cons of the product and the cause-and-effect nature of the issues it creates when used with defaults. And, as mentioned in the last section, some are idiots. 😉

Auto-generation is attractive. I utilize my skill in coding domain objects, push build and deploy and voila (wah-lah for those who hate French?). Now I have a site. If I just went with defaults, however, I have all of the following problems.

  • My database looks just like my business objects.
  • My interfaces to my data are CRUDdy (simply create, read, update and delete on tables that simply reconstitute my business objects)
  • My database is hard to optimize if looking just like my business objects is not the optimal way to solve it.
  • If I use GUIDs, which are great for certain scenarios (multiple live servers, creation of IDs on disconnected devices, etc.) the default clustered index can create issues down the line (mostly fragmentation)
  • If I use strings in my business objects, they are NVARCHAR(MAX) in my database
  • By default, I have no constraints (other than the auto-generated primary keys). This means no foreign keys in my database.
  • Filtering is done on the user interface.

Once again, I fully realize much of this is ignorance on how you SHOULD design your application with EF Code First, but with no warnings or guidance in Visual Studio to stop the developer from doing this, I am finding this type of bad EF Code First design is more the rule than the exception.

NOTE: I use the term business objects rather than domain models in this post, as I think the lack of domain focus exacerbates this problem. I will write about domain models in a future post and why a domaincentric design matters (I use the made-up word domaincentric to distinguish from a purist Domain Driven Design approach – see: Eric Evans).

Annotating with Attributes

I can cure most of the ills with attributes on my business objects. This includes putting keys of the correct type, adding max lengths and avoiding naked strings on objects. I will go through each of the issues I have discovered (not a complete list).

  • Clustered keys on GUIDs (uniqueidentifier data type in SQL Server)
  • NVARCHAR(MAX) fields
  • Foreign keys

Clustered Keys on Guids

When you use a clustered key, the items in a table are stored in the order of the key. With numeric keys, this means 1 comes before 2, etc. A common key type is an IDENTITY field, or auto-incrementing integer, which works nicely here. As each key is created, the new data is placed after the previous data. You do not end up with table fragmentation very often, which makes maintenance of your database easier.

As long as you continue to add to the end of the table, the clustered key is very efficient. But, when you stick something in the middle of existing records, it causes some extra work, as it fragments the data table. Without getting too much into the internals of RDBMS, or SQL Server in particular, the underlying pages get fragmented. If the data completely fills a page, the fragmentation is cured by reorganizing pages. This is not trivial, but it is far less intrusive than when multiple rows exist in a single page and the internal data of the page has to be reorganized. Yes, there are things you can do to lessen the blow, like altering the fill percentage of the page, but there is still a risk of fragmentation.

Why is fragmentation bad? It reduces the efficiency of the database, which can impact performance.

When you use a Globally Unique IDentifier (GUID, which is represented as a uniqueidentifier in SQL Server), the GUIDs may be created sequentially, but this is not guaranteed, especially over time. This has to do with the algorithm used to create GUIDs to “guarantee” they are unique (there is still a minuscule risk of creating 2 GUIDs that are alike, but it would be extremely rare – roughly 1 in 2^128, or 1 in 340,282,366,920,938,463,463,374,607,431,768,211,456). When new GUIDs are created, they can be sequential for quite some time. Then they are not. And over time, you end up with a lot out of sequence. And you start fragmenting. And performance suffers. And little children start crying … really loud.

Fortunately, this is easy to fix.

Version 1: Fragmentation city

[Key]
public Guid Id { get; set; }

This is roughly the same as the following in SQL Server (the table and constraint names will vary with your model):

CREATE TABLE [dbo].[Orders] (
    [Id] UNIQUEIDENTIFIER NOT NULL,
    CONSTRAINT [PK_dbo.Orders] PRIMARY KEY CLUSTERED ([Id] ASC)
);

Version 2: Non-fragmented bliss

Please read this entire section before attempting (Why? Because I am a smart a**). Let’s solve this problem.

[Key(NonClustered)]
public GUID Id { get;set;}

Bliss!

Yes, I am screwing with you. Entity Framework leaves NO WAY to add a non-clustered primary key in Code First with data annotations. So, you have to sling some code. This will require at least EF 6.2 (released a year ago, so you are on it if you are new to EF). In the OnModelCreating override, add the following (substituting your own entity type):

modelBuilder.Entity<YourEntity>().HasKey(k => k.Id, config => config.IsClustered(false));

I have also seen it done like this (very slight variation).

p.HasKey(k => k.ColumnId).ForSqlServerIsClustered(false);

The important takeaway is that you dink with IsClustered, setting it to false. Both create a migration that sets up a non-clustered index.
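For context, here is roughly where that code lives – a sketch assuming EF 6.2 and a hypothetical Order entity with a Guid key; adjust the names to your own model:

using System;
using System.Data.Entity;

public class Order
{
    public Guid Id { get; set; }
}

public class StoreContext : DbContext
{
    public DbSet<Order> Orders { get; set; }

    protected override void OnModelCreating(DbModelBuilder modelBuilder)
    {
        // Keep the GUID as the primary key, but make the index non-clustered
        // so out-of-sequence inserts do not fragment the table.
        modelBuilder.Entity<Order>()
            .HasKey(o => o.Id, config => config.IsClustered(false));

        base.OnModelCreating(modelBuilder);
    }
}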

Cats and dogs sleeping together … and world peace.

If you are prior to EF 6.2, I feel really sorry for you. You can create non-clustered indexes, but not non-clustered primary keys. In that case, I would stop using Code First and cry. 🙂

NVARCHAR(MAX)

Before getting into the problem, I want to rant a bit. NVARCHAR(MAX) was created to solve a problem that, for the most part, did not exist. The problem is this:

What if I have a field that normally contains a manageable amount of data, but occasionally contains the entire contents of War and Peace?

I see this as a possibility, but a very rare one. In most cases, when you have a s**t ton of data in a field, that is the norm, and when you don’t, that is the norm. Usually you can either constrain or not. And, in cases where you have small amounts of data, it is more often from bad test plans on BLOB (Binary Large OBject) fields where the tester does not want to copy and paste and wait.

NOTE: I am willing to listen to cases where NVARCHAR(MAX) is a proper design choice, as I realize I do not have all of the answers. Just ping me and let’s argue over it. 😉

Okay, I am being a bit too snarky. Let me regain my composure.

NVARCHAR(MAX) is designed to add some performance gains for the rare “I can either type in my name or enter War & Peace” scenarios. Anything small enough to fit into the SQL Server page is put in the page. Anything that is not gets placed in a type of BLOB storage. This means it can be much faster on records that fit in the page and only slow down on those that overflow.

Sounds great, right? Yes, but it has some implications. You cannot index an NVARCHAR(MAX) field in SQL Server. I think this should be obvious to anyone who thinks about it, but let me describe the scenario and see if you can spot the problem.

  • You have a table with NVARCHAR(MAX)
  • There are 1,000,000 records in the table
  • You add the contents of War & Peace into 1/2 the records
  • You create an index on that field

Do you see the problem? Hint: how big is the index?

To prevent someone from creating a multi-terabyte index on a field, SQL Server does not let you index any BLOB field, even one, like NVARCHAR(MAX), that is pseudo-BLOB.

But, guess what EF does on strings? If you guessed “creates an NVARCHAR(MAX) field” give yourself a golden star.

Is this a problem? Probably not in most cases, but if you know you are creating a field like city, will you ever have a city name that contains 1,000,000 characters? Not until we explore Zeekbart in the Omega quadrant (they are very verbose aliens) or start coding in Entish (the Ents’ language from LOTR, for the non-Tolkien fans).

Fortunately, you can solve this one very, very easily. Just add a data annotation.

[MaxLength(100)]
public string City { get; set; }

 

Walla Walla Washington. Now it will no longer create a MAX field and you can index city. And, yes, this is a common use case, although I would consider extracting the city information into its own table if you wish to use it as part of your queries (one primary reason for indexing this field is ad hoc searching on demographic fields, like “how many users in New York City had a problem”).

Foreign Keys

This is not an issue, as attributing for foreign keys is extremely easy to do. The problem is I find many people using EF Code First FAIL to do it. Your code is auto-generating your database, folks. And constraints are a necessary element of ensuring the database catches bad data, throwing an exception when you try to put bad data in. Never rely on the brilliance of your developers to keep your company in business. Instead, constrain the database so they can’t mess things up.

I ran into this problem (not EF, but lack of constraints) with a client in the early 2000s. I was asked to look at a problem with their data. When I logged on and examined the schema, I found no foreign keys. Asking the dev manager, he stated “the foreign keys were causing errors, so we removed them”. I said, “of course they were causing errors, because the code was set up to create data problems.” I got paid for identifying the problem, but they were unhappy I could not fix the missing data. Needless to say, they are out of business now. All because of a lack of foreign keys.

Fortunately, you can avoid this type of embarrassment by adding … dun, dun, duh … foreign keys. Guess how? Yes, data annotations.

[ForeignKey("KeyName")]
public int OtherTableId { get; set; }

I am not going to cover it in this post, but you can do composite (multiple field) keys, as well. I try not to use composite keys as a rule. Not because I am against the idea, per se, but because I find it harder on coders in many instances. If you start with the idea each table has a derived key as its primary key, it is very clean. You can then refactor the idea where a composite key would be better.
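As a fuller illustration (Product and Review are hypothetical names, not from any particular schema), the attribute typically pairs a foreign key property with a navigation property, and the generated migration then includes the constraint:

using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

public class Product
{
    [Key]
    public int Id { get; set; }
}

public class Review
{
    [Key]
    public int Id { get; set; }

    // The scalar column that holds the key value.
    public int ProductId { get; set; }

    // Ties the navigation property to the ProductId column, so the generated
    // migration creates a real foreign key constraint in the database.
    [ForeignKey("ProductId")]
    public virtual Product Product { get; set; }
}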

The Data Looks the Same

This is more of a design issue, but EF makes it very easy to simply pass objects back and forth between layers. If you adhere to principles of Domain Driven Design, you understand each domain is unique in how it treats models. Let’s take an eCommerce company as an example. In particular, let’s look at the customer object.

  • When an order is taken, a customer ID is added. For the sales department, when the order is retrieved, the customer information, including addresses, is needed to track down any issues.
  • For the warehouse, there is no need to have a customer object. You only need an order number with the items to pull.
  • For the shipping department, you need an address and phone number and nothing else.

I could go on, but if you use EF in default mode, you will be tempted to have a single domain and filter on the user interface.

This is what was done in an application I recently tested. Grab everything for all roles and then filter out what is not needed. But, since the application was not designed out enough, the following problems surfaced.

  • Some smaller tables were retrieved in toto and not filtered for the location or the user. Thus, if I added something, someone in another location saw it, even though they should not have.
  • As binding was not constrained, the tables had all of the information. If I grabbed a user object, I got all of the data if it was not filtered correctly.

Sure, the above have some other design issues that contribute to the problem, but EF in default forces you to filter after retrieval. When you do not, you are likely adding extra work, which may not be justified over other methods of data access (a topic for another day).

The bigger issue is the database storage looks just like the domain model. And, if you are using the EF defaults and auto-generation, you end up stuck in this paradigm. Data can be optimized with a few tools, like indexes, but once you figure it out, you need to make sure the changes are reflected in the EF model. In some cases, this is hard, and that which is hard is rarely tried (until someone threatens to fire you – then again, they threaten and you find a new job, so it is still not done).
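One way out of the “grab everything and filter in the UI” trap is to shape the query per consumer. A minimal sketch, assuming the hypothetical StoreContext from earlier plus an Order entity with a Customer navigation property (the ShippingInfo class and property names are mine, for illustration only):

using System;
using System.Linq;

// Shipping only needs an address and phone number, so ask the database for exactly that.
public class ShippingInfo
{
    public string Address { get; set; }
    public string Phone { get; set; }
}

public class ShippingQueries
{
    public ShippingInfo GetShippingInfo(Guid orderId)
    {
        using (var context = new StoreContext())
        {
            // The projection filters and trims the data at the database,
            // rather than handing the full customer graph to the UI.
            return context.Orders
                .Where(o => o.Id == orderId)
                .Select(o => new ShippingInfo
                {
                    Address = o.Customer.ShippingAddress,
                    Phone = o.Customer.Phone
                })
                .SingleOrDefault();
        }
    }
}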

In Summary

I am not a huge fan of Entity Framework in the Enterprise (important caveat?). Not because it does not work in many cases, but because it has a lot of moving parts and some that are missing. In most cases, you end up having to use custom migrations to get around the fact it is so easy to step outside of the box created for you. Once you start slinging code with every migration, you start asking whether going to scripted database changes is a better approach than data migrations in EF (I will argue that it IS in a future post, but your mileage may vary and I accept that each tool has its pros and cons).

A hill I will die on is “defaults are bad”, at least when it comes to Entity Framework in an Enterprise setting. I will also state you have to annotate your data if you are going to use your model to generate it. I would prefer you don’t design your RELATIONAL database with OOP objects at all, but your mileage may vary.

Peace and Grace,
Greg

Twitter: @gbworld