Automated Testing is NOT optional


Once upon a time there was a young apprentice who did not understand why automated testing was so critical. As he was under pressure to churn out magic for an evil wizard, he felt he did not have time to test his spells. In addition, he did not know how to tell if the spells worked correctly. Thusly, he randomly Harry Pottered the incantations and hoped for the best. As he had learned not to fry off his own appendages, he felt this fire and observe method was adequate and did nothing to change.

A medieval tale? Nay, I say. This is today. And I am finding despite years of talking testing, and it’s variants, at conferences, on blogs and even in video courses, so many people still don’t get it. I will distill why automated testing is important and then see if I can drill into some details and get you back on the right course. And, if you are already doing everything I state in this post, then find a place with two for one spells and that second beer is on me. 😀

A cautionary tale

In this story, our hero has inherited an app. Well, not exactly, as the app has to be stabilized and released prior to takeover. The only thing standing in the way is testing. And, since there is precious little automated, the testing is manual. And slow. And error ridden. And … okay, I will stop there, as I think you get the picture.

In each testing cycle, as things are fixed, the testing gets deeper and deeper, and finds more bugs. These bugs were not caught in the original runs, as the manual tests were running into surface bugs. As the surface bugs disappear, the testers has time to start hitting fringe cases. And, as these are wiped out, the testers start testing at the same time and run into cross account bugs, meaning one person sees someone else’s data. Along the way, you see circular bugs, where fixing one brings up the other and vice versa.

Sound familiar? Every single one of these problems can be solved by automated testing. Every single one.

But, Greg, automated tests do not solve this problem. You still have to go after surface bugs (those on the happy path) and drill deeper.

True, but once I write an automated test, it is part of my library of tests which I can run over and over again. Yes, I can do this by creating test plans and checklists and manually testing, but – and here is the key – I have to rely on discipline to ensure all tests are completed each time. Plus, the automated test session is a few minutes in length (seconds, or less, in the case of unit testing) whereas my manual testing session may take hours. I have been playing this game for the past week and I have spent more than 8 hours, plus 8 hours of another employee’s time, to manually test a solution lacking automated tests.

Yes, there is the other side of the coin, which is the time necessary to write the tests. Initially, you take a productivity hit writing tests, as it takes time. They pay back dividends when you get to maintaining your solution, however.

Here is what I have found in the past week (all of which could have been found through automated tests):

  • Buttons not wired up – UI testing
  • Items that do not delete – Unit testing
  • Items that appear to delete, but stay – UI testing
  • Validation failures – Unit tests (backend) and UI testing (front end)
  • Items that cross accounts – Unit testing and/or UI testing
  • Slow reports – performance testing
  • Autosave issues – JavaScript unit testing

Every single test above could have been automated. Let’s start there and I will talk about maintainability and acceptability first.

Lions and Tigers and Test Frameworks … Oh My!

One of the first questions I get asked is “which test framework is best?” My answer: YES.

Honestly, I don’t care what test framework you use. I most often use MSTest, which is included with Visual Studio. I sometimes augment with a BDD framework like SpecFlow. But, I have worked with NUnit, XUnit, JUnit and LMNOP (which stands for “insert your favorite here”). I can tell you strengths and weaknesses of some of them, but I have yet to find a case where one of the ones I have been using for some time just sucked. Start somewhere.

In your toolbelt, you should consider, at minimum:

  • A unit test framework – in .NET, this will likely be MSTest, NUnit or XUnit – as mentioned, I generally use MSTest. Wikipedia has a nice listing here which covers a plethora of languages.
  • A test runner – Visual Studio has one built in. Resharper adds its own, which has some nice features (you use Resharper right?). NUnit comes with one, etc. In other words, don’t sweat the small stuff.
  • A mocking framework – I bounce between RhinoMocks and MOQ, but this has been a customer driven decision.
  • A JavaScript unit test framework – Unless you never do web or hybrid mobile, I would definitely add this. I happen to gravitate to Jasmine, but have played a bit with JSUnit, Karma and Mocha.
  • A user interface focused test tool – Selenium is the main open source tool for this type of testing. I have picked up Katalon recently to test, as we are on Mac and PC now. Katalon sits on top of Selenium and works in Chrome, which was a requirement for us.

A few other things you might be interested in

  • A BDD Framework – I use SpecFlow in .NET, but there is a bit of a learning curve. Plus, you can write behavior focused tests without making the BDD paradigm shift (see later)
  • An acceptance test tool – I happen to be fond of Fitnesse, as you can set it up with either Excel or a Wiki and let business add test conditions. It gets a bit strange when you tests magically fail because they have edited the wiki, but it sure shortens the feedback loop. Please note you can test for acceptance without a separate tool by making sure you identify all of the conditions
  • Some type of static code analysis – Visual Studio contains some tooling, which you can accentuate with extensions like Microsoft Code Analysis. I also recommend having static code analysis as part of your delivery pipeline if you use Continuous Delivery disciplines.

It is also nice to have a means of getting to fringe conditions in your tests – Intellitest in Visual Studio Enterprise (the evolution of Pex) is a great tool for this (Pex and Moles were tools from Microsoft research that were included in earlier versions of Visual Studio).

Well, I don’t accept that

What are you trying to get from automated testing? When I ask this question, the general answer focuses on quality code. But what is quality code? Certainly bug free, or at least critical bug free, is a must. But is that enough? I say no.

When we look at development and maintenance, maintenance is the long pole. The better your tests are, the easier your solution is to maintain. This should be obvious, as the tests provide guides to ensure any maintenance fixes avoid creating additional bugs you now have to conquer (think about those circular, fixing A causes B and fixing B causes A, types of bugs).

But I think we have to go beyond quality alone and think in terms of acceptability. This means you have to ask questions to determine what behaviors are acceptable. More important, you have to ask the RIGHT questions. Below is a sample Q&A with a business users.

User: A person can enter a number between one and one hundred in this box.

Dev: What if they put in zero?

User: It should warn them.

Dev: Same with negative numbers?

User: Yes, and everything over one hundred as well.

Dev: Can we make it a drop down so they can only pick one to one hundred?

User: We find drop downs get too unwieldy for our users when there are 100 numbers.

Dev: And no letters, right?

User: Exactly.

Dev: Can we put in something that ignores when someone types in letters in that field and text like the user name and password fields saying 1 to 100.

User: Oh, both would be great.

From this conversation, we understand that acceptance includes a user interface that stops a user from entering anything other than the integers 1 to 100. For many developers, adding constraints to the user interface would be enough to ensure this. But one should understand the business code must also contain checks, because code should never trust user input.

As you go through this exercise with your end users, you will find acceptability is beyond quality alone. It deals with perception, as well. And users will help you create a more comprehensive test solution. In summary, think both about maintainability and acceptability as you.

Software is a Bunch of BS

Some days I think of male bovine output when I see code, as some of it really smells. But when I talk about software being BS in conference talks, I am thinking about the dual aspects of code, which are behavior and state.

  • Your users are most concerned with state. When they pull out a record and change it, they want to be confident the new data is placed in the database, so they can find it in that state during their next session.
  • Your dev team is most concerned with behavior, or what they do to state (domain objects) as a user interacts with it. The behavior should result in the correct state changes.

This is why I recommend testing that focuses on behavior. I have, in the past, used a tool called Specflow when coding in .NET languages (SpecFlow can also be loaded directly into Visual Studio through extensions – I have the search here, as there are different extensions for different versions of Visual Studio. 2017 is here). SpecFlow requires a unit testing framework to work and works with many (MSTest, nUnit and xUnit are all included, although xUnit requires a separate NuGet package).

The upside of SpecFlow, for me, was when I pushed code over to teams in India. I could write the cucumber and generate the scaffolding and have the use BDD to create the code. For the first week or two, I would correct their tests, but once they got it down, I knew I could create the test scaffolds and get the desired results. The downside was the learning curve, as writing tests in cucumber takes some practice. There is also a paradigm shift in thinking from standard unit tests.

If SpecFlow is a bit too much of a curve right now, you can write behavior tests in a unit test framework by moving away from a single “UserTests” class with multiple tests to a single test per behavior. I will write a complete entry on this later, but the following pattern works well for behavior (I have the template outline in bullets and then a very simple example).

  • Class is named for the behavior being tested. In this case the class is named “WhenMultiplyingFiveTimesFive”
  • The code that runs the behavior is in the Class Initialization method
  • The variables accessed by the tests are kept in private static variables
  • The tests contain expected values and assertions

And here is the class.

[TestClass]
public class WhenMultiplyingFiveTimesFive {
    private static _inputA = 5;
    private static _inputB = 5;
    private static int _actual;

    [ClassInitialize]
    public static ClassInitialize(TestContext context) {
        _actual = MathLib.Multiply(_inputA, _inputB);
    }

    [TestMethod]
    public void ShouldReturn25() //This can also be ThenIGet25 or similar (cucumber)
    {
        int expected = 25;
        Assert.AreEqual(expected, _actual, string.Format("Returned {0} instead of {1}.", _actual, expected);
    }
}

The downside here is you end up with a lot of individual test classes, many with a single test. As you start dealing with more complex objects, your _actual will be the object and you can test multiple values. On the positive side, I can  generate a good portion of my tests by creating a simple program to read my domain models and auto generate stub test classes. It can’t easily figure out the When condition, but I can stub in all of the Should results (i.e. the tests). If you opt for the SpecFlow direction, you will generate your scaffold test classes from cucumber statements, like:

    Given I am Logged In
    When I Call the Multiply method with 5 as A and 5 as B
    Then it returns 25

Or similar. This will write out annotated methods for each of the conditions above, which is much like a coding paint by numbers set.

Process This

I will have to draw this subject out in another blog entry, as testing discipline is very important. For the time being, I will cover the basics you should include the DOs and DO NOTs of a good testing strategy.

  • DO write tests for all behavior in your system
  • DO NOT write tests for domain objects. If the language vendor has screwed up getters and setters, you have a bigger problem than testing. One exception would be domain objects with behavior (although in my shop you would have to have a really good argument for including behavior in state objects)
  • DO ensure you have proper test coverage to match acceptability
  • DO NOT use code coverage metrics as a metric of success. 100% coverage with crap tests still yields crap software.
  • DO ensure every bug has a test. DO write the test BEFORE you fix the bug. If you cannot write a test that goes red for the condition, you do not understand the bug. If you do not understand the bug, you should not be attempting to fix it.
  • DO ask questions of your users to better understand the user interactions. This will help you create tests focused on acceptability.
  • DO look at fringe conditions and conditions outside of the box.
  • DO add tests prior to refactoring (assuming there are not adequate tests on the code at the time).

None of this is automagic. You will have to instill this discipline in yourself and your team. There are no free lunches.

With legacy code, prioritize what needs to be tested. If you are replacing certain sections of the application completely, writing tests is probably a waste of time. Also focus on the things that need to be updated first, as a priority, so you have the tests completed. I should note you may have to reorganize a bit prior to adding tests. This is dangerous. But, using refactoring tools (Visual Studio or Rehsharper), you can lessen the risk. You need only pull things out far enough you can add unit tests, so extract method is your best friend here.

Summary

If you have never read the book The Checklist Manifesto, I would advice considering it for your bookshelf. The book focuses on how checklists have been used to save lives. Literally! Once you see the importance of a checklist, you can start thinking about automating the checks. Then, you will understand why automated testing is so important.

In this post, I focused on the very high level, mostly aiming at what the developer can do up front to greatly increase quality AND acceptability of the solutions coded. I strongly adhere to the idea of testing as much as is possible. Once I see a pattern in the tests, I generate them (often with Excel, which is a whole story in and of itself). But I focus on ensuring every condition I can think of is put in a test. I will use code coverage as a means of discovering areas I don’t have tests in. This does not mean I will create a test for every piece of code (domain objects being a prime example), but that I use the tools to help me help myself … and my team.

Peace and Grace,
Greg

Twitter: @gbworld

Why EF Code First Sucks … and how to fix it if you must use it


NOTE: EF is a tool. As with all tools, it has pros and cons. The headline is designed to get you to examine your useage.

I am a contrarian in many cases. I look at what is in front of me and ask one of the following questions:

  • When would I use this thing I don’t like
  • When would I not use this thing I do like

By constantly questioning, I find myself being a good devil’s advocate. It also keeps me focused on solving problems the “right way”, which means adding in the variables as I find them and validating my knee jerk reaction over and over again. Sometimes this means not adopting the latest FUD, no matter how awesome it is, as it will cause too much disruption (read: too much risk).

As a developer, I think you should explore everything you find and constantly be curious. As an architect, and now a manager, I think you have to take in account how many moving parts you are introducing before making a decision to add something new. I also am very strong at examining pros and cons before striking out. This does not mean I don’t make very quick decisions at times, but that I strive to always do at least some research before firing up the engine and traveling away from home.

For me, Entity Framework makes things easy and much faster to develop. But, it also puts you in a box, like all software designed to develop (even Visual Studio puts you in a box, just larger). There are some projects that fit in the box, but some that do not. As I work on mission critical applications, in an Enterprise setting, I find the maintainability and acceptability of the solution to outweigh time to market in most cases. This makes EF less attractive to me, especially since most EF developers I have met understand the basics and not much more. When you build software any idiot can use, I can guarantee you some of the people using it are idiots. The easier it is, the more idiots.

In particular, this article is focused on issues I am having with a current project I am testing. The problems I have found are not unique, however, as I have seen them in other instances. In this case, there are a few additional issues that exacerbate the problem. And that takes us to “why this post?”

To put it plainly, if you are going to use EF Code First, don’t use the defaults. Let’s dive in.

The problem with EF Code First

Let me be fair. This is not really a problem with EF Code First as much as understanding how to use it correctly. The issue I see is so many people DON’T know how to use it correctly. In fact, it seems to be a one-size fits all tool for so many people. And a great number of them are ignorant about the pros and cons of the product and the cause and effect nature of the issues it creates when used in default. And, as mentioned in the last section, some are idiots. 😉

Auto-generation is attractive. I utilize my skill in coding domain objects, push build and deploy and voila (wah-lah for those who hate French?). Now I have a site. If I just went with defaults, however, I have all of the following problems.

  • My database looks just like my business objects.
  • My interfaces to my data are CRUDdy (simply create, read, update and delete on tables that simply reconstitute my business objects)
  • My database is hard to optimize if looking just like my business objects is not the optimal way to solve it.
  • If I use GUIDs, which are great for certain scenarios (multiple live servers, creation of IDs on disconnected devices, etc.) the default clustered index can create issues down the line (mostly fragmentation)
  • If I use strings in my business objects, they are NVARCHAR(MAX) in my database
  • By default, I have no constraints (other than the auto-generated primary keys). This means no foreign keys in my database.
  • Filtering is done on the user interface.

Once again, I fully realize much of this is ignorance on how you SHOULD design your application with EF Code First, but with no warnings or guidance in Visual Studio to stop the developer from doing this, I am finding this type of bad EF Code First design is more the rule than the exception.

NOTE: I use the word business objects rather than domain models in this post, as I think the lack of domain focus exacerbates this problem. I will write about domain models in a future post and why domaincentric design (I use the made up domaincentric word to distinguish from a purist Domain Driven Design approach – see: Eric Evans).

Annotating with Attributes

I can cure most of the ills with attributes on my business objects. This includes putting keys of the correct type, adding max lengths and avoiding naked strings on objects. I will go through each of the issues I have discovered (not a complete list).

  • Clustered keys on GUIDs (uniqueidentifier data type in SQL Server)
  • NVARCHAR(MAX) fields
  • Foreign keys

Clustered Keys on Guids

When you use a clustered key, the items in a table are stored in the order of the key. With numeric keys, this means 1 comes before 2, etc. With a common key type being an IDENTITY field, or autoincrementing integer, this type of field works nicely. As each key is created, the new data is placed after previous data. You do not end up with table fragmentation very often, which makes maintenance of your database easier.

As long as you continue to add to the end of the table, the clustered key is very efficient. But, when you stick something in the middle of existing records, it causes some extra work, as it fragments the data table. Without getting too much into the internals of RDBMS, or SQL Server in particular, the underlying pages get fragmented. if the data completely fills a page, the fragmentation is cured by reorganizing pages. This is not trivial, but it is far less intrusive than when multiple rows exist in a single page and the internal data of the page has to be reorganized. Yes, there are things you can do to lessen the blow, like altering the fill percentage of the page, but there is a risk of fragmentation.

Why is fragmentation bad? It reduces the efficiency of the database, which can impact performance.

When you use a Globally Unique IDentified (GUID, which is represented as a uniqueidentifier in SQL Server), the GUIDs may be created sequentially, but this is not guaranteed, especially over time. This has to do with the algorithm to create GUIDs to “guarantee” they are unique (there is still a minuscule risk of creating 2 GUIDs that are alike, but it would be extremely rare – 2^128 or 1 in 340,282,366,920,938,000,000,000,000,000,000,000,000 – a very minuscule risk). When a new GUID is created, it can be sequential for quite some time. Then it is not. And over time, you end up with a lot out of sequence. And you start fragmenting. And performance suffers. And little children start crying … really loud.

Fortunately, this is easy to fix.

Version 1: Fragmentation city

[Key()]
public GUID Id { get;set;}

This is the same as the following in SQL Server.

tbd

Version 2: Non-fragmented bliss

Please read this entire section before attempting (Why? Because I am a smart a**). Let’s solve this problem.

[Key(NonClustered)]
public GUID Id { get;set;}

Bliss!

Yes, I a screwing with you. Entity Framework leaves NO WAY to add a non-clustered primary key in Code First with data annotations. So, you have to sling some code. This will require at least EF 6.2 (released a year ago, so you are on this if you are new to EF). In the OnModelCreating event, add the following:

modelbuilder.Entity().HasKey(k => k.Id, config => config.IsClustered(false));

I have also seen it done like this (very slight variation).

p.HasKey(k => k.ColumnId).ForSqlServerIsClustered(false);

The important take away is you dink with Is Clustered and setting it to false. Both create a migration that sets up a non-clustered index.

Cat’s and dogs sleeping together … and world peace.

If you are prior to EF 6.2, I feel really sorry for you. You can create non-clustered indexes, but not non-clustered primary keys. In that case, i would stop using Code First and cry. 🙂

NVARCHAR(MAX)

Before getting into the problem, I want to rant a bit. NVARCHAR(MAX) was created to solve a problem that, for the most part, did not exist. The problem is this:

What if I have a field that normally contains a manageable amount of data, but occasionally contains the entire context of War and Peace?

I see this as a possibility, but a very rare one. In most cases, when you have a s**t ton of data in a field, that is the norm, and when you don’t, that is the norm. Usually you can either constrain or not. And, in cases where you have small amounts of data, it is more often from bad test plans on BLOB (Binary Large OBject) fields where the tester does not want to copy and paste and wait.

NOTE: I am willing to listen to cases where NVARCHAR(MAX) is a proper design choice, as I realize I do not have all of the answers. Just ping me and let’s argue over it. 😉

Okay, I am being a bit too snarky. Let me regain my composure.

NVARCHAR(MAX) is designed to add some performance gains for the rare “I can either type in my name or enter War & Peace” scenarios. Anything small enough to fit into the SQL Server page is put in the page. Anything not, is placed in a type of BLOB storage. This means it can be much faster on records that fit in the page and only slow down on those that overflow.

Sounds Great, right? Yes, but it has some implications. You cannot index an NVARCHAR(MAX) field in SQL Server. I think this should be obvious for anyone who things about it, but let me describe the scenario and see if you can spot the problem.

  • You have a table with NVARCHAR(MAX)
  • There are 1,000,000 records in the table
  • You add the contents of War & Peace into 1/2 the records
  • You create an index on that field

Do you see the problem. Hint: How big is the index.

To solve the fact a person could create a multiple terrabyte index on a field, you cannot index using any BLOB field, even one, like NVARCHAR(MAX), that is pseudo-blob.

But, guess what EF does on strings? If you guessed “creates an NVARCHAR(MAX) field” give yourself a golden star.

Is this a problem? Probably not in most cases, but if you know you are creating a field like city, will you ever have a city that contains 1,000,000 characters? Not until we explore Zeekbart in the Omega quadrant (they are very verbose aliens) or start coding in Entish (LOTR’s creatures for the non-Tolkein fans).

Fortunately, you can solve this one very, very easily. Just add a data annotation.

[MaxLength(100)]
public string City { get; set; }

 

Wala Wala Washington. Now it will no longer create a MAX field and you can index city. And, yes, this is a common use case, although I would consider extracting the city information into it’s own table if you wish to use it as part of your queries (one primary reason for indexing this field is ad hoc searching on demographic fields, like “how many users in New York City had a problem”).

Foreign Keys

This is not an issue, as attributing for foreign keys is extremely easy to do. The problem is I find many people using EF Code first FAIL to do it. Your code is auto-generating your database folks. And constraints are a necessary element of ensuring the database catching bad data by throwing an exception when you try to put bad data in. Never rely on the brilliance of your developers to keep you company in business. Instead constrain the database so they can’t mess things up.

I ran into this problem (not EF, but lack of constraints) with a client in the early 2000s. I was asked to look at a problem with their data. When I logged on an examined the schema, I found no foreign keys. Asking the dev manager, he stated “the foreign keys were causing errors, so we removed them”. I said, “of course they were causing errors, because the code was set up to create data problems.” I got paid for identifying the problem, but they were unhappy I could not fix the missing data. Needless to say, they are out of business now. All because of a lack of foreign keys.

Fortunately, you can avoid this type of embarassment by adding … dun, dun, duh … foreign keys. Guess how? Yes, data annotations.

[ForeignKey(“KeyName”)] public int OtherTableId { get; set; }

I am not going to cover it in this post, but you can do composite (multiple field) keys, as well. I try not to use composite keys as a rule. Not because I am against the idea, per se, but because I find it harder on coders in many instances. If you start with the idea each table has a derived key as its primary key, it is very clean. You can then refactor the idea where a composite key would be better.

The Data Looks the Same

This is more of a design issue, but EF makes it very easy to simply pass objects back and forth between layers. If you adhere to principles of Domain Driven Design, you understand each domain is unique in how it treats models. Let’s take an eCommerce company as an example. In particular, let’s look at the customer object.

  • When an order is taken, a customer ID is added. For the sales department, when the order is retrieved, the customer information, including addresses, is needed to track down any issues.
  • For the warehouse, there is no need to have a customer object. You only need an order number with the items to pull.
  • For the shipping department, you need an address and phone number and nothing else.

I could go on, but if you use EF in default mode, you will be tempted to have a single domain and filter on the user interface.

This is what was done in an application I recently tested. Grab everything for all roles and then filter out what is not needed. But, since the application was not designed out enough, the following problems surfaced.

  • Some smaller tables were retrieved in toto and not filtered for the location or the user. Thus, if I added something, someone in another location saw it, even though they should not have.
  • As binding was not constrained, the tables had all of the information. If I grabbed a user object, I got all of the data if it was not filtered correctly.

Sure, the above have some other design issues that contribute to the problem, but EF in default forces you to filter after retrieval. When you do not, you are likely adding extra work, which may not be justified over other methods of data access (a topic for another day).

The bigger issue is the database storage looks just like the domain model. And, if you are using the EF defaults and auto-generation, you end up stuck in this paradigm. Data can be optimized with a few tools, like indexes, but once you figure it out, you need to make sure the changes are reflected in the EF model. In some cases, this is hard, and that which is hard is rarely tried (until someone threatens to fire you – then again, they threaten and you find a new job, so it is still not done).

In Summary

I am not a huge fan of Entity Framework in the Enterprise (important caveat?). Not because it does not work in many cases, but because it has a lot of moving parts and some that are missing. In most cases, you end up having to use custom migrations to get around the fact it is so easy to step outside of the box created for you. Once you start slinging code with every migration, you start asking whether going to scripted database changes is a better approach than data migrations in EF (I will argue that it IS in a future post, but your mileage may vary and I accept that each tool has its pros and cons).

A hill I will die on is “defaults are bad”, at least when it comes to Entity Framework in an Enterprise setting. I will also state you have to annotate your data if you are going to use your model to generate it. I would prefer don’t design your RELATIONAL database with OOP objects, but your mileage may vary.

Peace and Grace,
Greg

Twitter: @gbworld

Microservices in .NET part 2: Silver Bullets and Free Lunches?


I have spent the better part of this week digging into micro services and I love the idea. Here are some benefits I see that can be realized by using a microservices approach:

  1. The granularity level allows developers to stay on a single context while solving a problem. This singularity of focus makes it much easier to dig into the details of a specific object or set of objects. In many cases, it will quickly expose poor planning in an organization and provide rationale for fixing the process. As an example, a service that has a lot of churn is probably one that was not planned out well (not talking finding additional uses, but rather having to rebuild contracts and re-architect the service on a regular basis)
  2. The services are simple,making it easy to maintain the individual components.
  3. The methodology forces a good separation of concerns.
  4. You can use the best tool for the job rather than stick to a single platform, programming language or paradigm, etc. This is a dual edged sword, as I will uncover a bit later.
  5. Isolated solution problems can easily be fixed without much impact. If you find your employee microservice has an issue, you can fix it without deploying the entire solution.
  6. Working with multiple services enables the use of concepts like Continuous Integration (CI) and Continuous Delivery (CD). This is also dual edged, as you almost have to go to a full blown CD implementation to use microservices architectures. I will hit this later, as well.
  7. You can get multiple teams working independently of each other. This was always possible, of course, as I have pointed out in my Core As Application blog entries (one here), if you will take the time to plan out you contracts and domain models first. (NOTE: In 2010, I was told “you cannot design contracts first, as you don’t know all of the requirements up front”. By 2011, I proved this wrong by delivering using a contract first approach, both ahead of time and under budget – a bit of planning goes a long way).
  8. Systems are loosely coupled and highly cohesive.

This is just a short list of the benefits. The problem I see is everyone is focusing on the benefits as if we have finally found the silver bullet (do you have werewolves in your organization?) and gained a free lunch. This article focuses on some of the downsides to microservices.

DevOps

As you move to smaller and smaller services, there are many more parts that have to be deployed. In order to keep the solutions using the microservices up and running, you have to be able to push the services out to the correct location (URI?) so they can be contacted properly by the solutions using them. If you go to the nth degree, you could conceivably have tens, if not hundreds, of small services running in an Enterprise.

As each service is meant to be autonomous, this means you have to come up with a strategy for deployment for each. You also have to plan for high availability and failover. And there has to be a solid monitoring and instrumentation strategy in place. In short, you need all of the pieces of a good API Management strategy in place, and you need to do this BEFORE you start implementing microservices. And I have not even started focusing on how everything is wired together and load balancing of your solutions. On the plus side, once you solve this problem, you can tune each service independently.

There is a burden on the Dev side that needs to be tackled up front, as well. You need to start thinking about the requirements for monitoring, tracing and instrumenting the code base and ensure it is part of the template for every service. And you have to plan for failure, which is another topic.

As a final point on this topic, your dev and ops team(s) must be proficient in the combined concept of DevOps to have this be a success. Developers can no longer pitch items over to Ops with a deployment document. They have to be involved in joint discussions and help come up with plans for the individual services, as well as the bigger solutions.

Planning for Failure and Avoiding Failure

Services will fail. Looking at the plethora of articles on microservices, it is suggested you use patterns like circuit breakers (avoid hitting a failed service after a few attempts) and bulkheads (when enough “compartments” are “under water”, seal the rest of the solution from the failure point). This is a fine avoidance strategy, but what if the service failing is a critical component to the solution working?

Not mentioned in the articles I have read is a means of managing services after failure. Re-deploying is an option, and you can make redeployment easier using quickly set up virtual environments and/or containers, but what if reaching that portion of the network is the point of failure. I would love to hear comments on this next idea: Why not look at some form of registry for the services (part of API Management, similar to UDDI, etc.) or a master finder service that exists in various locations and that all applications are aware of. Another idea would be to include as part of the hyper-media specification backup service locations. But either of these solution further exacerbates the reliance on DevOps, creating even more need for planning solutions and monitoring released solutions.

I don’t see microservices working well without a good CI/CD strategy in place and some form of API management. The more I look into microservices the more I see the need for a system that can discover its various points on the fly (which leads me back to the ideas of using a finder service or utilizing hypermedia to inform the solutions utilizing microservices where other nodes exist).

Contracts and Versioning

When you develop applications as a single Visual Studio solution (thinking in terms of projects and not products?), you have the ability to change contracts as needed. After all, you have all of the code sitting in front of you, right? When you switch to an internal focus on services as released products, you can’t switch out contracts as easily. You have to come up with a versioning strategy.

I was in a conversation a few weeks ago where we discussed versioning. It was easy to see how URI changes for REST services required versioning, but one person disagreed when I stated changes to the objects you expose should be a reason for versioning in many instances. The answer was “we are using JSON, so it will not break the clients if you change the objects”. I think this topic deserves a side bar.

While it is true JSON allows a lot of leeway in reorganizing objects without physically breaking the client(s) using the service, there is also a concept of logical breakage. Adding a new property is generally less of a problem, unless that new element is critical for the microservice. Changing a property may also not cause breakage up front. As an example, you change from an int to a long as planning for the future. As long as the values do not exceed the greatest value for an int, there is no breakage on a client using an int on their version of the object. The issue here is it may be months or even years before a client breaks. And finding and fixing this particular breakage could be extremely difficult and lead to long down times.

There are going to be times when contract changes are necessary. In these cases, you will have to plan out the final end game, which will include both the client(s) and service(s) utilizing the new contract, as well as transitional architectures to get to the end game without introducing a “big bang” approach (which microservices are said to help us avoid). In short, you have to treat microservice changes the same way you approach changes on an external API (as a product). Here is a simple path for a minor change.

  1. Add a second version of the contract to the microservice and deploy (do not remove the earlier version at this time)
  2. Inform all service users the old contract is set to deprecate and create a reasonable schedule in conjunction with the consumers of the microservice
  3. Update the clients to use the new contract
  4. When all clients have updated, retire the old version of  the contract

This is standard operating procedure in external APIs, but not something most people think about a huge deal when the APIs are internal.

I am going to go back to a point I have made before. Planning is critical when working with small products like microservices. To avoid regular contract breakages, your architects and Subject Matter Experts (SMEs) need to make sure the big picture is outlined before heading down this path. And the plan has to be conveyed to the development teams, especially the leads. Development should be focused on their service when building, but there has to be someone minding the shop to ensure the contracts developed are not too restrictive based on the business needs for the solutions created from the services.

Duplication of Efforts

In theory, this will not happen in microservices, as we have the individual services focusing on single concerns. And, if we can imaging a world where every single class had a service (microservices to the extreme?) we can envision this, at least in theory. But should we break down to that granlular a level? I want to answer that question first.

In Martin Fowlers article, he talks about the Domain Driven Design (DDD) concept of a bounded context. A bounded context is a grouping of required state and behavior for a particular domain. Martin Fowler uses the following diagram to show two bounded contexts.

In the diagram above, you see some duplication in the bounded contexts in the form of duplication of customer and product. In a microservices architecture, you could conceivably move customer and product to its own service and avoid the duplication, but moving a concept out simply to avoid duplication is not the best motivation in all cases. If you can also make a customer or product business capability, I would wholeheartedly support this approach, but it is not always the case (another side bar).

When would you not want to separate out Customer and Product? In short, when the domain concept of these objects is different. In the sales context, a customer contains sales specific information, including terms of sale (net 60 days?) and other items that may not exist in a support context. If we are talking a company that ships products (as opposed to a service only company), we can add other contexts, like shipping and warehousing, that have radically different customer views. In the warehouse, a customer is completely unimportant, as they are focused on pulling orders. From a shipping standpoint, a customer is a name, a shipping address and a phone number. No need for any additional information. A customer microservice either spits out a complete object, allowing the services to filter (not a great idea from a security standpoint) or it provides multiple interfaces for each of the clients (duplication of efforts, but in a single service rather than multiple consumers and/or services, so it does not avoid duplication). A product can also be radically different in each of these contexts.

My advice to starting out is starting with bigger contexts and then decomposing as needed. The original “microservice” can act as an aggregator as you move to more granular approaches. Example of transitional states from the contexts above.

  1. Discover of duplication in the sales and support microservices leads to a decision the customer and product should be separate services
  2. New customer and product services created
  3. Sales and support services altered to use the new product and customer services
  4. new version of sales and support services created to avoid serving product and customer information
  5. Clients altered to use the new services as well as the sales and support services

This is one idea of migration, as we will discover in the next section.

Where do we Aggregate?

If we go back to the bounded context discussion in the last section, we see the need to aggregate. The question is where to we aggregate? You need to come up with a strategy for handling aggregation of information. I am still groking this, so I am not offering a solution at this time. Here are some options I can see.

Option 1 – Client: In a full microservices architecture, the client may be responsible for all aggregation. But what if the user’s client is a mobile application. The chattiness of a microservice architecture is hard enough to control across your internal multi-GB network infrastructure. Moving this out onto the Internet and cell networks expounds the latency. I am not saying this is a bad option in all cases, but if you opt for this approach, more focus on the latency issue is required from your mobile development team. On a positive note, if the client application can handle single service failures gracefully, you reduce the likelihood of a single point of failure.

Option 2 – Final service boundary: In this approach, the outermost service contacts the many microservices it requires to get work done and aggregates for the client. I find this more appealing, in general, for mobile clients. And it reduces the number of “proxies” required for web, simplifying the user interface client. As a negative, it creates a single point of failure that has to be handled.

Option 3 – Aggregation of dependencies: In this approach, the higher level service (closer to the client) aggregates what it requires to work for the client. At first, I liked this option the best, as it fits a SOA approach, but the more and more I read about the microservices idea, the more I see this as a potential combination of the bad points of the first two options, as you introduce numerous points of failure on the aggregate level while still potentially creating multiple points for latency in your client applications. I still think this might be something we can think through, so I am providing it.

If you can think of other options, feel free to add them in the comments.

Testing One, Two, Three

I won’t spend a lot of time on testability, but the more moving parts you have to test, the harder it is. To understand why this is so, create an application fully covered in unit tests at every level, but developed by different teams, and then integrate. The need for integration testing becomes very clear at this moment. And what if you are not only integrating multiple libraries, but multiple discrete, and very small, services. There is a lot of discipline.

I find the only reasonable answer is to have a full suite of unit tests and integration tests, as well as other forms of testing. To keep with the idea of Continuous Integration, only the smaller tests (unit tests) will be fired off with each CI build, but there will be a step in the CD cycle that exercises the full suite.

There is also a discipline change that has to occur (perhaps you do this already, but I find most people DON’T): You must now treat every defect as something that requires a test. You write the test before the fix to verify the bug. If you can’t verify the bug, you need to keep writing before you solve it. Solving something that is not verified is really “not solving” the problem. You may luck out … but then again, you may not.

Summary

There are no werewolves, so there are no silver bullets. There is no such concept as a free lunch. Don’t run around with hammers looking for nails.The point here is microservices is one approach, but don’t assume it comes without any costs.

As a person who has focused on external APIs for various companies (start ups all the way to Fortune 50 companies), I love the idea of taking the same concepts inside the Enterprise. I am also intrigued by the idea of introducing more granularity into solutions, as it “forces” the separation of concerns (something I find so many development shops are bad at). But I also see some potential gotchas when you go to Microservices.

Here are a few suggestions I would have at this point in time:

  1. Plan out your microservices strategy and architecture as if you were exposing every service to the public. Thinking this way pushes you to figure out deployment and versioning as a product rather than a component in a system.
  2. Think about solving issues up front. Figure out how you are going to monitor your plethora of services to find problems before they become huge issues (downtime outside of SLAs, etc). Put together a disaster recovery plan, as well as a plan to failover when you can’t bring a service back up on a particular node.
  3. In like mind, plan out your deployment strategy and API management up front. If you are not into CI and CD, plan to get there, as manually pushing out microservices is a recipe for disaster.
  4. Create a template for your microservices that includes any pieces needed for logging, monitoring, tracing, etc. Get every developer in the organization to use the template when creating new microservices. These plumbing issues should not requiring solving again and again.

Peace and Grace,
Greg

Twitter: @gbworld

Robo-SPAM Recruiters


To try to weed through job SPAM (I get about 150 recruiter emails a day), I set up a rule in email to send them back an email.

I used the following ruleset:

  1. I only responded to emails marked with the urgent flag
  2. I only responded if the words in the subject had titles I knew were off target (like “entry level”, “mid level”, etc)

This is the email I sent (yes, it is probably a bit snarky):

Thank you for your interest in me filling your current job opportunity. You are receiving this email because your email was marked urgent and had certain words in it that do not fit my resume, like “entry level”, “mid level”, “developer”, “analyst”, “engineer” or “local only”. My current job position is senior architect who specializes in pre-sales, enterprise architectural assessments and new client setups. I have not worked as a developer for many years. I also do not work as a recruiter, so I cannot help you fill positions. If you could, can you adjust your response methodology and only send me positions that actually fit my resume or place me on your “do not call” list?

More than likely, you did a buzzword search on the job boards and emailed your position out to every single person who had the buzzword in their resume. If you received this email more than once, you did the same search on multiple boards and use the same mass email process on each.

If I find your position is of interest, I will send you a response to determine how we should proceed.

Peace and Grace,
Gregory A. Beamer

Here is one email that shows what is going on (not a major shocker):

Hi Gregory,

I am so sorry to disturb you for each and every position. As we are using some kind of third party software, it detects the resumes automatically and send our opening to everyone in the list. Unfortunately, we can’t remove/change your email from that list. But, you can Unsubscribe from that if you see a link in the bottom of the email. Thank you and have a great day. 

Thanks & Best Regards,

S######

Technical Recruiter

So the crux is this

  1. They are using a third party software to copy and paste job requirements into the system
  2. The software automatically searches resumes for any buzzwords in the description
  3. Every time the resume search gets a hit, it automatically sends out an email (mail merge template) to everyone that has a hit
  4. The businesses that use the software cannot stop from SPAMming me
  5. In order to stop receiving these emails, I have to make the effort to unsubscribe from their list

#5 is the most telling, as I get around 150 emails a day, most of which have unsubscribe lists, from nearly as many companies. In order to stop receiving emails that incorrectly target me, I am the one having to make effort to stop them?

I know this Robo-SPAM is efficient, at least from the standpoint of reaching 10,000 people with a single click, but is it actually working? If TekShapers, Xchange Software and ZIllion Technologies (et al) did a study to determine whether it is actually working, would they answer yes? Is the damage to their reputation worth using software that sends out more incorrectly targeted emails than correctly targeted emails? Is the response rate high enough that it warrants such a lazy method of doing “business”?

Apparently they think so. I will never work with any of these guys. I thank “S” (name masked above) for actually taking a bit of time to respond to me. Of the more than 1000 jobs responded to with my auto replier, less than a dozen have actually contacted me back, primarily asking for my resume and some type of email explaining what exactly I would like to do for a living. The rest are in the vegans in a grocery store and I am a slab of meat.

Peace and Grace,
Greg

Twitter: @gbworld

How to be an Amateur IT Recruiter


I have received more than 350 emails over the past 2 days with job opportunities. While this sounds great (I am in demand?), most simply show the hallmarks of a amateur recruiter.

As a service to those desiring to be amateur recruiters, as opposed to professional recruiters, I offer the following guide to help you in your quest. Please include each of the following in your daily habits.

Email Basics

The first thing you have to master is email basics.

1. Mark Every Job Opportunity Email as Urgent

Contemporary wisdom says people pay attention to urgent emails more than non-urgent ones, so make sure you mark your email as urgent. True, nothing in your email is really urgent to the recipient (me? other IT professionals?), but who cares. This is not about me, it is about you. The only possible kink in this plan is other people might also be marking their job opportunities as urgent. Let’s look at a picture of my inbox job folder:

image

Wow, 100% of the job folder is marked Urgent. My suggestion is to unlock the secret “super urgent” button in your mail client and use it so I really know your email is urgent … to you.

You should also bear in mind that adding words like “Immediate need” add more punch.

2. Search By Buzzwords

There is no better way to reach masses of IT professionals and simply scream “I don’t know what the hell I am doing” like firing off job emails based on buzzwords. When I get a $10 an hour support tech email, I am thrilled at the opportunity to increase my stress level at a thankless job that pays a piddly fraction of what I am making now.

And while you are at it do the same blind search on multiple job boards using their massive Spam email generator to generate thousands of emails in a single keystroke. Efficiency at its best.

3. Send Multiple Emails To the Same Person

Nothing wastes more time than making sure you are not sending out multiple responses. Why take more than a few minutes to complete your entire day’s worth of work. Peruse multiple job boards and send out emails using the same buzzword. Sure, about 95% of your list just got multiple emails and knows you are a lazy moron, but perhaps you reach a handful that are only on one board.

image

The blessing here is you not only show me you have no attention to detail, but you show me I am just a piece of meat to you, completely invalidating my existence. Yes, I want you to be my recruiter, as I like feeling like I am nothing.

4. Bold and Highlight Lots of Sh*t in the Email

Sometimes when I am reading an email, I miss the important stuff, so make sure you not only bold it, but you highlight it as well. Otherwise, I might send in a resume for this VB6 position that pays nothing, in a state far away from my home. Thank God Abhilasha bolded and highlighted Perforce, or I might have done something dumb like send my resume in for the job. Whew! Dodged that bullet.

image

Even better, bold and highlight the entire email.

image

Or if you really want to annoy people highlight and use red bolding.

image

Bonus points to Calvin since I have never worked on a criminal justice application and that is a required skill.

7. Copy and Paste the Entire Email from the Client or Account Manager

Why take time to edit stuff out of the email before sending it out. That takes time and time is money. Let the candidate see how truly lazy you are by including stuff that makes absolutely no sense to the recipient.

image

Yes, that one is at the top of a job email.

8. Send Out Emails to People Who Fail an Absolute MUST qualification

For example, let’s say you have a job for a person that must be local to Florida. Send it out to everyone. There are bound to be a few people that are ACTUALLY FROM FLORIDA (CAPPED in response to the email).

image

This particular email really took the cake as it was a complete forward (see #7).

Make The Candidate Do the Work for You

Why actually interview people when you can have them send you all the interview details to

image

This is a simpler one. Some have dozens of questions that have to be filled out.

Phone Basics

It is not enough to reach the coveted complete amateur title without having amateur phone skills. Here are a few things that can help you in this regard.

1. Show You are Using a Very Old Resume

I, like many people I know, no longer include a home phone on my resume. In fact, I have not included the home phone for about 6 years. When you call me on my home phone, it pretty much says “I don’t have your latest information” and “you are just a slab of meat to me”.

2. Ignore The Candidate’s Requirements

Your job is to convince the candidate to take the job no matter what. Don’t let things like “I currently make twice the max you are offering for this position” deter you from suggesting how much of a “great place to work” it is. If you can keep the candidate talking, maybe he will work for minimum wage.

It should also be of no importance that the candidate states “I do not want to move to Siberia”. If the position is in Siberia, then your job is to keep hounding them until they decide to take up. As long as they are still listening to you, you have a chance, right?

3. Insult Their Spouse

Since your culture devalues women, there is no reason to be polite to an IT candidates wife.

4. Be Demanding

Just because the candidate stated they are busy is of no consequence. Demand they take your call NOW and treat them like the meat slab they are.

5. Call Numerous Times

This is a two parter. In the first part, hang up and call back when you get their answering machine. Many candidates will screen the call until you irritate the crap out of them. In the second, if a candidate states “send me the job requirement and I will look at it later”, call them back every half hour until they convince you they really aren’t interested. Do this even if you get an email stating they are not interested and repeat #2 above.

6. Illustrate You Have Not Read the Resume

Nothing states you are working towards a complete dufus award than asking me a question about an item that appears prominently on my resume. Bonus points if it is both prominent and sits at the top of my resume.

Summary

While much of this is tongue in cheek it still amazes me how many think recruiting is simply a matter of finding slabs of meat and sending them to a processing plant. Considering recruiting companies, even subcontracting recruiting companies, can make a good amount with candidates, you would think we would have more professionals out there.

The reason there is a problem is very simple. IT, especially on the development side, has been a seller’s market for over 10 years. This has led to the worst developer on the team making more than 90K (or $50+/hour consulting), but it has also made it somewhat profitable to be lazy as hell and do bare minimum when it comes to recruiting.

There are plenty of professionals out there, and I know quite a few locally. But my inbox is routinely filled up with the amateur yo-yos.

In the next few days, I will show you my method of combatting the yo you farm.

Peace and Grace,
Greg

Twitter: @gbworld

Troubleshooting: The Lost Skill?


This blog entry comes from an email I received this morning asking me to “check in files”, as my failure to do so was causing a work stoppage. After a very short examination, I found the following:

  1. I had no files checked out (absolutely none)
  2. The problem was a permissions problem, not a check out problem, or the person who could not check in their files was not being stopped by my failure to act, but by someone else’s incorrect granting of permissions
  3. I had no permissions to solve the problem (i.e. grant the correct permissions

Further investigation of the problem would have revealed it was a permissions issue. In this case, the only consequence is another day lost of productivity, and a wonderful opportunity to learn. In some cases, the consequences are more dire. Consider, for example, Jack Welsh, the CEO of GE.

Jack made an assumption and ended up destroying a manufacturing plant. In one telling of Jack Welsh’s story, the dialog goes something like this:

Jack: Aren’t you going to fire me now?
Manager: Fire you? I just spent 2 million dollars training you.

Considering Jack Welch is now one of the most successful executives of all time, it is good his manager was able to troubleshoot the aftermath to a problem Jack had worked through on assumption. The point is plain: When we don’t troubleshoot a problem, we go on assumptions. In the email I received this morning, there was an assumption I had files checked out. Rather than test the assumption, work stopped.

Tony Robbins tells a story in his Personal Power program about a suit of armor. As he is walking on stage, every time he moves close to a suit of armor there is feedback. The audience eventually starts screaming at him “it’s the armor”. But he continues to stand near the armor and the feedback eventually goes away. He then moves away and the feedback comes back. Turns out there was a fire on a very close frequency and the messages were interfering with the microphone.

Personally, I think the above story is a myth, as I know the FCC is very careful on doling out bands and it is unlikely a microphone has the same band as emergency services. But this is also an assumption, and proper troubleshooting would have me examining the issue.

The path of least resistance

On Facebook … nay, on the Internet as a whole, a large majority of items are written out of assumptions or biases, and not an examination of the facts. For most people, whether you agree with Obamacare or not is not an exercise in examining the facts completely and then drawing conclusions. Instead, a quick sniff test is done to determine if you feel something smells, and then action is taken.

Let’s take an example. In 2006, the news media reported on the Duke Lacrosse team raping an African American stripper. The case seemed open and shut, as the evidence piled up. Duke University suspended the team and when the lacrosse coach refused to resign, Duke’s president cancelled the rest of the season. The case seemed so open and shut, Nancy Grace (CNN) was suggesting the team should be castrated.

When the assumptions were removed, a completely different story was told. Not only was the evidence thin, much of it was manufactured. The District Attorney, Ray Nifong, was disbarred and thrown in jail for contempt of court.

We can also look at the George Zimmerman case, where the initial wave of “evidence” painted another “open and shut” case. But the “open and shut” case, based on assumptions, began to crumble when it was discovered ABC edited the 911 tape to paint Zimmerman as a racist and carefully choose the video and picture evidence to paint a picture of a man that had no wounds and was the obvious aggressor.

The point here is not to rehash these cases, but to point out that assumptions can lead to incorrect conclusions. Some of these assumptions may lead to dire consequences, while most just result in a less than optimal solution.

Now to the title of the section: The path of least resistance.

When we look at the natural world, things take the path of least resistance. Water tends to travel downhill, eroding the softest soil. Plants find the most optimal path to the sunlight, even if it makes them crooked. Buffalos would rather roar at each other to establish dominance than fight, as the fighting takes precious energy. And humans look for the least amount of effort to produce a result.

Let’s pop back to Obamacare, or the PPACA (Patient Protection and Affordable Care Act), as it illustrates this point. Pretty much everyone I encounter has an opinion on the subject. In fact, you probably have an opinion. But is the opinion based on assumption? You might be inclined to say no, but have you actually read the bill? If not, then you are working on distillations of the bill, most likely filter through the sites you like to visit on a regular basis. And, more than likely, you have chosen these sites as they tend to fit your own biases.

I am not deriding you on this choice. I only want you to realize this choice is based more on assumptions than troubleshooting. Troubleshooting takes some effort. In most cases, not as much as reading a 900+ page bill (boring) or many more thousands of pages of DHS rules (even more boring). But, by not doing this, your opinion is likely based on incomplete, and perhaps improper, facts.

Answering Questions

I see questions all the time. Inside our organization, I see questions for the Microsoft Center of Excellence (or MSCOE). I have also spent years answering online questions in forums. The general path is:

  1. Person encounters problem
  2. Person assumes solution
  3. Person asks, on the MSCOE list, to help with the assumed solution – In general, the question is “How do I wash a walrus” type of question rather than one with proper background of the actual business problem and any steps (including code) taken to attempt to solve it
  4. Respondent answers how to solve the problem, based on their own assumptions, rather than using troubleshooting skills and asking questions to ensure they understand the problem
  5. Assumed: Person implements solution – While the solution may be inferior, this is also “path of least resistance” and safe. If the solution fails, they have the “expert” to blame for the problem (job security?). If it succeeds, they appear to have the proper troubleshooting skills. And very little effort expended.

What is interesting is how many times I have found the answer to be wrong when the actual business problem is examined. Here are some observations.

  • The original poster, not taking time to troubleshoot, makes an assumption on the solution (path of least resistance)
  • Respondent, taking path of least resistance, answers the question with links to people solving the problem posted
  • If the original poster had used troubleshooting skills, rather than assumptions, he would have thrown out other possibilities, included all relevant information to help others help him troubleshoot, and would have expressed the actual business problem
  • If the respondent had use troubleshooting skills, rather than assumptions (primarily the assumption the poster had used troubleshooting skills), he would have asked questions before giving answers.

To illustrate this, I once saw a post similar to the following on a Microsoft forum (meaning this is paraphrased from memory).

Can anybody help me. We have a site that has been working for years in IIS 4. We recently upgraded to Windows Server 2008 and the site is no longer serving up asp.net files located at C:\MyFiles. I just hate Microsoft right now, as I am sure it is something they changed in windows. I need to get this site up today, and f-ing Microsoft wants to charge for technical support.

The first answers dealt with how to solve the problem by turning off the feature in IIS that stops the web server from serving files outside of the web directory structure. While this was a solution, troubleshooting the problem would have shown it was a bad solution.

Imagine the user had written this instead.

We recently upgraded to Windows Server 2008 and the site is no longer serving up asp.net files located at C:\Windows\System32.

Turn off the feature in IIS would have still solve the problem, but there is now an open path directly to the system for hackers. And, if this is the way the person implements the solution, there is likely some other problems in the code base that will allow the exploit.

The proper troubleshooting would have been to first determine why ASP.NET files were being served from C:\MyFiles instead of IIS directories. Long story, as the reason had to do with an assumption developing on a developer box generally, perhaps always, led to development of sites that did not work in production. So every developer was working on a production server directly. The C:\MyFiles was created from an improper assumption about security, that is was more secure to have developers working from a share than an IIS directory. This led to kludges to make the files work, which failed once the site was moved to a server with a version of IIS that stopped file and folder traversing. This was done as a security provision, as hackers had learned to put in a URL like:

http://mysite.com/../../../Windows/System32/cmd.exe%20;

Or similar. I don’t have the actual syntax above, but it was similar to above and it worked. So IIS stopped you from using files outside of IIS folders. Problem solved.

Now, there are multiple “solutions” to the posters problem:

  • Turn off the IIS feature and allow traversing of directories. This makes the site work again, but also leaves a security hole.
  • Go into IIS and add the folder C:\MyFiles folder as a virtual folder. This is a better short term solution than the one above. I say short term, as there is some administrative overhead to this solution that is not needed in this particular case.
  • Educate the organization on the proper way to set up development. This is not the path of least resistance, but a necessary step to get the organization on the right path. This will more than likely involve solving the original problem that created the string of kludges that ended with a post blaming Microsoft for bringing a site down.

Troubleshooting The Original Problem

I am going to use the original “Check in your files” problem to illustrate troubleshooting. The formula is general enough you can tailor it to your use, but I am using specifics.

First, create a hypothesis.

I cannot check in the files, so I hypothesize Greg has them checked out.

Next, try to disprove the hypothesis. This is done by attempting to find checked out files. In this case, the hypothesis would have easily been destroyed by examining the files and find out none were checked out.

Graphic1

So the next step would be to set up another hypothesis. But let’s assume we found this file as “checked out”. The next step is to look at the person who has the file checked out to ensure the problem is “Greg has the file checked out” and not “someone has the file checked out”.

Graphic2

Since the name Greg Beamer is not here, even if the file were checked out, he cannot solve the problem.

Next, even if you have a possible solution, make sure you eliminate other potential issues. In this case, let’s assume only some of the files were checked out when examined, but the user was still having problems uploading. What else can cause this issue.

Here is what I did.

  1. Assume I do have things checked out first, as it is a possible reason for the problem. When that failed, look at the user’s permissions on the files in question. I found this:
    image
  2. Hypothesis: User does not have proper permissions. Attempted solution: Grant permissions
  3. Found out permissions were inherited, so it was not a good idea to grant at the page level. Move up to the site level required opening in SharePoint online, where I find the same permissions.
    image
  4. Now, my inclination is to grant permissions myself, but I noticed something else.
    image

    which leads to this
    ecGraphic4

    which further led to this (looking at Site Collection Images):
    image

The permissions here are completely different. The user is not restricted, so he can access these.

I did try to give permissions to solve the issue:
Graphic3

But end up with incomplete results:

image

I further got rid of the special permissions on some folders, as they were not needed. More than likely added to give the developer rights to those folders. I still have the above, however, which means someone more skilled needs to solve the problem.

The point here is numerous issues were found, none of which were the original hypothesis, which was reached via an assumption. The assumption was

I cannot check in, therefore I assume someone has the files checked out. Since Greg is the only other person I know working on the files, I assume he has them checked out.

Both assumptions were incorrect. But that is not the main point. The main point is even if they were correct, are there any other issues. As illustrated, there were numerous issues that needed to be solved.

Summary

Troubleshooting is a scientific endeavor. Like any experiment, you have to state the problem first. If you don’t understand a problem, you can’t solve it..

You then have to form a hypothesis. If it fails, you have to do over, perhaps even redefining the problem. You do this until you find a hypothesis that works.

After you solve the problem, you should look at other causes. Why? Because you either a) may not have the best solution and b) you may still have other issues. This is a step that is missed more often than not, especially by junior IT staff.

Let me end with a story on the importance of troubleshooting:

Almost everyone I know that has the CCIE certification took two tries to get it. If you don’t know what CCIE is, it is the Cisco Certified Internetwork Engineer certification. It is considered one of the most coveted certifications and one of the hardest to attain. The reason is you have to troubleshoot rather than simply solve the problem.

The certification is in two parts. A written exam, which most people pass the first time, and a practical exercise, which most fail. The practical exercise takes place over a day and has two parts:

  1. You have to set up a network in a lab according to specifications given at the beginning of the day.
  2. After lunch, you come back to find something not working and have to troubleshoot the problem

Many of the people I know that failed the first time solved the problem and got the network working.So why did they fail? They went on assumptions based on problems they had solved in the past rather than worked through a checklist of troubleshooting steps. Talking to one of my CCIE friends, he explained it this way (paraphrased, of course):

When you simply solve the problem, you may get things working, but you may also end up taking shortcuts that cause other problems down the line. Sometimes these further problems are more expensive than the original problem, especially if they deal with security.

Sound similar to a problem described in this blog entry? Just turn off IIS directory traversing and the site works. Both the poster and the hacker say thanks.

Please note there are times when the short solution is required, even if it is not the best. There are time constraints that dictate a less than optimal approach. Realize, however, this is technical debt, which will eventually have to be paid. When you do not take the time to troubleshoot, and run on assumption, build in time to go back through the problem later, when the crisis is over. That, of course, is a topic for another day.

Peace and Grace,
Greg

Twitter: @gbworld

Using Git with Visual Studio ALM: Why Visual Studio ALM?


In this blog entry, I want to give reasoning for using Visual Studio ALM 2013. The “case study” here is based on a client that uses Visual Studio and Git, but does not utilize any unified system for Application Lifecycle Management (or ALM). This is the intro for a series of video posts on how to do this.

This particular post uses the words “Visual Studio ALM” as a high level concept. As I drill down in later entries, you will start to see different products and parts of products coming out of this larger concept. The point here is using “Visual Studio ALM” is much like using analogies. It is a good way to start the conversation to get the idea nailed down, but breaks down eventually. As a takeaway, understand that much of what “Visual Studio ALM” will mean, in the context of this series, is “Team Foundation Server”.

ABC123 Company

The series here is going to focus on a fictional company, ABC123, which produces childhood content for websites. The company has a website, some web services and offers content services, as well as a child health advice line, for a wide variety of partners.

The development team uses Git for their source repository, a decision made both due to familiarity of one of the senior developers and the open source nature of the repository. They have adopted OnTime for time tracking and a consulting company helped them utilize the tool for Agile project management. They also use Trello in some groups to facilitate a more Kanban type of approach to Agile project management, but this team still enters time into OnTime. JetBrain’s TeamCity has been picked up for build management, which is set up in a continuous integration model. There is no continuous delivery at the time, but the concept is being considered for adoption.

The Project Management Office (PMO) handles both portfolio management and tasks in Excel initially and the tasks are then placed into OnTime and/or Trello for the development team to work on. The PMO also uses Microsoft Project to track projects.

Due to the use of a variety of programs, most which are not commonly used in the marketplace, the learning curve for new developers is rather high.

Recommendations

Here are some recommendations given to ABC123.

  1. Move from a server farm to a virtual environment utilizing multiple VMs.
  2. Implement Visual Studio ALM
    Some specific pieces of “Visual Studio ALM” I would like to implement are:
  • TFS Integration with Excel and Project (Long Term Project Server is a possibility)
  • Lab Management
  • Work Item Tracking
  • Agile Reports
  • GIT Integration

Recommendation 1: Use a Virtual Environment

This first recommendation is not part of the series, but I am including it as virtualization is mandatory on some levels. Lab management requires virtualization to work, so you have to work with virtual machines on this level at the very least. To create labs that more closely mimic production, virtualization is advised in the production environment.

I am under the belief we will eventually move more and more of our stuff to the cloud, either private or public. Technically, the cloud is possible on physical machines, but you find it more commonly done virtually, as it allows you to more quickly expand your cloud, or clouds. This is a topic far beyond the scope of this entry, but the takeaway is you are likely to virtualize your environment(s) at some time, and biting it off prior to cloud implementation will allow smaller steps over some type of “big bang”.

Enough said on this one.

Recommendation: Adopt Visual Studio ALM

This recommendation is for a variety of reasons, but most of the reasons fall into three buckets.

  1. Consolidation of toolsets to make it easier to accurately track projects
  2. Integration with common Microsoft products, including Visual Studio, which is currently used for development in the company.
  3. Open new scenarios

The first reason is focused both on a) flattening the learning curve for new developers, saving the company money and b) allowing additional scenarios that are difficult under the current toolset and environment.

The second reason is to enable the team to use the tools they currently use and eliminate the need to enter the same information into multiple tools.

The third reason focuses on the fact that the current reporting story is rather thin. There is limited reporting on productivity and few, if any, reports on different parts of the development process. There are manual processes for other types of reporting (for example, determining what items were checked in for a particular user story.

Features to Use in Visual Studio 2013 ALM

In future posts, I am going to illustrate many, if not all, of the following features The only one I am not sure I can illustrate, at least at this time, is lab management, but I will work to set up an environment once my powerhouse computer is returned from Dell support.

NOTE: These are the same features mentioned earlier

  • TFS Integration with Excel and Project (Long Term Project Server is a possibility)
  • Lab Management
  • Work Item Tracking
  • Agile Reports
  • GIT Integration
     
    Integration with Excel and Project

    The PMO should not have to change tools or enter work into multiple tools. Conversely, the developers should be able to see the work items without leaving Visual Studio. Visual Studio ALM allows project managers to enter in user stories and tasks into TFS repositories from tools like Excel and Project. The data can be pulled out at any time into these tools, altered, added to and checked back in so team members can start working on them.

    There are some scenarios that would benefit from the use of Project Server, but this is a longer term recommendation. Most of the functionality needed can be facilitated by using Project with Team Foundation Server 2013.

    Lab Management

    Lab management is useful to set up a test environment that can be reset easily when a new build is released. The ultimate goal is to facilitate continuous integration and continuous delivery scenarios. The diea here is to have the environment reset before a build is pushed and then run all tests in an environment that is similar the production environment. This does not have to be the testing environment, as it can be a developer test environment separate from environments like SIT (Staging Integration Testing).

    Work Item Tracking

    Under the current setup, ABC123 has a hard time tracking what has been changed in their code or relating code to particular user stories and tasks. While this may not seem very important, understanding the motivation for change helps you avoid breaking new features when fixing bugs in old features, and visa versa. The benefit of using Visual Studio ALM here is source is related to tasks and user stories, so you have the ability to run a variety of reports to see how change occurs.

    Agile: Reports and Features

    Currently ABC123 has very few reports. The tool they are using has a basic burndown chart with an estimated completion date, and whether or not the team is on track to its expected end date. It also contains a user productivity report that states all of the time logged in the tool. To gain further metrics is a manual process.

    Now, the word “reports” here is rather generic. The types of information I am talking about here are things like the following list. The items that can easily be determined today are shown with an asterisk. Items with a plus are items that can be determined today, but take a bit more work. Other items (bolded) require manual reporting.

    • Burndown (Scrum)
      Release Burndown*
      Sprint Burdown*
      Individual Burndown+
    • Cumulative flow  (Kanban)
    • Blocked work items and tasks
    • User work log*
    • User productivity in team
    • Velocity*
    • Code Coverage+
    • Code Quality Metrics
      This is not an all inclusive list, but covers a bit of the important features needed for ABC123 IT staff to complete projects.
      NOTE:: All of the items are technically possible today, if enough effort is put in.
    GIT Integration

    The company has no desire to move away from Git for source control and there is no reason they should. But, it would be nice to have the majority of the Git functions completely integrated into Visual Studio and to have the ability report on code in the same reporting system the project is tracked. In addition, it would be nice to be able to track what pieces of code were check-in with different tasks, work items and/or user stories. Visual Studio ALM is useful in all of these scenarios.

    Summary

    This blog entry focuses on the scenario for the rest of the series. From this point on, I am going to focus on a variety of topics, including:

    • Setting up GIT on user machines
    • Creating/adapting projects to use TFS with GIT
    • Agile project management with Visual Studio ALM
    • Next in series: Setting up GIT on WIndows
  • Peace and Grace,
    Greg

    Twitter: @gbworld
    YouTube: http://www.youtube.com/gabworld