Adventures in Refactoring: Eliminating Switches in Code


Before I get started, I need to explain that there are many constraints in this project that I cannot get rid of. For this reason, I am focusing solely on refactoring out branching logic that will continue to grow if left as is.

Let’s describe the situation. A system has been put in place for other teams to put “notifications” into “queues”. The terms are in quotes, as a “notification” is an item placed in a list of work to do, and a “queue” is a SharePoint list that both queues items and is used to update status. This is shown below:

image

Architecturally, I know of many better patterns, but this is what I have to work with. Technically, I have to facilitate the idea of serializing objects to strings so a Windows service can read the string, deserialize the correct object, and do some work on it. The side of the equation being refactored in this blog entry is the serialization of multiple derived types.

Objects enter the system via the “Notification” service, which accepts an object of the type NotificationBase:

Code Snippet
public interface INotificationService
{
    Guid NotifyChange(NotificationBase message, string queue);
    ProcessStatus GetProcessStatus(Guid notificationId, string queue);
}

public class NotificationBase : IExtensibleDataObject
{
    public Guid NotificationId { get; set; }
    public string Description { get; set; }
    public ExtensionDataObject ExtensionData { get; set; }
}

The actual client creates a type derived from this base, as in the following:

Code Snippet
public class AdvisorNotification : NotificationBase
{
    public Guid AdvisorID { get; set; }
    public OperationType Action { get; set; }
    public string Version { get; set; }
    public List<GuidMap> CloningChanges { get; set; }
}

There are potentially infinite types of objects, as more and more people use this service to enter work into “queues”.

The current pattern to facilitate the serialization and storage of the items in the “queues” was to switch on type.

Code Snippet
string messageString;

if (notification is AdvisorNotification)
{
    var advisorNotification = (AdvisorNotification)notification;
    messageString = SerializationHelper.SerializeToXml(advisorNotification);
}
else if (notification is WorkflowNotification)
{
    var workflowNotification = (WorkflowNotification)notification;
    messageString = SerializationHelper.SerializeToXml(workflowNotification);
}

The primary reason it was done this way is familiarity with switch-type logic. The problem, from a maintenance standpoint, is that I have to update code every time a new derived notification type is created. This means software that is constantly being updated. It also goes against the Single Responsibility Principle (one of the SOLID practices) and exhibits the code smell of having to change logic in multiple places to facilitate a simple change.

There are a couple of ways to refactor out this pattern. A common approach, for example, is polymorphism. But due to various constraints, I cannot head in this direction. I also see a simpler way to solve the problem at hand, and I will show how it simplifies the software.
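For context, here is a rough sketch of what the polymorphic route could look like if the constraints allowed it (the SerializeSelf member is hypothetical and the types are trimmed down; this is the road not taken here):

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class NotificationBase
{
    public Guid NotificationId { get; set; }
    public string Description { get; set; }

    // Serialize using the runtime type, so every derived notification
    // produces the correct XML without any branching in the service.
    public virtual string SerializeSelf()
    {
        var serializer = new XmlSerializer(GetType());
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, this);
            return writer.ToString();
        }
    }
}

public class AdvisorNotification : NotificationBase
{
    public Guid AdvisorID { get; set; }
    public string Version { get; set; }
}
```

The service would then just call notification.SerializeSelf() and never inspect the type.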

It would be nice to have this down to one line, like so:

Code Snippet
string messageString = SerializationHelper.SerializeToXml(notification);

But the serializer serializes to NotificationBase, which causes an exception. So, a look at the serializer is in order:

Code Snippet
public static string SerializeToXml<T>(T obj)
{
    var serializer = new XmlSerializer(typeof(T));

    using (var writer = new StringWriter())
    {
        serializer.Serialize(writer, obj);
        return writer.ToString();
    }
}

The issue here is typeof(T). Why? It uses the generic type parameter. Since the generic argument is implicit, the compiler infers it from the declared type of the variable and emits instructions as if this line had been written:

Code Snippet
string messageString = SerializationHelper.SerializeToXml<NotificationBase>(notification);

What I need the compiler to do is set it up like this:

Code Snippet
string messageString = SerializationHelper.SerializeToXml<AdvisorNotification>(notification);

Now, your thought process at this time might be to supply the generic type dynamically, but the following syntax does not compile:

Code Snippet
string messageString = SerializationHelper.SerializeToXml<notification.GetType()>(notification);

It would be nice if Microsoft provided an easy facility to create generic types dynamically like the above, but it is not possible today (maybe in C# vNext? – not .NET 4.5, as it is in RC). Sans the syntactic sugar, let’s focus on the issue for this problem, which is needing to serialize the derived type rather than the base type. To understand this, look at the following code and comments:

Code Snippet
Type type1 = typeof(T);     // type1 == NotificationBase
Type type2 = obj.GetType(); // type2 == AdvisorNotification

Notice that typeof(T) returns the base type, as that is how the derived object is passed through the system, per the following method signature:

Code Snippet
private static Guid NotifyDataChange(NotificationBase notification, string queueName)

But obj.GetType() returns the derived type, as that is the object’s “true identity” (Clark Kent in trouble now?). So, back to our serialize method:

Code Snippet
public static string SerializeToXml<T>(T obj)
{
    var serializer = new XmlSerializer(typeof(T));

    using (var writer = new StringWriter())
    {
        serializer.Serialize(writer, obj);
        return writer.ToString();
    }
}

I could simply do the following:

Code Snippet
public static string SerializeToXml<T>(T obj)
{
    var serializer = new XmlSerializer(obj.GetType());

    using (var writer = new StringWriter())
    {
        serializer.Serialize(writer, obj);
        return writer.ToString();
    }
}

The problem here is the code is already implemented. Perhaps some of the clients can only use the typeof(T) method of serialization. Since I don’t know this, I am going to use a safer refactoring. First, I add an optional parameter called useTypeOf:

Code Snippet
public static string SerializeToXml<T>(T obj, bool useTypeOf = true)

Then I separate the type from the initialization of the XmlSerializer:

Code Snippet
Type type = typeof(T);
var serializer = new XmlSerializer(type);

Finally, I add a simple branch based on useTypeOf to either use typeof(T) or obj.GetType(), with the default being the currently used typeof(T).

Code Snippet
Type type = useTypeOf ? typeof(T) : obj.GetType();
var serializer = new XmlSerializer(type);

Here is the entire routine with these changes.

Code Snippet
public static string SerializeToXml<T>(T obj, bool useTypeOf = true)
{
    Type type = useTypeOf ? typeof(T) : obj.GetType();
    var serializer = new XmlSerializer(type);

    using (var writer = new StringWriter())
    {
        serializer.Serialize(writer, obj);
        return writer.ToString();
    }
}
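To see the two modes side by side, here is a self-contained sketch (the notification types are trimmed to the essentials) showing that the default path still throws for a derived instance, while passing false succeeds:

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public class NotificationBase
{
    public Guid NotificationId { get; set; }
    public string Description { get; set; }
}

public class AdvisorNotification : NotificationBase
{
    public string Version { get; set; }
}

public static class SerializationHelper
{
    public static string SerializeToXml<T>(T obj, bool useTypeOf = true)
    {
        // Old behavior: typeof(T); new behavior: the instance's runtime type
        Type type = useTypeOf ? typeof(T) : obj.GetType();
        var serializer = new XmlSerializer(type);

        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, obj);
            return writer.ToString();
        }
    }
}

public class Program
{
    public static void Main()
    {
        NotificationBase notification = new AdvisorNotification { Version = "1.0" };

        // Runtime type wins: the root element is <AdvisorNotification>
        string xml = SerializationHelper.SerializeToXml(notification, false);
        Console.WriteLine(xml.Contains("AdvisorNotification"));

        try
        {
            // typeof(T) resolves to NotificationBase, so the derived
            // instance is "not expected" and XmlSerializer throws
            SerializationHelper.SerializeToXml(notification);
        }
        catch (InvalidOperationException)
        {
            Console.WriteLine("typeof(T) path threw, as described above");
        }
    }
}
```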

So, now let’s go back to our branching logic and refactor it down. First, let’s look at the original:

Code Snippet
string messageString;

if (notification is AdvisorNotification)
{
    var advisorNotification = (AdvisorNotification)notification;
    messageString = SerializationHelper.SerializeToXml(advisorNotification);
}
else if (notification is WorkflowNotification)
{
    var workflowNotification = (WorkflowNotification)notification;
    messageString = SerializationHelper.SerializeToXml(workflowNotification);
}

The first change is to supply the optional argument (useTypeOf) as false, as in the following. It is a simple, small change, so it is easy to test.

Code Snippet
string messageString;

if (notification is AdvisorNotification)
{
    var advisorNotification = (AdvisorNotification)notification;
    messageString = SerializationHelper.SerializeToXml(advisorNotification, false);
}
else if (notification is WorkflowNotification)
{
    var workflowNotification = (WorkflowNotification)notification;
    messageString = SerializationHelper.SerializeToXml(workflowNotification, false);
}

Now, since the serializer can determine the correct derived type, we can get rid of the recasting of the base class into child objects.

Code Snippet
string messageString;

if (notification is AdvisorNotification)
{
    messageString = SerializationHelper.SerializeToXml(notification, false);
}
else if (notification is WorkflowNotification)
{
    messageString = SerializationHelper.SerializeToXml(notification, false);
}

If we look at the code above, we now notice the lines in the branches are identical, so we can refactor out the entire branching logic. This leaves us with the desired end condition, as follows:

Code Snippet
string messageString = SerializationHelper.SerializeToXml(notification, false);

The branching logic is now gone, and I can add as many derived notification types as needed to implement the full system without any code changes to my service or serializer. And since I have left the default behavior alone, I should not have any issues with clients who have implemented the library using typeof(T) to determine type.

Summary

Simplicity is the mark of good software. As software features grow, however, solutions often get more complex rather than simpler. This means the developer has to “take the bull by the horns” and force refactoring to simplicity.

In this blog entry, I focused on one type of refactoring to get rid of unnecessary branching logic. The logic was forced by a “flaw” in a serializer, but cascaded up to create a solution that required branching logic changes every time a new type was added to the system, rather than the simple addition of new derived domain objects. By the end, we had code that required no changes to facilitate new “queued” types.

There are likely to be a variety of other ways to “skin this cat”. For example, we could use polymorphism to get rid of the branch. If there were major logic branches (changes in behavior based on type, for example, rather than just new derived types), I would definitely explore this option. Another possible option is getting deeper into the covariance and contravariance features added to C# in .NET 4.0. If someone would like to link this type of example, I would love to read it.

Peace and Grace,
Greg

Twitter: @gbworld

Anti-Pattern: Trusting User Input taken to the Nth


Most developers are well aware that we should not trust user input. In the 90s, many learned this when they hid pricing information in hidden fields on a form and clever users changed the price of an item to a penny and successfully bought it. Others have learned this through SQL injection hacks.

But it goes beyond information that users input via a form. Anything you have to store that affects behavior in your system should be validated prior to attempting to initiate the behavior.

Just yesterday, I found an interesting form of trust that is particularly insidious. The system has a service that accepts an object from an outside system. It also takes an ID that it uses like a primary key in our system. Rather than go on typing, let me illustrate.

The system uses a variety of queues and there are objects that need to be queued. I have faked the actual objects (to protect the innocent?), but the core concept is like this. There is a base object that indicates a queued object.

Code Snippet
public class QueuedObjectBase
{
    public Guid Id { get; set; }
    public string Description { get; set; }
}

This object contains an Id, which is used to store the object in a queue, and a Description so the consuming team can determine what an object was designed for. I am not worried about the Description, although I do consider it a bit superfluous in this application, so I will get back to the Id in a moment.

The classes representing the objects that are queued derive from the QueuedObjectBase class, like so:

Code Snippet
public class CatalogQueuedObject : QueuedObjectBase
{
    public Guid CatalogId { get; set; }
    public string Name { get; set; }
    public int CustomerSet { get; set; }
    public string Country { get; set; }
    public string Langauge { get; set; }
}

public class BrowseQueuedObject : QueuedObjectBase
{
    public Guid BrowseId { get; set; }
    public string Name { get; set; }
    public string PageUrl { get; set; }
}

The actual object (as derived type) is stored in a serialized format (not a paradigm I like, but “it is what it is”). But, before it is stored, the Id and Description are pulled. This is actually accomplished in a CMS system (used like a database, which is another paradigm I dislike), but it is easier to illustrate when we look at a database.

Suppose we have a table designed like this:

Code Snippet
CREATE TABLE StorageQueue(
    Id uniqueidentifier PRIMARY KEY,
    Descript varchar(1024) NULL,
    Serialized varchar(max) NOT NULL
)

If you want to see this visually, you end up with this:

clip_image002

When a Queue object enters the system, the following happens:

  1. The Id is pulled out as a variable to go in the Id field
  2. The Description is pulled out as a variable to go in the Descript field
  3. The entire object is serialized to go in the Serialized field

The code would look something like this:

Code Snippet
private void StoreObjectInQueue(QueuedObjectBase objectToStore)
{
    string queue = GetQueueName(objectToStore);
    Guid id = objectToStore.Id;
    string description = objectToStore.Description;
    string serializedObject = SerializedObject(objectToStore);
    QueueRepository.Store(queue, id, description, serializedObject);
}

Assume the repository performs something like the following:

Code Snippet
INSERT INTO StorageQueue (Id, Descript, Serialized)
VALUES (@id, @descript, @serialized)

Seeing the problem yet? Let’s illustrate by setting up some code that is guaranteed to create a problem.

Code Snippet
private void ClientSide_StoreObjects()
{
    Guid id = GetIdForQueue(1);
    Guid browseId = Guid.NewGuid();
    Guid catalogId = Guid.NewGuid();
    string browseName = "";
    string browsePageUrl = "";

    //First a browse queue
    var browseQueuedObject = new BrowseQueuedObject
    {
        BrowseId = browseId,
        Description = null,
        Id = id,
        Name = browseName,
        PageUrl = browsePageUrl
    };

    StoreObjectInQueue(browseQueuedObject);

    //Then a catalog queue
    var catalogQueuedObject = new CatalogQueuedObject
    {
        CatalogId = catalogId,
        Country = "us",
        CustomerSet = 666,
        Description = null,
        Id = id,
        Langauge = "en",
        Name = "Id Clash"
    };

    StoreObjectInQueue(catalogQueuedObject);
}

Got it now? Yes, since we have given the client the opportunity to control our Id (the primary key in the database scenario), they can send us duplicate information.

The way we currently have this set up, the second call throws an exception, so we can inform the client that they cannot use that Id, as it is already in the queue. But what if we are using a different persistence mechanism that defaults to overwriting the browseQueuedObject with the catalogQueuedObject, since they have the same id?
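To make that overwrite behavior concrete, here is a minimal sketch using an in-memory dictionary as a stand-in store (hypothetical — the real store is a SharePoint list). The second write with the same client-supplied id silently replaces the first, with no error at all:

```csharp
using System;
using System.Collections.Generic;

public class OverwritingQueueStore
{
    // Unlike a PRIMARY KEY, a dictionary indexer replaces on duplicate keys
    private readonly Dictionary<Guid, string> _items = new Dictionary<Guid, string>();

    public void Store(Guid id, string serializedObject)
    {
        _items[id] = serializedObject; // no exception on a duplicate id
    }

    public string Retrieve(Guid id)
    {
        return _items[id];
    }
}

public class Program
{
    public static void Main()
    {
        var store = new OverwritingQueueStore();
        Guid id = Guid.NewGuid(); // the client-supplied id, reused by mistake

        store.Store(id, "<BrowseQueuedObject>...</BrowseQueuedObject>");
        store.Store(id, "<CatalogQueuedObject>...</CatalogQueuedObject>");

        // The browse object is gone; the caller now gets a catalog object back
        Console.WriteLine(store.Retrieve(id));
    }
}
```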

Here is the client code that pulls a browse queued object from the queue.

Code Snippet
private void ClientSide_PullBrowseObject()
{
    Guid id = GetIdForQueue(1);
    BrowseQueuedObject browseQueuedObject = (BrowseQueuedObject)GetItemFromQueue(id);
    //Work on browse queued object here
}

What happens now is a casting exception, as the actual object returned is a catalogQueuedObject instead of a browseQueuedObject.

Now, you are probably saying something like “we can protect the user by adding a type and then validating”, but the core problem is that we have tightly coupled the client to our storage mechanism by allowing them to create the key. As this is our storage engine, we should be in control of the keys. We can pass the key back for the client to store so the two systems can communicate (one option), or we can store their id for the object (or a similar object on their side) along with a type. We can also create multiple queues. But we should not have them deciding how we identify the object.
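A minimal sketch of that first option (hypothetical names — the point is that the store mints the key and hands it back, rather than trusting the caller’s):

```csharp
using System;
using System.Collections.Generic;

public class QueueService
{
    private readonly Dictionary<Guid, string> _store = new Dictionary<Guid, string>();

    // The service owns identity: it generates the storage key and returns
    // it to the client, who holds onto it for later lookups. The client's
    // own id can live inside the serialized payload, where it cannot clash
    // with anyone else's.
    public Guid Store(string serializedObject)
    {
        Guid storageId = Guid.NewGuid();
        _store.Add(storageId, serializedObject); // Add would throw on a
                                                 // duplicate, but we control
                                                 // the keys, so it cannot
        return storageId;
    }

    public string Retrieve(Guid storageId)
    {
        return _store[storageId];
    }
}
```

Two clients can now queue objects with clashing ids of their own without ever colliding in our store.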

When I mentioned this, the first argument I heard was “well, this is an internal application and our developers would never do that kind of thing”. This is an argument I have heard numerous times for a variety of bad practices. The problem is we might eventually open this system to people outside of our organization. At first, it might be partners. This increases the likelihood of clashes, though it is not likely a partner will purposefully try to crash the system. But one day we might want to open this service to the world, the way Facebook or Twitter do. At that point, you might have hackers trying to overwrite objects to see if they can crash the site. This might be just for fun, but they may also be looking for exception messages to determine how to hack deeper into our system.

The second argument I heard was “we are using globally unique identifiers”. This sounds all well and good, but while GUIDs are designed to make collisions statistically “impossible”, over time the “impossible” becomes more possible, especially with multiple parties creating IDs. And this is not the end of the story. There are people who have created their own GUID algorithms. What if someone creates a bad algorithm and shares it? In addition, I have the power to create GUIDs manually, like so:

Code Snippet
var id = Guid.Parse("f25aa76a-b9a5-4fb9-ae08-0869312cf573");

Perhaps I see a GUID coming back and decide to reuse it? What would happen then?

Now, there are a lot of things wrong with storing serialized objects without some type of typing information, especially when the “data” is only useful when deserialized back to an object. It is sloppy and prone to errors. But it is further exacerbated when you allow the client to dictate the key for the object.

Summary

In this blog entry I focused on not trusting user input. Rather than hit the common scenarios, however, I focused on allowing a user to choose an identity for the object in our system. While this might not seem like a common enough scenario to blog about, I have seen numerous systems cobbled together in this manner lately.

There are a couple of reasons why this is a very bad practice. First, the user might duplicate the key information and either overwrite data or cause an error. Second, and more important in an internal application, the user is now tightly coupled to our storage mechanism, making it impossible to change our storage implementation without consulting everyone who is using the system.

Update

After airing my concerns about allowing a user to specify the primary key for the item in our data store, the decision was made to check for a duplicate and send back an exception if it already existed. This may sound like a good compromise, but it puts a lot of extra work on each side: checking for and throwing the exception on our end, and handling exceptions on their end (possibly even transaction logic?). We can avoid all of this extra work if we simply do it right the first time.

Peace and Grace,
Greg

Twitter: @gbworld

Applian FLV Player Does Not Respect Opt Out on Third Party Products


So it is a common practice these days to bundle software. Look at the latest Flash Player, which also includes Google Toolbar, unless you do a custom install. Bleh!

But at least you can get around the third party ware in Flash. Here is one that is even more insidious: Applian FLV Player. You start with the following screen.

image

Looks like there is a Custom Install to not install A LOT, right?

image

Cool. So I unchecked the A LOT toolbar, clicked Accept, and went to the next screen. Look at the lower left-hand side, which details what I will be installing:

image

Hmm! I said I did not want to install that. Let me go back:

image

Sure enough, the toolbar is checked again. At least it is not going to default me to the other crap like the homepage and error search.

This is a new level of annoying. Fortunately, I have two things going for me here:

  1. I don’t get Flash Videos often
  2. I have an older version of the player installed

But this is a new step in the game. Prior to this, these installs might have been hidden, and you could only get around them by using the custom install; now they are installed even when you uncheck them. Bad juju here. Reminds me a bit of RealPlayer, which respected the auto-start (and takeover) on Windows startup UNTIL you used it to play a video.

Hey guys, I get that you have a free product and need some way of monetizing it. I don’t have an issue if you force the install of another product, as long as you tell me. And while it is a pain custom-installing products, I am a big boy and can avoid it. But giving me the appearance of an option and installing the other product after I said no? That is just not right.

If you would like to contact Applian on this, be nice, as they might not realize they made the install mandatory (reading the terms suggests this).

Peace and Grace,
Greg

Twitter: @gbworld