Back to Basics: The ABCs of Development A = Algorithm


I have been grousing lately about how ill prepared today’s developers are. In particular, I have been ranting about how developers today do not know the basics of development. Rather than simply bitch, I decided to put my foot into the water and start teaching people. This particular article stems from a book idea I have that I will likely be writing soon (really this time).

The ABCs I am going to cover over the next few days are:

A = Algorithms (this article)
B = Boundaries
C = Contracts

From an logical architecture standpoint, C is the most important. From a physical architecture standpoint, B is the most important. But from a developer competency standpoint, it could be argued A is the most important, so here we go.

Algorithm

Merriam Websters defines algorithm as:

Main Entry:

al·go·rithm 
          Listen to the pronunciation of algorithm

Pronunciation:

ˈal-gə-ˌri-thəm

Function:

noun

Etymology:

alteration of Middle English algorisme, from Old French & Medieval Latin; Old French, from Medieval Latin algorismus, from Arabic al-khuwārizmi, from al-Khwārizmī fl a.d. 825 Islamic mathematician

Date:

1926

: a procedure for solving a mathematical problem (as of finding the greatest common divisor) in a finite number of steps that frequently involves repetition of an operation ; broadly : a step-by-step procedure for solving a problem or accomplishing some end especially by a computer

We will use the last part of that definition and see an algorithm as a step-by-step procedure for solving a problem or accomplishing some end. As computers only see 0s and 1s, the rest of the definition technically fits, but it leads to confusion as we deal on a higher level of abstraction in our daily work.

A Sample Algorithm

So, let’s suppose our “problem” is multiplying two numbers together, which we will call factorA and factorB. The signature for this method will end up something like the following:

public int multiply(int factorA, int factorB)
{
}

The guts of this method is the algorithm. I assume we all know that this can be resolved rather easily by returning factorA * factorB, but knowing multiplication is simply continuing to add the first factor until we do it factorB number of times. In other words 5*5 is simply 5+5+5+5+5 or five added five times. Following this logic, we might do something like this:

public int multiply(int factorA, int factorB)
{
    int returnVal = 0;
    int i = 0;

    do
    {
        returnVal += factorA;

        i++;
    } while (i < factorB);

    return returnVal;
}

What is wrong with this algorithm? For the consumer of the class? Nothing. They do not see the internals. We could place this code out into production and it would deliver the correct answer. The code is well encapsulated, which means it is completely contained and “hidden” from the consumer.

ASIDE: Encapsulation is one of the fundamental concepts of Object Oriented Programming per Wikipedia, which defines it as “Encapsulation conceals the functional details of a class from objects that send messages to it.”

There are two problems, from our standpoint.

  1. Performance – There are better algorithms for performance
  2. Clarity – This code does not make our intent clear

We will now drift a bit from algorithms to refactoring.

Refactoring the Algorithm

Refactoring is altering the code base to improve it. In our code, the do loop is not the best performer. But it has an even larger problem. The intent of the code is not clear. A do loop is generally used to indicate the need to continue to perform a task (or tasks) until a certain condition is met. For example, we might have a do loop that checks to see if a file transfer is completed and when it is, it does work (yes, I know there are better ways of doing this in event driven programming, but roll with me until I think of a better reason for the do loop).

If we want to make the intent clearer, we use a for loop, as in the following code snippet:

public int multiply(int factorA, int factorB)
{
    int returnVal = 0;

    for (int i = 0; i < factorB; i++)
    {
        returnVal += factorA;
    }

    return returnVal;
}

We now can see that the intent here is 5*5 = 5+5+5+5, as a for loop is terminated after a set number of loops (in this case, 5, as factor B is 5). This solves the clarity problem, but the first time I coded this, I found it actually ran slower. We will get back to this in a moment. Here is a simple test for looping.

DateTime start = DateTime.Now;

for (int i=0;i<numberIterations;i++)
{
   
MathLib target= new MathLib();
    target.multiply(a,b);
}

DateTime end = DateTime.Now;
long diffTicks = end.Ticks – start.Ticks;

We have solved the clarity problem, but we can still improve the algorithm even more. In fact, the end algorithm is likely the one most of us would have started with in the first place, as shown below:

public int multiply(int factorA, int factorB)
{

    return factorA * factorB;
}

Before moving into the next section, I need to write an aside.

It should be noted that refactoring without tests is not a good idea. With small bits like this, we are probably safe. But most of the algorithms we write are a bit more complex and deal with less concrete problems. In addition, we often have routines calling other routines, creating very complex interactions. I have actually written a simple test surrounding this routine, which looks like this:

[TestClass]
public class when_multiplying_five_times_five
{
    [Test]
    public void should_return_twenty_five()
    {
        int factorA = 5;
        int factorB = 5;
        int expected = 25;

        MathLib target= new MathLib();
        int actual = MathLib.Multiply(factorA, factorB);

        Assert.AreEqual(expected, actual, "Does not return 25");
    }
}

The test here uses a bastardized form of BDD (Behavior Driven Development), so don’t get caught up in the naming. I can cover TDD versus BDD (and my own bastardized attempt using a unit test framework for BDD, but that will be later). The takeaway with this test is I can verify that each of the algorithms works correctly.

Just keep this in mind: refactoring without tests is a dangerous endeavor. And, since so few have tests surrounding their code, perhaps that is why so few people actually spend time refactoring? I am sure there are other dynamics, like management’s thinking that refactoring is paying twice for code, etc. , but that will have to wait for another day and another blog entry.

Choosing an Algorithm

When I look at Microsoft forums, I generally see people asking questions like this:

Which works faster, a FileStream or BinaryStream?

ASIDE: The answer, is “it depends”. In general, if you can use binary, it will be faster. But if you are converting to text, you may find binary streaming plus conversion is slower than just using a file stream.

The entire conversation surrounds performance. But there are other considerations when choosing an algorithm. In our multiplication example, we refactored primarily for clarity, with the added benefit of greater performance. But, there are instances where you actually lose some performance for clarity.

Many years ago, I was tasked with determining the best path to go with a company’s web development efforts. The idea was to focus on the web first and then possibly apply the standardize all types of development to the same standard. I looked at a variety of factors, including the following:

  • Ease of coding
  • Ease of maintenance
  • Resources in the market for particular skill sets

There were other factors, as well, but you get the idea. In the end, I determined Visual Basic, using COM, was our best bet, with ASP as the UI technology. This sparked a debate with another programmer, who favored C++. His primary argument was C++ runs faster than VB. I could not argue against the performance characteristics, as C++ is noticeably faster than VB COM. I did, however, argue that VB, in our testing, was well below the bar in terms of maximum time to complete tasks. His insistence that performance was king led the company to spend around $50,000 to have a consulting firm reach the same conclusion.

From a personal standpoint, I think you should try to learn what performs the best. Finding ways to get better performance makes you a better developer. You should also learn many different algorithms for the same problem and when to use each one. This brings us to the second point on algorithms:

From a practical standpoint, however, maintenance costs business far more money than performance. Why? If the code base is not maintainable, you have to hire all rock stars, which come with a big price tag.  As rock stars are not fond of maintenance, you also have a bit of a revolving door problem, which cost the company even more. If the code is easy to maintain, you can hire rock stars for the hard parts and hire mid-level developers for the other work.

There are exceptions to this rule, of course. If you are working for Google, high performing algorithms will win as long as they are scalable. But most of us do not work for Google, or a company that does anything near their level of scale.

If you can create a highly maintainable system with high performance algorithms (very possible when using framework classes to perform work), then you have a win-win. But, when you must make a decision, maintainability should win the day more often than performance. This is not a hard, fast rule, of course, but I have found it is true in most places where I have worked or consulted.

Summary

In this blog entry, I covered algorithms, the A in our development ABCs. In the process, I touched on a couple of other topics:

  • Refactoring – changing algorithms
  • Testing (including basic TDD and BDD-style tests) – verifying algorithms
  • Performance versus Maintainability – choosing algorithms

In the future, I will hit Boundaries and Contracts. I am not sure whether I will choose B or C first, as Contracts are extremely important in my world. Here are some future ideas for the Back to Basics blog entries:

  • B = Boundaries
  • C = Contracts
  • Throw in a little BS – Behavior and State
  • Aspects of Development – Performance, Maintainability, Availability, Extensibility, Scalability and Security
    NOTE: I covered this topic before with classic ASP on asptoday.com (now defunct site) in an article entitled Beyond Mere Performance
  • Page_Load is for Loading Pages – Problems in ASP.NET development

if you search my blog, you will see I have already covered string concatenation versus a StringBuilder (better performing and more maintainable).

Peace and Grace,
Greg

Twitter: @gbworld

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: