Code Bloat


I was talking to Rick Taylor today about his joining Vertigo Software. He said he was feeling intimidated, largely because this is where Jeff Atwood (www.codinghorror.com) works. I reminded him that he was also a bit intimidated when he started with Magenic.

After talking to Rick, I was reminded that I had not visited Coding Horror recently. On the site, I found this gem from December 23, 2007, where Jeff talks about Steve Yegge’s blog entry called Code’s Worst Enemy. Code’s worst enemy? Code bloat, of course.

What Steve is talking about is a game called Wyvern, which he wrote in Java. It currently has more than 500,000 lines of code. Now, I agree with most of Steve’s post. As I read it, however, I found a few things I did not completely agree with. 🙂

Steve writes: "The problem with Refactoring as applied to languages like Java, and this is really quite central to my thesis today, is that Refactoring makes the code base larger."

The most common code smell I refactor away is code duplication. The same is true for friends of mine.
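Here is a minimal sketch of that kind of refactoring. It is in Python rather than Java purely for brevity, and every function name and rule in it is hypothetical (none of it comes from Wyvern or any real code base). Two routines repeat the same checks, and then the shared checks are extracted into one helper:

```python
# Before: the same checks are copied into both routines.
def create_player_before(name):
    name = name.strip()
    if not name:
        raise ValueError("name must not be empty")
    if len(name) > 20:
        raise ValueError("name is too long")
    return {"kind": "player", "name": name}


def create_monster_before(name):
    name = name.strip()
    if not name:
        raise ValueError("name must not be empty")
    if len(name) > 20:
        raise ValueError("name is too long")
    return {"kind": "monster", "name": name}


# After: the duplicated checks live in one helper (hypothetical name).
def clean_name(name):
    name = name.strip()
    if not name:
        raise ValueError("name must not be empty")
    if len(name) > 20:
        raise ValueError("name is too long")
    return name


def create_player(name):
    return {"kind": "player", "name": clean_name(name)}


def create_monster(name):
    return {"kind": "monster", "name": clean_name(name)}
```

With only two copies the saving is small. It grows with every additional copy that calls the helper instead of repeating the checks.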

Does this reduce code lines? You betcha! Suppose we have 100 code duplications, each with 10 lines of code. If we move this code to a routine of its own, we add the routine and place the code there. Counting the routine’s signature and each brace as a line of code, we have produced 3 new lines of code. But, if all 100 dupes are the same dupe, you have eliminated 100 * 10 lines of code. The final size difference can be represented as follows.

reduction = ((num dupes * num lines) – routine framework) – num lines

More realistically, you start with 100 dupes. On average, a single dupe appears 4 times and is 20 lines long, which makes 25 sets of dupes.

reduction = (((num times * num lines) – routine framework) – num lines) * num sets

This leads us to:
reduction = (((4 * 20) – 3) – 20) * 25

That is 1425 lines of code now eliminated through this refactoring.
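The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name is made up, and the `call_lines` parameter is an assumption added for illustration. It accounts for the call left behind at each site, which the formula above does not count.

```python
def lines_saved(times, lines, framework=3, call_lines=0):
    """Net lines removed by extracting one duplicated block into a routine.

    times      -- how many copies of the block exist
    lines      -- length of the block in lines
    framework  -- lines added for the new routine's signature and braces
    call_lines -- lines left at each call site (0 reproduces the formula
                  above; this parameter is an assumption for illustration)
    """
    removed = times * lines
    added = framework + lines + times * call_lines
    return removed - added


# The realistic case: 25 sets, each appearing 4 times at 20 lines.
print(25 * lines_saved(times=4, lines=20))  # 1425

# Same case, but counting one call line left behind at each of the 4 sites.
print(25 * lines_saved(times=4, lines=20, call_lines=1))  # 1325

# The simple case: one block of 10 lines copied 100 times.
print(lines_saved(times=100, lines=10))  # 987
```

Even when the call sites are counted, the refactoring still removes well over a thousand lines in the realistic case.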

You also reduce the number of lines in if … else if … else chains when you move to switch constructs. In addition, you can often refactor out other routines that are duped throughout the branching logic (whether switch or if … else type branches).
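A small sketch of the idea follows, again in Python for brevity. Python has no classic switch statement, so a dictionary lookup stands in for it here. The commands and messages are hypothetical and exist only for illustration.

```python
# Before: one branch per command, each with its own comparison and return.
def describe_before(command):
    if command == "n":
        return "You walk north."
    elif command == "s":
        return "You walk south."
    elif command == "e":
        return "You walk east."
    elif command == "w":
        return "You walk west."
    else:
        return "Unknown command."


# After: the branches collapse into one table plus a single lookup,
# and the message text repeated in every branch is written only once.
MOVES = {"n": "north", "s": "south", "e": "east", "w": "west"}


def describe(command):
    direction = MOVES.get(command)
    return f"You walk {direction}." if direction else "Unknown command."
```

The behavior is the same for every input, but adding a fifth direction now costs one table entry rather than two more lines of branching.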

There are also cases where messages are built up through a massive number of concatenation lines, and these can be reduced rather effectively with tools like Regex.

It is possible that Steve’s code is already tight in these areas. If so, refactoring may well increase the size of the code base, in raw lines of code. But, if one refactors, the intent of the code becomes clearer, which makes the slightly larger code base easier to maintain.

Another means of refactoring would be moving your code to libraries. This makes it more maintainable, as there are always areas of your code base that do not need much in the way of maintenance (helper routines come to mind). In addition, there have been many Java Frameworks released since Steve began coding. Moving to a Framework does increase the number of lines of code, overall, but it is a hidden increase, as it decreases the number of lines of code he will actually have to maintain.

Steve continues: "I’d estimate that fewer than 5% of the standard refactorings supported by IDEs today make the code smaller."

If you say there are 100 refactorings one can apply, it is probably correct that only 5 routinely reduce the number of lines of code. But, since the most common refactoring is extracting methods (etc.) to remove duplicate code, and the 5 that reduce lines of code are amongst the most commonly used refactorings, I would say you have an 80-20 rule here. This may not be true for everyone, but it sure works for most of us.

In addition, even when you increase the number of lines, the code base is easier to maintain if the lines are better organized. Rephrased into an axiom: You can buy more pairs of socks and still find them more easily if you keep them paired up and in a drawer rather than scattered about your house.

Steve again: "Refactoring is like cleaning your closet without being allowed to throw anything away."

Only by looking at refactoring in the most narrow-minded way can you see it like this. You cannot throw away or alter signatures (that is, change behavior), but you are allowed to completely change the way you get that behavior. Stating that it is like "cleaning your closet without being allowed to throw anything away" is like saying "well, sure, it’s a Porsche instead of a Yugo in your garage, but it’s still a car". Actually, it is not quite like that, but all analogies fail. Just because the black box (my house) has to stay the same does not mean I cannot throw out the garbage cluttering up the rooms.

Steve goes on: "If you get a bigger closet, and put everything into nice labeled boxes, then your closet will unquestionably be more organized. But programmers tend to overlook the fact that spring cleaning works best when you’re willing to throw away stuff you don’t need."

Okay, so the real problem here is developers not understanding how to refactor. That, I completely agree with. I also track with the rest of the post.

But then, many programmers (if not most) have no clue how to refactor as they code (find a dupe, immediately refactor). Then again, most do not understand Test Driven Development either (which is mandatory for painless refactoring). In fact, one very smart guy I know (no names) ranted about a TDD person telling him they had a tester for each developer and wrote test stubs before writing a line of code, to which he stated "those guys must suck". I would contend that "those guys" are scientists who can actually fix their code when the fecal matter hits the rotary oscillator.

Why does code bloat? Here are some of my thoughts:

  1. Developers assume that they will have time to do it over, since they do not have time to do it right the first time.
  2. Developers assume they will remember to come back and refactor later, after they have the code working.
  3. Developers do not add tests, so they find refactoring too risky after getting burned once by a bad refactoring.
  4. Developers do not think they have time to refactor.
  5. Too little time is spent thinking through a problem, and plenty of time is spent coding a solution to a problem that has not been thought through. This is probably the worst of the entire mix.
  6. Other?

Oh, well, I guess I am just rambling now, so I will go on.

Peace and Grace,
Greg
