Why we use StringBuilder instead of concatenation


This is another of the endless series of "I saw it in the code at my place of work" posts. Here is a snippet:
 

// Sequence Number
uint SequenceNumber = GlobalSequenceGenerator.GetNextSequenceNumber();
strRecord = strRecord + SequenceNumber.ToString().PadLeft(7, '0');
// TransactionCode
String TransactionCode = "";
strRecord = strRecord + TransactionCode.PadLeft(4, '*');
// Format Indicator – 5 chars
String FormatIndicator = "EXT**";

strRecord = strRecord + FormatIndicator.PadRight(5, '*');

This goes on for at least 13 different string concatenations (the top end is over 39) and runs for each record in the database. A quick count puts it at over 10,000,000 concatenations across more than 20,000 records. The process takes about 5 minutes to run.
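To put a rough number on why this hurts, here is a small C# model (an illustration under simple assumptions, not a measurement of the code above) of how many characters get copied when a line is built with strRecord = strRecord + field versus appended to a StringBuilder that is already big enough. The ConcatCost class and its methods are hypothetical names.

```csharp
using System;
using System.Linq;

public static class ConcatCost
{
    // Rough model: each "record = record + field" statement allocates a new
    // string as long as the old record plus the field, and copies all of it.
    public static long CharsCopiedByConcatenation(int[] fieldWidths)
    {
        long copied = 0;
        long length = 0;
        foreach (int width in fieldWidths)
        {
            length += width;  // length of the newly allocated string
            copied += length; // every character of it gets copied
        }
        return copied;
    }

    // With a StringBuilder that already has enough capacity, each field is
    // copied once into the buffer and once more by the final ToString().
    public static long CharsCopiedByBuilder(int[] fieldWidths)
    {
        return 2L * fieldWidths.Sum();
    }
}
```

With 39 six-character fields, the model gives 4,680 characters copied for one line built by concatenation versus 468 for the builder. The concatenation cost grows with the square of the number of fields, while the builder's grows linearly.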
 
I have made two major changes to the application. The first is to make the model a bit more OO in nature. The idea was to build the whole file in memory as objects and then dump it line by line, but I ran into an error that caused the program to crash. So, the current implementation creates the innermost container, dumps its lines, and so on up the hierarchy. This is not the important change.
 
The second change is to move to StringBuilders to create lines. I am using a standard StringBuilder, as a single line will never be more than 256 ASCII characters. I could probably optimize the StringBuilder a bit more, but the implementation goes more like:
 

builder.Append(SequenceNumber.ToString().PadLeft(7, ZERO_CHAR));
builder.Append(((int)TransactionCode).ToString().PadLeft(4, ZERO_CHAR));
builder.Append(FormatIndicator.ToString().PadRight(5, '*'));

When running the same file, the new process is quite a bit faster, as there are no concatenations. In fact, it runs in about 7 seconds.
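Here is a self-contained version of that snippet, assuming the field types from the original code (a uint sequence number, an integer transaction code, and a string format indicator). RecordLine.Build is a hypothetical name, and sizing the builder to 256 characters up front is one way to do the "optimize a bit more" part, since a single line never exceeds that.

```csharp
using System;
using System.Text;

public static class RecordLine
{
    // Per the post, a single line never exceeds 256 ASCII characters,
    // so the builder is sized once and should never need to grow.
    private const int MaxLineLength = 256;
    private const char ZeroChar = '0';

    // Hypothetical signature; widths match the fields in the original code.
    public static string Build(uint sequenceNumber, int transactionCode, string formatIndicator)
    {
        var builder = new StringBuilder(MaxLineLength);
        builder.Append(sequenceNumber.ToString().PadLeft(7, ZeroChar));  // Sequence Number, 7 chars
        builder.Append(transactionCode.ToString().PadLeft(4, ZeroChar)); // Transaction Code, 4 chars
        builder.Append(formatIndicator.PadRight(5, '*'));                // Format Indicator, 5 chars
        return builder.ToString();
    }
}
```

For example, RecordLine.Build(42, 209, "EXT") yields "00000420209EXT**". For the unsigned sequence number, sequenceNumber.ToString("D7") would produce the same zero padding in a single call.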
 
I found this out a few years ago when I worked on an "offline" ETL process. There were a few areas that had string concatenations. When the process originally ran, it took 4 days to complete. With StringBuilders, we got it down to 1 day. Keeping the data as binary as long as possible reduced it even further (if I remember correctly, it was about 4 hours). BinaryReaders and writers are a topic for another day.
 
Why does this happen?
This one is easy. Strings are immutable in .NET, which means every concatenation creates a brand new string and copies the contents of both pieces into it. This leads people to try things like this to speed things up.
 

string a = "a";
string b = "b";
string c = "c";
string d = "d";
string e = a + b + c + d;

The larger your string gets, the more impact concatenation has on performance. There is some overhead for a StringBuilder as well, as it starts with a small buffer (16 characters by default) and has to expand it as the string grows. But it expands in large steps rather than creating a new string and copying everything on every append. Much, much faster. If you know you are going to have a large string, however, you can create the StringBuilder with a much larger initial capacity.
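A quick way to see the expansion behavior for yourself is to count how often the builder's Capacity increases while you append. This is a sketch: BuilderGrowth and CountCapacityIncreases are made-up names, and the exact growth steps depend on the .NET version, so treat the counts as illustrative.

```csharp
using System;
using System.Text;

public static class BuilderGrowth
{
    // Appends `field` `count` times and counts how many times the
    // builder had to enlarge its capacity to hold the text.
    public static int CountCapacityIncreases(StringBuilder builder, string field, int count)
    {
        int increases = 0;
        int capacity = builder.Capacity;
        for (int i = 0; i < count; i++)
        {
            builder.Append(field);
            if (builder.Capacity > capacity)
            {
                increases++;
                capacity = builder.Capacity;
            }
        }
        return increases;
    }
}
```

A default new StringBuilder() starts at a capacity of 16 characters, so appending 2,000 five-character fields forces only a small number of capacity increases (not 2,000 new strings), and new StringBuilder(10000) avoids them entirely for the same amount of text.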
 
Peace and Grace,
Greg
 
 
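In case you have not figured it out, this will not really help, as you are still concatenating. The C# compiler does turn a single expression like a + b + c + d into one String.Concat call, but code that builds a record field by field across separate statements, like the snippet at the top of this post, still creates a new string on every statement.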

Funny Vimeo Video


Sometimes you find something you think is funny, but your wife cannot understand. The following video is a prime example. After watching, I was laughing and she just said "I would fire the guy". See what you think.
 
 
Peace and Grace,
Greg

Obama’s cut for charitable contributions


I was listening to the radio this morning and caught part of Obama’s latest press conference in which he states the desire to cut the maximum deduction for charitable giving from 39% to 28%. I do not wish to discuss whether I think this is a good idea, but there are a few statements he made that are misleading.
 

And what we’ve said is: Let’s go back to the rate that existed under Ronald Reagan. People are still going to be able to make charitable contributions. It just means, if you give $100 and you’re in this tax bracket, at a certain point, instead of being able to write off 36 percent or 39 percent, you’re writing off 28 percent.

Obama is suggesting the wealthiest continue to get taxed at 39% but can only write off 28% of their charitable contributions, meaning they will get taxed on their contributions to charity, although at a lower rate than the money they keep. What is misleading here is Obama suggesting he is just taking things back to the way they were under the Reagan tax plan. This is not true. Under Reagan, you could only write off 28%, as the maximum tax rate was 28%. Under Obama, the tax rate is 39% and you can only write off 28%. In real dollars:

Item              Reagan        Bush       Obama
Earnings      10,000,000  10,000,000  10,000,000
Tax            2,800,000   3,900,000   3,900,000
Charity        1,000,000   1,000,000   1,000,000
Write Off        280,000     390,000     280,000
Tax Total      2,520,000   3,510,000   3,620,000

In that sense, what it would do is it would equalize. When I give $100, I’d get the same amount of deduction as when some, a bus driver who’s making $50,000 a year, or $40,000 a year, gives that same $100. Right now, he gets 28 percent, he gets to write off 28 percent. I get to write off 39 percent. I don’t think that’s fair.

The problem here is the math of equalities does not work out. Let me illustrate:

Joe Busdriver – paid $28 on the $100 he earned. Joe gave $100 to charity, so Joe got back $28. Net result: charity +$100, Joe $0
Rich Guy – paid $39 on the $100 he earned. Gave $100 to charity, so he got back $28. Net result: charity +$100, Rich Guy -$11

So, for every $100 given to charity, the rich guy pays an extra $11 in taxes. For every $1,000 given to charity, the rich guy pays an extra $110 in taxes. For every $10,000 given to charity, the rich guy pays another $1,100 in taxes. And so on. At the $1,000,000 level, it is another $110,000 in taxes.

Q: It’s not the well-to-do people. It’s the charities. Given what you’ve just said, are you confident the charities are wrong when they contend that this would discourage giving?

OBAMA: Yes, I am. I mean, if you look at the evidence, there’s very little evidence that this has a significant impact on charitable giving.

There is ample evidence that people who understand money take money into account when they make decisions. This is true whether it is poor old Joe bus driver or rich guy. If you have $20 to spend on dinner, you are not going to go to Ruth’s Chris. True, there are plenty of people who would just charge it and pay for it later (pun intended), but smart people budget their money. And, there are very few people that get rich by being money stupid. What this means is, if you have $10,000,000 to give to charity under the old tax rate, you only have $8,900,000 to give to charity under the Obama suggestion. This means charity suffers to the tune of 11%. Perhaps 11% is not considered significant? Otherwise the statement is pure male bovine fecal matter.

Unfortunately, until we really get some "leaders" in office who will really get rid of the pork (unlikely in our lifetimes), things have to get paid for with taxes. And, as long as we continue to work under the myth that it is the government’s job to babysit us rather than govern, we will have a lot of extra pork to contend with. Someone has to pay for it. I get that. But don’t tell me it is fair to have someone pay $39 into the system and then hand him back $28 when he gives the money to the less fortunate. And don’t tell me it won’t hurt charity. And don’t tell me we are just going back to the way it was under Reagan. I am not that stupid.

Peace and Grace,
Greg

More on Linear thinking


Yesterday, I posted on linear thinking and how it can get you into trouble when your objects are containers. If you did not read it, it is the post titled "Breaking from linear thinking in development." Today’s topic expands on that idea. It is yet another post in the continuing series of "back to basics" posts, and another one spurred by code I have seen.
 
I am currently working on ACH records for credit cards. In order to translate from the type my client uses to the type their vendor uses, I have to do some translation of fields. My best map currently is the map made in the code the previous developer worked on. So far, so good.
 
One thing that is very difficult, however, is that the developer declared fields as he saw fit, wherever he happened to need them. In other words, to get the transaction type, the code does something like this:
 

String strTransactionType = "";
if (MerchantTransactionsReader["TXTYPE"] != DBNull.Value)
    strTransactionType = MerchantTransactionsReader["TXTYPE"].ToString();

This is still not a problem. Where it gets difficult is when I am on line 256 of the file and I find something like this:
 

// Authorization Response Code – 2 char
String AuthorizationResponseCode = "";
if (strTransactionType == "209") //Refund
    AuthorizationResponseCode = "00";
else //any type of Sale
    AuthorizationResponseCode = "88";

I know I can right-click and choose Go To Definition, but there is an easier way to make code maintainable, and it is something you have been taught already if you have taken any courses on programming: place your definitions at the top of the routine. If you do this, the maintenance developer (or the fireman fixing the burning house) can use a split screen and keep the definitions visible, as they are all in one place.
 
NOTE (added 3/26/2009): Per the comment left by no name: in general, most routines are short, and there are a few valid exceptions. In this particular file, my goal was more to split the screen so I could see the private variables of the class while organizing the load routine(s). The code is ultimately throwaway, so the load routines are also a bit longer than I would normally work with. Thanks as well to no name for suggesting Clean Code. I will have to look at it.
 
In linear thinking, you declare variables throughout your routines, classes, etc. When you break from linear thinking, you think up the contract first and then put all of the like types in one location. In a routine, this means you declare your variables in one place, which is, in every book I have read, at the top of the routine.
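As a sketch of what that looks like, here is the authorization response code logic from above pulled into one routine with every local declared at the top. The AuthorizationMapping class and method names are hypothetical, and reading the row through IDataRecord (which .NET data readers implement) is an assumption about how the reader is passed around.

```csharp
using System;
using System.Data;

public static class AuthorizationMapping
{
    // Hypothetical routine: maps one merchant transaction row to the
    // 2-character authorization response code.
    public static string GetAuthorizationResponseCode(IDataRecord transaction)
    {
        // Declarations: all in one place, at the top of the routine.
        const string RefundTransactionType = "209";
        const string RefundResponseCode = "00";
        const string SaleResponseCode = "88";
        int transactionTypeOrdinal;
        string transactionType = "";
        string authorizationResponseCode;

        // Body
        transactionTypeOrdinal = transaction.GetOrdinal("TXTYPE");
        if (!transaction.IsDBNull(transactionTypeOrdinal))
            transactionType = transaction.GetValue(transactionTypeOrdinal).ToString();

        authorizationResponseCode = transactionType == RefundTransactionType
            ? RefundResponseCode // refund
            : SaleResponseCode;  // any type of sale

        return authorizationResponseCode;
    }
}
```

With the declarations grouped, a split screen on the top of the routine shows every name the rest of it uses.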
 
Peace and Grace,
Greg
 

Breaking from linear thinking in development


I am working on an application that creates financial files. The basic format is like this:
 
Transmission Header
  Batch Header
    Merchant Header … Merchant Header N
       Financial Record … Financial Record N
  Batch Trailer
Transmission Trailer
 
Examining the file, the original programmer saw this:
 
Transmission Header Line
Batch Header Line
Merchant Header Line
   Financial Line
   Extension Line
   Repeat
Repeat
Batch Trailer Line
Transmission Trailer Line
 
This is a very linear way of thinking. Start from the top of the file and continue until you reach the last line. The problem with thinking like this is it completely ignores the containers. If there are no totals, this is not a big deal, but the reason for headers and trailers in the file is the trailers generally have some sort of summary information.
 
In this case, there is summary information, with comments like:
 
// TODO: Calculate total
strRecord = strRecord + strDebitTotal.Replace(".", "").Replace("-", "").PadLeft(12, '0');
 
Unfortunately, you can’t get there from here, as the total is dependent on the lines that have come before. It can be kludged, perhaps, by running a large number of totals as you add records and using these global variables to output the lines. But, it is much easier if you think of the containers and stop thinking only of the output. The container is Transmission. The Transmission contains a header and a trailer. When done in this manner, you can actually test whether the output is correct.
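Here is a minimal sketch of the container approach, assuming a Transmission holds batches, a batch holds merchants, and a merchant holds financial records. The class names, the two-letter record prefixes, and treating every record amount as part of the total are made up for illustration; the point is that each trailer is computed from what its container holds rather than from running global totals.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical container model; names and record prefixes are illustrative.
public sealed class FinancialRecord
{
    public decimal Amount { get; }
    public FinancialRecord(decimal amount) { Amount = amount; }
}

public sealed class Merchant
{
    public List<FinancialRecord> Records { get; } = new List<FinancialRecord>();
    public decimal Total => Records.Sum(r => r.Amount);
}

public sealed class Batch
{
    public List<Merchant> Merchants { get; } = new List<Merchant>();
    public decimal Total => Merchants.Sum(m => m.Total);
}

public sealed class Transmission
{
    public List<Batch> Batches { get; } = new List<Batch>();
    public decimal Total => Batches.Sum(b => b.Total);

    // Walks the containers; each trailer is computed from the records
    // it contains, so no running global totals are needed.
    public IEnumerable<string> Lines()
    {
        yield return "TH"; // transmission header
        foreach (Batch batch in Batches)
        {
            yield return "BH"; // batch header
            foreach (Merchant merchant in batch.Merchants)
            {
                yield return "MH"; // merchant header
                foreach (FinancialRecord record in merchant.Records)
                    yield return "FR" + FormatAmount(record.Amount);
            }
            yield return "BT" + FormatAmount(batch.Total); // batch trailer
        }
        yield return "TT" + FormatAmount(Total); // transmission trailer
    }

    // Same idea as the original trailer code: drop the decimal point
    // and sign, then left-pad to 12 characters with zeros.
    public static string FormatAmount(decimal amount) =>
        amount.ToString("0.00", CultureInfo.InvariantCulture)
              .Replace(".", "")
              .Replace("-", "")
              .PadLeft(12, '0');
}
```

Because the totals come from the containers, a unit test can build a small Transmission in memory (say, one merchant with records of 10.50 and 4.25) and check that the trailer lines carry 000000001475.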
 
Peace and Grace,
Greg

Fizzbin, Gotta love Scott Hanselman


If you do not follow Scott Hanselman’s site, you should. He had a gem this week about phone technical support. His idea is to have a secret code word to bypass the hour of intro calls and escalate you to someone who can actually solve your problem. Classic.
 
 
Peace and Grace,
Greg

ASP.NET MVC Released


I figured this would be one of the announcements at MIX based on what we heard at the MVP (Microsoft Most Valuable Professional) Summit. They did not give a date, but MIX makes the most sense. You can download the final version of ASP.NET MVC at http://tinyurl.com/cs3l3n.
 
 
Other new releases
 

I would personally like to see another build of Visual Studio 2010.

Peace and Grace,
Greg