Why we use StringBuilder instead of concatenation


This is another of the endless series of "I saw it in the code at my place of work" posts. Here is a snippet:
 

// Sequence Number
uint SequenceNumber = GlobalSequenceGenerator.GetNextSequenceNumber();
strRecord = strRecord + SequenceNumber.ToString().PadLeft(7, '0');
// TransactionCode
String TransactionCode = "";
strRecord = strRecord + TransactionCode.PadLeft(4, '*');
// Format Indicator - 5 chars
String FormatIndicator = "EXT**";
strRecord = strRecord + FormatIndicator.PadRight(5, '*');

This goes on for at least 13 different string concatenations (the top end is over 39) and runs for each record in the database. By a quick count, there are over 10,000,000 concatenations in the process, over 20,000 records. The process takes about 5 minutes to run.
 
I have made two major changes to the application. The first is to make the model a bit more OO in nature. The idea was to create the file in memory as objects and then dump it line by line, but I ended up with an error that caused the program to dump. So the current implementation creates the innermost container and then dumps its lines, etc. This is not the important change.
 
The second change is to move to StringBuilders to create lines. I am using a standard StringBuilder, as a single line will never be more than 256 ASCII characters. I could probably optimize the StringBuilder a bit more, but the implementation goes more like:
 

builder.Append(SequenceNumber.PadLeft(7, ZERO_CHAR));
builder.Append(((int)TransactionCode).ToString().PadLeft(4, ZERO_CHAR));
builder.Append(FormatIndicator.ToString().PadRight(5, '*'));

When running the same file, the new process is quite a bit faster, as there are no concatenations. In fact, it runs in about 7 seconds.
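
For anyone who wants to try the two approaches side by side, here is a minimal, self-contained sketch. It is not the production code: the class name RecordSketch, the method names, the parameter types and the sample values are all assumptions made up for illustration, and only three of the fields are shown.

```csharp
using System;
using System.Text;

public static class RecordSketch
{
    private const char ZeroChar = '0';

    // The original style: every "+" produces a brand new string.
    public static string BuildWithConcatenation(uint sequenceNumber, int transactionCode, string formatIndicator)
    {
        string record = "";
        record = record + sequenceNumber.ToString().PadLeft(7, ZeroChar);
        record = record + transactionCode.ToString().PadLeft(4, ZeroChar);
        record = record + formatIndicator.PadRight(5, '*');
        return record;
    }

    // The StringBuilder style: fields are appended to one buffer,
    // and a string is created only once, at the end.
    public static string BuildWithStringBuilder(uint sequenceNumber, int transactionCode, string formatIndicator)
    {
        // 256 is an assumed upper bound on the line length.
        var builder = new StringBuilder(256);
        builder.Append(sequenceNumber.ToString().PadLeft(7, ZeroChar));
        builder.Append(transactionCode.ToString().PadLeft(4, ZeroChar));
        builder.Append(formatIndicator.PadRight(5, '*'));
        return builder.ToString();
    }

    public static void Main()
    {
        // Hypothetical sample values.
        Console.WriteLine(BuildWithConcatenation(42, 7, "EXT"));  // prints "00000420007EXT**"
        Console.WriteLine(BuildWithStringBuilder(42, 7, "EXT"));  // prints "00000420007EXT**"
    }
}
```

Both methods produce the same line. With three fields the difference is not measurable; it only shows up when the pattern repeats across many fields and many records.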
 
I found this out a few years ago when I worked on an "offline" ETL process. There were a few areas that had string concatenations. When the process originally ran, it took 4 days to complete. With StringBuilders, we got it down to 1 day. Keeping the data as binary as long as possible reduced it even further (if I remember correctly, it was about 4 hours). BinaryReaders and writers are a topic for another day.
 
Why does this happen?
This one is easy. Strings are immutable in .NET. This means you create a new string for every concatenation. This leads people to try things like this to speed things up.
 

string a = "a";
string b = "b";
string c = "c";
string d = "d";
string e = a + b + c + d;
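
To make the immutability point concrete, here is a small sketch (the class and method names are hypothetical, chosen just for this example). It keeps a second reference to a string, concatenates onto the first, and checks that the original object was left untouched while a new one was created.

```csharp
using System;

public static class ImmutabilitySketch
{
    // Returns true if concatenation left the original string alone
    // and handed back a different object.
    public static bool ConcatenationCreatesNewString()
    {
        string original = "EXT";
        string alias = original;      // second reference to the same object
        original = original + "**";   // allocates a new string; "EXT" is not modified
        return alias == "EXT" && !ReferenceEquals(alias, original);
    }

    public static void Main()
    {
        Console.WriteLine(ConcatenationCreatesNewString()); // prints "True"
    }
}
```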

The larger your string gets, the more impact the concatenation has on performance, because each concatenation copies the whole string so far into a new object. There is some overhead for a StringBuilder as well, as it has to grow its internal buffer whenever it fills up (the default capacity in Microsoft's implementation is 16 characters, and it grows from there as needed), but that happens far less often than once per append. Much, much faster. If you know you are going to have a large string, however, you can set up the StringBuilder with a much larger initial size.
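
Setting the initial size is just a constructor argument. The sketch below assumes a 256-character line, as in my case; the class name CapacitySketch and the BuildLine helper are made up for illustration. The capacity values are printed rather than stated, since the default is an implementation detail.

```csharp
using System;
using System.Text;

public static class CapacitySketch
{
    // Builds a line of zero-padded, 4-character fields in a pre-sized builder.
    public static string BuildLine(int fieldCount)
    {
        // Reserve room for a whole line up front so the buffer
        // does not have to grow while fields are appended.
        var builder = new StringBuilder(256);
        for (int i = 0; i < fieldCount; i++)
        {
            builder.Append(i.ToString().PadLeft(4, '0'));
        }
        return builder.ToString();
    }

    public static void Main()
    {
        var defaultBuilder = new StringBuilder();
        var sizedBuilder = new StringBuilder(256);

        // Capacity is the number of characters the builder can hold before it must grow.
        Console.WriteLine(defaultBuilder.Capacity);
        Console.WriteLine(sizedBuilder.Capacity);

        Console.WriteLine(BuildLine(3)); // prints "000000010002"
    }
}
```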
 
Peace and Grace,
Greg
 
 
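In case you have not figured it out, the a + b + c + d snippet above will not fix the problem, as you are still concatenating. The single expression is cheap on its own (the C# compiler turns it into one String.Concat call), but every later append to the growing record still creates a new string.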