Real-world optimization

Last week at work, I was asked to look at one of our verification modules that was taking about three times longer to run than it had in an earlier version.  This module takes a set of result files, compares them against another file showing expected results, and reports any discrepancies that are outside the defined margin of error.  It’s some pretty heavy work involving hundreds of thousands of data points, and the old version already took more than ten minutes.  Increasing the running time by a factor of three just wasn’t acceptable.  So I started to look at what was going on.

Verification takes place in four steps:

  1. Load the data from the files
  2. Process the data
  3. Process the data some more
  4. Retrieve the results

Steps 3 and 4 only take a few seconds each.  Step 2 takes a couple minutes, but the bulk of the time is spent in step 1.  So I decided to focus on there to see if I could find what was making it take so long.  First thing to do is establish a baseline. I built the old version and turned on SQL Server Profiler for the database and Sampling Profiler, an excellent tool written in Delphi that helps you profile Delphi apps without slowing them down the way AQTime does.  I ran the entire verification process and found that yes, not only was step 1 taking most of the time, over 90% of the time was spent on one single line that matches the data from the files against the data in the database.

The data-loading system looks something like this.  Names and a few details have been changed to protect the innocent the corporate intellectual property, of course, but this is the general idea of what was going on.  See how many problems you can spot in this code.  (Bear in mind, this was the original, faster version.)

[code lang="delphi"]
procedure TVerificationDataModule.LoadFile(const filename: string;
                                           otherParams: TOtherData);
   lines: TStringList;
   fileData: TObjectList
   dbData: TOrmObjectList;
   i: integer;
   lines := TStringList.Create;
   fileData := TObjectList.Create;
      for I := 0 to lines.Count - 1 do

   dbData := GetRelevantDBData(otherParams);
      for i := 0 to fileData.Count - 1 do
         MatchFileDataAgainstDB(fileData[i] as TFileData, dbData);

procedure TVerificationDataModule.MatchFileDataAgainstDB(fileData: TFileData;
                                                         dbData: TOrmObjectList);
   i: integer;
   dbItem: TOrmVerificationObject;
   updateProcedure: IStoredProcedureRecord;
   for i := 0 to dbData.Count - 1 do
      dbItem := dbData[i] as TOrmVerificationObject;

      //90% of time is spent on this next line:
      if fileData.param1 = dbItem.param1 and
         fileData.param2 = dbItem.param2 and
         fileData.param3 = dbItem.param3 and
         fileData.param4 = dbItem.param4 then
         updateProcedure := CreateStoredProc('VERIFICATION_DATA_LOADER');
         updateProcedure.param1 := fileData.param3;
         updateProcedure.param2 := fileData.param4;
         updateProcedure.param3 := fileData.param5;
         updateProcedure.param4 := fileData.param6;
         updateProcedure.param5 := fileData.param7;
         if updateProcedure.ResultCode <> GOOD_RESULT then
            raise Exception.Create('Something went wrong');


Then I profiled the newer version and got very similar results, except that it was spending even more time in the if statement to match the objects against each other.  Close to 99% now.  So what had changed?  I looked back through version control and found that the SQL that generates the result set that goes into dbData had been changed between versions.  A new table was added to simplify the big mess of joins, but they forgot one of the on criteria, so it was returning three times as many results as it should have.  There’s your factor of three right there.  Easy enough to fix.  But that still doesn’t address the quality of the original code.  A couple things jumped right out at me, and I wondered if I could bring the time down below the original mark.

The first thing came out of the SQL profiler.  I kept seeing a call to sp_procedure_params_rowset, an undocumented procedure in SQL Server that the connection object uses internally to get information about the expected parameters for a stored procedure, immediately followed by a call to the VERIFICATION_DATA_LOADER proc.  This seemed a bit silly to me.  The signature of the stored procedure isn’t going to change!  Turns out that was called internally by the CreateStoredProc function, which was being called every time it went to save some data to the database, in order to create the proper object.

So I moved the call to CreateStoredProc out to the main procedure and set it up as an extra parameter to pass into MatchFileDataAgainstDB.  It would reuse the same basic stored procedure object and reassign its parameters for each call, so you get the same net effect, but with 50% less database hits.  Unfortunately, this didn’t yield a 50% increase in performance.  SQL Server can cache the results of redundant queries, so this call wasn’t taking much time at all to process repeatedly, but the transport layer overhead was still a factor, and removing this redundant call sped the overall process up by about 20%.

But the big one was in the matching, where the profiler said the system was spending the majority of its time.  It doesn’t exactly look like a speed bottleneck, because it’s stored inside a method call, but what it is is a linear search inside of a loop, with both lists containing a few thousand elements each.  But how do you make something like this run faster?  I could try sorting the second list and using a binary search, but have you ever written a binary search?  It’s a bunch of extra code, and it’s often confusing and hard to read.  I couldn’t use a TDictionary to index the second list, because I need to match against 4 items, not just 1.  So instead I used a very simple trick that’s been around for decades but I don’t tend to see very often these days: list comparison.

The general algorithm goes like this:

  1. Sort both lists by the same criteria.  This must also be the same as the matching criteria.
  2. Start at the top of both lists.  Pick the first item from each and compare them.
  3. If they match, handle the case and advance the index for both lists.
  4. If they don’t match, loop through, advancing the index for the list with the “lesser” value each time, until a match is found.
  5. When you reach the end of either list, you’re done.  (Unless you want to handle any leftovers from the other list.)

This is a very simple and very useful algorithm for reconciling two sets of data, and I’ve managed to find all sorts of uses for it.  Unlike a double-nested loop, which basically runs in quadratic time,  this is guaranteed to run in linear time and never walk either list more than once.  I managed to adapt this algorithm to the existing code, and suddenly processing the input files, which had previously taken at least a minute each, takes between 2 and 6 seconds per file.  Now loading the data takes about the same amount of time as performing the calculations, instead of an order of magnitude longer.

Lessons learned:

  • Profilers, especially non-invasive ones, are invaluable for finding what’s going on in your app.  I’d have probably noticed that double-nested loop soon enough, but I would never have found the stored procedure issue without SQL Server Profiler to point it out.
  • Pulling things out of loops—especially other loops!—is a great way to increase performance.
  • Reducing algorithmic time complexity is by far the best optimization for large data sets.
  • Linear, single-threaded techniques are still relevant.  A lot of people are talking these days about parallel programming and how the meaning of optimization has changed in today’s world.  They’re right, to a certain extent, but as hard as I try I can’t think of any way to parallelize this check that would make it faster than a simple list comparison.  The only thing I know of with the potential to be faster than this is a hash table lookup, which could be parallelized, but it won’t work particularly well when you need to look up your values based on more than one index value.

One Comment

  1. […] the results can be informative.  One thing I didn’t mention in my post a few weeks ago about optimizing a routine at work was the impact of as-casts. I didn’t mention it because it became irrelevant once I removed […]