<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>TURBU Tech &#187; Optimization</title>
	<atom:link href="http://tech.turbu-rpg.com/tag/optimization/feed" rel="self" type="application/rss+xml" />
	<link>http://tech.turbu-rpg.com</link>
	<description>My thoughts on Delphi programming in general, and particularly on the technical aspects of developing the TURBU engine and editor.</description>
	<lastBuildDate>Fri, 27 Jan 2012 19:53:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>How default settings can slow down FastMM</title>
		<link>http://tech.turbu-rpg.com/345/how-default-settings-can-slow-down-fastmm</link>
		<comments>http://tech.turbu-rpg.com/345/how-default-settings-can-slow-down-fastmm#comments</comments>
		<pubDate>Sat, 21 May 2011 04:48:22 +0000</pubDate>
		<dc:creator>Mason Wheeler</dc:creator>
				<category><![CDATA[Delphi]]></category>
		<category><![CDATA[Memory Management]]></category>
		<category><![CDATA[Optimization]]></category>

		<guid isPermaLink="false">http://tech.turbu-rpg.com/?p=345</guid>
		<description><![CDATA[One of the biggest challenges in working on the TURBU engine has been minimizing load times.  Some large projects have a whole lot of data to work with, which could take the better part of a minute to load if I tried to load it all up front.  No one wants to sit and wait [...]]]></description>
			<content:encoded><![CDATA[<p>One of the biggest challenges in working on the TURBU engine has been minimizing load times.  Some large projects have a whole lot of data to work with, which could take the better part of a minute to load if I tried to load it all up front.  No one wants to sit and wait for that, so I&#8217;ve pared down the loading so that only the stuff that&#8217;s needed right away gets loaded from the project database right at startup.</p>
<p>And yet, on one of my larger test projects, that wasn&#8217;t enough.  One of the things that has to be loaded up front is map tile data, so that the maps can draw.  Unfortunately, this project has over 200 different tilesets, and it was taking quite a while to load that much data.  I&#8217;ve got an RTTI-based deserializer that can turn dataset records into objects, but it was taking a completely unreasonable 3.3 seconds to read the tile data.</p>
<p><span id="more-345"></span>Profiling said that most of the delay&#8211;close to 60%&#8211;was coming from FastMM&#8217;s FastFreeMem calling something in ntdll.dll.  It didn&#8217;t say what, and I didn&#8217;t figure I needed to poke around inside the memory manager.  I&#8217;d be better off making sure there weren&#8217;t so many calls into FastFreeMem, right?</p>
<p>So I poked around in the deserializer code and found several places in inner loops where strings were being created and disposed of quite unnecessarily.  I fixed the code so that that wouldn&#8217;t happen, optimizing out all the unnecessary FreeMem calls.  That should have fixed things up, I figured.  My reward was a measly 0.4 seconds, down from 3.3 to 2.9, with the bulk of the time <em>still</em> taking place in ntdll.</p>
<p>So I poked around in the FastFreeMem code a little, and was surprised to run across this:</p>
<div class="codesnip-container" >
<div class="delphi codesnip" style="font-family:monospace;"><span class="sy2">@</span>LockBlockTypeLoop<span class="sy1">:</span><br />
mov eax<span class="sy1">,</span> <span class="re0">$100</span><br />
<span class="coMULTI">{Attempt to grab the block type}</span><br />
lock cmpxchg TSmallBlockType<span class="br0">&#40;</span><span class="br0">&#91;</span>ebx<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="sy1">.</span><span class="me1">BlockTypeLocked</span><span class="sy1">,</span> ah<br />
je <span class="sy2">@</span>GotLockOnSmallBlockType<br />
<span class="co2">{$ifndef NeverSleepOnThreadContention}</span><br />
<span class="coMULTI">{Couldn&#8217;t grab the block type &#8211; sleep and try again}</span><br />
push ecx<br />
push edx<br />
push InitialSleepTime<br />
call <span class="kw3">Sleep</span><br />
pop edx<br />
pop ecx<br />
<span class="coMULTI">{Try again}</span><br />
mov eax<span class="sy1">,</span> <span class="re0">$100</span><br />
<span class="coMULTI">{Attempt to grab the block type}</span><br />
lock cmpxchg TSmallBlockType<span class="br0">&#40;</span><span class="br0">&#91;</span>ebx<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="sy1">.</span><span class="me1">BlockTypeLocked</span><span class="sy1">,</span> ah<br />
je <span class="sy2">@</span>GotLockOnSmallBlockType<br />
<span class="coMULTI">{Couldn&#8217;t grab the block type &#8211; sleep and try again}</span><br />
push ecx<br />
push edx<br />
push AdditionalSleepTime<br />
call <span class="kw3">Sleep</span><br />
pop edx<br />
pop ecx<br />
<span class="coMULTI">{Try again}</span><br />
jmp <span class="sy2">@</span>LockBlockTypeLoop<br />
<span class="coMULTI">{Align branch target}</span><br />
nop<br />
nop<br />
<span class="co2">{$else}</span><br />
<span class="coMULTI">{Pause instruction (improves performance on P4)}</span><br />
rep nop<br />
<span class="coMULTI">{Try again}</span><br />
jmp <span class="sy2">@</span>LockBlockTypeLoop<br />
<span class="coMULTI">{Align branch target}</span><br />
nop<br />
<span class="co2">{$endif}</span></div>
</div>
<p>So when it tries to lock the block type in order to free some memory, unless a special &#8220;NeverSleepOnThreadContention&#8221; conditional define is set, it&#8217;ll call the Windows API Sleep function, giving up the rest of its timeslice (potentially several milliseconds) while waiting on an operation that will take a few dozen lines of ASM to complete.</p>
<p>I looked for this option in FastMM4Options.inc, and found the following explanation:</p>
<blockquote><p><span style="color: #008000;"><em>{Enable this option to never put a thread to sleep if a thread contention occurs. This option will improve performance if the ratio of the number of active threads to the number of CPU cores is low (typically &lt; 2). With this option set a thread will enter a &#8220;busy waiting&#8221; loop instead of relinquishing</em><em> its timeslice when a thread contention occurs.}</em></span></p></blockquote>
<p>So sleeping the thread instead of spinlocking can be helpful when there are a high number of threads running.  But there&#8217;s no code to detect this.  It&#8217;s either never call Sleep or always call Sleep, with the decision hardcoded in at compile time.  I wonder if it would be possible to always spinlock for a certain number of cycles first and see if that helps, before calling Sleep?</p>
<p>Anyway, it turns out that that was my problem.  I was doing some other data-intensive loading in a background thread, and the memory allocations were clashing with each other.  When I set the NeverSleepOnThreadContention flag and rebuilt, the load time for tile data dropped to a far more acceptable 1.1 seconds.</p>
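<p>For anyone who wants to try the same thing, here is a minimal sketch of the change.  It assumes your copy of FastMM4Options.inc follows the usual convention of disabling an option by putting a dot in front of the dollar sign; check the file itself before relying on this.</p>
<pre>{Before: the leading dot turns the directive into an ordinary comment,
 so the option is off and contended threads call Sleep}
{.$define NeverSleepOnThreadContention}

{After: remove the dot to enable busy waiting instead of sleeping}
{$define NeverSleepOnThreadContention}</pre>
<p>Rebuild the whole project afterwards so that FastMM4.pas is recompiled with the new setting.  As the option&#8217;s own comment warns, this is only likely to help when the number of active threads is low relative to the number of CPU cores.</p>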
]]></content:encoded>
			<wfw:commentRss>http://tech.turbu-rpg.com/345/how-default-settings-can-slow-down-fastmm/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>TStringList updating pitfalls</title>
		<link>http://tech.turbu-rpg.com/231/tstringlist-updating-pitfalls</link>
		<comments>http://tech.turbu-rpg.com/231/tstringlist-updating-pitfalls#comments</comments>
		<pubDate>Tue, 19 Oct 2010 05:55:47 +0000</pubDate>
		<dc:creator>Mason Wheeler</dc:creator>
				<category><![CDATA[Dark Corners]]></category>
		<category><![CDATA[Delphi]]></category>
		<category><![CDATA[Optimization]]></category>

		<guid isPermaLink="false">http://tech.turbu-rpg.com/?p=231</guid>
		<description><![CDATA[What&#8217;s wrong with this code? procedure TMyCustomChecklistPopupControl.ClosePopup; var &#160; i: integer; begin &#160; inherited ClosePopup; &#160; FInternalItemStringList.Clear; &#160; for i := 0 to Self.CheckedCount - 1 do &#160; &#160; FInternalItemStringList.Add&#40;Self.CheckedItems&#91;i&#93;.Name&#41;; end; At first glance, it looks just fine. It&#8217;s semantically correct&#8211;it will do what you want it to. If you happen to have seen a [...]]]></description>
			<content:encoded><![CDATA[<p>What&#8217;s wrong with this code?</p>
<pre>
<div class="codesnip-container" >
<div class="delphi codesnip" style="font-family:monospace;"><span class="kw1">procedure</span> TMyCustomChecklistPopupControl<span class="sy1">.</span><span class="me1">ClosePopup</span><span class="sy1">;</span>
<span class="kw1">var</span>
&nbsp; i<span class="sy1">:</span> <span class="kw4">integer</span><span class="sy1">;</span>
<span class="kw1">begin</span>
&nbsp; <span class="kw1">inherited</span> ClosePopup<span class="sy1">;</span>
&nbsp; FInternalItemStringList<span class="sy1">.</span><span class="me1">Clear</span><span class="sy1">;</span>
&nbsp; <span class="kw1">for</span> i <span class="sy1">:</span><span class="sy3">=</span> 0 <span class="kw1">to</span> <span class="kw2">Self</span><span class="sy1">.</span><span class="me1">CheckedCount</span> <span class="sy3">-</span> 1 <span class="kw1">do</span>
&nbsp; &nbsp; FInternalItemStringList<span class="sy1">.</span><span class="me1">Add</span><span class="br0">&#40;</span><span class="kw2">Self</span><span class="sy1">.</span><span class="me1">CheckedItems</span><span class="br0">&#91;</span>i<span class="br0">&#93;</span><span class="sy1">.</span><span class="me1">Name</span><span class="br0">&#41;</span><span class="sy1">;</span>
<span class="kw1">end</span><span class="sy1">;</span></div>
</div>
</pre>
<p><span id="more-231"></span>At first glance, it looks just fine.  It&#8217;s semantically correct&#8211;it will do what you want it to.  If you happen to have seen a certain issue before, something might jump out at you, but if not, you probably think this is OK.  And most of the time, it is.</p>
<p>This is a simplified version of something I ran into at work today, in one of our custom controls.  I ran into it in the debugger, but not because it was raising exceptions or corrupting data.  No, the problem was that when I hit the Check All button, selecting all 200 or so items, and then closed the popup, it left the UI unresponsive for a good 15 seconds or so.</p>
<p>Turns out the problem isn&#8217;t in what this code was written to do, but in what else it does.  You see, there&#8217;s an OnChange event handler attached to the internal <span style="text-decoration: line-through;">TSwissArmyKnife</span> TStringList which goes over the data in the list, calculates a few things, and updates some UI elements.  And yeah, you want that to happen when you make a change.  But you want it to happen once per change, from the user&#8217;s perspective.  This was happening once per change from the TStringList&#8217;s perspective, or in other words, 200+ times in total for a single user action.  And it took forever to finish.</p>
<p>You can be a really good programmer and still not know all the ins and outs of the framework you&#8217;re working with.  I&#8217;m always discovering new little details about how things work.  Turns out I&#8217;ve seen this one before, so when I hit Pause a few seconds in and dropped to the debugger, and saw the following right in the middle of the call stack, I knew what was going on right away.</p>
<pre>TStringList.Changed
TStringList.InsertItem
TStringList.AddObject
TStringList.Add</pre>
<p>What whoever coded this control apparently didn&#8217;t know, probably because they&#8217;d just never run across it before, was that Borland anticipated this very problem&#8211;or more likely, because so many VCL classes use TStrings descendants internally, they ran into it themselves at one point&#8211;and put a little switch into TStrings to turn off the OnChange event handler temporarily.</p>
<p>Once I surrounded this code with a BeginUpdate and EndUpdate pair, the delay on closing up the box went from an agonizing 15 seconds to a tiny fraction of a second that I wouldn&#8217;t have noticed at all if I wasn&#8217;t watching for it.</p>
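<p>As a rough sketch, the fixed version of the simplified method from the top of this post would look something like the following.  The try/finally block is my own addition here, to make sure the list is never left with its change notifications switched off if something in the loop raises an exception.</p>
<pre>procedure TMyCustomChecklistPopupControl.ClosePopup;
var
  i: integer;
begin
  inherited ClosePopup;
  // Suppress the list's change notifications while it is being rebuilt
  FInternalItemStringList.BeginUpdate;
  try
    FInternalItemStringList.Clear;
    for i := 0 to Self.CheckedCount - 1 do
      FInternalItemStringList.Add(Self.CheckedItems[i].Name);
  finally
    // The change handler now fires once here, instead of once per Add
    FInternalItemStringList.EndUpdate;
  end;
end;</pre>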
<p>Hopefully most of the people reading this are familiar with <a href="http://docwiki.embarcadero.com/VCL/en/Classes.TStrings.BeginUpdate">BeginUpdate</a> and <a href="http://docwiki.embarcadero.com/VCL/XE/en/Classes.TStrings.EndUpdate">EndUpdate</a>.  But if anyone who hasn&#8217;t seen it runs across this, now you have a new trick.  Please make sure to use it, to spare your end-users some pain.  Even if you don&#8217;t think it&#8217;s likely to be necessary, please use it anyway.  When this special checklist control was originally written, years ago, it was intended to hold a dozen or so items at most, not hundreds, and it probably performed fine at that scale.  But growing client demand means the app&#8217;s working with more data than it used to, and eventually you hit something like this unless you&#8217;re careful in your design.</p>
]]></content:encoded>
			<wfw:commentRss>http://tech.turbu-rpg.com/231/tstringlist-updating-pitfalls/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Inheritance baggage</title>
		<link>http://tech.turbu-rpg.com/147/inheritance-baggage</link>
		<comments>http://tech.turbu-rpg.com/147/inheritance-baggage#comments</comments>
		<pubDate>Mon, 21 Jun 2010 15:03:25 +0000</pubDate>
		<dc:creator>Mason Wheeler</dc:creator>
				<category><![CDATA[Dark Corners]]></category>
		<category><![CDATA[Delphi]]></category>
		<category><![CDATA[Optimization]]></category>

		<guid isPermaLink="false">http://tech.turbu-rpg.com/?p=147</guid>
		<description><![CDATA[A couple posts ago, I mentioned that I&#8217;ve been working with code generation lately.  This is for a part of the TURBU project.  An RPG relies pretty heavily on scripting, and RPG Maker, the system I created TURBU to replace, has a fairly extensive, if limited, scripting system.  The limitations were one of the things [...]]]></description>
			<content:encoded><![CDATA[<p>A couple posts ago, I mentioned that I&#8217;ve been working with code generation lately.  This is for a part of the TURBU project.  An RPG relies pretty heavily on scripting, and RPG Maker, the system I created TURBU to replace, has a fairly extensive, if limited, scripting system.  The limitations were one of the things that made me say &#8220;I could do better than this,&#8221; in fact:  No functions, no local variables, callable procedures exist but parameters don&#8217;t, so any &#8220;passing&#8221; has to be done in global variables, only two data types: integer and boolean, no event handlers, minimal looping support, etc.</p>
<p><span id="more-147"></span></p>
<p>The upside of all this, though, is a very simple scripting system that doesn&#8217;t look much like a programming language, with a simple interface that almost anyone can pick up.  I wanted to keep that simplicity as much as possible, while adding the full flexibility and power of a real scripting language.  So I dreamed up EventBuilder, a set of objects which represent a high-level scripting interface and can also express themselves as <a href="http://www.remobjects.com/ps.aspx">PascalScript</a> code.</p>
<p>I needed some way to create EventBuilder objects that could form a hierarchical tree that can represent blocks of code.  They needed to be easily serializable to some human-readable format so people can copy and paste blocks of EventBuilder script in order to share scripts, ask for help with debugging, etc.  And it needed to be ready quickly, since I want to be able to present as much of this as possible at Delphi Live! in August.</p>
<p>So is there any pre-existing system that supports hierarchical trees of objects and easy serialization to a simple text-based format?  The answer should be obvious to any experienced Delphi user:  descend from TComponent and use its built-in serialization to &#8220;DFM format.&#8221;  I tried that and, once I&#8217;d figured out how to handle a few quirks related to object ownership, it worked great!  All the infrastructure was there for me, tested and tried and proven over the last 15 years, and I could focus on the actual Event Builder logic.  It&#8217;s taken me about a month to get the system to a workable state, and now it&#8217;s more or less all ready.</p>
<p>Then I tried running a very, very large RPG Maker project through my project importer, and it took a long time on converting the global script block.  That&#8217;s sort of to be expected, since there are almost 2000 event scripts in there, but even so it felt like it was taking far too long for the amount of work involved.  I looked at my code and couldn&#8217;t find any obvious issues, so I ran it through <a href="http://delphitools.info/samplingprofiler/">Sampling Profiler</a>.</p>
<p>It&#8217;s a good thing I did, too.  It found a very clear bottleneck in a place I&#8217;d have never thought to look.  Apparently I was spending 77% of my time in TComponent.Notification.  And why would I have never thought to look there?  Because I&#8217;ve never heard of it!  But apparently every time I added a component, it would recursively call this on the entire subtree, turning what ought to have been an O(n) conversion into O(n^2).</p>
<p>With a bit of research, it turns out that TComponent.Notification is for dealing with linked components.  For example, when you link a TDataset to a TDatasource, it needs a notification mechanism so it can clean up references if you free one of them.  Since EventBuilder doesn&#8217;t use linked components, I didn&#8217;t really need this functionality.  Good thing TComponent.Notification is virtual!  I overrode it with a blank method, and suddenly the conversion time dropped from about 12 seconds to about 3 seconds, and everything&#8217;s running smoothly again.</p>
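<p>In outline, the override is about as small as a method can get.  This is only a sketch: TEBObject is a made-up name standing in for the real EventBuilder base class, and the unit name is likewise hypothetical.</p>
<pre>unit EBObjectSketch;

interface

uses
  Classes;

type
  // TEBObject is a hypothetical stand-in for the EventBuilder base class
  TEBObject = class(TComponent)
  protected
    procedure Notification(AComponent: TComponent; Operation: TOperation); override;
  end;

implementation

procedure TEBObject.Notification(AComponent: TComponent; Operation: TOperation);
begin
  // Deliberately empty.  Not calling inherited means the notification is
  // not forwarded to every owned component each time one is inserted or removed.
end;

end.</pre>
<p>The trade-off is that these objects will no longer take part in the usual free-notification cleanup, so this is only reasonable for a hierarchy that never links to other components.</p>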
<p>Moral of the story?  Be careful that you understand what you&#8217;re inheriting from, otherwise you might end up with <a href="http://www.snopes.com/humor/nonsense/kangaroo.asp">killer kangaroos</a> or other unwanted features.</p>
]]></content:encoded>
			<wfw:commentRss>http://tech.turbu-rpg.com/147/inheritance-baggage/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Real-world optimization</title>
		<link>http://tech.turbu-rpg.com/42/real-world-optimization</link>
		<comments>http://tech.turbu-rpg.com/42/real-world-optimization#comments</comments>
		<pubDate>Tue, 20 Oct 2009 15:22:27 +0000</pubDate>
		<dc:creator>Mason Wheeler</dc:creator>
				<category><![CDATA[Delphi]]></category>
		<category><![CDATA[Optimization]]></category>

		<guid isPermaLink="false">http://tech.turbu-rpg.com/?p=42</guid>
		<description><![CDATA[Last week at work, I was asked to look at one of our verification modules that was taking about three times longer to run than it had in an earlier version.  This module takes a set of result files, compares them against another file showing expected results, and reports any discrepancies that are outside the [...]]]></description>
			<content:encoded><![CDATA[<p>Last week at work, I was asked to look at one of our verification modules that was taking about three times longer to run than it had in an earlier version.  This module takes a set of result files, compares them against another file showing expected results, and reports any discrepancies that are outside the defined margin of error.  It&#8217;s some pretty heavy work involving hundreds of thousands of data points, and the old version already took more than ten minutes.  Increasing the running time by a factor of three just wasn&#8217;t acceptable.  So I started to look at what was going on.</p>
<p><span id="more-42"></span></p>
<p>Verification takes place in four steps:</p>
<ol>
<li>Load the data from the files</li>
<li>Process the data</li>
<li>Process the data some more</li>
<li>Retrieve the results</li>
</ol>
<p>Steps 3 and 4 only take a few seconds each.  Step 2 takes a couple minutes, but the bulk of the time is spent in step 1.  So I decided to focus there to see if I could find what was making it take so long.  First thing to do is establish a baseline. I built the old version and turned on SQL Server Profiler for the database and <a href="http://delphitools.info/samplingprofiler/">Sampling Profiler</a>, an excellent tool written in Delphi that helps you profile Delphi apps without slowing them down the way AQTime does.  I ran the entire verification process and found that yes, not only was step 1 taking most of the time, over 90% of the time was spent on one single line that matches the data from the files against the data in the database.</p>
<p>The data-loading system looks something like this.  Names and a few details have been changed to protect <span style="text-decoration: line-through;">the innocent</span> the corporate intellectual property, of course, but this is the general idea of what was going on.  See how many problems you can spot in this code.  (Bear in mind, this was the original, faster version.)</p>
<pre>
<div class="codesnip-container" >
<div class="delphi codesnip" style="font-family:monospace;"><span class="kw1">procedure</span> TVerificationDataModule<span class="sy1">.</span><span class="me1">LoadFile</span><span class="br0">&#40;</span><span class="kw1">const</span> filename<span class="sy1">:</span> <span class="kw4">string</span><span class="sy1">;</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;otherParams<span class="sy1">:</span> TOtherData<span class="br0">&#41;</span><span class="sy1">;</span>
<span class="kw1">var</span>
&nbsp; &nbsp;lines<span class="sy1">:</span> TStringList<span class="sy1">;</span>
&nbsp; &nbsp;fileData<span class="sy1">:</span> TObjectList<span class="sy1">;</span>
&nbsp; &nbsp;dbData<span class="sy1">:</span> TOrmObjectList<span class="sy1">;</span>
&nbsp; &nbsp;i<span class="sy1">:</span> <span class="kw4">integer</span><span class="sy1">;</span>
<span class="kw1">begin</span>
&nbsp; &nbsp;lines <span class="sy1">:</span><span class="sy3">=</span> TStringList<span class="sy1">.</span><span class="me1">Create</span><span class="sy1">;</span>
&nbsp; &nbsp;fileData <span class="sy1">:</span><span class="sy3">=</span> TObjectList<span class="sy1">.</span><span class="me1">Create</span><span class="sy1">;</span>
&nbsp; &nbsp;<span class="kw1">try</span>
&nbsp; &nbsp; &nbsp; lines<span class="sy1">.</span><span class="me1">LoadFromFile</span><span class="br0">&#40;</span>filename<span class="br0">&#41;</span><span class="sy1">;</span>
&nbsp; &nbsp; &nbsp; <span class="kw1">for</span> I <span class="sy1">:</span><span class="sy3">=</span> 0 <span class="kw1">to</span> lines<span class="sy1">.</span><span class="me1">Count</span> <span class="sy3">-</span> 1 <span class="kw1">do</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;fileData<span class="sy1">.</span><span class="me1">Add</span><span class="br0">&#40;</span>parseLine<span class="br0">&#40;</span>lines<span class="br0">&#91;</span>i<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy1">;</span>
&nbsp; &nbsp;<span class="kw1">finally</span>
&nbsp; &nbsp; &nbsp; lines<span class="sy1">.</span><span class="me1">Free</span><span class="sy1">;</span>
&nbsp; &nbsp;<span class="kw1">end</span><span class="sy1">;</span>

&nbsp; &nbsp;dbData <span class="sy1">:</span><span class="sy3">=</span> GetRelevantDBData<span class="br0">&#40;</span>otherParams<span class="br0">&#41;</span><span class="sy1">;</span>
&nbsp; &nbsp;<span class="kw1">try</span>
&nbsp; &nbsp; &nbsp; <span class="kw1">for</span> i <span class="sy1">:</span><span class="sy3">=</span> 0 <span class="kw1">to</span> fileData<span class="sy1">.</span><span class="me1">Count</span> <span class="sy3">-</span> 1 <span class="kw1">do</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;MatchFileDataAgainstDB<span class="br0">&#40;</span>fileData<span class="br0">&#91;</span>i<span class="br0">&#93;</span> <span class="kw1">as</span> TFileData<span class="sy1">,</span> dbData<span class="br0">&#41;</span><span class="sy1">;</span>
&nbsp; &nbsp;<span class="kw1">finally</span>
&nbsp; &nbsp; &nbsp; dbData<span class="sy1">.</span><span class="me1">Free</span><span class="sy1">;</span>
&nbsp; &nbsp;<span class="kw1">end</span><span class="sy1">;</span>
<span class="kw1">end</span><span class="sy1">;</span>

<span class="kw1">procedure</span> TVerificationDataModule<span class="sy1">.</span><span class="me1">MatchFileDataAgainstDB</span><span class="br0">&#40;</span>fileData<span class="sy1">:</span> TFileData<span class="sy1">;</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;dbData<span class="sy1">:</span> TOrmObjectList<span class="br0">&#41;</span><span class="sy1">;</span>
<span class="kw1">var</span>
&nbsp; &nbsp;i<span class="sy1">:</span> <span class="kw4">integer</span><span class="sy1">;</span>
&nbsp; &nbsp;dbItem<span class="sy1">:</span> TOrmVerificationObject<span class="sy1">;</span>
&nbsp; &nbsp;updateProcedure<span class="sy1">:</span> IStoredProcedureRecord<span class="sy1">;</span>
<span class="kw1">begin</span>
&nbsp; &nbsp;<span class="kw1">for</span> i <span class="sy1">:</span><span class="sy3">=</span> 0 <span class="kw1">to</span> dbData<span class="sy1">.</span><span class="me1">Count</span> <span class="sy3">-</span> 1 <span class="kw1">do</span>
&nbsp; &nbsp;<span class="kw1">begin</span>
&nbsp; &nbsp; &nbsp; dbItem <span class="sy1">:</span><span class="sy3">=</span> dbData<span class="br0">&#91;</span>i<span class="br0">&#93;</span> <span class="kw1">as</span> TOrmVerificationObject<span class="sy1">;</span>

&nbsp; &nbsp; &nbsp; <span class="co1">//90% of time is spent on this next line:</span>
&nbsp; &nbsp; &nbsp; <span class="kw1">if</span> <span class="br0">&#40;</span>fileData<span class="sy1">.</span><span class="me1">param1</span> <span class="sy3">=</span> dbItem<span class="sy1">.</span><span class="me1">param1</span><span class="br0">&#41;</span> <span class="kw1">and</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="br0">&#40;</span>fileData<span class="sy1">.</span><span class="me1">param2</span> <span class="sy3">=</span> dbItem<span class="sy1">.</span><span class="me1">param2</span><span class="br0">&#41;</span> <span class="kw1">and</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="br0">&#40;</span>fileData<span class="sy1">.</span><span class="me1">param3</span> <span class="sy3">=</span> dbItem<span class="sy1">.</span><span class="me1">param3</span><span class="br0">&#41;</span> <span class="kw1">and</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="br0">&#40;</span>fileData<span class="sy1">.</span><span class="me1">param4</span> <span class="sy3">=</span> dbItem<span class="sy1">.</span><span class="me1">param4</span><span class="br0">&#41;</span> <span class="kw1">then</span>
&nbsp; &nbsp; &nbsp; <span class="kw1">begin</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;updateProcedure <span class="sy1">:</span><span class="sy3">=</span> CreateStoredProc<span class="br0">&#40;</span><span class="st0">'VERIFICATION_DATA_LOADER'</span><span class="br0">&#41;</span><span class="sy1">;</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;updateProcedure<span class="sy1">.</span><span class="me1">param1</span> <span class="sy1">:</span><span class="sy3">=</span> fileData<span class="sy1">.</span><span class="me1">param3</span><span class="sy1">;</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;updateProcedure<span class="sy1">.</span><span class="me1">param2</span> <span class="sy1">:</span><span class="sy3">=</span> fileData<span class="sy1">.</span><span class="me1">param4</span><span class="sy1">;</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;updateProcedure<span class="sy1">.</span><span class="me1">param3</span> <span class="sy1">:</span><span class="sy3">=</span> fileData<span class="sy1">.</span><span class="me1">param5</span><span class="sy1">;</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;updateProcedure<span class="sy1">.</span><span class="me1">param4</span> <span class="sy1">:</span><span class="sy3">=</span> fileData<span class="sy1">.</span><span class="me1">param6</span><span class="sy1">;</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;updateProcedure<span class="sy1">.</span><span class="me1">param5</span> <span class="sy1">:</span><span class="sy3">=</span> fileData<span class="sy1">.</span><span class="me1">param7</span><span class="sy1">;</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;updateProcedure<span class="sy1">.</span><span class="me1">Execute</span><span class="sy1">;</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">if</span> updateProcedure<span class="sy1">.</span><span class="me1">ResultCode</span> <span class="sy3">&lt;&gt;</span> GOOD_RESULT <span class="kw1">then</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">raise</span> Exception<span class="sy1">.</span><span class="me1">Create</span><span class="br0">&#40;</span><span class="st0">'Something went wrong'</span><span class="br0">&#41;</span><span class="sy1">;</span>

&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;dbData<span class="sy1">.</span><span class="kw3">Delete</span><span class="br0">&#40;</span>i<span class="br0">&#41;</span><span class="sy1">;</span>
&nbsp; &nbsp; &nbsp; <span class="kw1">end</span><span class="sy1">;</span>
&nbsp; &nbsp;<span class="kw1">end</span><span class="sy1">;</span>
<span class="kw1">end</span><span class="sy1">;</span></div>
</div>
</pre>
<p>Then I profiled the newer version and got very similar results, except that it was spending even more time in the <strong>if</strong> statement to match the objects against each other.  Close to 99% now.  So what had changed?  I looked back through version control and found that the SQL that generates the result set that goes into dbData had been changed between versions.  A new table was added to simplify the big mess of joins, but they forgot one of the <strong>on</strong> criteria, so it was returning three times as many results as it should have.  There&#8217;s your factor of three right there.  Easy enough to fix.  But that still doesn&#8217;t address the quality of the original code.  A couple things jumped right out at me, and I wondered if I could bring the time down below the original mark.</p>
<p>The first thing came out of the SQL profiler.  I kept seeing a call to sp_procedure_params_rowset, an <a href="http://msdn.microsoft.com/en-us/library/ms187961.aspx">undocumented procedure in SQL Server</a> that the connection object uses internally to get information about the expected parameters for a stored procedure, immediately followed by a call to the VERIFICATION_DATA_LOADER proc.  This seemed a bit silly to me.  The signature of the stored procedure isn&#8217;t going to change!  Turns out that was called internally by the CreateStoredProc function, which was being called every time it went to save some data to the database, in order to create the proper object.</p>
<p>So I moved the call to CreateStoredProc out to the main procedure and set it up as an extra parameter to pass into MatchFileDataAgainstDB.  It would reuse the same basic stored procedure object and reassign its parameters for each call, so you get the same net effect, but with 50% fewer database hits.  Unfortunately, this didn&#8217;t yield a 50% increase in performance.  SQL Server can cache the results of redundant queries, so this call wasn&#8217;t taking much time at all to process repeatedly, but the transport layer overhead was still a factor, and removing this redundant call sped the overall process up by about 20%.</p>
<p>But the big one was in the matching, where the profiler said the system was spending the majority of its time.  It doesn&#8217;t exactly look like a speed bottleneck, because it&#8217;s stored inside a method call, but what it is is a linear search inside of a loop, with both lists containing a few thousand elements each.  But how do you make something like this run faster?  I could try sorting the second list and using a binary search, but have you ever written a binary search?  It&#8217;s a bunch of extra code, and it&#8217;s often confusing and hard to read.  I couldn&#8217;t use a TDictionary to index the second list, because I need to match against 4 items, not just 1.  So instead I used a very simple trick that&#8217;s been around for decades but I don&#8217;t tend to see very often these days: list comparison.</p>
<p>The general algorithm goes like this:</p>
<ol>
<li>Sort both lists by the same criteria.  This must also be the same as the matching criteria.</li>
<li>Start at the top of both lists.  Pick the first item from each and compare them.</li>
<li>If they match, handle the case and advance the index for both lists.</li>
<li>If they don&#8217;t match, loop through, advancing the index for the list with the &#8220;lesser&#8221; value each time, until a match is found.</li>
<li>When you reach the end of either list, you&#8217;re done.  (Unless you want to handle any leftovers from the other list.)</li>
</ol>
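<p>The steps above can be sketched as a small console program.  This is an illustration only, not the code from work: it compares plain integers in two arrays that are assumed to be sorted in ascending order already, and the ReportMatches name is made up for the example.  With real objects, the three-way comparison would look at each of the matching fields in the same order the lists were sorted by.</p>
<pre>program ListCompareSketch;

{$APPTYPE CONSOLE}

// Walks two ascending-sorted arrays in a single pass and reports every
// value that appears in both.  Neither index ever moves backwards.
procedure ReportMatches(const a, b: array of Integer);
var
  i, j: Integer;
begin
  i := 0;
  j := 0;
  // Stop as soon as either list runs out
  while (i &lt; Length(a)) and (j &lt; Length(b)) do
  begin
    if a[i] = b[j] then
    begin
      // Match: handle it, then advance both lists
      Writeln('match: ', a[i]);
      Inc(i);
      Inc(j);
    end
    else if a[i] &lt; b[j] then
      Inc(i)  // a holds the lesser value, so advance a
    else
      Inc(j); // b holds the lesser value, so advance b
  end;
end;

begin
  ReportMatches([1, 3, 4, 7, 9], [2, 3, 7, 8, 9, 10]);
end.</pre>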
<p>This is a very simple and very useful algorithm for reconciling two sets of data, and I&#8217;ve managed to find all sorts of uses for it.  Unlike a double-nested loop, which basically runs in <a href="http://en.wikipedia.org/wiki/Big_O_notation">quadratic time</a>, the comparison pass is guaranteed to run in linear time and never walk either list more than once.  (The up-front sorting costs O(n log n), which is still far cheaper than a quadratic search.)  I managed to adapt this algorithm to the existing code, and suddenly processing the input files, which had previously taken at least a minute each, takes between 2 and 6 seconds per file.  Now loading the data takes about the same amount of time as performing the calculations, instead of an order of magnitude longer.</p>
<p>Lessons learned:</p>
<ul>
<li>Profilers, especially non-invasive ones, are invaluable for finding what&#8217;s going on in your app.  I&#8217;d have probably noticed that double-nested loop soon enough, but I would never have found the stored procedure issue without SQL Server Profiler to point it out.</li>
<li>Pulling things out of loops—especially other loops!—is a great way to increase performance.</li>
<li>Reducing algorithmic time complexity is by far the best optimization for large data sets.</li>
<li>Linear, single-threaded techniques are still relevant.  A lot of people are talking these days about parallel programming and how the meaning of optimization has changed in today&#8217;s world.  They&#8217;re right, to a certain extent, but as hard as I try I can&#8217;t think of any way to parallelize this check that would make it faster than a simple list comparison.  The only thing I know of with the potential to be faster than this is a hash table lookup, which could be parallelized, but it won&#8217;t work particularly well when you need to look up your values based on more than one index value.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://tech.turbu-rpg.com/42/real-world-optimization/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

