<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>TURBU Tech &#187; Optimization</title>
	<atom:link href="http://tech.turbu-rpg.com/tag/optimization/feed" rel="self" type="application/rss+xml" />
	<link>http://tech.turbu-rpg.com</link>
	<description>My thoughts on Delphi programming in general, and particularly on the technical aspects of developing the TURBU engine and editor.</description>
	<lastBuildDate>Wed, 01 Sep 2010 21:11:08 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Inheritance baggage</title>
		<link>http://tech.turbu-rpg.com/147/inheritance-baggage</link>
		<comments>http://tech.turbu-rpg.com/147/inheritance-baggage#comments</comments>
		<pubDate>Mon, 21 Jun 2010 15:03:25 +0000</pubDate>
		<dc:creator>Mason Wheeler</dc:creator>
				<category><![CDATA[Dark Corners]]></category>
		<category><![CDATA[Delphi]]></category>
		<category><![CDATA[Optimization]]></category>

		<guid isPermaLink="false">http://tech.turbu-rpg.com/?p=147</guid>
		<description><![CDATA[A couple posts ago, I mentioned that I&#8217;ve been working with code generation lately.  This is for a part of the TURBU project.  An RPG relies pretty heavily on scripting, and RPG Maker, the system I created TURBU to replace, has a fairly extensive, if limited, scripting system.  The limitations were one of the things [...]]]></description>
			<content:encoded><![CDATA[<p>A couple posts ago, I mentioned that I&#8217;ve been working with code generation lately.  This is for a part of the TURBU project.  An RPG relies pretty heavily on scripting, and RPG Maker, the system I created TURBU to replace, has a fairly extensive, if limited, scripting system.  The limitations were one of the things that made me say &#8220;I could do better than this,&#8221; in fact:  No functions, no local variables, callable procedures exist but parameters don&#8217;t, so any &#8220;passing&#8221; has to be done in global variables, only two data types: integer and boolean, no event handlers, minimal looping support, etc.</p>
<p><span id="more-147"></span></p>
<p>The upside of all this, though, is a very simple scripting system that doesn&#8217;t look much like a programming language, with a simple interface that almost anyone can pick up.  I wanted to keep that simplicity as much as possible, while adding the full flexibility and power of a real scripting language.  So I dreamed up EventBuilder, a set of objects which represent a high-level scripting interface and can also express themselves as <a href="http://www.remobjects.com/ps.aspx">PascalScript</a> code.</p>
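<p>To make the idea of objects that &#8220;express themselves&#8221; as code a bit more concrete, here is a minimal sketch in Python rather than Delphi.  The class names (ScriptNode, Block, IfThen, Call) and the generated syntax are hypothetical illustrations, not EventBuilder&#8217;s actual classes.  Each node owns its children and renders itself and its subtree as PascalScript-style source text.</p>

```python
class ScriptNode:
    """Hypothetical base class: every node knows how to emit its own source."""

    def to_script(self, indent: int = 0) -> str:
        raise NotImplementedError


class Call(ScriptNode):
    """A single procedure call, e.g. ShowMessage('Hello');"""

    def __init__(self, name: str, *args: str) -> None:
        self.name = name
        self.args = args

    def to_script(self, indent: int = 0) -> str:
        pad = "  " * indent
        return f"{pad}{self.name}({', '.join(self.args)});"


class Block(ScriptNode):
    """A begin..end block that owns an ordered list of child nodes."""

    def __init__(self, *children: ScriptNode) -> None:
        self.children = list(children)

    def to_script(self, indent: int = 0) -> str:
        pad = "  " * indent
        lines = [f"{pad}begin"]
        lines += [child.to_script(indent + 1) for child in self.children]
        lines.append(f"{pad}end;")
        return "\n".join(lines)


class IfThen(ScriptNode):
    """An if..then statement whose body is itself a node."""

    def __init__(self, condition: str, body: ScriptNode) -> None:
        self.condition = condition
        self.body = body

    def to_script(self, indent: int = 0) -> str:
        pad = "  " * indent
        return f"{pad}if {self.condition} then\n{self.body.to_script(indent)}"


script = Block(IfThen("Switch[5]", Block(Call("ShowMessage", "'Hello'"))))
print(script.to_script())
```

<p>Rendering is just a recursive walk of the tree, which is one reason a tree of objects that can also be serialized is such a natural fit for this kind of script editor.</p>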
<p>I needed some way to create EventBuilder objects that could form a hierarchical tree representing blocks of code.  They needed to be easily serializable to some human-readable format, so people could copy and paste blocks of EventBuilder script to share scripts, ask for help with debugging, etc.  And the whole thing needed to be ready quickly, since I want to present as much of it as possible at Delphi Live! in August.</p>
<p>So is there any pre-existing system that supports hierarchical trees of objects and easy serialization to a simple text-based format?  The answer should be obvious to any experienced Delphi user: descend from TComponent and use its built-in serialization to &#8220;DFM format.&#8221;  I tried that and, once I&#8217;d figured out how to handle a few quirks related to object ownership, it worked great!  All the infrastructure was already there, tried, tested and proven over the last 15 years, and I could focus on the actual EventBuilder logic.  It&#8217;s taken me about a month to get the system to a workable state, and now it&#8217;s more or less ready.</p>
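<p>For readers who haven&#8217;t looked inside a .dfm file, the text format looks roughly like what the sketch below produces.  In Delphi the VCL&#8217;s streaming system does all of this for you.  This Python version only mimics the shape of DFM text: nested <code>object Name: TClass</code> &#8230; <code>end</code> blocks, <code>Property = Value</code> lines, and single-quoted strings with doubled embedded quotes.  The Node class and the TEB* class names are hypothetical.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a TComponent descendant: a name, a class
    name, published property values, and the child components it owns."""

    name: str
    class_name: str
    props: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def format_value(value) -> str:
    # Booleans and integers are written bare; strings are single-quoted
    # with any embedded quote doubled, Pascal-style.
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "True" if value else "False"
    if isinstance(value, int):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def to_dfm(node: Node, indent: int = 0) -> str:
    """Render a node and its subtree as DFM-style nested text."""
    pad = "  " * indent
    lines = [f"{pad}object {node.name}: {node.class_name}"]
    for prop, value in node.props.items():
        lines.append(f"{pad}  {prop} = {format_value(value)}")
    for child in node.children:
        lines.append(to_dfm(child, indent + 1))
    lines.append(f"{pad}end")
    return "\n".join(lines)


script = Node("Script1", "TEBBlock", {"Enabled": True}, [
    Node("Msg1", "TEBShowMessage", {"Text": "It's here", "Delay": 10}),
])
print(to_dfm(script))
```

<p>Because the text nests the same way the ownership tree does, a copied block can be pasted into a forum post and read back in without any extra bookkeeping.</p>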
<p>Then I tried running a very, very large RPG Maker project through my project importer, and it spent a long time converting the global script block.  That&#8217;s sort of to be expected, since there are almost 2000 event scripts in there, but even so it felt like far too long for the amount of work involved.  I looked at my code and couldn&#8217;t find any obvious issues, so I ran it through <a href="http://delphitools.info/samplingprofiler/">Sampling Profiler</a>.</p>
<p>It&#8217;s a good thing I did, too.  It found a very clear bottleneck in a place I&#8217;d never have thought to look: I was spending 77% of my time in TComponent.Notification.  Why would I never have thought to look there?  Because I&#8217;d never heard of it!  But apparently every time I added a component, it would recursively call this on the entire subtree, turning what ought to have been an O(n) conversion into O(n²).</p>
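<p>A rough count shows where the quadratic cost comes from.  As a simplifying assumption, take a single owner.  Inserting a component into an owner that already holds k components triggers one Notification call on the owner, which then forwards it to each of the k + 1 components it now owns.  Adding n components therefore costs this many Notification calls:</p>

```latex
\sum_{k=0}^{n-1} \bigl(1 + (k + 1)\bigr) \;=\; n + \frac{n(n+1)}{2} \;=\; O(n^2)
```

<p>A handler that doesn&#8217;t forward the call costs one call per insertion, so the total drops to n.</p>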
<p>With a bit of research, I found that TComponent.Notification is for dealing with linked components.  For example, when you link a TDataSource to a TDataSet, there needs to be a notification mechanism so references can be cleaned up if you free one of them.  Since EventBuilder doesn&#8217;t use linked components, I didn&#8217;t really need this functionality.  Good thing TComponent.Notification is virtual!  I overrode it with a blank method, the conversion time suddenly dropped from about 12 seconds to about 3 seconds, and everything&#8217;s running smoothly again.</p>
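<p>The effect of the override can be reproduced with a deliberately simplified Python model of TComponent ownership.  It is not the VCL source.  It models only the one behavior that matters here: inserting a component notifies the owner, and the default handler forwards the notification to every owned component, recursively.  QuietComponent plays the role of the class with the blank Notification override.</p>

```python
class Component:
    """Simplified model of TComponent ownership and notification.

    Inserting a component notifies its owner, and the default handler
    forwards the notification to every owned component, recursively.
    """

    calls = 0  # class-wide counter of notification() invocations

    def __init__(self, owner=None):
        self.components = []
        if owner is not None:
            owner.insert_component(self)

    def insert_component(self, component):
        self.components.append(component)
        self.notification(component, "opInsert")

    def notification(self, component, operation):
        Component.calls += 1
        for child in self.components:
            child.notification(component, operation)


class QuietComponent(Component):
    """Mimics overriding Notification with an empty method."""

    def notification(self, component, operation):
        Component.calls += 1  # counted, but never forwarded


def count_calls(cls, n):
    """Create one owner plus n owned components; return the call count."""
    Component.calls = 0
    root = cls()
    for _ in range(n):
        cls(owner=root)
    return Component.calls


print(count_calls(Component, 1000))       # 501500
print(count_calls(QuietComponent, 1000))  # 1000
```

<p>One caveat worth keeping in mind is that the stock TComponent.Notification also does some cleanup when it receives opRemove.  An empty override is therefore presumably only safe when, as with EventBuilder, no linked components are involved.</p>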
<p>Moral of the story?  Be careful that you understand what you&#8217;re inheriting from, otherwise you might end up with <a href="http://www.snopes.com/humor/nonsense/kangaroo.asp">killer kangaroos</a> or other unwanted features.</p>
]]></content:encoded>
			<wfw:commentRss>http://tech.turbu-rpg.com/147/inheritance-baggage/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Real-world optimization</title>
		<link>http://tech.turbu-rpg.com/42/real-world-optimization</link>
		<comments>http://tech.turbu-rpg.com/42/real-world-optimization#comments</comments>
		<pubDate>Tue, 20 Oct 2009 15:22:27 +0000</pubDate>
		<dc:creator>Mason Wheeler</dc:creator>
				<category><![CDATA[Delphi]]></category>
		<category><![CDATA[Optimization]]></category>

		<guid isPermaLink="false">http://tech.turbu-rpg.com/?p=42</guid>
		<description><![CDATA[Last week at work, I was asked to look at one of our verification modules that was taking about three times longer to run than it had in an earlier version.  This module takes a set of result files, compares them against another file showing expected results, and reports any discrepancies that are outside the [...]]]></description>
			<content:encoded><![CDATA[<p>Last week at work, I was asked to look at one of our verification modules that was taking about three times longer to run than it had in an earlier version.  This module takes a set of result files, compares them against another file showing expected results, and reports any discrepancies that are outside the defined margin of error.  It&#8217;s some pretty heavy work involving hundreds of thousands of data points, and the old version already took more than ten minutes.  Increasing the running time by a factor of three just wasn&#8217;t acceptable.  So I started to look at what was going on.</p>
<p><span id="more-42"></span></p>
<p>Verification takes place in four steps:</p>
<ol>
<li>Load the data from the files</li>
<li>Process the data</li>
<li>Process the data some more</li>
<li>Retrieve the results</li>
</ol>
<p>Steps 3 and 4 only take a few seconds each.  Step 2 takes a couple of minutes, but the bulk of the time is spent in step 1, so I decided to focus there to see if I could find what was making it take so long.  The first thing to do was establish a baseline.  I built the old version and turned on SQL Server Profiler for the database and <a href="http://delphitools.info/samplingprofiler/">Sampling Profiler</a>, an excellent tool written in Delphi that helps you profile Delphi apps without slowing them down the way AQTime does.  I ran the entire verification process and found that, yes, not only was step 1 taking most of the time, but over 90% of the time was spent on one single line that matches the data from the files against the data in the database.</p>
<p>The data-loading system looks something like this.  Names and a few details have been changed to protect <span style="text-decoration: line-through;">the innocent</span> the corporate intellectual property, of course, but this is the general idea of what was going on.  See how many problems you can spot in this code.  (Bear in mind, this was the original, faster version.)</p>
<pre>
<div class="codesnip-container" >
<div class="delphi codesnip" style="font-family:monospace;"><span class="kw1">procedure</span> TVerificationDataModule<span class="sy1">.</span><span class="me1">LoadFile</span><span class="br0">&#40;</span><span class="kw1">const</span> filename<span class="sy1">:</span> <span class="kw4">string</span><span class="sy1">;</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;otherParams<span class="sy1">:</span> TOtherData<span class="br0">&#41;</span><span class="sy1">;</span>
<span class="kw1">var</span>
&nbsp; &nbsp;lines<span class="sy1">:</span> TStringList<span class="sy1">;</span>
&nbsp; &nbsp;fileData<span class="sy1">:</span> TObjectList
&nbsp; &nbsp;dbData<span class="sy1">:</span> TOrmObjectList<span class="sy1">;</span>
&nbsp; &nbsp;i<span class="sy1">:</span> <span class="kw4">integer</span><span class="sy1">;</span>
<span class="kw1">begin</span>
&nbsp; &nbsp;lines <span class="sy1">:</span><span class="sy3">=</span> TStringList<span class="sy1">.</span><span class="me1">Create</span><span class="sy1">;</span>
&nbsp; &nbsp;fileData <span class="sy1">:</span><span class="sy3">=</span> TObjectList<span class="sy1">.</span><span class="me1">Create</span><span class="sy1">;</span>
&nbsp; &nbsp;<span class="kw1">try</span>
&nbsp; &nbsp; &nbsp; lines<span class="sy1">.</span><span class="me1">LoadFromFile</span><span class="br0">&#40;</span>filename<span class="br0">&#41;</span><span class="sy1">;</span>
&nbsp; &nbsp; &nbsp; <span class="kw1">for</span> I <span class="sy1">:</span><span class="sy3">=</span> 0 <span class="kw1">to</span> lines<span class="sy1">.</span><span class="me1">Count</span> <span class="sy3">-</span> 1 <span class="kw1">do</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;fileData<span class="sy1">.</span><span class="me1">Add</span><span class="br0">&#40;</span>parseLine<span class="br0">&#40;</span>lines<span class="br0">&#91;</span>i<span class="br0">&#93;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="sy1">;</span>
&nbsp; &nbsp;<span class="kw1">finally</span>
&nbsp; &nbsp; &nbsp; lines<span class="sy1">.</span><span class="me1">Free</span><span class="sy1">;</span>
&nbsp; &nbsp;<span class="kw1">end</span><span class="sy1">;</span>

&nbsp; &nbsp;dbData <span class="sy1">:</span><span class="sy3">=</span> GetRelevantDBData<span class="br0">&#40;</span>otherParams<span class="br0">&#41;</span><span class="sy1">;</span>
&nbsp; &nbsp;<span class="kw1">try</span>
&nbsp; &nbsp; &nbsp; <span class="kw1">for</span> i <span class="sy1">:</span><span class="sy3">=</span> 0 <span class="kw1">to</span> fileData<span class="sy1">.</span><span class="me1">Count</span> <span class="sy3">-</span> 1 <span class="kw1">do</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;MatchFileDataAgainstDB<span class="br0">&#40;</span>fileData<span class="br0">&#91;</span>i<span class="br0">&#93;</span> <span class="kw1">as</span> TFileData<span class="sy1">,</span> dbData<span class="br0">&#41;</span><span class="sy1">;</span>
&nbsp; &nbsp;<span class="kw1">finally</span>
&nbsp; &nbsp; &nbsp; dbData<span class="sy1">.</span><span class="me1">Free</span><span class="sy1">;</span>
&nbsp; &nbsp;<span class="kw1">end</span><span class="sy1">;</span>
<span class="kw1">end</span><span class="sy1">;</span>

<span class="kw1">procedure</span> TVerificationDataModule<span class="sy1">.</span><span class="me1">MatchFileDataAgainstDB</span><span class="br0">&#40;</span>fileData<span class="sy1">:</span> TFileData<span class="sy1">;</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;dbData<span class="sy1">:</span> TOrmObjectList<span class="br0">&#41;</span><span class="sy1">;</span>
<span class="kw1">var</span>
&nbsp; &nbsp;i<span class="sy1">:</span> <span class="kw4">integer</span><span class="sy1">;</span>
&nbsp; &nbsp;dbItem<span class="sy1">:</span> TOrmVerificationObject<span class="sy1">;</span>
&nbsp; &nbsp;updateProcedure<span class="sy1">:</span> IStoredProcedureRecord<span class="sy1">;</span>
<span class="kw1">begin</span>
&nbsp; &nbsp;<span class="kw1">for</span> i <span class="sy1">:</span><span class="sy3">=</span> 0 <span class="kw1">to</span> dbData<span class="sy1">.</span><span class="me1">Count</span> <span class="sy3">-</span> 1 <span class="kw1">do</span>
&nbsp; &nbsp;<span class="kw1">begin</span>
&nbsp; &nbsp; &nbsp; dbItem <span class="sy1">:</span><span class="sy3">=</span> dbData<span class="br0">&#91;</span>i<span class="br0">&#93;</span> <span class="kw1">as</span> TOrmVerificationObject<span class="sy1">;</span>

&nbsp; &nbsp; &nbsp; <span class="co1">//90% of time is spent on this next line:</span>
&nbsp; &nbsp; &nbsp; <span class="kw1">if</span> fileData<span class="sy1">.</span><span class="me1">param1</span> <span class="sy3">=</span> dbItem<span class="sy1">.</span><span class="me1">param1</span> <span class="kw1">and</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;fileData<span class="sy1">.</span><span class="me1">param2</span> <span class="sy3">=</span> dbItem<span class="sy1">.</span><span class="me1">param2</span> <span class="kw1">and</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;fileData<span class="sy1">.</span><span class="me1">param3</span> <span class="sy3">=</span> dbItem<span class="sy1">.</span><span class="me1">param3</span> <span class="kw1">and</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;fileData<span class="sy1">.</span><span class="me1">param4</span> <span class="sy3">=</span> dbItem<span class="sy1">.</span><span class="me1">param4</span> <span class="kw1">then</span>
&nbsp; &nbsp; &nbsp; <span class="kw1">begin</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;updateProcedure <span class="sy1">:</span><span class="sy3">=</span> CreateStoredProc<span class="br0">&#40;</span><span class="st0">'VERIFICATION_DATA_LOADER'</span><span class="br0">&#41;</span><span class="sy1">;</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;updateProcedure<span class="sy1">.</span><span class="me1">param1</span> <span class="sy1">:</span><span class="sy3">=</span> fileData<span class="sy1">.</span><span class="me1">param3</span><span class="sy1">;</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;updateProcedure<span class="sy1">.</span><span class="me1">param2</span> <span class="sy1">:</span><span class="sy3">=</span> fileData<span class="sy1">.</span><span class="me1">param4</span><span class="sy1">;</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;updateProcedure<span class="sy1">.</span><span class="me1">param3</span> <span class="sy1">:</span><span class="sy3">=</span> fileData<span class="sy1">.</span><span class="me1">param5</span><span class="sy1">;</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;updateProcedure<span class="sy1">.</span><span class="me1">param4</span> <span class="sy1">:</span><span class="sy3">=</span> fileData<span class="sy1">.</span><span class="me1">param6</span><span class="sy1">;</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;updateProcedure<span class="sy1">.</span><span class="me1">param5</span> <span class="sy1">:</span><span class="sy3">=</span> fileData<span class="sy1">.</span><span class="me1">param7</span><span class="sy1">;</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;updateProcedure<span class="sy1">.</span><span class="me1">Execute</span><span class="sy1">;</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span class="kw1">if</span> updateProcedure<span class="sy1">.</span><span class="me1">ResultCode</span> &amp;lt<span class="sy1">;</span>&amp;gt<span class="sy1">;</span> GOOD_RESULT <span class="kw1">then</span>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span class="kw1">raise</span> Exception<span class="sy1">.</span><span class="me1">Create</span><span class="br0">&#40;</span><span class="st0">'Something went wrong'</span><span class="br0">&#41;</span><span class="sy1">;</span>

&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;dbData<span class="sy1">.</span><span class="kw3">Delete</span><span class="br0">&#40;</span>i<span class="br0">&#41;</span><span class="sy1">;</span>
&nbsp; &nbsp; &nbsp; <span class="kw1">end</span><span class="sy1">;</span>
&nbsp; &nbsp;<span class="kw1">end</span><span class="sy1">;</span>
<span class="kw1">end</span><span class="sy1">;</span></div>
</div>
</pre>
<p>Then I profiled the newer version and got very similar results, except that it was spending even more time in the <strong>if</strong> statement to match the objects against each other.  Close to 99% now.  So what had changed?  I looked back through version control and found that the SQL that generates the result set that goes into dbData had been changed between versions.  A new table was added to simplify the big mess of joins, but they forgot one of the <strong>on</strong> criteria, so it was returning three times as many results as it should have.  There&#8217;s your factor of three right there.  Easy enough to fix.  But that still doesn&#8217;t address the quality of the original code.  A couple things jumped right out at me, and I wondered if I could bring the time down below the original mark.</p>
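<p>A forgotten <strong>on</strong> criterion multiplies rows instead of breaking the query outright, which is why it can slip through unnoticed.  The real SQL isn&#8217;t shown here, so the following is a hypothetical reproduction using Python&#8217;s built-in sqlite3 module.  It builds a lookup table with three rows per point, then joins it once with both criteria and once with the second one missing.</p>

```python
import sqlite3

# Hypothetical schema: each expected result is keyed by (point_id, run),
# and the new lookup table holds one row per (point_id, run) pair,
# with three runs per point.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE expected (point_id INTEGER, run INTEGER, value REAL);
    CREATE TABLE lookup   (point_id INTEGER, run INTEGER, label TEXT);
    """
)
rows = [(p, r) for p in range(100) for r in range(3)]
conn.executemany("INSERT INTO expected VALUES (?, ?, 0.0)", rows)
conn.executemany("INSERT INTO lookup VALUES (?, ?, 'x')", rows)

# Join with the full criteria: one output row per expected row.
correct = conn.execute(
    """SELECT COUNT(*) FROM expected e
       JOIN lookup l ON l.point_id = e.point_id AND l.run = e.run"""
).fetchone()[0]

# Same join with the second ON criterion forgotten: every expected row
# now matches all three lookup rows that share its point_id.
broken = conn.execute(
    """SELECT COUNT(*) FROM expected e
       JOIN lookup l ON l.point_id = e.point_id"""
).fetchone()[0]

print(correct, broken)  # 300 900
```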
<p>The first thing came out of the SQL profiler.  I kept seeing a call to sp_procedure_params_rowset, an <a href="http://msdn.microsoft.com/en-us/library/ms187961.aspx">undocumented procedure in SQL Server</a> that the connection object uses internally to get information about the expected parameters for a stored procedure, immediately followed by a call to the VERIFICATION_DATA_LOADER proc.  This seemed a bit silly to me.  The signature of the stored procedure isn&#8217;t going to change!  It turned out that the metadata call was made internally by the CreateStoredProc function, which was being called to create a fresh procedure object every time the code saved some data to the database.</p>
<p>So I moved the call to CreateStoredProc out to the main procedure and passed the result into MatchFileDataAgainstDB as an extra parameter.  It reuses the same stored procedure object and reassigns its parameters for each call, so you get the same net effect with half as many database hits.  Unfortunately, this didn&#8217;t cut the running time in half.  SQL Server can cache the results of redundant queries, so the metadata call wasn&#8217;t taking much time to process, but the transport-layer overhead was still a factor, and removing this redundant call sped the overall process up by about 20%.</p>
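<p>The shape of that change can be sketched with a hypothetical connection object that simply counts round-trips.  FakeConnection, FakeProc and the two save functions are illustrations, not the real data-access layer.  Creating a procedure object stands in for the sp_procedure_params_rowset metadata query, and executing it stands in for the VERIFICATION_DATA_LOADER call.</p>

```python
class FakeConnection:
    """Hypothetical connection that only counts database round-trips."""

    def __init__(self):
        self.round_trips = 0

    def create_stored_proc(self, name):
        # Stands in for the parameter-metadata query
        # (sp_procedure_params_rowset) issued when the object is created.
        self.round_trips += 1
        return FakeProc(self, name)


class FakeProc:
    """Hypothetical stored-procedure object with reassignable parameters."""

    def __init__(self, conn, name):
        self.conn = conn
        self.name = name
        self.params = {}

    def execute(self):
        # Stands in for the actual VERIFICATION_DATA_LOADER call.
        self.conn.round_trips += 1


def save_per_row(conn, rows):
    """Original pattern: a new procedure object (and metadata query) per row."""
    for row in rows:
        proc = conn.create_stored_proc("VERIFICATION_DATA_LOADER")
        proc.params.update(row)
        proc.execute()


def save_hoisted(conn, rows):
    """Revised pattern: create the object once, reassign parameters per row."""
    proc = conn.create_stored_proc("VERIFICATION_DATA_LOADER")
    for row in rows:
        proc.params.update(row)
        proc.execute()


rows = [{"param1": i} for i in range(1000)]
before, after = FakeConnection(), FakeConnection()
save_per_row(before, rows)
save_hoisted(after, rows)
print(before.round_trips, after.round_trips)  # 2000 1001
```

<p>This only counts hits.  As noted above, the wall-clock gain was smaller than the count suggests, because the metadata query itself was cheap.</p>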
<p>But the big one was the matching, where the profiler said the system was spending the majority of its time.  It doesn&#8217;t exactly look like a speed bottleneck, because it&#8217;s hidden inside a method call, but it&#8217;s really a linear search inside a loop, with both lists containing a few thousand elements each.  So how do you make something like this run faster?  I could have sorted the second list and used a binary search, but have you ever written a binary search?  It&#8217;s a bunch of extra code, and it&#8217;s often confusing and hard to read.  I couldn&#8217;t use a TDictionary to index the second list, because I needed to match against 4 items, not just 1.  So instead I used a very simple trick that&#8217;s been around for decades but that I don&#8217;t see very often these days: list comparison.</p>
<p>The general algorithm goes like this:</p>
<ol>
<li>Sort both lists by the same criteria.  This must also be the same as the matching criteria.</li>
<li>Start at the top of both lists.  Pick the first item from each and compare them.</li>
<li>If they match, handle the case and advance the index for both lists.</li>
<li>If they don&#8217;t match, loop through, advancing the index for the list with the &#8220;lesser&#8221; value each time, until a match is found.</li>
<li>When you reach the end of either list, you&#8217;re done.  (Unless you want to handle any leftovers from the other list.)</li>
</ol>
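<p>Here is a minimal Python sketch of those five steps.  It assumes each key occurs at most once per list, so matching is one-to-one.  The four-field key mirrors the param1..param4 comparison in the original code, but the Row type and sample data are made up for illustration.</p>

```python
from collections import namedtuple
from operator import attrgetter


def list_compare(left, right, key, on_match):
    """Reconcile two lists in a single pass after sorting both by `key`.

    Calls on_match(l, r) for each pair with equal keys and returns the
    unmatched items as (left_only, right_only). Assumes each key occurs
    at most once per list, so matching is one-to-one.
    """
    # Step 1: sort both lists by the matching criteria.
    left = sorted(left, key=key)
    right = sorted(right, key=key)
    # Step 2: start at the top of both lists.
    i = j = 0
    left_only, right_only = [], []
    while i != len(left) and j != len(right):
        kl, kr = key(left[i]), key(right[j])
        if kl == kr:
            # Step 3: match found, handle it and advance both indexes.
            on_match(left[i], right[j])
            i += 1
            j += 1
        elif kl > kr:
            # Step 4: right item is "lesser", so it has no partner on the left.
            right_only.append(right[j])
            j += 1
        else:
            # Step 4: left item is "lesser", so it has no partner on the right.
            left_only.append(left[i])
            i += 1
    # Step 5: anything left over on either side is unmatched.
    left_only.extend(left[i:])
    right_only.extend(right[j:])
    return left_only, right_only


# Hypothetical sample data: four matching fields plus a payload.
Row = namedtuple("Row", "param1 param2 param3 param4 value")
file_rows = [Row(1, 2, 3, 4, "a"), Row(1, 2, 3, 5, "b"), Row(2, 0, 0, 0, "c")]
db_rows = [Row(2, 0, 0, 0, "x"), Row(1, 2, 3, 4, "y"), Row(9, 9, 9, 9, "z")]

matches = []
left_only, right_only = list_compare(
    file_rows,
    db_rows,
    key=attrgetter("param1", "param2", "param3", "param4"),
    on_match=lambda f, d: matches.append((f.value, d.value)),
)
print(matches)  # [('a', 'y'), ('c', 'x')]
```

<p>Using a tuple as the key makes the four criteria compare lexicographically, which is what both the sort and the merge need.  If a key can repeat on one side, as it did when the broken join tripled the database rows, the extra items simply end up in the leftovers.  That case would need its own handling.</p>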
<p>This is a very simple and very useful algorithm for reconciling two sets of data, and I&#8217;ve found all sorts of uses for it.  Unlike a double-nested loop, which runs in <a href="http://en.wikipedia.org/wiki/Big_O_notation">quadratic time</a>, the comparison pass runs in linear time and never walks either list more than once.  The two sorts add O(n log n), which is still far cheaper than the nested loop.  I adapted this algorithm to the existing code, and suddenly processing the input files, which had previously taken at least a minute each, takes between 2 and 6 seconds per file.  Now loading the data takes about the same amount of time as performing the calculations, instead of an order of magnitude longer.</p>
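<p>A back-of-the-envelope comparison makes the gap concrete.  Take n = m = 3000 as a stand-in for &#8220;a few thousand elements each&#8221; (a hypothetical figure, not a measurement), and count comparisons.  The nested loop is taken at its full n &#215; m, and the sorts are estimated at n log<sub>2</sub> n each:</p>

```latex
\underbrace{n \cdot m}_{\text{nested loops}} = 3000 \times 3000 = 9 \times 10^{6}
\qquad \text{vs.} \qquad
\underbrace{n \log_2 n + m \log_2 m}_{\text{two sorts}} + \underbrace{(n + m - 1)}_{\text{merge pass}}
\approx 2\,(3000 \times 11.55) + 5999 \approx 7.5 \times 10^{4}
```

<p>That is roughly two orders of magnitude fewer comparisons.  Actual timings also depend on what each comparison and each database call costs.</p>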
<p>Lessons learned:</p>
<ul>
<li>Profilers, especially non-invasive ones, are invaluable for finding out what&#8217;s going on in your app.  I&#8217;d probably have noticed that double-nested loop soon enough, but I would never have found the stored procedure issue without SQL Server Profiler to point it out.</li>
<li>Pulling things out of loops—especially other loops!—is a great way to increase performance.</li>
<li>Reducing algorithmic time complexity is by far the best optimization for large data sets.</li>
<li>Linear, single-threaded techniques are still relevant.  A lot of people are talking these days about parallel programming and how the meaning of optimization has changed in today&#8217;s world.  They&#8217;re right, to a certain extent, but as hard as I try I can&#8217;t think of any way to parallelize this check that would make it faster than a simple list comparison.  The only thing I know of with the potential to be faster than this is a hash table lookup, which could be parallelized, but it won&#8217;t work particularly well when you need to look up your values based on more than one index value.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://tech.turbu-rpg.com/42/real-world-optimization/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
