How to break the D2010 compiler

I really loved when Delphi 2009 came out, how it fixed so many ugly problems in the Delphi IDE.  The stability issues and memory leaks that plagued D2006 and D2007 were greatly reduced.  And it just got better in D2010.

The tradeoff, though, seems to have been compiler stability.  Trying to do anything with Generics in D2009 before Update 3 came out was a nightmare, and even after, (and even in D2010,) there were still plenty of dark corners where you can end up with an Internal Compiler Error or linker error on something that, syntactically speaking at least, is perfectly cromulent Object Pascal.

I think I ran into the darkest corner of all this morning, or at least the darkest one that’s been found yet.  I found something that not only won’t compile, but will either destabilize or completely crash the IDE if you try to compile it.

I was trying to build a specialized tree class.  Trees are pretty familiar to anyone who’s taken a Data Structures class:  Each TreeNode contains a Value and pointers to two or more other TreeNodes.  There’s an excellent implementation of the standard binary tree concept in DeHL, for example.  This works really well if you’re trying to build a standard binary tree, but when you want a hierarchical tree, (like the way the VCL’s TTreeView organizes information,) the standard binary tree concept doesn’t work so well.

I came across a situation while building the TURBU editor where I had a list of objects with an ID property and a Parent property, where both are integers, and I needed to come up with a list of all objects whose Parent value was equal to the ID value of a given object.  In other words, all immediate children of a certain node.

Obtaining all descendants of a binary tree node is trivial if you understand recursion.  But that’s not what I needed; I needed all immediate children, but not their children.  So a different kind of tree is needed, one composed of TLists of nodes instead of binary subtrees of nodes.  So that’s what I wrote.  Each node contains a value, a pointer to the list it’s in, and a pointer to the list containing its children, and each list is a list of nodes, plus a pointer to its parent node.  It ended up looking like this:

[code lang="Delphi"]
unit hierarchy_tree;

interface
uses
   SysUtils, generics.collections;

type
   THierarchyTreeList = class;

   THierarchyTreeNode = class
   private type

      TEnumerator = class(TEnumerator)
      private
         FTopNode: THierarchyTreeNode;
         FIndex: integer;
         FNext: TEnumerator;
         FTopRead: boolean;
      protected
         function DoGetCurrent: T; override;
         function DoMoveNext: Boolean; override;
      public
         constructor Create(value: THierarchyTreeNode);
         function GetCurrent: T;
         function MoveNext: boolean;
         property Current: T read GetCurrent;
      end;

      TNodeList = THierarchyTreeList>;

   private
      FData: T;
      FRight: TNodeList;
      FList: TNodeList;
      function GetParent: THierarchyTreeNode;
      function GetLevel: integer;
   public
      constructor Create(value: T);
      destructor Destroy; override;
      procedure Add(node: THierarchyTreeNode); overload;
      procedure Add(value: T); overload;
      function GetEnumerator: TEnumerator;

      property List: TNodeList read FList;
      property Right: TNodeList read FRight;
      property Parent: THierarchyTreeNode read GetParent;
      property Level: integer read GetLevel;
      property Data: T read FData write FData;
   end;

   THierarchyTreeList = class(TObjectList>)
   private
      FParent: THierarchyTreeNode;
      FLevel: integer;
      procedure SetParent(Value: THierarchyTreeNode);
   public
      constructor Create(parent: THierarchyTreeNode);
      function Add(value: THierarchyTreeNode): integer;
      property Parent: THierarchyTreeNode read FParent write SetParent;
      property Level: integer read FLevel;
   end;

implementation

{ No implementation provided.  Compiler will break before reaching this
point anyway. }

end.[/code]

You can try to compile this if you want, but make sure to save any work before you try.  Turns out this will break the compiler.  The compiler will try to process this, get stuck for about 30 seconds, all the while sucking up memory like there’s no tomorrow, until it hits 1.5 GB according to Task Manager, and then break.  It may crash, or it may simply become unstable.  If you’re really lucky, you’ll get an Out Of Memory error so at least you have some idea what’s going on.

After sharing this problem with a few other coders, I’ve got some idea what’s going on.  The node uses the list in its definition, and the uses the node as its generic parameter.  Apparently the compiler is trying to resolve one of these generics immediately, instead of waiting for instantiaion when a value of T can be provided, and getting stuck in an infinite loop, even though the definition is not infinitely recursive.

I’ve submitted a bug report about this to QC . Thanks to Moritz Beutel and Stephen Kamradt for the insights as to what was going on!

11 Comments

  1. Jim McKeeth says:

    I disagree. In my opinion the Delphi 2010 compiler is more stable than the Delphi 2009 compiler, which is saying a lot. Even in your write up you say that 2010 fixed a lot of the more complex Generic issues that were in 2009 (which worked pretty good in the original 2009 release).

    One really really critical obscure bug does not an unstable compiler make. Sure, it seems like that should work, and it doesn’t. But the fact that no one else has submitted a QC or blogged about it would indicate that it is fairly obscure.

    Good find though. Making it a syntax error should be an easy fix, supporting it might be a little tougher. Will be interested to see how they handle it. I hope the support it. Would be nice.

  2. Bryden says:

    I’ve got a sneaking suspicion that defining the TNodeList type may in fact be allowing the infinite recursion. I can’t quite put my finger on why I suspect that, although that may force a resolve of the generics immediately.

  3. Mason Wheeler says:

    Jim: I should have phrased that more clearly. The Generics issues have made the more recent compilers less stable than D2007’s. 2010 is definitely an improvement over 2009, though!

    Bryden: Yeah, that’s about how I had it figured too. The question is, why? According to Moritz Beutel, this concept works just fine with C++ templates and C# generics, and the concept itself isn’t inherently recursive, so it shouldn’t throw the compiler into an infinite loop. I think it just doesn’t quite realize that it doesn’t have a type for T yet, so it keeps going around in circles looking for it.

  4. Lex Li says:

    When Embarcadero announced generics support in Delphi, I was surprised that nobody designs a test suite for that to check whether at that moment Delphi supports all or most features available in C++/Java/C#. I believe the dev team have such a suite to test out the compiler and hope some day they can show that publicly. It is not a shame that some test cases fail, but it can be improved in future versions.

    Like Nick Hodges once reported, the compiler is under heavy redesign and reimplementation to achieve C++Builder/Delphi parity and multi-platform support. The opportunity is that such a new compiler architecture should make C++ 0x and Delphi generics support complete.

  5. Jolyon Smith says:

    @Lexi:

    “I believe the dev team have such a suite to test out the compiler”

    Maybe so, but if they do one of the following MUST pertain:

    1) They knew of the bug but chose to release anyway

    2) Their test suite doesn’t provide full coverage of all supported syntax – and if it doesn’t cover this case what other cases doesn’t it cover?

    3) Their test suite provides 100% coverage but wasn’t part of the QA process subsequent to this syntax change/extension but prior to release

  6. Hello Mason,

    you say “even though the definition is not infinitely recursive” – I’d give that statement another thought.

    THierarchyTreeNode has a list of descendent nodes, FList. This is supposed to be a list of THierarchyTreeNode objects. You declare FList with the type THierarchyTreeList<THierarchyTreeNode>. But THierarchyTreeList inherits from TObjectList<THierarchyTreeNode> – which effectively gives you a list of nodes of type THierarchyTreeNode<THierarchyTreeNode>, which is both recursive and probably not what you’re longing for.

    You should be able to get it to work as expected if you define TNodeList = THierarchyTreeList. This will give you a list of descendent nodes of same type, just as you described.

  7. Your comment engine ate my ‘<‘ and ‘>’ signs 🙂
    I guess you can figure out what I meant anyway. Just append the missing <T> to THierarchyTreeNode where appropriate.

  8. Warren says:

    It is interesting that this code causes the compiler’s VM working size to exceed 1.5 gb on my system, almost immediately. Imagine how much fun it would be if this bug still existed in a 64-bit compiler that doesn’t have a 32 bit memory cap. 🙂

    I guess my reluctance to use templated types should be explained the same way that a guy who is scared of arrays would explain himself. It’s not one array that scares me, he would say, it’s an array of array of arrays of arrays. And so on. And a list of trees of lists of all templated types. Well, lots of brackets is one problem there.

    I changed the code to make a forward declaration of TEnumerator that is not necessarily a templated type, just so TEnumerator (without ) resolves:

    TEnumerator = class;

    THierarchyTreeNode = class
    // private type
    // TEnumerator = class(TEnumerator)

    No more crash. So that private type appears to be one item on the path of the compiler’s endless recursion or looping.

    W

  9. Barry Kelly says:

    As Moritz says, this bug is a failure of the compiler to diagnose infinite recursion in generic type instantiation. The code is invalid, but the compiler tries to instantiate anyway, and gets caught up in the recursion.

    The problem type is TNodeList = THierarchyTreeList<THierarchyTreeNode<T>> – it should be TNodeList = THierarchyTreeList<T>.

    The ancestor of THierarchyTreeList`1 calls the type constructor of TObjectList`1, but not after first calling the type constructor THierarchyTreeNode`1. But the argument to THierarchyTreeList`1 from TNodeList already included the call to THierarchyTreeNode`1. The instantiation of THierarchyTreeNode`1 will trigger another, different THierarchyTreeList`1, and so on and so forth.

  10. Mason Wheeler says:

    Moritz: Wow, good catch! I fixed that and now it compiles. Now I feel a bit embarrassed… 🙁

    Barry: I see. Any way to fix this without having to solve the Halting Problem? Also, I’m a bit surprised that this infinite recursion dies from Out Of Memory (heap exhaustion) and not Stack Overflow. Isn’t that what usually happens to infinite recursion?

  11. Depends how you recurse. You can easily do that in a loop:

    while ((currentType = getSubType(currentType)) != 0)
    allocateSymbolTableEntry (currentType);

    As I said, a similar construct compiles fine when using C# or C++. I suspect it works in C# because generics are instantiated at runtime by the CLR, whereas the Delphi compiler has to resolve every instantiation during compile time. In C++ it works because the compiler doesn’t need more than a forward declaration for a certain type in order to use pointers to that type, so it doesn’t attempt to actually instantiate the template.

    It should be possible to fix without the Halting problem getting in the way – I guess you would just check whether an instantiation of TFoo<SomethingElse<T, …>, …> was triggered by TFoo<T, …> – but given the work and the compile-time overhead involved, it’s probably not worth the effort.