How to simplify memory management the right way

As I pointed out yesterday, with FastMM available, memory management is so much of a solved problem that it’s a non-problem.  So dropping a performance-killing pseudo-GC “solution” on us is patronizing and insulting in the extreme, not to mention a massive waste of effort that could have been spent actually improving the language and/or standard libraries.  But I did notice one very interesting thing in its implementation: the introduction of the [Weak] attribute, along with a few other related attributes, is something fundamentally new in the Delphi language.

Attributes were introduced to Delphi as part of the extended RTTI package in D2010.  They’re essentially little tags you can put on a class (or various other places) that provide metadata about it, which can be retrieved via the RTTI system.  But [Weak] is something more: it’s an attribute that has a specific meaning to the compiler itself.  It’s not just a metadata attribute, but a semantic attribute.  And that concept of semantic attributes, if employed in a slightly different way, could be used to make Delphi code a lot prettier without having to slow everything down, break tons of existing code, and tick off the vast majority of Delphi developers who do not want garbage collection in the language.

Consider the following code:

procedure myProc;
var
   myObj: TMyObject;
begin
   myObj := TMyObject.Create;
   try
      myObj.DoWhatever;
   finally
      myObj.Free;
   end;
end;

You’re probably familiar with this pattern.  You’ve probably written code like this a million times.  It’s The Right Way to write it: you clean up after yourself once you’re done.

It’s also kind of bulky.  The body of the procedure is six lines long: one for setup, one for doing what we actually care about, and four for cleanup.  The ARC version would be a lot shorter:

procedure myProc;
var
   myObj: TMyObject;
begin
   myObj := TMyObject.Create;
   myObj.DoWhatever;
end;

But while that may look like less code, there’s actually a lot more going on.  That try block is still there; it’s just been added implicitly by the compiler.  And instead of calling Free directly, there’s a ref count release call (which slows down everything because it has to use an atomic decrement) before the destructor.  And the stupid part is, none of it is necessary.  Even the most trivial static analysis could show that this reference to myObj is never going to be passed to anything else, so there’s no need to keep a count of it, at all.  A smart compiler should be able to automatically insert a direct call to Free without having to keep a reference count here.
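
To make that concrete, here’s a rough hand-written equivalent of what the ARC compiler does behind the scenes (a sketch of the idea, not literal compiler output):

procedure myProc;
var
   myObj: TMyObject;
begin
   myObj := TMyObject.Create;   // reference count starts at 1
   try
      myObj.DoWhatever;
   finally
      myObj := nil;             // implicit release: an atomic decrement, with the
                                // destructor running only when the count hits zero
   end;
end;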

The problem is, the only static analysis that can really do any good is “the most trivial static analysis,” which is only useful in the most trivial of cases.  For example, if the TMyObject constructor passed Self to some routine somewhere, that routine may or may not have stored a reference to it in another object, and then all bets are off.  Have you heard of the Halting Problem, how it’s impossible to come up with a general-case analysis routine that will determine whether a given program will halt?  Well, it gets worse.  There’s this thing called Rice’s theorem, which states that it’s impossible to come up with a general-purpose analysis routine that can answer any non-trivial question about a program’s behavior.  Which means that there’s no way to write a static analyzer that can detect which variables can be freed safely in a situation like this.
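
To make the escape problem concrete, here’s a sketch of a constructor that may or may not hand out a reference to the new object (GNotifier and RegisterListener are names I’ve made up purely for illustration); from the call site, a static analyzer simply can’t tell:

constructor TMyObject.Create;
begin
   inherited Create;
   // Buried somewhere in here, Self can escape:
   GNotifier.RegisterListener(Self);   // something else may now hold a reference,
                                       // so freeing at the end of the caller is no longer safe
end;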

Rice’s theorem says that it’s impossible to create a smart compiler that can take care of something like that, for the same basic reasons that it’s impossible to solve the Halting Problem.  Simply put, you can’t automate “smart,” which is why attempts to do so invariably end up with stupid solutions that require dumbing things way down.  (See: every implementation of garbage collection ever devised, including ARC.)

This is where semantic attributes come in.  With a bit of compiler support, it would be simple to accomplish the same thing, like this:

procedure myProc;
var
   [Local] myObj: TMyObject;
begin
   myObj := TMyObject.Create;
   myObj.DoWhatever;
end;

The semantics here should be intuitively obvious.  A [Local] attribute would declare to the compiler that this variable’s lifetime is scoped to the routine in which it’s declared.  Instead of trying to automate “smart”, which is impossible, we keep the smart decisions where they belong–with the intelligent person behind the keyboard–and instead get the compiler to automate something computers are very good at automating: mindless drudgery.  The [Local] attribute would transform the procedure above into this:

procedure myProc;
var
   myObj: TMyObject;
begin
   myObj := nil;
   try
      myObj := TMyObject.Create;
      myObj.DoWhatever;
   finally
      myObj.Free;
   end;
end;

(If static analysis could prove that myObj can never be accessed before it gets assigned, as is the case here, it could even optimize out the initialization.)

There would be two other use cases for the [Local] attribute.  Putting it inside the field list of a class or record would mean that when the class or record gets cleaned up, the object in question is destroyed along with it.  (And I’ve got more to say about object and record cleanup a bit later on.  There’s something that could be done to improve those greatly.)
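
To illustrate the field case, here’s roughly what I have in mind (TOwner and FHelper are made-up names, and the comment describes the intended expansion, not anything the compiler does today):

type
   TOwner = class
   private
      [Local] FHelper: TMyObject;   // freed automatically when the TOwner instance is
                                    // destroyed, as if its destructor called FHelper.Free
   end;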

For functions, it would be a little bit different.  You’ve used factory functions before, right?  You call a function that produces a new object, instead of using an existing one that you pass in.  Well, [Local] would be useful for them, for safety:

function MyFactory(value, value2: integer): [Local] TMyObject;
begin
   result := FactoryRegistryClass[value].Create;
   result.Foo(value2);
end;

The problem with a function like this is, what happens if the call to Foo raises an exception?  You don’t want a try/finally block here, because if everything goes right you’re supposed to return the value, but you do have to wrap it in a “try/except free; raise; end;” block to get it right.  Marking the result as [Local] would have the compiler take care of that for you.
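
Written out by hand, the pattern that a [Local] result would generate for you looks something like this (a sketch of the intent):

function MyFactory(value, value2: integer): TMyObject;
begin
   result := FactoryRegistryClass[value].Create;
   try
      result.Foo(value2);
   except
      result.Free;   // do not leak the half-constructed result if Foo raises
      raise;
   end;
end;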

Tagging something that’s not an object or a typed pointer with [Local] should result in a compiler warning.

This scheme would result in much cleaner code; it would keep the responsibility for making the smart decisions where it belongs–with the programmer, not the language designer–it wouldn’t introduce any overhead beyond what already exists in correct code; and it wouldn’t break any already-existing code.  Unlike ARC, where there’s really no upside, with this system there’s no downside.

…except for one thing.  Moving object fields to an object’s or record’s automatic destruction list would mean moving them to FinalizeRecord, and for the life of me, I cannot understand why FinalizeRecord exists at all.

For those of you who’ve never ended up in there while debugging with Use Debug DCUs turned on, or who just plain don’t know what it is, FinalizeRecord (and its partner in crime, FinalizeArray) is a routine that uses RTTI to automatically clean up managed-type variables (strings, interfaces, variants) that are owned by something else.  When you destroy an object that contains a string, you don’t have to clean up the string in the destructor, because FinalizeRecord takes care of it for you.  The compiler creates a table that says where all of the managed references are and what types they are; FinalizeRecord and FinalizeArray walk this table, and for each item they look up the correct cleanup routine and call it with a pointer to the right location in memory.  It’s been around since the beginning of Delphi, and it’s been a big mistake since the beginning.  It’s absolutely the wrong way to clean up this sort of situation.
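
For a concrete picture, take a made-up class like this:

type
   TPerson = class
   private
      FName: string;        // managed: FinalizeRecord releases it for you
      FIntf: IInterface;    // managed: FinalizeRecord releases it for you
      FAge: integer;        // not managed: never appears in the finalization table
   end;

When a TPerson is destroyed, FinalizeRecord walks the compiler-built table, finds FName and FIntf, looks up the string and interface cleanup routines, and calls each one with the address of the field; FAge isn’t in the table at all.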

I can sort of imagine how it came about.  They’d just invented the Delphi RTTI system to take care of form streaming, which was quite the impressive accomplishment, and remains one today.  (Remember that Java came out at the same time as Delphi, and still doesn’t have a decent form designer!)  And someone on another part of the team said, “well, look at this.  We’ve got managed types, that the compiler cleans up when they go out of scope without the programmer needing to write any code for it, and that works fine inside of procedures… but how are we going to handle it for objects that contain managed types as fields?”

And then someone, tragically, said “well, look at this cool RTTI thing we invented!  We can write up one cleanup routine that inspects the object’s structure and cleans up managed types that way.”  And so that’s what they did.

The only problem with that theory is that it requires the original Delphi team to have been smart enough to create Delphi, and yet stupid enough to make a mistake like this.  It requires Anders Hejlsberg, who created Delphi and famously stated “static when possible, dynamic when necessary,” to have gotten his own principle completely wrong in this case.  (But, since FinalizeRecord has been in Delphi since the beginning, any explanation of why it happened must necessarily involve this in some way.  It really does baffle me.)

The thing is, RTTI is Run-Time Type Information.  It’s for looking up things that are not and cannot be known at compile time, such as when (de)serializing a form file.  But this is not such a situation.  Remember that the compiler has to build the finalization table in the first place.  This means it already knows exactly what has to be finalized, and seeing as Delphi is not a dynamically-typed language, that knowledge isn’t going to become invalid at any point after compilation.  So why doesn’t it simply do what compilers do best, and generate code to do the finalization?
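
What I mean by “generate code” is, very roughly, a per-type routine with the cleanup calls baked in.  Something like this sketch for the TPerson example above (this is what I’d like the compiler to emit, not anything it does today):

procedure FinalizeTPerson(Instance: TPerson);
begin
   Instance.FName := '';   // direct string release, no table lookup
   Instance.FIntf := nil;  // direct interface release
   // FAge needs nothing, and the compiler knew that at compile time
end;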

Even though they’re written in hand-optimized assembly, FinalizeRecord and FinalizeArray are painfully slow.  They’ll frequently turn up in profiling reports if you’re working with code that churns through a lot of objects or records.  And adding objects to them will just make things slower.  (Which the ARC system has done, BTW.  It just keeps inflicting layer upon layer of extra slowness upon us!)  What I’d like to see is those routines deleted from System.pas, and the VMT slot that points to the finalization table replaced with a pointer to an automatically-generated cleanup routine.  No more unnecessary RTTI to take care of a task that could easily be covered under “static when possible”.

And before anyone objects, saying that debugging would suffer under all of the new auto-generated code… have you ever tried to debug into FinalizeArray?  It’s a huge mess of ASM.  What you would see in the debugger if you traced into the generated cleanup code would be a much smaller and more straightforward mess of ASM, so if anything, debugging would be improved!

If Embarcadero would do these two things–throw out ARC in favor of a [Local] attribute (don’t worry about compatibility if it’s only for one version, guys; Delphi developers are very forgiving and we’d let you take a mulligan on an obvious mistake; it wouldn’t be the first time we’ve done it) and throw out FinalizeArray and FinalizeRecord in favor of faster, compiler-generated static code–we’d see some huge improvements in both readability and performance for the Delphi language.  That would give us a NextGen I could actually rant about positively!

25 Comments

  1. Eric says:

    Executable size must be the reason, it’s the same kind of trade off as dynamics vs virtual. Except of course this could have been improved later on when RAM became less of an issue.

    Also, while static compilation of those would be nice, even nicer would be if you could provide your own implementation for initialization, finalization and assignment (assignment is RTTI based as well). This would allow writing efficient & compact Variant, TValue or UTF8String ersatz… Or heap objects.

    As for [local], color me not convinced, it’s adding bulk to the code and has non-obvious side-effects in a non-trivial routine, which would eventually discourage its use from regular code. Also in a trivial case, the reference counting isn’t going to be the bottleneck, memory allocation & release will be for a heap-based object.

    So a more general and safe approach would be stack-based objects, which we “have” with the old object keyword, and could have with souped-up records provided you add custom initialization/finalization routines. The difference is that the semantics would naturally guarantee that references don’t leak out, with either a promotion or a clone (custom assignment).

    • A. Bouchez says:

      I fully agree with your comments, Eric.

      In particular, this [local] is just weird and not-pascalish at all.

    • Mason Wheeler says:

      Executable size must be the reason, it’s the same kind of trade off as dynamics vs virtual.

      I’m not convinced that’s true. The use case for dynamics is spelled out explicitly in the documentation: it’s for when a class is likely to have many descendants, but only a few of them will override the method in question. In other words, it’s to avoid a size increase that’s completely out of proportion to the code change. But cleanup tables vs. pre-generated cleanup methods are both exactly in proportion to the code that generates them, and I doubt that the ASM code to call a series of procedures would be appreciably larger than the table describing which procedures to call and the data the procedures need to be called on.

      As for [local], color me not convinced, it’s adding bulk to the code and has non-obvious side-effects in a non-trivial routine, which would eventually discourage its use from regular code.

      I must admit, I’m completely confused by that statement, since the entire point of writing that was to show that it could *remove* bulk from your code and produce obvious, predictable, deterministic effects. Would you mind explaining what you mean?

      So a more general and safe approach would be stack-based objects, which we “have” with the old object keyword, and could have with souped-up records provided you add custom initialization/finalization routines.

      There’s a reason why no serious OO language before or since C++ has had that, and why the Borland folks abandoned that early approach so quickly: because it’s a horrible idea. The very essence of object-oriented programming is Liskov substitution (inheritance and polymorphism as language-level features), and having objects-as-value-types wrecks polymorphism.

  2. Joseph says:

    Mason, you just have to accept it: ARC works. Languages use it. No one complains. Life goes on. Like calling a loop for a certain number of iterations “FOR” rather than “SQUIDOO”, de facto standards exist. Like the GOTO Wars, it’s over. Even Niklaus Wirth added garbage collection to what he describes as his crowning achievement, Oberon (and is disappointed people built on his “flawed” earlier languages instead). You don’t have to “add [Local]” or anything else. Instead, you *remain focused on writing the code necessary to solve your problem*. It makes the code safer as well – even if you’re a genius, that doesn’t guarantee the person who wrote the library/component you’re using is.

    Former Delphi programmer Mattias Fagerlund put it thus:

    >Garbage collection is strange, we had it with the VIC64, but it wasn’t any good so people gave up on it. And all of a
    >sudden, now it’s good again? I can collect my own garbage, I can manage my own memory de-allocations, no need for the
    >system to do that for me… except, I can’t trust other tool developers to do it correctly. If I use someone else’s
    >software in my solution, a memory leak in their software might crash my software! Also, time spent making sure my
    >structures are de-allocated is time not spent doing something more productive.
    >Don’t worry about it, it just works. Spend your time doing other wonderful stuff instead!

    Embrace the future, trust Marco, write code, have fun.

    • A. Bouchez says:

      It is not a matter of trust.
      I trust my family and friends, not any other company I have business with.

      I may trust Marco as a human being (he sounds like a fine person).
      But I won’t trust his company roadmap and expectations.

      When something is wrong TO ME (I never meant wrong for everyone) and will cost me a lot of money (e.g. compilation speed, removal of AnsiString, ARC model), I won’t follow their path.

      I do not need a guru, nor fun.

      I need a good tool to work with.

  3. Nicholas Ring says:

    I wonder how soon we will start seeing things like this:

    procedure myProc;
    begin
       TMyObject.Create.DoWhatever;
    end;

    I guess that ARC will handle this ok but it makes my stomach turn (or was it something I ate) 🙂

  4. A. Bouchez says:

    Nice shot.

    But WHY is there always this un-proven axiom: “The shorter code to type, the better”.
    I do not understand this wrong advice. Unless you want to obfuscate the code.

    Here are some thoughts from my little experiment:

    1. Delphi interface types already allow you to write such short code

    First of all, it’s worth saying that you can use interfaces to write perfectly safe and short code, without ARC, just with the current version of Delphi.
    See for instance how http://www.bilsen.com/gdiplus/index.shtml is implemented.

    Implicit try..finally free blocks are already written by the compiler.

    2. Code length does not make it less readable

    We all know that we read much more code than we write.
    So first priority is to let the code be as readable as possible.
    IMHO try…finally free does not pollute the readability of the code. On the contrary, to my own taste, it shows the exact scope of a class instance’s life time.

    3. Managing life time objects can help you write better code

    When you manage the object life time, you are not tempted to re-create the same objects again and again. You factorize and optimize your code. And it results… in faster code.

    I’ve seen plenty of Java and C# programmers who do not have a clue about memory allocation and what happens internally: they write code that works, without asking themselves what will actually be executed.
    Having to manage memory life time by hand does not bother me.
    On the contrary, it made me a better programmer.
    And it helped me write faster code, and I know exactly what it is doing when it executes.

    Of course, it is not automatic.
    You can just write try…finally blocks and weep, without searching for code refactoring.
    I’ve seen Delphi code written like that, especially from programmers with a GC model background.

    4. Source code size has nothing to do with execution speed.

    The more the compiler magic or the runtime will execute under the hood, the slower it will be.
    You are not able to tune the execution process.

    It is not for nothing that the most speed-effective Java projects just use POJO and statically allocated instances, to by-pass the GC.

    • Mason Wheeler says:

      First of all, it’s worth saying that you can use interfaces to write perfectly safe and short code, without ARC, just with the current version of Delphi.

      Interfaces use ARC; they just didn’t call it that until they imposed the same model interfaces use on all the rest of the objects.

      2. Code length does not make it less readable

      We all know that we read much more code than we write.
      So first priority is to let the code be as readable as possible.
      IMHO try…finally free does not pollute the readability of the code. On the contrary, to my own taste, it shows the exact scope of a class instance’s life time.

      This is actually something that’s been studied quite a bit, and what they find, consistently, is that methods that are longer than one screen of text, even if it’s just a little bit longer, are significantly harder to read and understand than methods that can be read in their entirety without having to scroll. What this means is that if you can eliminate 4 lines of code by sticking them into an attribute, you now have an extra 4 lines worth of space to work with before things get harder to read.

      3. Managing life time objects can help you write better code

      Not sure why you phrased this point as if you’re disagreeing with me, because you’re completely right about that. I think you must have misunderstood what I wrote somehow. The whole point of the [Local] proposal is that the coder still has full control over lifetime management, and the responsibility that goes with it. It’s still explicit, and it’s still obvious (once you understand the semantics, like any other language element.) It just becomes less noisy.

      4. Source code size has nothing to do with execution speed.

      The more the compiler magic or the runtime will execute under the hood, the slower it will be.
      You are not able to tune the execution process.

      That’s only true when you are not able to tune the execution process. Implicit in the idea that “it doesn’t break any already-existing code” is that the old model would still be there as-is, and can still be used. And can you think of a way that having the compiler write the same try/finally block you would have written for you would make the execution speed slower than it would have been if you wrote it yourself? Because I really can’t, so I’m sorry, but I don’t understand what you’re claiming here. 🙁

  5. Larry Hengen says:

    Thanks for such a great article on ARC. Unfortunately, I don’t know how much influence we have on the decision makers at EMBT. Criticism seems to fall upon deaf ears. Implementing ARC in Delphi does not have my support. I don’t like attempts at automating software development, removing decision making from the developer. It never works well… otherwise at this point we would have machines that can program themselves.

    Introducing interfaces in an existing code base can cause nothing but grief, even with FastMM’s assistance in isolating issues. It’s a question of implicit vs. explicit. Not knowing exactly what is being done, when and why, can make debugging very difficult. I would far rather have the choice of using refcounted interfaces for lifetime management, or traditional class based manual memory management.

    Giving your customers a choice is never bad for business, except when it’s the choice of “my way or the highway”.

  6. NCook says:

    You are missing something important. ARC (or any other garbage collection) is not just about trying to automate what we can do already. I agree that is a waste of time. It’s about trying to enable things that we cannot do without it. For example: Operator overloading on objects.

    • Jacques Vignoles says:

      ARC is not a mandatory move for operator overloading on objects. It can be achieved in exactly the same way with regular TObject, and for objects that are not implicitly declared, or overridden, the call to the Free method can be automatically generated. This is even more efficient without ARC, as in this case there is strictly no need for reference counting.

      As a matter of fact, automatic cleanup of factory-created, non-reference-counted, undeclared objects already exists with enumerators. In a For … In … loop an instance of your enumerator will be created without any local or global declaration, and once the loop is over, its Free method will be called.

      I suspect operator overloading for classes was not provided because Embarcadero don’t trust their users enough to clean up all objects that would have been factory-created.

  7. Maël Hörz says:

    I think Rice’s theorem does not help the argument here. I had to think for a while to work out what was actually missing: you first would need a counter-example that shows where the compiler would fail to free an object properly.
    Rice’s theorem just states that when you have a positive and a negative example (one where the compiler succeeds in freeing an object and one where it would fail to do so) you cannot write a program that decides which will occur in general, for all cases.

    But who says the compiler cannot do it properly? At the very least it could add free calls at the very end of the program.
    So what you need to do is define your requirements for properly handling free much more precisely.

    In a single threaded program the program execution is linear, and you know the exact sequence of operation at compile time. Therefore adding frees at the appropriate places should work analogously as you can insert reference counting, too.
    That means reference counting is only necessary if you have multi-threading, because the order of execution is unpredictable if you do not introduce some synchronization logic.

    This is what is done in reference counting anyway, so all you would do is duplicate it, and I’ll gladly let the compiler do the error-prone work for me.

    Now what is left is optimization of special cases. I see no reason why control-flow analysis would not be able to optimize those special cases like you could optimize them manually, too.

    What really seems missing is that sometimes code will not be subject to multithreading issues but the compiler does not know. For such cases annotations might be useful.

    • Maël Hörz says:

      Rereading it, it seemed a bit unclear, so I rephrased the end of the first paragraph:

      Assuming a question is about behavior of programs (in our case “Does the compiler always free objects properly?”),
      Rice’s theorem just states that when you have a positive and a negative example answer to your question (one where the compiler succeeds to free an object and one where it would fail to do so) you cannot write a program that decides when the answer to your question is yes or no in general (does the compiler always free objects properly?).

      In other words, you need examples where the compiler properly frees objects and examples where it would fail before you can apply Rice’s theorem.

      • Mason Wheeler says:

        That’s easy. But first we need a meaningful definition of a memory leak. Saying “everything gets freed at the end of the program” is meaningless, because 1) under that definition, memory leaks don’t exist at all, and 2) memory leaks are known to be a problem because it’s possible for leaks to cause an otherwise-correct program to crash due to address space exhaustion.

        When you look at the way freeing objects works, it’s easy to derive a meaningful definition: a memory leak is any condition in which the program continues to hold memory allocated after the point where there is no longer any need for the data it was allocated to hold. It can be caused by forgetting to free a reference, but also by freeing a reference too late, such as putting something in a container that doesn’t get freed until the end of the program, even though it’s not needed for very long after it gets instantiated. (This is a common cause of memory leaks in garbage-collected programs.)

        With that definition in mind:

        Case where the compiler could automatically detect exactly when to free the object through static analysis: The example given above (parameterless constructor, object is not passed to any external routines), given the following constraints: the parameterless constructor is the do-nothing constructor from TObject.Create, and the internal method(s) called on the object do not pass any references to the object to any external routines, and the object does not contain any fields which are records or objects.

        Case where the compiler could *not* automatically detect exactly when to free the object through static analysis: The constructor takes a parameter: a method pointer that takes a parameter that the object being constructed could be passed to. As that method pointer could have been chosen dynamically at runtime, by user behavior or even through RTTI, the compiler has no way of knowing whether or not to expect that something else will be holding a reference to the object by the time it comes out of the constructor.

        • Maël Hörz says:

          Your counter-example shows a code-smell, though. If the compiler cannot know because of RTTI, the programmer cannot safely assume it either. And if he knows for sure because of the design of say plugins that use the RTTI, then it should be clearly stated in some documentation or comment. This is definitely non-obvious to people who haven’t written the code.

          Having this information as annotation is useful for the compiler and while reading the code.

          You are right of course that given this ideal definition and your example Rice’s theorem applies.
          However the assumption is that humans are capable of freeing objects when they should be freed.
          Since their only advantage comes from additional knowledge, the compiler can be as good as the human (Church-Turing thesis), too.
          And if Rice’s theorem applies to algorithms it applies to humans, too.

          In other words, it’s very likely humans do not free objects at the earliest possible moment, so a more relaxed definition of the right time to free an object would make sense.

          That’s why I say Rice’s theorem is not very useful: if you assume a human is capable of solving the problem, you just did not state the problem in a fair way for a computer.

          In short: annotations would be more useful, since they give not only the compiler but also the code maintainer the implicit knowledge that was used for proper handling memory.

    • Maël Hörz says:

      I should further add that in the single-threaded case, you would essentially reference count at compile time: since you know the order of execution, you could “statically execute” the program and add the Free call only where the reference count would become 0.

      • Mason Wheeler says:

        That only works as long as the order of execution of the entire program can be deterministically determined at compile time. As soon as you put a user interface on there that allows a user to choose between more than one valid option, that concept goes out the window.

        • Maël Hörz says:

          That’s right, but if it is truly single-threaded you can compute all possible execution paths statically at compile time because there are no unknown events from the outside that can block or otherwise reorder execution.
          You wait for an event, then execute a procedure based on the event’s parameters (and program state); wait for an event, execute a procedure…
          This can be modeled as if you were running the program with different inputs. All possible execution paths = possible sequences. That’s also how verification works.

          In theory you can do that for the multi-threaded case, too, although it gets more complex.

          My main point is, static analysis is possible. Depending on language design it is more or less difficult, functional languages being significantly easier to handle in the multi-threaded case because they use immutable data.

          And if static analysis is hard for a computer, it is not easier for a human. The only thing that helps a human is additional knowledge. I prefer to make this knowledge explicit through judicious use of annotations, and let the compiler figure out the rest.

  8. Nico says:

    Hello Mason, nice story. I believe the “local” variables idea is great, maybe because it also occurred to me some years ago 🙂

    I wrote some messages in an FPC list about it. The objections were pretty much the same: leaking references would be chaos. I don’t think so. Have the local variables (I would use “local” in the place of “var”, instead of adding an annotation) strictly limited to local scope and forbid any outgoing assignment. Compiler error.

    That doesn’t need much compiler analysis. Just a more strict control of where parameters/function results/global references can be assigned from/to.

    Anyway, it would be a lot of work for someone. Don’t hold your breath 🙂
