There’s been a lot of talk recently about immutable strings in the iOS compiler, which, as Marco pointed out, is not actually implemented (yet) but is just something that’s under consideration. And it appears that he’s uncomfortable with the removal of AnsiStrings. That’s a good thing, IMO. I think they should be put back, particularly for UTF-8 strings.
Before I get into the main focus of my post, I have to disagree with one of the points he made. He said that “Optimizing for English language is not a primary goal, as Delphi is used a lot around the world.” That’s a mistake in modern coding, because even if your application’s user interface isn’t in English, it’s very likely that it will rely on one or more of the following features:
- HTML/HTTP, either for display purposes or for communicating with websites and web services
- Some scripting language
Each of these features makes heavy use of English keywords, and of numbers and symbols in the ASCII range. Optimizing for them with a native UTF-8 string type will help most programs no matter what language the users speak or read.
But that’s not really what I wanted to talk about. There are more serious problems in the NextGen compiler. Does anyone remember Visual Fred? When Microsoft came out with Visual Basic.NET, it was incompatible with classic VB code in many ways, which made it essentially impossible to port existing VB projects forward. There are good arguments out there to support the claim that VB.NET is a better, more powerful language than classic VB, but that doesn’t change the fact that it’s a very different, incompatible language, and that calling it by the same name as classic VB is deceptive. So people decided they should call it by a different name, and the term “Visual Fred” was coined.
I’m worried that Delphi is heading in the same direction. Let’s have a look at a couple of the other things that were added in the NextGen compiler. (These are things that actually were added, not just things that are being discussed as possibilities.) They’re huge messes, and for some reason, people don’t seem to be talking about them.
STRING INDEXING CONFUSION: NextGen strings are 0-based, rather than 1-based. Now, that would be bad enough if that were all. (Note: I’m not claiming that 1-based strings are inherently better. There are good arguments for both styles. But changing the way it works when you have literally billions of existing lines of code that use the old system is just insane.) But it’s worse than that. “To make the transition easier,” string indexing is controlled by a compiler directive. A string might be zero-based or one-based depending on the directive’s setting at the point where the routine that uses it is compiled. So now we don’t have a language with 0-based strings, and we don’t have a language with 1-based strings; we have both in the same codebase!
There’s an old joke in programming circles: “There are only two difficult problems in computer science: cache invalidation, naming things, and off-by-one errors.” Expect to see off-by-one errors skyrocket under the new string indexing regime.
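To make the hazard concrete, here’s a minimal sketch using the actual {$ZEROBASEDSTRINGS} directive (the function names are my own):

```delphi
{$ZEROBASEDSTRINGS ON}
function FirstCharZeroBased(const S: string): Char;
begin
  Result := S[0];  // legal here: index 0 is the first character
end;

{$ZEROBASEDSTRINGS OFF}
function FirstCharOneBased(const S: string): Char;
begin
  Result := S[1];  // legal here: index 1 is the first character
end;
```

Both functions return the same character, but copy an indexing loop from a unit compiled one way into a unit compiled the other way and you have an off-by-one bug the compiler will happily accept.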
AUTOMATED REFERENCE COUNTING: Under the new compiler, TObject is now a managed type, like String or IInterface. Very much like IInterface, in fact: every TObject now has a reference count (moved up from TInterfacedObject) and special add/release methods, and the compiler has a special-case hack where calling TObject.Free simply sets the object reference to nil instead.
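Here’s a hedged sketch of what that hack means in practice (nothing here is official sample code):

```delphi
var
  Obj: TObject;
begin
  Obj := TObject.Create;  // reference count = 1
  Obj.Free;               // under ARC this compiles to Obj := nil;
                          // the count drops to 0, so the destructor
                          // happens to run here...
  // ...but only because nothing else held a reference. If another
  // strong reference exists, Free releases ours and the object lives on.
end;
```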
Remember the additional memory pressure created a few releases back, when TMonitor was introduced and every TObject got an extra 4 bytes of overhead whether it needed it or not? Well, here we go again. Even more bizarre is the fact that it was added to the iOS compiler. Adding that overhead to the 64-bit compiler might not be such a bad thing. But when your target platform is a mobile device with very limited memory? You’ve gotta be kidding me!
But it gets worse. Again following the IInterface model, every assignment to a TObject reference (a variable, a parameter, a member of an object or record, or an array element) incurs compiler-managed AddRef/Release calls. Again, on devices with limited hardware resources. And since mobile devices are going multicore now, that means we get all the additional overhead of synchronization on an atomic value change FOR EVERY SINGLE ONE.
…well, almost. You can get around that for parameters by marking them const, like with strings and interfaces. This makes sense with strings, but a lot less sense with objects. It’s not uncommon to pass a string somewhere and then change the local copy in some way, but you just don’t do that with objects. At least I don’t. When was the last time that you wrote a routine that takes an object as input, then assigns something new to the parameter variable, without the parameter being specifically marked as var? So what this means is that in order to avoid crippling overhead on your function calls, you have to go back and spam up a huge percentage of them with const declarations.
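For illustration, here’s a sketch of the difference (the procedure names are hypothetical):

```delphi
uses
  System.Classes;

procedure LogName(Obj: TComponent);        // implicit AddRef on entry,
begin                                      // matching Release on exit
  Writeln(Obj.Name);
end;

procedure LogNameFast(const Obj: TComponent);  // const: no reference-count
begin                                          // traffic around the call
  Writeln(Obj.Name);
end;
```

Both do exactly the same work; only the second one skips the atomic bookkeeping.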
But it gets worse still. There’s a second problem with reference counting: cycles. If A holds a reference to B, and B holds a reference to A, neither one of them will fall to 0 and nothing will get cleaned up. To deal with this problem, you can mark variables and object members with a [Weak] attribute, which marks it as a weak reference. That’s great, right? No cycles, no reference counting overhead… right? Not quite. Look closely at the bottom of the documentation, and you’ll see:
Note: When an instance has its memory released, all active [weak] references are set to nil. Just as a strong (normal) reference, a [weak] variable can only be nil or reference a valid instance. This is the main reason why a weak reference should be assigned to a strong reference before being tested for nil and dereferenced. Assigning to a strong reference does not allow the instance to be released prematurely.
So all of those weak references have to get tracked, and then cleaned up when you’re done with them. That’s actually far more expensive than the atomic updates, and is likely to be a massive performance killer if you use weak references. And where do you think that all gets stored? If you said “on the object itself, with even more overhead per instance, like the new reference count,” you’d be… wrong, actually. It’s not *quite* as bad as that. But all that bookkeeping has to go somewhere, so instead, there’s a new global multimap structure declared deep in the bowels of system.pas that manages weak references. Look up TInstHashMap for the gory details. It’s not pretty, and because it’s a global, access has to be protected by a TMonitor. Yay for additional atomic synchronization and memory management overhead!
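Here’s the pattern the documentation prescribes, sketched with hypothetical names:

```delphi
type
  TChild = class
  private
    [Weak] FParent: TObject;  // weak: doesn’t keep the parent alive
  public
    procedure UseParent;
  end;

procedure TChild.UseParent;
var
  Strong: TObject;
begin
  Strong := FParent;  // promote to a strong reference first
  if Strong <> nil then
    Writeln(Strong.ClassName);  // safe: Strong pins the instance
end;  // Strong is released when it goes out of scope
```

Test FParent for nil directly instead, and the instance could in principle be released between the test and the dereference.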
And did you notice how it says that with both a strong or a weak reference, an object variable can only be nil or reference a valid object? OK, everyone who’s ever called something like myStringList.AddObject(myString, TObject(myIntegerValue)); please raise your hand. Yeah, lots of hands going up out there. It’s a pretty common idiom in Delphi code. Well congratulations, your code no longer works! You see, this isn’t really about reference counting. If it was, no one would care about cleaning up those weak references. No, what it’s really about is memory safety. Delphi programmers have soundly rejected the concept of putting garbage collection in the language for years, consistently, every time the question comes up. But now someone at Embarcadero is trying to sneak it in through the back door.
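For anyone who hasn’t seen it, this is the idiom in question (a classic-Delphi sketch, with made-up names and values):

```delphi
uses
  System.Classes;

var
  List: TStringList;
begin
  List := TStringList.Create;
  try
    List.AddObject('answer', TObject(42));  // smuggle an integer through
                                            // the object reference
    Writeln(Integer(List.Objects[0]));      // and cast it back out
  finally
    List.Free;
  end;
end;
```

Under ARC, that TObject(42) is a “reference” the compiler will try to manage, pointing at address 42. Boom.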
And this isn’t just me ranting. Look at that same documentation page. Just below the section on weak references, we see that there’s a new [Unsafe] attribute for special-case references that really need to not be reference-counted for some reason.
It should be only used outside the System unit in very rare situations. It is considered dangerous and its use is not recommended as no code associated with reference counting is generated.
In other words, “we realize that ‘unsafe’ code is absolutely necessary to actually create a working programming language, as evidenced by the fact that we have to employ it in the System unit in order to make basic functionality work, but it’s only for us to use; you really should stick to high-level code only.”
What a bunch of patronizing crap! If we wanted guaranteed memory safety and enforced garbage collection, we would be using Oxygene (formerly Embarcadero Prism). Not being burdened with all that baggage is one of Delphi’s strongest selling points, and someone at Embarcadero doesn’t get that and thinks it should all just be thrown out the window. Well, they’re wrong. We already have a .NET Delphi implementation, and one for Java, in Oxygene. There’s no good reason not to keep native Delphi native and close to the metal.
And it keeps getting worse! You see, there’s no [Weak] generic constraint, which means it’s impossible to create a TObjectList<[Weak]T>. And you know where generic lists are popping up? Everywhere! And since ARC only works when object references are stored in strongly typed variables, the old pointer-based TList is out. So let’s look at one specific scenario: GUI design.
In previous versions of Delphi, TComponent had a member declared as FComponents: TList. It held the component’s child components. Now that’s a generic list, containing strong references to the child components. What that means (among other things) is that you can no longer delete a UI element by calling Free on it. (Free is “magic” now, remember? All it does is set your reference to nil. But with the object’s owner holding a strong reference to it, that won’t bring the count down to 0.)
Whatever shall we do? Never fear! It’s even more incompatible changes to the rescue! There’s a new method on TObject, called DisposeOf, that forces the destructor to execute immediately in exactly the same way that Free used to do. So instead of using what we already have that we know works, someone decided to change the semantics of Free, and then introduce something else that does exactly what Free is supposed to do, except that you still have to wait for the garbage collector to actually clean up the object. And they called it Dispose. Hmm… where have I heard that before?
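A sketch of the new dance (assuming a FireMonkey form and nothing else holding extra references):

```delphi
uses
  FMX.Forms, FMX.StdCtrls;

procedure RemoveButtons(Form: TForm);
var
  Btn: TButton;
begin
  Btn := TButton.Create(Form);  // Form’s component list holds a strong reference
  Btn.Free;                     // under ARC: just Btn := nil; the owner’s
                                // reference keeps the object fully alive

  Btn := TButton.Create(Form);
  Btn.DisposeOf;                // runs the destructor immediately, the way
                                // Free used to, though the memory itself
                                // lingers until the last reference lets go
end;
```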
The entire ARC model is one big mess from beginning to end. It will make Delphi code slow–it will essentially make it impossible to write fast Delphi code, in fact–and it will break tons of existing, working code. I literally see zero benefit to any of this. Managed code advocates’ FUD notwithstanding, manual memory management is not a difficult issue. Heck, in Delphi it’s a solved problem with FastMM and FullDebugMode. Even if someone were to make the (somewhat dubious) claim that this “modernizing” of the language will attract new developers, the saying “a bird in the hand is worth two in the bush” exists for a reason.
There’s one very interesting thing that was introduced among all of this ARC crap, that could be used to produce real improvements in memory management and help people write cleaner code going forward, without breaking any existing code, had the compiler team followed a slightly different path. But this article is too long already, so I’ll explain how to do this sort of stuff right tomorrow.