NextGen: Delphi’s “Visual Fred” moment?

There’s been a lot of talk recently about immutable strings in the iOS compiler, which, as Marco pointed out, is not actually implemented (yet) but is just something that’s under consideration.  And it appears that he’s uncomfortable with the removal of AnsiStrings.  That’s a good thing, IMO.  I think they should be put back, particularly for UTF-8 strings.

I have to disagree with one of the points he made, before I get into the main focus of my post.  He said that “Optimizing for English language is not a primary goal, as Delphi is used a lot around the world.”  That’s a mistake in modern coding, because even if your application’s user interface isn’t in English, it’s very likely that it will rely on one or more of the following features:

  • HTML/HTTP, either for display purposes or for communicating with websites and web services
  • XML
  • SQL
  • Some scripting language

Each of these features makes heavy use of English keywords, plus numbers and symbols in the ASCII range.  Optimizing for them with a native UTF-8 string type will help most programs no matter what language the users speak or read.

But that’s not really what I wanted to talk about.  There are more serious problems in the NextGen compiler.  Does anyone remember Visual Fred?  When Microsoft came out with Visual Basic.NET, it was incompatible with classic VB code in many ways, which made it essentially impossible to port existing VB projects forward.  There are good arguments out there to support the claim that VB.NET is a better, more powerful language than classic VB, but that doesn’t change the fact that it’s a very different, incompatible language, and that calling it by the same name as classic VB is deceptive.  So people decided they should call it by a different name, and the term “Visual Fred” was coined.

I’m worried that Delphi is heading in the same direction.  Let’s have a look at a couple of the other things that were added in the NextGen compiler.  (These are things that actually were added, not just things that are being discussed as possibilities.)  They’re huge messes, and for some reason, people don’t seem to be talking about them.

STRING INDEXING CONFUSION: NextGen strings are 0-based rather than 1-based.  That would be bad enough if that were all.  (Note: I’m not claiming that 1-based strings are inherently better.  There are good arguments for both styles.  But changing the way it works when you have literally billions of existing lines of code that use the old system is just insane.)  But it’s worse than that.  “To make the transition easier,” string indexing is controlled by a compiler directive.  So a string might be zero-based or it might be one-based, depending on how the directive is set where the routine that touches the string is compiled.  We don’t have a language with 0-based strings, and we don’t have a language with 1-based strings; we have both in the same codebase!
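
To make the hazard concrete, here’s a minimal sketch built around the $ZEROBASEDSTRINGS directive (the switch that controls this behavior); the two functions are hypothetical, but they show how nearly identical code means different things depending on where it’s compiled:

    {$ZEROBASEDSTRINGS OFF}
    // Classic behavior: the first character is S[1].
    function FirstCharClassic(const S: string): Char;
    begin
      Result := S[1];
    end;

    {$ZEROBASEDSTRINGS ON}
    // NextGen default: the first character is S[0], and S[1] is now the SECOND one.
    function FirstCharNextGen(const S: string): Char;
    begin
      Result := S[0];
    end;

Nothing at the call site tells you which convention a given routine was compiled under.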

There’s an old joke in programming circles: “There are only two difficult problems in computer science: cache invalidation, naming things, and off-by-one errors.”  Expect to see off-by-one errors skyrocket under the new string indexing regime.

AUTOMATED REFERENCE COUNTING: Under the new compiler, TObject is now a managed type, like String or IInterface.  Very much like IInterface, in fact: every TObject now carries a reference count (it has been moved up from TInterfacedObject) and special add/release methods, and the compiler has a special-case hack in it where calling TObject.Free merely sets the object reference to nil instead of destroying the object.
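
Here’s a minimal sketch of what that means in practice; this is my reading of the behavior described above, not code lifted from the RTL:

    procedure ArcFreeDemo;
    var
      Obj, Other: TObject;
    begin
      Obj := TObject.Create;   // reference count: 1
      Other := Obj;            // compiler-inserted AddRef: count is now 2
      Obj.Free;                // under ARC this only sets Obj := nil; count stays at 1
      // The destructor has NOT run; Other still keeps the instance alive.
      Other := nil;            // compiler-inserted Release: count drops to 0,
                               // and only now does the destructor actually execute
    end;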

Remember the additional memory pressure created a few releases back, when TMonitor was introduced and every TObject got an extra 4 bytes of overhead whether it needed it or not?  Well, here we go again.  Even more bizarre is the fact that it was added to the iOS compiler.  Adding that overhead to the 64-bit compiler might not be such a bad thing.  But when your target platform is a mobile device with very limited memory?  You’ve gotta be kidding me!

But it gets worse.  Again following the IInterface model, every assignment of a TObject reference (to a variable, a parameter, a field of an object or record, or an array element) incurs compiler-managed AddRef/Release calls.  Again, on devices with limited hardware resources.  And since mobile devices are going multicore now, that means we also get the overhead of synchronizing an atomic value change FOR EVERY SINGLE ONE.

…well, almost.  You can get around that for parameters by marking them const, like with strings and interfaces.  This makes sense with strings, but a lot less sense with objects.  It’s not uncommon to pass a string somewhere and then change the local copy in some way, but you just don’t do that with objects.  At least I don’t.  When was the last time that you wrote a routine that takes an object as input, then assigns something new to the parameter variable, without the parameter being specifically marked as var?  So what this means is that in order to avoid crippling overhead on your function calls, you have to go back and spam up a huge percentage of them with const declarations.
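
For the record, this is roughly what it looks like, following the same convention as const string and interface parameters; a sketch of the idea, not a measured claim about the generated code:

    // Without const, the compiler inserts an AddRef on entry and a Release on
    // exit (wrapped in an implicit try/finally) on every single call.
    procedure Process(Obj: TObject);
    begin
      // ... use Obj ...
    end;

    // With const, the parameter borrows the caller's reference and no
    // reference-count traffic is generated for it.
    procedure ProcessFast(const Obj: TObject);
    begin
      // ... use Obj ...
    end;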

But it gets worse still.  There’s a second problem with reference counting: cycles.  If A holds a reference to B, and B holds a reference to A, neither count will ever fall to 0 and nothing will get cleaned up.  To deal with this problem, you can mark variables and object fields with the [Weak] attribute, which turns them into weak references.  That’s great, right?  No cycles, no reference counting overhead… right?  Not quite.  Look closely at the bottom of the documentation, and you’ll see:

Note: When an instance has its memory released, all active [weak] references are set to nil. Just as a strong (normal) reference, a [weak] variable can only be nil or reference a valid instance. This is the main reason why a weak reference should be assigned to a strong reference before being tested for nil and dereferenced. Assigning to a strong reference does not allow the instance to be released prematurely.

So all of those weak references have to get tracked, and then cleaned up when you’re done with them.  That’s actually far more expensive than the atomic updates, and is likely to be a massive performance killer if you use weak references.  And where do you think that all gets stored?  If you said “on the object itself, with even more overhead per instance, like the new reference count,” you’d be… wrong, actually.  It’s not *quite* as bad as that.  But all that bookkeeping has to go somewhere, so instead, there’s a new global multimap structure declared deep in the bowels of system.pas that manages weak references.  Look up TInstHashMap for the gory details.  It’s not pretty, and because it’s a global, access has to be protected by a TMonitor.  Yay for additional atomic synchronization and memory management overhead!
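
Here’s a hedged sketch of the pattern the documentation is describing, using hypothetical TParent/TChild classes: the [Weak] field breaks the cycle, but you’re expected to copy it into a strong reference before you test and use it:

    type
      TChild = class;

      TParent = class
      public
        Child: TChild;            // strong reference: keeps the child alive
      end;

      TChild = class
      public
        [Weak] Parent: TParent;   // weak reference: breaks the cycle, and is
                                  // automatically set to nil when the parent dies
      end;

    procedure DetachFromParent(AChild: TChild);
    var
      P: TParent;                 // strong local reference
    begin
      P := AChild.Parent;         // copy the weak reference into a strong one first...
      if P <> nil then            // ...then test it; P keeps the parent alive while we use it
        P.Child := nil;
    end;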

And did you notice how it says that, with both a strong and a weak reference, an object variable can only be nil or reference a valid object?  OK, everyone who’s ever called something like myStringList.AddObject(myString, TObject(myIntegerValue)); please raise your hand.  Yeah, lots of hands going up out there.  It’s a pretty common idiom in Delphi code.  Well, congratulations: your code no longer works!  You see, this isn’t really about reference counting.  If it were, no one would care about cleaning up those weak references.  No, what it’s really about is memory safety.  Delphi programmers have soundly rejected the idea of putting garbage collection in the language for years, consistently, every time the question comes up.  But now someone at Embarcadero is trying to sneak it in through the back door.
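
For anyone who hasn’t run into it, this is the idiom in question; the routine and values here are hypothetical, but the shape of the code is what you’ll find all over existing projects:

    uses
      System.Classes;

    procedure TagString(List: TStringList);
    var
      Value: Integer;
    begin
      // Classic (non-ARC) idiom: smuggle an integer through the TObject slot...
      List.AddObject('answer', TObject(42));

      // ...and cast it back out later.
      Value := Integer(List.Objects[0]);

      // Under the rule quoted above, an object reference may only be nil or a
      // valid instance, so the compiler is now entitled to AddRef/Release a
      // "reference" that is really just the number 42.
    end;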

And this isn’t just me ranting.  Look at that same documentation page.  Just below the section on weak references, we see that there’s a new [Unsafe] attribute for special-case references that really need to not be reference-counted for some reason.

It should be only used outside the System unit in very rare situations. It is considered dangerous and its use is not recommended as no code associated with reference counting is generated.

In other words, “we realize that ‘unsafe’ code is absolutely necessary to actually create a working programming language, as evidenced by the fact that we have to employ it in the System unit in order to make basic functionality work, but it’s only for us to use; you really should stick to high-level code only.”

What a bunch of patronizing crap!  If we wanted guaranteed memory safety and enforced garbage collection, we would be using Prism Oxygene.  Not being burdened with all that baggage is one of Delphi’s strongest selling points, and someone at Embarcadero doesn’t get that, and thinks it should all just be thrown out the window.  Well, they’re wrong.  We already have a .NET Delphi implementation, and one for Java, in Oxygene.  There’s no good reason not to keep native Delphi native and close to the metal.
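
For completeness, here’s a sketch of what the [Unsafe] escape hatch looks like when applied to a hypothetical field; this is my illustration, not code from the System unit:

    type
      TNode = class
      public
        [Unsafe] Owner: TNode;  // no AddRef/Release is generated for this field,
                                // and unlike [Weak] it is NOT auto-niled: it can
                                // dangle, exactly like a raw pointer under the
                                // classic compiler
        Next: TNode;            // ordinary strong reference
      end;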

And it keeps getting worse!  You see, there’s no [Weak] generic constraint, which means it’s impossible to create a TObjectList<[Weak]T>.  And you know where generic lists are popping up?  Everywhere!  Since ARC only tracks object references held in strongly typed variables, the old untyped, pointer-based TList is out.  So let’s look at one specific scenario: GUI design.

In previous versions of Delphi, TComponent had a member declared as FComponents: TList.  It held the component’s child components.  Under NextGen that’s a generic list, containing strong references to the child components.  What that means (among other things) is that you can no longer delete a UI element by calling Free on it.  (Free is “magic” now, remember?  All it does is set your reference to nil.  But with the object’s owner holding a strong reference to it, that won’t bring the count down to 0.)

Whatever shall we do?  Never fear!  It’s even more incompatible changes to the rescue!  There’s a new method on TObject, called DisposeOf, that forces the destructor to execute immediately, in exactly the same way that Free used to do.  So instead of using what we already have that we know works, someone decided to change the semantics of Free, and then introduce something else that does exactly what Free is supposed to do, except that you still have to wait for the last reference to go away before the memory is actually reclaimed.  And they called it Dispose.  Hmm… where have I heard that before?
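
Here’s a minimal sketch of the difference, using a hypothetical owned component; again, this is my reading of the described semantics, not RTL code:

    uses
      System.Classes;

    procedure RemoveChild(Child: TComponent);
    begin
      // Under ARC, Free merely nils the variable; the owner's internal child
      // list still holds a strong reference, so the destructor never runs:
      //
      //   Child.Free;

      // DisposeOf runs the destructor immediately, the way Free used to (and
      // the destructor detaches the child from its owner).  The memory itself
      // is only reclaimed once the last remaining reference goes away.
      Child.DisposeOf;
    end;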

The entire ARC model is one big mess from beginning to end.  It will make Delphi code slow (in fact, it will essentially make it impossible to write fast Delphi code), and it will break tons of existing, working code.  I see literally zero benefit to any of this.  Managed code advocates’ FUD notwithstanding, manual memory management is not a difficult issue.  Heck, in Delphi it’s a solved problem with FastMM and FullDebugMode.  Even if someone were to make the (somewhat dubious) claim that this “modernizing” of the language will attract new developers, the saying “a bird in the hand is worth two in the bush” exists for a reason.

There’s one very interesting thing that was introduced among all of this ARC crap, that could be used to produce real improvements in memory management and help people write cleaner code going forward, without breaking any existing code, had the compiler team followed a slightly different path.  But this article is too long already, so I’ll explain how to do this sort of stuff right tomorrow.

20 Comments

  1. Darian Miller says:

    Well said!

    Zero-based strings mixed in with one-based strings is a disaster begging to happen, and it’s simply a wrong-headed thing to do. Whatever the RTL is using, that’s what should be required. If they are breaking compatibility, then break it already and don’t muck it up for old guys, new guys, and everyone in between. Just think of the vast number of new NextGen developers they want to buy in: they’ll see zero-based strings but then see all the references to 1-based… They won’t have Ansi/UTF8 either, but the docs are littered with references to them, causing all sorts of confusion. There should be two copies of Delphi, since they are now incompatible at their core. Keep the venerable Delphi XE4 Win32/Win64/MacOSX and introduce Delphi XEFred for iOS/Android. XEFred 1.0 should have its own help files and its own RTL, and the two should diverge. OR, kill XEFred and make XE5 for Win32/Win64/MacOSX/iOS/Android with compatible code!

  2. Gustavo says:

    Objective-C has used only ARC for a while now, so there is no point in thinking it’s a mess or that it will slow things down that much. The point about strings is valid, though.

    • Mason Wheeler says:

      I can’t really say anything about that without in-depth technical knowledge of how Apple’s implementation of ARC works. What I do know, though, is that this implementation is a mess that will slow things down.

      • “What I do know, though, is that this implementation is a mess that will slow things down.”

        You don’t ‘know’ that. If Delphi for iOS apps are relatively slow and memory heavy, it will be because of FMX, not ARC.

      • Joseph says:

        It’s not just Objective-C. Cocoa, PHP, Python, Perl – lots of things use reference counting. I know Python uses reference counting with cycle detection. If Python is interpreted and doesn’t “feel the heat” of RC, I don’t know why a compiled language like Delphi would.

        • Mason Wheeler says:

          OK, what Python are you using? Because the one I’m familiar with has a well-deserved infamy for being one of the slowest languages out there. It gets its “cool” reputation for being easy to write, not for being in any way performant.

  3. GrandmasterB says:

    Excellent article, Mason. Sounds like the return of Inprise. What a shame.

  4. Edwin Yip says:

    I couldn’t agree more: either changing to 0-based strings or adding a compiler switch to allow both 0-based and 1-based strings will lead to great confusion and code mess!

    EMB, please don’t make that mistake!

  5. Andreas says:

    Well, your points are true (obviously), but why on earth would any professional actually care about what exactly they’re planning?

    The product is so obviously designed by clowns, maintained by interns and marketed by trolls.
    Why do people still care? Isn’t it time to move on and create something that’s actually useful elsewhere (e.g. some 3rd-party libraries for Free Pascal)?…

  6. Ciprian Khlud says:

    I think we come from different worlds, but reference-counted objects strike me as a great idea. Zero/one-based strings are awful, since people work with different models, and I agree that zero-based is weird. But reference counting is another ballpark altogether:
    – ref-counted code is more likely to have leaks (yuck)
    – it has a small performance implication (though many increments/decrements are either local or optimized away)
    + but it is far less likely to have use-after-free bugs
    + you write less code

    As far as the Delphi IDE is concerned, I think using ref-counting all over the place will make Delphi more stable (even if it will be leakier when cycles appear). Will Delphi applications benefit from reference counting? If you count fewer crashes as an advantage, that is a huge one. As for coding, I think the main point is that Delphi was never that fast if you use the class library, so code that uses TList or TStringList, etc. picking up a small overhead will not count for much. And a lot of Delphi code is legacy, and as with any legacy code, it is easier to have zero-reference objects freed for you automatically, with no crashing.

  7. xenon says:

    “Remember the additional memory pressure created a few releases back, when TMonitor was introduced and every TObject got an extra 4 bytes of overhead whether it needed it or not? Well, here we go again. Even more bizarre is the fact that it was added to the iOS compiler. Adding that overhead to the 64-bit compiler might not be such a bad thing. But when your target platform is a mobile device with very limited memory? You’ve gotta be kidding me!”

    Most mobile devices have at least 1 GB of memory!
    4 more bytes mean nothing.
    More important are the speed of coding and a smaller number of potential memory leaks.
    I cannot wait for ARC in Delphi for Windows.

    • Mason Wheeler says:

      4 bytes is meaningless, sure. 4 bytes * 1 million objects? Not so much. How about 10 million? It just keeps getting worse.

      • Michael Baytalsky says:

        Well, that’s still only 4 or 10 MB, which is like a couple of big images on your web page :).
        We are talking about GBs of RAM.

        Reference counting is another issue and I cannot agree more on the fact that the way it is done is wrong in this case.
        It can be done the right way, as well as zero-based strings, though.

  8. himselfv says:

    I can live with zero-based strings; they had to happen. Everything else in the language is zero-based. And having a zero-based-strings switch is not a bad idea either: it lets you have code which just works, no matter which version compiles it. Overall, a good solution.

    But reference-counted objects… One of the greatest problems of Delphi has long been that you can’t have data types that are both fast and powerful. TObjects are slow and clumsy as hell and unsuitable for implementing fast types such as string. Records can’t do inheritance or private/protected stuff (and are slow anyway). Interfaces require you to type the same stuff twice.

    What Delphi really needs is a fast, bare-bones object type with full language support but without any of the “smarter” stuff, akin to the one C++ has. Instead, Delphi is moving in the direction of a more specialized type which leaves fewer options as to what you can do with it.

  9. Bunny says:

    The point is to provide a path for both: the existing applications and the new ones to be built. For me those are two different ways to go: FMX and the new compilers, versus VCL + the traditional compilers + fixing, maybe as a Delphi classic version in maintenance mode for 49 USD a year. The second is a Delphi for the young generation. If EMB think they can force people to migrate existing applications and migrate and migrate… no chance. Who is going to pay for the transition?

    I don’t care about 1- or 0-based strings in general, and I don’t care whether strings are immutable or not, whatever that means in practice. I have an idea what it means for the developer, but not for the compiler and the runtime environment. If EMB want to convince people, then an ‘ARC’ for the desktop does make sense. What will you need?
    a) a runtime that allows easy threading
    b) do away with free
    c) a command line + web + GUI + mobile way to go
    d) Maybe, if the demand arises, a C++ Builder for ‘mobile’ devices and ‘embedded’ programming.

    iOS support is nice, but it’s the right thing on the wrong OS imo. It’s not wrong to have the opportunity, don’t get me wrong.

    You still have Java, originally designed with embedded devices in mind, and believe me, there are tons of Java developers out there who develop on ‘embedded’ devices using the Java language because it’s a lot more comfortable than Assembler.

    I think in the end comfort counts. What EMB has to do is put the cards on the table. In practice you have to delay any new projects if you don’t know what is on the way.

    • Bunny says:

      b) do away with Free
      … and Destroy of course. This is so negative … destructive.

      I think there are people who want to have the opportunity to have access to assembler and develop at a low level, and there are others who still prefer the easy way. Maybe a little competition between these two ways to go helps…

    • Joseph says:

      Excellent comment, especially “comfort counts”. All this talk about numbers and milliseconds… what’s forgotten is that developers are humans who want to *enjoy* the language they’re working with. Guido van Rossum routinely declines ideas that would speed up performance a bit but add complexity, hurt readability, or make the language harder to learn.

      I just don’t know if they have the resources to make a new product and maintain an old one.

  10. Joseph says:

    Mr. Wheeler… all… ok, most… of the problems stem from two… ok, three… issues.

    1. Programmers who haven’t updated their code (or in some cases their coding style!) since the days of Turbo Pascal for DOS.

    These people want their fossil code to run unchanged, but they want to be able to use Delphi XE5 on it as well… what for I don’t know since they have no intention of actually *porting* the thing. I guess they just want to compile it again and/or keep adding to their code with ancient language constructs. We wouldn’t need some of these backward compatibility compromises if these people would just move on. But then Delphi’s one competitive advantage is users who don’t like to move on. That’s the conundrum.

    2. Programmers who either haven’t programmed in anything outside of Delphi or who otherwise bought into the marketing propaganda for Delphi.

    The reality is dinky interpreted languages run Reddit and YouTube and sifted through the data generated in the hunt for the Higgs Boson. Yet Delphi users are obsessed with “performance” to the point where they believe *compiled* languages aren’t even fast enough! Unless you’re working on something for which state-of-the-art hardware is inadequate, such as real-time photorealistic 3D animation, any compiled language is fast enough, since non-compiled languages are too.

    Somewhere along the way, when the rest of the world gained all sorts of modern features that Delphi didn’t, we started acting like Gollum and telling Precious that we still had “performance” and “inline assembler” and that our beloved “Begin/End” statements made the code faster or something. Or more likely someone at Borland told us that everything else was slower, and we believed it, and this made us fearful of any feature in any other language “polluting” our language and making it “slower”. That way we didn’t complain as other products passed us by and our cool factor wore off.

    It’s time to let it go. This, that, or the other thing *isn’t going to make your code slow to a crawl*. This is the kind of stuff that makes people laugh at us. I know because sometimes I show it to other people. 🙂 A Ruby user or Matlab engineer would slit your throat for a quarter of Delphi’s speed. Go tell them that your compiled code is too slow and watch their eyes roll. This is a whole new realm beyond premature optimization. We’re complaining about the performance of software we haven’t written for a compiler that doesn’t exist and starting the bit-fiddling already. 🙁 If we can’t predict bottlenecks in real code and need profiling, isn’t it that much harder in non-existent code? Amdahl’s Law?

    3. Embarcadero is not very forthcoming except for Marco.

    It’s not like David I. or Allen Bauer or anyone really engages with people other than as a marketing move. Marco’s finally begun to talk about some of the underlying reasons for these changes, at least guardedly. Let me put it more bluntly. EMB has somewhere between 15-21 developers for the language, the IDE, two frameworks, etc. Moving to a new compiler is a huge task, and EMB is not a company that can afford to hold back releases until they’re done (and we’ve plenty of evidence of that). The more they have to port, the longer things will take. EMB will ship a release no matter what (Nick is alleged to have said something a bit more blunt right before being let go). Marco’s trying to keep the work … and the bugs… manageable. We’re not helping if we take a “You can’t get rid of THAT!” approach to everything they want to leave behind. He won’t say it, but they can’t port the whole dang 18-year legacy base. That would mean porting over the bugs too. He’s trying to lay the groundwork for a new, lean, mean, MODERN compiler and we have to meet him half way on this. Some of us need to be willing to stop writing text files with “File Of Char” and others need to put down the millisecond stopwatches. Me, I have to give up on the dreams of type inferencing for now. 🙂

    “String Indexing Confusion” – How do you embrace the de facto standard of today and also deal with all the legacy code base users and NOT use a compiler switch? On the positive (?) front, if we’re honest we can admit one can probably count the number of companies with more than a single employee or two who still use Delphi on one hand, so we don’t have to worry about contaminated code bases. Most Delphi users are one-man shops and will be using one or the other.

    ARC – Really simple: everyone’s using the concept. It’s simple, powerful, and it works. Languages that are so much more popular than Delphi it makes me cry use ARC and somehow the world hasn’t ended. That’s it.

    If the world were going to end because of ARC, we’d have noticed it already. If it ground every other computer language to a halt, apparently no one cared, because they’ve continued using it. Just say no to memory leaks and wasted time managing memory. If Delphi users cared half as much about development speed today as they cared about software speed, we’d return to our roots and possibly carve out a profitable niche. And you didn’t address cycle detection, which deals with circular reference issues. When I asked Marco about it, he said there was a function to invoke cycle detection. Not sure if that means it’s not automatic, but…

    > manual memory management is not a difficult issue.

    Neither is churning butter. But we don’t do that anymore either. THIS IS WHY PEOPLE LAUGH AT US. We’re like the Amish of programmers. 🙁 It’s 2013. Why do crap we don’t have to worry about? What most Delphi users don’t understand is that this kind of thinking went away in the rest of the world except for Delphi Island, which is like the island from Lost but with fewer polar bears and more dentists. No one’s trying to save a millisecond or FreeAndNil things because they think they can do it better than a computer (ironic choice of occupation notwithstanding). No one is going to adopt a language in 2013 if they have a choice that has all the antiquated “features” of Delphi. And they’re not. But EMB needs to SELL Delphi. And we’re shrinking. We’re no longer in schools. We’re not the tool of choice of Amazon or Google or Facebook or Reddit or any other enterprise that’s of any importance. Kids don’t know what Delphi is. We’re LOSING. We NEED ARC and zero-based strings and anything else EMB can squeeze in there that people actually want (we old users don’t count; we’re obviously not making them the big bucks). If non-Delphi people don’t turn to a new Delphi for cross-platform (because they’re certainly not using it for single platform) we’re dead. It’ll be a long, slow death, but we’re dead. If we love the language we have to suck it up and say “Sir, Yes Sir” to Marco unless we have a better idea. “Do nothing” isn’t a better idea.

    And that’s my insane, rambling, offensive, callous, mostly unfiltered, anti-intellectual, probably unintentionally aggressive thoughts on this issue. 🙂

    To sum up: Nobody uses Delphi, and the few of us who are left aren’t making EMB rich. The remaining developers, many green and underpaid, are tasked with producing a new compiler from scratch. They’re also tasked with actually selling it to someone who didn’t start with Turbo Pascal. They’re trying the best they can to incorporate modern ideas and minimize the effort, without bringing up the fact that you can be sure EMB will ship whether the work is done or not (like they did with FireMonkey). Like when BASIC lost the GOTO Wars, we have a choice. We can admit that memory management and zero-based strings won and adjust, or we can be stubborn and shout “Never!”. Shutting ourselves off from the rest of the world will hurt us more than it could possibly hurt them. BASIC adopted structured programming and went on to Visual Basic, VBScript, RealBASIC, PureBASIC, Gambas and many fine products. We can do the same and still be Delphi while accepting modern advances in computer science. Niklaus Wirth put garbage collection in his final computer language, Oberon. We need to embrace advances too and adapt, or else we’ll perish. I don’t want to see the good of Delphi die along with the outdated. I support Marco.

  11. Sebastian Jänicke says:

    I agree on many of your points, but:
    “It’s not uncommon to pass a string somewhere and then change the local copy in some way, but you just don’t do that with objects.”
    It may not be uncommon, but it’s bad code. A parameter should never be changed unless it’s a var parameter. Otherwise you always have to look for changes to every parameter before using it a few lines after begin.

    That’s why I make everything const unless it is intended to be a var parameter.

  12. Michael Baytalsky says:

    Very well said. Totally agree on all points.