Little things Delphi gets right

For those who haven’t seen it yet, due to popular demand, the StackOverflow people created a new site called programmers.stackexchange.com, a site for the more subjective questions that StackOverflow isn’t really designed for.  Someone recently set up a poll: What’s your favorite programming language?  You can probably guess what my answer was.

Delphi

It’s the standard, imperative, object-oriented paradigm that most coders are familiar with, but it gets all the little details right that the C family always gets wrong. Plus the more recent versions have started to add support for mixing in functional programming concepts, without running into the ugly abstraction inversions that typically come with functional languages.

I’ve gotten some interesting replies to that.  One of them asked, “to justify the C-bashing (so they don’t clobber you!), and to satisfy my curiosity, what are the little details Delphi/Pascal gets right?”

Well, I’d answer that in a comment, but there isn’t enough space in 500 characters to do it justice, so I figured I’d write up my answer here.  Let’s start with the elephant in the room.

SECURITY!!!

There is no good reason why anyone should ever under any circumstances write a networked program with security requirements–such as an operating system or a Web browser–in C, C++ or Objective-C.  We’ve known that for 22 years now, ever since Robert Tappan Morris released the Internet Worm that used buffer overflow exploits in Unix to crash about 10% of the Internet, causing tens of millions of dollars in damage.  (It was a very small Internet at the time.  Today, the cost would be measured in hundreds of billions, and the real total probably is that high, if not higher, once you add up all the damage caused by buffer overflows and other C-specific vulnerabilities exploited over the last couple of decades.)

This should have been a major wake-up call.  You cannot write programs that require reliable security in a language that was designed with no thought to it! At least not with any degree of consistency.  Just look at how many patches are still being issued today for Windows, Linux, OSX, iOS, and various Internet services and Web browsers, all due to buffer overflows.  And it’s getting worse.  We’ve got a lot of people talking about computerizing the power grid, which makes a lot of sense in theory, but it’s likely to open up a few hundred million new vulnerabilities to terrorists both domestic and foreign. (Who all’s seen Live Free Or Die Hard?  Anyone want to live in that world, except without Bruce Willis and that “I’m a Mac” kid to conveniently step in and save the day?)

It’s the same old issue over and over and over again.  It keeps showing up in C programs because it simply can’t be solved in C without breaking backwards compatibility.  In any sane world, the C language would have been dead by 1989–the Morris Worm having shown it to be utterly unsuitable for its intended purpose: building operating systems–and all our computers would be safer for it.  And it’s not like there are no alternatives.  By the time the worm hit, Apple had been building the most advanced operating system of its day for several years already, in Pascal.  Continuing to write Internet-facing OSes, browsers and apps in C (or C++ or Objective-C) ought to be treated as an act of criminal negligence.

In Delphi, on the other hand, we have a real string type, the best-thought-out string type I’ve seen in any language.  It’s reference-counted and grows and resizes automatically as needed, which frees the coder from dealing with string-size and string-memory hassles.  It’s bounds-checked, and it does not live on the stack, so there’s no way to use a Delphi string buffer overflow for a stack-smashing exploit.  Likewise, for non-string data, Delphi has a real array type which is also bounds-checked.  In fact, we’ve got two real array types, both bounds-checked, and the one whose size is not fixed (the more dangerous kind) also does not live on the stack.  We’ve also got a string-format routine that doesn’t use varargs, and string-output code that doesn’t assume its input is a format string in the first place, which means that Delphi programs are immune to format string exploits.
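To make "bounds-checked" and "resizes automatically" concrete for readers who think in C++, here is a rough sketch of the same two ideas using the standard library's checked accessor and growable string. It's only an analogy, not a picture of how Delphi does it internally, and the helper names `checkedReadThrows` and `repeatChar` are made up for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Returns true if reading index i through the checked accessor at() throws.
// (operator[] with the same out-of-range index would be undefined behaviour.)
bool checkedReadThrows(const std::vector<int>& values, std::size_t i) {
    try {
        (void)values.at(i);   // at() validates the index before reading
        return false;
    } catch (const std::out_of_range&) {
        return true;          // the bad index was caught, not silently read
    }
}

// Appends n characters one at a time; the string resizes itself as needed,
// so there is no fixed-size buffer for the caller to overrun.
std::string repeatChar(char c, std::size_t n) {
    std::string s;
    for (std::size_t k = 0; k < n; ++k) {
        s.push_back(c);
    }
    return s;
}
```

With a three-element vector, `checkedReadThrows(v, 3)` reports the bad index while `checkedReadThrows(v, 2)` does not. The catch is that in C++ the checked path is something you have to opt into on every access.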

In the interest of fairness, of course, I should point out that as with any language that supports pointers, it’s still possible to write unsafe code in Delphi.  But you have to really go out of your way to do it; it’s not the default state of the language, the way it is in C!  (And removing pointers from a language causes more problems than it fixes.  This is why Java and C# both have explicit “unsafe” features: because they’re necessary to actually accomplish important tasks.)

Syntax and semantics

OK, enough ranting about security.  Let’s move on to other things.  A lot of C’s security problems have been fixed by managed languages like C# and Java.  But what they can’t fix is the syntax, at least not without abandoning a great deal of their C roots, which is a very important marketing device.  For example, C has no boolean type.  Some of its descendants do, but they haven’t managed to escape the ramifications of this blunder: because C has no boolean type, anything can be treated as a boolean.

I heard a really horrible pun a while ago:  The cake may be a lie, but Pi is always True.  (Because it’s a nonzero number, and in C, anything can be treated as a boolean.)  When everything is a boolean, including an assignment operation, it’s not safe to write “if x = 5”, no matter how intuitive that looks.  And when everything is a boolean, including a number, if you try to and two expressions, the compiler doesn’t know if you mean a logical or a bitwise and, so you need two versions of all the boolean operators.  And if you get them wrong, it might work, or you might end up with some very hard-to-debug issues.
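Here's a small C++ sketch of both traps, in case they sound too silly to be real. The function names are hypothetical, and the first one contains the bug on purpose.

```cpp
// Intended to test "is x equal to 5?", but '=' assigns instead of comparing.
// The assignment yields 5, which is nonzero, so the branch is always taken.
bool looksLikeEqualsFive(int x) {
    if (x = 5) {   // deliberate bug: should be x == 5
        return true;
    }
    return false;
}

// The comparison that was actually meant.
bool isEqualToFive(int x) { return x == 5; }

// 1 is binary 01 and 2 is binary 10: both count as "true" on their own,
// but they share no bits, so the two kinds of "and" disagree.
bool logicalAnd(int a, int b) { return a && b; }
bool bitwiseAnd(int a, int b) { return a & b; }
```

`looksLikeEqualsFive(3)` comes back true even though 3 is not 5, and `logicalAnd(1, 2)` is true while `bitwiseAnd(1, 2)` is false. Many compilers will warn about the first one if you ask them to, but it is still legal code.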

Java, JavaScript and C# still have double versions of all the operators.  And I know the “if x = 5” bug still exists in JavaScript.  (I’ve heard the compiler doesn’t accept it in C#.  Not sure about Java.)

In Delphi, a boolean is a boolean, and a number (or a string or an object) is not.  This means that we have one and, one or, one xor and one not, and the compiler knows what to do with them by looking at the operands.  And if you try to do something nonsensical like anding a boolean and a number, the compiler throws an error instead of silently accepting it and generating nonsensical code.

And while we’re on the subject of operators, can anyone tell me what * or & do in C?  “Well, it depends on whether–” Oh, I see.  Fundamental syntactic elements whose meanings are context sensitive.  How lovely.  In Delphi, “a * b” means multiplication and nothing else.  And for addressing and dereferencing, we’ve got the @ and ^ symbols, which actually make sense mnemonically.
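A short C++ sketch of what I mean, with a made-up function name. The same two symbols take on several unrelated meanings within a few lines.

```cpp
// Two symbols, several unrelated meanings inside one small function.
int starAndAmpersand(int x, int mask) {
    int *p = &x;            // '*' declares a pointer; '&' takes an address
    int &r = x;             // '&' declares a reference (C++ only)
    int doubled = *p * 2;   // first '*' dereferences, second '*' multiplies
    r = doubled;            // writes through the reference into x
    return x & mask;        // '&' is bitwise and
}
```

Calling `starAndAmpersand(6, 4)` doubles 6 to 12 and then masks it with 4, giving 4. You have to read the surrounding context to know which `*` or `&` you are looking at.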

Then we’ve got object-oriented programming.  C++’s object model is a big mess.  There’s no base object class, which means that there’s no way to pass an object of arbitrary type between one routine and another.  This also means that there’s no standardized way to take an object and get RTTI information about it.  And objects are value types, declared by default on the stack (or inline in the larger object that contains them), and passed around by value by default.  This wreaks havoc on inheritance and polymorphism.

For example, what’s the output of this program?  And if you change the signature of Foo to pass the object by reference, does it alter the output of the program?

[code lang="C++"]
#include <iostream>

class Parent
{
public:
        int a;
        int b;
        int c;

        Parent(int ia, int ib, int ic) {
                a = ia; b = ib; c = ic;
        };

        virtual void doSomething(void) {
                std::cout << "Parent doSomething" << std::endl;
        }
};

class Child : public Parent {
public:
        int d;
        int e;

        Child(int id, int ie) : Parent(1,2,3) {
                d = id; e = ie;
        };
        virtual void doSomething(void) {
                std::cout << "Child doSomething : D = " << d << std::endl;
        }
};

void foo(Parent a) {
        a.doSomething();
}

int main(void)
{
        Child c(4, 5);
        foo(c);
        return 0;
}
[/code]

If you have to stop and reason about it for any length of time, that’s a warning sign.  I asked our local C++ expert at work what this would do, and he thought about it for a few minutes and came to a logical-sounding conclusion about what he thought it should have to do.  Then he wrote up the code above to test it, just to be sure.  (This is a guy who’s been writing C++ professionally since I was in high school, and he’s really good at it.  But even with all that experience, he’s not experienced enough to be confident about what it would do without testing it.)
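For anyone who wants to check their answer, here is a stripped-down sketch of the same situation. The `Base` and `Derived` classes are hypothetical stand-ins for `Parent` and `Child`, and the methods return strings instead of printing so the result is easy to inspect.

```cpp
#include <string>

struct Base {
    virtual ~Base() = default;
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    std::string name() const override { return "Derived"; }
};

// By value: the argument is copied into a brand-new Base, so the Derived
// part is sliced off and the call dispatches to Base::name.
std::string nameByValue(Base b) { return b.name(); }

// By reference: no copy is made, so virtual dispatch sees the real object.
std::string nameByReference(const Base& b) { return b.name(); }
```

Given a `Derived d`, `nameByValue(d)` returns "Base" and `nameByReference(d)` returns "Derived". As far as I can tell the same reasoning applies to the program above: with `foo(Parent a)` it prints "Parent doSomething", and with the signature changed to `foo(Parent& a)` it prints "Child doSomething : D = 4".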

In Delphi, there aren’t a bunch of arcane passing and copying rules to keep track of when you’re working with polymorphism.  Objects are always reference types, so when you pass an object to a function, it passes that object, and when you call a virtual method on an object, it calls that object’s class’s version.  Always.

External code

There are at least technical reasons that can explain a lot of the above issues.  But here’s something really bizarre that I’ve never heard a good explanation for.  I had to debug a C DLL that one of my Delphi programs calls into a while ago to fix some problems in it.  I opened it up in Visual Studio and tried to get it to build.  Everything was syntactically correct, and it compiled just fine… and then failed at the link phase, because it couldn’t find the .lib file for a second DLL that this DLL requires.

.lib file?  What in the name of Turing is a .lib file?!?  Turns out it’s a file that describes… something… about the other DLL so that the linker can hook… something… up properly.  I really have no clue what it is or why it’s necessary.  I’ve never had to deal with them before.  In Delphi, if you need to link to a DLL, you put a function header in the code, declare that it’s an external reference, and provide the name of the DLL it’s found in, and that’s it.

The C code had all the same information: there was a .h file containing the function headers and… ohhhh, wait. Now I see what’s going on!  That’s actually the exact same .h file that’s in the DLL I’m linking against. So it doesn’t specify that these functions are external references, or where they’re found.  That information, even though it’s an important part of your source code, needs to be provided in a .lib file, a binary blob generated by the compiler of the external DLL, that’s not human-editable and not version-control friendly.  (And if your external DLL wasn’t written in a C family language, you’re in for even more fun trying to generate a .lib file.)

What it all boils down to is that, for some bizarre reason, Delphi does a more hassle-free job of linking to C DLLs than C does.  That doesn’t even make any sense, but it’s true.  Delphi can talk to external C code better than C can!

I could go on, (I haven’t even mentioned templates yet!) but this post is getting long enough already.  But I think the facts speak for themselves.  By paying attention to little details like the things I’ve mentioned here and thinking through the ramifications, the Delphi language designers have managed to build a language that is easier to work with and easier to write correct code in.  I hope this clarifies what I meant when I posted that.

13 Comments

  1. Mark C says:

    Thank you, I learned something from this post!

  2. Daniel says:

    These discussions are fun, so sorry I can’t resist 🙂

    a) You are right about C. It’s easy to make mistakes, but still operating systems are written in this horribly unsafe language. But to me it seems like Delphi went only half the way. Java got it right by making non-pointer based code the default, C# luckily copied this. Delphi feels more like C++, where you use references usually but still have too easy access to unsafe functions (e.g. BlockRead, or Stream.Read).
    b) “In Delphi, “a * b” means multiplication and nothing else.” That’s not true. Since Delphi 2006 you can overload operators for records, thereby you share the same power and responsibilities as C++ and C# coders.

    But anyway you can make similar arguments about the little things Java/C# got right and Delphi got wrong:
    – interface/implementation: this was the pre-Pascal 5.5 way of separating public from private, but it collides with newer object-oriented paradigms. For example, let’s say you have a unit with a public class (defined in interface). You want this class to have a private field with the type of a private class (defined in implementation). You can’t do this! You can work around it with an inner class, but then you a) need to have both classes in the same file and b) consistently run into cases where the compiler and the IDE can’t settle on a syntax (Delphi 2009).
    – single pass compiler: records can have functions, which is awesome. Now let’s have two records: PolarCoordinates and EuclidienCoordinates and write ConvertToPolar/ConvertToEuclidien in each. You can’t, because the compiler won’t know the second record while the first one is being defined.
    – variable declarations at the top prevent you from applying the “variables should have the smallest scope possible” rule. You might argue that’s a good thing, but I always found smaller scopes to be easier to handle.
    – no garbage collector/not strictly managed: speaking of security, this seems to me equally important. If I access a free’d object in Delphi I usually get an exception. Notice “usually”, as this is not guaranteed. In Java/.Net (assuming I am not using unsafe/interop code), this is guaranteed to throw an exception, making my program more predictable. Also, it helps avoid memory leaks.
    – no platform independence, not open-source. this is actually a huge thing for me. want to run your app on Linux? a Mac? on Android? You are at one company’s mercy. If they feel like canceling a product, you end up with all that useless, uncompilable source-code (hi Kylix and Delphi.Net users).

    I could also go on and on. You have made some valid points (except for the a*b part), but when I read your article it feels like you are cherry-picking arguments. Each language has its pros and cons and Delphi certainly got a few things right. Still I stopped using it after Delphi 2009 (was a user since Pascal 5.0) simply because other languages seem so much more suitable for the tasks that I face today.

    Still, these discussions are fun 🙂 By the way, my favorite language is Scala as it fixes most of the things that Java got wrong.

  3. Mason Wheeler says:

    Daniel: With the a * b thing, even if you’re using operator overloads, the * operator still invokes the &op_Multiply function. (Which, theoretically at least, should be performing some kind of multiplication on your record.) What I meant was that * always means semantic multiplication, whereas in C it can also mean a declaration of a pointer variable–which has nothing at all to do with multiplication–depending on the context.

    Private classes: I’ve never seen the IDE having trouble with them. What sort of trouble are you talking about? Also, why do you talk about putting multiple classes in the same file as if it were a bad thing? Forcing one-class-per-file on you is IMO one of Java’s ugliest misfeatures.

    Mutually-dependent records: This can be resolved with a record helper. (Though I agree, it would be nicer if there was a way to forward-declare records without breaking the compiler. But a workaround does exist.)

    GC: The point’s kind of moot, since you can’t implement an operating system in pure managed, “safe” code in the first place. Like I pointed out in the article, those unsafe operations at the language level are in C# and Java for a reason. And besides, a garbage collector tends to cause worse memory leaks than it fixes, especially the generational kind. I prefer Delphi with FastMM’s built-in leak tracking, which makes memory leaks easy to find and fix when you do get them.

    Variables at the top: I do argue that that’s a good thing. It makes the code easier to read because you have a nice, predictable place where the variables go and you know exactly where to look to find them. If you’ve got nested loops, for example, you don’t have to scan through multiple levels of scope to figure out what type your variable is. And if your code is large enough that you have to scroll way up to find the variable section, that’s a good indication that it can probably be refactored into smaller pieces, so having the variables at the top of the routine helps incentivize good design too. 🙂

    Open source/platform independence: There’s always FPC, which is mostly compatible. They could do a lot better job of some things, but the alternative does exist. And why does canceling a product make the code useless and uncompilable? There are still people using Kylix today, and if Emb. canceled Delphi tomorrow, I still have the IDE and the compiler and they still work. We just wouldn’t get any new features.

  4. Guy Gordon says:

    Great article. I love working in Delphi, but there are also some things it gets wrong or leaves out.

    For example, when declaring functions in Java you specify which exceptions they throw. I sure wish Delphi allowed or enforced that. Instead, I have to keep track of it myself — a job the compiler should take off my hands.

    Daniel: “cherry-picking arguments” Of *course* you pick your best arguments. What else would you do? Pick them at random?

    “Each language has its pros and cons” This is faulty reasoning. Let’s try applying it to Gandhi and Hitler: “Each man had his faults and virtues.” Gee, kinda makes it sound like there wasn’t much difference between them.

    You make a valid point that you need to pick the language most appropriate to the problem at hand. But that doesn’t mean that something like VBA is *ever* that language.

  5. Michael Justin says:

    You wrote that “Java and C# both have explicit “unsafe” features” – for C# I agree, but which one is it on the Java side?

  6. mleyen says:

    In these rare cases when i need Records, i always handle them with a Pointer. This has some benefits. Mainly performance:

    type
    PPolarCoordinates = ^TPolarCoordinates;
    PEuclidienCoordinates = ^TEuclidienCoordinates;

    TPolarCoordinates = record
    function ConvertToEuclidien: PEuclidienCoordinates;
    end;
    TEuclidienCoordinates = record
    function ConvertToPolar: PPolarCoordinates;
    end;

  7. Barry Kelly says:

    In C, .lib is to .dll as in Delphi, .dcp is to .bpl.

  8. Starting with Delphi 2006, you can work around this forward record declaration by providing either Implicit or Explicit conversion operators from/to TPolarCoordinates on TEuclidienCoordinates; see this posting how:
    http://stackoverflow.com/questions/770809/mutually-dependent-records-in-delphi

    –jeroen

  9. Ken Knopfli says:

    I have several books like “Writing Solid Code” and “No Bugs”. So much of the code-related problems they address simply do not arise in Delphi.

  10. Steve M says:

    Brilliant article and replies. Thanks. I don’t believe religiously in Delphi and Object Pascal, they are just generally superior to C-based languages. Starting by deliberately mis-naming everything was a bad “day one” for ugly old C. Pascal and Object Pascal are beautiful.

  11. Shawn Stamps says:

    I made similar arguments decades ago, back in the TP days (about the same time as the Morris worm). It is very true. Language choice does affect people’s cognitive focus and thinking processes. C has always been “terse”, and it has spawned many programmers where that “terseness” has bitten them (and, indeed, the rest of us) on the arse. The general arguments used in reply are “well, you can write crap in any language”; while true, when the language /encourages/ you to write crap versus forcing you to “do it better”, I think it should be touted as an important feature of said language. Inevitably, the arguments end up with the pro-C people touting coding anarchy as if it was the holy grail of languages. “I don’t need a language to protect me from myself”. Yeah, OK, given the resultant history, I think we needed a language to protect the rest of us from “you”.

    That said, Delphi/Object Pascal does have its faults, but its solid foundation as 1) a teaching language, and 2) a language that HELPS you design better algorithms and write better code definitely makes it a much superior language.

    It is interesting that Anders Hejlsberg took some of these concepts and implemented them in C#; I guess that kinda proves the point.

  12. Xepol says:

    Actually, a * b DOES in fact do 2 different things depending on whether you are working with sets or numbers; this extends to a number of other set operators that are also math operators.

    And a .lib file is a set of .obj files in a single file, so it is like a .BPL file, but without call information attached (C also requires .H files, which are absent from the .lib file). .lib files pre-date .dll files, and C can work with .dll files without .lib files.