The internet: A ship lost at C

34 years ago, Tony Hoare gave a very interesting, and somewhat prophetic, Turing Award lecture.  In case anyone’s not familiar with him, he’s one of the great pioneers of computer science.  Among other things, he invented Quicksort and the CASE statement.

He talks about his work on ALGOL compilers, and one of the things he said has been on my mind recently:

In that design I adopted certain basic principles that I believe to be as valid today as they were back then.  The first principle was security: The principle that every syntactically incorrect program should be rejected by the compiler and that every syntactically correct program should give a result or an error message that was predictable and comprehensible in terms of the source language program itself. Thus no core dumps should ever be necessary. It was logically impossible for any source language program to cause the computer to run wild, either at compile time or at run time.

A consequence of this principle is that every occurrence of every subscript of every subscripted variable was on every occasion checked at run time against both the upper and the lower declared bounds of the array. Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interest of efficiency on production runs. Unanimously, they urged us not to—they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law.

He said this in 1980, about work he had done in 1960, so this was known and understood to be a good idea more than 50 years ago.  But, of course, the programming community in general didn’t listen.  Eight years later, in 1988, the consequences came back to bite us in the form of the Morris Worm.

It rampaged throughout the fledgling Internet of the day, infecting an estimated 10% of all systems connected to the Internet by exploiting buffer overruns and other holes in a handful of specific UNIX programs.  The author, a sleazebag by the name of Robert Morris, later claimed that he just wanted to find a way to “count the number of computers on the Internet,” but his actions put the lie to that statement.  He encrypted the Worm and used rootkit techniques to hide it from the file system, and he released it from a different university than the one he attended, in an attempt to cover his tracks.  A person who believes they aren’t doing anything wrong doesn’t try to hide what they’re doing, and comments in his original source code make it clear that his intention was anything but benign; he was trying to build what we call a botnet today.

And all because of buffer exploits in a handful of C programs.  That really should have put us all on notice.  Hoare was right, and in any sane world, the C language would have been dead by 1990.

But it didn’t happen, and those who refuse to learn from history are doomed to repeat it.  Once the Internet became a big thing among the general public, we ended up, in the early 2000s, with a bunch of new worms that snuck into Windows systems through buffer exploits.  Anyone remember Slammer?  Blaster?  Code Red?

Hoare was right.  We should have listened.
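Hoare’s rule, checking every subscript against both the upper and the lower declared bounds, is easy to state. As a minimal sketch in C (which performs no such check itself), assuming a hypothetical `checked_array` type that carries its declared bounds the way an ALGOL array declared as `a[1:3]` does, this is roughly the test a bounds-checking compiler would emit on every access:

```c
#include <stdio.h>

/* Hypothetical array type that remembers its declared bounds,
   in the spirit of an ALGOL declaration such as a[1:3]. */
typedef struct {
    int lower;        /* lowest valid subscript */
    int upper;        /* highest valid subscript */
    const int *data;  /* holds upper - lower + 1 elements */
} checked_array;

/* Checked subscript: compare i against BOTH declared bounds before
   touching memory. On success, stores the element in *out and returns 0.
   On failure, prints a message in terms of the source program and
   returns -1 without reading anything outside the array. */
int checked_get(const checked_array *a, int i, int *out) {
    if (i < a->lower || i > a->upper) {
        fprintf(stderr, "subscript %d outside declared bounds [%d:%d]\n",
                i, a->lower, a->upper);
        return -1;
    }
    *out = a->data[i - a->lower];
    return 0;
}
```

With `a` declared over bounds 1 to 3, `checked_get(&a, 2, &v)` yields the second element, while subscripts 0 or 4 produce a predictable error report instead of a read from whatever memory happens to sit next to the array.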

Why has all this been on my mind lately?  If you’ve been paying attention at all to Internet news, you already know:  Heartbleed.  History has repeated itself yet again.  A buffer exploit in OpenSSL, a widely used C library, affecting anywhere from 10% (there’s that figure again) to 66% of all servers on the Internet, depending on which estimate you listen to, with a vulnerability so horrendous that security expert Bruce Schneier described it as “on the scale of 1 to 10, this is an 11.”

Hoare was right.  Will we listen this time?  Probably not.  So it’ll happen again.

Building any software with an inherent security requirement (browsers and other network-facing software, OSes, and so on) in C, C++, Objective-C or any other member of the C family ought to be regarded by now as an act of criminal negligence, by the programming community in general if not by the law.

Remember the minor kerfuffle when Steve Jobs died, over Richard Stallman’s quoting Chicago Mayor Harold Washington on the corrupt former Mayor Daley: “I’m not glad he’s dead, but I’m glad he’s gone”?  Just a few days later, Dennis Ritchie, the creator of C, died, and that’s exactly how I felt about it.  As one of my coworkers at WideOrbit put it, Ritchie’s true legacy to the world is the buffer overflow.

Some people say “the language is not the problem; the problem is bad programmers using it incorrectly.”  But that’s not true.  The guy responsible for the Heartbleed vulnerability isn’t a bad programmer.  Have a look at the commit where the bug was introduced.  See if you can find the problem without being told where it is.

It’s clear that this is not the work of an incompetent n00b; this is someone who really knows his way around the language.  But he made a mistake, and it’s subtle enough that most people, even knowing beforehand that the changeset contains a severe bug and what class of bug it is (a buffer exploit vulnerability), won’t be able to find it.
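For readers who want to see the shape of the bug without digging through OpenSSL: a TLS heartbeat request (RFC 6520) carries a payload plus a length field claiming how big that payload is, and the vulnerable code trusted the claimed length when copying the payload back into its response. The sketch below is a simplified, hypothetical model, not OpenSSL’s code; `build_heartbeat_response` and its parameters are made up for illustration. It shows the kind of check the eventual fix added: compare the claimed length against the number of bytes actually received before copying.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical message layout:
     byte 0     : message type
     bytes 1..2 : claimed payload length (big-endian)
     bytes 3..  : payload
   The real protocol also requires at least 16 bytes of padding,
   which this sketch omits. */

/* Copies the payload into out (capacity out_cap) and returns the number
   of bytes echoed, or -1 if the message is malformed. Without the
   "claimed length fits in what was received" check, memcpy would read
   past the end of rec and leak whatever memory follows it. */
long build_heartbeat_response(const uint8_t *rec, size_t rec_len,
                              uint8_t *out, size_t out_cap) {
    if (rec_len < 3)
        return -1;                  /* too short for type + length field */
    size_t claimed = ((size_t)rec[1] << 8) | rec[2];
    if (3 + claimed > rec_len)
        return -1;                  /* claims more than was sent: discard */
    if (claimed > out_cap)
        return -1;                  /* response buffer too small */
    memcpy(out, rec + 3, claimed);
    return (long)claimed;
}
```

A request that sends four bytes but claims 16384 is rejected here; the vulnerable code instead answered it with 16 KB of the server’s memory.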

To err is human, but when a mistake can have consequences of this magnitude, it’s also unforgivable.  That puts the language, which forgives such mistakes all too easily, fundamentally at odds with the reality of human nature.  Something’s gotta give, and it’s not going to be reality… and this is what happens when it does.

Others claim that the C language, with its unsafe, low-level direct memory access, is necessary on the many low-resource systems where counting bytes and cycles still matters.  To this I say, wake up and smell the 21st century.  In an age when a Raspberry Pi can run a full-fledged Linux operating system with enough hardware capacity left over to play HD movies, all for well under $100, such limited systems are laughably obsolete.

No, there’s really no excuse left for C, other than inertia.  (Which, if you recall, is ultimately what ran the Titanic into that iceberg.)  Can we let it and its entire misbegotten family die already?  It’s 25 years overdue for its own funeral.

PS. It’s not like viable alternatives don’t exist.  It’s worth noting that at the time the Morris Worm first brought the Internet to its knees by exploiting buffer overruns in C, Apple was already five years into its Macintosh project, which ended up defining the entire future of operating system design… in Pascal.

27 Comments

  1. David Heffernan says:

    You appear to be claiming that buffer overruns are not possible in Pascal.

    • Mason Wheeler says:

      Oh, I know they’re not impossible, but with built-in bounds checking, a stronger type system, and no string-buffer-on-the-stack nonsense (at least since Delphi came out and replaced classic Pascal strings with AnsiString), they’re far rarer, and far less harmful when they do happen. When’s the last time you heard of a buffer overrun exploit in a Delphi program?

      • David Heffernan says:

        You are making no sense at all

        • sandmann says:

          But it’s true: Delphi is much safer than any C/C++ compiler.
          When a Delphi application crashes you will get an exception message, and in the majority of cases the user can continue his work.
          Delphi strings are always safe; no buffer overrun is possible.

          C/C++ apps tend to hang completely or die without any message or hint.
          C programmers are using potentially dangerous functions like memcpy, strcpy, strlen,… all the time.
          Buffer overruns and stack corruption are likely to happen.

          • David Heffernan says:

            No sandmann that is just not true. These bugs occur at module boundaries. No Delphi strings there. Delphi is no safer than C. And why talk about C++? Not relevant here.

  2. Jan Doggen says:

    And the default compiler settings for the Delphi IDE are still ‘overflow checks off’ and ‘range checks off’. I wonder how that compares with other languages.

  3. david berneda says:

    On the other hand, TList.Get does range checking and cannot be disabled (unless you access the now deprecated List pointer array directly).

    I like to wrap runtime checks in $A+ assertions, independent of the overflow and range check settings.

    Ideally, a smarter compile-time algorithm could warn about potential overruns.

  4. Moritz Beutel says:

    Nice try. But Heartbleed is a mere peg for you to get an uninformed, misdirected and hateful rant going.

    That a convinced supporter of Pascal likes to discredit the C language and its derivatives is not surprising news to me. But it is astonishing how uninformed you seem to be. Or did you know about the Safe CRT? Did you know that VC++ emits a warning whenever you call one of the unsafe C RTL functions? Does Delphi warn you if you call StrCopy() or Move()?

    Of course, people who program C like they did in 1980 are, to stick with your dramatic cadence, a danger to humankind. But people who follow more recent C guidelines (e.g. Microsoft’s SDL) and who keep compiler warnings enabled are much more likely to produce secure code than programmers using obscure languages which allow direct memory access and disable array bounds checking by default. And lumping in C++ and ObjC doesn’t make your argument any more coherent. And what is it with the “other members of the C family”? Surely the reader is supposed to think of C#, which is considered blotted by mere syntactic relationship to C?

    This is especially funny because, while you carefully aim your anger at the C language, your arguments are not about C but about unsafe low-level programming. As you say, “it’s not like viable alternatives don’t exist”: they are called .NET CLR and JVM, and their design as type-safe environments makes it very hard for this type of error to occur. In the spirit of your argument, maybe OpenSSL should have been written in C#?

    • Mason Wheeler says:

      > Did you know that VC++ emits a warning whenever you call one of the unsafe C RTL functions?

      I did, actually. Did you know that OpenSSL isn’t built in Visual Studio?

      Also, do you know how the “safe” C RTL functions that its warnings try to get you to call actually work? There’s an extra parameter you’re supposed to pass, indicating the size of the destination buffer. That means even more things you can screw up. Fixing it the right way (making the size of the buffer an integral part of the buffer being passed, like Delphi’s dynamic arrays do) isn’t possible without revamping a whole lot of the language and killing backwards compatibility.

      >And what is it with the “other members of the C family”? Surely the reader is supposed to think of C#, which is considered blotted by mere syntactic relationship to C?

      Actually, I was thinking of more obscure descendants, such as D. I know full well that C# isn’t vulnerable to buffer overruns. (As long as you stay strictly within the realm of managed code, and as long as there are no bugs in the CLR.)
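The contrast drawn in the reply above, between passing a separate size argument and making the length part of the buffer itself, can be sketched in C. The `byte_slice` type and `slice_copy` function below are hypothetical, not part of any standard library: each slice carries its own length, so the copy routine checks both the source and the destination range without the caller supplying (and possibly misstating) a size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical length-carrying buffer: the size travels with the
   pointer, roughly as a Delphi dynamic array knows its own length. */
typedef struct {
    unsigned char *ptr;
    size_t len;
} byte_slice;

/* Copies n bytes from src (starting at src_off) into dst (starting at
   dst_off). Both ranges are checked against the lengths stored in the
   slices themselves, using subtraction so the checks cannot overflow.
   Returns 0 on success, -1 if either range falls outside its slice. */
int slice_copy(byte_slice dst, size_t dst_off,
               byte_slice src, size_t src_off, size_t n) {
    if (src_off > src.len || n > src.len - src_off)
        return -1;   /* would read past the end of src */
    if (dst_off > dst.len || n > dst.len - dst_off)
        return -1;   /* would write past the end of dst */
    memmove(dst.ptr + dst_off, src.ptr + src_off, n);
    return 0;
}
```

Compare this with `strcpy_s(dest, destsz, src)`, where `destsz` is a number the caller must get right at every call site. The sketch is only as safe as the stored `len`, though; nothing in C stops code from reaching around the struct and using `ptr` directly.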

  5. Leonardo Herrera says:

    The Heartbleed bug is not a classic buffer overrun, but an implementation mistake. Even if the language used were not C but some hypothetical language that does runtime bounds checks, the bug would still exist (the memory sent back by the server is owned by the process, and it is copied into a valid allocated block; as long as the number of bytes copied from the rrec.data member is less than rrec.length, you don’t even have a chance to incur an out-of-bounds error.)

    • Mason Wheeler says:

      Yes, it’s copied *into* a valid allocated block, but it’s copied *from* outside the bounds of a valid block. It’s not a “classic” buffer-overflow-by-sending-too-much-data bug, but it’s still a buffer overflow that bounds checking would have mitigated.

      • David Heffernan says:

        Clearly bounds checking would have caught it. But that would be needed in any language. It is frankly embarrassing that you sincerely believe that Delphi is immune from programmer error.

        • Darkhog says:

          Immune? No. Safer? Hell yes. Also don’t forget about C’s “pointer hell”, which makes programming, well, hell. Almost everywhere in the standard library you need to pass pointers, even where passing a simple array of char or a single char would be enough.

          Now if you forget to initialize a pointer, or free it too soon…

  6. Leonardo Herrera says:

    @Moritz – your argument about VC++ having warnings about unsafe functions and whatever does nothing for the point you are trying to make. OpenSSL does not use anything of what you mention, and it is not likely to do so anytime soon.

  7. Yanniel says:

    Just look who’s first in the index.

    http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html

    That might give you an idea of whether C is really dying.

  8. Yanniel says:

    I am simply lost for words. Watch this video and see if C++ deserves to die. If you don’t want to watch the whole video, just look for the prefetch stuff about 40 minutes in.

    http://video.ch9.ms/sessions/build/2014/2-661_LG.mp4

    Today, resource efficiency is more important than ever, especially for devices like smartphones. It’s not just about CPU and memory. What about power (battery) efficiency?

    PS: The player you are using now to watch the video is most likely compiled in C++. Your OS as well.

    • Mason Wheeler says:

      > PS: The player you are using now to watch the video is most likely compiled in C++. Your OS as well.

      …and my OS receives critical security updates on a monthly basis. QED.

      • Yanniel says:

        Vulnerabilities will always be found, and fixes will always be provided. It’s the nature of progress itself. A “perfect programming language” (just to give it a name) replacing C or any other language for that matter, won’t change that fact.

  9. Moritz Beutel says:

    @Leonardo: I didn’t talk about OpenSSL. Mason opened a debate on the C programming language in general, and I responded to that. Of course I agree that OpenSSL might want to review their coding practices and deprecate the use of traditional and unsafe CRT functions.

    As you say, conventional bounds checking would have been useless in this case. It may work for direct array indexing but can easily be tricked with pointers, or by calling low-level runtime library code such as memcpy()/Move(), regardless of the language. The Safe CRT functions could not have avoided a Heartbleed-type bug either. But this particular kind of bug, as well as many others which the Safe CRT does help to mitigate, would not occur in a type-safe runtime such as the CLR.

  10. Wouter says:

    That’s true. C still needs to die though. Let’s do it!

    Die();

    …. Nothing happens…

    Damnit.. Turns out it didn’t work because it should’ve been:

    die();

  11. Paul Davis says:

    Ada fixes all this better than Pascal.

  12. PiotrL says:

    I surely can live without C, but just replace it with C++ and continue.
    Maybe some day somebody will introduce a close-to-the-metal language (not a VM) with a clean syntax, a large developer community and a free IDE, but right now we have to use C++ if we need high performance, Java/Scala/C# if we need clean code, and Python/PHP/JavaScript/Groovy if we need to implement something fast.

  13. Mason Wheeler says:

    Obnoxious troll comments deleted. Obnoxious troll has been IP banned for failing to ever post anything constructive or useful. Disagreement is fine, and I welcome it. Being disagreeable is another matter entirely.

  14. Oliver says:

    It’s a bit late but I’ve just come across this post after googling Hoare’s lecture in reference to Heartbleed. Near the end of it he gives a powerful warning against using unsafe programming languages. He’s specifically speaking about an early version of the Ada language, but it surely applies even more to C:

    Do not allow this language in its present state to be used in applications where reliability is crucial, i.e., nuclear power stations, cruise missiles, early warning systems, antiballistic missile defense systems. The next rocket to go astray as a result of a programming language error may not be an exploratory space rocket on a harmless trip to Venus: it may be a nuclear warhead exploding over one of our own cities. An unreliable programming language generating unreliable programs constitutes a far greater risk to our environment and to our society than unsafe cars, toxic pesticides, or accidents at nuclear power stations. Be vigilant to reduce the risk, not to increase it.

    We could perhaps update this for our present time and say:

    Do not allow a dangerous programming language to be used in applications that come anywhere close to untrusted user input. The next system to be hacked as a result of a buffer overflow may not be a free email provider or a discussion forum: it may be a submarine in command of nuclear weapons. An unsafe programming language generating unsafe programs is a far greater risk to our security than any foreign army or air force. Reduce that risk at all costs, don’t let it increase.

    I wonder if it’s possible to see a revival in the Pascal/ALGOL language family for systems development. Maybe modern Ada or something similar could be used to produce safer and more reliable code for our operating systems and cryptographic libraries.
