Beware of Format Injection

So, first things first, I’ve moved on again.  And it was quite a move, all the way across the country to the Philadelphia area, home of the Liberty Bell, the cheesesteak, and the baseball team with possibly the least imaginative name in the history of professional sports.  (Original names don’t seem to be Pennsylvania’s strong suit. We’ve got a Pottstown, Pottsgrove and Pottsville all within about 30 miles of each other, for example.  On the other hand, there’s also this place, also nearby, so… yeah.)

I’m at Gateway Ticketing Systems now, the place where Nick Hodges was working when he made that “rock-star developer” video. (Which appears to no longer be available, unfortunately.)  It’s a nice place to work, even if they do use Mercurial. 🙁

Anyhoo, yesterday one of my coworkers sent around a very interesting email.  There’s nothing sensitive in it, so I’m going to copy it here, verbatim:

Something I’ve been meaning to mention:  use care when constructing format strings using string data not generated under your control.

The following code could fail if the exception isn’t generated by us explicitly:

except
    on e: Exception do
        begin
            e.Message := Format('A terrible error occurred: %s', [e.Message])
            raise;
            end;
        end;

We don’t always have control over the text in e.Message.  That text could contain a format specifier, which could cause Format to fail.

I encountered this error in a Win32 call, which generated the error message “%1 is not a valid Win32 application” when loading a library.  That caused the call to Format to fail, obscuring the original exception.

Such constructs should be written as:

    except
        on e: Exception do begin
            e.Message := 'A terrible error occurred: ' + e.Message;
            raise;
            end;
        end;  

Or, if using an inline translation facility as we Delphi programmers do:

    except
        on e: Exception do begin
            e.Message := Translate('A terrible error occurred:’) + ' ' + e.Message;
            raise;
            end;
        end;

We could get complicated and write a function to strip out format specifiers from text not generated by our system.

That’s something I’d never run into before: parameters to format() having unexpected % characters in them!  In the email title, he called it “format injection”, which makes it sound a lot like a SQL Injection vulnerability.  That actually makes me glad I’m a Delphi developer, though, because such an injection vulnerability does exist… in C.  It’s known as a Format String Attack, and two types exist.  One takes advantage of the way that C has no bounds checking, and can be used to reveal information.  The other takes advantage of the fact that C’s string formatting is actually not read-only!  The %n token will actually write a value to an arbitrary memory location!  (Not to be confused with the %n token in Delphi’s format syntax, which interprets an input value as a number.)

Obscure little details like that are one of the reasons it’s good to be working in Delphi, where the worst that can happen if some user screws up your format string is that it’ll raise an exception. (And possibly stomp the exception you were trying to handle.)

15 Comments

  1. I hope that code snippet is just for an example because adding that ‘A terrible error occurred:’ to the exception doesn’t make it any better 🙂

  2. C Johnson says:

    It looks like the result of stupid C string buffer handling ( a classic shortfall of the language) – and not because C strings aren’t read only, but because the input buffer is basically being used as the output and parse buffers as well. I find it hard to believe anyone out do something THAT foolish, but based on how C has traditionally taken string handling shortcuts, and since the problem is described in such detail, clearly this has happened.

    That said, even in Delphi if you used a string generated by other code as your format string and not just the argument data, then yes, that WOULD be foolish to the point of insanity.

  3. David Heffernan says:

    Yes, it’s wonderful to code in a language which is immune from runtime errors.

  4. Uwe Raabe says:

    That reminds me of a case where I had to maintain some old code where the programmer used a very simple regular expression to check an input string against a variable content. In my naivity I reduced this to a plain string compare – until one day the customer complaimed that his program didn’t act as usual. My fault was that I didn’t account for the fact that the input string could contain regex characters. According to the term SQL injection I do call this regex injection now.

  5. SpeedFreak says:

    As far as I can see the example code is perfectly safe (although not how I would do it).

    Are you claiming that the following would fail?:

    WhatEver := Format(‘A terrible error occurred: %s’, [‘%1 is not a valid Win32 application’]);

    In that case SysUtils.ExceptionErrorMessage would have the problem too.

    I’m guessing the example should have read something like this:

    except
    on e: Exception do
    begin
    e.Message := Format(e.Message, [‘Shit happened’]);
    raise;
    end;
    end;

  6. Remy Lebeau says:

    I agree with SpeedFreak. The original code looks safe to me (I have not tested it yet). The only way it would be unsafe is if Format() replaces ‘%s’ in the format string itself and then reparses it, thus seeing new ‘%’ characters to process. But as long as the format string and output string are separate buffers, then how could this fail?

  7. Petar says:

    One of my teachers used to say: “Never trust user input!”. The same applies to any input 🙂

  8. This is a weird advice as following line doesn’t cause any exception under XE2 or XE7.

    s := Format(‘A terrible error occurred: %s’, [‘%1 is not a valid Win32 application’]);

  9. Arthur Hoornweg says:

    This has happened to me several times. My applications are multi-language and sometimes it is unavoidable that even the format strings are localized, simply because different languages have different word order so just “appending” strings isn’t the way to go.

    For example, in English the adjective comes before the noun. In many other languages it’s the other way around. I use Gettext for localization and send my *.po files to the translators with detailed instructions on how to handle format strings and html-formatted strings.

    For example, look at the following line:

    Memo1.lines.add(format(gettext(‘You added a %s %s to the shopping cart’),[gettext(SelectedColor),gettext(SelectedFurniture)]));

    In French, Italian, Spanish… the format string would have to be modified to reflect the different word order of adjective/noun: (‘Vous avez ajouté un %1:s %0:s au chariot.’) and even then the gender of the article would be wrong in 50% of the cases. But typos in the translated format string are truly time bombs because they raise an exception…. and these exceptions are darned hard to track down because the bug isn’t in your source code, it is in a specific language resource…

  10. Cobus Kruger says:

    I don’t get it. There is no reason why Format would need to parse the arguments and (as SpeadFreak and others have shown) it clearly doesn’t. Somewhere along the line, something was misdiagnosed and now it’s all over the Internet. Don’t you just love it when that happens 🙂

  11. I have to agree with several other commenters: How could this possibly cause problems? Could you give us a more concrete example and tell us exactly what kind of problem it causes? I have used this kind of code basically everywhere in my programs because it makes translating the strings much easier than string concatenation (as Arthur Hoornweg pointed out). I’d hate to revisit all these places.

    I can only see one possible problem: If the generated string later on gets used in a way where %1 could be interpreted, like in SQL statements.

  12. David Robb says:

    My apologies, the problem occurs only if the result of the format() call is passed to another format call. I didn’t explain that at the time I sent the email but should have.

    “A terrible error occurred” is for demonstration purposes only and of course is not representative of an actual error message! Please don’t do that at home.

    Dave

  13. David Robb says:

    Here is an example that actually reproduces the issue:

    {$APPTYPE CONSOLE}

    uses
    SysUtils;

    type EDLLLoadError = class(Exception);

    var DLLFilename: string;

    begin
    try
    try
    DLLFilename := ‘Test.dll’;

    raise Exception.Create(‘%1 is not a valid Win32 Application’);
    except
    on e: Exception do begin
    raise EDLLLoadError.Create(Format(‘%s could not be loaded:’ + #13#10#13#10 + E.Message, [DLLFilename]));
    end;
    end;
    except
    on e: Exception do begin
    WriteLn(Format(‘An error occurred: %s’, [e.Message]));
    readln;
    end;
    end;

    end.

    Output:

    An error occurred: No argument for format ‘%1 ‘

    Sorry about that! I encountered the bug 23 months before I sent the email and forgot the details.

  14. David Robb says:

    Last email. I think I misstated the original problem.

    The real problem was that e.message *wasn’t* parameterized. This construct works fine and is how it should have been written in the first place:

    raise EDLLLoadError.CreateFmt(‘%s could not be loaded:%s%s’, [DLLFilename, #13#10#13#10, e.Message]);

    It appears to me that format does sanitize input.

    Dave

  15. David Heffernan says:

    This is a most bizarre article now. What we have here is our old friend, an uncontrolled format string: http://en.wikipedia.org/wiki/Uncontrolled_format_string

    Often seen in C code that (mis)uses printf.

Leave a Reply to Arthur Hoornweg