I built a compiler today

I wrote a compiler at work today.  It took about 5 hours.

Granted, its input is very simple and nowhere near Turing-complete, and its output is in Delphi, not any type of machine language or bytecode.  But it definitely fits Joel Spolsky’s definition of a compiler.

A few years back, the term “ORM” started to become popular in the computer programming community.  An Object-Relational Mapper is a library that takes a lot of the repetitive grunt work out of building an interface between the data in your database and the objects in your program that the data represents.

There are ORMs freely available for pretty much every significant object-oriented language, including a few for Delphi.  But our program has been around since before the ORM concept got all big and well-known like it is today, so our engineers didn’t have these libraries to choose from.  They built their own instead.

It’s simply called “OM,” the Object Model.  Most engineers, if they looked at it, would call it cobbled-together legacy code.  It’s kinda ugly, warty, and full of hacks.  But by the time the cool, fancy ORM libraries came out, the OM was firmly entrenched throughout a major, industry-leading app, so we’ve never really thought about changing it.

Basically, the OM framework maps between database tables and OM classes.  For each table, there’s a corresponding OM unit that describes the object and how to map fields to columns.  That’s one of the most annoying parts of working with it.  You have to create everything twice: a SQL script to create the table, and then a matching OM unit that takes all the same data and describes it in a different way.  And you have to go through the same process any time some new feature requires a new class.

Well, today I needed a new class for something I’ve been working on.  For a couple years now, I’ve had the idea of automatically generating one side of the pair from the other, since they’re basically just two views of the same data.  The idea occurred to me again today, and I decided to finally do something about it.

Since the SQL CREATE TABLE statement is a lot simpler than a Delphi unit, I decided to use that as the input.  I copied the code for my RTTI Script compiler over to my dev system, gutted most of it since I’m not parsing Object Pascal code in this case, and started working on a simple lexer and parser that know how to read a CREATE TABLE query.  Nothing more; I tried to stick to the bare minimum, since I’m not actually creating any tables with this code.  It follows the basic grammar rules we tend to use, with very little error checking or verification, since its only input is meant to be a script that already runs correctly on the database.

Basically, the parser knows how a CREATE TABLE query starts, and that it contains a long list of column declarations between parentheses, and possibly a few other things that aren’t column declarations, which it filters out and ignores.  It creates a very simple intermediate representation: a single object containing the table name and a generic TList of column definition data records.
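For readers who want to try the idea, here is a minimal sketch of that kind of lexer and parser.  It is written in Python rather than Delphi so it runs standalone, and it is not the code from the post.  It assumes a simple dialect: one-part table names, every column has a type, no SQL comments, and table-level entries start with one of the keywords in the hypothetical NON_COLUMN_STARTERS set.  TableDef and ColumnDef are illustrative stand-ins for the intermediate representation (the table name plus a list of column definitions).

```python
import re
from dataclasses import dataclass, field
from typing import List

# Bracketed/quoted identifiers, string literals, identifiers, numbers, or any other single char.
TOKEN_RE = re.compile(r"""\[[^\]]+\]|"[^"]+"|'[^']*'|[A-Za-z_][A-Za-z0-9_.]*|\d+(?:\.\d+)?|\S""")

# Hypothetical: keywords that open a table-level entry rather than a column declaration.
# (A quoted column literally named e.g. [Check] would be misread; fine for a sketch.)
NON_COLUMN_STARTERS = {"CONSTRAINT", "PRIMARY", "FOREIGN", "UNIQUE", "CHECK"}


@dataclass
class ColumnDef:
    name: str
    sql_type: str          # e.g. "VARCHAR(50)"
    nullable: bool = True


@dataclass
class TableDef:
    name: str
    columns: List[ColumnDef] = field(default_factory=list)


def tokenize(sql: str) -> List[str]:
    """Lexer: split the script into tokens, dropping whitespace and identifier quoting."""
    tokens = []
    for tok in TOKEN_RE.findall(sql):
        if len(tok) >= 2 and (tok[0], tok[-1]) in (("[", "]"), ('"', '"')):
            tok = tok[1:-1]
        tokens.append(tok)
    return tokens


def _column_from(entry: List[str]) -> ColumnDef:
    """Turn one entry such as [Name, VARCHAR, (, 50, ), NOT, NULL] into a ColumnDef."""
    if len(entry) < 2:
        raise SyntaxError("column %r has no type" % entry[0])
    sql_type, rest = entry[1].upper(), entry[2:]
    if rest and rest[0] == "(":            # size/precision, e.g. (50) or (10, 2)
        close = rest.index(")")
        sql_type += "(" + "".join(rest[1:close]) + ")"
        rest = rest[close + 1:]
    words = [w.upper() for w in rest]
    not_null = any(a == "NOT" and b == "NULL" for a, b in zip(words, words[1:]))
    return ColumnDef(entry[0], sql_type, nullable=not not_null)


def parse_create_table(sql: str) -> TableDef:
    """Parser: CREATE TABLE name ( entry {, entry} ) with minimal error checking."""
    toks = tokenize(sql)
    if len(toks) < 4 or [t.upper() for t in toks[:2]] != ["CREATE", "TABLE"] or toks[3] != "(":
        raise SyntaxError("expected CREATE TABLE <name> (")
    table = TableDef(toks[2])
    i, done = 4, False
    while not done:
        # Gather one comma-separated entry; parentheses inside it, as in VARCHAR(50), nest.
        entry, depth = [], 0
        while True:
            if i >= len(toks):
                raise SyntaxError("unterminated column list")
            tok = toks[i]
            i += 1
            if depth == 0 and tok in (",", ")"):
                done = tok == ")"
                break
            if tok == "(":
                depth += 1
            elif tok == ")":
                depth -= 1
            entry.append(tok)
        # Keep column declarations; silently skip constraints and other table-level entries.
        if entry and entry[0].upper() not in NON_COLUMN_STARTERS:
            table.columns.append(_column_from(entry))
    return table
```

Splitting on top-level commas while tracking parenthesis depth is what lets the parser treat `NUMERIC(10, 2)` as one column while still recognizing the closing parenthesis of the column list.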

This gets passed to the code generator, which creates a TStringList and passes it and the intermediate representation object through a series of methods corresponding to different sections of the unit.  These methods are mostly long sequences of calls to list.Add, and for-in loops over the column definition list that call Format to fill in field-related information in various places.  When it’s all done, I retrieve the list’s .Text property as the final output.  After the obligatory debugging and tweaking phase, I ended up with output for some of our tables that looks just like the original OM units.
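The generator pattern described above (append lines to a list, pass it through one method per section of the unit, loop over the columns with a format call, and join everything at the end) can be sketched in Python.  Here a list of strings stands in for TStringList, the `%` operator plays the role of Format, and joining with a trailing line break mimics reading `.Text`.  The emitted unit is a plain Delphi class with a TableName class function; it is not the OM framework’s format, and the type map, the `OM` unit-name prefix, and the emitted members are all hypothetical.

```python
from typing import List, NamedTuple


class Column(NamedTuple):
    name: str
    sql_type: str      # e.g. "VARCHAR(50)"
    nullable: bool


class Table(NamedTuple):
    name: str
    columns: List[Column]


# Hypothetical SQL-to-Delphi type map; unknown types fall back to Variant.
DELPHI_TYPES = {"INTEGER": "Integer", "INT": "Integer", "VARCHAR": "string",
                "NUMERIC": "Currency", "DATE": "TDateTime", "BIT": "Boolean"}


def delphi_type(sql_type: str) -> str:
    return DELPHI_TYPES.get(sql_type.split("(")[0].strip().upper(), "Variant")


def emit_header(out: List[str], table: Table) -> None:
    out.append("unit OM%s;" % table.name)   # hypothetical naming convention
    out.append("")
    out.append("interface")
    out.append("")


def emit_class(out: List[str], table: Table) -> None:
    out.append("type")
    out.append("  T%s = class" % table.name)
    out.append("  private")
    for c in table.columns:                 # one backing field per column
        note = "  // nullable" if c.nullable else ""
        out.append("    F%s: %s;%s" % (c.name, delphi_type(c.sql_type), note))
    out.append("  public")
    out.append("    class function TableName: string;")
    for c in table.columns:                 # one property per column
        out.append("    property %s: %s read F%s write F%s;"
                   % (c.name, delphi_type(c.sql_type), c.name, c.name))
    out.append("  end;")
    out.append("")


def emit_implementation(out: List[str], table: Table) -> None:
    out.append("implementation")
    out.append("")
    out.append("class function T%s.TableName: string;" % table.name)
    out.append("begin")
    out.append("  Result := '%s';" % table.name)
    out.append("end;")
    out.append("")
    out.append("end.")


def generate_unit(table: Table) -> str:
    out: List[str] = []
    for section in (emit_header, emit_class, emit_implementation):
        section(out, table)
    return "\n".join(out) + "\n"           # like TStringList.Text: trailing line break


if __name__ == "__main__":
    print(generate_unit(Table("Customer", [Column("ID", "INTEGER", False),
                                           Column("Name", "VARCHAR(50)", False)])))
```

Keeping each section in its own function means a change to, say, the property layout touches only one place, which is the same reason the post splits the generator into a series of methods.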

Why do I mention this on here?  Because of what I said earlier, that the basic idea for this has been running around in my mind for quite a while now.  Before, it was just an idea, and I didn’t really have much of a notion of how I might implement something like this. But now that I’ve been through the learning process of building a (much more complicated) script compiler, something like this was extremely simple.  You never really know, when you start learning something new, where it will come in handy further down the road.

I certainly didn’t build a script compiler so I would be able to create an internal tool to save my coworkers and myself some tedious grunt work.  But now that I have the necessary skills, it’s nice to be able to put them to use.

5 Comments

  1. Chee Meng says:

    Mason, you might be interested to look at how Doctrine (a popular ORM for PHP) uses the YAML syntax: http://www.doctrine-project.org/projects/orm/1.2/docs/manual/yaml-schema-files/en … I found it quite intuitive and it can be used to enforce column types, relationships, etc.

  2. neugls says:

    Could you let us try out what you’ve built?
    It would be much appreciated.

  3. François says:

    Nice.
    Ah, the joy of the Object Model!
    Not really missing it 😉

  4. jachguate says:

    The other day, I had to create some Java classes, all matching database structures.  I was kept from using an ORM by my “customer”, a friend who is responsible for this project, so I did something similar to what you did, also in Delphi.  But instead of parsing SQL statements, I got the table structure by querying the data dictionary directly (it was SQL Server), then generated 4 classes related to each table.  It took 2 to 3 hours to complete the generator, and it was used to create a bunch of classes for 20 tables that day, which saved a lot of manual work.  Since then, my friend still uses it whenever he needs to integrate a new table or change the underlying structure of existing classes.  Good story!

    • Peter says:

      Great ideas, thanks to both Mason and jachguate.  Boiled down to basics, this reads to me as: open the TDataSet on the table, iterate through its TFields, and generate away.  Doh, I really should have thought of this at my previous job, where I did several manual extensions of yet another in-house ORM system…
