ClassType field should not be “magic”

I’ve been doing some work with code generation recently. It’s kind of messy. You need to build a tree in memory of objects that represent various types of syntax for the language you’re generating code for. You have to think inside-out from the way you normally write code, since you’re creating it in logical tree form, not in line form.

You have a base class that represents any code-generation object, and a bunch of classes that descend from it.  In order to manage things properly, you’re likely to have TList<TCodegenObject> and TStack<TCodegenObject> collections (or, worse still, non-generic TObjectList and TObjectStack containers) all over the place.  But TCodegenObject, like TObject, is an abstract base class that you only instantiate descendants of.

This can make debugging messy, especially if you break on an exception and Evaluate/Modify can’t read the locals anymore. You try to look inside your collections to see what’s going on, but all you see are TCodegenObject instances, not the actual object instances you’re working with. Sometimes that information is crucial to figuring out what’s going on, and you have to resort to ugly debugger hackery to track it down.

After this had happened to me a few too many times, it occurred to me that there’s something slightly ridiculous in Delphi’s object model. The first 4 bytes of the instance contain a reference to the object’s VMT, which defines the object’s class type. It’s a hidden area of memory, managed through “magic” and ugly pointer casting. For example, take the TObject.ClassType function. It returns a TClass, which is defined as a pointer to a class’s VMT (which in turn defines the class type), and it is implemented as:

[code lang="delphi"]
function TObject.ClassType: TClass;
begin
  Pointer(Result) := PPointer(Self)^;
end;
[/code]

Now maybe, for some strange reason, this made sense back in 1995 when Delphi’s object model was first being set up. But today there are two problems with it. First, it’s ugly. Second, it leads to unnecessary frustration by hiding useful information, especially since the Delphi debugger has a built-in evaluator that converts TClass references to class names.

You can probably see what I’m getting at. I’d like to see, in the next Delphi version if possible, that “magic” field replaced with a real field, like so:

[code lang="delphi"]
  TObject = class
  private
    FClassType: TClass;
  public
    //stuff that's already there
    //minus the ClassType method
    property ClassType: TClass read FClassType;
  end;
[/code]

This remains syntactically equivalent to the existing implementation and would not break any existing code, and it has the added benefit of making polymorphism-related code easier to debug, since the FClassType field would be available in the debugger. I’ve thought about this and I can’t see any downsides. Can anyone see a potential problem with it?

I’ve submitted this to QC. Please vote it up if you like the idea.

EDIT:
I’m adding this because a few of the comments seem to be misunderstanding what I’m talking about. They think I want to add a new field to TObject, and they (correctly) think that adding one would increase overhead, which would be a bad thing.

Thing is, I’m not talking about adding any new overhead to TObject. That field is already there. It’s been there since Delphi 1. The first 4 bytes of every TObject instance are a field that contains a pointer to the object’s VMT, or in other words, a TClass. For whatever reason, the language architects decided not to expose it as a visible field of type TClass. What I’m suggesting is simply taking that hidden field and exposing it, not adding anything new that wasn’t already there all along.
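
If you want to check this for yourself, here is a minimal sketch of a console program. The program name and the choice of TStringList are only for illustration, and it assumes a Delphi version where PPointer is declared in the System unit. It reads the first pointer-sized slot of an instance directly and compares it with what ClassType returns:

[code lang="delphi"]
program ClassTypeFieldDemo; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

var
  Obj: TObject;
begin
  // Declared as TObject, but the actual instance is a TStringList.
  Obj := TStringList.Create;
  try
    // An object reference points at the instance, and the first
    // pointer-sized slot of the instance holds the class reference.
    Writeln(PPointer(Obj)^ = Pointer(Obj.ClassType)); // TRUE

    // Reading that slot by hand gives the actual class, not the declared one.
    Writeln(TClass(PPointer(Obj)^).ClassName);        // TStringList

    // InstanceSize already counts that slot; the value depends on the
    // Delphi version and platform, so no expected output is given here.
    Writeln(TObject.InstanceSize);
  finally
    Obj.Free;
  end;
end.
[/code]

The hand-read slot and ClassType agree, and InstanceSize already accounts for that slot, which is why declaring it as a visible field should not need to change the size of any object.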

15 Comments

  1. Tijmen says:

    It will make TObject bigger. If you have many small objects this change will increase the memory usage.

  2. Marjan Venema says:

    I also don’t like the debugger not showing class names, but to get around it I have just added a property that reads the function to the classes where it is important. I don’t want any more bytes added to TObject without very, very good reason and only when there are really no alternatives. D2009 already silently added the monitor, doubling the size of TObject. This caused us great sorrow as we are loading a lot of data straight into memory (80GB+ models). We got around the monitor thing by using an array that “simply” puts the pointer to a second instance partly over the bytes of the first instance. And we only get away with it because we don’t use monitors… So, no thanks, solutions that make TObject bigger are very much undesirable to us.

  3. Andreas says:

    @Marjan: With the proposed change the TObject class wouldn’t need more bytes. What he suggests is to show the already existing field in the class declaration. There is no new field, only the hidden field becomes visible.

  4. Marjan Venema says:

    @Andreas: The size would increase. Try SizeOf on TObject. It used to be 4 – the bytes it takes for a Pointer to the instance. In d2009 it became 8, one instance pointer and one monitor pointer. The field Mason wants to make visible may be hidden, but it is hidden by compiler magic and currently does not take up any space in an instance’s size. Declaring it as a field in TObject would.

  5. Mason Wheeler says:

    Marjan: No, Andreas has it right. The first 4 bytes don’t represent the pointer to the instance. They’re inside the TObject instance itself, the first 4 bytes of the object, and they represent a TClass reference to the object’s VMT. The 4 bytes for the object reference aren’t reported because SizeOf uses the instance size.

    >It is hidden by compiler magic and currently does not take up any space in an instance’s size.

    This is not only incorrect, it’s impossible. I used the term “magic” in quotes for a reason. There is no magic in computer science, only things that are made explicit and things that are hidden from you by clever tricks.

    Think about it for a moment. If you pass a reference to something to an event handler expecting “Sender: TObject”, and the first line says “MyButton := Sender as TButton”, that as-cast has to evaluate that TObject instance and make sure it’s a TButton. In order to do that, it needs to have information about the class type of that specific instance available somehow, and the only information it’s got to look at is a reference to the object as a TObject. Therefore, that information must be stored somewhere within that individual TObject.

    The information is already there. My solution wouldn’t add any overhead whatsoever. Just look at the implementation of TObject.ClassType. (Look very carefully, since it uses pointers and pointers-to-pointers. But if you can follow the indirections you’ll see what it’s doing.) And this shows the other problem with using “magic” to implement what ought to be a simple, explicit concept. When it’s not explained, it doesn’t get understood, and people end up with incorrect ideas about how it works.

  6. Pham says:

    The proposal is wrong. Every object has a pointer to meta class info. Unlike the hidden monitor object which must be initiated for every instance. Adding this will increase the size of the object, which is similar to the monitor space.

    Cheers

  7. Marjan Venema says:

    @Mason: Thanks for the explanation. Guess I have a knee-jerk reaction to anything that seems to increase the size of a TObject… 🙂

  8. Jolyon Smith says:

    Maybe I’m being a bit slow (it’s Monday morning and I’m still waiting for the acolyte to return with the morning coffee run), but how exactly would this help with debugging?

    Or more precisely, what is there in this suggestion that simply adding more awareness in the debugger would not already achieve? i.e. instead of changing the magic in the compiler (high risk, since it is a change in long established behaviour that code may not adequately allow for), simply enhance the debugger.

    I think the problem is that in asserting your solution I don’t fully grasp the problem you are solving. You state the need to resort to “debugger hackery” (prima facie evidence that it’s the DEBUGGER that needs fixing, not the compiler), but don’t make it clear (to me at least) what hackery it is that is required and how making fClassType “more visible” would remove the need to resort to such hackery.

  9. Xepol says:

    Technically, you are proposing placing class information inside an instance of the class information. The same class information you would need before you could determine where to look for the class information.

    Unless, of course, the compiler magically ensured that it always placed the fClassType in the same place all the time, like it already does. So, what you are REALLY suggesting is that the compiler expose this magic value as a field and then treat it in a very special way.

    Incidentally, I believe I agree with Jolyon here. The issue is actually the debugger, not the compiler. The debugger is looking for class information from the container, not the objects it contains. You could suggest that the debugger have a mode to take type information from the instance directly – but it would have to not be the default behaviour as that would defeat the point of polymorphism (wanting to treat everything as TAncestor, instead of TDescendant1 or TDescendant2, as it provides a fixed API for the object. The same API the code is compiled against. Showing the actual instance may actually HIDE important details of what the code is actually doing).

    Additionally, is it possible your problem of not being able to access local variables is actually the compiler optimizer removing the Self variable to save space? I run into this frequently myself – it can indeed be a huge pain, but that is what debug loggers are for, or even OutputDebugString (or, you could turn off compiler optimization, but unless you are leaving it off for the final product, this may be unwise for debugging as it changes a great many things).

  10. Mason Wheeler says:

    @Jolyon: Yes, making the debugger treat all objects as their actual type rather than their declared type would solve the problem, but it would probably be harder to implement and it might cause other problems, as Xepol pointed out.

    As for what sort of debugger hackery is required to get at an object in this situation, I could make an entire blog post about that! What would you do if you wanted to figure out what actual type MyList[3] is, and you’re caught at an exception so typing MyList[3].ClassType into Evaluate/Modify didn’t work? It can be done, but it’s not pretty.

    @Xepol: No, I’m well-aware of the optimization issues, and I always turn the Optimization setting off for debug builds. This is about exceptions, not stack optimization.

    And no, I’m not saying that the compiler needs to expose the magic field and treat it specially. I’m saying that it needs to *stop* treating it specially by exposing it as a normal field and treating it normally. The Monitor field needs to be treated specially because of the way class inheritance works, but the ClassType field does not, and should not IMO.

  11. Xepol says:

    I still posit that putting the ONE piece of information that tells you how to deal with data at an arbitrary point in that data is a very poor choice. It at least has to be at a fixed position to keep the code simple. If you have to know what the data is to know where the marker for what type of data it is, then you have pretty much created an impossible solution. As such, if you make it a property, you have to be 100% certain where it will occur. It takes compiler magic to know that THAT specific field will always occur at position X in all the various forms of the data. The end result is pretty much the same as what we have now, regardless of how it’s implemented – so what real advantage is there to changing something that already works?

    When I do have to know the exact class type of something in the debugger, I ensure that the optimizer does not compile out ClassName, and then I can easily see what to use as my type-cast in the debugger (though an optional “be magic and figure out the right class on this object” mode that uses that instead would be a handy feature in the debugger).

  12. Maxim says:

    I believe that the reason not to include the VMT as a visible field is that fields in Delphi cannot be made readonly. So “publishing” the VMT field would become a disaster. But why didn’t they make it a property (which can be write-protected)?
    And of course it’s more about debugger which looks at declared but not actual type of a variable/field.

  13. Hallvard Vassbotn says:

    The solution would be to enhance the debugger. I logged a request for this in 2007 – only 1 vote so far…:

    http://qc.embarcadero.com/wc/qcmain.aspx?d=38661

  14. delphishaman says:

    Well, I strongly disagree with the proposal. To change this, a part of compiler magic would have to be changed.
    IMO, it is a good thing that “magic” behind it is hidden, since it can change at any time and it’s tightly bound to compiler magic.
    If it was a property reading from a field, it would encourage programmers to use it as such.

    Otoh, you can declare such thing in TCodegenObject:
    property ClassType: TCodegenObjectClass read GetClassType;

    function TCodegenObject.GetClassType:TCodegenObjectClass;
    begin
    result := TCodegenObjectClass(inherited ClassType);
    end;

    If I’m not mistaken, it will appear properly in runtime inspector. I strongly suggest to name the property differently as hiding ancestor’s member is ugly programming and can lead to some hard-to-trace errors.

    Think about changing the core only when there’s no other option.

    However, I do like #38661.

  15. François says:

    @Hallvard. See, things can get better: 12 votes now…