Native LINQ under active development?

I got a very interesting response to my last post, about missing features in extended RTTI.  Barry Kelly wrote in a comment,

There wasn’t any increase in [RTTI] coverage for XE because other work had priority (64-bit, x-plat, a front end that could support LINQ, more things that I can’t talk about).

None of this actually showed up in XE.  The cross-platform work was supposed to, right up until a couple months ago, but they deferred it because it wasn’t ready yet.  And the 64-bit work’s been promised but not delivered yet for a long time.  (Now they say it’ll be in XE2.  I sure hope so!)  But… LINQ support in the compiler?  Under active development and not a “more things I can’t talk about”?  Wow, when did this happen?

The last time I can remember Barry Kelly discussing even the possibility of LINQ in Delphi was two years ago on The Podcast at Delphi.org.  A quick Google search doesn’t reveal anything on the subject on his blog since 2008.  So I went back and listened to that podcast again.  The stuff about LINQ starts at around minute 31, where he leads into it by talking about another new feature he was working on at the time: anonymous methods.  (And a lot of the hits on his blog when I searched for LINQ were him explaining in comments how support for anonymous methods leads up to being able to use LINQ.)

Some of the interesting points from the podcast:

“The syntax we have for the first iteration of [anonymous methods] in Delphi is a little more heavyweight than the Lambda equivalent in C# 3.0.”
“If we were to try to implement the exact syntax [of LINQ like it is in C#]… that requires a kind of type inference that our compiler isn’t set up to do. … Implementing all that is extra work over and beyond actually implementing closures.”

Here we have two concerns about LINQ:  The “first iteration of” anonymous methods are too bulky to be used that way, and there’s no type inference in the Delphi compiler.  Well, I definitely agree with the first point.  As useful as closures are, and I’m not going to say they aren’t, their current “first-iteration” syntax is ugly, bulky, and, frankly, not very Pascal-ish.

We’ve got a rule in Pascal that complex types passed to function parameters match on names, not on appearance.  Array parameters are a good example. In another part of that same podcast, Barry mentioned that he doesn’t like the confusion that this rule causes with array parameters, and he’d like to clean it up if he could without breaking backwards compatibility.  I’d be just fine with that, but anonymous methods take things to the opposite extreme, to the detriment of the language.

Let’s say I want to sort a generic list with an anonymous method as a comparer.  Which is easier to read?

MyList.Sort(
  function(const Left, Right: TMyItem): integer
  begin
    result := Left.ID - Right.ID;
  end);

or something like this?

MyList.Sort(TComparison<TMyItem>(result := Left.ID - Right.ID));

The function type name stands in for the function signature, and the parentheses replace the begin and end, and all that’s left is the actual code. Now, obviously, the second version wouldn’t work if you’re writing a big, long, complex closure like the one found in the LazyLoadAttributes function, deep inside of RTTI.pas. And the first-iteration syntax could still be used for that, but the condensed version is ideal for the sort of simple, “one line filter” style functions that are so common in LINQ queries, and IMO it’s a lot easier to read than C# style lambdas, which are also ugly and not very Pascal-ish.  So hopefully we’ll end up with a new anonymous method syntax that looks something like what I described above, and the LINQ part of the language could translate the SQL-style query syntax into these anonymous methods behind the scenes, creating the types for the correct function signatures if necessary.

As for the need for type inference… that’s something I’ve never really been able to wrap my head around.  It just seems like a solution in search of a problem, and then the problem is ever-so-conveniently provided by the C# language, which, being both a C descendant and a Microsoft product, could be reasonably expected to be full of unnecessary problems.

The example he mentioned in the podcast is a select lambda where you’re selecting a certain property from an object.  Something like this:

procedure DoQuery(list: TList<TMyObject>);
begin
  var names := from item in list where item.id >= 50 select item.name;
  ...
end;

Why does the compiler need to infer any types here?  We know what type “item” is because we know what type list is and we know what type comes out of list’s enumerator.  (Presumably TMyObject.)  The compiler has this information and doesn’t need to spend a bunch of time jumping insanely back and forth throughout the parse tree to “infer” it from anywhere.  Once we know what type item is, it’s not hard to know what type item.name is.  There are only two places where this code can’t be parsed and typed left-to-right.  One is “from item in list”, which doesn’t really present a problem figuring out what type item is if the parser reads the from clause as a single expression.

The other, however, is the completely un-Pascal-ish untyped “var names :=” statement at the beginning of the line.  That’s not how we do variable declaration, and I really don’t understand why the RemObjects folks decided to use that syntax in Prism.  It looks like it was lifted directly out of C#, which it probably was.  But it’s ugly in C# and it’s even uglier in Object Pascal.  There are good reasons why we declare variables with their types at the top of the routine, and one of them is to keep from breaking or cluttering up the compiler with stupid unnecessary crap like a type inference parser that has to bounce all over the abstract syntax tree before it can figure out what it’s dealing with.

Am I missing something here?  Is there some powerful technique that just won’t work without the type inference?  Because I’ve never seen any LINQ example that demonstrates that it’s actually necessary or that it improves code readability at all.  And if it’s harder for both the compiler and the coder to read, why bother putting it in?

One of Delphi’s strong points has always been that it gets the details right.  C++ is object-oriented programming done all wrong, but Delphi gets the details right and makes writing good code much easier.  This seems like a good opportunity to continue the tradition.  Don’t clutter up the compiler and the language syntax with ugly, hard to read stuff like inline-declared type-inferred vars and cryptic lambda expressions just because Microsoft (and RemObjects) did it that way.  If the Delphi compiler team is working on native LINQ, I’d really love to see them get the details right and give us something that really looks like it belongs in Object Pascal.

25 Comments

  1. David Champion says:

    > being both a C descendant and a Microsoft product, could be reasonably expected to be full of unnecessary problems.

    love it.

  2. Stefan says:

    You might look at my post at Delphi Praxis where I experimented with custom enumerators using interfaces to make delayed execution of the enumeration itself possible.
    It is in German, but the code examples speak for themselves. 🙂
    http://www.delphipraxis.net/154706-custom-enumerator.html
    I am still improving it and might move it to googlecode at some point – still looking for some fancy name 🙂

  3. Much of C#’s syntactic sugar depends on GC.
    But because Delphi has no GC, memory leaks would be possible, as they already are with reference-counted closures (cycles cannot be resolved).
    So writing closures, and maybe a future Delphi LINQ, demands that the programmer fully understand what the compiler is doing. I don’t know whether that is good or bad.
    But it is very interesting for compiler writers to try to solve these problems.
    Just good luck.

  4. Francis Ruiz says:

    I think LINQ has been cooking for a while. Because closures, enumerables and the new RTTI are very important parts of LINQ.

    @Stefan very beautiful and useful piece of code.

  5. You may find useful my implementation of coroutines

    C# Yield implementation in Delphi.

    http://santonov.blogspot.com/2007/10/yield-you.html

    with SEH unwinding

    http://santonov.blogspot.com/2007/10/seh-dynamic-unwinding-with-auto.html

  6. Barry Kelly says:

    First, I need to make something clear: I am not promising (and cannot promise) LINQ for any specific release. What I said, specifically, was working on a front end that *could* support LINQ. My statement about front end work comes (roughly) under the scope of http://edn.embarcadero.com/article/39174 .

    Next, an incidental: Left.ID - Right.ID will not work in cases of large negative Left.ID and large positive Right.ID because of overflow; try -MaxInt for Left.ID and 2 for Right.ID. Getting this right normally makes the code spread over more than one line.
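As a minimal sketch of an overflow-safe comparer for the sort example in the post: the version below compares the IDs with CompareValue from the Math unit instead of subtracting them. TMyItem and its ID field are hypothetical stand-ins for the post's type, and the comparison is wrapped in TComparer<T>.Construct because TList<T>.Sort takes an IComparer<T>. Unit names are given unscoped, as in Delphi XE; this is an illustration under those assumptions, not the only way to do it.

```pascal
program SafeCompare;
{$APPTYPE CONSOLE}

uses
  SysUtils, Math, Generics.Defaults, Generics.Collections;

type
  // Hypothetical item type standing in for the post's TMyItem.
  TMyItem = class
  public
    ID: Integer;
    constructor Create(AID: Integer);
  end;

constructor TMyItem.Create(AID: Integer);
begin
  inherited Create;
  ID := AID;
end;

procedure RunDemo;
var
  MyList: TObjectList<TMyItem>;
  Item: TMyItem;
begin
  MyList := TObjectList<TMyItem>.Create; // owns and frees its items
  try
    MyList.Add(TMyItem.Create(2));
    MyList.Add(TMyItem.Create(-MaxInt));
    MyList.Add(TMyItem.Create(0));

    MyList.Sort(TComparer<TMyItem>.Construct(
      function(const Left, Right: TMyItem): Integer
      begin
        // CompareValue compares rather than subtracts, so it cannot
        // overflow the way Left.ID - Right.ID does for -MaxInt and 2.
        Result := CompareValue(Left.ID, Right.ID);
      end));

    // Prints the IDs in ascending order.
    for Item in MyList do
      Writeln(Item.ID);
  finally
    MyList.Free;
  end;
end;

begin
  RunDemo;
end.
```

With the subtracting comparer, -MaxInt - 2 falls outside the Integer range, so the result either wraps around to a positive number or raises an overflow error, depending on whether overflow checking is enabled.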

    Now, the reason type inference is needed is because type assignment in the compiler currently works from the bottom up, composing types from fragments of expressions, and then working out the types according to language rules. So, given var a, b: Integer, a is of type integer, b is of type integer, and the expression “a / b” is of type Extended. See http://pastebin.com/6vRFkPeH for a sample program showing this.
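The bottom-up rule described here can be seen in a very small console sketch. This is separate from the linked pastebin sample, and the variable names and values are purely illustrative: the type of each expression is composed from the types of its operands, with nothing flowing down from the surrounding context.

```pascal
program BottomUpTyping;
{$APPTYPE CONSOLE}

var
  a, b: Integer;
  q: Extended;
begin
  a := 7;
  b := 2;

  // "/" applied to two Integer operands is typed as floating point,
  // worked out purely from the operand types and the language rules.
  q := a / b;
  Writeln(q:0:2);    // prints 3.50

  // "div" applied to the same operands is typed as an integer.
  Writeln(a div b);  // prints 3
end.
```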

    That stuff building up from atoms works well, but for LINQ, and for lambdas without type annotations, it doesn’t work. Your LINQ example, if the Delphi compiler took the same approach as C#, would first be transformed into something like this:

    var names := list.Where(item -> item.id >= 50).Select(item -> item.name)

    Note that the functions Where() and Select() may be overloaded. In C#, the compiler looks for them on the static type of the receiver, and if it doesn’t find them there, it looks for an extension method; and for many of the LINQ methods, there are different variants, overloaded. .NET’s Select and Where extension methods in the Enumerable class both have two overloads. In order to resolve an overload, we need to know the argument types, so we can compare them with the parameter types. So we need to parse things like ‘item -> item.id >= 50’ and ‘item -> item.name’ and understand what type they are.

    So, given ‘item -> item.id >= 50’ in isolation, how do you know what type it is? What’s the type of that fragment? Trying to build it up using the traditional rules doesn’t really work. Sure, you can say that ‘50’ is some kind of integer, but you don’t know the type of item.id. The >= operator returns a boolean, so you know that the lambda as a whole returns a boolean; but you’re no wiser as to the type of the parameter, other than it has a member called ‘id’, and that the type of ‘id’ has a (possibly user-defined) >= operator whose right hand side is convertible from integer (i.e. it might be a user-defined type with an Implicit conversion operator).

    That’s really far too vague to form a concrete type that can be used for overload resolution. So instead, the set of overloads needs to be considered when trying to infer the type of the lambda. But that parameter position may have 2, or 20, or 200 different parameter types for that argument position, and they may be generic with type parameter inference that’s not trivial. By trivial, I mean something like the C.P method in that pastebin link; but it gets a whole lot worse when it’s only a little more complicated, like Select<T>(pred: TFunc<T,Boolean>).

    Moreover, this approach means information is flowing in the “wrong” direction. Normally, type assignment works from the bottom up of expressions; but here, we need to pass some information down, and the type assignment is context dependent.

    The current Delphi front end is not set up for this kind of type inference, and cannot be easily changed to do it. Something more dramatic is needed.
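For contrast, here is a rough sketch of the chained Where/Select form written with the anonymous-method syntax Delphi already has, where every type is spelled out by hand. TQuery, its Where and Select methods, the TMyObject fields, and the sample data are all hypothetical, introduced only for illustration; they are not RTL APIs. The explicit type arguments and the fully typed anonymous methods mark exactly the information a lambda such as ‘item -> item.id >= 50’ would leave for the compiler to infer.

```pascal
program ExplicitLambdas;
{$APPTYPE CONSOLE}

uses
  SysUtils, Generics.Collections;

type
  // Hypothetical stand-in for the post's TMyObject.
  TMyObject = class
  public
    ID: Integer;
    Name: string;
    constructor Create(AID: Integer; const AName: string);
  end;

  // Hypothetical helper: Where and Select as generic class methods.
  TQuery = class
  public
    class function Where<T>(Source: TEnumerable<T>;
      const Pred: TFunc<T, Boolean>): TList<T>; static;
    class function Select<T, R>(Source: TEnumerable<T>;
      const Proj: TFunc<T, R>): TList<R>; static;
  end;

constructor TMyObject.Create(AID: Integer; const AName: string);
begin
  inherited Create;
  ID := AID;
  Name := AName;
end;

class function TQuery.Where<T>(Source: TEnumerable<T>;
  const Pred: TFunc<T, Boolean>): TList<T>;
var
  Item: T;
begin
  Result := TList<T>.Create;
  for Item in Source do
    if Pred(Item) then
      Result.Add(Item);
end;

class function TQuery.Select<T, R>(Source: TEnumerable<T>;
  const Proj: TFunc<T, R>): TList<R>;
var
  Item: T;
begin
  Result := TList<R>.Create;
  for Item in Source do
    Result.Add(Proj(Item));
end;

procedure RunDemo;
var
  List: TObjectList<TMyObject>;
  Filtered: TList<TMyObject>;
  Names: TList<string>;
  S: string;
begin
  Filtered := nil;
  Names := nil;
  List := TObjectList<TMyObject>.Create; // owns the objects
  try
    List.Add(TMyObject.Create(10, 'low'));
    List.Add(TMyObject.Create(50, 'mid'));
    List.Add(TMyObject.Create(75, 'high'));

    // The type arguments and the parameter/result types are all explicit.
    Filtered := TQuery.Where<TMyObject>(List,
      function(Item: TMyObject): Boolean
      begin
        Result := Item.ID >= 50;
      end);
    Names := TQuery.Select<TMyObject, string>(Filtered,
      function(Item: TMyObject): string
      begin
        Result := Item.Name;
      end);

    for S in Names do
      Writeln(S); // prints mid, then high
  finally
    Names.Free;
    Filtered.Free; // does not own the objects; List does
    List.Free;
  end;
end;

begin
  RunDemo;
end.
```

Nothing here needs inference because the programmer has done the compiler's work for it; dropping the annotations is what forces type information to flow from the Where and Select signatures down into the lambda bodies.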

  7. Barry Kelly says:

    And by the way, “cryptic lambdas”, as you call them, are a far better solution to the problem domain in LINQ than an ad-hoc approach that dealt directly with LINQ-style sorta-SQL expressions, rather than transforming them into the chained method, monadic approach.

    The reason RemObjects used ‘var’, for one is that in LINQ, the type returned by the LINQ expression may not have a name in the case of a Select which creates an anonymous type. But it’s also good practice to declare variables as late as possible, preferably only when you have a value to put into them: it completely eliminates the problem of using a variable before it is definitely assigned. Better again to go a step further, like Scala, and have a ‘val’ syntax which lets you create named expressions (i.e. read-only “variables” that get their values exactly once, upon assignment). I’m not a fan of Pascal’s penchant for declaring all the variables up front; it’s a very imperative, bit-fiddling way of thinking about the world, in terms of slots that you mutate, rather than values you calculate from.

  8. Morwath says:

    It looks to me that LINQ is mostly a feature for nostalgics of dBase/Clipper/FoxPro

  9. Yogi Yang says:

    I have found that in the past one of the Partners of CodeGear (I don’t remember the name at present) had developed a half-baked LINQ-like lib for Delphi, but this was way before MS came out with one. Have you ever tried that?

    Currently they have removed it from their web site.

    If you want let me know I will try to find it in my IT Attic 🙂

    HTH

  10. Mason Wheeler says:

    @Barry:
    First off, my apologies that your first post didn’t show up. (And yours as well, Sergey.) They got caught in WordPress’s “hold for moderation” trap, apparently because they contain links, and it never emailed me about them so I didn’t notice until just now. Your second message makes more sense now. I looked at it and thought, “this looks like part 2 of something, but where’s part 1?!?”

    So, to part 1: My LINQ example, if the Delphi compiler took the same approach as the C# compiler, would end up badly broken and require type inference because of the way the lambdas work. So, OK, is there any pressing need to take the same approach as the C# compiler? Maybe I’m oversimplifying things a little, but here’s how I would parse this:

    “List.where? OK, Where<T> is a generic extension method to IEnumerable. So I need to know the type. Look up the enumerator on list, hmm, here it is, and Current returns a TMyObject. So create a template for an anonymous method of type TFilter<TMyObject> and try to parse “item.id >= 50” against that to make sure it fits. Well, looks all right to me, TMyObject has a member named id of type integer. We’re good to go. Now do we have any Where<T> overloads that will accept an anonymous method that looks like this? Yep, we sure do. Everything looks good here. Next!”

    That’s one of the beautiful things about LINQ: it’s written in a logical order that can be easily read left-to-right. (As opposed to SQL, which can be read neither left-to-right nor right-to-left, but has to be read from the FROM clause, in the middle of the query, first in order to make sense of it.) If I can read it that way, is there any particular reason why the compiler can’t? Sure, you might need a separate “LINQ parser”, but it’s easy enough to switch over to that when it finds the keyword from, and that’s a heck of a lot more straightforward than all the type inference you were talking about on the podcast.

    To part 2: That’s interesting, but really, what good is an anonymous class? That’s another part of this I’ve never quite understood. You can’t call methods on it, because it doesn’t have any. You can’t pass it to anything, (except maybe a function that accepts a TObject,) because it doesn’t have a declared type. Sure, it holds data, but its usefulness is sharply limited when you can’t hand it off to anything outside the scope of the current function in order to use that data, at least not without a heavy dose of RTTI.

    And just as a side note, I’m with John Hughes on the subject of immutable variables. From his paper, “Why functional programming matters”:

    “The functional programmer sounds rather like a medieval monk, denying himself the pleasures of life in the hope that it will make him virtuous. To those more interested in material benefits, these “advantages” are not very convincing. Functional programmers argue that there are great material benefits … [but] this is plainly ridiculous. If omitting assignment statements brought such enormous benefits then FORTRAN programmers would have been doing it for twenty years. It is a logical impossibility to make a language more powerful by omitting features, no matter how bad they may be.”

  11. Barry Kelly says:

    Special-casing things by looking for an IEnumerable<T> implementation, the type of its Current property, etc., is pretty hacky – you must see that. It would rule out using lambdas in third-party libraries for other scenarios. Language features have to have a lot of punch to justify themselves, and every time you build knowledge into the compiler about expected RTL symbols, you make the whole thing more brittle.

    But it’s worse than that. Where<T> is not necessarily an extension method to IEnumerable. It may be user-supplied. If you want it to work in user-defined contexts, you have to do what you describe – try and infer the type argument, fill out the template, as you put it, and instantiate the TFilter<T> equivalent, and then try to type the lambda. But what you describe is type inference! (An ad-hoc type inference, but type inference all the same.) That procedure is, like, 5x more complicated than normal type assignment. And the front end is not set up for it. The Where<T> case is also one of the most simple situations.

    Have a look at Enumerable.SelectMany<TSource, TCollection, TResult>(IEnumerable<TSource> source, Func<TSource, IEnumerable<TCollection>> collectionSelector, Func<TSource, TCollection, TResult> resultSelector). When parsing, you might know the type of TSource from your technique. But that doesn’t help you enough; you also need to know the type of TCollection to know the type of the second argument to resultSelector. The compiler needs to be able to partially instantiate SelectMany with only a single type argument, leaving the others unknown. It then needs to parse the second argument collectionSelector knowing only the argument type and not the return type, so it doesn’t know the type of the implicit Result variable. It has to figure out the type of that Result variable, so that it can go back up to the top level and re-instantiate SelectMany using the extra type parameter it found out, TCollection. What’s more, it’s not simply the result; it’s actually buried inside the instantiation of IEnumerable<T>, which is quite a leap of inference (compilers, like most programs, are pretty dumb). Only then can it type the third argument, resultSelector, and only after it’s done that can it properly instantiate the whole of SelectMany. Here, we’re approaching complexity that’s maybe 10..20x more complicated than the simple type assignment in the compiler. There’s hardly a single place that isn’t affected, because the compiler is forced to parse code without typing it, because the return type of the lambda isn’t known.

    And SelectMany is overloaded. There could be other combinations of arguments which are potentially correct in different scenarios, only to be found out to be incorrect when parsing the last argument.

    The reason Select is used with anonymous classes, quite apart from only wanting a couple of columns out of their ad-hoc query (which will only be foreach-ed over), is that people want their SQL not to blindly ‘SELECT *’, but be more precise. There’s an isomorphism (defined by the LINQ provider) between the AST that the compiler creates for LINQ expressions and the SQL generated behind the scenes when querying a real database. In order to avoid issuing a ‘SELECT *’ from the DB (and making your indexes a whole lot less efficient), you need a way of limiting the number of columns touched.

    The functional programmer avoids certain features because it makes programs more understandable. There are a lot of programs in the world which would be significantly better if their languages didn’t include global variables as a feature. Too many programmers are tempted to take shortcuts that harm good structure. If you don’t accept at least this, then there’s little point arguing it; it’s something you need to experience, through the pain of maintenance. Languages do get more powerful, in practice, when removing features, because programmers have fewer avenues to shoot their own feet, or even better, the feet of programmers that come 20 years later. Java, Ruby, Python etc. are all more powerful for making memory corruption impossible through memory safety, by not including certain type unsound operations. And his justification – alluding to FORTRAN – is nonsense, since it can be used to justify the irrelevance of any programming language feature that’s not included in FORTRAN. Object orientation is useless because FORTRAN programmers haven’t been doing it since 1970 (20 years before that paper was written). Ditto RTTI, ditto generics, ditto LINQ, etc., all useless by that metric.

  12. Barry Kelly says:

    BTW, the video you need to watch is linked from http://blogs.msdn.com/b/ericlippert/archive/2006/11/17/a-face-made-for-email-part-three.aspx , but unfortunately, the link is dead.

  13. Mason Wheeler says:

    I don’t really mean to “special-case” IEnumerable<T>. That was just an example. We’ve already got an existing special case in the compiler: anything you’re going to enumerate over must have a GetEnumerator method that returns a data structure that conforms to a certain public interface. (Unless it’s an array or a set, in which case other rules apply.) But my point is that the compiler already has all the special-case rules in place for determining the enumerated-type of anything you can apply a for-in loop to. Therefore, no new special case code is necessary for determining this for LINQ, since all LINQ queries revolve around enumerators and you can already determine the type of enumerations in the general case.

    Having said that, though, the SelectMany example is more persuasive. I think I get it now. Thanks for explaining it. 🙂

    WRT the anonymous classes, I can understand why you’d want to not have to use SELECT *, but I still don’t see the link between that and the need for anonymous classes that you seem to be taking for granted here. If I was the one designing the system, the rule would be that if, for example, you wanted to SELECT three specific fields, then the return type, which would have to be declared up front, would need to be an existing class that contains (at least) three writable properties of the appropriate name and type and has a parameterless constructor available, and then someone (either the compiler or the LINQ provider) would be responsible for creating the items and stuffing the values into them.

    Not only does this do away with the need for extra complexity to handle anonymous classes, but it makes it far easier for the coder to deal with, since he can pass it around to other functions, call methods on the class, design the class with Set methods on the properties that set up other things as well, etc. This also makes it possible to break the code down into smaller pieces, by separating the querying code from the code that uses the query result. With an anonymous class, as you mentioned, you’re forced to keep the two together by putting the LINQ query inside a for-in loop.

    And while I can understand that some features may be generally “considered harmful,” that’s an education issue more than anything. You use globals as an example, and say a lot of programs would be better off without them in the language. Sometimes they’re needed for very fundamental concepts. Where would most Delphi programs be without globals in the language? (Hint: what percentage of Delphi programs out there don’t use Application?) Backwards compatibility issues aside, how would you build a VCL in a global-less Object Pascal dialect?

    As for removing things that let you shoot yourself in the foot to make the language “more powerful,” I don’t agree with that definition. Any bedrock abstraction that exists that the programmer cannot get beneath when necessary makes the language less powerful, because it prevents you from doing things that exist at a lower level than the base abstraction. For example, I would never try to build a game engine in any language that doesn’t provide access to inline assembly and pointer math. Languages without these “dangerous” features are crippled, because removing them removes the ability to do some things you just can’t get at any other way. That’s part of what makes Delphi so wonderful to work with. It doesn’t take any of these things away; it just gets enough little details right that you usually don’t have any need for them–but when you do, they’re there for you.

    And John Hughes wasn’t using FORTRAN to disqualify anything that came later; he was using it because he wrote this in 1984 and FORTRAN was still the big imperative language back then. I cut a lot of the quote out to keep the reply size down, and I may have cut a bit too much, but you could replace “FORTRAN” in that statement with “C,” “C++,” or “Delphi” without any loss of significance. The point he was making was that removing useful features–specifically, assignment and mutability–necessarily makes languages less powerful, not more. (And remember that this is in a pro-FP paper! He goes on to explain what the real benefits of functional programming are, and I tend to agree with him that what really matters is the ability to break down a program into smaller parts.)

  14. eric says:

    Java, Ruby, Python as powerful languages? Somehow, that doesn’t compute; all three are more in the realm of extended scripting than 1st-class languages (sorry Java users). Apart from maybe Java, the other two are pretty much restricted to specialty fields, and arguably Java, though with broader uses, is limited in most scenarios to the point of requiring libraries written in other languages.
    Though it probably all boils down to what one considers “powerful”, what is the metric?
    Also, while I agree that declaring variables as late as possible is a plus, type inference, or massively overloaded functions for that matter, are the modern-day source of dynamically typed horrors like you could encounter in early-days BASIC. The code looks shorter and more generic, sure, but the opportunities for adding carrots and apples and displaying a result in kilometers abound, if you get my meaning, and cause more trouble than they’re worth. In a way, type inference and overloads, when relied on heavily, even if they are compile-time checked, don’t offer more guarantees than runtime type checks, which are a major real-world reliability issue (major PITA with some libraries in C# and Java for “regular” devs)

  15. Morwath says:

    IMHO some languages are called “powerful” lately referring to the hardware they have to run on.

  16. John says:

    Modern “powerful” languages need a powerful debugger in a powerful IDE running on a powerful machine, otherwise you can’t fix the powerful bugs you write with them.

  17. Delfi Phan says:

    LINQ met with quite some opposition in Anders Hejlsberg’s own team, primarily because you end up with a language within a language (it was on a video, where only Anders could be heard clearly).

    Anonymous methods are rarely-used constructs (mainly sorting and filtering) and because it’s rare, every time I encounter them in code, I do a double-take and have to re-read to realize what’s going on.

    In the past I used function types to achieve the same thing. Yes, more typing. But as I said, it’s relatively rare, so no big deal.

    These things are all pretty exotic. I wish more thought would be given to things that make a programmer’s day-to-day life easier, like a better WITH implementation. The Borland QC had several postings with good ideas. Don’t know if they are still there…

    But anyway: good information in this posting, thanks! Right-click: Save.

  18. @Mason
    Please look at the presentation on Mirah, a Ruby-inspired language. Ruby is a dynamic language, and types are defined ad hoc. Mirah, by contrast, is a statically typed language, and it requires you to write out the types only where they are needed.

    LINQ itself is really very hard to implement for the very same reason: functional programming mostly does not work on types, but static languages (such as Delphi) do. And the expressions would have to be evaluated in a second parsing pass, once the types have been completed based on the whole program’s namespaces.
    The problem, as far as I understand how Delphi works, is that even though var t := [functionCall|expression] is a very easy case of type inference, solving the same issues across a global namespace is not such an easy matter.
    LINQ expressions are converted to static functions and closures based on extension methods. And even if type inference is solved (which, if the codebase is not “prepared” for it, is not an easy task in itself), the extra code generation by the compiler is also not such an easy task.

  19. too says:

    “There are good reasons why we declare variables with their types at the top of the routine, and one of them is to keep from breaking or cluttering up the compiler with stupid unnecessary crap like a type inference parser that has to bounce all over the abstract syntax tree before it can figure out what it’s dealing with.”

    To be honest, this is a major pitfall in Delphi; the ability to declare variables inline actually unclutters the code on many occasions (the for loop in C/C++/C# is a good example) and makes writing and analyzing code faster, as the type is near the analyzed scope, not somewhere far up. For that, in Delphi, only Ctrl + Shift + V helps.

    From the compiler’s perspective, scoped variables can be moved to registers more easily during compilation for performance purposes, as the scope of a variable is already bound to its declaration; there is no need to traverse the variable reference tree.

    Finally, a programming language is for humans, not for compilers, and the problem arises when the requirements of the two do not exactly match. Language flexibility and simplicity serve the programmer, so they are desirable regardless of compilation performance, which on the other hand could finally use more than one CPU core in Delphi.

  20. Too: I’d have to disagree with that. Having a separate area for your variables and for your code definitely makes the code easier to read, since you know exactly where to look to find both. If it makes your code harder to read because you have to scroll or page up to find your variables, that’s a sign that your routine is growing too big and needs to be refactored, not a sign that the language is doing something wrong.

  21. too says:

    Mason: then it looks like it is rather a matter of preference. Definition of order and elegance in programming is rather fuzzy with respect to grammar and semantics, not design patterns.

    Additionally, not having inline variable declarations allows only one way of programming, whereas with inline declarations available you can write either way, regardless of which programming style is personally perceived as the valid one (or what the current “fashion” is).

    Delphi is known for being very strict, so when there is a chance of loosening it a little bit, I would go for it. It will give possibilities to people who know how to use it, at the cost of the possible side effect of others misusing it.

  22. Ken Knopfli says:

    Refactoring because a routine is “too long” introduces arbitrary breaks in the code. IBM once had a similarly arbitrary rule that procedures may not exceed 25 lines.

    If a routine truly is linear, keeping it that way helps readability.

    But if a section is called multiple times, or a section is repeated in another routine, THEN refactor.

    As to putting declarations in a header, that had two reasons, no longer valid in the 21st century:

    1. In an era when computers were slow, compilers needed help from the programmer. Having declarations before implementation gave the compiler a much needed helping hand and was one reason why Turbo Pascal was so fast.

    2. Editing back then was done with text editors. Modern IDEs (should) have tools to assist developers, such as hovering the mouse over a variable and getting all the info in a popup regardless of where it was declared. And jumping to the declaration if needed is also IDE assisted.

    As things stand, I am constantly jumping up and down as new variables are introduced to the code.

    Having some experience with C#, I am finding it a major PITA going back to Delphi. Pascal was a teaching language from a bygone age and it shows.

    Unfortunately, modernizing it risks breaking old code. Perhaps the move to 64-bit will be an opportunity to bring Delphi Pascal up-to-date.

  23. Wouter says:

    Barry, interesting link about type inference..
    But what’s that Turbo Pascal box doing in the background? 🙂 (at exactly 25:00 minutes)

    http://wm.microsoft.com/ms/msdn/visualcsharp/eric_lippert_2006_11/EricLippert01.wmv

  24. […] is going on with Delphi, and his blog always makes for an interesting and educational read.  This post is no exception, in which he does a little digging and discusses some things he’s heard and how it might be […]

  25. codeelegance says:

    Ken: On the contrary. A routine’s length is a primary reason to refactor. I would agree that setting a hard limit is pointless but the way I see it IBM’s 25 lines rule was too lax. The functions I write rarely exceed 3 or 4 lines of code. Comprehension is king. The smaller a function is the more obvious its purpose is. To quote “Uncle Bob” Martin’s book, Clean Code – “The first rule of functions is that they should be small. The second rule of functions is that they should be smaller than that.”

    I find it humorous to see mammoth 3000 line functions written in a language invented to promote good programming practices, one of which was stepwise refinement. Let’s face it. This is exactly what “Extract method” is. Niklaus Wirth was the father of refactoring.

    I regularly work with both Delphi and C# and the only thing that trips me up is assignments and equality comparisons (which Delphi’s compiler is better at catching than any of C’s descendants). One thing I do find inconvenient in Delphi is the inability to initialize a variable in its declaration.