Static Typing Still Matters

I ran across a very interesting story yesterday.  Apparently genetic researchers are having real trouble with their spreadsheets: Excel interprets some of their data as numbers or dates, converts it on the spot, and the original values are lost for good.  For example, the gene identifier “2310009E13” was converted to the floating-point number “2.31E+13,” and the tumor suppressor DEC1 [Deleted in Esophageal Cancer 1] was converted to the date ‘1-DEC.’
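To make the failure mode concrete, here is a toy Python sketch of “General”-style guessing.  It is not Excel’s actual algorithm: `guess_value`, the month table, and `ASSUMED_YEAR` are hypothetical stand-ins.  The only point it makes is that once text has been guessed into a number or a date, the original string is gone.

```python
import re
from datetime import date

# Arbitrary year for month/day values; a hypothetical choice, not Excel's rule.
ASSUMED_YEAR = 2000

MONTHS = {"JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
          "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12}

# Three letters followed by a one- or two-digit day, e.g. "DEC1".
DATE_LIKE = re.compile(r"([A-Za-z]{3})(\d{1,2})")


def guess_value(text):
    """Toy auto-detection: try a number, then a month/day date, else keep the text."""
    try:
        return float(text)  # "2310009E13" is valid scientific notation
    except ValueError:
        pass
    match = DATE_LIKE.fullmatch(text)
    if match and match.group(1).upper() in MONTHS:
        try:
            return date(ASSUMED_YEAR, MONTHS[match.group(1).upper()], int(match.group(2)))
        except ValueError:
            pass  # not a real calendar day, e.g. "FEB31"
    return text


for raw in ["2310009E13", "DEC1", "BRCA1"]:
    print(raw, "->", repr(guess_value(raw)))
```

Nothing in the result records what the user actually typed, which is why the damage can’t be undone after the fact.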

Reading the article, with its workarounds and its explanation that every one of them requires the researchers to stay vigilant and keep the arcane implementation details of the system in mind, I was reminded of the Gordian knot that is JavaScript’s type system.  It suffers from the same basic problems: implicit conversions everywhere, arcane and unintuitive rules both for the conversions and for working around them, and no straightforward way to define data types.

It brings to mind the floating point multiplication bug from a few years back, where again things got implicitly converted from one data type to another because the system had no type information to rely on.

Granted, this is Excel, and it would be beyond silly to force spreadsheet users to define every cell as “integer,” “float,” “date,” etc., but the point here is that there’s no good way to set a type definition when one would be useful.  Offering that option is entirely possible: DWS lets you declare types for your variables, Delphi-style, or rely on type inference instead.  Likewise, in Boo, types are inferred by default and declared explicitly when you choose to declare them.  But in Excel, there’s no simple way to say “these values are strings and nothing else.”  And before anyone says that Excel is not a programming language, the folks at Microsoft consider it to be one.

So remember, next time you hear someone talking about how horrible a “verbose” language like Delphi is compared to a “powerful, elegant” dynamic language, that all that syntax is there for a reason.  It carries meaning, both for the compiler and also for human beings who go to read the code, and when that meaning isn’t available, we’re left with a computer program trying to read the user’s mind, which never works well.

7 Comments

  1. Charlotte says:

    Not true about Excel. The researchers just didn’t know how to use it.

    If you’re entering data, you can force it to be regarded as a string by prefixing it with a single quote. Or you could select the column and set the cell format to string rather than general.

    If you’re importing data, you can declare column by column whether each data type is general (auto-detect) or a more specific type.
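    The column-by-column import Charlotte describes amounts to declaring a type where it matters and leaving auto-detection everywhere else. A minimal Python sketch, assuming a CSV source; `import_rows`, `auto_detect`, and the column names are hypothetical, and the leading-apostrophe rule only imitates the data-entry trick mentioned above:

```python
import csv
import io


def auto_detect(text):
    """'General'-style column: numbers become floats, everything else stays text."""
    if text.startswith("'"):
        return text[1:]  # a leading apostrophe forces text, as when entering data by hand
    try:
        return float(text)
    except ValueError:
        return text


def import_rows(source, column_types):
    """Read CSV rows, applying a declared converter per column or auto-detection."""
    rows = []
    for record in csv.DictReader(source):
        rows.append({name: column_types.get(name, auto_detect)(text)
                     for name, text in record.items()})
    return rows


# Hypothetical declarations: gene_id is always text, expression is always a number;
# columns not listed here (such as "notes") fall back to auto-detection.
COLUMN_TYPES = {"gene_id": str, "expression": float}

sample = io.StringIO("gene_id,expression,notes\n"
                     "2310009E13,0.75,42\n"
                     "DEC1,1.5,'42\n")
rows = import_rows(sample, COLUMN_TYPES)
print(rows)
```

    Without the declaration for gene_id, the first row’s identifier would come back from auto_detect as a float, just as in the spreadsheet.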

  2. No One says:

    Mason, your terminology/premises and hence conclusion are incorrect. The issue at hand is one of weak typing and strong typing. In weak typing, implicit conversions are used; e.g. “1” + 1 evaluating to 2. Strong typing does not allow mixing of types like this. In static typing the type of a variable is fixed; in dynamic typing it is not. Hence, you argue against weak typing and conclude that the problem is dynamic typing, which is a completely different thing! For instance, PHP is a dynamic, weakly typed language. Python is a dynamic, strongly typed language.
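    The distinction is easy to see in Python, the commenter’s example of a dynamic, strongly typed language. In this small sketch (the helper name `try_add` is hypothetical), a name may be rebound to a value of another type, but mixing a string and an integer raises TypeError unless the conversion is written out:

```python
def try_add(a, b):
    """Return a + b, or the name of the exception if Python refuses the operation."""
    try:
        return a + b
    except TypeError as exc:
        return type(exc).__name__


# Dynamic typing: the type belongs to the value, so a name can be rebound freely.
value = 1
value = "one"

# Strong typing: no implicit conversion between str and int.
mixed = try_add("1", 1)

# Conversions have to be explicit, and the direction chosen decides the result.
as_number = int("1") + 1
as_text = "1" + str(1)

print(value, mixed, as_number, as_text)
```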

    >So remember, next time you hear someone talking about how horrible a “verbose” language like Delphi is compared to a “powerful, elegant” dynamic language, that all that syntax is there for a reason.

    Again, this article didn’t address dynamic vs. static typing at all, only weak vs. strong. The conclusion does not follow from the premises. You’re going to need another article to demonstrate a reason for static typing.

  3. Torbins says:

    No One:
    Please explain what you mean by “weak typing and strong typing”, because Wikipedia says: “In general, these terms do not have a precise definition.”

    • Mason Wheeler says:

      They have a very precise definition, actually. “Weak typing” means precisely “I don’t like this type system,” and “strong typing” means precisely “I do like this type system.” 😛

      But you’re right, No One’s comment doesn’t really address the point of what I was talking about here.

      • No One says:

        How doesn’t it address it? The subject is “Static Typing Still Matters”, and the evidence is an example of the pitfalls of weak typing, when the opposite of static typing is dynamic typing. Your conclusion referred to “a ‘powerful, elegant’ dynamic language”. That alleged power and elegance is the result of dynamic typing, not weak typing. They’re not the same thing.

        JavaScript and PHP are weakly typed: in PHP you can add the string “1” to the number 1 and get 2 (JavaScript converts the other way and produces the string “11”). This can cause major problems in PHP. Some poorly designed functions that should return strings return 0 on error instead; if code doesn’t check the return value and assumes the function worked because no error was raised, PHP will most likely implicitly cast that 0 to a string later on and carry on as if nothing had gone wrong.

        Ruby and Python are dynamic but, like Delphi, strongly typed: in either of them, adding an integer to a string raises an exception. The Excel example doesn’t justify static typing or make Delphi look better in comparison, because it addresses weak typing, not dynamic typing. It’s possible to make a case for static typing, but this Excel example isn’t it.
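        The sentinel-return problem described here can be sketched in Python. `lookup_name` and `greet` are hypothetical; `lookup_name` deliberately mimics a badly designed function that returns 0 instead of a string on failure. Where PHP’s `.` operator would quietly turn that 0 into "0", Python stops at the concatenation:

```python
def lookup_name(user_id, names):
    """Hypothetical, badly designed lookup: returns 0 instead of a name on failure."""
    return names.get(user_id, 0)


def greet(user_id, names):
    # With weak typing the 0 could become "0" here and the error would go unnoticed;
    # Python raises TypeError because str + int is not allowed.
    return "Hello, " + lookup_name(user_id, names)


NAMES = {1: "Ada"}

print(greet(1, NAMES))
try:
    greet(2, NAMES)
except TypeError as exc:
    print("caught:", exc)
```

        Note that the error still surfaces only when that line actually runs; catching it before the program runs at all is the job of a static type checker.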

        • Mason Wheeler says:

          Because the problem I was addressing here is not implicit conversions vs. no implicit conversions, but rather manifest typing vs. inferred typing.

          I suppose you could view this as a type conversion issue, in the sense that everything is being implicitly converted from string, but the point I was making is that these researchers are in exactly the same boat as JavaScript (and Python!) users: they have no way to explicitly declare and enforce a data type on a variable, even optionally.
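          In Python today an annotation can declare a type, but the interpreter does not enforce it at runtime; enforcement has to be written by hand or left to an external checker. A minimal sketch of the difference between declaring and enforcing, where `scale` and `scale_checked` are hypothetical names:

```python
def scale(value: float, factor: float) -> float:
    """Declared types only: CPython does not check these annotations when called."""
    return value * factor


def scale_checked(value: float, factor: float) -> float:
    """Same declaration, plus a hand-written runtime check that enforces it."""
    for name, arg in (("value", value), ("factor", factor)):
        if not isinstance(arg, (int, float)):
            raise TypeError(f"{name} must be a number, not {type(arg).__name__}")
    return value * factor


# Runs without complaint despite the annotations: str * int repeats the string.
unchecked = scale("ab", 2)
print(unchecked)
```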

          • SilverWarior says:

            QUOTE: but the point I was making is that these researchers are in exactly the same boat as JavaScript (and Python!) users: they have no way to explicitly declare and enforce a data type on a variable, even optionally.

            Sure these scientists have a way. The easiest is to set the cell type. The second option is to force a certain data type in the calculation formulas themselves (it can be a pain in the ass and looks ugly, but it is doable).

            But yes, I agree: I would rather have a programming language where I need to declare the data type than one that lets the computer try, successfully or in many cases unsuccessfully, to figure out the proper data type.
            That is one of the reasons I dislike the generics support that has been added to Delphi.
