Wish list: Slicing syntax

I’ve been doing a lot of string processing work lately.  Delphi has a lot of strengths, but unfortunately string manipulation really isn’t one of them.  It gets really messy really quickly if you’re trying to do anything complicated.  Extracting substrings out of strings quickly degenerates into a mess of difficult-to-read Copy calls.  It would sure be nice if there was a better way.

There is a better way, by the way.  Python has it; it’s called “slicing.”  You know how in Delphi, you can use array[subscript] notation to get a single item out of an array?  Well in Python you can use an extended version of that to get a subrange (a slice) out of an array.  For example:

[code lang="python"]

mystring = "Python string slicing"
mystring[7:13]

[/code]

returns “string”.  (In Python, strings start at index 0.)  Also, you don’t need a value on both sides of the colon.  If you don’t provide one, it translates to the beginning/end of the array, as appropriate.  So “mystring[:6]” returns “Python” and “mystring[14:]” returns “slicing”.
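
Here is the same idea as a slightly longer snippet you can paste into a Python prompt.  It is only a sketch to show the rule: a slice includes the start index and stops just before the end index.  The `primes` list is something made up for illustration, to show that the same notation works on lists as well as strings.

```python
mystring = "Python string slicing"

# A slice [start:stop] includes index start and excludes index stop.
print(mystring[7:13])   # string
print(mystring[:6])     # Python
print(mystring[14:])    # slicing

# The same notation works on other sequences, such as lists.
primes = [2, 3, 5, 7, 11]
print(primes[1:3])      # [3, 5]
```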

Now, using a colon here isn’t very Pascal-ish.  Colons are for variable declarations.  But we do have a range operator that’s used with arrays: the double dot.  Wouldn’t it be nice to be able to do stuff like this in Delphi?

[code lang="delphi"]

name := 'Mason Wheeler';
firstname := name[..5];
lastname := name[7..];

[/code]

The compiler could translate this syntax to calls to Copy easily enough.  Also, they could improve on the concept a little.  If you feed Python a negative number in a slicing expression, it will error out.  But if Delphi could automatically translate myArray[..-5] into myArray[..high(myArray) - 5], grabbing everything except the last five elements would be that much easier.
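
To make the translation a bit more concrete, here is a rough sketch of what the compiler might do, written in Python so it can be run directly.  Everything in it is an assumption for illustration: `copy` is a hypothetical stand-in that mimics Delphi's 1-based Copy(S, Index, Count), `slice_to_copy` is a made-up name, the bounds are inclusive like a Pascal subrange, and a negative upper bound is counted back from the end as suggested above.

```python
def copy(s, index, count):
    """Hypothetical stand-in for Delphi's Copy(S, Index, Count), 1-based."""
    start = max(index, 1) - 1            # clamp to the first element, go 0-based
    return s[start:start + max(count, 0)]


def slice_to_copy(s, lo=None, hi=None):
    """Emulate the proposed s[lo..hi] with inclusive, 1-based bounds."""
    if lo is None:
        lo = 1                 # omitted lower bound: start of the string
    if hi is None:
        hi = len(s)            # omitted upper bound: end of the string
    elif hi < 0:
        hi = len(s) + hi       # s[..-5] becomes s[..High(s) - 5]
    return copy(s, lo, hi - lo + 1)


name = "Mason Wheeler"
print(slice_to_copy(name, hi=5))     # Mason
print(slice_to_copy(name, lo=7))     # Wheeler
print(slice_to_copy(name, hi=-8))    # Mason
```

With inclusive, 1-based bounds the surname starts at index 7, because index 6 is the space.  A real implementation would also have to decide what happens when a bound is computed at run time and turns out negative.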

Any thoughts?

15 Comments

  1. Eric says:

    The handling of negative values could have weird side-effects if the bound is computed rather than a constant, that’s probably why Python errors.
    Other than that, it looks good and convenient, I’m noting it for DWScript (and it can be generalized to arrays, not just strings) 🙂
    I wonder if it couldn’t be extended to array properties too, with an extension to default values syntax to handle the [..value] case?

  2. Python/Delphi user says:

    Slicing is high on my wish list.
    Now, I’m not sure what you mean with errors with negative numbers in slicing expressions:

    *** Python 3.2.1 (default, Jul 10 2011, 21:51:15) [MSC v.1500 32 bit (Intel)] on win32. ***
    >>> mystring = "Python string slicing"
    >>> mystring[:-8]
    'Python string'
    >>>
    Negative indexes in Python slices give the programmer lots of flexibility.
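
    A few more cases along the same lines, as a small sketch that can be pasted into an interpreter.  A negative bound simply counts back from the end of the sequence:

```python
mystring = "Python string slicing"

# A negative bound counts back from the end of the sequence.
print(mystring[:-8])    # Python string
print(mystring[-7:])    # slicing
print(mystring[7:-8])   # string
print(mystring[-1])     # g
```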

  3. Steven says:

    I’m not sure I agree with the change in array syntax for this specific case. What is wrong with the traditional helper routine LEFTSTR and creating a new function SPLIT that would handle the right side pattern (copy from 5 to end of string)?

    • Mason Wheeler says:

      A couple things. First, you’d have to remember 4 different functions (LeftStr, RightStr, MidStr, and Split) instead of one subrange notation that handles all of them. And second, those only work with strings. Copy works with all arrays. I used strings as examples here because they’re easy to use as examples, but that’s not the only place they could come in handy.

  4. François says:

    That is one of the things that I miss the most from APL where you can very easily cut, split, mirror, rotate, rearrange or index any structured variable (vectors in a large sense: arrays, matrices, cubes, hypercubes….).
    Wikipedia on APL: http://en.wikipedia.org/wiki/APL_(programming_language)

  5. GrandmasterB says:

    I don’t agree that a change in syntax is warranted for this. I’d rather not clutter the language with new structures. Copy() is a very low level function – just make a nice slicing function on top of that. I have a ‘substr()’ function, for example, that I’ve been using for many years (based on the Clipper function of the same name) that does exactly what the Python slicing you describe does. There’s no need for a language change.

    Keep in mind, with Unicode characters, you have variable length characters (2,3,4 bytes). So the compiler can’t just translate such a slice syntax into a simple memory copy. It needs to have all the underlying functionality to work with variable length characters. So you’d be essentially inserting a function call anyway, if a slicing syntax was available.

    • Mason Wheeler says:

      One thing that slicing syntax can do that a function call can’t is enable you to put a default value on the first parameter but not the second. (As in the “mystring[:6]” example.)

  6. Wouter says:

    I like how you can get the last item from an array at index -1 in some languages (like Perl).

  7. I agree with GrandmasterB: Why not simply a function Slice (or similar, as that may already be taken)? OK, not as elegant as a syntax change, but it does not require any changes to the compiler.

  8. Jolyon Smith says:

    You almost had me convinced until that final example which alone demonstrates how this sort of syntax change is less transparent than perhaps more verbose alternatives.

    myArray[..high(myArray) - 5]

    Consider just 2 things in this expression that if subtle mistakes were made would completely change the result:

    – Forget the leading “..” and you get a single element, not the final 5
    – Mistakenly substitute + for – and you get nothing at all (because you start at and then go 5 past the end of the array)

    Similar problems exist in other variants of the proposed syntax.. subtle errors in the syntax still compile but will give unexpected results, and unless you comment the code to describe what you are intending to do those mistakes won’t be obvious to a reader of the code:

    slice := myArray[1];

    If intended to get the LAST element in the array and simply in your rush happened to omit the leading ‘:’ (or ‘..’) then it is not at all obvious that this mistake is in the code unless you comment it:

    // Get the last element in the array
    slice := myArray[1];

    WHOOPS! (“Hey Bob … I think you made a mistake here…”)

    But at this point, faced with writing a comment that simply states what the code does (that is more verbose than a descriptive function call itself would be), we should find ourselves suddenly reminded of one of the great advantages of Pascal …

    It may be verbose compared to other languages, but that means it is also to a far larger extent self documenting. Consider (for example):

    TailSlice(myArray, 1);

    Describes exactly what you intend to do and is more likely to either do precisely that or simply fail to compile if you make a simple error in syntax.
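
    As a rough illustration of such a named helper, here is a sketch in Python rather than Delphi, so that it can be run as-is.  The name `tail_slice` and its behaviour for out-of-range counts are assumptions made for this example, not an existing routine:

```python
def tail_slice(items, count):
    """Return the last `count` elements of a sequence (hypothetical helper)."""
    if count < 0:
        raise ValueError("count must not be negative")
    # Compute the start explicitly: items[-0:] would return the whole
    # sequence, which is not what "the last 0 elements" should mean.
    start = max(len(items) - count, 0)
    return items[start:]


print(tail_slice([10, 20, 30, 40], 1))   # [40]
print(tail_slice("slicing", 3))          # ing
print(tail_slice([10, 20], 0))           # []
```

    The function name states the intent, and a mistyped argument either raises an error or still returns a tail of the sequence rather than a single element.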

  9. nedko says:

    yep, it would be nice to have it

  10. Cameron says:

    This syntax would be nice to have but it is only slightly more verbose than a function with the same properties.

    Slice(str, 5, 10) = Fifth to the tenth character

    Slice(str, 0, 5) = First five characters from the beginning

    Slice(str, 5, 0) = Everything except the first five characters

    Slice(str, 5, -5) = Everything except the first and last five characters

    Personally, I want to see the “begin” be optional on all conditionals and loops. IMO it would be cleaner and easier to learn, and with today’s fast compilers there is no reason it couldn’t be done. Oh, and a decent CASE statement, multi-line strings without the need to be escaped, and lastly a UNICODE OFF switch.

    if x = 1 then
    begin
      blah1;
      blah2;
    end
    else
    begin
      blah3;
      blah4;
    end;

    to

    if x = 1 then
      blah1;
      blah2;
    else
      blah3;
      blah4;
    end;

  11. Isopod says:

    I think this would be a nice feature. Though I don’t think the idea with the negative numbers meaning “n-th from the right” would work because if this feature is added, I think for consistency it should also work with arrays – whose indices can be negative.

  12. Python/Delphi user says:

    Nice comments from everyone. I just want to point out something as a Python programmer:
    – Slice syntax applies to any enumerable, not just strings (tuples, lists)
    – Slices are one of the reasons Python is so compact
    – Of course mistakes can be made! (to be fair, in almost any language). One of Python’s important views, “We’re all consenting adults,” acknowledges this: there is a price to pay for power. That is in contrast to, say, Eiffel, which wants to be your nanny.
    – No one is forced to use slices, you could use standard functions. Of course, in Python slices are faster than other ways so they are encouraged.

    Personally, I find slices elegant and powerful, that’s why I use them everywhere in production.
    Slices and dictionaries would make my life a lot easier in Delphi. Sure not everybody will think like me, but at least I’m allowed to dream.

    $0.02
