From a741d06c0032aa25fc87b543974d8e653002d1aa Mon Sep 17 00:00:00 2001 From: Peter Aronoff Date: Sun, 17 Apr 2016 18:52:54 -0400 Subject: [PATCH] Prepare for split 3.0.0-1 --- CHANGES.md | 11 ++++ README.md | 109 ++++++++++++++++++++++++------- doc/changes.html | 14 ++++ doc/index.html | 109 ++++++++++++++++++++++++------- split-3.0.0-1.rockspec | 24 +++++++ src/split.lua | 27 ++++++-- test/test-basics.lua | 14 ++-- test/test-deviant-pattern.lua | 9 ++- test/test-empty-fields.lua | 22 +++---- test/test-first-and-rest.lua | 95 +++++++++++++++++++++++++++ test/test-information-fields.lua | 8 +-- test/test-iterator.lua | 50 +++++++------- test/test-utf8.lua | 18 ++--- 13 files changed, 401 insertions(+), 109 deletions(-) create mode 100644 split-3.0.0-1.rockspec create mode 100644 test/test-first-and-rest.lua diff --git a/CHANGES.md b/CHANGES.md index e5b8e42..21d6857 100644 --- a/CHANGES.md +++ b/CHANGES.md @@ -29,6 +29,17 @@ + Change the information variables to functions. These serve the same purpose, but don't use variable names that Lua explicitly warns users about. +## *3.0.0-1* (April 24, 2016) + ++ Clean up tests. ++ Change the name of `spliterator` to `each`. The new name is less silly and + hopefully clearer. **NB**: For the moment, `spliterator` is still provided as + an alias to `each`. However, in the next major version release (i.e., + 4.0.0-1), `spliterator` will be removed. Please start switching any code that + uses `spliterator` to `each`. ++ Add `first_and_rest`, a string equivalent to a function that splits a list + into its head and tail. + Would you rather view the [documentation][d]? [d]: /README.md diff --git a/README.md b/README.md index 900785b..a871ddb 100644 --- a/README.md +++ b/README.md @@ -2,9 +2,9 @@ ## Synopsis -A string `split` function and iterator for Lua, which doesn't provide such -a function in its standard string library. Such a function is clearly useful, -and [many people have written their own][wiki]. 
+A string `split` function and iterator for Lua since Lua's standard string +library doesn't provide such a function. When working with text, `split` is very +useful, and [many people have written a version for Lua][wiki]. [wiki]: http://lua-users.org/wiki/SplitJoin @@ -14,16 +14,17 @@ and [many people have written their own][wiki]. The delimiter can be a literal string or a Lua pattern. The function returns a table of items found by splitting the string up into pieces divided by the - delimiter. - - Extra delimiters anywhere in the string will result in empty strings being - returned as part of the results table. + delimiter. If the delimiter is not present in the string, then the result + will be a table consisting of one item: the original string parameter. Extra + delimiters anywhere in the string will result in empty strings being returned + as part of the results table. The function also provides two shortcuts for common situations. If the delimiter parameter is an empty string, the function returns a table - containing every character in the original string as a separate item. If the - delimiter parameter is `nil`, the function considers this equivalent to the - Lua pattern `'%s+'` and splits the string on whitespace. + containing every character in the original string as a separate item. (I.e., + if the delimiter is the empty string, the function explodes the string.) If + the delimiter parameter is `nil`, the function considers this equivalent to + the Lua pattern `'%s+'` and splits the string on whitespace. Examples: @@ -39,41 +40,95 @@ and [many people have written their own][wiki]. * A special case: empty string delimiter - A pattern of the empty string is special. It tells the function to - return each character from the original string as an individual item. - Think of this as "explode the string". + If the delimiter is an empty string, the function returns each + character from the original string as an individual item. 
Think of + this as "explode the string". split('foo', '') -- returns {'f', 'o', 'o'} - * Another special case: nil delimiter + * Another special case: `nil` delimiter - Passing nothing or an explicit `nil` as the delimiter is a second - special case. `split` treats this as equivalent to a pattern of `'$s+'` - and splits on consecutive runs of whitespace. + Pass nothing or an explicit `nil` as the delimiter and `split` acts as + if the delimiter were `'%s+'`. This makes it easier to split on + consecutive runs of whitespace. split('foo       bar    buzz') -- returns {'foo', 'bar', 'buzz'} -+ `spliterator(string, delimiter) => custom iterator` ++ `each(string, delimiter) => custom iterator` + + **NB**: This function was previously called `spliterator`, but I've renamed + it to the shorter and less goofy `each`. In order to give people who might + rely on the previous name time to switch over, `spliterator` is still + provided as an alias for `each`. However, that name will be removed in the + next major version release (i.e., 4.0.0) of this module. - This is an iterator version of the same idea. Everything from above applies, - except that the function returns a custom iterator to work through results - rather than a table. + This is an iterator version of the same idea as `split`. Everything from + above applies, except that the function returns an iterator to work through + results rather than a table. - local spliter = require 'split'.spliterator + local split_each = require 'split'.each - local str = 'foo,bar,bizz,buzz,' + local str = 'foo,bar,bizz,buzz' local count = 1 - for p in spliter(str, ',') do + for p in split_each(str, ',') do print(count .. '. [' .. p .. ']') count = count + 1 end ++ `first_and_rest(string, delimiter) => string, string (or nil)` + + This function is a string equivalent for a function that divides a list into + its head and tail. 
The head of the string is everything that appears before + the first appearance of a specified delimiter; the tail is the rest of the + string. `first_and_rest` attempts to split a string into two pieces, and it + returns two results using Lua's multiple return. The exact return values vary + depending on the string and delimiter. + + In the simplest case, the string contains the delimiter at least once. If so, + the first return value will be the portion of the string before the first + appearance of the delimiter, and the second return value will be the rest of + the string after that delimiter. + + If the delimiter does not appear in the string, however, then there's no + possible split. In this case, the first return value will be the entire + string, and the second return value will be `nil`. (From Lua's point of view, + a second return value of `nil` is equivalent to saying that the function only + returns one value.) + + If the second return value is `nil`, there is probably a problem or malformed + record. So it will often make sense to test the second return value before + proceeding. For example: + + local head, tail = first_and_rest(record, '%s*:%s*') + if not tail then + -- Signal an error to the caller. + else + -- Process the record. + end + + A second complication is that the strings returned by the function may be + empty. If the delimiter is found, but the portion of the string before or + after it is zero-length, then an empty string may be returned. The examples + below show various possible outcomes. + + first_and_rest('head: tail', ': ') -- returns 'head', 'tail' + first_and_rest('head, tail', ': ') -- returns 'head, tail', nil + first_and_rest(': tail', ': ') -- returns '', 'tail' + first_and_rest('head: ', ': ') -- returns 'head', '' + + Like `split` and `each`, `first_and_rest` accepts `nil` or an empty string as + special cases for the delimiter. `nil` is automatically transformed into + '%s+', a generic "separated by space" pattern. 
In the case of an empty string + delimiter, `first_and_rest` returns the first character of the input and the + rest of the input. (This seems to be the only reasonable interpretation of + "exploding" the input string in the context of this function.) + ## Varia The module provides four informational functions that return strings. They should be self-explanatory. -+ `version() -- 2.0.0-1` ++ `version() -- 3.0.0-1` + `author() -- Peter Aronoff` @@ -86,8 +141,12 @@ should be self-explanatory. Many of my ideas came from reading [the LuaWiki page on split][wiki]. I thank all those contributors for their suggestions and examples. +[Alexey Melnichuk, AKA moteus][moteus] provided the idea and initial code for +`first_and_rest`. + All mistakes are mine. See [version history][c] for release details. +[moteus]: https://bitbucket.org/moteus [c]: /CHANGES.md --- diff --git a/doc/changes.html b/doc/changes.html index 6eedf6f..b5ce2b0 100644 --- a/doc/changes.html +++ b/doc/changes.html @@ -44,6 +44,20 @@

2.0.0-1 (March 5, 2016)

+

3.0.0-1 (April 24, 2016)

+ + + +

Would you rather view the documentation?


diff --git a/doc/index.html b/doc/index.html index 71bd16a..d44d7d1 100644 --- a/doc/index.html +++ b/doc/index.html @@ -12,9 +12,9 @@

split Synopsis

-

A string split function and iterator for Lua, which doesn’t provide such -a function in its standard string library. Such a function is clearly useful, -and many people have written their own.

+

A string split function and iterator for Lua since Lua’s standard string +library doesn’t provide such a function. When working with text, split is very +useful, and many people have written a version for Lua.

Usage

@@ -23,16 +23,17 @@

Usage

The delimiter can be a literal string or a Lua pattern. The function returns a table of items found by splitting the string up into pieces divided by the -delimiter.

- -

Extra delimiters anywhere in the string will result in empty strings being -returned as part of the results table.

+delimiter. If the delimiter is not present in the string, then the result +will be a table consisting of one item: the original string parameter. Extra +delimiters anywhere in the string will result in empty strings being returned +as part of the results table.

The function also provides two shortcuts for common situations. If the delimiter parameter is an empty string, the function returns a table -containing every character in the original string as a separate item. If the -delimiter parameter is nil, the function considers this equivalent to the -Lua pattern '%s+' and splits the string on whitespace.

+containing every character in the original string as a separate item. (I.e., +if the delimiter is the empty string, the function explodes the string.) If +the delimiter parameter is nil, the function considers this equivalent to +the Lua pattern '%s+' and splits the string on whitespace.

Examples:

@@ -49,37 +50,92 @@

Usage

  • A special case: empty string delimiter

    -

    A pattern of the empty string is special. It tells the function to - return each character from the original string as an individual item. - Think of this as “explode the string”.

    +

    If the delimiter is an empty string, the function returns each + character from the original string as an individual item. Think of + this as “explode the string”.

      split('foo', '') -- returns {'f', 'o', 'o'}
     
  • -
  • Another special case: nil delimiter

    +
  • Another special case: nil delimiter

    -

    Passing nothing or an explicit nil as the delimiter is a second - special case. split treats this as equivalent to a pattern of '$s+' - and splits on consecutive runs of whitespace.

    +

    Pass nothing or an explicit nil as the delimiter and split acts as + if the delimiter were '%s+'. This makes it easier to split on + consecutive runs of whitespace.

      split('foo       bar    buzz') -- returns {'foo', 'bar', 'buzz'}
     
  • -
  • spliterator(string, delimiter) => custom iterator

    +
  • each(string, delimiter) => custom iterator

    + +

    NB: This function was previously called spliterator, but I’ve renamed +it to the shorter and less goofy each. In order to give people who might +rely on the previous name time to switch over, spliterator is still +provided as an alias for each. However, that name will be removed in the +next major version release (i.e., 4.0.0) of this module.

    -

    This is an iterator version of the same idea. Everything from above applies, -except that the function returns a custom iterator to work through results -rather than a table.

    +

    This is an iterator version of the same idea as split. Everything from +above applies, except that the function returns an iterator to work through +results rather than a table.

    -
          local spliter = require 'split'.spliterator
    +
          local split_each = require 'split'.each
     
    -      local str = 'foo,bar,bizz,buzz,'
    +      local str = 'foo,bar,bizz,buzz'
           local count = 1
    -      for p in spliter(str, ',') do
    +      for p in split_each(str, ',') do
             print(count .. '. [' .. p .. ']')
             count = count + 1
           end
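
To make the iterator behavior described above concrete, here is a minimal, self-contained sketch of how an `each`-style iterator could be written. It is not the code in `src/split.lua`: the local function `each` below is a hypothetical stand-in. It assumes that a non-empty delimiter always matches at least one character, and it treats the empty-string delimiter byte by byte, so it is not UTF-8 aware.

```lua
-- Hypothetical sketch of an each-style iterator; NOT the module's source.
local function each(str, delimiter)
  -- Assumption: a nil delimiter means "runs of whitespace".
  delimiter = delimiter or '%s+'

  -- Empty delimiter: yield one byte at a time (not UTF-8 aware).
  if delimiter == '' then
    local i = 0
    return function()
      i = i + 1
      if i <= #str then return str:sub(i, i) end
    end
  end

  local pos = 1       -- index where the next field starts
  local done = false  -- set once the final field has been returned
  return function()
    if done then return nil end
    local s, e = str:find(delimiter, pos)
    if not s then
      -- No further delimiter: the remainder is the last field.
      done = true
      return str:sub(pos)
    end
    -- Guard against patterns that match the empty string (would loop forever).
    assert(e >= s, 'delimiter must match at least one character')
    local field = str:sub(pos, s - 1)
    pos = e + 1
    return field
  end
end

local count = 1
for p in each('foo,bar,bizz,buzz', ',') do
  print(count .. '. [' .. p .. ']')
  count = count + 1
end
```

A closure that tracks `pos` keeps the sketch lazy: each field is produced only when the `for` loop asks for it, rather than building the whole results table first.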
     
  • +
  • first_and_rest(string, delimiter) => string, string (or nil)

    + +

    This function is a string equivalent for a function that divides a list into +its head and tail. The head of the string is everything that appears before +the first appearance of a specified delimiter; the tail is the rest of the +string. first_and_rest attempts to split a string into two pieces, and it +returns two results using Lua’s multiple return. The exact return values vary +depending on the string and delimiter.

    + +

    In the simplest case, the string contains the delimiter at least once. If so, +the first return value will be the portion of the string before the first +appearance of the delimiter, and the second return value will be the rest of +the string after that delimiter.

    + +

    If the delimiter does not appear in the string, however, then there’s no +possible split. In this case, the first return value will be the entire +string, and the second return value will be nil. (From Lua’s point of view, +a second return value of nil is equivalent to saying that the function only +returns one value.)

    + +

    If the second return value is nil, there is probably a problem or malformed +record. So it will often make sense to test the second return value before +proceeding. For example:

    + +
          local head, tail = first_and_rest(record, '%s*:%s*')
    +      if not tail then
    +        -- Signal an error to the caller.
    +      else
    +        -- Process the record.
    +      end
    +
    + +

    A second complication is that the strings returned by the function may be +empty. If the delimiter is found, but the portion of the string before or +after it is zero-length, then an empty string may be returned. The examples +below show various possible outcomes.

    + +
          first_and_rest('head: tail', ': ') -- returns 'head', 'tail'
    +      first_and_rest('head, tail', ': ') -- returns 'head, tail', nil
    +      first_and_rest(': tail', ': ') -- returns '', 'tail'
    +      first_and_rest('head: ', ': ') -- returns 'head', ''
    +
    + +

    Like split and each, first_and_rest accepts nil or an empty string as +special cases for the delimiter. nil is automatically transformed into +‘%s+’, a generic “separated by space” pattern. In the case of an empty string +delimiter, first_and_rest returns the first character of the input and the +rest of the input. (This seems to be the only reasonable interpretation of +“exploding” the input string in the context of this function.)
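
The return-value rules described above can be summarized in a short sketch. This is an illustration of the documented behavior under stated assumptions, not the implementation in `src/split.lua`: the local `first_and_rest` below is a hypothetical stand-in built on `string.find` and `string.sub`, and its empty-delimiter case works on bytes rather than UTF-8 characters.

```lua
-- Hypothetical stand-in for first_and_rest; NOT the module's source.
local function first_and_rest(str, delimiter)
  -- Assumption: a nil delimiter means "runs of whitespace".
  delimiter = delimiter or '%s+'

  -- Empty delimiter: first character (byte) and the remainder.
  if delimiter == '' then
    return str:sub(1, 1), str:sub(2)
  end

  local s, e = str:find(delimiter)
  if not s then
    -- Delimiter absent: whole string, and nil for the tail.
    return str, nil
  end
  return str:sub(1, s - 1), str:sub(e + 1)
end

print(first_and_rest('head: tail', ': '))  -- head    tail
print(first_and_rest('head, tail', ': '))  -- head, tail    nil
print(first_and_rest('foo', ''))           -- f    oo

-- Testing the tail before proceeding, as suggested above.
local key, value = first_and_rest('name : split', '%s*:%s*')
if not value then
  error('malformed record')
end
print(key, value)                          -- name    split
```

Because the tail is `nil` only when the delimiter is missing, while an empty tail is `''`, a caller can tell "no delimiter" apart from "delimiter at the very end" with a plain `if not tail` check.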

  • @@ -89,7 +145,7 @@

    Varia

    should be self-explanatory.