Tuesday 20 September 2016

Lua Patterns, Red Parse, and Regex

In my Learning Lua project, I needed to extract a four character code from a string. Many languages use a builtin Regex implementation to provide the capability to match and extract patterns from strings. I am also familiar with Red, which like Rebol before it, provides the Parse Dialect (a Domain Specific Language (DSL)) which not only has excellent string matching and extraction features but makes it easy to write further DSLs. 

I found that Lua has another approach. It has a built-in lightweight pattern language, known as PatternsFrom what I understand, it was designed to be simpler than Regex both in its "grammar" and its capability. (Lua does have Regex modules available. It also has the very interesting LPeg pattern-matching library, based on Parsing Expression Grammars (PEGs).)

I thought it would be interesting to compare the three approaches, patterns, parse and regex. Here are three versions of a short function to extract the first four consecutive alphabetic characters from a string. The functions return an empty string if there is no match.

First Regex, courtesy of Ruby:

    def parse_line line
      ticker = line.match(/[A-Za-z]{4,4}/)
      if ticker then
        ticker[0]
      else
        ''
      end
    end

Note: 
    I'm no Regex expert, there may well  be a better way, and probably a more Rubyish way too.

Second Parse, courtesy of Red

parse-line: function [line [string!] /local ticker][
alpha: charset [#"A"-#"Z" #"a"-#"z"] parse line [to copy ticker 4 alpha] any [ticker ""] ]

Notes: 
    1. The charset allows you to efficiently define set of characters to be matched.
    2. The parse function apply the rules provided between the []s to the input.
    3. A parse rule can have any number of sub-rules.
    4. Red Parse details.

Finally Patterns, courtesy of Lua

    function stockfetch.parseLine (line)
      local match = string.match(line, '(%a%a%a%a)')
      if match then 
        return match
      else
        return ''
      end
    end

There doesn't seem much to chose between the approaches for such a simple task. The Lua pattern is very readable but you do have to know (or lookup) that %a matches an alphabetic character. The Regex is self-explanatory once you understand Regex syntax. The Red parse rule may seem a little more complicated because it also specifies to where the data are extracted (copy ticker) and has to ensure that the rules cover the whole supplied string (to end).


It will be interesting to see how the three approaches compare with more challenging requirements. Perhaps, I'll be able to add LPeg into the equation too.

Note: I am a strong supporter of and contributor to Red and am highly biased towards its Parse DSL. 

No comments: