
Functional rewrite #196

Open
@jehna


I've been thinking about rewriting JSVerbalExpressions to use function composition rather than the builder-like pattern it uses now.

Currently, the README.md shows a simple example of using VerbalExpressions:

const tester = VerEx()
    .startOfLine()
    .then('http')
    .maybe('s')
    .then('://')
    .maybe('www.')
    .anythingBut(' ')
    .endOfLine();

This can be described as a builder-like extension of the native RegExp object: you chain method calls, each adding another piece, to "build" a complete regular expression.
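
To make the builder pattern concrete, here is a toy chainable builder. It is an illustrative assumption, not the actual JSVerbalExpressions source: the class name `ToyBuilder` and the `escapeLiteral` helper are hypothetical, and only the method names are borrowed from the README example. Each method appends a piece of regex source and returns `this`, which is all that chaining requires.

```javascript
// Toy chainable builder, loosely modelled on the README example.
// NOT the real JSVerbalExpressions implementation; only the method names are borrowed.

// Escape regex metacharacters so the text is matched literally.
const escapeLiteral = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

class ToyBuilder {
  constructor() {
    this.source = "";
  }
  // Every method appends to one flat string and returns `this`,
  // which is what makes `.a().b().c()` chaining possible.
  startOfLine() {
    this.source += "^";
    return this;
  }
  then(text) {
    this.source += `(?:${escapeLiteral(text)})`;
    return this;
  }
  maybe(text) {
    this.source += `(?:${escapeLiteral(text)})?`;
    return this;
  }
  anythingBut(chars) {
    // In this sketch, ], \, ^ and - are escaped inside the character class.
    this.source += `[^${chars.replace(/[\]\\^-]/g, "\\$&")}]*`;
    return this;
  }
  endOfLine() {
    this.source += "$";
    return this;
  }
  toRegExp() {
    return new RegExp(this.source);
  }
}

const tester = new ToyBuilder()
  .startOfLine()
  .then("http")
  .maybe("s")
  .then("://")
  .maybe("www.")
  .anythingBut(" ")
  .endOfLine()
  .toRegExp();

console.log(tester.test("https://www.example.com"));
console.log(tester.test("http://has space"));
```

In a design like this, every call can only append to the end of one flat string, so there is no natural place in the chain to express nesting.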

This is a very clear approach for building simple, "one-dimensional" regular expressions. The problems with the current implementation surface with more complicated features such as capture groups, lookaheads/lookbehinds and the "or" pipe: using them quickly makes the expression hard to read and maintain.

For example, I think something like this is impossible to implement with VerbalExpressions at the moment:

/^((?:https?:\/\/)?|(?:ftp:\/\/)|(?:smtp:\/\/))([^ /]+)$/
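
For reference, this is how the target expression behaves with the native `RegExp` today. The sample inputs are made up for illustration. Group 1 holds the protocol, which may be empty because the first alternative is optional, and group 2 holds the rest of the string up to a space or slash.

```javascript
// The target expression from this issue, used directly with the native RegExp.
const target = /^((?:https?:\/\/)?|(?:ftp:\/\/)|(?:smtp:\/\/))([^ /]+)$/;

// Hypothetical sample inputs, chosen only to exercise each alternative.
const samples = [
  "https://example.com",
  "ftp://files",
  "smtp://mail",
  "example.com",
  "http://a/b",
];

for (const input of samples) {
  const m = target.exec(input);
  // Group 1: protocol (possibly empty); group 2: host-like remainder.
  console.log(input, "->", m ? [m[1], m[2]] : null);
}
```

The last sample does not match because the remainder after the protocol contains a `/`.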

To make it simpler, I'm proposing a 2.0 rewrite of VerbalExpressions that would take a functional approach, something like:

VerEx(
  startOfLine,
  "http",
  maybe("s"),
  "://",
  maybe("www."),
  anythingBut(" "),
  endOfLine
)
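
One way the proposed call could work under the hood is sketched below. It assumes that plain strings are matched literally (escaped) and that helpers such as `maybe` return small fragment objects carrying regex source. The names `fragment`, `toSource` and `escapeLiteral` are hypothetical, and none of this is an existing VerbalExpressions API.

```javascript
// Hypothetical sketch of the proposed functional API; none of these names
// exist in the current JSVerbalExpressions release.

// Escape regex metacharacters so plain strings are matched literally.
const escapeLiteral = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// A fragment is an object holding a piece of regex source.
const fragment = (source) => ({ source });

// Plain strings become literal source; fragments pass through unchanged.
const toSource = (part) =>
  typeof part === "string" ? escapeLiteral(part) : part.source;

const startOfLine = fragment("^");
const endOfLine = fragment("$");
const maybe = (...parts) => fragment(`(?:${parts.map(toSource).join("")})?`);
const anythingBut = (chars) =>
  fragment(`[^${chars.replace(/[\]\\^-]/g, "\\$&")}]*`);

// VerEx composes its arguments left to right into a native RegExp.
const VerEx = (...parts) => new RegExp(parts.map(toSource).join(""));

const tester = VerEx(
  startOfLine,
  "http",
  maybe("s"),
  "://",
  maybe("www."),
  anythingBut(" "),
  endOfLine
);

console.log(tester.source);
console.log(tester.test("https://www.example.com"));
console.log(tester.test("http://has space"));
```

Because `startOfLine` and `endOfLine` are plain values rather than functions in this sketch, they can be passed without parentheses, as in the proposal.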

Motivation for this approach would be:

  • We can split regular expressions into multiple variables
    • Naming "sub-expressions" makes their intent explicit and allows different abstraction levels within one regular expression
    • Each small part can be unit tested on its own
  • Makes grouping explicit (an opened capture group must always be closed)
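
The grouping point can be illustrated with the native `RegExp` constructor: building source text by hand can leave a group open, which only fails at runtime, while a `group()` function emits both parentheses in a single call. The `group` helper here is a hypothetical sketch that takes raw regex source strings without escaping them.

```javascript
// Building regex source by hand: nothing stops a group from being left open,
// and the mistake only surfaces when the RegExp is constructed.
let openGroupError = null;
try {
  new RegExp("^(https?://"); // "(" is opened but never closed
} catch (err) {
  openGroupError = err;
}
console.log(openGroupError instanceof SyntaxError);

// Hypothetical group(): both parentheses come from a single call, so a caller
// cannot forget the closing one. It takes raw regex source (no escaping).
const group = (...sources) => `(${sources.join("")})`;

const balanced = new RegExp("^" + group("https?://") + group("[^ /]+") + "$");
console.log(balanced.exec("http://example.com"));
```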

So the simplest example could be something like this:

const regex = VerEx(
  startOfLine,
  "http",
  maybe("s"),
  "://",
  maybe("www."),
  anythingBut(" "),
  endOfLine
);

The complex regex above could then be written like this:

VerEx(
  startOfLine,
  group(
    or(
      concat("http", maybe("s"), "://", maybe("www.")),
      "ftp://",
      "smtp://"
    )
  ),
  group(anythingBut(" /"))
);
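
Continuing the same hypothetical sketch, `group`, `or` and `concat` could be ordinary functions returning fragments. The block repeats the small core so it runs on its own; all helper names are assumptions. Here `or` wraps its alternatives in a non-capturing group so the `|` cannot leak into the surrounding expression.

```javascript
// Hypothetical helpers (not an existing VerbalExpressions API).
const escapeLiteral = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
const fragment = (source) => ({ source });
const toSource = (part) =>
  typeof part === "string" ? escapeLiteral(part) : part.source;

const startOfLine = fragment("^");
const maybe = (...parts) => fragment(`(?:${parts.map(toSource).join("")})?`);
const anythingBut = (chars) =>
  fragment(`[^${chars.replace(/[\]\\^-]/g, "\\$&")}]*`);

// concat: plain sequence, no group.
const concat = (...parts) => fragment(parts.map(toSource).join(""));
// or: alternation inside a non-capturing group, so "|" stays local.
const or = (...alternatives) =>
  fragment(`(?:${alternatives.map(toSource).join("|")})`);
// group: capturing group; opening and closing parens come from one call.
const group = (...parts) => fragment(`(${parts.map(toSource).join("")})`);

const VerEx = (...parts) => new RegExp(parts.map(toSource).join(""));

// The complex example from this issue, composed as proposed.
const url = VerEx(
  startOfLine,
  group(
    or(
      concat("http", maybe("s"), "://", maybe("www.")),
      "ftp://",
      "smtp://"
    )
  ),
  group(anythingBut(" /"))
);

console.log(url.exec("https://www.example.com/path"));
console.log(url.exec("ftp://files"));
console.log(url.exec("example.com"));
```

This composition is close to, but not identical to, the target regex: it requires a protocol (the target's first alternative is optional), its `http` branch also accepts an optional `www.`, it has no trailing `endOfLine`, and in this sketch `anythingBut` uses `*` rather than `+`.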

While this looks a bit more complex, we can more easily split it up and name things:

const protocol = or(concat("http", maybe("s"), "://"), "ftp://", "smtp://");
const removeWww = maybe("www.");
const domain = anythingBut(" /");
const regex = VerEx(startOfLine, group(protocol), removeWww, group(domain));

This way we could test all of those "sub-expressions" (variables) in isolation.
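
A minimal sketch of testing the named sub-expressions in isolation, using the same hypothetical helpers as above (repeated so the block is self-contained). The `matchesExactly` helper is an assumption: it wraps a fragment as `^(?:` + source + `)$` so that a test checks the whole input. The `check` function stands in for whatever test framework the project would actually use.

```javascript
// Hypothetical helpers (not an existing VerbalExpressions API).
const escapeLiteral = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
const fragment = (source) => ({ source });
const toSource = (part) =>
  typeof part === "string" ? escapeLiteral(part) : part.source;

const startOfLine = fragment("^");
const maybe = (...parts) => fragment(`(?:${parts.map(toSource).join("")})?`);
const anythingBut = (chars) =>
  fragment(`[^${chars.replace(/[\]\\^-]/g, "\\$&")}]*`);
const concat = (...parts) => fragment(parts.map(toSource).join(""));
const or = (...alternatives) =>
  fragment(`(?:${alternatives.map(toSource).join("|")})`);
const group = (...parts) => fragment(`(${parts.map(toSource).join("")})`);
const VerEx = (...parts) => new RegExp(parts.map(toSource).join(""));

// Named sub-expressions from this issue.
const protocol = or(concat("http", maybe("s"), "://"), "ftp://", "smtp://");
const removeWww = maybe("www.");
const domain = anythingBut(" /");
const regex = VerEx(startOfLine, group(protocol), removeWww, group(domain));

// Anchor a single fragment so a test must match the whole input.
const matchesExactly = (part, input) =>
  new RegExp(`^(?:${toSource(part)})$`).test(input);

// Minimal stand-in for a test framework's assertion.
const check = (condition, label) => {
  if (!condition) throw new Error(`failed: ${label}`);
  console.log(`ok: ${label}`);
};

check(matchesExactly(protocol, "https://"), "protocol accepts https://");
check(matchesExactly(protocol, "ftp://"), "protocol accepts ftp://");
check(!matchesExactly(protocol, "gopher://"), "protocol rejects gopher://");
check(matchesExactly(removeWww, ""), "www. is optional");
check(matchesExactly(domain, "example.com"), "domain accepts example.com");
check(!matchesExactly(domain, "a b"), "domain rejects spaces");

const [, proto, host] = regex.exec("https://www.example.com");
check(proto === "https://" && host === "example.com", "full regex captures parts");
```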
