Description
I've been thinking about re-writing JSVerbalExpressions to use function composition rather than the builder-like pattern it has now.
Currently, the README.md shows this simple example of using VerbalExpressions:
const tester = VerEx()
  .startOfLine()
  .then('http')
  .maybe('s')
  .then('://')
  .maybe('www.')
  .anythingBut(' ')
  .endOfLine();
This can be described as a builder-like extension of the native RegExp object: you chain method calls, each adding another piece, to "build" a complete regular expression.
This is a very clear approach for building simple, "one-dimensional" regular expressions. The problem with the current implementation starts to surface with more complicated constructs such as capture groups, lookaheads/lookbehinds, and the "or" pipe: with these, the expression quickly becomes hard to read and maintain.
For example, I think something like this is impossible to implement with VerbalExpressions at the moment:
/^((?:https?:\/\/)?|(?:ftp:\/\/)|(?:smtp:\/\/))([^ /]+)$/
To make it simpler, I'm proposing a 2.0 rewrite of VerbalExpressions that would take a functional approach, something like:
VerEx(
  startOfLine,
  "http",
  maybe("s"),
  "://",
  maybe("www."),
  anythingBut(" "),
  endOfLine
)
The motivation for this approach would be:
- We can split regular expressions into multiple variables
- Naming "sub-expressions" documents their intent and allows different levels of abstraction within a regular expression
- Each small part is testable with unit tests
- Makes grouping explicit (enforce closing an opened capture group)
So the simplest example could be something like this:
const regex = VerEx(
  startOfLine,
  "http",
  maybe("s"),
  "://",
  maybe("www."),
  anythingBut(" "),
  endOfLine
);
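To make the idea concrete, here is a minimal sketch of how such composable functions might be implemented. Everything in it is an assumption made for illustration, not an existing or agreed-upon API: fragments are plain `{ source }` objects, plain strings are treated as literals and escaped, `anythingBut` matches zero or more characters, and the helpers `raw`, `toSource`, `escapeLiteral` and `escapeClass` are hypothetical names.

```javascript
// Hypothetical sketch of the proposed functional style; not the library's actual API.

// Escape characters that have a special meaning in a regular expression.
const escapeLiteral = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
// Escape characters that have a special meaning inside a character class.
const escapeClass = (s) => s.replace(/[\]\\^-]/g, "\\$&");

// A fragment is an object carrying regex source that is already valid.
const raw = (source) => ({ source });
// Plain strings are literals and get escaped; fragments are used as-is.
const toSource = (part) =>
  typeof part === "string" ? escapeLiteral(part) : part.source;

const concat = (...parts) => raw(parts.map(toSource).join(""));
const startOfLine = raw("^");
const endOfLine = raw("$");
const maybe = (part) => raw(`(?:${toSource(part)})?`);
const anythingBut = (chars) => raw(`[^${escapeClass(chars)}]*`);

// VerEx joins all parts and compiles them into a native RegExp.
const VerEx = (...parts) => new RegExp(concat(...parts).source);

const regex = VerEx(
  startOfLine,
  "http",
  maybe("s"),
  "://",
  maybe("www."),
  anythingBut(" "),
  endOfLine
);

console.log(regex.test("https://www.google.com")); // true
console.log(regex.test("https://www.google.com/some path")); // false
```

Because every function returns the same kind of fragment, any part of an expression can be stored in a variable and reused elsewhere.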
The complex example could be written, for example, like this:
VerEx(
  startOfLine,
  group(
    or(
      concat("http", maybe("s"), "://", maybe("www.")),
      "ftp://",
      "smtp://"
    )
  ),
  group(anythingBut(" /"))
);
While this looks a bit more complex, we can more easily split it up and name things:
const protocol = or(concat("http", maybe("s"), "://"), "ftp://", "smtp://");
const removeWww = maybe("www.");
const domain = anythingBut(" /");
const regex = VerEx(startOfLine, group(protocol), removeWww, group(domain));
This way we could test all of those "sub-expressions" (variables) in isolation.
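As a rough illustration of testing in isolation, the sketch below defines hypothetical versions of `or`, `group`, `concat`, `maybe` and `anythingBut`, then checks `protocol` and `domain` on their own before combining them. The fragment representation (`{ source }` objects), the helper `matchesFully`, and the choice that `anythingBut` matches zero or more characters are all assumptions made for this example, not part of any existing API.

```javascript
// Hypothetical sketch; none of these definitions are the library's actual API.
const escapeLiteral = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
const escapeClass = (s) => s.replace(/[\]\\^-]/g, "\\$&");
const raw = (source) => ({ source });
const toSource = (part) =>
  typeof part === "string" ? escapeLiteral(part) : part.source;

const concat = (...parts) => raw(parts.map(toSource).join(""));
// Alternatives are wrapped in a non-capturing group so "|" stays contained.
const or = (...parts) => raw(`(?:${parts.map(toSource).join("|")})`);
// group() always emits both parentheses, so a capture group cannot be left open.
const group = (...parts) => raw(`(${concat(...parts).source})`);
const maybe = (part) => raw(`(?:${toSource(part)})?`);
const anythingBut = (chars) => raw(`[^${escapeClass(chars)}]*`);
const startOfLine = raw("^");
const VerEx = (...parts) => new RegExp(concat(...parts).source);

const protocol = or(concat("http", maybe("s"), "://"), "ftp://", "smtp://");
const removeWww = maybe("www.");
const domain = anythingBut(" /");
const regex = VerEx(startOfLine, group(protocol), removeWww, group(domain));

// Test helper: does a single fragment match the entire input?
const matchesFully = (fragment, input) =>
  new RegExp(`^(?:${fragment.source})$`).test(input);

console.log(matchesFully(protocol, "https://")); // true
console.log(matchesFully(protocol, "gopher://")); // false
console.log(matchesFully(domain, "example.org")); // true
console.log(matchesFully(domain, "example.org/path")); // false

// The combined expression exposes the two named parts as capture groups.
const match = regex.exec("https://www.example.org");
console.log(match[1], match[2]); // https:// example.org
```

Each `matchesFully` call could become its own unit test, so a failure points directly at the sub-expression that is wrong rather than at the whole pattern.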