apg-exp - APG Expressions

Deprecated: Use the updated version, apg-js, instead.

apg-exp is a regex-like pattern-matching engine that uses a superset of the ABNF syntax for the pattern definitions and APG to create and apply the pattern-matching parser.

Tutorial: Don't miss the tutorial on sitepoint.com. It walks you through the basics, from simple patterns to fairly sophisticated matching of nested, paired parentheses and other brackets (something you can't do with RegExp). It's all laid out for you in nine hands-on CodePen examples.

Complete User's Guide: A complete user's guide can be found at ./guide/index.html or the APG website.

v2.1.0 release notes: There are no functional changes in version 2.1.0. It now depends on the new APG API, apg-api, instead of apg. This removes all dependency on the node.js file system module "fs", which some development frameworks are incompatible with.

apg-exp: By way of introduction to regexes, the regex Wikipedia article would be a good start, and Jeffrey Friedl's book, Mastering Regular Expressions, would be a lot better and more complete. This introduction just mentions features, gives a little motivation and tries to point out some possible advantages of apg-exp.

Features:

  1. The pattern syntax is a superset of ABNF (SABNF). The ABNF syntax is standardized and is used to describe most Internet technical specifications.
  2. APG provides error checking and analysis for easy development of an accurate syntax for the desired pattern.
  3. Pattern syntax may be input as SABNF text or as an instantiated, APG parser object.
  4. Gives the user complete control over the pattern's character codes and their interpretation.
  5. Easy access to the full UTF-32 range of Unicode is provided naturally through the integer arrays that make up the character-coded strings and phrases.
  6. Results provide named access to all matched sub-phrases and the indexes where they were found, not just the last matched.
  7. Results can be returned as JavaScript strings or raw integer arrays of character codes.
  8. Global and "sticky" flags operate nearly identically to the same-named JavaScript RegExp flags.
  9. Recursive patterns are natural to the SABNF syntax for easy pair matching of opening and closing parentheses, brackets, HTML tags, etc.
  10. Fully implemented lookaround – positive and negative forms of both look-ahead and infinite-length look-behind.
  11. Back referencing – two modes, universal and parent. See the definitions in the SABNF documentation. For example, parent mode used with recursion can match not only the opening and closing tags of HTML but also the tag names in them. (See the back reference example.)
  12. Word and line boundaries are not pre-defined. By making them user-defined they are very flexible but nonetheless very easy to define and use. The user does not have to rely on or guess about what the engine considers a boundary to be.
  13. Character classes such as \w, \s and . are not pre-defined, providing greater flexibility and certainty to the meaning of any needed character classes.
  14. The syntax allows APG's User-Defined Terminals (UDTs) – write your own code for special phrase matching requirements. They make the phrase matching power of apg-exp essentially Turing complete.
  15. Provides the user with access to the Abstract Syntax Tree (AST) of the pattern match. The AST can be used for complex translations of the matched phrase. (See the dangling-else example.)
  16. Provides the user with access to APG's trace object which gives a complete, step-by-step picture of the parser's matching process for debugging purposes.
  17. A very flexible replacement function for replacing patterns in strings.
  18. A split function for using patterns to split strings.
  19. A test function for a quick yes/no answer.
  20. Tree depth and parser step controls to limit or "put the brakes on" an exponential or "catastrophic backtracking" syntax.
  21. Numerous display functions for a quick view of the results as text or HTML tables.
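
To make feature 9 concrete, here is a minimal sketch of what a recursive rule does. It is plain JavaScript written by hand, not apg-exp code and not its API; the rule name `balanced` and the function names are assumptions made up for illustration. It mimics a recursive rule of roughly the form `balanced = *( "(" balanced ")" / "[" balanced "]" )`.

```javascript
// Hand-written recursive-descent sketch; NOT the apg-exp API.
// Grammar being mimicked (hypothetical rule name):
//   balanced = *( "(" balanced ")" / "[" balanced "]" )
const PAIRS = { "(": ")", "[": "]" };

// Tries to match `balanced` starting at index i.
// Returns the index just past the matched phrase. The rule may match
// the empty phrase, so it always returns some index >= i.
function matchBalanced(str, i) {
  let pos = i;
  while (pos < str.length && PAIRS[str[pos]] !== undefined) {
    const close = PAIRS[str[pos]];
    const inner = matchBalanced(str, pos + 1); // recurse into the nested phrase
    if (str[inner] !== close) {
      break; // this repetition failed; keep only what matched so far
    }
    pos = inner + 1; // step past the closing bracket
  }
  return pos;
}

// True only if the entire string is a balanced phrase.
function isBalanced(str) {
  return matchBalanced(str, 0) === str.length;
}

console.log(isBalanced("([()])[]")); // true
console.log(isBalanced("([)]"));     // false
```

With apg-exp the recursion is expressed in the SABNF pattern itself rather than in hand-written code; see the SABNF documentation and the examples for the real syntax.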

Introduction:
The motivation was originally twofold.

  1. I wanted to replace the pattern syntax with ABNF, which to me at least, is much easier to read, write and debug than the conventional regex syntax.
  2. I felt (mistakenly) that a recursive-descent parser like APG would prove to be a much more powerful pattern matcher than regular expressions.

Hardly any programmer has not needed regexes at some point, more likely lots of points, and it doesn't take much reading of the Internet forums to note that many others, like me, find the regex syntax to be quite cryptic. Additionally, because regexes have such a long, rich history with many versions from many (excellent) developers, there are many different syntax variations as you move from system to system and language to language. By contrast ABNF is standardized (although my non-standard superset additions are starting to pile up.) Whether or not the ABNF syntax is preferable to conventional regex syntax will always be a personal preference. But, for me and possibly others, ABNF offers a more transparent syntax to work with.

At the outset I naively thought that the regular expressions of regexes were just that: the Chomsky hierarchy variety. Therefore, I thought that using an APG parser for the pattern matching would add a great deal of parsing power to the problem. I soon discovered that not only were regexes not real "regular expressions", they were powerful, recursive-descent parsers, loaded with features that went well beyond those of APG. I had to play a little catch-up to add look-behind, back referencing and anchors.

That being done, however, I think there is still a case for claiming some added power. I'm not a regex expert and I won't be making any big claims here, but there are a couple of points I will mention. I think the way that apg-exp gives the user nearly full control over the input, output and interpretation of the character codes goes a long way toward addressing a number of the cautions mentioned in Jeffrey Friedl's book, for example on pages 92 and 106. I also think it addresses a number of the things Larry Wall finds wrong with the regex culture in his Apocalypse 5 page, for example back referencing, support for named capture, nested patterns (recursive rules), capture of all matches to a sub-phrase, and others.

But the best thing to do, probably, is to head over to the examples and take a look. See and compare for yourself. I would suggest starting with the flags, display and rules examples to get your bearings and go from there.
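
As a side note on character codes (features 5 and 7), the difference between a JavaScript string and an integer array of character codes can be sketched in a few lines. This is plain JavaScript only; the helper names `toCodePoints` and `fromCodePoints` are hypothetical and are not part of apg-exp, which has its own conversion facilities.

```javascript
// Plain JavaScript sketch; helper names are hypothetical, NOT the apg-exp API.

// Convert a string to an array of Unicode code points (one integer per character).
function toCodePoints(str) {
  // Array.from iterates a string by code point, not by UTF-16 code unit.
  return Array.from(str, (ch) => ch.codePointAt(0));
}

// Convert an array of code points back to a JavaScript string.
function fromCodePoints(codes) {
  return String.fromCodePoint(...codes);
}

const text = "a\u{1F600}"; // the letter "a" followed by U+1F600 (an emoji)
const codes = toCodePoints(text);

console.log(text.length);                    // 3 (UTF-16 code units; the emoji is a surrogate pair)
console.log(codes);                          // [ 97, 128512 ]
console.log(fromCodePoints(codes) === text); // true
```

Working on the integer array means each character above U+FFFF is a single element, so a pattern never has to account for surrogate pairs.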

Installation:
GitHub: In your project directory,

git clone https://github.com/ldthomas/apg-js2-exp.git apgexp
npm install apgexp --save

npm: In your project directory,

npm install apg-exp --save

web page:

git clone https://github.com/ldthomas/apg-js2-exp.git apgexp

Then, in the header of your web page include,

<link rel="stylesheet" href="./apgexp/apgexp.css">
<script src="./apgexp/apgexp.js" charset="utf-8"></script>

or,

<link rel="stylesheet" href="./apgexp/apgexp-min.css">
<script src="./apgexp/apgexp-min.js" charset="utf-8"></script>

(Note that some apg-exp output is in HTML format and apgexp.css is needed to style it properly. apgexp.css is simply a copy of apglib.css.)

Now access apg-exp as,

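<script>
// Illustrative SABNF pattern (an assumption, not from the original): one or more ASCII letters.
var pattern = 'word = 1*(%d65-90 / %d97-122)\n';
var exp = new ApgExp(pattern);
</script>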

See, specifically, the email example.

Examples:
See apg-js2-examples/apg-exp for many more examples of using apg-exp.

Documentation:
The full documentation is in the code in docco format. To generate the documentation, from the package directory:

npm install -g docco
./docco-gen

View docs/index.html in any web browser to get started, or view the documentation on the APG website.

Copyright:
Copyright © 2017 Lowell D. Thomas, all rights reserved

License:
Released under the BSD-3-Clause license.