Home
Neotoma is a packrat parser generator for Erlang that works from Parsing Expression Grammars (PEGs). It consists of a parsing-combinator library with memoization routines, a parser for PEGs, and a utility to generate parsers from PEGs. It is inspired by Treetop, a Ruby library with similar aims, and Parsec, the parser-combinator library for Haskell.
- Clone the repository:
$ git clone git://github.com/seancribbs/neotoma.git
- Build the library:
$ cd neotoma
$ make
- Start the Erlang shell and generate your parser:
$ erl -pa ebin
1> peg_gen:file("mygrammar.peg").
ok
Neotoma’s PEG grammars are based on the grammars from Brian Ford’s thesis with some influences from Treetop. The basic format is thus:
nonterminal <- parsing_expression;
Where parsing_expression is any combination of nonterminals, terminals and sub-expressions, as described below (e, e1 and e2 are parsing expressions):
| Construct | Syntax | Notes |
| --- | --- | --- |
| Non-terminal symbol | some_nonterminal | All nonterminals on the RHS must have a corresponding rule/reduction. |
| String | "Hello, world" | Single- or double-quoted, quotes escaped with \\ |
| Character class | [a-zA-Z0-9] | Just as in PCRE |
| Any single character | . | |
| Sequence | e1 e2 | |
| Ordered choice | e1 / e2 | |
| Grouping | (e) | |
| Zero-width positive lookahead | &e | |
| Zero-width negative lookahead | !e | |
| Optional (zero-or-more) repetition | e* | |
| Mandatory (one-or-more) repetition | e+ | |
| Optional expression | e? | |
| Label | name:e | Helps extract sub-expressions from the AST |
Currently all reductions must end with a semicolon (;).
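To make the behaviour of a few of the operators listed above concrete, here is a minimal sketch of sequence, ordered choice, zero-or-more repetition and negative lookahead written as plain Erlang funs. It is an illustration of PEG semantics only: the module name peg_sketch, the function names and the {ok, Result, Rest} | fail return shape are all assumptions made for this example and are not Neotoma's combinator library. It also does no memoization.

```erlang
-module(peg_sketch).
-export([str/1, seq/1, choice/1, star/1, bang/1]).

%% Illustration only: this is NOT Neotoma's combinator library.
%% A parser here is a fun(Input :: string()) -> {ok, Result, Rest} | fail.

%% String terminal, e.g. "Hello"
str(S) ->
    fun(Input) ->
        case lists:prefix(S, Input) of
            true  -> {ok, S, lists:nthtail(length(S), Input)};
            false -> fail
        end
    end.

%% Sequence (e1 e2): every parser must match, one after the other.
seq(Parsers) ->
    fun(Input) -> seq(Parsers, Input, []) end.

seq([], Rest, Acc) ->
    {ok, lists:reverse(Acc), Rest};
seq([P | Ps], Input, Acc) ->
    case P(Input) of
        {ok, Result, Rest} -> seq(Ps, Rest, [Result | Acc]);
        fail               -> fail
    end.

%% Ordered choice (e1 / e2): alternatives are tried left to right, and
%% once one matches the remaining alternatives are not tried.
choice(Parsers) ->
    fun(Input) -> choice(Parsers, Input) end.

choice([], _Input) ->
    fail;
choice([P | Ps], Input) ->
    case P(Input) of
        fail -> choice(Ps, Input);
        Ok   -> Ok
    end.

%% Zero-or-more repetition (e*): greedy, and it always succeeds.
star(P) ->
    fun(Input) -> star(P, Input, []) end.

star(P, Input, Acc) ->
    case P(Input) of
        %% Stop on a match that consumes nothing, to avoid looping forever.
        {ok, Result, Rest} when Rest =/= Input ->
            star(P, Rest, [Result | Acc]);
        _ ->
            {ok, lists:reverse(Acc), Input}
    end.

%% Negative lookahead (!e): succeeds only if e fails, consumes nothing.
bang(P) ->
    fun(Input) ->
        case P(Input) of
            fail -> {ok, [], Input};
            _    -> fail
        end
    end.
```

The point that most often surprises newcomers is ordered choice: with this sketch, a choice between "a" and "ab" applied to the input "ab" matches "a" and leaves "b" unconsumed, because the second alternative is never consulted once the first succeeds.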
Without specifying any transformations, Neotoma will return a nested list of the results of its parse, essentially an S-expression. In this form, the AST is not very useful; one needs to transform and annotate the tree into a useful data structure. Neotoma provides hooks into the parsing process in the form of the transform/3 function. Once you have generated your parser, you can edit this function in the generated file. The prototype is thus:
transform('nonterminal', Node, Index)
- nonterminal is the nonterminal that was successfully parsed.
- Node is a list of the results from sub-expressions, which may be raw terminals or the transformations of other nonterminals.
- Index is a tuple representing the position of the parser at the start of this expression, in the form {{line, L}, {column, C}}, where L and C are both integers.
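As a rough illustration of how the three arguments might be used, here is a minimal sketch of a module exporting transform/3. The module name transform_sketch, the two-rule grammar in the comment, and the exact shape of Node (nested character lists for matched text, {name, Result} tuples for labelled expressions) are assumptions made for this example; check the output of your own generated parser before relying on them.

```erlang
-module(transform_sketch).
-export([transform/3]).

%% Hypothetical grammar this sketch is written against:
%%
%%   sum    <- left:number "+" right:number;
%%   number <- [0-9]+;
%%
%% Assumptions (not confirmed by the text above): matched text arrives as
%% (possibly nested) character lists, and a labelled expression name:e
%% shows up in Node as a {name, Result} tuple.

%% number: collapse the matched digits into an integer.
transform(number, Node, _Index) ->
    list_to_integer(lists:flatten(Node));

%% sum: pick the labelled operands out of Node and tag the result with the
%% line on which the expression started.
transform(sum, Node, {{line, Line}, {column, _Col}}) ->
    Left  = proplists:get_value(left, Node),
    Right = proplists:get_value(right, Node),
    {add, Line, Left, Right};

%% Any other nonterminal: return the parse result unchanged.
transform(_Nonterminal, Node, _Index) ->
    Node.
```

The catch-all clause at the end keeps the default behaviour for every rule you have not written a transformation for yet, so the grammar can grow without the parser crashing on an unmatched nonterminal.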
While editing this function within the generated parser is easy, Neotoma does not currently allow Erlang transformation code inline with the grammar; therefore, I recommend that you put your transformations in a separate module. Doing so will allow you to develop your grammar and transformations independently, without the parser generator overwriting your transformations. You can do this by specifying the transform_module option to peg_gen:file/2. The module will be generated for you if it does not exist already. An example:
1> peg_gen:file("mygrammar.peg", [{transform_module, myast}]).
- Transformation code and supplemental code inline with the grammar.
- Support for parsing in binary form/UTF.
- Support for LFE and Reia.