[RFC007] Migrate the parser to the new AST #2083

yannham · 2024-10-29T13:59:10Z

Migrate the parser to the new AST

Following the step-by-step implementation of RFC007, this PR migrates the parser to output the new AST. The plan is to convert it to the old AST for the rest of the pipeline (typechecking, transformations, and evaluation), and to measure whether that conversion has a noticeable cost on various kinds of examples (small, big, small but importing big contracts, libraries, etc.).
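To make the plan concrete, here is a minimal sketch of a new-to-old conversion. Everything in it is hypothetical and heavily simplified: `NewAst`, `Term` and the `FromAst` trait only mimic the idea of a compact, borrowed new AST being converted into an `Rc`-based mainline term. They are not the actual definitions from `bytecode::ast` or `bytecode::ast::compat`.

```rust
use std::rc::Rc;

// Hypothetical, simplified "new" AST: children are plain borrowed references,
// so building a tree does not require any reference counting.
#[derive(Debug)]
enum NewAst<'a> {
    Num(f64),
    Var(&'a str),
    Add(&'a NewAst<'a>, &'a NewAst<'a>),
}

// Hypothetical, simplified "old" (mainline) term: children are Rc-allocated.
#[derive(Debug, PartialEq)]
enum Term {
    Num(f64),
    Var(String),
    Add(Rc<Term>, Rc<Term>),
}

// Conversion from the new representation to the old one.
trait FromAst<T> {
    fn from_ast(ast: &T) -> Self;
}

impl<'a> FromAst<NewAst<'a>> for Term {
    fn from_ast(ast: &NewAst<'a>) -> Self {
        match ast {
            NewAst::Num(n) => Term::Num(*n),
            NewAst::Var(id) => Term::Var(id.to_string()),
            // Every child becomes a fresh Rc allocation during conversion.
            NewAst::Add(l, r) => {
                Term::Add(Rc::new(Term::from_ast(*l)), Rc::new(Term::from_ast(*r)))
            }
        }
    }
}

// The rest of the (toy) pipeline only ever sees the old representation.
// Unbound variables evaluate to None.
fn eval(t: &Term) -> Option<f64> {
    match t {
        Term::Num(n) => Some(*n),
        Term::Var(_) => None,
        Term::Add(l, r) => Some(eval(l)? + eval(r)?),
    }
}

fn main() {
    // Build `1 + 2` in the new representation, then convert it.
    let one = NewAst::Num(1.0);
    let two = NewAst::Num(2.0);
    let sum = NewAst::Add(&one, &two);
    let term = Term::from_ast(&sum);
    println!("{:?} evaluates to {:?}", term, eval(&term));

    // An unbound variable converts fine but does not evaluate.
    let x = NewAst::Var("x");
    println!("{:?}", eval(&Term::from_ast(&x)));
}
```

The conversion is a plain structural recursion, so its cost grows linearly with the size of the tree. That cost is what the measurements in this PR try to quantify.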

Reviewing

The diff is quite big, but much of it consists of mechanical changes. In particular, I'm not sure that reviewing the changes to the grammar.lalrpop file is really worthwhile. Changes to bytecode::ast, bytecode::ast::compat and other modules are probably worth looking into, though. For parser::utils and parser::uniterm, I'm not sure: most of it is mechanical, but it wasn't entirely trivial either, in particular the gymnastics around type variable fixing. The latter might benefit from another pair of eyes.
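Here is a rough, simplified sketch of what type variable fixing does. It assumes, without confirmation from this PR, that the parser first reads every identifier in a type as a plain variable, and that a later pass turns the identifiers not bound by an enclosing `forall` into term variables used as contracts. The `Type` enum and the `fix_type_vars` signature below are hypothetical. The real pass also has to track the kind of each `forall` binder, which this sketch ignores.

```rust
// Hypothetical, simplified type representation.
#[derive(Debug, Clone, PartialEq)]
enum Type {
    Number,
    /// Identifier as produced by the parser, before fixing.
    Var(String),
    /// Unbound identifier reinterpreted as a term variable used as a contract.
    Contract(String),
    Arrow(Box<Type>, Box<Type>),
    Forall(String, Box<Type>),
}

/// Replace each identifier that is not bound by an enclosing `forall` with a
/// contract. `bound` is used as a stack, so nested binders are scoped correctly.
fn fix_type_vars(ty: Type, bound: &mut Vec<String>) -> Type {
    match ty {
        Type::Var(id) if bound.contains(&id) => Type::Var(id),
        Type::Var(id) => Type::Contract(id),
        Type::Arrow(dom, codom) => Type::Arrow(
            Box::new(fix_type_vars(*dom, bound)),
            Box::new(fix_type_vars(*codom, bound)),
        ),
        Type::Forall(var, body) => {
            // The binder is only in scope while fixing the body.
            bound.push(var.clone());
            let body = fix_type_vars(*body, bound);
            bound.pop();
            Type::Forall(var, Box::new(body))
        }
        // Base types and already-fixed contracts are left untouched.
        other => other,
    }
}

fn main() {
    // forall a. a -> b   (where `b` is not bound by any forall)
    let ty = Type::Forall(
        "a".to_string(),
        Box::new(Type::Arrow(
            Box::new(Type::Var("a".to_string())),
            Box::new(Type::Var("b".to_string())),
        )),
    );
    println!("{:?}", fix_type_vars(ty, &mut Vec::new()));
}
```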

Perf impact

I've written a detailed report below about the performance impact of this PR. The TL;DR is that although AST conversion takes up a surprisingly large share of the overall parsing time, the net difference is only a few percent, which is within the noise threshold: some runs even showed a net improvement. I thus think it's reasonable to move forward. The more of the pipeline we migrate to the new AST, the better the performance should get, since the new AST is smaller and has better locality.

github-actions · 2024-11-20T15:39:35Z

Bencher Report

Branch	rfc007/parsing
Testbed	ubuntu-latest


Benchmark	Latency (ns)
fibonacci 10	493,720.00
foldl arrays 50	1,648,400.00
foldl arrays 500	6,767,200.00
foldr strings 50	6,967,400.00
foldr strings 500	61,108,000.00
generate normal 250	42,006,000.00
generate normal 50	1,905,000.00
generate normal unchecked 1000	3,325,300.00
generate normal unchecked 200	758,090.00
pidigits 100	3,166,100.00
pipe normal 20	1,489,500.00
pipe normal 200	9,963,400.00
product 30	832,600.00
scalar 10	1,532,500.00
sum 30	811,370.00


First stab at making the parser compatible with the new AST representation (`bytecode::ast::Ast`). This is a heavy refactoring which required updating most of `parser::uniterm` and `parser::utils`, as well as `grammar.lalrpop`. The current version is far from compiling; fixing the compiler errors is planned as follow-up work.

As we move toward a bytecode compiler and a bytecode virtual machine, we are replacing the front part of the pipeline with the new AST representation. The bytecode module was previously gated behind an experimental feature, the idea being that this feature would enable the whole bytecode compiler pipeline. However, for now, we only have a new AST representation, and it's being used in the mainline Nickel parser (and soon in the typechecker, etc.). We thus need access to the new AST representation by default, and it doesn't make much sense to gate it behind a feature. We'll reintroduce the feature once we have a prototype compiler and a bytecode virtual machine, at which point it will make sense to use it to toggle between the legacy tree-walking interpreter and the new bytecode compiler.
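The sketch below illustrates this gating. The AST module is always compiled, and a Cargo feature is only used to select the evaluation backend. The feature name `bytecode-vm`, the contents of the `ast` module and `backend_name` are all hypothetical placeholders, not the names used in the Nickel codebase.

```rust
// The new AST module is compiled unconditionally.
pub mod ast {
    /// Placeholder for the new AST representation (always available).
    #[derive(Debug)]
    pub struct Ast;
}

// A future, hypothetical `bytecode-vm` feature would only pick the backend.
#[cfg(feature = "bytecode-vm")]
fn backend_name() -> &'static str {
    "bytecode compiler + VM"
}

#[cfg(not(feature = "bytecode-vm"))]
fn backend_name() -> &'static str {
    "tree-walking interpreter"
}

fn main() {
    // The AST is usable whatever the feature set.
    let ast = ast::Ast;
    println!("{:?} evaluated by the {}", ast, backend_name());
}
```

When built without that feature enabled, `backend_name()` returns the tree-walking variant, and the AST module is available either way.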


yannham · 2024-11-25T17:25:04Z

Nickel AST conversion impact report

Here is a preliminary performance impact report of this change on small and larger examples.

Methodology

I did an end-to-end run of `nickel eval <foo> --metrics 1>/dev/null` (or sometimes `nickel eval --field foo` when relevant, for example for Organist) on a dev profile with metrics enabled, comparing master (f1c826d) and the HEAD of this PR (4641104).

The metrics record the runtime of individual parts of the pipeline, in milliseconds. Not all operations are measured, but the measured ones still cover most of the actual runtime.

Here is what's been measured:

runtime:ast_conversion: only for the version of this PR, the time spent calling `from_ast` on the root node after parsing, to convert the new representation to the old one. This includes the conversion of the stdlib.
runtime:eval: the time taken by pure evaluation (`vm.eval()`).
runtime:parse:nickel: the time taken by calls to the `parse_xxx` methods for Nickel code (excluding JSON, etc.). For the version of this PR, this includes the AST conversion.
runtime:prepare_main: the time taken by calling `prepare_eval` on the main program. This includes parsing, typechecking, and program transformations, but excludes the preparation of the stdlib.
runtime:prepare_stdlib: the time taken by calling `prepare_eval` on the stdlib. This includes parsing, typechecking, and program transformations.
runtime:type_check: the time taken by calling `type_check` on the root node of the AST.
total: computed as prepare_main + prepare_stdlib + eval, since preparation covers most of the beginning of the pipeline (including parsing and AST conversion). This isn't the actual runtime of the whole command, because some things aren't measured, but it should account for most of it: it's just the total of what we measure.

The measures are thus nested: type_check <= prepare_main, and ast_conversion <= parse:nickel <= prepare_main + prepare_stdlib. eval is disjoint from the rest.
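A small helper can summarize the metrics and how they nest. This is a hypothetical sketch for checking a set of measurements, not code from the PR. The numbers in `main` are made up for illustration and are not measurements from this report.

```rust
/// Hypothetical container for one run's metrics, in milliseconds.
#[derive(Debug, Clone, Copy)]
struct Metrics {
    ast_conversion: f64, // runtime:ast_conversion
    eval: f64,           // runtime:eval
    parse_nickel: f64,   // runtime:parse:nickel
    prepare_main: f64,   // runtime:prepare_main
    prepare_stdlib: f64, // runtime:prepare_stdlib
    type_check: f64,     // runtime:type_check
}

impl Metrics {
    /// total = prepare_main + prepare_stdlib + eval
    fn total(&self) -> f64 {
        self.prepare_main + self.prepare_stdlib + self.eval
    }

    /// Fraction of Nickel parsing time spent in AST conversion.
    fn conversion_share(&self) -> f64 {
        self.ast_conversion / self.parse_nickel
    }

    /// Check the nesting of the measures:
    /// type_check <= prepare_main and
    /// ast_conversion <= parse:nickel <= prepare_main + prepare_stdlib.
    fn is_consistent(&self) -> bool {
        self.type_check <= self.prepare_main
            && self.ast_conversion <= self.parse_nickel
            && self.parse_nickel <= self.prepare_main + self.prepare_stdlib
    }
}

fn main() {
    // Made-up numbers, for illustration only.
    let m = Metrics {
        ast_conversion: 6.0,
        eval: 2.0,
        parse_nickel: 10.0,
        prepare_main: 4.0,
        prepare_stdlib: 12.0,
        type_check: 1.0,
    };
    println!(
        "total = {} ms, conversion share = {:.0}%, consistent = {}",
        m.total(),
        100.0 * m.conversion_share(),
        m.is_consistent()
    );
}
```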

Findings

Interestingly, my initial expectations turned out to be partly wrong. AST conversion takes up a large part of the overall parsing time, consistently around 55-65% across all example sizes. Since the runtime of very small examples is dominated by preparing the stdlib, and thus by parsing it, this looks bad at first: we take a big hit on parsing for small examples.

However, this isn't the case in practice. Compared with the version that doesn't perform AST conversion, this one sometimes even performs better overall! In general, the difference on small examples is very small and within the noise threshold (on the order of one percent), and it goes in both directions, both for the total runtime and for parsing times. My interpretation is that on master, we actually spend more than half of the parsing time allocating Rcs on the heap to build the current AST. The new AST seems to be very cheap to build, so this allocation cost is now mostly just deferred to the AST conversion phase.
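A toy model illustrates the allocation argument. In this sketch, an index-based `Vec` stands in for the storage of the new AST. This is an assumption for illustration, not a description of Nickel's actual allocator. Building a tree only pushes nodes into one growing buffer, whereas converting it to an `Rc`-based tree costs one heap allocation per node. `Arena`, `ArenaNode`, `RcNode`, `build_balanced` and `to_rc` are hypothetical names.

```rust
use std::rc::Rc;

// Old-style node: each node is its own heap allocation behind an Rc.
#[derive(Debug)]
enum RcNode {
    Leaf(u32),
    Pair(Rc<RcNode>, Rc<RcNode>),
}

// New-style node: children are indices into a single shared buffer.
#[derive(Debug, Clone, Copy)]
enum ArenaNode {
    Leaf(u32),
    Pair(usize, usize),
}

// A Vec standing in for an arena; counts how often its buffer was reallocated.
struct Arena {
    nodes: Vec<ArenaNode>,
    reallocations: usize,
}

impl Arena {
    fn new() -> Self {
        Arena { nodes: Vec::new(), reallocations: 0 }
    }

    fn push(&mut self, node: ArenaNode) -> usize {
        let old_capacity = self.nodes.capacity();
        self.nodes.push(node);
        if self.nodes.capacity() != old_capacity {
            self.reallocations += 1;
        }
        self.nodes.len() - 1
    }
}

// "Parsing": build a balanced tree over the leaves lo..hi (requires lo < hi).
fn build_balanced(arena: &mut Arena, lo: u32, hi: u32) -> usize {
    if hi - lo == 1 {
        return arena.push(ArenaNode::Leaf(lo));
    }
    let mid = lo + (hi - lo) / 2;
    let left = build_balanced(arena, lo, mid);
    let right = build_balanced(arena, mid, hi);
    arena.push(ArenaNode::Pair(left, right))
}

// "Conversion": rebuild the tree with Rcs, counting one Rc::new per node.
fn to_rc(arena: &Arena, id: usize, rc_allocs: &mut usize) -> Rc<RcNode> {
    *rc_allocs += 1;
    match arena.nodes[id] {
        ArenaNode::Leaf(v) => Rc::new(RcNode::Leaf(v)),
        ArenaNode::Pair(l, r) => {
            let left = to_rc(arena, l, rc_allocs);
            let right = to_rc(arena, r, rc_allocs);
            Rc::new(RcNode::Pair(left, right))
        }
    }
}

// Sanity check that the converted tree has the same content.
fn sum(node: &RcNode) -> u64 {
    match node {
        RcNode::Leaf(v) => u64::from(*v),
        RcNode::Pair(l, r) => sum(l) + sum(r),
    }
}

fn main() {
    let mut arena = Arena::new();
    let root = build_balanced(&mut arena, 0, 1000);
    let mut rc_allocs = 0;
    let tree = to_rc(&arena, root, &mut rc_allocs);
    println!(
        "{} nodes: {} buffer reallocations while building, {} Rc allocations while converting (sum = {})",
        arena.nodes.len(),
        arena.reallocations,
        rc_allocs,
        sum(&tree)
    );
}
```

Building the tree triggers only a handful of buffer reallocations, because the `Vec` grows its capacity in large steps. Converting it triggers one `Rc` allocation per node, which is where the cost moves in this model.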

Among the bigger examples, Mantis shows around +6% parsing time and +4.3% overall overhead. It is unique in that its evaluation time is very low (it's almost a static config), so its runtime is still largely dominated by parsing.

The other large examples are heavily dominated by evaluation (more than 95% of the runtime) and spend only a few percent in parsing, so this PR matters even less for their overall performance. I suspect the differences we see (such as +3.3% on OPL v2) are rather due to variability: pure evaluation seems to take a small hit, but there's no apparent reason for it. As some examples take a while to run, I haven't done any averaging or warm-up runs, so we can expect some volatility. Hopefully this is still enough to validate that this change doesn't have much performance impact (and that the new AST might cut parsing time by around 50% once we no longer need to perform the conversion!).
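A back-of-the-envelope model shows why a parsing overhead barely registers on evaluation-heavy examples. If only one phase changes, the relative change of the total runtime equals that phase's share of the runtime multiplied by the phase's own relative change. The function below is a hypothetical helper, and the inputs in `main` are illustrative rather than measured.

```rust
/// Relative change of the total runtime when only one phase changes.
///
/// With total T = P + R and the phase P changing by a relative amount d,
/// the new total is P * (1 + d) + R, so (T' - T) / T = (P / T) * d.
fn overall_change(phase_share: f64, phase_change: f64) -> f64 {
    phase_share * phase_change
}

fn main() {
    // Illustrative inputs: parsing is 5% of the runtime and gets 6% slower.
    let change = overall_change(0.05, 0.06);
    println!("overall change: {:.2}%", 100.0 * change);
}
```

Under this model, a +6% parsing overhead on a program that spends 5% of its time parsing changes the total by only a fraction of a percent. A difference of several percent on such a program is therefore more likely to come from noise than from parsing.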


yannham force-pushed the rfc007/parsing branch from 9655af8 to 751da65 on October 30, 2024 14:49

yannham mentioned this pull request Oct 31, 2024

[RFC007] Add a builder module for the new AST #2085

Merged

yannham force-pushed the rfc007/parsing branch 3 times, most recently from 8cedc56 to 2de6236 on November 20, 2024 09:12

yannham mentioned this pull request Nov 20, 2024

Add missing implementation of from_ast for Record #2100

Merged

yannham force-pushed the rfc007/parsing branch from 2de6236 to f1da1dd on November 20, 2024 09:38

yannham mentioned this pull request Nov 20, 2024

[RFC007] Add Seal and Unseal to the new AST primops #2101

Merged

yannham force-pushed the rfc007/parsing branch 2 times, most recently from 74f9906 to eb338a1 on November 20, 2024 15:24

yannham mentioned this pull request Nov 21, 2024

[RFC007] Improve/simplify record representation in the new AST #2102

Merged

yannham force-pushed the rfc007/parsing branch from 0117944 to a394c35 on November 21, 2024 21:07

yannham added 17 commits November 22, 2024 15:41

Fix almost all grammar errors, fix parser/mod.rs

8483d74

Fix last errors to make it compile

3a611b1

Fix curried operator handling and make its impl nicer

caa860e

Revert to the previous handling of last fields (might need conflict resolution for RepeatSep1)

01164a3

Fix compilation errors and spurious grammar ambiguity

1ee2261

Fix unwrapping position panicking

85fd2b0

Fill todo!() when parsing seal/unseal

7887176

Entirely get rid of rec priorities leftovers

d9ea8ad

Fix fix_type_vars for forall binders, improve code doc sporadically

458f1f2

Fix handling of zero-ary application/variable

0539878

Fix test code and corner case of new -> mainline conversion

0bfc727

[Maybe to drop?] Fix failing test (symbolic string being recursive records)

33735fe

Fix swapped seal/unseal

fcb29c2

Fix missing position for elaborated merge (piecewise defs)

ed6d0bc

Remove FieldDef and record elaboration from parser

440a827

yannham added 3 commits November 22, 2024 17:42

Fix compilation error after rebase

a770465

Fix missing field name; dont use generated ident for op curryfication

1f8fb29

Fix missing position panic, remove unused function

1274e47

yannham force-pushed the rfc007/parsing branch from a394c35 to 1274e47 on November 25, 2024 13:54

Add measures for AST conversion

4641104

yannham marked this pull request as ready for review November 25, 2024 17:35

yannham requested a review from jneem November 25, 2024 17:40

Fix clippy and cargo doc warnings

e86bc0a

yannham force-pushed the rfc007/parsing branch from 052eb46 to e86bc0a on November 26, 2024 17:41

jneem approved these changes Nov 27, 2024


core/src/parser/uniterm.rs (outdated, resolved)

core/src/parser/uniterm.rs (resolved)

Update core/src/parser/uniterm.rs

72d663d

Co-authored-by: jneem

yannham enabled auto-merge November 27, 2024 09:45

yannham added this pull request to the merge queue Nov 27, 2024

Merged via the queue into master with commit 768e1d2 Nov 27, 2024
5 checks passed

yannham deleted the rfc007/parsing branch November 27, 2024 10:01
