Breaking changes for v1 #119

jakobnissen · 2023-03-09T08:30:57Z

Supersedes #95

Todo

Fixup the commit history (review if fixes are necessary)
Get test coverage near 100%
Figure out if you can get CI on M1 mac

Closes #52
Closes #71
Closes #80
Closes #82
Closes #91
Closes #102
Closes #111
Closes #115
Closes #116

codecov · 2023-03-09T08:33:14Z

Codecov Report

Patch coverage: 94.75% and project coverage change: +6.50 🎉

Comparison is base (3b5c913) 89.48% compared to head (fe0feb1) 95.98%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #119      +/-   ##
==========================================
+ Coverage   89.48%   95.98%   +6.50%     
==========================================
  Files          14       16       +2     
  Lines        1683     1793     +110     
==========================================
+ Hits         1506     1721     +215     
+ Misses        177       72     -105

Flag	Coverage Δ
unittests	`95.98% <94.75%> (+6.50%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
src/action.jl	`97.43% <ø> (ø)`
src/dot.jl	`95.78% <ø> (ø)`
src/precond.jl	`84.09% <ø> (ø)`
src/codegen.jl	`91.69% <87.50%> (+0.10%)`	⬆️
src/workload.jl	`89.47% <89.47%> (ø)`
src/re.jl	`93.40% <95.74%> (+0.73%)`	⬆️
src/tokenizer.jl	`96.92% <96.72%> (+4.87%)`	⬆️
src/stream.jl	`96.92% <96.92%> (ø)`
src/Automa.jl	`100.00% <100.00%> (ø)`
src/byteset.jl	`100.00% <100.00%> (ø)`
... and 4 more

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

kescobo · 2023-04-22T01:06:40Z

FYI, it is still on my radar to kick the tires of this. I think early next month is probably realistic.

kescobo · 2023-07-05T19:37:25Z

I think early next month is probably realistic.

I was wroooong 😭

Do you have a timeline for when you want this merged. Realistically, I need some pressure to make it a priority

jakobnissen · 2023-07-07T12:59:19Z

Haha don't I know that feeling. I also expected I would be done with this branch in the summer of 2022, so... 😅
My holidays start today, so I expect I can finish the work before the holidays are over in about 3 weeks. Maybe. No promises. Although if you want a deadline, then July 17th is probably when I'll pick this work up and finish it.

Honestly, four generators and a tokenizer was excessive. Inline generator was probably not really used, and the new simd generator made the goto generator obsolete. Remove inline and old goto generator, and rename simd generator goto.

Before this PR, a user could forget or mis-spell a symbol in the action Dict passed to generate_exec_code. Now add a check to throw an error if this happens.

nfa2dfa now errors if any RE object has a .actions field with an unsupported key. This prevents a user from mistyping e.g. `pat.actions[:etner] = [:foo]` and having it silently do nothing.

The user probably wants to use the input error code in most use cases, except when running a machine in execute mode, debug mode, or running a validator. Adding the input error code to generated Readers is a challenge for another day

One of the issues with using Automa is that its lack of exports makes it unclear what is internal/external. Also, the three typical using-statements needed to use Automa is just visual noise.

If two edges in equivalent paths have distinct preconditions, these could be used to distinguish the two edges, and so they should not provoke an error.

The gensym'd symbols in Automa's constant expressions are computed on precompilation of Automa. These can then clash with symbols that are computed on the precompilation of any particular generated code in downstream packages leading to very confusing bugs.

With the :goto generator, checking bounds is not permitted anyway. And the table generator is so slow that it makes no sense to disable checking of bounds - then you might as well use the goto generator

Many potential users of Automa are not interested in parsing from IOs, but only buffers. For those users, the IO-parsing functionality of Automa is not needed, and so there is no need for dependency on TranscodingStreams.

NFAs with ambiguities often contain multiple ambiguities. Displaying the simplest ambiguity when erroring makes debugging easier - especially compared to when the shown ambiguity can never happen due to another ambiguity.

An oversight in the ambiguity check meant that actions placed on non-epsilon edges were accidentally not included in the paths for validation. MWE: `compile(onfinal!(re"a", :a) | onfinal!(re"a", :b))` This breaks tokenizers, so we manually skip ambiguity check in tokenizers. In the case of conflicting actions in tokenizers, this will cause the longest matching token to be emitted.

The tokenizer has a completely new design and API. * It's now much easier to use * It's now lazy by default * It's much faster, although not completely optimised. Its API is amenable to further optimisation * It handles errors automatically See issue #116

Users should not have access to the module directly. Instead, export the RE struct, and also allow users to construct regex with `RE(str)`.

Instead of buffering an entire line, simply keep track of the number of columns cleared from the buffer. This reaches some more into TranscodingStreams privates, but it's well tested.

Currently, the functions `re2nfa` and `nfa2dfa` can produce dead (unreachable) nodes, which is pointless. Instead of relying on the user to themselves remove dead nodes by calling `remove_dead_nodes`, this should just happen automatically.

Since Automa doesn't just extend an existing function but adds a new function based on TranscodingStreams, this is not the intended use case of extensions.

I've come to the conclusion that Julia does not make it possible to robustly check what CPU instructions the user has available. The current options are all undocumented, complex and brittle, and not suitable for code that cannot be accepted to break at any time Whenever a robust way of checking for CPU instructions are available, the change is easy to revert.

Allow preconditions to be set to `:enter` only, and have directly conflicting preconditions resolve an ambiguous NFA.

jakobnissen · 2023-07-19T07:41:58Z

Ok I'm going to merge this now and release Automa v1. @kescobo

jakobnissen mentioned this pull request Mar 9, 2023

Breaking changes for V1 #95

Merged

jakobnissen force-pushed the v1 branch 2 times, most recently from 2999123 to fedab05 Compare March 9, 2023 08:51

jakobnissen force-pushed the v1 branch 2 times, most recently from 98283df to 9187d9e Compare July 19, 2023 05:52

jakobnissen added 21 commits July 19, 2023 09:24

Remove SIMD and inline generator

4b6fe1f

Honestly, four generators and a tokenizer was excessive. Inline generator was probably not really used, and the new simd generator made the goto generator obsolete. Remove inline and old goto generator, and rename simd generator goto.

Remove loop unrolling

93aef95

Allow CodeGenContext to be elided

67556af

Update benchmarks

088b9e4

Saner default for p_eof and p_end

8ad0503

Add generate_code convenience function

1e95ee4

Add validator convenience function

cbc8215

Enable ambiguity check

7c57e3c

Enforce action dict symbols are same as machines

1e79027

Before this PR, a user could forget or mis-spell a symbol in the action Dict passed to generate_exec_code. Now add a check to throw an error if this happens.

Add check for invalid RE.action keys

5357b75

nfa2dfa now errors if any RE object has a .actions field with an unsupported key. This prevents a user from mistyping e.g. `pat.actions[:etner] = [:foo]` and having it silently do nothing.

Add input error code to generate_code function

5d5e517

The user probably wants to use the input error code in most use cases, except when running a machine in execute mode, debug mode, or running a validator. Adding the input error code to generated Readers is a challenge for another day

Export user-facing names

d238472

One of the issues with using Automa is that its lack of exports makes it unclear what is internal/external. Also, the three typical using-statements needed to use Automa is just visual noise.

Check preconditions before declaring NFA ambiguous

f333cfb

If two edges in equivalent paths have distinct preconditions, these could be used to distinguish the two edges, and so they should not provoke an error.

Fix EOF check in machine error code

a743327

Add more comments

01e26e5

Also trigger default error when cs > 0

46f8cfb

Remove checkbounds option

d3b6ada

With the :goto generator, checking bounds is not permitted anyway. And the table generator is so slow that it makes no sense to disable checking of bounds - then you might as well use the goto generator

Make clean work

fe52c0a

Make Variables easier to construct

8b50068

Add tests for regex set operations

f288142

jakobnissen added 24 commits July 19, 2023 09:24

Make TranscodingStreams an optional dependency

afb2e47

Many potential users of Automa are not interested in parsing from IOs, but only buffers. For those users, the IO-parsing functionality of Automa is not needed, and so there is no need for dependency on TranscodingStreams.

Error with shortest known ambiguity

6ad6a99

NFAs with ambiguities often contain multiple ambiguities. Displaying the simplest ambiguity when erroring makes debugging easier - especially compared to when the shown ambiguity can never happen due to another ambiguity.

Rewrite tokenizer

decd39f

The tokenizer has a completely new design and API. * It's now much easier to use * It's now lazy by default * It's much faster, although not completely optimised. Its API is amenable to further optimisation * It handles errors automatically See issue #116

Rename generate_validator_function

a2681b5

Export regex struct instead of module

23f47e2

Users should not have access to the module directly. Instead, export the RE struct, and also allow users to construct regex with `RE(str)`.

Tweak: Allow | and & ops b/w chars/str and RE

fb98649

Remove report_col kwarg

35d578b

Instead of buffering an entire line, simply keep track of the number of columns cleared from the buffer. This reaches some more into TranscodingStreams privates, but it's well tested.

Add SnoopPrecompile

2d16183

Rewrite documentation

937bebb

Bump CI version to Julia 1.6

1dda546

Make generate_buffer_validator goto into kwarg

151a4c5

Update README.md

2bb69f2

Add documentation preview

3e04679

Always remove dead nodes

3c953e6

Currently, the functions `re2nfa` and `nfa2dfa` can produce dead (unreachable) nodes, which is pointless. Instead of relying on the user to themselves remove dead nodes by calling `remove_dead_nodes`, this should just happen automatically.

Do not make TranscodingStreams an extension

6b7ef62

Since Automa doesn't just extend an existing function but adds a new function based on TranscodingStreams, this is not the intended use case of extensions.

Add todo to gitignore

fec085b

Migrate from SnoopPrecompile to PrecompileTools

76145dc

Fix preconditions

6e3b676

Allow preconditions to be set to `:enter` only, and have directly conflicting preconditions resolve an ambiguous NFA.

Add more tests

e2b285a

Some JET fixes

355c869

Improve string/char and RE operations

b72dae5

Fix Project

fe0feb1

jakobnissen force-pushed the v1 branch from 181463f to fe0feb1 Compare July 19, 2023 07:31

jakobnissen merged commit d032b16 into master Jul 19, 2023

jakobnissen deleted the v1 branch July 19, 2023 07:42

jakobnissen mentioned this pull request Jul 19, 2023

[Question] Practical example of how to use the when field in RE #52

Closed

CiaranOMara mentioned this pull request Sep 3, 2023

Cap Automa compat to address Stream submodule removal BioJulia/GFF3.jl#16

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Breaking changes for v1 #119

Breaking changes for v1 #119

jakobnissen commented Mar 9, 2023 •

edited

Loading

codecov bot commented Mar 9, 2023 •

edited

Loading

kescobo commented Apr 22, 2023

kescobo commented Jul 5, 2023

jakobnissen commented Jul 7, 2023

jakobnissen commented Jul 19, 2023

Breaking changes for v1 #119

Breaking changes for v1 #119

Conversation

jakobnissen commented Mar 9, 2023 • edited Loading

Todo

codecov bot commented Mar 9, 2023 • edited Loading

Codecov Report

kescobo commented Apr 22, 2023

kescobo commented Jul 5, 2023

jakobnissen commented Jul 7, 2023

jakobnissen commented Jul 19, 2023

jakobnissen commented Mar 9, 2023 •

edited

Loading

codecov bot commented Mar 9, 2023 •

edited

Loading