Breaking changes for v1 #119
Codecov Report
Coverage Diff

| | master | #119 | +/- |
|---|---|---|---|
| Coverage | 89.48% | 95.98% | +6.50% |
| Files | 14 | 16 | +2 |
| Lines | 1683 | 1793 | +110 |
| Hits | 1506 | 1721 | +215 |
| Misses | 177 | 72 | -105 |
2999123 to fedab05
FYI, it is still on my radar to kick the tires on this. I think early next month is probably realistic.
I was wroooong 😭 Do you have a timeline for when you want this merged? Realistically, I need some pressure to make it a priority.
Haha, don't I know that feeling. I also expected I would be done with this branch in the summer of 2022, so... 😅
98283df to 9187d9e
Honestly, four generators and a tokenizer were excessive. The inline generator was probably not really used, and the new simd generator made the goto generator obsolete. Remove the inline and old goto generators, and rename the simd generator to goto.
Before this PR, a user could forget or misspell a symbol in the action Dict passed to generate_exec_code. Now add a check that throws an error if this happens.
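A minimal sketch of the kind of check described above, assuming the set of action names used by the machine is known up front. The helper `check_action_names` and its arguments are hypothetical and use only Base Julia; this is not Automa's actual implementation.

```julia
# Hypothetical helper (not part of Automa): error if any action name the
# machine needs is absent from the user-supplied Dict. A misspelled key shows
# up here as a missing name, because the correctly spelled one is not present.
function check_action_names(required::Set{Symbol}, provided::Dict{Symbol,Expr})
    missing_names = setdiff(required, keys(provided))
    if !isempty(missing_names)
        names = join(sort!(collect(missing_names)), ", ")
        error("Missing actions: ", names)
    end
    return nothing
end

# The user wrote :cuont instead of :count, so the check throws.
actions = Dict(:cuont => :(n += 1))
try
    check_action_names(Set([:count]), actions)
catch err
    println(err.msg)
end
```

Failing early, at code generation time, means the typo surfaces as a clear error instead of an action that silently never runs.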
nfa2dfa now errors if any RE object has a .actions field with an unsupported key. This prevents a user from mistyping e.g. `pat.actions[:etner] = [:foo]` and having it silently do nothing.
The user probably wants to use the input error code in most use cases, except when running a machine in execute mode, debug mode, or running a validator. Adding the input error code to generated Readers is a challenge for another day.
One of the issues with using Automa is that its lack of exports makes it unclear what is internal/external. Also, the three typical using-statements needed to use Automa are just visual noise.
If two edges in equivalent paths have distinct preconditions, these could be used to distinguish the two edges, and so they should not provoke an error.
The gensym'd symbols in Automa's constant expressions are computed on precompilation of Automa. These can then clash with symbols that are computed on the precompilation of any particular generated code in downstream packages, leading to very confusing bugs.
With the :goto generator, checking bounds is not permitted anyway. And the table generator is so slow that it makes no sense to disable bounds checking there; in that case you might as well use the goto generator.
Many potential users of Automa are not interested in parsing from IOs, but only buffers. For those users, the IO-parsing functionality of Automa is not needed, and so there is no need for dependency on TranscodingStreams.
NFAs with ambiguities often contain multiple ambiguities. Displaying the simplest ambiguity when erroring makes debugging easier - especially compared to when the shown ambiguity can never happen due to another ambiguity.
An oversight in the ambiguity check meant that actions placed on non-epsilon edges were accidentally not included in the paths for validation. MWE: `compile(onfinal!(re"a", :a) | onfinal!(re"a", :b))`. This breaks tokenizers, so we manually skip the ambiguity check in tokenizers. In the case of conflicting actions in tokenizers, this will cause the longest matching token to be emitted.
The tokenizer has a completely new design and API.
* It's now much easier to use
* It's now lazy by default
* It's much faster, although not completely optimised. Its API is amenable to further optimisation
* It handles errors automatically

See issue #116
Users should not have access to the module directly. Instead, export the RE struct, and also allow users to construct regex with `RE(str)`.
Instead of buffering an entire line, simply keep track of the number of columns cleared from the buffer. This reaches some more into TranscodingStreams privates, but it's well tested.
Currently, the functions `re2nfa` and `nfa2dfa` can produce dead (unreachable) nodes, which is pointless. Instead of relying on users to remove dead nodes themselves by calling `remove_dead_nodes`, this should just happen automatically.
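As an illustration of what removing unreachable nodes involves, here is a sketch on a plain adjacency-list graph. It assumes "dead" means "not reachable from the start node", as the message above puts it. The function `remove_unreachable` and the `Dict`-based graph are hypothetical stand-ins, not Automa's `remove_dead_nodes` or its node types.

```julia
# Hypothetical sketch (not Automa's implementation): keep only the nodes that
# can be reached from `start`, found with an iterative depth-first search.
function remove_unreachable(edges::Dict{Int,Vector{Int}}, start::Int)
    seen = Set{Int}([start])
    stack = [start]
    while !isempty(stack)
        node = pop!(stack)
        for target in get(edges, node, Int[])
            if target ∉ seen
                push!(seen, target)
                push!(stack, target)
            end
        end
    end
    # Drop every node that the search never visited.
    return Dict(node => targets for (node, targets) in edges if node in seen)
end

# Node 4 points into the graph, but nothing reachable from node 1 points to it.
graph = Dict(1 => [2], 2 => [1, 3], 3 => Int[], 4 => [3])
pruned = remove_unreachable(graph, 1)
```

Running the pruning as the last step of graph construction is what makes a separate user-facing cleanup call unnecessary.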
Since Automa doesn't just extend an existing function but adds a new function based on TranscodingStreams, this is not the intended use case of extensions.
I've come to the conclusion that Julia does not make it possible to robustly check what CPU instructions the user has available. The current options are all undocumented, complex and brittle, and not suitable for code that must not break unexpectedly. Whenever a robust way of checking for CPU instructions is available, the change is easy to revert.
Allow preconditions to be set to `:enter` only, and have directly conflicting preconditions resolve an ambiguous NFA.
Ok, I'm going to merge this now and release Automa v1. @kescobo
Supersedes #95
Todo
Closes #52
Closes #71
Closes #80
Closes #82
Closes #91
Closes #102
Closes #111
Closes #115
Closes #116