Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(lib): add forbid_unsafe feature to disable unsafe code #413

Merged
merged 29 commits into from
Sep 13, 2024

Conversation

davidkern
Copy link
Contributor

@davidkern davidkern commented Aug 19, 2024

See Issue: #398

As discussed in the linked issue, this PR adds an allow_unsafe feature and gates all unsafe code on it.
This does not include that feature in the default features, so the effect of this PR would be to effectively remove unsafe code from logos unless it is opted-in.

There should be a discussion (on the issue) of the right strategy to roll this out. Removing unsafe code immediately might not be the best option.

There is an additional issue with the Source trait which this PR does not currently resolve.

Benchmarks

Throughput of the benches for this PR and the original code (in GiB/s). Method: bench was run on release build four times, alternating between toggling allow_unsafe on and off. No extra effort was made to quiesce the machine other than just close other desktop apps.

It'd be very beneficial to get benchmarks on other CPUs, given the surprising result for the i7.

Mac M3 Air

Benchmark Safe (this PR) Unsafe (original) Delta
iterate/identifiers 1.6214 1.6862 -3.84%
iterate/strings 1.2354 1.2479 -1.00%
iterate/keywords_operators_and_punctators 2.0598 2.0721 -0.59%
count_ok/identifiers 1.6191 1.6742 -3.29%
count_ok/keywords_operators_and_punctators 1.2391 1.2423 -0.25%
count_ok/strings 2.0746 2.0782 -0.17%

Disabling unsafe code does slow down throughput by a small amount, though not as much as I anticipated.

i7-7700HQ (Alienware 13 R3)

Benchmark Safe (this PR) Unsafe (original) Delta
iterate/identifiers 1.0751 0.9426 +14.06%
iterate/strings 0.9987 0.9204 +8.51%
iterate/keywords_operators_and_punctators 1.3997 1.2977 +7.86%
count_ok/identifiers 1.0430 0.9526 +9.49%
count_ok/keywords_operators_and_punctators 0.9497 0.9088 +4.50%
count_ok/strings 1.3772 1.2962 +6.25%

I did not expect this result, at all. I suspect the compiler is able to take an optimization it otherwise couldn't, but I've not investigated why yet.

Next steps

  • Some more benchmarks
  • Decision on default state of feature flag or just remove unsafe
  • Documentation
  • Mark Source unsafe if unsafe code is enabled - I think I have to duplicate the entire trait under a cfg because you can't conditionally compile an effect (unsafe) afaik
  • read_byte_unchecked use is emitted by the code generator, will need to bring it back and also introduce a safe alternative

@jeertmans
Copy link
Collaborator

Thanks for your detailed PR and comment!

This work looks good and legit to me, but I think three important things are missing:

  1. allow_unsafe should be opt-out, so maybe to set allow_unsafe as a default feature, or switch it to disallow_unsafe. This is to limit potential changes in performances that users may not expect when bumping between two versions. People that care about safety will be able to opt out of unsafe code. Maybe in the future safe code will be the default, but I think this need further investigation.
  2. Automatic tests in workflows should include a version with and a version without allow_unsafe.
  3. What you wrote should be documented in a page of Logos' book, probably talking about safety, where unsafe code is used (as you already did the job of investigating), and how to opt-out (and potential performance costs / gains).

Thanks for your work on this :-)

@jeertmans
Copy link
Collaborator

PS: note that the tests are currently failing, so fixing them is the priority :-)

@davidkern davidkern marked this pull request as draft August 19, 2024 16:30
@davidkern
Copy link
Contributor Author

davidkern commented Aug 19, 2024

Oops, I had run tests only against the logos crate itself and not the workspace. I'll work on fixing those now, looks like it is failing because I entirely removed read_byte_unchecked thinking it was unused.

  1. I'll rename the feature to disallow_unsafe so that it is opt-in. I think that will be more ergonomic since adding allow_unsafe to defaults would mean having to use default-features = false and then explicitly listing the other default features just to get safe-only .
  2. Thanks for enabling checks on the PR. I'll add tests against disallow_unsafe and also add it to the benchmark run.
  3. Handbook page - will do.

I'll additionally do something about the unsafety of the Source trait so safe code can't cause UB in unsafe code (note this would mean an API change, because by default the entire Source trait will need to be marked unsafe).

This will take a few commits to accomplish so I've changed this over to a draft until its ready for review.

@davidkern davidkern changed the title Move unsafe code behind an allow_unsafe feature Disable unsafe code with a forbid_unsafe feature Aug 19, 2024
@davidkern
Copy link
Contributor Author

Will name it forbid_unsafe for parity with Rust's allow/forbid terminology around unsafe_code.

@davidkern
Copy link
Contributor Author

This now runs tests and benchmarks against both the default feature set and the new forbid_unsafe feature.

The remaining work is to split the unsafe code into new UnsafeSource and UnsafeChunk traits. That will remove the conditional compilation from the existing Source and Chunk traits, which will expose only a safe interface. Implementers of a custom source may then additionally opt-in to implementing the two new unsafe traits to enable the faster code paths (taking responsibility for any UB caused by incorrect implementation).

Splitting the traits out will remove several conditional compilation attributes and will also work better with the rustdoc generator. As I have it right now, rustdoc can't document that some trait functions depend on features. Also, changing the API based on a feature is a little weird, so this approach will avoid that too.

So some refactors and some documentation and then this will be ready for review.

@jeertmans
Copy link
Collaborator

Thanks for your work @davidkern! Please ping me back when it's ready for a review :-)

@davidkern
Copy link
Contributor Author

Will do! I've also been busy, but will have some more time to finish this up toward the end of the week.

@davidkern davidkern marked this pull request as ready for review September 9, 2024 18:49
@davidkern
Copy link
Contributor Author

@jeertmans Sorry for the delay getting back to this.

I think this just needed some documentation, which I've now added. I did try to split up the Source trait - but this was causing more issues than it seemed to be solving, so I've instead added a safety note to that trait's documentation.

In terms of merge order, the other PR fixing the benchmark testing should go in first - that will make sure this one gets a good benchmarking run after merge, which should test out both the original unsafe code as well as the new safe-only code paths.

Please let me know if you'd like to see any changes or adjustments to the documentation!

@pchickey
Copy link

pchickey commented Sep 9, 2024

@davidkern thanks so much for this work! The design you arrived at does meet my needs: by requiring the forbid_unsafe feature in my crate I can make sure, even if other transitive dependencies just use the default features, that there is no unsafe code in the dependency tree. (If safety was the default and unsafe was feature-driven, I would not be able to make that guarantee.)

Copy link
Collaborator

@jeertmans jeertmans left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks! Let’s see what @pchickey has to say about this :)

book/src/unsafe.md Outdated Show resolved Hide resolved
book/src/unsafe.md Outdated Show resolved Hide resolved
@jeertmans
Copy link
Collaborator

The PR looks great @davidkern! Please reformat the code so linting and formatting checks both pass. Then, I think we can merge this :-)

@davidkern
Copy link
Contributor Author

I really should add cargo fmt to my git pre-checks!

That should do it...

@jeertmans jeertmans changed the title Disable unsafe code with a forbid_unsafe feature feat(lib): add forbid_unsafe feature to disable unsafe code Sep 13, 2024
@jeertmans
Copy link
Collaborator

Looks great @davidkern, thank you for your contribution :-)

@jeertmans jeertmans merged commit 028bd89 into maciejhirsz:master Sep 13, 2024
27 checks passed
akrantz01 referenced this pull request in akrantz01/antsi Sep 19, 2024
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [logos](https://logos.maciej.codes/)
([source](https://redirect.github.com/maciejhirsz/logos)) | dependencies
| patch | `0.14.1` -> `0.14.2` |

---

### Release Notes

<details>
<summary>maciejhirsz/logos (logos)</summary>

###
[`v0.14.2`](https://redirect.github.com/maciejhirsz/logos/releases/tag/v0.14.2):
- Optional `forbid_unsafe` feature, fuzzing, book, and more!

[Compare
Source](https://redirect.github.com/maciejhirsz/logos/compare/v0.14.1...v0.14.2)

#### What's Changed

- chore(book): added link to Rust's reference by
[@&#8203;CommanderStorm](https://redirect.github.com/CommanderStorm) in
[https://github.com/maciejhirsz/logos/pull/411](https://redirect.github.com/maciejhirsz/logos/pull/411)
- feat: impl Source for T: Deref in no_std by
[@&#8203;yjhmelody](https://redirect.github.com/yjhmelody) in
[https://github.com/maciejhirsz/logos/pull/406](https://redirect.github.com/maciejhirsz/logos/pull/406)
- fix(codegen/regex): allow vec growth on parse by
[@&#8203;LeoDog896](https://redirect.github.com/LeoDog896) in
[https://github.com/maciejhirsz/logos/pull/405](https://redirect.github.com/maciejhirsz/logos/pull/405)
- test: basic fuzzing by
[@&#8203;LeoDog896](https://redirect.github.com/LeoDog896) in
[https://github.com/maciejhirsz/logos/pull/407](https://redirect.github.com/maciejhirsz/logos/pull/407)
- feat(lib): add `forbid_unsafe` feature to disable unsafe code by
[@&#8203;davidkern](https://redirect.github.com/davidkern) in
[https://github.com/maciejhirsz/logos/pull/413](https://redirect.github.com/maciejhirsz/logos/pull/413)
- chore(version): release v0.14.2 by
[@&#8203;jeertmans](https://redirect.github.com/jeertmans) in
[https://github.com/maciejhirsz/logos/pull/422](https://redirect.github.com/maciejhirsz/logos/pull/422)

#### New Contributors

- [@&#8203;CommanderStorm](https://redirect.github.com/CommanderStorm)
made their first contribution in
[https://github.com/maciejhirsz/logos/pull/411](https://redirect.github.com/maciejhirsz/logos/pull/411)
- [@&#8203;yjhmelody](https://redirect.github.com/yjhmelody) made their
first contribution in
[https://github.com/maciejhirsz/logos/pull/406](https://redirect.github.com/maciejhirsz/logos/pull/406)
- [@&#8203;davidkern](https://redirect.github.com/davidkern) made their
first contribution in
[https://github.com/maciejhirsz/logos/pull/413](https://redirect.github.com/maciejhirsz/logos/pull/413)

**Full Changelog**:
maciejhirsz/logos@v0.14.1...v0.14.2

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined),
Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/).
View the [repository job
log](https://developer.mend.io/github/akrantz01/antsi).

<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzOC44MC4wIiwidXBkYXRlZEluVmVyIjoiMzguODAuMCIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOltdfQ==-->

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants