Improved unicode support in mutator, flattener, and more #2662

bohendo · 2025-02-07T20:36:13Z

solc, foundry, hardhat: everything reports source_map offsets denominated in bytes (solidity#14733 is a false positive). And (almost) all slither detectors properly index the source code byte-wise. Great.

But some tools, notably the mutator and flattener, apply these byte offsets as character indices into decoded strings, which selects the wrong text as soon as the source contains a multi-byte character. Once this PR is merged, they will not.
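To illustrate the mismatch, here is a minimal sketch in plain Python. It does not use slither's API: the sample source string and the helper `content_by_bytes` are hypothetical and exist only for this example. The byte-denominated offset is correct against the encoded bytes but is off by one against the decoded string, because `é` occupies two bytes in UTF-8.

```python
# Hypothetical Solidity snippet containing one multi-byte character (é).
source = 'string s = "héllo"; uint x = 1;'

# A source map reports start and length in bytes, so compute them on the bytes.
encoded = source.encode("utf8")
start = encoded.index(b"uint x")
length = len(b"uint x = 1")


def content_by_bytes(text: str, start: int, length: int) -> str:
    """Slice the encoded bytes with a byte offset, then decode the result."""
    raw = text.encode("utf8")
    return raw[start : start + length].decode("utf8")


# Wrong: byte offsets used as character indices into the str.
wrong = source[start : start + length]

# Right: byte offsets applied to the encoded bytes.
right = content_by_bytes(source, start, length)

print(repr(wrong))
print(repr(right))
```

Each multi-byte character before the offset shifts the character-indexed slice by one more position, so the error grows with the amount of non-ASCII text earlier in the file.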

Summary of changes:

- fixed source_mapping.content to index source code correctly and used this property instead of manual indexing in flat/mutate tools
- fixed src_mapping usage in the documentation tool too
- small bugfix to Makefile
- manual review of all other src_mapping, source_code, and utf8 encodings to ensure we aren't applying byte-offsets to strings anywhere else
  - resolved subtle unicode bugs in the unused_import, upgradability and codex detectors.
  - standardize encoding strings from a mixture of "utf-8" and "utf8" to just the latter
- try/catch mutant generation, so one bugged mutator won't crash the entire campaign (and it'll print more helpful logs now)
- more tweaks to mutation logging (eg uncaught mutants use white text so the above errors stand out better)
- bug fixes to a few mutators that eg crashed on assembly instructions

Note that the last 3 of these were merged into this branch from PR #2648 because most of the changes in that PR would otherwise have needed to be duplicated in this one.
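The guarded mutant generation mentioned in the summary can be sketched roughly as follows. This is an assumption-laden illustration, not the code in this PR: `run_mutators`, `flip_plus`, and `buggy` are hypothetical names, and real mutators operate on slither's own objects rather than plain strings.

```python
import logging
from typing import Callable, Dict, List, Tuple

logger = logging.getLogger("mutation-sketch")

# Hypothetical mutator type: takes source text, returns mutated variants.
Mutator = Callable[[str], List[str]]


def run_mutators(
    mutators: Dict[str, Mutator], source: str
) -> Tuple[List[str], List[str]]:
    """Run every mutator; log and skip the ones that raise."""
    mutants: List[str] = []
    failed: List[str] = []
    for name, mutate in mutators.items():
        try:
            mutants.extend(mutate(source))
        except Exception as exc:  # deliberately broad: one bad mutator must not end the run
            logger.error("mutator %s failed: %s", name, exc)
            failed.append(name)
    return mutants, failed


def flip_plus(source: str) -> List[str]:
    """Toy mutator that swaps + for -."""
    return [source.replace("+", "-")]


def buggy(source: str) -> List[str]:
    """Toy mutator that always crashes."""
    raise ValueError("cannot handle assembly")


mutants, failed = run_mutators({"flip_plus": flip_plus, "buggy": buggy}, "a + b")
print(mutants, failed)
```

Collecting the names of failed mutators alongside the mutants keeps the campaign running while still making the failures visible at the end.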

bohendo added 15 commits January 27, 2025 12:32

fix unicode patches during mutation

789e107

fix mutation logs & RR mutator

e31a881

skip assembly in BOR mutator

8b9590d

fix bugs in other mutators

2295939

try/catch mutant generation

5b03e0a

fix formatting

87fb412

bugfixes & fix pylint

af8e5cf

revert byte conversion in Source.content

9fbb9aa

Merge branch 'dev' into mutation-repairs

e0874c5

fix source_mapping.content to handle byte offsets

84011e4

Merge branch 'mutation-repairs' into fix-unicode-src-mappings

74cc3e9

fix source_code access in documentation + flattening tools

172cdbb

clean up utf8 strings

738aa96

bugfix Makefile

b3a226e

fix formatting

8dee964

bohendo requested review from montyly and smonicas as code owners February 7, 2025 20:36

bohendo mentioned this pull request Feb 7, 2025

Mutation repairs #2648

Closed
