refactor: replace byte-level operations with word-level operations #692

AnarchistHoneybun · 2025-05-24T17:58:48Z

finally got to this, I'm sorry for being so late ;-;
pre-refactor bench results:


running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s


running 8 tests
test kupyna256_10    ... bench:         364.55 ns/iter (+/- 31.58) = 27 MB/s
test kupyna256_100   ... bench:       5,095.60 ns/iter (+/- 636.62) = 19 MB/s
test kupyna256_1000  ... bench:      49,466.16 ns/iter (+/- 6,330.77) = 20 MB/s
test kupyna256_10000 ... bench:     478,002.24 ns/iter (+/- 59,204.41) = 20 MB/s
test kupyna512_10    ... bench:         659.55 ns/iter (+/- 78.15) = 15 MB/s
test kupyna512_100   ... bench:       9,032.33 ns/iter (+/- 3,979.37) = 11 MB/s
test kupyna512_1000  ... bench:      67,634.22 ns/iter (+/- 8,083.88) = 14 MB/s
test kupyna512_10000 ... bench:     911,271.90 ns/iter (+/- 268,785.03) = 10 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 106.87s

post-refactor bench results:

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running benches/mod.rs (/home/vrin/rcrypto_hashes_contrib/target/release/deps/mod-fccbd874730b7904)

running 8 tests
test kupyna256_10    ... bench:         379.25 ns/iter (+/- 15.23) = 26 MB/s
test kupyna256_100   ... bench:       3,916.63 ns/iter (+/- 78.04) = 25 MB/s
test kupyna256_1000  ... bench:      57,448.10 ns/iter (+/- 7,546.75) = 17 MB/s
test kupyna256_10000 ... bench:     554,591.68 ns/iter (+/- 60,150.87) = 18 MB/s
test kupyna512_10    ... bench:         969.11 ns/iter (+/- 117.66) = 10 MB/s
test kupyna512_100   ... bench:       9,383.38 ns/iter (+/- 2,446.42) = 10 MB/s
test kupyna512_1000  ... bench:     116,995.27 ns/iter (+/- 6,493.42) = 8 MB/s
test kupyna512_10000 ... bench:     995,084.96 ns/iter (+/- 165,238.44) = 10 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured; 0 filtered out; finished in 119.54s

not seeing any increase yet, which I think is because we're still having to split the values up when doing the row rotation and substitution operations. need to look into the math of that so I can handle it, will probably require a refactor in how our sbox and mds tables are stored. still, bulk of the work seems done

… cryptographic functions

AnarchistHoneybun · 2025-05-24T17:59:51Z

@newpavlov please lmk if this is a step in the right direction at least

AnarchistHoneybun · 2025-05-25T03:08:26Z

is there any way to benchmark such that I can get stats for individual functions, so I can see where my bottleneck is, without wrapping every function inside a timing block

newpavlov · 2025-05-25T03:53:37Z

cargo flamegraph is a common tool for this. You also could use perf directly.

newpavlov

Looks like a good step forward even without noticeable performance improvements.

which I think is because we're still having to split the values up when doing the row rotation and substitution operations

Yeah, you should probably minimize number of conversion to and from bytes and operate over 64-bit integers whenever possible.

newpavlov · 2025-05-25T03:55:09Z

kupyna/src/long.rs

@@ -147,4 +137,4 @@ impl SerializableState for KupynaLongVarCore {
 }

 #[cfg(feature = "zeroize")]
-impl ZeroizeOnDrop for KupynaLongVarCore {}
+impl ZeroizeOnDrop for KupynaLongVarCore {}


Please keep the empty last line in all files.

kupyna/src/long_compress.rs

kupyna/src/utils.rs

newpavlov · 2025-05-25T04:02:05Z

kupyna/src/short_compress.rs

+
+    // println!("prev vector:=");
+    // for v in prev_vector.iter() {
+    //     println!("{:016X?}", v);


Remove these comments.

newpavlov · 2025-05-25T04:04:22Z

kupyna/src/long_compress.rs

    for nu in 0..ROUNDS {
        state = add_constant_plus(state, nu as usize);
        state = apply_s_box(state);
        state = rotate_rows(state);
        state = mix_columns(state);
    }
-    matrix_to_block(state)
+    state


It may be worth to change add_constant_plus, apply_s_box, rotate_rows, and mix_columns to work in-place, i.e. over &mut [u64; COLS].

newpavlov · 2025-05-25T04:05:55Z

kupyna/Cargo.toml

@@ -14,6 +14,7 @@ rust-version = "1.85"

 [dependencies]
 digest = "=0.11.0-pre.10"
+hex-literal = "1"


I don't think you need this dependency.

I've set up a main.rs file to verify the code works as expected while making changes to the functions, this is the reason for both the hex-literal dependency and the print statements. will clean them all up once I'm done making large changes to the functions, it just helps to pinpoint error faster if some math goes wrong

# Conflicts: # kupyna/src/long.rs # kupyna/src/long_compress.rs # kupyna/src/short.rs # kupyna/src/short_compress.rs

refactor: replace byte-level operations with word-level operations in…

bd0258f

… cryptographic functions

newpavlov reviewed May 25, 2025

View reviewed changes

AnarchistHoneybun added 3 commits May 25, 2025 10:07

refactor: optimize S-box application in apply_s_box function

6ba7c9b

cleanup

f8a3a97

Merge branch 'refs/heads/master' into kupyna-perf

b211b56

# Conflicts: # kupyna/src/long.rs # kupyna/src/long_compress.rs # kupyna/src/short.rs # kupyna/src/short_compress.rs

AnarchistHoneybun closed this May 26, 2025

AnarchistHoneybun deleted the kupyna-perf branch May 26, 2025 10:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

refactor: replace byte-level operations with word-level operations #692

refactor: replace byte-level operations with word-level operations #692

Uh oh!

AnarchistHoneybun commented May 24, 2025

Uh oh!

AnarchistHoneybun commented May 24, 2025

Uh oh!

AnarchistHoneybun commented May 25, 2025

Uh oh!

newpavlov commented May 25, 2025 •

edited

Loading

Uh oh!

newpavlov left a comment

Uh oh!

newpavlov May 25, 2025

Uh oh!

Uh oh!

Uh oh!

newpavlov May 25, 2025

Uh oh!

newpavlov May 25, 2025

Uh oh!

newpavlov May 25, 2025

Uh oh!

AnarchistHoneybun May 25, 2025

Uh oh!

Uh oh!

refactor: replace byte-level operations with word-level operations #692

refactor: replace byte-level operations with word-level operations #692

Uh oh!

Conversation

AnarchistHoneybun commented May 24, 2025

Uh oh!

AnarchistHoneybun commented May 24, 2025

Uh oh!

AnarchistHoneybun commented May 25, 2025

Uh oh!

newpavlov commented May 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

newpavlov left a comment

Choose a reason for hiding this comment

Uh oh!

newpavlov May 25, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

newpavlov May 25, 2025

Choose a reason for hiding this comment

Uh oh!

newpavlov May 25, 2025

Choose a reason for hiding this comment

Uh oh!

newpavlov May 25, 2025

Choose a reason for hiding this comment

Uh oh!

AnarchistHoneybun May 25, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

newpavlov commented May 25, 2025 •

edited

Loading