Optimize sha256 for aarch64 #27

Merged
merged 1 commit into from
Jan 25, 2021

dgbo
Contributor

@dgbo dgbo commented Jan 19, 2021

Hi,

SHA-256 is a top hotspot in some applications, e.g. Phase1 in filecoin/lotus; this patch optimizes it for aarch64.
It mainly removes the dependency between ldr and add in each group of four rounds by preloading the K constants into NEON registers.
We also take advantage of load/store pair and ld1 instructions to reduce the number of instructions executed.

Verified with the sha2 repository [1] using the test commands: cargo +nightly test --features "asm" and cargo +nightly test --release --features "asm".

On our aarch64 server (tsv110 core), we measured an 11.90% throughput improvement with the benches under asm-hashes/sha2:

# before
test bench_compress256 ... bench:          47 ns/iter (+/- 0) = 1361 MB/s
# after
test bench_compress256 ... bench:          42 ns/iter (+/- 0) = 1523 MB/s
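
As a sanity check on the figures above, here is a minimal sketch of how the MB/s values and the 11.90% figure relate to the ns/iter numbers. It assumes that bench_compress256 processes a single 64-byte SHA-256 block per iteration and that the MB/s value is truncated to an integer; neither is stated in this thread, and the function names are purely illustrative.

```rust
// Assumption (not stated in the PR): one 64-byte SHA-256 block per iteration.
const BLOCK_BYTES: f64 = 64.0;

/// Throughput in MB/s (1 MB = 10^6 bytes) for a given time per iteration in ns.
fn throughput_mb_s(ns_per_iter: f64) -> f64 {
    // bytes/ns * 1e9 ns/s / 1e6 bytes/MB == bytes/ns * 1e3
    BLOCK_BYTES / ns_per_iter * 1e3
}

/// Relative change from `before` to `after`, in percent.
fn percent_change(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    // ns/iter values reported in the benchmark output above.
    let (before_ns, after_ns) = (47.0, 42.0);

    // Truncate to whole MB/s, as assumed for the reported figures.
    let before_mb = throughput_mb_s(before_ns).floor();
    let after_mb = throughput_mb_s(after_ns).floor();

    println!("before: {before_mb} MB/s, after: {after_mb} MB/s");
    println!("throughput gain: {:.2}%", percent_change(before_mb, after_mb));
    // Time per iteration drops by a smaller percentage than throughput rises.
    println!("time reduction:  {:.2}%", -percent_change(before_ns, after_ns));
}
```

Under those assumptions, the 11.90% figure is the throughput gain (1361 to 1523 MB/s); the corresponding drop in time per iteration (47 to 42 ns) is about 10.6%.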

[1] https://github.com/dgbo/hashes/tree/master/sha2

@newpavlov
Member

Thank you!

@linkmauve
Can you please take a look?

@tarcieri
Member

I can give this a try on an M1

@dgbo
Contributor Author

dgbo commented Jan 20, 2021

Thank you for looking at this.

BTW, we ran lotus-bench from filecoin/lotus; this PR reduces the execution time of Phase1 in that application by 10.5%.

Regards.

@dgbo
Contributor Author

dgbo commented Jan 25, 2021

Ping... Can I get a review for this? Are there any suggestions? Thanks. :)

@newpavlov
Member

@tarcieri
Can I leave this PR to you? Unfortunately I don't know ARM assembly well enough and do not have appropriate hardware to test the code.

@tarcieri
Member

Yep, been meaning to validate it on an M1. I'll see if I can take a look today.

@tarcieri
Member

tarcieri commented Jan 25, 2021

Hrmm, interesting. The sha2-asm crate does not build on the M1 at all, either before or with this change:

   Compiling sha2-asm v0.5.4 (/Users/tony/asm-hashes/sha2)
The following warnings were emitted during compilation:

warning: src/sha256_aarch64.S:64:2: error: ADR/ADRP relocations must be GOT relative
warning:  adrp x2, .K
warning:  ^
warning: src/sha256_aarch64.S:64:2: error: unknown AArch64 fixup kind!
warning:  adrp x2, .K
warning:  ^
warning: src/sha256_aarch64.S:65:2: error: unknown AArch64 fixup kind!
warning:  add x2, x2, :lo12:.K
warning:  ^

error: failed to run custom build command for `sha2-asm v0.5.4 (/Users/tony/asm-hashes/sha2)`

I'd say this PR looks good to merge then, but separately we should track an M1 build failure.

Edit: opened #28 to track this

@newpavlov newpavlov merged commit 47afd9c into RustCrypto:master Jan 25, 2021
@dgbo
Contributor Author

dgbo commented Jan 26, 2021

Thanks!
