impl trig for core::simd #6
Hey, I could implement some of these and submit a PR, and I was wondering…
The Taylor series is probably not sufficient, since it's only accurate near 0. This paper summarizes a variety of algorithms used for vectorized implementations of popular numeric functions, including trigonometry: https://arxiv.org/abs/2001.09258. The referenced SLEEF library is used by…
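A quick sketch (not from the thread) of why a truncated Taylor series alone is not enough: a degree-7 Taylor polynomial for sin is excellent near the origin but degrades rapidly as |x| grows, which is why production implementations pair better-fitted polynomials with range reduction. The function name and threshold values here are illustrative.

```rust
// Degree-7 Taylor polynomial for sin(x) about 0, evaluated in Horner form:
// sin(x) ≈ x - x^3/3! + x^5/5! - x^7/7!
fn sin_taylor7(x: f64) -> f64 {
    let x2 = x * x;
    x * (1.0 + x2 * (-1.0 / 6.0 + x2 * (1.0 / 120.0 + x2 * (-1.0 / 5040.0))))
}

fn main() {
    // Near 0 the approximation is essentially exact...
    let near = (sin_taylor7(0.1) - 0.1f64.sin()).abs();
    // ...but at x = 3 the error is already visible in the third decimal place.
    let far = (sin_taylor7(3.0) - 3.0f64.sin()).abs();
    println!("error near 0: {near:e}, error at 3: {far:e}");
    assert!(near < 1e-12);
    assert!(far > 1e-3);
}
```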
What algorithm did you have in mind, then? And would the impls look like

```rust
trait trig {
    type Output;

    fn cos(&self) -> Self::Output;
}

impl trig for __m128 {
    type Output = __m128;

    #[target_feature(enable = "sse")]
    unsafe fn cos(&self) -> __m128 {
        // code
    }
}
```
Sorry for the first part; after re-reading your comment, if I understand it right, you want to use the algorithms in the paper used by SLEEF, but not necessarily the SLEEF library itself, i.e. rewrite them in Rust?
So, I don't know what the quality or speed is, but LLVM has built-in handling for vector trig, same as with vector add, sub, etc. Example from… If the LLVM versions are of sufficient quality, we should likely just use that.
There are…

Are the LLVM intrinsics useful for SIMD…

There are LLVM intrinsics with names like…
@bjorn3 Thanks for clarifying that, so the code might look like

```rust
#![feature(link_llvm_intrinsics)]

extern "C" {
    #[link_name = "llvm.cos.v2f64"]
    fn cos(a: [f64; 2]) -> [f64; 2];
}

/// A vector of two floating points
struct f64x2([f64; 2]);

impl trig for f64x2 {
    fn cos(&self) -> f64x2 {
        unsafe { f64x2(cos(self.0)) }
    }
}
```
More like

```rust
#![feature(link_llvm_intrinsics)]

extern "C" {
    #[link_name = "llvm.cos.v2f64"]
    fn cos(a: f64x2) -> f64x2;
}

/// A vector of two floating points
#[repr(simd)]
#[derive(Clone, Copy)]
struct f64x2([f64; 2]);

impl trig for f64x2 {
    fn cos(&self) -> f64x2 {
        unsafe { cos(*self) }
    }
}
```
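For illustration, the same shape can be modeled on stable Rust with a lanewise scalar fallback, which is also what any vectorized version has to agree with. This is a hedged sketch: `f64x2` here is a plain newtype rather than the `#[repr(simd)]` type above, and the `trig`-style trait is named `Trig` to compile cleanly.

```rust
// Plain newtype standing in for a SIMD vector (no nightly features needed).
#[allow(non_camel_case_types)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct f64x2([f64; 2]);

trait Trig {
    fn cos(self) -> Self;
}

impl Trig for f64x2 {
    fn cos(self) -> f64x2 {
        // Scalar fallback, one lane at a time; a real implementation would
        // dispatch to the vector intrinsic instead.
        f64x2([self.0[0].cos(), self.0[1].cos()])
    }
}

fn main() {
    let v = f64x2([0.0, std::f64::consts::PI]);
    let c = v.cos();
    assert!((c.0[0] - 1.0).abs() < 1e-15);
    assert!((c.0[1] + 1.0).abs() < 1e-15);
}
```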
Okay, that makes sense. And the rest, are we going to implement from the ground up?
I think so, yes…

Note that…
tan = sin / cos, so that one specifically is easy. Also, you can write sin_cos and merge 90% of the work. We probably want to port a lot of the Agner Fog vector math stuff, some of which is already ported in the…
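The sharing idea above can be sketched like this: after any range reduction, sin and cos polynomials both run off the same `x*x`, so computing them together is much cheaper than two independent calls, and tan falls out as the ratio. The low-degree Taylor coefficients here are placeholders for small |x| only; a real implementation would use tuned minimax coefficients.

```rust
// Compute sin and cos together, sharing the x^2 power between both
// polynomials (illustrative coefficients, valid only for small |x|).
fn sin_cos(x: f64) -> (f64, f64) {
    let x2 = x * x; // shared work
    let s = x * (1.0 + x2 * (-1.0 / 6.0 + x2 * (1.0 / 120.0)));
    let c = 1.0 + x2 * (-0.5 + x2 * (1.0 / 24.0 + x2 * (-1.0 / 720.0)));
    (s, c)
}

// tan = sin / cos, nearly free once sin_cos exists.
fn tan_approx(x: f64) -> f64 {
    let (s, c) = sin_cos(x);
    s / c
}

fn main() {
    let (s, c) = sin_cos(0.3);
    assert!((s - 0.3f64.sin()).abs() < 1e-6);
    assert!((c - 0.3f64.cos()).abs() < 1e-6);
    assert!((tan_approx(0.3) - 0.3f64.tan()).abs() < 1e-5);
}
```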
I think we need to be careful about "porting" libraries, both SLEEF and that library are incompatible with MIT/Apache. Additionally I think we need to be careful about our implementations, since naive approaches will likely have poor accuracy. I think our best bet is to find branchless algorithms (could be scalar) that are available under compatible licenses, and port them. |
Oh my, oh my, that's the wrong link! Google search let me down. Version 2 of the lib is here
Oh awesome, much better. I'm definitely okay with using this implementation. |
Ah, this was the issue I kept forgetting. This issue, roughly, makes a lot of this effort more complex: rust-lang/rust#50145 |
myself, @Lokathor, and one of the other…
A contributor would be welcomed. The current status appears to be that most of the library can't be in core anyway for other reasons, so I think it's easiest to make this work at all, and then we can (maybe!) make it work in core later.
I think that it's still an open question where to put it, though, since part of the motivation behind supporting these is to use instructions on the platforms where these have dedicated instructions. That probably means using…
I propose:

stage 1: …

stage 2: …
Please do!
|
WIP implementation: …
If you are interested: I usually use a Newton polynomial on the Chebyshev points, pinned at the extrema. This way sin and cos, for example, just become a single Horner-form polynomial over… You can use combinations of CORDIC transforms to lower the initial range, and this… You can choose your Newton polynomial points at points of interest to get exact values…
Hmm, ok. For sin_cos_pi, I range-reduced to…
I'll try to get some time to do some manual runs for you with carefully chosen Newton polynomials. Sin and cos are very well behaved, as their differentials are well behaved, and the series…

Chebyshev approximation gives you a more consistent error profile, spreading the error out. By spreading the places where it is exact out along the valid range, you can get a much lower maximum… To get a good f64 result, you need to work with algebra of greater-than-f64 precision. This was… There may be an f128 crate out there somewhere, too. https://en.wikipedia.org/wiki/Newton_polynomial
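However the fitting points are chosen (Chebyshev nodes, extrema, or exact-value pins), the evaluation step described above is a single Horner-form polynomial. A minimal sketch, with the caveat that the coefficients below are the plain Taylor ones for sin, used only to make the example checkable, not a tuned Chebyshev/Newton fit:

```rust
// Evaluate a polynomial in Horner form; `coeffs` is ordered from the
// highest-degree coefficient down to the constant term.
fn horner(coeffs: &[f64], x: f64) -> f64 {
    coeffs.iter().fold(0.0, |acc, &c| acc * x + c)
}

fn main() {
    // sin(x) ≈ x * p(x^2), with p(t) = 1 - t/6 + t^2/120 - t^3/5040
    // (illustrative coefficients; a production fit would use better ones).
    let p = [-1.0 / 5040.0, 1.0 / 120.0, -1.0 / 6.0, 1.0];
    let x = 0.2;
    let approx = x * horner(&p, x * x);
    assert!((approx - x.sin()).abs() < 1e-10);
}
```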
Anyway, great work, @programmerjake, I'm quite excited! I'll try to get to the next SIMD group meeting on Monday. I was doing a talk…
If you don't care about speed, you can use the…
Cool! |
Took me several months to figure out how to efficiently factor polynomials with bigint coefficients -- used to implement real algebraic numbers.
modular fields of polynomials with modular integer coefficients are complicated! |
Tell Galois that :) Good to see this work being done. I have slightly wider requirements, with the full gamut of stats functions…

for higher precision in…