
impl trig for core::simd #6

Open
13 tasks
Lokathor opened this issue Sep 27, 2020 · 34 comments
Labels
A-simd Area: SIMD. Put this on tracking issues to help with cross-repo issue organization C-tracking-issue Ongoing issue with checkboxes for partial progress and such

Comments

@Lokathor
Contributor

  • sin
  • cos
  • tan
  • asin
  • acos
  • atan
  • atan2
  • sinh
  • cosh
  • tanh
  • asinh
  • acosh
  • atanh
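For context, a minimal sketch of the surface this checklist implies, using a plain array type with a scalar per-lane fallback. The trait name and array type here are placeholders for illustration, not the eventual core::simd API:

```rust
// Hypothetical sketch only: a scalar per-lane fallback on a plain
// array type, to show the shape of the API this checklist tracks.
// The real core::simd implementation would operate on whole vectors.
trait Trig {
    fn sin(self) -> Self;
    fn cos(self) -> Self;
    fn tan(self) -> Self;
}

impl Trig for [f32; 4] {
    fn sin(self) -> Self { self.map(f32::sin) }
    fn cos(self) -> Self { self.map(f32::cos) }
    fn tan(self) -> Self { self.map(f32::tan) }
}

fn main() {
    let v = [0.0f32, 0.5, 1.0, 1.5];
    println!("{:?}", v.sin());
}
```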
@Lokathor Lokathor added C-tracking-issue Ongoing issue with checkboxes for partial progress and such A-simd Area: SIMD. Put this on tracking issues to help with cross-repo issue organization labels Sep 27, 2020
@etemesi-ke

Hey, I could implement some of these and submit a PR. I was wondering if the appropriate way would be to use the Taylor series for the above trig functions, since I'm not sure what the Rust standard library uses.

@calebzulawski
Member

calebzulawski commented Sep 30, 2020

The Taylor series is probably not sufficient since it's only accurate near 0. This paper summarizes a variety of algorithms used for vectorized implementations of popular numeric functions, including trigonometry: https://arxiv.org/abs/2001.09258. The referenced SLEEF library is used by packed_simd, however I don't believe SLEEF itself can be used in this crate.
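To make the accuracy point concrete, a quick scalar demonstration (illustrative only): the same degree-7 Taylor polynomial for sin that is accurate to machine precision near 0 degrades by many orders of magnitude as x approaches pi.

```rust
// Demonstrates why a raw Taylor series is not enough: the same 4-term
// polynomial for sin(x) is excellent near 0 but useless near pi.
fn taylor_sin(x: f64) -> f64 {
    // x - x^3/3! + x^5/5! - x^7/7!, evaluated in Horner form
    let x2 = x * x;
    x * (1.0 + x2 * (-1.0 / 6.0 + x2 * (1.0 / 120.0 + x2 * (-1.0 / 5040.0))))
}

fn main() {
    for &x in &[0.1f64, 1.0, 3.0] {
        let err = (taylor_sin(x) - x.sin()).abs();
        println!("x = {x}: |error| = {err:e}");
    }
}
```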

@etemesi-ke

What algorithm did you have in mind then?

And would the impls look something like this?

trait trig {
    type Output;

    #[target_feature(enable = "sse")]
    unsafe fn cos(&self) -> Self::Output;
}

impl trig for _mm128 {
    type Output = _mm128;

    #[target_feature(enable = "sse")]
    unsafe fn cos(&self) -> Self::Output {
        // Code
    }
}

@etemesi-ke

Sorry for the first part, after re-reading your comment if I understand it right you want to use the algorithms in the paper used by sleef but not necessarily the sleef library. I.e rewrite them in rust?

@Lokathor
Contributor Author

Lokathor commented Oct 1, 2020

So, I don't know what the quality or speed is, but LLVM has built-in handling for vector trig same as with vector add, sub, etc.

Example from packed_simd: https://github.com/rust-lang/packed_simd/blob/master/src/codegen/math/float/cos.rs

If the LLVM versions are of sufficient quality, we should likely just use that.

@bjorn3
Member

bjorn3 commented Oct 1, 2020

There are platform-intrinsic versions for sin and cos by the way, so no need to use LLVM intrinsics directly. Curiously enough there are no versions for other trigonometry functions, but that should be easy to add.

https://github.com/rust-lang/rust/blob/41507ed0d57eba71adc20a021a19b64322162f04/compiler/rustc_codegen_llvm/src/intrinsic.rs#L1080-L1085

@etemesi-ke

Are the LLVM intrinsics useful for SIMD? From what I understand of their docs, their cos function, for example, operates on just one floating-point value.

@bjorn3
Member

bjorn3 commented Oct 1, 2020

There are LLVM intrinsics with names like llvm.cos.v2f32 that operate on vectors.

@etemesi-ke

@bjorn3 Thanks for clarifying that,

so the code might look like

#![feature(link_llvm_intrinsics)]
extern {
    #[link_name = "llvm.cos.v2f32"]
    fn cos(a: [f64; 2]) -> [f64; 2];
}

/// A vector of two floating points
struct f64x2([f64; 2]);

impl trig for f64x2 {
    fn cos(&self) -> f64x2 {
        f64x2(unsafe { cos(self.0) })
    }
}

@bjorn3
Member

bjorn3 commented Oct 1, 2020

More like

#![feature(link_llvm_intrinsics)]
extern {
    #[link_name = "llvm.cos.v2f32"]
    fn cos(a: f64x2) -> f64x2;
}

/// A vector of two floating points
#[repr(simd)]
#[derive(Copy, Clone)]
struct f64x2([f64; 2]);

impl trig for f64x2 {
    fn cos(&self) -> f64x2 {
        unsafe { cos(*self) }
    }
}

@etemesi-ke

etemesi-ke commented Oct 1, 2020

Okay, that makes sin and cos easy.

And the rest, are we going to implement them from the ground up?

@emmanuelantony2000

I think so, yes; tan and the others have to be implemented from the ground up.
Is it implemented using sin and cos, or is there another way?

@programmerjake
Member

Note that llvm.cos.v2f32 means f32x2; if you want f64x2, you need to use llvm.cos.v2f64.

@Lokathor
Contributor Author

Lokathor commented Oct 1, 2020

tan = sin / cos, so that one specifically is easy. Also, you can write sin_cos and merge 90% of the work.
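As a scalar illustration of that point: std's f64::sin_cos already computes both values with one shared range reduction, and tan falls out as the quotient.

```rust
// tan derived from a single fused sin_cos call; a vector version
// would do the same with one shared range reduction per lane.
fn tan_via_sin_cos(x: f64) -> f64 {
    let (s, c) = x.sin_cos();
    s / c
}

fn main() {
    let x = 0.7f64;
    println!("{:e}", (tan_via_sin_cos(x) - x.tan()).abs());
}
```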

We probably want to port a lot of the Agner Fog vector math stuff, https://github.com/vcoda/vectorclass (edit: wrong link, this is an older gpl3 version, see below for apache2)

Some of which is already ported in the wide crate.

@calebzulawski
Member

I think we need to be careful about "porting" libraries, both SLEEF and that library are incompatible with MIT/Apache. Additionally I think we need to be careful about our implementations, since naive approaches will likely have poor accuracy. I think our best bet is to find branchless algorithms (could be scalar) that are available under compatible licenses, and port them.

@Lokathor
Contributor Author

Lokathor commented Oct 1, 2020

oh my oh my, that's the wrong link! google search let me down.

Version 2 of the lib is here: https://github.com/vectorclass/version2, and is Apache-2.0 licensed, so we could write the Rust version and have it be Apache/MIT.

@calebzulawski
Member

Oh awesome, much better. I'm definitely okay with using this implementation.

@workingjubilee
Member

Ah, this was the issue I kept forgetting. This issue, roughly, makes a lot of this effort more complex: rust-lang/rust#50145

@fu5ha

fu5ha commented Feb 22, 2021

Myself, @Lokathor, and one of the other wide contributors already ported several of these from the Agner Fog lib linked above to Rust for use in wide; it would probably not be very hard to port those Rust impls to stdsimd. I would be willing to do this if nobody else is working on it currently.

@Lokathor
Contributor Author

A contributor would be welcomed.

The current status appears to be that most of the library can't be in core anyway for other reasons, so I think it's easiest to make this work at all first, and then we can (maybe!) make it work in core later.

@thomcc
Member

thomcc commented Feb 22, 2021

I think it's still an open question where to put it, though, since part of the motivation behind supporting these is to use dedicated instructions on the platforms that have them. That probably means using simd_sin and such...

@Lokathor
Contributor Author

I propose:

Stage 1:
  • we just code it in the library

Stage 2:
  • the library transitions to intrinsics
  • when LLVM can't emit a direct call to a platform version, it calls a vector version of libm
  • our vector version of libm is the new home of the code from stage 1

@workingjubilee
Member

workingjubilee commented Feb 22, 2021

Please do!
And that sounds like a plan, @Lokathor!

simd_sin and simd_cos on x86 seem to degenerate to scalar libm calls even with -Ctarget-feature=+avx2 enabled, unfortunately (but with AVX vmovss instructions, just to make sure you know it's in fact using AVX!), so we can't really depend on LLVM emitting useful code for those, even if LLVM nominally supports them.

@programmerjake
Member

WIP implementation:
#109
Code:
https://salsa.debian.org/Kazan-team/vector-math

@andy-thomason

If you are interested:

I usually use a Newton polynomial on the Chebyshev points, pinned at the extrema, to avoid any conditional code.

This way sin and cos, for example, just become a single Horner-form polynomial over the range [0..1) corresponding to 0..2pi.

You can use combinations of CORDIC transforms to lower the initial range, and this is necessary with the inverse trig functions and exponentials, but for most cases there is no need: just adding a few more terms to the polynomial is sufficient to get accuracy over the range and exact values at the extrema.

You can choose your Newton polynomial points at points of interest to get exact values where you like.
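A rough scalar sketch of this approach: build the divided-difference (Newton) coefficients through Chebyshev-spaced nodes, then evaluate the nested form with no branches. The node count and interval here are illustrative choices, not tuned values, and the nodes are not pinned at the extrema as described above.

```rust
// Newton-polynomial sketch: interpolate sin on [0, 2*pi] through
// Chebyshev-spaced nodes, then evaluate the nested (Horner-like) form.
fn divided_differences(xs: &[f64], ys: &[f64]) -> Vec<f64> {
    let n = xs.len();
    let mut c = ys.to_vec();
    for j in 1..n {
        for i in (j..n).rev() {
            c[i] = (c[i] - c[i - 1]) / (xs[i] - xs[i - j]);
        }
    }
    c
}

fn eval_newton(xs: &[f64], c: &[f64], x: f64) -> f64 {
    // Nested evaluation, analogous to Horner form: no branches.
    let mut acc = c[c.len() - 1];
    for i in (0..c.len() - 1).rev() {
        acc = acc * (x - xs[i]) + c[i];
    }
    acc
}

fn sin_approx(x: f64) -> f64 {
    use std::f64::consts::PI;
    let n: usize = 12;
    // Chebyshev nodes mapped onto [0, 2*pi]
    let xs: Vec<f64> = (0..n)
        .map(|i| PI * (1.0 - ((2 * i + 1) as f64 * PI / (2.0 * n as f64)).cos()))
        .collect();
    let ys: Vec<f64> = xs.iter().map(|&xi| xi.sin()).collect();
    let c = divided_differences(&xs, &ys);
    eval_newton(&xs, &c, x)
}

fn main() {
    let x = 2.5f64;
    println!("{:e}", (sin_approx(x) - x.sin()).abs());
}
```

With 12 nodes, the maximum error over the interval is on the order of 1e-6; pinning nodes at the zeros and extrema would additionally make those values exact.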

@programmerjake
Member

hmm, ok. For sin_cos_pi, I range-reduced to sin([-pi/4, pi/4]) and cos([-pi/4, pi/4]) and used a Taylor series up to x^16 for f64, x^9 for f32, and x^5 for f16. That gave me 2 ULP accuracy (tested against f64::sin/cos for f16 and f32, and against MPFR for f64). Using more terms didn't give more accuracy due to round-off error.
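A scalar sketch of that reduction scheme. The coefficients here are plain truncated Taylor terms of lower degree than described above, so accuracy is far from 2 ULP, and a SIMD version would replace the match with branchless lane selects:

```rust
// Reduce x to r in [-pi/4, pi/4] plus a quadrant index, evaluate
// short polynomials for sin(r) and cos(r), then pick/negate by
// quadrant. Large |x| would need a more careful reduction.
fn poly_sin(r: f64) -> f64 {
    let r2 = r * r;
    r * (1.0 + r2 * (-1.0 / 6.0 + r2 * (1.0 / 120.0 + r2 * (-1.0 / 5040.0))))
}

fn poly_cos(r: f64) -> f64 {
    let r2 = r * r;
    1.0 + r2 * (-0.5 + r2 * (1.0 / 24.0 + r2 * (-1.0 / 720.0)))
}

fn sin_reduced(x: f64) -> f64 {
    use std::f64::consts::FRAC_PI_2;
    let k = (x / FRAC_PI_2).round();
    let r = x - k * FRAC_PI_2; // r in [-pi/4, pi/4]
    match (k as i64).rem_euclid(4) {
        0 => poly_sin(r),
        1 => poly_cos(r),
        2 => -poly_sin(r),
        _ => -poly_cos(r),
    }
}

fn main() {
    let x = 2.0f64;
    println!("{:e}", (sin_reduced(x) - x.sin()).abs());
}
```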

@andy-thomason

I'll try to get some time to do some manual runs for you with carefully chosen Newton polynomials.

Sin and cos are very well behaved: their derivatives are well behaved and the series converges rapidly. Even Taylor (Maclaurin) series, which are about the worst case because they only approximate the zero point, will converge.

Chebyshev approximation gives you a more consistent error profile, spreading the error out over the function, whereas Remez can suffer from instability.

doctor_syn, my fledgling computer algebra system based on syn, currently generates code based on something close to Chebyshev. It uses Newton polynomials to build a polynomial that is exact in certain places.

By spreading the places where it is exact out along the valid range, you can get a much lower maximum error for fewer terms.

To get a good f64 result, you need to work with algebra of greater-than-f64 precision. This was traditionally f80 (legacy 8087), but there are a few extended-precision Rust libs out there. Rug is a popular choice, but is a huge Frankenstein's monster wrapping GMP, another huge monster. You can also use BigRational, as long as you keep approximating the numerator/denominator to avoid outrageous run times.

There may be an f128 crate out there somewhere, too.

https://en.wikipedia.org/wiki/Newton_polynomial
https://en.wikipedia.org/wiki/Approximation_theory#Chebyshev_approximation

@andy-thomason

Anyway, great work, @programmerjake! I'm quite excited!

I'll try to get to the next SIMD group meeting on Monday. I was doing a talk last night with the Cardiff Rust group and could not attend.

@programmerjake
Member

If you don't care about speed, you can use the algebraics crate (it implements real algebraic numbers), or the simple-soft-float crate, which uses algebraics (it implements IEEE 754 fp math with support for huge fp types, e.g. f65536). I wrote both of them.

@andy-thomason

Cool!

@programmerjake
Member

It took me several months to figure out how to efficiently factor polynomials with bigint coefficients, which is used to implement real algebraic numbers.

@programmerjake
Member

modular fields of polynomials with modular integer coefficients are complicated!

@andy-thomason

Tell Galois that :)

Good to see this work being done. I have slightly wider requirements, with the full gamut of stats functions in R, but I worked with SCEE on the PlayStation compilers for a long time. I used to use Maple to calculate approximations and factorise polynomials, but I prefer to do it myself now.

@programmerjake
Member

for higher precision in core::simd, I'm planning on using double-double arithmetic, since f64 SIMD is commonly supported.
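For reference, the core building block of double-double arithmetic is Knuth's error-free two_sum. A minimal sketch (a real library also needs two_prod and full renormalization):

```rust
// Minimal double-double sketch: an unevaluated sum (hi, lo) of two
// f64s carries roughly twice f64 precision.
fn two_sum(a: f64, b: f64) -> (f64, f64) {
    // Knuth's exact addition: s + err == a + b exactly.
    let s = a + b;
    let bb = s - a;
    let err = (a - (s - bb)) + (b - bb);
    (s, err)
}

/// Add a plain f64 to a double-double value.
fn dd_add_f64(hi: f64, lo: f64, b: f64) -> (f64, f64) {
    let (s, e) = two_sum(hi, b);
    // Fold the old low part into the error, then renormalize so
    // |lo| stays small relative to |hi|.
    two_sum(s, e + lo)
}

fn main() {
    // 1.0 + 1e-17 rounds to 1.0 in a single f64, but the (hi, lo)
    // pair keeps the low part exactly.
    let (hi, lo) = dd_add_f64(1.0, 0.0, 1e-17);
    println!("hi = {hi}, lo = {lo:e}");
}
```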


10 participants