Skip to content

Fixes for macOS portability #51

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

joseeroman
Copy link
Contributor

You may want to add these changes, intended to avoid issues in macOS due to special calling convention. We have had these changes in PETSc's --download-scalapack for two years now, but it is better if they can be added upstream.

@langou
Copy link
Contributor

langou commented Feb 28, 2022

Thanks @joseeroman.

Is this to fix an issue with the Accelerate framework? I.e., there is no issue if you use a reference BLAS implementation.

( just want to make I understand the issue this PR is fixing )

@joseeroman
Copy link
Contributor Author

Yes, the issue appears only with Apple's BLAS and does not show up with reference BLAS. If I try calling this

     RES = CDOTC( 3, A, 1, B, 1 )

it works as expected with reference BLAS but produces a segmentation fault with Apple's BLAS. The latter expects a subroutine call instead of a function call (to avoid returning a complex value on the stack), so the following works

     CALL CDOTC( RES, 3, A, 1, B, 1 )

I have not found any place where this is documented.

Regarding the CLADIV case, I have not tried but it should be similar.

@mgates3
Copy link

mgates3 commented Mar 1, 2022

Isn't this just a Fortran compiler convention? The Intel ifort compiler does the same thing, return complex numbers as a hidden argument. Fine if (Sca)LAPACK is compiled with the same Fortran compiler as BLAS. The problem is the Fortran compiler on macOS (e.g., gfortran) doesn't match what Apple is assuming, which is the old f2c compiler/translator and CLAPACK conventions.

@mgates3
Copy link

mgates3 commented Mar 1, 2022

With gfortran, see -ff2c flag:

       -ff2c
           Generate code designed to be compatible with code generated by g77
           and f2c.

           The calling conventions used by g77 (originally implemented in f2c)
           require functions that return type default "REAL" to actually
           return the C type "double", and functions that return type
           "COMPLEX" to return the values via an extra argument in the calling
           sequence that points to where to store the return value.  Under the
           default GNU calling conventions, such functions simply return their
           results as they would in GNU C---default "REAL" functions return
           the C type "float", and "COMPLEX" functions return the GNU C type
           "complex".  Additionally, this option implies the
           -fsecond-underscore option, unless -fno-second-underscore is
           explicitly requested.

           This does not affect the generation of code that interfaces with
           the libgfortran library.

           Caution: It is not a good idea to mix Fortran code compiled with
           -ff2c with code compiled with the default -fno-f2c calling
           conventions as, calling "COMPLEX" or default "REAL" functions
           between program parts which were compiled with different calling
           conventions will break at execution time.

           Caution: This will break code which passes intrinsic functions of
           type default "REAL" or "COMPLEX" as actual arguments, as the
           library implementations use the -fno-f2c calling conventions.

I think it's fine to use updated ccdotc & zzdotc routines to work around these ABI issues.

@joseeroman joseeroman force-pushed the jose/macos-portability branch from 34a2a59 to 4144809 Compare March 2, 2022 10:32
@weslleyspereira
Copy link
Collaborator

Hi @joseeroman. Would you be interested in contributing to the Continuous Integration on ScaLAPACK for the MacOS? I am proposing a simple version 0 on #56. Thanks!

@prj-
Copy link
Contributor

prj- commented Mar 8, 2022

Any chance this gets approved/merged independently of the status of #56? There is an upcoming PETSc release by the end of this month, it would be nice if we could switch to this repository instead of our fork over at https://bitbucket.org/petsc/pkg-scalapack, but both this PR and #53 (edit: I'm just now seeing that the latter has been approved, thanks!) would need to be reviewed and merged by then.

@joseeroman joseeroman force-pushed the jose/macos-portability branch from 4144809 to b3f4553 Compare March 8, 2022 07:44
@joseeroman
Copy link
Contributor Author

Rebased on top of master

Copy link
Collaborator

@weslleyspereira weslleyspereira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The workaround looks good to me!

Just to have it written somewhere. After you PR:

  • ScaLAPACK routines will use CCDOTC, ZZDOTC, CCDOTU, ZZDOTU. And CDOTC, ZDOTC, CDOTU, ZDOTU are deprecated (or at least should never be used inside the library).
  • ScaLAPACK routines will use SLADIV and DLADIV. And CLADIV and ZLADIV are deprecated (or at least should never be used inside the library).

Please, correct me if I am wrong or if I missed something.

@joseeroman
Copy link
Contributor Author

That's correct

@weslleyspereira
Copy link
Collaborator

Hi @joseeroman. Could you please force an empty commit so that your changes are subjected to the new tests? Thanks!

These BLAS functions return a complex number and this may produce ABI
conflict if using a BLAS compiled with a Fortran compiler with different
calling convention (e.g. macOS Accelerate framework). Use an equivalent
subroutine instead.
These LAPACK functions return a complex number (as ZDOTC and friends)
and hence do not work in MacOS, so it is better not to call them
@joseeroman joseeroman force-pushed the jose/macos-portability branch from b3f4553 to a0f76fc Compare March 11, 2022 06:51
@weslleyspereira weslleyspereira merged commit 2429dc7 into Reference-ScaLAPACK:master Mar 11, 2022
@zerothi
Copy link
Contributor

zerothi commented Apr 11, 2022

Wouldn't this introduce a performance regression for those relying on optimized blas libraries since optimized BLAS libraries will intrinsically do SIMD expansions in the *dot* methods?

While I totally understand the desire to ensure MacOS compatibility we also have to acknowledge the fact that this is based on LAPACK + BLAS and not on accelerate which does not obey the BLAS API. Whether or not the API is fixed I won't get into ;)

Couldn't this be changed to a CDEFS flag or something? Forcing this on end-users was not necessary.

@prj-
Copy link
Contributor

prj- commented Apr 11, 2022

Could you show me a use case where BLAS-1 operations are the bottleneck compared to higher-level BLAS operations?

@zerothi
Copy link
Contributor

zerothi commented Apr 11, 2022

For sure, they are not the bottleneck (perhaps just for very tiny matrices). But the point still stands.

@prj-
Copy link
Contributor

prj- commented Apr 11, 2022

But the point still stands.

Which one? The one suggesting to add a new flag so that there will be possibly more code paths to follow/maintain for operations that are most often no-op? Yup, that point still stands.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants