
SYCL: SOFTMAX F16 mask support and other fixes #11261

Open · wants to merge 4 commits into master
Conversation

@qnixsynapse (Contributor):

Implemented ggml_sycl_op_soft_max() support for an F16 src1 (mask), for which a deprecation warning pragma was added in #5021.
To do this, it had to be decoupled from ggml_sycl_op_flatten, which always assumes src1 is of fp32 type (many OP functions depend on that).

Also replaced std::max with sycl::max in the softmax kernel. There was not a single test with an F16 mask in test-backend-ops, so I had to add such a test manually and I can confirm that it passed on my machine. This PR does not add that test. Reviewers are requested to test it thoroughly on their machines.

Not sure why this was necessary; the models I tested do not use an F16 mask.
Also did a few cleanups.
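
For illustration, a minimal self-contained sketch of the two ideas above (this is not the PR's actual kernel; soft_max_row and its single-task launch are simplified assumptions): the row-wise softmax is templated on the mask element type so the op can accept either an fp32 or an fp16 mask, and device code uses sycl::max instead of std::max.

#include <sycl/sycl.hpp>
#include <cmath>

// Row-wise softmax over ncols values with an optional additive mask whose
// element type is a template parameter (float or sycl::half).
template <typename T_mask>
void soft_max_row(sycl::queue & q, const float * x, const T_mask * mask,
                  float * dst, int ncols) {
    q.submit([&](sycl::handler & h) {
        h.single_task([=]() {
            float max_val = -INFINITY;
            for (int i = 0; i < ncols; ++i) {
                const float v = x[i] + (mask ? (float) mask[i] : 0.0f);
                max_val = sycl::max(max_val, v);   // sycl::max instead of std::max
            }
            float sum = 0.0f;
            for (int i = 0; i < ncols; ++i) {
                const float v = x[i] + (mask ? (float) mask[i] : 0.0f);
                const float e = sycl::exp(v - max_val);
                dst[i] = e;
                sum += e;
            }
            for (int i = 0; i < ncols; ++i) {
                dst[i] /= sum;
            }
        });
    }).wait();
}

// Explicit instantiations for the two mask types the op now accepts:
template void soft_max_row<float>(sycl::queue &, const float *, const float *, float *, int);
template void soft_max_row<sycl::half>(sycl::queue &, const float *, const sycl::half *, float *, int);

In practice x, mask, and dst would be USM device allocations and the work would be split across rows with work-group reductions; the single_task form above only illustrates the mask-type handling.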

@github-actions bot added labels on Jan 16, 2025: ggml (changes relating to the ggml tensor library for machine learning), SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language)
@Rbiessy (Collaborator) left a comment:
Thanks for the PR! I haven't tried to run it yet; I am worried we are adding support for fp16 without having tests for it. I saw the warnings you mentioned. Would it be possible to first merge a PR that adds tests for softmax fp16 for all relevant backends, and makes sure they are skipped where not supported?

ggml/src/ggml-sycl/common.cpp (review thread: outdated, resolved)
ggml/src/ggml-sycl/softmax.cpp (review thread: outdated, resolved)

On the following snippet:
float max_val = -INFINITY;

#pragma unroll
A collaborator commented:
I'm worried about the impact of this on various devices. It would be less controversial to remove it for now and only add the pragma unroll once we can verify that it improves performance overall on some devices with minimal regressions.

@qnixsynapse (Contributor, Author) replied on Jan 20, 2025:

The reason I added this to the PR was to see whether it really causes regressions on some devices. Once that is confirmed, I will remove it.

On the Intel Arc A750, which I have tested, and probably the A770, there are no such regressions.

ggml/src/ggml-sycl/softmax.cpp (review thread: outdated, resolved)
ggml/src/ggml-sycl/softmax.cpp (review thread: outdated, resolved)
ggml/src/ggml-sycl/softmax.cpp (review thread: resolved)
@qnixsynapse (Contributor, Author) commented:

> Thanks for the PR! I haven't tried to run it yet; I am worried we are adding support for fp16 without having tests for it. I saw the warnings you mentioned. Would it be possible to first merge a PR that adds tests for softmax fp16 for all relevant backends, and makes sure they are skipped where not supported?

It is possible, but I would want an actual project collaborator to initiate this, because if I go ahead and do it, backends that do not support it may crash with an assertion, which doesn't look nice imo.

@Rbiessy (Collaborator) commented on Jan 20, 2025:

> > Thanks for the PR! I haven't tried to run it yet; I am worried we are adding support for fp16 without having tests for it. I saw the warnings you mentioned. Would it be possible to first merge a PR that adds tests for softmax fp16 for all relevant backends, and makes sure they are skipped where not supported?
>
> It is possible, but I would want an actual project collaborator to initiate this, because if I go ahead and do it, backends that do not support it may crash with an assertion, which doesn't look nice imo.

Maybe it would be accepted to only add the test for one backend for now? If we only enable it for the SYCL backend, it could be part of this PR. At least it would make me a bit more confident when I try to run the PR.

@qnixsynapse (Contributor, Author) commented:

> Maybe it would be accepted to only add the test for one backend for now? If we only enable it for the SYCL backend, it could be part of this PR. At least it would make me a bit more confident when I try to run the PR.

The CUDA backend also supports this. There may have been a reason why the test was not added in the first place. I will take this issue up again tomorrow (it's dinner time here).
In the meantime, a comment from @ggerganov regarding this would be helpful.
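
For reference, a minimal sketch of the kind of graph such a test case would exercise through the public ggml API (the shapes and the helper name build_soft_max_f16_mask are illustrative, not code from this PR or from test-backend-ops):

#include "ggml.h"

// fp32 logits with an fp16 additive mask through ggml_soft_max_ext -- the
// path this PR enables in the SYCL backend.
static struct ggml_tensor * build_soft_max_f16_mask(struct ggml_context * ctx) {
    struct ggml_tensor * a    = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 64, 32); // logits
    struct ggml_tensor * mask = ggml_new_tensor_2d(ctx, GGML_TYPE_F16, 64, 32); // fp16 mask
    return ggml_soft_max_ext(ctx, a, mask, /*scale =*/ 1.0f, /*max_bias =*/ 0.0f); // no ALiBi
}

A backend test would run this graph on the SYCL backend and compare the result against the CPU backend, which is what test-backend-ops automates, skipping the case on backends that report the op as unsupported.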
