Reduce elementwise extension size #1976

Merged
merged 4 commits into from
Jan 22, 2025

@oleksandr-pavlyk oleksandr-pavlyk commented Jan 21, 2025

This PR factors the inline kernel submission that populates the padded vector out of the specializations for binary operations on a matrix and a vector, moving it into a separate function.

Doing so generates fewer such kernels, which reduces the binary size (by 1,483,296 bytes, about 3.8%, in the listings below).
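
To illustrate why factoring out the submission yields fewer kernels, here is a minimal plain-C++ sketch rather than the actual dpctl code. It assumes that a lambda written inside a function template gets a distinct closure type per instantiation, and uses that as a stand-in for a distinct device kernel. All names (`kernel_registry`, `run_kernel_sketch`, `pad_inline`, `populate_padded_vector_sketch`, `AddOp`, `MulOp`) are hypothetical, and the `vec[i % n]` fill rule is an assumption about how the padded vector is populated.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <typeindex>
#include <typeinfo>
#include <vector>

// Plain-C++ stand-in for the set of device kernels compiled into a binary:
// every distinct callable type passed to run_kernel_sketch is recorded once.
inline std::set<std::type_index> &kernel_registry()
{
    static std::set<std::type_index> registry;
    return registry;
}

// Hypothetical stand-in for cgh.parallel_for: runs k(i) for i in [0, n).
template <typename KernelT> void run_kernel_sketch(std::size_t n, KernelT k)
{
    kernel_registry().insert(std::type_index(typeid(KernelT)));
    for (std::size_t i = 0; i < n; ++i) {
        k(i);
    }
}

// Before: the padding lambda sits inside a template that also depends on
// matT and OpT, so every (matT, vecT, OpT) combination gets its own closure.
template <typename matT, typename vecT, typename OpT>
std::vector<vecT> pad_inline(const std::vector<vecT> &vec, std::size_t padded_n)
{
    assert(!vec.empty());
    std::vector<vecT> padded(padded_n);
    const std::size_t n = vec.size();
    run_kernel_sketch(padded_n,
                      [&](std::size_t i) { padded[i] = vec[i % n]; });
    return padded;
}

// After: the padding step depends on vecT only, so there is one closure
// type per vector element type, however many operations reuse it.
template <typename vecT>
std::vector<vecT> populate_padded_vector_sketch(const std::vector<vecT> &vec,
                                                std::size_t padded_n)
{
    assert(!vec.empty());
    std::vector<vecT> padded(padded_n);
    const std::size_t n = vec.size();
    run_kernel_sketch(padded_n,
                      [&](std::size_t i) { padded[i] = vec[i % n]; });
    return padded;
}

struct AddOp {};
struct MulOp {};

// Counts distinct closure types when the lambda is inline in the wide template.
std::size_t count_kernels_inline()
{
    kernel_registry().clear();
    const std::vector<float> v{1.f, 2.f, 3.f};
    pad_inline<float, float, AddOp>(v, 8);
    pad_inline<double, float, AddOp>(v, 8);
    pad_inline<float, float, MulOp>(v, 8);
    pad_inline<double, float, MulOp>(v, 8);
    return kernel_registry().size();
}

// Counts distinct closure types when the same four callers share the helper.
std::size_t count_kernels_factored()
{
    kernel_registry().clear();
    const std::vector<float> v{1.f, 2.f, 3.f};
    for (int call = 0; call < 4; ++call) {
        populate_padded_vector_sketch(v, 8);
    }
    return kernel_registry().size();
}
```

In this sketch the inline variant registers one closure type per combination of matrix type and operation, while the factored helper registers one per vector element type. That is the mechanism by which the number of generated kernels can shrink.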

Before:

(dev_dpctl) opavlyk@mtl-world:~/repos/dpctl$ ls -l dpctl/tensor/_tensor_elementwise_impl.cpython-312-x86_64-linux-gnu.so
-rw-r--r-- 1 opavlyk opavlyk 38659896 Jan 19 20:58 dpctl/tensor/_tensor_elementwise_impl.cpython-312-x86_64-linux-gnu.so

After:

(dev_dpctl) opavlyk@mtl-world:~/repos/dpctl$ ls -l dpctl/tensor/_tensor_elementwise_impl.cpython-312-x86_64-linux-gnu.so
-rw-r--r-- 1 opavlyk opavlyk 37176600 Jan 21 06:36 dpctl/tensor/_tensor_elementwise_impl.cpython-312-x86_64-linux-gnu.so

This PR also adds a static_assert in offset_utils.hpp to verify that indexers are device copyable.
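
A compile-time check of this kind might look roughly like the sketch below. It is not the code from offset_utils.hpp: the indexer `Strided1DIndexerSketch` is hypothetical, and `std::is_trivially_copyable_v` is used as a stand-in so the example builds without a SYCL compiler. In SYCL 2020 code the trait would be `sycl::is_device_copyable_v`, which trivially copyable types satisfy.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Assumption: trivially copyable is used here as a stand-in for the SYCL
// trait sycl::is_device_copyable_v, so the sketch needs no SYCL headers.
template <typename T>
inline constexpr bool is_device_copyable_sketch_v =
    std::is_trivially_copyable_v<T>;

// Hypothetical indexer: maps a flat index to an offset in a strided 1D view.
struct Strided1DIndexerSketch
{
    Strided1DIndexerSketch(std::ptrdiff_t offset, std::ptrdiff_t step)
        : offset_(offset), step_(step)
    {
    }

    std::ptrdiff_t operator()(std::size_t i) const
    {
        return offset_ + static_cast<std::ptrdiff_t>(i) * step_;
    }

private:
    std::ptrdiff_t offset_;
    std::ptrdiff_t step_;
};

// Fails at compile time if someone later adds a member (for example a
// std::vector) that makes the indexer unsafe to copy to the device.
static_assert(is_device_copyable_sketch_v<Strided1DIndexerSketch>,
              "indexer must be device copyable");
```

Placing the assertion next to the type definition means a violation is reported where the indexer is declared, rather than deep inside a kernel submission.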

It also sneaks in a change that defines a local type alias for the functor submitted in cgh.parallel_for, which simplifies the invocation.
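
The local alias pattern can be sketched in plain C++ as follows. This is an illustration under assumptions, not the dpctl source: `UnaryFunctorSketch`, `NegateSketch`, `apply_unary_sketch`, `negate_all_sketch`, and the `parallel_for_sketch` stand-in for `cgh.parallel_for` are all hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel functor with several template parameters.
template <typename srcT, typename dstT, typename OpT> struct UnaryFunctorSketch
{
    UnaryFunctorSketch(const srcT *src, dstT *dst, OpT op)
        : src_(src), dst_(dst), op_(op)
    {
    }

    void operator()(std::size_t i) const { dst_[i] = op_(src_[i]); }

private:
    const srcT *src_;
    dstT *dst_;
    OpT op_;
};

// Hypothetical stand-in for cgh.parallel_for: calls fn(i) for i in [0, n).
template <typename FnT> void parallel_for_sketch(std::size_t n, const FnT &fn)
{
    for (std::size_t i = 0; i < n; ++i) {
        fn(i);
    }
}

struct NegateSketch
{
    template <typename T> T operator()(const T &x) const { return -x; }
};

template <typename srcT, typename dstT, typename OpT>
void apply_unary_sketch(const srcT *src, dstT *dst, std::size_t n, OpT op)
{
    // Local alias: the long template-id is spelled once, so the
    // submission line below stays short and easy to read.
    using Impl = UnaryFunctorSketch<srcT, dstT, OpT>;
    parallel_for_sketch(n, Impl(src, dst, op));
}

// Small driver: negates ints into a vector of doubles.
std::vector<double> negate_all_sketch(const std::vector<int> &in)
{
    std::vector<double> out(in.size());
    apply_unary_sketch(in.data(), out.data(), in.size(), NegateSketch{});
    return out;
}
```

The alias changes nothing about which functor type is instantiated. It only shortens the expression passed to the submission call.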

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • Have you added documentation for your changes, if necessary?
  • Have you added your changes to the changelog?
  • If this PR is a work in progress, are you opening the PR as a draft?

…function

Doing so reduces the binary size of the elementwise operations extension.

Before:

```
(dev_dpctl) opavlyk@mtl-world:~/repos/dpctl$ ls -l dpctl/tensor/_tensor_elementwise_impl.cpython-312-x86_64-linux-gnu.so
-rw-r--r-- 1 opavlyk opavlyk 38659896 Jan 19 20:58 dpctl/tensor/_tensor_elementwise_impl.cpython-312-x86_64-linux-gnu.so
```

After:

```
(dev_dpctl) opavlyk@mtl-world:~/repos/dpctl$ ls -l dpctl/tensor/_tensor_elementwise_impl.cpython-312-x86_64-linux-gnu.so
-rw-r--r-- 1 opavlyk opavlyk 37176600 Jan 21 06:36 dpctl/tensor/_tensor_elementwise_impl.cpython-312-x86_64-linux-gnu.so
```

Added static assertions to offset_utils to ensure that indexers are device copyable.
@oleksandr-pavlyk oleksandr-pavlyk force-pushed the reduce-elementwise-extension-size branch from 2a526d2 to 3f90e9b Compare January 21, 2025 12:58

github-actions bot commented Jan 21, 2025

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_490 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

coveralls commented Jan 21, 2025

Coverage Status

coverage: 88.18%. remained the same
when pulling 1a95394 on reduce-elementwise-extension-size
into d9e9bf8 on master.

ndgrigorian previously approved these changes Jan 21, 2025
@ndgrigorian ndgrigorian left a comment

LGTM!

Array API standard conformance tests for dpctl=0.19.0dev0=py310h93fe807_491 ran successfully.
Passed: 894
Failed: 2
Skipped: 118

@oleksandr-pavlyk oleksandr-pavlyk merged commit c5cbb08 into master Jan 22, 2025
47 of 55 checks passed
@oleksandr-pavlyk oleksandr-pavlyk deleted the reduce-elementwise-extension-size branch January 22, 2025 03:32