Skip to content

coll: make bcast ring unsigned-safe #13287

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 10, 2025

Conversation

hppritcha
Copy link
Member

In the conversion to support big count, there are several places where signed int's were replaced by unsigned types (size_t). Unfortunately there were a few places where signedness was being used and these need to be refactored.

To find these places the -Wtype-limit gnu compile option was used. This compile option is added to the --enable-picky compile option list as part of this PR.

@hppritcha hppritcha requested a review from devreal June 2, 2025 16:39
Copy link
Contributor

@devreal devreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A suggestion (feel free to ignore): this pattern occurs in multiple places here and could occur in other places. We could extract it into an inline function:

size_t rectified_diff(size_t a, size_t b){
  return a > b ? a - b : 0;
}

@hppritcha hppritcha force-pushed the add_some_unsigned_safe_code branch from 1861ca1 to 5686600 Compare June 3, 2025 19:56
@hppritcha
Copy link
Member Author

A suggestion (feel free to ignore): this pattern occurs in multiple places here and could occur in other places. We could extract it into an inline function:

size_t rectified_diff(size_t a, size_t b){
  return a > b ? a - b : 0;
}

good idea. i kept a helper routine inside this file as part of this PR.

@hppritcha hppritcha requested a review from devreal June 3, 2025 20:24
@hppritcha
Copy link
Member Author

@devreal please review again when you have a chance

devreal
devreal previously approved these changes Jun 5, 2025
Copy link
Contributor

@devreal devreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hppritcha! Looks good, just one nit

@@ -850,10 +860,8 @@ int ompi_coll_base_bcast_intra_scatter_allgather(
* Allgather by recursive doubling
* Each process has the curr_count elems in the buf[vrank * scatter_count, ...]
*/
size_t rem_count = count - vrank * scatter_count;
size_t rem_count = (count > vrank * scatter_count) ? count - vrank * scatter_count : 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can use rectify_diff here too:

Suggested change
size_t rem_count = (count > vrank * scatter_count) ? count - vrank * scatter_count : 0;
size_t rem_count = rectify_diff(count, (size_t)(vrank * scatter_count));

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch!

@hppritcha hppritcha force-pushed the add_some_unsigned_safe_code branch from 5686600 to 9b2ff4a Compare June 9, 2025 14:04
@hppritcha hppritcha requested a review from devreal June 9, 2025 14:05
@hppritcha
Copy link
Member Author

@devreal recheck when you have a chance

@hppritcha
Copy link
Member Author

@dalcinl would you have suggestions on debugging what's going on with mpi4py? it looks like some kind of problem with generation of mpi4py internal docs?

Copy link
Contributor

@devreal devreal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hppritcha!

@hppritcha
Copy link
Member Author

@dalcinl never mind we are just going to disable the test_doc.py in our CI.

In the conversion to support big count, there are several places
where signed int's were replaced by unsigned types (size_t).
Unfortunately there were a few places where signedness was being
used and these need to be refactored.

To find these places the -Wtype-limit gnu compile option was used.
This compile option is added to the --enable-picky compile option
list as part of this PR.

Signed-off-by: Howard Pritchard <[email protected]>
@hppritcha hppritcha force-pushed the add_some_unsigned_safe_code branch from 9b2ff4a to 0f34c42 Compare June 10, 2025 18:32
@hppritcha hppritcha merged commit c859bfe into open-mpi:main Jun 10, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants