
Average pooling clamped divisor should be applied in all conditions where the kernel can go out of bounds #4144


Closed

Conversation

@ivangarcia44 (Contributor) commented Apr 17, 2025

In this pull request I added various E2E tests to cover edge cases in the average pooling torch-to-linalg lowering algorithm. These new tests uncovered several numerical issues, which are addressed in this same PR.

One of these issues is the IREE test failure reported in #4079, which triggered this work.

Background:

In the most common case, the divisor for average pooling is just the product of the kernel dimensions. But with the padding and ceil-mode options, some elements need to be discounted from the divisor computation (see the sketch after this list). This change fixes two things:

  1. Fix the condition that determines whether the divisor is just the product of kernel dimensions or the clamped divisor computation.
  2. Add the missing isCountIncludePad logic to the divisor computation algorithm, and reverse the kernel/stride/padding parameter element order. Both were missing from the first generalization change.
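To make the interplay concrete, here is a minimal PyTorch sketch (an editorial illustration, not code from this PR) of how count_include_pad changes the effective divisor at a padded corner:

```python
import torch
import torch.nn.functional as F

# All-ones input so the averages expose the divisor directly.
x = torch.ones(1, 1, 4, 4)

# count_include_pad=True: padded zeros count toward the divisor, so the
# corner window averages 4 real ones over a full 3x3 kernel -> 4/9.
y_incl = F.avg_pool2d(x, kernel_size=3, stride=2, padding=1,
                      count_include_pad=True)

# count_include_pad=False: the divisor clamps to the 2x2 overlap with the
# input, so the corner average is 4/4 -> 1.0.
y_excl = F.avg_pool2d(x, kernel_size=3, stride=2, padding=1,
                      count_include_pad=False)

print(y_incl[0, 0, 0, 0].item())  # ~0.4444
print(y_excl[0, 0, 0, 0].item())  # 1.0
```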

@vivekkhandelwal1
@silvasean
@zjgarvey
@ramiro050
@JianzheXiao
@AmosLewis
@rsuderman
@nirvedhmeshram
@sahas3
@Hanumanth04
@dixinzhou
@rafaelubalmw

@ivangarcia44 marked this pull request as draft April 24, 2025 15:26
…rnel/stride/padding elements have to be processed in reversed order relative to the spatial dimensions.
@ivangarcia44 marked this pull request as ready for review April 24, 2025 17:11
@vivekkhandelwal1 (Collaborator)

@ivangarcia44 It is not good practice to close a PR and open a new one just because conflicts have arisen in the branch. Ideally, you should have resolved the conflicts and updated this PR, just like other contributors do. At the very least, it creates confusion and makes it hard to follow the discussion across two different PRs for the same changes.

Also, I would expect that this PR (or its new replacement) is not merged until a thorough review of the patch is done. The last PR had already made some changes which should not have been done, so I expect everyone to be more careful this time.

@ivangarcia44 reopened this Apr 29, 2025
@ivangarcia44 (Contributor, Author)

I will do the merge in this PR and close the new one.

@ivangarcia44 marked this pull request as ready for review April 29, 2025 21:59
@ivangarcia44 (Contributor, Author)

Hi all, I updated the changes after resolving the merge conflict with @vivekkhandelwal1's changes. Please review when you get a chance. Thank you!

@vivekkhandelwal1
@JianzheXiao
@AmosLewis
@rsuderman
@nirvedhmeshram
@sahas3
@Hanumanth04
@dixinzhou
@rafaelubalmw

Comment on lines 922 to 931
//
// indexStartOffset = ceil((kernelSize - 1)/2) - padding
//
// clampedKernelSize =
// min(outIntIndex * stride + indexStartOffset + floor((kernelSize - 1)/2)
// + 1,
// InputSpatialDimSize + padding) -
// max(outIntIndex * stride + indexStartOffset - ceil((kernelSize - 1)/2),
// -padding)
//
@vivekkhandelwal1 (Collaborator)

Hi @ivangarcia44, can you please add a link to the code/implementation in PyTorch from which you adopted this?

@ivangarcia44 (Contributor, Author)

Hi Vivek, the original author of the torch-linalg avg_pool2d lowering (@AmosLewis) based the 2D implementation of the divisor computation on this PyTorch code:

https://github.com/pytorch/pytorch/blob/4a6dfbe4806b361c43210dfd56db64c4097c66bb/aten/src/ATen/native/cpu/AvgPoolKernel.cpp#L78

In the new E2E test suite I added, the AvgPool2dCeilPaddingStridedIncludePadding test discovered a numerical mismatch caused by the divisor computation. Because of this I wrote a new average pooling divisor formula based on the PyTorch avg_pool2d documentation and the numerical values I observed. I tried to make the comments, variable names, and code clearer than in the previous version, which helped me correct the numerical issues I found and make all E2E tests pass.
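The clamped-kernel-size formula quoted above can be sanity-checked by hand. Below is a minimal sketch (not the PR's code) for one spatial dimension, assuming kernelSize=3, stride=2, padding=1, inputSpatialDimSize=4, and ceil mode enabled (which yields three output positions):

```python
import math

# Standalone variables mirroring the names in the quoted comment.
kernelSize, stride, padding, inputSpatialDimSize = 3, 2, 1, 4
indexStartOffset = math.ceil((kernelSize - 1) / 2) - padding  # 1 - 1 = 0

for outIntIndex in range(3):  # ceil mode yields 3 output positions here
    end = min(outIntIndex * stride + indexStartOffset
              + (kernelSize - 1) // 2 + 1,
              inputSpatialDimSize + padding)
    start = max(outIntIndex * stride + indexStartOffset
                - math.ceil((kernelSize - 1) / 2),
                -padding)
    print(outIntIndex, end - start)
# 0 -> 3: window [-1, 2) is clamped only by -padding, so padding counts.
# 1 -> 3: window [1, 4) lies fully inside the input.
# 2 -> 2: window [3, 6) is clamped to [3, 5) by inputSpatialDimSize+padding,
#         discounting the ceil-mode overhang beyond the padded input.
```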

Taking a second look at the PyTorch code and @AmosLewis's code, I noticed I had missed two details in the 1D/3D generalization of the algorithm: 1) the count_include_pad parameter used in the divisor computation, and 2) the reversal of the kernel/stride/pad parameter element order (although the second issue could have been present in the original code). The second issue was found by three new E2E tests in this PR that use non-uniform values for these parameters (see the sketch after this comment).

I updated the PR to use the PyTorch-based algorithm with these two fixes, and all E2E tests are passing now.
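To see why non-uniform parameters expose this class of bug, consider a rough sketch (illustrative only, not one of the PR's actual E2E tests): feeding the per-dimension values in reversed order changes even the output shape, so a reversed-order lowering cannot silently match.

```python
import torch
import torch.nn.functional as F

x = torch.arange(48, dtype=torch.float32).reshape(1, 1, 6, 8)

# Per-dimension (H, W) parameters in the intended order...
correct = F.avg_pool2d(x, kernel_size=(3, 2), stride=(2, 1), padding=(1, 0))

# ...and the same values consumed in reversed order, as a buggy
# generalization would effectively do.
reversed_params = F.avg_pool2d(x, kernel_size=(2, 3), stride=(1, 2),
                               padding=(0, 1))

# With non-uniform values even the shapes disagree; with uniform values
# (e.g. kernel_size=3 everywhere) the bug would be invisible.
print(correct.shape)          # torch.Size([1, 1, 3, 7])
print(reversed_params.shape)  # torch.Size([1, 1, 5, 4])
```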

@ivangarcia44 (Contributor, Author)

Hi all, this is a reminder to review when you get a chance. Thank you

@vivekkhandelwal1
@JianzheXiao
@AmosLewis
@rsuderman
@nirvedhmeshram
@sahas3
@Hanumanth04
@dixinzhou
@rafaelubalmw

@ivangarcia44 (Contributor, Author) commented May 16, 2025

Hi @vivekkhandelwal1,

Could you please take a look at this PR when you have a chance? I’ve addressed all the previous feedback, and GitHub requires your approval before it can be merged. This PR includes three numerical correctness bug fixes for the average pooling operator, which are important for our stakeholders working on safety-critical applications.

It’s been open for a month, so I wanted to check in and see if there’s anything else needed from my side to move this forward.

Thank you very much!

@silvasean
@zjgarvey
@ramiro050
@JianzheXiao
@AmosLewis
@rsuderman
@nirvedhmeshram
@sahas3
@Hanumanth04
@dixinzhou
@rafaelubalmw

@vivekkhandelwal1 (Collaborator)

> Hi @vivekkhandelwal1,
>
> Could you please take a look at this PR when you have a chance? I’ve addressed all the previous feedback, and GitHub requires your approval before it can be merged. This PR includes three numerical correctness bug fixes for the average pooling operator, which are important for our stakeholders working on safety-critical applications.
>
> It’s been open for a month, so I wanted to check in and see if there’s anything else needed from my side to move this forward.
>
> Thank you very much!

Hi @ivangarcia44, I'm sorry for the delay. I got caught up in some other work. Can I review it in a couple of days?

@ivangarcia44 (Contributor, Author)

>> Hi @vivekkhandelwal1,
>> Could you please take a look at this PR when you have a chance? I’ve addressed all the previous feedback, and GitHub requires your approval before it can be merged. This PR includes three numerical correctness bug fixes for the average pooling operator, which are important for our stakeholders working on safety-critical applications.
>> It’s been open for a month, so I wanted to check in and see if there’s anything else needed from my side to move this forward.
>> Thank you very much!
>
> Hi @ivangarcia44, I'm sorry for the delay. I got caught up in some other work. Can I review it in a couple of days?

It’s ok, next week is perfectly fine. Thank you for following up!

@vivekkhandelwal1 (Collaborator) left a comment

@ivangarcia44, thanks for the changes and for adding the clarifications. The PR looks good. If you want, you may incorporate the comment suggestion before merging the PR.

I have 2 doubts:

  1. I am unable to understand the need for reversing the kernel/stride/padding element order.
  2. In response to this comment of mine, you said:

> Because of this I wrote a new average pooling divisor formula based on the PyTorch avg_pool2d doc and numerical values I observed.

Honestly, in my opinion, doing something based on observation won't be correct. There must be some rationale/reference for it. Also, now I don't see that comment in the code.

Committing Vivek's comment update suggestion.

Co-authored-by: Vivek Khandelwal <[email protected]>
@ivangarcia44 (Contributor, Author) commented May 19, 2025

> @ivangarcia44, thanks for the changes and for adding the clarifications. The PR looks good. If you want, you may incorporate the comment suggestion before merging the PR.
>
> I have 2 doubts:
>
>   1. I am unable to understand the need for reversing the kernel/stride/padding element order.
>   2. In response to this comment of mine, you said:
>
> > Because of this I wrote a new average pooling divisor formula based on the PyTorch avg_pool2d doc and numerical values I observed.
>
> Honestly, in my opinion, doing something based on observation won't be correct. There must be some rationale/reference for it. Also, now I don't see that comment in the code.

Hi Vivek, I will integrate your requested comment change. Here are my answers to the two questions above:

  1. In the average pooling ND generalization, the spatial dimension order was reversed in PoolSizeCalculator's constructor, so the kernel/stride/padding elements had to compensate. I will update the order in PoolSizeCalculator's constructor to remove the need for the kernel/stride/padding element order reversal (a toy sketch of the pitfall follows this comment).
  2. The latest version of the divisor computation algorithm has a link to the PyTorch algorithm it is based on. I created a new divisor algorithm after finding numerical errors in the original, but it turned out that a piece of the algorithm had been omitted in the ND generalization. After adding that piece back, I could bring back the PyTorch-based algorithm, with a reference to it in the comments.

I will submit a commit for point number 1 above. Please let me know if I missed anything. Thanks,
Ivan
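A toy sketch of the ordering pitfall described in point 1, using hypothetical names rather than the actual PoolSizeCalculator code:

```python
# If the spatial dimensions are iterated in reversed order but the
# per-dimension parameter lists are indexed forward, H parameters get
# applied to W and vice versa.
spatial_sizes = [6, 8]   # (H, W) input sizes
kernel = [3, 2]          # (H, W) kernel sizes

# Buggy pairing: dimension order reversed, parameter order not.
for i, size in enumerate(reversed(spatial_sizes)):
    print(size, kernel[i])           # pairs 8 with 3 and 6 with 2: wrong

# Compensated pairing: reverse the parameters too (or, as in the final fix,
# keep the dimension iteration order consistent in the first place).
for size, k in zip(reversed(spatial_sizes), reversed(kernel)):
    print(size, k)                   # pairs 8 with 2 and 6 with 3: correct
```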
