-
Notifications
You must be signed in to change notification settings - Fork 14.8k
[LV] Ignore user-specified interleave count when unsafe. #153009
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
When an VF is specified via a loop hint, it will be clamped to a safe VF or ignored if it is found to be unsafe. This is not the case for user-specified interleave counts, which can lead to loops such as the following with a memory dependence being vectorised with the specified IC: #pragma clang loop interleave_count(4) for (int i = 4; i < LEN; i++) b[i] = b[i - 4] + a[i]; According to [1], loop hints are ignored if they are not safe to apply. This patch adds a check to prevent vectorisation with interleaving if isSafeForAnyVectorWidth() returns false. This is already checked in selectInterleaveCount(). [1] https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-and-llvm-loop-interleave
@llvm/pr-subscribers-llvm-transforms Author: Kerry McLaughlin (kmclaughlin-arm) ChangesWhen an VF is specified via a loop hint, it will be clamped to a safe
According to [1], loop hints are ignored if they are not safe to apply. This patch adds a check to prevent vectorisation with interleaving if [1] https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-and-llvm-loop-interleave Full diff: https://github.com/llvm/llvm-project/pull/153009.diff 3 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index be00fd6a416e5..46168b13c253f 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -10138,8 +10138,10 @@ bool LoopVectorizePass::processLoop(Loop *L) {
ElementCount UserVF = Hints.getWidth();
unsigned UserIC = Hints.getInterleave();
+ unsigned SafeUserIC = CM.Legal->isSafeForAnyVectorWidth() ? UserIC : 0;
+
// Plan how to best vectorize.
- LVP.plan(UserVF, UserIC);
+ LVP.plan(UserVF, SafeUserIC);
VectorizationFactor VF = LVP.computeBestVF();
unsigned IC = 1;
@@ -10151,7 +10153,8 @@ bool LoopVectorizePass::processLoop(Loop *L) {
// Select the interleave count.
IC = LVP.selectInterleaveCount(LVP.getPlanFor(VF.Width), VF.Width, VF.Cost);
- unsigned SelectedIC = std::max(IC, UserIC);
+ unsigned SelectedIC = std::max(IC, SafeUserIC);
+
// Optimistically generate runtime checks if they are needed. Drop them if
// they turn out to not be profitable.
if (VF.Width.isVector() || SelectedIC > 1) {
@@ -10201,7 +10204,14 @@ bool LoopVectorizePass::processLoop(Loop *L) {
VectorizeLoop = false;
}
- if (!LVP.hasPlanWithVF(VF.Width) && UserIC > 1) {
+ if (UserIC > 0 && UserIC != SafeUserIC) {
+ LLVM_DEBUG(dbgs() << "LV: Disabling interleaving as user-specified "
+ "interleave count is unsafe.\n");
+ IntDiagMsg = {"InterleavingUnsafe",
+ "User-specified interleave count is not safe, interleave "
+ "count is set to 1."};
+ InterleaveLoop = false;
+ } else if (!LVP.hasPlanWithVF(VF.Width) && SafeUserIC > 1) {
// Tell the user interleaving was avoided up-front, despite being explicitly
// requested.
LLVM_DEBUG(dbgs() << "LV: Ignoring UserIC, because vectorization and "
@@ -10209,7 +10219,7 @@ bool LoopVectorizePass::processLoop(Loop *L) {
IntDiagMsg = {"InterleavingAvoided",
"Ignoring UserIC, because interleaving was avoided up front"};
InterleaveLoop = false;
- } else if (IC == 1 && UserIC <= 1) {
+ } else if (IC == 1 && SafeUserIC <= 1) {
// Tell the user interleaving is not beneficial.
LLVM_DEBUG(dbgs() << "LV: Interleaving is not beneficial.\n");
IntDiagMsg = {
@@ -10221,7 +10231,7 @@ bool LoopVectorizePass::processLoop(Loop *L) {
IntDiagMsg.second +=
" and is explicitly disabled or interleave count is set to 1";
}
- } else if (IC > 1 && UserIC == 1) {
+ } else if (IC > 1 && SafeUserIC == 1) {
// Tell the user interleaving is beneficial, but it explicitly disabled.
LLVM_DEBUG(
dbgs() << "LV: Interleaving is beneficial but is explicitly disabled.");
@@ -10245,7 +10255,7 @@ bool LoopVectorizePass::processLoop(Loop *L) {
}
// Override IC if user provided an interleave count.
- IC = UserIC > 0 ? UserIC : IC;
+ IC = SafeUserIC > 0 ? SafeUserIC : IC;
// Emit diagnostic messages, if any.
const char *VAPassName = Hints.vectorizeAnalysisPassName();
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll b/llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll
index 11cc971586773..f1fc78f117fba 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll
@@ -417,21 +417,16 @@ for.end: ; preds = %for.body, %entry
; Note: This test was added to ensure we always check the legality of reductions (end emit a warning if necessary) before checking for memory dependencies
; CHECK-REMARK: Scalable vectorization not supported for the reduction operations found in this loop.
-; CHECK-REMARK: vectorized loop (vectorization width: 4, interleaved count: 2)
+; CHECK-REMARK: vectorized loop (vectorization width: 4, interleaved count: 1)
define i32 @memory_dependence(ptr noalias nocapture %a, ptr noalias nocapture readonly %b, i64 %n) {
; CHECK-LABEL: @memory_dependence
; CHECK: vector.body:
; CHECK: %[[LOAD1:.*]] = load <4 x i32>
; CHECK: %[[LOAD2:.*]] = load <4 x i32>
-; CHECK: %[[LOAD3:.*]] = load <4 x i32>
-; CHECK: %[[LOAD4:.*]] = load <4 x i32>
-; CHECK: %[[ADD1:.*]] = add nsw <4 x i32> %[[LOAD3]], %[[LOAD1]]
-; CHECK: %[[ADD2:.*]] = add nsw <4 x i32> %[[LOAD4]], %[[LOAD2]]
-; CHECK: %[[MUL1:.*]] = mul <4 x i32> %[[LOAD3]]
-; CHECK: %[[MUL2:.*]] = mul <4 x i32> %[[LOAD4]]
+; CHECK: %[[ADD1:.*]] = add nsw <4 x i32> %[[LOAD2]], %[[LOAD1]]
+; CHECK: %[[MUL1:.*]] = mul <4 x i32> %[[LOAD2]]
; CHECK: middle.block:
-; CHECK: %[[RDX:.*]] = mul <4 x i32> %[[MUL2]], %[[MUL1]]
-; CHECK: call i32 @llvm.vector.reduce.mul.v4i32(<4 x i32> %[[RDX]])
+; CHECK: call i32 @llvm.vector.reduce.mul.v4i32(<4 x i32> %[[MUL1]])
entry:
br label %for.body
diff --git a/llvm/test/Transforms/LoopVectorize/unsafe-ic-hint-remark.ll b/llvm/test/Transforms/LoopVectorize/unsafe-ic-hint-remark.ll
new file mode 100644
index 0000000000000..034df3f54e7e5
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/unsafe-ic-hint-remark.ll
@@ -0,0 +1,33 @@
+; REQUIRES: asserts
+; RUN: opt -passes=loop-vectorize -pass-remarks-analysis=loop-vectorize -debug-only=loop-vectorize -S < %s 2>&1 | FileCheck %s
+
+; Make sure the unsafe user specified interleave count is ignored.
+
+; CHECK: LV: Disabling interleaving as user-specified interleave count is unsafe.
+; CHECK: remark: <unknown>:0:0: User-specified interleave count is not safe, interleave count is set to 1.
+; CHECK-LABEL: @loop_distance_4
+define void @loop_distance_4(i64 %N, ptr %a, ptr %b) {
+entry:
+ %cmp10 = icmp sgt i64 %N, 4
+ br i1 %cmp10, label %for.body, label %for.end
+
+for.body:
+ %indvars.iv = phi i64 [ 4, %entry ], [ %indvars.iv.next, %for.body ]
+ %0 = getelementptr i32, ptr %b, i64 %indvars.iv
+ %arrayidx = getelementptr i8, ptr %0, i64 -16
+ %1 = load i32, ptr %arrayidx, align 4
+ %arrayidx2 = getelementptr inbounds nuw i32, ptr %a, i64 %indvars.iv
+ %2 = load i32, ptr %arrayidx2, align 4
+ %add = add nsw i32 %2, %1
+ store i32 %add, ptr %0, align 4
+ %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+ %exitcond.not = icmp eq i64 %indvars.iv.next, %N
+ br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !1
+
+for.end:
+ ret void
+}
+
+!1 = !{!1, !2, !3}
+!2 = !{!"llvm.loop.interleave.count", i32 4}
+!3 = !{!"llvm.loop.vectorize.width", i32 4}
|
@llvm/pr-subscribers-vectorizers Author: Kerry McLaughlin (kmclaughlin-arm) ChangesWhen an VF is specified via a loop hint, it will be clamped to a safe
According to [1], loop hints are ignored if they are not safe to apply. This patch adds a check to prevent vectorisation with interleaving if [1] https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-and-llvm-loop-interleave Full diff: https://github.com/llvm/llvm-project/pull/153009.diff 3 Files Affected:
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index be00fd6a416e5..46168b13c253f 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -10138,8 +10138,10 @@ bool LoopVectorizePass::processLoop(Loop *L) {
ElementCount UserVF = Hints.getWidth();
unsigned UserIC = Hints.getInterleave();
+ unsigned SafeUserIC = CM.Legal->isSafeForAnyVectorWidth() ? UserIC : 0;
+
// Plan how to best vectorize.
- LVP.plan(UserVF, UserIC);
+ LVP.plan(UserVF, SafeUserIC);
VectorizationFactor VF = LVP.computeBestVF();
unsigned IC = 1;
@@ -10151,7 +10153,8 @@ bool LoopVectorizePass::processLoop(Loop *L) {
// Select the interleave count.
IC = LVP.selectInterleaveCount(LVP.getPlanFor(VF.Width), VF.Width, VF.Cost);
- unsigned SelectedIC = std::max(IC, UserIC);
+ unsigned SelectedIC = std::max(IC, SafeUserIC);
+
// Optimistically generate runtime checks if they are needed. Drop them if
// they turn out to not be profitable.
if (VF.Width.isVector() || SelectedIC > 1) {
@@ -10201,7 +10204,14 @@ bool LoopVectorizePass::processLoop(Loop *L) {
VectorizeLoop = false;
}
- if (!LVP.hasPlanWithVF(VF.Width) && UserIC > 1) {
+ if (UserIC > 0 && UserIC != SafeUserIC) {
+ LLVM_DEBUG(dbgs() << "LV: Disabling interleaving as user-specified "
+ "interleave count is unsafe.\n");
+ IntDiagMsg = {"InterleavingUnsafe",
+ "User-specified interleave count is not safe, interleave "
+ "count is set to 1."};
+ InterleaveLoop = false;
+ } else if (!LVP.hasPlanWithVF(VF.Width) && SafeUserIC > 1) {
// Tell the user interleaving was avoided up-front, despite being explicitly
// requested.
LLVM_DEBUG(dbgs() << "LV: Ignoring UserIC, because vectorization and "
@@ -10209,7 +10219,7 @@ bool LoopVectorizePass::processLoop(Loop *L) {
IntDiagMsg = {"InterleavingAvoided",
"Ignoring UserIC, because interleaving was avoided up front"};
InterleaveLoop = false;
- } else if (IC == 1 && UserIC <= 1) {
+ } else if (IC == 1 && SafeUserIC <= 1) {
// Tell the user interleaving is not beneficial.
LLVM_DEBUG(dbgs() << "LV: Interleaving is not beneficial.\n");
IntDiagMsg = {
@@ -10221,7 +10231,7 @@ bool LoopVectorizePass::processLoop(Loop *L) {
IntDiagMsg.second +=
" and is explicitly disabled or interleave count is set to 1";
}
- } else if (IC > 1 && UserIC == 1) {
+ } else if (IC > 1 && SafeUserIC == 1) {
// Tell the user interleaving is beneficial, but it explicitly disabled.
LLVM_DEBUG(
dbgs() << "LV: Interleaving is beneficial but is explicitly disabled.");
@@ -10245,7 +10255,7 @@ bool LoopVectorizePass::processLoop(Loop *L) {
}
// Override IC if user provided an interleave count.
- IC = UserIC > 0 ? UserIC : IC;
+ IC = SafeUserIC > 0 ? SafeUserIC : IC;
// Emit diagnostic messages, if any.
const char *VAPassName = Hints.vectorizeAnalysisPassName();
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll b/llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll
index 11cc971586773..f1fc78f117fba 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/scalable-reductions.ll
@@ -417,21 +417,16 @@ for.end: ; preds = %for.body, %entry
; Note: This test was added to ensure we always check the legality of reductions (end emit a warning if necessary) before checking for memory dependencies
; CHECK-REMARK: Scalable vectorization not supported for the reduction operations found in this loop.
-; CHECK-REMARK: vectorized loop (vectorization width: 4, interleaved count: 2)
+; CHECK-REMARK: vectorized loop (vectorization width: 4, interleaved count: 1)
define i32 @memory_dependence(ptr noalias nocapture %a, ptr noalias nocapture readonly %b, i64 %n) {
; CHECK-LABEL: @memory_dependence
; CHECK: vector.body:
; CHECK: %[[LOAD1:.*]] = load <4 x i32>
; CHECK: %[[LOAD2:.*]] = load <4 x i32>
-; CHECK: %[[LOAD3:.*]] = load <4 x i32>
-; CHECK: %[[LOAD4:.*]] = load <4 x i32>
-; CHECK: %[[ADD1:.*]] = add nsw <4 x i32> %[[LOAD3]], %[[LOAD1]]
-; CHECK: %[[ADD2:.*]] = add nsw <4 x i32> %[[LOAD4]], %[[LOAD2]]
-; CHECK: %[[MUL1:.*]] = mul <4 x i32> %[[LOAD3]]
-; CHECK: %[[MUL2:.*]] = mul <4 x i32> %[[LOAD4]]
+; CHECK: %[[ADD1:.*]] = add nsw <4 x i32> %[[LOAD2]], %[[LOAD1]]
+; CHECK: %[[MUL1:.*]] = mul <4 x i32> %[[LOAD2]]
; CHECK: middle.block:
-; CHECK: %[[RDX:.*]] = mul <4 x i32> %[[MUL2]], %[[MUL1]]
-; CHECK: call i32 @llvm.vector.reduce.mul.v4i32(<4 x i32> %[[RDX]])
+; CHECK: call i32 @llvm.vector.reduce.mul.v4i32(<4 x i32> %[[MUL1]])
entry:
br label %for.body
diff --git a/llvm/test/Transforms/LoopVectorize/unsafe-ic-hint-remark.ll b/llvm/test/Transforms/LoopVectorize/unsafe-ic-hint-remark.ll
new file mode 100644
index 0000000000000..034df3f54e7e5
--- /dev/null
+++ b/llvm/test/Transforms/LoopVectorize/unsafe-ic-hint-remark.ll
@@ -0,0 +1,33 @@
+; REQUIRES: asserts
+; RUN: opt -passes=loop-vectorize -pass-remarks-analysis=loop-vectorize -debug-only=loop-vectorize -S < %s 2>&1 | FileCheck %s
+
+; Make sure the unsafe user specified interleave count is ignored.
+
+; CHECK: LV: Disabling interleaving as user-specified interleave count is unsafe.
+; CHECK: remark: <unknown>:0:0: User-specified interleave count is not safe, interleave count is set to 1.
+; CHECK-LABEL: @loop_distance_4
+define void @loop_distance_4(i64 %N, ptr %a, ptr %b) {
+entry:
+ %cmp10 = icmp sgt i64 %N, 4
+ br i1 %cmp10, label %for.body, label %for.end
+
+for.body:
+ %indvars.iv = phi i64 [ 4, %entry ], [ %indvars.iv.next, %for.body ]
+ %0 = getelementptr i32, ptr %b, i64 %indvars.iv
+ %arrayidx = getelementptr i8, ptr %0, i64 -16
+ %1 = load i32, ptr %arrayidx, align 4
+ %arrayidx2 = getelementptr inbounds nuw i32, ptr %a, i64 %indvars.iv
+ %2 = load i32, ptr %arrayidx2, align 4
+ %add = add nsw i32 %2, %1
+ store i32 %add, ptr %0, align 4
+ %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+ %exitcond.not = icmp eq i64 %indvars.iv.next, %N
+ br i1 %exitcond.not, label %for.end, label %for.body, !llvm.loop !1
+
+for.end:
+ ret void
+}
+
+!1 = !{!1, !2, !3}
+!2 = !{!"llvm.loop.interleave.count", i32 4}
+!3 = !{!"llvm.loop.vectorize.width", i32 4}
|
This PR fixes an issue I found whilst testing #147535 with LNT. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice fix!
if (UserIC > 0 && UserIC != SafeUserIC) { | ||
LLVM_DEBUG(dbgs() << "LV: Disabling interleaving as user-specified " | ||
"interleave count is unsafe.\n"); | ||
IntDiagMsg = {"InterleavingUnsafe", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is strictly true because on the line above you've set SafeUserIC to 0, and then we have unsigned SelectedIC = std::max(IC, SafeUserIC);
It's possible that selectInterleaveCount
may set IC to > 1 in future based on the maximum safe distance. It might be better to say something more vague like "Ignoring UserIC due to possibly unsafe dependencies in the loop".
@@ -417,21 +417,16 @@ for.end: ; preds = %for.body, %entry | |||
|
|||
; Note: This test was added to ensure we always check the legality of reductions (end emit a warning if necessary) before checking for memory dependencies | |||
; CHECK-REMARK: Scalable vectorization not supported for the reduction operations found in this loop. | |||
; CHECK-REMARK: vectorized loop (vectorization width: 4, interleaved count: 2) | |||
; CHECK-REMARK: vectorized loop (vectorization width: 4, interleaved count: 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth adding a check for the remark we emit about ignoring the UserIC?
@@ -0,0 +1,33 @@ | |||
; REQUIRES: asserts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you're testing the remarks anyway I think you can remove the ; REQUIRES: asserts
and don't bother testing the debug output.
- Removed need for asserts in new test
@@ -10138,8 +10138,10 @@ bool LoopVectorizePass::processLoop(Loop *L) { | |||
ElementCount UserVF = Hints.getWidth(); | |||
unsigned UserIC = Hints.getInterleave(); | |||
|
|||
unsigned SafeUserIC = CM.Legal->isSafeForAnyVectorWidth() ? UserIC : 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better to move dealing with UserIC to selectInterleaveCount
, where you can re-order the checks a bit so we first perform the legality check and otherwise return the UserIC, if provided.
That way we don't have to duplicate the logic here to determine if an interleave count is safe, and don't have to worry about keeping them in sync
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @fhahn, I've moved the decision on whether to use the UserIC into selectInterleaveCount as suggested, but kept the diagnostics in processLoop.
|
||
unsigned SelectedIC = std::max(IC, UserIC); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fhahn I think this code looks wrong and is making life difficult for @kmclaughlin-arm here. If the user has asked for an interleave count of 2, but we create checks based on the selected interleave count of 4, then either it's just functionally incorrect, or it's leading to a pessimistic cost model where we may never enter the vector loop.
if (LVP.hasPlanWithVF(VF.Width)) { | ||
// Select the interleave count. | ||
IC = LVP.selectInterleaveCount(LVP.getPlanFor(VF.Width), VF.Width, VF.Cost); | ||
IC = LVP.selectInterleaveCount(LVP.getPlanFor(VF.Width), VF.Width, UserIC, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fhahn by fixing the UserIC in selectInterleaveCount we have reintroduced a bug versus the original version of the patch. If you look above we pass in the original UserIC to LVP.plan, which is then used by computeMaxVF to determine if there is no scalar epilogue and to decide the tail-folding style. This was based on the assumption that UserIC will not change. I think we either:
- Have to go back to the original version of this PR, or
- The legality aspects of selectInterleaveCount need splitting out into a new function that is called before LVP.plan and the UserIC modified early on.
When an VF is specified via a loop hint, it will be clamped to a safe
VF or ignored if it is found to be unsafe. This is not the case for
user-specified interleave counts, which can lead to loops such as
the following with a memory dependence being vectorised with
interleaving:
According to [1], loop hints are ignored if they are not safe to apply.
This patch adds a check to prevent vectorisation with interleaving if
isSafeForAnyVectorWidth() returns false. This is already checked in
selectInterleaveCount().
[1] https://llvm.org/docs/LangRef.html#llvm-loop-vectorize-and-llvm-loop-interleave