-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[OpenACC][CIR] Add lowering for 'copy' array indexes #140971
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
The array indexes(and sections) are represented by the acc.bounds operation, which this ensures we fill in properly. The lowerbound is required, so we always get that. The upperbound or extent is required. We typically do extent, since that is the 'length' as specified by ACC, but in cases where we have implicit length, we use the extent instead. It isn't clear when 'stride' should be anything besides 1, though by my reading, since we have full-types in the emitted code, we should never have it be anything but 1. This patch enables these for copy on compute and combined constructs, and makes sure to test everything I could think of for combinations/permutations.
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clangir Author: Erich Keane (erichkeane) ChangesThe array indexes(and sections) are represented by the acc.bounds operation, which this ensures we fill in properly. The lowerbound is required, so we always get that. The upperbound or extent is required. We typically do extent, since that is the 'length' as specified by ACC, but in cases where we have implicit length, we use the extent instead. It isn't clear when 'stride' should be anything besides 1, though by my reading, since we have full-types in the emitted code, we should never have it be anything but 1. This patch enables these for copy on compute and combined constructs, and makes sure to test everything I could think of for combinations/permutations. Patch is 88.28 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/140971.diff 3 Files Affected:
diff --git a/clang/lib/CIR/CodeGen/CIRGenOpenACCClause.cpp b/clang/lib/CIR/CodeGen/CIRGenOpenACCClause.cpp
index 91aa358638d5d..9e45ad8519e11 100644
--- a/clang/lib/CIR/CodeGen/CIRGenOpenACCClause.cpp
+++ b/clang/lib/CIR/CodeGen/CIRGenOpenACCClause.cpp
@@ -121,6 +121,11 @@ class OpenACCClauseCIREmitter final
return constOp.getResult();
}
+ mlir::Value createConstantInt(SourceLocation loc, unsigned width,
+ int64_t value) {
+ return createConstantInt(cgf.cgm.getLoc(loc), width, value);
+ }
+
mlir::acc::DeviceType decodeDeviceType(const IdentifierInfo *ii) {
// '*' case leaves no identifier-info, just a nullptr.
if (!ii)
@@ -184,37 +189,94 @@ class OpenACCClauseCIREmitter final
mlir::Location beginLoc;
mlir::Value varValue;
llvm::StringRef name;
+ llvm::SmallVector<mlir::Value> bounds;
};
+ mlir::Value createBound(mlir::Location boundLoc, mlir::Value lowerBound,
+ mlir::Value upperBound, mlir::Value extent) {
+ // Arrays always have a start-idx of 0.
+ mlir::Value startIdx = createConstantInt(boundLoc, 64, 0);
+ // TODO: OpenACC: It isn't clear that stride would ever be anything other
+ // than '1'? We emit the type of the reference 'correctly' as far as I
+ // know, so it should just be 1 element each time. We could perhaps use
+ // the 'inBytes' variant here, but it isn't clear what value that gets us.
+ // We might need to revisit this once we try to opt this and see what is
+ // going to happen.
+ mlir::Value stride = createConstantInt(boundLoc, 64, 1);
+
+ auto bound = builder.create<mlir::acc::DataBoundsOp>(boundLoc, lowerBound,
+ upperBound);
+ bound.getStartIdxMutable().assign(startIdx);
+ if (extent)
+ bound.getExtentMutable().assign(extent);
+ bound.getStrideMutable().assign(stride);
+
+ return bound;
+ }
+
// A helper function that gets the information from an operand to a data
// clause, so that it can be used to emit the data operations.
- inline DataOperandInfo getDataOperandInfo(OpenACCDirectiveKind dk,
- const Expr *e) {
+ DataOperandInfo getDataOperandInfo(OpenACCDirectiveKind dk, const Expr *e) {
// TODO: OpenACC: Cache was different enough as to need a separate
// `ActOnCacheVar`, so we are going to need to do some investigations here
// when it comes to implement this for cache.
if (dk == OpenACCDirectiveKind::Cache) {
cgf.cgm.errorNYI(e->getSourceRange(),
"OpenACC data operand for 'cache' directive");
- return {cgf.cgm.getLoc(e->getBeginLoc()), {}, {}};
+ return {cgf.cgm.getLoc(e->getBeginLoc()), {}, {}, {}};
}
const Expr *curVarExpr = e->IgnoreParenImpCasts();
mlir::Location exprLoc = cgf.cgm.getLoc(curVarExpr->getBeginLoc());
+ llvm::SmallVector<mlir::Value> bounds;
+
+ // Assemble the list of bounds.
+ while (isa<ArraySectionExpr, ArraySubscriptExpr>(curVarExpr)) {
+ mlir::Location boundLoc = cgf.cgm.getLoc(curVarExpr->getBeginLoc());
+ mlir::Value lowerBound;
+ mlir::Value upperBound;
+ mlir::Value extent;
+
+ if (const auto *section = dyn_cast<ArraySectionExpr>(curVarExpr)) {
+ if (const Expr *lb = section->getLowerBound())
+ lowerBound = emitIntExpr(lb);
+ else
+ lowerBound = createConstantInt(boundLoc, 64, 0);
+
+ if (const Expr *len = section->getLength()) {
+ extent = emitIntExpr(len);
+ } else {
+ QualType baseTy = ArraySectionExpr::getBaseOriginalType(
+ section->getBase()->IgnoreParenImpCasts());
+ // We know this is the case as implicit lengths are only allowed for
+ // array types with a constant size, or a dependent size. AND since
+ // we are codegen we know we're not dependent.
+ auto *arrayTy = cgf.getContext().getAsConstantArrayType(baseTy);
+ // Rather than trying to calculate the extent based on the
+ // lower-bound, we can just emit this as an upper bound.
+ upperBound =
+ createConstantInt(boundLoc, 64, arrayTy->getLimitedSize() - 1);
+ }
- // TODO: OpenACC: Assemble the list of bounds.
- if (isa<ArraySectionExpr, ArraySubscriptExpr>(curVarExpr)) {
- cgf.cgm.errorNYI(curVarExpr->getSourceRange(),
- "OpenACC data clause array subscript/section");
- return {exprLoc, {}, {}};
+ curVarExpr = section->getBase()->IgnoreParenImpCasts();
+ } else {
+ const auto *subscript = dyn_cast<ArraySubscriptExpr>(curVarExpr);
+
+ lowerBound = emitIntExpr(subscript->getIdx());
+ // Length of an array index is always 1.
+ extent = createConstantInt(boundLoc, 64, 1);
+ curVarExpr = subscript->getBase()->IgnoreParenImpCasts();
+ }
+
+ bounds.push_back(createBound(boundLoc, lowerBound, upperBound, extent));
}
// TODO: OpenACC: if this is a member expr, emit the VarPtrPtr correctly.
if (isa<MemberExpr>(curVarExpr)) {
cgf.cgm.errorNYI(curVarExpr->getSourceRange(),
"OpenACC Data clause member expr");
- return {exprLoc, {}, {}};
+ return {exprLoc, {}, {}, std::move(bounds)};
}
// Sema has made sure that only 4 types of things can get here, array
@@ -223,14 +285,14 @@ class OpenACCClauseCIREmitter final
// right.
const auto *dre = cast<DeclRefExpr>(curVarExpr);
const auto *vd = cast<VarDecl>(dre->getFoundDecl()->getCanonicalDecl());
- return {exprLoc, cgf.emitDeclRefLValue(dre).getPointer(), vd->getName()};
+ return {exprLoc, cgf.emitDeclRefLValue(dre).getPointer(), vd->getName(),
+ std::move(bounds)};
}
template <typename BeforeOpTy, typename AfterOpTy>
void addDataOperand(const Expr *varOperand, mlir::acc::DataClause dataClause,
bool structured, bool implicit) {
DataOperandInfo opInfo = getDataOperandInfo(dirKind, varOperand);
- mlir::ValueRange bounds;
// TODO: OpenACC: we should comprehend the 'modifier-list' here for the data
// operand. At the moment, we don't have a uniform way to assign these
@@ -239,7 +301,7 @@ class OpenACCClauseCIREmitter final
auto beforeOp =
builder.create<BeforeOpTy>(opInfo.beginLoc, opInfo.varValue, structured,
- implicit, opInfo.name, bounds);
+ implicit, opInfo.name, opInfo.bounds);
operation.getDataClauseOperandsMutable().append(beforeOp.getResult());
AfterOpTy afterOp;
@@ -248,7 +310,7 @@ class OpenACCClauseCIREmitter final
builder.setInsertionPointAfter(operation);
afterOp = builder.create<AfterOpTy>(opInfo.beginLoc, beforeOp.getResult(),
opInfo.varValue, structured, implicit,
- opInfo.name, bounds);
+ opInfo.name, opInfo.bounds);
}
// Set the 'rest' of the info for both operations.
diff --git a/clang/test/CIR/CodeGenOpenACC/combined-copy.c b/clang/test/CIR/CodeGenOpenACC/combined-copy.c
index 8df6b9fc03454..3ebca6578c8a9 100644
--- a/clang/test/CIR/CodeGenOpenACC/combined-copy.c
+++ b/clang/test/CIR/CodeGenOpenACC/combined-copy.c
@@ -12,6 +12,8 @@ void acc_compute(int parmVar) {
// CHECK-NEXT: %[[LOCAL3:.*]] = cir.alloca !cir.float, !cir.ptr<!cir.float>, ["localVar3"]
// CHECK-NEXT: %[[LOCALPTR:.*]] = cir.alloca !cir.ptr<!s16i>, !cir.ptr<!cir.ptr<!s16i>>, ["localPointer"]
// CHECK-NEXT: %[[LOCALARRAY:.*]] = cir.alloca !cir.array<!cir.float x 5>, !cir.ptr<!cir.array<!cir.float x 5>>, ["localArray"]
+ // CHECK-NEXT: %[[LOCALARRAYOFPTRS:.*]] = cir.alloca !cir.array<!cir.ptr<!cir.float> x 5>, !cir.ptr<!cir.array<!cir.ptr<!cir.float> x 5>>, ["localArrayOfPtrs"]
+ // CHECK-NEXT: %[[THREEDARRAY:.*]] = cir.alloca !cir.array<!cir.array<!cir.array<!cir.double x 7> x 6> x 5>, !cir.ptr<!cir.array<!cir.array<!cir.array<!cir.double x 7> x 6> x 5>>, ["threeDArray"]
// CHECK-NEXT: cir.store %[[ARG]], %[[PARM]] : !s32i, !cir.ptr<!s32i>
#pragma acc parallel loop copy(localVar1, parmVar) copy(localVar2) copy(localVar3, parmVar)
@@ -261,4 +263,503 @@ void acc_compute(int parmVar) {
// CHECK-NEXT: acc.yield
// CHECK-NEXT: } loc
// CHECK-NEXT: acc.copyout accPtr(%[[COPYIN1]] : !cir.ptr<!s32i>) async(%[[ZERO_CAST]] : si32, %[[ONE_CAST]] : si32 [#acc.device_type<nvidia>], %[[ONE_CAST]] : si32 [#acc.device_type<radeon>]) to varPtr(%[[LOCAL1]] : !cir.ptr<!s32i>) {dataClause = #acc<data_clause acc_copy>, name = "localVar1"} loc
+
+#pragma acc parallel loop copy(localArray[3])
+ for(int i = 0; i < 5; ++i);
+ // CHECK-NEXT: %[[THREE:.*]] = cir.const #cir.int<3> : !s32i
+ // CHECK-NEXT: %[[THREE_CAST:.*]] = builtin.unrealized_conversion_cast %[[THREE]] : !s32i to si32
+ // CHECK-NEXT: %[[ONE_CONST:.*]] = arith.constant 1 : i64
+ // CHECK-NEXT: %[[ZERO_CONST:.*]] = arith.constant 0 : i64
+ // CHECK-NEXT: %[[ONE_CONST2:.*]] = arith.constant 1 : i64
+ // CHECK-NEXT: %[[BOUNDS:.*]] = acc.bounds lowerbound(%[[THREE_CAST]] : si32) extent(%[[ONE_CONST]] : i64) stride(%[[ONE_CONST2]] : i64) startIdx(%[[ZERO_CONST]] : i64) loc
+ // CHECK-NEXT: %[[COPYIN1:.*]] = acc.copyin varPtr(%[[LOCALARRAY]] : !cir.ptr<!cir.array<!cir.float x 5>>) bounds(%[[BOUNDS]]) -> !cir.ptr<!cir.array<!cir.float x 5>> {dataClause = #acc<data_clause acc_copy>, name = "localArray"} loc
+ // CHECK-NEXT: acc.parallel combined(loop) dataOperands(%[[COPYIN1]] : !cir.ptr<!cir.array<!cir.float x 5>>) {
+ // CHECK-NEXT: acc.loop combined(parallel) {
+ // CHECK: acc.yield
+ // CHECK-NEXT: }
+ // CHECK-NEXT: acc.yield
+ // CHECK-NEXT: } loc
+ // CHECK-NEXT: acc.copyout accPtr(%[[COPYIN1]] : !cir.ptr<!cir.array<!cir.float x 5>>) bounds(%[[BOUNDS]]) to varPtr(%[[LOCALARRAY]] : !cir.ptr<!cir.array<!cir.float x 5>>) {dataClause = #acc<data_clause acc_copy>, name = "localArray"} loc
+
+#pragma acc serial loop copy(localArray[1:3])
+ for(int i = 0; i < 5; ++i);
+ // CHECK-NEXT: %[[ONE:.*]] = cir.const #cir.int<1> : !s32i
+ // CHECK-NEXT: %[[ONE_CAST:.*]] = builtin.unrealized_conversion_cast %[[ONE]] : !s32i to si32
+ // CHECK-NEXT: %[[THREE:.*]] = cir.const #cir.int<3> : !s32i
+ // CHECK-NEXT: %[[THREE_CAST:.*]] = builtin.unrealized_conversion_cast %[[THREE]] : !s32i to si32
+ // CHECK-NEXT: %[[ZERO_CONST:.*]] = arith.constant 0 : i64
+ // CHECK-NEXT: %[[ONE_CONST2:.*]] = arith.constant 1 : i64
+ // CHECK-NEXT: %[[BOUNDS:.*]] = acc.bounds lowerbound(%[[ONE_CAST]] : si32) extent(%[[THREE_CAST]] : si32) stride(%[[ONE_CONST2]] : i64) startIdx(%[[ZERO_CONST]] : i64) loc
+ // CHECK-NEXT: %[[COPYIN1:.*]] = acc.copyin varPtr(%[[LOCALARRAY]] : !cir.ptr<!cir.array<!cir.float x 5>>) bounds(%[[BOUNDS]]) -> !cir.ptr<!cir.array<!cir.float x 5>> {dataClause = #acc<data_clause acc_copy>, name = "localArray"} loc
+ // CHECK-NEXT: acc.serial combined(loop) dataOperands(%[[COPYIN1]] : !cir.ptr<!cir.array<!cir.float x 5>>) {
+ // CHECK-NEXT: acc.loop combined(serial) {
+ // CHECK: acc.yield
+ // CHECK-NEXT: }
+ // CHECK-NEXT: acc.yield
+ // CHECK-NEXT: } loc
+ // CHECK-NEXT: acc.copyout accPtr(%[[COPYIN1]] : !cir.ptr<!cir.array<!cir.float x 5>>) bounds(%[[BOUNDS]]) to varPtr(%[[LOCALARRAY]] : !cir.ptr<!cir.array<!cir.float x 5>>) {dataClause = #acc<data_clause acc_copy>, name = "localArray"} loc
+
+#pragma acc kernels loop copy(localArray[:3])
+ for(int i = 0; i < 5; ++i);
+ // CHECK-NEXT: %[[ZERO_CONST:.*]] = arith.constant 0 : i64
+ // CHECK-NEXT: %[[THREE:.*]] = cir.const #cir.int<3> : !s32i
+ // CHECK-NEXT: %[[THREE_CAST:.*]] = builtin.unrealized_conversion_cast %[[THREE]] : !s32i to si32
+ // CHECK-NEXT: %[[ZERO_CONST2:.*]] = arith.constant 0 : i64
+ // CHECK-NEXT: %[[ONE_CONST:.*]] = arith.constant 1 : i64
+ // CHECK-NEXT: %[[BOUNDS:.*]] = acc.bounds lowerbound(%[[ZERO_CONST]] : i64) extent(%[[THREE_CAST]] : si32) stride(%[[ONE_CONST]] : i64) startIdx(%[[ZERO_CONST2]] : i64) loc
+ // CHECK-NEXT: %[[COPYIN1:.*]] = acc.copyin varPtr(%[[LOCALARRAY]] : !cir.ptr<!cir.array<!cir.float x 5>>) bounds(%[[BOUNDS]]) -> !cir.ptr<!cir.array<!cir.float x 5>> {dataClause = #acc<data_clause acc_copy>, name = "localArray"} loc
+ // CHECK-NEXT: acc.kernels combined(loop) dataOperands(%[[COPYIN1]] : !cir.ptr<!cir.array<!cir.float x 5>>) {
+ // CHECK-NEXT: acc.loop combined(kernels) {
+ // CHECK: acc.yield
+ // CHECK-NEXT: }
+ // CHECK-NEXT: acc.terminator
+ // CHECK-NEXT: } loc
+ // CHECK-NEXT: acc.copyout accPtr(%[[COPYIN1]] : !cir.ptr<!cir.array<!cir.float x 5>>) bounds(%[[BOUNDS]]) to varPtr(%[[LOCALARRAY]] : !cir.ptr<!cir.array<!cir.float x 5>>) {dataClause = #acc<data_clause acc_copy>, name = "localArray"} loc
+
+#pragma acc parallel loop copy(localArray[1:])
+ for(int i = 0; i < 5; ++i);
+ // CHECK-NEXT: %[[ONE:.*]] = cir.const #cir.int<1> : !s32i
+ // CHECK-NEXT: %[[ONE_CAST:.*]] = builtin.unrealized_conversion_cast %[[ONE]] : !s32i to si32
+ // CHECK-NEXT: %[[FOUR_CONST:.*]] = arith.constant 4 : i64
+ // CHECK-NEXT: %[[ZERO_CONST:.*]] = arith.constant 0 : i64
+ // CHECK-NEXT: %[[ONE_CONST:.*]] = arith.constant 1 : i64
+ // CHECK-NEXT: %[[BOUNDS:.*]] = acc.bounds lowerbound(%[[ONE_CAST]] : si32) upperbound(%[[FOUR_CONST]] : i64) stride(%[[ONE_CONST]] : i64) startIdx(%[[ZERO_CONST]] : i64) loc
+ // CHECK-NEXT: %[[COPYIN1:.*]] = acc.copyin varPtr(%[[LOCALARRAY]] : !cir.ptr<!cir.array<!cir.float x 5>>) bounds(%[[BOUNDS]]) -> !cir.ptr<!cir.array<!cir.float x 5>> {dataClause = #acc<data_clause acc_copy>, name = "localArray"} loc
+ // CHECK-NEXT: acc.parallel combined(loop) dataOperands(%[[COPYIN1]] : !cir.ptr<!cir.array<!cir.float x 5>>) {
+ // CHECK-NEXT: acc.loop combined(parallel) {
+ // CHECK: acc.yield
+ // CHECK-NEXT: }
+ // CHECK-NEXT: acc.yield
+ // CHECK-NEXT: } loc
+ // CHECK-NEXT: acc.copyout accPtr(%[[COPYIN1]] : !cir.ptr<!cir.array<!cir.float x 5>>) bounds(%[[BOUNDS]]) to varPtr(%[[LOCALARRAY]] : !cir.ptr<!cir.array<!cir.float x 5>>) {dataClause = #acc<data_clause acc_copy>, name = "localArray"} loc
+
+#pragma acc serial loop copy(localArray[localVar1:localVar2])
+ for(int i = 0; i < 5; ++i);
+ // CHECK-NEXT: %[[LV1:.*]] = cir.load %[[LOCAL1]] : !cir.ptr<!s32i>, !s32i
+ // CHECK-NEXT: %[[LV1_CAST:.*]] = builtin.unrealized_conversion_cast %[[LV1]] : !s32i to si32
+ // CHECK-NEXT: %[[LV2:.*]] = cir.load %[[LOCAL2]] : !cir.ptr<!s16i>, !s16i
+ // CHECK-NEXT: %[[LV2_CAST:.*]] = builtin.unrealized_conversion_cast %[[LV2]] : !s16i to si16
+ // CHECK-NEXT: %[[ZERO_CONST:.*]] = arith.constant 0 : i64
+ // CHECK-NEXT: %[[ONE_CONST:.*]] = arith.constant 1 : i64
+ // CHECK-NEXT: %[[BOUNDS:.*]] = acc.bounds lowerbound(%[[LV1_CAST]] : si32) extent(%[[LV2_CAST]] : si16) stride(%[[ONE_CONST]] : i64) startIdx(%[[ZERO_CONST]] : i64) loc
+ // CHECK-NEXT: %[[COPYIN1:.*]] = acc.copyin varPtr(%[[LOCALARRAY]] : !cir.ptr<!cir.array<!cir.float x 5>>) bounds(%[[BOUNDS]]) -> !cir.ptr<!cir.array<!cir.float x 5>> {dataClause = #acc<data_clause acc_copy>, name = "localArray"} loc
+ // CHECK-NEXT: acc.serial combined(loop) dataOperands(%[[COPYIN1]] : !cir.ptr<!cir.array<!cir.float x 5>>) {
+ // CHECK-NEXT: acc.loop combined(serial) {
+ // CHECK: acc.yield
+ // CHECK-NEXT: }
+ // CHECK-NEXT: acc.yield
+ // CHECK-NEXT: } loc
+ // CHECK-NEXT: acc.copyout accPtr(%[[COPYIN1]] : !cir.ptr<!cir.array<!cir.float x 5>>) bounds(%[[BOUNDS]]) to varPtr(%[[LOCALARRAY]] : !cir.ptr<!cir.array<!cir.float x 5>>) {dataClause = #acc<data_clause acc_copy>, name = "localArray"} loc
+
+#pragma acc kernels loop copy(localArray[:localVar2])
+ for(int i = 0; i < 5; ++i);
+ // CHECK-NEXT: %[[ZERO_CONST:.*]] = arith.constant 0 : i64
+ // CHECK-NEXT: %[[LV2:.*]] = cir.load %[[LOCAL2]] : !cir.ptr<!s16i>, !s16i
+ // CHECK-NEXT: %[[LV2_CAST:.*]] = builtin.unrealized_conversion_cast %[[LV2]] : !s16i to si16
+ // CHECK-NEXT: %[[ZERO_CONST2:.*]] = arith.constant 0 : i64
+ // CHECK-NEXT: %[[ONE_CONST:.*]] = arith.constant 1 : i64
+ // CHECK-NEXT: %[[BOUNDS:.*]] = acc.bounds lowerbound(%[[ZERO_CONST]] : i64) extent(%[[LV2_CAST]] : si16) stride(%[[ONE_CONST]] : i64) startIdx(%[[ZERO_CONST2]] : i64) loc
+ // CHECK-NEXT: %[[COPYIN1:.*]] = acc.copyin varPtr(%[[LOCALARRAY]] : !cir.ptr<!cir.array<!cir.float x 5>>) bounds(%[[BOUNDS]]) -> !cir.ptr<!cir.array<!cir.float x 5>> {dataClause = #acc<data_clause acc_copy>, name = "localArray"} loc
+ // CHECK-NEXT: acc.kernels combined(loop) dataOperands(%[[COPYIN1]] : !cir.ptr<!cir.array<!cir.float x 5>>) {
+ // CHECK-NEXT: acc.loop combined(kernels) {
+ // CHECK: acc.yield
+ // CHECK-NEXT: }
+ // CHECK-NEXT: acc.terminator
+ // CHECK-NEXT: } loc
+ // CHECK-NEXT: acc.copyout accPtr(%[[COPYIN1]] : !cir.ptr<!cir.array<!cir.float x 5>>) bounds(%[[BOUNDS]]) to varPtr(%[[LOCALARRAY]] : !cir.ptr<!cir.array<!cir.float x 5>>) {dataClause = #acc<data_clause acc_copy>, name = "localArray"} loc
+
+#pragma acc parallel loop copy(localArray[localVar1:])
+ for(int i = 0; i < 5; ++i);
+ // CHECK-NEXT: %[[LV1:.*]] = cir.load %[[LOCAL1]] : !cir.ptr<!s32i>, !s32i
+ // CHECK-NEXT: %[[LV1_CAST:.*]] = builtin.unrealized_conversion_cast %[[LV1]] : !s32i to si32
+ // CHECK-NEXT: %[[FOUR_CONST:.*]] = arith.constant 4 : i64
+ // CHECK-NEXT: %[[ZERO_CONST:.*]] = arith.constant 0 : i64
+ // CHECK-NEXT: %[[ONE_CONST:.*]] = arith.constant 1 : i64
+ // CHECK-NEXT: %[[BOUNDS:.*]] = acc.bounds lowerbound(%[[LV1_CAST]] : si32) upperbound(%[[FOUR_CONST]] : i64) stride(%[[ONE_CONST]] : i64) startIdx(%[[ZERO_CONST]] : i64) loc
+ // CHECK-NEXT: %[[COPYIN1:.*]] = acc.copyin varPtr(%[[LOCALARRAY]] : !cir.ptr<!cir.array<!cir.float x 5>>) bounds(%[[BOUNDS]]) -> !cir.ptr<!cir.array<!cir.float x 5>> {dataClause = #acc<data_clause acc_copy>, name = "localArray"} loc
+ // CHECK-NEXT: acc.parallel combined(loop) dataOperands(%[[COPYIN1]] : !cir.ptr<!cir.array<!cir.float x 5>>) {
+ // CHECK-NEXT: acc.loop combined(parallel) {
+ // CHECK: acc.yield
+ // CHECK-NEXT: }
+ // CHECK-NEXT: acc.yield
+ // CHECK-NEXT: } loc
+ // CHECK-NEXT: acc.copyout accPtr(%[[COPYIN1]] : !cir.ptr<!cir.array<!cir.float x 5>>) bounds(%[[BOUNDS]]) to varPtr(%[[LOCALARRAY]] : !cir.ptr<!cir.array<!cir.float x 5>>) {dataClause = #acc<data_clause acc_copy>, name = "localArray"} loc
+
+#pragma acc serial loop copy(localPointer[3])
+ for(int i = 0; i < 5; ++i);
+ // CHECK-NEXT: %[[THREE:.*]] = cir.const #cir.int<3> : !s32i
+ // CHECK-NEXT: %[[THREE_CAST:.*]] = builtin.unrealized_conversion_cast %[[THREE]] : !s32i to si32
+ // CHECK-NEXT: %[[ONE_CONST:.*]] = arith.constant 1 : i64
+ // CHECK-NEXT: %[[ZERO_CONST:.*]] = arith.constant 0 : i64
+ // CHECK-NEXT: %[[ONE_CONST2:.*]] = arith.constant 1 : i64
+ // CHECK-NEXT: %[[BOUNDS:.*]] = acc.bounds lowerbound(%[[THREE_CAST]] : si32) extent(%[[ONE_CONST]] : i64) stride(%[[ONE_CONST2]] : i64) startIdx(%[[ZERO_CONST]] : i64) loc
+ // CHECK-NEXT: %[[COPYIN1:.*]] = acc.copyin varPtr(%[[LOCALPTR]] : !cir.ptr<!cir.ptr<!s16i>>) bounds(%[[BOUNDS]]) -> !cir.ptr<!cir.ptr<!s16i>> {dataClause = #acc<data_clause acc_copy>, name = "localPointer"} loc
+ // CHECK-NEXT: acc.serial combined(loop) dataOperands(%[[COPYIN1]] : !cir.ptr<!cir.ptr<!s16i>>) {
+ // CHECK-NEXT: acc.loop combined(serial) {
+ // CHECK: acc.yield
+ // CHECK-NEXT: }
+ // CHECK-NEXT: acc.yield
+ // CHECK-NEXT: } loc
+ // CHECK-NEXT: acc.copyout accPtr(%[[COPYIN1]] : !cir.ptr<!cir.ptr<!s16i>>) bounds(%[[BOUNDS]]) to varPtr(%[[LOCALPTR]] : !cir.ptr<!cir.ptr<!s16i>>) {dataClause = #acc<data_clause acc_copy>, name = "localPointer"} loc
+
+#pragma acc kernels loop copy(localPointer[1:3])
+ for(int i = 0; i < 5; ++i);
+ // CHECK-NEXT: %[[ONE:.*]] = cir.const #cir.int<1> : !s32i
+ // CHECK-NEXT: %[[ONE_CAST:.*]] = builtin.unrealized_conversion_cast %[[ONE]] : !s32i to si32
+ // CHECK-NEXT: %[[THREE:.*]] = cir.const #cir.int<3> : !s32i
+ // CHECK-NEXT: %[[THREE_CAST:.*]] = builtin.unrealized_conversion_cast %[[THREE]] : !s32i to si32
+ // CHECK-NEXT: %[[ZERO_CONST:.*]] = arith.constant 0 : i64
+ // CHECK-NEXT: %[[ONE_CONST2:.*]] = arith.constant 1 : i64
+ // CHECK-NEXT: %[[BOUNDS:.*]] = acc.bounds lowerbound(%[[ONE_CAST]] : si32) extent(%[[THREE_CAST]] : si32) stride(%[[ONE_CONST2]] : i64) startIdx(%[[ZERO_CONST]] : i64) loc
+ // CHECK-NEXT: %[[COPYIN1:.*]] = acc.copyin ...
[truncated]
|
The array indexes(and sections) are represented by the acc.bounds operation, which this ensures we fill in properly. The lowerbound is required, so we always get that.
The upperbound or extent is required. We typically do extent, since that is the 'length' as specified by ACC, but in cases where we have implicit length, we use the extent instead.
It isn't clear when 'stride' should be anything besides 1, though by my reading, since we have full-types in the emitted code, we should never have it be anything but 1.
This patch enables these for copy on compute and combined constructs, and makes sure to test everything I could think of for combinations/permutations.