MLIR 23.0.0git
ACCImplicitDeclare.cpp
//===- ACCImplicitDeclare.cpp ---------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This pass applies implicit `acc declare` actions to global variables
// referenced in OpenACC compute regions and routine functions.
//
// Overview:
// ---------
// Global references in acc regions (for globals not marked with `acc
// declare` by the user) can be handled in one of two ways:
//   - Mapped through data clauses
//   - Implicitly marked as `acc declare` (this pass)
//
// The OpenACC specification focuses solely on the implicit data mapping
// rules, whose implementation is captured in the `ACCImplicitData` pass.
//
// However, for certain cases it is both advantageous and required to use
// implicit `acc declare` instead:
//   - Functions that are implicitly marked as `acc routine` through
//     `ACCImplicitRoutine` may reference globals. Since data mapping is only
//     possible for compute regions, such globals can only be made available
//     on the device through `acc declare`.
//   - The compiler can generate and use globals for purposes of the IR
//     representation, such as type descriptors or the various names needed
//     for runtime calls and error reporting. Such globals are often
//     introduced after frontend semantic checking, because they are an
//     implementation detail. Thus, they were never visible to the user to
//     mark with `acc declare`.
//   - Constant globals, such as filename strings or data initialization
//     values, are never mutated but are still needed for correct runtime
//     execution. If a kernel is launched 1000 times, mapping such a global
//     1000 times is wasteful. Therefore, such globals benefit from being
//     marked with `acc declare`.
//
// This pass automatically marks global variables with the `acc.declare`
// attribute when they are referenced in OpenACC compute constructs or routine
// functions and meet the criteria noted above. This ensures they are properly
// handled for device execution.
//
// The pass performs two main optimizations:
//
// 1. Hoisting: For non-constant globals referenced in compute regions, the
//    pass hoists the address-of operation out of the region when possible.
//    This allows the globals to be implicitly mapped through the normal data
//    clause mechanisms rather than requiring declare marking.
//
// 2. Declaration: For globals that must be available on the device
//    (constants, globals in routines, globals in recipe operations), the pass
//    adds the `acc.declare` attribute with the copyin data clause.
//
// Requirements:
// -------------
// To use this pass in a pipeline, the following requirements must be met:
//
// 1. Operation Interface Implementation: Operations that compute addresses
//    of global variables must implement the `acc::AddressOfGlobalOpInterface`.
//    Operations that represent globals must implement the
//    `acc::GlobalVariableOpInterface`. Additionally, any operations that
//    indirectly access globals must implement the
//    `acc::IndirectGlobalAccessOpInterface`.
//
// 2. Analysis Registration (Optional): If custom behavior is needed for
//    determining whether a symbol use is valid within GPU regions, the
//    dialect should pre-register the `acc::OpenACCSupport` analysis.
//
// Examples:
// ---------
//
// Example 1: Non-constant global in compute region (hoisted)
//
// Before:
//   memref.global @g_scalar : memref<f32> = dense<0.0>
//   func.func @test() {
//     acc.serial {
//       %addr = memref.get_global @g_scalar : memref<f32>
//       %val = memref.load %addr[] : memref<f32>
//       acc.yield
//     }
//     return
//   }
//
// After:
//   memref.global @g_scalar : memref<f32> = dense<0.0>
//   func.func @test() {
//     %addr = memref.get_global @g_scalar : memref<f32>
//     acc.serial {
//       %val = memref.load %addr[] : memref<f32>
//       acc.yield
//     }
//     return
//   }
//
// Example 2: Constant global in compute region (declared)
//
// Before:
//   memref.global constant @g_const : memref<f32> = dense<1.0>
//   func.func @test() {
//     acc.serial {
//       %addr = memref.get_global @g_const : memref<f32>
//       %val = memref.load %addr[] : memref<f32>
//       acc.yield
//     }
//     return
//   }
//
// After:
//   memref.global constant @g_const : memref<f32> = dense<1.0>
//       {acc.declare = #acc.declare<dataClause = acc_copyin>}
//   func.func @test() {
//     acc.serial {
//       %addr = memref.get_global @g_const : memref<f32>
//       %val = memref.load %addr[] : memref<f32>
//       acc.yield
//     }
//     return
//   }
//
// Example 3: Global in acc routine (declared)
//
// Before:
//   memref.global @g_data : memref<f32> = dense<0.0>
//   acc.routine @routine_0 func(@device_func)
//   func.func @device_func() attributes {acc.routine_info = ...} {
//     %addr = memref.get_global @g_data : memref<f32>
//     %val = memref.load %addr[] : memref<f32>
//     return
//   }
//
// After:
//   memref.global @g_data : memref<f32> = dense<0.0>
//       {acc.declare = #acc.declare<dataClause = acc_copyin>}
//   acc.routine @routine_0 func(@device_func)
//   func.func @device_func() attributes {acc.routine_info = ...} {
//     %addr = memref.get_global @g_data : memref<f32>
//     %val = memref.load %addr[] : memref<f32>
//     return
//   }
//
// Example 4: Global in private recipe (declared if recipe is used)
//
// Before:
//   memref.global @g_init : memref<f32> = dense<0.0>
//   acc.private.recipe @priv_recipe : memref<f32> init {
//   ^bb0(%arg0: memref<f32>):
//     %alloc = memref.alloc() : memref<f32>
//     %global = memref.get_global @g_init : memref<f32>
//     %val = memref.load %global[] : memref<f32>
//     memref.store %val, %alloc[] : memref<f32>
//     acc.yield %alloc : memref<f32>
//   } destroy { ... }
//   func.func @test() {
//     %var = memref.alloc() : memref<f32>
//     %priv = acc.private varPtr(%var : memref<f32>)
//         recipe(@priv_recipe) -> memref<f32>
//     acc.parallel private(%priv : memref<f32>) { ... }
//     return
//   }
//
// After:
//   memref.global @g_init : memref<f32> = dense<0.0>
//       {acc.declare = #acc.declare<dataClause = acc_copyin>}
//   acc.private.recipe @priv_recipe : memref<f32> init {
//   ^bb0(%arg0: memref<f32>):
//     %alloc = memref.alloc() : memref<f32>
//     %global = memref.get_global @g_init : memref<f32>
//     %val = memref.load %global[] : memref<f32>
//     memref.store %val, %alloc[] : memref<f32>
//     acc.yield %alloc : memref<f32>
//   } destroy { ... }
//   func.func @test() {
//     %var = memref.alloc() : memref<f32>
//     %priv = acc.private varPtr(%var : memref<f32>)
//         recipe(@priv_recipe) -> memref<f32>
//     acc.parallel private(%priv : memref<f32>) { ... }
//     return
//   }
//
//===----------------------------------------------------------------------===//

#include "mlir/IR/Builders.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/Operation.h"
#include "mlir/IR/Value.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/TypeSwitch.h"

namespace mlir {
namespace acc {
#define GEN_PASS_DEF_ACCIMPLICITDECLARE
#include "mlir/Dialect/OpenACC/Transforms/Passes.h.inc"
} // namespace acc
} // namespace mlir

#define DEBUG_TYPE "acc-implicit-declare"

using namespace mlir;

namespace {

using GlobalOpSetT = llvm::SmallSetVector<Operation *, 16>;

/// Checks whether a use of the requested `globalOp` should be considered
/// for hoisting out of the acc region, to avoid `acc declare`ing something
/// that should instead be implicitly mapped.
static bool isGlobalUseCandidateForHoisting(Operation *globalOp,
                                            Operation *user,
                                            SymbolRefAttr symbol,
                                            acc::OpenACCSupport &accSupport) {
  // This symbol is valid in a GPU region. This means semantics would change
  // if it were moved to the host; therefore it is not a candidate.
  if (accSupport.isValidSymbolUse(user, symbol))
    return false;

  bool isConstant = false;
  bool isFunction = false;

  if (auto globalVarOp = dyn_cast<acc::GlobalVariableOpInterface>(globalOp))
    isConstant = globalVarOp.isConstant();

  if (isa<FunctionOpInterface>(globalOp))
    isFunction = true;

  // Constants should be kept in device code to ensure they are duplicated.
  // Function references should be kept in device code to ensure their device
  // addresses are computed. Everything else should be hoisted since we already
  // proved they are not valid symbols in the GPU region.
  return !isConstant && !isFunction;
}

/// Checks whether it is valid to use acc.declare marking on the global.
bool isValidForAccDeclare(Operation *globalOp) {
  // For functions, we use acc.routine marking instead.
  return !isa<FunctionOpInterface>(globalOp);
}

/// Checks whether a recipe operation has meaningful use of its symbol that
/// justifies processing its regions for global references. Returns false if:
///   1. The recipe has no symbol uses at all, or
///   2. The only symbol use is the recipe's own symbol definition.
template <typename RecipeOpT>
static bool hasRelevantRecipeUse(RecipeOpT &recipeOp, ModuleOp &mod) {
  std::optional<SymbolTable::UseRange> symbolUses = recipeOp.getSymbolUses(mod);

  // No recipe symbol uses.
  if (!symbolUses.has_value() || symbolUses->empty())
    return false;

  // If more than one use, assume it's used.
  auto begin = symbolUses->begin();
  auto end = symbolUses->end();
  if (begin != end && std::next(begin) != end)
    return true;

  // If single use, check if the use is the recipe itself.
  const SymbolTable::SymbolUse &use = *symbolUses->begin();
  return use.getUser() != recipeOp.getOperation();
}

// Hoists addr_of operations for non-constant globals out of OpenACC regions.
// This way, they are implicitly mapped instead of being considered for
// implicit declare.
template <typename AccConstructT>
static void hoistNonConstantDirectUses(AccConstructT accOp,
                                       acc::OpenACCSupport &accSupport) {
  accOp.walk([&](acc::AddressOfGlobalOpInterface addrOfOp) {
    SymbolRefAttr symRef = addrOfOp.getSymbol();
    if (symRef) {
      Operation *globalOp =
          SymbolTable::lookupNearestSymbolFrom(addrOfOp, symRef);
      if (isGlobalUseCandidateForHoisting(globalOp, addrOfOp, symRef,
                                          accSupport)) {
        auto computeRegionParent =
            addrOfOp->getParentOfType<acc::ComputeRegionOp>();
        addrOfOp->moveBefore(accOp);
        if (computeRegionParent)
          for (Value v : addrOfOp->getResults())
            computeRegionParent.wireHoistedValueThroughIns(v);
        LLVM_DEBUG(
            llvm::dbgs() << "Hoisted:\n\t" << addrOfOp << "\n\tfrom:\n\t";
            accOp->print(llvm::dbgs(),
                         OpPrintingFlags{}.skipRegions().enableDebugInfo());
            llvm::dbgs() << "\n");
      }
    }
  });
}

// Collects the globals referenced in a device region.
static void collectGlobalsFromDeviceRegion(Region &region,
                                           GlobalOpSetT &globals,
                                           acc::OpenACCSupport &accSupport,
                                           SymbolTable &symTab) {
  region.walk([&](Operation *op) {
    // 1) Only consider relevant operations which use symbols.
    auto addrOfOp = dyn_cast<acc::AddressOfGlobalOpInterface>(op);
    if (addrOfOp) {
      SymbolRefAttr symRef = addrOfOp.getSymbol();
      // 2) Found an operation which uses the symbol. Next, determine if it
      // is a candidate for `acc declare`. One of the criteria considered
      // is whether this symbol is not already a device one (either because
      // acc declare is already used or this is a CUF global).
      Operation *globalOp = nullptr;
      bool isCandidate = !accSupport.isValidSymbolUse(op, symRef, &globalOp);
      // 3) Add the candidate to the set of globals to be `acc declare`d.
      if (isCandidate && globalOp && isValidForAccDeclare(globalOp))
        globals.insert(globalOp);
    } else if (auto indirectAccessOp =
                   dyn_cast<acc::IndirectGlobalAccessOpInterface>(op)) {
      // Process operations that indirectly access globals.
      SmallVector<SymbolRefAttr> symbols;
      indirectAccessOp.getReferencedSymbols(symbols, &symTab);
      for (SymbolRefAttr symRef : symbols)
        if (Operation *globalOp = symTab.lookup(symRef.getLeafReference()))
          if (isValidForAccDeclare(globalOp))
            globals.insert(globalOp);
    }
  });
}

// Adds the declare attribute to the operation `op`.
static void addDeclareAttr(MLIRContext *context, Operation *op,
                           acc::DataClause clause) {
  op->setAttr(acc::getDeclareAttrName(),
              acc::DeclareAttr::get(context,
                                    acc::DataClauseAttr::get(context, clause)));
}

// This pass applies implicit declare actions for globals referenced in
// OpenACC compute and routine regions.
class ACCImplicitDeclare
    : public acc::impl::ACCImplicitDeclareBase<ACCImplicitDeclare> {
public:
  using ACCImplicitDeclareBase<ACCImplicitDeclare>::ACCImplicitDeclareBase;

  void runOnOperation() override {
    ModuleOp mod = getOperation();
    MLIRContext *context = &getContext();
    acc::OpenACCSupport &accSupport = getAnalysis<acc::OpenACCSupport>();

    // 1) Start off by hoisting any AddressOf operations out of acc regions
    // for any cases we do not want to `acc declare`. This is because we can
    // rely on implicit data mapping in the majority of cases without
    // needlessly polluting the set of device globals.
    mod.walk([&](Operation *op) {
      llvm::TypeSwitch<Operation *, void>(op)
          .Case<ACC_COMPUTE_CONSTRUCT_OPS, acc::ComputeRegionOp>(
              [&](auto accOp) {
                hoistNonConstantDirectUses(accOp, accSupport);
              });
    });

    // 2) Collect global symbols which need to be `acc declare`d. Do it for
    // compute regions, acc routines, recipes that are in use, and existing
    // globals with the declare attribute.
    SymbolTable symTab(mod);
    GlobalOpSetT globalsToAccDeclare;
    mod.walk([&](Operation *op) {
      llvm::TypeSwitch<Operation *, void>(op)
          .Case<ACC_COMPUTE_CONSTRUCT_OPS, acc::ComputeRegionOp>(
              [&](auto accOp) {
                collectGlobalsFromDeviceRegion(
                    accOp.getRegion(), globalsToAccDeclare, accSupport, symTab);
              })
          .Case([&](FunctionOpInterface func) {
            if ((acc::isAccRoutine(func) ||
                 acc::isSpecializedAccRoutine(func)) &&
                !func.isExternal())
              collectGlobalsFromDeviceRegion(func.getFunctionBody(),
                                             globalsToAccDeclare, accSupport,
                                             symTab);
          })
          .Case([&](acc::GlobalVariableOpInterface globalVarOp) {
            if (globalVarOp->getAttr(acc::getDeclareAttrName()))
              if (Region *initRegion = globalVarOp.getInitRegion())
                collectGlobalsFromDeviceRegion(*initRegion, globalsToAccDeclare,
                                               accSupport, symTab);
          })
          .Case([&](acc::PrivateRecipeOp privateRecipe) {
            if (hasRelevantRecipeUse(privateRecipe, mod)) {
              collectGlobalsFromDeviceRegion(privateRecipe.getInitRegion(),
                                             globalsToAccDeclare, accSupport,
                                             symTab);
              collectGlobalsFromDeviceRegion(privateRecipe.getDestroyRegion(),
                                             globalsToAccDeclare, accSupport,
                                             symTab);
            }
          })
          .Case([&](acc::FirstprivateRecipeOp firstprivateRecipe) {
            if (hasRelevantRecipeUse(firstprivateRecipe, mod)) {
              collectGlobalsFromDeviceRegion(firstprivateRecipe.getInitRegion(),
                                             globalsToAccDeclare, accSupport,
                                             symTab);
              collectGlobalsFromDeviceRegion(
                  firstprivateRecipe.getDestroyRegion(), globalsToAccDeclare,
                  accSupport, symTab);
              collectGlobalsFromDeviceRegion(firstprivateRecipe.getCopyRegion(),
                                             globalsToAccDeclare, accSupport,
                                             symTab);
            }
          })
          .Case([&](acc::ReductionRecipeOp reductionRecipe) {
            if (hasRelevantRecipeUse(reductionRecipe, mod)) {
              collectGlobalsFromDeviceRegion(reductionRecipe.getInitRegion(),
                                             globalsToAccDeclare, accSupport,
                                             symTab);
              collectGlobalsFromDeviceRegion(
                  reductionRecipe.getCombinerRegion(), globalsToAccDeclare,
                  accSupport, symTab);
            }
          });
    });

    // 3) Finally, generate the declare actions needed to ensure each
    // collected global is treated as a device global.
    for (Operation *globalOp : globalsToAccDeclare) {
      LLVM_DEBUG(
          llvm::dbgs() << "Global is being `acc declare copyin`d: ";
          globalOp->print(llvm::dbgs(),
                          OpPrintingFlags{}.skipRegions().enableDebugInfo());
          llvm::dbgs() << "\n");

      // Mark it as declare copyin.
      addDeclareAttr(context, globalOp, acc::DataClause::acc_copyin);

      // TODO: May need to create the global constructor which does the mapping
      // action. It is not yet clear whether this is needed (since the globals
      // might just end up in the GPU image without requiring mapping via the
      // runtime).
    }
  }
};

} // namespace
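The hoisting decision in `isGlobalUseCandidateForHoisting` and the eligibility check in `isValidForAccDeclare` come down to a few boolean conditions. The sketch below models that logic in plain C++ with no MLIR dependency. `GlobalInfo`, `isHoistCandidate` and `canAccDeclare` are hypothetical stand-ins for the interface queries, not part of the pass. Those queries are `acc::GlobalVariableOpInterface::isConstant`, `isa<FunctionOpInterface>` and `OpenACCSupport::isValidSymbolUse`.

```cpp
#include <cassert>

// Hypothetical, simplified view of the symbol referenced by an address-of
// operation. This is not an MLIR type.
struct GlobalInfo {
  bool isConstant; // e.g. a `memref.global constant`
  bool isFunction; // the symbol names a function rather than data
};

// Models isGlobalUseCandidateForHoisting(). `validInDeviceRegion` stands in
// for the result of OpenACCSupport::isValidSymbolUse().
bool isHoistCandidate(const GlobalInfo &g, bool validInDeviceRegion) {
  // Already usable on the device: hoisting to the host would change semantics.
  if (validInDeviceRegion)
    return false;
  // Constants and function references stay in device code; everything else
  // is hoisted so that it can be implicitly mapped.
  return !g.isConstant && !g.isFunction;
}

// Models isValidForAccDeclare(): functions receive `acc routine`, not
// `acc declare`.
bool canAccDeclare(const GlobalInfo &g) { return !g.isFunction; }
```

Under these rules, `@g_scalar` in Example 1 is hoisted. `@g_const` in Example 2 stays inside the region, so it is later collected and marked `acc declare`.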
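The pass collects candidates into `GlobalOpSetT`, an `llvm::SmallSetVector`. This has two effects:
- A global referenced from several regions is recorded only once.
- The final marking loop visits globals in first-seen order, which keeps the output deterministic.

The sketch below models that collect-then-mark flow with the standard library only. It assumes globals are identified by name and attributes are a string map. `OrderedSet` and `markDeclareCopyin` are hypothetical names, not LLVM or MLIR APIs.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for llvm::SmallSetVector: remembers first-insertion
// order and ignores duplicates.
class OrderedSet {
public:
  // Returns true if `name` was newly inserted.
  bool insert(const std::string &name) {
    if (!seen.insert(name).second)
      return false;
    order.push_back(name);
    return true;
  }
  const std::vector<std::string> &items() const { return order; }

private:
  std::set<std::string> seen;
  std::vector<std::string> order;
};

// Models step 3 of runOnOperation(): every collected global gets a declare
// marker with the copyin data clause. The string value only imitates the
// printed attribute.
void markDeclareCopyin(const OrderedSet &globals,
                       std::map<std::string, std::string> &attrs) {
  for (const std::string &name : globals.items())
    attrs[name] = "#acc.declare<dataClause = acc_copyin>";
}
```

Pairing a set with a vector gives fast membership checks plus a stable iteration order. `llvm::SetVector` provides the same pair of properties.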