MLIR 22.0.0git
ACCImplicitDeclare.cpp
//===- ACCImplicitDeclare.cpp ---------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This pass applies implicit `acc declare` actions to global variables
// referenced in OpenACC compute regions and routine functions.
//
// Overview:
// ---------
// Global references in acc regions (for globals not marked with `acc
// declare` by the user) can be handled in one of two ways:
// - Mapped through data clauses
// - Implicitly marked as `acc declare` (this pass)
//
// The OpenACC specification focuses solely on implicit data mapping rules,
// whose implementation is captured in the `ACCImplicitData` pass.
//
// However, it is both advantageous and required for certain cases to
// use implicit `acc declare` instead:
// - Any functions that are implicitly marked as `acc routine` through
//   `ACCImplicitRoutine` may reference globals. Since data mapping
//   is only possible for compute regions, such globals can only be
//   made available on device through `acc declare`.
// - The compiler can generate and use globals needed by the IR
//   representation, such as type descriptors or names needed for
//   runtime calls and error reporting. Such globals are often introduced
//   after frontend semantic checking is done, since they are an
//   implementation detail. Thus, such compiler-generated globals would
//   not have been visible for a user to mark with `acc declare`.
// - Constant globals such as filename strings or data initialization values
//   are values that do not get mutated but are still needed for appropriate
//   runtime execution. If a kernel is launched 1000 times, it is not a
//   good idea to map such a global 1000 times. Therefore, such globals
//   benefit from being marked with `acc declare`.
//
// This pass automatically marks global variables with the `acc.declare`
// attribute when they are referenced in OpenACC compute constructs or routine
// functions and meet the criteria noted above, ensuring they are properly
// handled for device execution.
//
// The pass performs two main optimizations:
//
// 1. Hoisting: For non-constant globals referenced in compute regions, the
//    pass hoists the address-of operation out of the region when possible,
//    allowing them to be implicitly mapped through normal data clause
//    mechanisms rather than requiring declare marking.
//
// 2. Declaration: For globals that must be available on the device (constants,
//    globals in routines, globals in recipe operations), the pass adds the
//    `acc.declare` attribute with the copyin data clause.
//
// Requirements:
// -------------
// To use this pass in a pipeline, the following requirements must be met:
//
// 1. Operation Interface Implementation: Operations that compute addresses
//    of global variables must implement the `acc::AddressOfGlobalOpInterface`
//    and those that represent globals must implement the
//    `acc::GlobalVariableOpInterface`. Additionally, any operations that
//    indirectly access globals must implement the
//    `acc::IndirectGlobalAccessOpInterface`.
//
// 2. Analysis Registration (Optional): If custom behavior is needed for
//    determining if a symbol use is valid within GPU regions, the dialect
//    should pre-register the `acc::OpenACCSupport` analysis.
//
// Examples:
// ---------
//
// Example 1: Non-constant global in compute region (hoisted)
//
// Before:
//   memref.global @g_scalar : memref<f32> = dense<0.0>
//   func.func @test() {
//     acc.serial {
//       %addr = memref.get_global @g_scalar : memref<f32>
//       %val = memref.load %addr[] : memref<f32>
//       acc.yield
//     }
//   }
//
// After:
//   memref.global @g_scalar : memref<f32> = dense<0.0>
//   func.func @test() {
//     %addr = memref.get_global @g_scalar : memref<f32>
//     acc.serial {
//       %val = memref.load %addr[] : memref<f32>
//       acc.yield
//     }
//   }
//
// Example 2: Constant global in compute region (declared)
//
// Before:
//   memref.global constant @g_const : memref<f32> = dense<1.0>
//   func.func @test() {
//     acc.serial {
//       %addr = memref.get_global @g_const : memref<f32>
//       %val = memref.load %addr[] : memref<f32>
//       acc.yield
//     }
//   }
//
// After:
//   memref.global constant @g_const : memref<f32> = dense<1.0>
//       {acc.declare = #acc.declare<dataClause = acc_copyin>}
//   func.func @test() {
//     acc.serial {
//       %addr = memref.get_global @g_const : memref<f32>
//       %val = memref.load %addr[] : memref<f32>
//       acc.yield
//     }
//   }
//
// Example 3: Global in acc routine (declared)
//
// Before:
//   memref.global @g_data : memref<f32> = dense<0.0>
//   acc.routine @routine_0 func(@device_func)
//   func.func @device_func() attributes {acc.routine_info = ...} {
//     %addr = memref.get_global @g_data : memref<f32>
//     %val = memref.load %addr[] : memref<f32>
//   }
//
// After:
//   memref.global @g_data : memref<f32> = dense<0.0>
//       {acc.declare = #acc.declare<dataClause = acc_copyin>}
//   acc.routine @routine_0 func(@device_func)
//   func.func @device_func() attributes {acc.routine_info = ...} {
//     %addr = memref.get_global @g_data : memref<f32>
//     %val = memref.load %addr[] : memref<f32>
//   }
//
// Example 4: Global in private recipe (declared if recipe is used)
//
// Before:
//   memref.global @g_init : memref<f32> = dense<0.0>
//   acc.private.recipe @priv_recipe : memref<f32> init {
//   ^bb0(%arg0: memref<f32>):
//     %alloc = memref.alloc() : memref<f32>
//     %global = memref.get_global @g_init : memref<f32>
//     %val = memref.load %global[] : memref<f32>
//     memref.store %val, %alloc[] : memref<f32>
//     acc.yield %alloc : memref<f32>
//   } destroy { ... }
//   func.func @test() {
//     %var = memref.alloc() : memref<f32>
//     %priv = acc.private varPtr(%var : memref<f32>)
//               recipe(@priv_recipe) -> memref<f32>
//     acc.parallel private(%priv : memref<f32>) { ... }
//   }
//
// After:
//   memref.global @g_init : memref<f32> = dense<0.0>
//       {acc.declare = #acc.declare<dataClause = acc_copyin>}
//   acc.private.recipe @priv_recipe : memref<f32> init {
//   ^bb0(%arg0: memref<f32>):
//     %alloc = memref.alloc() : memref<f32>
//     %global = memref.get_global @g_init : memref<f32>
//     %val = memref.load %global[] : memref<f32>
//     memref.store %val, %alloc[] : memref<f32>
//     acc.yield %alloc : memref<f32>
//   } destroy { ... }
//   func.func @test() {
//     %var = memref.alloc() : memref<f32>
//     %priv = acc.private varPtr(%var : memref<f32>)
//               recipe(@priv_recipe) -> memref<f32>
//     acc.parallel private(%priv : memref<f32>) { ... }
//   }
//
//===----------------------------------------------------------------------===//
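
#include "mlir/Dialect/OpenACC/Transforms/Passes.h"

#include "mlir/Dialect/OpenACC/Analysis/OpenACCSupport.h"
#include "mlir/Dialect/OpenACC/OpenACC.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/IR/Operation.h"
#include "mlir/IR/SymbolTable.h"
#include "mlir/IR/Value.h"
#include "mlir/Interfaces/FunctionInterfaces.h"
#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/TypeSwitch.h"
#include "llvm/Support/Debug.h"

namespace mlir {
namespace acc {
#define GEN_PASS_DEF_ACCIMPLICITDECLARE
#include "mlir/Dialect/OpenACC/Transforms/Passes.h.inc"
} // namespace acc
} // namespace mlir

#define DEBUG_TYPE "acc-implicit-declare"

using namespace mlir;

namespace {

using GlobalOpSetT = llvm::SmallSetVector<Operation *, 16>;
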
/// Checks whether a use of the requested `globalOp` should be considered
/// for hoisting out of an acc region, to avoid `acc declare`ing something
/// that should instead be implicitly mapped.
static bool isGlobalUseCandidateForHoisting(Operation *globalOp,
                                            Operation *user,
                                            SymbolRefAttr symbol,
                                            acc::OpenACCSupport &accSupport) {
  // This symbol is valid in a GPU region. This means semantics
  // would change if moved to host - therefore it is not a candidate.
  if (accSupport.isValidSymbolUse(user, symbol))
    return false;

  bool isConstant = false;
  bool isFunction = false;

  if (auto globalVarOp = dyn_cast<acc::GlobalVariableOpInterface>(globalOp))
    isConstant = globalVarOp.isConstant();

  if (isa<FunctionOpInterface>(globalOp))
    isFunction = true;

  // Constants should be kept in device code to ensure they are duplicated.
  // Function references should be kept in device code to ensure their device
  // addresses are computed. Everything else should be hoisted since we already
  // proved they are not valid symbols in a GPU region.
  return !isConstant && !isFunction;
}

/// Checks whether it is valid to use acc.declare marking on the global.
bool isValidForAccDeclare(Operation *globalOp) {
  // For functions - we use acc.routine marking instead.
  return !isa<FunctionOpInterface>(globalOp);
}

/// Checks whether a recipe operation has meaningful use of its symbol that
/// justifies processing its regions for global references. Returns false if:
/// 1. The recipe has no symbol uses at all, or
/// 2. The only symbol use is the recipe's own symbol definition
template <typename RecipeOpT>
static bool hasRelevantRecipeUse(RecipeOpT &recipeOp, ModuleOp &mod) {
  std::optional<SymbolTable::UseRange> symbolUses = recipeOp.getSymbolUses(mod);

  // No recipe symbol uses.
  if (!symbolUses.has_value() || symbolUses->empty())
    return false;

  // If more than one use, assume it's used.
  auto begin = symbolUses->begin();
  auto end = symbolUses->end();
  if (begin != end && std::next(begin) != end)
    return true;

  // If single use, check if the use is the recipe itself.
  const SymbolTable::SymbolUse &use = *symbolUses->begin();
  return use.getUser() != recipeOp.getOperation();
}

// Hoists address-of operations for non-constant globals out of OpenACC
// regions. This way, they are implicitly mapped instead of being considered
// for implicit declare.
template <typename AccConstructT>
static void hoistNonConstantDirectUses(AccConstructT accOp,
                                       acc::OpenACCSupport &accSupport) {
  accOp.walk([&](acc::AddressOfGlobalOpInterface addrOfOp) {
    SymbolRefAttr symRef = addrOfOp.getSymbol();
    if (symRef) {
      Operation *globalOp =
          SymbolTable::lookupNearestSymbolFrom(addrOfOp, symRef);
      // Skip unresolved symbols: the lookup returns null when no definition is
      // found, and the candidate check casts the operation it is given.
      if (globalOp && isGlobalUseCandidateForHoisting(globalOp, addrOfOp,
                                                      symRef, accSupport)) {
        addrOfOp->moveBefore(accOp);
        LLVM_DEBUG(
            llvm::dbgs() << "Hoisted:\n\t" << addrOfOp << "\n\tfrom:\n\t";
            accOp->print(llvm::dbgs(),
                         OpPrintingFlags{}.skipRegions().enableDebugInfo());
            llvm::dbgs() << "\n");
      }
    }
  });
}

// Collects the globals referenced in a device region.
static void collectGlobalsFromDeviceRegion(Region &region,
                                           GlobalOpSetT &globals,
                                           acc::OpenACCSupport &accSupport,
                                           SymbolTable &symTab) {
  region.walk([&](Operation *op) {
    // 1) Only consider relevant operations which use symbols.
    auto addrOfOp = dyn_cast<acc::AddressOfGlobalOpInterface>(op);
    if (addrOfOp) {
      SymbolRefAttr symRef = addrOfOp.getSymbol();
      // 2) Found an operation which uses the symbol. Next determine if it
      // is a candidate for `acc declare`. One criterion is that the symbol
      // is not already a device one (either because acc declare is already
      // used or this is a CUF global).
      Operation *globalOp = nullptr;
      bool isCandidate = !accSupport.isValidSymbolUse(op, symRef, &globalOp);
      // 3) Add the candidate to the set of globals to be `acc declare`d.
      if (isCandidate && globalOp && isValidForAccDeclare(globalOp))
        globals.insert(globalOp);
    } else if (auto indirectAccessOp =
                   dyn_cast<acc::IndirectGlobalAccessOpInterface>(op)) {
      // Process operations that indirectly access globals.
      llvm::SmallVector<SymbolRefAttr> symbols;
      indirectAccessOp.getReferencedSymbols(symbols, &symTab);
      for (SymbolRefAttr symRef : symbols)
        if (Operation *globalOp = symTab.lookup(symRef.getLeafReference()))
          if (isValidForAccDeclare(globalOp))
            globals.insert(globalOp);
    }
  });
}

// Adds the declare attribute to the operation `op`.
static void addDeclareAttr(MLIRContext *context, Operation *op,
                           acc::DataClause clause) {
  op->setAttr(acc::getDeclareAttrName(),
              acc::DeclareAttr::get(context,
                                    acc::DataClauseAttr::get(context, clause)));
}

// This pass applies implicit declare actions for globals referenced in
// OpenACC compute and routine regions.
class ACCImplicitDeclare
    : public acc::impl::ACCImplicitDeclareBase<ACCImplicitDeclare> {
public:
  using ACCImplicitDeclareBase<ACCImplicitDeclare>::ACCImplicitDeclareBase;

  void runOnOperation() override {
    ModuleOp mod = getOperation();
    MLIRContext *context = &getContext();
    acc::OpenACCSupport &accSupport = getAnalysis<acc::OpenACCSupport>();

    // 1) Start off by hoisting any AddressOf operations out of acc regions
    // for any cases we do not want to `acc declare`. This is because we can
    // rely on implicit data mapping in the majority of cases without
    // needlessly polluting the device globals.
    mod.walk([&](Operation *op) {
      llvm::TypeSwitch<Operation *>(op)
          .Case<ACC_COMPUTE_CONSTRUCT_OPS, acc::KernelEnvironmentOp>(
              [&](auto accOp) {
                hoistNonConstantDirectUses(accOp, accSupport);
              });
    });

    // 2) Collect global symbols which need to be `acc declare`d. Do it for
    // compute regions, acc routines, recipes that are in use, and the init
    // regions of globals that already carry the declare attribute.
    SymbolTable symTab(mod);
    GlobalOpSetT globalsToAccDeclare;
    mod.walk([&](Operation *op) {
      llvm::TypeSwitch<Operation *>(op)
          .Case<ACC_COMPUTE_CONSTRUCT_OPS, acc::KernelEnvironmentOp>(
              [&](auto accOp) {
                collectGlobalsFromDeviceRegion(
                    accOp.getRegion(), globalsToAccDeclare, accSupport, symTab);
              })
          .Case<FunctionOpInterface>([&](auto func) {
            if ((acc::isAccRoutine(func) ||
                 acc::isSpecializedAccRoutine(func)) &&
                !func.isExternal())
              collectGlobalsFromDeviceRegion(func.getFunctionBody(),
                                             globalsToAccDeclare, accSupport,
                                             symTab);
          })
          .Case<acc::GlobalVariableOpInterface>([&](auto globalVarOp) {
            if (globalVarOp->getAttr(acc::getDeclareAttrName()))
              if (Region *initRegion = globalVarOp.getInitRegion())
                collectGlobalsFromDeviceRegion(*initRegion, globalsToAccDeclare,
                                               accSupport, symTab);
          })
          .Case<acc::PrivateRecipeOp>([&](auto privateRecipe) {
            if (hasRelevantRecipeUse(privateRecipe, mod)) {
              collectGlobalsFromDeviceRegion(privateRecipe.getInitRegion(),
                                             globalsToAccDeclare, accSupport,
                                             symTab);
              collectGlobalsFromDeviceRegion(privateRecipe.getDestroyRegion(),
                                             globalsToAccDeclare, accSupport,
                                             symTab);
            }
          })
          .Case<acc::FirstprivateRecipeOp>([&](auto firstprivateRecipe) {
            if (hasRelevantRecipeUse(firstprivateRecipe, mod)) {
              collectGlobalsFromDeviceRegion(firstprivateRecipe.getInitRegion(),
                                             globalsToAccDeclare, accSupport,
                                             symTab);
              collectGlobalsFromDeviceRegion(
                  firstprivateRecipe.getDestroyRegion(), globalsToAccDeclare,
                  accSupport, symTab);
              collectGlobalsFromDeviceRegion(firstprivateRecipe.getCopyRegion(),
                                             globalsToAccDeclare, accSupport,
                                             symTab);
            }
          })
          .Case<acc::ReductionRecipeOp>([&](auto reductionRecipe) {
            if (hasRelevantRecipeUse(reductionRecipe, mod)) {
              collectGlobalsFromDeviceRegion(reductionRecipe.getInitRegion(),
                                             globalsToAccDeclare, accSupport,
                                             symTab);
              collectGlobalsFromDeviceRegion(
                  reductionRecipe.getCombinerRegion(), globalsToAccDeclare,
                  accSupport, symTab);
            }
          });
    });

    // 3) Finally, generate the appropriate declare actions needed to ensure
    // each collected global is treated as a device global.
    for (Operation *globalOp : globalsToAccDeclare) {
      LLVM_DEBUG(
          llvm::dbgs() << "Global is being `acc declare copyin`d: ";
          globalOp->print(llvm::dbgs(),
                          OpPrintingFlags{}.skipRegions().enableDebugInfo());
          llvm::dbgs() << "\n");

      // Mark it as declare copyin.
      addDeclareAttr(context, globalOp, acc::DataClause::acc_copyin);

      // TODO: May need to create the global constructor which does the mapping
      // action. It is not yet clear if this is needed (since the globals
      // might just end up in the GPU image without requiring mapping via
      // runtime).
    }
  }
};

} // namespace