MLIR  15.0.0git
MemoryPromotion.cpp File Reference
#include "mlir/Dialect/GPU/Transforms/MemoryPromotion.h"
#include "mlir/Dialect/Affine/LoopUtils.h"
#include "mlir/Dialect/Arithmetic/IR/Arithmetic.h"
#include "mlir/Dialect/GPU/IR/GPUDialect.h"
#include "mlir/Dialect/MemRef/IR/MemRef.h"
#include "mlir/Dialect/SCF/IR/SCF.h"
#include "mlir/IR/ImplicitLocOpBuilder.h"
#include "mlir/Pass/Pass.h"

Functions

static void insertCopyLoops (ImplicitLocOpBuilder &b, Value from, Value to)
 Emits the (imperfect) loop nest performing the copy between "from" and "to" values using the bounds derived from the "from" value.
 
static void insertCopies (Region &region, Location loc, Value from, Value to)
 Emits the loop nests performing the copy to the designated location at the beginning of the region, and from the designated location immediately before the terminator of the first block of the region.
 

Function Documentation

◆ insertCopies()

static void insertCopies (Region &region, Location loc, Value from, Value to)
[static]

Emits the loop nests performing the copy to the designated location at the beginning of the region, and from the designated location immediately before the terminator of the first block of the region.

The region is expected to have exactly one block. The emitted code boils down to the following structure:

^bb(...):
  <loop-bound-computation>
  for arg0 = ... to ... step ... {
    ...
    for argN = <thread-id-x> to ... step <block-dim-x> {
      %0 = load from[arg0, ..., argN]
      store %0, to[arg0, ..., argN]
    }
    ...
  }
  gpu.barrier
  <... original body ...>
  gpu.barrier
  for arg0 = ... to ... step ... {
    ...
    for argN = <thread-id-x> to ... step <block-dim-x> {
      %1 = load to[arg0, ..., argN]
      store %1, from[arg0, ..., argN]
    }
    ...
  }
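The copy-in / barrier / body / barrier / copy-out structure can be modeled on the host to see why each phase is correct. The following is a hypothetical C++ sketch, not MLIR code: Buffer2D, copyForThread, and promoteAndRun are illustrative names, "threads" are simulated sequentially, and each phase finishes for every simulated thread before the next phase begins, which stands in for gpu.barrier. As in the structure shown, the outer loop is sequential and the innermost loop starts at the thread id and strides by the block size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense row-major 2-D buffer standing in for a memref.
struct Buffer2D {
  std::size_t rows, cols;
  std::vector<float> data;
  Buffer2D(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0f) {}
  float &at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
  float at(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// One simulated thread's share of a copy: sequential outer loop, innermost
// loop from tidX with step blockDimX (thread-x mapping).
void copyForThread(const Buffer2D &src, Buffer2D &dst, std::size_t tidX,
                   std::size_t blockDimX) {
  assert(blockDimX > 0 && src.rows == dst.rows && src.cols == dst.cols);
  for (std::size_t i = 0; i < src.rows; ++i)
    for (std::size_t j = tidX; j < src.cols; j += blockDimX)
      dst.at(i, j) = src.at(i, j);
}

// Copy "from" (global) into "to" (workgroup), run the body on the promoted
// buffer, then copy "to" back into "from".
template <typename Body>
void promoteAndRun(Buffer2D &global, std::size_t blockDimX, Body body) {
  Buffer2D workgroup(global.rows, global.cols);  // stands in for workgroup memory
  for (std::size_t t = 0; t < blockDimX; ++t)
    copyForThread(global, workgroup, t, blockDimX);
  // gpu.barrier: all simulated threads have finished copying in.
  for (std::size_t t = 0; t < blockDimX; ++t)
    body(workgroup, t, blockDimX);
  // gpu.barrier: all simulated threads have finished the original body.
  for (std::size_t t = 0; t < blockDimX; ++t)
    copyForThread(workgroup, global, t, blockDimX);
}
```

A sequential simulation like this is only equivalent to the parallel execution because of the barriers: without them, a real thread could read a workgroup element before the thread responsible for it had copied it in.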

The barriers are inserted unconditionally because different threads may copy values that other threads then read. Eliminating them would require an analysis showing that each value is only used by the thread that copies it. Both copies are likewise inserted unconditionally; copying only live-in and live-out values when necessary would also require an analysis. The function copies the entire memref pointed to by "from". If a smaller block would suffice, the caller can create a subview of the memref and promote that instead.
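The benefit of promoting a subview instead of the whole memref can be illustrated with a hypothetical host-side C++ model of a strided 2-D view (an offset plus per-dimension sizes and strides over a row-major buffer), in the spirit of the strided layout a memref.subview result carries. StridedView2D, makeSubview, and copyToDense are illustrative names, not MLIR APIs; the sketch only shows that copying through the view touches the tile's elements and nothing else.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical strided view: element (i, j) lives at
// offset + i * strides[0] + j * strides[1] in the underlying buffer.
struct StridedView2D {
  std::vector<float> *base;
  std::size_t offset;
  std::array<std::size_t, 2> sizes;
  std::array<std::size_t, 2> strides;
  float &at(std::size_t i, std::size_t j) const {
    assert(i < sizes[0] && j < sizes[1]);
    return (*base)[offset + i * strides[0] + j * strides[1]];
  }
};

// View of the tile [r0, r0 + h) x [c0, c0 + w) of a row-major rows x cols buffer.
StridedView2D makeSubview(std::vector<float> &buf, std::size_t rows,
                          std::size_t cols, std::size_t r0, std::size_t c0,
                          std::size_t h, std::size_t w) {
  assert(buf.size() == rows * cols && r0 + h <= rows && c0 + w <= cols);
  return StridedView2D{&buf, r0 * cols + c0, {h, w}, {cols, 1}};
}

// Copies the view into a dense row-major buffer (the "promoted" copy) and
// returns the number of elements copied.
std::size_t copyToDense(const StridedView2D &src, std::vector<float> &dst) {
  dst.assign(src.sizes[0] * src.sizes[1], 0.0f);
  for (std::size_t i = 0; i < src.sizes[0]; ++i)
    for (std::size_t j = 0; j < src.sizes[1]; ++j)
      dst[i * src.sizes[1] + j] = src.at(i, j);
  return dst.size();
}
```

For example, promoting a 2x4 tile of an 8x8 buffer this way copies 8 elements instead of the 64 a whole-memref promotion would copy.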

Definition at line 123 of file MemoryPromotion.cpp.

References mlir::ImplicitLocOpBuilder::atBlockBegin(), mlir::Block::back(), mlir::Type::cast(), mlir::Type::dyn_cast(), mlir::Region::front(), mlir::Value::getLoc(), mlir::Value::getType(), insertCopyLoops(), mlir::promoteToWorkgroupMemory(), mlir::Value::replaceAllUsesWith(), and value.

◆ insertCopyLoops()

static void insertCopyLoops (ImplicitLocOpBuilder &b, Value from, Value to)
[static]

Emits the (imperfect) loop nest performing the copy between "from" and "to" values using the bounds derived from the "from" value.

Emits at least GPUDialect::getNumWorkgroupDimensions() loops, completing the nest with single-iteration loops when the memref has fewer dimensions. Maps the innermost loops to thread dimensions in reverse order (the innermost loop to x) so that accesses in the innermost loop can be coalesced.
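The loop-nest shape described above can be sketched in plain C++. This is a hypothetical model, not the MLIR implementation: LoopSpec and planCopyLoops are illustrative names, and the value 3 for the number of workgroup dimensions (x, y, z) is an assumption mirroring GPU thread dimensions. It pads short shapes with single-iteration loops placed outermost, then assigns x, y, z to the innermost loops from the inside out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of one loop in the copy nest.
struct LoopSpec {
  std::size_t upperBound;  // lower bound 0, step 1 before thread mapping
  std::string mappedTo;    // "" for a sequential loop, else "x", "y" or "z"
};

constexpr std::size_t kNumWorkgroupDims = 3;  // assumption: x, y, z

// Plans the loops for a memref of the given shape: prepends single-iteration
// loops until there are at least kNumWorkgroupDims loops, then maps the
// innermost loops to thread dimensions in reverse order (innermost -> x).
std::vector<LoopSpec> planCopyLoops(const std::vector<std::size_t> &shape) {
  std::vector<LoopSpec> loops;
  for (std::size_t i = shape.size(); i < kNumWorkgroupDims; ++i)
    loops.push_back({1, ""});  // padding loops go outermost
  for (std::size_t extent : shape)
    loops.push_back({extent, ""});
  assert(loops.size() >= kNumWorkgroupDims);
  const char *dims[kNumWorkgroupDims] = {"x", "y", "z"};
  for (std::size_t k = 0; k < kNumWorkgroupDims; ++k)
    loops[loops.size() - 1 - k].mappedTo = dims[k];
  return loops;
}
```

For a 4x32 memref this yields three loops: a padding loop (bound 1) mapped to z, the 4-extent loop mapped to y, and the 32-extent loop mapped to x. Assuming a mapped loop's lower bound becomes the thread id and its step the block size, the padded z loop would run only for threads whose z id is 0, and consecutive x threads access consecutive elements of the innermost, contiguous dimension.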

Definition at line 32 of file MemoryPromotion.cpp.

References mlir::scf::buildLoopNest(), mlir::Type::cast(), mlir::ImplicitLocOpBuilder::create(), mlir::ImplicitLocOpBuilder::createOrFold(), mlir::detail::enumerate(), mlir::Builder::getIndexType(), mlir::ImplicitLocOpBuilder::getLoc(), mlir::Region::getParentOp(), mlir::Value::getParentRegion(), mlir::Value::getType(), and mlir::mapLoopToProcessorIds().

Referenced by insertCopies().