MLIR  19.0.0git
mlir::amdgpu Namespace Reference

Classes

struct  Chipset
 

Functions

void registerTransformDialectExtension (DialectRegistry &registry)
 
void populateAmdgpuEmulateAtomicsPatterns (ConversionTarget &target, RewritePatternSet &patterns, Chipset chipset)
 
LogicalResult optimizeSharedMemoryReadsAndWrites (Operation *parentOp, Value memrefValue, int64_t sharedMemoryLineSizeBytes, int64_t defaultVectorSizeBits)
 Passes.
 
std::optional< LogicalResult > optimizeSharedMemoryReadsAndWritesOp (func::FuncOp funcOp, int64_t sharedMemoryLineSizeBytes, int64_t defaultVectorSizeBits)
 
std::optional< Operation::operand_range > getIndices (Operation *op)
 Get and set the indices that the given load/store operation is operating on.
 
void setIndices (Operation *op, ArrayRef< Value > indices)
 

Function Documentation

◆ getIndices()

std::optional< Operation::operand_range > mlir::amdgpu::getIndices (Operation *op)

Get and set the indices that the given load/store operation is operating on.

Preconditions:

  • The op must have memory effects.
  • Considers memref::LoadOp, vector::LoadOp, and vector::TransferReadOp.
  • Considers memref::StoreOp, vector::StoreOp, and vector::TransferWriteOp.
  • Excludes subview op.

Definition at line 10 of file Utils.cpp.

Referenced by mlirSparseElementsAttrGetIndices(), and optimizeSharedMemoryReadsAndWrites().
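
The getter/setter pair is intended for a read-modify-write of an access op's index operands. A minimal sketch (not taken from the MLIR sources), assuming op is one of the supported operations listed above and that the caller builds the replacement index values itself:

  // Read the indices of a supported load/store, adjust them, write them back.
  if (std::optional<Operation::operand_range> indices = amdgpu::getIndices(op)) {
    SmallVector<Value> newIndices(indices->begin(), indices->end());
    // ... replace entries of newIndices with freshly built index values ...
    amdgpu::setIndices(op, newIndices);
  }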

◆ optimizeSharedMemoryReadsAndWrites()

LogicalResult mlir::amdgpu::optimizeSharedMemoryReadsAndWrites (Operation *parentOp, Value memrefValue, int64_t sharedMemoryLineSizeBytes, int64_t defaultVectorSizeBits)

Passes.

Optimizes vectorized accesses to a shared memory buffer specified by memrefValue. This transformation assumes the following:

  • All relevant accesses to memrefValue are contained within parentOp.
  • The function fails its precondition checks if any subviews are taken of memrefValue; all reads and writes to memrefValue should occur through memrefValue directly.

Shared memory bank conflicts occur when multiple threads attempt to read or write locations assigned to the same shared memory bank. For 2^N-byte vectorized accesses, the conflicts of concern are among threads identified as (tid) -> tid.floordiv(2^{7-N}). This transformation therefore rewrites any indexed memory access (vector.load, memref.load, etc.) so that the index of the final dimension is permuted:

  newColIndex = oldColIndex % vectorSize + perm[rowIndex](oldColIndex / vectorSize, rowIndex)

where rowIndex is the index of the second-to-last dimension and perm[rowIndex] is a permutation function that depends on rowIndex. The permutation function is chosen so that sequential distributed+vectorized reads and writes down a single dimension of the memref have minimal bank conflicts.
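
A hedged sketch of how this entry point might be driven over a function, assuming the shared memory buffers of interest are memref.alloc results carrying the GPU workgroup address space; the helper name and the 128-byte line / 128-bit vector values are illustrative choices of this example, not defaults documented here:

  // Hypothetical driver: collect workgroup-memory allocations in `funcOp`
  // and run the optimization on each of them.
  LogicalResult optimizeAllSharedBuffers(func::FuncOp funcOp) {
    SmallVector<Value> shmBuffers;
    funcOp.walk([&](memref::AllocOp alloc) {
      auto space = dyn_cast_or_null<gpu::AddressSpaceAttr>(
          alloc.getType().getMemorySpace());
      if (space && space.getValue() == gpu::AddressSpace::Workgroup)
        shmBuffers.push_back(alloc.getResult());
    });
    for (Value shm : shmBuffers)
      if (failed(amdgpu::optimizeSharedMemoryReadsAndWrites(
              funcOp, shm, /*sharedMemoryLineSizeBytes=*/128,
              /*defaultVectorSizeBits=*/128)))
        return failure();
    return success();
  }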

Definition at line 149 of file OptimizeSharedMemory.cpp.

References mlir::failed(), mlir::failure(), mlir::Operation::getContext(), getIndices(), mlir::Operation::getLoc(), getShmReadAndWriteOps(), mlir::Value::getType(), setIndices(), mlir::OpBuilder::setInsertionPoint(), mlir::success(), transformIndices(), and mlir::Operation::walk().

Referenced by optimizeSharedMemoryReadsAndWritesOp().

◆ optimizeSharedMemoryReadsAndWritesOp()

std::optional< LogicalResult > mlir::amdgpu::optimizeSharedMemoryReadsAndWritesOp (func::FuncOp funcOp, int64_t sharedMemoryLineSizeBytes, int64_t defaultVectorSizeBits)

◆ populateAmdgpuEmulateAtomicsPatterns()

void mlir::amdgpu::populateAmdgpuEmulateAtomicsPatterns (ConversionTarget &target, RewritePatternSet &patterns, Chipset chipset)
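
A hedged sketch of the usual conversion-driver idiom around this populate function; the surrounding pass, the chipset value, and how it was obtained are assumptions of the example, not part of this entry:

  // Inside a pass's runOnOperation(), with `chipset` already determined
  // (e.g. parsed from a pass option).
  Operation *op = getOperation();
  MLIRContext *ctx = op->getContext();
  ConversionTarget target(*ctx);
  RewritePatternSet patterns(ctx);
  amdgpu::populateAmdgpuEmulateAtomicsPatterns(target, patterns, chipset);
  if (failed(applyPartialConversion(op, target, std::move(patterns))))
    signalPassFailure();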

◆ registerTransformDialectExtension()

void mlir::amdgpu::registerTransformDialectExtension (DialectRegistry &registry)
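
For illustration, a typical registration sequence when constructing a context for a tool that runs transform scripts (a generic MLIR idiom, not code from this page):

  DialectRegistry registry;
  // Register whichever dialects the tool needs, then attach the extension.
  amdgpu::registerTransformDialectExtension(registry);
  MLIRContext context(registry);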

◆ setIndices()

void mlir::amdgpu::setIndices (Operation *op, ArrayRef< Value > indices)

Definition at line 26 of file Utils.cpp.

Referenced by optimizeSharedMemoryReadsAndWrites().