MLIR
20.0.0git
Helper to create the TMA operations corresponding to linalg::CopyOp.
Public Member Functions
CopyBuilder (RewriterBase &rewriter, Location loc)
SmallVector< Operation * > | rewrite (ArrayRef< Operation * > copyOps)
Public Member Functions inherited from HopperBuilder
HopperBuilder (RewriterBase &rewriter, Location loc)
TypedValue< nvgpu::MBarrierGroupType > | buildAndInitBarrierInSharedMemory (OpFoldResult numThreads)
TypedValue< nvgpu::TensorMapDescriptorType > | buildGlobalMemRefDescriptor (TypedValue< MemRefType > memref, gpu::LaunchOp launchOp)
Create a TMA descriptor op to initiate a transfer from global to shared memory.
OpFoldResult | buildTmaAsyncLoad (TypedValue< nvgpu::TensorMapDescriptorType > globalDesc, TypedValue< MemRefType > sharedMemref, TypedValue< nvgpu::MBarrierGroupType > barrier, SmallVectorImpl< Operation * > &loadOps)
Build a TMA load from global memory to shared memory, using barrier to synchronize.
void | buildBarrierArriveTx (TypedValue< nvgpu::MBarrierGroupType > barrier, ArrayRef< OpFoldResult > sizes)
SmallVector< Operation * > | buildPredicateLoadsOnThread0 (ArrayRef< TypedValue< nvgpu::TensorMapDescriptorType >> globalDescriptors, ArrayRef< TypedValue< MemRefType >> sharedMemBuffers, TypedValue< nvgpu::MBarrierGroupType > barrier)
If threadIdx.x == 0, issue the TMA request and wait; otherwise just wait.
void | buildTryWaitParity (TypedValue< nvgpu::MBarrierGroupType > barrier)
Additional Inherited Members
Public Attributes inherited from HopperBuilder
RewriterBase & | rewriter |
Location | loc |
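The inherited helpers listed above suggest a common pattern: a barrier is initialized for a number of threads, thread 0 issues the asynchronous loads and tells the barrier how many bytes to expect, and every thread then waits on that barrier. The sketch below is a plain C++, single-threaded model of that pattern only. It does not use MLIR or the nvgpu dialect, and every name in it (SimBarrier, bytesOf, simulatePredicatedLoads, SimResult) is hypothetical. It also assumes that the expected byte count of one load is the product of the buffer dimensions times the element size, and that threads other than thread 0 arrive with an expected byte count of zero; neither assumption is stated on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of a transaction-counting barrier: it becomes ready once
// every thread has arrived and every expected byte has been delivered.
struct SimBarrier {
  int pendingArrivals;
  std::int64_t pendingBytes = 0;

  explicit SimBarrier(int numThreads) : pendingArrivals(numThreads) {}

  // A thread arrives and announces how many bytes of async copies to expect.
  void arriveExpectTx(std::int64_t bytes) {
    --pendingArrivals;
    pendingBytes += bytes;
  }
  // An async copy reports that `bytes` have landed in shared memory.
  void completeTx(std::int64_t bytes) { pendingBytes -= bytes; }
  // Simplified wait condition (no phase parity is modeled here).
  bool ready() const { return pendingArrivals == 0 && pendingBytes == 0; }
};

// Assumed size rule: product of all dimensions times the element size.
std::int64_t bytesOf(const std::vector<std::int64_t> &shape,
                     std::int64_t elemBytes) {
  std::int64_t n = elemBytes;
  for (std::int64_t d : shape)
    n *= d;
  return n;
}

struct SimResult {
  std::int64_t expectedBytes; // total announced by thread 0
  bool readyBeforeCopies;     // barrier state after all arrivals
  bool readyAfterCopies;      // barrier state after all copies completed
};

// Walks the threads in order: thread 0 "issues" one load per shape and
// announces the total size; all other threads arrive expecting zero bytes.
SimResult simulatePredicatedLoads(
    int numThreads, const std::vector<std::vector<std::int64_t>> &shapes,
    std::int64_t elemBytes) {
  SimBarrier barrier(numThreads);
  std::vector<std::int64_t> inFlight;
  std::int64_t total = 0;
  for (int tid = 0; tid < numThreads; ++tid) {
    if (tid == 0) {
      for (const auto &shape : shapes) {
        inFlight.push_back(bytesOf(shape, elemBytes));
        total += inFlight.back();
      }
      barrier.arriveExpectTx(total);
    } else {
      barrier.arriveExpectTx(0);
    }
  }
  SimResult result{total, barrier.ready(), false};
  for (std::int64_t bytes : inFlight)
    barrier.completeTx(bytes); // the simulated copies finish
  result.readyAfterCopies = barrier.ready();
  return result;
}
```

With four threads and two 4-byte-element buffers of shapes 64x8 and 8x128, the model announces 2048 + 4096 = 6144 bytes, is not ready right after the arrivals, and becomes ready once both simulated copies complete. On real hardware the wait is performed concurrently by all threads; the sequential loop here only illustrates the bookkeeping.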
Definition at line 1036 of file NVGPUTransformOps.cpp.
CopyBuilder::CopyBuilder (RewriterBase &rewriter, Location loc) [inline]
Definition at line 1037 of file NVGPUTransformOps.cpp.
SmallVector< Operation * > CopyBuilder::rewrite (ArrayRef< Operation * > copyOps)
Definition at line 1043 of file NVGPUTransformOps.cpp.
References mlir::bindSymbols(), mlir::computeProduct(), mlir::RewriterBase::eraseOp(), mlir::Builder::getContext(), mlir::affine::makeComposedFoldedAffineApply(), and mlir::OpBuilder::setInsertionPoint().