MLIR
19.0.0git
|
Helper to create the base Hopper-specific operations that are reused in various other places. More...
Public Member Functions | |
HopperBuilder (RewriterBase &rewriter, Location loc) | |
TypedValue< nvgpu::MBarrierGroupType > | buildAndInitBarrierInSharedMemory (OpFoldResult numThreads) |
TypedValue< nvgpu::TensorMapDescriptorType > | buildGlobalMemRefDescriptor (TypedValue< MemRefType > memref, gpu::LaunchOp launchOp) |
Create tma descriptor op to initiate transfer from global to shared memory. More... | |
OpFoldResult | buildTmaAsyncLoad (TypedValue< nvgpu::TensorMapDescriptorType > globalDesc, TypedValue< MemRefType > sharedMemref, TypedValue< nvgpu::MBarrierGroupType > barrier, SmallVectorImpl< Operation * > &loadOps) |
Build a tma load from global memory to shared memory using barrier to synchronize. More... | |
void | buildBarrierArriveTx (TypedValue< nvgpu::MBarrierGroupType > barrier, ArrayRef< OpFoldResult > sizes) |
SmallVector< Operation * > | buildPredicateLoadsOnThread0 (ArrayRef< TypedValue< nvgpu::TensorMapDescriptorType >> globalDescriptors, ArrayRef< TypedValue< MemRefType >> sharedMemBuffers, TypedValue< nvgpu::MBarrierGroupType > barrier) |
If threadIdx.x == 0 does TMA request + wait, else just wait. More... | |
void | buildTryWaitParity (TypedValue< nvgpu::MBarrierGroupType > barrier) |
Public Attributes | |
RewriterBase & | rewriter |
Location | loc |
Helper to create the base Hopper-specific operations that are reused in various other places.
Definition at line 849 of file NVGPUTransformOps.cpp.
|
inline |
Definition at line 850 of file NVGPUTransformOps.cpp.
TypedValue< nvgpu::MBarrierGroupType > HopperBuilder::buildAndInitBarrierInSharedMemory | ( | OpFoldResult | numThreads | ) |
Definition at line 931 of file NVGPUTransformOps.cpp.
References mlir::OpBuilder::create(), mlir::get(), mlir::Builder::getContext(), getSharedAddressSpaceAttribute(), and mlir::getValueOrCreateConstantIndexOp().
void HopperBuilder::buildBarrierArriveTx | ( | TypedValue< nvgpu::MBarrierGroupType > | barrier, |
ArrayRef< OpFoldResult > | sizes | ||
) |
Definition at line 995 of file NVGPUTransformOps.cpp.
References mlir::bindSymbolsList(), mlir::computeSum(), mlir::OpBuilder::create(), mlir::Builder::getContext(), mlir::getValueOrCreateConstantIndexOp(), and mlir::affine::makeComposedFoldedAffineApply().
TypedValue< nvgpu::TensorMapDescriptorType > HopperBuilder::buildGlobalMemRefDescriptor | ( | TypedValue< MemRefType > | memref, |
gpu::LaunchOp | launchOp | ||
) |
Create tma descriptor op to initiate transfer from global to shared memory.
This must be done before the launch op, on the host.
Definition at line 945 of file NVGPUTransformOps.cpp.
References mlir::OpBuilder::create(), mlir::get(), mlir::Builder::getContext(), mlir::memref::getMixedSizes(), getSharedAddressSpaceAttribute(), mlir::getValueOrCreateConstantIndexOp(), mlir::OpBuilder::setInsertionPoint(), and mlir::MemRefType::Builder::setMemorySpace().
SmallVector< Operation * > HopperBuilder::buildPredicateLoadsOnThread0 | ( | ArrayRef< TypedValue< nvgpu::TensorMapDescriptorType >> | globalDescriptors, |
ArrayRef< TypedValue< MemRefType >> | sharedMemBuffers, | ||
TypedValue< nvgpu::MBarrierGroupType > | barrier | ||
) |
If threadIdx.x == 0 does TMA request + wait, else just wait.
Return the operation that performs the transfer on thread0.
Definition at line 886 of file NVGPUTransformOps.cpp.
References mlir::OpBuilder::create(), mlir::getAsIndexOpFoldResult(), and mlir::Builder::getContext().
OpFoldResult HopperBuilder::buildTmaAsyncLoad | ( | TypedValue< nvgpu::TensorMapDescriptorType > | globalDesc, |
TypedValue< MemRefType > | sharedMemref, | ||
TypedValue< nvgpu::MBarrierGroupType > | barrier, | ||
SmallVectorImpl< Operation * > & | loadOps | ||
) |
Build a tma load from global memory to shared memory using barrier
to synchronize.
Return the number of bytes that will be transferred.
Definition at line 973 of file NVGPUTransformOps.cpp.
References mlir::bindSymbolsList(), mlir::computeProduct(), mlir::OpBuilder::create(), mlir::Builder::getContext(), mlir::memref::getMixedSizes(), and mlir::affine::makeComposedFoldedAffineApply().
void HopperBuilder::buildTryWaitParity | ( | TypedValue< nvgpu::MBarrierGroupType > | barrier | ) |
Definition at line 1011 of file NVGPUTransformOps.cpp.
References mlir::OpBuilder::create(), and mlir::Builder::getI1Type().
Location HopperBuilder::loc |
Definition at line 883 of file NVGPUTransformOps.cpp.
RewriterBase& HopperBuilder::rewriter |
Definition at line 882 of file NVGPUTransformOps.cpp.