MLIR
20.0.0git
|
Helper to create the base Hopper-specific operations that are reused in various other places. More...
Public Member Functions | |
HopperBuilder (RewriterBase &rewriter, Location loc) | |
TypedValue< nvgpu::MBarrierGroupType > | buildAndInitBarrierInSharedMemory (OpFoldResult numThreads) |
TypedValue< nvgpu::TensorMapDescriptorType > | buildGlobalMemRefDescriptor (TypedValue< MemRefType > memref, gpu::LaunchOp launchOp) |
Create tma descriptor op to initiate transfer from global to shared memory. More... | |
OpFoldResult | buildTmaAsyncLoad (TypedValue< nvgpu::TensorMapDescriptorType > globalDesc, TypedValue< MemRefType > sharedMemref, TypedValue< nvgpu::MBarrierGroupType > barrier, SmallVectorImpl< Operation * > &loadOps) |
Build a tma load from global memory to shared memory using barrier to synchronize. More... | |
void | buildBarrierArriveTx (TypedValue< nvgpu::MBarrierGroupType > barrier, ArrayRef< OpFoldResult > sizes) |
SmallVector< Operation * > | buildPredicateLoadsOnThread0 (ArrayRef< TypedValue< nvgpu::TensorMapDescriptorType >> globalDescriptors, ArrayRef< TypedValue< MemRefType >> sharedMemBuffers, TypedValue< nvgpu::MBarrierGroupType > barrier) |
If threadIdx.x == 0 does TMA request + wait, else just wait. More... | |
void | buildTryWaitParity (TypedValue< nvgpu::MBarrierGroupType > barrier) |
Public Attributes | |
RewriterBase & | rewriter |
Location | loc |
Helper to create the base Hopper-specific operations that are reused in various other places.
Definition at line 855 of file NVGPUTransformOps.cpp.
|
inline |
Definition at line 856 of file NVGPUTransformOps.cpp.
TypedValue< nvgpu::MBarrierGroupType > HopperBuilder::buildAndInitBarrierInSharedMemory | ( | OpFoldResult | numThreads | ) |
Definition at line 937 of file NVGPUTransformOps.cpp.
References mlir::OpBuilder::create(), mlir::get(), mlir::Builder::getContext(), getSharedAddressSpaceAttribute(), and mlir::getValueOrCreateConstantIndexOp().
void HopperBuilder::buildBarrierArriveTx | ( | TypedValue< nvgpu::MBarrierGroupType > | barrier, |
ArrayRef< OpFoldResult > | sizes | ||
) |
Definition at line 1001 of file NVGPUTransformOps.cpp.
References mlir::bindSymbolsList(), mlir::computeSum(), mlir::OpBuilder::create(), mlir::Builder::getContext(), mlir::getValueOrCreateConstantIndexOp(), and mlir::affine::makeComposedFoldedAffineApply().
TypedValue< nvgpu::TensorMapDescriptorType > HopperBuilder::buildGlobalMemRefDescriptor | ( | TypedValue< MemRefType > | memref, |
gpu::LaunchOp | launchOp | ||
) |
Create tma descriptor op to initiate transfer from global to shared memory.
This must be done before the launch op, on the host.
Definition at line 951 of file NVGPUTransformOps.cpp.
References mlir::OpBuilder::create(), mlir::get(), mlir::Builder::getContext(), mlir::memref::getMixedSizes(), getSharedAddressSpaceAttribute(), mlir::getValueOrCreateConstantIndexOp(), mlir::OpBuilder::setInsertionPoint(), and mlir::MemRefType::Builder::setMemorySpace().
SmallVector< Operation * > HopperBuilder::buildPredicateLoadsOnThread0 | ( | ArrayRef< TypedValue< nvgpu::TensorMapDescriptorType >> | globalDescriptors, |
ArrayRef< TypedValue< MemRefType >> | sharedMemBuffers, | ||
TypedValue< nvgpu::MBarrierGroupType > | barrier | ||
) |
If threadIdx.x == 0 does TMA request + wait, else just wait.
Return the operation that performs the transfer on thread0.
Definition at line 892 of file NVGPUTransformOps.cpp.
References mlir::OpBuilder::create(), mlir::getAsIndexOpFoldResult(), and mlir::Builder::getContext().
OpFoldResult HopperBuilder::buildTmaAsyncLoad | ( | TypedValue< nvgpu::TensorMapDescriptorType > | globalDesc, |
TypedValue< MemRefType > | sharedMemref, | ||
TypedValue< nvgpu::MBarrierGroupType > | barrier, | ||
SmallVectorImpl< Operation * > & | loadOps | ||
) |
Build a tma load from global memory to shared memory using barrier
to synchronize.
Return the number of bytes that will be transferred.
Definition at line 979 of file NVGPUTransformOps.cpp.
References mlir::bindSymbolsList(), mlir::computeProduct(), mlir::OpBuilder::create(), mlir::Builder::getContext(), mlir::memref::getMixedSizes(), and mlir::affine::makeComposedFoldedAffineApply().
void HopperBuilder::buildTryWaitParity | ( | TypedValue< nvgpu::MBarrierGroupType > | barrier | ) |
Definition at line 1017 of file NVGPUTransformOps.cpp.
References mlir::OpBuilder::create(), and mlir::Builder::getI1Type().
Location HopperBuilder::loc |
Definition at line 889 of file NVGPUTransformOps.cpp.
RewriterBase& HopperBuilder::rewriter |
Definition at line 888 of file NVGPUTransformOps.cpp.