|
MLIR 22.0.0git
|
Helper to create the base Hopper-specific operations that are reused in various other places. More...
Public Member Functions | |
| HopperBuilder (RewriterBase &rewriter, Location loc) | |
| TypedValue< MBarrierGroupType > | buildAndInitBarrierInSharedMemory (OpFoldResult numThreads) |
| TypedValue< TensorMapDescriptorType > | buildGlobalMemRefDescriptor (TypedValue< MemRefType > memref, gpu::LaunchOp launchOp) |
| Create tma descriptor op to initiate transfer from global to shared memory. | |
| OpFoldResult | buildTmaAsyncLoad (TypedValue< TensorMapDescriptorType > globalDesc, TypedValue< MemRefType > sharedMemref, TypedValue< MBarrierGroupType > barrier, SmallVectorImpl< Operation * > &loadOps) |
| Build a tma load from global memory to shared memory using barrier to synchronize. | |
| void | buildBarrierArriveTx (TypedValue< MBarrierGroupType > barrier, ArrayRef< OpFoldResult > sizes) |
| SmallVector< Operation * > | buildPredicateLoadsOnThread0 (ArrayRef< TypedValue< TensorMapDescriptorType > > globalDescriptors, ArrayRef< TypedValue< MemRefType > > sharedMemBuffers, TypedValue< MBarrierGroupType > barrier) |
| If threadIdx.x == 0 does TMA request + wait, else just wait. | |
| void | buildTryWaitParity (TypedValue< MBarrierGroupType > barrier) |
Public Attributes | |
| RewriterBase & | rewriter |
| Location | loc |
Helper to create the base Hopper-specific operations that are reused in various other places.
Definition at line 836 of file NVGPUTransformOps.cpp.
|
inline |
Definition at line 837 of file NVGPUTransformOps.cpp.
Referenced by CopyBuilder::CopyBuilder().
| TypedValue< MBarrierGroupType > HopperBuilder::buildAndInitBarrierInSharedMemory | ( | OpFoldResult | numThreads | ) |
Definition at line 917 of file NVGPUTransformOps.cpp.
References mlir::arith::ConstantIndexOp::create(), getSharedAddressSpaceAttribute(), mlir::getValueOrCreateConstantIndexOp(), loc, and rewriter.
Referenced by CopyBuilder::rewrite().
| void HopperBuilder::buildBarrierArriveTx | ( | TypedValue< MBarrierGroupType > | barrier, |
| ArrayRef< OpFoldResult > | sizes ) |
Definition at line 982 of file NVGPUTransformOps.cpp.
References mlir::bindSymbolsList(), mlir::computeSum(), mlir::arith::ConstantIndexOp::create(), mlir::getValueOrCreateConstantIndexOp(), loc, mlir::affine::makeComposedFoldedAffineApply(), and rewriter.
Referenced by buildPredicateLoadsOnThread0().
| TypedValue< TensorMapDescriptorType > HopperBuilder::buildGlobalMemRefDescriptor | ( | TypedValue< MemRefType > | memref, |
| gpu::LaunchOp | launchOp ) |
Create tma descriptor op to initiate transfer from global to shared memory.
This must be done before the launch op, on the host.
Definition at line 932 of file NVGPUTransformOps.cpp.
References mlir::memref::getMixedSizes(), getSharedAddressSpaceAttribute(), mlir::getValueOrCreateConstantIndexOp(), loc, rewriter, and mlir::MemRefType::Builder::setMemorySpace().
Referenced by CopyBuilder::rewrite().
| SmallVector< Operation * > HopperBuilder::buildPredicateLoadsOnThread0 | ( | ArrayRef< TypedValue< TensorMapDescriptorType > > | globalDescriptors, |
| ArrayRef< TypedValue< MemRefType > > | sharedMemBuffers, | ||
| TypedValue< MBarrierGroupType > | barrier ) |
If threadIdx.x == 0 does TMA request + wait, else just wait.
Return the operation that performs the transfer on thread0.
Definition at line 872 of file NVGPUTransformOps.cpp.
References buildBarrierArriveTx(), buildTmaAsyncLoad(), mlir::arith::ConstantIndexOp::create(), mlir::getAsIndexOpFoldResult(), loc, and rewriter.
Referenced by CopyBuilder::rewrite().
| OpFoldResult HopperBuilder::buildTmaAsyncLoad | ( | TypedValue< TensorMapDescriptorType > | globalDesc, |
| TypedValue< MemRefType > | sharedMemref, | ||
| TypedValue< MBarrierGroupType > | barrier, | ||
| SmallVectorImpl< Operation * > & | loadOps ) |
Build a tma load from global memory to shared memory using barrier to synchronize.
Return the number of bytes that will be transferred.
Definition at line 961 of file NVGPUTransformOps.cpp.
References mlir::bindSymbolsList(), mlir::computeProduct(), mlir::arith::ConstantIndexOp::create(), mlir::memref::getMixedSizes(), loc, mlir::affine::makeComposedFoldedAffineApply(), and rewriter.
Referenced by buildPredicateLoadsOnThread0().
| void HopperBuilder::buildTryWaitParity | ( | TypedValue< MBarrierGroupType > | barrier | ) |
Definition at line 997 of file NVGPUTransformOps.cpp.
References mlir::arith::ConstantIndexOp::create(), loc, and rewriter.
Referenced by CopyBuilder::rewrite().
| Location HopperBuilder::loc |
Definition at line 869 of file NVGPUTransformOps.cpp.
Referenced by buildAndInitBarrierInSharedMemory(), buildBarrierArriveTx(), buildGlobalMemRefDescriptor(), buildPredicateLoadsOnThread0(), buildTmaAsyncLoad(), buildTryWaitParity(), CopyBuilder::CopyBuilder(), HopperBuilder(), and CopyBuilder::rewrite().
| RewriterBase& HopperBuilder::rewriter |
Definition at line 868 of file NVGPUTransformOps.cpp.
Referenced by buildAndInitBarrierInSharedMemory(), buildBarrierArriveTx(), buildGlobalMemRefDescriptor(), buildPredicateLoadsOnThread0(), buildTmaAsyncLoad(), buildTryWaitParity(), CopyBuilder::CopyBuilder(), HopperBuilder(), and CopyBuilder::rewrite().