|
MLIR
22.0.0git
|
Helper to create the base Hopper-specific operations that are reused in various other places. More...
Inheritance diagram for HopperBuilder:Public Member Functions | |
| HopperBuilder (RewriterBase &rewriter, Location loc) | |
| TypedValue< MBarrierGroupType > | buildAndInitBarrierInSharedMemory (OpFoldResult numThreads) |
| TypedValue< TensorMapDescriptorType > | buildGlobalMemRefDescriptor (TypedValue< MemRefType > memref, gpu::LaunchOp launchOp) |
| Create tma descriptor op to initiate transfer from global to shared memory. More... | |
| OpFoldResult | buildTmaAsyncLoad (TypedValue< TensorMapDescriptorType > globalDesc, TypedValue< MemRefType > sharedMemref, TypedValue< MBarrierGroupType > barrier, SmallVectorImpl< Operation * > &loadOps) |
Build a tma load from global memory to shared memory using barrier to synchronize. More... | |
| void | buildBarrierArriveTx (TypedValue< MBarrierGroupType > barrier, ArrayRef< OpFoldResult > sizes) |
| SmallVector< Operation * > | buildPredicateLoadsOnThread0 (ArrayRef< TypedValue< TensorMapDescriptorType >> globalDescriptors, ArrayRef< TypedValue< MemRefType >> sharedMemBuffers, TypedValue< MBarrierGroupType > barrier) |
| If threadIdx.x == 0 does TMA request + wait, else just wait. More... | |
| void | buildTryWaitParity (TypedValue< MBarrierGroupType > barrier) |
Public Attributes | |
| RewriterBase & | rewriter |
| Location | loc |
Helper to create the base Hopper-specific operations that are reused in various other places.
Definition at line 848 of file NVGPUTransformOps.cpp.
|
inline |
Definition at line 849 of file NVGPUTransformOps.cpp.
| TypedValue< MBarrierGroupType > HopperBuilder::buildAndInitBarrierInSharedMemory | ( | OpFoldResult | numThreads | ) |
Definition at line 929 of file NVGPUTransformOps.cpp.
References mlir::arith::ConstantIndexOp::create(), mlir::get(), mlir::Builder::getContext(), getSharedAddressSpaceAttribute(), and mlir::getValueOrCreateConstantIndexOp().
| void HopperBuilder::buildBarrierArriveTx | ( | TypedValue< MBarrierGroupType > | barrier, |
| ArrayRef< OpFoldResult > | sizes | ||
| ) |
Definition at line 994 of file NVGPUTransformOps.cpp.
References mlir::bindSymbolsList(), mlir::computeSum(), mlir::arith::ConstantIndexOp::create(), mlir::Builder::getContext(), mlir::getValueOrCreateConstantIndexOp(), and mlir::affine::makeComposedFoldedAffineApply().
| TypedValue< TensorMapDescriptorType > HopperBuilder::buildGlobalMemRefDescriptor | ( | TypedValue< MemRefType > | memref, |
| gpu::LaunchOp | launchOp | ||
| ) |
Create tma descriptor op to initiate transfer from global to shared memory.
This must be done before the launch op, on the host.
Definition at line 944 of file NVGPUTransformOps.cpp.
References mlir::get(), mlir::Builder::getContext(), mlir::memref::getMixedSizes(), getSharedAddressSpaceAttribute(), mlir::getValueOrCreateConstantIndexOp(), mlir::OpBuilder::setInsertionPoint(), and mlir::MemRefType::Builder::setMemorySpace().
| SmallVector< Operation * > HopperBuilder::buildPredicateLoadsOnThread0 | ( | ArrayRef< TypedValue< TensorMapDescriptorType >> | globalDescriptors, |
| ArrayRef< TypedValue< MemRefType >> | sharedMemBuffers, | ||
| TypedValue< MBarrierGroupType > | barrier | ||
| ) |
If threadIdx.x == 0 does TMA request + wait, else just wait.
Return the operation that performs the transfer on thread0.
Definition at line 884 of file NVGPUTransformOps.cpp.
References mlir::arith::ConstantIndexOp::create(), mlir::getAsIndexOpFoldResult(), and mlir::Builder::getContext().
| OpFoldResult HopperBuilder::buildTmaAsyncLoad | ( | TypedValue< TensorMapDescriptorType > | globalDesc, |
| TypedValue< MemRefType > | sharedMemref, | ||
| TypedValue< MBarrierGroupType > | barrier, | ||
| SmallVectorImpl< Operation * > & | loadOps | ||
| ) |
Build a tma load from global memory to shared memory using barrier to synchronize.
Return the number of bytes that will be transferred.
Definition at line 973 of file NVGPUTransformOps.cpp.
References mlir::bindSymbolsList(), mlir::computeProduct(), mlir::arith::ConstantIndexOp::create(), mlir::Builder::getContext(), mlir::memref::getMixedSizes(), and mlir::affine::makeComposedFoldedAffineApply().
| void HopperBuilder::buildTryWaitParity | ( | TypedValue< MBarrierGroupType > | barrier | ) |
Definition at line 1009 of file NVGPUTransformOps.cpp.
References mlir::arith::ConstantIndexOp::create(), and mlir::Builder::getI1Type().
| Location HopperBuilder::loc |
Definition at line 881 of file NVGPUTransformOps.cpp.
| RewriterBase& HopperBuilder::rewriter |
Definition at line 880 of file NVGPUTransformOps.cpp.