MLIR  20.0.0git
Public Member Functions | Public Attributes | List of all members
HopperBuilder Struct Reference

Helper to create the base Hopper-specific operations that are reused in various other places. More...

+ Inheritance diagram for HopperBuilder:

Public Member Functions

 HopperBuilder (RewriterBase &rewriter, Location loc)
 
TypedValue< nvgpu::MBarrierGroupType > buildAndInitBarrierInSharedMemory (OpFoldResult numThreads)
 
TypedValue< nvgpu::TensorMapDescriptorType > buildGlobalMemRefDescriptor (TypedValue< MemRefType > memref, gpu::LaunchOp launchOp)
 Create tma descriptor op to initiate transfer from global to shared memory. More...
 
OpFoldResult buildTmaAsyncLoad (TypedValue< nvgpu::TensorMapDescriptorType > globalDesc, TypedValue< MemRefType > sharedMemref, TypedValue< nvgpu::MBarrierGroupType > barrier, SmallVectorImpl< Operation * > &loadOps)
 Build a tma load from global memory to shared memory using barrier to synchronize. More...
 
void buildBarrierArriveTx (TypedValue< nvgpu::MBarrierGroupType > barrier, ArrayRef< OpFoldResult > sizes)
 
SmallVector< Operation * > buildPredicateLoadsOnThread0 (ArrayRef< TypedValue< nvgpu::TensorMapDescriptorType >> globalDescriptors, ArrayRef< TypedValue< MemRefType >> sharedMemBuffers, TypedValue< nvgpu::MBarrierGroupType > barrier)
 If threadIdx.x == 0 does TMA request + wait, else just wait. More...
 
void buildTryWaitParity (TypedValue< nvgpu::MBarrierGroupType > barrier)
 

Public Attributes

RewriterBaserewriter
 
Location loc
 

Detailed Description

Helper to create the base Hopper-specific operations that are reused in various other places.

Definition at line 855 of file NVGPUTransformOps.cpp.

Constructor & Destructor Documentation

◆ HopperBuilder()

HopperBuilder::HopperBuilder ( RewriterBase rewriter,
Location  loc 
)
inline

Definition at line 856 of file NVGPUTransformOps.cpp.

Member Function Documentation

◆ buildAndInitBarrierInSharedMemory()

TypedValue< nvgpu::MBarrierGroupType > HopperBuilder::buildAndInitBarrierInSharedMemory ( OpFoldResult  numThreads)

◆ buildBarrierArriveTx()

void HopperBuilder::buildBarrierArriveTx ( TypedValue< nvgpu::MBarrierGroupType >  barrier,
ArrayRef< OpFoldResult sizes 
)

◆ buildGlobalMemRefDescriptor()

TypedValue< nvgpu::TensorMapDescriptorType > HopperBuilder::buildGlobalMemRefDescriptor ( TypedValue< MemRefType >  memref,
gpu::LaunchOp  launchOp 
)

Create tma descriptor op to initiate transfer from global to shared memory.

This must be done before the launch op, on the host.

Definition at line 951 of file NVGPUTransformOps.cpp.

References mlir::OpBuilder::create(), mlir::get(), mlir::Builder::getContext(), mlir::memref::getMixedSizes(), getSharedAddressSpaceAttribute(), mlir::getValueOrCreateConstantIndexOp(), mlir::OpBuilder::setInsertionPoint(), and mlir::MemRefType::Builder::setMemorySpace().

◆ buildPredicateLoadsOnThread0()

SmallVector< Operation * > HopperBuilder::buildPredicateLoadsOnThread0 ( ArrayRef< TypedValue< nvgpu::TensorMapDescriptorType >>  globalDescriptors,
ArrayRef< TypedValue< MemRefType >>  sharedMemBuffers,
TypedValue< nvgpu::MBarrierGroupType >  barrier 
)

If threadIdx.x == 0 does TMA request + wait, else just wait.

Return the operation that performs the transfer on thread0.

Definition at line 892 of file NVGPUTransformOps.cpp.

References mlir::OpBuilder::create(), mlir::getAsIndexOpFoldResult(), and mlir::Builder::getContext().

◆ buildTmaAsyncLoad()

OpFoldResult HopperBuilder::buildTmaAsyncLoad ( TypedValue< nvgpu::TensorMapDescriptorType >  globalDesc,
TypedValue< MemRefType >  sharedMemref,
TypedValue< nvgpu::MBarrierGroupType >  barrier,
SmallVectorImpl< Operation * > &  loadOps 
)

Build a tma load from global memory to shared memory using barrier to synchronize.

Return the number of bytes that will be transferred.

Definition at line 979 of file NVGPUTransformOps.cpp.

References mlir::bindSymbolsList(), mlir::computeProduct(), mlir::OpBuilder::create(), mlir::Builder::getContext(), mlir::memref::getMixedSizes(), and mlir::affine::makeComposedFoldedAffineApply().

◆ buildTryWaitParity()

void HopperBuilder::buildTryWaitParity ( TypedValue< nvgpu::MBarrierGroupType >  barrier)

Member Data Documentation

◆ loc

Location HopperBuilder::loc

Definition at line 889 of file NVGPUTransformOps.cpp.

◆ rewriter

RewriterBase& HopperBuilder::rewriter

Definition at line 888 of file NVGPUTransformOps.cpp.


The documentation for this struct was generated from the following file: