Helper to create the base Hopper-specific operations that are reused in various other places. More...

Inheritance diagram for HopperBuilder:

Public Member Functions
	HopperBuilder (RewriterBase &rewriter, Location loc)

TypedValue< nvgpu::MBarrierGroupType >	buildAndInitBarrierInSharedMemory (OpFoldResult numThreads)

TypedValue< nvgpu::TensorMapDescriptorType >	buildGlobalMemRefDescriptor (TypedValue< MemRefType > memref, gpu::LaunchOp launchOp)
	Create tma descriptor op to initiate transfer from global to shared memory. More...

OpFoldResult	buildTmaAsyncLoad (TypedValue< nvgpu::TensorMapDescriptorType > globalDesc, TypedValue< MemRefType > sharedMemref, TypedValue< nvgpu::MBarrierGroupType > barrier, SmallVectorImpl< Operation * > &loadOps)
	Build a tma load from global memory to shared memory using `barrier` to synchronize. More...

void	buildBarrierArriveTx (TypedValue< nvgpu::MBarrierGroupType > barrier, ArrayRef< OpFoldResult > sizes)

SmallVector< Operation * >	buildPredicateLoadsOnThread0 (ArrayRef< TypedValue< nvgpu::TensorMapDescriptorType >> globalDescriptors, ArrayRef< TypedValue< MemRefType >> sharedMemBuffers, TypedValue< nvgpu::MBarrierGroupType > barrier)
	If threadIdx.x == 0 does TMA request + wait, else just wait. More...

void	buildTryWaitParity (TypedValue< nvgpu::MBarrierGroupType > barrier)

Public Attributes
RewriterBase &	rewriter

Location	loc

Detailed Description

Helper to create the base Hopper-specific operations that are reused in various other places.

Definition at line 855 of file NVGPUTransformOps.cpp.

Constructor & Destructor Documentation

◆ HopperBuilder()

HopperBuilder::HopperBuilder	(	RewriterBase &	rewriter,
		Location	loc
	)

inline

Definition at line 856 of file NVGPUTransformOps.cpp.

Member Function Documentation

◆ buildAndInitBarrierInSharedMemory()

TypedValue< nvgpu::MBarrierGroupType > HopperBuilder::buildAndInitBarrierInSharedMemory ( OpFoldResult numThreads )

Definition at line 937 of file NVGPUTransformOps.cpp.

References mlir::arith::ConstantIndexOp::create(), mlir::get(), mlir::Builder::getContext(), getSharedAddressSpaceAttribute(), and mlir::getValueOrCreateConstantIndexOp().

◆ buildBarrierArriveTx()

void HopperBuilder::buildBarrierArriveTx	(	TypedValue< nvgpu::MBarrierGroupType >	barrier,
		ArrayRef< OpFoldResult >	sizes
	)

Definition at line 1002 of file NVGPUTransformOps.cpp.

References mlir::bindSymbolsList(), mlir::computeSum(), mlir::arith::ConstantIndexOp::create(), mlir::Builder::getContext(), mlir::getValueOrCreateConstantIndexOp(), and mlir::affine::makeComposedFoldedAffineApply().

◆ buildGlobalMemRefDescriptor()

TypedValue< nvgpu::TensorMapDescriptorType > HopperBuilder::buildGlobalMemRefDescriptor	(	TypedValue< MemRefType >	memref,
		gpu::LaunchOp	launchOp
	)

Create tma descriptor op to initiate transfer from global to shared memory.

This must be done before the launch op, on the host.

Definition at line 952 of file NVGPUTransformOps.cpp.

References mlir::get(), mlir::Builder::getContext(), mlir::memref::getMixedSizes(), getSharedAddressSpaceAttribute(), mlir::getValueOrCreateConstantIndexOp(), mlir::OpBuilder::setInsertionPoint(), and mlir::MemRefType::Builder::setMemorySpace().

◆ buildPredicateLoadsOnThread0()

SmallVector< Operation * > HopperBuilder::buildPredicateLoadsOnThread0	(	ArrayRef< TypedValue< nvgpu::TensorMapDescriptorType >>	globalDescriptors,
		ArrayRef< TypedValue< MemRefType >>	sharedMemBuffers,
		TypedValue< nvgpu::MBarrierGroupType >	barrier
	)

If threadIdx.x == 0 does TMA request + wait, else just wait.

Return the operation that performs the transfer on thread0.

Definition at line 892 of file NVGPUTransformOps.cpp.

References mlir::arith::ConstantIndexOp::create(), mlir::getAsIndexOpFoldResult(), and mlir::Builder::getContext().

◆ buildTmaAsyncLoad()

OpFoldResult HopperBuilder::buildTmaAsyncLoad	(	TypedValue< nvgpu::TensorMapDescriptorType >	globalDesc,
		TypedValue< MemRefType >	sharedMemref,
		TypedValue< nvgpu::MBarrierGroupType >	barrier,
		SmallVectorImpl< Operation * > &	loadOps
	)

Build a tma load from global memory to shared memory using barrier to synchronize.

Return the number of bytes that will be transferred.

Definition at line 980 of file NVGPUTransformOps.cpp.

References mlir::bindSymbolsList(), mlir::computeProduct(), mlir::arith::ConstantIndexOp::create(), mlir::Builder::getContext(), mlir::memref::getMixedSizes(), and mlir::affine::makeComposedFoldedAffineApply().

◆ buildTryWaitParity()

void HopperBuilder::buildTryWaitParity ( TypedValue< nvgpu::MBarrierGroupType > barrier )

Definition at line 1018 of file NVGPUTransformOps.cpp.

References mlir::arith::ConstantIndexOp::create(), and mlir::Builder::getI1Type().

Member Data Documentation

◆ loc

Location HopperBuilder::loc

Definition at line 889 of file NVGPUTransformOps.cpp.

◆ rewriter

RewriterBase& HopperBuilder::rewriter

Definition at line 888 of file NVGPUTransformOps.cpp.

The documentation for this struct was generated from the following file:

lib/Dialect/NVGPU/TransformOps/NVGPUTransformOps.cpp

Public Member Functions

Public Attributes