MLIR 22.0.0git
mlir::transform::gpu Namespace Reference

Classes

struct  CopyMappingInfo
struct  GpuBlockIdBuilder
 Builder for gpu::BlockIdOps used to map scf.forall to blocks.
struct  GpuIdBuilder
 Helper struct for configuring the rewrite of mapped scf.forall ops to various gpu id configurations.
struct  GpuLaneIdBuilder
 Builder for lane id.
struct  GpuThreadIdBuilder
 Builder for thread ids used to map scf.forall to reindexed threads.
struct  GpuWarpgroupIdBuilder
 Builder for warpgroup ids used to map scf.forall to reindexed warpgroups.
struct  GpuWarpIdBuilder
 Builder for warp ids used to map scf.forall to reindexed warps.
struct  IdBuilderResult
 Helper type for functions that generate ids for the mapping of a scf.forall.
struct  MappingInfo
 Base struct to hold GPU mapping information for a given operation.

Typedefs

using GpuIdBuilderFnType
 Common gpu id builder type, allows the configuration of lowering for various mapping schemes.

Functions

DiagnosedSilenceableFailure mapForallToBlocksImpl (RewriterBase &rewriter, TransformOpInterface transformOp, scf::ForallOp forallOp, SmallVectorImpl< int64_t > &gridDims, const GpuIdBuilder &gpuIdBuilder)
 Map the top level scf.forall op to GPU blocks.
DiagnosedSilenceableFailure mapOneForallToThreadsImpl (RewriterBase &rewriter, std::optional< TransformOpInterface > transformOp, scf::ForallOp forallOp, ArrayRef< int64_t > blockSizes, int64_t warpSize, bool syncAfterDistribute)
 Map the given scf.forall op to an explicit GPU implementation along blockSizes.
DiagnosedSilenceableFailure mapNestedForallToThreadsImpl (RewriterBase &rewriter, std::optional< TransformOpInterface > transformOp, Operation *target, ArrayRef< int64_t > blockDims, int64_t warpSize, bool syncAfterDistribute)
 Search scf.forall ops nested under target and map each such op to an explicit GPU implementation along blockDims.
raw_ostream & operator<< (raw_ostream &os, const IdBuilderResult &res)
DiagnosedSilenceableFailure checkGpuLimits (TransformOpInterface transformOp, std::optional< int64_t > gridDimX, std::optional< int64_t > gridDimY, std::optional< int64_t > gridDimZ, std::optional< int64_t > blockDimX, std::optional< int64_t > blockDimY, std::optional< int64_t > blockDimZ)
 Determine if the size of the kernel configuration is supported by the GPU architecture being used.
DiagnosedSilenceableFailure createGpuLaunch (RewriterBase &rewriter, Location loc, TransformOpInterface transformOp, mlir::gpu::LaunchOp &launchOp, std::optional< int64_t > gridDimX=std::nullopt, std::optional< int64_t > gridDimY=std::nullopt, std::optional< int64_t > gridDimZ=std::nullopt, std::optional< int64_t > blockDimX=std::nullopt, std::optional< int64_t > blockDimY=std::nullopt, std::optional< int64_t > blockDimZ=std::nullopt)
 Create an empty-body gpu::LaunchOp using the provided kernel settings and put a terminator within.
DiagnosedSilenceableFailure alterGpuLaunch (RewriterBase &rewriter, mlir::gpu::LaunchOp gpuLaunch, TransformOpInterface transformOp, std::optional< int64_t > gridDimX=std::nullopt, std::optional< int64_t > gridDimY=std::nullopt, std::optional< int64_t > gridDimZ=std::nullopt, std::optional< int64_t > blockDimX=std::nullopt, std::optional< int64_t > blockDimY=std::nullopt, std::optional< int64_t > blockDimZ=std::nullopt)
 Alter kernel configuration of the given kernel.
DiagnosedSilenceableFailure findTopLevelForallOp (Operation *target, scf::ForallOp &topLevelForallOp, TransformOpInterface transformOp)
 Find the unique top level scf::ForallOp within a given target op.
llvm::raw_ostream & operator<< (llvm::raw_ostream &os, const CopyMappingInfo &info)
DiagnosedSilenceableFailure createGpuLaunch (RewriterBase &rewriter, Location loc, TransformOpInterface transformOp, LaunchOp &launchOp, std::optional< int64_t > gridDimX, std::optional< int64_t > gridDimY, std::optional< int64_t > gridDimZ, std::optional< int64_t > blockDimX, std::optional< int64_t > blockDimY, std::optional< int64_t > blockDimZ)
DiagnosedSilenceableFailure alterGpuLaunch (RewriterBase &rewriter, LaunchOp gpuLaunch, TransformOpInterface transformOp, std::optional< int64_t > gridDimX, std::optional< int64_t > gridDimY, std::optional< int64_t > gridDimZ, std::optional< int64_t > blockDimX, std::optional< int64_t > blockDimY, std::optional< int64_t > blockDimZ)
 Alter kernel configuration of the given kernel.

Typedef Documentation

◆ GpuIdBuilderFnType

Initial value:
std::function<IdBuilderResult(

Common gpu id builder type, allows the configuration of lowering for various mapping schemes.

Takes:

  • A rewriter with insertion point set before the forall op to rewrite.
  • The loc of the forall op to rewrite.
  • A list of positive integers carrying the mapping sizes for the current forall op to rewrite.

Definition at line 55 of file Utils.h.
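
The three inputs listed above can be pictured as a callback type. The sketch below is a standalone illustration only: SketchRewriter, SketchLocation, SketchIdResult, SketchIdBuilderFn and makeLinearScheme are hypothetical stand-ins, not the MLIR classes, and the real signature may take further arguments that this page does not show.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Stand-ins for the MLIR types; none of these are the real classes.
struct SketchRewriter { int numOpsCreated = 0; };
struct SketchLocation { int line = 0; };
struct SketchIdResult { std::vector<int64_t> idsPerDim; };

// Same shape as the three inputs listed above: rewriter, location, sizes.
using SketchIdBuilderFn = std::function<SketchIdResult(
    SketchRewriter &, SketchLocation, const std::vector<int64_t> &)>;

// One possible scheme (an assumption for illustration): delinearize a
// fixed linear id over the mapping sizes, fastest-varying dimension first.
inline SketchIdBuilderFn makeLinearScheme(int64_t linearId) {
  return [linearId](SketchRewriter &rewriter, SketchLocation,
                    const std::vector<int64_t> &sizes) {
    SketchIdResult result;
    int64_t rest = linearId;
    for (int64_t size : sizes) {
      assert(size > 0 && "mapping sizes must be positive");
      result.idsPerDim.push_back(rest % size);
      rest /= size;
      ++rewriter.numOpsCreated; // Pretend one op was emitted per id.
    }
    return result;
  };
}
```

Swapping in a different factory function changes the mapping scheme without touching the code that invokes the callback, which is the kind of configurability the typedef description refers to.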

Function Documentation

◆ alterGpuLaunch() [1/2]

DiagnosedSilenceableFailure mlir::transform::gpu::alterGpuLaunch ( RewriterBase & rewriter,
LaunchOp gpuLaunch,
TransformOpInterface transformOp,
std::optional< int64_t > gridDimX,
std::optional< int64_t > gridDimY,
std::optional< int64_t > gridDimZ,
std::optional< int64_t > blockDimX,
std::optional< int64_t > blockDimY,
std::optional< int64_t > blockDimZ )

◆ alterGpuLaunch() [2/2]

DiagnosedSilenceableFailure mlir::transform::gpu::alterGpuLaunch ( RewriterBase & rewriter,
mlir::gpu::LaunchOp gpuLaunch,
TransformOpInterface transformOp,
std::optional< int64_t > gridDimX = std::nullopt,
std::optional< int64_t > gridDimY = std::nullopt,
std::optional< int64_t > gridDimZ = std::nullopt,
std::optional< int64_t > blockDimX = std::nullopt,
std::optional< int64_t > blockDimY = std::nullopt,
std::optional< int64_t > blockDimZ = std::nullopt )

Alter kernel configuration of the given kernel.

References target.

◆ checkGpuLimits()

DiagnosedSilenceableFailure mlir::transform::gpu::checkGpuLimits ( TransformOpInterface transformOp,
std::optional< int64_t > gridDimX,
std::optional< int64_t > gridDimY,
std::optional< int64_t > gridDimZ,
std::optional< int64_t > blockDimX,
std::optional< int64_t > blockDimY,
std::optional< int64_t > blockDimZ )

Determine if the size of the kernel configuration is supported by the GPU architecture being used.

TODO this is currently hardwired to CUDA, parameterize and generalize.

Definition at line 360 of file Utils.cpp.

References kMaxBlockdimx, kMaxBlockdimy, kMaxBlockdimz, kMaxGriddimx, kMaxGriddimy, kMaxGriddimz, kMaxTotalBlockdim, kMaxTotalGriddim, and mlir::DiagnosedSilenceableFailure::success().

Referenced by alterGpuLaunch(), and createGpuLaunch().
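
As a rough illustration of this kind of check, the standalone sketch below validates optional grid and block sizes against fixed limits. The limit values are assumptions modelled on commonly documented CUDA limits (this page names the constants but does not list their values), checkLaunchLimits and SketchDims are hypothetical names, and treating an unset dimension as 1 is likewise an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Assumed limits, modelled on commonly documented CUDA values; the
// constants actually used by MLIR are not listed on this page.
constexpr int64_t kAssumedMaxBlockDim[3] = {1024, 1024, 64};
constexpr int64_t kAssumedMaxGridDim[3] = {2147483647, 65535, 65535};
constexpr int64_t kAssumedMaxTotalBlockDim = 1024;

using SketchDims = std::array<std::optional<int64_t>, 3>;

// Hypothetical helper: returns an empty string on success, otherwise a
// diagnostic message. Unset dimensions are treated as 1 (an assumption).
inline std::string checkLaunchLimits(const SketchDims &grid,
                                     const SketchDims &block) {
  int64_t totalBlock = 1;
  for (int i = 0; i < 3; ++i) {
    int64_t g = grid[i].value_or(1);
    int64_t b = block[i].value_or(1);
    if (g < 1 || g > kAssumedMaxGridDim[i])
      return "grid dimension " + std::to_string(i) + " out of range";
    if (b < 1 || b > kAssumedMaxBlockDim[i])
      return "block dimension " + std::to_string(i) + " out of range";
    totalBlock *= b;
  }
  assert(totalBlock >= 1 && "every accepted dimension is at least 1");
  if (totalBlock > kAssumedMaxTotalBlockDim)
    return "total threads per block exceed the assumed limit";
  return "";
}
```

Note that a block of 32 x 32 x 2 passes every per-dimension check in this sketch but is still rejected, because the product of the three sizes exceeds the assumed per-block total.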

◆ createGpuLaunch() [1/2]

DiagnosedSilenceableFailure mlir::transform::gpu::createGpuLaunch ( RewriterBase & rewriter,
Location loc,
TransformOpInterface transformOp,
LaunchOp & launchOp,
std::optional< int64_t > gridDimX,
std::optional< int64_t > gridDimY,
std::optional< int64_t > gridDimZ,
std::optional< int64_t > blockDimX,
std::optional< int64_t > blockDimY,
std::optional< int64_t > blockDimZ )

◆ createGpuLaunch() [2/2]

DiagnosedSilenceableFailure mlir::transform::gpu::createGpuLaunch ( RewriterBase & rewriter,
Location loc,
TransformOpInterface transformOp,
mlir::gpu::LaunchOp & launchOp,
std::optional< int64_t > gridDimX = std::nullopt,
std::optional< int64_t > gridDimY = std::nullopt,
std::optional< int64_t > gridDimZ = std::nullopt,
std::optional< int64_t > blockDimX = std::nullopt,
std::optional< int64_t > blockDimY = std::nullopt,
std::optional< int64_t > blockDimZ = std::nullopt )

Create an empty-body gpu::LaunchOp using the provided kernel settings and put a terminator within.

◆ findTopLevelForallOp()

DiagnosedSilenceableFailure mlir::transform::gpu::findTopLevelForallOp ( Operation * target,
scf::ForallOp & topLevelForallOp,
TransformOpInterface transformOp )

Find the unique top level scf::ForallOp within a given target op.

Definition at line 669 of file GPUTransformOps.cpp.

References mlir::WalkResult::advance(), mlir::WalkResult::interrupt(), mlir::DiagnosedSilenceableFailure::success(), and target.
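
The "unique top level" search can be pictured as a walk that stops as soon as a second candidate appears. The sketch below uses a toy operation tree rather than MLIR's Operation and WalkResult; SketchOp, SketchWalk, walkTopLevel and findUniqueTopLevel are hypothetical names, and the exact failure behaviour of the real function is not described on this page.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Minimal stand-in for an operation tree; not the MLIR Operation class.
struct SketchOp {
  std::string name;
  std::vector<std::unique_ptr<SketchOp>> children;
};

enum class SketchWalk { Advance, Interrupt };

// Pre-order walk that does not descend into a matching op, so only
// top-level matches are seen. Interrupts once a second match appears.
inline SketchWalk walkTopLevel(const SketchOp &op, const std::string &wanted,
                               const SketchOp *&found) {
  for (const auto &child : op.children) {
    if (child->name == wanted) {
      if (found)
        return SketchWalk::Interrupt; // More than one top-level match.
      found = child.get();
      continue; // Skip everything nested inside the match.
    }
    if (walkTopLevel(*child, wanted, found) == SketchWalk::Interrupt)
      return SketchWalk::Interrupt;
  }
  return SketchWalk::Advance;
}

// Returns the unique top-level match, or nullptr if there is none or
// more than one.
inline const SketchOp *findUniqueTopLevel(const SketchOp &target,
                                          const std::string &wanted) {
  assert(!wanted.empty() && "an op name is required");
  const SketchOp *found = nullptr;
  if (walkTopLevel(target, wanted, found) == SketchWalk::Interrupt)
    return nullptr;
  return found;
}
```

A match nested inside another match does not count as a second candidate in this sketch; only siblings at the top level make the result ambiguous.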

◆ mapForallToBlocksImpl()

DiagnosedSilenceableFailure mlir::transform::gpu::mapForallToBlocksImpl ( RewriterBase & rewriter,
TransformOpInterface transformOp,
scf::ForallOp forallOp,
SmallVectorImpl< int64_t > & gridDims,
const GpuIdBuilder & gpuIdBuilder )

Map the top level scf.forall op to GPU blocks.

The mapping is one-to-one and the induction variables of scf.forall are rewritten to gpu.block_id according to the thread_dim_mapping attribute.

Dynamic scf.forall trip counts are currently not supported. Dynamic gridDims are currently not supported.

Definition at line 615 of file GPUTransformOps.cpp.

References mlir::arith::ConstantIndexOp::create(), diag(), ForallRewriteResult::mappingSizes, replaceUnitMappingIdsHelper(), rewriteOneForallCommonImpl(), mlir::OpBuilder::setInsertionPointToStart(), mlir::DiagnosedSilenceableFailure::success(), and verifyGpuMapping().
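
To make the static-trip-count restriction concrete, the sketch below derives grid sizes from loop bounds and rejects anything that is only known at run time. It is a simplified model, not the MLIR implementation: SketchLoopDim, staticTripCount and deriveGridDims are hypothetical names, loop dimension i is assumed to map directly to grid dimension i, and unused grid dimensions are assumed to default to 1.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical description of one scf.forall dimension. A value that is
// only known at run time is modelled as std::nullopt.
struct SketchLoopDim {
  std::optional<int64_t> lowerBound;
  std::optional<int64_t> upperBound;
  std::optional<int64_t> step;
};

// Static trip count ceil((ub - lb) / step), or std::nullopt when any of
// the three values is dynamic.
inline std::optional<int64_t> staticTripCount(const SketchLoopDim &dim) {
  if (!dim.lowerBound || !dim.upperBound || !dim.step)
    return std::nullopt;
  assert(*dim.step > 0 && "loop steps are assumed positive");
  int64_t span = *dim.upperBound - *dim.lowerBound;
  if (span <= 0)
    return 0;
  return (span + *dim.step - 1) / *dim.step;
}

// One-to-one mapping: each trip count becomes one grid dimension and
// missing trailing dimensions are padded with 1. Returns std::nullopt
// when the loop is rejected (dynamic, empty, or more than 3 dimensions).
inline std::optional<std::vector<int64_t>>
deriveGridDims(const std::vector<SketchLoopDim> &dims) {
  if (dims.size() > 3)
    return std::nullopt;
  std::vector<int64_t> grid(3, 1);
  for (std::size_t i = 0; i < dims.size(); ++i) {
    std::optional<int64_t> trip = staticTripCount(dims[i]);
    if (!trip || *trip < 1)
      return std::nullopt;
    grid[i] = *trip;
  }
  return grid;
}
```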

◆ mapNestedForallToThreadsImpl()

DiagnosedSilenceableFailure mlir::transform::gpu::mapNestedForallToThreadsImpl ( RewriterBase & rewriter,
std::optional< TransformOpInterface > transformOp,
Operation * target,
ArrayRef< int64_t > blockDims,
int64_t warpSize,
bool syncAfterDistribute )

Search scf.forall ops nested under target and map each such op to an explicit GPU implementation along blockDims.

The mapping is one-to-one and the induction variables of scf.forall are rewritten to appropriate ids according to the mapping attribute.

Dynamic scf.forall trip counts are currently not supported. Dynamic blockDims or newBasis entries are currently not supported. blockDims is expected to be of size 3.

The insertion point of the rewriter is expected to be set at the beginning of the target body block and dominate all other blocks.

Definition at line 894 of file GPUTransformOps.cpp.

References mlir::WalkResult::advance(), mlir::arith::ConstantIndexOp::create(), definiteFailureHelper(), diag(), mlir::WalkResult::interrupt(), mapOneForallToThreadsImpl(), replaceUnitMappingIdsHelper(), mlir::WalkResult::skip(), mlir::DiagnosedSilenceableFailure::success(), target, and mlir::WalkResult::wasInterrupted().
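
The role of the three blockDims entries and of warpSize can be illustrated with plain integer arithmetic. The sketch below assumes an x-fastest linearization of the 3-D thread id, from which a warp id and a lane id are derived; SketchThreadIds and reindexThread are hypothetical names, and the actual ops emitted by the rewrite are not shown on this page.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Ids derived for one thread in this sketch.
struct SketchThreadIds {
  int64_t linearId;
  int64_t warpId;
  int64_t laneId;
};

// Linearize (x, y, z) with x varying fastest, then split the linear id
// into a warp id and a lane id. The ordering is an assumption.
inline SketchThreadIds reindexThread(std::array<int64_t, 3> threadId,
                                     std::array<int64_t, 3> blockDims,
                                     int64_t warpSize) {
  assert(warpSize > 0 && "warp size must be positive");
  for (int i = 0; i < 3; ++i)
    assert(threadId[i] >= 0 && threadId[i] < blockDims[i] &&
           "thread id must lie inside the block");
  int64_t linear =
      threadId[0] + blockDims[0] * (threadId[1] + blockDims[1] * threadId[2]);
  return {linear, linear / warpSize, linear % warpSize};
}
```

For example, with blockDims of 64 x 2 x 1 and a warp size of 32, thread (5, 1, 0) has linear id 69 in this scheme, which places it in warp 2 at lane 5.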

◆ mapOneForallToThreadsImpl()

DiagnosedSilenceableFailure mlir::transform::gpu::mapOneForallToThreadsImpl ( RewriterBase & rewriter,
std::optional< TransformOpInterface > transformOp,
scf::ForallOp forallOp,
ArrayRef< int64_t > blockSizes,
int64_t warpSize,
bool syncAfterDistribute )

Map the given scf.forall op to an explicit GPU implementation along blockSizes.

The mapping is one-to-one and the induction variables of scf.forall are rewritten to gpuIdBuilder.idBuilder according to the gpuIdBuilder.mappingAttributes attribute.

Dynamic scf.forall trip counts are currently not supported. Dynamic blockSizes entries are currently not supported. blockSizes is expected to be of size 3.

Definition at line 854 of file GPUTransformOps.cpp.

References diag(), getThreadIdBuilder(), rewriteOneForallCommonImpl(), mlir::OpBuilder::setInsertionPointAfter(), mlir::DiagnosedSilenceableFailure::success(), and verifyGpuMapping().

Referenced by mapNestedForallToThreadsImpl().

◆ operator<<() [1/2]

llvm::raw_ostream & mlir::transform::gpu::operator<< ( llvm::raw_ostream & os,
const CopyMappingInfo & info )
inline

Definition at line 122 of file GPUHeuristics.h.

References mlir::transform::gpu::CopyMappingInfo::print().

◆ operator<<() [2/2]

raw_ostream & mlir::transform::gpu::operator<< ( raw_ostream & os,
const IdBuilderResult & res )
inline