MLIR  20.0.0git
mlir::transform::gpu Namespace Reference

Classes

struct  IdBuilderResult
 Helper type for functions that generate ids for the mapping of a scf.forall.
 
struct  GpuIdBuilder
 Helper struct for configuring the rewrite of mapped scf.forall ops to various gpu id configurations.
 
struct  GpuBlockIdBuilder
 Builder for gpu::BlockIdOps used to map scf.forall to blocks.
 
struct  GpuWarpgroupIdBuilder
 Builder for warpgroup ids used to map scf.forall to reindexed warpgroups.
 
struct  GpuWarpIdBuilder
 Builder for warp ids used to map scf.forall to reindexed warps.
 
struct  GpuThreadIdBuilder
 Builder for thread ids used to map scf.forall to reindexed threads.
 
struct  MappingInfo
 Base struct to hold GPU mapping information for a given operation.
 
struct  CopyMappingInfo
 

Typedefs

using GpuIdBuilderFnType = std::function< IdBuilderResult(RewriterBase &, Location, ArrayRef< int64_t >, ArrayRef< int64_t >)>
 Common gpu id builder type that allows configuring the lowering for various mapping schemes.
 

Functions

DiagnosedSilenceableFailure mapForallToBlocksImpl (RewriterBase &rewriter, TransformOpInterface transformOp, scf::ForallOp forallOp, SmallVectorImpl< int64_t > &gridDims, const GpuIdBuilder &gpuIdBuilder)
 Map the top level scf.forall op to GPU blocks.
 
DiagnosedSilenceableFailure mapOneForallToThreadsImpl (RewriterBase &rewriter, std::optional< TransformOpInterface > transformOp, scf::ForallOp forallOp, ArrayRef< int64_t > blockDims, int64_t warpSize, bool syncAfterDistribute)
 Map the given scf.forall op to an explicit GPU implementation along blockDims.
 
DiagnosedSilenceableFailure mapNestedForallToThreadsImpl (RewriterBase &rewriter, std::optional< TransformOpInterface > transformOp, Operation *target, ArrayRef< int64_t > blockDims, int64_t warpSize, bool syncAfterDistribute)
 Search scf.forall ops nested under target and map each such op to an explicit GPU implementation along blockDims.
 
DiagnosedSilenceableFailure checkGpuLimits (TransformOpInterface transformOp, std::optional< int64_t > gridDimX, std::optional< int64_t > gridDimY, std::optional< int64_t > gridDimZ, std::optional< int64_t > blockDimX, std::optional< int64_t > blockDimY, std::optional< int64_t > blockDimZ)
 Determine if the size of the kernel configuration is supported by the GPU architecture being used.
 
DiagnosedSilenceableFailure createGpuLaunch (RewriterBase &rewriter, Location loc, TransformOpInterface transformOp, mlir::gpu::LaunchOp &launchOp, std::optional< int64_t > gridDimX=std::nullopt, std::optional< int64_t > gridDimY=std::nullopt, std::optional< int64_t > gridDimZ=std::nullopt, std::optional< int64_t > blockDimX=std::nullopt, std::optional< int64_t > blockDimY=std::nullopt, std::optional< int64_t > blockDimZ=std::nullopt)
 Create an empty-body gpu::LaunchOp using the provided kernel settings and put a terminator within.
 
DiagnosedSilenceableFailure alterGpuLaunch (RewriterBase &rewriter, mlir::gpu::LaunchOp gpuLaunch, TransformOpInterface transformOp, std::optional< int64_t > gridDimX=std::nullopt, std::optional< int64_t > gridDimY=std::nullopt, std::optional< int64_t > gridDimZ=std::nullopt, std::optional< int64_t > blockDimX=std::nullopt, std::optional< int64_t > blockDimY=std::nullopt, std::optional< int64_t > blockDimZ=std::nullopt)
 Alter kernel configuration of the given kernel.
 
DiagnosedSilenceableFailure findTopLevelForallOp (Operation *target, scf::ForallOp &topLevelForallOp, TransformOpInterface transformOp)
 Find the unique top level scf::ForallOp within a given target op.
 
DiagnosedSilenceableFailure createGpuLaunch (RewriterBase &rewriter, Location loc, TransformOpInterface transformOp, LaunchOp &launchOp, std::optional< int64_t > gridDimX, std::optional< int64_t > gridDimY, std::optional< int64_t > gridDimZ, std::optional< int64_t > blockDimX, std::optional< int64_t > blockDimY, std::optional< int64_t > blockDimZ)
 
DiagnosedSilenceableFailure alterGpuLaunch (RewriterBase &rewriter, LaunchOp gpuLaunch, TransformOpInterface transformOp, std::optional< int64_t > gridDimX, std::optional< int64_t > gridDimY, std::optional< int64_t > gridDimZ, std::optional< int64_t > blockDimX, std::optional< int64_t > blockDimY, std::optional< int64_t > blockDimZ)
 Alter kernel configuration of the given kernel.
 

Typedef Documentation

◆ GpuIdBuilderFnType

using mlir::transform::gpu::GpuIdBuilderFnType = std::function<IdBuilderResult(RewriterBase &, Location, ArrayRef<int64_t>, ArrayRef<int64_t>)>

Common gpu id builder type, allows the configuration of lowering for various mapping schemes.

Takes:

  • A rewriter with insertion point set before the forall op to rewrite.
  • The loc of the forall op to rewrite.
  • A list of positive integers carrying the mapping sizes for the current forall op to rewrite.

Definition at line 58 of file Utils.h.
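
The callback shape described above can be pictured with a small self-contained sketch. It does not use MLIR: FakeRewriter, FakeLocation, FakeIdResult, FakeIdBuilderFn, and makeNamedIdBuilder are hypothetical stand-ins introduced only for illustration, std::vector replaces ArrayRef, and the second integer list is passed through unused because its role is not described here.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for RewriterBase, Location, and IdBuilderResult.
struct FakeRewriter {};
struct FakeLocation {};
struct FakeIdResult {
  std::vector<std::string> ids;     // one symbolic id per mapped dimension
  std::vector<int64_t> activeSizes; // the size each id ranges over
};

// Same parameter shape as GpuIdBuilderFnType, with std::vector for ArrayRef.
using FakeIdBuilderFn = std::function<FakeIdResult(
    FakeRewriter &, FakeLocation, const std::vector<int64_t> &,
    const std::vector<int64_t> &)>;

// Returns a callback that names one id per mapping size, e.g. "block_id.x".
inline FakeIdBuilderFn makeNamedIdBuilder(const std::string &prefix) {
  return [prefix](FakeRewriter &, FakeLocation,
                  const std::vector<int64_t> &mappingSizes,
                  const std::vector<int64_t> & /*unused in this sketch*/) {
    static const char *const dims[] = {"x", "y", "z"};
    FakeIdResult result;
    for (std::size_t i = 0; i < mappingSizes.size() && i < 3; ++i) {
      result.ids.push_back(prefix + "." + dims[i]);
      result.activeSizes.push_back(mappingSizes[i]);
    }
    return result;
  };
}
```

Storing the builder as a std::function lets one rewrite routine be configured for blocks, warps, or threads by swapping the callback, which appears to be the motivation for the typedef.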

Function Documentation

◆ alterGpuLaunch() [1/2]

DiagnosedSilenceableFailure mlir::transform::gpu::alterGpuLaunch ( RewriterBase &  rewriter,
LaunchOp  gpuLaunch,
TransformOpInterface  transformOp,
std::optional< int64_t >  gridDimX,
std::optional< int64_t >  gridDimY,
std::optional< int64_t >  gridDimZ,
std::optional< int64_t >  blockDimX,
std::optional< int64_t >  blockDimY,
std::optional< int64_t >  blockDimZ 
)

◆ alterGpuLaunch() [2/2]

DiagnosedSilenceableFailure mlir::transform::gpu::alterGpuLaunch ( RewriterBase &  rewriter,
mlir::gpu::LaunchOp  gpuLaunch,
TransformOpInterface  transformOp,
std::optional< int64_t >  gridDimX = std::nullopt,
std::optional< int64_t >  gridDimY = std::nullopt,
std::optional< int64_t >  gridDimZ = std::nullopt,
std::optional< int64_t >  blockDimX = std::nullopt,
std::optional< int64_t >  blockDimY = std::nullopt,
std::optional< int64_t >  blockDimZ = std::nullopt 
)

Alter kernel configuration of the given kernel.

◆ checkGpuLimits()

DiagnosedSilenceableFailure mlir::transform::gpu::checkGpuLimits ( TransformOpInterface  transformOp,
std::optional< int64_t >  gridDimX,
std::optional< int64_t >  gridDimY,
std::optional< int64_t >  gridDimZ,
std::optional< int64_t >  blockDimX,
std::optional< int64_t >  blockDimY,
std::optional< int64_t >  blockDimZ 
)

Determine if the size of the kernel configuration is supported by the GPU architecture being used.

TODO: This check is currently hardwired to CUDA limits; it should be parameterized and generalized.

Definition at line 231 of file Utils.cpp.

References mlir::DiagnosedSilenceableFailure::success().

Referenced by alterGpuLaunch(), and createGpuLaunch().
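
A minimal sketch of such a limit check follows. It is not the MLIR implementation: checkLimitsSketch and the k-prefixed constants are hypothetical names, the numeric limits are the commonly documented CUDA values and are an assumption here, and treating unset dimensions as 1 is likewise an assumption.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Assumed CUDA limits for this sketch (not read from the MLIR sources).
constexpr int64_t kMaxTotalBlockDim = 1024;
constexpr int64_t kMaxBlockDim[3] = {1024, 1024, 64};
constexpr int64_t kMaxGridDim[3] = {2147483647, 65535, 65535};

// Returns an empty string if the configuration fits, else a short diagnostic.
// Unset dimensions default to 1, which is also an assumption of this sketch.
inline std::string checkLimitsSketch(
    std::optional<int64_t> gridDimX, std::optional<int64_t> gridDimY,
    std::optional<int64_t> gridDimZ, std::optional<int64_t> blockDimX,
    std::optional<int64_t> blockDimY, std::optional<int64_t> blockDimZ) {
  const int64_t grid[3] = {gridDimX.value_or(1), gridDimY.value_or(1),
                           gridDimZ.value_or(1)};
  const int64_t block[3] = {blockDimX.value_or(1), blockDimY.value_or(1),
                            blockDimZ.value_or(1)};
  // Check each dimension first so the product below cannot overflow.
  for (int i = 0; i < 3; ++i) {
    if (grid[i] < 1 || grid[i] > kMaxGridDim[i])
      return "grid dimension out of range";
    if (block[i] < 1 || block[i] > kMaxBlockDim[i])
      return "block dimension out of range";
  }
  if (block[0] * block[1] * block[2] > kMaxTotalBlockDim)
    return "too many threads per block";
  return "";
}
```

For example, a 1024x2x1 block passes the per-dimension checks but fails the total thread count, which is why the product is checked separately.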

◆ createGpuLaunch() [1/2]

DiagnosedSilenceableFailure mlir::transform::gpu::createGpuLaunch ( RewriterBase &  rewriter,
Location  loc,
TransformOpInterface  transformOp,
LaunchOp &  launchOp,
std::optional< int64_t >  gridDimX,
std::optional< int64_t >  gridDimY,
std::optional< int64_t >  gridDimZ,
std::optional< int64_t >  blockDimX,
std::optional< int64_t >  blockDimY,
std::optional< int64_t >  blockDimZ 
)

◆ createGpuLaunch() [2/2]

DiagnosedSilenceableFailure mlir::transform::gpu::createGpuLaunch ( RewriterBase &  rewriter,
Location  loc,
TransformOpInterface  transformOp,
mlir::gpu::LaunchOp &  launchOp,
std::optional< int64_t >  gridDimX = std::nullopt,
std::optional< int64_t >  gridDimY = std::nullopt,
std::optional< int64_t >  gridDimZ = std::nullopt,
std::optional< int64_t >  blockDimX = std::nullopt,
std::optional< int64_t >  blockDimY = std::nullopt,
std::optional< int64_t >  blockDimZ = std::nullopt 
)

Create an empty-body gpu::LaunchOp using the provided kernel settings and put a terminator within.

◆ findTopLevelForallOp()

DiagnosedSilenceableFailure mlir::transform::gpu::findTopLevelForallOp ( Operation *  target,
scf::ForallOp &  topLevelForallOp,
TransformOpInterface  transformOp 
)

Find the unique top level scf::ForallOp within a given target op.

Definition at line 636 of file GPUTransformOps.cpp.

References mlir::WalkResult::advance(), mlir::WalkResult::interrupt(), mlir::DiagnosedSilenceableFailure::success(), and mlir::Operation::walk().
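
The search for a unique top-level op can be sketched on a toy operation tree. ToyOp, makeOp, collectTopLevel, and findUniqueTopLevelForall are hypothetical names, and treating both "none found" and "more than one found" as failure is an assumption drawn from the word "unique" above.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an operation with nested child operations.
struct ToyOp {
  std::string name;
  std::vector<std::unique_ptr<ToyOp>> children;
};

inline std::unique_ptr<ToyOp> makeOp(std::string name) {
  auto op = std::make_unique<ToyOp>();
  op->name = std::move(name);
  return op;
}

// Collects every "forall" that is not nested inside another "forall".
inline void collectTopLevel(const ToyOp &op, bool insideForall,
                            std::vector<const ToyOp *> &out) {
  const bool isForall = op.name == "forall";
  if (isForall && !insideForall)
    out.push_back(&op);
  for (const auto &child : op.children)
    collectTopLevel(*child, insideForall || isForall, out);
}

// Returns the single top-level forall, or nullptr if there is none or several.
inline const ToyOp *findUniqueTopLevelForall(const ToyOp &target) {
  std::vector<const ToyOp *> found;
  collectTopLevel(target, /*insideForall=*/false, found);
  return found.size() == 1 ? found.front() : nullptr;
}
```

A forall nested inside another forall does not count as top level, so a function containing one outer loop with an inner loop still has a unique answer.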

◆ mapForallToBlocksImpl()

DiagnosedSilenceableFailure mlir::transform::gpu::mapForallToBlocksImpl ( RewriterBase &  rewriter,
TransformOpInterface  transformOp,
scf::ForallOp  forallOp,
SmallVectorImpl< int64_t > &  gridDims,
const GpuIdBuilder &  gpuIdBuilder 
)

Map the top level scf.forall op to GPU blocks.

The mapping is one-to-one and the induction variables of scf.forall are rewritten to gpu.block_id according to the thread_dim_mapping attribute.

Dynamic scf.forall trip counts and dynamic gridDims are currently not supported.

Definition at line 582 of file GPUTransformOps.cpp.

References mlir::OpBuilder::create(), diag(), LDBG, ForallRewriteResult::mappingSizes, rewriteOneForallCommonImpl(), mlir::OpBuilder::setInsertionPointToStart(), and mlir::DiagnosedSilenceableFailure::success().
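
The one-to-one mapping can be pictured with a CPU-side simulation. This is only a sketch: gridDimsFromTripCounts and runAsBlocks are hypothetical names, and assigning loop dimension i to block dimension x, y, z in order is an assumption, since the real order follows the mapping attribute.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Derives grid dimensions from static forall trip counts, padding with 1.
inline std::array<int64_t, 3>
gridDimsFromTripCounts(const std::vector<int64_t> &tripCounts) {
  assert(tripCounts.size() <= 3 && "at most three mapped dimensions");
  std::array<int64_t, 3> gridDims{1, 1, 1};
  for (std::size_t i = 0; i < tripCounts.size(); ++i)
    gridDims[i] = tripCounts[i];
  return gridDims;
}

// Simulates the launch: every block runs the body once, and its block id
// plays the role of the forall induction variables.
inline void
runAsBlocks(const std::array<int64_t, 3> &gridDims,
            const std::function<void(int64_t, int64_t, int64_t)> &body) {
  for (int64_t bz = 0; bz < gridDims[2]; ++bz)
    for (int64_t by = 0; by < gridDims[1]; ++by)
      for (int64_t bx = 0; bx < gridDims[0]; ++bx)
        body(bx, by, bz);
}
```

Because each block id corresponds to exactly one iteration, the trip counts must be known statically to size the grid, which is consistent with the restriction on dynamic trip counts.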

◆ mapNestedForallToThreadsImpl()

DiagnosedSilenceableFailure mlir::transform::gpu::mapNestedForallToThreadsImpl ( RewriterBase &  rewriter,
std::optional< TransformOpInterface >  transformOp,
Operation *  target,
ArrayRef< int64_t >  blockDims,
int64_t  warpSize,
bool  syncAfterDistribute 
)

Search scf.forall ops nested under target and map each such op to an explicit GPU implementation along blockDims.

The mapping is one-to-one and the induction variables of scf.forall are rewritten to appropriate ids according to the mapping attribute.

Dynamic scf.forall trip counts are currently not supported. Dynamic blockDims or newBasis entries are currently not supported. blockDims is expected to be of size 3.

The insertion point of the rewriter is expected to be set at the beginning of the target body block and dominate all other blocks.

Definition at line 846 of file GPUTransformOps.cpp.

References mlir::WalkResult::advance(), mlir::OpBuilder::create(), definiteFailureHelper(), diag(), mlir::Operation::getLoc(), mlir::WalkResult::interrupt(), LDBG, mapOneForallToThreadsImpl(), mlir::WalkResult::skip(), mlir::DiagnosedSilenceableFailure::success(), mlir::Operation::walk(), and mlir::WalkResult::wasInterrupted().
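
To make the role of blockDims and warpSize concrete, the following sketch computes a linear thread id and the derived warp id on the host. It rests on assumptions: linearThreadId, warpIdOf, and isActive are hypothetical helpers, the x-fastest linearization is assumed rather than taken from the implementation, and the activity predicate only illustrates how a loop with fewer iterations than available ids could be guarded.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Linearizes a 3-D thread id with x varying fastest (assumed convention).
inline int64_t linearThreadId(const std::array<int64_t, 3> &threadId,
                              const std::array<int64_t, 3> &blockDims) {
  return threadId[0] +
         blockDims[0] * (threadId[1] + blockDims[1] * threadId[2]);
}

// Warp id of a linear thread id for the given warp size.
inline int64_t warpIdOf(int64_t linearId, int64_t warpSize) {
  assert(warpSize > 0 && "warp size must be positive");
  return linearId / warpSize;
}

// An id takes part in the loop only if it is below the loop's trip count.
inline bool isActive(int64_t id, int64_t tripCount) { return id < tripCount; }
```

With blockDims = {64, 2, 1} and warpSize = 32, the block holds four warps, so a loop mapped to warps with a trip count of 4 would keep every warp active under these assumptions.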

◆ mapOneForallToThreadsImpl()

DiagnosedSilenceableFailure mlir::transform::gpu::mapOneForallToThreadsImpl ( RewriterBase &  rewriter,
std::optional< TransformOpInterface >  transformOp,
scf::ForallOp  forallOp,
ArrayRef< int64_t >  blockDims,
int64_t  warpSize,
bool  syncAfterDistribute 
)

Map the given scf.forall op to an explicit GPU implementation along blockDims.

The mapping is one-to-one and the induction variables of scf.forall are rewritten to gpuIdBuilder.idBuilder according to the gpuIdBuilder.mappingAttributes attribute.

Dynamic scf.forall trip counts are currently not supported. Dynamic blockDims sizes are currently not supported. blockDims is expected to be of size 3.

Definition at line 806 of file GPUTransformOps.cpp.

References mlir::OpBuilder::create(), diag(), getThreadIdBuilder(), rewriteOneForallCommonImpl(), mlir::OpBuilder::setInsertionPointAfter(), and mlir::DiagnosedSilenceableFailure::success().

Referenced by mapNestedForallToThreadsImpl().