MLIR
20.0.0git
Classes | |
struct | IdBuilderResult |
Helper type for functions that generate ids for the mapping of a scf.forall. More... | |
struct | GpuIdBuilder |
Helper struct for configuring the rewrite of mapped scf.forall ops to various gpu id configurations. More... | |
struct | GpuBlockIdBuilder |
Builder for gpu::BlockIdOps used to map scf.forall to blocks. More... | |
struct | GpuWarpgroupIdBuilder |
Builder for warpgroup ids used to map scf.forall to reindexed warpgroups. More... | |
struct | GpuWarpIdBuilder |
Builder for warp ids used to map scf.forall to reindexed warps. More... | |
struct | GpuThreadIdBuilder |
Builder for thread ids used to map scf.forall to reindexed threads. More... | |
struct | MappingInfo |
Base struct to hold GPU mapping information for a given operation. More... | |
struct | CopyMappingInfo |
Typedefs | |
using | GpuIdBuilderFnType = std::function< IdBuilderResult(RewriterBase &, Location, ArrayRef< int64_t >, ArrayRef< int64_t >)> |
Common gpu id builder type, allows the configuration of lowering for various mapping schemes. More... | |
Functions | |
DiagnosedSilenceableFailure | mapForallToBlocksImpl (RewriterBase &rewriter, TransformOpInterface transformOp, scf::ForallOp forallOp, SmallVectorImpl< int64_t > &gridDims, const GpuIdBuilder &gpuIdBuilder) |
Map the top level scf.forall op to GPU blocks. More... | |
DiagnosedSilenceableFailure | mapOneForallToThreadsImpl (RewriterBase &rewriter, std::optional< TransformOpInterface > transformOp, scf::ForallOp forallOp, ArrayRef< int64_t > blockDims, int64_t warpSize, bool syncAfterDistribute) |
Map the given scf.forall op (forallOp) to an explicit GPU implementation along blockDims. More... | |
DiagnosedSilenceableFailure | mapNestedForallToThreadsImpl (RewriterBase &rewriter, std::optional< TransformOpInterface > transformOp, Operation *target, ArrayRef< int64_t > blockDims, int64_t warpSize, bool syncAfterDistribute) |
Search scf.forall ops nested under target and map each such op to an explicit GPU implementation along blockDims. More... | |
DiagnosedSilenceableFailure | checkGpuLimits (TransformOpInterface transformOp, std::optional< int64_t > gridDimX, std::optional< int64_t > gridDimY, std::optional< int64_t > gridDimZ, std::optional< int64_t > blockDimX, std::optional< int64_t > blockDimY, std::optional< int64_t > blockDimZ) |
Determine if the size of the kernel configuration is supported by the GPU architecture being used. More... | |
DiagnosedSilenceableFailure | createGpuLaunch (RewriterBase &rewriter, Location loc, TransformOpInterface transformOp, mlir::gpu::LaunchOp &launchOp, std::optional< int64_t > gridDimX=std::nullopt, std::optional< int64_t > gridDimY=std::nullopt, std::optional< int64_t > gridDimZ=std::nullopt, std::optional< int64_t > blockDimX=std::nullopt, std::optional< int64_t > blockDimY=std::nullopt, std::optional< int64_t > blockDimZ=std::nullopt) |
Create an empty-body gpu::LaunchOp using the provided kernel settings and put a terminator within. More... | |
DiagnosedSilenceableFailure | alterGpuLaunch (RewriterBase &rewriter, mlir::gpu::LaunchOp gpuLaunch, TransformOpInterface transformOp, std::optional< int64_t > gridDimX=std::nullopt, std::optional< int64_t > gridDimY=std::nullopt, std::optional< int64_t > gridDimZ=std::nullopt, std::optional< int64_t > blockDimX=std::nullopt, std::optional< int64_t > blockDimY=std::nullopt, std::optional< int64_t > blockDimZ=std::nullopt) |
Alter kernel configuration of the given kernel. More... | |
DiagnosedSilenceableFailure | findTopLevelForallOp (Operation *target, scf::ForallOp &topLevelForallOp, TransformOpInterface transformOp) |
Find the unique top level scf::ForallOp within a given target op. More... | |
DiagnosedSilenceableFailure | createGpuLaunch (RewriterBase &rewriter, Location loc, TransformOpInterface transformOp, LaunchOp &launchOp, std::optional< int64_t > gridDimX, std::optional< int64_t > gridDimY, std::optional< int64_t > gridDimZ, std::optional< int64_t > blockDimX, std::optional< int64_t > blockDimY, std::optional< int64_t > blockDimZ) |
DiagnosedSilenceableFailure | alterGpuLaunch (RewriterBase &rewriter, LaunchOp gpuLaunch, TransformOpInterface transformOp, std::optional< int64_t > gridDimX, std::optional< int64_t > gridDimY, std::optional< int64_t > gridDimZ, std::optional< int64_t > blockDimX, std::optional< int64_t > blockDimY, std::optional< int64_t > blockDimZ) |
Alter kernel configuration of the given kernel. More... | |
using mlir::transform::gpu::GpuIdBuilderFnType = std::function<IdBuilderResult(RewriterBase &, Location, ArrayRef<int64_t>, ArrayRef<int64_t>)>
Common gpu id builder type, allows the configuration of lowering for various mapping schemes.
As the signature shows, it takes a RewriterBase &, a Location, and two ArrayRef<int64_t> arguments, and returns an IdBuilderResult.
DiagnosedSilenceableFailure mlir::transform::gpu::alterGpuLaunch(
    RewriterBase &rewriter,
    LaunchOp gpuLaunch,
    TransformOpInterface transformOp,
    std::optional<int64_t> gridDimX,
    std::optional<int64_t> gridDimY,
    std::optional<int64_t> gridDimZ,
    std::optional<int64_t> blockDimX,
    std::optional<int64_t> blockDimY,
    std::optional<int64_t> blockDimZ)
Alter kernel configuration of the given kernel.
Definition at line 300 of file Utils.cpp.
References checkGpuLimits(), mlir::OpBuilder::create(), diag(), mlir::Value::getLoc(), mlir::OpBuilder::setInsertionPointAfterValue(), mlir::DiagnosedSilenceableFailure::success(), and mlir::gpu::KernelDim3::x.
DiagnosedSilenceableFailure mlir::transform::gpu::alterGpuLaunch(
    RewriterBase &rewriter,
    mlir::gpu::LaunchOp gpuLaunch,
    TransformOpInterface transformOp,
    std::optional<int64_t> gridDimX = std::nullopt,
    std::optional<int64_t> gridDimY = std::nullopt,
    std::optional<int64_t> gridDimZ = std::nullopt,
    std::optional<int64_t> blockDimX = std::nullopt,
    std::optional<int64_t> blockDimY = std::nullopt,
    std::optional<int64_t> blockDimZ = std::nullopt)
Alter kernel configuration of the given kernel.
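The defaulted std::optional parameters suggest that only the dimensions actually passed are changed. The following is a minimal standalone sketch of that idea, not the MLIR implementation: LaunchConfigModel and alterLaunchModel are hypothetical names, and the "leave unset dimensions untouched" behavior is an assumption read off the signature.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the six launch sizes carried by a gpu.launch op.
struct LaunchConfigModel {
  std::array<int64_t, 3> grid{1, 1, 1};
  std::array<int64_t, 3> block{1, 1, 1};
};

// Returns a copy of `config` in which only the dimensions that were passed
// are replaced. std::nullopt leaves the existing value untouched (assumed
// behavior, modeled on the defaulted optional parameters).
inline LaunchConfigModel alterLaunchModel(
    LaunchConfigModel config,
    std::optional<int64_t> gridDimX = std::nullopt,
    std::optional<int64_t> gridDimY = std::nullopt,
    std::optional<int64_t> gridDimZ = std::nullopt,
    std::optional<int64_t> blockDimX = std::nullopt,
    std::optional<int64_t> blockDimY = std::nullopt,
    std::optional<int64_t> blockDimZ = std::nullopt) {
  if (gridDimX) config.grid[0] = *gridDimX;
  if (gridDimY) config.grid[1] = *gridDimY;
  if (gridDimZ) config.grid[2] = *gridDimZ;
  if (blockDimX) config.block[0] = *blockDimX;
  if (blockDimY) config.block[1] = *blockDimY;
  if (blockDimZ) config.block[2] = *blockDimZ;
  return config;
}
```

In this model, calling alterLaunchModel(base, std::nullopt, 8) changes only the Y grid size and keeps every other entry of base.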
DiagnosedSilenceableFailure mlir::transform::gpu::checkGpuLimits(
    TransformOpInterface transformOp,
    std::optional<int64_t> gridDimX,
    std::optional<int64_t> gridDimY,
    std::optional<int64_t> gridDimZ,
    std::optional<int64_t> blockDimX,
    std::optional<int64_t> blockDimY,
    std::optional<int64_t> blockDimZ)
Determine if the size of the kernel configuration is supported by the GPU architecture being used.
TODO this is currently hardwired to CUDA, parameterize and generalize.
Definition at line 231 of file Utils.cpp.
References mlir::DiagnosedSilenceableFailure::success().
Referenced by alterGpuLaunch(), and createGpuLaunch().
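To make the kind of check described above concrete, here is a standalone sketch of a kernel-size validation. It is not the code in Utils.cpp: GpuLimitsModel and checkLimitsModel are hypothetical names, the numeric limits are the commonly documented CUDA maxima (an assumption, the hardwired values may differ), and treating unset dimensions as 1 is also an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical limits, taken from commonly documented CUDA maxima.
struct GpuLimitsModel {
  int64_t maxTotalBlockDim = 1024;
  int64_t maxBlockDimX = 1024, maxBlockDimY = 1024, maxBlockDimZ = 64;
  int64_t maxGridDimX = 2147483647, maxGridDimY = 65535, maxGridDimZ = 65535;
};

// Returns an empty string when the configuration fits the limits, otherwise
// a short diagnostic. Unset dimensions are treated as 1 (assumption).
inline std::string checkLimitsModel(
    std::optional<int64_t> gridDimX, std::optional<int64_t> gridDimY,
    std::optional<int64_t> gridDimZ, std::optional<int64_t> blockDimX,
    std::optional<int64_t> blockDimY, std::optional<int64_t> blockDimZ,
    const GpuLimitsModel &limits = GpuLimitsModel()) {
  const int64_t gx = gridDimX.value_or(1), gy = gridDimY.value_or(1),
                gz = gridDimZ.value_or(1);
  const int64_t bx = blockDimX.value_or(1), by = blockDimY.value_or(1),
                bz = blockDimZ.value_or(1);
  if (gx < 1 || gy < 1 || gz < 1 || bx < 1 || by < 1 || bz < 1)
    return "all dimensions must be positive";
  if (bx > limits.maxBlockDimX || by > limits.maxBlockDimY ||
      bz > limits.maxBlockDimZ)
    return "block dimension exceeds per-axis limit";
  // Safe from overflow: each factor was bounded by the per-axis check above.
  if (bx * by * bz > limits.maxTotalBlockDim)
    return "too many threads per block";
  if (gx > limits.maxGridDimX || gy > limits.maxGridDimY ||
      gz > limits.maxGridDimZ)
    return "grid dimension exceeds per-axis limit";
  return "";
}
```

Passing the limits as a struct is one way the parameterization mentioned in the TODO could look; that design is a suggestion of this sketch, not something the source commits to.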
DiagnosedSilenceableFailure mlir::transform::gpu::createGpuLaunch(
    RewriterBase &rewriter,
    Location loc,
    TransformOpInterface transformOp,
    LaunchOp &launchOp,
    std::optional<int64_t> gridDimX,
    std::optional<int64_t> gridDimY,
    std::optional<int64_t> gridDimZ,
    std::optional<int64_t> blockDimX,
    std::optional<int64_t> blockDimY,
    std::optional<int64_t> blockDimZ)
Definition at line 269 of file Utils.cpp.
References checkGpuLimits(), mlir::OpBuilder::create(), createConst(), diag(), mlir::OpBuilder::setInsertionPointToEnd(), and mlir::DiagnosedSilenceableFailure::success().
DiagnosedSilenceableFailure mlir::transform::gpu::createGpuLaunch(
    RewriterBase &rewriter,
    Location loc,
    TransformOpInterface transformOp,
    mlir::gpu::LaunchOp &launchOp,
    std::optional<int64_t> gridDimX = std::nullopt,
    std::optional<int64_t> gridDimY = std::nullopt,
    std::optional<int64_t> gridDimZ = std::nullopt,
    std::optional<int64_t> blockDimX = std::nullopt,
    std::optional<int64_t> blockDimY = std::nullopt,
    std::optional<int64_t> blockDimZ = std::nullopt)
Create an empty-body gpu::LaunchOp using the provided kernel settings and put a terminator within.
DiagnosedSilenceableFailure mlir::transform::gpu::findTopLevelForallOp(
    Operation *target,
    scf::ForallOp &topLevelForallOp,
    TransformOpInterface transformOp)
Find the unique top level scf::ForallOp within a given target op.
Definition at line 636 of file GPUTransformOps.cpp.
References mlir::WalkResult::advance(), mlir::WalkResult::interrupt(), mlir::DiagnosedSilenceableFailure::success(), and mlir::Operation::walk().
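The search for a unique top-level op can be illustrated on a toy operation tree. This is a sketch under stated assumptions, not the MLIR code: OpModel, collectTopLevelModel and findUniqueTopLevelForallModel are hypothetical names, "top level" is taken to mean "not nested inside another scf.forall", and the sketch collects all candidates instead of interrupting the walk early as the referenced WalkResult::interrupt() suggests the real function does.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, minimal operation tree: a name plus nested operations.
struct OpModel {
  std::string name;
  std::vector<OpModel> children;
};

// Collects forall ops that are not nested inside another forall.
inline void collectTopLevelModel(const OpModel &op,
                                 std::vector<const OpModel *> &found) {
  for (const OpModel &child : op.children) {
    if (child.name == "scf.forall") {
      // Do not descend: anything below is nested, hence not top level.
      found.push_back(&child);
      continue;
    }
    collectTopLevelModel(child, found);
  }
}

// Returns the unique top-level forall under `target`, or nullptr when there
// is none or more than one.
inline const OpModel *findUniqueTopLevelForallModel(const OpModel &target) {
  std::vector<const OpModel *> found;
  collectTopLevelModel(target, found);
  return found.size() == 1 ? found.front() : nullptr;
}
```

A function body holding one scf.forall with another scf.forall nested in it yields the outer op; a body with two sibling scf.forall ops yields nullptr, which corresponds to the failure case.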
DiagnosedSilenceableFailure mlir::transform::gpu::mapForallToBlocksImpl(
    RewriterBase &rewriter,
    TransformOpInterface transformOp,
    scf::ForallOp forallOp,
    SmallVectorImpl<int64_t> &gridDims,
    const GpuIdBuilder &gpuIdBuilder)
Map the top level scf.forall op to GPU blocks.
Mapping is one-to-one and the induction variables of scf.forall are rewritten to gpu.block_id according to the thread_dim_mapping attribute.
Dynamic scf.forall trip counts are currently not supported. Dynamic gridDims are currently not supported.
Definition at line 582 of file GPUTransformOps.cpp.
References mlir::OpBuilder::create(), diag(), LDBG, ForallRewriteResult::mappingSizes, rewriteOneForallCommonImpl(), mlir::OpBuilder::setInsertionPointToStart(), and mlir::DiagnosedSilenceableFailure::success().
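A one-to-one mapping means each loop iteration gets its own block, so the grid size along a mapped axis equals the static trip count of the loop mapped to it. The sketch below illustrates only that arithmetic; BlockDimModel and gridDimsFromForallModel are hypothetical names, and padding unmapped axes with 1 and rejecting repeated axes are assumptions of this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for the block axis each loop dimension is mapped to.
enum class BlockDimModel { X = 0, Y = 1, Z = 2 };

// Derives grid sizes from static trip counts. Returns std::nullopt when the
// inputs disagree in length, an axis is used twice, or a trip count is not a
// positive static value (standing in for the unsupported dynamic case).
inline std::optional<std::array<int64_t, 3>> gridDimsFromForallModel(
    const std::vector<int64_t> &tripCounts,
    const std::vector<BlockDimModel> &mapping) {
  if (tripCounts.size() != mapping.size() || tripCounts.size() > 3)
    return std::nullopt;
  std::array<int64_t, 3> gridDims{1, 1, 1};
  std::array<bool, 3> used{false, false, false};
  for (std::size_t i = 0; i < tripCounts.size(); ++i) {
    const auto dim = static_cast<std::size_t>(mapping[i]);
    if (tripCounts[i] <= 0 || used[dim])
      return std::nullopt;
    used[dim] = true;
    gridDims[dim] = tripCounts[i];
  }
  return gridDims;
}
```

For a two-dimensional loop with trip counts 8 and 16 mapped to Y and X respectively, this model produces grid sizes of 16 along X, 8 along Y and 1 along Z.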
DiagnosedSilenceableFailure mlir::transform::gpu::mapNestedForallToThreadsImpl(
    RewriterBase &rewriter,
    std::optional<TransformOpInterface> transformOp,
    Operation *target,
    ArrayRef<int64_t> blockDims,
    int64_t warpSize,
    bool syncAfterDistribute)
Search scf.forall ops nested under target and map each such op to an explicit GPU implementation along blockDims.
The mapping is one-to-one and the induction variables of scf.forall are rewritten to appropriate ids according to the mapping attribute.
Dynamic scf.forall trip counts are currently not supported. Dynamic blockDims or newBasis entries are currently not supported. blockDims is expected to be of size 3.
The insertion point of the rewriter is expected to be set at the beginning of the target body block and dominate all other blocks.
Definition at line 846 of file GPUTransformOps.cpp.
References mlir::WalkResult::advance(), mlir::OpBuilder::create(), definiteFailureHelper(), diag(), mlir::Operation::getLoc(), mlir::WalkResult::interrupt(), LDBG, mapOneForallToThreadsImpl(), mlir::WalkResult::skip(), mlir::DiagnosedSilenceableFailure::success(), mlir::Operation::walk(), and mlir::WalkResult::wasInterrupted().
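The preconditions listed above (static sizes, blockDims of size 3) can be expressed as a small standalone check. This is only an illustration: canMapToThreadsModel is a hypothetical name, and the extra requirement that the loop needs no more iterations than the block has threads is an assumption of this sketch, not something stated on this page.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns true when `blockDims` has exactly three positive static entries,
// every trip count in `forallSizes` is a positive static value, and the loop
// needs no more threads than the block provides (the last condition is an
// assumption of this sketch).
inline bool canMapToThreadsModel(const std::vector<int64_t> &forallSizes,
                                 const std::vector<int64_t> &blockDims) {
  if (blockDims.size() != 3)
    return false;
  int64_t available = 1;
  for (int64_t d : blockDims) {
    if (d <= 0)
      return false;
    available *= d;
  }
  int64_t required = 1;
  for (int64_t s : forallSizes) {
    if (s <= 0)
      return false;
    required *= s;
  }
  return required <= available;
}
```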
DiagnosedSilenceableFailure mlir::transform::gpu::mapOneForallToThreadsImpl(
    RewriterBase &rewriter,
    std::optional<TransformOpInterface> transformOp,
    scf::ForallOp forallOp,
    ArrayRef<int64_t> blockDims,
    int64_t warpSize,
    bool syncAfterDistribute)
Map the given scf.forall op (forallOp) to an explicit GPU implementation along blockDims.
The mapping is one-to-one and the induction variables of scf.forall are rewritten to gpuIdBuilder.idBuilder according to the gpuIdBuilder.mappingAttributes attribute.
Dynamic scf.forall trip counts are currently not supported. Dynamic blockDims sizes are currently not supported. blockDims is expected to be of size 3.
Definition at line 806 of file GPUTransformOps.cpp.
References mlir::OpBuilder::create(), diag(), getThreadIdBuilder(), rewriteOneForallCommonImpl(), mlir::OpBuilder::setInsertionPointAfter(), and mlir::DiagnosedSilenceableFailure::success().
Referenced by mapNestedForallToThreadsImpl().
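The roles of blockDims and warpSize can be illustrated with the usual linearization of a 3-D thread id. The sketch below is standalone arithmetic, not the id builders themselves: linearThreadIdModel and warpIdModel are hypothetical names, and the convention that x is the fastest-varying axis is an assumption based on the common GPU layout.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Linearizes a 3-D thread id within a block, with x varying fastest
// (assumed convention): id = x + dimX * (y + dimY * z).
inline int64_t linearThreadIdModel(const std::array<int64_t, 3> &threadId,
                                   const std::array<int64_t, 3> &blockDims) {
  return threadId[0] +
         blockDims[0] * (threadId[1] + blockDims[1] * threadId[2]);
}

// Index of the warp that a linear thread id falls into.
inline int64_t warpIdModel(int64_t linearId, int64_t warpSize) {
  assert(warpSize > 0 && "warpSize must be positive");
  return linearId / warpSize;
}
```

With blockDims of (32, 4, 2), the thread at (3, 2, 1) has linear id 3 + 32 * (2 + 4 * 1) = 195 and, for a warp size of 32, belongs to warp 6.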