MLIR  17.0.0git
mlir::scf Namespace Reference

Classes

struct  LoopNest
 
class  ForLoopPipeliningPattern
 
struct  SCFTilingOptions
 Options to use to control tiling.
 
struct  SCFTilingResult
 Transformation information returned after tiling.
 
struct  SCFTileAndFuseOptions
 Options used to control tile + fuse.
 
struct  SCFFuseProducerOfSliceResult
 Fuse the producer of the source of candidateSliceOp by computing the required slice of the producer in-place.
 
struct  SCFTileAndFuseResult
 Transformation information returned after tile and fuse.
 
struct  SCFReductionTilingResult
 Transformation information returned after reduction tiling.
 
struct  PipeliningOption
 Options to dictate how loops should be pipelined.
 

Typedefs

using ValueVector = SmallVector< Value >
 An owning vector of values, handy to return from functions.
 
using LoopVector = SmallVector< scf::ForOp >
 
using SCFTileSizeComputationFunction = std::function< SmallVector< Value >(OpBuilder &, Operation *)>
 
using LoopMatcherFn = function_ref< LogicalResult(Value, OpFoldResult &, OpFoldResult &, OpFoldResult &)>
 Match "for loop"-like operations: If the first parameter is an iteration variable, return lower/upper bounds via the second/third parameter and the step size via the last parameter.
 

Functions

void buildTerminatedBody (OpBuilder &builder, Location loc)
 Default callback for IfOp builders. Inserts a yield without arguments.
 
void ensureLoopTerminator (Region &region, Builder &builder, Location loc)
 
ForOp getForInductionVarOwner (Value val)
 Returns the loop parent of an induction variable.
 
ParallelOp getParallelForInductionVarOwner (Value val)
 Returns the parallel loop parent of an induction variable.
 
ForallOp getForallOpThreadIndexOwner (Value val)
 Returns the ForallOp parent of a thread index variable.
 
bool insideMutuallyExclusiveBranches (Operation *a, Operation *b)
 Return true if ops a and b (or their ancestors) are in mutually exclusive regions/blocks of an IfOp.
 
LogicalResult promoteIfSingleIteration (PatternRewriter &rewriter, scf::ForallOp forallOp)
 Promotes the loop body of a scf::ForallOp to its containing block if the loop was known to have a single iteration.
 
void promote (PatternRewriter &rewriter, scf::ForallOp forallOp)
 Promotes the loop body of a scf::ForallOp to its containing block.
 
LoopNest buildLoopNest (OpBuilder &builder, Location loc, ValueRange lbs, ValueRange ubs, ValueRange steps, ValueRange iterArgs, function_ref< ValueVector(OpBuilder &, Location, ValueRange, ValueRange)> bodyBuilder=nullptr)
 Creates a perfect nest of "for" loops, i.e. all loops but the innermost contain only another loop and a terminator.
 
LoopNest buildLoopNest (OpBuilder &builder, Location loc, ValueRange lbs, ValueRange ubs, ValueRange steps, function_ref< void(OpBuilder &, Location, ValueRange)> bodyBuilder=nullptr)
 A convenience version for building loop nests without iteration arguments (like for reductions).
 
void registerTransformDialectExtension (DialectRegistry &registry)
 
void registerBufferizableOpInterfaceExternalModels (DialectRegistry &registry)
 
FailureOr< ForOp > pipelineForLoop (RewriterBase &rewriter, ForOp forOp, const PipeliningOption &options)
 Generate a pipelined version of the scf.for loop based on the schedule given as option.
 
FailureOr< SCFTilingResult > tileUsingSCFForOp (RewriterBase &rewriter, TilingInterface op, const SCFTilingOptions &options)
 Method to tile an op that implements the TilingInterface using scf.for for iterating over the tiles.
 
std::optional< SCFFuseProducerOfSliceResult > tileAndFuseProducerOfSlice (RewriterBase &rewriter, tensor::ExtractSliceOp candidateSliceOp, MutableArrayRef< scf::ForOp > loops)
 Implementation of fusing producer of a single slice by computing the slice of the producer in-place.
 
void yieldReplacementForFusedProducer (RewriterBase &rewriter, tensor::ExtractSliceOp sliceOp, scf::SCFFuseProducerOfSliceResult fusedProducerInfo, MutableArrayRef< scf::ForOp > loops)
 Reconstruct the fused producer from within the tiled-and-fused code.
 
FailureOr< SCFTileAndFuseResult > tileConsumerAndFuseProducerGreedilyUsingSCFForOp (RewriterBase &rewriter, TilingInterface consumer, const SCFTileAndFuseOptions &options)
 Method to tile and fuse a sequence of operations, by tiling the consumer and fusing its producers.
 
FailureOr< SmallVector< scf::ForOp > > lowerToLoopsUsingSCFForOp (RewriterBase &rewriter, TilingInterface op)
 Method to lower an op that implements the TilingInterface to loops/scalars.
 
FailureOr< scf::SCFReductionTilingResult > tileReductionUsingScf (RewriterBase &b, PartialReductionOpInterface op, ArrayRef< OpFoldResult > tileSize)
 Method to tile a reduction and generate a parallel op within a serial loop.
 
void naivelyFuseParallelOps (Region &region)
 Fuses all adjacent scf.parallel operations with identical bounds and step into one scf.parallel operation.
 
LogicalResult peelForLoopAndSimplifyBounds (RewriterBase &rewriter, ForOp forOp, scf::ForOp &partialIteration)
 Rewrite a for loop with bounds/step that potentially do not divide evenly into a for loop where the step divides the iteration space evenly, followed by another scf.for for the last (partial) iteration (if any; returned via partialIteration).
 
std::pair< ParallelOp, ParallelOp > tileParallelLoop (ParallelOp op, llvm::ArrayRef< int64_t > tileSizes, bool noMinMaxBounds)
 Tile a parallel loop of the form scf.parallel (i0, i1) = (arg0, arg1) to (arg2, arg3) step (arg4, arg5)
 
void populateSCFStructuralTypeConversionsAndLegality (TypeConverter &typeConverter, RewritePatternSet &patterns, ConversionTarget &target)
 Populates patterns for SCF structural type conversions and sets up the provided ConversionTarget with the appropriate legality configuration for the ops to get converted properly.
 
void populateSCFLoopPipeliningPatterns (RewritePatternSet &patterns, const PipeliningOption &options)
 Populate patterns for SCF software pipelining transformation.
 
void populateSCFForLoopCanonicalizationPatterns (RewritePatternSet &patterns)
 Populate patterns for canonicalizing operations inside SCF loop bodies.
 
LogicalResult matchForLikeLoop (Value iv, OpFoldResult &lb, OpFoldResult &ub, OpFoldResult &step)
 Match "for loop"-like operations from the SCF dialect.
 
LogicalResult addLoopRangeConstraints (FlatAffineValueConstraints &cstr, Value iv, OpFoldResult lb, OpFoldResult ub, OpFoldResult step)
 Populate the given constraint set with induction variable constraints of a "for" loop with the given range and step.
 
LogicalResult canonicalizeMinMaxOpInLoop (RewriterBase &rewriter, Operation *op, LoopMatcherFn loopMatcher)
 Try to canonicalize the given affine.min/max operation in the context of for loops with a known range.
 
LogicalResult rewritePeeledMinMaxOp (RewriterBase &rewriter, Operation *op, Value iv, Value ub, Value step, bool insideLoop)
 Try to simplify the given affine.min/max operation op after loop peeling.
 

Typedef Documentation

◆ LoopMatcherFn

Match "for loop"-like operations: If the first parameter is an iteration variable, return lower/upper bounds via the second/third parameter and the step size via the last parameter.

The function should return success in that case. If the first parameter is not an iteration variable, return failure.

Definition at line 39 of file AffineCanonicalizationUtils.h.

◆ LoopVector

using mlir::scf::LoopVector = typedef SmallVector<scf::ForOp>

Definition at line 75 of file SCF.h.

◆ SCFTileSizeComputationFunction

Definition at line 28 of file TileUsingInterface.h.

◆ ValueVector

An owning vector of values, handy to return from functions.

Definition at line 74 of file SCF.h.

Function Documentation

◆ addLoopRangeConstraints()

LogicalResult mlir::scf::addLoopRangeConstraints ( FlatAffineValueConstraints & cstr,
Value  iv,
OpFoldResult  lb,
OpFoldResult  ub,
OpFoldResult  step 
)

◆ buildLoopNest() [1/2]

LoopNest mlir::scf::buildLoopNest ( OpBuilder & builder,
Location  loc,
ValueRange  lbs,
ValueRange  ubs,
ValueRange  steps,
function_ref< void(OpBuilder &, Location, ValueRange)>  bodyBuilder = nullptr 
)

A convenience version for building loop nests without iteration arguments (like for reductions).

Does not take the initial value of reductions or expect the body building functions to return their current value. The built nested scf::ForOp operations are returned in the LoopNest.

Definition at line 666 of file SCF.cpp.

References buildLoopNest().

◆ buildLoopNest() [2/2]

LoopNest mlir::scf::buildLoopNest ( OpBuilder & builder,
Location  loc,
ValueRange  lbs,
ValueRange  ubs,
ValueRange  steps,
ValueRange  iterArgs,
function_ref< ValueVector(OpBuilder &, Location, ValueRange, ValueRange)>  bodyBuilder = nullptr 
)

Creates a perfect nest of "for" loops, i.e. all loops but the innermost contain only another loop and a terminator.

The lower, upper bounds and steps are provided as lbs, ubs and steps, which are expected to be of the same size. iterArgs points to the initial values of the loop iteration arguments, which will be forwarded through the nest to the innermost loop. The body of the loop is populated using bodyBuilder, which accepts an ordered list of induction variables of all loops, followed by a list of iteration arguments of the innermost loop, in the same order as provided to iterArgs. This function is expected to return as many values as iterArgs, of the same type and in the same order, that will be treated as yielded from the loop body and forwarded back through the loop nest.

If the function is not provided, the loop nest is not expected to have iteration arguments, and the body of the innermost loop will be left empty, containing only the zero-operand terminator. Returns the LoopNest containing the list of perfectly nested scf::ForOp built during the call. If bound arrays are empty, the body builder will be called once to construct the IR outside of the loop with an empty list of induction variables.
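The forwarding of iteration arguments through a perfect nest can be illustrated without MLIR. The following is a minimal sketch in plain C++ that executes such a nest on integers instead of building IR; `Values`, `BodyFn`, `runLoopNest` and `sumOfProducts` are hypothetical names introduced only for this illustration and are not part of the MLIR API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Values = std::vector<int64_t>;
// Hypothetical stand-in for bodyBuilder: receives all induction variables
// followed by the current iteration arguments, returns the yielded values.
using BodyFn = std::function<Values(const Values &ivs, const Values &iterArgs)>;

// Executes a perfect loop nest. Each level forwards the iteration arguments
// inward and takes the values yielded by the inner level as its new arguments.
Values runLoopNest(const Values &lbs, const Values &ubs, const Values &steps,
                   Values iterArgs, const BodyFn &body, Values ivs = {}) {
  assert(lbs.size() == ubs.size() && ubs.size() == steps.size());
  std::size_t depth = ivs.size();
  if (depth == lbs.size()) {
    // Innermost level. With empty bounds this is reached immediately, so the
    // body runs exactly once with an empty list of induction variables.
    Values yielded = body(ivs, iterArgs);
    assert(yielded.size() == iterArgs.size());
    return yielded;
  }
  assert(steps[depth] > 0);
  for (int64_t iv = lbs[depth]; iv < ubs[depth]; iv += steps[depth]) {
    ivs.push_back(iv);
    iterArgs = runLoopNest(lbs, ubs, steps, iterArgs, body, ivs);
    ivs.pop_back();
  }
  return iterArgs;
}

// Example body for a 2-D nest: adds i * j to a single running sum.
Values sumOfProducts(const Values &ivs, const Values &iterArgs) {
  return {iterArgs[0] + ivs[0] * ivs[1]};
}
```

Calling `runLoopNest({0, 0}, {3, 4}, {1, 1}, Values{0}, sumOfProducts)` walks a 3x4 nest and threads one accumulator through it, which mirrors how the yielded values are forwarded back through the nest.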

Definition at line 593 of file SCF.cpp.

References copy(), mlir::OpBuilder::create(), mlir::OpBuilder::setInsertionPointToEnd(), and mlir::OpBuilder::setInsertionPointToStart().

Referenced by buildLoopNest(), mlir::linalg::GenerateLoopNest< LoopTy >::doit(), mlir::sparse_tensor::genDenseTensorOrSparseConstantIterLoop(), mlir::linalg::generateParallelLoopNest(), insertCopyLoops(), and tilePadOp().

◆ buildTerminatedBody()

void mlir::scf::buildTerminatedBody ( OpBuilder & builder,
Location  loc 
)

Default callback for IfOp builders. Inserts a yield without arguments.

Definition at line 78 of file SCF.cpp.

References mlir::OpBuilder::create().

◆ canonicalizeMinMaxOpInLoop()

LogicalResult mlir::scf::canonicalizeMinMaxOpInLoop ( RewriterBase & rewriter,
Operation * op,
LoopMatcherFn  loopMatcher 
)

Try to canonicalize the given affine.min/max operation in the context of for loops with a known range.

loopMatcher is used to retrieve loop bounds and the step size for a given iteration variable.

Note: loopMatcher allows this function to be used with any "for loop"-like operation (scf.for, scf.parallel and even ops defined in other dialects).

Call canonicalizeMinMaxOp and add the following constraints to the constraint system (along with the missing dimensions):

  • iv >= lb
  • iv < lb + step * ((ub - lb - 1) floorDiv step) + 1

Note: Due to limitations of IntegerPolyhedron, only constant step sizes are currently supported.
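The second constraint bounds iv by the last value the loop actually reaches, which is tighter than iv < ub when the step does not divide the range. A minimal sketch in plain C++ (no MLIR types; `floorDiv`, `tightIvUpperBound` and `constraintsHold` are hypothetical helper names, and a positive constant step is assumed) checks this by enumeration:

```cpp
#include <cassert>
#include <cstdint>

// Floor division for a positive divisor (C++ '/' truncates toward zero).
int64_t floorDiv(int64_t a, int64_t b) {
  assert(b > 0);
  int64_t q = a / b;
  return (a % b != 0 && a < 0) ? q - 1 : q;
}

// Exclusive upper bound on iv taken from the second constraint:
//   iv < lb + step * ((ub - lb - 1) floorDiv step) + 1
int64_t tightIvUpperBound(int64_t lb, int64_t ub, int64_t step) {
  return lb + step * floorDiv(ub - lb - 1, step) + 1;
}

// Returns true if every iv visited by the loop satisfies both constraints
// and the last visited iv is exactly one below the tight bound.
bool constraintsHold(int64_t lb, int64_t ub, int64_t step) {
  int64_t bound = tightIvUpperBound(lb, ub, step);
  int64_t last = lb;
  bool ran = false;
  for (int64_t iv = lb; iv < ub; iv += step) {
    if (iv < lb || iv >= bound)
      return false;
    last = iv;
    ran = true;
  }
  return !ran || last == bound - 1;
}
```

For lb = 0, ub = 10 and step = 4 the loop visits 0, 4 and 8, and the bound evaluates to 9 rather than 10.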

Definition at line 146 of file AffineCanonicalizationUtils.cpp.

References addLoopRangeConstraints(), canonicalizeMinMaxOp(), mlir::failed(), mlir::failure(), and mlir::Operation::getOperands().

◆ ensureLoopTerminator()

void mlir::scf::ensureLoopTerminator ( Region & region,
Builder & builder,
Location  loc 
)

◆ getForallOpThreadIndexOwner()

ForallOp mlir::scf::getForallOpThreadIndexOwner ( Value  val)

Returns the ForallOp parent of a thread index variable.

If the provided value is not a thread index variable, then return nullptr.

Definition at line 1454 of file SCF.cpp.

References mlir::Value::dyn_cast().

Referenced by matchForLikeLoop().

◆ getForInductionVarOwner()

ForOp mlir::scf::getForInductionVarOwner ( Value  val)

Returns the loop parent of an induction variable.

If the provided value is not an induction variable, then return nullptr.

Referenced by buildPackingLoopNest(), matchForLikeLoop(), and replaceByPackingLoopNestResult().

◆ getParallelForInductionVarOwner()

ParallelOp mlir::scf::getParallelForInductionVarOwner ( Value  val)

Returns the parallel loop parent of an induction variable.

If the provided value is not an induction variable, then return nullptr.

Definition at line 2826 of file SCF.cpp.

References mlir::Value::dyn_cast().

Referenced by matchForLikeLoop().

◆ insideMutuallyExclusiveBranches()

bool mlir::scf::insideMutuallyExclusiveBranches ( Operation * a,
Operation * b 
)

Return true if ops a and b (or their ancestors) are in mutually exclusive regions/blocks of an IfOp.

Definition at line 1766 of file SCF.cpp.

References mlir::Operation::getParentOfType().

◆ lowerToLoopsUsingSCFForOp()

FailureOr< SmallVector< scf::ForOp > > mlir::scf::lowerToLoopsUsingSCFForOp ( RewriterBase & rewriter,
TilingInterface  op 
)

Method to lower an op that implements the TilingInterface to loops/scalars.

Definition at line 722 of file TileUsingInterface.cpp.

References mlir::OpBuilder::create(), mlir::failed(), mlir::failure(), mlir::getValueOrCreateConstantIndexOp(), mlir::RewriterBase::notifyMatchFailure(), and mlir::OpBuilder::setInsertionPoint().

◆ matchForLikeLoop()

LogicalResult mlir::scf::matchForLikeLoop ( Value  iv,
OpFoldResult & lb,
OpFoldResult & ub,
OpFoldResult & step 
)

Match "for loop"-like operations from the SCF dialect.

Definition at line 32 of file AffineCanonicalizationUtils.cpp.

References mlir::failure(), getForallOpThreadIndexOwner(), getForInductionVarOwner(), getParallelForInductionVarOwner(), and mlir::success().

◆ naivelyFuseParallelOps()

void mlir::scf::naivelyFuseParallelOps ( Region & region)

Fuses all adjacent scf.parallel operations with identical bounds and step into one scf.parallel operation.

Uses a naive aliasing and dependency analysis.

Definition at line 138 of file ParallelLoopFusion.cpp.

References fuseIfLegal(), and mlir::isMemoryEffectFree().

◆ peelForLoopAndSimplifyBounds()

LogicalResult mlir::scf::peelForLoopAndSimplifyBounds ( RewriterBase & rewriter,
ForOp  forOp,
scf::ForOp &  partialIteration 
)

Rewrite a for loop with bounds/step that potentially do not divide evenly into a for loop where the step divides the iteration space evenly, followed by another scf.for for the last (partial) iteration (if any; returned via partialIteration).

This transformation is called "loop peeling".

This transformation is beneficial for a wide range of transformations such as vectorization or loop tiling: It enables additional canonicalizations inside the peeled loop body such as rewriting masked loads into unmasked loads.

E.g., assuming a lower bound of 0 (for illustration purposes):

scf.for %iv = %c0 to %ub step %c4 {
  (loop body)
}

is rewritten into the following pseudo IR:

%newUb = %ub - (%ub mod %c4)
scf.for %iv = %c0 to %newUb step %c4 {
  (loop body)
}
scf.for %iv2 = %newUb to %ub {
  (loop body)
}

After loop peeling, this function tries to simplify affine.min and affine.max ops in the body of the peeled loop and in the body of the partial iteration loop, taking advantage of the fact that the peeled loop has only "full" iterations. This simplification is expected to enable further canonicalization opportunities through other patterns.

The return value indicates whether the loop was rewritten or not. Loops are not rewritten if:

  • Loop step size is 1 or
  • Loop bounds and step size are static, and step already divides the iteration space evenly.

Note: This function rewrites the given scf.for loop in-place and creates a new scf.for operation for the last iteration. It replaces all uses of the unpeeled loop with the results of the newly generated scf.for.
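The split point used by loop peeling is simple integer arithmetic. The sketch below is a plain C++ model, not the MLIR implementation; `PeeledBounds` and `peelBounds` are hypothetical names, and generalizing the documented formula from a lower bound of 0 to `ub - ((ub - lb) mod step)` is an assumption made for this illustration.

```cpp
#include <cassert>
#include <cstdint>

struct PeeledBounds {
  int64_t newUb;          // upper bound of the main loop (full iterations only)
  int64_t fullIterations; // number of iterations of the main loop
  bool hasPartial;        // true if a partial iteration [newUb, ub) remains
};

// Computes where a loop over [lb, ub) with the given step is split.
// With lb == 0 this reduces to the documented %newUb = %ub - (%ub mod %step).
PeeledBounds peelBounds(int64_t lb, int64_t ub, int64_t step) {
  assert(step > 0 && lb <= ub);
  int64_t newUb = ub - (ub - lb) % step;
  return {newUb, (newUb - lb) / step, newUb < ub};
}
```

For ub = 10 and step = 4 the main loop covers [0, 8) in two full iterations and the partial iteration covers [8, 10). When the step already divides the range, `hasPartial` is false, which corresponds to the case where the loop is not rewritten.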

Referenced by mlir::linalg::peelLoop().

◆ pipelineForLoop()

FailureOr< ForOp > mlir::scf::pipelineForLoop ( RewriterBase & rewriter,
ForOp  forOp,
const PipeliningOption & options 
)

Generate a pipelined version of the scf.for loop based on the schedule given as option.

This applies the mechanical transformation of changing the loop and generating the prologue/epilogue for the pipelining and doesn't make any decision regarding the schedule. Based on the options the loop is split into several stages. The transformation assumes that the scheduling given by the user is valid. For example, if we break a loop into 3 stages named S0, S1, S2 we would generate the following code, with the number in parentheses as the iteration index:

S0(0)                        // Prologue
S0(1) S1(0)                  // Prologue
scf.for I = C0 to N - 2 {
  S0(I+2) S1(I+1) S2(I)      // Pipelined kernel
}
S1(N) S2(N-1)                // Epilogue
S2(N)                        // Epilogue
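The shape of the prologue, kernel and epilogue follows from one rule: at time step t, stage s works on iteration t - s whenever that iteration exists. The sketch below is a plain C++ model of that rule, not the MLIR implementation; `pipelineSchedule` is a hypothetical name, and iterations are numbered 0 to tripCount - 1, which is an assumption of this sketch and may not match the N used in the example above.

```cpp
#include <cassert>
#include <string>

// Lists, one line per time step, which stage runs on which iteration.
// Stage s handles iteration (t - s) at time step t if that iteration exists.
std::string pipelineSchedule(int tripCount, int numStages) {
  assert(tripCount >= 1 && numStages >= 1);
  std::string out;
  for (int t = 0; t < tripCount + numStages - 1; ++t) {
    bool first = true;
    for (int s = 0; s < numStages; ++s) {
      int iter = t - s;
      if (iter < 0 || iter >= tripCount)
        continue; // stage not yet started (prologue) or drained (epilogue)
      if (!first)
        out += ' ';
      out += "S" + std::to_string(s) + "(" + std::to_string(iter) + ")";
      first = false;
    }
    out += '\n';
  }
  return out;
}
```

With 4 iterations and 3 stages, the first two lines form the prologue, the lines where all three stages are active form the pipelined kernel, and the last two lines form the epilogue.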

Definition at line 498 of file LoopPipelining.cpp.

References mlir::RewriterBase::eraseOp(), mlir::failure(), options, mlir::RewriterBase::replaceOp(), and mlir::OpBuilder::setInsertionPointAfter().

Referenced by mlir::scf::ForLoopPipeliningPattern::returningMatchAndRewrite().

◆ populateSCFForLoopCanonicalizationPatterns()

void mlir::scf::populateSCFForLoopCanonicalizationPatterns ( RewritePatternSet & patterns)

Populate patterns for canonicalizing operations inside SCF loop bodies.

At the moment, only affine.min/max computations with iteration variables, loop bounds and loop steps are canonicalized.

Definition at line 178 of file LoopCanonicalization.cpp.

References mlir::RewritePatternSet::add(), and mlir::RewritePatternSet::getContext().

◆ populateSCFLoopPipeliningPatterns()

void mlir::scf::populateSCFLoopPipeliningPatterns ( RewritePatternSet & patterns,
const PipeliningOption & options 
)

Populate patterns for SCF software pipelining transformation.

See the ForLoopPipeliningPattern for the transformation details.

Definition at line 541 of file LoopPipelining.cpp.

References mlir::RewritePatternSet::add(), mlir::RewritePatternSet::getContext(), and options.

◆ populateSCFStructuralTypeConversionsAndLegality()

void mlir::scf::populateSCFStructuralTypeConversionsAndLegality ( TypeConverter & typeConverter,
RewritePatternSet & patterns,
ConversionTarget & target 
)

Populates patterns for SCF structural type conversions and sets up the provided ConversionTarget with the appropriate legality configuration for the ops to get converted properly.

A "structural" type conversion is one where the underlying ops are completely agnostic to the actual types involved and simply need to update their types. An example of this is scf.if – the scf.if op and the corresponding scf.yield ops need to update their types accordingly to the TypeConverter, but otherwise don't care what type conversions are happening.

Definition at line 251 of file StructuralTypeConversions.cpp.

References mlir::RewritePatternSet::add(), mlir::ConversionTarget::addDynamicallyLegalOp(), mlir::RewritePatternSet::getContext(), and mlir::TypeConverter::isLegal().

◆ promote()

void mlir::scf::promote ( PatternRewriter & rewriter,
scf::ForallOp  forallOp 
)

Promotes the loop body of a scf::ForallOp to its containing block.

Definition at line 555 of file SCF.cpp.

References mlir::OpBuilder::clone(), mlir::OpBuilder::create(), mlir::Value::getType(), mlir::Type::isa(), mlir::IRMapping::lookupOrDefault(), mlir::IRMapping::map(), and mlir::RewriterBase::replaceOp().

Referenced by promoteIfSingleIteration().

◆ promoteIfSingleIteration()

LogicalResult mlir::scf::promoteIfSingleIteration ( PatternRewriter & rewriter,
scf::ForallOp  forallOp 
)

Promotes the loop body of a scf::ForallOp to its containing block if the loop was known to have a single iteration.

Definition at line 540 of file SCF.cpp.

References mlir::constantTripCount(), mlir::failure(), promote(), and mlir::success().

◆ registerBufferizableOpInterfaceExternalModels()

void mlir::scf::registerBufferizableOpInterfaceExternalModels ( DialectRegistry & registry)

◆ registerTransformDialectExtension()

void mlir::scf::registerTransformDialectExtension ( DialectRegistry & registry)

Definition at line 274 of file SCFTransformOps.cpp.

References mlir::DialectRegistry::addExtensions().

Referenced by mlir::registerAllDialects().

◆ rewritePeeledMinMaxOp()

LogicalResult mlir::scf::rewritePeeledMinMaxOp ( RewriterBase & rewriter,
Operation * op,
Value  iv,
Value  ub,
Value  step,
bool  insideLoop 
)

Try to simplify the given affine.min/max operation op after loop peeling.

This function can simplify min/max operations such as (ub is the previous upper bound of the unpeeled loop):

#map = affine_map<(d0)[s0, s1] -> (s0, -d0 + s1)>
%r = affine.min #map(%iv)[%step, %ub]

and rewrites them into (in the case of the peeled loop):

%r = %step

min/max operations inside the partial iteration are rewritten in a similar way.

This function builds up a set of constraints, capable of proving that:

  • Inside the peeled loop: min(step, ub - iv) == step
  • Inside the partial iteration: min(step, ub - iv) == ub - iv

Returns success if the given operation was replaced by a new operation; failure otherwise.

Note: ub is the previous upper bound of the loop (before peeling). insideLoop must be true for min/max ops inside the loop and false for affine.min ops inside the partial iteration. For an explanation of the other parameters, see comment of canonicalizeMinMaxOpInLoop.
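The two identities that the constraint set is meant to prove can be checked by brute force for concrete values. This is a minimal sketch in plain C++ under the assumptions of a lower bound of 0 and a positive constant step; `peeledMinSimplifies` is a hypothetical name and the code enumerates iterations rather than building a constraint system.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Checks, for a loop over [0, ub) with the given step, that
//   - inside the peeled loop:        min(step, ub - iv) == step
//   - inside the partial iteration:  min(step, ub - iv) == ub - iv
bool peeledMinSimplifies(int64_t ub, int64_t step) {
  assert(step > 0 && ub >= 0);
  int64_t newUb = ub - ub % step; // split point after peeling (lb == 0)
  for (int64_t iv = 0; iv < newUb; iv += step)
    if (std::min(step, ub - iv) != step)
      return false;
  // The partial iteration starts at newUb and runs at most once.
  for (int64_t iv = newUb; iv < ub; iv += step)
    if (std::min(step, ub - iv) != ub - iv)
      return false;
  return true;
}
```

For ub = 10 and step = 4, the peeled loop sees min(4, 10) and min(4, 6), both equal to the step, while the partial iteration sees min(4, 2), which equals ub - iv.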

Definition at line 198 of file AffineCanonicalizationUtils.cpp.

References mlir::FlatAffineValueConstraints::addBound(), mlir::presburger::IntegerRelation::addInequality(), mlir::FlatLinearValueConstraints::appendDimVar(), mlir::FlatLinearValueConstraints::appendSymbolVar(), canonicalizeMinMaxOp(), mlir::presburger::EQ, and mlir::getConstantIntValue().

Referenced by rewriteAffineOpAfterPeeling().

◆ tileAndFuseProducerOfSlice()

std::optional< scf::SCFFuseProducerOfSliceResult > mlir::scf::tileAndFuseProducerOfSlice ( RewriterBase & rewriter,
tensor::ExtractSliceOp  candidateSliceOp,
MutableArrayRef< scf::ForOp >  loops 
)

◆ tileConsumerAndFuseProducerGreedilyUsingSCFForOp()

FailureOr< scf::SCFTileAndFuseResult > mlir::scf::tileConsumerAndFuseProducerGreedilyUsingSCFForOp ( RewriterBase & rewriter,
TilingInterface  consumer,
const SCFTileAndFuseOptions & options 
)

Method to tile and fuse a sequence of operations, by tiling the consumer and fusing its producers.

Note that this assumes that it is valid to tile+fuse the producer into the innermost tiled loop. It is up to the caller to ensure that the tile sizes provided make this fusion valid.

For example, for the following sequence

%0 =
%1 = linalg.fill ... outs(%0 : ... )
%2 = linalg.matmul ... outs(%1 : ...) ...

it is legal to fuse the fill with the matmul only if the matmul is tiled along the parallel dimensions and not the reduction dimension, i.e. the tile size for the reduction dimension should be 0. The resulting fused transformation is

%1 = scf.for ... iter_args(%arg0 = %0) {
  %2 = tensor.extract_slice %arg0
  %3 = linalg.fill .. outs(%2 : ... )
  %4 = linalg.matmul .. outs(%3 : ...)
}
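The restriction on the reduction dimension can be seen in a one-dimensional stand-in: if the fill is fused into a loop that tiles the reduction, the accumulator is reset on every tile and earlier partial sums are lost. The sketch below is plain C++ with hypothetical function names; it models a sum reduction rather than linalg.matmul itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference: fill once, then reduce over the whole reduction dimension.
int reduceUntiled(const std::vector<int> &a) {
  int acc = 0; // stand-in for linalg.fill
  for (int v : a)
    acc += v;
  return acc;
}

// Tiles the reduction dimension and runs the fused fill inside every tile.
// The result is only correct when a single tile covers the whole dimension.
int reduceWithFillFusedIntoReductionTiles(const std::vector<int> &a,
                                          std::size_t tile) {
  assert(tile > 0);
  int acc = 0;
  for (std::size_t k = 0; k < a.size(); k += tile) {
    acc = 0; // fused fill: discards the partial sum of earlier tiles
    for (std::size_t i = k; i < a.size() && i < k + tile; ++i)
      acc += a[i];
  }
  return acc;
}
```

For the input {1, 2, 3, 4, 5}, the untiled version returns 15, while tiling the reduction by 2 with the fill fused inside returns 5, the sum of the last tile only. Using one tile that spans the whole dimension, the analogue of leaving the reduction dimension untiled, gives 15 again.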

Definition at line 643 of file TileUsingInterface.cpp.

References mlir::detail::enumerate(), mlir::failed(), mlir::Operation::getOperands(), mlir::scf::SCFTileAndFuseResult::loops, mlir::RewriterBase::notifyMatchFailure(), options, mlir::scf::SCFTileAndFuseResult::replacements, tileAndFuseProducerOfSlice(), mlir::scf::SCFTileAndFuseResult::tiledAndFusedOps, and tileUsingSCFForOp().

◆ tileParallelLoop()

std::pair< ParallelOp, ParallelOp > mlir::scf::tileParallelLoop ( ParallelOp  op,
llvm::ArrayRef< int64_t >  tileSizes,
bool  noMinMaxBounds 
)

Tile a parallel loop of the form

scf.parallel (i0, i1) = (arg0, arg1) to (arg2, arg3) step (arg4, arg5)

into

scf.parallel (i0, i1) = (arg0, arg1) to (arg2, arg3) step (arg4*tileSize[0], arg5*tileSize[1])
  scf.parallel (j0, j1) = (0, 0) to (min(arg4*tileSize[0], arg2-i0), min(arg5*tileSize[1], arg3-i1)) step (arg4, arg5)

or, when no-min-max-bounds is true, into

scf.parallel (i0, i1) = (arg0, arg1) to (arg2, arg3) step (arg4*tileSize[0], arg5*tileSize[1])
  scf.parallel (j0, j1) = (0, 0) to (arg4*tileSize[0], arg5*tileSize[1]) step (arg4, arg5)
    inbound = (j0 * arg4 + i0 < arg2) && (j1 * arg5 + i1 < arg3)
    scf.if (inbound)
      ....

where the uses of i0 and i1 in the loop body are replaced by i0 + j0 and i1 + j1.

The old loop is replaced with the new one. The function returns the resulting ParallelOps, i.e. {outer_loop_op, inner_loop_op}.
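That both tiled forms visit the same indices as the original loop can be checked on a one-dimensional model. This is a sketch in plain C++, executed sequentially; `untiledIndices` and `tiledIndices` are hypothetical names, and the in-bounds guard is modeled as `i + j < ub` to match the replaced index i0 + j0, which is a simplification assumed here rather than the exact predicate quoted above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Indices visited by the original loop: i = lb, lb + step, ... < ub.
std::vector<int64_t> untiledIndices(int64_t lb, int64_t ub, int64_t step) {
  std::vector<int64_t> out;
  for (int64_t i = lb; i < ub; i += step)
    out.push_back(i);
  return out;
}

// Indices i + j visited by the tiled pair of loops.
std::vector<int64_t> tiledIndices(int64_t lb, int64_t ub, int64_t step,
                                  int64_t tileSize, bool noMinMaxBounds) {
  assert(step > 0 && tileSize > 0);
  std::vector<int64_t> out;
  for (int64_t i = lb; i < ub; i += step * tileSize) {
    // Either clamp the inner bound with min, or keep it fixed and guard.
    int64_t innerUb = noMinMaxBounds ? step * tileSize
                                     : std::min(step * tileSize, ub - i);
    for (int64_t j = 0; j < innerUb; j += step) {
      if (noMinMaxBounds && !(i + j < ub))
        continue; // models the scf.if guard around the body
      out.push_back(i + j);
    }
  }
  return out;
}
```

For lb = 1, ub = 12, step = 2 and a tile size of 4, both variants produce 1, 3, 5, 7, 9, 11. The fixed-bound variant runs two extra inner iterations in the last tile that the guard filters out.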

Definition at line 58 of file ParallelLoopTiling.cpp.

References mlir::OpBuilder::create(), mlir::detail::enumerate(), mlir::Block::eraseArguments(), mlir::Block::front(), mlir::AffineMap::get(), mlir::getAffineDimExpr(), mlir::Block::getArgument(), mlir::Builder::getContext(), mlir::Builder::getIndexType(), mlir::Builder::getIntegerType(), mlir::Block::getNumArguments(), mlir::Value::replaceAllUsesExcept(), and mlir::OpBuilder::setInsertionPointToStart().

◆ tileReductionUsingScf()

FailureOr< scf::SCFReductionTilingResult > mlir::scf::tileReductionUsingScf ( RewriterBase & b,
PartialReductionOpInterface  op,
ArrayRef< OpFoldResult >  tileSize 
)

Method to tile a reduction and generate a parallel op within a serial loop.

Each of the partial reductions is calculated in parallel. After the loop, all the partial reductions are merged into a final reduction. For example, the following sequence

%0 = linalg.generic %in ["parallel", "reduction"]
  : tensor<7x9xf32> -> tensor<7xf32>

is rewritten into:

%0 = linalg.fill ... : tensor<7x4xf32>
%1 = scf.for ... iter_args(%arg0 = %0) {
  %2 = tensor.extract_slice %arg0 : tensor<7x4xf32> -> tensor<7x?xf32>
  %3 = tensor.extract_slice %in : tensor<7x9xf32> -> tensor<7x?xf32>
  %4 = linalg.generic %2, %3 ["parallel", "parallel"]
    : tensor<7x?xf32> -> tensor<7x?xf32>
  %5 = tensor.insert_slice %3, %0[0, 0] : tensor<7x4xf32>
}
%6 = linalg.generic %1 ["parallel", "reduction"]
  : tensor<7x4xf32> -> tensor<7xf32>
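The same decomposition can be written out on plain arrays. The sketch below is a C++ model using the shapes from the example (a 7x9 input, a tile size of 4 along the reduction dimension, and a 7x4 buffer of partial results); the function names are hypothetical, integer addition stands in for the reduction, and nothing here uses the MLIR API.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kRows = 7, kCols = 9, kTile = 4;
using Input = std::array<std::array<int, kCols>, kRows>;
using Output = std::array<int, kRows>;

// Reference: reduce each row directly.
Output reduceDirect(const Input &in) {
  Output out{};
  for (std::size_t r = 0; r < kRows; ++r)
    for (std::size_t c = 0; c < kCols; ++c)
      out[r] += in[r][c];
  return out;
}

// Tiled version: a serial loop over column tiles accumulates elementwise
// into a 7x4 buffer (the parallel op), then one final reduction merges it.
Output reduceTiled(const Input &in) {
  std::array<std::array<int, kTile>, kRows> partial{}; // fill with 0
  for (std::size_t k = 0; k < kCols; k += kTile) {
    std::size_t width = std::min(kTile, kCols - k); // last tile may be short
    for (std::size_t r = 0; r < kRows; ++r)
      for (std::size_t t = 0; t < width; ++t)
        partial[r][t] += in[r][k + t];
  }
  Output out{};
  for (std::size_t r = 0; r < kRows; ++r)
    for (std::size_t t = 0; t < kTile; ++t)
      out[r] += partial[r][t];
  return out;
}

// Deterministic test input: element (r, c) holds r * 9 + c.
Input makeInput() {
  Input in{};
  for (std::size_t r = 0; r < kRows; ++r)
    for (std::size_t c = 0; c < kCols; ++c)
      in[r][c] = static_cast<int>(r * kCols + c);
  return in;
}
```

Because every update inside the serial loop is elementwise, the iterations of the inner two loops are independent of each other, which is what allows the op inside the loop to be parallel.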

Definition at line 401 of file TileUsingInterface.cpp.

References mlir::OpBuilder::create(), mlir::OpBuilder::createOrFold(), mlir::detail::enumerate(), mlir::failed(), generateTileLoopNest(), mlir::Builder::getIndexAttr(), mlir::Operation::getResult(), mlir::Operation::getResults(), mlir::getValueOrCreateConstantIndexOp(), mlir::scf::SCFReductionTilingResult::initialOp, mlir::scf::SCFReductionTilingResult::loops, mlir::scf::SCFReductionTilingResult::mergeOp, mlir::RewriterBase::notifyMatchFailure(), mlir::scf::SCFReductionTilingResult::parallelTiledOp, mlir::RewriterBase::replaceOp(), mlir::OpBuilder::setInsertionPoint(), mlir::OpBuilder::setInsertionPointAfter(), updateDestinationOperandsForTiledOp(), and yieldTiledValues().

◆ tileUsingSCFForOp()

FailureOr< scf::SCFTilingResult > mlir::scf::tileUsingSCFForOp ( RewriterBase & rewriter,
TilingInterface  op,
const SCFTilingOptions & options 
)

◆ yieldReplacementForFusedProducer()

void mlir::scf::yieldReplacementForFusedProducer ( RewriterBase & rewriter,
tensor::ExtractSliceOp  sliceOp,
scf::SCFFuseProducerOfSliceResult  fusedProducerInfo,
MutableArrayRef< scf::ForOp >  loops 
)

Reconstruct the fused producer from within the tiled-and-fused code.

Based on the slice of the producer computed in place, it is possible that within the loop nest the same slice of the producer is computed multiple times. In such cases it is in general not possible to recompute the value of the fused producer from the tiled loop code. For the cases where no slice of the producer is computed in a redundant fashion, it is possible to reconstruct the value of the original producer from within the tiled loop. It is up to the caller to ensure that the producer is not computed redundantly within the tiled loop nest. For example, consider

%0 = linalg.matmul ins(...) outs(...) -> tensor<?x?xf32>
%1 = linalg.matmul ins(%0, ..) outs(...) -> tensor<?x?xf32>

If %1 is tiled in a 2D fashion and %0 is fused with it, the resulting IR is,

%t1_0 = scf.for .... iter_args(%arg0 = ...) {
  %t1_1 = scf.for ... iter_args(%arg1 = %arg0) {
    ...
    %t1_2 = linalg.matmul ins(...) outs(...) -> tensor<?x?xf32>
    %t1_3 = linalg.matmul ins(%t1_2, ...)
    %t1_4 = tensor.insert_slice %t1_3 into %arg1 ...
    scf.yield %t1_4
  }
  scf.yield %t1_1
}

Here %t1_2 is the same for all iterations of the inner scf.for. If instead %1 were tiled only along the rows, the resultant code would be

%t2_0 = scf.for .... iter_args(%arg0 = ...) {
  ...
  %t2_1 = linalg.matmul ins(...) outs(...) -> tensor<?x?xf32>
  %t2_2 = linalg.matmul ins(%t2_1, ...)
  %t2_3 = tensor.insert_slice %t2_2 into %arg0 ...
  scf.yield %t2_3
}

Here there is no intersection in the different slices of t2_1 computed across iterations of the scf.for. In such cases, the value of the original %0 can be reconstructed from within the loop body. This is useful in cases where %0 had other uses as well. If not reconstructed from within the loop body, uses of %0 could not be replaced, making it still live and the fusion immaterial.
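The difference between the two tilings can be made concrete by counting how often each row of the producer is computed. The following is a rough C++ model, assuming that every consumer tile recomputes exactly the producer rows it reads; `producerRowComputeCounts` and `noRedundantProducerSlices` are hypothetical names, and row-only tiling is modeled by a column tile that spans all columns.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// For a consumer of shape rows x cols tiled by (rowTile, colTile), counts how
// many times each producer row is computed when the producer is fused into
// the innermost loop.
std::vector<int> producerRowComputeCounts(int rows, int cols, int rowTile,
                                          int colTile) {
  assert(rows > 0 && cols > 0 && rowTile > 0 && colTile > 0);
  std::vector<int> counts(rows, 0);
  for (int i = 0; i < rows; i += rowTile)
    for (int j = 0; j < cols; j += colTile)
      for (int r = i; r < std::min(i + rowTile, rows); ++r)
        ++counts[r]; // the fused producer slice for rows [i, i + rowTile)
  return counts;
}

// The original producer value can only be reassembled from the loop body if
// no slice of it is computed more than once.
bool noRedundantProducerSlices(const std::vector<int> &counts) {
  return std::all_of(counts.begin(), counts.end(),
                     [](int c) { return c == 1; });
}
```

For a 4x6 consumer, tiling by (2, 3) computes every producer row twice, once per column tile, whereas tiling by (2, 6) computes each row exactly once, which is the situation in which reconstruction is possible.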

Definition at line 612 of file TileUsingInterface.cpp.

References mlir::tensor::getOrCreateDestination(), mlir::succeeded(), updateDestinationOperandsForTiledOp(), and yieldTiledValues().