mlir.dialects._structured_transform_ops_gen

Attributes

_ods_ir

Classes

ApplyDecomposeTensorPackUnpackPatternsOp

Collect patterns to decompose linalg.pack and linalg.unpack into e.g.

ApplyDecomposeTensorPadPatternsOp

Collect patterns to decompose tensor.pad into e.g. tensor::EmptyOp,

ApplyEraseUnnecessaryInputsPatternsOp

Collects patterns that promote inputs to outputs and remove unused inputs of

ApplyFoldAddIntoDestPatternsOp

Collects patterns to replace linalg.add when destination passing suffices

ApplyFoldIntoPackAndUnpackPatternsOp

Indicates that operations like tensor.pad and tensor.extract_slice should

ApplyFoldPackUnpackIntoEmptyPatternsOp

// TODO:

ApplyFoldUnitExtentDimsViaReshapesPatternsOp

Collects patterns to fold unit-extent dimensions in operands/results of

ApplyFoldUnitExtentDimsViaSlicesPatternsOp

Collects patterns to fold unit-extent dimensions in operands/results of

ApplyPadVectorizationPatternsOp

Apply patterns that vectorize tensor.pad.

ApplyTilingCanonicalizationPatternsOp

Collects canonicalization patterns relevant to apply after tiling patterns.

BufferizeToAllocationOp

This transform bufferizes the targeted operation and materializes the

ContinuousTileSizesOp

This transform emits the IR computing the list of (1) exponentially

ConvertConv2DToImg2ColOp

Convert linalg.conv_2d_xxx into linalg.generic (for img2col packing)

ConvertToLoopsOp

For operations that implement the TilingInterface, and implement

DecomposeInterfaceOp

TODO

DecomposeOp

Decomposes named complex operations, such as higher-dimensional

DecomposeWinogradOp

Decompose winograd operations. It will convert filter, input and output

EliminateLinalgOpAnchoredEmptyTensorsOp

Try to eliminate all tensor.empty op uses that are anchored on a LinalgOp

FlattenElementwiseLinalgOp

Flattens the iteration space and (applicable) operands of elementwise

FuseIntoContainingOp

Fuses the producer_op into the containing_op.

FuseOp

Tiles the operations pointed to by the target handle and fuses their

GeneralizeOp

Transforms a named structured operation into the generic form with the

HoistPadBuildPackingLoopNestOp

Helper transform used to hoist a tensor.pad target operation. This operation

HoistPadOp

Hoist the tensor.pad target operation by at most the given number of loops.

HoistRedundantVectorBroadcastsOp

Hoist vector.extract / vector.broadcasts pairs out of immediately

HoistRedundantVectorTransfersOp

Hoist vector.transfer_read / vector.transfer_write pairs out of immediately

InsertSliceToCopyOp

Targeted rewrite of an tensor.insert_slice to linalg.copy.

InterchangeOp

Interchanges the iterators of the operations pointed to by the target handle

LinalgCopyToMemrefOp

Targeted rewrite of a linalg.copy on memrefs to a memref.copy.

LowerPackOp

Rewrite a linalg.pack into tensor.pad + tensor.expand_shape + linalg.transpose.

LowerUnPackOp

Lower a linalg.unpack into empty + linalg.transpose + tensor.collapse_shape +

MapCopyToThreadsOp

Targeted mapping of a linalg.copy / tensor.pad operation on tensors to a GPU

MatchOp

Match op with the specified constraints, within the target op.

MultiTileSizesOp

Emits the IR computing the tile sizes s1 and s2 such that:

PackGreedilyOp

Target a Linalg op and rewrite it into packed LinalgOp form by trying to

PackOp

Pack a LinalgOp by applying a data tiling transformation on the op and

PackTransposeOp

Apply a transposition to a single linalg.pack (resp. linalg.unpack) and

PadOp

Pads the operations pointed to by the target handle using the options

PadTilingInterfaceOp

Pads the iteration domain of the operations pointed to by the target

PromoteOp

Promotes the specified operands of the target into a separate memory buffer.

PromoteTensorOp

Requests that a tensor value lives in a specific memory space for its

ReplaceOp

Replace all target payload ops with the single op that is contained in

RewriteInDestinationPassingStyleOp

Rewrite a supported tensor operation that is not in destination-passing style

ScalarizeOp

Indicates that ops of a specific kind in the given function should be

SpecializeOp

Transforms a generic operation into the equivalent named form.

SplitOp

Splits the given target op into two or more complementary

SplitReductionOp

Indicates that the given target op should be transformed with the

TileReductionUsingForOp

Indicates that the given target op should be transformed with the

TileReductionUsingForallOp

Tile a PartialReductionOpInterface op to a tiled scf.forall doing

TileUsingForOp

Indicates that the given target op should be tiled with the given sizes.

TileUsingForallOp

Tile a TilingInterface op to a tiled scf.forall.

TransposeConv2DOp

Convert linalg.conv_2d_nhwc_fhwc into linalg.conv_2d_nhwc_hwcf by introducing

TransposeMatmulOp

Convert Linalg matmul ops to transposed variants.

VectorizeChildrenAndApplyPatternsOp

Vectorizes all children contained in the given target using the

VectorizeOp

Vectorize the target ops, which must be Linalg ops.

WinogradConv2DOp

Winograd Conv2D algorithm will convert linalg Conv2D operation into batched

Functions

apply_patterns_linalg_decompose_pack_unpack(...)

apply_patterns_linalg_decompose_pad(...)

apply_patterns_linalg_erase_unnecessary_inputs(...)

apply_patterns_linalg_fold_add_into_dest(...)

apply_patterns_tensor_fold_into_pack_and_unpack(...)

apply_patterns_linalg_fold_pack_unpack_into_empty(...)

apply_patterns_linalg_fold_unit_extent_dims_via_reshapes(...)

apply_patterns_linalg_fold_unit_extent_dims_via_slices(...)

apply_patterns_linalg_pad_vectorization(...)

apply_patterns_linalg_tiling_canonicalization(...)

structured_bufferize_to_allocation(→ _ods_ir)

structured_continuous_tile_sizes(→ _ods_ir)

structured_convert_conv2d_to_img2col(→ _ods_ir)

structured_convert_to_loops(→ _ods_ir)

structured_decompose_interface(→ _ods_ir)

structured_decompose(→ _ods_ir)

structured_decompose_winograd_op(→ _ods_ir)

structured_eliminate_empty_tensors(...)

structured_flatten_elementwise(→ _ods_ir)

structured_fuse_into_containing_op(→ _ods_ir)

structured_fuse(→ Union[_ods_ir, _ods_ir, FuseOp])

structured_generalize(→ _ods_ir)

structured_hoist_pad_build_packing_loop_nest(→ _ods_ir)

structured_hoist_pad(→ _ods_ir)

structured_hoist_redundant_vector_broadcasts(→ _ods_ir)

structured_hoist_redundant_vector_transfers(→ _ods_ir)

structured_insert_slice_to_copy(→ _ods_ir)

structured_interchange(→ _ods_ir)

structured_linalg_copy_to_memref(→ _ods_ir)

structured_lower_pack(→ _ods_ir)

structured_lower_unpack(→ _ods_ir)

structured_gpu_map_copy_to_threads(→ _ods_ir)

structured_match(→ _ods_ir)

structured_multitile_sizes(→ _ods_ir)

structured_pack_greedily(→ _ods_ir)

structured_pack(→ _ods_ir)

structured_pack_transpose(→ _ods_ir)

structured_pad(→ _ods_ir)

structured_pad_tiling_interface(→ _ods_ir)

structured_promote(→ _ods_ir)

structured_promote_tensor(→ _ods_ir)

structured_replace(→ _ods_ir)

structured_rewrite_in_destination_passing_style(→ _ods_ir)

structured_scalarize(→ _ods_ir)

structured_specialize(→ _ods_ir)

structured_split(→ _ods_ir)

structured_split_reduction(→ _ods_ir)

structured_tile_reduction_using_for(→ Union[_ods_ir, ...)

structured_tile_reduction_using_forall(...)

structured_tile_using_for(→ Union[_ods_ir, _ods_ir, ...)

structured_tile_using_forall(→ _ods_ir)

structured_transpose_conv2d(→ _ods_ir)

structured_transpose_matmul(→ _ods_ir)

structured_vectorize_children_and_apply_patterns(→ _ods_ir)

structured_vectorize(→ VectorizeOp)

structured_winograd_conv2d(→ _ods_ir)

Module Contents

mlir.dialects._structured_transform_ops_gen._ods_ir
class mlir.dialects._structured_transform_ops_gen.ApplyDecomposeTensorPackUnpackPatternsOp(*, loc=None, ip=None)

Bases: _ods_ir

Collect patterns to decompose linalg.pack and linalg.unpack into e.g. tensor::PadOp and linalg::TransposeOp ops. Requires all outer dims to be unit.

OPERATION_NAME = 'transform.apply_patterns.linalg.decompose_pack_unpack'
_ODS_REGIONS = (0, True)
mlir.dialects._structured_transform_ops_gen.apply_patterns_linalg_decompose_pack_unpack(*, loc=None, ip=None) ApplyDecomposeTensorPackUnpackPatternsOp
class mlir.dialects._structured_transform_ops_gen.ApplyDecomposeTensorPadPatternsOp(*, loc=None, ip=None)

Bases: _ods_ir

Collect patterns to decompose tensor.pad into e.g. tensor::EmptyOp, linalg::FillOp and tensor::InsertSliceOp.

OPERATION_NAME = 'transform.apply_patterns.linalg.decompose_pad'
_ODS_REGIONS = (0, True)
mlir.dialects._structured_transform_ops_gen.apply_patterns_linalg_decompose_pad(*, loc=None, ip=None) ApplyDecomposeTensorPadPatternsOp
class mlir.dialects._structured_transform_ops_gen.ApplyEraseUnnecessaryInputsPatternsOp(*, loc=None, ip=None)

Bases: _ods_ir

Collects patterns that promote inputs to outputs and remove unused inputs of linalg.generic ops.

OPERATION_NAME = 'transform.apply_patterns.linalg.erase_unnecessary_inputs'
_ODS_REGIONS = (0, True)
mlir.dialects._structured_transform_ops_gen.apply_patterns_linalg_erase_unnecessary_inputs(*, loc=None, ip=None) ApplyEraseUnnecessaryInputsPatternsOp
class mlir.dialects._structured_transform_ops_gen.ApplyFoldAddIntoDestPatternsOp(*, loc=None, ip=None)

Bases: _ods_ir

Collects patterns to replace linalg.add when destination passing suffices for achieving the sum.

OPERATION_NAME = 'transform.apply_patterns.linalg.fold_add_into_dest'
_ODS_REGIONS = (0, True)
mlir.dialects._structured_transform_ops_gen.apply_patterns_linalg_fold_add_into_dest(*, loc=None, ip=None) ApplyFoldAddIntoDestPatternsOp
class mlir.dialects._structured_transform_ops_gen.ApplyFoldIntoPackAndUnpackPatternsOp(*, loc=None, ip=None)

Bases: _ods_ir

Indicates that operations like tensor.pad and tensor.extract_slice should be folded into linalg.pack and linalg.unpack operations, respectively.

OPERATION_NAME = 'transform.apply_patterns.tensor.fold_into_pack_and_unpack'
_ODS_REGIONS = (0, True)
mlir.dialects._structured_transform_ops_gen.apply_patterns_tensor_fold_into_pack_and_unpack(*, loc=None, ip=None) ApplyFoldIntoPackAndUnpackPatternsOp
class mlir.dialects._structured_transform_ops_gen.ApplyFoldPackUnpackIntoEmptyPatternsOp(*, fold_single_use_only=None, loc=None, ip=None)

Bases: _ods_ir

// TODO:

OPERATION_NAME = 'transform.apply_patterns.linalg.fold_pack_unpack_into_empty'
_ODS_REGIONS = (0, True)
fold_single_use_only() _ods_ir
mlir.dialects._structured_transform_ops_gen.apply_patterns_linalg_fold_pack_unpack_into_empty(*, fold_single_use_only=None, loc=None, ip=None) ApplyFoldPackUnpackIntoEmptyPatternsOp
class mlir.dialects._structured_transform_ops_gen.ApplyFoldUnitExtentDimsViaReshapesPatternsOp(*, loc=None, ip=None)

Bases: _ods_ir

Collects patterns to fold unit-extent dimensions in operands/results of linalg ops on tensors via reassociative reshape ops.

OPERATION_NAME = 'transform.apply_patterns.linalg.fold_unit_extent_dims_via_reshapes'
_ODS_REGIONS = (0, True)
mlir.dialects._structured_transform_ops_gen.apply_patterns_linalg_fold_unit_extent_dims_via_reshapes(*, loc=None, ip=None) ApplyFoldUnitExtentDimsViaReshapesPatternsOp
class mlir.dialects._structured_transform_ops_gen.ApplyFoldUnitExtentDimsViaSlicesPatternsOp(*, loc=None, ip=None)

Bases: _ods_ir

Collects patterns to fold unit-extent dimensions in operands/results of linalg ops on tensors via rank-reducing slices.

OPERATION_NAME = 'transform.apply_patterns.linalg.fold_unit_extent_dims_via_slices'
_ODS_REGIONS = (0, True)
mlir.dialects._structured_transform_ops_gen.apply_patterns_linalg_fold_unit_extent_dims_via_slices(*, loc=None, ip=None) ApplyFoldUnitExtentDimsViaSlicesPatternsOp
class mlir.dialects._structured_transform_ops_gen.ApplyPadVectorizationPatternsOp(*, loc=None, ip=None)

Bases: _ods_ir

Apply patterns that vectorize tensor.pad.

These patterns rewrite tensor.pad Ops using vector.transfer_read and vector.transfer_write operations. This is done either by:

  1. Folding tensor.pad with an existing vector.transfer_read / vector.transfer_write op (generated prior to running these patterns).

  2. Rewriting it (when matched together with a tensor.insert_slice consumer op) as a vector.transfer_read + vector.transfer_write pair.

In both cases, these patterns look at producers and consumers for the matched tensor.pad Op to find opportunities for vectorization.

OPERATION_NAME = 'transform.apply_patterns.linalg.pad_vectorization'
_ODS_REGIONS = (0, True)
mlir.dialects._structured_transform_ops_gen.apply_patterns_linalg_pad_vectorization(*, loc=None, ip=None) ApplyPadVectorizationPatternsOp
class mlir.dialects._structured_transform_ops_gen.ApplyTilingCanonicalizationPatternsOp(*, loc=None, ip=None)

Bases: _ods_ir

Collects canonicalization patterns relevant to apply after tiling patterns.

OPERATION_NAME = 'transform.apply_patterns.linalg.tiling_canonicalization'
_ODS_REGIONS = (0, True)
mlir.dialects._structured_transform_ops_gen.apply_patterns_linalg_tiling_canonicalization(*, loc=None, ip=None) ApplyTilingCanonicalizationPatternsOp
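
Each of the apply_patterns_* helpers above inserts a single pattern-collection op at the current insertion point. A minimal, hedged sketch of how they might be used follows, assuming an active MLIR context, location and insertion point inside the body region of a transform.apply_patterns op, all set up elsewhere (user code typically reaches these helpers through the re-exporting mlir.dialects.transform.structured module rather than this private generated one):

# Assumes the surrounding transform.apply_patterns op and insertion point exist.
from mlir.dialects import _structured_transform_ops_gen as structured_ops

# Each call appends one pattern-collection op into the apply_patterns region.
structured_ops.apply_patterns_linalg_decompose_pad()
structured_ops.apply_patterns_linalg_pad_vectorization()
structured_ops.apply_patterns_linalg_tiling_canonicalization()
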
class mlir.dialects._structured_transform_ops_gen.BufferizeToAllocationOp(target, *, memory_space=None, memcpy_op=None, alloc_op=None, bufferize_destination_only=None, emit_dealloc=None, results=None, loc=None, ip=None)

Bases: _ods_ir

This transform bufferizes the targeted operation and materializes the result in a new allocation. It replaces all original uses of the target result with the newly allocated buffer, wrapped in a bufferization.to_tensor op. It returns a handle to the newly allocated buffer. Furthermore, it returns a handle that is mapped to all newly created ops.

Only bufferizable ops that bufferize to a memory write or have an aliasing OpOperand (and do not themselves bufferize to an allocation) are supported. They are bufferized using their BufferizableOpInterface implementation. E.g.:

%0 = tensor.insert %f into %dest[%pos] : tensor<10xf32>

is bufferized to:

%alloc = memref.alloc() : memref<10xf32>
bufferization.materialize_in_destination %dest in %alloc
memref.store %f, %alloc[%pos] : memref<10xf32>
%0 = bufferization.to_tensor %alloc restrict writable : memref<10xf32>

Selected ops that bufferize to an allocation (or need special handling) are also supported:

  • tensor.pad is lowered to an allocation, followed by a linalg.fill and a buffer copy (all on memrefs).

  • vector.mask is bufferized together with its region. The allocation is placed in front of the vector.mask op.

An optional memory space attribute can be specified for the materialized buffer allocation.

If a memory copy is needed, a “bufferization.materialize_in_destination” is used when possible. This is an op with tensor semantics that will bufferize to a memory copy later. Which concrete op will be used for the memory copy is up to the bufferization framework. Alternatively, a custom memcpy op can be specified via memcpy_op. Currently supported are “memref.copy” and “linalg.copy”. In that case, the source of each memcpy must not have a custom memory space. Furthermore, because the future buffer layout is unknown for a given tensor, a fully dynamic layout is assumed for best compatibility. Users should use “bufferization.materialize_in_destination” when possible.

“memref.alloc” is used for new buffer allocations. The buffer is deallocated at the end of the block if the “emit_dealloc” attribute is present. If this attribute is not present, the allocated memory will be leaked. However, running the -buffer-deallocation-pipeline after all bufferization is done will properly insert the corresponding deallocation(s). Custom allocation ops can be specified via alloc_op. Currently supported are “memref.alloc” and “memref.alloca”. In case of a “memref.alloca”, the buffer is not deallocated.

If bufferize_destination_only is set, only the destination operands of the op are bufferized to a new memory allocation, but not the op itself.

Return modes

This operation consumes the target handle and produces the allocated_buffer and new_ops handles. It always succeeds.

OPERATION_NAME = 'transform.structured.bufferize_to_allocation'
_ODS_REGIONS = (0, True)
target() _ods_ir
memory_space() _ods_ir | None
memcpy_op() _ods_ir
alloc_op() _ods_ir
bufferize_destination_only() bool
emit_dealloc() bool
allocated_buffer() _ods_ir
new_ops() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_bufferize_to_allocation(target, *, memory_space=None, memcpy_op=None, alloc_op=None, bufferize_destination_only=None, emit_dealloc=None, results=None, loc=None, ip=None) _ods_ir
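
A hedged usage sketch follows, under the same setup assumptions as the earlier sketch (active context, location and insertion point inside a transform sequence). It assumes pad_handle is a hypothetical transform handle to tensor.pad payload ops (e.g. produced by structured_match earlier in the sequence); attribute-typed arguments may require explicit ir.Attribute objects depending on the bindings version.

from mlir.dialects import _structured_transform_ops_gen as structured_ops

bufferize = structured_ops.BufferizeToAllocationOp(
    pad_handle,                # consumed target handle (hypothetical name)
    memcpy_op="linalg.copy",   # use linalg.copy instead of the default materialize_in_destination
    emit_dealloc=True,         # also emit the matching deallocation
)
allocated = bufferize.allocated_buffer  # handle to the new buffer allocation
created = bufferize.new_ops             # handle to all newly created ops
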
class mlir.dialects._structured_transform_ops_gen.ContinuousTileSizesOp(tile_sizes, chunk_sizes, target, dimension, target_size, *, loc=None, ip=None)

Bases: _ods_ir

This transform emits the IR computing the list of (1) exponentially diminishing tile sizes that are powers of 2; and (2) the corresponding chunk-sizes the target op should be split into along the given dimension.

For example, for target_size 9, and dimension 0 for the following linalg op as target

%0 = linalg.matmul  ins(%arg0, %arg1: tensor<25x34xf32>, tensor<34x25xf32>)
                outs(%arg2: tensor<25x25xf32>)

the first result tile_sizes will be a list of diminishing tile sizes 9, 4, 2, 1; and the second result will be a list of chunk sizes 18, 4, 2, 1 that the corresponding dimension should be split into.

After the target op has been split along the given dimension (for example using multiway split), each chunk can be tiled with the corresponding tile size in the tile_sizes list generated as a result of this op.

Specifying the output type as !transform.param will cause tile_sizes and chunk_sizes to be computed statically and not dynamically.

OPERATION_NAME = 'transform.structured.continuous_tile_sizes'
_ODS_REGIONS = (0, True)
target() _ods_ir
dimension() _ods_ir
target_size() _ods_ir
tile_sizes() _ods_ir
chunk_sizes() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_continuous_tile_sizes(tile_sizes, chunk_sizes, target, dimension, target_size, *, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.ConvertConv2DToImg2ColOp(img2col_tensor, transformed, target, *, loc=None, ip=None)

Bases: _ods_ir

Convert linalg.conv_2d_xxx into linalg.generic (for img2col packing) and linalg.matmul.

A convolution operation can be written as a matrix-matrix multiplication by unfolding the cross-correlation between input and filter and explicitly copying overlapped sliding window inputs.

Consider 2D input X with single channel input and output and 2x2 filter W:

[x(0, 0)  , x(0, 1)  , ...,   x(0, n)  ]
[x(1, 0)  , x(1, 1)  , ...,   x(1, n)  ]
[.        ,  .       ,.   ,      .     ]            [w(0, 0), w(0, 1)]
[.        ,  .       , .  ,      .     ]    (conv)  [w(1, 0), w(1, 1)]
[.        ,  .       ,   .,      .     ]
[x(n-1, 0), x(n-1, 1), ..., x(n-1, n-1)]

The packed input data (img2col) is a matrix with |rows| = output spatial size, |columns| = filter spatial size. To compute the output Y(i, j) we need to calculate the dot product between the filter window at input X(x, y) and the filter, which will look like the following where the r.h.s. is the img2col matrix and the l.h.s. is the flattened filter:

[x(0,0), x(0,1), x(1,0), x(1,1)]
[x(0,1), x(1,1), x(0,2), x(1,2)] (matmul) [w(0,0), w(0,1), w(1,0), w(1,1)]
[x(0,1), x(1,1), x(0,2), x(1,2)]
[   .  ,    .  ,    .  ,    .  ]

In general, for the 2D case with (N, H, W, C) input and (Kh, Kw, C, D) filter and (N, Ho, Wo, D) output, the convolution is the matrix-matrix multiplication (Ho x Wo, Kh x Kw x C) * (Kh x Kw x C, D) for each of the N inputs. For the case where N > 1, it is a batched matrix-matrix multiplication.

Returns two handles:

  • One on the operation that produces the img2col tensor.

  • One on the final operation of the sequence that replaces the original convolution.

Return modes:

Returns a definite failure if target is not isolated from above. Returns a silenceable failure if the pattern application failed.

OPERATION_NAME = 'transform.structured.convert_conv2d_to_img2col'
_ODS_REGIONS = (0, True)
target() _ods_ir
img2col_tensor() _ods_ir
transformed() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_convert_conv2d_to_img2col(img2col_tensor, transformed, target, *, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.ConvertToLoopsOp(result, target, *, loc=None, ip=None)

Bases: _ods_ir

For operations that implement the TilingInterface, and implement the generateScalarImplementation method, lowers the operation to loops. The return handle points to all generated loops. Fails if the payload ops cannot be lowered to loops.

OPERATION_NAME = 'transform.structured.convert_to_loops'
_ODS_REGIONS = (0, True)
target() _ods_ir
result() _ods_ir

Shortcut to get an op result if it has only one (throws an error otherwise).

mlir.dialects._structured_transform_ops_gen.structured_convert_to_loops(result, target, *, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.DecomposeInterfaceOp(transformed, target, *, loc=None, ip=None)

Bases: _ods_ir

TODO

OPERATION_NAME = 'transform.structured.decompose_interface'
_ODS_REGIONS = (0, True)
target() _ods_ir
transformed() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_decompose_interface(transformed, target, *, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.DecomposeOp(transformed, target, *, loc=None, ip=None)

Bases: _ods_ir

Decomposes named complex operations, such as higher-dimensional (depthwise) convolutions, into combinations of lower-dimensional equivalents when possible.

Return modes

This operation ignores non-Linalg ops and drops them in the return. If all the operations referred to by the target handle decompose properly, the transform succeeds. Otherwise the transform produces a silenceable failure. The return handle points to only the subset of successfully produced computational operations, which can be empty.

OPERATION_NAME = 'transform.structured.decompose'
_ODS_REGIONS = (0, True)
target() _ods_ir
transformed() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_decompose(transformed, target, *, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.DecomposeWinogradOp(transformed, target, *, loc=None, ip=None)

Bases: _ods_ir

Decompose Winograd operations. It will convert filter, input and output transform operations into a combination of scf, tensor, and linalg equivalent operations. Before applying this transform operation, users need to tile Winograd transform operations into supported sizes.

Return modes:

This operation fails if target is unsupported. Otherwise, the operation succeeds and returns a handle of the sequence that replaces the original operations.

OPERATION_NAME = 'transform.structured.decompose_winograd_op'
_ODS_REGIONS = (0, True)
target() _ods_ir
transformed() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_decompose_winograd_op(transformed, target, *, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.EliminateLinalgOpAnchoredEmptyTensorsOp(target, *, loc=None, ip=None)

Bases: _ods_ir

Try to eliminate all tensor.empty op uses that are anchored on a LinalgOp within the targeted op.

This op is similar to bufferization.eliminate_empty_tensors, but specific to LinalgOps.

tensor.empty ops cannot be bufferized. They can either be converted to bufferization.alloc_tensor or replaced with another tensor (via this transform). tensor.empty does not specify the contents of the returned tensor, so its result can be replaced with an arbitrary tensor value as long as the dimensions match.

This transform looks for tensor.empty ops where the SSA use-def chain of the result ends in a supported LinalgOp (always following the aliasing OpOperand/OpResult chain). The following LinalgOps are supported:

  • Only parallel iterator types.

  • The use-def chain ends in an input operand of the LinalgOp.

  • The LinalgOp has an unused output operand with the same shape and indexing map.

Example:

%0 = tensor.empty()
%1 = linalg.matmul ins(...) outs(%0)
%2 = linalg.generic ins(%1) outs(%dest) {
  ^bb0(%in: f32, %out: f32):
  // out not used
}

Is rewritten with:

%0 = tensor.empty()
%1 = linalg.matmul ins(...) outs(%dest)
%2 = linalg.generic ins(%0) outs(%1) {
  ^bb0(%in: f32, %out: f32):
  // Use %out instead of %in
}

After this transformation, the “ins” operand has no uses inside the body of the LinalgOp and can be folded away with existing cleanup patterns. Afterwards, the tensor::EmptyOp can also fold away, so that the example can bufferize without an allocation (in the absence of other conflicts).

Return modes

This transform reads the target handle and modifies the payload. It does not produce any handle.

OPERATION_NAME = 'transform.structured.eliminate_empty_tensors'
_ODS_REGIONS = (0, True)
target() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_eliminate_empty_tensors(target, *, loc=None, ip=None) EliminateLinalgOpAnchoredEmptyTensorsOp
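
A minimal sketch, under the same setup assumptions as the earlier sketches, where func_handle is a hypothetical handle to the payload func.func (e.g. obtained with structured_match):

from mlir.dialects import _structured_transform_ops_gen as structured_ops

# Constructs the transform op; at apply time it reads the handle and rewrites
# anchored tensor.empty uses, producing no result handles.
structured_ops.structured_eliminate_empty_tensors(func_handle)
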
class mlir.dialects._structured_transform_ops_gen.FlattenElementwiseLinalgOp(transformed, target, *, loc=None, ip=None)

Bases: _ods_ir

Flattens the iteration space and (applicable) operands of elementwise linalg ops to a single dimension.

Returns one handle:

  • Flattened linalg operation.

Return modes:

Returns a definite failure if target is not isolated from above. Returns a silenceable failure if the pattern application failed.

OPERATION_NAME = 'transform.structured.flatten_elementwise'
_ODS_REGIONS = (0, True)
target() _ods_ir
transformed() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_flatten_elementwise(transformed, target, *, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.FuseIntoContainingOp(fused_op, new_containing_op, producer_op, containing_op, *, loc=None, ip=None)

Bases: _ods_ir

Fuses the producer_op into the containing_op. Returns a handle to the fused ops and the new_containing_op.

The producer is typically a slice of a tileable op (i.e., implements TilingInterface). In that case, this transform computes the accessed producer slice inside of the containing op (“tile and fuse”) and if required, creates a new containing op with outputs from the fused producer. Otherwise, the entire producer is cloned inside the containing op (“clone and fuse”).

The containing op handle must be associated with exactly one payload op. The producer op handle may be associated with multiple payload ops. This transform fuses producers one-by-one, always picking an unspecified producer that has at least one use inside the containing op among the producers. A producer can be listed multiple times in the handle.

Note: If a producer has multiple uses inside the containing op, it is currently tiled and/or cloned multiple times into the containing op. TODO: Reuse already fused OpResults instead of tiling/cloning a second time when possible. Fuse producers according to a topological sorting to achieve the largest amount of reuse.

Return modes

If at least one producer could not be fused, this operation produces a silenceable failure. This is the case when tiling fails or when no producer op could be found among the remaining producers that has at least one use within the containing op. I.e., “producers” that are not consumed within the containing op are rejected by this operation.

This operation consumes the producer handle. This operation only reads the containing op handle.

OPERATION_NAME = 'transform.structured.fuse_into_containing_op'
_ODS_REGIONS = (0, True)
producer_op() _ods_ir
containing_op() _ods_ir
fused_op() _ods_ir
new_containing_op() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_fuse_into_containing_op(fused_op, new_containing_op, producer_op, containing_op, *, loc=None, ip=None) _ods_ir
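
A hedged sketch of the typical tile-and-fuse pairing, under the same setup assumptions as the earlier sketches; fill_handle and forall_handle are hypothetical handles to a producer linalg.fill and the containing scf.forall (e.g. obtained from an earlier tiling step), and !transform.any_op is used for both results:

from mlir.dialects import transform
from mlir.dialects import _structured_transform_ops_gen as structured_ops

any_op_t = transform.AnyOpType.get()
fuse = structured_ops.FuseIntoContainingOp(
    any_op_t, any_op_t,           # result types: fused_op, new_containing_op
    fill_handle, forall_handle)   # consumed producer, read containing op
fused = fuse.fused_op
new_forall = fuse.new_containing_op
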
class mlir.dialects._structured_transform_ops_gen.FuseOp(transformed, loops, target, tile_sizes, tile_interchange, *, static_tile_sizes=None, static_tile_interchange=None, apply_cleanup=None, use_forall=None, loc=None, ip=None)

Bases: _ods_ir

Tiles the operations pointed to by the target handle and fuses their producers greedily using the options provided as attributes. Tile sizes and loop interchange permutation can be provided as either static attributes or dynamic values (transform parameters or payload handles).

If apply_cleanup is true, then slice canonicalization is applied between fusion steps. If use_forall is true, then the tiling method generates an scf.forall loop instead of scf.for loops.

OPERATION_NAME = 'transform.structured.fuse'
_ODS_OPERAND_SEGMENTS
_ODS_REGIONS = (0, True)
target() _ods_ir
tile_sizes() _ods_ir
tile_interchange() _ods_ir
static_tile_sizes() _ods_ir | None
static_tile_interchange() _ods_ir | None
apply_cleanup() bool
use_forall() bool
transformed() _ods_ir
loops() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_fuse(transformed, loops, target, tile_sizes, tile_interchange, *, static_tile_sizes=None, static_tile_interchange=None, apply_cleanup=None, use_forall=None, loc=None, ip=None) _ods_ir | _ods_ir | FuseOp
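
A hedged sketch, under the same setup assumptions as the earlier sketches; matmul_handle is a hypothetical handle to a linalg.matmul payload op. Static tile sizes are used, so the dynamic tile_sizes / tile_interchange operands are left empty, and the dense-array attribute is assumed to accept a plain Python integer list:

from mlir.dialects import transform
from mlir.dialects import _structured_transform_ops_gen as structured_ops

any_op_t = transform.AnyOpType.get()
fuse = structured_ops.FuseOp(
    any_op_t,                   # type of the tiled-and-fused op handle
    [any_op_t, any_op_t],       # one loop handle per non-zero tile size
    matmul_handle,
    [], [],                     # no dynamic tile sizes / interchange values
    static_tile_sizes=[8, 16, 0],
    apply_cleanup=True,         # canonicalize slices between fusion steps
)
tiled = fuse.transformed
loops = fuse.loops
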
class mlir.dialects._structured_transform_ops_gen.GeneralizeOp(transformed, target, *, loc=None, ip=None)

Bases: _ods_ir

Transforms a named structured operation into the generic form with the explicit attached region.

Return modes

This operation ignores non-Linalg ops and drops them in the return. If all the operations referred to by the target handle generalize properly, the transform succeeds. Otherwise the transform produces a silenceable failure. The return handle points to only the subset of successfully produced equivalent generic operations, which can be empty or contain the original ops if they were already in generic form.

OPERATION_NAME = 'transform.structured.generalize'
_ODS_REGIONS = (0, True)
target() _ods_ir
transformed() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_generalize(transformed, target, *, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.HoistPadBuildPackingLoopNestOp(packing_loop, target, loop, *, transpose=None, loc=None, ip=None)

Bases: _ods_ir

Helper transform used to hoist a tensor.pad target operation. This operation creates the packing loop nest required by the hoist_pad operation and makes that functionality available independently.

TODO: In the future, we should consider rewriting as a linalg.pack after hoisting since this abstraction is now available.

Return modes

This operation ignores non-tensor.pad ops and drops them in the result. If any non-tensor.pad is passed, the transform emits a silenceable failure.

The return handle points to only the subset of successfully created packing loop nests, which can be empty.

OPERATION_NAME = 'transform.structured.hoist_pad.build_packing_loop_nest'
_ODS_REGIONS = (0, True)
target() _ods_ir
loop() _ods_ir
transpose() _ods_ir
packing_loop() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_hoist_pad_build_packing_loop_nest(packing_loop, target, loop, *, transpose=None, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.HoistPadOp(transformed, target, num_loops, *, transpose=None, loc=None, ip=None)

Bases: _ods_ir

Hoist the tensor.pad target operation by at most the given number of loops. Optionally apply the transpose attribute to the inner dimensions.

TODO: In the future, we should consider rewriting as a linalg.pack after hoisting since this abstraction is now available. TODO: Maybe also return the linalg.generic transpose created at some point.

Return modes

This operation ignores non-tensor.pad ops and drops them in the result. If any non-tensor.pad is passed, the transform emits a silenceable failure.

If all the operations referred to by the target handle pad properly, the transform succeeds. Otherwise the transform produces a silenceable failure.

The return handle points to only the subset of successfully hoisted tensor.pad operations, which can be empty.

OPERATION_NAME = 'transform.structured.hoist_pad'
_ODS_REGIONS = (0, True)
target() _ods_ir
num_loops() _ods_ir
transpose() _ods_ir
transformed() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_hoist_pad(transformed, target, num_loops, *, transpose=None, loc=None, ip=None) _ods_ir
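
A hedged sketch, under the same setup assumptions as the earlier sketches; pad_handle is a hypothetical handle to tensor.pad payload ops produced by a padding transform earlier in the sequence:

from mlir.dialects import transform
from mlir.dialects import _structured_transform_ops_gen as structured_ops

# Hoist each tensor.pad out of at most two enclosing loops.
hoisted = structured_ops.structured_hoist_pad(
    transform.AnyOpType.get(), pad_handle, num_loops=2)
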
class mlir.dialects._structured_transform_ops_gen.HoistRedundantVectorBroadcastsOp(transformed, target, *, loc=None, ip=None)

Bases: _ods_ir

Hoist vector.extract / vector.broadcast pairs out of immediately enclosing scf::ForOp iteratively.

Return modes:

The operation always succeeds and returns a handle to the transformed function op.

OPERATION_NAME = 'transform.structured.hoist_redundant_vector_broadcasts'
_ODS_REGIONS = (0, True)
target() _ods_ir
transformed() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_hoist_redundant_vector_broadcasts(transformed, target, *, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.HoistRedundantVectorTransfersOp(transformed, target, *, verify_non_zero_trip=None, loc=None, ip=None)

Bases: _ods_ir

Hoist vector.transfer_read / vector.transfer_write pairs out of immediately enclosing scf::ForOp iteratively, if the following conditions are true:

  1. The 2 ops access the same memref with the same indices.

  2. All operands are invariant under the enclosing scf::ForOp.

  3. No uses of the memref either dominate the transfer_read or are dominated by the transfer_write (i.e., no aliasing between the write and the read across the loop).

WARNING: This hoisting does not model parallelism and is generally incorrect when used on distributed loops with memref semantics! TODO: obsolete and should be retired.

Return modes:

The operation always succeeds and returns a handle to the transformed function op.

OPERATION_NAME = 'transform.structured.hoist_redundant_vector_transfers'
_ODS_REGIONS = (0, True)
target() _ods_ir
verify_non_zero_trip() bool
transformed() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_hoist_redundant_vector_transfers(transformed, target, *, verify_non_zero_trip=None, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.InsertSliceToCopyOp(transformed, target, *, loc=None, ip=None)

Bases: _ods_ir

Targeted rewrite of a tensor.insert_slice to linalg.copy. This is useful to materialize copies explicitly before bufferization and transform them, avoiding the need to rediscover them after bufferization.

If the insert_slice source is already a linalg.copy, only return the source op (i.e. do not create an additional linalg.copy op).

Return modes:

The operation always succeeds and returns a handle to the relevant linalg.copy op.

OPERATION_NAME = 'transform.structured.insert_slice_to_copy'
_ODS_REGIONS = (0, True)
target() _ods_ir
transformed() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_insert_slice_to_copy(transformed, target, *, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.InterchangeOp(transformed, target, *, iterator_interchange=None, loc=None, ip=None)

Bases: _ods_ir

Interchanges the iterators of the operations pointed to by the target handle using the iterator interchange attribute.

Return modes

This operation ignores non-linalg::Generic ops and drops them in the return. This operation fails if the interchange attribute is invalid. If all the operations referred to by the target handle interchange properly, the transform succeeds. If any interchange fails, the transform produces a definite failure. The return handle points to only the subset of successfully produced interchanged operations, which can be empty.

OPERATION_NAME = 'transform.structured.interchange'
_ODS_REGIONS = (0, True)
target() _ods_ir
iterator_interchange() _ods_ir | None
transformed() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_interchange(transformed, target, *, iterator_interchange=None, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.LinalgCopyToMemrefOp(transformed, target, *, loc=None, ip=None)

Bases: _ods_ir

Targeted rewrite of a linalg.copy on memrefs to a memref.copy. This is useful when bufferizing copies to a linalg.copy, later applying some transformations, and then rewriting the copy into a memref.copy. If the element types of the source and destination differ, or if the source is a scalar, the transform produces a silenceable failure.

OPERATION_NAME = 'transform.structured.linalg_copy_to_memref'
_ODS_REGIONS = (0, True)
target() _ods_ir
transformed() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_linalg_copy_to_memref(transformed, target, *, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.LowerPackOp(pad_op, expand_shape_op, transpose_op, target, *, lowerPadLikeWithInsertSlice=None, loc=None, ip=None)

Bases: _ods_ir

Rewrite a linalg.pack into tensor.pad + tensor.expand_shape + linalg.transpose.

Return modes

This operation ignores non-pack ops and drops them in the return. This operation produces a silenceable failure if the rewrite fails for any reason. If all the operations referred to by the target are rewritten, the transform succeeds. Return handles to the newly produced pad, expand_shape and transpose ops.

OPERATION_NAME = 'transform.structured.lower_pack'
_ODS_REGIONS = (0, True)
target() _ods_ir
lowerPadLikeWithInsertSlice() _ods_ir
pad_op() _ods_ir
expand_shape_op() _ods_ir
transpose_op() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_lower_pack(pad_op, expand_shape_op, transpose_op, target, *, lower_pad_like_with_insert_slice=None, loc=None, ip=None) _ods_ir
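
A hedged sketch, under the same setup assumptions as the earlier sketches; pack_handle is a hypothetical handle to linalg.pack payload ops, and !transform.any_op is used for all three result handles:

from mlir.dialects import transform
from mlir.dialects import _structured_transform_ops_gen as structured_ops

any_op_t = transform.AnyOpType.get()
lowered = structured_ops.LowerPackOp(any_op_t, any_op_t, any_op_t, pack_handle)
pad_h = lowered.pad_op
expand_h = lowered.expand_shape_op
transpose_h = lowered.transpose_op
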
class mlir.dialects._structured_transform_ops_gen.LowerUnPackOp(empty_op, transpose_op, collapse_shape_op, extract_slice_op, target, *, lowerUnpadLikeWithExtractSlice=None, loc=None, ip=None)

Bases: _ods_ir

Lower a linalg.unpack into empty + linalg.transpose + tensor.collapse_shape + tensor.extract_slice.

Return modes

This operation ignores non-unpack ops and drops them in the return. This operation produces a silenceable failure if the rewrite fails for any reason. If all the operations referred to by the target are rewritten, the transform succeeds. Return handles to the newly produced empty, transpose, collapse_shape and extract_slice ops.

OPERATION_NAME = 'transform.structured.lower_unpack'
_ODS_REGIONS = (0, True)
target() _ods_ir
lowerUnpadLikeWithExtractSlice() _ods_ir
empty_op() _ods_ir
transpose_op() _ods_ir
collapse_shape_op() _ods_ir
extract_slice_op() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_lower_unpack(empty_op, transpose_op, collapse_shape_op, extract_slice_op, target, *, lower_unpad_like_with_extract_slice=None, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.MapCopyToThreadsOp(forall_op, tiled_op, target, total_num_threads, desired_bit_alignment, *, loc=None, ip=None)

Bases: _ods_ir

Targeted mapping of a linalg.copy / tensor.pad operation on tensors to a GPU thread mapping.

This operation implements a greedy heuristic that determines a good distribution of threads to break down the copy/pad operation into. The heuristic is driven by considerations related to the underlying architecture for which good high-level decisions are needed assuming certain hardware features. Relevant features are exposed via first-class attributes to control the behavior of the transformation at a high level.

For now, a single heuristic is implemented and can be extended on a per-need basis.

Return modes

This operation fails definitely if there is an unsupported op (i.e., not linalg.copy / tensor.pad) among the targeted ops. Otherwise, the operation always succeeds and returns a handle to the relevant tiled linalg.copy / tensor.pad op and the enclosing scf.forall op.

OPERATION_NAME = 'transform.structured.gpu.map_copy_to_threads'
_ODS_REGIONS = (0, True)
target() _ods_ir
total_num_threads() _ods_ir
desired_bit_alignment() _ods_ir
forall_op() _ods_ir
tiled_op() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_gpu_map_copy_to_threads(forall_op, tiled_op, target, total_num_threads, desired_bit_alignment, *, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.MatchOp(results_, target, *, ops=None, interface=None, op_attrs=None, filter_result_type=None, filter_operand_types=None, loc=None, ip=None)

Bases: _ods_ir

Match op with the specified constraints, within the target op.

The following constraints are supported:

  • interface: an optional MatchInterfaceEnum specifying an enum representation for an interface to target.

  • ops: an optional StrArrayAttr specifying the concrete name of an op. Multiple names can be specified. Matched ops must have one of the specified names.

  • op_attrs: the matched op must have all specified attributes (with their specified values).

  • filter_result_type: the matched op must return exactly this one type.

  • filter_operand_types: all the operands of the matched op must be of this type. If more than one type is specified, then the length of the list must be equal to the number of operands in the matched op, and the match will succeed only if the operand types match all the types in the list in the order in which they are specified.

Note: Only ops that satisfy all specified constraints are matched.

TODO: Extend with regions to allow a limited form of constraints.

Return modes

This op traverses the ops nested under target and returns the handles to all the operations that match the requirements.

This op fails if the target is not a handle to exactly one operation. Otherwise it succeeds.

This operation does not consume the target handle and produces new handles: it is a navigation op.

OPERATION_NAME = 'transform.structured.match'
_ODS_REGIONS = (0, True)
target() _ods_ir
ops() _ods_ir | None
interface() _ods_ir | None
op_attrs() _ods_ir | None
filter_result_type() _ods_ir | None
filter_operand_types() _ods_ir | None
results_() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_match(results_, target, *, ops=None, interface=None, op_attrs=None, filter_result_type=None, filter_operand_types=None, loc=None, ip=None) _ods_ir
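
A hedged sketch, under the same setup assumptions as the earlier sketches; root is assumed to be the block argument of the enclosing transform sequence, and the StrArrayAttr-typed ops argument is assumed to accept a plain Python list of op names via the generated attribute builders:

from mlir.dialects import transform
from mlir.dialects import _structured_transform_ops_gen as structured_ops

any_op_t = transform.AnyOpType.get()
# Handle to every linalg.matmul nested under the payload mapped to `root`.
matmuls = structured_ops.structured_match(any_op_t, root, ops=["linalg.matmul"])
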
class mlir.dialects._structured_transform_ops_gen.MultiTileSizesOp(low_size, high_size, split_point, target, dimension, target_size, *, divisor=None, loc=None, ip=None)

Bases: _ods_ir

Emits the IR computing the tile sizes s1 and s2 such that:

  • there exists a combination of n tiles of size s1 and m tiles of size s2 that covers the entirety of the iteration space dimension of the target structured op;

  • s1 and s2 are less than or equal to target_size;

  • s1 and s2 are divisible by divisor.

For example, for a dimension of size 54 with target size 12 and divisor 2, this can emit the IR computing the tile size 10, used for 3 tiles, and 12, used for 2 tiles, for a total of 10*3 + 12*2 = 54. Note that when the divisor does not divide the original dimension size, it is impossible to compute such tile sizes. An assertion is emitted to guard against this in the dynamic case.

Expects the target size and the divisor to be strictly positive. Folds the IR as much as possible, normally obtaining constant sizes and numbers of tiles for a statically known dimension.

This does not consume the target handle and produces three handles each pointing to single-result index-typed operations (which may be arithmetic constant operations) defining the two respective tile sizes and the product of the first tile size with the number of tiles of that size (useful for splitting the iteration space).

This operation composes with the regular tiling when applied per-dimension:

%sz1, %sz2, %split = structured.multitile_sizes %target
                     { target_size = 10, dimension = 1 }
                   : !transform.any_op, !transform.param<i64>,
                     !transform.param<i64>, !transform.param<i64>
%handles = structured.split %target after %split { dimension = 1 }
            : !transform.any_op, !transform.param<i64>
%low, %high = transform.split_handle %handles : (!transform.any_op)
                  -> (!transform.any_op, !transform.any_op)
%tiled_low, %loop1 = structured.tile_using_for %low [0, %sz1]
                   : (!transform.any_op, !transform.param<i64>)
                  -> (!transform.any_op, !transform.any_op)
%tiled_high, %loop2 = structured.tile_using_for %high [0, %sz2]
                    : (!transform.any_op, !transform.param<i64>)
                   -> (!transform.any_op, !transform.any_op)
%common = merge_handles %tiled_low, %tiled_high : !transform.any_op

%sz3, %sz4, %split = structured.multitile_sizes %target
                     { target_size = 42, dimension = 0 }
                   : !transform.any_op, !transform.any_op,
                     !transform.any_op, !transform.any_op
%sz3r, %sz4r, %splitr = replicate num(%common) %sz3, %sz4, %split
         : !transform.any_op, !transform.any_op, !transform.any_op
structured.split %common after %splitr { dimension = 0 }
         : !transform.any_op, !transform.any_op
// ...
OPERATION_NAME = 'transform.structured.multitile_sizes'
_ODS_REGIONS = (0, True)
target() _ods_ir
dimension() _ods_ir
target_size() _ods_ir
divisor() _ods_ir
low_size() _ods_ir
high_size() _ods_ir
split_point() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_multitile_sizes(low_size, high_size, split_point, target, dimension, target_size, *, divisor=None, loc=None, ip=None) _ods_ir
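
A Python counterpart of the first step in the example above, hedged and under the same setup assumptions as the earlier sketches; target_handle is a hypothetical handle to the structured op, and op-handle result types (!transform.any_op) are used instead of !transform.param, so the sizes come back as handles to index-defining ops:

from mlir.dialects import transform
from mlir.dialects import _structured_transform_ops_gen as structured_ops

any_op_t = transform.AnyOpType.get()
sizes = structured_ops.MultiTileSizesOp(
    any_op_t, any_op_t, any_op_t,    # low_size, high_size, split_point handle types
    target_handle, dimension=1, target_size=10)
low, high, split = sizes.low_size, sizes.high_size, sizes.split_point
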
class mlir.dialects._structured_transform_ops_gen.PackGreedilyOp(packed_op, target, matmul_packed_sizes, *, static_matmul_packed_sizes=None, matmul_padded_sizes_next_multiple_of=None, matmul_inner_dims_order=None, loc=None, ip=None)

Bases: _ods_ir

Target a Linalg op and rewrite it into packed LinalgOp form by trying to infer whether a known suboperation is embedded.

Different packing strategies are applied in order; when one applies successfully, the transform returns:

  1. Matmul packing: try to infer a matmul operation embedded in the target op. Specifically, this looks for 2 parallel dimensions that participate in an outer product and 1 reduction dimension. These dimensions are referred to as (m, n, k) to match canonical matmul terminology. The packed sizes for (m, n, k) are specified by matmul_packed_sizes and the optional matmul_padded_sizes_next_multiple_of. When an entry matmul_packed_sizes[i] is non-zero, the corresponding dimension is packed by matmul_packed_sizes[i]. Otherwise, the dimension is merely padded to the next multiple of matmul_padded_sizes_next_multiple_of[i]. matmul_padded_sizes_next_multiple_of is optional and is expected to either be empty or of size 3, matching the size of matmul_packed_sizes. For each individual element of matmul_packed_sizes and matmul_padded_sizes_next_multiple_of, only one of them is allowed to be non-zero. The ordering of the packed dimensions (mm, nn, kk) is specified by the matmul_inner_dims_order attribute.

Packing occurs as follows:

  1. Find the dimensions to pack according to the strategy.

  2. The target is converted to linalg.generic form.

  3. An interchange transform is applied to isolate the dimensions to pack as the most minor indexing dimensions of the linalg.generic. The most minor dimensions are themselves ordered according to inner_dims_order.

  4. An elementwise traversal of matmul_packed_sizes and matmul_padded_sizes_next_multiple_of is performed and, for each dimension d, either pack to matmul_packed_sizes[d] or pad to the next multiple of matmul_padded_sizes_next_multiple_of[d].

  5. Packing/padding is performed by the amounts determined in step 4 and following inner_dims_order.

By normalizing the most minor dimensions to inner_dims_order, the transform guarantees that packing immediately generates inner dimensions in a desirable layout.

Outer dimension layout permutations are not controlled by this transform op at the moment and can be obtained by composing with the pack_transpose transformation.

Return modes

This operation ignores non-Linalg ops and drops them in the return. It returns the list of packed Linalg ops or the original op when all available packing strategies failed to apply.

OPERATION_NAME = 'transform.structured.pack_greedily'
_ODS_REGIONS = (0, True)
target() _ods_ir
matmul_packed_sizes() _ods_ir
static_matmul_packed_sizes() _ods_ir
matmul_padded_sizes_next_multiple_of() _ods_ir
matmul_inner_dims_order() _ods_ir
packed_op() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_pack_greedily(packed_op, target, matmul_packed_sizes, *, static_matmul_packed_sizes=None, matmul_padded_sizes_next_multiple_of=None, matmul_inner_dims_order=None, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.PackOp(packed_op, target, packed_sizes, *, static_packed_sizes=None, loc=None, ip=None)

Bases: _ods_ir

Pack a LinalgOp by applying a data tiling transformation on the op and packing the operands according to the packed_sizes specification.

Iterator dimensions are tiled in their canonical order in the op spec. Operands are packed according to the same canonical order of the op iterator dimensions.

Specifying a packed size of 0 for an iterator removes it from consideration for packing.

linalg.pack (resp. linalg.unpack) operations are inserted for the operands (resp. results) that need to be packed (resp. unpacked) according to the packed_sizes specification.

Example

Consider a linalg.matmul with indexing maps:

//              M   N   K       M   K
// affine_map<(d0, d1, d2) -> (d0, d2)>
//                              K   N
// affine_map<(d0, d1, d2) -> (d2, d1)>
//                              M   N
// affine_map<(d0, d1, d2) -> (d0, d1)>
%0 = linalg.matmul  ins(%A, %B: tensor<?x?xf32>, tensor<?x?xf32>)
                   outs(    %C: tensor<?x?xf32>)

Specifying packed_sizes [2, 3, 4] results in tiling the iterator dimensions M, N and K, in this order, in both the op and its operands.

//              M   N   K   m   n   k       M   K   m   k
// affine_map<(d0, d1, d2, d3, d4, d5) -> (d0, d2, d3, d5)>
//                                          K   N   n   k
// affine_map<(d0, d1, d2, d3, d4, d5) -> (d2, d1, d4, d5)>
//                                          M   N   m   n
// affine_map<(d0, d1, d2, d3, d4, d5) -> (d0, d1, d3, d4)>
%0 = linalg.generic_representing_some_higher_d_matmul
      ins(%A, %B: tensor<?x?x2x4xf32>, tensor<?x?x4x3xf32>)
     outs(    %C: tensor<?x?x2x3xf32>)

In particular, note that the second operand B has shape KxNxnxk (and not KxNxkxn as one could expect by looking only at the operand).

Other layouts can be obtained unsurprisingly from this canonical transformation by composing the resulting operation with a transform.structured.pack_transpose op. This composition allows separating concerns and composes better compared to adding additional permutation attributes to this transform op.

Return modes

This operation applies to a single Linalg op, otherwise it fails. This operation may produce a definite failure if the packing fails for any reason.

The returned handle points to the packed LinalgOp.

OPERATION_NAME = 'transform.structured.pack'
_ODS_REGIONS = (0, True)
target() _ods_ir
packed_sizes() _ods_ir
static_packed_sizes() _ods_ir
packed_op() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_pack(packed_op, target, packed_sizes, *, static_packed_sizes=None, loc=None, ip=None) _ods_ir
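
A hedged sketch mirroring the matmul example above, under the same setup assumptions as the earlier sketches; matmul_handle is a hypothetical handle to a single linalg.matmul, and the packed sizes are given statically, so the dynamic packed_sizes operand is empty:

from mlir.dialects import transform
from mlir.dialects import _structured_transform_ops_gen as structured_ops

packed = structured_ops.structured_pack(
    transform.AnyOpType.get(), matmul_handle, [],  # no dynamic packed sizes
    static_packed_sizes=[2, 3, 4])                 # (m, n, k) = (2, 3, 4)
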
class mlir.dialects._structured_transform_ops_gen.PackTransposeOp(packed_op, pack_op, un_pack_op, target_pack_or_un_pack_op, target_linalg_op, *, outer_perm=None, inner_perm=None, loc=None, ip=None)

Bases: _ods_ir

Apply a transposition to a single linalg.pack (resp. linalg.unpack) and update the linalg.generic op that consumes (resp. produces) the operation.

This transform allows composing a simple structured.pack with additional transpositions to e.g. match the data format required by a specific library call or ISA instruction.

The transpose spec must specify at least one of outer_perm or inner_perm attributes, which will act upon the outer_dims_perm or inner_dims_pos of the specified linalg.pack or linalg.unpack op.

If the target of this op is a linalg.pack then a new tensor.empty will be created along with transposed versions of the linalg.pack and the consuming linalg.generic, which is expected to be the sole consumer.

If the target of this op is a linalg.unpack then the whole pack / compute / unpack chain will be transposed and transposed clones of the linalg.pack, the consuming linalg.generic and the tail linalg.unpack will be created.

Return modes

This operation targets a single linalg.pack / linalg.unpack op and a single matching linalg.generic that consumes / produces the op. Otherwise, it produces a silenceable failure.

This operation may produce a silenceable failure if the transpose spec is ill-formed (i.e. outer_perm or inner_perm are not permutations of the proper rank) or if the transposition of all involved operations fails for any reason.

This operation returns 3 handles, one to the transformed LinalgOp, one to the transformed linalg.pack and one to the transformed linalg.unpack. The last handle for linalg.unpack is empty if target_pack_or_unpack_op was not itself a linalg.unpack.

OPERATION_NAME = 'transform.structured.pack_transpose'
_ODS_REGIONS = (0, True)
target_pack_or_un_pack_op() _ods_ir
target_linalg_op() _ods_ir
outer_perm() _ods_ir | None
inner_perm() _ods_ir | None
packed_op() _ods_ir
pack_op() _ods_ir
un_pack_op() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_pack_transpose(packed_op, pack_op, un_pack_op, target_pack_or_un_pack_op, target_linalg_op, *, outer_perm=None, inner_perm=None, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.PadOp(padded, pad, copy, target, pad_to_multiple_of, *, padding_values=None, padding_dimensions=None, static_pad_to_multiple_of=None, nofold_flags=None, transpose_paddings=None, copy_back_op=None, use_prescribed_tensor_shapes=None, loc=None, ip=None)

Bases: _ods_ir

Pads the operations pointed to by the target handle using the options provided as operation attributes. The operation returns a handle to the padded operation and to the padding operation (“tensor.pad”).

To preserve tensor SSA use-def chains, the unpadded result is copied back to the original destination tensor of the targeted op. The op that copies back the result can be customized with copy_back_op:

  • “bufferization.materialize_in_destination” (default)

  • “linalg.copy”

  • “none” (no copy back)

Return modes

This operation ignores non-Linalg ops and drops them in the return. This operation may produce a definite failure if the padding fails for any reason.

If all the operations referred to by the target handle pad properly, the transform succeeds. Otherwise the transform produces a silenceable failure. The return handle points to only the subset of successfully produced padded operations, which can be empty.

OPERATION_NAME = 'transform.structured.pad'
_ODS_REGIONS = (0, True)
target() _ods_ir
pad_to_multiple_of() _ods_ir
padding_values() _ods_ir
padding_dimensions() _ods_ir
static_pad_to_multiple_of() _ods_ir | None
nofold_flags() _ods_ir
transpose_paddings() _ods_ir
copy_back_op() _ods_ir
use_prescribed_tensor_shapes() bool
padded() _ods_ir
pad() _ods_ir
copy() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_pad(padded, pad, copy, target, pad_to_multiple_of, *, padding_values=None, padding_dimensions=None, static_pad_to_multiple_of=None, nofold_flags=None, transpose_paddings=None, copy_back_op=None, use_prescribed_tensor_shapes=None, loc=None, ip=None) _ods_ir
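
A hedged sketch, under the same setup assumptions as the earlier sketches; matmul_handle is a hypothetical handle to a linalg.matmul, array-typed attributes are assumed to accept plain Python lists and copy_back_op a plain string, and padding_values (typed attributes per operand) would normally also be supplied:

from mlir.dialects import transform
from mlir.dialects import _structured_transform_ops_gen as structured_ops

any_op_t = transform.AnyOpType.get()
padding = structured_ops.PadOp(
    any_op_t, any_op_t, any_op_t,       # padded, pad, copy result handle types
    matmul_handle,
    [],                                  # no dynamic pad_to_multiple_of values
    padding_dimensions=[0, 2],           # pad the m and k dimensions
    static_pad_to_multiple_of=[32, 32],
    copy_back_op="linalg.copy",          # copy back with linalg.copy instead of the default
)
padded = padding.padded
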
class mlir.dialects._structured_transform_ops_gen.PadTilingInterfaceOp(padded, pad, target, padding_sizes, *, padding_values=None, static_padding_sizes=None, pad_to_multiple_of=None, loc=None, ip=None)

Bases: _ods_ir

Pads the iteration domain of the operations pointed to by the target handle using the options provided as operation attributes. Padding the iteration domain induces a padding of the operands that is consistent across the op semantics and, unlike for simple elementwise ops, may not be trivially deducible or specifiable on operands only (e.g. convolutions). Currently, only a limited set of projected permutation maps are supported.

The specification of padding_sizes follows that of tile_sizes during tiling: the value “0” on a particular iterator encodes “no padding”. As in the case of tiling, an automatic completion by 0 to the operation rank occurs.

This transformation returns a handle to the padded operation and to the padding operation (“tensor.pad”).

TODO: in the future this should be moved out of a specific Linalg implementation file and into a more general “Structured” file.

Return modes

This operation ignores non-IndexingMapOpInterface ops and drops them in the return. In the future, this operation will support all TilingInterfaceOps for which the contract between iteration domain and operands can be reified.

This operation may produce a definite failure if the padding fails for any reason.

If all the operations referred to by the target handle pad properly, the transform succeeds. Otherwise the transform produces a silenceable failure. The return handle points to only the subset of successfully produced padded operations, which can be empty.

OPERATION_NAME = 'transform.structured.pad_tiling_interface'
_ODS_REGIONS = (0, True)
target() _ods_ir
padding_sizes() _ods_ir
padding_values() _ods_ir
static_padding_sizes() _ods_ir | None
pad_to_multiple_of() bool
padded() _ods_ir
pad() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_pad_tiling_interface(padded, pad, target, padding_sizes, *, padding_values=None, static_padding_sizes=None, pad_to_multiple_of=None, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.PromoteOp(transformed, target, *, operands_to_promote=None, use_full_tile_buffers=None, use_full_tiles_by_default=None, use_original_subview_size=None, use_alloca=None, memory_space=None, mapping=None, alignment=None, loc=None, ip=None)

Bases: _ods_ir

Promotes the specified operands of the target into a separate memory buffer.

At this point, this transform does not allow customizing alloc/dealloc functions nor the behavior on copy in/out operations.

Return modes

This operation applies to a single Linalg op that satisfies the promoteSubviewsPrecondition, otherwise it fails.

If the operations referred to by the target handle promote properly, the transform succeeds.

When successful, the return handle points to the $target operation that was modified in place.

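As an illustrative sketch (the attribute names mirror the constructor above; the exact assembly syntax is an assumption and may differ):

// Promote the first two operands of a previously matched linalg op into
// separate buffers, using full tiles by default.
%1 = transform.structured.promote %0 {
    operands_to_promote = [0, 1],
    use_full_tiles_by_default
} : (!transform.any_op) -> !transform.any_op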
OPERATION_NAME = 'transform.structured.promote'
_ODS_REGIONS = (0, True)
target() _ods_ir
operands_to_promote() _ods_ir
use_full_tile_buffers() _ods_ir
use_full_tiles_by_default() bool
use_original_subview_size() bool
use_alloca() bool
memory_space() _ods_ir | None
mapping() _ods_ir | None
alignment() _ods_ir | None
transformed() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_promote(transformed, target, *, operands_to_promote=None, use_full_tile_buffers=None, use_full_tiles_by_default=None, use_original_subview_size=None, use_alloca=None, memory_space=None, mapping=None, alignment=None, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.PromoteTensorOp(tensor, *, memory_space=None, results=None, loc=None, ip=None)

Bases: _ods_ir

Requests that a tensor value lives in a specific memory space for its lifetime. This is achieved by allocating a new tensor in the desired memory space with bufferization.alloc_tensor and optionally materializing the source value into that allocation with bufferization.materialize_in_destination. All uses of the original value are then redirected to the promoted value.

The generated code for promoting tensor value %0 resembles the following:

%1 = bufferization.alloc_tensor(<dynamic dims of %0>) { memory_space = memory_space }
// Note: the materialization is omitted if %0 is never read and is only
// written into (i.e., it behaves as a result tensor).
%2 = bufferization.materialize_in_destination %0 in %1
// … <all users of %0 now use %2 instead>

Deallocation is not handled by this transform.

Return modes:

  • Produces a silenceable failure if the given handle does not point to tensor-typed values.

  • Succeeds otherwise and returns a handle to the promoted value(s), i.e., the result of materialization if present and the allocation otherwise.

OPERATION_NAME = 'transform.structured.promote_tensor'
_ODS_REGIONS = (0, True)
tensor() _ods_ir
memory_space() _ods_ir | None
promoted() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_promote_tensor(tensor, *, memory_space=None, results=None, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.ReplaceOp(replacement, target, *, loc=None, ip=None)

Bases: _ods_ir

Replace all target payload ops with the single op that is contained in this op’s region. All targets must have zero arguments and must be isolated from above.

This op is for debugging/experiments only.

Return modes

This operation consumes the target handle.

OPERATION_NAME = 'transform.structured.replace'
_ODS_REGIONS = (1, True)
target() _ods_ir
replacement() _ods_ir
bodyRegion() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_replace(replacement, target, *, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.RewriteInDestinationPassingStyleOp(transformed, target, *, loc=None, ip=None)

Bases: _ods_ir

Rewrite a supported tensor operation that is not in destination-passing style into a form that is in destination-passing style. Currently supported operations are:

  • tensor.pad

  • tensor.generate

  • tensor.from_elements

This dichotomy hints at a future interface; for now the implementation simply switches between the different supported cases.

Return modes

This operation ignores unsupported ops and drops them from the return. If all the operations referred to by the target handle generalize properly, the transform succeeds. Otherwise the transform produces a silenceable failure. The return handle points to a subset of successfully produced operations:

  • In the tensor.pad case, the returned handle points to the tensor.insert_slice.

  • In the tensor.generate case, the returned handle points to the linalg.generic.

  • In the tensor.from_elements case, the returned handle points to the last tensor.insert.

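A hedged usage sketch (exact assembly syntax may differ):

%0 = transform.structured.match ops{["tensor.pad"]} in %arg1
   : (!transform.any_op) -> !transform.any_op
// The returned handle points to the tensor.insert_slice produced for the pad.
%1 = transform.structured.rewrite_in_destination_passing_style %0
   : (!transform.any_op) -> !transform.any_op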
OPERATION_NAME = 'transform.structured.rewrite_in_destination_passing_style'
_ODS_REGIONS = (0, True)
target() _ods_ir
transformed() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_rewrite_in_destination_passing_style(transformed, target, *, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.ScalarizeOp(result, target, *, loc=None, ip=None)

Bases: _ods_ir

Indicates that ops of a specific kind in the given function should be scalarized (i.e. their dynamic dimensions tiled by 1).

Return modes:

This operation ignores non-Linalg ops and drops them in the return. This operation produces definite failure if the scalarization fails for any reason. If all the operations referred to by the target handle scalarize properly, the transform succeeds. Otherwise the transform produces a silenceable failure.

The return handle points to only the subset of successfully produced tiled-by-1 operations, which can be empty.

This operation does not return handles to the tiled loop. We make this design choice because it is hard to know ahead of time the number of loops that will be produced (it depends on the number of dynamic dimensions after multiple transformations have been applied). Loops can always be recovered by navigating from the tiled operations if needed.

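A hedged usage sketch (exact assembly syntax may differ):

%0 = transform.structured.match ops{["linalg.generic"]} in %arg1
   : (!transform.any_op) -> !transform.any_op
// Tile every dynamic dimension of the matched ops by 1.
%1 = transform.structured.scalarize %0 : (!transform.any_op) -> !transform.any_op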
OPERATION_NAME = 'transform.structured.scalarize'
_ODS_REGIONS = (0, True)
target() _ods_ir
result() _ods_ir

Shortcut to get an op result if it has only one (throws an error otherwise).

mlir.dialects._structured_transform_ops_gen.structured_scalarize(result, target, *, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.SpecializeOp(transformed, target, *, loc=None, ip=None)

Bases: _ods_ir

Transforms a generic operation into the equivalent named form.

Return modes

This operation ignores non-Linalg ops and drops them in the return. If all the operations referred to by the target handle specialize, the transform succeeds; otherwise, the operation produces a silenceable failure. The return handle points to only the subset of successfully produced equivalent named operations, which can be empty or contain the original ops if they were already in named form. The supported specializations to named Linalg operations are:

  • linalg.copy of any rank.

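A hedged usage sketch (exact assembly syntax may differ):

%0 = transform.structured.match ops{["linalg.generic"]} in %arg1
   : (!transform.any_op) -> !transform.any_op
// Generic ops equivalent to a named op (e.g. linalg.copy) are rewritten; the
// returned handle points to the named (or already-named) ops.
%1 = transform.structured.specialize %0 : (!transform.any_op) -> !transform.any_op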
OPERATION_NAME = 'transform.structured.specialize'
_ODS_REGIONS = (0, True)
target() _ods_ir
transformed() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_specialize(transformed, target, *, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.SplitOp(split_list, target, dimension, static_chunk_sizes, *, dynamic_chunk_sizes=None, multiway=None, loc=None, ip=None)

Bases: _ods_ir

Splits the given target op into two or more complementary parts, which combined cover the entire iteration domain of the original op. The split is performed along the iteration space dimension provided as an attribute; the chunk size specifies the size of the lower part, and the remaining range in the iteration space is assigned to the upper part. In case of dimension overflow, the transformation fails. The split is performed at the dimension iterator value specified either as the static chunk size attribute, when it is known at transform IR construction time, or as a handle to an operation producing a single index-typed value, when it is computed by the payload IR. In the latter case, the static chunk size must be set to ShapedType::kDynamic and the dynamic size handle must point to as many value-producing operations as there are structured operations pointed to by the target handle.

The operation consumes the target handle, but preserves the chunk size handle if provided. Without the multiway attribute, it produces a new handle that is a list of the two parts of the structured op after splitting, where the part at the lower index corresponds to the lower iteration space indices.

Multiway split mode is enabled by specifying the multiway attribute. In this mode, a single target op is split into multiple parts covering the iteration space of the specified dimension. static_chunk_sizes and dynamic_chunk_sizes in this case form a list of chunk sizes that the given dimension should be split into. With multiway the op also produces a handle; the result handle is a list of the multiple parts of the structured op after splitting, where the target dimension of each linalg op in the list corresponds to the chunk size specified in the input split list. If the chunk sizes do not cover the entire iteration space, the leftover chunk is the last payload in the result handle.

As the result handle is most of the time a list, a transform.split_handle is needed to access the individual handles.

OPERATION_NAME = 'transform.structured.split'
_ODS_REGIONS = (0, True)
target() _ods_ir
dynamic_chunk_sizes() _ods_ir | None
dimension() _ods_ir
static_chunk_sizes() _ods_ir
multiway() bool
split_list() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_split(split_list, target, dimension, static_chunk_sizes, *, dynamic_chunk_sizes=None, multiway=None, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.SplitReductionOp(init_or_alloc_op, fill_op, split_linalg_op, combining_linalg_op, target, *, split_factor=None, insert_split_dimension=None, inner_parallel=None, use_scaling_algorithm=None, use_alloc=None, loc=None, ip=None)

Bases: _ods_ir

Indicates that the given target op should be transformed with the splitReduction transformation and split factor provided as attribute.

The splitReduction transformation splits the first single linalg op reduction into a parallel and reduction dimension. A new linalg.generic op is created to perform the rest of the reduction.

The transformation supports different configurations attributes:

  • split_factor: the factor by which to split (i.e. the size of the remaining reduction after splitting).

  • insert_split_dimension: the dimension in the temporary tensor into which the new parallel dimension is inserted.

  • inner_parallel: specifies whether the parallel dimension is before or after the reduction dimension in the splitting op.

  • use_scaling_algorithm: whether to use a scaling based formulation that does not create an ExpandShapeOp (default: do not use scaling).

  • use_alloc: whether to use an alloc op to allocate the temporary tensor (default: do not use alloc op).

Return modes

This operation ignores non-Linalg ops and drops them in the return. This operation produces a definite failure if the splitting fails for any reason.

If all the operations referred to by the target handle split properly, the transform succeeds. Otherwise the transform produces a silenceable failure. The 4 returned handles point to only the subset of successfully produced computational operations, which can all be empty. These 4 returned handles point to:

  • the init op (or tensor_alloc op if use_alloc = true),

  • the fill op used to initialize the neutral element,

  • the split op and

  • the result-combining op.

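A hedged sketch of requesting the transformation itself (the attribute names mirror the constructor above; the exact assembly syntax may differ):

%init, %fill, %split, %combine = transform.structured.split_reduction %0
   { split_factor = 4, insert_split_dimension = 0 }
   : (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op, !transform.any_op)
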
Example (default: use_scaling_algorithm = false, use_alloc = false):

%r = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>,
                                      affine_map<(d0) -> ()>],
      iterator_types = ["reduction"]}
ins(%in : tensor<32xf32>)
outs(%out : tensor<f32>) {
^bb0(%arg1: f32, %arg2: f32):
  %y = arith.addf %arg1, %arg2 : f32
  linalg.yield %y : f32
} -> tensor<f32>

is split into:

%cst = arith.constant 0.000000e+00 : f32
%0 = tensor.expand_shape %in [[0, 1]] : tensor<32xf32> into tensor<4x8xf32>
%1 = tensor.empty() : tensor<4xf32>
%2 = linalg.fill ins(%cst : f32) outs(%1 : tensor<4xf32>) -> tensor<4xf32>
%3 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
                                      affine_map<(d0, d1) -> (d0)>],
  iterator_types = ["parallel", "reduction"]}
  ins(%0 : tensor<4x8xf32>) outs(%2 : tensor<4xf32>) {
  ^bb0(%arg3: f32, %arg4: f32):
  %5 = arith.addf %arg3, %arg4 : f32
  linalg.yield %5 : f32
} -> tensor<4xf32>
%r = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>,
                                      affine_map<(d0) -> ()>],
  iterator_types = ["reduction"]}
  ins(%3 : tensor<4xf32>) outs(%out : tensor<f32>) {
  ^bb0(%arg3: f32, %arg4: f32):
  %5 = arith.addf %arg3, %arg4 : f32
  linalg.yield %5 : f32
} -> tensor<f32>

Example (use_scaling_algorithm = true, use_alloc = true):

Instead of introducing an ExpandShapeOp, this scaling-based implementation rewrites a reduction dimension k into k * split_factor + kk. The dimension kk is added as an extra parallel dimension to the intermediate output tensor at position insert_split_dimension.

Consider a minimal example where k is reduced: O(i, j) += I(i, j, k). Assume i=3, j=5, k=128, split_factor=16 and insert_split_dimension=0. The compute is rewritten as:

  a. O_i(kk, i, j) += I(i, j, 16 * k + kk)

  b. O(i, j) += O_i(kk, i, j)

The intermediate tensor O_i is of shape (128/16)x3x5 == 8x3x5.

Example:

%0 = linalg.matmul ins(%A, %B: tensor<16x256xf32>, tensor<256x32xf32>)
  outs(%C: tensor<16x32xf32>) -> tensor<16x32xf32>

Is transformed to:

#map0 = affine_map<(d0, d1, d2, d3) -> (d0, d2 * 4 + d3)>
#map1 = affine_map<(d0, d1, d2, d3) -> (d2 * 4 + d3, d1)>
#map2 = affine_map<(d0, d1, d2, d3) -> (d2, d3)>
#map3 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>
#map4 = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
#map5 = affine_map<(d0, d1, d2) -> (d0, d1)>
%0 = tensor.empty() : tensor<16x32x64xf32>
%cst = arith.constant 0.000000e+00 : f32
%1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<16x32x64xf32>) ->
   tensor<16x32x64xf32>
%2 = tensor.empty() : tensor<64x4xi1>

%3 = linalg.generic {indexing_maps = [#map0, #map1, #map2, #map3],
  iterator_types = ["parallel", "parallel", "parallel", "reduction"]}
  ins(%A, %B, %2 : tensor<16x256xf32>, tensor<256x32xf32>, tensor<64x4xi1>)
  outs(%1 : tensor<16x32x64xf32>) {
    ^bb0(%arg3: f32, %arg4: f32, %arg5: i1, %arg6: f32):
      %5 = arith.mulf %arg3, %arg4 : f32
      %6 = arith.addf %arg6, %5 : f32
      linalg.yield %6 : f32
} -> tensor<16x32x64xf32>

%4 = linalg.generic {indexing_maps = [#map4, #map5],
  iterator_types = ["parallel", "parallel", "reduction"]}
  ins(%3 : tensor<16x32x64xf32>)
  outs(%C : tensor<16x32xf32>) {
    ^bb0(%arg3: f32, %arg4: f32):
      %5 = arith.addf %arg3, %arg4 : f32
      linalg.yield %5 : f32
} -> tensor<16x32xf32>

return %4 : tensor<16x32xf32>
OPERATION_NAME = 'transform.structured.split_reduction'
_ODS_REGIONS = (0, True)
target() _ods_ir
split_factor() _ods_ir
insert_split_dimension() _ods_ir
inner_parallel() bool
use_scaling_algorithm() bool
use_alloc() bool
init_or_alloc_op() _ods_ir
fill_op() _ods_ir
split_linalg_op() _ods_ir
combining_linalg_op() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_split_reduction(init_or_alloc_op, fill_op, split_linalg_op, combining_linalg_op, target, *, split_factor=None, insert_split_dimension=None, inner_parallel=None, use_scaling_algorithm=None, use_alloc=None, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.TileReductionUsingForOp(fill_op, split_op, combining_op, for_op, target, *, reduction_dims=None, tile_sizes=None, loc=None, ip=None)

Bases: _ods_ir

Indicates that the given target op should be transformed with the tileReduction transformation with the tile size provided as attribute.

This transformation tiles the target along the reduction dimensions. It creates a tensor initialized with the identity value. Then it creates nested loops with a parallel version of the target op inside. The parallel op dimensions are less than or equal to the tile size passed by the user. After the loop, a merge operation is created to do a final reduction with the partial reductions. The initial tensor always uses the tile size dimension. This may overallocate if the tile size is greater than the reduction dimension.

Return modes

Returns 4 handles associated with (in order):

  • the fill op used to initialize the neutral element,

  • the parallel tiled op,

  • the result-combining op, and

  • the parent for op.

The reduction_dims can be used to specify the subset of reduction dimensions of the operation to tile. If left unspecified, all reduction dimensions are tiled.

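A hedged sketch of the transform invocation that produces the rewrite shown below (the “by tile_sizes” spelling is an assumption and may differ between MLIR versions):

%fill, %split, %combine, %for = transform.structured.tile_reduction_using_for %0
   by tile_sizes = [0, 5]
   : (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op, !transform.any_op)
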
Example:

%red = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
                                        affine_map<(d0, d1) -> (d0)>],
iterator_types = ["parallel", "reduction"]}
ins(%arg0 : tensor<?x?xf32>)
outs(%out : tensor<?xf32>) {
  ^bb0(%arg7: f32, %arg9: f32):
  %1 = arith.addf %arg7, %arg9 : f32
  linalg.yield %1 : f32
} -> tensor<?xf32>
return %red : tensor<?xf32>

is transformed into:

%0 = tensor.empty(%dim_1) : tensor<?x5xf32>
%1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<?x5xf32>) -> tensor<?x5xf32>
%2 = scf.for %arg2 = %c0 to %dim_0 step %c5 iter_args(%arg3 = %1) -> (tensor<?x5xf32>) {
  %extracted_slice = tensor.extract_slice %1[0, 0] [%dim, 5] [1, 1] : tensor<?x5xf32> to tensor<?x5xf32>
  %extracted_slice_2 = tensor.extract_slice %arg0[0, %arg2] [%dim, 5] [1, 1] : tensor<?x?xf32> to tensor<?x5xf32>
  %4 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
                                        affine_map<(d0, d1) -> (d0, d1)>],
  iterator_types = ["parallel", "parallel"]}
  ins(%extracted_slice_2 : tensor<?x5xf32>)
  outs(%extracted_slice : tensor<?x5xf32>) {
  ^bb0(%in: f32, %out: f32):
    %5 = arith.addf %in, %out : f32
    linalg.yield %5 : f32
  } -> tensor<?x5xf32>
  %dim_3 = tensor.dim %1, %c0 : tensor<?x5xf32>
  %inserted_slice = tensor.insert_slice %4 into %arg3[0, 0] [%dim_3, 5] [1, 1] : tensor<?x5xf32> into tensor<?x5xf32>
  scf.yield %inserted_slice : tensor<?x5xf32>
}
%3 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
                                      affine_map<(d0, d1) -> (d0)>],
iterator_types = ["parallel", "reduction"]}
ins(%2 : tensor<?x5xf32>)
outs(%arg1 : tensor<?xf32>) {
^bb0(%in: f32, %out: f32):
  %4 = arith.addf %in, %out : f32
  linalg.yield %4 : f32
} -> tensor<?xf32>
OPERATION_NAME = 'transform.structured.tile_reduction_using_for'
_ODS_REGIONS = (0, True)
target() _ods_ir
reduction_dims() _ods_ir
tile_sizes() _ods_ir
fill_op() _ods_ir
split_op() _ods_ir
combining_op() _ods_ir
for_op() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_tile_reduction_using_for(fill_op, split_op, combining_op, for_op, target, *, reduction_dims=None, tile_sizes=None, loc=None, ip=None) _ods_ir | _ods_ir | TileReductionUsingForOp
class mlir.dialects._structured_transform_ops_gen.TileReductionUsingForallOp(fill_op, split_op, combining_op, forall_op, target, *, reduction_dims=None, num_threads=None, tile_sizes=None, mapping=None, loc=None, ip=None)

Bases: _ods_ir

Tile a PartialReductionOpInterface op to a tiled scf.forall doing partial reduction.

This transformation tiles the target along the reduction dimensions. It creates a tensor initialized with the identity value. Then it creates an scf.forall loop with the number of threads given by num_threads. The op is tiled with a size equal to floordiv(size, num_threads). All the partial reduction values are parallel-inserted to create a new tensor. After the loop, a merge operation is created to do a final reduction with the partial reduction tensor. If an extra tile_sizes parameter is passed, the tiles are cyclically distributed on the threads of the scf.forall loop.

Return modes

Returns 4 handles associated with (in order):

  • the fill op used to initialize the neutral element,

  • the parallel tiled op,

  • the result-combining op, and

  • the parent forall op.

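A hedged sketch of the transform invocation that produces the rewrite shown below (the “by num_threads”/“tile_sizes” spelling is an assumption and may differ between MLIR versions):

%fill, %split, %combine, %forall = transform.structured.tile_reduction_using_forall %0
   by num_threads = [0, 5], tile_sizes = []
   : (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op, !transform.any_op)
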
Example:

%red = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
                                        affine_map<(d0, d1) -> (d0)>],
iterator_types = ["parallel", "reduction"]}
ins(%arg0 : tensor<?x?xf32>)
outs(%out : tensor<?xf32>) {
  ^bb0(%arg7: f32, %arg9: f32):
  %1 = arith.addf %arg7, %arg9 : f32
  linalg.yield %1 : f32
} -> tensor<?xf32>
return %red : tensor<?xf32>

is transformed into:

%0 = tensor.empty(%dim_1) : tensor<?x5xf32>
%1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<?x5xf32>) -> tensor<?x5xf32>
%2 = scf.forall (%arg2) in (%c5) shared_outs(%arg3 = %1) -> (tensor<?x5xf32>) {
  %4 = affine.min #map(%arg2)[%dim_0]
  %5 = affine.max #map1(%4)
  %extracted_slice = tensor.extract_slice %arg3[0, %arg2] [%dim, 1] [1, 1] : tensor<?x5xf32> to tensor<?xf32>
  %6 = affine.apply #map2(%arg2)[%dim_0]
  %extracted_slice_2 = tensor.extract_slice %arg0[0, %6] [%dim, %5] [1, 1] : tensor<?x?xf32> to tensor<?x?xf32>
  %extracted_slice_3 = tensor.extract_slice %extracted_slice[0] [%dim] [1] : tensor<?xf32> to tensor<?xf32>
  %7 = linalg.generic {indexing_maps = [#map3, #map4], iterator_types = ["parallel", "reduction"]} ins(%extracted_slice_2 : tensor<?x?xf32>) outs(%extracted_slice_3 : tensor<?xf32>) {
  ^bb0(%in: f32, %out: f32):
    %9 = arith.addf %in, %out : f32
    linalg.yield %9 : f32
  } -> tensor<?xf32>
  scf.forall.in_parallel {
    tensor.parallel_insert_slice %7 into %arg3[0, %arg2] [%dim, 1] [1, 1] : tensor<?xf32> into tensor<?x5xf32>
  }
} {mapping = []}
%3 = linalg.generic {indexing_maps = [#map3, #map4], iterator_types = ["parallel", "reduction"]} ins(%2 : tensor<?x5xf32>) outs(%arg1 : tensor<?xf32>) {
^bb0(%in: f32, %out: f32):
  %4 = arith.addf %in, %out : f32
  linalg.yield %4 : f32
} -> tensor<?xf32>
OPERATION_NAME = 'transform.structured.tile_reduction_using_forall'
_ODS_REGIONS = (0, True)
target() _ods_ir
reduction_dims() _ods_ir
num_threads() _ods_ir
tile_sizes() _ods_ir
mapping() _ods_ir | None
fill_op() _ods_ir
split_op() _ods_ir
combining_op() _ods_ir
forall_op() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_tile_reduction_using_forall(fill_op, split_op, combining_op, forall_op, target, *, reduction_dims=None, num_threads=None, tile_sizes=None, mapping=None, loc=None, ip=None) _ods_ir | _ods_ir | TileReductionUsingForallOp
class mlir.dialects._structured_transform_ops_gen.TileUsingForOp(tiled_linalg_op, loops, target, dynamic_sizes, *, static_sizes=None, interchange=None, scalable_sizes=None, loc=None, ip=None)

Bases: _ods_ir

Indicates that the given target op should be tiled with the given sizes. This transform generates a loop nest with a smaller (“tiled”) target operation in its body. Currently limited to LinalgOps.

Tile sizes may be known at transformation time, in which case they are expected to be provided in the static_sizes attribute, or not, in which case the tile value must be computed by the payload IR and the handle to the operation computing it must be provided through dynamic_sizes. When the sizes are not known statically, the corresponding entry in the static_sizes attribute must be set to ShapedType::kDynamic. Only the dynamic sizes must be provided in dynamic_sizes, i.e., there should be as many handles as ShapedType::kDynamic values in the static_sizes attribute. A static size of 0 indicates that the dimension should not be tiled. No loop will be generated for such dimensions. If all tile sizes are 0, this transform is effectively a no-op.

This op returns handles to the tiled op (in the generated loop nest) and the generated loops. The number of loops is the number of tile sizes that are statically known to be non-zero.

Return modes

On success, the resulting handles are associated with co-indexed lists of tiled operations and loops around them.

This operation only supports Linalg ops and produces a silenceable failure if the input contains any non-Linalg ops. The ops preceding it in the list associated with the target handle will have been tiled.

This operation produces a silenceable failure if the dynamic_sizes handles are associated with lists of payload operations of a size different than that of the list associated with the target handle.

If the internal implementation of tiling for any of the operations fails, produces a definite failure.

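Example (a hedged sketch; the exact assembly syntax may differ):

%0 = transform.structured.match ops{["linalg.matmul"]} in %arg1
   : (!transform.any_op) -> !transform.any_op
// Tile the three matmul dimensions by 4; one handle per generated loop is
// returned in addition to the handle to the tiled op.
%tiled, %loop0, %loop1, %loop2 = transform.structured.tile_using_for %0 tile_sizes [4, 4, 4]
   : (!transform.any_op) -> (!transform.any_op, !transform.any_op, !transform.any_op, !transform.any_op)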
OPERATION_NAME = 'transform.structured.tile_using_for'
_ODS_REGIONS = (0, True)
target() _ods_ir
dynamic_sizes() _ods_ir
static_sizes() _ods_ir | None
interchange() _ods_ir | None
scalable_sizes() _ods_ir | None
tiled_linalg_op() _ods_ir
loops() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_tile_using_for(tiled_linalg_op, loops, target, dynamic_sizes, *, static_sizes=None, interchange=None, scalable_sizes=None, loc=None, ip=None) _ods_ir | _ods_ir | TileUsingForOp
class mlir.dialects._structured_transform_ops_gen.TileUsingForallOp(tiled_op, forall_op, target, num_threads, tile_sizes, *, packed_num_threads=None, packed_tile_sizes=None, static_num_threads=None, static_tile_sizes=None, mapping=None, loc=None, ip=None)

Bases: _ods_ir

Tile a TilingInterface op to a tiled scf.forall.

Tiling is applied by specifying either num_threads or tile_sizes. If num_threads is specified, then the tile size for each dimension i is calculated dynamically via ceilDiv(dimSize[i], num_threads[i]). num_threads and tile_sizes can be either static index attributes or operation handles (or a mix thereof). Operation handles must be mapped to exactly one op that has exactly one result of index type.

Static zero tile sizes indicate that the dimension is not tiled and can be thought of as tiling by the full size of data.

It is the user’s responsibility to ensure that num_threads/tile_sizes is a valid tiling specification (i.e., one that only tiles parallel dimensions, e.g. in the Linalg case). If the dimension is not parallelizable, a warning is issued to notify the user that the generated code is not safe to parallelize.

If non-empty, the mapping is added as an attribute to the resulting scf.forall.

Note: tile_sizes and num_threads are variadic. Each tile size/number of threads can be an index attribute or a transform handle that is mapped to exactly one payload op with exactly one index result.

Return modes

This operation ignores ops that do not implement the TilingInterface and drops them in the return.

If all the operations referred to by the target handle tile successfully, the transform succeeds. Otherwise the transform produces a silenceable failure.

The two returned handles point to only the subset of successfully produced tiled operations, which can all be empty.

These two returned handles point to:

  • the tiled op that implements TilingInterface,

  • the new scf.forall op.

Example using num_threads

%0 = transform.structured.match ops{["linalg.matmul"]} in %arg1
   : (!transform.any_op) -> !transform.any_op
%3:2 = transform.structured.tile_using_forall %0 num_threads [10, 20]
   : (!transform.any_op) -> (!transform.any_op, !transform.any_op)

Example using tile_sizes

%0 = transform.structured.match ops{["linalg.matmul"]} in %arg1
   : (!transform.any_op) -> !transform.any_op
%sz = transform.structured.match ...
%3:2 = transform.structured.tile_using_forall %0 tile_sizes [0, %sz, 20]
   : (!transform.any_op, !transform.any_op) -> (!transform.any_op, !transform.any_op)
OPERATION_NAME = 'transform.structured.tile_using_forall'
_ODS_OPERAND_SEGMENTS
_ODS_REGIONS = (0, True)
target() _ods_ir
num_threads() _ods_ir
tile_sizes() _ods_ir
packed_num_threads() _ods_ir | None
packed_tile_sizes() _ods_ir | None
static_num_threads() _ods_ir | None
static_tile_sizes() _ods_ir | None
mapping() _ods_ir | None
tiled_op() _ods_ir
forall_op() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_tile_using_forall(tiled_op, forall_op, target, num_threads, tile_sizes, *, packed_num_threads=None, packed_tile_sizes=None, static_num_threads=None, static_tile_sizes=None, mapping=None, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.TransposeConv2DOp(transformed, target, *, loc=None, ip=None)

Bases: _ods_ir

Convert linalg.conv_2d_nhwc_fhwc into linalg.conv_2d_nhwc_hwcf by introducing a linalg.transpose on the filter tensor/memref.

Whilst the fhwc filter channel ordering can be desirable for certain targets and is a more direct mapping to higher level dialects such as TOSA (which only supports this ordering), hwcf is better suited for transformations such as img2col, which can make use of optimized BLAS routines such as GEMM.

Returns one handle:

  • The final operation of the sequence that replaces the original convolution.

Return modes:

Returns a definite failure if target is not isolated from above. Returns a silenceable failure if the pattern application failed.

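A hedged usage sketch (exact assembly syntax may differ):

%0 = transform.structured.match ops{["linalg.conv_2d_nhwc_fhwc"]} in %arg1
   : (!transform.any_op) -> !transform.any_op
// Rewrites the fhwc convolution into conv_2d_nhwc_hwcf plus a linalg.transpose
// on the filter; the handle points to the final replacement op.
%1 = transform.structured.transpose_conv2d %0 : (!transform.any_op) -> !transform.any_op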
OPERATION_NAME = 'transform.structured.transpose_conv2d'
_ODS_REGIONS = (0, True)
target() _ods_ir
transformed() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_transpose_conv2d(transformed, target, *, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.TransposeMatmulOp(transformed, target, *, inputToTranspose=None, loc=None, ip=None)

Bases: _ods_ir

Convert Linalg matmul ops to transposed variants.

By default the LHS matrix is transposed. Specify <rhs> to instead transpose the RHS matrix.

Return modes:

This operation fails if target is unsupported, i.e., not a linalg.matmul or linalg.batch_matmul. Otherwise, the operation succeeds and returns a handle to the transposed matmul op.

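A hedged usage sketch (the <rhs> spelling follows the description above; the exact assembly syntax may differ):

%0 = transform.structured.match ops{["linalg.matmul"]} in %arg1
   : (!transform.any_op) -> !transform.any_op
// Transpose the RHS operand instead of the default LHS.
%1 = transform.structured.transpose_matmul %0 <rhs> : (!transform.any_op) -> !transform.any_op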
OPERATION_NAME = 'transform.structured.transpose_matmul'
_ODS_REGIONS = (0, True)
target() _ods_ir
inputToTranspose() _ods_ir
transformed() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_transpose_matmul(transformed, target, *, input_to_transpose=None, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.VectorizeChildrenAndApplyPatternsOp(transformed, target, *, fold_type_extensions_into_contract=None, vectorize_padding=None, vectorize_nd_extract=None, flatten_1d_depthwise_conv=None, disable_multi_reduction_to_contract_patterns=None, disable_transfer_permutation_map_lowering_patterns=None, loc=None, ip=None)

Bases: _ods_ir

Vectorizes all children contained in the given target using the configuration specified by the attributes of this op. This only vectorizes structured ops that operate on shaped types and does not vectorize loops or straight-line code. Internally, it applies a set of rewrite patterns, some of which enable vectorization and some of which clean up the results. Therefore, it can only be applied to an op with the “isolated from above” property. This transformation only fails if the entire pattern rewriting failed, i.e., it does not fail when no ops were vectorized.

Finer granularity can be achieved either with the VectorizeOp for individual ops or by outlining the target part of the payload IR into, e.g., a function, performing this transformation, and inlining it back.

Note that this transformation invalidates the handles to any payload IR operation that is contained inside the vectorization target.

This transformation supports the following attributes:

  • fold_type_extensions_into_contract: a UnitAttr to enable the folding of type extension operations into vector.contract to create a mixed precision operation.

  • vectorize_padding: a UnitAttr to activate the vectorization of tensor.pad ops. Different pipelines may prefer to lower such ops to loops.

  • disable_multi_reduction_to_contract_patterns: a UnitAttr to deactivate the rewrite of vector.multi_reduction to vector.contract. This is intended to be used in tests only.

  • disable_transfer_permutation_map_lowering_patterns: a UnitAttr to deactivate the rewrite of vector.transfer with permutation maps into explicit vector.transpose operations. This is intended to be used in tests only but may be promoted to a first class attribute in the future.

Return modes:

This operation produces a definite failure if vectorization fails for any reason. The operation always returns the handle to the target op that is expected to be isolated from above.

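A hedged usage sketch on a function payload (exact assembly syntax may differ):

%0 = transform.structured.match ops{["func.func"]} in %arg1
   : (!transform.any_op) -> !transform.any_op
// Vectorize all structured ops inside the function, additionally vectorizing tensor.pad.
%1 = transform.structured.vectorize_children_and_apply_patterns %0 { vectorize_padding }
   : (!transform.any_op) -> !transform.any_op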
OPERATION_NAME = 'transform.structured.vectorize_children_and_apply_patterns'
_ODS_REGIONS = (0, True)
target() _ods_ir
fold_type_extensions_into_contract() bool
vectorize_padding() bool
vectorize_nd_extract() bool
flatten_1d_depthwise_conv() bool
disable_multi_reduction_to_contract_patterns() bool
disable_transfer_permutation_map_lowering_patterns() bool
transformed() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_vectorize_children_and_apply_patterns(transformed, target, *, fold_type_extensions_into_contract=None, vectorize_padding=None, vectorize_nd_extract=None, flatten_1d_depthwise_conv=None, disable_multi_reduction_to_contract_patterns=None, disable_transfer_permutation_map_lowering_patterns=None, loc=None, ip=None) _ods_ir
class mlir.dialects._structured_transform_ops_gen.VectorizeOp(target, vector_sizes, *, static_vector_sizes=None, vectorize_nd_extract=None, assume_dynamic_dims_match_vec_sizes=None, create_named_contraction=None, scalable_sizes=None, loc=None, ip=None)

Bases: _ods_ir

Vectorize the target ops, which must be Linalg ops.

Use the optional vector sizes to specify exactly what configuration the vectorizer should use. It will then use masked vectors of the specified size to enforce this configuration (“masked vectorization”). If no vector sizes are specified, the vectorizer will infer the shapes to use from the target Linalg ops (“regular vectorization”). More specifically:

# Masked vectorization - vector sizes are specified explicitly
transform.structured.vectorize %target vector_sizes [1, 4] : !transform.any_op
# Regular vectorization - vector sizes are inferred from the target Op
transform.structured.vectorize %target : !transform.any_op

The vector sizes can be either static or dynamic (SSA values). In case of SSA values, the handle must be mapped to exactly one payload op with exactly one index-typed result.

Note: The input vector sizes must be greater than or equal to their counterpart iteration space sizes.

Typically this operator should be applied to linalg operations that have already been tiled to the appropriate sizes.

Return modes:

This operation produces a silenceable failure if at least one target op is not a Linalg op or fails to vectorize. It produces a definite failure if the dynamic vector sizes (SSA values) do not satisfy the constraints mentioned above.

OPERATION_NAME = 'transform.structured.vectorize'
_ODS_REGIONS = (0, True)
target() _ods_ir
vector_sizes() _ods_ir
static_vector_sizes() _ods_ir | None
vectorize_nd_extract() bool
assume_dynamic_dims_match_vec_sizes() bool
create_named_contraction() bool
scalable_sizes() _ods_ir | None
mlir.dialects._structured_transform_ops_gen.structured_vectorize(target, vector_sizes, *, static_vector_sizes=None, vectorize_nd_extract=None, assume_dynamic_dims_match_vec_sizes=None, create_named_contraction=None, scalable_sizes=None, loc=None, ip=None) VectorizeOp
class mlir.dialects._structured_transform_ops_gen.WinogradConv2DOp(transformed, target, fmr, *, loc=None, ip=None)

Bases: _ods_ir

The Winograd Conv2D algorithm converts a linalg Conv2D operation into a batched matrix multiply. Before the matrix multiply, it converts the filter and input into a format suitable for batched matrix multiply. After the matrix multiply, it converts the output to the final result tensor.

The algorithm F(m x m, r x r) is

Y = A^T x [(G x g x G^T) @ (B^T x d x B)] x A

The size of output Y is m x m. The size of filter g is r x r. The size of input d is (m + r - 1) x (m + r - 1). A^T, A, G^T, G, B^T, and B are transformation matrices.

Return modes:

This operation produces a silenceable failure if target is unsupported. Otherwise, the operation succeeds and returns a handle to the sequence of operations that replaces the original convolution.

OPERATION_NAME = 'transform.structured.winograd_conv2d'
_ODS_REGIONS = (0, True)
target() _ods_ir
fmr() _ods_ir
transformed() _ods_ir
mlir.dialects._structured_transform_ops_gen.structured_winograd_conv2d(transformed, target, fmr, *, loc=None, ip=None) _ods_ir