mlir.dialects._structured_transform_ops_gen¶
Module Contents¶
- mlir.dialects._structured_transform_ops_gen._ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.ApplyDecomposeTensorPackUnpackPatternsOp(*, loc=None, ip=None)¶
Bases: _ods_ir

Collect patterns to decompose linalg.pack and linalg.unpack into e.g. tensor::PadOp, linalg::TransposeOp ops. Requires all outer dims to be unit.
- OPERATION_NAME = 'transform.apply_patterns.linalg.decompose_pack_unpack'¶
- _ODS_REGIONS = (0, True)¶
- mlir.dialects._structured_transform_ops_gen.apply_patterns_linalg_decompose_pack_unpack(*, loc=None, ip=None) ApplyDecomposeTensorPackUnpackPatternsOp¶
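Pattern-collection ops like this one take no operands and only take effect inside the region of a transform.apply_patterns op. Below is a minimal sketch of emitting one from Python; the SequenceOp / ApplyPatternsOp wrappers and their body / bodyTarget / patterns accessors come from the hand-written mlir.dialects.transform bindings and are assumptions here, while the module-level helper is the one documented just above.

from mlir import ir
from mlir.dialects import transform
from mlir.dialects import _structured_transform_ops_gen as sg

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    with ir.InsertionPoint(module.body):
        # Root transform sequence; its block argument is the payload root handle.
        seq = transform.SequenceOp(
            transform.FailurePropagationMode.Propagate, [],
            transform.AnyOpType.get())
        with ir.InsertionPoint(seq.body):
            # Pattern ops are created inside the apply_patterns region.
            apply = transform.ApplyPatternsOp(seq.bodyTarget)
            with ir.InsertionPoint(apply.patterns):
                sg.apply_patterns_linalg_decompose_pack_unpack()
            transform.YieldOp()
    print(module)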
- class mlir.dialects._structured_transform_ops_gen.ApplyDecomposeTensorPadPatternsOp(*, loc=None, ip=None)¶
Bases: _ods_ir

Collect patterns to decompose tensor.pad into e.g. tensor::EmptyOp, linalg::FillOp and tensor::InsertSliceOp.
- OPERATION_NAME = 'transform.apply_patterns.linalg.decompose_pad'¶
- _ODS_REGIONS = (0, True)¶
- mlir.dialects._structured_transform_ops_gen.apply_patterns_linalg_decompose_pad(*, loc=None, ip=None) ApplyDecomposeTensorPadPatternsOp¶
- class mlir.dialects._structured_transform_ops_gen.ApplyEraseUnnecessaryInputsPatternsOp(*, loc=None, ip=None)¶
Bases: _ods_ir

Collects patterns that promote inputs to outputs and remove unused inputs of linalg.generic ops.

- OPERATION_NAME = 'transform.apply_patterns.linalg.erase_unnecessary_inputs'¶
- _ODS_REGIONS = (0, True)¶
- mlir.dialects._structured_transform_ops_gen.apply_patterns_linalg_erase_unnecessary_inputs(*, loc=None, ip=None) ApplyEraseUnnecessaryInputsPatternsOp¶
- class mlir.dialects._structured_transform_ops_gen.ApplyFoldAddIntoDestPatternsOp(*, loc=None, ip=None)¶
Bases: _ods_ir

Collects patterns to replace linalg.add when destination passing suffices for achieving the sum.
- OPERATION_NAME = 'transform.apply_patterns.linalg.fold_add_into_dest'¶
- _ODS_REGIONS = (0, True)¶
- mlir.dialects._structured_transform_ops_gen.apply_patterns_linalg_fold_add_into_dest(*, loc=None, ip=None) ApplyFoldAddIntoDestPatternsOp¶
- class mlir.dialects._structured_transform_ops_gen.ApplyFoldIntoPackAndUnpackPatternsOp(*, loc=None, ip=None)¶
Bases: _ods_ir

Indicates that operations like tensor.pad and tensor.extract_slice should be folded into linalg.pack and linalg.unpack operations, respectively.
- OPERATION_NAME = 'transform.apply_patterns.tensor.fold_into_pack_and_unpack'¶
- _ODS_REGIONS = (0, True)¶
- mlir.dialects._structured_transform_ops_gen.apply_patterns_tensor_fold_into_pack_and_unpack(*, loc=None, ip=None) ApplyFoldIntoPackAndUnpackPatternsOp¶
- class mlir.dialects._structured_transform_ops_gen.ApplyFoldPackUnpackIntoEmptyPatternsOp(*, fold_single_use_only=None, loc=None, ip=None)¶
Bases: _ods_ir

// TODO:
- OPERATION_NAME = 'transform.apply_patterns.linalg.fold_pack_unpack_into_empty'¶
- _ODS_REGIONS = (0, True)¶
- fold_single_use_only() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.apply_patterns_linalg_fold_pack_unpack_into_empty(*, fold_single_use_only=None, loc=None, ip=None) ApplyFoldPackUnpackIntoEmptyPatternsOp¶
- class mlir.dialects._structured_transform_ops_gen.ApplyFoldUnitExtentDimsViaReshapesPatternsOp(*, loc=None, ip=None)¶
Bases: _ods_ir

Collects patterns to fold unit-extent dimensions in operands/results of linalg ops on tensors via reassociative reshape ops.
- OPERATION_NAME = 'transform.apply_patterns.linalg.fold_unit_extent_dims_via_reshapes'¶
- _ODS_REGIONS = (0, True)¶
- mlir.dialects._structured_transform_ops_gen.apply_patterns_linalg_fold_unit_extent_dims_via_reshapes(*, loc=None, ip=None) ApplyFoldUnitExtentDimsViaReshapesPatternsOp¶
- class mlir.dialects._structured_transform_ops_gen.ApplyFoldUnitExtentDimsViaSlicesPatternsOp(*, loc=None, ip=None)¶
Bases: _ods_ir

Collects patterns to fold unit-extent dimensions in operands/results of linalg ops on tensors via rank-reducing slices.
- OPERATION_NAME = 'transform.apply_patterns.linalg.fold_unit_extent_dims_via_slices'¶
- _ODS_REGIONS = (0, True)¶
- mlir.dialects._structured_transform_ops_gen.apply_patterns_linalg_fold_unit_extent_dims_via_slices(*, loc=None, ip=None) ApplyFoldUnitExtentDimsViaSlicesPatternsOp¶
- class mlir.dialects._structured_transform_ops_gen.ApplyPadVectorizationPatternsOp(*, loc=None, ip=None)¶
Bases: _ods_ir

Apply patterns that vectorize tensor.pad.
These patterns rewrite tensor.pad Ops using vector.transfer_read and vector.transfer_write operations. This is done either by:
1. Folding tensor.pad with an existing vector.transfer_read / vector.transfer_write op (generated prior to running these patterns).
2. Rewriting it (when matched together with a tensor.insert_slice consumer op) as a vector.transfer_read + vector.transfer_write pair.
In both cases, these patterns look at producers and consumers for the matched tensor.pad Op to find opportunities for vectorization.
- OPERATION_NAME = 'transform.apply_patterns.linalg.pad_vectorization'¶
- _ODS_REGIONS = (0, True)¶
- mlir.dialects._structured_transform_ops_gen.apply_patterns_linalg_pad_vectorization(*, loc=None, ip=None) ApplyPadVectorizationPatternsOp¶
- class mlir.dialects._structured_transform_ops_gen.ApplyTilingCanonicalizationPatternsOp(*, loc=None, ip=None)¶
Bases: _ods_ir

Collects canonicalization patterns relevant to apply after tiling patterns.
- OPERATION_NAME = 'transform.apply_patterns.linalg.tiling_canonicalization'¶
- _ODS_REGIONS = (0, True)¶
- mlir.dialects._structured_transform_ops_gen.apply_patterns_linalg_tiling_canonicalization(*, loc=None, ip=None) ApplyTilingCanonicalizationPatternsOp¶
- class mlir.dialects._structured_transform_ops_gen.BufferizeToAllocationOp(target, *, memory_space=None, memcpy_op=None, alloc_op=None, bufferize_destination_only=None, emit_dealloc=None, results=None, loc=None, ip=None)¶
Bases: _ods_ir

This transform bufferizes the targeted operation and materializes the result in a new allocation. It replaces all original uses of the target result with the newly allocated buffer, wrapped in a bufferization.to_tensor op. It returns a handle to the newly allocated buffer. Furthermore, it returns a handle that is mapped to all newly created ops.

Only bufferizable ops that bufferize to a memory write or have an aliasing OpOperand (and do not themselves bufferize to an allocation) are supported. They are bufferized using their BufferizableOpInterface implementation. E.g.:
%0 = tensor.insert %f into %dest[%pos] : tensor<10xf32>
Is bufferized to:
%alloc = memref.alloc() : memref<10xf32>
bufferization.materialize_in_destination %dest in %alloc
memref.store %f, %alloc[%pos] : memref<10xf32>
%0 = bufferization.to_tensor %alloc restrict writable : memref<10xf32>
Selected ops that bufferize to an allocation (or need special handling) are also supported:
- tensor.pad is lowered to an allocation, followed by a linalg.fill and a buffer copy (all on memrefs).
- vector.mask is bufferized together with its region. The allocation is placed in front of the vector.mask op.

An optional memory space attribute can be specified for the materialized buffer allocation.
If a memory copy is needed, a “bufferization.materialize_in_destination” is used when possible. This is an op with tensor semantics that will bufferize to a memory copy later. Which concrete op will be used for the memory copy is up to the bufferization framework. Alternatively, a custom memcpy op can be specified via memcpy_op. Currently supported are “memref.copy” and “linalg.copy”. In that case, the source of each memcpy must not have a custom memory space. Furthermore, because the future buffer layout is unknown for a given tensor, a fully dynamic layout is assumed for best compatibility. Users should use “bufferization.materialize_in_destination” when possible.

“memref.alloc” is used for new buffer allocations. The buffer is deallocated at the end of the block if the “emit_dealloc” attribute is present. If this attribute is not present, the allocated memory will be leaked. However, running the -buffer-deallocation-pipeline after all bufferization is done will properly insert the corresponding deallocation(s). Custom allocation ops can be specified via alloc_op. Currently supported are “memref.alloc” and “memref.alloca”. In the case of “memref.alloca”, the buffer is not deallocated.

If bufferize_destination_only is set, only the destination operands of the op are bufferized to a new memory allocation, but not the op itself.

Return modes¶
This operation consumes the target handle and produces the allocated_buffer and new_ops handles. It always succeeds.

- OPERATION_NAME = 'transform.structured.bufferize_to_allocation'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- memory_space() _ods_ir | None¶
- memcpy_op() _ods_ir¶
- alloc_op() _ods_ir¶
- bufferize_destination_only() bool¶
- emit_dealloc() bool¶
- allocated_buffer() _ods_ir¶
- new_ops() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_bufferize_to_allocation(target, *, memory_space=None, memcpy_op=None, alloc_op=None, bufferize_destination_only=None, emit_dealloc=None, results=None, loc=None, ip=None) _ods_ir¶
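A hedged sketch of building this op from Python with the generated constructor documented above; the target handle is assumed to come from a surrounding sequence (e.g. a transform.structured.match result), and the attribute values are illustrative only.

from mlir import ir
from mlir.dialects import _structured_transform_ops_gen as sg

def bufferize_to_numbered_space(target):
    # target: a transform handle to the payload op(s) to bufferize (consumed).
    # memory_space accepts an arbitrary attribute; an i64 "3" is illustrative.
    space = ir.IntegerAttr.get(ir.IntegerType.get_signless(64), 3)
    op = sg.BufferizeToAllocationOp(
        target,
        memory_space=space,
        memcpy_op="linalg.copy",  # instead of the default materialize_in_destination
        emit_dealloc=True)        # dealloc at the end of the block
    # The two documented result handles.
    return op.allocated_buffer, op.new_ops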
- class mlir.dialects._structured_transform_ops_gen.ContinuousTileSizesOp(tile_sizes, chunk_sizes, target, dimension, target_size, *, loc=None, ip=None)¶
Bases: _ods_ir

This transform emits the IR computing the list of (1) exponentially diminishing tile sizes that are powers of 2; and (2) the corresponding chunk-sizes the target op should be split into along the given dimension.
For example, for target_size 9 and dimension 0 for the following linalg op as target:

%0 = linalg.matmul  ins(%arg0, %arg1: tensor<25x34xf32>, tensor<34x25xf32>)
                   outs(%arg2: tensor<25x25xf32>)
the first result tile_sizes will be a list of diminishing tile sizes 9, 4, 2, 1; and the second result will be a list of chunk sizes 18, 4, 2, 1 that the corresponding dimension should be split into.

After the target op has been split along the given dimension (for example using multiway split), each chunk can be tiled with the corresponding tile size in the tile_sizes list generated as a result of this op.

Specifying the output type as !transform.param will cause tile_sizes and chunk_sizes to be computed statically and not dynamically.

- OPERATION_NAME = 'transform.structured.continuous_tile_sizes'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- dimension() _ods_ir¶
- target_size() _ods_ir¶
- tile_sizes() _ods_ir¶
- chunk_sizes() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_continuous_tile_sizes(tile_sizes, chunk_sizes, target, dimension, target_size, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.ConvertConv2DToImg2ColOp(img2col_tensor, transformed, target, *, loc=None, ip=None)¶
Bases: _ods_ir

Convert linalg.conv_2d_xxx into linalg.generic (for img2col packing) and linalg.matmul.
A convolution operation can be written as a matrix-matrix multiplication by unfolding the cross-correlation between input and filter and explicitly copy overlapped sliding window inputs.
Consider 2D input X with single channel input and output and 2x2 filter W:
[x(0, 0)  , x(0, 1)  , ...,   x(0, n)  ]
[x(1, 0)  , x(1, 1)  , ...,   x(1, n)  ]
[.        ,    .     ,   .,      .     ]            [w(0, 0), w(0, 1)]
[.        ,    .     ,   .,      .     ]   (conv)   [w(1, 0), w(1, 1)]
[.        ,    .     ,   .,      .     ]
[x(n-1, 0), x(n-1, 1), ..., x(n-1, n-1)]

The packed input data (img2col) is a matrix with |rows| = output spatial size, |columns| = filter spatial size. To compute the output Y(i, j) we need to calculate the dot product between the filter window at input X(x, y) and the filter, which will look like the following, where the r.h.s. is the img2col matrix and the l.h.s. is the flattened filter:

[x(0,0), x(0,1), x(1,0), x(1,1)]
[x(0,1), x(1,1), x(0,2), x(1,2)]   (matmul)   [w(0,0), w(0,1), w(1,0), w(1,1)]
[x(0,1), x(1,1), x(0,2), x(1,2)]
[   .  ,    .  ,    .  ,    .  ]

In general, for the 2D case with (N, H, W, C) input and (Kh, Kw, C, D) filter and (N, Ho, Wo, D) output, the convolution is the matrix-matrix multiplication (Ho x Wo, Kh x Kw x C) * (Kh x Kw x C, D) for each of the N inputs. For the case where N > 1 it is a batched matrix-matrix multiplication.
Returns two handles:
- One on the operation that produces the img2col tensor.
- One on the final operation of the sequence that replaces the original convolution.
Return modes:¶
Returns a definite failure if target is not isolated from above. Returns a silenceable failure if the pattern application failed.
- OPERATION_NAME = 'transform.structured.convert_conv2d_to_img2col'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- img2col_tensor() _ods_ir¶
- transformed() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_convert_conv2d_to_img2col(img2col_tensor, transformed, target, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.ConvertToLoopsOp(result, target, *, loc=None, ip=None)¶
Bases: _ods_ir

For operations that implement the TilingInterface and implement the generateScalarImplementation method, lowers the operation to loops. The return handle points to all generated loops. Fails if the payload ops cannot be lowered to loops.

- OPERATION_NAME = 'transform.structured.convert_to_loops'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- result() _ods_ir¶
Shortcut to get an op result if it has only one (throws an error otherwise).
- mlir.dialects._structured_transform_ops_gen.structured_convert_to_loops(result, target, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.DecomposeInterfaceOp(transformed, target, *, loc=None, ip=None)¶
Bases: _ods_ir

TODO
- OPERATION_NAME = 'transform.structured.decompose_interface'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- transformed() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_decompose_interface(transformed, target, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.DecomposeOp(transformed, target, *, loc=None, ip=None)¶
Bases: _ods_ir

Decomposes named complex operations, such as higher-dimensional (depthwise) convolutions, into combinations of lower-dimensional equivalents when possible.
Return modes¶
This operation ignores non-Linalg ops and drops them in the return. If all the operations referred to by the target handle decompose properly, the transform succeeds. Otherwise the transform produces a silenceable failure. The return handle points to only the subset of successfully produced computational operations, which can be empty.

- OPERATION_NAME = 'transform.structured.decompose'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- transformed() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_decompose(transformed, target, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.DecomposeWinogradOp(transformed, target, *, loc=None, ip=None)¶
Bases: _ods_ir

Decompose Winograd operations. It will convert filter, input and output transform operations into a combination of scf, tensor, and linalg equivalent operations. Before applying this transform, users need to tile Winograd transform operations into supported sizes.
Return modes:¶
This operation fails if target is unsupported. Otherwise, the operation succeeds and returns a handle of the sequence that replaces the original operations.

- OPERATION_NAME = 'transform.structured.decompose_winograd_op'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- transformed() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_decompose_winograd_op(transformed, target, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.EliminateLinalgOpAnchoredEmptyTensorsOp(target, *, loc=None, ip=None)¶
Bases: _ods_ir

Try to eliminate all tensor.empty op uses that are anchored on a LinalgOp within the targeted op.

This op is similar to bufferization.eliminate_empty_tensors, but specific to LinalgOps. tensor.empty ops cannot be bufferized. They can either be converted to bufferization.alloc_tensor or replaced with another tensor (via this transform). tensor.empty does not specify the contents of the returned tensor, so their results can be replaced with arbitrary tensor values as long as the dimensions match.

This transform looks for tensor.empty ops where the SSA use-def chain of the result ends in a supported LinalgOp (always following the aliasing OpOperand/OpResult chain). The following LinalgOps are supported:

- Only parallel iterator types.
- The use-def chain ends in an input operand of the LinalgOp.
- The LinalgOp has an unused output operand with the same shape and indexing map.
Example:
%0 = tensor.empty()
%1 = linalg.matmul ins(...) outs(%0)
%2 = linalg.generic ins(%1) outs(%dest) {
  ^bb0(%in: f32, %out: f32):
    // out not used
}
Is rewritten with:
%0 = tensor.empty()
%1 = linalg.matmul ins(...) outs(%dest)
%2 = linalg.generic ins(%0) outs(%1) {
  ^bb0(%in: f32, %out: f32):
    // Use %out instead of %in
}
After this transformation, the “ins” operand has no uses inside the body of the LinalgOp and can be folded away with existing cleanup patterns. Afterwards, the tensor::EmptyOp can also fold away, so that the example can bufferize without an allocation (in the absence of other conflicts).
Return modes¶
This transform reads the target handle and modifies the payload. It does not produce any handle.
- OPERATION_NAME = 'transform.structured.eliminate_empty_tensors'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_eliminate_empty_tensors(target, *, loc=None, ip=None) EliminateLinalgOpAnchoredEmptyTensorsOp¶
- class mlir.dialects._structured_transform_ops_gen.FlattenElementwiseLinalgOp(transformed, target, *, loc=None, ip=None)¶
Bases: _ods_ir

Flattens the iteration space and (applicable) operands of elementwise linalg ops to a single dimension.
Returns one handle:
- Flattened linalg operation.
Return modes:¶
Returns a definite failure if target is not isolated from above. Returns a silenceable failure if the pattern application failed.
- OPERATION_NAME = 'transform.structured.flatten_elementwise'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- transformed() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_flatten_elementwise(transformed, target, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.FuseIntoContainingOp(fused_op, new_containing_op, producer_op, containing_op, *, loc=None, ip=None)¶
Bases: _ods_ir

Fuses the producer_op into the containing_op. Returns a handle to the fused ops and the new_containing_op.

The producer is typically a slice of a tileable op (i.e., implements TilingInterface). In that case, this transform computes the accessed producer slice inside of the containing op (“tile and fuse”) and if required, creates a new containing op with outputs from the fused producer. Otherwise, the entire producer is cloned inside the containing op (“clone and fuse”).
The containing op handle must be associated with exactly one payload op. The producer op handle may be associated with multiple payload ops. This transform fuses producers one-by-one, always picking an unspecified producer that has at least one use inside the containing op among the producers. A producer can be listed multiple times in the handle.
Note: If a producer has multiple uses inside the containing op, it is currently tiled and/or cloned multiple times into the containing op. TODO: Reuse already fused OpResults instead of tiling/cloning a second time when possible. Fuse producers according to a topological sorting to achieve the largest amount of reuse.
Return modes¶
If at least one producer could not be fused, this operation produces a silenceable failure. This is the case when tiling fails or when no producer op could be found among the remaining producers that has at least one use within the containing op. I.e., “producers” that are not consumed within the containing op are rejected by this operation.
This operation consumes the producer handle. This operation only reads the containing op handle.
- OPERATION_NAME = 'transform.structured.fuse_into_containing_op'¶
- _ODS_REGIONS = (0, True)¶
- producer_op() _ods_ir¶
- containing_op() _ods_ir¶
- fused_op() _ods_ir¶
- new_containing_op() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_fuse_into_containing_op(fused_op, new_containing_op, producer_op, containing_op, *, loc=None, ip=None) _ods_ir¶
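From Python, the generated constructor documented above takes the two result types first; a sketch, assuming producer and containing handles obtained elsewhere (e.g. from transform.structured.match):

from mlir.dialects import transform
from mlir.dialects import _structured_transform_ops_gen as sg

def fuse_producer_into(producer, containing):
    any_op = transform.AnyOpType.get()
    fuse = sg.FuseIntoContainingOp(
        any_op,      # result type of the fused-op handle
        any_op,      # result type of the new-containing-op handle
        producer,    # producer handle (consumed)
        containing)  # containing-op handle (only read)
    return fuse.fused_op, fuse.new_containing_op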
- class mlir.dialects._structured_transform_ops_gen.FuseOp(transformed, loops, target, tile_sizes, tile_interchange, *, static_tile_sizes=None, static_tile_interchange=None, apply_cleanup=None, use_forall=None, loc=None, ip=None)¶
Bases: _ods_ir

Tiles the operations pointed to by the target handle and fuses their producers greedily using the options provided as attributes. Tile sizes and loop interchange permutation can be provided as either static attributes or dynamic values (transform parameters or payload handles).

If apply_cleanup is true, then slice canonicalization is applied between fusion steps. If use_forall is true, then the tiling method generates a scf.forall loop instead of scf.for loops.

- OPERATION_NAME = 'transform.structured.fuse'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- tile_sizes() _ods_ir¶
- tile_interchange() _ods_ir¶
- static_tile_sizes() _ods_ir | None¶
- static_tile_interchange() _ods_ir | None¶
- apply_cleanup() bool¶
- use_forall() bool¶
- transformed() _ods_ir¶
- loops() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_fuse(transformed, loops, target, tile_sizes, tile_interchange, *, static_tile_sizes=None, static_tile_interchange=None, apply_cleanup=None, use_forall=None, loc=None, ip=None) _ods_ir | _ods_ir | FuseOp¶
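A hedged sketch of the generated constructor documented above, tiling two dimensions with static sizes; the result types come first (one loop handle per non-zero tile size), the dynamic tile-size and interchange operand lists are left empty, and the list-to-attribute conversions follow the usual generated-builder conventions (an assumption here).

from mlir.dialects import transform
from mlir.dialects import _structured_transform_ops_gen as sg

def tile_and_fuse(target):
    any_op = transform.AnyOpType.get()
    fuse = sg.FuseOp(
        any_op,              # transformed result type
        [any_op, any_op],    # one loop handle per non-zero tile size
        target,
        [],                  # no dynamic tile_sizes operands
        [],                  # no dynamic tile_interchange operands
        static_tile_sizes=[8, 32],
        apply_cleanup=True)  # canonicalize slices between fusion steps
    return fuse.transformed, fuse.loops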
- class mlir.dialects._structured_transform_ops_gen.GeneralizeOp(transformed, target, *, loc=None, ip=None)¶
Bases: _ods_ir

Transforms a named structured operation into the generic form with the explicit attached region.
Return modes¶
This operation ignores non-Linalg ops and drops them in the return. If all the operations referred to by the target handle generalize properly, the transform succeeds. Otherwise the transform produces a silenceable failure. The return handle points to only the subset of successfully produced equivalent generic operations, which can be empty or contain the original ops if they were already in generic form.

- OPERATION_NAME = 'transform.structured.generalize'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- transformed() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_generalize(transformed, target, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.HoistPadBuildPackingLoopNestOp(packing_loop, target, loop, *, transpose=None, loc=None, ip=None)¶
Bases: _ods_ir

Helper transform used to hoist a tensor.pad target operation. This operation creates the packing loop nest required by the hoist_pad operation and makes that functionality available independently.
TODO: In the future, we should consider rewriting as a linalg.pack after hoisting since this abstraction is now available.
Return modes¶
This operation ignores non-tensor.pad ops and drops them in the result. If any non-tensor.pad is passed, the transform emits a silenceable failure.
The return handle points to only the subset of successfully created packing loop nests, which can be empty.
- OPERATION_NAME = 'transform.structured.hoist_pad.build_packing_loop_nest'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- loop() _ods_ir¶
- transpose() _ods_ir¶
- packing_loop() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_hoist_pad_build_packing_loop_nest(packing_loop, target, loop, *, transpose=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.HoistPadOp(transformed, target, num_loops, *, transpose=None, loc=None, ip=None)¶
Bases: _ods_ir

Hoist the tensor.pad target operation by at most the given number of loops. Optionally apply the transpose attribute to the inner dimensions.
TODO: In the future, we should consider rewriting as a linalg.pack after hoisting since this abstraction is now available. TODO: Maybe also return the linalg.generic transpose created at some point.
Return modes¶
This operation ignores non-tensor.pad ops and drops them in the result. If any non-tensor.pad is passed, the transform emits a silenceable failure.
If all the operations referred to by the target handle pad properly, the transform succeeds. Otherwise the transform produces a silenceable failure.

The return handle points to only the subset of successfully hoisted tensor.pad operations, which can be empty.
- OPERATION_NAME = 'transform.structured.hoist_pad'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- num_loops() _ods_ir¶
- transpose() _ods_ir¶
- transformed() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_hoist_pad(transformed, target, num_loops, *, transpose=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.HoistRedundantVectorBroadcastsOp(transformed, target, *, loc=None, ip=None)¶
Bases: _ods_ir

Hoist vector.extract / vector.broadcast pairs out of immediately enclosing scf::ForOp iteratively.
Return modes:¶
The operation always succeeds and returns a handle to the transformed function op.
- OPERATION_NAME = 'transform.structured.hoist_redundant_vector_broadcasts'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- transformed() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_hoist_redundant_vector_broadcasts(transformed, target, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.HoistRedundantVectorTransfersOp(transformed, target, *, verify_non_zero_trip=None, loc=None, ip=None)¶
Bases: _ods_ir

Hoist vector.transfer_read / vector.transfer_write pairs out of immediately enclosing scf::ForOp iteratively, if the following conditions are true:

1. The two ops access the same memref with the same indices.
2. All operands are invariant under the enclosing scf::ForOp.
3. No uses of the memref either dominate the transfer_read or are dominated by the transfer_write (i.e., no aliasing between the write and the read across the loop).
WARNING: This hoisting does not model parallelism and is generally incorrect when used on distributed loops with memref semantics! TODO: obsolete and should be retired.
Return modes:¶
The operation always succeeds and returns a handle to the transformed function op.
- OPERATION_NAME = 'transform.structured.hoist_redundant_vector_transfers'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- verify_non_zero_trip() bool¶
- transformed() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_hoist_redundant_vector_transfers(transformed, target, *, verify_non_zero_trip=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.InsertSliceToCopyOp(transformed, target, *, loc=None, ip=None)¶
Bases: _ods_ir

Targeted rewrite of a tensor.insert_slice to linalg.copy. This is useful to materialize copies explicitly before bufferization and transform them, avoiding the need to rediscover them after bufferization.
If the insert_slice source is already a linalg.copy, only return the source op (i.e. do not create an additional linalg.copy op).
Return modes:¶
The operation always succeeds and returns a handle to the relevant linalg.copy op.
- OPERATION_NAME = 'transform.structured.insert_slice_to_copy'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- transformed() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_insert_slice_to_copy(transformed, target, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.InterchangeOp(transformed, target, *, iterator_interchange=None, loc=None, ip=None)¶
Bases: _ods_ir

Interchanges the iterators of the operations pointed to by the target handle using the iterator interchange attribute.
Return modes¶
This operation ignores non-linalg::Generic ops and drops them in the return. This operation fails if the interchange attribute is invalid. If all the operations referred to by the target handle interchange properly, the transform succeeds. If any interchange fails, the transform produces a definite failure. The return handle points to only the subset of successfully produced interchanged operations, which can be empty.

- OPERATION_NAME = 'transform.structured.interchange'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- iterator_interchange() _ods_ir | None¶
- transformed() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_interchange(transformed, target, *, iterator_interchange=None, loc=None, ip=None) _ods_ir¶
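For instance, swapping the two outer iterators of a linalg.generic; a sketch using the generated constructor (result type first), with the permutation passed as a plain list of integers:

from mlir.dialects import transform
from mlir.dialects import _structured_transform_ops_gen as sg

def swap_outer_iterators(target):
    inter = sg.InterchangeOp(
        transform.AnyOpType.get(),    # transformed result type
        target,
        iterator_interchange=[1, 0])  # permutation of the iterator list
    return inter.transformed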
- class mlir.dialects._structured_transform_ops_gen.LinalgCopyToMemrefOp(transformed, target, *, loc=None, ip=None)¶
Bases: _ods_ir

Targeted rewrite of a linalg.copy on memrefs to a memref.copy. This is useful when bufferizing copies to a linalg.copy, later applying some transformations, and then rewriting the copy into a memref.copy. If the element types of the source and destination differ, or if the source is a scalar, the transform produces a silenceable failure.
- OPERATION_NAME = 'transform.structured.linalg_copy_to_memref'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- transformed() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_linalg_copy_to_memref(transformed, target, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.LowerPackOp(pad_op, expand_shape_op, transpose_op, target, *, lowerPadLikeWithInsertSlice=None, loc=None, ip=None)¶
Bases: _ods_ir

Rewrite a linalg.pack into tensor.pad + tensor.expand_shape + linalg.transpose.
Return modes¶
This operation ignores non-pack ops and drops them in the return. This operation produces a silenceable failure if the rewrite fails for any reason. If all the operations referred to by the target are rewritten, the transform succeeds. Return handles to the newly produced pad, expand_shape and transpose ops.

- OPERATION_NAME = 'transform.structured.lower_pack'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- lowerPadLikeWithInsertSlice() _ods_ir¶
- pad_op() _ods_ir¶
- expand_shape_op() _ods_ir¶
- transpose_op() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_lower_pack(pad_op, expand_shape_op, transpose_op, target, *, lower_pad_like_with_insert_slice=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.LowerUnPackOp(empty_op, transpose_op, collapse_shape_op, extract_slice_op, target, *, lowerUnpadLikeWithExtractSlice=None, loc=None, ip=None)¶
Bases: _ods_ir

Lower a linalg.unpack into empty + linalg.transpose + tensor.collapse_shape + tensor.extract_slice.
Return modes¶
This operation ignores non-unpack ops and drops them in the return. This operation produces a silenceable failure if the rewrite fails for any reason. If all the operations referred to by the target are rewritten, the transform succeeds. Return handles to the newly produced empty, transpose, collapse_shape and extract_slice ops.

- OPERATION_NAME = 'transform.structured.lower_unpack'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- lowerUnpadLikeWithExtractSlice() _ods_ir¶
- empty_op() _ods_ir¶
- transpose_op() _ods_ir¶
- collapse_shape_op() _ods_ir¶
- extract_slice_op() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_lower_unpack(empty_op, transpose_op, collapse_shape_op, extract_slice_op, target, *, lower_unpad_like_with_extract_slice=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.MapCopyToThreadsOp(forall_op, tiled_op, target, total_num_threads, desired_bit_alignment, *, loc=None, ip=None)¶
Bases: _ods_ir

Targeted mapping of a linalg.copy / tensor.pad operation on tensors to a GPU thread mapping.
This operation implements a greedy heuristic that determines a good distribution of threads to break down the copy/pad operation into. The heuristic is driven by considerations related to the underlying architecture for which good high-level decisions are needed assuming certain hardware features. Relevant features are exposed via first-class attributes to control the behavior of the transformation at a high level.
For now, a single heuristic is implemented and can be extended on a per-need basis.
Return modes¶
This operation fails definitely if there is an unsupported op (i.e., not linalg.copy / tensor.pad) among the targeted op. Otherwise, the operation always succeeds and returns a handle to the relevant tiled linalg.copy / tensor.pad op and the enclosing scf.forall op.
- OPERATION_NAME = 'transform.structured.gpu.map_copy_to_threads'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- total_num_threads() _ods_ir¶
- desired_bit_alignment() _ods_ir¶
- forall_op() _ods_ir¶
- tiled_op() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_gpu_map_copy_to_threads(forall_op, tiled_op, target, total_num_threads, desired_bit_alignment, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.MatchOp(results_, target, *, ops=None, interface=None, op_attrs=None, filter_result_type=None, filter_operand_types=None, loc=None, ip=None)¶
Bases: _ods_ir

Match op with the specified constraints, within the target op.
The following constraints are supported:
- interface: an optional MatchInterfaceEnum specifying an enum representation for an interface to target.
- ops: an optional StrArrayAttr specifying the concrete name of an op. Multiple names can be specified. Matched ops must have one of the specified names.
- attribute: the matched op must have all specified attributes (with their specified values).
- filter_result_type: the matched op must return exactly this one type.
- filter_operand_types: all the operands of the matched op must be of this type. If more than one type is specified, then the length of the list must be equal to the number of operands in the matched op, and the match will succeed only if the operand types match all the types in the list in the order in which they are specified.
Note: Only ops that satisfy all specified constraints are matched.
TODO: Extend with regions to allow a limited form of constraints.
Return modes¶
This op traverses the ops nested under target and returns the handles to all the operations that match the requirements.

This op fails if the target is not a handle to exactly one operation. Otherwise it succeeds.
This operation does not consume the target handle and produces new handles: it is a navigation op.
- OPERATION_NAME = 'transform.structured.match'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- ops() _ods_ir | None¶
- interface() _ods_ir | None¶
- op_attrs() _ods_ir | None¶
- filter_result_type() _ods_ir | None¶
- filter_operand_types() _ods_ir | None¶
- results_() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_match(results_, target, *, ops=None, interface=None, op_attrs=None, filter_result_type=None, filter_operand_types=None, loc=None, ip=None) _ods_ir¶
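A typical navigation idiom from Python; the match_op_names classmethod is part of the hand-written wrapper in mlir.dialects.transform.structured (an assumption here), while the raw generated constructor documented above takes the result type plus the keyword constraints directly.

from mlir.dialects.transform import structured

def match_matmuls(root):
    # root: handle to the payload root (e.g. a sequence block argument).
    # Returns a handle to every linalg.matmul nested under it.
    return structured.MatchOp.match_op_names(root, ["linalg.matmul"])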
- class mlir.dialects._structured_transform_ops_gen.MultiTileSizesOp(low_size, high_size, split_point, target, dimension, target_size, *, divisor=None, loc=None, ip=None)¶
Bases: _ods_ir

Emits the IR computing the tile sizes s1 and s2 such that:

- there exists a combination of n tiles of size s1 and m tiles of size s2 that covers the entirety of the iteration space dimension of the target structured op;
- s1 and s2 are less than or equal to target_size;
- s1 and s2 are divisible by divisor.

For example, for a dimension of size 54 with target size 12 and divisor 2, this can emit the IR computing the tile size 10, used for 3 tiles, and 12, used for 2 tiles, in total 10*3 + 12*2 = 54. Note that when the divisor does not divide the original dimension size, it is impossible to compute such tile sizes. An assertion is emitted to guard against this in the dynamic case.
Expects the target size and the divisor to be strictly positive. Folds the IR as much as possible, normally obtaining constant sizes and numbers of tiles for a statically known dimension.
This does not consume the target handle and produces three handles each pointing to single-result index-typed operations (which may be arithmetic constant operations) defining the two respective tile sizes and the product of the first tile size with the number of tiles of that size (useful for splitting the iteration space).
This operation composes with the regular tiling when applied per-dimension:
%sz1, %sz2, %split = structured.multitile_sizes %target
    { target_size = 10, dimension = 1 }
    : !transform.any_op, !transform.param<i64>, !transform.param<i64>, !transform.param<i64>
%handles = structured.split %target after %split { dimension = 1 }
    : !transform.any_op, !transform.param<i64>
%low, %high = transform.split_handle %handles
    : (!transform.any_op) -> (!transform.any_op, !transform.any_op)
%tiled_low, %loop1 = structured.tile_using_for %low [0, %sz1]
    : (!transform.any_op, !transform.param<i64>) -> (!transform.any_op, !transform.any_op)
%tiled_high, %loop2 = structured.tile_using_for %high [0, %sz2]
    : (!transform.any_op, !transform.param<i64>) -> (!transform.any_op, !transform.any_op)
%common = merge_handles %tiled_low, %tiled_high : !transform.any_op

%sz3, %sz4, %split = structured.multitile_sizes %target
    { target_size = 42, dimension = 0 }
    : !transform.any_op, !transform.any_op, !transform.any_op, !transform.any_op
%sz3r, %sz4r, %splitr = replicate num(%common) %sz3, %sz4, %splitr
    : !transform.any_op, !transform.any_op, !transform.any_op
structured.split %common after %splitr { dimension = 0 }
    : !transform.any_op, !transform.any_op
// ...
- OPERATION_NAME = 'transform.structured.multitile_sizes'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- dimension() _ods_ir¶
- target_size() _ods_ir¶
- divisor() _ods_ir¶
- low_size() _ods_ir¶
- high_size() _ods_ir¶
- split_point() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_multitile_sizes(low_size, high_size, split_point, target, dimension, target_size, *, divisor=None, loc=None, ip=None) _ods_ir¶
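The same computation driven from Python via the documented module-level helper: the three leading arguments are the result types for low_size, high_size and split_point (plain any-op handles here; per the description, !transform.param types would make the computation static).

from mlir.dialects import transform
from mlir.dialects import _structured_transform_ops_gen as sg

def compute_multitile_sizes(target):
    any_op = transform.AnyOpType.get()
    low, high, split = sg.structured_multitile_sizes(
        any_op, any_op, any_op,  # low_size / high_size / split_point types
        target,
        1,    # dimension to split
        10)   # target_size
    return low, high, split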
- class mlir.dialects._structured_transform_ops_gen.PackGreedilyOp(packed_op, target, matmul_packed_sizes, *, static_matmul_packed_sizes=None, matmul_padded_sizes_next_multiple_of=None, matmul_inner_dims_order=None, loc=None, ip=None)¶
Bases: _ods_ir

Target a Linalg op and rewrite it into packed LinalgOp form by trying to infer whether a known suboperation is embedded.

Different packing strategies are applied in order; when one applies successfully, the transform returns:

1. Matmul packing: Try to infer a matmul operation embedded in the target op. Specifically, this looks for 2 parallel dimensions that participate in an outer-product and 1 reduction dimension. These dimensions are referred to as (m, n, k) to match canonical matmul terminology.

The packed sizes for (m, n, k) are specified by matmul_packed_sizes and the optional matmul_padded_sizes_next_multiple_of. When an entry matmul_packed_sizes[i] is non-0, the corresponding dimension is packed by matmul_packed_sizes[i]. Otherwise, the dimension is merely padded to the next multiple of matmul_padded_sizes_next_multiple_of[i].

matmul_padded_sizes_next_multiple_of is optional and is expected to either be empty or of size 3, matching the size of matmul_packed_sizes. For each individual element of matmul_packed_sizes and matmul_padded_sizes_next_multiple_of, only one of them is allowed to be non-zero.

The ordering of the packed dimensions (mm, nn, kk) is specified by the matmul_inner_dims_order attribute.

Packing occurs as follows:

1. Find the dimensions to pack according to the strategy.
2. The target is converted to linalg.generic form.
3. An interchange transform is applied to isolate the dimensions to pack as the most minor indexing dimensions of the linalg.generic. The most minor dimensions are themselves ordered according to inner_dims_order.
4. An elementwise traversal of matmul_packed_sizes and matmul_padded_sizes_next_multiple_of is performed and for each dimension d, either pack to matmul_packed_sizes[d] or pad to the matmul_padded_sizes_next_multiple_of[d].
5. Packing/padding is performed by the amounts determined in step 4 and following inner_dims_order.

By normalizing the most minor dimensions to inner_dims_order, the transform guarantees that packing immediately generates inner dimensions in a desirable layout.

Outer dimension layout permutations are not controlled by this transform op at the moment and can be obtained by composing with the pack_transpose transformation.
Return modes¶
This operation ignores non-Linalg ops and drops them in the return. It returns the list of packed Linalg ops or the original op when all available packing strategies failed to apply.
- OPERATION_NAME = 'transform.structured.pack_greedily'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- matmul_packed_sizes() _ods_ir¶
- static_matmul_packed_sizes() _ods_ir¶
- matmul_padded_sizes_next_multiple_of() _ods_ir¶
- matmul_inner_dims_order() _ods_ir¶
- packed_op() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_pack_greedily(packed_op, target, matmul_packed_sizes, *, static_matmul_packed_sizes=None, matmul_padded_sizes_next_multiple_of=None, matmul_inner_dims_order=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.PackOp(packed_op, target, packed_sizes, *, static_packed_sizes=None, loc=None, ip=None)¶
Bases: _ods_ir

Pack a LinalgOp by applying a data tiling transformation on the op and packing the operands according to the packed_sizes specification.

Iterator dimensions are tiled in their canonical order in the op spec. Operands are packed according to the same canonical order of the op iterator dimensions.

Specifying a packed size of 0 for an iterator removes it from consideration for packing.

linalg.pack (resp. linalg.unpack) operations are inserted for the operands (resp. results) that need to be packed (resp. unpacked) according to the packed_sizes specification.

Example¶
Consider a linalg.matmul with indexing maps:

//                        M   N   K       M   K
// affine_map<(d0, d1, d2) -> (d0, d2)>
//                                        K   N
// affine_map<(d0, d1, d2) -> (d2, d1)>
//                                        M   N
// affine_map<(d0, d1, d2) -> (d0, d1)>
%0 = linalg.matmul  ins(%A, %B: tensor<?x?xf32>, tensor<?x?xf32>)
                   outs(   %C: tensor<?x?xf32>)

Specifying packed_sizes [2, 3, 4] results in tiling the iterator dimensions M, N and K, in this order, in both the op and its operands.
//                         M   N   K   m   n   k       M   K   m   k
// affine_map<(d0, d1, d2, d3, d4, d5) -> (d0, d2, d3, d5)>
//                                                      K   N   n   k
// affine_map<(d0, d1, d2, d3, d4, d5) -> (d2, d1, d4, d5)>
//                                                      M   N   m   n
// affine_map<(d0, d1, d2, d3, d4, d5) -> (d0, d1, d3, d4)>
%0 = linalg.generic_representing_some_higher_d_matmul
      ins(%A, %B: tensor<?x?x2x4xf32>, tensor<?x?x4x3xf32>)
     outs(   %C: tensor<?x?x2x3xf32>)

In particular, note that the second operand B has shape KxNxnxk (and not KxNxkxn as one could expect by looking only at the operand).
transform.structured.pack_transposeop. This composition allows separating concerns and composes better compared to adding additional permutation attributes to this transform op.Return modes¶
This operation applies to a single Linalg op, otherwise it fails. This operation may produce a definite failure if the packing fails for any reason.
The returned handle points to the packed LinalgOp.
- OPERATION_NAME = 'transform.structured.pack'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- packed_sizes() _ods_ir¶
- static_packed_sizes() _ods_ir¶
- packed_op() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_pack(packed_op, target, packed_sizes, *, static_packed_sizes=None, loc=None, ip=None) _ods_ir¶
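A sketch of the matmul example above driven from Python with the generated constructor; passing an empty list for the dynamic packed_sizes operands and the sizes via the static attribute follows the usual generated-builder convention (an assumption here).

from mlir.dialects import transform
from mlir.dialects import _structured_transform_ops_gen as sg

def pack_matmul(target):
    packed = sg.PackOp(
        transform.AnyOpType.get(),      # packed-op result handle type
        target,
        [],                             # no dynamic packed_sizes operands
        static_packed_sizes=[2, 3, 4])  # (m, n, k) as in the example above
    return packed.packed_op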
- class mlir.dialects._structured_transform_ops_gen.PackTransposeOp(packed_op, pack_op, un_pack_op, target_pack_or_un_pack_op, target_linalg_op, *, outer_perm=None, inner_perm=None, loc=None, ip=None)¶
Bases: _ods_ir

Apply a transposition to a single linalg.pack (resp. linalg.unpack) and update the linalg.generic op that consumes (resp. produces) the operation.

This transform allows composing a simple structured.pack with additional transpositions to e.g. match the data format required by a specific library call or ISA instruction.

The transpose spec must specify at least one of the outer_perm or inner_perm attributes, which will act upon the outer_dims_perm or inner_dims_pos of the specified linalg.pack or linalg.unpack op.

If the target of this op is a linalg.pack, then a new tensor.empty will be created along with transposed versions of the linalg.pack and the consuming linalg.generic, which is expected to be the sole consumer.

If the target of this op is a linalg.unpack, then the whole pack / compute / unpack chain will be transposed and transposed clones of linalg.pack, the consuming linalg.generic and the tail linalg.pack will be created.

Return modes¶
This operation targets a single linalg.pack / linalg.unpack op and a single matching linalg.generic that consumes / produces the op. Otherwise, it produces a silenceable failure.

This operation may produce a silenceable failure if the transpose spec is ill-formed (i.e. outer_perm or inner_perm are not permutations of the proper rank) or if the transposition of all involved operations fails for any reason.

This operation returns 3 handles, one to the transformed LinalgOp, one to the transformed linalg.pack and one to the transformed linalg.unpack. The last handle for linalg.unpack is empty if target_pack_or_un_pack_op was not itself a linalg.unpack.

- OPERATION_NAME = 'transform.structured.pack_transpose'¶
- _ODS_REGIONS = (0, True)¶
- target_pack_or_un_pack_op() _ods_ir¶
- target_linalg_op() _ods_ir¶
- outer_perm() _ods_ir | None¶
- inner_perm() _ods_ir | None¶
- packed_op() _ods_ir¶
- pack_op() _ods_ir¶
- un_pack_op() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_pack_transpose(packed_op, pack_op, un_pack_op, target_pack_or_un_pack_op, target_linalg_op, *, outer_perm=None, inner_perm=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.PadOp(padded, pad, copy, target, pad_to_multiple_of, *, padding_values=None, padding_dimensions=None, static_pad_to_multiple_of=None, nofold_flags=None, transpose_paddings=None, copy_back_op=None, use_prescribed_tensor_shapes=None, loc=None, ip=None)¶
Bases: _ods_ir

Pads the operations pointed to by the target handle using the options provided as operation attributes. The operation returns a handle to the padded operation and to the padding operation (“tensor.pad”).

To preserve tensor SSA use-def chains, the unpadded result is copied back to the original destination tensor of the targeted op. The op that copies back the result can be customized with copy_back_op:

- “bufferization.materialize_in_destination” (default)
- “linalg.copy”
- “none” (no copy back)
Return modes¶
This operation ignores non-Linalg ops and drops them in the return. This operation may produce a definite failure if the padding fails for any reason.
If all the operations referred to by the target handle pad properly, the transform succeeds. Otherwise the transform produces a silenceable failure. The return handle points to only the subset of successfully produced padded operations, which can be empty.

- OPERATION_NAME = 'transform.structured.pad'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- pad_to_multiple_of() _ods_ir¶
- padding_values() _ods_ir¶
- padding_dimensions() _ods_ir¶
- static_pad_to_multiple_of() _ods_ir | None¶
- nofold_flags() _ods_ir¶
- transpose_paddings() _ods_ir¶
- copy_back_op() _ods_ir¶
- use_prescribed_tensor_shapes() bool¶
- padded() _ods_ir¶
- pad() _ods_ir¶
- copy() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_pad(padded, pad, copy, target, pad_to_multiple_of, *, padding_values=None, padding_dimensions=None, static_pad_to_multiple_of=None, nofold_flags=None, transpose_paddings=None, copy_back_op=None, use_prescribed_tensor_shapes=None, loc=None, ip=None) _ods_ir¶
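A hedged sketch using the generated constructor documented above: three result types (padded / pad / copy) come first, the dynamic pad_to_multiple_of operand list is left empty, and the option values are illustrative only.

from mlir.dialects import transform
from mlir.dialects import _structured_transform_ops_gen as sg

def pad_all_dims(target):
    any_op = transform.AnyOpType.get()
    pad = sg.PadOp(
        any_op, any_op, any_op,        # padded / pad / copy result handle types
        target,
        [],                            # no dynamic pad_to_multiple_of operands
        padding_dimensions=[0, 1, 2],  # pad all three iteration dimensions
        copy_back_op="linalg.copy")    # customize the copy-back op
    return pad.padded, pad.pad, pad.copy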
- class mlir.dialects._structured_transform_ops_gen.PadTilingInterfaceOp(padded, pad, target, padding_sizes, *, padding_values=None, static_padding_sizes=None, pad_to_multiple_of=None, loc=None, ip=None)¶
Bases: _ods_ir

Pads the iteration domain of the operations pointed to by the target handle using the options provided as operation attributes. Padding the iteration domain induces a padding of the operands that is consistent across the op semantics and, unlike for simple elementwise ops, may not be trivially deducible or specifiable on operands only (e.g. convolutions). Currently, only a limited set of projected permutation maps are supported.

The specification of padding_sizes follows that of tile_sizes during tiling: the value “0” on a particular iterator encodes “no padding”. Like in the case of tiling, an automatic completion by 0 to the operation rank occurs.

This transformation returns a handle to the padded operation and to the padding operation (“tensor.pad”).
TODO: in the future this should be moved out of a specific Linalg implementation file and into a more general “Structured” file.
Return modes¶
This operation ignores non-IndexingMapOpInterface ops and drops them in the return. In the future, this operation will support all TilingInterfaceOps for which the contract between iteration domain and operands can be reified.
This operation may produce a definite failure if the padding fails for any reason.
If all the operations referred to by the target handle pad properly, the transform succeeds. Otherwise the transform produces a silenceable failure. The return handle points to only the subset of successfully produced padded operations, which can be empty.

- OPERATION_NAME = 'transform.structured.pad_tiling_interface'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- padding_sizes() _ods_ir¶
- padding_values() _ods_ir¶
- static_padding_sizes() _ods_ir | None¶
- pad_to_multiple_of() bool¶
- padded() _ods_ir¶
- pad() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_pad_tiling_interface(padded, pad, target, padding_sizes, *, padding_values=None, static_padding_sizes=None, pad_to_multiple_of=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.PromoteOp(transformed, target, *, operands_to_promote=None, use_full_tile_buffers=None, use_full_tiles_by_default=None, use_original_subview_size=None, use_alloca=None, memory_space=None, mapping=None, alignment=None, loc=None, ip=None)¶
Bases: _ods_ir

Promotes the specified operands of the target into a separate memory buffer.
At this point, this transform does not allow customizing alloc/dealloc functions nor the behavior on copy in/out operations.
Return modes¶
This operation applies to a single Linalg op that satisfies the promoteSubviewsPrecondition, otherwise it fails.

If the operations referred to by the target handle promote properly, the transform succeeds.

When successful, the return handle points to the target operation that was modified in place.
- OPERATION_NAME = 'transform.structured.promote'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- operands_to_promote() _ods_ir¶
- use_full_tile_buffers() _ods_ir¶
- use_full_tiles_by_default() bool¶
- use_original_subview_size() bool¶
- use_alloca() bool¶
- memory_space() _ods_ir | None¶
- mapping() _ods_ir | None¶
- alignment() _ods_ir | None¶
- transformed() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_promote(transformed, target, *, operands_to_promote=None, use_full_tile_buffers=None, use_full_tiles_by_default=None, use_original_subview_size=None, use_alloca=None, memory_space=None, mapping=None, alignment=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.PromoteTensorOp(tensor, *, memory_space=None, results=None, loc=None, ip=None)¶
Bases: _ods_ir

Requests that a tensor value lives in a specific memory space for its lifetime. This is achieved by allocating a new tensor in the desired memory space with bufferization.alloc_tensor and optionally materializing the source value into that allocation with bufferization.materialize_in_destination. All uses of the original value are then redirected to the promoted value.

The generated code for promoting tensor value %0 resembles the following:
%1 = bufferization.alloc_tensor(<dynamic dims of %0>)
       { memory_space = memory_space }
// Note: the materialization is omitted if %0 is never read and is only
// written into (i.e., it behaves as a result tensor).
%2 = bufferization.materialize_in_destination %0 in %1
// ... <all users of %0 now use %2 instead>
Deallocation is not handled by this transform.
Return modes:

- Produces a silenceable failure if the given handle does not point to tensor-typed values.
- Succeeds otherwise and returns a handle to the promoted value(s), i.e., the result of materialization if present and the allocation otherwise.
- OPERATION_NAME = 'transform.structured.promote_tensor'¶
- _ODS_REGIONS = (0, True)¶
- tensor() _ods_ir¶
- memory_space() _ods_ir | None¶
- promoted() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_promote_tensor(tensor, *, memory_space=None, results=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.ReplaceOp(replacement, target, *, loc=None, ip=None)¶
Bases: _ods_ir

Replace all target payload ops with the single op that is contained in this op’s region. All targets must have zero arguments and must be isolated from above.

This op is for debugging/experiments only.
Return modes¶
This operation consumes the target handle.

- OPERATION_NAME = 'transform.structured.replace'¶
- _ODS_REGIONS = (1, True)¶
- target() _ods_ir¶
- replacement() _ods_ir¶
- bodyRegion() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_replace(replacement, target, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.RewriteInDestinationPassingStyleOp(transformed, target, *, loc=None, ip=None)¶
Bases: _ods_ir
Rewrite a supported tensor operation that is not in destination-passing style into a form that is in destination-passing style. Currently supported operations are:
tensor.pad
tensor.generate
tensor.from_elements
This dichotomy hints at a future interface; for now, the implementation just switches between different implementations.
Return modes¶
This operation ignores unsupported ops and drops them from the return. If all the operations referred to by the target handle are rewritten properly, the transform succeeds. Otherwise the transform produces a silenceable failure. The return handle points to a subset of successfully produced operations:
In the tensor.pad case, the returned handle points to the tensor.insert_slice.
In the tensor.generate case, the returned handle points to the linalg.generic.
In the tensor.from_elements case, the returned handle points to the last tensor.insert.
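A minimal transform-IR sketch (the %pad handle is an illustrative assumption):

// Rewrite matched tensor.pad ops into destination-passing style; the
// result maps to the tensor.insert_slice produced for each rewrite.
%dps = transform.structured.rewrite_in_destination_passing_style %pad
    : (!transform.any_op) -> !transform.any_op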
- OPERATION_NAME = 'transform.structured.rewrite_in_destination_passing_style'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- transformed() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_rewrite_in_destination_passing_style(transformed, target, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.ScalarizeOp(result, target, *, loc=None, ip=None)¶
Bases: _ods_ir
Indicates that ops of a specific kind in the given function should be scalarized (i.e. their dynamic dimensions tiled by 1).
Return modes:¶
This operation ignores non-Linalg ops and drops them in the return. This operation produces a definite failure if the scalarization fails for any reason. If all the operations referred to by the target handle scalarize properly, the transform succeeds. Otherwise the transform produces a silenceable failure.
The return handle points to only the subset of successfully produced tiled-by-1 operations, which can be empty.
This operation does not return handles to the tiled loop. We make this design choice because it is hard to know ahead of time the number of loops that will be produced (it depends on the number of dynamic dimensions after multiple transformations have been applied). Loops can always be recovered by navigating from the tiled operations if needed.
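A minimal transform-IR sketch (the %dyn handle is an illustrative assumption):

// Tile every dynamic dimension of the ops mapped to %dyn by 1.
%scalarized = transform.structured.scalarize %dyn
    : (!transform.any_op) -> !transform.any_op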
- OPERATION_NAME = 'transform.structured.scalarize'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- result() _ods_ir¶
Shortcut to get an op result if it has only one (throws an error otherwise).
- mlir.dialects._structured_transform_ops_gen.structured_scalarize(result, target, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.SpecializeOp(transformed, target, *, loc=None, ip=None)¶
Bases: _ods_ir
Transforms a generic operation into the equivalent named form.
Return modes¶
This operation ignores non-Linalg ops and drops them in the return. If all the operations referred to by the target handle specialize, the transform succeeds; otherwise, the operation produces a silenceable failure. The return handle points to only the subset of successfully produced equivalent named operations, which can be empty or contain the original ops if they were already in named form. The supported specializations to named Linalg operations are:
linalg.copy of any rank.
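A minimal transform-IR sketch (the %generic handle is an illustrative assumption):

// Raise a linalg.generic that is equivalent to a named op (e.g. a copy)
// back to its named form.
%named = transform.structured.specialize %generic
    : (!transform.any_op) -> !transform.any_op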
- OPERATION_NAME = 'transform.structured.specialize'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- transformed() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_specialize(transformed, target, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.SplitOp(split_list, target, dimension, static_chunk_sizes, *, dynamic_chunk_sizes=None, multiway=None, loc=None, ip=None)¶
Bases: _ods_ir
Splits the given target op into two or more complementary parts, which combined cover the entire iteration domain of the original op. The split is performed along the iteration space dimension provided as an attribute; the chunk size attribute specifies the size of the lower part, and the remaining range of the iteration space is assigned to the upper part. In case of dimension overflow, the transformation fails. The split is performed at the dimension iterator value specified either as the static chunk size attribute, when it is known at transform IR construction time, or as the handle to an operation producing a single index-typed value, when it is computed by the payload IR. In the latter case, the static chunk size must be set to ShapedType::kDynamic and the dynamic size handle must point to as many value-producing operations as there are structured operations pointed to by the target handle.
The operation consumes the target handle, but preserves the chunk size handle if provided. Without the multiway attribute, it produces a new handle that is a list of the two parts of the structured op after splitting, whose lower-index part corresponds to the part with lower iteration space indices.
Multiway split mode is enabled by specifying the multiway attribute. In this mode a single target op is split into multiple parts covering the iteration space of the specified dimension. static_chunk_sizes and dynamic_chunk_sizes in this case form a list of chunk sizes that the given dimension should be split into. With multiway it also produces a handle; the result handle is a list of the multiple parts of the structured op after splitting, where the target dimension of each linalg op in the list corresponds to the chunk size specified in the input split list. If the chunk sizes do not cover the entire iteration space, the leftover chunk is the last payload in the result handle.
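A minimal transform-IR sketch of the two-way form (the handle names, and the after keyword of the assembly, are assumptions here):

// Split dimension 0 of the matched op at iteration 16; %parts maps to
// the two complementary payload ops, lower part first.
%parts = transform.structured.split %op after 16 { dimension = 0 }
    : !transform.any_op
%low, %high = transform.split_handle %parts
    : (!transform.any_op) -> (!transform.any_op, !transform.any_op)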
transform.split_handleis needed to access individual handle.- OPERATION_NAME = 'transform.structured.split'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- dynamic_chunk_sizes() _ods_ir | None¶
- dimension() _ods_ir¶
- static_chunk_sizes() _ods_ir¶
- multiway() bool¶
- split_list() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_split(split_list, target, dimension, static_chunk_sizes, *, dynamic_chunk_sizes=None, multiway=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.SplitReductionOp(init_or_alloc_op, fill_op, split_linalg_op, combining_linalg_op, target, *, split_factor=None, insert_split_dimension=None, inner_parallel=None, use_scaling_algorithm=None, use_alloc=None, loc=None, ip=None)¶
Bases: _ods_ir
Indicates that the given target op should be transformed with the splitReduction transformation and the split factor provided as an attribute.
The splitReduction transformation splits the first single linalg op reduction into a parallel and a reduction dimension. A new linalg.generic op is created to perform the rest of the reduction.
The transformation supports the following configuration attributes:
split_factor: the factor by which to split (i.e. the size of the remaining reduction after splitting).
insert_split_dimension: the dimension in the temporary tensor into which the new parallel dimension is inserted.
inner_parallel: specifies whether the parallel dimension is before or after the reduction dimension in the splitting op.
use_scaling_algorithm: whether to use a scaling-based formulation that does not create an ExpandShapeOp (default: do not use scaling).
use_alloc: whether to use an alloc op to allocate the temporary tensor (default: do not use alloc op).
Return modes¶
This operation ignores non-Linalg ops and drops them in the return. This operation produces a definite failure if the splitting fails for any reason.
If all the operations referred to by the target handle split properly, the transform succeeds. Otherwise the transform produces a silenceable failure. The four returned handles point to only the subset of successfully produced computational operations, which can all be empty. These four returned handles point to:
the init op (or the tensor_alloc op if use_alloc = true),
the fill op used to initialize the neutral element,
the split op and
the result-combining op.
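A minimal transform-IR sketch of the invocation (the %red handle and attribute values are illustrative assumptions):

// Split the reduction of the matched op by a factor of 4, inserting the
// new parallel dimension at position 0 of the intermediate tensor.
%init, %fill, %split, %combine = transform.structured.split_reduction %red
    { split_factor = 4, insert_split_dimension = 0 }
    : (!transform.any_op) -> (!transform.any_op, !transform.any_op,
                              !transform.any_op, !transform.any_op)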
Example (default: use_scaling_algorithm = false, use_alloc = false):¶
%r = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>,
                                      affine_map<(d0) -> ()>],
       iterator_types = ["reduction"]}
       ins(%in : tensor<32xf32>) outs(%out : tensor<f32>) {
  ^bb0(%arg1: f32, %arg2: f32):
    %y = arith.addf %arg1, %arg2 : f32
    linalg.yield %y : f32
} -> tensor<f32>
is split into:
%cst = arith.constant 0.000000e+00 : f32
%0 = tensor.expand_shape %in [[0, 1]] : tensor<32xf32> into tensor<4x8xf32>
%1 = tensor.empty() : tensor<4xf32>
%2 = linalg.fill ins(%cst : f32) outs(%1 : tensor<4xf32>) -> tensor<4xf32>
%3 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
                                      affine_map<(d0, d1) -> (d0)>],
       iterator_types = ["parallel", "reduction"]}
       ins(%0 : tensor<4x8xf32>) outs(%2 : tensor<4xf32>) {
  ^bb0(%arg3: f32, %arg5: f32):
    %5 = arith.addf %arg3, %arg5 : f32
    linalg.yield %5 : f32
} -> tensor<4xf32>
%r = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>,
                                      affine_map<(d0) -> ()>],
       iterator_types = ["reduction"]}
       ins(%3 : tensor<4xf32>) outs(%out : tensor<f32>) {
  ^bb0(%arg3: f32, %arg4: f32):
    %5 = arith.addf %arg3, %arg4 : f32
    linalg.yield %5 : f32
} -> tensor<f32>
Example (use_scaling_algorithm = true, use_alloc = true):¶
Instead of introducing an ExpandShapeOp, this scaling-based implementation rewrites a reduction dimension k into k * split_factor + kk. The dimension kk is added as an extra parallel dimension to the intermediate output tensor at position insert_split_dimension.
Consider a minimal example where k is reduced:
O(i, j) += I(i, j, k)
Assume i=3, j=5, k=128, split_factor=16 and insert_split_dimension=0. The compute is rewritten as:
a. O_i(kk, i, j) += I(i, j, 16 * k + kk)
b. O(i, j) += O_i(kk, i, j)
The intermediate tensor O_i is of shape (128/16)x3x5 == 8x3x5.
Example:¶
%0 = linalg.matmul ins(%A, %B: tensor<16x256xf32>, tensor<256x32xf32>) outs(%C: tensor<16x32xf32>) -> tensor<16x32xf32>
is transformed to:
#map0 = affine_map<(d0, d1, d2, d3) -> (d0, d2 * 4 + d3)>
#map1 = affine_map<(d0, d1, d2, d3) -> (d2 * 4 + d3, d1)>
#map2 = affine_map<(d0, d1, d2, d3) -> (d2, d3)>
#map3 = affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>
#map4 = affine_map<(d0, d1, d2) -> (d0, d1, d2)>
#map5 = affine_map<(d0, d1, d2) -> (d0, d1)>
%0 = tensor.empty() : tensor<16x32x64xf32>
%cst = arith.constant 0.000000e+00 : f32
%1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<16x32x64xf32>)
       -> tensor<16x32x64xf32>
%2 = tensor.empty() : tensor<64x4xi1>
%3 = linalg.generic {indexing_maps = [#map0, #map1, #map2, #map3],
       iterator_types = ["parallel", "parallel", "parallel", "reduction"]}
       ins(%A, %B, %2 : tensor<16x256xf32>, tensor<256x32xf32>, tensor<64x4xi1>)
       outs(%1 : tensor<16x32x64xf32>) {
  ^bb0(%arg3: f32, %arg4: f32, %arg5: i1, %arg6: f32):
    %5 = arith.mulf %arg3, %arg4 : f32
    %6 = arith.addf %arg6, %5 : f32
    linalg.yield %6 : f32
} -> tensor<16x32x64xf32>
%4 = linalg.generic {indexing_maps = [#map4, #map5],
       iterator_types = ["parallel", "parallel", "reduction"]}
       ins(%3 : tensor<16x32x64xf32>) outs(%C : tensor<16x32xf32>) {
  ^bb0(%arg3: f32, %arg4: f32):
    %5 = arith.addf %arg3, %arg4 : f32
    linalg.yield %5 : f32
} -> tensor<16x32xf32>
return %4 : tensor<16x32xf32>
- OPERATION_NAME = 'transform.structured.split_reduction'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- split_factor() _ods_ir¶
- insert_split_dimension() _ods_ir¶
- inner_parallel() bool¶
- use_scaling_algorithm() bool¶
- use_alloc() bool¶
- init_or_alloc_op() _ods_ir¶
- fill_op() _ods_ir¶
- split_linalg_op() _ods_ir¶
- combining_linalg_op() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_split_reduction(init_or_alloc_op, fill_op, split_linalg_op, combining_linalg_op, target, *, split_factor=None, insert_split_dimension=None, inner_parallel=None, use_scaling_algorithm=None, use_alloc=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.TileReductionUsingForOp(fill_op, split_op, combining_op, for_op, target, *, reduction_dims=None, tile_sizes=None, loc=None, ip=None)¶
Bases: _ods_ir
Indicates that the given target op should be transformed with the tileReduction transformation, with the tile size provided as an attribute.
This transformation tiles the target along the reduction dimensions. It creates a tensor initialized with the identity value. Then it creates nested loops with a parallel version of the target op inside. The parallel op dimensions are less than or equal to the tile size passed by the user. After the loop, a merge operation is created to do a final reduction with the partial reductions. The initial tensor always uses the tile size dimension. This may overallocate if the tile size is greater than the reduction dimension.
Return modes¶
Returns 4 handles associated with (in order):
the fill op used to initialize the neutral element,
the parallel tiled op,
the result-combining op, and
the parent for op.
The reduction_dims attribute can be used to specify the subset of reduction dimensions of the operation to tile. If left unspecified, all reduction dimensions are tiled.
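A minimal transform-IR sketch of the invocation (the %red handle and tile sizes are illustrative assumptions):

// Tile the reduction dimension by 5; returns the fill op, the partial
// (parallel) op, the combining op, and the enclosing scf.for loop.
%fill, %partial, %combine, %loop =
    transform.structured.tile_reduction_using_for %red by tile_sizes = [0, 5]
    : (!transform.any_op) -> (!transform.any_op, !transform.any_op,
                              !transform.any_op, !transform.any_op)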
Example:¶
%red = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
                                        affine_map<(d0, d1) -> (d0)>],
         iterator_types = ["parallel", "reduction"]}
         ins(%arg0 : tensor<?x?xf32>) outs(%out : tensor<?xf32>) {
  ^bb0(%arg7: f32, %arg9: f32):
    %1 = arith.addf %arg7, %arg9 : f32
    linalg.yield %1 : f32
} -> tensor<?xf32>
return %red : tensor<?xf32>

is transformed into:
%0 = tensor.empty(%dim_1) : tensor<?x5xf32>
%1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<?x5xf32>) -> tensor<?x5xf32>
%2 = scf.for %arg2 = %c0 to %dim_0 step %c5 iter_args(%arg3 = %1) -> (tensor<?x5xf32>) {
  %extracted_slice = tensor.extract_slice %1[0, 0] [%dim, 5] [1, 1]
      : tensor<?x5xf32> to tensor<?x5xf32>
  %extracted_slice_2 = tensor.extract_slice %arg0[0, %arg2] [%dim, 5] [1, 1]
      : tensor<?x?xf32> to tensor<?x5xf32>
  %4 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
                                        affine_map<(d0, d1) -> (d0, d1)>],
         iterator_types = ["parallel", "parallel"]}
         ins(%extracted_slice_2 : tensor<?x5xf32>)
         outs(%extracted_slice : tensor<?x5xf32>) {
    ^bb0(%in: f32, %out: f32):
      %5 = arith.addf %in, %out : f32
      linalg.yield %5 : f32
  } -> tensor<?x5xf32>
  %dim_3 = tensor.dim %1, %c0 : tensor<?x5xf32>
  %inserted_slice = tensor.insert_slice %4 into %arg3[0, 0] [%dim_3, 5] [1, 1]
      : tensor<?x5xf32> into tensor<?x5xf32>
  scf.yield %inserted_slice : tensor<?x5xf32>
}
%3 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
                                      affine_map<(d0, d1) -> (d0)>],
       iterator_types = ["parallel", "reduction"]}
       ins(%2 : tensor<?x5xf32>) outs(%arg1 : tensor<?xf32>) {
  ^bb0(%in: f32, %out: f32):
    %4 = arith.addf %in, %out : f32
    linalg.yield %4 : f32
} -> tensor<?xf32>
- OPERATION_NAME = 'transform.structured.tile_reduction_using_for'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- reduction_dims() _ods_ir¶
- tile_sizes() _ods_ir¶
- fill_op() _ods_ir¶
- split_op() _ods_ir¶
- combining_op() _ods_ir¶
- for_op() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_tile_reduction_using_for(fill_op, split_op, combining_op, for_op, target, *, reduction_dims=None, tile_sizes=None, loc=None, ip=None) _ods_ir | _ods_ir | TileReductionUsingForOp¶
- class mlir.dialects._structured_transform_ops_gen.TileReductionUsingForallOp(fill_op, split_op, combining_op, forall_op, target, *, reduction_dims=None, num_threads=None, tile_sizes=None, mapping=None, loc=None, ip=None)¶
Bases: _ods_ir
Tile a PartialReductionOpInterface op to a tiled scf.forall doing partial reduction.
This transformation tiles the target along the reduction dimensions. It creates a tensor initialized with the identity value. Then it creates an scf.forall loop with the number of threads given by num_threads. The op is tiled with a size equal to floordiv(size, num_threads). All the partial reduction values are inserted in parallel to create a new tensor. After the loop, a merge operation is created to do a final reduction with the partial-reductions tensor. If an extra tile_sizes parameter is passed, the tiles are cyclically distributed on the threads of the scf.forall loop.
Return modes¶
Returns 4 handles associated with (in order):
the fill op used to initialize the neutral element,
the parallel tiled op,
the result-combining op, and
the parent forall op.
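A minimal transform-IR sketch of the invocation (the %red handle and thread counts are illustrative assumptions):

// Distribute the reduction over 5 threads of an scf.forall; returns the
// fill op, the partial (parallel) op, the combining op, and the forall.
%fill, %partial, %combine, %forall =
    transform.structured.tile_reduction_using_forall %red
    by num_threads = [0, 5], tile_sizes = []
    : (!transform.any_op) -> (!transform.any_op, !transform.any_op,
                              !transform.any_op, !transform.any_op)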
Example:¶
%red = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>,
                                        affine_map<(d0, d1) -> (d0)>],
         iterator_types = ["parallel", "reduction"]}
         ins(%arg0 : tensor<?x?xf32>) outs(%out : tensor<?xf32>) {
  ^bb0(%arg7: f32, %arg9: f32):
    %1 = arith.addf %arg7, %arg9 : f32
    linalg.yield %1 : f32
} -> tensor<?xf32>
return %red : tensor<?xf32>

is transformed into:
%0 = tensor.empty(%dim_1) : tensor<?x5xf32>
%1 = linalg.fill ins(%cst : f32) outs(%0 : tensor<?x5xf32>) -> tensor<?x5xf32>
%2 = scf.forall (%arg2) in (%c5) shared_outs(%arg3 = %1) -> (tensor<?x5xf32>) {
  %4 = affine.min #map(%arg2)[%dim_0]
  %5 = affine.max #map1(%4)
  %extracted_slice = tensor.extract_slice %arg3[0, %arg2] [%dim, 1] [1, 1]
      : tensor<?x5xf32> to tensor<?xf32>
  %6 = affine.apply #map2(%arg2)[%dim_0]
  %extracted_slice_2 = tensor.extract_slice %arg0[0, %6] [%dim, %5] [1, 1]
      : tensor<?x?xf32> to tensor<?x?xf32>
  %extracted_slice_3 = tensor.extract_slice %extracted_slice[0] [%dim] [1]
      : tensor<?xf32> to tensor<?xf32>
  %7 = linalg.generic {indexing_maps = [#map3, #map4],
         iterator_types = ["parallel", "reduction"]}
         ins(%extracted_slice_2 : tensor<?x?xf32>)
         outs(%extracted_slice_3 : tensor<?xf32>) {
    ^bb0(%in: f32, %out: f32):
      %9 = arith.addf %in, %out : f32
      linalg.yield %9 : f32
  } -> tensor<?xf32>
  scf.forall.in_parallel {
    tensor.parallel_insert_slice %7 into %arg3[0, %arg2] [%dim, 1] [1, 1]
        : tensor<?xf32> into tensor<?x5xf32>
  }
} {mapping = []}
%3 = linalg.generic {indexing_maps = [#map3, #map4],
       iterator_types = ["parallel", "reduction"]}
       ins(%2 : tensor<?x5xf32>) outs(%arg1 : tensor<?xf32>) {
  ^bb0(%in: f32, %out: f32):
    %4 = arith.addf %in, %out : f32
    linalg.yield %4 : f32
} -> tensor<?xf32>
- OPERATION_NAME = 'transform.structured.tile_reduction_using_forall'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- reduction_dims() _ods_ir¶
- num_threads() _ods_ir¶
- tile_sizes() _ods_ir¶
- mapping() _ods_ir | None¶
- fill_op() _ods_ir¶
- split_op() _ods_ir¶
- combining_op() _ods_ir¶
- forall_op() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_tile_reduction_using_forall(fill_op, split_op, combining_op, forall_op, target, *, reduction_dims=None, num_threads=None, tile_sizes=None, mapping=None, loc=None, ip=None) _ods_ir | _ods_ir | TileReductionUsingForallOp¶
- class mlir.dialects._structured_transform_ops_gen.TileUsingForOp(tiled_linalg_op, loops, target, dynamic_sizes, *, static_sizes=None, interchange=None, scalable_sizes=None, loc=None, ip=None)¶
Bases: _ods_ir
Indicates that the given target op should be tiled with the given sizes. This transform generates a loop nest with a smaller (“tiled”) target operation in its body. Currently limited to LinalgOps.
Tile sizes may be known at transformation time, in which case they are expected to be provided in the static_sizes attribute, or not, in which case the tile value must be computed by the payload IR and the handle to the operation computing it must be provided through dynamic_sizes. When the sizes are not known statically, the corresponding entry in the static_sizes attribute must be set to ShapedType::kDynamic. Only the dynamic sizes must be provided in dynamic_sizes, i.e., there should be as many handles as ShapedType::kDynamic values in the static_sizes attribute. A static size of 0 indicates that the dimension should not be tiled; no loop will be generated for such dimensions. If all tile sizes are 0, this transform is effectively a no-op.
This op returns handles to the tiled op (in the generated loop nest) and the generated loops. The number of loops is the number of tile sizes that are statically known to be non-zero.
Return modes¶
On success, the resulting handles are associated with co-indexed lists of tiled operations and loops around them.
This operation only supports Linalg ops and produces a silenceable failure if the input contains any non-Linalg ops. The ops preceding it in the list associated with the target handle will have been tiled.
This operation produces a silenceable failure if the dynamic_sizes handles are associated with lists of payload operations of a size different than that of the list associated with the target handle.
If the internal implementation of tiling for any of the operations fails, this operation produces a definite failure.
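A minimal transform-IR sketch (the %matmul handle and tile sizes are illustrative assumptions):

// Tile the first two dimensions by 8 and 16; dimension 2 is not tiled,
// so one loop handle is returned per non-zero tile size.
%tiled, %loop0, %loop1 = transform.structured.tile_using_for %matmul
    tile_sizes [8, 16, 0]
    : (!transform.any_op) -> (!transform.any_op, !transform.any_op,
                              !transform.any_op)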
- OPERATION_NAME = 'transform.structured.tile_using_for'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- dynamic_sizes() _ods_ir¶
- static_sizes() _ods_ir | None¶
- interchange() _ods_ir | None¶
- scalable_sizes() _ods_ir | None¶
- tiled_linalg_op() _ods_ir¶
- loops() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_tile_using_for(tiled_linalg_op, loops, target, dynamic_sizes, *, static_sizes=None, interchange=None, scalable_sizes=None, loc=None, ip=None) _ods_ir | _ods_ir | TileUsingForOp¶
- class mlir.dialects._structured_transform_ops_gen.TileUsingForallOp(tiled_op, forall_op, target, num_threads, tile_sizes, *, packed_num_threads=None, packed_tile_sizes=None, static_num_threads=None, static_tile_sizes=None, mapping=None, loc=None, ip=None)¶
Bases: _ods_ir
Tile a TilingInterface op to a tiled scf.forall.
Tiling is applied by either specifying num_threads or tile_sizes. If num_threads is specified, then the tile size for each dimension i is calculated dynamically via ceilDiv(dimSize[i], num_threads[i]). num_threads and tile_sizes can be either static index attributes or operation handles (or a mix thereof). Operation handles must be mapped to exactly one op that has exactly one result of index type.
Static zero tile sizes indicate that the dimension is not tiled and can be thought of as tiling by the full size of data.
It is the user’s responsibility to ensure that num_threads/tile_sizes is a valid tiling specification (i.e. that it only tiles parallel dimensions, e.g. in the Linalg case). If the dimension is not parallelizable, a warning is issued to notify the user that the generated code is not safe to parallelize.
If non-empty, the mapping is added as an attribute to the resulting scf.forall.
Note: tile_sizes and num_threads are variadic. Each tile size/number of threads can be an index attribute or a transform handle that is mapped to exactly one payload op with exactly one index result.
This operation ignores ops that do not implement the TilingInterface and drops them in the return.
If all the operations referred to by the target handle tile successfully, the transform succeeds. Otherwise the transform produces a silenceable failure.
The two returned handles point to only the subset of successfully produced tiled operations, which can all be empty.
These two returned handles point to:
the tiled op that implements TilingInterface,
the new scf.forall op.
Example using num_threads¶
%0 = transform.structured.match ops{["linalg.matmul"]} in %arg1
     : (!transform.any_op) -> !transform.any_op
%3:2 = transform.structured.tile_using_forall %0 num_threads [10, 20]
     : (!transform.any_op) -> (!transform.any_op, !transform.any_op)

Example using tile_sizes¶
%0 = transform.structured.match ops{["linalg.matmul"]} in %arg1
     : (!transform.any_op) -> !transform.any_op
%sz = transform.structured.match ...
%3:2 = transform.structured.tile_using_forall %0 tile_sizes [0, %sz, 20]
     : (!transform.any_op, !transform.any_op) -> (!transform.any_op, !transform.any_op)
- OPERATION_NAME = 'transform.structured.tile_using_forall'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- num_threads() _ods_ir¶
- tile_sizes() _ods_ir¶
- packed_num_threads() _ods_ir | None¶
- packed_tile_sizes() _ods_ir | None¶
- static_num_threads() _ods_ir | None¶
- static_tile_sizes() _ods_ir | None¶
- mapping() _ods_ir | None¶
- tiled_op() _ods_ir¶
- forall_op() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_tile_using_forall(tiled_op, forall_op, target, num_threads, tile_sizes, *, packed_num_threads=None, packed_tile_sizes=None, static_num_threads=None, static_tile_sizes=None, mapping=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.TransposeConv2DOp(transformed, target, *, loc=None, ip=None)¶
Bases: _ods_ir
Convert linalg.conv_2d_nhwc_fhwc into linalg.conv_2d_nhwc_hwcf by introducing a linalg.transpose on the filter tensor/memref.
While the fhwc filter channel ordering can be desirable for certain targets, and is a more direct mapping to higher-level dialects such as TOSA (which only supports this ordering), hwcf is better suited for transformations such as img2col, which can make use of optimized BLAS routines such as GEMM.
Returns one handle:
The final operation of the sequence that replaces the original convolution.
Return modes:¶
Returns a definite failure if target is not isolated from above. Returns a silenceable failure if the pattern application failed.
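A minimal transform-IR sketch (the %conv handle is an illustrative assumption):

// Transpose the filter of a matched linalg.conv_2d_nhwc_fhwc so the
// convolution becomes linalg.conv_2d_nhwc_hwcf.
%hwcf = transform.structured.transpose_conv2d %conv
    : (!transform.any_op) -> !transform.any_op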
- OPERATION_NAME = 'transform.structured.transpose_conv2d'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- transformed() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_transpose_conv2d(transformed, target, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.TransposeMatmulOp(transformed, target, *, inputToTranspose=None, loc=None, ip=None)¶
Bases: _ods_ir
Convert Linalg matmul ops to transposed variants.
By default the LHS matrix is transposed. Specify <rhs> to instead transpose the RHS matrix, as in the sketch below.
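A minimal transform-IR sketch (the %mm handle is an illustrative assumption):

// Rewrite matched matmuls to operate on a transposed RHS.
%t = transform.structured.transpose_matmul %mm <rhs>
    : (!transform.any_op) -> !transform.any_op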
Return modes:¶
This operation fails if
target is unsupported, i.e., not a linalg.matmul or linalg.batch_matmul. Otherwise, the operation succeeds and returns a handle to the transposed matmul op.
- OPERATION_NAME = 'transform.structured.transpose_matmul'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- inputToTranspose() _ods_ir¶
- transformed() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_transpose_matmul(transformed, target, *, input_to_transpose=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.VectorizeChildrenAndApplyPatternsOp(transformed, target, *, fold_type_extensions_into_contract=None, vectorize_padding=None, vectorize_nd_extract=None, flatten_1d_depthwise_conv=None, disable_multi_reduction_to_contract_patterns=None, disable_transfer_permutation_map_lowering_patterns=None, loc=None, ip=None)¶
Bases: _ods_ir
Vectorizes all children contained in the given target using the configuration specified by the attributes of this op. This only vectorizes structured ops that operate on shaped types and does not vectorize loops or straight-line code. Internally, it applies a set of rewrite patterns, some of which enable vectorization and some of which clean up the results. Therefore, it can only be applied to an op with the “isolated from above” property. This transformation only fails if the entire pattern rewriting failed, i.e., it does not fail when no ops were vectorized.
Finer granularity can be achieved either with the VectorizeOp for individual ops or by outlining the target part of the payload IR into, e.g., a function, performing this transformation, and inlining it back.
Note that this transformation invalidates the handles to any payload IR operation that is contained inside the vectorization target.
This transformation supports the following attributes:
fold_type_extensions_into_contract: a UnitAttr to enable the folding of type extension operations into vector.contract to create a mixed-precision operation.
vectorize_padding: a UnitAttr to activate the vectorization of tensor.pad ops. Different pipelines may prefer to lower such ops to loops.
disable_multi_reduction_to_contract_patterns: a UnitAttr to deactivate the rewrite of vector.multi_reduction to vector.contract. This is intended to be used in tests only.
disable_transfer_permutation_map_lowering_patterns: a UnitAttr to deactivate the rewrite of vector.transfer with permutation maps into explicit vector.transpose operations. This is intended to be used in tests only but may be promoted to a first-class attribute in the future.
This operation produces a definite failure if vectorization fails for any reason. The operation always returns the handle to the target op that is expected to be isolated from above.
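A minimal transform-IR sketch (the %func handle is an illustrative assumption):

// Vectorize all structured ops inside the matched function, including
// tensor.pad ops.
%vectorized = transform.structured.vectorize_children_and_apply_patterns
    %func { vectorize_padding }
    : (!transform.any_op) -> !transform.any_op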
- OPERATION_NAME = 'transform.structured.vectorize_children_and_apply_patterns'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- fold_type_extensions_into_contract() bool¶
- vectorize_padding() bool¶
- vectorize_nd_extract() bool¶
- flatten_1d_depthwise_conv() bool¶
- disable_multi_reduction_to_contract_patterns() bool¶
- disable_transfer_permutation_map_lowering_patterns() bool¶
- transformed() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_vectorize_children_and_apply_patterns(transformed, target, *, fold_type_extensions_into_contract=None, vectorize_padding=None, vectorize_nd_extract=None, flatten_1d_depthwise_conv=None, disable_multi_reduction_to_contract_patterns=None, disable_transfer_permutation_map_lowering_patterns=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._structured_transform_ops_gen.VectorizeOp(target, vector_sizes, *, static_vector_sizes=None, vectorize_nd_extract=None, assume_dynamic_dims_match_vec_sizes=None, create_named_contraction=None, scalable_sizes=None, loc=None, ip=None)¶
Bases: _ods_ir
Vectorize the target ops, which must be Linalg ops.
Use the optional vector sizes to specify exactly what configuration the vectorizer should use. It will then use masked vectors of the specified size to enforce this configuration (“masked vectorization”). If no vector sizes are specified, the vectorizer will infer the shapes to use from the target Linalg ops (“regular vectorization”). More specifically:
// Masked vectorization - vector sizes are specified explicitly.
transform.structured.vectorize %target vector_sizes [1, 4] : !transform.any_op
// Regular vectorization - vector sizes are inferred from the target op.
transform.structured.vectorize %target : !transform.any_op
The vector sizes can be either static or dynamic (SSA values). In case of SSA values, the handle must be mapped to exactly one payload op with exactly one index-typed result.
Note: the input vector sizes must be greater than or equal to their counterpart iteration space sizes.
Typically, this operation should be applied to Linalg operations that have already been tiled to the appropriate sizes.
Return modes:¶
This operation produces a silenceable failure if at least one target op is not a Linalg op or fails to vectorize. It produces a definite failure if the dynamic vector sizes (SSA values) do not satisfy the constraints mentioned above.
- OPERATION_NAME = 'transform.structured.vectorize'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- vector_sizes() _ods_ir¶
- static_vector_sizes() _ods_ir | None¶
- vectorize_nd_extract() bool¶
- assume_dynamic_dims_match_vec_sizes() bool¶
- create_named_contraction() bool¶
- scalable_sizes() _ods_ir | None¶
- mlir.dialects._structured_transform_ops_gen.structured_vectorize(target, vector_sizes, *, static_vector_sizes=None, vectorize_nd_extract=None, assume_dynamic_dims_match_vec_sizes=None, create_named_contraction=None, scalable_sizes=None, loc=None, ip=None) VectorizeOp¶
- class mlir.dialects._structured_transform_ops_gen.WinogradConv2DOp(transformed, target, fmr, *, loc=None, ip=None)¶
Bases: _ods_ir
The Winograd Conv2D algorithm will convert a linalg Conv2D operation into a batched matrix multiply. Before the matrix multiply, it will convert the filter and input into a format suitable for batched matrix multiply. After the matrix multiply, it will convert the output to the final result tensor.
The algorithm F(m x m, r x r) is
Y = A^T x [(G x g x G^T) @ (B^T x d x B)] x A
The size of output Y is m x m. The size of filter g is r x r. The size of input d is (m + r - 1) x (m + r - 1). A^T, A, G^T, G, B^T, and B are transformation matrices.
Return modes:¶
This operation produces a silenceable failure if
target is unsupported. Otherwise, the operation succeeds and returns a handle to the sequence that replaces the original convolution.
- OPERATION_NAME = 'transform.structured.winograd_conv2d'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- fmr() _ods_ir¶
- transformed() _ods_ir¶
- mlir.dialects._structured_transform_ops_gen.structured_winograd_conv2d(transformed, target, fmr, *, loc=None, ip=None) _ods_ir¶