MLIR

Multi-Level IR Compiler Framework

Dialect 'vector' definition

Operation definition

vector.broadcast (vector::BroadcastOp)

broadcast operation

Description:

Broadcasts the scalar or k-D vector value in the source operand to a n-D result vector such that the broadcast makes sense, i.e., the source operand is duplicated to match the given rank and sizes in the result vector. The legality rules are:

  • the source operand must have the same element type as the result type
  • a k-D vector <s_1 x .. x s_k x type> can be broadcast to a n-D vector <t_1 x .. x t_n x type> if
    • k <= n, and
    • the sizes in the trailing dimensions n-k < i <= n with j=i+k-n match exactly as s_j = t_i or s_j = 1:
        t_1 x   ..  t_n-k x t_n-k+1 x .. x t_i x .. x t_n
                            s_1     x .. x s_j x .. x s_k
            <duplication>         <potential stretch>
    

The source operand is duplicated over all the missing leading dimensions and stretched over the trailing dimensions where the source has a non-equal dimension of 1. These rules imply that any scalar broadcast (k=0) to any shaped vector with the same element type is always legal.

Examples:

  %0 = constant 0.0 : f32
  %1 = vector.broadcast %0 : f32 to vector<16xf32>
  %2 = vector.broadcast %1 : vector<16xf32> to vector<4x16xf32>

Operands:

  1. source: any type

Attributes:

Results:

  1. vector: vector of any type values

vector.constant_mask (vector::ConstantMaskOp)

creates a constant vector mask

Description:

Creates and returns a vector mask where elements of the result vector are set to ‘0’ or ‘1’, based on whether the element indices are contained within a hyper-rectangular region specified by the ‘mask_dim_sizes’ array attribute argument. Each element of the ‘mask_dim_sizes’ array, specifies an exclusive upper bound [0, mask-dim-size-element-value) for a unique dimension in the vector result. The conjunction of the ranges define a hyper-rectangular region within which elements values are set to 1 (otherwise element values are set to 0).

Example: create a constant vector mask of size 4x3xi1 with elements in range 0 <= row <= 2 and 0 <= col <= 1 are set to 1 (others to 0).

%1 = vector.constant_mask [3, 2] : vector<4x3xi1>

print %1 columns 0 1 2 |———— 0 | 1 1 0 rows 1 | 1 1 0 2 | 1 1 0 3 | 0 0 0

Operands:

Attributes:

AttributeMLIR TypeDescription
mask_dim_sizesArrayAttr64-bit integer array attribute attribute

Results:

  1. «unnamed»: vector of 1-bit integer values

vector.contract (vector::ContractionOp)

vector contraction operation

Description:

Computes the sum of products of vector elements along contracting dimension pairs from 2 vectors of rank M and N respectively, adds this intermediate result to the accumulator argument of rank K, and returns a vector result of rank K (where K = num_lhs_free_dims + num_rhs_free_dims + num_batch_dims (see dimension type descriptions below)). For K = 0 (no free or batch dimensions), the accumulator and output are a scalar.

Optional vector mask arguments (produced by CreateMaskOp or ConstantMaskOp) specify the dynamic dimension sizes of valid data within the lhs/rhs vector arguments.

An iterator type attribute list must be specified, where each element of the list represents an iterator with one of the following types:

*) “reduction”: reduction dimensions are present in the lhs and rhs arguments but not in the output (and accumulator argument). These are the dimensions along which the vector contraction op computes the sum of products, and contracting dimension pair dimension sizes must match between lhs/rhs. *) “parallel”: Batch dimensions are iterator type “parallel”, and are non-contracting dimensions present in the lhs, rhs and output. The lhs/rhs co-iterate along the batch dimensions, which should be expressed in their indexing maps.

           Free dimensions are iterator type "parallel", and are
           non-contraction, non-batch dimensions accessed by either the
           lhs or rhs (but not both). The lhs and rhs free dimensions
           are unrelated to each other and do not co-iterate, which
           should be expressed in their indexing maps.

An indexing map attribute list must be specified with an entry for lhs, rhs and acc arguments. An indexing map attribute specifies a mapping from each iterator in the iterator type list, to each dimension of an N-D vector.

Examples:

// Simple dot product (K = 0). #contraction_accesses = [ affine_map<(i) -> (i)>, affine_map<(i) -> (i)>, affine_map<(i) -> ()> ] #contraction_trait = { indexing_maps = #contraction_accesses, iterator_types = [“reduction”] } %3 = vector.contract #contraction_trait %0, %1, %2 : vector<10xf32>, vector<10xf32> into f32

// 2D vector contraction with one contracting dimension (matmul, K = 2). #contraction_accesses = [ affine_map<(i, j, k) -> (i, k)>, affine_map<(i, j, k) -> (k, j)>, affine_map<(i, j, k) -> (i, j)> ] #contraction_trait = { indexing_maps = #contraction_accesses, iterator_types = [“parallel”, “parallel”, “reduction”] }

%3 = vector.contract #contraction_trait %0, %1, %2 : vector<4x3xf32>, vector<3x7xf32> into vector<4x7xf32>

// 4D to 3D vector contraction with two contracting dimensions and // one batch dimension (K = 3). #contraction_accesses = [ affine_map<(b0, f0, f1, c0, c1) -> (c0, b0, c1, f0)>, affine_map<(b0, f0, f1, c0, c1) -> (b0, c1, c0, f1)>, affine_map<(b0, f0, f1, c0, c1) -> (b0, f0, f1)> ] #contraction_trait = { indexing_maps = #contraction_accesses, iterator_types = [“parallel”, “parallel”, “parallel”, “reduction”, “reduction”] }

%4 = vector.contract #contraction_trait %0, %1, %2 : vector<7x8x16x15xf32>, vector<8x16x7x5xf32> into vector<8x15x5xf32>

// 4D vector contraction with two contracting dimensions and optional // vector mask arguments. %lhs_mask = vector.constant_mask [7, 8, 16, 15] : vector<7x8x16x15xi1> %rhs_mask = vector.constant_mask [8, 16, 7, 5] : vector<8x16x7x5xi1>

%5 = vector.contract #contraction_trait %0, %1, %2, %lhs_mask, %rhs_mask : vector<7x8x16x15xf32>, vector<8x16x7x5xf32> into vector<8x15x8x5xf32>

Operands:

  1. lhs: vector of any type values
  2. rhs: vector of any type values
  3. acc: any type
  4. masks: vector of 1-bit integer values

Attributes:

AttributeMLIR TypeDescription
indexing_mapsArrayAttrAffineMap array attribute attribute
iterator_typesArrayAttrarray attribute attribute

Results:

  1. «unnamed»: any type

vector.create_mask (vector::CreateMaskOp)

creates a vector mask

Description:

Creates and returns a vector mask where elements of the result vector are set to ‘0’ or ‘1’, based on whether the element indices are contained within a hyper-rectangular region specified by the operands. Specifically, each operand specifies a range [0, operand-value) for a unique dimension in the vector result. The conjunction of the operand ranges define a hyper-rectangular region within which elements values are set to 1 (otherwise element values are set to 0).

Example: create a vector mask of size 4x3xi1 where elements in range 0 <= row <= 2 and 0 <= col <= 1 are set to 1 (others to 0).

%1 = vector.create_mask %c3, %c2 : vector<4x3xi1>

print %1 columns 0 1 2 |———— 0 | 1 1 0 rows 1 | 1 1 0 2 | 1 1 0 3 | 0 0 0

Operands:

  1. operands: index

Attributes:

Results:

  1. «unnamed»: vector of 1-bit integer values

vector.extractelement (vector::ExtractElementOp)

extractelement operation

Description:

Takes an 1-D vector and a dynamic index position and extracts the scalar at that position. Note that this instruction resembles vector.extract, but is restricted to 1-D vectors and relaxed to dynamic indices. It is meant to be closer to LLVM’s version: https://llvm.org/docs/LangRef.html#extractelement-instruction

Example:

  %c = constant 15 : i32
  %1 = vector.extractelement %0[%c : i32]: vector<16xf32>

Operands:

  1. vector: vector of any type values
  2. position: integer

Attributes:

Results:

  1. result: any type

vector.extract (vector::ExtractOp)

extract operation

Description:

Takes an n-D vector and a k-D position and extracts the (n-k)-D vector at the proper position. Degenerates to an element type in the 0-D case.

Examples:

  %1 = vector.extract %0[3]: vector<4x8x16xf32>
  %2 = vector.extract %0[3, 3, 3]: vector<4x8x16xf32>

Operands:

  1. vector: vector of any type values

Attributes:

AttributeMLIR TypeDescription
positionArrayAttr64-bit integer array attribute attribute

Results:

  1. «unnamed»: any type

vector.extract_slices (vector::ExtractSlicesOp)

vector extract slices operation

Description:

Takes an N-d vector and returns a tuple of vector slices of ‘vector’, based on ‘sizes’ and ‘strides’ parameters.

The arguments ‘sizes’ and ‘strides’ represent a specification for generating the unrolling of ‘vector’ shape, which has all slices of shape ‘sizes’ except for slices at dimension boundaries when ‘vector’ dimension sizes are not a multiple of ‘sizes’.

Each slice is returned at the tuple element index corresponding to the linear index of the slice w.r.t the unrolling scheme represented by ‘sizes’. Currently, only unit strides are supported.

Examples:

  %0 = vector.transfer_read ...: vector<4x2xf32>

  %1 = vector.extract_slices %0, [2, 2], [1, 1]
    : vector<4x2xf32> into tuple<vector<2x2xf32>, vector<2x2xf32>>

  // Example with partial slices at dimension boundaries.
  %2 = vector.transfer_read ...: vector<4x3xf32>

  %3 = vector.extract_slices %2, [2, 2], [1, 1]
    : vector<4x3xf32> into tuple<vector<2x2xf32>, vector<2x1xf32>,
                                 vector<2x2xf32>, vector<2x1xf32>>

Operands:

  1. vector: vector of any type values

Attributes:

AttributeMLIR TypeDescription
sizesArrayAttr64-bit integer array attribute attribute
stridesArrayAttr64-bit integer array attribute attribute

Results:

  1. «unnamed»: tuple with any combination of vector of any type values values

vector.fma (vector::FMAOp)

vector fused multiply-add

Description:

Multiply-add expressions operate on n-D vectors and compute a fused pointwise multiply-and-accumulate: $result = $lhs * $rhs + $acc. All operands and result have the same vector type. The semantics of the operation correspond to those of the llvm.fma[intrinsic](https://llvm.org/docs/LangRef.html#int-fma). In the particular case of lowering to LLVM, this is guaranteed to lower to the llvm.fma.*` intrinsic.

Example:

  %3 = vector.fma %0, %1, %2: vector<8x16xf32>

Operands:

  1. lhs: vector of any type values
  2. rhs: vector of any type values
  3. acc: vector of any type values

Attributes:

Results:

  1. result: vector of any type values

vector.insertelement (vector::InsertElementOp)

insertelement operation

Description:

Takes a scalar source, an 1-D destination vector and a dynamic index position and inserts the source into the destination at the proper position. Note that this instruction resembles vector.insert, but is restricted to 1-D vectors and relaxed to dynamic indices. It is meant to be closer to LLVM’s version: https://llvm.org/docs/LangRef.html#insertelement-instruction

Example:

  %c = constant 15 : i32
  %f = constant 0.0f : f32
  %1 = vector.insertelement %f, %0[%c : i32]: vector<16xf32>

Operands:

  1. source: any type
  2. dest: vector of any type values
  3. position: integer

Attributes:

Results:

  1. result: vector of any type values

vector.insert (vector::InsertOp)

insert operation

Description:

Takes an n-D source vector, an (n+k)-D destination vector and a k-D position and inserts the n-D source into the (n+k)-D destination at the proper position. Degenerates to a scalar source type when n = 0.

Examples:

  %2 = vector.insert %0, %1[3]:
    vector<8x16xf32> into vector<4x8x16xf32>
  %5 = vector.insert %3, %4[3, 3, 3]:
    f32 into vector<4x8x16xf32>

Operands:

  1. source: any type
  2. dest: vector of any type values

Attributes:

AttributeMLIR TypeDescription
positionArrayAttr64-bit integer array attribute attribute

Results:

  1. res: vector of any type values

vector.insert_slices (vector::InsertSlicesOp)

vector insert slices operation

Description:

Takes a tuple of vector slices and inserts them into the vector result according to the ‘sizes’ and ‘strides’ parameters.

The arguments ‘sizes’ and ‘strides’ represent a specification for generating the unrolling of ‘vector’ shape, which has all slices of shape ‘sizes’ except for slices at dimension boundaries when ‘vector’ dimension sizes are not a multiple of ‘sizes’.

Each slice in ‘vectors’ is at the tuple element index corresponding to the linear index of the slice w.r.t the unrolling scheme represented by ‘sizes’. Currently, only unit strides are supported.

Examples:

  %0 = vector.extract_slices %0, [2, 2], [1, 1]
    : vector<4x2xf32> into tuple<vector<2x2xf32>, vector<2x2xf32>>

  %1 = vector.insert_slices %0, [2, 2], [1, 1]
    : tuple<vector<2x2xf32>, vector<2x2xf32>> into vector<4x2xf32>

  // Example with partial slices at dimension boundaries.
  %3 = vector.extract_slices %2, [2, 2], [1, 1]
    : vector<4x3xf32> into tuple<vector<2x2xf32>, vector<2x1xf32>,
                                 vector<2x2xf32>, vector<2x1xf32>>

  %4 = vector.insert_slices %3, [2, 2], [1, 1]
    : tuple<vector<2x2xf32>, vector<2x1xf32>,
            vector<2x2xf32>, vector<2x1xf32>> into vector<4x3xf32>

Operands:

  1. vectors: tuple with any combination of vector of any type values values

Attributes:

AttributeMLIR TypeDescription
sizesArrayAttr64-bit integer array attribute attribute
stridesArrayAttr64-bit integer array attribute attribute

Results:

  1. «unnamed»: vector of any type values

vector.insert_strided_slice (vector::InsertStridedSliceOp)

strided_slice operation

Description:

Takes a k-D source vector, an n-D destination vector (n >= k), n-sized offsets integer array attribute, a k-sized strides integer array attribute and inserts the k-D source vector as a strided subvector at the proper offset into the n-D destination vector.

At the moment strides must contain only 1s.

Returns an n-D vector that is a copy of the n-D destination vector in which the last k-D dimensions contain the k-D source vector elements strided at the proper location as specified by the offsets.

Examples:

  %2 = vector.insert_strided_slice %0, %1
      {offsets = [0, 0, 2], strides = [1, 1]}:
    vector<2x4xf32> into vector<16x4x8xf32>

Operands:

  1. source: vector of any type values
  2. dest: vector of any type values

Attributes:

AttributeMLIR TypeDescription
offsetsArrayAttr64-bit integer array attribute attribute
stridesArrayAttr64-bit integer array attribute attribute

Results:

  1. res: vector of any type values

vector.outerproduct (vector::OuterProductOp)

vector outerproduct with optional fused add

Description:

Takes 2 1-D vectors and returns the 2-D vector containing the outer-product.

An optional extra 2-D vector argument may be specified in which case the operation returns the sum of the outer-product and the extra vector. In this multiply-accumulate scenario, the rounding mode is that obtained by guaranteeing that a fused-multiply add operation is emitted. When lowered to the LLVMIR dialect, this form emits llvm.intr.fma, which is guaranteed to lower to actual fma instructions on x86.

Examples:

  %2 = vector.outerproduct %0, %1: vector<4xf32>, vector<8xf32>
  return %2: vector<4x8xf32>

  %3 = vector.outerproduct %0, %1, %2:
    vector<4xf32>, vector<8xf32>, vector<4x8xf32>
  return %3: vector<4x8xf32>

Operands:

  1. lhs: vector of any type values
  2. rhs: vector of any type values
  3. acc: vector of any type values

Attributes:

Results:

  1. «unnamed»: vector of any type values

vector.print (vector::PrintOp)

print operation (for testing and debugging)

Description:

Prints the source vector (or scalar) to stdout in human readable format (for testing and debugging). No return value.

Examples:

  %0 = constant 0.0 : f32
  %1 = vector.broadcast %0 : f32 to vector<4xf32>
  vector.print %1 : vector<4xf32>

  when lowered to LLVM, the vector print is unrolled into
  elementary printing method calls that at runtime will yield

  ( 0.0, 0.0, 0.0, 0.0 )

  on stdout when linked with a small runtime support library,
  which only needs to provide a few printing methods (single
  value for all data types, opening/closing bracket, comma,
  newline).

Operands:

  1. source: any type

Attributes:

Results:

vector.reduction (vector::ReductionOp)

reduction operation

Description:

Reduces an 1-D vector “horizontally” into a scalar using the given operation (add/mul/min/max for int/fp and and/or/xor for int only). Note that these operations are restricted to 1-D vectors to remain close to the corresponding LLVM intrinsics:

http://llvm.org/docs/LangRef.html#experimental-vector-reduction-intrinsics

Examples:

  %1 = vector.reduction "add", %0 : vector<16xf32> into f32

  %3 = vector.reduction "xor", %2 : vector<4xi32> into i32

Operands:

  1. vector: vector of any type values

Attributes:

AttributeMLIR TypeDescription
kindStringAttrstring attribute attribute

Results:

  1. dest: any type

vector.reductionv2 (vector::ReductionV2Op)

reduction operation

Description:

As vector.reduction, but with a fused accumulator (add/mul for fp only).

Operands:

  1. vector: vector of 32-bit float or 64-bit float values
  2. acc: any type

Attributes:

AttributeMLIR TypeDescription
kindStringAttrstring attribute attribute

Results:

  1. dest: any type

vector.reshape (vector::ReshapeOp)

vector reshape operation

Description:

Reshapes its vector operand from ‘input_shape’ to ‘output_shape’ maintaining fixed vector dimension ‘fixed_vector_sizes’ on the innermost vector dimensions.

The parameters ‘input_shape’ and ‘output_shape’ represent valid data shapes across fixed vector shapes. For example, if a vector has a valid data shape [6] with fixed vector size [8], then the valid data elements are assumed to be stored at the beginning of the vector with the remaining vector elements undefined.

In the examples below, valid data elements are represented by an alphabetic character, and undefined data elements are represented by ‘-'.

Example

vector<1x8xf32> with valid data shape [6], fixed vector sizes [8]

        input: [a, b, c, d, e, f]

   layout map: (d0) -> (d0 floordiv 8, d0 mod 8)

vector layout: [a, b, c, d, e, f, -, -]

Example

vector<2x8xf32> with valid data shape [10], fixed vector sizes [8]

        input: [a, b, c, d, e, f, g, h, i, j]

   layout map: (d0) -> (d0 floordiv 8, d0 mod 8)

vector layout: [[a, b, c, d, e, f, g, h],
                [i, j, -, -, -, -, -, -]]

Example

vector<2x2x2x3xf32> with valid data shape [3, 5], fixed vector sizes [2, 3]

        input: [[a, b, c, d, e],
                [f, g, h, i, j],
                [k, l, m, n, o]]

   layout map: (d0, d1) -> (d0 floordiv 3, d1 floordiv 5,
                            d0 mod 3, d1 mod 5)

vector layout: [[[[a, b, c],
                  [f, g, h]]
                 [[d, e, -],
                  [i, j, -]]],
                [[[k, l, m],
                  [-, -, -]]
                 [[n, o, -],
                  [-, -, -]]]]

Example

%1 = vector.reshape %0, [%c3, %c6], [%c2, %c9], [4] : vector<3x2x4xf32> to vector<2x3x4xf32>

     input: [[a, b, c, d, e, f],
             [g, h, i, j, k, l],
             [m, n, o, p, q, r]]

layout map: (d0, d1) -> (d0, d1 floordiv 4, d1 mod 4)

Input vector: [[[a, b, c, d], [e, f, -, -]], [[g, h, i, j], [k, l, -, -]], [[m, n, o, p], [q, r, -, -]]]

Output vector: [[[a, b, c, d], [e, f, g, h], [i, -, -, -]], [[j, k, l, m], [n, o, p, q], [r, -, -, -]]]

Operands:

  1. vector: vector of any type values
  2. input_shape: index
  3. output_shape: index

Attributes:

AttributeMLIR TypeDescription
fixed_vector_sizesArrayAttr64-bit integer array attribute attribute
operand_segment_sizesDenseIntElementsAttr32-bit integer elements attribute attribute

Results:

  1. «unnamed»: vector of any type values

vector.shape_cast (vector::ShapeCastOp)

shape_cast casts between vector shapes

Description:

The shape_cast operation casts between an n-D source vector shape and a k-D result vector shape (the element type remains the same).

If reducing rank (n > k), result dimension sizes must be a product of contiguous source dimension sizes. If expanding rank (n < k), source dimensions must factor into a contiguous sequence of destination dimension sizes. Each source dim is expanded (or contiguous sequence of source dims combined) in source dimension list order (i.e. 0 <= i < n), to produce a contiguous sequence of result dims (or a single result dim), in result dimension list order (i.e. 0 <= j < k). The product of all source dimension sizes and all result dimension sizes must match.

If the source/result types are a tuple of vectors, the casting operation described above is applied to each source/result tuple element pair.

It is currently assumed that this operation does not require moving data, and that it will be canonicalized away before lowering vector operations.

Examples:

// Example casting to a lower vector rank.
%1 = vector.shape_cast %0 : vector<5x1x4x3xf32> to vector<20x3xf32>

// Example casting to a higher vector rank.
%3 = vector.shape_cast %2 : vector<10x12x8xf32> to vector<5x2x3x4x8xf32>

// Example casting a tuple of vectors of same rank, where tuple elements
// may have different shapes.
%5 = vector.shape_cast %4 : tuple<vector<3x4x2xf32>, vector<3x3x2xf32>> to
                            tuple<vector<12x2xf32>, vector<9x2xf32>>

Operands:

  1. source: vector of any type values or tuple with any combination of vector of any type values values

Attributes:

Results:

  1. result: vector of any type values or tuple with any combination of vector of any type values values

vector.shuffle (vector::ShuffleOp)

shuffle operation

Description:

The shuffle operation constructs a permutation (or duplication) of elements from two input vectors, returning a vector with the same element type as the input and a length that is the same as the shuffle mask. The two input vectors must have the same element type, rank, and trailing dimension sizes and shuffles their values in the leading dimension (which may differ in size) according to the given mask. The legality rules are:

  • the two operands must have the same element type as the result
  • the two operands and the result must have the same rank and trailing dimension sizes, viz. given two k-D operands v1 : <s_1 x s_2 x .. x s_k x type> and v2 : <t_1 x t_2 x .. x t_k x type> we have s_i = t_i for all 1 < i <= k
  • the mask length equals the leading dimension size of the result
  • numbering the input vector indices left to right across the operands, all mask values must be within range, viz. given two k-D operands v1 and v2 above, all mask values are in the range [0,s_1+t_1)

Examples:

%0 = vector.shuffle %a, %b[0, 3]
           : vector<2xf32>, vector<2xf32>       ; yields vector<2xf32>
%1 = vector.shuffle %c, %b[0, 1, 2]
           : vector<2x16xf32>, vector<1x16xf32> ; yields vector<3x16xf32>
%2 = vector.shuffle %a, %b[3, 2, 1, 0]
           : vector<2xf32>, vector<2xf32>       ; yields vector<4xf32>

Operands:

  1. v1: vector of any type values
  2. v2: vector of any type values

Attributes:

AttributeMLIR TypeDescription
maskArrayAttr64-bit integer array attribute attribute

Results:

  1. vector: vector of any type values

vector.strided_slice (vector::StridedSliceOp)

strided_slice operation

Description:

Takes an n-D vector, k-D offsets integer array attribute, a k-sized sizes integer array attribute, a k-sized strides integer array attribute and extracts the n-D subvector at the proper offset.

At the moment strides must contain only 1s. // TODO(ntv) support non-1 strides.

Returns an n-D vector where the first k-D dimensions match the sizes attribute. The returned subvector contains the elements starting at offset offsets and ending at offsets + sizes.

Examples:

  %1 = vector.strided_slice %0
      {offsets = [0, 2], sizes = [2, 4], strides = [1, 1]}:
    vector<4x8x16xf32> to vector<2x4x16xf32>

// TODO(ntv) Evolve to a range form syntax similar to: %1 = vector.strided_slice %0[0:2:1][2:4:1] vector<4x8x16xf32> to vector<2x4x16xf32>

Operands:

  1. vector: vector of any type values

Attributes:

AttributeMLIR TypeDescription
offsetsArrayAttr64-bit integer array attribute attribute
sizesArrayAttr64-bit integer array attribute attribute
stridesArrayAttr64-bit integer array attribute attribute

Results:

  1. «unnamed»: vector of any type values

vector.transfer_read (vector::TransferReadOp)

Reads a supervector from memory into an SSA vector value.

Description:

The vector.transfer_read op performs a blocking read from a slice within a MemRef supplied as its first operand into a vector of the same base elemental type.

A memref operand with vector element type, must have its vector element type match a suffix (shape and element type) of the vector (e.g. memref<3x2x6x4x3xf32>, vector<1x1x4x3xf32>).

The slice is further defined by a full-rank index within the MemRef, supplied as the operands 2 .. 1 + rank(memref). The permutation_map attribute is an affine-map which specifies the transposition on the slice to match the vector shape. The size of the slice is specified by the size of the vector, given as the return type. An ssa-value of the same elemental type as the MemRef is provided as the last operand to specify padding in the case of out-of-bounds accesses. This operation is called ‘read’ by opposition to ‘load’ because the super-vector granularity is generally not representable with a single hardware register. A vector.transfer_read is thus a mid-level abstraction that supports super-vectorization with non-effecting padding for full-tile-only code.

More precisely, let’s dive deeper into the permutation_map for the following MLIR:

vector.transfer_read %A[%expr1, %expr2, %expr3, %expr4]
  { permutation_map : (d0,d1,d2,d3) -> (d2,0,d0) } :
  memref<?x?x?x?xf32>, vector<3x4x5xf32>

This operation always reads a slice starting at %A[%expr1, %expr2, %expr3, %expr4]. The size of the slice is 3 along d2 and 5 along d0, so the slice is: %A[%expr1 : %expr1 + 5, %expr2, %expr3:%expr3 + 3, %expr4]

That slice needs to be read into a vector<3x4x5xf32>. Since the permutation map is not full rank, there must be a broadcast along vector dimension 1.

A notional lowering of vector.transfer_read could generate code resembling:

// %expr1, %expr2, %expr3, %expr4 defined before this point
%tmp = alloc() : vector<3x4x5xf32>
%view_in_tmp = "element_type_cast"(%tmp) : memref<1xvector<3x4x5xf32>>
for %i = 0 to 3 {
  affine.for %j = 0 to 4 {
    affine.for %k = 0 to 5 {
      %a = load %A[%expr1 + %k, %expr2, %expr3 + %i, %expr4] :
        memref<?x?x?x?xf32>
      store %tmp[%i, %j, %k] : vector<3x4x5xf32>
}}}
%c0 = constant 0 : index
%vec = load %view_in_tmp[%c0] : vector<3x4x5xf32>

On a GPU one could then map i, j, k to blocks and threads. Notice that the temporary storage footprint is 3 * 5 values but 3 * 4 * 5 values are actually transferred between %A and %tmp.

Alternatively, if a notional vector broadcast operation were available, the lowered code would resemble:

// %expr1, %expr2, %expr3, %expr4 defined before this point
%tmp = alloc() : vector<3x4x5xf32>
%view_in_tmp = "element_type_cast"(%tmp) : memref<1xvector<3x4x5xf32>>
for %i = 0 to 3 {
  affine.for %k = 0 to 5 {
    %a = load %A[%expr1 + %k, %expr2, %expr3 + %i, %expr4] :
      memref<?x?x?x?xf32>
    store %tmp[%i, 0, %k] : vector<3x4x5xf32>
}}
%c0 = constant 0 : index
%tmpvec = load %view_in_tmp[%c0] : vector<3x4x5xf32>
%vec = broadcast %tmpvec, 1 : vector<3x4x5xf32>

where broadcast broadcasts from element 0 to all others along the specified dimension. This time, the temporary storage footprint is 3 * 5 values which is the same amount of data as the 3 * 5 values transferred. An additional 1 broadcast is required. On a GPU this broadcast could be implemented using a warp-shuffle if loop j were mapped to threadIdx.x.

Syntax

operation ::= ssa-id `=` `vector.transfer_read` ssa-use-list
  `{` attribute-entry `} :` memref-type `,` vector-type

Examples:

// Read the slice `%A[%i0, %i1:%i1+256, %i2:%i2+32]` into vector<32x256xf32>
// and pad with %f0 to handle the boundary case:
%f0 = constant 0.0f : f32
for %i0 = 0 to %0 {
  affine.for %i1 = 0 to %1 step 256 {
    affine.for %i2 = 0 to %2 step 32 {
      %v = vector.transfer_read %A[%i0, %i1, %i2], (%f0)
           {permutation_map: (d0, d1, d2) -> (d2, d1)} :
           memref<?x?x?xf32>, vector<32x256xf32>
}}}

// Read the slice `%A[%i0, %i1]` (i.e. the element `%A[%i0, %i1]`) into
// vector<128xf32>. The underlying implementation will require a 1-D vector
// broadcast:
for %i0 = 0 to %0 {
  affine.for %i1 = 0 to %1 {
    %3 = vector.transfer_read %A[%i0, %i1]
         {permutation_map: (d0, d1) -> (0)} :
         memref<?x?xf32>, vector<128xf32>
  }
}

// Read from a memref with vector element type.
%4 = vector.transfer_read %arg1[%c3, %c3], %vf0
  {permutation_map = (d0, d1)->(d0, d1)}
    : memref<?x?xvector<4x3xf32>>, vector<1x1x4x3xf32>

Operands:

  1. memref: memref of any type values
  2. indices: index
  3. padding: any type

Attributes:

AttributeMLIR TypeDescription
permutation_mapAffineMapAttrAffineMap attribute attribute

Results:

  1. vector: vector of any type values

vector.transfer_write (vector::TransferWriteOp)

The vector.transfer_write op writes a supervector to memory.

Description:

The vector.transfer_write performs a blocking write from a vector , supplied as its first operand, into a slice within a MemRef of the same base elemental type, supplied as its second operand.

A vector memref operand must have its vector element type match a suffix (shape and element type) of the vector (e.g. memref<3x2x6x4x3xf32>, vector<1x1x4x3xf32>).

The slice is further defined by a full-rank index within the MemRef, supplied as the operands 3 .. 2 + rank(memref). The permutation_map attribute is an affine-map which specifies the transposition on the slice to match the vector shape. The size of the slice is specified by the size of the vector. This operation is called ‘write’ by opposition to ‘store’ because the super-vector granularity is generally not representable with a single hardware register. A vector.transfer_write is thus a mid-level abstraction that supports super-vectorization with non-effecting padding for full-tile-only code. It is the responsibility of vector.transfer_write's implementation to ensure the memory writes are valid. Different lowerings may be pertinent depending on the hardware support.

Syntax:

operation ::= `vector.transfer_write` ssa-use-list `{` attribute-entry `} :
  ` vector-type ', ' memref-type '

Examples:

// write vector<16x32x64xf32> into the slice
//   `%A[%i0, %i1:%i1+32, %i2:%i2+64, %i3:%i3+16]`:
for %i0 = 0 to %0 {
  affine.for %i1 = 0 to %1 step 32 {
    affine.for %i2 = 0 to %2 step 64 {
      affine.for %i3 = 0 to %3 step 16 {
        %val = `ssa-value` : vector<16x32x64xf32>
        vector.transfer_write %val, %A[%i0, %i1, %i2, %i3]
          {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d2)} :
          vector<16x32x64xf32>, memref<?x?x?x?xf32>
}}}}

// write to a memref with vector element type.
vector.transfer_write %4, %arg1[%c3, %c3]
  {permutation_map = (d0, d1)->(d0, d1)}
    : vector<1x1x4x3xf32>, memref<?x?xvector<4x3xf32>>

Operands:

  1. vector: vector of any type values
  2. memref: memref of any type values
  3. indices: index

Attributes:

AttributeMLIR TypeDescription
permutation_mapAffineMapAttrAffineMap attribute attribute

Results:

vector.tuple_get (vector::TupleGetOp)

vector tuple get operation

Description:

Returns the tuple element of ‘vectors’ at ‘index’.

Note that this operation is used during the vector op unrolling transformation and should be removed before lowering to lower-level dialects.

Examples:

  %4 = vector.tuple %0, %1, %2, %3
    : vector<2x2xf32>, vector<2x1xf32>, vector<2x2xf32>, vector<2x1xf32>>

  %5 = vector.tuple_get %4, 1
    : tuple<vector<2x2xf32>, vector<2x1xf32>,
            vector<2x2xf32>, vector<2x1xf32>>

Operands:

  1. vectors: tuple with any combination of vector of any type values values

Attributes:

AttributeMLIR TypeDescription
indexIntegerAttrarbitrary integer attribute attribute

Results:

  1. «unnamed»: vector of any type values

vector.tuple (vector::TupleOp)

make tuple of vectors operation

Description:

Returns a tuple of its operands ‘vectors’.

Note that this operation is used during the vector op unrolling transformation and should be removed before lowering to lower-level dialects.

Examples:

  %0 = vector.transfer_read ... : vector<2x2xf32>
  %1 = vector.transfer_read ... : vector<2x1xf32>
  %2 = vector.transfer_read ... : vector<2x2xf32>
  %3 = vector.transfer_read ... : vector<2x1xf32>

  %4 = vector.tuple %0, %1, %2, %3
    : vector<2x2xf32>, vector<2x1xf32>, vector<2x2xf32>, vector<2x1xf32>

Operands:

  1. vectors: vector of any type values

Attributes:

Results:

  1. «unnamed»: tuple with any combination of vector of any type values values

vector.type_cast (vector::TypeCastOp)

type_cast op converts a scalar memref to a vector memref

Description:

Performs a conversion from a memref with scalar element to a memref with a single vector element, copying the shape of the memref to the vector. This is the minimal viable operation that is required to makeke super-vectorization operational. It can be seen as a special case of the view operation but scoped in the super-vectorization context.

Syntax:

operation ::= `vector.type_cast` ssa-use : memref-type to memref-type

Example:

%A  = alloc() : memref<5x4x3xf32>
%VA = vector.type_cast %A : memref<5x4x3xf32> to memref<vector<5x4x3xf32>>

Operands:

  1. memref: statically shaped memref of any type values

Attributes:

Results:

  1. «unnamed»: memref of any type values