MLIR

Multi-Level IR Compiler Framework

'ArmSME' Dialect

Basic dialect to target Arm SME.

This dialect defines custom and LLVM IR intrinsic operations that are used to target the Arm Scalable Matrix Extension (SME). Through the available conversion and ArmSME passes you can, for example, lower a linalg.matmul operation to Arm SME FMOPA (floating-point outer product) operations. See the in-tree end-to-end integration tests for reference.

In order to run ArmSME integration tests, include these flags in the CMake invocation when configuring LLVM and MLIR:

  -DMLIR_INCLUDE_INTEGRATION_TESTS=On
  -DMLIR_RUN_ARM_SME_TESTS=On
  -DARM_EMULATOR_EXECUTABLE=<path-to-emulator>

These tests are run “post-commit” by the clang-aarch64-sve-vla LLVM BuildBot worker.

Operations 

arm_sme.copy_tile (arm_sme::CopyTileOp) 

Copies an SME tile value

Syntax:

operation ::= `arm_sme.copy_tile` $tile attr-dict `:` type($result)

Copies an SME “virtual tile” value to a new SSA value. This operation is primarily intended to be used to normalize the IR prior to tile allocation.

Example:

%copy = arm_sme.copy_tile %tile : vector<[4]x[4]xf32>

Traits: AlwaysSpeculatableImplTrait

Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

| Operand | Description |
| --- | --- |
| tile | a vector type that fits into a SME tile |

Results:

| Result | Description |
| --- | --- |
| result | a vector type that fits into a SME tile |

arm_sme.extract_tile_slice (arm_sme::ExtractTileSliceOp) 

Extract 1-D scalable vector from slice of 2-D tile

Syntax:

operation ::= `arm_sme.extract_tile_slice` $tile `[` $tile_slice_index `]` (`layout` `` $layout^)? attr-dict
              `:` type($result) `from` type($tile)

Extracts a 1-D scalable slice from a 2-D scalable tile at the given index. A tile slice is a 1-D vector of horizontally or vertically contiguous elements within a ZA tile.

An optional tile slice layout attribute specifies whether the tile slice is horizontal (default) or vertical.

Example 1: Extract vector<[16]xi8> from tile horizontally at the given index.

%slice = arm_sme.extract_tile_slice %tile[%tile_slice_index] : vector<[16]xi8> from vector<[16]x[16]xi8>

Example 2: Extract vector<[2]xf64> from tile vertically at the given index.

%slice = arm_sme.extract_tile_slice %tile[%tile_slice_index] layout<vertical> : vector<[2]xf64> from vector<[2]x[2]xf64>
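The slice semantics can be modelled in plain Python by treating a tile as a list of rows at a fixed vscale. This is a minimal sketch: the function name `extract_tile_slice` and the string-valued `layout` parameter are illustrative stand-ins, not part of any MLIR API.

```python
def extract_tile_slice(tile, index, layout="horizontal"):
    """Return one row (horizontal) or one column (vertical) of a square tile."""
    if layout == "horizontal":
        return list(tile[index])
    if layout == "vertical":
        return [row[index] for row in tile]
    raise ValueError(f"unknown layout: {layout}")


# A 2x2 tile, i.e. the shape of vector<[2]x[2]xf64> at vscale=1.
tile = [[1.0, 2.0],
        [3.0, 4.0]]

print(extract_tile_slice(tile, 1))              # [3.0, 4.0]
print(extract_tile_slice(tile, 1, "vertical"))  # [2.0, 4.0]
```

A horizontal slice is a row of the tile and a vertical slice is a column, so both have the same length as one tile dimension.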

Traits: AlwaysSpeculatableImplTrait

Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| layout | ::mlir::arm_sme::TileSliceLayoutAttr | Layout of a tile slice |

Enum cases:

  • horizontal (Horizontal)
  • vertical (Vertical)

Operands: 

| Operand | Description |
| --- | --- |
| tile | a vector type that fits into a SME tile |
| tile_slice_index | index |

Results:

| Result | Description |
| --- | --- |
| result | a vector type that matches the size of a SVE vector |

arm_sme.fmopa_2way (arm_sme::FMopa2WayOp) 

Floating-point sum of 2 outer products and accumulate

Syntax:

operation ::= `arm_sme.fmopa_2way` $lhs `,` $rhs
              oilist(
              `acc` `` `(` $acc `)`
              | `masks` `` `(` $lhsMask `,` $rhsMask `)`
              ) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)

This operation represents a sum of 2 widened outer products. It takes two 1-D scalable vectors as input and produces a 2-D scalable vector (ZA tile) as output.

For example (fp16 to fp32):

%result = arm_sme.fmopa_2way %lhs, %rhs :
  vector<[8]xf16>, vector<[8]xf16> into vector<[4]x[4]xf32>

The lhs encodes a matrix of shape SVLSx2 and the rhs a matrix of 2xSVLS, where SVLS (spec [1], section B2.1) is the number of 32-bit elements in a vector of SVL bits. To illustrate, below is a breakdown of this operation for fp16 to fp32, SVL=128 (i.e., vscale=1):

                      LHS                          RHS
           [A0 A1 A2 A3 A4 A5 A6 A7]    [B0 B1 B2 B3 B4 B5 B6 B7]

----------------------------------------------------------------------------

                              implicit layout

                          [A0 A1]    |
                          [A2 A3]    |    [B0 B2 B4 B6]
                          [A4 A5]    |    [B1 B3 B5 B7]
                          [A6 A7]    |

----------------------------------------------------------------------------

                              2 outer products

                  Acol0 ⊗ Brow0      |           Acol1 ⊗ Brow1
                  -------------      |           -------------
                                     |
              [B0 B2 B4 B6]          |       [B1 B3 B5 B7]
                                     |
         [A0  [A0B0 A0B2 A0B4 A0B6]  |  [A1  [A1B1 A1B3 A1B5 A1B7]
          A2  [A2B0 A2B2 A2B4 A2B6]  |   A3  [A3B1 A3B3 A3B5 A3B7]
          A4  [A4B0 A4B2 A4B4 A4B6]  |   A5  [A5B1 A5B3 A5B5 A5B7]
          A6] [A6B0 A6B2 A6B4 A6B6]  |   A7] [A7B1 A7B3 A7B5 A7B7]
                                     |

----------------------------------------------------------------------------

                          sum of 2 outer products

                       Acol0 ⊗ Brow0 + Acol1 ⊗ Brow1

             [A0B0 + A1B1 A0B2 + A1B3 A0B4 + A1B5 A0B6 + A1B7]
             [A2B0 + A3B1 A2B2 + A3B3 A2B4 + A3B5 A2B6 + A3B7]
             [A4B0 + A5B1 A4B2 + A5B3 A4B4 + A5B5 A4B6 + A5B7]
             [A6B0 + A7B1 A6B2 + A7B3 A6B4 + A7B5 A6B6 + A7B7]

----------------------------------------------------------------------------

This operation enables the folding of 2 outer products chained via the accumulator into a single outer product.

For example:

%a0_ext = arith.extf %a0 : vector<[4]xf16> to vector<[4]xf32>
%b0_ext = arith.extf %b0 : vector<[4]xf16> to vector<[4]xf32>
%a1_ext = arith.extf %a1 : vector<[4]xf16> to vector<[4]xf32>
%b1_ext = arith.extf %b1 : vector<[4]xf16> to vector<[4]xf32>

%0 = arm_sme.outerproduct %a0_ext, %b0_ext : vector<[4]xf32>, vector<[4]xf32>
%1 = arm_sme.outerproduct %a1_ext, %b1_ext acc(%0) : vector<[4]xf32>, vector<[4]xf32>

The 2 outer products in the example above can be fused into a single outer product as follows:

%a_packed = vector.interleave %a0, %a1 : vector<[4]xf16> -> vector<[8]xf16>
%b_packed = vector.interleave %b0, %b1 : vector<[4]xf16> -> vector<[8]xf16>
%0 = arm_sme.fmopa_2way %a_packed, %b_packed : vector<[8]xf16>, vector<[8]xf16> into vector<[4]x[4]xf32>

This is implemented in the -arm-sme-outer-product-fusion pass.
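The equivalence between the chained and the fused form can be checked with a small Python model at vscale=1. This is only a sketch of the arithmetic: the helper names (`outer`, `add`, `interleave`, `fmopa_2way`) are made up for illustration, Python floats stand in for the f16 inputs and f32 results, and the widening step is ignored.

```python
def outer(a, b):
    """Outer product of two 1-D vectors as a list of rows."""
    return [[x * y for y in b] for x in a]


def add(m, n):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(r, s)] for r, s in zip(m, n)]


def interleave(u, v):
    """Model of vector.interleave: [u0, v0, u1, v1, ...]."""
    out = []
    for x, y in zip(u, v):
        out += [x, y]
    return out


def fmopa_2way(lhs, rhs):
    """Sum of 2 outer products using the implicit layout shown above."""
    a_col0, a_col1 = lhs[0::2], lhs[1::2]  # columns of the SVLS x 2 matrix
    b_row0, b_row1 = rhs[0::2], rhs[1::2]  # rows of the 2 x SVLS matrix
    return add(outer(a_col0, b_row0), outer(a_col1, b_row1))


a0, a1 = [1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]
b0, b1 = [0.5, 1.5, 2.5, 3.5], [4.5, 5.5, 6.5, 7.5]

# Two outer products chained through the accumulator.
chained = add(outer(a0, b0), outer(a1, b1))
# The same computation after packing the operands with interleave.
fused = fmopa_2way(interleave(a0, a1), interleave(b0, b1))
assert fused == chained
```

Interleaving places the elements of a0 at the even positions and those of a1 at the odd positions, which is exactly the column split of the implicit layout.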

Example: FP16 to FP32

%result = arm_sme.fmopa_2way %lhs, %rhs : vector<[8]xf16>, vector<[8]xf16> into vector<[4]x[4]xf32>

Example: BF16 to FP32

%result = arm_sme.fmopa_2way %lhs, %rhs : vector<[8]xbf16>, vector<[8]xbf16> into vector<[4]x[4]xf32>

| Spec | Features |
| --- | --- |
| FMOPA (widening, 2-way, FP16 to FP32) | +sme |
| BFMOPA (widening, 2-way, BF16 to FP32) | +sme |

[1] https://developer.arm.com/documentation/ddi0616

Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments

Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

| Operand | Description |
| --- | --- |
| lhs | rank-1 scalable vector of 16-bit float or bfloat16 type values of length 8 |
| rhs | vector of any type values |
| lhsMask | vector of any type values |
| rhsMask | vector of any type values |
| acc | vector of any type values |

Results:

| Result | Description |
| --- | --- |
| result | vector<[4]x[4]xf32> of 32-bit float values |

arm_sme.fmops_2way (arm_sme::FMops2WayOp) 

Floating-point sum of 2 outer products and subtract

Syntax:

operation ::= `arm_sme.fmops_2way` $lhs `,` $rhs
              oilist(
              `acc` `` `(` $acc `)`
              | `masks` `` `(` $lhsMask `,` $rhsMask `)`
              ) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)

Equivalent to fmopa_2way, but the outer products are subtracted from the destination result.

Example: FP16 to FP32

%result = arm_sme.fmops_2way %lhs, %rhs : vector<[8]xf16>, vector<[8]xf16> into vector<[4]x[4]xf32>

Example: BF16 to FP32

%result = arm_sme.fmops_2way %lhs, %rhs : vector<[8]xbf16>, vector<[8]xbf16> into vector<[4]x[4]xf32>

Refer to fmopa_2way for a detailed description of 2-way outer products.

| Spec | Features |
| --- | --- |
| FMOPS (widening, 2-way, FP16 to FP32) | +sme |
| BFMOPS (widening, 2-way, BF16 to FP32) | +sme |

Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments

Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

| Operand | Description |
| --- | --- |
| lhs | rank-1 scalable vector of 16-bit float or bfloat16 type values of length 8 |
| rhs | vector of any type values |
| lhsMask | vector of any type values |
| rhsMask | vector of any type values |
| acc | vector of any type values |

Results:

| Result | Description |
| --- | --- |
| result | vector<[4]x[4]xf32> of 32-bit float values |

arm_sme.get_tile (arm_sme::GetTileOp) 

Creates an undefined value of SME virtual tile type

Syntax:

operation ::= `arm_sme.get_tile` attr-dict `:` type($tile)

Creates a new SME “virtual tile” value within a function. The contents of the tile returned from this operation are undefined.

Example 1:

// Create an 8-bit element "virtual tile" value:
%za0_b = arm_sme.get_tile : vector<[16]x[16]xi8>

Example 2:

// Create two 16-bit element "virtual tile" values:
%za0_h = arm_sme.get_tile : vector<[8]x[8]xi16>
%za1_h = arm_sme.get_tile : vector<[8]x[8]xi16>

Example 3:

// Create a 128-bit element "virtual tile" value:
%za0_q = arm_sme.get_tile : vector<[1]x[1]xi128>
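The tile shapes in these examples follow from the element width: at vscale=1 a streaming vector is 128 bits wide, so each tile dimension holds 128 / element-width elements. The helper below is a hypothetical sketch of that arithmetic (the name `tile_type` is not an MLIR API) and only formats the type string.

```python
def tile_type(element_bits, element_name):
    """Return the scalable 2-D vector type of an SME virtual tile.

    Each scalable dimension holds 128 / element_bits elements per vscale.
    """
    if element_bits not in (8, 16, 32, 64, 128):
        raise ValueError("SME tiles hold 8-, 16-, 32-, 64- or 128-bit elements")
    n = 128 // element_bits
    return f"vector<[{n}]x[{n}]x{element_name}>"


for bits, name in [(8, "i8"), (16, "i16"), (32, "f32"), (64, "f64"), (128, "i128")]:
    print(tile_type(bits, name))
```

For 8-bit, 16-bit and 128-bit elements this reproduces the types used in Examples 1 to 3.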

Traits: AlwaysSpeculatableImplTrait

Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

| Result | Description |
| --- | --- |
| tile | a vector type that fits into a SME tile |

arm_sme.insert_tile_slice (arm_sme::InsertTileSliceOp) 

Insert 1-D scalable vector into slice of 2-D tile

Syntax:

operation ::= `arm_sme.insert_tile_slice` $vector `,` $tile `[` $tile_slice_index `]` (`layout` `` $layout^)?
              attr-dict `:` type($vector) `into` type($result)

Inserts a 1-D scalable vector into a slice of a 2-D scalable vector tile at the given index. The type of the 1-D scalable vector to be inserted must match the type of the tile slice. A tile slice is a 1-D vector of horizontally or vertically contiguous elements within a ZA tile. The updated tile is returned as the result.

An optional tile slice layout attribute specifies whether the tile slice is horizontal (default) or vertical.

Example 1: Insert vector<[16]xi8> into tile horizontally at the given index.

%tile_update = arm_sme.insert_tile_slice %vector, %tile[%tile_slice_index] : vector<[16]xi8> into vector<[16]x[16]xi8>

Example 2: Insert vector<[2]xf64> into tile vertically at the given index.

%tile_update = arm_sme.insert_tile_slice %vector, %tile[%tile_slice_index] layout<vertical> : vector<[2]xf64> into vector<[2]x[2]xf64>
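Because the operation returns the updated tile as a new SSA value, a functional Python model is a natural fit. The sketch below assumes a tile represented as a list of rows at a fixed vscale; the function name and the string-valued `layout` parameter are illustrative only.

```python
def insert_tile_slice(vector, tile, index, layout="horizontal"):
    """Return a new tile with one row or column replaced by `vector`."""
    if len(vector) != len(tile):
        raise ValueError("vector length must match the tile slice length")
    result = [list(row) for row in tile]  # value semantics: the input is untouched
    if layout == "horizontal":
        result[index] = list(vector)
    elif layout == "vertical":
        for row, value in zip(result, vector):
            row[index] = value
    else:
        raise ValueError(f"unknown layout: {layout}")
    return result


tile = [[0.0, 0.0],
        [0.0, 0.0]]
print(insert_tile_slice([1.0, 2.0], tile, 1))              # [[0.0, 0.0], [1.0, 2.0]]
print(insert_tile_slice([1.0, 2.0], tile, 1, "vertical"))  # [[0.0, 1.0], [0.0, 2.0]]
```

The input `tile` is left unchanged in both calls, mirroring the way the original `%tile` value remains valid after the operation.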

Traits: AlwaysSpeculatableImplTrait

Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| layout | ::mlir::arm_sme::TileSliceLayoutAttr | Layout of a tile slice |

Enum cases:

  • horizontal (Horizontal)
  • vertical (Vertical)

Operands: 

| Operand | Description |
| --- | --- |
| vector | a vector type that matches the size of a SVE vector |
| tile | a vector type that fits into a SME tile |
| tile_slice_index | index |

Results:

| Result | Description |
| --- | --- |
| result | a vector type that fits into a SME tile |

arm_sme.load_tile_slice (arm_sme::LoadTileSliceOp) 

Tile slice load and update operation

Syntax:

operation ::= `arm_sme.load_tile_slice` $base `[` $indices `]` `,` $mask `,` $tile `,` $tile_slice_index
              (`layout` `` $layout^)? attr-dict `:` type($base) `,` type($mask) `,`
              type($result)

Loads a 1D tile slice from memory into a 2D SME “virtual tile”. The tile slice is defined by the dimension of the 2D scalable vector type pointed by the index. A tile slice index describes where in the input tile the tile slice is loaded to. An optional tile slice layout attribute specifies whether the tile slice being loaded at the given index is horizontal (default) or vertical. The updated tile is returned as the result.

The slice of memory read is defined by a base and indices and must be contiguous. The memref must be either rank 1 or rank 2, have dynamic dimensions since the operation is scalable, and the element type must be a scalar that matches the element type of the result.

The provided mask is used to specify which elements of the tile slice will be loaded.

Example 1: Load a vector<[16]xi8> tile slice from memory into tile horizontally (default) at given index.

%tile_update = arm_sme.load_tile_slice %base[%c0], %mask, %tile, %tile_slice_index : memref<?x?xi8>, vector<[16]xi1>, vector<[16]x[16]xi8>

Example 2: Load a vector<[4]xf32> tile slice from memory into tile vertically at given index.

%tile_update = arm_sme.load_tile_slice %base[%c0], %mask, %tile, %tile_slice_index layout<vertical> : memref<?x?xf32>, vector<[4]xi1>, vector<[4]x[4]xf32>

Example 3: Load a vector<[1]xi128> tile slice from memory into tile vertically at given index.

%tile_update = arm_sme.load_tile_slice %base[%c0], %mask, %tile, %tile_slice_index layout<vertical> : memref<?x?xi128>, vector<[1]xi1>, vector<[1]x[1]xi128>

Interfaces: ArmSMETileOpInterface, InferTypeOpInterface

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| layout | ::mlir::arm_sme::TileSliceLayoutAttr | Layout of a tile slice |

Enum cases:

  • horizontal (Horizontal)
  • vertical (Vertical)

Operands: 

| Operand | Description |
| --- | --- |
| base | memref of any type values |
| mask | a vector type that matches the size of a SVE predicate |
| tile | a vector type that fits into a SME tile |
| indices | variadic of index |
| tile_slice_index | index |

Results:

| Result | Description |
| --- | --- |
| result | a vector type that fits into a SME tile |

arm_sme.outerproduct (arm_sme::OuterProductOp) 

Outer product with optional fused add/sub

Syntax:

operation ::= `arm_sme.outerproduct` $lhs `,` $rhs
              oilist(
              `kind` `` $kind
              | `acc` `` `(` $acc `)`
              | `masks` `` `(` $lhsMask `,` $rhsMask `)`
              ) attr-dict `:` type($lhs) `,` type($rhs)

This operation represents an outer product that fits within an SME tile. All operands must be SVE vectors and the result an SME tile. Unlike vector.outerproduct, masking is on the operands (rather than the result), which mirrors the SME instructions.

Example 1: Unmasked outerproduct (without accumulator)

// Not specifying an accumulator implicitly zeros the destination tile.
%result = arm_sme.outerproduct %lhs, %rhs : vector<[4]xf32>, vector<[4]xf32>

Example 2: Unmasked outerproduct (with accumulator)

%result = arm_sme.outerproduct %lhs, %rhs acc(%accumulator)
            : vector<[4]xf32>, vector<[4]xf32>

Example 3: Masked outerproduct

%result = arm_sme.outerproduct %lhs, %rhs masks(%lhsMask, %rhsMask)
            : vector<[4]xf32>, vector<[4]xf32>

Example 4: Masked outerproduct (with accumulator)

%result = arm_sme.outerproduct %lhs, %rhs acc(%accumulator) masks(%lhsMask, %rhsMask)
            : vector<[4]xf32>, vector<[4]xf32>
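The four variants above can be summarised by one small Python model. It is a sketch under stated assumptions rather than a definition of the op: the function name and keyword parameters are illustrative, and it assumes that an element whose row or column lane is masked off contributes nothing, so the accumulator value (or zero, when no accumulator is given) is kept at that position.

```python
def outerproduct(lhs, rhs, acc=None, masks=None, kind="add"):
    """Reference model of an outer product with optional accumulator and masks."""
    rows, cols = len(lhs), len(rhs)
    if acc is None:
        # No accumulator: the destination tile is implicitly zeroed.
        acc = [[0.0] * cols for _ in range(rows)]
    lhs_mask, rhs_mask = masks if masks is not None else ([True] * rows, [True] * cols)
    sign = {"add": 1.0, "sub": -1.0}[kind]
    result = []
    for i in range(rows):
        row = []
        for j in range(cols):
            # Assumption: the masks act on the operands, so an inactive
            # lane on either side suppresses the product for that element.
            active = lhs_mask[i] and rhs_mask[j]
            term = lhs[i] * rhs[j] if active else 0.0
            row.append(acc[i][j] + sign * term)
        result.append(row)
    return result


lhs, rhs = [1.0, 2.0], [3.0, 4.0]
print(outerproduct(lhs, rhs))                                        # [[3.0, 4.0], [6.0, 8.0]]
print(outerproduct(lhs, rhs, masks=([True, False], [True, True])))   # [[3.0, 4.0], [0.0, 0.0]]
print(outerproduct(lhs, rhs, acc=[[1.0, 1.0], [1.0, 1.0]], kind="sub"))  # [[-2.0, -3.0], [-5.0, -7.0]]
```

The `kind` argument corresponds to the `kind` attribute listed below: `add` accumulates the product, `sub` subtracts it.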

Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments

Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| kind | ::mlir::arm_sme::CombiningKindAttr | Kind of combining function |

Enum cases:

  • add (Add)
  • sub (Sub)

Operands: 

| Operand | Description |
| --- | --- |
| lhs | a vector type that matches the size of a SVE vector |
| rhs | a vector type that matches the size of a SVE vector |
| lhsMask | a vector type that matches the size of a SVE predicate |
| rhsMask | a vector type that matches the size of a SVE predicate |
| acc | a vector type that fits into a SME tile |

Results:

| Result | Description |
| --- | --- |
| result | a vector type that fits into a SME tile |

arm_sme.smopa_2way (arm_sme::SMopa2WayOp) 

Signed integer sum of 2 outer products and accumulate

Syntax:

operation ::= `arm_sme.smopa_2way` $lhs `,` $rhs
              oilist(
              `acc` `` `(` $acc `)`
              | `masks` `` `(` $lhsMask `,` $rhsMask `)`
              ) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)

Example:

%result = arm_sme.smopa_2way %lhs, %rhs : vector<[8]xi16>, vector<[8]xi16> into vector<[4]x[4]xi32>

Refer to fmopa_2way for a detailed description of 2-way outer products.

| Spec | Features |
| --- | --- |
| SMOPA (2-way) | +sme2 |

Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments

Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

| Operand | Description |
| --- | --- |
| lhs | rank-1 scalable vector of 16-bit signless integer values of length 8 |
| rhs | vector of any type values |
| lhsMask | vector of any type values |
| rhsMask | vector of any type values |
| acc | vector of any type values |

Results:

| Result | Description |
| --- | --- |
| result | vector<[4]x[4]xi32> of 32-bit signless integer values |

arm_sme.smopa_4way (arm_sme::SMopa4WayOp) 

Signed integer sum of 4 outer products and accumulate

Syntax:

operation ::= `arm_sme.smopa_4way` $lhs `,` $rhs
              oilist(
              `acc` `` `(` $acc `)`
              | `masks` `` `(` $lhsMask `,` $rhsMask `)`
              ) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)

This operation represents a sum of 4 widened outer products. It takes two 1-D scalable vectors as input and produces a 2-D scalable vector (ZA tile) as output.

For example (i8 to i32):

%result = arm_sme.smopa_4way %lhs, %rhs :
  vector<[16]xi8>, vector<[16]xi8> into vector<[4]x[4]xi32>

The lhs encodes a matrix of shape SVLSx4 and the rhs a matrix of 4xSVLS, where SVLS (spec [1], section B2.1) is the number of 32-bit elements in a vector of SVL bits. To illustrate, below is a breakdown of this operation for i8 to i32, SVL=128 (i.e., vscale=1):

                                    LHS
          [A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15]

                                    RHS
          [B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 B11 B12 B13 B14 B15]

----------------------------------------------------------------------------

                              implicit layout

                [A0   A1  A2  A3]    |    [B0 B4  B8 B12]
                [A4   A5  A6  A7]    |    [B1 B5  B9 B13]
                [A8   A9 A10 A11]    |    [B2 B6 B10 B14]
                [A12 A13 A14 A15]    |    [B3 B7 B11 B15]

----------------------------------------------------------------------------

                              4 outer products

             Acol0 ⊗ Brow0           |            Acol1 ⊗ Brow1
             -------------           |            -------------
                                     |
         [B0 B4 B8 B12]              |        [B1 B5 B9 B13]
                                     |
   [A0   [ A0B0  A0B4  A0B8  A0B12]  |  [A1   [ A1B1  A1B5  A1B9  A1B13]
    A4   [ A4B0  A4B4  A4B8  A4B12]  |   A5   [ A5B1  A5B5  A5B9  A5B13]
    A8   [ A8B0  A8B4  A8B8  A8B12]  |   A9   [ A9B1  A9B5  A9B9  A9B13]
    A12] [A12B0 A12B4 A12B8 A12B12]  |   A13] [A13B1 A13B5 A13B9 A13B13]
                                     |
             Acol2 ⊗ Brow2           |            Acol3 ⊗ Brow3
             -------------           |            -------------
                                     |
         [B2 B6 B10 B14]             |        [B3 B7 B11 B15]
                                     |
   [A2   [ A2B2  A2B6  A2B10  A2B14] |  [A3   [ A3B3  A3B7  A3B11  A3B15]
    A6   [ A6B2  A6B6  A6B10  A6B14] |   A7   [ A7B3  A7B7  A7B11  A7B15]
    A10  [A10B2 A10B6 A10B10 A10B14] |   A11  [A11B3 A11B7 A11B11 A11B15]
    A14] [A14B2 A14B6 A14B10 A14B14] |   A15] [A15B3 A15B7 A15B11 A15B15]
                                     |

----------------------------------------------------------------------------

                          sum of 4 outer products

       Acol0 ⊗ Brow0 + Acol1 ⊗ Brow1 + Acol2 ⊗ Brow2 + Acol3 ⊗ Brow3

 [ A0B0 +  A1B1 +  A2B2 +  A3B3 ... ...  A0B12 +  A1B13 +  A2B14 +  A3B15]
 [ A4B0 +  A5B1 +  A6B2 +  A7B3 ... ...  A4B12 +  A5B13 +  A6B14 +  A7B15]
 [ A8B0 +  A9B1 + A10B2 + A11B3 ... ...  A8B12 +  A9B13 + A10B14 + A11B15]
 [A12B0 + A13B1 + A14B2 + A15B3 ... ... A12B12 + A13B13 + A14B14 + A15B15]

----------------------------------------------------------------------------

This operation enables the folding of 4 outer products chained via the accumulator into a single outer product.

For example:

%a0_ext = arith.extsi %a0 : vector<[4]xi8> to vector<[4]xi32>
%b0_ext = arith.extsi %b0 : vector<[4]xi8> to vector<[4]xi32>

%a1_ext = arith.extsi %a1 : vector<[4]xi8> to vector<[4]xi32>
%b1_ext = arith.extsi %b1 : vector<[4]xi8> to vector<[4]xi32>

%a2_ext = arith.extsi %a2 : vector<[4]xi8> to vector<[4]xi32>
%b2_ext = arith.extsi %b2 : vector<[4]xi8> to vector<[4]xi32>

%a3_ext = arith.extsi %a3 : vector<[4]xi8> to vector<[4]xi32>
%b3_ext = arith.extsi %b3 : vector<[4]xi8> to vector<[4]xi32>

%0 = arm_sme.outerproduct %a0_ext, %b0_ext : vector<[4]xi32>, vector<[4]xi32>
%1 = arm_sme.outerproduct %a1_ext, %b1_ext acc(%0) : vector<[4]xi32>, vector<[4]xi32>
%2 = arm_sme.outerproduct %a2_ext, %b2_ext acc(%1) : vector<[4]xi32>, vector<[4]xi32>
%3 = arm_sme.outerproduct %a3_ext, %b3_ext acc(%2) : vector<[4]xi32>, vector<[4]xi32>

The 4 outer products in the example above can be fused into a single outer product as follows:

%lhs0 = vector.interleave %a0, %a2 : vector<[4]xi8> -> vector<[8]xi8>
%lhs1 = vector.interleave %a1, %a3 : vector<[4]xi8> -> vector<[8]xi8>
%lhs = vector.interleave %lhs0, %lhs1 : vector<[8]xi8> -> vector<[16]xi8>

%rhs0 = vector.interleave %b0, %b2 : vector<[4]xi8> -> vector<[8]xi8>
%rhs1 = vector.interleave %b1, %b3 : vector<[4]xi8> -> vector<[8]xi8>
%rhs = vector.interleave %rhs0, %rhs1 : vector<[8]xi8> -> vector<[16]xi8>

%0 = arm_sme.smopa_4way %lhs, %rhs : vector<[16]xi8>, vector<[16]xi8> into vector<[4]x[4]xi32>

This is implemented in the -arm-sme-outer-product-fusion pass.
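The two-level interleave can be checked with a small Python model at vscale=1. As with any such sketch, the helper names are made up for illustration, and unbounded Python integers stand in for the i8 inputs and i32 results, so overflow is not modelled.

```python
def outer(a, b):
    """Outer product of two 1-D vectors as a list of rows."""
    return [[x * y for y in b] for x in a]


def add(m, n):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(r, s)] for r, s in zip(m, n)]


def interleave(u, v):
    """Model of vector.interleave: [u0, v0, u1, v1, ...]."""
    out = []
    for x, y in zip(u, v):
        out += [x, y]
    return out


def smopa_4way(lhs, rhs):
    """Sum of 4 outer products using the implicit layout shown above."""
    total = outer(lhs[0::4], rhs[0::4])  # Acol0 (x) Brow0
    for k in range(1, 4):
        total = add(total, outer(lhs[k::4], rhs[k::4]))  # Acolk (x) Browk
    return total


a = [[1, 2, 3, 4], [5, 6, 7, 8], [-1, -2, -3, -4], [9, 10, 11, 12]]
b = [[2, 0, 1, 3], [-5, 4, 6, 1], [7, 7, -7, 0], [1, -1, 2, -2]]

# Pack the operands exactly as in the fused MLIR above.
lhs = interleave(interleave(a[0], a[2]), interleave(a[1], a[3]))
rhs = interleave(interleave(b[0], b[2]), interleave(b[1], b[3]))

# Four outer products chained through the accumulator.
chained = outer(a[0], b[0])
for k in range(1, 4):
    chained = add(chained, outer(a[k], b[k]))

assert lhs[0:4] == [a[0][0], a[1][0], a[2][0], a[3][0]]
assert smopa_4way(lhs, rhs) == chained
```

Interleaving (a0, a2) and (a1, a3) first, and then interleaving the two results, puts element i of vector k at position 4*i + k, which is the row-major order of the SVLS x 4 matrix.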

Example: I8 to I32

%result = arm_sme.smopa_4way %lhs, %rhs : vector<[16]xi8>, vector<[16]xi8> into vector<[4]x[4]xi32>

Example: I16 to I64

%result = arm_sme.smopa_4way %lhs, %rhs : vector<[8]xi16>, vector<[8]xi16> into vector<[2]x[2]xi64>

| Spec | Features |
| --- | --- |
| SMOPA (4-way) | +sme (32-bit), +sme-i16i64 (64-bit) |

Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments

Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

| Operand | Description |
| --- | --- |
| lhs | rank-1 scalable vector of 8-bit signless integer values of length 16, or rank-1 scalable vector of 16-bit signless integer values of length 8 |
| rhs | vector of any type values |
| lhsMask | vector of any type values |
| rhsMask | vector of any type values |
| acc | vector of any type values |

Results:

| Result | Description |
| --- | --- |
| result | vector<[4]x[4]xi32> of 32-bit signless integer values or vector<[2]x[2]xi64> of 64-bit signless integer values |

arm_sme.smops_2way (arm_sme::SMops2WayOp) 

Signed integer sum of 2 outer products and subtract

Syntax:

operation ::= `arm_sme.smops_2way` $lhs `,` $rhs
              oilist(
              `acc` `` `(` $acc `)`
              | `masks` `` `(` $lhsMask `,` $rhsMask `)`
              ) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)

Example:

%result = arm_sme.smops_2way %lhs, %rhs : vector<[8]xi16>, vector<[8]xi16> into vector<[4]x[4]xi32>

Refer to fmopa_2way for a detailed description of 2-way outer products.

| Spec | Features |
| --- | --- |
| SMOPS (2-way) | +sme2 |

Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments

Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

| Operand | Description |
| --- | --- |
| lhs | rank-1 scalable vector of 16-bit signless integer values of length 8 |
| rhs | vector of any type values |
| lhsMask | vector of any type values |
| rhsMask | vector of any type values |
| acc | vector of any type values |

Results:

| Result | Description |
| --- | --- |
| result | vector<[4]x[4]xi32> of 32-bit signless integer values |

arm_sme.smops_4way (arm_sme::SMops4WayOp) 

Signed integer sum of 4 outer products and subtract

Syntax:

operation ::= `arm_sme.smops_4way` $lhs `,` $rhs
              oilist(
              `acc` `` `(` $acc `)`
              | `masks` `` `(` $lhsMask `,` $rhsMask `)`
              ) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)

Equivalent to smopa_4way, but the outer products are subtracted from the destination result.

Example: I8 to I32

%result = arm_sme.smops_4way %lhs, %rhs : vector<[16]xi8>, vector<[16]xi8> into vector<[4]x[4]xi32>

Example: I16 to I64

%result = arm_sme.smops_4way %lhs, %rhs : vector<[8]xi16>, vector<[8]xi16> into vector<[2]x[2]xi64>

Refer to smopa_4way for a detailed description of 4-way outer products.

| Spec | Features |
| --- | --- |
| SMOPS (4-way) | +sme (32-bit), +sme-i16i64 (64-bit) |

Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments

Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

| Operand | Description |
| --- | --- |
| lhs | rank-1 scalable vector of 8-bit signless integer values of length 16, or rank-1 scalable vector of 16-bit signless integer values of length 8 |
| rhs | vector of any type values |
| lhsMask | vector of any type values |
| rhsMask | vector of any type values |
| acc | vector of any type values |

Results:

| Result | Description |
| --- | --- |
| result | vector<[4]x[4]xi32> of 32-bit signless integer values or vector<[2]x[2]xi64> of 64-bit signless integer values |

arm_sme.store_tile_slice (arm_sme::StoreTileSliceOp) 

Tile slice store operation

Syntax:

operation ::= `arm_sme.store_tile_slice` $tile `,` $tile_slice_index `,` $mask `,` $base `[` $indices `]` (`layout` `` $layout^)?
              attr-dict `:` type($base) `,` type($mask) `,` type($tile)

Stores a 1D tile slice from a 2D SME “virtual tile” into memory. The tile slice is defined by the dimension of the 2D scalable vector type pointed by the index. A tile slice index describes where in the input tile the tile slice is stored from. An optional tile slice layout attribute specifies whether the tile slice being stored from the given index is horizontal (default) or vertical.

The slice of memory written is defined by a base and indices and must be contiguous. The memref must be either rank 1 or rank 2, have dynamic dimensions since the operation is scalable, and the element type must be a scalar that matches the element type of the input tile.

The provided mask is used to specify which elements of the tile slice will be stored.

Example 1: Store vector<[16]xi8> horizontal (default) tile slice from tile at given index to memory.

arm_sme.store_tile_slice %tile, %tile_slice_index, %mask, %base[%c0] : vector<[16]x[16]xi8>, vector<[16]xi1>, memref<?x?xi8>

Example 2: Store vector<[4]xf32> vertical tile slice from tile at given index to memory.

arm_sme.store_tile_slice %tile, %tile_slice_index, %mask, %base[%c0] layout<vertical> : vector<[4]x[4]xf32>, vector<[4]xi1>, memref<?x?xf32>

Example 3: Store a vector<[1]xi128> vertical tile slice from tile at given index to memory.

arm_sme.store_tile_slice %tile, %tile_slice_index, %mask, %base[%c0] layout<vertical> : vector<[1]x[1]xi128>, vector<[1]xi1>, memref<?x?xi128>
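The effect of the mask can be illustrated with a small Python model. This is a sketch only: memory is simplified to a flat list addressed by a single hypothetical `offset` parameter, the function name and `layout` strings are illustrative, and it assumes that lanes with a false mask bit leave memory untouched.

```python
def store_tile_slice(tile, index, mask, memory, offset, layout="horizontal"):
    """Write the active lanes of one tile slice to contiguous memory."""
    if layout == "horizontal":
        tile_slice = tile[index]
    elif layout == "vertical":
        tile_slice = [row[index] for row in tile]
    else:
        raise ValueError(f"unknown layout: {layout}")
    for lane, (value, active) in enumerate(zip(tile_slice, mask)):
        if active:  # inactive lanes are skipped, so memory keeps its old value
            memory[offset + lane] = value


tile = [[1, 2],
        [3, 4]]
memory = [0, 0, 0, 0]
# Store column 0 of the tile ([1, 3]) at offset 1, with only lane 0 active.
store_tile_slice(tile, 0, [True, False], memory, 1, layout="vertical")
print(memory)  # [0, 1, 0, 0]
```

Masking of this kind is what allows a store at the edge of a buffer to write fewer elements than a full tile slice.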

Interfaces: ArmSMETileOpInterface

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| layout | ::mlir::arm_sme::TileSliceLayoutAttr | Layout of a tile slice |

Enum cases:

  • horizontal (Horizontal)
  • vertical (Vertical)

Operands: 

| Operand | Description |
| --- | --- |
| tile | a vector type that fits into a SME tile |
| tile_slice_index | index |
| mask | a vector type that matches the size of a SVE predicate |
| base | memref of any type values |
| indices | variadic of index |

arm_sme.streaming_vl (arm_sme::StreamingVLOp) 

Query the streaming vector length

Syntax:

operation ::= `arm_sme.streaming_vl` $type_size attr-dict

This operation returns the streaming vector length (SVL) for a given type size. Unlike vector.vscale, the value returned is invariant to the streaming mode.

Example:

// Streaming vector length in:
// - bytes (8-bit, SVL.B)
%svl_b = arm_sme.streaming_vl <byte>
// - half words (16-bit, SVL.H)
%svl_h = arm_sme.streaming_vl <half>
// - words (32-bit, SVL.W)
%svl_w = arm_sme.streaming_vl <word>
// - double words (64-bit, SVL.D)
%svl_d = arm_sme.streaming_vl <double>
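The four results are related by simple division of the hardware SVL by the element width. The sketch below makes that arithmetic explicit; `svl_bits` is a hypothetical parameter standing in for the implementation-defined streaming vector length, which the real operation queries from the hardware rather than taking as an argument.

```python
TYPE_SIZE_BITS = {"byte": 8, "half": 16, "word": 32, "double": 64}


def streaming_vl(type_size, svl_bits):
    """Number of elements of the given size in one streaming vector."""
    # SVL is a power of two between 128 and 2048 bits.
    if not 128 <= svl_bits <= 2048 or svl_bits & (svl_bits - 1):
        raise ValueError("SVL must be a power of two between 128 and 2048 bits")
    return svl_bits // TYPE_SIZE_BITS[type_size]


for size in ("byte", "half", "word", "double"):
    print(size, streaming_vl(size, 512))
# byte 64
# half 32
# word 16
# double 8
```

For a 512-bit SVL this gives SVL.B = 64, SVL.H = 32, SVL.W = 16 and SVL.D = 8.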

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| type_size | ::mlir::arm_sme::TypeSizeAttr | Size of a vector element type |

Enum cases:

  • byte (Byte)
  • half (Half)
  • word (Word)
  • double (Double)

Results: 

| Result | Description |
| --- | --- |
| «unnamed» | index |

arm_sme.sumopa_4way (arm_sme::SuMopa4WayOp) 

Signed by unsigned integer sum of 4 outer products and accumulate

Syntax:

operation ::= `arm_sme.sumopa_4way` $lhs `,` $rhs
              oilist(
              `acc` `` `(` $acc `)`
              | `masks` `` `(` $lhsMask `,` $rhsMask `)`
              ) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)

Example: I8 to I32

%result = arm_sme.sumopa_4way %lhs, %rhs : vector<[16]xi8>, vector<[16]xi8> into vector<[4]x[4]xi32>

Example: I16 to I64

%result = arm_sme.sumopa_4way %lhs, %rhs : vector<[8]xi16>, vector<[8]xi16> into vector<[2]x[2]xi64>

Refer to smopa_4way for a detailed description of 4-way outer products.
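Since MLIR integers are signless, the "signed by unsigned" part of the name describes how the bit patterns of each operand are interpreted. The Python sketch below illustrates this for the i8 case, assuming (as the name suggests) that the lhs elements are read as signed and the rhs elements as unsigned. The helper names are made up, and overflow of the accumulator is not modelled.

```python
def as_signed(bits, width=8):
    """Interpret the low `width` bits as a two's complement value."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits >> (width - 1) else bits


def as_unsigned(bits, width=8):
    """Interpret the low `width` bits as an unsigned value."""
    return bits & ((1 << width) - 1)


def sumopa_4way(lhs, rhs, width=8):
    """Sum of 4 outer products: signed lhs elements times unsigned rhs elements."""
    n = len(lhs) // 4
    result = [[0] * n for _ in range(n)]
    for k in range(4):
        for i in range(n):
            for j in range(n):
                # Element 4*i + k is row i, column k of the implicit lhs layout;
                # element 4*j + k is row k, column j of the implicit rhs layout.
                result[i][j] += as_signed(lhs[4 * i + k], width) * as_unsigned(rhs[4 * j + k], width)
    return result


# The same bit pattern 0xFF is -1 on the lhs and 255 on the rhs.
lhs = [0xFF] * 16
rhs = [0xFF] * 16
print(sumopa_4way(lhs, rhs)[0][0])  # -1020, i.e. 4 * (-1 * 255)
```

With both operands read as signed, the same inputs would instead give 4 * (-1 * -1) = 4 per element, which is the difference between this operation and smopa_4way.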

| Spec | Features |
| --- | --- |
| SUMOPA (4-way) | +sme (32-bit), +sme-i16i64 (64-bit) |

Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments

Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

| Operand | Description |
| --- | --- |
| lhs | rank-1 scalable vector of 8-bit signless integer values of length 16, or rank-1 scalable vector of 16-bit signless integer values of length 8 |
| rhs | vector of any type values |
| lhsMask | vector of any type values |
| rhsMask | vector of any type values |
| acc | vector of any type values |

Results:

| Result | Description |
| --- | --- |
| result | vector<[4]x[4]xi32> of 32-bit signless integer values or vector<[2]x[2]xi64> of 64-bit signless integer values |

arm_sme.sumops_4way (arm_sme::SuMops4WayOp) 

Signed by unsigned integer sum of 4 outer products and subtract

Syntax:

operation ::= `arm_sme.sumops_4way` $lhs `,` $rhs
              oilist(
              `acc` `` `(` $acc `)`
              | `masks` `` `(` $lhsMask `,` $rhsMask `)`
              ) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)

Example: I8 to I32

%result = arm_sme.sumops_4way %lhs, %rhs : vector<[16]xi8>, vector<[16]xi8> into vector<[4]x[4]xi32>

Example: I16 to I64

%result = arm_sme.sumops_4way %lhs, %rhs : vector<[8]xi16>, vector<[8]xi16> into vector<[2]x[2]xi64>

Refer to smopa_4way for a detailed description of 4-way outer products.

| Spec | Features |
| --- | --- |
| SUMOPS (4-way) | +sme (32-bit), +sme-i16i64 (64-bit) |

Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments

Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

| Operand | Description |
| --- | --- |
| lhs | rank-1 scalable vector of 8-bit signless integer values of length 16, or rank-1 scalable vector of 16-bit signless integer values of length 8 |
| rhs | vector of any type values |
| lhsMask | vector of any type values |
| rhsMask | vector of any type values |
| acc | vector of any type values |

Results:

| Result | Description |
| --- | --- |
| result | vector<[4]x[4]xi32> of 32-bit signless integer values or vector<[2]x[2]xi64> of 64-bit signless integer values |

arm_sme.tile_load (arm_sme::TileLoadOp) 

Tile load operation

Syntax:

operation ::= `arm_sme.tile_load` $base `[` $indices `]` (`,` $padding `,` $mask^)? (`layout` `` $layout^)? attr-dict `:` type($base) `,` type($result)

Loads a 2D SME “virtual tile” from memory defined by a base and indices, with the shape defined by the 2D scalable vector type of the result tile. An optional tile slice layout attribute specifies whether the slices of the tile being loaded are horizontal (default) or vertical. The slice of memory must be contiguous. The memref must be either rank 1 or rank 2 with dynamic dimensions, since the operation is scalable, and the element type must be a scalar that matches the element type of the result.

An optional SSA value padding, of the same element type as the MemRef, may be provided to specify a fallback value for masked-out elements.

An optional SSA value mask may be specified to mask out elements read from the MemRef. The mask type is an i1 vector with a shape that matches how elements are read from the MemRef. Elements whose corresponding mask element is 0 are masked out and replaced with padding.

If either padding or mask is specified, both must be specified.

Example 1: Load an 8-bit element ZA tile with horizontal layout (default) from memory (ZA0.B).

%tile = arm_sme.tile_load %base[%c0, %c0] : memref<?x?xi8>, vector<[16]x[16]xi8>

Example 2: Load a FP 32-bit element ZA tile with vertical layout from memory.

%tile = arm_sme.tile_load %base[%c0, %c0] layout<vertical> : memref<?x?xf32>, vector<[4]x[4]xf32>

Example 3: Load a 128-bit element ZA tile with horizontal layout (default) from memory.

%tile = arm_sme.tile_load %base[%c0, %c0] layout<horizontal> : memref<?x?xi128>, vector<[1]x[1]xi128>

Example 4: Masked load of a FP 32-bit element ZA tile with horizontal layout (default) from memory.

%tile = arm_sme.tile_load %base[%c0, %c0], %pad, %mask : memref<?x?xf32>, vector<[4]x[4]xf32>
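Example 4 does not show where the padding and mask values come from. The following is a minimal sketch of one way to produce them, assuming the mask is built with `vector.create_mask` from two runtime bounds. The function name and the `%rows` and `%cols` arguments are hypothetical.

```mlir
// Hypothetical wrapper function; %rows and %cols are illustrative bounds.
func.func @masked_tile_load(%base: memref<?x?xf32>,
                            %rows: index, %cols: index) -> vector<[4]x[4]xf32> {
  %c0 = arith.constant 0 : index
  // Fallback value that replaces masked-out elements.
  %pad = arith.constant 0.0 : f32
  // Enable the leading %rows x %cols elements of the tile.
  %mask = vector.create_mask %rows, %cols : vector<[4]x[4]xi1>
  %tile = arm_sme.tile_load %base[%c0, %c0], %pad, %mask
      : memref<?x?xf32>, vector<[4]x[4]xf32>
  return %tile : vector<[4]x[4]xf32>
}
```

The padding has the element type of the memref (f32 here) and the mask has the shape of the result tile with element type i1, as described above.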

Traits: AttrSizedOperandSegments

Interfaces: ArmSMETileOpInterface

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| layout | ::mlir::arm_sme::TileSliceLayoutAttr | Layout of a tile slice. Enum cases: horizontal (Horizontal), vertical (Vertical) |

Operands: 

| Operand | Description |
| --- | --- |
| base | memref of any type values |
| indices | variadic of index |
| padding | any type |
| mask | vector of any type values |

Results: 

| Result | Description |
| --- | --- |
| result | a vector type that fits into a SME tile |

arm_sme.tile_store (arm_sme::TileStoreOp) 

Tile store operation

Syntax:

operation ::= `arm_sme.tile_store` $valueToStore `,` $base `[` $indices `]` (`,` $mask^)? (`layout` `` $layout^)? attr-dict `:` type($base) `,` type($valueToStore)

Stores a 2D SME “virtual tile” to memory defined by a base and indices, with the shape defined by the 2D scalable vector type of the tile being stored. An optional tile slice layout attribute specifies whether the slices of the tile being stored are horizontal (default) or vertical. The slice of memory must be contiguous. The memref must be either rank 1 or rank 2 with dynamic dimensions, since the operation is scalable, and the element type must be a scalar that matches the element type of the tile being stored.

An optional mask may be provided. Its shape corresponds to the tile, and it selects which elements of the tile are stored.

Example 1: Store an 8-bit element ZA tile with horizontal (default) layout to memory (ZA0.B).

arm_sme.tile_store %tile, %base[%c0, %c0] : memref<?x?xi8>, vector<[16]x[16]xi8>

Example 2: Store a FP 32-bit element ZA tile with vertical layout to memory.

arm_sme.tile_store %tile, %base[%c0, %c0] layout<vertical> : memref<?x?xf32>, vector<[4]x[4]xf32>

Example 3: Store a 128-bit element ZA tile with horizontal (default) layout to memory.

arm_sme.tile_store %tile, %base[%c0, %c0] layout<horizontal> : memref<?x?xi128>, vector<[1]x[1]xi128>

Example 4: Masked store of a FP 32-bit element ZA tile with vertical layout to memory.

arm_sme.tile_store %tile, %base[%c0, %c0], %mask layout<vertical> : memref<?x?xf32>, vector<[4]x[4]xf32>
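To show a masked store in context, the following is a minimal sketch that builds the mask with `vector.create_mask`. The function name and the `%rows` and `%cols` arguments are hypothetical, and the trailing type list follows the order given in the Syntax line above (memref type first, then the tile type). Taking the tile as a function argument is done only to keep the sketch short.

```mlir
// Hypothetical wrapper function; %rows and %cols are illustrative bounds.
func.func @masked_tile_store(%tile: vector<[4]x[4]xf32>,
                             %base: memref<?x?xf32>,
                             %rows: index, %cols: index) {
  %c0 = arith.constant 0 : index
  // Only the leading %rows x %cols elements of the tile are written.
  %mask = vector.create_mask %rows, %cols : vector<[4]x[4]xi1>
  arm_sme.tile_store %tile, %base[%c0, %c0], %mask
      : memref<?x?xf32>, vector<[4]x[4]xf32>
  return
}
```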

Traits: AttrSizedOperandSegments

Interfaces: ArmSMETileOpInterface

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| layout | ::mlir::arm_sme::TileSliceLayoutAttr | Layout of a tile slice. Enum cases: horizontal (Horizontal), vertical (Vertical) |

Operands: 

| Operand | Description |
| --- | --- |
| valueToStore | a vector type that fits into a SME tile |
| base | memref of any type values |
| indices | variadic of index |
| mask | vector of any type values |

arm_sme.umopa_2way (arm_sme::UMopa2WayOp) 

Unsigned integer sum of 2 outer products and accumulate

Syntax:

operation ::= `arm_sme.umopa_2way` $lhs `,` $rhs
              oilist(
              `acc` `` `(` $acc `)`
              | `masks` `` `(` $lhsMask `,` $rhsMask `)`
              ) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)

Example:

%result = arm_sme.umopa_2way %lhs, %rhs : vector<[8]xi16>, vector<[8]xi16> into vector<[4]x[4]xi32>

Refer to fmopa_2way for a detailed description of 2-way outer products.
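The optional `acc` operand allows results to be chained. The following is a minimal sketch, assuming an accumulator that starts from `arm_sme.zero` and a result written back with `arm_sme.tile_store`. The function name and argument names are hypothetical.

```mlir
// Hypothetical function; accumulates two 2-way outer products into one tile.
func.func @umopa_2way_chain(
    %a0: vector<[8]xi16>, %b0: vector<[8]xi16>,
    %a1: vector<[8]xi16>, %b1: vector<[8]xi16>,
    %dest: memref<?x?xi32>) {
  %c0 = arith.constant 0 : index
  // Start from a zero-initialized accumulator tile.
  %zero = arm_sme.zero : vector<[4]x[4]xi32>
  %acc0 = arm_sme.umopa_2way %a0, %b0 acc(%zero)
      : vector<[8]xi16>, vector<[8]xi16> into vector<[4]x[4]xi32>
  // Feed the previous result back in as the accumulator.
  %acc1 = arm_sme.umopa_2way %a1, %b1 acc(%acc0)
      : vector<[8]xi16>, vector<[8]xi16> into vector<[4]x[4]xi32>
  arm_sme.tile_store %acc1, %dest[%c0, %c0]
      : memref<?x?xi32>, vector<[4]x[4]xi32>
  return
}
```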

| Spec | Features |
| --- | --- |
| UMOPA (2-way) | +sme2 |

Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments

Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

| Operand | Description |
| --- | --- |
| lhs | rank-1 scalable vector of 16-bit signless integer values of length 8 |
| rhs | vector of any type values |
| lhsMask | vector of any type values |
| rhsMask | vector of any type values |
| acc | vector of any type values |

Results: 

| Result | Description |
| --- | --- |
| result | vector<[4]x[4]xi32> of 32-bit signless integer values |

arm_sme.umopa_4way (arm_sme::UMopa4WayOp) 

Unsigned integer sum of 4 outer products and accumulate

Syntax:

operation ::= `arm_sme.umopa_4way` $lhs `,` $rhs
              oilist(
              `acc` `` `(` $acc `)`
              | `masks` `` `(` $lhsMask `,` $rhsMask `)`
              ) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)

Example: I8 to I32

%result = arm_sme.umopa_4way %lhs, %rhs : vector<[16]xi8>, vector<[16]xi8> into vector<[4]x[4]xi32>

Example: I16 to I64

%result = arm_sme.umopa_4way %lhs, %rhs : vector<[8]xi16>, vector<[8]xi16> into vector<[2]x[2]xi64>

Refer to smopa_4way for a detailed description of 4-way outer products.
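The `masks` group can also be used without `acc`. The following is a minimal sketch for the I16 to I64 variant, assuming the masks are i1 vectors with the same shape as the inputs and are built with `vector.create_mask`. The function name and the `%lhsLen` and `%rhsLen` arguments are hypothetical.

```mlir
// Hypothetical function; %lhsLen and %rhsLen are illustrative bounds.
func.func @umopa_4way_i16_masked(
    %lhs: vector<[8]xi16>, %rhs: vector<[8]xi16>,
    %lhsLen: index, %rhsLen: index) -> vector<[2]x[2]xi64> {
  // One mask per input vector, each with the shape of that input.
  %lhsMask = vector.create_mask %lhsLen : vector<[8]xi1>
  %rhsMask = vector.create_mask %rhsLen : vector<[8]xi1>
  %result = arm_sme.umopa_4way %lhs, %rhs masks(%lhsMask, %rhsMask)
      : vector<[8]xi16>, vector<[8]xi16> into vector<[2]x[2]xi64>
  return %result : vector<[2]x[2]xi64>
}
```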

| Spec | Features |
| --- | --- |
| UMOPA (4-way) | +sme (32-bit), +sme-i16i64 (64-bit) |

Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments

Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

| Operand | Description |
| --- | --- |
| lhs | rank-1 scalable vector of 8-bit signless integer values of length 16 or rank-1 scalable vector of 16-bit signless integer values of length 8 |
| rhs | vector of any type values |
| lhsMask | vector of any type values |
| rhsMask | vector of any type values |
| acc | vector of any type values |

Results: 

| Result | Description |
| --- | --- |
| result | vector<[4]x[4]xi32> of 32-bit signless integer values or vector<[2]x[2]xi64> of 64-bit signless integer values |

arm_sme.umops_2way (arm_sme::UMops2WayOp) 

Unsigned integer sum of 2 outer products and subtract

Syntax:

operation ::= `arm_sme.umops_2way` $lhs `,` $rhs
              oilist(
              `acc` `` `(` $acc `)`
              | `masks` `` `(` $lhsMask `,` $rhsMask `)`
              ) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)

Example:

%result = arm_sme.umops_2way %lhs, %rhs : vector<[8]xi16>, vector<[8]xi16> into vector<[4]x[4]xi32>

Refer to fmopa_2way for a detailed description of 2-way outer products.

| Spec | Features |
| --- | --- |
| UMOPS (2-way) | +sme2 |

Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments

Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

| Operand | Description |
| --- | --- |
| lhs | rank-1 scalable vector of 16-bit signless integer values of length 8 |
| rhs | vector of any type values |
| lhsMask | vector of any type values |
| rhsMask | vector of any type values |
| acc | vector of any type values |

Results: 

| Result | Description |
| --- | --- |
| result | vector<[4]x[4]xi32> of 32-bit signless integer values |

arm_sme.umops_4way (arm_sme::UMops4WayOp) 

Unsigned integer sum of 4 outer products and subtract

Syntax:

operation ::= `arm_sme.umops_4way` $lhs `,` $rhs
              oilist(
              `acc` `` `(` $acc `)`
              | `masks` `` `(` $lhsMask `,` $rhsMask `)`
              ) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)

Example: I8 to I32

%result = arm_sme.umops_4way %lhs, %rhs : vector<[16]xi8>, vector<[16]xi8> into vector<[4]x[4]xi32>

Example: I16 to I64

%result = arm_sme.umops_4way %lhs, %rhs : vector<[8]xi16>, vector<[8]xi16> into vector<[2]x[2]xi64>

Refer to smopa_4way for a detailed description of 4-way outer products.

| Spec | Features |
| --- | --- |
| UMOPS (4-way) | +sme (32-bit), +sme-i16i64 (64-bit) |

Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments

Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

| Operand | Description |
| --- | --- |
| lhs | rank-1 scalable vector of 8-bit signless integer values of length 16 or rank-1 scalable vector of 16-bit signless integer values of length 8 |
| rhs | vector of any type values |
| lhsMask | vector of any type values |
| rhsMask | vector of any type values |
| acc | vector of any type values |

Results: 

| Result | Description |
| --- | --- |
| result | vector<[4]x[4]xi32> of 32-bit signless integer values or vector<[2]x[2]xi64> of 64-bit signless integer values |

arm_sme.usmopa_4way (arm_sme::UsMopa4WayOp) 

Unsigned by signed integer sum of 4 outer products and accumulate

Syntax:

operation ::= `arm_sme.usmopa_4way` $lhs `,` $rhs
              oilist(
              `acc` `` `(` $acc `)`
              | `masks` `` `(` $lhsMask `,` $rhsMask `)`
              ) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)

Example: I8 to I32

%result = arm_sme.usmopa_4way %lhs, %rhs : vector<[16]xi8>, vector<[16]xi8> into vector<[4]x[4]xi32>

Example: I16 to I64

%result = arm_sme.usmopa_4way %lhs, %rhs : vector<[8]xi16>, vector<[8]xi16> into vector<[2]x[2]xi64>

Refer to smopa_4way for a detailed description of 4-way outer products.

| Spec | Features |
| --- | --- |
| USMOPA (4-way) | +sme (32-bit), +sme-i16i64 (64-bit) |

Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments

Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

| Operand | Description |
| --- | --- |
| lhs | rank-1 scalable vector of 8-bit signless integer values of length 16 or rank-1 scalable vector of 16-bit signless integer values of length 8 |
| rhs | vector of any type values |
| lhsMask | vector of any type values |
| rhsMask | vector of any type values |
| acc | vector of any type values |

Results: 

| Result | Description |
| --- | --- |
| result | vector<[4]x[4]xi32> of 32-bit signless integer values or vector<[2]x[2]xi64> of 64-bit signless integer values |

arm_sme.usmops_4way (arm_sme::UsMops4WayOp) 

Unsigned by signed integer sum of 4 outer products and subtract

Syntax:

operation ::= `arm_sme.usmops_4way` $lhs `,` $rhs
              oilist(
              `acc` `` `(` $acc `)`
              | `masks` `` `(` $lhsMask `,` $rhsMask `)`
              ) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)

Example: I8 to I32

%result = arm_sme.usmops_4way %lhs, %rhs : vector<[16]xi8>, vector<[16]xi8> into vector<[4]x[4]xi32>

Example: I16 to I64

%result = arm_sme.usmops_4way %lhs, %rhs : vector<[8]xi16>, vector<[8]xi16> into vector<[2]x[2]xi64>

Refer to smopa_4way for a detailed description of 4-way outer products.

| Spec | Features |
| --- | --- |
| USMOPS (4-way) | +sme (32-bit), +sme-i16i64 (64-bit) |

Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments

Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

| Operand | Description |
| --- | --- |
| lhs | rank-1 scalable vector of 8-bit signless integer values of length 16 or rank-1 scalable vector of 16-bit signless integer values of length 8 |
| rhs | vector of any type values |
| lhsMask | vector of any type values |
| rhsMask | vector of any type values |
| acc | vector of any type values |

Results: 

| Result | Description |
| --- | --- |
| result | vector<[4]x[4]xi32> of 32-bit signless integer values or vector<[2]x[2]xi64> of 64-bit signless integer values |

arm_sme.zero (arm_sme::ZeroOp) 

Creates a zero-initialized value of SME virtual tile type

Syntax:

operation ::= `arm_sme.zero` attr-dict `:` type($res)

Creates a new SME “virtual tile” value within a function. The contents of the tile returned from this operation are zero-initialized.

Example 1: Zero an 8-bit element ZA tile.

%0 = arm_sme.zero : vector<[16]x[16]xi8>

Example 2: Zero a 64-bit element ZA tile.

%0 = arm_sme.zero : vector<[2]x[2]xi64>
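As a minimal sketch of how a zeroed tile might be consumed, the following function writes one to memory with `arm_sme.tile_store`. The function name and the `%dest` argument are hypothetical.

```mlir
// Hypothetical function; zero-fills a tile-sized region starting at [0, 0].
func.func @zero_and_store(%dest: memref<?x?xi32>) {
  %c0 = arith.constant 0 : index
  // The tile shape comes entirely from the result type.
  %tile = arm_sme.zero : vector<[4]x[4]xi32>
  arm_sme.tile_store %tile, %dest[%c0, %c0]
      : memref<?x?xi32>, vector<[4]x[4]xi32>
  return
}
```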

Traits: AlwaysSpeculatableImplTrait

Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

| Result | Description |
| --- | --- |
| res | a vector type that fits into a SME tile |

Operations for LLVM IR Intrinsics 


arm_sme.intr.cntsb (arm_sme::aarch64_sme_cntsb) 

Results: 

| Result | Description |
| --- | --- |
| res | LLVM dialect-compatible type |

arm_sme.intr.cntsd (arm_sme::aarch64_sme_cntsd) 

Results: 

| Result | Description |
| --- | --- |
| res | LLVM dialect-compatible type |

arm_sme.intr.cntsh (arm_sme::aarch64_sme_cntsh) 

Results: 

| Result | Description |
| --- | --- |
| res | LLVM dialect-compatible type |

arm_sme.intr.cntsw (arm_sme::aarch64_sme_cntsw) 

Results: 

| Result | Description |
| --- | --- |
| res | LLVM dialect-compatible type |

arm_sme.intr.ld1b.horiz (arm_sme::aarch64_sme_ld1b_horiz) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| predicate | a vector type that matches the size of a SVE predicate |
| load_address | LLVM pointer type |
| tile_slice_index | 32-bit signless integer |

arm_sme.intr.ld1b.vert (arm_sme::aarch64_sme_ld1b_vert) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| predicate | a vector type that matches the size of a SVE predicate |
| load_address | LLVM pointer type |
| tile_slice_index | 32-bit signless integer |

arm_sme.intr.ld1d.horiz (arm_sme::aarch64_sme_ld1d_horiz) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| predicate | a vector type that matches the size of a SVE predicate |
| load_address | LLVM pointer type |
| tile_slice_index | 32-bit signless integer |

arm_sme.intr.ld1d.vert (arm_sme::aarch64_sme_ld1d_vert) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| predicate | a vector type that matches the size of a SVE predicate |
| load_address | LLVM pointer type |
| tile_slice_index | 32-bit signless integer |

arm_sme.intr.ld1h.horiz (arm_sme::aarch64_sme_ld1h_horiz) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| predicate | a vector type that matches the size of a SVE predicate |
| load_address | LLVM pointer type |
| tile_slice_index | 32-bit signless integer |

arm_sme.intr.ld1h.vert (arm_sme::aarch64_sme_ld1h_vert) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| predicate | a vector type that matches the size of a SVE predicate |
| load_address | LLVM pointer type |
| tile_slice_index | 32-bit signless integer |

arm_sme.intr.ld1q.horiz (arm_sme::aarch64_sme_ld1q_horiz) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| predicate | a vector type that matches the size of a SVE predicate |
| load_address | LLVM pointer type |
| tile_slice_index | 32-bit signless integer |

arm_sme.intr.ld1q.vert (arm_sme::aarch64_sme_ld1q_vert) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| predicate | a vector type that matches the size of a SVE predicate |
| load_address | LLVM pointer type |
| tile_slice_index | 32-bit signless integer |

arm_sme.intr.ld1w.horiz (arm_sme::aarch64_sme_ld1w_horiz) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| predicate | a vector type that matches the size of a SVE predicate |
| load_address | LLVM pointer type |
| tile_slice_index | 32-bit signless integer |

arm_sme.intr.ld1w.vert (arm_sme::aarch64_sme_ld1w_vert) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| predicate | a vector type that matches the size of a SVE predicate |
| load_address | LLVM pointer type |
| tile_slice_index | 32-bit signless integer |

arm_sme.intr.mopa (arm_sme::aarch64_sme_mopa) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| lhs_vector | a vector type that is a supported input for the SME MOP instructions |
| rhs_vector | a vector type that is a supported input for the SME MOP instructions |

arm_sme.intr.mopa.wide (arm_sme::aarch64_sme_mopa_wide) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| lhs_vector | a vector type that is a supported input for the SME MOP instructions |
| rhs_vector | a vector type that is a supported input for the SME MOP instructions |

arm_sme.intr.mops (arm_sme::aarch64_sme_mops) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| lhs_vector | a vector type that is a supported input for the SME MOP instructions |
| rhs_vector | a vector type that is a supported input for the SME MOP instructions |

arm_sme.intr.mops.wide (arm_sme::aarch64_sme_mops_wide) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| lhs_vector | a vector type that is a supported input for the SME MOP instructions |
| rhs_vector | a vector type that is a supported input for the SME MOP instructions |

arm_sme.intr.read.horiz (arm_sme::aarch64_sme_read_horiz) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| vector | a vector type that matches the size of a SVE vector |
| predicate | a vector type that matches the size of a SVE predicate |
| tile_slice_index | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| res | LLVM dialect-compatible type |

arm_sme.intr.read.vert (arm_sme::aarch64_sme_read_vert) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| vector | a vector type that matches the size of a SVE vector |
| predicate | a vector type that matches the size of a SVE predicate |
| tile_slice_index | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| res | LLVM dialect-compatible type |

arm_sme.intr.smopa.wide (arm_sme::aarch64_sme_smopa_wide) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| lhs_vector | a vector type that is a supported input for the SME MOP instructions |
| rhs_vector | a vector type that is a supported input for the SME MOP instructions |

arm_sme.intr.smopa.za32 (arm_sme::aarch64_sme_smopa_za32) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| lhs_vector | a vector type that is a supported input for the SME MOP instructions |
| rhs_vector | a vector type that is a supported input for the SME MOP instructions |

arm_sme.intr.smops.wide (arm_sme::aarch64_sme_smops_wide) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| lhs_vector | a vector type that is a supported input for the SME MOP instructions |
| rhs_vector | a vector type that is a supported input for the SME MOP instructions |

arm_sme.intr.smops.za32 (arm_sme::aarch64_sme_smops_za32) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| lhs_vector | a vector type that is a supported input for the SME MOP instructions |
| rhs_vector | a vector type that is a supported input for the SME MOP instructions |

arm_sme.intr.st1b.horiz (arm_sme::aarch64_sme_st1b_horiz) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| predicate | a vector type that matches the size of a SVE predicate |
| store_address | LLVM pointer type |
| tile_slice_index | 32-bit signless integer |

arm_sme.intr.st1b.vert (arm_sme::aarch64_sme_st1b_vert) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| predicate | a vector type that matches the size of a SVE predicate |
| store_address | LLVM pointer type |
| tile_slice_index | 32-bit signless integer |

arm_sme.intr.st1d.horiz (arm_sme::aarch64_sme_st1d_horiz) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| predicate | a vector type that matches the size of a SVE predicate |
| store_address | LLVM pointer type |
| tile_slice_index | 32-bit signless integer |

arm_sme.intr.st1d.vert (arm_sme::aarch64_sme_st1d_vert) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| predicate | a vector type that matches the size of a SVE predicate |
| store_address | LLVM pointer type |
| tile_slice_index | 32-bit signless integer |

arm_sme.intr.st1h.horiz (arm_sme::aarch64_sme_st1h_horiz) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| predicate | a vector type that matches the size of a SVE predicate |
| store_address | LLVM pointer type |
| tile_slice_index | 32-bit signless integer |

arm_sme.intr.st1h.vert (arm_sme::aarch64_sme_st1h_vert) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| predicate | a vector type that matches the size of a SVE predicate |
| store_address | LLVM pointer type |
| tile_slice_index | 32-bit signless integer |

arm_sme.intr.st1q.horiz (arm_sme::aarch64_sme_st1q_horiz) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| predicate | a vector type that matches the size of a SVE predicate |
| store_address | LLVM pointer type |
| tile_slice_index | 32-bit signless integer |

arm_sme.intr.st1q.vert (arm_sme::aarch64_sme_st1q_vert) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| predicate | a vector type that matches the size of a SVE predicate |
| store_address | LLVM pointer type |
| tile_slice_index | 32-bit signless integer |

arm_sme.intr.st1w.horiz (arm_sme::aarch64_sme_st1w_horiz) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| predicate | a vector type that matches the size of a SVE predicate |
| store_address | LLVM pointer type |
| tile_slice_index | 32-bit signless integer |

arm_sme.intr.st1w.vert (arm_sme::aarch64_sme_st1w_vert) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| predicate | a vector type that matches the size of a SVE predicate |
| store_address | LLVM pointer type |
| tile_slice_index | 32-bit signless integer |

arm_sme.intr.str (arm_sme::aarch64_sme_str) 

Operands: 

| Operand | Description |
| --- | --- |
| index | 32-bit signless integer |
| store_address | LLVM pointer type |
| offset | 32-bit signless integer |

arm_sme.intr.sumopa.wide (arm_sme::aarch64_sme_sumopa_wide) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| lhs_vector | a vector type that is a supported input for the SME MOP instructions |
| rhs_vector | a vector type that is a supported input for the SME MOP instructions |

arm_sme.intr.sumops.wide (arm_sme::aarch64_sme_sumops_wide) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| lhs_vector | a vector type that is a supported input for the SME MOP instructions |
| rhs_vector | a vector type that is a supported input for the SME MOP instructions |

arm_sme.intr.umopa.wide (arm_sme::aarch64_sme_umopa_wide) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| lhs_vector | a vector type that is a supported input for the SME MOP instructions |
| rhs_vector | a vector type that is a supported input for the SME MOP instructions |

arm_sme.intr.umopa.za32 (arm_sme::aarch64_sme_umopa_za32) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| lhs_vector | a vector type that is a supported input for the SME MOP instructions |
| rhs_vector | a vector type that is a supported input for the SME MOP instructions |

arm_sme.intr.umops.wide (arm_sme::aarch64_sme_umops_wide) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| lhs_vector | a vector type that is a supported input for the SME MOP instructions |
| rhs_vector | a vector type that is a supported input for the SME MOP instructions |

arm_sme.intr.umops.za32 (arm_sme::aarch64_sme_umops_za32) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| lhs_vector | a vector type that is a supported input for the SME MOP instructions |
| rhs_vector | a vector type that is a supported input for the SME MOP instructions |

arm_sme.intr.usmopa.wide (arm_sme::aarch64_sme_usmopa_wide) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| lhs_vector | a vector type that is a supported input for the SME MOP instructions |
| rhs_vector | a vector type that is a supported input for the SME MOP instructions |

arm_sme.intr.usmops.wide (arm_sme::aarch64_sme_usmops_wide) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
| lhs_vector | a vector type that is a supported input for the SME MOP instructions |
| rhs_vector | a vector type that is a supported input for the SME MOP instructions |

arm_sme.intr.write.horiz (arm_sme::aarch64_sme_write_horiz) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| tile_slice_index | 32-bit signless integer |
| predicate | a vector type that matches the size of a SVE predicate |
| vector | a vector type that matches the size of a SVE vector |

arm_sme.intr.write.vert (arm_sme::aarch64_sme_write_vert) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| tile_slice_index | 32-bit signless integer |
| predicate | a vector type that matches the size of a SVE predicate |
| vector | a vector type that matches the size of a SVE vector |

arm_sme.intr.zero (arm_sme::aarch64_sme_zero) 

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| tile_mask | ::mlir::IntegerAttr | 32-bit signless integer attribute |