'ArmSME' Dialect
Basic dialect to target Arm SME.
This dialect defines custom and LLVM IR intrinsic operations that are used to target the Arm Scalable Matrix Extension (SME). Through the available conversion and ArmSME passes you can, for example, lower a linalg.matmul operation to Arm SME FMOPA (floating-point outer product) operations. See the in-tree end-to-end integration tests for reference.
To run the ArmSME integration tests, include these flags in the CMake invocation when configuring LLVM and MLIR:
-DMLIR_INCLUDE_INTEGRATION_TESTS=On
-DMLIR_RUN_ARM_SME_TESTS=On
-DARM_EMULATOR_EXECUTABLE=<path-to-emulator>
These tests are run “post-commit” by the clang-aarch64-sve-vla LLVM BuildBot worker.
Operations
arm_sme.copy_tile (arm_sme::CopyTileOp)
Copies an SME tile value
Syntax:
operation ::= `arm_sme.copy_tile` $tile attr-dict `:` type($result)
Copies an SME “virtual tile” value to a new SSA value. This operation is primarily intended to be used to normalize the IR prior to tile allocation.
Example:
%copy = arm_sme.copy_tile %tile : vector<[4]x[4]xf32>
Traits: AlwaysSpeculatableImplTrait
Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands:
Operand | Description |
---|---|
tile | a vector type that fits into an SME tile |
Results:
Result | Description |
---|---|
result | a vector type that fits into an SME tile |
arm_sme.extract_tile_slice (arm_sme::ExtractTileSliceOp)
Extract 1-D scalable vector from slice of 2-D tile
Syntax:
operation ::= `arm_sme.extract_tile_slice` $tile `[` $tile_slice_index `]` (`layout` `` $layout^)? attr-dict
`:` type($result) `from` type($tile)
Extracts a 1-D scalable slice from a 2-D scalable tile at the given index. A tile slice is a 1-D vector of horizontally or vertically contiguous elements within a ZA tile.
An optional tile slice layout attribute specifies whether the tile slice is horizontal (default) or vertical.
Example 1: Extract vector<[16]xi8> from tile horizontally at the given index.
%slice = arm_sme.extract_tile_slice %tile[%tile_slice_index] : vector<[16]xi8> from vector<[16]x[16]xi8>
Example 2: Extract vector<[2]xf64> from tile vertically at the given index.
%slice = arm_sme.extract_tile_slice %tile[%tile_slice_index] layout<vertical> : vector<[2]xf64> from vector<[2]x[2]xf64>
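The slice semantics described above can be sketched as a small Python reference model. This is only an illustration, not part of the dialect: the function name and the list-of-rows tile representation are assumptions, and the tile is shown at a fixed size (vscale = 1).

```python
# Reference model only: a tile is a list of rows; names are hypothetical.
def extract_tile_slice(tile, index, layout="horizontal"):
    """Return the 1-D slice of a square 2-D tile at `index`."""
    if layout == "horizontal":
        return list(tile[index])             # one row of the tile
    if layout == "vertical":
        return [row[index] for row in tile]  # one column of the tile
    raise ValueError(f"unknown layout: {layout}")

# A 4x4 tile, e.g. vector<[4]x[4]xi32> at vscale = 1.
tile = [[10 * r + c for c in range(4)] for r in range(4)]
row1 = extract_tile_slice(tile, 1)              # [10, 11, 12, 13]
col1 = extract_tile_slice(tile, 1, "vertical")  # [1, 11, 21, 31]
```

A horizontal slice is a row of the tile and a vertical slice is a column; both have the same length because the tile is square.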
Traits: AlwaysSpeculatableImplTrait
Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes:
Attribute | MLIR Type | Description |
---|---|---|
layout | ::mlir::arm_sme::TileSliceLayoutAttr | Layout of a tile slice. Enum cases: horizontal, vertical |
Operands:
Operand | Description |
---|---|
tile | a vector type that fits into an SME tile |
tile_slice_index | index |
Results:
Result | Description |
---|---|
result | a vector type that matches the size of an SVE vector |
arm_sme.fmopa_2way (arm_sme::FMopa2WayOp)
Floating-point sum of 2 outer products and accumulate
Syntax:
operation ::= `arm_sme.fmopa_2way` $lhs `,` $rhs
oilist(
`acc` `` `(` $acc `)`
| `masks` `` `(` $lhsMask `,` $rhsMask `)`
) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)
This operation represents a sum of 2 widened outer products. It takes two 1-D scalable vectors as input and produces a 2-D scalable vector (ZA tile) as output.
For example (fp16 to fp32):
%result = arm_sme.fmopa_2way %lhs, %rhs :
vector<[8]xf16>, vector<[8]xf16> into vector<[4]x[4]xf32>
The lhs encodes a matrix of shape SVLSx2 and the rhs a matrix of 2xSVLS, where SVLS (spec [1], section B2.1) is the number of 32-bit elements in a vector of SVL bits. To illustrate, below is a breakdown of this operation for fp16 to fp32, SVL=128 (i.e., vscale=1):
           LHS                         RHS
[A0 A1 A2 A3 A4 A5 A6 A7]   [B0 B1 B2 B3 B4 B5 B6 B7]
----------------------------------------------------------------------------
                    implicit layout
          [A0 A1]  |
          [A2 A3]  |  [B0 B2 B4 B6]
          [A4 A5]  |  [B1 B3 B5 B7]
          [A6 A7]  |
----------------------------------------------------------------------------
                    2 outer products
       Acol0 ⊗ Brow0        |        Acol1 ⊗ Brow1
       -------------        |        -------------
                            |
       [B0 B2 B4 B6]        |        [B1 B3 B5 B7]
                            |
 [A0  [A0B0 A0B2 A0B4 A0B6] |  [A1  [A1B1 A1B3 A1B5 A1B7]
  A2  [A2B0 A2B2 A2B4 A2B6] |   A3  [A3B1 A3B3 A3B5 A3B7]
  A4  [A4B0 A4B2 A4B4 A4B6] |   A5  [A5B1 A5B3 A5B5 A5B7]
  A6] [A6B0 A6B2 A6B4 A6B6] |   A7] [A7B1 A7B3 A7B5 A7B7]
                            |
----------------------------------------------------------------------------
                    sum of 2 outer products
               Acol0 ⊗ Brow0 + Acol1 ⊗ Brow1
     [A0B0 + A1B1  A0B2 + A1B3  A0B4 + A1B5  A0B6 + A1B7]
     [A2B0 + A3B1  A2B2 + A3B3  A2B4 + A3B5  A2B6 + A3B7]
     [A4B0 + A5B1  A4B2 + A5B3  A4B4 + A5B5  A4B6 + A5B7]
     [A6B0 + A7B1  A6B2 + A7B3  A6B4 + A7B5  A6B6 + A7B7]
----------------------------------------------------------------------------
This operation enables the folding of 2 outer products chained via the accumulator into a single outer product.
For example:
%a0_ext = arith.extf %a0 : vector<[4]xf16> to vector<[4]xf32>
%b0_ext = arith.extf %b0 : vector<[4]xf16> to vector<[4]xf32>
%a1_ext = arith.extf %a1 : vector<[4]xf16> to vector<[4]xf32>
%b1_ext = arith.extf %b1 : vector<[4]xf16> to vector<[4]xf32>
%0 = arm_sme.outerproduct %a0_ext, %b0_ext : vector<[4]xf32>, vector<[4]xf32>
%1 = arm_sme.outerproduct %a1_ext, %b1_ext acc(%0) : vector<[4]xf32>, vector<[4]xf32>
The 2 outer products in the example above can be fused into a single outer product as follows:
%a_packed = vector.interleave %a0, %a1 : vector<[4]xf16> -> vector<[8]xf16>
%b_packed = vector.interleave %b0, %b1 : vector<[4]xf16> -> vector<[8]xf16>
%0 = arm_sme.fmopa_2way %a_packed, %b_packed : vector<[8]xf16>, vector<[8]xf16> into vector<[4]x[4]xf32>
This is implemented in the -arm-sme-outer-product-fusion pass.
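The equivalence between the chained and the fused form can be checked with a small Python reference model. This is a sketch under stated assumptions: the helper names are hypothetical, vectors are plain lists at vscale = 1, and the fp16-to-fp32 widening is not modelled (Python floats stand in for both types).

```python
# Reference model only; vectors are plain Python lists at vscale = 1.

def interleave(a, b):
    """Model of vector.interleave: [a0, b0, a1, b1, ...]."""
    return [x for pair in zip(a, b) for x in pair]

def outerproduct(lhs, rhs, acc=None):
    """Model of arm_sme.outerproduct; a missing accumulator means zeros."""
    if acc is None:
        acc = [[0.0] * len(rhs) for _ in lhs]
    return [[acc[i][j] + lhs[i] * rhs[j] for j in range(len(rhs))]
            for i in range(len(lhs))]

def fmopa_2way(lhs, rhs, acc=None):
    """Sum of 2 outer products over packed operands (length 2 * n each)."""
    n = len(lhs) // 2
    if acc is None:
        acc = [[0.0] * n for _ in range(n)]
    return [[acc[i][j]
             + lhs[2 * i] * rhs[2 * j]
             + lhs[2 * i + 1] * rhs[2 * j + 1]
             for j in range(n)] for i in range(n)]

a0, a1 = [1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]
b0, b1 = [0.5, 1.5, 2.5, 3.5], [4.0, 3.0, 2.0, 1.0]

# Two outer products chained via the accumulator ...
chained = outerproduct(a1, b1, acc=outerproduct(a0, b0))
# ... and the fused form over interleaved (packed) operands.
fused = fmopa_2way(interleave(a0, a1), interleave(b0, b1))
```

Interleaving places a0[i] and a1[i] next to each other, which matches the implicit SVLSx2 layout of the lhs shown in the breakdown above.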
Example: FP16 to FP32
%result = arm_sme.fmopa_2way %lhs, %rhs : vector<[8]xf16>, vector<[8]xf16> into vector<[4]x[4]xf32>
Example: BF16 to FP32
%result = arm_sme.fmopa_2way %lhs, %rhs : vector<[8]xbf16>, vector<[8]xbf16> into vector<[4]x[4]xf32>
Spec | Features |
---|---|
FMOPA (widening, 2-way, FP16 to FP32) | +sme |
BFMOPA (widening, 2-way, BF16 to FP32) | +sme |
[1] https://developer.arm.com/documentation/ddi0616
Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments
Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands:
Operand | Description |
---|---|
lhs | rank-1 scalable vector of 16-bit float or bfloat16 type values of length 8 |
rhs | vector of any type values |
lhsMask | vector of any type values |
rhsMask | vector of any type values |
acc | vector of any type values |
Results:
Result | Description |
---|---|
result | vector<[4]x[4]xf32> of 32-bit float values |
arm_sme.fmops_2way (arm_sme::FMops2WayOp)
Floating-point sum of 2 outer products and subtract
Syntax:
operation ::= `arm_sme.fmops_2way` $lhs `,` $rhs
oilist(
`acc` `` `(` $acc `)`
| `masks` `` `(` $lhsMask `,` $rhsMask `)`
) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)
Equivalent to fmopa_2way, but the outer products are subtracted from the destination result.
Example: FP16 to FP32
%result = arm_sme.fmops_2way %lhs, %rhs : vector<[8]xf16>, vector<[8]xf16> into vector<[4]x[4]xf32>
Example: BF16 to FP32
%result = arm_sme.fmops_2way %lhs, %rhs : vector<[8]xbf16>, vector<[8]xbf16> into vector<[4]x[4]xf32>
Refer to fmopa_2way for a detailed description of 2-way outer products.
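As a rough illustration of the subtracting form, the sketch below models fmops_2way in Python. The function name and list representation are assumptions made for this example, the widening step is not modelled, and a missing accumulator is assumed to behave like a zeroed tile (as stated for arm_sme.outerproduct).

```python
# Reference model only; not the dialect's implementation.
def fmops_2way(lhs, rhs, acc=None):
    """Subtract the sum of 2 outer products from the accumulator tile."""
    n = len(lhs) // 2
    if acc is None:
        acc = [[0.0] * n for _ in range(n)]  # assumed: no acc means zeros
    return [[acc[i][j]
             - (lhs[2 * i] * rhs[2 * j] + lhs[2 * i + 1] * rhs[2 * j + 1])
             for j in range(n)] for i in range(n)]

lhs = [1.0, 2.0] * 4                      # packed operand of length 8
rhs = [3.0, 4.0] * 4
acc = [[100.0] * 4 for _ in range(4)]
with_acc = fmops_2way(lhs, rhs, acc)      # every element: 100 - (1*3 + 2*4)
without_acc = fmops_2way(lhs, rhs)        # every element: 0 - 11
```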
Spec | Features |
---|---|
FMOPS (widening, 2-way, FP16 to FP32) | +sme |
BFMOPS (widening, 2-way, BF16 to FP32) | +sme |
Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments
Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands:
Operand | Description |
---|---|
lhs | rank-1 scalable vector of 16-bit float or bfloat16 type values of length 8 |
rhs | vector of any type values |
lhsMask | vector of any type values |
rhsMask | vector of any type values |
acc | vector of any type values |
Results:
Result | Description |
---|---|
result | vector<[4]x[4]xf32> of 32-bit float values |
arm_sme.get_tile (arm_sme::GetTileOp)
Creates an undefined value of SME virtual tile type
Syntax:
operation ::= `arm_sme.get_tile` attr-dict `:` type($tile)
Creates a new SME “virtual tile” value within a function. The contents of the tile returned from this operation are undefined.
Example 1:
// Create an 8-bit element "virtual tile" value:
%za0_b = arm_sme.get_tile : vector<[16]x[16]xi8>
Example 2:
// Create two 16-bit element "virtual tile" values:
%za0_h = arm_sme.get_tile : vector<[8]x[8]xi16>
%za1_h = arm_sme.get_tile : vector<[8]x[8]xi16>
Example 3:
// Create a 128-bit element "virtual tile" value:
%za0_q = arm_sme.get_tile : vector<[1]x[1]xi128>
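The tile types used in these examples follow a simple pattern: the scalable dimension is 128 divided by the element width. A minimal Python sketch of that relationship, with a hypothetical helper name and vscale passed explicitly:

```python
# Reference model only; `tile_shape` is a hypothetical helper.
def tile_shape(element_bits, vscale=1):
    """Rows and columns of a square SME virtual tile for an element width."""
    if element_bits not in (8, 16, 32, 64, 128):
        raise ValueError(f"unsupported element width: {element_bits}")
    n = (128 // element_bits) * vscale
    return (n, n)

shape_b = tile_shape(8)     # vector<[16]x[16]xi8> at vscale = 1
shape_h = tile_shape(16)    # vector<[8]x[8]xi16>  at vscale = 1
shape_q = tile_shape(128)   # vector<[1]x[1]xi128> at vscale = 1
```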
Traits: AlwaysSpeculatableImplTrait
Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results:
Result | Description |
---|---|
tile | a vector type that fits into an SME tile |
arm_sme.insert_tile_slice (arm_sme::InsertTileSliceOp)
Insert 1-D scalable vector into slice of 2-D tile
Syntax:
operation ::= `arm_sme.insert_tile_slice` $vector `,` $tile `[` $tile_slice_index `]` (`layout` `` $layout^)?
attr-dict `:` type($vector) `into` type($result)
Inserts a 1-D scalable vector into a slice of a 2-D scalable vector tile at the given index. The type of the 1-D scalable vector to be inserted must match the type of the tile slice. A tile slice is a 1-D vector of horizontally or vertically contiguous elements within a ZA tile. The updated tile is returned as the result.
An optional tile slice layout attribute specifies whether the tile slice is horizontal (default) or vertical.
Example 1: Insert vector<[16]xi8> into tile horizontally at the given index.
%tile_update = arm_sme.insert_tile_slice %vector, %tile[%tile_slice_index] : vector<[16]xi8> into vector<[16]x[16]xi8>
Example 2: Insert vector<[2]xf64> into tile vertically at the given index.
%tile_update = arm_sme.insert_tile_slice %vector, %tile[%tile_slice_index] layout<vertical> : vector<[2]xf64> into vector<[2]x[2]xf64>
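The value semantics of the insertion (the updated tile is a new result, the input tile value is untouched) can be sketched in Python. The function name and list-of-rows representation are assumptions made for illustration only.

```python
# Reference model only: returns a new tile, mirroring SSA value semantics.
def insert_tile_slice(vector, tile, index, layout="horizontal"):
    """Return a copy of `tile` with the slice at `index` replaced."""
    n = len(tile)
    if len(vector) != n:
        raise ValueError("vector length must match the tile slice length")
    result = [list(row) for row in tile]   # the input tile is not modified
    if layout == "horizontal":
        result[index] = list(vector)       # replace one row
    elif layout == "vertical":
        for r in range(n):
            result[r][index] = vector[r]   # replace one column
    else:
        raise ValueError(f"unknown layout: {layout}")
    return result

tile = [[0, 0], [0, 0]]
updated_h = insert_tile_slice([1, 2], tile, 0)              # [[1, 2], [0, 0]]
updated_v = insert_tile_slice([1, 2], tile, 1, "vertical")  # [[0, 1], [0, 2]]
```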
Traits: AlwaysSpeculatableImplTrait
Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes:
Attribute | MLIR Type | Description |
---|---|---|
layout | ::mlir::arm_sme::TileSliceLayoutAttr | Layout of a tile slice. Enum cases: horizontal, vertical |
Operands:
Operand | Description |
---|---|
vector | a vector type that matches the size of an SVE vector |
tile | a vector type that fits into an SME tile |
tile_slice_index | index |
Results:
Result | Description |
---|---|
result | a vector type that fits into an SME tile |
arm_sme.load_tile_slice (arm_sme::LoadTileSliceOp)
Tile slice load and update operation
Syntax:
operation ::= `arm_sme.load_tile_slice` $base `[` $indices `]` `,` $mask `,` $tile `,` $tile_slice_index
(`layout` `` $layout^)? attr-dict `:` type($base) `,` type($mask) `,`
type($result)
Loads a 1D tile slice from memory into a 2D SME “virtual tile”. The tile slice is defined by the dimension of the 2D scalable vector type pointed to by the index. A tile slice index describes where in the input tile the tile slice is loaded to. An optional tile slice layout attribute specifies whether the tile slice being loaded at the given index is horizontal (default) or vertical. The updated tile is returned as the result.
The slice of memory read is defined by a base and indices and must be contiguous. The memref must be either rank 1 or rank 2, have dynamic dimensions since the operation is scalable, and the element type must be a scalar that matches the element type of the result.
The provided mask is used to specify which elements of the tile slice will be loaded.
Example 1: Load a vector<[16]xi8> tile slice from memory into tile horizontally (default) at given index.
%tile_update = arm_sme.load_tile_slice %base[%c0], %mask, %tile, %tile_slice_index : memref<?x?xi8>, vector<[16]xi1>, vector<[16]x[16]xi8>
Example 2: Load a vector<[4]xf32> tile slice from memory into tile vertically at given index.
%tile_update = arm_sme.load_tile_slice %base[%c0], %mask, %tile, %tile_slice_index layout<vertical> : memref<?x?xf32>, vector<[4]xi1>, vector<[4]x[4]xf32>
Example 3: Load a vector<[1]xi128> tile slice from memory into tile vertically at given index.
%tile_update = arm_sme.load_tile_slice %base[%c0], %mask, %tile, %tile_slice_index layout<vertical> : memref<?x?xi128>, vector<[1]xi1>, vector<[1]x[1]xi128>
Interfaces: ArmSMETileOpInterface, InferTypeOpInterface
Attributes:
Attribute | MLIR Type | Description |
---|---|---|
layout | ::mlir::arm_sme::TileSliceLayoutAttr | Layout of a tile slice. Enum cases: horizontal, vertical |
Operands:
Operand | Description |
---|---|
base | memref of any type values |
mask | a vector type that matches the size of an SVE predicate |
tile | a vector type that fits into an SME tile |
indices | variadic of index |
tile_slice_index | index |
Results:
Result | Description |
---|---|
result | a vector type that fits into an SME tile |
arm_sme.outerproduct (arm_sme::OuterProductOp)
Outer product with optional fused add/sub
Syntax:
operation ::= `arm_sme.outerproduct` $lhs `,` $rhs
oilist(
`kind` `` $kind
| `acc` `` `(` $acc `)`
| `masks` `` `(` $lhsMask `,` $rhsMask `)`
) attr-dict `:` type($lhs) `,` type($rhs)
This operation represents an outer product that fits within an SME tile.
All operands must be SVE vectors and the result an SME tile. Unlike vector.outerproduct, masking is on the operands (rather than the result), which mirrors the SME instructions.
Example 1: Unmasked outerproduct (without accumulator)
// Not specifying an accumulator implicitly zeros the destination tile.
%result = arm_sme.outerproduct %lhs, %rhs : vector<[4]xf32>, vector<[4]xf32>
Example 2: Unmasked outerproduct (with accumulator)
%result = arm_sme.outerproduct %lhs, %rhs acc(%accumulator)
: vector<[4]xf32>, vector<[4]xf32>
Example 3: Masked outerproduct
%result = arm_sme.outerproduct %lhs, %rhs masks(%lhsMask, %rhsMask)
: vector<[4]xf32>, vector<[4]xf32>
Example 4: Masked outerproduct (with accumulator)
%result = arm_sme.outerproduct %lhs, %rhs acc(%accumulator) masks(%lhsMask, %rhsMask)
: vector<[4]xf32>, vector<[4]xf32>
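The interaction of operand masks, accumulator and combining kind can be sketched with a Python reference model. This is an illustration under assumptions, not the dialect's definition: the function and parameter names are hypothetical, and it is assumed that an element whose row or column is masked off simply keeps its accumulator value.

```python
# Reference model only; masking is applied to the operands, not the result.
def outerproduct(lhs, rhs, acc=None, lhs_mask=None, rhs_mask=None, kind="add"):
    rows, cols = len(lhs), len(rhs)
    if acc is None:
        acc = [[0.0] * cols for _ in range(rows)]   # no acc: zeroed tile
    if lhs_mask is None:
        lhs_mask = [True] * rows
    if rhs_mask is None:
        rhs_mask = [True] * cols
    sign = {"add": 1.0, "sub": -1.0}[kind]
    return [[acc[i][j] + sign * lhs[i] * rhs[j]
             if lhs_mask[i] and rhs_mask[j]
             else acc[i][j]                          # assumed: passes through
             for j in range(cols)] for i in range(rows)]

lhs, rhs = [1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]
ones = [[1.0] * 4 for _ in range(4)]
masked = outerproduct(lhs, rhs, acc=ones,
                      lhs_mask=[True, True, False, False],
                      rhs_mask=[True, False, True, True])
subtracted = outerproduct(lhs, rhs, acc=ones, kind="sub")
```

Because the masks select whole rows (lhsMask) and whole columns (rhsMask), the active region of the result is always a rectangular sub-block of the tile.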
Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments
Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes:
Attribute | MLIR Type | Description |
---|---|---|
kind | ::mlir::arm_sme::CombiningKindAttr | Kind of combining function. Enum cases: add, sub |
Operands:
Operand | Description |
---|---|
lhs | a vector type that matches the size of an SVE vector |
rhs | a vector type that matches the size of an SVE vector |
lhsMask | a vector type that matches the size of an SVE predicate |
rhsMask | a vector type that matches the size of an SVE predicate |
acc | a vector type that fits into an SME tile |
Results:
Result | Description |
---|---|
result | a vector type that fits into an SME tile |
arm_sme.smopa_2way (arm_sme::SMopa2WayOp)
Signed integer sum of 2 outer products and accumulate
Syntax:
operation ::= `arm_sme.smopa_2way` $lhs `,` $rhs
oilist(
`acc` `` `(` $acc `)`
| `masks` `` `(` $lhsMask `,` $rhsMask `)`
) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)
Example:
%result = arm_sme.smopa_2way %lhs, %rhs : vector<[8]xi16>, vector<[8]xi16> into vector<[4]x[4]xi32>
Refer to fmopa_2way for a detailed description of 2-way outer products.
Spec | Features |
---|---|
SMOPA (2-way) | +sme2 |
Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments
Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands:
Operand | Description |
---|---|
lhs | rank-1 scalable vector of 16-bit signless integer values of length 8 |
rhs | vector of any type values |
lhsMask | vector of any type values |
rhsMask | vector of any type values |
acc | vector of any type values |
Results:
Result | Description |
---|---|
result | vector<[4]x[4]xi32> of 32-bit signless integer values |
arm_sme.smopa_4way (arm_sme::SMopa4WayOp)
Signed integer sum of 4 outer products and accumulate
Syntax:
operation ::= `arm_sme.smopa_4way` $lhs `,` $rhs
oilist(
`acc` `` `(` $acc `)`
| `masks` `` `(` $lhsMask `,` $rhsMask `)`
) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)
This operation represents a sum of 4 widened outer products. It takes two 1-D scalable vectors as input and produces a 2-D scalable vector (ZA tile) as output.
For example (i8 to i32):
%result = arm_sme.smopa_4way %lhs, %rhs :
vector<[16]xi8>, vector<[16]xi8> into vector<[4]x[4]xi32>
The lhs encodes a matrix of shape SVLSx4 and the rhs a matrix of 4xSVLS, where SVLS (spec [1], section B2.1) is the number of 32-bit elements in a vector of SVL bits. To illustrate, below is a breakdown of this operation for i8 to i32, SVL=128 (i.e., vscale=1):
LHS
[A0 A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15]
RHS
[B0 B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 B11 B12 B13 B14 B15]
----------------------------------------------------------------------------
                       implicit layout
           [ A0  A1  A2  A3]  |  [B0 B4  B8 B12]
           [ A4  A5  A6  A7]  |  [B1 B5  B9 B13]
           [ A8  A9 A10 A11]  |  [B2 B6 B10 B14]
           [A12 A13 A14 A15]  |  [B3 B7 B11 B15]
----------------------------------------------------------------------------
                       4 outer products
          Acol0 ⊗ Brow0           |           Acol1 ⊗ Brow1
          -------------           |           -------------
                                  |
      [B0 B4 B8 B12]              |       [B1 B5 B9 B13]
                                  |
[A0   [ A0B0  A0B4  A0B8  A0B12]  | [A1   [ A1B1  A1B5  A1B9  A1B13]
 A4   [ A4B0  A4B4  A4B8  A4B12]  |  A5   [ A5B1  A5B5  A5B9  A5B13]
 A8   [ A8B0  A8B4  A8B8  A8B12]  |  A9   [ A9B1  A9B5  A9B9  A9B13]
 A12] [A12B0 A12B4 A12B8 A12B12]  |  A13] [A13B1 A13B5 A13B9 A13B13]
                                  |
          Acol2 ⊗ Brow2           |           Acol3 ⊗ Brow3
          -------------           |           -------------
                                  |
      [B2 B6 B10 B14]             |       [B3 B7 B11 B15]
                                  |
[A2   [ A2B2  A2B6  A2B10  A2B14] | [A3   [ A3B3  A3B7  A3B11  A3B15]
 A6   [ A6B2  A6B6  A6B10  A6B14] |  A7   [ A7B3  A7B7  A7B11  A7B15]
 A10  [A10B2 A10B6 A10B10 A10B14] |  A11  [A11B3 A11B7 A11B11 A11B15]
 A14] [A14B2 A14B6 A14B10 A14B14] |  A15] [A15B3 A15B7 A15B11 A15B15]
                                  |
----------------------------------------------------------------------------
                       sum of 4 outer products
       Acol0 ⊗ Brow0 + Acol1 ⊗ Brow1 + Acol2 ⊗ Brow2 + Acol3 ⊗ Brow3
[ A0B0 +  A1B1 +  A2B2 +  A3B3 ... ...  A0B12 +  A1B13 +  A2B14 +  A3B15]
[ A4B0 +  A5B1 +  A6B2 +  A7B3 ... ...  A4B12 +  A5B13 +  A6B14 +  A7B15]
[ A8B0 +  A9B1 + A10B2 + A11B3 ... ...  A8B12 +  A9B13 + A10B14 + A11B15]
[A12B0 + A13B1 + A14B2 + A15B3 ... ... A12B12 + A13B13 + A14B14 + A15B15]
----------------------------------------------------------------------------
This operation enables the folding of 4 outer products chained via the accumulator into a single outer product.
For example:
%a0_ext = arith.extsi %a0 : vector<[4]xi8> to vector<[4]xi32>
%b0_ext = arith.extsi %b0 : vector<[4]xi8> to vector<[4]xi32>
%a1_ext = arith.extsi %a1 : vector<[4]xi8> to vector<[4]xi32>
%b1_ext = arith.extsi %b1 : vector<[4]xi8> to vector<[4]xi32>
%a2_ext = arith.extsi %a2 : vector<[4]xi8> to vector<[4]xi32>
%b2_ext = arith.extsi %b2 : vector<[4]xi8> to vector<[4]xi32>
%a3_ext = arith.extsi %a3 : vector<[4]xi8> to vector<[4]xi32>
%b3_ext = arith.extsi %b3 : vector<[4]xi8> to vector<[4]xi32>
%0 = arm_sme.outerproduct %a0_ext, %b0_ext : vector<[4]xi32>, vector<[4]xi32>
%1 = arm_sme.outerproduct %a1_ext, %b1_ext acc(%0) : vector<[4]xi32>, vector<[4]xi32>
%2 = arm_sme.outerproduct %a2_ext, %b2_ext acc(%1) : vector<[4]xi32>, vector<[4]xi32>
%3 = arm_sme.outerproduct %a3_ext, %b3_ext acc(%2) : vector<[4]xi32>, vector<[4]xi32>
The 4 outer products in the example above can be fused into a single outer product as follows:
%lhs0 = vector.interleave %a0, %a2 : vector<[4]xi8> -> vector<[8]xi8>
%lhs1 = vector.interleave %a1, %a3 : vector<[4]xi8> -> vector<[8]xi8>
%lhs = vector.interleave %lhs0, %lhs1 : vector<[8]xi8> -> vector<[16]xi8>
%rhs0 = vector.interleave %b0, %b2 : vector<[4]xi8> -> vector<[8]xi8>
%rhs1 = vector.interleave %b1, %b3 : vector<[4]xi8> -> vector<[8]xi8>
%rhs = vector.interleave %rhs0, %rhs1 : vector<[8]xi8> -> vector<[16]xi8>
%0 = arm_sme.smopa_4way %lhs, %rhs : vector<[16]xi8>, vector<[16]xi8> into vector<[4]x[4]xi32>
This is implemented in the -arm-sme-outer-product-fusion pass.
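The two-level interleave used for packing, and the equivalence with four chained outer products, can be checked with a Python reference model. This is a sketch under assumptions: helper names are hypothetical, vectors are plain lists at vscale = 1, and Python integers stand in for i8 and i32 (no wrap-around is modelled).

```python
# Reference model only; not the pass implementation.

def interleave(a, b):
    """Model of vector.interleave: [a0, b0, a1, b1, ...]."""
    return [x for pair in zip(a, b) for x in pair]

def outerproduct(lhs, rhs, acc=None):
    if acc is None:
        acc = [[0] * len(rhs) for _ in lhs]
    return [[acc[i][j] + lhs[i] * rhs[j] for j in range(len(rhs))]
            for i in range(len(lhs))]

def smopa_4way(lhs, rhs, acc=None):
    """Sum of 4 outer products over packed operands (length 4 * n each)."""
    n = len(lhs) // 4
    if acc is None:
        acc = [[0] * n for _ in range(n)]
    return [[acc[i][j] + sum(lhs[4 * i + k] * rhs[4 * j + k] for k in range(4))
             for j in range(n)] for i in range(n)]

a = [[1, 2, 3, 4], [5, 6, 7, 8], [-1, -2, -3, -4], [9, 10, 11, 12]]
b = [[1, 0, 2, 0], [0, 3, 0, 4], [5, 5, 5, 5], [-6, 7, -8, 9]]

# Four outer products chained via the accumulator.
chained = None
for a_k, b_k in zip(a, b):
    chained = outerproduct(a_k, b_k, acc=chained)

# Packing as in the fused example: interleave (0, 2) and (1, 3), then both.
lhs = interleave(interleave(a[0], a[2]), interleave(a[1], a[3]))
rhs = interleave(interleave(b[0], b[2]), interleave(b[1], b[3]))
fused = smopa_4way(lhs, rhs)
```

After the two interleaves, each group of four consecutive elements of lhs holds a0[i], a1[i], a2[i], a3[i], which is one row of the SVLSx4 matrix in the breakdown above.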
Example: I8 to I32
%result = arm_sme.smopa_4way %lhs, %rhs : vector<[16]xi8>, vector<[16]xi8> into vector<[4]x[4]xi32>
Example: I16 to I64
%result = arm_sme.smopa_4way %lhs, %rhs : vector<[8]xi16>, vector<[8]xi16> into vector<[2]x[2]xi64>
Spec | Features |
---|---|
SMOPA (4-way) | +sme (32-bit), +sme-i16i64 (64-bit) |
Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments
Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands:
Operand | Description |
---|---|
lhs | rank-1 scalable vector of 8-bit signless integer values of length 16 or rank-1 scalable vector of 16-bit signless integer values of length 8 |
rhs | vector of any type values |
lhsMask | vector of any type values |
rhsMask | vector of any type values |
acc | vector of any type values |
Results:
Result | Description |
---|---|
result | vector<[4]x[4]xi32> of 32-bit signless integer values or vector<[2]x[2]xi64> of 64-bit signless integer values |
arm_sme.smops_2way (arm_sme::SMops2WayOp)
Signed integer sum of 2 outer products and subtract
Syntax:
operation ::= `arm_sme.smops_2way` $lhs `,` $rhs
oilist(
`acc` `` `(` $acc `)`
| `masks` `` `(` $lhsMask `,` $rhsMask `)`
) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)
Example:
%result = arm_sme.smops_2way %lhs, %rhs : vector<[8]xi16>, vector<[8]xi16> into vector<[4]x[4]xi32>
Refer to fmopa_2way for a detailed description of 2-way outer products.
Spec | Features |
---|---|
SMOPS (2-way) | +sme2 |
Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments
Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands:
Operand | Description |
---|---|
lhs | rank-1 scalable vector of 16-bit signless integer values of length 8 |
rhs | vector of any type values |
lhsMask | vector of any type values |
rhsMask | vector of any type values |
acc | vector of any type values |
Results:
Result | Description |
---|---|
result | vector<[4]x[4]xi32> of 32-bit signless integer values |
arm_sme.smops_4way (arm_sme::SMops4WayOp)
Signed integer sum of 4 outer products and subtract
Syntax:
operation ::= `arm_sme.smops_4way` $lhs `,` $rhs
oilist(
`acc` `` `(` $acc `)`
| `masks` `` `(` $lhsMask `,` $rhsMask `)`
) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)
Equivalent to smopa_4way, but the outer products are subtracted from the destination result.
Example: I8 to I32
%result = arm_sme.smops_4way %lhs, %rhs : vector<[16]xi8>, vector<[16]xi8> into vector<[4]x[4]xi32>
Example: I16 to I64
%result = arm_sme.smops_4way %lhs, %rhs : vector<[8]xi16>, vector<[8]xi16> into vector<[2]x[2]xi64>
Refer to smopa_4way for a detailed description of 4-way outer products.
Spec | Features |
---|---|
SMOPS (4-way) | +sme (32-bit), +sme-i16i64 (64-bit) |
Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments
Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands:
Operand | Description |
---|---|
lhs | rank-1 scalable vector of 8-bit signless integer values of length 16 or rank-1 scalable vector of 16-bit signless integer values of length 8 |
rhs | vector of any type values |
lhsMask | vector of any type values |
rhsMask | vector of any type values |
acc | vector of any type values |
Results:
Result | Description |
---|---|
result | vector<[4]x[4]xi32> of 32-bit signless integer values or vector<[2]x[2]xi64> of 64-bit signless integer values |
arm_sme.store_tile_slice (arm_sme::StoreTileSliceOp)
Tile slice store operation
Syntax:
operation ::= `arm_sme.store_tile_slice` $tile `,` $tile_slice_index `,` $mask `,` $base `[` $indices `]` (`layout` `` $layout^)?
attr-dict `:` type($base) `,` type($mask) `,` type($tile)
Stores a 1D tile slice from a 2D SME “virtual tile” into memory. The tile slice is defined by the dimension of the 2D scalable vector type pointed to by the index. A tile slice index describes where in the input tile the tile slice is stored from. An optional tile slice layout attribute specifies whether the tile slice being stored from the given index is horizontal (default) or vertical.
The slice of memory written is defined by a base and indices and must be contiguous. The memref must be either rank 1 or rank 2, have dynamic dimensions since the operation is scalable, and the element type must be a scalar that matches the element type of the input tile.
The provided mask is used to specify which elements of the tile slice will be stored.
Example 1: Store vector<[16]xi8> horizontal (default) tile slice from tile at given index to memory.
arm_sme.store_tile_slice %tile, %tile_slice_index, %mask, %base[%c0] : memref<?x?xi8>, vector<[16]xi1>, vector<[16]x[16]xi8>
Example 2: Store vector<[4]xf32> vertical tile slice from tile at given index to memory.
arm_sme.store_tile_slice %tile, %tile_slice_index, %mask, %base[%c0] layout<vertical> : memref<?x?xf32>, vector<[4]xi1>, vector<[4]x[4]xf32>
Example 3: Store a vector<[1]xi128> vertical tile slice from tile at given index to memory.
arm_sme.store_tile_slice %tile, %tile_slice_index, %mask, %base[%c0] layout<vertical> : memref<?x?xi128>, vector<[1]xi1>, vector<[1]x[1]xi128>
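A masked slice store can be sketched with a Python reference model. This is only an illustration: the function name is hypothetical, and memory is modelled as a flat list addressed by a single offset (the rank-1 memref case).

```python
# Reference model only; memory is a flat Python list (rank-1 memref case).
def store_tile_slice(tile, slice_index, mask, memory, offset,
                     layout="horizontal"):
    """Write the active elements of one tile slice to contiguous memory."""
    if layout == "horizontal":
        data = tile[slice_index]                      # a row of the tile
    elif layout == "vertical":
        data = [row[slice_index] for row in tile]     # a column of the tile
    else:
        raise ValueError(f"unknown layout: {layout}")
    for lane, (value, active) in enumerate(zip(data, mask)):
        if active:                                    # inactive lanes untouched
            memory[offset + lane] = value

tile = [[10 * r + c for c in range(4)] for r in range(4)]
memory = [0] * 8
store_tile_slice(tile, 2, [True, False, True, True], memory, 1)
store_tile_slice(tile, 1, [True] * 4, memory, 4, layout="vertical")
```

The second store overwrites memory[4], which the first store had set to 23.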
Interfaces: ArmSMETileOpInterface
Attributes:
Attribute | MLIR Type | Description |
---|---|---|
layout | ::mlir::arm_sme::TileSliceLayoutAttr | Layout of a tile slice. Enum cases: horizontal, vertical |
Operands:
Operand | Description |
---|---|
tile | a vector type that fits into an SME tile |
tile_slice_index | index |
mask | a vector type that matches the size of an SVE predicate |
base | memref of any type values |
indices | variadic of index |
arm_sme.streaming_vl (arm_sme::StreamingVLOp)
Query the streaming vector length
Syntax:
operation ::= `arm_sme.streaming_vl` $type_size attr-dict
This operation returns the streaming vector length (SVL) for a given type size. Unlike vector.vscale, the value returned is invariant to the streaming mode.
Example:
// Streaming vector length in:
// - bytes (8-bit, SVL.B)
%svl_b = arm_sme.streaming_vl <byte>
// - half words (16-bit, SVL.H)
%svl_h = arm_sme.streaming_vl <half>
// - words (32-bit, SVL.W)
%svl_w = arm_sme.streaming_vl <word>
// - double words (64-bit, SVL.D)
%svl_d = arm_sme.streaming_vl <double>
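The relationship between the type size and the returned value can be sketched in Python. The helper below is hypothetical and takes the SVL in bits as an explicit parameter, whereas the operation queries it from the hardware; the validity check assumes SVL is a power of two between 128 and 2048 bits.

```python
# Reference model only; `svl_bits` is supplied by the caller here.
TYPE_SIZE_BITS = {"byte": 8, "half": 16, "word": 32, "double": 64}

def streaming_vl(type_size, svl_bits):
    """Number of elements of the given size in a vector of SVL bits."""
    if not (128 <= svl_bits <= 2048) or svl_bits & (svl_bits - 1):
        raise ValueError("SVL must be a power of two in [128, 2048]")
    return svl_bits // TYPE_SIZE_BITS[type_size]

svl_b = streaming_vl("byte", 128)     # 16
svl_w = streaming_vl("word", 128)     # 4
svl_d = streaming_vl("double", 512)   # 8
```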
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes:
Attribute | MLIR Type | Description |
---|---|---|
type_size | ::mlir::arm_sme::TypeSizeAttr | Size of a vector element type. Enum cases: byte, half, word, double |
Results:
Result | Description |
---|---|
«unnamed» | index |
arm_sme.sumopa_4way (arm_sme::SuMopa4WayOp)
Signed by unsigned integer sum of 4 outer products and accumulate
Syntax:
operation ::= `arm_sme.sumopa_4way` $lhs `,` $rhs
oilist(
`acc` `` `(` $acc `)`
| `masks` `` `(` $lhsMask `,` $rhsMask `)`
) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)
Example: I8 to I32
%result = arm_sme.sumopa_4way %lhs, %rhs : vector<[16]xi8>, vector<[16]xi8> into vector<[4]x[4]xi32>
Example: I16 to I64
%result = arm_sme.sumopa_4way %lhs, %rhs : vector<[8]xi16>, vector<[8]xi16> into vector<[2]x[2]xi64>
Refer to smopa_4way for a detailed description of 4-way outer products.
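Since the operand element types are signless, what distinguishes the "signed by unsigned" form is how the same bit patterns are interpreted. A Python sketch of that interpretation follows; the function names are hypothetical, the inputs are raw 8-bit patterns (0 to 255), and 32-bit wrap-around of the result is not modelled.

```python
# Reference model only: lhs bits are read as signed, rhs bits as unsigned.
def as_signed8(bits):
    """Interpret an 8-bit pattern (0..255) as a two's complement value."""
    return bits - 256 if bits & 0x80 else bits

def sumopa_4way(lhs_bits, rhs_bits, acc=None):
    n = len(lhs_bits) // 4
    if acc is None:
        acc = [[0] * n for _ in range(n)]
    return [[acc[i][j]
             + sum(as_signed8(lhs_bits[4 * i + k]) * rhs_bits[4 * j + k]
                   for k in range(4))
             for j in range(n)] for i in range(n)]

# 0xFF is -1 when read as signed and 255 when read as unsigned.
result = sumopa_4way([0xFF] * 16, [0xFF] * 16)   # each element: 4 * (-1 * 255)
```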
Spec | Features |
---|---|
SUMOPA (4-way) | +sme (32-bit), +sme-i16i64 (64-bit) |
Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments
Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands:
Operand | Description |
---|---|
lhs | rank-1 scalable vector of 8-bit signless integer values of length 16 or rank-1 scalable vector of 16-bit signless integer values of length 8 |
rhs | vector of any type values |
lhsMask | vector of any type values |
rhsMask | vector of any type values |
acc | vector of any type values |
Results:
Result | Description |
---|---|
result | vector<[4]x[4]xi32> of 32-bit signless integer values or vector<[2]x[2]xi64> of 64-bit signless integer values |
arm_sme.sumops_4way (arm_sme::SuMops4WayOp)
Signed by unsigned integer sum of 4 outer products and subtract
Syntax:
operation ::= `arm_sme.sumops_4way` $lhs `,` $rhs
oilist(
`acc` `` `(` $acc `)`
| `masks` `` `(` $lhsMask `,` $rhsMask `)`
) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)
Example: I8 to I32
%result = arm_sme.sumops_4way %lhs, %rhs : vector<[16]xi8>, vector<[16]xi8> into vector<[4]x[4]xi32>
Example: I16 to I64
%result = arm_sme.sumops_4way %lhs, %rhs : vector<[8]xi16>, vector<[8]xi16> into vector<[2]x[2]xi64>
Refer to smopa_4way for a detailed description of 4-way outer products.
Spec | Features |
---|---|
SUMOPS (4-way) | +sme (32-bit), +sme-i16i64 (64-bit) |
Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments
Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands:
Operand | Description |
---|---|
lhs | rank-1 scalable vector of 8-bit signless integer values of length 16 or rank-1 scalable vector of 16-bit signless integer values of length 8 |
rhs | vector of any type values |
lhsMask | vector of any type values |
rhsMask | vector of any type values |
acc | vector of any type values |
Results:
Result | Description |
---|---|
result | vector<[4]x[4]xi32> of 32-bit signless integer values or vector<[2]x[2]xi64> of 64-bit signless integer values |
arm_sme.tile_load (arm_sme::TileLoadOp)
Tile load operation
Syntax:
operation ::= `arm_sme.tile_load` $base `[` $indices `]` (`,` $padding `,` $mask^)? (`layout` `` $layout^)? attr-dict `:` type($base) `,` type($result)
Loads a 2D SME “virtual tile” from memory defined by a base and indices, with the shape defined by the 2D scalable vector type of the result tile. An optional tile slice layout attribute specifies whether the slices of the tile being loaded are horizontal (default) or vertical. The slice of memory must be contiguous. The memref must be either rank 1 or rank 2 with dynamic dimensions, since the operation is scalable, and the element type must be a scalar that matches the element type of the result.
An optional SSA value padding of the same elemental type as the MemRef is provided to specify a fallback value in the case of masking.
An optional SSA value mask may be specified to mask out elements read from the MemRef. The mask type is an i1 vector with a shape that matches how elements are read from the MemRef. Elements whose corresponding mask element is 0 are masked out and replaced with padding.
If either padding or mask is specified, both must be specified.
Example 1: Load an 8-bit element ZA tile with horizontal layout (default) from memory (ZA0.B).
%tile = arm_sme.tile_load %base[%c0, %c0] : memref<?x?xi8>, vector<[16]x[16]xi8>
Example 2: Load a FP 32-bit element ZA tile with vertical layout from memory.
%tile = arm_sme.tile_load %base[%c0, %c0] layout<vertical> : memref<?x?xf32>, vector<[4]x[4]xf32>
Example 3: Load a 128-bit element ZA tile with horizontal layout (default) from memory.
%tile = arm_sme.tile_load %base[%c0, %c0] layout<horizontal> : memref<?x?xi128>, vector<[1]x[1]xi128>
Example 4: Masked load of int 32-bit element ZA tile with horizontal layout (default) from memory.
%tile = arm_sme.tile_load %base[%c0, %c0], %pad, %mask : memref<?x?xi32>, vector<[4]x[4]xi32>
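The padding and mask behaviour can be sketched with a Python reference model. This is an illustration under assumptions: the function name is hypothetical, memory is a list of rows (the rank-2 memref case), only the horizontal layout is modelled, and the tile size `n` is passed explicitly.

```python
# Reference model only; horizontal layout, rank-2 memory as a list of rows.
def tile_load(memory, row, col, n, padding=None, mask=None):
    """Load an n x n tile starting at memory[row][col]."""
    if (padding is None) != (mask is None):
        raise ValueError("padding and mask must be specified together")
    # Masked-off elements are never read; they take the padding value.
    return [[memory[row + i][col + j]
             if mask is None or mask[i][j]
             else padding
             for j in range(n)] for i in range(n)]

memory = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
# Mask off the 4th row and column, which lie outside the 3x3 buffer.
mask = [[i < 3 and j < 3 for j in range(4)] for i in range(4)]
tile = tile_load(memory, 0, 0, 4, padding=-1, mask=mask)
```

A typical use of the mask is exactly this case: loading a full tile from a buffer whose remaining extent is smaller than the tile.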
Traits: AttrSizedOperandSegments
Interfaces: ArmSMETileOpInterface
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
layout | ::mlir::arm_sme::TileSliceLayoutAttr | Layout of a tile slice. Enum cases: horizontal, vertical |
Operands: ¶
Operand | Description |
---|---|
base | memref of any type values |
indices | variadic of index |
padding | any type |
mask | vector of any type values |
Results: ¶
Result | Description |
---|---|
result | a vector type that fits into a SME tile |
arm_sme.tile_store
(arm_sme::TileStoreOp) ¶
Tile store operation
Syntax:
operation ::= `arm_sme.tile_store` $valueToStore `,` $base `[` $indices `]` (`,` $mask^)? (`layout` `` $layout^)? attr-dict `:` type($base) `,` type($valueToStore)
Stores a 2D SME “virtual tile” to memory defined by a base and indices, with the shape defined by the 2D scalable vector type of the tile being stored. An optional tile slice layout attribute specifies whether the slices of the tile being stored are horizontal (default) or vertical. The slice of memory must be contiguous. The memref must be either rank 1 or rank 2 with dynamic dimensions, since the operation is scalable, and the element type must be a scalar that matches the element type of the tile being stored.
An optional mask may be provided, the shape of which corresponds to the tile, and selects which elements of the tile will be stored.
Example 1: Store an 8-bit element ZA tile with horizontal (default) layout to memory (ZA0.B).
arm_sme.tile_store %tile, %base[%c0, %c0] : memref<?x?xi8>, vector<[16]x[16]xi8>
Example 2: Store an FP 32-bit element ZA tile with vertical layout to memory.
arm_sme.tile_store %tile, %base[%c0, %c0] layout<vertical> : memref<?x?xf32>, vector<[4]x[4]xf32>
Example 3: Store a 128-bit element ZA tile with horizontal (default) layout to memory.
arm_sme.tile_store %tile, %base[%c0, %c0] layout<horizontal> : memref<?x?xi128>, vector<[1]x[1]xi128>
Example 4: Masked store of an FP 32-bit element ZA tile with vertical layout to memory.
arm_sme.tile_store %tile, %base[%c0, %c0], %mask layout<vertical> : memref<?x?xf32>, vector<[4]x[4]xf32>
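The effect of the optional mask can be sketched with a small reference model. This is an illustrative Python sketch, not part of the dialect: the name `masked_tile_store` and the list-of-rows memory representation are assumptions, the tile is fixed at 4x4 (vscale = 1 for a 32-bit element type), and only the horizontal layout is modelled.

```python
def masked_tile_store(tile, memory, row0, col0, mask):
    """Hypothetical model of a masked, horizontal-layout tile store.

    Writes tile[r][c] to memory (in place) only where mask[r][c] is set;
    memory under masked-out elements is left untouched.
    """
    for r, mask_row in enumerate(mask):
        for c, enabled in enumerate(mask_row):
            if enabled:
                memory[row0 + r][col0 + c] = tile[r][c]


tile = [[10 * r + c for c in range(4)] for r in range(4)]
memory = [[-1] * 4 for _ in range(4)]
# Enable only the first 2 rows and the first 3 columns of the tile.
mask = [[r < 2 and c < 3 for c in range(4)] for r in range(4)]
masked_tile_store(tile, memory, 0, 0, mask)
for row in memory:
    print(row)
```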
Traits: AttrSizedOperandSegments
Interfaces: ArmSMETileOpInterface
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
layout | ::mlir::arm_sme::TileSliceLayoutAttr | Layout of a tile slice. Enum cases: horizontal, vertical |
Operands: ¶
Operand | Description |
---|---|
valueToStore | a vector type that fits into a SME tile |
base | memref of any type values |
indices | variadic of index |
mask | vector of any type values |
arm_sme.umopa_2way
(arm_sme::UMopa2WayOp) ¶
Unsigned integer sum of 2 outer products and accumulate
Syntax:
operation ::= `arm_sme.umopa_2way` $lhs `,` $rhs
oilist(
`acc` `` `(` $acc `)`
| `masks` `` `(` $lhsMask `,` $rhsMask `)`
) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)
Example:
%result = arm_sme.umopa_2way %lhs, %rhs : vector<[8]xi16>, vector<[8]xi16> into vector<[4]x[4]xi32>
Refer to fmopa_2way for a detailed description of 2-way outer products.
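As a rough illustration of what a 2-way unsigned outer product computes, here is a Python reference model. It is a sketch under stated assumptions, not the op's definition: it assumes the interleaved operand layout described for fmopa_2way (adjacent pairs of input elements contribute to one result element), that an omitted acc behaves like a zero tile, and that accumulation wraps at 32 bits. The function name `umopa_2way` mirrors the op for readability only.

```python
def umopa_2way(lhs, rhs, acc=None):
    """Hypothetical model: sum of 2 unsigned i16 outer products into i32."""
    n = len(lhs) // 2  # the result tile is n x n
    if acc is None:
        # Assumption: an omitted accumulator behaves like a zero tile.
        acc = [[0] * n for _ in range(n)]
    result = []
    for i in range(n):
        row = []
        for j in range(n):
            total = acc[i][j]
            for k in range(2):
                # Signless i16 bit patterns are reinterpreted as unsigned.
                total += (lhs[2 * i + k] & 0xFFFF) * (rhs[2 * j + k] & 0xFFFF)
            row.append(total & 0xFFFFFFFF)  # wrap to 32 bits
        result.append(row)
    return result


# vscale = 1: vector<[8]xi16> inputs give a vector<[4]x[4]xi32> result.
tile = umopa_2way([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8)
print(tile)
```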
Spec | Features |
---|---|
UMOPA (2-way) | +sme2 |
Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments
Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
Operand | Description |
---|---|
lhs | rank-1 scalable vector of 16-bit signless integer values of length 8 |
rhs | vector of any type values |
lhsMask | vector of any type values |
rhsMask | vector of any type values |
acc | vector of any type values |
Results: ¶
Result | Description |
---|---|
result | vector<[4]x[4]xi32> of 32-bit signless integer values |
arm_sme.umopa_4way
(arm_sme::UMopa4WayOp) ¶
Unsigned integer sum of 4 outer products and accumulate
Syntax:
operation ::= `arm_sme.umopa_4way` $lhs `,` $rhs
oilist(
`acc` `` `(` $acc `)`
| `masks` `` `(` $lhsMask `,` $rhsMask `)`
) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)
Example: I8 to I32
%result = arm_sme.umopa_4way %lhs, %rhs : vector<[16]xi8>, vector<[16]xi8> into vector<[4]x[4]xi32>
Example: I16 to I64
%result = arm_sme.umopa_4way %lhs, %rhs : vector<[8]xi16>, vector<[8]xi16> into vector<[2]x[2]xi64>
Refer to smopa_4way for a detailed description of 4-way outer products.
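The two type combinations in the examples above can be compared with a small Python reference model. This is a sketch, not the op's definition: it assumes the interleaved layout described for smopa_4way (groups of four adjacent input elements contribute to one result element), a zero tile when acc is omitted, and wraparound at the accumulator width. The parameter names `in_bits` and `out_bits` are introduced here for illustration.

```python
def umopa_4way(lhs, rhs, acc=None, in_bits=8, out_bits=32):
    """Hypothetical model: sum of 4 unsigned outer products, widened 4x."""
    n = len(lhs) // 4  # the result tile is n x n
    in_mask = (1 << in_bits) - 1
    out_mask = (1 << out_bits) - 1
    if acc is None:
        # Assumption: an omitted accumulator behaves like a zero tile.
        acc = [[0] * n for _ in range(n)]
    result = []
    for i in range(n):
        row = []
        for j in range(n):
            total = acc[i][j]
            for k in range(4):
                # Signless inputs are reinterpreted as unsigned integers.
                total += (lhs[4 * i + k] & in_mask) * (rhs[4 * j + k] & in_mask)
            row.append(total & out_mask)
        result.append(row)
    return result


# I8 to I32, vscale = 1: 16 inputs per operand, 4x4 result.
i8_tile = umopa_4way([1] * 16, list(range(16)))
# I16 to I64, vscale = 1: 8 inputs per operand, 2x2 result.
i16_tile = umopa_4way([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8, in_bits=16, out_bits=64)
print(i8_tile)
print(i16_tile)
```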
Spec | Features |
---|---|
UMOPA (4-way) | +sme (32-bit), +sme-i16i64 (64-bit) |
Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments
Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
Operand | Description |
---|---|
lhs | rank-1 scalable vector of 8-bit signless integer values of length 16 or rank-1 scalable vector of 16-bit signless integer values of length 8 |
rhs | vector of any type values |
lhsMask | vector of any type values |
rhsMask | vector of any type values |
acc | vector of any type values |
Results: ¶
Result | Description |
---|---|
result | vector<[4]x[4]xi32> of 32-bit signless integer values or vector<[2]x[2]xi64> of 64-bit signless integer values |
arm_sme.umops_2way
(arm_sme::UMops2WayOp) ¶
Unsigned integer sum of 2 outer products and subtract
Syntax:
operation ::= `arm_sme.umops_2way` $lhs `,` $rhs
oilist(
`acc` `` `(` $acc `)`
| `masks` `` `(` $lhsMask `,` $rhsMask `)`
) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)
Example:
%result = arm_sme.umops_2way %lhs, %rhs : vector<[8]xi16>, vector<[8]xi16> into vector<[4]x[4]xi32>
Refer to fmopa_2way for a detailed description of 2-way outer products.
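The subtracting variant differs from umopa_2way only in the sign of the update. The Python sketch below models that, assuming the same interleaved operand layout as fmopa_2way and 32-bit wraparound. It takes the accumulator as an explicit argument; the function name is chosen for readability and is not an MLIR API.

```python
def umops_2way(lhs, rhs, acc):
    """Hypothetical model: subtract the sum of 2 unsigned i16 outer products."""
    n = len(lhs) // 2  # the result tile is n x n
    result = []
    for i in range(n):
        row = []
        for j in range(n):
            products = sum(
                (lhs[2 * i + k] & 0xFFFF) * (rhs[2 * j + k] & 0xFFFF)
                for k in range(2)
            )
            # The products are subtracted from the accumulator, modulo 2**32.
            row.append((acc[i][j] - products) & 0xFFFFFFFF)
        result.append(row)
    return result


acc = [[100] * 4 for _ in range(4)]
tile = umops_2way([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8, acc)
print(tile)
```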
Spec | Features |
---|---|
UMOPS (2-way) | +sme2 |
Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments
Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
Operand | Description |
---|---|
lhs | rank-1 scalable vector of 16-bit signless integer values of length 8 |
rhs | vector of any type values |
lhsMask | vector of any type values |
rhsMask | vector of any type values |
acc | vector of any type values |
Results: ¶
Result | Description |
---|---|
result | vector<[4]x[4]xi32> of 32-bit signless integer values |
arm_sme.umops_4way
(arm_sme::UMops4WayOp) ¶
Unsigned integer sum of 4 outer products and subtract
Syntax:
operation ::= `arm_sme.umops_4way` $lhs `,` $rhs
oilist(
`acc` `` `(` $acc `)`
| `masks` `` `(` $lhsMask `,` $rhsMask `)`
) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)
Example: I8 to I32
%result = arm_sme.umops_4way %lhs, %rhs : vector<[16]xi8>, vector<[16]xi8> into vector<[4]x[4]xi32>
Example: I16 to I64
%result = arm_sme.umops_4way %lhs, %rhs : vector<[8]xi16>, vector<[8]xi16> into vector<[2]x[2]xi64>
Refer to smopa_4way for a detailed description of 4-way outer products.
Spec | Features |
---|---|
UMOPS (4-way) | +sme (32-bit), +sme-i16i64 (64-bit) |
Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments
Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
Operand | Description |
---|---|
lhs | rank-1 scalable vector of 8-bit signless integer values of length 16 or rank-1 scalable vector of 16-bit signless integer values of length 8 |
rhs | vector of any type values |
lhsMask | vector of any type values |
rhsMask | vector of any type values |
acc | vector of any type values |
Results: ¶
Result | Description |
---|---|
result | vector<[4]x[4]xi32> of 32-bit signless integer values or vector<[2]x[2]xi64> of 64-bit signless integer values |
arm_sme.usmopa_4way
(arm_sme::UsMopa4WayOp) ¶
Unsigned by signed integer sum of 4 outer products and accumulate
Syntax:
operation ::= `arm_sme.usmopa_4way` $lhs `,` $rhs
oilist(
`acc` `` `(` $acc `)`
| `masks` `` `(` $lhsMask `,` $rhsMask `)`
) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)
Example: I8 to I32
%result = arm_sme.usmopa_4way %lhs, %rhs : vector<[16]xi8>, vector<[16]xi8> into vector<[4]x[4]xi32>
Example: I16 to I64
%result = arm_sme.usmopa_4way %lhs, %rhs : vector<[8]xi16>, vector<[8]xi16> into vector<[2]x[2]xi64>
Refer to smopa_4way for a detailed description of 4-way outer products.
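Because the operand types are signless, the "unsigned by signed" part of this op is purely a matter of interpretation. The Python sketch below shows the same bit pattern read as unsigned on the lhs and as two's-complement signed on the rhs. It assumes the interleaved layout described for smopa_4way, a zero tile when acc is omitted, and wraparound at the accumulator width; `to_signed` is a helper introduced here for illustration.

```python
def to_signed(value, bits):
    """Reinterprets the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def usmopa_4way(lhs, rhs, acc=None, in_bits=8, out_bits=32):
    """Hypothetical model: unsigned lhs by signed rhs, sum of 4 outer products."""
    n = len(lhs) // 4  # the result tile is n x n
    in_mask = (1 << in_bits) - 1
    out_mask = (1 << out_bits) - 1
    if acc is None:
        # Assumption: an omitted accumulator behaves like a zero tile.
        acc = [[0] * n for _ in range(n)]
    result = []
    for i in range(n):
        row = []
        for j in range(n):
            total = acc[i][j]
            for k in range(4):
                unsigned_lhs = lhs[4 * i + k] & in_mask
                signed_rhs = to_signed(rhs[4 * j + k], in_bits)
                total += unsigned_lhs * signed_rhs
            row.append(total & out_mask)  # stored as a signless bit pattern
        result.append(row)
    return result


# 0xFF is 255 when read as unsigned (lhs) and -1 when read as signed (rhs).
tile = usmopa_4way([0xFF] * 16, [0xFF] * 16)
print(to_signed(tile[0][0], 32))  # prints -1020
```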
Spec | Features |
---|---|
USMOPA (4-way) | +sme (32-bit), +sme-i16i64 (64-bit) |
Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments
Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
Operand | Description |
---|---|
lhs | rank-1 scalable vector of 8-bit signless integer values of length 16 or rank-1 scalable vector of 16-bit signless integer values of length 8 |
rhs | vector of any type values |
lhsMask | vector of any type values |
rhsMask | vector of any type values |
acc | vector of any type values |
Results: ¶
Result | Description |
---|---|
result | vector<[4]x[4]xi32> of 32-bit signless integer values or vector<[2]x[2]xi64> of 64-bit signless integer values |
arm_sme.usmops_4way
(arm_sme::UsMops4WayOp) ¶
Unsigned by signed integer sum of 4 outer products and subtract
Syntax:
operation ::= `arm_sme.usmops_4way` $lhs `,` $rhs
oilist(
`acc` `` `(` $acc `)`
| `masks` `` `(` $lhsMask `,` $rhsMask `)`
) attr-dict `:` type($lhs) `,` type($rhs) `into` type($result)
Example: I8 to I32
%result = arm_sme.usmops_4way %lhs, %rhs : vector<[16]xi8>, vector<[16]xi8> into vector<[4]x[4]xi32>
Example: I16 to I64
%result = arm_sme.usmops_4way %lhs, %rhs : vector<[8]xi16>, vector<[8]xi16> into vector<[2]x[2]xi64>
Refer to smopa_4way for a detailed description of 4-way outer products.
Spec | Features |
---|---|
USMOPS (4-way) | +sme (32-bit), +sme-i16i64 (64-bit) |
Traits: AlwaysSpeculatableImplTrait, AttrSizedOperandSegments
Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
Operand | Description |
---|---|
lhs | rank-1 scalable vector of 8-bit signless integer values of length 16 or rank-1 scalable vector of 16-bit signless integer values of length 8 |
rhs | vector of any type values |
lhsMask | vector of any type values |
rhsMask | vector of any type values |
acc | vector of any type values |
Results: ¶
Result | Description |
---|---|
result | vector<[4]x[4]xi32> of 32-bit signless integer values or vector<[2]x[2]xi64> of 64-bit signless integer values |
arm_sme.zero
(arm_sme::ZeroOp) ¶
Creates a zero-initialized value of SME virtual tile type
Syntax:
operation ::= `arm_sme.zero` attr-dict `:` type($res)
Creates a new SME “virtual tile” value within a function. The contents of the tile returned from this operation are zero-initialized.
Example 1: Zero an 8-bit element ZA tile.
%0 = arm_sme.zero : vector<[16]x[16]xi8>
Example 2: Zero a 64-bit element ZA tile.
%0 = arm_sme.zero : vector<[2]x[2]xi64>
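The tile types used throughout these examples follow one pattern: at the minimum streaming vector length of 128 bits, a tile with N-bit elements has 128/N rows and columns, each scaled by vscale. The Python sketch below derives the type string from the element type. The helper name `sme_tile_type` and its width table are illustrative, not an MLIR API.

```python
# Element bit widths for the element types that appear in SME tile types.
ELEMENT_BITS = {
    "i8": 8,
    "i16": 16, "f16": 16, "bf16": 16,
    "i32": 32, "f32": 32,
    "i64": 64, "f64": 64,
    "i128": 128,
}


def sme_tile_type(elem_type):
    """Returns the 2-D scalable vector type of one SME tile of elem_type."""
    n = 128 // ELEMENT_BITS[elem_type]  # elements per 128-bit tile slice
    return f"vector<[{n}]x[{n}]x{elem_type}>"


for elem_type in ("i8", "f32", "i64", "i128"):
    print(sme_tile_type(elem_type))
```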
Traits: AlwaysSpeculatableImplTrait
Interfaces: ArmSMETileOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | a vector type that fits into a SME tile |
Operations for LLVM IR Intrinsics ¶
arm_sme.intr.cntsb
(arm_sme::aarch64_sme_cntsb) ¶
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
arm_sme.intr.cntsd
(arm_sme::aarch64_sme_cntsd) ¶
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
arm_sme.intr.cntsh
(arm_sme::aarch64_sme_cntsh) ¶
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
arm_sme.intr.cntsw
(arm_sme::aarch64_sme_cntsw) ¶
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
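The four cnts* entries above list only a result. Assuming they mirror the SME CNTSB, CNTSH, CNTSW and CNTSD instructions, each returns the number of 8-, 16-, 32- or 64-bit elements in one streaming vector. The Python sketch below models that relationship. The function name `cnts` and the explicit `svl_bits` argument are illustrative, since the real value depends on the hardware's streaming vector length.

```python
def cnts(svl_bits, suffix):
    """Hypothetical model of cntsb/cntsh/cntsw/cntsd for a given SVL."""
    elem_bits = {"b": 8, "h": 16, "w": 32, "d": 64}[suffix]
    return svl_bits // elem_bits


# Element counts at the minimum SVL (128 bits) and at a 512-bit SVL.
for svl in (128, 512):
    print(svl, [cnts(svl, s) for s in "bhwd"])
```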
arm_sme.intr.ld1b.horiz
(arm_sme::aarch64_sme_ld1b_horiz) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
predicate | a vector type that matches the size of a SVE predicate |
load_address | LLVM pointer type |
tile_slice_index | 32-bit signless integer |
arm_sme.intr.ld1b.vert
(arm_sme::aarch64_sme_ld1b_vert) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
predicate | a vector type that matches the size of a SVE predicate |
load_address | LLVM pointer type |
tile_slice_index | 32-bit signless integer |
arm_sme.intr.ld1d.horiz
(arm_sme::aarch64_sme_ld1d_horiz) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
predicate | a vector type that matches the size of a SVE predicate |
load_address | LLVM pointer type |
tile_slice_index | 32-bit signless integer |
arm_sme.intr.ld1d.vert
(arm_sme::aarch64_sme_ld1d_vert) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
predicate | a vector type that matches the size of a SVE predicate |
load_address | LLVM pointer type |
tile_slice_index | 32-bit signless integer |
arm_sme.intr.ld1h.horiz
(arm_sme::aarch64_sme_ld1h_horiz) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
predicate | a vector type that matches the size of a SVE predicate |
load_address | LLVM pointer type |
tile_slice_index | 32-bit signless integer |
arm_sme.intr.ld1h.vert
(arm_sme::aarch64_sme_ld1h_vert) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
predicate | a vector type that matches the size of a SVE predicate |
load_address | LLVM pointer type |
tile_slice_index | 32-bit signless integer |
arm_sme.intr.ld1q.horiz
(arm_sme::aarch64_sme_ld1q_horiz) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
predicate | a vector type that matches the size of a SVE predicate |
load_address | LLVM pointer type |
tile_slice_index | 32-bit signless integer |
arm_sme.intr.ld1q.vert
(arm_sme::aarch64_sme_ld1q_vert) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
predicate | a vector type that matches the size of a SVE predicate |
load_address | LLVM pointer type |
tile_slice_index | 32-bit signless integer |
arm_sme.intr.ld1w.horiz
(arm_sme::aarch64_sme_ld1w_horiz) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
predicate | a vector type that matches the size of a SVE predicate |
load_address | LLVM pointer type |
tile_slice_index | 32-bit signless integer |
arm_sme.intr.ld1w.vert
(arm_sme::aarch64_sme_ld1w_vert) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
predicate | a vector type that matches the size of a SVE predicate |
load_address | LLVM pointer type |
tile_slice_index | 32-bit signless integer |
arm_sme.intr.mopa
(arm_sme::aarch64_sme_mopa) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
lhs_vector | a vector type that is a supported input for the SME MOP instructions |
rhs_vector | a vector type that is a supported input for the SME MOP instructions |
arm_sme.intr.mopa.wide
(arm_sme::aarch64_sme_mopa_wide) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
lhs_vector | a vector type that is a supported input for the SME MOP instructions |
rhs_vector | a vector type that is a supported input for the SME MOP instructions |
arm_sme.intr.mops
(arm_sme::aarch64_sme_mops) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
lhs_vector | a vector type that is a supported input for the SME MOP instructions |
rhs_vector | a vector type that is a supported input for the SME MOP instructions |
arm_sme.intr.mops.wide
(arm_sme::aarch64_sme_mops_wide) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
lhs_vector | a vector type that is a supported input for the SME MOP instructions |
rhs_vector | a vector type that is a supported input for the SME MOP instructions |
arm_sme.intr.read.horiz
(arm_sme::aarch64_sme_read_horiz) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
vector | a vector type that matches the size of a SVE vector |
predicate | a vector type that matches the size of a SVE predicate |
tile_slice_index | 32-bit signless integer |
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
arm_sme.intr.read.vert
(arm_sme::aarch64_sme_read_vert) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
vector | a vector type that matches the size of a SVE vector |
predicate | a vector type that matches the size of a SVE predicate |
tile_slice_index | 32-bit signless integer |
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
arm_sme.intr.smopa.wide
(arm_sme::aarch64_sme_smopa_wide) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
lhs_vector | a vector type that is a supported input for the SME MOP instructions |
rhs_vector | a vector type that is a supported input for the SME MOP instructions |
arm_sme.intr.smopa.za32
(arm_sme::aarch64_sme_smopa_za32) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
lhs_vector | a vector type that is a supported input for the SME MOP instructions |
rhs_vector | a vector type that is a supported input for the SME MOP instructions |
arm_sme.intr.smops.wide
(arm_sme::aarch64_sme_smops_wide) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
lhs_vector | a vector type that is a supported input for the SME MOP instructions |
rhs_vector | a vector type that is a supported input for the SME MOP instructions |
arm_sme.intr.smops.za32
(arm_sme::aarch64_sme_smops_za32) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
lhs_vector | a vector type that is a supported input for the SME MOP instructions |
rhs_vector | a vector type that is a supported input for the SME MOP instructions |
arm_sme.intr.st1b.horiz
(arm_sme::aarch64_sme_st1b_horiz) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
predicate | a vector type that matches the size of a SVE predicate |
store_address | LLVM pointer type |
tile_slice_index | 32-bit signless integer |
arm_sme.intr.st1b.vert
(arm_sme::aarch64_sme_st1b_vert) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
predicate | a vector type that matches the size of a SVE predicate |
store_address | LLVM pointer type |
tile_slice_index | 32-bit signless integer |
arm_sme.intr.st1d.horiz
(arm_sme::aarch64_sme_st1d_horiz) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
predicate | a vector type that matches the size of a SVE predicate |
store_address | LLVM pointer type |
tile_slice_index | 32-bit signless integer |
arm_sme.intr.st1d.vert
(arm_sme::aarch64_sme_st1d_vert) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
predicate | a vector type that matches the size of a SVE predicate |
store_address | LLVM pointer type |
tile_slice_index | 32-bit signless integer |
arm_sme.intr.st1h.horiz
(arm_sme::aarch64_sme_st1h_horiz) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
predicate | a vector type that matches the size of a SVE predicate |
store_address | LLVM pointer type |
tile_slice_index | 32-bit signless integer |
arm_sme.intr.st1h.vert
(arm_sme::aarch64_sme_st1h_vert) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
predicate | a vector type that matches the size of a SVE predicate |
store_address | LLVM pointer type |
tile_slice_index | 32-bit signless integer |
arm_sme.intr.st1q.horiz
(arm_sme::aarch64_sme_st1q_horiz) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
predicate | a vector type that matches the size of a SVE predicate |
store_address | LLVM pointer type |
tile_slice_index | 32-bit signless integer |
arm_sme.intr.st1q.vert
(arm_sme::aarch64_sme_st1q_vert) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
predicate | a vector type that matches the size of a SVE predicate |
store_address | LLVM pointer type |
tile_slice_index | 32-bit signless integer |
arm_sme.intr.st1w.horiz
(arm_sme::aarch64_sme_st1w_horiz) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
predicate | a vector type that matches the size of a SVE predicate |
store_address | LLVM pointer type |
tile_slice_index | 32-bit signless integer |
arm_sme.intr.st1w.vert
(arm_sme::aarch64_sme_st1w_vert) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
predicate | a vector type that matches the size of a SVE predicate |
store_address | LLVM pointer type |
tile_slice_index | 32-bit signless integer |
arm_sme.intr.str
(arm_sme::aarch64_sme_str) ¶
Operands: ¶
Operand | Description |
---|---|
index | 32-bit signless integer |
store_address | LLVM pointer type |
offset | 32-bit signless integer |
arm_sme.intr.sumopa.wide
(arm_sme::aarch64_sme_sumopa_wide) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
lhs_vector | a vector type that is a supported input for the SME MOP instructions |
rhs_vector | a vector type that is a supported input for the SME MOP instructions |
arm_sme.intr.sumops.wide
(arm_sme::aarch64_sme_sumops_wide) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
lhs_vector | a vector type that is a supported input for the SME MOP instructions |
rhs_vector | a vector type that is a supported input for the SME MOP instructions |
arm_sme.intr.umopa.wide
(arm_sme::aarch64_sme_umopa_wide) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
lhs_vector | a vector type that is a supported input for the SME MOP instructions |
rhs_vector | a vector type that is a supported input for the SME MOP instructions |
arm_sme.intr.umopa.za32
(arm_sme::aarch64_sme_umopa_za32) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
lhs_vector | a vector type that is a supported input for the SME MOP instructions |
rhs_vector | a vector type that is a supported input for the SME MOP instructions |
arm_sme.intr.umops.wide
(arm_sme::aarch64_sme_umops_wide) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
lhs_vector | a vector type that is a supported input for the SME MOP instructions |
rhs_vector | a vector type that is a supported input for the SME MOP instructions |
arm_sme.intr.umops.za32
(arm_sme::aarch64_sme_umops_za32) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
lhs_vector | a vector type that is a supported input for the SME MOP instructions |
rhs_vector | a vector type that is a supported input for the SME MOP instructions |
arm_sme.intr.usmopa.wide
(arm_sme::aarch64_sme_usmopa_wide) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
lhs_vector | a vector type that is a supported input for the SME MOP instructions |
rhs_vector | a vector type that is a supported input for the SME MOP instructions |
arm_sme.intr.usmops.wide
(arm_sme::aarch64_sme_usmops_wide) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
lhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
rhs_predicate | a vector type that is a supported predicate for the SME MOP instructions |
lhs_vector | a vector type that is a supported input for the SME MOP instructions |
rhs_vector | a vector type that is a supported input for the SME MOP instructions |
arm_sme.intr.write.horiz
(arm_sme::aarch64_sme_write_horiz) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
tile_slice_index | 32-bit signless integer |
predicate | a vector type that matches the size of a SVE predicate |
vector | a vector type that matches the size of a SVE vector |
arm_sme.intr.write.vert
(arm_sme::aarch64_sme_write_vert) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
tile_slice_index | 32-bit signless integer |
predicate | a vector type that matches the size of a SVE predicate |
vector | a vector type that matches the size of a SVE vector |
arm_sme.intr.zero
(arm_sme::aarch64_sme_zero) ¶
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
tile_mask | ::mlir::IntegerAttr | 32-bit signless integer attribute |
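The tile_mask attribute is not described above. Assuming it follows the encoding of the SME ZERO instruction, it is an 8-bit mask in which bit k selects the 64-bit element tile ZAk.D, and wider masks express tiles of smaller element types. The Python sketch below computes the mask for a tile of a given element width under that assumption. The helper name `zero_tile_mask` is hypothetical, and 128-bit element tiles are left out.

```python
def zero_tile_mask(elem_bits, tile_id):
    """Hypothetical helper: ZERO mask selecting tile `tile_id` of a given width.

    Assumes bit k of the mask selects the 64-bit element tile ZAk.D.
    """
    num_tiles = elem_bits // 8  # 1 x ZA.B, 2 x ZA.H, 4 x ZA.S, 8 x ZA.D
    if elem_bits not in (8, 16, 32, 64) or not 0 <= tile_id < num_tiles:
        raise ValueError("unsupported element width or tile id")
    # Tile 0 of each width covers every num_tiles-th 64-bit tile.
    base_mask = sum(1 << k for k in range(0, 8, num_tiles))
    return base_mask << tile_id


print(format(zero_tile_mask(8, 0), "08b"))   # ZA0.B
print(format(zero_tile_mask(16, 1), "08b"))  # ZA1.H
print(format(zero_tile_mask(32, 2), "08b"))  # ZA2.S
print(format(zero_tile_mask(64, 7), "08b"))  # ZA7.D
```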