MLIR

Multi-Level IR Compiler Framework

'rocdl' Dialect

Dialect for wrapping LLVM AMDGPU backend intrinsics and attributes

The ROCDL dialect, like the other platform-specific LLVM dialects, serves as the location of wrappers around the AMD-specific intrinsics and attributes in LLVM.

This dialect, like other GPU lowering targets, also contains the infrastructure used by the built-in compilation/offloading framework to compile AMD-specific LLVM IR into binaries.

Dialect inclusion criteria and guidelines

The operations in this dialect are 1:1 wrappers around their corresponding LLVM intrinsics. Operations that do not correspond to intrinsics should not be placed in this dialect.

The definition of a ROCDL op should match its LLVM counterpart. If the argument and result types are fixed, they should be specified as type constraints; for results, this includes overriding the default variadic type on LLVM intrinsics with a let results in the operation definition.

LLVM attributes do not need to be replicated exactly if it wouldn’t be easy to do so, but pure operations and ones that read/write memory should be annotated as such.

While LLVM intrinsics currently don’t allow constraining the values an any_type can take, it is acceptable (but not required) to impose such constraints if they are known.

When an LLVM intrinsic uses an immarg, this corresponds to an attribute in MLIR.
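As a sketch of this mapping, the byte selector of rocdl.cvt.f32.bf8 (documented below) is written as a literal in the op's assembly rather than as an SSA operand. This assumes, as the op's attribute table suggests, that the selector is an immarg on the underlying intrinsic; the function name and the selector value 0 are illustrative.

```mlir
// Hypothetical wrapper function; only the rocdl op comes from this document.
llvm.func @immarg_as_attribute(%src: i32) -> f32 {
  // `0` is the byteSel attribute (a compile-time constant), not an SSA value.
  %0 = rocdl.cvt.f32.bf8 %src[0] : f32
  llvm.return %0 : f32
}
```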

Human-readable assembly formats (those that, for example, explicitly indicate parameter names) may be used, and are encouraged for intrinsics that have complex argument schemes and don’t have any higher-level wrapper (such as in the amdgpu dialect).

While not all existing operations follow this convention, new operations should generally provide argument and result types except in cases where they are clearly redundant (such as with operations like rocdl.fmed3, which doesn’t need to reiterate the single type at issue multiple times). This convention enhances the readability of low-level IR and prevents programmers from needing to find non-local type information.

Dialect-defined discardable attributes (any attribute starting with rocdl. that has special handling) need to correspond to AMD-specific attributes, metadata, or other entities (such as calling conventions) in LLVM, or be needed for GPU compilation management. Outside of the compilation infrastructure, dialect-specific enums or attributes are extremely unlikely to be needed and should be avoided.

Operation documentation should specify when the operation was introduced (if relevant) and include usage examples. Operations should have parser/printer tests in mlir/test/Dialect/LLVMIR/rocdl.mlir and lowering tests in mlir/test/Target/LLVMIR/rocdl.mlir.

General documentation (What does this op do?)

While rocdl ops sometimes carry their own documentation, there is no expectation that such documentation will exist (or be kept up to date).

Since ROCDL operations correspond to LLVM intrinsics, the semantics and behavior of these operations can be determined by investigating the documentation for the corresponding intrinsic. This documentation can be found in

  • llvm/docs/AMDGPUUsage.rst and
  • The comments of llvm/include/llvm/IR/IntrinsicsAMDGPU.td, which is where details of the meaning of certain bitfields or of how an intrinsic corresponds to hardware instructions are most likely to be found.

Since many intrinsics are themselves minimal wrappers around hardware instructions, these documentation sources often do not repeat hardware documentation. If an intrinsic appears undocumented, information about its behavior will often be available in published ISA descriptions (sometimes known as shader programming guides).

If an operation doesn’t provide usage examples, it is likely that they can be found in mlir/test/Dialect/LLVMIR/rocdl.mlir (op syntax and verification) or mlir/test/Target/LLVMIR/rocdl.mlir (translation to LLVM IR).

Operations 

rocdl.asyncmark (ROCDL::AsyncmarkOp) 

Mark the end of a group of asynchronous operations

Syntax:

operation ::= `rocdl.asyncmark` attr-dict

This operation, in conjunction with rocdl.wait.asyncmark, forms the compiler-provided framework for tracking explicitly asynchronous memory operations, such as copies to LDS that use async intrinsics and gfx1250’s tensor loads.

Details of its behavior can be found in the LLVM documentation on async tracking.

See rocdl.wait.asyncmark’s documentation for a usage example.

Example:

// Mark the end of an async operation group.
rocdl.asyncmark

Available on gfx9 and later.

rocdl.ballot (ROCDL::BallotOp) 

Vote across thread group

Syntax:

operation ::= `rocdl.ballot` $pred attr-dict `:` type($res)

Ballot provides a bit mask containing the 1-bit predicate value from each lane. The nth bit of the result contains the 1 bit contributed by the nth warp lane.

Example:

// Ballot across thread group.
%0 = rocdl.ballot %pred : i64

Operands: 

Operand  Description
pred     1-bit signless integer

Results:

Result  Description
res     LLVM dialect-compatible type

rocdl.barrier (ROCDL::BarrierOp) 

Syntax:

operation ::= `rocdl.barrier` attr-dict

An operation with the same expansion as HIP’s __syncthreads().

DEPRECATION NOTICE: Use gpu.barrier, which will expand to these operations, instead.

Example:

// Workgroup barrier with acquire/release fences.
rocdl.barrier

rocdl.cluster.id.x (ROCDL::ClusterIdXOp) 

Syntax:

operation ::= `rocdl.cluster.id.x` (`range` $range^)? attr-dict `:` type($res)

Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.

Example:

// Read the cluster id in the x dimension.
%0 = rocdl.cluster.id.x : i32

// Read with a known range constraint.
%1 = rocdl.cluster.id.x range <i32, 0, 64> : i32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type                        Description
range      ::mlir::LLVM::ConstantRangeAttr  A range of two integers, corresponding to LLVM's ConstantRange

A pair of two integers, mapping to the ConstantRange structure in LLVM IR, which is allowed to wrap or be empty.

The range represented is [Lower, Upper), and is either signed or unsigned depending on context.

lower and upper must have the same width.

Syntax:

`<` `i`(width($lower)) $lower `,` $upper `>`

Results:

Result  Description
res     LLVM dialect-compatible type

rocdl.cluster.id.y (ROCDL::ClusterIdYOp) 

Syntax:

operation ::= `rocdl.cluster.id.y` (`range` $range^)? attr-dict `:` type($res)

Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.

Example:

// Read the cluster id in the y dimension.
%0 = rocdl.cluster.id.y : i32

// Read with a known range constraint.
%1 = rocdl.cluster.id.y range <i32, 0, 64> : i32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type                        Description
range      ::mlir::LLVM::ConstantRangeAttr  A range of two integers, corresponding to LLVM's ConstantRange

A pair of two integers, mapping to the ConstantRange structure in LLVM IR, which is allowed to wrap or be empty.

The range represented is [Lower, Upper), and is either signed or unsigned depending on context.

lower and upper must have the same width.

Syntax:

`<` `i`(width($lower)) $lower `,` $upper `>`

Results:

Result  Description
res     LLVM dialect-compatible type

rocdl.cluster.id.z (ROCDL::ClusterIdZOp) 

Syntax:

operation ::= `rocdl.cluster.id.z` (`range` $range^)? attr-dict `:` type($res)

Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.

Example:

// Read the cluster id in the z dimension.
%0 = rocdl.cluster.id.z : i32

// Read with a known range constraint.
%1 = rocdl.cluster.id.z range <i32, 0, 64> : i32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type                        Description
range      ::mlir::LLVM::ConstantRangeAttr  A range of two integers, corresponding to LLVM's ConstantRange

A pair of two integers, mapping to the ConstantRange structure in LLVM IR, which is allowed to wrap or be empty.

The range represented is [Lower, Upper), and is either signed or unsigned depending on context.

lower and upper must have the same width.

Syntax:

`<` `i`(width($lower)) $lower `,` $upper `>`

Results:

Result  Description
res     LLVM dialect-compatible type

rocdl.cluster.load.async.to.lds.b128 (ROCDL::ClusterLoadAsyncToLDSB128Op) 

Syntax:

operation ::= `rocdl.cluster.load.async.to.lds.b128` $globalPtr `,`  $ldsPtr `,` $offset `,` $cpol `,` $mask
              attr-dict `:` qualified(type($globalPtr)) `,` qualified(type($ldsPtr))

Broadcasts memory load of 128 bits of data for a cluster of workgroups.

Available on gfx1250+.

Example:

// Cluster broadcast 128-bit load to LDS.
rocdl.cluster.load.async.to.lds.b128 %src, %dst, 0, 0, %mask : !llvm.ptr<1>, !llvm.ptr<3>

Interfaces: AliasAnalysisOpInterface

Attributes: 

Attribute       MLIR Type            Description
offset          ::mlir::IntegerAttr  32-bit signless integer attribute
cpol            ::mlir::IntegerAttr  32-bit signless integer attribute
alias_scopes    ::mlir::ArrayAttr    LLVM dialect alias scope array
noalias_scopes  ::mlir::ArrayAttr    LLVM dialect alias scope array
tbaa            ::mlir::ArrayAttr    LLVM dialect TBAA tag metadata array

Operands:

Operand    Description
globalPtr  LLVM pointer in address space 1
ldsPtr     LLVM pointer in address space 3
mask       32-bit signless integer

rocdl.cluster.load.async.to.lds.b32 (ROCDL::ClusterLoadAsyncToLDSB32Op) 

Syntax:

operation ::= `rocdl.cluster.load.async.to.lds.b32` $globalPtr `,`  $ldsPtr `,` $offset `,` $cpol `,` $mask
              attr-dict `:` qualified(type($globalPtr)) `,` qualified(type($ldsPtr))

Broadcasts memory load of 32 bits of data for a cluster of workgroups.

Available on gfx1250+.

Example:

// Cluster broadcast 32-bit load to LDS.
rocdl.cluster.load.async.to.lds.b32 %src, %dst, 0, 0, %mask : !llvm.ptr<1>, !llvm.ptr<3>

Interfaces: AliasAnalysisOpInterface

Attributes: 

Attribute       MLIR Type            Description
offset          ::mlir::IntegerAttr  32-bit signless integer attribute
cpol            ::mlir::IntegerAttr  32-bit signless integer attribute
alias_scopes    ::mlir::ArrayAttr    LLVM dialect alias scope array
noalias_scopes  ::mlir::ArrayAttr    LLVM dialect alias scope array
tbaa            ::mlir::ArrayAttr    LLVM dialect TBAA tag metadata array

Operands:

Operand    Description
globalPtr  LLVM pointer in address space 1
ldsPtr     LLVM pointer in address space 3
mask       32-bit signless integer

rocdl.cluster.load.async.to.lds.b64 (ROCDL::ClusterLoadAsyncToLDSB64Op) 

Syntax:

operation ::= `rocdl.cluster.load.async.to.lds.b64` $globalPtr `,`  $ldsPtr `,` $offset `,` $cpol `,` $mask
              attr-dict `:` qualified(type($globalPtr)) `,` qualified(type($ldsPtr))

Broadcasts memory load of 64 bits of data for a cluster of workgroups.

Available on gfx1250+.

Example:

// Cluster broadcast 64-bit load to LDS.
rocdl.cluster.load.async.to.lds.b64 %src, %dst, 0, 0, %mask : !llvm.ptr<1>, !llvm.ptr<3>

Interfaces: AliasAnalysisOpInterface

Attributes: 

Attribute       MLIR Type            Description
offset          ::mlir::IntegerAttr  32-bit signless integer attribute
cpol            ::mlir::IntegerAttr  32-bit signless integer attribute
alias_scopes    ::mlir::ArrayAttr    LLVM dialect alias scope array
noalias_scopes  ::mlir::ArrayAttr    LLVM dialect alias scope array
tbaa            ::mlir::ArrayAttr    LLVM dialect TBAA tag metadata array

Operands:

Operand    Description
globalPtr  LLVM pointer in address space 1
ldsPtr     LLVM pointer in address space 3
mask       32-bit signless integer

rocdl.cluster.load.async.to.lds.b8 (ROCDL::ClusterLoadAsyncToLDSB8Op) 

Syntax:

operation ::= `rocdl.cluster.load.async.to.lds.b8` $globalPtr `,`  $ldsPtr `,` $offset `,` $cpol `,` $mask
              attr-dict `:` qualified(type($globalPtr)) `,` qualified(type($ldsPtr))

Broadcasts memory load of 8 bits of data for a cluster of workgroups.

Available on gfx1250+.

Example:

// Cluster broadcast 8-bit load to LDS.
rocdl.cluster.load.async.to.lds.b8 %src, %dst, 0, 0, %mask : !llvm.ptr<1>, !llvm.ptr<3>

Interfaces: AliasAnalysisOpInterface

Attributes: 

Attribute       MLIR Type            Description
offset          ::mlir::IntegerAttr  32-bit signless integer attribute
cpol            ::mlir::IntegerAttr  32-bit signless integer attribute
alias_scopes    ::mlir::ArrayAttr    LLVM dialect alias scope array
noalias_scopes  ::mlir::ArrayAttr    LLVM dialect alias scope array
tbaa            ::mlir::ArrayAttr    LLVM dialect TBAA tag metadata array

Operands:

Operand    Description
globalPtr  LLVM pointer in address space 1
ldsPtr     LLVM pointer in address space 3
mask       32-bit signless integer

rocdl.cluster.workgroup.id.x (ROCDL::ClusterWorkgroupIdXOp) 

Syntax:

operation ::= `rocdl.cluster.workgroup.id.x` (`range` $range^)? attr-dict `:` type($res)

Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.

Example:

// Read the cluster workgroup id in the x dimension.
%0 = rocdl.cluster.workgroup.id.x : i32

// Read with a known range constraint.
%1 = rocdl.cluster.workgroup.id.x range <i32, 0, 64> : i32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type                        Description
range      ::mlir::LLVM::ConstantRangeAttr  A range of two integers, corresponding to LLVM's ConstantRange

A pair of two integers, mapping to the ConstantRange structure in LLVM IR, which is allowed to wrap or be empty.

The range represented is [Lower, Upper), and is either signed or unsigned depending on context.

lower and upper must have the same width.

Syntax:

`<` `i`(width($lower)) $lower `,` $upper `>`

Results:

Result  Description
res     LLVM dialect-compatible type

rocdl.cluster.workgroup.id.y (ROCDL::ClusterWorkgroupIdYOp) 

Syntax:

operation ::= `rocdl.cluster.workgroup.id.y` (`range` $range^)? attr-dict `:` type($res)

Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.

Example:

// Read the cluster workgroup id in the y dimension.
%0 = rocdl.cluster.workgroup.id.y : i32

// Read with a known range constraint.
%1 = rocdl.cluster.workgroup.id.y range <i32, 0, 64> : i32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type                        Description
range      ::mlir::LLVM::ConstantRangeAttr  A range of two integers, corresponding to LLVM's ConstantRange

A pair of two integers, mapping to the ConstantRange structure in LLVM IR, which is allowed to wrap or be empty.

The range represented is [Lower, Upper), and is either signed or unsigned depending on context.

lower and upper must have the same width.

Syntax:

`<` `i`(width($lower)) $lower `,` $upper `>`

Results:

Result  Description
res     LLVM dialect-compatible type

rocdl.cluster.workgroup.id.z (ROCDL::ClusterWorkgroupIdZOp) 

Syntax:

operation ::= `rocdl.cluster.workgroup.id.z` (`range` $range^)? attr-dict `:` type($res)

Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.

Example:

// Read the cluster workgroup id in the z dimension.
%0 = rocdl.cluster.workgroup.id.z : i32

// Read with a known range constraint.
%1 = rocdl.cluster.workgroup.id.z range <i32, 0, 64> : i32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type                        Description
range      ::mlir::LLVM::ConstantRangeAttr  A range of two integers, corresponding to LLVM's ConstantRange

A pair of two integers, mapping to the ConstantRange structure in LLVM IR, which is allowed to wrap or be empty.

The range represented is [Lower, Upper), and is either signed or unsigned depending on context.

lower and upper must have the same width.

Syntax:

`<` `i`(width($lower)) $lower `,` $upper `>`

Results:

Result  Description
res     LLVM dialect-compatible type

rocdl.cos (ROCDL::ROCDLCos) 

Syntax:

operation ::= `rocdl.cos` $arg qualified(type($arg)) attr-dict `->` qualified(type($res))

Note: In the general case, prefer the conventional arith, math, or llvm ops over this. Use this ROCDL-specific operation only when you fully understand its implication and when it is strictly necessary. This op is usually chosen when a small loss in precision is acceptable in exchange for higher execution speed.

Example:

%0 = rocdl.cos %a f32 -> f32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
arg      floating point LLVM type

Results:

Result  Description
res     floating point LLVM type

rocdl.cvt.f32.bf8 (ROCDL::CvtF32Bf8Op) 

Convert bf8 to f32

Syntax:

operation ::= `rocdl.cvt.f32.bf8` attr-dict $srcA `[` $byteSel `]` `:` type($res)

Convert the 8-bit bf8 value in byte byteSel of srcA to fp32.

Example:

// Convert bf8 byte 0 to f32.
%0 = rocdl.cvt.f32.bf8 %src[0] : f32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type            Description
byteSel    ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
srcA     32-bit signless integer

Results:

Result  Description
res     LLVM dialect-compatible type

rocdl.cvt.f32.fp8 (ROCDL::CvtF32Fp8Op) 

Convert fp8 to f32

Syntax:

operation ::= `rocdl.cvt.f32.fp8` attr-dict $srcA `[` $byteSel `]` `:` type($res)

Convert the 8-bit fp8 value in byte byteSel of srcA to fp32.

Example:

// Convert fp8 byte 0 to f32.
%0 = rocdl.cvt.f32.fp8 %src[0] : f32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type            Description
byteSel    ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
srcA     32-bit signless integer

Results:

Result  Description
res     LLVM dialect-compatible type

rocdl.cvt.pk.bf8.f32 (ROCDL::CvtPkBf8F32Op) 

Convert two f32 values to bf8

Syntax:

operation ::= `rocdl.cvt.pk.bf8.f32` attr-dict $srcA `,` $srcB `->` $old `[` $wordSel `]` `:` type($res)

Convert srcA and srcB to bf8 and store into the low/high word of old, preserving the other word.

Example:

// Pack two f32 values into bf8 in the low word of old.
%0 = rocdl.cvt.pk.bf8.f32 %a, %b -> %old[false] : i32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type            Description
wordSel    ::mlir::IntegerAttr  1-bit signless integer attribute

Operands:

Operand  Description
srcA     32-bit float
srcB     32-bit float
old      32-bit signless integer

Results:

Result  Description
res     LLVM dialect-compatible type

rocdl.cvt.pk.f32.bf8 (ROCDL::CvtPkF32Bf8Op) 

Convert packed bf8 to packed f32

Syntax:

operation ::= `rocdl.cvt.pk.f32.bf8` attr-dict $src `[` $wordSel `]` `:` type($res)

Convert the word of src selected by wordSel to packed fp32.

Example:

// Unpack bf8 word to packed f32.
%0 = rocdl.cvt.pk.f32.bf8 %src[false] : vector<2xf32>

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type            Description
wordSel    ::mlir::IntegerAttr  1-bit signless integer attribute

Operands:

Operand  Description
src      32-bit signless integer

Results:

Result  Description
res     LLVM dialect-compatible type

rocdl.cvt.pk.f32.fp8 (ROCDL::CvtPkF32Fp8Op) 

Convert packed fp8 to packed f32

Syntax:

operation ::= `rocdl.cvt.pk.f32.fp8` attr-dict $src `[` $wordSel `]` `:` type($res)

Convert the word of src selected by wordSel to packed fp32.

Example:

// Unpack fp8 word to packed f32.
%0 = rocdl.cvt.pk.f32.fp8 %src[false] : vector<2xf32>

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type            Description
wordSel    ::mlir::IntegerAttr  1-bit signless integer attribute

Operands:

Operand  Description
src      32-bit signless integer

Results:

Result  Description
res     LLVM dialect-compatible type

rocdl.cvt.pk.fp8.f32 (ROCDL::CvtPkFp8F32Op) 

Convert two f32 values to fp8

Syntax:

operation ::= `rocdl.cvt.pk.fp8.f32` attr-dict $srcA `,` $srcB `->` $old `[` $wordSel `]` `:` type($res)

Convert srcA and srcB to fp8 and store into the low/high word of old, preserving the other word.

Example:

// Pack two f32 values into fp8 in the low word of old.
%0 = rocdl.cvt.pk.fp8.f32 %a, %b -> %old[false] : i32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type            Description
wordSel    ::mlir::IntegerAttr  1-bit signless integer attribute

Operands:

Operand  Description
srcA     32-bit float
srcB     32-bit float
old      32-bit signless integer

Results:

Result  Description
res     LLVM dialect-compatible type

rocdl.cvt.pkrtz (ROCDL::CvtPkRtz) 

Convert two f32 inputs into a vector<2xf16>

Syntax:

operation ::= `rocdl.cvt.pkrtz` attr-dict $srcA `,` $srcB `:` type($res)

Convert two f32 values into a packed vector<2xf16>.

Example:

// Pack two f32 values into a vector<2xf16> with round-to-zero.
%0 = rocdl.cvt.pkrtz %a, %b : vector<2xf16>

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
srcA     32-bit float
srcB     32-bit float

Results:

Result  Description
res     LLVM dialect-compatible type

rocdl.cvt.scale.pk16.bf16.bf6 (ROCDL::CvtPkScalePk16Bf16Bf6Op) 

Scales 16 bf6 and converts them to 16 bf16.

Syntax:

operation ::= `rocdl.cvt.scale.pk16.bf16.bf6` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)

Available on gfx1250+.
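
Example (a sketch derived from the syntax above and the operand and result types listed for this op; the function name and the scaleSel value 0 are illustrative assumptions):

```mlir
// Hypothetical wrapper function; only the rocdl op comes from this entry.
llvm.func @cvt_scale_pk16_bf16_bf6(%src: vector<3xi32>, %scale: i32) -> vector<16xbf16> {
  // Scale 16 packed bf6 values and convert them to bf16; `0` is the scaleSel attribute.
  %res = rocdl.cvt.scale.pk16.bf16.bf6 %src, %scale[0] : vector<16xbf16>
  llvm.return %res : vector<16xbf16>
}
```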

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type            Description
scaleSel   ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
src      fixed-length vector of 32-bit signless integer values of length 3
scale    32-bit signless integer

Results:

Result  Description
res     fixed-length vector of bfloat16 type values of length 16

rocdl.cvt.scale.pk16.bf16.fp6 (ROCDL::CvtPkScalePk16Bf16Fp6Op) 

Scales 16 fp6 and converts them to 16 bf16.

Syntax:

operation ::= `rocdl.cvt.scale.pk16.bf16.fp6` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)

Available on gfx1250+.
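
Example (a sketch derived from the syntax above and the operand and result types listed for this op; the function name and the scaleSel value 0 are illustrative assumptions):

```mlir
// Hypothetical wrapper function; only the rocdl op comes from this entry.
llvm.func @cvt_scale_pk16_bf16_fp6(%src: vector<3xi32>, %scale: i32) -> vector<16xbf16> {
  // Scale 16 packed fp6 values and convert them to bf16; `0` is the scaleSel attribute.
  %res = rocdl.cvt.scale.pk16.bf16.fp6 %src, %scale[0] : vector<16xbf16>
  llvm.return %res : vector<16xbf16>
}
```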

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type            Description
scaleSel   ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
src      fixed-length vector of 32-bit signless integer values of length 3
scale    32-bit signless integer

Results:

Result  Description
res     fixed-length vector of bfloat16 type values of length 16

rocdl.cvt.scale.pk16.f16.bf6 (ROCDL::CvtPkScalePk16F16Bf6Op) 

Scales 16 bf6 and converts them to 16 f16.

Syntax:

operation ::= `rocdl.cvt.scale.pk16.f16.bf6` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)

Available on gfx1250+.
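
Example (a sketch derived from the syntax above and the operand and result types listed for this op; the function name and the scaleSel value 0 are illustrative assumptions):

```mlir
// Hypothetical wrapper function; only the rocdl op comes from this entry.
llvm.func @cvt_scale_pk16_f16_bf6(%src: vector<3xi32>, %scale: i32) -> vector<16xf16> {
  // Scale 16 packed bf6 values and convert them to f16; `0` is the scaleSel attribute.
  %res = rocdl.cvt.scale.pk16.f16.bf6 %src, %scale[0] : vector<16xf16>
  llvm.return %res : vector<16xf16>
}
```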

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type            Description
scaleSel   ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
src      fixed-length vector of 32-bit signless integer values of length 3
scale    32-bit signless integer

Results:

Result  Description
res     fixed-length vector of 16-bit float values of length 16

rocdl.cvt.scale.pk16.f16.fp6 (ROCDL::CvtPkScalePk16F16Fp6Op) 

Scales 16 fp6 and converts them to 16 f16.

Syntax:

operation ::= `rocdl.cvt.scale.pk16.f16.fp6` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)

Available on gfx1250+.
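
Example (a sketch derived from the syntax above and the operand and result types listed for this op; the function name and the scaleSel value 0 are illustrative assumptions):

```mlir
// Hypothetical wrapper function; only the rocdl op comes from this entry.
llvm.func @cvt_scale_pk16_f16_fp6(%src: vector<3xi32>, %scale: i32) -> vector<16xf16> {
  // Scale 16 packed fp6 values and convert them to f16; `0` is the scaleSel attribute.
  %res = rocdl.cvt.scale.pk16.f16.fp6 %src, %scale[0] : vector<16xf16>
  llvm.return %res : vector<16xf16>
}
```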

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type            Description
scaleSel   ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
src      fixed-length vector of 32-bit signless integer values of length 3
scale    32-bit signless integer

Results:

Result  Description
res     fixed-length vector of 16-bit float values of length 16

rocdl.cvt.scale.pk16.f32.bf6 (ROCDL::CvtPkScalePk16F32Bf6Op) 

Scales 16 bf6 and converts them to 16 f32.

Syntax:

operation ::= `rocdl.cvt.scale.pk16.f32.bf6` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)

Available on gfx1250+.
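
Example (a sketch derived from the syntax above and the operand and result types listed for this op; the function name and the scaleSel value 0 are illustrative assumptions):

```mlir
// Hypothetical wrapper function; only the rocdl op comes from this entry.
llvm.func @cvt_scale_pk16_f32_bf6(%src: vector<3xi32>, %scale: i32) -> vector<16xf32> {
  // Scale 16 packed bf6 values and convert them to f32; `0` is the scaleSel attribute.
  %res = rocdl.cvt.scale.pk16.f32.bf6 %src, %scale[0] : vector<16xf32>
  llvm.return %res : vector<16xf32>
}
```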

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type            Description
scaleSel   ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
src      fixed-length vector of 32-bit signless integer values of length 3
scale    32-bit signless integer

Results:

Result  Description
res     fixed-length vector of 32-bit float values of length 16

rocdl.cvt.scale.pk16.f32.fp6 (ROCDL::CvtPkScalePk16F32Fp6Op) 

Scales 16 fp6 and converts them to 16 f32.

Syntax:

operation ::= `rocdl.cvt.scale.pk16.f32.fp6` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)

Available on gfx1250+.
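
Example (a sketch derived from the syntax above and the operand and result types listed for this op; the function name and the scaleSel value 0 are illustrative assumptions):

```mlir
// Hypothetical wrapper function; only the rocdl op comes from this entry.
llvm.func @cvt_scale_pk16_f32_fp6(%src: vector<3xi32>, %scale: i32) -> vector<16xf32> {
  // Scale 16 packed fp6 values and convert them to f32; `0` is the scaleSel attribute.
  %res = rocdl.cvt.scale.pk16.f32.fp6 %src, %scale[0] : vector<16xf32>
  llvm.return %res : vector<16xf32>
}
```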

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type            Description
scaleSel   ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
src      fixed-length vector of 32-bit signless integer values of length 3
scale    32-bit signless integer

Results:

Result  Description
res     fixed-length vector of 32-bit float values of length 16

rocdl.cvt.scale.pk8.bf16.bf8 (ROCDL::CvtPkScalePk8Bf16Bf8Op) 

Scales 8 bf8 and converts them to 8 bf16.

Syntax:

operation ::= `rocdl.cvt.scale.pk8.bf16.bf8` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)

Available on gfx1250+.
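
Example (a sketch derived from the syntax above and the operand and result types listed for this op; the function name and the scaleSel value 0 are illustrative assumptions):

```mlir
// Hypothetical wrapper function; only the rocdl op comes from this entry.
llvm.func @cvt_scale_pk8_bf16_bf8(%src: vector<2xi32>, %scale: i32) -> vector<8xbf16> {
  // Scale 8 packed bf8 values and convert them to bf16; `0` is the scaleSel attribute.
  %res = rocdl.cvt.scale.pk8.bf16.bf8 %src, %scale[0] : vector<8xbf16>
  llvm.return %res : vector<8xbf16>
}
```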

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type            Description
scaleSel   ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
src      fixed-length vector of 32-bit signless integer values of length 2
scale    32-bit signless integer

Results:

Result  Description
res     fixed-length vector of bfloat16 type values of length 8

rocdl.cvt.scale.pk8.bf16.fp4 (ROCDL::CvtPkScalePk8Bf16Fp4Op) 

Scales 8 fp4 and converts them to 8 bf16.

Syntax:

operation ::= `rocdl.cvt.scale.pk8.bf16.fp4` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)

Available on gfx1250+.
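
Example (a sketch derived from the syntax above and the operand and result types listed for this op; the function name and the scaleSel value 0 are illustrative assumptions):

```mlir
// Hypothetical wrapper function; only the rocdl op comes from this entry.
llvm.func @cvt_scale_pk8_bf16_fp4(%src: i32, %scale: i32) -> vector<8xbf16> {
  // Scale the 8 fp4 values packed in one i32 and convert them to bf16;
  // `0` is the scaleSel attribute.
  %res = rocdl.cvt.scale.pk8.bf16.fp4 %src, %scale[0] : vector<8xbf16>
  llvm.return %res : vector<8xbf16>
}
```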

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type            Description
scaleSel   ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
src      32-bit signless integer
scale    32-bit signless integer

Results:

Result  Description
res     fixed-length vector of bfloat16 type values of length 8

rocdl.cvt.scale.pk8.bf16.fp8 (ROCDL::CvtPkScalePk8Bf16Fp8Op) 

Scales 8 fp8 and converts them to 8 bf16.

Syntax:

operation ::= `rocdl.cvt.scale.pk8.bf16.fp8` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)

Available on gfx1250+.
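
Example (a sketch derived from the syntax above and the operand and result types listed for this op; the function name and the scaleSel value 0 are illustrative assumptions):

```mlir
// Hypothetical wrapper function; only the rocdl op comes from this entry.
llvm.func @cvt_scale_pk8_bf16_fp8(%src: vector<2xi32>, %scale: i32) -> vector<8xbf16> {
  // Scale 8 packed fp8 values and convert them to bf16; `0` is the scaleSel attribute.
  %res = rocdl.cvt.scale.pk8.bf16.fp8 %src, %scale[0] : vector<8xbf16>
  llvm.return %res : vector<8xbf16>
}
```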

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type            Description
scaleSel   ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
src      fixed-length vector of 32-bit signless integer values of length 2
scale    32-bit signless integer

Results:

Result  Description
res     fixed-length vector of bfloat16 type values of length 8

rocdl.cvt.scale.pk8.f16.bf8 (ROCDL::CvtPkScalePk8F16Bf8Op) 

Scales 8 bf8 and converts them to 8 f16.

Syntax:

operation ::= `rocdl.cvt.scale.pk8.f16.bf8` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)

Available on gfx1250+.
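
Example (a sketch derived from the syntax above and the operand and result types listed for this op; the function name and the scaleSel value 0 are illustrative assumptions):

```mlir
// Hypothetical wrapper function; only the rocdl op comes from this entry.
llvm.func @cvt_scale_pk8_f16_bf8(%src: vector<2xi32>, %scale: i32) -> vector<8xf16> {
  // Scale 8 packed bf8 values and convert them to f16; `0` is the scaleSel attribute.
  %res = rocdl.cvt.scale.pk8.f16.bf8 %src, %scale[0] : vector<8xf16>
  llvm.return %res : vector<8xf16>
}
```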

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type            Description
scaleSel   ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
src      fixed-length vector of 32-bit signless integer values of length 2
scale    32-bit signless integer

Results:

Result  Description
res     fixed-length vector of 16-bit float values of length 8

rocdl.cvt.scale.pk8.f16.fp4 (ROCDL::CvtPkScalePk8F16Fp4Op) 

Scales 8 fp4 and converts them to 8 f16.

Syntax:

operation ::= `rocdl.cvt.scale.pk8.f16.fp4` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)

Available on gfx1250+.
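
Example (a sketch derived from the syntax above and the operand and result types listed for this op; the function name and the scaleSel value 0 are illustrative assumptions):

```mlir
// Hypothetical wrapper function; only the rocdl op comes from this entry.
llvm.func @cvt_scale_pk8_f16_fp4(%src: i32, %scale: i32) -> vector<8xf16> {
  // Scale the 8 fp4 values packed in one i32 and convert them to f16;
  // `0` is the scaleSel attribute.
  %res = rocdl.cvt.scale.pk8.f16.fp4 %src, %scale[0] : vector<8xf16>
  llvm.return %res : vector<8xf16>
}
```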

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type            Description
scaleSel   ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
src      32-bit signless integer
scale    32-bit signless integer

Results:

Result  Description
res     fixed-length vector of 16-bit float values of length 8

rocdl.cvt.scale.pk8.f16.fp8 (ROCDL::CvtPkScalePk8F16Fp8Op) 

Scales 8 fp8 and converts them to 8 f16.

Syntax:

operation ::= `rocdl.cvt.scale.pk8.f16.fp8` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)

Available on gfx1250+.
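
Example (a sketch derived from the syntax above and the operand and result types listed for this op; the function name and the scaleSel value 0 are illustrative assumptions):

```mlir
// Hypothetical wrapper function; only the rocdl op comes from this entry.
llvm.func @cvt_scale_pk8_f16_fp8(%src: vector<2xi32>, %scale: i32) -> vector<8xf16> {
  // Scale 8 packed fp8 values and convert them to f16; `0` is the scaleSel attribute.
  %res = rocdl.cvt.scale.pk8.f16.fp8 %src, %scale[0] : vector<8xf16>
  llvm.return %res : vector<8xf16>
}
```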

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type            Description
scaleSel   ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
src      fixed-length vector of 32-bit signless integer values of length 2
scale    32-bit signless integer

Results:

Result  Description
res     fixed-length vector of 16-bit float values of length 8

rocdl.cvt.scale.pk8.f32.bf8 (ROCDL::CvtPkScalePk8F32Bf8Op) 

Scales 8 bf8 and converts them to 8 f32.

Syntax:

operation ::= `rocdl.cvt.scale.pk8.f32.bf8` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)

Available on gfx1250+.
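
Example (a sketch derived from the syntax above and the operand and result types listed for this op; the function name and the scaleSel value 0 are illustrative assumptions):

```mlir
// Hypothetical wrapper function; only the rocdl op comes from this entry.
llvm.func @cvt_scale_pk8_f32_bf8(%src: vector<2xi32>, %scale: i32) -> vector<8xf32> {
  // Scale 8 packed bf8 values and convert them to f32; `0` is the scaleSel attribute.
  %res = rocdl.cvt.scale.pk8.f32.bf8 %src, %scale[0] : vector<8xf32>
  llvm.return %res : vector<8xf32>
}
```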

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type            Description
scaleSel   ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
src      fixed-length vector of 32-bit signless integer values of length 2
scale    32-bit signless integer

Results:

Result  Description
res     fixed-length vector of 32-bit float values of length 8

rocdl.cvt.scale.pk8.f32.fp4 (ROCDL::CvtPkScalePk8F32Fp4Op) 

Scales 8 fp4 and converts them to 8 f32.

Syntax:

operation ::= `rocdl.cvt.scale.pk8.f32.fp4` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)

Available on gfx1250+.
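
Example (a sketch derived from the syntax above and the operand and result types listed for this op; the function name and the scaleSel value 0 are illustrative assumptions):

```mlir
// Hypothetical wrapper function; only the rocdl op comes from this entry.
llvm.func @cvt_scale_pk8_f32_fp4(%src: i32, %scale: i32) -> vector<8xf32> {
  // Scale the 8 fp4 values packed in one i32 and convert them to f32;
  // `0` is the scaleSel attribute.
  %res = rocdl.cvt.scale.pk8.f32.fp4 %src, %scale[0] : vector<8xf32>
  llvm.return %res : vector<8xf32>
}
```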

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute  MLIR Type            Description
scaleSel   ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
src      32-bit signless integer
scale    32-bit signless integer

Results:

Result  Description
res     fixed-length vector of 32-bit float values of length 8

rocdl.cvt.scale.pk8.f32.fp8 (ROCDL::CvtPkScalePk8F32Fp8Op) 

Scales 8 fp8 and converts them to 8 f32.

Syntax:

operation ::= `rocdl.cvt.scale.pk8.f32.fp8` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)

Available on gfx1250+.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
scaleSel::mlir::IntegerAttr32-bit signless integer attribute

Operands: 

OperandDescription
srcfixed-length vector of 32-bit signless integer values of length 2
scale32-bit signless integer

Results: 

ResultDescription
resfixed-length vector of 32-bit float values of length 8

rocdl.cvt.scalef32.2xpk16.bf6.f32 (ROCDL::CvtScaleF322xPk16Bf6F32Op) 

Scale and convert two vector<16xf32> to 32 packed bf6

Syntax:

operation ::= `rocdl.cvt.scalef32.2xpk16.bf6.f32` attr-dict $src0 `,` $src1 `,` $scale `:` type($res)

Convert 32 single-precision float values, packed into two length-16 vectors that will be logically concatenated, to packed bf6, dividing by the exponent part of scale before doing so.
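A minimal sketch of how the two source vectors and the packed result fit together, following the syntax above. The function and value names are hypothetical.

```mlir
// Hypothetical wrapper: function and value names are illustrative only.
llvm.func @pack_32xf32_to_bf6(%lo: vector<16xf32>, %hi: vector<16xf32>,
                              %scale: f32) -> vector<6xi32> {
  // 32 values * 6 bits = 192 bits, which is why the result is vector<6xi32>.
  %res = rocdl.cvt.scalef32.2xpk16.bf6.f32 %lo, %hi, %scale : vector<6xi32>
  llvm.return %res : vector<6xi32>
}
```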

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
src0fixed-length vector of 32-bit float values of length 16
src1fixed-length vector of 32-bit float values of length 16
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 32-bit signless integer values of length 6

rocdl.cvt.scalef32.2xpk16.fp6.f32 (ROCDL::CvtScaleF322xPk16Fp6F32Op) 

Scale and convert two vector<16xf32> to 32 packed fp6

Syntax:

operation ::= `rocdl.cvt.scalef32.2xpk16.fp6.f32` attr-dict $src0 `,` $src1 `,` $scale `:` type($res)

Convert 32 single-precision float values, packed into two length-16 vectors that will be logically concatenated, to packed fp6, dividing by the exponent part of scale before doing so.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
src0fixed-length vector of 32-bit float values of length 16
src1fixed-length vector of 32-bit float values of length 16
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 32-bit signless integer values of length 6

rocdl.cvt.scalef32.f16.bf8 (ROCDL::CvtScaleF32F16Bf8Op) 

Scaled convert bf8 from packed vector to f16, updating tied result

Syntax:

operation ::= `rocdl.cvt.scalef32.f16.bf8` attr-dict $src `[` $srcSelIndex `]` `,` $scale `->` $oldVdst `[` $dstLoHiSel `]` `:` type($res)

Convert a bf8 byte from src, selected by srcSelIndex, to f16 while multiplying it by the exponent of scale, and place it into the element of oldVdst selected by dstLoHiSel, preserving the other element of that vector in the return value.

The bytes are stored as an i32 and not a <4 x i8>.
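The following sketch shows the tied-result form: one byte of the packed source is converted and written into one element of an existing vector. The function and value names are hypothetical, the selector values are arbitrary, and the 1-bit attribute is assumed to be written as `true`/`false`.

```mlir
// Hypothetical wrapper: names and selector values are illustrative only.
llvm.func @cvt_one_bf8_to_f16(%src: i32, %scale: f32,
                              %old: vector<2xf16>) -> vector<2xf16> {
  // Convert byte 0 of %src and update one element of %old;
  // the other element of %old is carried through to %res unchanged.
  %res = rocdl.cvt.scalef32.f16.bf8 %src[0], %scale -> %old[false] : vector<2xf16>
  llvm.return %res : vector<2xf16>
}
```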

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
srcSelIndex::mlir::IntegerAttr32-bit signless integer attribute
dstLoHiSel::mlir::IntegerAttr1-bit signless integer attribute

Operands: 

OperandDescription
oldVdstfixed-length vector of 16-bit float values of length 2
src32-bit signless integer
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 16-bit float values of length 2

rocdl.cvt.scalef32.f16.fp8 (ROCDL::CvtScaleF32F16Fp8Op) 

Scaled convert fp8 from packed vector to f16, updating tied result

Syntax:

operation ::= `rocdl.cvt.scalef32.f16.fp8` attr-dict $src `[` $srcSelIndex `]` `,` $scale `->` $oldVdst `[` $dstLoHiSel `]` `:` type($res)

Convert an fp8 byte from src, selected by srcSelIndex, to f16 while multiplying it by the exponent of scale, and place it into the element of oldVdst selected by dstLoHiSel, preserving the other element of that vector in the return value.

The bytes are stored as an i32 and not a <4 x i8>.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
srcSelIndex::mlir::IntegerAttr32-bit signless integer attribute
dstLoHiSel::mlir::IntegerAttr1-bit signless integer attribute

Operands: 

OperandDescription
oldVdstfixed-length vector of 16-bit float values of length 2
src32-bit signless integer
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 16-bit float values of length 2

rocdl.cvt.scalef32.f32.bf8 (ROCDL::CvtScaleF32F32Bf8Op) 

Scaled convert bf8 from packed vector to f32

Syntax:

operation ::= `rocdl.cvt.scalef32.f32.bf8` attr-dict $src `[` $srcSelIndex `]` `,` $scale `:` type($res)

Convert a bf8 byte from src, selected by srcSelIndex, to f32, multiplying it by the exponent of scale.

The bytes are stored in an i32, not a <4 x i8>.
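A short sketch of the scalar form, with hypothetical function and value names. Since the source is an i32 holding four bytes, the index is assumed here to range over 0 to 3.

```mlir
// Hypothetical wrapper: names and the byte index are illustrative only.
llvm.func @cvt_bf8_byte_to_f32(%src: i32, %scale: f32) -> f32 {
  // Convert byte 3 of the packed i32 to a single f32.
  %res = rocdl.cvt.scalef32.f32.bf8 %src[3], %scale : f32
  llvm.return %res : f32
}
```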

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
srcSelIndex::mlir::IntegerAttr32-bit signless integer attribute

Operands: 

OperandDescription
src32-bit signless integer
scale32-bit float

Results: 

ResultDescription
res32-bit float

rocdl.cvt.scalef32.f32.fp8 (ROCDL::CvtScaleF32F32Fp8Op) 

Scaled convert fp8 from packed vector to f32

Syntax:

operation ::= `rocdl.cvt.scalef32.f32.fp8` attr-dict $src `[` $srcSelIndex `]` `,` $scale `:` type($res)

Convert an fp8 byte from src, selected by srcSelIndex, to f32, multiplying it by the exponent of scale.

The bytes are stored in an i32, not a <4 x i8>.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
srcSelIndex::mlir::IntegerAttr32-bit signless integer attribute

Operands: 

OperandDescription
src32-bit signless integer
scale32-bit float

Results: 

ResultDescription
res32-bit float

rocdl.cvt.scalef32.pk.bf16.bf8 (ROCDL::CvtScaleF32PkBf16Bf8Op) 

Scaled convert two bf8 to two bf16

Syntax:

operation ::= `rocdl.cvt.scalef32.pk.bf16.bf8` attr-dict $src `[` $srcLoHiSel `]` `,` $scale `:` type($res)

Convert two packed bf8 values in src to two bf16 values, multiplying by the exponent in scale. The two values to be converted are selected from the low or high half of src (a packed vector represented as an i32) on the basis of srcLoHiSel.
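As a sketch of the half-selecting form, the example below converts one 16-bit half of the packed source. The names are hypothetical, and the 1-bit selector is assumed to be written as `true`/`false`; which half each value picks is not restated here.

```mlir
// Hypothetical wrapper: names and the selector value are illustrative only.
llvm.func @cvt_two_bf8_to_bf16(%src: i32, %scale: f32) -> vector<2xbf16> {
  // Two bf8 bytes from the selected half of %src become two bf16 values.
  %res = rocdl.cvt.scalef32.pk.bf16.bf8 %src[true], %scale : vector<2xbf16>
  llvm.return %res : vector<2xbf16>
}
```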

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
srcLoHiSel::mlir::IntegerAttr1-bit signless integer attribute

Operands: 

OperandDescription
src32-bit signless integer
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of bfloat16 type values of length 2

rocdl.cvt.scalef32.pk.bf16.fp4 (ROCDL::CvtScaleF32PkBf16Fp4Op) 

Scale and convert two packed fp4 to packed bf16

Syntax:

operation ::= `rocdl.cvt.scalef32.pk.bf16.fp4` attr-dict $src `[` $srcSelIndex `]` `,` $scale `:` type($res)

Convert two packed fp4 (f4E2M1) values stored as one byte of a 32-bit integer to packed bf16, multiplying by the exponent part of scale before doing so.

The byte to convert is chosen by srcSelIndex.
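A minimal sketch of the byte-indexed fp4 form, with hypothetical names and an arbitrary index. Because each byte holds two fp4 values, one byte yields the full two-element result.

```mlir
// Hypothetical wrapper: names and the byte index are illustrative only.
llvm.func @cvt_fp4_pair_to_bf16(%src: i32, %scale: f32) -> vector<2xbf16> {
  // Byte 2 of %src holds two fp4 values; both are converted to bf16.
  %res = rocdl.cvt.scalef32.pk.bf16.fp4 %src[2], %scale : vector<2xbf16>
  llvm.return %res : vector<2xbf16>
}
```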

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
srcSelIndex::mlir::IntegerAttr32-bit signless integer attribute

Operands: 

OperandDescription
src32-bit signless integer
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of bfloat16 type values of length 2

rocdl.cvt.scalef32.pk.bf16.fp8 (ROCDL::CvtScaleF32PkBf16Fp8Op) 

Scaled convert two fp8 to two bf16

Syntax:

operation ::= `rocdl.cvt.scalef32.pk.bf16.fp8` attr-dict $src `[` $srcLoHiSel `]` `,` $scale `:` type($res)

Convert two packed fp8 values in src to two bf16 values, multiplying by the exponent in scale. The two values to be converted are selected from the low or high half of src (a packed vector represented as an i32) on the basis of srcLoHiSel.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
srcLoHiSel::mlir::IntegerAttr1-bit signless integer attribute

Operands: 

OperandDescription
src32-bit signless integer
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of bfloat16 type values of length 2

rocdl.cvt.scalef32.pk.bf8.bf16 (ROCDL::CvtScaleF32PkBf8Bf16Op) 

Scaled convert two bf16 to two bf8, updating packed vector

Syntax:

operation ::= `rocdl.cvt.scalef32.pk.bf8.bf16` attr-dict $src0 `,` $scale `->` $oldVdst `[` $dstLoHiSel `]` `:` type($res)

Convert two bf16 values in src0 to two bf8 bytes, dividing by the exponent in scale. The bytes are packed into a 16-bit value which is inserted into oldVdst at the dstLoHiSel position, with the entire updated vector being returned.
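The down-conversion direction uses the tied-result form. In this sketch the names are hypothetical and the 1-bit selector is assumed to be written as `true`/`false`.

```mlir
// Hypothetical wrapper: names and the selector value are illustrative only.
llvm.func @cvt_two_bf16_to_bf8(%src0: vector<2xbf16>, %scale: f32,
                               %old: vector<2xi16>) -> vector<2xi16> {
  // The two bf8 bytes fill one 16-bit element of %old; the other is preserved.
  %res = rocdl.cvt.scalef32.pk.bf8.bf16 %src0, %scale -> %old[false] : vector<2xi16>
  llvm.return %res : vector<2xi16>
}
```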

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
dstLoHiSel::mlir::IntegerAttr1-bit signless integer attribute

Operands: 

OperandDescription
oldVdstfixed-length vector of 16-bit signless integer values of length 2
src0fixed-length vector of bfloat16 type values of length 2
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 16-bit signless integer values of length 2

rocdl.cvt.scalef32.pk.bf8.f16 (ROCDL::CvtScaleF32PkBf8F16Op) 

Scaled convert two f16 to two bf8, updating packed vector

Syntax:

operation ::= `rocdl.cvt.scalef32.pk.bf8.f16` attr-dict $src0 `,` $scale `->` $oldVdst `[` $dstLoHiSel `]` `:` type($res)

Convert two f16 values in src0 to two bf8 bytes, dividing by the exponent in scale. The bytes are packed into a 16-bit value which is inserted into oldVdst at the dstLoHiSel position, with the entire updated vector being returned.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
dstLoHiSel::mlir::IntegerAttr1-bit signless integer attribute

Operands: 

OperandDescription
oldVdstfixed-length vector of 16-bit signless integer values of length 2
src0fixed-length vector of 16-bit float values of length 2
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 16-bit signless integer values of length 2

rocdl.cvt.scalef32.pk.bf8.f32 (ROCDL::CvtScaleF32PkBf8F32Op) 

Scaled convert two f32 to two bf8, updating packed vector

Syntax:

operation ::= `rocdl.cvt.scalef32.pk.bf8.f32` attr-dict  $src0 `,` $src1 `,` $scale `->` $oldVdst `[` $dstLoHiSel `]` `:` type($res)

Convert two f32 values in src0 and src1 to two bf8 bytes, dividing by the exponent in scale. The bytes are packed into a 16-bit value which is inserted into oldVdst at the dstLoHiSel position, with the entire updated vector being returned.
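Because each op writes only one 16-bit element, filling a whole `vector<2xi16>` takes two ops chained through the tied operand. The sketch below assumes hypothetical names and `true`/`false` spelling for the 1-bit selector.

```mlir
// Hypothetical wrapper: names are illustrative only.
llvm.func @cvt_four_f32_to_bf8(%a: f32, %b: f32, %c: f32, %d: f32,
                               %scale: f32, %old: vector<2xi16>) -> vector<2xi16> {
  // The first op writes one element of %old.
  %half = rocdl.cvt.scalef32.pk.bf8.f32 %a, %b, %scale -> %old[false] : vector<2xi16>
  // The second op takes %half as its tied input and writes the other element.
  %full = rocdl.cvt.scalef32.pk.bf8.f32 %c, %d, %scale -> %half[true] : vector<2xi16>
  llvm.return %full : vector<2xi16>
}
```

After both ops, every element of the result has been overwritten, so the initial contents of `%old` no longer matter.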

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
dstLoHiSel::mlir::IntegerAttr1-bit signless integer attribute

Operands: 

OperandDescription
oldVdstfixed-length vector of 16-bit signless integer values of length 2
src032-bit float
src132-bit float
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 16-bit signless integer values of length 2

rocdl.cvt.scalef32.pk.f16.bf8 (ROCDL::CvtScaleF32PkF16Bf8Op) 

Scaled convert two bf8 to two f16

Syntax:

operation ::= `rocdl.cvt.scalef32.pk.f16.bf8` attr-dict $src `[` $srcLoHiSel `]` `,` $scale `:` type($res)

Convert two packed bf8 values in src to two f16 values, multiplying by the exponent in scale. The two values to be converted are selected from the low or high half of src (a packed vector represented as an i32) on the basis of srcLoHiSel.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
srcLoHiSel::mlir::IntegerAttr1-bit signless integer attribute

Operands: 

OperandDescription
src32-bit signless integer
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 16-bit float values of length 2

rocdl.cvt.scalef32.pk.f16.fp4 (ROCDL::CvtScaleF32PkF16Fp4Op) 

Scale and convert two packed fp4 to packed f16

Syntax:

operation ::= `rocdl.cvt.scalef32.pk.f16.fp4` attr-dict $src `[` $srcSelIndex `]` `,` $scale `:` type($res)

Convert two packed fp4 (f4E2M1) values stored as one byte of a 32-bit integer to packed f16, multiplying by the exponent part of scale before doing so.

The byte to convert is chosen by srcSelIndex.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
srcSelIndex::mlir::IntegerAttr32-bit signless integer attribute

Operands: 

OperandDescription
src32-bit signless integer
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 16-bit float values of length 2

rocdl.cvt.scalef32.pk.f16.fp8 (ROCDL::CvtScaleF32PkF16Fp8Op) 

Scaled convert two fp8 to two f16

Syntax:

operation ::= `rocdl.cvt.scalef32.pk.f16.fp8` attr-dict $src `[` $srcLoHiSel `]` `,` $scale `:` type($res)

Convert two packed fp8 values in src to two f16 values, multiplying by the exponent in scale. The two values to be converted are selected from the low or high half of src (a packed vector represented as an i32) on the basis of srcLoHiSel.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
srcLoHiSel::mlir::IntegerAttr1-bit signless integer attribute

Operands: 

OperandDescription
src32-bit signless integer
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 16-bit float values of length 2

rocdl.cvt.scalef32.pk.f32.bf8 (ROCDL::CvtScaleF32PkF32Bf8Op) 

Scaled convert two bf8 to two f32

Syntax:

operation ::= `rocdl.cvt.scalef32.pk.f32.bf8` attr-dict $src `[` $srcLoHiSel `]` `,` $scale `:` type($res)

Convert two packed bf8 values in src to two f32 values, multiplying by the exponent in scale. The two values to be converted are selected from the low or high half of src (a packed vector represented as an i32) on the basis of srcLoHiSel.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
srcLoHiSel::mlir::IntegerAttr1-bit signless integer attribute

Operands: 

OperandDescription
src32-bit signless integer
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 32-bit float values of length 2

rocdl.cvt.scalef32.pk.f32.fp4 (ROCDL::CvtScaleF32PkF32Fp4Op) 

Scale and convert two packed fp4 to packed f32

Syntax:

operation ::= `rocdl.cvt.scalef32.pk.f32.fp4` attr-dict $src `[` $srcSelIndex `]` `,` $scale `:` type($res)

Convert two packed fp4 (f4E2M1) values stored as one byte of a 32-bit integer to packed f32, multiplying by the exponent part of scale before doing so.

The byte to convert is chosen by srcSelIndex.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
srcSelIndex::mlir::IntegerAttr32-bit signless integer attribute

Operands: 

OperandDescription
src32-bit signless integer
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 32-bit float values of length 2

rocdl.cvt.scalef32.pk.f32.fp8 (ROCDL::CvtScaleF32PkF32Fp8Op) 

Scaled convert two fp8 to two f32

Syntax:

operation ::= `rocdl.cvt.scalef32.pk.f32.fp8` attr-dict $src `[` $srcLoHiSel `]` `,` $scale `:` type($res)

Convert two packed fp8 values in src to two f32 values, multiplying by the exponent in scale. The two values to be converted are selected from the low or high half of src (a packed vector represented as an i32) on the basis of srcLoHiSel.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
srcLoHiSel::mlir::IntegerAttr1-bit signless integer attribute

Operands: 

OperandDescription
src32-bit signless integer
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 32-bit float values of length 2

rocdl.cvt.scalef32.pk.fp4.bf16 (ROCDL::CvtScaleF32PkFp4Bf16Op) 

Scale and convert two bf16 to packed fp4, updating tied vector

Syntax:

operation ::= `rocdl.cvt.scalef32.pk.fp4.bf16` attr-dict $src `,` $scale `->` $oldVdst `[` $dstSelIndex `]` `:` type($res)

Convert two packed bf16 values to packed fp4, dividing by the exponent part of scale before doing so.

The two scaled values are packed into a byte. That byte is used to update the byte of oldVdst selected by dstSelIndex, and oldVdst is returned in its entirety.
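A minimal sketch of the byte-update form for a packed bf16 source. The function and value names are hypothetical and the byte index is arbitrary.

```mlir
// Hypothetical wrapper: names and the byte index are illustrative only.
llvm.func @cvt_two_bf16_to_fp4(%src: vector<2xbf16>, %scale: f32,
                               %old: i32) -> i32 {
  // Two fp4 values form one byte, which replaces byte 1 of %old.
  %res = rocdl.cvt.scalef32.pk.fp4.bf16 %src, %scale -> %old[1] : i32
  llvm.return %res : i32
}
```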

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
dstSelIndex::mlir::IntegerAttr32-bit signless integer attribute

Operands: 

OperandDescription
oldVdst32-bit signless integer
srcfixed-length vector of bfloat16 type values of length 2
scale32-bit float

Results: 

ResultDescription
res32-bit signless integer

rocdl.cvt.scalef32.pk.fp4.f16 (ROCDL::CvtScaleF32PkFp4F16Op) 

Scale and convert two f16 to packed fp4, updating tied vector

Syntax:

operation ::= `rocdl.cvt.scalef32.pk.fp4.f16` attr-dict $src `,` $scale `->` $oldVdst `[` $dstSelIndex `]` `:` type($res)

Convert two packed f16 values to packed fp4, dividing by the exponent part of scale before doing so.

The two scaled values are packed into a byte. That byte is used to update the byte of oldVdst selected by dstSelIndex, and oldVdst is returned in its entirety.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
dstSelIndex::mlir::IntegerAttr32-bit signless integer attribute

Operands: 

OperandDescription
oldVdst32-bit signless integer
srcfixed-length vector of 16-bit float values of length 2
scale32-bit float

Results: 

ResultDescription
res32-bit signless integer

rocdl.cvt.scalef32.pk.fp4.f32 (ROCDL::CvtScaleF32PkFp4F32Op) 

Scale and convert two f32 values to two packed fp4, updating tied vector

Syntax:

operation ::= `rocdl.cvt.scalef32.pk.fp4.f32` attr-dict $src0 `,` $src1 `,` $scale `->` $oldVdst `[` $dstSelIndex `]` `:` type($res)

Convert two single-precision float values, passed in src0 and src1, into two fp4 values, dividing them by the exponent part of scale before doing so.

The two scaled values are packed into a byte. That byte is used to update the byte of oldVdst selected by dstSelIndex, and oldVdst is returned in its entirety.

Example:

// Scaled convert two f32 values to packed fp4 in byte 0 of old.
%0 = rocdl.cvt.scalef32.pk.fp4.f32 %a, %b, %scale -> %old[0] : i32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
dstSelIndex::mlir::IntegerAttr32-bit signless integer attribute

Operands: 

OperandDescription
oldVdst32-bit signless integer
src032-bit float
src132-bit float
scale32-bit float

Results: 

ResultDescription
res32-bit signless integer

rocdl.cvt.scalef32.pk.fp8.bf16 (ROCDL::CvtScaleF32PkFp8Bf16Op) 

Scaled convert two bf16 to two fp8, updating packed vector

Syntax:

operation ::= `rocdl.cvt.scalef32.pk.fp8.bf16` attr-dict $src0 `,` $scale `->` $oldVdst `[` $dstLoHiSel `]` `:` type($res)

Convert two bf16 values in src0 to two fp8 bytes, dividing by the exponent in scale. The bytes are packed into a 16-bit value which is inserted into oldVdst at the dstLoHiSel position, with the entire updated vector being returned.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
dstLoHiSel::mlir::IntegerAttr1-bit signless integer attribute

Operands: 

OperandDescription
oldVdstfixed-length vector of 16-bit signless integer values of length 2
src0fixed-length vector of bfloat16 type values of length 2
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 16-bit signless integer values of length 2

rocdl.cvt.scalef32.pk.fp8.f16 (ROCDL::CvtScaleF32PkFp8F16Op) 

Scaled convert two f16 to two fp8, updating packed vector

Syntax:

operation ::= `rocdl.cvt.scalef32.pk.fp8.f16` attr-dict $src0 `,` $scale `->` $oldVdst `[` $dstLoHiSel `]` `:` type($res)

Convert two f16 values in src0 to two fp8 bytes, dividing by the exponent in scale. The bytes are packed into a 16-bit value which is inserted into oldVdst at the dstLoHiSel position, with the entire updated vector being returned.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
dstLoHiSel::mlir::IntegerAttr1-bit signless integer attribute

Operands: 

OperandDescription
oldVdstfixed-length vector of 16-bit signless integer values of length 2
src0fixed-length vector of 16-bit float values of length 2
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 16-bit signless integer values of length 2

rocdl.cvt.scalef32.pk.fp8.f32 (ROCDL::CvtScaleF32PkFp8F32Op) 

Scaled convert two f32 to two fp8, updating packed vector

Syntax:

operation ::= `rocdl.cvt.scalef32.pk.fp8.f32` attr-dict  $src0 `,` $src1 `,` $scale `->` $oldVdst `[` $dstLoHiSel `]` `:` type($res)

Convert two f32 values in src0 and src1 to two fp8 bytes, dividing by the exponent in scale. The bytes are packed into a 16-bit value which is inserted into oldVdst at the dstLoHiSel position, with the entire updated vector being returned.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
dstLoHiSel::mlir::IntegerAttr1-bit signless integer attribute

Operands: 

OperandDescription
oldVdstfixed-length vector of 16-bit signless integer values of length 2
src032-bit float
src132-bit float
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 16-bit signless integer values of length 2

rocdl.cvt.scalef32.pk16.bf6.bf16 (ROCDL::CvtScaleF32Pk16Bf6Bf16Op) 

Scale and convert packed bf16 to packed bf6

Syntax:

operation ::= `rocdl.cvt.scalef32.pk16.bf6.bf16` attr-dict $src `,` $scale `:` type($res)

Convert 16 packed bf16 values to packed bf6, multiplying by the exponent part of scale before doing so. Available on gfx1250+.
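As a sketch of the pk16 form, using the operand and result types listed for this op. The function and value names are hypothetical.

```mlir
// Hypothetical wrapper: function and value names are illustrative only.
llvm.func @pack_16xbf16_to_bf6(%src: vector<16xbf16>, %scale: f32) -> vector<3xi32> {
  // 16 values * 6 bits = 96 bits, matching the vector<3xi32> result.
  %res = rocdl.cvt.scalef32.pk16.bf6.bf16 %src, %scale : vector<3xi32>
  llvm.return %res : vector<3xi32>
}
```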

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
srcfixed-length vector of bfloat16 type values of length 16
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 32-bit signless integer values of length 3

rocdl.cvt.scalef32.pk16.bf6.f16 (ROCDL::CvtScaleF32Pk16Bf6F16Op) 

Scale and convert packed f16 to packed bf6

Syntax:

operation ::= `rocdl.cvt.scalef32.pk16.bf6.f16` attr-dict $src `,` $scale `:` type($res)

Convert 16 packed f16 values to packed bf6, multiplying by the exponent part of scale before doing so. Available on gfx1250+.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
srcfixed-length vector of 16-bit float values of length 16
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 32-bit signless integer values of length 3

rocdl.cvt.scalef32.pk16.bf6.f32 (ROCDL::CvtScaleF32Pk16Bf6F32Op) 

Scale and convert packed f32 to packed bf6

Syntax:

operation ::= `rocdl.cvt.scalef32.pk16.bf6.f32` attr-dict $src `,` $scale `:` type($res)

Convert 16 packed f32 values to packed bf6, multiplying by the exponent part of scale before doing so. Available on gfx1250+.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
srcfixed-length vector of 32-bit float values of length 16
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 32-bit signless integer values of length 3

rocdl.cvt.scalef32.pk16.fp6.bf16 (ROCDL::CvtScaleF32Pk16Fp6Bf16Op) 

Scale and convert packed bf16 to packed fp6

Syntax:

operation ::= `rocdl.cvt.scalef32.pk16.fp6.bf16` attr-dict $src `,` $scale `:` type($res)

Convert 16 packed bf16 values to packed fp6, multiplying by the exponent part of scale before doing so. Available on gfx1250+.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
srcfixed-length vector of bfloat16 type values of length 16
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 32-bit signless integer values of length 3

rocdl.cvt.scalef32.pk16.fp6.f16 (ROCDL::CvtScaleF32Pk16Fp6F16Op) 

Scale and convert packed f16 to packed fp6

Syntax:

operation ::= `rocdl.cvt.scalef32.pk16.fp6.f16` attr-dict $src `,` $scale `:` type($res)

Convert 16 packed f16 values to packed fp6, multiplying by the exponent part of scale before doing so. Available on gfx1250+.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
srcfixed-length vector of 16-bit float values of length 16
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 32-bit signless integer values of length 3

rocdl.cvt.scalef32.pk16.fp6.f32 (ROCDL::CvtScaleF32Pk16Fp6F32Op) 

Scale and convert packed f32 to packed fp6

Syntax:

operation ::= `rocdl.cvt.scalef32.pk16.fp6.f32` attr-dict $src `,` $scale `:` type($res)

Convert 16 packed f32 values to packed fp6, multiplying by the exponent part of scale before doing so. Available on gfx1250+.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
srcfixed-length vector of 32-bit float values of length 16
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 32-bit signless integer values of length 3

rocdl.cvt.scalef32.pk32.bf16.bf6 (ROCDL::CvtScaleF32Pk32Bf16Bf6Op) 

Scale and convert packed bf6 to packed bf16

Syntax:

operation ::= `rocdl.cvt.scalef32.pk32.bf16.bf6` attr-dict $src `,` $scale `:` type($res)

Convert 32 packed bf6 values to packed bf16, multiplying by the exponent part of scale before doing so.
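A short sketch of the unpacking direction, with hypothetical function and value names. The six i32 words hold 32 six-bit values (192 bits in total).

```mlir
// Hypothetical wrapper: function and value names are illustrative only.
llvm.func @unpack_bf6_to_32xbf16(%src: vector<6xi32>, %scale: f32) -> vector<32xbf16> {
  // Each of the 32 packed bf6 values becomes one bf16 lane of the result.
  %res = rocdl.cvt.scalef32.pk32.bf16.bf6 %src, %scale : vector<32xbf16>
  llvm.return %res : vector<32xbf16>
}
```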

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
srcfixed-length vector of 32-bit signless integer values of length 6
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of bfloat16 type values of length 32

rocdl.cvt.scalef32.pk32.bf16.fp6 (ROCDL::CvtScaleF32Pk32Bf16Fp6Op) 

Scale and convert packed fp6 to packed bf16

Syntax:

operation ::= `rocdl.cvt.scalef32.pk32.bf16.fp6` attr-dict $src `,` $scale `:` type($res)

Convert 32 packed fp6 values to packed bf16, multiplying by the exponent part of scale before doing so.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
srcfixed-length vector of 32-bit signless integer values of length 6
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of bfloat16 type values of length 32

rocdl.cvt.scalef32.pk32.bf6.bf16 (ROCDL::CvtScaleF32Pk32Bf6Bf16Op) 

Scale and convert packed bf16 to packed bf6

Syntax:

operation ::= `rocdl.cvt.scalef32.pk32.bf6.bf16` attr-dict $src `,` $scale `:` type($res)

Convert 32 packed bf16 values to packed bf6, dividing by the exponent part of scale before doing so.
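Since this op divides by the scale and `rocdl.cvt.scalef32.pk32.bf16.bf6` multiplies by it, the two can be paired into a quantize-then-dequantize round trip. The sketch below is illustrative only: the names are hypothetical, and reusing the same `%scale` on both sides is an assumption about how a caller might use the ops.

```mlir
// Hypothetical round trip: names and the shared scale are illustrative only.
llvm.func @bf6_round_trip(%src: vector<32xbf16>, %scale: f32) -> vector<32xbf16> {
  // Pack 32 bf16 values into bf6, dividing by the exponent part of %scale.
  %packed = rocdl.cvt.scalef32.pk32.bf6.bf16 %src, %scale : vector<6xi32>
  // Unpack again, multiplying by the exponent part of the same %scale.
  %unpacked = rocdl.cvt.scalef32.pk32.bf16.bf6 %packed, %scale : vector<32xbf16>
  llvm.return %unpacked : vector<32xbf16>
}
```

The result generally differs from `%src` wherever a value is not exactly representable in bf6.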

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
srcfixed-length vector of bfloat16 type values of length 32
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 32-bit signless integer values of length 6

rocdl.cvt.scalef32.pk32.bf6.f16 (ROCDL::CvtScaleF32Pk32Bf6F16Op) 

Scale and convert packed f16 to packed bf6

Syntax:

operation ::= `rocdl.cvt.scalef32.pk32.bf6.f16` attr-dict $src `,` $scale `:` type($res)

Convert 32 packed f16 values to packed bf6, dividing by the exponent part of scale before doing so.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
srcfixed-length vector of 16-bit float values of length 32
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 32-bit signless integer values of length 6

rocdl.cvt.scalef32.pk32.f16.bf6 (ROCDL::CvtScaleF32Pk32F16Bf6Op) 

Scale and convert packed bf6 to packed f16

Syntax:

operation ::= `rocdl.cvt.scalef32.pk32.f16.bf6` attr-dict $src `,` $scale `:` type($res)

Convert 32 packed bf6 values to packed f16, multiplying by the exponent part of scale before doing so.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
srcfixed-length vector of 32-bit signless integer values of length 6
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 16-bit float values of length 32

rocdl.cvt.scalef32.pk32.f16.fp6 (ROCDL::CvtScaleF32Pk32F16Fp6Op) 

Scale and convert packed fp6 to packed f16

Syntax:

operation ::= `rocdl.cvt.scalef32.pk32.f16.fp6` attr-dict $src `,` $scale `:` type($res)

Convert 32 packed fp6 values to packed f16, multiplying by the exponent part of scale before doing so.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
srcfixed-length vector of 32-bit signless integer values of length 6
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 16-bit float values of length 32

rocdl.cvt.scalef32.pk32.f32.bf6 (ROCDL::CvtScaleF32Pk32F32Bf6Op) 

Scale and convert packed bf6 to packed f32

Syntax:

operation ::= `rocdl.cvt.scalef32.pk32.f32.bf6` attr-dict $src `,` $scale `:` type($res)

Convert 32 packed bf6 values to packed f32, multiplying by the exponent part of scale before doing so.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of 32-bit signless integer values of length 6
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit float values of length 32

rocdl.cvt.scalef32.pk32.f32.fp6 (ROCDL::CvtScaleF32Pk32F32Fp6Op) 

Scale and convert packed fp6 to packed f32

Syntax:

operation ::= `rocdl.cvt.scalef32.pk32.f32.fp6` attr-dict $src `,` $scale `:` type($res)

Convert 32 packed fp6 values to packed f32, multiplying by the exponent part of scale before doing so.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of 32-bit signless integer values of length 6
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit float values of length 32

rocdl.cvt.scalef32.pk32.fp6.bf16 (ROCDL::CvtScaleF32Pk32Fp6Bf16Op) 

Scale and convert packed bf16 to packed fp6

Syntax:

operation ::= `rocdl.cvt.scalef32.pk32.fp6.bf16` attr-dict $src `,` $scale `:` type($res)

Convert 32 packed bf16 values to packed fp6, dividing by the exponent part of scale before doing so.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of bfloat16 type values of length 32
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit signless integer values of length 6
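
As a rough illustration of "dividing by the exponent part of scale", the Python sketch below extracts the exponent field of an f32 and uses the resulting power of two as the divisor. It assumes that "exponent part" means the 8-bit biased exponent with the sign and mantissa ignored; zero, denormal, infinity, and NaN scales are not modeled. The function names are hypothetical.

```python
import struct


def scale_exponent_factor(scale: float) -> float:
    """Return 2**(E - 127), where E is the biased exponent field of scale as f32.

    Assumption: the sign and mantissa bits of scale do not contribute.
    """
    bits = struct.unpack("<I", struct.pack("<f", scale))[0]
    biased_exponent = (bits >> 23) & 0xFF
    return 2.0 ** (biased_exponent - 127)


def scaled_for_narrowing(value: float, scale: float) -> float:
    """Model the pre-conversion step of a narrowing op: divide by the factor."""
    return value / scale_exponent_factor(scale)
```

Under this assumption a scale of 12.0 behaves like 8.0, because only the exponent survives, so `scaled_for_narrowing(24.0, 12.0)` gives `3.0`.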

rocdl.cvt.scalef32.pk32.fp6.f16 (ROCDL::CvtScaleF32Pk32Fp6F16Op) 

Scale and convert packed f16 to packed fp6

Syntax:

operation ::= `rocdl.cvt.scalef32.pk32.fp6.f16` attr-dict $src `,` $scale `:` type($res)

Convert 32 packed f16 values to packed fp6, dividing by the exponent part of scale before doing so.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of 16-bit float values of length 32
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit signless integer values of length 6

rocdl.cvt.scalef32.pk8.bf8.bf16 (ROCDL::CvtScaleF32Pk8Bf8Bf16Op) 

Scale and convert packed bf16 to packed bf8

Syntax:

operation ::= `rocdl.cvt.scalef32.pk8.bf8.bf16` attr-dict $src `,` $scale `:` type($res)

Convert 8 packed bf16 values to packed bf8, multiplying by the exponent part of scale before doing so. This op is for gfx1250+ arch.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of bfloat16 type values of length 8
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit signless integer values of length 2

rocdl.cvt.scalef32.pk8.bf8.f16 (ROCDL::CvtScaleF32Pk8Bf8F16Op) 

Scale and convert packed f16 to packed bf8

Syntax:

operation ::= `rocdl.cvt.scalef32.pk8.bf8.f16` attr-dict $src `,` $scale `:` type($res)

Convert 8 packed f16 values to packed bf8, multiplying by the exponent part of scale before doing so. This op is for gfx1250+ arch.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of 16-bit float values of length 8
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit signless integer values of length 2

rocdl.cvt.scalef32.pk8.bf8.f32 (ROCDL::CvtScaleF32Pk8Bf8F32Op) 

Scale and convert packed f32 to packed bf8

Syntax:

operation ::= `rocdl.cvt.scalef32.pk8.bf8.f32` attr-dict $src `,` $scale `:` type($res)

Convert 8 packed f32 values to packed bf8, multiplying by the exponent part of scale before doing so. This op is for gfx1250+ arch.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of 32-bit float values of length 8
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit signless integer values of length 2

rocdl.cvt.scalef32.pk8.fp4.bf16 (ROCDL::CvtScaleF32Pk8Fp4Bf16Op) 

Scale and convert packed bf16 to packed fp4

Syntax:

operation ::= `rocdl.cvt.scalef32.pk8.fp4.bf16` attr-dict $src `,` $scale `:` type($res)

Convert 8 packed bf16 values to packed fp4, multiplying by the exponent part of scale before doing so. This op is for gfx1250+ arch.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of bfloat16 type values of length 8
scale    32-bit float

Results:

Result  Description
res     32-bit signless integer

rocdl.cvt.scalef32.pk8.fp4.f16 (ROCDL::CvtScaleF32Pk8Fp4F16Op) 

Scale and convert packed f16 to packed fp4

Syntax:

operation ::= `rocdl.cvt.scalef32.pk8.fp4.f16` attr-dict $src `,` $scale `:` type($res)

Convert 8 packed f16 values to packed fp4, multiplying by the exponent part of scale before doing so. This op is for gfx1250+ arch.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of 16-bit float values of length 8
scale    32-bit float

Results:

Result  Description
res     32-bit signless integer

rocdl.cvt.scalef32.pk8.fp4.f32 (ROCDL::CvtScaleF32Pk8Fp4F32Op) 

Scale and convert packed f32 to packed fp4

Syntax:

operation ::= `rocdl.cvt.scalef32.pk8.fp4.f32` attr-dict $src `,` $scale `:` type($res)

Convert 8 packed f32 values to packed fp4, multiplying by the exponent part of scale before doing so. This op is for gfx1250+ arch.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of 32-bit float values of length 8
scale    32-bit float

Results:

Result  Description
res     32-bit signless integer

rocdl.cvt.scalef32.pk8.fp8.bf16 (ROCDL::CvtScaleF32Pk8Fp8Bf16Op) 

Scale and convert packed bf16 to packed fp8

Syntax:

operation ::= `rocdl.cvt.scalef32.pk8.fp8.bf16` attr-dict $src `,` $scale `:` type($res)

Convert 8 packed bf16 values to packed fp8, multiplying by the exponent part of scale before doing so. This op is for gfx1250+ arch.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of bfloat16 type values of length 8
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit signless integer values of length 2

rocdl.cvt.scalef32.pk8.fp8.f16 (ROCDL::CvtScaleF32Pk8Fp8F16Op) 

Scale and convert packed f16 to packed fp8

Syntax:

operation ::= `rocdl.cvt.scalef32.pk8.fp8.f16` attr-dict $src `,` $scale `:` type($res)

Convert 8 packed f16 values to packed fp8, multiplying by the exponent part of scale before doing so. This op is for gfx1250+ arch.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of 16-bit float values of length 8
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit signless integer values of length 2

rocdl.cvt.scalef32.pk8.fp8.f32 (ROCDL::CvtScaleF32Pk8Fp8F32Op) 

Scale and convert packed f32 to packed fp8

Syntax:

operation ::= `rocdl.cvt.scalef32.pk8.fp8.f32` attr-dict $src `,` $scale `:` type($res)

Convert 8 packed f32 values to packed fp8, multiplying by the exponent part of scale before doing so. This op is for gfx1250+ arch.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of 32-bit float values of length 8
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit signless integer values of length 2

rocdl.cvt.scalef32.sr.bf8.bf16 (ROCDL::CvtScaleF32SrBf8BF16Op) 

Scaled convert bf16 to bf8 with stochastic rounding, updating packed vector

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.bf8.bf16` attr-dict $src0 `,` $seed `,` $scale `->` $oldVdst `[` $dstSelIndex `]` `:` type($res)

Convert a bf16 value in src0 to a bf8 byte, dividing by the exponent in scale and using seed for stochastic rounding. Place the resulting byte in the dstSelIndex-th byte of oldVdst and return the entire packed vector, which is stored as an i32.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute    MLIR Type            Description
dstSelIndex  ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
oldVdst  32-bit signless integer
src0     bfloat16 type
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     32-bit signless integer
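
The "update one byte of oldVdst and return the whole i32" behavior described above can be modeled with plain bit arithmetic. The Python sketch below is illustrative only: `insert_byte` is a hypothetical name, and the assumption that index 0 selects the least significant byte is not stated in this reference.

```python
def insert_byte(old_vdst: int, byte: int, dst_sel_index: int) -> int:
    """Return old_vdst with byte number dst_sel_index replaced by byte.

    Assumption: index 0 is the least significant byte of the 32-bit value.
    """
    assert 0 <= dst_sel_index < 4
    shift = 8 * dst_sel_index
    cleared = old_vdst & ~(0xFF << shift) & 0xFFFFFFFF  # zero the target byte
    return cleared | ((byte & 0xFF) << shift)
```

For example, `insert_byte(0x11223344, 0xAB, 2)` leaves the other three bytes untouched and yields `0x11AB3344`.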

rocdl.cvt.scalef32.sr.bf8.f16 (ROCDL::CvtScaleF32SrBf8F16Op) 

Scaled convert f16 to bf8 with stochastic rounding, updating packed vector

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.bf8.f16` attr-dict $src0 `,` $seed `,` $scale `->` $oldVdst `[` $dstSelIndex `]` `:` type($res)

Convert an f16 value in src0 to a bf8 byte, dividing by the exponent in scale and using seed for stochastic rounding. Place the resulting byte in the dstSelIndex-th byte of oldVdst and return the entire packed vector, which is stored as an i32.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute    MLIR Type            Description
dstSelIndex  ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
oldVdst  32-bit signless integer
src0     16-bit float
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     32-bit signless integer

rocdl.cvt.scalef32.sr.bf8.f32 (ROCDL::CvtScaleF32SrBf8F32Op) 

Scaled convert f32 to bf8 with stochastic rounding, updating packed vector

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.bf8.f32` attr-dict $src0 `,` $seed `,` $scale `->` $oldVdst `[` $dstSelIndex `]` `:` type($res)

Convert an f32 value in src0 to a bf8 byte, dividing by the exponent in scale and using seed for stochastic rounding. Place the resulting byte in the dstSelIndex-th byte of oldVdst and return the entire packed vector, which is stored as an i32.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute    MLIR Type            Description
dstSelIndex  ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
oldVdst  32-bit signless integer
src0     32-bit float
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     32-bit signless integer

rocdl.cvt.scalef32.sr.fp8.bf16 (ROCDL::CvtScaleF32SrFp8BF16Op) 

Scaled convert bf16 to fp8 with stochastic rounding, updating packed vector

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.fp8.bf16` attr-dict $src0 `,` $seed `,` $scale `->` $oldVdst `[` $dstSelIndex `]` `:` type($res)

Convert a bf16 value in src0 to an fp8 byte, dividing by the exponent in scale and using seed for stochastic rounding. Place the resulting byte in the dstSelIndex-th byte of oldVdst and return the entire packed vector, which is stored as an i32.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute    MLIR Type            Description
dstSelIndex  ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
oldVdst  32-bit signless integer
src0     bfloat16 type
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     32-bit signless integer

rocdl.cvt.scalef32.sr.fp8.f16 (ROCDL::CvtScaleF32SrFp8F16Op) 

Scaled convert f16 to fp8 with stochastic rounding, updating packed vector

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.fp8.f16` attr-dict $src0 `,` $seed `,` $scale `->` $oldVdst `[` $dstSelIndex `]` `:` type($res)

Convert an f16 value in src0 to an fp8 byte, dividing by the exponent in scale and using seed for stochastic rounding. Place the resulting byte in the dstSelIndex-th byte of oldVdst and return the entire packed vector, which is stored as an i32.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute    MLIR Type            Description
dstSelIndex  ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
oldVdst  32-bit signless integer
src0     16-bit float
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     32-bit signless integer

rocdl.cvt.scalef32.sr.fp8.f32 (ROCDL::CvtScaleF32SrFp8F32Op) 

Scaled convert f32 to fp8 with stochastic rounding, updating packed vector

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.fp8.f32` attr-dict $src0 `,` $seed `,` $scale `->` $oldVdst `[` $dstSelIndex `]` `:` type($res)

Convert an f32 value in src0 to an fp8 byte, dividing by the exponent in scale and using seed for stochastic rounding. Place the resulting byte in the dstSelIndex-th byte of oldVdst and return the entire packed vector, which is stored as an i32.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute    MLIR Type            Description
dstSelIndex  ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
oldVdst  32-bit signless integer
src0     32-bit float
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     32-bit signless integer

rocdl.cvt.scalef32.sr.pk.fp4.bf16 (ROCDL::CvtScaleF32SrPkFp4Bf16Op) 

Scale and convert two bf16 to packed fp4 with stochastic rounding, updating tied vector

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.pk.fp4.bf16` attr-dict $src `,` $seed `,` $scale `->` $oldVdst `[` $dstSelIndex `]` `:` type($res)

Convert two packed bf16 values to packed fp4, dividing by the exponent part of scale before doing so and using seed as the random seed for stochastic rounding.

The two scaled values are packed (little-endian) into a byte. That byte is used to update the dstSelIndex-th byte of oldVdst, which is returned in its entirety.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute    MLIR Type            Description
dstSelIndex  ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
oldVdst  32-bit signless integer
src      fixed-length vector of bfloat16 type values of length 2
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     32-bit signless integer
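
The packing step described above (two 4-bit results combined into one byte, which then replaces one byte of oldVdst) can be sketched in Python as follows. The function names are hypothetical. The sketch reads "little-endian" as "first value in the low nibble" and treats index 0 as the least significant byte; both readings are assumptions.

```python
def pack_fp4_pair(first: int, second: int) -> int:
    """Combine two 4-bit codes into a byte, first code in the low nibble (assumed)."""
    return (first & 0xF) | ((second & 0xF) << 4)


def update_packed(old_vdst: int, first: int, second: int, dst_sel_index: int) -> int:
    """Replace byte dst_sel_index of a 32-bit value with the packed fp4 pair."""
    assert 0 <= dst_sel_index < 4
    shift = 8 * dst_sel_index
    cleared = old_vdst & ~(0xFF << shift) & 0xFFFFFFFF  # zero the target byte
    return cleared | (pack_fp4_pair(first, second) << shift)
```

With these assumptions, `update_packed(0xFFFFFFFF, 0x3, 0xA, 1)` produces `0xFFFFA3FF`.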

rocdl.cvt.scalef32.sr.pk.fp4.f16 (ROCDL::CvtScaleF32SrPkFp4F16Op) 

Scale and convert two f16 to packed fp4 with stochastic rounding, updating tied vector

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.pk.fp4.f16` attr-dict $src `,` $seed `,` $scale `->` $oldVdst `[` $dstSelIndex `]` `:` type($res)

Convert two packed f16 values to packed fp4, dividing by the exponent part of scale before doing so and using seed as the random seed for stochastic rounding.

The two scaled values are packed (little-endian) into a byte. That byte is used to update the dstSelIndex-th byte of oldVdst, which is returned in its entirety.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute    MLIR Type            Description
dstSelIndex  ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
oldVdst  32-bit signless integer
src      fixed-length vector of 16-bit float values of length 2
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     32-bit signless integer

rocdl.cvt.scalef32.sr.pk.fp4.f32 (ROCDL::CvtScaleF32SrPkFp4F32Op) 

Scale and convert two f32 to packed fp4 with stochastic rounding, updating tied vector

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.pk.fp4.f32` attr-dict $src `,` $seed `,` $scale `->` $oldVdst `[` $dstSelIndex `]` `:` type($res)

Convert two packed f32 values to packed fp4, dividing by the exponent part of scale before doing so and using seed as the random seed for stochastic rounding.

The two scaled values are packed (little-endian) into a byte. That byte is used to update the dstSelIndex-th byte of oldVdst, which is returned in its entirety.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute    MLIR Type            Description
dstSelIndex  ::mlir::IntegerAttr  32-bit signless integer attribute

Operands:

Operand  Description
oldVdst  32-bit signless integer
src      fixed-length vector of 32-bit float values of length 2
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     32-bit signless integer

rocdl.cvt.scalef32.sr.pk16.bf6.bf16 (ROCDL::CvtScaleF32SrPk16Bf6Bf16Op) 

Scale and convert packed bf16 to packed bf6 with stochastic rounding

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.pk16.bf6.bf16` attr-dict $src `,` $seed `,` $scale `:` type($res)

Convert 16 packed bf16 values to packed bf6, multiplying by the exponent part of scale before doing so and applying stochastic rounding. This op is for gfx1250+ arch.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of bfloat16 type values of length 16
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit signless integer values of length 3

rocdl.cvt.scalef32.sr.pk16.bf6.f16 (ROCDL::CvtScaleF32SrPk16Bf6F16Op) 

Scale and convert packed f16 to packed bf6 with stochastic rounding

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.pk16.bf6.f16` attr-dict $src `,` $seed `,` $scale `:` type($res)

Convert 16 packed f16 values to packed bf6, multiplying by the exponent part of scale before doing so and applying stochastic rounding. This op is for gfx1250+ arch.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of 16-bit float values of length 16
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit signless integer values of length 3

rocdl.cvt.scalef32.sr.pk16.bf6.f32 (ROCDL::CvtScaleF32SrPk16Bf6F32Op) 

Scale and convert packed f32 to packed bf6 with stochastic rounding

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.pk16.bf6.f32` attr-dict $src `,` $seed `,` $scale `:` type($res)

Convert 16 packed f32 values to packed bf6, multiplying by the exponent part of scale before doing so and applying stochastic rounding. This op is for gfx1250+ arch.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of 32-bit float values of length 16
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit signless integer values of length 3

rocdl.cvt.scalef32.sr.pk16.fp6.bf16 (ROCDL::CvtScaleF32SrPk16Fp6Bf16Op) 

Scale and convert packed bf16 to packed fp6 with stochastic rounding

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.pk16.fp6.bf16` attr-dict $src `,` $seed `,` $scale `:` type($res)

Convert 16 packed bf16 values to packed fp6, multiplying by the exponent part of scale before doing so and applying stochastic rounding. This op is for gfx1250+ arch.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of bfloat16 type values of length 16
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit signless integer values of length 3

rocdl.cvt.scalef32.sr.pk16.fp6.f16 (ROCDL::CvtScaleF32SrPk16Fp6F16Op) 

Scale and convert packed f16 to packed fp6 with stochastic rounding

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.pk16.fp6.f16` attr-dict $src `,` $seed `,` $scale `:` type($res)

Convert 16 packed f16 values to packed fp6, multiplying by the exponent part of scale before doing so and applying stochastic rounding. This op is for gfx1250+ arch.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of 16-bit float values of length 16
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit signless integer values of length 3

rocdl.cvt.scalef32.sr.pk16.fp6.f32 (ROCDL::CvtScaleF32SrPk16Fp6F32Op) 

Scale and convert packed f32 to packed fp6 with stochastic rounding

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.pk16.fp6.f32` attr-dict $src `,` $seed `,` $scale `:` type($res)

Convert 16 packed f32 values to packed fp6, multiplying by the exponent part of scale before doing so and applying stochastic rounding. This op is for gfx1250+ arch.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of 32-bit float values of length 16
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit signless integer values of length 3

rocdl.cvt.scalef32.sr.pk32.bf6.bf16 (ROCDL::CvtScaleF32SrPk32Bf6Bf16Op) 

Scale and convert packed bf16 to packed bf6 with stochastic rounding

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.pk32.bf6.bf16` attr-dict $src `,` $seed `,` $scale `:` type($res)

Convert 32 packed bf16 values to packed bf6, dividing by the exponent part of scale before doing so and applying random rounding derived from seed.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of bfloat16 type values of length 32
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit signless integer values of length 6
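
For readers unfamiliar with stochastic rounding, the Python sketch below shows the general idea on a non-negative integer magnitude: random bits are added below the truncation point before the low bits are dropped, so the chance of rounding up matches the discarded fraction. This is a generic illustration, not the algorithm the hardware uses; how seed is turned into random bits is not described in this reference, and the function name is hypothetical.

```python
def stochastic_round_drop_bits(value: int, drop_bits: int, random_bits: int) -> int:
    """Drop the low drop_bits bits of value, rounding up at random.

    random_bits supplies the randomness; only its low drop_bits bits are used.
    """
    assert value >= 0 and drop_bits >= 0
    mask = (1 << drop_bits) - 1
    return (value + (random_bits & mask)) >> drop_bits
```

For `value = 22` and `drop_bits = 2` (that is, 5.5 in units of 4), the four possible random inputs 0 to 3 give 5, 5, 6, 6, so the result rounds up half of the time. An exactly representable input such as 20 always gives 5.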

rocdl.cvt.scalef32.sr.pk32.bf6.f16 (ROCDL::CvtScaleF32SrPk32Bf6F16Op) 

Scale and convert packed f16 to packed bf6 with stochastic rounding

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.pk32.bf6.f16` attr-dict $src `,` $seed `,` $scale `:` type($res)

Convert 32 packed f16 values to packed bf6, dividing by the exponent part of scale before doing so and applying random rounding derived from seed.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of 16-bit float values of length 32
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit signless integer values of length 6

rocdl.cvt.scalef32.sr.pk32.bf6.f32 (ROCDL::CvtScaleF32SrPk32Bf6F32Op) 

Scale and convert packed f32 to packed bf6 with stochastic rounding

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.pk32.bf6.f32` attr-dict $src `,` $seed `,` $scale `:` type($res)

Convert 32 packed f32 values to packed bf6, dividing by the exponent part of scale before doing so and applying random rounding derived from seed.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of 32-bit float values of length 32
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit signless integer values of length 6

rocdl.cvt.scalef32.sr.pk32.fp6.bf16 (ROCDL::CvtScaleF32SrPk32Fp6Bf16Op) 

Scale and convert packed bf16 to packed fp6 with stochastic rounding

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.pk32.fp6.bf16` attr-dict $src `,` $seed `,` $scale `:` type($res)

Convert 32 packed bf16 values to packed fp6, dividing by the exponent part of scale before doing so and applying random rounding derived from seed.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of bfloat16 type values of length 32
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit signless integer values of length 6

rocdl.cvt.scalef32.sr.pk32.fp6.f16 (ROCDL::CvtScaleF32SrPk32Fp6F16Op) 

Scale and convert packed f16 to packed fp6 with stochastic rounding

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.pk32.fp6.f16` attr-dict $src `,` $seed `,` $scale `:` type($res)

Convert 32 packed f16 values to packed fp6, dividing by the exponent part of scale before doing so and applying random rounding derived from seed.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of 16-bit float values of length 32
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit signless integer values of length 6

rocdl.cvt.scalef32.sr.pk32.fp6.f32 (ROCDL::CvtScaleF32SrPk32Fp6F32Op) 

Scale and convert packed f32 to packed fp6 with stochastic rounding

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.pk32.fp6.f32` attr-dict $src `,` $seed `,` $scale `:` type($res)

Convert 32 packed f32 values to packed fp6, dividing by the exponent part of scale before doing so and applying random rounding derived from seed.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of 32-bit float values of length 32
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit signless integer values of length 6

rocdl.cvt.scalef32.sr.pk8.bf8.bf16 (ROCDL::CvtScaleF32SrPk8Bf8Bf16Op) 

Scale and convert packed bf16 to packed bf8 with stochastic rounding

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.pk8.bf8.bf16` attr-dict $src `,` $seed `,` $scale `:` type($res)

Convert 8 packed bf16 values to packed bf8, multiplying by the exponent part of scale before doing so and applying stochastic rounding. This op is for gfx1250+ arch.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of bfloat16 type values of length 8
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit signless integer values of length 2

rocdl.cvt.scalef32.sr.pk8.bf8.f16 (ROCDL::CvtScaleF32SrPk8Bf8F16Op) 

Scale and convert packed f16 to packed bf8 with stochastic rounding

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.pk8.bf8.f16` attr-dict $src `,` $seed `,` $scale `:` type($res)

Convert 8 packed f16 values to packed bf8, multiplying by the exponent part of scale before doing so and applying stochastic rounding. This op is for gfx1250+ arch.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of 16-bit float values of length 8
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit signless integer values of length 2

rocdl.cvt.scalef32.sr.pk8.bf8.f32 (ROCDL::CvtScaleF32SrPk8Bf8F32Op) 

Scale and convert packed f32 to packed bf8 with stochastic rounding

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.pk8.bf8.f32` attr-dict $src `,` $seed `,` $scale `:` type($res)

Convert 8 packed f32 values to packed bf8, multiplying by the exponent part of scale before doing so and applying stochastic rounding. This op is for gfx1250+ arch.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of 32-bit float values of length 8
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     fixed-length vector of 32-bit signless integer values of length 2

rocdl.cvt.scalef32.sr.pk8.fp4.bf16 (ROCDL::CvtScaleF32SrPk8Fp4Bf16Op) 

Scale and convert packed bf16 to packed fp4 with stochastic rounding

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.pk8.fp4.bf16` attr-dict $src `,` $seed `,` $scale `:` type($res)

Convert 8 packed bf16 values to packed fp4, multiplying by the exponent part of scale before doing so and applying stochastic rounding. This op is for gfx1250+ arch.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of bfloat16 type values of length 8
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     32-bit signless integer

rocdl.cvt.scalef32.sr.pk8.fp4.f16 (ROCDL::CvtScaleF32SrPk8Fp4F16Op) 

Scale and convert packed f16 to packed fp4 with stochastic rounding

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.pk8.fp4.f16` attr-dict $src `,` $seed `,` $scale `:` type($res)

Convert 8 packed f16 values to packed fp4, multiplying by the exponent part of scale before doing so and applying stochastic rounding. This op is for gfx1250+ arch.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of 16-bit float values of length 8
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     32-bit signless integer

rocdl.cvt.scalef32.sr.pk8.fp4.f32 (ROCDL::CvtScaleF32SrPk8Fp4F32Op) 

Scale and convert packed f32 to packed fp4 with stochastic rounding

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.pk8.fp4.f32` attr-dict $src `,` $seed `,` $scale `:` type($res)

Convert 8 packed f32 values to packed fp4, multiplying by the exponent part of scale before doing so and applying stochastic rounding. This op is for gfx1250+ arch.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand  Description
src      fixed-length vector of 32-bit float values of length 8
seed     32-bit signless integer
scale    32-bit float

Results:

Result  Description
res     32-bit signless integer

rocdl.cvt.scalef32.sr.pk8.fp8.bf16 (ROCDL::CvtScaleF32SrPk8Fp8Bf16Op) 

Scale and convert packed bf16 to packed fp8 with stochastic rounding

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.pk8.fp8.bf16` attr-dict $src `,` $seed `,` $scale `:` type($res)

Convert 8 packed bf16 values to packed fp8, multiplying by the exponent part of scale before the conversion and applying stochastic rounding. Available on gfx1250+.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
srcfixed-length vector of bfloat16 type values of length 8
seed32-bit signless integer
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 32-bit signless integer values of length 2
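The result types of the pk8 conversions follow from the packed width: eight 4-bit fp4 codes fill one 32-bit integer, while eight 8-bit fp8 codes need two. The sketch below shows that arithmetic in Python. The helper `pack_codes` is hypothetical, and placing element 0 in the lowest bits is an assumption made for illustration, not something this page specifies.

```python
def pack_codes(codes, bits_per_code):
    """Pack small integer codes into 32-bit words, element 0 in the lowest bits (assumed order)."""
    mask = (1 << bits_per_code) - 1
    packed = 0
    for i, code in enumerate(codes):
        packed |= (code & mask) << (i * bits_per_code)
    total_bits = len(codes) * bits_per_code
    # Split the packed integer into 32-bit words, lowest word first.
    return [(packed >> shift) & 0xFFFFFFFF for shift in range(0, total_bits, 32)]

fp4_words = pack_codes([1] * 8, 4)         # 8 x 4 bits = 32 bits, one i32
fp8_words = pack_codes(list(range(8)), 8)  # 8 x 8 bits = 64 bits, two i32
print([hex(w) for w in fp4_words])  # ['0x11111111']
print([hex(w) for w in fp8_words])  # ['0x3020100', '0x7060504']
```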

rocdl.cvt.scalef32.sr.pk8.fp8.f16 (ROCDL::CvtScaleF32SrPk8Fp8F16Op) 

Scale and convert packed f16 to packed fp8 with stochastic rounding

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.pk8.fp8.f16` attr-dict $src `,` $seed `,` $scale `:` type($res)

Convert 8 packed f16 values to packed fp8, multiplying by the exponent part of scale before the conversion and applying stochastic rounding. Available on gfx1250+.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
srcfixed-length vector of 16-bit float values of length 8
seed32-bit signless integer
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 32-bit signless integer values of length 2

rocdl.cvt.scalef32.sr.pk8.fp8.f32 (ROCDL::CvtScaleF32SrPk8Fp8F32Op) 

Scale and convert packed f32 to packed fp8 with stochastic rounding

Syntax:

operation ::= `rocdl.cvt.scalef32.sr.pk8.fp8.f32` attr-dict $src `,` $seed `,` $scale `:` type($res)

Convert 8 packed f32 values to packed fp8, multiplying by the exponent part of scale before the conversion and applying stochastic rounding. Available on gfx1250+.

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
srcfixed-length vector of 32-bit float values of length 8
seed32-bit signless integer
scale32-bit float

Results: 

ResultDescription
resfixed-length vector of 32-bit signless integer values of length 2

rocdl.cvt.sr.bf8.f32 (ROCDL::CvtSrBf8F32Op) 

Convert f32 to bf8, stochastic rounding

Syntax:

operation ::= `rocdl.cvt.sr.bf8.f32` attr-dict $srcA `,` $srcB `->` $old `[` $byteSel `]` `:` type($res)

Convert srcA to bf8, adding the rounding factor from srcB, and store the result into byte byteSel of old, preserving the other bytes.

Example:

// Stochastic rounding convert f32 to bf8 in byte 2 of old.
%0 = rocdl.cvt.sr.bf8.f32 %val, %stoch -> %old[2] : i32
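The byte-select behavior can be modeled independently of the conversion itself. The following Python sketch inserts an already converted 8-bit value into a 32-bit word; `insert_byte` is a hypothetical helper, and numbering byte 0 as the least significant byte is an assumption.

```python
def insert_byte(old: int, byte: int, byte_sel: int) -> int:
    """Replace byte `byte_sel` of the 32-bit word `old`, keeping the other three bytes."""
    if not 0 <= byte_sel <= 3:
        raise ValueError("byte_sel must be in 0..3")
    shift = 8 * byte_sel  # assumption: byte 0 is bits [7:0]
    cleared = old & ~(0xFF << shift) & 0xFFFFFFFF
    return cleared | ((byte & 0xFF) << shift)

# Mirrors the example above: write the converted value into byte 2 of old.
print(hex(insert_byte(0x11223344, 0xAB, 2)))  # 0x11ab3344
```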

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
byteSel::mlir::IntegerAttr32-bit signless integer attribute

Operands: 

OperandDescription
srcA32-bit float
srcB32-bit signless integer
old32-bit signless integer

Results: 

ResultDescription
resLLVM dialect-compatible type

rocdl.cvt.sr.fp8.f32 (ROCDL::CvtSrFp8F32Op) 

Convert f32 to fp8, stochastic rounding

Syntax:

operation ::= `rocdl.cvt.sr.fp8.f32` attr-dict $srcA `,` $srcB `->` $old `[` $byteSel `]` `:` type($res)

Convert srcA to fp8, adding the rounding factor from srcB, and store the result into byte byteSel of old, preserving the other bytes.

Example:

// Stochastic rounding convert f32 to fp8 in byte 3 of old.
%0 = rocdl.cvt.sr.fp8.f32 %val, %stoch -> %old[3] : i32
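"Adding the rounding factor" refers to the general stochastic rounding technique: random bits are added below the truncation point before the low bits are dropped, so a value rounds up with probability proportional to the discarded fraction. The Python sketch below shows the technique on plain integers. It is not the bit-exact hardware behavior, and `stochastic_round` is a hypothetical helper; which bits of srcB the hardware uses is not stated here.

```python
from fractions import Fraction

def stochastic_round(value: int, drop_bits: int, random_bits: int) -> int:
    """Drop the low `drop_bits` bits of `value` after adding random bits below the cut."""
    mask = (1 << drop_bits) - 1
    return (value + (random_bits & mask)) >> drop_bits

value, drop_bits = 0b1011, 2  # 11 / 4 = 2.75 before rounding
results = [stochastic_round(value, drop_bits, r) for r in range(1 << drop_bits)]
print(results)                               # [2, 3, 3, 3]
print(Fraction(sum(results), len(results)))  # 11/4
```

Averaged over all equally likely random inputs, the rounded result equals the unrounded value, which is the property that distinguishes stochastic rounding from round-to-nearest.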

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
byteSel::mlir::IntegerAttr32-bit signless integer attribute

Operands: 

OperandDescription
srcA32-bit float
srcB32-bit signless integer
old32-bit signless integer

Results: 

ResultDescription
resLLVM dialect-compatible type

rocdl.ds.atomic.async.barrier.arrive.b64 (ROCDL::DsAtomicAsyncBarrierArriveOp) 

Syntax:

operation ::= `rocdl.ds.atomic.async.barrier.arrive.b64` $barrierPtr attr-dict `:` qualified(type($barrierPtr))

Waits on a given DS barrier and decrements its pending count by 1. Stays in order with ASYNC loads to LDS and uses ASYNCcnt to track its completion. Available on gfx1250+.

Example:

// Async atomic barrier arrive (fire-and-forget).
rocdl.ds.atomic.async.barrier.arrive.b64 %ptr : !llvm.ptr<3>

Interfaces: AliasAnalysisOpInterface

Attributes: 

AttributeMLIR TypeDescription
alias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
noalias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
tbaa::mlir::ArrayAttrLLVM dialect TBAA tag metadata array

Operands: 

OperandDescription
barrierPtrLLVM pointer in address space 3

rocdl.ds.atomic.barrier.arrive.rtn.b64 (ROCDL::DsAtomicBarrierArriveRtnOp) 

Syntax:

operation ::= `rocdl.ds.atomic.barrier.arrive.rtn.b64` $barrierPtr `,` $val attr-dict `:` qualified(type($barrierPtr)) `,` type($val) `->` type($res)

Waits on a given DS barrier and decrements its pending count by a given value. The barrier state is a 64-bit structure containing the pending count, phase, and init count. The op returns the old barrier state. It is executed as an ordinary LDS operation and is ordered with other LDS operations, so check DSCNT to determine when the instruction has executed. Available on gfx1250+.

Example:

// Atomic barrier arrive with return of old barrier state.
%res = rocdl.ds.atomic.barrier.arrive.rtn.b64 %ptr, %val : !llvm.ptr<3>, i64 -> i64

Interfaces: AliasAnalysisOpInterface

Attributes: 

AttributeMLIR TypeDescription
alias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
noalias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
tbaa::mlir::ArrayAttrLLVM dialect TBAA tag metadata array

Operands: 

OperandDescription
barrierPtrLLVM pointer in address space 3
val64-bit signless integer

Results: 

ResultDescription
res64-bit signless integer

rocdl.ds.load.tr16.b128 (ROCDL::DsLoadTr16_B128) 

Loads and transposes a matrix from ds memory to registers (available in gfx1250+).

Syntax:

operation ::= `rocdl.ds.load.tr16.b128` $ptr attr-dict `:` qualified(type($ptr)) `->` type($res)

Load a matrix of 16-bit data from the ds memory, transpose data between row-major and column-major order, and store the result into a 128-bit vector register.

Available in gfx1250+.

Example (concrete mnemonics depend on address space and element size):

// 64-bit transpose load from global memory.
%0 = rocdl.global.load.tr4.b64 %ptr : !llvm.ptr<1> -> vector<2xi32>

// 128-bit transpose load from global memory with f16 result.
%1 = rocdl.global.load.tr.b128 %ptr : !llvm.ptr<1> -> vector<8xf16>

// 64-bit transpose load from LDS.
%2 = rocdl.ds.load.tr4.b64 %ptr : !llvm.ptr<3> -> vector<2xi32>

// 128-bit transpose load from LDS with bf16 result.
%3 = rocdl.ds.load.tr16.b128 %ptr : !llvm.ptr<3> -> vector<8xbf16>
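The result types in these examples follow from the element width and result width named in each mnemonic. The Python sketch below computes, per lane, how many matrix elements a transpose load returns and how many 32-bit words hold them. `transpose_load_shape` is a hypothetical helper used only for this arithmetic.

```python
def transpose_load_shape(element_bits: int, result_bits: int):
    """Return (elements per lane, 32-bit words per lane) for a transpose load."""
    if result_bits % element_bits or result_bits % 32:
        raise ValueError("result width must be a multiple of the element width and of 32")
    return result_bits // element_bits, result_bits // 32

for name, element_bits, result_bits in [
    ("tr4.b64", 4, 64),
    ("tr8.b64", 8, 64),
    ("tr6.b96", 6, 96),
    ("tr16.b128", 16, 128),
]:
    elements, words = transpose_load_shape(element_bits, result_bits)
    print(f"{name}: {elements} elements in {words} x i32")
# tr4.b64: 16 elements in 2 x i32
# tr8.b64: 8 elements in 2 x i32
# tr6.b96: 16 elements in 3 x i32
# tr16.b128: 8 elements in 4 x i32
```

This matches the examples above: tr4.b64 yields vector<2xi32>, and tr16.b128 yields eight 16-bit values such as vector<8xbf16>.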

Interfaces: AliasAnalysisOpInterface

Attributes: 

AttributeMLIR TypeDescription
alias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
noalias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
tbaa::mlir::ArrayAttrLLVM dialect TBAA tag metadata array

Operands: 

OperandDescription
ptrLLVM pointer in address space 3

Results: 

ResultDescription
resLLVM dialect-compatible type

rocdl.ds.load.tr4.b64 (ROCDL::DsLoadTr4_B64) 

Loads and transposes a matrix from ds memory to registers (available in gfx1250+).

Syntax:

operation ::= `rocdl.ds.load.tr4.b64` $ptr attr-dict `:` qualified(type($ptr)) `->` type($res)

Load a matrix of 4-bit data from the ds memory, transpose data between row-major and column-major order, and store the result into a 64-bit vector register.

Available in gfx1250+.

Example (concrete mnemonics depend on address space and element size):

// 64-bit transpose load from global memory.
%0 = rocdl.global.load.tr4.b64 %ptr : !llvm.ptr<1> -> vector<2xi32>

// 128-bit transpose load from global memory with f16 result.
%1 = rocdl.global.load.tr.b128 %ptr : !llvm.ptr<1> -> vector<8xf16>

// 64-bit transpose load from LDS.
%2 = rocdl.ds.load.tr4.b64 %ptr : !llvm.ptr<3> -> vector<2xi32>

// 128-bit transpose load from LDS with bf16 result.
%3 = rocdl.ds.load.tr16.b128 %ptr : !llvm.ptr<3> -> vector<8xbf16>

Interfaces: AliasAnalysisOpInterface

Attributes: 

AttributeMLIR TypeDescription
alias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
noalias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
tbaa::mlir::ArrayAttrLLVM dialect TBAA tag metadata array

Operands: 

OperandDescription
ptrLLVM pointer in address space 3

Results: 

ResultDescription
resLLVM dialect-compatible type

rocdl.ds.load.tr6.b96 (ROCDL::DsLoadTr6_B96) 

Loads and transposes a matrix from ds memory to registers (available in gfx1250+).

Syntax:

operation ::= `rocdl.ds.load.tr6.b96` $ptr attr-dict `:` qualified(type($ptr)) `->` type($res)

Load a matrix of 6-bit data from the ds memory, transpose data between row-major and column-major order, and store the result into a 96-bit vector register.

Available in gfx1250+.

Example (concrete mnemonics depend on address space and element size):

// 64-bit transpose load from global memory.
%0 = rocdl.global.load.tr4.b64 %ptr : !llvm.ptr<1> -> vector<2xi32>

// 128-bit transpose load from global memory with f16 result.
%1 = rocdl.global.load.tr.b128 %ptr : !llvm.ptr<1> -> vector<8xf16>

// 64-bit transpose load from LDS.
%2 = rocdl.ds.load.tr4.b64 %ptr : !llvm.ptr<3> -> vector<2xi32>

// 128-bit transpose load from LDS with bf16 result.
%3 = rocdl.ds.load.tr16.b128 %ptr : !llvm.ptr<3> -> vector<8xbf16>

Interfaces: AliasAnalysisOpInterface

Attributes: 

AttributeMLIR TypeDescription
alias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
noalias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
tbaa::mlir::ArrayAttrLLVM dialect TBAA tag metadata array

Operands: 

OperandDescription
ptrLLVM pointer in address space 3

Results: 

ResultDescription
resLLVM dialect-compatible type

rocdl.ds.load.tr8.b64 (ROCDL::DsLoadTr8_B64) 

Loads and transposes a matrix from ds memory to registers (available in gfx1250+).

Syntax:

operation ::= `rocdl.ds.load.tr8.b64` $ptr attr-dict `:` qualified(type($ptr)) `->` type($res)

Load a matrix of 8-bit data from the ds memory, transpose data between row-major and column-major order, and store the result into a 64-bit vector register.

Available in gfx1250+.

Example (concrete mnemonics depend on address space and element size):

// 64-bit transpose load from global memory.
%0 = rocdl.global.load.tr4.b64 %ptr : !llvm.ptr<1> -> vector<2xi32>

// 128-bit transpose load from global memory with f16 result.
%1 = rocdl.global.load.tr.b128 %ptr : !llvm.ptr<1> -> vector<8xf16>

// 64-bit transpose load from LDS.
%2 = rocdl.ds.load.tr4.b64 %ptr : !llvm.ptr<3> -> vector<2xi32>

// 128-bit transpose load from LDS with bf16 result.
%3 = rocdl.ds.load.tr16.b128 %ptr : !llvm.ptr<3> -> vector<8xbf16>

Interfaces: AliasAnalysisOpInterface

Attributes: 

AttributeMLIR TypeDescription
alias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
noalias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
tbaa::mlir::ArrayAttrLLVM dialect TBAA tag metadata array

Operands: 

OperandDescription
ptrLLVM pointer in address space 3

Results: 

ResultDescription
resLLVM dialect-compatible type

rocdl.ds.read.tr16.b64 (ROCDL::ds_read_tr16_b64) 

Syntax:

operation ::= `rocdl.ds.read.tr16.b64` $ptr attr-dict `:` type($ptr) `->` type($res)

Interfaces: AliasAnalysisOpInterface

Attributes: 

AttributeMLIR TypeDescription
alias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
noalias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
tbaa::mlir::ArrayAttrLLVM dialect TBAA tag metadata array

Operands: 

OperandDescription
ptrLLVM pointer in address space 3

Results: 

ResultDescription
resLLVM dialect-compatible type

rocdl.ds.read.tr4.b64 (ROCDL::ds_read_tr4_b64) 

Syntax:

operation ::= `rocdl.ds.read.tr4.b64` $ptr attr-dict `:` type($ptr) `->` type($res)

Interfaces: AliasAnalysisOpInterface

Attributes: 

AttributeMLIR TypeDescription
alias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
noalias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
tbaa::mlir::ArrayAttrLLVM dialect TBAA tag metadata array

Operands: 

OperandDescription
ptrLLVM pointer in address space 3

Results: 

ResultDescription
resLLVM dialect-compatible type

rocdl.ds.read.tr6.b96 (ROCDL::ds_read_tr6_b96) 

Syntax:

operation ::= `rocdl.ds.read.tr6.b96` $ptr attr-dict `:` type($ptr) `->` type($res)

Interfaces: AliasAnalysisOpInterface

Attributes: 

AttributeMLIR TypeDescription
alias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
noalias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
tbaa::mlir::ArrayAttrLLVM dialect TBAA tag metadata array

Operands: 

OperandDescription
ptrLLVM pointer in address space 3

Results: 

ResultDescription
resLLVM dialect-compatible type

rocdl.ds.read.tr8.b64 (ROCDL::ds_read_tr8_b64) 

Syntax:

operation ::= `rocdl.ds.read.tr8.b64` $ptr attr-dict `:` type($ptr) `->` type($res)

Interfaces: AliasAnalysisOpInterface

Attributes: 

AttributeMLIR TypeDescription
alias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
noalias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
tbaa::mlir::ArrayAttrLLVM dialect TBAA tag metadata array

Operands: 

OperandDescription
ptrLLVM pointer in address space 3

Results: 

ResultDescription
resLLVM dialect-compatible type

rocdl.ds_bpermute (ROCDL::DsBpermuteOp) 

Syntax:

operation ::= `rocdl.ds_bpermute` $index `,` $src  attr-dict `:` `(` type($index) `,` type($src) `)` `->` type($res)

Perform a backward permute (pull) operation across lanes using DS/LDS permute hardware.

Each lane reads the value of src from the lane whose byte address is given by index (i.e. lane id = index / 4).

This is “backward” (pull) in contrast to ds_permute_b32, which is “forward” (push/scatter).

Example:

// Backward permute across lanes (pull from selected lane).
%0 = rocdl.ds_bpermute %index, %src : (i32, i32) -> i32
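The pull semantics can be sketched as a small Python reference model in which each list position stands for one lane. The function name mirrors the op for readability only. Wrapping the lane id modulo the wave size is an assumption, and inactive lanes are not modeled.

```python
def ds_bpermute(index, src):
    """Each lane i returns src from lane index[i] // 4 (index is a byte address)."""
    wave_size = len(src)
    # Assumption: out-of-range lane ids wrap around the wave size.
    return [src[(idx // 4) % wave_size] for idx in index]

src = [10, 11, 12, 13]  # value held by lanes 0..3
index = [12, 8, 4, 0]   # byte addresses selecting lanes 3, 2, 1, 0
print(ds_bpermute(index, src))  # [13, 12, 11, 10]
```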

Operands: 

OperandDescription
index32-bit signless integer
src32-bit signless integer

Results: 

ResultDescription
res32-bit signless integer

rocdl.ds_swizzle (ROCDL::DsSwizzleOp) 

Syntax:

operation ::= `rocdl.ds_swizzle` $src `,` $offset  attr-dict `:` `(` type($src) `,` type($offset) `)` `->` type($res)

Perform a data-sharing swizzle operation within a wavefront.

The offset operand encodes the swizzle pattern that will be placed in the instruction’s offset field (i.e., the pattern used by ds_swizzle_b32). See https://llvm.org/docs/AMDGPUModifierSyntax.html#swizzle-pattern for how this 16-bit pattern is constructed.

Example:

// Swizzle data within a wavefront.
%0 = rocdl.ds_swizzle %src, %offset : (i32, i32) -> i32
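As an illustration of how an offset value might be built, the Python sketch below models only the bitmask mode of the swizzle pattern. It assumes the layout given in AMD's ISA documentation (bit 15 clear, and_mask in bits 4:0, or_mask in bits 9:5, xor_mask in bits 14:10, applied within groups of 32 lanes). Both helper names are hypothetical, and the linked pattern description remains the authoritative reference.

```python
def encode_bitmask_swizzle(and_mask: int, or_mask: int, xor_mask: int) -> int:
    """Build a bitmask-mode offset from three 5-bit masks (assumed layout)."""
    for mask in (and_mask, or_mask, xor_mask):
        if not 0 <= mask <= 0x1F:
            raise ValueError("each mask must fit in 5 bits")
    return and_mask | (or_mask << 5) | (xor_mask << 10)

def apply_bitmask_swizzle(src, offset: int):
    """Model: lane i reads from lane ((i & and) | or) ^ xor within its group of 32."""
    if len(src) % 32:
        raise ValueError("this sketch expects a multiple of 32 lanes")
    and_mask = offset & 0x1F
    or_mask = (offset >> 5) & 0x1F
    xor_mask = (offset >> 10) & 0x1F
    out = []
    for lane in range(len(src)):
        group_base = lane & ~0x1F
        source = (((lane & 0x1F) & and_mask) | or_mask) ^ xor_mask
        out.append(src[group_base + source])
    return out

offset = encode_bitmask_swizzle(0x1F, 0x00, 0x01)  # swap adjacent lanes
print(hex(offset))                                        # 0x41f
print(apply_bitmask_swizzle(list(range(32)), offset)[:4]) # [1, 0, 3, 2]
```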

Operands: 

OperandDescription
src32-bit signless integer
offset32-bit signless integer

Results: 

ResultDescription
res32-bit signless integer

rocdl.exp (ROCDL::ROCDLExp) 

Syntax:

operation ::= `rocdl.exp` $arg qualified(type($arg)) attr-dict `->` qualified(type($res))

Note: In the general case, prefer the conventional arith, math, or llvm ops over this one. Use this ROCDL-specific operation only when you fully understand its implications and when it is strictly necessary. This op is usually chosen when a small loss in precision is acceptable in exchange for higher execution speed.

Example:

%0 = rocdl.exp %a f32 -> f32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
argfloating point LLVM type

Results: 

ResultDescription
resfloating point LLVM type

rocdl.exp2 (ROCDL::ROCDLExp2) 

Syntax:

operation ::= `rocdl.exp2` $arg qualified(type($arg)) attr-dict `->` qualified(type($res))

Note: In the general case, prefer the conventional arith, math, or llvm ops over this one. Use this ROCDL-specific operation only when you fully understand its implications and when it is strictly necessary. This op is usually chosen when a small loss in precision is acceptable in exchange for higher execution speed.

Example:

%0 = rocdl.exp2 %a f32 -> f32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
argfloating point LLVM type

Results: 

ResultDescription
resfloating point LLVM type

rocdl.flat.prefetch (ROCDL::FlatPrefetchOp) 

Syntax:

operation ::= `rocdl.flat.prefetch` $ptr `,` `scope` $scope attr-dict `:` qualified(type($ptr))

Prefetches 1 byte of data per lane using flat-memory addresses into the WGP-cache or L2-cache. Available on gfx1250+.

Example:

// Prefetch from flat memory into cache.
rocdl.flat.prefetch %ptr, scope 0 : !llvm.ptr

Interfaces: AliasAnalysisOpInterface

Attributes: 

AttributeMLIR TypeDescription
scope::mlir::IntegerAttr32-bit signless integer attribute
alias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
noalias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
tbaa::mlir::ArrayAttrLLVM dialect TBAA tag metadata array

Operands: 

OperandDescription
ptrLLVM pointer in address space 0

rocdl.fmed3 (ROCDL::FMed3Op) 

Median of three float/half values

Syntax:

operation ::= `rocdl.fmed3` $src0 `,` $src1 `,` $src2 attr-dict `:` type($res)

Computes the median of three floating-point values using the AMDGPU fmed3 intrinsic. This operation is equivalent to max(min(a, b), min(max(a, b), c)) but uses the hardware-accelerated V_MED3_F16/V_MED3_F32 instruction for better performance.

The operation supports both scalar and vector floating-point types (f16, f32).

Example:

// Scalar f32 median
%result = rocdl.fmed3 %a, %b, %c : f32

// Vector f16 median
%result = rocdl.fmed3 %va, %vb, %vc : vector<4xf16>
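The min/max identity quoted above can be checked with a short Python sketch that compares it against sorting for every ordering of three distinct values. The function name mirrors the op for readability only, and NaN handling is not modeled.

```python
from itertools import permutations

def fmed3(a: float, b: float, c: float) -> float:
    """Median of three values via max(min(a, b), min(max(a, b), c))."""
    return max(min(a, b), min(max(a, b), c))

# The identity agrees with the sorted middle element for every argument order.
for a, b, c in permutations([1.0, 2.5, -3.0]):
    assert fmed3(a, b, c) == sorted([a, b, c])[1]

print(fmed3(1.0, 2.5, -3.0))  # 1.0
```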

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
src0floating point LLVM type or LLVM dialect-compatible vector of floating point LLVM type
src1floating point LLVM type or LLVM dialect-compatible vector of floating point LLVM type
src2floating point LLVM type or LLVM dialect-compatible vector of floating point LLVM type

Results: 

ResultDescription
resfloating point LLVM type or LLVM dialect-compatible vector of floating point LLVM type

rocdl.global.load.async.lds (ROCDL::GlobalLoadAsyncLDSOp) 

Version of rocdl.load.async.to.lds specialized to global pointers

Syntax:

operation ::= `rocdl.global.load.async.lds` $globalPtr `,`  $ldsPtr `,` $size `,` $offset `,` $aux
              attr-dict `:` qualified(type($globalPtr)) `,` qualified(type($ldsPtr))

This operation works identically to rocdl.load.async.to.lds except that the global pointer argument is limited to pointers in address space 1 (pure global pointers) instead of also allowing fat buffer pointers.

Available on gfx9 and gfx10.

For the operation introduced in gfx1250, see rocdl.global.load.async.to.lds.bN.

Example:

// Async load from global pointer to LDS (address space 1 only).
rocdl.global.load.async.lds %global, %shared, 4, 0, 0 : !llvm.ptr<1>, !llvm.ptr<3>

Interfaces: AliasAnalysisOpInterface

Attributes: 

AttributeMLIR TypeDescription
size::mlir::IntegerAttr32-bit signless integer attribute
offset::mlir::IntegerAttr32-bit signless integer attribute
aux::mlir::IntegerAttr32-bit signless integer attribute
alias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
noalias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
tbaa::mlir::ArrayAttrLLVM dialect TBAA tag metadata array

Operands: 

OperandDescription
globalPtrLLVM pointer in address space 1
ldsPtrLLVM pointer in address space 3

rocdl.global.load.async.to.lds.b128 (ROCDL::GlobalLoadAsyncToLDSB128Op) 

Syntax:

operation ::= `rocdl.global.load.async.to.lds.b128` $globalPtr `,`  $ldsPtr `,` $offset `,` $aux
              attr-dict `:` qualified(type($globalPtr)) `,` qualified(type($ldsPtr))

Asynchronously loads 128 bits of data from a global memory pointer to a Local Data Share (LDS) pointer.

Available on gfx1250+.

Example:

// Async 128-bit load from global to LDS.
rocdl.global.load.async.to.lds.b128 %src, %dst, 0, 0 : !llvm.ptr<1>, !llvm.ptr<3>

Interfaces: AliasAnalysisOpInterface

Attributes: 

AttributeMLIR TypeDescription
offset::mlir::IntegerAttr32-bit signless integer attribute
aux::mlir::IntegerAttr32-bit signless integer attribute
alias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
noalias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
tbaa::mlir::ArrayAttrLLVM dialect TBAA tag metadata array

Operands: 

OperandDescription
globalPtrLLVM pointer in address space 1
ldsPtrLLVM pointer in address space 3

rocdl.global.load.async.to.lds.b32 (ROCDL::GlobalLoadAsyncToLDSB32Op) 

Syntax:

operation ::= `rocdl.global.load.async.to.lds.b32` $globalPtr `,`  $ldsPtr `,` $offset `,` $aux
              attr-dict `:` qualified(type($globalPtr)) `,` qualified(type($ldsPtr))

Asynchronously loads 32 bits of data from a global memory pointer to a Local Data Share (LDS) pointer.

Available on gfx1250+.

Example:

// Async 32-bit load from global to LDS.
rocdl.global.load.async.to.lds.b32 %src, %dst, 0, 0 : !llvm.ptr<1>, !llvm.ptr<3>

Interfaces: AliasAnalysisOpInterface

Attributes: 

AttributeMLIR TypeDescription
offset::mlir::IntegerAttr32-bit signless integer attribute
aux::mlir::IntegerAttr32-bit signless integer attribute
alias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
noalias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
tbaa::mlir::ArrayAttrLLVM dialect TBAA tag metadata array

Operands: 

OperandDescription
globalPtrLLVM pointer in address space 1
ldsPtrLLVM pointer in address space 3

rocdl.global.load.async.to.lds.b64 (ROCDL::GlobalLoadAsyncToLDSB64Op) 

Syntax:

operation ::= `rocdl.global.load.async.to.lds.b64` $globalPtr `,`  $ldsPtr `,` $offset `,` $aux
              attr-dict `:` qualified(type($globalPtr)) `,` qualified(type($ldsPtr))

Asynchronously loads 64 bits of data from a global memory pointer to a Local Data Share (LDS) pointer.

Available on gfx1250+.

Example:

// Async 64-bit load from global to LDS.
rocdl.global.load.async.to.lds.b64 %src, %dst, 0, 0 : !llvm.ptr<1>, !llvm.ptr<3>

Interfaces: AliasAnalysisOpInterface

Attributes: 

AttributeMLIR TypeDescription
offset::mlir::IntegerAttr32-bit signless integer attribute
aux::mlir::IntegerAttr32-bit signless integer attribute
alias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
noalias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
tbaa::mlir::ArrayAttrLLVM dialect TBAA tag metadata array

Operands: 

OperandDescription
globalPtrLLVM pointer in address space 1
ldsPtrLLVM pointer in address space 3

rocdl.global.load.async.to.lds.b8 (ROCDL::GlobalLoadAsyncToLDSB8Op) 

Syntax:

operation ::= `rocdl.global.load.async.to.lds.b8` $globalPtr `,`  $ldsPtr `,` $offset `,` $aux
              attr-dict `:` qualified(type($globalPtr)) `,` qualified(type($ldsPtr))

Asynchronously loads 8 bits of data from a global memory pointer to a Local Data Share (LDS) pointer.

Available on gfx1250+.

Example:

// Async 8-bit load from global to LDS.
rocdl.global.load.async.to.lds.b8 %src, %dst, 0, 0 : !llvm.ptr<1>, !llvm.ptr<3>

Interfaces: AliasAnalysisOpInterface

Attributes: 

AttributeMLIR TypeDescription
offset::mlir::IntegerAttr32-bit signless integer attribute
aux::mlir::IntegerAttr32-bit signless integer attribute
alias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
noalias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
tbaa::mlir::ArrayAttrLLVM dialect TBAA tag metadata array

Operands: 

OperandDescription
globalPtrLLVM pointer in address space 1
ldsPtrLLVM pointer in address space 3

rocdl.global.load.lds (ROCDL::GlobalLoadLDSOp) 

Syntax:

operation ::= `rocdl.global.load.lds` $globalPtr `,`  $ldsPtr `,` $size `,` $offset `,` $aux
              attr-dict

Interfaces: AliasAnalysisOpInterface

Attributes: 

AttributeMLIR TypeDescription
size::mlir::IntegerAttr32-bit signless integer attribute
offset::mlir::IntegerAttr32-bit signless integer attribute
aux::mlir::IntegerAttr32-bit signless integer attribute
alias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
noalias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
tbaa::mlir::ArrayAttrLLVM dialect TBAA tag metadata array

Operands: 

OperandDescription
globalPtrLLVM pointer in address space 1
ldsPtrLLVM pointer in address space 3

rocdl.global.load.tr.b128 (ROCDL::GlobalLoadTr8_B128) 

Loads and transposes a matrix from global memory to registers (available in gfx1250+).

Syntax:

operation ::= `rocdl.global.load.tr.b128` $ptr attr-dict `:` qualified(type($ptr)) `->` type($res)

Load a matrix of 16-bit data from the global memory, transpose data between row-major and column-major order, and store the result into a 128-bit vector register.

Available in gfx1250+.

Example (concrete mnemonics depend on address space and element size):

// 64-bit transpose load from global memory.
%0 = rocdl.global.load.tr4.b64 %ptr : !llvm.ptr<1> -> vector<2xi32>

// 128-bit transpose load from global memory with f16 result.
%1 = rocdl.global.load.tr.b128 %ptr : !llvm.ptr<1> -> vector<8xf16>

// 64-bit transpose load from LDS.
%2 = rocdl.ds.load.tr4.b64 %ptr : !llvm.ptr<3> -> vector<2xi32>

// 128-bit transpose load from LDS with bf16 result.
%3 = rocdl.ds.load.tr16.b128 %ptr : !llvm.ptr<3> -> vector<8xbf16>

Interfaces: AliasAnalysisOpInterface

Attributes: 

AttributeMLIR TypeDescription
alias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
noalias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
tbaa::mlir::ArrayAttrLLVM dialect TBAA tag metadata array

Operands: 

OperandDescription
ptrLLVM pointer in address space 1

Results: 

ResultDescription
resLLVM dialect-compatible type

rocdl.global.load.tr.b64 (ROCDL::GlobalLoadTr8_B64) 

Loads and transposes a matrix from global memory to registers (available in gfx1250+).

Syntax:

operation ::= `rocdl.global.load.tr.b64` $ptr attr-dict `:` qualified(type($ptr)) `->` type($res)

Load a matrix of 8-bit data from the global memory, transpose data between row-major and column-major order, and store the result into a 64-bit vector register.

Available in gfx1250+.

Example (concrete mnemonics depend on address space and element size):

// 64-bit transpose load from global memory.
%0 = rocdl.global.load.tr4.b64 %ptr : !llvm.ptr<1> -> vector<2xi32>

// 128-bit transpose load from global memory with f16 result.
%1 = rocdl.global.load.tr.b128 %ptr : !llvm.ptr<1> -> vector<8xf16>

// 64-bit transpose load from LDS.
%2 = rocdl.ds.load.tr4.b64 %ptr : !llvm.ptr<3> -> vector<2xi32>

// 128-bit transpose load from LDS with bf16 result.
%3 = rocdl.ds.load.tr16.b128 %ptr : !llvm.ptr<3> -> vector<8xbf16>

Interfaces: AliasAnalysisOpInterface

Attributes: 

AttributeMLIR TypeDescription
alias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
noalias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
tbaa::mlir::ArrayAttrLLVM dialect TBAA tag metadata array

Operands: 

OperandDescription
ptrLLVM pointer in address space 1

Results: 

ResultDescription
resLLVM dialect-compatible type

rocdl.global.load.tr4.b64 (ROCDL::GlobalLoadTr4_B64) 

Loads and transposes a matrix from global memory to registers (available in gfx1250+).

Syntax:

operation ::= `rocdl.global.load.tr4.b64` $ptr attr-dict `:` qualified(type($ptr)) `->` type($res)

Load a matrix of 4-bit data from the global memory, transpose data between row-major and column-major order, and store the result into a 64-bit vector register.

Available in gfx1250+.

Example (concrete mnemonics depend on address space and element size):

// 64-bit transpose load from global memory.
%0 = rocdl.global.load.tr4.b64 %ptr : !llvm.ptr<1> -> vector<2xi32>

// 128-bit transpose load from global memory with f16 result.
%1 = rocdl.global.load.tr.b128 %ptr : !llvm.ptr<1> -> vector<8xf16>

// 64-bit transpose load from LDS.
%2 = rocdl.ds.load.tr4.b64 %ptr : !llvm.ptr<3> -> vector<2xi32>

// 128-bit transpose load from LDS with bf16 result.
%3 = rocdl.ds.load.tr16.b128 %ptr : !llvm.ptr<3> -> vector<8xbf16>

Interfaces: AliasAnalysisOpInterface

Attributes: 

AttributeMLIR TypeDescription
alias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
noalias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
tbaa::mlir::ArrayAttrLLVM dialect TBAA tag metadata array

Operands: 

OperandDescription
ptrLLVM pointer in address space 1

Results: 

ResultDescription
resLLVM dialect-compatible type

rocdl.global.load.tr6.b96 (ROCDL::GlobalLoadTr6_B96) 

Loads and transposes a matrix from global memory to registers (available in gfx1250+).

Syntax:

operation ::= `rocdl.global.load.tr6.b96` $ptr attr-dict `:` qualified(type($ptr)) `->` type($res)

Load a matrix of 6-bit data from the global memory, transpose data between row-major and column-major order, and store the result into a 96-bit vector register.

Available in gfx1250+.

Example (concrete mnemonics depend on address space and element size):

// 64-bit transpose load from global memory.
%0 = rocdl.global.load.tr4.b64 %ptr : !llvm.ptr<1> -> vector<2xi32>

// 128-bit transpose load from global memory with f16 result.
%1 = rocdl.global.load.tr.b128 %ptr : !llvm.ptr<1> -> vector<8xf16>

// 64-bit transpose load from LDS.
%2 = rocdl.ds.load.tr4.b64 %ptr : !llvm.ptr<3> -> vector<2xi32>

// 128-bit transpose load from LDS with bf16 result.
%3 = rocdl.ds.load.tr16.b128 %ptr : !llvm.ptr<3> -> vector<8xbf16>

Interfaces: AliasAnalysisOpInterface

Attributes: 

AttributeMLIR TypeDescription
alias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
noalias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
tbaa::mlir::ArrayAttrLLVM dialect TBAA tag metadata array

Operands: 

OperandDescription
ptrLLVM pointer in address space 1

Results: 

ResultDescription
resLLVM dialect-compatible type

rocdl.global.prefetch (ROCDL::GlobalPrefetchOp) 

Syntax:

operation ::= `rocdl.global.prefetch` $ptr `,` `scope` $scope attr-dict `:` qualified(type($ptr))

Prefetches 1 byte of data per lane from global memory into the WGP-cache or L2-cache. Available on gfx1250+.

Example:

// Prefetch from global memory into cache.
rocdl.global.prefetch %ptr, scope 0 : !llvm.ptr<1>

Interfaces: AliasAnalysisOpInterface

Attributes: 

AttributeMLIR TypeDescription
scope::mlir::IntegerAttr32-bit signless integer attribute
alias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
noalias_scopes::mlir::ArrayAttrLLVM dialect alias scope array
tbaa::mlir::ArrayAttrLLVM dialect TBAA tag metadata array

Operands: 

OperandDescription
ptrLLVM pointer in address space 1

rocdl.grid.dim.x (ROCDL::GridDimXOp) 

Syntax:

operation ::= `rocdl.grid.dim.x` (`range` $range^)? attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
range::mlir::LLVM::ConstantRangeAttr
A range of two integers, corresponding to LLVM's ConstantRange
A pair of two integers, mapping to the ConstantRange structure in LLVM IR,
which is allowed to wrap or be empty.

The range represented is [Lower, Upper), and is either signed or unsigned depending on context.

lower and upper must have the same width.

Syntax:

`<` `i`(width($lower)) $lower `,` $upper `>`

Results: 

ResultDescription
resLLVM dialect-compatible type
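The half-open, possibly wrapping range described for the range attribute can be modeled with a membership test. The Python sketch below uses the unsigned interpretation and follows LLVM's ConstantRange convention that equal bounds denote the full set when both are the maximum value and the empty set when both are zero. `in_constant_range` is a hypothetical helper.

```python
def in_constant_range(x: int, lower: int, upper: int, width: int) -> bool:
    """Unsigned membership test for the half-open range [lower, upper) at `width` bits."""
    modulus = 1 << width
    x, lower, upper = x % modulus, lower % modulus, upper % modulus
    if lower == upper:
        if lower == modulus - 1:
            return True   # full set
        if lower == 0:
            return False  # empty set
        raise ValueError("equal bounds must be the minimum or maximum value")
    if lower < upper:
        return lower <= x < upper
    # Wrapped range: covers [lower, 2**width) and [0, upper).
    return x >= lower or x < upper

print(in_constant_range(5, 0, 1024, 32))     # True
print(in_constant_range(1024, 0, 1024, 32))  # False, the upper bound is exclusive
print(in_constant_range(3, 250, 5, 8))       # True, the range wraps past 255
```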

rocdl.grid.dim.y (ROCDL::GridDimYOp) 

Syntax:

operation ::= `rocdl.grid.dim.y` (`range` $range^)? attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute | MLIR Type | Description
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange

A pair of two integers, mapping to the ConstantRange structure in LLVM IR, which is allowed to wrap or be empty.

The range represented is [Lower, Upper), and is either signed or unsigned depending on context.

lower and upper must have the same width.

Syntax:

`<` `i`(width($lower)) $lower `,` $upper `>`

Results:

Result | Description
res | LLVM dialect-compatible type

rocdl.grid.dim.z (ROCDL::GridDimZOp) 

Syntax:

operation ::= `rocdl.grid.dim.z` (`range` $range^)? attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute | MLIR Type | Description
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange

A pair of two integers, mapping to the ConstantRange structure in LLVM IR, which is allowed to wrap or be empty.

The range represented is [Lower, Upper), and is either signed or unsigned depending on context.

lower and upper must have the same width.

Syntax:

`<` `i`(width($lower)) $lower `,` $upper `>`

Results:

Result | Description
res | LLVM dialect-compatible type

rocdl.iglp.opt (ROCDL::IglpOpt) 

Syntax:

operation ::= `rocdl.iglp.opt` $variant attr-dict

Instruction-group-level parallelism optimization hint.

Example:

// IGLP optimization hint variant 0.
rocdl.iglp.opt 0

Attributes: 

Attribute | MLIR Type | Description
variant | ::mlir::IntegerAttr | 32-bit signless integer attribute

rocdl.load.async.to.lds (ROCDL::LoadAsyncToLDSOp) 

Gathering load to LDS that requires explicit async memory tracking

Syntax:

operation ::= `rocdl.load.async.to.lds` $globalPtr `,`  $ldsPtr `,` $size `,` $offset `,` $aux
              attr-dict `:` qualified(type($globalPtr)) `,` qualified(type($ldsPtr))

Loads size bytes (the valid sizes vary by architecture) from the global memory pointed to by globalPtr and writes them to LDS at ldsPtr. The value of globalPtr can vary between lanes, while ldsPtr must be subgroup-uniform: the values loaded by each lane are concatenated before being written to LDS, with padding applied for sizes less than 4 bytes and with 12-byte reads padded out to 16-byte writes.

offset is a constant offset applied to both pointers, and aux sets the cache policy. Unlike rocdl.load.to.lds, the compiler will not automatically insert waits for this load to complete at the point where it thinks you’re using a region of LDS you’ve stored values to; instead, you need to use the rocdl.asyncmark and rocdl.wait.asyncmark operations to explicitly group these operations and wait for their completion.

Available on gfx10 and earlier with varying supported values of size.

Example:

// Async load 4 bytes from global pointer to LDS.
rocdl.load.async.to.lds %global, %shared, 4, 0, 0 : !llvm.ptr<1>, !llvm.ptr<3>

// Async load 4 bytes from fat buffer pointer to LDS.
rocdl.load.async.to.lds %fatBuffer, %shared, 4, 0, 0 : !llvm.ptr<7>, !llvm.ptr<3>

Interfaces: AliasAnalysisOpInterface

Attributes: 

Attribute | MLIR Type | Description
size | ::mlir::IntegerAttr | 32-bit signless integer attribute
offset | ::mlir::IntegerAttr | 32-bit signless integer attribute
aux | ::mlir::IntegerAttr | 32-bit signless integer attribute
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array

Operands:

Operand | Description
globalPtr | LLVM pointer type
ldsPtr | LLVM pointer in address space 3

rocdl.load.to.lds (ROCDL::LoadToLDSOp) 

Syntax:

operation ::= `rocdl.load.to.lds` $globalPtr `,`  $ldsPtr `,` $size `,` $offset `,` $aux
              attr-dict `:` type($globalPtr)
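
Example (a sketch based on the syntax above and on the rocdl.load.async.to.lds example; %global and %shared are placeholder values):

```mlir
// Load 4 bytes from a global pointer to LDS.
// size = 4, offset = 0, aux = 0; only the global pointer type is spelled out.
rocdl.load.to.lds %global, %shared, 4, 0, 0 : !llvm.ptr<1>
```

Unlike the async variant, only the type of globalPtr appears after the colon; ldsPtr is constrained to address space 3 by the operand list.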

Interfaces: AliasAnalysisOpInterface

Attributes: 

Attribute | MLIR Type | Description
size | ::mlir::IntegerAttr | 32-bit signless integer attribute
offset | ::mlir::IntegerAttr | 32-bit signless integer attribute
aux | ::mlir::IntegerAttr | 32-bit signless integer attribute
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array

Operands:

Operand | Description
globalPtr | LLVM pointer type
ldsPtr | LLVM pointer in address space 3

rocdl.log (ROCDL::ROCDLLog) 

Syntax:

operation ::= `rocdl.log` $arg qualified(type($arg)) attr-dict `->` qualified(type($res))

Note: In the general case, prefer the conventional arith, math, or llvm ops over this one. Use this ROCDL-specific operation only when you fully understand its implications and when it is strictly necessary. This op is usually chosen when a small loss in precision is acceptable in exchange for higher execution speed.

Example:

%0 = rocdl.log %a f32 -> f32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand | Description
arg | floating point LLVM type

Results:

Result | Description
res | floating point LLVM type

rocdl.make.buffer.rsrc (ROCDL::MakeBufferRsrcOp) 

Syntax:

operation ::= `rocdl.make.buffer.rsrc` operands attr-dict `:` type($base) `to` type($res)
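
Example (a sketch derived from the syntax and operand list for this op; the value names and the choice of address spaces 1 and 8 are assumptions for illustration, with 8 being the AMDGPU buffer resource address space):

```mlir
// Assumed operand types, matching the operand list:
// %base : !llvm.ptr<1>, %stride : i16, %numRecords : i64, %flags : i32
%rsrc = rocdl.make.buffer.rsrc %base, %stride, %numRecords, %flags
    : !llvm.ptr<1> to !llvm.ptr<8>
```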

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand | Description
base | LLVM pointer type
stride | 16-bit signless integer
numRecords | 64-bit signless integer
flags | 32-bit signless integer

Results:

Result | Description
res | LLVM pointer type

rocdl.mbcnt.hi (ROCDL::MbcntHiOp) 

Syntax:

operation ::= `rocdl.mbcnt.hi` $in0 `,` $in1  attr-dict `:` `(` type($in0) `,` type($in1) `)` `->` type($res)

Masked bit count of threads below the current lane in a wavefront.

in0 is a 32-bit mask that is AND-ed with the relevant half of the execution mask and the bits below the current lane; in1 is added to the resulting popcount:

  • lo: in1 + popcount(in0 & exec_lo & ((1 << min(lane_id, 32)) - 1))
  • hi: in1 + popcount(in0 & exec_hi & ((1 << saturating_usub(lane_id, 32)) - 1))

To obtain a unique thread index within a wave64, chain the two ops with in0 = -1 (all bits set):

Example:

%all_ones = arith.constant -1 : i32
%zero = arith.constant 0 : i32

// Count active threads below this lane in the low 32 lanes.
%lo = rocdl.mbcnt.lo %all_ones, %zero : (i32, i32) -> i32

// Add the count from the high 32 lanes to get the full lane index.
%hi = rocdl.mbcnt.hi %all_ones, %lo : (i32, i32) -> i32
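
As a worked instance of the two formulas, the same chain can be traced by hand. The sketch below assumes lane_id = 37 in a wave64 with every lane active (exec_lo = exec_hi = 0xFFFFFFFF); these values are hypothetical and chosen only to make the arithmetic concrete:

```mlir
%all_ones = arith.constant -1 : i32
%zero = arith.constant 0 : i32

// lo: 0 + popcount(0xFFFFFFFF & 0xFFFFFFFF & ((1 << min(37, 32)) - 1))
//   = popcount(0xFFFFFFFF) = 32
%lo = rocdl.mbcnt.lo %all_ones, %zero : (i32, i32) -> i32

// hi: 32 + popcount(0xFFFFFFFF & 0xFFFFFFFF & ((1 << (37 - 32)) - 1))
//   = 32 + popcount(0x1F) = 32 + 5 = 37
%hi = rocdl.mbcnt.hi %all_ones, %lo : (i32, i32) -> i32
```

For a lane below 32, say lane 5, the lo step yields popcount(0x1F) = 5 and the hi step adds popcount(0) = 0, so the chain still returns the lane index.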

Traits: AlwaysSpeculatableImplTrait

Interfaces: ArgAndResultAttrsOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute | MLIR Type | Description
arg_attrs | ::mlir::ArrayAttr | Array of dictionary attributes
res_attrs | ::mlir::ArrayAttr | Array of dictionary attributes

Operands:

Operand | Description
in0 | 32-bit signless integer
in1 | 32-bit signless integer

Results:

Result | Description
res | 32-bit signless integer

rocdl.mbcnt.lo (ROCDL::MbcntLoOp) 

Syntax:

operation ::= `rocdl.mbcnt.lo` $in0 `,` $in1  attr-dict `:` `(` type($in0) `,` type($in1) `)` `->` type($res)

Masked bit count of threads below the current lane in a wavefront.

in0 is a 32-bit mask that is AND-ed with the relevant half of the execution mask and the bits below the current lane; in1 is added to the resulting popcount:

  • lo: in1 + popcount(in0 & exec_lo & ((1 << min(lane_id, 32)) - 1))
  • hi: in1 + popcount(in0 & exec_hi & ((1 << saturating_usub(lane_id, 32)) - 1))

To obtain a unique thread index within a wave64, chain the two ops with in0 = -1 (all bits set):

Example:

%all_ones = arith.constant -1 : i32
%zero = arith.constant 0 : i32

// Count active threads below this lane in the low 32 lanes.
%lo = rocdl.mbcnt.lo %all_ones, %zero : (i32, i32) -> i32

// Add the count from the high 32 lanes to get the full lane index.
%hi = rocdl.mbcnt.hi %all_ones, %lo : (i32, i32) -> i32
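
When every lane index is below 32, the hi formula adds popcount of an empty mask, which is zero. Under the assumption that the kernel runs in a 32-lane wavefront mode, the lo op alone should therefore give the lane index. A minimal sketch (value names are placeholders):

```mlir
%all_ones = arith.constant -1 : i32
%zero = arith.constant 0 : i32

// With lane_id < 32: 0 + popcount(0xFFFFFFFF & exec_lo & ((1 << lane_id) - 1)),
// which equals lane_id when all lower lanes are active.
%lane = rocdl.mbcnt.lo %all_ones, %zero : (i32, i32) -> i32
```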

Traits: AlwaysSpeculatableImplTrait

Interfaces: ArgAndResultAttrsOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
arg_attrs::mlir::ArrayAttrArray of dictionary attributes
res_attrs::mlir::ArrayAttrArray of dictionary attributes

Operands: 

OperandDescription
in032-bit signless integer
in132-bit signless integer

Results: 

ResultDescription
res32-bit signless integer

rocdl.mfma.f32.16x16x16bf16.1k (ROCDL::mfma_f32_16x16x16bf16_1k) 

Syntax:

operation ::= `rocdl.mfma.f32.16x16x16bf16.1k` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
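
The examples above exercise other members of the MFMA family. For this op, a call consistent with its operand list might look like the following sketch (value names are placeholders, and the i16 vectors presumably carry bf16 bit patterns):

```mlir
// 16x16x16 bf16 MFMA: four packed 16-bit inputs per lane for A and B,
// four f32 accumulator values per lane.
%d = rocdl.mfma.f32.16x16x16bf16.1k %a, %b, %c, 0, 0, 0 :
  (vector<4xi16>, vector<4xi16>, vector<4xf32>) -> vector<4xf32>
```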

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | fixed-length vector of 16-bit signless integer values of length 4
b | fixed-length vector of 16-bit signless integer values of length 4
c | fixed-length vector of 32-bit float values of length 4

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 4

rocdl.mfma.f32.16x16x16f16 (ROCDL::mfma_f32_16x16x16f16) 

Syntax:

operation ::= `rocdl.mfma.f32.16x16x16f16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | fixed-length vector of 16-bit float values of length 4
b | fixed-length vector of 16-bit float values of length 4
c | fixed-length vector of 32-bit float values of length 4

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 4

rocdl.mfma.f32.16x16x1f32 (ROCDL::mfma_f32_16x16x1f32) 

Syntax:

operation ::= `rocdl.mfma.f32.16x16x1f32` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
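
None of the generic examples above uses this op directly. Going by its operand list (scalar f32 inputs, 16-wide f32 accumulator), a call might be written as in this sketch, where the value names are placeholders:

```mlir
// 16x16x1 f32 MFMA: one f32 per lane for A and B,
// sixteen f32 accumulator values per lane.
%d = rocdl.mfma.f32.16x16x1f32 %a, %b, %c, 0, 0, 0 :
  (f32, f32, vector<16xf32>) -> vector<16xf32>
```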

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | 32-bit float
b | 32-bit float
c | fixed-length vector of 32-bit float values of length 16

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 16

rocdl.mfma.f32.16x16x2bf16 (ROCDL::mfma_f32_16x16x2bf16) 

Syntax:

operation ::= `rocdl.mfma.f32.16x16x2bf16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | fixed-length vector of 16-bit signless integer values of length 2
b | fixed-length vector of 16-bit signless integer values of length 2
c | fixed-length vector of 32-bit float values of length 16

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 16

rocdl.mfma.f32.16x16x32.bf16 (ROCDL::mfma_f32_16x16x32_bf16) 

Syntax:

operation ::= `rocdl.mfma.f32.16x16x32.bf16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
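
For this op specifically, the operand list calls for native bf16 vectors rather than i16. A sketch of a matching call (placeholder value names):

```mlir
// 16x16x32 bf16 MFMA: eight bf16 inputs per lane for A and B,
// four f32 accumulator values per lane.
%d = rocdl.mfma.f32.16x16x32.bf16 %a, %b, %c, 0, 0, 0 :
  (vector<8xbf16>, vector<8xbf16>, vector<4xf32>) -> vector<4xf32>
```

Assuming a 64-lane wavefront, the per-lane sizes line up with the tile shape: a 16x32 A tile has 512 elements, or 8 per lane, and the 16x16 result has 256 elements, or 4 per lane.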

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | fixed-length vector of bfloat16 type values of length 8
b | fixed-length vector of bfloat16 type values of length 8
c | fixed-length vector of 32-bit float values of length 4

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 4

rocdl.mfma.f32.16x16x32.bf8.bf8 (ROCDL::mfma_f32_16x16x32_bf8_bf8) 

Syntax:

operation ::= `rocdl.mfma.f32.16x16x32.bf8.bf8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
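
The 8-bit float variants take their A and B inputs as plain i64 values. A sketch of a call matching this op's operand list follows; the value names are placeholders, and the comment about packing is an assumption inferred from the 64-bit width rather than something stated here:

```mlir
// 16x16x32 bf8 x bf8 MFMA: A and B are i64 values, each presumably packing
// eight 8-bit floats; four f32 accumulator values per lane.
%d = rocdl.mfma.f32.16x16x32.bf8.bf8 %a, %b, %c, 0, 0, 0 :
  (i64, i64, vector<4xf32>) -> vector<4xf32>
```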

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | 64-bit signless integer
b | 64-bit signless integer
c | fixed-length vector of 32-bit float values of length 4

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 4

rocdl.mfma.f32.16x16x32.bf8.fp8 (ROCDL::mfma_f32_16x16x32_bf8_fp8) 

Syntax:

operation ::= `rocdl.mfma.f32.16x16x32.bf8.fp8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | 64-bit signless integer
b | 64-bit signless integer
c | fixed-length vector of 32-bit float values of length 4

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 4

rocdl.mfma.f32.16x16x32.f16 (ROCDL::mfma_f32_16x16x32_f16) 

Syntax:

operation ::= `rocdl.mfma.f32.16x16x32.f16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | fixed-length vector of 16-bit float values of length 8
b | fixed-length vector of 16-bit float values of length 8
c | fixed-length vector of 32-bit float values of length 4

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 4

rocdl.mfma.f32.16x16x32.fp8.bf8 (ROCDL::mfma_f32_16x16x32_fp8_bf8) 

Syntax:

operation ::= `rocdl.mfma.f32.16x16x32.fp8.bf8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | 64-bit signless integer
b | 64-bit signless integer
c | fixed-length vector of 32-bit float values of length 4

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 4

rocdl.mfma.f32.16x16x32.fp8.fp8 (ROCDL::mfma_f32_16x16x32_fp8_fp8) 

Syntax:

operation ::= `rocdl.mfma.f32.16x16x32.fp8.fp8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | 64-bit signless integer
b | 64-bit signless integer
c | fixed-length vector of 32-bit float values of length 4

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 4

rocdl.mfma.f32.16x16x4bf16.1k (ROCDL::mfma_f32_16x16x4bf16_1k) 

Syntax:

operation ::= `rocdl.mfma.f32.16x16x4bf16.1k` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | fixed-length vector of 16-bit signless integer values of length 4
b | fixed-length vector of 16-bit signless integer values of length 4
c | fixed-length vector of 32-bit float values of length 16

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 16

rocdl.mfma.f32.16x16x4f16 (ROCDL::mfma_f32_16x16x4f16) 

Syntax:

operation ::= `rocdl.mfma.f32.16x16x4f16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | fixed-length vector of 16-bit float values of length 4
b | fixed-length vector of 16-bit float values of length 4
c | fixed-length vector of 32-bit float values of length 16

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 16

rocdl.mfma.f32.16x16x4f32 (ROCDL::mfma_f32_16x16x4f32) 

Syntax:

operation ::= `rocdl.mfma.f32.16x16x4f32` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | 32-bit float
b | 32-bit float
c | fixed-length vector of 32-bit float values of length 4

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 4

rocdl.mfma.f32.16x16x8.xf32 (ROCDL::mfma_f32_16x16x8_xf32) 

Syntax:

operation ::= `rocdl.mfma.f32.16x16x8.xf32` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
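
A call to this particular op, written to match its operand list, might look like the sketch below (value names are placeholders):

```mlir
// 16x16x8 xf32 MFMA: two f32 inputs per lane for A and B,
// four f32 accumulator values per lane.
%d = rocdl.mfma.f32.16x16x8.xf32 %a, %b, %c, 0, 0, 0 :
  (vector<2xf32>, vector<2xf32>, vector<4xf32>) -> vector<4xf32>
```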

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | fixed-length vector of 32-bit float values of length 2
b | fixed-length vector of 32-bit float values of length 2
c | fixed-length vector of 32-bit float values of length 4

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 4

rocdl.mfma.f32.16x16x8bf16 (ROCDL::mfma_f32_16x16x8bf16) 

Syntax:

operation ::= `rocdl.mfma.f32.16x16x8bf16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | fixed-length vector of 16-bit signless integer values of length 2
b | fixed-length vector of 16-bit signless integer values of length 2
c | fixed-length vector of 32-bit float values of length 4

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 4

rocdl.mfma.f32.32x32x16.bf16 (ROCDL::mfma_f32_32x32x16_bf16) 

Syntax:

operation ::= `rocdl.mfma.f32.32x32x16.bf16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
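
For the 32x32x16 bf16 variant, the operand list gives 8-wide bf16 inputs and a 16-wide f32 accumulator. A sketch of a matching call (placeholder value names):

```mlir
// 32x32x16 bf16 MFMA: eight bf16 inputs per lane for A and B,
// sixteen f32 accumulator values per lane.
%d = rocdl.mfma.f32.32x32x16.bf16 %a, %b, %c, 0, 0, 0 :
  (vector<8xbf16>, vector<8xbf16>, vector<16xf32>) -> vector<16xf32>
```

Assuming a 64-lane wavefront, the 32x32 result tile has 1024 elements, or 16 per lane, which agrees with vector<16xf32>.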

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | fixed-length vector of bfloat16 type values of length 8
b | fixed-length vector of bfloat16 type values of length 8
c | fixed-length vector of 32-bit float values of length 16

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 16

rocdl.mfma.f32.32x32x16.bf8.bf8 (ROCDL::mfma_f32_32x32x16_bf8_bf8) 

Syntax:

operation ::= `rocdl.mfma.f32.32x32x16.bf8.bf8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | 64-bit signless integer
b | 64-bit signless integer
c | fixed-length vector of 32-bit float values of length 16

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 16

rocdl.mfma.f32.32x32x16.bf8.fp8 (ROCDL::mfma_f32_32x32x16_bf8_fp8) 

Syntax:

operation ::= `rocdl.mfma.f32.32x32x16.bf8.fp8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | 64-bit signless integer
b | 64-bit signless integer
c | fixed-length vector of 32-bit float values of length 16

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 16
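
For this mixed bf8/fp8 variant, a sketch using the types listed in the tables above might look as follows. The wrapper function name is hypothetical, and cbsz, abid, and blgp are set to 0 purely as placeholders:

```mlir
// Hypothetical wrapper; the three trailing attributes are placeholder zeros.
llvm.func @mfma_f32_32x32x16_bf8_fp8_sketch(%a: i64, %b: i64, %c: vector<16xf32>) -> vector<16xf32> {
  %r = rocdl.mfma.f32.32x32x16.bf8.fp8 %a, %b, %c, 0, 0, 0 :
    (i64, i64, vector<16xf32>) -> vector<16xf32>
  llvm.return %r : vector<16xf32>
}
```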

rocdl.mfma.f32.32x32x16.f16 (ROCDL::mfma_f32_32x32x16_f16) 

Syntax:

operation ::= `rocdl.mfma.f32.32x32x16.f16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | fixed-length vector of 16-bit float values of length 8
b | fixed-length vector of 16-bit float values of length 8
c | fixed-length vector of 32-bit float values of length 16

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 16
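
A minimal sketch specific to rocdl.mfma.f32.32x32x16.f16, where a and b are vector<8xf16> per the operand table. The wrapper function name is hypothetical, and the zero attribute values are illustrative only:

```mlir
// Hypothetical wrapper; cbsz, abid, and blgp are set to 0 for illustration.
llvm.func @mfma_f32_32x32x16_f16_sketch(%a: vector<8xf16>, %b: vector<8xf16>, %c: vector<16xf32>) -> vector<16xf32> {
  %r = rocdl.mfma.f32.32x32x16.f16 %a, %b, %c, 0, 0, 0 :
    (vector<8xf16>, vector<8xf16>, vector<16xf32>) -> vector<16xf32>
  llvm.return %r : vector<16xf32>
}
```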

rocdl.mfma.f32.32x32x16.fp8.bf8 (ROCDL::mfma_f32_32x32x16_fp8_bf8) 

Syntax:

operation ::= `rocdl.mfma.f32.32x32x16.fp8.bf8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | 64-bit signless integer
b | 64-bit signless integer
c | fixed-length vector of 32-bit float values of length 16

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 16
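
A sketch of the fp8/bf8 variant with the types from the tables above. The wrapper function name is hypothetical, and the attribute values of 0 are placeholders rather than recommended settings:

```mlir
// Hypothetical wrapper; the three trailing attributes are placeholder zeros.
llvm.func @mfma_f32_32x32x16_fp8_bf8_sketch(%a: i64, %b: i64, %c: vector<16xf32>) -> vector<16xf32> {
  %r = rocdl.mfma.f32.32x32x16.fp8.bf8 %a, %b, %c, 0, 0, 0 :
    (i64, i64, vector<16xf32>) -> vector<16xf32>
  llvm.return %r : vector<16xf32>
}
```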

rocdl.mfma.f32.32x32x16.fp8.fp8 (ROCDL::mfma_f32_32x32x16_fp8_fp8) 

Syntax:

operation ::= `rocdl.mfma.f32.32x32x16.fp8.fp8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | 64-bit signless integer
b | 64-bit signless integer
c | fixed-length vector of 32-bit float values of length 16

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 16
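
A sketch of the fp8/fp8 variant, again with i64 inputs and a 16-wide f32 accumulator as listed above. The wrapper function name is hypothetical, and the zero attribute values are illustrative:

```mlir
// Hypothetical wrapper; cbsz, abid, and blgp are set to 0 for illustration.
llvm.func @mfma_f32_32x32x16_fp8_fp8_sketch(%a: i64, %b: i64, %c: vector<16xf32>) -> vector<16xf32> {
  %r = rocdl.mfma.f32.32x32x16.fp8.fp8 %a, %b, %c, 0, 0, 0 :
    (i64, i64, vector<16xf32>) -> vector<16xf32>
  llvm.return %r : vector<16xf32>
}
```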

rocdl.mfma.f32.32x32x1f32 (ROCDL::mfma_f32_32x32x1f32) 

Syntax:

operation ::= `rocdl.mfma.f32.32x32x1f32` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | 32-bit float
b | 32-bit float
c | fixed-length vector of 32-bit float values of length 32

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 32

rocdl.mfma.f32.32x32x2bf16 (ROCDL::mfma_f32_32x32x2bf16) 

Syntax:

operation ::= `rocdl.mfma.f32.32x32x2bf16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | fixed-length vector of 16-bit signless integer values of length 2
b | fixed-length vector of 16-bit signless integer values of length 2
c | fixed-length vector of 32-bit float values of length 32

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 32

rocdl.mfma.f32.32x32x2f32 (ROCDL::mfma_f32_32x32x2f32) 

Syntax:

operation ::= `rocdl.mfma.f32.32x32x2f32` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | 32-bit float
b | 32-bit float
c | fixed-length vector of 32-bit float values of length 16

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 16
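
Note that this op takes a 16-wide accumulator, unlike the 32-wide rocdl.mfma.f32.32x32x1f32 in the shared example. A minimal sketch with the types from the tables above; the wrapper function name is hypothetical and the zero attribute values are illustrative:

```mlir
// Hypothetical wrapper; cbsz, abid, and blgp are set to 0 for illustration.
llvm.func @mfma_f32_32x32x2f32_sketch(%a: f32, %b: f32, %c: vector<16xf32>) -> vector<16xf32> {
  %r = rocdl.mfma.f32.32x32x2f32 %a, %b, %c, 0, 0, 0 :
    (f32, f32, vector<16xf32>) -> vector<16xf32>
  llvm.return %r : vector<16xf32>
}
```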

rocdl.mfma.f32.32x32x4.xf32 (ROCDL::mfma_f32_32x32x4_xf32) 

Syntax:

operation ::= `rocdl.mfma.f32.32x32x4.xf32` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | fixed-length vector of 32-bit float values of length 2
b | fixed-length vector of 32-bit float values of length 2
c | fixed-length vector of 32-bit float values of length 16

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 16
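
A sketch for the xf32 variant, whose a and b operands are vector<2xf32> per the operand table. The wrapper function name is hypothetical, and cbsz, abid, and blgp are placeholder zeros:

```mlir
// Hypothetical wrapper; the three trailing attributes are placeholder zeros.
llvm.func @mfma_f32_32x32x4_xf32_sketch(%a: vector<2xf32>, %b: vector<2xf32>, %c: vector<16xf32>) -> vector<16xf32> {
  %r = rocdl.mfma.f32.32x32x4.xf32 %a, %b, %c, 0, 0, 0 :
    (vector<2xf32>, vector<2xf32>, vector<16xf32>) -> vector<16xf32>
  llvm.return %r : vector<16xf32>
}
```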

rocdl.mfma.f32.32x32x4bf16 (ROCDL::mfma_f32_32x32x4bf16) 

Syntax:

operation ::= `rocdl.mfma.f32.32x32x4bf16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | fixed-length vector of 16-bit signless integer values of length 2
b | fixed-length vector of 16-bit signless integer values of length 2
c | fixed-length vector of 32-bit float values of length 16

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 16
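
This op shares the vector<2xi16> input type with rocdl.mfma.f32.32x32x2bf16 from the shared example but uses a 16-wide accumulator. A minimal sketch with the types from the tables above; the wrapper function name is hypothetical and the zero attribute values are illustrative:

```mlir
// Hypothetical wrapper; cbsz, abid, and blgp are set to 0 for illustration.
llvm.func @mfma_f32_32x32x4bf16_sketch(%a: vector<2xi16>, %b: vector<2xi16>, %c: vector<16xf32>) -> vector<16xf32> {
  %r = rocdl.mfma.f32.32x32x4bf16 %a, %b, %c, 0, 0, 0 :
    (vector<2xi16>, vector<2xi16>, vector<16xf32>) -> vector<16xf32>
  llvm.return %r : vector<16xf32>
}
```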

rocdl.mfma.f32.32x32x4bf16.1k (ROCDL::mfma_f32_32x32x4bf16_1k) 

Syntax:

operation ::= `rocdl.mfma.f32.32x32x4bf16.1k` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | fixed-length vector of 16-bit signless integer values of length 4
b | fixed-length vector of 16-bit signless integer values of length 4
c | fixed-length vector of 32-bit float values of length 32

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 32
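
A sketch for the .1k variant, which takes vector<4xi16> inputs and a 32-wide accumulator per the tables above. The wrapper function name is hypothetical, and the attribute values of 0 are placeholders:

```mlir
// Hypothetical wrapper; the three trailing attributes are placeholder zeros.
llvm.func @mfma_f32_32x32x4bf16_1k_sketch(%a: vector<4xi16>, %b: vector<4xi16>, %c: vector<32xf32>) -> vector<32xf32> {
  %r = rocdl.mfma.f32.32x32x4bf16.1k %a, %b, %c, 0, 0, 0 :
    (vector<4xi16>, vector<4xi16>, vector<32xf32>) -> vector<32xf32>
  llvm.return %r : vector<32xf32>
}
```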

rocdl.mfma.f32.32x32x4f16 (ROCDL::mfma_f32_32x32x4f16) 

Syntax:

operation ::= `rocdl.mfma.f32.32x32x4f16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | fixed-length vector of 16-bit float values of length 4
b | fixed-length vector of 16-bit float values of length 4
c | fixed-length vector of 32-bit float values of length 32

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 32
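
A minimal sketch of rocdl.mfma.f32.32x32x4f16 with vector<4xf16> inputs and a 32-wide f32 accumulator, as listed in the tables above. The wrapper function name is hypothetical, and the zero attribute values are illustrative:

```mlir
// Hypothetical wrapper; cbsz, abid, and blgp are set to 0 for illustration.
llvm.func @mfma_f32_32x32x4f16_sketch(%a: vector<4xf16>, %b: vector<4xf16>, %c: vector<32xf32>) -> vector<32xf32> {
  %r = rocdl.mfma.f32.32x32x4f16 %a, %b, %c, 0, 0, 0 :
    (vector<4xf16>, vector<4xf16>, vector<32xf32>) -> vector<32xf32>
  llvm.return %r : vector<32xf32>
}
```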

rocdl.mfma.f32.32x32x8bf16.1k (ROCDL::mfma_f32_32x32x8bf16_1k) 

Syntax:

operation ::= `rocdl.mfma.f32.32x32x8bf16.1k` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | fixed-length vector of 16-bit signless integer values of length 4
b | fixed-length vector of 16-bit signless integer values of length 4
c | fixed-length vector of 32-bit float values of length 16

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 16
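
A sketch for rocdl.mfma.f32.32x32x8bf16.1k using the vector<4xi16> inputs and 16-wide accumulator from the tables above. The wrapper function name is hypothetical, and cbsz, abid, and blgp are placeholder zeros:

```mlir
// Hypothetical wrapper; the three trailing attributes are placeholder zeros.
llvm.func @mfma_f32_32x32x8bf16_1k_sketch(%a: vector<4xi16>, %b: vector<4xi16>, %c: vector<16xf32>) -> vector<16xf32> {
  %r = rocdl.mfma.f32.32x32x8bf16.1k %a, %b, %c, 0, 0, 0 :
    (vector<4xi16>, vector<4xi16>, vector<16xf32>) -> vector<16xf32>
  llvm.return %r : vector<16xf32>
}
```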

rocdl.mfma.f32.32x32x8f16 (ROCDL::mfma_f32_32x32x8f16) 

Syntax:

operation ::= `rocdl.mfma.f32.32x32x8f16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | fixed-length vector of 16-bit float values of length 4
b | fixed-length vector of 16-bit float values of length 4
c | fixed-length vector of 32-bit float values of length 16

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 16
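
A minimal sketch of rocdl.mfma.f32.32x32x8f16 with the types from the tables above. The wrapper function name is hypothetical, and the zero attribute values are illustrative only:

```mlir
// Hypothetical wrapper; cbsz, abid, and blgp are set to 0 for illustration.
llvm.func @mfma_f32_32x32x8f16_sketch(%a: vector<4xf16>, %b: vector<4xf16>, %c: vector<16xf32>) -> vector<16xf32> {
  %r = rocdl.mfma.f32.32x32x8f16 %a, %b, %c, 0, 0, 0 :
    (vector<4xf16>, vector<4xf16>, vector<16xf32>) -> vector<16xf32>
  llvm.return %r : vector<16xf32>
}
```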

rocdl.mfma.f32.4x4x1f32 (ROCDL::mfma_f32_4x4x1f32) 

Syntax:

operation ::= `rocdl.mfma.f32.4x4x1f32` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | 32-bit float
b | 32-bit float
c | fixed-length vector of 32-bit float values of length 4

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 4
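
The 4x4 variants use a 4-wide accumulator. A sketch of rocdl.mfma.f32.4x4x1f32 with the types from the tables above; the wrapper function name is hypothetical and the attribute values of 0 are placeholders:

```mlir
// Hypothetical wrapper; the three trailing attributes are placeholder zeros.
llvm.func @mfma_f32_4x4x1f32_sketch(%a: f32, %b: f32, %c: vector<4xf32>) -> vector<4xf32> {
  %r = rocdl.mfma.f32.4x4x1f32 %a, %b, %c, 0, 0, 0 :
    (f32, f32, vector<4xf32>) -> vector<4xf32>
  llvm.return %r : vector<4xf32>
}
```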

rocdl.mfma.f32.4x4x2bf16 (ROCDL::mfma_f32_4x4x2bf16) 

Syntax:

operation ::= `rocdl.mfma.f32.4x4x2bf16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | fixed-length vector of 16-bit signless integer values of length 2
b | fixed-length vector of 16-bit signless integer values of length 2
c | fixed-length vector of 32-bit float values of length 4

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 4
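
A sketch of rocdl.mfma.f32.4x4x2bf16, whose inputs are listed above as vector<2xi16>. The wrapper function name is hypothetical, and the zero attribute values are illustrative:

```mlir
// Hypothetical wrapper; cbsz, abid, and blgp are set to 0 for illustration.
llvm.func @mfma_f32_4x4x2bf16_sketch(%a: vector<2xi16>, %b: vector<2xi16>, %c: vector<4xf32>) -> vector<4xf32> {
  %r = rocdl.mfma.f32.4x4x2bf16 %a, %b, %c, 0, 0, 0 :
    (vector<2xi16>, vector<2xi16>, vector<4xf32>) -> vector<4xf32>
  llvm.return %r : vector<4xf32>
}
```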

rocdl.mfma.f32.4x4x4bf16.1k (ROCDL::mfma_f32_4x4x4bf16_1k) 

Syntax:

operation ::= `rocdl.mfma.f32.4x4x4bf16.1k` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | fixed-length vector of 16-bit signless integer values of length 4
b | fixed-length vector of 16-bit signless integer values of length 4
c | fixed-length vector of 32-bit float values of length 4

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 4
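
A sketch of rocdl.mfma.f32.4x4x4bf16.1k with vector<4xi16> inputs and a 4-wide accumulator, per the tables above. The wrapper function name is hypothetical, and cbsz, abid, and blgp are placeholder zeros:

```mlir
// Hypothetical wrapper; the three trailing attributes are placeholder zeros.
llvm.func @mfma_f32_4x4x4bf16_1k_sketch(%a: vector<4xi16>, %b: vector<4xi16>, %c: vector<4xf32>) -> vector<4xf32> {
  %r = rocdl.mfma.f32.4x4x4bf16.1k %a, %b, %c, 0, 0, 0 :
    (vector<4xi16>, vector<4xi16>, vector<4xf32>) -> vector<4xf32>
  llvm.return %r : vector<4xf32>
}
```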

rocdl.mfma.f32.4x4x4f16 (ROCDL::mfma_f32_4x4x4f16) 

Syntax:

operation ::= `rocdl.mfma.f32.4x4x4f16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | fixed-length vector of 16-bit float values of length 4
b | fixed-length vector of 16-bit float values of length 4
c | fixed-length vector of 32-bit float values of length 4

Results:

Result | Description
res | fixed-length vector of 32-bit float values of length 4
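
A minimal sketch of rocdl.mfma.f32.4x4x4f16 with the types from the tables above. The wrapper function name is hypothetical, and the zero attribute values are illustrative only:

```mlir
// Hypothetical wrapper; cbsz, abid, and blgp are set to 0 for illustration.
llvm.func @mfma_f32_4x4x4f16_sketch(%a: vector<4xf16>, %b: vector<4xf16>, %c: vector<4xf32>) -> vector<4xf32> {
  %r = rocdl.mfma.f32.4x4x4f16 %a, %b, %c, 0, 0, 0 :
    (vector<4xf16>, vector<4xf16>, vector<4xf32>) -> vector<4xf32>
  llvm.return %r : vector<4xf32>
}
```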

rocdl.mfma.f64.16x16x4f64 (ROCDL::mfma_f64_16x16x4f64) 

Syntax:

operation ::= `rocdl.mfma.f64.16x16x4f64` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | 64-bit float
b | 64-bit float
c | fixed-length vector of 64-bit float values of length 4

Results:

Result | Description
res | fixed-length vector of 64-bit float values of length 4
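
The shared example covers only f32 and i32 accumulators. A sketch of the f64 variant rocdl.mfma.f64.16x16x4f64, with types taken from the tables above; the wrapper function name is hypothetical and the attribute values of 0 are placeholders:

```mlir
// Hypothetical wrapper; the three trailing attributes are placeholder zeros.
llvm.func @mfma_f64_16x16x4f64_sketch(%a: f64, %b: f64, %c: vector<4xf64>) -> vector<4xf64> {
  %r = rocdl.mfma.f64.16x16x4f64 %a, %b, %c, 0, 0, 0 :
    (f64, f64, vector<4xf64>) -> vector<4xf64>
  llvm.return %r : vector<4xf64>
}
```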

rocdl.mfma.f64.4x4x4f64 (ROCDL::mfma_f64_4x4x4f64) 

Syntax:

operation ::= `rocdl.mfma.f64.4x4x4f64` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | 64-bit float
b | 64-bit float
c | 64-bit float

Results:

Result | Description
res | 64-bit float
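
Unlike the other variants in this section, this op takes a scalar f64 accumulator and returns a scalar f64, as the tables above show. A minimal sketch; the wrapper function name is hypothetical and the zero attribute values are illustrative:

```mlir
// Hypothetical wrapper; cbsz, abid, and blgp are set to 0 for illustration.
llvm.func @mfma_f64_4x4x4f64_sketch(%a: f64, %b: f64, %c: f64) -> f64 {
  %r = rocdl.mfma.f64.4x4x4f64 %a, %b, %c, 0, 0, 0 :
    (f64, f64, f64) -> f64
  llvm.return %r : f64
}
```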

rocdl.mfma.i32.16x16x16i8 (ROCDL::mfma_i32_16x16x16i8) 

Syntax:

operation ::= `rocdl.mfma.i32.16x16x16i8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | 32-bit signless integer
b | 32-bit signless integer
c | fixed-length vector of 32-bit signless integer values of length 4

Results:

Result | Description
res | fixed-length vector of 32-bit signless integer values of length 4
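
A sketch of rocdl.mfma.i32.16x16x16i8, which takes i32 inputs and a 4-wide i32 accumulator per the tables above. The wrapper function name is hypothetical, and cbsz, abid, and blgp are placeholder zeros:

```mlir
// Hypothetical wrapper; the three trailing attributes are placeholder zeros.
llvm.func @mfma_i32_16x16x16i8_sketch(%a: i32, %b: i32, %c: vector<4xi32>) -> vector<4xi32> {
  %r = rocdl.mfma.i32.16x16x16i8 %a, %b, %c, 0, 0, 0 :
    (i32, i32, vector<4xi32>) -> vector<4xi32>
  llvm.return %r : vector<4xi32>
}
```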

rocdl.mfma.i32.16x16x32.i8 (ROCDL::mfma_i32_16x16x32_i8) 

Syntax:

operation ::= `rocdl.mfma.i32.16x16x32.i8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | 64-bit signless integer
b | 64-bit signless integer
c | fixed-length vector of 32-bit signless integer values of length 4

Results:

Result | Description
res | fixed-length vector of 32-bit signless integer values of length 4
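
Here the inputs widen to i64 while the accumulator stays a 4-wide i32 vector, as listed above. A minimal sketch of rocdl.mfma.i32.16x16x32.i8; the wrapper function name is hypothetical and the zero attribute values are illustrative:

```mlir
// Hypothetical wrapper; cbsz, abid, and blgp are set to 0 for illustration.
llvm.func @mfma_i32_16x16x32_i8_sketch(%a: i64, %b: i64, %c: vector<4xi32>) -> vector<4xi32> {
  %r = rocdl.mfma.i32.16x16x32.i8 %a, %b, %c, 0, 0, 0 :
    (i64, i64, vector<4xi32>) -> vector<4xi32>
  llvm.return %r : vector<4xi32>
}
```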

rocdl.mfma.i32.16x16x4i8 (ROCDL::mfma_i32_16x16x4i8) 

Syntax:

operation ::= `rocdl.mfma.i32.16x16x4i8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | 32-bit signless integer
b | 32-bit signless integer
c | fixed-length vector of 32-bit signless integer values of length 16

Results:

Result | Description
res | fixed-length vector of 32-bit signless integer values of length 16
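
A sketch of rocdl.mfma.i32.16x16x4i8 with i32 inputs and a 16-wide i32 accumulator, per the tables above. The wrapper function name is hypothetical, and the attribute values of 0 are placeholders:

```mlir
// Hypothetical wrapper; the three trailing attributes are placeholder zeros.
llvm.func @mfma_i32_16x16x4i8_sketch(%a: i32, %b: i32, %c: vector<16xi32>) -> vector<16xi32> {
  %r = rocdl.mfma.i32.16x16x4i8 %a, %b, %c, 0, 0, 0 :
    (i32, i32, vector<16xi32>) -> vector<16xi32>
  llvm.return %r : vector<16xi32>
}
```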

rocdl.mfma.i32.16x16x64.i8 (ROCDL::mfma_i32_16x16x64_i8) 

Syntax:

operation ::= `rocdl.mfma.i32.16x16x64.i8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands:

Operand | Description
a | fixed-length vector of 32-bit signless integer values of length 4
b | fixed-length vector of 32-bit signless integer values of length 4
c | fixed-length vector of 32-bit signless integer values of length 4

Results:

Result | Description
res | fixed-length vector of 32-bit signless integer values of length 4
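
For this op all three operands and the result share the type vector<4xi32>, as the tables above show. A minimal sketch; the wrapper function name is hypothetical and the zero attribute values are illustrative:

```mlir
// Hypothetical wrapper; cbsz, abid, and blgp are set to 0 for illustration.
llvm.func @mfma_i32_16x16x64_i8_sketch(%a: vector<4xi32>, %b: vector<4xi32>, %c: vector<4xi32>) -> vector<4xi32> {
  %r = rocdl.mfma.i32.16x16x64.i8 %a, %b, %c, 0, 0, 0 :
    (vector<4xi32>, vector<4xi32>, vector<4xi32>) -> vector<4xi32>
  llvm.return %r : vector<4xi32>
}
```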

rocdl.mfma.i32.32x32x16.i8 (ROCDL::mfma_i32_32x32x16_i8) 

Syntax:

operation ::= `rocdl.mfma.i32.32x32x16.i8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

AttributeMLIR TypeDescription
cbsz::mlir::IntegerAttr32-bit signless integer attribute
abid::mlir::IntegerAttr32-bit signless integer attribute
blgp::mlir::IntegerAttr32-bit signless integer attribute

Operands: 

OperandDescription
a64-bit signless integer
b64-bit signless integer
cfixed-length vector of 32-bit signless integer values of length 16

Results: 

Result | Description
res | fixed-length vector of 32-bit signless integer values of length 16
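
The shared examples above show other MFMA variants. As a minimal sketch of this op itself, assuming the operand and result types listed in the tables above, it could be written as follows. The wrapper function and SSA value names are hypothetical.

```mlir
// Hypothetical wrapper; only the rocdl op line follows the documented syntax.
llvm.func @mfma_i32_32x32x16_i8_example(%a: i64, %b: i64, %c: vector<16xi32>) -> vector<16xi32> {
  // cbsz, abid, and blgp are all left at 0.
  %r = rocdl.mfma.i32.32x32x16.i8 %a, %b, %c, 0, 0, 0 :
    (i64, i64, vector<16xi32>) -> vector<16xi32>
  llvm.return %r : vector<16xi32>
}
```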

rocdl.mfma.i32.32x32x32.i8 (ROCDL::mfma_i32_32x32x32_i8) 

Syntax:

operation ::= `rocdl.mfma.i32.32x32x32.i8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands: 

Operand | Description
a | fixed-length vector of 32-bit signless integer values of length 4
b | fixed-length vector of 32-bit signless integer values of length 4
c | fixed-length vector of 32-bit signless integer values of length 16

Results: 

Result | Description
res | fixed-length vector of 32-bit signless integer values of length 16
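
The shared examples above do not include this variant. A minimal sketch, assuming the operand and result types from the tables above (the wrapper function and value names are hypothetical):

```mlir
// Hypothetical wrapper; only the rocdl op line follows the documented syntax.
llvm.func @mfma_i32_32x32x32_i8_example(%a: vector<4xi32>, %b: vector<4xi32>,
                                        %c: vector<16xi32>) -> vector<16xi32> {
  // cbsz, abid, and blgp are all left at 0.
  %r = rocdl.mfma.i32.32x32x32.i8 %a, %b, %c, 0, 0, 0 :
    (vector<4xi32>, vector<4xi32>, vector<16xi32>) -> vector<16xi32>
  llvm.return %r : vector<16xi32>
}
```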

rocdl.mfma.i32.32x32x4i8 (ROCDL::mfma_i32_32x32x4i8) 

Syntax:

operation ::= `rocdl.mfma.i32.32x32x4i8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands: 

Operand | Description
a | 32-bit signless integer
b | 32-bit signless integer
c | fixed-length vector of 32-bit signless integer values of length 32

Results: 

Result | Description
res | fixed-length vector of 32-bit signless integer values of length 32

rocdl.mfma.i32.32x32x8i8 (ROCDL::mfma_i32_32x32x8i8) 

Syntax:

operation ::= `rocdl.mfma.i32.32x32x8i8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands: 

Operand | Description
a | 32-bit signless integer
b | 32-bit signless integer
c | fixed-length vector of 32-bit signless integer values of length 16

Results: 

Result | Description
res | fixed-length vector of 32-bit signless integer values of length 16

rocdl.mfma.i32.4x4x4i8 (ROCDL::mfma_i32_4x4x4i8) 

Syntax:

operation ::= `rocdl.mfma.i32.4x4x4i8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)

Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.

Example:

// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
  (f32, f32, vector<32xf32>) -> vector<32xf32>

// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
  (i32, i32, vector<32xi32>) -> vector<32xi32>

// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
  (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands: 

Operand | Description
a | 32-bit signless integer
b | 32-bit signless integer
c | fixed-length vector of 32-bit signless integer values of length 4

Results: 

Result | Description
res | fixed-length vector of 32-bit signless integer values of length 4
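
The shared examples above do not include this variant. A minimal sketch, assuming the operand and result types from the tables above (the wrapper function and value names are hypothetical):

```mlir
// Hypothetical wrapper; only the rocdl op line follows the documented syntax.
llvm.func @mfma_i32_4x4x4i8_example(%a: i32, %b: i32, %c: vector<4xi32>) -> vector<4xi32> {
  // cbsz, abid, and blgp are all left at 0.
  %r = rocdl.mfma.i32.4x4x4i8 %a, %b, %c, 0, 0, 0 :
    (i32, i32, vector<4xi32>) -> vector<4xi32>
  llvm.return %r : vector<4xi32>
}
```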

rocdl.mfma.scale.f32.16x16x128.f8f6f4 (ROCDL::mfma_scale_f32_16x16x128_f8f6f4) 

Syntax:

operation ::= `rocdl.mfma.scale.f32.16x16x128.f8f6f4` $a `,` $b `,` $c `,` $cbsz `,` $blgp `,` $opselA `,` $scaleA `,` $opselB `,` $scaleB attr-dict `:` functional-type(operands, $res)

Scaled matrix fused multiply-add (MFMA) intrinsic with per-operand scaling. The opselA/opselB and scaleA/scaleB arguments control the scaling of input operands.

Example:

// Scaled MFMA with fp8 * fp8 inputs.
%r0 = rocdl.mfma.scale.f32.32x32x64.f8f6f4 %a, %a, %c, 0, 0, 0, %scaleA, 0, %scaleB :
  (vector<8xi32>, vector<8xi32>, vector<16xf32>, i32, i32) -> vector<16xf32>

// Scaled MFMA with fp8 * bf8 inputs.
%r1 = rocdl.mfma.scale.f32.32x32x64.f8f6f4 %a, %a, %c, 0, 1, 0, %scaleA, 0, %scaleB :
  (vector<8xi32>, vector<8xi32>, vector<16xf32>, i32, i32) -> vector<16xf32>

// Scaled MFMA with fp8 * fp6 inputs (6xi32 operand B).
%r2 = rocdl.mfma.scale.f32.32x32x64.f8f6f4 %a, %b6, %c, 0, 2, 0, %scaleA, 0, %scaleB :
  (vector<8xi32>, vector<6xi32>, vector<16xf32>, i32, i32) -> vector<16xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute
opselA | ::mlir::IntegerAttr | 32-bit signless integer attribute
opselB | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands: 

Operand | Description
a | LLVM dialect-compatible vector of 32-bit signless integer
b | LLVM dialect-compatible vector of 32-bit signless integer
c | LLVM dialect-compatible vector of 32-bit float
scaleA | 32-bit signless integer
scaleB | 32-bit signless integer

Results: 

Result | Description
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float
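
The examples above use the 32x32x64 variant. The sketch below shows what a call to the 16x16x128 variant might look like. The vector lengths (8xi32 inputs for fp8 data and a 4-wide f32 accumulator) are assumptions, since the tables above do not fix them; the wrapper function and value names are hypothetical.

```mlir
// Hypothetical wrapper; vector lengths are assumed, not taken from the tables above.
llvm.func @mfma_scale_16x16x128_example(%a: vector<8xi32>, %b: vector<8xi32>,
                                        %c: vector<4xf32>,
                                        %scaleA: i32, %scaleB: i32) -> vector<4xf32> {
  // cbsz = 0, blgp = 0, opselA = 0, opselB = 0.
  %r = rocdl.mfma.scale.f32.16x16x128.f8f6f4 %a, %b, %c, 0, 0, 0, %scaleA, 0, %scaleB :
    (vector<8xi32>, vector<8xi32>, vector<4xf32>, i32, i32) -> vector<4xf32>
  llvm.return %r : vector<4xf32>
}
```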

rocdl.mfma.scale.f32.32x32x64.f8f6f4 (ROCDL::mfma_scale_f32_32x32x64_f8f6f4) 

Syntax:

operation ::= `rocdl.mfma.scale.f32.32x32x64.f8f6f4` $a `,` $b `,` $c `,` $cbsz `,` $blgp `,` $opselA `,` $scaleA `,` $opselB `,` $scaleB attr-dict `:` functional-type(operands, $res)

Scaled matrix fused multiply-add (MFMA) intrinsic with per-operand scaling. The opselA/opselB and scaleA/scaleB arguments control the scaling of input operands.

Example:

// Scaled MFMA with fp8 * fp8 inputs.
%r0 = rocdl.mfma.scale.f32.32x32x64.f8f6f4 %a, %a, %c, 0, 0, 0, %scaleA, 0, %scaleB :
  (vector<8xi32>, vector<8xi32>, vector<16xf32>, i32, i32) -> vector<16xf32>

// Scaled MFMA with fp8 * bf8 inputs.
%r1 = rocdl.mfma.scale.f32.32x32x64.f8f6f4 %a, %a, %c, 0, 1, 0, %scaleA, 0, %scaleB :
  (vector<8xi32>, vector<8xi32>, vector<16xf32>, i32, i32) -> vector<16xf32>

// Scaled MFMA with fp8 * fp6 inputs (6xi32 operand B).
%r2 = rocdl.mfma.scale.f32.32x32x64.f8f6f4 %a, %b6, %c, 0, 2, 0, %scaleA, 0, %scaleB :
  (vector<8xi32>, vector<6xi32>, vector<16xf32>, i32, i32) -> vector<16xf32>

Attributes: 

Attribute | MLIR Type | Description
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute
opselA | ::mlir::IntegerAttr | 32-bit signless integer attribute
opselB | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands: 

Operand | Description
a | LLVM dialect-compatible vector of 32-bit signless integer
b | LLVM dialect-compatible vector of 32-bit signless integer
c | LLVM dialect-compatible vector of 32-bit float
scaleA | 32-bit signless integer
scaleB | 32-bit signless integer

Results: 

Result | Description
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float

rocdl.permlane16.swap (ROCDL::Permlane16SwapOp) 

Syntax:

operation ::= `rocdl.permlane16.swap` attr-dict $old `,` $src `,` $fi `,` $boundControl `:` `(` type($old) `,` type($src) `)` `->` type($res)

Performs a permlane16.swap operation with the given operands, applying the permutation specified by $fi to the provided inputs.

Example:

// Swap lanes between groups of 16 threads.
%res = rocdl.permlane16.swap %src, %src, 0, -1 : (i32, i32) -> !llvm.struct<(i32, i32)>

Attributes: 

Attribute | MLIR Type | Description
fi | ::mlir::IntegerAttr | 1-bit signless integer attribute
boundControl | ::mlir::IntegerAttr | 1-bit signless integer attribute

Operands: 

Operand | Description
old | 32-bit signless integer
src | 32-bit signless integer

Results: 

Result | Description
res | LLVM dialect-compatible struct of 32-bit signless integer and 32-bit signless integer
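
Because the result is a two-element struct, a consumer usually has to unpack it. A minimal sketch using `llvm.extractvalue` from the LLVM dialect follows; the wrapper function and value names are hypothetical, and the meaning of each field is not asserted here.

```mlir
// Hypothetical wrapper showing how the struct result can be unpacked.
llvm.func @permlane16_swap_example(%old: i32, %src: i32) -> i32 {
  %res = rocdl.permlane16.swap %old, %src, 0, -1 : (i32, i32) -> !llvm.struct<(i32, i32)>
  // Extract the two i32 fields of the result struct.
  %r0 = llvm.extractvalue %res[0] : !llvm.struct<(i32, i32)>
  %r1 = llvm.extractvalue %res[1] : !llvm.struct<(i32, i32)>
  // Only the first field is returned in this sketch; %r1 is left unused.
  llvm.return %r0 : i32
}
```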

rocdl.permlane32.swap (ROCDL::Permlane32SwapOp) 

Syntax:

operation ::= `rocdl.permlane32.swap` attr-dict $old `,` $src `,` $fi `,` $boundControl `:` `(` type($old) `,` type($src) `)` `->` type($res)

Performs a permlane32.swap operation with the given operands, applying the permutation specified by $fi to the provided inputs.

Example:

// Swap lanes between groups of 32 threads.
%res = rocdl.permlane32.swap %src, %src, 0, -1 : (i32, i32) -> !llvm.struct<(i32, i32)>

Attributes: 

Attribute | MLIR Type | Description
fi | ::mlir::IntegerAttr | 1-bit signless integer attribute
boundControl | ::mlir::IntegerAttr | 1-bit signless integer attribute

Operands: 

Operand | Description
old | 32-bit signless integer
src | 32-bit signless integer

Results: 

Result | Description
res | LLVM dialect-compatible struct of 32-bit signless integer and 32-bit signless integer

rocdl.permlanex16 (ROCDL::PermlaneX16Op) 

Syntax:

operation ::= `rocdl.permlanex16` attr-dict $old `,` $src0 `,` $src1 `,` $src2 `,` $fi `,` $boundControl `:` type($src0) `,` type($src1)

Performs a permlanex16 operation with the given operands, applying the permutation specified by $fi to the provided inputs.

Example:

// Scalar permlanex16.
%ret0 = rocdl.permlanex16 %src0, %src0, %sel, %sel, 0, -1 : f32, i32

// Vector permlanex16.
%ret1 = rocdl.permlanex16 %src1, %src1, %sel, %sel, 0, -1 : vector<2xf32>, i32

Attributes: 

Attribute | MLIR Type | Description
fi | ::mlir::IntegerAttr | 1-bit signless integer attribute
boundControl | ::mlir::IntegerAttr | 1-bit signless integer attribute

Operands: 

Operand | Description
old | LLVM dialect-compatible type
src0 | LLVM dialect-compatible type
src1 | LLVM dialect-compatible type
src2 | LLVM dialect-compatible type

Results: 

Result | Description
res | LLVM dialect-compatible type

rocdl.raw.buffer.atomic.cmpswap (ROCDL::RawBufferAtomicCmpSwap) 

Syntax:

operation ::= `rocdl.raw.buffer.atomic.cmpswap` attr-dict `(` operands `)` `:` type($res) `,` type($rsrc)

Operands: 

Operand | Description
src | LLVM dialect-compatible type
cmp | LLVM dialect-compatible type
rsrc | LLVM dialect-compatible type
offset | 32-bit signless integer
soffset | 32-bit signless integer
aux | 32-bit signless integer

Results: 

Result | Description
res | LLVM dialect-compatible type
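
A minimal sketch of the syntax above, with the operands in the order given in the table. Representing the buffer resource as `vector<4xi32>` and using `i32` data are assumptions made for illustration; the wrapper function and value names are hypothetical.

```mlir
// Hypothetical wrapper; rsrc as vector<4xi32> is an assumed representation.
llvm.func @raw_buffer_cmpswap_example(%src: i32, %cmp: i32, %rsrc: vector<4xi32>,
                                      %offset: i32, %soffset: i32, %aux: i32) -> i32 {
  // Trailing types are type($res), then type($rsrc).
  %res = rocdl.raw.buffer.atomic.cmpswap(%src, %cmp, %rsrc, %offset, %soffset, %aux) : i32, vector<4xi32>
  llvm.return %res : i32
}
```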

rocdl.raw.buffer.atomic.fadd (ROCDL::RawBufferAtomicFAddOp) 

Operands: 

Operand | Description
vdata | LLVM dialect-compatible type
rsrc | LLVM dialect-compatible type
offset | LLVM dialect-compatible type
soffset | LLVM dialect-compatible type
aux | LLVM dialect-compatible type

rocdl.raw.buffer.atomic.fmax (ROCDL::RawBufferAtomicFMaxOp) 

Operands: 

Operand | Description
vdata | LLVM dialect-compatible type
rsrc | LLVM dialect-compatible type
offset | LLVM dialect-compatible type
soffset | LLVM dialect-compatible type
aux | LLVM dialect-compatible type

rocdl.raw.buffer.atomic.smax (ROCDL::RawBufferAtomicSMaxOp) 

Operands: 

Operand | Description
vdata | LLVM dialect-compatible type
rsrc | LLVM dialect-compatible type
offset | LLVM dialect-compatible type
soffset | LLVM dialect-compatible type
aux | LLVM dialect-compatible type

rocdl.raw.buffer.atomic.umin (ROCDL::RawBufferAtomicUMinOp) 

Operands: 

Operand | Description
vdata | LLVM dialect-compatible type
rsrc | LLVM dialect-compatible type
offset | LLVM dialect-compatible type
soffset | LLVM dialect-compatible type
aux | LLVM dialect-compatible type

rocdl.raw.buffer.load (ROCDL::RawBufferLoadOp) 

Operands: 

Operand | Description
rsrc | LLVM dialect-compatible type
offset | LLVM dialect-compatible type
soffset | LLVM dialect-compatible type
aux | LLVM dialect-compatible type

Results: 

Result | Description
res | LLVM dialect-compatible type

rocdl.raw.buffer.store (ROCDL::RawBufferStoreOp) 

Operands: 

Operand | Description
vdata | LLVM dialect-compatible type
rsrc | LLVM dialect-compatible type
offset | LLVM dialect-compatible type
soffset | LLVM dialect-compatible type
aux | LLVM dialect-compatible type

rocdl.raw.ptr.buffer.atomic.cmpswap (ROCDL::RawPtrBufferAtomicCmpSwap) 

Syntax:

operation ::= `rocdl.raw.ptr.buffer.atomic.cmpswap` operands attr-dict `:` type($res)

Interfaces: AliasAnalysisOpInterface

Attributes: 

Attribute | MLIR Type | Description
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array

Operands: 

Operand | Description
src | LLVM dialect-compatible type
cmp | LLVM dialect-compatible type
rsrc | LLVM pointer in address space 8
offset | 32-bit signless integer
soffset | 32-bit signless integer
aux | 32-bit signless integer

Results: 

Result | Description
res | LLVM dialect-compatible type
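
A minimal sketch of the syntax above, with the operands in the order given in the table and `i32` data chosen for illustration. The wrapper function and value names are hypothetical.

```mlir
// Hypothetical wrapper; the buffer resource is a pointer in address space 8.
llvm.func @raw_ptr_buffer_cmpswap_example(%src: i32, %cmp: i32, %rsrc: !llvm.ptr<8>,
                                          %offset: i32, %soffset: i32, %aux: i32) -> i32 {
  // Only the result type is spelled out after the colon.
  %res = rocdl.raw.ptr.buffer.atomic.cmpswap %src, %cmp, %rsrc, %offset, %soffset, %aux : i32
  llvm.return %res : i32
}
```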

rocdl.raw.ptr.buffer.atomic.fadd (ROCDL::RawPtrBufferAtomicFaddOp) 

Syntax:

operation ::= `rocdl.raw.ptr.buffer.atomic.fadd` operands attr-dict `:` type($vdata)

Interfaces: AliasAnalysisOpInterface

Attributes: 

Attribute | MLIR Type | Description
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array

Operands: 

Operand | Description
vdata | LLVM dialect-compatible type
rsrc | LLVM pointer in address space 8
offset | 32-bit signless integer
soffset | 32-bit signless integer
aux | 32-bit signless integer
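
A minimal sketch of the syntax above, assuming `f32` data. The op produces no result, so the trailing type is that of `vdata`. The wrapper function and value names are hypothetical.

```mlir
// Hypothetical wrapper; f32 is an assumed data type for illustration.
llvm.func @raw_ptr_buffer_fadd_example(%vdata: f32, %rsrc: !llvm.ptr<8>,
                                       %offset: i32, %soffset: i32, %aux: i32) {
  rocdl.raw.ptr.buffer.atomic.fadd %vdata, %rsrc, %offset, %soffset, %aux : f32
  llvm.return
}
```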

rocdl.raw.ptr.buffer.atomic.fmax (ROCDL::RawPtrBufferAtomicFmaxOp) 

Syntax:

operation ::= `rocdl.raw.ptr.buffer.atomic.fmax` operands attr-dict `:` type($vdata)

Interfaces: AliasAnalysisOpInterface

Attributes: 

Attribute | MLIR Type | Description
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array

Operands: 

Operand | Description
vdata | LLVM dialect-compatible type
rsrc | LLVM pointer in address space 8
offset | 32-bit signless integer
soffset | 32-bit signless integer
aux | 32-bit signless integer

rocdl.raw.ptr.buffer.atomic.smax (ROCDL::RawPtrBufferAtomicSmaxOp) 

Syntax:

operation ::= `rocdl.raw.ptr.buffer.atomic.smax` operands attr-dict `:` type($vdata)

Interfaces: AliasAnalysisOpInterface

Attributes: 

Attribute | MLIR Type | Description
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array

Operands: 

Operand | Description
vdata | LLVM dialect-compatible type
rsrc | LLVM pointer in address space 8
offset | 32-bit signless integer
soffset | 32-bit signless integer
aux | 32-bit signless integer

rocdl.raw.ptr.buffer.atomic.umin (ROCDL::RawPtrBufferAtomicUminOp) 

Syntax:

operation ::= `rocdl.raw.ptr.buffer.atomic.umin` operands attr-dict `:` type($vdata)

Interfaces: AliasAnalysisOpInterface

Attributes: 

Attribute | MLIR Type | Description
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array

Operands: 

Operand | Description
vdata | LLVM dialect-compatible type
rsrc | LLVM pointer in address space 8
offset | 32-bit signless integer
soffset | 32-bit signless integer
aux | 32-bit signless integer
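
A minimal sketch of the syntax above for an integer atomic, assuming `i32` data. The wrapper function and value names are hypothetical.

```mlir
// Hypothetical wrapper; i32 is an assumed data type for illustration.
llvm.func @raw_ptr_buffer_umin_example(%vdata: i32, %rsrc: !llvm.ptr<8>,
                                       %offset: i32, %soffset: i32, %aux: i32) {
  rocdl.raw.ptr.buffer.atomic.umin %vdata, %rsrc, %offset, %soffset, %aux : i32
  llvm.return
}
```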

rocdl.raw.ptr.buffer.load (ROCDL::RawPtrBufferLoadOp) 

Syntax:

operation ::= `rocdl.raw.ptr.buffer.load` operands attr-dict `:` type($res)

Interfaces: AliasAnalysisOpInterface

Attributes: 

Attribute | MLIR Type | Description
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array

Operands: 

Operand | Description
rsrc | LLVM pointer in address space 8
offset | 32-bit signless integer
soffset | 32-bit signless integer
aux | 32-bit signless integer

Results: 

Result | Description
res | LLVM dialect-compatible type
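
A minimal sketch of the syntax above, loading a `vector<4xf32>` (an assumed result type chosen for illustration). The wrapper function and value names are hypothetical.

```mlir
// Hypothetical wrapper; the result type after the colon selects what is loaded.
llvm.func @raw_ptr_buffer_load_example(%rsrc: !llvm.ptr<8>, %offset: i32,
                                       %soffset: i32, %aux: i32) -> vector<4xf32> {
  %v = rocdl.raw.ptr.buffer.load %rsrc, %offset, %soffset, %aux : vector<4xf32>
  llvm.return %v : vector<4xf32>
}
```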

rocdl.raw.ptr.buffer.load.async.lds (ROCDL::RawPtrBufferLoadAsyncLdsOp) 

Async variant of raw.ptr.buffer.load.lds

Syntax:

operation ::= `rocdl.raw.ptr.buffer.load.async.lds` operands attr-dict

Load from a buffer resource rsrc to ldsPtr, which must be uniform.

See rocdl.load.async.to.lds for the overall semantics of such loads, noting that here voffset can be lane-varying and that rsrc (which holds the base address) must, as always, be uniform.

Available on gfx9 and gfx10.

Example:

// Async buffer load to LDS via buffer resource pointer.
rocdl.raw.ptr.buffer.load.async.lds %rsrc, %ldsPtr, %size, %voffset, %soffset, %offset, %aux

Interfaces: AliasAnalysisOpInterface

Attributes: 

Attribute | MLIR Type | Description
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array

Operands: 

Operand | Description
rsrc | LLVM pointer in address space 8
ldsPtr | LLVM pointer in address space 3
size | 32-bit signless integer
voffset | 32-bit signless integer
soffset | 32-bit signless integer
offset | 32-bit signless integer
aux | 32-bit signless integer

rocdl.raw.ptr.buffer.load.lds (ROCDL::RawPtrBufferLoadLdsOp) 

Syntax:

operation ::= `rocdl.raw.ptr.buffer.load.lds` operands attr-dict

Interfaces: AliasAnalysisOpInterface

Attributes: 

Attribute | MLIR Type | Description
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array

Operands: 

Operand | Description
rsrc | LLVM pointer in address space 8
ldsPtr | LLVM pointer in address space 3
size | 32-bit signless integer
voffset | 32-bit signless integer
soffset | 32-bit signless integer
offset | 32-bit signless integer
aux | 32-bit signless integer
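
A minimal sketch of the syntax above, passing the operands in the order listed in the table. The op has no result and no trailing type. The wrapper function and value names are hypothetical.

```mlir
// Hypothetical wrapper; all operands are taken as SSA values of the listed types.
llvm.func @raw_ptr_buffer_load_lds_example(%rsrc: !llvm.ptr<8>, %ldsPtr: !llvm.ptr<3>,
                                           %size: i32, %voffset: i32, %soffset: i32,
                                           %offset: i32, %aux: i32) {
  rocdl.raw.ptr.buffer.load.lds %rsrc, %ldsPtr, %size, %voffset, %soffset, %offset, %aux
  llvm.return
}
```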

rocdl.raw.ptr.buffer.store (ROCDL::RawPtrBufferStoreOp) 

Syntax:

operation ::= `rocdl.raw.ptr.buffer.store` operands attr-dict `:` type($vdata)

Interfaces: AliasAnalysisOpInterface

Attributes: 

Attribute | MLIR Type | Description
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array

Operands: 

Operand | Description
vdata | LLVM dialect-compatible type
rsrc | LLVM pointer in address space 8
offset | 32-bit signless integer
soffset | 32-bit signless integer
aux | 32-bit signless integer
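
A minimal sketch of the syntax above, storing a `vector<4xf32>` (an assumed data type chosen for illustration). The wrapper function and value names are hypothetical.

```mlir
// Hypothetical wrapper; the trailing type is that of the stored data.
llvm.func @raw_ptr_buffer_store_example(%vdata: vector<4xf32>, %rsrc: !llvm.ptr<8>,
                                        %offset: i32, %soffset: i32, %aux: i32) {
  rocdl.raw.ptr.buffer.store %vdata, %rsrc, %offset, %soffset, %aux : vector<4xf32>
  llvm.return
}
```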

rocdl.rcp (ROCDL::ROCDLRcp) 

Syntax:

operation ::= `rocdl.rcp` $arg qualified(type($arg)) attr-dict `->` qualified(type($res))

Note: In the general case, prefer the conventional arith, math, or llvm ops over this. Use this ROCDL-specific operation only when you fully understand its implication and when it is strictly necessary. This op is usually chosen when a small loss in precision is acceptable in exchange for higher execution speed.

Example:

%0 = rocdl.rcp %a f32 -> f32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand | Description
arg | floating point LLVM type

Results: 

Result | Description
res | floating point LLVM type

rocdl.readfirstlane (ROCDL::ReadfirstlaneOp) 

Get the value in the first active lane.

Syntax:

operation ::= `rocdl.readfirstlane` $src attr-dict `:` type($res)

Returns the value in the lowest active lane of the input operand.

Example:

// Scalar readfirstlane.
%0 = rocdl.readfirstlane %src0 : f32

// Vector readfirstlane.
%1 = rocdl.readfirstlane %src1 : vector<2xf32>

Operands: 

Operand | Description
src | LLVM dialect-compatible type

Results: 

Result | Description
res | LLVM dialect-compatible type

rocdl.readlane (ROCDL::ReadlaneOp) 

Get the value in a specific lane.

Syntax:

operation ::= `rocdl.readlane` $src0 `,` $src1  attr-dict `:` `(` type($src0) `,` type($src1) `)` `->` type($res)

Get the value in lane src1 from input src0.

Example:

// Scalar readlane.
%0 = rocdl.readlane %src0, %idx : (f32, i32) -> f32

// Vector readlane.
%1 = rocdl.readlane %src1, %idx : (vector<2xf32>, i32) -> vector<2xf32>

Operands: 

Operand | Description
src0 | LLVM dialect-compatible type
src1 | 32-bit signless integer

Results: 

Result | Description
res | LLVM dialect-compatible type

rocdl.rsq (ROCDL::ROCDLRsq) 

Syntax:

operation ::= `rocdl.rsq` $arg qualified(type($arg)) attr-dict `->` qualified(type($res))

Note: In the general case, prefer the conventional arith, math, or llvm ops over this. Use this ROCDL-specific operation only when you fully understand its implication and when it is strictly necessary. This op is usually chosen when a small loss in precision is acceptable in exchange for higher execution speed.

Example:

%0 = rocdl.rsq %a f32 -> f32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

Operand | Description
arg | floating point LLVM type

Results: 

Result | Description
res | floating point LLVM type

rocdl.s.barrier (ROCDL::SBarrierOp) 

Syntax:

operation ::= `rocdl.s.barrier` attr-dict

Insert a workgroup barrier without memory fences.

Available on gfx9 and later but deprecated on gfx12+; see rocdl.s.barrier.signal and rocdl.s.barrier.wait instead.

Example:

// Synchronize threads within a workgroup.
rocdl.s.barrier

rocdl.s.barrier.init (ROCDL::BarrierInitOp) 

Syntax:

operation ::= `rocdl.s.barrier.init` $ptr `member_cnt` `=` $memberCnt attr-dict `:` qualified(type($ptr))

Available on gfx1250+.

Example:

// Initialize a named barrier with member count.
rocdl.s.barrier.init %ptr member_cnt = 1 : !llvm.ptr<3>

Attributes: 

Attribute | MLIR Type | Description
memberCnt | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands: 

Operand | Description
ptr | LLVM pointer in address space 3

rocdl.s.barrier.join (ROCDL::BarrierJoinOp) 

Syntax:

operation ::= `rocdl.s.barrier.join` $ptr attr-dict `:` qualified(type($ptr))

Available on gfx1250+.

Example:

// Join a named barrier.
rocdl.s.barrier.join %ptr : !llvm.ptr<3>

Operands: 

Operand | Description
ptr | LLVM pointer in address space 3

rocdl.s.barrier.leave (ROCDL::BarrierLeaveOp) 

Syntax:

operation ::= `rocdl.s.barrier.leave` `id` `=` $id attr-dict

Available on gfx1250+.

Example:

// Leave a named barrier by id.
rocdl.s.barrier.leave id = 1

Attributes: 

Attribute | MLIR Type | Description
id | ::mlir::IntegerAttr | 16-bit signless integer attribute

rocdl.s.barrier.signal (ROCDL::BarrierSignalOp) 

Syntax:

operation ::= `rocdl.s.barrier.signal` `id` `=` $id attr-dict

Signal a barrier by id. Available on gfx1200+.

Example:

// Signal barrier with id -1 (all barriers).
rocdl.s.barrier.signal id = -1

Attributes: 

Attribute | MLIR Type | Description
id | ::mlir::IntegerAttr | 32-bit signless integer attribute

rocdl.s.barrier.signal.isfirst (ROCDL::BarrierSignalIsfirstOp) 

Syntax:

operation ::= `rocdl.s.barrier.signal.isfirst` `id` `=` $id attr-dict `->` type($res)

Available on gfx1200+.

Example:

// Signal barrier and check if this wave is first to arrive.
%0 = rocdl.s.barrier.signal.isfirst id = 1 -> i1

Attributes: 

Attribute | MLIR Type | Description
id | ::mlir::IntegerAttr | 32-bit signless integer attribute

Results: 

Result | Description
res | 1-bit signless integer

rocdl.s.barrier.signal.var (ROCDL::BarrierSignalVarOp) 

Syntax:

operation ::= `rocdl.s.barrier.signal.var` $ptr `member_cnt` `=` $memberCnt attr-dict `:` qualified(type($ptr))

Available on gfx1250+.

Example:

// Signal a named barrier with variable ID.
rocdl.s.barrier.signal.var %ptr member_cnt = 1 : !llvm.ptr<3>

Attributes: 

Attribute | MLIR Type | Description
memberCnt | ::mlir::IntegerAttr | 32-bit signless integer attribute

Operands: 

Operand | Description
ptr | LLVM pointer in address space 3

rocdl.s.barrier.wait (ROCDL::BarrierWaitOp) 

Syntax:

operation ::= `rocdl.s.barrier.wait` `id` `=` $id attr-dict

Wait on a barrier by id. Available on gfx1200+.

Example:

// Wait on barrier with id -1 (all barriers).
rocdl.s.barrier.wait id = -1

Attributes: 

Attribute | MLIR Type | Description
id | ::mlir::IntegerAttr | 16-bit signless integer attribute
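
The rocdl.s.barrier entry points to rocdl.s.barrier.signal and rocdl.s.barrier.wait as its replacement on newer targets. A minimal sketch of pairing the two follows, reusing the id value from the examples on this page; the wrapper function is hypothetical, and whether useful work can be placed between the two ops depends on the kernel.

```mlir
// Hypothetical wrapper showing a signal/wait pair in place of a single barrier.
llvm.func @split_barrier_example() {
  // Announce arrival at the barrier.
  rocdl.s.barrier.signal id = -1
  // Work that does not depend on other waves could be placed here.
  // Block until the barrier has been satisfied.
  rocdl.s.barrier.wait id = -1
  llvm.return
}
```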

rocdl.s.get.barrier.state (ROCDL::GetBarrierStateOp) 

Syntax:

operation ::= `rocdl.s.get.barrier.state` `id` `=` $id attr-dict `->` type($res)

Available on gfx1200+.

Example:

// Query barrier state by id.
%0 = rocdl.s.get.barrier.state id = 1 -> i32

Attributes: 

Attribute | MLIR Type | Description
id | ::mlir::IntegerAttr | 32-bit signless integer attribute

Results: 

Result | Description
res | 32-bit signless integer

rocdl.s.get.named.barrier.state (ROCDL::GetNamedBarrierStateOp) 

Syntax:

operation ::= `rocdl.s.get.named.barrier.state` $ptr attr-dict `:` qualified(type($ptr)) `->` type($res)

Available on gfx1250+.

Example:

// Query named barrier state by pointer.
%0 = rocdl.s.get.named.barrier.state %ptr : !llvm.ptr<3> -> i32

Operands: 

Operand | Description
ptr | LLVM pointer in address space 3

Results: 

Result | Description
res | 32-bit signless integer

rocdl.s.nop (ROCDL::SNopOp) 

Syntax:

operation ::= `rocdl.s.nop` attr-dict $count

Insert a number of NOP cycles.

Example:

// Insert a no-op.
rocdl.s.nop 0

Attributes: 

Attribute | MLIR Type | Description
count | ::mlir::IntegerAttr | 16-bit signless integer attribute

rocdl.s.setprio (ROCDL::SetPrioOp) 

Syntax:

operation ::= `rocdl.s.setprio` $priority attr-dict

Set the wavefront scheduling priority.

Example:

// Set priority to 0.
rocdl.s.setprio 0

Attributes: 

Attribute | MLIR Type | Description
priority | ::mlir::IntegerAttr | 16-bit signless integer attribute

rocdl.s.sleep (ROCDL::SSleepOp) 

Syntax:

operation ::= `rocdl.s.sleep` attr-dict $count

Sleep for a number of clock cycles.

Example:

// Sleep for a minimum duration.
rocdl.s.sleep 0

Attributes: 

Attribute | MLIR Type | Description
count | ::mlir::IntegerAttr | 32-bit signless integer attribute

rocdl.s.wait.asynccnt (ROCDL::WaitAsynccntOp) 

Wait until ASYNCCNT is less than or equal to count

Syntax:

operation ::= `rocdl.s.wait.asynccnt` $count attr-dict

Wait until the specified counter is less than or equal to count before continuing.

Available on gfx1250+.

Example:

// Wait for async counter to drain.
rocdl.s.wait.asynccnt 0

Attributes: 

Attribute | MLIR Type | Description
count | ::mlir::IntegerAttr | 16-bit signless integer attribute

rocdl.s.wait.dscnt (ROCDL::WaitDscntOp) 

Wait until DSCNT is less than or equal to count

Syntax:

operation ::= `rocdl.s.wait.dscnt` $count attr-dict

Wait until the specified counter is less than or equal to count before continuing.

Available on gfx12+.

Example:

// Wait for data-sharing counter to drain.
rocdl.s.wait.dscnt 0

Attributes: 

Attribute | MLIR Type | Description
count | ::mlir::IntegerAttr | 16-bit signless integer attribute

rocdl.s.wait.expcnt (ROCDL::WaitExpcntOp) 

Wait until EXPCNT is less than or equal to count

Syntax:

operation ::= `rocdl.s.wait.expcnt` $count attr-dict

Wait until the specified counter is less than or equal to count before continuing.

Available on gfx12+.

Example:

// Wait for export counter to drain.
rocdl.s.wait.expcnt 0

Attributes: 

Attribute | MLIR Type | Description
count | ::mlir::IntegerAttr | 16-bit signless integer attribute

rocdl.s.wait.loadcnt (ROCDL::WaitLoadcntOp) 

Wait until LOADCNT is less than or equal to count

Syntax:

operation ::= `rocdl.s.wait.loadcnt` $count attr-dict

Wait until the specified counter is less than or equal to count before continuing.

Available on gfx12+.

Example:

// Wait for load counter to drain.
rocdl.s.wait.loadcnt 0

Attributes: 

Attribute | MLIR Type | Description
count | ::mlir::IntegerAttr | 16-bit signless integer attribute

rocdl.s.wait.storecnt (ROCDL::WaitStorecntOp) 

Wait until STORECNT is less than or equal to count

Syntax:

operation ::= `rocdl.s.wait.storecnt` $count attr-dict

Wait until the specified counter is less than or equal to count before continuing.

Available on gfx12+.

Example:

// Wait for store counter to drain.
rocdl.s.wait.storecnt 0

Attributes: 

Attribute | MLIR Type | Description
count | ::mlir::IntegerAttr | 16-bit signless integer attribute

rocdl.s.wait.tensorcnt (ROCDL::WaitTensorcntOp) 

Wait until TENSORCNT is less than or equal to count

Syntax:

operation ::= `rocdl.s.wait.tensorcnt` $count attr-dict

Wait until the specified counter is less than or equal to count before continuing.

Available on gfx1250+.

Example:

// Wait for tensor counter to drain.
rocdl.s.wait.tensorcnt 0

Attributes: 

Attribute | MLIR Type | Description
count | ::mlir::IntegerAttr | 16-bit signless integer attribute

rocdl.s.waitcnt (ROCDL::SWaitcntOp) 

Syntax:

operation ::= `rocdl.s.waitcnt` attr-dict $bitfield

Wait for outstanding memory operations to complete, as specified by a bitfield whose semantics depend on the target chipset.

Example:

// Wait for all counters to reach zero.
rocdl.s.waitcnt 0

Attributes: 

Attribute | MLIR Type | Description
bitfield | ::mlir::IntegerAttr | 32-bit signless integer attribute

rocdl.s.wakeup.barrier (ROCDL::WakeupBarrierOp) 

Syntax:

operation ::= `rocdl.s.wakeup.barrier` $ptr attr-dict `:` qualified(type($ptr))

Wakes up waves associated with a given named barrier. Note: this op does not release waves waiting at the barrier. It only signals other waves in the same work-group that are waiting on the indicated named barrier to wake up. Available on gfx1250+.

Example:

// Wake up waves waiting on a named barrier.
rocdl.s.wakeup.barrier %ptr : !llvm.ptr<3>

Operands: 

Operand | Description
ptr | LLVM pointer in address space 3

rocdl.sched.barrier (ROCDL::SchedBarrier) 

Syntax:

operation ::= `rocdl.sched.barrier` $mask attr-dict

Insert a scheduling barrier with the given mask. The mask is a bitfield that controls which instruction types may be scheduled across the barrier (e.g. 0x0000 = no instructions may cross, 0x0001 = ALU only, 0x0010 = all VMEM, etc.). See https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/IR/IntrinsicsAMDGPU.td#L349 for the full list of mask values.

Example:

// Scheduling barrier with mask 0.
rocdl.sched.barrier 0

Attributes: 

Attribute | MLIR Type | Description
mask | ::mlir::IntegerAttr | 32-bit signless integer attribute
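
As a sketch of a non-zero mask, the value 0x0010 from the description above (written in decimal as 16) would let only VMEM instructions be scheduled across the barrier. The wrapper function is hypothetical.

```mlir
// Hypothetical wrapper; 16 == 0x0010, the "all VMEM" bit from the description above.
llvm.func @sched_barrier_vmem_example() {
  rocdl.sched.barrier 16
  llvm.return
}
```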

rocdl.sched.group.barrier (ROCDL::SchedGroupBarrier) 

Syntax:

operation ::= `rocdl.sched.group.barrier` $mask `,` $size `,` $groupId attr-dict

Insert a scheduling group barrier.

Example:

// Schedule group barrier with mask, size, and group id.
rocdl.sched.group.barrier 8, 1, 0

Attributes: 

Attribute | MLIR Type | Description
mask | ::mlir::IntegerAttr | 32-bit signless integer attribute
size | ::mlir::IntegerAttr | 32-bit signless integer attribute
groupId | ::mlir::IntegerAttr | 32-bit signless integer attribute
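
Several group barriers are typically issued back to back to describe a small scheduling pipeline. The sketch below rests on assumptions drawn from LLVM's description of the underlying llvm.amdgcn.sched.group.barrier intrinsic rather than from this page: mask selects an instruction class using the same bit values as rocdl.sched.barrier (8 for MFMA, 256 for LDS reads), size is the number of instructions in the group, and groups with the same groupId are ordered in the sequence they are emitted. The wrapper function is hypothetical.

```mlir
// Hypothetical wrapper; mask meanings are assumptions based on the LLVM intrinsic.
llvm.func @sched_group_barrier_pipeline_example() {
  // Group of 1 MFMA instruction (mask 8), sync group 0.
  rocdl.sched.group.barrier 8, 1, 0
  // Followed by a group of 2 LDS read instructions (mask 256), same sync group.
  rocdl.sched.group.barrier 256, 2, 0
  llvm.return
}
```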

rocdl.sin (ROCDL::ROCDLSin) 

Syntax:

operation ::= `rocdl.sin` $arg qualified(type($arg)) attr-dict `->` qualified(type($res))

Note: In the general case, prefer the conventional arith, math, or llvm ops over this one. Use this ROCDL-specific operation only when you fully understand its implications and when it is strictly necessary. It is usually chosen when a small loss in precision is acceptable in exchange for higher execution speed.

Example:

%0 = rocdl.sin %a f32 -> f32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands:

| Operand | Description |
| --- | --- |
| arg | floating point LLVM type |

Results:

| Result | Description |
| --- | --- |
| res | floating point LLVM type |

rocdl.smfmac.f32.16x16x128.bf8.bf8 (ROCDL::smfmac_f32_16x16x128_bf8_bf8) 

Syntax:

operation ::= `rocdl.smfmac.f32.16x16x128.bf8.bf8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| a | fixed-length vector of 32-bit signless integer values of length 4 |
| b | fixed-length vector of 32-bit signless integer values of length 8 |
| c | fixed-length vector of 32-bit float values of length 4 |
| index | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| res | fixed-length vector of 32-bit float values of length 4 |
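
The shared example above does not include this variant. A minimal sketch for it, assuming only the assembly format and operand types listed above; the function and SSA value names are hypothetical, and cbsz/abid are set to 0 as in the shared example.

```mlir
llvm.func @smfmac_f32_16x16x128_bf8_bf8(%a: vector<4xi32>, %b: vector<8xi32>,
                                        %c: vector<4xf32>, %idx: i32) -> vector<4xf32> {
  // Packed bf8 inputs are passed as i32 vectors; the accumulator is f32.
  %r = rocdl.smfmac.f32.16x16x128.bf8.bf8 %a, %b, %c, %idx, 0, 0 :
    (vector<4xi32>, vector<8xi32>, vector<4xf32>, i32) -> vector<4xf32>
  llvm.return %r : vector<4xf32>
}
```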

rocdl.smfmac.f32.16x16x128.bf8.fp8 (ROCDL::smfmac_f32_16x16x128_bf8_fp8) 

Syntax:

operation ::= `rocdl.smfmac.f32.16x16x128.bf8.fp8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| a | fixed-length vector of 32-bit signless integer values of length 4 |
| b | fixed-length vector of 32-bit signless integer values of length 8 |
| c | fixed-length vector of 32-bit float values of length 4 |
| index | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| res | fixed-length vector of 32-bit float values of length 4 |
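
The shared example above does not include this variant. A minimal sketch for it, assuming only the assembly format and operand types listed above; the function and SSA value names are hypothetical, and cbsz/abid are set to 0 as in the shared example.

```mlir
llvm.func @smfmac_f32_16x16x128_bf8_fp8(%a: vector<4xi32>, %b: vector<8xi32>,
                                        %c: vector<4xf32>, %idx: i32) -> vector<4xf32> {
  // Packed 8-bit float inputs are passed as i32 vectors; the accumulator is f32.
  %r = rocdl.smfmac.f32.16x16x128.bf8.fp8 %a, %b, %c, %idx, 0, 0 :
    (vector<4xi32>, vector<8xi32>, vector<4xf32>, i32) -> vector<4xf32>
  llvm.return %r : vector<4xf32>
}
```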

rocdl.smfmac.f32.16x16x128.fp8.bf8 (ROCDL::smfmac_f32_16x16x128_fp8_bf8) 

Syntax:

operation ::= `rocdl.smfmac.f32.16x16x128.fp8.bf8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| a | fixed-length vector of 32-bit signless integer values of length 4 |
| b | fixed-length vector of 32-bit signless integer values of length 8 |
| c | fixed-length vector of 32-bit float values of length 4 |
| index | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| res | fixed-length vector of 32-bit float values of length 4 |
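
The shared example above does not include this variant. A minimal sketch for it, assuming only the assembly format and operand types listed above; the function and SSA value names are hypothetical, and cbsz/abid are set to 0 as in the shared example.

```mlir
llvm.func @smfmac_f32_16x16x128_fp8_bf8(%a: vector<4xi32>, %b: vector<8xi32>,
                                        %c: vector<4xf32>, %idx: i32) -> vector<4xf32> {
  // Packed 8-bit float inputs are passed as i32 vectors; the accumulator is f32.
  %r = rocdl.smfmac.f32.16x16x128.fp8.bf8 %a, %b, %c, %idx, 0, 0 :
    (vector<4xi32>, vector<8xi32>, vector<4xf32>, i32) -> vector<4xf32>
  llvm.return %r : vector<4xf32>
}
```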

rocdl.smfmac.f32.16x16x128.fp8.fp8 (ROCDL::smfmac_f32_16x16x128_fp8_fp8) 

Syntax:

operation ::= `rocdl.smfmac.f32.16x16x128.fp8.fp8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| a | fixed-length vector of 32-bit signless integer values of length 4 |
| b | fixed-length vector of 32-bit signless integer values of length 8 |
| c | fixed-length vector of 32-bit float values of length 4 |
| index | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| res | fixed-length vector of 32-bit float values of length 4 |
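
The shared example above does not include this variant. A minimal sketch for it, assuming only the assembly format and operand types listed above; the function and SSA value names are hypothetical, and cbsz/abid are set to 0 as in the shared example.

```mlir
llvm.func @smfmac_f32_16x16x128_fp8_fp8(%a: vector<4xi32>, %b: vector<8xi32>,
                                        %c: vector<4xf32>, %idx: i32) -> vector<4xf32> {
  // Packed fp8 inputs are passed as i32 vectors; the accumulator is f32.
  %r = rocdl.smfmac.f32.16x16x128.fp8.fp8 %a, %b, %c, %idx, 0, 0 :
    (vector<4xi32>, vector<8xi32>, vector<4xf32>, i32) -> vector<4xf32>
  llvm.return %r : vector<4xf32>
}
```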

rocdl.smfmac.f32.16x16x32.bf16 (ROCDL::smfmac_f32_16x16x32_bf16) 

Syntax:

operation ::= `rocdl.smfmac.f32.16x16x32.bf16` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| a | fixed-length vector of 16-bit signless integer values of length 4 |
| b | fixed-length vector of 16-bit signless integer values of length 8 |
| c | fixed-length vector of 32-bit float values of length 4 |
| index | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| res | fixed-length vector of 32-bit float values of length 4 |

rocdl.smfmac.f32.16x16x32.f16 (ROCDL::smfmac_f32_16x16x32_f16) 

Syntax:

operation ::= `rocdl.smfmac.f32.16x16x32.f16` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| a | fixed-length vector of 16-bit float values of length 4 |
| b | fixed-length vector of 16-bit float values of length 8 |
| c | fixed-length vector of 32-bit float values of length 4 |
| index | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| res | fixed-length vector of 32-bit float values of length 4 |

rocdl.smfmac.f32.16x16x64.bf16 (ROCDL::smfmac_f32_16x16x64_bf16) 

Syntax:

operation ::= `rocdl.smfmac.f32.16x16x64.bf16` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| a | fixed-length vector of bfloat16 type values of length 8 |
| b | fixed-length vector of bfloat16 type values of length 16 |
| c | fixed-length vector of 32-bit float values of length 4 |
| index | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| res | fixed-length vector of 32-bit float values of length 4 |
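
The shared example above does not include this variant, which takes bf16 vectors rather than the i16 vectors used by the 16x16x32 bf16 form. A minimal sketch, assuming only the assembly format and operand types listed above; the function and SSA value names are hypothetical, and cbsz/abid are set to 0 as in the shared example.

```mlir
llvm.func @smfmac_f32_16x16x64_bf16(%a: vector<8xbf16>, %b: vector<16xbf16>,
                                    %c: vector<4xf32>, %idx: i32) -> vector<4xf32> {
  // Inputs are native bf16 vectors; the accumulator is f32.
  %r = rocdl.smfmac.f32.16x16x64.bf16 %a, %b, %c, %idx, 0, 0 :
    (vector<8xbf16>, vector<16xbf16>, vector<4xf32>, i32) -> vector<4xf32>
  llvm.return %r : vector<4xf32>
}
```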

rocdl.smfmac.f32.16x16x64.bf8.bf8 (ROCDL::smfmac_f32_16x16x64_bf8_bf8) 

Syntax:

operation ::= `rocdl.smfmac.f32.16x16x64.bf8.bf8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| a | fixed-length vector of 32-bit signless integer values of length 2 |
| b | fixed-length vector of 32-bit signless integer values of length 4 |
| c | fixed-length vector of 32-bit float values of length 4 |
| index | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| res | fixed-length vector of 32-bit float values of length 4 |
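
The shared example above does not include this variant. A minimal sketch for it, assuming only the assembly format and operand types listed above; the function and SSA value names are hypothetical, and cbsz/abid are set to 0 as in the shared example.

```mlir
llvm.func @smfmac_f32_16x16x64_bf8_bf8(%a: vector<2xi32>, %b: vector<4xi32>,
                                       %c: vector<4xf32>, %idx: i32) -> vector<4xf32> {
  // Packed bf8 inputs are passed as i32 vectors; the accumulator is f32.
  %r = rocdl.smfmac.f32.16x16x64.bf8.bf8 %a, %b, %c, %idx, 0, 0 :
    (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
  llvm.return %r : vector<4xf32>
}
```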

rocdl.smfmac.f32.16x16x64.bf8.fp8 (ROCDL::smfmac_f32_16x16x64_bf8_fp8) 

Syntax:

operation ::= `rocdl.smfmac.f32.16x16x64.bf8.fp8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| a | fixed-length vector of 32-bit signless integer values of length 2 |
| b | fixed-length vector of 32-bit signless integer values of length 4 |
| c | fixed-length vector of 32-bit float values of length 4 |
| index | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| res | fixed-length vector of 32-bit float values of length 4 |
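
The shared example above does not include this variant. A minimal sketch for it, assuming only the assembly format and operand types listed above; the function and SSA value names are hypothetical, and cbsz/abid are set to 0 as in the shared example.

```mlir
llvm.func @smfmac_f32_16x16x64_bf8_fp8(%a: vector<2xi32>, %b: vector<4xi32>,
                                       %c: vector<4xf32>, %idx: i32) -> vector<4xf32> {
  // Packed 8-bit float inputs are passed as i32 vectors; the accumulator is f32.
  %r = rocdl.smfmac.f32.16x16x64.bf8.fp8 %a, %b, %c, %idx, 0, 0 :
    (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
  llvm.return %r : vector<4xf32>
}
```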

rocdl.smfmac.f32.16x16x64.f16 (ROCDL::smfmac_f32_16x16x64_f16) 

Syntax:

operation ::= `rocdl.smfmac.f32.16x16x64.f16` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| a | fixed-length vector of 16-bit float values of length 8 |
| b | fixed-length vector of 16-bit float values of length 16 |
| c | fixed-length vector of 32-bit float values of length 4 |
| index | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| res | fixed-length vector of 32-bit float values of length 4 |
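
The shared example above shows the 16x16x32 f16 form but not this variant, which takes wider input vectors. A minimal sketch, assuming only the assembly format and operand types listed above; the function and SSA value names are hypothetical, and cbsz/abid are set to 0 as in the shared example.

```mlir
llvm.func @smfmac_f32_16x16x64_f16(%a: vector<8xf16>, %b: vector<16xf16>,
                                   %c: vector<4xf32>, %idx: i32) -> vector<4xf32> {
  // Inputs are f16 vectors of length 8 and 16; the accumulator is f32.
  %r = rocdl.smfmac.f32.16x16x64.f16 %a, %b, %c, %idx, 0, 0 :
    (vector<8xf16>, vector<16xf16>, vector<4xf32>, i32) -> vector<4xf32>
  llvm.return %r : vector<4xf32>
}
```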

rocdl.smfmac.f32.16x16x64.fp8.bf8 (ROCDL::smfmac_f32_16x16x64_fp8_bf8) 

Syntax:

operation ::= `rocdl.smfmac.f32.16x16x64.fp8.bf8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| a | fixed-length vector of 32-bit signless integer values of length 2 |
| b | fixed-length vector of 32-bit signless integer values of length 4 |
| c | fixed-length vector of 32-bit float values of length 4 |
| index | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| res | fixed-length vector of 32-bit float values of length 4 |
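
The shared example above does not include this variant. A minimal sketch for it, assuming only the assembly format and operand types listed above; the function and SSA value names are hypothetical, and cbsz/abid are set to 0 as in the shared example.

```mlir
llvm.func @smfmac_f32_16x16x64_fp8_bf8(%a: vector<2xi32>, %b: vector<4xi32>,
                                       %c: vector<4xf32>, %idx: i32) -> vector<4xf32> {
  // Packed 8-bit float inputs are passed as i32 vectors; the accumulator is f32.
  %r = rocdl.smfmac.f32.16x16x64.fp8.bf8 %a, %b, %c, %idx, 0, 0 :
    (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
  llvm.return %r : vector<4xf32>
}
```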

rocdl.smfmac.f32.16x16x64.fp8.fp8 (ROCDL::smfmac_f32_16x16x64_fp8_fp8) 

Syntax:

operation ::= `rocdl.smfmac.f32.16x16x64.fp8.fp8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| a | fixed-length vector of 32-bit signless integer values of length 2 |
| b | fixed-length vector of 32-bit signless integer values of length 4 |
| c | fixed-length vector of 32-bit float values of length 4 |
| index | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| res | fixed-length vector of 32-bit float values of length 4 |

rocdl.smfmac.f32.32x32x16.bf16 (ROCDL::smfmac_f32_32x32x16_bf16) 

Syntax:

operation ::= `rocdl.smfmac.f32.32x32x16.bf16` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| a | fixed-length vector of 16-bit signless integer values of length 4 |
| b | fixed-length vector of 16-bit signless integer values of length 8 |
| c | fixed-length vector of 32-bit float values of length 16 |
| index | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| res | fixed-length vector of 32-bit float values of length 16 |
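
The shared example above does not include this variant, which uses a 16-element accumulator. A minimal sketch, assuming only the assembly format and operand types listed above; the function and SSA value names are hypothetical, and cbsz/abid are set to 0 as in the shared example.

```mlir
llvm.func @smfmac_f32_32x32x16_bf16(%a: vector<4xi16>, %b: vector<8xi16>,
                                    %c: vector<16xf32>, %idx: i32) -> vector<16xf32> {
  // bf16 inputs are passed as i16 vectors; the accumulator has 16 f32 lanes.
  %r = rocdl.smfmac.f32.32x32x16.bf16 %a, %b, %c, %idx, 0, 0 :
    (vector<4xi16>, vector<8xi16>, vector<16xf32>, i32) -> vector<16xf32>
  llvm.return %r : vector<16xf32>
}
```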

rocdl.smfmac.f32.32x32x16.f16 (ROCDL::smfmac_f32_32x32x16_f16) 

Syntax:

operation ::= `rocdl.smfmac.f32.32x32x16.f16` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| a | fixed-length vector of 16-bit float values of length 4 |
| b | fixed-length vector of 16-bit float values of length 8 |
| c | fixed-length vector of 32-bit float values of length 16 |
| index | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| res | fixed-length vector of 32-bit float values of length 16 |
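
The shared example above does not include this variant, which uses a 16-element accumulator. A minimal sketch, assuming only the assembly format and operand types listed above; the function and SSA value names are hypothetical, and cbsz/abid are set to 0 as in the shared example.

```mlir
llvm.func @smfmac_f32_32x32x16_f16(%a: vector<4xf16>, %b: vector<8xf16>,
                                   %c: vector<16xf32>, %idx: i32) -> vector<16xf32> {
  // Inputs are f16 vectors; the accumulator has 16 f32 lanes.
  %r = rocdl.smfmac.f32.32x32x16.f16 %a, %b, %c, %idx, 0, 0 :
    (vector<4xf16>, vector<8xf16>, vector<16xf32>, i32) -> vector<16xf32>
  llvm.return %r : vector<16xf32>
}
```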

rocdl.smfmac.f32.32x32x32.bf16 (ROCDL::smfmac_f32_32x32x32_bf16) 

Syntax:

operation ::= `rocdl.smfmac.f32.32x32x32.bf16` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| a | fixed-length vector of bfloat16 type values of length 8 |
| b | fixed-length vector of bfloat16 type values of length 16 |
| c | fixed-length vector of 32-bit float values of length 16 |
| index | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| res | fixed-length vector of 32-bit float values of length 16 |
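
The shared example above does not include this variant, which takes native bf16 vectors and a 16-element accumulator. A minimal sketch, assuming only the assembly format and operand types listed above; the function and SSA value names are hypothetical, and cbsz/abid are set to 0 as in the shared example.

```mlir
llvm.func @smfmac_f32_32x32x32_bf16(%a: vector<8xbf16>, %b: vector<16xbf16>,
                                    %c: vector<16xf32>, %idx: i32) -> vector<16xf32> {
  // Inputs are native bf16 vectors; the accumulator has 16 f32 lanes.
  %r = rocdl.smfmac.f32.32x32x32.bf16 %a, %b, %c, %idx, 0, 0 :
    (vector<8xbf16>, vector<16xbf16>, vector<16xf32>, i32) -> vector<16xf32>
  llvm.return %r : vector<16xf32>
}
```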

rocdl.smfmac.f32.32x32x32.bf8.bf8 (ROCDL::smfmac_f32_32x32x32_bf8_bf8) 

Syntax:

operation ::= `rocdl.smfmac.f32.32x32x32.bf8.bf8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| a | fixed-length vector of 32-bit signless integer values of length 2 |
| b | fixed-length vector of 32-bit signless integer values of length 4 |
| c | fixed-length vector of 32-bit float values of length 16 |
| index | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| res | fixed-length vector of 32-bit float values of length 16 |
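
The shared example above does not include this variant. A minimal sketch for it, assuming only the assembly format and operand types listed above; the function and SSA value names are hypothetical, and cbsz/abid are set to 0 as in the shared example.

```mlir
llvm.func @smfmac_f32_32x32x32_bf8_bf8(%a: vector<2xi32>, %b: vector<4xi32>,
                                       %c: vector<16xf32>, %idx: i32) -> vector<16xf32> {
  // Packed bf8 inputs are passed as i32 vectors; the accumulator has 16 f32 lanes.
  %r = rocdl.smfmac.f32.32x32x32.bf8.bf8 %a, %b, %c, %idx, 0, 0 :
    (vector<2xi32>, vector<4xi32>, vector<16xf32>, i32) -> vector<16xf32>
  llvm.return %r : vector<16xf32>
}
```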

rocdl.smfmac.f32.32x32x32.bf8.fp8 (ROCDL::smfmac_f32_32x32x32_bf8_fp8) 

Syntax:

operation ::= `rocdl.smfmac.f32.32x32x32.bf8.fp8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| a | fixed-length vector of 32-bit signless integer values of length 2 |
| b | fixed-length vector of 32-bit signless integer values of length 4 |
| c | fixed-length vector of 32-bit float values of length 16 |
| index | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| res | fixed-length vector of 32-bit float values of length 16 |
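
The shared example above does not include this variant. A minimal sketch for it, assuming only the assembly format and operand types listed above; the function and SSA value names are hypothetical, and cbsz/abid are set to 0 as in the shared example.

```mlir
llvm.func @smfmac_f32_32x32x32_bf8_fp8(%a: vector<2xi32>, %b: vector<4xi32>,
                                       %c: vector<16xf32>, %idx: i32) -> vector<16xf32> {
  // Packed 8-bit float inputs are passed as i32 vectors; the accumulator has 16 f32 lanes.
  %r = rocdl.smfmac.f32.32x32x32.bf8.fp8 %a, %b, %c, %idx, 0, 0 :
    (vector<2xi32>, vector<4xi32>, vector<16xf32>, i32) -> vector<16xf32>
  llvm.return %r : vector<16xf32>
}
```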

rocdl.smfmac.f32.32x32x32.f16 (ROCDL::smfmac_f32_32x32x32_f16) 

Syntax:

operation ::= `rocdl.smfmac.f32.32x32x32.f16` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| a | fixed-length vector of 16-bit float values of length 8 |
| b | fixed-length vector of 16-bit float values of length 16 |
| c | fixed-length vector of 32-bit float values of length 16 |
| index | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| res | fixed-length vector of 32-bit float values of length 16 |
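
The shared example above does not include this variant, which takes wider f16 inputs and a 16-element accumulator. A minimal sketch, assuming only the assembly format and operand types listed above; the function and SSA value names are hypothetical, and cbsz/abid are set to 0 as in the shared example.

```mlir
llvm.func @smfmac_f32_32x32x32_f16(%a: vector<8xf16>, %b: vector<16xf16>,
                                   %c: vector<16xf32>, %idx: i32) -> vector<16xf32> {
  // Inputs are f16 vectors of length 8 and 16; the accumulator has 16 f32 lanes.
  %r = rocdl.smfmac.f32.32x32x32.f16 %a, %b, %c, %idx, 0, 0 :
    (vector<8xf16>, vector<16xf16>, vector<16xf32>, i32) -> vector<16xf32>
  llvm.return %r : vector<16xf32>
}
```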

rocdl.smfmac.f32.32x32x32.fp8.bf8 (ROCDL::smfmac_f32_32x32x32_fp8_bf8) 

Syntax:

operation ::= `rocdl.smfmac.f32.32x32x32.fp8.bf8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| a | fixed-length vector of 32-bit signless integer values of length 2 |
| b | fixed-length vector of 32-bit signless integer values of length 4 |
| c | fixed-length vector of 32-bit float values of length 16 |
| index | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| res | fixed-length vector of 32-bit float values of length 16 |
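
The shared example above does not include this variant. A minimal sketch for it, assuming only the assembly format and operand types listed above; the function and SSA value names are hypothetical, and cbsz/abid are set to 0 as in the shared example.

```mlir
llvm.func @smfmac_f32_32x32x32_fp8_bf8(%a: vector<2xi32>, %b: vector<4xi32>,
                                       %c: vector<16xf32>, %idx: i32) -> vector<16xf32> {
  // Packed 8-bit float inputs are passed as i32 vectors; the accumulator has 16 f32 lanes.
  %r = rocdl.smfmac.f32.32x32x32.fp8.bf8 %a, %b, %c, %idx, 0, 0 :
    (vector<2xi32>, vector<4xi32>, vector<16xf32>, i32) -> vector<16xf32>
  llvm.return %r : vector<16xf32>
}
```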

rocdl.smfmac.f32.32x32x32.fp8.fp8 (ROCDL::smfmac_f32_32x32x32_fp8_fp8) 

Syntax:

operation ::= `rocdl.smfmac.f32.32x32x32.fp8.fp8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| a | fixed-length vector of 32-bit signless integer values of length 2 |
| b | fixed-length vector of 32-bit signless integer values of length 4 |
| c | fixed-length vector of 32-bit float values of length 16 |
| index | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| res | fixed-length vector of 32-bit float values of length 16 |

rocdl.smfmac.f32.32x32x64.bf8.bf8 (ROCDL::smfmac_f32_32x32x64_bf8_bf8) 

Syntax:

operation ::= `rocdl.smfmac.f32.32x32x64.bf8.bf8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
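None of the shared examples above uses this op. A sketch that does, with types taken from the tables below, could look like this. The function and value names are hypothetical, and `0, 0` for `cbsz`/`abid` is only a placeholder.

```mlir
// Hypothetical wrapper; types follow the Operands/Results tables for this op.
func.func @smfmac_f32_32x32x64_bf8_bf8(%a: vector<4xi32>, %b: vector<8xi32>,
                                       %c: vector<16xf32>, %idx: i32)
    -> vector<16xf32> {
  // cbsz = 0 and abid = 0 are placeholder values.
  %r = rocdl.smfmac.f32.32x32x64.bf8.bf8 %a, %b, %c, %idx, 0, 0 :
    (vector<4xi32>, vector<8xi32>, vector<16xf32>, i32) -> vector<16xf32>
  return %r : vector<16xf32>
}
```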

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `cbsz` | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| `abid` | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | fixed-length vector of 32-bit signless integer values of length 4 |
| `b` | fixed-length vector of 32-bit signless integer values of length 8 |
| `c` | fixed-length vector of 32-bit float values of length 16 |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | fixed-length vector of 32-bit float values of length 16 |

rocdl.smfmac.f32.32x32x64.bf8.fp8 (ROCDL::smfmac_f32_32x32x64_bf8_fp8) 

Syntax:

operation ::= `rocdl.smfmac.f32.32x32x64.bf8.fp8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
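For the mixed bf8/fp8 form documented here, a hedged sketch based on the tables below follows. The packed inputs are plain `i32` vectors, so nothing in the types distinguishes bf8 from fp8; only the op name does. Names are hypothetical and the `cbsz`/`abid` values are placeholders.

```mlir
// Hypothetical wrapper; types follow the Operands/Results tables for this op.
func.func @smfmac_f32_32x32x64_bf8_fp8(%a: vector<4xi32>, %b: vector<8xi32>,
                                       %c: vector<16xf32>, %idx: i32)
    -> vector<16xf32> {
  // cbsz = 0 and abid = 0 are placeholder values.
  %r = rocdl.smfmac.f32.32x32x64.bf8.fp8 %a, %b, %c, %idx, 0, 0 :
    (vector<4xi32>, vector<8xi32>, vector<16xf32>, i32) -> vector<16xf32>
  return %r : vector<16xf32>
}
```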

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `cbsz` | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| `abid` | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | fixed-length vector of 32-bit signless integer values of length 4 |
| `b` | fixed-length vector of 32-bit signless integer values of length 8 |
| `c` | fixed-length vector of 32-bit float values of length 16 |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | fixed-length vector of 32-bit float values of length 16 |

rocdl.smfmac.f32.32x32x64.fp8.bf8 (ROCDL::smfmac_f32_32x32x64_fp8_bf8) 

Syntax:

operation ::= `rocdl.smfmac.f32.32x32x64.fp8.bf8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
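The examples above are shared across the SMFMAC family. A sketch specific to this op, using only the types listed in the tables below, might read as follows. The function name, value names, and the zero `cbsz`/`abid` values are illustrative assumptions.

```mlir
// Hypothetical wrapper; types follow the Operands/Results tables for this op.
func.func @smfmac_f32_32x32x64_fp8_bf8(%a: vector<4xi32>, %b: vector<8xi32>,
                                       %c: vector<16xf32>, %idx: i32)
    -> vector<16xf32> {
  // cbsz = 0 and abid = 0 are placeholder values.
  %r = rocdl.smfmac.f32.32x32x64.fp8.bf8 %a, %b, %c, %idx, 0, 0 :
    (vector<4xi32>, vector<8xi32>, vector<16xf32>, i32) -> vector<16xf32>
  return %r : vector<16xf32>
}
```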

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `cbsz` | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| `abid` | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | fixed-length vector of 32-bit signless integer values of length 4 |
| `b` | fixed-length vector of 32-bit signless integer values of length 8 |
| `c` | fixed-length vector of 32-bit float values of length 16 |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | fixed-length vector of 32-bit float values of length 16 |

rocdl.smfmac.f32.32x32x64.fp8.fp8 (ROCDL::smfmac_f32_32x32x64_fp8_fp8) 

Syntax:

operation ::= `rocdl.smfmac.f32.32x32x64.fp8.fp8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
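The last shared example above is the 16x16x64 fp8 variant, which accumulates into `vector<4xf32>`. For comparison, a sketch of the 32x32x64 op documented here, with the wider operands and 16-element accumulator from the tables below, is given next. All names and the `cbsz`/`abid` values are placeholders.

```mlir
// Hypothetical wrapper; types follow the Operands/Results tables for this op.
func.func @smfmac_f32_32x32x64_fp8_fp8(%a: vector<4xi32>, %b: vector<8xi32>,
                                       %c: vector<16xf32>, %idx: i32)
    -> vector<16xf32> {
  // cbsz = 0 and abid = 0 are placeholder values.
  %r = rocdl.smfmac.f32.32x32x64.fp8.fp8 %a, %b, %c, %idx, 0, 0 :
    (vector<4xi32>, vector<8xi32>, vector<16xf32>, i32) -> vector<16xf32>
  return %r : vector<16xf32>
}
```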

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `cbsz` | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| `abid` | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | fixed-length vector of 32-bit signless integer values of length 4 |
| `b` | fixed-length vector of 32-bit signless integer values of length 8 |
| `c` | fixed-length vector of 32-bit float values of length 16 |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | fixed-length vector of 32-bit float values of length 16 |

rocdl.smfmac.i32.16x16x128.i8 (ROCDL::smfmac_i32_16x16x128_i8) 

Syntax:

operation ::= `rocdl.smfmac.i32.16x16x128.i8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
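The i8 example above is for the 16x16x64 shape. A sketch for the 16x16x128 op documented here keeps the same `vector<4xi32>` accumulator but uses the wider `a` and `b` operands from the tables below. The function and value names are hypothetical, and `0, 0` for `cbsz`/`abid` is a placeholder.

```mlir
// Hypothetical wrapper; types follow the Operands/Results tables for this op.
func.func @smfmac_i32_16x16x128_i8(%a: vector<4xi32>, %b: vector<8xi32>,
                                   %c: vector<4xi32>, %idx: i32)
    -> vector<4xi32> {
  // cbsz = 0 and abid = 0 are placeholder values.
  %r = rocdl.smfmac.i32.16x16x128.i8 %a, %b, %c, %idx, 0, 0 :
    (vector<4xi32>, vector<8xi32>, vector<4xi32>, i32) -> vector<4xi32>
  return %r : vector<4xi32>
}
```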

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `cbsz` | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| `abid` | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | fixed-length vector of 32-bit signless integer values of length 4 |
| `b` | fixed-length vector of 32-bit signless integer values of length 8 |
| `c` | fixed-length vector of 32-bit signless integer values of length 4 |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | fixed-length vector of 32-bit signless integer values of length 4 |

rocdl.smfmac.i32.16x16x64.i8 (ROCDL::smfmac_i32_16x16x64_i8) 

Syntax:

operation ::= `rocdl.smfmac.i32.16x16x64.i8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `cbsz` | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| `abid` | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | fixed-length vector of 32-bit signless integer values of length 2 |
| `b` | fixed-length vector of 32-bit signless integer values of length 4 |
| `c` | fixed-length vector of 32-bit signless integer values of length 4 |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | fixed-length vector of 32-bit signless integer values of length 4 |

rocdl.smfmac.i32.32x32x32.i8 (ROCDL::smfmac_i32_32x32x32_i8) 

Syntax:

operation ::= `rocdl.smfmac.i32.32x32x32.i8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
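As a sketch for the 32x32x32 i8 op documented here, the following follows the tables below: the inputs match the 16x16x64 example above, but the accumulator and result grow to 16 elements. Names are hypothetical and the `cbsz`/`abid` values are placeholders.

```mlir
// Hypothetical wrapper; types follow the Operands/Results tables for this op.
func.func @smfmac_i32_32x32x32_i8(%a: vector<2xi32>, %b: vector<4xi32>,
                                  %c: vector<16xi32>, %idx: i32)
    -> vector<16xi32> {
  // cbsz = 0 and abid = 0 are placeholder values.
  %r = rocdl.smfmac.i32.32x32x32.i8 %a, %b, %c, %idx, 0, 0 :
    (vector<2xi32>, vector<4xi32>, vector<16xi32>, i32) -> vector<16xi32>
  return %r : vector<16xi32>
}
```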

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `cbsz` | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| `abid` | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | fixed-length vector of 32-bit signless integer values of length 2 |
| `b` | fixed-length vector of 32-bit signless integer values of length 4 |
| `c` | fixed-length vector of 32-bit signless integer values of length 16 |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | fixed-length vector of 32-bit signless integer values of length 16 |

rocdl.smfmac.i32.32x32x64.i8 (ROCDL::smfmac_i32_32x32x64_i8) 

Syntax:

operation ::= `rocdl.smfmac.i32.32x32x64.i8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)

Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.

Example:

// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>

// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>

// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
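A minimal sketch for the 32x32x64 i8 op documented here, assuming only the types given in the tables below, is shown next. The function name and SSA names are hypothetical, and the zero `cbsz`/`abid` values are placeholders.

```mlir
// Hypothetical wrapper; types follow the Operands/Results tables for this op.
func.func @smfmac_i32_32x32x64_i8(%a: vector<4xi32>, %b: vector<8xi32>,
                                  %c: vector<16xi32>, %idx: i32)
    -> vector<16xi32> {
  // cbsz = 0 and abid = 0 are placeholder values.
  %r = rocdl.smfmac.i32.32x32x64.i8 %a, %b, %c, %idx, 0, 0 :
    (vector<4xi32>, vector<8xi32>, vector<16xi32>, i32) -> vector<16xi32>
  return %r : vector<16xi32>
}
```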

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `cbsz` | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| `abid` | ::mlir::IntegerAttr | 32-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | fixed-length vector of 32-bit signless integer values of length 4 |
| `b` | fixed-length vector of 32-bit signless integer values of length 8 |
| `c` | fixed-length vector of 32-bit signless integer values of length 16 |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | fixed-length vector of 32-bit signless integer values of length 16 |

rocdl.sqrt (ROCDL::ROCDLSqrt) 

Syntax:

operation ::= `rocdl.sqrt` $arg qualified(type($arg)) attr-dict `->` qualified(type($res))

Note: In the general case, prefer the conventional arith, math, or llvm ops over this one. Use this ROCDL-specific operation only when you fully understand its implications and it is strictly necessary. It is usually chosen when a small loss in precision is acceptable in exchange for higher execution speed.

Example:

%0 = rocdl.sqrt %a f32 -> f32
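To make the trade-off in the note concrete, the sketch below places the portable `math.sqrt` next to `rocdl.sqrt` on the same input. The function name is hypothetical, and the snippet assumes the `func` and `math` dialects are loaded alongside `rocdl`.

```mlir
// Hypothetical function contrasting the two forms on one input.
func.func @sqrt_variants(%x: f32) -> (f32, f32) {
  // Portable form, preferred in the general case.
  %portable = math.sqrt %x : f32
  // ROCDL-specific form; may trade some precision for speed.
  %fast = rocdl.sqrt %x f32 -> f32
  return %portable, %fast : f32, f32
}
```

Returning both values keeps either op from being trivially dead, which makes the two lowerings easy to compare side by side.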

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands:

| Operand | Description |
| --- | --- |
| `arg` | floating point LLVM type |

Results:

| Result | Description |
| --- | --- |
| `res` | floating point LLVM type |

rocdl.swmmac.bf16.16x16x32.bf16 (ROCDL::swmmac_bf16_16x16x32_bf16) 

Syntax:

operation ::= `rocdl.swmmac.bf16.16x16x32.bf16` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)

Operands:

| Operand | Description |
| --- | --- |
| `a` | LLVM dialect-compatible vector of integer |
| `b` | LLVM dialect-compatible vector of integer |
| `c` | LLVM dialect-compatible vector of integer |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | LLVM dialect-compatible vector of integer |

rocdl.swmmac.bf16.16x16x64.bf16 (ROCDL::swmmac_bf16_16x16x64_bf16) 

Syntax:

operation ::= `rocdl.swmmac.bf16.16x16x64.bf16` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `signA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `signB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
| `b` | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
| `c` | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |

rocdl.swmmac.bf16f32.16x16x64.bf16 (ROCDL::swmmac_bf16f32_16x16x64_bf16) 

Syntax:

operation ::= `rocdl.swmmac.bf16f32.16x16x64.bf16` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `signA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `signB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
| `b` | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
| `c` | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |

rocdl.swmmac.f16.16x16x128.bf8.bf8 (ROCDL::swmmac_f16_16x16x128_bf8_bf8) 

Syntax:

operation ::= `rocdl.swmmac.f16.16x16x128.bf8.bf8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

rocdl.swmmac.f16.16x16x128.bf8.fp8 (ROCDL::swmmac_f16_16x16x128_bf8_fp8) 

Syntax:

operation ::= `rocdl.swmmac.f16.16x16x128.bf8.fp8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

rocdl.swmmac.f16.16x16x128.fp8.bf8 (ROCDL::swmmac_f16_16x16x128_fp8_bf8) 

Syntax:

operation ::= `rocdl.swmmac.f16.16x16x128.fp8.bf8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

rocdl.swmmac.f16.16x16x128.fp8.fp8 (ROCDL::swmmac_f16_16x16x128_fp8_fp8) 

Syntax:

operation ::= `rocdl.swmmac.f16.16x16x128.fp8.fp8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

rocdl.swmmac.f16.16x16x32.f16 (ROCDL::swmmac_f16_16x16x32_f16) 

Syntax:

operation ::= `rocdl.swmmac.f16.16x16x32.f16` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
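No example accompanies this op, so the following is a sketch only. The tables below constrain the element types but not the vector lengths; the lengths used here (8 for `a`, 16 for `b`, 8 for `c`) are an assumption, derived from a 16x16x32 tile spread over a 32-lane wavefront with `a` stored at half density. The function and value names are hypothetical.

```mlir
// Hypothetical wrapper; vector lengths are assumed (32-lane wavefront).
func.func @swmmac_f16_16x16x32_f16(%a: vector<8xf16>, %b: vector<16xf16>,
                                   %c: vector<8xf16>, %idx: i32)
    -> vector<8xf16> {
  // %idx carries the sparsity index operand.
  %r = rocdl.swmmac.f16.16x16x32.f16 %a, %b, %c, %idx :
    (vector<8xf16>, vector<16xf16>, vector<8xf16>, i32) -> vector<8xf16>
  return %r : vector<8xf16>
}
```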

Operands:

| Operand | Description |
| --- | --- |
| `a` | LLVM dialect-compatible vector of 16-bit float |
| `b` | LLVM dialect-compatible vector of 16-bit float |
| `c` | LLVM dialect-compatible vector of 16-bit float |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | LLVM dialect-compatible vector of 16-bit float |

rocdl.swmmac.f16.16x16x64.f16 (ROCDL::swmmac_f16_16x16x64_f16) 

Syntax:

operation ::= `rocdl.swmmac.f16.16x16x64.f16` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `signA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `signB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
| `b` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
| `c` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

rocdl.swmmac.f32.16x16x128.bf8.bf8 (ROCDL::swmmac_f32_16x16x128_bf8_bf8) 

Syntax:

operation ::= `rocdl.swmmac.f32.16x16x128.bf8.bf8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

rocdl.swmmac.f32.16x16x128.bf8.fp8 (ROCDL::swmmac_f32_16x16x128_bf8_fp8) 

Syntax:

operation ::= `rocdl.swmmac.f32.16x16x128.bf8.fp8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

rocdl.swmmac.f32.16x16x128.fp8.bf8 (ROCDL::swmmac_f32_16x16x128_fp8_bf8) 

Syntax:

operation ::= `rocdl.swmmac.f32.16x16x128.fp8.bf8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

rocdl.swmmac.f32.16x16x128.fp8.fp8 (ROCDL::swmmac_f32_16x16x128_fp8_fp8) 

Syntax:

operation ::= `rocdl.swmmac.f32.16x16x128.fp8.fp8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

rocdl.swmmac.f32.16x16x32.bf16 (ROCDL::swmmac_f32_16x16x32_bf16) 

Syntax:

operation ::= `rocdl.swmmac.f32.16x16x32.bf16` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)

Operands:

| Operand | Description |
| --- | --- |
| `a` | LLVM dialect-compatible vector of integer |
| `b` | LLVM dialect-compatible vector of integer |
| `c` | LLVM dialect-compatible vector of 32-bit float |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | LLVM dialect-compatible vector of 32-bit float |

rocdl.swmmac.f32.16x16x32.bf8.bf8 (ROCDL::swmmac_f32_16x16x32_bf8_bf8) 

Syntax:

operation ::= `rocdl.swmmac.f32.16x16x32.bf8.bf8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

rocdl.swmmac.f32.16x16x32.bf8.fp8 (ROCDL::swmmac_f32_16x16x32_bf8_fp8) 

Syntax:

operation ::= `rocdl.swmmac.f32.16x16x32.bf8.fp8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

rocdl.swmmac.f32.16x16x32.f16 (ROCDL::swmmac_f32_16x16x32_f16) 

Syntax:

operation ::= `rocdl.swmmac.f32.16x16x32.f16` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
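The entry gives no example, so here is a hedged sketch of how the syntax above could be used. The element types come from the tables below; the vector lengths (8, 16, and 8) are an assumption for a 32-lane wavefront and are not specified by this page. The function name and SSA names are hypothetical.

```mlir
// Hypothetical wrapper; vector lengths are assumed (32-lane wavefront).
func.func @swmmac_f32_16x16x32_f16(%a: vector<8xf16>, %b: vector<16xf16>,
                                   %c: vector<8xf32>, %idx: i32)
    -> vector<8xf32> {
  // f16 inputs accumulate into an f32 vector.
  %r = rocdl.swmmac.f32.16x16x32.f16 %a, %b, %c, %idx :
    (vector<8xf16>, vector<16xf16>, vector<8xf32>, i32) -> vector<8xf32>
  return %r : vector<8xf32>
}
```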

Operands:

| Operand | Description |
| --- | --- |
| `a` | LLVM dialect-compatible vector of 16-bit float |
| `b` | LLVM dialect-compatible vector of 16-bit float |
| `c` | LLVM dialect-compatible vector of 32-bit float |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | LLVM dialect-compatible vector of 32-bit float |

rocdl.swmmac.f32.16x16x32.fp8.bf8 (ROCDL::swmmac_f32_16x16x32_fp8_bf8) 

Syntax:

operation ::= `rocdl.swmmac.f32.16x16x32.fp8.bf8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

rocdl.swmmac.f32.16x16x32.fp8.fp8 (ROCDL::swmmac_f32_16x16x32_fp8_fp8) 

Syntax:

operation ::= `rocdl.swmmac.f32.16x16x32.fp8.fp8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

rocdl.swmmac.f32.16x16x64.bf16 (ROCDL::swmmac_f32_16x16x64_bf16) 

Syntax:

operation ::= `rocdl.swmmac.f32.16x16x64.bf16` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `signA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `signB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
| `b` | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

rocdl.swmmac.f32.16x16x64.f16 (ROCDL::swmmac_f32_16x16x64_f16) 

Syntax:

operation ::= `rocdl.swmmac.f32.16x16x64.f16` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `signA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `signB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
| `b` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

rocdl.swmmac.i32.16x16x128.iu8 (ROCDL::swmmac_i32_16x16x128_iu8) 

Syntax:

operation ::= `rocdl.swmmac.i32.16x16x128.iu8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `signA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `signB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `clamp` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | integer or LLVM dialect-compatible vector of integer |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | integer or LLVM dialect-compatible vector of integer |

rocdl.swmmac.i32.16x16x32.iu4 (ROCDL::swmmac_i32_16x16x32_iu4) 

Syntax:

operation ::= `rocdl.swmmac.i32.16x16x32.iu4` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `signA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `signB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `clamp` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | integer or LLVM dialect-compatible vector of integer |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | integer or LLVM dialect-compatible vector of integer |

rocdl.swmmac.i32.16x16x32.iu8 (ROCDL::swmmac_i32_16x16x32_iu8) 

Syntax:

operation ::= `rocdl.swmmac.i32.16x16x32.iu8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `signA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `signB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `clamp` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | integer or LLVM dialect-compatible vector of integer |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | integer or LLVM dialect-compatible vector of integer |

rocdl.swmmac.i32.16x16x64.iu4 (ROCDL::swmmac_i32_16x16x64_iu4) 

Syntax:

operation ::= `rocdl.swmmac.i32.16x16x64.iu4` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)

Attributes:

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `signA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `signB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `clamp` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | integer or LLVM dialect-compatible vector of integer |
| `index` | 32-bit signless integer |

Results:

| Result | Description |
| --- | --- |
| `res` | integer or LLVM dialect-compatible vector of integer |

rocdl.tanh (ROCDL::ROCDLTanh) 

Syntax:

operation ::= `rocdl.tanh` $arg qualified(type($arg)) attr-dict `->` qualified(type($res))

Note: In the general case, prefer the conventional arith, math, or llvm ops over this one. Use this ROCDL-specific operation only when you fully understand its implications and it is strictly necessary. It is usually chosen when a small loss in precision is acceptable in exchange for higher execution speed.

Example:

%0 = rocdl.tanh %a f32 -> f32
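As an illustration of the choice the note describes, the sketch below computes the same value with the portable `math.tanh` and with `rocdl.tanh`. The function name is hypothetical, and the snippet assumes the `func` and `math` dialects are loaded alongside `rocdl`.

```mlir
// Hypothetical function contrasting the two forms on one input.
func.func @tanh_variants(%x: f32) -> (f32, f32) {
  // Portable form, preferred in the general case.
  %portable = math.tanh %x : f32
  // ROCDL-specific form; may trade some precision for speed.
  %fast = rocdl.tanh %x f32 -> f32
  return %portable, %fast : f32, f32
}
```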

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands:

| Operand | Description |
| --- | --- |
| `arg` | floating point LLVM type |

Results:

| Result | Description |
| --- | --- |
| `res` | floating point LLVM type |

rocdl.tensor.load.to.lds (ROCDL::TensorLoadToLDSOp) 

Base class for ROCDL tensor load/store to/from LDS.

Syntax:

operation ::= `rocdl.tensor.load.to.lds` attr-dict operands `cachepolicy` $cachePolicy `:` type($dgroup0) `,` type($dgroup1)

Moves tiles of tensor data between global memory and LDS. The tile is described by the $dgroup descriptors; five $dgroup descriptors allow movement of tensors with up to five dimensions. $cachePolicy describes the memory scope and gives a hint about expected data reuse.

This op is for gfx1250+ architectures.

Example:

// Tensor load from global memory to LDS using 4 descriptor groups.
rocdl.tensor.load.to.lds %dg0, %dg1, %dg2, %dg3 cachepolicy 0 : vector<4xi32>, vector<8xi32>

// Tensor store from LDS to global memory using 4 descriptor groups.
rocdl.tensor.store.from.lds %dg0, %dg1, %dg2, %dg3 cachepolicy 0 : vector<4xi32>, vector<8xi32>
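The Operands table below lists five descriptor groups, while the example above passes four. As a sketch only, and assuming the op accepts all five operands in table order, a load could be written as follows. The function name, value names, and the `cachepolicy` value 0 are placeholders.

```mlir
// Hypothetical wrapper; assumes the five-operand form from the Operands table.
func.func @tensor_load_five_groups(%dg0: vector<4xi32>, %dg1: vector<8xi32>,
                                   %dg2: vector<4xi32>, %dg3: vector<4xi32>,
                                   %dg4: vector<8xi32>) {
  // Only the types of the first two descriptor groups are spelled out,
  // matching the Syntax line for this op.
  rocdl.tensor.load.to.lds %dg0, %dg1, %dg2, %dg3, %dg4 cachepolicy 0 : vector<4xi32>, vector<8xi32>
  return
}
```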

Interfaces: AliasAnalysisOpInterface

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `cachePolicy` | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| `alias_scopes` | ::mlir::ArrayAttr | LLVM dialect alias scope array |
| `noalias_scopes` | ::mlir::ArrayAttr | LLVM dialect alias scope array |
| `tbaa` | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |

Operands:

| Operand | Description |
| --- | --- |
| `dgroup0` | fixed-length vector of 32-bit signless integer values of length 4 |
| `dgroup1` | fixed-length vector of 32-bit signless integer values of length 8 |
| `dgroup2` | fixed-length vector of 32-bit signless integer values of length 4 |
| `dgroup3` | fixed-length vector of 32-bit signless integer values of length 4 |
| `dgroup4` | fixed-length vector of 32-bit signless integer values of length 8 |

rocdl.tensor.store.from.lds (ROCDL::TensorStoreFromLDSOp) 

Tensor store from LDS to global memory.

Syntax:

operation ::= `rocdl.tensor.store.from.lds` attr-dict operands `cachepolicy` $cachePolicy `:` type($dgroup0) `,` type($dgroup1)

Moves tiles of tensor data from LDS to global memory. The tile is described by the $dgroup descriptors. Five $dgroup descriptors allow for movement of up to 5D tensors. $cachePolicy describes the memory scope and gives an indication of expected data reuse.

This op is for gfx1250+ architectures.

Example:

// Tensor load from global memory to LDS using 4 descriptor groups.
rocdl.tensor.load.to.lds %dg0, %dg1, %dg2, %dg3 cachepolicy 0 : vector<4xi32>, vector<8xi32>

// Tensor store from LDS to global memory using 4 descriptor groups.
rocdl.tensor.store.from.lds %dg0, %dg1, %dg2, %dg3 cachepolicy 0 : vector<4xi32>, vector<8xi32>

Interfaces: AliasAnalysisOpInterface

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `cachePolicy` | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| `alias_scopes` | ::mlir::ArrayAttr | LLVM dialect alias scope array |
| `noalias_scopes` | ::mlir::ArrayAttr | LLVM dialect alias scope array |
| `tbaa` | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |

Operands:

| Operand | Description |
| --- | --- |
| `dgroup0` | fixed-length vector of 32-bit signless integer values of length 4 |
| `dgroup1` | fixed-length vector of 32-bit signless integer values of length 8 |
| `dgroup2` | fixed-length vector of 32-bit signless integer values of length 4 |
| `dgroup3` | fixed-length vector of 32-bit signless integer values of length 4 |
| `dgroup4` | fixed-length vector of 32-bit signless integer values of length 8 |

rocdl.update.dpp (ROCDL::DPPUpdateOp) 

Syntax:

operation ::= `rocdl.update.dpp` attr-dict $old `,` $src `with` $dppCtrl `,` $rowMask `,` $bankMask `,` $boundCtrl `:` type($src)

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `dppCtrl` | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| `rowMask` | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| `bankMask` | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| `boundCtrl` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `old` | LLVM dialect-compatible type |
| `src` | LLVM dialect-compatible type |

Results:

| Result | Description |
| --- | --- |
| `res` | LLVM dialect-compatible type |
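
This entry has no example, so the following sketch is assembled directly from the assembly format under Syntax. The SSA names `%old` and `%src` are hypothetical, and the control values are illustrative assumptions: 273 (0x111) is assumed here to be the encoding of a one-lane row shift, and 15 is assumed to enable all rows and all banks. Consult the AMDGPU ISA documentation for the authoritative encodings.

```mlir
// Hypothetical operands: %src is the value to permute across lanes, and
// %old is assumed to supply the result for lanes that receive no data.
%res = rocdl.update.dpp %old, %src with 273, 15, 15, false : i32
```

The four values after `with` are, in order, `dppCtrl`, `rowMask`, `bankMask`, and `boundCtrl`, matching the attribute table above.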

rocdl.wait.asyncmark (ROCDL::WaitAsyncmarkOp) 

Wait until N or fewer async operation groups are unexecuted

Syntax:

operation ::= `rocdl.wait.asyncmark` $count attr-dict

This operation, along with rocdl.asyncmark, forms the compiler-provided framework for explicitly tracking asynchronous operations.

Once a wait.asyncmark operation has executed, every async operation belonging to an async group (as established by asyncmark in program order) has finished executing, except for those in the count most recently added groups.

For more detail, including on how this mechanism composes with function calls, see the LLVM documentation on async tracking.

Available on gfx9 and later.

Example:

// Wait until at most N async groups remain outstanding.
rocdl.wait.asyncmark 1

Usage example:

rocdl.tensor.load.to.lds ...
rocdl.global.async.load.to.lds ...

rocdl.asyncmark

rocdl.tensor.load.to.lds ...
rocdl.global.async.load.to.lds ...

rocdl.asyncmark

rocdl.wait.asyncmark 1 // First group of loads completes after this

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `count` | ::mlir::IntegerAttr | 16-bit signless integer attribute |
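
As a complement to the usage example above, the following sketch shows the boundary case of a zero count. It follows the description that only the `count` most recently added groups may remain unfinished; the elided operands (`...`) are left as in the original usage example.

```mlir
rocdl.global.async.load.to.lds ...

rocdl.asyncmark

// With a count of 0, no group is allowed to remain outstanding, so the
// load in the group closed above has finished once this op has executed.
rocdl.wait.asyncmark 0
```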

rocdl.wave.id (ROCDL::WaveId) 

Syntax:

operation ::= `rocdl.wave.id` (`range` $range^)? attr-dict `:` type($res)

Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.

Example:

// Read the wave id.
%0 = rocdl.wave.id : i32

// Read with a known range constraint.
%1 = rocdl.wave.id range <i32, 0, 64> : i32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `range` | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange |

A pair of two integers, mapping to the ConstantRange structure in LLVM IR, which is allowed to wrap or be empty.

The range represented is [Lower, Upper), and is either signed or unsigned depending on context.

lower and upper must have the same width.

Syntax:

`<` `i`(width($lower)) $lower `,` $upper `>`
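
The half-open, possibly wrapping range described above can be written out as a set. This is a sketch based on LLVM's ConstantRange conventions: $L$ and $U$ stand for lower and upper, comparisons are taken on unsigned values of the common bit width, and the equal-bounds case follows LLVM's convention of reserving it for the empty and full sets.

```latex
[L, U) =
\begin{cases}
\{\, x \mid L \le x < U \,\} & \text{if } L < U \\
\{\, x \mid x \ge L \ \text{or}\ x < U \,\} & \text{if } L > U \quad \text{(wrapped)} \\
\text{the empty set or the full set} & \text{if } L = U
\end{cases}
```

For instance, `<i32, 0, 64>` describes the values 0 through 63.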

Results: 

| Result | Description |
| --- | --- |
| `res` | LLVM dialect-compatible type |

rocdl.wavefrontsize (ROCDL::WavefrontSizeOp) 

Syntax:

operation ::= `rocdl.wavefrontsize` (`range` $range^)? attr-dict `:` type($res)

Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.

Example:

// Read the wavefront size.
%0 = rocdl.wavefrontsize : i32

// Read with a known range constraint.
%1 = rocdl.wavefrontsize range <i32, 32, 65> : i32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `range` | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange |

A pair of two integers, mapping to the ConstantRange structure in LLVM IR, which is allowed to wrap or be empty.

The range represented is [Lower, Upper), and is either signed or unsigned depending on context.

lower and upper must have the same width.

Syntax:

`<` `i`(width($lower)) $lower `,` $upper `>`

Results: 

| Result | Description |
| --- | --- |
| `res` | LLVM dialect-compatible type |

rocdl.wmma.bf16.16x16x16.bf16 (ROCDL::wmma_bf16_16x16x16_bf16) 

Syntax:

operation ::= `rocdl.wmma.bf16.16x16x16.bf16` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) with output operand selection.

Example:

// Example shared across the opsel WMMA variants; shown here for rocdl.wmma.f16.16x16x16.f16.
%r = rocdl.wmma.f16.16x16x16.f16 %a, %b, %c {opsel = false} :
  (vector<16xf16>, vector<16xf16>, vector<16xf16>) -> vector<16xf16>

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `opsel` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | integer or LLVM dialect-compatible vector of integer |

Results:

| Result | Description |
| --- | --- |
| `res` | integer or LLVM dialect-compatible vector of integer |

rocdl.wmma.bf16.16x16x32.bf16 (ROCDL::wmma_bf16_16x16x32_bf16) 

Syntax:

operation ::= `rocdl.wmma.bf16.16x16x32.bf16` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) with sign, modC, and reuse controls.

Example:

// Example shared across the sign/modC/reuse WMMA variants; shown here for rocdl.wmma.f32.16x16x32.f16.
%r = rocdl.wmma.f32.16x16x32.f16 %a, %b, %c :
  (vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `signA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `signB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `modC` | ::mlir::IntegerAttr | 16-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
| `b` | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
| `c` | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |

Results:

| Result | Description |
| --- | --- |
| `res` | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |

rocdl.wmma.bf16f32.16x16x32.bf16 (ROCDL::wmma_bf16f32_16x16x32_bf16) 

Syntax:

operation ::= `rocdl.wmma.bf16f32.16x16x32.bf16` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) with different C and D types.

Example:

// WMMA bf16 output from f32 accumulator with bf16 inputs.
%r = rocdl.wmma.bf16f32.16x16x32.bf16 %a, %b, %c :
  (vector<16xbf16>, vector<16xbf16>, vector<8xf32>) -> vector<16xbf16>

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `signA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `signB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `modC` | ::mlir::IntegerAttr | 16-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
| `b` | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |

rocdl.wmma.f16.16x16x128.bf8_bf8 (ROCDL::wmma_f16_16x16x128_bf8_bf8) 

Syntax:

operation ::= `rocdl.wmma.f16.16x16x128.bf8_bf8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.

Example:

// Example shared across the modC/reuse WMMA variants; shown here for rocdl.wmma.f32.16x16x64.fp8_fp8.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
  (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `modC` | ::mlir::IntegerAttr | 16-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

rocdl.wmma.f16.16x16x128.bf8_fp8 (ROCDL::wmma_f16_16x16x128_bf8_fp8) 

Syntax:

operation ::= `rocdl.wmma.f16.16x16x128.bf8_fp8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.

Example:

// Example shared across the modC/reuse WMMA variants; shown here for rocdl.wmma.f32.16x16x64.fp8_fp8.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
  (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `modC` | ::mlir::IntegerAttr | 16-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

rocdl.wmma.f16.16x16x128.fp8_bf8 (ROCDL::wmma_f16_16x16x128_fp8_bf8) 

Syntax:

operation ::= `rocdl.wmma.f16.16x16x128.fp8_bf8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.

Example:

// Example shared across the modC/reuse WMMA variants; shown here for rocdl.wmma.f32.16x16x64.fp8_fp8.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
  (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `modC` | ::mlir::IntegerAttr | 16-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

rocdl.wmma.f16.16x16x128.fp8_fp8 (ROCDL::wmma_f16_16x16x128_fp8_fp8) 

Syntax:

operation ::= `rocdl.wmma.f16.16x16x128.fp8_fp8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.

Example:

// Example shared across the modC/reuse WMMA variants; shown here for rocdl.wmma.f32.16x16x64.fp8_fp8.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
  (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `modC` | ::mlir::IntegerAttr | 16-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

rocdl.wmma.f16.16x16x16.f16 (ROCDL::wmma_f16_16x16x16_f16) 

Syntax:

operation ::= `rocdl.wmma.f16.16x16x16.f16` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) with output operand selection.

Example:

// WMMA f16 with opsel control.
%r = rocdl.wmma.f16.16x16x16.f16 %a, %b, %c {opsel = false} :
  (vector<16xf16>, vector<16xf16>, vector<16xf16>) -> vector<16xf16>

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `opsel` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
| `b` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
| `c` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

rocdl.wmma.f16.16x16x32.f16 (ROCDL::wmma_f16_16x16x32_f16) 

Syntax:

operation ::= `rocdl.wmma.f16.16x16x32.f16` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) with sign, modC, and reuse controls.

Example:

// Example shared across the sign/modC/reuse WMMA variants; shown here for rocdl.wmma.f32.16x16x32.f16.
%r = rocdl.wmma.f32.16x16x32.f16 %a, %b, %c :
  (vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `signA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `signB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `modC` | ::mlir::IntegerAttr | 16-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
| `b` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
| `c` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

rocdl.wmma.f16.16x16x64.bf8_bf8 (ROCDL::wmma_f16_16x16x64_bf8_bf8) 

Syntax:

operation ::= `rocdl.wmma.f16.16x16x64.bf8_bf8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.

Example:

// Example shared across the modC/reuse WMMA variants; shown here for rocdl.wmma.f32.16x16x64.fp8_fp8.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
  (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `modC` | ::mlir::IntegerAttr | 16-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

rocdl.wmma.f16.16x16x64.bf8_fp8 (ROCDL::wmma_f16_16x16x64_bf8_fp8) 

Syntax:

operation ::= `rocdl.wmma.f16.16x16x64.bf8_fp8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.

Example:

// Example shared across the modC/reuse WMMA variants; shown here for rocdl.wmma.f32.16x16x64.fp8_fp8.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
  (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `modC` | ::mlir::IntegerAttr | 16-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

rocdl.wmma.f16.16x16x64.fp8_bf8 (ROCDL::wmma_f16_16x16x64_fp8_bf8) 

Syntax:

operation ::= `rocdl.wmma.f16.16x16x64.fp8_bf8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.

Example:

// Example shared across the modC/reuse WMMA variants; shown here for rocdl.wmma.f32.16x16x64.fp8_fp8.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
  (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `modC` | ::mlir::IntegerAttr | 16-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

rocdl.wmma.f16.16x16x64.fp8_fp8 (ROCDL::wmma_f16_16x16x64_fp8_fp8) 

Syntax:

operation ::= `rocdl.wmma.f16.16x16x64.fp8_fp8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.

Example:

// Example shared across the modC/reuse WMMA variants; shown here for rocdl.wmma.f32.16x16x64.fp8_fp8.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
  (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `modC` | ::mlir::IntegerAttr | 16-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |

rocdl.wmma.f32.16x16x128.bf8_bf8 (ROCDL::wmma_f32_16x16x128_bf8_bf8) 

Syntax:

operation ::= `rocdl.wmma.f32.16x16x128.bf8_bf8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.

Example:

// Example shared across the modC/reuse WMMA variants; shown here for rocdl.wmma.f32.16x16x64.fp8_fp8.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
  (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `modC` | ::mlir::IntegerAttr | 16-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

rocdl.wmma.f32.16x16x128.bf8_fp8 (ROCDL::wmma_f32_16x16x128_bf8_fp8) 

Syntax:

operation ::= `rocdl.wmma.f32.16x16x128.bf8_fp8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.

Example:

// Example shared across the modC/reuse WMMA variants; shown here for rocdl.wmma.f32.16x16x64.fp8_fp8.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
  (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `modC` | ::mlir::IntegerAttr | 16-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

rocdl.wmma.f32.16x16x128.fp8_bf8 (ROCDL::wmma_f32_16x16x128_fp8_bf8) 

Syntax:

operation ::= `rocdl.wmma.f32.16x16x128.fp8_bf8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.

Example:

// Example shared across the modC/reuse WMMA variants; shown here for rocdl.wmma.f32.16x16x64.fp8_fp8.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
  (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `modC` | ::mlir::IntegerAttr | 16-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

rocdl.wmma.f32.16x16x128.fp8_fp8 (ROCDL::wmma_f32_16x16x128_fp8_fp8) 

Syntax:

operation ::= `rocdl.wmma.f32.16x16x128.fp8_fp8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.

Example:

// Example shared across the modC/reuse WMMA variants; shown here for rocdl.wmma.f32.16x16x64.fp8_fp8.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
  (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `modC` | ::mlir::IntegerAttr | 16-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

rocdl.wmma.f32.16x16x16.bf16 (ROCDL::wmma_f32_16x16x16_bf16) 

Syntax:

operation ::= `rocdl.wmma.f32.16x16x16.bf16` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) intrinsic.

Example:

// Example shared across the basic WMMA variants; shown here for rocdl.wmma.f32.16x16x16.f16 (f16 inputs, f32 accumulator).
%r = rocdl.wmma.f32.16x16x16.f16 %a, %b, %c :
  (vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>

Operands: 

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

rocdl.wmma.f32.16x16x16.bf8_bf8 (ROCDL::wmma_f32_16x16x16_bf8_bf8) 

Syntax:

operation ::= `rocdl.wmma.f32.16x16x16.bf8_bf8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) intrinsic.

Example:

// Example shared across the basic WMMA variants; shown here for rocdl.wmma.f32.16x16x16.f16 (f16 inputs, f32 accumulator).
%r = rocdl.wmma.f32.16x16x16.f16 %a, %b, %c :
  (vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>

Operands: 

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

rocdl.wmma.f32.16x16x16.bf8_fp8 (ROCDL::wmma_f32_16x16x16_bf8_fp8) 

Syntax:

operation ::= `rocdl.wmma.f32.16x16x16.bf8_fp8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) intrinsic.

Example:

// Example shared across the basic WMMA variants; shown here for rocdl.wmma.f32.16x16x16.f16 (f16 inputs, f32 accumulator).
%r = rocdl.wmma.f32.16x16x16.f16 %a, %b, %c :
  (vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>

Operands: 

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

rocdl.wmma.f32.16x16x16.f16 (ROCDL::wmma_f32_16x16x16_f16) 

Syntax:

operation ::= `rocdl.wmma.f32.16x16x16.f16` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) intrinsic.

Example:

// WMMA with f16 inputs and f32 accumulator.
%r = rocdl.wmma.f32.16x16x16.f16 %a, %b, %c :
  (vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>

Operands: 

| Operand | Description |
| --- | --- |
| `a` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
| `b` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
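
As a reading aid, the multiply-accumulate performed by the WMMA ops can be written out element-wise. This is a sketch under the assumption that the `16x16x16` suffix in the mnemonic denotes $M \times N \times K$, so that $A$ is $M \times K$, $B$ is $K \times N$, and both the accumulator $C$ and the result $D$ are $M \times N$. How matrix elements are distributed across the lanes' vector registers is hardware-defined and is not shown.

```latex
D_{ij} = C_{ij} + \sum_{k=0}^{K-1} A_{ik} B_{kj},
\qquad 0 \le i < M,\; 0 \le j < N
```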

rocdl.wmma.f32.16x16x16.fp8_bf8 (ROCDL::wmma_f32_16x16x16_fp8_bf8) 

Syntax:

operation ::= `rocdl.wmma.f32.16x16x16.fp8_bf8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) intrinsic.

Example:

// Example shared across the basic WMMA variants; shown here for rocdl.wmma.f32.16x16x16.f16 (f16 inputs, f32 accumulator).
%r = rocdl.wmma.f32.16x16x16.f16 %a, %b, %c :
  (vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>

Operands: 

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

rocdl.wmma.f32.16x16x16.fp8_fp8 (ROCDL::wmma_f32_16x16x16_fp8_fp8) 

Syntax:

operation ::= `rocdl.wmma.f32.16x16x16.fp8_fp8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) intrinsic.

Example:

// Example shared across the basic WMMA variants; shown here for rocdl.wmma.f32.16x16x16.f16 (f16 inputs, f32 accumulator).
%r = rocdl.wmma.f32.16x16x16.f16 %a, %b, %c :
  (vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>

Operands: 

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

rocdl.wmma.f32.16x16x32.bf16 (ROCDL::wmma_f32_16x16x32_bf16) 

Syntax:

operation ::= `rocdl.wmma.f32.16x16x32.bf16` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) with sign, modC, and reuse controls.

Example:

// Example shared across the sign/modC/reuse WMMA variants; shown here for rocdl.wmma.f32.16x16x32.f16.
%r = rocdl.wmma.f32.16x16x32.f16 %a, %b, %c :
  (vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `signA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `signB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `modC` | ::mlir::IntegerAttr | 16-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
| `b` | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

rocdl.wmma.f32.16x16x32.f16 (ROCDL::wmma_f32_16x16x32_f16) 

Syntax:

operation ::= `rocdl.wmma.f32.16x16x32.f16` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) with sign, modC, and reuse controls.

Example:

// WMMA f32 with f16 inputs and reuse controls.
%r = rocdl.wmma.f32.16x16x32.f16 %a, %b, %c :
  (vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `signA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `signB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `modC` | ::mlir::IntegerAttr | 16-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
| `b` | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
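
The example for this op leaves its attributes out, so the following sketch spells them in the attribute dictionary that the assembly format places after the operands. The attribute values are hypothetical and chosen only to show the syntax; the operand and result types are copied from the example above, and the meaning of each control is defined by the underlying intrinsic rather than by this snippet.

```mlir
// Hypothetical attribute values; 1-bit attributes are written as booleans,
// and the 16-bit modC attribute carries an explicit i16 type.
%r = rocdl.wmma.f32.16x16x32.f16 %a, %b, %c
  {signA = false, signB = false, modC = 0 : i16, reuseA = true, reuseB = false} :
  (vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>
```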

rocdl.wmma.f32.16x16x4.f32 (ROCDL::wmma_f32_16x16x4_f32) 

Syntax:

operation ::= `rocdl.wmma.f32.16x16x4.f32` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) with sign, modC, and reuse controls.

Example:

// Example shared across the sign/modC/reuse WMMA variants; shown here for rocdl.wmma.f32.16x16x32.f16.
%r = rocdl.wmma.f32.16x16x32.f16 %a, %b, %c :
  (vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `signA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `signB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `modC` | ::mlir::IntegerAttr | 16-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
| `b` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

rocdl.wmma.f32.16x16x64.bf8_bf8 (ROCDL::wmma_f32_16x16x64_bf8_bf8) 

Syntax:

operation ::= `rocdl.wmma.f32.16x16x64.bf8_bf8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.

Example:

// Example shared across the modC/reuse WMMA variants; shown here for rocdl.wmma.f32.16x16x64.fp8_fp8.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
  (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `modC` | ::mlir::IntegerAttr | 16-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

rocdl.wmma.f32.16x16x64.bf8_fp8 (ROCDL::wmma_f32_16x16x64_bf8_fp8) 

Syntax:

operation ::= `rocdl.wmma.f32.16x16x64.bf8_fp8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.

Example:

// Example shared across the modC/reuse WMMA variants; shown here for rocdl.wmma.f32.16x16x64.fp8_fp8.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
  (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>

Attributes: 

| Attribute | MLIR Type | Description |
| --- | --- | --- |
| `modC` | ::mlir::IntegerAttr | 16-bit signless integer attribute |
| `reuseA` | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| `reuseB` | ::mlir::IntegerAttr | 1-bit signless integer attribute |

Operands:

| Operand | Description |
| --- | --- |
| `a` | integer or LLVM dialect-compatible vector of integer |
| `b` | integer or LLVM dialect-compatible vector of integer |
| `c` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

Results:

| Result | Description |
| --- | --- |
| `res` | 32-bit float or LLVM dialect-compatible vector of 32-bit float |

rocdl.wmma.f32.16x16x64.fp8_bf8 (ROCDL::wmma_f32_16x16x64_fp8_bf8) 

Syntax:

operation ::= `rocdl.wmma.f32.16x16x64.fp8_bf8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.

Example:

// WMMA f32 with mixed fp8/bf8 inputs and modC/reuse controls.
%r = rocdl.wmma.f32.16x16x64.fp8_bf8 %a, %b, %c :
  (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>

Attributes: 

Attribute | MLIR Type | Description
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute

Operands: 

Operand | Description
a | integer or LLVM dialect-compatible vector of integer
b | integer or LLVM dialect-compatible vector of integer
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float

Results: 

Result | Description
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float

rocdl.wmma.f32.16x16x64.fp8_fp8 (ROCDL::wmma_f32_16x16x64_fp8_fp8) 

Syntax:

operation ::= `rocdl.wmma.f32.16x16x64.fp8_fp8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.

Example:

// WMMA f32 with fp8 inputs and modC/reuse controls.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
  (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>

Attributes: 

Attribute | MLIR Type | Description
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute

Operands: 

Operand | Description
a | integer or LLVM dialect-compatible vector of integer
b | integer or LLVM dialect-compatible vector of integer
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float

Results: 

Result | Description
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float
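
The attribute table above lists modC, reuseA, and reuseB, and the assembly format places them in the attribute dictionary. As a minimal sketch, the following wraps the op in a function and spells the attributes out explicitly. The function name @wmma_fp8_example and the attribute values are placeholders chosen for illustration, and the vector types are copied from the example above rather than independently verified.

```mlir
// Hypothetical wrapper function; only the rocdl op itself comes from this page.
llvm.func @wmma_fp8_example(%a: vector<16xi32>, %b: vector<16xi32>,
                            %c: vector<8xf32>) -> vector<8xf32> {
  // modC is a 16-bit integer attribute; reuseA/reuseB are 1-bit attributes.
  // The values below are placeholders, not recommended settings.
  %r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c
    {modC = 0 : i16, reuseA = false, reuseB = false} :
    (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
  llvm.return %r : vector<8xf32>
}
```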

rocdl.wmma.i32.16x16x16.iu4 (ROCDL::wmma_i32_16x16x16_iu4) 

Syntax:

operation ::= `rocdl.wmma.i32.16x16x16.iu4` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) for integer types with sign and clamp control.

Example:

// WMMA i32 with unsigned i8 inputs, shown for the sibling rocdl.wmma.i32.16x16x16.iu8 op, which takes the same attributes.
%r = rocdl.wmma.i32.16x16x16.iu8 %a, %b, %c
  {signA = false, signB = false, clamp = false} :
  (vector<4xi32>, vector<4xi32>, vector<8xi32>) -> vector<8xi32>

Attributes: 

Attribute | MLIR Type | Description
signA | ::mlir::IntegerAttr | 1-bit signless integer attribute
signB | ::mlir::IntegerAttr | 1-bit signless integer attribute
clamp | ::mlir::IntegerAttr | 1-bit signless integer attribute

Operands: 

Operand | Description
a | integer or LLVM dialect-compatible vector of integer
b | integer or LLVM dialect-compatible vector of integer
c | integer or LLVM dialect-compatible vector of integer

Results: 

Result | Description
res | integer or LLVM dialect-compatible vector of integer

rocdl.wmma.i32.16x16x16.iu8 (ROCDL::wmma_i32_16x16x16_iu8) 

Syntax:

operation ::= `rocdl.wmma.i32.16x16x16.iu8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) for integer types with sign and clamp control.

Example:

// WMMA i32 with unsigned i8 inputs.
%r = rocdl.wmma.i32.16x16x16.iu8 %a, %b, %c
  {signA = false, signB = false, clamp = false} :
  (vector<4xi32>, vector<4xi32>, vector<8xi32>) -> vector<8xi32>

Attributes: 

Attribute | MLIR Type | Description
signA | ::mlir::IntegerAttr | 1-bit signless integer attribute
signB | ::mlir::IntegerAttr | 1-bit signless integer attribute
clamp | ::mlir::IntegerAttr | 1-bit signless integer attribute

Operands: 

Operand | Description
a | integer or LLVM dialect-compatible vector of integer
b | integer or LLVM dialect-compatible vector of integer
c | integer or LLVM dialect-compatible vector of integer

Results: 

Result | Description
res | integer or LLVM dialect-compatible vector of integer

rocdl.wmma.i32.16x16x32.iu4 (ROCDL::wmma_i32_16x16x32_iu4) 

Syntax:

operation ::= `rocdl.wmma.i32.16x16x32.iu4` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) for integer types with sign and clamp control.

Example:

// WMMA i32 with unsigned i8 inputs, shown for the related rocdl.wmma.i32.16x16x16.iu8 op, which takes the same attributes.
%r = rocdl.wmma.i32.16x16x16.iu8 %a, %b, %c
  {signA = false, signB = false, clamp = false} :
  (vector<4xi32>, vector<4xi32>, vector<8xi32>) -> vector<8xi32>

Attributes: 

Attribute | MLIR Type | Description
signA | ::mlir::IntegerAttr | 1-bit signless integer attribute
signB | ::mlir::IntegerAttr | 1-bit signless integer attribute
clamp | ::mlir::IntegerAttr | 1-bit signless integer attribute

Operands: 

Operand | Description
a | integer or LLVM dialect-compatible vector of integer
b | integer or LLVM dialect-compatible vector of integer
c | integer or LLVM dialect-compatible vector of integer

Results: 

Result | Description
res | integer or LLVM dialect-compatible vector of integer

rocdl.wmma.i32.16x16x64.iu8 (ROCDL::wmma_i32_16x16x64_iu8) 

Syntax:

operation ::= `rocdl.wmma.i32.16x16x64.iu8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)

Wave Matrix Multiply-Accumulate (WMMA) for integer types with sign, reuse, and clamp controls.

Example:

// WMMA i32 with unsigned i8 inputs and reuse controls.
%r = rocdl.wmma.i32.16x16x64.iu8 %a, %b, %c
  {signA = false, signB = false, reuseA = false, reuseB = false, clamp = false} :
  (vector<8xi32>, vector<8xi32>, vector<8xi32>) -> vector<8xi32>

Attributes: 

Attribute | MLIR Type | Description
signA | ::mlir::IntegerAttr | 1-bit signless integer attribute
signB | ::mlir::IntegerAttr | 1-bit signless integer attribute
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute
clamp | ::mlir::IntegerAttr | 1-bit signless integer attribute

Operands: 

Operand | Description
a | integer or LLVM dialect-compatible vector of integer
b | integer or LLVM dialect-compatible vector of integer
c | integer or LLVM dialect-compatible vector of integer

Results: 

Result | Description
res | integer or LLVM dialect-compatible vector of integer
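
The example above sets every control attribute to false. As a complementary sketch, the variant below sets signA, signB, and clamp to true, which presumably selects signed inputs and enables result clamping (an inference from the "unsigned" comment on the false case, not something this page states). The function name @wmma_iu8_signed is hypothetical, and the vector types are copied from the example above.

```mlir
// Hypothetical wrapper function around the documented op.
llvm.func @wmma_iu8_signed(%a: vector<8xi32>, %b: vector<8xi32>,
                           %c: vector<8xi32>) -> vector<8xi32> {
  // All five attributes are 1-bit integers, written here as booleans.
  %r = rocdl.wmma.i32.16x16x64.iu8 %a, %b, %c
    {signA = true, signB = true, reuseA = false, reuseB = false, clamp = true} :
    (vector<8xi32>, vector<8xi32>, vector<8xi32>) -> vector<8xi32>
  llvm.return %r : vector<8xi32>
}
```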

rocdl.wmma.scale.f32.16x16x128.f8f6f4 (ROCDL::wmma_scale_f32_16x16x128_f8f6f4) 

Syntax:

operation ::= `rocdl.wmma.scale.f32.16x16x128.f8f6f4` $a `,` $b `,` $c `,` $scaleA `,` $scaleB attr-dict `:` functional-type(operands, $res)

Scaled Wave Matrix Multiply-Accumulate (WMMA) with per-operand scaling.

Example:

// Scaled WMMA with f8f6f4 format inputs.
%r = rocdl.wmma.scale.f32.16x16x128.f8f6f4 %a, %b, %c, %scaleA, %scaleB :
  (vector<16xi32>, vector<16xi32>, vector<8xf32>, i32, i32) -> vector<8xf32>

Attributes: 

Attribute | MLIR Type | Description
fmtA | ::mlir::IntegerAttr | 32-bit signless integer attribute
fmtB | ::mlir::IntegerAttr | 32-bit signless integer attribute
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute
scaleAType | ::mlir::IntegerAttr | 32-bit signless integer attribute
fmtScaleA | ::mlir::IntegerAttr | 32-bit signless integer attribute
scaleBType | ::mlir::IntegerAttr | 32-bit signless integer attribute
fmtScaleB | ::mlir::IntegerAttr | 32-bit signless integer attribute
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute

Operands: 

Operand | Description
a | integer or LLVM dialect-compatible vector of integer
b | integer or LLVM dialect-compatible vector of integer
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float
scaleA | 32-bit signless integer
scaleB | 32-bit signless integer

Results: 

Result | Description
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float

rocdl.wmma.scale.f32.32x16x128.f4 (ROCDL::wmma_scale_f32_32x16x128_f4) 

Syntax:

operation ::= `rocdl.wmma.scale.f32.32x16x128.f4` $a `,` $b `,` $c `,` $scaleA `,` $scaleB attr-dict `:` functional-type(operands, $res)

Scaled Wave Matrix Multiply-Accumulate (WMMA) for F4 format inputs.

Example:

// Scaled WMMA with f4 format inputs (32x16x128 tile).
%r = rocdl.wmma.scale.f32.32x16x128.f4 %a, %b, %c, %scaleA, %scaleB :
  (vector<16xi32>, vector<8xi32>, vector<16xf32>, i32, i32) -> vector<16xf32>

Attributes: 

Attribute | MLIR Type | Description
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute
scaleAType | ::mlir::IntegerAttr | 32-bit signless integer attribute
fmtScaleA | ::mlir::IntegerAttr | 32-bit signless integer attribute
scaleBType | ::mlir::IntegerAttr | 32-bit signless integer attribute
fmtScaleB | ::mlir::IntegerAttr | 32-bit signless integer attribute
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute

Operands: 

Operand | Description
a | integer or LLVM dialect-compatible vector of integer
b | integer or LLVM dialect-compatible vector of integer
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float
scaleA | 32-bit signless integer
scaleB | 32-bit signless integer

Results: 

Result | Description
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float

rocdl.wmma.scale16.f32.16x16x128.f8f6f4 (ROCDL::wmma_scale16_f32_16x16x128_f8f6f4) 

Syntax:

operation ::= `rocdl.wmma.scale16.f32.16x16x128.f8f6f4` $a `,` $b `,` $c `,` $scaleA `,` $scaleB attr-dict `:` functional-type(operands, $res)

Scaled Wave Matrix Multiply-Accumulate (WMMA) with per-operand scaling.

Example:

// Scaled WMMA with f8f6f4 format inputs and 64-bit scale operands.
%r = rocdl.wmma.scale16.f32.16x16x128.f8f6f4 %a, %b, %c, %scaleA, %scaleB :
  (vector<16xi32>, vector<16xi32>, vector<8xf32>, i64, i64) -> vector<8xf32>

Attributes: 

Attribute | MLIR Type | Description
fmtA | ::mlir::IntegerAttr | 32-bit signless integer attribute
fmtB | ::mlir::IntegerAttr | 32-bit signless integer attribute
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute
scaleAType | ::mlir::IntegerAttr | 32-bit signless integer attribute
fmtScaleA | ::mlir::IntegerAttr | 32-bit signless integer attribute
scaleBType | ::mlir::IntegerAttr | 32-bit signless integer attribute
fmtScaleB | ::mlir::IntegerAttr | 32-bit signless integer attribute
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute

Operands: 

Operand | Description
a | integer or LLVM dialect-compatible vector of integer
b | integer or LLVM dialect-compatible vector of integer
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float
scaleA | 64-bit signless integer
scaleB | 64-bit signless integer

Results: 

Result | Description
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float

rocdl.wmma.scale16.f32.32x16x128.f4 (ROCDL::wmma_scale16_f32_32x16x128_f4) 

Syntax:

operation ::= `rocdl.wmma.scale16.f32.32x16x128.f4` $a `,` $b `,` $c `,` $scaleA `,` $scaleB attr-dict `:` functional-type(operands, $res)

Scaled Wave Matrix Multiply-Accumulate (WMMA) for F4 format inputs.

Example:

// Scaled WMMA with f4 format inputs (32x16x128 tile) and 64-bit scale operands.
%r = rocdl.wmma.scale16.f32.32x16x128.f4 %a, %b, %c, %scaleA, %scaleB :
  (vector<16xi32>, vector<8xi32>, vector<16xf32>, i64, i64) -> vector<16xf32>

Attributes: 

Attribute | MLIR Type | Description
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute
scaleAType | ::mlir::IntegerAttr | 32-bit signless integer attribute
fmtScaleA | ::mlir::IntegerAttr | 32-bit signless integer attribute
scaleBType | ::mlir::IntegerAttr | 32-bit signless integer attribute
fmtScaleB | ::mlir::IntegerAttr | 32-bit signless integer attribute
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute

Operands: 

Operand | Description
a | integer or LLVM dialect-compatible vector of integer
b | integer or LLVM dialect-compatible vector of integer
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float
scaleA | 64-bit signless integer
scaleB | 64-bit signless integer

Results: 

Result | Description
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float

rocdl.workgroup.dim.x (ROCDL::BlockDimXOp) 

Syntax:

operation ::= `rocdl.workgroup.dim.x` (`range` $range^)? attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute | MLIR Type | Description
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange

A pair of two integers, mapping to the ConstantRange structure in LLVM IR, which is allowed to wrap or be empty.

The range represented is [Lower, Upper), and is either signed or unsigned depending on context.

lower and upper must have the same width.

Syntax:

`<` `i`(width($lower)) $lower `,` $upper `>`

Results: 

Result | Description
res | LLVM dialect-compatible type
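
This op has no example on this page, so the following is a minimal sketch of both forms allowed by the syntax above. The function name @block_dim_x and the upper bound of 1025 are assumptions chosen for illustration. Because the range is half-open, [1, 1025) stands for the values 1 through 1024.

```mlir
// Hypothetical wrapper function; the bound 1025 is an assumed launch limit.
llvm.func @block_dim_x() -> i32 {
  // Read the workgroup size in the x dimension with no range information.
  %0 = rocdl.workgroup.dim.x : i32

  // Read it again, asserting the half-open range [1, 1025).
  %1 = rocdl.workgroup.dim.x range <i32, 1, 1025> : i32
  llvm.return %1 : i32
}
```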

rocdl.workgroup.dim.y (ROCDL::BlockDimYOp) 

Syntax:

operation ::= `rocdl.workgroup.dim.y` (`range` $range^)? attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute | MLIR Type | Description
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange

A pair of two integers, mapping to the ConstantRange structure in LLVM IR, which is allowed to wrap or be empty.

The range represented is [Lower, Upper), and is either signed or unsigned depending on context.

lower and upper must have the same width.

Syntax:

`<` `i`(width($lower)) $lower `,` $upper `>`

Results: 

Result | Description
res | LLVM dialect-compatible type

rocdl.workgroup.dim.z (ROCDL::BlockDimZOp) 

Syntax:

operation ::= `rocdl.workgroup.dim.z` (`range` $range^)? attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute | MLIR Type | Description
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange

A pair of two integers, mapping to the ConstantRange structure in LLVM IR, which is allowed to wrap or be empty.

The range represented is [Lower, Upper), and is either signed or unsigned depending on context.

lower and upper must have the same width.

Syntax:

`<` `i`(width($lower)) $lower `,` $upper `>`

Results: 

Result | Description
res | LLVM dialect-compatible type

rocdl.workgroup.id.x (ROCDL::BlockIdXOp) 

Syntax:

operation ::= `rocdl.workgroup.id.x` (`range` $range^)? attr-dict `:` type($res)

Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.

Example:

// Read the workgroup id in the x dimension.
%0 = rocdl.workgroup.id.x : i32

// Read with a known range constraint.
%1 = rocdl.workgroup.id.x range <i32, 0, 64> : i32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute | MLIR Type | Description
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange

A pair of two integers, mapping to the ConstantRange structure in LLVM IR, which is allowed to wrap or be empty.

The range represented is [Lower, Upper), and is either signed or unsigned depending on context.

lower and upper must have the same width.

Syntax:

`<` `i`(width($lower)) $lower `,` $upper `>`

Results: 

Result | Description
res | LLVM dialect-compatible type

rocdl.workgroup.id.y (ROCDL::BlockIdYOp) 

Syntax:

operation ::= `rocdl.workgroup.id.y` (`range` $range^)? attr-dict `:` type($res)

Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.

Example:

// Read the workgroup id in the y dimension.
%0 = rocdl.workgroup.id.y : i32

// Read with a known range constraint.
%1 = rocdl.workgroup.id.y range <i32, 0, 64> : i32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute | MLIR Type | Description
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange

A pair of two integers, mapping to the ConstantRange structure in LLVM IR, which is allowed to wrap or be empty.

The range represented is [Lower, Upper), and is either signed or unsigned depending on context.

lower and upper must have the same width.

Syntax:

`<` `i`(width($lower)) $lower `,` $upper `>`

Results: 

Result | Description
res | LLVM dialect-compatible type

rocdl.workgroup.id.z (ROCDL::BlockIdZOp) 

Syntax:

operation ::= `rocdl.workgroup.id.z` (`range` $range^)? attr-dict `:` type($res)

Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.

Example:

// Read the workgroup id in the z dimension.
%0 = rocdl.workgroup.id.z : i32

// Read with a known range constraint.
%1 = rocdl.workgroup.id.z range <i32, 0, 64> : i32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute | MLIR Type | Description
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange

A pair of two integers, mapping to the ConstantRange structure in LLVM IR, which is allowed to wrap or be empty.

The range represented is [Lower, Upper), and is either signed or unsigned depending on context.

lower and upper must have the same width.

Syntax:

`<` `i`(width($lower)) $lower `,` $upper `>`

Results: 

Result | Description
res | LLVM dialect-compatible type

rocdl.workitem.id.x (ROCDL::ThreadIdXOp) 

Syntax:

operation ::= `rocdl.workitem.id.x` (`range` $range^)? attr-dict `:` type($res)

Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.

Example:

// Read the workitem id in the x dimension.
%0 = rocdl.workitem.id.x : i32

// Read with a known range constraint.
%1 = rocdl.workitem.id.x range <i32, 0, 64> : i32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute | MLIR Type | Description
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange

A pair of two integers, mapping to the ConstantRange structure in LLVM IR, which is allowed to wrap or be empty.

The range represented is [Lower, Upper), and is either signed or unsigned depending on context.

lower and upper must have the same width.

Syntax:

`<` `i`(width($lower)) $lower `,` $upper `>`

Results: 

Result | Description
res | LLVM dialect-compatible type

rocdl.workitem.id.y (ROCDL::ThreadIdYOp) 

Syntax:

operation ::= `rocdl.workitem.id.y` (`range` $range^)? attr-dict `:` type($res)

Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.

Example:

// Read the workitem id in the y dimension.
%0 = rocdl.workitem.id.y : i32

// Read with a known range constraint.
%1 = rocdl.workitem.id.y range <i32, 0, 64> : i32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute | MLIR Type | Description
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange

A pair of two integers, mapping to the ConstantRange structure in LLVM IR, which is allowed to wrap or be empty.

The range represented is [Lower, Upper), and is either signed or unsigned depending on context.

lower and upper must have the same width.

Syntax:

`<` `i`(width($lower)) $lower `,` $upper `>`

Results: 

Result | Description
res | LLVM dialect-compatible type

rocdl.workitem.id.z (ROCDL::ThreadIdZOp) 

Syntax:

operation ::= `rocdl.workitem.id.z` (`range` $range^)? attr-dict `:` type($res)

Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.

Example:

// Read the workitem id in the z dimension.
%0 = rocdl.workitem.id.z : i32

// Read with a known range constraint.
%1 = rocdl.workitem.id.z range <i32, 0, 64> : i32

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

Attribute | MLIR Type | Description
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange

A pair of two integers, mapping to the ConstantRange structure in LLVM IR, which is allowed to wrap or be empty.

The range represented is [Lower, Upper), and is either signed or unsigned depending on context.

lower and upper must have the same width.

Syntax:

`<` `i`(width($lower)) $lower `,` $upper `>`

Results: 

Result | Description
res | LLVM dialect-compatible type
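
The workitem id, workgroup id, and workgroup dim ops are typically combined to form a flat global index. The sketch below shows one common way to do that in the x dimension using LLVM dialect arithmetic. The function name @global_id_x and the range of 64 on the workitem id are assumptions for illustration, not values prescribed by the dialect.

```mlir
// Hypothetical helper computing workgroup_id * workgroup_dim + workitem_id.
llvm.func @global_id_x() -> i32 {
  // Position of this workitem inside its workgroup, assumed to be in [0, 64).
  %tid = rocdl.workitem.id.x range <i32, 0, 64> : i32
  // Index of the workgroup within the grid.
  %bid = rocdl.workgroup.id.x : i32
  // Number of workitems per workgroup in the x dimension.
  %bdim = rocdl.workgroup.dim.x : i32

  %base = llvm.mul %bid, %bdim : i32
  %gid = llvm.add %base, %tid : i32
  llvm.return %gid : i32
}
```

All three reads are side-effect free per the traits listed above, so they can be hoisted or deduplicated freely by later passes.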

Attributes 

ROCDLTargetAttr 

Syntax:

#rocdl.target<
  int,   # O
  ::llvm::StringRef,   # triple
  ::llvm::StringRef,   # chip
  ::llvm::StringRef,   # features
  ::llvm::StringRef,   # abi
  DictionaryAttr,   # flags
  ArrayAttr   # link
>

ROCDL target attribute for controlling compilation of AMDGPU targets. All parameters decay into default values if not present.

Examples:

  1. Target with default values.
  gpu.module @mymodule [#rocdl.target] attributes {...} {
    ...
  }
  2. Target with gfx90a chip and fast math.
  gpu.module @mymodule [#rocdl.target<chip = "gfx90a", flags = {fast, no_wave64}>] {
    ...
  }

Parameters: 

Parameter | C++ type | Description
O | int | Optimization level to apply.
triple | ::llvm::StringRef | Target triple.
chip | ::llvm::StringRef | Target chip.
features | ::llvm::StringRef | Target chip features.
abi | ::llvm::StringRef | ABI version.
flags | DictionaryAttr | Target specific flags.
link | ArrayAttr | Files to link to the LLVM module.
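
As a further sketch of the parameter list above, the following attaches a target that sets the optimization level, triple, and chip explicitly and leaves the remaining parameters at their defaults. The module name @kernels is hypothetical, and the specific values are assumptions chosen for illustration rather than recommended settings.

```mlir
// Hypothetical module; O, triple, and chip are named parameters from the table above.
gpu.module @kernels [#rocdl.target<O = 3, triple = "amdgcn-amd-amdhsa", chip = "gfx90a">] {
  // Kernels to be compiled for the selected AMDGPU target would go here.
}
```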