'rocdl' Dialect
Dialect for wrapping LLVM AMDGPU backend intrinsics and attributes
The ROCDL dialect, like the other platform-specific LLVM dialects, serves as the location of wrappers around the AMD-specific intrinsics and attributes in LLVM.
This dialect, like other GPU lowering targets, also contains the infrastructure used by the built-in compilation/offloading framework to compile AMD-specific LLVM IR into binaries.
Dialect inclusion criteria and guidelines
The operations in this dialect are 1:1 wrappers around their corresponding LLVM intrinsics. Operations that do not correspond to intrinsics should not be placed in this dialect.
The definition of a ROCDL op should match its LLVM counterpart. If the
argument and result types are fixed, they should be specified as type
constraints, including by overriding the default variadic type on LLVM
intrinsics with a let results in the operation definition.
LLVM attributes do not need to be replicated exactly if it wouldn’t be easy to do so, but pure operations and ones that read/write memory should be annotated as such.
While LLVM intrinsics currently don’t allow constraining the values an
any_type can take, it is acceptable (but not required) to impose such
constraints if they are known.
When an LLVM intrinsic uses an immarg, this corresponds to an attribute
in MLIR.
Human-readable assembly formats (those that, for example, explicitly indicate
parameter names) may be used, and are encouraged for intrinsics that have
complex argument schemes and don’t have any higher-level wrapper (such as
in the amdgpu dialect).
While not all existing operations follow this convention, new operations should
generally provide argument and result types except in cases where they are
clearly redundant (such as with operations like rocdl.fmed3, which doesn’t
need to reiterate the single type at issue multiple times). This convention
enhances the readability of low-level IR and prevents programmers from needing
to find non-local type information.
Dialect-defined discardable attributes (any attribute starting with rocdl.
that has special handling) need to correspond to AMD-specific attributes, metadata,
or other entities (such as calling conventions) in LLVM, or be needed for
GPU compilation management. Outside of the compilation infrastructure,
dialect-specific enums or attributes are extremely unlikely to be needed
and should be avoided.
Operation documentation should specify when the operation was introduced
(if relevant) and include usage examples. Operations should have
parser/printer tests in mlir/test/Dialect/LLVMIR/rocdl.mlir and
lowering tests in mlir/test/Target/LLVMIR/rocdl.mlir.
General documentation (What does this op do?)
While rocdl ops sometimes carry their own documentation, there is no expectation that such documentation will exist (or be kept up to date).
Since ROCDL operations correspond to LLVM intrinsics, the semantics and behavior of these operations can be determined by investigating the documentation for the corresponding intrinsic. This documentation can be found in
- llvm/docs/AMDGPUUsage.rst and
- The comments of llvm/include/llvm/IR/IntrinsicsAMDGPU.td, which is where details of the meaning of certain bitfields or of how an intrinsic corresponds to hardware instructions are most likely to be found.
Since many intrinsics are themselves minimal wrappers around hardware instructions, these documentation sources often do not repeat hardware documentation. If an intrinsic appears undocumented, information about its behavior will often be available in published ISA descriptions (sometimes known as shader programming guides).
If an operation doesn’t provide usage examples, it is likely that they
can be found in mlir/test/Dialect/LLVMIR/rocdl.mlir (op syntax and
verification) or mlir/test/Target/LLVMIR/rocdl.mlir (translation
to LLVM IR).
Operations ¶
rocdl.asyncmark (ROCDL::AsyncmarkOp) ¶
Mark the end of a group of asynchronous operations
Syntax:
operation ::= `rocdl.asyncmark` attr-dict
This operation, in conjunction with rocdl.wait.asyncmark, forms the
compiler-provided framework for tracking explicitly asynchronous
memory operations, such as copies to LDS that use async intrinsics
and gfx1250’s tensor loads.
Details of its behavior can be found in the LLVM documentation on async tracking.
See rocdl.wait.asyncmark’s documentation for a usage example.
Example:
// Mark the end of an async operation group.
rocdl.asyncmark
Available on gfx9 and later.
rocdl.ballot (ROCDL::BallotOp) ¶
Vote across thread group
Syntax:
operation ::= `rocdl.ballot` $pred attr-dict `:` type($res)
Ballot provides a bit mask containing the 1-bit predicate value from each lane. The nth bit of the result contains the 1 bit contributed by the nth warp lane.
Example:
// Ballot across thread group.
%0 = rocdl.ballot %pred : i64
Operands: ¶
| Operand | Description |
|---|---|
pred | 1-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
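As a rough host-side model of the description above (not the intrinsic itself), the result can be pictured as one bit per lane. The helper name `ballot_mask` and the list-of-booleans input are illustrative assumptions, not part of the dialect.

```python
def ballot_mask(predicates):
    """Model of rocdl.ballot: bit n of the result is lane n's 1-bit predicate."""
    mask = 0
    for lane, pred in enumerate(predicates):
        if pred:
            mask |= 1 << lane
    return mask


# A hypothetical 8-lane group where lanes 0, 3 and 7 vote true.
votes = [True, False, False, True, False, False, False, True]
print(bin(ballot_mask(votes)))  # 0b10001001
```

With an i64 result type, as in the example above, the same idea extends to 64 lanes.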
rocdl.barrier (ROCDL::BarrierOp) ¶
Syntax:
operation ::= `rocdl.barrier` attr-dict
An operation with the same expansion as HIP’s __syncthreads().
DEPRECATION NOTICE: Use gpu.barrier, which will expand to these
operations, instead.
Example:
// Workgroup barrier with acquire/release fences.
rocdl.barrier
rocdl.cluster.id.x (ROCDL::ClusterIdXOp) ¶
Syntax:
operation ::= `rocdl.cluster.id.x` (`range` $range^)? attr-dict `:` type($res)
Read a hardware register for thread/workgroup/cluster identification.
An optional range attribute can constrain the returned value.
Example:
// Read the cluster id in the x dimension.
%0 = rocdl.cluster.id.x : i32
// Read with a known range constraint.
%1 = rocdl.cluster.id.x range <i32, 0, 64> : i32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
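The range attribute corresponds to LLVM's ConstantRange, which is half-open: range <i32, 0, 64> promises a value v with 0 <= v < 64. The following is a minimal sketch of that membership test. It ignores wrap-around ranges, and the function name is purely illustrative.

```python
def in_constant_range(value, lower, upper):
    """Half-open membership test: lower <= value < upper (no wrap-around)."""
    return lower <= value < upper


print(in_constant_range(63, 0, 64))  # True
print(in_constant_range(64, 0, 64))  # False
```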
rocdl.cluster.id.y (ROCDL::ClusterIdYOp) ¶
Syntax:
operation ::= `rocdl.cluster.id.y` (`range` $range^)? attr-dict `:` type($res)
Read a hardware register for thread/workgroup/cluster identification.
An optional range attribute can constrain the returned value.
Example:
// Read the cluster id in the y dimension.
%0 = rocdl.cluster.id.y : i32
// Read with a known range constraint.
%1 = rocdl.cluster.id.y range <i32, 0, 64> : i32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.cluster.id.z (ROCDL::ClusterIdZOp) ¶
Syntax:
operation ::= `rocdl.cluster.id.z` (`range` $range^)? attr-dict `:` type($res)
Read a hardware register for thread/workgroup/cluster identification.
An optional range attribute can constrain the returned value.
Example:
// Read the cluster id in the z dimension.
%0 = rocdl.cluster.id.z : i32
// Read with a known range constraint.
%1 = rocdl.cluster.id.z range <i32, 0, 64> : i32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.cluster.load.async.to.lds.b128 (ROCDL::ClusterLoadAsyncToLDSB128Op) ¶
Syntax:
operation ::= `rocdl.cluster.load.async.to.lds.b128` $globalPtr `,` $ldsPtr `,` $offset `,` $cpol `,` $mask
attr-dict `:` qualified(type($globalPtr)) `,` qualified(type($ldsPtr))
Broadcasts memory load of 128 bits of data for a cluster of workgroups.
Available on gfx1250+.
Example:
// Cluster broadcast 128-bit load to LDS.
rocdl.cluster.load.async.to.lds.b128 %src, %dst, 0, 0, %mask : !llvm.ptr<1>, !llvm.ptr<3>
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
offset | ::mlir::IntegerAttr | 32-bit signless integer attribute |
cpol | ::mlir::IntegerAttr | 32-bit signless integer attribute |
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
globalPtr | LLVM pointer in address space 1 |
ldsPtr | LLVM pointer in address space 3 |
mask | 32-bit signless integer |
rocdl.cluster.load.async.to.lds.b32 (ROCDL::ClusterLoadAsyncToLDSB32Op) ¶
Syntax:
operation ::= `rocdl.cluster.load.async.to.lds.b32` $globalPtr `,` $ldsPtr `,` $offset `,` $cpol `,` $mask
attr-dict `:` qualified(type($globalPtr)) `,` qualified(type($ldsPtr))
Broadcasts memory load of 32 bits of data for a cluster of workgroups.
Available on gfx1250+.
Example:
// Cluster broadcast 32-bit load to LDS.
rocdl.cluster.load.async.to.lds.b32 %src, %dst, 0, 0, %mask : !llvm.ptr<1>, !llvm.ptr<3>
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
offset | ::mlir::IntegerAttr | 32-bit signless integer attribute |
cpol | ::mlir::IntegerAttr | 32-bit signless integer attribute |
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
globalPtr | LLVM pointer in address space 1 |
ldsPtr | LLVM pointer in address space 3 |
mask | 32-bit signless integer |
rocdl.cluster.load.async.to.lds.b64 (ROCDL::ClusterLoadAsyncToLDSB64Op) ¶
Syntax:
operation ::= `rocdl.cluster.load.async.to.lds.b64` $globalPtr `,` $ldsPtr `,` $offset `,` $cpol `,` $mask
attr-dict `:` qualified(type($globalPtr)) `,` qualified(type($ldsPtr))
Broadcasts memory load of 64 bits of data for a cluster of workgroups.
Available on gfx1250+.
Example:
// Cluster broadcast 64-bit load to LDS.
rocdl.cluster.load.async.to.lds.b64 %src, %dst, 0, 0, %mask : !llvm.ptr<1>, !llvm.ptr<3>
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
offset | ::mlir::IntegerAttr | 32-bit signless integer attribute |
cpol | ::mlir::IntegerAttr | 32-bit signless integer attribute |
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
globalPtr | LLVM pointer in address space 1 |
ldsPtr | LLVM pointer in address space 3 |
mask | 32-bit signless integer |
rocdl.cluster.load.async.to.lds.b8 (ROCDL::ClusterLoadAsyncToLDSB8Op) ¶
Syntax:
operation ::= `rocdl.cluster.load.async.to.lds.b8` $globalPtr `,` $ldsPtr `,` $offset `,` $cpol `,` $mask
attr-dict `:` qualified(type($globalPtr)) `,` qualified(type($ldsPtr))
Broadcasts memory load of 8 bits of data for a cluster of workgroups.
Available on gfx1250+.
Example:
// Cluster broadcast 8-bit load to LDS.
rocdl.cluster.load.async.to.lds.b8 %src, %dst, 0, 0, %mask : !llvm.ptr<1>, !llvm.ptr<3>
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
offset | ::mlir::IntegerAttr | 32-bit signless integer attribute |
cpol | ::mlir::IntegerAttr | 32-bit signless integer attribute |
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
globalPtr | LLVM pointer in address space 1 |
ldsPtr | LLVM pointer in address space 3 |
mask | 32-bit signless integer |
rocdl.cluster.workgroup.id.x (ROCDL::ClusterWorkgroupIdXOp) ¶
Syntax:
operation ::= `rocdl.cluster.workgroup.id.x` (`range` $range^)? attr-dict `:` type($res)
Read a hardware register for thread/workgroup/cluster identification.
An optional range attribute can constrain the returned value.
Example:
// Read the cluster workgroup id in the x dimension.
%0 = rocdl.cluster.workgroup.id.x : i32
// Read with a known range constraint.
%1 = rocdl.cluster.workgroup.id.x range <i32, 0, 64> : i32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.cluster.workgroup.id.y (ROCDL::ClusterWorkgroupIdYOp) ¶
Syntax:
operation ::= `rocdl.cluster.workgroup.id.y` (`range` $range^)? attr-dict `:` type($res)
Read a hardware register for thread/workgroup/cluster identification.
An optional range attribute can constrain the returned value.
Example:
// Read the cluster workgroup id in the y dimension.
%0 = rocdl.cluster.workgroup.id.y : i32
// Read with a known range constraint.
%1 = rocdl.cluster.workgroup.id.y range <i32, 0, 64> : i32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.cluster.workgroup.id.z (ROCDL::ClusterWorkgroupIdZOp) ¶
Syntax:
operation ::= `rocdl.cluster.workgroup.id.z` (`range` $range^)? attr-dict `:` type($res)
Read a hardware register for thread/workgroup/cluster identification.
An optional range attribute can constrain the returned value.
Example:
// Read the cluster workgroup id in the z dimension.
%0 = rocdl.cluster.workgroup.id.z : i32
// Read with a known range constraint.
%1 = rocdl.cluster.workgroup.id.z range <i32, 0, 64> : i32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.cos (ROCDL::ROCDLCos) ¶
Syntax:
operation ::= `rocdl.cos` $arg qualified(type($arg)) attr-dict `->` qualified(type($res))
Note: In the general case, prefer the conventional arith, math, or llvm ops over this.
Use this ROCDL-specific operation only when you fully understand its implication and
when it is strictly necessary. This op is usually chosen when a small loss in precision is
acceptable in exchange for higher execution speed.
Example:
%0 = rocdl.cos %a f32 -> f32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
arg | floating point LLVM type |
Results: ¶
| Result | Description |
|---|---|
res | floating point LLVM type |
rocdl.cvt.f32.bf8 (ROCDL::CvtF32Bf8Op) ¶
Convert bf8 to f32
Syntax:
operation ::= `rocdl.cvt.f32.bf8` attr-dict $srcA `[` $byteSel `]` `:` type($res)
Convert the 8-bit bf8 value in the byte of srcA selected by byteSel to fp32.
Example:
// Convert bf8 byte 0 to f32.
%0 = rocdl.cvt.f32.bf8 %src[0] : f32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
byteSel | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
srcA | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
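The byte selection step can be sketched on the host as follows. This only models which 8 bits of the 32-bit source are picked; it does not decode the bf8 bit pattern. Numbering byte 0 as the least significant byte is an assumption made for illustration, and `select_byte` is a hypothetical helper name.

```python
def select_byte(src_a, byte_sel):
    """Return byte number byte_sel of a 32-bit value (0 = least significant, assumed)."""
    if not 0 <= byte_sel <= 3:
        raise ValueError("byte_sel must be in 0..3")
    return (src_a >> (8 * byte_sel)) & 0xFF


packed = 0x44332211
print([hex(select_byte(packed, i)) for i in range(4)])
# ['0x11', '0x22', '0x33', '0x44']
```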
rocdl.cvt.f32.fp8 (ROCDL::CvtF32Fp8Op) ¶
Convert fp8 to f32
Syntax:
operation ::= `rocdl.cvt.f32.fp8` attr-dict $srcA `[` $byteSel `]` `:` type($res)
Convert the 8-bit fp8 value in the byte of srcA selected by byteSel to fp32.
Example:
// Convert fp8 byte 0 to f32.
%0 = rocdl.cvt.f32.fp8 %src[0] : f32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
byteSel | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
srcA | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.cvt.pk.bf8.f32 (ROCDL::CvtPkBf8F32Op) ¶
Convert two f32 values to bf8
Syntax:
operation ::= `rocdl.cvt.pk.bf8.f32` attr-dict $srcA `,` $srcB `->` $old `[` $wordSel `]` `:` type($res)
Convert srcA and srcB to bf8 and store into the low/high word of
old, preserving the other word.
Example:
// Pack two f32 values into bf8 in the low word of old.
%0 = rocdl.cvt.pk.bf8.f32 %a, %b -> %old[false] : i32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
wordSel | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
srcA | 32-bit float |
srcB | 32-bit float |
old | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
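The "store into the low/high word of old, preserving the other word" behavior can be pictured with the sketch below. It takes two already-encoded 8-bit values and does not model the f32-to-bf8 conversion itself. Placing the value derived from srcA in the lower byte of the word is an assumption for illustration, and `insert_packed_pair` is a hypothetical name.

```python
def insert_packed_pair(old, byte_a, byte_b, word_sel):
    """Write two 8-bit values into one 16-bit word of old, keeping the other word."""
    # Assumed layout: byte_a in the lower byte, byte_b in the upper byte of the word.
    pair = (byte_a & 0xFF) | ((byte_b & 0xFF) << 8)
    if word_sel:
        return (old & 0x0000FFFF) | (pair << 16)  # replace the high word
    return (old & 0xFFFF0000) | pair  # replace the low word


old = 0xAABBCCDD
print(hex(insert_packed_pair(old, 0x11, 0x22, False)))  # 0xaabb2211
print(hex(insert_packed_pair(old, 0x11, 0x22, True)))   # 0x2211ccdd
```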
rocdl.cvt.pk.f32.bf8 (ROCDL::CvtPkF32Bf8Op) ¶
Convert packed bf8 to packed f32
Syntax:
operation ::= `rocdl.cvt.pk.f32.bf8` attr-dict $src `[` $wordSel `]` `:` type($res)
Convert the two bf8 values in the word of src selected by wordSel to packed fp32.
Example:
// Unpack bf8 word to packed f32.
%0 = rocdl.cvt.pk.f32.bf8 %src[false] : vector<2xf32>
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
wordSel | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.cvt.pk.f32.fp8 (ROCDL::CvtPkF32Fp8Op) ¶
Convert packed fp8 to packed f32
Syntax:
operation ::= `rocdl.cvt.pk.f32.fp8` attr-dict $src `[` $wordSel `]` `:` type($res)
Convert the two fp8 values in the word of src selected by wordSel to packed fp32.
Example:
// Unpack fp8 word to packed f32.
%0 = rocdl.cvt.pk.f32.fp8 %src[false] : vector<2xf32>
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
wordSel | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.cvt.pk.fp8.f32 (ROCDL::CvtPkFp8F32Op) ¶
Convert two f32 values to fp8
Syntax:
operation ::= `rocdl.cvt.pk.fp8.f32` attr-dict $srcA `,` $srcB `->` $old `[` $wordSel `]` `:` type($res)
Convert srcA and srcB to fp8 and store into the low/high word of
old, preserving the other word.
Example:
// Pack two f32 values into fp8 in the low word of old.
%0 = rocdl.cvt.pk.fp8.f32 %a, %b -> %old[false] : i32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
wordSel | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
srcA | 32-bit float |
srcB | 32-bit float |
old | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.cvt.pkrtz (ROCDL::CvtPkRtz) ¶
Convert two f32 inputs into a vector<2xf16>
Syntax:
operation ::= `rocdl.cvt.pkrtz` attr-dict $srcA `,` $srcB `:` type($res)
Convert two f32 values into a packed vector<2xf16>.
Example:
// Pack two f32 values into a vector<2xf16> with round-to-zero.
%0 = rocdl.cvt.pkrtz %a, %b : vector<2xf16>
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
srcA | 32-bit float |
srcB | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
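To illustrate what round-to-zero means for this conversion, here is a minimal host-side sketch that produces the f16 bit patterns for two inputs. It follows IEEE-style toward-zero rounding and assumes the inputs are already representable as f32. Hardware corner cases such as denormal modes are not modeled, and the ordering of the pair (srcA first) is an assumption. The function names are illustrative only.

```python
import math
import struct


def f32_to_f16_rtz_bits(x):
    """Return the 16-bit pattern of x converted to f16, rounding toward zero."""
    if math.isnan(x) or math.isinf(x):
        return struct.unpack("<H", struct.pack("<e", x))[0]
    try:
        # struct's "e" format rounds to nearest; corrected below if it rounded away.
        bits = struct.unpack("<H", struct.pack("<e", x))[0]
    except OverflowError:
        # Too large for f16: toward-zero rounding saturates at the largest finite value.
        return 0x7BFF | (0x8000 if x < 0 else 0)
    nearest = struct.unpack("<e", struct.pack("<H", bits))[0]
    if abs(nearest) > abs(x):
        bits -= 1  # step one representable value toward zero
    return bits


def cvt_pkrtz(src_a, src_b):
    """Model of the packed result as a pair of f16 bit patterns (srcA first, assumed)."""
    return (f32_to_f16_rtz_bits(src_a), f32_to_f16_rtz_bits(src_b))


print([hex(b) for b in cvt_pkrtz(1 + 3 * 2**-11, 70000.0)])
# ['0x3c01', '0x7bff']
```

The first input lies exactly halfway between two f16 values, so round-to-nearest-even would give 0x3c02, while rounding toward zero gives 0x3c01.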
rocdl.cvt.scale.pk16.bf16.bf6 (ROCDL::CvtPkScalePk16Bf16Bf6Op) ¶
Scales 16 bf6 and converts them to 16 bf16.
Syntax:
operation ::= `rocdl.cvt.scale.pk16.bf16.bf6` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)
Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
scaleSel | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit signless integer values of length 3 |
scale | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of bfloat16 type values of length 16 |
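The src operand is a vector of three i32 values because 16 six-bit elements occupy 16 × 6 = 96 bits, which is three 32-bit words. The sketch below shows that arithmetic and one way to pull the raw 6-bit fields apart. The bit order (element 0 in the lowest bits of word 0) is an assumption made for illustration and is not stated on this page. The scaling and the bf6 decoding are not modeled, and both helper names are hypothetical.

```python
def i32_words_needed(count, bits_per_element):
    """Number of 32-bit words needed to hold count packed elements."""
    total_bits = count * bits_per_element
    return (total_bits + 31) // 32


def unpack_fields(words, bits_per_element, count):
    """Split packed 32-bit words into raw fields (element 0 in the low bits, assumed)."""
    value = 0
    for i, word in enumerate(words):
        value |= (word & 0xFFFFFFFF) << (32 * i)
    mask = (1 << bits_per_element) - 1
    return [(value >> (bits_per_element * i)) & mask for i in range(count)]


print(i32_words_needed(16, 6))                    # 3
print(unpack_fields([0x41, 0, 0], 6, 16)[:3])     # [1, 1, 0]
```

The same arithmetic accounts for the other operand shapes in this family: eight 4-bit elements fit in a single i32, and eight 8-bit elements need two.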
rocdl.cvt.scale.pk16.bf16.fp6 (ROCDL::CvtPkScalePk16Bf16Fp6Op) ¶
Scales 16 fp6 and converts them to 16 bf16.
Syntax:
operation ::= `rocdl.cvt.scale.pk16.bf16.fp6` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)
Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
scaleSel | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit signless integer values of length 3 |
scale | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of bfloat16 type values of length 16 |
rocdl.cvt.scale.pk16.f16.bf6 (ROCDL::CvtPkScalePk16F16Bf6Op) ¶
Scales 16 bf6 and converts them to 16 f16.
Syntax:
operation ::= `rocdl.cvt.scale.pk16.f16.bf6` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)
Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
scaleSel | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit signless integer values of length 3 |
scale | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 16-bit float values of length 16 |
rocdl.cvt.scale.pk16.f16.fp6 (ROCDL::CvtPkScalePk16F16Fp6Op) ¶
Scales 16 fp6 and converts them to 16 f16.
Syntax:
operation ::= `rocdl.cvt.scale.pk16.f16.fp6` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)
Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
scaleSel | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit signless integer values of length 3 |
scale | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 16-bit float values of length 16 |
rocdl.cvt.scale.pk16.f32.bf6 (ROCDL::CvtPkScalePk16F32Bf6Op) ¶
Scales 16 bf6 and converts them to 16 f32.
Syntax:
operation ::= `rocdl.cvt.scale.pk16.f32.bf6` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)
Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
scaleSel | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit signless integer values of length 3 |
scale | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 16 |
rocdl.cvt.scale.pk16.f32.fp6 (ROCDL::CvtPkScalePk16F32Fp6Op) ¶
Scales 16 fp6 and converts them to 16 f32.
Syntax:
operation ::= `rocdl.cvt.scale.pk16.f32.fp6` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)
Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
scaleSel | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit signless integer values of length 3 |
scale | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 16 |
rocdl.cvt.scale.pk8.bf16.bf8 (ROCDL::CvtPkScalePk8Bf16Bf8Op) ¶
Scales 8 bf8 and converts them to 8 bf16.
Syntax:
operation ::= `rocdl.cvt.scale.pk8.bf16.bf8` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)
Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
scaleSel | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit signless integer values of length 2 |
scale | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of bfloat16 type values of length 8 |
rocdl.cvt.scale.pk8.bf16.fp4 (ROCDL::CvtPkScalePk8Bf16Fp4Op) ¶
Scales 8 fp4 and converts them to 8 bf16.
Syntax:
operation ::= `rocdl.cvt.scale.pk8.bf16.fp4` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)
Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
scaleSel | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | 32-bit signless integer |
scale | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of bfloat16 type values of length 8 |
rocdl.cvt.scale.pk8.bf16.fp8 (ROCDL::CvtPkScalePk8Bf16Fp8Op) ¶
Scales 8 fp8 and converts them to 8 bf16.
Syntax:
operation ::= `rocdl.cvt.scale.pk8.bf16.fp8` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)
Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
scaleSel | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit signless integer values of length 2 |
scale | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of bfloat16 type values of length 8 |
rocdl.cvt.scale.pk8.f16.bf8 (ROCDL::CvtPkScalePk8F16Bf8Op) ¶
Scales 8 bf8 and converts them to 8 f16.
Syntax:
operation ::= `rocdl.cvt.scale.pk8.f16.bf8` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)
Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
scaleSel | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit signless integer values of length 2 |
scale | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 16-bit float values of length 8 |
rocdl.cvt.scale.pk8.f16.fp4 (ROCDL::CvtPkScalePk8F16Fp4Op) ¶
Scales 8 fp4 and converts them to 8 f16.
Syntax:
operation ::= `rocdl.cvt.scale.pk8.f16.fp4` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)
Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
scaleSel | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | 32-bit signless integer |
scale | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 16-bit float values of length 8 |
rocdl.cvt.scale.pk8.f16.fp8 (ROCDL::CvtPkScalePk8F16Fp8Op) ¶
Scales 8 fp8 and converts them to 8 f16.
Syntax:
operation ::= `rocdl.cvt.scale.pk8.f16.fp8` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)
Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
scaleSel | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit signless integer values of length 2 |
scale | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 16-bit float values of length 8 |
rocdl.cvt.scale.pk8.f32.bf8 (ROCDL::CvtPkScalePk8F32Bf8Op) ¶
Scales 8 bf8 and converts them to 8 f32.
Syntax:
operation ::= `rocdl.cvt.scale.pk8.f32.bf8` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)
Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
scaleSel | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit signless integer values of length 2 |
scale | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 8 |
rocdl.cvt.scale.pk8.f32.fp4 (ROCDL::CvtPkScalePk8F32Fp4Op) ¶
Scales 8 fp4 and converts them to 8 f32.
Syntax:
operation ::= `rocdl.cvt.scale.pk8.f32.fp4` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)
Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
scaleSel | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | 32-bit signless integer |
scale | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 8 |
rocdl.cvt.scale.pk8.f32.fp8 (ROCDL::CvtPkScalePk8F32Fp8Op) ¶
Scales 8 fp8 and converts them to 8 f32.
Syntax:
operation ::= `rocdl.cvt.scale.pk8.f32.fp8` attr-dict $src `,` $scale `[` $scaleSel `]` `:` type($res)
Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
scaleSel | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit signless integer values of length 2 |
scale | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 8 |
rocdl.cvt.scalef32.2xpk16.bf6.f32 (ROCDL::CvtScaleF322xPk16Bf6F32Op) ¶
Scale and convert two vector<16xf32> to 32 packed bf6
Syntax:
operation ::= `rocdl.cvt.scalef32.2xpk16.bf6.f32` attr-dict $src0 `,` $src1 `,` $scale `:` type($res)
Convert 32 single-precision float values, packed into two length-16
vectors that will be logically concatenated, to packed bf6, dividing by the exponent part of scale
before doing so.
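The following sketch illustrates the two-source form. The 32 results of 6 bits each occupy 192 bits, which matches the `vector<6xi32>` result type. The function name `@cvt_2xpk16_bf6_f32` and the value names `%lo` and `%hi` are hypothetical.

```mlir
// Hypothetical wrapper function; %lo and %hi are the two halves that the
// op logically concatenates into 32 inputs.
llvm.func @cvt_2xpk16_bf6_f32(%lo: vector<16xf32>, %hi: vector<16xf32>,
                              %scale: f32) -> vector<6xi32> {
  %res = rocdl.cvt.scalef32.2xpk16.bf6.f32 %lo, %hi, %scale : vector<6xi32>
  llvm.return %res : vector<6xi32>
}
```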
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src0 | fixed-length vector of 32-bit float values of length 16 |
src1 | fixed-length vector of 32-bit float values of length 16 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 6 |
rocdl.cvt.scalef32.2xpk16.fp6.f32 (ROCDL::CvtScaleF322xPk16Fp6F32Op) ¶
Scale and convert two vector<16xf32> to 32 packed fp6
Syntax:
operation ::= `rocdl.cvt.scalef32.2xpk16.fp6.f32` attr-dict $src0 `,` $src1 `,` $scale `:` type($res)
Convert 32 single-precision float values, packed into two length-16
vectors that will be logically concatenated, to packed fp6, dividing by the exponent part of scale
before doing so.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src0 | fixed-length vector of 32-bit float values of length 16 |
src1 | fixed-length vector of 32-bit float values of length 16 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 6 |
rocdl.cvt.scalef32.f16.bf8 (ROCDL::CvtScaleF32F16Bf8Op) ¶
Scaled convert bf8 from packed vector to f16, updating tied result
Syntax:
operation ::= `rocdl.cvt.scalef32.f16.bf8` attr-dict $src `[` $srcSelIndex `]` `,` $scale `->` $oldVdst `[` $dstLoHiSel `]` `:` type($res)
Convert a bf8 byte from src, selected by
srcSelIndex, to f16 while multiplying it by the exponent of scale,
and place it into the element of oldVdst selected by dstLoHiSel,
preserving the other element of that vector in
the return value.
The bytes are stored as an i32 and not a <4 x i8>.
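As a sketch of the tied-result form, the example below converts byte 1 of the packed source and writes it into one half of an existing `vector<2xf16>`. The wrapper `@cvt_f16_bf8` and the value names are hypothetical, and it is an assumption here that the 1-bit attribute is spelled `false`/`true` in the assembly and that `false` selects the low element.

```mlir
// Hypothetical wrapper function; names are for illustration only.
llvm.func @cvt_f16_bf8(%src: i32, %scale: f32,
                       %old: vector<2xf16>) -> vector<2xf16> {
  // srcSelIndex = 1 picks one bf8 byte of %src; dstLoHiSel = false is
  // assumed to target the low f16 element of %old.
  %res = rocdl.cvt.scalef32.f16.bf8 %src[1], %scale -> %old[false] : vector<2xf16>
  llvm.return %res : vector<2xf16>
}
```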
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
srcSelIndex | ::mlir::IntegerAttr | 32-bit signless integer attribute |
dstLoHiSel | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
oldVdst | fixed-length vector of 16-bit float values of length 2 |
src | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 16-bit float values of length 2 |
rocdl.cvt.scalef32.f16.fp8 (ROCDL::CvtScaleF32F16Fp8Op) ¶
Scaled convert fp8 from packed vector to f16, updating tied result
Syntax:
operation ::= `rocdl.cvt.scalef32.f16.fp8` attr-dict $src `[` $srcSelIndex `]` `,` $scale `->` $oldVdst `[` $dstLoHiSel `]` `:` type($res)
Convert a fp8 byte from src, selected by
srcSelIndex, to f16 while multiplying it by the exponent of scale,
and place it into the element of oldVdst selected by dstLoHiSel,
preserving the other element of that vector in
the return value.
The bytes are stored as an i32 and not a <4 x i8>.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
srcSelIndex | ::mlir::IntegerAttr | 32-bit signless integer attribute |
dstLoHiSel | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
oldVdst | fixed-length vector of 16-bit float values of length 2 |
src | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 16-bit float values of length 2 |
rocdl.cvt.scalef32.f32.bf8 (ROCDL::CvtScaleF32F32Bf8Op) ¶
Scaled convert bf8 from packed vector to f32
Syntax:
operation ::= `rocdl.cvt.scalef32.f32.bf8` attr-dict $src `[` $srcSelIndex `]` `,` $scale `:` type($res)
Convert a bf8 byte from src, selected by
srcSelIndex, to f32, multiplying it by the exponent of scale.
The bytes are stored in an i32, not a <4 x i8>.
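A short sketch of the scalar form, assuming a hypothetical wrapper `@cvt_f32_bf8`. The `srcSelIndex` of 3 is an illustrative choice; which byte of the `i32` each index refers to is not stated here and should be checked against the LLVM intrinsic.

```mlir
// Hypothetical wrapper function; names are for illustration only.
llvm.func @cvt_f32_bf8(%src: i32, %scale: f32) -> f32 {
  // Convert the bf8 byte selected by index 3 of the packed i32 to f32.
  %res = rocdl.cvt.scalef32.f32.bf8 %src[3], %scale : f32
  llvm.return %res : f32
}
```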
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
srcSelIndex | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float |
rocdl.cvt.scalef32.f32.fp8 (ROCDL::CvtScaleF32F32Fp8Op) ¶
Scaled convert fp8 from packed vector to f32
Syntax:
operation ::= `rocdl.cvt.scalef32.f32.fp8` attr-dict $src `[` $srcSelIndex `]` `,` $scale `:` type($res)
Convert a fp8 byte from src, selected by
srcSelIndex, to f32, multiplying it by the exponent of scale.
The bytes are stored in an i32, not a <4 x i8>.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
srcSelIndex | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float |
rocdl.cvt.scalef32.pk.bf16.bf8 (ROCDL::CvtScaleF32PkBf16Bf8Op) ¶
Scaled convert two bf8 to two bf16
Syntax:
operation ::= `rocdl.cvt.scalef32.pk.bf16.bf8` attr-dict $src `[` $srcLoHiSel `]` `,` $scale `:` type($res)
Convert two packed bf8 values in src to two bf16 values, multiplying by the exponent in scale.
The two values to be converted are selected from the low or high half
of src (a packed vector represented as an i32)
on the basis of srcLoHiSel.
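The half selection can be sketched as follows. The wrapper `@cvt_pk_bf16_bf8` is hypothetical, and it is an assumption that the 1-bit `srcLoHiSel` attribute is written as `false`/`true` and that `true` selects the high half of the `i32`.

```mlir
// Hypothetical wrapper function; names are for illustration only.
llvm.func @cvt_pk_bf16_bf8(%src: i32, %scale: f32) -> vector<2xbf16> {
  // srcLoHiSel = true is assumed to pick the two bf8 bytes in the high
  // 16 bits of %src.
  %res = rocdl.cvt.scalef32.pk.bf16.bf8 %src[true], %scale : vector<2xbf16>
  llvm.return %res : vector<2xbf16>
}
```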
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
srcLoHiSel | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of bfloat16 type values of length 2 |
rocdl.cvt.scalef32.pk.bf16.fp4 (ROCDL::CvtScaleF32PkBf16Fp4Op) ¶
Scale and convert two packed fp4 to packed bf16
Syntax:
operation ::= `rocdl.cvt.scalef32.pk.bf16.fp4` attr-dict $src `[` $srcSelIndex `]` `,` $scale `:` type($res)
Convert two packed fp4 (f4E2M1) values stored as one byte of a 32-bit integer
to packed bf16, multiplying by the exponent part of scale
before doing so.
The byte to convert is chosen by srcSelIndex.
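In contrast to the fp8 and bf8 packed conversions, the fp4 variant selects a byte with a 32-bit index rather than a half with a 1-bit flag. A sketch, with a hypothetical wrapper `@cvt_pk_bf16_fp4` and an illustrative index of 2:

```mlir
// Hypothetical wrapper function; names are for illustration only.
llvm.func @cvt_pk_bf16_fp4(%src: i32, %scale: f32) -> vector<2xbf16> {
  // The byte selected by srcSelIndex = 2 holds the two fp4 values to convert.
  %res = rocdl.cvt.scalef32.pk.bf16.fp4 %src[2], %scale : vector<2xbf16>
  llvm.return %res : vector<2xbf16>
}
```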
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
srcSelIndex | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of bfloat16 type values of length 2 |
rocdl.cvt.scalef32.pk.bf16.fp8 (ROCDL::CvtScaleF32PkBf16Fp8Op) ¶
Scaled convert two fp8 to two bf16
Syntax:
operation ::= `rocdl.cvt.scalef32.pk.bf16.fp8` attr-dict $src `[` $srcLoHiSel `]` `,` $scale `:` type($res)
Convert two packed fp8 values in src to two bf16 values, multiplying by the exponent in scale.
The two values to be converted are selected from the low or high half
of src (a packed vector represented as an i32)
on the basis of srcLoHiSel.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
srcLoHiSel | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of bfloat16 type values of length 2 |
rocdl.cvt.scalef32.pk.bf8.bf16 (ROCDL::CvtScaleF32PkBf8Bf16Op) ¶
Scaled convert two bf16 to two bf8, updating packed vector
Syntax:
operation ::= `rocdl.cvt.scalef32.pk.bf8.bf16` attr-dict $src0 `,` $scale `->` $oldVdst `[` $dstLoHiSel `]` `:` type($res)
Convert two bf16 values in src0 to two bf8 bytes, dividing by the exponent in scale. The bytes are
packed into a 16-bit value which is inserted into oldVdst at the
dstLoHiSel position, with the entire updated vector being returned.
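A sketch of the down-conversion with a tied destination, assuming a hypothetical wrapper `@cvt_pk_bf8_bf16`. The `true` spelling of `dstLoHiSel` and its mapping to the high 16-bit element are assumptions.

```mlir
// Hypothetical wrapper function; names are for illustration only.
llvm.func @cvt_pk_bf8_bf16(%src0: vector<2xbf16>, %scale: f32,
                           %old: vector<2xi16>) -> vector<2xi16> {
  // The two bf8 bytes are packed into 16 bits and written to the element of
  // %old chosen by dstLoHiSel; the other element is carried through.
  %res = rocdl.cvt.scalef32.pk.bf8.bf16 %src0, %scale -> %old[true] : vector<2xi16>
  llvm.return %res : vector<2xi16>
}
```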
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
dstLoHiSel | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
oldVdst | fixed-length vector of 16-bit signless integer values of length 2 |
src0 | fixed-length vector of bfloat16 type values of length 2 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 16-bit signless integer values of length 2 |
rocdl.cvt.scalef32.pk.bf8.f16 (ROCDL::CvtScaleF32PkBf8F16Op) ¶
Scaled convert two f16 to two bf8, updating packed vector
Syntax:
operation ::= `rocdl.cvt.scalef32.pk.bf8.f16` attr-dict $src0 `,` $scale `->` $oldVdst `[` $dstLoHiSel `]` `:` type($res)
Convert two f16 values in src0 to two bf8 bytes, dividing by the exponent in scale. The bytes are
packed into a 16-bit value which is inserted into oldVdst at the
dstLoHiSel position, with the entire updated vector being returned.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
dstLoHiSel | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
oldVdst | fixed-length vector of 16-bit signless integer values of length 2 |
src0 | fixed-length vector of 16-bit float values of length 2 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 16-bit signless integer values of length 2 |
rocdl.cvt.scalef32.pk.bf8.f32 (ROCDL::CvtScaleF32PkBf8F32Op) ¶
Scaled convert two f32 to two bf8, updating packed vector
Syntax:
operation ::= `rocdl.cvt.scalef32.pk.bf8.f32` attr-dict $src0 `,` $src1 `,` $scale `->` $oldVdst `[` $dstLoHiSel `]` `:` type($res)
Convert two f32 values in src0 and src1 to two bf8 bytes,
dividing by the exponent in scale. The bytes are packed into
a 16-bit value which is inserted into oldVdst at the dstLoHiSel
position, with the entire updated vector being returned.
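Because the result is a full updated copy of oldVdst, two of these ops can be chained to fill both 16-bit elements. The sketch below packs four f32 values into one `vector<2xi16>`; the function `@pack4_bf8`, the value names, and the `false`/`true` spelling and low/high meaning of `dstLoHiSel` are assumptions.

```mlir
// Hypothetical helper; names are for illustration only.
llvm.func @pack4_bf8(%a: f32, %b: f32, %c: f32, %d: f32, %scale: f32,
                     %init: vector<2xi16>) -> vector<2xi16> {
  // First op writes one 16-bit element and passes the other through.
  %lo = rocdl.cvt.scalef32.pk.bf8.f32 %a, %b, %scale -> %init[false] : vector<2xi16>
  // Second op reuses the first result as its tied vector and fills the
  // remaining element.
  %both = rocdl.cvt.scalef32.pk.bf8.f32 %c, %d, %scale -> %lo[true] : vector<2xi16>
  llvm.return %both : vector<2xi16>
}
```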
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
dstLoHiSel | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
oldVdst | fixed-length vector of 16-bit signless integer values of length 2 |
src0 | 32-bit float |
src1 | 32-bit float |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 16-bit signless integer values of length 2 |
rocdl.cvt.scalef32.pk.f16.bf8 (ROCDL::CvtScaleF32PkF16Bf8Op) ¶
Scaled convert two bf8 to two f16
Syntax:
operation ::= `rocdl.cvt.scalef32.pk.f16.bf8` attr-dict $src `[` $srcLoHiSel `]` `,` $scale `:` type($res)
Convert two packed bf8 values in src to two f16 values, multiplying by the exponent in scale.
The two values to be converted are selected from the low or high half
of src (a packed vector represented as an i32)
on the basis of srcLoHiSel.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
srcLoHiSel | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 16-bit float values of length 2 |
rocdl.cvt.scalef32.pk.f16.fp4 (ROCDL::CvtScaleF32PkF16Fp4Op) ¶
Scale and convert two packed fp4 to packed f16
Syntax:
operation ::= `rocdl.cvt.scalef32.pk.f16.fp4` attr-dict $src `[` $srcSelIndex `]` `,` $scale `:` type($res)
Convert two packed fp4 (f4E2M1) values stored as one byte of a 32-bit integer
to packed f16, multiplying by the exponent part of scale
before doing so.
The byte to convert is chosen by srcSelIndex.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
srcSelIndex | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 16-bit float values of length 2 |
rocdl.cvt.scalef32.pk.f16.fp8 (ROCDL::CvtScaleF32PkF16Fp8Op) ¶
Scaled convert two fp8 to two f16
Syntax:
operation ::= `rocdl.cvt.scalef32.pk.f16.fp8` attr-dict $src `[` $srcLoHiSel `]` `,` $scale `:` type($res)
Convert two packed fp8 values in src to two f16 values, multiplying by the exponent in scale.
The two values to be converted are selected from the low or high half
of src (a packed vector represented as an i32)
on the basis of srcLoHiSel.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
srcLoHiSel | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 16-bit float values of length 2 |
rocdl.cvt.scalef32.pk.f32.bf8 (ROCDL::CvtScaleF32PkF32Bf8Op) ¶
Scaled convert two bf8 to two f32
Syntax:
operation ::= `rocdl.cvt.scalef32.pk.f32.bf8` attr-dict $src `[` $srcLoHiSel `]` `,` $scale `:` type($res)
Convert two packed bf8 values in src to two f32 values, multiplying by the exponent in scale.
The two values to be converted are selected from the low or high half
of src (a packed vector represented as an i32)
on the basis of srcLoHiSel.
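Unpacking all four bytes of one `i32` therefore takes two ops, one per half. A sketch under the assumption that `srcLoHiSel` is spelled `false`/`true`; the function name `@unpack_low_high_bf8` and the choice to return only the second pair are purely illustrative.

```mlir
// Hypothetical helper; names are for illustration only.
llvm.func @unpack_low_high_bf8(%src: i32, %scale: f32) -> vector<2xf32> {
  // One op per half of the packed i32.
  %first = rocdl.cvt.scalef32.pk.f32.bf8 %src[false], %scale : vector<2xf32>
  %second = rocdl.cvt.scalef32.pk.f32.bf8 %src[true], %scale : vector<2xf32>
  // %first is unused here; a real caller would consume both results.
  llvm.return %second : vector<2xf32>
}
```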
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
srcLoHiSel | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 2 |
rocdl.cvt.scalef32.pk.f32.fp4 (ROCDL::CvtScaleF32PkF32Fp4Op) ¶
Scale and convert two packed fp4 to packed f32
Syntax:
operation ::= `rocdl.cvt.scalef32.pk.f32.fp4` attr-dict $src `[` $srcSelIndex `]` `,` $scale `:` type($res)
Convert two packed fp4 (f4E2M1) values stored as one byte of a 32-bit integer
to packed f32, multiplying by the exponent part of scale
before doing so.
The byte to convert is chosen by srcSelIndex.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
srcSelIndex | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 2 |
rocdl.cvt.scalef32.pk.f32.fp8 (ROCDL::CvtScaleF32PkF32Fp8Op) ¶
Scaled convert two fp8 to two f32
Syntax:
operation ::= `rocdl.cvt.scalef32.pk.f32.fp8` attr-dict $src `[` $srcLoHiSel `]` `,` $scale `:` type($res)
Convert two packed fp8 values in src to two f32 values, multiplying by the exponent in scale.
The two values to be converted are selected from the low or high half
of src (a packed vector represented as an i32)
on the basis of srcLoHiSel.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
srcLoHiSel | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
src | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 2 |
rocdl.cvt.scalef32.pk.fp4.bf16 (ROCDL::CvtScaleF32PkFp4Bf16Op) ¶
Scale and convert two bf16 to packed fp4, updating tied vector
Syntax:
operation ::= `rocdl.cvt.scalef32.pk.fp4.bf16` attr-dict $src `,` $scale `->` $oldVdst `[` $dstSelIndex `]` `:` type($res)
Convert two packed bf16 values to packed
fp4, dividing by the exponent part of scale
before doing so.
The two scaled values are packed into a byte.
That byte replaces the byte of oldVdst selected by dstSelIndex,
and the updated oldVdst is returned in its entirety.
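A sketch of the byte update, assuming a hypothetical wrapper `@cvt_pk_fp4_bf16` and an illustrative `dstSelIndex` of 1:

```mlir
// Hypothetical wrapper function; names are for illustration only.
llvm.func @cvt_pk_fp4_bf16(%src: vector<2xbf16>, %scale: f32,
                           %old: i32) -> i32 {
  // The two fp4 results form one byte, which replaces the byte of %old
  // selected by index 1; the other three bytes are carried through.
  %res = rocdl.cvt.scalef32.pk.fp4.bf16 %src, %scale -> %old[1] : i32
  llvm.return %res : i32
}
```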
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
dstSelIndex | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
oldVdst | 32-bit signless integer |
src | fixed-length vector of bfloat16 type values of length 2 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit signless integer |
rocdl.cvt.scalef32.pk.fp4.f16 (ROCDL::CvtScaleF32PkFp4F16Op) ¶
Scale and convert two f16 to packed fp4, updating tied vector
Syntax:
operation ::= `rocdl.cvt.scalef32.pk.fp4.f16` attr-dict $src `,` $scale `->` $oldVdst `[` $dstSelIndex `]` `:` type($res)
Convert two packed f16 values to packed
fp4, dividing by the exponent part of scale
before doing so.
The two scaled values are packed into a byte.
That byte replaces the byte of oldVdst selected by dstSelIndex,
and the updated oldVdst is returned in its entirety.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
dstSelIndex | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
oldVdst | 32-bit signless integer |
src | fixed-length vector of 16-bit float values of length 2 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit signless integer |
rocdl.cvt.scalef32.pk.fp4.f32 (ROCDL::CvtScaleF32PkFp4F32Op) ¶
Scale and convert two f32 values to two packed fp4, updating tied vector
Syntax:
operation ::= `rocdl.cvt.scalef32.pk.fp4.f32` attr-dict $src0 `,` $src1 `,` $scale `->` $oldVdst `[` $dstSelIndex `]` `:` type($res)
Convert two single-precision float values, passed in src0 and src1,
into two fp4 values, dividing them by the exponent part of scale
before doing so.
The two scaled values are packed into a byte.
That byte replaces the byte of oldVdst selected by dstSelIndex,
and the updated oldVdst is returned in its entirety.
Example:
// Scaled convert two f32 values to packed fp4 in byte 0 of old.
%0 = rocdl.cvt.scalef32.pk.fp4.f32 %a, %b, %scale -> %old[0] : i32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
dstSelIndex | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
oldVdst | 32-bit signless integer |
src0 | 32-bit float |
src1 | 32-bit float |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit signless integer |
rocdl.cvt.scalef32.pk.fp8.bf16 (ROCDL::CvtScaleF32PkFp8Bf16Op) ¶
Scaled convert two bf16 to two fp8, updating packed vector
Syntax:
operation ::= `rocdl.cvt.scalef32.pk.fp8.bf16` attr-dict $src0 `,` $scale `->` $oldVdst `[` $dstLoHiSel `]` `:` type($res)
Convert two bf16 values in src0 to two fp8 bytes, dividing by the exponent in scale. The bytes are
packed into a 16-bit value which is inserted into oldVdst at the
dstLoHiSel position, with the entire updated vector being returned.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
dstLoHiSel | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
oldVdst | fixed-length vector of 16-bit signless integer values of length 2 |
src0 | fixed-length vector of bfloat16 type values of length 2 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 16-bit signless integer values of length 2 |
rocdl.cvt.scalef32.pk.fp8.f16 (ROCDL::CvtScaleF32PkFp8F16Op) ¶
Scaled convert two f16 to two fp8, updating packed vector
Syntax:
operation ::= `rocdl.cvt.scalef32.pk.fp8.f16` attr-dict $src0 `,` $scale `->` $oldVdst `[` $dstLoHiSel `]` `:` type($res)
Convert two f16 values in src0 to two fp8 bytes, dividing by the exponent in scale. The bytes are
packed into a 16-bit value which is inserted into oldVdst at the
dstLoHiSel position, with the entire updated vector being returned.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
dstLoHiSel | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
oldVdst | fixed-length vector of 16-bit signless integer values of length 2 |
src0 | fixed-length vector of 16-bit float values of length 2 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 16-bit signless integer values of length 2 |
rocdl.cvt.scalef32.pk.fp8.f32 (ROCDL::CvtScaleF32PkFp8F32Op) ¶
Scaled convert two f32 to two fp8, updating packed vector
Syntax:
operation ::= `rocdl.cvt.scalef32.pk.fp8.f32` attr-dict $src0 `,` $src1 `,` $scale `->` $oldVdst `[` $dstLoHiSel `]` `:` type($res)
Convert two f32 values in src0 and src1 to two fp8 bytes,
dividing by the exponent in scale. The bytes are packed into
a 16-bit value which is inserted into oldVdst at the dstLoHiSel
position, with the entire updated vector being returned.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
dstLoHiSel | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
oldVdst | fixed-length vector of 16-bit signless integer values of length 2 |
src0 | 32-bit float |
src1 | 32-bit float |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 16-bit signless integer values of length 2 |
rocdl.cvt.scalef32.pk16.bf6.bf16 (ROCDL::CvtScaleF32Pk16Bf6Bf16Op) ¶
Scale and convert packed bf16 to packed bf6
Syntax:
operation ::= `rocdl.cvt.scalef32.pk16.bf6.bf16` attr-dict $src `,` $scale `:` type($res)
Convert 16 packed bf16 values to packed bf6, multiplying by the exponent part of scale
before doing so. Available on gfx1250+.
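For illustration, the operand table gives a 16-element source, and 16 results of 6 bits each occupy 96 bits, which matches the `vector<3xi32>` result. The wrapper `@cvt_pk16_bf6_bf16` below is hypothetical.

```mlir
// Hypothetical wrapper function; names are for illustration only.
llvm.func @cvt_pk16_bf6_bf16(%src: vector<16xbf16>, %scale: f32) -> vector<3xi32> {
  // 16 x 6-bit results are packed into 3 x 32 bits.
  %res = rocdl.cvt.scalef32.pk16.bf6.bf16 %src, %scale : vector<3xi32>
  llvm.return %res : vector<3xi32>
}
```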
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of bfloat16 type values of length 16 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 3 |
rocdl.cvt.scalef32.pk16.bf6.f16 (ROCDL::CvtScaleF32Pk16Bf6F16Op) ¶
Scale and convert packed f16 to packed bf6
Syntax:
operation ::= `rocdl.cvt.scalef32.pk16.bf6.f16` attr-dict $src `,` $scale `:` type($res)
Convert 16 packed f16 values to packed bf6, multiplying by the exponent part of scale
before doing so. Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 16-bit float values of length 16 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 3 |
rocdl.cvt.scalef32.pk16.bf6.f32 (ROCDL::CvtScaleF32Pk16Bf6F32Op) ¶
Scale and convert packed f32 to packed bf6
Syntax:
operation ::= `rocdl.cvt.scalef32.pk16.bf6.f32` attr-dict $src `,` $scale `:` type($res)
Convert 16 packed f32 values to packed bf6, multiplying by the exponent part of scale
before doing so. Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit float values of length 16 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 3 |
rocdl.cvt.scalef32.pk16.fp6.bf16 (ROCDL::CvtScaleF32Pk16Fp6Bf16Op) ¶
Scale and convert packed bf16 to packed fp6
Syntax:
operation ::= `rocdl.cvt.scalef32.pk16.fp6.bf16` attr-dict $src `,` $scale `:` type($res)
Convert 16 packed bf16 values to packed fp6, multiplying by the exponent part of scale
before doing so. Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of bfloat16 type values of length 16 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 3 |
rocdl.cvt.scalef32.pk16.fp6.f16 (ROCDL::CvtScaleF32Pk16Fp6F16Op) ¶
Scale and convert packed f16 to packed fp6
Syntax:
operation ::= `rocdl.cvt.scalef32.pk16.fp6.f16` attr-dict $src `,` $scale `:` type($res)
Convert 16 packed f16 values to packed fp6, multiplying by the exponent part of scale
before doing so. Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 16-bit float values of length 16 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 3 |
rocdl.cvt.scalef32.pk16.fp6.f32 (ROCDL::CvtScaleF32Pk16Fp6F32Op) ¶
Scale and convert packed f32 to packed fp6
Syntax:
operation ::= `rocdl.cvt.scalef32.pk16.fp6.f32` attr-dict $src `,` $scale `:` type($res)
Convert 16 packed f32 values to packed fp6, multiplying by the exponent part of scale
before doing so. Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit float values of length 16 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 3 |
rocdl.cvt.scalef32.pk32.bf16.bf6 (ROCDL::CvtScaleF32Pk32Bf16Bf6Op) ¶
Scale and convert packed bf6 to packed bf16
Syntax:
operation ::= `rocdl.cvt.scalef32.pk32.bf16.bf6` attr-dict $src `,` $scale `:` type($res)
Convert 32 packed bf6 values to packed bf16, multiplying by the exponent part of scale
before doing so.
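A sketch of the up-conversion, in which 32 packed 6-bit values (192 bits, held as `vector<6xi32>`) expand to 32 bf16 values. The wrapper `@cvt_pk32_bf16_bf6` is hypothetical.

```mlir
// Hypothetical wrapper function; names are for illustration only.
llvm.func @cvt_pk32_bf16_bf6(%src: vector<6xi32>, %scale: f32) -> vector<32xbf16> {
  %res = rocdl.cvt.scalef32.pk32.bf16.bf6 %src, %scale : vector<32xbf16>
  llvm.return %res : vector<32xbf16>
}
```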
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit signless integer values of length 6 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of bfloat16 type values of length 32 |
rocdl.cvt.scalef32.pk32.bf16.fp6 (ROCDL::CvtScaleF32Pk32Bf16Fp6Op) ¶
Scale and convert packed fp6 to packed bf16
Syntax:
operation ::= `rocdl.cvt.scalef32.pk32.bf16.fp6` attr-dict $src `,` $scale `:` type($res)
Convert 32 packed fp6 values to packed bf16, multiplying by the exponent part of scale
before doing so.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit signless integer values of length 6 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of bfloat16 type values of length 32 |
rocdl.cvt.scalef32.pk32.bf6.bf16 (ROCDL::CvtScaleF32Pk32Bf6Bf16Op) ¶
Scale and convert packed bf16 to packed bf6
Syntax:
operation ::= `rocdl.cvt.scalef32.pk32.bf6.bf16` attr-dict $src `,` $scale `:` type($res)
Convert 32 packed bf16 values to packed bf6, dividing by the exponent part of scale
before doing so.
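The reverse direction can be sketched the same way, with the source and result types swapped. The wrapper `@cvt_pk32_bf6_bf16` is hypothetical.

```mlir
// Hypothetical wrapper function; names are for illustration only.
llvm.func @cvt_pk32_bf6_bf16(%src: vector<32xbf16>, %scale: f32) -> vector<6xi32> {
  // 32 x 6-bit results are packed into 6 x 32 bits.
  %res = rocdl.cvt.scalef32.pk32.bf6.bf16 %src, %scale : vector<6xi32>
  llvm.return %res : vector<6xi32>
}
```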
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of bfloat16 type values of length 32 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 6 |
rocdl.cvt.scalef32.pk32.bf6.f16 (ROCDL::CvtScaleF32Pk32Bf6F16Op) ¶
Scale and convert packed f16 to packed bf6
Syntax:
operation ::= `rocdl.cvt.scalef32.pk32.bf6.f16` attr-dict $src `,` $scale `:` type($res)
Convert 32 packed f16 values to packed bf6, dividing by the exponent part of scale
before doing so.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 16-bit float values of length 32 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 6 |
rocdl.cvt.scalef32.pk32.f16.bf6 (ROCDL::CvtScaleF32Pk32F16Bf6Op) ¶
Scale and convert packed bf6 to packed f16
Syntax:
operation ::= `rocdl.cvt.scalef32.pk32.f16.bf6` attr-dict $src `,` $scale `:` type($res)
Convert 32 packed bf6 values to packed f16, multiplying by the exponent part of scale
before doing so.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit signless integer values of length 6 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 16-bit float values of length 32 |
rocdl.cvt.scalef32.pk32.f16.fp6 (ROCDL::CvtScaleF32Pk32F16Fp6Op) ¶
Scale and convert packed fp6 to packed f16
Syntax:
operation ::= `rocdl.cvt.scalef32.pk32.f16.fp6` attr-dict $src `,` $scale `:` type($res)
Convert 32 packed fp6 values to packed f16, multiplying by the exponent part of scale
before doing so.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit signless integer values of length 6 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 16-bit float values of length 32 |
rocdl.cvt.scalef32.pk32.f32.bf6 (ROCDL::CvtScaleF32Pk32F32Bf6Op) ¶
Scale and convert packed bf6 to packed f32
Syntax:
operation ::= `rocdl.cvt.scalef32.pk32.f32.bf6` attr-dict $src `,` $scale `:` type($res)
Convert 32 packed bf6 values to packed f32, multiplying by the exponent part of scale
before doing so.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit signless integer values of length 6 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 32 |
rocdl.cvt.scalef32.pk32.f32.fp6 (ROCDL::CvtScaleF32Pk32F32Fp6Op) ¶
Scale and convert packed fp6 to packed f32
Syntax:
operation ::= `rocdl.cvt.scalef32.pk32.f32.fp6` attr-dict $src `,` $scale `:` type($res)
Convert 32 packed fp6 values to packed f32, multiplying by the exponent part of scale
before doing so.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit signless integer values of length 6 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 32 |
rocdl.cvt.scalef32.pk32.fp6.bf16 (ROCDL::CvtScaleF32Pk32Fp6Bf16Op) ¶
Scale and convert packed bf16 to packed fp6
Syntax:
operation ::= `rocdl.cvt.scalef32.pk32.fp6.bf16` attr-dict $src `,` $scale `:` type($res)
Convert 32 packed bf16 values to packed fp6, dividing by the exponent part of scale
before doing so.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of bfloat16 type values of length 32 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 6 |
rocdl.cvt.scalef32.pk32.fp6.f16 (ROCDL::CvtScaleF32Pk32Fp6F16Op) ¶
Scale and convert packed f16 to packed fp6
Syntax:
operation ::= `rocdl.cvt.scalef32.pk32.fp6.f16` attr-dict $src `,` $scale `:` type($res)
Convert 32 packed f16 values to packed fp6, dividing by the exponent part of scale
before doing so.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 16-bit float values of length 32 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 6 |
rocdl.cvt.scalef32.pk8.bf8.bf16 (ROCDL::CvtScaleF32Pk8Bf8Bf16Op) ¶
Scale and convert packed bf16 to packed bf8
Syntax:
operation ::= `rocdl.cvt.scalef32.pk8.bf8.bf16` attr-dict $src `,` $scale `:` type($res)
Convert 8 packed bf16 values to packed bf8, multiplying by the exponent part of scale
before doing so. This op is for gfx1250+ arch.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of bfloat16 type values of length 8 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 2 |
rocdl.cvt.scalef32.pk8.bf8.f16 (ROCDL::CvtScaleF32Pk8Bf8F16Op) ¶
Scale and convert packed f16 to packed bf8
Syntax:
operation ::= `rocdl.cvt.scalef32.pk8.bf8.f16` attr-dict $src `,` $scale `:` type($res)
Convert 8 packed f16 values to packed bf8, multiplying by the exponent part of scale
before doing so. This op is for gfx1250+ arch.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 16-bit float values of length 8 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 2 |
rocdl.cvt.scalef32.pk8.bf8.f32 (ROCDL::CvtScaleF32Pk8Bf8F32Op) ¶
Scale and convert packed f32 to packed bf8
Syntax:
operation ::= `rocdl.cvt.scalef32.pk8.bf8.f32` attr-dict $src `,` $scale `:` type($res)
Convert 8 packed f32 values to packed bf8, multiplying by the exponent part of scale
before doing so. This op is for gfx1250+ arch.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit float values of length 8 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 2 |
rocdl.cvt.scalef32.pk8.fp4.bf16 (ROCDL::CvtScaleF32Pk8Fp4Bf16Op) ¶
Scale and convert packed bf16 to packed fp4
Syntax:
operation ::= `rocdl.cvt.scalef32.pk8.fp4.bf16` attr-dict $src `,` $scale `:` type($res)
Convert 8 packed bf16 values to packed fp4, multiplying by the exponent part of scale
before doing so. This op is for gfx1250+ arch.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of bfloat16 type values of length 8 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit signless integer |
rocdl.cvt.scalef32.pk8.fp4.f16 (ROCDL::CvtScaleF32Pk8Fp4F16Op) ¶
Scale and convert packed f16 to packed fp4
Syntax:
operation ::= `rocdl.cvt.scalef32.pk8.fp4.f16` attr-dict $src `,` $scale `:` type($res)
Convert 8 packed f16 values to packed fp4, multiplying by the exponent part of scale
before doing so. This op is for gfx1250+ arch.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 16-bit float values of length 8 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit signless integer |
rocdl.cvt.scalef32.pk8.fp4.f32 (ROCDL::CvtScaleF32Pk8Fp4F32Op) ¶
Scale and convert packed f32 to packed fp4
Syntax:
operation ::= `rocdl.cvt.scalef32.pk8.fp4.f32` attr-dict $src `,` $scale `:` type($res)
Convert 8 packed f32 values to packed fp4, multiplying by the exponent part of scale
before doing so. This op is for gfx1250+ arch.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit float values of length 8 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit signless integer |
rocdl.cvt.scalef32.pk8.fp8.bf16 (ROCDL::CvtScaleF32Pk8Fp8Bf16Op) ¶
Scale and convert packed bf16 to packed fp8
Syntax:
operation ::= `rocdl.cvt.scalef32.pk8.fp8.bf16` attr-dict $src `,` $scale `:` type($res)
Convert 8 packed bf16 values to packed fp8, multiplying by the exponent part of scale
before doing so. This op is for gfx1250+ arch.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of bfloat16 type values of length 8 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 2 |
rocdl.cvt.scalef32.pk8.fp8.f16 (ROCDL::CvtScaleF32Pk8Fp8F16Op) ¶
Scale and convert packed f16 to packed fp8
Syntax:
operation ::= `rocdl.cvt.scalef32.pk8.fp8.f16` attr-dict $src `,` $scale `:` type($res)
Convert 8 packed f16 values to packed fp8, multiplying by the exponent part of scale
before doing so. This op is for gfx1250+ arch.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 16-bit float values of length 8 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 2 |
rocdl.cvt.scalef32.pk8.fp8.f32 (ROCDL::CvtScaleF32Pk8Fp8F32Op) ¶
Scale and convert packed f32 to packed fp8
Syntax:
operation ::= `rocdl.cvt.scalef32.pk8.fp8.f32` attr-dict $src `,` $scale `:` type($res)
Convert 8 packed f32 values to packed fp8, multiplying by the exponent part of scale
before doing so. This op is for gfx1250+ arch.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit float values of length 8 |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 2 |
rocdl.cvt.scalef32.sr.bf8.bf16 (ROCDL::CvtScaleF32SrBf8BF16Op) ¶
Scaled convert bf16 to bf8 with stochastic rounding, updating packed vector
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.bf8.bf16` attr-dict $src0 `,` $seed `,` $scale `->` $oldVdst `[` $dstSelIndex `]` `:` type($res)
Convert a bf16 value in src0 to a bf8 byte, dividing by the exponent in scale and using seed
for stochastic rounding. Place the resulting byte in the
dstSelIndex-th byte of oldVdst and return the entire packed vector,
which is stored as an i32.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
dstSelIndex | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
oldVdst | 32-bit signless integer |
src0 | bfloat16 type |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit signless integer |
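The update of oldVdst can be modeled as replacing one 8-bit lane of a 32-bit word and returning the whole word. The Python sketch below shows only that lane update, not the conversion or rounding. It assumes that index 0 names the least-significant byte and that valid indices are 0 through 3; neither detail is stated in this section, and `insert_byte` is a hypothetical helper.

```python
def insert_byte(old_vdst: int, byte: int, dst_sel_index: int) -> int:
    """Return old_vdst with the selected byte replaced by `byte`.

    Assumption: byte 0 is the least-significant byte of the i32.
    """
    if not 0 <= dst_sel_index <= 3:
        raise ValueError("dst_sel_index must be in the range 0..3")
    shift = 8 * dst_sel_index
    # Clear the selected byte lane, then write the new byte into it.
    cleared = old_vdst & ~(0xFF << shift) & 0xFFFFFFFF
    return cleared | ((byte & 0xFF) << shift)


if __name__ == "__main__":
    print(hex(insert_byte(0x11223344, 0xAB, 1)))
```

Because the other three bytes pass through unchanged, four such ops with indices 0 through 3 can be chained to fill one i32 with converted values.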
rocdl.cvt.scalef32.sr.bf8.f16 (ROCDL::CvtScaleF32SrBf8F16Op) ¶
Scaled convert f16 to bf8 with stochastic rounding, updating packed vector
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.bf8.f16` attr-dict $src0 `,` $seed `,` $scale `->` $oldVdst `[` $dstSelIndex `]` `:` type($res)
Convert an f16 value in src0 to a bf8 byte, dividing by the exponent in scale and using seed
for stochastic rounding. Place the resulting byte in the
dstSelIndex-th byte of oldVdst and return the entire packed vector,
which is stored as an i32.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
dstSelIndex | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
oldVdst | 32-bit signless integer |
src0 | 16-bit float |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit signless integer |
rocdl.cvt.scalef32.sr.bf8.f32 (ROCDL::CvtScaleF32SrBf8F32Op) ¶
Scaled convert f32 to bf8 with stochastic rounding, updating packed vector
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.bf8.f32` attr-dict $src0 `,` $seed `,` $scale `->` $oldVdst `[` $dstSelIndex `]` `:` type($res)
Convert an f32 value in src0 to a bf8 byte, dividing by the exponent in scale and using seed
for stochastic rounding. Place the resulting byte in the
dstSelIndex-th byte of oldVdst and return the entire packed vector,
which is stored as an i32.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
dstSelIndex | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
oldVdst | 32-bit signless integer |
src0 | 32-bit float |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit signless integer |
rocdl.cvt.scalef32.sr.fp8.bf16 (ROCDL::CvtScaleF32SrFp8BF16Op) ¶
Scaled convert bf16 to fp8 with stochastic rounding, updating packed vector
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.fp8.bf16` attr-dict $src0 `,` $seed `,` $scale `->` $oldVdst `[` $dstSelIndex `]` `:` type($res)
Convert a bf16 value in src0 to an fp8 byte, dividing by the exponent in scale and using seed
for stochastic rounding. Place the resulting byte in the
dstSelIndex-th byte of oldVdst and return the entire packed vector,
which is stored as an i32.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
dstSelIndex | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
oldVdst | 32-bit signless integer |
src0 | bfloat16 type |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit signless integer |
rocdl.cvt.scalef32.sr.fp8.f16 (ROCDL::CvtScaleF32SrFp8F16Op) ¶
Scaled convert f16 to fp8 with stochastic rounding, updating packed vector
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.fp8.f16` attr-dict $src0 `,` $seed `,` $scale `->` $oldVdst `[` $dstSelIndex `]` `:` type($res)
Convert an f16 value in src0 to an fp8 byte, dividing by the exponent in scale and using seed
for stochastic rounding. Place the resulting byte in the
dstSelIndex-th byte of oldVdst and return the entire packed vector,
which is stored as an i32.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
dstSelIndex | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
oldVdst | 32-bit signless integer |
src0 | 16-bit float |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit signless integer |
rocdl.cvt.scalef32.sr.fp8.f32 (ROCDL::CvtScaleF32SrFp8F32Op) ¶
Scaled convert f32 to fp8 with stochastic rounding, updating packed vector
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.fp8.f32` attr-dict $src0 `,` $seed `,` $scale `->` $oldVdst `[` $dstSelIndex `]` `:` type($res)
Convert an f32 value in src0 to an fp8 byte, dividing by the exponent in scale and using seed
for stochastic rounding. Place the resulting byte in the
dstSelIndex-th byte of oldVdst and return the entire packed vector,
which is stored as an i32.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
dstSelIndex | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
oldVdst | 32-bit signless integer |
src0 | 32-bit float |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit signless integer |
rocdl.cvt.scalef32.sr.pk.fp4.bf16 (ROCDL::CvtScaleF32SrPkFp4Bf16Op) ¶
Scale and convert two bf16 to packed fp4 with stochastic rounding, updating tied vector
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.pk.fp4.bf16` attr-dict $src `,` $seed `,` $scale `->` $oldVdst `[` $dstSelIndex `]` `:` type($res)
Convert two packed bf16 values to packed
fp4, dividing by the exponent part of scale
before doing so and using seed as the random seed for
stochastic rounding.
The two scaled values are packed (little-endian)
into a byte. That byte is used to update the dstSelIndex-th
byte of oldVdst, which is returned in its entirety.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
dstSelIndex | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
oldVdst | 32-bit signless integer |
src | fixed-length vector of bfloat16 type values of length 2 |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit signless integer |
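The packing step described for this op can be sketched in Python as two 4-bit codes combined into one byte, which then replaces one byte of the 32-bit word. The sketch takes "little-endian" to mean that the first value occupies the low nibble and that byte 0 is the least-significant byte; both are assumptions. The helper names are hypothetical, and the conversion and rounding themselves are not modeled.

```python
def pack_fp4_pair(first: int, second: int) -> int:
    """Pack two 4-bit codes into one byte, first code in the low nibble (assumed)."""
    return ((second & 0xF) << 4) | (first & 0xF)


def update_packed(old_vdst: int, first: int, second: int, dst_sel_index: int) -> int:
    """Replace the selected byte of old_vdst with the packed pair and return the word."""
    if not 0 <= dst_sel_index <= 3:
        raise ValueError("dst_sel_index must be in the range 0..3")
    shift = 8 * dst_sel_index
    # Clear the target byte lane, then write the packed pair into it.
    cleared = old_vdst & ~(0xFF << shift) & 0xFFFFFFFF
    return cleared | (pack_fp4_pair(first, second) << shift)


if __name__ == "__main__":
    print(hex(update_packed(0x00000000, 0x3, 0xA, 2)))
```

Each call contributes two fp4 values, so four calls with distinct indices would fill one i32 with eight values.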
rocdl.cvt.scalef32.sr.pk.fp4.f16 (ROCDL::CvtScaleF32SrPkFp4F16Op) ¶
Scale and convert two f16 to packed fp4 with stochastic rounding, updating tied vector
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.pk.fp4.f16` attr-dict $src `,` $seed `,` $scale `->` $oldVdst `[` $dstSelIndex `]` `:` type($res)
Convert two packed f16 values to packed
fp4, dividing by the exponent part of scale
before doing so and using seed as the random seed for
stochastic rounding.
The two scaled values are packed (little-endian)
into a byte. That byte is used to update the dstSelIndex-th
byte of oldVdst, which is returned in its entirety.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
dstSelIndex | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
oldVdst | 32-bit signless integer |
src | fixed-length vector of 16-bit float values of length 2 |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit signless integer |
rocdl.cvt.scalef32.sr.pk.fp4.f32 (ROCDL::CvtScaleF32SrPkFp4F32Op) ¶
Scale and convert two f32 to packed fp4 with stochastic rounding, updating tied vector
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.pk.fp4.f32` attr-dict $src `,` $seed `,` $scale `->` $oldVdst `[` $dstSelIndex `]` `:` type($res)
Convert two packed f32 values to packed
fp4, dividing by the exponent part of scale
before doing so and using seed as the random seed for
stochastic rounding.
The two scaled values are packed (little-endian)
into a byte. That byte is used to update the dstSelIndex-th
byte of oldVdst, which is returned in its entirety.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
dstSelIndex | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
oldVdst | 32-bit signless integer |
src | fixed-length vector of 32-bit float values of length 2 |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit signless integer |
rocdl.cvt.scalef32.sr.pk16.bf6.bf16 (ROCDL::CvtScaleF32SrPk16Bf6Bf16Op) ¶
Scale and convert packed bf16 to packed bf6 with stochastic rounding
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.pk16.bf6.bf16` attr-dict $src `,` $seed `,` $scale `:` type($res)
Convert 16 packed bf16 values to packed bf6, multiplying by the exponent part of scale
before doing so and applying stochastic rounding. This op is for gfx1250+ arch.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of bfloat16 type values of length 16 |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 3 |
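Sixteen 6-bit results occupy 96 bits, which is why the result is a three-element i32 vector. The Python sketch below packs a list of small codes into 32-bit words to show how elements can straddle word boundaries. The bit order (element 0 in the lowest bits of word 0) is an assumption made for illustration and is not taken from this section; `pack_codes` is a hypothetical helper.

```python
from typing import List


def pack_codes(codes: List[int], bits_per_code: int) -> List[int]:
    """Pack fixed-width codes into 32-bit words, lowest element first (assumed order)."""
    mask = (1 << bits_per_code) - 1
    accumulator = 0
    for index, code in enumerate(codes):
        # Place each code at its bit offset in one wide integer.
        accumulator |= (code & mask) << (index * bits_per_code)
    total_bits = len(codes) * bits_per_code
    num_words = (total_bits + 31) // 32
    # Slice the wide integer into 32-bit words.
    return [(accumulator >> (32 * word)) & 0xFFFFFFFF for word in range(num_words)]


if __name__ == "__main__":
    # Element 5 starts at bit 30, so it is split across words 0 and 1.
    words = pack_codes([0] * 5 + [0x3F] + [0] * 10, 6)
    print([hex(w) for w in words])
```

With 6-bit elements the word boundaries do not line up with element boundaries, which is one reason these ops expose the packed side as an opaque integer vector.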
rocdl.cvt.scalef32.sr.pk16.bf6.f16 (ROCDL::CvtScaleF32SrPk16Bf6F16Op) ¶
Scale and convert packed f16 to packed bf6 with stochastic rounding
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.pk16.bf6.f16` attr-dict $src `,` $seed `,` $scale `:` type($res)
Convert 16 packed f16 values to packed bf6, multiplying by the exponent part of scale
before doing so and applying stochastic rounding. This op is for gfx1250+ arch.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 16-bit float values of length 16 |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 3 |
rocdl.cvt.scalef32.sr.pk16.bf6.f32 (ROCDL::CvtScaleF32SrPk16Bf6F32Op) ¶
Scale and convert packed f32 to packed bf6 with stochastic rounding
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.pk16.bf6.f32` attr-dict $src `,` $seed `,` $scale `:` type($res)
Convert 16 packed f32 values to packed bf6, multiplying by the exponent part of scale
before doing so and applying stochastic rounding. This op is for gfx1250+ arch.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit float values of length 16 |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 3 |
rocdl.cvt.scalef32.sr.pk16.fp6.bf16 (ROCDL::CvtScaleF32SrPk16Fp6Bf16Op) ¶
Scale and convert packed bf16 to packed fp6 with stochastic rounding
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.pk16.fp6.bf16` attr-dict $src `,` $seed `,` $scale `:` type($res)
Convert 16 packed bf16 values to packed fp6, multiplying by the exponent part of scale
before doing so and applying stochastic rounding. This op is for gfx1250+ arch.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of bfloat16 type values of length 16 |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 3 |
rocdl.cvt.scalef32.sr.pk16.fp6.f16 (ROCDL::CvtScaleF32SrPk16Fp6F16Op) ¶
Scale and convert packed f16 to packed fp6 with stochastic rounding
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.pk16.fp6.f16` attr-dict $src `,` $seed `,` $scale `:` type($res)
Convert 16 packed f16 values to packed fp6, multiplying by the exponent part of scale
before doing so and applying stochastic rounding. This op is for gfx1250+ arch.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 16-bit float values of length 16 |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 3 |
rocdl.cvt.scalef32.sr.pk16.fp6.f32 (ROCDL::CvtScaleF32SrPk16Fp6F32Op) ¶
Scale and convert packed f32 to packed fp6 with stochastic rounding
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.pk16.fp6.f32` attr-dict $src `,` $seed `,` $scale `:` type($res)
Convert 16 packed f32 values to packed fp6, multiplying by the exponent part of scale
before doing so and applying stochastic rounding. This op is for gfx1250+ arch.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit float values of length 16 |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 3 |
rocdl.cvt.scalef32.sr.pk32.bf6.bf16 (ROCDL::CvtScaleF32SrPk32Bf6Bf16Op) ¶
Scale and convert packed bf16 to packed bf6 with stochastic rounding
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.pk32.bf6.bf16` attr-dict $src `,` $seed `,` $scale `:` type($res)
Convert 32 packed bf16 values to packed bf6, dividing by the exponent part of scale
before doing so and applying random rounding derived from
seed.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of bfloat16 type values of length 32 |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 6 |
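For readers unfamiliar with the term, stochastic rounding picks one of the two neighboring representable values at random, with the closer one being more likely, so that the rounding error averages out to zero. The Python sketch below shows the generic technique on the integer grid. It is not a model of how the hardware derives randomness from seed or of the 6-bit target format; `stochastic_round` is a hypothetical helper, and `u` stands in for a uniform random sample.

```python
import math
import random


def stochastic_round(x: float, u: float) -> int:
    """Round x to an integer, rounding up with probability equal to its fractional part.

    u must be a uniform sample in [0, 1).
    """
    return math.floor(x + u)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    samples = [stochastic_round(2.25, rng.random()) for _ in range(10000)]
    # The mean of many stochastic roundings approaches the unrounded value.
    print(sum(samples) / len(samples))
```

Round-to-nearest would always map 2.25 to 2, whereas here about a quarter of the samples become 3, which keeps the result unbiased on average.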
rocdl.cvt.scalef32.sr.pk32.bf6.f16 (ROCDL::CvtScaleF32SrPk32Bf6F16Op) ¶
Scale and convert packed f16 to packed bf6 with stochastic rounding
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.pk32.bf6.f16` attr-dict $src `,` $seed `,` $scale `:` type($res)
Convert 32 packed f16 values to packed bf6, dividing by the exponent part of scale
before doing so and applying random rounding derived from
seed.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 16-bit float values of length 32 |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 6 |
rocdl.cvt.scalef32.sr.pk32.bf6.f32 (ROCDL::CvtScaleF32SrPk32Bf6F32Op) ¶
Scale and convert packed f32 to packed bf6 with stochastic rounding
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.pk32.bf6.f32` attr-dict $src `,` $seed `,` $scale `:` type($res)
Convert 32 packed f32 values to packed bf6, dividing by the exponent part of scale
before doing so and applying random rounding derived from
seed.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit float values of length 32 |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 6 |
rocdl.cvt.scalef32.sr.pk32.fp6.bf16 (ROCDL::CvtScaleF32SrPk32Fp6Bf16Op) ¶
Scale and convert packed bf16 to packed fp6 with stochastic rounding
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.pk32.fp6.bf16` attr-dict $src `,` $seed `,` $scale `:` type($res)
Convert 32 packed bf16 values to packed fp6, dividing by the exponent part of scale
before doing so and applying random rounding derived from
seed.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of bfloat16 type values of length 32 |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 6 |
rocdl.cvt.scalef32.sr.pk32.fp6.f16 (ROCDL::CvtScaleF32SrPk32Fp6F16Op) ¶
Scale and convert packed f16 to packed fp6 with stochastic rounding
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.pk32.fp6.f16` attr-dict $src `,` $seed `,` $scale `:` type($res)
Convert 32 packed f16 values to packed fp6, dividing by the exponent part of scale
before doing so and applying random rounding derived from
seed.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 16-bit float values of length 32 |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 6 |
rocdl.cvt.scalef32.sr.pk32.fp6.f32 (ROCDL::CvtScaleF32SrPk32Fp6F32Op) ¶
Scale and convert packed f32 to packed fp6 with stochastic rounding
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.pk32.fp6.f32` attr-dict $src `,` $seed `,` $scale `:` type($res)
Convert 32 packed f32 values to packed fp6, dividing by the exponent part of scale
before doing so and applying random rounding derived from
seed.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit float values of length 32 |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 6 |
rocdl.cvt.scalef32.sr.pk8.bf8.bf16 (ROCDL::CvtScaleF32SrPk8Bf8Bf16Op) ¶
Scale and convert packed bf16 to packed bf8 with stochastic rounding
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.pk8.bf8.bf16` attr-dict $src `,` $seed `,` $scale `:` type($res)
Convert 8 packed bf16 values to packed bf8, multiplying by the exponent part of scale
before doing so and applying stochastic rounding. This op is for gfx1250+ arch.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of bfloat16 type values of length 8 |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 2 |
rocdl.cvt.scalef32.sr.pk8.bf8.f16 (ROCDL::CvtScaleF32SrPk8Bf8F16Op) ¶
Scale and convert packed f16 to packed bf8 with stochastic rounding
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.pk8.bf8.f16` attr-dict $src `,` $seed `,` $scale `:` type($res)
Convert 8 packed f16 values to packed bf8, multiplying by the exponent part of scale
before doing so and applying stochastic rounding. Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 16-bit float values of length 8 |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 2 |
rocdl.cvt.scalef32.sr.pk8.bf8.f32 (ROCDL::CvtScaleF32SrPk8Bf8F32Op) ¶
Scale and convert packed f32 to packed bf8 with stochastic rounding
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.pk8.bf8.f32` attr-dict $src `,` $seed `,` $scale `:` type($res)
Convert 8 packed f32 values to packed bf8, multiplying by the exponent part of scale
before doing so and applying stochastic rounding. Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit float values of length 8 |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 2 |
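Stochastic rounding, as used by the `.sr.` ops in this section, rounds up with a probability equal to the discarded fraction, so the rounding error averages out to zero. The Python sketch below shows the generic technique on integers: add random low bits, then truncate. It is not a model of the exact hardware algorithm, and how the hardware derives its random bits from seed is not specified here; `stochastic_round_shift` is a hypothetical helper.

```python
def stochastic_round_shift(value: int, drop_bits: int, seed_bits: int) -> int:
    """Drop `drop_bits` low bits of a non-negative integer, rounding up with
    probability equal to the discarded fraction.

    `seed_bits` supplies the randomness; only its low `drop_bits` bits are used.
    """
    mask = (1 << drop_bits) - 1
    return (value + (seed_bits & mask)) >> drop_bits


# 0b1011 with 2 bits dropped is 2.75: it rounds to 3 for three of the four
# possible 2-bit seeds and to 2 for the remaining one.
results = [stochastic_round_shift(0b1011, 2, s) for s in range(4)]
print(results)                      # [2, 3, 3, 3]
print(sum(results) / len(results))  # 2.75
```

Averaged over all seeds, the result equals the unrounded value, which is the property that makes stochastic rounding attractive for low-precision formats.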
rocdl.cvt.scalef32.sr.pk8.fp4.bf16 (ROCDL::CvtScaleF32SrPk8Fp4Bf16Op) ¶
Scale and convert packed bf16 to packed fp4 with stochastic rounding
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.pk8.fp4.bf16` attr-dict $src `,` $seed `,` $scale `:` type($res)
Convert 8 packed bf16 values to packed fp4, multiplying by the exponent part of scale
before doing so and applying stochastic rounding. Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of bfloat16 type values of length 8 |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit signless integer |
rocdl.cvt.scalef32.sr.pk8.fp4.f16 (ROCDL::CvtScaleF32SrPk8Fp4F16Op) ¶
Scale and convert packed f16 to packed fp4 with stochastic rounding
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.pk8.fp4.f16` attr-dict $src `,` $seed `,` $scale `:` type($res)
Convert 8 packed f16 values to packed fp4, multiplying by the exponent part of scale
before doing so and applying stochastic rounding. Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 16-bit float values of length 8 |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit signless integer |
rocdl.cvt.scalef32.sr.pk8.fp4.f32 (ROCDL::CvtScaleF32SrPk8Fp4F32Op) ¶
Scale and convert packed f32 to packed fp4 with stochastic rounding
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.pk8.fp4.f32` attr-dict $src `,` $seed `,` $scale `:` type($res)
Convert 8 packed f32 values to packed fp4, multiplying by the exponent part of scale
before doing so and applying stochastic rounding. Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit float values of length 8 |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit signless integer |
rocdl.cvt.scalef32.sr.pk8.fp8.bf16 (ROCDL::CvtScaleF32SrPk8Fp8Bf16Op) ¶
Scale and convert packed bf16 to packed fp8 with stochastic rounding
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.pk8.fp8.bf16` attr-dict $src `,` $seed `,` $scale `:` type($res)
Convert 8 packed bf16 values to packed fp8, multiplying by the exponent part of scale
before doing so and applying stochastic rounding. Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of bfloat16 type values of length 8 |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 2 |
rocdl.cvt.scalef32.sr.pk8.fp8.f16 (ROCDL::CvtScaleF32SrPk8Fp8F16Op) ¶
Scale and convert packed f16 to packed fp8 with stochastic rounding
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.pk8.fp8.f16` attr-dict $src `,` $seed `,` $scale `:` type($res)
Convert 8 packed f16 values to packed fp8, multiplying by the exponent part of scale
before doing so and applying stochastic rounding. Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 16-bit float values of length 8 |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 2 |
rocdl.cvt.scalef32.sr.pk8.fp8.f32 (ROCDL::CvtScaleF32SrPk8Fp8F32Op) ¶
Scale and convert packed f32 to packed fp8 with stochastic rounding
Syntax:
operation ::= `rocdl.cvt.scalef32.sr.pk8.fp8.f32` attr-dict $src `,` $seed `,` $scale `:` type($res)
Convert 8 packed f32 values to packed fp8, multiplying by the exponent part of scale
before doing so and applying stochastic rounding. Available on gfx1250+.
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src | fixed-length vector of 32-bit float values of length 8 |
seed | 32-bit signless integer |
scale | 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 2 |
rocdl.cvt.sr.bf8.f32 (ROCDL::CvtSrBf8F32Op) ¶
Convert f32 to bf8 with stochastic rounding
Syntax:
operation ::= `rocdl.cvt.sr.bf8.f32` attr-dict $srcA `,` $srcB `->` $old `[` $byteSel `]` `:` type($res)
Convert srcA to bf8, adding the rounding factor from srcB,
and store the result into byte byteSel of old, preserving the other bytes.
Example:
// Stochastic rounding convert f32 to bf8 in byte 2 of old.
%0 = rocdl.cvt.sr.bf8.f32 %val, %stoch -> %old[2] : i32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
byteSel | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
srcA | 32-bit float |
srcB | 32-bit signless integer |
old | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.cvt.sr.fp8.f32 (ROCDL::CvtSrFp8F32Op) ¶
Convert f32 to fp8 with stochastic rounding
Syntax:
operation ::= `rocdl.cvt.sr.fp8.f32` attr-dict $srcA `,` $srcB `->` $old `[` $byteSel `]` `:` type($res)
Convert srcA to fp8, adding the rounding factor from srcB,
and store the result into byte byteSel of old, preserving the other bytes.
Example:
// Stochastic rounding convert f32 to fp8 in byte 3 of old.
%0 = rocdl.cvt.sr.fp8.f32 %val, %stoch -> %old[3] : i32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
byteSel | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
srcA | 32-bit float |
srcB | 32-bit signless integer |
old | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
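The byteSel placement used by rocdl.cvt.sr.bf8.f32 and rocdl.cvt.sr.fp8.f32 can be pictured as a byte insert into a 32-bit word. The Python sketch below models only that insert, not the float conversion itself. It assumes that byte 0 is the least significant byte, which is an assumption of this sketch; `insert_byte` is a hypothetical helper.

```python
def insert_byte(old: int, byte: int, byte_sel: int) -> int:
    """Return `old` (a 32-bit word) with byte `byte_sel` replaced by `byte`.

    Assumption of this sketch: byte 0 is the least significant byte.
    """
    if not 0 <= byte_sel <= 3:
        raise ValueError("byte_sel must be in 0..3")
    shift = 8 * byte_sel
    cleared = old & ~(0xFF << shift) & 0xFFFFFFFF
    return cleared | ((byte & 0xFF) << shift)


print(hex(insert_byte(0x11223344, 0xAB, 2)))  # 0x11ab3344
print(hex(insert_byte(0x11223344, 0xAB, 3)))  # 0xab223344
```

Because the other three bytes of old pass through unchanged, four successive conversions with byteSel values 0 through 3 can fill one i32 with four 8-bit results.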
rocdl.ds.atomic.async.barrier.arrive.b64 (ROCDL::DsAtomicAsyncBarrierArriveOp) ¶
Syntax:
operation ::= `rocdl.ds.atomic.async.barrier.arrive.b64` $barrierPtr attr-dict `:` qualified(type($barrierPtr))
Waits on a given DS barrier and decrements its pending count by 1. Stays in order with ASYNC loads to LDS, and uses ASYNCcnt to track its completion. Available on gfx1250+.
Example:
// Async atomic barrier arrive (fire-and-forget).
rocdl.ds.atomic.async.barrier.arrive.b64 %ptr : !llvm.ptr<3>
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
barrierPtr | LLVM pointer in address space 3 |
rocdl.ds.atomic.barrier.arrive.rtn.b64 (ROCDL::DsAtomicBarrierArriveRtnOp) ¶
Syntax:
operation ::= `rocdl.ds.atomic.barrier.arrive.rtn.b64` $barrierPtr `,` $val attr-dict `:` qualified(type($barrierPtr)) `,` type($val) `->` type($res)
Waits on a given DS barrier and decrements its pending count by a given value. Note that the barrier state is given as a 64-bit structure containing the pending count, phase, and init count. The op returns the old barrier state. The op is executed as an ordinary LDS operation and is ordered with other LDS operations, so check DSCNT to determine when this instruction has executed. Available on gfx1250+.
Example:
// Atomic barrier arrive with return of old barrier state.
%res = rocdl.ds.atomic.barrier.arrive.rtn.b64 %ptr, %val : !llvm.ptr<3>, i64 -> i64
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
barrierPtr | LLVM pointer in address space 3 |
val | 64-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | 64-bit signless integer |
rocdl.ds.load.tr16.b128 (ROCDL::DsLoadTr16_B128) ¶
Loads and transposes a matrix from ds memory to registers (available in gfx1250+).
Syntax:
operation ::= `rocdl.ds.load.tr16.b128` $ptr attr-dict `:` qualified(type($ptr)) `->` type($res)
Load a matrix of 16-bit data from the ds memory, transpose data between row-major and column-major order, and store the result into a 128-bit vector register.
Available in gfx1250+.
Example (concrete mnemonics depend on address space and element size):
// 64-bit transpose load from global memory.
%0 = rocdl.global.load.tr4.b64 %ptr : !llvm.ptr<1> -> vector<2xi32>
// 128-bit transpose load from global memory with f16 result.
%1 = rocdl.global.load.tr.b128 %ptr : !llvm.ptr<1> -> vector<8xf16>
// 64-bit transpose load from LDS.
%2 = rocdl.ds.load.tr4.b64 %ptr : !llvm.ptr<3> -> vector<2xi32>
// 128-bit transpose load from LDS with bf16 result.
%3 = rocdl.ds.load.tr16.b128 %ptr : !llvm.ptr<3> -> vector<8xbf16>
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
ptr | LLVM pointer in address space 3 |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.ds.load.tr4.b64 (ROCDL::DsLoadTr4_B64) ¶
Loads and transposes a matrix from ds memory to registers (available in gfx1250+).
Syntax:
operation ::= `rocdl.ds.load.tr4.b64` $ptr attr-dict `:` qualified(type($ptr)) `->` type($res)
Load a matrix of 4-bit data from the ds memory, transpose data between row-major and column-major order, and store the result into a 64-bit vector register.
Available in gfx1250+.
Example (concrete mnemonics depend on address space and element size):
// 64-bit transpose load from global memory.
%0 = rocdl.global.load.tr4.b64 %ptr : !llvm.ptr<1> -> vector<2xi32>
// 128-bit transpose load from global memory with f16 result.
%1 = rocdl.global.load.tr.b128 %ptr : !llvm.ptr<1> -> vector<8xf16>
// 64-bit transpose load from LDS.
%2 = rocdl.ds.load.tr4.b64 %ptr : !llvm.ptr<3> -> vector<2xi32>
// 128-bit transpose load from LDS with bf16 result.
%3 = rocdl.ds.load.tr16.b128 %ptr : !llvm.ptr<3> -> vector<8xbf16>
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
ptr | LLVM pointer in address space 3 |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.ds.load.tr6.b96 (ROCDL::DsLoadTr6_B96) ¶
Loads and transposes a matrix from ds memory to registers (available in gfx1250+).
Syntax:
operation ::= `rocdl.ds.load.tr6.b96` $ptr attr-dict `:` qualified(type($ptr)) `->` type($res)
Load a matrix of 6-bit data from the ds memory, transpose data between row-major and column-major order, and store the result into a 96-bit vector register.
Available in gfx1250+.
Example (concrete mnemonics depend on address space and element size):
// 64-bit transpose load from global memory.
%0 = rocdl.global.load.tr4.b64 %ptr : !llvm.ptr<1> -> vector<2xi32>
// 128-bit transpose load from global memory with f16 result.
%1 = rocdl.global.load.tr.b128 %ptr : !llvm.ptr<1> -> vector<8xf16>
// 64-bit transpose load from LDS.
%2 = rocdl.ds.load.tr4.b64 %ptr : !llvm.ptr<3> -> vector<2xi32>
// 128-bit transpose load from LDS with bf16 result.
%3 = rocdl.ds.load.tr16.b128 %ptr : !llvm.ptr<3> -> vector<8xbf16>
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
ptr | LLVM pointer in address space 3 |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
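The mnemonics of the transpose loads encode an element width (tr4, tr6, tr8, tr16) and a register width (b64, b96, b128). The Python sketch below works out how many elements and how many 32-bit words each combination holds. It is plain arithmetic on those two numbers and says nothing about the transposed layout itself; `transpose_load_shape` is a hypothetical helper.

```python
from typing import Tuple


def transpose_load_shape(element_bits: int, total_bits: int) -> Tuple[int, int]:
    """Return (elements per result, i32 words per result) for a transpose load."""
    if total_bits % element_bits or total_bits % 32:
        raise ValueError("unsupported element/register width combination")
    return total_bits // element_bits, total_bits // 32


print(transpose_load_shape(4, 64))    # (16, 2): tr4.b64
print(transpose_load_shape(6, 96))    # (16, 3): tr6.b96
print(transpose_load_shape(8, 64))    # (8, 2): tr8.b64
print(transpose_load_shape(16, 128))  # (8, 4): tr16.b128
```

This is consistent with the shared example, where the tr4.b64 ops return vector<2xi32> (two words) and the 128-bit loads of 16-bit data return eight-element f16 or bf16 vectors.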
rocdl.ds.load.tr8.b64 (ROCDL::DsLoadTr8_B64) ¶
Loads and transposes a matrix from ds memory to registers (available in gfx1250+).
Syntax:
operation ::= `rocdl.ds.load.tr8.b64` $ptr attr-dict `:` qualified(type($ptr)) `->` type($res)
Load a matrix of 8-bit data from the ds memory, transpose data between row-major and column-major order, and store the result into a 64-bit vector register.
Available in gfx1250+.
Example (concrete mnemonics depend on address space and element size):
// 64-bit transpose load from global memory.
%0 = rocdl.global.load.tr4.b64 %ptr : !llvm.ptr<1> -> vector<2xi32>
// 128-bit transpose load from global memory with f16 result.
%1 = rocdl.global.load.tr.b128 %ptr : !llvm.ptr<1> -> vector<8xf16>
// 64-bit transpose load from LDS.
%2 = rocdl.ds.load.tr4.b64 %ptr : !llvm.ptr<3> -> vector<2xi32>
// 128-bit transpose load from LDS with bf16 result.
%3 = rocdl.ds.load.tr16.b128 %ptr : !llvm.ptr<3> -> vector<8xbf16>
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
ptr | LLVM pointer in address space 3 |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.ds.read.tr16.b64 (ROCDL::ds_read_tr16_b64) ¶
Syntax:
operation ::= `rocdl.ds.read.tr16.b64` $ptr attr-dict `:` type($ptr) `->` type($res)
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
ptr | LLVM pointer in address space 3 |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.ds.read.tr4.b64 (ROCDL::ds_read_tr4_b64) ¶
Syntax:
operation ::= `rocdl.ds.read.tr4.b64` $ptr attr-dict `:` type($ptr) `->` type($res)
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
ptr | LLVM pointer in address space 3 |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.ds.read.tr6.b96 (ROCDL::ds_read_tr6_b96) ¶
Syntax:
operation ::= `rocdl.ds.read.tr6.b96` $ptr attr-dict `:` type($ptr) `->` type($res)
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
ptr | LLVM pointer in address space 3 |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.ds.read.tr8.b64 (ROCDL::ds_read_tr8_b64) ¶
Syntax:
operation ::= `rocdl.ds.read.tr8.b64` $ptr attr-dict `:` type($ptr) `->` type($res)
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
ptr | LLVM pointer in address space 3 |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.ds_bpermute (ROCDL::DsBpermuteOp) ¶
Syntax:
operation ::= `rocdl.ds_bpermute` $index `,` $src attr-dict `:` `(` type($index) `,` type($src) `)` `->` type($res)
Perform a backward permute (pull) operation across lanes using DS/LDS permute hardware.
Each lane reads the value of src from the lane whose byte address is
given by index (i.e. lane id = index / 4).
This is “backward” (pull) in contrast to ds_permute_b32, which is
“forward” (push/scatter).
Example:
// Backward permute across lanes (pull from selected lane).
%0 = rocdl.ds_bpermute %index, %src : (i32, i32) -> i32
Operands: ¶
| Operand | Description |
|---|---|
index | 32-bit signless integer |
src | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit signless integer |
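The pull semantics of rocdl.ds_bpermute can be modelled by treating a wavefront as a list with one entry per lane. The Python sketch below is a reference model only. It assumes that the selected lane id wraps modulo the wavefront size, which the description above does not state; the function operates on plain lists and is not an MLIR API.

```python
from typing import List


def ds_bpermute(index: List[int], src: List[int]) -> List[int]:
    """Each lane i returns src[index[i] // 4] (a pull, or gather, across lanes).

    Assumption of this sketch: the selected lane id wraps modulo the
    wavefront size.
    """
    wave_size = len(src)
    return [src[(idx // 4) % wave_size] for idx in index]


# Reverse a 4-lane toy wavefront: lane 0 pulls from lane 3, and so on.
src = [10, 11, 12, 13]
index = [12, 8, 4, 0]  # byte addresses, i.e. lane id * 4
print(ds_bpermute(index, src))  # [13, 12, 11, 10]
```

Several lanes may pull from the same source lane, which is the practical difference from a forward (push) permute, where two lanes pushing to the same destination would collide.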
rocdl.ds_swizzle (ROCDL::DsSwizzleOp) ¶
Syntax:
operation ::= `rocdl.ds_swizzle` $src `,` $offset attr-dict `:` `(` type($src) `,` type($offset) `)` `->` type($res)
Perform a data-sharing swizzle operation within a wavefront.
The offset operand encodes the swizzle pattern that will be placed in the
instruction’s offset field (i.e., the pattern used by ds_swizzle_b32).
See
https://llvm.org/docs/AMDGPUModifierSyntax.html#swizzle-pattern for
how this 16-bit pattern is constructed.
Example:
// Swizzle data within a wavefront.
%0 = rocdl.ds_swizzle %src, %offset : (i32, i32) -> i32
Operands: ¶
| Operand | Description |
|---|---|
src | 32-bit signless integer |
offset | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit signless integer |
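As an illustration of how a swizzle pattern might be built and applied, the Python sketch below models only the bit-mask mode of ds_swizzle_b32. The field layout (AND mask in bits 4:0, OR mask in bits 9:5, XOR mask in bits 14:10, bit 15 clear) reflects one reading of the AMD ISA documentation and should be checked against the page linked above; the other modes and the handling of inactive lanes are not modelled. Both helpers are hypothetical.

```python
from typing import List


def bitmask_pattern(and_mask: int, or_mask: int, xor_mask: int) -> int:
    """Pack three 5-bit masks into a bit-mask-mode pattern (bit 15 = 0).

    Assumed layout: and_mask in bits 4:0, or_mask in 9:5, xor_mask in 14:10.
    """
    for m in (and_mask, or_mask, xor_mask):
        if not 0 <= m < 32:
            raise ValueError("masks are 5 bits wide")
    return and_mask | (or_mask << 5) | (xor_mask << 10)


def swizzle_bitmask(src: List[int], pattern: int) -> List[int]:
    """Apply a bit-mask-mode pattern within each group of 32 lanes."""
    if pattern & 0x8000:
        raise ValueError("only bit-mask mode (bit 15 clear) is modelled")
    if len(src) % 32:
        raise ValueError("lane count must be a multiple of 32")
    and_mask = pattern & 0x1F
    or_mask = (pattern >> 5) & 0x1F
    xor_mask = (pattern >> 10) & 0x1F
    out = []
    for lane in range(len(src)):
        base = lane & ~0x1F  # first lane of this 32-lane group
        j = (((lane & 0x1F) & and_mask) | or_mask) ^ xor_mask
        out.append(src[base + j])
    return out


# Swap neighbouring lanes: keep all lane bits, then flip bit 0.
pattern = bitmask_pattern(0x1F, 0x00, 0x01)
print(hex(pattern))  # 0x41f
print(swizzle_bitmask(list(range(32)), pattern)[:8])  # [1, 0, 3, 2, 5, 4, 7, 6]
```

The resulting integer is the kind of value that would be passed as the offset operand, subject to the layout assumption stated above.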
rocdl.exp (ROCDL::ROCDLExp) ¶
Syntax:
operation ::= `rocdl.exp` $arg qualified(type($arg)) attr-dict `->` qualified(type($res))
Note: In the general case, prefer the conventional arith, math, or llvm ops over this.
Use this ROCDL-specific operation only when you fully understand its implications and
when it is strictly necessary. This op is usually chosen when a small loss in precision is
acceptable in exchange for higher execution speed.
Example:
%0 = rocdl.exp %a f32 -> f32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
arg | floating point LLVM type |
Results: ¶
| Result | Description |
|---|---|
res | floating point LLVM type |
rocdl.exp2 (ROCDL::ROCDLExp2) ¶
Syntax:
operation ::= `rocdl.exp2` $arg qualified(type($arg)) attr-dict `->` qualified(type($res))
Note: In the general case, prefer the conventional arith, math, or llvm ops over this.
Use this ROCDL-specific operation only when you fully understand its implications and
when it is strictly necessary. This op is usually chosen when a small loss in precision is
acceptable in exchange for higher execution speed.
Example:
%0 = rocdl.exp2 %a f32 -> f32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
arg | floating point LLVM type |
Results: ¶
| Result | Description |
|---|---|
res | floating point LLVM type |
rocdl.flat.prefetch (ROCDL::FlatPrefetchOp) ¶
Syntax:
operation ::= `rocdl.flat.prefetch` $ptr `,` `scope` $scope attr-dict `:` qualified(type($ptr))
Prefetches 1 byte of data per lane using flat-memory addresses into the WGP-cache or L2-cache. Available on gfx1250+.
Example:
// Prefetch from flat memory into cache.
rocdl.flat.prefetch %ptr, scope 0 : !llvm.ptr
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
scope | ::mlir::IntegerAttr | 32-bit signless integer attribute |
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
ptr | LLVM pointer in address space 0 |
rocdl.fmed3 (ROCDL::FMed3Op) ¶
Median of three float/half values
Syntax:
operation ::= `rocdl.fmed3` $src0 `,` $src1 `,` $src2 attr-dict `:` type($res)
Computes the median of three floating-point values using the AMDGPU fmed3 intrinsic.
This operation is equivalent to max(min(a, b), min(max(a, b), c)) but uses the
hardware-accelerated V_MED3_F16/V_MED3_F32 instruction for better performance.
The operation supports both scalar and vector floating-point types (f16, f32).
Example:
// Scalar f32 median
%result = rocdl.fmed3 %a, %b, %c : f32
// Vector f16 median
%result = rocdl.fmed3 %va, %vb, %vc : vector<4xf16>
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
src0 | floating point LLVM type or LLVM dialect-compatible vector of floating point LLVM type |
src1 | floating point LLVM type or LLVM dialect-compatible vector of floating point LLVM type |
src2 | floating point LLVM type or LLVM dialect-compatible vector of floating point LLVM type |
Results: ¶
| Result | Description |
|---|---|
res | floating point LLVM type or LLVM dialect-compatible vector of floating point LLVM type |
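The min/max identity quoted in the rocdl.fmed3 description can be checked with a few lines of Python. The sketch below is a scalar reference model for ordinary (non-NaN) values only; NaN handling of the hardware instruction is not modelled, and the Python function is unrelated to the MLIR op beyond sharing its name.

```python
import itertools


def fmed3(a: float, b: float, c: float) -> float:
    """Median of three values via max(min(a, b), min(max(a, b), c))."""
    return max(min(a, b), min(max(a, b), c))


# Check the identity against a sort-based median for every argument order.
for a, b, c in itertools.permutations([1.0, 2.5, -3.0]):
    assert fmed3(a, b, c) == sorted([a, b, c])[1]

print(fmed3(1.0, 2.5, -3.0))  # 1.0
```

The identity needs only four min/max operations and no branches, which is why it maps naturally onto a single hardware instruction.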
rocdl.global.load.async.lds (ROCDL::GlobalLoadAsyncLDSOp) ¶
Version of rocdl.load.async.to.lds specialized to global pointers
Syntax:
operation ::= `rocdl.global.load.async.lds` $globalPtr `,` $ldsPtr `,` $size `,` $offset `,` $aux
attr-dict `:` qualified(type($globalPtr)) `,` qualified(type($ldsPtr))
This operation works identically to rocdl.load.async.to.lds except that the
global pointer argument is limited to pointers in address space 1 (pure global
pointers) instead of also allowing fat buffer pointers.
Available on gfx9 and gfx10.
For the operation introduced in gfx1250, see rocdl.global.load.async.to.lds.bN.
Example:
// Async load from global pointer to LDS (address space 1 only).
rocdl.global.load.async.lds %global, %shared, 4, 0, 0 : !llvm.ptr<1>, !llvm.ptr<3>
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
size | ::mlir::IntegerAttr | 32-bit signless integer attribute |
offset | ::mlir::IntegerAttr | 32-bit signless integer attribute |
aux | ::mlir::IntegerAttr | 32-bit signless integer attribute |
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
globalPtr | LLVM pointer in address space 1 |
ldsPtr | LLVM pointer in address space 3 |
rocdl.global.load.async.to.lds.b128 (ROCDL::GlobalLoadAsyncToLDSB128Op) ¶
Syntax:
operation ::= `rocdl.global.load.async.to.lds.b128` $globalPtr `,` $ldsPtr `,` $offset `,` $aux
attr-dict `:` qualified(type($globalPtr)) `,` qualified(type($ldsPtr))
Asynchronously loads 128 bits of data from a global memory pointer to a Local Data Share (LDS) pointer.
Available on gfx1250+.
Example:
// Async 128-bit load from global to LDS.
rocdl.global.load.async.to.lds.b128 %src, %dst, 0, 0 : !llvm.ptr<1>, !llvm.ptr<3>
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
offset | ::mlir::IntegerAttr | 32-bit signless integer attribute |
aux | ::mlir::IntegerAttr | 32-bit signless integer attribute |
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
globalPtr | LLVM pointer in address space 1 |
ldsPtr | LLVM pointer in address space 3 |
rocdl.global.load.async.to.lds.b32 (ROCDL::GlobalLoadAsyncToLDSB32Op) ¶
Syntax:
operation ::= `rocdl.global.load.async.to.lds.b32` $globalPtr `,` $ldsPtr `,` $offset `,` $aux
attr-dict `:` qualified(type($globalPtr)) `,` qualified(type($ldsPtr))
Asynchronously loads 32 bits of data from a global memory pointer to a Local Data Share (LDS) pointer.
Available on gfx1250+.
Example:
// Async 32-bit load from global to LDS.
rocdl.global.load.async.to.lds.b32 %src, %dst, 0, 0 : !llvm.ptr<1>, !llvm.ptr<3>
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
offset | ::mlir::IntegerAttr | 32-bit signless integer attribute |
aux | ::mlir::IntegerAttr | 32-bit signless integer attribute |
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
globalPtr | LLVM pointer in address space 1 |
ldsPtr | LLVM pointer in address space 3 |
rocdl.global.load.async.to.lds.b64 (ROCDL::GlobalLoadAsyncToLDSB64Op) ¶
Syntax:
operation ::= `rocdl.global.load.async.to.lds.b64` $globalPtr `,` $ldsPtr `,` $offset `,` $aux
attr-dict `:` qualified(type($globalPtr)) `,` qualified(type($ldsPtr))
Asynchronously loads 64 bits of data from a global memory pointer to a Local Data Share (LDS) pointer.
Available on gfx1250+.
Example:
// Async 64-bit load from global to LDS.
rocdl.global.load.async.to.lds.b64 %src, %dst, 0, 0 : !llvm.ptr<1>, !llvm.ptr<3>
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
offset | ::mlir::IntegerAttr | 32-bit signless integer attribute |
aux | ::mlir::IntegerAttr | 32-bit signless integer attribute |
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
globalPtr | LLVM pointer in address space 1 |
ldsPtr | LLVM pointer in address space 3 |
rocdl.global.load.async.to.lds.b8 (ROCDL::GlobalLoadAsyncToLDSB8Op) ¶
Syntax:
operation ::= `rocdl.global.load.async.to.lds.b8` $globalPtr `,` $ldsPtr `,` $offset `,` $aux
attr-dict `:` qualified(type($globalPtr)) `,` qualified(type($ldsPtr))
Asynchronously loads 8 bits of data from a global memory pointer to a Local Data Share (LDS) pointer.
Available on gfx1250+.
Example:
// Async 8-bit load from global to LDS.
rocdl.global.load.async.to.lds.b8 %src, %dst, 0, 0 : !llvm.ptr<1>, !llvm.ptr<3>
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
offset | ::mlir::IntegerAttr | 32-bit signless integer attribute |
aux | ::mlir::IntegerAttr | 32-bit signless integer attribute |
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
globalPtr | LLVM pointer in address space 1 |
ldsPtr | LLVM pointer in address space 3 |
rocdl.global.load.lds (ROCDL::GlobalLoadLDSOp) ¶
Syntax:
operation ::= `rocdl.global.load.lds` $globalPtr `,` $ldsPtr `,` $size `,` $offset `,` $aux
attr-dict
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
size | ::mlir::IntegerAttr | 32-bit signless integer attribute |
offset | ::mlir::IntegerAttr | 32-bit signless integer attribute |
aux | ::mlir::IntegerAttr | 32-bit signless integer attribute |
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
globalPtr | LLVM pointer in address space 1 |
ldsPtr | LLVM pointer in address space 3 |
rocdl.global.load.tr.b128 (ROCDL::GlobalLoadTr8_B128) ¶
Loads and transposes a matrix from global memory to registers (available in gfx1250+).
Syntax:
operation ::= `rocdl.global.load.tr.b128` $ptr attr-dict `:` qualified(type($ptr)) `->` type($res)
Load a matrix of 16-bit data from the global memory, transpose data between row-major and column-major order, and store the result into a 128-bit vector register.
Available in gfx1250+.
Example (concrete mnemonics depend on address space and element size):
// 64-bit transpose load from global memory.
%0 = rocdl.global.load.tr4.b64 %ptr : !llvm.ptr<1> -> vector<2xi32>
// 128-bit transpose load from global memory with f16 result.
%1 = rocdl.global.load.tr.b128 %ptr : !llvm.ptr<1> -> vector<8xf16>
// 64-bit transpose load from LDS.
%2 = rocdl.ds.load.tr4.b64 %ptr : !llvm.ptr<3> -> vector<2xi32>
// 128-bit transpose load from LDS with bf16 result.
%3 = rocdl.ds.load.tr16.b128 %ptr : !llvm.ptr<3> -> vector<8xbf16>
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
ptr | LLVM pointer in address space 1 |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.global.load.tr.b64 (ROCDL::GlobalLoadTr8_B64) ¶
Loads and transposes a matrix from global memory to registers (available in gfx1250+).
Syntax:
operation ::= `rocdl.global.load.tr.b64` $ptr attr-dict `:` qualified(type($ptr)) `->` type($res)
Load a matrix of 8-bit data from the global memory, transpose data between row-major and column-major order, and store the result into a 64-bit vector register.
Available in gfx1250+.
Example (concrete mnemonics depend on address space and element size):
// 64-bit transpose load from global memory.
%0 = rocdl.global.load.tr4.b64 %ptr : !llvm.ptr<1> -> vector<2xi32>
// 128-bit transpose load from global memory with f16 result.
%1 = rocdl.global.load.tr.b128 %ptr : !llvm.ptr<1> -> vector<8xf16>
// 64-bit transpose load from LDS.
%2 = rocdl.ds.load.tr4.b64 %ptr : !llvm.ptr<3> -> vector<2xi32>
// 128-bit transpose load from LDS with bf16 result.
%3 = rocdl.ds.load.tr16.b128 %ptr : !llvm.ptr<3> -> vector<8xbf16>
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
ptr | LLVM pointer in address space 1 |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.global.load.tr4.b64 (ROCDL::GlobalLoadTr4_B64) ¶
Loads and transposes a matrix from global memory to registers (available in gfx1250+).
Syntax:
operation ::= `rocdl.global.load.tr4.b64` $ptr attr-dict `:` qualified(type($ptr)) `->` type($res)
Load a matrix of 4-bit data from the global memory, transpose data between row-major and column-major order, and store the result into a 64-bit vector register.
Available in gfx1250+.
Example (concrete mnemonics depend on address space and element size):
// 64-bit transpose load from global memory.
%0 = rocdl.global.load.tr4.b64 %ptr : !llvm.ptr<1> -> vector<2xi32>
// 128-bit transpose load from global memory with f16 result.
%1 = rocdl.global.load.tr.b128 %ptr : !llvm.ptr<1> -> vector<8xf16>
// 64-bit transpose load from LDS.
%2 = rocdl.ds.load.tr4.b64 %ptr : !llvm.ptr<3> -> vector<2xi32>
// 128-bit transpose load from LDS with bf16 result.
%3 = rocdl.ds.load.tr16.b128 %ptr : !llvm.ptr<3> -> vector<8xbf16>
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
ptr | LLVM pointer in address space 1 |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.global.load.tr6.b96 (ROCDL::GlobalLoadTr6_B96) ¶
Loads and transposes a matrix from global memory to registers (available in gfx1250+).
Syntax:
operation ::= `rocdl.global.load.tr6.b96` $ptr attr-dict `:` qualified(type($ptr)) `->` type($res)
Load a matrix of 6-bit data from the global memory, transpose data between row-major and column-major order, and store the result into a 96-bit vector register.
Available in gfx1250+.
Example (concrete mnemonics depend on address space and element size):
// 64-bit transpose load from global memory.
%0 = rocdl.global.load.tr4.b64 %ptr : !llvm.ptr<1> -> vector<2xi32>
// 128-bit transpose load from global memory with f16 result.
%1 = rocdl.global.load.tr.b128 %ptr : !llvm.ptr<1> -> vector<8xf16>
// 64-bit transpose load from LDS.
%2 = rocdl.ds.load.tr4.b64 %ptr : !llvm.ptr<3> -> vector<2xi32>
// 128-bit transpose load from LDS with bf16 result.
%3 = rocdl.ds.load.tr16.b128 %ptr : !llvm.ptr<3> -> vector<8xbf16>
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
ptr | LLVM pointer in address space 1 |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.global.prefetch (ROCDL::GlobalPrefetchOp) ¶
Syntax:
operation ::= `rocdl.global.prefetch` $ptr `,` `scope` $scope attr-dict `:` qualified(type($ptr))
Prefetches 1 byte of data per lane from global memory into the WGP-cache or L2-cache. Available on gfx1250+.
Example:
// Prefetch from global memory into cache.
rocdl.global.prefetch %ptr, scope 0 : !llvm.ptr<1>
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
scope | ::mlir::IntegerAttr | 32-bit signless integer attribute |
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
ptr | LLVM pointer in address space 1 |
rocdl.grid.dim.x (ROCDL::GridDimXOp) ¶
Syntax:
operation ::= `rocdl.grid.dim.x` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.grid.dim.y (ROCDL::GridDimYOp) ¶
Syntax:
operation ::= `rocdl.grid.dim.y` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.grid.dim.z (ROCDL::GridDimZOp) ¶
Syntax:
operation ::= `rocdl.grid.dim.z` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.iglp.opt (ROCDL::IglpOpt) ¶
Syntax:
operation ::= `rocdl.iglp.opt` $variant attr-dict
Instruction-group-level parallelism optimization hint.
Example:
// IGLP optimization hint variant 0.
rocdl.iglp.opt 0
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
variant | ::mlir::IntegerAttr | 32-bit signless integer attribute |
rocdl.load.async.to.lds (ROCDL::LoadAsyncToLDSOp) ¶
Gathering load to LDS that requires explicit async memory tracking
Syntax:
operation ::= `rocdl.load.async.to.lds` $globalPtr `,` $ldsPtr `,` $size `,` $offset `,` $aux
attr-dict `:` qualified(type($globalPtr)) `,` qualified(type($ldsPtr))
Load size bytes (the valid sizes vary by architecture) from the global memory
pointed to by globalPtr and put them at ldsPtr, concatenating (and applying
padding for sizes less than 4 bytes, along with padding out 12-byte reads
to 16-byte writes). The value of globalPtr can vary between lanes, while
ldsPtr must be subgroup-uniform (the values from each lane are concatenated
before being written to LDS with appropriate padding applied).
offset is a constant offset applied to both pointers, and aux sets the cache
policy. Unlike rocdl.load.to.lds, the compiler will not automatically insert waits
for this load to complete at the point it thinks you’re using a region of LDS you’ve
stored values to; you need to use the rocdl.asyncmark and rocdl.wait.asyncmark
operations to explicitly group these operations and wait for their completion.
Available on gfx10 and earlier with varying supported values of size.
Example:
// Async load 4 bytes from global pointer to LDS.
rocdl.load.async.to.lds %global, %shared, 4, 0, 0 : !llvm.ptr<1>, !llvm.ptr<3>
// Async load 4 bytes from fat buffer pointer to LDS.
rocdl.load.async.to.lds %fatBuffer, %shared, 4, 0, 0 : !llvm.ptr<7>, !llvm.ptr<3>
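The padding and concatenation rules in the description can be read as a per-lane stride in LDS. The sketch below is one possible reading, not a statement of hardware behavior: it assumes lanes are concatenated in increasing lane order starting at the LDS pointer, it does not check which sizes a given architecture accepts, and it does not model the contents of the padding bytes. The function names are hypothetical.

```python
from typing import List


def padded_write_size(size: int) -> int:
    """Bytes each lane occupies in LDS under the padding rules in the description."""
    if size <= 0:
        raise ValueError("size must be positive")
    if size < 4:
        return 4   # reads smaller than 4 bytes are padded to 4 bytes
    if size == 12:
        return 16  # 12-byte reads are padded out to 16-byte writes
    return size


def lane_lds_offsets(size: int, lanes: int) -> List[int]:
    """Byte offset from the LDS pointer for each lane (assumed increasing lane order)."""
    stride = padded_write_size(size)
    return [lane * stride for lane in range(lanes)]


# Four lanes each loading 12 bytes would land 16 bytes apart under this reading.
print(lane_lds_offsets(12, 4))
```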
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
size | ::mlir::IntegerAttr | 32-bit signless integer attribute |
offset | ::mlir::IntegerAttr | 32-bit signless integer attribute |
aux | ::mlir::IntegerAttr | 32-bit signless integer attribute |
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
globalPtr | LLVM pointer type |
ldsPtr | LLVM pointer in address space 3 |
rocdl.load.to.lds (ROCDL::LoadToLDSOp) ¶
Syntax:
operation ::= `rocdl.load.to.lds` $globalPtr `,` $ldsPtr `,` $size `,` $offset `,` $aux
attr-dict `:` type($globalPtr)
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
size | ::mlir::IntegerAttr | 32-bit signless integer attribute |
offset | ::mlir::IntegerAttr | 32-bit signless integer attribute |
aux | ::mlir::IntegerAttr | 32-bit signless integer attribute |
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
globalPtr | LLVM pointer type |
ldsPtr | LLVM pointer in address space 3 |
rocdl.log (ROCDL::ROCDLLog) ¶
Syntax:
operation ::= `rocdl.log` $arg qualified(type($arg)) attr-dict `->` qualified(type($res))
Note: In the general case, prefer the conventional arith, math, or llvm ops over this.
Use this ROCDL-specific operation only when you fully understand its implications and
when it is strictly necessary. This op is usually chosen when a small loss in precision is
acceptable in exchange for higher execution speed.
Example:
%0 = rocdl.log %a f32 -> f32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
arg | floating point LLVM type |
Results: ¶
| Result | Description |
|---|---|
res | floating point LLVM type |
rocdl.make.buffer.rsrc (ROCDL::MakeBufferRsrcOp) ¶
Syntax:
operation ::= `rocdl.make.buffer.rsrc` operands attr-dict `:` type($base) `to` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
base | LLVM pointer type |
stride | 16-bit signless integer |
numRecords | 64-bit signless integer |
flags | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | LLVM pointer type |
rocdl.mbcnt.hi (ROCDL::MbcntHiOp) ¶
Syntax:
operation ::= `rocdl.mbcnt.hi` $in0 `,` $in1 attr-dict `:` `(` type($in0) `,` type($in1) `)` `->` type($res)
Masked bit count of threads below the current lane in a wavefront.
in0 is a 32-bit mask that is AND-ed with the relevant half of the
execution mask and the bits below the current lane; in1 is added
to the resulting popcount:
- lo: in1 + popcount(in0 & exec_lo & ((1 << min(lane_id, 32)) - 1))
- hi: in1 + popcount(in0 & exec_hi & ((1 << saturating_usub(lane_id, 32)) - 1))
To obtain a unique thread index within a wave64, chain the two ops
with in0 = -1 (all bits set):
Example:
%all_ones = arith.constant -1 : i32
%zero = arith.constant 0 : i32
// Count active threads below this lane in the low 32 lanes.
%lo = rocdl.mbcnt.lo %all_ones, %zero : (i32, i32) -> i32
// Add the count from the high 32 lanes to get the full lane index.
%hi = rocdl.mbcnt.hi %all_ones, %lo : (i32, i32) -> i32
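The two formulas and the chaining pattern can be checked with a small host-side model. This is a sketch in Python rather than device code; the function names, the explicit `lane_id` and `exec_mask` parameters, and the 64-bit `exec_mask` encoding (bit i set when lane i is active) are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF


def popcount32(x: int) -> int:
    """Number of set bits in the low 32 bits of x."""
    return bin(x & MASK32).count("1")


def below_mask(n: int) -> int:
    """Mask with the low n bits set, for 0 <= n <= 32."""
    return (1 << n) - 1


def mbcnt_lo(in0: int, in1: int, lane_id: int, exec_mask: int) -> int:
    exec_lo = exec_mask & MASK32
    return (in1 + popcount32(in0 & exec_lo & below_mask(min(lane_id, 32)))) & MASK32


def mbcnt_hi(in0: int, in1: int, lane_id: int, exec_mask: int) -> int:
    exec_hi = (exec_mask >> 32) & MASK32
    # max(lane_id - 32, 0) plays the role of saturating_usub(lane_id, 32).
    return (in1 + popcount32(in0 & exec_hi & below_mask(max(lane_id - 32, 0)))) & MASK32


def lane_index(lane_id: int, exec_mask: int) -> int:
    """Chain lo then hi with in0 = -1, mirroring the IR example above."""
    all_ones = MASK32  # the i32 value -1
    lo = mbcnt_lo(all_ones, 0, lane_id, exec_mask)
    return mbcnt_hi(all_ones, lo, lane_id, exec_mask)


FULL_EXEC = (1 << 64) - 1  # every lane of a wave64 active
indices = [lane_index(lane, FULL_EXEC) for lane in range(64)]
assert indices == list(range(64))
```

With every lane active, the chained result equals the lane number; with a partial execution mask it instead counts only the active lanes below the current one.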
Traits: AlwaysSpeculatableImplTrait
Interfaces: ArgAndResultAttrsOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
arg_attrs | ::mlir::ArrayAttr | Array of dictionary attributes |
res_attrs | ::mlir::ArrayAttr | Array of dictionary attributes |
Operands: ¶
| Operand | Description |
|---|---|
in0 | 32-bit signless integer |
in1 | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit signless integer |
rocdl.mbcnt.lo (ROCDL::MbcntLoOp) ¶
Syntax:
operation ::= `rocdl.mbcnt.lo` $in0 `,` $in1 attr-dict `:` `(` type($in0) `,` type($in1) `)` `->` type($res)
Masked bit count of threads below the current lane in a wavefront.
in0 is a 32-bit mask that is AND-ed with the relevant half of the
execution mask and the bits below the current lane; in1 is added
to the resulting popcount:
- lo: in1 + popcount(in0 & exec_lo & ((1 << min(lane_id, 32)) - 1))
- hi: in1 + popcount(in0 & exec_hi & ((1 << saturating_usub(lane_id, 32)) - 1))
To obtain a unique thread index within a wave64, chain the two ops
with in0 = -1 (all bits set):
Example:
%all_ones = arith.constant -1 : i32
%zero = arith.constant 0 : i32
// Count active threads below this lane in the low 32 lanes.
%lo = rocdl.mbcnt.lo %all_ones, %zero : (i32, i32) -> i32
// Add the count from the high 32 lanes to get the full lane index.
%hi = rocdl.mbcnt.hi %all_ones, %lo : (i32, i32) -> i32
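Because in0 is an arbitrary mask, the lo formula can also rank lanes within a subset rather than within the whole wave. The sketch below models only the lo half on the host; the `ballot` value, the fully active `exec_lo`, and the use of in1 as a base offset are illustrative assumptions, not something this reference prescribes.

```python
MASK32 = 0xFFFFFFFF


def mbcnt_lo(in0: int, in1: int, lane_id: int, exec_lo: int) -> int:
    """in1 + popcount(in0 & exec_lo & bits below the current lane), wrapped to 32 bits."""
    below = (1 << min(lane_id, 32)) - 1
    return (in1 + bin(in0 & exec_lo & below & MASK32).count("1")) & MASK32


ballot = 0b10110   # hypothetical predicate mask: lanes 1, 2, and 4 are selected
exec_lo = MASK32   # assume all of the low 32 lanes are active

# Rank of each selected lane among the selected lanes below it.
ranks = {
    lane: mbcnt_lo(ballot, 0, lane, exec_lo)
    for lane in range(32)
    if (ballot >> lane) & 1
}
print(ranks)  # {1: 0, 2: 1, 4: 2}

# in1 is simply added, so it can carry a base offset into the result.
assert mbcnt_lo(ballot, 10, 4, exec_lo) == 12
```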
Traits: AlwaysSpeculatableImplTrait
Interfaces: ArgAndResultAttrsOpInterface, ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
arg_attrs | ::mlir::ArrayAttr | Array of dictionary attributes |
res_attrs | ::mlir::ArrayAttr | Array of dictionary attributes |
Operands: ¶
| Operand | Description |
|---|---|
in0 | 32-bit signless integer |
in1 | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit signless integer |
rocdl.mfma.f32.16x16x16bf16.1k (ROCDL::mfma_f32_16x16x16bf16_1k) ¶
Syntax:
operation ::= `rocdl.mfma.f32.16x16x16bf16.1k` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
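As a plain reference for the arithmetic D = A * B + C, the following host-side sketch computes the result for one 16x16x16 tile. It assumes the mnemonic's shape is read as M x N x K, and it deliberately does not model how matrix elements are distributed across lanes, the bf16 input encoding, or the cbsz, abid, and blgp modes. The function name `mfma_reference` is hypothetical.

```python
def mfma_reference(a, b, c):
    """Return D = A * B + C for A (MxK), B (KxN), and C (MxN) given as lists of rows."""
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("A must have K columns")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("C must be MxN")
    return [
        [c[i][j] + sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


M = N = K = 16
# With A set to the identity, D is just B + C, which makes the result easy to check.
a = [[1.0 if i == j else 0.0 for j in range(K)] for i in range(M)]
b = [[float(i * N + j) for j in range(N)] for i in range(K)]
c = [[1.0] * N for _ in range(M)]

d = mfma_reference(a, b, c)
assert d[3][5] == b[3][5] + 1.0
```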
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | fixed-length vector of 16-bit signless integer values of length 4 |
b | fixed-length vector of 16-bit signless integer values of length 4 |
c | fixed-length vector of 32-bit float values of length 4 |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 4 |
rocdl.mfma.f32.16x16x16f16 (ROCDL::mfma_f32_16x16x16f16) ¶
Syntax:
operation ::= `rocdl.mfma.f32.16x16x16f16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | fixed-length vector of 16-bit float values of length 4 |
b | fixed-length vector of 16-bit float values of length 4 |
c | fixed-length vector of 32-bit float values of length 4 |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 4 |
rocdl.mfma.f32.16x16x1f32 (ROCDL::mfma_f32_16x16x1f32) ¶
Syntax:
operation ::= `rocdl.mfma.f32.16x16x1f32` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | 32-bit float |
b | 32-bit float |
c | fixed-length vector of 32-bit float values of length 16 |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 16 |
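The operand and result vector lengths in these tables can be related to the tile shape by dividing each matrix evenly across the wavefront. The sketch below shows that arithmetic under stated assumptions that are not part of this reference: a 64-lane wavefront, an M x N x K reading of the mnemonic, and a block count per instruction (4 for 16x16x1f32, 1 for 16x16x4f32) supplied by hand.

```python
def per_lane_elements(m: int, n: int, k: int, blocks: int, wave_size: int = 64) -> dict:
    """Elements each lane holds for A, B, and C/D if tiles are split evenly across the wave."""
    totals = {"a": m * k * blocks, "b": k * n * blocks, "c": m * n * blocks}
    for name, total in totals.items():
        if total % wave_size:
            raise ValueError(f"{name} does not divide evenly across {wave_size} lanes")
    return {name: total // wave_size for name, total in totals.items()}


# 16x16x1f32 with an assumed 4 blocks: scalar f32 a and b, 16-element f32 accumulator.
assert per_lane_elements(16, 16, 1, blocks=4) == {"a": 1, "b": 1, "c": 16}

# 16x16x4f32 with an assumed single block: scalar f32 a and b, 4-element f32 accumulator.
assert per_lane_elements(16, 16, 4, blocks=1) == {"a": 1, "b": 1, "c": 4}
```

Both results agree with the operand tables for the corresponding ops, which list a and b as a single 32-bit float and c as a vector of length 16 or 4.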
rocdl.mfma.f32.16x16x2bf16 (ROCDL::mfma_f32_16x16x2bf16) ¶
Syntax:
operation ::= `rocdl.mfma.f32.16x16x2bf16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | fixed-length vector of 16-bit signless integer values of length 2 |
b | fixed-length vector of 16-bit signless integer values of length 2 |
c | fixed-length vector of 32-bit float values of length 16 |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 16 |
rocdl.mfma.f32.16x16x32.bf16 (ROCDL::mfma_f32_16x16x32_bf16) ¶
Syntax:
operation ::= `rocdl.mfma.f32.16x16x32.bf16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | fixed-length vector of bfloat16 type values of length 8 |
b | fixed-length vector of bfloat16 type values of length 8 |
c | fixed-length vector of 32-bit float values of length 4 |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 4 |
rocdl.mfma.f32.16x16x32.bf8.bf8 (ROCDL::mfma_f32_16x16x32_bf8_bf8) ¶
Syntax:
operation ::= `rocdl.mfma.f32.16x16x32.bf8.bf8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | 64-bit signless integer |
b | 64-bit signless integer |
c | fixed-length vector of 32-bit float values of length 4 |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 4 |
rocdl.mfma.f32.16x16x32.bf8.fp8 (ROCDL::mfma_f32_16x16x32_bf8_fp8) ¶
Syntax:
operation ::= `rocdl.mfma.f32.16x16x32.bf8.fp8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | 64-bit signless integer |
b | 64-bit signless integer |
c | fixed-length vector of 32-bit float values of length 4 |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 4 |
rocdl.mfma.f32.16x16x32.f16 (ROCDL::mfma_f32_16x16x32_f16) ¶
Syntax:
operation ::= `rocdl.mfma.f32.16x16x32.f16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | fixed-length vector of 16-bit float values of length 8 |
b | fixed-length vector of 16-bit float values of length 8 |
c | fixed-length vector of 32-bit float values of length 4 |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 4 |
rocdl.mfma.f32.16x16x32.fp8.bf8 (ROCDL::mfma_f32_16x16x32_fp8_bf8) ¶
Syntax:
operation ::= `rocdl.mfma.f32.16x16x32.fp8.bf8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | 64-bit signless integer |
b | 64-bit signless integer |
c | fixed-length vector of 32-bit float values of length 4 |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 4 |
rocdl.mfma.f32.16x16x32.fp8.fp8 (ROCDL::mfma_f32_16x16x32_fp8_fp8) ¶
Syntax:
operation ::= `rocdl.mfma.f32.16x16x32.fp8.fp8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | 64-bit signless integer |
b | 64-bit signless integer |
c | fixed-length vector of 32-bit float values of length 4 |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 4 |
rocdl.mfma.f32.16x16x4bf16.1k (ROCDL::mfma_f32_16x16x4bf16_1k) ¶
Syntax:
operation ::= `rocdl.mfma.f32.16x16x4bf16.1k` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | fixed-length vector of 16-bit signless integer values of length 4 |
b | fixed-length vector of 16-bit signless integer values of length 4 |
c | fixed-length vector of 32-bit float values of length 16 |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 16 |
rocdl.mfma.f32.16x16x4f16 (ROCDL::mfma_f32_16x16x4f16) ¶
Syntax:
operation ::= `rocdl.mfma.f32.16x16x4f16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | fixed-length vector of 16-bit float values of length 4 |
b | fixed-length vector of 16-bit float values of length 4 |
c | fixed-length vector of 32-bit float values of length 16 |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 16 |
rocdl.mfma.f32.16x16x4f32 (ROCDL::mfma_f32_16x16x4f32) ¶
Syntax:
operation ::= `rocdl.mfma.f32.16x16x4f32` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
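For this op, the operand tables list scalar `f32` inputs like the first shared example, but a 4-element rather than 32-element accumulator. A sketch under that reading, with a hypothetical wrapper function `@mfma_16x16x4f32_sketch` and hypothetical argument names:

```mlir
// Hypothetical wrapper; types are taken from this op's operand and result tables.
func.func @mfma_16x16x4f32_sketch(%a: f32, %b: f32,
                                  %c: vector<4xf32>) -> vector<4xf32> {
  // cbsz, abid, and blgp are all 0 here.
  %res = rocdl.mfma.f32.16x16x4f32 %a, %b, %c, 0, 0, 0 :
    (f32, f32, vector<4xf32>) -> vector<4xf32>
  return %res : vector<4xf32>
}
```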
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | 32-bit float |
| b | 32-bit float |
| c | fixed-length vector of 32-bit float values of length 4 |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 4 |
rocdl.mfma.f32.16x16x8.xf32 (ROCDL::mfma_f32_16x16x8_xf32)
Syntax:
operation ::= `rocdl.mfma.f32.16x16x8.xf32` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
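None of the shared examples covers the xf32 variants. According to this op's operand tables, the inputs are `vector<2xf32>` values and the accumulator is `vector<4xf32>`. The following sketch assumes exactly those types; the function name `@mfma_16x16x8_xf32_sketch` and the argument names are hypothetical.

```mlir
// Hypothetical wrapper; only the rocdl.mfma line reflects the op itself.
func.func @mfma_16x16x8_xf32_sketch(%a: vector<2xf32>, %b: vector<2xf32>,
                                    %c: vector<4xf32>) -> vector<4xf32> {
  // cbsz, abid, and blgp are all 0 here.
  %res = rocdl.mfma.f32.16x16x8.xf32 %a, %b, %c, 0, 0, 0 :
    (vector<2xf32>, vector<2xf32>, vector<4xf32>) -> vector<4xf32>
  return %res : vector<4xf32>
}
```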
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 32-bit float values of length 2 |
| b | fixed-length vector of 32-bit float values of length 2 |
| c | fixed-length vector of 32-bit float values of length 4 |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 4 |
rocdl.mfma.f32.16x16x8bf16 (ROCDL::mfma_f32_16x16x8bf16)
Syntax:
operation ::= `rocdl.mfma.f32.16x16x8bf16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 16-bit signless integer values of length 2 |
| b | fixed-length vector of 16-bit signless integer values of length 2 |
| c | fixed-length vector of 32-bit float values of length 4 |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 4 |
rocdl.mfma.f32.32x32x16.bf16 (ROCDL::mfma_f32_32x32x16_bf16)
Syntax:
operation ::= `rocdl.mfma.f32.32x32x16.bf16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
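The bf16 line in the shared example passes its inputs as `vector<2xi16>`. This op's operand table instead lists vectors of the `bfloat16` type, of length 8. A sketch assuming those listed types follows; the function name `@mfma_32x32x16_bf16_sketch` and the argument names are hypothetical.

```mlir
// Hypothetical wrapper; note bf16 element types rather than i16.
func.func @mfma_32x32x16_bf16_sketch(%a: vector<8xbf16>, %b: vector<8xbf16>,
                                     %c: vector<16xf32>) -> vector<16xf32> {
  // cbsz, abid, and blgp are all 0 here.
  %res = rocdl.mfma.f32.32x32x16.bf16 %a, %b, %c, 0, 0, 0 :
    (vector<8xbf16>, vector<8xbf16>, vector<16xf32>) -> vector<16xf32>
  return %res : vector<16xf32>
}
```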
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of bfloat16 type values of length 8 |
| b | fixed-length vector of bfloat16 type values of length 8 |
| c | fixed-length vector of 32-bit float values of length 16 |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 16 |
rocdl.mfma.f32.32x32x16.bf8.bf8 (ROCDL::mfma_f32_32x32x16_bf8_bf8)
Syntax:
operation ::= `rocdl.mfma.f32.32x32x16.bf8.bf8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
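For the 8-bit float variants, the operand tables list `a` and `b` as plain `i64` values rather than vectors. The sketch below assumes that typing and that the caller has already packed the 8-bit inputs into those integers. The function name `@mfma_32x32x16_bf8_bf8_sketch` and the argument names are hypothetical.

```mlir
// Hypothetical wrapper; %a and %b are assumed to hold already-packed bf8 data.
func.func @mfma_32x32x16_bf8_bf8_sketch(%a: i64, %b: i64,
                                        %c: vector<16xf32>) -> vector<16xf32> {
  // cbsz, abid, and blgp are all 0 here.
  %res = rocdl.mfma.f32.32x32x16.bf8.bf8 %a, %b, %c, 0, 0, 0 :
    (i64, i64, vector<16xf32>) -> vector<16xf32>
  return %res : vector<16xf32>
}
```

The `bf8.fp8`, `fp8.bf8`, and `fp8.fp8` variants are documented with the same operand and result types, so the same shape of call should apply with only the op name changed.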
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | 64-bit signless integer |
| b | 64-bit signless integer |
| c | fixed-length vector of 32-bit float values of length 16 |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 16 |
rocdl.mfma.f32.32x32x16.bf8.fp8 (ROCDL::mfma_f32_32x32x16_bf8_fp8)
Syntax:
operation ::= `rocdl.mfma.f32.32x32x16.bf8.fp8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | 64-bit signless integer |
| b | 64-bit signless integer |
| c | fixed-length vector of 32-bit float values of length 16 |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 16 |
rocdl.mfma.f32.32x32x16.f16 (ROCDL::mfma_f32_32x32x16_f16)
Syntax:
operation ::= `rocdl.mfma.f32.32x32x16.f16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
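This op is documented with 8-element `f16` input vectors, twice the width of the inputs to `rocdl.mfma.f32.32x32x8f16`, and a `vector<16xf32>` accumulator. A sketch assuming those listed types, with a hypothetical wrapper function `@mfma_32x32x16_f16_sketch` and hypothetical argument names:

```mlir
// Hypothetical wrapper; types are taken from this op's operand and result tables.
func.func @mfma_32x32x16_f16_sketch(%a: vector<8xf16>, %b: vector<8xf16>,
                                    %c: vector<16xf32>) -> vector<16xf32> {
  // cbsz, abid, and blgp are all 0 here.
  %res = rocdl.mfma.f32.32x32x16.f16 %a, %b, %c, 0, 0, 0 :
    (vector<8xf16>, vector<8xf16>, vector<16xf32>) -> vector<16xf32>
  return %res : vector<16xf32>
}
```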
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 16-bit float values of length 8 |
| b | fixed-length vector of 16-bit float values of length 8 |
| c | fixed-length vector of 32-bit float values of length 16 |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 16 |
rocdl.mfma.f32.32x32x16.fp8.bf8 (ROCDL::mfma_f32_32x32x16_fp8_bf8)
Syntax:
operation ::= `rocdl.mfma.f32.32x32x16.fp8.bf8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | 64-bit signless integer |
| b | 64-bit signless integer |
| c | fixed-length vector of 32-bit float values of length 16 |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 16 |
rocdl.mfma.f32.32x32x16.fp8.fp8 (ROCDL::mfma_f32_32x32x16_fp8_fp8)
Syntax:
operation ::= `rocdl.mfma.f32.32x32x16.fp8.fp8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | 64-bit signless integer |
| b | 64-bit signless integer |
| c | fixed-length vector of 32-bit float values of length 16 |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 16 |
rocdl.mfma.f32.32x32x1f32 (ROCDL::mfma_f32_32x32x1f32)
Syntax:
operation ::= `rocdl.mfma.f32.32x32x1f32` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | 32-bit float |
| b | 32-bit float |
| c | fixed-length vector of 32-bit float values of length 32 |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 32 |
rocdl.mfma.f32.32x32x2bf16 (ROCDL::mfma_f32_32x32x2bf16)
Syntax:
operation ::= `rocdl.mfma.f32.32x32x2bf16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 16-bit signless integer values of length 2 |
| b | fixed-length vector of 16-bit signless integer values of length 2 |
| c | fixed-length vector of 32-bit float values of length 32 |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 32 |
rocdl.mfma.f32.32x32x2f32 (ROCDL::mfma_f32_32x32x2f32)
Syntax:
operation ::= `rocdl.mfma.f32.32x32x2f32` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | 32-bit float |
| b | 32-bit float |
| c | fixed-length vector of 32-bit float values of length 16 |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 16 |
rocdl.mfma.f32.32x32x4.xf32 (ROCDL::mfma_f32_32x32x4_xf32)
Syntax:
operation ::= `rocdl.mfma.f32.32x32x4.xf32` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 32-bit float values of length 2 |
| b | fixed-length vector of 32-bit float values of length 2 |
| c | fixed-length vector of 32-bit float values of length 16 |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 16 |
rocdl.mfma.f32.32x32x4bf16 (ROCDL::mfma_f32_32x32x4bf16)
Syntax:
operation ::= `rocdl.mfma.f32.32x32x4bf16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 16-bit signless integer values of length 2 |
| b | fixed-length vector of 16-bit signless integer values of length 2 |
| c | fixed-length vector of 32-bit float values of length 16 |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 16 |
rocdl.mfma.f32.32x32x4bf16.1k (ROCDL::mfma_f32_32x32x4bf16_1k)
Syntax:
operation ::= `rocdl.mfma.f32.32x32x4bf16.1k` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
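The `.1k` bf16 variants are listed with 4-element `i16` input vectors, where the bf16 line in the shared example uses 2-element ones. A sketch for this op, assuming the operand types in its tables and the same convention of passing bf16 data as `i16` values. The function name `@mfma_32x32x4bf16_1k_sketch` and the argument names are hypothetical.

```mlir
// Hypothetical wrapper; bf16 inputs are carried as i16 per the operand table.
func.func @mfma_32x32x4bf16_1k_sketch(%a: vector<4xi16>, %b: vector<4xi16>,
                                      %c: vector<32xf32>) -> vector<32xf32> {
  // cbsz, abid, and blgp are all 0 here.
  %res = rocdl.mfma.f32.32x32x4bf16.1k %a, %b, %c, 0, 0, 0 :
    (vector<4xi16>, vector<4xi16>, vector<32xf32>) -> vector<32xf32>
  return %res : vector<32xf32>
}
```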
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 16-bit signless integer values of length 4 |
| b | fixed-length vector of 16-bit signless integer values of length 4 |
| c | fixed-length vector of 32-bit float values of length 32 |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 32 |
rocdl.mfma.f32.32x32x4f16 (ROCDL::mfma_f32_32x32x4f16)
Syntax:
operation ::= `rocdl.mfma.f32.32x32x4f16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 16-bit float values of length 4 |
| b | fixed-length vector of 16-bit float values of length 4 |
| c | fixed-length vector of 32-bit float values of length 32 |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 32 |
rocdl.mfma.f32.32x32x8bf16.1k (ROCDL::mfma_f32_32x32x8bf16_1k)
Syntax:
operation ::= `rocdl.mfma.f32.32x32x8bf16.1k` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 16-bit signless integer values of length 4 |
| b | fixed-length vector of 16-bit signless integer values of length 4 |
| c | fixed-length vector of 32-bit float values of length 16 |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 16 |
rocdl.mfma.f32.32x32x8f16 (ROCDL::mfma_f32_32x32x8f16)
Syntax:
operation ::= `rocdl.mfma.f32.32x32x8f16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 16-bit float values of length 4 |
| b | fixed-length vector of 16-bit float values of length 4 |
| c | fixed-length vector of 32-bit float values of length 16 |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 16 |
rocdl.mfma.f32.4x4x1f32 (ROCDL::mfma_f32_4x4x1f32)
Syntax:
operation ::= `rocdl.mfma.f32.4x4x1f32` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | 32-bit float |
| b | 32-bit float |
| c | fixed-length vector of 32-bit float values of length 4 |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 4 |
rocdl.mfma.f32.4x4x2bf16 (ROCDL::mfma_f32_4x4x2bf16)
Syntax:
operation ::= `rocdl.mfma.f32.4x4x2bf16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 16-bit signless integer values of length 2 |
| b | fixed-length vector of 16-bit signless integer values of length 2 |
| c | fixed-length vector of 32-bit float values of length 4 |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 4 |
rocdl.mfma.f32.4x4x4bf16.1k (ROCDL::mfma_f32_4x4x4bf16_1k)
Syntax:
operation ::= `rocdl.mfma.f32.4x4x4bf16.1k` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 16-bit signless integer values of length 4 |
| b | fixed-length vector of 16-bit signless integer values of length 4 |
| c | fixed-length vector of 32-bit float values of length 4 |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 4 |
rocdl.mfma.f32.4x4x4f16 (ROCDL::mfma_f32_4x4x4f16)
Syntax:
operation ::= `rocdl.mfma.f32.4x4x4f16` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 16-bit float values of length 4 |
| b | fixed-length vector of 16-bit float values of length 4 |
| c | fixed-length vector of 32-bit float values of length 4 |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 4 |
rocdl.mfma.f64.16x16x4f64 (ROCDL::mfma_f64_16x16x4f64)
Syntax:
operation ::= `rocdl.mfma.f64.16x16x4f64` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | 64-bit float |
b | 64-bit float |
c | fixed-length vector of 64-bit float values of length 4 |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 64-bit float values of length 4 |
rocdl.mfma.f64.4x4x4f64 (ROCDL::mfma_f64_4x4x4f64) ¶
Syntax:
operation ::= `rocdl.mfma.f64.4x4x4f64` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
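Unlike the family-wide examples above, this op takes scalar f64 values for all three operands and returns a scalar, per its operand and result tables. A sketch of what that might look like, assuming hypothetical SSA values of type f64 and placeholder attribute values of 0:

```mlir
// Hypothetical values: %a, %b, and %c are all f64 scalars.
// cbsz, abid, and blgp are 0 here purely as placeholder values.
%r = rocdl.mfma.f64.4x4x4f64 %a, %b, %c, 0, 0, 0 :
    (f64, f64, f64) -> f64
```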
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | 64-bit float |
b | 64-bit float |
c | 64-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 64-bit float |
rocdl.mfma.i32.16x16x16i8 (ROCDL::mfma_i32_16x16x16i8) ¶
Syntax:
operation ::= `rocdl.mfma.i32.16x16x16i8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | 32-bit signless integer |
b | 32-bit signless integer |
c | fixed-length vector of 32-bit signless integer values of length 4 |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 4 |
rocdl.mfma.i32.16x16x32.i8 (ROCDL::mfma_i32_16x16x32_i8) ¶
Syntax:
operation ::= `rocdl.mfma.i32.16x16x32.i8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | 64-bit signless integer |
b | 64-bit signless integer |
c | fixed-length vector of 32-bit signless integer values of length 4 |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 4 |
rocdl.mfma.i32.16x16x4i8 (ROCDL::mfma_i32_16x16x4i8) ¶
Syntax:
operation ::= `rocdl.mfma.i32.16x16x4i8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | 32-bit signless integer |
b | 32-bit signless integer |
c | fixed-length vector of 32-bit signless integer values of length 16 |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 16 |
rocdl.mfma.i32.16x16x64.i8 (ROCDL::mfma_i32_16x16x64_i8) ¶
Syntax:
operation ::= `rocdl.mfma.i32.16x16x64.i8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
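For this op the inputs are vectors rather than the scalar i32 values used in the shared i8 example above. A sketch based on the operand table, with hypothetical SSA names and placeholder attribute values of 0:

```mlir
// Hypothetical values: %a, %b, and %c are all vector<4xi32>.
// cbsz, abid, and blgp are 0 here purely as placeholder values.
%r = rocdl.mfma.i32.16x16x64.i8 %a, %b, %c, 0, 0, 0 :
    (vector<4xi32>, vector<4xi32>, vector<4xi32>) -> vector<4xi32>
```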
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | fixed-length vector of 32-bit signless integer values of length 4 |
b | fixed-length vector of 32-bit signless integer values of length 4 |
c | fixed-length vector of 32-bit signless integer values of length 4 |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 4 |
rocdl.mfma.i32.32x32x16.i8 (ROCDL::mfma_i32_32x32x16_i8) ¶
Syntax:
operation ::= `rocdl.mfma.i32.32x32x16.i8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | 64-bit signless integer |
b | 64-bit signless integer |
c | fixed-length vector of 32-bit signless integer values of length 16 |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 16 |
rocdl.mfma.i32.32x32x32.i8 (ROCDL::mfma_i32_32x32x32_i8) ¶
Syntax:
operation ::= `rocdl.mfma.i32.32x32x32.i8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | fixed-length vector of 32-bit signless integer values of length 4 |
b | fixed-length vector of 32-bit signless integer values of length 4 |
c | fixed-length vector of 32-bit signless integer values of length 16 |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 16 |
rocdl.mfma.i32.32x32x4i8 (ROCDL::mfma_i32_32x32x4i8) ¶
Syntax:
operation ::= `rocdl.mfma.i32.32x32x4i8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | 32-bit signless integer |
b | 32-bit signless integer |
c | fixed-length vector of 32-bit signless integer values of length 32 |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 32 |
rocdl.mfma.i32.32x32x8i8 (ROCDL::mfma_i32_32x32x8i8) ¶
Syntax:
operation ::= `rocdl.mfma.i32.32x32x8i8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | 32-bit signless integer |
b | 32-bit signless integer |
c | fixed-length vector of 32-bit signless integer values of length 16 |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 16 |
rocdl.mfma.i32.4x4x4i8 (ROCDL::mfma_i32_4x4x4i8) ¶
Syntax:
operation ::= `rocdl.mfma.i32.4x4x4i8` $a `,` $b `,` $c `,` $cbsz `,` $abid `,` $blgp attr-dict `:` functional-type(operands, $res)
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C
with matrix operands. The cbsz, abid, and blgp attributes control
broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 :
(f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 :
(i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 :
(vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | 32-bit signless integer |
b | 32-bit signless integer |
c | fixed-length vector of 32-bit signless integer values of length 4 |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit signless integer values of length 4 |
rocdl.mfma.scale.f32.16x16x128.f8f6f4 (ROCDL::mfma_scale_f32_16x16x128_f8f6f4) ¶
Syntax:
operation ::= `rocdl.mfma.scale.f32.16x16x128.f8f6f4` $a `,` $b `,` $c `,` $cbsz `,` $blgp `,` $opselA `,` $scaleA `,` $opselB `,` $scaleB attr-dict `:` functional-type(operands, $res)
Scaled matrix fused multiply-add (MFMA) intrinsic with per-operand scaling.
The opselA/opselB and scaleA/scaleB arguments control the scaling
of input operands.
Example:
// Scaled MFMA with fp8 * fp8 inputs.
%r0 = rocdl.mfma.scale.f32.32x32x64.f8f6f4 %a, %a, %c, 0, 0, 0, %scaleA, 0, %scaleB :
(vector<8xi32>, vector<8xi32>, vector<16xf32>, i32, i32) -> vector<16xf32>
// Scaled MFMA with fp8 * bf8 inputs.
%r1 = rocdl.mfma.scale.f32.32x32x64.f8f6f4 %a, %a, %c, 0, 1, 0, %scaleA, 0, %scaleB :
(vector<8xi32>, vector<8xi32>, vector<16xf32>, i32, i32) -> vector<16xf32>
// Scaled MFMA with fp8 * fp6 inputs (6xi32 operand B).
%r2 = rocdl.mfma.scale.f32.32x32x64.f8f6f4 %a, %b6, %c, 0, 2, 0, %scaleA, 0, %scaleB :
(vector<8xi32>, vector<6xi32>, vector<16xf32>, i32, i32) -> vector<16xf32>
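The examples above all use the 32x32x64 variant. A sketch of the 16x16x128 form follows. The operand table does not fix vector lengths, so the vector<8xi32> inputs and the vector<4xf32> accumulator are assumptions, as are the SSA names and the zero attribute values.

```mlir
// Assumed types: fp8 inputs packed as vector<8xi32>, and a 4-wide f32
// accumulator. These lengths are not stated in the operand table.
// %scaleA and %scaleB are hypothetical i32 values defined earlier.
%r = rocdl.mfma.scale.f32.16x16x128.f8f6f4 %a, %b, %c, 0, 0, 0, %scaleA, 0, %scaleB :
    (vector<8xi32>, vector<8xi32>, vector<4xf32>, i32, i32) -> vector<4xf32>
```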
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
opselA | ::mlir::IntegerAttr | 32-bit signless integer attribute |
opselB | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | LLVM dialect-compatible vector of 32-bit signless integer |
b | LLVM dialect-compatible vector of 32-bit signless integer |
c | LLVM dialect-compatible vector of 32-bit float |
scaleA | 32-bit signless integer |
scaleB | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.mfma.scale.f32.32x32x64.f8f6f4 (ROCDL::mfma_scale_f32_32x32x64_f8f6f4) ¶
Syntax:
operation ::= `rocdl.mfma.scale.f32.32x32x64.f8f6f4` $a `,` $b `,` $c `,` $cbsz `,` $blgp `,` $opselA `,` $scaleA `,` $opselB `,` $scaleB attr-dict `:` functional-type(operands, $res)
Scaled matrix fused multiply-add (MFMA) intrinsic with per-operand scaling.
The opselA/opselB and scaleA/scaleB arguments control the scaling
of input operands.
Example:
// Scaled MFMA with fp8 * fp8 inputs.
%r0 = rocdl.mfma.scale.f32.32x32x64.f8f6f4 %a, %a, %c, 0, 0, 0, %scaleA, 0, %scaleB :
(vector<8xi32>, vector<8xi32>, vector<16xf32>, i32, i32) -> vector<16xf32>
// Scaled MFMA with fp8 * bf8 inputs.
%r1 = rocdl.mfma.scale.f32.32x32x64.f8f6f4 %a, %a, %c, 0, 1, 0, %scaleA, 0, %scaleB :
(vector<8xi32>, vector<8xi32>, vector<16xf32>, i32, i32) -> vector<16xf32>
// Scaled MFMA with fp8 * fp6 inputs (6xi32 operand B).
%r2 = rocdl.mfma.scale.f32.32x32x64.f8f6f4 %a, %b6, %c, 0, 2, 0, %scaleA, 0, %scaleB :
(vector<8xi32>, vector<6xi32>, vector<16xf32>, i32, i32) -> vector<16xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
blgp | ::mlir::IntegerAttr | 32-bit signless integer attribute |
opselA | ::mlir::IntegerAttr | 32-bit signless integer attribute |
opselB | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | LLVM dialect-compatible vector of 32-bit signless integer |
b | LLVM dialect-compatible vector of 32-bit signless integer |
c | LLVM dialect-compatible vector of 32-bit float |
scaleA | 32-bit signless integer |
scaleB | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.permlane16.swap (ROCDL::Permlane16SwapOp) ¶
Syntax:
operation ::= `rocdl.permlane16.swap` attr-dict $old `,` $src `,` $fi `,` $boundControl `:` `(` type($old) `,` type($src) `)` `->` type($res)
Performs a permlane16.swap operation with the given operands, applying the
permutation specified by $fi to the provided inputs.
Example:
// Swap lanes between groups of 16 threads.
%res = rocdl.permlane16.swap %src, %src, 0, -1 : (i32, i32) -> !llvm.struct<(i32, i32)>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
fi | ::mlir::IntegerAttr | 1-bit signless integer attribute |
boundControl | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
old | 32-bit signless integer |
src | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible struct of 32-bit signless integer and 32-bit signless integer |
rocdl.permlane32.swap (ROCDL::Permlane32SwapOp) ¶
Syntax:
operation ::= `rocdl.permlane32.swap` attr-dict $old `,` $src `,` $fi `,` $boundControl `:` `(` type($old) `,` type($src) `)` `->` type($res)
Performs a permlane32.swap operation with the given operands, applying the
permutation specified by $fi to the provided inputs.
Example:
// Swap lanes between groups of 32 threads.
%res = rocdl.permlane32.swap %src, %src, 0, -1 : (i32, i32) -> !llvm.struct<(i32, i32)>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
fi | ::mlir::IntegerAttr | 1-bit signless integer attribute |
boundControl | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
old | 32-bit signless integer |
src | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible struct of 32-bit signless integer and 32-bit signless integer |
rocdl.permlanex16 (ROCDL::PermlaneX16Op) ¶
Syntax:
operation ::= `rocdl.permlanex16` attr-dict $old `,` $src0 `,` $src1 `,` $src2 `,` $fi `,` $boundControl `:` type($src0) `,` type($src1)
Performs a permlanex16 operation with the given operands, applying the
permutation specified by $fi to the provided inputs.
Example:
// Scalar permlanex16.
%ret0 = rocdl.permlanex16 %src0, %src0, %sel, %sel, 0, -1 : f32, i32
// Vector permlanex16.
%ret1 = rocdl.permlanex16 %src1, %src1, %sel, %sel, 0, -1 : vector<2xf32>, i32
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
fi | ::mlir::IntegerAttr | 1-bit signless integer attribute |
boundControl | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
old | LLVM dialect-compatible type |
src0 | LLVM dialect-compatible type |
src1 | LLVM dialect-compatible type |
src2 | LLVM dialect-compatible type |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.raw.buffer.atomic.cmpswap (ROCDL::RawBufferAtomicCmpSwap) ¶
Syntax:
operation ::= `rocdl.raw.buffer.atomic.cmpswap` attr-dict `(` operands `)` `:` type($res) `,` type($rsrc)
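A sketch of how the syntax above might be written out. The SSA names are hypothetical, and representing the buffer resource as vector<4xi32> is an assumption, since the operand table only requires an LLVM dialect-compatible type.

```mlir
// Hypothetical values: %src and %cmp are i32, %rsrc is assumed to be a
// vector<4xi32> buffer descriptor, and %offset, %soffset, %aux are i32.
// The trailing types are the result type followed by the type of %rsrc.
%val = rocdl.raw.buffer.atomic.cmpswap(%src, %cmp, %rsrc, %offset, %soffset, %aux) : i32, vector<4xi32>
```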
Operands: ¶
| Operand | Description |
|---|---|
src | LLVM dialect-compatible type |
cmp | LLVM dialect-compatible type |
rsrc | LLVM dialect-compatible type |
offset | 32-bit signless integer |
soffset | 32-bit signless integer |
aux | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.raw.buffer.atomic.fadd (ROCDL::RawBufferAtomicFAddOp) ¶
Operands: ¶
| Operand | Description |
|---|---|
vdata | LLVM dialect-compatible type |
rsrc | LLVM dialect-compatible type |
offset | LLVM dialect-compatible type |
soffset | LLVM dialect-compatible type |
aux | LLVM dialect-compatible type |
rocdl.raw.buffer.atomic.fmax (ROCDL::RawBufferAtomicFMaxOp) ¶
Operands: ¶
| Operand | Description |
|---|---|
vdata | LLVM dialect-compatible type |
rsrc | LLVM dialect-compatible type |
offset | LLVM dialect-compatible type |
soffset | LLVM dialect-compatible type |
aux | LLVM dialect-compatible type |
rocdl.raw.buffer.atomic.smax (ROCDL::RawBufferAtomicSMaxOp) ¶
Operands: ¶
| Operand | Description |
|---|---|
vdata | LLVM dialect-compatible type |
rsrc | LLVM dialect-compatible type |
offset | LLVM dialect-compatible type |
soffset | LLVM dialect-compatible type |
aux | LLVM dialect-compatible type |
rocdl.raw.buffer.atomic.umin (ROCDL::RawBufferAtomicUMinOp) ¶
Operands: ¶
| Operand | Description |
|---|---|
vdata | LLVM dialect-compatible type |
rsrc | LLVM dialect-compatible type |
offset | LLVM dialect-compatible type |
soffset | LLVM dialect-compatible type |
aux | LLVM dialect-compatible type |
rocdl.raw.buffer.load (ROCDL::RawBufferLoadOp) ¶
Operands: ¶
| Operand | Description |
|---|---|
rsrc | LLVM dialect-compatible type |
offset | LLVM dialect-compatible type |
soffset | LLVM dialect-compatible type |
aux | LLVM dialect-compatible type |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.raw.buffer.store (ROCDL::RawBufferStoreOp) ¶
Operands: ¶
| Operand | Description |
|---|---|
vdata | LLVM dialect-compatible type |
rsrc | LLVM dialect-compatible type |
offset | LLVM dialect-compatible type |
soffset | LLVM dialect-compatible type |
aux | LLVM dialect-compatible type |
rocdl.raw.ptr.buffer.atomic.cmpswap (ROCDL::RawPtrBufferAtomicCmpSwap) ¶
Syntax:
operation ::= `rocdl.raw.ptr.buffer.atomic.cmpswap` operands attr-dict `:` type($res)
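As a sketch of the syntax above, assuming hypothetical SSA values defined earlier with the types given in the operand table (the i32 data type is chosen only for illustration):

```mlir
// Hypothetical values: %src and %cmp are i32, %rsrc is !llvm.ptr<8>,
// and %offset, %soffset, %aux are i32.
// The trailing type is the result type.
%val = rocdl.raw.ptr.buffer.atomic.cmpswap %src, %cmp, %rsrc, %offset, %soffset, %aux : i32
```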
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
src | LLVM dialect-compatible type |
cmp | LLVM dialect-compatible type |
rsrc | LLVM pointer in address space 8 |
offset | 32-bit signless integer |
soffset | 32-bit signless integer |
aux | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.raw.ptr.buffer.atomic.fadd (ROCDL::RawPtrBufferAtomicFaddOp) ¶
Syntax:
operation ::= `rocdl.raw.ptr.buffer.atomic.fadd` operands attr-dict `:` type($vdata)
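A minimal sketch of this syntax, with hypothetical SSA names and f32 picked as an illustrative data type:

```mlir
// Hypothetical values: %vdata is f32, %rsrc is !llvm.ptr<8>,
// and %offset, %soffset, %aux are i32.
// The op has no result; the trailing type is that of %vdata.
rocdl.raw.ptr.buffer.atomic.fadd %vdata, %rsrc, %offset, %soffset, %aux : f32
```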
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
vdata | LLVM dialect-compatible type |
rsrc | LLVM pointer in address space 8 |
offset | 32-bit signless integer |
soffset | 32-bit signless integer |
aux | 32-bit signless integer |
rocdl.raw.ptr.buffer.atomic.fmax (ROCDL::RawPtrBufferAtomicFmaxOp) ¶
Syntax:
operation ::= `rocdl.raw.ptr.buffer.atomic.fmax` operands attr-dict `:` type($vdata)
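The form mirrors the other buffer atomics. In this sketch the SSA names are hypothetical and f32 is an illustrative choice of data type:

```mlir
// Hypothetical values: %vdata is f32, %rsrc is !llvm.ptr<8>,
// and %offset, %soffset, %aux are i32.
rocdl.raw.ptr.buffer.atomic.fmax %vdata, %rsrc, %offset, %soffset, %aux : f32
```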
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
vdata | LLVM dialect-compatible type |
rsrc | LLVM pointer in address space 8 |
offset | 32-bit signless integer |
soffset | 32-bit signless integer |
aux | 32-bit signless integer |
rocdl.raw.ptr.buffer.atomic.smax (ROCDL::RawPtrBufferAtomicSmaxOp) ¶
Syntax:
operation ::= `rocdl.raw.ptr.buffer.atomic.smax` operands attr-dict `:` type($vdata)
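A sketch for the signed integer case, assuming hypothetical SSA values and an i32 data operand:

```mlir
// Hypothetical values: %vdata is i32, %rsrc is !llvm.ptr<8>,
// and %offset, %soffset, %aux are i32.
rocdl.raw.ptr.buffer.atomic.smax %vdata, %rsrc, %offset, %soffset, %aux : i32
```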
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
vdata | LLVM dialect-compatible type |
rsrc | LLVM pointer in address space 8 |
offset | 32-bit signless integer |
soffset | 32-bit signless integer |
aux | 32-bit signless integer |
rocdl.raw.ptr.buffer.atomic.umin (ROCDL::RawPtrBufferAtomicUminOp) ¶
Syntax:
operation ::= `rocdl.raw.ptr.buffer.atomic.umin` operands attr-dict `:` type($vdata)
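A sketch for the unsigned minimum, again with hypothetical SSA names. The data operand is a signless i32; the unsigned interpretation comes from the op itself rather than from the type.

```mlir
// Hypothetical values: %vdata is i32, %rsrc is !llvm.ptr<8>,
// and %offset, %soffset, %aux are i32.
rocdl.raw.ptr.buffer.atomic.umin %vdata, %rsrc, %offset, %soffset, %aux : i32
```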
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
vdata | LLVM dialect-compatible type |
rsrc | LLVM pointer in address space 8 |
offset | 32-bit signless integer |
soffset | 32-bit signless integer |
aux | 32-bit signless integer |
rocdl.raw.ptr.buffer.load (ROCDL::RawPtrBufferLoadOp) ¶
Syntax:
operation ::= `rocdl.raw.ptr.buffer.load` operands attr-dict `:` type($res)
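A sketch of scalar and vector loads using this syntax. The SSA names are hypothetical and assumed to be defined earlier with the types from the operand table; the result types are illustrative.

```mlir
// Hypothetical values: %rsrc is !llvm.ptr<8>; %offset, %soffset, %aux are i32.
// The trailing type selects the result type of the load.
%v1 = rocdl.raw.ptr.buffer.load %rsrc, %offset, %soffset, %aux : f32
%v4 = rocdl.raw.ptr.buffer.load %rsrc, %offset, %soffset, %aux : vector<4xf32>
```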
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
rsrc | LLVM pointer in address space 8 |
offset | 32-bit signless integer |
soffset | 32-bit signless integer |
aux | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.raw.ptr.buffer.load.async.lds (ROCDL::RawPtrBufferLoadAsyncLdsOp) ¶
Async variant of raw.ptr.buffer.load.lds
Syntax:
operation ::= `rocdl.raw.ptr.buffer.load.async.lds` operands attr-dict
Load from a buffer resource rsrc to ldsPtr, which must be uniform.
See rocdl.load.async.to.lds for overall semantics of such loads, noting that
here voffset can be lane-varying and that rsrc (which holds the base address)
must, as always, be uniform.
Available on gfx9 and gfx10.
Example:
// Async buffer load to LDS via buffer resource pointer.
rocdl.raw.ptr.buffer.load.async.lds %rsrc, %ldsPtr, %size, %voffset, %soffset, %offset, %aux
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
rsrc | LLVM pointer in address space 8 |
ldsPtr | LLVM pointer in address space 3 |
size | 32-bit signless integer |
voffset | 32-bit signless integer |
soffset | 32-bit signless integer |
offset | 32-bit signless integer |
aux | 32-bit signless integer |
rocdl.raw.ptr.buffer.load.lds (ROCDL::RawPtrBufferLoadLdsOp) ¶
Syntax:
operation ::= `rocdl.raw.ptr.buffer.load.lds` operands attr-dict
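A sketch of this syntax with operands in the order given by the operand table. All SSA names are hypothetical and assumed to be defined earlier.

```mlir
// Hypothetical values: %rsrc is !llvm.ptr<8>, %ldsPtr is !llvm.ptr<3>,
// and %size, %voffset, %soffset, %offset, %aux are i32.
// The op has no result and no trailing type annotation.
rocdl.raw.ptr.buffer.load.lds %rsrc, %ldsPtr, %size, %voffset, %soffset, %offset, %aux
```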
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
rsrc | LLVM pointer in address space 8 |
ldsPtr | LLVM pointer in address space 3 |
size | 32-bit signless integer |
voffset | 32-bit signless integer |
soffset | 32-bit signless integer |
offset | 32-bit signless integer |
aux | 32-bit signless integer |
rocdl.raw.ptr.buffer.store (ROCDL::RawPtrBufferStoreOp) ¶
Syntax:
operation ::= `rocdl.raw.ptr.buffer.store` operands attr-dict `:` type($vdata)
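A sketch of a vector store using this syntax, with hypothetical SSA names and an illustrative vector<4xf32> payload:

```mlir
// Hypothetical values: %vdata is vector<4xf32>, %rsrc is !llvm.ptr<8>,
// and %offset, %soffset, %aux are i32.
// The trailing type is that of the stored value.
rocdl.raw.ptr.buffer.store %vdata, %rsrc, %offset, %soffset, %aux : vector<4xf32>
```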
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
vdata | LLVM dialect-compatible type |
rsrc | LLVM pointer in address space 8 |
offset | 32-bit signless integer |
soffset | 32-bit signless integer |
aux | 32-bit signless integer |
rocdl.rcp (ROCDL::ROCDLRcp) ¶
Syntax:
operation ::= `rocdl.rcp` $arg qualified(type($arg)) attr-dict `->` qualified(type($res))
Note: In the general case, prefer the conventional arith, math, or llvm ops over this.
Use this ROCDL-specific operation only when you fully understand its implications and
when it is strictly necessary. This op is usually chosen when a small loss in precision is
acceptable in exchange for higher execution speed.
Example:
%0 = rocdl.rcp %a f32 -> f32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
arg | floating point LLVM type |
Results: ¶
| Result | Description |
|---|---|
res | floating point LLVM type |
rocdl.readfirstlane (ROCDL::ReadfirstlaneOp) ¶
Get the value in the first active lane.
Syntax:
operation ::= `rocdl.readfirstlane` $src attr-dict `:` type($res)
Returns the value in the lowest active lane of the input operand.
Example:
// Scalar readfirstlane.
%0 = rocdl.readfirstlane %src0 : f32
// Vector readfirstlane.
%1 = rocdl.readfirstlane %src1 : vector<2xf32>
Operands: ¶
| Operand | Description |
|---|---|
src | LLVM dialect-compatible type |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.readlane (ROCDL::ReadlaneOp) ¶
Get the value in the specified lane.
Syntax:
operation ::= `rocdl.readlane` $src0 `,` $src1 attr-dict `:` `(` type($src0) `,` type($src1) `)` `->` type($res)
Get the value in lane src1 from input src0.
Example:
// Scalar readlane.
%0 = rocdl.readlane %src0, %idx : (f32, i32) -> f32
// Vector readlane.
%1 = rocdl.readlane %src1, %idx : (vector<2xf32>, i32) -> vector<2xf32>
Operands: ¶
| Operand | Description |
|---|---|
src0 | LLVM dialect-compatible type |
src1 | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
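As a rough mental model of the lane-read semantics described for rocdl.readfirstlane and rocdl.readlane, the Python sketch below treats a wavefront as a list of per-lane values plus an execution mask. The list-based wavefront, the function names, and the error raised when no lane is active are assumptions made for illustration; they are not part of the dialect or of the hardware definition.

```python
from typing import Sequence


def readlane(src0: Sequence[float], src1: int) -> float:
    """Return the copy of src0 held by lane number src1."""
    if not 0 <= src1 < len(src0):
        raise IndexError("lane index out of range")
    return src0[src1]


def readfirstlane(src: Sequence[float], active: Sequence[bool]) -> float:
    """Return the copy of src held by the lowest-numbered active lane."""
    if len(src) != len(active):
        raise ValueError("src and active need one entry per lane")
    for lane, enabled in enumerate(active):
        if enabled:
            return readlane(src, lane)
    # Assumption for this sketch only: reject a fully inactive wavefront.
    raise ValueError("no active lane")


# A four-lane toy wavefront where lanes 0 and 1 are masked off.
lanes = [10.0, 11.0, 12.0, 13.0]
mask = [False, False, True, True]
first = readfirstlane(lanes, mask)  # lane 2 is the lowest active lane
picked = readlane(lanes, 3)         # explicit lane selection
```

In this model, readfirstlane is simply readlane with the lane index chosen from the execution mask, and both results are the same for every lane that observes them.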
rocdl.rsq (ROCDL::ROCDLRsq) ¶
Syntax:
operation ::= `rocdl.rsq` $arg qualified(type($arg)) attr-dict `->` qualified(type($res))
Note: In general, prefer the conventional arith, math, or llvm ops over this one.
Use this ROCDL-specific operation only when you fully understand its implications and
it is strictly necessary. This op is usually chosen when a small loss in precision is
acceptable in exchange for higher execution speed.
Example:
%0 = rocdl.rsq %a f32 -> f32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
arg | floating point LLVM type |
Results: ¶
| Result | Description |
|---|---|
res | floating point LLVM type |
rocdl.s.barrier (ROCDL::SBarrierOp) ¶
Syntax:
operation ::= `rocdl.s.barrier` attr-dict
Insert a workgroup barrier without memory fences.
Available on gfx9 and later but deprecated on gfx12+; see
rocdl.s.barrier.signal and rocdl.s.barrier.wait instead.
Example:
// Synchronize threads within a workgroup.
rocdl.s.barrier
rocdl.s.barrier.init (ROCDL::BarrierInitOp) ¶
Syntax:
operation ::= `rocdl.s.barrier.init` $ptr `member_cnt` `=` $memberCnt attr-dict `:` qualified(type($ptr))
Available on gfx1250+.
Example:
// Initialize a named barrier with member count.
rocdl.s.barrier.init %ptr member_cnt = 1 : !llvm.ptr<3>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
memberCnt | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
ptr | LLVM pointer in address space 3 |
rocdl.s.barrier.join (ROCDL::BarrierJoinOp) ¶
Syntax:
operation ::= `rocdl.s.barrier.join` $ptr attr-dict `:` qualified(type($ptr))
Available on gfx1250+.
Example:
// Join a named barrier.
rocdl.s.barrier.join %ptr : !llvm.ptr<3>
Operands: ¶
| Operand | Description |
|---|---|
ptr | LLVM pointer in address space 3 |
rocdl.s.barrier.leave (ROCDL::BarrierLeaveOp) ¶
Syntax:
operation ::= `rocdl.s.barrier.leave` `id` `=` $id attr-dict
Available on gfx1250+.
Example:
// Leave a named barrier by id.
rocdl.s.barrier.leave id = 1
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
id | ::mlir::IntegerAttr | 16-bit signless integer attribute |
rocdl.s.barrier.signal (ROCDL::BarrierSignalOp) ¶
Syntax:
operation ::= `rocdl.s.barrier.signal` `id` `=` $id attr-dict
Signal a barrier by id. Available on gfx1200+.
Example:
// Signal barrier with id -1 (all barriers).
rocdl.s.barrier.signal id = -1
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
rocdl.s.barrier.signal.isfirst (ROCDL::BarrierSignalIsfirstOp) ¶
Syntax:
operation ::= `rocdl.s.barrier.signal.isfirst` `id` `=` $id attr-dict `->` type($res)
Available on gfx1200+.
Example:
// Signal barrier and check if this wave is first to arrive.
%0 = rocdl.s.barrier.signal.isfirst id = 1 -> i1
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Results: ¶
| Result | Description |
|---|---|
res | 1-bit signless integer |
rocdl.s.barrier.signal.var (ROCDL::BarrierSignalVarOp) ¶
Syntax:
operation ::= `rocdl.s.barrier.signal.var` $ptr `member_cnt` `=` $memberCnt attr-dict `:` qualified(type($ptr))
Available on gfx1250+.
Example:
// Signal a named barrier with variable ID.
rocdl.s.barrier.signal.var %ptr member_cnt = 1 : !llvm.ptr<3>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
memberCnt | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
ptr | LLVM pointer in address space 3 |
rocdl.s.barrier.wait (ROCDL::BarrierWaitOp) ¶
Syntax:
operation ::= `rocdl.s.barrier.wait` `id` `=` $id attr-dict
Wait on a barrier by id. Available on gfx1200+.
Example:
// Wait on barrier with id -1 (all barriers).
rocdl.s.barrier.wait id = -1
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
id | ::mlir::IntegerAttr | 16-bit signless integer attribute |
rocdl.s.get.barrier.state (ROCDL::GetBarrierStateOp) ¶
Syntax:
operation ::= `rocdl.s.get.barrier.state` `id` `=` $id attr-dict `->` type($res)
Available on gfx1200+.
Example:
// Query barrier state by id.
%0 = rocdl.s.get.barrier.state id = 1 -> i32
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
id | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit signless integer |
rocdl.s.get.named.barrier.state (ROCDL::GetNamedBarrierStateOp) ¶
Syntax:
operation ::= `rocdl.s.get.named.barrier.state` $ptr attr-dict `:` qualified(type($ptr)) `->` type($res)
Available on gfx1250+.
Example:
// Query named barrier state by pointer.
%0 = rocdl.s.get.named.barrier.state %ptr : !llvm.ptr<3> -> i32
Operands: ¶
| Operand | Description |
|---|---|
ptr | LLVM pointer in address space 3 |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit signless integer |
rocdl.s.nop (ROCDL::SNopOp) ¶
Syntax:
operation ::= `rocdl.s.nop` attr-dict $count
Insert a number of NOP cycles.
Example:
// Insert a no-op.
rocdl.s.nop 0
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
count | ::mlir::IntegerAttr | 16-bit signless integer attribute |
rocdl.s.setprio (ROCDL::SetPrioOp) ¶
Syntax:
operation ::= `rocdl.s.setprio` $priority attr-dict
Set the wavefront scheduling priority.
Example:
// Set priority to 0.
rocdl.s.setprio 0
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
priority | ::mlir::IntegerAttr | 16-bit signless integer attribute |
rocdl.s.sleep (ROCDL::SSleepOp) ¶
Syntax:
operation ::= `rocdl.s.sleep` attr-dict $count
Sleep for a number of clock cycles.
Example:
// Sleep for a minimum duration.
rocdl.s.sleep 0
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
count | ::mlir::IntegerAttr | 32-bit signless integer attribute |
rocdl.s.wait.asynccnt (ROCDL::WaitAsynccntOp) ¶
Wait until ASYNCCNT is less than or equal to count
Syntax:
operation ::= `rocdl.s.wait.asynccnt` $count attr-dict
Wait until the specified counter is less than or equal to count
before continuing.
Available on gfx1250+.
Example:
// Wait for async counter to drain.
rocdl.s.wait.asynccnt 0
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
count | ::mlir::IntegerAttr | 16-bit signless integer attribute |
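The "wait until the counter is less than or equal to count" behaviour can be pictured with a toy model. The Python sketch below is only an illustration: the class `OutstandingCounter` and its methods are hypothetical, and real hardware counters are updated by the memory system rather than by explicit calls.

```python
class OutstandingCounter:
    """Toy model of a counter that tracks in-flight operations."""

    def __init__(self) -> None:
        self.value = 0

    def issue(self) -> None:
        # Each newly issued operation increments the counter.
        self.value += 1

    def retire(self) -> None:
        # Each completed operation decrements the counter.
        if self.value == 0:
            raise RuntimeError("nothing outstanding")
        self.value -= 1

    def wait(self, count: int) -> int:
        """Proceed once value <= count; return how many completions were needed."""
        retired = 0
        while self.value > count:
            self.retire()  # stand-in for time passing while work completes
            retired += 1
        return retired


counter = OutstandingCounter()
for _ in range(3):
    counter.issue()
needed = counter.wait(1)  # two of the three operations must complete first
```

A count of 0, as in the example above, therefore means "wait for everything outstanding on this counter", while a larger count lets some operations remain in flight.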
rocdl.s.wait.dscnt (ROCDL::WaitDscntOp) ¶
Wait until DSCNT is less than or equal to count
Syntax:
operation ::= `rocdl.s.wait.dscnt` $count attr-dict
Wait until the specified counter is less than or equal to count
before continuing.
Available on gfx12+.
Example:
// Wait for data-sharing counter to drain.
rocdl.s.wait.dscnt 0
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
count | ::mlir::IntegerAttr | 16-bit signless integer attribute |
rocdl.s.wait.expcnt (ROCDL::WaitExpcntOp) ¶
Wait until EXPCNT is less than or equal to count
Syntax:
operation ::= `rocdl.s.wait.expcnt` $count attr-dict
Wait until the specified counter is less than or equal to count
before continuing.
Available on gfx12+.
Example:
// Wait for export counter to drain.
rocdl.s.wait.expcnt 0
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
count | ::mlir::IntegerAttr | 16-bit signless integer attribute |
rocdl.s.wait.loadcnt (ROCDL::WaitLoadcntOp) ¶
Wait until LOADCNT is less than or equal to count
Syntax:
operation ::= `rocdl.s.wait.loadcnt` $count attr-dict
Wait until the specified counter is less than or equal to count
before continuing.
Available on gfx12+.
Example:
// Wait for load counter to drain.
rocdl.s.wait.loadcnt 0
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
count | ::mlir::IntegerAttr | 16-bit signless integer attribute |
rocdl.s.wait.storecnt (ROCDL::WaitStorecntOp) ¶
Wait until STORECNT is less than or equal to count
Syntax:
operation ::= `rocdl.s.wait.storecnt` $count attr-dict
Wait until the specified counter is less than or equal to count
before continuing.
Available on gfx12+.
Example:
// Wait for store counter to drain.
rocdl.s.wait.storecnt 0
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
count | ::mlir::IntegerAttr | 16-bit signless integer attribute |
rocdl.s.wait.tensorcnt (ROCDL::WaitTensorcntOp) ¶
Wait until TENSORCNT is less than or equal to count
Syntax:
operation ::= `rocdl.s.wait.tensorcnt` $count attr-dict
Wait until the specified counter is less than or equal to count
before continuing.
Available on gfx1250+.
Example:
// Wait for tensor counter to drain.
rocdl.s.wait.tensorcnt 0
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
count | ::mlir::IntegerAttr | 16-bit signless integer attribute |
rocdl.s.waitcnt (ROCDL::SWaitcntOp) ¶
Syntax:
operation ::= `rocdl.s.waitcnt` attr-dict $bitfield
Wait for outstanding memory operations to complete, as specified by a bitfield whose semantics depend on the target chipset.
Example:
// Wait for all counters to reach zero.
rocdl.s.waitcnt 0
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
bitfield | ::mlir::IntegerAttr | 32-bit signless integer attribute |
rocdl.s.wakeup.barrier (ROCDL::WakeupBarrierOp) ¶
Syntax:
operation ::= `rocdl.s.wakeup.barrier` $ptr attr-dict `:` qualified(type($ptr))
Wakes up waves associated with a given named barrier. Note that this op does not release waves waiting at the barrier; it only signals other waves in the same workgroup that are waiting on the indicated named barrier to wake up. Available on gfx1250+.
Example:
// Wake up waves waiting on a named barrier.
rocdl.s.wakeup.barrier %ptr : !llvm.ptr<3>
Operands: ¶
| Operand | Description |
|---|---|
ptr | LLVM pointer in address space 3 |
rocdl.sched.barrier (ROCDL::SchedBarrier) ¶
Syntax:
operation ::= `rocdl.sched.barrier` $mask attr-dict
Insert a scheduling barrier with the given mask. The mask is a
bitfield that controls which instruction types may be scheduled
across the barrier (e.g. 0x0000 = no instructions may cross,
0x0001 = ALU only, 0x0010 = all VMEM, etc.). See
https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/IR/IntrinsicsAMDGPU.td#L349
for the full list of mask values.
Example:
// Scheduling barrier with mask 0.
rocdl.sched.barrier 0
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
mask | ::mlir::IntegerAttr | 32-bit signless integer attribute |
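Because the mask is a bitfield, individual categories can be OR-ed together before being printed as the op's integer attribute. The Python sketch below uses only the three values quoted in the description above; the enum `SchedBarrierMask` and the helper `sched_barrier_op` are hypothetical names for illustration, and the remaining bits should be taken from the linked LLVM file rather than from this example.

```python
from enum import IntFlag


class SchedBarrierMask(IntFlag):
    """Subset of mask bits quoted in the op description."""

    NONE = 0x0000  # no instructions may cross the barrier
    ALU = 0x0001   # ALU instructions may cross
    VMEM = 0x0010  # all VMEM instructions may cross


def sched_barrier_op(mask: SchedBarrierMask) -> str:
    """Render the textual form `rocdl.sched.barrier <mask>`."""
    return f"rocdl.sched.barrier {int(mask)}"


# Allow both ALU and VMEM instructions to be scheduled across the barrier.
combined = SchedBarrierMask.ALU | SchedBarrierMask.VMEM
text = sched_barrier_op(combined)
```

The assembly format prints the mask as a plain decimal integer, so the combined value 0x0011 appears as 17 in the IR.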
rocdl.sched.group.barrier (ROCDL::SchedGroupBarrier) ¶
Syntax:
operation ::= `rocdl.sched.group.barrier` $mask `,` $size `,` $groupId attr-dict
Insert a scheduling group barrier.
Example:
// Schedule group barrier with mask, size, and group id.
rocdl.sched.group.barrier 8, 1, 0
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
mask | ::mlir::IntegerAttr | 32-bit signless integer attribute |
size | ::mlir::IntegerAttr | 32-bit signless integer attribute |
groupId | ::mlir::IntegerAttr | 32-bit signless integer attribute |
rocdl.sin (ROCDL::ROCDLSin) ¶
Syntax:
operation ::= `rocdl.sin` $arg qualified(type($arg)) attr-dict `->` qualified(type($res))
Note: In general, prefer the conventional arith, math, or llvm ops over this one.
Use this ROCDL-specific operation only when you fully understand its implications and
it is strictly necessary. This op is usually chosen when a small loss in precision is
acceptable in exchange for higher execution speed.
Example:
%0 = rocdl.sin %a f32 -> f32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
arg | floating point LLVM type |
Results: ¶
| Result | Description |
|---|---|
res | floating point LLVM type |
rocdl.smfmac.f32.16x16x128.bf8.bf8 (ROCDL::smfmac_f32_16x16x128_bf8_bf8) ¶
Syntax:
operation ::= `rocdl.smfmac.f32.16x16x128.bf8.bf8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | fixed-length vector of 32-bit signless integer values of length 4 |
b | fixed-length vector of 32-bit signless integer values of length 8 |
c | fixed-length vector of 32-bit float values of length 4 |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 4 |
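To make the "2:4 structured sparsity" wording used by the SMFMAC ops concrete, the Python sketch below compresses a dense row in which every aligned group of four elements holds at most two nonzeros. It illustrates the general concept only: the function `compress_2_4` is hypothetical, and the way the hardware packs positions into the index operand is not described here and is not modelled.

```python
from typing import List, Tuple


def compress_2_4(row: List[float]) -> Tuple[List[float], List[int]]:
    """Compress a 2:4-sparse row into kept values and in-group positions.

    Each aligned group of four elements contributes exactly two kept values
    (padded with zeros if fewer are nonzero) and the position, 0 to 3, of
    each kept value inside its group.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    values: List[float] = []
    positions: List[int] = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        nonzero = [i for i, v in enumerate(group) if v != 0]
        if len(nonzero) > 2:
            raise ValueError("group violates 2:4 sparsity")
        # Pad with unused positions so every group yields two entries.
        padding = [i for i in range(4) if i not in nonzero]
        keep = sorted(nonzero + padding[: 2 - len(nonzero)])
        values.extend(group[i] for i in keep)
        positions.extend(keep)
    return values, positions


dense = [0.0, 3.0, 0.0, 5.0, 7.0, 0.0, 0.0, 0.0]
vals, pos = compress_2_4(dense)
```

The compressed row is half the length of the dense one, which is consistent with the a operand in the tables above being half the length of b; the positions play the role that the description assigns to the sparsity metadata.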
rocdl.smfmac.f32.16x16x128.bf8.fp8 (ROCDL::smfmac_f32_16x16x128_bf8_fp8) ¶
Syntax:
operation ::= `rocdl.smfmac.f32.16x16x128.bf8.fp8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | fixed-length vector of 32-bit signless integer values of length 4 |
b | fixed-length vector of 32-bit signless integer values of length 8 |
c | fixed-length vector of 32-bit float values of length 4 |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 4 |
rocdl.smfmac.f32.16x16x128.fp8.bf8 (ROCDL::smfmac_f32_16x16x128_fp8_bf8) ¶
Syntax:
operation ::= `rocdl.smfmac.f32.16x16x128.fp8.bf8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | fixed-length vector of 32-bit signless integer values of length 4 |
b | fixed-length vector of 32-bit signless integer values of length 8 |
c | fixed-length vector of 32-bit float values of length 4 |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 4 |
rocdl.smfmac.f32.16x16x128.fp8.fp8 (ROCDL::smfmac_f32_16x16x128_fp8_fp8) ¶
Syntax:
operation ::= `rocdl.smfmac.f32.16x16x128.fp8.fp8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | fixed-length vector of 32-bit signless integer values of length 4 |
b | fixed-length vector of 32-bit signless integer values of length 8 |
c | fixed-length vector of 32-bit float values of length 4 |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 4 |
rocdl.smfmac.f32.16x16x32.bf16 (ROCDL::smfmac_f32_16x16x32_bf16) ¶
Syntax:
operation ::= `rocdl.smfmac.f32.16x16x32.bf16` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | fixed-length vector of 16-bit signless integer values of length 4 |
b | fixed-length vector of 16-bit signless integer values of length 8 |
c | fixed-length vector of 32-bit float values of length 4 |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 4 |
rocdl.smfmac.f32.16x16x32.f16 (ROCDL::smfmac_f32_16x16x32_f16) ¶
Syntax:
operation ::= `rocdl.smfmac.f32.16x16x32.f16` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | fixed-length vector of 16-bit float values of length 4 |
b | fixed-length vector of 16-bit float values of length 8 |
c | fixed-length vector of 32-bit float values of length 4 |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 4 |
rocdl.smfmac.f32.16x16x64.bf16 (ROCDL::smfmac_f32_16x16x64_bf16) ¶
Syntax:
operation ::= `rocdl.smfmac.f32.16x16x64.bf16` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | fixed-length vector of bfloat16 type values of length 8 |
b | fixed-length vector of bfloat16 type values of length 16 |
c | fixed-length vector of 32-bit float values of length 4 |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 4 |
rocdl.smfmac.f32.16x16x64.bf8.bf8 (ROCDL::smfmac_f32_16x16x64_bf8_bf8) ¶
Syntax:
operation ::= `rocdl.smfmac.f32.16x16x64.bf8.bf8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | fixed-length vector of 32-bit signless integer values of length 2 |
b | fixed-length vector of 32-bit signless integer values of length 4 |
c | fixed-length vector of 32-bit float values of length 4 |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 4 |
rocdl.smfmac.f32.16x16x64.bf8.fp8 (ROCDL::smfmac_f32_16x16x64_bf8_fp8) ¶
Syntax:
operation ::= `rocdl.smfmac.f32.16x16x64.bf8.fp8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | fixed-length vector of 32-bit signless integer values of length 2 |
b | fixed-length vector of 32-bit signless integer values of length 4 |
c | fixed-length vector of 32-bit float values of length 4 |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 4 |
rocdl.smfmac.f32.16x16x64.f16 (ROCDL::smfmac_f32_16x16x64_f16) ¶
Syntax:
operation ::= `rocdl.smfmac.f32.16x16x64.f16` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | fixed-length vector of 16-bit float values of length 8 |
b | fixed-length vector of 16-bit float values of length 16 |
c | fixed-length vector of 32-bit float values of length 4 |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 4 |
rocdl.smfmac.f32.16x16x64.fp8.bf8 (ROCDL::smfmac_f32_16x16x64_fp8_bf8) ¶
Syntax:
operation ::= `rocdl.smfmac.f32.16x16x64.fp8.bf8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | fixed-length vector of 32-bit signless integer values of length 2 |
b | fixed-length vector of 32-bit signless integer values of length 4 |
c | fixed-length vector of 32-bit float values of length 4 |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | fixed-length vector of 32-bit float values of length 4 |
rocdl.smfmac.f32.16x16x64.fp8.fp8 (ROCDL::smfmac_f32_16x16x64_fp8_fp8) ¶
Syntax:
operation ::= `rocdl.smfmac.f32.16x16x64.fp8.fp8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 32-bit signless integer values of length 2 |
| b | fixed-length vector of 32-bit signless integer values of length 4 |
| c | fixed-length vector of 32-bit float values of length 4 |
| index | 32-bit signless integer |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 4 |
rocdl.smfmac.f32.32x32x16.bf16 (ROCDL::smfmac_f32_32x32x16_bf16)
Syntax:
operation ::= `rocdl.smfmac.f32.32x32x16.bf16` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 16-bit signless integer values of length 4 |
| b | fixed-length vector of 16-bit signless integer values of length 8 |
| c | fixed-length vector of 32-bit float values of length 16 |
| index | 32-bit signless integer |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 16 |
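The shared example above covers only 16x16 variants. As a minimal sketch for this op, assuming the operand and result types listed in its tables, a call could look like the following. The function name `@smfmac_32x32x16_bf16`, the SSA value names, and the zero values for `cbsz` and `abid` are illustrative assumptions, not part of the dialect.

```mlir
// Hypothetical wrapper function; only the rocdl op itself comes from the
// syntax and operand tables of this section.
llvm.func @smfmac_32x32x16_bf16(%a: vector<4xi16>, %b: vector<8xi16>,
                                %c: vector<16xf32>, %idx: i32) -> vector<16xf32> {
  // bf16 inputs are passed as i16 vectors; the accumulator has 16 f32 lanes.
  %r = rocdl.smfmac.f32.32x32x16.bf16 %a, %b, %c, %idx, 0, 0 :
      (vector<4xi16>, vector<8xi16>, vector<16xf32>, i32) -> vector<16xf32>
  llvm.return %r : vector<16xf32>
}
```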
rocdl.smfmac.f32.32x32x16.f16 (ROCDL::smfmac_f32_32x32x16_f16)
Syntax:
operation ::= `rocdl.smfmac.f32.32x32x16.f16` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 16-bit float values of length 4 |
| b | fixed-length vector of 16-bit float values of length 8 |
| c | fixed-length vector of 32-bit float values of length 16 |
| index | 32-bit signless integer |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 16 |
rocdl.smfmac.f32.32x32x32.bf16 (ROCDL::smfmac_f32_32x32x32_bf16)
Syntax:
operation ::= `rocdl.smfmac.f32.32x32x32.bf16` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of bfloat16 type values of length 8 |
| b | fixed-length vector of bfloat16 type values of length 16 |
| c | fixed-length vector of 32-bit float values of length 16 |
| index | 32-bit signless integer |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 16 |
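Unlike the bf16 line in the shared example, which passes `i16` vectors, the operand table for this op lists `bfloat16` vectors. A minimal sketch under that reading of the table follows. The function name `@smfmac_32x32x32_bf16`, the SSA value names, and the zero values for `cbsz` and `abid` are illustrative assumptions.

```mlir
// Hypothetical wrapper function; operand and result types are taken from
// the tables of this section.
llvm.func @smfmac_32x32x32_bf16(%a: vector<8xbf16>, %b: vector<16xbf16>,
                                %c: vector<16xf32>, %idx: i32) -> vector<16xf32> {
  // Inputs use the bf16 element type directly rather than i16.
  %r = rocdl.smfmac.f32.32x32x32.bf16 %a, %b, %c, %idx, 0, 0 :
      (vector<8xbf16>, vector<16xbf16>, vector<16xf32>, i32) -> vector<16xf32>
  llvm.return %r : vector<16xf32>
}
```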
rocdl.smfmac.f32.32x32x32.bf8.bf8 (ROCDL::smfmac_f32_32x32x32_bf8_bf8)
Syntax:
operation ::= `rocdl.smfmac.f32.32x32x32.bf8.bf8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 32-bit signless integer values of length 2 |
| b | fixed-length vector of 32-bit signless integer values of length 4 |
| c | fixed-length vector of 32-bit float values of length 16 |
| index | 32-bit signless integer |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 16 |
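For the 8-bit float variants the inputs arrive as `i32` vectors, as the operand table shows. A minimal sketch of a call to this op, assuming those types, is given below. The function name `@smfmac_32x32x32_bf8_bf8`, the SSA value names, and the zero values for `cbsz` and `abid` are illustrative assumptions.

```mlir
// Hypothetical wrapper function; types follow the operand and result tables.
llvm.func @smfmac_32x32x32_bf8_bf8(%a: vector<2xi32>, %b: vector<4xi32>,
                                   %c: vector<16xf32>, %idx: i32) -> vector<16xf32> {
  // bf8 inputs are carried in i32 vectors; the result is 16 f32 lanes.
  %r = rocdl.smfmac.f32.32x32x32.bf8.bf8 %a, %b, %c, %idx, 0, 0 :
      (vector<2xi32>, vector<4xi32>, vector<16xf32>, i32) -> vector<16xf32>
  llvm.return %r : vector<16xf32>
}
```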
rocdl.smfmac.f32.32x32x32.bf8.fp8 (ROCDL::smfmac_f32_32x32x32_bf8_fp8)
Syntax:
operation ::= `rocdl.smfmac.f32.32x32x32.bf8.fp8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 32-bit signless integer values of length 2 |
| b | fixed-length vector of 32-bit signless integer values of length 4 |
| c | fixed-length vector of 32-bit float values of length 16 |
| index | 32-bit signless integer |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 16 |
rocdl.smfmac.f32.32x32x32.f16 (ROCDL::smfmac_f32_32x32x32_f16)
Syntax:
operation ::= `rocdl.smfmac.f32.32x32x32.f16` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 16-bit float values of length 8 |
| b | fixed-length vector of 16-bit float values of length 16 |
| c | fixed-length vector of 32-bit float values of length 16 |
| index | 32-bit signless integer |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 16 |
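Compared with the `16x16x32.f16` line in the shared example, this op doubles the input vector lengths and uses a 16-lane accumulator. A minimal sketch, assuming the types from the tables above, follows. The function name `@smfmac_32x32x32_f16`, the SSA value names, and the zero values for `cbsz` and `abid` are illustrative assumptions.

```mlir
// Hypothetical wrapper function; types follow the operand and result tables.
llvm.func @smfmac_32x32x32_f16(%a: vector<8xf16>, %b: vector<16xf16>,
                               %c: vector<16xf32>, %idx: i32) -> vector<16xf32> {
  // %idx carries the 2:4 sparsity metadata described in the op summary.
  %r = rocdl.smfmac.f32.32x32x32.f16 %a, %b, %c, %idx, 0, 0 :
      (vector<8xf16>, vector<16xf16>, vector<16xf32>, i32) -> vector<16xf32>
  llvm.return %r : vector<16xf32>
}
```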
rocdl.smfmac.f32.32x32x32.fp8.bf8 (ROCDL::smfmac_f32_32x32x32_fp8_bf8)
Syntax:
operation ::= `rocdl.smfmac.f32.32x32x32.fp8.bf8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 32-bit signless integer values of length 2 |
| b | fixed-length vector of 32-bit signless integer values of length 4 |
| c | fixed-length vector of 32-bit float values of length 16 |
| index | 32-bit signless integer |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 16 |
rocdl.smfmac.f32.32x32x32.fp8.fp8 (ROCDL::smfmac_f32_32x32x32_fp8_fp8)
Syntax:
operation ::= `rocdl.smfmac.f32.32x32x32.fp8.fp8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 32-bit signless integer values of length 2 |
| b | fixed-length vector of 32-bit signless integer values of length 4 |
| c | fixed-length vector of 32-bit float values of length 16 |
| index | 32-bit signless integer |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 16 |
rocdl.smfmac.f32.32x32x64.bf8.bf8 (ROCDL::smfmac_f32_32x32x64_bf8_bf8)
Syntax:
operation ::= `rocdl.smfmac.f32.32x32x64.bf8.bf8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 32-bit signless integer values of length 4 |
| b | fixed-length vector of 32-bit signless integer values of length 8 |
| c | fixed-length vector of 32-bit float values of length 16 |
| index | 32-bit signless integer |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 16 |
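The `32x32x64` 8-bit float variants take wider `i32` input vectors than the `16x16x64` form shown in the shared example. A minimal sketch for this op, assuming the types from the tables above, follows. The function name `@smfmac_32x32x64_bf8_bf8`, the SSA value names, and the zero values for `cbsz` and `abid` are illustrative assumptions.

```mlir
// Hypothetical wrapper function; types follow the operand and result tables.
llvm.func @smfmac_32x32x64_bf8_bf8(%a: vector<4xi32>, %b: vector<8xi32>,
                                   %c: vector<16xf32>, %idx: i32) -> vector<16xf32> {
  %r = rocdl.smfmac.f32.32x32x64.bf8.bf8 %a, %b, %c, %idx, 0, 0 :
      (vector<4xi32>, vector<8xi32>, vector<16xf32>, i32) -> vector<16xf32>
  llvm.return %r : vector<16xf32>
}
```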
rocdl.smfmac.f32.32x32x64.bf8.fp8 (ROCDL::smfmac_f32_32x32x64_bf8_fp8)
Syntax:
operation ::= `rocdl.smfmac.f32.32x32x64.bf8.fp8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 32-bit signless integer values of length 4 |
| b | fixed-length vector of 32-bit signless integer values of length 8 |
| c | fixed-length vector of 32-bit float values of length 16 |
| index | 32-bit signless integer |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 16 |
rocdl.smfmac.f32.32x32x64.fp8.bf8 (ROCDL::smfmac_f32_32x32x64_fp8_bf8)
Syntax:
operation ::= `rocdl.smfmac.f32.32x32x64.fp8.bf8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 32-bit signless integer values of length 4 |
| b | fixed-length vector of 32-bit signless integer values of length 8 |
| c | fixed-length vector of 32-bit float values of length 16 |
| index | 32-bit signless integer |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 16 |
rocdl.smfmac.f32.32x32x64.fp8.fp8 (ROCDL::smfmac_f32_32x32x64_fp8_fp8)
Syntax:
operation ::= `rocdl.smfmac.f32.32x32x64.fp8.fp8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 32-bit signless integer values of length 4 |
| b | fixed-length vector of 32-bit signless integer values of length 8 |
| c | fixed-length vector of 32-bit float values of length 16 |
| index | 32-bit signless integer |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit float values of length 16 |
rocdl.smfmac.i32.16x16x128.i8 (ROCDL::smfmac_i32_16x16x128_i8)
Syntax:
operation ::= `rocdl.smfmac.i32.16x16x128.i8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 32-bit signless integer values of length 4 |
| b | fixed-length vector of 32-bit signless integer values of length 8 |
| c | fixed-length vector of 32-bit signless integer values of length 4 |
| index | 32-bit signless integer |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit signless integer values of length 4 |
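The shared example shows the `16x16x64.i8` form. For this `16x16x128.i8` op the input vectors are twice as long while the accumulator stays at four `i32` lanes. A minimal sketch, assuming the types from the tables above, follows. The function name `@smfmac_i32_16x16x128_i8`, the SSA value names, and the zero values for `cbsz` and `abid` are illustrative assumptions.

```mlir
// Hypothetical wrapper function; types follow the operand and result tables.
llvm.func @smfmac_i32_16x16x128_i8(%a: vector<4xi32>, %b: vector<8xi32>,
                                   %c: vector<4xi32>, %idx: i32) -> vector<4xi32> {
  // i8 inputs are carried in i32 vectors; accumulation is in i32.
  %r = rocdl.smfmac.i32.16x16x128.i8 %a, %b, %c, %idx, 0, 0 :
      (vector<4xi32>, vector<8xi32>, vector<4xi32>, i32) -> vector<4xi32>
  llvm.return %r : vector<4xi32>
}
```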
rocdl.smfmac.i32.16x16x64.i8 (ROCDL::smfmac_i32_16x16x64_i8)
Syntax:
operation ::= `rocdl.smfmac.i32.16x16x64.i8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 32-bit signless integer values of length 2 |
| b | fixed-length vector of 32-bit signless integer values of length 4 |
| c | fixed-length vector of 32-bit signless integer values of length 4 |
| index | 32-bit signless integer |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit signless integer values of length 4 |
rocdl.smfmac.i32.32x32x32.i8 (ROCDL::smfmac_i32_32x32x32_i8)
Syntax:
operation ::= `rocdl.smfmac.i32.32x32x32.i8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 32-bit signless integer values of length 2 |
| b | fixed-length vector of 32-bit signless integer values of length 4 |
| c | fixed-length vector of 32-bit signless integer values of length 16 |
| index | 32-bit signless integer |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit signless integer values of length 16 |
rocdl.smfmac.i32.32x32x64.i8 (ROCDL::smfmac_i32_32x32x64_i8)
Syntax:
operation ::= `rocdl.smfmac.i32.32x32x64.i8` $a `,` $b `,` $c `,` $index `,` $cbsz `,` $abid attr-dict `:` functional-type(operands, $res)
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4
structured sparsity. The index operand provides the sparsity metadata,
and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
(vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
(vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
(vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| cbsz | ::mlir::IntegerAttr | 32-bit signless integer attribute |
| abid | ::mlir::IntegerAttr | 32-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | fixed-length vector of 32-bit signless integer values of length 4 |
| b | fixed-length vector of 32-bit signless integer values of length 8 |
| c | fixed-length vector of 32-bit signless integer values of length 16 |
| index | 32-bit signless integer |
Results:
| Result | Description |
|---|---|
| res | fixed-length vector of 32-bit signless integer values of length 16 |
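The largest integer variant combines the longer `i32` input vectors with a 16-lane `i32` accumulator. A minimal sketch, assuming the types from the tables above, follows. The function name `@smfmac_i32_32x32x64_i8`, the SSA value names, and the zero values for `cbsz` and `abid` are illustrative assumptions.

```mlir
// Hypothetical wrapper function; types follow the operand and result tables.
llvm.func @smfmac_i32_32x32x64_i8(%a: vector<4xi32>, %b: vector<8xi32>,
                                  %c: vector<16xi32>, %idx: i32) -> vector<16xi32> {
  %r = rocdl.smfmac.i32.32x32x64.i8 %a, %b, %c, %idx, 0, 0 :
      (vector<4xi32>, vector<8xi32>, vector<16xi32>, i32) -> vector<16xi32>
  llvm.return %r : vector<16xi32>
}
```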
rocdl.sqrt (ROCDL::ROCDLSqrt)
Syntax:
operation ::= `rocdl.sqrt` $arg qualified(type($arg)) attr-dict `->` qualified(type($res))
Note: In the general case, prefer the conventional arith, math, or llvm ops over this one.
Use this ROCDL-specific operation only when you fully understand its implications and
when it is strictly necessary. This op is usually chosen when a small loss in precision is
acceptable in exchange for higher execution speed.
Example:
%0 = rocdl.sqrt %a f32 -> f32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands:
| Operand | Description |
|---|---|
| arg | floating point LLVM type |
Results:
| Result | Description |
|---|---|
| res | floating point LLVM type |
rocdl.swmmac.bf16.16x16x32.bf16 (ROCDL::swmmac_bf16_16x16x32_bf16)
Syntax:
operation ::= `rocdl.swmmac.bf16.16x16x32.bf16` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Operands:
| Operand | Description |
|---|---|
| a | LLVM dialect-compatible vector of integer |
| b | LLVM dialect-compatible vector of integer |
| c | LLVM dialect-compatible vector of integer |
| index | 32-bit signless integer |
Results:
| Result | Description |
|---|---|
| res | LLVM dialect-compatible vector of integer |
rocdl.swmmac.bf16.16x16x64.bf16 (ROCDL::swmmac_bf16_16x16x64_bf16)
Syntax:
operation ::= `rocdl.swmmac.bf16.16x16x64.bf16` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| signA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| signB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
| b | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
| c | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
| index | 32-bit signless integer |
Results:
| Result | Description |
|---|---|
| res | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
rocdl.swmmac.bf16f32.16x16x64.bf16 (ROCDL::swmmac_bf16f32_16x16x64_bf16)
Syntax:
operation ::= `rocdl.swmmac.bf16f32.16x16x64.bf16` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| signA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| signB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
| b | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
| c | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
| index | 32-bit signless integer |
Results:
| Result | Description |
|---|---|
| res | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
rocdl.swmmac.f16.16x16x128.bf8.bf8 (ROCDL::swmmac_f16_16x16x128_bf8_bf8)
Syntax:
operation ::= `rocdl.swmmac.f16.16x16x128.bf8.bf8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | integer or LLVM dialect-compatible vector of integer |
| b | integer or LLVM dialect-compatible vector of integer |
| c | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
| index | 32-bit signless integer |
Results:
| Result | Description |
|---|---|
| res | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
rocdl.swmmac.f16.16x16x128.bf8.fp8 (ROCDL::swmmac_f16_16x16x128_bf8_fp8)
Syntax:
operation ::= `rocdl.swmmac.f16.16x16x128.bf8.fp8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
| reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands:
| Operand | Description |
|---|---|
| a | integer or LLVM dialect-compatible vector of integer |
| b | integer or LLVM dialect-compatible vector of integer |
| c | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
| index | 32-bit signless integer |
Results:
| Result | Description |
|---|---|
| res | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
rocdl.swmmac.f16.16x16x128.fp8.bf8 (ROCDL::swmmac_f16_16x16x128_fp8_bf8) ¶
Syntax:
operation ::= `rocdl.swmmac.f16.16x16x128.fp8.bf8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
rocdl.swmmac.f16.16x16x128.fp8.fp8 (ROCDL::swmmac_f16_16x16x128_fp8_fp8) ¶
Syntax:
operation ::= `rocdl.swmmac.f16.16x16x128.fp8.fp8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
rocdl.swmmac.f16.16x16x32.f16 (ROCDL::swmmac_f16_16x16x32_f16) ¶
Syntax:
operation ::= `rocdl.swmmac.f16.16x16x32.f16` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Operands: ¶
| Operand | Description |
|---|---|
a | LLVM dialect-compatible vector of 16-bit float |
b | LLVM dialect-compatible vector of 16-bit float |
c | LLVM dialect-compatible vector of 16-bit float |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible vector of 16-bit float |
rocdl.swmmac.f16.16x16x64.f16 (ROCDL::swmmac_f16_16x16x64_f16) ¶
Syntax:
operation ::= `rocdl.swmmac.f16.16x16x64.f16` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
signA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
signB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
b | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
c | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
rocdl.swmmac.f32.16x16x128.bf8.bf8 (ROCDL::swmmac_f32_16x16x128_bf8_bf8) ¶
Syntax:
operation ::= `rocdl.swmmac.f32.16x16x128.bf8.bf8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.swmmac.f32.16x16x128.bf8.fp8 (ROCDL::swmmac_f32_16x16x128_bf8_fp8) ¶
Syntax:
operation ::= `rocdl.swmmac.f32.16x16x128.bf8.fp8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.swmmac.f32.16x16x128.fp8.bf8 (ROCDL::swmmac_f32_16x16x128_fp8_bf8) ¶
Syntax:
operation ::= `rocdl.swmmac.f32.16x16x128.fp8.bf8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.swmmac.f32.16x16x128.fp8.fp8 (ROCDL::swmmac_f32_16x16x128_fp8_fp8) ¶
Syntax:
operation ::= `rocdl.swmmac.f32.16x16x128.fp8.fp8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.swmmac.f32.16x16x32.bf16 (ROCDL::swmmac_f32_16x16x32_bf16) ¶
Syntax:
operation ::= `rocdl.swmmac.f32.16x16x32.bf16` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Operands: ¶
| Operand | Description |
|---|---|
a | LLVM dialect-compatible vector of integer |
b | LLVM dialect-compatible vector of integer |
c | LLVM dialect-compatible vector of 32-bit float |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible vector of 32-bit float |
rocdl.swmmac.f32.16x16x32.bf8.bf8 (ROCDL::swmmac_f32_16x16x32_bf8_bf8) ¶
Syntax:
operation ::= `rocdl.swmmac.f32.16x16x32.bf8.bf8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.swmmac.f32.16x16x32.bf8.fp8 (ROCDL::swmmac_f32_16x16x32_bf8_fp8) ¶
Syntax:
operation ::= `rocdl.swmmac.f32.16x16x32.bf8.fp8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.swmmac.f32.16x16x32.f16 (ROCDL::swmmac_f32_16x16x32_f16) ¶
Syntax:
operation ::= `rocdl.swmmac.f32.16x16x32.f16` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Operands: ¶
| Operand | Description |
|---|---|
a | LLVM dialect-compatible vector of 16-bit float |
b | LLVM dialect-compatible vector of 16-bit float |
c | LLVM dialect-compatible vector of 32-bit float |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible vector of 32-bit float |
rocdl.swmmac.f32.16x16x32.fp8.bf8 (ROCDL::swmmac_f32_16x16x32_fp8_bf8) ¶
Syntax:
operation ::= `rocdl.swmmac.f32.16x16x32.fp8.bf8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.swmmac.f32.16x16x32.fp8.fp8 (ROCDL::swmmac_f32_16x16x32_fp8_fp8) ¶
Syntax:
operation ::= `rocdl.swmmac.f32.16x16x32.fp8.fp8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.swmmac.f32.16x16x64.bf16 (ROCDL::swmmac_f32_16x16x64_bf16) ¶
Syntax:
operation ::= `rocdl.swmmac.f32.16x16x64.bf16` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
signA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
signB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
b | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.swmmac.f32.16x16x64.f16 (ROCDL::swmmac_f32_16x16x64_f16) ¶
Syntax:
operation ::= `rocdl.swmmac.f32.16x16x64.f16` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
signA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
signB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
b | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.swmmac.i32.16x16x128.iu8 (ROCDL::swmmac_i32_16x16x128_iu8) ¶
Syntax:
operation ::= `rocdl.swmmac.i32.16x16x128.iu8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
signA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
signB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
clamp | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | integer or LLVM dialect-compatible vector of integer |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | integer or LLVM dialect-compatible vector of integer |
rocdl.swmmac.i32.16x16x32.iu4 (ROCDL::swmmac_i32_16x16x32_iu4) ¶
Syntax:
operation ::= `rocdl.swmmac.i32.16x16x32.iu4` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
signA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
signB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
clamp | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | integer or LLVM dialect-compatible vector of integer |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | integer or LLVM dialect-compatible vector of integer |
rocdl.swmmac.i32.16x16x32.iu8 (ROCDL::swmmac_i32_16x16x32_iu8) ¶
Syntax:
operation ::= `rocdl.swmmac.i32.16x16x32.iu8` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
signA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
signB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
clamp | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | integer or LLVM dialect-compatible vector of integer |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | integer or LLVM dialect-compatible vector of integer |
rocdl.swmmac.i32.16x16x64.iu4 (ROCDL::swmmac_i32_16x16x64_iu4) ¶
Syntax:
operation ::= `rocdl.swmmac.i32.16x16x64.iu4` $a `,` $b `,` $c `,` $index attr-dict `:` functional-type(operands, $res)
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
signA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
signB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
clamp | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | integer or LLVM dialect-compatible vector of integer |
index | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | integer or LLVM dialect-compatible vector of integer |
rocdl.tanh (ROCDL::ROCDLTanh) ¶
Syntax:
operation ::= `rocdl.tanh` $arg qualified(type($arg)) attr-dict `->` qualified(type($res))
Note: In the general case, prefer the conventional arith, math, or llvm ops over this one.
Use this ROCDL-specific operation only when you fully understand its implications and
when it is strictly necessary. This op is usually chosen when a small loss in precision is
acceptable in exchange for higher execution speed.
Example:
%0 = rocdl.tanh %a f32 -> f32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
| Operand | Description |
|---|---|
arg | floating point LLVM type |
Results: ¶
| Result | Description |
|---|---|
res | floating point LLVM type |
rocdl.tensor.load.to.lds (ROCDL::TensorLoadToLDSOp) ¶
Tensor load from global memory to LDS.
Syntax:
operation ::= `rocdl.tensor.load.to.lds` attr-dict operands `cachepolicy` $cachePolicy `:` type($dgroup0) `,` type($dgroup1)
Moves tiles of tensor data between global memory and LDS. The tile is described by the $dgroup descriptors; the five $dgroup descriptors allow movement of tensors of up to five dimensions. $cachePolicy describes the memory scope and gives a hint about expected data reuse.
This op is for gfx1250+ architectures.
Example:
// Tensor load from global memory to LDS using 4 descriptor groups.
rocdl.tensor.load.to.lds %dg0, %dg1, %dg2, %dg3 cachepolicy 0 : vector<4xi32>, vector<8xi32>
// Tensor store from LDS to global memory using 4 descriptor groups.
rocdl.tensor.store.from.lds %dg0, %dg1, %dg2, %dg3 cachepolicy 0 : vector<4xi32>, vector<8xi32>
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cachePolicy | ::mlir::IntegerAttr | 32-bit signless integer attribute |
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
dgroup0 | fixed-length vector of 32-bit signless integer values of length 4 |
dgroup1 | fixed-length vector of 32-bit signless integer values of length 8 |
dgroup2 | fixed-length vector of 32-bit signless integer values of length 4 |
dgroup3 | fixed-length vector of 32-bit signless integer values of length 4 |
dgroup4 | fixed-length vector of 32-bit signless integer values of length 8 |
rocdl.tensor.store.from.lds (ROCDL::TensorStoreFromLDSOp) ¶
Tensor store from LDS to global memory.
Syntax:
operation ::= `rocdl.tensor.store.from.lds` attr-dict operands `cachepolicy` $cachePolicy `:` type($dgroup0) `,` type($dgroup1)
Moves tiles of tensor data between global memory and LDS. The tile is described by the $dgroup descriptors; the five $dgroup descriptors allow movement of tensors of up to five dimensions. $cachePolicy describes the memory scope and gives a hint about expected data reuse.
This op is for gfx1250+ architectures.
Example:
// Tensor load from global memory to LDS using 4 descriptor groups.
rocdl.tensor.load.to.lds %dg0, %dg1, %dg2, %dg3 cachepolicy 0 : vector<4xi32>, vector<8xi32>
// Tensor store from LDS to global memory using 4 descriptor groups.
rocdl.tensor.store.from.lds %dg0, %dg1, %dg2, %dg3 cachepolicy 0 : vector<4xi32>, vector<8xi32>
Interfaces: AliasAnalysisOpInterface
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
cachePolicy | ::mlir::IntegerAttr | 32-bit signless integer attribute |
alias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
noalias_scopes | ::mlir::ArrayAttr | LLVM dialect alias scope array |
tbaa | ::mlir::ArrayAttr | LLVM dialect TBAA tag metadata array |
Operands: ¶
| Operand | Description |
|---|---|
dgroup0 | fixed-length vector of 32-bit signless integer values of length 4 |
dgroup1 | fixed-length vector of 32-bit signless integer values of length 8 |
dgroup2 | fixed-length vector of 32-bit signless integer values of length 4 |
dgroup3 | fixed-length vector of 32-bit signless integer values of length 4 |
dgroup4 | fixed-length vector of 32-bit signless integer values of length 8 |
rocdl.update.dpp (ROCDL::DPPUpdateOp) ¶
Syntax:
operation ::= `rocdl.update.dpp` attr-dict $old `,` $src `with` $dppCtrl `,` $rowMask `,` $bankMask `,` $boundCtrl `:` type($src)
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
dppCtrl | ::mlir::IntegerAttr | 32-bit signless integer attribute |
rowMask | ::mlir::IntegerAttr | 32-bit signless integer attribute |
bankMask | ::mlir::IntegerAttr | 32-bit signless integer attribute |
boundCtrl | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
old | LLVM dialect-compatible type |
src | LLVM dialect-compatible type |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.wait.asyncmark (ROCDL::WaitAsyncmarkOp) ¶
Wait until at most N async operation groups remain unfinished
Syntax:
operation ::= `rocdl.wait.asyncmark` $count attr-dict
This operation, along with rocdl.asyncmark, forms the compiler-provided
framework for explicitly tracking asynchronous operations.
Once a wait.asyncmark operation has executed, every async operation that belongs
to an async group (as established by asyncmark in program order) will have finished
executing, except for the operations in the count most recently added groups.
For more detail, including on how this mechanism composes with function calls, see the LLVM documentation on async tracking.
Available on gfx9 and later.
Example:
// Wait until at most N async groups remain outstanding.
rocdl.wait.asyncmark 1
Usage example:
rocdl.tensor.load.to.lds ...
rocdl.global.async.load.to.lds ...
rocdl.asyncmark
rocdl.tensor.load.to.lds ...
rocdl.global.async.load.to.lds ...
rocdl.asyncmark
rocdl.wait.asyncmark 1 // First group of loads completes after this
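The bookkeeping in the usage example above can be sketched as a toy model. This is only an illustration of the group-counting rule described in this section, not of the hardware or of the LLVM implementation; the `AsyncTracker` class and its method names are hypothetical and exist only for this sketch.

```python
from collections import deque


class AsyncTracker:
    """Toy model of asyncmark / wait.asyncmark bookkeeping (hypothetical helper)."""

    def __init__(self):
        self.pending_ops = []   # async ops issued since the last asyncmark
        self.groups = deque()   # outstanding groups, oldest first
        self.completed = []     # ops known to have finished

    def issue(self, name):
        # Record an async operation in the group that is currently open.
        self.pending_ops.append(name)

    def asyncmark(self):
        # Close the current group, in program order.
        self.groups.append(self.pending_ops)
        self.pending_ops = []

    def wait_asyncmark(self, count):
        # Retire the oldest groups until at most `count` remain outstanding.
        while len(self.groups) > count:
            self.completed.extend(self.groups.popleft())


tracker = AsyncTracker()
tracker.issue("tensor_load_0")
tracker.issue("global_async_load_0")
tracker.asyncmark()
tracker.issue("tensor_load_1")
tracker.issue("global_async_load_1")
tracker.asyncmark()
tracker.wait_asyncmark(1)
print(tracker.completed)  # ['tensor_load_0', 'global_async_load_0']
```

With a count of 1, only the most recently added group is allowed to remain outstanding, which matches the comment in the usage example that the first group of loads has completed after the wait.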
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
count | ::mlir::IntegerAttr | 16-bit signless integer attribute |
rocdl.wave.id (ROCDL::WaveId) ¶
Syntax:
operation ::= `rocdl.wave.id` (`range` $range^)? attr-dict `:` type($res)
Read a hardware register for thread/workgroup/cluster identification.
An optional range attribute can constrain the returned value.
Example:
// Read the wave id.
%0 = rocdl.wave.id : i32
// Read with a known range constraint.
%1 = rocdl.wave.id range <i32, 0, 64> : i32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
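To make the meaning of a range such as `<i32, 0, 64>` concrete, here is a minimal sketch of a membership check. It assumes the usual LLVM ConstantRange convention of a half-open interval [lower, upper) that may wrap around; the helper name `in_constant_range` and its `bits` parameter are hypothetical and not part of MLIR or LLVM.

```python
def in_constant_range(value, lower, upper, bits=32):
    """Return True if `value` lies in the half-open range [lower, upper).

    All values are reduced modulo 2**bits. A range with lower > upper is
    treated as wrapping around the top of the integer space.
    """
    mask = (1 << bits) - 1
    value, lower, upper = value & mask, lower & mask, upper & mask
    if lower == upper:
        # Full and empty ranges are deliberately left out of this sketch.
        raise ValueError("lower == upper is not modeled here")
    if lower < upper:
        return lower <= value < upper
    return value >= lower or value < upper  # wrapped range


print(in_constant_range(63, 0, 64))  # True
print(in_constant_range(64, 0, 64))  # False
```

Under this reading, `range <i32, 0, 64>` admits the values 0 through 63 and excludes 64.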
rocdl.wavefrontsize (ROCDL::WavefrontSizeOp) ¶
Syntax:
operation ::= `rocdl.wavefrontsize` (`range` $range^)? attr-dict `:` type($res)
Read a hardware register for thread/workgroup/cluster identification.
An optional range attribute can constrain the returned value.
Example:
// Read the wavefront size.
%0 = rocdl.wavefrontsize : i32
// Read with a known range constraint.
%1 = rocdl.wavefrontsize range <i32, 32, 65> : i32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange |
Results: ¶
| Result | Description |
|---|---|
res | LLVM dialect-compatible type |
rocdl.wmma.bf16.16x16x16.bf16 (ROCDL::wmma_bf16_16x16x16_bf16) ¶
Syntax:
operation ::= `rocdl.wmma.bf16.16x16x16.bf16` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) with output operand selection.
Example:
// WMMA f16 with opsel control.
%r = rocdl.wmma.f16.16x16x16.f16 %a, %b, %c {opsel = false} :
(vector<16xf16>, vector<16xf16>, vector<16xf16>) -> vector<16xf16>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
opsel | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | integer or LLVM dialect-compatible vector of integer |
Results: ¶
| Result | Description |
|---|---|
res | integer or LLVM dialect-compatible vector of integer |
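As a rough mental model for the WMMA operations in this section, the sketch below computes a plain multiply-accumulate D = A x B + C on nested lists. It assumes that the `16x16x16` in the op name reads as M x N x K; it does not model how matrix elements are distributed across the lanes of a wave, the reduced-precision arithmetic, or attributes such as opsel. The function name `wmma_reference` is hypothetical.

```python
def wmma_reference(a, b, c):
    """Compute D = A x B + C for row-major nested lists.

    a is M x K, b is K x N, c is M x N; returns a new M x N matrix.
    """
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "A must be M x K"
    assert len(c) == m and all(len(row) == n for row in c), "C must be M x N"
    return [
        [c[i][j] + sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


# 16x16x16 tile: A is all ones, B is the identity, C is all 0.5.
M = N = K = 16
A = [[1.0] * K for _ in range(M)]
B = [[1.0 if p == j else 0.0 for j in range(N)] for p in range(K)]
C = [[0.5] * N for _ in range(M)]
D = wmma_reference(A, B, C)
print(D[0][0])  # 1.5
```

Because B is the identity here, every entry of D is the corresponding entry of A plus the accumulator value, which makes the result easy to check by hand.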
rocdl.wmma.bf16.16x16x32.bf16 (ROCDL::wmma_bf16_16x16x32_bf16) ¶
Syntax:
operation ::= `rocdl.wmma.bf16.16x16x32.bf16` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) with sign, modC, and reuse controls.
Example:
// WMMA f32 with f16 inputs and reuse controls.
%r = rocdl.wmma.f32.16x16x32.f16 %a, %b, %c :
(vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
signA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
signB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
b | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
c | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
Results: ¶
| Result | Description |
|---|---|
res | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
rocdl.wmma.bf16f32.16x16x32.bf16 (ROCDL::wmma_bf16f32_16x16x32_bf16) ¶
Syntax:
operation ::= `rocdl.wmma.bf16f32.16x16x32.bf16` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) with different C and D types.
Example:
// WMMA bf16 output from f32 accumulator with bf16 inputs.
%r = rocdl.wmma.bf16f32.16x16x32.bf16 %a, %b, %c :
(vector<16xbf16>, vector<16xbf16>, vector<8xf32>) -> vector<16xbf16>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
signA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
signB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
b | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
rocdl.wmma.f16.16x16x128.bf8_bf8 (ROCDL::wmma_f16_16x16x128_bf8_bf8) ¶
Syntax:
operation ::= `rocdl.wmma.f16.16x16x128.bf8_bf8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example:
// WMMA f32 with fp8 inputs and modC/reuse controls.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
(vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
rocdl.wmma.f16.16x16x128.bf8_fp8 (ROCDL::wmma_f16_16x16x128_bf8_fp8) ¶
Syntax:
operation ::= `rocdl.wmma.f16.16x16x128.bf8_fp8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example:
// WMMA f32 with fp8 inputs and modC/reuse controls.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
(vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
rocdl.wmma.f16.16x16x128.fp8_bf8 (ROCDL::wmma_f16_16x16x128_fp8_bf8) ¶
Syntax:
operation ::= `rocdl.wmma.f16.16x16x128.fp8_bf8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example:
// WMMA f32 with fp8 inputs and modC/reuse controls.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
(vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
rocdl.wmma.f16.16x16x128.fp8_fp8 (ROCDL::wmma_f16_16x16x128_fp8_fp8) ¶
Syntax:
operation ::= `rocdl.wmma.f16.16x16x128.fp8_fp8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example:
// WMMA f32 with fp8 inputs and modC/reuse controls.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
(vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
rocdl.wmma.f16.16x16x16.f16 (ROCDL::wmma_f16_16x16x16_f16) ¶
Syntax:
operation ::= `rocdl.wmma.f16.16x16x16.f16` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) with output operand selection.
Example:
// WMMA f16 with opsel control.
%r = rocdl.wmma.f16.16x16x16.f16 %a, %b, %c {opsel = false} :
(vector<16xf16>, vector<16xf16>, vector<16xf16>) -> vector<16xf16>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
opsel | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
b | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
c | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
rocdl.wmma.f16.16x16x32.f16 (ROCDL::wmma_f16_16x16x32_f16) ¶
Syntax:
operation ::= `rocdl.wmma.f16.16x16x32.f16` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) with sign, modC, and reuse controls.
Example:
// WMMA f32 with f16 inputs and reuse controls.
%r = rocdl.wmma.f32.16x16x32.f16 %a, %b, %c :
(vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
signA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
signB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
b | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
c | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
rocdl.wmma.f16.16x16x64.bf8_bf8 (ROCDL::wmma_f16_16x16x64_bf8_bf8) ¶
Syntax:
operation ::= `rocdl.wmma.f16.16x16x64.bf8_bf8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example:
// Shown for the sibling op rocdl.wmma.f32.16x16x64.fp8_fp8; this op has the same operand order but uses f16 for c and the result.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
(vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
rocdl.wmma.f16.16x16x64.bf8_fp8 (ROCDL::wmma_f16_16x16x64_bf8_fp8) ¶
Syntax:
operation ::= `rocdl.wmma.f16.16x16x64.bf8_fp8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example:
// Shown for the sibling op rocdl.wmma.f32.16x16x64.fp8_fp8; this op has the same operand order but uses f16 for c and the result.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
(vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
rocdl.wmma.f16.16x16x64.fp8_bf8 (ROCDL::wmma_f16_16x16x64_fp8_bf8) ¶
Syntax:
operation ::= `rocdl.wmma.f16.16x16x64.fp8_bf8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example:
// Shown for the sibling op rocdl.wmma.f32.16x16x64.fp8_fp8; this op has the same operand order but uses f16 for c and the result.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
(vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
rocdl.wmma.f16.16x16x64.fp8_fp8 (ROCDL::wmma_f16_16x16x64_fp8_fp8) ¶
Syntax:
operation ::= `rocdl.wmma.f16.16x16x64.fp8_fp8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example:
// Shown for the sibling op rocdl.wmma.f32.16x16x64.fp8_fp8; this op has the same operand order but uses f16 for c and the result.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
(vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
rocdl.wmma.f32.16x16x128.bf8_bf8 (ROCDL::wmma_f32_16x16x128_bf8_bf8) ¶
Syntax:
operation ::= `rocdl.wmma.f32.16x16x128.bf8_bf8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example:
// Shown for the sibling op rocdl.wmma.f32.16x16x64.fp8_fp8; this op (16x16x128) has the same operand order and attributes.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
(vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.wmma.f32.16x16x128.bf8_fp8 (ROCDL::wmma_f32_16x16x128_bf8_fp8) ¶
Syntax:
operation ::= `rocdl.wmma.f32.16x16x128.bf8_fp8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example:
// Shown for the sibling op rocdl.wmma.f32.16x16x64.fp8_fp8; this op (16x16x128) has the same operand order and attributes.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
(vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.wmma.f32.16x16x128.fp8_bf8 (ROCDL::wmma_f32_16x16x128_fp8_bf8) ¶
Syntax:
operation ::= `rocdl.wmma.f32.16x16x128.fp8_bf8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example:
// Shown for the sibling op rocdl.wmma.f32.16x16x64.fp8_fp8; this op (16x16x128) has the same operand order and attributes.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
(vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.wmma.f32.16x16x128.fp8_fp8 (ROCDL::wmma_f32_16x16x128_fp8_fp8) ¶
Syntax:
operation ::= `rocdl.wmma.f32.16x16x128.fp8_fp8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example:
// Shown for the sibling op rocdl.wmma.f32.16x16x64.fp8_fp8; this op (16x16x128) has the same operand order and attributes.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
(vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.wmma.f32.16x16x16.bf16 (ROCDL::wmma_f32_16x16x16_bf16) ¶
Syntax:
operation ::= `rocdl.wmma.f32.16x16x16.bf16` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) intrinsic.
Example:
// Shown for the sibling op rocdl.wmma.f32.16x16x16.f16; this op takes integer-typed a and b (see Operands).
%r = rocdl.wmma.f32.16x16x16.f16 %a, %b, %c :
(vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.wmma.f32.16x16x16.bf8_bf8 (ROCDL::wmma_f32_16x16x16_bf8_bf8) ¶
Syntax:
operation ::= `rocdl.wmma.f32.16x16x16.bf8_bf8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) intrinsic.
Example:
// Shown for the sibling op rocdl.wmma.f32.16x16x16.f16; this op takes integer-typed a and b (see Operands).
%r = rocdl.wmma.f32.16x16x16.f16 %a, %b, %c :
(vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.wmma.f32.16x16x16.bf8_fp8 (ROCDL::wmma_f32_16x16x16_bf8_fp8) ¶
Syntax:
operation ::= `rocdl.wmma.f32.16x16x16.bf8_fp8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) intrinsic.
Example:
// Shown for the sibling op rocdl.wmma.f32.16x16x16.f16; this op takes integer-typed a and b (see Operands).
%r = rocdl.wmma.f32.16x16x16.f16 %a, %b, %c :
(vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.wmma.f32.16x16x16.f16 (ROCDL::wmma_f32_16x16x16_f16) ¶
Syntax:
operation ::= `rocdl.wmma.f32.16x16x16.f16` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) intrinsic.
Example:
// WMMA with f16 inputs and f32 accumulator.
%r = rocdl.wmma.f32.16x16x16.f16 %a, %b, %c :
(vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>
Operands: ¶
| Operand | Description |
|---|---|
a | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
b | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.wmma.f32.16x16x16.fp8_bf8 (ROCDL::wmma_f32_16x16x16_fp8_bf8) ¶
Syntax:
operation ::= `rocdl.wmma.f32.16x16x16.fp8_bf8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) intrinsic.
Example:
// Shown for the sibling op rocdl.wmma.f32.16x16x16.f16; this op takes integer-typed a and b (see Operands).
%r = rocdl.wmma.f32.16x16x16.f16 %a, %b, %c :
(vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.wmma.f32.16x16x16.fp8_fp8 (ROCDL::wmma_f32_16x16x16_fp8_fp8) ¶
Syntax:
operation ::= `rocdl.wmma.f32.16x16x16.fp8_fp8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) intrinsic.
Example:
// Shown for the sibling op rocdl.wmma.f32.16x16x16.f16; this op takes integer-typed a and b (see Operands).
%r = rocdl.wmma.f32.16x16x16.f16 %a, %b, %c :
(vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.wmma.f32.16x16x32.bf16 (ROCDL::wmma_f32_16x16x32_bf16) ¶
Syntax:
operation ::= `rocdl.wmma.f32.16x16x32.bf16` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) with sign, modC, and reuse controls.
Example:
// Shown for the sibling op rocdl.wmma.f32.16x16x32.f16; this op takes bfloat16 a and b (see Operands).
%r = rocdl.wmma.f32.16x16x32.f16 %a, %b, %c :
(vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
signA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
signB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
b | bfloat16 type or LLVM dialect-compatible vector of bfloat16 type |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.wmma.f32.16x16x32.f16 (ROCDL::wmma_f32_16x16x32_f16) ¶
Syntax:
operation ::= `rocdl.wmma.f32.16x16x32.f16` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) with sign, modC, and reuse controls.
Example:
// WMMA f32 with f16 inputs and reuse controls.
%r = rocdl.wmma.f32.16x16x32.f16 %a, %b, %c :
(vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
signA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
signB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
b | 16-bit float or LLVM dialect-compatible vector of 16-bit float |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
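The example for this op omits the attributes listed in its table. As a sketch only, they could be spelled out in the attribute dictionary as below. The values `false` and `0 : i16` are assumptions chosen for illustration, not documented defaults, and the meaning of each `modC` encoding is not described here.

```mlir
// Hypothetical explicit spelling of every attribute; the values are assumed.
%r = rocdl.wmma.f32.16x16x32.f16 %a, %b, %c
  {signA = false, signB = false, modC = 0 : i16, reuseA = false, reuseB = false} :
  (vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>
```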
rocdl.wmma.f32.16x16x4.f32 (ROCDL::wmma_f32_16x16x4_f32) ¶
Syntax:
operation ::= `rocdl.wmma.f32.16x16x4.f32` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) with sign, modC, and reuse controls.
Example:
// Shown for the sibling op rocdl.wmma.f32.16x16x32.f16; this op takes f32 a and b (see Operands).
%r = rocdl.wmma.f32.16x16x32.f16 %a, %b, %c :
(vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
signA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
signB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
b | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.wmma.f32.16x16x64.bf8_bf8 (ROCDL::wmma_f32_16x16x64_bf8_bf8) ¶
Syntax:
operation ::= `rocdl.wmma.f32.16x16x64.bf8_bf8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example:
// Shown for the sibling op rocdl.wmma.f32.16x16x64.fp8_fp8; this op has the same operand order and attributes.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
(vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.wmma.f32.16x16x64.bf8_fp8 (ROCDL::wmma_f32_16x16x64_bf8_fp8) ¶
Syntax:
operation ::= `rocdl.wmma.f32.16x16x64.bf8_fp8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example:
// Shown for the sibling op rocdl.wmma.f32.16x16x64.fp8_fp8; this op has the same operand order and attributes.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
(vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.wmma.f32.16x16x64.fp8_bf8 (ROCDL::wmma_f32_16x16x64_fp8_bf8) ¶
Syntax:
operation ::= `rocdl.wmma.f32.16x16x64.fp8_bf8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example:
// Shown for the sibling op rocdl.wmma.f32.16x16x64.fp8_fp8; this op has the same operand order and attributes.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
(vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.wmma.f32.16x16x64.fp8_fp8 (ROCDL::wmma_f32_16x16x64_fp8_fp8) ¶
Syntax:
operation ::= `rocdl.wmma.f32.16x16x64.fp8_fp8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example:
// WMMA f32 with fp8 inputs and modC/reuse controls.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c :
(vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.wmma.i32.16x16x16.iu4 (ROCDL::wmma_i32_16x16x16_iu4) ¶
Syntax:
operation ::= `rocdl.wmma.i32.16x16x16.iu4` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) for integer types with sign and clamp control.
Example:
// Shown for the sibling op rocdl.wmma.i32.16x16x16.iu8; this op has the same operand order and attributes.
%r = rocdl.wmma.i32.16x16x16.iu8 %a, %b, %c
{signA = false, signB = false, clamp = false} :
(vector<4xi32>, vector<4xi32>, vector<8xi32>) -> vector<8xi32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
signA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
signB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
clamp | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | integer or LLVM dialect-compatible vector of integer |
Results: ¶
| Result | Description |
|---|---|
res | integer or LLVM dialect-compatible vector of integer |
rocdl.wmma.i32.16x16x16.iu8 (ROCDL::wmma_i32_16x16x16_iu8) ¶
Syntax:
operation ::= `rocdl.wmma.i32.16x16x16.iu8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) for integer types with sign and clamp control.
Example:
// WMMA i32 with unsigned i8 inputs.
%r = rocdl.wmma.i32.16x16x16.iu8 %a, %b, %c
{signA = false, signB = false, clamp = false} :
(vector<4xi32>, vector<4xi32>, vector<8xi32>) -> vector<8xi32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
signA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
signB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
clamp | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | integer or LLVM dialect-compatible vector of integer |
Results: ¶
| Result | Description |
|---|---|
res | integer or LLVM dialect-compatible vector of integer |
rocdl.wmma.i32.16x16x32.iu4 (ROCDL::wmma_i32_16x16x32_iu4) ¶
Syntax:
operation ::= `rocdl.wmma.i32.16x16x32.iu4` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) for integer types with sign and clamp control.
Example:
// Shown for the sibling op rocdl.wmma.i32.16x16x16.iu8; this op has the same operand order and attributes.
%r = rocdl.wmma.i32.16x16x16.iu8 %a, %b, %c
{signA = false, signB = false, clamp = false} :
(vector<4xi32>, vector<4xi32>, vector<8xi32>) -> vector<8xi32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
signA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
signB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
clamp | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | integer or LLVM dialect-compatible vector of integer |
Results: ¶
| Result | Description |
|---|---|
res | integer or LLVM dialect-compatible vector of integer |
rocdl.wmma.i32.16x16x64.iu8 (ROCDL::wmma_i32_16x16x64_iu8) ¶
Syntax:
operation ::= `rocdl.wmma.i32.16x16x64.iu8` $a `,` $b `,` $c attr-dict `:` functional-type(operands, $res)
Wave Matrix Multiply-Accumulate (WMMA) for integer types with sign, reuse, and clamp controls.
Example:
// WMMA i32 with unsigned i8 inputs and reuse controls.
%r = rocdl.wmma.i32.16x16x64.iu8 %a, %b, %c
{signA = false, signB = false, reuseA = false, reuseB = false, clamp = false} :
(vector<8xi32>, vector<8xi32>, vector<8xi32>) -> vector<8xi32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
signA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
signB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
clamp | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | integer or LLVM dialect-compatible vector of integer |
Results: ¶
| Result | Description |
|---|---|
res | integer or LLVM dialect-compatible vector of integer |
rocdl.wmma.scale.f32.16x16x128.f8f6f4 (ROCDL::wmma_scale_f32_16x16x128_f8f6f4) ¶
Syntax:
operation ::= `rocdl.wmma.scale.f32.16x16x128.f8f6f4` $a `,` $b `,` $c `,` $scaleA `,` $scaleB attr-dict `:` functional-type(operands, $res)
Scaled Wave Matrix Multiply-Accumulate (WMMA) with per-operand scaling.
Example:
// Scaled WMMA with f8f6f4 format inputs.
%r = rocdl.wmma.scale.f32.16x16x128.f8f6f4 %a, %b, %c, %scaleA, %scaleB :
(vector<16xi32>, vector<16xi32>, vector<8xf32>, i32, i32) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
fmtA | ::mlir::IntegerAttr | 32-bit signless integer attribute |
fmtB | ::mlir::IntegerAttr | 32-bit signless integer attribute |
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
scaleAType | ::mlir::IntegerAttr | 32-bit signless integer attribute |
fmtScaleA | ::mlir::IntegerAttr | 32-bit signless integer attribute |
scaleBType | ::mlir::IntegerAttr | 32-bit signless integer attribute |
fmtScaleB | ::mlir::IntegerAttr | 32-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
scaleA | 32-bit signless integer |
scaleB | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.wmma.scale.f32.32x16x128.f4 (ROCDL::wmma_scale_f32_32x16x128_f4) ¶
Syntax:
operation ::= `rocdl.wmma.scale.f32.32x16x128.f4` $a `,` $b `,` $c `,` $scaleA `,` $scaleB attr-dict `:` functional-type(operands, $res)
Scaled Wave Matrix Multiply-Accumulate (WMMA) for F4 format inputs.
Example:
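// Scaled WMMA with f4 format inputs. The name and shapes below are for a 16x16x128 form and are illustrative only for this 32x16x128 op.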
%r = rocdl.wmma.scale.f32.16x16x128.f4 %a, %b, %c, %scaleA, %scaleB :
(vector<8xi32>, vector<8xi32>, vector<8xf32>, i32, i32) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
scaleAType | ::mlir::IntegerAttr | 32-bit signless integer attribute |
fmtScaleA | ::mlir::IntegerAttr | 32-bit signless integer attribute |
scaleBType | ::mlir::IntegerAttr | 32-bit signless integer attribute |
fmtScaleB | ::mlir::IntegerAttr | 32-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
scaleA | 32-bit signless integer |
scaleB | 32-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.wmma.scale16.f32.16x16x128.f8f6f4 (ROCDL::wmma_scale16_f32_16x16x128_f8f6f4) ¶
Syntax:
operation ::= `rocdl.wmma.scale16.f32.16x16x128.f8f6f4` $a `,` $b `,` $c `,` $scaleA `,` $scaleB attr-dict `:` functional-type(operands, $res)
Scaled Wave Matrix Multiply-Accumulate (WMMA) with per-operand scaling.
Example:
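// Shown for the sibling op rocdl.wmma.scale.f32.16x16x128.f8f6f4, which takes i32 scales; this op takes i64 scaleA and scaleB (see Operands).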
%r = rocdl.wmma.scale.f32.16x16x128.f8f6f4 %a, %b, %c, %scaleA, %scaleB :
(vector<16xi32>, vector<16xi32>, vector<8xf32>, i32, i32) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
fmtA | ::mlir::IntegerAttr | 32-bit signless integer attribute |
fmtB | ::mlir::IntegerAttr | 32-bit signless integer attribute |
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
scaleAType | ::mlir::IntegerAttr | 32-bit signless integer attribute |
fmtScaleA | ::mlir::IntegerAttr | 32-bit signless integer attribute |
scaleBType | ::mlir::IntegerAttr | 32-bit signless integer attribute |
fmtScaleB | ::mlir::IntegerAttr | 32-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
scaleA | 64-bit signless integer |
scaleB | 64-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
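A form of this op consistent with its syntax and operand table would pass the two scales as `i64`. The sketch below assumes the vector shapes of `a`, `b`, and `c` carry over unchanged from the `rocdl.wmma.scale.f32.16x16x128.f8f6f4` example; that is an assumption, not something this page states.

```mlir
// Scaled WMMA with f8f6f4 format inputs and 64-bit scale operands.
// Vector shapes are assumed to match the i32-scale sibling op.
%r = rocdl.wmma.scale16.f32.16x16x128.f8f6f4 %a, %b, %c, %scaleA, %scaleB :
  (vector<16xi32>, vector<16xi32>, vector<8xf32>, i64, i64) -> vector<8xf32>
```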
rocdl.wmma.scale16.f32.32x16x128.f4 (ROCDL::wmma_scale16_f32_32x16x128_f4) ¶
Syntax:
operation ::= `rocdl.wmma.scale16.f32.32x16x128.f4` $a `,` $b `,` $c `,` $scaleA `,` $scaleB attr-dict `:` functional-type(operands, $res)
Scaled Wave Matrix Multiply-Accumulate (WMMA) for F4 format inputs.
Example:
// Shown with the rocdl.wmma.scale form and i32 scales; this op takes i64 scaleA and scaleB (see Operands).
%r = rocdl.wmma.scale.f32.16x16x128.f4 %a, %b, %c, %scaleA, %scaleB :
(vector<8xi32>, vector<8xi32>, vector<8xf32>, i32, i32) -> vector<8xf32>
Attributes: ¶
| Attribute | MLIR Type | Description |
|---|---|---|
modC | ::mlir::IntegerAttr | 16-bit signless integer attribute |
scaleAType | ::mlir::IntegerAttr | 32-bit signless integer attribute |
fmtScaleA | ::mlir::IntegerAttr | 32-bit signless integer attribute |
scaleBType | ::mlir::IntegerAttr | 32-bit signless integer attribute |
fmtScaleB | ::mlir::IntegerAttr | 32-bit signless integer attribute |
reuseA | ::mlir::IntegerAttr | 1-bit signless integer attribute |
reuseB | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
| Operand | Description |
|---|---|
a | integer or LLVM dialect-compatible vector of integer |
b | integer or LLVM dialect-compatible vector of integer |
c | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
scaleA | 64-bit signless integer |
scaleB | 64-bit signless integer |
Results: ¶
| Result | Description |
|---|---|
res | 32-bit float or LLVM dialect-compatible vector of 32-bit float |
rocdl.workgroup.dim.x (ROCDL::BlockDimXOp) ¶
Syntax:
operation ::= `rocdl.workgroup.dim.x` (`range` $range^)? attr-dict `:` type($res)
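No example is given for this op, so the following is a minimal sketch derived only from the syntax above. The class name `ROCDL::BlockDimXOp` suggests it reads the workgroup (block) size in the x dimension. The range bounds are illustrative assumptions; because the attribute corresponds to LLVM's ConstantRange, the upper bound is exclusive.

```mlir
// Read the workgroup size in the x dimension.
%0 = rocdl.workgroup.dim.x : i32
// Same read, asserting the value lies in [1, 1025); the 1024 limit is assumed.
%1 = rocdl.workgroup.dim.x range <i32, 1, 1025> : i32
```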
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange |
Results:
| Result | Description |
|---|---|
| res | LLVM dialect-compatible type |
rocdl.workgroup.dim.y (ROCDL::BlockDimYOp)
Syntax:
operation ::= `rocdl.workgroup.dim.y` (`range` $range^)? attr-dict `:` type($res)
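One plausible use of the workgroup size is computing a global workitem index. The sketch below combines this operation with `rocdl.workgroup.id.y` and `rocdl.workitem.id.y`. The `i32` types, the value names, and the use of `llvm.mul` and `llvm.add` from the LLVM dialect are assumptions made for illustration.

```mlir
%bdim = rocdl.workgroup.dim.y : i32
%bid = rocdl.workgroup.id.y : i32
%tid = rocdl.workitem.id.y : i32
// global_y = workgroup_id_y * workgroup_dim_y + workitem_id_y
%base = llvm.mul %bid, %bdim : i32
%gid = llvm.add %base, %tid : i32
```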
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange |
Results:
| Result | Description |
|---|---|
| res | LLVM dialect-compatible type |
rocdl.workgroup.dim.z (ROCDL::BlockDimZOp)
Syntax:
operation ::= `rocdl.workgroup.dim.z` (`range` $range^)? attr-dict `:` type($res)
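The three dimension queries can be combined to obtain the number of workitems in a workgroup. This is only a sketch: the `i32` types, the value names, and the `llvm.mul` operations from the LLVM dialect are assumptions chosen for illustration.

```mlir
%dx = rocdl.workgroup.dim.x : i32
%dy = rocdl.workgroup.dim.y : i32
%dz = rocdl.workgroup.dim.z : i32
// workitems per workgroup = dim_x * dim_y * dim_z
%xy = llvm.mul %dx, %dy : i32
%count = llvm.mul %xy, %dz : i32
```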
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange |
Results:
| Result | Description |
|---|---|
| res | LLVM dialect-compatible type |
rocdl.workgroup.id.x (ROCDL::BlockIdXOp)
Syntax:
operation ::= `rocdl.workgroup.id.x` (`range` $range^)? attr-dict `:` type($res)
Read a hardware register for thread/workgroup/cluster identification.
An optional range attribute can constrain the returned value.
Example:
// Read the workgroup id in the x dimension.
%0 = rocdl.workgroup.id.x : i32
// Read with a known range constraint.
%1 = rocdl.workgroup.id.x range <i32, 0, 64> : i32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange |
Results:
| Result | Description |
|---|---|
| res | LLVM dialect-compatible type |
rocdl.workgroup.id.y (ROCDL::BlockIdYOp)
Syntax:
operation ::= `rocdl.workgroup.id.y` (`range` $range^)? attr-dict `:` type($res)
Read a hardware register for thread/workgroup/cluster identification.
An optional range attribute can constrain the returned value.
Example:
// Read the workgroup id in the y dimension.
%0 = rocdl.workgroup.id.y : i32
// Read with a known range constraint.
%1 = rocdl.workgroup.id.y range <i32, 0, 64> : i32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange |
Results:
| Result | Description |
|---|---|
| res | LLVM dialect-compatible type |
rocdl.workgroup.id.z (ROCDL::BlockIdZOp)
Syntax:
operation ::= `rocdl.workgroup.id.z` (`range` $range^)? attr-dict `:` type($res)
Read a hardware register for thread/workgroup/cluster identification.
An optional range attribute can constrain the returned value.
Example:
// Read the workgroup id in the z dimension.
%0 = rocdl.workgroup.id.z : i32
// Read with a known range constraint.
%1 = rocdl.workgroup.id.z range <i32, 0, 64> : i32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange |
Results:
| Result | Description |
|---|---|
| res | LLVM dialect-compatible type |
rocdl.workitem.id.x (ROCDL::ThreadIdXOp)
Syntax:
operation ::= `rocdl.workitem.id.x` (`range` $range^)? attr-dict `:` type($res)
Read a hardware register for thread/workgroup/cluster identification.
An optional range attribute can constrain the returned value.
Example:
// Read the workitem id in the x dimension.
%0 = rocdl.workitem.id.x : i32
// Read with a known range constraint.
%1 = rocdl.workitem.id.x range <i32, 0, 64> : i32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange |
Results:
| Result | Description |
|---|---|
| res | LLVM dialect-compatible type |
rocdl.workitem.id.y (ROCDL::ThreadIdYOp)
Syntax:
operation ::= `rocdl.workitem.id.y` (`range` $range^)? attr-dict `:` type($res)
Read a hardware register for thread/workgroup/cluster identification.
An optional range attribute can constrain the returned value.
Example:
// Read the workitem id in the y dimension.
%0 = rocdl.workitem.id.y : i32
// Read with a known range constraint.
%1 = rocdl.workitem.id.y range <i32, 0, 64> : i32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange |
Results:
| Result | Description |
|---|---|
| res | LLVM dialect-compatible type |
rocdl.workitem.id.z (ROCDL::ThreadIdZOp)
Syntax:
operation ::= `rocdl.workitem.id.z` (`range` $range^)? attr-dict `:` type($res)
Read a hardware register for thread/workgroup/cluster identification.
An optional range attribute can constrain the returned value.
Example:
// Read the workitem id in the z dimension.
%0 = rocdl.workitem.id.z : i32
// Read with a known range constraint.
%1 = rocdl.workitem.id.z range <i32, 0, 64> : i32
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes:
| Attribute | MLIR Type | Description |
|---|---|---|
| range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange |
Results:
| Result | Description |
|---|---|
| res | LLVM dialect-compatible type |
Attributes
ROCDLTargetAttr
Syntax:
#rocdl.target<
int, # O
::llvm::StringRef, # triple
::llvm::StringRef, # chip
::llvm::StringRef, # features
::llvm::StringRef, # abi
DictionaryAttr, # flags
ArrayAttr # link
>
ROCDL target attribute for controlling compilation of AMDGPU targets. All parameters decay into default values if not present.
Examples:
- Target with default values.
gpu.module @mymodule [#rocdl.target] attributes {...} {
...
}
- Target with gfx90a chip and fast math.
gpu.module @mymodule [#rocdl.target<chip = "gfx90a", flags = {fast, no_wave64}>] {
...
}
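Along the same lines, parameters are written as `name = value` pairs, so an optimization level and a chip can be set together. The sketch below is illustrative only: the module name and the specific values (`O = 3`, `gfx942`) are assumptions, and any parameter left out keeps its default.

```mlir
// Hypothetical module name and values, shown for illustration.
gpu.module @example_module [#rocdl.target<O = 3, chip = "gfx942">] {
  ...
}
```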
Parameters:
| Parameter | C++ type | Description |
|---|---|---|
| O | int | Optimization level to apply. |
| triple | ::llvm::StringRef | Target triple. |
| chip | ::llvm::StringRef | Target chip. |
| features | ::llvm::StringRef | Target chip features. |
| abi | ::llvm::StringRef | ABI version. |
| flags | DictionaryAttr | Target specific flags. |
| link | ArrayAttr | Files to link to the LLVM module. |