MLIR

Multi-Level IR Compiler Framework

'nvvm' Dialect

Operations 

source

nvvm.bar.warp.sync (NVVM::SyncWarpOp) 

Syntax:

operation ::= `nvvm.bar.warp.sync` $mask attr-dict `:` type($mask)

Operands: 

OperandDescription
maskLLVM dialect-compatible type

nvvm.barrier (NVVM::BarrierOp) 

Syntax:

operation ::= `nvvm.barrier` (`id` `=` $barrierId^)? (`number_of_threads` `=` $numberOfThreads^)? attr-dict

Traits: AttrSizedOperandSegments

Operands: 

OperandDescription
barrierId32-bit signless integer
numberOfThreads32-bit signless integer

nvvm.barrier.arrive (NVVM::BarrierArriveOp) 

Syntax:

operation ::= `nvvm.barrier.arrive` (`id` `=` $barrierId^)? `number_of_threads` `=` $numberOfThreads attr-dict

Thread that executes this op announces their arrival at the barrier with given id and continue their execution.

The default barrier id is 0 that is similar to nvvm.barrier Op. When barrierId is not present, the default barrier id is used.

[For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-bar)

Interfaces: BasicPtxBuilderInterface

Operands: 

OperandDescription
barrierId32-bit signless integer
numberOfThreads32-bit signless integer

nvvm.barrier0 (NVVM::Barrier0Op) 

Syntax:

operation ::= `nvvm.barrier0` attr-dict

nvvm.cluster.arrive (NVVM::ClusterArriveOp) 

Cluster Barrier Arrive Op

Syntax:

operation ::= `nvvm.cluster.arrive` attr-dict

The cluster.arrive can be used by the threads within the cluster for synchronization and communication. The cluster.arrive instruction marks the warps’ arrival at the barrier without causing the executing thread to wait for other participating threads.

The aligned attribute, when provided, generates the .aligned version of the PTX instruction.

[For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-barrier-cluster)

Attributes: 

AttributeMLIR TypeDescription
aligned::mlir::UnitAttrunit attribute

nvvm.cluster.arrive.relaxed (NVVM::ClusterArriveRelaxedOp) 

Cluster Barrier Relaxed Arrive Op

Syntax:

operation ::= `nvvm.cluster.arrive.relaxed` attr-dict

The cluster.arrive can be used by the threads within the cluster for synchronization and communication. The cluster.arrive instruction marks the warps’ arrival at the barrier without causing the executing thread to wait for other participating threads.

The aligned attribute, when provided, generates the .aligned version of the PTX instruction. The .relaxed qualifier on cluster.arrive specifies that there are no memory ordering and visibility guarantees provided for the memory accesses performed prior to cluster.arrive.

[For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-barrier-cluster)

Attributes: 

AttributeMLIR TypeDescription
aligned::mlir::UnitAttrunit attribute

nvvm.cluster.wait (NVVM::ClusterWaitOp) 

Cluster Barrier Wait Op

Syntax:

operation ::= `nvvm.cluster.wait` attr-dict

The cluster.wait causes the executing thread to wait for all non-exited threads of the cluster to perform cluster.arrive. The aligned attribute, when provided, generates the .aligned version of the PTX instruction.

[For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-barrier-cluster)

Attributes: 

AttributeMLIR TypeDescription
aligned::mlir::UnitAttrunit attribute

nvvm.cp.async.bulk.commit.group (NVVM::CpAsyncBulkCommitGroupOp) 

Syntax:

operation ::= `nvvm.cp.async.bulk.commit.group` attr-dict

This Op commits all prior initiated but uncommitted cp.async.bulk instructions into a cp.async.bulk-group.

[For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-commit-group)

nvvm.cp.async.bulk.tensor.global.shared.cta (NVVM::CpAsyncBulkTensorSharedCTAToGlobalOp) 

Syntax:

operation ::= `nvvm.cp.async.bulk.tensor.global.shared.cta` $tmaDescriptor `,`
              $srcMem `,`
              `box` `[`$coordinates `]`
              (`,` `predicate` `=` $predicate^)?
              attr-dict  `:` type(operands)

Traits: AttrSizedOperandSegments

Interfaces: BasicPtxBuilderInterface

Operands: 

OperandDescription
tmaDescriptorLLVM pointer type
srcMemLLVM pointer in address space 3
coordinatesvariadic of 32-bit signless integer
predicate1-bit signless integer

nvvm.cp.async.bulk.tensor.shared.cluster.global (NVVM::CpAsyncBulkTensorGlobalToSharedClusterOp) 

Syntax:

operation ::= `nvvm.cp.async.bulk.tensor.shared.cluster.global` $dstMem `,`
              $tmaDescriptor `,`
              $mbar `,`
              `box` `[`$coordinates `]`
              (`im2col` `[` $im2colOffsets^ `]` )?
              (`multicast_mask` `=` $multicastMask^ )?
              (`l2_cache_hint` `=` $l2CacheHint^ )?
              (`predicate` `=` $predicate^)?
              attr-dict  `:` type($dstMem) `,` type($tmaDescriptor)

Initiates an asynchronous copy operation on the tensor data from global memory to shared memory.

The Op operates has two load modes:

  1. Tiled Mode: It’s the default mode. The source multi-dimensional tensor layout is preserved at the destination.

  2. Im2col Mode: This mode is used when im2colOffsets operands are present. the elements in the Bounding Box of the source tensor are rearranged into columns at the destination. In this mode, the tensor has to be at least 3-dimensional.

The multicastMask operand is optional. When it is present, the Op copies data from global memory to shared memory of multiple CTAs in the cluster. Operand multicastMask specifies the destination CTAs in the cluster such that each bit position in the 16-bit multicastMask operand corresponds to the nvvm.read.ptx.sreg.ctaid of the destination CTA.

The l2CacheHint operand is optional, and it is used to specify cache eviction policy that may be used during the memory access.

[For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-tensor)

Traits: AttrSizedOperandSegments

Interfaces: BasicPtxBuilderInterface

Operands: 

OperandDescription
dstMemLLVM pointer in address space 3
tmaDescriptorLLVM pointer type
coordinatesvariadic of 32-bit signless integer
mbarLLVM pointer in address space 3
im2colOffsetsvariadic of 16-bit signless integer
multicastMask16-bit signless integer
l2CacheHint64-bit signless integer
predicate1-bit signless integer

nvvm.cp.async.bulk.wait_group (NVVM::CpAsyncBulkWaitGroupOp) 

Syntax:

operation ::= `nvvm.cp.async.bulk.wait_group` $group attr-dict

Op waits for completion of the most recent bulk async-groups.

The $group operand tells waiting has to be done until for $group or fewer of the most recent bulk async-groups. If $group is 0, the op wait until all the most recent bulk async-groups have completed.

The $read indicates that the waiting has to be done until all the bulk async operations in the specified bulk async-group have completed reading from their source locations.

[For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-wait-group)

Attributes: 

AttributeMLIR TypeDescription
group::mlir::IntegerAttr32-bit signless integer attribute whose minimum value is 0
read::mlir::UnitAttrunit attribute

nvvm.cp.async.commit.group (NVVM::CpAsyncCommitGroupOp) 

Syntax:

operation ::= `nvvm.cp.async.commit.group` attr-dict

nvvm.cp.async.mbarrier.arrive (NVVM::CpAsyncMBarrierArriveOp) 

NVVM Dialect Op for cp.async.mbarrier.arrive

Syntax:

operation ::= `nvvm.cp.async.mbarrier.arrive` $addr attr-dict `:` type(operands)

The cp.async.mbarrier.arrive Op makes the mbarrier object track all prior cp.async operations initiated by the executing thread. The addr operand specifies the address of the mbarrier object in generic address space. The noinc attr impacts how the mbarrier’s state is updated. [For more information, refer PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive)

Attributes: 

AttributeMLIR TypeDescription
noinc::mlir::IntegerAttr1-bit signless integer attribute

Operands: 

OperandDescription
addrLLVM pointer type

nvvm.cp.async.mbarrier.arrive.shared (NVVM::CpAsyncMBarrierArriveSharedOp) 

NVVM Dialect Op for cp.async.mbarrier.arrive.shared

Syntax:

operation ::= `nvvm.cp.async.mbarrier.arrive.shared` $addr attr-dict `:` type(operands)

The cp.async.mbarrier.arrive.shared Op makes the mbarrier object track all prior cp.async operations initiated by the executing thread. The addr operand specifies the address of the mbarrier object in shared memory. The noinc attr impacts how the mbarrier’s state is updated. [For more information, refer PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive)

Attributes: 

AttributeMLIR TypeDescription
noinc::mlir::IntegerAttr1-bit signless integer attribute

Operands: 

OperandDescription
addrLLVM pointer in address space 3

nvvm.cp.async.shared.global (NVVM::CpAsyncOp) 

Syntax:

operation ::= `nvvm.cp.async.shared.global` $dst `,` $src `,` $size `,` `cache` `=` $modifier (`,` $cpSize^)? attr-dict `:` type(operands)

Interfaces: BasicPtxBuilderInterface

Attributes: 

AttributeMLIR TypeDescription
size::mlir::IntegerAttr32-bit signless integer attribute
modifier::mlir::NVVM::LoadCacheModifierKindAttr
NVVM load cache modifier kind
Enum attribute of the different kinds of cache operators for load instructions.

For more information, see PTX ISA

Operands: 

OperandDescription
dstLLVM pointer in address space 3
srcLLVM pointer in address space 1
cpSizeLLVM dialect-compatible type

nvvm.cp.async.wait.group (NVVM::CpAsyncWaitGroupOp) 

Syntax:

operation ::= `nvvm.cp.async.wait.group` $n attr-dict

Attributes: 

AttributeMLIR TypeDescription
n::mlir::IntegerAttr32-bit signless integer attribute

nvvm.elect.sync (NVVM::ElectSyncOp) 

Syntax:

operation ::= `nvvm.elect.sync` attr-dict `->` type(results)

Interfaces: BasicPtxBuilderInterface

Results: 

ResultDescription
pred1-bit signless integer

nvvm.fence.mbarrier.init (NVVM::FenceMbarrierInitOp) 

Syntax:

operation ::= `nvvm.fence.mbarrier.init` attr-dict

Fence operation that applies on the prior nvvm.mbarrier.init [For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar)

Interfaces: BasicPtxBuilderInterface

nvvm.fence.proxy (NVVM::FenceProxyOp) 

Syntax:

operation ::= `nvvm.fence.proxy` attr-dict

Fence operation with proxy to establish an ordering between memory accesses that may happen through different proxies. [For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar)

Interfaces: BasicPtxBuilderInterface

Attributes: 

AttributeMLIR TypeDescription
kind::mlir::NVVM::ProxyKindAttr
Proxy kind

Enum cases:

  • alias (alias)
  • async (async)
  • async.global (async_global)
  • async.shared (async_shared)
space::mlir::NVVM::SharedSpaceAttr
Shared memory space

Enum cases:

  • cta (shared_cta)
  • cluster (shared_cluster)

nvvm.fence.sc.cluster (NVVM::FenceScClusterOp) 

Syntax:

operation ::= `nvvm.fence.sc.cluster` attr-dict

nvvm.ldmatrix (NVVM::LdMatrixOp) 

Cooperative matrix load

Syntax:

operation ::= `nvvm.ldmatrix` $ptr attr-dict `:` functional-type($ptr, $res)

Attributes: 

AttributeMLIR TypeDescription
num::mlir::IntegerAttr32-bit signless integer attribute
layout::mlir::NVVM::MMALayoutAttr
NVVM MMA layout

Enum cases:

  • row (row)
  • col (col)

Operands: 

OperandDescription
ptrLLVM pointer type

Results: 

ResultDescription
resany type

nvvm.mbarrier.arrive (NVVM::MBarrierArriveOp) 

Syntax:

operation ::= `nvvm.mbarrier.arrive` $addr attr-dict `:` type($addr) `->` type($res)

Operands: 

OperandDescription
addrLLVM pointer type

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.mbarrier.arrive.expect_tx (NVVM::MBarrierArriveExpectTxOp) 

Syntax:

operation ::= `nvvm.mbarrier.arrive.expect_tx` $addr `,` $txcount (`,` `predicate` `=` $predicate^)? attr-dict `:` type(operands)

Interfaces: BasicPtxBuilderInterface

Operands: 

OperandDescription
addrLLVM pointer type
txcount32-bit signless integer
predicate1-bit signless integer

nvvm.mbarrier.arrive.expect_tx.shared (NVVM::MBarrierArriveExpectTxSharedOp) 

Syntax:

operation ::= `nvvm.mbarrier.arrive.expect_tx.shared` $addr `,` $txcount (`,` `predicate` `=` $predicate^)? attr-dict `:` type(operands)

Interfaces: BasicPtxBuilderInterface

Operands: 

OperandDescription
addrLLVM pointer in address space 3
txcount32-bit signless integer
predicate1-bit signless integer

nvvm.mbarrier.arrive.nocomplete (NVVM::MBarrierArriveNocompleteOp) 

Syntax:

operation ::= `nvvm.mbarrier.arrive.nocomplete` $addr `,` $count attr-dict `:` type(operands) `->` type($res)

Operands: 

OperandDescription
addrLLVM pointer type
count32-bit signless integer

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.mbarrier.arrive.nocomplete.shared (NVVM::MBarrierArriveNocompleteSharedOp) 

Syntax:

operation ::= `nvvm.mbarrier.arrive.nocomplete.shared` $addr `,` $count attr-dict `:` type(operands) `->` type($res)

Operands: 

OperandDescription
addrLLVM pointer in address space 3
count32-bit signless integer

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.mbarrier.arrive.shared (NVVM::MBarrierArriveSharedOp) 

Syntax:

operation ::= `nvvm.mbarrier.arrive.shared` $addr attr-dict `:` qualified(type($addr)) `->` type($res)

Operands: 

OperandDescription
addrLLVM pointer in address space 3

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.mbarrier.init (NVVM::MBarrierInitOp) 

Syntax:

operation ::= `nvvm.mbarrier.init` $addr `,` $count (`,` `predicate` `=` $predicate^)? attr-dict `:` type(operands)

Interfaces: BasicPtxBuilderInterface

Operands: 

OperandDescription
addrLLVM pointer type
count32-bit signless integer
predicate1-bit signless integer

nvvm.mbarrier.init.shared (NVVM::MBarrierInitSharedOp) 

Syntax:

operation ::= `nvvm.mbarrier.init.shared` $addr `,` $count (`,` `predicate` `=` $predicate^)? attr-dict `:` type(operands)

Interfaces: BasicPtxBuilderInterface

Operands: 

OperandDescription
addrLLVM pointer in address space 3
count32-bit signless integer
predicate1-bit signless integer

nvvm.mbarrier.inval (NVVM::MBarrierInvalOp) 

Syntax:

operation ::= `nvvm.mbarrier.inval` $addr attr-dict `:` type(operands)

Operands: 

OperandDescription
addrLLVM pointer type

nvvm.mbarrier.inval.shared (NVVM::MBarrierInvalSharedOp) 

Syntax:

operation ::= `nvvm.mbarrier.inval.shared` $addr attr-dict `:` type(operands)

Operands: 

OperandDescription
addrLLVM pointer in address space 3

nvvm.mbarrier.test.wait (NVVM::MBarrierTestWaitOp) 

Syntax:

operation ::= `nvvm.mbarrier.test.wait` $addr `,` $state attr-dict `:` type(operands) `->` type($res)

Operands: 

OperandDescription
addrLLVM pointer type
stateLLVM dialect-compatible type

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.mbarrier.test.wait.shared (NVVM::MBarrierTestWaitSharedOp) 

Syntax:

operation ::= `nvvm.mbarrier.test.wait.shared` $addr `,` $state attr-dict `:` type(operands) `->` type($res)

Operands: 

OperandDescription
addrLLVM pointer in address space 3
stateLLVM dialect-compatible type

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.mbarrier.try_wait.parity (NVVM::MBarrierTryWaitParityOp) 

Syntax:

operation ::= `nvvm.mbarrier.try_wait.parity` $addr `,` $phase `,` $ticks attr-dict `:` type(operands)

Interfaces: BasicPtxBuilderInterface

Operands: 

OperandDescription
addrLLVM pointer type
phase32-bit signless integer
ticks32-bit signless integer

nvvm.mbarrier.try_wait.parity.shared (NVVM::MBarrierTryWaitParitySharedOp) 

Syntax:

operation ::= `nvvm.mbarrier.try_wait.parity.shared` $addr `,` $phase `,` $ticks attr-dict `:` type(operands)

Interfaces: BasicPtxBuilderInterface

Operands: 

OperandDescription
addrLLVM pointer in address space 3
phase32-bit signless integer
ticks32-bit signless integer

nvvm.mma.sync (NVVM::MmaOp) 

Cooperative matrix-multiply and accumulate

The nvvm.mma.sync operation collectively performs the operation D = matmul(A, B) + C using all threads in a warp.

All the threads in the warp must execute the same mma.sync operation.

For each possible multiplicand PTX data type, there are one or more possible instruction shapes given as “mMnNkK”. The below table describes the posssibilities as well as the types required for the operands. Note that the data type for C (the accumulator) and D (the result) can vary independently when there are multiple possibilities in the “C/D Type” column.

When an optional attribute cannot be immediately inferred from the types of the operands and the result during parsing or validation, an error will be raised.

b1Op is only relevant when the binary (b1) type is given to multiplicandDataType. It specifies how the multiply-and-acumulate is performed and is either xor_popc or and_poc. The default is xor_popc.

intOverflowBehavior is only relevant when the multiplicandType attribute is one of u8, s8, u4, s4, this attribute describes how overflow is handled in the accumulator. When the attribute is satfinite, the accumulator values are clamped in the int32 range on overflow. This is the default behavior. Alternatively, accumulator behavior wrapped can also be specified, in which case overflow wraps from one end of the range to the other.

layoutA and layoutB are required and should generally be set to #nvvm.mma_layout<row> and #nvvm.mma_layout<col> respectively, but other combinations are possible for certain layouts according to the table below.

| A/B Type | Shape     | ALayout | BLayout | A Type   | B Type   | C/D Type          |
|----------|-----------|---------|---------|----------|----------|-------------------|
| f64      | .m8n8k4   | row     | col     | 1x f64   | 1x f64   | 2x f64            |
| f16      | .m8n8k4   | row/col | row/col | 2x f16x2 | 2x f16x2 | 4x f16x2 or 8xf32 |
|          | .m16n8k8  | row     | col     | 2x f16x2 | 1x f16x2 | 2x f16x2 or 4 f32 |
|          | .m16n8k16 | row     | col     | 4x f16x2 | 2x f16x2 | 2x f16x2 or 4 f32 |
| bf16     | .m16n8k8  | row     | col     | 2x f16x2 | 1x f16x2 | 2x f16x2 or 4 f32 |
|          | .m16n8k16 | row     | col     | 4x f16x2 | 2x f16x2 | 2x f16x2 or 4 f32 |
| tf32     | .m16n8k4  | row     | col     | 2x i32   | 1x i32   | 4x f32            |
|          | .m16n8k8  | row     | col     | 4x i32   | 2x i32   | 2x f16x2 or 4 f32 |
| u8/s8    | .m8n8k16  | row     | col     | 1x i32   | 1x i32   | 2x i32            |
|          | .m16n8k16 | row     | col     | 2x i32   | 1x i32   | 4x i32            |
|          | .m16n8k32 | row     | col     | 4x i32   | 2x i32   | 4x i32            |
| u4/s4    | .m8n8k32  | row     | col     | 1x i32   | 1x i32   | 2x i32            |
|          | m16n8k32  | row     | col     | 2x i32   | 1x i32   | 4x i32            |
|          | m16n8k64  | row     | col     | 4x i32   | 2x i32   | 4x i32            |
| b1       | m8n8k128  | row     | col     | 1x i32   | 1x i32   | 2x i32            |
|          | m16n8k128 | row     | col     | 2x i32   | 1x i32   | 4x i32            |

Example:


%128 = nvvm.mma.sync A[%120, %121, %122, %123]
                     B[%124, %125]
                     C[%126, %127]
                     {layoutA = #nvvm.mma_layout<row>,
                      layoutB = #nvvm.mma_layout<col>,
                      shape = {k = 16 : i32, m = 16 : i32, n = 8 : i32}}
    : (vector<2xf16>, vector<2xf16>, vector<2xf16>)
       -> !llvm.struct<(vector<2xf16>, vector<2xf16>)>

Traits: AttrSizedOperandSegments

Attributes: 

AttributeMLIR TypeDescription
shape::mlir::NVVM::MMAShapeAttrAttribute for MMA operation shape.
b1Op::mlir::NVVM::MMAB1OpAttr
MMA binary operations

Enum cases:

  • none (none)
  • xor_popc (xor_popc)
  • and_popc (and_popc)
intOverflowBehavior::mlir::NVVM::MMAIntOverflowAttr
MMA overflow options

Enum cases:

  • satfinite (satfinite)
  • wrapped (wrapped)
layoutA::mlir::NVVM::MMALayoutAttr
NVVM MMA layout

Enum cases:

  • row (row)
  • col (col)
layoutB::mlir::NVVM::MMALayoutAttr
NVVM MMA layout

Enum cases:

  • row (row)
  • col (col)
multiplicandAPtxType::mlir::NVVM::MMATypesAttr
NVVM MMA types

Enum cases:

  • f16 (f16)
  • f32 (f32)
  • tf32 (tf32)
  • bf16 (bf16)
  • s8 (s8)
  • u8 (u8)
  • s32 (s32)
  • s4 (s4)
  • u4 (u4)
  • b1 (b1)
  • f64 (f64)
multiplicandBPtxType::mlir::NVVM::MMATypesAttr
NVVM MMA types

Enum cases:

  • f16 (f16)
  • f32 (f32)
  • tf32 (tf32)
  • bf16 (bf16)
  • s8 (s8)
  • u8 (u8)
  • s32 (s32)
  • s4 (s4)
  • u4 (u4)
  • b1 (b1)
  • f64 (f64)

Operands: 

OperandDescription
operandAvariadic of LLVM dialect-compatible type
operandBvariadic of LLVM dialect-compatible type
operandCvariadic of LLVM dialect-compatible type

Results: 

ResultDescription
resLLVM structure type

nvvm.prefetch.tensormap (NVVM::PrefetchTensorMapOp) 

Syntax:

operation ::= `nvvm.prefetch.tensormap` $tmaDescriptor (`,` `predicate` `=` $predicate^)? attr-dict `:` type(operands)

Interfaces: BasicPtxBuilderInterface

Operands: 

OperandDescription
tmaDescriptorLLVM pointer type
predicate1-bit signless integer

nvvm.rcp.approx.ftz.f (NVVM::RcpApproxFtzF32Op) 

Syntax:

operation ::= `nvvm.rcp.approx.ftz.f` $arg attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
arg32-bit float

Results: 

ResultDescription
res32-bit float

nvvm.read.ptx.sreg.clock (NVVM::ClockOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.clock` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.clock64 (NVVM::Clock64Op) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.clock64` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.cluster.ctaid.x (NVVM::BlockInClusterIdXOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.cluster.ctaid.x` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.cluster.ctaid.y (NVVM::BlockInClusterIdYOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.cluster.ctaid.y` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.cluster.ctaid.z (NVVM::BlockInClusterIdZOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.cluster.ctaid.z` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.cluster.ctarank (NVVM::ClusterId) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.cluster.ctarank` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.cluster.nctaid.x (NVVM::GridInClusterDimXOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.cluster.nctaid.x` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.cluster.nctaid.y (NVVM::GridInClusterDimYOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.cluster.nctaid.y` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.cluster.nctaid.z (NVVM::GridInClusterDimZOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.cluster.nctaid.z` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.cluster.nctarank (NVVM::ClusterDim) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.cluster.nctarank` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.clusterid.x (NVVM::ClusterIdXOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.clusterid.x` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.clusterid.y (NVVM::ClusterIdYOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.clusterid.y` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.clusterid.z (NVVM::ClusterIdZOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.clusterid.z` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.ctaid.x (NVVM::BlockIdXOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.ctaid.x` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.ctaid.y (NVVM::BlockIdYOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.ctaid.y` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.ctaid.z (NVVM::BlockIdZOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.ctaid.z` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.laneid (NVVM::LaneIdOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.laneid` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.nclusterid.x (NVVM::ClusterDimXOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.nclusterid.x` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.nclusterid.y (NVVM::ClusterDimYOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.nclusterid.y` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.nclusterid.z (NVVM::ClusterDimZOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.nclusterid.z` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.nctaid.x (NVVM::GridDimXOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.nctaid.x` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.nctaid.y (NVVM::GridDimYOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.nctaid.y` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.nctaid.z (NVVM::GridDimZOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.nctaid.z` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.ntid.x (NVVM::BlockDimXOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.ntid.x` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.ntid.y (NVVM::BlockDimYOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.ntid.y` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.ntid.z (NVVM::BlockDimZOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.ntid.z` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.tid.x (NVVM::ThreadIdXOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.tid.x` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.tid.y (NVVM::ThreadIdYOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.tid.y` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.tid.z (NVVM::ThreadIdZOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.tid.z` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.warpsize (NVVM::WarpSizeOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.warpsize` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.redux.sync (NVVM::ReduxOp) 

Syntax:

operation ::= `nvvm.redux.sync` $kind $val `,` $mask_and_clamp  attr-dict `:` type($val) `->` type($res)

Attributes: 

AttributeMLIR TypeDescription
kind::mlir::NVVM::ReduxKindAttr
NVVM redux kind

Enum cases:

  • add (ADD)
  • and (AND)
  • max (MAX)
  • min (MIN)
  • or (OR)
  • umax (UMAX)
  • umin (UMIN)
  • xor (XOR)

Operands: 

OperandDescription
valLLVM dialect-compatible type
mask_and_clamp32-bit signless integer

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.setmaxregister (NVVM::SetMaxRegisterOp) 

Syntax:

operation ::= `nvvm.setmaxregister` $action $regCount attr-dict

Attributes: 

AttributeMLIR TypeDescription
regCount::mlir::IntegerAttr32-bit signless integer attribute
action::mlir::NVVM::SetMaxRegisterActionAttr
NVVM set max register action

Enum cases:

  • decrease (decrease)
  • increase (increase)

nvvm.shfl.sync (NVVM::ShflOp) 

NVVM Dialect Op for shfl.sync

Syntax:

operation ::= `nvvm.shfl.sync` $kind $thread_mask `,` $val `,` $offset `,` $mask_and_clamp  attr-dict
              `:` type($val) `->` type($res)

The shfl.sync Op implements data shuffle within threads of a warp. The thread_mask denotes the threads participating in the Op where the bit position corresponds to a particular thread’s laneid. The offset specifies a source lane or source lane offset (depending on kind). The val is the input value to be copied from the source. The mask_and_clamp contains two packed values specifying a mask for logically splitting warps into sub-segments and an upper bound for clamping the source lane index. [For more information, refer PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-shfl-sync)

Attributes: 

AttributeMLIR TypeDescription
kind::mlir::NVVM::ShflKindAttr
NVVM shuffle kind

Enum cases:

  • bfly (bfly)
  • up (up)
  • down (down)
  • idx (idx)
return_value_and_is_valid::mlir::UnitAttrunit attribute

Operands: 

OperandDescription
thread_mask32-bit signless integer
valLLVM dialect-compatible type
offset32-bit signless integer
mask_and_clamp32-bit signless integer

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.stmatrix (NVVM::StMatrixOp) 

Cooperative matrix store

Syntax:

operation ::= `nvvm.stmatrix` $ptr `,` $sources attr-dict `:` type(operands)

Collectively store one or more matrices across all threads in a warp to the location indicated by the address operand $ptr in shared memory. [For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-store-instruction-stmatrix)

Interfaces: BasicPtxBuilderInterface

Attributes: 

AttributeMLIR TypeDescription
layout::mlir::NVVM::MMALayoutAttr
NVVM MMA layout

Enum cases:

  • row (row)
  • col (col)

Operands: 

OperandDescription
ptrLLVM pointer in address space 3
sourcesvariadic of 32-bit signless integer

nvvm.vote.ballot.sync (NVVM::VoteBallotOp) 

Operands: 

OperandDescription
maskLLVM dialect-compatible type
predLLVM dialect-compatible type

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.wgmma.commit.group.sync.aligned (NVVM::WgmmaGroupSyncAlignedOp) 

Syntax:

operation ::= `nvvm.wgmma.commit.group.sync.aligned` attr-dict

Commits all prior uncommitted warpgroup level matrix multiplication operations.

For more information, see PTX ISA

Interfaces: BasicPtxBuilderInterface

nvvm.wgmma.fence.aligned (NVVM::WgmmaFenceAlignedOp) 

Syntax:

operation ::= `nvvm.wgmma.fence.aligned` attr-dict

Enforce an ordering of register accesses between warpgroup level matrix multiplication and other operations.

For more information, see PTX ISA

Interfaces: BasicPtxBuilderInterface

nvvm.wgmma.mma_async (NVVM::WgmmaMmaAsyncOp) 

Syntax:

operation ::= `nvvm.wgmma.mma_async` $descriptorA `,` $descriptorB `,` $inouts `,` $shape `,`
              `D` `[` $typeD `,` $scaleD (`,` $satfinite^)? `]` `,`
              `A` `[` $typeA `,` $scaleA `,` $layoutA `]` `,`
              `B` `[` $typeB `,` $scaleB `,` $layoutB `]`
              attr-dict `:`
              type($inouts) `->` type($results)

The warpgroup (128 threads) level matrix multiply and accumulate operation has either of the following forms, where matrix D is called accumulator: D = A * B + D D = A * B, where the input from accumulator D is disabled.

Supported shapes:

|--------------|--------------|------------|--------------|---------------|
|              |              |            |              |f16+=e4m3*e4m3 |
|              |              |            |              |f16+=e5m2*e5m2 |
|f32+=tf32*tf32|f16+=f16 *f16 | s32+=s8*s8 |s32 += b1 * b1|f16+=e5m2*e4m3 |
|              |f32+=f16 *f16 | s32+=u8*u8 |              |f16+=e4m3*e5m2 |
|              |f32+=bf16*bf16| s32+=u8*u8 |              |f16+=e4m3*e5m2 |
|              |f32+=bf16*bf16| s32+=s8*u8 |              |f32+=e4m3*e4m3 |
|              |              | s32+=u8*s8 |              |f32+=e5m2*e5m2 |
|              |              |            |              |f32+=e4m3*e5m2 |
|              |              |            |              |f32+=e4m3*e5m2 |
|--------------|--------------|------------|--------------|---------------|
|   .m64n8k8   |  .m64n8k16   | .m64n8k32  | .m64n8k256   | .m64n8k32     |
|   .m64n16k8  |  .m64n16k16  | .m64n16k32 | .m64n16k256  | .m64n16k32    |
|   .m64n24k8  |  .m64n24k16  | .m64n24k32 | .m64n24k256  | .m64n24k32    |
|   .m64n32k8  |  .m64n32k16  | .m64n32k32 | .m64n32k256  | .m64n32k32    |
|   .m64n40k8  |  .m64n40k16  | .m64n48k32 | .m64n48k256  | .m64n40k32    |
|   .m64n48k8  |  .m64n48k16  | .m64n64k32 | .m64n64k256  | .m64n48k32    |
|   .m64n56k8  |  .m64n56k16  | .m64n80k32 | .m64n80k256  | .m64n56k32    |
|   .m64n64k8  |  .m64n64k16  | .m64n96k32 | .m64n96k256  | .m64n64k32    |
|   .m64n72k8  |  .m64n72k16  | .m64n112k32| .m64n112k256 | .m64n72k32    |
|   .m64n80k8  |  .m64n80k16  | .m64n128k32| .m64n128k256 | .m64n80k32    |
|   .m64n88k8  |  .m64n88k16  | .m64n144k32| .m64n144k256 | .m64n88k32    |
|   .m64n96k8  |  .m64n96k16  | .m64n160k32| .m64n160k256 | .m64n96k32    |
|   .m64n104k8 |  .m64n104k16 | .m64n176k32| .m64n176k256 | .m64n104k32   |
|   .m64n112k8 |  .m64n112k16 | .m64n192k32| .m64n192k256 | .m64n112k32   |
|   .m64n120k8 |  .m64n120k16 | .m64n208k32| .m64n208k256 | .m64n120k32   |
|   .m64n128k8 |  .m64n128k16 | .m64n224k32| .m64n224k256 | .m64n128k32   |
|   .m64n136k8 |  .m64n136k16 | .m64n240k32| .m64n240k256 | .m64n136k32   |
|   .m64n144k8 |  .m64n144k16 | .m64n256k32| .m64n256k256 | .m64n144k32   |
|   .m64n152k8 |  .m64n152k16 |            |              | .m64n152k32   |
|   .m64n160k8 |  .m64n160k16 |            |              | .m64n160k32   |
|   .m64n168k8 |  .m64n168k16 |            |              | .m64n168k32   |
|   .m64n176k8 |  .m64n176k16 |            |              | .m64n176k32   |
|   .m64n184k8 |  .m64n184k16 |            |              | .m64n184k32   |
|   .m64n192k8 |  .m64n192k16 |            |              | .m64n192k32   |
|   .m64n200k8 |  .m64n200k16 |            |              | .m64n200k32   |
|   .m64n208k8 |  .m64n208k16 |            |              | .m64n208k32   |
|   .m64n216k8 |  .m64n216k16 |            |              | .m64n216k32   |
|   .m64n224k8 |  .m64n224k16 |            |              | .m64n224k32   |
|   .m64n232k8 |  .m64n232k16 |            |              | .m64n232k32   |
|   .m64n240k8 |  .m64n240k16 |            |              | .m64n240k32   |
|   .m64n248k8 |  .m64n248k16 |            |              | .m64n248k32   |
|   .m64n256k8 |  .m64n256k16 |            |              | .m64n256k32   |
|--------------|--------------|------------|--------------|---------------|

For more information, see PTX ISA

Interfaces: BasicPtxBuilderInterface

Attributes: 

AttributeMLIR TypeDescription
shape::mlir::NVVM::MMAShapeAttrAttribute for MMA operation shape.
typeA::mlir::NVVM::WGMMATypesAttr
NVVM WGMMA types

Enum cases:

  • f16 (f16)
  • tf32 (tf32)
  • u8 (u8)
  • s8 (s8)
  • b1 (b1)
  • bf16 (bf16)
  • e4m3 (e4m3)
  • e5m2 (e5m2)
  • f32 (f32)
  • s32 (s32)
typeB::mlir::NVVM::WGMMATypesAttr
NVVM WGMMA types

Enum cases:

  • f16 (f16)
  • tf32 (tf32)
  • u8 (u8)
  • s8 (s8)
  • b1 (b1)
  • bf16 (bf16)
  • e4m3 (e4m3)
  • e5m2 (e5m2)
  • f32 (f32)
  • s32 (s32)
typeD::mlir::NVVM::WGMMATypesAttr
NVVM WGMMA types

Enum cases:

  • f16 (f16)
  • tf32 (tf32)
  • u8 (u8)
  • s8 (s8)
  • b1 (b1)
  • bf16 (bf16)
  • e4m3 (e4m3)
  • e5m2 (e5m2)
  • f32 (f32)
  • s32 (s32)
scaleD::mlir::NVVM::WGMMAScaleOutAttr
WGMMA input predicate

Enum cases:

  • zero (zero)
  • one (one)
scaleA::mlir::NVVM::WGMMAScaleInAttr
WGMMA overflow options

Enum cases:

  • one (one)
  • neg (neg)
scaleB::mlir::NVVM::WGMMAScaleInAttr
WGMMA overflow options

Enum cases:

  • one (one)
  • neg (neg)
layoutA::mlir::NVVM::MMALayoutAttr
NVVM MMA layout

Enum cases:

  • row (row)
  • col (col)
layoutB::mlir::NVVM::MMALayoutAttr
NVVM MMA layout

Enum cases:

  • row (row)
  • col (col)
satfinite::mlir::NVVM::MMAIntOverflowAttr
MMA overflow options

Enum cases:

  • satfinite (satfinite)
  • wrapped (wrapped)

Operands: 

OperandDescription
inoutsLLVM structure type
descriptorA64-bit signless integer
descriptorB64-bit signless integer

Results: 

ResultDescription
resultsLLVM structure type

nvvm.wgmma.wait.group.sync.aligned (NVVM::WgmmaWaitGroupSyncOp) 

Syntax:

operation ::= `nvvm.wgmma.wait.group.sync.aligned` attr-dict $group

Signal the completion of a preceding warpgroup operation.

For more information, see PTX ISA

Interfaces: BasicPtxBuilderInterface

Attributes: 

AttributeMLIR TypeDescription
group::mlir::IntegerAttr32-bit signless integer attribute

nvvm.wmma.load (NVVM::WMMALoadOp) 

Warp synchronous matrix load

Syntax:

operation ::= `nvvm.wmma.load` $ptr `,` $stride attr-dict `:` functional-type($ptr, $res)

Attributes: 

AttributeMLIR TypeDescription
m::mlir::IntegerAttr32-bit signless integer attribute
n::mlir::IntegerAttr32-bit signless integer attribute
k::mlir::IntegerAttr32-bit signless integer attribute
layout::mlir::NVVM::MMALayoutAttr
NVVM MMA layout

Enum cases:

  • row (row)
  • col (col)
eltype::mlir::NVVM::MMATypesAttr
NVVM MMA types

Enum cases:

  • f16 (f16)
  • f32 (f32)
  • tf32 (tf32)
  • bf16 (bf16)
  • s8 (s8)
  • u8 (u8)
  • s32 (s32)
  • s4 (s4)
  • u4 (u4)
  • b1 (b1)
  • f64 (f64)
frag::mlir::NVVM::MMAFragAttr
NVVM MMA frag type

Enum cases:

  • a (a)
  • b (b)
  • c (c)

Operands: 

OperandDescription
ptrLLVM pointer type
stride32-bit signless integer

Results: 

ResultDescription
resLLVM structure type

nvvm.wmma.mma (NVVM::WMMAMmaOp) 

Warp synchronous matrix-multiply accumulate using tensor cores.

Syntax:

operation ::= `nvvm.wmma.mma` $args attr-dict `:` functional-type($args, $res)

Attributes: 

AttributeMLIR TypeDescription
m::mlir::IntegerAttr32-bit signless integer attribute
n::mlir::IntegerAttr32-bit signless integer attribute
k::mlir::IntegerAttr32-bit signless integer attribute
layoutA::mlir::NVVM::MMALayoutAttr
NVVM MMA layout

Enum cases:

  • row (row)
  • col (col)
layoutB::mlir::NVVM::MMALayoutAttr
NVVM MMA layout

Enum cases:

  • row (row)
  • col (col)
eltypeA::mlir::NVVM::MMATypesAttr
NVVM MMA types

Enum cases:

  • f16 (f16)
  • f32 (f32)
  • tf32 (tf32)
  • bf16 (bf16)
  • s8 (s8)
  • u8 (u8)
  • s32 (s32)
  • s4 (s4)
  • u4 (u4)
  • b1 (b1)
  • f64 (f64)
eltypeB::mlir::NVVM::MMATypesAttr
NVVM MMA types

Enum cases:

  • f16 (f16)
  • f32 (f32)
  • tf32 (tf32)
  • bf16 (bf16)
  • s8 (s8)
  • u8 (u8)
  • s32 (s32)
  • s4 (s4)
  • u4 (u4)
  • b1 (b1)
  • f64 (f64)

Operands: 

OperandDescription
argsvariadic of LLVM dialect-compatible type

Results: 

ResultDescription
resLLVM structure type

nvvm.wmma.store (NVVM::WMMAStoreOp) 

Warp synchronous matrix store

Syntax:

operation ::= `nvvm.wmma.store` $ptr `,` $stride `,` $args attr-dict `:` qualified(type($ptr)) `,`
              type($args)

Attributes: 

AttributeMLIR TypeDescription
m::mlir::IntegerAttr32-bit signless integer attribute
n::mlir::IntegerAttr32-bit signless integer attribute
k::mlir::IntegerAttr32-bit signless integer attribute
layout::mlir::NVVM::MMALayoutAttr
NVVM MMA layout

Enum cases:

  • row (row)
  • col (col)
eltype::mlir::NVVM::MMATypesAttr
NVVM MMA types

Enum cases:

  • f16 (f16)
  • f32 (f32)
  • tf32 (tf32)
  • bf16 (bf16)
  • s8 (s8)
  • u8 (u8)
  • s32 (s32)
  • s4 (s4)
  • u4 (u4)
  • b1 (b1)
  • f64 (f64)

Operands: 

OperandDescription
ptrLLVM pointer type
argsvariadic of LLVM dialect-compatible type
stride32-bit signless integer

Attributes 

LoadCacheModifierKindAttr 

NVVM load cache modifier kind

Syntax:

#nvvm.load_cache_modifier<
  ::mlir::NVVM::LoadCacheModifierKind   # value
>

Enum attribute of the different kinds of cache operators for load instructions.

For more information, see PTX ISA

Parameters: 

ParameterC++ typeDescription
value::mlir::NVVM::LoadCacheModifierKindan enum of type LoadCacheModifierKind

MMAB1OpAttr 

MMA binary operations

Syntax:

#nvvm.mma_b1op<
  ::mlir::NVVM::MMAB1Op   # value
>

Enum cases:

  • none (none)
  • xor_popc (xor_popc)
  • and_popc (and_popc)

Parameters: 

ParameterC++ typeDescription
value::mlir::NVVM::MMAB1Opan enum of type MMAB1Op

MMAFragAttr 

NVVM MMA frag type

Syntax:

#nvvm.mma_frag<
  ::mlir::NVVM::MMAFrag   # value
>

Enum cases:

  • a (a)
  • b (b)
  • c (c)

Parameters: 

ParameterC++ typeDescription
value::mlir::NVVM::MMAFragan enum of type MMAFrag

MMAIntOverflowAttr 

MMA overflow options

Syntax:

#nvvm.mma_int_overflow<
  ::mlir::NVVM::MMAIntOverflow   # value
>

Enum cases:

  • satfinite (satfinite)
  • wrapped (wrapped)

Parameters: 

ParameterC++ typeDescription
value::mlir::NVVM::MMAIntOverflowan enum of type MMAIntOverflow

MMALayoutAttr 

NVVM MMA layout

Syntax:

#nvvm.mma_layout<
  ::mlir::NVVM::MMALayout   # value
>

Enum cases:

  • row (row)
  • col (col)

Parameters: 

ParameterC++ typeDescription
value::mlir::NVVM::MMALayoutan enum of type MMALayout

MMATypesAttr 

NVVM MMA types

Syntax:

#nvvm.mma_type<
  ::mlir::NVVM::MMATypes   # value
>

Enum cases:

  • f16 (f16)
  • f32 (f32)
  • tf32 (tf32)
  • bf16 (bf16)
  • s8 (s8)
  • u8 (u8)
  • s32 (s32)
  • s4 (s4)
  • u4 (u4)
  • b1 (b1)
  • f64 (f64)

Parameters: 

ParameterC++ typeDescription
value::mlir::NVVM::MMATypesan enum of type MMATypes

MMAShapeAttr 

Attribute for MMA operation shape.

Syntax:

#nvvm.shape<
  int,   # m
  int,   # n
  int   # k
>

Parameters: 

ParameterC++ typeDescription
mint
nint
kint

NVVMTargetAttr 

Syntax:

#nvvm.target<
  int,   # O
  ::llvm::StringRef,   # triple
  ::llvm::StringRef,   # chip
  ::llvm::StringRef,   # features
  DictionaryAttr,   # flags
  ArrayAttr   # link
>

GPU target attribute for controlling compilation of NVIDIA targets. All parameters decay into default values if not present.

Examples:

  1. Target with default values.
  gpu.module @mymodule [#nvvm.target] attributes {...} {
    ...
  }
  1. Target with sm_90 chip and fast math.
  gpu.module @mymodule [#nvvm.target<chip = "sm_90", flags = {fast}>] {
    ...
  }

Parameters: 

ParameterC++ typeDescription
OintOptimization level to apply.
triple::llvm::StringRefTarget triple.
chip::llvm::StringRefTarget chip.
features::llvm::StringRefTarget chip features.
flagsDictionaryAttrTarget specific flags.
linkArrayAttrFiles to link to the LLVM module.

ProxyKindAttr 

Proxy kind

Syntax:

#nvvm.proxy_kind<
  ::mlir::NVVM::ProxyKind   # value
>

Enum cases:

  • alias (alias)
  • async (async)
  • async.global (async_global)
  • async.shared (async_shared)

Parameters: 

ParameterC++ typeDescription
value::mlir::NVVM::ProxyKindan enum of type ProxyKind

ReduxKindAttr 

NVVM redux kind

Syntax:

#nvvm.redux_kind<
  ::mlir::NVVM::ReduxKind   # value
>

Enum cases:

  • add (ADD)
  • and (AND)
  • max (MAX)
  • min (MIN)
  • or (OR)
  • umax (UMAX)
  • umin (UMIN)
  • xor (XOR)

Parameters: 

ParameterC++ typeDescription
value::mlir::NVVM::ReduxKindan enum of type ReduxKind

SetMaxRegisterActionAttr 

NVVM set max register action

Syntax:

#nvvm.action<
  ::mlir::NVVM::SetMaxRegisterAction   # value
>

Enum cases:

  • decrease (decrease)
  • increase (increase)

Parameters: 

ParameterC++ typeDescription
value::mlir::NVVM::SetMaxRegisterActionan enum of type SetMaxRegisterAction

SharedSpaceAttr 

Shared memory space

Syntax:

#nvvm.shared_space<
  ::mlir::NVVM::SharedSpace   # value
>

Enum cases:

  • cta (shared_cta)
  • cluster (shared_cluster)

Parameters: 

ParameterC++ typeDescription
value::mlir::NVVM::SharedSpacean enum of type SharedSpace

ShflKindAttr 

NVVM shuffle kind

Syntax:

#nvvm.shfl_kind<
  ::mlir::NVVM::ShflKind   # value
>

Enum cases:

  • bfly (bfly)
  • up (up)
  • down (down)
  • idx (idx)

Parameters: 

ParameterC++ typeDescription
value::mlir::NVVM::ShflKindan enum of type ShflKind

WGMMAScaleInAttr 

WGMMA overflow options

Syntax:

#nvvm.wgmma_scale_in<
  ::mlir::NVVM::WGMMAScaleIn   # value
>

Enum cases:

  • one (one)
  • neg (neg)

Parameters: 

ParameterC++ typeDescription
value::mlir::NVVM::WGMMAScaleInan enum of type WGMMAScaleIn

WGMMAScaleOutAttr 

WGMMA input predicate

Syntax:

#nvvm.wgmma_scale_out<
  ::mlir::NVVM::WGMMAScaleOut   # value
>

Enum cases:

  • zero (zero)
  • one (one)

Parameters: 

ParameterC++ typeDescription
value::mlir::NVVM::WGMMAScaleOutan enum of type WGMMAScaleOut

WGMMATypesAttr 

NVVM WGMMA types

Syntax:

#nvvm.wgmma_type<
  ::mlir::NVVM::WGMMATypes   # value
>

Enum cases:

  • f16 (f16)
  • tf32 (tf32)
  • u8 (u8)
  • s8 (s8)
  • b1 (b1)
  • bf16 (bf16)
  • e4m3 (e4m3)
  • e5m2 (e5m2)
  • f32 (f32)
  • s32 (s32)

Parameters: 

ParameterC++ typeDescription
value::mlir::NVVM::WGMMATypesan enum of type WGMMATypes