'nvvm' Dialect

Operations ¶

`nvvm.bar.warp.sync` (NVVM::SyncWarpOp) ¶

Syntax:

operation ::= `nvvm.bar.warp.sync` $mask attr-dict `:` type($mask)

Operands: ¶

Operand	Description
`mask`	LLVM dialect-compatible type

`nvvm.barrier` (NVVM::BarrierOp) ¶

Syntax:

operation ::= `nvvm.barrier` (`id` `=` $barrierId^)? (`number_of_threads` `=` $numberOfThreads^)? attr-dict

Traits: AttrSizedOperandSegments

Operands: ¶

Operand	Description
`barrierId`	32-bit signless integer
`numberOfThreads`	32-bit signless integer

`nvvm.barrier.arrive` (NVVM::BarrierArriveOp) ¶

Syntax:

operation ::= `nvvm.barrier.arrive` (`id` `=` $barrierId^)? `number_of_threads` `=` $numberOfThreads attr-dict

Thread that executes this op announces their arrival at the barrier with given id and continue their execution.

The default barrier id is 0 that is similar to nvvm.barrier Op. When barrierId is not present, the default barrier id is used.

[For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-bar)

Interfaces: BasicPtxBuilderInterface

Operands: ¶

Operand	Description
`barrierId`	32-bit signless integer
`numberOfThreads`	32-bit signless integer

`nvvm.barrier0` (NVVM::Barrier0Op) ¶

Syntax:

operation ::= `nvvm.barrier0` attr-dict

`nvvm.cluster.arrive` (NVVM::ClusterArriveOp) ¶

Cluster Barrier Arrive Op

Syntax:

operation ::= `nvvm.cluster.arrive` attr-dict

The cluster.arrive can be used by the threads within the cluster for synchronization and communication. The cluster.arrive instruction marks the warps’ arrival at the barrier without causing the executing thread to wait for other participating threads.

The aligned attribute, when provided, generates the .aligned version of the PTX instruction.

[For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-barrier-cluster)

Attributes: ¶

Attribute	MLIR Type	Description
`aligned`	::mlir::UnitAttr	unit attribute

`nvvm.cluster.arrive.relaxed` (NVVM::ClusterArriveRelaxedOp) ¶

Cluster Barrier Relaxed Arrive Op

Syntax:

operation ::= `nvvm.cluster.arrive.relaxed` attr-dict

The aligned attribute, when provided, generates the .aligned version of the PTX instruction. The .relaxed qualifier on cluster.arrive specifies that there are no memory ordering and visibility guarantees provided for the memory accesses performed prior to cluster.arrive.

[For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-barrier-cluster)

Attributes: ¶

Attribute	MLIR Type	Description
`aligned`	::mlir::UnitAttr	unit attribute

`nvvm.cluster.wait` (NVVM::ClusterWaitOp) ¶

Cluster Barrier Wait Op

Syntax:

operation ::= `nvvm.cluster.wait` attr-dict

The cluster.wait causes the executing thread to wait for all non-exited threads of the cluster to perform cluster.arrive. The aligned attribute, when provided, generates the .aligned version of the PTX instruction.

[For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-barrier-cluster)

Attributes: ¶

Attribute	MLIR Type	Description
`aligned`	::mlir::UnitAttr	unit attribute

`nvvm.cp.async.bulk.commit.group` (NVVM::CpAsyncBulkCommitGroupOp) ¶

Syntax:

operation ::= `nvvm.cp.async.bulk.commit.group` attr-dict

This Op commits all prior initiated but uncommitted cp.async.bulk instructions into a cp.async.bulk-group.

[For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-commit-group)

`nvvm.cp.async.bulk.tensor.global.shared.cta` (NVVM::CpAsyncBulkTensorSharedCTAToGlobalOp) ¶

Syntax:

operation ::= `nvvm.cp.async.bulk.tensor.global.shared.cta` $tmaDescriptor `,`
              $srcMem `,`
              `box` `[`$coordinates `]`
              (`,` `predicate` `=` $predicate^)?
              attr-dict  `:` type(operands)

Traits: AttrSizedOperandSegments

Interfaces: BasicPtxBuilderInterface

Operands: ¶

Operand	Description
`tmaDescriptor`	LLVM pointer type
`srcMem`	LLVM pointer in address space 3
`coordinates`	variadic of 32-bit signless integer
`predicate`	1-bit signless integer

`nvvm.cp.async.bulk.tensor.shared.cluster.global` (NVVM::CpAsyncBulkTensorGlobalToSharedClusterOp) ¶

Syntax:

operation ::= `nvvm.cp.async.bulk.tensor.shared.cluster.global` $dstMem `,`
              $tmaDescriptor `,`
              $mbar `,`
              `box` `[`$coordinates `]`
              (`im2col` `[` $im2colOffsets^ `]` )?
              (`multicast_mask` `=` $multicastMask^ )?
              (`l2_cache_hint` `=` $l2CacheHint^ )?
              (`predicate` `=` $predicate^)?
              attr-dict  `:` type($dstMem) `,` type($tmaDescriptor)

Initiates an asynchronous copy operation on the tensor data from global memory to shared memory.

The Op operates has two load modes:

Tiled Mode: It’s the default mode. The source multi-dimensional tensor layout is preserved at the destination.
Im2col Mode: This mode is used when im2colOffsets operands are present. the elements in the Bounding Box of the source tensor are rearranged into columns at the destination. In this mode, the tensor has to be at least 3-dimensional.

The multicastMask operand is optional. When it is present, the Op copies data from global memory to shared memory of multiple CTAs in the cluster. Operand multicastMask specifies the destination CTAs in the cluster such that each bit position in the 16-bit multicastMask operand corresponds to the nvvm.read.ptx.sreg.ctaid of the destination CTA.

The l2CacheHint operand is optional, and it is used to specify cache eviction policy that may be used during the memory access.

[For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-tensor)

Traits: AttrSizedOperandSegments

Interfaces: BasicPtxBuilderInterface

Operands: ¶

Operand	Description
`dstMem`	LLVM pointer in address space 3
`tmaDescriptor`	LLVM pointer type
`coordinates`	variadic of 32-bit signless integer
`mbar`	LLVM pointer in address space 3
`im2colOffsets`	variadic of 16-bit signless integer
`multicastMask`	16-bit signless integer
`l2CacheHint`	64-bit signless integer
`predicate`	1-bit signless integer

`nvvm.cp.async.bulk.wait_group` (NVVM::CpAsyncBulkWaitGroupOp) ¶

Syntax:

operation ::= `nvvm.cp.async.bulk.wait_group` $group attr-dict

Op waits for completion of the most recent bulk async-groups.

The $group operand tells waiting has to be done until for $group or fewer of the most recent bulk async-groups. If $group is 0, the op wait until all the most recent bulk async-groups have completed.

The $read indicates that the waiting has to be done until all the bulk async operations in the specified bulk async-group have completed reading from their source locations.

[For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-wait-group)

Attributes: ¶

Attribute	MLIR Type	Description
`group`	::mlir::IntegerAttr	32-bit signless integer attribute whose minimum value is 0
`read`	::mlir::UnitAttr	unit attribute

`nvvm.cp.async.commit.group` (NVVM::CpAsyncCommitGroupOp) ¶

Syntax:

operation ::= `nvvm.cp.async.commit.group` attr-dict

`nvvm.cp.async.mbarrier.arrive` (NVVM::CpAsyncMBarrierArriveOp) ¶

NVVM Dialect Op for cp.async.mbarrier.arrive

Syntax:

operation ::= `nvvm.cp.async.mbarrier.arrive` $addr attr-dict `:` type(operands)

The cp.async.mbarrier.arrive Op makes the mbarrier object track all prior cp.async operations initiated by the executing thread. The addr operand specifies the address of the mbarrier object in generic address space. The noinc attr impacts how the mbarrier’s state is updated. [For more information, refer PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive)

Attributes: ¶

Attribute	MLIR Type	Description
`noinc`	::mlir::IntegerAttr	1-bit signless integer attribute

Operands: ¶

Operand	Description
`addr`	LLVM pointer type

`nvvm.cp.async.mbarrier.arrive.shared` (NVVM::CpAsyncMBarrierArriveSharedOp) ¶

NVVM Dialect Op for cp.async.mbarrier.arrive.shared

Syntax:

operation ::= `nvvm.cp.async.mbarrier.arrive.shared` $addr attr-dict `:` type(operands)

The cp.async.mbarrier.arrive.shared Op makes the mbarrier object track all prior cp.async operations initiated by the executing thread. The addr operand specifies the address of the mbarrier object in shared memory. The noinc attr impacts how the mbarrier’s state is updated. [For more information, refer PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive)

Attributes: ¶

Attribute	MLIR Type	Description
`noinc`	::mlir::IntegerAttr	1-bit signless integer attribute

Operands: ¶

Operand	Description
`addr`	LLVM pointer in address space 3

`nvvm.cp.async.shared.global` (NVVM::CpAsyncOp) ¶

Syntax:

operation ::= `nvvm.cp.async.shared.global` $dst `,` $src `,` $size `,` `cache` `=` $modifier (`,` $cpSize^)? attr-dict `:` type(operands)

Interfaces: BasicPtxBuilderInterface

Attributes: ¶

Attribute	MLIR Type	Description
`size`	::mlir::IntegerAttr	32-bit signless integer attribute
`modifier`	::mlir::NVVM::LoadCacheModifierKindAttr	NVVM load cache modifier kind `Enum attribute of the different kinds of cache operators for load instructions.` `For more information, see PTX ISA`

Operands: ¶

Operand	Description
`dst`	LLVM pointer in address space 3
`src`	LLVM pointer in address space 1
`cpSize`	LLVM dialect-compatible type

`nvvm.cp.async.wait.group` (NVVM::CpAsyncWaitGroupOp) ¶

Syntax:

operation ::= `nvvm.cp.async.wait.group` $n attr-dict

Attributes: ¶

Attribute	MLIR Type	Description
`n`	::mlir::IntegerAttr	32-bit signless integer attribute

`nvvm.elect.sync` (NVVM::ElectSyncOp) ¶

Syntax:

operation ::= `nvvm.elect.sync` attr-dict `->` type(results)

Interfaces: BasicPtxBuilderInterface

Results: ¶

Result	Description
`pred`	1-bit signless integer

`nvvm.fence.mbarrier.init` (NVVM::FenceMbarrierInitOp) ¶

Syntax:

operation ::= `nvvm.fence.mbarrier.init` attr-dict

Fence operation that applies on the prior nvvm.mbarrier.init [For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar)

Interfaces: BasicPtxBuilderInterface

`nvvm.fence.proxy` (NVVM::FenceProxyOp) ¶

Syntax:

operation ::= `nvvm.fence.proxy` attr-dict

Fence operation with proxy to establish an ordering between memory accesses that may happen through different proxies. [For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar)

Interfaces: BasicPtxBuilderInterface

Attributes: ¶

Attribute	MLIR Type	Description
`kind`	::mlir::NVVM::ProxyKindAttr	Proxy kind Enum cases: alias (`alias`) async (`async`) async.global (`async_global`) async.shared (`async_shared`)
`space`	::mlir::NVVM::SharedSpaceAttr	Shared memory space Enum cases: cta (`shared_cta`) cluster (`shared_cluster`)

`nvvm.fence.sc.cluster` (NVVM::FenceScClusterOp) ¶

Syntax:

operation ::= `nvvm.fence.sc.cluster` attr-dict

`nvvm.ldmatrix` (NVVM::LdMatrixOp) ¶

Cooperative matrix load

Syntax:

operation ::= `nvvm.ldmatrix` $ptr attr-dict `:` functional-type($ptr, $res)

Attributes: ¶

Attribute	MLIR Type	Description
`num`	::mlir::IntegerAttr	32-bit signless integer attribute
`layout`	::mlir::NVVM::MMALayoutAttr	NVVM MMA layout Enum cases: row (`row`) col (`col`)

Operands: ¶

Operand	Description
`ptr`	LLVM pointer type

Results: ¶

Result	Description
`res`	any type

`nvvm.mbarrier.arrive` (NVVM::MBarrierArriveOp) ¶

Syntax:

operation ::= `nvvm.mbarrier.arrive` $addr attr-dict `:` type($addr) `->` type($res)

Operands: ¶

Operand	Description
`addr`	LLVM pointer type

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.mbarrier.arrive.expect_tx` (NVVM::MBarrierArriveExpectTxOp) ¶

Syntax:

operation ::= `nvvm.mbarrier.arrive.expect_tx` $addr `,` $txcount (`,` `predicate` `=` $predicate^)? attr-dict `:` type(operands)

Interfaces: BasicPtxBuilderInterface

Operands: ¶

Operand	Description
`addr`	LLVM pointer type
`txcount`	32-bit signless integer
`predicate`	1-bit signless integer

`nvvm.mbarrier.arrive.expect_tx.shared` (NVVM::MBarrierArriveExpectTxSharedOp) ¶

Syntax:

operation ::= `nvvm.mbarrier.arrive.expect_tx.shared` $addr `,` $txcount (`,` `predicate` `=` $predicate^)? attr-dict `:` type(operands)

Interfaces: BasicPtxBuilderInterface

Operands: ¶

Operand	Description
`addr`	LLVM pointer in address space 3
`txcount`	32-bit signless integer
`predicate`	1-bit signless integer

`nvvm.mbarrier.arrive.nocomplete` (NVVM::MBarrierArriveNocompleteOp) ¶

Syntax:

operation ::= `nvvm.mbarrier.arrive.nocomplete` $addr `,` $count attr-dict `:` type(operands) `->` type($res)

Operands: ¶

Operand	Description
`addr`	LLVM pointer type
`count`	32-bit signless integer

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.mbarrier.arrive.nocomplete.shared` (NVVM::MBarrierArriveNocompleteSharedOp) ¶

Syntax:

operation ::= `nvvm.mbarrier.arrive.nocomplete.shared` $addr `,` $count attr-dict `:` type(operands) `->` type($res)

Operands: ¶

Operand	Description
`addr`	LLVM pointer in address space 3
`count`	32-bit signless integer

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.mbarrier.arrive.shared` (NVVM::MBarrierArriveSharedOp) ¶

Syntax:

operation ::= `nvvm.mbarrier.arrive.shared` $addr attr-dict `:` qualified(type($addr)) `->` type($res)

Operands: ¶

Operand	Description
`addr`	LLVM pointer in address space 3

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.mbarrier.init` (NVVM::MBarrierInitOp) ¶

Syntax:

operation ::= `nvvm.mbarrier.init` $addr `,` $count (`,` `predicate` `=` $predicate^)? attr-dict `:` type(operands)

Interfaces: BasicPtxBuilderInterface

Operands: ¶

Operand	Description
`addr`	LLVM pointer type
`count`	32-bit signless integer
`predicate`	1-bit signless integer

`nvvm.mbarrier.init.shared` (NVVM::MBarrierInitSharedOp) ¶

Syntax:

operation ::= `nvvm.mbarrier.init.shared` $addr `,` $count (`,` `predicate` `=` $predicate^)? attr-dict `:` type(operands)

Interfaces: BasicPtxBuilderInterface

Operands: ¶

Operand	Description
`addr`	LLVM pointer in address space 3
`count`	32-bit signless integer
`predicate`	1-bit signless integer

`nvvm.mbarrier.inval` (NVVM::MBarrierInvalOp) ¶

Syntax:

operation ::= `nvvm.mbarrier.inval` $addr attr-dict `:` type(operands)

Operands: ¶

Operand	Description
`addr`	LLVM pointer type

`nvvm.mbarrier.inval.shared` (NVVM::MBarrierInvalSharedOp) ¶

Syntax:

operation ::= `nvvm.mbarrier.inval.shared` $addr attr-dict `:` type(operands)

Operands: ¶

Operand	Description
`addr`	LLVM pointer in address space 3

`nvvm.mbarrier.test.wait` (NVVM::MBarrierTestWaitOp) ¶

Syntax:

operation ::= `nvvm.mbarrier.test.wait` $addr `,` $state attr-dict `:` type(operands) `->` type($res)

Operands: ¶

Operand	Description
`addr`	LLVM pointer type
`state`	LLVM dialect-compatible type

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.mbarrier.test.wait.shared` (NVVM::MBarrierTestWaitSharedOp) ¶

Syntax:

operation ::= `nvvm.mbarrier.test.wait.shared` $addr `,` $state attr-dict `:` type(operands) `->` type($res)

Operands: ¶

Operand	Description
`addr`	LLVM pointer in address space 3
`state`	LLVM dialect-compatible type

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.mbarrier.try_wait.parity` (NVVM::MBarrierTryWaitParityOp) ¶

Syntax:

operation ::= `nvvm.mbarrier.try_wait.parity` $addr `,` $phase `,` $ticks attr-dict `:` type(operands)

Interfaces: BasicPtxBuilderInterface

Operands: ¶

Operand	Description
`addr`	LLVM pointer type
`phase`	32-bit signless integer
`ticks`	32-bit signless integer

`nvvm.mbarrier.try_wait.parity.shared` (NVVM::MBarrierTryWaitParitySharedOp) ¶

Syntax:

operation ::= `nvvm.mbarrier.try_wait.parity.shared` $addr `,` $phase `,` $ticks attr-dict `:` type(operands)

Interfaces: BasicPtxBuilderInterface

Operands: ¶

Operand	Description
`addr`	LLVM pointer in address space 3
`phase`	32-bit signless integer
`ticks`	32-bit signless integer

`nvvm.mma.sync` (NVVM::MmaOp) ¶

Cooperative matrix-multiply and accumulate

The nvvm.mma.sync operation collectively performs the operation D = matmul(A, B) + C using all threads in a warp.

All the threads in the warp must execute the same mma.sync operation.

For each possible multiplicand PTX data type, there are one or more possible instruction shapes given as “mMnNkK”. The below table describes the posssibilities as well as the types required for the operands. Note that the data type for C (the accumulator) and D (the result) can vary independently when there are multiple possibilities in the “C/D Type” column.

When an optional attribute cannot be immediately inferred from the types of the operands and the result during parsing or validation, an error will be raised.

b1Op is only relevant when the binary (b1) type is given to multiplicandDataType. It specifies how the multiply-and-acumulate is performed and is either xor_popc or and_poc. The default is xor_popc.

intOverflowBehavior is only relevant when the multiplicandType attribute is one of u8, s8, u4, s4, this attribute describes how overflow is handled in the accumulator. When the attribute is satfinite, the accumulator values are clamped in the int32 range on overflow. This is the default behavior. Alternatively, accumulator behavior wrapped can also be specified, in which case overflow wraps from one end of the range to the other.

layoutA and layoutB are required and should generally be set to #nvvm.mma_layout<row> and #nvvm.mma_layout<col> respectively, but other combinations are possible for certain layouts according to the table below.

| A/B Type | Shape     | ALayout | BLayout | A Type   | B Type   | C/D Type          |
|----------|-----------|---------|---------|----------|----------|-------------------|
| f64      | .m8n8k4   | row     | col     | 1x f64   | 1x f64   | 2x f64            |
| f16      | .m8n8k4   | row/col | row/col | 2x f16x2 | 2x f16x2 | 4x f16x2 or 8xf32 |
|          | .m16n8k8  | row     | col     | 2x f16x2 | 1x f16x2 | 2x f16x2 or 4 f32 |
|          | .m16n8k16 | row     | col     | 4x f16x2 | 2x f16x2 | 2x f16x2 or 4 f32 |
| bf16     | .m16n8k8  | row     | col     | 2x f16x2 | 1x f16x2 | 2x f16x2 or 4 f32 |
|          | .m16n8k16 | row     | col     | 4x f16x2 | 2x f16x2 | 2x f16x2 or 4 f32 |
| tf32     | .m16n8k4  | row     | col     | 2x i32   | 1x i32   | 4x f32            |
|          | .m16n8k8  | row     | col     | 4x i32   | 2x i32   | 2x f16x2 or 4 f32 |
| u8/s8    | .m8n8k16  | row     | col     | 1x i32   | 1x i32   | 2x i32            |
|          | .m16n8k16 | row     | col     | 2x i32   | 1x i32   | 4x i32            |
|          | .m16n8k32 | row     | col     | 4x i32   | 2x i32   | 4x i32            |
| u4/s4    | .m8n8k32  | row     | col     | 1x i32   | 1x i32   | 2x i32            |
|          | m16n8k32  | row     | col     | 2x i32   | 1x i32   | 4x i32            |
|          | m16n8k64  | row     | col     | 4x i32   | 2x i32   | 4x i32            |
| b1       | m8n8k128  | row     | col     | 1x i32   | 1x i32   | 2x i32            |
|          | m16n8k128 | row     | col     | 2x i32   | 1x i32   | 4x i32            |

Example:


%128 = nvvm.mma.sync A[%120, %121, %122, %123]
                     B[%124, %125]
                     C[%126, %127]
                     {layoutA = #nvvm.mma_layout<row>,
                      layoutB = #nvvm.mma_layout<col>,
                      shape = {k = 16 : i32, m = 16 : i32, n = 8 : i32}}
    : (vector<2xf16>, vector<2xf16>, vector<2xf16>)
       -> !llvm.struct<(vector<2xf16>, vector<2xf16>)>

Traits: AttrSizedOperandSegments

Attributes: ¶

Attribute	MLIR Type	Description
`shape`	::mlir::NVVM::MMAShapeAttr	Attribute for MMA operation shape.
`b1Op`	::mlir::NVVM::MMAB1OpAttr	MMA binary operations Enum cases: none (`none`) xor_popc (`xor_popc`) and_popc (`and_popc`)
`intOverflowBehavior`	::mlir::NVVM::MMAIntOverflowAttr	MMA overflow options Enum cases: satfinite (`satfinite`) wrapped (`wrapped`)
`layoutA`	::mlir::NVVM::MMALayoutAttr	NVVM MMA layout Enum cases: row (`row`) col (`col`)
`layoutB`	::mlir::NVVM::MMALayoutAttr	NVVM MMA layout Enum cases: row (`row`) col (`col`)
`multiplicandAPtxType`	::mlir::NVVM::MMATypesAttr	NVVM MMA types Enum cases: f16 (`f16`) f32 (`f32`) tf32 (`tf32`) bf16 (`bf16`) s8 (`s8`) u8 (`u8`) s32 (`s32`) s4 (`s4`) u4 (`u4`) b1 (`b1`) f64 (`f64`)
`multiplicandBPtxType`	::mlir::NVVM::MMATypesAttr	NVVM MMA types Enum cases: f16 (`f16`) f32 (`f32`) tf32 (`tf32`) bf16 (`bf16`) s8 (`s8`) u8 (`u8`) s32 (`s32`) s4 (`s4`) u4 (`u4`) b1 (`b1`) f64 (`f64`)

Operands: ¶

Operand	Description
`operandA`	variadic of LLVM dialect-compatible type
`operandB`	variadic of LLVM dialect-compatible type
`operandC`	variadic of LLVM dialect-compatible type

Results: ¶

Result	Description
`res`	LLVM structure type

`nvvm.prefetch.tensormap` (NVVM::PrefetchTensorMapOp) ¶

Syntax:

operation ::= `nvvm.prefetch.tensormap` $tmaDescriptor (`,` `predicate` `=` $predicate^)? attr-dict `:` type(operands)

Interfaces: BasicPtxBuilderInterface

Operands: ¶

Operand	Description
`tmaDescriptor`	LLVM pointer type
`predicate`	1-bit signless integer

`nvvm.rcp.approx.ftz.f` (NVVM::RcpApproxFtzF32Op) ¶

Syntax:

operation ::= `nvvm.rcp.approx.ftz.f` $arg attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: ¶

Operand	Description
`arg`	32-bit float

Results: ¶

Result	Description
`res`	32-bit float

`nvvm.read.ptx.sreg.clock` (NVVM::ClockOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.clock` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.clock64` (NVVM::Clock64Op) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.clock64` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.cluster.ctaid.x` (NVVM::BlockInClusterIdXOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.cluster.ctaid.x` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.cluster.ctaid.y` (NVVM::BlockInClusterIdYOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.cluster.ctaid.y` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.cluster.ctaid.z` (NVVM::BlockInClusterIdZOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.cluster.ctaid.z` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.cluster.ctarank` (NVVM::ClusterId) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.cluster.ctarank` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.cluster.nctaid.x` (NVVM::ClusterDimBlocksXOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.cluster.nctaid.x` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.cluster.nctaid.y` (NVVM::ClusterDimBlocksYOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.cluster.nctaid.y` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.cluster.nctaid.z` (NVVM::ClusterDimBlocksZOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.cluster.nctaid.z` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.cluster.nctarank` (NVVM::ClusterDim) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.cluster.nctarank` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.clusterid.x` (NVVM::ClusterIdXOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.clusterid.x` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.clusterid.y` (NVVM::ClusterIdYOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.clusterid.y` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.clusterid.z` (NVVM::ClusterIdZOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.clusterid.z` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.ctaid.x` (NVVM::BlockIdXOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.ctaid.x` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.ctaid.y` (NVVM::BlockIdYOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.ctaid.y` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.ctaid.z` (NVVM::BlockIdZOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.ctaid.z` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.laneid` (NVVM::LaneIdOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.laneid` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.nclusterid.x` (NVVM::ClusterDimXOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.nclusterid.x` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.nclusterid.y` (NVVM::ClusterDimYOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.nclusterid.y` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.nclusterid.z` (NVVM::ClusterDimZOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.nclusterid.z` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.nctaid.x` (NVVM::GridDimXOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.nctaid.x` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.nctaid.y` (NVVM::GridDimYOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.nctaid.y` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.nctaid.z` (NVVM::GridDimZOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.nctaid.z` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.ntid.x` (NVVM::BlockDimXOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.ntid.x` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.ntid.y` (NVVM::BlockDimYOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.ntid.y` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.ntid.z` (NVVM::BlockDimZOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.ntid.z` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.tid.x` (NVVM::ThreadIdXOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.tid.x` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.tid.y` (NVVM::ThreadIdYOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.tid.y` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.tid.z` (NVVM::ThreadIdZOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.tid.z` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.read.ptx.sreg.warpsize` (NVVM::WarpSizeOp) ¶

Syntax:

operation ::= `nvvm.read.ptx.sreg.warpsize` attr-dict `:` type($res)

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.redux.sync` (NVVM::ReduxOp) ¶

Syntax:

operation ::= `nvvm.redux.sync` $kind $val `,` $mask_and_clamp  attr-dict `:` type($val) `->` type($res)

Attributes: ¶

Attribute	MLIR Type	Description
`kind`	::mlir::NVVM::ReduxKindAttr	NVVM redux kind Enum cases: add (`ADD`) and (`AND`) max (`MAX`) min (`MIN`) or (`OR`) umax (`UMAX`) umin (`UMIN`) xor (`XOR`)

Operands: ¶

Operand	Description
`val`	LLVM dialect-compatible type
`mask_and_clamp`	32-bit signless integer

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.setmaxregister` (NVVM::SetMaxRegisterOp) ¶

Syntax:

operation ::= `nvvm.setmaxregister` $action $regCount attr-dict

Attributes: ¶

Attribute	MLIR Type	Description
`regCount`	::mlir::IntegerAttr	32-bit signless integer attribute
`action`	::mlir::NVVM::SetMaxRegisterActionAttr	NVVM set max register action Enum cases: decrease (`decrease`) increase (`increase`)

`nvvm.shfl.sync` (NVVM::ShflOp) ¶

NVVM Dialect Op for shfl.sync

Syntax:

operation ::= `nvvm.shfl.sync` $kind $thread_mask `,` $val `,` $offset `,` $mask_and_clamp  attr-dict
              `:` type($val) `->` type($res)

The shfl.sync Op implements data shuffle within threads of a warp. The thread_mask denotes the threads participating in the Op where the bit position corresponds to a particular thread’s laneid. The offset specifies a source lane or source lane offset (depending on kind). The val is the input value to be copied from the source. The mask_and_clamp contains two packed values specifying a mask for logically splitting warps into sub-segments and an upper bound for clamping the source lane index. [For more information, refer PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-shfl-sync)

Attributes: ¶

Attribute	MLIR Type	Description
`kind`	::mlir::NVVM::ShflKindAttr	NVVM shuffle kind Enum cases: bfly (`bfly`) up (`up`) down (`down`) idx (`idx`)
`return_value_and_is_valid`	::mlir::UnitAttr	unit attribute

Operands: ¶

Operand	Description
`thread_mask`	32-bit signless integer
`val`	LLVM dialect-compatible type
`offset`	32-bit signless integer
`mask_and_clamp`	32-bit signless integer

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.stmatrix` (NVVM::StMatrixOp) ¶

Cooperative matrix store

Syntax:

operation ::= `nvvm.stmatrix` $ptr `,` $sources attr-dict `:` type(operands)

Collectively store one or more matrices across all threads in a warp to the location indicated by the address operand $ptr in shared memory. [For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-store-instruction-stmatrix)

Interfaces: BasicPtxBuilderInterface

Attributes: ¶

Attribute	MLIR Type	Description
`layout`	::mlir::NVVM::MMALayoutAttr	NVVM MMA layout Enum cases: row (`row`) col (`col`)

Operands: ¶

Operand	Description
`ptr`	LLVM pointer in address space 3
`sources`	variadic of 32-bit signless integer

`nvvm.vote.ballot.sync` (NVVM::VoteBallotOp) ¶

Operands: ¶

Operand	Description
`mask`	LLVM dialect-compatible type
`pred`	LLVM dialect-compatible type

Results: ¶

Result	Description
`res`	LLVM dialect-compatible type

`nvvm.wgmma.commit.group.sync.aligned` (NVVM::WgmmaGroupSyncAlignedOp) ¶

Syntax:

operation ::= `nvvm.wgmma.commit.group.sync.aligned` attr-dict

Commits all prior uncommitted warpgroup level matrix multiplication operations.

For more information, see PTX ISA

Interfaces: BasicPtxBuilderInterface

`nvvm.wgmma.fence.aligned` (NVVM::WgmmaFenceAlignedOp) ¶

Syntax:

operation ::= `nvvm.wgmma.fence.aligned` attr-dict

Enforce an ordering of register accesses between warpgroup level matrix multiplication and other operations.

For more information, see PTX ISA

Interfaces: BasicPtxBuilderInterface

`nvvm.wgmma.mma_async` (NVVM::WgmmaMmaAsyncOp) ¶

Syntax:

operation ::= `nvvm.wgmma.mma_async` $descriptorA `,` $descriptorB `,` $inouts `,` $shape `,`
              `D` `[` $typeD `,` $scaleD (`,` $satfinite^)? `]` `,`
              `A` `[` $typeA `,` $scaleA `,` $layoutA `]` `,`
              `B` `[` $typeB `,` $scaleB `,` $layoutB `]`
              attr-dict `:`
              type($inouts) `->` type($results)

The warpgroup (128 threads) level matrix multiply and accumulate operation has either of the following forms, where matrix D is called accumulator: D = A * B + D D = A * B, where the input from accumulator D is disabled.

Supported shapes:

|--------------|--------------|------------|--------------|---------------|
|              |              |            |              |f16+=e4m3*e4m3 |
|              |              |            |              |f16+=e5m2*e5m2 |
|f32+=tf32*tf32|f16+=f16 *f16 | s32+=s8*s8 |s32 += b1 * b1|f16+=e5m2*e4m3 |
|              |f32+=f16 *f16 | s32+=u8*u8 |              |f16+=e4m3*e5m2 |
|              |f32+=bf16*bf16| s32+=u8*u8 |              |f16+=e4m3*e5m2 |
|              |f32+=bf16*bf16| s32+=s8*u8 |              |f32+=e4m3*e4m3 |
|              |              | s32+=u8*s8 |              |f32+=e5m2*e5m2 |
|              |              |            |              |f32+=e4m3*e5m2 |
|              |              |            |              |f32+=e4m3*e5m2 |
|--------------|--------------|------------|--------------|---------------|
|   .m64n8k8   |  .m64n8k16   | .m64n8k32  | .m64n8k256   | .m64n8k32     |
|   .m64n16k8  |  .m64n16k16  | .m64n16k32 | .m64n16k256  | .m64n16k32    |
|   .m64n24k8  |  .m64n24k16  | .m64n24k32 | .m64n24k256  | .m64n24k32    |
|   .m64n32k8  |  .m64n32k16  | .m64n32k32 | .m64n32k256  | .m64n32k32    |
|   .m64n40k8  |  .m64n40k16  | .m64n48k32 | .m64n48k256  | .m64n40k32    |
|   .m64n48k8  |  .m64n48k16  | .m64n64k32 | .m64n64k256  | .m64n48k32    |
|   .m64n56k8  |  .m64n56k16  | .m64n80k32 | .m64n80k256  | .m64n56k32    |
|   .m64n64k8  |  .m64n64k16  | .m64n96k32 | .m64n96k256  | .m64n64k32    |
|   .m64n72k8  |  .m64n72k16  | .m64n112k32| .m64n112k256 | .m64n72k32    |
|   .m64n80k8  |  .m64n80k16  | .m64n128k32| .m64n128k256 | .m64n80k32    |
|   .m64n88k8  |  .m64n88k16  | .m64n144k32| .m64n144k256 | .m64n88k32    |
|   .m64n96k8  |  .m64n96k16  | .m64n160k32| .m64n160k256 | .m64n96k32    |
|   .m64n104k8 |  .m64n104k16 | .m64n176k32| .m64n176k256 | .m64n104k32   |
|   .m64n112k8 |  .m64n112k16 | .m64n192k32| .m64n192k256 | .m64n112k32   |
|   .m64n120k8 |  .m64n120k16 | .m64n208k32| .m64n208k256 | .m64n120k32   |
|   .m64n128k8 |  .m64n128k16 | .m64n224k32| .m64n224k256 | .m64n128k32   |
|   .m64n136k8 |  .m64n136k16 | .m64n240k32| .m64n240k256 | .m64n136k32   |
|   .m64n144k8 |  .m64n144k16 | .m64n256k32| .m64n256k256 | .m64n144k32   |
|   .m64n152k8 |  .m64n152k16 |            |              | .m64n152k32   |
|   .m64n160k8 |  .m64n160k16 |            |              | .m64n160k32   |
|   .m64n168k8 |  .m64n168k16 |            |              | .m64n168k32   |
|   .m64n176k8 |  .m64n176k16 |            |              | .m64n176k32   |
|   .m64n184k8 |  .m64n184k16 |            |              | .m64n184k32   |
|   .m64n192k8 |  .m64n192k16 |            |              | .m64n192k32   |
|   .m64n200k8 |  .m64n200k16 |            |              | .m64n200k32   |
|   .m64n208k8 |  .m64n208k16 |            |              | .m64n208k32   |
|   .m64n216k8 |  .m64n216k16 |            |              | .m64n216k32   |
|   .m64n224k8 |  .m64n224k16 |            |              | .m64n224k32   |
|   .m64n232k8 |  .m64n232k16 |            |              | .m64n232k32   |
|   .m64n240k8 |  .m64n240k16 |            |              | .m64n240k32   |
|   .m64n248k8 |  .m64n248k16 |            |              | .m64n248k32   |
|   .m64n256k8 |  .m64n256k16 |            |              | .m64n256k32   |
|--------------|--------------|------------|--------------|---------------|

For more information, see PTX ISA

Interfaces: BasicPtxBuilderInterface

Attributes: ¶

Attribute	MLIR Type	Description
`shape`	::mlir::NVVM::MMAShapeAttr	Attribute for MMA operation shape.
`typeA`	::mlir::NVVM::WGMMATypesAttr	NVVM WGMMA types Enum cases: f16 (`f16`) tf32 (`tf32`) u8 (`u8`) s8 (`s8`) b1 (`b1`) bf16 (`bf16`) e4m3 (`e4m3`) e5m2 (`e5m2`) f32 (`f32`) s32 (`s32`)
`typeB`	::mlir::NVVM::WGMMATypesAttr	NVVM WGMMA types Enum cases: f16 (`f16`) tf32 (`tf32`) u8 (`u8`) s8 (`s8`) b1 (`b1`) bf16 (`bf16`) e4m3 (`e4m3`) e5m2 (`e5m2`) f32 (`f32`) s32 (`s32`)
`typeD`	::mlir::NVVM::WGMMATypesAttr	NVVM WGMMA types Enum cases: f16 (`f16`) tf32 (`tf32`) u8 (`u8`) s8 (`s8`) b1 (`b1`) bf16 (`bf16`) e4m3 (`e4m3`) e5m2 (`e5m2`) f32 (`f32`) s32 (`s32`)
`scaleD`	::mlir::NVVM::WGMMAScaleOutAttr	WGMMA input predicate Enum cases: zero (`zero`) one (`one`)
`scaleA`	::mlir::NVVM::WGMMAScaleInAttr	WGMMA overflow options Enum cases: one (`one`) neg (`neg`)
`scaleB`	::mlir::NVVM::WGMMAScaleInAttr	WGMMA overflow options Enum cases: one (`one`) neg (`neg`)
`layoutA`	::mlir::NVVM::MMALayoutAttr	NVVM MMA layout Enum cases: row (`row`) col (`col`)
`layoutB`	::mlir::NVVM::MMALayoutAttr	NVVM MMA layout Enum cases: row (`row`) col (`col`)
`satfinite`	::mlir::NVVM::MMAIntOverflowAttr	MMA overflow options Enum cases: satfinite (`satfinite`) wrapped (`wrapped`)

Operands: ¶

Operand	Description
`inouts`	LLVM structure type
`descriptorA`	64-bit signless integer
`descriptorB`	64-bit signless integer

Results: ¶

Result	Description
`results`	LLVM structure type

`nvvm.wgmma.wait.group.sync.aligned` (NVVM::WgmmaWaitGroupSyncOp) ¶

Syntax:

operation ::= `nvvm.wgmma.wait.group.sync.aligned` attr-dict $group

Signal the completion of a preceding warpgroup operation.

For more information, see PTX ISA

Interfaces: BasicPtxBuilderInterface

Attributes: ¶

Attribute	MLIR Type	Description
`group`	::mlir::IntegerAttr	32-bit signless integer attribute

`nvvm.wmma.load` (NVVM::WMMALoadOp) ¶

Warp synchronous matrix load

Syntax:

operation ::= `nvvm.wmma.load` $ptr `,` $stride attr-dict `:` functional-type($ptr, $res)

Attributes: ¶

Attribute	MLIR Type	Description
`m`	::mlir::IntegerAttr	32-bit signless integer attribute
`n`	::mlir::IntegerAttr	32-bit signless integer attribute
`k`	::mlir::IntegerAttr	32-bit signless integer attribute
`layout`	::mlir::NVVM::MMALayoutAttr	NVVM MMA layout Enum cases: row (`row`) col (`col`)
`eltype`	::mlir::NVVM::MMATypesAttr	NVVM MMA types Enum cases: f16 (`f16`) f32 (`f32`) tf32 (`tf32`) bf16 (`bf16`) s8 (`s8`) u8 (`u8`) s32 (`s32`) s4 (`s4`) u4 (`u4`) b1 (`b1`) f64 (`f64`)
`frag`	::mlir::NVVM::MMAFragAttr	NVVM MMA frag type Enum cases: a (`a`) b (`b`) c (`c`)

Operands: ¶

Operand	Description
`ptr`	LLVM pointer type
`stride`	32-bit signless integer

Results: ¶

Result	Description
`res`	LLVM structure type

`nvvm.wmma.mma` (NVVM::WMMAMmaOp) ¶

Warp synchronous matrix-multiply accumulate using tensor cores.

Syntax:

operation ::= `nvvm.wmma.mma` $args attr-dict `:` functional-type($args, $res)

Attributes: ¶

Attribute	MLIR Type	Description
`m`	::mlir::IntegerAttr	32-bit signless integer attribute
`n`	::mlir::IntegerAttr	32-bit signless integer attribute
`k`	::mlir::IntegerAttr	32-bit signless integer attribute
`layoutA`	::mlir::NVVM::MMALayoutAttr	NVVM MMA layout Enum cases: row (`row`) col (`col`)
`layoutB`	::mlir::NVVM::MMALayoutAttr	NVVM MMA layout Enum cases: row (`row`) col (`col`)
`eltypeA`	::mlir::NVVM::MMATypesAttr	NVVM MMA types Enum cases: f16 (`f16`) f32 (`f32`) tf32 (`tf32`) bf16 (`bf16`) s8 (`s8`) u8 (`u8`) s32 (`s32`) s4 (`s4`) u4 (`u4`) b1 (`b1`) f64 (`f64`)
`eltypeB`	::mlir::NVVM::MMATypesAttr	NVVM MMA types Enum cases: f16 (`f16`) f32 (`f32`) tf32 (`tf32`) bf16 (`bf16`) s8 (`s8`) u8 (`u8`) s32 (`s32`) s4 (`s4`) u4 (`u4`) b1 (`b1`) f64 (`f64`)

Operands: ¶

Operand	Description
`args`	variadic of LLVM dialect-compatible type

Results: ¶

Result	Description
`res`	LLVM structure type

`nvvm.wmma.store` (NVVM::WMMAStoreOp) ¶

Warp synchronous matrix store

Syntax:

operation ::= `nvvm.wmma.store` $ptr `,` $stride `,` $args attr-dict `:` qualified(type($ptr)) `,`
              type($args)

Attributes: ¶

Attribute	MLIR Type	Description
`m`	::mlir::IntegerAttr	32-bit signless integer attribute
`n`	::mlir::IntegerAttr	32-bit signless integer attribute
`k`	::mlir::IntegerAttr	32-bit signless integer attribute
`layout`	::mlir::NVVM::MMALayoutAttr	NVVM MMA layout Enum cases: row (`row`) col (`col`)
`eltype`	::mlir::NVVM::MMATypesAttr	NVVM MMA types Enum cases: f16 (`f16`) f32 (`f32`) tf32 (`tf32`) bf16 (`bf16`) s8 (`s8`) u8 (`u8`) s32 (`s32`) s4 (`s4`) u4 (`u4`) b1 (`b1`) f64 (`f64`)

Operands: ¶

Operand	Description
`ptr`	LLVM pointer type
`args`	variadic of LLVM dialect-compatible type
`stride`	32-bit signless integer

Attributes ¶

LoadCacheModifierKindAttr ¶

NVVM load cache modifier kind

Syntax:

#nvvm.load_cache_modifier<
  ::mlir::NVVM::LoadCacheModifierKind   # value
>

Enum attribute of the different kinds of cache operators for load instructions.

For more information, see PTX ISA

Parameters: ¶

Parameter	C++ type	Description
value	`::mlir::NVVM::LoadCacheModifierKind`	an enum of type LoadCacheModifierKind

MMAB1OpAttr ¶

MMA binary operations

Syntax:

#nvvm.mma_b1op<
  ::mlir::NVVM::MMAB1Op   # value
>

Enum cases:

none (none)
xor_popc (xor_popc)
and_popc (and_popc)

Parameters: ¶

Parameter	C++ type	Description
value	`::mlir::NVVM::MMAB1Op`	an enum of type MMAB1Op

MMAFragAttr ¶

NVVM MMA frag type

Syntax:

#nvvm.mma_frag<
  ::mlir::NVVM::MMAFrag   # value
>

Enum cases:

a (a)
b (b)
c (c)

Parameters: ¶

Parameter	C++ type	Description
value	`::mlir::NVVM::MMAFrag`	an enum of type MMAFrag

MMAIntOverflowAttr ¶

MMA overflow options

Syntax:

#nvvm.mma_int_overflow<
  ::mlir::NVVM::MMAIntOverflow   # value
>

Enum cases:

satfinite (satfinite)
wrapped (wrapped)

Parameters: ¶

Parameter	C++ type	Description
value	`::mlir::NVVM::MMAIntOverflow`	an enum of type MMAIntOverflow

MMALayoutAttr ¶

NVVM MMA layout

Syntax:

#nvvm.mma_layout<
  ::mlir::NVVM::MMALayout   # value
>

Enum cases:

row (row)
col (col)

Parameters: ¶

Parameter	C++ type	Description
value	`::mlir::NVVM::MMALayout`	an enum of type MMALayout

MMATypesAttr ¶

NVVM MMA types

Syntax:

#nvvm.mma_type<
  ::mlir::NVVM::MMATypes   # value
>

Enum cases:

f16 (f16)
f32 (f32)
tf32 (tf32)
bf16 (bf16)
s8 (s8)
u8 (u8)
s32 (s32)
s4 (s4)
u4 (u4)
b1 (b1)
f64 (f64)

Parameters: ¶

Parameter	C++ type	Description
value	`::mlir::NVVM::MMATypes`	an enum of type MMATypes

MMAShapeAttr ¶

Attribute for MMA operation shape.

Syntax:

#nvvm.shape<
  int,   # m
  int,   # n
  int   # k
>

Parameters: ¶

Parameter	C++ type	Description
m	`int`
n	`int`
k	`int`

NVVMTargetAttr ¶

Syntax:

#nvvm.target<
  int,   # O
  ::llvm::StringRef,   # triple
  ::llvm::StringRef,   # chip
  ::llvm::StringRef,   # features
  DictionaryAttr,   # flags
  ArrayAttr   # link
>

GPU target attribute for controlling compilation of NVIDIA targets. All parameters decay into default values if not present.

Examples:

Target with default values.

  gpu.module @mymodule [#nvvm.target] attributes {...} {
    ...
  }

Target with sm_90 chip and fast math.

  gpu.module @mymodule [#nvvm.target<chip = "sm_90", flags = {fast}>] {
    ...
  }

Parameters: ¶

Parameter	C++ type	Description
O	`int`	Optimization level to apply.
triple	`::llvm::StringRef`	Target triple.
chip	`::llvm::StringRef`	Target chip.
features	`::llvm::StringRef`	Target chip features.
flags	`DictionaryAttr`	Target specific flags.
link	`ArrayAttr`	Files to link to the LLVM module.

ProxyKindAttr ¶

Proxy kind

Syntax:

#nvvm.proxy_kind<
  ::mlir::NVVM::ProxyKind   # value
>

Enum cases:

alias (alias)
async (async)
async.global (async_global)
async.shared (async_shared)

Parameters: ¶

Parameter	C++ type	Description
value	`::mlir::NVVM::ProxyKind`	an enum of type ProxyKind

ReduxKindAttr ¶

NVVM redux kind

Syntax:

#nvvm.redux_kind<
  ::mlir::NVVM::ReduxKind   # value
>

Enum cases:

add (ADD)
and (AND)
max (MAX)
min (MIN)
or (OR)
umax (UMAX)
umin (UMIN)
xor (XOR)

Parameters: ¶

Parameter	C++ type	Description
value	`::mlir::NVVM::ReduxKind`	an enum of type ReduxKind

SetMaxRegisterActionAttr ¶

NVVM set max register action

Syntax:

#nvvm.action<
  ::mlir::NVVM::SetMaxRegisterAction   # value
>

Enum cases:

decrease (decrease)
increase (increase)

Parameters: ¶

Parameter	C++ type	Description
value	`::mlir::NVVM::SetMaxRegisterAction`	an enum of type SetMaxRegisterAction

SharedSpaceAttr ¶

Shared memory space

Syntax:

#nvvm.shared_space<
  ::mlir::NVVM::SharedSpace   # value
>

Enum cases:

cta (shared_cta)
cluster (shared_cluster)

Parameters: ¶

Parameter	C++ type	Description
value	`::mlir::NVVM::SharedSpace`	an enum of type SharedSpace

ShflKindAttr ¶

NVVM shuffle kind

Syntax:

#nvvm.shfl_kind<
  ::mlir::NVVM::ShflKind   # value
>

Enum cases:

bfly (bfly)
up (up)
down (down)
idx (idx)

Parameters: ¶

Parameter	C++ type	Description
value	`::mlir::NVVM::ShflKind`	an enum of type ShflKind

WGMMAScaleInAttr ¶

WGMMA overflow options

Syntax:

#nvvm.wgmma_scale_in<
  ::mlir::NVVM::WGMMAScaleIn   # value
>

Enum cases:

one (one)
neg (neg)

Parameters: ¶

Parameter	C++ type	Description
value	`::mlir::NVVM::WGMMAScaleIn`	an enum of type WGMMAScaleIn

WGMMAScaleOutAttr ¶

WGMMA input predicate

Syntax:

#nvvm.wgmma_scale_out<
  ::mlir::NVVM::WGMMAScaleOut   # value
>

Enum cases:

zero (zero)
one (one)

Parameters: ¶

Parameter	C++ type	Description
value	`::mlir::NVVM::WGMMAScaleOut`	an enum of type WGMMAScaleOut

WGMMATypesAttr ¶

NVVM WGMMA types

Syntax:

#nvvm.wgmma_type<
  ::mlir::NVVM::WGMMATypes   # value
>

Enum cases:

f16 (f16)
tf32 (tf32)
u8 (u8)
s8 (s8)
b1 (b1)
bf16 (bf16)
e4m3 (e4m3)
e5m2 (e5m2)
f32 (f32)
s32 (s32)

Parameters: ¶

Parameter	C++ type	Description
value	`::mlir::NVVM::WGMMATypes`	an enum of type WGMMATypes

Enums ¶

LoadCacheModifierKind ¶

NVVM load cache modifier kind

Cases: ¶

Symbol	Value	String
CA	`0`	ca
CG	`1`	cg
CS	`2`	cs
LU	`3`	lu
CV	`4`	cv

MMAB1Op ¶

MMA binary operations

Cases: ¶

Symbol	Value	String
none	`0`	none
xor_popc	`1`	xor_popc
and_popc	`2`	and_popc

MMAFrag ¶

NVVM MMA frag type

Cases: ¶

Symbol	Value	String
a	`0`	a
b	`1`	b
c	`2`	c

MMAIntOverflow ¶

MMA overflow options

Cases: ¶

Symbol	Value	String
satfinite	`1`	satfinite
wrapped	`0`	wrapped

MMALayout ¶

NVVM MMA layout

Cases: ¶

Symbol	Value	String
row	`0`	row
col	`1`	col

MMATypes ¶

NVVM MMA types

Cases: ¶

Symbol	Value	String
f16	`0`	f16
f32	`1`	f32
tf32	`2`	tf32
bf16	`9`	bf16
s8	`4`	s8
u8	`3`	u8
s32	`5`	s32
s4	`8`	s4
u4	`7`	u4
b1	`6`	b1
f64	`10`	f64

ProxyKind ¶

Proxy kind

Cases: ¶

Symbol	Value	String
alias	`0`	alias
async	`1`	async
async_global	`2`	async.global
async_shared	`3`	async.shared

ReduxKind ¶

NVVM redux kind

Cases: ¶

Symbol	Value	String
ADD	`1`	add
AND	`2`	and
MAX	`3`	max
MIN	`4`	min
OR	`5`	or
UMAX	`6`	umax
UMIN	`7`	umin
XOR	`8`	xor

SetMaxRegisterAction ¶

NVVM set max register action

Cases: ¶

Symbol	Value	String
decrease	`1`	decrease
increase	`0`	increase

SharedSpace ¶

Shared memory space

Cases: ¶

Symbol	Value	String
shared_cta	`0`	cta
shared_cluster	`1`	cluster

ShflKind ¶

NVVM shuffle kind

Cases: ¶

Symbol	Value	String
bfly	`0`	bfly
up	`1`	up
down	`2`	down
idx	`3`	idx

WGMMAScaleIn ¶

WGMMA overflow options

Cases: ¶

Symbol	Value	String
one	`1`	one
neg	`-1`	neg

WGMMAScaleOut ¶

WGMMA input predicate

Cases: ¶

Symbol	Value	String
zero	`0`	zero
one	`1`	one

WGMMATypes ¶

NVVM WGMMA types

Cases: ¶

Symbol	Value	String
f16	`0`	f16
tf32	`1`	tf32
u8	`2`	u8
s8	`3`	s8
b1	`4`	b1
bf16	`5`	bf16
e4m3	`6`	e4m3
e5m2	`7`	e5m2
f32	`8`	f32
s32	`9`	s32

'nvvm' Dialect

Operations ¶

nvvm.bar.warp.sync (NVVM::SyncWarpOp) ¶

Operands: ¶

nvvm.barrier (NVVM::BarrierOp) ¶

Operands: ¶

nvvm.barrier.arrive (NVVM::BarrierArriveOp) ¶

Operands: ¶

nvvm.barrier0 (NVVM::Barrier0Op) ¶

nvvm.cluster.arrive (NVVM::ClusterArriveOp) ¶

Attributes: ¶

nvvm.cluster.arrive.relaxed (NVVM::ClusterArriveRelaxedOp) ¶

Attributes: ¶

nvvm.cluster.wait (NVVM::ClusterWaitOp) ¶

Attributes: ¶

nvvm.cp.async.bulk.commit.group (NVVM::CpAsyncBulkCommitGroupOp) ¶

nvvm.cp.async.bulk.tensor.global.shared.cta (NVVM::CpAsyncBulkTensorSharedCTAToGlobalOp) ¶

Operands: ¶

nvvm.cp.async.bulk.tensor.shared.cluster.global (NVVM::CpAsyncBulkTensorGlobalToSharedClusterOp) ¶

Operands: ¶

nvvm.cp.async.bulk.wait_group (NVVM::CpAsyncBulkWaitGroupOp) ¶

Attributes: ¶

nvvm.cp.async.commit.group (NVVM::CpAsyncCommitGroupOp) ¶

nvvm.cp.async.mbarrier.arrive (NVVM::CpAsyncMBarrierArriveOp) ¶

Attributes: ¶

Operands: ¶

nvvm.cp.async.mbarrier.arrive.shared (NVVM::CpAsyncMBarrierArriveSharedOp) ¶

Attributes: ¶

Operands: ¶

nvvm.cp.async.shared.global (NVVM::CpAsyncOp) ¶

Attributes: ¶

Operands: ¶

nvvm.cp.async.wait.group (NVVM::CpAsyncWaitGroupOp) ¶

Attributes: ¶

nvvm.elect.sync (NVVM::ElectSyncOp) ¶

Results: ¶

nvvm.fence.mbarrier.init (NVVM::FenceMbarrierInitOp) ¶

nvvm.fence.proxy (NVVM::FenceProxyOp) ¶

Attributes: ¶

nvvm.fence.sc.cluster (NVVM::FenceScClusterOp) ¶

nvvm.ldmatrix (NVVM::LdMatrixOp) ¶

Attributes: ¶

Operands: ¶

Results: ¶

nvvm.mbarrier.arrive (NVVM::MBarrierArriveOp) ¶

Operands: ¶

Results: ¶

nvvm.mbarrier.arrive.expect_tx (NVVM::MBarrierArriveExpectTxOp) ¶

Operands: ¶

nvvm.mbarrier.arrive.expect_tx.shared (NVVM::MBarrierArriveExpectTxSharedOp) ¶

Operands: ¶

nvvm.mbarrier.arrive.nocomplete (NVVM::MBarrierArriveNocompleteOp) ¶

Operands: ¶

Results: ¶

nvvm.mbarrier.arrive.nocomplete.shared (NVVM::MBarrierArriveNocompleteSharedOp) ¶

Operands: ¶

Results: ¶

nvvm.mbarrier.arrive.shared (NVVM::MBarrierArriveSharedOp) ¶

Operands: ¶

Results: ¶

nvvm.mbarrier.init (NVVM::MBarrierInitOp) ¶

Operands: ¶

nvvm.mbarrier.init.shared (NVVM::MBarrierInitSharedOp) ¶

Operands: ¶

nvvm.mbarrier.inval (NVVM::MBarrierInvalOp) ¶

Operands: ¶

nvvm.mbarrier.inval.shared (NVVM::MBarrierInvalSharedOp) ¶

Operands: ¶

nvvm.mbarrier.test.wait (NVVM::MBarrierTestWaitOp) ¶

Operands: ¶

Results: ¶

nvvm.mbarrier.test.wait.shared (NVVM::MBarrierTestWaitSharedOp) ¶

Operands: ¶

Results: ¶

nvvm.mbarrier.try_wait.parity (NVVM::MBarrierTryWaitParityOp) ¶

Operands: ¶

nvvm.mbarrier.try_wait.parity.shared (NVVM::MBarrierTryWaitParitySharedOp) ¶

Operands: ¶

nvvm.mma.sync (NVVM::MmaOp) ¶

Attributes: ¶

`nvvm.bar.warp.sync` (NVVM::SyncWarpOp) ¶

`nvvm.barrier` (NVVM::BarrierOp) ¶

`nvvm.barrier.arrive` (NVVM::BarrierArriveOp) ¶

`nvvm.barrier0` (NVVM::Barrier0Op) ¶

`nvvm.cluster.arrive` (NVVM::ClusterArriveOp) ¶

`nvvm.cluster.arrive.relaxed` (NVVM::ClusterArriveRelaxedOp) ¶

`nvvm.cluster.wait` (NVVM::ClusterWaitOp) ¶

`nvvm.cp.async.bulk.commit.group` (NVVM::CpAsyncBulkCommitGroupOp) ¶

`nvvm.cp.async.bulk.tensor.global.shared.cta` (NVVM::CpAsyncBulkTensorSharedCTAToGlobalOp) ¶

`nvvm.cp.async.bulk.tensor.shared.cluster.global` (NVVM::CpAsyncBulkTensorGlobalToSharedClusterOp) ¶

`nvvm.cp.async.bulk.wait_group` (NVVM::CpAsyncBulkWaitGroupOp) ¶

`nvvm.cp.async.commit.group` (NVVM::CpAsyncCommitGroupOp) ¶

`nvvm.cp.async.mbarrier.arrive` (NVVM::CpAsyncMBarrierArriveOp) ¶

`nvvm.cp.async.mbarrier.arrive.shared` (NVVM::CpAsyncMBarrierArriveSharedOp) ¶

`nvvm.cp.async.shared.global` (NVVM::CpAsyncOp) ¶

`nvvm.cp.async.wait.group` (NVVM::CpAsyncWaitGroupOp) ¶

`nvvm.elect.sync` (NVVM::ElectSyncOp) ¶

`nvvm.fence.mbarrier.init` (NVVM::FenceMbarrierInitOp) ¶

`nvvm.fence.proxy` (NVVM::FenceProxyOp) ¶

`nvvm.fence.sc.cluster` (NVVM::FenceScClusterOp) ¶

`nvvm.ldmatrix` (NVVM::LdMatrixOp) ¶

`nvvm.mbarrier.arrive` (NVVM::MBarrierArriveOp) ¶

`nvvm.mbarrier.arrive.expect_tx` (NVVM::MBarrierArriveExpectTxOp) ¶

`nvvm.mbarrier.arrive.expect_tx.shared` (NVVM::MBarrierArriveExpectTxSharedOp) ¶

`nvvm.mbarrier.arrive.nocomplete` (NVVM::MBarrierArriveNocompleteOp) ¶

`nvvm.mbarrier.arrive.nocomplete.shared` (NVVM::MBarrierArriveNocompleteSharedOp) ¶

`nvvm.mbarrier.arrive.shared` (NVVM::MBarrierArriveSharedOp) ¶

`nvvm.mbarrier.init` (NVVM::MBarrierInitOp) ¶

`nvvm.mbarrier.init.shared` (NVVM::MBarrierInitSharedOp) ¶

`nvvm.mbarrier.inval` (NVVM::MBarrierInvalOp) ¶

`nvvm.mbarrier.inval.shared` (NVVM::MBarrierInvalSharedOp) ¶

`nvvm.mbarrier.test.wait` (NVVM::MBarrierTestWaitOp) ¶

`nvvm.mbarrier.test.wait.shared` (NVVM::MBarrierTestWaitSharedOp) ¶

`nvvm.mbarrier.try_wait.parity` (NVVM::MBarrierTryWaitParityOp) ¶

`nvvm.mbarrier.try_wait.parity.shared` (NVVM::MBarrierTryWaitParitySharedOp) ¶

`nvvm.mma.sync` (NVVM::MmaOp) ¶

`nvvm.prefetch.tensormap` (NVVM::PrefetchTensorMapOp) ¶

`nvvm.rcp.approx.ftz.f` (NVVM::RcpApproxFtzF32Op) ¶

`nvvm.read.ptx.sreg.clock` (NVVM::ClockOp) ¶

`nvvm.read.ptx.sreg.clock64` (NVVM::Clock64Op) ¶

`nvvm.read.ptx.sreg.cluster.ctaid.x` (NVVM::BlockInClusterIdXOp) ¶

`nvvm.read.ptx.sreg.cluster.ctaid.y` (NVVM::BlockInClusterIdYOp) ¶

`nvvm.read.ptx.sreg.cluster.ctaid.z` (NVVM::BlockInClusterIdZOp) ¶

`nvvm.read.ptx.sreg.cluster.ctarank` (NVVM::ClusterId) ¶

`nvvm.read.ptx.sreg.cluster.nctaid.x` (NVVM::ClusterDimBlocksXOp) ¶

`nvvm.read.ptx.sreg.cluster.nctaid.y` (NVVM::ClusterDimBlocksYOp) ¶

`nvvm.read.ptx.sreg.cluster.nctaid.z` (NVVM::ClusterDimBlocksZOp) ¶

`nvvm.read.ptx.sreg.cluster.nctarank` (NVVM::ClusterDim) ¶

`nvvm.read.ptx.sreg.clusterid.x` (NVVM::ClusterIdXOp) ¶

`nvvm.read.ptx.sreg.clusterid.y` (NVVM::ClusterIdYOp) ¶

`nvvm.read.ptx.sreg.clusterid.z` (NVVM::ClusterIdZOp) ¶

`nvvm.read.ptx.sreg.ctaid.x` (NVVM::BlockIdXOp) ¶

`nvvm.read.ptx.sreg.ctaid.y` (NVVM::BlockIdYOp) ¶

`nvvm.read.ptx.sreg.ctaid.z` (NVVM::BlockIdZOp) ¶

`nvvm.read.ptx.sreg.laneid` (NVVM::LaneIdOp) ¶

`nvvm.read.ptx.sreg.nclusterid.x` (NVVM::ClusterDimXOp) ¶

`nvvm.read.ptx.sreg.nclusterid.y` (NVVM::ClusterDimYOp) ¶

`nvvm.read.ptx.sreg.nclusterid.z` (NVVM::ClusterDimZOp) ¶

`nvvm.read.ptx.sreg.nctaid.x` (NVVM::GridDimXOp) ¶

`nvvm.read.ptx.sreg.nctaid.y` (NVVM::GridDimYOp) ¶

`nvvm.read.ptx.sreg.nctaid.z` (NVVM::GridDimZOp) ¶

`nvvm.read.ptx.sreg.ntid.x` (NVVM::BlockDimXOp) ¶

`nvvm.read.ptx.sreg.ntid.y` (NVVM::BlockDimYOp) ¶

`nvvm.read.ptx.sreg.ntid.z` (NVVM::BlockDimZOp) ¶

`nvvm.read.ptx.sreg.tid.x` (NVVM::ThreadIdXOp) ¶

`nvvm.read.ptx.sreg.tid.y` (NVVM::ThreadIdYOp) ¶

`nvvm.read.ptx.sreg.tid.z` (NVVM::ThreadIdZOp) ¶

`nvvm.read.ptx.sreg.warpsize` (NVVM::WarpSizeOp) ¶

`nvvm.redux.sync` (NVVM::ReduxOp) ¶