'nvvm' Dialect
Operations ¶
nvvm.bar.warp.sync
(NVVM::SyncWarpOp) ¶
Syntax:
operation ::= `nvvm.bar.warp.sync` $mask attr-dict `:` type($mask)
Operands: ¶
Operand | Description |
---|---|
mask | LLVM dialect-compatible type |
nvvm.barrier
(NVVM::BarrierOp) ¶
Syntax:
operation ::= `nvvm.barrier` (`id` `=` $barrierId^)? (`number_of_threads` `=` $numberOfThreads^)? attr-dict
Traits: AttrSizedOperandSegments
Operands: ¶
Operand | Description |
---|---|
barrierId | 32-bit signless integer |
numberOfThreads | 32-bit signless integer |
nvvm.barrier.arrive
(NVVM::BarrierArriveOp) ¶
Syntax:
operation ::= `nvvm.barrier.arrive` (`id` `=` $barrierId^)? `number_of_threads` `=` $numberOfThreads attr-dict
Thread that executes this op announces their arrival at the barrier with given id and continue their execution.
The default barrier id is 0 that is similar to nvvm.barrier
Op. When
barrierId
is not present, the default barrier id is used.
[For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-bar)
Interfaces: BasicPtxBuilderInterface
Operands: ¶
Operand | Description |
---|---|
barrierId | 32-bit signless integer |
numberOfThreads | 32-bit signless integer |
nvvm.barrier0
(NVVM::Barrier0Op) ¶
Syntax:
operation ::= `nvvm.barrier0` attr-dict
nvvm.breakpoint
(NVVM::Breakpoint) ¶
Breakpoint Op
Syntax:
operation ::= `nvvm.breakpoint` attr-dict
Breakpoint suspends execution of the program for debugging. For more information, see PTX ISA
nvvm.cluster.arrive
(NVVM::ClusterArriveOp) ¶
Cluster Barrier Arrive Op
Syntax:
operation ::= `nvvm.cluster.arrive` attr-dict
The cluster.arrive
can be used by the threads within the cluster for synchronization and
communication. The cluster.arrive
instruction marks the warps’ arrival at the barrier
without causing the executing thread to wait for other participating threads.
The aligned
attribute, when provided, generates the .aligned version of the PTX instruction.
[For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-barrier-cluster)
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
aligned | ::mlir::UnitAttr | unit attribute |
nvvm.cluster.arrive.relaxed
(NVVM::ClusterArriveRelaxedOp) ¶
Cluster Barrier Relaxed Arrive Op
Syntax:
operation ::= `nvvm.cluster.arrive.relaxed` attr-dict
The cluster.arrive
can be used by the threads within the cluster for synchronization and
communication. The cluster.arrive
instruction marks the warps’ arrival at the barrier
without causing the executing thread to wait for other participating threads.
The aligned
attribute, when provided, generates the .aligned version of the PTX instruction.
The .relaxed qualifier on cluster.arrive
specifies that there are no memory
ordering and visibility guarantees provided for the memory accesses performed prior to
cluster.arrive
.
[For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-barrier-cluster)
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
aligned | ::mlir::UnitAttr | unit attribute |
nvvm.cluster.wait
(NVVM::ClusterWaitOp) ¶
Cluster Barrier Wait Op
Syntax:
operation ::= `nvvm.cluster.wait` attr-dict
The cluster.wait
causes the executing thread to wait for all non-exited threads
of the cluster to perform cluster.arrive
. The aligned
attribute, when provided,
generates the .aligned version of the PTX instruction.
[For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-barrier-cluster)
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
aligned | ::mlir::UnitAttr | unit attribute |
nvvm.cp.async.bulk.commit.group
(NVVM::CpAsyncBulkCommitGroupOp) ¶
Syntax:
operation ::= `nvvm.cp.async.bulk.commit.group` attr-dict
This Op commits all prior initiated but uncommitted cp.async.bulk instructions into a cp.async.bulk-group.
[For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-commit-group)
nvvm.cp.async.bulk.tensor.global.shared.cta
(NVVM::CpAsyncBulkTensorSharedCTAToGlobalOp) ¶
Syntax:
operation ::= `nvvm.cp.async.bulk.tensor.global.shared.cta` $tmaDescriptor `,`
$srcMem `,`
`box` `[`$coordinates `]`
(`,` `predicate` `=` $predicate^)?
attr-dict `:` type(operands)
Traits: AttrSizedOperandSegments
Interfaces: BasicPtxBuilderInterface
Operands: ¶
Operand | Description |
---|---|
tmaDescriptor | LLVM pointer type |
srcMem | LLVM pointer in address space 3 |
coordinates | variadic of 32-bit signless integer |
predicate | 1-bit signless integer |
nvvm.cp.async.bulk.tensor.shared.cluster.global
(NVVM::CpAsyncBulkTensorGlobalToSharedClusterOp) ¶
Syntax:
operation ::= `nvvm.cp.async.bulk.tensor.shared.cluster.global` $dstMem `,`
$tmaDescriptor `,`
$mbar `,`
`box` `[`$coordinates `]`
(`im2col` `[` $im2colOffsets^ `]` )?
(`multicast_mask` `=` $multicastMask^ )?
(`l2_cache_hint` `=` $l2CacheHint^ )?
(`predicate` `=` $predicate^)?
attr-dict `:` type($dstMem) `,` type($tmaDescriptor)
Initiates an asynchronous copy operation on the tensor data from global memory to shared memory.
The Op operates has two load modes:
Tiled Mode: It’s the default mode. The source multi-dimensional tensor layout is preserved at the destination.
Im2col Mode: This mode is used when
im2colOffsets
operands are present. the elements in the Bounding Box of the source tensor are rearranged into columns at the destination. In this mode, the tensor has to be at least 3-dimensional.
The multicastMask
operand is optional. When it is present, the Op copies
data from global memory to shared memory of multiple CTAs in the cluster.
Operand multicastMask
specifies the destination CTAs in the cluster such
that each bit position in the 16-bit multicastMask
operand corresponds to
the nvvm.read.ptx.sreg.ctaid
of the destination CTA.
The l2CacheHint
operand is optional, and it is used to specify cache
eviction policy that may be used during the memory access.
[For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-tensor)
Traits: AttrSizedOperandSegments
Interfaces: BasicPtxBuilderInterface
Operands: ¶
Operand | Description |
---|---|
dstMem | LLVM pointer in address space 3 |
tmaDescriptor | LLVM pointer type |
coordinates | variadic of 32-bit signless integer |
mbar | LLVM pointer in address space 3 |
im2colOffsets | variadic of 16-bit signless integer |
multicastMask | 16-bit signless integer |
l2CacheHint | 64-bit signless integer |
predicate | 1-bit signless integer |
nvvm.cp.async.bulk.wait_group
(NVVM::CpAsyncBulkWaitGroupOp) ¶
Syntax:
operation ::= `nvvm.cp.async.bulk.wait_group` $group attr-dict
Op waits for completion of the most recent bulk async-groups.
The $group
operand tells waiting has to be done until for $group or fewer
of the most recent bulk async-groups. If $group
is 0, the op wait until
all the most recent bulk async-groups have completed.
The $read
indicates that the waiting has to be done until all the bulk
async operations in the specified bulk async-group have completed reading
from their source locations.
[For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#data-movement-and-conversion-instructions-cp-async-bulk-wait-group)
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
group | ::mlir::IntegerAttr | 32-bit signless integer attribute whose minimum value is 0 |
read | ::mlir::UnitAttr | unit attribute |
nvvm.cp.async.commit.group
(NVVM::CpAsyncCommitGroupOp) ¶
Syntax:
operation ::= `nvvm.cp.async.commit.group` attr-dict
nvvm.cp.async.mbarrier.arrive
(NVVM::CpAsyncMBarrierArriveOp) ¶
NVVM Dialect Op for cp.async.mbarrier.arrive
Syntax:
operation ::= `nvvm.cp.async.mbarrier.arrive` $addr attr-dict `:` type(operands)
The cp.async.mbarrier.arrive
Op makes the mbarrier object track
all prior cp.async operations initiated by the executing thread.
The addr
operand specifies the address of the mbarrier object
in generic address space. The noinc
attr impacts how the
mbarrier’s state is updated.
[For more information, refer PTX ISA]
(
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive)
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
noinc | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer type |
nvvm.cp.async.mbarrier.arrive.shared
(NVVM::CpAsyncMBarrierArriveSharedOp) ¶
NVVM Dialect Op for cp.async.mbarrier.arrive.shared
Syntax:
operation ::= `nvvm.cp.async.mbarrier.arrive.shared` $addr attr-dict `:` type(operands)
The cp.async.mbarrier.arrive.shared
Op makes the mbarrier object
track all prior cp.async operations initiated by the executing thread.
The addr
operand specifies the address of the mbarrier object in
shared memory. The noinc
attr impacts how the mbarrier’s state
is updated. [For more information, refer PTX ISA]
(
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-cp-async-mbarrier-arrive)
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
noinc | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer in address space 3 |
nvvm.cp.async.shared.global
(NVVM::CpAsyncOp) ¶
Syntax:
operation ::= `nvvm.cp.async.shared.global` $dst `,` $src `,` $size `,` `cache` `=` $modifier (`,` $cpSize^)? attr-dict `:` type(operands)
Interfaces: BasicPtxBuilderInterface
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
size | ::mlir::IntegerAttr | 32-bit signless integer attribute |
modifier | ::mlir::NVVM::LoadCacheModifierKindAttr | NVVM load cache modifier kind
|
Operands: ¶
Operand | Description |
---|---|
dst | LLVM pointer in address space 3 |
src | LLVM pointer in address space 1 |
cpSize | LLVM dialect-compatible type |
nvvm.cp.async.wait.group
(NVVM::CpAsyncWaitGroupOp) ¶
Syntax:
operation ::= `nvvm.cp.async.wait.group` $n attr-dict
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
n | ::mlir::IntegerAttr | 32-bit signless integer attribute |
nvvm.elect.sync
(NVVM::ElectSyncOp) ¶
Elect one leader thread
Syntax:
operation ::= `nvvm.elect.sync` attr-dict `->` type(results)
The elect.sync
instruction elects one predicated active leader
thread from among a set of threads specified in membermask.
The membermask is set to 0xFFFFFFFF
for the current version
of this Op. The predicate result is set to True
for the
leader thread, and False
for all other threads.
[For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-elect-sync)
Results: ¶
Result | Description |
---|---|
pred | 1-bit signless integer |
nvvm.fence.mbarrier.init
(NVVM::FenceMbarrierInitOp) ¶
Syntax:
operation ::= `nvvm.fence.mbarrier.init` attr-dict
Fence operation that applies on the prior nvvm.mbarrier.init [For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar)
Interfaces: BasicPtxBuilderInterface
nvvm.fence.proxy
(NVVM::FenceProxyOp) ¶
Syntax:
operation ::= `nvvm.fence.proxy` attr-dict
Fence operation with proxy to establish an ordering between memory accesses that may happen through different proxies. [For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar)
Interfaces: BasicPtxBuilderInterface
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
kind | ::mlir::NVVM::ProxyKindAttr | Proxy kindEnum cases:
|
space | ::mlir::NVVM::SharedSpaceAttr | Shared memory spaceEnum cases:
|
nvvm.fence.proxy.acquire
(NVVM::FenceProxyAcquireOp) ¶
Uni-directional proxy fence operation with acquire semantics
Syntax:
operation ::= `nvvm.fence.proxy.acquire` $scope $addr `,` $size (`from_proxy` `=` $fromProxy^)? (`to_proxy` `=` $toProxy^)? attr-dict
fence.proxy.acquire
is a uni-directional fence used to establish ordering
between a prior memory access performed via the generic proxy and a
subsequent memory access performed via the tensormap proxy
The address operand addr
and the operand size
together specify the
memory range [addr, addr+size)
on which the ordering guarantees on the
memory accesses across the proxies is to be provided. The only supported
value for the size
operand is 128 and must be an immediate. Generic Addressing
is used unconditionally, and the address specified by the operand addr
must
fall within the .global
state space. Otherwise, the behavior is undefined
[For more information, see PTX ISA]
(
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar)
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
scope | ::mlir::NVVM::MemScopeKindAttr | NVVM Memory Scope kindEnum cases:
|
fromProxy | ::mlir::NVVM::ProxyKindAttr | Proxy kindEnum cases:
|
toProxy | ::mlir::NVVM::ProxyKindAttr | Proxy kindEnum cases:
|
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer in address space 0 |
size | 32-bit signless integer |
nvvm.fence.proxy.release
(NVVM::FenceProxyReleaseOp) ¶
Uni-directional proxy fence operation with release semantics
Syntax:
operation ::= `nvvm.fence.proxy.release` $scope (`from_proxy` `=` $fromProxy^)? (`to_proxy` `=` $toProxy^)? attr-dict
fence.proxy.release
is a uni-directional fence used to establish ordering
between a prior memory access performed via the generic proxy and a
subsequent memory access performed via the tensormap proxy. fence.proxy.release
operation can form a release sequence that synchronizes with an acquire
sequence that contains the fence.proxy.acquire proxy fence operation
[For more information, see PTX ISA]
(
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-membar)
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
scope | ::mlir::NVVM::MemScopeKindAttr | NVVM Memory Scope kindEnum cases:
|
fromProxy | ::mlir::NVVM::ProxyKindAttr | Proxy kindEnum cases:
|
toProxy | ::mlir::NVVM::ProxyKindAttr | Proxy kindEnum cases:
|
nvvm.fence.sc.cluster
(NVVM::FenceScClusterOp) ¶
Syntax:
operation ::= `nvvm.fence.sc.cluster` attr-dict
nvvm.ldmatrix
(NVVM::LdMatrixOp) ¶
Cooperative matrix load
Syntax:
operation ::= `nvvm.ldmatrix` $ptr attr-dict `:` functional-type($ptr, $res)
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
num | ::mlir::IntegerAttr | 32-bit signless integer attribute |
layout | ::mlir::NVVM::MMALayoutAttr | NVVM MMA layoutEnum cases:
|
Operands: ¶
Operand | Description |
---|---|
ptr | LLVM pointer type |
Results: ¶
Result | Description |
---|---|
res | any type |
nvvm.mbarrier.arrive
(NVVM::MBarrierArriveOp) ¶
Syntax:
operation ::= `nvvm.mbarrier.arrive` $addr attr-dict `:` type($addr) `->` type($res)
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer type |
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.mbarrier.arrive.expect_tx
(NVVM::MBarrierArriveExpectTxOp) ¶
Syntax:
operation ::= `nvvm.mbarrier.arrive.expect_tx` $addr `,` $txcount (`,` `predicate` `=` $predicate^)? attr-dict `:` type(operands)
Interfaces: BasicPtxBuilderInterface
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer type |
txcount | 32-bit signless integer |
predicate | 1-bit signless integer |
nvvm.mbarrier.arrive.expect_tx.shared
(NVVM::MBarrierArriveExpectTxSharedOp) ¶
Syntax:
operation ::= `nvvm.mbarrier.arrive.expect_tx.shared` $addr `,` $txcount (`,` `predicate` `=` $predicate^)? attr-dict `:` type(operands)
Interfaces: BasicPtxBuilderInterface
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer in address space 3 |
txcount | 32-bit signless integer |
predicate | 1-bit signless integer |
nvvm.mbarrier.arrive.nocomplete
(NVVM::MBarrierArriveNocompleteOp) ¶
Syntax:
operation ::= `nvvm.mbarrier.arrive.nocomplete` $addr `,` $count attr-dict `:` type(operands) `->` type($res)
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer type |
count | 32-bit signless integer |
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.mbarrier.arrive.nocomplete.shared
(NVVM::MBarrierArriveNocompleteSharedOp) ¶
Syntax:
operation ::= `nvvm.mbarrier.arrive.nocomplete.shared` $addr `,` $count attr-dict `:` type(operands) `->` type($res)
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer in address space 3 |
count | 32-bit signless integer |
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.mbarrier.arrive.shared
(NVVM::MBarrierArriveSharedOp) ¶
Syntax:
operation ::= `nvvm.mbarrier.arrive.shared` $addr attr-dict `:` qualified(type($addr)) `->` type($res)
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer in address space 3 |
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.mbarrier.init
(NVVM::MBarrierInitOp) ¶
Syntax:
operation ::= `nvvm.mbarrier.init` $addr `,` $count (`,` `predicate` `=` $predicate^)? attr-dict `:` type(operands)
Interfaces: BasicPtxBuilderInterface
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer type |
count | 32-bit signless integer |
predicate | 1-bit signless integer |
nvvm.mbarrier.init.shared
(NVVM::MBarrierInitSharedOp) ¶
Syntax:
operation ::= `nvvm.mbarrier.init.shared` $addr `,` $count (`,` `predicate` `=` $predicate^)? attr-dict `:` type(operands)
Interfaces: BasicPtxBuilderInterface
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer in address space 3 |
count | 32-bit signless integer |
predicate | 1-bit signless integer |
nvvm.mbarrier.inval
(NVVM::MBarrierInvalOp) ¶
Syntax:
operation ::= `nvvm.mbarrier.inval` $addr attr-dict `:` type(operands)
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer type |
nvvm.mbarrier.inval.shared
(NVVM::MBarrierInvalSharedOp) ¶
Syntax:
operation ::= `nvvm.mbarrier.inval.shared` $addr attr-dict `:` type(operands)
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer in address space 3 |
nvvm.mbarrier.test.wait
(NVVM::MBarrierTestWaitOp) ¶
Syntax:
operation ::= `nvvm.mbarrier.test.wait` $addr `,` $state attr-dict `:` type(operands) `->` type($res)
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer type |
state | LLVM dialect-compatible type |
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.mbarrier.test.wait.shared
(NVVM::MBarrierTestWaitSharedOp) ¶
Syntax:
operation ::= `nvvm.mbarrier.test.wait.shared` $addr `,` $state attr-dict `:` type(operands) `->` type($res)
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer in address space 3 |
state | LLVM dialect-compatible type |
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.mbarrier.try_wait.parity
(NVVM::MBarrierTryWaitParityOp) ¶
Syntax:
operation ::= `nvvm.mbarrier.try_wait.parity` $addr `,` $phase `,` $ticks attr-dict `:` type(operands)
Interfaces: BasicPtxBuilderInterface
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer type |
phase | 32-bit signless integer |
ticks | 32-bit signless integer |
nvvm.mbarrier.try_wait.parity.shared
(NVVM::MBarrierTryWaitParitySharedOp) ¶
Syntax:
operation ::= `nvvm.mbarrier.try_wait.parity.shared` $addr `,` $phase `,` $ticks attr-dict `:` type(operands)
Interfaces: BasicPtxBuilderInterface
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer in address space 3 |
phase | 32-bit signless integer |
ticks | 32-bit signless integer |
nvvm.mma.sync
(NVVM::MmaOp) ¶
Cooperative matrix-multiply and accumulate
The nvvm.mma.sync
operation collectively performs the operation
D = matmul(A, B) + C
using all threads in a warp.
All the threads in the warp must execute the same mma.sync
operation.
For each possible multiplicand PTX data type, there are one or more possible instruction shapes given as “mMnNkK”. The below table describes the posssibilities as well as the types required for the operands. Note that the data type for C (the accumulator) and D (the result) can vary independently when there are multiple possibilities in the “C/D Type” column.
When an optional attribute cannot be immediately inferred from the types of the operands and the result during parsing or validation, an error will be raised.
b1Op
is only relevant when the binary (b1) type is given to
multiplicandDataType
. It specifies how the multiply-and-acumulate is
performed and is either xor_popc
or and_poc
. The default is xor_popc
.
intOverflowBehavior
is only relevant when the multiplicandType
attribute
is one of u8, s8, u4, s4
, this attribute describes how overflow is handled
in the accumulator. When the attribute is satfinite
, the accumulator values
are clamped in the int32 range on overflow. This is the default behavior.
Alternatively, accumulator behavior wrapped
can also be specified, in
which case overflow wraps from one end of the range to the other.
layoutA
and layoutB
are required and should generally be set to
#nvvm.mma_layout<row>
and #nvvm.mma_layout<col>
respectively, but other
combinations are possible for certain layouts according to the table below.
| A/B Type | Shape | ALayout | BLayout | A Type | B Type | C/D Type |
|----------|-----------|---------|---------|----------|----------|-------------------|
| f64 | .m8n8k4 | row | col | 1x f64 | 1x f64 | 2x f64 |
| f16 | .m8n8k4 | row/col | row/col | 2x f16x2 | 2x f16x2 | 4x f16x2 or 8xf32 |
| | .m16n8k8 | row | col | 2x f16x2 | 1x f16x2 | 2x f16x2 or 4 f32 |
| | .m16n8k16 | row | col | 4x f16x2 | 2x f16x2 | 2x f16x2 or 4 f32 |
| bf16 | .m16n8k8 | row | col | 2x f16x2 | 1x f16x2 | 2x f16x2 or 4 f32 |
| | .m16n8k16 | row | col | 4x f16x2 | 2x f16x2 | 2x f16x2 or 4 f32 |
| tf32 | .m16n8k4 | row | col | 2x i32 | 1x i32 | 4x f32 |
| | .m16n8k8 | row | col | 4x i32 | 2x i32 | 2x f16x2 or 4 f32 |
| u8/s8 | .m8n8k16 | row | col | 1x i32 | 1x i32 | 2x i32 |
| | .m16n8k16 | row | col | 2x i32 | 1x i32 | 4x i32 |
| | .m16n8k32 | row | col | 4x i32 | 2x i32 | 4x i32 |
| u4/s4 | .m8n8k32 | row | col | 1x i32 | 1x i32 | 2x i32 |
| | m16n8k32 | row | col | 2x i32 | 1x i32 | 4x i32 |
| | m16n8k64 | row | col | 4x i32 | 2x i32 | 4x i32 |
| b1 | m8n8k128 | row | col | 1x i32 | 1x i32 | 2x i32 |
| | m16n8k128 | row | col | 2x i32 | 1x i32 | 4x i32 |
Example:
%128 = nvvm.mma.sync A[%120, %121, %122, %123]
B[%124, %125]
C[%126, %127]
{layoutA = #nvvm.mma_layout<row>,
layoutB = #nvvm.mma_layout<col>,
shape = {k = 16 : i32, m = 16 : i32, n = 8 : i32}}
: (vector<2xf16>, vector<2xf16>, vector<2xf16>)
-> !llvm.struct<(vector<2xf16>, vector<2xf16>)>
Traits: AttrSizedOperandSegments
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
shape | ::mlir::NVVM::MMAShapeAttr | Attribute for MMA operation shape. |
b1Op | ::mlir::NVVM::MMAB1OpAttr | MMA binary operationsEnum cases:
|
intOverflowBehavior | ::mlir::NVVM::MMAIntOverflowAttr | MMA overflow optionsEnum cases:
|
layoutA | ::mlir::NVVM::MMALayoutAttr | NVVM MMA layoutEnum cases:
|
layoutB | ::mlir::NVVM::MMALayoutAttr | NVVM MMA layoutEnum cases:
|
multiplicandAPtxType | ::mlir::NVVM::MMATypesAttr | NVVM MMA typesEnum cases:
|
multiplicandBPtxType | ::mlir::NVVM::MMATypesAttr | NVVM MMA typesEnum cases:
|
Operands: ¶
Operand | Description |
---|---|
operandA | variadic of LLVM dialect-compatible type |
operandB | variadic of LLVM dialect-compatible type |
operandC | variadic of LLVM dialect-compatible type |
Results: ¶
Result | Description |
---|---|
res | LLVM structure type |
nvvm.prefetch.tensormap
(NVVM::PrefetchTensorMapOp) ¶
Syntax:
operation ::= `nvvm.prefetch.tensormap` $tmaDescriptor (`,` `predicate` `=` $predicate^)? attr-dict `:` type(operands)
Interfaces: BasicPtxBuilderInterface
Operands: ¶
Operand | Description |
---|---|
tmaDescriptor | LLVM pointer type |
predicate | 1-bit signless integer |
nvvm.rcp.approx.ftz.f
(NVVM::RcpApproxFtzF32Op) ¶
Syntax:
operation ::= `nvvm.rcp.approx.ftz.f` $arg attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
Operand | Description |
---|---|
arg | 32-bit float |
Results: ¶
Result | Description |
---|---|
res | 32-bit float |
nvvm.read.ptx.sreg.clock
(NVVM::ClockOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.clock` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.clock64
(NVVM::Clock64Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.clock64` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.cluster.ctaid.x
(NVVM::BlockInClusterIdXOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.cluster.ctaid.x` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.cluster.ctaid.y
(NVVM::BlockInClusterIdYOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.cluster.ctaid.y` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.cluster.ctaid.z
(NVVM::BlockInClusterIdZOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.cluster.ctaid.z` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.cluster.ctarank
(NVVM::ClusterId) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.cluster.ctarank` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.cluster.nctaid.x
(NVVM::ClusterDimBlocksXOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.cluster.nctaid.x` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.cluster.nctaid.y
(NVVM::ClusterDimBlocksYOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.cluster.nctaid.y` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.cluster.nctaid.z
(NVVM::ClusterDimBlocksZOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.cluster.nctaid.z` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.cluster.nctarank
(NVVM::ClusterDim) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.cluster.nctarank` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.clusterid.x
(NVVM::ClusterIdXOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.clusterid.x` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.clusterid.y
(NVVM::ClusterIdYOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.clusterid.y` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.clusterid.z
(NVVM::ClusterIdZOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.clusterid.z` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.ctaid.x
(NVVM::BlockIdXOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.ctaid.x` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.ctaid.y
(NVVM::BlockIdYOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.ctaid.y` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.ctaid.z
(NVVM::BlockIdZOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.ctaid.z` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg0
(NVVM::EnvReg0Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg0` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg1
(NVVM::EnvReg1Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg1` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg10
(NVVM::EnvReg10Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg10` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg11
(NVVM::EnvReg11Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg11` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg12
(NVVM::EnvReg12Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg12` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg13
(NVVM::EnvReg13Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg13` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg14
(NVVM::EnvReg14Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg14` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg15
(NVVM::EnvReg15Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg15` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg16
(NVVM::EnvReg16Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg16` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg17
(NVVM::EnvReg17Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg17` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg18
(NVVM::EnvReg18Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg18` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg19
(NVVM::EnvReg19Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg19` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg2
(NVVM::EnvReg2Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg2` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg20
(NVVM::EnvReg20Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg20` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg21
(NVVM::EnvReg21Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg21` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg22
(NVVM::EnvReg22Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg22` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg23
(NVVM::EnvReg23Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg23` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg24
(NVVM::EnvReg24Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg24` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg25
(NVVM::EnvReg25Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg25` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg26
(NVVM::EnvReg26Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg26` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg27
(NVVM::EnvReg27Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg27` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg28
(NVVM::EnvReg28Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg28` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg29
(NVVM::EnvReg29Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg29` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg3
(NVVM::EnvReg3Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg3` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg30
(NVVM::EnvReg30Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg30` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg31
(NVVM::EnvReg31Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg31` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg4
(NVVM::EnvReg4Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg4` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg5
(NVVM::EnvReg5Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg5` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg6
(NVVM::EnvReg6Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg6` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg7
(NVVM::EnvReg7Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg7` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg8
(NVVM::EnvReg8Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg8` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg9
(NVVM::EnvReg9Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg9` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.globaltimer
(NVVM::GlobalTimerOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.globaltimer` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.gridid
(NVVM::GridIdOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.gridid` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.laneid
(NVVM::LaneIdOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.laneid` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.lanemask.eq
(NVVM::LaneMaskEqOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.lanemask.eq` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.lanemask.ge
(NVVM::LaneMaskGeOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.lanemask.ge` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.lanemask.gt
(NVVM::LaneMaskGtOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.lanemask.gt` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.lanemask.le
(NVVM::LaneMaskLeOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.lanemask.le` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.lanemask.lt
(NVVM::LaneMaskLtOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.lanemask.lt` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.nclusterid.x
(NVVM::ClusterDimXOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.nclusterid.x` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.nclusterid.y
(NVVM::ClusterDimYOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.nclusterid.y` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.nclusterid.z
(NVVM::ClusterDimZOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.nclusterid.z` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.nctaid.x
(NVVM::GridDimXOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.nctaid.x` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.nctaid.y
(NVVM::GridDimYOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.nctaid.y` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.nctaid.z
(NVVM::GridDimZOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.nctaid.z` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.nsmid
(NVVM::SmDimOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.nsmid` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.ntid.x
(NVVM::BlockDimXOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.ntid.x` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.ntid.y
(NVVM::BlockDimYOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.ntid.y` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.ntid.z
(NVVM::BlockDimZOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.ntid.z` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.nwarpid
(NVVM::WarpDimOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.nwarpid` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.smid
(NVVM::SmIdOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.smid` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.tid.x
(NVVM::ThreadIdXOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.tid.x` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.tid.y
(NVVM::ThreadIdYOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.tid.y` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.tid.z
(NVVM::ThreadIdZOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.tid.z` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.warpid
(NVVM::WarpIdOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.warpid` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.warpsize
(NVVM::WarpSizeOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.warpsize` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.redux.sync
(NVVM::ReduxOp) ¶
Syntax:
operation ::= `nvvm.redux.sync` $kind $val `,` $mask_and_clamp attr-dict `:` type($val) `->` type($res)
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
kind | ::mlir::NVVM::ReduxKindAttr | NVVM redux kindEnum cases:
|
Operands: ¶
Operand | Description |
---|---|
val | LLVM dialect-compatible type |
mask_and_clamp | 32-bit signless integer |
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.setmaxregister
(NVVM::SetMaxRegisterOp) ¶
Syntax:
operation ::= `nvvm.setmaxregister` $action $regCount attr-dict
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
regCount | ::mlir::IntegerAttr | 32-bit signless integer attribute |
action | ::mlir::NVVM::SetMaxRegisterActionAttr | NVVM set max register actionEnum cases:
|
nvvm.shfl.sync
(NVVM::ShflOp) ¶
NVVM Dialect Op for shfl.sync
Syntax:
operation ::= `nvvm.shfl.sync` $kind $thread_mask `,` $val `,` $offset `,` $mask_and_clamp attr-dict
`:` type($val) `->` type($res)
The shfl.sync
Op implements data shuffle within threads of a warp.
The thread_mask
denotes the threads participating in the Op where
the bit position corresponds to a particular thread’s laneid.
The offset
specifies a source lane or source lane offset
(depending on kind
). The val
is the input value to be copied from
the source. The mask_and_clamp
contains two packed values specifying
a mask for logically splitting warps into sub-segments and an upper bound
for clamping the source lane index.
[For more information, refer PTX ISA]
(
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-shfl-sync)
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
kind | ::mlir::NVVM::ShflKindAttr | NVVM shuffle kindEnum cases:
|
return_value_and_is_valid | ::mlir::UnitAttr | unit attribute |
Operands: ¶
Operand | Description |
---|---|
thread_mask | 32-bit signless integer |
val | LLVM dialect-compatible type |
offset | 32-bit signless integer |
mask_and_clamp | 32-bit signless integer |
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.stmatrix
(NVVM::StMatrixOp) ¶
Cooperative matrix store
Syntax:
operation ::= `nvvm.stmatrix` $ptr `,` $sources attr-dict `:` type(operands)
Collectively store one or more matrices across all threads in a warp to the location indicated by the address operand $ptr in shared memory. [For more information, see PTX ISA] ( https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#warp-level-matrix-store-instruction-stmatrix)
Interfaces: BasicPtxBuilderInterface
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
layout | ::mlir::NVVM::MMALayoutAttr | NVVM MMA layoutEnum cases:
|
Operands: ¶
Operand | Description |
---|---|
ptr | LLVM pointer in address space 3 |
sources | variadic of 32-bit signless integer |
nvvm.vote.ballot.sync
(NVVM::VoteBallotOp) ¶
Operands: ¶
Operand | Description |
---|---|
mask | LLVM dialect-compatible type |
pred | LLVM dialect-compatible type |
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.wgmma.commit.group.sync.aligned
(NVVM::WgmmaGroupSyncAlignedOp) ¶
Syntax:
operation ::= `nvvm.wgmma.commit.group.sync.aligned` attr-dict
Commits all prior uncommitted warpgroup level matrix multiplication operations.
For more information, see PTX ISA
Interfaces: BasicPtxBuilderInterface
nvvm.wgmma.fence.aligned
(NVVM::WgmmaFenceAlignedOp) ¶
Syntax:
operation ::= `nvvm.wgmma.fence.aligned` attr-dict
Enforce an ordering of register accesses between warpgroup level matrix multiplication and other operations.
For more information, see PTX ISA
Interfaces: BasicPtxBuilderInterface
nvvm.wgmma.mma_async
(NVVM::WgmmaMmaAsyncOp) ¶
Syntax:
operation ::= `nvvm.wgmma.mma_async` $descriptorA `,` $descriptorB `,` $inouts `,` $shape `,`
`D` `[` $typeD `,` $scaleD (`,` $satfinite^)? `]` `,`
`A` `[` $typeA `,` $scaleA `,` $layoutA `]` `,`
`B` `[` $typeB `,` $scaleB `,` $layoutB `]`
attr-dict `:`
type($inouts) `->` type($results)
The warpgroup (128 threads) level matrix multiply and accumulate operation has either of the following forms, where matrix D is called accumulator: D = A * B + D D = A * B, where the input from accumulator D is disabled.
Supported shapes:
|--------------|--------------|------------|--------------|---------------|
| | | | |f16+=e4m3*e4m3 |
| | | | |f16+=e5m2*e5m2 |
|f32+=tf32*tf32|f16+=f16 *f16 | s32+=s8*s8 |s32 += b1 * b1|f16+=e5m2*e4m3 |
| |f32+=f16 *f16 | s32+=u8*u8 | |f16+=e4m3*e5m2 |
| |f32+=bf16*bf16| s32+=u8*u8 | |f16+=e4m3*e5m2 |
| |f32+=bf16*bf16| s32+=s8*u8 | |f32+=e4m3*e4m3 |
| | | s32+=u8*s8 | |f32+=e5m2*e5m2 |
| | | | |f32+=e4m3*e5m2 |
| | | | |f32+=e4m3*e5m2 |
|--------------|--------------|------------|--------------|---------------|
| .m64n8k8 | .m64n8k16 | .m64n8k32 | .m64n8k256 | .m64n8k32 |
| .m64n16k8 | .m64n16k16 | .m64n16k32 | .m64n16k256 | .m64n16k32 |
| .m64n24k8 | .m64n24k16 | .m64n24k32 | .m64n24k256 | .m64n24k32 |
| .m64n32k8 | .m64n32k16 | .m64n32k32 | .m64n32k256 | .m64n32k32 |
| .m64n40k8 | .m64n40k16 | .m64n48k32 | .m64n48k256 | .m64n40k32 |
| .m64n48k8 | .m64n48k16 | .m64n64k32 | .m64n64k256 | .m64n48k32 |
| .m64n56k8 | .m64n56k16 | .m64n80k32 | .m64n80k256 | .m64n56k32 |
| .m64n64k8 | .m64n64k16 | .m64n96k32 | .m64n96k256 | .m64n64k32 |
| .m64n72k8 | .m64n72k16 | .m64n112k32| .m64n112k256 | .m64n72k32 |
| .m64n80k8 | .m64n80k16 | .m64n128k32| .m64n128k256 | .m64n80k32 |
| .m64n88k8 | .m64n88k16 | .m64n144k32| .m64n144k256 | .m64n88k32 |
| .m64n96k8 | .m64n96k16 | .m64n160k32| .m64n160k256 | .m64n96k32 |
| .m64n104k8 | .m64n104k16 | .m64n176k32| .m64n176k256 | .m64n104k32 |
| .m64n112k8 | .m64n112k16 | .m64n192k32| .m64n192k256 | .m64n112k32 |
| .m64n120k8 | .m64n120k16 | .m64n208k32| .m64n208k256 | .m64n120k32 |
| .m64n128k8 | .m64n128k16 | .m64n224k32| .m64n224k256 | .m64n128k32 |
| .m64n136k8 | .m64n136k16 | .m64n240k32| .m64n240k256 | .m64n136k32 |
| .m64n144k8 | .m64n144k16 | .m64n256k32| .m64n256k256 | .m64n144k32 |
| .m64n152k8 | .m64n152k16 | | | .m64n152k32 |
| .m64n160k8 | .m64n160k16 | | | .m64n160k32 |
| .m64n168k8 | .m64n168k16 | | | .m64n168k32 |
| .m64n176k8 | .m64n176k16 | | | .m64n176k32 |
| .m64n184k8 | .m64n184k16 | | | .m64n184k32 |
| .m64n192k8 | .m64n192k16 | | | .m64n192k32 |
| .m64n200k8 | .m64n200k16 | | | .m64n200k32 |
| .m64n208k8 | .m64n208k16 | | | .m64n208k32 |
| .m64n216k8 | .m64n216k16 | | | .m64n216k32 |
| .m64n224k8 | .m64n224k16 | | | .m64n224k32 |
| .m64n232k8 | .m64n232k16 | | | .m64n232k32 |
| .m64n240k8 | .m64n240k16 | | | .m64n240k32 |
| .m64n248k8 | .m64n248k16 | | | .m64n248k32 |
| .m64n256k8 | .m64n256k16 | | | .m64n256k32 |
|--------------|--------------|------------|--------------|---------------|
For more information, see PTX ISA
Interfaces: BasicPtxBuilderInterface
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
shape | ::mlir::NVVM::MMAShapeAttr | Attribute for MMA operation shape. |
typeA | ::mlir::NVVM::WGMMATypesAttr | NVVM WGMMA typesEnum cases:
|
typeB | ::mlir::NVVM::WGMMATypesAttr | NVVM WGMMA typesEnum cases:
|
typeD | ::mlir::NVVM::WGMMATypesAttr | NVVM WGMMA typesEnum cases:
|
scaleD | ::mlir::NVVM::WGMMAScaleOutAttr | WGMMA input predicateEnum cases:
|
scaleA | ::mlir::NVVM::WGMMAScaleInAttr | WGMMA overflow optionsEnum cases:
|
scaleB | ::mlir::NVVM::WGMMAScaleInAttr | WGMMA overflow optionsEnum cases:
|
layoutA | ::mlir::NVVM::MMALayoutAttr | NVVM MMA layoutEnum cases:
|
layoutB | ::mlir::NVVM::MMALayoutAttr | NVVM MMA layoutEnum cases:
|
satfinite | ::mlir::NVVM::MMAIntOverflowAttr | MMA overflow optionsEnum cases:
|
Operands: ¶
Operand | Description |
---|---|
inouts | LLVM structure type |
descriptorA | 64-bit signless integer |
descriptorB | 64-bit signless integer |
Results: ¶
Result | Description |
---|---|
results | LLVM structure type |
nvvm.wgmma.wait.group.sync.aligned
(NVVM::WgmmaWaitGroupSyncOp) ¶
Syntax:
operation ::= `nvvm.wgmma.wait.group.sync.aligned` attr-dict $group
Signal the completion of a preceding warpgroup operation.
For more information, see PTX ISA
Interfaces: BasicPtxBuilderInterface
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
group | ::mlir::IntegerAttr | 32-bit signless integer attribute |
nvvm.wmma.load
(NVVM::WMMALoadOp) ¶
Warp synchronous matrix load
Syntax:
operation ::= `nvvm.wmma.load` $ptr `,` $stride attr-dict `:` functional-type($ptr, $res)
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
m | ::mlir::IntegerAttr | 32-bit signless integer attribute |
n | ::mlir::IntegerAttr | 32-bit signless integer attribute |
k | ::mlir::IntegerAttr | 32-bit signless integer attribute |
layout | ::mlir::NVVM::MMALayoutAttr | NVVM MMA layoutEnum cases:
|
eltype | ::mlir::NVVM::MMATypesAttr | NVVM MMA typesEnum cases:
|
frag | ::mlir::NVVM::MMAFragAttr | NVVM MMA frag typeEnum cases:
|
Operands: ¶
Operand | Description |
---|---|
ptr | LLVM pointer type |
stride | 32-bit signless integer |
Results: ¶
Result | Description |
---|---|
res | LLVM structure type |
nvvm.wmma.mma
(NVVM::WMMAMmaOp) ¶
Warp synchronous matrix-multiply accumulate using tensor cores.
Syntax:
operation ::= `nvvm.wmma.mma` $args attr-dict `:` functional-type($args, $res)
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
m | ::mlir::IntegerAttr | 32-bit signless integer attribute |
n | ::mlir::IntegerAttr | 32-bit signless integer attribute |
k | ::mlir::IntegerAttr | 32-bit signless integer attribute |
layoutA | ::mlir::NVVM::MMALayoutAttr | NVVM MMA layoutEnum cases:
|
layoutB | ::mlir::NVVM::MMALayoutAttr | NVVM MMA layoutEnum cases:
|
eltypeA | ::mlir::NVVM::MMATypesAttr | NVVM MMA typesEnum cases:
|
eltypeB | ::mlir::NVVM::MMATypesAttr | NVVM MMA typesEnum cases:
|
Operands: ¶
Operand | Description |
---|---|
args | variadic of LLVM dialect-compatible type |
Results: ¶
Result | Description |
---|---|
res | LLVM structure type |
nvvm.wmma.store
(NVVM::WMMAStoreOp) ¶
Warp synchronous matrix store
Syntax:
operation ::= `nvvm.wmma.store` $ptr `,` $stride `,` $args attr-dict `:` qualified(type($ptr)) `,`
type($args)
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
m | ::mlir::IntegerAttr | 32-bit signless integer attribute |
n | ::mlir::IntegerAttr | 32-bit signless integer attribute |
k | ::mlir::IntegerAttr | 32-bit signless integer attribute |
layout | ::mlir::NVVM::MMALayoutAttr | NVVM MMA layoutEnum cases:
|
eltype | ::mlir::NVVM::MMATypesAttr | NVVM MMA typesEnum cases:
|
Operands: ¶
Operand | Description |
---|---|
ptr | LLVM pointer type |
args | variadic of LLVM dialect-compatible type |
stride | 32-bit signless integer |
Attributes ¶
LoadCacheModifierKindAttr ¶
NVVM load cache modifier kind
Syntax:
#nvvm.load_cache_modifier<
::mlir::NVVM::LoadCacheModifierKind # value
>
Enum attribute of the different kinds of cache operators for load instructions.
For more information, see PTX ISA
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::LoadCacheModifierKind | an enum of type LoadCacheModifierKind |
MMAB1OpAttr ¶
MMA binary operations
Syntax:
#nvvm.mma_b1op<
::mlir::NVVM::MMAB1Op # value
>
Enum cases:
- none (
none
) - xor_popc (
xor_popc
) - and_popc (
and_popc
)
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::MMAB1Op | an enum of type MMAB1Op |
MMAFragAttr ¶
NVVM MMA frag type
Syntax:
#nvvm.mma_frag<
::mlir::NVVM::MMAFrag # value
>
Enum cases:
- a (
a
) - b (
b
) - c (
c
)
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::MMAFrag | an enum of type MMAFrag |
MMAIntOverflowAttr ¶
MMA overflow options
Syntax:
#nvvm.mma_int_overflow<
::mlir::NVVM::MMAIntOverflow # value
>
Enum cases:
- satfinite (
satfinite
) - wrapped (
wrapped
)
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::MMAIntOverflow | an enum of type MMAIntOverflow |
MMALayoutAttr ¶
NVVM MMA layout
Syntax:
#nvvm.mma_layout<
::mlir::NVVM::MMALayout # value
>
Enum cases:
- row (
row
) - col (
col
)
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::MMALayout | an enum of type MMALayout |
MMATypesAttr ¶
NVVM MMA types
Syntax:
#nvvm.mma_type<
::mlir::NVVM::MMATypes # value
>
Enum cases:
- f16 (
f16
) - f32 (
f32
) - tf32 (
tf32
) - bf16 (
bf16
) - s8 (
s8
) - u8 (
u8
) - s32 (
s32
) - s4 (
s4
) - u4 (
u4
) - b1 (
b1
) - f64 (
f64
)
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::MMATypes | an enum of type MMATypes |
MemScopeKindAttr ¶
NVVM Memory Scope kind
Syntax:
#nvvm.mem_scope<
::mlir::NVVM::MemScopeKind # value
>
Enum cases:
- cta (
CTA
) - cluster (
CLUSTER
) - gpu (
GPU
) - sys (
SYS
)
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::MemScopeKind | an enum of type MemScopeKind |
MMAShapeAttr ¶
Attribute for MMA operation shape.
Syntax:
#nvvm.shape<
int, # m
int, # n
int # k
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
m | int | |
n | int | |
k | int |
NVVMTargetAttr ¶
Syntax:
#nvvm.target<
int, # O
::llvm::StringRef, # triple
::llvm::StringRef, # chip
::llvm::StringRef, # features
DictionaryAttr, # flags
ArrayAttr # link
>
GPU target attribute for controlling compilation of NVIDIA targets. All parameters decay into default values if not present.
Examples:
- Target with default values.
gpu.module @mymodule [#nvvm.target] attributes {...} {
...
}
- Target with
sm_90
chip and fast math.
gpu.module @mymodule [#nvvm.target<chip = "sm_90", flags = {fast}>] {
...
}
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
O | int | Optimization level to apply. |
triple | ::llvm::StringRef | Target triple. |
chip | ::llvm::StringRef | Target chip. |
features | ::llvm::StringRef | Target chip features. |
flags | DictionaryAttr | Target specific flags. |
link | ArrayAttr | Files to link to the LLVM module. |
ProxyKindAttr ¶
Proxy kind
Syntax:
#nvvm.proxy_kind<
::mlir::NVVM::ProxyKind # value
>
Enum cases:
- alias (
alias
) - async (
async
) - async.global (
async_global
) - async.shared (
async_shared
) - tensormap (
TENSORMAP
) - generic (
GENERIC
)
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::ProxyKind | an enum of type ProxyKind |
ReduxKindAttr ¶
NVVM redux kind
Syntax:
#nvvm.redux_kind<
::mlir::NVVM::ReduxKind # value
>
Enum cases:
- add (
ADD
) - and (
AND
) - max (
MAX
) - min (
MIN
) - or (
OR
) - umax (
UMAX
) - umin (
UMIN
) - xor (
XOR
)
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::ReduxKind | an enum of type ReduxKind |
SetMaxRegisterActionAttr ¶
NVVM set max register action
Syntax:
#nvvm.action<
::mlir::NVVM::SetMaxRegisterAction # value
>
Enum cases:
- decrease (
decrease
) - increase (
increase
)
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::SetMaxRegisterAction | an enum of type SetMaxRegisterAction |
SharedSpaceAttr ¶
Shared memory space
Syntax:
#nvvm.shared_space<
::mlir::NVVM::SharedSpace # value
>
Enum cases:
- cta (
shared_cta
) - cluster (
shared_cluster
)
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::SharedSpace | an enum of type SharedSpace |
ShflKindAttr ¶
NVVM shuffle kind
Syntax:
#nvvm.shfl_kind<
::mlir::NVVM::ShflKind # value
>
Enum cases:
- bfly (
bfly
) - up (
up
) - down (
down
) - idx (
idx
)
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::ShflKind | an enum of type ShflKind |
WGMMAScaleInAttr ¶
WGMMA overflow options
Syntax:
#nvvm.wgmma_scale_in<
::mlir::NVVM::WGMMAScaleIn # value
>
Enum cases:
- one (
one
) - neg (
neg
)
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::WGMMAScaleIn | an enum of type WGMMAScaleIn |
WGMMAScaleOutAttr ¶
WGMMA input predicate
Syntax:
#nvvm.wgmma_scale_out<
::mlir::NVVM::WGMMAScaleOut # value
>
Enum cases:
- zero (
zero
) - one (
one
)
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::WGMMAScaleOut | an enum of type WGMMAScaleOut |
WGMMATypesAttr ¶
NVVM WGMMA types
Syntax:
#nvvm.wgmma_type<
::mlir::NVVM::WGMMATypes # value
>
Enum cases:
- f16 (
f16
) - tf32 (
tf32
) - u8 (
u8
) - s8 (
s8
) - b1 (
b1
) - bf16 (
bf16
) - e4m3 (
e4m3
) - e5m2 (
e5m2
) - f32 (
f32
) - s32 (
s32
)
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::WGMMATypes | an enum of type WGMMATypes |
Enums ¶
LoadCacheModifierKind ¶
NVVM load cache modifier kind
Cases: ¶
Symbol | Value | String |
---|---|---|
CA | 0 | ca |
CG | 1 | cg |
CS | 2 | cs |
LU | 3 | lu |
CV | 4 | cv |
MMAB1Op ¶
MMA binary operations
Cases: ¶
Symbol | Value | String |
---|---|---|
none | 0 | none |
xor_popc | 1 | xor_popc |
and_popc | 2 | and_popc |
MMAFrag ¶
NVVM MMA frag type
Cases: ¶
Symbol | Value | String |
---|---|---|
a | 0 | a |
b | 1 | b |
c | 2 | c |
MMAIntOverflow ¶
MMA overflow options
Cases: ¶
Symbol | Value | String |
---|---|---|
satfinite | 1 | satfinite |
wrapped | 0 | wrapped |
MMALayout ¶
NVVM MMA layout
Cases: ¶
Symbol | Value | String |
---|---|---|
row | 0 | row |
col | 1 | col |
MMATypes ¶
NVVM MMA types
Cases: ¶
Symbol | Value | String |
---|---|---|
f16 | 0 | f16 |
f32 | 1 | f32 |
tf32 | 2 | tf32 |
bf16 | 9 | bf16 |
s8 | 4 | s8 |
u8 | 3 | u8 |
s32 | 5 | s32 |
s4 | 8 | s4 |
u4 | 7 | u4 |
b1 | 6 | b1 |
f64 | 10 | f64 |
MemScopeKind ¶
NVVM Memory Scope kind
Cases: ¶
Symbol | Value | String |
---|---|---|
CTA | 0 | cta |
CLUSTER | 1 | cluster |
GPU | 2 | gpu |
SYS | 3 | sys |
ProxyKind ¶
Proxy kind
Cases: ¶
Symbol | Value | String |
---|---|---|
alias | 0 | alias |
async | 1 | async |
async_global | 2 | async.global |
async_shared | 3 | async.shared |
TENSORMAP | 4 | tensormap |
GENERIC | 5 | generic |
ReduxKind ¶
NVVM redux kind
Cases: ¶
Symbol | Value | String |
---|---|---|
ADD | 1 | add |
AND | 2 | and |
MAX | 3 | max |
MIN | 4 | min |
OR | 5 | or |
UMAX | 6 | umax |
UMIN | 7 | umin |
XOR | 8 | xor |
SetMaxRegisterAction ¶
NVVM set max register action
Cases: ¶
Symbol | Value | String |
---|---|---|
decrease | 1 | decrease |
increase | 0 | increase |
SharedSpace ¶
Shared memory space
Cases: ¶
Symbol | Value | String |
---|---|---|
shared_cta | 0 | cta |
shared_cluster | 1 | cluster |
ShflKind ¶
NVVM shuffle kind
Cases: ¶
Symbol | Value | String |
---|---|---|
bfly | 0 | bfly |
up | 1 | up |
down | 2 | down |
idx | 3 | idx |
WGMMAScaleIn ¶
WGMMA overflow options
Cases: ¶
Symbol | Value | String |
---|---|---|
one | 1 | one |
neg | -1 | neg |
WGMMAScaleOut ¶
WGMMA input predicate
Cases: ¶
Symbol | Value | String |
---|---|---|
zero | 0 | zero |
one | 1 | one |
WGMMATypes ¶
NVVM WGMMA types
Cases: ¶
Symbol | Value | String |
---|---|---|
f16 | 0 | f16 |
tf32 | 1 | tf32 |
u8 | 2 | u8 |
s8 | 3 | s8 |
b1 | 4 | b1 |
bf16 | 5 | bf16 |
e4m3 | 6 | e4m3 |
e5m2 | 7 | e5m2 |
f32 | 8 | f32 |
s32 | 9 | s32 |