'nvvm' Dialect
The NVVM dialect that models NVIDIA’s public ISA
The NVVM dialect is MLIR’s LLVM-IR-based, NVIDIA-specific backend dialect. It models NVVM intrinsics and public ISA functionality and introduces NVIDIA extensions to the MLIR/LLVM type system and address spaces (e.g., global, shared, and cluster memory), enabling faithful lowering of GPU kernels to the NVPTX toolchain. While a NVVM op usually maps to a single LLVM IR intrinsic, the NVVM dialect uses type polymorphism and other attributes so that a single NVVM op can map to different LLVM intrinsics.
Scope and capabilities: The dialect covers core GPU features such as
thread/block builtins, barriers and atomics, warp-level collectives (e.g.,
shuffle/vote), matrix/tensor core operations (e.g., mma.sync
, wgmma
),
tensor memory accelerator (TMA) operations, asynchronous copies (cp.async
,
bulk/tensor variants) with memory barriers, cache and prefetch controls, and
NVVM-specific attributes and enums (e.g., FP rounding modes, memory scopes,
and MMA types/layouts).
Non-goals: NVVM is not a place for convenience or “wrapper” ops. It is
not intended to introduce high-level ops that expand into multiple unrelated
NVVM intrinsics or that lower to no intrinsic at all. Such abstractions belong
in higher-level dialects (e.g., nvgpu
, gpu
, or project-specific dialects).
The design intent is a thin, predictable, low-level surface with
near-mechanical lowering to NVVM/LLVM IR.
Placement in the lowering pipeline: NVVM sits below target-agnostic
dialects like gpu
and NVIDIA’s nvgpu
. Typical pipelines convert
gpu
/nvgpu
ops into NVVM using -convert-gpu-to-nvvm
and
-convert-nvgpu-to-nvvm
, then translate into LLVM for final code
generation via NVPTX backend.
Target configuration and serialization: NVVM provides a #nvvm.target
attribute to describe the GPU target (SM, features, and flags). In
conjunction with gpu
serialization (e.g., gpu-module-to-binary
), this
enables producing architecture-specific GPU binaries (such as CUBIN) from
nested GPU modules.
Inline PTX: When an intrinsic is unavailable or a performance-critical
sequence must be expressed directly, NVVM provides an nvvm.inline_ptx
op to
embed PTX inline as a last-resort escape hatch, with explicit operands and
results.
Operations ¶
nvvm.bar.warp.sync
(NVVM::SyncWarpOp) ¶
Warp Barrier Synchronization Op
Syntax:
operation ::= `nvvm.bar.warp.sync` $mask attr-dict `:` type($mask)
The nvvm.bar.warp.sync
operation performs barrier synchronization for threads
within a warp.
This operation causes the executing thread to wait until all threads corresponding
to the mask
operand have executed a bar.warp.sync
with the same mask value
before resuming execution.
The mask
operand specifies the threads participating in the barrier, where each
bit position corresponds to the thread’s lane ID within the warp. Only threads with
their corresponding bit set in the mask participate in the barrier synchronization.
Important constraints:
- The behavior is undefined if the executing thread is not included in the mask (i.e., the bit corresponding to the thread’s lane ID is not set)
- For compute capability sm_6x or below, all threads in the mask must execute
the same
bar.warp.sync
instruction in convergence
This operation also guarantees memory ordering among participating threads.
Threads within the warp that wish to communicate via memory can store to memory,
execute bar.warp.sync
, and then safely read values stored by other threads
in the warp.
For more information, see PTX ISA
Operands: ¶
Operand | Description |
---|---|
mask | LLVM dialect-compatible type |
nvvm.barrier
(NVVM::BarrierOp) ¶
CTA Barrier Synchronization Op
Syntax:
operation ::= `nvvm.barrier` (`id` `=` $barrierId^)? (`number_of_threads` `=` $numberOfThreads^)? attr-dict
The nvvm.barrier
operation performs barrier synchronization and communication
within a CTA (Cooperative Thread Array). It causes executing threads to wait for
all non-exited threads participating in the barrier to arrive.
The operation takes two optional operands:
barrierId
: Specifies a logical barrier resource with value 0 through 15. Each CTA instance has sixteen barriers numbered 0..15. Defaults to 0 if not specified.numberOfThreads
: Specifies the number of threads participating in the barrier. When specified, the value must be a multiple of the warp size. If not specified, all threads in the CTA participate in the barrier.
The barrier operation guarantees that when the barrier completes, prior memory accesses requested by participating threads are performed relative to all threads participating in the barrier. It also ensures that no new memory access is requested by participating threads before the barrier completes.
When a barrier completes, the waiting threads are restarted without delay, and the barrier is reinitialized so that it can be immediately reused.
This operation generates an aligned barrier, indicating that all threads in the CTA will execute the same barrier instruction. Behavior is undefined if all threads in the CTA do not reach this instruction.
For more information, see PTX ISA
Traits: AttrSizedOperandSegments
Operands: ¶
Operand | Description |
---|---|
barrierId | 32-bit signless integer |
numberOfThreads | 32-bit signless integer |
nvvm.barrier.arrive
(NVVM::BarrierArriveOp) ¶
Syntax:
operation ::= `nvvm.barrier.arrive` (`id` `=` $barrierId^)? `number_of_threads` `=` $numberOfThreads attr-dict
Thread that executes this op announces their arrival at the barrier with given id and continue their execution.
The default barrier id is 0 that is similar to nvvm.barrier
Op. When
barrierId
is not present, the default barrier id is used.
For more information, see PTX ISA
Interfaces: BasicPtxBuilderInterface
Operands: ¶
Operand | Description |
---|---|
barrierId | 32-bit signless integer |
numberOfThreads | 32-bit signless integer |
nvvm.barrier0
(NVVM::Barrier0Op) ¶
CTA Barrier Synchronization Op (Barrier ID 0)
Syntax:
operation ::= `nvvm.barrier0` attr-dict
The nvvm.barrier0
operation is a convenience operation that performs barrier
synchronization and communication within a CTA (Cooperative Thread Array) using
barrier ID 0. It is functionally equivalent to nvvm.barrier
or nvvm.barrier id=0
.
For more information, see PTX ISA
nvvm.breakpoint
(NVVM::Breakpoint) ¶
Breakpoint Op
Syntax:
operation ::= `nvvm.breakpoint` attr-dict
Breakpoint suspends execution of the program for debugging. For more information, see PTX ISA
nvvm.cluster.arrive
(NVVM::ClusterArriveOp) ¶
Cluster Barrier Arrive Op
Syntax:
operation ::= `nvvm.cluster.arrive` attr-dict
The cluster.arrive
can be used by the threads within the cluster for synchronization and
communication. The cluster.arrive
instruction marks the warps’ arrival at the barrier
without causing the executing thread to wait for other participating threads.
The aligned
attribute, when provided, generates the .aligned version of the PTX instruction.
For more information, see PTX ISA
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
aligned | ::mlir::UnitAttr | unit attribute |
nvvm.cluster.arrive.relaxed
(NVVM::ClusterArriveRelaxedOp) ¶
Cluster Barrier Relaxed Arrive Op
Syntax:
operation ::= `nvvm.cluster.arrive.relaxed` attr-dict
The cluster.arrive
can be used by the threads within the cluster for synchronization and
communication. The cluster.arrive
instruction marks the warps’ arrival at the barrier
without causing the executing thread to wait for other participating threads.
The aligned
attribute, when provided, generates the .aligned version of the PTX instruction.
The .relaxed qualifier on cluster.arrive
specifies that there are no memory
ordering and visibility guarantees provided for the memory accesses performed prior to
cluster.arrive
.
For more information, see PTX ISA
Traits: NVVMRequiresSM<90>
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
aligned | ::mlir::UnitAttr | unit attribute |
nvvm.cluster.wait
(NVVM::ClusterWaitOp) ¶
Cluster Barrier Wait Op
Syntax:
operation ::= `nvvm.cluster.wait` attr-dict
The cluster.wait
causes the executing thread to wait for all non-exited threads
of the cluster to perform cluster.arrive
. The aligned
attribute, when provided,
generates the .aligned version of the PTX instruction.
For more information, see PTX ISA
Traits: NVVMRequiresSM<90>
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
aligned | ::mlir::UnitAttr | unit attribute |
nvvm.convert.bf16x2.to.f8x2
(NVVM::ConvertBF16x2ToF8x2Op) ¶
Convert a pair of bf16 inputs to f8x2
Syntax:
operation ::= `nvvm.convert.bf16x2.to.f8x2` $type $a attr-dict `:` type($a) `->` type($dst)
This Op converts the given bf16 inputs in a bf16x2 vector to the specified
f8 type.
The result dst
is represented as an i16 type or as a vector
of two i8 types.
If dst
is returned as an i16 type, the converted values from a
are packed such that the value converted from the first element of a
is stored in the upper 8 bits of dst
and the value converted from the
second element of a
is stored in the lower 8 bits of dst
.
If dst
is returned as a vector type, each converted value is stored as an
i8 element in the vector.
The rnd
and sat
attributes specify the rounding and saturation modes
respectively.
For more information, see PTX ISA
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
type | ::mlir::NVVM::ConvertFP8TypeAttr | NVVM ConvertFP8Type kind |
rnd | ::mlir::NVVM::FPRoundingModeAttr | NVVM FPRoundingMode kind |
sat | ::mlir::NVVM::SaturationModeAttr | NVVM SaturationMode kind |
Operands: ¶
Operand | Description |
---|---|
a | vector of bfloat16 type values of length 2 |
Results: ¶
Result | Description |
---|---|
dst | 16-bit signless integer or vector of 8-bit signless integer values of length 2 |
nvvm.convert.f16x2.to.f8x2
(NVVM::ConvertF16x2ToF8x2Op) ¶
Convert an f16x2 input to f8x2
Syntax:
operation ::= `nvvm.convert.f16x2.to.f8x2` $type $a attr-dict `:` type($a) `->` type($dst)
This Op converts the given f16 inputs in an f16x2 vector to the specified
f8 type.
The result dst
is represented as an i16 type or as a vector
of two i8 types.
If dst
is returned as an i16 type, the converted values from a
are packed such that the value converted from the first element of a
is stored in the upper 8 bits of dst
and the value converted from the
second element of a
is stored in the lower 8 bits of dst
.
If dst
is returned as a vector type, each converted value is stored as an
i8 element in the vector.
The relu
attribute, when set, lowers to the ‘.relu’ variant of
the cvt instruction.
For more information, see PTX ISA
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
type | ::mlir::NVVM::ConvertFP8TypeAttr | NVVM ConvertFP8Type kind |
relu | ::mlir::BoolAttr | bool attribute |
Operands: ¶
Operand | Description |
---|---|
a | vector of 16-bit float values of length 2 |
Results: ¶
Result | Description |
---|---|
dst | 16-bit signless integer or vector of 8-bit signless integer values of length 2 |
nvvm.convert.f32x2.to.f6x2
(NVVM::ConvertF32x2ToF6x2Op) ¶
Convert a pair of float inputs to f6x2
Syntax:
operation ::= `nvvm.convert.f32x2.to.f6x2` $type $a `,` $b attr-dict `:` type($dst)
This Op converts each of the given float inputs to the specified fp6 type.
The result dst
is represented either as an i16 type or as a vector
of two i8 types.
If dst
is returned as an i16 type, the converted values are packed such
that the value converted from a
is stored in the upper 8 bits of dst
with 2 MSB bits padded with zeros and the value converted from b
is
stored in the lower 8 bits of dst
with 2 MSB bits padded with zeros.
If dst
is returned as a vector type, each converted value is stored as an
i8 element in the vector.
The relu
attribute, when set, lowers to the ‘.relu’ variant of
the cvt instruction.
For more information, see PTX ISA
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
type | ::mlir::NVVM::ConvertFP6TypeAttr | NVVM ConvertFP6Type kind |
relu | ::mlir::BoolAttr | bool attribute |
Operands: ¶
Operand | Description |
---|---|
a | 32-bit float |
b | 32-bit float |
Results: ¶
Result | Description |
---|---|
dst | 16-bit signless integer or vector of 8-bit signless integer values of length 2 |
nvvm.convert.f32x2.to.f8x2
(NVVM::ConvertF32x2ToF8x2Op) ¶
Convert a pair of float inputs to f8x2
Syntax:
operation ::= `nvvm.convert.f32x2.to.f8x2` $type $a `,` $b attr-dict `:` type($dst)
This Op converts each of the given float inputs to the specified fp8 type.
The result dst
is represented as an i16 type or as a vector
of two i8 types.
If dst
is returned as an i16 type, the converted values are packed such
that the value converted from a
is stored in the upper 8 bits of dst
and the value converted from b
is stored in the lower 8 bits of dst
.
If dst
is returned as a vector type, each converted value is stored as an
i8 element in the vector.
The rnd
and sat
attributes specify the rounding and saturation modes respectively.
The relu
attribute, when set, lowers to the ‘.relu’ variant of
the cvt instruction.
For more information, see PTX ISA
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
type | ::mlir::NVVM::ConvertFP8TypeAttr | NVVM ConvertFP8Type kind |
rnd | ::mlir::NVVM::FPRoundingModeAttr | NVVM FPRoundingMode kind |
sat | ::mlir::NVVM::SaturationModeAttr | NVVM SaturationMode kind |
relu | ::mlir::BoolAttr | bool attribute |
Operands: ¶
Operand | Description |
---|---|
a | 32-bit float |
b | 32-bit float |
Results: ¶
Result | Description |
---|---|
dst | 16-bit signless integer or vector of 8-bit signless integer values of length 2 |
nvvm.convert.float.to.tf32
(NVVM::ConvertFloatToTF32Op) ¶
Convert the given float input to TF32
Syntax:
operation ::= `nvvm.convert.float.to.tf32` $src attr-dict
This Op converts the given f32 input to tf32.
The result res
is represented as an i32 type.
The relu
attribute, when set, lowers to the ‘.relu’ variant of
the cvt instruction. The rnd
and sat
attributes specify the
the rounding and saturation modes respectively.
For more information, see PTX ISA
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
rnd | ::mlir::NVVM::FPRoundingModeAttr | NVVM FPRoundingMode kind |
sat | ::mlir::NVVM::SaturationModeAttr | NVVM SaturationMode kind |
relu | ::mlir::BoolAttr | bool attribute |
Operands: ¶
Operand | Description |
---|---|
src | 32-bit float |
Results: ¶
Result | Description |
---|---|
res | 32-bit signless integer |
nvvm.cp.async.bulk.commit.group
(NVVM::CpAsyncBulkCommitGroupOp) ¶
Syntax:
operation ::= `nvvm.cp.async.bulk.commit.group` attr-dict
This Op commits all prior initiated but uncommitted cp.async.bulk instructions into a cp.async.bulk-group.
For more information, see PTX ISA
nvvm.cp.async.bulk.global.shared.cta
(NVVM::CpAsyncBulkSharedCTAToGlobalOp) ¶
Async bulk copy from Shared CTA memory to Global memory
Syntax:
operation ::= `nvvm.cp.async.bulk.global.shared.cta` $dstMem `,` $srcMem `,` $size
(`l2_cache_hint` `=` $l2CacheHint^ )?
(`byte_mask` `=` $byteMask^ )?
attr-dict `:` type($dstMem) `,` type($srcMem)
Initiates an asynchronous copy operation from Shared CTA memory to
global memory. The 32-bit operand size
specifies the amount of
memory to be copied, in terms of number of bytes. size
must be a
multiple of 16. The l2CacheHint
operand is optional, and it is used
to specify cache eviction policy that may be used during the memory
access. The byteMask
operand is optional. The i-th bit in the 16-bit
wide byteMask
specifies whether the i-th byte of each 16-byte wide
chunk of source data is copied to the destination. If the bit is set,
the byte is copied.
Example:
nvvm.cp.async.bulk.global.shared.cta %dst, %src, %size
: !llvm.ptr<1>, !llvm.ptr<3>
// with l2_cache_hint
nvvm.cp.async.bulk.global.shared.cta %dst, %src, %size l2_cache_hint = %ch
: !llvm.ptr<1>, !llvm.ptr<3>
// with byte_mask
nvvm.cp.async.bulk.global.shared.cta %dst, %src, %size byte_mask = %mask
: !llvm.ptr<1>, !llvm.ptr<3>
// with both l2_cache_hint and byte_mask
nvvm.cp.async.bulk.global.shared.cta %dst, %src, %size l2_cache_hint = %ch byte_mask = %mask
: !llvm.ptr<1>, !llvm.ptr<3>
For more information, see PTX ISA
Traits: AttrSizedOperandSegments
Operands: ¶
Operand | Description |
---|---|
dstMem | LLVM pointer in address space 1 |
srcMem | LLVM pointer in address space 3 |
size | 32-bit signless integer |
l2CacheHint | 64-bit signless integer |
byteMask | 16-bit signless integer |
nvvm.cp.async.bulk.prefetch
(NVVM::CpAsyncBulkPrefetchOp) ¶
Async bulk prefetch from global memory to L2 cache
Syntax:
operation ::= `nvvm.cp.async.bulk.prefetch` $srcMem `,` $size (`l2_cache_hint` `=` $l2CacheHint^ )?
attr-dict `:` type($srcMem)
Initiates an asynchronous prefetch of data from the location
specified by srcMem
to the L2 cache.
The l2CacheHint
operand is optional, and it is used to specify cache
eviction policy that may be used during the memory access.
Example:
nvvm.cp.async.bulk.prefetch %src, %size : !llvm.ptr<1>
// with l2_cache_hint
nvvm.cp.async.bulk.prefetch %src, %size l2_cache_hint = %ch : !llvm.ptr<1>
For more information, see PTX ISA
Operands: ¶
Operand | Description |
---|---|
srcMem | LLVM pointer in address space 1 |
size | 32-bit signless integer |
l2CacheHint | 64-bit signless integer |
nvvm.cp.async.bulk.shared.cluster.global
(NVVM::CpAsyncBulkGlobalToSharedClusterOp) ¶
Async bulk copy from global memory to Shared cluster memory
Syntax:
operation ::= `nvvm.cp.async.bulk.shared.cluster.global` $dstMem `,` $srcMem `,` $mbar `,` $size
(`multicast_mask` `=` $multicastMask^ )?
(`l2_cache_hint` `=` $l2CacheHint^ )?
attr-dict `:` type($dstMem) `,` type($srcMem)
Initiates an asynchronous copy operation from global memory to cluster’s shared memory.
The multicastMask
operand is optional. When it is present, the Op copies
data from global memory to shared memory of multiple CTAs in the cluster.
Operand multicastMask
specifies the destination CTAs in the cluster such
that each bit position in the 16-bit multicastMask
operand corresponds to
the nvvm.read.ptx.sreg.ctaid
of the destination CTA.
The l2CacheHint
operand is optional, and it is used to specify cache
eviction policy that may be used during the memory access.
For more information, see PTX ISA
Traits: AttrSizedOperandSegments
Operands: ¶
Operand | Description |
---|---|
dstMem | LLVM pointer in address space 7 |
srcMem | LLVM pointer in address space 1 |
mbar | LLVM pointer in address space 3 |
size | 32-bit signless integer |
multicastMask | 16-bit signless integer |
l2CacheHint | 64-bit signless integer |
nvvm.cp.async.bulk.shared.cluster.shared.cta
(NVVM::CpAsyncBulkSharedCTAToSharedClusterOp) ¶
Async bulk copy from Shared CTA memory to Shared cluster memory
Syntax:
operation ::= `nvvm.cp.async.bulk.shared.cluster.shared.cta` $dstMem `,` $srcMem `,` $mbar `,` $size
attr-dict `:` type($dstMem) `,` type($srcMem)
Initiates an asynchronous copy operation from Shared CTA memory to Shared cluster memory.
For more information, see PTX ISA
Operands: ¶
Operand | Description |
---|---|
dstMem | LLVM pointer in address space 7 |
srcMem | LLVM pointer in address space 3 |
mbar | LLVM pointer in address space 3 |
size | 32-bit signless integer |
nvvm.cp.async.bulk.tensor.global.shared.cta
(NVVM::CpAsyncBulkTensorSharedCTAToGlobalOp) ¶
Syntax:
operation ::= `nvvm.cp.async.bulk.tensor.global.shared.cta` $tmaDescriptor `,`
$srcMem `,`
`box` `[`$coordinates `]`
(`l2_cache_hint` `=` $l2CacheHint^ )?
(`,` `predicate` `=` $predicate^)?
attr-dict `:` type($tmaDescriptor) `,` type($srcMem)
Initiates an asynchronous copy of the tensor data from shared::cta
memory to global memory. This Op supports all the store modes specified in
TMAStoreMode
.
The l2CacheHint
operand is optional, and it is used to specify cache
eviction policy that may be used during the memory access.
For more information, see PTX ISA
Traits: AttrSizedOperandSegments
Interfaces: BasicPtxBuilderInterface
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
mode | ::mlir::NVVM::TMAStoreModeAttr | NVVM TMA Store Mode |
Operands: ¶
Operand | Description |
---|---|
tmaDescriptor | LLVM pointer in address space 0 |
srcMem | LLVM pointer in address space 3 |
coordinates | variadic of 32-bit signless integer |
l2CacheHint | 64-bit signless integer |
predicate | 1-bit signless integer |
nvvm.cp.async.bulk.tensor.prefetch
(NVVM::CpAsyncBulkTensorPrefetchOp) ¶
Syntax:
operation ::= `nvvm.cp.async.bulk.tensor.prefetch` $tmaDescriptor `,`
`box` `[`$coordinates `]`
(`im2col` `[` $im2colOffsets^ `]` )?
(`l2_cache_hint` `=` $l2CacheHint^ )?
attr-dict `:` type($tmaDescriptor)
Initiates an asynchronous prefetch operation on the tensor data from global
memory to L2 cache. This Op supports all the load modes specified in
TMALoadMode
.
The l2CacheHint
operand is optional, and it is used to specify cache
eviction policy that may be used during the memory access.
For more information, see PTX ISA
Traits: AttrSizedOperandSegments
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
mode | ::mlir::NVVM::TMALoadModeAttr | List of Load-Modes supported for TMA Tensor Ops
|
Operands: ¶
Operand | Description |
---|---|
tmaDescriptor | LLVM pointer in address space 0 |
coordinates | variadic of 32-bit signless integer |
im2colOffsets | variadic of 16-bit signless integer |
l2CacheHint | 64-bit signless integer |
nvvm.cp.async.bulk.tensor.reduce
(NVVM::CpAsyncBulkTensorReduceOp) ¶
Syntax:
operation ::= `nvvm.cp.async.bulk.tensor.reduce` $tmaDescriptor `,`
$srcMem `,`
`box` `[`$coordinates `]`
(`l2_cache_hint` `=` $l2CacheHint^ )?
attr-dict `:` type($tmaDescriptor) `,` type($srcMem)
Initiates an asynchronous reduction operation of tensor data in global memory with tensor data in shared memory.
The mode
attribute indicates whether the copy mode is tile or im2col.
The redOp
attribute specifies the reduction operations applied.
The supported reduction operations are:
{add, min, max, inc, dec, and, or, xor}
The l2CacheHint
operand is optional, and it is used to specify cache
eviction policy that may be used during the memory access.
For more information, see PTX ISA
Traits: AttrSizedOperandSegments
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
redKind | ::mlir::NVVM::TMAReduxKindAttr | NVVM TMA redux kind |
mode | ::mlir::NVVM::TMAStoreModeAttr | NVVM TMA Store Mode |
Operands: ¶
Operand | Description |
---|---|
tmaDescriptor | LLVM pointer type |
srcMem | LLVM pointer in address space 3 |
coordinates | variadic of 32-bit signless integer |
l2CacheHint | 64-bit signless integer |
nvvm.cp.async.bulk.tensor.shared.cluster.global
(NVVM::CpAsyncBulkTensorGlobalToSharedClusterOp) ¶
Syntax:
operation ::= `nvvm.cp.async.bulk.tensor.shared.cluster.global` $dstMem `,`
$tmaDescriptor `,`
$mbar `,`
`box` `[`$coordinates `]`
(`im2col` `[` $im2colOffsets^ `]` )?
(`multicast_mask` `=` $multicastMask^ )?
(`l2_cache_hint` `=` $l2CacheHint^ )?
(`predicate` `=` $predicate^)?
attr-dict `:` type($dstMem) `,` type($tmaDescriptor)
Initiates an asynchronous copy operation on the tensor data from global memory to shared memory.
The Op operates has two load modes:
Tiled Mode: It’s the default mode. The source multi-dimensional tensor layout is preserved at the destination.
Im2col Mode: This mode is used when
im2colOffsets
operands are present. the elements in the Bounding Box of the source tensor are rearranged into columns at the destination. In this mode, the tensor has to be at least 3-dimensional.
The multicastMask
operand is optional. When it is present, the Op copies
data from global memory to shared memory of multiple CTAs in the cluster.
Operand multicastMask
specifies the destination CTAs in the cluster such
that each bit position in the 16-bit multicastMask
operand corresponds to
the nvvm.read.ptx.sreg.ctaid
of the destination CTA.
The l2CacheHint
operand is optional, and it is used to specify cache
eviction policy that may be used during the memory access.
For more information, see PTX ISA
Traits: AttrSizedOperandSegments
, NVVMRequiresSM<90>
Interfaces: BasicPtxBuilderInterface
Operands: ¶
Operand | Description |
---|---|
dstMem | LLVM pointer in address space 3 |
tmaDescriptor | LLVM pointer type |
coordinates | variadic of 32-bit signless integer |
mbar | LLVM pointer in address space 3 |
im2colOffsets | variadic of 16-bit signless integer |
multicastMask | 16-bit signless integer |
l2CacheHint | 64-bit signless integer |
predicate | 1-bit signless integer |
nvvm.cp.async.bulk.wait_group
(NVVM::CpAsyncBulkWaitGroupOp) ¶
Syntax:
operation ::= `nvvm.cp.async.bulk.wait_group` $group attr-dict
Op waits for completion of the most recent bulk async-groups.
The $group
operand tells waiting has to be done until for $group or fewer
of the most recent bulk async-groups. If $group
is 0, the op wait until
all the most recent bulk async-groups have completed.
The $read
indicates that the waiting has to be done until all the bulk
async operations in the specified bulk async-group have completed reading
from their source locations.
For more information, see PTX ISA
Traits: NVVMRequiresSM<90>
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
group | ::mlir::IntegerAttr | 32-bit signless integer attribute whose minimum value is 0 |
read | ::mlir::UnitAttr | unit attribute |
nvvm.cp.async.commit.group
(NVVM::CpAsyncCommitGroupOp) ¶
Syntax:
operation ::= `nvvm.cp.async.commit.group` attr-dict
nvvm.cp.async.mbarrier.arrive
(NVVM::CpAsyncMBarrierArriveOp) ¶
NVVM Dialect Op for cp.async.mbarrier.arrive
Syntax:
operation ::= `nvvm.cp.async.mbarrier.arrive` $addr attr-dict `:` type(operands)
The cp.async.mbarrier.arrive
Op makes the mbarrier object track
all prior cp.async operations initiated by the executing thread.
The addr
operand specifies the address of the mbarrier object
in generic address space. The noinc
attr impacts how the
mbarrier’s state is updated.
For more information, see PTX ISA
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
noinc | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer type |
nvvm.cp.async.mbarrier.arrive.shared
(NVVM::CpAsyncMBarrierArriveSharedOp) ¶
NVVM Dialect Op for cp.async.mbarrier.arrive.shared
Syntax:
operation ::= `nvvm.cp.async.mbarrier.arrive.shared` $addr attr-dict `:` type(operands)
The cp.async.mbarrier.arrive.shared
Op makes the mbarrier object
track all prior cp.async operations initiated by the executing thread.
The addr
operand specifies the address of the mbarrier object in
shared memory. The noinc
attr impacts how the mbarrier’s state
is updated.
For more information, see PTX ISA
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
noinc | ::mlir::IntegerAttr | 1-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer in address space 3 |
nvvm.cp.async.shared.global
(NVVM::CpAsyncOp) ¶
Syntax:
operation ::= `nvvm.cp.async.shared.global` $dst `,` $src `,` $size `,` `cache` `=` $modifier (`,` $cpSize^)? attr-dict `:` type(operands)
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
size | ::mlir::IntegerAttr | 32-bit signless integer attribute |
modifier | ::mlir::NVVM::LoadCacheModifierKindAttr | NVVM load cache modifier kind
|
Operands: ¶
Operand | Description |
---|---|
dst | LLVM pointer in address space 3 |
src | LLVM pointer in address space 1 |
cpSize | LLVM dialect-compatible type |
nvvm.cp.async.wait.group
(NVVM::CpAsyncWaitGroupOp) ¶
Syntax:
operation ::= `nvvm.cp.async.wait.group` $n attr-dict
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
n | ::mlir::IntegerAttr | 32-bit signless integer attribute |
nvvm.dot.accumulate.2way
(NVVM::DotAccumulate2WayOp) ¶
Two-way 16-bit to 8-bit dot product-accumulate instruction
Syntax:
operation ::= `nvvm.dot.accumulate.2way` $a $a_type `,` $b $b_type `,` $c attr-dict `:` type($a) `,` type($b)
Performs a two-way 16-bit to 8-bit dot-product which is accumulated in a
32-bit result.
Operand a
is a vector of two 16-bit elements and operand b
a vector
of four 8-bit elements between which the dot product is computed.
The a_type
and b_type
attributes specify the type of the elements in a
and b
respectively.
If a_type
or b_type
is s
, then the elements in the corresponding
vector are sign-extended to 32-bit before the dot product is computed.
If a_type
or b_type
is u
, then the elements in the corresponding
vector are zero-extended to 32-bit instead.
The b_hi
boolean attribute specifies which two bytes of b
are used for
the dot product. If b_hi
is true, then the dot product is computed
between a
and elements at indices 2 and 3 of b
. If b_hi
is false,
then the dot product is computed between a
and elements at indices 0 and
1 of b
.
Operand c
is a 32-bit integer to which the result is accumulated. It is
treated as holding a signed integer if any of a_type
or b_type
is
signed.
For more information, see PTX ISA
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
a_type | ::mlir::NVVM::DotAccumulateTypeAttr | NVVM DotAccumulateType |
b_type | ::mlir::NVVM::DotAccumulateTypeAttr | NVVM DotAccumulateType |
b_hi | ::mlir::BoolAttr | bool attribute |
Operands: ¶
Operand | Description |
---|---|
a | vector of 16-bit signless integer values of length 2 |
b | vector of 8-bit signless integer values of length 4 |
c | 32-bit signless integer |
Results: ¶
Result | Description |
---|---|
res | 32-bit signless integer |
nvvm.dot.accumulate.4way
(NVVM::DotAccumulate4WayOp) ¶
Four-way byte dot product-accumulate instruction
Syntax:
operation ::= `nvvm.dot.accumulate.4way` $a $a_type `,` $b $b_type `,` $c attr-dict `:` type($a) `,` type($b)
Performs a four-way byte dot-product which is accumulated in a 32-bit
result.
Operand a
and b
are vectors of 4 bytes between which the dot product is
computed.
The a_type
and b_type
attributes specify the type of the elements in a
and b
respectively.
If a_type
or b_type
is signed
, then the elements in the corresponding
vector are sign-extended to 32-bit before the dot product is computed.
If a_type
or b_type
is unsigned
, then the elements in the
corresponding vector are zero-extended to 32-bit instead.
Operand c
is a 32-bit integer to which the result is accumulated. It is
treated as holding a signed integer if any of a_type
or b_type
is s8
.
For more information, see PTX ISA
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
a_type | ::mlir::NVVM::DotAccumulateTypeAttr | NVVM DotAccumulateType |
b_type | ::mlir::NVVM::DotAccumulateTypeAttr | NVVM DotAccumulateType |
Operands: ¶
Operand | Description |
---|---|
a | vector of 8-bit signless integer values of length 4 |
b | vector of 8-bit signless integer values of length 4 |
c | 32-bit signless integer |
Results: ¶
Result | Description |
---|---|
res | 32-bit signless integer |
nvvm.elect.sync
(NVVM::ElectSyncOp) ¶
Elect one leader thread
Syntax:
operation ::= `nvvm.elect.sync` ($membermask^)? attr-dict `->` type(results)
The elect.sync
instruction elects one predicated active leader
thread from among a set of threads specified in the membermask
.
When the membermask
is not provided explicitly, a default value
of 0xFFFFFFFF
is used. The predicate result is set to True
for
the leader thread, and False
for all other threads.
For more information, see PTX ISA
Operands: ¶
Operand | Description |
---|---|
membermask | 32-bit signless integer |
Results: ¶
Result | Description |
---|---|
pred | 1-bit signless integer |
nvvm.exit
(NVVM::Exit) ¶
Exit Op
Syntax:
operation ::= `nvvm.exit` attr-dict
Ends execution of a thread. For more information, see PTX ISA
nvvm.fence.mbarrier.init
(NVVM::FenceMbarrierInitOp) ¶
Syntax:
operation ::= `nvvm.fence.mbarrier.init` attr-dict
Fence operation that applies on the prior nvvm.mbarrier.init
For more information, see PTX ISA
Interfaces: BasicPtxBuilderInterface
nvvm.fence.proxy
(NVVM::FenceProxyOp) ¶
Syntax:
operation ::= `nvvm.fence.proxy` attr-dict
Fence operation with proxy to establish an ordering between memory accesses that may happen through different proxies.
For more information, see PTX ISA
Interfaces: BasicPtxBuilderInterface
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
kind | ::mlir::NVVM::ProxyKindAttr | Proxy kind |
space | ::mlir::NVVM::SharedSpaceAttr | Shared memory space |
nvvm.fence.proxy.acquire
(NVVM::FenceProxyAcquireOp) ¶
Uni-directional proxy fence operation with acquire semantics
Syntax:
operation ::= `nvvm.fence.proxy.acquire` $scope $addr `,` $size (`from_proxy` `=` $fromProxy^)? (`to_proxy` `=` $toProxy^)? attr-dict
fence.proxy.acquire
is a uni-directional fence used to establish ordering
between a prior memory access performed via the generic proxy and a
subsequent memory access performed via the tensormap proxy
The address operand addr
and the operand size
together specify the
memory range [addr, addr+size)
on which the ordering guarantees on the
memory accesses across the proxies is to be provided. The only supported
value for the size
operand is 128 and must be an immediate. Generic Addressing
is used unconditionally, and the address specified by the operand addr
must
fall within the .global
state space. Otherwise, the behavior is undefined
For more information, see PTX ISA
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
scope | ::mlir::NVVM::MemScopeKindAttr | NVVM Memory Scope kind |
fromProxy | ::mlir::NVVM::ProxyKindAttr | Proxy kind |
toProxy | ::mlir::NVVM::ProxyKindAttr | Proxy kind |
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer in address space 0 |
size | 32-bit signless integer |
nvvm.fence.proxy.release
(NVVM::FenceProxyReleaseOp) ¶
Uni-directional proxy fence operation with release semantics
Syntax:
operation ::= `nvvm.fence.proxy.release` $scope (`from_proxy` `=` $fromProxy^)? (`to_proxy` `=` $toProxy^)? attr-dict
fence.proxy.release
is a uni-directional fence used to establish ordering
between a prior memory access performed via the generic proxy and a
subsequent memory access performed via the tensormap proxy. fence.proxy.release
operation can form a release sequence that synchronizes with an acquire
sequence that contains the fence.proxy.acquire proxy fence operation
For more information, see PTX ISA
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
scope | ::mlir::NVVM::MemScopeKindAttr | NVVM Memory Scope kind |
fromProxy | ::mlir::NVVM::ProxyKindAttr | Proxy kind |
toProxy | ::mlir::NVVM::ProxyKindAttr | Proxy kind |
nvvm.fence.sc.cluster
(NVVM::FenceScClusterOp) ¶
Syntax:
operation ::= `nvvm.fence.sc.cluster` attr-dict
nvvm.griddepcontrol
(NVVM::GriddepcontrolOp) ¶
Syntax:
operation ::= `nvvm.griddepcontrol` $kind attr-dict
If the $kind attribute is set to wait
, it causes the
executing thread to wait until all prerequisite grids in flight
have completed and all the memory operations from the prerequisite grids
are performed and made visible to the current grid.
When the $kind is launch_dependents, it signals that specific dependents the runtime system designated to react to this instruction can be scheduled as soon as all other CTAs in the grid issue the same instruction or have completed.
For more information, see PTX ISA
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
kind | ::mlir::NVVM::GridDepActionKindAttr | Action kind for grid dependency controlEnum cases:
|
nvvm.inline_ptx
(NVVM::InlinePtxOp) ¶
Inline PTX Op
Syntax:
operation ::= `nvvm.inline_ptx` $ptxCode
( `ro` `(` $readOnlyArgs^ `:` type($readOnlyArgs) `)` )?
( `rw` `(` $readWriteArgs^ `:` type($readWriteArgs) `)` )?
(`,` `predicate` `=` $predicate^)?
attr-dict
( `->` type($writeOnlyArgs)^ )?
This op allows using PTX directly within the NVVM
dialect, while greatly simplifying llvm.inline_asm generation. It
automatically handles register size selection and sets the correct
read/write access for each operand. The operation leverages the
BasicPtxBuilderInterface
to abstract away low-level details of
PTX assembly formatting.
The `predicate` attribute is used to specify a predicate for the
PTX instruction.
Example 1: Read-only Parameters
```mlir
nvvm.inline_ptx "mbarrier.init.b64 [$0], $1;" (%barrier_gen, %count) : !llvm.ptr, i32
// Lowers to:
llvm.inline_asm has_side_effects asm_dialect = att
"mbarrier.init.b64 [$0], $1;", "l,r" %arg0, %arg2 : (!llvm.ptr, i32) -> ()
```
Example 2: Read-only and Write-only Parameters
```mlir
%0 = nvvm.inline_ptx "ex2.approx.ftz.f32 $0, $1;" (%input) : f32 -> f32
// Lowers to:
%0 = llvm.inline_asm has_side_effects asm_dialect = att
"ex2.approx.ftz.f32 $0, $1;", "=f,f" %arg0 : (f32) -> f32
```
Example 3: Predicate Usage
```mlir
nvvm.inline_ptx "mbarrier.init.b64 [$0], $1;" (%barrier_gen, %count),
predicate = %pred : !llvm.ptr, i32, i1
// Lowers to:
llvm.inline_asm has_side_effects asm_dialect = att
"@$2 mbarrier.init.b64 [$0], $1;", "l,r,b" %arg0, %arg2, %arg3
: (!llvm.ptr, i32, i1) -> ()
```
Traits: AttrSizedOperandSegments
Interfaces: BasicPtxBuilderInterface
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
ptxCode | ::mlir::StringAttr | string attribute |
Operands: ¶
Operand | Description |
---|---|
readOnlyArgs | variadic of any type |
readWriteArgs | variadic of any type |
predicate | 1-bit signless integer |
Results: ¶
Result | Description |
---|---|
writeOnlyArgs | variadic of any type |
nvvm.ldmatrix
(NVVM::LdMatrixOp) ¶
Cooperative matrix load
Syntax:
operation ::= `nvvm.ldmatrix` $ptr attr-dict `:` functional-type($ptr, $res)
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
num | ::mlir::IntegerAttr | 32-bit signless integer attribute |
layout | ::mlir::NVVM::MMALayoutAttr | NVVM MMA layout |
shape | ::mlir::NVVM::LdStMatrixShapeAttr | Matrix shape for ldmatrix and stmatrix |
eltType | ::mlir::NVVM::LdStMatrixEltTypeAttr | Element type for ldmatrix and stmatrix |
Operands: ¶
Operand | Description |
---|---|
ptr | LLVM pointer in address space 3 |
Results: ¶
Result | Description |
---|---|
res | any type |
nvvm.mapa
(NVVM::MapaOp) ¶
Syntax:
operation ::= `nvvm.mapa` $a`,` $b attr-dict `:` type($a) `->` type($res)
Traits: NVVMRequiresSM<90>
Operands: ¶
Operand | Description |
---|---|
a | LLVM pointer in address space 0 or LLVM pointer in address space 3 |
b | 32-bit signless integer |
Results: ¶
Result | Description |
---|---|
res | LLVM pointer in address space 0 or LLVM pointer in address space 7 |
nvvm.match.sync
(NVVM::MatchSyncOp) ¶
Broadcast and compare a value across threads in warp
Syntax:
operation ::= `nvvm.match.sync` $kind $thread_mask `,` $val attr-dict `:` type($val) `->` type($res)
The match.sync
op performs broadcast and compare of operand val
across
all non-exited threads in thread_mask
and returns a mask depending on the
kind and an optional predicate.
The matching operation kinds are:
any
: Returns a mask corresponding to the non-exited threads in thethread_mask
that have the same value of operandval
.all
: Returns a mask and a predicate. If all non-exited threads in thethread_mask
have the same value of operandval
, the predicate is set to true and the mask corresponds to the non-exited threads in thethread_mask
. Otherwise, the predicate is set to false and the mask is 0.
For more information, see PTX ISA
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
kind | ::mlir::NVVM::MatchSyncKindAttr | NVVM match sync kind |
Operands: ¶
Operand | Description |
---|---|
thread_mask | 32-bit signless integer |
val | 32-bit signless integer or 64-bit signless integer |
Results: ¶
Result | Description |
---|---|
res | 32-bit signless integer or LLVM struct type |
nvvm.mbarrier.arrive
(NVVM::MBarrierArriveOp) ¶
MBarrier Arrive Operation
Syntax:
operation ::= `nvvm.mbarrier.arrive` $addr attr-dict `:` type($addr) `->` type($res)
The nvvm.mbarrier.arrive
operation performs an arrive-on operation on the
mbarrier object at the specified address. Uses the default .release.cta
semantics.
This release pattern establishes memory ordering for operations occurring in program
order before this arrive instruction by making operations from the current thread
visible to subsequent operations in other threads within the CTA. When other threads
perform corresponding acquire operations (like ‘mbarrier.test.wait’), they synchronize
with this release pattern.
This operation causes the executing thread to signal its arrival at the barrier. The operation returns an opaque value that captures the phase of the mbarrier object prior to the arrive-on operation. The contents of this state value are implementation-specific.
The operation takes the following operand:
addr
: A pointer to the memory location of the mbarrier object. Uses generic addressing, but the address must still be in the shared memory space.
For more information, see PTX ISA
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer type |
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.mbarrier.arrive.expect_tx
(NVVM::MBarrierArriveExpectTxOp) ¶
MBarrier Arrive with Expected Transaction Count
Syntax:
operation ::= `nvvm.mbarrier.arrive.expect_tx` $addr `,` $txcount (`,` `predicate` `=` $predicate^)? attr-dict `:` type(operands)
The nvvm.mbarrier.arrive.expect_tx
operation performs an expect-tx operation
followed by an arrive-on operation on the mbarrier object. Uses the default
.release.cta
semantics. This release pattern establishes memory ordering for
operations occurring in program order before this arrive instruction by making
operations from the current thread visible to subsequent operations in other
threads within the CTA. When other threads perform corresponding acquire operations
(like ‘mbarrier.test.wait’), they synchronize with this release pattern.
This operation first performs an expect-tx operation with the specified transaction count, then performs an arrive-on operation with an implicit count of 1. The expect-tx operation increases the tx-count of the mbarrier object by the specified expectCount value, setting the current phase to expect and tracks the completion of additional asynchronous transactions.
The operation takes the following operands:
addr
: A pointer to the memory location of the mbarrier object. Uses generic addressing, but the address must still be in the shared memory space.txcount
: An unsigned integer specifying the expected transaction count for the expect-tx operation. This represents the number of asynchronous transactions expected to complete before the barrier phase completes.predicate
: Optional predicate for conditional execution.
For more information, see PTX ISA
Interfaces: BasicPtxBuilderInterface
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer type |
txcount | 32-bit signless integer |
predicate | 1-bit signless integer |
nvvm.mbarrier.arrive.expect_tx.shared
(NVVM::MBarrierArriveExpectTxSharedOp) ¶
Shared MBarrier Arrive with Expected Transaction Count
Syntax:
operation ::= `nvvm.mbarrier.arrive.expect_tx.shared` $addr `,` $txcount (`,` `predicate` `=` $predicate^)? attr-dict `:` type(operands)
This Op is the same as nvvm.mbarrier.arrive.expect_tx
except that the mbarrier object
should be accessed using a shared-memory pointer instead of a generic-memory pointer.
For more information, see PTX ISA
Interfaces: BasicPtxBuilderInterface
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer in address space 3 |
txcount | 32-bit signless integer |
predicate | 1-bit signless integer |
nvvm.mbarrier.arrive.nocomplete
(NVVM::MBarrierArriveNocompleteOp) ¶
MBarrier Arrive No-Complete Operation
Syntax:
operation ::= `nvvm.mbarrier.arrive.nocomplete` $addr `,` $count attr-dict `:` type(operands) `->` type($res)
The nvvm.mbarrier.arrive.nocomplete
operation performs an arrive-on operation
on the mbarrier object with the guarantee that it will not cause the barrier to
complete its current phase. Uses the default .release.cta
semantics. This release
pattern establishes memory ordering for operations occurring in program order before
this arrive instruction by making operations from the current thread visible to
subsequent operations in other threads within the CTA. When other threads perform
corresponding acquire operations (like ‘mbarrier.test.wait’), they synchronize with
this release pattern.
This operation causes the executing thread to signal its arrival at the barrier with a specified count, but ensures that the barrier phase will not complete as a result of this operation. The operation returns an opaque value that captures the phase of the mbarrier object prior to the arrive-on operation.
The operation takes the following operands:
addr
: A pointer to the memory location of the mbarrier object. Uses generic addressing, but the address must still be in the shared memory space.count
: Integer specifying the count argument to the arrive-on operation. Must be in the valid range as specified in the mbarrier object contents.
For more information, see PTX ISA
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer type |
count | 32-bit signless integer |
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.mbarrier.arrive.nocomplete.shared
(NVVM::MBarrierArriveNocompleteSharedOp) ¶
Shared MBarrier Arrive No-Complete Operation
Syntax:
operation ::= `nvvm.mbarrier.arrive.nocomplete.shared` $addr `,` $count attr-dict `:` type(operands) `->` type($res)
This Op is the same as nvvm.mbarrier.arrive.nocomplete
except that the mbarrier object
should be accessed using a shared-memory pointer instead of a generic-memory pointer.
For more information, see PTX ISA
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer in address space 3 |
count | 32-bit signless integer |
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.mbarrier.arrive.shared
(NVVM::MBarrierArriveSharedOp) ¶
Shared MBarrier Arrive Operation
Syntax:
operation ::= `nvvm.mbarrier.arrive.shared` $addr attr-dict `:` qualified(type($addr)) `->` type($res)
This Op is the same as nvvm.mbarrier.arrive
except that the mbarrier object
should be accessed using a shared-memory pointer instead of a generic-memory pointer.
For more information, see PTX ISA
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer in address space 3 |
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.mbarrier.init
(NVVM::MBarrierInitOp) ¶
MBarrier Initialization Op
Syntax:
operation ::= `nvvm.mbarrier.init` $addr `,` $count (`,` `predicate` `=` $predicate^)? attr-dict `:` type(operands)
The nvvm.mbarrier.init
operation initializes an mbarrier object at the specified
memory location.
This operation initializes the mbarrier object with the following state:
- Current phase: 0
- Expected arrival count:
count
- Pending arrival count:
count
- Transaction count (tx-count): 0
The operation takes the following operands:
addr
: A pointer to the memory location of the mbarrier object. Uses generic addressing, but the address must still be in the shared memory space.count
: Integer specifying the number of threads that will participate in barrier synchronization. Must be in the range [1, 2²⁰ - 1].predicate
: Optional predicate for conditional execution.
For more information, see PTX ISA
Interfaces: BasicPtxBuilderInterface
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer type |
count | 32-bit signless integer |
predicate | 1-bit signless integer |
nvvm.mbarrier.init.shared
(NVVM::MBarrierInitSharedOp) ¶
Shared MBarrier Initialization Op
Syntax:
operation ::= `nvvm.mbarrier.init.shared` $addr `,` $count (`,` `predicate` `=` $predicate^)? attr-dict `:` type(operands)
This Op is the same as nvvm.mbarrier.init
except that the mbarrier object
should be accessed using a shared-memory pointer instead of a generic-memory pointer.
For more information, see PTX ISA
Traits: NVVMRequiresSM<80>
Interfaces: BasicPtxBuilderInterface
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer in address space 3 |
count | 32-bit signless integer |
predicate | 1-bit signless integer |
nvvm.mbarrier.inval
(NVVM::MBarrierInvalOp) ¶
MBarrier Invalidation Operation
Syntax:
operation ::= `nvvm.mbarrier.inval` $addr attr-dict `:` type(operands)
The nvvm.mbarrier.inval
operation invalidates an mbarrier object at the
specified memory location.
This operation marks the mbarrier object as invalid, making it safe to repurpose the memory location for other uses or to reinitialize it as a new mbarrier object. It is undefined behavior if the mbarrier object is already invalid.
The operation takes the following operand:
addr
: A pointer to the memory location of the mbarrier object. Uses generic addressing, but the address must still be in the shared memory space.
For more information, see PTX ISA
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer type |
nvvm.mbarrier.inval.shared
(NVVM::MBarrierInvalSharedOp) ¶
Shared MBarrier Invalidation Operation
Syntax:
operation ::= `nvvm.mbarrier.inval.shared` $addr attr-dict `:` type(operands)
This Op is the same as nvvm.mbarrier.inval
except that the mbarrier object
should be accessed using a shared-memory pointer instead of a generic-memory pointer.
For more information, see PTX ISA
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer in address space 3 |
nvvm.mbarrier.test.wait
(NVVM::MBarrierTestWaitOp) ¶
MBarrier Non-Blocking Test Wait Operation
Syntax:
operation ::= `nvvm.mbarrier.test.wait` $addr `,` $state attr-dict `:` type(operands) `->` type($res)
The nvvm.mbarrier.test.wait
operation performs a non-blocking test for the
completion of a specific phase of an mbarrier object. It uses the default
.acquire.cta
semantics. This acquire pattern establishes memory ordering for
operations occurring in program order after this wait instruction by making
operations from other threads in the CTA visible to subsequent operations in the current
thread. When this wait completes, it synchronizes with the corresponding release
pattern from the mbarrier.arrive
operation, establishing memory ordering within
the CTA.
This operation tests whether the mbarrier phase specified by the state operand has completed. It is a non-blocking instruction that immediately returns the completion status without suspending the executing thread.
The operation takes the following operands:
addr
: A pointer to the memory location of the mbarrier object. Uses generic addressing, but the address must still be in the shared memory space.state
: An opaque value returned by a previousmbarrier.arrive
operation on the same mbarrier object during the current or immediately preceding phase.
The operation returns a boolean value indicating whether the specified phase has completed:
true
: The immediately preceding phase has completedfalse
: The phase is still incomplete (current phase)
Memory ordering guarantees: When this wait returns true, the following ordering guarantees hold:
- All memory accesses (except async operations) requested prior to
mbarrier.arrive
having release semantics by participating CTA threads are visible to the executing thread. - All
cp.async
operations requested prior tocp.async.mbarrier.arrive
by participating CTA threads are visible to the executing thread. - All
cp.async.bulk
operations using the same mbarrier object requested prior tombarrier.arrive
having release semantics by participating CTA threads are visible to the executing thread. - Memory accesses requested after this wait are not visible to memory
accesses performed prior to
mbarrier.arrive
by other participating threads. - No ordering guarantee exists for memory accesses by the same thread
between
mbarrier.arrive
and this wait.
For more information, see PTX ISA
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer type |
state | LLVM dialect-compatible type |
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.mbarrier.test.wait.shared
(NVVM::MBarrierTestWaitSharedOp) ¶
Shared MBarrier Non-Blocking Test Wait Operation
Syntax:
operation ::= `nvvm.mbarrier.test.wait.shared` $addr `,` $state attr-dict `:` type(operands) `->` type($res)
This Op is the same as nvvm.mbarrier.test.wait
except that the mbarrier object
should be accessed using a shared-memory pointer instead of a generic-memory pointer.
For more information, see PTX ISA
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer in address space 3 |
state | LLVM dialect-compatible type |
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.mbarrier.try_wait.parity
(NVVM::MBarrierTryWaitParityOp) ¶
MBarrier Potentially-Blocking Try Wait with Phase Parity
Syntax:
operation ::= `nvvm.mbarrier.try_wait.parity` $addr `,` $phase `,` $ticks attr-dict `:` type(operands)
The nvvm.mbarrier.try_wait.parity
operation performs a potentially-blocking
test for the completion of a specific phase of an mbarrier object using phase
parity. It uses the default .acquire.cta
semantics. This acquire pattern
establishes memory ordering for operations occurring in program order after this
wait instruction by making operations from other threads in the CTA visible to subsequent
operations in the current thread. When this wait completes, it synchronizes with
the corresponding release pattern from the mbarrier.arrive
operation, establishing
memory ordering within the CTA.
This operation waits for the completion of the mbarrier phase indicated by the
phase parity. While it uses the underlying PTX mbarrier.try_wait.parity
instruction, this MLIR operation generates a loop that enforces the test to
complete before continuing execution, ensuring the barrier phase is actually
completed rather than potentially timing out.
The operation takes the following operands:
addr
: A pointer to the memory location of the mbarrier object. Uses generic addressing, but the address must still be in the shared memory space.phase
: An integer specifying the phase parity (0 or 1). Even phases have parity 0, odd phases have parity 1.ticks
: An unsigned integer specifying the suspend time hint in nanoseconds. This may be used instead of the system-dependent time limit.
Memory ordering guarantees: When this wait returns true, the following ordering guarantees hold:
- All memory accesses (except async operations) requested prior to
mbarrier.arrive
having release semantics by participating CTA threads are visible to the executing thread. - All
cp.async
operations requested prior tocp.async.mbarrier.arrive
by participating CTA threads are visible to the executing thread. - All
cp.async.bulk
operations using the same mbarrier object requested prior tombarrier.arrive
having release semantics by participating CTA threads are visible to the executing thread. - Memory accesses requested after this wait are not visible to memory
accesses performed prior to
mbarrier.arrive
by other participating threads. - No ordering guarantee exists for memory accesses by the same thread
between
mbarrier.arrive
and this wait.
Implementation behavior:
This operation generates a PTX loop that repeatedly calls the underlying
mbarrier.try_wait.parity
instruction until the barrier phase completes.
Unlike the raw PTX instruction which may return without completion after a
timeout, this MLIR operation guarantees completion by continuing to loop until
the specified phase is reached.
For more information, see PTX ISA
Interfaces: BasicPtxBuilderInterface
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer type |
phase | 32-bit signless integer |
ticks | 32-bit signless integer |
nvvm.mbarrier.try_wait.parity.shared
(NVVM::MBarrierTryWaitParitySharedOp) ¶
Shared MBarrier Potentially-Blocking Try Wait with Phase Parity
Syntax:
operation ::= `nvvm.mbarrier.try_wait.parity.shared` $addr `,` $phase `,` $ticks attr-dict `:` type(operands)
This Op is the same as nvvm.mbarrier.try_wait.parity
except that the mbarrier object
should be accessed using a shared-memory pointer instead of a generic-memory pointer.
For more information, see PTX ISA
Interfaces: BasicPtxBuilderInterface
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer in address space 3 |
phase | 32-bit signless integer |
ticks | 32-bit signless integer |
nvvm.mma.sync
(NVVM::MmaOp) ¶
Cooperative matrix-multiply and accumulate
The nvvm.mma.sync
operation collectively performs the operation
D = matmul(A, B) + C
using all threads in a warp.
All the threads in the warp must execute the same mma.sync
operation.
For each possible multiplicand PTX data type, there are one or more possible instruction shapes given as “mMnNkK”. The below table describes the posssibilities as well as the types required for the operands. Note that the data type for C (the accumulator) and D (the result) can vary independently when there are multiple possibilities in the “C/D Type” column.
When an optional attribute cannot be immediately inferred from the types of the operands and the result during parsing or validation, an error will be raised.
b1Op
is only relevant when the binary (b1) type is given to
multiplicandDataType
. It specifies how the multiply-and-acumulate is
performed and is either xor_popc
or and_poc
. The default is xor_popc
.
intOverflowBehavior
is only relevant when the multiplicandType
attribute
is one of u8, s8, u4, s4
, this attribute describes how overflow is handled
in the accumulator. When the attribute is satfinite
, the accumulator values
are clamped in the int32 range on overflow. This is the default behavior.
Alternatively, accumulator behavior wrapped
can also be specified, in
which case overflow wraps from one end of the range to the other.
layoutA
and layoutB
are required and should generally be set to
#nvvm.mma_layout<row>
and #nvvm.mma_layout<col>
respectively, but other
combinations are possible for certain layouts according to the table below.
| A/B Type | Shape | ALayout | BLayout | A Type | B Type | C/D Type |
|----------|-----------|---------|---------|----------|----------|-------------------|
| f64 | .m8n8k4 | row | col | 1x f64 | 1x f64 | 2x f64 |
| f16 | .m8n8k4 | row/col | row/col | 2x f16x2 | 2x f16x2 | 4x f16x2 or 8xf32 |
| | .m16n8k8 | row | col | 2x f16x2 | 1x f16x2 | 2x f16x2 or 4 f32 |
| | .m16n8k16 | row | col | 4x f16x2 | 2x f16x2 | 2x f16x2 or 4 f32 |
| bf16 | .m16n8k8 | row | col | 2x i32 | 1x i32 | 4x f32 |
| | .m16n8k16 | row | col | 4x i32 | 2x i32 | 4x f32 |
| tf32 | .m16n8k4 | row | col | 2x i32 | 1x i32 | 4x f32 |
| | .m16n8k8 | row | col | 4x i32 | 2x i32 | 2x f16x2 or 4 f32 |
| u8/s8 | .m8n8k16 | row | col | 1x i32 | 1x i32 | 2x i32 |
| | .m16n8k16 | row | col | 2x i32 | 1x i32 | 4x i32 |
| | .m16n8k32 | row | col | 4x i32 | 2x i32 | 4x i32 |
| u4/s4 | .m8n8k32 | row | col | 1x i32 | 1x i32 | 2x i32 |
| | m16n8k32 | row | col | 2x i32 | 1x i32 | 4x i32 |
| | m16n8k64 | row | col | 4x i32 | 2x i32 | 4x i32 |
| b1 | m8n8k128 | row | col | 1x i32 | 1x i32 | 2x i32 |
| | m16n8k128 | row | col | 2x i32 | 1x i32 | 4x i32 |
Example:
%128 = nvvm.mma.sync A[%120, %121, %122, %123]
B[%124, %125]
C[%126, %127]
{layoutA = #nvvm.mma_layout<row>,
layoutB = #nvvm.mma_layout<col>,
shape = {k = 16 : i32, m = 16 : i32, n = 8 : i32}}
: (vector<2xf16>, vector<2xf16>, vector<2xf16>)
-> !llvm.struct<(vector<2xf16>, vector<2xf16>)>
Traits: AttrSizedOperandSegments
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
shape | ::mlir::NVVM::MMAShapeAttr | Attribute for MMA operation shape. |
b1Op | ::mlir::NVVM::MMAB1OpAttr | MMA binary operations |
intOverflowBehavior | ::mlir::NVVM::MMAIntOverflowAttr | MMA overflow options |
layoutA | ::mlir::NVVM::MMALayoutAttr | NVVM MMA layout |
layoutB | ::mlir::NVVM::MMALayoutAttr | NVVM MMA layout |
multiplicandAPtxType | ::mlir::NVVM::MMATypesAttr | NVVM MMA types |
multiplicandBPtxType | ::mlir::NVVM::MMATypesAttr | NVVM MMA types |
Operands: ¶
Operand | Description |
---|---|
operandA | variadic of LLVM dialect-compatible type |
operandB | variadic of LLVM dialect-compatible type |
operandC | variadic of LLVM dialect-compatible type |
Results: ¶
Result | Description |
---|---|
res | LLVM structure type |
nvvm.nanosleep
(NVVM::NanosleepOp) ¶
Suspends the thread for a specified duration.
Syntax:
operation ::= `nvvm.nanosleep` attr-dict $duration
The op suspends the thread for a sleep duration approximately close to the
delay $duration
, specified in nanoseconds.
The sleep duration is approximated, but guaranteed to be in the interval [0, 2*t]. The maximum sleep duration is 1 millisecond. The implementation may reduce the sleep duration for individual threads within a warp such that all sleeping threads in the warp wake up together.
For more information, see PTX ISA
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
duration | ::mlir::IntegerAttr | 32-bit signless integer attribute whose minimum value is 1 whose maximum value is 1000000 |
nvvm.pmevent
(NVVM::PMEventOp) ¶
Trigger one or more Performance Monitor events.
Syntax:
operation ::= `nvvm.pmevent` attr-dict (`id` `=` $eventId^)? (`mask` `=` $maskedEventId^)?
Triggers one or more of a fixed number of performance monitor events, with event index or mask specified by immediate operand.
Without mask
it triggers a single performance monitor event indexed by
immediate operand a, in the range 0..15.
With mask
it triggers one or more of the performance monitor events. Each
bit in the 16-bit immediate operand controls an event.
For more information, see PTX ISA
Interfaces: BasicPtxBuilderInterface
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
maskedEventId | ::mlir::IntegerAttr | 16-bit signless integer attribute |
eventId | ::mlir::IntegerAttr | 32-bit signless integer attribute |
nvvm.prefetch
(NVVM::PrefetchOp) ¶
Brings the cache line containing an address into the specified cache level
Syntax:
operation ::= `nvvm.prefetch` (`level` `=` $cacheLevel^ (`uniform` $uniform^)? `,`)? (`tensormap` $tensormap^ (`in_param_space` $in_param_space^)? `,`)? (`evict_priority` `=` $evictPriority^ `,`)? $addr (`,` `predicate` `=` $predicate^)? attr-dict `:` type(operands)
Prefetches the cache line containing the address given by addr
. The
operand may be a global, local, or generic pointer. When tensormap
is
specified, the operand may instead be a constant or generic pointer. If the
address maps to shared memory, the operation has no effect.
At most one of cacheLevel
or tensormap
may be present. The cacheLevel
attribute selects the target cache level. When combined with uniform
, the
prefetch is performed to the uniform cache, in which case addr
must be a
generic pointer.
When tensormap
is used, the line containing addr
is brought from the
constant or parameter state space for later use by cp.async.bulk.tensor
.
If in_param_space
is specified, the generic pointer is interpreted as
referring to the parameter state space.
uniform
can be specified after the cacheLevel
to indicate that the
prefetch is performed to the specified uniform cache level. If uniform
is
specified, addr
must be a generic address pointer and no operation is
performed if addr
maps to a const
, local
, or shared
memory location.
The evictPriority
attribute is optional and specifies the cache eviction
priority when cacheLevel
is L2.
For more information, see PTX ISA
Interfaces: BasicPtxBuilderInterface
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
cacheLevel | ::mlir::NVVM::PrefetchCacheLevelAttr | NVVM Prefetch Cache LevelEnum cases:
|
evictPriority | ::mlir::NVVM::CacheEvictionPriorityAttr | NVVM Cache Eviction PriorityEnum cases:
|
tensormap | ::mlir::UnitAttr | unit attribute |
uniform | ::mlir::UnitAttr | unit attribute |
in_param_space | ::mlir::UnitAttr | unit attribute |
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer in address space 1 or LLVM pointer in address space 5 or LLVM pointer in address space 0 or LLVM pointer in address space 4 |
predicate | 1-bit signless integer |
nvvm.rcp.approx.ftz.f
(NVVM::RcpApproxFtzF32Op) ¶
Syntax:
operation ::= `nvvm.rcp.approx.ftz.f` $arg attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Operands: ¶
Operand | Description |
---|---|
arg | 32-bit float |
Results: ¶
Result | Description |
---|---|
res | 32-bit float |
nvvm.read.ptx.sreg.clock
(NVVM::ClockOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.clock` attr-dict `:` type($res)
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.clock64
(NVVM::Clock64Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.clock64` attr-dict `:` type($res)
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.cluster.ctaid.x
(NVVM::BlockInClusterIdXOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.cluster.ctaid.x` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
, NVVMRequiresSM<90>
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.cluster.ctaid.y
(NVVM::BlockInClusterIdYOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.cluster.ctaid.y` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
, NVVMRequiresSM<90>
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.cluster.ctaid.z
(NVVM::BlockInClusterIdZOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.cluster.ctaid.z` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
, NVVMRequiresSM<90>
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.cluster.ctarank
(NVVM::ClusterId) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.cluster.ctarank` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
, NVVMRequiresSM<90>
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.cluster.nctaid.x
(NVVM::ClusterDimBlocksXOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.cluster.nctaid.x` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
, NVVMRequiresSM<90>
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.cluster.nctaid.y
(NVVM::ClusterDimBlocksYOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.cluster.nctaid.y` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
, NVVMRequiresSM<90>
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.cluster.nctaid.z
(NVVM::ClusterDimBlocksZOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.cluster.nctaid.z` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.cluster.nctarank
(NVVM::ClusterDim) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.cluster.nctarank` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.clusterid.x
(NVVM::ClusterIdXOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.clusterid.x` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
, NVVMRequiresSM<90>
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.clusterid.y
(NVVM::ClusterIdYOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.clusterid.y` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.clusterid.z
(NVVM::ClusterIdZOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.clusterid.z` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.ctaid.x
(NVVM::BlockIdXOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.ctaid.x` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.ctaid.y
(NVVM::BlockIdYOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.ctaid.y` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.ctaid.z
(NVVM::BlockIdZOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.ctaid.z` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg0
(NVVM::EnvReg0Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg0` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg1
(NVVM::EnvReg1Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg1` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg10
(NVVM::EnvReg10Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg10` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg11
(NVVM::EnvReg11Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg11` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg12
(NVVM::EnvReg12Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg12` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg13
(NVVM::EnvReg13Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg13` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg14
(NVVM::EnvReg14Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg14` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg15
(NVVM::EnvReg15Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg15` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg16
(NVVM::EnvReg16Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg16` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg17
(NVVM::EnvReg17Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg17` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg18
(NVVM::EnvReg18Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg18` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg19
(NVVM::EnvReg19Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg19` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg2
(NVVM::EnvReg2Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg2` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg20
(NVVM::EnvReg20Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg20` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg21
(NVVM::EnvReg21Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg21` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg22
(NVVM::EnvReg22Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg22` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg23
(NVVM::EnvReg23Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg23` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg24
(NVVM::EnvReg24Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg24` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg25
(NVVM::EnvReg25Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg25` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg26
(NVVM::EnvReg26Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg26` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg27
(NVVM::EnvReg27Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg27` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg28
(NVVM::EnvReg28Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg28` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg29
(NVVM::EnvReg29Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg29` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg3
(NVVM::EnvReg3Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg3` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg30
(NVVM::EnvReg30Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg30` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg31
(NVVM::EnvReg31Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg31` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg4
(NVVM::EnvReg4Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg4` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg5
(NVVM::EnvReg5Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg5` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg6
(NVVM::EnvReg6Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg6` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg7
(NVVM::EnvReg7Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg7` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg8
(NVVM::EnvReg8Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg8` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.envreg9
(NVVM::EnvReg9Op) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.envreg9` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.globaltimer
(NVVM::GlobalTimerOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.globaltimer` attr-dict `:` type($res)
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.globaltimer.lo
(NVVM::GlobalTimerLoOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.globaltimer.lo` attr-dict `:` type($res)
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.gridid
(NVVM::GridIdOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.gridid` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.laneid
(NVVM::LaneIdOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.laneid` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.lanemask.eq
(NVVM::LaneMaskEqOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.lanemask.eq` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.lanemask.ge
(NVVM::LaneMaskGeOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.lanemask.ge` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.lanemask.gt
(NVVM::LaneMaskGtOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.lanemask.gt` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.lanemask.le
(NVVM::LaneMaskLeOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.lanemask.le` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.lanemask.lt
(NVVM::LaneMaskLtOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.lanemask.lt` attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.nclusterid.x
(NVVM::ClusterDimXOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.nclusterid.x` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.nclusterid.y
(NVVM::ClusterDimYOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.nclusterid.y` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.nclusterid.z
(NVVM::ClusterDimZOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.nclusterid.z` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.nctaid.x
(NVVM::GridDimXOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.nctaid.x` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.nctaid.y
(NVVM::GridDimYOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.nctaid.y` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.nctaid.z
(NVVM::GridDimZOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.nctaid.z` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.nsmid
(NVVM::SmDimOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.nsmid` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.ntid.x
(NVVM::BlockDimXOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.ntid.x` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.ntid.y
(NVVM::BlockDimYOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.ntid.y` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.ntid.z
(NVVM::BlockDimZOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.ntid.z` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.nwarpid
(NVVM::WarpDimOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.nwarpid` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.smid
(NVVM::SmIdOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.smid` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.tid.x
(NVVM::ThreadIdXOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.tid.x` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.tid.y
(NVVM::ThreadIdYOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.tid.y` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.tid.z
(NVVM::ThreadIdZOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.tid.z` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.warpid
(NVVM::WarpIdOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.warpid` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.read.ptx.sreg.warpsize
(NVVM::WarpSizeOp) ¶
Syntax:
operation ::= `nvvm.read.ptx.sreg.warpsize` (`range` $range^)? attr-dict `:` type($res)
Traits: AlwaysSpeculatableImplTrait
Interfaces: ConditionallySpeculatable
, InferIntRangeInterface
, NoMemoryEffect (MemoryEffectOpInterface)
Effects: MemoryEffects::Effect{}
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
range | ::mlir::LLVM::ConstantRangeAttr | A range of two integers, corresponding to LLVM's ConstantRange
|
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.redux.sync
(NVVM::ReduxOp) ¶
Redux Sync Op
Syntax:
operation ::= `nvvm.redux.sync` $kind $val `,` $mask_and_clamp attr-dict `:` type($val) `->` type($res)
redux.sync
performs a reduction operation kind
of the 32 bit source
register across all non-exited threads in the membermask.
The abs
and nan
attributes can be used in the case of f32 input type,
where the abs
attribute causes the absolute value of the input to be used
in the reduction operation, and the nan
attribute causes the reduction
operation to return NaN if any of the inputs to participating threads are
NaN.
For more information, see PTX ISA
Traits: NVVMRequiresSM<80>
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
kind | ::mlir::NVVM::ReduxKindAttr | NVVM redux kind |
abs | ::mlir::BoolAttr | bool attribute |
nan | ::mlir::BoolAttr | bool attribute |
Operands: ¶
Operand | Description |
---|---|
val | LLVM dialect-compatible type |
mask_and_clamp | 32-bit signless integer |
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.setmaxregister
(NVVM::SetMaxRegisterOp) ¶
Syntax:
operation ::= `nvvm.setmaxregister` $action $regCount attr-dict
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
regCount | ::mlir::IntegerAttr | 32-bit signless integer attribute |
action | ::mlir::NVVM::SetMaxRegisterActionAttr | NVVM set max register action |
nvvm.shfl.sync
(NVVM::ShflOp) ¶
NVVM Dialect Op for shfl.sync
Syntax:
operation ::= `nvvm.shfl.sync` $kind $thread_mask `,` $val `,` $offset `,` $mask_and_clamp attr-dict
`:` type($val) `->` type($res)
The shfl.sync
Op implements data shuffle within threads of a warp.
The thread_mask
denotes the threads participating in the Op where
the bit position corresponds to a particular thread’s laneid.
The offset
specifies a source lane or source lane offset
(depending on kind
). The val
is the input value to be copied from
the source. The mask_and_clamp
contains two packed values specifying
a mask for logically splitting warps into sub-segments and an upper bound
for clamping the source lane index.
For more information, see PTX ISA
Traits: NVVMRequiresSM<30>
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
kind | ::mlir::NVVM::ShflKindAttr | NVVM shuffle kind |
return_value_and_is_valid | ::mlir::UnitAttr | unit attribute |
Operands: ¶
Operand | Description |
---|---|
thread_mask | 32-bit signless integer |
val | LLVM dialect-compatible type |
offset | 32-bit signless integer |
mask_and_clamp | 32-bit signless integer |
Results: ¶
Result | Description |
---|---|
res | LLVM dialect-compatible type |
nvvm.st.bulk
(NVVM::BulkStoreOp) ¶
Bulk Store Op
Syntax:
operation ::= `nvvm.st.bulk` $addr `,` `size` `=` $size (`,` `init` `=` $initVal^)? attr-dict `:` type($addr)
Initializes a region of shared memory at the address given by addr
.
The size
operand specifies the number of bytes to initialize and must be
a multiple of 8.
The initVal
operand specifies the value to initialize the memory to. The
only supported value is 0.
For more information, see PTX ISA
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
initVal | ::mlir::IntegerAttr | 64-bit signless integer attribute |
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer in address space 0 or LLVM pointer in address space 3 |
size | 64-bit signless integer |
nvvm.stmatrix
(NVVM::StMatrixOp) ¶
Cooperative matrix store
Syntax:
operation ::= `nvvm.stmatrix` $ptr `,` $sources attr-dict `:` type(operands)
Collectively store one or more matrices across all threads in a warp to the location indicated by the address operand $ptr in shared memory.
For more information, see PTX ISA
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
layout | ::mlir::NVVM::MMALayoutAttr | NVVM MMA layout |
shape | ::mlir::NVVM::LdStMatrixShapeAttr | Matrix shape for ldmatrix and stmatrix |
eltType | ::mlir::NVVM::LdStMatrixEltTypeAttr | Element type for ldmatrix and stmatrix |
Operands: ¶
Operand | Description |
---|---|
ptr | LLVM pointer in address space 3 |
sources | variadic of 32-bit signless integer |
nvvm.tcgen05.alloc
(NVVM::Tcgen05AllocOp) ¶
Tcgen05 alloc operation
Syntax:
operation ::= `nvvm.tcgen05.alloc` $addr `,` $nCols attr-dict `:` type(operands)
The tcgen05.alloc
Op allocates tensor core memory for
the amount specified by nCols
and writes the destination
address to the addr
argument. The nCols
operand specifies the
number of columns to be allocated and it must be a power-of-two.
For more information, see PTX ISA
Traits: NVVMRequiresSMa<100,101>
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
group | ::mlir::NVVM::CTAGroupKindAttr | NVVM CTA group kind |
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer type or LLVM pointer in address space 3 |
nCols | 32-bit signless integer |
nvvm.tcgen05.commit
(NVVM::Tcgen05CommitOp) ¶
Tcgen05 commit operations
Syntax:
operation ::= `nvvm.tcgen05.commit` $addr (`,` `multicast_mask` `=` $multicastMask^)?
attr-dict `:` type(operands)
The tcgen05.commit
makes the mbarrier object, specified by
the operand addr
, track the completion of all the prior
async-tcgen05 operations initiated by the executing thread.
The multicast variants allow signaling on the mbarrier objects
of multiple CTAs within the cluster. Operand multicastMask
,
when present, specifies the destination CTAs in the cluster such
that each bit position in the 16-bit multicastMask
operand
corresponds to the nvvm.read.ptx.sreg.ctaid
of the destination CTA.
For more information, see PTX ISA
Traits: NVVMRequiresSMa<100,101>
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
group | ::mlir::NVVM::CTAGroupKindAttr | NVVM CTA group kind |
Operands: ¶
Operand | Description |
---|---|
addr | LLVM pointer type or LLVM pointer in address space 3 |
multicastMask | 16-bit signless integer |
nvvm.tcgen05.cp
(NVVM::Tcgen05CpOp) ¶
Tcgen05 copy operation
Syntax:
operation ::= `nvvm.tcgen05.cp` $taddr`,` $smem_desc attr-dict
Instruction tcgen05.cp initiates an asynchronous copy operation from
shared memory to the location specified by the address operand taddr
in the Tensor Memory. The 64-bit register operand smem_desc
specifies
the matrix descriptor representing the source matrix in the shared memory
that needs to be copied.
Example:
nvvm.tcgen05.cp %taddr, %smem_desc {
group = #nvvm.tcgen05_group<cta_2>,
shape = #nvvm.tcgen05_cp_shape<shape_64x128b>,
multicast = #nvvm.tcgen05_cp_multicast<warpx2_01_23>,
srcFormat = #nvvm.tcgen05_cp_src_fmt<b6x16_p32>
}
For more information, see PTX ISA
Traits: NVVMRequiresSMa<100,101>
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
shape | ::mlir::NVVM::Tcgen05CpShapeAttr | tcgen05 cp shapes |
group | ::mlir::NVVM::CTAGroupKindAttr | NVVM CTA group kind |
multicast | ::mlir::NVVM::Tcgen05CpMulticastAttr | tcgen05 cp multicast |
srcFormat | ::mlir::NVVM::Tcgen05CpSrcFormatAttr | tcgen05 cp source format |
Operands: ¶
Operand | Description |
---|---|
taddr | LLVM pointer in address space 6 |
smem_desc | 64-bit signless integer |
nvvm.tcgen05.dealloc
(NVVM::Tcgen05DeallocOp) ¶
Tcgen05 dealloc operation
Syntax:
operation ::= `nvvm.tcgen05.dealloc` $taddr `,` $nCols attr-dict `:` type(operands)
The tcgen05.dealloc
Op de-allocates the tensor core memory
specified by tmemAddr
, which must be from a previous tensor
memory allocation. The nCols
operand specifies the number
of columns to be de-allocated, and it must be a power-of-two.
For more information, see PTX ISA
Traits: NVVMRequiresSMa<100,101>
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
group | ::mlir::NVVM::CTAGroupKindAttr | NVVM CTA group kind |
Operands: ¶
Operand | Description |
---|---|
taddr | LLVM pointer in address space 6 |
nCols | 32-bit signless integer |
nvvm.tcgen05.fence
(NVVM::Tcgen05FenceOp) ¶
Tcgen05 fence operations
Syntax:
operation ::= `nvvm.tcgen05.fence` $kind attr-dict
The tcgen05.fence<before>
orders all prior async tcgen05 operations
with respect to the subsequent tcgen05 and execution ordering operations.
The tcgen05.fence<after>
orders all subsequent async tcgen05 operations
with respect to the prior tcgen05 and execution ordering operations.
For more information, see PTX ISA
Traits: NVVMRequiresSMa<100,101>
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
kind | ::mlir::NVVM::Tcgen05FenceKindAttr | NVVM Tcgen05 fence kind |
nvvm.tcgen05.ld
(NVVM::Tcgen05LdOp) ¶
Tensor memory load instructions
Syntax:
operation ::= `nvvm.tcgen05.ld` $tmemAddr (`,` $offset^)? (`pack` $pack^)? attr-dict `:` type($res)
Instruction tcgen05.ld
asynchronously loads data from the Tensor Memory at
the location specified by the 32-bit address operand tmemAddr
into the
destination register res
, collectively across all threads of the warps.
The shape
and the num
attribute together determines the total
dimension of the data which is loaded from the Tensor Memory. The shape
attribute indicates the base dimension of data to be accessed as described
in the Data Movement Shape. The num
attribute indicates the repeat
factor on the base dimension resulting in the total dimension of the data
that is accessed.
The shape 16x32bx2
performs two accesses into Tensor Memory of the shape
16x32b
. The base address of the first access is specified by tmemAddr
and the base address of the second access is specified by
tmemAddr + offset
, where offset
is an immediate argument.
The unit attribute pack
can be used to pack two 16-bit
elements from adjacent columns into a single 32-bit element during the load.
The following table describes the size of the vector for various combinations
of num
and shape
attributes:
|=====================================================================|
| num/shape | 16x32bx2/16x64b/32x32b | 16x128b | 16x256b |
|=====================================================================|
| x1 | 1 | 2 | 4 |
| x2 | 2 | 4 | 8 |
| x4 | 4 | 8 | 16 |
| x8 | 8 | 16 | 32 |
| x16 | 16 | 32 | 64 |
| x32 | 32 | 64 | 128 |
| x64 | 64 | 128 | NA |
| x128 | 128 | NA | NA |
|=====================================================================|
Example:
nvvm.tcgen05.ld %tmemAddr, %offset pack {
shape = #nvvm.tcgen05_ldst_shape<shape_16x32bx2>,
} : <2xi32>
For more information, see PTX ISA
Traits: NVVMRequiresSMa<100,101>
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
pack | ::mlir::UnitAttr | unit attribute |
shape | ::mlir::NVVM::Tcgen05LdStShapeAttr | allowed 32-bit signless integer cases: 0, 1, 2, 3, 4 |
Operands: ¶
Operand | Description |
---|---|
tmemAddr | LLVM pointer in address space 6 |
offset | 64-bit signless integer |
Results: ¶
Result | Description |
---|---|
res | 32-bit signless integer or vector of 32-bit signless integer values of length 2/4/8/16/32/64/128 |
nvvm.tcgen05.mma_smem_desc
(NVVM::Tcgen05MmaSmemDescOp) ¶
Constructs a Shared Memory descriptor for MMA Operands A or B
Syntax:
operation ::= `nvvm.tcgen05.mma_smem_desc` `(` operands `)` attr-dict `:` `(` type(operands) `)` `->` type($res)
The nvvm.tcgen05_mma_smem_desc
constructs a Shared Memory descriptor
for tcgen05.mma. This descriptor is a 64-bit value which describes the
properties of multiplicand matrix in shared memory including its location
in the shared memory of the current CTA.
+-----------+------+------------------------------------------------------+
| Bit-field | Size | Description |
+-----------+------+------------------------------------------------------+
| 0-13 | 14 | Matrix start address |
| 14-15 | 2 | Reserved |
| 16-29 | 14 | Leading dim relative-offset (or) absolute-address |
| 30-31 | 2 | Reserved |
| 32-45 | 14 | Stride dimension byte offset |
| 46-48 | 3 | Fixed constant value of 0b001 |
| 49-51 | 3 | Matrix base offset |
| 52 | 1 | Leading dimension stride mode: |
| | | 0: byte offset relative |
| | | 1: byte address absolute |
| 53-60 | 8 | Fixed constant value of 0xb00000000 |
| 61-63 | 3 | Swizzling mode: |
| | | 0: No swizzling |
| | | 1: 128-Byte with 32B atomic swizzling |
| | | 2: 128-Byte swizzling |
| | | 4: 64-Byte swizzling |
| | | 6: 32-Byte swizzling |
| | | (Values 3, 5 and 7 are invalid) |
+-----------+------+------------------------------------------------------+
Example:
%desc = nvvm.tcgen05.mma_smem_desc (%startAddr, %leadingDimOffset, %strideDimOffset,
%baseOffset, %leadingDimMode, %swizzleMode) : (i32, i32, i32, i8, i1, i8) -> i64
For more information, see PTX ISA
Operands: ¶
Operand | Description |
---|---|
startAddr | 32-bit signless integer |
leadingDimOffset | 32-bit signless integer |
strideDimOffset | 32-bit signless integer |
baseOffset | 8-bit signless integer |
leadingDimMode | 1-bit signless integer |
swizzleMode | 8-bit signless integer |
Results: ¶
Result | Description |
---|---|
res | 64-bit signless integer |
nvvm.tcgen05.relinquish_alloc_permit
(NVVM::Tcgen05RelinquishAllocPermitOp) ¶
Tcgen05 Op to relinquish the right to allocate
Syntax:
operation ::= `nvvm.tcgen05.relinquish_alloc_permit` attr-dict
The tcgen05.relinquish_alloc_permit
Op specifies that the CTA
of the executing thread is relinquishing the right to allocate
Tensor Memory. So, it is illegal for a CTA to perform tcgen05.alloc
after any of its constituent threads execute tcgen05.relinquish_alloc_permit
.
For more information, see PTX ISA
Traits: NVVMRequiresSMa<100,101>
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
group | ::mlir::NVVM::CTAGroupKindAttr | NVVM CTA group kind |
nvvm.tcgen05.shift
(NVVM::Tcgen05ShiftOp) ¶
Tcgen05 shift operation
Syntax:
operation ::= `nvvm.tcgen05.shift` $taddr attr-dict `:` type(operands)
The tcgen05.shift
is an asynchronous instruction which initiates
the shifting of 32-byte elements downwards across all the rows,
except the last, by one row. The operand taddr
specifies the base
address of the matrix in Tensor Memory whose rows must be down shifted.
For more information, see PTX ISA
Traits: NVVMRequiresSMa<100,101,103>
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
group | ::mlir::NVVM::CTAGroupKindAttr | NVVM CTA group kind |
Operands: ¶
Operand | Description |
---|---|
taddr | LLVM pointer in address space 6 |
nvvm.tcgen05.st
(NVVM::Tcgen05StOp) ¶
Tensor memory store instructions
Syntax:
operation ::= `nvvm.tcgen05.st` $tmemAddr `,` $val (`,` $offset^)? (`unpack` $unpack^)? attr-dict `:` type($val)
Instruction tcgen05.st
asynchronously stores data from the source register r
into the Tensor Memory at the location specified by the 32-bit address operand
tmemAddr
, collectively across all threads of the warps.
The shape
and the num
attribute together determines the total dimension of
the data which is stored to the Tensor Memory. The shape
indicates the base
dimension of data to be accessed. The num
attribute indicates the repeat
factor on the base dimension resulting in the total dimension of the data that
is accessed.
The shape 16x32bx2
performs two accesses into Tensor Memory of the shape
16x32b
. The base address of the first access is specified by tmemAddr
and the base address of the second access is specified by
tmemAddr + offset
, where offset
is an immediate argument.
The unit attribute unpack
can be used to unpack a 32-bit element
in the register into two 16-bit elements and store them in adjacent columns.
The following table describes the size of the vector for various combinations
of num
and shape
attributes:
|=====================================================================|
| num/shape | 16x32bx2/16x64b/32x32b | 16x128b | 16x256b |
|=====================================================================|
| x1 | 1 | 2 | 4 |
| x2 | 2 | 4 | 8 |
| x4 | 4 | 8 | 16 |
| x8 | 8 | 16 | 32 |
| x16 | 16 | 32 | 64 |
| x32 | 32 | 64 | 128 |
| x64 | 64 | 128 | NA |
| x128 | 128 | NA | NA |
|=====================================================================|
Example:
nvvm.tcgen05.st %tmemAddr, %val, %offset unpack {
shape = #nvvm.tcgen05_ldst_shape<shape_16x32bx2>,
} : <2xi32>
For more information, see PTX ISA
Traits: NVVMRequiresSMa<100,101>
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
unpack | ::mlir::UnitAttr | unit attribute |
shape | ::mlir::NVVM::Tcgen05LdStShapeAttr | allowed 32-bit signless integer cases: 0, 1, 2, 3, 4 |
Operands: ¶
Operand | Description |
---|---|
tmemAddr | LLVM pointer in address space 6 |
val | 32-bit signless integer or vector of 32-bit signless integer values of length 2/4/8/16/32/64/128 |
offset | 64-bit signless integer |
nvvm.tcgen05.wait
(NVVM::Tcgen05WaitOp) ¶
Tcgen05 wait operations
Syntax:
operation ::= `nvvm.tcgen05.wait` $kind attr-dict
The tcgen05.wait<load>
causes the executing thread to block until
all prior tcgen05.ld
operations issued by the executing thread
have completed. Similarly, the tcgen05.wait<store>
causes the executing
thread to block until all prior tcgen05.st
operations issued by the
executing thread have completed.
For more information, see PTX ISA
Traits: NVVMRequiresSMa<100,101>
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
kind | ::mlir::NVVM::Tcgen05WaitKindAttr | NVVM Tcgen05 wait kind |
nvvm.vote.sync
(NVVM::VoteSyncOp) ¶
Vote across thread group
Syntax:
operation ::= `nvvm.vote.sync` $kind $mask `,` $pred attr-dict `->` type($res)
The vote.sync
op will cause executing thread to wait until all non-exited
threads corresponding to membermask have executed vote.sync
with the same
qualifiers and same membermask value before resuming execution.
The vote operation kinds are:
any
: True if source predicate is True for some thread in membermask.all
: True if source predicate is True for all non-exited threads in membermask.uni
: True if source predicate has the same value in all non-exited threads in membermask.ballot
: In the ballot form, the destination result is a 32 bit integer. In this form, the predicate from each thread in membermask are copied into the corresponding bit position of the result, where the bit position corresponds to the thread’s lane id.
For more information, see PTX ISA
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
kind | ::mlir::NVVM::VoteSyncKindAttr | NVVM vote sync kind |
Operands: ¶
Operand | Description |
---|---|
mask | 32-bit signless integer |
pred | 1-bit signless integer |
Results: ¶
Result | Description |
---|---|
res | 32-bit signless integer or 1-bit signless integer |
nvvm.wgmma.commit.group.sync.aligned
(NVVM::WgmmaGroupSyncAlignedOp) ¶
Syntax:
operation ::= `nvvm.wgmma.commit.group.sync.aligned` attr-dict
Commits all prior uncommitted warpgroup level matrix multiplication operations.
For more information, see PTX ISA
Traits: NVVMRequiresSMa<90>
nvvm.wgmma.fence.aligned
(NVVM::WgmmaFenceAlignedOp) ¶
Syntax:
operation ::= `nvvm.wgmma.fence.aligned` attr-dict
Enforce an ordering of register accesses between warpgroup level matrix multiplication and other operations.
For more information, see PTX ISA
Traits: NVVMRequiresSMa<90>
nvvm.wgmma.mma_async
(NVVM::WgmmaMmaAsyncOp) ¶
Syntax:
operation ::= `nvvm.wgmma.mma_async` $descriptorA `,` $descriptorB `,` $inouts `,` $shape `,`
`D` `[` $typeD `,` $scaleD (`,` $satfinite^)? `]` `,`
`A` `[` $typeA `,` $scaleA `,` $layoutA `]` `,`
`B` `[` $typeB `,` $scaleB `,` $layoutB `]`
attr-dict `:`
type($inouts) `->` type($results)
The warpgroup (128 threads) level matrix multiply and accumulate operation has either of the following forms, where matrix D is called accumulator: D = A * B + D D = A * B, where the input from accumulator D is disabled.
Supported shapes:
|--------------|--------------|------------|--------------|---------------|
| | | | |f16+=e4m3*e4m3 |
| | | | |f16+=e5m2*e5m2 |
|f32+=tf32*tf32|f16+=f16 *f16 | s32+=s8*s8 |s32 += b1 * b1|f16+=e5m2*e4m3 |
| |f32+=f16 *f16 | s32+=u8*u8 | |f16+=e4m3*e5m2 |
| |f32+=bf16*bf16| s32+=u8*u8 | |f16+=e4m3*e5m2 |
| |f32+=bf16*bf16| s32+=s8*u8 | |f32+=e4m3*e4m3 |
| | | s32+=u8*s8 | |f32+=e5m2*e5m2 |
| | | | |f32+=e4m3*e5m2 |
| | | | |f32+=e4m3*e5m2 |
|--------------|--------------|------------|--------------|---------------|
| .m64n8k8 | .m64n8k16 | .m64n8k32 | .m64n8k256 | .m64n8k32 |
| .m64n16k8 | .m64n16k16 | .m64n16k32 | .m64n16k256 | .m64n16k32 |
| .m64n24k8 | .m64n24k16 | .m64n24k32 | .m64n24k256 | .m64n24k32 |
| .m64n32k8 | .m64n32k16 | .m64n32k32 | .m64n32k256 | .m64n32k32 |
| .m64n40k8 | .m64n40k16 | .m64n48k32 | .m64n48k256 | .m64n40k32 |
| .m64n48k8 | .m64n48k16 | .m64n64k32 | .m64n64k256 | .m64n48k32 |
| .m64n56k8 | .m64n56k16 | .m64n80k32 | .m64n80k256 | .m64n56k32 |
| .m64n64k8 | .m64n64k16 | .m64n96k32 | .m64n96k256 | .m64n64k32 |
| .m64n72k8 | .m64n72k16 | .m64n112k32| .m64n112k256 | .m64n72k32 |
| .m64n80k8 | .m64n80k16 | .m64n128k32| .m64n128k256 | .m64n80k32 |
| .m64n88k8 | .m64n88k16 | .m64n144k32| .m64n144k256 | .m64n88k32 |
| .m64n96k8 | .m64n96k16 | .m64n160k32| .m64n160k256 | .m64n96k32 |
| .m64n104k8 | .m64n104k16 | .m64n176k32| .m64n176k256 | .m64n104k32 |
| .m64n112k8 | .m64n112k16 | .m64n192k32| .m64n192k256 | .m64n112k32 |
| .m64n120k8 | .m64n120k16 | .m64n208k32| .m64n208k256 | .m64n120k32 |
| .m64n128k8 | .m64n128k16 | .m64n224k32| .m64n224k256 | .m64n128k32 |
| .m64n136k8 | .m64n136k16 | .m64n240k32| .m64n240k256 | .m64n136k32 |
| .m64n144k8 | .m64n144k16 | .m64n256k32| .m64n256k256 | .m64n144k32 |
| .m64n152k8 | .m64n152k16 | | | .m64n152k32 |
| .m64n160k8 | .m64n160k16 | | | .m64n160k32 |
| .m64n168k8 | .m64n168k16 | | | .m64n168k32 |
| .m64n176k8 | .m64n176k16 | | | .m64n176k32 |
| .m64n184k8 | .m64n184k16 | | | .m64n184k32 |
| .m64n192k8 | .m64n192k16 | | | .m64n192k32 |
| .m64n200k8 | .m64n200k16 | | | .m64n200k32 |
| .m64n208k8 | .m64n208k16 | | | .m64n208k32 |
| .m64n216k8 | .m64n216k16 | | | .m64n216k32 |
| .m64n224k8 | .m64n224k16 | | | .m64n224k32 |
| .m64n232k8 | .m64n232k16 | | | .m64n232k32 |
| .m64n240k8 | .m64n240k16 | | | .m64n240k32 |
| .m64n248k8 | .m64n248k16 | | | .m64n248k32 |
| .m64n256k8 | .m64n256k16 | | | .m64n256k32 |
|--------------|--------------|------------|--------------|---------------|
For more information, see PTX ISA
Interfaces: BasicPtxBuilderInterface
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
shape | ::mlir::NVVM::MMAShapeAttr | Attribute for MMA operation shape. |
typeA | ::mlir::NVVM::WGMMATypesAttr | NVVM WGMMA types |
typeB | ::mlir::NVVM::WGMMATypesAttr | NVVM WGMMA types |
typeD | ::mlir::NVVM::WGMMATypesAttr | NVVM WGMMA types |
scaleD | ::mlir::NVVM::WGMMAScaleOutAttr | WGMMA input predicate |
scaleA | ::mlir::NVVM::WGMMAScaleInAttr | WGMMA overflow options |
scaleB | ::mlir::NVVM::WGMMAScaleInAttr | WGMMA overflow options |
layoutA | ::mlir::NVVM::MMALayoutAttr | NVVM MMA layout |
layoutB | ::mlir::NVVM::MMALayoutAttr | NVVM MMA layout |
satfinite | ::mlir::NVVM::MMAIntOverflowAttr | MMA overflow options |
Operands: ¶
Operand | Description |
---|---|
inouts | LLVM structure type |
descriptorA | 64-bit signless integer |
descriptorB | 64-bit signless integer |
Results: ¶
Result | Description |
---|---|
results | LLVM structure type |
nvvm.wgmma.wait.group.sync.aligned
(NVVM::WgmmaWaitGroupSyncOp) ¶
Syntax:
operation ::= `nvvm.wgmma.wait.group.sync.aligned` attr-dict $group
Signal the completion of a preceding warpgroup operation.
For more information, see PTX ISA
Traits: NVVMRequiresSMa<90>
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
group | ::mlir::IntegerAttr | 64-bit signless integer attribute |
nvvm.wmma.load
(NVVM::WMMALoadOp) ¶
Warp synchronous matrix load
Syntax:
operation ::= `nvvm.wmma.load` $ptr `,` $stride attr-dict `:` functional-type($ptr, $res)
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
m | ::mlir::IntegerAttr | 32-bit signless integer attribute |
n | ::mlir::IntegerAttr | 32-bit signless integer attribute |
k | ::mlir::IntegerAttr | 32-bit signless integer attribute |
layout | ::mlir::NVVM::MMALayoutAttr | NVVM MMA layout |
eltype | ::mlir::NVVM::MMATypesAttr | NVVM MMA types |
frag | ::mlir::NVVM::MMAFragAttr | NVVM MMA frag type |
Operands: ¶
Operand | Description |
---|---|
ptr | LLVM pointer type |
stride | 32-bit signless integer |
Results: ¶
Result | Description |
---|---|
res | LLVM structure type |
nvvm.wmma.mma
(NVVM::WMMAMmaOp) ¶
Warp synchronous matrix-multiply accumulate using tensor cores.
Syntax:
operation ::= `nvvm.wmma.mma` $args attr-dict `:` functional-type($args, $res)
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
m | ::mlir::IntegerAttr | 32-bit signless integer attribute |
n | ::mlir::IntegerAttr | 32-bit signless integer attribute |
k | ::mlir::IntegerAttr | 32-bit signless integer attribute |
layoutA | ::mlir::NVVM::MMALayoutAttr | NVVM MMA layout |
layoutB | ::mlir::NVVM::MMALayoutAttr | NVVM MMA layout |
eltypeA | ::mlir::NVVM::MMATypesAttr | NVVM MMA types |
eltypeB | ::mlir::NVVM::MMATypesAttr | NVVM MMA types |
Operands: ¶
Operand | Description |
---|---|
args | variadic of LLVM dialect-compatible type |
Results: ¶
Result | Description |
---|---|
res | LLVM structure type |
nvvm.wmma.store
(NVVM::WMMAStoreOp) ¶
Warp synchronous matrix store
Syntax:
operation ::= `nvvm.wmma.store` $ptr `,` $stride `,` $args attr-dict `:` qualified(type($ptr)) `,`
type($args)
Attributes: ¶
Attribute | MLIR Type | Description |
---|---|---|
m | ::mlir::IntegerAttr | 32-bit signless integer attribute |
n | ::mlir::IntegerAttr | 32-bit signless integer attribute |
k | ::mlir::IntegerAttr | 32-bit signless integer attribute |
layout | ::mlir::NVVM::MMALayoutAttr | NVVM MMA layout |
eltype | ::mlir::NVVM::MMATypesAttr | NVVM MMA types |
Operands: ¶
Operand | Description |
---|---|
ptr | LLVM pointer type |
args | variadic of LLVM dialect-compatible type |
stride | 32-bit signless integer |
Attributes ¶
CTAGroupKindAttr ¶
NVVM CTA group kind
Syntax:
#nvvm.cta_group<
::mlir::NVVM::CTAGroupKind # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::CTAGroupKind | an enum of type CTAGroupKind |
CacheEvictionPriorityAttr ¶
NVVM Cache Eviction Priority
Syntax:
#nvvm.cache_eviction_priority<
::mlir::NVVM::CacheEvictionPriority # value
>
Enum cases:
- evict_normal (
EvictNormal
) - evict_first (
EvictFirst
) - evict_last (
EvictLast
) - evict_unchanged (
EvictUnchanged
) - no_allocate (
NoAllocate
)
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::CacheEvictionPriority | an enum of type CacheEvictionPriority |
ConvertFP6TypeAttr ¶
NVVM ConvertFP6Type kind
Syntax:
#nvvm.convert_fp6_type<
::mlir::NVVM::ConvertFP6Type # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::ConvertFP6Type | an enum of type ConvertFP6Type |
ConvertFP8TypeAttr ¶
NVVM ConvertFP8Type kind
Syntax:
#nvvm.convert_fp8_type<
::mlir::NVVM::ConvertFP8Type # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::ConvertFP8Type | an enum of type ConvertFP8Type |
DotAccumulateTypeAttr ¶
NVVM DotAccumulateType
Syntax:
#nvvm.dot_accumulate_type<
::mlir::NVVM::DotAccumulateType # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::DotAccumulateType | an enum of type DotAccumulateType |
FPRoundingModeAttr ¶
NVVM FPRoundingMode kind
Syntax:
#nvvm.fp_rnd_mode<
::mlir::NVVM::FPRoundingMode # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::FPRoundingMode | an enum of type FPRoundingMode |
GridDepActionKindAttr ¶
Action kind for grid dependency control
Syntax:
#nvvm.grid_dep_action<
::mlir::NVVM::GridDepActionKind # value
>
Enum cases:
- wait (
wait
) - launch_dependents (
launch_dependents
)
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::GridDepActionKind | an enum of type GridDepActionKind |
LdStMatrixEltTypeAttr ¶
Element type for ldmatrix and stmatrix
Syntax:
#nvvm.ld_st_matrix_elt_type<
::mlir::NVVM::LdStMatrixEltType # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::LdStMatrixEltType | an enum of type LdStMatrixEltType |
LdStMatrixShapeAttr ¶
Matrix shape for ldmatrix and stmatrix
Syntax:
#nvvm.ld_st_matrix_shape<
int, # m
int # n
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
m | int | |
n | int |
LoadCacheModifierKindAttr ¶
NVVM load cache modifier kind
Syntax:
#nvvm.load_cache_modifier<
::mlir::NVVM::LoadCacheModifierKind # value
>
Enum attribute of the different kinds of cache operators for load instructions.
For more information, see PTX ISA
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::LoadCacheModifierKind | an enum of type LoadCacheModifierKind |
MMAB1OpAttr ¶
MMA binary operations
Syntax:
#nvvm.mma_b1op<
::mlir::NVVM::MMAB1Op # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::MMAB1Op | an enum of type MMAB1Op |
MMAFragAttr ¶
NVVM MMA frag type
Syntax:
#nvvm.mma_frag<
::mlir::NVVM::MMAFrag # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::MMAFrag | an enum of type MMAFrag |
MMAIntOverflowAttr ¶
MMA overflow options
Syntax:
#nvvm.mma_int_overflow<
::mlir::NVVM::MMAIntOverflow # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::MMAIntOverflow | an enum of type MMAIntOverflow |
MMALayoutAttr ¶
NVVM MMA layout
Syntax:
#nvvm.mma_layout<
::mlir::NVVM::MMALayout # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::MMALayout | an enum of type MMALayout |
MMATypesAttr ¶
NVVM MMA types
Syntax:
#nvvm.mma_type<
::mlir::NVVM::MMATypes # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::MMATypes | an enum of type MMATypes |
MatchSyncKindAttr ¶
NVVM match sync kind
Syntax:
#nvvm.match_sync_kind<
::mlir::NVVM::MatchSyncKind # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::MatchSyncKind | an enum of type MatchSyncKind |
MemScopeKindAttr ¶
NVVM Memory Scope kind
Syntax:
#nvvm.mem_scope<
::mlir::NVVM::MemScopeKind # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::MemScopeKind | an enum of type MemScopeKind |
MMAShapeAttr ¶
Attribute for MMA operation shape.
Syntax:
#nvvm.shape<
int, # m
int, # n
int # k
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
m | int | |
n | int | |
k | int |
NVVMTargetAttr ¶
Syntax:
#nvvm.target<
int, # O
::llvm::StringRef, # triple
::llvm::StringRef, # chip
::llvm::StringRef, # features
DictionaryAttr, # flags
ArrayAttr, # link
bool # verifyTarget
>
GPU target attribute for controlling compilation of NVIDIA targets. All parameters decay into default values if not present.
Examples:
- Target with default values.
gpu.module @mymodule [#nvvm.target] attributes {...} {
...
}
- Target with
sm_90
chip and fast math.
gpu.module @mymodule [#nvvm.target<chip = "sm_90", flags = {fast}>] {
...
}
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
O | int | Optimization level to apply. |
triple | ::llvm::StringRef | Target triple. |
chip | ::llvm::StringRef | Target chip. |
features | ::llvm::StringRef | Target chip features. |
flags | DictionaryAttr | Target specific flags. |
link | ArrayAttr | Files to link to the LLVM module. |
verifyTarget | bool | Perform SM version check on Ops. |
PrefetchCacheLevelAttr ¶
NVVM Prefetch Cache Level
Syntax:
#nvvm.prefetch_cache_level<
::mlir::NVVM::PrefetchCacheLevel # value
>
Enum cases:
- L1 (
L1
) - L2 (
L2
)
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::PrefetchCacheLevel | an enum of type PrefetchCacheLevel |
ProxyKindAttr ¶
Proxy kind
Syntax:
#nvvm.proxy_kind<
::mlir::NVVM::ProxyKind # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::ProxyKind | an enum of type ProxyKind |
ReduxKindAttr ¶
NVVM redux kind
Syntax:
#nvvm.redux_kind<
::mlir::NVVM::ReduxKind # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::ReduxKind | an enum of type ReduxKind |
SaturationModeAttr ¶
NVVM SaturationMode kind
Syntax:
#nvvm.sat_mode<
::mlir::NVVM::SaturationMode # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::SaturationMode | an enum of type SaturationMode |
SetMaxRegisterActionAttr ¶
NVVM set max register action
Syntax:
#nvvm.action<
::mlir::NVVM::SetMaxRegisterAction # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::SetMaxRegisterAction | an enum of type SetMaxRegisterAction |
SharedSpaceAttr ¶
Shared memory space
Syntax:
#nvvm.shared_space<
::mlir::NVVM::SharedSpace # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::SharedSpace | an enum of type SharedSpace |
ShflKindAttr ¶
NVVM shuffle kind
Syntax:
#nvvm.shfl_kind<
::mlir::NVVM::ShflKind # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::ShflKind | an enum of type ShflKind |
TMALoadModeAttr ¶
List of Load-Modes supported for TMA Tensor Ops
Syntax:
#nvvm.tma_load_mode<
::mlir::NVVM::TMALoadMode # value
>
TMA Tensor Ops support the following modes, when copying data from global memory to shared memory (i.e. load):
Tile Mode: It’s the default mode. The source multi-dimensional tensor layout is preserved at the destination. For more information, see PTX ISA
Im2col Mode: This mode is used when im2colOffsets
operands are present.
The elements in the Bounding Box of the source tensor are rearranged into
columns at the destination. In this mode, the tensor has to be at least
3-dimensional. The number of im2colOffsets
is dims - 2
where dims
is the dimension of the tensor.
For more information, see PTX ISA
Im2col_W Mode: This mode is similar to Im2Col mode with the restriction that
elements are accessed across the W dimension only. The number of im2colOffsets
are always two, referred as wHalo
and wOffset
.
For more information, see PTX ISA
Im2col_W_128 Mode: This mode is similar to Im2Col_W mode with the number of elements accessed across the W dimension is always 128 only. For more information, see PTX ISA
Tile_Gather4 Mode: This mode is similar to Tile mode but works only on 2D tensor. In gather4 mode, four rows in the source 2D tensor are combined to form a single 2D tensor at the destination. This mode requires five co-ordinates. The first one represents the column-index followed by four row indices. For more information, see PTX ISA
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::TMALoadMode | an enum of type TMALoadMode |
TMAReduxKindAttr ¶
NVVM TMA redux kind
Syntax:
#nvvm.tma_redux_kind<
::mlir::NVVM::TMAReduxKind # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::TMAReduxKind | an enum of type TMAReduxKind |
TMAStoreModeAttr ¶
NVVM TMA Store Mode
Syntax:
#nvvm.tma_store_mode<
::mlir::NVVM::TMAStoreMode # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::TMAStoreMode | an enum of type TMAStoreMode |
Tcgen05CpMulticastAttr ¶
Tcgen05 cp multicast
Syntax:
#nvvm.tcgen05_cp_multicast<
::mlir::NVVM::Tcgen05CpMulticast # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::Tcgen05CpMulticast | an enum of type Tcgen05CpMulticast |
Tcgen05CpShapeAttr ¶
Tcgen05 cp shapes
Syntax:
#nvvm.tcgen05_cp_shape<
::mlir::NVVM::Tcgen05CpShape # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::Tcgen05CpShape | an enum of type Tcgen05CpShape |
Tcgen05CpSrcFormatAttr ¶
Tcgen05 cp source format
Syntax:
#nvvm.tcgen05_cp_src_fmt<
::mlir::NVVM::Tcgen05CpSrcFormat # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::Tcgen05CpSrcFormat | an enum of type Tcgen05CpSrcFormat |
Tcgen05FenceKindAttr ¶
NVVM Tcgen05 fence kind
Syntax:
#nvvm.tcgen05_fence<
::mlir::NVVM::Tcgen05FenceKind # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::Tcgen05FenceKind | an enum of type Tcgen05FenceKind |
Tcgen05LdStShapeAttr ¶
Allowed 32-bit signless integer cases: 0, 1, 2, 3, 4
Syntax:
#nvvm.tcgen05_ldst_shape<
::mlir::NVVM::Tcgen05LdStShape # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::Tcgen05LdStShape | an enum of type Tcgen05LdStShape |
Tcgen05WaitKindAttr ¶
NVVM Tcgen05 wait kind
Syntax:
#nvvm.tcgen05_wait<
::mlir::NVVM::Tcgen05WaitKind # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::Tcgen05WaitKind | an enum of type Tcgen05WaitKind |
VoteSyncKindAttr ¶
NVVM vote sync kind
Syntax:
#nvvm.vote_sync_kind<
::mlir::NVVM::VoteSyncKind # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::VoteSyncKind | an enum of type VoteSyncKind |
WGMMAScaleInAttr ¶
WGMMA overflow options
Syntax:
#nvvm.wgmma_scale_in<
::mlir::NVVM::WGMMAScaleIn # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::WGMMAScaleIn | an enum of type WGMMAScaleIn |
WGMMAScaleOutAttr ¶
WGMMA input predicate
Syntax:
#nvvm.wgmma_scale_out<
::mlir::NVVM::WGMMAScaleOut # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::WGMMAScaleOut | an enum of type WGMMAScaleOut |
WGMMATypesAttr ¶
NVVM WGMMA types
Syntax:
#nvvm.wgmma_type<
::mlir::NVVM::WGMMATypes # value
>
Parameters: ¶
Parameter | C++ type | Description |
---|---|---|
value | ::mlir::NVVM::WGMMATypes | an enum of type WGMMATypes |
Enums ¶
CTAGroupKind ¶
NVVM CTA group kind
Cases: ¶
Symbol | Value | String |
---|---|---|
CTA_1 | 0 | cta_1 |
CTA_2 | 1 | cta_2 |
CacheEvictionPriority ¶
NVVM Cache Eviction Priority
Cases: ¶
Symbol | Value | String |
---|---|---|
EvictNormal | 0 | evict_normal |
EvictFirst | 1 | evict_first |
EvictLast | 2 | evict_last |
EvictUnchanged | 3 | evict_unchanged |
NoAllocate | 4 | no_allocate |
ConvertFP6Type ¶
NVVM ConvertFP6Type kind
Cases: ¶
Symbol | Value | String |
---|---|---|
E2M3 | 0 | e2m3 |
E3M2 | 1 | e3m2 |
ConvertFP8Type ¶
NVVM ConvertFP8Type kind
Cases: ¶
Symbol | Value | String |
---|---|---|
E4M3 | 0 | e4m3 |
E5M2 | 1 | e5m2 |
UE8M0 | 2 | ue8m0 |
DotAccumulateType ¶
NVVM DotAccumulateType
Cases: ¶
Symbol | Value | String |
---|---|---|
SIGNED | 1 | signed |
UNSIGNED | 0 | unsigned |
FPRoundingMode ¶
NVVM FPRoundingMode kind
Cases: ¶
Symbol | Value | String |
---|---|---|
NONE | 0 | none |
RN | 1 | rn |
RM | 2 | rm |
RP | 3 | rp |
RZ | 4 | rz |
RNA | 5 | rna |
GridDepActionKind ¶
Action kind for grid dependency control
Cases: ¶
Symbol | Value | String |
---|---|---|
wait | 0 | wait |
launch_dependents | 1 | launch_dependents |
LdStMatrixEltType ¶
Element type for ldmatrix and stmatrix
Cases: ¶
Symbol | Value | String |
---|---|---|
B16 | 0 | b16 |
B8 | 1 | b8 |
B8X16_B6X16_P32 | 2 | b8x16.b6x16_p32 |
B8X16_B4X16_P64 | 3 | b8x16.b4x16_p64 |
LoadCacheModifierKind ¶
NVVM load cache modifier kind
Cases: ¶
Symbol | Value | String |
---|---|---|
CA | 0 | ca |
CG | 1 | cg |
CS | 2 | cs |
LU | 3 | lu |
CV | 4 | cv |
MMAB1Op ¶
MMA binary operations
Cases: ¶
Symbol | Value | String |
---|---|---|
none | 0 | none |
xor_popc | 1 | xor_popc |
and_popc | 2 | and_popc |
MMAFrag ¶
NVVM MMA frag type
Cases: ¶
Symbol | Value | String |
---|---|---|
a | 0 | a |
b | 1 | b |
c | 2 | c |
MMAIntOverflow ¶
MMA overflow options
Cases: ¶
Symbol | Value | String |
---|---|---|
satfinite | 1 | satfinite |
wrapped | 0 | wrapped |
MMALayout ¶
NVVM MMA layout
Cases: ¶
Symbol | Value | String |
---|---|---|
row | 0 | row |
col | 1 | col |
MMATypes ¶
NVVM MMA types
Cases: ¶
Symbol | Value | String |
---|---|---|
f16 | 0 | f16 |
f32 | 1 | f32 |
tf32 | 2 | tf32 |
bf16 | 9 | bf16 |
s8 | 4 | s8 |
u8 | 3 | u8 |
s32 | 5 | s32 |
s4 | 8 | s4 |
u4 | 7 | u4 |
b1 | 6 | b1 |
f64 | 10 | f64 |
MatchSyncKind ¶
NVVM match sync kind
Cases: ¶
Symbol | Value | String |
---|---|---|
any | 0 | any |
all | 1 | all |
MemScopeKind ¶
NVVM Memory Scope kind
Cases: ¶
Symbol | Value | String |
---|---|---|
CTA | 0 | cta |
CLUSTER | 1 | cluster |
GPU | 2 | gpu |
SYS | 3 | sys |
PrefetchCacheLevel ¶
NVVM Prefetch Cache Level
Cases: ¶
Symbol | Value | String |
---|---|---|
L1 | 0 | L1 |
L2 | 1 | L2 |
ProxyKind ¶
Proxy kind
Cases: ¶
Symbol | Value | String |
---|---|---|
alias | 0 | alias |
async | 1 | async |
async_global | 2 | async.global |
async_shared | 3 | async.shared |
TENSORMAP | 4 | tensormap |
GENERIC | 5 | generic |
ReduxKind ¶
NVVM redux kind
Cases: ¶
Symbol | Value | String |
---|---|---|
ADD | 1 | add |
AND | 2 | and |
MAX | 3 | max |
MIN | 4 | min |
OR | 5 | or |
UMAX | 6 | umax |
UMIN | 7 | umin |
XOR | 8 | xor |
FMIN | 9 | fmin |
FMAX | 10 | fmax |
SaturationMode ¶
NVVM SaturationMode kind
Cases: ¶
Symbol | Value | String |
---|---|---|
NONE | 0 | none |
SATFINITE | 1 | satfinite |
SetMaxRegisterAction ¶
NVVM set max register action
Cases: ¶
Symbol | Value | String |
---|---|---|
decrease | 1 | decrease |
increase | 0 | increase |
SharedSpace ¶
Shared memory space
Cases: ¶
Symbol | Value | String |
---|---|---|
shared_cta | 0 | cta |
shared_cluster | 1 | cluster |
ShflKind ¶
NVVM shuffle kind
Cases: ¶
Symbol | Value | String |
---|---|---|
bfly | 0 | bfly |
up | 1 | up |
down | 2 | down |
idx | 3 | idx |
TMALoadMode ¶
NVVM TMA Load Mode
Cases: ¶
Symbol | Value | String |
---|---|---|
TILE | 0 | tile |
IM2COL | 1 | im2col |
IM2COL_W | 2 | im2col_w |
IM2COL_W_128 | 3 | im2col_w_128 |
TILE_GATHER4 | 4 | tile_gather4 |
TMAReduxKind ¶
NVVM TMA redux kind
Cases: ¶
Symbol | Value | String |
---|---|---|
ADD | 0 | add |
MAX | 2 | max |
MIN | 1 | min |
INC | 3 | inc |
DEC | 4 | dec |
AND | 5 | and |
OR | 6 | or |
XOR | 7 | xor |
TMAStoreMode ¶
NVVM TMA Store Mode
Cases: ¶
Symbol | Value | String |
---|---|---|
TILE | 0 | tile |
IM2COL | 1 | im2col |
TILE_SCATTER4 | 2 | tile_scatter4 |
Tcgen05CpMulticast ¶
Tcgen05 cp multicast
Cases: ¶
Symbol | Value | String |
---|---|---|
NONE | 0 | none |
WARPX2_02_13 | 1 | warpx2_02_13 |
WARPX2_01_23 | 2 | warpx2_01_23 |
WARPX4 | 3 | warpx4 |
Tcgen05CpShape ¶
Tcgen05 cp shapes
Cases: ¶
Symbol | Value | String |
---|---|---|
SHAPE_128x256b | 0 | shape_128x256b |
SHAPE_4x256b | 1 | shape_4x256b |
SHAPE_128x128b | 2 | shape_128x128b |
SHAPE_64x128b | 3 | shape_64x128b |
SHAPE_32x128b | 4 | shape_32x128b |
Tcgen05CpSrcFormat ¶
Tcgen05 cp source format
Cases: ¶
Symbol | Value | String |
---|---|---|
B6x16_P32 | 0 | b6x16_p32 |
B4x16_P64 | 1 | b4x16_p64 |
Tcgen05FenceKind ¶
NVVM Tcgen05 fence kind
Cases: ¶
Symbol | Value | String |
---|---|---|
BEFORE_THREAD_SYNC | 0 | before |
AFTER_THREAD_SYNC | 1 | after |
Tcgen05LdStShape ¶
Allowed 32-bit signless integer cases: 0, 1, 2, 3, 4
Cases: ¶
Symbol | Value | String |
---|---|---|
SHAPE_16X64B | 0 | shape_16x64b |
SHAPE_16X128B | 1 | shape_16x128b |
SHAPE_16X256B | 2 | shape_16x256b |
SHAPE_32X32B | 3 | shape_32x32b |
SHAPE_16X32BX2 | 4 | shape_16x32bx2 |
Tcgen05WaitKind ¶
NVVM Tcgen05 wait kind
Cases: ¶
Symbol | Value | String |
---|---|---|
LOAD | 0 | load |
STORE | 1 | store |
VoteSyncKind ¶
NVVM vote sync kind
Cases: ¶
Symbol | Value | String |
---|---|---|
any | 0 | any |
all | 1 | all |
ballot | 2 | ballot |
uni | 3 | uni |
WGMMAScaleIn ¶
WGMMA overflow options
Cases: ¶
Symbol | Value | String |
---|---|---|
one | 1 | one |
neg | -1 | neg |
WGMMAScaleOut ¶
WGMMA input predicate
Cases: ¶
Symbol | Value | String |
---|---|---|
zero | 0 | zero |
one | 1 | one |
WGMMATypes ¶
NVVM WGMMA types
Cases: ¶
Symbol | Value | String |
---|---|---|
f16 | 0 | f16 |
tf32 | 1 | tf32 |
u8 | 2 | u8 |
s8 | 3 | s8 |
b1 | 4 | b1 |
bf16 | 5 | bf16 |
e4m3 | 6 | e4m3 |
e5m2 | 7 | e5m2 |
f32 | 8 | f32 |
s32 | 9 | s32 |