MLIR

Multi-Level IR Compiler Framework

'nvvm' Dialect

Attribute constraint definition 

MMA binary operations 

NVVM MMA frag type 

MMA overflow options 

NVVM MMA layout 

NVVM MMA types 

NVVM shuffle kind 

Attribute definition 

MMAB1OpAttr 

MMA binary operations

Syntax:

!nvvm.mma_b1op<
  ::mlir::NVVM::MMAB1Op   # value
>

Parameters: 

ParameterC++ typeDescription
value::mlir::NVVM::MMAB1Opan enum of type MMAB1Op

MMAFragAttr 

NVVM MMA frag type

Syntax:

!nvvm.mma_frag<
  ::mlir::NVVM::MMAFrag   # value
>

Parameters: 

ParameterC++ typeDescription
value::mlir::NVVM::MMAFragan enum of type MMAFrag

MMAIntOverflowAttr 

MMA overflow options

Syntax:

!nvvm.mma_int_overflow<
  ::mlir::NVVM::MMAIntOverflow   # value
>

Parameters: 

ParameterC++ typeDescription
value::mlir::NVVM::MMAIntOverflowan enum of type MMAIntOverflow

MMALayoutAttr 

NVVM MMA layout

Syntax:

!nvvm.mma_layout<
  ::mlir::NVVM::MMALayout   # value
>

Parameters: 

ParameterC++ typeDescription
value::mlir::NVVM::MMALayoutan enum of type MMALayout

MMATypesAttr 

NVVM MMA types

Syntax:

!nvvm.mma_type<
  ::mlir::NVVM::MMATypes   # value
>

Parameters: 

ParameterC++ typeDescription
value::mlir::NVVM::MMATypesan enum of type MMATypes

ShflKindAttr 

NVVM shuffle kind

Syntax:

!nvvm.shfl_kind<
  ::mlir::NVVM::ShflKind   # value
>

Parameters: 

ParameterC++ typeDescription
value::mlir::NVVM::ShflKindan enum of type ShflKind

Operation definition 

nvvm.barrier0 (::mlir::NVVM::Barrier0Op) 

Syntax:

operation ::= `nvvm.barrier0` attr-dict

nvvm.read.ptx.sreg.ntid.x (::mlir::NVVM::BlockDimXOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.ntid.x` attr-dict `:` type($res)

Interfaces: NoSideEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.ntid.y (::mlir::NVVM::BlockDimYOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.ntid.y` attr-dict `:` type($res)

Interfaces: NoSideEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.ntid.z (::mlir::NVVM::BlockDimZOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.ntid.z` attr-dict `:` type($res)

Interfaces: NoSideEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.ctaid.x (::mlir::NVVM::BlockIdXOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.ctaid.x` attr-dict `:` type($res)

Interfaces: NoSideEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.ctaid.y (::mlir::NVVM::BlockIdYOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.ctaid.y` attr-dict `:` type($res)

Interfaces: NoSideEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.ctaid.z (::mlir::NVVM::BlockIdZOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.ctaid.z` attr-dict `:` type($res)

Interfaces: NoSideEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.cp.async.commit.group (::mlir::NVVM::CpAsyncCommitGroupOp) 

Syntax:

operation ::= `nvvm.cp.async.commit.group` attr-dict

nvvm.cp.async.shared.global (::mlir::NVVM::CpAsyncOp) 

Syntax:

operation ::= `nvvm.cp.async.shared.global` $dst `,` $src `,` $size attr-dict

Attributes: 

AttributeMLIR TypeDescription
size::mlir::IntegerAttr32-bit signless integer attribute
bypass_l1::mlir::UnitAttrunit attribute

Operands: 

OperandDescription
dstLLVM pointer to 8-bit signless integer
srcLLVM pointer to 8-bit signless integer

nvvm.cp.async.wait.group (::mlir::NVVM::CpAsyncWaitGroupOp) 

Syntax:

operation ::= `nvvm.cp.async.wait.group` $n attr-dict

Attributes: 

AttributeMLIR TypeDescription
n::mlir::IntegerAttr32-bit signless integer attribute

nvvm.read.ptx.sreg.nctaid.x (::mlir::NVVM::GridDimXOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.nctaid.x` attr-dict `:` type($res)

Interfaces: NoSideEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.nctaid.y (::mlir::NVVM::GridDimYOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.nctaid.y` attr-dict `:` type($res)

Interfaces: NoSideEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.nctaid.z (::mlir::NVVM::GridDimZOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.nctaid.z` attr-dict `:` type($res)

Interfaces: NoSideEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.laneid (::mlir::NVVM::LaneIdOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.laneid` attr-dict `:` type($res)

Interfaces: NoSideEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.ldmatrix (::mlir::NVVM::LdMatrixOp) 

cooperative matrix load

Syntax:

operation ::= `nvvm.ldmatrix` $ptr attr-dict `:` functional-type($ptr, $res)

Attributes: 

AttributeMLIR TypeDescription
num::mlir::IntegerAttr32-bit signless integer attribute
layout::mlir::NVVM::MMALayoutAttrNVVM MMA layout

Operands: 

OperandDescription
ptrLLVM pointer type

Results: 

ResultDescription
resany type

nvvm.mma.sync (::mlir::NVVM::MmaOp) 

cooperative matrix-multiply and accumulate

The nvvm.mma.sync operation collectively performs the operation D = matmul(A, B) + C using all threads in a warp.

All the threads in the warp must execute the same mma.sync operation.

For each possible multiplicand PTX data type, there are one or more possible instruction shapes given as “mMnNkK”. The below table describes the posssibilities as well as the types required for the operands. Note that the data type for C (the accumulator) and D (the result) can vary independently when there are multiple possibilities in the “C/D Type” column.

When an optional attribute cannot be immediately inferred from the types of the operands and the result during parsing or validation, an error will be raised.

b1Op is only relevant when the binary (b1) type is given to multiplicandDataType. It specifies how the multiply-and-acumulate is performed and is either xor_popc or and_poc. The default is xor_popc.

intOverflowBehavior is only relevant when the multiplicandType attribute is one of u8, s8, u4, s4, this attribute describes how overflow is handled in the accumulator. When the attribute is satfinite, the accumulator values are clamped in the int32 range on overflow. This is the default behavior. Alternatively, accumulator behavior wrapped can also be specified, in which case overflow wraps from one end of the range to the other.

layoutA and layoutB are required and should generally be set to #nvvm.mma_layout<row> and #nvvm.mma_layout<col> respectively, but other combinations are possible for certain layouts according to the table below.

| A/B Type | Shape     | ALayout | BLayout | A Type   | B Type   | C/D Type          |
|----------|-----------|---------|---------|----------|----------|-------------------|
| f64      | .m8n8k4   | row     | col     | 1x f64   | 1x f64   | 2x f64            |
| f16      | .m8n8k4   | row/col | row/col | 2x f16x2 | 2x f16x2 | 4x f16x2 or 8xf32 |
|          | .m16n8k8  | row     | col     | 2x f16x2 | 1x f16x2 | 2x f16x2 or 4 f32 |
|          | .m16n8k16 | row     | col     | 4x f16x2 | 2x f16x2 | 2x f16x2 or 4 f32 |
| bf16     | .m16n8k8  | row     | col     | 2x f16x2 | 1x f16x2 | 2x f16x2 or 4 f32 |
|          | .m16n8k16 | row     | col     | 4x f16x2 | 2x f16x2 | 2x f16x2 or 4 f32 |
| tf32     | .m16n8k4  | row     | col     | 2x i32   | 1x i32   | 4x f32            |
|          | .m16n8k8  | row     | col     | 4x i32   | 2x i32   | 2x f16x2 or 4 f32 |
| u8/s8    | .m8n8k16  | row     | col     | 1x i32   | 1x i32   | 2x i32            |
|          | .m16n8k16 | row     | col     | 2x i32   | 1x i32   | 4x i32            |
|          | .m16n8k32 | row     | col     | 4x i32   | 2x i32   | 4x i32            |
| u4/s4    | .m8n8k32  | row     | col     | 1x i32   | 1x i32   | 2x i32            |
|          | m16n8k32  | row     | col     | 2x i32   | 1x i32   | 4x i32            |
|          | m16n8k64  | row     | col     | 4x i32   | 2x i32   | 4x i32            |
| b1       | m8n8k128  | row     | col     | 1x i32   | 1x i32   | 2x i32            |
|          | m16n8k128 | row     | col     | 2x i32   | 1x i32   | 4x i32            |

Example:


%128 = nvvm.mma.sync A[%120, %121, %122, %123]  
                     B[%124, %125]  
                     C[%126, %127]  
                     {layoutA = #nvvm.mma_layout<row>, 
                      layoutB = #nvvm.mma_layout<col>, 
                      shape = {k = 16 : i32, m = 16 : i32, n = 8 : i32}} 
    : (vector<2xf16>, vector<2xf16>, vector<2xf16>)
       -> !llvm.struct<(vector<2xf16>, vector<2xf16>)>

Traits: AttrSizedOperandSegments

Attributes: 

AttributeMLIR TypeDescription
shape::mlir::NVVM::MMAShapeAttrAttribute for MMA operation shape.
b1Op::mlir::NVVM::MMAB1OpAttrMMA binary operations
intOverflowBehavior::mlir::NVVM::MMAIntOverflowAttrMMA overflow options
layoutA::mlir::NVVM::MMALayoutAttrNVVM MMA layout
layoutB::mlir::NVVM::MMALayoutAttrNVVM MMA layout
multiplicandAPtxType::mlir::NVVM::MMATypesAttrNVVM MMA types
multiplicandBPtxType::mlir::NVVM::MMATypesAttrNVVM MMA types

Operands: 

OperandDescription
operandALLVM dialect-compatible type
operandBLLVM dialect-compatible type
operandCLLVM dialect-compatible type

Results: 

ResultDescription
resLLVM structure type

nvvm.shfl.sync (::mlir::NVVM::ShflOp) 

Syntax:

operation ::= `nvvm.shfl.sync` $kind $dst `,` $val `,` $offset `,` $mask_and_clamp  attr-dict
              `:` type($val) `->` type($res)

Attributes: 

AttributeMLIR TypeDescription
kind::mlir::NVVM::ShflKindAttrNVVM shuffle kind
return_value_and_is_valid::mlir::UnitAttrunit attribute

Operands: 

OperandDescription
dst32-bit signless integer
valLLVM dialect-compatible type
offset32-bit signless integer
mask_and_clamp32-bit signless integer

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.tid.x (::mlir::NVVM::ThreadIdXOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.tid.x` attr-dict `:` type($res)

Interfaces: NoSideEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.tid.y (::mlir::NVVM::ThreadIdYOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.tid.y` attr-dict `:` type($res)

Interfaces: NoSideEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.read.ptx.sreg.tid.z (::mlir::NVVM::ThreadIdZOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.tid.z` attr-dict `:` type($res)

Interfaces: NoSideEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.vote.ballot.sync (::mlir::NVVM::VoteBallotOp) 

Operands: 

OperandDescription
maskLLVM dialect-compatible type
predLLVM dialect-compatible type

Results: 

ResultDescription
resLLVM dialect-compatible type

nvvm.wmma.load (::mlir::NVVM::WMMALoadOp) 

Warp synchronous matrix load

Syntax:

operation ::= `nvvm.wmma.load` $ptr `,` $stride attr-dict `:` functional-type($ptr, $res)

Attributes: 

AttributeMLIR TypeDescription
m::mlir::IntegerAttr32-bit signless integer attribute
n::mlir::IntegerAttr32-bit signless integer attribute
k::mlir::IntegerAttr32-bit signless integer attribute
layout::mlir::NVVM::MMALayoutAttrNVVM MMA layout
eltype::mlir::NVVM::MMATypesAttrNVVM MMA types
frag::mlir::NVVM::MMAFragAttrNVVM MMA frag type

Operands: 

OperandDescription
ptrLLVM pointer type
stride32-bit signless integer

Results: 

ResultDescription
resLLVM structure type

nvvm.wmma.mma (::mlir::NVVM::WMMAMmaOp) 

Warp synchronous matrix-multiply accumulate using tensor cores.

Syntax:

operation ::= `nvvm.wmma.mma` $args attr-dict `:` functional-type($args, $res)

Attributes: 

AttributeMLIR TypeDescription
m::mlir::IntegerAttr32-bit signless integer attribute
n::mlir::IntegerAttr32-bit signless integer attribute
k::mlir::IntegerAttr32-bit signless integer attribute
layoutA::mlir::NVVM::MMALayoutAttrNVVM MMA layout
layoutB::mlir::NVVM::MMALayoutAttrNVVM MMA layout
eltypeA::mlir::NVVM::MMATypesAttrNVVM MMA types
eltypeB::mlir::NVVM::MMATypesAttrNVVM MMA types

Operands: 

OperandDescription
argsLLVM dialect-compatible type

Results: 

ResultDescription
resLLVM structure type

nvvm.wmma.store (::mlir::NVVM::WMMAStoreOp) 

Warp synchronous matrix store

Syntax:

operation ::= `nvvm.wmma.store` $ptr `,` $stride `,` $args attr-dict `:` type($ptr) `,` type($args)

Attributes: 

AttributeMLIR TypeDescription
m::mlir::IntegerAttr32-bit signless integer attribute
n::mlir::IntegerAttr32-bit signless integer attribute
k::mlir::IntegerAttr32-bit signless integer attribute
layout::mlir::NVVM::MMALayoutAttrNVVM MMA layout
eltype::mlir::NVVM::MMATypesAttrNVVM MMA types

Operands: 

OperandDescription
ptrLLVM pointer type
argsLLVM dialect-compatible type
stride32-bit signless integer

nvvm.read.ptx.sreg.warpsize (::mlir::NVVM::WarpSizeOp) 

Syntax:

operation ::= `nvvm.read.ptx.sreg.warpsize` attr-dict `:` type($res)

Interfaces: NoSideEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Results: 

ResultDescription
resLLVM dialect-compatible type