mlir.dialects._rocdl_ops_gen
Module Contents
- mlir.dialects._rocdl_ops_gen._ods_ir
- class mlir.dialects._rocdl_ops_gen._Dialect(descriptor: object)
Bases: _ods_ir
- DIALECT_NAMESPACE = 'rocdl'
- class mlir.dialects._rocdl_ops_gen.ROCDLCos(res, arg, *, loc=None, ip=None)
Bases: _ods_ir
Note: In the general case, prefer the conventional arith, math, or llvm ops over this. Use this ROCDL-specific operation only when you fully understand its implication and when it is strictly necessary. This op is usually chosen when a small loss in precision is acceptable in exchange for higher execution speed.
Example:
%0 = rocdl.cos %a f32 -> f32
- OPERATION_NAME = 'rocdl.cos'
- _ODS_REGIONS = (0, True)
- arg() _ods_ir
- res() _ods_ir
- class mlir.dialects._rocdl_ops_gen.ROCDLCosAdaptor(operands: list, attributes: OpAttributeMap)
- class mlir.dialects._rocdl_ops_gen.ROCDLCosAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cos'
- arg() _ods_ir
- mlir.dialects._rocdl_ops_gen.cos(res, arg, *, loc=None, ip=None) _ods_ir
- class mlir.dialects._rocdl_ops_gen.ROCDLExp(res, arg, *, loc=None, ip=None)
Bases: _ods_ir
Note: In the general case, prefer the conventional arith, math, or llvm ops over this. Use this ROCDL-specific operation only when you fully understand its implication and when it is strictly necessary. This op is usually chosen when a small loss in precision is acceptable in exchange for higher execution speed.
Example:
%0 = rocdl.exp %a f32 -> f32
- OPERATION_NAME = 'rocdl.exp'
- _ODS_REGIONS = (0, True)
- arg() _ods_ir
- res() _ods_ir
- class mlir.dialects._rocdl_ops_gen.ROCDLExpAdaptor(operands: list, attributes: OpAttributeMap)
- class mlir.dialects._rocdl_ops_gen.ROCDLExpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.exp'
- arg() _ods_ir
- mlir.dialects._rocdl_ops_gen.exp(res, arg, *, loc=None, ip=None) _ods_ir
- class mlir.dialects._rocdl_ops_gen.ROCDLExp2(res, arg, *, loc=None, ip=None)
Bases: _ods_ir
Note: In the general case, prefer the conventional arith, math, or llvm ops over this. Use this ROCDL-specific operation only when you fully understand its implication and when it is strictly necessary. This op is usually chosen when a small loss in precision is acceptable in exchange for higher execution speed.
Example:
%0 = rocdl.exp2 %a f32 -> f32
- OPERATION_NAME = 'rocdl.exp2'
- _ODS_REGIONS = (0, True)
- arg() _ods_ir
- res() _ods_ir
- class mlir.dialects._rocdl_ops_gen.ROCDLExp2Adaptor(operands: list, attributes: OpAttributeMap)
- class mlir.dialects._rocdl_ops_gen.ROCDLExp2Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.exp2'
- arg() _ods_ir
- mlir.dialects._rocdl_ops_gen.exp2(res, arg, *, loc=None, ip=None) _ods_ir
- class mlir.dialects._rocdl_ops_gen.ROCDLLog(res, arg, *, loc=None, ip=None)
Bases: _ods_ir
Note: In the general case, prefer the conventional arith, math, or llvm ops over this. Use this ROCDL-specific operation only when you fully understand its implication and when it is strictly necessary. This op is usually chosen when a small loss in precision is acceptable in exchange for higher execution speed.
Example:
%0 = rocdl.log %a f32 -> f32
- OPERATION_NAME = 'rocdl.log'
- _ODS_REGIONS = (0, True)
- arg() _ods_ir
- res() _ods_ir
- class mlir.dialects._rocdl_ops_gen.ROCDLLogAdaptor(operands: list, attributes: OpAttributeMap)
- class mlir.dialects._rocdl_ops_gen.ROCDLLogAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.log'
- arg() _ods_ir
- mlir.dialects._rocdl_ops_gen.log(res, arg, *, loc=None, ip=None) _ods_ir
- class mlir.dialects._rocdl_ops_gen.ROCDLRcp(res, arg, *, loc=None, ip=None)
Bases: _ods_ir
Note: In the general case, prefer the conventional arith, math, or llvm ops over this. Use this ROCDL-specific operation only when you fully understand its implication and when it is strictly necessary. This op is usually chosen when a small loss in precision is acceptable in exchange for higher execution speed.
Example:
%0 = rocdl.rcp %a f32 -> f32
- OPERATION_NAME = 'rocdl.rcp'
- _ODS_REGIONS = (0, True)
- arg() _ods_ir
- res() _ods_ir
- class mlir.dialects._rocdl_ops_gen.ROCDLRcpAdaptor(operands: list, attributes: OpAttributeMap)
- class mlir.dialects._rocdl_ops_gen.ROCDLRcpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.rcp'
- arg() _ods_ir
- mlir.dialects._rocdl_ops_gen.rcp(res, arg, *, loc=None, ip=None) _ods_ir
- class mlir.dialects._rocdl_ops_gen.ROCDLRsq(res, arg, *, loc=None, ip=None)
Bases: _ods_ir
Note: In the general case, prefer the conventional arith, math, or llvm ops over this. Use this ROCDL-specific operation only when you fully understand its implication and when it is strictly necessary. This op is usually chosen when a small loss in precision is acceptable in exchange for higher execution speed.
Example:
%0 = rocdl.rsq %a f32 -> f32
- OPERATION_NAME = 'rocdl.rsq'
- _ODS_REGIONS = (0, True)
- arg() _ods_ir
- res() _ods_ir
- class mlir.dialects._rocdl_ops_gen.ROCDLRsqAdaptor(operands: list, attributes: OpAttributeMap)
- class mlir.dialects._rocdl_ops_gen.ROCDLRsqAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.rsq'
- arg() _ods_ir
- mlir.dialects._rocdl_ops_gen.rsq(res, arg, *, loc=None, ip=None) _ods_ir
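The precision trade-off mentioned in the notes above can be checked numerically on the host. The following is a minimal sketch, assuming that rcp denotes the reciprocal 1/x and rsq the reciprocal square root 1/sqrt(x). The names REFERENCE, relative_error, and acceptable, as well as the tolerance value, are hypothetical and not part of the bindings; the candidate value would come from an actual device run.

```python
import math

# Host-side reference definitions.
# Assumption: rcp = 1/x and rsq = 1/sqrt(x); other ops are deliberately omitted.
REFERENCE = {
    "rocdl.rcp": lambda x: 1.0 / x,
    "rocdl.rsq": lambda x: 1.0 / math.sqrt(x),
}


def relative_error(op_name, x, candidate):
    """Relative error of a candidate result against the host reference."""
    expected = REFERENCE[op_name](x)
    return abs(candidate - expected) / abs(expected)


def acceptable(op_name, x, candidate, rel_tol=1e-3):
    """True when the candidate is within rel_tol of the reference.

    rel_tol is a made-up error budget chosen for illustration only.
    """
    return relative_error(op_name, x, candidate) <= rel_tol
```

A check like this may help decide whether the approximate op is tolerable for a given kernel before replacing the conventional math ops.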
- class mlir.dialects._rocdl_ops_gen.ROCDLSin(res, arg, *, loc=None, ip=None)
Bases: _ods_ir
Note: In the general case, prefer the conventional arith, math, or llvm ops over this. Use this ROCDL-specific operation only when you fully understand its implication and when it is strictly necessary. This op is usually chosen when a small loss in precision is acceptable in exchange for higher execution speed.
Example:
%0 = rocdl.sin %a f32 -> f32
- OPERATION_NAME = 'rocdl.sin'
- _ODS_REGIONS = (0, True)
- arg() _ods_ir
- res() _ods_ir
- class mlir.dialects._rocdl_ops_gen.ROCDLSinAdaptor(operands: list, attributes: OpAttributeMap)
- class mlir.dialects._rocdl_ops_gen.ROCDLSinAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.sin'
- arg() _ods_ir
- mlir.dialects._rocdl_ops_gen.sin(res, arg, *, loc=None, ip=None) _ods_ir
- class mlir.dialects._rocdl_ops_gen.ROCDLSqrt(res, arg, *, loc=None, ip=None)
Bases: _ods_ir
Note: In the general case, prefer the conventional arith, math, or llvm ops over this. Use this ROCDL-specific operation only when you fully understand its implication and when it is strictly necessary. This op is usually chosen when a small loss in precision is acceptable in exchange for higher execution speed.
Example:
%0 = rocdl.sqrt %a f32 -> f32
- OPERATION_NAME = 'rocdl.sqrt'
- _ODS_REGIONS = (0, True)
- arg() _ods_ir
- res() _ods_ir
- class mlir.dialects._rocdl_ops_gen.ROCDLSqrtAdaptor(operands: list, attributes: OpAttributeMap)
- class mlir.dialects._rocdl_ops_gen.ROCDLSqrtAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.sqrt'
- arg() _ods_ir
- mlir.dialects._rocdl_ops_gen.sqrt(res, arg, *, loc=None, ip=None) _ods_ir
- class mlir.dialects._rocdl_ops_gen.ROCDLTanh(res, arg, *, loc=None, ip=None)
Bases: _ods_ir
Note: In the general case, prefer the conventional arith, math, or llvm ops over this. Use this ROCDL-specific operation only when you fully understand its implication and when it is strictly necessary. This op is usually chosen when a small loss in precision is acceptable in exchange for higher execution speed.
Example:
%0 = rocdl.tanh %a f32 -> f32
- OPERATION_NAME = 'rocdl.tanh'
- _ODS_REGIONS = (0, True)
- arg() _ods_ir
- res() _ods_ir
- class mlir.dialects._rocdl_ops_gen.ROCDLTanhAdaptor(operands: list, attributes: OpAttributeMap)
- class mlir.dialects._rocdl_ops_gen.ROCDLTanhAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.tanh'
- arg() _ods_ir
- mlir.dialects._rocdl_ops_gen.tanh(res, arg, *, loc=None, ip=None) _ods_ir
- class mlir.dialects._rocdl_ops_gen.AsyncmarkOp(*, loc=None, ip=None)
Bases: _ods_ir
This operation, in conjunction with rocdl.wait.asyncmark, forms the compiler-provided framework for tracking explicitly asynchronous memory operations, such as copies to LDS that use async intrinsics and gfx1250’s tensor loads.
Details of its behavior can be found in the LLVM documentation on async tracking.
See rocdl.wait.asyncmark’s documentation for a usage example.
Example:
// Mark the end of an async operation group.
rocdl.asyncmark
Available on gfx9 and later.
- OPERATION_NAME = 'rocdl.asyncmark'
- _ODS_REGIONS = (0, True)
- class mlir.dialects._rocdl_ops_gen.AsyncmarkOpAdaptor(operands: list, attributes: OpAttributeMap)
- class mlir.dialects._rocdl_ops_gen.AsyncmarkOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.asyncmark'
- mlir.dialects._rocdl_ops_gen.asyncmark(*, loc=None, ip=None) AsyncmarkOp
- class mlir.dialects._rocdl_ops_gen.BallotOp(res, pred, *, loc=None, ip=None)
Bases: _ods_ir
Ballot provides a bit mask containing the 1-bit predicate value from each lane. The nth bit of the result contains the 1 bit contributed by the nth warp lane.
Example:
// Ballot across thread group.
%0 = rocdl.ballot %pred : i64
- OPERATION_NAME = 'rocdl.ballot'
- _ODS_REGIONS = (0, True)
- pred() _ods_ir[_ods_ir]
- res() _ods_ir
- class mlir.dialects._rocdl_ops_gen.BallotOpAdaptor(operands: list, attributes: OpAttributeMap)
- class mlir.dialects._rocdl_ops_gen.BallotOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.ballot'
- pred() _ods_ir[_ods_ir]
- mlir.dialects._rocdl_ops_gen.ballot(res, pred, *, loc=None, ip=None) _ods_ir
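The bit layout described for rocdl.ballot can be modelled on the host in plain Python. This is only a sketch of the documented result (bit n carries the predicate of lane n), not a call into the bindings; the function name ballot_mask and the 8-lane example are illustrative assumptions.

```python
def ballot_mask(predicates):
    """Pack per-lane booleans into an integer; bit n holds lane n's predicate."""
    mask = 0
    for lane, pred in enumerate(predicates):
        if pred:
            mask |= 1 << lane
    return mask


# Hypothetical 8-lane example in which only the even lanes vote true.
even_lanes = [lane % 2 == 0 for lane in range(8)]
mask = ballot_mask(even_lanes)

# Counting the set bits gives the number of lanes whose predicate was true.
active_lanes = bin(mask).count("1")
```

With an i64 result, as in the example above, the same packing would extend to 64 lanes.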
- class mlir.dialects._rocdl_ops_gen.BarrierInitOp(ptr, memberCnt, *, loc=None, ip=None)
Bases: _ods_ir
Available on gfx1250+.
Example:
// Initialize a named barrier with member count.
rocdl.s.barrier.init %ptr member_cnt = 1 : !llvm.ptr<3>
- OPERATION_NAME = 'rocdl.s.barrier.init'
- _ODS_REGIONS = (0, True)
- ptr() _ods_ir
- memberCnt() _ods_ir
- class mlir.dialects._rocdl_ops_gen.BarrierInitOpAdaptor(operands: list, attributes: OpAttributeMap)
- class mlir.dialects._rocdl_ops_gen.BarrierInitOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.s.barrier.init'
- ptr() _ods_ir
- memberCnt() _ods_ir
- mlir.dialects._rocdl_ops_gen.s_barrier_init(ptr, member_cnt, *, loc=None, ip=None) BarrierInitOp
- class mlir.dialects._rocdl_ops_gen.BarrierJoinOp(ptr, *, loc=None, ip=None)
Bases: _ods_ir
Available on gfx1250+.
Example:
// Join a named barrier.
rocdl.s.barrier.join %ptr : !llvm.ptr<3>
- OPERATION_NAME = 'rocdl.s.barrier.join'
- _ODS_REGIONS = (0, True)
- ptr() _ods_ir
- class mlir.dialects._rocdl_ops_gen.BarrierJoinOpAdaptor(operands: list, attributes: OpAttributeMap)
- class mlir.dialects._rocdl_ops_gen.BarrierJoinOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.s.barrier.join'
- ptr() _ods_ir
- mlir.dialects._rocdl_ops_gen.s_barrier_join(ptr, *, loc=None, ip=None) BarrierJoinOp
- class mlir.dialects._rocdl_ops_gen.BarrierLeaveOp(id, *, loc=None, ip=None)
Bases: _ods_ir
Available on gfx1250+.
Example:
// Leave a named barrier by id.
rocdl.s.barrier.leave id = 1
- OPERATION_NAME = 'rocdl.s.barrier.leave'
- _ODS_REGIONS = (0, True)
- id() _ods_ir
- class mlir.dialects._rocdl_ops_gen.BarrierLeaveOpAdaptor(operands: list, attributes: OpAttributeMap)
- class mlir.dialects._rocdl_ops_gen.BarrierLeaveOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.s.barrier.leave'
- id() _ods_ir
- mlir.dialects._rocdl_ops_gen.s_barrier_leave(id, *, loc=None, ip=None) BarrierLeaveOp
- class mlir.dialects._rocdl_ops_gen.BarrierOp(*, loc=None, ip=None)
Bases: _ods_ir
An operation with the same expansion as HIP’s __syncthreads();
DEPRECATION NOTICE: Use gpu.barrier, which will expand to these operations, instead.
Example:
// Workgroup barrier with acquire/release fences.
rocdl.barrier
- OPERATION_NAME = 'rocdl.barrier'
- _ODS_REGIONS = (0, True)
- class mlir.dialects._rocdl_ops_gen.BarrierOpAdaptor(operands: list, attributes: OpAttributeMap)
- class mlir.dialects._rocdl_ops_gen.BarrierOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.barrier'
- class mlir.dialects._rocdl_ops_gen.BarrierSignalIsfirstOp(res, id, *, loc=None, ip=None)
Bases: _ods_ir
Available on gfx1200+.
Example:
// Signal barrier and check if this wave is first to arrive.
%0 = rocdl.s.barrier.signal.isfirst id = 1 -> i1
- OPERATION_NAME = 'rocdl.s.barrier.signal.isfirst'
- _ODS_REGIONS = (0, True)
- id() _ods_ir
- res() _ods_ir[_ods_ir]
- class mlir.dialects._rocdl_ops_gen.BarrierSignalIsfirstOpAdaptor(operands: list, attributes: OpAttributeMap)
- class mlir.dialects._rocdl_ops_gen.BarrierSignalIsfirstOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.s.barrier.signal.isfirst'
- id() _ods_ir
- mlir.dialects._rocdl_ops_gen.s_barrier_signal_isfirst(res, id, *, loc=None, ip=None) _ods_ir
- class mlir.dialects._rocdl_ops_gen.BarrierSignalOp(id, *, loc=None, ip=None)
Bases: _ods_ir
Signal a barrier by id. Available on gfx1250+.
Example:
// Signal barrier with id -1 (all barriers).
rocdl.s.barrier.signal id = -1
- OPERATION_NAME = 'rocdl.s.barrier.signal'
- _ODS_REGIONS = (0, True)
- id() _ods_ir
- class mlir.dialects._rocdl_ops_gen.BarrierSignalOpAdaptor(operands: list, attributes: OpAttributeMap)
- class mlir.dialects._rocdl_ops_gen.BarrierSignalOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.s.barrier.signal'
- id() _ods_ir
- mlir.dialects._rocdl_ops_gen.s_barrier_signal(id, *, loc=None, ip=None) BarrierSignalOp
- class mlir.dialects._rocdl_ops_gen.BarrierSignalVarOp(ptr, memberCnt, *, loc=None, ip=None)¶
Bases: _ods_ir
Available on gfx1250+.
Example:
// Signal a named barrier with variable ID.
rocdl.s.barrier.signal.var %ptr member_cnt = 1 : !llvm.ptr<3>
- OPERATION_NAME = 'rocdl.s.barrier.signal.var'¶
- _ODS_REGIONS = (0, True)¶
- ptr() _ods_ir¶
- memberCnt() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.BarrierSignalVarOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.BarrierSignalVarOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.s.barrier.signal.var'¶
- ptr() _ods_ir¶
- memberCnt() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.s_barrier_signal_var(ptr, member_cnt, *, loc=None, ip=None) BarrierSignalVarOp¶
- class mlir.dialects._rocdl_ops_gen.BarrierWaitOp(id, *, loc=None, ip=None)¶
Bases: _ods_ir
Wait on a barrier by id. Available on gfx1200+.
Example:
// Wait on barrier with id -1 (all barriers).
rocdl.s.barrier.wait id = -1
- OPERATION_NAME = 'rocdl.s.barrier.wait'¶
- _ODS_REGIONS = (0, True)¶
- id() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.BarrierWaitOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.BarrierWaitOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.s.barrier.wait'¶
- id() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.s_barrier_wait(id, *, loc=None, ip=None) BarrierWaitOp¶
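The signal and wait operations above split a barrier into an arrival step and a blocking step, and rocdl.s.barrier.signal.isfirst additionally reports whether the caller arrived first. The following is a host-side analogy in plain Python threads, not a description of the hardware: SplitBarrier and run_waves are hypothetical names, and the single-use behaviour is an assumption made to keep the sketch short.

```python
import threading


class SplitBarrier:
    """Hypothetical single-use model of a signal/wait barrier."""

    def __init__(self, members: int) -> None:
        self._remaining = members
        self._first_taken = False
        self._cond = threading.Condition()

    def signal_isfirst(self) -> bool:
        """Record an arrival; return True only for the first arrival."""
        with self._cond:
            is_first = not self._first_taken
            self._first_taken = True
            self._remaining -= 1
            if self._remaining == 0:
                self._cond.notify_all()
            return is_first

    def wait(self) -> None:
        """Block until every member has signalled."""
        with self._cond:
            self._cond.wait_for(lambda: self._remaining == 0)


def run_waves(count: int) -> list:
    """Run `count` threads that signal, then wait; collect the isfirst flags."""
    barrier = SplitBarrier(count)
    flags = []
    lock = threading.Lock()

    def wave() -> None:
        first = barrier.signal_isfirst()
        # Work that does not depend on the other waves could run here.
        barrier.wait()
        with lock:
            flags.append(first)

    threads = [threading.Thread(target=wave) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return flags
```

Separating the two steps lets each participant do independent work between announcing its arrival and blocking on the others.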
- class mlir.dialects._rocdl_ops_gen.BlockDimXOp(res, *, range=None, loc=None, ip=None)¶
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.workgroup.dim.x'¶
- _ODS_REGIONS = (0, True)¶
- range() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.BlockDimXOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.BlockDimXOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.workgroup.dim.x'¶
- range() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.workgroup_dim_x(res, *, range=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.BlockDimYOp(res, *, range=None, loc=None, ip=None)¶
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.workgroup.dim.y'¶
- _ODS_REGIONS = (0, True)¶
- range() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.BlockDimYOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.BlockDimYOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.workgroup.dim.y'¶
- range() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.workgroup_dim_y(res, *, range=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.BlockDimZOp(res, *, range=None, loc=None, ip=None)¶
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.workgroup.dim.z'¶
- _ODS_REGIONS = (0, True)¶
- range() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.BlockDimZOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.BlockDimZOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.workgroup.dim.z'¶
- range() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.workgroup_dim_z(res, *, range=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.BlockIdXOp(res, *, range=None, loc=None, ip=None)¶
Bases: _ods_ir
Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.
Example:
// Read the workitem id in the x dimension.
%0 = rocdl.workitem.id.x : i32
// Read with a known range constraint.
%1 = rocdl.workitem.id.x range <i32, 0, 64> : i32
- OPERATION_NAME = 'rocdl.workgroup.id.x'¶
- _ODS_REGIONS = (0, True)¶
- range() _ods_ir | None¶
- res() _ods_ir¶
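The effect of the optional range attribute can be pictured with a small pure-Python model. This is only a sketch: IdRange is a hypothetical stand-in, not part of the bindings, and the half-open interpretation (lower bound inclusive, upper bound exclusive, as in LLVM constant ranges) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IdRange:
    """Hypothetical stand-in for a `range <i32, lower, upper>` constraint."""

    bit_width: int
    lower: int  # inclusive lower bound
    upper: int  # exclusive upper bound (assumed half-open)

    def contains(self, value: int) -> bool:
        """Return True if `value` is permitted by the constraint."""
        return self.lower <= value < self.upper

    def bits_needed(self) -> int:
        """Bits sufficient to hold any value of a non-negative range."""
        return max(1, (self.upper - 1).bit_length())


# Mirrors the `range <i32, 0, 64>` constraint from the example above.
workitem_x = IdRange(bit_width=32, lower=0, upper=64)
```

Under these assumptions, a consumer that knows the ID is below 64 can conclude that six bits are enough to hold it, which is the kind of fact a range constraint makes available to later optimizations.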
- class mlir.dialects._rocdl_ops_gen.BlockIdXOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.BlockIdXOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.workgroup.id.x'¶
- range() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.workgroup_id_x(res, *, range=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.BlockIdYOp(res, *, range=None, loc=None, ip=None)¶
Bases: _ods_ir
Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.
Example:
// Read the workitem id in the x dimension.
%0 = rocdl.workitem.id.x : i32
// Read with a known range constraint.
%1 = rocdl.workitem.id.x range <i32, 0, 64> : i32
- OPERATION_NAME = 'rocdl.workgroup.id.y'¶
- _ODS_REGIONS = (0, True)¶
- range() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.BlockIdYOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.BlockIdYOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.workgroup.id.y'¶
- range() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.workgroup_id_y(res, *, range=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.BlockIdZOp(res, *, range=None, loc=None, ip=None)¶
Bases: _ods_ir
Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.
Example:
// Read the workitem id in the x dimension.
%0 = rocdl.workitem.id.x : i32
// Read with a known range constraint.
%1 = rocdl.workitem.id.x range <i32, 0, 64> : i32
- OPERATION_NAME = 'rocdl.workgroup.id.z'¶
- _ODS_REGIONS = (0, True)¶
- range() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.BlockIdZOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.BlockIdZOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.workgroup.id.z'¶
- range() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.workgroup_id_z(res, *, range=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ClusterIdXOp(res, *, range=None, loc=None, ip=None)¶
Bases: _ods_ir
Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.
Example:
// Read the workitem id in the x dimension.
%0 = rocdl.workitem.id.x : i32
// Read with a known range constraint.
%1 = rocdl.workitem.id.x range <i32, 0, 64> : i32
- OPERATION_NAME = 'rocdl.cluster.id.x'¶
- _ODS_REGIONS = (0, True)¶
- range() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ClusterIdXOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.ClusterIdXOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cluster.id.x'¶
- range() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.cluster_id_x(res, *, range=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ClusterIdYOp(res, *, range=None, loc=None, ip=None)¶
Bases: _ods_ir
Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.
Example:
// Read the workitem id in the x dimension.
%0 = rocdl.workitem.id.x : i32
// Read with a known range constraint.
%1 = rocdl.workitem.id.x range <i32, 0, 64> : i32
- OPERATION_NAME = 'rocdl.cluster.id.y'¶
- _ODS_REGIONS = (0, True)¶
- range() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ClusterIdYOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.ClusterIdYOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cluster.id.y'¶
- range() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.cluster_id_y(res, *, range=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ClusterIdZOp(res, *, range=None, loc=None, ip=None)¶
Bases: _ods_ir
Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.
Example:
// Read the workitem id in the x dimension.
%0 = rocdl.workitem.id.x : i32
// Read with a known range constraint.
%1 = rocdl.workitem.id.x range <i32, 0, 64> : i32
- OPERATION_NAME = 'rocdl.cluster.id.z'¶
- _ODS_REGIONS = (0, True)¶
- range() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ClusterIdZOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.ClusterIdZOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cluster.id.z'¶
- range() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.cluster_id_z(res, *, range=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ClusterLoadAsyncToLDSB8Op(globalPtr, ldsPtr, offset, cpol, mask, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases: _ods_ir
Broadcasts a memory load of 8 bits of data for a cluster of workgroups.
Available on gfx1250+.
Example:
// Cluster broadcast 8-bit load to LDS.
rocdl.cluster.load.async.to.lds.b8 %src, %dst, 0, 0, %mask : !llvm.ptr<1>, !llvm.ptr<3>
- OPERATION_NAME = 'rocdl.cluster.load.async.to.lds.b8'¶
- _ODS_REGIONS = (0, True)¶
- globalPtr() _ods_ir¶
- ldsPtr() _ods_ir¶
- mask() _ods_ir[_ods_ir]¶
- offset() _ods_ir¶
- cpol() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- class mlir.dialects._rocdl_ops_gen.ClusterLoadAsyncToLDSB8OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.ClusterLoadAsyncToLDSB8OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cluster.load.async.to.lds.b8'¶
- globalPtr() _ods_ir¶
- ldsPtr() _ods_ir¶
- mask() _ods_ir[_ods_ir]¶
- offset() _ods_ir¶
- cpol() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.cluster_load_async_to_lds_b8(global_ptr, lds_ptr, offset, cpol, mask, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) ClusterLoadAsyncToLDSB8Op¶
- class mlir.dialects._rocdl_ops_gen.ClusterLoadAsyncToLDSB32Op(globalPtr, ldsPtr, offset, cpol, mask, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases: _ods_ir
Broadcasts a memory load of 32 bits of data for a cluster of workgroups.
Available on gfx1250+.
Example:
// Cluster broadcast 32-bit load to LDS.
rocdl.cluster.load.async.to.lds.b32 %src, %dst, 0, 0, %mask : !llvm.ptr<1>, !llvm.ptr<3>
- OPERATION_NAME = 'rocdl.cluster.load.async.to.lds.b32'¶
- _ODS_REGIONS = (0, True)¶
- globalPtr() _ods_ir¶
- ldsPtr() _ods_ir¶
- mask() _ods_ir[_ods_ir]¶
- offset() _ods_ir¶
- cpol() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- class mlir.dialects._rocdl_ops_gen.ClusterLoadAsyncToLDSB32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.ClusterLoadAsyncToLDSB32OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cluster.load.async.to.lds.b32'¶
- globalPtr() _ods_ir¶
- ldsPtr() _ods_ir¶
- mask() _ods_ir[_ods_ir]¶
- offset() _ods_ir¶
- cpol() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.cluster_load_async_to_lds_b32(global_ptr, lds_ptr, offset, cpol, mask, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) ClusterLoadAsyncToLDSB32Op¶
- class mlir.dialects._rocdl_ops_gen.ClusterLoadAsyncToLDSB64Op(globalPtr, ldsPtr, offset, cpol, mask, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases: _ods_ir
Broadcasts a memory load of 64 bits of data for a cluster of workgroups.
Available on gfx1250+.
Example:
// Cluster broadcast 64-bit load to LDS.
rocdl.cluster.load.async.to.lds.b64 %src, %dst, 0, 0, %mask : !llvm.ptr<1>, !llvm.ptr<3>
- OPERATION_NAME = 'rocdl.cluster.load.async.to.lds.b64'¶
- _ODS_REGIONS = (0, True)¶
- globalPtr() _ods_ir¶
- ldsPtr() _ods_ir¶
- mask() _ods_ir[_ods_ir]¶
- offset() _ods_ir¶
- cpol() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- class mlir.dialects._rocdl_ops_gen.ClusterLoadAsyncToLDSB64OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.ClusterLoadAsyncToLDSB64OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cluster.load.async.to.lds.b64'¶
- globalPtr() _ods_ir¶
- ldsPtr() _ods_ir¶
- mask() _ods_ir[_ods_ir]¶
- offset() _ods_ir¶
- cpol() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.cluster_load_async_to_lds_b64(global_ptr, lds_ptr, offset, cpol, mask, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) ClusterLoadAsyncToLDSB64Op¶
- class mlir.dialects._rocdl_ops_gen.ClusterLoadAsyncToLDSB128Op(globalPtr, ldsPtr, offset, cpol, mask, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases: _ods_ir
Broadcasts a memory load of 128 bits of data for a cluster of workgroups.
Available on gfx1250+.
Example:
// Cluster broadcast 128-bit load to LDS.
rocdl.cluster.load.async.to.lds.b128 %src, %dst, 0, 0, %mask : !llvm.ptr<1>, !llvm.ptr<3>
- OPERATION_NAME = 'rocdl.cluster.load.async.to.lds.b128'¶
- _ODS_REGIONS = (0, True)¶
- globalPtr() _ods_ir¶
- ldsPtr() _ods_ir¶
- mask() _ods_ir[_ods_ir]¶
- offset() _ods_ir¶
- cpol() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- class mlir.dialects._rocdl_ops_gen.ClusterLoadAsyncToLDSB128OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.ClusterLoadAsyncToLDSB128OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cluster.load.async.to.lds.b128'¶
- globalPtr() _ods_ir¶
- ldsPtr() _ods_ir¶
- mask() _ods_ir[_ods_ir]¶
- offset() _ods_ir¶
- cpol() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.cluster_load_async_to_lds_b128(global_ptr, lds_ptr, offset, cpol, mask, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) ClusterLoadAsyncToLDSB128Op¶
- class mlir.dialects._rocdl_ops_gen.ClusterWorkgroupIdXOp(res, *, range=None, loc=None, ip=None)¶
Bases: _ods_ir
Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.
Example:
// Read the workitem id in the x dimension.
%0 = rocdl.workitem.id.x : i32
// Read with a known range constraint.
%1 = rocdl.workitem.id.x range <i32, 0, 64> : i32
- OPERATION_NAME = 'rocdl.cluster.workgroup.id.x'¶
- _ODS_REGIONS = (0, True)¶
- range() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ClusterWorkgroupIdXOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.ClusterWorkgroupIdXOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cluster.workgroup.id.x'¶
- range() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.cluster_workgroup_id_x(res, *, range=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ClusterWorkgroupIdYOp(res, *, range=None, loc=None, ip=None)¶
Bases: _ods_ir
Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.
Example:
// Read the workitem id in the x dimension.
%0 = rocdl.workitem.id.x : i32
// Read with a known range constraint.
%1 = rocdl.workitem.id.x range <i32, 0, 64> : i32
- OPERATION_NAME = 'rocdl.cluster.workgroup.id.y'¶
- _ODS_REGIONS = (0, True)¶
- range() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ClusterWorkgroupIdYOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.ClusterWorkgroupIdYOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cluster.workgroup.id.y'¶
- range() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.cluster_workgroup_id_y(res, *, range=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ClusterWorkgroupIdZOp(res, *, range=None, loc=None, ip=None)¶
Bases: _ods_ir
Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.
Example:
// Read the workitem id in the x dimension.
%0 = rocdl.workitem.id.x : i32
// Read with a known range constraint.
%1 = rocdl.workitem.id.x range <i32, 0, 64> : i32
- OPERATION_NAME = 'rocdl.cluster.workgroup.id.z'¶
- _ODS_REGIONS = (0, True)¶
- range() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ClusterWorkgroupIdZOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.ClusterWorkgroupIdZOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cluster.workgroup.id.z'¶
- range() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.cluster_workgroup_id_z(res, *, range=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtF32Bf8Op(res, srcA, byteSel, *, loc=None, ip=None)¶
Bases: _ods_ir
Convert the 8-bit bf8 value in the byteSel-th byte of srcA to fp32.
Example:
// Convert bf8 byte 0 to f32.
%0 = rocdl.cvt.f32.bf8 %src[0] : f32
- OPERATION_NAME = 'rocdl.cvt.f32.bf8'¶
- _ODS_REGIONS = (0, True)¶
- srcA() _ods_ir[_ods_ir]¶
- byteSel() _ods_ir¶
- res() _ods_ir¶
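The byte selection and widening that CvtF32Bf8Op describes can be sketched in plain Python. This is illustrative only: select_byte, decode_e5m2 and cvt_f32_bf8_model are hypothetical helpers, and the sketch assumes bf8 uses the IEEE-like E5M2 layout (1 sign bit, 5 exponent bits, 2 mantissa bits, bias 15). Some targets use a different bf8 variant, so check the encoding for the hardware in question.

```python
def select_byte(word: int, byte_sel: int) -> int:
    """Return byte `byte_sel` (0 = least significant) of a 32-bit word."""
    if not 0 <= byte_sel <= 3:
        raise ValueError("byte_sel must be in 0..3")
    return (word >> (8 * byte_sel)) & 0xFF


def decode_e5m2(byte: int) -> float:
    """Decode one E5M2 byte (assumed layout: bias 15, IEEE-like specials)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 2) & 0x1F
    mantissa = byte & 0x3
    if exponent == 0x1F:
        return sign * float("inf") if mantissa == 0 else float("nan")
    if exponent == 0:
        # Subnormal: (mantissa / 4) * 2**(1 - 15)
        return sign * mantissa * 2.0 ** -16
    return sign * (1 + mantissa / 4) * 2.0 ** (exponent - 15)


def cvt_f32_bf8_model(src: int, byte_sel: int) -> float:
    """Model of the op: pick a byte of `src`, then widen it to a float."""
    return decode_e5m2(select_byte(src, byte_sel))
```

For instance, with src = 0x0000C03C this model yields 1.0 for byte 0 (0x3C) and -2.0 for byte 1 (0xC0).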
- class mlir.dialects._rocdl_ops_gen.CvtF32Bf8OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtF32Bf8OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cvt.f32.bf8'¶
- srcA() _ods_ir[_ods_ir]¶
- byteSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_f32_bf8(res, src_a, byte_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtF32Fp8Op(res, srcA, byteSel, *, loc=None, ip=None)¶
Bases: _ods_ir
Convert the 8-bit fp8 value in the byteSel-th byte of srcA to fp32.
Example:
// Convert fp8 byte 0 to f32.
%0 = rocdl.cvt.f32.fp8 %src[0] : f32
- OPERATION_NAME = 'rocdl.cvt.f32.fp8'¶
- _ODS_REGIONS = (0, True)¶
- srcA() _ods_ir[_ods_ir]¶
- byteSel() _ods_ir¶
- res() _ods_ir¶
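A comparable sketch for CvtF32Fp8Op, under the assumption that fp8 here means the E4M3 "FN" layout (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 7, no infinities, a single NaN pattern per sign). Other fp8 variants exist on some targets, and decode_e4m3fn and cvt_f32_fp8_model are hypothetical names used only for this illustration.

```python
import math


def decode_e4m3fn(byte: int) -> float:
    """Decode one E4M3FN byte (assumed layout: bias 7, no infinities)."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF
    mantissa = byte & 0x7
    if exponent == 0xF and mantissa == 0x7:
        return math.nan  # the only NaN encoding in this layout
    if exponent == 0:
        # Subnormal: (mantissa / 8) * 2**(1 - 7)
        return sign * mantissa * 2.0 ** -9
    return sign * (1 + mantissa / 8) * 2.0 ** (exponent - 7)


def cvt_f32_fp8_model(src: int, byte_sel: int) -> float:
    """Model of the op: pick byte `byte_sel` of `src`, then widen it."""
    if not 0 <= byte_sel <= 3:
        raise ValueError("byte_sel must be in 0..3")
    return decode_e4m3fn((src >> (8 * byte_sel)) & 0xFF)
```

Under this layout the largest finite magnitude is 448.0 (byte 0x7E), which is why the decoder has no infinity branch.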
- class mlir.dialects._rocdl_ops_gen.CvtF32Fp8OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtF32Fp8OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cvt.f32.fp8'¶
- srcA() _ods_ir[_ods_ir]¶
- byteSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_f32_fp8(res, src_a, byte_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtPkBf8F32Op(res, srcA, srcB, old, wordSel, *, loc=None, ip=None)¶
Bases: _ods_ir
Convert srcA and srcB to bf8 and store into the low/high word of old, preserving the other word.
Example:
// Pack two f32 values into bf8 in the low word of old.
%0 = rocdl.cvt.pk.bf8.f32 %a, %b -> %old[false] : i32
- OPERATION_NAME = 'rocdl.cvt.pk.bf8.f32'¶
- _ODS_REGIONS = (0, True)¶
- srcA() _ods_ir[_ods_ir]¶
- srcB() _ods_ir[_ods_ir]¶
- old() _ods_ir[_ods_ir]¶
- wordSel() _ods_ir¶
- res() _ods_ir¶
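The "store into the low/high word of old, preserving the other word" behaviour can be modelled on plain integers. In this sketch insert_word is a hypothetical helper that takes the two already-converted 8-bit values. It assumes that wordSel = false selects the low 16 bits (consistent with the example comment above) and that the srcA result occupies the lower byte of the selected word, which the source does not state.

```python
def insert_word(old: int, byte_a: int, byte_b: int, word_sel: bool) -> int:
    """Write two 8-bit values into one 16-bit half of a 32-bit word.

    byte_a goes to the lower byte of the selected half (assumption),
    byte_b to the upper byte; the other half of `old` is left unchanged.
    """
    packed = ((byte_b & 0xFF) << 8) | (byte_a & 0xFF)
    if word_sel:
        # High word selected: keep the low 16 bits of `old`.
        return (old & 0x0000FFFF) | (packed << 16)
    # Low word selected: keep the high 16 bits of `old`.
    return (old & 0xFFFF0000) | packed
```

Calling the model twice, once per word, fills all four bytes of a 32-bit register with converted values.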
- class mlir.dialects._rocdl_ops_gen.CvtPkBf8F32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtPkBf8F32OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cvt.pk.bf8.f32'¶
- srcA() _ods_ir[_ods_ir]¶
- srcB() _ods_ir[_ods_ir]¶
- old() _ods_ir[_ods_ir]¶
- wordSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_pk_bf8_f32(res, src_a, src_b, old, word_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtPkF32Bf8Op(res, src, wordSel, *, loc=None, ip=None)¶
Bases: _ods_ir
Convert the word of src selected by wordSel to packed fp32.
Example:
// Unpack bf8 word to packed f32.
%0 = rocdl.cvt.pk.f32.bf8 %src[false] : vector<2xf32>
- OPERATION_NAME = 'rocdl.cvt.pk.f32.bf8'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- wordSel() _ods_ir¶
- res() _ods_ir¶
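The word selection performed before unpacking can be sketched as follows. select_word_bytes is a hypothetical helper; it assumes wordSel = false picks the low 16 bits of src and that the lower byte of the selected word becomes element 0 of the result. Widening each byte to a float then depends on the 8-bit format and is left out here.

```python
def select_word_bytes(src: int, word_sel: bool) -> tuple:
    """Return (low_byte, high_byte) of the selected 16-bit half of `src`."""
    word = (src >> 16) & 0xFFFF if word_sel else src & 0xFFFF
    return word & 0xFF, word >> 8
```

The two returned bytes are the inputs that would each be converted to one lane of the vector<2xf32> result.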
- class mlir.dialects._rocdl_ops_gen.CvtPkF32Bf8OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtPkF32Bf8OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cvt.pk.f32.bf8'¶
- src() _ods_ir[_ods_ir]¶
- wordSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_pk_f32_bf8(res, src, word_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtPkF32Fp8Op(res, src, wordSel, *, loc=None, ip=None)¶
Bases: _ods_ir
Convert the word of src selected by wordSel to packed fp32.
Example:
// Unpack fp8 word to packed f32.
%0 = rocdl.cvt.pk.f32.fp8 %src[false] : vector<2xf32>
- OPERATION_NAME = 'rocdl.cvt.pk.f32.fp8'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- wordSel() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtPkF32Fp8OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtPkF32Fp8OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cvt.pk.f32.fp8'¶
- src() _ods_ir[_ods_ir]¶
- wordSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_pk_f32_fp8(res, src, word_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtPkFp8F32Op(res, srcA, srcB, old, wordSel, *, loc=None, ip=None)¶
Bases: _ods_ir
Convert srcA and srcB to fp8 and store into the low/high word of old, preserving the other word.
Example:
// Pack two f32 values into fp8 in the low word of old.
%0 = rocdl.cvt.pk.fp8.f32 %a, %b -> %old[false] : i32
- OPERATION_NAME = 'rocdl.cvt.pk.fp8.f32'¶
- _ODS_REGIONS = (0, True)¶
- srcA() _ods_ir[_ods_ir]¶
- srcB() _ods_ir[_ods_ir]¶
- old() _ods_ir[_ods_ir]¶
- wordSel() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtPkFp8F32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtPkFp8F32OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cvt.pk.fp8.f32'¶
- srcA() _ods_ir[_ods_ir]¶
- srcB() _ods_ir[_ods_ir]¶
- old() _ods_ir[_ods_ir]¶
- wordSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_pk_fp8_f32(res, src_a, src_b, old, word_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtPkRtz(res, srcA, srcB, *, loc=None, ip=None)¶
Bases: _ods_ir
Convert two f32 values into a packed vector<2xf16>.
Example:
// Pack two f32 values into a vector<2xf16> with round-to-zero.
%0 = rocdl.cvt.pkrtz %a, %b : vector<2xf16>
- OPERATION_NAME = 'rocdl.cvt.pkrtz'¶
- _ODS_REGIONS = (0, True)¶
- srcA() _ods_ir[_ods_ir]¶
- srcB() _ods_ir[_ods_ir]¶
- res() _ods_ir¶
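Round-to-zero differs from the round-to-nearest conversion that Python's struct module performs for half precision. The sketch below emulates round-to-zero on top of struct. It is a numerical illustration only: f32_to_f16_rtz_bits and cvt_pkrtz_model are hypothetical names, Python floats are doubles rather than f32, and clamping finite out-of-range inputs to the largest finite half value is an assumption about overflow behaviour.

```python
import struct


def f16_bits_to_float(bits: int) -> float:
    """Reinterpret a 16-bit pattern as an IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def f32_to_f16_rtz_bits(x: float) -> int:
    """Convert to half precision, rounding toward zero."""
    try:
        # struct rounds to nearest-even; correct the result afterwards.
        bits = struct.unpack("<H", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        # Finite but too large: clamp to the largest finite half (assumption).
        return (0x8000 if x < 0 else 0) | 0x7BFF
    if abs(f16_bits_to_float(bits)) > abs(x):
        # Nearest rounding moved away from zero; step one ULP back.
        bits -= 1
    return bits


def cvt_pkrtz_model(a: float, b: float) -> tuple:
    """Model of the op: both inputs converted with round-to-zero."""
    return (
        f16_bits_to_float(f32_to_f16_rtz_bits(a)),
        f16_bits_to_float(f32_to_f16_rtz_bits(b)),
    )
```

For an input such as 1.001708984375, round-to-nearest would produce 1.001953125, whereas this model returns 1.0009765625, the representable value closer to zero.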
- class mlir.dialects._rocdl_ops_gen.CvtPkRtzAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtPkRtzAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cvt.pkrtz'¶
- srcA() _ods_ir[_ods_ir]¶
- srcB() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_pkrtz(res, src_a, src_b, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8Bf16Bf8Op(res, src, scale, scaleSel, *, loc=None, ip=None)¶
Bases: _ods_ir
Available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scale.pk8.bf16.bf8'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8Bf16Bf8OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8Bf16Bf8OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cvt.scale.pk8.bf16.bf8'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scale_pk8_bf16_bf8(res, src, scale, scale_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8Bf16Fp4Op(res, src, scale, scaleSel, *, loc=None, ip=None)¶
Bases: _ods_ir
Available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scale.pk8.bf16.fp4'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8Bf16Fp4OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8Bf16Fp4OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cvt.scale.pk8.bf16.fp4'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scale_pk8_bf16_fp4(res, src, scale, scale_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8Bf16Fp8Op(res, src, scale, scaleSel, *, loc=None, ip=None)¶
Bases: _ods_ir
Available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scale.pk8.bf16.fp8'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8Bf16Fp8OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8Bf16Fp8OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cvt.scale.pk8.bf16.fp8'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scale_pk8_bf16_fp8(res, src, scale, scale_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8F16Bf8Op(res, src, scale, scaleSel, *, loc=None, ip=None)¶
Bases: _ods_ir
Available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scale.pk8.f16.bf8'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8F16Bf8OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8F16Bf8OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cvt.scale.pk8.f16.bf8'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scale_pk8_f16_bf8(res, src, scale, scale_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8F16Fp4Op(res, src, scale, scaleSel, *, loc=None, ip=None)¶
Bases: _ods_ir
Available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scale.pk8.f16.fp4'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8F16Fp4OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8F16Fp4OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cvt.scale.pk8.f16.fp4'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scale_pk8_f16_fp4(res, src, scale, scale_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8F16Fp8Op(res, src, scale, scaleSel, *, loc=None, ip=None)¶
Bases: _ods_ir
Available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scale.pk8.f16.fp8'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8F16Fp8OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8F16Fp8OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cvt.scale.pk8.f16.fp8'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scale_pk8_f16_fp8(res, src, scale, scale_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8F32Bf8Op(res, src, scale, scaleSel, *, loc=None, ip=None)¶
Bases: _ods_ir
Available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scale.pk8.f32.bf8'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8F32Bf8OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8F32Bf8OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cvt.scale.pk8.f32.bf8'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scale_pk8_f32_bf8(res, src, scale, scale_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8F32Fp4Op(res, src, scale, scaleSel, *, loc=None, ip=None)¶
Bases: _ods_ir
Available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scale.pk8.f32.fp4'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8F32Fp4OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8F32Fp4OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cvt.scale.pk8.f32.fp4'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scale_pk8_f32_fp4(res, src, scale, scale_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8F32Fp8Op(res, src, scale, scaleSel, *, loc=None, ip=None)¶
Bases: _ods_ir
Available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scale.pk8.f32.fp8'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8F32Fp8OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk8F32Fp8OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cvt.scale.pk8.f32.fp8'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scale_pk8_f32_fp8(res, src, scale, scale_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk16Bf16Bf6Op(res, src, scale, scaleSel, *, loc=None, ip=None)¶
Bases: _ods_ir
Available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scale.pk16.bf16.bf6'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk16Bf16Bf6OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk16Bf16Bf6OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cvt.scale.pk16.bf16.bf6'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scale_pk16_bf16_bf6(res, src, scale, scale_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk16Bf16Fp6Op(res, src, scale, scaleSel, *, loc=None, ip=None)¶
Bases: _ods_ir
Available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scale.pk16.bf16.fp6'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk16Bf16Fp6OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk16Bf16Fp6OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.cvt.scale.pk16.bf16.fp6'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scale_pk16_bf16_fp6(res, src, scale, scale_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk16F16Bf6Op(res, src, scale, scaleSel, *, loc=None, ip=None)¶
Bases: _ods_ir
Available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scale.pk16.f16.bf6'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk16F16Bf6OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk16F16Bf6OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scale.pk16.f16.bf6'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scale_pk16_f16_bf6(res, src, scale, scale_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk16F16Fp6Op(res, src, scale, scaleSel, *, loc=None, ip=None)¶
Bases:
_ods_ir
Available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scale.pk16.f16.fp6'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk16F16Fp6OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk16F16Fp6OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scale.pk16.f16.fp6'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scale_pk16_f16_fp6(res, src, scale, scale_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk16F32Bf6Op(res, src, scale, scaleSel, *, loc=None, ip=None)¶
Bases:
_ods_ir
Available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scale.pk16.f32.bf6'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk16F32Bf6OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk16F32Bf6OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scale.pk16.f32.bf6'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scale_pk16_f32_bf6(res, src, scale, scale_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk16F32Fp6Op(res, src, scale, scaleSel, *, loc=None, ip=None)¶
Bases:
_ods_ir
Available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scale.pk16.f32.fp6'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk16F32Fp6OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtPkScalePk16F32Fp6OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scale.pk16.f32.fp6'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- scaleSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scale_pk16_f32_fp6(res, src, scale, scale_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32F16Bf8Op(res, oldVdst, src, scale, srcSelIndex, dstLoHiSel, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert a bf8 byte from src, selected by srcSelIndex, to f16 while multiplying it by the exponent of scale, and place it into the dstLoHiSel-th element of oldVdst, preserving the other element of that vector in the return value.
The bytes are stored as an i32 and not a <4 x i8>.
- OPERATION_NAME = 'rocdl.cvt.scalef32.f16.bf8'¶
- _ODS_REGIONS = (0, True)¶
- oldVdst() _ods_ir[_ods_ir]¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcSelIndex() _ods_ir¶
- dstLoHiSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32F16Bf8OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32F16Bf8OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.f16.bf8'¶
- oldVdst() _ods_ir[_ods_ir]¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcSelIndex() _ods_ir¶
- dstLoHiSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_f16_bf8(res, old_vdst, src, scale, src_sel_index, dst_lo_hi_sel, *, loc=None, ip=None) _ods_ir¶
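The data flow described for rocdl.cvt.scalef32.f16.bf8 can be sketched in plain Python. This is a hypothetical reference model, not the generated binding: every function name below is made up for illustration. It assumes that byte 0 is the least significant byte of src, that bf8 uses the E5M2 layout, that "the exponent of scale" means the unbiased exponent field of its f32 encoding, and that dstLoHiSel = False selects element 0. Rounding of the product to f16 is not modelled.

```python
import math
import struct


def select_byte(src_i32: int, src_sel_index: int) -> int:
    """Return one byte of a 32-bit word (assumed: index 0 = least significant)."""
    if not 0 <= src_sel_index <= 3:
        raise ValueError("src_sel_index must be in 0..3")
    return (src_i32 >> (8 * src_sel_index)) & 0xFF


def decode_bf8(byte: int) -> float:
    """Decode an E5M2 byte by treating it as the upper byte of an IEEE f16."""
    return struct.unpack("<e", struct.pack("<H", (byte & 0xFF) << 8))[0]


def scale_exponent(scale: float) -> int:
    """Unbiased exponent field of scale viewed as an IEEE f32 (mantissa ignored)."""
    bits = struct.unpack("<I", struct.pack("<f", scale))[0]
    return ((bits >> 23) & 0xFF) - 127


def cvt_scalef32_f16_bf8_model(old_vdst, src_i32, scale, src_sel_index, dst_lo_hi_sel):
    """Hypothetical model: returns a new 2-element list, leaving old_vdst untouched."""
    value = math.ldexp(
        decode_bf8(select_byte(src_i32, src_sel_index)), scale_exponent(scale)
    )
    result = list(old_vdst)
    result[1 if dst_lo_hi_sel else 0] = value  # the other element is preserved
    return result
```

For example, with src = 0xC0443C40 the bytes decode (under the E5M2 assumption) to 2.0, 1.0, 4.0 and -2.0 from least to most significant, so selecting byte 0 with scale 4.0 writes 8.0 into the chosen slot and keeps the other slot as it was.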
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32F16Fp8Op(res, oldVdst, src, scale, srcSelIndex, dstLoHiSel, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert an fp8 byte from src, selected by srcSelIndex, to f16 while multiplying it by the exponent of scale, and place it into the dstLoHiSel-th element of oldVdst, preserving the other element of that vector in the return value.
The bytes are stored as an i32 and not a <4 x i8>.
- OPERATION_NAME = 'rocdl.cvt.scalef32.f16.fp8'¶
- _ODS_REGIONS = (0, True)¶
- oldVdst() _ods_ir[_ods_ir]¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcSelIndex() _ods_ir¶
- dstLoHiSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32F16Fp8OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32F16Fp8OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.f16.fp8'¶
- oldVdst() _ods_ir[_ods_ir]¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcSelIndex() _ods_ir¶
- dstLoHiSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_f16_fp8(res, old_vdst, src, scale, src_sel_index, dst_lo_hi_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32F32Bf8Op(res, src, scale, srcSelIndex, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert a bf8 byte from src, selected by srcSelIndex, to f32, multiplying it by the exponent of scale.
The bytes are stored in an i32, not a <4 x i8>.
- OPERATION_NAME = 'rocdl.cvt.scalef32.f32.bf8'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcSelIndex() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32F32Bf8OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32F32Bf8OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.f32.bf8'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcSelIndex() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_f32_bf8(res, src, scale, src_sel_index, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32F32Fp8Op(res, src, scale, srcSelIndex, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert an fp8 byte from src, selected by srcSelIndex, to f32, multiplying it by the exponent of scale.
The bytes are stored in an i32, not a <4 x i8>.
- OPERATION_NAME = 'rocdl.cvt.scalef32.f32.fp8'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcSelIndex() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32F32Fp8OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32F32Fp8OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.f32.fp8'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcSelIndex() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_f32_fp8(res, src, scale, src_sel_index, *, loc=None, ip=None) _ods_ir¶
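As a rough illustration of what converting an fp8 byte to f32 involves, the sketch below decodes one byte of an i32 and applies a power-of-two scale. It is not taken from the binding or the hardware specification: it assumes that fp8 means the OCP E4M3FN layout (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities), that byte 0 is the least significant byte, and that the scale contributes only its unbiased f32 exponent. The function names are hypothetical.

```python
import math
import struct


def decode_fp8_e4m3fn(byte: int) -> float:
    """Decode one byte under the assumed E4M3FN layout."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return math.nan  # the only NaN encoding in E4M3FN
    if exp == 0:
        return sign * math.ldexp(man, -9)  # subnormal: (man / 8) * 2**-6
    return sign * math.ldexp(8 + man, exp - 10)  # (1 + man / 8) * 2**(exp - 7)


def cvt_scalef32_f32_fp8_model(src_i32: int, scale: float, src_sel_index: int) -> float:
    """Hypothetical model of the byte select, decode and scale steps."""
    if not 0 <= src_sel_index <= 3:
        raise ValueError("src_sel_index must be in 0..3")
    byte = (src_i32 >> (8 * src_sel_index)) & 0xFF
    bits = struct.unpack("<I", struct.pack("<f", scale))[0]
    return math.ldexp(decode_fp8_e4m3fn(byte), ((bits >> 23) & 0xFF) - 127)
```

Other fp8 variants (for example the FNUZ encodings) use a different bias and NaN encoding, so the decode step would need to change if the target uses one of those.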
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Bf8Bf16Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 8 packed bf16 values to packed bf8, multiplying by the exponent part of scale before doing so. This op is for gfx1250+ arch.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk8.bf8.bf16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Bf8Bf16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Bf8Bf16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk8.bf8.bf16'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk8_bf8_bf16(res, src, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Bf8F16Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 8 packed f16 values to packed bf8, multiplying by the exponent part of scale before doing so. This op is for gfx1250+ arch.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk8.bf8.f16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Bf8F16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Bf8F16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk8.bf8.f16'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk8_bf8_f16(res, src, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Bf8F32Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 8 packed f32 values to packed bf8, multiplying by the exponent part of scale before doing so. This op is for gfx1250+ arch.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk8.bf8.f32'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Bf8F32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Bf8F32OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk8.bf8.f32'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk8_bf8_f32(res, src, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Fp4Bf16Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 8 packed bf16 values to packed fp4, multiplying by the exponent part of scale before doing so. This op is for gfx1250+ arch.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk8.fp4.bf16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Fp4Bf16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Fp4Bf16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk8.fp4.bf16'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk8_fp4_bf16(res, src, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Fp4F16Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 8 packed f16 values to packed fp4, multiplying by the exponent part of scale before doing so. This op is for gfx1250+ arch.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk8.fp4.f16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Fp4F16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Fp4F16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk8.fp4.f16'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk8_fp4_f16(res, src, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Fp4F32Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 8 packed f32 values to packed fp4, multiplying by the exponent part of scale before doing so. This op is for gfx1250+ arch.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk8.fp4.f32'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Fp4F32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Fp4F32OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk8.fp4.f32'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk8_fp4_f32(res, src, scale, *, loc=None, ip=None) _ods_ir¶
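To make "8 packed values to packed fp4" concrete, here is a small pure-Python sketch of scaling, quantizing and packing. It is an illustration under stated assumptions rather than a description of the instruction: fp4 is taken to be E2M1 with magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6; each value is rounded to the nearest representable magnitude (ties resolved arbitrarily toward the smaller code) and saturated at 6; element i is placed in bits 4i..4i+3 of a 32-bit word; and the scale is passed as an integer exponent scale_exp. NaN inputs are not handled, and all names are hypothetical.

```python
import math

# Magnitudes for the eight E2M1 codes 0b000 .. 0b111 (sign is a separate bit).
FP4_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def encode_fp4(value: float) -> int:
    """Return a 4-bit code: sign in bit 3, nearest E2M1 magnitude in bits 0..2."""
    sign = 0x8 if math.copysign(1.0, value) < 0 else 0x0
    magnitude = abs(value)
    code = min(range(8), key=lambda c: abs(FP4_MAGNITUDES[c] - magnitude))
    return sign | code


def pack_fp4x8(values, scale_exp: int = 0) -> int:
    """Scale 8 floats by 2**scale_exp, quantize them and pack them into 32 bits."""
    if len(values) != 8:
        raise ValueError("expected exactly 8 values")
    word = 0
    for i, value in enumerate(values):
        word |= encode_fp4(math.ldexp(value, scale_exp)) << (4 * i)
    return word
```

Eight 4-bit codes fill exactly one 32-bit word, which is why the sketch returns a single integer.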
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Fp8Bf16Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 8 packed bf16 values to packed fp8, multiplying by the exponent part of scale before doing so. This op is for gfx1250+ arch.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk8.fp8.bf16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Fp8Bf16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Fp8Bf16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk8.fp8.bf16'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk8_fp8_bf16(res, src, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Fp8F16Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 8 packed f16 values to packed fp8, multiplying by the exponent part of scale before doing so. This op is for gfx1250+ arch.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk8.fp8.f16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Fp8F16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Fp8F16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk8.fp8.f16'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk8_fp8_f16(res, src, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Fp8F32Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 8 packed f32 values to packed fp8, multiplying by the exponent part of scale before doing so. This op is for gfx1250+ arch.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk8.fp8.f32'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Fp8F32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk8Fp8F32OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk8.fp8.f32'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk8_fp8_f32(res, src, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk16Bf6Bf16Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 16 packed bf16 values to packed bf6, multiplying by the exponent part of scale before doing so. This op is for gfx1250+ arch.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk16.bf6.bf16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk16Bf6Bf16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk16Bf6Bf16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk16.bf6.bf16'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk16_bf6_bf16(res, src, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk16Bf6F16Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 16 packed f16 values to packed bf6, multiplying by the exponent part of scale before doing so. This op is for gfx1250+ arch.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk16.bf6.f16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk16Bf6F16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk16Bf6F16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk16.bf6.f16'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk16_bf6_f16(res, src, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk16Bf6F32Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 16 packed f32 values to packed bf6, multiplying by the exponent part of scale before doing so. This op is for gfx1250+ arch.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk16.bf6.f32'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk16Bf6F32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk16Bf6F32OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk16.bf6.f32'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk16_bf6_f32(res, src, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk16Fp6Bf16Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 16 packed bf16 values to packed fp6, multiplying by the exponent part of scale before doing so. This op is for gfx1250+ arch.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk16.fp6.bf16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk16Fp6Bf16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk16Fp6Bf16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk16.fp6.bf16'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk16_fp6_bf16(res, src, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk16Fp6F16Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 16 packed f16 values to packed fp6, multiplying by the exponent part of scale before doing so. This op is for gfx1250+ arch.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk16.fp6.f16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk16Fp6F16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk16Fp6F16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk16.fp6.f16'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk16_fp6_f16(res, src, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk16Fp6F32Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 16 packed f32 values to packed fp6, multiplying by the exponent part of scale before doing so. This op is for gfx1250+ arch.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk16.fp6.f32'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk16Fp6F32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk16Fp6F32OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk16.fp6.f32'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk16_fp6_f32(res, src, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32Bf6Bf16Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 32 packed bf16 values to packed bf6, dividing by the exponent part of scale before doing so.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk32.bf6.bf16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32Bf6Bf16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32Bf6Bf16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk32.bf6.bf16'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk32_bf6_bf16(res, src, scale, *, loc=None, ip=None) _ods_ir¶
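The pk32 conversions deal with 32 six-bit values at a time, which is 192 bits, or exactly six 32-bit words. The sketch below shows one way such a packing could be laid out in plain Python. The bit order (element i occupying bits 6i..6i+5, word 0 holding the lowest bits) and both function names are assumptions made for illustration; this reference does not state the actual layout or result type.

```python
def pack_6bit_codes(codes):
    """Pack 32 six-bit codes into six 32-bit words (assumed little-endian bit order)."""
    if len(codes) != 32:
        raise ValueError("expected exactly 32 codes")
    big = 0
    for i, code in enumerate(codes):
        if not 0 <= code < 64:
            raise ValueError("each code must fit in 6 bits")
        big |= code << (6 * i)
    return [(big >> (32 * w)) & 0xFFFFFFFF for w in range(6)]


def unpack_6bit_codes(words):
    """Inverse of pack_6bit_codes: recover the 32 six-bit codes."""
    if len(words) != 6:
        raise ValueError("expected exactly 6 words")
    big = 0
    for w, word in enumerate(words):
        big |= (word & 0xFFFFFFFF) << (32 * w)
    return [(big >> (6 * i)) & 0x3F for i in range(32)]
```

Because 6 does not divide 32, some elements straddle a word boundary; in this layout element 5 occupies bits 30..35 and is therefore split between words 0 and 1.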
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32Bf6F16Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 32 packed f16 values to packed bf6, dividing by the exponent part of scale before doing so.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk32.bf6.f16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32Bf6F16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32Bf6F16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk32.bf6.f16'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk32_bf6_f16(res, src, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32Bf16Bf6Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 32 packed bf6 values to packed bf16, multiplying by the exponent part of scale before doing so.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk32.bf16.bf6'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32Bf16Bf6OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32Bf16Bf6OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk32.bf16.bf6'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk32_bf16_bf6(res, src, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32Bf16Fp6Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 32 packed fp6 values to packed bf16, multiplying by the exponent part of scale before doing so.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk32.bf16.fp6'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32Bf16Fp6OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32Bf16Fp6OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk32.bf16.fp6'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk32_bf16_fp6(res, src, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32F16Bf6Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 32 packed bf6 values to packed f16, multiplying by the exponent part of scale before doing so.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk32.f16.bf6'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32F16Bf6OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32F16Bf6OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk32.f16.bf6'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk32_f16_bf6(res, src, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32F16Fp6Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 32 packed fp6 values to packed f16, multiplying by the exponent part of scale before doing so.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk32.f16.fp6'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32F16Fp6OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32F16Fp6OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk32.f16.fp6'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk32_f16_fp6(res, src, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32F32Bf6Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 32 packed bf6 values to packed f32, multiplying by the exponent part of scale before doing so.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk32.f32.bf6'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32F32Bf6OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32F32Bf6OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk32.f32.bf6'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk32_f32_bf6(res, src, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32F32Fp6Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 32 packed fp6 values to packed f32, multiplying by the exponent part of scale before doing so.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk32.f32.fp6'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32F32Fp6OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32F32Fp6OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk32.f32.fp6'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk32_f32_fp6(res, src, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32Fp6Bf16Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 32 packed bf16 values to packed fp6, dividing by the exponent part of scale before doing so.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk32.fp6.bf16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32Fp6Bf16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32Fp6Bf16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk32.fp6.bf16'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk32_fp6_bf16(res, src, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32Fp6F16Op(res, src, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 32 packed f16 values to packed fp6, dividing by the exponent part of scale before doing so.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk32.fp6.f16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32Fp6F16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32Pk32Fp6F16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk32.fp6.f16'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk32_fp6_f16(res, src, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkBf8Bf16Op(res, oldVdst, src0, scale, dstLoHiSel, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert two bf16 values in src0 to two bf8 bytes, dividing by the exponent in scale. The bytes are packed into a 16-bit value which is inserted into oldVdst at the dstLoHiSel position, with the entire updated vector being returned.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.bf8.bf16'¶
- _ODS_REGIONS = (0, True)¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstLoHiSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkBf8Bf16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkBf8Bf16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.bf8.bf16'¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstLoHiSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk_bf8_bf16(res, old_vdst, src0, scale, dst_lo_hi_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkBf8F16Op(res, oldVdst, src0, scale, dstLoHiSel, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert two f16 values in src0 to two bf8 bytes, dividing by the exponent in scale. The bytes are packed into a 16-bit value which is inserted into oldVdst at the dstLoHiSel position, with the entire updated vector being returned.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.bf8.f16'¶
- _ODS_REGIONS = (0, True)¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstLoHiSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkBf8F16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkBf8F16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.bf8.f16'¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstLoHiSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk_bf8_f16(res, old_vdst, src0, scale, dst_lo_hi_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkBf8F32Op(res, oldVdst, src0, src1, scale, dstLoHiSel, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert two f32 values in src0 and src1 to two bf8 bytes, dividing by the exponent in scale. The bytes are packed into a 16-bit value which is inserted into oldVdst at the dstLoHiSel position, with the entire updated vector being returned.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.bf8.f32'¶
- _ODS_REGIONS = (0, True)¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir[_ods_ir]¶
- src1() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstLoHiSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkBf8F32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkBf8F32OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.bf8.f32'¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir[_ods_ir]¶
- src1() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstLoHiSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk_bf8_f32(res, old_vdst, src0, src1, scale, dst_lo_hi_sel, *, loc=None, ip=None) _ods_ir¶
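The pack-and-insert behaviour described for rocdl.cvt.scalef32.pk.bf8.f32 can be approximated in plain Python as follows. This is a sketch under several assumptions, none of which come from this reference: bf8 is E5M2; each scaled input is first rounded to f16 and then truncated to its upper byte (the real rounding mode may differ); src0 lands in the low byte of the 16-bit pair; oldVdst is modelled as a list of two 16-bit integers with dstLoHiSel = False selecting element 0; and the scale is given as an integer exponent scale_exp. The function names are hypothetical.

```python
import math
import struct


def encode_bf8_truncating(value: float) -> int:
    """E5M2 byte obtained by converting to f16 and keeping the upper 8 bits.

    struct raises OverflowError for magnitudes too large for f16.
    """
    half_bits = struct.unpack("<H", struct.pack("<e", value))[0]
    return half_bits >> 8


def cvt_scalef32_pk_bf8_f32_model(old_vdst, src0, src1, scale_exp, dst_lo_hi_sel):
    """Hypothetical model: divide by 2**scale_exp, encode, pack, insert."""
    lo = encode_bf8_truncating(math.ldexp(src0, -scale_exp))
    hi = encode_bf8_truncating(math.ldexp(src1, -scale_exp))
    packed = lo | (hi << 8)  # two bytes form one 16-bit value
    result = list(old_vdst)
    result[1 if dst_lo_hi_sel else 0] = packed  # the other half is preserved
    return result
```

For instance, inputs 4.0 and -8.0 with scale_exp = 2 become 1.0 and -2.0, which encode to 0x3C and 0xC0 and pack to 0xC03C.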
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkBf16Bf8Op(res, src, scale, srcLoHiSel, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert two packed bf8 values in src to two bf16 values, multiplying by the exponent in scale. The two values to be converted are selected from the low or high half of src (a packed vector represented as an i32) on the basis of srcLoHiSel.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.bf16.bf8'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcLoHiSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkBf16Bf8OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkBf16Bf8OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.bf16.bf8'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcLoHiSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk_bf16_bf8(res, src, scale, src_lo_hi_sel, *, loc=None, ip=None) _ods_ir¶
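The half-word selection used by rocdl.cvt.scalef32.pk.bf16.bf8 can be illustrated with a short Python model. It assumes, without confirmation from this reference, that srcLoHiSel = False picks bits 0..15 and True picks bits 16..31, that the lower byte of the chosen half is the first result element, that bf8 is E5M2, and that the scale is supplied as an integer exponent. Rounding to bf16 is not modelled, and the names are hypothetical.

```python
import math
import struct


def decode_bf8(byte: int) -> float:
    """Decode an E5M2 byte by treating it as the upper byte of an IEEE f16."""
    return struct.unpack("<e", struct.pack("<H", (byte & 0xFF) << 8))[0]


def cvt_scalef32_pk_bf16_bf8_model(src_i32: int, scale_exp: int, src_lo_hi_sel: bool):
    """Hypothetical model: pick a 16-bit half, decode both bytes, scale them."""
    half = (src_i32 >> (16 if src_lo_hi_sel else 0)) & 0xFFFF
    low_byte, high_byte = half & 0xFF, half >> 8
    return [math.ldexp(decode_bf8(b), scale_exp) for b in (low_byte, high_byte)]
```

With src = 0xC0443C40 the low half holds the bytes 0x40 and 0x3C (2.0 and 1.0 under E5M2) and the high half holds 0x44 and 0xC0 (4.0 and -2.0).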
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkBf16Fp4Op(res, src, scale, srcSelIndex, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert two packed fp4 (f4E2M1) values stored as one byte of a 32-bit integer to packed bf16, multiplying by the exponent part of scale before doing so.

The byte to convert is chosen by srcSelIndex.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.bf16.fp4'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcSelIndex() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkBf16Fp4OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkBf16Fp4OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.bf16.fp4'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcSelIndex() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk_bf16_fp4(res, src, scale, src_sel_index, *, loc=None, ip=None) _ods_ir¶
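A rough numerical model of the fp4 up-conversion may help when checking expected values. Everything here is a sketch under stated assumptions: the function names are hypothetical, the low nibble is assumed to be the first element, "the exponent part of scale" is read as 2 raised to the unbiased exponent field of scale viewed as an f32, and special cases (zero, denormal, infinite or NaN scale) are not modelled.

```python
import struct

# Magnitudes of the eight non-negative f4E2M1 codes
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def decode_fp4(nibble: int) -> float:
    """Decode one 4-bit f4E2M1 code to a Python float."""
    magnitude = E2M1_MAGNITUDES[nibble & 0x7]
    return -magnitude if nibble & 0x8 else magnitude


def scale_exponent(scale: float) -> float:
    """Return 2**e, where e is the unbiased exponent field of scale as an f32."""
    bits = struct.unpack("<I", struct.pack("<f", scale))[0]
    return 2.0 ** (((bits >> 23) & 0xFF) - 127)


def cvt_pk_fp4_model(src: int, scale: float, src_sel_index: int) -> tuple[float, float]:
    """Model: pick one byte of src, decode its two nibbles, apply the scale."""
    byte = (src >> (8 * src_sel_index)) & 0xFF
    factor = scale_exponent(scale)
    # Assumption: the low nibble is the first element of the pair.
    return decode_fp4(byte & 0xF) * factor, decode_fp4(byte >> 4) * factor


pair = cvt_pk_fp4_model(0x00007200, 2.0, 1)
```

Note that under this reading only the exponent field of scale matters, so a scale of 6.0 behaves like 4.0.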
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkBf16Fp8Op(res, src, scale, srcLoHiSel, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert two packed fp8 values in src to two bf16 values, multiplying by the exponent in scale. The two values to be converted are selected from the low or high half of src (a packed vector represented as an i32) on the basis of srcLoHiSel.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.bf16.fp8'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcLoHiSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkBf16Fp8OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkBf16Fp8OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.bf16.fp8'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcLoHiSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk_bf16_fp8(res, src, scale, src_lo_hi_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkF16Bf8Op(res, src, scale, srcLoHiSel, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert two packed bf8 values in src to two f16 values, multiplying by the exponent in scale. The two values to be converted are selected from the low or high half of src (a packed vector represented as an i32) on the basis of srcLoHiSel.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.f16.bf8'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcLoHiSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkF16Bf8OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkF16Bf8OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.f16.bf8'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcLoHiSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk_f16_bf8(res, src, scale, src_lo_hi_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkF16Fp4Op(res, src, scale, srcSelIndex, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert two packed fp4 (f4E2M1) values stored as one byte of a 32-bit integer to packed f16, multiplying by the exponent part of scale before doing so.

The byte to convert is chosen by srcSelIndex.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.f16.fp4'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcSelIndex() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkF16Fp4OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkF16Fp4OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.f16.fp4'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcSelIndex() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk_f16_fp4(res, src, scale, src_sel_index, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkF16Fp8Op(res, src, scale, srcLoHiSel, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert two packed fp8 values in src to two f16 values, multiplying by the exponent in scale. The two values to be converted are selected from the low or high half of src (a packed vector represented as an i32) on the basis of srcLoHiSel.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.f16.fp8'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcLoHiSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkF16Fp8OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkF16Fp8OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.f16.fp8'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcLoHiSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk_f16_fp8(res, src, scale, src_lo_hi_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkF32Bf8Op(res, src, scale, srcLoHiSel, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert two packed bf8 values in src to two f32 values, multiplying by the exponent in scale. The two values to be converted are selected from the low or high half of src (a packed vector represented as an i32) on the basis of srcLoHiSel.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.f32.bf8'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcLoHiSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkF32Bf8OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkF32Bf8OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.f32.bf8'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcLoHiSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk_f32_bf8(res, src, scale, src_lo_hi_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkF32Fp4Op(res, src, scale, srcSelIndex, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert two packed fp4 (f4E2M1) values stored as one byte of a 32-bit integer to packed f32, multiplying by the exponent part of scale before doing so.

The byte to convert is chosen by srcSelIndex.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.f32.fp4'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcSelIndex() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkF32Fp4OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkF32Fp4OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.f32.fp4'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcSelIndex() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk_f32_fp4(res, src, scale, src_sel_index, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkF32Fp8Op(res, src, scale, srcLoHiSel, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert two packed fp8 values in src to two f32 values, multiplying by the exponent in scale. The two values to be converted are selected from the low or high half of src (a packed vector represented as an i32) on the basis of srcLoHiSel.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.f32.fp8'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcLoHiSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkF32Fp8OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkF32Fp8OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.f32.fp8'¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- srcLoHiSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk_f32_fp8(res, src, scale, src_lo_hi_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkFp4Bf16Op(res, oldVdst, src, scale, dstSelIndex, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert two packed bf16 values to packed fp4, dividing by the exponent part of scale before doing so.

The two scaled values are packed into a byte. That byte is used to update the dstSelIndex-th byte of oldVdst, which is returned in its entirety.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.fp4.bf16'¶
- _ODS_REGIONS = (0, True)¶
- oldVdst() _ods_ir[_ods_ir]¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstSelIndex() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkFp4Bf16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkFp4Bf16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.fp4.bf16'¶
- oldVdst() _ods_ir[_ods_ir]¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstSelIndex() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk_fp4_bf16(res, old_vdst, src, scale, dst_sel_index, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkFp4F16Op(res, oldVdst, src, scale, dstSelIndex, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert two packed f16 values to packed fp4, dividing by the exponent part of scale before doing so.

The two scaled values are packed into a byte. That byte is used to update the dstSelIndex-th byte of oldVdst, which is returned in its entirety.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.fp4.f16'¶
- _ODS_REGIONS = (0, True)¶
- oldVdst() _ods_ir[_ods_ir]¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstSelIndex() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkFp4F16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkFp4F16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.fp4.f16'¶
- oldVdst() _ods_ir[_ods_ir]¶
- src() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstSelIndex() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk_fp4_f16(res, old_vdst, src, scale, dst_sel_index, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkFp4F32Op(res, oldVdst, src0, src1, scale, dstSelIndex, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert two single-precision float values, passed in src0 and src1, into two fp4 values, dividing them by the exponent part of scale before doing so.

The two scaled values are packed into a byte. That byte is used to update the dstSelIndex-th byte of oldVdst, which is returned in its entirety.

Example:

    // Scaled convert two f32 values to packed fp4 in byte 0 of old.
    %0 = rocdl.cvt.scalef32.pk.fp4.f32 %a, %b, %scale -> %old[0] : i32
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.fp4.f32'¶
- _ODS_REGIONS = (0, True)¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir[_ods_ir]¶
- src1() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstSelIndex() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkFp4F32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkFp4F32OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.fp4.f32'¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir[_ods_ir]¶
- src1() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstSelIndex() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk_fp4_f32(res, old_vdst, src0, src1, scale, dst_sel_index, *, loc=None, ip=None) _ods_ir¶
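The byte update performed by the fp4 down-conversions can be modelled as below. This is a sketch of the packing only, not of the rounding: `pack_fp4_pair` and `insert_byte` are hypothetical helper names, the placement of the first value in the low nibble is an assumption, and the inputs are already-encoded 4-bit codes.

```python
def pack_fp4_pair(lo_code: int, hi_code: int) -> int:
    """Pack two 4-bit fp4 codes into one byte."""
    # Assumption: the first converted value lands in the low nibble.
    return (lo_code & 0xF) | ((hi_code & 0xF) << 4)


def insert_byte(old_vdst: int, byte: int, dst_sel_index: int) -> int:
    """Return the 32-bit word old_vdst with byte number dst_sel_index replaced."""
    if not 0 <= dst_sel_index <= 3:
        raise ValueError("dst_sel_index must be in 0..3 for a 32-bit word")
    shift = 8 * dst_sel_index
    # Clear the selected byte, keep the other three, then write the new byte.
    cleared = old_vdst & ~(0xFF << shift) & 0xFFFFFFFF
    return cleared | ((byte & 0xFF) << shift)


byte = pack_fp4_pair(0x2, 0x7)
updated = insert_byte(0xAABBCCDD, byte, dst_sel_index=0)
```

Chaining four such updates with indices 0 through 3 fills a whole 32-bit word with eight fp4 values, which is presumably why the op returns the full updated word.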
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkFp8Bf16Op(res, oldVdst, src0, scale, dstLoHiSel, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert two bf16 values in src0 to two fp8 bytes, dividing by the exponent in scale. The bytes are packed into a 16-bit value which is inserted into oldVdst at the dstLoHiSel position, with the entire updated vector being returned.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.fp8.bf16'¶
- _ODS_REGIONS = (0, True)¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstLoHiSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkFp8Bf16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkFp8Bf16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.fp8.bf16'¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstLoHiSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk_fp8_bf16(res, old_vdst, src0, scale, dst_lo_hi_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkFp8F16Op(res, oldVdst, src0, scale, dstLoHiSel, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert two f16 values in src0 to two fp8 bytes, dividing by the exponent in scale. The bytes are packed into a 16-bit value which is inserted into oldVdst at the dstLoHiSel position, with the entire updated vector being returned.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.fp8.f16'¶
- _ODS_REGIONS = (0, True)¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstLoHiSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkFp8F16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkFp8F16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.fp8.f16'¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstLoHiSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk_fp8_f16(res, old_vdst, src0, scale, dst_lo_hi_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkFp8F32Op(res, oldVdst, src0, src1, scale, dstLoHiSel, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert two f32 values in src0 and src1 to two fp8 bytes, dividing by the exponent in scale. The bytes are packed into a 16-bit value which is inserted into oldVdst at the dstLoHiSel position, with the entire updated vector being returned.
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.fp8.f32'¶
- _ODS_REGIONS = (0, True)¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir[_ods_ir]¶
- src1() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstLoHiSel() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkFp8F32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32PkFp8F32OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.pk.fp8.f32'¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir[_ods_ir]¶
- src1() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstLoHiSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_pk_fp8_f32(res, old_vdst, src0, src1, scale, dst_lo_hi_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrBf8BF16Op(res, oldVdst, src0, seed, scale, dstSelIndex, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert a bf16 value in src0 to a bf8 byte, dividing by the exponent in scale and using seed for stochastic rounding. Place the resulting byte in the dstSelIndex-th byte of oldVdst and return the entire packed vector, which is stored as an i32.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.bf8.bf16'¶
- _ODS_REGIONS = (0, True)¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstSelIndex() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrBf8BF16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrBf8BF16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.bf8.bf16'¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstSelIndex() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_bf8_bf16(res, old_vdst, src0, seed, scale, dst_sel_index, *, loc=None, ip=None) _ods_ir¶
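For readers unfamiliar with stochastic rounding, the idea can be illustrated on a uniform grid. This is a generic textbook sketch, not the hardware algorithm: the function `stochastic_round` is hypothetical, a real 8-bit float grid is not uniform, and how the op derives its random bits from seed is not described on this page, so the parameter `u` simply stands in for a uniform random number in [0, 1).

```python
import math


def stochastic_round(x: float, step: float, u: float) -> float:
    """Round x to a multiple of step; u in [0, 1) decides the direction.

    The value rounds up with probability equal to its fractional position
    between the two neighbouring grid points, so the result is unbiased
    when u is uniformly distributed.
    """
    return math.floor(x / step + u) * step


# 1.25 sits a quarter of the way from 1.0 to 2.0, so it rounds up only
# for the top quarter of the u range.
down = stochastic_round(1.25, 1.0, 0.5)
up = stochastic_round(1.25, 1.0, 0.75)

# Averaging over an even sweep of u recovers the original value.
mean = sum(stochastic_round(1.25, 1.0, i / 1000) for i in range(1000)) / 1000
```

The averaging step shows why this rounding mode is attractive for low-precision formats: individual results are coarse, but the error does not accumulate in one direction.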
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrBf8F16Op(res, oldVdst, src0, seed, scale, dstSelIndex, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert an f16 value in src0 to a bf8 byte, dividing by the exponent in scale and using seed for stochastic rounding. Place the resulting byte in the dstSelIndex-th byte of oldVdst and return the entire packed vector, which is stored as an i32.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.bf8.f16'¶
- _ODS_REGIONS = (0, True)¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstSelIndex() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrBf8F16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrBf8F16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.bf8.f16'¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstSelIndex() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_bf8_f16(res, old_vdst, src0, seed, scale, dst_sel_index, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrBf8F32Op(res, oldVdst, src0, seed, scale, dstSelIndex, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert an f32 value in src0 to a bf8 byte, dividing by the exponent in scale and using seed for stochastic rounding. Place the resulting byte in the dstSelIndex-th byte of oldVdst and return the entire packed vector, which is stored as an i32.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.bf8.f32'¶
- _ODS_REGIONS = (0, True)¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstSelIndex() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrBf8F32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrBf8F32OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.bf8.f32'¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstSelIndex() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_bf8_f32(res, old_vdst, src0, seed, scale, dst_sel_index, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrFp8BF16Op(res, oldVdst, src0, seed, scale, dstSelIndex, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert a bf16 value in src0 to an fp8 byte, dividing by the exponent in scale and using seed for stochastic rounding. Place the resulting byte in the dstSelIndex-th byte of oldVdst and return the entire packed vector, which is stored as an i32.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.fp8.bf16'¶
- _ODS_REGIONS = (0, True)¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstSelIndex() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrFp8BF16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrFp8BF16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.fp8.bf16'¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstSelIndex() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_fp8_bf16(res, old_vdst, src0, seed, scale, dst_sel_index, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrFp8F16Op(res, oldVdst, src0, seed, scale, dstSelIndex, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert an f16 value in src0 to an fp8 byte, dividing by the exponent in scale and using seed for stochastic rounding. Place the resulting byte in the dstSelIndex-th byte of oldVdst and return the entire packed vector, which is stored as an i32.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.fp8.f16'¶
- _ODS_REGIONS = (0, True)¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstSelIndex() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrFp8F16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrFp8F16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.fp8.f16'¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstSelIndex() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_fp8_f16(res, old_vdst, src0, seed, scale, dst_sel_index, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrFp8F32Op(res, oldVdst, src0, seed, scale, dstSelIndex, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert an f32 value in src0 to an fp8 byte, dividing by the exponent in scale and using seed for stochastic rounding. Place the resulting byte in the dstSelIndex-th byte of oldVdst and return the entire packed vector, which is stored as an i32.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.fp8.f32'¶
- _ODS_REGIONS = (0, True)¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstSelIndex() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrFp8F32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrFp8F32OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.fp8.f32'¶
- oldVdst() _ods_ir[_ods_ir]¶
- src0() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstSelIndex() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_fp8_f32(res, old_vdst, src0, seed, scale, dst_sel_index, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Bf8Bf16Op(res, src, seed, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 8 packed bf16 values to packed bf8, multiplying by the exponent part of scale before doing so and applying stochastic rounding. This op is available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk8.bf8.bf16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Bf8Bf16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Bf8Bf16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk8.bf8.bf16'¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_pk8_bf8_bf16(res, src, seed, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Bf8F16Op(res, src, seed, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 8 packed f16 values to packed bf8, multiplying by the exponent part of scale before doing so and applying stochastic rounding. This op is available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk8.bf8.f16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Bf8F16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Bf8F16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk8.bf8.f16'¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_pk8_bf8_f16(res, src, seed, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Bf8F32Op(res, src, seed, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 8 packed f32 values to packed bf8, multiplying by the exponent part of scale before doing so and applying stochastic rounding. This op is available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk8.bf8.f32'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Bf8F32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Bf8F32OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk8.bf8.f32'¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_pk8_bf8_f32(res, src, seed, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Fp4Bf16Op(res, src, seed, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 8 packed bf16 values to packed fp4, multiplying by the exponent part of scale before doing so and applying stochastic rounding. This op is available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk8.fp4.bf16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Fp4Bf16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Fp4Bf16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk8.fp4.bf16'¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_pk8_fp4_bf16(res, src, seed, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Fp4F16Op(res, src, seed, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 8 packed f16 values to packed fp4, multiplying by the exponent part of scale before doing so and applying stochastic rounding. This op is available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk8.fp4.f16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Fp4F16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Fp4F16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk8.fp4.f16'¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_pk8_fp4_f16(res, src, seed, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Fp4F32Op(res, src, seed, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 8 packed f32 values to packed fp4, multiplying by the exponent part of scale before doing so and applying stochastic rounding. This op is available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk8.fp4.f32'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Fp4F32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Fp4F32OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk8.fp4.f32'¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_pk8_fp4_f32(res, src, seed, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Fp8Bf16Op(res, src, seed, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 8 packed bf16 values to packed fp8, multiplying by the exponent part of scale before doing so and applying stochastic rounding. This op is available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk8.fp8.bf16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Fp8Bf16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Fp8Bf16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk8.fp8.bf16'¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_pk8_fp8_bf16(res, src, seed, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Fp8F16Op(res, src, seed, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 8 packed f16 values to packed fp8, multiplying by the exponent part of scale before doing so and applying stochastic rounding. This op is available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk8.fp8.f16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Fp8F16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Fp8F16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk8.fp8.f16'¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_pk8_fp8_f16(res, src, seed, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Fp8F32Op(res, src, seed, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 8 packed f32 values to packed fp8, multiplying by the exponent part of scale before doing so and applying stochastic rounding. This op is available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk8.fp8.f32'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Fp8F32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk8Fp8F32OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk8.fp8.f32'¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_pk8_fp8_f32(res, src, seed, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk16Bf6Bf16Op(res, src, seed, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 16 packed bf16 values to packed bf6, multiplying by the exponent part of scale before doing so and applying stochastic rounding. This op is available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk16.bf6.bf16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk16Bf6Bf16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk16Bf6Bf16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk16.bf6.bf16'¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_pk16_bf6_bf16(res, src, seed, scale, *, loc=None, ip=None) _ods_ir¶
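The storage cost of a packed 6-bit result is easy to work out by hand. Assuming, as the pk16 in the op name suggests, that sixteen values are produced, the result occupies 16 × 6 = 96 bits, that is, three 32-bit words. The sketch below packs sixteen 6-bit codes that way. The helper `pack_6bit` is hypothetical, and the element order (element 0 in the lowest bits of the first word) is an assumption, not something this page specifies.

```python
def pack_6bit(codes: list[int]) -> list[int]:
    """Pack sixteen 6-bit codes into three 32-bit words."""
    if len(codes) != 16:
        raise ValueError("expected exactly 16 codes")
    acc = 0
    for i, code in enumerate(codes):
        # Assumption: element i occupies bits 6*i .. 6*i+5 of the 96-bit value.
        acc |= (code & 0x3F) << (6 * i)
    # Split the 96-bit accumulator into three words, lowest word first.
    return [(acc >> (32 * w)) & 0xFFFFFFFF for w in range(3)]


words = pack_6bit([0] * 5 + [0x3F] + [0] * 10)
```

Element 5 in this example straddles a word boundary (bits 30 to 35), which shows that 6-bit elements cannot be addressed by simple byte indexing the way the 8-bit and 4-bit formats can.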
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk16Bf6F16Op(res, src, seed, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 16 packed f16 values to packed bf6, multiplying by the exponent part of scale before doing so and applying stochastic rounding. This op is available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk16.bf6.f16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk16Bf6F16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk16Bf6F16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk16.bf6.f16'¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_pk16_bf6_f16(res, src, seed, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk16Bf6F32Op(res, src, seed, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 16 packed f32 values to packed bf6, multiplying by the exponent part of scale before doing so and applying stochastic rounding. Available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk16.bf6.f32'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk16Bf6F32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk16Bf6F32OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk16.bf6.f32'¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_pk16_bf6_f32(res, src, seed, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk16Fp6Bf16Op(res, src, seed, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 16 packed bf16 values to packed fp6, multiplying by the exponent part of scale before doing so and applying stochastic rounding. Available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk16.fp6.bf16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk16Fp6Bf16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk16Fp6Bf16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk16.fp6.bf16'¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_pk16_fp6_bf16(res, src, seed, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk16Fp6F16Op(res, src, seed, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 16 packed f16 values to packed fp6, multiplying by the exponent part of scale before doing so and applying stochastic rounding. Available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk16.fp6.f16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk16Fp6F16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk16Fp6F16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk16.fp6.f16'¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_pk16_fp6_f16(res, src, seed, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk16Fp6F32Op(res, src, seed, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 16 packed f32 values to packed fp6, multiplying by the exponent part of scale before doing so and applying stochastic rounding. Available on gfx1250+.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk16.fp6.f32'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk16Fp6F32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk16Fp6F32OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk16.fp6.f32'¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_pk16_fp6_f32(res, src, seed, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk32Bf6Bf16Op(res, src, seed, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 32 packed bf16 values to packed bf6, dividing by the exponent part of scale before doing so and applying random rounding derived from seed.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk32.bf6.bf16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk32Bf6Bf16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk32Bf6Bf16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk32.bf6.bf16'¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_pk32_bf6_bf16(res, src, seed, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk32Bf6F16Op(res, src, seed, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 32 packed f16 values to packed bf6, dividing by the exponent part of scale before doing so and applying random rounding derived from seed.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk32.bf6.f16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk32Bf6F16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk32Bf6F16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk32.bf6.f16'¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_pk32_bf6_f16(res, src, seed, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk32Bf6F32Op(res, src, seed, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 32 packed f32 values to packed bf6, dividing by the exponent part of scale before doing so and applying random rounding derived from seed.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk32.bf6.f32'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk32Bf6F32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk32Bf6F32OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk32.bf6.f32'¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_pk32_bf6_f32(res, src, seed, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk32Fp6Bf16Op(res, src, seed, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 32 packed bf16 values to packed fp6, dividing by the exponent part of scale before doing so and applying random rounding derived from seed.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk32.fp6.bf16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk32Fp6Bf16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk32Fp6Bf16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk32.fp6.bf16'¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_pk32_fp6_bf16(res, src, seed, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk32Fp6F16Op(res, src, seed, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 32 packed f16 values to packed fp6, dividing by the exponent part of scale before doing so and applying random rounding derived from seed.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk32.fp6.f16'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk32Fp6F16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk32Fp6F16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk32.fp6.f16'¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_pk32_fp6_f16(res, src, seed, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk32Fp6F32Op(res, src, seed, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 32 packed f32 values to packed fp6, dividing by the exponent part of scale before doing so and applying random rounding derived from seed.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk32.fp6.f32'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk32Fp6F32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPk32Fp6F32OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk32.fp6.f32'¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_pk32_fp6_f32(res, src, seed, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPkFp4Bf16Op(res, oldVdst, src, seed, scale, dstSelIndex, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert two packed bf16 values to packed fp4, dividing by the exponent part of scale before doing so and using seed as the random seed for stochastic rounding.
The two scaled values are packed (little-endian) into a byte. That byte is used to update the dstSelIndex-th byte of oldVdst, which is returned in its entirety.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk.fp4.bf16'¶
- _ODS_REGIONS = (0, True)¶
- oldVdst() _ods_ir[_ods_ir]¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstSelIndex() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPkFp4Bf16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPkFp4Bf16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk.fp4.bf16'¶
- oldVdst() _ods_ir[_ods_ir]¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstSelIndex() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_pk_fp4_bf16(res, old_vdst, src, seed, scale, dst_sel_index, *, loc=None, ip=None) _ods_ir¶
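The byte update described for the packed fp4 conversions can be illustrated with a small pure-Python model. This is only a sketch under stated assumptions: oldVdst is treated as a 32-bit word, "little-endian" is taken to mean that the first converted value lands in the low nibble, and the two 4-bit fp4 codes are assumed to be already computed. The helper names pack_fp4_pair and update_byte are hypothetical and are not part of the bindings.

```python
def pack_fp4_pair(first_code: int, second_code: int) -> int:
    """Pack two 4-bit codes into one byte (assumed: first value in the low nibble)."""
    return (first_code & 0xF) | ((second_code & 0xF) << 4)


def update_byte(old_vdst: int, byte: int, dst_sel_index: int) -> int:
    """Return old_vdst (assumed 32-bit) with byte number dst_sel_index replaced."""
    if not 0 <= dst_sel_index <= 3:
        raise ValueError("dst_sel_index must be in 0..3 for a 32-bit word")
    shift = 8 * dst_sel_index
    # Clear the selected byte, then insert the new one; other bytes are preserved.
    cleared = old_vdst & ~(0xFF << shift) & 0xFFFFFFFF
    return cleared | ((byte & 0xFF) << shift)


packed = pack_fp4_pair(0x1, 0x2)
result = update_byte(0xAABBCCDD, packed, 2)
print(hex(packed), hex(result))
```

The three bytes not selected by the index come back unchanged, which matches the statement that the whole of oldVdst is returned.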
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPkFp4F16Op(res, oldVdst, src, seed, scale, dstSelIndex, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert two packed f16 values to packed fp4, dividing by the exponent part of scale before doing so and using seed as the random seed for stochastic rounding.
The two scaled values are packed (little-endian) into a byte. That byte is used to update the dstSelIndex-th byte of oldVdst, which is returned in its entirety.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk.fp4.f16'¶
- _ODS_REGIONS = (0, True)¶
- oldVdst() _ods_ir[_ods_ir]¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstSelIndex() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPkFp4F16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPkFp4F16OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk.fp4.f16'¶
- oldVdst() _ods_ir[_ods_ir]¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstSelIndex() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_pk_fp4_f16(res, old_vdst, src, seed, scale, dst_sel_index, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPkFp4F32Op(res, oldVdst, src, seed, scale, dstSelIndex, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert two packed f32 values to packed fp4, dividing by the exponent part of scale before doing so and using seed as the random seed for stochastic rounding.
The two scaled values are packed (little-endian) into a byte. That byte is used to update the dstSelIndex-th byte of oldVdst, which is returned in its entirety.
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk.fp4.f32'¶
- _ODS_REGIONS = (0, True)¶
- oldVdst() _ods_ir[_ods_ir]¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstSelIndex() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPkFp4F32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF32SrPkFp4F32OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.sr.pk.fp4.f32'¶
- oldVdst() _ods_ir[_ods_ir]¶
- src() _ods_ir[_ods_ir]¶
- seed() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- dstSelIndex() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_sr_pk_fp4_f32(res, old_vdst, src, seed, scale, dst_sel_index, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF322xPk16Bf6F32Op(res, src0, src1, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 32 single-precision float values, packed into two length-16 vectors that will be logically concatenated, to packed bf6, dividing by the exponent part of scale before doing so.
- OPERATION_NAME = 'rocdl.cvt.scalef32.2xpk16.bf6.f32'¶
- _ODS_REGIONS = (0, True)¶
- src0() _ods_ir[_ods_ir]¶
- src1() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF322xPk16Bf6F32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF322xPk16Bf6F32OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.2xpk16.bf6.f32'¶
- src0() _ods_ir[_ods_ir]¶
- src1() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_2xpk16_bf6_f32(res, src0, src1, scale, *, loc=None, ip=None) _ods_ir¶
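The scaling step of the 2xpk16 conversions can be sketched in plain Python. The sketch assumes that "the exponent part of scale" means the power of two 2^(E - 127) taken from the biased exponent field E of the f32 scale, ignores special encodings (zero, subnormals, infinities, NaN), and leaves out the final quantization to 6-bit values. The helper names exponent_scale and scale_2xpk16 are hypothetical.

```python
import struct


def exponent_scale(scale: float) -> float:
    """Power of two given by the f32 exponent field of `scale` (mantissa ignored)."""
    bits = struct.unpack("<I", struct.pack("<f", scale))[0]
    biased_exponent = (bits >> 23) & 0xFF
    return 2.0 ** (biased_exponent - 127)


def scale_2xpk16(src0, src1, scale):
    """Concatenate two length-16 vectors and divide every element by the exponent scale."""
    if len(src0) != 16 or len(src1) != 16:
        raise ValueError("src0 and src1 must each hold 16 values")
    factor = exponent_scale(scale)
    return [x / factor for x in list(src0) + list(src1)]


# 6.0 is 1.5 * 2**2, so only the factor 4.0 is used for scaling.
scaled = scale_2xpk16([8.0] * 16, [4.0] * 16, 6.0)
print(exponent_scale(6.0), scaled[0], scaled[16])
```

Elements 0 to 15 of the output come from src0 and elements 16 to 31 from src1, which is one way to read "logically concatenated".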
- class mlir.dialects._rocdl_ops_gen.CvtScaleF322xPk16Fp6F32Op(res, src0, src1, scale, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert 32 single-precision float values, packed into two length-16 vectors that will be logically concatenated, to packed fp6, dividing by the exponent part of scale before doing so.
- OPERATION_NAME = 'rocdl.cvt.scalef32.2xpk16.fp6.f32'¶
- _ODS_REGIONS = (0, True)¶
- src0() _ods_ir[_ods_ir]¶
- src1() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF322xPk16Fp6F32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtScaleF322xPk16Fp6F32OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.scalef32.2xpk16.fp6.f32'¶
- src0() _ods_ir[_ods_ir]¶
- src1() _ods_ir[_ods_ir]¶
- scale() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.cvt_scalef32_2xpk16_fp6_f32(res, src0, src1, scale, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtSrBf8F32Op(res, srcA, srcB, old, byteSel, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert srcA to bf8, adding the rounding factor from srcB, and store into the byteSel-th byte of old, preserving the others.
Example:
// Stochastic rounding convert f32 to bf8 in byte 2 of old.
%0 = rocdl.cvt.sr.bf8.f32 %val, %stoch -> %old[2] : i32
- OPERATION_NAME = 'rocdl.cvt.sr.bf8.f32'¶
- _ODS_REGIONS = (0, True)¶
- srcA() _ods_ir[_ods_ir]¶
- srcB() _ods_ir[_ods_ir]¶
- old() _ods_ir[_ods_ir]¶
- byteSel() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtSrBf8F32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtSrBf8F32OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.sr.bf8.f32'¶
- srcA() _ods_ir[_ods_ir]¶
- srcB() _ods_ir[_ods_ir]¶
- old() _ods_ir[_ods_ir]¶
- byteSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_sr_bf8_f32(res, src_a, src_b, old, byte_sel, *, loc=None, ip=None) _ods_ir¶
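The idea of "adding the rounding factor" before narrowing can be shown with an integer model of stochastic rounding. This is a conceptual sketch only: it drops low bits from an unsigned integer rather than from an f32 mantissa, it is not bit-exact with respect to the hardware conversion, and it does not handle overflow or special values. The function name stochastic_truncate is hypothetical.

```python
def stochastic_truncate(value: int, random_bits: int, dropped: int) -> int:
    """Drop `dropped` low bits of `value` after adding random low-order bits.

    Adding a uniformly random amount below the kept precision makes the result
    round up with probability equal to the discarded fraction.
    """
    mask = (1 << dropped) - 1
    return (value + (random_bits & mask)) >> dropped


# 22 / 4 = 5.5, so half of the possible random inputs round down to 5
# and the other half round up to 6.
outcomes = [stochastic_truncate(22, r, 2) for r in range(4)]
mean = sum(outcomes) / len(outcomes)
print(outcomes, mean)
```

Averaged over all random inputs, the result equals the exact quotient, which is the property that distinguishes stochastic rounding from plain truncation.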
- class mlir.dialects._rocdl_ops_gen.CvtSrFp8F32Op(res, srcA, srcB, old, byteSel, *, loc=None, ip=None)¶
Bases:
_ods_ir
Convert srcA to fp8, adding the rounding factor from srcB, and store into the byteSel-th byte of old, preserving the others.
Example:
// Stochastic rounding convert f32 to fp8 in byte 3 of old.
%0 = rocdl.cvt.sr.fp8.f32 %val, %stoch -> %old[3] : i32
- OPERATION_NAME = 'rocdl.cvt.sr.fp8.f32'¶
- _ODS_REGIONS = (0, True)¶
- srcA() _ods_ir[_ods_ir]¶
- srcB() _ods_ir[_ods_ir]¶
- old() _ods_ir[_ods_ir]¶
- byteSel() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.CvtSrFp8F32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.CvtSrFp8F32OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.cvt.sr.fp8.f32'¶
- srcA() _ods_ir[_ods_ir]¶
- srcB() _ods_ir[_ods_ir]¶
- old() _ods_ir[_ods_ir]¶
- byteSel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.cvt_sr_fp8_f32(res, src_a, src_b, old, byte_sel, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.DPPUpdateOp(res, old, src, dppCtrl, rowMask, bankMask, boundCtrl, *, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.update.dpp'¶
- _ODS_REGIONS = (0, True)¶
- old() _ods_ir¶
- src() _ods_ir¶
- dppCtrl() _ods_ir¶
- rowMask() _ods_ir¶
- bankMask() _ods_ir¶
- boundCtrl() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.DPPUpdateOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.DPPUpdateOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.update.dpp'¶
- old() _ods_ir¶
- src() _ods_ir¶
- dppCtrl() _ods_ir¶
- rowMask() _ods_ir¶
- bankMask() _ods_ir¶
- boundCtrl() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.update_dpp(res, old, src, dpp_ctrl, row_mask, bank_mask, bound_ctrl, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.DsAtomicAsyncBarrierArriveOp(barrierPtr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases:
_ods_ir
Waits on a given DS barrier and decrements its pending count by 1. Stays in order with ASYNC loads to LDS, and uses ASYNCcnt to track its completion. Available on gfx1250+.
Example:
// Async atomic barrier arrive (fire-and-forget).
rocdl.ds.atomic.async.barrier.arrive.b64 %ptr : !llvm.ptr<3>
- OPERATION_NAME = 'rocdl.ds.atomic.async.barrier.arrive.b64'¶
- _ODS_REGIONS = (0, True)¶
- barrierPtr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- class mlir.dialects._rocdl_ops_gen.DsAtomicAsyncBarrierArriveOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.DsAtomicAsyncBarrierArriveOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.ds.atomic.async.barrier.arrive.b64'¶
- barrierPtr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.ds_atomic_async_barrier_arrive_b64(barrier_ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) DsAtomicAsyncBarrierArriveOp¶
- class mlir.dialects._rocdl_ops_gen.DsAtomicBarrierArriveRtnOp(res, barrierPtr, val, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases:
_ods_ir
Waits on a given DS barrier and decrements its pending count by a given value. Note that the barrier state is given as a 64-bit structure containing the pending count, phase, and init count. The op returns the old barrier state. The op is executed as an ordinary LDS operation and is ordered with other LDS operations, so check DSCNT to determine when this instruction has executed. Available on gfx1250+.
Example:
// Atomic barrier arrive with return of old barrier state.
%res = rocdl.ds.atomic.barrier.arrive.rtn.b64 %ptr, %val : !llvm.ptr<3>, i64 -> i64
- OPERATION_NAME = 'rocdl.ds.atomic.barrier.arrive.rtn.b64'¶
- _ODS_REGIONS = (0, True)¶
- barrierPtr() _ods_ir¶
- val() _ods_ir[_ods_ir]¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.DsAtomicBarrierArriveRtnOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.DsAtomicBarrierArriveRtnOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.ds.atomic.barrier.arrive.rtn.b64'¶
- barrierPtr() _ods_ir¶
- val() _ods_ir[_ods_ir]¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.ds_atomic_barrier_arrive_rtn_b64(res, barrier_ptr, val, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.DsBpermuteOp(res, index, src, *, loc=None, ip=None)¶
Bases:
_ods_ir
Perform a backward permute (pull) operation across lanes using DS/LDS permute hardware.
Each lane reads the value of src from the lane whose byte address is given by index (i.e. lane id = index / 4).
This is “backward” (pull) in contrast to ds_permute_b32, which is “forward” (push/scatter).
Example:
// Backward permute across lanes (pull from selected lane).
%0 = rocdl.ds_bpermute %index, %src : (i32, i32) -> i32
- OPERATION_NAME = 'rocdl.ds_bpermute'¶
- _ODS_REGIONS = (0, True)¶
- index() _ods_ir[_ods_ir]¶
- src() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.DsBpermuteOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.DsBpermuteOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.ds_bpermute'¶
- index() _ods_ir[_ods_ir]¶
- src() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.ds_bpermute(res, index, src, *, loc=None, ip=None) _ods_ir¶
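The pull semantics of the backward permute (lane id = index / 4) can be modelled on plain Python lists, with one list element per lane. This is a reference sketch rather than the hardware behaviour: wrapping out-of-range lane ids modulo the number of lanes is an assumption made here, and the function name ds_bpermute_model is hypothetical.

```python
def ds_bpermute_model(index, src):
    """Each lane i returns src from the lane selected by index[i] // 4."""
    if len(index) != len(src):
        raise ValueError("index and src must have one entry per lane")
    lanes = len(src)
    # index holds byte addresses; dividing by 4 gives the source lane id.
    # Wrapping with % lanes is an assumption of this sketch.
    return [src[(idx // 4) % lanes] for idx in index]


# Four lanes: swap neighbouring lanes by pulling from lane ids 1, 0, 3, 2.
pulled = ds_bpermute_model([4, 0, 12, 8], [10, 11, 12, 13])
print(pulled)
```

Because every lane chooses where to read from, several lanes may pull the same source lane, which a forward (push) permute cannot express without conflicts.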
- class mlir.dialects._rocdl_ops_gen.DsLoadTr4_B64(res, ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases:
_ods_ir
Load a matrix of 4-bit data from DS memory, transpose the data between row-major and column-major order, and store the result into a 64-bit vector register.
Available on gfx1250+.
Example (concrete mnemonics depend on address space and element size):
// 64-bit transpose load from global memory.
%0 = rocdl.global.load.tr4.b64 %ptr : !llvm.ptr<1> -> vector<2xi32>
// 128-bit transpose load from global memory with f16 result.
%1 = rocdl.global.load.tr.b128 %ptr : !llvm.ptr<1> -> vector<8xf16>
// 64-bit transpose load from LDS.
%2 = rocdl.ds.load.tr4.b64 %ptr : !llvm.ptr<3> -> vector<2xi32>
// 128-bit transpose load from LDS with bf16 result.
%3 = rocdl.ds.load.tr16.b128 %ptr : !llvm.ptr<3> -> vector<8xbf16>
- OPERATION_NAME = 'rocdl.ds.load.tr4.b64'¶
- _ODS_REGIONS = (0, True)¶
- ptr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.DsLoadTr4_B64Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.DsLoadTr4_B64Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.ds.load.tr4.b64'¶
- ptr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.ds_load_tr4_b64(res, ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.DsLoadTr6_B96(res, ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases:
_ods_ir
Load a matrix of 6-bit data from DS memory, transpose the data between row-major and column-major order, and store the result into a 96-bit vector register.
Available on gfx1250+.
Example (concrete mnemonics depend on address space and element size):
// 64-bit transpose load from global memory.
%0 = rocdl.global.load.tr4.b64 %ptr : !llvm.ptr<1> -> vector<2xi32>
// 128-bit transpose load from global memory with f16 result.
%1 = rocdl.global.load.tr.b128 %ptr : !llvm.ptr<1> -> vector<8xf16>
// 64-bit transpose load from LDS.
%2 = rocdl.ds.load.tr4.b64 %ptr : !llvm.ptr<3> -> vector<2xi32>
// 128-bit transpose load from LDS with bf16 result.
%3 = rocdl.ds.load.tr16.b128 %ptr : !llvm.ptr<3> -> vector<8xbf16>
- OPERATION_NAME = 'rocdl.ds.load.tr6.b96'¶
- _ODS_REGIONS = (0, True)¶
- ptr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.DsLoadTr6_B96Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.DsLoadTr6_B96Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.ds.load.tr6.b96'¶
- ptr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.ds_load_tr6_b96(res, ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.DsLoadTr8_B64(res, ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases:
_ods_ir
Load a matrix of 8-bit data from DS memory, transpose the data between row-major and column-major order, and store the result into a 64-bit vector register.
Available on gfx1250+.
Example (concrete mnemonics depend on address space and element size):
// 64-bit transpose load from global memory.
%0 = rocdl.global.load.tr4.b64 %ptr : !llvm.ptr<1> -> vector<2xi32>
// 128-bit transpose load from global memory with f16 result.
%1 = rocdl.global.load.tr.b128 %ptr : !llvm.ptr<1> -> vector<8xf16>
// 64-bit transpose load from LDS.
%2 = rocdl.ds.load.tr4.b64 %ptr : !llvm.ptr<3> -> vector<2xi32>
// 128-bit transpose load from LDS with bf16 result.
%3 = rocdl.ds.load.tr16.b128 %ptr : !llvm.ptr<3> -> vector<8xbf16>
- OPERATION_NAME = 'rocdl.ds.load.tr8.b64'¶
- _ODS_REGIONS = (0, True)¶
- ptr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.DsLoadTr8_B64Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.DsLoadTr8_B64Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.ds.load.tr8.b64'¶
- ptr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.ds_load_tr8_b64(res, ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.DsLoadTr16_B128(res, ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases:
_ods_ir
Load a matrix of 16-bit data from DS memory, transpose the data between row-major and column-major order, and store the result into a 128-bit vector register.
Available on gfx1250+.
Example (concrete mnemonics depend on address space and element size):
// 64-bit transpose load from global memory.
%0 = rocdl.global.load.tr4.b64 %ptr : !llvm.ptr<1> -> vector<2xi32>
// 128-bit transpose load from global memory with f16 result.
%1 = rocdl.global.load.tr.b128 %ptr : !llvm.ptr<1> -> vector<8xf16>
// 64-bit transpose load from LDS.
%2 = rocdl.ds.load.tr4.b64 %ptr : !llvm.ptr<3> -> vector<2xi32>
// 128-bit transpose load from LDS with bf16 result.
%3 = rocdl.ds.load.tr16.b128 %ptr : !llvm.ptr<3> -> vector<8xbf16>
- OPERATION_NAME = 'rocdl.ds.load.tr16.b128'¶
- _ODS_REGIONS = (0, True)¶
- ptr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.DsLoadTr16_B128Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.DsLoadTr16_B128Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.ds.load.tr16.b128'¶
- ptr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.ds_load_tr16_b128(res, ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.DsSwizzleOp(res, src, offset, *, loc=None, ip=None)¶
Bases:
_ods_ir
Perform a data-sharing swizzle operation within a wavefront.
The offset operand encodes the swizzle pattern that will be placed in the instruction’s offset field (i.e., the pattern used by ds_swizzle_b32). See https://llvm.org/docs/AMDGPUModifierSyntax.html#swizzle-pattern for how this 16-bit pattern is constructed.
Example:
// Swizzle data within a wavefront.
%0 = rocdl.ds_swizzle %src, %offset : (i32, i32) -> i32
- OPERATION_NAME = 'rocdl.ds_swizzle'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- offset() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.DsSwizzleOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.DsSwizzleOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.ds_swizzle'¶
- src() _ods_ir[_ods_ir]¶
- offset() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.ds_swizzle(res, src, offset, *, loc=None, ip=None) _ods_ir¶
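As an illustration of how a 16-bit swizzle pattern might be built and applied, the sketch below models only the bit-mask permute mode. The field layout used here is an assumption that should be checked against the linked LLVM page and the ISA guide: bit 15 clear selects the mode, bits 4:0 hold an AND mask, bits 9:5 an OR mask, and bits 14:10 an XOR mask, applied within groups of 32 lanes. The function names are hypothetical.

```python
def bitmask_perm_offset(and_mask: int, or_mask: int, xor_mask: int) -> int:
    """Build a pattern for the (assumed) bit-mask mode from three 5-bit masks."""
    for mask in (and_mask, or_mask, xor_mask):
        if not 0 <= mask <= 0x1F:
            raise ValueError("each mask must fit in 5 bits")
    return and_mask | (or_mask << 5) | (xor_mask << 10)


def ds_swizzle_bitmask_model(src, offset: int):
    """Apply the assumed bit-mask mode to a list with one entry per lane."""
    if offset & 0x8000:
        raise ValueError("only the bit-mask mode (bit 15 clear) is modelled")
    if len(src) % 32 != 0:
        raise ValueError("lane count must be a multiple of 32")
    and_mask = offset & 0x1F
    or_mask = (offset >> 5) & 0x1F
    xor_mask = (offset >> 10) & 0x1F
    out = []
    for lane in range(len(src)):
        group_base = lane & ~0x1F  # lanes only exchange data inside a group of 32
        source = (((lane & 0x1F) & and_mask) | or_mask) ^ xor_mask
        out.append(src[group_base | source])
    return out


# Keep all lane bits (AND 0x1F), set none (OR 0), flip bit 0 (XOR 1):
# neighbouring lanes swap their values.
swap_offset = bitmask_perm_offset(0x1F, 0, 1)
swapped = ds_swizzle_bitmask_model(list(range(32)), swap_offset)
print(hex(swap_offset), swapped[:4])
```

With an AND mask of 0 and no other bits set, every lane in a group reads lane 0 of that group, which gives a broadcast under the same assumptions.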
- class mlir.dialects._rocdl_ops_gen.FMed3Op(res, src0, src1, src2, *, loc=None, ip=None)¶
Bases:
_ods_ir
Computes the median of three floating-point values using the AMDGPU fmed3 intrinsic. This operation is equivalent to max(min(a, b), min(max(a, b), c)) but uses the hardware-accelerated V_MED3_F16/V_MED3_F32 instruction for better performance.
The operation supports both scalar and vector floating-point types (f16, f32).
Example:
// Scalar f32 median
%result = rocdl.fmed3 %a, %b, %c : f32
// Vector f16 median
%result = rocdl.fmed3 %va, %vb, %vc : vector<4xf16>
- OPERATION_NAME = 'rocdl.fmed3'¶
- _ODS_REGIONS = (0, True)¶
- src0() _ods_ir¶
- src1() _ods_ir¶
- src2() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.FMed3OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.FMed3OpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.fmed3'¶
- src0() _ods_ir¶
- src1() _ods_ir¶
- src2() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.fmed3(res, src0, src1, src2, *, loc=None, ip=None) _ods_ir¶
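The min/max identity quoted for fmed3 can be checked with a few lines of plain Python. This is a scalar reference model of the formula only; it says nothing about how the instruction treats NaN inputs, and the function name fmed3_reference is hypothetical.

```python
from itertools import permutations


def fmed3_reference(a: float, b: float, c: float) -> float:
    """Median of three values via max(min(a, b), min(max(a, b), c))."""
    return max(min(a, b), min(max(a, b), c))


# The result is the middle value regardless of argument order.
values = (1.0, 2.5, -3.0)
medians = {fmed3_reference(*p) for p in permutations(values)}
print(medians)
```

A common use of the median of three is clamping: fmed3_reference(x, lo, hi) returns x limited to the range from lo to hi when lo is not greater than hi.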
- class mlir.dialects._rocdl_ops_gen.FlatPrefetchOp(ptr, scope, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases:
_ods_ir
Prefetches 1 byte of data per lane using flat-memory addresses into the WGP-cache or L2-cache. Available on gfx1250+.
Example:
// Prefetch from flat memory into cache.
rocdl.flat.prefetch %ptr, scope 0 : !llvm.ptr
- OPERATION_NAME = 'rocdl.flat.prefetch'¶
- _ODS_REGIONS = (0, True)¶
- ptr() _ods_ir¶
- scope() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- class mlir.dialects._rocdl_ops_gen.FlatPrefetchOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.FlatPrefetchOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.flat.prefetch'¶
- ptr() _ods_ir¶
- scope() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.flat_prefetch(ptr, scope, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) FlatPrefetchOp¶
- class mlir.dialects._rocdl_ops_gen.GetBarrierStateOp(res, id, *, loc=None, ip=None)¶
Bases:
_ods_ir
Query the state of a barrier by id. Available on gfx1200+.
Example:
// Query barrier state by id.
%0 = rocdl.s.get.barrier.state id = 1 -> i32
- OPERATION_NAME = 'rocdl.s.get.barrier.state'¶
- _ODS_REGIONS = (0, True)¶
- id() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.GetBarrierStateOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.GetBarrierStateOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.s.get.barrier.state'¶
- id() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.s_get_barrier_state(res, id, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.GetNamedBarrierStateOp(res, ptr, *, loc=None, ip=None)¶
Bases: _ods_ir
Available on gfx1250+.
Example:
// Query named barrier state by pointer.
%0 = rocdl.s.get.named.barrier.state %ptr : !llvm.ptr<3> -> i32
- OPERATION_NAME = 'rocdl.s.get.named.barrier.state'¶
- _ODS_REGIONS = (0, True)¶
- ptr() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.GetNamedBarrierStateOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.GetNamedBarrierStateOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.s.get.named.barrier.state'¶
- ptr() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.s_get_named_barrier_state(res, ptr, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadAsyncLDSOp(globalPtr, ldsPtr, size, offset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases: _ods_ir
This operation works identically to rocdl.load.async.to.lds except that the global pointer argument is limited to pointers in address space 1 (pure global pointers) instead of also allowing fat buffer pointers.
Available on gfx9 and gfx10.
For the operation introduced in gfx1250, see rocdl.global.load.async.to.lds.bN.
Example:
// Async load from global pointer to LDS (address space 1 only).
rocdl.load.async.to.lds %global, %shared, 4, 0, 0 : !llvm.ptr<1>, !llvm.ptr<3>
- OPERATION_NAME = 'rocdl.global.load.async.lds'¶
- _ODS_REGIONS = (0, True)¶
- globalPtr() _ods_ir¶
- ldsPtr() _ods_ir¶
- size() _ods_ir¶
- offset() _ods_ir¶
- aux() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadAsyncLDSOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadAsyncLDSOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.global.load.async.lds'¶
- globalPtr() _ods_ir¶
- ldsPtr() _ods_ir¶
- size() _ods_ir¶
- offset() _ods_ir¶
- aux() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.global_load_async_lds(global_ptr, lds_ptr, size, offset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) GlobalLoadAsyncLDSOp¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadAsyncToLDSB8Op(globalPtr, ldsPtr, offset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases: _ods_ir
Asynchronously loads 8 bits of data from a global memory pointer to a Local Data Share (LDS) pointer.
Available on gfx1250+.
Example:
// Async 8-bit load from global to LDS.
rocdl.global.load.async.to.lds.b8 %src, %dst, 0, 0 : !llvm.ptr<1>, !llvm.ptr<3>
- OPERATION_NAME = 'rocdl.global.load.async.to.lds.b8'¶
- _ODS_REGIONS = (0, True)¶
- globalPtr() _ods_ir¶
- ldsPtr() _ods_ir¶
- offset() _ods_ir¶
- aux() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadAsyncToLDSB8OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadAsyncToLDSB8OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.global.load.async.to.lds.b8'¶
- globalPtr() _ods_ir¶
- ldsPtr() _ods_ir¶
- offset() _ods_ir¶
- aux() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.global_load_async_to_lds_b8(global_ptr, lds_ptr, offset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) GlobalLoadAsyncToLDSB8Op¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadAsyncToLDSB32Op(globalPtr, ldsPtr, offset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases: _ods_ir
Asynchronously loads 32 bits of data from a global memory pointer to a Local Data Share (LDS) pointer.
Available on gfx1250+.
Example:
// Async 32-bit load from global to LDS.
rocdl.global.load.async.to.lds.b32 %src, %dst, 0, 0 : !llvm.ptr<1>, !llvm.ptr<3>
- OPERATION_NAME = 'rocdl.global.load.async.to.lds.b32'¶
- _ODS_REGIONS = (0, True)¶
- globalPtr() _ods_ir¶
- ldsPtr() _ods_ir¶
- offset() _ods_ir¶
- aux() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadAsyncToLDSB32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadAsyncToLDSB32OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.global.load.async.to.lds.b32'¶
- globalPtr() _ods_ir¶
- ldsPtr() _ods_ir¶
- offset() _ods_ir¶
- aux() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.global_load_async_to_lds_b32(global_ptr, lds_ptr, offset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) GlobalLoadAsyncToLDSB32Op¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadAsyncToLDSB64Op(globalPtr, ldsPtr, offset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases: _ods_ir
Asynchronously loads 64 bits of data from a global memory pointer to a Local Data Share (LDS) pointer.
Available on gfx1250+.
Example:
// Async 64-bit load from global to LDS.
rocdl.global.load.async.to.lds.b64 %src, %dst, 0, 0 : !llvm.ptr<1>, !llvm.ptr<3>
- OPERATION_NAME = 'rocdl.global.load.async.to.lds.b64'¶
- _ODS_REGIONS = (0, True)¶
- globalPtr() _ods_ir¶
- ldsPtr() _ods_ir¶
- offset() _ods_ir¶
- aux() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadAsyncToLDSB64OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadAsyncToLDSB64OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.global.load.async.to.lds.b64'¶
- globalPtr() _ods_ir¶
- ldsPtr() _ods_ir¶
- offset() _ods_ir¶
- aux() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.global_load_async_to_lds_b64(global_ptr, lds_ptr, offset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) GlobalLoadAsyncToLDSB64Op¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadAsyncToLDSB128Op(globalPtr, ldsPtr, offset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases: _ods_ir
Asynchronously loads 128 bits of data from a global memory pointer to a Local Data Share (LDS) pointer.
Available on gfx1250+.
Example:
// Async 128-bit load from global to LDS.
rocdl.global.load.async.to.lds.b128 %src, %dst, 0, 0 : !llvm.ptr<1>, !llvm.ptr<3>
- OPERATION_NAME = 'rocdl.global.load.async.to.lds.b128'¶
- _ODS_REGIONS = (0, True)¶
- globalPtr() _ods_ir¶
- ldsPtr() _ods_ir¶
- offset() _ods_ir¶
- aux() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadAsyncToLDSB128OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadAsyncToLDSB128OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.global.load.async.to.lds.b128'¶
- globalPtr() _ods_ir¶
- ldsPtr() _ods_ir¶
- offset() _ods_ir¶
- aux() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.global_load_async_to_lds_b128(global_ptr, lds_ptr, offset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) GlobalLoadAsyncToLDSB128Op¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadLDSOp(globalPtr, ldsPtr, size, offset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.global.load.lds'¶
- _ODS_REGIONS = (0, True)¶
- globalPtr() _ods_ir¶
- ldsPtr() _ods_ir¶
- size() _ods_ir¶
- offset() _ods_ir¶
- aux() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadLDSOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadLDSOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.global.load.lds'¶
- globalPtr() _ods_ir¶
- ldsPtr() _ods_ir¶
- size() _ods_ir¶
- offset() _ods_ir¶
- aux() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.global_load_lds(global_ptr, lds_ptr, size, offset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) GlobalLoadLDSOp¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadTr4_B64(res, ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases: _ods_ir
Load a matrix of 4-bit data from global memory, transpose the data between row-major and column-major order, and store the result into a 64-bit vector register.
Available on gfx1250+.
Example (concrete mnemonics depend on address space and element size):
// 64-bit transpose load from global memory.
%0 = rocdl.global.load.tr4.b64 %ptr : !llvm.ptr<1> -> vector<2xi32>
// 128-bit transpose load from global memory with f16 result.
%1 = rocdl.global.load.tr.b128 %ptr : !llvm.ptr<1> -> vector<8xf16>
// 64-bit transpose load from LDS.
%2 = rocdl.ds.load.tr4.b64 %ptr : !llvm.ptr<3> -> vector<2xi32>
// 128-bit transpose load from LDS with bf16 result.
%3 = rocdl.ds.load.tr16.b128 %ptr : !llvm.ptr<3> -> vector<8xbf16>
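As a rough host-side illustration of the two ideas in these descriptions (how many elements fit in the result register, and what a row-major to column-major reordering looks like), the following pure-Python sketch may help. The helper names are hypothetical, the register and element widths are taken from the global transpose-load descriptions on this page, and the hardware's actual per-lane data distribution is deliberately not modeled.

```python
# Register width and element width in bits, as stated in the op descriptions.
VARIANTS = {
    "rocdl.global.load.tr4.b64": (64, 4),
    "rocdl.global.load.tr6.b96": (96, 6),
    "rocdl.global.load.tr.b64": (64, 8),
    "rocdl.global.load.tr.b128": (128, 16),
}


def elements_per_register(register_bits: int, element_bits: int) -> int:
    """Number of matrix elements that fit in one result register."""
    return register_bits // element_bits


def row_major_to_column_major(data, rows: int, cols: int):
    """Reorder a flat row-major rows x cols matrix into column-major order."""
    return [data[r * cols + c] for c in range(cols) for r in range(rows)]


for name, (register_bits, element_bits) in VARIANTS.items():
    print(name, elements_per_register(register_bits, element_bits))

print(row_major_to_column_major([0, 1, 2, 3, 4, 5], rows=2, cols=3))
```

The sketch only shows the index arithmetic; on the GPU the transposition happens across the lanes of a wave as part of the load itself.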
- OPERATION_NAME = 'rocdl.global.load.tr4.b64'¶
- _ODS_REGIONS = (0, True)¶
- ptr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadTr4_B64Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadTr4_B64Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.global.load.tr4.b64'¶
- ptr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.global_load_tr4_b64(res, ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadTr6_B96(res, ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases: _ods_ir
Load a matrix of 6-bit data from global memory, transpose the data between row-major and column-major order, and store the result into a 96-bit vector register.
Available on gfx1250+.
Example (concrete mnemonics depend on address space and element size):
// 64-bit transpose load from global memory.
%0 = rocdl.global.load.tr4.b64 %ptr : !llvm.ptr<1> -> vector<2xi32>
// 128-bit transpose load from global memory with f16 result.
%1 = rocdl.global.load.tr.b128 %ptr : !llvm.ptr<1> -> vector<8xf16>
// 64-bit transpose load from LDS.
%2 = rocdl.ds.load.tr4.b64 %ptr : !llvm.ptr<3> -> vector<2xi32>
// 128-bit transpose load from LDS with bf16 result.
%3 = rocdl.ds.load.tr16.b128 %ptr : !llvm.ptr<3> -> vector<8xbf16>
- OPERATION_NAME = 'rocdl.global.load.tr6.b96'¶
- _ODS_REGIONS = (0, True)¶
- ptr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadTr6_B96Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadTr6_B96Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.global.load.tr6.b96'¶
- ptr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.global_load_tr6_b96(res, ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadTr8_B64(res, ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases: _ods_ir
Load a matrix of 8-bit data from global memory, transpose the data between row-major and column-major order, and store the result into a 64-bit vector register.
Available on gfx1250+.
Example (concrete mnemonics depend on address space and element size):
// 64-bit transpose load from global memory.
%0 = rocdl.global.load.tr4.b64 %ptr : !llvm.ptr<1> -> vector<2xi32>
// 128-bit transpose load from global memory with f16 result.
%1 = rocdl.global.load.tr.b128 %ptr : !llvm.ptr<1> -> vector<8xf16>
// 64-bit transpose load from LDS.
%2 = rocdl.ds.load.tr4.b64 %ptr : !llvm.ptr<3> -> vector<2xi32>
// 128-bit transpose load from LDS with bf16 result.
%3 = rocdl.ds.load.tr16.b128 %ptr : !llvm.ptr<3> -> vector<8xbf16>
- OPERATION_NAME = 'rocdl.global.load.tr.b64'¶
- _ODS_REGIONS = (0, True)¶
- ptr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadTr8_B64Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadTr8_B64Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.global.load.tr.b64'¶
- ptr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.global_load_tr_b64(res, ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadTr8_B128(res, ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases: _ods_ir
Load a matrix of 16-bit data from global memory, transpose the data between row-major and column-major order, and store the result into a 128-bit vector register.
Available on gfx1250+.
Example (concrete mnemonics depend on address space and element size):
// 64-bit transpose load from global memory.
%0 = rocdl.global.load.tr4.b64 %ptr : !llvm.ptr<1> -> vector<2xi32>
// 128-bit transpose load from global memory with f16 result.
%1 = rocdl.global.load.tr.b128 %ptr : !llvm.ptr<1> -> vector<8xf16>
// 64-bit transpose load from LDS.
%2 = rocdl.ds.load.tr4.b64 %ptr : !llvm.ptr<3> -> vector<2xi32>
// 128-bit transpose load from LDS with bf16 result.
%3 = rocdl.ds.load.tr16.b128 %ptr : !llvm.ptr<3> -> vector<8xbf16>
- OPERATION_NAME = 'rocdl.global.load.tr.b128'¶
- _ODS_REGIONS = (0, True)¶
- ptr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadTr8_B128Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.GlobalLoadTr8_B128Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.global.load.tr.b128'¶
- ptr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.global_load_tr_b128(res, ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.GlobalPrefetchOp(ptr, scope, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases: _ods_ir
Prefetches 1 byte of data per lane from global memory into the WGP-cache or L2-cache. Available on gfx1250+.
Example:
// Prefetch from global memory into cache.
rocdl.global.prefetch %ptr, scope 0 : !llvm.ptr<1>
- OPERATION_NAME = 'rocdl.global.prefetch'¶
- _ODS_REGIONS = (0, True)¶
- ptr() _ods_ir¶
- scope() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- class mlir.dialects._rocdl_ops_gen.GlobalPrefetchOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.GlobalPrefetchOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.global.prefetch'¶
- ptr() _ods_ir¶
- scope() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.global_prefetch(ptr, scope, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) GlobalPrefetchOp¶
- class mlir.dialects._rocdl_ops_gen.GridDimXOp(res, *, range=None, loc=None, ip=None)¶
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.grid.dim.x'¶
- _ODS_REGIONS = (0, True)¶
- range() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.GridDimXOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.GridDimXOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.grid.dim.x'¶
- range() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.grid_dim_x(res, *, range=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.GridDimYOp(res, *, range=None, loc=None, ip=None)¶
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.grid.dim.y'¶
- _ODS_REGIONS = (0, True)¶
- range() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.GridDimYOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.GridDimYOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.grid.dim.y'¶
- range() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.grid_dim_y(res, *, range=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.GridDimZOp(res, *, range=None, loc=None, ip=None)¶
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.grid.dim.z'¶
- _ODS_REGIONS = (0, True)¶
- range() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.GridDimZOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.GridDimZOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.grid.dim.z'¶
- range() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.grid_dim_z(res, *, range=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.IglpOpt(variant, *, loc=None, ip=None)¶
Bases: _ods_ir
Instruction-group-level parallelism optimization hint.
Example:
// IGLP optimization hint variant 0.
rocdl.iglp.opt 0
- OPERATION_NAME = 'rocdl.iglp.opt'¶
- _ODS_REGIONS = (0, True)¶
- variant() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.IglpOptAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.IglpOptAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.iglp.opt'¶
- variant() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.LoadAsyncToLDSOp(globalPtr, ldsPtr, size, offset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases: _ods_ir
Load size bytes (the valid sizes vary by architecture) from the global memory pointed to by globalPtr and put them at ldsPtr, concatenating the per-lane values (and applying padding for sizes less than 4 bytes, along with padding out 12-byte reads to 16-byte writes). The value of globalPtr can vary between lanes, while ldsPtr must be subgroup-uniform (the values from each lane are concatenated before being written to LDS, with appropriate padding applied). offset is a constant offset applied to both pointers, and aux sets the cache policy.
Unlike rocdl.load.to.lds, the compiler will not automatically insert waits for this load to complete at the point where it thinks you're using a region of LDS you've stored values to. You need to use the rocdl.asyncmark and rocdl.wait.asyncmark operations to explicitly group these operations and wait for their completion.
Available on gfx10 and earlier with varying supported values of size.
Example:
// Async load 4 bytes from global pointer to LDS.
rocdl.load.async.to.lds %global, %shared, 4, 0, 0 : !llvm.ptr<1>, !llvm.ptr<3>
// Async load 4 bytes from fat buffer pointer to LDS.
rocdl.load.async.to.lds %fatBuffer, %shared, 4, 0, 0 : !llvm.ptr<7>, !llvm.ptr<3>
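The concatenation and padding behaviour described above can be pictured with a small host-side model. This is a sketch under stated assumptions, not a description of the hardware: the function names are hypothetical, lanes are assumed to be laid out in ascending lane order, padding bytes are assumed to be zero, every lane is assumed active, and the asynchronous completion tracking is not modeled at all.

```python
from typing import Sequence


def slot_bytes(size: int) -> int:
    """LDS bytes occupied per lane, following the padding rule in the description."""
    if size < 4:
        return 4   # sizes below 4 bytes are padded up to 4 bytes
    if size == 12:
        return 16  # 12-byte reads are padded out to 16-byte writes
    return size


def model_load_to_lds(global_mem: bytes, lane_ptrs: Sequence[int],
                      lds: bytearray, lds_ptr: int, size: int,
                      offset: int = 0) -> None:
    """Copy `size` bytes per lane into consecutive, padded LDS slots."""
    slot = slot_bytes(size)
    for lane, ptr in enumerate(lane_ptrs):
        # Each lane reads from its own global address (may differ per lane).
        data = global_mem[ptr + offset: ptr + offset + size]
        # The LDS base is uniform; lanes are concatenated slot by slot.
        dst = lds_ptr + offset + lane * slot
        lds[dst: dst + slot] = data.ljust(slot, b"\x00")  # zero padding is an assumption


global_mem = bytes(range(64))
lds = bytearray(32)
model_load_to_lds(global_mem, [10, 20, 30, 40], lds, lds_ptr=0, size=2)
print(lds[:16].hex())
```

With four lanes and a 2-byte size, each lane's two bytes land at the start of its own 4-byte slot, which is the "concatenate with padding" layout the description refers to.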
- OPERATION_NAME = 'rocdl.load.async.to.lds'¶
- _ODS_REGIONS = (0, True)¶
- globalPtr() _ods_ir¶
- ldsPtr() _ods_ir¶
- size() _ods_ir¶
- offset() _ods_ir¶
- aux() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- class mlir.dialects._rocdl_ops_gen.LoadAsyncToLDSOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.LoadAsyncToLDSOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.load.async.to.lds'¶
- globalPtr() _ods_ir¶
- ldsPtr() _ods_ir¶
- size() _ods_ir¶
- offset() _ods_ir¶
- aux() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.load_async_to_lds(global_ptr, lds_ptr, size, offset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) LoadAsyncToLDSOp¶
- class mlir.dialects._rocdl_ops_gen.LoadToLDSOp(globalPtr, ldsPtr, size, offset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.load.to.lds'¶
- _ODS_REGIONS = (0, True)¶
- globalPtr() _ods_ir¶
- ldsPtr() _ods_ir¶
- size() _ods_ir¶
- offset() _ods_ir¶
- aux() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- class mlir.dialects._rocdl_ops_gen.LoadToLDSOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.LoadToLDSOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.load.to.lds'¶
- globalPtr() _ods_ir¶
- ldsPtr() _ods_ir¶
- size() _ods_ir¶
- offset() _ods_ir¶
- aux() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.load_to_lds(global_ptr, lds_ptr, size, offset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) LoadToLDSOp¶
- class mlir.dialects._rocdl_ops_gen.MakeBufferRsrcOp(res, base, stride, numRecords, flags, *, loc=None, ip=None)¶
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.make.buffer.rsrc'¶
- _ODS_REGIONS = (0, True)¶
- base() _ods_ir¶
- stride() _ods_ir[_ods_ir]¶
- numRecords() _ods_ir[_ods_ir]¶
- flags() _ods_ir[_ods_ir]¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.MakeBufferRsrcOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.MakeBufferRsrcOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.make.buffer.rsrc'¶
- base() _ods_ir¶
- stride() _ods_ir[_ods_ir]¶
- numRecords() _ods_ir[_ods_ir]¶
- flags() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.make_buffer_rsrc(res, base, stride, num_records, flags, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.MbcntHiOp(res, in0, in1, *, arg_attrs=None, res_attrs=None, loc=None, ip=None)¶
Bases: _ods_ir
Masked bit count of threads below the current lane in a wavefront.
in0 is a 32-bit mask that is AND-ed with the relevant half of the execution mask and the bits below the current lane; in1 is added to the resulting popcount:
lo: in1 + popcount(in0 & exec_lo & ((1 << min(lane_id, 32)) - 1))
hi: in1 + popcount(in0 & exec_hi & ((1 << saturating_usub(lane_id, 32)) - 1))
To obtain a unique thread index within a wave64, chain the two ops with in0 = -1 (all bits set):
Example:
%all_ones = arith.constant -1 : i32
%zero = arith.constant 0 : i32
// Count active threads below this lane in the low 32 lanes.
%lo = rocdl.mbcnt.lo %all_ones, %zero : (i32, i32) -> i32
// Add the count from the high 32 lanes to get the full lane index.
%hi = rocdl.mbcnt.hi %all_ones, %lo : (i32, i32) -> i32
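The lo/hi formulas above can be evaluated on the host to see why chaining the two ops yields a lane index. The following pure-Python model is a sketch: the function names are hypothetical (they are not the `mbcnt_lo`/`mbcnt_hi` builders from this module), the 64-bit execution mask is passed in explicitly, and `in0 = -1` is represented as the 32-bit pattern `0xFFFFFFFF`.

```python
MASK32 = 0xFFFFFFFF


def popcount32(x: int) -> int:
    """Number of set bits in the low 32 bits of x."""
    return bin(x & MASK32).count("1")


def model_mbcnt_lo(in0: int, in1: int, lane_id: int, exec_mask: int) -> int:
    exec_lo = exec_mask & MASK32
    below = (1 << min(lane_id, 32)) - 1
    return in1 + popcount32(in0 & exec_lo & below)


def model_mbcnt_hi(in0: int, in1: int, lane_id: int, exec_mask: int) -> int:
    exec_hi = (exec_mask >> 32) & MASK32
    below = (1 << max(lane_id - 32, 0)) - 1  # saturating_usub(lane_id, 32)
    return in1 + popcount32(in0 & exec_hi & below)


def model_lane_index(lane_id: int, exec_mask: int) -> int:
    """Chain lo then hi with in0 = -1, as in the MLIR example."""
    lo = model_mbcnt_lo(MASK32, 0, lane_id, exec_mask)
    return model_mbcnt_hi(MASK32, lo, lane_id, exec_mask)


full_exec = (1 << 64) - 1
print([model_lane_index(lane, full_exec) for lane in (0, 1, 31, 32, 40, 63)])
```

With every lane active the chained result equals the lane id; with a sparse execution mask it instead counts only the active lanes below the current one.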
- OPERATION_NAME = 'rocdl.mbcnt.hi'¶
- _ODS_REGIONS = (0, True)¶
- in0() _ods_ir[_ods_ir]¶
- in1() _ods_ir[_ods_ir]¶
- arg_attrs() _ods_ir | None¶
- res_attrs() _ods_ir | None¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.MbcntHiOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.MbcntHiOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.mbcnt.hi'¶
- in0() _ods_ir[_ods_ir]¶
- in1() _ods_ir[_ods_ir]¶
- arg_attrs() _ods_ir | None¶
- res_attrs() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.mbcnt_hi(res, in0, in1, *, arg_attrs=None, res_attrs=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.MbcntLoOp(res, in0, in1, *, arg_attrs=None, res_attrs=None, loc=None, ip=None)¶
Bases: _ods_ir
Masked bit count of threads below the current lane in a wavefront.
in0 is a 32-bit mask that is AND-ed with the relevant half of the execution mask and the bits below the current lane; in1 is added to the resulting popcount:
lo: in1 + popcount(in0 & exec_lo & ((1 << min(lane_id, 32)) - 1))
hi: in1 + popcount(in0 & exec_hi & ((1 << saturating_usub(lane_id, 32)) - 1))
To obtain a unique thread index within a wave64, chain the two ops with in0 = -1 (all bits set):
Example:
%all_ones = arith.constant -1 : i32
%zero = arith.constant 0 : i32
// Count active threads below this lane in the low 32 lanes.
%lo = rocdl.mbcnt.lo %all_ones, %zero : (i32, i32) -> i32
// Add the count from the high 32 lanes to get the full lane index.
%hi = rocdl.mbcnt.hi %all_ones, %lo : (i32, i32) -> i32
- OPERATION_NAME = 'rocdl.mbcnt.lo'¶
- _ODS_REGIONS = (0, True)¶
- in0() _ods_ir[_ods_ir]¶
- in1() _ods_ir[_ods_ir]¶
- arg_attrs() _ods_ir | None¶
- res_attrs() _ods_ir | None¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.MbcntLoOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.MbcntLoOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.mbcnt.lo'¶
- in0() _ods_ir[_ods_ir]¶
- in1() _ods_ir[_ods_ir]¶
- arg_attrs() _ods_ir | None¶
- res_attrs() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.mbcnt_lo(res, in0, in1, *, arg_attrs=None, res_attrs=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.Permlane16SwapOp(res, old, src, fi, boundControl, *, loc=None, ip=None)¶
Bases: _ods_ir
Performs a permlane16.swap operation with the given operands, applying the permutation specified by $fi to the provided inputs.
Example:
// Swap lanes between groups of 16 threads.
%res = rocdl.permlane16.swap %src, %src, 0, -1 : (i32, i32) -> !llvm.struct<(i32, i32)>
- OPERATION_NAME = 'rocdl.permlane16.swap'¶
- _ODS_REGIONS = (0, True)¶
- old() _ods_ir[_ods_ir]¶
- src() _ods_ir[_ods_ir]¶
- fi() _ods_ir¶
- boundControl() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.Permlane16SwapOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.Permlane16SwapOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.permlane16.swap'¶
- old() _ods_ir[_ods_ir]¶
- src() _ods_ir[_ods_ir]¶
- fi() _ods_ir¶
- boundControl() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.permlane16_swap(res, old, src, fi, bound_control, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.Permlane32SwapOp(res, old, src, fi, boundControl, *, loc=None, ip=None)¶
Bases: _ods_ir
Performs a permlane32.swap operation with the given operands, applying the permutation specified by $fi to the provided inputs.
Example:
// Swap lanes between groups of 32 threads.
%res = rocdl.permlane32.swap %src, %src, 0, -1 : (i32, i32) -> !llvm.struct<(i32, i32)>
- OPERATION_NAME = 'rocdl.permlane32.swap'¶
- _ODS_REGIONS = (0, True)¶
- old() _ods_ir[_ods_ir]¶
- src() _ods_ir[_ods_ir]¶
- fi() _ods_ir¶
- boundControl() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.Permlane32SwapOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.Permlane32SwapOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.permlane32.swap'¶
- old() _ods_ir[_ods_ir]¶
- src() _ods_ir[_ods_ir]¶
- fi() _ods_ir¶
- boundControl() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.permlane32_swap(res, old, src, fi, bound_control, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.PermlaneX16Op(res, old, src0, src1, src2, fi, boundControl, *, loc=None, ip=None)¶
Bases: _ods_ir
Performs a permlanex16 operation with the given operands, applying the permutation specified by $fi to the provided inputs.
Example:
// Scalar permlanex16.
%ret0 = rocdl.permlanex16 %src0, %src0, %sel, %sel, 0, -1 : f32, i32
// Vector permlanex16.
%ret1 = rocdl.permlanex16 %src1, %src1, %sel, %sel, 0, -1 : vector<2xf32>, i32
- OPERATION_NAME = 'rocdl.permlanex16'¶
- _ODS_REGIONS = (0, True)¶
- old() _ods_ir¶
- src0() _ods_ir¶
- src1() _ods_ir¶
- src2() _ods_ir¶
- fi() _ods_ir¶
- boundControl() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.PermlaneX16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.PermlaneX16OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.permlanex16'¶
- old() _ods_ir¶
- src0() _ods_ir¶
- src1() _ods_ir¶
- src2() _ods_ir¶
- fi() _ods_ir¶
- boundControl() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.permlanex16(res, old, src0, src1, src2, fi, bound_control, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.RawBufferAtomicCmpSwap(res, src, cmp, rsrc, offset, soffset, aux, *, loc=None, ip=None)¶
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.raw.buffer.atomic.cmpswap'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir¶
- cmp() _ods_ir¶
- rsrc() _ods_ir¶
- offset() _ods_ir[_ods_ir]¶
- soffset() _ods_ir[_ods_ir]¶
- aux() _ods_ir[_ods_ir]¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.RawBufferAtomicCmpSwapAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.RawBufferAtomicCmpSwapAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.raw.buffer.atomic.cmpswap'¶
- src() _ods_ir¶
- cmp() _ods_ir¶
- rsrc() _ods_ir¶
- offset() _ods_ir[_ods_ir]¶
- soffset() _ods_ir[_ods_ir]¶
- aux() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.raw_buffer_atomic_cmpswap(res, src, cmp, rsrc, offset, soffset, aux, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.RawBufferAtomicFAddOp(vdata, rsrc, offset, soffset, aux, *, loc=None, ip=None)¶
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.raw.buffer.atomic.fadd'¶
- _ODS_REGIONS = (0, True)¶
- vdata() _ods_ir¶
- rsrc() _ods_ir¶
- offset() _ods_ir¶
- soffset() _ods_ir¶
- aux() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.RawBufferAtomicFAddOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.RawBufferAtomicFAddOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.raw.buffer.atomic.fadd'¶
- vdata() _ods_ir¶
- rsrc() _ods_ir¶
- offset() _ods_ir¶
- soffset() _ods_ir¶
- aux() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.raw_buffer_atomic_fadd(vdata, rsrc, offset, soffset, aux, *, loc=None, ip=None) RawBufferAtomicFAddOp¶
- class mlir.dialects._rocdl_ops_gen.RawBufferAtomicFMaxOp(vdata, rsrc, offset, soffset, aux, *, loc=None, ip=None)¶
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.raw.buffer.atomic.fmax'¶
- _ODS_REGIONS = (0, True)¶
- vdata() _ods_ir¶
- rsrc() _ods_ir¶
- offset() _ods_ir¶
- soffset() _ods_ir¶
- aux() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.RawBufferAtomicFMaxOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.RawBufferAtomicFMaxOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.raw.buffer.atomic.fmax'¶
- vdata() _ods_ir¶
- rsrc() _ods_ir¶
- offset() _ods_ir¶
- soffset() _ods_ir¶
- aux() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.raw_buffer_atomic_fmax(vdata, rsrc, offset, soffset, aux, *, loc=None, ip=None) RawBufferAtomicFMaxOp¶
- class mlir.dialects._rocdl_ops_gen.RawBufferAtomicSMaxOp(vdata, rsrc, offset, soffset, aux, *, loc=None, ip=None)¶
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.buffer.atomic.smax'¶
- _ODS_REGIONS = (0, True)¶
- vdata() _ods_ir¶
- rsrc() _ods_ir¶
- offset() _ods_ir¶
- soffset() _ods_ir¶
- aux() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.RawBufferAtomicSMaxOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.RawBufferAtomicSMaxOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.buffer.atomic.smax'¶
- vdata() _ods_ir¶
- rsrc() _ods_ir¶
- offset() _ods_ir¶
- soffset() _ods_ir¶
- aux() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.raw_buffer_atomic_smax(vdata, rsrc, offset, soffset, aux, *, loc=None, ip=None) RawBufferAtomicSMaxOp¶
- class mlir.dialects._rocdl_ops_gen.RawBufferAtomicUMinOp(vdata, rsrc, offset, soffset, aux, *, loc=None, ip=None)¶
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.buffer.atomic.umin'¶
- _ODS_REGIONS = (0, True)¶
- vdata() _ods_ir¶
- rsrc() _ods_ir¶
- offset() _ods_ir¶
- soffset() _ods_ir¶
- aux() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.RawBufferAtomicUMinOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.RawBufferAtomicUMinOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.buffer.atomic.umin'¶
- vdata() _ods_ir¶
- rsrc() _ods_ir¶
- offset() _ods_ir¶
- soffset() _ods_ir¶
- aux() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.raw_buffer_atomic_umin(vdata, rsrc, offset, soffset, aux, *, loc=None, ip=None) RawBufferAtomicUMinOp¶
- class mlir.dialects._rocdl_ops_gen.RawBufferLoadOp(res, rsrc, offset, soffset, aux, *, loc=None, ip=None)¶
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.buffer.load'¶
- _ODS_REGIONS = (0, True)¶
- rsrc() _ods_ir¶
- offset() _ods_ir¶
- soffset() _ods_ir¶
- aux() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.RawBufferLoadOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.RawBufferLoadOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.buffer.load'¶
- rsrc() _ods_ir¶
- offset() _ods_ir¶
- soffset() _ods_ir¶
- aux() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.raw_buffer_load(res, rsrc, offset, soffset, aux, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.RawBufferStoreOp(vdata, rsrc, offset, soffset, aux, *, loc=None, ip=None)¶
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.buffer.store'¶
- _ODS_REGIONS = (0, True)¶
- vdata() _ods_ir¶
- rsrc() _ods_ir¶
- offset() _ods_ir¶
- soffset() _ods_ir¶
- aux() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.RawBufferStoreOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.RawBufferStoreOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.buffer.store'¶
- vdata() _ods_ir¶
- rsrc() _ods_ir¶
- offset() _ods_ir¶
- soffset() _ods_ir¶
- aux() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.raw_buffer_store(vdata, rsrc, offset, soffset, aux, *, loc=None, ip=None) RawBufferStoreOp¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferAtomicCmpSwap(res, src, cmp, rsrc, offset, soffset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.ptr.buffer.atomic.cmpswap'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir¶
- cmp() _ods_ir¶
- rsrc() _ods_ir¶
- offset() _ods_ir[_ods_ir]¶
- soffset() _ods_ir[_ods_ir]¶
- aux() _ods_ir[_ods_ir]¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferAtomicCmpSwapAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferAtomicCmpSwapAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.ptr.buffer.atomic.cmpswap'¶
- src() _ods_ir¶
- cmp() _ods_ir¶
- rsrc() _ods_ir¶
- offset() _ods_ir[_ods_ir]¶
- soffset() _ods_ir[_ods_ir]¶
- aux() _ods_ir[_ods_ir]¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.raw_ptr_buffer_atomic_cmpswap(res, src, cmp, rsrc, offset, soffset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferAtomicFaddOp(vdata, rsrc, offset, soffset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.ptr.buffer.atomic.fadd'¶
- _ODS_REGIONS = (0, True)¶
- vdata() _ods_ir¶
- rsrc() _ods_ir¶
- offset() _ods_ir[_ods_ir]¶
- soffset() _ods_ir[_ods_ir]¶
- aux() _ods_ir[_ods_ir]¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferAtomicFaddOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferAtomicFaddOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.ptr.buffer.atomic.fadd'¶
- vdata() _ods_ir¶
- rsrc() _ods_ir¶
- offset() _ods_ir[_ods_ir]¶
- soffset() _ods_ir[_ods_ir]¶
- aux() _ods_ir[_ods_ir]¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.raw_ptr_buffer_atomic_fadd(vdata, rsrc, offset, soffset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) RawPtrBufferAtomicFaddOp¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferAtomicFmaxOp(vdata, rsrc, offset, soffset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.ptr.buffer.atomic.fmax'¶
- _ODS_REGIONS = (0, True)¶
- vdata() _ods_ir¶
- rsrc() _ods_ir¶
- offset() _ods_ir[_ods_ir]¶
- soffset() _ods_ir[_ods_ir]¶
- aux() _ods_ir[_ods_ir]¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferAtomicFmaxOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferAtomicFmaxOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.ptr.buffer.atomic.fmax'¶
- vdata() _ods_ir¶
- rsrc() _ods_ir¶
- offset() _ods_ir[_ods_ir]¶
- soffset() _ods_ir[_ods_ir]¶
- aux() _ods_ir[_ods_ir]¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.raw_ptr_buffer_atomic_fmax(vdata, rsrc, offset, soffset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) RawPtrBufferAtomicFmaxOp¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferAtomicSmaxOp(vdata, rsrc, offset, soffset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.ptr.buffer.atomic.smax'¶
- _ODS_REGIONS = (0, True)¶
- vdata() _ods_ir¶
- rsrc() _ods_ir¶
- offset() _ods_ir[_ods_ir]¶
- soffset() _ods_ir[_ods_ir]¶
- aux() _ods_ir[_ods_ir]¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferAtomicSmaxOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferAtomicSmaxOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.ptr.buffer.atomic.smax'¶
- vdata() _ods_ir¶
- rsrc() _ods_ir¶
- offset() _ods_ir[_ods_ir]¶
- soffset() _ods_ir[_ods_ir]¶
- aux() _ods_ir[_ods_ir]¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.raw_ptr_buffer_atomic_smax(vdata, rsrc, offset, soffset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) RawPtrBufferAtomicSmaxOp¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferAtomicUminOp(vdata, rsrc, offset, soffset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.ptr.buffer.atomic.umin'¶
- _ODS_REGIONS = (0, True)¶
- vdata() _ods_ir¶
- rsrc() _ods_ir¶
- offset() _ods_ir[_ods_ir]¶
- soffset() _ods_ir[_ods_ir]¶
- aux() _ods_ir[_ods_ir]¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferAtomicUminOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferAtomicUminOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.ptr.buffer.atomic.umin'¶
- vdata() _ods_ir¶
- rsrc() _ods_ir¶
- offset() _ods_ir[_ods_ir]¶
- soffset() _ods_ir[_ods_ir]¶
- aux() _ods_ir[_ods_ir]¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.raw_ptr_buffer_atomic_umin(vdata, rsrc, offset, soffset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) RawPtrBufferAtomicUminOp¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferLoadAsyncLdsOp(rsrc, ldsPtr, size, voffset, soffset, offset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases:
_ods_ir
Load from a buffer resource rsrc to ldsPtr, which must be uniform.
See rocdl.load.async.to.lds for overall semantics of such loads, noting that here voffset can be lane-varying and that rsrc (which holds the base address) must, as always, be uniform.
Available on gfx9 and gfx10.
Example:
// Async buffer load to LDS via buffer resource pointer.
rocdl.raw.ptr.buffer.load.async.lds %rsrc, %ldsPtr, %size, %voffset, %soffset, %offset, %aux
- OPERATION_NAME = 'rocdl.raw.ptr.buffer.load.async.lds'¶
- _ODS_REGIONS = (0, True)¶
- rsrc() _ods_ir¶
- ldsPtr() _ods_ir¶
- size() _ods_ir[_ods_ir]¶
- voffset() _ods_ir[_ods_ir]¶
- soffset() _ods_ir[_ods_ir]¶
- offset() _ods_ir[_ods_ir]¶
- aux() _ods_ir[_ods_ir]¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferLoadAsyncLdsOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferLoadAsyncLdsOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.ptr.buffer.load.async.lds'¶
- rsrc() _ods_ir¶
- ldsPtr() _ods_ir¶
- size() _ods_ir[_ods_ir]¶
- voffset() _ods_ir[_ods_ir]¶
- soffset() _ods_ir[_ods_ir]¶
- offset() _ods_ir[_ods_ir]¶
- aux() _ods_ir[_ods_ir]¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.raw_ptr_buffer_load_async_lds(rsrc, lds_ptr, size, voffset, soffset, offset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) RawPtrBufferLoadAsyncLdsOp¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferLoadLdsOp(rsrc, ldsPtr, size, voffset, soffset, offset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.ptr.buffer.load.lds'¶
- _ODS_REGIONS = (0, True)¶
- rsrc() _ods_ir¶
- ldsPtr() _ods_ir¶
- size() _ods_ir[_ods_ir]¶
- voffset() _ods_ir[_ods_ir]¶
- soffset() _ods_ir[_ods_ir]¶
- offset() _ods_ir[_ods_ir]¶
- aux() _ods_ir[_ods_ir]¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferLoadLdsOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferLoadLdsOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.ptr.buffer.load.lds'¶
- rsrc() _ods_ir¶
- ldsPtr() _ods_ir¶
- size() _ods_ir[_ods_ir]¶
- voffset() _ods_ir[_ods_ir]¶
- soffset() _ods_ir[_ods_ir]¶
- offset() _ods_ir[_ods_ir]¶
- aux() _ods_ir[_ods_ir]¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.raw_ptr_buffer_load_lds(rsrc, lds_ptr, size, voffset, soffset, offset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) RawPtrBufferLoadLdsOp¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferLoadOp(res, rsrc, offset, soffset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.ptr.buffer.load'¶
- _ODS_REGIONS = (0, True)¶
- rsrc() _ods_ir¶
- offset() _ods_ir[_ods_ir]¶
- soffset() _ods_ir[_ods_ir]¶
- aux() _ods_ir[_ods_ir]¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferLoadOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferLoadOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.ptr.buffer.load'¶
- rsrc() _ods_ir¶
- offset() _ods_ir[_ods_ir]¶
- soffset() _ods_ir[_ods_ir]¶
- aux() _ods_ir[_ods_ir]¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.raw_ptr_buffer_load(res, rsrc, offset, soffset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferStoreOp(vdata, rsrc, offset, soffset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.ptr.buffer.store'¶
- _ODS_REGIONS = (0, True)¶
- vdata() _ods_ir¶
- rsrc() _ods_ir¶
- offset() _ods_ir[_ods_ir]¶
- soffset() _ods_ir[_ods_ir]¶
- aux() _ods_ir[_ods_ir]¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferStoreOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.RawPtrBufferStoreOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.raw.ptr.buffer.store'¶
- vdata() _ods_ir¶
- rsrc() _ods_ir¶
- offset() _ods_ir[_ods_ir]¶
- soffset() _ods_ir[_ods_ir]¶
- aux() _ods_ir[_ods_ir]¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.raw_ptr_buffer_store(vdata, rsrc, offset, soffset, aux, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) RawPtrBufferStoreOp¶
- class mlir.dialects._rocdl_ops_gen.ReadfirstlaneOp(res, src, *, loc=None, ip=None)¶
Bases:
_ods_ir
Returns the value in the lowest active lane of the input operand.
Example:
// Scalar readfirstlane.
%0 = rocdl.readfirstlane %src0 : f32
// Vector readfirstlane.
%1 = rocdl.readfirstlane %src1 : vector<2xf32>
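The "lowest active lane" rule can be illustrated with a small pure-Python model. This is only a sketch of the semantics described above, not a call into the bindings: the helper name `readfirstlane_model`, the list-of-lanes representation, and the convention that bit i of `exec_mask` marks lane i as active are all assumptions made for illustration. Raising when no lane is active is a choice of this model, not a statement about hardware behavior.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def readfirstlane_model(lane_values: Sequence[T], exec_mask: int) -> T:
    """Return the value held by the lowest-numbered active lane.

    Assumption: bit i of exec_mask is 1 when lane i is active.
    """
    for lane, value in enumerate(lane_values):
        if (exec_mask >> lane) & 1:
            return value
    # Modelling choice only: the all-inactive case is not covered here.
    raise ValueError("no active lane in exec_mask")


# Four lanes, of which only lanes 2 and 3 are active.
values = [10.0, 11.0, 12.0, 13.0]
first = readfirstlane_model(values, exec_mask=0b1100)
print(first)  # 12.0
```

Every lane observes the same result, which is why the value can be treated as uniform afterwards.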
- OPERATION_NAME = 'rocdl.readfirstlane'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ReadfirstlaneOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.ReadfirstlaneOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.readfirstlane'¶
- src() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.readfirstlane(res, src, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ReadlaneOp(res, src0, src1, *, loc=None, ip=None)¶
Bases:
_ods_ir
Get the value in lane src1 from input src0.
Example:
// Scalar readlane.
%0 = rocdl.readlane %src0, %idx : (f32, i32) -> f32
// Vector readlane.
%1 = rocdl.readlane %src1, %idx : (vector<2xf32>, i32) -> vector<2xf32>
- OPERATION_NAME = 'rocdl.readlane'¶
- _ODS_REGIONS = (0, True)¶
- src0() _ods_ir¶
- src1() _ods_ir[_ods_ir]¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ReadlaneOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.ReadlaneOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.readlane'¶
- src0() _ods_ir¶
- src1() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.readlane(res, src0, src1, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.SBarrierOp(*, loc=None, ip=None)¶
Bases:
_ods_ir
Insert a workgroup barrier without memory fences.
Available on gfx9 and later but deprecated on gfx12+; see rocdl.s.barrier.signal and rocdl.s.barrier.wait instead.
Example:
// Synchronize threads within a workgroup.
rocdl.s.barrier
- OPERATION_NAME = 'rocdl.s.barrier'¶
- _ODS_REGIONS = (0, True)¶
- class mlir.dialects._rocdl_ops_gen.SBarrierOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.SBarrierOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.s.barrier'¶
- mlir.dialects._rocdl_ops_gen.s_barrier(*, loc=None, ip=None) SBarrierOp¶
- class mlir.dialects._rocdl_ops_gen.SNopOp(count, *, loc=None, ip=None)¶
Bases:
_ods_ir
Insert a number of NOP cycles.
Example:
// Insert a no-op.
rocdl.s.nop 0
- OPERATION_NAME = 'rocdl.s.nop'¶
- _ODS_REGIONS = (0, True)¶
- count() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.SNopOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.SNopOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.s.nop'¶
- count() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.SSleepOp(count, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sleep for a number of clock cycles.
Example:
// Sleep for a minimum duration.
rocdl.s.sleep 0
- OPERATION_NAME = 'rocdl.s.sleep'¶
- _ODS_REGIONS = (0, True)¶
- count() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.SSleepOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.SSleepOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.s.sleep'¶
- count() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.SWaitcntOp(bitfield, *, loc=None, ip=None)¶
Bases:
_ods_ir
Wait for outstanding memory operations to complete, as specified by a bitfield whose semantics depend on the target chipset.
Example:
// Wait for all counters to reach zero.
rocdl.s.waitcnt 0
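Because the bitfield layout is chipset-specific, any encoder has to commit to one target. The sketch below packs the three gfx9 counters under the assumption that vmcnt occupies bits 3:0 and 15:14, expcnt bits 6:4, and lgkmcnt bits 11:8. The helper name `encode_waitcnt_gfx9` is hypothetical and is not part of these bindings, and other chipsets use different layouts, so the ISA documentation for the actual target should be consulted before relying on it.

```python
def encode_waitcnt_gfx9(vmcnt: int = 63, expcnt: int = 7, lgkmcnt: int = 15) -> int:
    """Pack gfx9 wait counters into an s_waitcnt immediate (assumed layout).

    Each default is the counter's maximum, meaning "do not wait on it".
    """
    if not 0 <= vmcnt <= 63:
        raise ValueError("vmcnt must fit in 6 bits")
    if not 0 <= expcnt <= 7:
        raise ValueError("expcnt must fit in 3 bits")
    if not 0 <= lgkmcnt <= 15:
        raise ValueError("lgkmcnt must fit in 4 bits")
    low_vmcnt = vmcnt & 0xF          # bits 3:0
    high_vmcnt = (vmcnt >> 4) << 14  # bits 15:14
    return low_vmcnt | (expcnt << 4) | (lgkmcnt << 8) | high_vmcnt


# Waiting on every counter gives the all-zero bitfield used in the example above.
wait_all = encode_waitcnt_gfx9(vmcnt=0, expcnt=0, lgkmcnt=0)
# Waiting only for vector memory operations leaves the other fields at maximum.
wait_vm_only = encode_waitcnt_gfx9(vmcnt=0)
print(wait_all, hex(wait_vm_only))
```

The resulting integer is what would be passed as the bitfield argument of the op.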
- OPERATION_NAME = 'rocdl.s.waitcnt'¶
- _ODS_REGIONS = (0, True)¶
- bitfield() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.SWaitcntOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.SWaitcntOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.s.waitcnt'¶
- bitfield() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.s_waitcnt(bitfield, *, loc=None, ip=None) SWaitcntOp¶
- class mlir.dialects._rocdl_ops_gen.SchedBarrier(mask, *, loc=None, ip=None)¶
Bases:
_ods_ir
Insert a scheduling barrier with the given mask. The mask is a bitfield that controls which instruction types may be scheduled across the barrier (e.g. 0x0000 = no instructions may cross, 0x0001 = ALU only, 0x0010 = all VMEM, etc.). See https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/IR/IntrinsicsAMDGPU.td#L349 for the full list of mask values.
Example:
// Scheduling barrier with mask 0.
rocdl.sched.barrier 0
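Since the mask is a bitfield, individual permissions can be combined with a bitwise OR. The sketch below names only the three values quoted in the description above. The class `SchedBarrierMask` is a hypothetical convenience and is not defined by the bindings, and the remaining bits from the linked LLVM file are deliberately left out.

```python
from enum import IntFlag


class SchedBarrierMask(IntFlag):
    """Hypothetical names for the mask values quoted in the description."""

    NONE = 0x0000  # no instructions may cross the barrier
    ALU = 0x0001   # ALU instructions may cross
    VMEM = 0x0010  # all VMEM instructions may cross


# Allow both ALU and VMEM instructions to be scheduled across the barrier.
mask = SchedBarrierMask.ALU | SchedBarrierMask.VMEM
print(f"rocdl.sched.barrier {int(mask)}")  # rocdl.sched.barrier 17
```

The plain integer `int(mask)` is the value that would be supplied as the mask.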
- OPERATION_NAME = 'rocdl.sched.barrier'¶
- _ODS_REGIONS = (0, True)¶
- mask() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.SchedBarrierAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.SchedBarrierAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.sched.barrier'¶
- mask() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.sched_barrier(mask, *, loc=None, ip=None) SchedBarrier¶
- class mlir.dialects._rocdl_ops_gen.SchedGroupBarrier(mask, size, groupId, *, loc=None, ip=None)¶
Bases:
_ods_ir
Insert a scheduling group barrier.
Example:
// Schedule group barrier with mask, size, and group id.
rocdl.sched.group.barrier 8, 1, 0
- OPERATION_NAME = 'rocdl.sched.group.barrier'¶
- _ODS_REGIONS = (0, True)¶
- mask() _ods_ir¶
- size() _ods_ir¶
- groupId() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.SchedGroupBarrierAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.SchedGroupBarrierAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.sched.group.barrier'¶
- mask() _ods_ir¶
- size() _ods_ir¶
- groupId() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.sched_group_barrier(mask, size, group_id, *, loc=None, ip=None) SchedGroupBarrier¶
- class mlir.dialects._rocdl_ops_gen.SetPrioOp(priority, *, loc=None, ip=None)¶
Bases:
_ods_ir
Set the wavefront scheduling priority.
Example:
// Set priority to 0.
rocdl.s.setprio 0
- OPERATION_NAME = 'rocdl.s.setprio'¶
- _ODS_REGIONS = (0, True)¶
- priority() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.SetPrioOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.SetPrioOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.s.setprio'¶
- priority() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.TensorLoadToLDSOp(dgroup0, dgroup1, dgroup2, dgroup3, dgroup4, cachePolicy, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases:
_ods_ir
Moves tiles of tensor data between global memory and LDS. The tile is described by the $dgroup descriptors. Five $dgroup descriptors allow for movement of up to 5D tensors. $cachePolicy describes the memory scope and an indicator of expected data re-use.
This op is for gfx1250+ architectures.
Example:
// Tensor load from global memory to LDS using 4 descriptor groups.
rocdl.tensor.load.to.lds %dg0, %dg1, %dg2, %dg3 cachepolicy 0 : vector<4xi32>, vector<8xi32>
// Tensor store from LDS to global memory using 4 descriptor groups.
rocdl.tensor.store.from.lds %dg0, %dg1, %dg2, %dg3 cachepolicy 0 : vector<4xi32>, vector<8xi32>
- OPERATION_NAME = 'rocdl.tensor.load.to.lds'¶
- _ODS_REGIONS = (0, True)¶
- dgroup0() _ods_ir[_ods_ir]¶
- dgroup1() _ods_ir[_ods_ir]¶
- dgroup2() _ods_ir[_ods_ir]¶
- dgroup3() _ods_ir[_ods_ir]¶
- dgroup4() _ods_ir[_ods_ir]¶
- cachePolicy() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- class mlir.dialects._rocdl_ops_gen.TensorLoadToLDSOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.TensorLoadToLDSOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.tensor.load.to.lds'¶
- dgroup0() _ods_ir[_ods_ir]¶
- dgroup1() _ods_ir[_ods_ir]¶
- dgroup2() _ods_ir[_ods_ir]¶
- dgroup3() _ods_ir[_ods_ir]¶
- dgroup4() _ods_ir[_ods_ir]¶
- cachePolicy() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.tensor_load_to_lds(dgroup0, dgroup1, dgroup2, dgroup3, dgroup4, cache_policy, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) TensorLoadToLDSOp¶
- class mlir.dialects._rocdl_ops_gen.TensorStoreFromLDSOp(dgroup0, dgroup1, dgroup2, dgroup3, dgroup4, cachePolicy, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases:
_ods_ir
Moves tiles of tensor data between global memory and LDS. The tile is described by the $dgroup descriptors. Five $dgroup descriptors allow for movement of up to 5D tensors. $cachePolicy describes the memory scope and an indicator of expected data re-use.
This op is for gfx1250+ architectures.
Example:
// Tensor load from global memory to LDS using 4 descriptor groups.
rocdl.tensor.load.to.lds %dg0, %dg1, %dg2, %dg3 cachepolicy 0 : vector<4xi32>, vector<8xi32>
// Tensor store from LDS to global memory using 4 descriptor groups.
rocdl.tensor.store.from.lds %dg0, %dg1, %dg2, %dg3 cachepolicy 0 : vector<4xi32>, vector<8xi32>
- OPERATION_NAME = 'rocdl.tensor.store.from.lds'¶
- _ODS_REGIONS = (0, True)¶
- dgroup0() _ods_ir[_ods_ir]¶
- dgroup1() _ods_ir[_ods_ir]¶
- dgroup2() _ods_ir[_ods_ir]¶
- dgroup3() _ods_ir[_ods_ir]¶
- dgroup4() _ods_ir[_ods_ir]¶
- cachePolicy() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- class mlir.dialects._rocdl_ops_gen.TensorStoreFromLDSOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.TensorStoreFromLDSOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.tensor.store.from.lds'¶
- dgroup0() _ods_ir[_ods_ir]¶
- dgroup1() _ods_ir[_ods_ir]¶
- dgroup2() _ods_ir[_ods_ir]¶
- dgroup3() _ods_ir[_ods_ir]¶
- dgroup4() _ods_ir[_ods_ir]¶
- cachePolicy() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.tensor_store_from_lds(dgroup0, dgroup1, dgroup2, dgroup3, dgroup4, cache_policy, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) TensorStoreFromLDSOp¶
- class mlir.dialects._rocdl_ops_gen.ThreadIdXOp(res, *, range=None, loc=None, ip=None)¶
Bases:
_ods_ir
Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.
Example:
// Read the workitem id in the x dimension.
%0 = rocdl.workitem.id.x : i32
// Read with a known range constraint.
%1 = rocdl.workitem.id.x range <i32, 0, 64> : i32
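As a rough illustration of how the optional range shows up in the textual form, the helper below formats the assembly for a workitem id read. It is a string-building sketch only: the function name `workitem_id_asm` is hypothetical, it does not touch the bindings, and it assumes the range follows LLVM's half-open convention, so that an upper bound of 64 describes ids 0 through 63.

```python
from typing import Optional


def workitem_id_asm(dim: str, upper_bound: Optional[int] = None) -> str:
    """Format the textual form of a workitem id read (illustrative only)."""
    if dim not in ("x", "y", "z"):
        raise ValueError("dim must be 'x', 'y', or 'z'")
    op = f"rocdl.workitem.id.{dim}"
    if upper_bound is None:
        # No constraint: the attribute is simply omitted.
        return f"{op} : i32"
    if upper_bound <= 0:
        raise ValueError("upper_bound must be positive")
    # Assumed half-open range [0, upper_bound).
    return f"{op} range <i32, 0, {upper_bound}> : i32"


unconstrained = workitem_id_asm("x")
constrained = workitem_id_asm("x", 64)
print(unconstrained)
print(constrained)
```

The two printed strings match the right-hand sides of the two example lines above.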
- OPERATION_NAME = 'rocdl.workitem.id.x'¶
- _ODS_REGIONS = (0, True)¶
- range() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ThreadIdXOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.ThreadIdXOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.workitem.id.x'¶
- range() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.workitem_id_x(res, *, range=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ThreadIdYOp(res, *, range=None, loc=None, ip=None)¶
Bases:
_ods_ir
Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.
Example:
// Read the workitem id in the x dimension.
%0 = rocdl.workitem.id.x : i32
// Read with a known range constraint.
%1 = rocdl.workitem.id.x range <i32, 0, 64> : i32
- OPERATION_NAME = 'rocdl.workitem.id.y'¶
- _ODS_REGIONS = (0, True)¶
- range() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ThreadIdYOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.ThreadIdYOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.workitem.id.y'¶
- range() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.workitem_id_y(res, *, range=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ThreadIdZOp(res, *, range=None, loc=None, ip=None)¶
Bases:
_ods_ir
Read a hardware register for thread/workgroup/cluster identification. An optional range attribute can constrain the returned value.
Example:
// Read the workitem id in the x dimension.
%0 = rocdl.workitem.id.x : i32
// Read with a known range constraint.
%1 = rocdl.workitem.id.x range <i32, 0, 64> : i32
- OPERATION_NAME = 'rocdl.workitem.id.z'¶
- _ODS_REGIONS = (0, True)¶
- range() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ThreadIdZOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.ThreadIdZOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.workitem.id.z'¶
- range() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.workitem_id_z(res, *, range=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.WaitAsynccntOp(count, *, loc=None, ip=None)¶
Bases:
_ods_ir
Wait for the specified counter to be less than or equal to count before continuing.
Available on gfx1250+.
Example:
// Wait for async counter to drain.
rocdl.s.wait.asynccnt 0
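The "less than or equal to count" condition can be pictured with a toy model of one outstanding-operation counter. This is a sketch under stated assumptions, not a description of the hardware: the class `CounterModel` is hypothetical, and it assumes operations retire in the order they were issued.

```python
from collections import deque
from typing import Deque, List


class CounterModel:
    """Toy model of one counter of outstanding operations."""

    def __init__(self) -> None:
        self.outstanding: Deque[str] = deque()

    def issue(self, name: str) -> None:
        # Issuing an operation increments the counter.
        self.outstanding.append(name)

    def wait(self, count: int) -> List[str]:
        """Block (here: retire oldest-first) until at most `count` remain."""
        retired = []
        while len(self.outstanding) > count:
            retired.append(self.outstanding.popleft())
        return retired


model = CounterModel()
for name in ("load0", "load1", "load2"):
    model.issue(name)

# Waiting with count=1 lets one operation stay in flight.
retired = model.wait(1)
print(retired, len(model.outstanding))  # ['load0', 'load1'] 1
```

A count of 0, as in the example above, drains the counter completely.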
- OPERATION_NAME = 'rocdl.s.wait.asynccnt'¶
- _ODS_REGIONS = (0, True)¶
- count() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.WaitAsynccntOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.WaitAsynccntOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.s.wait.asynccnt'¶
- count() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.s_wait_asynccnt(count, *, loc=None, ip=None) WaitAsynccntOp¶
- class mlir.dialects._rocdl_ops_gen.WaitAsyncmarkOp(count, *, loc=None, ip=None)¶
Bases:
_ods_ir
This operation, along with rocdl.asyncmark, forms the compiler-provided framework for explicitly tracking asynchronous operations.
At the point where a wait.asyncmark operation is executed, all async operations that were part of any async group (established by asyncmark in program order), other than the count most recently added groups, will have finished executing.
For more detail, including on how this mechanism composes with function calls, see the LLVM documentation on async tracking.
Available on gfx9 and later.
Example:
// Wait until at most N async groups remain outstanding.
rocdl.wait.asyncmark 1
Usage example:
rocdl.tensor.load.to.lds ...
rocdl.global.async.load.to.lds ...
rocdl.asyncmark
rocdl.tensor.load.to.lds ...
rocdl.global.async.load.to.lds ...
rocdl.asyncmark
rocdl.wait.asyncmark 1 // First group of loads completes after this
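The group-based rule can be mimicked with a small bookkeeping class that mirrors the usage example. This is an illustrative model only: `AsyncGroupTracker` and its method names are hypothetical, nothing here calls the bindings, and the operation labels are placeholders.

```python
from typing import List


class AsyncGroupTracker:
    """Toy model of asyncmark / wait.asyncmark group tracking."""

    def __init__(self) -> None:
        self.pending: List[str] = []       # ops issued since the last mark
        self.groups: List[List[str]] = []  # closed groups, oldest first

    def issue(self, op: str) -> None:
        self.pending.append(op)

    def asyncmark(self) -> None:
        # Close the current group in program order.
        self.groups.append(self.pending)
        self.pending = []

    def wait_asyncmark(self, count: int) -> List[str]:
        """Finish every group except the `count` most recently added."""
        n_done = max(len(self.groups) - count, 0)
        done, self.groups = self.groups[:n_done], self.groups[n_done:]
        return [op for group in done for op in group]


tracker = AsyncGroupTracker()
tracker.issue("tensor.load #0")
tracker.issue("global.load #0")
tracker.asyncmark()
tracker.issue("tensor.load #1")
tracker.issue("global.load #1")
tracker.asyncmark()

# With count=1 the newest group may remain in flight; the first has finished.
finished = tracker.wait_asyncmark(1)
print(finished)  # ['tensor.load #0', 'global.load #0']
```

Keeping one group outstanding in this way is what allows the next batch of loads to overlap with work on the previous batch.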
- OPERATION_NAME = 'rocdl.wait.asyncmark'¶
- _ODS_REGIONS = (0, True)¶
- count() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.WaitAsyncmarkOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.WaitAsyncmarkOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.wait.asyncmark'¶
- count() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wait_asyncmark(count, *, loc=None, ip=None) WaitAsyncmarkOp¶
- class mlir.dialects._rocdl_ops_gen.WaitDscntOp(count, *, loc=None, ip=None)¶
Bases:
_ods_ir
Wait for the specified counter to be less than or equal to count before continuing.
Available on gfx12+.
Example:
// Wait for data-sharing counter to drain.
rocdl.s.wait.dscnt 0
- OPERATION_NAME = 'rocdl.s.wait.dscnt'¶
- _ODS_REGIONS = (0, True)¶
- count() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.WaitDscntOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.WaitDscntOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.s.wait.dscnt'¶
- count() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.s_wait_dscnt(count, *, loc=None, ip=None) WaitDscntOp¶
- class mlir.dialects._rocdl_ops_gen.WaitExpcntOp(count, *, loc=None, ip=None)¶
Bases:
_ods_ir
Wait for the specified counter to be less than or equal to count before continuing.
Available on gfx12+.
Example:
// Wait for export counter to drain.
rocdl.s.wait.expcnt 0
- OPERATION_NAME = 'rocdl.s.wait.expcnt'¶
- _ODS_REGIONS = (0, True)¶
- count() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.WaitExpcntOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.WaitExpcntOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.s.wait.expcnt'¶
- count() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.s_wait_expcnt(count, *, loc=None, ip=None) WaitExpcntOp¶
- class mlir.dialects._rocdl_ops_gen.WaitLoadcntOp(count, *, loc=None, ip=None)¶
Bases:
_ods_ir
Wait for the specified counter to be less than or equal to count before continuing.
Available on gfx12+.
Example:
// Wait for load counter to drain.
rocdl.s.wait.loadcnt 0
- OPERATION_NAME = 'rocdl.s.wait.loadcnt'¶
- _ODS_REGIONS = (0, True)¶
- count() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.WaitLoadcntOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.WaitLoadcntOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.s.wait.loadcnt'¶
- count() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.s_wait_loadcnt(count, *, loc=None, ip=None) WaitLoadcntOp¶
- class mlir.dialects._rocdl_ops_gen.WaitStorecntOp(count, *, loc=None, ip=None)¶
Bases:
_ods_ir
Wait for the specified counter to be less than or equal to count before continuing.
Available on gfx12+.
Example:
// Wait for store counter to drain.
rocdl.s.wait.storecnt 0
- OPERATION_NAME = 'rocdl.s.wait.storecnt'¶
- _ODS_REGIONS = (0, True)¶
- count() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.WaitStorecntOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.WaitStorecntOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir- OPERATION_NAME = 'rocdl.s.wait.storecnt'¶
- count() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.s_wait_storecnt(count, *, loc=None, ip=None) WaitStorecntOp¶
- class mlir.dialects._rocdl_ops_gen.WaitTensorcntOp(count, *, loc=None, ip=None)¶
Bases:
_ods_ir
Wait for the specified counter to be less than or equal to
count before continuing. Available on gfx1250+.
Example:
// Wait for tensor counter to drain.
rocdl.s.wait.tensorcnt 0
- OPERATION_NAME = 'rocdl.s.wait.tensorcnt'¶
- _ODS_REGIONS = (0, True)¶
- count() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.WaitTensorcntOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.WaitTensorcntOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.s.wait.tensorcnt'¶
- count() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.s_wait_tensorcnt(count, *, loc=None, ip=None) WaitTensorcntOp¶
- class mlir.dialects._rocdl_ops_gen.WakeupBarrierOp(ptr, *, loc=None, ip=None)¶
Bases:
_ods_ir
Wakes up waves associated with a given named barrier. Note: this op does not release waves waiting at the barrier; it only signals other waves in the same work-group that are waiting on the indicated named barrier to wake up. Available on gfx1250+.
Example:
// Wake up waves waiting on a named barrier.
rocdl.s.wakeup.barrier %ptr : !llvm.ptr<3>
- OPERATION_NAME = 'rocdl.s.wakeup.barrier'¶
- _ODS_REGIONS = (0, True)¶
- ptr() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.WakeupBarrierOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.WakeupBarrierOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.s.wakeup.barrier'¶
- ptr() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.s_wakeup_barrier(ptr, *, loc=None, ip=None) WakeupBarrierOp¶
- class mlir.dialects._rocdl_ops_gen.WaveId(res, *, range=None, loc=None, ip=None)¶
Bases:
_ods_ir
Read a hardware register for thread/workgroup/cluster identification. An optional
range attribute can constrain the returned value.
Example:
// Read the workitem id in the x dimension.
%0 = rocdl.workitem.id.x : i32
// Read with a known range constraint.
%1 = rocdl.workitem.id.x range <i32, 0, 64> : i32
- OPERATION_NAME = 'rocdl.wave.id'¶
- _ODS_REGIONS = (0, True)¶
- range() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.WaveIdAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.WaveIdAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.wave.id'¶
- range() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.wave_id(res, *, range=None, loc=None, ip=None) _ods_ir¶
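To make the range constraint in the example concrete, the following pure-Python sketch checks values against the bounds from `range <i32, 0, 64>`. It assumes the bounds form a half-open interval (lower inclusive, upper exclusive), as LLVM constant ranges do; `in_range` is a hypothetical helper and not part of the bindings.

```python
def in_range(value, lower, upper):
    """Return True if value lies in the half-open interval [lower, upper).

    Assumption: the range attribute follows LLVM's constant-range
    convention, where the upper bound is exclusive.
    """
    return lower <= value < upper


# Bounds taken from the `range <i32, 0, 64>` example above.
LOWER, UPPER = 0, 64
ok_ids = [i for i in (0, 1, 63) if in_range(i, LOWER, UPPER)]
bad_ids = [i for i in (-1, 64) if not in_range(i, LOWER, UPPER)]
```

Under that assumption, an id of 63 satisfies the constraint and 64 does not.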
- class mlir.dialects._rocdl_ops_gen.WavefrontSizeOp(res, *, range=None, loc=None, ip=None)¶
Bases:
_ods_ir
Read a hardware register for thread/workgroup/cluster identification. An optional
range attribute can constrain the returned value.
Example:
// Read the workitem id in the x dimension.
%0 = rocdl.workitem.id.x : i32
// Read with a known range constraint.
%1 = rocdl.workitem.id.x range <i32, 0, 64> : i32
- OPERATION_NAME = 'rocdl.wavefrontsize'¶
- _ODS_REGIONS = (0, True)¶
- range() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.WavefrontSizeOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.WavefrontSizeOpAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.wavefrontsize'¶
- range() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.wavefrontsize(res, *, range=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ds_read_tr4_b64(res, ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.ds.read.tr4.b64'¶
- _ODS_REGIONS = (0, True)¶
- ptr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ds_read_tr4_b64Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.ds_read_tr4_b64Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.ds.read.tr4.b64'¶
- ptr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.ds_read_tr4_b64_(res, ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) _ods_ir¶
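The entries on this page suggest a naming pattern between `OPERATION_NAME` and the module-level builder function: drop the `rocdl.` prefix, replace dots with underscores, and append a trailing underscore when the result would collide with the class name. The sketch below encodes that observation; `builder_name` is a hypothetical helper, and the rule is inferred from the listings here rather than taken from the generator's documentation.

```python
def builder_name(operation_name, class_name, dialect="rocdl"):
    """Guess the builder function name for an op listed on this page.

    Assumption (inferred from the entries above, not an official rule):
    the builder is the op name without the dialect prefix, with dots
    turned into underscores, plus a trailing underscore if that name
    is already taken by the class.
    """
    prefix = dialect + "."
    if not operation_name.startswith(prefix):
        raise ValueError(f"{operation_name!r} is not in dialect {dialect!r}")
    snake = operation_name[len(prefix):].replace(".", "_")
    if snake == class_name:
        snake += "_"
    return snake


wait_builder = builder_name("rocdl.s.wait.expcnt", "WaitExpcntOp")
read_builder = builder_name("rocdl.ds.read.tr4.b64", "ds_read_tr4_b64")
```

This reproduces `s_wait_expcnt` for `WaitExpcntOp` and `ds_read_tr4_b64_` for the `ds_read_tr4_b64` class, matching the signatures listed above.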
- class mlir.dialects._rocdl_ops_gen.ds_read_tr6_b96(res, ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.ds.read.tr6.b96'¶
- _ODS_REGIONS = (0, True)¶
- ptr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ds_read_tr6_b96Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.ds_read_tr6_b96Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.ds.read.tr6.b96'¶
- ptr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.ds_read_tr6_b96_(res, ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ds_read_tr8_b64(res, ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.ds.read.tr8.b64'¶
- _ODS_REGIONS = (0, True)¶
- ptr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ds_read_tr8_b64Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.ds_read_tr8_b64Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.ds.read.tr8.b64'¶
- ptr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.ds_read_tr8_b64_(res, ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ds_read_tr16_b64(res, ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.ds.read.tr16.b64'¶
- _ODS_REGIONS = (0, True)¶
- ptr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.ds_read_tr16_b64Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.ds_read_tr16_b64Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.ds.read.tr16.b64'¶
- ptr() _ods_ir¶
- alias_scopes() _ods_ir | None¶
- noalias_scopes() _ods_ir | None¶
- tbaa() _ods_ir | None¶
- mlir.dialects._rocdl_ops_gen.ds_read_tr16_b64_(res, ptr, *, alias_scopes=None, noalias_scopes=None, tbaa=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_4x4x1f32(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases:
_ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes
D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.4x4x1f32'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_4x4x1f32Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_4x4x1f32Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.4x4x1f32'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_4x4x1f32_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
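The formula D = A * B + C can be illustrated with a plain-Python reference computation. This is a mathematical sketch only: it ignores how the hardware distributes matrix elements across lanes and blocks, and it does not model the cbsz, abid, or blgp modifiers. `matmul_add` is a hypothetical helper, and reading "4x4x1" as M=4, N=4, K=1 is an assumption based on the op name.

```python
def matmul_add(a, b, c):
    """Return D = A * B + C for matrices given as lists of rows.

    A is M x K, B is K x N, and C is M x N.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("A must have as many columns as B has rows")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("C must have shape M x N")
    return [
        [sum(a[i][p] * b[p][j] for p in range(k)) + c[i][j] for j in range(n)]
        for i in range(m)
    ]


# Shapes chosen to mirror a 4x4x1 tile: A is 4x1, B is 1x4, C is 4x4.
a = [[1.0], [2.0], [3.0], [4.0]]
b = [[1.0, 0.0, -1.0, 2.0]]
c = [[0.5] * 4 for _ in range(4)]
d = matmul_add(a, b, c)
```

With K=1 the product is an outer product, so each element of D is a single multiply of one A value by one B value plus the matching accumulator element.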
- class mlir.dialects._rocdl_ops_gen.mfma_f32_4x4x2bf16(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases:
_ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes
D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.4x4x2bf16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_4x4x2bf16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_4x4x2bf16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.4x4x2bf16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_4x4x2bf16_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_4x4x4bf16_1k(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases:
_ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes
D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.4x4x4bf16.1k'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_4x4x4bf16_1kAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_4x4x4bf16_1kAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.4x4x4bf16.1k'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_4x4x4bf16_1k_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_4x4x4f16(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases:
_ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes
D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.4x4x4f16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_4x4x4f16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_4x4x4f16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.4x4x4f16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_4x4x4f16_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x1f32(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases:
_ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes
D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x1f32'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x1f32Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x1f32Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x1f32'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x1f32_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x2bf16(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases:
_ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes
D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x2bf16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x2bf16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x2bf16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x2bf16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x2bf16_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x4bf16_1k(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases:
_ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes
D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x4bf16.1k'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x4bf16_1kAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x4bf16_1kAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x4bf16.1k'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x4bf16_1k_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x4f16(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases:
_ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes
D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x4f16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x4f16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x4f16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x4f16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x4f16_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x4f32(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases:
_ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes
D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x4f32'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x4f32Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x4f32Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x4f32'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x4f32_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x8_xf32(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases:
_ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes
D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x8.xf32'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x8_xf32Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x8_xf32Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x8.xf32'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x8_xf32_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x8bf16(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases:
_ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes
D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x8bf16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x8bf16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x8bf16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x8bf16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x8bf16_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x16bf16_1k(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases:
_ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes
D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x16bf16.1k'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x16bf16_1kAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x16bf16_1kAdaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x16bf16.1k'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x16bf16_1k_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x16f16(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases:
_ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes
D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x16f16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x16f16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x16f16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x16f16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x16f16_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x32_bf8_bf8(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases:
_ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes
D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x32.bf8.bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x32_bf8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x32_bf8_bf8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x32.bf8.bf8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x32_bf8_bf8_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x32_bf8_fp8(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases:
_ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes
D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x32.bf8.fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x32_bf8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x32_bf8_fp8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x32.bf8.fp8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x32_bf8_fp8_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
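The MFMA op names listed here appear to follow a common layout: a result element type, three tile dimensions, and a trailing tag describing the input format. The sketch below splits an `OPERATION_NAME` along those lines. The interpretation of the three numbers as M, N, and K and of the tail as an input-format tag is an assumption drawn from the names on this page; `parse_mfma_name` is a hypothetical helper, not part of the bindings.

```python
import re

_MFMA_NAME = re.compile(
    r"^rocdl\.mfma\."
    r"(?P<acc>[a-z]+\d+)\."              # result element type, e.g. f32 or i32
    r"(?P<m>\d+)x(?P<n>\d+)x(?P<k>\d+)"  # three tile dimensions
    r"\.?(?P<suffix>.*)$"                # remaining input-format tag, left uninterpreted
)


def parse_mfma_name(operation_name):
    """Split an MFMA op name into its apparent components (assumed layout)."""
    match = _MFMA_NAME.match(operation_name)
    if match is None:
        raise ValueError(f"not an MFMA op name: {operation_name!r}")
    return {
        "acc": match["acc"],
        "m": int(match["m"]),
        "n": int(match["n"]),
        "k": int(match["k"]),
        "suffix": match["suffix"],
    }


small = parse_mfma_name("rocdl.mfma.f32.4x4x1f32")
mixed = parse_mfma_name("rocdl.mfma.f32.16x16x32.bf8.fp8")
```

The suffix is returned verbatim (for example `bf16.1k` or `bf8.fp8`) because this page does not define how such tags should be decoded further.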
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x32_bf16(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases:
_ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes
D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x32.bf16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x32_bf16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x32_bf16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x32.bf16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x32_bf16_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x32_f16(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases:
_ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes
D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x32.f16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x32_f16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x32_f16Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x32.f16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x32_f16_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
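The entries in this listing appear to follow one naming pattern: the Python class name is the OPERATION_NAME with the `rocdl.` prefix dropped and every `.` replaced by `_`, the adaptor class appends `Adaptor`, and the builder function appends a trailing `_`. The sketch below encodes that observation; `python_names` is a hypothetical helper written for illustration and is not part of the bindings.

```python
def python_names(operation_name: str) -> dict:
    """Derive the generated Python names from a rocdl operation name.

    Hypothetical helper: it only mirrors the pattern visible in this listing.
    """
    dialect, _, rest = operation_name.partition(".")
    if dialect != "rocdl" or not rest:
        raise ValueError(f"not a rocdl operation name: {operation_name!r}")
    base = rest.replace(".", "_")  # e.g. mfma.f32.16x16x32.f16 -> mfma_f32_16x16x32_f16
    return {
        "op_class": base,
        "adaptor_class": base + "Adaptor",
        "builder_function": base + "_",
    }


names = python_names("rocdl.mfma.f32.16x16x32.f16")
print(names["op_class"])          # mfma_f32_16x16x32_f16
print(names["builder_function"])  # mfma_f32_16x16x32_f16_
```

A lookup like this can help when searching the listing for the class that corresponds to an operation seen in textual IR.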
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x32_fp8_bf8(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases: _ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
    // MFMA with f32 inputs and 32-wide f32 accumulator.
    %r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
    // MFMA with i8 inputs and 32-wide i32 accumulator.
    %r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
    // MFMA with bf16 inputs and 32-wide f32 accumulator.
    %r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x32.fp8.bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x32_fp8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x32_fp8_bf8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x32.fp8.bf8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x32_fp8_bf8_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x32_fp8_fp8(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases: _ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
    // MFMA with f32 inputs and 32-wide f32 accumulator.
    %r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
    // MFMA with i8 inputs and 32-wide i32 accumulator.
    %r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
    // MFMA with bf16 inputs and 32-wide f32 accumulator.
    %r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x32.fp8.fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x32_fp8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x32_fp8_fp8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.16x16x32.fp8.fp8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_16x16x32_fp8_fp8_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x1f32(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases: _ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
    // MFMA with f32 inputs and 32-wide f32 accumulator.
    %r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
    // MFMA with i8 inputs and 32-wide i32 accumulator.
    %r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
    // MFMA with bf16 inputs and 32-wide f32 accumulator.
    %r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x1f32'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x1f32Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x1f32Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x1f32'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x1f32_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
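As a rough illustration of the documented semantics D = A * B + C, the sketch below computes the same expression on plain Python lists. It assumes the `32x32x1` part of the operation name reads as M x N x K, so A is 32x1, B is 1x32, and C and D are 32x32. It deliberately ignores how the hardware distributes values across lanes, any block structure, rounding behaviour, and the cbsz, abid, and blgp modifiers; `mfma_reference` is a hypothetical name, not a binding.

```python
def mfma_reference(a, b, c):
    """Return D = A * B + C for row-major lists of lists.

    Hypothetical reference model: plain matrix math only, no lane layout.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("A must be M x K")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("C must be M x N")
    return [
        [c[i][j] + sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


# Assumed shape for 32x32x1: with K = 1 the product is an outer product.
a = [[float(i)] for i in range(32)]        # 32 x 1
b = [[float(j) for j in range(32)]]        # 1 x 32
c = [[1.0] * 32 for _ in range(32)]        # 32 x 32 accumulator
d = mfma_reference(a, b, c)
print(d[2][3])  # 2 * 3 + 1 = 7.0
```

A model like this is mainly useful for checking results element by element once the per-lane layout of the real operands has been unpacked.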
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x2bf16(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases: _ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
    // MFMA with f32 inputs and 32-wide f32 accumulator.
    %r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
    // MFMA with i8 inputs and 32-wide i32 accumulator.
    %r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
    // MFMA with bf16 inputs and 32-wide f32 accumulator.
    %r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x2bf16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x2bf16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x2bf16Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x2bf16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x2bf16_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
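The example for this operation types its bf16 operands as vector<2xi16>, which suggests the values travel as raw 16-bit patterns. The sketch below shows one way to produce such patterns on the host, assuming each i16 holds the upper 16 bits of an IEEE-754 binary32 value. It truncates rather than rounds, which is a simplification; both helper names are hypothetical.

```python
import struct


def f32_to_bf16_bits(x: float) -> int:
    """Truncate a binary32 value to its upper 16 bits (a bf16 bit pattern)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16


def bf16_bits_to_f32(bits: int) -> float:
    """Widen a bf16 bit pattern back to binary32 by zero-filling the low half."""
    (x,) = struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))
    return x


# Two bf16 patterns, as they might populate a vector<2xi16> operand.
pair = [f32_to_bf16_bits(1.0), f32_to_bf16_bits(-2.5)]
print([hex(p) for p in pair])  # ['0x3f80', '0xc020']
```

Values that are not exactly representable in bf16 lose their low mantissa bits under this truncation, so a round-to-nearest conversion may be preferable for real data.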
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x2f32(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases: _ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
    // MFMA with f32 inputs and 32-wide f32 accumulator.
    %r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
    // MFMA with i8 inputs and 32-wide i32 accumulator.
    %r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
    // MFMA with bf16 inputs and 32-wide f32 accumulator.
    %r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x2f32'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x2f32Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x2f32Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x2f32'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x2f32_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x4_xf32(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases: _ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
    // MFMA with f32 inputs and 32-wide f32 accumulator.
    %r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
    // MFMA with i8 inputs and 32-wide i32 accumulator.
    %r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
    // MFMA with bf16 inputs and 32-wide f32 accumulator.
    %r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x4.xf32'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x4_xf32Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x4_xf32Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x4.xf32'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x4_xf32_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x4bf16(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases: _ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
    // MFMA with f32 inputs and 32-wide f32 accumulator.
    %r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
    // MFMA with i8 inputs and 32-wide i32 accumulator.
    %r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
    // MFMA with bf16 inputs and 32-wide f32 accumulator.
    %r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x4bf16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x4bf16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x4bf16Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x4bf16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x4bf16_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x4bf16_1k(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases: _ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
    // MFMA with f32 inputs and 32-wide f32 accumulator.
    %r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
    // MFMA with i8 inputs and 32-wide i32 accumulator.
    %r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
    // MFMA with bf16 inputs and 32-wide f32 accumulator.
    %r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x4bf16.1k'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x4bf16_1kAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x4bf16_1kAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x4bf16.1k'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x4bf16_1k_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x4f16(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases: _ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
    // MFMA with f32 inputs and 32-wide f32 accumulator.
    %r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
    // MFMA with i8 inputs and 32-wide i32 accumulator.
    %r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
    // MFMA with bf16 inputs and 32-wide f32 accumulator.
    %r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x4f16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x4f16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x4f16Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x4f16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x4f16_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x8bf16_1k(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases: _ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
    // MFMA with f32 inputs and 32-wide f32 accumulator.
    %r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
    // MFMA with i8 inputs and 32-wide i32 accumulator.
    %r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
    // MFMA with bf16 inputs and 32-wide f32 accumulator.
    %r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x8bf16.1k'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x8bf16_1kAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x8bf16_1kAdaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x8bf16.1k'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x8bf16_1k_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x8f16(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases: _ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
    // MFMA with f32 inputs and 32-wide f32 accumulator.
    %r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
    // MFMA with i8 inputs and 32-wide i32 accumulator.
    %r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
    // MFMA with bf16 inputs and 32-wide f32 accumulator.
    %r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x8f16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x8f16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x8f16Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x8f16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x8f16_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x16_bf8_bf8(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases: _ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
    // MFMA with f32 inputs and 32-wide f32 accumulator.
    %r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
    // MFMA with i8 inputs and 32-wide i32 accumulator.
    %r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
    // MFMA with bf16 inputs and 32-wide f32 accumulator.
    %r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x16.bf8.bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x16_bf8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x16_bf8_bf8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x16.bf8.bf8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x16_bf8_bf8_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x16_bf8_fp8(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases: _ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
    // MFMA with f32 inputs and 32-wide f32 accumulator.
    %r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
    // MFMA with i8 inputs and 32-wide i32 accumulator.
    %r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
    // MFMA with bf16 inputs and 32-wide f32 accumulator.
    %r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x16.bf8.fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x16_bf8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x16_bf8_fp8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x16.bf8.fp8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x16_bf8_fp8_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x16_bf16(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases: _ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
    // MFMA with f32 inputs and 32-wide f32 accumulator.
    %r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
    // MFMA with i8 inputs and 32-wide i32 accumulator.
    %r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
    // MFMA with bf16 inputs and 32-wide f32 accumulator.
    %r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x16.bf16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x16_bf16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x16_bf16Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x16.bf16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x16_bf16_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x16_f16(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases: _ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
    // MFMA with f32 inputs and 32-wide f32 accumulator.
    %r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
    // MFMA with i8 inputs and 32-wide i32 accumulator.
    %r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
    // MFMA with bf16 inputs and 32-wide f32 accumulator.
    %r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x16.f16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x16_f16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x16_f16Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x16.f16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x16_f16_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x16_fp8_bf8(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases: _ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
    // MFMA with f32 inputs and 32-wide f32 accumulator.
    %r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
    // MFMA with i8 inputs and 32-wide i32 accumulator.
    %r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
    // MFMA with bf16 inputs and 32-wide f32 accumulator.
    %r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x16.fp8.bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x16_fp8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x16_fp8_bf8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x16.fp8.bf8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x16_fp8_bf8_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x16_fp8_fp8(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases: _ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
    // MFMA with f32 inputs and 32-wide f32 accumulator.
    %r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
    // MFMA with i8 inputs and 32-wide i32 accumulator.
    %r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
    // MFMA with bf16 inputs and 32-wide f32 accumulator.
    %r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x16.fp8.fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x16_fp8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x16_fp8_fp8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.mfma.f32.32x32x16.fp8.fp8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f32_32x32x16_fp8_fp8_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f64_4x4x4f64(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases: _ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
    // MFMA with f32 inputs and 32-wide f32 accumulator.
    %r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
    // MFMA with i8 inputs and 32-wide i32 accumulator.
    %r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
    // MFMA with bf16 inputs and 32-wide f32 accumulator.
    %r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f64.4x4x4f64'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f64_4x4x4f64Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f64_4x4x4f64Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.mfma.f64.4x4x4f64'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f64_4x4x4f64_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_f64_16x16x4f64(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases: _ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
    // MFMA with f32 inputs and 32-wide f32 accumulator.
    %r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
    // MFMA with i8 inputs and 32-wide i32 accumulator.
    %r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
    // MFMA with bf16 inputs and 32-wide f32 accumulator.
    %r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.f64.16x16x4f64'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_f64_16x16x4f64Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_f64_16x16x4f64Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.mfma.f64.16x16x4f64'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_f64_16x16x4f64_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_i32_4x4x4i8(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases: _ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
    // MFMA with f32 inputs and 32-wide f32 accumulator.
    %r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
    // MFMA with i8 inputs and 32-wide i32 accumulator.
    %r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
    // MFMA with bf16 inputs and 32-wide f32 accumulator.
    %r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.i32.4x4x4i8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_i32_4x4x4i8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_i32_4x4x4i8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.mfma.i32.4x4x4i8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_i32_4x4x4i8_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
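The shared example types the i8 operands of an integer MFMA as a single i32, which is consistent with four 8-bit values being packed into one 32-bit word. The sketch below shows such packing on the host. The little-endian byte order (first value in the lowest byte) is an assumption made for illustration, and `pack_i8x4` and `unpack_i8x4` are hypothetical helpers, not part of the bindings.

```python
import struct


def pack_i8x4(values):
    """Pack four signed 8-bit integers into one signed 32-bit integer.

    Assumes the first value occupies the least significant byte.
    """
    if len(values) != 4:
        raise ValueError("expected exactly four values")
    (word,) = struct.unpack("<i", struct.pack("<4b", *values))
    return word


def unpack_i8x4(word):
    """Split a signed 32-bit integer back into four signed 8-bit integers."""
    return list(struct.unpack("<4b", struct.pack("<i", word)))


word = pack_i8x4([1, -2, 3, -4])
print(unpack_i8x4(word))  # [1, -2, 3, -4]
```

Values outside the signed 8-bit range cause `struct.pack` to raise `struct.error`, which makes out-of-range inputs fail loudly instead of wrapping silently.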
- class mlir.dialects._rocdl_ops_gen.mfma_i32_16x16x4i8(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases: _ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
    // MFMA with f32 inputs and 32-wide f32 accumulator.
    %r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
    // MFMA with i8 inputs and 32-wide i32 accumulator.
    %r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
    // MFMA with bf16 inputs and 32-wide f32 accumulator.
    %r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.i32.16x16x4i8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_i32_16x16x4i8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_i32_16x16x4i8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.i32.16x16x4i8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_i32_16x16x4i8_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
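The docstring above states that an MFMA op computes D = A * B + C. As a rough illustration of that arithmetic only, here is a pure-Python reference, assuming the `16x16x4` in the op name reads as M x N x K for a single block. The function name `mfma_reference` and the test data are hypothetical; the sketch does not model how matrix elements are distributed across lanes, nor the effect of `cbsz`, `abid`, or `blgp`.

```python
def mfma_reference(a, b, c):
    """Return D = A * B + C for row-major nested lists of integers.

    Hypothetical helper: models the math only, not the hardware layout.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("A must be M x K")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("C must be M x N")
    return [
        [c[i][j] + sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


# Assumed tile shape for a 16x16x4 op: A is 16x4, B is 4x16, C and D are 16x16.
M, N, K = 16, 16, 4
A = [[(i + p) % 5 - 2 for p in range(K)] for i in range(M)]
B = [[(p * j) % 7 - 3 for j in range(N)] for p in range(K)]
C = [[1] * N for _ in range(M)]
D = mfma_reference(A, B, C)
```

A reference like this can be handy for checking the numerical result of a lowered kernel against a host-side computation.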
- class mlir.dialects._rocdl_ops_gen.mfma_i32_16x16x16i8(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases:
_ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.i32.16x16x16i8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_i32_16x16x16i8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_i32_16x16x16i8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.i32.16x16x16i8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_i32_16x16x16i8_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_i32_16x16x32_i8(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases:
_ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.i32.16x16x32.i8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_i32_16x16x32_i8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_i32_16x16x32_i8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.i32.16x16x32.i8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_i32_16x16x32_i8_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
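In the entries of this listing, the Python names appear to follow from `OPERATION_NAME`: the `rocdl.` prefix is dropped and the remaining dots become underscores, with an `Adaptor` suffix for the adaptor class and a trailing underscore for the builder function. The helper below is a small sketch of that observed pattern. The name `python_names` is hypothetical, and the mapping is inferred from this page rather than taken from a documented guarantee.

```python
def python_names(operation_name, dialect="rocdl"):
    """Derive the generated Python names from an MLIR operation name.

    Hypothetical helper that encodes a pattern observed in this listing.
    """
    prefix = dialect + "."
    if not operation_name.startswith(prefix):
        raise ValueError(f"expected a name starting with {prefix!r}")
    base = operation_name[len(prefix):].replace(".", "_")
    return {
        "op_class": base,                 # e.g. mfma_i32_16x16x32_i8
        "adaptor_class": base + "Adaptor",
        "builder": base + "_",            # free function with trailing underscore
    }


names = python_names("rocdl.mfma.i32.16x16x32.i8")
```

This can help when searching the module for the wrapper that corresponds to an op seen in textual IR.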
- class mlir.dialects._rocdl_ops_gen.mfma_i32_16x16x64_i8(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases:
_ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.i32.16x16x64.i8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_i32_16x16x64_i8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_i32_16x16x64_i8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.i32.16x16x64.i8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_i32_16x16x64_i8_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_i32_32x32x4i8(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases:
_ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.i32.32x32x4i8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_i32_32x32x4i8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_i32_32x32x4i8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.i32.32x32x4i8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_i32_32x32x4i8_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_i32_32x32x8i8(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases:
_ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.i32.32x32x8i8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_i32_32x32x8i8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_i32_32x32x8i8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.i32.32x32x8i8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_i32_32x32x8i8_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_i32_32x32x16_i8(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases:
_ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.i32.32x32x16.i8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_i32_32x32x16_i8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_i32_32x32x16_i8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.i32.32x32x16.i8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_i32_32x32x16_i8_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_i32_32x32x32_i8(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None)¶
Bases:
_ods_ir
Matrix fused multiply-add (MFMA) intrinsic. Computes D = A * B + C with matrix operands. The cbsz, abid, and blgp attributes control broadcast and block layout modes.
Example:
// MFMA with f32 inputs and 32-wide f32 accumulator.
%r0 = rocdl.mfma.f32.32x32x1f32 %a0, %b0, %c0, 0, 0, 0 : (f32, f32, vector<32xf32>) -> vector<32xf32>
// MFMA with i8 inputs and 32-wide i32 accumulator.
%r1 = rocdl.mfma.i32.32x32x4i8 %a1, %a1, %c1, 0, 0, 0 : (i32, i32, vector<32xi32>) -> vector<32xi32>
// MFMA with bf16 inputs and 32-wide f32 accumulator.
%r2 = rocdl.mfma.f32.32x32x2bf16 %a2, %a2, %c0, 0, 0, 0 : (vector<2xi16>, vector<2xi16>, vector<32xf32>) -> vector<32xf32>
- OPERATION_NAME = 'rocdl.mfma.i32.32x32x32.i8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.mfma_i32_32x32x32_i8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_i32_32x32x32_i8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.i32.32x32x32.i8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- blgp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_i32_32x32x32_i8_(res, a, b, c, cbsz, abid, blgp, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_scale_f32_16x16x128_f8f6f4(res, a, b, c, cbsz, blgp, opselA, scaleA, opselB, scaleB, *, loc=None, ip=None)¶
Bases:
_ods_ir
Scaled matrix fused multiply-add (MFMA) intrinsic with per-operand scaling. The opselA/opselB and scaleA/scaleB arguments control the scaling of input operands.
Example:
// Scaled MFMA with fp8 * fp8 inputs.
%r0 = rocdl.mfma.scale.f32.32x32x64.f8f6f4 %a, %a, %c, 0, 0, 0, %scaleA, 0, %scaleB : (vector<8xi32>, vector<8xi32>, vector<16xf32>, i32, i32) -> vector<16xf32>
// Scaled MFMA with fp8 * bf8 inputs.
%r1 = rocdl.mfma.scale.f32.32x32x64.f8f6f4 %a, %a, %c, 0, 1, 0, %scaleA, 0, %scaleB : (vector<8xi32>, vector<8xi32>, vector<16xf32>, i32, i32) -> vector<16xf32>
// Scaled MFMA with fp8 * fp6 inputs (6xi32 operand B).
%r2 = rocdl.mfma.scale.f32.32x32x64.f8f6f4 %a, %b6, %c, 0, 2, 0, %scaleA, 0, %scaleB : (vector<8xi32>, vector<6xi32>, vector<16xf32>, i32, i32) -> vector<16xf32>
- OPERATION_NAME = 'rocdl.mfma.scale.f32.16x16x128.f8f6f4'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- scaleA() _ods_ir[_ods_ir]¶
- scaleB() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- blgp() _ods_ir¶
- opselA() _ods_ir¶
- opselB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_scale_f32_16x16x128_f8f6f4Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_scale_f32_16x16x128_f8f6f4Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.scale.f32.16x16x128.f8f6f4'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- scaleA() _ods_ir[_ods_ir]¶
- scaleB() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- blgp() _ods_ir¶
- opselA() _ods_ir¶
- opselB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_scale_f32_16x16x128_f8f6f4_(res, a, b, c, cbsz, blgp, opsel_a, scale_a, opsel_b, scale_b, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_scale_f32_32x32x64_f8f6f4(res, a, b, c, cbsz, blgp, opselA, scaleA, opselB, scaleB, *, loc=None, ip=None)¶
Bases:
_ods_ir
Scaled matrix fused multiply-add (MFMA) intrinsic with per-operand scaling. The opselA/opselB and scaleA/scaleB arguments control the scaling of input operands.
Example:
// Scaled MFMA with fp8 * fp8 inputs.
%r0 = rocdl.mfma.scale.f32.32x32x64.f8f6f4 %a, %a, %c, 0, 0, 0, %scaleA, 0, %scaleB : (vector<8xi32>, vector<8xi32>, vector<16xf32>, i32, i32) -> vector<16xf32>
// Scaled MFMA with fp8 * bf8 inputs.
%r1 = rocdl.mfma.scale.f32.32x32x64.f8f6f4 %a, %a, %c, 0, 1, 0, %scaleA, 0, %scaleB : (vector<8xi32>, vector<8xi32>, vector<16xf32>, i32, i32) -> vector<16xf32>
// Scaled MFMA with fp8 * fp6 inputs (6xi32 operand B).
%r2 = rocdl.mfma.scale.f32.32x32x64.f8f6f4 %a, %b6, %c, 0, 2, 0, %scaleA, 0, %scaleB : (vector<8xi32>, vector<6xi32>, vector<16xf32>, i32, i32) -> vector<16xf32>
- OPERATION_NAME = 'rocdl.mfma.scale.f32.32x32x64.f8f6f4'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- scaleA() _ods_ir[_ods_ir]¶
- scaleB() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- blgp() _ods_ir¶
- opselA() _ods_ir¶
- opselB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.mfma_scale_f32_32x32x64_f8f6f4Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.mfma_scale_f32_32x32x64_f8f6f4Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.mfma.scale.f32.32x32x64.f8f6f4'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- scaleA() _ods_ir[_ods_ir]¶
- scaleB() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- blgp() _ods_ir¶
- opselA() _ods_ir¶
- opselB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.mfma_scale_f32_32x32x64_f8f6f4_(res, a, b, c, cbsz, blgp, opsel_a, scale_a, opsel_b, scale_b, *, loc=None, ip=None) _ods_ir¶
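Reading the three scaled-MFMA examples in the docstring above, the fourth and fifth arguments (`cbsz`, `blgp`) seem to select the element format of operands A and B, and the operand vector width shrinks from 8xi32 to 6xi32 when B is fp6. The sketch below reproduces that relationship under two labeled assumptions: format codes 0, 1, and 2 mean fp8, bf8, and fp6 (only the codes visible in those examples are included), and each lane packs 32 elements per operand. The names `FORMATS`, `ELEMENTS_PER_LANE`, `operand_i32_count`, and `describe` are hypothetical.

```python
# Assumption: format code -> (name, bits per element), limited to the codes
# that appear in the docstring examples.
FORMATS = {0: ("fp8", 8), 1: ("bf8", 8), 2: ("fp6", 6)}

# Assumption: 32 packed elements per lane, which matches the 8xi32 (fp8)
# and 6xi32 (fp6) operand types shown in the examples.
ELEMENTS_PER_LANE = 32


def operand_i32_count(code):
    """Return (format name, number of i32 registers holding the packed operand)."""
    name, bits = FORMATS[code]
    total_bits = ELEMENTS_PER_LANE * bits
    return name, (total_bits + 31) // 32  # round up to whole 32-bit registers


def describe(cbsz, blgp):
    """Summarize the A/B formats and operand vector types for a (cbsz, blgp) pair."""
    a_name, a_regs = operand_i32_count(cbsz)
    b_name, b_regs = operand_i32_count(blgp)
    return (f"{a_name} * {b_name}", f"vector<{a_regs}xi32>", f"vector<{b_regs}xi32>")


summary = describe(0, 2)
```

The scale operands themselves (`scaleA`, `scaleB`) and the `opselA`/`opselB` selectors are not modeled here.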
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x32_bf16(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 : (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 : (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 : (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 : (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.f32.16x16x32.bf16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x32_bf16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x32_bf16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.f32.16x16x32.bf16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x32_bf16_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
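The SMFMAC docstring above mentions 2:4 structured sparsity with an `index` operand carrying the metadata. To make the idea concrete, here is a pure-Python sketch: every group of four values keeps at most two nonzeros, stored as compressed values plus their positions within the group. It assumes A is the compressed operand (consistent with A being half the width of B in the examples). The function names are hypothetical, and the actual bit encoding of `index` on hardware is not modeled.

```python
def compress_2_of_4(dense_row):
    """Compress a 2:4 sparse row into (values, indices), two entries per group of 4."""
    if len(dense_row) % 4:
        raise ValueError("row length must be a multiple of 4")
    values, indices = [], []
    for start in range(0, len(dense_row), 4):
        group = dense_row[start:start + 4]
        kept = [i for i, v in enumerate(group) if v != 0]
        if len(kept) > 2:
            raise ValueError("group has more than 2 nonzeros; not 2:4 sparse")
        # Pad with unused positions so each group always stores two entries.
        for i in range(4):
            if len(kept) == 2:
                break
            if i not in kept:
                kept.append(i)
        kept.sort()
        values.extend(group[i] for i in kept)
        indices.extend(kept)
    return values, indices


def sparse_dot(values, indices, dense_col):
    """Dot product of a compressed row with a dense column."""
    total = 0
    for slot, (value, index) in enumerate(zip(values, indices)):
        group = slot // 2  # two stored entries per group of four
        total += value * dense_col[4 * group + index]
    return total


row = [0, 3, 0, -1, 2, 0, 0, 0]
col = [1, 2, 3, 4, 5, 6, 7, 8]
vals, idx = compress_2_of_4(row)
result = sparse_dot(vals, idx, col)
```

The compressed row holds half as many values as the dense one, which is the storage saving that the sparsity metadata makes possible.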
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x32_f16(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 : (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 : (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 : (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 : (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.f32.16x16x32.f16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x32_f16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x32_f16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.f32.16x16x32.f16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x32_f16_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x64_bf8_bf8(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 : (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 : (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 : (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 : (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.f32.16x16x64.bf8.bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x64_bf8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x64_bf8_bf8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.f32.16x16x64.bf8.bf8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x64_bf8_bf8_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x64_bf8_fp8(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 : (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 : (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 : (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 : (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.f32.16x16x64.bf8.fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x64_bf8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x64_bf8_fp8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.f32.16x16x64.bf8.fp8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x64_bf8_fp8_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x64_bf16(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 : (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 : (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 : (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 : (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.f32.16x16x64.bf16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x64_bf16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x64_bf16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.f32.16x16x64.bf16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x64_bf16_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x64_f16(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 : (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 : (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 : (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 : (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.f32.16x16x64.f16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x64_f16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x64_f16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.f32.16x16x64.f16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x64_f16_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x64_fp8_bf8(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 : (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 : (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 : (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 : (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.f32.16x16x64.fp8.bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x64_fp8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x64_fp8_bf8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.f32.16x16x64.fp8.bf8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x64_fp8_bf8_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x64_fp8_fp8(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 : (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 : (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 : (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 : (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.f32.16x16x64.fp8.fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x64_fp8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x64_fp8_fp8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.f32.16x16x64.fp8.fp8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x64_fp8_fp8_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x128_bf8_bf8(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 : (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 : (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 : (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 : (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.f32.16x16x128.bf8.bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x128_bf8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x128_bf8_bf8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.f32.16x16x128.bf8.bf8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x128_bf8_bf8_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x128_bf8_fp8(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 : (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 : (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 : (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 : (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.f32.16x16x128.bf8.fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x128_bf8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x128_bf8_fp8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.f32.16x16x128.bf8.fp8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x128_bf8_fp8_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
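To make the phrase "2:4 structured sparsity" concrete, the following pure-Python sketch compresses a row in which every group of four elements holds at most two nonzeros into a list of kept values plus their positions within each group. The helper name `compress_2_4` and the list-of-positions encoding are assumptions made for illustration; they are not the bit layout that the `index` operand uses on hardware.

```python
from typing import List, Tuple


def compress_2_4(row: List[float]) -> Tuple[List[float], List[int]]:
    """Compress a 2:4-sparse row into kept values and in-group positions.

    Hypothetical helper for illustration only; the real metadata encoding
    consumed by the SMFMAC `index` operand is not modelled here.
    """
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    values: List[float] = []
    positions: List[int] = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        kept = [i for i, v in enumerate(group) if v != 0]
        if len(kept) > 2:
            raise ValueError(f"group starting at {start} has more than 2 nonzeros")
        # Pad with zero-valued slots so each group stores exactly two entries.
        for i in range(4):
            if len(kept) == 2:
                break
            if i not in kept:
                kept.append(i)
        kept.sort()
        for i in kept:
            values.append(group[i])
            positions.append(i)
    return values, positions


row = [0.0, 1.5, 0.0, -2.0, 3.0, 0.0, 0.0, 0.0]
values, positions = compress_2_4(row)
print(values)     # [1.5, -2.0, 3.0, 0.0]
print(positions)  # [1, 3, 0, 1]
```

The compressed row has half as many values as the dense row, which appears consistent with the example above, where `a` is a 4-element vector and `b` an 8-element vector.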
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x128_fp8_bf8(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.f32.16x16x128.fp8.bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x128_fp8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x128_fp8_bf8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.f32.16x16x128.fp8.bf8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x128_fp8_bf8_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
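As a rough scalar model of what "fused multiply-accumulate with 2:4 structured sparsity" computes, the sketch below evaluates `c + dot(a, b)` for a single row, where `a` is given in compressed form (two kept values per group of four, plus their in-group positions) and `b` is dense. The function name `sparse_fma_2_4` and the position-list encoding are illustrative assumptions; the sketch ignores lanes, tiling, and the `cbsz`/`abid` broadcast controls entirely.

```python
from typing import Sequence


def sparse_fma_2_4(
    a_values: Sequence[float],
    a_positions: Sequence[int],
    b: Sequence[float],
    c: float,
) -> float:
    """Return c + dot(A, b) for one 2:4-compressed row A.

    Hypothetical reference model: a_positions[k] is the position (0..3) of
    a_values[k] inside its group of four dense elements.
    """
    if len(a_values) != len(a_positions) or len(a_values) % 2 != 0:
        raise ValueError("expected two kept values per group")
    if len(b) != 2 * len(a_values):
        raise ValueError("dense operand must be twice as long as the compressed one")
    acc = c
    for k, (value, pos) in enumerate(zip(a_values, a_positions)):
        group = k // 2  # two kept values per group of four dense elements
        acc += value * b[4 * group + pos]
    return acc


a_values = [1.5, -2.0, 3.0, 0.0]
a_positions = [1, 3, 0, 1]
b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
result = sparse_fma_2_4(a_values, a_positions, b, 10.0)

# Cross-check against the equivalent dense row.
dense_a = [0.0, 1.5, 0.0, -2.0, 3.0, 0.0, 0.0, 0.0]
expected = 10.0 + sum(x * y for x, y in zip(dense_a, b))
print(result, expected)  # 20.0 20.0
```

Only the kept values are multiplied, so the loop does half the multiplications a dense dot product of the same logical length would need.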
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x128_fp8_fp8(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.f32.16x16x128.fp8.fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x128_fp8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x128_fp8_fp8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.f32.16x16x128.fp8.fp8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_f32_16x16x128_fp8_fp8_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
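The entries in this listing follow a visible naming pattern: the `rocdl.` prefix is dropped from `OPERATION_NAME`, dots become underscores to give the op class name, the adaptor appends `Adaptor`, and the builder function appends a trailing underscore. The sketch below captures that pattern as observed here; `python_names` is a hypothetical helper, and the rule is an inference from this page rather than a documented guarantee for every op.

```python
from typing import Dict


def python_names(operation_name: str) -> Dict[str, str]:
    """Derive the Python identifiers used in this listing for a rocdl op.

    Hypothetical helper: encodes the pattern seen on this page only.
    """
    prefix = "rocdl."
    if not operation_name.startswith(prefix):
        raise ValueError(f"not a rocdl operation: {operation_name!r}")
    stem = operation_name[len(prefix):].replace(".", "_")
    return {
        "op_class": stem,                   # OpView subclass
        "adaptor_class": stem + "Adaptor",  # operand/attribute adaptor
        "builder": stem + "_",              # free function with trailing underscore
    }


names = python_names("rocdl.smfmac.f32.16x16x128.fp8.fp8")
print(names["op_class"])       # smfmac_f32_16x16x128_fp8_fp8
print(names["adaptor_class"])  # smfmac_f32_16x16x128_fp8_fp8Adaptor
print(names["builder"])        # smfmac_f32_16x16x128_fp8_fp8_
```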
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x16_bf16(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.f32.32x32x16.bf16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x16_bf16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x16_bf16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.f32.32x32x16.bf16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x16_bf16_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x16_f16(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.f32.32x32x16.f16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x16_f16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x16_f16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.f32.32x32x16.f16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x16_f16_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x32_bf8_bf8(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.f32.32x32x32.bf8.bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x32_bf8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x32_bf8_bf8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.f32.32x32x32.bf8.bf8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x32_bf8_bf8_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x32_bf8_fp8(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.f32.32x32x32.bf8.fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x32_bf8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x32_bf8_fp8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.f32.32x32x32.bf8.fp8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x32_bf8_fp8_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x32_bf16(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.f32.32x32x32.bf16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x32_bf16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x32_bf16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.f32.32x32x32.bf16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x32_bf16_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x32_f16(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.f32.32x32x32.f16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x32_f16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x32_f16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.f32.32x32x32.f16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x32_f16_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x32_fp8_bf8(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.f32.32x32x32.fp8.bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x32_fp8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x32_fp8_bf8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.f32.32x32x32.fp8.bf8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x32_fp8_bf8_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x32_fp8_fp8(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.f32.32x32x32.fp8.fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x32_fp8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x32_fp8_fp8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.f32.32x32x32.fp8.fp8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x32_fp8_fp8_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x64_bf8_bf8(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.f32.32x32x64.bf8.bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x64_bf8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x64_bf8_bf8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.f32.32x32x64.bf8.bf8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x64_bf8_bf8_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x64_bf8_fp8(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.f32.32x32x64.bf8.fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x64_bf8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x64_bf8_fp8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.f32.32x32x64.bf8.fp8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x64_bf8_fp8_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x64_fp8_bf8(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.f32.32x32x64.fp8.bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x64_fp8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x64_fp8_bf8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.f32.32x32x64.fp8.bf8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x64_fp8_bf8_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x64_fp8_fp8(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.f32.32x32x64.fp8.fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x64_fp8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x64_fp8_fp8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.f32.32x32x64.fp8.fp8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_f32_32x32x64_fp8_fp8_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.smfmac_i32_16x16x64_i8(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.i32.16x16x64.i8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_i32_16x16x64_i8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_i32_16x16x64_i8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.i32.16x16x64.i8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_i32_16x16x64_i8_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
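Each SMFMAC `OPERATION_NAME` packs the variant's parameters into its mnemonic. The sketch below splits a name into its accumulator/result type, tile shape, and input element types. It assumes the three numbers are read as M x N x K (the usual matrix-multiply convention) and that the first type field names the accumulator, as the "i8 inputs and i32 accumulator" comment in the example suggests. `SmfmacVariant` and `parse_smfmac_name` are hypothetical names, not part of the bindings.

```python
import re
from typing import NamedTuple, Tuple


class SmfmacVariant(NamedTuple):
    result_type: str              # accumulator/result element type, e.g. "f32"
    m: int                        # assumed: rows of the output tile
    n: int                        # assumed: columns of the output tile
    k: int                        # assumed: reduction depth
    input_types: Tuple[str, ...]  # one or two input element types


_SMFMAC_NAME = re.compile(
    r"^rocdl\.smfmac\.([a-z0-9]+)\.(\d+)x(\d+)x(\d+)\.([a-z0-9.]+)$"
)


def parse_smfmac_name(operation_name: str) -> SmfmacVariant:
    """Split an SMFMAC operation name into its components (illustrative)."""
    match = _SMFMAC_NAME.match(operation_name)
    if match is None:
        raise ValueError(f"not an SMFMAC operation name: {operation_name!r}")
    result_type, m, n, k, inputs = match.groups()
    return SmfmacVariant(result_type, int(m), int(n), int(k), tuple(inputs.split(".")))


int_variant = parse_smfmac_name("rocdl.smfmac.i32.16x16x64.i8")
mixed_variant = parse_smfmac_name("rocdl.smfmac.f32.32x32x64.bf8.fp8")
print(int_variant)
print(mixed_variant.input_types)  # ('bf8', 'fp8')
```

A parser like this can be handy for selecting a variant programmatically, for example by filtering on `k` or on the input types.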
- class mlir.dialects._rocdl_ops_gen.smfmac_i32_16x16x128_i8(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.i32.16x16x128.i8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_i32_16x16x128_i8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_i32_16x16x128_i8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.i32.16x16x128.i8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_i32_16x16x128_i8_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.smfmac_i32_32x32x32_i8(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.i32.32x32x32.i8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_i32_32x32x32_i8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_i32_32x32x32_i8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.i32.32x32x32.i8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_i32_32x32x32_i8_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.smfmac_i32_32x32x64_i8(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None)¶
Bases:
_ods_ir
Sparse matrix fused multiply-accumulate (SMFMAC) intrinsic with 2:4 structured sparsity. The index operand provides the sparsity metadata, and cbsz/abid control broadcast modes.
Example:
// SMFMAC with f16 inputs.
%r0 = rocdl.smfmac.f32.16x16x32.f16 %a0, %b0, %c0, %idx, 0, 0 :
  (vector<4xf16>, vector<8xf16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with bf16 inputs.
%r1 = rocdl.smfmac.f32.16x16x32.bf16 %a1, %b1, %c0, %idx, 0, 0 :
  (vector<4xi16>, vector<8xi16>, vector<4xf32>, i32) -> vector<4xf32>
// SMFMAC with i8 inputs and i32 accumulator.
%r2 = rocdl.smfmac.i32.16x16x64.i8 %a2, %b2, %c2, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xi32>, i32) -> vector<4xi32>
// SMFMAC with fp8 inputs.
%r3 = rocdl.smfmac.f32.16x16x64.fp8.fp8 %a2, %b2, %c0, %idx, 0, 0 :
  (vector<2xi32>, vector<4xi32>, vector<4xf32>, i32) -> vector<4xf32>
- OPERATION_NAME = 'rocdl.smfmac.i32.32x32x64.i8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.smfmac_i32_32x32x64_i8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.smfmac_i32_32x32x64_i8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.smfmac.i32.32x32x64.i8'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- cbsz() _ods_ir¶
- abid() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.smfmac_i32_32x32x64_i8_(res, a, b, c, index, cbsz, abid, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_bf16_16x16x32_bf16(res, a, b, c, index, *, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.bf16.16x16x32.bf16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.swmmac_bf16_16x16x32_bf16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_bf16_16x16x32_bf16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.bf16.16x16x32.bf16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.swmmac_bf16_16x16x32_bf16_(res, a, b, c, index, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_bf16_16x16x64_bf16(res, a, b, c, index, *, signA=None, signB=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.bf16.16x16x64.bf16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_bf16_16x16x64_bf16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_bf16_16x16x64_bf16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.bf16.16x16x64.bf16'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.swmmac_bf16_16x16x64_bf16_(res, a, b, c, index, *, sign_a=None, sign_b=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
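In the signatures listed here, the op class constructors take camelCase keyword arguments (`signA`, `reuseB`), while the trailing-underscore builder functions take snake_case ones (`sign_a`, `reuse_b`). A minimal sketch of converting between the two spellings follows; `camel_to_snake` and `to_builder_kwargs` are hypothetical helpers written for illustration and are not provided by the bindings.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a camelCase keyword (e.g. 'signA') to snake_case ('sign_a')."""
    # Insert an underscore before an uppercase letter that follows a
    # lowercase letter or digit, then lowercase the whole name.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def to_builder_kwargs(class_kwargs: dict) -> dict:
    """Rename class-constructor keywords to the builder-function spelling."""
    return {camel_to_snake(key): value for key, value in class_kwargs.items()}
```

For example, `to_builder_kwargs({"signA": True, "reuseB": False})` yields a dictionary keyed by `sign_a` and `reuse_b`; names without an uppercase letter, such as `clamp` and `opsel`, pass through unchanged.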
- class mlir.dialects._rocdl_ops_gen.swmmac_bf16f32_16x16x64_bf16(res, a, b, c, index, *, signA=None, signB=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.bf16f32.16x16x64.bf16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_bf16f32_16x16x64_bf16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_bf16f32_16x16x64_bf16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.bf16f32.16x16x64.bf16'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.swmmac_bf16f32_16x16x64_bf16_(res, a, b, c, index, *, sign_a=None, sign_b=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f16_16x16x32_f16(res, a, b, c, index, *, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f16.16x16x32.f16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f16_16x16x32_f16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f16_16x16x32_f16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f16.16x16x32.f16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.swmmac_f16_16x16x32_f16_(res, a, b, c, index, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f16_16x16x64_f16(res, a, b, c, index, *, signA=None, signB=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f16.16x16x64.f16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f16_16x16x64_f16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f16_16x16x64_f16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f16.16x16x64.f16'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.swmmac_f16_16x16x64_f16_(res, a, b, c, index, *, sign_a=None, sign_b=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f16_16x16x128_bf8_bf8(res, a, b, c, index, *, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f16.16x16x128.bf8.bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f16_16x16x128_bf8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f16_16x16x128_bf8_bf8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f16.16x16x128.bf8.bf8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.swmmac_f16_16x16x128_bf8_bf8_(res, a, b, c, index, *, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f16_16x16x128_bf8_fp8(res, a, b, c, index, *, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f16.16x16x128.bf8.fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f16_16x16x128_bf8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f16_16x16x128_bf8_fp8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f16.16x16x128.bf8.fp8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.swmmac_f16_16x16x128_bf8_fp8_(res, a, b, c, index, *, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f16_16x16x128_fp8_bf8(res, a, b, c, index, *, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f16.16x16x128.fp8.bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f16_16x16x128_fp8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f16_16x16x128_fp8_bf8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f16.16x16x128.fp8.bf8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.swmmac_f16_16x16x128_fp8_bf8_(res, a, b, c, index, *, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f16_16x16x128_fp8_fp8(res, a, b, c, index, *, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f16.16x16x128.fp8.fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f16_16x16x128_fp8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f16_16x16x128_fp8_fp8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f16.16x16x128.fp8.fp8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.swmmac_f16_16x16x128_fp8_fp8_(res, a, b, c, index, *, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x32_bf8_bf8(res, a, b, c, index, *, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f32.16x16x32.bf8.bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x32_bf8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x32_bf8_bf8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f32.16x16x32.bf8.bf8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x32_bf8_bf8_(res, a, b, c, index, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x32_bf8_fp8(res, a, b, c, index, *, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f32.16x16x32.bf8.fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x32_bf8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x32_bf8_fp8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f32.16x16x32.bf8.fp8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x32_bf8_fp8_(res, a, b, c, index, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x32_bf16(res, a, b, c, index, *, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f32.16x16x32.bf16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x32_bf16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x32_bf16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f32.16x16x32.bf16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x32_bf16_(res, a, b, c, index, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x32_f16(res, a, b, c, index, *, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f32.16x16x32.f16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x32_f16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x32_f16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f32.16x16x32.f16'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- c() _ods_ir[_ods_ir]¶
- index() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x32_f16_(res, a, b, c, index, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x32_fp8_bf8(res, a, b, c, index, *, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f32.16x16x32.fp8.bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x32_fp8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x32_fp8_bf8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f32.16x16x32.fp8.bf8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x32_fp8_bf8_(res, a, b, c, index, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x32_fp8_fp8(res, a, b, c, index, *, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f32.16x16x32.fp8.fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x32_fp8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x32_fp8_fp8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f32.16x16x32.fp8.fp8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x32_fp8_fp8_(res, a, b, c, index, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x64_bf16(res, a, b, c, index, *, signA=None, signB=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f32.16x16x64.bf16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x64_bf16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x64_bf16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f32.16x16x64.bf16'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x64_bf16_(res, a, b, c, index, *, sign_a=None, sign_b=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x64_f16(res, a, b, c, index, *, signA=None, signB=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f32.16x16x64.f16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x64_f16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x64_f16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f32.16x16x64.f16'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x64_f16_(res, a, b, c, index, *, sign_a=None, sign_b=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x128_bf8_bf8(res, a, b, c, index, *, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f32.16x16x128.bf8.bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x128_bf8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x128_bf8_bf8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f32.16x16x128.bf8.bf8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x128_bf8_bf8_(res, a, b, c, index, *, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x128_bf8_fp8(res, a, b, c, index, *, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f32.16x16x128.bf8.fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x128_bf8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x128_bf8_fp8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f32.16x16x128.bf8.fp8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x128_bf8_fp8_(res, a, b, c, index, *, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x128_fp8_bf8(res, a, b, c, index, *, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f32.16x16x128.fp8.bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x128_fp8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x128_fp8_bf8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f32.16x16x128.fp8.bf8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x128_fp8_bf8_(res, a, b, c, index, *, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x128_fp8_fp8(res, a, b, c, index, *, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f32.16x16x128.fp8.fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x128_fp8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x128_fp8_fp8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.f32.16x16x128.fp8.fp8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.swmmac_f32_16x16x128_fp8_fp8_(res, a, b, c, index, *, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_i32_16x16x32_iu4(res, a, b, c, index, *, signA=None, signB=None, clamp=None, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.i32.16x16x32.iu4'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- clamp() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_i32_16x16x32_iu4Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_i32_16x16x32_iu4Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.i32.16x16x32.iu4'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- clamp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.swmmac_i32_16x16x32_iu4_(res, a, b, c, index, *, sign_a=None, sign_b=None, clamp=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_i32_16x16x32_iu8(res, a, b, c, index, *, signA=None, signB=None, clamp=None, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.i32.16x16x32.iu8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- clamp() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_i32_16x16x32_iu8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_i32_16x16x32_iu8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.i32.16x16x32.iu8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- clamp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.swmmac_i32_16x16x32_iu8_(res, a, b, c, index, *, sign_a=None, sign_b=None, clamp=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_i32_16x16x64_iu4(res, a, b, c, index, *, signA=None, signB=None, clamp=None, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.i32.16x16x64.iu4'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- clamp() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_i32_16x16x64_iu4Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_i32_16x16x64_iu4Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.i32.16x16x64.iu4'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- clamp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.swmmac_i32_16x16x64_iu4_(res, a, b, c, index, *, sign_a=None, sign_b=None, clamp=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_i32_16x16x128_iu8(res, a, b, c, index, *, signA=None, signB=None, reuseA=None, reuseB=None, clamp=None, loc=None, ip=None)¶
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.i32.16x16x128.iu8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- clamp() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.swmmac_i32_16x16x128_iu8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.swmmac_i32_16x16x128_iu8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.swmmac.i32.16x16x128.iu8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- index() _ods_ir[_ods_ir]¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- clamp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.swmmac_i32_16x16x128_iu8_(res, a, b, c, index, *, sign_a=None, sign_b=None, reuse_a=None, reuse_b=None, clamp=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_bf16_16x16x16_bf16(res, a, b, c, *, opsel=None, loc=None, ip=None)¶
Bases:
_ods_ir
Wave Matrix Multiply-Accumulate (WMMA) with output operand selection.
Example:
// WMMA f16 with opsel control.
%r = rocdl.wmma.f16.16x16x16.f16 %a, %b, %c {opsel = false} : (vector<16xf16>, vector<16xf16>, vector<16xf16>) -> vector<16xf16>
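As a reading aid, the arithmetic a matrix multiply-accumulate performs, D = A x B + C, can be written as a short pure-Python reference. This is a sketch under stated assumptions: `matmul_accumulate` is a hypothetical helper, it works on whole matrices as nested lists, and it does not model how WMMA distributes the tile across the lanes of a wave, the effect of `opsel`, or reduced-precision rounding.

```python
from typing import List

Matrix = List[List[float]]


def matmul_accumulate(a: Matrix, b: Matrix, c: Matrix) -> Matrix:
    """Return D = A x B + C for an M x K matrix A, K x N matrix B, M x N matrix C.

    Reference arithmetic only; lane layout and rounding are NOT modelled.
    """
    m, k = len(a), len(a[0])
    n = len(b[0])
    if len(b) != k:
        raise ValueError("inner dimensions of A and B do not match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("C must be M x N")
    return [
        [c[i][j] + sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]
```

For a 16x16x16 op, A, B, and C would each be 16 x 16 in this model; the vector operands in the MLIR example hold each lane's share of those tiles.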
- OPERATION_NAME = 'rocdl.wmma.bf16.16x16x16.bf16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- opsel() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_bf16_16x16x16_bf16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_bf16_16x16x16_bf16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.wmma.bf16.16x16x16.bf16'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- opsel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_bf16_16x16x16_bf16_(res, a, b, c, *, opsel=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_bf16_16x16x32_bf16(res, a, b, c, *, signA=None, signB=None, modC=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases:
_ods_ir
Wave Matrix Multiply-Accumulate (WMMA) with sign, modC, and reuse controls.
Example:
// WMMA f32 with f16 inputs and reuse controls.
%r = rocdl.wmma.f32.16x16x32.f16 %a, %b, %c : (vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.bf16.16x16x32.bf16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_bf16_16x16x32_bf16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_bf16_16x16x32_bf16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.wmma.bf16.16x16x32.bf16'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_bf16_16x16x32_bf16_(res, a, b, c, *, sign_a=None, sign_b=None, mod_c=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_bf16f32_16x16x32_bf16(res, a, b, c, *, signA=None, signB=None, modC=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases:
_ods_ir
Wave Matrix Multiply-Accumulate (WMMA) with different C and D types.
Example:
// WMMA bf16 output from f32 accumulator with bf16 inputs.
%r = rocdl.wmma.bf16f32.16x16x32.bf16 %a, %b, %c : (vector<16xbf16>, vector<16xbf16>, vector<8xf32>) -> vector<16xbf16>
- OPERATION_NAME = 'rocdl.wmma.bf16f32.16x16x32.bf16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_bf16f32_16x16x32_bf16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_bf16f32_16x16x32_bf16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.wmma.bf16f32.16x16x32.bf16'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_bf16f32_16x16x32_bf16_(res, a, b, c, *, sign_a=None, sign_b=None, mod_c=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x16_f16(res, a, b, c, *, opsel=None, loc=None, ip=None)¶
Bases:
_ods_ir
Wave Matrix Multiply-Accumulate (WMMA) with output operand selection.
Example:
// WMMA f16 with opsel control.
%r = rocdl.wmma.f16.16x16x16.f16 %a, %b, %c {opsel = false} : (vector<16xf16>, vector<16xf16>, vector<16xf16>) -> vector<16xf16>
- OPERATION_NAME = 'rocdl.wmma.f16.16x16x16.f16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- opsel() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x16_f16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x16_f16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.wmma.f16.16x16x16.f16'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- opsel() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x16_f16_(res, a, b, c, *, opsel=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x32_f16(res, a, b, c, *, signA=None, signB=None, modC=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases:
_ods_ir
Wave Matrix Multiply-Accumulate (WMMA) with sign, modC, and reuse controls.
Example:
// WMMA f32 with f16 inputs and reuse controls.
%r = rocdl.wmma.f32.16x16x32.f16 %a, %b, %c : (vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f16.16x16x32.f16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x32_f16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x32_f16Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.wmma.f16.16x16x32.f16'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x32_f16_(res, a, b, c, *, sign_a=None, sign_b=None, mod_c=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x64_bf8_bf8(res, a, b, c, *, modC=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases:
_ods_ir
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example:
// WMMA f32 with fp8 inputs and modC/reuse controls.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c : (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f16.16x16x64.bf8_bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x64_bf8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x64_bf8_bf8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.wmma.f16.16x16x64.bf8_bf8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x64_bf8_bf8_(res, a, b, c, *, mod_c=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
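The `OPERATION_NAME` strings in this listing follow a regular pattern: dialect, op family, a type tag, a three-number shape, and one or two input element types. The input types are joined by `.` in the SWMMAC names (`bf8.fp8`) and by `_` in the WMMA names (`bf8_bf8`). The sketch below splits such a name into its parts; `parse_matrix_op_name` and `MatrixOpName` are hypothetical helpers, and reading the shape as M x N x K is an assumption based on common convention rather than something this page states.

```python
from typing import NamedTuple, Tuple


class MatrixOpName(NamedTuple):
    family: str                  # e.g. 'wmma', 'swmmac', 'smfmac'
    type_tag: str                # e.g. 'f32', 'f16', 'bf16f32'
    shape: Tuple[int, int, int]  # assumed to be (M, N, K)
    input_types: Tuple[str, ...]


def parse_matrix_op_name(op_name: str) -> MatrixOpName:
    """Split a name such as 'rocdl.wmma.f16.16x16x64.bf8_bf8' into fields."""
    parts = op_name.split(".")
    if len(parts) < 5 or parts[0] != "rocdl":
        raise ValueError(f"unexpected operation name: {op_name!r}")
    family, type_tag, shape_text = parts[1], parts[2], parts[3]
    dims = tuple(int(d) for d in shape_text.split("x"))
    if len(dims) != 3:
        raise ValueError(f"expected a three-number shape in {op_name!r}")
    inputs = []
    for part in parts[4:]:
        # WMMA names join input types with '_', SWMMAC names with '.'.
        inputs.extend(part.split("_"))
    return MatrixOpName(family, type_tag, dims, tuple(inputs))
```

Both spellings normalize to the same tuple of input types, which makes it easier to group the many variants by shape or element type.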
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x64_bf8_fp8(res, a, b, c, *, modC=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases:
_ods_ir
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example:
// WMMA f32 with fp8 inputs and modC/reuse controls.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c : (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f16.16x16x64.bf8_fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x64_bf8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x64_bf8_fp8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.wmma.f16.16x16x64.bf8_fp8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x64_bf8_fp8_(res, a, b, c, *, mod_c=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x64_fp8_bf8(res, a, b, c, *, modC=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases:
_ods_ir
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example:
// WMMA f32 with fp8 inputs and modC/reuse controls.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c : (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f16.16x16x64.fp8_bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x64_fp8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x64_fp8_bf8Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.wmma.f16.16x16x64.fp8_bf8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x64_fp8_bf8_(res, a, b, c, *, mod_c=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x64_fp8_fp8(res, a, b, c, *, modC=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example (shown for the rocdl.wmma.f32.16x16x64.fp8_fp8 variant):
// WMMA f32 with fp8 inputs and modC/reuse controls.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c : (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f16.16x16x64.fp8_fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x64_fp8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x64_fp8_fp8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.f16.16x16x64.fp8_fp8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x64_fp8_fp8_(res, a, b, c, *, mod_c=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x128_bf8_bf8(res, a, b, c, *, modC=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example (shown for the rocdl.wmma.f32.16x16x64.fp8_fp8 variant):
// WMMA f32 with fp8 inputs and modC/reuse controls.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c : (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f16.16x16x128.bf8_bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x128_bf8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x128_bf8_bf8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.f16.16x16x128.bf8_bf8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x128_bf8_bf8_(res, a, b, c, *, mod_c=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x128_bf8_fp8(res, a, b, c, *, modC=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example (shown for the rocdl.wmma.f32.16x16x64.fp8_fp8 variant):
// WMMA f32 with fp8 inputs and modC/reuse controls.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c : (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f16.16x16x128.bf8_fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x128_bf8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x128_bf8_fp8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.f16.16x16x128.bf8_fp8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x128_bf8_fp8_(res, a, b, c, *, mod_c=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x128_fp8_bf8(res, a, b, c, *, modC=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example (shown for the rocdl.wmma.f32.16x16x64.fp8_fp8 variant):
// WMMA f32 with fp8 inputs and modC/reuse controls.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c : (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f16.16x16x128.fp8_bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x128_fp8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x128_fp8_bf8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.f16.16x16x128.fp8_bf8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x128_fp8_bf8_(res, a, b, c, *, mod_c=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x128_fp8_fp8(res, a, b, c, *, modC=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example (shown for the rocdl.wmma.f32.16x16x64.fp8_fp8 variant):
// WMMA f32 with fp8 inputs and modC/reuse controls.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c : (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f16.16x16x128.fp8_fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x128_fp8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x128_fp8_fp8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.f16.16x16x128.fp8_fp8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f16_16x16x128_fp8_fp8_(res, a, b, c, *, mod_c=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
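The OPERATION_NAME strings in this listing follow a visible pattern: an optional scale or scale16 segment, a type token, an MxNxK shape, and a trailing input-format token. The sketch below splits a name into those parts. Reading the type token as the result element type and the trailing token as the A and B input formats is an assumption inferred from the names, and the helper parse_wmma_name and the WmmaName tuple are hypothetical, not part of the bindings.

```python
import re
from typing import NamedTuple, Optional, Tuple


class WmmaName(NamedTuple):
    scale: Optional[str]      # "scale", "scale16", or None
    result: str               # type token after "wmma", e.g. "f16"
    m: int
    n: int
    k: int
    inputs: Tuple[str, ...]   # input format tokens, e.g. ("fp8", "bf8")


_PATTERN = re.compile(
    r"^rocdl\.wmma\.(?:(scale16|scale)\.)?([a-z0-9]+)\.(\d+)x(\d+)x(\d+)\.([a-z0-9_]+)$"
)


def parse_wmma_name(op_name: str) -> WmmaName:
    """Split a WMMA operation name into its visible components."""
    match = _PATTERN.match(op_name)
    if match is None:
        raise ValueError(f"not a WMMA operation name: {op_name!r}")
    scale, result, m, n, k, inputs = match.groups()
    return WmmaName(scale, result, int(m), int(n), int(k), tuple(inputs.split("_")))


print(parse_wmma_name("rocdl.wmma.f16.16x16x128.fp8_bf8"))
```

A helper like this can be handy for grouping the many near-identical classes on this page by shape or input format.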
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x4_f32(res, a, b, c, *, signA=None, signB=None, modC=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) with sign, modC, and reuse controls.
Example (shown for the rocdl.wmma.f32.16x16x32.f16 variant):
// WMMA f32 with f16 inputs and reuse controls.
%r = rocdl.wmma.f32.16x16x32.f16 %a, %b, %c : (vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x4.f32'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x4_f32Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x4_f32Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x4.f32'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x4_f32_(res, a, b, c, *, sign_a=None, sign_b=None, mod_c=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x16_bf8_bf8(res, a, b, c, *, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) intrinsic.
Example (shown for the rocdl.wmma.f32.16x16x16.f16 variant):
// WMMA with f16 inputs and f32 accumulator.
%r = rocdl.wmma.f32.16x16x16.f16 %a, %b, %c : (vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x16.bf8_bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x16_bf8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x16_bf8_bf8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x16.bf8_bf8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x16_bf8_bf8_(res, a, b, c, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x16_bf8_fp8(res, a, b, c, *, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) intrinsic.
Example (shown for the rocdl.wmma.f32.16x16x16.f16 variant):
// WMMA with f16 inputs and f32 accumulator.
%r = rocdl.wmma.f32.16x16x16.f16 %a, %b, %c : (vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x16.bf8_fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x16_bf8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x16_bf8_fp8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x16.bf8_fp8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x16_bf8_fp8_(res, a, b, c, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x16_bf16(res, a, b, c, *, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) intrinsic.
Example (shown for the rocdl.wmma.f32.16x16x16.f16 variant):
// WMMA with f16 inputs and f32 accumulator.
%r = rocdl.wmma.f32.16x16x16.f16 %a, %b, %c : (vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x16.bf16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x16_bf16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x16_bf16Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x16.bf16'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x16_bf16_(res, a, b, c, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x16_f16(res, a, b, c, *, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) intrinsic.
Example:
// WMMA with f16 inputs and f32 accumulator.
%r = rocdl.wmma.f32.16x16x16.f16 %a, %b, %c : (vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x16.f16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x16_f16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x16_f16Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x16.f16'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x16_f16_(res, a, b, c, *, loc=None, ip=None) _ods_ir¶
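The docstrings describe these ops only as matrix multiply-accumulate. As a reading aid, the sketch below computes D = A x B + C on plain Python lists. It assumes the conventional meaning of multiply-accumulate and says nothing about how the hardware distributes the 16x16 tile across lanes or how it rounds f16 inputs. The function name reference_mma is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def reference_mma(a: Matrix, b: Matrix, c: Matrix) -> Matrix:
    """Return a @ b + c for an MxK `a`, KxN `b`, and MxN `c` (hypothetical helper)."""
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "a must be MxK"
    assert len(c) == m and all(len(row) == n for row in c), "c must be MxN"
    return [
        [c[i][j] + sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]


# A 16x16x16 tile: identity times ones, plus ones.
identity = [[1.0 if i == j else 0.0 for j in range(16)] for i in range(16)]
ones = [[1.0] * 16 for _ in range(16)]
d = reference_mma(identity, ones, ones)
print(d[0][0], d[15][15])
```

Such a host-side reference is mainly useful for checking the numerical results of a kernel that uses the op, not for reasoning about register layout.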
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x16_fp8_bf8(res, a, b, c, *, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) intrinsic.
Example (shown for the rocdl.wmma.f32.16x16x16.f16 variant):
// WMMA with f16 inputs and f32 accumulator.
%r = rocdl.wmma.f32.16x16x16.f16 %a, %b, %c : (vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x16.fp8_bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x16_fp8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x16_fp8_bf8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x16.fp8_bf8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x16_fp8_bf8_(res, a, b, c, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x16_fp8_fp8(res, a, b, c, *, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) intrinsic.
Example (shown for the rocdl.wmma.f32.16x16x16.f16 variant):
// WMMA with f16 inputs and f32 accumulator.
%r = rocdl.wmma.f32.16x16x16.f16 %a, %b, %c : (vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x16.fp8_fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x16_fp8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x16_fp8_fp8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x16.fp8_fp8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x16_fp8_fp8_(res, a, b, c, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x32_bf16(res, a, b, c, *, signA=None, signB=None, modC=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) with sign, modC, and reuse controls.
Example (shown for the rocdl.wmma.f32.16x16x32.f16 variant):
// WMMA f32 with f16 inputs and reuse controls.
%r = rocdl.wmma.f32.16x16x32.f16 %a, %b, %c : (vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x32.bf16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x32_bf16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x32_bf16Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x32.bf16'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x32_bf16_(res, a, b, c, *, sign_a=None, sign_b=None, mod_c=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x32_f16(res, a, b, c, *, signA=None, signB=None, modC=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) with sign, modC, and reuse controls.
Example:
// WMMA f32 with f16 inputs and reuse controls.
%r = rocdl.wmma.f32.16x16x32.f16 %a, %b, %c : (vector<16xf16>, vector<16xf16>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x32.f16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x32_f16Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x32_f16Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x32.f16'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x32_f16_(res, a, b, c, *, sign_a=None, sign_b=None, mod_c=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x64_bf8_bf8(res, a, b, c, *, modC=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example (shown for the rocdl.wmma.f32.16x16x64.fp8_fp8 variant):
// WMMA f32 with fp8 inputs and modC/reuse controls.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c : (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x64.bf8_bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x64_bf8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x64_bf8_bf8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x64.bf8_bf8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x64_bf8_bf8_(res, a, b, c, *, mod_c=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x64_bf8_fp8(res, a, b, c, *, modC=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example (shown for the rocdl.wmma.f32.16x16x64.fp8_fp8 variant):
// WMMA f32 with fp8 inputs and modC/reuse controls.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c : (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x64.bf8_fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x64_bf8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x64_bf8_fp8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x64.bf8_fp8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x64_bf8_fp8_(res, a, b, c, *, mod_c=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x64_fp8_bf8(res, a, b, c, *, modC=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example (shown for the rocdl.wmma.f32.16x16x64.fp8_fp8 variant):
// WMMA f32 with fp8 inputs and modC/reuse controls.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c : (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x64.fp8_bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x64_fp8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x64_fp8_bf8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x64.fp8_bf8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x64_fp8_bf8_(res, a, b, c, *, mod_c=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x64_fp8_fp8(res, a, b, c, *, modC=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example:
// WMMA f32 with fp8 inputs and modC/reuse controls.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c : (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x64.fp8_fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x64_fp8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x64_fp8_fp8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x64.fp8_fp8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x64_fp8_fp8_(res, a, b, c, *, mod_c=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x128_bf8_bf8(res, a, b, c, *, modC=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example (shown for the rocdl.wmma.f32.16x16x64.fp8_fp8 variant):
// WMMA f32 with fp8 inputs and modC/reuse controls.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c : (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x128.bf8_bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x128_bf8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x128_bf8_bf8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x128.bf8_bf8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x128_bf8_bf8_(res, a, b, c, *, mod_c=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x128_bf8_fp8(res, a, b, c, *, modC=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example (shown for the rocdl.wmma.f32.16x16x64.fp8_fp8 variant):
// WMMA f32 with fp8 inputs and modC/reuse controls.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c : (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x128.bf8_fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x128_bf8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x128_bf8_fp8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x128.bf8_fp8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x128_bf8_fp8_(res, a, b, c, *, mod_c=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x128_fp8_bf8(res, a, b, c, *, modC=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example (shown for the rocdl.wmma.f32.16x16x64.fp8_fp8 variant):
// WMMA f32 with fp8 inputs and modC/reuse controls.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c : (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x128.fp8_bf8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x128_fp8_bf8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x128_fp8_bf8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x128.fp8_bf8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x128_fp8_bf8_(res, a, b, c, *, mod_c=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x128_fp8_fp8(res, a, b, c, *, modC=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) with modC and reuse controls.
Example (shown for the rocdl.wmma.f32.16x16x64.fp8_fp8 variant):
// WMMA f32 with fp8 inputs and modC/reuse controls.
%r = rocdl.wmma.f32.16x16x64.fp8_fp8 %a, %b, %c : (vector<16xi32>, vector<16xi32>, vector<8xf32>) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x128.fp8_fp8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x128_fp8_fp8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x128_fp8_fp8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.f32.16x16x128.fp8_fp8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- modC() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_f32_16x16x128_fp8_fp8_(res, a, b, c, *, mod_c=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_i32_16x16x16_iu4(res, a, b, c, *, signA=None, signB=None, clamp=None, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) for integer types with sign and clamp control.
Example (shown for the rocdl.wmma.i32.16x16x16.iu8 variant):
// WMMA i32 with unsigned i8 inputs.
%r = rocdl.wmma.i32.16x16x16.iu8 %a, %b, %c {signA = false, signB = false, clamp = false} : (vector<4xi32>, vector<4xi32>, vector<8xi32>) -> vector<8xi32>
- OPERATION_NAME = 'rocdl.wmma.i32.16x16x16.iu4'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- clamp() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_i32_16x16x16_iu4Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_i32_16x16x16_iu4Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.i32.16x16x16.iu4'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- clamp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_i32_16x16x16_iu4_(res, a, b, c, *, sign_a=None, sign_b=None, clamp=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_i32_16x16x16_iu8(res, a, b, c, *, signA=None, signB=None, clamp=None, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) for integer types with sign and clamp control.
Example:
// WMMA i32 with unsigned i8 inputs.
%r = rocdl.wmma.i32.16x16x16.iu8 %a, %b, %c {signA = false, signB = false, clamp = false} : (vector<4xi32>, vector<4xi32>, vector<8xi32>) -> vector<8xi32>
- OPERATION_NAME = 'rocdl.wmma.i32.16x16x16.iu8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- clamp() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_i32_16x16x16_iu8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_i32_16x16x16_iu8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.i32.16x16x16.iu8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- clamp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_i32_16x16x16_iu8_(res, a, b, c, *, sign_a=None, sign_b=None, clamp=None, loc=None, ip=None) _ods_ir¶
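The iu8 example passes its 8-bit inputs as vector<4xi32>, which suggests that four 8-bit values share one 32-bit word and that signA/signB select how those bytes are interpreted. The sketch below illustrates that idea only. The little-endian byte order within a word and the helper names pack_i8x4 and unpack_i8x4 are assumptions, not taken from the bindings.

```python
import struct
from typing import List


def pack_i8x4(values: List[int]) -> int:
    """Pack four 8-bit values into one 32-bit word (assumed little-endian)."""
    if len(values) != 4:
        raise ValueError("expected exactly four values")
    return int.from_bytes(bytes(v & 0xFF for v in values), "little")


def unpack_i8x4(word: int, signed: bool) -> List[int]:
    """Unpack a 32-bit word into four 8-bit values, read as signed or unsigned."""
    fmt = "<4b" if signed else "<4B"
    return list(struct.unpack(fmt, (word & 0xFFFFFFFF).to_bytes(4, "little")))


word = pack_i8x4([1, 2, 3, -1])
print(hex(word), unpack_i8x4(word, signed=False), unpack_i8x4(word, signed=True))
```

The same bit pattern yields 255 or -1 for the last byte depending on the signedness flag, which is the kind of distinction a sign attribute would have to resolve.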
- class mlir.dialects._rocdl_ops_gen.wmma_i32_16x16x32_iu4(res, a, b, c, *, signA=None, signB=None, clamp=None, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) for integer types with sign and clamp control.
Example (shown for the rocdl.wmma.i32.16x16x16.iu8 variant):
// WMMA i32 with unsigned i8 inputs.
%r = rocdl.wmma.i32.16x16x16.iu8 %a, %b, %c {signA = false, signB = false, clamp = false} : (vector<4xi32>, vector<4xi32>, vector<8xi32>) -> vector<8xi32>
- OPERATION_NAME = 'rocdl.wmma.i32.16x16x32.iu4'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- clamp() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_i32_16x16x32_iu4Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_i32_16x16x32_iu4Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.i32.16x16x32.iu4'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- clamp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_i32_16x16x32_iu4_(res, a, b, c, *, sign_a=None, sign_b=None, clamp=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_i32_16x16x64_iu8(res, a, b, c, *, signA=None, signB=None, reuseA=None, reuseB=None, clamp=None, loc=None, ip=None)¶
Bases: _ods_ir
Wave Matrix Multiply-Accumulate (WMMA) for integer types with sign, reuse, and clamp controls.
Example:
// WMMA i32 with unsigned i8 inputs and reuse controls.
%r = rocdl.wmma.i32.16x16x64.iu8 %a, %b, %c {signA = false, signB = false, reuseA = false, reuseB = false, clamp = false} : (vector<8xi32>, vector<8xi32>, vector<8xi32>) -> vector<8xi32>
- OPERATION_NAME = 'rocdl.wmma.i32.16x16x64.iu8'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- clamp() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_i32_16x16x64_iu8Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_i32_16x16x64_iu8Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.i32.16x16x64.iu8'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- signA() _ods_ir¶
- signB() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- clamp() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_i32_16x16x64_iu8_(res, a, b, c, *, sign_a=None, sign_b=None, reuse_a=None, reuse_b=None, clamp=None, loc=None, ip=None) _ods_ir¶
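The integer variants expose a clamp flag that the docstring does not explain. One plausible reading, assumed here rather than confirmed by this listing, is that clamp saturates the 32-bit accumulator instead of letting it wrap. The hypothetical helper dot_accumulate_i32 contrasts the two behaviours for a single output element.

```python
from typing import Sequence

I32_MIN, I32_MAX = -(2**31), 2**31 - 1


def dot_accumulate_i32(a: Sequence[int], b: Sequence[int], c: int, clamp: bool) -> int:
    """Return c + sum(a[i] * b[i]) as a signed 32-bit value.

    With clamp=True the result saturates at the i32 limits (assumed semantics);
    with clamp=False it wraps modulo 2**32.
    """
    total = c + sum(x * y for x, y in zip(a, b))
    if clamp:
        return max(I32_MIN, min(I32_MAX, total))
    wrapped = total & 0xFFFFFFFF
    return wrapped - 2**32 if wrapped > I32_MAX else wrapped


a = b = [127] * 64  # K = 64 signed 8-bit values
print(dot_accumulate_i32(a, b, I32_MAX, clamp=True))
print(dot_accumulate_i32(a, b, I32_MAX, clamp=False))
```

The two calls differ only when the accumulated sum leaves the i32 range; for in-range sums both paths return the same value.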
- class mlir.dialects._rocdl_ops_gen.wmma_scale16_f32_16x16x128_f8f6f4(res, a, b, c, scaleA, scaleB, *, fmtA=None, fmtB=None, modC=None, scaleAType=None, fmtScaleA=None, scaleBType=None, fmtScaleB=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases: _ods_ir
Scaled Wave Matrix Multiply-Accumulate (WMMA) with per-operand scaling.
Example (shown for the rocdl.wmma.scale.f32.16x16x128.f8f6f4 variant):
// Scaled WMMA with f8f6f4 format inputs.
%r = rocdl.wmma.scale.f32.16x16x128.f8f6f4 %a, %b, %c, %scaleA, %scaleB : (vector<16xi32>, vector<16xi32>, vector<8xf32>, i32, i32) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.scale16.f32.16x16x128.f8f6f4'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- scaleA() _ods_ir[_ods_ir]¶
- scaleB() _ods_ir[_ods_ir]¶
- fmtA() _ods_ir¶
- fmtB() _ods_ir¶
- modC() _ods_ir¶
- scaleAType() _ods_ir¶
- fmtScaleA() _ods_ir¶
- scaleBType() _ods_ir¶
- fmtScaleB() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_scale16_f32_16x16x128_f8f6f4Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_scale16_f32_16x16x128_f8f6f4Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.scale16.f32.16x16x128.f8f6f4'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- scaleA() _ods_ir[_ods_ir]¶
- scaleB() _ods_ir[_ods_ir]¶
- fmtA() _ods_ir¶
- fmtB() _ods_ir¶
- modC() _ods_ir¶
- scaleAType() _ods_ir¶
- fmtScaleA() _ods_ir¶
- scaleBType() _ods_ir¶
- fmtScaleB() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_scale16_f32_16x16x128_f8f6f4_(res, a, b, c, scale_a, scale_b, *, fmt_a=None, fmt_b=None, mod_c=None, scale_a_type=None, fmt_scale_a=None, scale_b_type=None, fmt_scale_b=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_scale16_f32_32x16x128_f4(res, a, b, c, scaleA, scaleB, *, modC=None, scaleAType=None, fmtScaleA=None, scaleBType=None, fmtScaleB=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases: _ods_ir
Scaled Wave Matrix Multiply-Accumulate (WMMA) for F4 format inputs.
Example (shown for the rocdl.wmma.scale.f32.16x16x128.f4 variant):
// Scaled WMMA with f4 format inputs.
%r = rocdl.wmma.scale.f32.16x16x128.f4 %a, %b, %c, %scaleA, %scaleB : (vector<8xi32>, vector<8xi32>, vector<8xf32>, i32, i32) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.scale16.f32.32x16x128.f4'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- scaleA() _ods_ir[_ods_ir]¶
- scaleB() _ods_ir[_ods_ir]¶
- modC() _ods_ir¶
- scaleAType() _ods_ir¶
- fmtScaleA() _ods_ir¶
- scaleBType() _ods_ir¶
- fmtScaleB() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_scale16_f32_32x16x128_f4Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_scale16_f32_32x16x128_f4Adaptor(operands: list, opview: OpView)
Bases: _ods_ir
- OPERATION_NAME = 'rocdl.wmma.scale16.f32.32x16x128.f4'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- scaleA() _ods_ir[_ods_ir]¶
- scaleB() _ods_ir[_ods_ir]¶
- modC() _ods_ir¶
- scaleAType() _ods_ir¶
- fmtScaleA() _ods_ir¶
- scaleBType() _ods_ir¶
- fmtScaleB() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_scale16_f32_32x16x128_f4_(res, a, b, c, scale_a, scale_b, *, mod_c=None, scale_a_type=None, fmt_scale_a=None, scale_b_type=None, fmt_scale_b=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_scale_f32_16x16x128_f8f6f4(res, a, b, c, scaleA, scaleB, *, fmtA=None, fmtB=None, modC=None, scaleAType=None, fmtScaleA=None, scaleBType=None, fmtScaleB=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases:
_ods_ir
Scaled Wave Matrix Multiply-Accumulate (WMMA) with per-operand scaling.
Example:
// Scaled WMMA with f8f6f4 format inputs.
%r = rocdl.wmma.scale.f32.16x16x128.f8f6f4 %a, %b, %c, %scaleA, %scaleB : (vector<16xi32>, vector<16xi32>, vector<8xf32>, i32, i32) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.scale.f32.16x16x128.f8f6f4'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- scaleA() _ods_ir[_ods_ir]¶
- scaleB() _ods_ir[_ods_ir]¶
- fmtA() _ods_ir¶
- fmtB() _ods_ir¶
- modC() _ods_ir¶
- scaleAType() _ods_ir¶
- fmtScaleA() _ods_ir¶
- scaleBType() _ods_ir¶
- fmtScaleB() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_scale_f32_16x16x128_f8f6f4Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_scale_f32_16x16x128_f8f6f4Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.wmma.scale.f32.16x16x128.f8f6f4'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- scaleA() _ods_ir[_ods_ir]¶
- scaleB() _ods_ir[_ods_ir]¶
- fmtA() _ods_ir¶
- fmtB() _ods_ir¶
- modC() _ods_ir¶
- scaleAType() _ods_ir¶
- fmtScaleA() _ods_ir¶
- scaleBType() _ods_ir¶
- fmtScaleB() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_scale_f32_16x16x128_f8f6f4_(res, a, b, c, scale_a, scale_b, *, fmt_a=None, fmt_b=None, mod_c=None, scale_a_type=None, fmt_scale_a=None, scale_b_type=None, fmt_scale_b=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
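The operand types shown in the examples for the 16x16x128 scaled WMMA ops (vector<16xi32> for 8-bit inputs, vector<8xi32> for f4 inputs, vector<8xf32> for the accumulator) can be sanity-checked with simple arithmetic. The sketch below is not part of the bindings. It assumes a 32-lane wave in which every lane holds an equal share of the tile, packed into 32-bit registers; the helper name packed_i32_per_lane is hypothetical.

```python
def packed_i32_per_lane(rows, cols, bits_per_element, lanes=32):
    """Number of 32-bit registers each lane needs for a rows x cols tile.

    Assumption (not stated in this reference): the tile is split evenly
    across `lanes` lanes and packed densely into 32-bit words.
    """
    total_bits = rows * cols * bits_per_element
    bits_per_lane, remainder = divmod(total_bits, lanes)
    if remainder or bits_per_lane % 32:
        raise ValueError("tile does not pack evenly into 32-bit registers")
    return bits_per_lane // 32


# A operand of a 16x16x128 op: 16 rows by 128 (K) columns.
a_f8 = packed_i32_per_lane(16, 128, 8)   # 8-bit inputs
a_f4 = packed_i32_per_lane(16, 128, 4)   # 4-bit inputs
# Accumulator: 16 x 16 tile of f32 values.
c_f32 = packed_i32_per_lane(16, 16, 32)

print(a_f8, a_f4, c_f32)  # 16 8 8
```

Under these assumptions the results agree with the vector widths in the f8f6f4 and f4 examples above.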
- class mlir.dialects._rocdl_ops_gen.wmma_scale_f32_32x16x128_f4(res, a, b, c, scaleA, scaleB, *, modC=None, scaleAType=None, fmtScaleA=None, scaleBType=None, fmtScaleB=None, reuseA=None, reuseB=None, loc=None, ip=None)¶
Bases:
_ods_ir
Scaled Wave Matrix Multiply-Accumulate (WMMA) for F4 format inputs.
Example:
// Scaled WMMA with f4 format inputs.
%r = rocdl.wmma.scale.f32.16x16x128.f4 %a, %b, %c, %scaleA, %scaleB : (vector<8xi32>, vector<8xi32>, vector<8xf32>, i32, i32) -> vector<8xf32>
- OPERATION_NAME = 'rocdl.wmma.scale.f32.32x16x128.f4'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- scaleA() _ods_ir[_ods_ir]¶
- scaleB() _ods_ir[_ods_ir]¶
- modC() _ods_ir¶
- scaleAType() _ods_ir¶
- fmtScaleA() _ods_ir¶
- scaleBType() _ods_ir¶
- fmtScaleB() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._rocdl_ops_gen.wmma_scale_f32_32x16x128_f4Adaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._rocdl_ops_gen.wmma_scale_f32_32x16x128_f4Adaptor(operands: list, opview: OpView)
Bases:
_ods_ir
- OPERATION_NAME = 'rocdl.wmma.scale.f32.32x16x128.f4'¶
- a() _ods_ir¶
- b() _ods_ir¶
- c() _ods_ir¶
- scaleA() _ods_ir[_ods_ir]¶
- scaleB() _ods_ir[_ods_ir]¶
- modC() _ods_ir¶
- scaleAType() _ods_ir¶
- fmtScaleA() _ods_ir¶
- scaleBType() _ods_ir¶
- fmtScaleB() _ods_ir¶
- reuseA() _ods_ir¶
- reuseB() _ods_ir¶
- mlir.dialects._rocdl_ops_gen.wmma_scale_f32_32x16x128_f4_(res, a, b, c, scale_a, scale_b, *, mod_c=None, scale_a_type=None, fmt_scale_a=None, scale_b_type=None, fmt_scale_b=None, reuse_a=None, reuse_b=None, loc=None, ip=None) _ods_ir¶
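Throughout this section, each op class takes camelCase keyword arguments (scaleA, fmtScaleA, reuseB), while the matching module-level function with a trailing underscore takes snake_case ones (scale_a, fmt_scale_a, reuse_b). The pure-Python sketch below only illustrates that naming correspondence. It does not import mlir, and the helpers snake_to_camel and to_class_kwargs are hypothetical names, not part of the generated module.

```python
def snake_to_camel(name):
    """Convert a snake_case name to camelCase, e.g. 'scale_a_type' -> 'scaleAType'."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def to_class_kwargs(function_kwargs):
    """Rename snake_case keyword arguments to the camelCase spelling.

    'loc' and 'ip' contain no underscores, so they pass through unchanged.
    """
    return {snake_to_camel(key): value for key, value in function_kwargs.items()}


# Keyword names taken from the wmma_scale_f32_32x16x128_f4_ signature above.
function_kwargs = {
    "mod_c": None,
    "scale_a_type": None,
    "fmt_scale_a": None,
    "scale_b_type": None,
    "fmt_scale_b": None,
    "reuse_a": None,
    "reuse_b": None,
    "loc": None,
    "ip": None,
}
class_kwargs = to_class_kwargs(function_kwargs)
print(sorted(class_kwargs))
```

The resulting keys match the keyword parameters listed in the wmma_scale_f32_32x16x128_f4 class signature.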