mlir.dialects._x86_ops_gen¶
Module Contents¶
- mlir.dialects._x86_ops_gen._ods_ir¶
- class mlir.dialects._x86_ops_gen._Dialect(descriptor: object)¶
Bases: _ods_ir

- DIALECT_NAMESPACE = 'x86'¶
- class mlir.dialects._x86_ops_gen.AVX10DotInt8Op(w, a, b, *, results=None, loc=None, ip=None)¶
Bases: _ods_ir

The dot op is an AVX10-Int8 specific op that can lower to the proper LLVM AVX10-INT8 operation llvm.vpdpbssd.512.

Multiply groups of 4 adjacent pairs of signed 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in w, and store the packed 32-bit results in dst.

Example:
%dst = x86.avx10.dot.i8 %w, %a, %b : vector<64xi8> -> vector<16xi32>
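The lane-wise semantics can be illustrated with a small pure-Python model (a sketch of the behavior described above, not the MLIR binding itself; the function name is illustrative and 32-bit wraparound is omitted):

```python
def dot_i8(w, a, b):
    """Model of the 4-way i8 dot product: for each 32-bit lane i,
    multiply 4 adjacent signed 8-bit pairs from a and b, sum the four
    intermediate products, and accumulate into the i32 accumulator w[i]."""
    assert len(a) == len(b) == 4 * len(w)
    dst = []
    for i, acc in enumerate(w):
        s = sum(a[4 * i + j] * b[4 * i + j] for j in range(4))
        dst.append(acc + s)
    return dst

# 8 i8 elements -> 2 i32 lanes
print(dot_i8([10, 20], [1, 2, 3, 4, -1, -2, -3, -4], [1, 1, 1, 1, 2, 2, 2, 2]))
# -> [20, 0]
```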
- OPERATION_NAME = 'x86.avx10.dot.i8'¶
- _ODS_REGIONS = (0, True)¶
- w() _ods_ir[_ods_ir]¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- dst() _ods_ir[_ods_ir]¶
- class mlir.dialects._x86_ops_gen.AVX10DotInt8OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._x86_ops_gen.AVX10DotInt8OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir

- OPERATION_NAME = 'x86.avx10.dot.i8'¶
- w() _ods_ir[_ods_ir]¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- mlir.dialects._x86_ops_gen.avx10_dot_i8(w, a, b, *, results=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._x86_ops_gen.BcstToPackedF32Op(dst, a, *, loc=None, ip=None)¶
Bases: _ods_ir

From the Intel Intrinsics Guide:¶

Convert the scalar BF16 or F16 (16-bit) floating-point element stored in memory at location __A to a single-precision (32-bit) floating-point element, broadcast it to packed single-precision (32-bit) floating-point elements, and store the results in dst.

Example:
%dst = x86.avx.bcst_to_f32.packed %a : memref<1xbf16> -> vector<8xf32>
%dst = x86.avx.bcst_to_f32.packed %a : memref<1xf16> -> vector<8xf32>
- OPERATION_NAME = 'x86.avx.bcst_to_f32.packed'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- dst() _ods_ir[_ods_ir]¶
- class mlir.dialects._x86_ops_gen.BcstToPackedF32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._x86_ops_gen.BcstToPackedF32OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir

- OPERATION_NAME = 'x86.avx.bcst_to_f32.packed'¶
- a() _ods_ir[_ods_ir]¶
- mlir.dialects._x86_ops_gen.avx_bcst_to_f32_packed(dst, a, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._x86_ops_gen.CvtPackedEvenIndexedToF32Op(dst, a, *, loc=None, ip=None)¶
Bases: _ods_ir

From the Intel Intrinsics Guide:¶

Convert packed BF16 or F16 (16-bit) floating-point even-indexed elements stored in memory starting at location __A to packed single-precision (32-bit) floating-point elements, and store the results in dst.

Example:
%dst = x86.avx.cvt.packed.even.indexed_to_f32 %a : memref<16xbf16> -> vector<8xf32>
%dst = x86.avx.cvt.packed.even.indexed_to_f32 %a : memref<16xf16> -> vector<8xf32>
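For the BF16 case, the widening is a pure bit operation, since BF16 is the upper half of an IEEE-754 binary32. A minimal sketch of the even-indexed selection and widening (function names are illustrative, not part of the generated API):

```python
import struct

def bf16_bits_to_f32(h):
    """Widen one BF16 bit pattern (as an int) to f32: BF16 is the
    upper 16 bits of a binary32, so widening is a 16-bit left shift."""
    return struct.unpack("<f", struct.pack("<I", h << 16))[0]

def cvt_even_indexed_to_f32(mem):
    """Take the even-indexed BF16 elements of a buffer and widen them."""
    return [bf16_bits_to_f32(h) for h in mem[0::2]]

# 0x3F80 is 1.0 in BF16, 0x4000 is 2.0
print(cvt_even_indexed_to_f32([0x3F80, 0x0000, 0x4000, 0x0000]))
# -> [1.0, 2.0]
```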
- OPERATION_NAME = 'x86.avx.cvt.packed.even.indexed_to_f32'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- dst() _ods_ir[_ods_ir]¶
- class mlir.dialects._x86_ops_gen.CvtPackedEvenIndexedToF32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._x86_ops_gen.CvtPackedEvenIndexedToF32OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir

- OPERATION_NAME = 'x86.avx.cvt.packed.even.indexed_to_f32'¶
- a() _ods_ir[_ods_ir]¶
- mlir.dialects._x86_ops_gen.avx_cvt_packed_even_indexed_to_f32(dst, a, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._x86_ops_gen.CvtPackedF32ToBF16Op(dst, a, *, loc=None, ip=None)¶
Bases: _ods_ir

The convert_f32_to_bf16 op is an AVX512-BF16 specific op that can lower to the proper LLVM AVX512BF16 operation llvm.cvtneps2bf16, depending on the width of the MLIR vectors it is applied to.

From the Intel Intrinsics Guide:¶

Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst.

Example:
%dst = x86.avx512.cvt.packed.f32_to_bf16 %a : vector<8xf32> -> vector<8xbf16>
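The scalar conversion the hardware performs is round-to-nearest-even on the low 16 bits of the f32 bit pattern. A sketch of that narrowing step (NaN handling omitted; the function name is illustrative):

```python
import struct

def f32_to_bf16_bits(x):
    """Convert one f32 to a BF16 bit pattern using round-to-nearest-even
    (the rounding mode of the hardware conversion; NaNs not handled)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)  # round to nearest, ties to even
    return (bits >> 16) & 0xFFFF

print([hex(f32_to_bf16_bits(v)) for v in (1.0, 2.0, -3.0)])
# -> ['0x3f80', '0x4000', '0xc040']
```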
- OPERATION_NAME = 'x86.avx512.cvt.packed.f32_to_bf16'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- dst() _ods_ir[_ods_ir]¶
- class mlir.dialects._x86_ops_gen.CvtPackedF32ToBF16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._x86_ops_gen.CvtPackedF32ToBF16OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir

- OPERATION_NAME = 'x86.avx512.cvt.packed.f32_to_bf16'¶
- a() _ods_ir[_ods_ir]¶
- mlir.dialects._x86_ops_gen.avx512_cvt_packed_f32_to_bf16(dst, a, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._x86_ops_gen.CvtPackedOddIndexedToF32Op(dst, a, *, loc=None, ip=None)¶
Bases: _ods_ir

From the Intel Intrinsics Guide:¶

Convert packed BF16 or F16 (16-bit) floating-point odd-indexed elements stored in memory starting at location __A to packed single-precision (32-bit) floating-point elements, and store the results in dst.

Example:
%dst = x86.avx.cvt.packed.odd.indexed_to_f32 %a : memref<16xbf16> -> vector<8xf32>
%dst = x86.avx.cvt.packed.odd.indexed_to_f32 %a : memref<16xf16> -> vector<8xf32>
- OPERATION_NAME = 'x86.avx.cvt.packed.odd.indexed_to_f32'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- dst() _ods_ir[_ods_ir]¶
- class mlir.dialects._x86_ops_gen.CvtPackedOddIndexedToF32OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._x86_ops_gen.CvtPackedOddIndexedToF32OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir

- OPERATION_NAME = 'x86.avx.cvt.packed.odd.indexed_to_f32'¶
- a() _ods_ir[_ods_ir]¶
- mlir.dialects._x86_ops_gen.avx_cvt_packed_odd_indexed_to_f32(dst, a, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._x86_ops_gen.DotBF16Op(src, a, b, *, results=None, loc=None, ip=None)¶
Bases: _ods_ir

The dot op is an AVX512-BF16 specific op that can lower to the proper LLVM AVX512BF16 operation llvm.dpbf16ps, depending on the width of the MLIR vectors it is applied to.

From the Intel Intrinsics Guide:¶

Compute the dot product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst.

Example:
%dst = x86.avx512.dot %src, %a, %b : vector<32xbf16> -> vector<16xf32>
- OPERATION_NAME = 'x86.avx512.dot'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- dst() _ods_ir[_ods_ir]¶
- class mlir.dialects._x86_ops_gen.DotBF16OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._x86_ops_gen.DotBF16OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir

- OPERATION_NAME = 'x86.avx512.dot'¶
- src() _ods_ir[_ods_ir]¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- mlir.dialects._x86_ops_gen.avx512_dot(src, a, b, *, results=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._x86_ops_gen.DotInt8Op(w, a, b, *, results=None, loc=None, ip=None)¶
Bases: _ods_ir

The dot op is an AVX2-Int8 specific op that can lower to the proper LLVM AVX2-INT8 operation llvm.vpdpbssd, depending on the width of the MLIR vectors it is applied to.

From the Intel Intrinsics Guide:¶

Multiply groups of 4 adjacent pairs of signed 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in w, and store the packed 32-bit results in dst.

Example:
%dst = x86.avx.dot.i8 %w, %a, %b : vector<32xi8> -> vector<8xi32>
- OPERATION_NAME = 'x86.avx.dot.i8'¶
- _ODS_REGIONS = (0, True)¶
- w() _ods_ir[_ods_ir]¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- dst() _ods_ir[_ods_ir]¶
- class mlir.dialects._x86_ops_gen.DotInt8OpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._x86_ops_gen.DotInt8OpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir

- OPERATION_NAME = 'x86.avx.dot.i8'¶
- w() _ods_ir[_ods_ir]¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- mlir.dialects._x86_ops_gen.avx_dot_i8(w, a, b, *, results=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._x86_ops_gen.DotOp(a, b, *, results=None, loc=None, ip=None)¶
Bases: _ods_ir

Computes the 4-way dot products of the lower and higher parts of the source vectors and broadcasts the two results to the lower and higher elements of the destination vector, respectively. Adding one element of the lower part to one element of the higher part in the destination vector yields the full dot product of the two source vectors.

Example:
%0 = x86.avx.intr.dot %a, %b : vector<8xf32>
%1 = vector.extract %0[%i0] : f32 from vector<8xf32>
%2 = vector.extract %0[%i4] : f32 from vector<8xf32>
%d = arith.addf %1, %2 : f32
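The half-vector semantics can be sketched in plain Python (a model of the description above, assuming 8-lane f32 vectors; the function name is illustrative):

```python
def avx_dot(a, b):
    """Model of the 4-way dot products: dot the lower 4 lanes and the
    upper 4 lanes separately, broadcasting each partial result to its half."""
    assert len(a) == len(b) == 8
    lo = sum(x * y for x, y in zip(a[:4], b[:4]))
    hi = sum(x * y for x, y in zip(a[4:], b[4:]))
    return [lo] * 4 + [hi] * 4

r = avx_dot([1, 2, 3, 4, 5, 6, 7, 8], [1, 1, 1, 1, 1, 1, 1, 1])
print(r[0] + r[4])  # full dot product, as in the example above
# -> 36
```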
- OPERATION_NAME = 'x86.avx.intr.dot'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- res() _ods_ir[_ods_ir]¶
- class mlir.dialects._x86_ops_gen.DotOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._x86_ops_gen.DotOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir

- OPERATION_NAME = 'x86.avx.intr.dot'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- mlir.dialects._x86_ops_gen.avx_intr_dot(a, b, *, results=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._x86_ops_gen.MaskCompressOp(k, a, *, src=None, constant_src=None, results=None, loc=None, ip=None)¶
Bases: _ods_ir

The mask.compress op is an AVX512 specific op that can lower to the llvm.mask.compress intrinsic. Instead of src, a constant vector attribute constant_src may be specified. If neither src nor constant_src is specified, the remaining elements in the result vector are set to zero.

From the Intel Intrinsics Guide:¶

Contiguously store the active integer/floating-point elements in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.

- OPERATION_NAME = 'x86.avx512.mask.compress'¶
- _ODS_REGIONS = (0, True)¶
- k() _ods_ir[_ods_ir]¶
- a() _ods_ir[_ods_ir]¶
- src() _ods_ir[_ods_ir] | None¶
- constant_src() _ods_ir | None¶
- dst() _ods_ir[_ods_ir]¶
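The compress semantics described above can be sketched in plain Python (a model of the behavior, not the MLIR binding; pass a zero vector for src to model the no-source case):

```python
def mask_compress(k, a, src):
    """Model of mask.compress: contiguously pack the elements of a whose
    mask bit is set, then pass through the trailing elements of src
    (or zeros when no source vector is given)."""
    packed = [x for bit, x in zip(k, a) if bit]
    return packed + src[len(packed):]

print(mask_compress([1, 0, 1, 0], [10, 20, 30, 40], [0, 0, 0, 0]))
# -> [10, 30, 0, 0]
```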
- class mlir.dialects._x86_ops_gen.MaskCompressOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._x86_ops_gen.MaskCompressOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir

- OPERATION_NAME = 'x86.avx512.mask.compress'¶
- k() _ods_ir[_ods_ir]¶
- a() _ods_ir[_ods_ir]¶
- src() _ods_ir[_ods_ir] | None¶
- constant_src() _ods_ir | None¶
- mlir.dialects._x86_ops_gen.avx512_mask_compress(k, a, *, src=None, constant_src=None, results=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._x86_ops_gen.MaskRndScaleOp(src, k, a, imm, rounding, *, results=None, loc=None, ip=None)¶
Bases: _ods_ir

The mask.rndscale op is an AVX512 specific op that can lower to the proper LLVM AVX512 operation llvm.mask.rndscale.ps.512 or llvm.mask.rndscale.pd.512, depending on the type of vectors it is applied to.

From the Intel Intrinsics Guide:¶

Round packed floating-point elements in a to the number of fraction bits specified by imm, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).

- OPERATION_NAME = 'x86.avx512.mask.rndscale'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- k() _ods_ir[_ods_ir]¶
- a() _ods_ir[_ods_ir]¶
- imm() _ods_ir¶
- rounding() _ods_ir[_ods_ir]¶
- dst() _ods_ir[_ods_ir]¶
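A simplified model of the round-to-fraction-bits semantics with writemask (round-to-nearest only; the real op also encodes a rounding mode in the immediate, and the function name is illustrative):

```python
def mask_rndscale(src, k, a, imm):
    """Model of rndscale with writemask: round each element of a to
    `imm` binary fraction bits; lanes whose mask bit is 0 copy src."""
    scale = 1 << imm  # 2**imm, so granularity is 2**-imm
    rounded = [round(x * scale) / scale for x in a]
    return [r if bit else s for s, bit, r in zip(src, k, rounded)]

print(mask_rndscale([9.0, 9.0], [1, 0], [1.30, 2.75], 1))
# -> [1.5, 9.0]
```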
- class mlir.dialects._x86_ops_gen.MaskRndScaleOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._x86_ops_gen.MaskRndScaleOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir

- OPERATION_NAME = 'x86.avx512.mask.rndscale'¶
- src() _ods_ir[_ods_ir]¶
- k() _ods_ir[_ods_ir]¶
- a() _ods_ir[_ods_ir]¶
- imm() _ods_ir¶
- rounding() _ods_ir[_ods_ir]¶
- mlir.dialects._x86_ops_gen.avx512_mask_rndscale(src, k, a, imm, rounding, *, results=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._x86_ops_gen.MaskScaleFOp(src, a, b, k, rounding, *, results=None, loc=None, ip=None)¶
Bases: _ods_ir

The mask.scalef op is an AVX512 specific op that can lower to the proper LLVM AVX512 operation llvm.mask.scalef.ps.512 or llvm.mask.scalef.pd.512, depending on the type of MLIR vectors it is applied to.

From the Intel Intrinsics Guide:¶

Scale the packed floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).

- OPERATION_NAME = 'x86.avx512.mask.scalef'¶
- _ODS_REGIONS = (0, True)¶
- src() _ods_ir[_ods_ir]¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- k() _ods_ir¶
- rounding() _ods_ir[_ods_ir]¶
- dst() _ods_ir[_ods_ir]¶
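The scaling the hardware performs is a[i] * 2**floor(b[i]) per active lane. A sketch under that assumption (special values such as NaN/Inf omitted; the function name is illustrative):

```python
import math

def mask_scalef(src, a, b, k):
    """Model of scalef with writemask: each active lane computes
    a[i] * 2**floor(b[i]); inactive lanes copy from src."""
    scaled = [x * 2.0 ** math.floor(e) for x, e in zip(a, b)]
    return [s if bit else p for p, bit, s in zip(src, k, scaled)]

print(mask_scalef([9.0, 9.0], [3.0, 5.0], [2.5, 1.0], [1, 0]))
# -> [12.0, 9.0]
```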
- class mlir.dialects._x86_ops_gen.MaskScaleFOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._x86_ops_gen.MaskScaleFOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir

- OPERATION_NAME = 'x86.avx512.mask.scalef'¶
- src() _ods_ir[_ods_ir]¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- k() _ods_ir¶
- rounding() _ods_ir[_ods_ir]¶
- mlir.dialects._x86_ops_gen.avx512_mask_scalef(src, a, b, k, rounding, *, results=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._x86_ops_gen.RsqrtOp(a, *, results=None, loc=None, ip=None)¶
Bases: _ods_ir

- OPERATION_NAME = 'x86.avx.rsqrt'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- class mlir.dialects._x86_ops_gen.RsqrtOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._x86_ops_gen.RsqrtOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir

- OPERATION_NAME = 'x86.avx.rsqrt'¶
- a() _ods_ir[_ods_ir]¶
- mlir.dialects._x86_ops_gen.avx_rsqrt(a, *, results=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._x86_ops_gen.TileLoadOp(res, base, indices, *, stride=None, loc=None, ip=None)¶
Bases: _ods_ir

Loads a tile from memory defined by a base and indices, with the shape defined by the 2-dim vector type of the result. The tile’s rows are populated by reading contiguous elements starting at the base. For each tile row, the base is incremented by stride number of elements.

The tile is loaded using the following indexing scheme:
for row in enumerate(tile_rows):
    mem_row = base[i0, i1, ..., iN + row * stride]
    for col in enumerate(tile_cols):
        tile[row, col] = mem_row[col]
If the stride is not provided, then the base buffer must be at least 2-dimensional, and the stride is automatically inferred and corresponds to the stride of the buffer’s second innermost dimension.

The operation is eventually lowered into the “tileloadd” instruction with the corresponding tile configuration.

With the write memory effect, each x86.amx.tile_load operation serves as a compilation hint to use a separate tile register.

Example:
// Tile load from a 2-D memref with implicit stride.
%0 = x86.amx.tile_load %arg0[%c0, %c0] : memref<?x?xi8> into !x86.amx.tile<16x64xi8>
// Tile load from a 1-D memref with explicit stride.
%0 = x86.amx.tile_load %arg0[%c0], %stride : memref<?xi8> into !x86.amx.tile<16x64xi8>
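The indexing scheme above, applied to a flat 1-D buffer, can be modeled directly in Python (a sketch, not the MLIR binding; parameter names are illustrative):

```python
def tile_load(mem, base, rows, cols, stride):
    """Model of the tile_load indexing scheme over a flat buffer:
    row r of the tile starts at offset base + r * stride, and each
    row reads `cols` contiguous elements."""
    return [[mem[base + r * stride + c] for c in range(cols)]
            for r in range(rows)]

mem = list(range(100))
print(tile_load(mem, base=0, rows=2, cols=4, stride=10))
# -> [[0, 1, 2, 3], [10, 11, 12, 13]]
```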
- OPERATION_NAME = 'x86.amx.tile_load'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (0, True)¶
- base() _ods_ir[_ods_ir]¶
- indices() _ods_ir¶
- stride() _ods_ir[_ods_ir] | None¶
- res() _ods_ir¶
- class mlir.dialects._x86_ops_gen.TileLoadOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._x86_ops_gen.TileLoadOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir

- OPERATION_NAME = 'x86.amx.tile_load'¶
- base() _ods_ir[_ods_ir]¶
- indices() _ods_ir¶
- stride() _ods_ir[_ods_ir] | None¶
- mlir.dialects._x86_ops_gen.amx_tile_load(res, base, indices, *, stride=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._x86_ops_gen.TileMulFOp(lhs, rhs, acc, *, results=None, loc=None, ip=None)¶
Bases: _ods_ir

Multiplies a “m x k” tile with a “k x n” tile and accumulates the results into a “m x n” destination tile. Supports “f32 <- bf16 x bf16” (with pairs of “bf16”).
The operation is eventually lowered into the “tdpbf16ps” instruction with the corresponding tile configuration.
Example:
%0 = x86.amx.tile_mulf %a, %b, %c : !x86.amx.tile<16x32xbf16>, !x86.amx.tile<16x32xbf16>, !x86.amx.tile<16x16xf32>
- OPERATION_NAME = 'x86.amx.tile_mulf'¶
- _ODS_REGIONS = (0, True)¶
- lhs() _ods_ir¶
- rhs() _ods_ir¶
- acc() _ods_ir¶
- res() _ods_ir¶
- class mlir.dialects._x86_ops_gen.TileMulFOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._x86_ops_gen.TileMulFOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir

- OPERATION_NAME = 'x86.amx.tile_mulf'¶
- lhs() _ods_ir¶
- rhs() _ods_ir¶
- acc() _ods_ir¶
- mlir.dialects._x86_ops_gen.amx_tile_mulf(lhs, rhs, acc, *, results=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._x86_ops_gen.TileMulIOp(lhs, rhs, acc, *, isZextLhs=None, isZextRhs=None, results=None, loc=None, ip=None)¶
Bases: _ods_ir

Multiplies a “m x k” tile with a “k x n” tile and accumulates the results into a “m x n” destination tile. Supports all “si32 <- s/ui8 x s/ui8” combinations (4 bytes are packed into the dwords in the columns of both source operand tiles; zero or sign extension is specified with the attributes and defaults to sign extension).
The operation is eventually lowered into one of the “tdpbssd”, “tdpbsud”, “tdpbusd”, or “tdpbuud” instructions with the corresponding tile configuration.
Example:
%0 = x86.amx.tile_muli %a zext, %b zext, %c : !x86.amx.tile<16x64xi8>, !x86.amx.tile<16x64xi8>, !x86.amx.tile<16x16xi32>
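A pure-Python model of the dword-packed accumulation described above, under the assumption that each i32 column of both operand tiles packs 4 adjacent (u)i8 values (a sketch of the semantics, not the MLIR binding; names are illustrative):

```python
def tile_muli(lhs, rhs, acc, zext_lhs=False, zext_rhs=False):
    """Model of the AMX int8 tile multiply: each output element
    accumulates 4-way byte dot products along the shared k dimension,
    with bytes zero- or sign-extended per the flags."""
    def ext(v, zext):
        u = v & 0xFF
        return u if zext else u - 256 * (u >> 7)
    m, k4 = len(lhs), len(lhs[0])          # lhs: m x (4*k) bytes
    n = len(acc[0])                        # acc: m x n i32
    out = [row[:] for row in acc]
    for i in range(m):
        for j in range(n):
            for kk in range(k4 // 4):
                for b in range(4):
                    out[i][j] += ext(lhs[i][4 * kk + b], zext_lhs) * \
                                 ext(rhs[kk][4 * j + b], zext_rhs)
    return out

# 1x4-byte lhs times a one-dword-row rhs, accumulating into a 1x1 acc
print(tile_muli([[1, 2, 3, 4]], [[1, 1, 1, 1]], [[5]]))
# -> [[15]]
```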
- OPERATION_NAME = 'x86.amx.tile_muli'¶
- _ODS_REGIONS = (0, True)¶
- lhs() _ods_ir¶
- rhs() _ods_ir¶
- acc() _ods_ir¶
- isZextLhs() bool¶
- isZextRhs() bool¶
- res() _ods_ir¶
- class mlir.dialects._x86_ops_gen.TileMulIOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._x86_ops_gen.TileMulIOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir

- OPERATION_NAME = 'x86.amx.tile_muli'¶
- lhs() _ods_ir¶
- rhs() _ods_ir¶
- acc() _ods_ir¶
- isZextLhs() bool¶
- isZextRhs() bool¶
- mlir.dialects._x86_ops_gen.amx_tile_muli(lhs, rhs, acc, *, is_zext_lhs=None, is_zext_rhs=None, results=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._x86_ops_gen.TileStoreOp(base, indices, val, *, stride=None, loc=None, ip=None)¶
Bases: _ods_ir

Stores a tile to memory defined by a base and indices, with the shape defined by the 2-dim vector type of the value. The tile’s rows are written contiguously to the buffer starting at the base. For each tile row, the base is incremented by stride number of elements.

The tile is stored using the following indexing scheme:
for row in enumerate(tile_rows):
    mem_row = base[i0, i1, ..., iN + row * stride]
    for col in enumerate(tile_cols):
        mem_row[col] = tile[row, col]
If the stride is not provided, then the base buffer must be at least 2-dimensional, and the stride is automatically inferred and corresponds to the stride of the buffer’s second innermost dimension.

The operation is eventually lowered into the “tilestored” instruction with the corresponding tile configuration.

Example:
// Tile store to a 2-D memref with implicit stride.
x86.amx.tile_store %arg1[%c0, %c0], %0 : memref<?x?xi8>, !x86.amx.tile<16x64xi8>
// Tile store to a 1-D memref with explicit stride.
x86.amx.tile_store %arg1[%c0], %0, %stride : memref<?xi8>, !x86.amx.tile<16x64xi8>
- OPERATION_NAME = 'x86.amx.tile_store'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (0, True)¶
- base() _ods_ir[_ods_ir]¶
- indices() _ods_ir¶
- val() _ods_ir¶
- stride() _ods_ir[_ods_ir] | None¶
- class mlir.dialects._x86_ops_gen.TileStoreOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._x86_ops_gen.TileStoreOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir

- OPERATION_NAME = 'x86.amx.tile_store'¶
- base() _ods_ir[_ods_ir]¶
- indices() _ods_ir¶
- val() _ods_ir¶
- stride() _ods_ir[_ods_ir] | None¶
- mlir.dialects._x86_ops_gen.amx_tile_store(base, indices, val, *, stride=None, loc=None, ip=None) TileStoreOp¶
- class mlir.dialects._x86_ops_gen.TileZeroOp(res, *, loc=None, ip=None)¶
Bases: _ods_ir

Zeroes the destination tile, with the shape defined by the 2-dim vector type of the result.
The operation is eventually lowered into the “tilezero” instruction with the corresponding tile configuration.
With the write memory effect, each x86.amx.tile_zero operation serves as a compilation hint to use a separate tile register.

Example:
%0 = x86.amx.tile_zero : !x86.amx.tile<16x16xbf16>
- OPERATION_NAME = 'x86.amx.tile_zero'¶
- _ODS_REGIONS = (0, True)¶
- res() _ods_ir¶
- class mlir.dialects._x86_ops_gen.TileZeroOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._x86_ops_gen.TileZeroOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir

- OPERATION_NAME = 'x86.amx.tile_zero'¶
- mlir.dialects._x86_ops_gen.amx_tile_zero(res, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._x86_ops_gen.Vp2IntersectOp(a, b, *, results=None, loc=None, ip=None)¶
Bases: _ods_ir

The vp2intersect op is an AVX512 specific op that can lower to the proper LLVM AVX512 operation llvm.vp2intersect.d.512 or llvm.vp2intersect.q.512, depending on the type of MLIR vectors it is applied to.

From the Intel Intrinsics Guide:¶

Compute the intersection of packed integer vectors a and b, and store an indication of a match in the corresponding bit of the two mask registers specified by k1 and k2. A match in corresponding elements of a and b is indicated by a set bit in the corresponding bit of the mask registers.

- OPERATION_NAME = 'x86.avx512.vp2intersect'¶
- _ODS_REGIONS = (0, True)¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- k1() _ods_ir[_ods_ir]¶
- k2() _ods_ir[_ods_ir]¶
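The mask-pair result described above can be sketched in plain Python (a model of the semantics, not the MLIR binding; the function name is illustrative):

```python
def vp2intersect(a, b):
    """Model of vp2intersect: k1[i] is set when a[i] matches any element
    of b, and k2[j] is set when b[j] matches any element of a."""
    k1 = [int(x in b) for x in a]
    k2 = [int(y in a) for y in b]
    return k1, k2

print(vp2intersect([1, 2, 3, 4], [4, 4, 2, 9]))
# -> ([0, 1, 0, 1], [1, 1, 1, 0])
```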
- class mlir.dialects._x86_ops_gen.Vp2IntersectOpAdaptor(operands: list, attributes: OpAttributeMap)¶
- class mlir.dialects._x86_ops_gen.Vp2IntersectOpAdaptor(operands: list, opview: OpView)
Bases: _ods_ir

- OPERATION_NAME = 'x86.avx512.vp2intersect'¶
- a() _ods_ir[_ods_ir]¶
- b() _ods_ir[_ods_ir]¶
- mlir.dialects._x86_ops_gen.avx512_vp2intersect(a, b, *, results=None, loc=None, ip=None) _ods_ir¶