mlir.dialects.x86vector
=======================

.. py:module:: mlir.dialects.x86vector


Classes
-------

.. autoapisummary::

   mlir.dialects.x86vector.AVX10DotInt8Op
   mlir.dialects.x86vector.AVX10DotInt8OpAdaptor
   mlir.dialects.x86vector.BcstToPackedF32Op
   mlir.dialects.x86vector.BcstToPackedF32OpAdaptor
   mlir.dialects.x86vector.CvtPackedEvenIndexedToF32Op
   mlir.dialects.x86vector.CvtPackedEvenIndexedToF32OpAdaptor
   mlir.dialects.x86vector.CvtPackedF32ToBF16Op
   mlir.dialects.x86vector.CvtPackedF32ToBF16OpAdaptor
   mlir.dialects.x86vector.CvtPackedOddIndexedToF32Op
   mlir.dialects.x86vector.CvtPackedOddIndexedToF32OpAdaptor
   mlir.dialects.x86vector.DotBF16Op
   mlir.dialects.x86vector.DotBF16OpAdaptor
   mlir.dialects.x86vector.DotInt8Op
   mlir.dialects.x86vector.DotInt8OpAdaptor
   mlir.dialects.x86vector.DotOp
   mlir.dialects.x86vector.DotOpAdaptor
   mlir.dialects.x86vector.MaskCompressOp
   mlir.dialects.x86vector.MaskCompressOpAdaptor
   mlir.dialects.x86vector.MaskRndScaleOp
   mlir.dialects.x86vector.MaskRndScaleOpAdaptor
   mlir.dialects.x86vector.MaskScaleFOp
   mlir.dialects.x86vector.MaskScaleFOpAdaptor
   mlir.dialects.x86vector.RsqrtOp
   mlir.dialects.x86vector.RsqrtOpAdaptor
   mlir.dialects.x86vector.Vp2IntersectOp
   mlir.dialects.x86vector.Vp2IntersectOpAdaptor


Functions
---------

.. autoapisummary::

   mlir.dialects.x86vector.avx10_dot_i8
   mlir.dialects.x86vector.avx_bcst_to_f32_packed
   mlir.dialects.x86vector.avx_cvt_packed_even_indexed_to_f32
   mlir.dialects.x86vector.avx512_cvt_packed_f32_to_bf16
   mlir.dialects.x86vector.avx_cvt_packed_odd_indexed_to_f32
   mlir.dialects.x86vector.avx512_dot
   mlir.dialects.x86vector.avx_dot_i8
   mlir.dialects.x86vector.avx_intr_dot
   mlir.dialects.x86vector.avx512_mask_compress
   mlir.dialects.x86vector.avx512_mask_rndscale
   mlir.dialects.x86vector.avx512_mask_scalef
   mlir.dialects.x86vector.avx_rsqrt
   mlir.dialects.x86vector.avx512_vp2intersect


Module Contents
---------------

.. py:class:: AVX10DotInt8Op(w, a, b, *, results=None, loc=None, ip=None)

   Bases: :py:obj:`_ods_ir`

   The ``dot`` op is an AVX10-Int8 specific op that can lower to the proper
   LLVMAVX10-INT8 operation ``llvm.vpdpbssd.512``.

   Multiply groups of 4 adjacent pairs of signed 8-bit integers in ``a`` with
   corresponding signed 8-bit integers in ``b``, producing 4 intermediate signed
   16-bit results. Sum these 4 results with the corresponding 32-bit integer in
   ``w``, and store the packed 32-bit results in ``dst``.

   Example:

   .. code:: mlir

      %dst = x86vector.avx10.dot.i8 %w, %a, %b : vector<64xi8> -> vector<16xi32>

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx10.dot.i8'

   .. py:attribute:: _ODS_REGIONS
      :value: (0, True)

   .. py:method:: w() -> _ods_ir[_ods_ir]

   .. py:method:: a() -> _ods_ir[_ods_ir]

   .. py:method:: b() -> _ods_ir[_ods_ir]

   .. py:method:: dst() -> _ods_ir[_ods_ir]


.. py:class:: AVX10DotInt8OpAdaptor(operands: list, attributes: OpAttributeMap)
              AVX10DotInt8OpAdaptor(operands: list, opview: OpView)

   Bases: :py:obj:`_ods_ir`

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx10.dot.i8'

   .. py:method:: w() -> _ods_ir[_ods_ir]

   .. py:method:: a() -> _ods_ir[_ods_ir]

   .. py:method:: b() -> _ods_ir[_ods_ir]


.. py:function:: avx10_dot_i8(w, a, b, *, results=None, loc=None, ip=None) -> _ods_ir

.. py:class:: BcstToPackedF32Op(dst, a, *, loc=None, ip=None)

   Bases: :py:obj:`_ods_ir`

   From the Intel Intrinsics Guide:
   --------------------------------

   Convert scalar BF16 or F16 (16-bit) floating-point element stored at memory
   locations starting at location ``__A`` to a single-precision (32-bit)
   floating-point, broadcast it to packed single-precision (32-bit)
   floating-point elements, and store the results in ``dst``.

   Example:

   .. code:: mlir

      %dst = x86vector.avx.bcst_to_f32.packed %a : memref<1xbf16> -> vector<8xf32>
      %dst = x86vector.avx.bcst_to_f32.packed %a : memref<1xf16> -> vector<8xf32>

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx.bcst_to_f32.packed'

   .. py:attribute:: _ODS_REGIONS
      :value: (0, True)

   .. py:method:: a() -> _ods_ir[_ods_ir]

   .. py:method:: dst() -> _ods_ir[_ods_ir]


.. py:class:: BcstToPackedF32OpAdaptor(operands: list, attributes: OpAttributeMap)
              BcstToPackedF32OpAdaptor(operands: list, opview: OpView)

   Bases: :py:obj:`_ods_ir`

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx.bcst_to_f32.packed'

   .. py:method:: a() -> _ods_ir[_ods_ir]


.. py:function:: avx_bcst_to_f32_packed(dst, a, *, loc=None, ip=None) -> _ods_ir

.. py:class:: CvtPackedEvenIndexedToF32Op(dst, a, *, loc=None, ip=None)

   Bases: :py:obj:`_ods_ir`

   From the Intel Intrinsics Guide:
   --------------------------------

   Convert packed BF16 or F16 (16-bit) floating-point even-indexed elements
   stored at memory locations starting at location ``__A`` to packed
   single-precision (32-bit) floating-point elements, and store the results in
   ``dst``.

   Example:

   .. code:: mlir

      %dst = x86vector.avx.cvt.packed.even.indexed_to_f32 %a : memref<16xbf16> -> vector<8xf32>
      %dst = x86vector.avx.cvt.packed.even.indexed_to_f32 %a : memref<16xf16> -> vector<8xf32>

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx.cvt.packed.even.indexed_to_f32'

   .. py:attribute:: _ODS_REGIONS
      :value: (0, True)

   .. py:method:: a() -> _ods_ir[_ods_ir]

   .. py:method:: dst() -> _ods_ir[_ods_ir]


.. py:class:: CvtPackedEvenIndexedToF32OpAdaptor(operands: list, attributes: OpAttributeMap)
              CvtPackedEvenIndexedToF32OpAdaptor(operands: list, opview: OpView)

   Bases: :py:obj:`_ods_ir`

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx.cvt.packed.even.indexed_to_f32'

   .. py:method:: a() -> _ods_ir[_ods_ir]


.. py:function:: avx_cvt_packed_even_indexed_to_f32(dst, a, *, loc=None, ip=None) -> _ods_ir

.. py:class:: CvtPackedF32ToBF16Op(dst, a, *, loc=None, ip=None)

   Bases: :py:obj:`_ods_ir`

   The ``convert_f32_to_bf16`` op is an AVX512-BF16 specific op that can lower
   to the proper LLVMAVX512BF16 operation ``llvm.cvtneps2bf16`` depending on the
   width of MLIR vectors it is applied to.

   From the Intel Intrinsics Guide:
   --------------------------------

   Convert packed single-precision (32-bit) floating-point elements in ``a`` to
   packed BF16 (16-bit) floating-point elements, and store the results in
   ``dst``.

   Example:

   .. code:: mlir

      %dst = x86vector.avx512.cvt.packed.f32_to_bf16 %a : vector<8xf32> -> vector<8xbf16>

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx512.cvt.packed.f32_to_bf16'

   .. py:attribute:: _ODS_REGIONS
      :value: (0, True)

   .. py:method:: a() -> _ods_ir[_ods_ir]

   .. py:method:: dst() -> _ods_ir[_ods_ir]


.. py:class:: CvtPackedF32ToBF16OpAdaptor(operands: list, attributes: OpAttributeMap)
              CvtPackedF32ToBF16OpAdaptor(operands: list, opview: OpView)

   Bases: :py:obj:`_ods_ir`

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx512.cvt.packed.f32_to_bf16'

   .. py:method:: a() -> _ods_ir[_ods_ir]


.. py:function:: avx512_cvt_packed_f32_to_bf16(dst, a, *, loc=None, ip=None) -> _ods_ir

.. py:class:: CvtPackedOddIndexedToF32Op(dst, a, *, loc=None, ip=None)

   Bases: :py:obj:`_ods_ir`

   From the Intel Intrinsics Guide:
   --------------------------------

   Convert packed BF16 or F16 (16-bit) floating-point odd-indexed elements
   stored at memory locations starting at location ``__A`` to packed
   single-precision (32-bit) floating-point elements, and store the results in
   ``dst``.

   Example:

   .. code:: mlir

      %dst = x86vector.avx.cvt.packed.odd.indexed_to_f32 %a : memref<16xbf16> -> vector<8xf32>
      %dst = x86vector.avx.cvt.packed.odd.indexed_to_f32 %a : memref<16xf16> -> vector<8xf32>

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx.cvt.packed.odd.indexed_to_f32'

   .. py:attribute:: _ODS_REGIONS
      :value: (0, True)

   .. py:method:: a() -> _ods_ir[_ods_ir]

   .. py:method:: dst() -> _ods_ir[_ods_ir]


.. py:class:: CvtPackedOddIndexedToF32OpAdaptor(operands: list, attributes: OpAttributeMap)
              CvtPackedOddIndexedToF32OpAdaptor(operands: list, opview: OpView)

   Bases: :py:obj:`_ods_ir`

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx.cvt.packed.odd.indexed_to_f32'

   .. py:method:: a() -> _ods_ir[_ods_ir]


.. py:function:: avx_cvt_packed_odd_indexed_to_f32(dst, a, *, loc=None, ip=None) -> _ods_ir

.. py:class:: DotBF16Op(src, a, b, *, results=None, loc=None, ip=None)

   Bases: :py:obj:`_ods_ir`

   The ``dot`` op is an AVX512-BF16 specific op that can lower to the proper
   LLVMAVX512BF16 operation ``llvm.dpbf16ps`` depending on the width of MLIR
   vectors it is applied to.

   From the Intel Intrinsics Guide:
   --------------------------------

   Compute dot-product of BF16 (16-bit) floating-point pairs in ``a`` and ``b``,
   accumulating the intermediate single-precision (32-bit) floating-point
   elements with elements in ``src``, and store the results in ``dst``.

   Example:

   .. code:: mlir

      %dst = x86vector.avx512.dot %src, %a, %b : vector<32xbf16> -> vector<16xf32>

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx512.dot'

   .. py:attribute:: _ODS_REGIONS
      :value: (0, True)

   .. py:method:: src() -> _ods_ir[_ods_ir]

   .. py:method:: a() -> _ods_ir[_ods_ir]

   .. py:method:: b() -> _ods_ir[_ods_ir]

   .. py:method:: dst() -> _ods_ir[_ods_ir]


.. py:class:: DotBF16OpAdaptor(operands: list, attributes: OpAttributeMap)
              DotBF16OpAdaptor(operands: list, opview: OpView)

   Bases: :py:obj:`_ods_ir`

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx512.dot'

   .. py:method:: src() -> _ods_ir[_ods_ir]

   .. py:method:: a() -> _ods_ir[_ods_ir]

   .. py:method:: b() -> _ods_ir[_ods_ir]


.. py:function:: avx512_dot(src, a, b, *, results=None, loc=None, ip=None) -> _ods_ir

.. py:class:: DotInt8Op(w, a, b, *, results=None, loc=None, ip=None)

   Bases: :py:obj:`_ods_ir`

   The ``dot`` op is an AVX2-Int8 specific op that can lower to the proper
   LLVMAVX2-INT8 operation ``llvm.vpdpbssd`` depending on the width of MLIR
   vectors it is applied to.

   From the Intel Intrinsics Guide:
   --------------------------------

   Multiply groups of 4 adjacent pairs of signed 8-bit integers in ``a`` with
   corresponding signed 8-bit integers in ``b``, producing 4 intermediate signed
   16-bit results. Sum these 4 results with the corresponding 32-bit integer in
   ``w``, and store the packed 32-bit results in ``dst``.

   Example:

   .. code:: mlir

      %dst = x86vector.avx.dot.i8 %w, %a, %b : vector<32xi8> -> vector<8xi32>

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx.dot.i8'

   .. py:attribute:: _ODS_REGIONS
      :value: (0, True)

   .. py:method:: w() -> _ods_ir[_ods_ir]

   .. py:method:: a() -> _ods_ir[_ods_ir]

   .. py:method:: b() -> _ods_ir[_ods_ir]

   .. py:method:: dst() -> _ods_ir[_ods_ir]


.. py:class:: DotInt8OpAdaptor(operands: list, attributes: OpAttributeMap)
              DotInt8OpAdaptor(operands: list, opview: OpView)

   Bases: :py:obj:`_ods_ir`

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx.dot.i8'

   .. py:method:: w() -> _ods_ir[_ods_ir]

   .. py:method:: a() -> _ods_ir[_ods_ir]

   .. py:method:: b() -> _ods_ir[_ods_ir]


.. py:function:: avx_dot_i8(w, a, b, *, results=None, loc=None, ip=None) -> _ods_ir

.. py:class:: DotOp(a, b, *, results=None, loc=None, ip=None)

   Bases: :py:obj:`_ods_ir`

   Computes the 4-way dot products of the lower and higher parts of the source
   vectors and broadcasts the two results to the lower and higher elements of
   the destination vector, respectively. Adding one element of the lower part to
   one element of the higher part in the destination vector yields the full dot
   product of the two source vectors.

   Example:

   .. code:: mlir

      %0 = x86vector.avx.intr.dot %a, %b : vector<8xf32>
      %1 = vector.extract %0[%i0] : f32 from vector<8xf32>
      %2 = vector.extract %0[%i4] : f32 from vector<8xf32>
      %d = arith.addf %1, %2 : f32

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx.intr.dot'

   .. py:attribute:: _ODS_REGIONS
      :value: (0, True)

   .. py:method:: a() -> _ods_ir[_ods_ir]

   .. py:method:: b() -> _ods_ir[_ods_ir]

   .. py:method:: res() -> _ods_ir[_ods_ir]


.. py:class:: DotOpAdaptor(operands: list, attributes: OpAttributeMap)
              DotOpAdaptor(operands: list, opview: OpView)

   Bases: :py:obj:`_ods_ir`

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx.intr.dot'

   .. py:method:: a() -> _ods_ir[_ods_ir]

   .. py:method:: b() -> _ods_ir[_ods_ir]


.. py:function:: avx_intr_dot(a, b, *, results=None, loc=None, ip=None) -> _ods_ir

.. py:class:: MaskCompressOp(k, a, *, src=None, constant_src=None, results=None, loc=None, ip=None)

   Bases: :py:obj:`_ods_ir`

   The ``mask.compress`` op is an AVX512 specific op that can lower to the
   ``llvm.mask.compress`` instruction. Instead of ``src``, a constant vector
   attribute ``constant_src`` may be specified. If neither ``src`` nor
   ``constant_src`` is specified, the remaining elements in the result vector
   are set to zero.

   From the Intel Intrinsics Guide:
   --------------------------------

   Contiguously store the active integer/floating-point elements in ``a`` (those
   with their respective bit set in writemask ``k``) to ``dst``, and pass
   through the remaining elements from ``src``.

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx512.mask.compress'

   .. py:attribute:: _ODS_REGIONS
      :value: (0, True)

   .. py:method:: k() -> _ods_ir[_ods_ir]

   .. py:method:: a() -> _ods_ir[_ods_ir]

   .. py:method:: src() -> Optional[_ods_ir[_ods_ir]]

   .. py:method:: constant_src() -> Optional[_ods_ir]

   .. py:method:: dst() -> _ods_ir[_ods_ir]


.. py:class:: MaskCompressOpAdaptor(operands: list, attributes: OpAttributeMap)
              MaskCompressOpAdaptor(operands: list, opview: OpView)

   Bases: :py:obj:`_ods_ir`

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx512.mask.compress'

   .. py:method:: k() -> _ods_ir[_ods_ir]

   .. py:method:: a() -> _ods_ir[_ods_ir]

   .. py:method:: src() -> Optional[_ods_ir[_ods_ir]]

   .. py:method:: constant_src() -> Optional[_ods_ir]


.. py:function:: avx512_mask_compress(k, a, *, src=None, constant_src=None, results=None, loc=None, ip=None) -> _ods_ir

.. py:class:: MaskRndScaleOp(src, k, a, imm, rounding, *, results=None, loc=None, ip=None)

   Bases: :py:obj:`_ods_ir`

   The ``mask.rndscale`` op is an AVX512 specific op that can lower to the
   proper LLVMAVX512 operation, ``llvm.mask.rndscale.ps.512`` or
   ``llvm.mask.rndscale.pd.512``, depending on the type of vectors it is
   applied to.

   From the Intel Intrinsics Guide:
   --------------------------------

   Round packed floating-point elements in ``a`` to the number of fraction bits
   specified by ``imm``, and store the results in ``dst`` using writemask ``k``
   (elements are copied from ``src`` when the corresponding mask bit is not
   set).

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx512.mask.rndscale'

   .. py:attribute:: _ODS_REGIONS
      :value: (0, True)

   .. py:method:: src() -> _ods_ir[_ods_ir]

   .. py:method:: k() -> _ods_ir[_ods_ir]

   .. py:method:: a() -> _ods_ir[_ods_ir]

   .. py:method:: imm() -> _ods_ir

   .. py:method:: rounding() -> _ods_ir[_ods_ir]

   .. py:method:: dst() -> _ods_ir[_ods_ir]


.. py:class:: MaskRndScaleOpAdaptor(operands: list, attributes: OpAttributeMap)
              MaskRndScaleOpAdaptor(operands: list, opview: OpView)

   Bases: :py:obj:`_ods_ir`

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx512.mask.rndscale'

   .. py:method:: src() -> _ods_ir[_ods_ir]

   .. py:method:: k() -> _ods_ir[_ods_ir]

   .. py:method:: a() -> _ods_ir[_ods_ir]

   .. py:method:: imm() -> _ods_ir

   .. py:method:: rounding() -> _ods_ir[_ods_ir]


.. py:function:: avx512_mask_rndscale(src, k, a, imm, rounding, *, results=None, loc=None, ip=None) -> _ods_ir

.. py:class:: MaskScaleFOp(src, a, b, k, rounding, *, results=None, loc=None, ip=None)

   Bases: :py:obj:`_ods_ir`

   The ``mask.scalef`` op is an AVX512 specific op that can lower to the proper
   LLVMAVX512 operation, ``llvm.mask.scalef.ps.512`` or
   ``llvm.mask.scalef.pd.512``, depending on the type of MLIR vectors it is
   applied to.

   From the Intel Intrinsics Guide:
   --------------------------------

   Scale the packed floating-point elements in ``a`` using values from ``b``,
   and store the results in ``dst`` using writemask ``k`` (elements are copied
   from ``src`` when the corresponding mask bit is not set).

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx512.mask.scalef'

   .. py:attribute:: _ODS_REGIONS
      :value: (0, True)

   .. py:method:: src() -> _ods_ir[_ods_ir]

   .. py:method:: a() -> _ods_ir[_ods_ir]

   .. py:method:: b() -> _ods_ir[_ods_ir]

   .. py:method:: k() -> _ods_ir

   .. py:method:: rounding() -> _ods_ir[_ods_ir]

   .. py:method:: dst() -> _ods_ir[_ods_ir]


.. py:class:: MaskScaleFOpAdaptor(operands: list, attributes: OpAttributeMap)
              MaskScaleFOpAdaptor(operands: list, opview: OpView)

   Bases: :py:obj:`_ods_ir`

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx512.mask.scalef'

   .. py:method:: src() -> _ods_ir[_ods_ir]

   .. py:method:: a() -> _ods_ir[_ods_ir]

   .. py:method:: b() -> _ods_ir[_ods_ir]

   .. py:method:: k() -> _ods_ir

   .. py:method:: rounding() -> _ods_ir[_ods_ir]


.. py:function:: avx512_mask_scalef(src, a, b, k, rounding, *, results=None, loc=None, ip=None) -> _ods_ir

.. py:class:: RsqrtOp(a, *, results=None, loc=None, ip=None)

   Bases: :py:obj:`_ods_ir`

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx.rsqrt'

   .. py:attribute:: _ODS_REGIONS
      :value: (0, True)

   .. py:method:: a() -> _ods_ir[_ods_ir]

   .. py:method:: b() -> _ods_ir[_ods_ir]


.. py:class:: RsqrtOpAdaptor(operands: list, attributes: OpAttributeMap)
              RsqrtOpAdaptor(operands: list, opview: OpView)

   Bases: :py:obj:`_ods_ir`

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx.rsqrt'

   .. py:method:: a() -> _ods_ir[_ods_ir]


.. py:function:: avx_rsqrt(a, *, results=None, loc=None, ip=None) -> _ods_ir

.. py:class:: Vp2IntersectOp(a, b, *, results=None, loc=None, ip=None)

   Bases: :py:obj:`_ods_ir`

   The ``vp2intersect`` op is an AVX512 specific op that can lower to the proper
   LLVMAVX512 operation, ``llvm.vp2intersect.d.512`` or
   ``llvm.vp2intersect.q.512``, depending on the type of MLIR vectors it is
   applied to.

   From the Intel Intrinsics Guide:
   --------------------------------

   Compute intersection of packed integer vectors ``a`` and ``b``, and store
   indication of match in the corresponding bit of two mask registers specified
   by ``k1`` and ``k2``. A match in corresponding elements of ``a`` and ``b`` is
   indicated by a set bit in the corresponding bit of the mask registers.

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx512.vp2intersect'

   .. py:attribute:: _ODS_REGIONS
      :value: (0, True)

   .. py:method:: a() -> _ods_ir[_ods_ir]

   .. py:method:: b() -> _ods_ir[_ods_ir]

   .. py:method:: k1() -> _ods_ir[_ods_ir]

   .. py:method:: k2() -> _ods_ir[_ods_ir]


.. py:class:: Vp2IntersectOpAdaptor(operands: list, attributes: OpAttributeMap)
              Vp2IntersectOpAdaptor(operands: list, opview: OpView)

   Bases: :py:obj:`_ods_ir`

   .. py:attribute:: OPERATION_NAME
      :value: 'x86vector.avx512.vp2intersect'

   .. py:method:: a() -> _ods_ir[_ods_ir]

   .. py:method:: b() -> _ods_ir[_ods_ir]


.. py:function:: avx512_vp2intersect(a, b, *, results=None, loc=None, ip=None) -> _ods_ir
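As a reading aid, the lane-level behaviour described in the op docstrings of this module can be modelled in plain Python, without the MLIR bindings. The sketch below is not part of ``mlir.dialects.x86vector``: the ``ref_*`` helper names are hypothetical, masks are modelled as lists of booleans, floats are Python doubles rather than ``f32``, the int8 dot product assumes non-saturating 32-bit wrap-around, and ``ref_vp2intersect`` assumes the all-pairs comparison given in the Intel Intrinsics Guide pseudocode.

```python
from typing import List, Optional, Sequence, Tuple


def ref_dot_i8(w: Sequence[int], a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Model of the int8 dot ops: each i32 lane of w accumulates 4 adjacent i8 products."""
    assert len(a) == len(b) == 4 * len(w)
    out = []
    for lane, acc in enumerate(w):
        for j in range(4 * lane, 4 * lane + 4):
            acc += a[j] * b[j]
        # Assumption: non-saturating accumulate, wrapped to signed 32 bits.
        out.append((acc + 2**31) % 2**32 - 2**31)
    return out


def ref_intr_dot(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Model of avx.intr.dot on 8 lanes: one 4-way dot product per half, broadcast."""
    assert len(a) == len(b) == 8
    lo = sum(x * y for x, y in zip(a[:4], b[:4]))
    hi = sum(x * y for x, y in zip(a[4:], b[4:]))
    return [lo] * 4 + [hi] * 4


def ref_mask_compress(
    k: Sequence[bool], a: Sequence[int], src: Optional[Sequence[int]] = None
) -> List[int]:
    """Model of mask.compress: pack active lanes low, pass the rest through from src."""
    assert len(k) == len(a)
    # With no src (and no constant_src), the remaining lanes are zero.
    passthrough = list(src) if src is not None else [0] * len(a)
    active = [x for bit, x in zip(k, a) if bit]
    return active + passthrough[len(active):]


def ref_vp2intersect(
    a: Sequence[int], b: Sequence[int]
) -> Tuple[List[bool], List[bool]]:
    """Model of vp2intersect, assuming every lane of a is compared with every lane of b."""
    assert len(a) == len(b)
    k1 = [any(x == y for y in b) for x in a]  # lanes of a that occur somewhere in b
    k2 = [any(y == x for x in a) for y in b]  # lanes of b that occur somewhere in a
    return k1, k2


if __name__ == "__main__":
    r = ref_intr_dot([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8)
    # Adding one low lane and one high lane gives the full 8-element dot product.
    print(r[0] + r[4])
    print(ref_mask_compress([False, True, False, True], [10, 20, 30, 40], [1, 2, 3, 4]))
    print(ref_vp2intersect([1, 2, 3, 4], [4, 9, 1, 7]))
```

These helpers are only meant for checking expectations on small inputs; they say nothing about how the ops are lowered or which vector widths the verifier accepts.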