mlir.dialects.x86vector

Classes

AVX10DotInt8Op

The dot op is an AVX10-Int8 specific op that can lower to the proper LLVMAVX10-INT8 operation llvm.vpdpbssd.512.

AVX10DotInt8OpAdaptor

BcstToPackedF32Op

From the Intel Intrinsics Guide:

BcstToPackedF32OpAdaptor

CvtPackedEvenIndexedToF32Op

From the Intel Intrinsics Guide:

CvtPackedEvenIndexedToF32OpAdaptor

CvtPackedF32ToBF16Op

The convert_f32_to_bf16 op is an AVX512-BF16 specific op that can lower to the proper LLVMAVX512BF16 operation llvm.cvtneps2bf16 depending on the width of MLIR vectors it is applied to.

CvtPackedF32ToBF16OpAdaptor

CvtPackedOddIndexedToF32Op

From the Intel Intrinsics Guide:

CvtPackedOddIndexedToF32OpAdaptor

DotBF16Op

The dot op is an AVX512-BF16 specific op that can lower to the proper LLVMAVX512BF16 operation llvm.dpbf16ps depending on the width of MLIR vectors it is applied to.

DotBF16OpAdaptor

DotInt8Op

The dot op is an AVX2-Int8 specific op that can lower to the proper LLVMAVX2-INT8 operation llvm.vpdpbssd depending on the width of MLIR vectors it is applied to.

DotInt8OpAdaptor

DotOp

Computes the 4-way dot products of the lower and higher parts of the source vectors and broadcasts the two results to the lower and higher elements of the destination vector, respectively.

DotOpAdaptor

MaskCompressOp

The mask.compress op is an AVX512 specific op that can lower to the llvm.mask.compress instruction.

MaskCompressOpAdaptor

MaskRndScaleOp

The mask.rndscale op is an AVX512 specific op that can lower to the proper LLVMAVX512 operation: llvm.mask.rndscale.ps.512 or llvm.mask.rndscale.pd.512 depending on the type of MLIR vectors it is applied to.

MaskRndScaleOpAdaptor

MaskScaleFOp

The mask.scalef op is an AVX512 specific op that can lower to the proper LLVMAVX512 operation: llvm.mask.scalef.ps.512 or llvm.mask.scalef.pd.512 depending on the type of MLIR vectors it is applied to.

MaskScaleFOpAdaptor

RsqrtOp

RsqrtOpAdaptor

Vp2IntersectOp

The vp2intersect op is an AVX512 specific op that can lower to the proper LLVMAVX512 operation: llvm.vp2intersect.d.512 or llvm.vp2intersect.q.512 depending on the type of MLIR vectors it is applied to.

Vp2IntersectOpAdaptor

Functions

avx10_dot_i8(→ _ods_ir)

avx_bcst_to_f32_packed(→ _ods_ir)

avx_cvt_packed_even_indexed_to_f32(→ _ods_ir)

avx512_cvt_packed_f32_to_bf16(→ _ods_ir)

avx_cvt_packed_odd_indexed_to_f32(→ _ods_ir)

avx512_dot(→ _ods_ir)

avx_dot_i8(→ _ods_ir)

avx_intr_dot(→ _ods_ir)

avx512_mask_compress(→ _ods_ir)

avx512_mask_rndscale(→ _ods_ir)

avx512_mask_scalef(→ _ods_ir)

avx_rsqrt(→ _ods_ir)

avx512_vp2intersect(→ _ods_ir)

Module Contents

class mlir.dialects.x86vector.AVX10DotInt8Op(w, a, b, *, results=None, loc=None, ip=None)

Bases: _ods_ir

The dot op is an AVX10-Int8 specific op that can lower to the proper LLVMAVX10-INT8 operation llvm.vpdpbssd.512.

Multiply groups of 4 adjacent pairs of signed 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in w, and store the packed 32-bit results in dst.

Example:

%dst = x86vector.avx10.dot.i8 %w, %a, %b : vector<64xi8> -> vector<16xi32>
OPERATION_NAME = 'x86vector.avx10.dot.i8'
_ODS_REGIONS = (0, True)
w() _ods_ir[_ods_ir]
a() _ods_ir[_ods_ir]
b() _ods_ir[_ods_ir]
dst() _ods_ir[_ods_ir]
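The grouped multiply-accumulate described above can be sketched in plain Python. This is illustrative only, not the MLIR builder API: it models each 32-bit result lane as the accumulator `w[i]` plus the dot product of its four adjacent signed 8-bit pairs, and ignores 32-bit wraparound on overflow.

```python
# Sketch of the x86vector.avx10.dot.i8 semantics (illustrative, names
# hypothetical): each 32-bit lane i of dst accumulates the products of
# its 4 adjacent signed 8-bit pairs into w[i].
def dot_i8(w, a, b):
    assert len(a) == len(b) == 4 * len(w)
    dst = []
    for i, acc in enumerate(w):
        group = range(4 * i, 4 * i + 4)
        # 4 signed 8-bit products summed into the 32-bit accumulator.
        dst.append(acc + sum(a[j] * b[j] for j in group))
    return dst

# One 32-bit lane fed by four i8 pairs: 1*5 + 2*6 + 3*7 + 4*8 = 70.
print(dot_i8([100], [1, 2, 3, 4], [5, 6, 7, 8]))  # [170]
```

For the real op, `vector<64xi8>` inputs feed a `vector<16xi32>` result, i.e. sixteen such 4-pair groups.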
class mlir.dialects.x86vector.AVX10DotInt8OpAdaptor(operands: list, attributes: OpAttributeMap)
class mlir.dialects.x86vector.AVX10DotInt8OpAdaptor(operands: list, opview: OpView)

Bases: _ods_ir

OPERATION_NAME = 'x86vector.avx10.dot.i8'
w() _ods_ir[_ods_ir]
a() _ods_ir[_ods_ir]
b() _ods_ir[_ods_ir]
mlir.dialects.x86vector.avx10_dot_i8(w, a, b, *, results=None, loc=None, ip=None) _ods_ir
class mlir.dialects.x86vector.BcstToPackedF32Op(dst, a, *, loc=None, ip=None)

Bases: _ods_ir

From the Intel Intrinsics Guide:

Convert a scalar BF16 or F16 (16-bit) floating-point element stored at memory locations starting at location __A to a single-precision (32-bit) floating-point element, broadcast it to packed single-precision (32-bit) floating-point elements, and store the results in dst.

Example:

%dst = x86vector.avx.bcst_to_f32.packed %a : memref<1xbf16> -> vector<8xf32>
%dst = x86vector.avx.bcst_to_f32.packed %a : memref<1xf16> -> vector<8xf32>
OPERATION_NAME = 'x86vector.avx.bcst_to_f32.packed'
_ODS_REGIONS = (0, True)
a() _ods_ir[_ods_ir]
dst() _ods_ir[_ods_ir]
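The convert-then-broadcast semantics above can be sketched in plain Python for the BF16 case. This is illustrative, not the MLIR builder API: a BF16 value is the upper 16 bits of an IEEE-754 float32, so widening is a 16-bit left shift of its bit pattern.

```python
import struct

# Sketch of the x86vector.avx.bcst_to_f32.packed semantics (illustrative,
# names hypothetical): widen one BF16 bit pattern to float32, then
# broadcast it across the result vector.
def bf16_bits_to_f32(bits):
    # BF16 occupies the high 16 bits of a float32 bit pattern.
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]

def bcst_to_f32_packed(scalar_bf16_bits, width=8):
    return [bf16_bits_to_f32(scalar_bf16_bits)] * width

print(bcst_to_f32_packed(0x3F80))  # 0x3F80 is BF16 for 1.0 -> [1.0] * 8
```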
class mlir.dialects.x86vector.BcstToPackedF32OpAdaptor(operands: list, attributes: OpAttributeMap)
class mlir.dialects.x86vector.BcstToPackedF32OpAdaptor(operands: list, opview: OpView)

Bases: _ods_ir

OPERATION_NAME = 'x86vector.avx.bcst_to_f32.packed'
a() _ods_ir[_ods_ir]
mlir.dialects.x86vector.avx_bcst_to_f32_packed(dst, a, *, loc=None, ip=None) _ods_ir
class mlir.dialects.x86vector.CvtPackedEvenIndexedToF32Op(dst, a, *, loc=None, ip=None)

Bases: _ods_ir

From the Intel Intrinsics Guide:

Convert packed BF16 or F16 (16-bit) floating-point even-indexed elements stored at memory locations starting at location __A to packed single-precision (32-bit) floating-point elements, and store the results in dst.

Example:

%dst = x86vector.avx.cvt.packed.even.indexed_to_f32 %a : memref<16xbf16> -> vector<8xf32>
%dst = x86vector.avx.cvt.packed.even.indexed_to_f32 %a : memref<16xf16> -> vector<8xf32>
OPERATION_NAME = 'x86vector.avx.cvt.packed.even.indexed_to_f32'
_ODS_REGIONS = (0, True)
a() _ods_ir[_ods_ir]
dst() _ods_ir[_ods_ir]
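The even-indexed selection above can be sketched in plain Python for the BF16 case (the odd-indexed variant below differs only in starting at index 1). This is illustrative, not the MLIR builder API.

```python
import struct

# Sketch of x86vector.avx.cvt.packed.even.indexed_to_f32 (illustrative,
# names hypothetical): pick elements 0, 2, 4, ... of a 16-bit buffer and
# widen each BF16 bit pattern into the high half of a float32.
def cvt_even_indexed_to_f32(bf16_bits):
    return [
        struct.unpack("<f", struct.pack("<I", b << 16))[0]
        for b in bf16_bits[0::2]  # even indices only
    ]

# 1.0 (0x3F80) at even slots, 2.0 (0x4000) at odd slots.
src = [0x3F80, 0x4000] * 8
print(cvt_even_indexed_to_f32(src))  # [1.0] * 8
```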
class mlir.dialects.x86vector.CvtPackedEvenIndexedToF32OpAdaptor(operands: list, attributes: OpAttributeMap)
class mlir.dialects.x86vector.CvtPackedEvenIndexedToF32OpAdaptor(operands: list, opview: OpView)

Bases: _ods_ir

OPERATION_NAME = 'x86vector.avx.cvt.packed.even.indexed_to_f32'
a() _ods_ir[_ods_ir]
mlir.dialects.x86vector.avx_cvt_packed_even_indexed_to_f32(dst, a, *, loc=None, ip=None) _ods_ir
class mlir.dialects.x86vector.CvtPackedF32ToBF16Op(dst, a, *, loc=None, ip=None)

Bases: _ods_ir

The convert_f32_to_bf16 op is an AVX512-BF16 specific op that can lower to the proper LLVMAVX512BF16 operation llvm.cvtneps2bf16 depending on the width of MLIR vectors it is applied to.

From the Intel Intrinsics Guide:

Convert packed single-precision (32-bit) floating-point elements in a to packed BF16 (16-bit) floating-point elements, and store the results in dst.

Example:

%dst = x86vector.avx512.cvt.packed.f32_to_bf16 %a : vector<8xf32> -> vector<8xbf16>
OPERATION_NAME = 'x86vector.avx512.cvt.packed.f32_to_bf16'
_ODS_REGIONS = (0, True)
a() _ods_ir[_ods_ir]
dst() _ods_ir[_ods_ir]
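The narrowing above can be sketched in plain Python. This is illustrative, not the MLIR builder API: BF16 keeps the top 16 bits of a float32, and the "ne" in the underlying vcvtneps2bf16 instruction means round to nearest-even, modeled here with the usual bias-and-truncate trick.

```python
import struct

# Sketch of x86vector.avx512.cvt.packed.f32_to_bf16 for one element
# (illustrative, names hypothetical): round-to-nearest-even, then keep
# the upper 16 bits of the float32 bit pattern.
def f32_to_bf16_bits(x):
    u = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add the rounding bias; the extra LSB term breaks ties toward even.
    u += 0x7FFF + ((u >> 16) & 1)
    return (u >> 16) & 0xFFFF

print(hex(f32_to_bf16_bits(1.0)))  # 0x3f80
```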
class mlir.dialects.x86vector.CvtPackedF32ToBF16OpAdaptor(operands: list, attributes: OpAttributeMap)
class mlir.dialects.x86vector.CvtPackedF32ToBF16OpAdaptor(operands: list, opview: OpView)

Bases: _ods_ir

OPERATION_NAME = 'x86vector.avx512.cvt.packed.f32_to_bf16'
a() _ods_ir[_ods_ir]
mlir.dialects.x86vector.avx512_cvt_packed_f32_to_bf16(dst, a, *, loc=None, ip=None) _ods_ir
class mlir.dialects.x86vector.CvtPackedOddIndexedToF32Op(dst, a, *, loc=None, ip=None)

Bases: _ods_ir

From the Intel Intrinsics Guide:

Convert packed BF16 or F16 (16-bit) floating-point odd-indexed elements stored at memory locations starting at location __A to packed single-precision (32-bit) floating-point elements, and store the results in dst.

Example:

%dst = x86vector.avx.cvt.packed.odd.indexed_to_f32 %a : memref<16xbf16> -> vector<8xf32>
%dst = x86vector.avx.cvt.packed.odd.indexed_to_f32 %a : memref<16xf16> -> vector<8xf32>
OPERATION_NAME = 'x86vector.avx.cvt.packed.odd.indexed_to_f32'
_ODS_REGIONS = (0, True)
a() _ods_ir[_ods_ir]
dst() _ods_ir[_ods_ir]
class mlir.dialects.x86vector.CvtPackedOddIndexedToF32OpAdaptor(operands: list, attributes: OpAttributeMap)
class mlir.dialects.x86vector.CvtPackedOddIndexedToF32OpAdaptor(operands: list, opview: OpView)

Bases: _ods_ir

OPERATION_NAME = 'x86vector.avx.cvt.packed.odd.indexed_to_f32'
a() _ods_ir[_ods_ir]
mlir.dialects.x86vector.avx_cvt_packed_odd_indexed_to_f32(dst, a, *, loc=None, ip=None) _ods_ir
class mlir.dialects.x86vector.DotBF16Op(src, a, b, *, results=None, loc=None, ip=None)

Bases: _ods_ir

The dot op is an AVX512-BF16 specific op that can lower to the proper LLVMAVX512BF16 operation llvm.dpbf16ps depending on the width of MLIR vectors it is applied to.

From the Intel Intrinsics Guide:

Compute dot-product of BF16 (16-bit) floating-point pairs in a and b, accumulating the intermediate single-precision (32-bit) floating-point elements with elements in src, and store the results in dst.

Example:

%dst = x86vector.avx512.dot %src, %a, %b : vector<32xbf16> -> vector<16xf32>
OPERATION_NAME = 'x86vector.avx512.dot'
_ODS_REGIONS = (0, True)
src() _ods_ir[_ods_ir]
a() _ods_ir[_ods_ir]
b() _ods_ir[_ods_ir]
dst() _ods_ir[_ods_ir]
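The BF16 pair accumulation described above can be sketched in plain Python. This is illustrative, not the MLIR builder API: each f32 accumulator lane sums the products of its two corresponding BF16 pairs, with the arithmetic carried out in float32.

```python
import struct

# Sketch of the x86vector.avx512.dot semantics (illustrative, names
# hypothetical): lane i of dst accumulates a[2i]*b[2i] + a[2i+1]*b[2i+1]
# into src[i], with BF16 inputs widened to float32 first.
def bf16_to_f32(bits):
    return struct.unpack("<f", struct.pack("<I", bits << 16))[0]

def dot_bf16(src, a_bits, b_bits):
    assert len(a_bits) == len(b_bits) == 2 * len(src)
    a = [bf16_to_f32(x) for x in a_bits]
    b = [bf16_to_f32(x) for x in b_bits]
    return [acc + a[2 * i] * b[2 * i] + a[2 * i + 1] * b[2 * i + 1]
            for i, acc in enumerate(src)]

# One lane: 1.0*2.0 + 1.0*2.0 accumulated into 0.5 -> 4.5.
print(dot_bf16([0.5], [0x3F80, 0x3F80], [0x4000, 0x4000]))  # [4.5]
```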
class mlir.dialects.x86vector.DotBF16OpAdaptor(operands: list, attributes: OpAttributeMap)
class mlir.dialects.x86vector.DotBF16OpAdaptor(operands: list, opview: OpView)

Bases: _ods_ir

OPERATION_NAME = 'x86vector.avx512.dot'
src() _ods_ir[_ods_ir]
a() _ods_ir[_ods_ir]
b() _ods_ir[_ods_ir]
mlir.dialects.x86vector.avx512_dot(src, a, b, *, results=None, loc=None, ip=None) _ods_ir
class mlir.dialects.x86vector.DotInt8Op(w, a, b, *, results=None, loc=None, ip=None)

Bases: _ods_ir

The dot op is an AVX2-Int8 specific op that can lower to the proper LLVMAVX2-INT8 operation llvm.vpdpbssd depending on the width of MLIR vectors it is applied to.

From the Intel Intrinsics Guide:

Multiply groups of 4 adjacent pairs of signed 8-bit integers in a with corresponding signed 8-bit integers in b, producing 4 intermediate signed 16-bit results. Sum these 4 results with the corresponding 32-bit integer in w, and store the packed 32-bit results in dst.

Example:

%dst = x86vector.avx.dot.i8 %w, %a, %b : vector<32xi8> -> vector<8xi32>
OPERATION_NAME = 'x86vector.avx.dot.i8'
_ODS_REGIONS = (0, True)
w() _ods_ir[_ods_ir]
a() _ods_ir[_ods_ir]
b() _ods_ir[_ods_ir]
dst() _ods_ir[_ods_ir]
class mlir.dialects.x86vector.DotInt8OpAdaptor(operands: list, attributes: OpAttributeMap)
class mlir.dialects.x86vector.DotInt8OpAdaptor(operands: list, opview: OpView)

Bases: _ods_ir

OPERATION_NAME = 'x86vector.avx.dot.i8'
w() _ods_ir[_ods_ir]
a() _ods_ir[_ods_ir]
b() _ods_ir[_ods_ir]
mlir.dialects.x86vector.avx_dot_i8(w, a, b, *, results=None, loc=None, ip=None) _ods_ir
class mlir.dialects.x86vector.DotOp(a, b, *, results=None, loc=None, ip=None)

Bases: _ods_ir

Computes the 4-way dot products of the lower and higher parts of the source vectors and broadcasts the two results to the lower and higher elements of the destination vector, respectively. Adding one element of the lower part to one element of the higher part in the destination vector yields the full dot product of the two source vectors.

Example:

%0 = x86vector.avx.intr.dot %a, %b : vector<8xf32>
%1 = vector.extract %0[%i0] : f32 from vector<8xf32>
%2 = vector.extract %0[%i4] : f32 from vector<8xf32>
%d = arith.addf %1, %2 : f32
OPERATION_NAME = 'x86vector.avx.intr.dot'
_ODS_REGIONS = (0, True)
a() _ods_ir[_ods_ir]
b() _ods_ir[_ods_ir]
res() _ods_ir[_ods_ir]
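The lower/higher 4-way dot semantics and the recombination trick shown in the MLIR example can be sketched in plain Python. This is illustrative, not the MLIR builder API.

```python
# Sketch of the x86vector.avx.intr.dot semantics (illustrative, names
# hypothetical): each 4-element half produces one dot product, broadcast
# across the corresponding half of the result.
def avx_dot(a, b):
    assert len(a) == len(b) == 8
    lo = sum(x * y for x, y in zip(a[:4], b[:4]))
    hi = sum(x * y for x, y in zip(a[4:], b[4:]))
    return [lo] * 4 + [hi] * 4

a = [1.0] * 8
b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
r = avx_dot(a, b)
# As in the MLIR example: one lower element plus one higher element
# yields the full 8-element dot product.
print(r[0] + r[4])  # 36.0
```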
class mlir.dialects.x86vector.DotOpAdaptor(operands: list, attributes: OpAttributeMap)
class mlir.dialects.x86vector.DotOpAdaptor(operands: list, opview: OpView)

Bases: _ods_ir

OPERATION_NAME = 'x86vector.avx.intr.dot'
a() _ods_ir[_ods_ir]
b() _ods_ir[_ods_ir]
mlir.dialects.x86vector.avx_intr_dot(a, b, *, results=None, loc=None, ip=None) _ods_ir
class mlir.dialects.x86vector.MaskCompressOp(k, a, *, src=None, constant_src=None, results=None, loc=None, ip=None)

Bases: _ods_ir

The mask.compress op is an AVX512 specific op that can lower to the llvm.mask.compress instruction. Instead of src, a constant vector attribute constant_src may be specified. If neither src nor constant_src is specified, the remaining elements in the result vector are set to zero.

From the Intel Intrinsics Guide:

Contiguously store the active integer/floating-point elements in a (those with their respective bit set in writemask k) to dst, and pass through the remaining elements from src.

OPERATION_NAME = 'x86vector.avx512.mask.compress'
_ODS_REGIONS = (0, True)
k() _ods_ir[_ods_ir]
a() _ods_ir[_ods_ir]
src() _ods_ir[_ods_ir] | None
constant_src() _ods_ir | None
dst() _ods_ir[_ods_ir]
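The compress semantics above can be sketched in plain Python. This is illustrative, not the MLIR builder API: active elements of a (those whose mask bit is set) are packed contiguously at the front of the result, and the remaining slots pass through src, or zero when no src is given, matching the op description.

```python
# Sketch of the x86vector.avx512.mask.compress semantics (illustrative,
# names hypothetical); the mask k is modeled as a list of 0/1 bits.
def mask_compress(k, a, src=None):
    if src is None:
        # No src and no constant_src: remaining elements become zero.
        src = [0] * len(a)
    packed = [x for bit, x in zip(k, a) if bit]
    # Active elements first, then pass-through elements from src.
    return packed + src[len(packed):]

print(mask_compress([1, 0, 1, 0], [10, 20, 30, 40]))  # [10, 30, 0, 0]
```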
class mlir.dialects.x86vector.MaskCompressOpAdaptor(operands: list, attributes: OpAttributeMap)
class mlir.dialects.x86vector.MaskCompressOpAdaptor(operands: list, opview: OpView)

Bases: _ods_ir

OPERATION_NAME = 'x86vector.avx512.mask.compress'
k() _ods_ir[_ods_ir]
a() _ods_ir[_ods_ir]
src() _ods_ir[_ods_ir] | None
constant_src() _ods_ir | None
mlir.dialects.x86vector.avx512_mask_compress(k, a, *, src=None, constant_src=None, results=None, loc=None, ip=None) _ods_ir
class mlir.dialects.x86vector.MaskRndScaleOp(src, k, a, imm, rounding, *, results=None, loc=None, ip=None)

Bases: _ods_ir

The mask.rndscale op is an AVX512 specific op that can lower to the proper LLVMAVX512 operation: llvm.mask.rndscale.ps.512 or llvm.mask.rndscale.pd.512 depending on the type of MLIR vectors it is applied to.

From the Intel Intrinsics Guide:

Round packed floating-point elements in a to the number of fraction bits specified by imm, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).

OPERATION_NAME = 'x86vector.avx512.mask.rndscale'
_ODS_REGIONS = (0, True)
src() _ods_ir[_ods_ir]
k() _ods_ir[_ods_ir]
a() _ods_ir[_ods_ir]
imm() _ods_ir
rounding() _ods_ir[_ods_ir]
dst() _ods_ir[_ods_ir]
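The rounding-with-writemask behavior above can be sketched in plain Python. This is illustrative, not the MLIR builder API: each active element is rounded to imm fraction bits (scale up, round, scale back), inactive lanes are copied from src, and the op's rounding operand is fixed here to round-to-nearest for simplicity.

```python
# Sketch of the x86vector.avx512.mask.rndscale semantics (illustrative,
# names hypothetical); k is modeled as a list of 0/1 mask bits.
def mask_rndscale(src, k, a, imm):
    scale = 2.0 ** imm  # imm = number of fraction bits to keep
    return [round(x * scale) / scale if bit else s
            for s, bit, x in zip(src, k, a)]

# Round to 1 fraction bit (multiples of 0.5); lane 1 is masked off,
# so it is copied from src.
print(mask_rndscale([9.0, 9.0], [1, 0], [1.26, 1.26], 1))  # [1.5, 9.0]
```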
class mlir.dialects.x86vector.MaskRndScaleOpAdaptor(operands: list, attributes: OpAttributeMap)
class mlir.dialects.x86vector.MaskRndScaleOpAdaptor(operands: list, opview: OpView)

Bases: _ods_ir

OPERATION_NAME = 'x86vector.avx512.mask.rndscale'
src() _ods_ir[_ods_ir]
k() _ods_ir[_ods_ir]
a() _ods_ir[_ods_ir]
imm() _ods_ir
rounding() _ods_ir[_ods_ir]
mlir.dialects.x86vector.avx512_mask_rndscale(src, k, a, imm, rounding, *, results=None, loc=None, ip=None) _ods_ir
class mlir.dialects.x86vector.MaskScaleFOp(src, a, b, k, rounding, *, results=None, loc=None, ip=None)

Bases: _ods_ir

The mask.scalef op is an AVX512 specific op that can lower to the proper LLVMAVX512 operation: llvm.mask.scalef.ps.512 or llvm.mask.scalef.pd.512 depending on the type of MLIR vectors it is applied to.

From the Intel Intrinsics Guide:

Scale the packed floating-point elements in a using values from b, and store the results in dst using writemask k (elements are copied from src when the corresponding mask bit is not set).

OPERATION_NAME = 'x86vector.avx512.mask.scalef'
_ODS_REGIONS = (0, True)
src() _ods_ir[_ods_ir]
a() _ods_ir[_ods_ir]
b() _ods_ir[_ods_ir]
k() _ods_ir
rounding() _ods_ir[_ods_ir]
dst() _ods_ir[_ods_ir]
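The scaling behavior above can be sketched in plain Python. This is illustrative, not the MLIR builder API: the underlying vscalefps/vscalefpd compute a * 2^floor(b) per lane, with inactive lanes copied from src; the rounding operand is ignored in this sketch.

```python
import math

# Sketch of the x86vector.avx512.mask.scalef semantics (illustrative,
# names hypothetical); k is modeled as a list of 0/1 mask bits.
def mask_scalef(src, a, b, k):
    return [x * 2.0 ** math.floor(e) if bit else s
            for s, x, e, bit in zip(src, a, b, k)]

# Lane 0: 3.0 * 2**floor(2.5) = 12.0; lane 1 is masked, copied from src.
print(mask_scalef([9.0, 9.0], [3.0, 3.0], [2.5, 2.5], [1, 0]))  # [12.0, 9.0]
```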
class mlir.dialects.x86vector.MaskScaleFOpAdaptor(operands: list, attributes: OpAttributeMap)
class mlir.dialects.x86vector.MaskScaleFOpAdaptor(operands: list, opview: OpView)

Bases: _ods_ir

OPERATION_NAME = 'x86vector.avx512.mask.scalef'
src() _ods_ir[_ods_ir]
a() _ods_ir[_ods_ir]
b() _ods_ir[_ods_ir]
k() _ods_ir
rounding() _ods_ir[_ods_ir]
mlir.dialects.x86vector.avx512_mask_scalef(src, a, b, k, rounding, *, results=None, loc=None, ip=None) _ods_ir
class mlir.dialects.x86vector.RsqrtOp(a, *, results=None, loc=None, ip=None)

Bases: _ods_ir

OPERATION_NAME = 'x86vector.avx.rsqrt'
_ODS_REGIONS = (0, True)
a() _ods_ir[_ods_ir]
b() _ods_ir[_ods_ir]
class mlir.dialects.x86vector.RsqrtOpAdaptor(operands: list, attributes: OpAttributeMap)
class mlir.dialects.x86vector.RsqrtOpAdaptor(operands: list, opview: OpView)

Bases: _ods_ir

OPERATION_NAME = 'x86vector.avx.rsqrt'
a() _ods_ir[_ods_ir]
mlir.dialects.x86vector.avx_rsqrt(a, *, results=None, loc=None, ip=None) _ods_ir
class mlir.dialects.x86vector.Vp2IntersectOp(a, b, *, results=None, loc=None, ip=None)

Bases: _ods_ir

The vp2intersect op is an AVX512 specific op that can lower to the proper LLVMAVX512 operation: llvm.vp2intersect.d.512 or llvm.vp2intersect.q.512 depending on the type of MLIR vectors it is applied to.

From the Intel Intrinsics Guide:

Compute intersection of packed integer vectors a and b, and store indication of match in the corresponding bit of two mask registers specified by k1 and k2. A match in corresponding elements of a and b is indicated by a set bit in the corresponding bit of the mask registers.

OPERATION_NAME = 'x86vector.avx512.vp2intersect'
_ODS_REGIONS = (0, True)
a() _ods_ir[_ods_ir]
b() _ods_ir[_ods_ir]
k1() _ods_ir[_ods_ir]
k2() _ods_ir[_ods_ir]
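The intersection-mask semantics above can be sketched in plain Python. This is illustrative, not the MLIR builder API: bit i of k1 is set when a[i] occurs anywhere in b, and bit j of k2 is set when b[j] occurs anywhere in a.

```python
# Sketch of the x86vector.avx512.vp2intersect semantics (illustrative,
# names hypothetical); masks are modeled as lists of 0/1 bits.
def vp2intersect(a, b):
    k1 = [1 if x in b else 0 for x in a]  # a[i] matches some element of b
    k2 = [1 if y in a else 0 for y in b]  # b[j] matches some element of a
    return k1, k2

print(vp2intersect([1, 2, 3, 4], [3, 4, 5, 6]))
# ([0, 0, 1, 1], [1, 1, 0, 0])
```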
class mlir.dialects.x86vector.Vp2IntersectOpAdaptor(operands: list, attributes: OpAttributeMap)
class mlir.dialects.x86vector.Vp2IntersectOpAdaptor(operands: list, opview: OpView)

Bases: _ods_ir

OPERATION_NAME = 'x86vector.avx512.vp2intersect'
a() _ods_ir[_ods_ir]
b() _ods_ir[_ods_ir]
mlir.dialects.x86vector.avx512_vp2intersect(a, b, *, results=None, loc=None, ip=None) _ods_ir