MLIR 23.0.0git
Namespaces | |
| namespace | impl |
| namespace | uArch |
Classes | |
| struct | UnrollOptions |
| Options to control the XeGPU unrolling. More... | |
| struct | XeGPUPropagateLayoutOptions |
Enumerations | |
| enum class | LayoutKind { Lane , InstData , Subgroup } |
| Specifies the level of a layout hierarchy for comparison or propagation. More... | |
Functions | |
| void | registerTransformDialectExtension (DialectRegistry &registry) |
| std::unique_ptr<::mlir::Pass > | createXeGPUBlocking () |
| std::unique_ptr<::mlir::Pass > | createXeGPUPeepHoleOptimizer () |
| std::unique_ptr<::mlir::Pass > | createXeGPUPropagateLayout () |
| std::unique_ptr<::mlir::Pass > | createXeGPUPropagateLayout (XeGPUPropagateLayoutOptions options) |
| std::unique_ptr<::mlir::Pass > | createXeGPUSgToWiDistributeExperimental () |
| std::unique_ptr<::mlir::Pass > | createXeGPUSubgroupDistribute () |
| std::unique_ptr<::mlir::Pass > | createXeGPUVectorLinearize () |
| std::unique_ptr<::mlir::Pass > | createXeGPUWgToSgDistribute () |
| void | registerXeGPUBlocking () |
| void | registerXeGPUBlockingPass () |
| void | registerXeGPUPeepHoleOptimizer () |
| void | registerXeGPUPeepHoleOptimizerPass () |
| void | registerXeGPUPropagateLayout () |
| void | registerXeGPUPropagateLayoutPass () |
| void | registerXeGPUSgToWiDistributeExperimental () |
| void | registerXeGPUSgToWiDistributeExperimentalPass () |
| void | registerXeGPUSubgroupDistribute () |
| void | registerXeGPUSubgroupDistributePass () |
| void | registerXeGPUVectorLinearize () |
| void | registerXeGPUVectorLinearizePass () |
| void | registerXeGPUWgToSgDistribute () |
| void | registerXeGPUWgToSgDistributePass () |
| void | registerXeGPUPasses () |
| void | populateXeGPUPeepHoleOptimizerPatterns (RewritePatternSet &patterns) |
| Appends patterns for optimizing block load operations into patterns. | |
| void | populateXeGPUArrayLengthOptimizationPatterns (RewritePatternSet &patterns) |
| Appends patterns for array length optimization into patterns. | |
| void | populateXeGPUSubgroupDistributePatterns (RewritePatternSet &patterns) |
| Appends patterns for XeGPU SIMT distribution into patterns. | |
| void | populateXeGPUMoveFuncBodyToWarpOpPatterns (RewritePatternSet &patterns) |
| Appends patterns for moving function body into gpu.warp_execute_on_lane0 op. | |
| void | populateXeGPUWgToSgDistributePatterns (RewritePatternSet &patterns) |
| Appends patterns for XeGPU workgroup to subgroup distribution into patterns. | |
| void | populateXeGPUSgToWiDistributeTypeConversions (TypeConverter &typeConverter) |
| Defines only the type conversions needed for XeGPU subgroup to workitem distribution. | |
| void | populateXeGPUSgToWiDistributeTypeConversionAndLegality (TypeConverter &typeConverter, RewritePatternSet &patterns, ConversionTarget &target) |
| Defines type conversions and legality for XeGPU subgroup to workitem distribution and appends the required conversion patterns into patterns. | |
| void | populateXeGPUUnrollPatterns (RewritePatternSet &patterns, const UnrollOptions &options) |
| Collect a set of patterns to unroll xegpu operations into smaller shapes. | |
| LogicalResult | propagateLayouts (OpBuilder &builder, Operation *target, LayoutKind layoutKind, unsigned indexBitWidth, bool printOnly=false) |
| LogicalResult | resolveLayoutConflicts (Operation *target) |
| bool | recoverTemporaryLayouts (Operation *rootOp) |
| Attach layout attributes to all vector-type operands of operations within the given operation's nested region. | |
| template<typename T, typename = std::enable_if_t<std::is_same_v<T, OpOperand> || std::is_same_v<T, OpResult>>> | |
| void | removeLayoutAttr (const T &operandOrResult) |
| Removes the LayoutAttr for a given OpOperand or OpResult if it exists. | |
| void | removeLayoutAttrs (Operation *op) |
| Removes the DistributeLayoutAttr for each OpOperand and OpResult of the given operation if they exist. | |
| void | removeTemporaryLayoutAttrs (Operation *op) |
| Removes the temporary layout attributes for each OpOperand and OpResult of the given operation. | |
| SmallVector< NamedAttribute > | dropSgLayoutAndDataOnAttrs (ArrayRef< NamedAttribute > attrs) |
| Updates the NamedAttribute sequence by dropping sg-layout and sg-data information from any DistributeLayoutAttr found. | |
| SmallVector< NamedAttribute > | dropInstDataOnAttrs (ArrayRef< NamedAttribute > attrs) |
| Updates the NamedAttribute sequence by dropping inst-data information from any DistributeLayoutAttr found. | |
| DistributeLayoutAttr | inferBroadcastSourceLayout (DistributeLayoutAttr resLayout, ArrayRef< int64_t > resShape, ArrayRef< int64_t > srcShape) |
| Infers the source layout attribute for a broadcast operation given the result layout attribute, result shape, and source shape. | |
| DistributeLayoutAttr | inferMultiReductionSourceLayout (DistributeLayoutAttr resLayout, SmallVector< int64_t > reduceDims) |
| Infers the source layout attribute for a reduction operation given the result layout attribute and reduced dims. | |
| DistributeLayoutAttr | inferReductionSourceLayout (DistributeLayoutAttr resLayout) |
| Infers the source layout attribute for a reduction operation given the result layout attribute. | |
| DistributeLayoutAttr | inferTransposeSourceLayout (DistributeLayoutAttr resLayout, ArrayRef< int64_t > permutation) |
| Infers the source layout attribute for a transpose operation given the result layout attribute and permutation. | |
| DistributeLayoutAttr | inferBitCastSourceLayout (DistributeLayoutAttr resLayout, int resElemTyBitWidth, int srcElemTyBitWidth) |
| Infers the source layout attribute for a bitcast operation given the result layout attribute, result element type bitwidth, and source element type bitwidth. | |
| DistributeLayoutAttr | inferInterleaveSourceLayout (DistributeLayoutAttr resLayout) |
| Infers the source layout attribute for an interleave operation given the result layout attribute. | |
| DistributeLayoutAttr | inferDeinterleaveSourceLayout (DistributeLayoutAttr resLayout) |
| Infers the source layout attribute for a deinterleave operation given the result layout attribute. | |
| DistributeLayoutAttr | inferShapeCastSourceLayout (DistributeLayoutAttr resLayout, ArrayRef< int64_t > resShape, ArrayRef< int64_t > srcShape) |
| Infers the source layout attribute for a shape cast operation given the result layout attribute, result shape, and source shape. | |
| DistributeLayoutAttr | inferInsertStridedSliceSourceLayout (DistributeLayoutAttr resLayout, ArrayRef< int64_t > resShape, ArrayRef< int64_t > srcShape) |
| Infers the source layout attribute for an insert strided slice operation given the result layout attribute, result shape, and source shape. | |
| DistributeLayoutAttr | inferInsertSourceLayout (DistributeLayoutAttr resLayout, ArrayRef< int64_t > resShape, ArrayRef< int64_t > srcShape) |
| Infers the source layout attribute for an insert operation. | |
| DistributeLayoutAttr | inferExtractSourceLayout (DistributeLayoutAttr resLayout, ArrayRef< int64_t > resShape, ArrayRef< int64_t > srcShape) |
| Infers the source layout attribute for an extract operation. | |
| DistributeLayoutAttr | inferMaskOffsetLayoutForScatterIO (DistributeLayoutAttr payloadLayout, int chunkSize) |
| Infers the layout attribute for the mask and offset operands of chunked load and store operations, given the anchor layout attribute of the value being loaded or stored. | |
| DistributeLayoutAttr | inferSourceLayoutFromResultForNonAnchorOp (OpOperand &operand, DistributeLayoutAttr resLayout) |
| Infers the source layout attribute for an operand using result layout attribute. | |
| SliceAttr | setupMultiReductionResultLayout (LayoutKind layoutKind, VectorType srcVectorTy, DistributeLayoutAttr consumerLayout, SmallVector< int64_t > reductionDims, int numSg, const uArch::uArch *uArch) |
| Sets up layout for Multi-Reduction operations by creating a SliceAttr for the result. | |
| SliceAttr | setupReductionResultLayout (LayoutKind layoutKind, VectorType srcVectorTy, const uArch::uArch *uArch) |
| Sets up layout for Reduction operations by creating a SliceAttr for the result. | |
| DistributeLayoutAttr | setupBitCastResultLayout (LayoutKind layoutKind, VectorType srcVectorTy, VectorType resVectorTy, DistributeLayoutAttr consumerLayout, const uArch::uArch *uArch) |
| Setup the result layout attribute for a bitcast operation based on element type bitwidths. | |
| DistributeLayoutAttr | setupInterleaveResultLayout (LayoutKind layoutKind, VectorType srcVectorTy, VectorType resVectorTy, DistributeLayoutAttr consumerLayout, const uArch::uArch *uArch) |
| Sets up the result layout for an interleave operation to ensure the source layout can be safely derived. | |
| DistributeLayoutAttr | setupInsertStridedSliceResultLayout (LayoutKind layoutKind, VectorType srcVectorTy, VectorType resVectorTy, DistributeLayoutAttr consumerLayout, const uArch::uArch *uArch) |
| Sets up the result layout for an insert strided slice operation. | |
| DistributeLayoutAttr | setupLoadGatherAnchorLayout (LayoutKind layoutKind, VectorType vectorTy, int chunkSize, DistributeLayoutAttr consumerLayout, const uArch::uArch *uArch) |
| Sets up the anchor layout for a load gather operation. | |
| DistributeLayoutAttr | setupLoadMatrixAnchorLayout (LayoutKind layoutKind, VectorType vectorTy, DistributeLayoutAttr consumerLayout, const uArch::uArch *uArch) |
| Sets up the anchor layout for load matrix operation. | |
| DistributeLayoutAttr | setupStoreScatterAnchorLayout (LayoutKind layoutKind, VectorType vectorTy, int chunkSize, const uArch::uArch *uArch) |
| Sets up the anchor layout for a store scatter operation. | |
| DistributeLayoutAttr | setupStoreMatrixAnchorLayout (LayoutKind layoutKind, VectorType vectorTy, const uArch::uArch *uArch) |
| Sets up the anchor layout for a store matrix operation. | |
| std::optional< std::tuple< DistributeLayoutAttr, DistributeLayoutAttr, DistributeLayoutAttr > > | setupDpasLayout (LayoutKind layoutKind, VectorType aTy, VectorType bTy, VectorType cdTy, DistributeLayoutAttr consumerLayout, int numSg, const uArch::uArch *uArch) |
| Sets up the anchor layouts for the dpas operands (A, B, and C/D). | |
| std::optional< std::tuple< DistributeLayoutAttr, DistributeLayoutAttr, DistributeLayoutAttr, DistributeLayoutAttr, DistributeLayoutAttr > > | setupDpasMxLayout (LayoutKind layoutKind, VectorType aTy, VectorType bTy, VectorType cdTy, VectorType aScaleTy, VectorType bScaleTy, DistributeLayoutAttr consumerLayout, int numSg, const uArch::uArch *uArch) |
| Sets up the anchor layouts for dpas_mx operands (A, B, C/D, A_scale, and B_scale). | |
| DistributeLayoutAttr | getConsumerLayoutAt (OpOperand &operand) |
| Gets the expected layout for a given consumer operand. | |
| SmallVector< Value > | flattenValues (ArrayRef< ValueRange > values) |
| Flatten a set of ValueRange into a single SmallVector<Value> | |
| FailureOr< VectorType > | getDistributedVectorType (xegpu::TensorDescType tdescTy) |
| If the tensor descriptor has a layout attribute, it is used in SIMT mode. | |
| FailureOr< VectorType > | getDistributedVectorType (VectorType originalType, LayoutAttr layout) |
| Helper to get the distributed vector type for a given vector type according to a given LayoutAttr. | |
| FailureOr< VectorType > | getDistVecTypeBasedOnLaneLayout (DistributeLayoutAttr layout, VectorType originalType) |
| Helper function to get distributed vector type for a source vector type according to the lane_layout. | |
| SmallVector< Value > | extractVectorsWithShapeFromValue (OpBuilder &builder, Location loc, Value value, ArrayRef< int64_t > shape) |
| Extract a set of small vectors from a value with a given shape using vector.extract_strided_slice. | |
| Value | createVectorWithShapeFromValues (OpBuilder &builder, Location loc, ValueRange values, ArrayRef< int64_t > shape) |
| Create a vector of shape from a set of values using vector.insert_strided_slice. | |
| void | doSCFStructuralTypeConversionWithTensorType (Operation *op, TypeConverter converter) |
| Do type conversion for SCF structural ops, e.g., scf.for, using SCF structural type conversion patterns. | |
| std::optional< std::string > | getChipStr (Operation *op) |
| Retrieves the chip string from the XeVM target attribute of the parent GPU module operation. | |
| SmallVector< OpFoldResult > | addElementwise (OpBuilder &builder, Location loc, ArrayRef< OpFoldResult > lhs, ArrayRef< OpFoldResult > rhs) |
| Generates element-wise addition ops of two arrays with same length. | |
| SmallVector< OpFoldResult > | addWithRightAligned (OpBuilder &builder, Location loc, ArrayRef< OpFoldResult > lhs, ArrayRef< OpFoldResult > rhs) |
| Generates element-wise addition ops of two arrays with automatic alignment. | |
| Value | subgroupReduction (Location loc, OpBuilder &builder, Value input, vector::CombiningKind kind, uint32_t size) |
| Given an input value representing per-lane data, this function returns the result after performing a reduction on the input over all lanes (number of lanes given by size). | |
| Value | lowerToVectorReductions (TypedValue< VectorType > src, TypedValue< VectorType > acc, vector::CombiningKind kind, int64_t reductionDim, Location loc, PatternRewriter &rewriter) |
| Given src and acc arguments from a vector::MultiDimReductionOp, lowers to a set of vector::ReductionOp ops over 1D slices extracted from src. | |
| Value | createReductionNeutralValue (OpBuilder &builder, Location loc, Type type, vector::CombiningKind kind) |
| Creates a constant filled with the neutral (identity) value for the given reduction kind. | |
| Value | lowerCrossLaneReductionToShuffles (TypedValue< VectorType > src, TypedValue< VectorType > acc, vector::CombiningKind kind, int64_t reductionDim, int64_t reductionSize, Location loc, PatternRewriter &rewriter) |
| Lowers cross-lane reductions to shuffle operations on a 2D vector. | |
| template<typename T> | |
| int | getLargestDivisor (T dim, ArrayRef< T > candidates, ArrayRef< T > candidateMultiples={}) |
| Helper function to find a proper instruction multiple for the user-supplied sg-level data shape (given by dim). | |
| DistributeLayoutAttr | getDistributeLayoutAttr (const Value value) |
| Retrieves the DistributeLayoutAttr associated with a given Value. | |
| DistributeLayoutAttr | getDistributeLayoutAttr (const OpOperand &opr) |
| Retrieves the DistributeLayoutAttr associated with a given OpOperand. | |
| void | setDistributeLayoutAttr (const OpResult &Result, const DistributeLayoutAttr layout) |
| [to-be-deprecated] Sets the DistributeLayoutAttr for a given OpResult; users should use setAnchorLayout instead. | |
| void | setDistributeLayoutAttr (const OpOperand &opr, const DistributeLayoutAttr layout) |
| [to-be-deprecated] Sets the DistributeLayoutAttr for a given OpOperand; users should use setAnchorLayout instead. | |
| std::string | getTemporaryLayoutName (const OpOperand &operand) |
| Return the attribute name for the OpOperand to attach DistributeLayoutAttr. | |
| std::string | getTemporaryLayoutName (const OpResult result) |
| Return the attribute name for the OpResult to attach DistributeLayoutAttr. | |
| template<typename T, typename = std::enable_if_t<std::is_same_v<T, OpOperand> || std::is_same_v<T, OpResult>>> | |
| DistributeLayoutAttr | getTemporaryLayout (const T &operandOrResult) |
| Get and set the distribute layout attribute for non-anchor operations (and for offsets/masks of load/store ops, until their temporary attributes are removed). | |
| template<typename T, typename = std::enable_if_t<std::is_same_v<T, OpOperand> || std::is_same_v<T, OpResult>>> | |
| void | setTemporaryLayout (const T &operandOrResult, const DistributeLayoutAttr layout) |
| bool | requirePacked (const DistributeLayoutAttr layout) |
| Helper function to check if the layout is packed. | |
| bool | requireTranspose (const DistributeLayoutAttr layout, const uArch::uArch *uArch) |
| Helper function to check if the layout requires a transpose effect. | |
| bool | matchUnitDimExpansion (ArrayRef< int64_t > src, ArrayRef< int64_t > dst, SmallVector< int64_t > &expandedUnitDims) |
| bool | matchSplitDimExpansion (ArrayRef< int64_t > src, ArrayRef< int64_t > dst, SmallVector< SmallVector< int64_t > > &splitDimGroups) |
| static SmallVector< SmallVector< Value > > | genCoordinates (OpBuilder &builder, Location loc, SmallVector< Value > delinearizedId, ArrayRef< int64_t > subShapesLayout, ArrayRef< int64_t > subShape, ArrayRef< int64_t > srcShape) |
| static SmallVector< SmallVector< int64_t > > | genStaticCoordinates (llvm::ArrayRef< int64_t > canonicalIds, llvm::ArrayRef< int64_t > layout, llvm::ArrayRef< int64_t > subShape, llvm::ArrayRef< int64_t > shape) |
| static SmallVector< int64_t > | mapSlicedDimsToParentSpace (const SmallVector< int64_t > &dimsToMap, ArrayRef< int64_t > sliceDims) |
| SmallVector< int64_t > | getPermForParentLayout (ArrayRef< int64_t > sliceDims, ArrayRef< int64_t > permutation) |
| template<typename ArithOp> | |
| OpFoldResult | genBinOp (OpFoldResult a, OpFoldResult b, Location loc, OpBuilder &builder) |
| SmallVector< OpFoldResult > | getBlockedOffsets (OpBuilder &builder, Location loc, ArrayRef< OpFoldResult > offsets, ArrayRef< int64_t > blockShape) |
|
| SmallVector< OpFoldResult > mlir::xegpu::addElementwise | ( | OpBuilder & | builder, |
| Location | loc, | ||
| ArrayRef< OpFoldResult > | lhs, | ||
| ArrayRef< OpFoldResult > | rhs ) |
Generates element-wise addition ops of two arrays with same length.
Definition at line 626 of file XeGPUUtils.cpp.
References mlir::OpBuilder::createOrFold(), mlir::getValueOrCreateConstantIndexOp(), lhs, and rhs.
Referenced by addWithRightAligned().
| SmallVector< OpFoldResult > mlir::xegpu::addWithRightAligned | ( | OpBuilder & | builder, |
| Location | loc, | ||
| ArrayRef< OpFoldResult > | lhs, | ||
| ArrayRef< OpFoldResult > | rhs ) |
Generates element-wise addition ops of two arrays with automatic alignment.
When the input arrays have different sizes, the shorter array is right-aligned with the longer array, and the unmatched leading elements from the longer array are preserved unchanged. This is commonly used for offset computation where higher-dimensional offsets need to be added to lower-dimensional adjustments.
Example: lhs = [l1, l2, l3], rhs = [r1, r2]; Result: [l1, l2+r1, l3+r2]
Definition at line 651 of file XeGPUUtils.cpp.
References addElementwise(), b, lhs, and rhs.
| Value mlir::xegpu::createReductionNeutralValue | ( | OpBuilder & | builder, |
| Location | loc, | ||
| Type | type, | ||
| vector::CombiningKind | kind ) |
Creates a constant filled with the neutral (identity) value for the given reduction kind.
For example: 0 for ADD/OR/XOR, 1 for MUL/AND, max/min signed/unsigned int for MINSI/MINUI/MAXSI/MAXUI, and +/-infinity for float min/max operations. If type is a VectorType, returns a splat vector constant; otherwise returns a scalar constant. Returns nullptr if the element type is incompatible with the requested reduction kind.
Definition at line 844 of file XeGPUUtils.cpp.
References mlir::DenseElementsAttr::get(), mlir::Builder::getFloatAttr(), mlir::Builder::getIntegerAttr(), mlir::Builder::getOneAttr(), and mlir::Builder::getZeroAttr().
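The neutral-element rules above can be illustrated with a standalone sketch. The enum here is a hypothetical stand-in for vector::CombiningKind (names chosen for illustration), and plain doubles stand in for the MLIR constant attributes the real helper builds:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for vector::CombiningKind, illustration only.
enum class Kind { Add, Mul, MinSI, MaxSI, MinF, MaxF };

// Neutral (identity) element per reduction kind: combining any value x
// with the neutral element leaves x unchanged.
double neutralValue(Kind kind) {
  switch (kind) {
  case Kind::Add:   return 0.0;                                      // x + 0 == x
  case Kind::Mul:   return 1.0;                                      // x * 1 == x
  case Kind::MinSI: return std::numeric_limits<int32_t>::max();      // min(x, INT_MAX) == x
  case Kind::MaxSI: return std::numeric_limits<int32_t>::min();      // max(x, INT_MIN) == x
  case Kind::MinF:  return std::numeric_limits<double>::infinity();  // min(x, +inf) == x
  case Kind::MaxF:  return -std::numeric_limits<double>::infinity(); // max(x, -inf) == x
  }
  return 0.0;
}
```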
| Value mlir::xegpu::createVectorWithShapeFromValues | ( | OpBuilder & | builder, |
| Location | loc, | ||
| ValueRange | values, | ||
| ArrayRef< int64_t > | shape ) |
Create a vector of shape from a set of values using vector.insert_strided_slice.
Definition at line 435 of file XeGPUUtils.cpp.
References mlir::DenseElementsAttr::get(), mlir::getType(), mlir::ValueRange::getTypes(), mlir::Builder::getZeroAttr(), and result.
| std::unique_ptr<::mlir::Pass > mlir::xegpu::createXeGPUBlocking | ( | ) |
We declare an explicit private instantiation because Pass classes should only be visible to the current library.
Definition at line 88 of file XeGPUBlocking.cpp.
References getTileShape().
| std::unique_ptr<::mlir::Pass > mlir::xegpu::createXeGPUPeepHoleOptimizer | ( | ) |
We declare an explicit private instantiation because Pass classes should only be visible to the current library.
Definition at line 165 of file XeGPUPeepHoleOptimizer.cpp.
References mlir::arith::ConstantIndexOp::create(), and mlir::getConstantIntValue().
| std::unique_ptr<::mlir::Pass > mlir::xegpu::createXeGPUPropagateLayout | ( | ) |
Definition at line 264 of file XeGPUPropagateLayout.cpp.
| std::unique_ptr<::mlir::Pass > mlir::xegpu::createXeGPUPropagateLayout | ( | XeGPUPropagateLayoutOptions | options | ) |
Definition at line 268 of file XeGPUPropagateLayout.cpp.
| std::unique_ptr<::mlir::Pass > mlir::xegpu::createXeGPUSgToWiDistributeExperimental | ( | ) |
We declare an explicit private instantiation because Pass classes should only be visible to the current library.
Definition at line 346 of file XeGPUSgToWiDistributeExperimental.cpp.
| std::unique_ptr<::mlir::Pass > mlir::xegpu::createXeGPUSubgroupDistribute | ( | ) |
We declare an explicit private instantiation because Pass classes should only be visible to the current library.
Definition at line 424 of file XeGPUSubgroupDistribute.cpp.
References mlir::detail::DenseArrayAttrImpl< int64_t >::get(), mlir::IROperand< DerivedT, IRValueT >::get(), mlir::vector::getAsValues(), getChipStr(), mlir::Builder::getContext(), mlir::Value::getDefiningOp(), getDistributedVectorType(), mlir::OpOperand::getOperandNumber(), mlir::Value::getType(), mlir::xegpu::uArch::getUArch(), mlir::RewriterBase::notifyMatchFailure(), removeLayoutAttrs(), mlir::RewriterBase::replaceAllUsesWith(), requirePacked(), requireTranspose(), mlir::OpBuilder::setInsertionPointAfter(), and success().
| std::unique_ptr<::mlir::Pass > mlir::xegpu::createXeGPUVectorLinearize | ( | ) |
We declare an explicit private instantiation because Pass classes should only be visible to the current library.
Definition at line 503 of file XeGPUVectorLinearize.cpp.
| std::unique_ptr<::mlir::Pass > mlir::xegpu::createXeGPUWgToSgDistribute | ( | ) |
We declare an explicit private instantiation because Pass classes should only be visible to the current library.
Definition at line 583 of file XeGPUWgToSgDistribute.cpp.
| void mlir::xegpu::doSCFStructuralTypeConversionWithTensorType | ( | Operation * | op, |
| TypeConverter | converter ) |
Do type conversion for SCF structural ops, e.g., scf.for, using SCF structural type conversion patterns.
Since VectorType cannot carry the layout attribute, which is needed to guide the type conversion for XeGPU, vectors are first converted into RankedTensorType, where the layout attribute can be attached; upstream SCF structural type conversion patterns are then applied with the provided converter. TODO: This is a temporary solution. We should refactor it when context-aware type conversion is available.
Definition at line 460 of file XeGPUUtils.cpp.
References mlir::WalkResult::advance(), flattenValues(), mlir::Operation::getContext(), getDistributeLayoutAttr(), mlir::Operation::getOperandTypes(), mlir::Operation::getOpResults(), mlir::Operation::getParentOp(), mlir::Operation::getResultTypes(), mlir::Value::getType(), mlir::scf::populateSCFStructuralTypeConversionsAndLegality(), result, mlir::Value::setType(), mlir::WalkResult::skip(), success(), target, and mlir::Operation::walk().
| SmallVector< NamedAttribute > mlir::xegpu::dropInstDataOnAttrs | ( | ArrayRef< NamedAttribute > | attrs | ) |
Updates the NamedAttribute sequence by dropping inst-data information from any DistributeLayoutAttr found.
Definition at line 55 of file XeGPULayoutImpl.cpp.
| SmallVector< NamedAttribute > mlir::xegpu::dropSgLayoutAndDataOnAttrs | ( | ArrayRef< NamedAttribute > | attrs | ) |
Updates the NamedAttribute sequence by dropping sg-layout and sg-data information from any DistributeLayoutAttr found.
Definition at line 37 of file XeGPULayoutImpl.cpp.
| SmallVector< Value > mlir::xegpu::extractVectorsWithShapeFromValue | ( | OpBuilder & | builder, |
| Location | loc, | ||
| Value | value, | ||
| ArrayRef< int64_t > | shape ) |
Extract a set of small vectors from a value with a given shape using vector.extract_strided_slice.
Definition at line 398 of file XeGPUUtils.cpp.
References mlir::computeShapeRatio(), mlir::Value::getType(), and result.
| SmallVector< Value > mlir::xegpu::flattenValues | ( | ArrayRef< ValueRange > | values | ) |
Flatten a set of ValueRange into a single SmallVector<Value>
convert ArrayRef<ValueRange> into SmallVector<Value>
Definition at line 33 of file XeGPUUtils.cpp.
References result.
Referenced by doSCFStructuralTypeConversionWithTensorType().
| OpFoldResult mlir::xegpu::genBinOp | ( | OpFoldResult | a, |
| OpFoldResult | b, | ||
| Location | loc, | ||
| OpBuilder & | builder ) |
Definition at line 1461 of file XeGPUDialect.cpp.
References b, and mlir::getValueOrCreateConstantIndexOp().
| static SmallVector< SmallVector< Value > > mlir::xegpu::genCoordinates | ( | OpBuilder &builder, Location loc, SmallVector< Value > delinearizedId, ArrayRef< int64_t > subShapesLayout, ArrayRef< int64_t > subShape, ArrayRef< int64_t > srcShape | ) |
Definition at line 53 of file XeGPUDialect.cpp.
References mlir::computeElementwiseMul(), mlir::arith::ConstantIndexOp::create(), and mlir::OpBuilder::createOrFold().
| static SmallVector< SmallVector< int64_t > > mlir::xegpu::genStaticCoordinates | ( | llvm::ArrayRef< int64_t > canonicalIds, llvm::ArrayRef< int64_t > layout, llvm::ArrayRef< int64_t > subShape, llvm::ArrayRef< int64_t > shape | ) |
Definition at line 101 of file XeGPUDialect.cpp.
| SmallVector< OpFoldResult > mlir::xegpu::getBlockedOffsets | ( | OpBuilder & | builder, |
| Location | loc, | ||
| ArrayRef< OpFoldResult > | offsets, | ||
| ArrayRef< int64_t > | blockShape ) |
Definition at line 1486 of file XeGPUDialect.cpp.
| std::optional< std::string > mlir::xegpu::getChipStr | ( | Operation * | op | ) |
Retrieves the chip string from the XeVM target attribute of the parent GPU module operation.
Returns the chip identifier if found, or nullopt if no GPU module parent or XeVM target attribute exists.
Definition at line 607 of file XeGPUUtils.cpp.
References mlir::Operation::getParentOfType().
Referenced by createXeGPUSubgroupDistribute().
| xegpu::DistributeLayoutAttr mlir::xegpu::getConsumerLayoutAt | ( | OpOperand & | operand | ) |
Gets the expected layout for a given consumer operand.
This will check if the owning operation of the consumer operand is one of the special layout users and determine the expected layout accordingly.
Definition at line 1903 of file XeGPULayoutImpl.cpp.
References getDistributeLayoutAttr(), mlir::Operation::getNumResults(), mlir::detail::IROperandBase::getOwner(), mlir::Operation::getResult(), and inferSourceLayoutFromResultForNonAnchorOp().
| FailureOr< VectorType > mlir::xegpu::getDistributedVectorType | ( | VectorType | originalType, |
| LayoutAttr | layout ) |
Helper to get the distributed vector type for a given vector type according to a given LayoutAttr.
| FailureOr< VectorType > mlir::xegpu::getDistributedVectorType | ( | xegpu::TensorDescType | tdescTy | ) |
If the tensor descriptor has a layout attribute, it is used in SIMT mode.
In this mode, the distributed vector shape is determined as follows. Definitions:
lane_data_size = lane_data[0] × lane_data[1]
subgroup_size = lane_layout[0] × lane_layout[1]
distribution_unit_size = subgroup_size × lane_data_size
Case 1: Regular loads/stores. The following conditions must be met:
Case 2: Block loads/stores. Additional definitions:
tensor_size = tensor_desc[0] * .. * tensor_desc[r-1] * array_length
n_distribution_units = tensor_size / distribution_unit_size
fragment_size = n_distribution_units * lane_data_size
Given the above definitions, the following conditions must be met:
Definition at line 41 of file XeGPUUtils.cpp.
Referenced by createXeGPUSubgroupDistribute().
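A worked instance of the Case 2 arithmetic, using illustrative numbers not taken from the documentation (a 32x16 tensor descriptor, lane_layout = [1, 16], lane_data = [1, 1], array_length = 1):

```cpp
#include <cstdint>

// Computes fragment_size for the block (Case 2) path, following the
// definitions given above for a rank-2 tensor descriptor.
int64_t fragmentSize(int64_t rows, int64_t cols, int64_t arrayLength,
                     const int64_t laneLayout[2], const int64_t laneData[2]) {
  int64_t laneDataSize = laneData[0] * laneData[1];           // e.g. 1
  int64_t subgroupSize = laneLayout[0] * laneLayout[1];       // e.g. 16
  int64_t distributionUnitSize = subgroupSize * laneDataSize; // e.g. 16
  int64_t tensorSize = rows * cols * arrayLength;             // e.g. 512
  int64_t nDistributionUnits = tensorSize / distributionUnitSize;
  return nDistributionUnits * laneDataSize;
}
```

With the numbers above, tensor_size = 512 and distribution_unit_size = 16, so each of the 16 lanes holds a 32-element fragment of the 32x16 tile.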
| xegpu::DistributeLayoutAttr mlir::xegpu::getDistributeLayoutAttr | ( | const OpOperand & | opr | ) |
Retrieves the DistributeLayoutAttr associated with a given OpOperand.
It will first check the operand_layout_{id} of the owner operation. If not found, it will check the operand itself and its defining op.
Definition at line 172 of file XeGPUUtils.cpp.
References mlir::Operation::getAttrOfType(), mlir::detail::IROperandBase::getOwner(), getTemporaryLayoutName(), mlir::Operation::hasAttr(), and load.
| xegpu::DistributeLayoutAttr mlir::xegpu::getDistributeLayoutAttr | ( | const Value | value | ) |
Retrieves the DistributeLayoutAttr associated with a given Value.
For TensorDescType values, the DistributeLayoutAttr is extracted from the TensorDescType itself. For other values, it is obtained from the attributes of the defining operation. Returns nullptr if no DistributeLayoutAttr is found.
Definition at line 135 of file XeGPUUtils.cpp.
References mlir::Operation::getAttrOfType(), getTemporaryLayout(), getTemporaryLayoutName(), mlir::Value::getType(), mlir::Operation::hasAttr(), and result.
Referenced by doSCFStructuralTypeConversionWithTensorType(), getConsumerLayoutAt(), getLayoutFromUsePoints(), populateXeGPUSgToWiDistributeTypeConversionAndLegality(), and populateXeGPUSgToWiDistributeTypeConversions().
| FailureOr< VectorType > mlir::xegpu::getDistVecTypeBasedOnLaneLayout | ( | DistributeLayoutAttr | layout, |
| VectorType | originalType ) |
Helper function to get distributed vector type for a source vector type according to the lane_layout.
We simply divide each dimension of the tensor descriptor shape by the corresponding lane_layout dimension. If array_length > 1, it is appended to the front of the distributed shape.
Examples:
| original vector shape | lane_layout | distributed vector shape |
|---|---|---|
| 32x16 | [1, 16] | 32x1 |
| 32x16 | [2, 8] | 16x2 |
| 2x32x16 | [1, 16] | 2x32x1 |
Referenced by populateXeGPUSgToWiDistributeTypeConversions().
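The rule in the table can be sketched as a plain shape computation (hedged: the real helper returns FailureOr<VectorType> and validates divisibility, which this sketch omits):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Divide each trailing dimension by the matching lane_layout entry;
// leading dimensions (e.g. an array_length dim) pass through unchanged.
std::vector<int64_t> distributedShape(std::vector<int64_t> shape,
                                      const std::vector<int64_t> &laneLayout) {
  const std::size_t offset = shape.size() - laneLayout.size();
  for (std::size_t i = 0; i < laneLayout.size(); ++i)
    shape[offset + i] /= laneLayout[i];
  return shape;
}
```

This reproduces the table rows: 32x16 with [1, 16] gives 32x1; 32x16 with [2, 8] gives 16x2; 2x32x16 with [1, 16] gives 2x32x1.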
| int mlir::xegpu::getLargestDivisor | ( | T | dim, |
| ArrayRef< T > | candidates, | ||
| ArrayRef< T > | candidateMultiples = {} ) |
Helper function to find a proper instruction multiple for the user-supplied sg-level data shape (given by dim).
candidates are the uArch-allowed shapes; candidateMultiples are the uArch multiples of such shapes (i.e., block count or array length).
Definition at line 664 of file XeGPUUtils.cpp.
Referenced by getDpasInstDataVectors().
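A hedged sketch of the search, assuming (as the name suggests, not as the documentation states) that the helper picks the largest candidate that evenly divides dim; the candidateMultiples handling is omitted:

```cpp
#include <cstdint>
#include <vector>

// Return the largest candidate that evenly divides dim, or -1 when
// no candidate divides it.
int64_t largestDivisor(int64_t dim, const std::vector<int64_t> &candidates) {
  int64_t best = -1;
  for (int64_t c : candidates)
    if (c > 0 && dim % c == 0 && c > best)
      best = c; // keep the largest evenly dividing candidate seen so far
  return best;
}
```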
| SmallVector< int64_t > mlir::xegpu::getPermForParentLayout | ( | ArrayRef< int64_t > | sliceDims, |
| ArrayRef< int64_t > | permutation ) |
Definition at line 1145 of file XeGPUDialect.cpp.
| DistributeLayoutAttr mlir::xegpu::getTemporaryLayout | ( | const T & | operandOrResult | ) |
Get and set the distribute layout attribute for non-anchor operations (and for offsets/masks of load/store ops, until their temporary attributes are removed).
Referenced by getDistributeLayoutAttr(), lowerToVectorReductions(), populateXeGPUSgToWiDistributeTypeConversionAndLegality(), propagateLayouts(), xegpu::getTemporaryLayout< mlir::OpOperand >(), and xegpu::getTemporaryLayout< mlir::OpResult >().
| std::string mlir::xegpu::getTemporaryLayoutName | ( | const OpOperand & | operand | ) |
Return the attribute name for the OpOperand to attach DistributeLayoutAttr.
Definition at line 124 of file XeGPUUtils.cpp.
Referenced by getDistributeLayoutAttr(), getDistributeLayoutAttr(), removeLayoutAttr(), and setDistributeLayoutAttr().
| std::string mlir::xegpu::getTemporaryLayoutName | ( | const OpResult | result | ) |
Return the attribute name for the OpResult to attach DistributeLayoutAttr.
Definition at line 130 of file XeGPUUtils.cpp.
References result.
| DistributeLayoutAttr mlir::xegpu::inferBitCastSourceLayout | ( | DistributeLayoutAttr | resLayout, |
| int | resElemTyBitWidth, | ||
| int | srcElemTyBitWidth ) |
Infers the source layout attribute for a bitcast operation given the result layout attribute, result element type bitwidth, and source element type bitwidth.
| DistributeLayoutAttr mlir::xegpu::inferBroadcastSourceLayout | ( | DistributeLayoutAttr | resLayout, |
| ArrayRef< int64_t > | resShape, | ||
| ArrayRef< int64_t > | srcShape ) |
Infers the source layout attribute for a broadcast operation given the result layout attribute, result shape, and source shape.
| DistributeLayoutAttr mlir::xegpu::inferDeinterleaveSourceLayout | ( | DistributeLayoutAttr | resLayout | ) |
Infers the source layout attribute for a deinterleave operation given the result layout attribute.
Deinterleave halves the innermost dimension size.
| DistributeLayoutAttr mlir::xegpu::inferExtractSourceLayout | ( | DistributeLayoutAttr | resLayout, |
| ArrayRef< int64_t > | resShape, | ||
| ArrayRef< int64_t > | srcShape ) |
Infers the source layout attribute for an extract operation.
Adds leading dimensions to the source layout to match the source shape size.
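A minimal Python model of the rank adjustment described above (function name and representation are hypothetical; the real code manipulates layout attributes, not plain lists):

```python
def infer_extract_source_layout(res_layout, res_shape, src_shape):
    """Model: prepend unit dims so the source layout rank matches
    the (higher-rank) source shape of an extract op."""
    extra = len(src_shape) - len(res_shape)
    assert extra >= 0, "source rank must be >= result rank for extract"
    return [1] * extra + list(res_layout)

# Extracting an 8x16 slice from a 2x8x16 source: layout gains a leading 1.
assert infer_extract_source_layout([1, 16], [8, 16], [2, 8, 16]) == [1, 1, 16]
```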
| DistributeLayoutAttr mlir::xegpu::inferInsertSourceLayout | ( | DistributeLayoutAttr | resLayout, |
| ArrayRef< int64_t > | resShape, | ||
| ArrayRef< int64_t > | srcShape ) |
Infers the source layout attribute for an insert operation.
Uses the same logic as inferInsertStridedSliceSourceLayout.
| DistributeLayoutAttr mlir::xegpu::inferInsertStridedSliceSourceLayout | ( | DistributeLayoutAttr | resLayout, |
| ArrayRef< int64_t > | resShape, | ||
| ArrayRef< int64_t > | srcShape ) |
Infers the source layout attribute for an insert strided slice operation given the result layout attribute, result shape, and source shape.
Removes leading dimensions from the result layout to match the source shape size.
| DistributeLayoutAttr mlir::xegpu::inferInterleaveSourceLayout | ( | DistributeLayoutAttr | resLayout | ) |
Infers the source layout attribute for an interleave operation given the result layout attribute.
Interleave doubles the innermost dimension size.
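The shape relationship that drives both inferences above can be sketched as follows (a toy model on plain shape lists, not the MLIR implementation):

```python
def interleave_result_shape(src_shape):
    """Model: interleave doubles the innermost dimension."""
    out = list(src_shape)
    out[-1] *= 2
    return out

def deinterleave_result_shape(src_shape):
    """Model: deinterleave halves the innermost dimension."""
    assert src_shape[-1] % 2 == 0, "innermost dim must be even"
    out = list(src_shape)
    out[-1] //= 2
    return out

# The two operations are shape-wise inverses of each other.
assert interleave_result_shape([128, 256]) == [128, 512]
assert deinterleave_result_shape([128, 512]) == [128, 256]
```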
| DistributeLayoutAttr mlir::xegpu::inferMaskOffsetLayoutForScatterIO | ( | DistributeLayoutAttr | payloadLayout, |
| int | chunkSize ) |
Infers the layout attribute for the mask and offset operands of chunked load and store operations, given the anchor layout attribute for the value being loaded or stored.
| DistributeLayoutAttr mlir::xegpu::inferMultiReductionSourceLayout | ( | DistributeLayoutAttr | resLayout, |
| SmallVector< int64_t > | reduceDims ) |
Infers the source layout attribute for a reduction operation given the result layout attribute and reduced dims.
| DistributeLayoutAttr mlir::xegpu::inferReductionSourceLayout | ( | DistributeLayoutAttr | resLayout | ) |
Infers the source layout attribute for a reduction operation given the result layout attribute.
| DistributeLayoutAttr mlir::xegpu::inferShapeCastSourceLayout | ( | DistributeLayoutAttr | resLayout, |
| ArrayRef< int64_t > | resShape, | ||
| ArrayRef< int64_t > | srcShape ) |
Infers the source layout attribute for a shape cast operation given the result layout attribute, result shape, and source shape.
| DistributeLayoutAttr mlir::xegpu::inferSourceLayoutFromResultForNonAnchorOp | ( | OpOperand & | operand, |
| DistributeLayoutAttr | resLayout ) |
Infers the source layout attribute for an operand using result layout attribute.
Referenced by getConsumerLayoutAt(), and propagateResultsToRegularOperands().
| DistributeLayoutAttr mlir::xegpu::inferTransposeSourceLayout | ( | DistributeLayoutAttr | resLayout, |
| ArrayRef< int64_t > | permutation ) |
Infers the source layout attribute for a transpose operation given the result layout attribute and permutation.
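One way to picture the inference: since `res[i] = src[permutation[i]]` for a transpose, the source layout is obtained by scattering the result layout dims through the permutation. A hedged Python model (names and direction convention assumed for illustration):

```python
def infer_transpose_source_layout(res_layout, permutation):
    """Model: invert the transpose permutation to recover the source
    layout dims from the result layout dims."""
    src = [0] * len(res_layout)
    for i, p in enumerate(permutation):
        src[p] = res_layout[i]  # result dim i came from source dim p
    return src

# A 2D transpose with permutation [1, 0] simply swaps the layout dims.
assert infer_transpose_source_layout([16, 1], [1, 0]) == [1, 16]
```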
| Value mlir::xegpu::lowerCrossLaneReductionToShuffles | ( | TypedValue< VectorType > | src, |
| TypedValue< VectorType > | acc, | ||
| vector::CombiningKind | kind, | ||
| int64_t | reductionDim, | ||
| int64_t | reductionSize, | ||
| Location | loc, | ||
| PatternRewriter & | rewriter ) |
Lowers cross-lane reductions to shuffle operations on a 2D vector.
Extracts slices along the reduction dimension, performs subgroup reductions with shuffles across reductionSize work-items, and inserts the results back into an accumulator vector.
Definition at line 776 of file XeGPUUtils.cpp.
References mlir::DenseElementsAttr::get(), mlir::Value::getType(), mlir::Builder::getZeroAttr(), mlir::vector::makeArithReduction(), and subgroupReduction().
| Value mlir::xegpu::lowerToVectorReductions | ( | TypedValue< VectorType > | src, |
| TypedValue< VectorType > | acc, | ||
| vector::CombiningKind | kind, | ||
| int64_t | reductionDim, | ||
| Location | loc, | ||
| PatternRewriter & | rewriter ) |
Given src and acc arguments from a vector::MultiDimReductionOp, lowers it to a set of vector::ReductionOp ops over 1D slices extracted from src.
The reduction is performed along reductionDim. The result is a vector with the same shape as acc. TODO: Only 2D to 1D reduction is supported for now.
Definition at line 697 of file XeGPUUtils.cpp.
References mlir::DenseElementsAttr::get(), getTemporaryLayout(), mlir::Value::getType(), mlir::Builder::getZeroAttr(), and setTemporaryLayout().
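The 2D-to-1D decomposition can be modeled in plain Python (combining kind fixed to addition for illustration; the real lowering emits vector::ReductionOp per slice):

```python
def lower_multi_reduction_2d(src, acc, reduction_dim):
    """Model: reduce a 2D 'vector' (list of rows) along reduction_dim by
    summing 1D slices, then combine with the accumulator."""
    rows, cols = len(src), len(src[0])
    if reduction_dim == 0:
        # Reduce each column slice, result has `cols` elements.
        return [sum(src[r][c] for r in range(rows)) + acc[c]
                for c in range(cols)]
    # Reduce each row slice, result has `rows` elements.
    return [sum(src[r]) + acc[r] for r in range(rows)]

src = [[1, 2], [3, 4]]
assert lower_multi_reduction_2d(src, [0, 0], 0) == [4, 6]
assert lower_multi_reduction_2d(src, [10, 10], 1) == [13, 17]
```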
| bool mlir::xegpu::matchSplitDimExpansion | ( | ArrayRef< int64_t > | src, |
| ArrayRef< int64_t > | dst, | ||
| SmallVector< SmallVector< int64_t > > & | splitDimGroups ) |
Definition at line 961 of file XeGPUUtils.cpp.
| bool mlir::xegpu::matchUnitDimExpansion | ( | ArrayRef< int64_t > | src, |
| ArrayRef< int64_t > | dst, | ||
| SmallVector< int64_t > & | expandedUnitDims ) |
Definition at line 941 of file XeGPUUtils.cpp.
| void mlir::xegpu::populateXeGPUArrayLengthOptimizationPatterns | ( | RewritePatternSet & | patterns | ) |
Appends patterns for array length optimization into patterns.
Definition at line 289 of file XeGPUArrayLengthOptimization.cpp.
References mlir::RewritePatternSet::add(), and mlir::RewritePatternSet::getContext().
| void mlir::xegpu::populateXeGPUMoveFuncBodyToWarpOpPatterns | ( | RewritePatternSet & | patterns | ) |
Appends patterns for moving function body into gpu.warp_execute_on_lane0 op.
Definition at line 2137 of file XeGPUSubgroupDistribute.cpp.
References mlir::RewritePatternSet::add(), and mlir::RewritePatternSet::getContext().
| void mlir::xegpu::populateXeGPUPeepHoleOptimizerPatterns | ( | RewritePatternSet & | patterns | ) |
Appends patterns for optimizing block load operations into patterns.
Definition at line 554 of file XeGPUPeepHoleOptimizer.cpp.
References mlir::RewritePatternSet::add(), and mlir::RewritePatternSet::getContext().
| void mlir::xegpu::populateXeGPUSgToWiDistributeTypeConversionAndLegality | ( | TypeConverter & | typeConverter, |
| RewritePatternSet & | patterns, | ||
| ConversionTarget & | target ) |
Defines type conversions and legality for XeGPU subgroup to workitem distribution, and appends the required conversion patterns into patterns.
Definition at line 1680 of file XeGPUSgToWiDistributeExperimental.cpp.
References mlir::RewritePatternSet::add(), mlir::RewritePatternSet::getContext(), getDistributeLayoutAttr(), getTemporaryLayout(), mlir::OpTrait::hasElementwiseMappableTraits(), populateXeGPUSgToWiDistributeTypeConversions(), and target.
| void mlir::xegpu::populateXeGPUSgToWiDistributeTypeConversions | ( | TypeConverter & | typeConverter | ) |
Define only the type conversions needed for XeGPU subgroup to workitem distribution.
Definition at line 1646 of file XeGPUSgToWiDistributeExperimental.cpp.
References getDistributeLayoutAttr(), getDistVecTypeBasedOnLaneLayout(), and mlir::Value::getType().
Referenced by populateXeGPUSgToWiDistributeTypeConversionAndLegality().
| void mlir::xegpu::populateXeGPUSubgroupDistributePatterns | ( | RewritePatternSet & | patterns | ) |
Appends patterns for XeGPU SIMT distribution into patterns.
Definition at line 2116 of file XeGPUSubgroupDistribute.cpp.
References mlir::RewritePatternSet::add(), and mlir::RewritePatternSet::getContext().
| void mlir::xegpu::populateXeGPUUnrollPatterns | ( | RewritePatternSet & | patterns, |
| const UnrollOptions & | options ) |
Collects a set of patterns to unroll XeGPU operations to smaller shapes.
Users can control whether an operation is unrolled, as well as its target shape, via the options structure (by setting filterConstraint and nativeShape respectively; both are function refs taking the op as input). An op is unrolled to the target shape as follows, for each of its operands:
Definition at line 809 of file XeGPUUnroll.cpp.
References mlir::RewritePatternSet::add(), mlir::RewritePatternSet::getContext(), and options.
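The tiling arithmetic behind unrolling can be sketched as a small Python model (assuming, as the unroll patterns do, that the op shape divides evenly by the native shape; names are invented for this sketch):

```python
from itertools import product

def tile_offsets(shape, native_shape):
    """Model: enumerate the offsets of native-shape tiles that cover
    `shape`, in row-major order."""
    assert all(s % n == 0 for s, n in zip(shape, native_shape)), \
        "shape must divide evenly by the native shape"
    ranges = [range(0, s, n) for s, n in zip(shape, native_shape)]
    return list(product(*ranges))

# An 8x32 op unrolled to an 8x16 native shape yields two tiles.
assert tile_offsets([8, 32], [8, 16]) == [(0, 0), (0, 16)]
```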
| void mlir::xegpu::populateXeGPUWgToSgDistributePatterns | ( | RewritePatternSet & | patterns | ) |
Appends patterns for XeGPU workgroup to subgroup distribution into patterns.
Definition at line 1552 of file XeGPUWgToSgDistribute.cpp.
References mlir::RewritePatternSet::add(), and mlir::RewritePatternSet::getContext().
| LogicalResult mlir::xegpu::propagateLayouts | ( | OpBuilder & | builder, |
| Operation * | target, | ||
| LayoutKind | layoutKind, | ||
| unsigned | indexBitWidth, | ||
| bool | printOnly = false ) |
Definition at line 1787 of file XeGPUPropagateLayout.cpp.
References mlir::WalkResult::advance(), mlir::Operation::emitError(), mlir::Block::getOperations(), getTemporaryLayout(), mlir::WalkResult::interrupt(), success(), target, updateControlFlowOps(), updateFunctionOpInterface(), updateOp(), and mlir::Operation::walk().
Attach layout attributes to all vector-type operands of operations within the given operation's nested region.
Reports an error if any vector operand lacks a layout attribute.
Definition at line 284 of file XeGPULayoutImpl.cpp.
References propagateRegionArgsToInits(), propagateRegionResultsToYieldOperands(), propagateResultsToRegularOperands(), removeTemporaryLayoutAttrs(), mlir::Operation::walk(), and walkRegionBackward().
| void mlir::xegpu::registerTransformDialectExtension | ( | DialectRegistry & | registry | ) |
Definition at line 616 of file XeGPUTransformOps.cpp.
References mlir::DialectRegistry::addExtensions().
Referenced by mlir::registerAllExtensions().
|
inline |
Definition at line 751 of file Passes.h.
Referenced by mlir::registerAllPasses().
| void mlir::xegpu::removeLayoutAttr | ( | const T & | operandOrResult | ) |
Removes the LayoutAttr for a given OpOperand or OpResult if it exists.
Definition at line 309 of file XeGPULayoutImpl.cpp.
References getTemporaryLayoutName(), mlir::Operation::hasAttrOfType(), and mlir::Operation::removeAttr().
Referenced by xegpu::removeLayoutAttr< mlir::OpOperand >(), and xegpu::removeLayoutAttr< mlir::OpResult >().
Removes the DistributeLayoutAttr for each OpOperand and OpResult of the given operation if they exist.
If the operation contains regions, this is also applied recursively to the contained operations.
Definition at line 324 of file XeGPULayoutImpl.cpp.
References mlir::Operation::getAttrs(), mlir::Operation::removeAttr(), and mlir::Operation::walk().
Referenced by createXeGPUSubgroupDistribute().
Removes the temporary layout attributes for each OpOperand and OpResult of the given operation.
Applied recursively to contained operations if the given operation contains regions.
Definition at line 337 of file XeGPULayoutImpl.cpp.
References mlir::Operation::getDiscardableAttrs(), mlir::Operation::removeDiscardableAttr(), and mlir::Operation::walk().
Referenced by recoverTemporaryLayouts().
| bool mlir::xegpu::requirePacked | ( | const DistributeLayoutAttr | layout | ) |
Helper function to check if the layout is packed.
A layout is packed if it is 2D and lane_data[0] != 1 (data is packed along the column dimension). TODO: Move to target info.
Referenced by createXeGPUSubgroupDistribute().
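The check described above reduces to a one-line predicate; a hedged Python model (taking just the lane_data vector, whereas the real helper takes the full layout attribute):

```python
def require_packed(lane_data):
    """Model of the packed-layout check: a 2D layout is packed when
    lane_data[0] != 1, i.e. data is packed from the column dimension."""
    return len(lane_data) == 2 and lane_data[0] != 1

assert require_packed([2, 1])       # column-packed 2D layout
assert not require_packed([1, 2])   # lane_data[0] == 1: not packed
```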
| bool mlir::xegpu::requireTranspose | ( | const DistributeLayoutAttr | layout, |
| const uArch::uArch * | uArch ) |
Helper function to check if the layout requires a transpose effect.
Referenced by createXeGPUSubgroupDistribute().
| LogicalResult mlir::xegpu::resolveLayoutConflicts | ( | Operation * | target | ) |
Definition at line 1851 of file XeGPUPropagateLayout.cpp.
References target.
| void mlir::xegpu::setDistributeLayoutAttr | ( | const OpOperand & | opr, |
| const DistributeLayoutAttr | layout ) |
[to-be-deprecated] Sets the DistributeLayoutAttr for a given OpOperand; users should use setAnchorLayout instead.
Definition at line 311 of file XeGPUUtils.cpp.
References mlir::detail::IROperandBase::getOwner(), getTemporaryLayoutName(), mlir::Operation::hasAttrOfType(), and mlir::Operation::setAttr().
| void mlir::xegpu::setDistributeLayoutAttr | ( | const OpResult & | Result, |
| const DistributeLayoutAttr | layout ) |
[to-be-deprecated] Sets the DistributeLayoutAttr for a given OpResult; users should use setAnchorLayout instead.
References result.
Referenced by updateControlFlowOps(), and updateOp().
| void mlir::xegpu::setTemporaryLayout | ( | const T & | operandOrResult, |
| const DistributeLayoutAttr | layout ) |
| xegpu::DistributeLayoutAttr mlir::xegpu::setupBitCastResultLayout | ( | xegpu::LayoutKind | layoutKind, |
| VectorType | srcVecTy, | ||
| VectorType | resVecTy, | ||
| DistributeLayoutAttr | consumerLayout, | ||
| const uArch::uArch * | uArch ) |
Sets up the result layout attribute for a bitcast operation based on element type bitwidths.
This ensures the source layout can always be derived from the result layout.
When casting from a narrower to a wider element type (srcElemTyBitWidth < resElemTyBitWidth), the result layout's innermost dimension data sizes (inst_data, lane_data) are scaled up by the bitwidth ratio. This maintains the invariant that the source layout can be recovered by adjusting the result layout based on bitwidth ratio of input vs output.
When casting to a smaller bitwidth, adjusts the layout dimensions (sgData, instData, or laneData) by multiplying by the bitwidth ratio to ensure the result layout can be correctly divided back to the source layout during inference.
Examples:
Definition at line 1000 of file XeGPULayoutImpl.cpp.
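A hedged Python sketch of the scaling rule for the widening case described above (consumer-layout representation and function name are invented; the actual code builds a DistributeLayoutAttr):

```python
def setup_bitcast_result_data(consumer_data, src_bits, res_bits):
    """Model: scale the consumer's innermost data size so the bitcast
    result layout can always be divided back into a source layout."""
    data = list(consumer_data)
    if src_bits < res_bits:
        # Widening cast: scale up by the bitwidth ratio.
        data[-1] *= res_bits // src_bits
    return data

# i8 -> i32 bitcast: consumer lane_data [1, 2] becomes result [1, 8],
# which divides cleanly back into a source lane_data of [1, 32].
assert setup_bitcast_result_data([1, 2], 8, 32) == [1, 8]
```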
| std::optional< std::tuple< DistributeLayoutAttr, DistributeLayoutAttr, DistributeLayoutAttr > > mlir::xegpu::setupDpasLayout | ( | LayoutKind | layoutKind, |
| VectorType | aTy, | ||
| VectorType | bTy, | ||
| VectorType | cdTy, | ||
| DistributeLayoutAttr | consumerLayout, | ||
| int | numSg, | ||
| const uArch::uArch * | uArch ) |
Sets up the anchor layouts for a dpas operands (A, B, and C/D).
The numSg and consumerLayout (optional) are only used by sg layout creation.
| std::optional< std::tuple< DistributeLayoutAttr, DistributeLayoutAttr, DistributeLayoutAttr, DistributeLayoutAttr, DistributeLayoutAttr > > mlir::xegpu::setupDpasMxLayout | ( | LayoutKind | layoutKind, |
| VectorType | aTy, | ||
| VectorType | bTy, | ||
| VectorType | cdTy, | ||
| VectorType | aScaleTy, | ||
| VectorType | bScaleTy, | ||
| DistributeLayoutAttr | consumerLayout, | ||
| int | numSg, | ||
| const uArch::uArch * | uArch ) |
Sets up the anchor layouts for dpas_mx operands (A, B, C/D, A_scale, and B_scale).
The numSg and consumerLayout (optional) are only used by sg layout creation. A_scale and B_scale are optional.
| DistributeLayoutAttr mlir::xegpu::setupInsertStridedSliceResultLayout | ( | LayoutKind | layoutKind, |
| VectorType | srcVectorTy, | ||
| VectorType | resVectorTy, | ||
| DistributeLayoutAttr | consumerLayout, | ||
| const uArch::uArch * | uArch ) |
Sets up the result layout for an insert strided slice operation.
Creates a result layout based on the specified layout kind (InstData or Lane).
| xegpu::DistributeLayoutAttr mlir::xegpu::setupInterleaveResultLayout | ( | xegpu::LayoutKind | layoutKind, |
| VectorType | srcVecTy, | ||
| VectorType | resVecTy, | ||
| DistributeLayoutAttr | consumerLayout, | ||
| const uArch::uArch * | uArch ) |
Sets up the result layout for an interleave operation to ensure the source layout can be safely derived.
Interleave doubles the innermost dimension, so the result layout must ensure that laneData is a multiple of 2, and instData must be divisible by innermostDimLaneLayout * 2.
Example: Interleave: vector<128x256xf4> -> vector<128x512xf4> Consumer layout: laneLayout=[1, 16], laneData=[1, 4], instData=[1, 64] Result layout adjustment to ensure source can be safely inferred:
Definition at line 1069 of file XeGPULayoutImpl.cpp.
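The divisibility constraints stated above can be checked with a toy Python predicate (illustrative only; the real function adjusts the layout rather than merely validating it):

```python
def interleave_result_layout_ok(lane_layout, lane_data, inst_data):
    """Model: result laneData's innermost dim must be a multiple of 2, and
    instData's innermost dim divisible by innermost laneLayout * 2."""
    return (lane_data[-1] % 2 == 0
            and inst_data[-1] % (lane_layout[-1] * 2) == 0)

# The consumer layout from the example above satisfies both constraints:
# laneLayout=[1, 16], laneData=[1, 4], instData=[1, 64].
assert interleave_result_layout_ok([1, 16], [1, 4], [1, 64])
# laneData innermost dim of 1 would need adjusting.
assert not interleave_result_layout_ok([1, 16], [1, 1], [1, 64])
```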
| DistributeLayoutAttr mlir::xegpu::setupLoadGatherAnchorLayout | ( | LayoutKind | layoutKind, |
| VectorType | vectorTy, | ||
| int | chunkSize, | ||
| DistributeLayoutAttr | consumerLayout, | ||
| const uArch::uArch * | uArch ) |
Sets up the anchor layout for a load gather operation.
| DistributeLayoutAttr mlir::xegpu::setupLoadMatrixAnchorLayout | ( | LayoutKind | layoutKind, |
| VectorType | vectorTy, | ||
| DistributeLayoutAttr | consumerLayout, | ||
| const uArch::uArch * | uArch ) |
Sets up the anchor layout for load matrix operation.
| xegpu::SliceAttr mlir::xegpu::setupMultiReductionResultLayout | ( | xegpu::LayoutKind | layoutKind, |
| VectorType | srcVecTy, | ||
| DistributeLayoutAttr | consumerLayout, | ||
| SmallVector< int64_t > | reductionDims, | ||
| int | numSg, | ||
| const uArch::uArch * | uArch ) |
Sets up layout for multi-reduction operations by creating a SliceAttr for the result.
This function attempts to construct a source layout that, when sliced along reduction dimensions, produces a result layout compatible with the consumer's preferred layout; this minimizes data redistribution overhead. The SliceAttr for the result is then created based on the derived source layout and the specified reduction dimensions.
For subgroup layouts, it first tries to align the source layout's subgroup layout and data with the consumer's layout on non-reduction dimensions. Then, it distributes remaining subgroups across reduction dimensions. This avoids subgroup data redistribution overhead between the reduced result and its consumer. When the consumer layout is a slice layout, it attempts to reuse the slice layout's parent layout for the source to further minimize potential data redistribution.
InstData requires {1, ..., min(maxReduceVectorSize, srcShape), subgroupSize}. Lane layout requires {1, ..., 1, subgroupSize}. Lane data requires {1, ..., min(maxReduceVectorSize, srcShape), 1}.
Examples:
Result Layout: #xegpu.slice<#xegpu.layout<sg_layout=[4, 8], sg_data=[8, 16]>, dims = [1]> Note that the sg_layout is reused but sg_data needs to be adjusted to evenly distribute the source tensor tile along the reduction dim.
Definition at line 837 of file XeGPULayoutImpl.cpp.
References mlir::computeShapeRatio(), mlir::detail::DenseArrayAttrImpl< int32_t >::get(), mlir::detail::DenseArrayAttrImpl< int64_t >::get(), InstData, Lane, and Subgroup.
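Reading the requirement shapes listed above literally, the lane-level vectors can be modeled as (names and the list representation are assumptions for this sketch):

```python
def reduction_lane_layout(rank, subgroup_size):
    """Model: lane layout is {1, ..., 1, subgroupSize}."""
    return [1] * (rank - 1) + [subgroup_size]

def reduction_lane_data(rank, src_penultimate_dim, max_reduce_vector_size):
    """Model: lane data is {1, ..., min(maxReduceVectorSize, srcShape), 1}."""
    return ([1] * (rank - 2)
            + [min(max_reduce_vector_size, src_penultimate_dim), 1])

# 2D source with subgroup size 16 and an 8-wide penultimate dim.
assert reduction_lane_layout(2, 16) == [1, 16]
assert reduction_lane_data(2, 8, 32) == [8, 1]
```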
| xegpu::SliceAttr mlir::xegpu::setupReductionResultLayout | ( | xegpu::LayoutKind | layoutKind, |
| VectorType | srcVectorTy, | ||
| const uArch::uArch * | uArch ) |
Sets up layout for Reduction operations by creating a SliceAttr for the result.
Definition at line 949 of file XeGPULayoutImpl.cpp.
References mlir::detail::DenseArrayAttrImpl< int32_t >::get(), mlir::detail::DenseArrayAttrImpl< int64_t >::get(), InstData, Lane, result, and Subgroup.
| xegpu::DistributeLayoutAttr mlir::xegpu::setupStoreMatrixAnchorLayout | ( | xegpu::LayoutKind | layoutKind, |
| VectorType | vectorTy, | ||
| const uArch::uArch * | uArch ) |
Sets up the anchor layout for a store matrix operation.
Definition at line 1335 of file XeGPULayoutImpl.cpp.
References setupGenericStoreAnchorLayout(), and mlir::xegpu::uArch::StoreScatter.
| xegpu::DistributeLayoutAttr mlir::xegpu::setupStoreScatterAnchorLayout | ( | xegpu::LayoutKind | layoutKind, |
| VectorType | vectorTy, | ||
| int | chunkSize, | ||
| const uArch::uArch * | uArch ) |
Sets up the anchor layout for a store scatter operation.
Definition at line 1316 of file XeGPULayoutImpl.cpp.
References setupGenericStoreAnchorLayout(), and mlir::xegpu::uArch::StoreScatter.
| Value mlir::xegpu::subgroupReduction | ( | Location | loc, |
| OpBuilder & | builder, | ||
| Value | input, | ||
| vector::CombiningKind | kind, | ||
| uint32_t | size ) |
Given an input value representing per-lane data, this function returns the result after performing a reduction on the input over all lanes (number of lanes given by size).
This uses butterfly shuffles to perform the reduction in a log2(size) number of steps. NOTE: Implementation taken from TestVectorTransforms.cpp
Definition at line 682 of file XeGPUUtils.cpp.
Referenced by lowerCrossLaneReductionToShuffles().
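The butterfly (XOR) shuffle scheme used by subgroupReduction can be simulated on a list of per-lane values; this Python model performs the reduction in log2(size) steps, with every lane ending up holding the full result (combining kind fixed to addition for illustration):

```python
def butterfly_reduce(lane_values, size, combine=lambda a, b: a + b):
    """Model: butterfly-shuffle reduction across `size` lanes.
    At each step, lane i combines its value with lane i ^ offset."""
    assert size & (size - 1) == 0, "size must be a power of two"
    vals = list(lane_values)
    offset = size // 2
    while offset >= 1:
        vals = [combine(vals[i], vals[i ^ offset]) for i in range(size)]
        offset //= 2
    return vals

# Four lanes holding 1..4: two shuffle steps leave 10 on every lane.
assert butterfly_reduce([1, 2, 3, 4], 4) == [10, 10, 10, 10]
```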