MLIR 23.0.0git
mlir::xegpu Namespace Reference

Namespaces

namespace  uArch

Classes

struct  UnrollOptions
 Options to control the XeGPU unrolling. More...

Typedefs

using GetLayoutFnTy = llvm::function_ref<DistributeLayoutAttr(Value)>
 Callable returning the propagated layout for a given Value, used by the layout-propagation helpers below.
using SubShapeAndCountFn
 Callback type for computing sub-shape and count for 1:N (or 1:1 shape-changing) VectorType conversion.

Enumerations

enum class  LayoutKind { Lane , InstData , Subgroup }
 Specifies the level of a layout hierarchy for comparison or propagation. More...

Functions

void registerTransformDialectExtension (DialectRegistry &registry)
void populateXeGPUPeepHoleOptimizerPatterns (RewritePatternSet &patterns)
 Appends patterns for optimizing block load operations into patterns.
void populateXeGPUArrayLengthOptimizationPatterns (RewritePatternSet &patterns)
 Appends patterns for array length optimization into patterns.
void populateXeGPUSubgroupDistributePatterns (RewritePatternSet &patterns)
 Appends patterns for XeGPU SIMT distribution into patterns.
void populateXeGPUMoveFuncBodyToWarpOpPatterns (RewritePatternSet &patterns)
 Appends patterns for moving function body into gpu.warp_execute_on_lane0 op.
void populateXeGPUWgToSgDistributeTypeConversions (TypeConverter &converter, Operation *topLevelOp)
 Define the type conversions needed for XeGPU workgroup to subgroup distribution.
void populateXeGPUWgToSgDistributePatterns (RewritePatternSet &patterns)
 Appends patterns for XeGPU workgroup to subgroup distribution into patterns.
void populateXeGPUSgToLaneDistributeTypeConversions (TypeConverter &typeConverter, Operation *topLevelOp)
 Define only the type conversions needed for XeGPU subgroup to lane distribution.
void populateXeGPUSgToLaneDistributeTypeConversionAndLegality (TypeConverter &typeConverter, RewritePatternSet &patterns, ConversionTarget &target, Operation *topLevelOp)
 Defines type conversions and legality for XeGPU subgroup to lane distribution and appends the required conversion patterns into patterns.
void populateXeGPUUnrollPatterns (RewritePatternSet &patterns, const UnrollOptions &options)
 Collect a set of patterns to unroll xegpu operations to smaller shapes.
LogicalResult propagateLayouts (OpBuilder &builder, Operation *target, LayoutKind layoutKind, unsigned indexBitWidth, bool printOnly=false)
LogicalResult resolveLayoutConflicts (Operation *target)
LogicalResult propagateRegionArgsToInits (RegionBranchOpInterface regionOp, GetLayoutFnTy getLayoutOfValue)
 Propagate layouts from a region branch op's region entry block arguments back to its init operands.
bool recoverTemporaryLayouts (Operation *rootOp)
 Attach layout attributes to all vector-type operands of operations within the given operation's nested region.
template<typename T, typename = std::enable_if_t<std::is_same_v<T, OpOperand> || std::is_same_v<T, OpResult>>>
void removeLayoutAttr (const T &operandOrResult)
 Removes the LayoutAttr for a given OpOperand or OpResult if it exists.
void removeLayoutAttrs (Operation *op)
 Removes the DistributeLayoutAttr for each OpOperand and OpResult of the given operation if they exist.
void removeTemporaryLayoutAttrs (Operation *op)
 Removes the temporary layout attributes for each OpOperand and OpResult of the given operation.
SmallVector< NamedAttribute > dropSgLayoutAndDataOnAttrs (ArrayRef< NamedAttribute > attrs)
 Updates the NamedAttribute sequence by dropping sg-layout and sg-data information from any DistributeLayoutAttr found.
SmallVector< NamedAttribute > dropInstDataOnAttrs (ArrayRef< NamedAttribute > attrs)
 Updates the NamedAttribute sequence by dropping inst-data information from any DistributeLayoutAttr found.
DistributeLayoutAttr inferBroadcastSourceLayout (DistributeLayoutAttr resLayout, ArrayRef< int64_t > resShape, ArrayRef< int64_t > srcShape)
 Infers the source layout attribute for a broadcast operation given the result layout attribute, result shape, and source shape.
DistributeLayoutAttr inferMultiReductionSourceLayout (DistributeLayoutAttr resLayout, SmallVector< int64_t > reduceDims)
 Infers the source layout attribute for a reduction operation given the result layout attribute and reduced dims.
DistributeLayoutAttr inferReductionSourceLayout (DistributeLayoutAttr resLayout)
 Infers the source layout attribute for a reduction operation given the result layout attribute.
DistributeLayoutAttr inferTransposeSourceLayout (DistributeLayoutAttr resLayout, ArrayRef< int64_t > permutation)
 Infers the source layout attribute for a transpose operation given the result layout attribute and permutation.
DistributeLayoutAttr inferBitCastSourceLayout (DistributeLayoutAttr resLayout, int resElemTyBitWidth, int srcElemTyBitWidth)
 Infers the source layout attribute for a bitcast operation given the result layout attribute, result element type bitwidth, and source element type bitwidth.
DistributeLayoutAttr inferInterleaveSourceLayout (DistributeLayoutAttr resLayout)
 Infers the source layout attribute for an interleave operation given the result layout attribute.
DistributeLayoutAttr inferDeinterleaveSourceLayout (DistributeLayoutAttr resLayout)
 Infers the source layout attribute for a deinterleave operation given the result layout attribute.
DistributeLayoutAttr inferShapeCastSourceLayout (DistributeLayoutAttr resLayout, ArrayRef< int64_t > resShape, ArrayRef< int64_t > srcShape)
 Infers the source layout attribute for a shape cast operation given the result layout attribute, result shape, and source shape.
DistributeLayoutAttr inferInsertStridedSliceSourceLayout (DistributeLayoutAttr resLayout, ArrayRef< int64_t > resShape, ArrayRef< int64_t > srcShape)
 Infers the source layout attribute for an insert strided slice operation given the result layout attribute, result shape, and source shape.
DistributeLayoutAttr inferInsertSourceLayout (DistributeLayoutAttr resLayout, ArrayRef< int64_t > resShape, ArrayRef< int64_t > srcShape)
 Infers the source layout attribute for an insert operation.
DistributeLayoutAttr inferExtractSourceLayout (DistributeLayoutAttr resLayout, ArrayRef< int64_t > resShape, ArrayRef< int64_t > srcShape)
 Infers the source layout attribute for an extract operation.
DistributeLayoutAttr inferMaskOffsetLayoutForScatterIO (DistributeLayoutAttr payloadLayout, int chunkSize)
 Infers the layout attribute for the mask and offset operands of chunked loads and stores, given the anchor layout attribute of the value being loaded or stored.
DistributeLayoutAttr inferSourceLayoutFromResultForNonAnchorOp (OpOperand &operand, DistributeLayoutAttr resLayout)
 Infers the source layout attribute for an operand using result layout attribute.
SliceAttr setupMultiReductionResultLayout (LayoutKind layoutKind, VectorType srcVectorTy, DistributeLayoutAttr consumerLayout, SmallVector< int64_t > reductionDims, int numSg, const uArch::uArch *uArch)
 Note on the consumerLayout argument used by the consumer-driven setup* / complete* helpers below:
SliceAttr setupReductionResultLayout (LayoutKind layoutKind, VectorType srcVectorTy, const uArch::uArch *uArch)
 Sets up layout for Reduction operations by creating a SliceAttr for the result.
DistributeLayoutAttr setupBitCastResultLayout (LayoutKind layoutKind, VectorType srcVectorTy, VectorType resVectorTy, DistributeLayoutAttr consumerLayout, const uArch::uArch *uArch)
 Setup the result layout attribute for a bitcast operation based on element type bitwidths.
DistributeLayoutAttr setupInterleaveResultLayout (LayoutKind layoutKind, VectorType srcVectorTy, VectorType resVectorTy, DistributeLayoutAttr consumerLayout, const uArch::uArch *uArch)
 Sets up the result layout for an interleave operation to ensure the source layout can be safely derived.
DistributeLayoutAttr setupInsertStridedSliceResultLayout (LayoutKind layoutKind, VectorType srcVectorTy, VectorType resVectorTy, DistributeLayoutAttr consumerLayout, const uArch::uArch *uArch)
 Sets up the result layout for an insert strided slice operation.
DistributeLayoutAttr setupLoadGatherAnchorLayout (LayoutKind layoutKind, VectorType vectorTy, int contigChunkSize, DistributeLayoutAttr consumerLayout, const uArch::uArch *uArch)
 Sets up the anchor layout for a load gather operation.
DistributeLayoutAttr setupLoadMatrixAnchorLayout (LayoutKind layoutKind, VectorType vectorTy, int contigChunkSize, DistributeLayoutAttr consumerLayout, const uArch::uArch *uArch)
 Sets up the anchor layout for load matrix operation.
DistributeLayoutAttr setupStoreScatterAnchorLayout (LayoutKind layoutKind, VectorType vectorTy, int contigChunkSize, const uArch::uArch *uArch)
 Sets up the anchor layout for a store scatter operation.
DistributeLayoutAttr setupStoreMatrixAnchorLayout (LayoutKind layoutKind, VectorType vectorTy, int contigChunkSize, const uArch::uArch *uArch)
 Sets up the anchor layout for a store matrix operation.
std::optional< DistributeLayoutAttr > completeScatterLoadLaneLayoutFromInstData (DistributeLayoutAttr userSpecifiedLayout, DistributeLayoutAttr consumerLayout, Type elemTy, const xegpu::uArch::LoadGatherInstructionInterface *uArchInstruction, const int subgroupSize)
 If the consumer layout has only inst_data (no lane_layout/lane_data), completes it by running the corresponding scatter-style Lane-kind setup rule with inst_data as the destination shape.
std::optional< DistributeLayoutAttr > completeScatterStoreLaneLayoutFromInstData (DistributeLayoutAttr specifiedLayout, Type elemTy, const xegpu::uArch::StoreScatterInstructionInterface *uArchInstruction, const int subgroupSize)
 Like completeScatterLoadLaneLayoutFromInstData, but for scatter stores (store_scatter / store_matrix).
std::optional< DistributeLayoutAttr > completeBlockStoreLaneLayoutFromInstData (DistributeLayoutAttr specifiedLayout, Type elemTy, const xegpu::uArch::BlockIOInstructionInterface *uArchInstruction, const int subgroupSize)
 Completes a user-provided 2D-block store_nd / prefetch_nd anchor that has only inst_data.
std::optional< DistributeLayoutAttr > completeBlockLoadLaneLayoutFromInstData (DistributeLayoutAttr specifiedLayout, DistributeLayoutAttr consumerLayout, Type elemTy, const xegpu::uArch::BlockIOInstructionInterface *uArchInstruction, const int subgroupSize)
 Like completeBlockStoreLaneLayoutFromInstData, but for load_nd.
DistributeLayoutAttr setupStoreNdAnchorLayout (LayoutKind layoutKind, VectorType vectorTy, int numSg, const uArch::uArch *uArch)
 Sets up the anchor layout for a store_nd operation.
DistributeLayoutAttr setupPrefetchNdAnchorLayout (LayoutKind layoutKind, TensorDescType tdescTy, int numSg, const uArch::uArch *uArch)
 Sets up the anchor layout for a prefetch_nd operation.
DistributeLayoutAttr setupLoadNdAnchorLayout (LayoutKind layoutKind, VectorType vectorTy, DistributeLayoutAttr consumerLayout, int numSg, const uArch::uArch *uArch)
 Sets up the anchor layout for a load_nd operation.
std::optional< std::tuple< DistributeLayoutAttr, DistributeLayoutAttr, DistributeLayoutAttr > > setupDpasLayout (LayoutKind layoutKind, VectorType aTy, VectorType bTy, VectorType cdTy, DistributeLayoutAttr consumerLayout, int numSg, const uArch::uArch *uArch)
 Sets up the anchor layouts for dpas operands (A, B, and C/D).
std::optional< std::tuple< DistributeLayoutAttr, DistributeLayoutAttr, DistributeLayoutAttr, DistributeLayoutAttr, DistributeLayoutAttr > > setupDpasMxLayout (LayoutKind layoutKind, VectorType aTy, VectorType bTy, VectorType cdTy, VectorType aScaleTy, VectorType bScaleTy, DistributeLayoutAttr consumerLayout, int numSg, const uArch::uArch *uArch)
 Sets up the anchor layouts for dpas_mx operands (A, B, C/D, A_scale, and B_scale).
std::optional< std::tuple< DistributeLayoutAttr, DistributeLayoutAttr, DistributeLayoutAttr > > completeDpasLaneLayoutFromInstData (DistributeLayoutAttr aLayout, DistributeLayoutAttr bLayout, DistributeLayoutAttr cdLayout, VectorType aTy, VectorType bTy, VectorType cdTy, const uArch::uArch *uArch)
 Completes user-provided DPAS A/B/C-D anchors that carry only inst_data by filling in lane_layout / lane_data derived from the operand shapes (mirrors the InstData branch of setupDpasLayout).
std::optional< std::tuple< DistributeLayoutAttr, DistributeLayoutAttr, DistributeLayoutAttr, DistributeLayoutAttr, DistributeLayoutAttr > > completeDpasMxLaneLayoutFromInstData (DistributeLayoutAttr aLayout, DistributeLayoutAttr bLayout, DistributeLayoutAttr cdLayout, VectorType aTy, VectorType bTy, VectorType cdTy, VectorType aScaleTy, VectorType bScaleTy, const uArch::uArch *uArch)
 Like completeDpasLaneLayoutFromInstData, but for dpas_mx: additionally re-derives the A_scale / B_scale layouts from the completed A / B layouts.
DistributeLayoutAttr getConsumerLayoutAt (OpOperand &operand)
 Gets the expected layout for a given consumer operand.
bool isTriviallyRematerializable (Operation *op)
 Returns true if op is safe and cheap to clone: it has no side effects, no regions, and all of its operands are themselves trivially rematerializable (e.g.
SmallVector< Value > flattenValues (ArrayRef< ValueRange > values)
 Flatten a set of ValueRange into a single SmallVector<Value>
FailureOr< VectorType > getDistributedVectorType (xegpu::TensorDescType tdescTy)
 If tensor descriptor has a layout attribute it is used in SIMT mode.
FailureOr< VectorType > getDistributedVectorType (VectorType originalType, LayoutAttr layout)
 Helper to get the distributed vector type for a given vector type according to a given LayoutAttr.
FailureOr< VectorType > getDistVecTypeBasedOnLaneLayout (DistributeLayoutAttr layout, VectorType originalType)
 Helper function to get distributed vector type for a source vector type according to the lane_layout.
SmallVector< Value > extractVectorsWithShapeFromValue (OpBuilder &builder, Location loc, Value value, ArrayRef< int64_t > shape)
 Extract a set of small vectors from a value with a given shape using vector.extract_strided_slice.
Value createVectorWithShapeFromValues (OpBuilder &builder, Location loc, ValueRange values, ArrayRef< int64_t > shape)
 Create a vector of shape from a set of values using vector.insert_strided_slice.
std::optional< std::string > getChipStr (Operation *op)
 Retrieves the chip string from the XeVM target attribute of the parent GPU module operation.
SmallVector< OpFoldResult > addElementwise (OpBuilder &builder, Location loc, ArrayRef< OpFoldResult > lhs, ArrayRef< OpFoldResult > rhs)
 Generates element-wise addition ops of two arrays with same length.
SmallVector< OpFoldResult > addWithRightAligned (OpBuilder &builder, Location loc, ArrayRef< OpFoldResult > lhs, ArrayRef< OpFoldResult > rhs)
 Generates element-wise addition ops of two arrays with automatic alignment.
Value subgroupReduction (Location loc, OpBuilder &builder, Value input, vector::CombiningKind kind, uint32_t size)
 Given an input value representing per-lane data, this function returns the result after performing a reduction on the input over all lanes (number of lanes given by size).
Value lowerToVectorReductions (TypedValue< VectorType > src, TypedValue< VectorType > acc, vector::CombiningKind kind, int64_t reductionDim, Location loc, PatternRewriter &rewriter)
 Given the src and acc arguments from a vector::MultiDimReductionOp, lower to a set of vector::ReductionOp ops over 1D slices extracted from src.
Value createReductionNeutralValue (OpBuilder &builder, Location loc, Type type, vector::CombiningKind kind)
 Creates a constant filled with the neutral (identity) value for the given reduction kind.
Value lowerCrossLaneReductionToShuffles (TypedValue< VectorType > src, TypedValue< VectorType > acc, vector::CombiningKind kind, int64_t reductionDim, int64_t reductionSize, Location loc, PatternRewriter &rewriter)
 Lowers cross-lane reductions to shuffle operations on a 2D vector.
template<typename T>
int getLargestDivisor (T dim, ArrayRef< T > candidates, ArrayRef< T > candidateMultiples={})
 Helper function to find a proper instruction multiple for the user-supplied sg-level data shape (given by dim).
DistributeLayoutAttr getDistributeLayoutAttr (const Value value)
 Retrieves the DistributeLayoutAttr associated with a given Value.
DistributeLayoutAttr getDistributeLayoutAttr (const OpOperand &opr)
 Retrieves the DistributeLayoutAttr associated with a given OpOperand.
void setDistributeLayoutAttr (const OpResult &Result, const DistributeLayoutAttr layout)
 [to-be-deprecated] Sets the DistributeLayoutAttr for a given OpResult. Users should use setAnchorLayout instead.
void setDistributeLayoutAttr (const OpOperand &opr, const DistributeLayoutAttr layout)
 [to-be-deprecated] Sets the DistributeLayoutAttr for a given OpOperand. Users should use setAnchorLayout instead.
std::string getTemporaryLayoutName (const OpOperand &operand)
 Return the attribute name for the OpOperand to attach DistributeLayoutAttr.
std::string getTemporaryLayoutName (const OpResult result)
 Return the attribute name for the OpResult to attach DistributeLayoutAttr.
template<typename T, typename = std::enable_if_t<std::is_same_v<T, OpOperand> || std::is_same_v<T, OpResult>>>
DistributeLayoutAttr getTemporaryLayout (const T &operandOrResult)
 Get and set the distribute layout attribute for non-anchor operations (and for the offsets/masks of load/store ops until their temporary attributes are removed).
template<typename T, typename = std::enable_if_t<std::is_same_v<T, OpOperand> || std::is_same_v<T, OpResult>>>
void setTemporaryLayout (const T &operandOrResult, const DistributeLayoutAttr layout)
bool requirePacked (const DistributeLayoutAttr layout)
 Helper function to check if the layout is packed.
bool requireTranspose (const DistributeLayoutAttr layout, const uArch::uArch *uArch)
 Helper function to check if the layout requires a transpose effect.
bool matchUnitDimExpansion (ArrayRef< int64_t > src, ArrayRef< int64_t > dst, SmallVector< int64_t > &expandedUnitDims)
bool matchSplitDimExpansion (ArrayRef< int64_t > src, ArrayRef< int64_t > dst, SmallVector< SmallVector< int64_t > > &splitDimGroups)
DenseMap< Value, SmallVector< Type > > precomputeLoopBlockArgTypes (Operation *topLevelOp, SubShapeAndCountFn getSubShapeAndCount)
 Pre-computes distributed VectorType mappings for every value carried through an SCF loop under topLevelOp (1:1 shape-changing or 1:N): the region block args (scf.while before/after args, scf.for iter_args), the loop results, and the terminator operands feeding them.
void addVectorTypeConversion (TypeConverter &converter, SubShapeAndCountFn getSubShapeAndCount, DenseMap< Value, SmallVector< Type > > loopArgTypes)
 Adds a context-aware VectorType conversion to converter (1:1 shape-changing or 1:N, depending on getSubShapeAndCount's returned count).
void cleanupUnrealizedConversionCasts (Operation *root, const llvm::SmallSetVector< UnrealizedConversionCastOp, 8 > &existingCasts)
 Cleans up UnrealizedConversionCastOps inserted during SCF structural type conversion and/or XeGPU unrolling.
bool matchDimCollapse (ArrayRef< int64_t > src, ArrayRef< int64_t > dst, SmallVector< SmallVector< int64_t > > &collapseDims)
static SmallVector< SmallVector< Value > > genCoordinates (OpBuilder &builder, Location loc, SmallVector< Value > delinearizedId, ArrayRef< int64_t > subShapesLayout, ArrayRef< int64_t > subShape, ArrayRef< int64_t > srcShape)
static SmallVector< SmallVector< int64_t > > genStaticCoordinates (llvm::ArrayRef< int64_t > canonicalIds, llvm::ArrayRef< int64_t > layout, llvm::ArrayRef< int64_t > subShape, llvm::ArrayRef< int64_t > shape)
static SmallVector< SmallVector< int64_t > > expandBlockCoords (ArrayRef< SmallVector< int64_t > > blockStarts, ArrayRef< int64_t > subShape)
 Expands per-distribution-unit block-start coordinates into the full list of element coordinates each block covers: every element of the subShape-sized region (row-major) offset by the block start.
static bool compareDistributedCoords (xegpu::DistributeLayoutAttr self, const xegpu::DistributeLayoutAttr &other, ArrayRef< int64_t > shape, xegpu::LayoutKind level, int64_t size)
 Returns true if self and other distribute shape identically at level: every id in [0, size) owns the same coordinates under both.
static SmallVector< int64_t > mapSlicedDimsToParentSpace (const SmallVector< int64_t > &dimsToMap, ArrayRef< int64_t > sliceDims)
SmallVector< int64_t > getPermForParentLayout (ArrayRef< int64_t > sliceDims, ArrayRef< int64_t > permutation)
template<typename ArithOp>
OpFoldResult genBinOp (OpFoldResult a, OpFoldResult b, Location loc, OpBuilder &builder)
SmallVector< OpFoldResult > getBlockedOffsets (OpBuilder &builder, Location loc, ArrayRef< OpFoldResult > offsets, ArrayRef< int64_t > blockShape)

Typedef Documentation

◆ GetLayoutFnTy

using mlir::xegpu::GetLayoutFnTy = llvm::function_ref<DistributeLayoutAttr(Value)>

Callable returning the propagated layout for a given Value, used by the layout-propagation helpers below.

Definition at line 46 of file XeGPULayoutImpl.h.

◆ SubShapeAndCountFn

Initial value:
std::function<std::pair<SmallVector<int64_t>, int>(
VectorType, DistributeLayoutAttr)>

Callback type for computing sub-shape and count for 1:N (or 1:1 shape-changing) VectorType conversion.

Given a VectorType and its DistributeLayoutAttr, returns (subShape, count). A count <= 0 signals "no conversion needed"; count == 1 is a 1:1 shape-changing conversion; count > 1 produces count copies of subShape.
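
The (subShape, count) contract can be illustrated with a small standalone sketch. It does not use MLIR: Shape and ToyLayout are hypothetical stand-ins for VectorType and DistributeLayoutAttr, and convertShape and divideByLaneLayout are illustrative names, not part of the XeGPU API. It only models how a caller might interpret the returned count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for VectorType and DistributeLayoutAttr.
using Shape = std::vector<int64_t>;
struct ToyLayout {
  Shape laneLayout; // empty means "no layout attached"
};

// Same structure as the documented callback, but over the toy types above.
using ToySubShapeAndCountFn =
    std::function<std::pair<Shape, int>(const Shape &, const ToyLayout &)>;

// Applies the documented contract to one vector shape:
//   count <= 0 -> keep the original shape (no conversion needed)
//   count == 1 -> one result with the new sub-shape (1:1 shape-changing)
//   count  > 1 -> `count` results, each with the sub-shape (1:N)
std::vector<Shape> convertShape(const Shape &shape, const ToyLayout &layout,
                                const ToySubShapeAndCountFn &fn) {
  std::pair<Shape, int> r = fn(shape, layout);
  if (r.second <= 0)
    return std::vector<Shape>(1, shape);
  return std::vector<Shape>(static_cast<std::size_t>(r.second), r.first);
}

// Example callback (an assumption for illustration): divide every dimension
// by the matching lane-layout entry and report a 1:1 shape change.
std::pair<Shape, int> divideByLaneLayout(const Shape &shape,
                                         const ToyLayout &layout) {
  if (layout.laneLayout.size() != shape.size())
    return std::make_pair(Shape(), 0); // no usable layout: no conversion
  Shape sub;
  for (std::size_t i = 0; i < shape.size(); ++i) {
    if (shape[i] % layout.laneLayout[i] != 0)
      return std::make_pair(Shape(), 0);
    sub.push_back(shape[i] / layout.laneLayout[i]);
  }
  return std::make_pair(sub, 1);
}
```

With a lane layout of [1, 16], an 8x16 shape maps to a single 8x1 shape; without a layout the shape is returned unchanged.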

Definition at line 235 of file XeGPUUtils.h.

Enumeration Type Documentation

◆ LayoutKind

enum class mlir::xegpu::LayoutKind
strong

Specifies the level of a layout hierarchy for comparison or propagation.

Enumerator
Lane 
InstData 
Subgroup 

Definition at line 32 of file XeGPU.h.

Function Documentation

◆ addElementwise()

SmallVector< OpFoldResult > mlir::xegpu::addElementwise ( OpBuilder & builder,
Location loc,
ArrayRef< OpFoldResult > lhs,
ArrayRef< OpFoldResult > rhs )

Generates element-wise addition ops of two arrays with same length.

Definition at line 493 of file XeGPUUtils.cpp.

References mlir::OpBuilder::createOrFold(), mlir::getValueOrCreateConstantIndexOp(), lhs, and rhs.

Referenced by addWithRightAligned().

◆ addVectorTypeConversion()

void mlir::xegpu::addVectorTypeConversion ( TypeConverter & converter,
SubShapeAndCountFn getSubShapeAndCount,
DenseMap< Value, SmallVector< Type > > loopArgTypes )

Adds a context-aware VectorType conversion to converter (1:1 shape-changing or 1:N, depending on getSubShapeAndCount's returned count).

getSubShapeAndCount computes (subShape, count) for a VectorType and its layout; count <= 0 means no conversion needed. loopArgTypes (typically obtained from precomputeLoopBlockArgTypes) provides the pre-computed types for SCF loop block arguments (scf.while, scf.for); pass an empty map if the IR has no such loops.

Definition at line 933 of file XeGPUUtils.cpp.

References getDistributeLayoutAttr(), mlir::Value::getType(), result, and success().

Referenced by populateXeGPUSgToLaneDistributeTypeConversions(), and populateXeGPUWgToSgDistributeTypeConversions().

◆ addWithRightAligned()

SmallVector< OpFoldResult > mlir::xegpu::addWithRightAligned ( OpBuilder & builder,
Location loc,
ArrayRef< OpFoldResult > lhs,
ArrayRef< OpFoldResult > rhs )

Generates element-wise addition ops of two arrays with automatic alignment.

When the input arrays have different sizes, the shorter array is right-aligned with the longer array, and the unmatched leading elements from the longer array are preserved unchanged. This is commonly used for offset computation where higher-dimensional offsets need to be added to lower-dimensional adjustments.

Example: lhs = [l1, l2, l3], rhs = [r1, r2]. Result: [l1, l2+r1, l3+r2]
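
The alignment rule can be sketched on plain integers. This is only a model of the indexing: the real helper operates on OpFoldResult values and emits addition ops through the builder, and addRightAligned is a hypothetical name used here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Right-aligned element-wise addition over plain integers.
std::vector<int64_t> addRightAligned(const std::vector<int64_t> &lhs,
                                     const std::vector<int64_t> &rhs) {
  // Treat the longer array as the base and the shorter one as the addend.
  const std::vector<int64_t> &longer = lhs.size() >= rhs.size() ? lhs : rhs;
  const std::vector<int64_t> &shorter = lhs.size() >= rhs.size() ? rhs : lhs;

  // Leading elements of the longer array are copied through unchanged.
  std::vector<int64_t> result(longer);
  const std::size_t offset = longer.size() - shorter.size();
  for (std::size_t i = 0; i < shorter.size(); ++i)
    result[offset + i] += shorter[i];
  return result;
}
```

For lhs = [10, 20, 30] and rhs = [1, 2], the first element is preserved and the trailing two are summed pairwise.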

Definition at line 518 of file XeGPUUtils.cpp.

References addElementwise(), b, lhs, and rhs.

◆ cleanupUnrealizedConversionCasts()

void mlir::xegpu::cleanupUnrealizedConversionCasts ( Operation * root,
const llvm::SmallSetVector< UnrealizedConversionCastOp, 8 > & existingCasts )

Cleans up UnrealizedConversionCastOps inserted during SCF structural type conversion and/or XeGPU unrolling.

Folds cancelling N:1->1:N and 1:N->N:1 cast chains (inserting vector.shape_cast when shapes differ but element counts match). Unpaired pack (1:N) and unpack (N:1) casts between a single large VectorType and N identically-typed smaller VectorTypes are lowered to vector.extract_strided_slice / vector.insert_strided_slice. Dead casts are erased. Casts in existingCasts are preserved.

Definition at line 974 of file XeGPUUtils.cpp.

References createVectorWithShapeFromValues(), extractVectorsWithShapeFromValue(), mlir::Value::getType(), mlir::ValueRange::getTypes(), result, mlir::OpBuilder::setInsertionPoint(), ValueRange, and mlir::Operation::walk().

◆ compareDistributedCoords()

bool mlir::xegpu::compareDistributedCoords ( xegpu::DistributeLayoutAttr self,
const xegpu::DistributeLayoutAttr & other,
ArrayRef< int64_t > shape,
xegpu::LayoutKind level,
int64_t size )
static

Returns true if self and other distribute shape identically at level: every id in [0, size) owns the same coordinates under both.

At the Lane level, layouts that pack lane_data differently can still own the same per-lane elements in the same order; their block starts differ but the expanded per-element coordinates match. So block starts are expanded (via expandBlockCoords) before comparing, but only when it can change the result (Lane level with differing lane_data) - otherwise comparing the cheaper block starts is already exact.

TODO: Extend the same handling to the Subgroup level (sg_data repacks).

Definition at line 158 of file XeGPUDialect.cpp.

References expandBlockCoords(), and Lane.

◆ completeBlockLoadLaneLayoutFromInstData()

std::optional< DistributeLayoutAttr > mlir::xegpu::completeBlockLoadLaneLayoutFromInstData ( DistributeLayoutAttr specifiedLayout,
DistributeLayoutAttr consumerLayout,
Type elemTy,
const xegpu::uArch::BlockIOInstructionInterface * uArchInstruction,
const int subgroupSize )

Like completeBlockStoreLaneLayoutFromInstData, but for load_nd.

The consumer layout supplies the transform / transpose / packing properties; the lane factorization is recomputed from inst_data (load-side lane counts differ from the consumer's).

◆ completeBlockStoreLaneLayoutFromInstData()

std::optional< DistributeLayoutAttr > mlir::xegpu::completeBlockStoreLaneLayoutFromInstData ( DistributeLayoutAttr specifiedLayout,
Type elemTy,
const xegpu::uArch::BlockIOInstructionInterface * uArchInstruction,
const int subgroupSize )

Completes a user-provided 2D-block store_nd / prefetch_nd anchor that has only inst_data.

These ops are data sinks, so lane info is derived purely from inst_data using the shared BlockIOInstructionInterface; one helper serves both store_nd and prefetch_nd.

◆ completeDpasLaneLayoutFromInstData()

std::optional< std::tuple< DistributeLayoutAttr, DistributeLayoutAttr, DistributeLayoutAttr > > mlir::xegpu::completeDpasLaneLayoutFromInstData ( DistributeLayoutAttr aLayout,
DistributeLayoutAttr bLayout,
DistributeLayoutAttr cdLayout,
VectorType aTy,
VectorType bTy,
VectorType cdTy,
const uArch::uArch * uArch )

Completes user-provided DPAS A/B/C-D anchors that carry only inst_data by filling in lane_layout / lane_data derived from the operand shapes (mirrors the InstData branch of setupDpasLayout).

Returns nullopt if the uArch lacks the matmul instruction.

◆ completeDpasMxLaneLayoutFromInstData()

std::optional< std::tuple< DistributeLayoutAttr, DistributeLayoutAttr, DistributeLayoutAttr, DistributeLayoutAttr, DistributeLayoutAttr > > mlir::xegpu::completeDpasMxLaneLayoutFromInstData ( DistributeLayoutAttr aLayout,
DistributeLayoutAttr bLayout,
DistributeLayoutAttr cdLayout,
VectorType aTy,
VectorType bTy,
VectorType cdTy,
VectorType aScaleTy,
VectorType bScaleTy,
const uArch::uArch * uArch )

Like completeDpasLaneLayoutFromInstData, but for dpas_mx: additionally re-derives the A_scale / B_scale layouts from the completed A / B layouts.

◆ completeScatterLoadLaneLayoutFromInstData()

std::optional< DistributeLayoutAttr > mlir::xegpu::completeScatterLoadLaneLayoutFromInstData ( DistributeLayoutAttr userSpecifiedLayout,
DistributeLayoutAttr consumerLayout,
Type elemTy,
const xegpu::uArch::LoadGatherInstructionInterface * uArchInstruction,
const int subgroupSize )

If the consumer layout has only inst_data (no lane_layout/lane_data), completes it by running the corresponding scatter-style Lane-kind setup rule with inst_data as the destination shape.

The resulting lane info is merged with the consumer's inst_data so downstream setup* paths see a fully-populated layout. Returns the layout unchanged when it is null, has no inst_data, or already carries lane info; returns nullopt when the derived lane factorization does not divide the user's inst_data (an invalid inst_data).

◆ completeScatterStoreLaneLayoutFromInstData()

std::optional< DistributeLayoutAttr > mlir::xegpu::completeScatterStoreLaneLayoutFromInstData ( DistributeLayoutAttr specifiedLayout,
Type elemTy,
const xegpu::uArch::StoreScatterInstructionInterface * uArchInstruction,
const int subgroupSize )

Like completeScatterLoadLaneLayoutFromInstData, but for scatter stores (store_scatter / store_matrix).

A store is a data sink: lane info is derived purely from inst_data using the uArch's StoreScatter per-lane store width, with no consumer layout to reuse.

◆ createReductionNeutralValue()

Value mlir::xegpu::createReductionNeutralValue ( OpBuilder & builder,
Location loc,
Type type,
vector::CombiningKind kind )

Creates a constant filled with the neutral (identity) value for the given reduction kind.

For example: 0 for ADD/OR/XOR, 1 for MUL/AND, max/min signed/unsigned int for MINSI/MINUI/MAXSI/MAXUI, and +/-infinity for float min/max operations. If type is a VectorType, returns a splat vector constant; otherwise returns a scalar constant. Returns nullptr if the element type is incompatible with the requested reduction kind.
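
As a rough illustration of what "neutral value" means here, the sketch below picks identities for a subset of the kinds (add, mul, min, max) over plain C++ arithmetic types. Kind and neutralValue are hypothetical names, not the MLIR API; the sketch returns scalars only and does not model the splat-vector case or the nullptr result for incompatible element types.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical, reduced stand-in for vector::CombiningKind.
enum class Kind { Add, Mul, Min, Max };

// Returns x such that combine(x, v) == v for every v of type T.
template <typename T>
T neutralValue(Kind kind) {
  typedef std::numeric_limits<T> Lim;
  switch (kind) {
  case Kind::Add:
    return T(0);
  case Kind::Mul:
    return T(1);
  case Kind::Min:
    // min(x, v) == v requires x to be the largest value: +inf for floats,
    // the maximum integer otherwise.
    return Lim::has_infinity ? Lim::infinity() : Lim::max();
  case Kind::Max:
    // max(x, v) == v requires x to be the smallest value: -inf for floats,
    // the lowest integer otherwise (0 for unsigned types).
    return Lim::has_infinity ? -Lim::infinity() : Lim::lowest();
  }
  return T(0);
}
```

Seeding an accumulator with such a value leaves the reduction result unchanged, which is why a reduction can start from it safely.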

Definition at line 709 of file XeGPUUtils.cpp.

References mlir::DenseElementsAttr::get(), mlir::Builder::getFloatAttr(), mlir::Builder::getIntegerAttr(), mlir::Builder::getOneAttr(), and mlir::Builder::getZeroAttr().

◆ createVectorWithShapeFromValues()

Value mlir::xegpu::createVectorWithShapeFromValues ( OpBuilder & builder,
Location loc,
ValueRange values,
ArrayRef< int64_t > shape )

Create a vector of shape from a set of values using vector.insert_strided_slice.

Definition at line 449 of file XeGPUUtils.cpp.

References mlir::DenseElementsAttr::get(), mlir::getType(), mlir::ValueRange::getTypes(), mlir::Builder::getZeroAttr(), and result.

Referenced by cleanupUnrealizedConversionCasts().

◆ dropInstDataOnAttrs()

SmallVector< NamedAttribute > mlir::xegpu::dropInstDataOnAttrs ( ArrayRef< NamedAttribute > attrs)

Updates the NamedAttribute sequence by dropping inst-data information from any DistributeLayoutAttr found.

Definition at line 56 of file XeGPULayoutImpl.cpp.

◆ dropSgLayoutAndDataOnAttrs()

SmallVector< NamedAttribute > mlir::xegpu::dropSgLayoutAndDataOnAttrs ( ArrayRef< NamedAttribute > attrs)

Updates the NamedAttribute sequence by dropping sg-layout and sg-data information from any DistributeLayoutAttr found.

Definition at line 38 of file XeGPULayoutImpl.cpp.

◆ expandBlockCoords()

SmallVector< SmallVector< int64_t > > mlir::xegpu::expandBlockCoords ( ArrayRef< SmallVector< int64_t > > blockStarts,
ArrayRef< int64_t > subShape )
static

Expands per-distribution-unit block-start coordinates into the full list of element coordinates each block covers: every element of the subShape-sized region (row-major) offset by the block start.

Comparing these instead of the bare block starts lets layouts that differ only in lane_data blocking, but own the same elements in the same order, be recognized as equivalent.
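
The expansion described above can be sketched in standalone C++. This is a reimplementation from the description, not the code in XeGPUDialect.cpp; Coord and expandBlocks are hypothetical names, and the sketch assumes every block start has the same rank as subShape.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Coord = std::vector<int64_t>;

// For every block start, lists the coordinates of all elements in the
// subShape-sized region in row-major order (last dimension varies fastest).
std::vector<Coord> expandBlocks(const std::vector<Coord> &blockStarts,
                                const Coord &subShape) {
  int64_t elemsPerBlock = 1;
  for (std::size_t i = 0; i < subShape.size(); ++i)
    elemsPerBlock *= subShape[i];

  std::vector<Coord> coords;
  for (std::size_t b = 0; b < blockStarts.size(); ++b) {
    for (int64_t linear = 0; linear < elemsPerBlock; ++linear) {
      // Delinearize `linear` into a row-major offset and add the block start.
      Coord c(blockStarts[b]);
      int64_t rem = linear;
      for (std::size_t i = subShape.size(); i-- > 0;) {
        c[i] += rem % subShape[i];
        rem /= subShape[i];
      }
      coords.push_back(c);
    }
  }
  return coords;
}
```

Two blockings of the same elements then compare equal: block starts (0,0) and (0,1) with a 1x1 sub-shape expand to the same list as a single block at (0,0) with a 1x2 sub-shape, even though the bare block starts differ.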

Definition at line 132 of file XeGPUDialect.cpp.

Referenced by compareDistributedCoords().

◆ extractVectorsWithShapeFromValue()

SmallVector< Value > mlir::xegpu::extractVectorsWithShapeFromValue ( OpBuilder & builder,
Location loc,
Value value,
ArrayRef< int64_t > shape )

Extract a set of small vectors from a value with a given shape using vector.extract_strided_slice.

Definition at line 412 of file XeGPUUtils.cpp.

References mlir::computeShapeRatio(), mlir::Value::getType(), and result.

Referenced by cleanupUnrealizedConversionCasts().

◆ flattenValues()

SmallVector< Value > mlir::xegpu::flattenValues ( ArrayRef< ValueRange > values)

Flattens a set of ValueRange into a single SmallVector<Value>, i.e. converts an ArrayRef<ValueRange> into a SmallVector<Value>.

Definition at line 36 of file XeGPUUtils.cpp.

References result.

◆ genBinOp()

template<typename ArithOp>
OpFoldResult mlir::xegpu::genBinOp ( OpFoldResult a,
OpFoldResult b,
Location loc,
OpBuilder & builder )

Definition at line 1747 of file XeGPUDialect.cpp.

References b, and mlir::getValueOrCreateConstantIndexOp().

◆ genCoordinates()

SmallVector< SmallVector< Value > > mlir::xegpu::genCoordinates ( OpBuilder & builder,
Location loc,
SmallVector< Value > delinearizedId,
ArrayRef< int64_t > subShapesLayout,
ArrayRef< int64_t > subShape,
ArrayRef< int64_t > srcShape )
static

◆ genStaticCoordinates()

SmallVector< SmallVector< int64_t > > mlir::xegpu::genStaticCoordinates ( llvm::ArrayRef< int64_t > canonicalIds,
llvm::ArrayRef< int64_t > layout,
llvm::ArrayRef< int64_t > subShape,
llvm::ArrayRef< int64_t > shape )
static

Definition at line 101 of file XeGPUDialect.cpp.

◆ getBlockedOffsets()

SmallVector< OpFoldResult > mlir::xegpu::getBlockedOffsets ( OpBuilder & builder,
Location loc,
ArrayRef< OpFoldResult > offsets,
ArrayRef< int64_t > blockShape )

Definition at line 1772 of file XeGPUDialect.cpp.

References div, and rem.

◆ getChipStr()

std::optional< std::string > mlir::xegpu::getChipStr ( Operation * op)

Retrieves the chip string from the XeVM target attribute of the parent GPU module operation.

Returns the chip identifier if found, or nullopt if no GPU module parent or XeVM target attribute exists.

Definition at line 474 of file XeGPUUtils.cpp.

References mlir::Operation::getParentOfType().

◆ getConsumerLayoutAt()

xegpu::DistributeLayoutAttr mlir::xegpu::getConsumerLayoutAt ( OpOperand & operand)

Gets the expected layout for a given consumer operand.

Returns the layout required on operand: anchor ops report their declared per-operand layout directly; non-anchor ops back-derive it from their result layout via inferSourceLayoutFromResultForNonAnchorOp.

This will check if the owning operation of the consumer operand is one of the special layout users and determine the expected layout accordingly.

Definition at line 2656 of file XeGPULayoutImpl.cpp.

References getDistributeLayoutAttr(), mlir::Operation::getNumResults(), mlir::detail::IROperandBase::getOwner(), mlir::Operation::getResult(), and inferSourceLayoutFromResultForNonAnchorOp().

◆ getDistributedVectorType() [1/2]

FailureOr< VectorType > mlir::xegpu::getDistributedVectorType ( VectorType originalType,
LayoutAttr layout )

Helper to get the distributed vector type for a given vector type according to a given LayoutAttr.

◆ getDistributedVectorType() [2/2]

FailureOr< VectorType > mlir::xegpu::getDistributedVectorType ( xegpu::TensorDescType tdescTy)

If tensor descriptor has a layout attribute it is used in SIMT mode.

In this mode, the distributed vector shape is determined as follows.

Definitions:

  • lane_data_size = lane_data[0] × lane_data[1]
  • subgroup_size = lane_layout[0] × lane_layout[1]
  • distribution_unit_size = subgroup_size × lane_data_size

Case 1: Regular loads/stores. The following condition must be met:

  • tensor_desc[0] == lane_layout[0]

Distributed vector is a 1D vector with shape: [chunk_size]

Case 2: Block loads/stores. Additional definitions:

  • tensor_size = tensor_desc[0] * .. * tensor_desc[r-1] * array_length
  • n_distribution_units = tensor_size / distribution_unit_size
  • fragment_size = n_distribution_units * lane_data_size

Given the above definitions, the following conditions must be met:

  • tensor_desc[0] % (lane_layout[0] × lane_data[0]) == 0
  • tensor_desc[1] % (lane_layout[1] × lane_data[1]) == 0

Distributed vector is a 1D vector with shape: [fragment_size]
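The Case 2 arithmetic can be checked with a small standalone sketch. It assumes a 2D tensor descriptor and uses a hypothetical helper name (blockFragmentSizeSketch); the real function returns FailureOr<VectorType>, which is modelled here by returning -1 on failure.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns fragment_size for a 2D block load/store, or -1
// when the divisibility conditions of Case 2 do not hold.
int64_t blockFragmentSizeSketch(const std::vector<int64_t> &tensorDesc,
                                const std::vector<int64_t> &laneLayout,
                                const std::vector<int64_t> &laneData,
                                int64_t arrayLength) {
  assert(tensorDesc.size() == 2 && laneLayout.size() == 2 &&
         laneData.size() == 2 && "sketch handles the 2D case only");
  // tensor_desc[d] % (lane_layout[d] * lane_data[d]) == 0 for d = 0, 1.
  for (int d = 0; d < 2; ++d)
    if (tensorDesc[d] % (laneLayout[d] * laneData[d]) != 0)
      return -1;
  const int64_t laneDataSize = laneData[0] * laneData[1];
  const int64_t subgroupSize = laneLayout[0] * laneLayout[1];
  const int64_t distributionUnitSize = subgroupSize * laneDataSize;
  const int64_t tensorSize = tensorDesc[0] * tensorDesc[1] * arrayLength;
  const int64_t nDistributionUnits = tensorSize / distributionUnitSize;
  return nDistributionUnits * laneDataSize;
}
```

For example, an 8x16 descriptor with lane_layout [1, 16], lane_data [1, 1] and array_length 1 has tensor_size 128 and distribution_unit_size 16, so each lane holds a fragment of 8 elements.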

Definition at line 44 of file XeGPUUtils.cpp.

◆ getDistributeLayoutAttr() [1/2]

xegpu::DistributeLayoutAttr mlir::xegpu::getDistributeLayoutAttr ( const OpOperand & opr)

Retrieves the DistributeLayoutAttr associated with a given OpOperand.

It will first check the operand_layout_{id} of the owner operation. If not found, it will check the operand itself and its defining op.

Definition at line 186 of file XeGPUUtils.cpp.

References mlir::Operation::getAttrOfType(), mlir::detail::IROperandBase::getOwner(), getTemporaryLayoutName(), mlir::Operation::hasAttr(), and load.

◆ getDistributeLayoutAttr() [2/2]

xegpu::DistributeLayoutAttr mlir::xegpu::getDistributeLayoutAttr ( const Value value)

Retrieves the DistributeLayoutAttr associated with a given Value.

For TensorDescType values, the DistributeLayoutAttr is extracted from the TensorDescType itself. For other values, it is obtained from the attributes of the defining operation. Returns nullptr if no DistributeLayoutAttr is found.

Definition at line 149 of file XeGPUUtils.cpp.

References mlir::Operation::getAttrOfType(), getTemporaryLayout(), getTemporaryLayoutName(), mlir::Value::getType(), mlir::Operation::hasAttr(), and result.

Referenced by addVectorTypeConversion(), getConsumerLayoutAt(), getLayoutFromUsePoints(), populateXeGPUSgToLaneDistributeTypeConversionAndLegality(), and precomputeLoopBlockArgTypes().

◆ getDistVecTypeBasedOnLaneLayout()

FailureOr< VectorType > mlir::xegpu::getDistVecTypeBasedOnLaneLayout ( DistributeLayoutAttr layout,
VectorType originalType )

Helper function to get distributed vector type for a source vector type according to the lane_layout.

We simply divide each dimension of the tensor descriptor shape by the corresponding lane_layout dimension. If array_length > 1, it is prepended to the distributed shape.

Examples:

original vector shape    lane_layout    distributed vector shape
32x16                    [1, 16]        32x1
32x16                    [2, 8]         16x2
2x32x16                  [1, 16]        2x32x1

References lhs, and rhs.

Referenced by populateXeGPUSgToLaneDistributeTypeConversions().

◆ getLargestDivisor()

template<typename T>
int mlir::xegpu::getLargestDivisor ( T dim,
ArrayRef< T > candidates,
ArrayRef< T > candidateMultiples = {} )

Helper function to find a proper instruction multiple for the user-supplied sg-level data shape (given by dim).

candidates are uArch allowed shapes. candidateMultiples are uArch multiples of such shapes (i.e. block count or array length).
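One plausible reading of this search is sketched below: try every candidate, optionally scaled by each allowed multiple, and keep the largest value that divides dim. The name largestDivisorSketch is hypothetical, and returning -1 when nothing divides is an assumption about the failure convention, not something stated in this reference.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch of the divisor search; not the actual implementation.
int largestDivisorSketch(int dim, const std::vector<int> &candidates,
                         const std::vector<int> &candidateMultiples = {}) {
  int largest = -1; // assumed "not found" value
  // A multiple of 1 means the candidate shape is used as-is.
  std::vector<int> multiples = {1};
  multiples.insert(multiples.end(), candidateMultiples.begin(),
                   candidateMultiples.end());
  for (int candidate : candidates) {
    for (int multiple : multiples) {
      const int value = candidate * multiple;
      if (value > 0 && dim % value == 0 && value > largest)
        largest = value;
    }
  }
  return largest;
}
```

For instance, with dim = 64, candidates {8, 16} and multiples {2, 4}, the sketch considers 8, 16, 32 and 64 and returns 64.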

Definition at line 531 of file XeGPUUtils.cpp.

Referenced by get2DBlockIOInstDataLayout(), and getDpasInstDataLayouts().

◆ getPermForParentLayout()

SmallVector< int64_t > mlir::xegpu::getPermForParentLayout ( ArrayRef< int64_t > sliceDims,
ArrayRef< int64_t > permutation )

Definition at line 1431 of file XeGPUDialect.cpp.

◆ getTemporaryLayout()

template<typename T, typename = std::enable_if_t<std::is_same_v<T, OpOperand> || std::is_same_v<T, OpResult>>>
DistributeLayoutAttr mlir::xegpu::getTemporaryLayout ( const T & operandOrResult)

Gets the temporary distribute layout attribute for non-anchor operations (and for the offsets/masks of load/store ops, until their temporary attributes are removed). setTemporaryLayout is the matching setter.

Referenced by getDistributeLayoutAttr(), lowerToVectorReductions(), populateXeGPUSgToLaneDistributeTypeConversionAndLegality(), propagateLayouts(), xegpu::getTemporaryLayout< mlir::OpOperand >(), and xegpu::getTemporaryLayout< mlir::OpResult >().

◆ getTemporaryLayoutName() [1/2]

std::string mlir::xegpu::getTemporaryLayoutName ( const OpOperand & operand)

Return the attribute name for the OpOperand to attach DistributeLayoutAttr.

Definition at line 138 of file XeGPUUtils.cpp.

Referenced by getDistributeLayoutAttr(), getDistributeLayoutAttr(), removeLayoutAttr(), and setDistributeLayoutAttr().

◆ getTemporaryLayoutName() [2/2]

std::string mlir::xegpu::getTemporaryLayoutName ( const OpResult result)

Return the attribute name for the OpResult to attach DistributeLayoutAttr.

Definition at line 144 of file XeGPUUtils.cpp.

References result.

◆ inferBitCastSourceLayout()

DistributeLayoutAttr mlir::xegpu::inferBitCastSourceLayout ( DistributeLayoutAttr resLayout,
int resElemTyBitWidth,
int srcElemTyBitWidth )

Infers the source layout attribute for a bitcast operation given the result layout attribute, result element type bitwidth, and source element type bitwidth.

◆ inferBroadcastSourceLayout()

DistributeLayoutAttr mlir::xegpu::inferBroadcastSourceLayout ( DistributeLayoutAttr resLayout,
ArrayRef< int64_t > resShape,
ArrayRef< int64_t > srcShape )

Infers the source layout attribute for a broadcast operation given the result layout attribute, result shape, and source shape.

◆ inferDeinterleaveSourceLayout()

DistributeLayoutAttr mlir::xegpu::inferDeinterleaveSourceLayout ( DistributeLayoutAttr resLayout)

Infers the source layout attribute for a deinterleave operation given the result layout attribute.

Deinterleave halves the innermost dimension size.

◆ inferExtractSourceLayout()

DistributeLayoutAttr mlir::xegpu::inferExtractSourceLayout ( DistributeLayoutAttr resLayout,
ArrayRef< int64_t > resShape,
ArrayRef< int64_t > srcShape )

Infers the source layout attribute for an extract operation.

Adds leading dimensions to the source layout to match the source shape size.

◆ inferInsertSourceLayout()

DistributeLayoutAttr mlir::xegpu::inferInsertSourceLayout ( DistributeLayoutAttr resLayout,
ArrayRef< int64_t > resShape,
ArrayRef< int64_t > srcShape )

Infers the source layout attribute for an insert operation.

Uses the same logic as inferInsertStridedSliceSourceLayout.

◆ inferInsertStridedSliceSourceLayout()

DistributeLayoutAttr mlir::xegpu::inferInsertStridedSliceSourceLayout ( DistributeLayoutAttr resLayout,
ArrayRef< int64_t > resShape,
ArrayRef< int64_t > srcShape )

Infers the source layout attribute for an insert strided slice operation given the result layout attribute, result shape, and source shape.

Removes leading dimensions from the result layout to match the source shape size.

◆ inferInterleaveSourceLayout()

DistributeLayoutAttr mlir::xegpu::inferInterleaveSourceLayout ( DistributeLayoutAttr resLayout)

Infers the source layout attribute for an interleave operation given the result layout attribute.

Interleave doubles the innermost dimension size.

◆ inferMaskOffsetLayoutForScatterIO()

DistributeLayoutAttr mlir::xegpu::inferMaskOffsetLayoutForScatterIO ( DistributeLayoutAttr payloadLayout,
int chunkSize )

Infers the layout attribute for the mask and offset operands of chunked loads and stores, given the anchor layout attribute of the value being loaded or stored.

◆ inferMultiReductionSourceLayout()

DistributeLayoutAttr mlir::xegpu::inferMultiReductionSourceLayout ( DistributeLayoutAttr resLayout,
SmallVector< int64_t > reduceDims )

Infers the source layout attribute for a multi-reduction operation given the result layout attribute and the reduced dims.

◆ inferReductionSourceLayout()

DistributeLayoutAttr mlir::xegpu::inferReductionSourceLayout ( DistributeLayoutAttr resLayout)

Infers the source layout attribute for a reduction operation given the result layout attribute.

◆ inferShapeCastSourceLayout()

DistributeLayoutAttr mlir::xegpu::inferShapeCastSourceLayout ( DistributeLayoutAttr resLayout,
ArrayRef< int64_t > resShape,
ArrayRef< int64_t > srcShape )

Infers the source layout attribute for a shape cast operation given the result layout attribute, result shape, and source shape.

◆ inferSourceLayoutFromResultForNonAnchorOp()

DistributeLayoutAttr mlir::xegpu::inferSourceLayoutFromResultForNonAnchorOp ( OpOperand & operand,
DistributeLayoutAttr resLayout )

Infers the source layout attribute for an operand using result layout attribute.

Referenced by getConsumerLayoutAt(), and propagateResultsToRegularOperands().

◆ inferTransposeSourceLayout()

DistributeLayoutAttr mlir::xegpu::inferTransposeSourceLayout ( DistributeLayoutAttr resLayout,
ArrayRef< int64_t > permutation )

Infers the source layout attribute for a transpose operation given the result layout attribute and permutation.

◆ isTriviallyRematerializable()

bool mlir::xegpu::isTriviallyRematerializable ( Operation * op)

Returns true if op is safe and cheap to clone: it has no side effects, no regions, and all of its operands are themselves trivially rematerializable (e.g. vector.step, splat arith.constant, or vector.create_mask whose operands are constants).

Definition at line 132 of file XeGPULayoutImpl.cpp.

References mlir::Operation::getNumRegions(), mlir::Operation::getOperands(), mlir::isMemoryEffectFree(), and isTriviallyRematerializable().

Referenced by isTriviallyRematerializable().

◆ lowerCrossLaneReductionToShuffles()

Value mlir::xegpu::lowerCrossLaneReductionToShuffles ( TypedValue< VectorType > src,
TypedValue< VectorType > acc,
vector::CombiningKind kind,
int64_t reductionDim,
int64_t reductionSize,
Location loc,
PatternRewriter & rewriter )

Lowers cross-lane reductions to shuffle operations on a 2D vector.

Extracts slices along the reduction dimension, performs subgroup reductions with shuffles across reductionSize work-items, and inserts the results back into an accumulator vector.
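The idea of combining values across work-items with shuffles can be illustrated with a host-side simulation. The sketch below uses an XOR (butterfly) exchange pattern with addition as the combining kind and assumes reductionSize is a power of two; whether the actual lowering emits exactly this pattern is not stated here, and butterflyReduceSketch is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simulates a butterfly reduction: lanes[i] is the value held by work-item i.
// After log2(n) exchange steps every lane holds the sum of all inputs.
std::vector<int> butterflyReduceSketch(std::vector<int> lanes) {
  const size_t n = lanes.size();
  assert(n != 0 && (n & (n - 1)) == 0 && "lane count must be a power of two");
  for (size_t offset = n / 2; offset > 0; offset /= 2) {
    // "Shuffle": every lane reads the value of its partner lane.
    std::vector<int> shuffled(n);
    for (size_t lane = 0; lane < n; ++lane)
      shuffled[lane] = lanes[lane ^ offset];
    // Combine the local value with the shuffled one.
    for (size_t lane = 0; lane < n; ++lane)
      lanes[lane] += shuffled[lane];
  }
  return lanes;
}
```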

Definition at line 641 of file XeGPUUtils.cpp.

References mlir::DenseElementsAttr::get(), mlir::Value::getType(), mlir::Builder::getZeroAttr(), mlir::vector::makeArithReduction(), and subgroupReduction().

◆ lowerToVectorReductions()

Value mlir::xegpu::lowerToVectorReductions ( TypedValue< VectorType > src,
TypedValue< VectorType > acc,
vector::CombiningKind kind,
int64_t reductionDim,
Location loc,
PatternRewriter & rewriter )

Given the src and acc arguments from a vector::MultiDimReductionOp, lower to a set of vector::ReductionOp ops over 1D slices extracted from src.

The reduction is performed along reductionDim. The result is a vector with the same shape as acc. TODO: Only 2D to 1D reduction is supported for now.
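The 2D to 1D case can be pictured with a plain C++ sketch in which each 1D slice along reductionDim is reduced and combined with the matching accumulator element. The function name reduce2DTo1DSketch is hypothetical and addition stands in for the combining kind.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: reduce a row-major 2D `src` along `reductionDim`,
// starting from `acc`. The result has the same shape as `acc`.
std::vector<int> reduce2DTo1DSketch(const std::vector<std::vector<int>> &src,
                                    const std::vector<int> &acc,
                                    int reductionDim) {
  assert(reductionDim == 0 || reductionDim == 1);
  const size_t rows = src.size();
  const size_t cols = rows == 0 ? 0 : src[0].size();
  const size_t resultSize = reductionDim == 0 ? cols : rows;
  const size_t sliceLen = reductionDim == 0 ? rows : cols;
  assert(acc.size() == resultSize && "acc must match the reduced shape");
  std::vector<int> result = acc;
  for (size_t i = 0; i < resultSize; ++i) {
    // One 1D slice per result element, reduced into the accumulator.
    for (size_t k = 0; k < sliceLen; ++k)
      result[i] += reductionDim == 0 ? src[k][i] : src[i][k];
  }
  return result;
}
```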

Definition at line 564 of file XeGPUUtils.cpp.

References mlir::DenseElementsAttr::get(), getTemporaryLayout(), mlir::Value::getType(), mlir::Builder::getZeroAttr(), and setTemporaryLayout().

◆ mapSlicedDimsToParentSpace()

SmallVector< int64_t > mlir::xegpu::mapSlicedDimsToParentSpace ( const SmallVector< int64_t > & dimsToMap,
ArrayRef< int64_t > sliceDims )
static

Definition at line 1269 of file XeGPUDialect.cpp.

◆ matchDimCollapse()

bool mlir::xegpu::matchDimCollapse ( ArrayRef< int64_t > src,
ArrayRef< int64_t > dst,
SmallVector< SmallVector< int64_t > > & collapseDims )

Definition at line 1093 of file XeGPUUtils.cpp.

◆ matchSplitDimExpansion()

bool mlir::xegpu::matchSplitDimExpansion ( ArrayRef< int64_t > src,
ArrayRef< int64_t > dst,
SmallVector< SmallVector< int64_t > > & splitDimGroups )

Definition at line 826 of file XeGPUUtils.cpp.

◆ matchUnitDimExpansion()

bool mlir::xegpu::matchUnitDimExpansion ( ArrayRef< int64_t > src,
ArrayRef< int64_t > dst,
SmallVector< int64_t > & expandedUnitDims )

Definition at line 806 of file XeGPUUtils.cpp.

◆ populateXeGPUArrayLengthOptimizationPatterns()

void mlir::xegpu::populateXeGPUArrayLengthOptimizationPatterns ( RewritePatternSet & patterns)

Appends patterns for array length optimization into patterns.

Definition at line 292 of file XeGPUArrayLengthOptimization.cpp.

References mlir::RewritePatternSet::add(), and mlir::RewritePatternSet::getContext().

◆ populateXeGPUMoveFuncBodyToWarpOpPatterns()

void mlir::xegpu::populateXeGPUMoveFuncBodyToWarpOpPatterns ( RewritePatternSet & patterns)

Appends patterns for moving function body into gpu.warp_execute_on_lane0 op.

References options, and target.

◆ populateXeGPUPeepHoleOptimizerPatterns()

void mlir::xegpu::populateXeGPUPeepHoleOptimizerPatterns ( RewritePatternSet & patterns)

Appends patterns for optimizing block load operations into patterns.

Definition at line 554 of file XeGPUPeepHoleOptimizer.cpp.

References mlir::RewritePatternSet::add(), and mlir::RewritePatternSet::getContext().

◆ populateXeGPUSgToLaneDistributeTypeConversionAndLegality()

void mlir::xegpu::populateXeGPUSgToLaneDistributeTypeConversionAndLegality ( TypeConverter & typeConverter,
RewritePatternSet & patterns,
ConversionTarget & target,
Operation * topLevelOp )

Defines type conversions and legality for XeGPU subgroup to lane distribution and appends the required conversion patterns into patterns.

Appends patterns for XeGPU subgroup to lane distribution into patterns.

Definition at line 1873 of file XeGPUSgToLaneDistribute.cpp.

References mlir::RewritePatternSet::add(), mlir::RewritePatternSet::getContext(), getDistributeLayoutAttr(), getTemporaryLayout(), mlir::OpTrait::hasElementwiseMappableTraits(), populateXeGPUSgToLaneDistributeTypeConversions(), and target.

◆ populateXeGPUSgToLaneDistributeTypeConversions()

void mlir::xegpu::populateXeGPUSgToLaneDistributeTypeConversions ( TypeConverter & typeConverter,
Operation * topLevelOp )

Define only the type conversions needed for XeGPU subgroup to lane distribution.

Definition at line 1843 of file XeGPUSgToLaneDistribute.cpp.

References addVectorTypeConversion(), getDistVecTypeBasedOnLaneLayout(), and precomputeLoopBlockArgTypes().

Referenced by populateXeGPUSgToLaneDistributeTypeConversionAndLegality().

◆ populateXeGPUSubgroupDistributePatterns()

void mlir::xegpu::populateXeGPUSubgroupDistributePatterns ( RewritePatternSet & patterns)

Appends patterns for XeGPU SIMT distribution into patterns.

◆ populateXeGPUUnrollPatterns()

void mlir::xegpu::populateXeGPUUnrollPatterns ( RewritePatternSet & patterns,
const UnrollOptions & options )

Collects a set of patterns to unroll xegpu operations to smaller shapes.

Users can control whether an operation is unrolled, as well as its target shape, via the options structure (by setting filterConstraint and nativeShape respectively; both are function refs taking the op as input). An op is unrolled to the targetShape as follows, for each of its operands:

  1. The unrolled type unrolledType and the number of unrolled instances numUnrolledInstances are computed from the targetShape.
  2. Each operand is packed: ExtractStridedSlice ops are created to break up vector operands, and builtin UnrealizedConversionCastOp ops are created to break up TensorDesc operands.
  3. The original op is cloned numUnrolledInstances times, once for each result.
  4. The results are unpacked: InsertStridedSlice ops are inserted for VectorType results, and builtin UnrealizedConversionCastOp ops are inserted for TensorDescType results, to re-assemble the slices into the original shape.

Definition at line 1149 of file XeGPUUnroll.cpp.

References mlir::RewritePatternSet::add(), mlir::RewritePatternSet::getContext(), and options.

◆ populateXeGPUWgToSgDistributePatterns()

void mlir::xegpu::populateXeGPUWgToSgDistributePatterns ( RewritePatternSet & patterns)

Appends patterns for XeGPU workgroup to subgroup distribution into patterns.

Definition at line 1526 of file XeGPUWgToSgDistribute.cpp.

References mlir::RewritePatternSet::add(), and mlir::RewritePatternSet::getContext().

◆ populateXeGPUWgToSgDistributeTypeConversions()

void mlir::xegpu::populateXeGPUWgToSgDistributeTypeConversions ( TypeConverter & converter,
Operation * topLevelOp )

Define the type conversions needed for XeGPU workgroup to subgroup distribution.

This includes a context-aware 1:N conversion for VectorType (using the distribute layout attribute on the Value) and a 1:N conversion for TensorDescType.

Definition at line 1483 of file XeGPUWgToSgDistribute.cpp.

References addVectorTypeConversion(), mlir::Type::getContext(), precomputeLoopBlockArgTypes(), result, and success().

◆ precomputeLoopBlockArgTypes()

DenseMap< Value, SmallVector< Type > > mlir::xegpu::precomputeLoopBlockArgTypes ( Operation * topLevelOp,
SubShapeAndCountFn getSubShapeAndCount )

Pre-computes distributed VectorType mappings for every value carried through an SCF loop under topLevelOp (1:1 shape-changing or 1:N): the region block args (scf.while before/after args, scf.for iter_args), the loop results, and the terminator operands feeding them.

Each is derived from a single source – the layout of the feeding value (loop init or scf.condition operand) – and keyed by Value, because the SCF converters detach/replace the loop body mid-conversion, after which a layout query on a block arg returns null. Recording results and terminator operands lets a 1:N pass resolve them from the map after stripping the loop op's transient layout attrs. scf.if has no loop-carried block args and needs no entry.

Definition at line 880 of file XeGPUUtils.cpp.

References getDistributeLayoutAttr(), mlir::Value::getType(), and mlir::Operation::walk().

Referenced by populateXeGPUSgToLaneDistributeTypeConversions(), and populateXeGPUWgToSgDistributeTypeConversions().

◆ propagateLayouts()

◆ propagateRegionArgsToInits()

LogicalResult mlir::xegpu::propagateRegionArgsToInits ( RegionBranchOpInterface regionOp,
GetLayoutFnTy getLayoutOfValue )

Propagate layouts from a region branch op's region entry block arguments back to its init operands.

The block argument's layout is obtained via getLayoutOfValue; the matching layout is then recorded on each init operand that flows into that block argument (e.g. scf.for's iter_args inits), and on tensor descriptor block argument types.

Referenced by propagateLayouts(), and recoverTemporaryLayouts().

◆ recoverTemporaryLayouts()

bool mlir::xegpu::recoverTemporaryLayouts ( Operation * rootOp)

Attach layout attributes to all vector-type operands of operations within the given operation's nested region.

Reports an error if any vector operand lacks a layout attribute.

Definition at line 307 of file XeGPULayoutImpl.cpp.

References getLayoutFromUsePoints(), propagateRegionArgsToInits(), propagateRegionResultsToYieldOperands(), propagateResultsToRegularOperands(), removeTemporaryLayoutAttrs(), mlir::Operation::walk(), and walkRegionBackward().

◆ registerTransformDialectExtension()

void mlir::xegpu::registerTransformDialectExtension ( DialectRegistry & registry)

◆ removeLayoutAttr()

template<typename T, typename = std::enable_if_t<std::is_same_v<T, OpOperand> || std::is_same_v<T, OpResult>>>
void mlir::xegpu::removeLayoutAttr ( const T & operandOrResult)

◆ removeLayoutAttrs()

void mlir::xegpu::removeLayoutAttrs ( Operation * op)

Removes the DistributeLayoutAttr for each OpOperand and OpResult of the given operation if they exist.

If the operation contains regions, it is also applied recursively to the contained operations.

Definition at line 348 of file XeGPULayoutImpl.cpp.

References mlir::Operation::getAttrs(), mlir::Operation::removeAttr(), and mlir::Operation::walk().

◆ removeTemporaryLayoutAttrs()

void mlir::xegpu::removeTemporaryLayoutAttrs ( Operation * op)

Removes the temporary layout attributes for each OpOperand and OpResult of the given operation.

Recursive for contained operations if the given operation contains regions.

Definition at line 361 of file XeGPULayoutImpl.cpp.

References mlir::Operation::getDiscardableAttrs(), mlir::Operation::removeDiscardableAttr(), and mlir::Operation::walk().

Referenced by recoverTemporaryLayouts().

◆ requirePacked()

bool mlir::xegpu::requirePacked ( const DistributeLayoutAttr layout)

Helper function to check if the layout is packed.

Layout is packed if it is 2D and lane_data[0] != 1 (data packed from col dimension). TODO: Move to target info.

◆ requireTranspose()

bool mlir::xegpu::requireTranspose ( const DistributeLayoutAttr layout,
const uArch::uArch * uArch )

Helper function to check if the layout requires a transpose effect.

◆ resolveLayoutConflicts()

LogicalResult mlir::xegpu::resolveLayoutConflicts ( Operation * target)

Definition at line 1897 of file XeGPUPropagateLayout.cpp.

References target.

◆ setDistributeLayoutAttr() [1/2]

void mlir::xegpu::setDistributeLayoutAttr ( const OpOperand & opr,
const DistributeLayoutAttr layout )

[to-be-deprecated] Sets the DistributeLayoutAttr for a given OpOperand. Users should use setAnchorLayout instead.

Definition at line 325 of file XeGPUUtils.cpp.

References mlir::detail::IROperandBase::getOwner(), getTemporaryLayoutName(), mlir::Operation::hasAttrOfType(), and mlir::Operation::setAttr().

◆ setDistributeLayoutAttr() [2/2]

void mlir::xegpu::setDistributeLayoutAttr ( const OpResult & Result,
const DistributeLayoutAttr layout )

[to-be-deprecated] Sets the DistributeLayoutAttr for a given OpResult. Users should use setAnchorLayout instead.

References result.

Referenced by updateControlFlowOps(), and updateOp().

◆ setTemporaryLayout()

template<typename T, typename = std::enable_if_t<std::is_same_v<T, OpOperand> || std::is_same_v<T, OpResult>>>
void mlir::xegpu::setTemporaryLayout ( const T & operandOrResult,
const DistributeLayoutAttr layout )

◆ setupBitCastResultLayout()

xegpu::DistributeLayoutAttr mlir::xegpu::setupBitCastResultLayout ( xegpu::LayoutKind layoutKind,
VectorType srcVecTy,
VectorType resVecTy,
DistributeLayoutAttr consumerLayout,
const uArch::uArch * uArch )

Sets up the result layout for a bitcast operation based on element type bitwidths.

This ensures the source layout can always be derived from the result layout.

When casting from a wider to a narrower element type (srcElemTyBitWidth > resElemTyBitWidth), the innermost-dimension data sizes of the result layout (sgData, instData, or laneData) are multiplied by the bitwidth ratio, so that the result layout can be divided back into a valid source layout during inference. When casting from a narrower to a wider element type, no adjustment is needed and the consumer layout is returned directly.

Examples:

  1. Casting f32 -> f16 (32-bit to 16-bit, bitWidthRatio = 2):
     Consumer layout: instData=[1, 16], subgroupSize=16
     Source shape: [8, 32]
     Result layout: instData=[1, 32] (16 * 2)
     The innermost dimension is multiplied by 2 to maintain consistency.
  2. Casting f32 -> i8 (32-bit to 8-bit, bitWidthRatio = 4):
     Consumer layout: instData=[1, 16], subgroupSize=16
     Source shape: [4, 128]
     Result layout: instData adjusted from [1, 16] to [1, 64] (16 * 4)
  3. Casting i8 -> i32 (8-bit to 32-bit, bitWidthRatio = 1/4):
     Consumer layout: laneLayout=[1, 16], laneData=[1, 4]
     No adjustment needed; the consumer layout is returned directly.
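The arithmetic in the three examples can be reproduced with a small sketch on a plain data-size vector. scaleInnermostForBitCastSketch is a hypothetical name; the real function works on DistributeLayoutAttr and goes through adjustInnermostDimForDivisibility, whose exact rules are not shown in this reference, so only the scaling seen in the examples is modelled.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch: scale the innermost data size by the bitwidth ratio
// when the result element type is narrower than the source element type.
std::vector<int64_t> scaleInnermostForBitCastSketch(std::vector<int64_t> data,
                                                    int srcBits, int resBits) {
  assert(!data.empty() && srcBits > 0 && resBits > 0);
  if (srcBits > resBits) {
    assert(srcBits % resBits == 0 && "sketch assumes an integral ratio");
    data.back() *= srcBits / resBits;
  }
  // Widening casts (srcBits <= resBits) keep the consumer layout unchanged.
  return data;
}
```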

Definition at line 2419 of file XeGPULayoutImpl.cpp.

References adjustInnermostDimForDivisibility().

◆ setupDpasLayout()

std::optional< std::tuple< DistributeLayoutAttr, DistributeLayoutAttr, DistributeLayoutAttr > > mlir::xegpu::setupDpasLayout ( LayoutKind layoutKind,
VectorType aTy,
VectorType bTy,
VectorType cdTy,
DistributeLayoutAttr consumerLayout,
int numSg,
const uArch::uArch * uArch )

Sets up the anchor layouts for dpas operands (A, B, and C/D).

The numSg and consumerLayout (optional) are only used by sg layout creation.

◆ setupDpasMxLayout()

std::optional< std::tuple< DistributeLayoutAttr, DistributeLayoutAttr, DistributeLayoutAttr, DistributeLayoutAttr, DistributeLayoutAttr > > mlir::xegpu::setupDpasMxLayout ( LayoutKind layoutKind,
VectorType aTy,
VectorType bTy,
VectorType cdTy,
VectorType aScaleTy,
VectorType bScaleTy,
DistributeLayoutAttr consumerLayout,
int numSg,
const uArch::uArch * uArch )

Sets up the anchor layouts for dpas_mx operands (A, B, C/D, A_scale, and B_scale).

The numSg and consumerLayout (optional) are only used by sg layout creation. A_scale and B_scale are optional.

◆ setupInsertStridedSliceResultLayout()

DistributeLayoutAttr mlir::xegpu::setupInsertStridedSliceResultLayout ( LayoutKind layoutKind,
VectorType srcVectorTy,
VectorType resVectorTy,
DistributeLayoutAttr consumerLayout,
const uArch::uArch * uArch )

Sets up the result layout for an insert strided slice operation.

Creates a result layout based on the specified layout kind (InstData or Lane).

◆ setupInterleaveResultLayout()

xegpu::DistributeLayoutAttr mlir::xegpu::setupInterleaveResultLayout ( xegpu::LayoutKind layoutKind,
VectorType srcVecTy,
VectorType resVecTy,
DistributeLayoutAttr consumerLayout,
const uArch::uArch * uArch )

Sets up the result layout for an interleave operation to ensure the source layout can be safely derived.

Interleave doubles the innermost dimension, so the result layout must ensure that laneData is a multiple of 2 (and therefore at least 2), and that instData is divisible by innermostDimLaneLayout * 2.

Example: Interleave: vector<128x256xf4> -> vector<128x512xf4>
Consumer layout: laneLayout=[1, 16], laneData=[1, 4], instData=[1, 64]
Result layout adjustment to ensure the source can be safely inferred:

  • laneData must be >= 2 and multiple of 2 (so source = laneData/2 is valid)
  • instData must be divisible by (16 * 2 = 32) (so source = instData/2 is valid)
  • Adjusted instData: ensure (instData % 32 == 0)
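The conditions listed above can be written as a predicate on the innermost dimension, together with the halving that source-layout inference would then perform. This is a sketch only: the InnermostLayout struct and both function names are hypothetical, and the way the real function repairs a layout that fails the check is not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Innermost-dimension entries of a result layout (hypothetical container).
struct InnermostLayout {
  int64_t laneLayout;
  int64_t laneData;
  int64_t instData;
};

// True if halving laneData and instData yields a valid source layout.
bool sourceIsDerivableSketch(const InnermostLayout &res) {
  return res.laneData >= 2 && res.laneData % 2 == 0 &&
         res.instData % (res.laneLayout * 2) == 0;
}

// Source layout for the interleave operand: the innermost sizes are halved.
InnermostLayout deriveInterleaveSourceSketch(const InnermostLayout &res) {
  assert(sourceIsDerivableSketch(res));
  return {res.laneLayout, res.laneData / 2, res.instData / 2};
}
```

For the example layout (laneLayout 16, laneData 4, instData 64) the check passes and the derived source has laneData 2 and instData 32.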

Definition at line 2462 of file XeGPULayoutImpl.cpp.

References adjustInnermostDimForDivisibility().

◆ setupLoadGatherAnchorLayout()

DistributeLayoutAttr mlir::xegpu::setupLoadGatherAnchorLayout ( LayoutKind layoutKind,
VectorType vectorTy,
int contigChunkSize,
DistributeLayoutAttr consumerLayout,
const uArch::uArch * uArch )

Sets up the anchor layout for a load gather operation.

◆ setupLoadMatrixAnchorLayout()

DistributeLayoutAttr mlir::xegpu::setupLoadMatrixAnchorLayout ( LayoutKind layoutKind,
VectorType vectorTy,
int contigChunkSize,
DistributeLayoutAttr consumerLayout,
const uArch::uArch * uArch )

Sets up the anchor layout for a load matrix operation.

◆ setupLoadNdAnchorLayout()

DistributeLayoutAttr mlir::xegpu::setupLoadNdAnchorLayout ( LayoutKind layoutKind,
VectorType vectorTy,
DistributeLayoutAttr consumerLayout,
int numSg,
const uArch::uArch * uArch )

Sets up the anchor layout for a load_nd operation.

LoadNd takes a (downstream) consumer layout and validates it against uArch constraints; when valid, the consumer's inst_data / sg_layout are honored. Otherwise, defaults derived from uArch block parameters are used. consumerLayout must be provided. numSg is only used for Subgroup-kind layouts when the consumer does not already provide an sg_layout.

◆ setupMultiReductionResultLayout()

xegpu::SliceAttr mlir::xegpu::setupMultiReductionResultLayout ( xegpu::LayoutKind layoutKind,
VectorType srcVecTy,
DistributeLayoutAttr consumerLayout,
SmallVector< int64_t > reductionDims,
int numSg,
const uArch::uArch * uArch )

Note on the consumerLayout argument used by the consumer-driven setup* / complete* helpers below:

Sets up layout for reduction operations by creating a SliceAttr for the result.

Layout propagation is a backward dataflow analysis, so a producer learns its consumers' demands one at a time. The consumerLayout passed to these helpers is the single layout that the first consumer to reach the producer has requested (see getConsumerLayoutAt); these helpers do not pick among, or merge, multiple consumers, and they do not reason about cost (e.g. a consumer inside a loop vs. one outside). If a producer has several consumers with conflicting layout demands, only the first-arriving one shapes the producer's anchor layout here; any later, inconsistent consumer is left as-is and reconciled afterwards by the layout conflict resolution process (ResolveLayoutConflicts), which inserts a convert_layout op on that edge. So these helpers can always assume exactly one (possibly null) consumer layout to honor. Sets up layout for Multi-Reduction operations by creating a SliceAttr for the result.

This function first attempts to construct a source layout that, when sliced along reduction dimensions, produces a result layout compatible with the consumer's preferred layout. This minimizes data redistribution overhead. The SliceAttr for the result is then created based on the derived source layout and the specified reduction dimensions.

Algorithm Overview:

For subgroup layouts, it first tries to align the source layout's subgroup layout and data with the consumer's layout on non-reduction dimensions. Then, it distributes remaining subgroups across reduction dimensions. This avoids subgroup data redistribution overhead between the reduced result and its consumer. When the consumer layout is a slice layout, it attempts to reuse the slice layout's parent layout for the source to further minimize potential data redistribution.

This is a best-effort alignment, not a hard constraint: the goal is only to pick a legal source layout that minimizes redistribution against the (single, first-arriving) consumer layout. There is no failure path - when the consumer's slice layout cannot be reused as-is (example 2 below), the function falls back to distributing all subgroups on the non-reduction dimensions first and the remainder on the reduction dimensions, which always yields a valid source layout. If the resulting source layout still differs from what some consumer expects (e.g. a second, inconsistent consumer), that mismatch is reconciled later by the layout conflict resolution process (ResolveLayoutConflicts), which inserts a convert_layout op - this function never has to give up.
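The fallback described above (non-reduction dimensions first, remainder on the reduction dimensions) can be illustrated with a small standalone sketch. This is not the code in XeGPULayoutImpl.cpp: the SgDistribution struct, the distributeSubgroups name, and the gcd-based greedy split are assumptions made for illustration, and the sketch ignores any alignment with a consumer layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical container used only for this illustration; not an MLIR type.
struct SgDistribution {
  std::vector<int64_t> sgLayout; // number of subgroups along each dimension
  std::vector<int64_t> sgData;   // elements owned by one subgroup per dimension
};

// Greedy sketch: hand subgroups to the non-reduction dims first, then give
// whatever is left to the reduction dims. The gcd-based split is an
// assumption; it simply guarantees that sgLayout[d] divides shape[d].
inline SgDistribution
distributeSubgroups(const std::vector<int64_t> &shape,
                    const std::vector<int64_t> &reductionDims, int64_t numSg) {
  const std::size_t rank = shape.size();
  std::vector<bool> isReduction(rank, false);
  for (int64_t d : reductionDims) {
    assert(d >= 0 && static_cast<std::size_t>(d) < rank &&
           "reduction dim out of range");
    isReduction[static_cast<std::size_t>(d)] = true;
  }

  SgDistribution out{std::vector<int64_t>(rank, 1), shape};
  int64_t remaining = numSg;
  // Pass 0 visits the non-reduction dims, pass 1 visits the reduction dims.
  for (int pass = 0; pass < 2; ++pass) {
    for (std::size_t d = 0; d < rank; ++d) {
      const bool reduced = isReduction[d];
      if (reduced != (pass == 1))
        continue;
      const int64_t take = std::gcd(remaining, shape[d]);
      out.sgLayout[d] = take;
      out.sgData[d] = shape[d] / take;
      remaining /= take;
    }
  }
  return out;
}
```

Under these assumptions, distributeSubgroups({32, 128}, {1}, 32) yields sg_layout=[32, 1] and sg_data=[1, 128], the source layout of example 2a below. If numSg does not factor over the shape, some subgroups stay unassigned in this sketch; how the real implementation treats that case is not covered here.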

For the InstData and Lane layout kinds only the innermost two dimensions are distributed; all leading dimensions are assumed to be unit dimensions. This assumption is checked via leadingDimsAreUnit. The lane_layout and lane_data are computed by computeReductionLaneLayoutAndData, which picks a layout that minimizes cross-lane reduction (reducing within a lane when only one of the innermost two dims is a reduction dim). The inst_data is simply the element-wise product lane_layout * lane_data.
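The relation inst_data = lane_layout * lane_data is a plain element-wise product. A minimal standalone sketch, where instDataFromLane is a hypothetical helper name rather than a function in this namespace:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Element-wise product of lane_layout and lane_data (illustrative helper,
// not part of the XeGPU API).
inline std::vector<int64_t>
instDataFromLane(const std::vector<int64_t> &laneLayout,
                 const std::vector<int64_t> &laneData) {
  assert(laneLayout.size() == laneData.size() && "rank mismatch");
  std::vector<int64_t> instData(laneLayout.size());
  for (std::size_t i = 0; i < laneLayout.size(); ++i)
    instData[i] = laneLayout[i] * laneData[i];
  return instData;
}
```

For instance, lane_layout=[1, 16] with lane_data=[1, 2] gives inst_data=[1, 32].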

The function returns the result layout (the SliceAttr). The source layout it decides on is the parent of that slice; both are listed below so the relationship is explicit.

Examples:

  1. Subgroup layout - Row reduction on 2D tensor: srcShape=[32, 128], reductionDims=[1], resShape=[32], subgroupSize=16, NumSg=32
    • Consumer Layout: #xegpu.slice<#xegpu.layout<sg_layout=[4, 8], sg_data=[8, 8]>, dims = [1]>
    • Source Layout (decided by this function): #xegpu.layout<sg_layout=[4, 8], sg_data=[8, 16]>
    • Result Layout (returned): #xegpu.slice<#xegpu.layout<sg_layout=[4, 8], sg_data=[8, 16]>, dims = [1]>
    The consumer slices exactly the reduction dim, so its parent layout is reused for the source: sg_layout is kept, but the source's sg_data on the reduction dim is grown from 8 to 16 (= srcShape[1] / sg_layout[1] = 128 / 8) so the source tile is evenly distributed over the reduction dim. Slicing that source over dim 1 reproduces the consumer.
  2. Subgroup layout - Same shapes as above, but the consumer doesn't have a reusable slice layout, so the algorithm distributes all subgroups on the non-reduction dims first and the remainder on the reduction dims.
    2a.
    • Consumer Layout: #xegpu.layout<sg_layout=[32], sg_data=[1]>
    • Source Layout (decided by this function): #xegpu.layout<sg_layout=[32, 1], sg_data=[1, 128]>
    • Result Layout (returned): #xegpu.slice<#xegpu.layout<sg_layout=[32, 1], sg_data=[1, 128]>, dims = [1]>
    All 32 subgroups land on the non-reduction dim 0; the reduction dim 1 gets the leftover (sg_layout=1, so the whole length 128 lives in one subgroup's sg_data).
    2b.
    • Consumer Layout: #xegpu.slice<#xegpu.layout<sg_layout=[8, 2, 4], sg_data=[4, 64, 32]>, dims = [1, 2]>
    • Source Layout (decided by this function): #xegpu.layout<sg_layout=[8, 4], sg_data=[4, 32]>
    • Result Layout (returned): #xegpu.slice<#xegpu.layout<sg_layout=[8, 4], sg_data=[4, 32]>, dims = [1]>
    The consumer slices dims [1, 2], which do not match this op's reductionDims, so it can't be reused as-is; subgroups are re-distributed (non-reduction dim first, then reduction dim).
  3. Lane layout - Default (lanes on innermost dim): srcShape=[32, 64], reductionDims=[0], subgroupSize=16
    • Source Layout (decided by this function): laneLayout=[1, 16], laneData=[1, 1] (returned sliced over dim 0). The innermost dim is not reduced, so lanes stay on it.
  4. Lane layout - Switch (lanes moved off the reduction dim): srcShape=[32, 64], reductionDims=[1], subgroupSize=16
    • Source Layout (decided by this function): laneLayout=[16, 1], laneData=[1, 1] (returned sliced over dim 1). The innermost dim is the sole reduction dim, so lanes move to the non-reduction dim to reduce within a lane. This switch only happens when the consumer has no reduction dims to broadcast the result back along (i.e. the consumer layout is not a slice over this reduction); otherwise the default (example 3) is used.
  5. Lane layout - No switch when both inner dims are reduced (reduction to scalar): srcShape=[32, 64], reductionDims=[0, 1], subgroupSize=16
    • Source Layout (decided by this function): laneLayout=[1, 16], laneData=[1, 1] (returned sliced over dims [0,1]). Both dims are reduced, so this is not a sole innermost reduction; the switch condition (example 4) does not apply and lanes stay on the innermost dim. The cross-lane reduction here is unavoidable.
  6. Lane layout - No switch when the consumer slices the reduction dim: srcShape=[32, 64], reductionDims=[1], subgroupSize=16
    • Consumer Layout: #xegpu.slice<#xegpu.layout<laneLayout=[1, 16], laneData=[1, 1]>, dims = [1]>
    • Source Layout (decided by this function): #xegpu.layout<laneLayout=[1, 16], laneData=[1, 1]> (the consumer slice's parent, reused directly; returned sliced over dim 1).
    Same shape/reductionDims as example 4, but here the consumer is a slice over the reduction dim, so it can broadcast the result back along that dim. The slice's parent layout is reused as the source (no switch, no re-derivation); the inst_data propagation step has already inserted a convert_layout if needed, so the lane-level layout can be reused as-is.
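The lane-layout decision illustrated by examples 3 to 6 can be condensed into a small standalone sketch for the 2D case. It is a reading of those examples, not the implementation of computeReductionLaneLayoutAndData: the LaneChoice struct, the pickLaneLayout name, and the boolean parameters are all hypothetical, and for example 6 the sketch returns the default layout as a stand-in for "reuse the consumer slice's parent".

```cpp
#include <cstdint>
#include <vector>

// Hypothetical result type for this illustration; not an MLIR attribute.
struct LaneChoice {
  std::vector<int64_t> laneLayout;
  std::vector<int64_t> laneData;
};

// 2D-only sketch of the rule suggested by examples 3-6:
//  - default: lanes sit on the innermost dim -> [1, subgroupSize]
//  - switch:  when the innermost dim is the sole reduction dim and the
//             consumer is not a slice over that reduction, lanes move to the
//             outer dim -> [subgroupSize, 1], so each lane reduces locally.
inline LaneChoice pickLaneLayout(bool reduceDim0, bool reduceDim1,
                                 bool consumerSlicesReduction,
                                 int64_t subgroupSize) {
  const bool soleInnermostReduction = reduceDim1 && !reduceDim0;
  if (soleInnermostReduction && !consumerSlicesReduction)
    return LaneChoice{{subgroupSize, 1}, {1, 1}};
  return LaneChoice{{1, subgroupSize}, {1, 1}};
}
```

With subgroupSize=16, only the configuration of example 4 (reductionDims=[1], consumer not a slice over the reduction) produces laneLayout=[16, 1]; examples 3, 5 and 6 all produce laneLayout=[1, 16].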

Definition at line 2179 of file XeGPULayoutImpl.cpp.

References buildInstDataLayoutWithLane(), buildLaneLayout(), buildLayout(), computeReductionLaneLayoutAndData(), mlir::computeShapeRatio(), mlir::detail::DenseArrayAttrImpl< int32_t >::get(), mlir::detail::DenseArrayAttrImpl< int64_t >::get(), InstData, Lane, leadingDimsAreUnit(), and Subgroup.

◆ setupPrefetchNdAnchorLayout()

DistributeLayoutAttr mlir::xegpu::setupPrefetchNdAnchorLayout ( LayoutKind layoutKind,
TensorDescType tdescTy,
int numSg,
const uArch::uArch * uArch )

Sets up the anchor layout for a prefetch_nd operation.

PrefetchNd has no value result and thus no consumer; it picks its layout from uArch block parameters. numSg is only used for Subgroup-kind layouts.

◆ setupReductionResultLayout()

xegpu::SliceAttr mlir::xegpu::setupReductionResultLayout ( xegpu::LayoutKind layoutKind,
VectorType srcVectorTy,
const uArch::uArch * uArch )

Sets up layout for Reduction operations by creating a SliceAttr for the result.

Definition at line 2316 of file XeGPULayoutImpl.cpp.

References buildLaneLayout(), mlir::detail::DenseArrayAttrImpl< int64_t >::get(), InstData, Lane, result, and Subgroup.

◆ setupStoreMatrixAnchorLayout()

xegpu::DistributeLayoutAttr mlir::xegpu::setupStoreMatrixAnchorLayout ( xegpu::LayoutKind layoutKind,
VectorType vectorTy,
int contigChunkSize,
const uArch::uArch * uArch )

Sets up the anchor layout for a store matrix operation.

Definition at line 1801 of file XeGPULayoutImpl.cpp.

References setupGenericStoreAnchorLayout(), and mlir::xegpu::uArch::StoreScatter.

◆ setupStoreNdAnchorLayout()

xegpu::DistributeLayoutAttr mlir::xegpu::setupStoreNdAnchorLayout ( xegpu::LayoutKind layoutKind,
VectorType srcVecTy,
int numSg,
const uArch::uArch * uArch )

Sets up the anchor layout for a store_nd operation.

StoreNd does not consider a consumer layout (it is a data sink), and picks its layout from uArch block parameters. numSg is only used for Subgroup-kind layouts.

Definition at line 1442 of file XeGPULayoutImpl.cpp.

References buildInstDataLayoutWithLane(), buildLaneLayout(), buildSgLayout(), compute2DBlockIOLaneLayoutAndData(), get2DBlockIOInstDataLayout(), mlir::Type::getIntOrFloatBitWidth(), getSgLayoutCandidates(), InstData, isValidLaneLayout(), Lane, Subgroup, and mlir::xegpu::uArch::Subgroup2DBlockStore.

◆ setupStoreScatterAnchorLayout()

xegpu::DistributeLayoutAttr mlir::xegpu::setupStoreScatterAnchorLayout ( xegpu::LayoutKind layoutKind,
VectorType vectorTy,
int contigChunkSize,
const uArch::uArch * uArch )

Sets up the anchor layout for a store scatter operation.

Definition at line 1781 of file XeGPULayoutImpl.cpp.

References setupGenericStoreAnchorLayout(), and mlir::xegpu::uArch::StoreScatter.

◆ subgroupReduction()

Value mlir::xegpu::subgroupReduction ( Location loc,
OpBuilder & builder,
Value input,
vector::CombiningKind kind,
uint32_t size )

Given an input value representing per-lane data, this function returns the result after performing a reduction on the input over all lanes (number of lanes given by size).

This uses butterfly shuffles to perform the reduction in log2(size) steps. NOTE: Implementation taken from TestVectorTransforms.cpp.
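To make the log2(size) step count concrete, the following host-side sketch simulates a butterfly reduction over an array standing in for the lanes of one subgroup. It assumes an XOR-style exchange (lane i reads from lane i ^ offset) and a power-of-two size; butterflyReduce is a hypothetical name, and the real function emits shuffle ops on MLIR values rather than looping over a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Host-side simulation of a butterfly reduction. Each entry of `lanes` is the
// value held by one lane. After the loop every lane holds the full reduction.
inline std::vector<int64_t>
butterflyReduce(std::vector<int64_t> lanes,
                const std::function<int64_t(int64_t, int64_t)> &combine) {
  const std::size_t size = lanes.size();
  assert(size != 0 && (size & (size - 1)) == 0 &&
         "size must be a power of two");
  // offset = 1, 2, 4, ... gives log2(size) iterations.
  for (std::size_t offset = 1; offset < size; offset <<= 1) {
    // "Shuffle": every lane fetches the value of its XOR partner.
    std::vector<int64_t> shuffled(size);
    for (std::size_t i = 0; i < size; ++i)
      shuffled[i] = lanes[i ^ offset];
    // Combine the local value with the partner's value.
    for (std::size_t i = 0; i < size; ++i)
      lanes[i] = combine(lanes[i], shuffled[i]);
  }
  return lanes;
}
```

With 16 lanes holding 0 through 15 and addition as the combiner, the loop runs four times and every lane ends up with 120. The combiner must be associative and commutative for all lanes to agree on the result.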

Definition at line 549 of file XeGPUUtils.cpp.

Referenced by lowerCrossLaneReductionToShuffles().