#include "mlir/Dialect/Linalg/TransformOps/GPUHeuristics.h"

Inherits mlir::transform::gpu::MappingInfo.

Public Types
enum class	Status { Success = 0 , RequiresPredication , Invalid }
	Status of the mapping computation. Invalid usually means that too many threads are required and the mapping fails.

Public Member Functions
	CopyMappingInfo (MLIRContext *ctx, int totalNumThreads, int64_t desiredBitAlignment, ArrayRef< int64_t > sizes, bool favorPredication=false, int64_t elementalBitwidth=32)
	Greedily compute the MappingInfo to use to perform a copy of `sizes` elements of bitwidth `elementalBitwidth`.

void	print (llvm::raw_ostream &os) const

LLVM_DUMP_METHOD void	dump () const

Public Attributes
int64_t	vectorSize
	Most minor vector size (i.e. 1-D), in number of elements, used in a copy.

SmallVector< int64_t >	smallestBoundingTileSizes
	Explicit computation / injection of the smallest bounding tile sizes after mapping to `numThreads`.

Status	status
	The status of a particular copy mapping.

SmallVector< int64_t >	numThreads
	Number of threads to use for the copy mapping, from most major to most minor dims (i.e. `numThreads.back()` should be mapped to contiguous threads for best coalescing).

SmallVector< Attribute >	threadMapping
	Thread mapping attributes, one per entry of `numThreads`.

Public Attributes inherited from mlir::transform::gpu::MappingInfo
SmallVector< int64_t >	numThreads
	Number of threads to use for the mapping.

SmallVector< Attribute >	threadMapping
	Thread mapping attributes, one per entry of `numThreads`.

Static Public Attributes
static constexpr int64_t	kMaxVectorLoadBitWidth = 128
	Static quantity determining the number of bits to target in an individual copy.

Detailed Description

Definition at line 32 of file GPUHeuristics.h.

Member Enumeration Documentation

◆ Status

enum mlir::transform::gpu::CopyMappingInfo::Status

strong

Status of the mapping computation. Invalid usually means that too many threads are required and the mapping fails.

This usually happens when the copy is too large compared to the number of threads.

Enumerator
Success
RequiresPredication
Invalid

Definition at line 36 of file GPUHeuristics.h.

Constructor & Destructor Documentation

◆ CopyMappingInfo()

transform::gpu::CopyMappingInfo::CopyMappingInfo	(	MLIRContext *	ctx,
		int	totalNumThreads,
		int64_t	desiredBitAlignment,
		ArrayRef< int64_t >	sizes,
		bool	favorPredication = `false`,
		int64_t	elementalBitwidth = `32`
	)

Greedily compute the MappingInfo to use to perform a copy of `sizes` elements of bitwidth `elementalBitwidth`.

The `desiredBitAlignment` is the number of elements by which the most minor dimension of the copy is expected to be aligned. This is an approximation of the final alignment for each row of the copy. It is used to restrict the size of the copied vectors so that they match a potential subsequent cp.async. If the alignment does not match the alignment required by a cp.async down the line, the conversion to cp.async will eventually be skipped, possibly degrading performance.

When `favorPredication` is false, the mapping is computed to fill all threads with an equal amount of data to copy, so as to avoid predication. Predication requires a split epilogue in current pipelining implementations and is better avoided when possible.
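The "equal amount of data per thread" condition can be illustrated with a small divisibility check. This is a minimal sketch under stated assumptions, not the greedy algorithm in GPUHeuristics.cpp: the helpers `numVectorCopies` and `fillsAllThreadsEqually` are hypothetical names introduced here, and the sketch assumes the vector size evenly divides the most minor dimension.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of MLIR): number of vector-sized copies needed
// for a copy of shape `sizes` when the most minor dimension is split into
// vectors of `vectorSize` elements. Returns -1 for shapes this sketch does not
// handle (empty shape, non-positive vector size, or non-divisible minor dim).
inline int64_t numVectorCopies(const std::vector<int64_t> &sizes,
                               int64_t vectorSize) {
  if (sizes.empty() || vectorSize <= 0 || sizes.back() % vectorSize != 0)
    return -1;
  int64_t count = sizes.back() / vectorSize;
  for (std::size_t i = 0; i + 1 < sizes.size(); ++i)
    count *= sizes[i];
  return count;
}

// Hypothetical helper: true when the vector copies divide evenly over the
// threads, so every thread does the same amount of work and none is masked.
inline bool fillsAllThreadsEqually(const std::vector<int64_t> &sizes,
                                   int64_t vectorSize, int totalNumThreads) {
  int64_t copies = numVectorCopies(sizes, vectorSize);
  return copies > 0 && totalNumThreads > 0 && copies % totalNumThreads == 0;
}
```

For example, a 16x64 copy with a vector size of 4 needs 256 vector copies. With 128 threads each thread performs exactly 2 copies; with 96 threads the work does not divide evenly, which is the kind of case where predication would come into play.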

Definition at line 38 of file GPUHeuristics.cpp.

References DBGS, Invalid, kMaxVectorLoadBitWidth, LDBG, linearId0(), linearId1(), linearId2(), numThreads, print(), smallestBoundingTileSizes, status, threadMapping, and vectorSize.

Member Function Documentation

◆ dump()

LLVM_DUMP_METHOD void mlir::transform::gpu::CopyMappingInfo::dump ( ) const

◆ print()

void transform::gpu::CopyMappingInfo::print ( llvm::raw_ostream & os ) const

Definition at line 250 of file GPUHeuristics.cpp.

Referenced by CopyMappingInfo().

Member Data Documentation

◆ kMaxVectorLoadBitWidth

constexpr int64_t mlir::transform::gpu::CopyMappingInfo::kMaxVectorLoadBitWidth = 128

static constexpr

Static quantity determining the number of bits to target in an individual copy.

Assumes that smaller increments of 64, 32, 16, 8 are also valid transfer sizes. In the future we should have more hardware pluggability here, especially when we want sub-byte granularity.
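To make the 128-bit target concrete, the sketch below derives how many elements fit in one transfer and picks a vector size that respects both an alignment and the contiguous row length. It is one plausible way to do this and is not claimed to match GPUHeuristics.cpp: `maxElementsPerTransfer` and `pickVectorSize` are hypothetical helpers, and expressing the alignment in elements is an assumption of this sketch.

```cpp
#include <cstdint>
#include <numeric>

// Value taken from the documentation above.
constexpr int64_t kMaxVectorLoadBitWidth = 128;

// Hypothetical helper: largest number of elements of the given bitwidth that
// fit in one transfer. Returns 0 when the bitwidth does not divide 128 bits.
inline int64_t maxElementsPerTransfer(int64_t elementalBitwidth) {
  if (elementalBitwidth <= 0 ||
      kMaxVectorLoadBitWidth % elementalBitwidth != 0)
    return 0;
  return kMaxVectorLoadBitWidth / elementalBitwidth;
}

// Hypothetical helper: largest vector size (in elements) that divides the
// alignment, the contiguous row length, and the per-transfer cap.
// Assumption: the alignment is given in elements.
inline int64_t pickVectorSize(int64_t alignmentInElements,
                              int64_t numContiguousElements,
                              int64_t elementalBitwidth) {
  int64_t cap = maxElementsPerTransfer(elementalBitwidth);
  if (cap == 0)
    return 0;
  return std::gcd(std::gcd(alignmentInElements, numContiguousElements), cap);
}
```

Because the cap is a power of two, the selected vector size is one as well. For 32-bit elements it is 4, 2, or 1 element, corresponding to 128-, 64-, or 32-bit transfers.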

Definition at line 100 of file GPUHeuristics.h.

Referenced by CopyMappingInfo().

◆ numThreads

SmallVector<int64_t> mlir::transform::gpu::MappingInfo::numThreads

Number of threads to use for the copy mapping, from most major to most minor dims (i.e. numThreads.back() should be mapped to contiguous threads for best coalescing).

Definition at line 26 of file GPUHeuristics.h.

Referenced by CopyMappingInfo().

◆ smallestBoundingTileSizes

SmallVector<int64_t> mlir::transform::gpu::CopyMappingInfo::smallestBoundingTileSizes

Explicit computation / injection of the smallest bounding tile sizes after mapping to numThreads.

This is useful in masked scenarios.

Definition at line 112 of file GPUHeuristics.h.

Referenced by CopyMappingInfo().

◆ status

Status mlir::transform::gpu::CopyMappingInfo::status

The status of a particular copy mapping.

Must be checked before applying transformations.
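A caller would typically branch on the status before doing anything else. The sketch below uses a stand-in `Status` enum that mirrors the documented enumerators so that it compiles without MLIR; the `nextStep` function and the actions it names are hypothetical and only illustrate the check.

```cpp
#include <string>

// Stand-in mirroring the documented enumerators of CopyMappingInfo::Status.
// Real code would use mlir::transform::gpu::CopyMappingInfo::Status instead.
enum class Status { Success = 0, RequiresPredication, Invalid };

// Hypothetical helper: decide what to do with a computed mapping.
inline std::string nextStep(Status status) {
  switch (status) {
  case Status::Success:
    return "apply mapping";
  case Status::RequiresPredication:
    return "apply mapping with masking";
  case Status::Invalid:
    return "bail out";
  }
  return "bail out"; // Defensive default for out-of-range values.
}
```

The point of the design is that an `Invalid` mapping carries no usable thread counts, so any transformation built on it should be skipped rather than attempted.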

Definition at line 119 of file GPUHeuristics.h.

Referenced by CopyMappingInfo().

◆ threadMapping

SmallVector<Attribute> mlir::transform::gpu::MappingInfo::threadMapping

Thread mapping attributes, one per entry of numThreads.

Definition at line 29 of file GPUHeuristics.h.

Referenced by CopyMappingInfo().

◆ vectorSize

int64_t mlir::transform::gpu::CopyMappingInfo::vectorSize

Most minor vector size (i.e. 1-D), in number of elements, used in a copy.

Definition at line 103 of file GPUHeuristics.h.

Referenced by CopyMappingInfo().

The documentation for this struct was generated from the following files:

include/mlir/Dialect/Linalg/TransformOps/GPUHeuristics.h
lib/Dialect/Linalg/TransformOps/GPUHeuristics.cpp