MLIR
20.0.0git
#include "mlir/Dialect/Linalg/TransformOps/GPUHeuristics.h"
Public Types | |
enum class | Status { Success = 0 , RequiresPredication , Invalid } |
Status of the mapping computation. Invalid usually means that too many threads are required and the mapping fails. More... | |
Public Member Functions | |
CopyMappingInfo (MLIRContext *ctx, int totalNumThreads, int64_t desiredBitAlignment, ArrayRef< int64_t > sizes, bool favorPredication=false, int64_t elementalBitwidth=32) | |
Greedily compute the MappingInfo to use to perform a copy of sizes elements of bitwidth elementalBitwidth . More... | |
void | print (llvm::raw_ostream &os) const |
LLVM_DUMP_METHOD void | dump () const |
Public Attributes | |
int64_t | vectorSize |
Most minor vector size (i.e. 1-D), in number of elements, used in a copy. More... | |
SmallVector< int64_t > | smallestBoundingTileSizes |
Explicit computation / injection of the smallest bounding tile sizes after mapping to numThreads . More... | |
Status | status |
The status of a particular copy mapping. More... | |
SmallVector< int64_t > | numThreads |
Number of threads to use for the copy mapping, from most major to most minor dims (i.e. More... | |
SmallVector< Attribute > | threadMapping |
Thread mapping attributes, one per entry of numThreads . More... | |
Public Attributes inherited from mlir::transform::gpu::MappingInfo | |
SmallVector< int64_t > | numThreads |
Number of threads to use for the mapping. More... | |
SmallVector< Attribute > | threadMapping |
Thread mapping attributes, one per entry of numThreads . More... | |
Static Public Attributes | |
static constexpr int64_t | kMaxVectorLoadBitWidth = 128 |
Static quantity determining the number of bits to target in an individual copy. More... | |
Definition at line 32 of file GPUHeuristics.h.
enum class mlir::transform::gpu::CopyMappingInfo::Status [strong]
Status of the mapping computation. Invalid usually means that too many threads are required and the mapping fails.
This usually happens when the copy is too large compared to the number of threads.
Enumerator | |
---|---|
Success | |
RequiresPredication | |
Invalid |
Definition at line 36 of file GPUHeuristics.h.
transform::gpu::CopyMappingInfo::CopyMappingInfo (MLIRContext *ctx, int totalNumThreads, int64_t desiredBitAlignment, ArrayRef< int64_t > sizes, bool favorPredication = false, int64_t elementalBitwidth = 32)
Greedily compute the MappingInfo to use to perform a copy of sizes elements of bitwidth elementalBitwidth.
The desiredBitAlignment is the number of elements by which the most minor dimension of the copy is expected to be aligned. This is an approximation of the final alignment, for each row of the copy. It is used to restrict the size of the copied vectors so that they match a potential subsequent cp.async. If the alignment does not match the alignment required by a cp.async down the line, the conversion to cp.async will eventually be skipped, possibly degrading performance.
When favorPredication is false, the mapping is computed to fill all threads with an equal amount of data to copy, so as to avoid predication. Predication ends up requiring a split epilogue in current pipelining implementations and is better avoided when possible.
Definition at line 37 of file GPUHeuristics.cpp.
References DBGS, Invalid, kMaxVectorLoadBitWidth, LDBG, linearId0(), linearId1(), linearId2(), numThreads, print(), smallestBoundingTileSizes, status, threadMapping, and vectorSize.
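The description above can be illustrated with a deliberately simplified, standalone sketch. It is not the code in GPUHeuristics.cpp: `sketchCopyMapping`, `SketchMapping`, and `SketchStatus` are hypothetical names, the sketch treats desiredBitAlignment as a bit count (as the parameter name suggests, which is an assumption), it assigns exactly one vector per thread, and it does not model favorPredication.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Stand-in for CopyMappingInfo::Status (same enumerators as documented).
enum class SketchStatus { Success = 0, RequiresPredication, Invalid };

// Hypothetical result type; field names echo the documented attributes.
struct SketchMapping {
  SketchStatus status;
  int64_t vectorSize;
  std::vector<int64_t> numThreads;
};

// Simplified illustration: pick a vector size compatible with the alignment,
// the most minor extent, and a 128-bit transfer, then use one thread per
// vector and compare the required thread count with the budget.
SketchMapping sketchCopyMapping(int64_t totalNumThreads,
                                int64_t desiredBitAlignment,
                                const std::vector<int64_t> &sizes,
                                int64_t elementalBitwidth = 32) {
  constexpr int64_t kMaxBits = 128; // mirrors kMaxVectorLoadBitWidth
  assert(totalNumThreads > 0 && !sizes.empty());
  assert(elementalBitwidth > 0 && kMaxBits % elementalBitwidth == 0);
  assert(desiredBitAlignment > 0 &&
         desiredBitAlignment % elementalBitwidth == 0);
  for (int64_t s : sizes)
    assert(s > 0);

  // Largest element count that divides the alignment (in elements), the most
  // minor extent, and the maximal transfer width (in elements).
  int64_t vectorSize =
      std::gcd(std::gcd(desiredBitAlignment / elementalBitwidth, sizes.back()),
               kMaxBits / elementalBitwidth);

  // One thread per vector: only the most minor dimension shrinks.
  std::vector<int64_t> numThreads(sizes);
  numThreads.back() = sizes.back() / vectorSize;

  int64_t required = 1;
  for (int64_t n : numThreads)
    required *= n;

  // Assumed status rule for this sketch only.
  SketchStatus status = SketchStatus::Success;
  if (required > totalNumThreads)
    status = SketchStatus::Invalid; // copy too large for the thread budget
  else if (required < totalNumThreads)
    status = SketchStatus::RequiresPredication; // some threads stay idle
  return {status, vectorSize, numThreads};
}
```

Under these assumptions, a 16x64 copy of 32-bit elements with 128-bit alignment and 256 threads uses 4-element vectors on a 16x16 thread grid. The same copy with only 128 threads is reported as invalid.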
LLVM_DUMP_METHOD void mlir::transform::gpu::CopyMappingInfo::dump () const
void transform::gpu::CopyMappingInfo::print (llvm::raw_ostream &os) const
Definition at line 257 of file GPUHeuristics.cpp.
Referenced by CopyMappingInfo().
constexpr int64_t mlir::transform::gpu::CopyMappingInfo::kMaxVectorLoadBitWidth = 128 [static]
Static quantity determining the number of bits to target in an individual copy.
Assumes that the smaller increments of 64, 32, 16, and 8 bits are also valid transfer sizes. In the future we should have more hardware pluggability here, especially when we want sub-byte granularity.
Definition at line 100 of file GPUHeuristics.h.
Referenced by CopyMappingInfo().
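To make the transfer-size assumption concrete, here is a small standalone sketch. `widestTransferBits` is a hypothetical helper, not part of MLIR; it only encodes the statement that 128 bits is the target and that 64, 32, 16, and 8 bits are acceptable fallbacks.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the documented constant; redeclared so the sketch is standalone.
constexpr int64_t kMaxVectorLoadBitWidth = 128;

// Hypothetical helper: returns the widest transfer size in bits, among
// 128, 64, 32, 16, and 8, that evenly divides `contiguousBits`.
// Returns 0 when none does, i.e. the sub-byte case that the documentation
// lists as future work.
int64_t widestTransferBits(int64_t contiguousBits) {
  assert(contiguousBits > 0);
  for (int64_t bits = kMaxVectorLoadBitWidth; bits >= 8; bits /= 2)
    if (contiguousBits % bits == 0)
      return bits;
  return 0;
}
```

For example, a row of three 32-bit elements (96 contiguous bits) cannot use a 128-bit or 64-bit transfer and falls back to 32 bits in this sketch.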
SmallVector<int64_t> mlir::transform::gpu::MappingInfo::numThreads |
Number of threads to use for the copy mapping, from most major to most minor dims (i.e. numThreads.back() should be mapped to contiguous threads for best coalescing).
Definition at line 26 of file GPUHeuristics.h.
Referenced by CopyMappingInfo().
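The major-to-minor ordering can be pictured as a row-major delinearization of a linear thread id, with the last entry varying fastest. The helper below, `delinearizeThreadId`, is a hypothetical illustration and not how MLIR materializes the mapping (which relies on the threadMapping attributes).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: splits a linear thread id into one index per entry of
// `numThreads`, ordered from most major to most minor. The last index varies
// fastest, so consecutive thread ids touch consecutive positions along the
// most minor dimension.
std::vector<int64_t> delinearizeThreadId(int64_t linearId,
                                         const std::vector<int64_t> &numThreads) {
  assert(linearId >= 0);
  std::vector<int64_t> ids(numThreads.size());
  for (std::size_t i = numThreads.size(); i-- > 0;) {
    assert(numThreads[i] > 0);
    ids[i] = linearId % numThreads[i];
    linearId /= numThreads[i];
  }
  assert(linearId == 0 && "linear id exceeds the thread grid");
  return ids;
}
```

With numThreads = {2, 4}, thread ids 0 through 3 cover the first row along the most minor dimension, and thread id 4 starts the second row.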
SmallVector<int64_t> mlir::transform::gpu::CopyMappingInfo::smallestBoundingTileSizes |
Explicit computation / injection of the smallest bounding tile sizes after mapping to numThreads.
This is useful in masked scenarios.
Definition at line 112 of file GPUHeuristics.h.
Referenced by CopyMappingInfo().
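The formula is not spelled out here. One plausible reading, offered purely as an assumption, is that each bounding tile size is the ceiling of the copy extent divided by the number of threads along that dimension. The helper `boundingTileSizes` below is hypothetical and only illustrates that reading.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: per-dimension ceil(size / numThreads). When a size is
// not a multiple of the thread count, the tile overshoots the copy extent,
// which is where masking would come into play.
std::vector<int64_t> boundingTileSizes(const std::vector<int64_t> &sizes,
                                       const std::vector<int64_t> &numThreads) {
  assert(sizes.size() == numThreads.size());
  std::vector<int64_t> tiles;
  tiles.reserve(sizes.size());
  for (std::size_t i = 0; i < sizes.size(); ++i) {
    assert(sizes[i] >= 0 && numThreads[i] > 0);
    tiles.push_back((sizes[i] + numThreads[i] - 1) / numThreads[i]);
  }
  return tiles;
}
```

Under this reading, a 7x10 copy mapped to a 4x4 thread grid gives tiles of 2x3, which cover 8x12 elements and therefore need masking at the boundary.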
Status mlir::transform::gpu::CopyMappingInfo::status |
The status of a particular copy mapping.
Must be checked before applying transformations.
Definition at line 119 of file GPUHeuristics.h.
Referenced by CopyMappingInfo().
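A caller-side check might look like the following standalone sketch. `CopyStatus` is a stand-in that repeats the documented enumerators so the example compiles without MLIR headers; `canApplyMapping`, `needsMasking`, and `describe` are hypothetical helpers.

```cpp
#include <cassert>
#include <string>

// Stand-in for CopyMappingInfo::Status (same enumerators as documented).
enum class CopyStatus { Success = 0, RequiresPredication, Invalid };

// A mapping is usable unless the computation failed outright.
bool canApplyMapping(CopyStatus status) {
  return status != CopyStatus::Invalid;
}

// Predicated mappings are usable but need masked (guarded) copies.
bool needsMasking(CopyStatus status) {
  return status == CopyStatus::RequiresPredication;
}

// Human-readable summary, e.g. for diagnostics.
std::string describe(CopyStatus status) {
  switch (status) {
  case CopyStatus::Success:
    return "success";
  case CopyStatus::RequiresPredication:
    return "requires predication";
  case CopyStatus::Invalid:
    return "invalid";
  }
  assert(false && "unknown status");
  return "";
}
```

With the real class, the equivalent check would read the status member of a constructed CopyMappingInfo and bail out when it equals Status::Invalid before any transformation is applied.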
SmallVector<Attribute> mlir::transform::gpu::MappingInfo::threadMapping |
Thread mapping attributes, one per entry of numThreads.
Definition at line 29 of file GPUHeuristics.h.
Referenced by CopyMappingInfo().
int64_t mlir::transform::gpu::CopyMappingInfo::vectorSize |
Most minor vector size (i.e. 1-D), in number of elements, used in a copy.
Definition at line 103 of file GPUHeuristics.h.
Referenced by CopyMappingInfo().