MLIR  20.0.0git
mlir::transform::gpu::CopyMappingInfo Struct Reference

#include "mlir/Dialect/Linalg/TransformOps/GPUHeuristics.h"


Public Types

enum class Status { Success = 0, RequiresPredication, Invalid }
 Status of the mapping computation. Invalid usually means that too many threads are required and the mapping fails.
 

Public Member Functions

 CopyMappingInfo (MLIRContext *ctx, int totalNumThreads, int64_t desiredBitAlignment, ArrayRef< int64_t > sizes, bool favorPredication=false, int64_t elementalBitwidth=32)
 Greedily compute the MappingInfo to use to perform a copy of sizes elements of bitwidth elementalBitwidth.
 
void print (llvm::raw_ostream &os) const
 
LLVM_DUMP_METHOD void dump () const
 

Public Attributes

int64_t vectorSize
 Most minor vector size (i.e. 1-D), in number of elements, used in a copy.

SmallVector< int64_t > smallestBoundingTileSizes
 Explicit computation / injection of the smallest bounding tile sizes after mapping to numThreads.

Status status
 The status of a particular copy mapping.

SmallVector< int64_t > numThreads
 Number of threads to use for the copy mapping, from most major to most minor dims (i.e. numThreads.back() should be mapped to contiguous threads for best coalescing).

SmallVector< Attribute > threadMapping
 Thread mapping attributes, one per entry of numThreads.
 
Public Attributes inherited from mlir::transform::gpu::MappingInfo
SmallVector< int64_t > numThreads
 Number of threads to use for the mapping.

SmallVector< Attribute > threadMapping
 Thread mapping attributes, one per entry of numThreads.
 

Static Public Attributes

static constexpr int64_t kMaxVectorLoadBitWidth = 128
 Static quantity determining the number of bits to target in an individual copy.
 

Detailed Description

Definition at line 32 of file GPUHeuristics.h.

Member Enumeration Documentation

◆ Status

Status of the mapping computation. Invalid usually means that too many threads are required and the mapping fails.

This usually happens when the copy is too large compared to the number of threads.

Enumerator
Success 
RequiresPredication 
Invalid 

Definition at line 36 of file GPUHeuristics.h.

Constructor & Destructor Documentation

◆ CopyMappingInfo()

transform::gpu::CopyMappingInfo::CopyMappingInfo ( MLIRContext *  ctx,
int  totalNumThreads,
int64_t  desiredBitAlignment,
ArrayRef< int64_t >  sizes,
bool  favorPredication = false,
int64_t  elementalBitwidth = 32 
)

Greedily compute the MappingInfo to use to perform a copy of sizes elements of bitwidth elementalBitwidth.

The desiredBitAlignment is the alignment, in bits, that the most minor dimension of the copy is expected to have. It approximates the final alignment of each row of the copy and is used to restrict the size of the copied vectors so that they match a potential subsequent cp.async. If the alignment does not match what cp.async requires down the line, the conversion to cp.async is eventually skipped, possibly degrading performance.

When favorPredication is false, the mapping is computed so that every thread copies an equal amount of data, which avoids predication. Predication requires a split epilogue in current pipelining implementations and is better avoided when possible.

Definition at line 37 of file GPUHeuristics.cpp.

References DBGS, Invalid, kMaxVectorLoadBitWidth, LDBG, linearId0(), linearId1(), linearId2(), numThreads, print(), smallestBoundingTileSizes, status, threadMapping, and vectorSize.
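
The interplay between the alignment, the row length, and the element bitwidth can be illustrated with a small standalone sketch. It is not the code in GPUHeuristics.cpp: the helper name sketchVectorSize, its gcd-based rule, and the reading of desiredBitAlignment as a bit count are assumptions made for illustration. The sketch picks the largest element count that divides the aligned span, divides the row length, and fits in kMaxVectorLoadBitWidth bits.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>

// Mirrors the value of CopyMappingInfo::kMaxVectorLoadBitWidth listed on this page.
constexpr int64_t kMaxVectorLoadBitWidth = 128;

// Hypothetical helper, not part of MLIR: number of elements per copy, chosen
// as the largest count that divides the aligned span and the row length and
// does not exceed kMaxVectorLoadBitWidth bits.
int64_t sketchVectorSize(int64_t desiredBitAlignment, int64_t rowElements,
                         int64_t elementalBitwidth = 32) {
  assert(elementalBitwidth > 0 &&
         kMaxVectorLoadBitWidth % elementalBitwidth == 0);
  // Assumption: the alignment is expressed in bits and is a whole number of
  // elements.
  assert(desiredBitAlignment % elementalBitwidth == 0);
  int64_t alignedElements = desiredBitAlignment / elementalBitwidth;
  int64_t maxElements = kMaxVectorLoadBitWidth / elementalBitwidth;
  return std::gcd(std::gcd(alignedElements, rowElements), maxElements);
}
```

Under these assumptions, 32-bit elements with a 128-bit alignment and rows of 64 elements give 4 elements per copy, while a 64-bit alignment gives 2.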

Member Function Documentation

◆ dump()

LLVM_DUMP_METHOD void mlir::transform::gpu::CopyMappingInfo::dump ( ) const

◆ print()

void transform::gpu::CopyMappingInfo::print ( llvm::raw_ostream &  os) const

Definition at line 257 of file GPUHeuristics.cpp.

Referenced by CopyMappingInfo().

Member Data Documentation

◆ kMaxVectorLoadBitWidth

constexpr int64_t mlir::transform::gpu::CopyMappingInfo::kMaxVectorLoadBitWidth = 128
static constexpr

Static quantity determining the number of bits to target in an individual copy.

Assumes that the smaller transfer sizes of 64, 32, 16, and 8 bits are also valid. In the future, this should become more pluggable per hardware target, especially to support sub-byte granularity.

Definition at line 100 of file GPUHeuristics.h.

Referenced by CopyMappingInfo().
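
To make the transfer-size assumption concrete, the following sketch lists how many elements fit in each transfer width from 128 down to 8 bits. The helper candidateVectorSizes is hypothetical and not an MLIR function; it also assumes byte-or-larger elements, in line with the sub-byte caveat above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Mirrors the value of CopyMappingInfo::kMaxVectorLoadBitWidth listed on this page.
constexpr int64_t kMaxVectorLoadBitWidth = 128;

// Hypothetical helper, not part of MLIR: element counts for the transfer
// widths 128, 64, 32, 16 and 8 bits, skipping widths smaller than one element.
std::vector<int64_t> candidateVectorSizes(int64_t elementalBitwidth) {
  // Assumption: the element width is at least one byte and divides 128.
  assert(elementalBitwidth >= 8 &&
         kMaxVectorLoadBitWidth % elementalBitwidth == 0);
  std::vector<int64_t> sizes;
  for (int64_t bits = kMaxVectorLoadBitWidth; bits >= 8; bits /= 2) {
    if (bits >= elementalBitwidth)
      sizes.push_back(bits / elementalBitwidth);
  }
  return sizes;
}
```

For 32-bit elements the candidates are 4, 2 and 1 elements; for 8-bit elements they are 16, 8, 4, 2 and 1.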

◆ numThreads

SmallVector<int64_t> mlir::transform::gpu::MappingInfo::numThreads

Number of threads to use for the copy mapping, from most major to most minor dims (i.e. numThreads.back() should be mapped to contiguous threads for best coalescing).

Definition at line 26 of file GPUHeuristics.h.

Referenced by CopyMappingInfo().
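
The major-to-minor ordering can be illustrated by splitting a linear thread id into per-dimension indices, with the last dimension varying fastest. The helper delinearizeThreadId below is hypothetical and only a sketch of that convention, not how MLIR materializes thread ids.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of MLIR: splits a linear thread id into one
// index per entry of numThreads. The last entry varies fastest, so
// consecutive thread ids land on consecutive positions of the most minor dim.
std::vector<int64_t> delinearizeThreadId(int64_t linearId,
                                         const std::vector<int64_t> &numThreads) {
  std::vector<int64_t> ids(numThreads.size(), 0);
  for (std::size_t i = numThreads.size(); i-- > 0;) {
    assert(numThreads[i] > 0);
    ids[i] = linearId % numThreads[i];
    linearId /= numThreads[i];
  }
  return ids;
}
```

With numThreads = {4, 8}, thread ids 0 through 7 cover the most minor dimension of the first row, and thread id 8 starts the second row.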

◆ smallestBoundingTileSizes

SmallVector<int64_t> mlir::transform::gpu::CopyMappingInfo::smallestBoundingTileSizes

Explicit computation / injection of the smallest bounding tile sizes after mapping to numThreads.

This is useful in masked scenarios.

Definition at line 112 of file GPUHeuristics.h.

Referenced by CopyMappingInfo().

◆ status

Status mlir::transform::gpu::CopyMappingInfo::status

The status of a particular copy mapping.

Must be checked before applying transformations.

Definition at line 119 of file GPUHeuristics.h.

Referenced by CopyMappingInfo().
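
A caller would typically branch on the status before using the mapping. The sketch below uses a local stand-in enum that mirrors the enumerators listed on this page so that it compiles without MLIR; the helper describeAction and the actions it names are illustrative assumptions, not behavior prescribed by the library.

```cpp
#include <cassert>
#include <string>

// Stand-in mirroring CopyMappingInfo::Status; real code would use the enum
// from GPUHeuristics.h instead.
enum class Status { Success = 0, RequiresPredication, Invalid };

// Hypothetical helper, not part of MLIR: names the action a caller might take
// for each status value.
std::string describeAction(Status status) {
  switch (status) {
  case Status::Success:
    return "apply mapping";
  case Status::RequiresPredication:
    return "apply mapping with predication";
  case Status::Invalid:
    return "skip mapping";
  }
  return "unknown";
}
```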

◆ threadMapping

SmallVector<Attribute> mlir::transform::gpu::MappingInfo::threadMapping

Thread mapping attributes, one per entry of numThreads.

Definition at line 29 of file GPUHeuristics.h.

Referenced by CopyMappingInfo().

◆ vectorSize

int64_t mlir::transform::gpu::CopyMappingInfo::vectorSize

Most minor vector size (i.e. 1-D), in number of elements, used in a copy.

Definition at line 103 of file GPUHeuristics.h.

Referenced by CopyMappingInfo().


The documentation for this struct was generated from the following files: