MLIR  19.0.0git
AMDGPUToROCDL.cpp File Reference
#include "mlir/Conversion/AMDGPUToROCDL/AMDGPUToROCDL.h"
#include "mlir/Conversion/LLVMCommon/ConversionTarget.h"
#include "mlir/Conversion/LLVMCommon/Pattern.h"
#include "mlir/Conversion/LLVMCommon/TypeConverter.h"
#include "mlir/Dialect/AMDGPU/IR/AMDGPUDialect.h"
#include "mlir/Dialect/LLVMIR/LLVMDialect.h"
#include "mlir/Dialect/LLVMIR/ROCDLDialect.h"
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/TypeUtilities.h"
#include "mlir/Pass/Pass.h"
#include "llvm/ADT/STLExtras.h"
#include <optional>
#include "mlir/Conversion/Passes.h.inc"


Namespaces

 mlir
 Include the generated interface declarations.
 

Macros

#define GEN_PASS_DEF_CONVERTAMDGPUTOROCDL
 

Functions

static Value createI32Constant (ConversionPatternRewriter &rewriter, Location loc, int32_t value)
 
static Value createI1Constant (ConversionPatternRewriter &rewriter, Location loc, bool value)
 
static Value mfmaConcatIfNeeded (ConversionPatternRewriter &rewriter, Location loc, Value input)
 If input is a vector of bytes, concatenate those bytes in little-endian order to form a single integer of size 8 * [vector length].
 
static void wmmaPushInputOperand (ConversionPatternRewriter &rewriter, Location loc, const TypeConverter *typeConverter, bool isUnsigned, Value llvmInput, SmallVector< Value, 4 > &operands)
 Push an input operand.
 
static void wmmaPushOutputOperand (ConversionPatternRewriter &rewriter, Location loc, const TypeConverter *typeConverter, Value output, int32_t subwordOffset, bool clamp, SmallVector< Value, 4 > &operands)
 Push the output operand.
 
static std::optional< StringRef > mfmaOpToIntrinsic (MFMAOp mfma, Chipset chipset)
 Return the rocdl intrinsic corresponding to an MFMA operation mfma if one exists.
 
static std::optional< StringRef > wmmaOpToIntrinsic (WMMAOp wmma, Chipset chipset)
 Return the rocdl intrinsic corresponding to a WMMA operation wmma if one exists.
 

Macro Definition Documentation

◆ GEN_PASS_DEF_CONVERTAMDGPUTOROCDL

#define GEN_PASS_DEF_CONVERTAMDGPUTOROCDL

Definition at line 25 of file AMDGPUToROCDL.cpp.

Function Documentation

◆ createI1Constant()

static Value createI1Constant ( ConversionPatternRewriter & rewriter,
Location  loc,
bool  value 
)
static

◆ createI32Constant()

static Value createI32Constant ( ConversionPatternRewriter & rewriter,
Location  loc,
int32_t  value 
)
static

Definition at line 32 of file AMDGPUToROCDL.cpp.

References mlir::OpBuilder::create(), and mlir::Builder::getI32Type().

Referenced by mfmaConcatIfNeeded().

◆ mfmaConcatIfNeeded()

static Value mfmaConcatIfNeeded ( ConversionPatternRewriter & rewriter,
Location  loc,
Value  input 
)
static

If input is a vector of bytes, concatenate those bytes in little-endian order to form a single integer of size 8 * [vector length].

This works around a wart in the AMDGPU intrinsics, where operations that logically take vectors of bytes instead take integers. Since we do not want to expose this implementation detail to MLIR, we correct for it here.

In addition, convert vectors of LLVM bfloats to vectors of i16, since AMDGPU MFMA intrinsics pre-date the bfloat type.
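
The byte concatenation described above can be illustrated with plain integers. This is a minimal sketch, not the lowering itself: `concatLittleEndian` is a hypothetical helper that works on host values and is limited to 8 bytes, whereas the actual function emits LLVM dialect operations on MLIR values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration only. Concatenates bytes in
// little-endian order: element 0 becomes the least significant byte.
uint64_t concatLittleEndian(const std::vector<uint8_t> &bytes) {
  assert(bytes.size() <= 8 && "result must fit in 64 bits");
  uint64_t result = 0;
  for (std::size_t i = 0; i < bytes.size(); ++i) {
    // Zero-extend the byte, shift it to its position, and merge it in.
    result |= static_cast<uint64_t>(bytes[i]) << (8 * i);
  }
  return result;
}
```

For example, under this sketch the four bytes `{0x11, 0x22, 0x33, 0x44}` combine into the 32-bit value `0x44332211`.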

Definition at line 334 of file AMDGPUToROCDL.cpp.

References mlir::OpBuilder::create(), createI32Constant(), mlir::Builder::getI16Type(), mlir::Builder::getIntegerAttr(), mlir::Builder::getIntegerType(), and mlir::Value::getType().

◆ mfmaOpToIntrinsic()

static std::optional<StringRef> mfmaOpToIntrinsic ( MFMAOp  mfma,
Chipset  chipset 
)
static

Return the rocdl intrinsic corresponding to an MFMA operation mfma if one exists.

This includes checking to ensure the intrinsic is supported on the architecture you are compiling for.
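
The "return a name only if the target supports it" shape can be sketched as follows. Everything here is hypothetical and chosen for illustration: `ToyChipset`, `ToyOpKind`, the intrinsic names, and the version threshold are not taken from the AMDGPU or ROCDL dialects. The sketch assumes C++17 for `std::optional` and `std::string_view`.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical stand-ins for the real Chipset and operation types.
struct ToyChipset {
  unsigned majorVersion;
  unsigned minorVersion;
};
enum class ToyOpKind { Basic, Extended };

// True if the chipset version is at least major.minor.
bool atLeast(ToyChipset c, unsigned major, unsigned minor) {
  return c.majorVersion > major ||
         (c.majorVersion == major && c.minorVersion >= minor);
}

// Returns an intrinsic name, or std::nullopt when no intrinsic exists
// for this operation on this chipset. Names and thresholds are made up.
std::optional<std::string_view> toyOpToIntrinsic(ToyOpKind kind,
                                                 ToyChipset chipset) {
  switch (kind) {
  case ToyOpKind::Basic:
    return "toy.intrinsic.basic";
  case ToyOpKind::Extended:
    if (atLeast(chipset, 9, 1))
      return "toy.intrinsic.extended";
    return std::nullopt;
  }
  return std::nullopt;
}
```

A caller would typically treat an empty result as "no lowering available" and report a match failure rather than emit an unsupported intrinsic.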

Definition at line 433 of file AMDGPUToROCDL.cpp.

◆ wmmaOpToIntrinsic()

static std::optional<StringRef> wmmaOpToIntrinsic ( WMMAOp  wmma,
Chipset  chipset 
)
static

Return the rocdl intrinsic corresponding to a WMMA operation wmma if one exists.

This includes checking to ensure the intrinsic is supported on the architecture you are compiling for.

Definition at line 570 of file AMDGPUToROCDL.cpp.

◆ wmmaPushInputOperand()

static void wmmaPushInputOperand ( ConversionPatternRewriter & rewriter,
Location  loc,
const TypeConverter * typeConverter,
bool  isUnsigned,
Value  llvmInput,
SmallVector< Value, 4 > &  operands 
)
static

Push an input operand.

If it is a float type, there is nothing to do. If it is an integer type, we also need to push its signedness (1 for signed, 0 for unsigned) and pack the input 16xi8 vector into a 4xi32 vector. We also need to convert bfloat inputs to i16 to account for the lack of bfloat support in the WMMA intrinsics themselves.
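
The integer case can be sketched on host values. This is an illustration under assumptions, not the lowering: `packBytesToWords` and `pushIntegerInput` are hypothetical names, the little-endian byte order within each 32-bit word is assumed, and placing the signedness flag before the packed words is also an assumption of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: pack 16 bytes into four 32-bit words, with byte
// 4*k in the least significant position of word k (assumed layout).
std::array<uint32_t, 4> packBytesToWords(const std::vector<uint8_t> &bytes) {
  assert(bytes.size() == 16 && "expected a 16 x i8 input");
  std::array<uint32_t, 4> words{};
  for (std::size_t i = 0; i < bytes.size(); ++i)
    words[i / 4] |= static_cast<uint32_t>(bytes[i]) << (8 * (i % 4));
  return words;
}

// Hypothetical helper: build the operand list for an integer input.
// The flag is 1 for signed and 0 for unsigned, as described above.
std::vector<uint32_t> pushIntegerInput(bool isUnsigned,
                                       const std::vector<uint8_t> &bytes) {
  std::vector<uint32_t> operands;
  operands.push_back(isUnsigned ? 0u : 1u);
  const std::array<uint32_t, 4> words = packBytesToWords(bytes);
  operands.insert(operands.end(), words.begin(), words.end());
  return operands;
}
```

With bytes `0, 1, ..., 15` as input, the first packed word in this sketch is `0x03020100` and the last is `0x0F0E0D0C`.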

Definition at line 368 of file AMDGPUToROCDL.cpp.

References mlir::TypeConverter::convertType(), mlir::OpBuilder::create(), createI1Constant(), mlir::OpBuilder::createOrFold(), mlir::get(), mlir::Builder::getI16Type(), mlir::Builder::getI32Type(), mlir::Value::getType(), mlir::Type::isBF16(), mlir::Type::isInteger(), mlir::Type::isSignedInteger(), and mlir::Type::isUnsignedInteger().

◆ wmmaPushOutputOperand()

static void wmmaPushOutputOperand ( ConversionPatternRewriter & rewriter,
Location  loc,
const TypeConverter * typeConverter,
Value  output,
int32_t  subwordOffset,
bool  clamp,
SmallVector< Value, 4 > &  operands 
)
static

Push the output operand.

In many cases this only pushes the output onto the operand list. But for f16 -> f16 or bf16 -> bf16 intrinsics, since the same number of VGPRs is used, we need to decide whether to store the result in the upper 16 bits of the VGPRs or in the lower 16 bits. To store the result in the lower 16 bits, set subwordOffset to 1; otherwise the result will be stored in the upper part.
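
What "storing a 16-bit result in one half of a 32-bit register" means can be shown with a small bit-manipulation sketch. `storeHalf` and `exampleStoreHalf` are hypothetical; the sketch takes an explicit `upperHalf` flag and does not model how subwordOffset maps onto it, and preserving the untouched half is an assumption made here for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: write a 16-bit value into the upper or lower
// half of a 32-bit register image, keeping the other half unchanged.
uint32_t storeHalf(uint32_t reg, uint16_t value, bool upperHalf) {
  const unsigned shift = upperHalf ? 16u : 0u;
  const uint32_t mask = 0xFFFFu << shift;
  return (reg & ~mask) | (static_cast<uint32_t>(value) << shift);
}

// Usage: the same 16-bit value lands in different halves.
void exampleStoreHalf() {
  assert(storeHalf(0xAAAABBBBu, 0x1234, /*upperHalf=*/false) == 0xAAAA1234u);
  assert(storeHalf(0xAAAABBBBu, 0x1234, /*upperHalf=*/true) == 0x1234BBBBu);
}
```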

Definition at line 411 of file AMDGPUToROCDL.cpp.

References clamp(), mlir::OpBuilder::create(), createI1Constant(), mlir::Builder::getI16Type(), mlir::Value::getType(), mlir::Type::isBF16(), mlir::Type::isF16(), and mlir::Type::isInteger().