This dialect maps LLVM IR into MLIR by defining the corresponding operations and
types. LLVM IR metadata is usually represented as MLIR attributes, which offer
additional structure verification.
We use “LLVM IR” to designate the intermediate representation of LLVM and
“LLVM dialect” or “LLVM IR dialect” to refer to this MLIR dialect.
Unless explicitly stated otherwise, the semantics of the LLVM dialect operations
must correspond to the semantics of LLVM IR instructions and any divergence is
considered a bug. The dialect also contains auxiliary operations that smooth out
the differences in the IR structure, e.g., MLIR does not have phi operations
and LLVM IR does not have a constant operation. These auxiliary operations are
systematically prefixed with mlir, e.g. llvm.mlir.constant where llvm. is
the dialect namespace prefix.
LLVM dialect is not expected to depend on any object that requires an
LLVMContext, such as an LLVM IR instruction or type. Instead, MLIR provides
thread-safe alternatives compatible with the rest of the infrastructure. The
dialect is allowed to depend on the LLVM IR objects that don’t require a
context, such as data layout and triple description.
IR modules use the built-in MLIR ModuleOp and support all its features. In
particular, modules can be named, nested and are subject to symbol visibility.
Modules can contain any operations, including LLVM functions and globals.
An IR module may have an optional data layout and triple information attached
using MLIR attributes llvm.data_layout and llvm.triple, respectively. Both
are string attributes with the same syntax as in LLVM IR and are verified to be
correct. They can be defined as follows.
LLVM functions are represented by a special operation, llvm.func, that has
syntax similar to that of the built-in function operation but supports
LLVM-related features such as linkage and variadic argument lists. See detailed
description in the operation list below.
MLIR uses block arguments instead of PHI nodes to communicate values between
blocks. Therefore, the LLVM dialect has no operation directly equivalent to
phi in LLVM IR. Instead, all terminators can pass values as successor operands
as these values will be forwarded as block arguments when the control flow is
transferred.
For example:
^bb1:
  %0 = llvm.add %arg0, %cst : i32
  llvm.br ^bb2(%0 : i32)

// If the control flow comes from ^bb1, %arg1 == %0.
^bb2(%arg1: i32):
  // ...
Since there is no need to use the block identifier to differentiate the source
of different values, the LLVM dialect supports terminators that transfer the
control flow to the same block with different arguments. For example:
Some value kinds in LLVM IR, such as constants and undefs, are uniqued in
context and used directly in relevant operations. MLIR does not support such
values for thread-safety and concept parsimony reasons. Instead, regular values
are produced by dedicated operations that have the corresponding semantics:
llvm.mlir.constant, llvm.mlir.undef, llvm.mlir.null. Note how these operations
are prefixed with mlir. to indicate that they don’t belong to LLVM IR but are only
necessary to model it in MLIR. The values produced by these operations are
usable just like any other value.
Examples:
// Create an undefined value of structure type with a 32-bit integer followed
// by a float.
%0 = llvm.mlir.undef : !llvm.struct<(i32, f32)>

// Null pointer to i8.
%1 = llvm.mlir.null : !llvm.ptr<i8>

// Null pointer to a function with signature void().
%2 = llvm.mlir.null : !llvm.ptr<func<void ()>>

// Constant 42 as i32.
%3 = llvm.mlir.constant(42 : i32) : i32

// Splat dense vector constant.
%4 = llvm.mlir.constant(dense<1.0> : vector<4xf32>) : vector<4xf32>
Note that constants list the type twice. This is an artifact of the LLVM dialect
not using built-in types, which are used for typed MLIR attributes. The syntax
will be reevaluated after considering composite constants.
Global variables are also defined using a special operation, llvm.mlir.global,
located at the module level. Globals are MLIR symbols and are identified by
their name.
Since functions need to be isolated-from-above, i.e. values defined outside the
function cannot be directly used inside the function, an additional operation,
llvm.mlir.addressof, is provided to locally define a value containing the
address of a global. The actual value can then be loaded from that pointer, or
a new value can be stored into it if the global is not declared constant. This
is similar to LLVM IR where globals are accessed through name and have a
pointer type.
Module-level named objects in the LLVM dialect, namely functions and globals,
have an optional linkage attribute derived from LLVM IR linkage types. Linkage is
specified by the same keyword as in LLVM IR and is located between the operation
name (llvm.func or llvm.mlir.global) and the symbol name. If no linkage keyword
is present, external linkage is assumed by default. Linkage is distinct from
MLIR symbol visibility.
The LLVM dialect provides a mechanism to forward function-level attributes to
LLVM IR using the passthrough attribute. This is an array attribute containing
either string attributes or array attributes. In the former case, the value of
the string is interpreted as the name of an LLVM IR function attribute. In the
latter case, the array is expected to contain exactly two string attributes, the
first corresponding to the name of an LLVM IR function attribute, and the second
corresponding to its value. Note that even integer LLVM IR function attributes
have their value represented in the string form.
Example:
llvm.func @func() attributes {
  passthrough = ["noinline",           // value-less attribute
                 ["alignstack", "4"],  // integer attribute with value
                 ["other", "attr"]]    // attribute unknown to LLVM
} {
  llvm.return
}
If the attribute is not known to LLVM IR, it will be attached as a string
attribute.
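The shape of the passthrough array can be illustrated outside of MLIR. The following is a minimal Python sketch, not the dialect's implementation: plain Python strings and lists stand in for MLIR string and array attributes, and the helper name normalize_passthrough is hypothetical. It only mirrors the rule stated above: an entry is either a bare name or a pair of strings (name, value).

```python
from typing import List, Optional, Tuple, Union

# Hypothetical stand-in for the MLIR array attribute: each entry is either a
# string or a two-element list of strings.
PassthroughEntry = Union[str, List[str]]


def normalize_passthrough(
    entries: List[PassthroughEntry],
) -> List[Tuple[str, Optional[str]]]:
    """Turn passthrough entries into (name, value) pairs.

    The value is None for value-less attributes. Values stay strings, even
    for integer attributes such as alignstack.
    """
    result = []
    for entry in entries:
        if isinstance(entry, str):
            # Former case: the string is the attribute name.
            result.append((entry, None))
        elif (
            isinstance(entry, list)
            and len(entry) == 2
            and all(isinstance(item, str) for item in entry)
        ):
            # Latter case: exactly two strings, name then value.
            result.append((entry[0], entry[1]))
        else:
            raise ValueError(f"invalid passthrough entry: {entry!r}")
    return result
```

Under these assumptions, the array from the example above normalizes to three pairs, and an entry such as ["alignstack", 4] is rejected because the value is not in string form.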
LLVM dialect uses built-in types whenever possible and defines a set of
complementary types, which correspond to the LLVM IR types that cannot be
directly represented with built-in types. Similarly to other MLIR context-owned
objects, the creation and manipulation of LLVM dialect types is thread-safe.
MLIR does not support module-scoped named type declarations, e.g.
%s = type {i32, i32} in LLVM IR. Instead, types must be fully specified at each
use, except for recursive types where only the first reference to a named type
needs to be fully specified. MLIR type aliases can be used to achieve more
compact syntax.
The general syntax of LLVM dialect types is !llvm., followed by a type kind
identifier (e.g., ptr for pointer or struct for structure) and by an
optional list of type parameters in angle brackets. The dialect follows MLIR
style for types with nested angle brackets and keyword specifiers rather than
using different bracket styles to differentiate types. Types inside the angle
brackets may omit the !llvm. prefix for brevity: the parser first attempts to
find a type (starting with ! or a built-in type) and falls back to accepting a
keyword. For example, !llvm.ptr<!llvm.ptr<i32>> and !llvm.ptr<ptr<i32>> are
equivalent, with the latter being the canonical form, and denote a pointer to a
pointer to a 32-bit integer.
1D vectors of signless integers or floating point types - vector<NxT>
(VectorType).
Note that only a subset of types that can be represented by a given class is
compatible. For example, signed and unsigned integers are not compatible. The
LLVM dialect provides a function, bool LLVM::isCompatibleType(Type), that can be
used as a compatibility check.
Each LLVM IR type corresponds to exactly one MLIR type, either built-in or
LLVM dialect type. For example, because i32 is LLVM-compatible, there is no
!llvm.i32 type. However, !llvm.ptr<T> is defined in the LLVM dialect as
there is no corresponding built-in type.
These types are parameterized by the types they contain, e.g., the pointee or
the element type, which can be either compatible built-in or LLVM dialect types.
Pointer types are parametric types parameterized by the element type and the
address space. The address space is an integer, but this choice may be
reconsidered if MLIR implements named address spaces. Their syntax is as
follows:
llvm-ptr-type ::= `!llvm.ptr<` type (`,` integer-literal)? `>`
where the optional integer literal corresponds to the address space. Both cases
are represented by LLVMPointerType internally.
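To make the grammar concrete, here is a small Python sketch that splits a textual pointer type into its element type and address space. It is an illustration only and not the MLIR parser: the function name parse_ptr_type is hypothetical, the element type is returned as unparsed text, and a missing address space is assumed to mean 0, as in LLVM IR.

```python
from typing import Tuple


def parse_ptr_type(text: str) -> Tuple[str, int]:
    """Split `!llvm.ptr<type (, integer)?>` into (element type, address space)."""
    prefix = "!llvm.ptr<"
    if not (text.startswith(prefix) and text.endswith(">")):
        raise ValueError(f"not a pointer type: {text!r}")
    body = text[len(prefix):-1]

    # Find a comma that is not nested inside angle brackets or parentheses;
    # only such a comma can separate the element type from the address space.
    depth = 0
    split_at = None
    for index, char in enumerate(body):
        if char in "<(":
            depth += 1
        elif char in ">)":
            depth -= 1
        elif char == "," and depth == 0:
            split_at = index

    if split_at is None:
        # Assumption: no explicit address space means address space 0.
        return body.strip(), 0
    return body[:split_at].strip(), int(body[split_at + 1:].strip())
```

For instance, under these assumptions `!llvm.ptr<ptr<i32>, 3>` yields the element type text `ptr<i32>` and address space 3, while commas nested inside a struct body are not mistaken for the separator.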
Array types represent sequences of elements in memory. Array elements can be
addressed with a value unknown at compile time, and arrays can be nested. Only
1D arrays are allowed, though.
Array types are parameterized by the fixed size and the element type.
Syntactically, their representation is the following:
llvm-array-type ::= `!llvm.array<` integer-literal `x` type `>`
and they are internally represented as LLVMArrayType.
Function types represent the type of a function, i.e. its signature.
Function types are parameterized by the result type, the list of argument types
and by an optional “variadic” flag. Unlike built-in FunctionType, LLVM dialect
functions (LLVMFunctionType) always have a single result, which may be
!llvm.void if the function does not return anything. The syntax is as follows:
llvm-func-type ::= `!llvm.func<` type `(` type-list (`,` `...`)? `)` `>`
For example,
!llvm.func<void ()>           // a function with no arguments;
!llvm.func<i32 (f32, i32)>    // a function with two arguments and a result;
!llvm.func<void (i32, ...)>   // a variadic function with at least one argument.
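The single-result rule can be sketched as a mapping from a built-in style signature (a list of results and a list of arguments) to the textual LLVM dialect function type. This is a hedged Python illustration working on type names as strings; the helper to_llvm_func_type is hypothetical and does not correspond to an MLIR API.

```python
from typing import Sequence


def to_llvm_func_type(
    results: Sequence[str], params: Sequence[str], variadic: bool = False
) -> str:
    """Build the text of an LLVM dialect function type from type names."""
    if len(results) > 1:
        # LLVM dialect functions always have exactly one result type.
        raise ValueError("LLVM dialect functions cannot have multiple results")
    # No result in the built-in signature maps to the void result type.
    result = results[0] if results else "void"
    parts = list(params)
    if variadic:
        parts.append("...")
    return f"!llvm.func<{result} ({', '.join(parts)})>"
```

With these assumptions, an empty result list produces a void function type, and a two-result signature is rejected rather than silently packed into a structure.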
In the LLVM dialect, functions are not first-class objects and one cannot have a
value of function type. Instead, one can take the address of a function and
operate on pointers to functions.
Vector types represent sequences of elements, typically when multiple data
elements are processed by a single instruction (SIMD). Vectors are thought of as
stored in registers and therefore vector elements can only be addressed through
constant indices.
Vector types are parameterized by the size, which may be either fixed or a
multiple of some fixed size in case of scalable vectors, and the element type.
Vectors cannot be nested and only 1D vectors are supported. Scalable vectors are
still considered 1D.
LLVM dialect uses built-in vector types for fixed-size vectors of built-in
types, and provides additional types for fixed-size vectors of LLVM dialect
types (LLVMFixedVectorType) and scalable vectors of any types
(LLVMScalableVectorType). These two additional types share the following
syntax:
llvm-vec-type ::= `!llvm.vec<` (`?` `x`)? integer-literal `x` type `>`
Note that the sets of element types supported by built-in and LLVM dialect
vector types are mutually exclusive, e.g., the built-in vector type does not
accept !llvm.ptr<i32> and the LLVM dialect fixed-width vector type does not
accept i32.
The following functions are provided to operate on any kind of the vector types
compatible with the LLVM dialect:
bool LLVM::isCompatibleVectorType(Type) - checks whether a type is a
vector type compatible with the LLVM dialect;
Type LLVM::getVectorElementType(Type) - returns the element type of any
vector type compatible with the LLVM dialect;
llvm::ElementCount LLVM::getVectorNumElements(Type) - returns the number
of elements in any vector type compatible with the LLVM dialect;
Type LLVM::getFixedVectorType(Type, unsigned) - gets a fixed vector type
with the given element type and size; the resulting type is either a
built-in or an LLVM dialect vector type depending on which one supports the
given element type.
vector<42 x i32>                   // Vector of 42 32-bit integers.
!llvm.vec<42 x ptr<i32>>           // Vector of 42 pointers to 32-bit integers.
!llvm.vec<? x 4 x i32>             // Scalable vector of 32-bit integers with
                                   // size divisible by 4.
!llvm.array<2 x vector<2 x i32>>   // Array of 2 vectors of 2 32-bit integers.
!llvm.array<2 x vec<2 x ptr<i32>>> // Array of 2 vectors of 2 pointers to 32-bit
                                   // integers.
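The choice between the built-in vector type and the LLVM dialect vector type, as described for LLVM::getFixedVectorType, can be modeled on type names. The Python sketch below is an approximation under stated assumptions: types are plain strings, the set of built-in floating point names is assumed rather than taken from the source, and the function is a hypothetical model, not the C++ API.

```python
import re

# Assumption: built-in floating point type names treated as compatible here.
_BUILTIN_FLOATS = {"bf16", "f16", "f32", "f64", "f80", "f128"}


def is_builtin_vector_element(elem: str) -> bool:
    """Return True for signless integers (iN) and the assumed float names."""
    return re.fullmatch(r"i[1-9][0-9]*", elem) is not None or elem in _BUILTIN_FLOATS


def get_fixed_vector_type(elem: str, num_elements: int) -> str:
    """Pick the vector type kind that supports the given element type."""
    if num_elements <= 0:
        raise ValueError("vector size must be positive")
    if is_builtin_vector_element(elem):
        # Built-in element types use the built-in vector type.
        return f"vector<{num_elements}x{elem}>"
    # Everything else, e.g. pointers, uses the LLVM dialect vector type.
    return f"!llvm.vec<{num_elements} x {elem}>"
```

Because the two element type sets are mutually exclusive, each element type maps to exactly one vector type in this model.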
The structure type is used to represent a collection of data members together in
memory. The elements of a structure may be any type that has a size.
Structure types are represented in a single dedicated class
mlir::LLVM::LLVMStructType. Internally, the struct type stores a (potentially
empty) name, a (potentially empty) list of contained types and a bitmask
indicating whether the struct is named, opaque, packed or uninitialized.
Structure types that don’t have a name are referred to as literal structs.
Such structures are uniquely identified by their contents. Identified structs
on the other hand are uniquely identified by the name.
Identified structure types are uniqued using their name in a given context.
Attempting to construct an identified structure with the same name as a structure
that already exists in the context will result in the existing structure being
returned. MLIR does not auto-rename identified structs in case of name
conflicts because there is no naming scope equivalent to a module in LLVM IR
since MLIR modules can be arbitrarily nested.
Programmatically, identified structures can be constructed in an uninitialized
state. In this case, they are given a name but the body must be set up by a
later call, using MLIR’s type mutation mechanism. Such uninitialized types can
be used in type construction, but must be eventually initialized for IR to be
valid. This mechanism allows for constructing recursive or mutually referring
structure types: an uninitialized type can be used in its own initialization.
Once the type is initialized, its body cannot be changed anymore. Any further
attempts to modify the body will fail and return failure to the caller unless
the type is initialized with the exact same body. Type initialization is
thread-safe; however, if a concurrent thread initializes the type before the
current thread, the initialization may return failure.
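The uniquing and set-once rules for identified structs can be modeled in a few lines. The following Python sketch is a simplified illustration, not the MLIR implementation: the class and method names are hypothetical, thread-safety is not modeled, and body elements are arbitrary Python values standing in for types.

```python
class IdentifiedStruct:
    """Toy model of an identified struct type with a set-once body."""

    def __init__(self, name):
        self.name = name
        self.body = None  # None models the uninitialized state.
        self.packed = False

    def set_body(self, body, packed=False):
        """Return True on success, False if a different body is already set."""
        body = tuple(body)
        if self.body is None:
            self.body, self.packed = body, packed
            return True
        # Re-initialization succeeds only with the exact same body.
        return self.body == body and self.packed == packed


class Context:
    """Toy model of a context that uniques identified structs by name."""

    def __init__(self):
        self._structs = {}

    def get_identified(self, name):
        # Requesting an existing name returns the existing struct.
        if name not in self._structs:
            self._structs[name] = IdentifiedStruct(name)
        return self._structs[name]


# A recursive type: the uninitialized struct is used in its own body.
ctx = Context()
node = ctx.get_identified("A")
first_attempt = node.set_body([("ptr", node)])
```

In this model, repeating the same body succeeds, whereas a later call with a different body returns False and leaves the original body in place.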
The syntax for identified structure types is as follows.
llvm-ident-struct-type ::= `!llvm.struct<` string-literal, `opaque` `>`
| `!llvm.struct<` string-literal, `packed`?
`(` type-or-ref-list `)` `>`
type-or-ref-list ::= <maybe empty comma-separated list of type-or-ref>
type-or-ref ::= <any compatible type with optional !llvm.>
| `!llvm.`? `struct<` string-literal `>`
The body of the identified struct is printed in full unless it is
transitively contained in the same struct. In the latter case, only the
identifier is printed. For example, the structure containing the pointer to
itself is represented as !llvm.struct<"A", (ptr<struct<"A">>)>, and the structure A
containing two pointers to the structure B containing a pointer to the
structure A is represented as !llvm.struct<"A", (ptr<struct<"B", (ptr<struct<"A">>)>>, ptr<struct<"B", (ptr<struct<"A">>)>>)>. Note that the structure B is “unrolled” for both
elements. A structure with the same name but different body is a syntax error.
The user must ensure structure name uniqueness across all modules processed in
a given MLIR context. Structure names are arbitrary string literals and may
include, e.g., spaces and keywords.
Identified structs may be opaque. In this case, the body is unknown but the
structure type is considered initialized and is valid in the IR.
Literal structures are uniqued according to the list of elements they contain,
and can optionally be packed. The syntax for such structs is as follows.
llvm-literal-struct-type ::= `!llvm.struct<` `packed`? `(` type-list `)` `>`
type-list ::= <maybe empty comma-separated list of types with optional !llvm.>
Literal structs cannot be recursive, but can contain other structs. Therefore,
they must be constructed in a single step with the entire list of contained
elements provided.
!llvm.struct<>                         // NOT allowed
!llvm.struct<()>                       // empty, literal
!llvm.struct<(i32)>                    // literal
!llvm.struct<(struct<(i32)>)>          // struct containing a struct
!llvm.struct<packed (i8, i32)>         // packed struct
!llvm.struct<"a">                      // recursive reference, only allowed within
                                       // another struct, NOT allowed at top level
!llvm.struct<"a", (ptr<struct<"a">>)>  // supported example of recursive reference
!llvm.struct<"a", ()>                  // empty, named (necessary to differentiate from
                                       // recursive reference)
!llvm.struct<"a", opaque>              // opaque, named
!llvm.struct<"a", (i32)>               // named
!llvm.struct<"a", packed (i8, i32)>    // named, packed
Creates an SSA value containing a pointer to a global variable or constant
defined by llvm.mlir.global. The global value can be defined after it is
first referenced. If the global value is a constant, storing into it is not
allowed.
Examples:
func @foo() {
  // Get the address of a global variable.
  %0 = llvm.mlir.addressof @const : !llvm.ptr<i32>

  // Use it as a regular pointer.
  %1 = llvm.load %0 : !llvm.ptr<i32>

  // Get the address of a function.
  %2 = llvm.mlir.addressof @foo : !llvm.ptr<func<void ()>>

  // The function address can be used for indirect calls.
  llvm.call %2() : () -> ()
}

// Define the global.
llvm.mlir.global @const(42 : i32) : i32
In LLVM IR, functions may return either 0 or 1 value. LLVM IR dialect
implements this behavior by providing a variadic call operation for 0- and
1-result functions. Even though MLIR supports multi-result functions, LLVM
IR dialect disallows them.
The call instruction supports both direct and indirect calls. Direct calls
start with a function name (@-prefixed) and indirect calls start with an
SSA value (%-prefixed). The direct callee, if present, is stored as a
function attribute callee. The trailing type of the instruction is always
the MLIR function type, which may be different from the indirect callee that
has the wrapped LLVM IR function type.
Examples:
// Direct call without arguments and with one result.
%0 = llvm.call @foo() : () -> (f32)

// Direct call with arguments and without a result.
llvm.call @bar(%0) : (f32) -> ()

// Indirect call with an argument and without a result.
llvm.call %1(%0) : (f32) -> ()
Unlike LLVM IR, MLIR does not have first-class constant values. Therefore,
all constants must be created as SSA values before being used in other
operations. llvm.mlir.constant creates such values for scalars and
vectors. It has a mandatory value attribute, which may be an integer or
floating point attribute, or a dense or sparse attribute containing integers or
floats. The type of the attribute is one of the corresponding MLIR builtin
types. It may be omitted for i64 and f64 types that are implied. The
operation produces a new SSA value of the specified LLVM IR dialect type.
The type of that value must correspond to the attribute type converted to
LLVM IR.
Examples:
// Integer constant, internal i32 is mandatory
%0 = llvm.mlir.constant(42 : i32) : i32

// It's okay to omit i64.
%1 = llvm.mlir.constant(42) : i64

// Floating point constant.
%2 = llvm.mlir.constant(42.0 : f32) : f32

// Splat dense vector constant.
%3 = llvm.mlir.constant(dense<1.0> : vector<4xf32>) : vector<4xf32>
Since MLIR allows for arbitrary operations to be present at the top level,
global variables are defined using the llvm.mlir.global operation. Both
global constants and variables can be defined, and the value may also be
initialized in both cases.
There are two forms of initialization syntax. Simple constants that can be
represented as MLIR attributes can be given in-line:
llvm.mlir.global @variable(32.0 : f32) : f32
This initialization and type syntax is similar to llvm.mlir.constant and
may use two types: one for MLIR attribute and another for the LLVM value.
These types must be compatible.
More complex constants that cannot be represented as MLIR attributes can be
given in an initializer region:
// This global is initialized with the equivalent of:
// i32* getelementptr (i32* @g2, i32 2)
llvm.mlir.global constant @int_gep() : !llvm.ptr<i32> {
  %0 = llvm.mlir.addressof @g2 : !llvm.ptr<i32>
  %1 = llvm.mlir.constant(2 : i32) : i32
  %2 = llvm.getelementptr %0[%1]
     : (!llvm.ptr<i32>, i32) -> !llvm.ptr<i32>
  // The initializer region must end with `llvm.return`.
  llvm.return %2 : !llvm.ptr<i32>
}
Only one of the initializer attribute or initializer region may be provided.
llvm.mlir.global must appear at top-level of the enclosing module. It uses
an @-identifier for its value, which will be uniqued by the module with
respect to other @-identifiers in it.
Examples:
// Global values use @-identifiers.
llvm.mlir.global constant @cst(42 : i32) : i32

// Non-constant values must also be initialized.
llvm.mlir.global @variable(32.0 : f32) : f32

// Strings are expected to be of wrapped LLVM i8 array type and do not
// automatically include the trailing zero.
llvm.mlir.global @string("abc") : !llvm.array<3 x i8>

// For string globals, the trailing type may be omitted.
llvm.mlir.global constant @no_trailing_type("foo bar")

// A complex initializer is constructed with an initializer region.
llvm.mlir.global constant @int_gep() : !llvm.ptr<i32> {
  %0 = llvm.mlir.addressof @g2 : !llvm.ptr<i32>
  %1 = llvm.mlir.constant(2 : i32) : i32
  %2 = llvm.getelementptr %0[%1]
     : (!llvm.ptr<i32>, i32) -> !llvm.ptr<i32>
  llvm.return %2 : !llvm.ptr<i32>
}
Similarly to functions, globals have a linkage attribute. In the custom
syntax, this attribute is placed between llvm.mlir.global and the optional
constant keyword. If the attribute is omitted, external linkage is
assumed by default.
Examples:
// A constant with internal linkage will not participate in linking.
llvm.mlir.global internal constant @cst(42 : i32) : i32

// By default, "external" linkage is assumed and the global participates in
// symbol resolution at link-time.
llvm.mlir.global @glob(0 : f32) : f32
The InlineAsmOp mirrors the underlying LLVM semantics with a notable
exception: the embedded asm_string is not allowed to define or reference
any symbol or any global variable: only the operands of the op may be read,
written, or referenced.
Attempting to define or reference any symbol or any global variable is
considered undefined behavior at this time.
MLIR functions are defined by an operation that is not built into the IR
itself. The LLVM dialect provides an llvm.func operation to define
functions compatible with LLVM IR. These functions have LLVM dialect
function type but use MLIR syntax to express it. They are required to have
exactly one result type. LLVM function operation is intended to capture
additional properties of LLVM functions, such as linkage and calling
convention, that may be modeled differently by the built-in MLIR function.
// The type of @bar is !llvm.func<i64 (i64)>
llvm.func @bar(%arg0: i64) -> i64 {
  llvm.return %arg0 : i64
}

// The type of @foo is !llvm.func<void (i64)>
// !llvm.void type is omitted
llvm.func @foo(%arg0: i64) {
  llvm.return
}

// A function with `internal` linkage.
llvm.func internal @internal_func() {
  llvm.return
}
Unlike LLVM IR, MLIR does not have first-class null pointers. They must be
explicitly created as SSA values using llvm.mlir.null. This operation has
no operands or attributes, and returns a null value of a wrapped LLVM IR
pointer type.
Examples:
// Null pointer to i8.
%0 = llvm.mlir.null : !llvm.ptr<i8>

// Null pointer to a function with signature void().
%1 = llvm.mlir.null : !llvm.ptr<func<void ()>>
Unlike LLVM IR, MLIR does not have first-class undefined values. Such values
must be created as SSA values using llvm.mlir.undef. This operation has no
operands or attributes. It creates an undefined value of the specified LLVM
IR dialect type wrapping an LLVM IR structure type.
Example:
// Create a structure with a 32-bit integer followed by a float.
%0 = llvm.mlir.undef : !llvm.struct<(i32, f32)>