MLIR

Multi-Level IR Compiler Framework

'linalg' Dialect

Rationale 

[Figure: MLIR Codegen Flow]

Linalg is designed to solve the High-level Hierarchical Optimization (HHO box) in MLIR and to interoperate nicely within a Mixture Of Expert Compilers environment (i.e. the CGSel box).

The Rationale Document goes into significantly more design and architectural decision details.

Set of Key Transformations 

The following key transformations have been central to driving the design of Linalg. They are all implemented in terms of the properties of the linalg.generic OpInterface and avoid the pitfall of relying on hardcoded one-off op knowledge.

The textual form description of these transformations is left for future work. Still, it is useful to list the key transformations that are performed on the Linalg IR and that have influenced its design:

  1. Progressive Buffer Allocation.
  2. Parametric Tiling.
  3. Promotion to Temporary Buffer in Fast Memory.
  4. Tiled Producer-Consumer Fusion with Parametric Tile-And-Fuse.
  5. Map to Parallel and Reduction Loops and Hardware.
  6. Vectorization: Rewrite in Vector Form.
  7. Lower to Loops (Affine, Generic, and Parallel).
  8. Lower to Library Calls or Special Instructions, Intrinsics or ISA.
  9. Partially Lower to Iterations Over a Finer-Grained Linalg Op.

High-Level Description of Linalg Ops 

Linalg takes at least some inspiration from all previously listed prior art. The design enables the definition of CustomOps with generic properties that enable key transformations, including lowering to scalar load/store and other operations or to external library calls and intrinsics.

These ops can take either tensors or buffers as input and output operands. Output tensor operands provide a unifying abstraction and give a shape to the results. Output tensors come in two flavors and are always associated with a corresponding op result:

  1. an “init tensor” output value which provides an initial value for a tensor that is created by iteratively updating the result (also called “destructive updates”). Such a tensor is always materialized in some form. If enough fusion occurs, it may end up being materialized only as a register-level SSA value. It is expected (but not required) that the destructive update pattern can be rewritten as an in-place update on buffers.

  2. a “shape-only” tensor output value whose underlying elements are not used in the payload computation and which only serves to carry shape information to lower levels of abstraction. In the future this will be replaced by an appropriate shape type when it is available as a builtin type (see the discourse discussion Linalg and Shapes for more details).

Payload-Carrying Ops 

Linalg defines two payload-carrying operations that implement the structured ops abstraction on tensors and buffers. This is architected as two generic operations, linalg.generic (resp. linalg.indexed_generic), that can express custom operations with index-free semantics (resp. indexing semantics). The properties of these generic ops are the result of applying the guiding principles described in the Rationale Document. They are listed next, with a brief example and discussion for each.

Property 1: Input and Output Operands Define The Iteration Space 

A linalg.generic op fully derives the specification of its iteration space from its operands. The property enforces that a localized IR element (the op) has all the information needed to synthesize the control-flow required to iterate over its operands, according to their type. This notion of IR localization bears some resemblance to URUK.

Consider the following fully specified linalg.generic example. Here, the first operand is a memref of f32 scalar elements that has an ordinary identity layout, and the second one is a memref of 4-element vectors with a 2-strided, 1-offset layout.

// File name: example1.mlir
#accesses = [
  affine_map<(m) -> (m)>,
  affine_map<(m) -> (m)>
]

#attrs = {
  indexing_maps = #accesses,
  iterator_types = ["parallel"]
}

// memory layouts
#identity = affine_map<(d0) -> (d0)>

func @example(%A: memref<?xf32, #identity>,
              %B: memref<?xvector<4xf32>, offset: 1, strides: [2]>) {
  linalg.generic #attrs
  ins(%A: memref<?xf32, #identity>)
  outs(%B: memref<?xvector<4xf32>, offset: 1, strides: [2]>) {
  ^bb0(%a: f32, %b: vector<4xf32>):
    %c = "some_compute"(%a, %b): (f32, vector<4xf32>) -> (vector<4xf32>)
    linalg.yield %c: vector<4xf32>
  }
  return
}

The property “Input and Output Operands Define The Iteration Space” is materialized by a lowering into a form that will resemble:

// Run: mlir-opt example1.mlir -allow-unregistered-dialect -convert-linalg-to-loops
// This converted representation is in the `scf` dialect.
// Its syntax can be found here: https://mlir.llvm.org/docs/Dialects/SCFDialect/
#map0 = affine_map<(d0) -> (d0 * 2 + 1)>

func @example(%arg0: memref<?xf32>, %arg1: memref<?xvector<4xf32>, #map0>) {
  %c0 = constant 0 : index
  %c1 = constant 1 : index
  %0 = dim %arg0, %c0 : memref<?xf32>
  scf.for %arg2 = %c0 to %0 step %c1 {
    %1 = load %arg0[%arg2] : memref<?xf32>
    %2 = load %arg1[%arg2] : memref<?xvector<4xf32>, #map0>
    %3 = "some_compute"(%1, %2) : (f32, vector<4xf32>) -> vector<4xf32>
    store %3, %arg1[%arg2] : memref<?xvector<4xf32>, #map0>
  }
  return
}

The property participates in simplifying analyses and transformations. For instance, it guarantees that no out-of-bounds access can occur by construction (assuming dynamic operand dimensions agree with each other, which is the purpose of the assert runtime check).
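
As a rough illustration of this property outside MLIR, the following C++ sketch derives the single loop bound from the operand sizes alone and checks that the dynamic dimensions agree before iterating. The names `Buffer1D`, `elementwise`, and the `payload` callback are hypothetical and introduced only for this sketch; they are not part of Linalg.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical 1-D operand: a dynamically sized buffer of floats.
using Buffer1D = std::vector<float>;

// The iteration space is derived from the operands only: the single loop
// bound is the size of the input, and the output must agree with it.
// Returns false (and touches nothing) if the dynamic sizes disagree.
bool elementwise(const Buffer1D &in, Buffer1D &out,
                 const std::function<float(float, float)> &payload) {
  if (in.size() != out.size())
    return false; // plays the role of the runtime shape check
  for (std::size_t m = 0; m < in.size(); ++m)
    out[m] = payload(in[m], out[m]); // in bounds by construction
  return true;
}
```

Because the bound comes from the operands themselves, no access inside the loop can fall outside either buffer once the size check has passed.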

Before lowering to loop form, loop induction variables and iterators are implicit (i.e. not yet materialized).

The main implications are that:

  1. The semantics of the ops are restricted to operate on structured data types, on which we can define an iterator.

  2. This does not model arbitrary code with side-effects.

We do not think these are serious limitations in practice because MLIR is all about mixing different levels of abstractions in the same IR. As long as Linalg can progressively lower to the next level of abstraction, it can also be just bypassed for things that do not fit.

At the same time, conditioning op semantics on structured data types is a very promising path towards extensibility to non-dense tensors, as experience with LIFT abstractions for sparse and position-dependent arrays, as well as TACO, has shown.

Property 2: Reversible Mappings Between Control and Data Structures 

A linalg.generic defines the mapping between the iteration space (i.e. the loops) and the data.

Consider the following fully specified linalg.generic example. Here, the first memref is a 2-strided one on both of its dimensions, and the second memref uses an identity layout.

// File name: example2.mlir
#indexing_maps = [
  affine_map<(i, j) -> (j, i)>,
  affine_map<(i, j) -> (j)>
]

#attrs = {
  indexing_maps = #indexing_maps,
  iterator_types = ["parallel", "parallel"]
}

func @example(%A: memref<8x?xf32, offset: 0, strides: [2, 2]>,
              %B: memref<?xvector<4xf32>>) {
  linalg.generic #attrs
  ins(%A: memref<8x?xf32, offset: 0, strides: [2, 2]>)
  outs(%B: memref<?xvector<4xf32>>) {
  ^bb0(%a: f32, %b: vector<4xf32>):
    %c = "some_compute"(%a, %b): (f32, vector<4xf32>) -> (vector<4xf32>)
    linalg.yield %c: vector<4xf32>
  }
  return
}

The property “Reversible Mappings Between Control and Data Structures” is materialized by a lowering into a form that will resemble:

// Run: mlir-opt example2.mlir -allow-unregistered-dialect -convert-linalg-to-loops
#map0 = affine_map<(d0, d1) -> (d0 * 2 + d1 * 2)>

func @example(%arg0: memref<8x?xf32, #map0>, %arg1: memref<?xvector<4xf32>>) {
  %c8 = constant 8 : index
  %c0 = constant 0 : index
  %c1 = constant 1 : index
  %0 = dim %arg0, %c1 : memref<8x?xf32, #map0>
  scf.for %arg2 = %c0 to %0 step %c1 {
    scf.for %arg3 = %c0 to %c8 step %c1 {
      %1 = load %arg0[%arg3, %arg2] : memref<8x?xf32, #map0>
      %2 = load %arg1[%arg3] : memref<?xvector<4xf32>>
      %3 = "some_compute"(%1, %2) : (f32, vector<4xf32>) -> vector<4xf32>
      store %3, %arg1[%arg3] : memref<?xvector<4xf32>>
    }
  }
  return
}

This mapping needs to be reversible because we want to be able to go back and forth between the two and answer questions such as:

  • Given a subset of the iteration space, what subset of data does it read and write?
  • Given a subset of data read or written, what subset of the iteration space is responsible for this read or write?

Answering these 2 questions is one of the main analyses that Linalg uses to implement transformations such as tiling, tiled producer-consumer fusion, and promotion to temporary buffers in fast memory.
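
The two questions can be sketched for the simplest case, a permutation indexing map such as affine_map<(i, j) -> (j, i)>, where both directions are a reordering of ranges. The types `Range`, `Box2D`, and `Permutation` and the two functions below are hypothetical helpers for this sketch only; general affine maps need more machinery than is shown here.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Half-open interval [lo, hi).
struct Range {
  std::int64_t lo, hi;
  bool operator==(const Range &o) const { return lo == o.lo && hi == o.hi; }
};

// A rectangular subset of a 2-D space, one Range per dimension.
using Box2D = std::array<Range, 2>;

// Hypothetical permutation map: data dimension d is indexed by loop
// dimension perm[d]. {1, 0} encodes affine_map<(i, j) -> (j, i)>.
using Permutation = std::array<int, 2>;

// Question 1: which data does a tile of the iteration space touch?
Box2D dataFootprint(const Permutation &perm, const Box2D &iterTile) {
  return {iterTile[perm[0]], iterTile[perm[1]]};
}

// Question 2: which iterations touch a given box of data?
Box2D iterationsTouching(const Permutation &perm, const Box2D &dataBox) {
  Box2D iters{};
  for (int d = 0; d < 2; ++d)
    iters[perm[d]] = dataBox[d]; // invert the permutation
  return iters;
}
```

Applying `iterationsTouching` to the result of `dataFootprint` returns the original tile, which is the round trip that tiling and fusion rely on.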

In the current implementation, linalg.generic uses a list of AffineMaps (see the #indexing_maps attribute in the previous examples). This is a pragmatic short-term solution, but in the longer term, note that this property could even be evaluated dynamically, similarly to inspector-executor algorithms.

Property 3: The Type Of Iterators is Defined Explicitly 

A linalg.generic op fully declares the type of its iterators. This information is used in transformations.

These properties are derived from established practice in the field and mirror the properties from Ken Kennedy’s Optimizing Compilers for Modern Architectures. The key idea of legality of loop transformations expressed by Kennedy is that the lexicographic order of all dependence vectors must be preserved.

This can be better captured directly at the loop level thanks to specific iterator types, among which: parallel, reduction, partition, permutable/monotonic, sequential, dependence distance, …

These types are traditionally the result of complex dependence analyses and have been referred to as “bands” in the polyhedral community (e.g. parallel bands, permutable bands, etc, in ISL schedule tree parlance).

Specifying the information declaratively in a linalg.generic allows conveying properties that may be hard (or even impossible) to derive from lower-level information. These properties can be brought all the way to the moment when they are useful for transformations, used and then discarded.

Additionally, these properties may also be viewed as a contract that the frontend/user guarantees and that the compiler may take advantage of. The common example is the use of data-dependent reduction semantics for specifying histogram computations. If the frontend has additional knowledge that proper atomic operations are available, it may be better to specify parallel semantics and use the special atomic in the computation region.

At this time, Linalg only has an explicit use for parallel and reduction loops but previous experience shows that the abstraction generalizes.
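
A small C++ sketch of how a declared iterator type can stand in for a dependence analysis: for a matrix-vector product y(i) += A(i, k) * x(k), the i loop is declared parallel and the k loop is declared a reduction, so a transformation may visit i in any order. The enum, the `mayReorder` query, and the `matvec` driver are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical mirror of the two iterator types Linalg uses today.
enum class IteratorType { Parallel, Reduction };

// Declared, not analyzed: y(i) += A(i, k) * x(k) has a parallel i and a
// reduction k.
const std::vector<IteratorType> kMatvecIterators = {IteratorType::Parallel,
                                                    IteratorType::Reduction};

// A transformation only needs to look at the declaration to know whether
// iterations of a loop may be visited in any order.
bool mayReorder(const std::vector<IteratorType> &types, std::size_t loop) {
  return types[loop] == IteratorType::Parallel;
}

// Runs the matvec, visiting the parallel dimension in the order given by
// iOrder. The reduction dimension k is always visited in increasing order.
std::vector<float> matvec(const std::vector<std::vector<float>> &A,
                          const std::vector<float> &x,
                          const std::vector<std::size_t> &iOrder) {
  assert(mayReorder(kMatvecIterators, 0) && "i must be declared parallel");
  std::vector<float> y(A.size(), 0.0f);
  for (std::size_t i : iOrder)
    for (std::size_t k = 0; k < x.size(); ++k)
      y[i] += A[i][k] * x[k];
  return y;
}
```

Visiting the rows forwards or backwards yields the same result, whereas the k loop is not marked as freely reorderable.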

Property 4: The Compute Payload is Specified With a Region 

A linalg.generic op has a compute payload that is fully generic thanks to the use of Regions.

The region takes as arguments the scalar elemental types of the tensor or buffer operands of the linalg.generic. For flexibility and ability to match library calls, additional special values may be passed. For instance, a linalg.fill operation takes a buffer and an additional scalar value.

At this time there are no additional restrictions to the region semantics. This is meant to allow the exploration of various design tradeoffs at the intersection of regions and iterator types. In particular, the frontend is responsible for the semantics of iterator types to correspond to the operations inside the region: the region can capture buffers arbitrarily and write into them. If this conflicts with some parallel iterator requirement, this is undefined behavior.

Previous examples already elaborate compute payloads with an unregistered function "some_compute". The following code snippet shows what the result will be when using a concrete operation addf:

// File name: example3.mlir
#map = affine_map<(i, j) -> (i, j)>

#attrs = {
  indexing_maps = [#map, #map, #map],
  iterator_types = ["parallel", "parallel"]
}

func @example(%A: memref<?x?xf32>, %B: memref<?x?xf32>, %C: memref<?x?xf32>) {
  linalg.generic #attrs
  ins(%A, %B: memref<?x?xf32>, memref<?x?xf32>)
  outs(%C: memref<?x?xf32>) {
    ^bb0(%a: f32, %b: f32, %c: f32):
      %d = addf %a, %b : f32
      linalg.yield %d : f32
  }

  return
}

This function adds two matrices (%A and %B) element-wise and stores the result into a third one (%C).

The property “The Compute Payload is Specified With a Region” is materialized by a lowering into a form that will resemble:

func @example(%arg0: memref<?x?xf32>, %arg1: memref<?x?xf32>, %arg2: memref<?x?xf32>) {
  %c0 = constant 0 : index
  %c1 = constant 1 : index
  %0 = dim %arg0, %c0 : memref<?x?xf32>
  %1 = dim %arg0, %c1 : memref<?x?xf32>
  scf.for %arg3 = %c0 to %0 step %c1 {
    scf.for %arg4 = %c0 to %1 step %c1 {
      %2 = load %arg0[%arg3, %arg4] : memref<?x?xf32>
      %3 = load %arg1[%arg3, %arg4] : memref<?x?xf32>
      %4 = addf %2, %3 : f32
      store %4, %arg2[%arg3, %arg4] : memref<?x?xf32>
    }
  }
  return
}

In the process of lowering to loops and lower-level constructs, requirements similar to those discussed in the inlined call op proposal are encountered. We expect to be able to reuse the common lower-level infrastructure provided it evolves to support both region arguments and captures.

Property 5: May Map To an External Library Call 

A linalg.generic op may map to an external library call by specifying a SymbolAttr. At this level of abstraction, the important glue is the ability to perform transformations that preserve the structure necessary to call the external library after different transformations have been applied.

This involves considerations related to preservation of op semantics and integration at the ABI level. Regardless of whether one wants to use external library calls or a custom ISA, the problem for codegen is similar: preservation of a fixed granularity.

Consider the following example that adds an additional attribute library_call="pointwise_add" that specifies the name of an external library call we intend to use:

// File name: example4.mlir
#indexing_maps = [
  affine_map<(i, j) -> (i, j)>,
  affine_map<(i, j) -> (i, j)>,
  affine_map<(i, j) -> (i, j)>
]

#attrs = {
  indexing_maps = #indexing_maps,
  iterator_types = ["parallel", "parallel"],
  library_call = "pointwise_add"
}

func @example(%A: memref<?x?xf32>, %B: memref<?x?xf32>, %C: memref<?x?xf32>) {
  linalg.generic #attrs
  ins(%A, %B: memref<?x?xf32>, memref<?x?xf32>)
  outs(%C: memref<?x?xf32>) {
  ^bb0(%a: f32, %b: f32, %c: f32):
    %d = addf %a, %b : f32
    linalg.yield %d : f32
  }
  return
}

The property “Map To an External Library Call” is materialized by a lowering into a form that will resemble:

// Run: mlir-opt example4.mlir -convert-linalg-to-std
// Note that we lower the Linalg dialect directly to the Standard dialect.
// See this doc: https://mlir.llvm.org/docs/Dialects/Standard/

#map0 = affine_map<(d0, d1)[s0, s1, s2] -> (d0 * s1 + s0 + d1 * s2)>

func @example(%arg0: memref<?x?xf32>, %arg1: memref<?x?xf32>, %arg2: memref<?x?xf32>) {
  %0 = memref.cast %arg0 : memref<?x?xf32> to memref<?x?xf32, #map0>
  %1 = memref.cast %arg1 : memref<?x?xf32> to memref<?x?xf32, #map0>
  %2 = memref.cast %arg2 : memref<?x?xf32> to memref<?x?xf32, #map0>
  call @pointwise_add(%0, %1, %2) : (memref<?x?xf32, #map0>, memref<?x?xf32, #map0>, memref<?x?xf32, #map0>) -> ()
  return
}
func @pointwise_add(memref<?x?xf32, #map0>, memref<?x?xf32, #map0>, memref<?x?xf32, #map0>) attributes {llvm.emit_c_interface}

After lowering to LLVM, this resembles:

// Run: mlir-opt example4.mlir -convert-linalg-to-std | mlir-opt -convert-std-to-llvm
// Some generated code is omitted here.
func @example(%arg0: !llvm<"float*">, ...) {
  ...
  llvm.call @pointwise_add(...) : (!llvm<"float*">, ...) -> ()
  return
}

llvm.func @pointwise_add(%arg0: !llvm<"float*">, ...) attributes {llvm.emit_c_interface} {
  ...
  llvm.call @_mlir_ciface_pointwise_add(%9, %19, %29) : (!llvm<"{ float*, float*, i64, [2 x i64], [2 x i64] }*">, !llvm<"{ float*, float*, i64, [2 x i64], [2 x i64] }*">, !llvm<"{ float*, float*, i64, [2 x i64], [2 x i64] }*">) -> ()
  llvm.return
}
llvm.func @_mlir_ciface_pointwise_add(!llvm<"{ float*, float*, i64, [2 x i64], [2 x i64] }*">, !llvm<"{ float*, float*, i64, [2 x i64], [2 x i64] }*">, !llvm<"{ float*, float*, i64, [2 x i64], [2 x i64] }*">) attributes {llvm.emit_c_interface}

Convention For External Library Interoperability

The linalg dialect adopts a convention that is similar to BLAS when offloading operations to fast library implementations: pass a non-owning pointer to input and output data with additional metadata. This convention is also found in libraries such as MKL, OpenBLAS, BLIS, cuBLAS, cuDNN, etc., and more generally at interface points across language boundaries (e.g. C++ / Python).

Generally, linalg passes non-owning pointers to View data structures to pre-compiled library calls linked externally.
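
To make the convention concrete, here is a minimal C++ sketch of what the library side of such a call might look like. The struct mirrors the lowered type { float*, float*, i64, [2 x i64], [2 x i64] } that appears in the LLVM lowering of example4.mlir; the struct name `StridedView2D`, its field names, the `at` helper, and the function body are assumptions made for this sketch, not code shipped with MLIR.

```cpp
#include <cassert>
#include <cstdint>

// Non-owning 2-D strided view, laid out like the lowered type
// { float*, float*, i64, [2 x i64], [2 x i64] }.
struct StridedView2D {
  float *allocated;        // pointer returned by the allocator (unused here)
  float *aligned;          // pointer used for element accesses
  std::int64_t offset;     // offset, in elements, of the first element
  std::int64_t sizes[2];   // extent of each dimension
  std::int64_t strides[2]; // distance, in elements, between consecutive indices
};

// Reference to element (i, j) of a view.
inline float &at(const StridedView2D *v, std::int64_t i, std::int64_t j) {
  return v->aligned[v->offset + i * v->strides[0] + j * v->strides[1]];
}

// Hypothetical library-side implementation of the symbol the lowered IR
// calls. The views are passed by pointer and are never freed here.
extern "C" void _mlir_ciface_pointwise_add(StridedView2D *a, StridedView2D *b,
                                           StridedView2D *c) {
  assert(a->sizes[0] == b->sizes[0] && a->sizes[0] == c->sizes[0]);
  assert(a->sizes[1] == b->sizes[1] && a->sizes[1] == c->sizes[1]);
  for (std::int64_t i = 0; i < c->sizes[0]; ++i)
    for (std::int64_t j = 0; j < c->sizes[1]; ++j)
      at(c, i, j) = at(a, i, j) + at(b, i, j);
}
```

The callee only reads the metadata and dereferences the data pointer; ownership of the underlying allocation stays with the caller.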

There is an ongoing discussion on the topic of extending interoperability in the presence of key attributes.

Property 6: Perfectly Nested Writes To The Whole Output Operands 

Perfectly nested loops form a particularly important class of structure that enables key loop transformations such as tiling and mapping to library calls. Unfortunately, this type of structure is easily broken by transformations such as partial loop fusion. Tiling and mapping to library calls become more challenging, or even infeasible. Linalg ops adopt perfect-nestedness as a first-class property: the structure cannot be broken and is transported in the IR by construction.

A linalg.generic op represents a perfectly nested loop nest that writes the entire memory region. This is a structural constraint across regions and loops that has proven to be key in simplifying transformations.

One particular point to mention is that converting imperfectly nested code into perfectly nested code can often be done with enough loop distribution and embedding of conditionals down to the innermost loop level.

Previous experience with Tensor Comprehensions gave us the intuition that forcing innermost control-flow nesting is a lot like writing data-parallel code with arrays of boolean values and predication. This type of trick has also been used before in polyhedral compilers to convert non-affine control into affine compute dependencies.
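
The rewrite from imperfectly nested to perfectly nested code can be sketched in C++ as follows. The example (a matrix-vector product with a bias initialization) and the function names are hypothetical, and the predicated version is only equivalent under the stated assumption that the inner loop runs at least once.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;

// Imperfectly nested: the initialization sits between the two loops.
std::vector<float> imperfect(const Matrix &A, const std::vector<float> &x,
                             const std::vector<float> &bias) {
  std::vector<float> y(A.size());
  for (std::size_t i = 0; i < A.size(); ++i) {
    y[i] = bias[i];
    for (std::size_t k = 0; k < x.size(); ++k)
      y[i] += A[i][k] * x[k];
  }
  return y;
}

// Perfectly nested: the initialization is pushed into the innermost loop and
// guarded by a predicate. Assumes x is non-empty, otherwise the
// initialization would never execute.
std::vector<float> perfect(const Matrix &A, const std::vector<float> &x,
                           const std::vector<float> &bias) {
  assert(!x.empty() && "predicated form needs at least one k iteration");
  std::vector<float> y(A.size());
  for (std::size_t i = 0; i < A.size(); ++i)
    for (std::size_t k = 0; k < x.size(); ++k) {
      if (k == 0)
        y[i] = bias[i]; // predicated statement
      y[i] += A[i][k] * x[k];
    }
  return y;
}
```

The `if (k == 0)` test is exactly the kind of deep predication that later canonicalization and hoisting are expected to remove.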

While it may be possible to automate such rewrites from generic IR, linalg.generic just forces the semantics for now.

The key implication is that this conversion to deep predication needs to be undone once we are done with Linalg transformations. After iterators and induction variables are materialized (i.e. after lowering out of linalg.generic occurred), the overall performance will be greatly influenced by the quality of canonicalizations, foldings and Loop-Invariant Code Motion (LICM).

In the grander scheme, the reliance on late LICM was deemed a necessary risk.

Putting it Together 

As it stands, the six properties above define the semantics of a linalg.generic op. It is an open question whether all of these semantics are strictly necessary in practice and whether some should or could be derived automatically while still maintaining the core guiding principles.

For the time being, we have settled on the combination of these properties because of empirical evidence building and working on multiple high-level compilers. As we lay those down and engage more with the community, we expect multiple rounds of discussions and design changes to the original architecture.

Data Representation: Views 

The current implementation uses the Strided MemRef (a.k.a. View) abstraction. The name View is used interchangeably in linalg to signify Strided MemRef. In the future we expect to use other structured data types and support ragged, mixed-sparse and other types. We expect to draw on the experience from existing LIFT abstractions for sparse and position-dependent arrays.

Metadata Ops 

Metadata ops are a set of ops that manipulate metadata but do not move memory. These ops take view operands and extra attributes, and return new views. The returned views generally alias the operand view. At the moment the existing ops are:

* `memref.view`,
* `std.subview`,
* `memref.transpose`,
* `linalg.range`,
* `linalg.slice`,
* `linalg.reshape`.

Future ops are added on a per-need basis but should include:

* `linalg.tile`,
* `linalg.intersection`,
* `linalg.convex_union`,
* `linalg.difference` (would need to work on a list of views).

These additional operations correspond to abstractions that have been known to work in the field of large-scale distributed stencil computations.

In a longer-term future, the abstractions from the Legion data-centric programming model seem generally appealing.
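
The idea that metadata ops return new views aliasing the same memory can be illustrated with a small C++ sketch. The `View2D` struct and the `transpose` and `subview` functions below are hypothetical stand-ins for this illustration; they are not the MLIR ops themselves and do not model their full semantics.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 2-D strided view: a non-owning pointer plus metadata.
struct View2D {
  float *data;
  std::int64_t offset;
  std::int64_t sizes[2];
  std::int64_t strides[2];

  float &operator()(std::int64_t i, std::int64_t j) const {
    return data[offset + i * strides[0] + j * strides[1]];
  }
};

// Transpose by swapping sizes and strides; no element is copied.
View2D transpose(const View2D &v) {
  return {v.data, v.offset, {v.sizes[1], v.sizes[0]},
          {v.strides[1], v.strides[0]}};
}

// Rectangular sub-view starting at (i0, j0) with the given extents; only the
// offset and sizes change.
View2D subview(const View2D &v, std::int64_t i0, std::int64_t j0,
               std::int64_t rows, std::int64_t cols) {
  assert(i0 >= 0 && j0 >= 0 && i0 + rows <= v.sizes[0] &&
         j0 + cols <= v.sizes[1]);
  return {v.data, v.offset + i0 * v.strides[0] + j0 * v.strides[1],
          {rows, cols}, {v.strides[0], v.strides[1]}};
}
```

A write through the transposed view or the sub-view is visible through the original view, since all three share the same `data` pointer.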

Named Payload-Carrying Ops 

Additionally, linalg provides a small subset of commonly named operations:

* `linalg.copy`,
* `linalg.fill`,
* `linalg.dot`,
* `linalg.matmul`,
* `linalg.conv`.

These named operations adhere to the linalg.generic op interface. Work is in progress to define declarative mechanisms to automatically generate named ops from a description in terms of only the generic op interface.

This is the main reason there are only a small number of ops today: we expect them to be auto-generated from Tablegen soon.

Named Payload Ops Specification 

Linalg provides a declarative specification and a generation tool (mlir-linalg-ods-gen) to automatically produce named ops from a notation that is inspired by Einstein notation.

The syntax and semantics used in mlir-linalg-ods-gen are very much in flight and borrow from Tensor Comprehensions (TC) but differ in a few dimensions, to better adapt to Linalg:

  1. The input and output tensor parameters are specified as id : type(symbolic-affine-expression-list) (e.g. A : f32(M, N + M)) and each new symbol is discovered eagerly. TC on the other hand does not allow general symbolic affine expressions.
  2. The output shapes are specified explicitly; in TC they are always derived from the input shapes.
  3. The operations used to specify computations use EDSC intrinsics so that they can easily be parsed and emitted into a simple region builder without resorting to more general MLIR parsing.
  4. Reduction dimensions are specified with angle bracket notation on the operation they apply to (e.g. std_add<k> specifies that k is a reduction dimension). In TC, the reduction dimensions are inferred. If one of the operands is not used in any expression, it will be considered a shape-only operand, and the result of the indexing_map will be reduction dimensions.
  5. The parallel and reduction dimensions are ordered by the textual program order. For instance, in the comprehension O(i, j) = std_add<k, l>(...), i (resp. j) is a parallel iterator encoded by an affine dimension of position 0 (resp. 1); k (resp. l) is a reduction iterator encoded by an affine dimension of position 2 (resp. 3).
  6. A list of attributes can be defined for the op with the format attr(strides: 2xi32) and referenced in the comprehension like strides[0]. These attribute uses will be parsed as affine symbols to generate the op definition and implementation. For a concrete op instance, the runtime constant values from the attributes will be used to replace the affine symbols and simplify the indexing maps.

These decisions and syntax are subject to evolution and change. In particular, op-specific attributes, dynamic ranks, some form of templating, shape calculation function specification, etc. may be added in the future.

At this time, the following restrictions are imposed on the syntax and semantics:

  1. Each def may only contain a single comprehension but each comprehension may perform multiple updates.
  2. Each tensor may only be used with a single indexing expression.

A """-wrapped doc string can be attached to the named op. It should contain a one-line summary first, followed by a longer description.

The following specification may be used to define a named batchmatmul op:

def batchmatmul(A: f32(Batch, M, K), B: f32(K, N)) -> (C: f32(Batch, M, N))
"""Batch matrix-multiply operation.

This operation performs batch matrix-multiply over ...
"""
{
  C(b, m, n) = std_addf<k>(std_mulf(A(b, m, k), B(k, n)));
}

When mlir-linalg-ods-gen -gen-ods-decl=1 is called, the following ODS is produced:

def batchmatmulOp : LinalgNamedStructured_Op<"batchmatmul", [
  NInputs<2>,
  NOutputs<1>,
  NamedStructuredOpTrait]> { ... }

When mlir-linalg-ods-gen -gen-impl=1 is called, the following C++ is produced:

llvm::Optional<SmallVector<StringRef, 8>> batchmatmul::referenceIterators() {
  return SmallVector<StringRef, 8>{
    getParallelIteratorTypeName(),
    getParallelIteratorTypeName(),
    getParallelIteratorTypeName(),
    getReductionIteratorTypeName() };
}
llvm::Optional<SmallVector<AffineMap, 8>> batchmatmul::referenceIndexingMaps() {
  MLIRContext *context = getContext();
  AffineExpr d0, d1, d2, d3;
  bindDims(context, d0, d1, d2, d3);
  return SmallVector<AffineMap, 8>{
      AffineMap::get(4, 0, {d0, d1, d3}),
      AffineMap::get(4, 0, {d3, d2}),
      AffineMap::get(4, 0, {d0, d1, d2}) };
}
void batchmatmul::regionBuilder(ArrayRef<BlockArgument> args) {
  using namespace edsc;
  using namespace intrinsics;
  Value _0(args[0]), _1(args[1]), _2(args[2]);
  Value _4 = std_mulf(_0, _1);
  Value _5 = std_addf(_2, _4);
  (linalg_yield(ValueRange{ _5 }));
}
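
For intuition, the comprehension and the generated indexing maps correspond to the loop nest sketched below in plain C++. The function name `batchMatmulReference` and the dense row-major layout of the operands are assumptions of this sketch; the generated code itself only provides iterator types, indexing maps, and the region builder.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference loop nest for
//   C(b, m, n) = std_addf<k>(std_mulf(A(b, m, k), B(k, n)))
// with loops ordered (b, m, n, k) = (d0, d1, d2, d3). All operands are
// assumed to be dense, row-major buffers.
void batchMatmulReference(const std::vector<float> &A, // Batch x M x K
                          const std::vector<float> &B, // K x N
                          std::vector<float> &C,       // Batch x M x N
                          std::size_t Batch, std::size_t M, std::size_t N,
                          std::size_t K) {
  assert(A.size() == Batch * M * K && B.size() == K * N &&
         C.size() == Batch * M * N);
  for (std::size_t b = 0; b < Batch; ++b)     // d0, parallel
    for (std::size_t m = 0; m < M; ++m)       // d1, parallel
      for (std::size_t n = 0; n < N; ++n)     // d2, parallel
        for (std::size_t k = 0; k < K; ++k)   // d3, reduction
          // Indexing maps: A -> (d0, d1, d3), B -> (d3, d2), C -> (d0, d1, d2).
          C[(b * M + m) * N + n] += A[(b * M + m) * K + k] * B[k * N + n];
}
```

The innermost statement matches the region builder: the multiply of the two inputs is added to the current value of the output element.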

YAML Based Named Structured Ops 

Linalg provides a declarative generation tool (mlir-linalg-ods-yaml-gen) to automatically produce named ops from a YAML-based op description format intended to capture the structure of the named ops and be generated from a higher level “mathy” DSL syntax. This facility is currently in flight and is intended to subsume the above when ready. See the C++ class to YAML mapping traits in mlir-linalg-ods-yaml-gen.cpp as the source of truth for the schema.

Most of the above documentation roughly applies to this path and will be ported as migration continues.

Open Issues and Design Alternatives 

Multiple open issues and design alternatives are in flight and it is time to lay them out for the community to discuss and pick apart:

  1. Should linalg.generic support nesting?
  2. Should linalg.generic regions take views or only scalars?
  3. Should we try to solve automatic differentiation at this level of abstraction?
  4. Are all the six properties really necessary?
  5. Is this relying too much on declarative specification and would we be better off relying more on analyses?
  6. Is this general enough for the community’s needs? If not, how should this be extended, if at all? …

These key questions (and many more) should really be thought of in the general context of MLIR, in which different levels of IR interoperate seamlessly. In practice, it is not necessary (or beneficial) to try and solve all problems in the same IR.

Operations 

linalg.batch_matmul_i16_i16_i32 (::mlir::linalg::BatchMatmulI16I16I32Op) 

Operands: 

| Operand | Description |
| :-----: | ----------- |
| inputs | shaped of any type values |
| outputs | shaped of any type values |

Results:

| Result | Description |
| :----: | ----------- |
| result_tensors | ranked tensor of any type values |

linalg.batch_matmul_i32_i32_i32 (::mlir::linalg::BatchMatmulI32I32I32Op) 

Operands: 

| Operand | Description |
| :-----: | ----------- |
| inputs | shaped of any type values |
| outputs | shaped of any type values |

Results:

| Result | Description |
| :----: | ----------- |
| result_tensors | ranked tensor of any type values |

linalg.batch_matmul_i8_i8_i32 (::mlir::linalg::BatchMatmulI8I8I32Op) 

Operands: 

| Operand | Description |
| :-----: | ----------- |
| inputs | shaped of any type values |
| outputs | shaped of any type values |

Results:

| Result | Description |
| :----: | ----------- |
| result_tensors | ranked tensor of any type values |

linalg.batch_matmul (::mlir::linalg::BatchMatmulOp) 

Performs a batched matrix multiplication of two 3D inputs.

Numeric casting is performed on the operands to the inner multiply, promoting them to the same data type as the accumulator/output.

Operands: 

| Operand | Description |
| :-----: | ----------- |
| inputs | shaped of any type values |
| outputs | shaped of any type values |

Results:

| Result | Description |
| :----: | ----------- |
| result_tensors | ranked tensor of any type values |

linalg.conv_3d (::mlir::linalg::ConvDHWOp) 

Operands: 

| Operand | Description |
| :-----: | ----------- |
| inputs | shaped of any type values |
| outputs | shaped of any type values |

Results:

| Result | Description |
| :----: | ----------- |
| result_tensors | ranked tensor of any type values |

linalg.conv_2d (::mlir::linalg::ConvHWOp) 

Operands: 

| Operand | Description |
| :-----: | ----------- |
| inputs | shaped of any type values |
| outputs | shaped of any type values |

Results:

| Result | Description |
| :----: | ----------- |
| result_tensors | ranked tensor of any type values |

linalg.conv_3d_input_ncdhw_filter_dhwcf (::mlir::linalg::ConvInputNCDHWFilterDHWCFOp) 

A 3-D convolution given NCDHW layout input and DHWCF layout filter.

Computes a 3-D convolution given a 5-D input and filter. The data layout of the input is NCDHW and the data layout of the filter is DHWCF.

The indexing maps for these three tensors contain 9 dimensions, following the order (N, F, D, H, W, KD, KH, KW, C).

Attributes: 

| Attribute | MLIR Type | Description |
| :-------: | :-------: | ----------- |
| dilations | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [3] |
| strides | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [3] |

Operands:

| Operand | Description |
| :-----: | ----------- |
| inputs | shaped of any type values |
| outputs | shaped of any type values |

Results:

| Result | Description |
| :----: | ----------- |
| result_tensors | ranked tensor of any type values |

linalg.conv_2d_input_nchw_filter_hwcf (::mlir::linalg::ConvInputNCHWFilterHWCFOp) 

A 2-D convolution given NCHW layout input and HWCF layout filter.

Computes a 2-D convolution given a 4-D input and filter. The data layout of the input is NCHW and the data layout of the filter is HWCF.

The indexing maps for these three tensors contain 7 dimensions, following the order (N, F, H, W, KH, KW, C).

Attributes: 

| Attribute | MLIR Type | Description |
| :-------: | :-------: | ----------- |
| dilations | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [2] |
| strides | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [2] |

Operands:

| Operand | Description |
| :-----: | ----------- |
| inputs | shaped of any type values |
| outputs | shaped of any type values |

Results:

| Result | Description |
| :----: | ----------- |
| result_tensors | ranked tensor of any type values |

linalg.conv_1d_input_ncw_filter_wcf (::mlir::linalg::ConvInputNCWFilterWCFOp) 

A 1-D convolution given NCW layout input and WCF layout filter.

Computes a 1-D convolution given a 3-D input and filter. The data layout of the input is NCW and the data layout of the filter is WCF.

The indexing maps for these three tensors contain 5 dimensions, following the order (N, F, W, KW, C).

Attributes: 

| Attribute | MLIR Type | Description |
| :-------: | :-------: | ----------- |
| dilations | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [1] |
| strides | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [1] |

Operands:

| Operand | Description |
| :-----: | ----------- |
| inputs | shaped of any type values |
| outputs | shaped of any type values |

Results:

| Result | Description |
| :----: | ----------- |
| result_tensors | ranked tensor of any type values |

linalg.conv_3d_input_ndhwc_filter_dhwcf (::mlir::linalg::ConvInputNDHWCFilterDHWCFOp) 

A 3-D convolution given NDHWC layout input and DHWCF layout filter.

Computes a 3-D convolution given a 5-D input and filter. The data layout of the input is NDHWC and the data layout of the filter is DHWCF.

The indexing maps for these three tensors contain 9 dimensions, following the order (N, D, H, W, F, KD, KH, KW, C).

Attributes: 

| Attribute | MLIR Type | Description |
| :-------: | :-------: | ----------- |
| dilations | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [3] |
| strides | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [3] |

Operands:

| Operand | Description |
| :-----: | ----------- |
| inputs | shaped of any type values |
| outputs | shaped of any type values |

Results:

| Result | Description |
| :----: | ----------- |
| result_tensors | ranked tensor of any type values |

linalg.conv_2d_input_nhwc_filter_hwcf (::mlir::linalg::ConvInputNHWCFilterHWCFOp) 

A 2-D convolution given NHWC layout input and HWCF layout filter.

Computes a 2-D convolution given a 4-D input and filter. The data layout of the input is NHWC and the data layout of the filter is HWCF.

The indexing maps for these three tensors contain 7 dimensions, following the order (N, H, W, F, KH, KW, C).

Attributes: 

Attribute | MLIR Type | Description
dilations | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [2]
strides | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [2]

Operands:

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.conv_1d_input_nwc_filter_wcf (::mlir::linalg::ConvInputNWCFilterWCFOp) 

A 1-D convolution given NWC layout input and WCF layout filter.

Computes a 1-D convolution given 3-D input and filter. The data layout of the input is NWC and the data layout of the filter is WCF.

The indexing maps for these three tensors contain 5 dimensions, following the order of (N, W, F, KW, C).

Attributes: 

Attribute | MLIR Type | Description
dilations | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [1]
strides | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [1]

Operands:

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.conv_3d_ncdhw (::mlir::linalg::ConvNCDHWOp) 

Operands: 

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.conv_2d_nchw (::mlir::linalg::ConvNCHWOp) 

Operands: 

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.conv_1d_ncw (::mlir::linalg::ConvNCWOp) 

Operands: 

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.conv_3d_ndhwc (::mlir::linalg::ConvNDHWCOp) 

Operands: 

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.conv_2d_nhwc (::mlir::linalg::ConvNHWCOp) 

Operands: 

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.conv_1d_nwc (::mlir::linalg::ConvNWCOp) 

Operands: 

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.conv (::mlir::linalg::ConvOp) 

Syntax:

operation ::= `linalg.conv` `(` operands `)` attr-dict `:` type(operands)

Generic n-D convolution as described in the TF documentation: https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/nn/convolution

  output[b, x[0], ..., x[N-1], k] =
  sum_{z[0], ..., z[N-1], q}
      filter[z[0], ..., z[N-1], q, k] *
      padded_input[b,
                   x[0] * strides[0] + dilation_rate[0] * z[0],
                   ...,
                   x[N-1] * strides[N-1] + dilation_rate[N-1] * z[N-1],
                   q]
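As a reading aid, the formula above can be spelled out as plain loops for the N = 1 case. This is a minimal Python sketch, not the op's implementation: the function name `conv_1d`, the nested-list layout, and the rule used to derive the output width (every window must stay inside the already padded input) are assumptions made for illustration.

```python
def conv_1d(padded_input, filt, stride=1, dilation=1):
    """Reference loops for the N = 1 case of the formula above.

    padded_input: nested lists indexed [b][x][q]
    filt:         nested lists indexed [z][q][k]
    """
    batch, width = len(padded_input), len(padded_input[0])
    window, in_ch, out_ch = len(filt), len(filt[0]), len(filt[0][0])
    # Assumption: only output positions whose window fits in the input exist.
    out_width = (width - dilation * (window - 1) - 1) // stride + 1
    output = [[[0] * out_ch for _ in range(out_width)] for _ in range(batch)]
    for b in range(batch):
        for x in range(out_width):
            for k in range(out_ch):
                for z in range(window):
                    for q in range(in_ch):
                        output[b][x][k] += (
                            filt[z][q][k]
                            * padded_input[b][x * stride + dilation * z][q]
                        )
    return output
```

The `z` and `q` loops are the reduction dimensions of the sum; `b`, `x` and `k` index the output.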

Attributes: 

Attribute | MLIR Type | Description
strides | ::mlir::ArrayAttr | 64-bit integer array attribute
dilations | ::mlir::ArrayAttr | 64-bit integer array attribute
padding | ::mlir::DenseIntElementsAttr | 64-bit signless integer elements attribute

Operands:

Operand | Description
filter | strided memref of any type values
input | strided memref of any type values
output | strided memref of any type values

linalg.conv_1d (::mlir::linalg::ConvWOp) 

Operands: 

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.copy (::mlir::linalg::CopyOp) 

Syntax:

operation ::= `linalg.copy` `(` $input `,` $output `)` attr-dict `:`
              type($input) `,` type($output)
              custom<CopyOpRegion>($region, ref(type($input)), ref(type($input)))

Copies the data in the input view into the output view.

Usage:

linalg.copy(%arg0, %arg1) : memref<?xf32, stride_specification>,
                            memref<?xf32, stride_specification>

One possible lowering to loop form is:

%0 = linalg.dim %arg0, 0 : index
scf.for %i0 = %c0 to %0 step %c1 {
  %1 = load %arg0[%i0] : memref<?xf32, stride_specification>
  store %1, %arg1[%i0] : memref<?xf32, stride_specification>
}

Optionally, the op can take inputPermutation and outputPermutation attributes to reorder the dimensions of the input and output views.

Usage:

linalg.copy(%arg0, %arg1) {inputPermutation = affine_map<(i, j, k) -> (i, k, j)>,
                           outputPermutation = affine_map<(i, j, k) -> (k, j, i)>} :
  memref<?x?x?xf32, stride_specification>,
  memref<?x?x?xf32, stride_specification>

One possible lowering to loop form is:

%0 = linalg.dim %arg0, 0
%1 = linalg.dim %arg0, 1
%2 = linalg.dim %arg0, 2
scf.for %i0 = %c0 to %{{.*}} step %c1 {
  scf.for %i1 = %c0 to %{{.*}} step %c1 {
    scf.for %i2 = %c0 to %{{.*}} step %c1 {
      %3 = load %arg0[%i0, %i2, %i1] :
              memref<?x?x?xf32, stride_specification>
      store %3, %arg1[%i2, %i1, %i0] :
              memref<?x?x?xf32, stride_specification>
    }
  }
}

The views are expected to be compatible for correctness but this is not enforced at the moment.
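The effect of the two permutations can be checked with a small Python model of the loop nest shown above. This is only a sketch under stated assumptions: `copy_permuted` is a hypothetical helper, views are modelled as 3-D nested lists, a permutation such as (i, j, k) -> (i, k, j) is encoded as the tuple (0, 2, 1), and the loop extents are recovered from the shape of the input view.

```python
from itertools import product

def copy_permuted(src, dst, input_perm=(0, 1, 2), output_perm=(0, 1, 2)):
    """Copy the 3-D nested list `src` into `dst`, element by element.

    input_perm / output_perm give, for each view dimension, the loop index
    that addresses it: (i, j, k) -> (i, k, j) is written (0, 2, 1).
    """
    # Recover the loop extents from the shape of the (permuted) input view.
    src_shape = (len(src), len(src[0]), len(src[0][0]))
    extents = [0, 0, 0]
    for view_dim, loop_dim in enumerate(input_perm):
        extents[loop_dim] = src_shape[view_dim]
    for idx in product(*(range(e) for e in extents)):
        s0, s1, s2 = (idx[d] for d in input_perm)
        d0, d1, d2 = (idx[d] for d in output_perm)
        dst[d0][d1][d2] = src[s0][s1][s2]
    return dst
```

With `input_perm=(0, 2, 1)` and `output_perm=(2, 1, 0)` the body reads `src[i][k][j]` and writes `dst[k][j][i]`, matching the load and store in the lowering above. Like the op itself, the sketch does not check that the two views have compatible shapes.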

Attributes: 

Attribute | MLIR Type | Description
inputPermutation | ::mlir::AffineMapAttr | AffineMap attribute
outputPermutation | ::mlir::AffineMapAttr | AffineMap attribute

Operands:

Operand | Description
input | strided memref of any type values
output | strided memref of any type values

linalg.depthwise_conv_2d_input_nhwc_filter_hwcf (::mlir::linalg::DepthwiseConvInputNHWCFilterHWCFOp) 

A general depth-wise 2-D convolution operation.

This operation performs depth-wise 2-D convolution over an input I and filter K and generates output O using the following computation:

  O(n, oh, ow, ci, co) = std_addf<kh, kw>(
      O(n, oh, ow, ci, co),
      std_mulf(I(n, oh * strides[0] + kh, ow * strides[1] + kw, ci),
               K(kh, kw, ci, co)));

where

  • I is a 4-D tensor with shape (N, IH, IW, CI).
  • K is a 4-D tensor with shape (KH, KW, CI, CO).
  • O is a 5-D tensor with shape (N, OH, OW, CI, CO).
  • strides is a 2-element vector attribute for window strides along the height/width dimension.

The indexing maps for these three tensors contain 7 dimensions, following the order of (N, OH, OW, CI, CO, KH, KW).

Note: this op supports any channel multiplier, which is CO. To map back to the same result as DepthwiseConvInputNHWCFilterHWCOp, you will have to create a reshape op which collapses CI and CO into one dimension.
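The computation above, and the reshape that collapses CI and CO, can be sketched in Python as follows. The helper names `depthwise_conv_2d` and `collapse_channels` are hypothetical, tensors are modelled as nested lists, and the output size `(IH - KH) // stride + 1` assumes an input with no implicit padding; none of this is taken from the op definition itself.

```python
def depthwise_conv_2d(inp, filt, strides=(1, 1)):
    """inp: [N][IH][IW][CI], filt: [KH][KW][CI][CO] -> out: [N][OH][OW][CI][CO]."""
    n_, ih, iw, ci = len(inp), len(inp[0]), len(inp[0][0]), len(inp[0][0][0])
    kh_, kw_, co = len(filt), len(filt[0]), len(filt[0][0][0])
    # Assumption: no implicit padding, so every window lies inside the input.
    oh = (ih - kh_) // strides[0] + 1
    ow = (iw - kw_) // strides[1] + 1
    out = [[[[[0] * co for _ in range(ci)] for _ in range(ow)]
            for _ in range(oh)] for _ in range(n_)]
    for n in range(n_):
        for y in range(oh):
            for x in range(ow):
                for c in range(ci):
                    for m in range(co):
                        for kh in range(kh_):          # reduction
                            for kw in range(kw_):      # reduction
                                out[n][y][x][c][m] += (
                                    inp[n][y * strides[0] + kh][x * strides[1] + kw][c]
                                    * filt[kh][kw][c][m])
    return out

def collapse_channels(out):
    """Collapse the trailing (CI, CO) dimensions into a single CI * CO dimension."""
    return [[[[v for per_ci in pixel for v in per_ci] for pixel in row]
             for row in image] for image in out]
```

Each input channel `c` is convolved independently with its own `CO` filters, which is why the output keeps separate `CI` and `CO` dimensions until they are collapsed.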

Attributes: 

Attribute | MLIR Type | Description
strides | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [2]

Operands:

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.depthwise_conv_2d_input_nhwc_filter_hwc (::mlir::linalg::DepthwiseConvInputNHWCFilterHWCOp) 

A depth-wise 2-D convolution operation.

This operation performs depth-wise 2-D convolution over an input I and filter K and generates output O using the following computation:

  O(n, oh, ow, c) = std_addf<kh, kw>(
      O(n, oh, ow, c),
      std_mulf(I(n, oh * strides[0] + kh, ow * strides[1] + kw, c),
               K(kh, kw, c)));

where

  • I is a 4-D tensor with shape (N, IH, IW, C).
  • K is a 3-D tensor with shape (KH, KW, C).
  • O is a 4-D tensor with shape (N, OH, OW, C).
  • strides is a 2-element vector attribute for window strides along the height/width dimension.

The indexing maps for these three tensors contain 6 dimensions, following the order of (N, OH, OW, C, KH, KW).

Note: this op only supports channel multiplier == 1.

Attributes: 

Attribute | MLIR Type | Description
strides | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [2]

Operands:

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.dot_i16_i16_i32 (::mlir::linalg::DotI16I16I32Op) 

Operands: 

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.dot_i32_i32_i32 (::mlir::linalg::DotI32I32I32Op) 

Operands: 

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.dot_i8_i8_i32 (::mlir::linalg::DotI8I8I32Op) 

Operands: 

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.dot (::mlir::linalg::DotOp) 

Performs a dot product of two vectors to a scalar result.

Numeric casting is performed on the operands to the inner multiply, promoting them to the same data type as the accumulator/output.
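Why the cast happens before the multiply can be seen with a small integer model. The sketch below is an illustration only: it assumes signed two's-complement operands (for example 8-bit inputs with a 32-bit accumulator), and the helper names `wrap_signed`, `dot_promoted` and `dot_unpromoted` are hypothetical.

```python
def wrap_signed(value, bits):
    """Interpret `value` modulo 2**bits as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def dot_promoted(lhs, rhs, acc_bits=32):
    """Promote each operand to the accumulator width, then multiply-add."""
    acc = 0
    for a, b in zip(lhs, rhs):
        product = wrap_signed(a, acc_bits) * wrap_signed(b, acc_bits)
        acc = wrap_signed(acc + product, acc_bits)
    return acc

def dot_unpromoted(lhs, rhs, in_bits=8, acc_bits=32):
    """Contrast case: multiply in the narrow input type, then widen."""
    acc = 0
    for a, b in zip(lhs, rhs):
        acc = wrap_signed(acc + wrap_signed(a * b, in_bits), acc_bits)
    return acc
```

For the 8-bit inputs `[100, 100]` and `[100, 100]`, promoting first gives 20000, whereas multiplying in 8 bits wraps each product to 16 and gives 32.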

Operands: 

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.fill (::mlir::linalg::FillOp) 

Syntax:

operation ::= `linalg.fill` `(` $output `,` $value `)` attr-dict `:`
              type($output) `,` type($value) (`->` type($result)^)?
              custom<FillOpRegion>($region, ref(type($output)), ref($value))

Operands: 

Operand | Description
output | shaped of any type values
value | complex-type or floating-point or signless integer or vector of any type values

Results:

Result | Description
result | ranked tensor of any type values

linalg.generic (::mlir::linalg::GenericOp) 

Generic Linalg op form where the key properties of the computation are specified as attributes. In pretty form, a linalg.generic op is written as:

linalg.generic #trait_attribute
    ins(%A, %B : memref<?x?xf32, stride_specification>,
                 memref<?x?xf32, stride_specification>)
    outs(%C : memref<?x?xf32, stride_specification>)
    attrs = {other-optional-attributes}
    {region}

Where #trait_attribute is an alias of a dictionary attribute containing:

  • doc [optional]: a documentation string
  • indexing_maps: a list of AffineMapAttr, one per input and output view. Each AffineMapAttr specifies the mapping between the loops and the indexing within the corresponding view.
  • library_call [optional]: a StringAttr containing the name of an external library function that the linalg.generic operation maps to. The external library is assumed to be dynamically linked and no strong compile-time guarantees are provided. In the absence of such a library call, linalg.generic will always lower to loops.
  • iterator_types: an ArrayAttr specifying the type of the enclosing loops. Each element of the list represents an iterator of one of the following types: parallel, reduction, window

Example: Defining a #matmul_trait attribute in MLIR can be done as follows:

#matmul_accesses = [
  affine_map<(m, n, k) -> (m, k)>,
  affine_map<(m, n, k) -> (k, n)>,
  affine_map<(m, n, k) -> (m, n)>
]
#matmul_trait = {
  doc = "C(m, n) += A(m, k) * B(k, n)",
  indexing_maps = #matmul_accesses,
  library_call = "linalg_matmul",
  iterator_types = ["parallel", "parallel", "reduction"]
}

And can be reused in multiple places as:

linalg.generic #matmul_trait
  ins(%A, %B : memref<?x?xf32, stride_specification>,
               memref<?x?xf32, stride_specification>)
  outs(%C : memref<?x?xf32, stride_specification>)
  attrs = {other-optional-attributes} {
  ^bb0(%a: f32, %b: f32, %c: f32) :
    %d = mulf %a, %b: f32
    %e = addf %c, %d: f32
    linalg.yield %e : f32
}

This may lower to either:

call @linalg_matmul(%A, %B, %C) :
  (memref<?x?xf32, stride_specification>,
   memref<?x?xf32, stride_specification>,
   memref<?x?xf32, stride_specification>)
  -> ()

or IR resembling:

scf.for %m = %c0 to %M step %c1 {
  scf.for %n = %c0 to %N step %c1 {
    scf.for %k = %c0 to %K step %c1 {
      %a = load %A[%m, %k] : memref<?x?xf32, stride_specification>
      %b = load %B[%k, %n] : memref<?x?xf32, stride_specification>
      %c = load %C[%m, %n] : memref<?x?xf32, stride_specification>
      %d = mulf %a, %b: f32
      %e = addf %c, %d: f32
      store %e, %C[%m, %n] : memref<?x?xf32, stride_specification>
    }
  }
}
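The relationship between the trait and the loop nest can be made concrete with a tiny interpreter. This is a minimal Python sketch rather than how MLIR lowers the op: `run_generic`, `load` and `store` are hypothetical helpers, operands are nested lists, indexing maps are written as Python lambdas, and the loop extents are passed in explicitly. It visits every point sequentially, so it ignores the parallel/reduction distinction in iterator_types.

```python
from itertools import product

def load(view, index):
    """Read view[i0][i1]... for a tuple of indices."""
    for i in index:
        view = view[i]
    return view

def store(view, index, value):
    """Write view[i0][i1]... = value for a tuple of indices."""
    for i in index[:-1]:
        view = view[i]
    view[index[-1]] = value

def run_generic(indexing_maps, extents, body, inputs, output):
    """Visit every point of the iteration domain and apply `body`.

    indexing_maps: one function per operand (inputs first, then the output)
    mapping the loop indices to a tuple of indices into that operand.
    """
    *input_maps, output_map = indexing_maps
    for point in product(*(range(e) for e in extents)):
        # One scalar per input, followed by the current output element.
        args = [load(view, index_map(*point))
                for view, index_map in zip(inputs, input_maps)]
        out_index = output_map(*point)
        args.append(load(output, out_index))
        store(output, out_index, body(*args))
    return output

# The same access pattern as #matmul_accesses, over loops (m, n, k).
matmul_accesses = [
    lambda m, n, k: (m, k),
    lambda m, n, k: (k, n),
    lambda m, n, k: (m, n),
]
```

Calling `run_generic(matmul_accesses, (M, N, K), lambda a, b, c: c + a * b, [A, B], C)` accumulates the matrix product of `A` and `B` into `C`; the lambda plays the role of the region and its return value the role of `linalg.yield`.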

To allow progressive lowering from the value world (a.k.a. tensor values) to the buffer world (a.k.a. memref values), a linalg.generic op allows mixing tensor and buffer operands and tensor results.

%D = linalg.generic #trait_attribute
  ins(%A, %B : tensor<?x?xf32>, memref<?x?xf32, stride_specification>)
  outs(%C : tensor<?x?xf32>)
  attrs = {other-optional-attributes}
  {region}
  -> (tensor<?x?xf32>)

Attributes: 

Attribute | MLIR Type | Description
indexing_maps | ::mlir::ArrayAttr | AffineMap array attribute
iterator_types | ::mlir::ArrayAttr | array attribute
doc | ::mlir::StringAttr | string attribute
library_call | ::mlir::StringAttr | string attribute

Operands:

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.indexed_generic (::mlir::linalg::IndexedGenericOp) 

Indexed Generic Linalg op form where the key properties of the computation are specified as attributes. In pretty form, a linalg.indexed_generic op is written as:

linalg.indexed_generic #trait_attribute
    ins(%A, %B : memref<?x?xf32, stride_specification>,
                 memref<?x?xf32, stride_specification>)
    outs(%C : memref<?x?xf32, stride_specification>)
    attrs = {other-optional-attributes}
    {region}

Where #trait_attribute is an alias of a dictionary attribute containing:

  • doc [optional]: a documentation string
  • indexing_maps: a list of AffineMapAttr, one per input and output view. Each AffineMapAttr specifies the mapping between the loops and the indexing within the corresponding view.
  • library_call [optional]: a StringAttr containing the name of an external library function that the linalg.indexed_generic operation maps to. The external library is assumed to be dynamically linked and no strong compile-time guarantees are provided. In the absence of such a library call, linalg.indexed_generic will always lower to loops.
  • iterator_types: an ArrayAttr specifying the type of the enclosing loops. Each element of the list represents an iterator of one of the following types: parallel, reduction, window

Example: Defining a #matmul_trait attribute in MLIR can be done as follows:

#matmul_accesses = [
  affine_map<(m, n, k) -> (m, k)>,
  affine_map<(m, n, k) -> (k, n)>,
  affine_map<(m, n, k) -> (m, n)>
]
#matmul_trait = {
  doc = "C(m, n) += A(m, k) * B(k, n)",
  indexing_maps = #matmul_accesses,
  library_call = "linalg_matmul",
  iterator_types = ["parallel", "parallel", "reduction"]
}

And can be reused in multiple places as:

linalg.indexed_generic #matmul_trait
  ins(%A, %B : memref<?x?xf32, stride_specification>,
               memref<?x?xf32, stride_specification>)
  outs(%C : memref<?x?xf32, stride_specification>) {
  ^bb0(%offset_m: index, %offset_n: index, %offset_k: index,
       %a: f32, %b: f32, %c: f32) :
    "some_optional_computation"(%offset_m, %offset_n, %offset_k)
    %d = mulf %a, %b: f32
    %e = addf %c, %d: f32
    linalg.yield %e : f32
}

This may lower to either:

call @linalg_matmul(%offset_m, %offset_n, %offset_k, %A, %B, %C) :
  (index, index, index,
   memref<?x?xf32, stride_specification>,
   memref<?x?xf32, stride_specification>,
   memref<?x?xf32, stride_specification>)
  -> ()

or IR resembling:

scf.for %m = %c0 to %M step %c1 {
  scf.for %n = %c0 to %N step %c1 {
    scf.for %k = %c0 to %K step %c1 {
      %a = load %A[%m, %k] : memref<?x?xf32, stride_specification>
      %b = load %B[%k, %n] : memref<?x?xf32, stride_specification>
      %c = load %C[%m, %n] : memref<?x?xf32, stride_specification>
      "some_optional_computation"(%m, %n, %k)
      %d = mulf %a, %b: f32
      %e = addf %c, %d: f32
      store %e, %C[%m, %n] : memref<?x?xf32, stride_specification>
    }
  }
}

To allow progressive lowering from the value world (a.k.a. tensor values) to the buffer world (a.k.a. memref values), a linalg.indexed_generic op allows mixing tensor and buffer operands and tensor results.

%D = linalg.indexed_generic #trait_attribute
  ins(%A, %B : tensor<?x?xf32>, memref<?x?xf32, stride_specification>)
  outs(%C : tensor<?x?xf32>)
  attrs = {other-optional-attributes}
  {region_with_index_arguments}
  -> (tensor<?x?xf32>)

Attributes: 

Attribute | MLIR Type | Description
indexing_maps | ::mlir::ArrayAttr | AffineMap array attribute
iterator_types | ::mlir::ArrayAttr | array attribute
doc | ::mlir::StringAttr | string attribute
library_call | ::mlir::StringAttr | string attribute

Operands:

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.index (::mlir::linalg::IndexOp) 

linalg index operation

Syntax:

operation ::= `linalg.index` $dim attr-dict `:` type($result)

The linalg.index operation returns the iteration index of the immediately enclosing linalg structured operation for the iteration dimension dim. The dim attribute specifies the position of the accessed dimension in the indexing map domain.

Example:

#map = affine_map<(i, j) -> (i, j)>
linalg.generic {indexing_maps = [#map, #map],
                iterator_types = ["parallel", "parallel"]}
  outs(%I, %J : memref<?x?xindex>, memref<?x?xindex>) {
  ^bb0(%arg0 : index, %arg1 : index):
  // Access the outer iteration dimension i
  %i = linalg.index 0 : index
  // Access the inner iteration dimension j
  %j = linalg.index 1 : index
  linalg.yield %i, %j : index, index
}

This may lower to IR resembling:

%0 = dim %I, %c0 : memref<?x?xindex>
%1 = dim %I, %c1 : memref<?x?xindex>
scf.for %i = %c0 to %0 step %c1 {
  scf.for %j = %c0 to %1 step %c1 {
    store %i, %I[%i, %j] : memref<?x?xindex>
    store %j, %J[%i, %j] : memref<?x?xindex>
  }
}
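The same lowering can be mirrored in a few lines of Python to make explicit what each `linalg.index` reads. This is a sketch only; `fill_with_indices` is a hypothetical helper, and the two memrefs are modelled as nested lists of the same shape.

```python
def fill_with_indices(rows, cols):
    """Build I and J so that I[i][j] == i and J[i][j] == j."""
    I = [[None] * cols for _ in range(rows)]
    J = [[None] * cols for _ in range(rows)]
    for i in range(rows):          # iteration dimension 0
        for j in range(cols):      # iteration dimension 1
            index = (i, j)         # the current point of the iteration domain
            # `linalg.index 0` reads index[0]; `linalg.index 1` reads index[1].
            I[i][j], J[i][j] = index[0], index[1]
    return I, J
```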

Attributes: 

Attribute | MLIR Type | Description
dim | ::mlir::IntegerAttr | 64-bit signless integer attribute whose minimum value is 0

Results:

Result | Description
result | index

linalg.init_tensor (::mlir::linalg::InitTensorOp) 

operation to define a tensor of a particular shape

Syntax:

operation ::= `linalg.init_tensor` custom<OperandsOrIntegersSizesList>($sizes, $static_sizes) attr-dict
              `:` type($result)

linalg.init_tensor is an operation that materializes a tensor of a given shape. The shape could be dynamic or static.

Attributes: 

Attribute | MLIR Type | Description
static_sizes | ::mlir::ArrayAttr | 64-bit integer array attribute

Operands:

Operand | Description
sizes | index

Results:

Result | Description
result | tensor of any type values

linalg.pad_tensor (::mlir::linalg::PadTensorOp) 

tensor pad operation

Syntax:

operation ::= `linalg.pad_tensor` $source `low` `` custom<OperandsOrIntegersSizesList>($low, $static_low)
              `high` `` custom<OperandsOrIntegersSizesList>($high, $static_high)
              $region attr-dict `:` type($source) `to` type($result)

linalg.pad_tensor is an operation that pads the source tensor with given low and high padding config.

The PadTensor operation supports the following arguments:

  • source: the “base” tensor on which to pad.
  • low: a list containing the padding along the start of each dimension.
  • high: a list containing the padding along the end of each dimension.

Each result tensor dimension is low + dim + high along that dimension. The number of elements of low and high must match the rank of the input tensor (which is also the rank of the output tensor). Each element can be either a constant or a dynamic value.

The region of the pad_tensor operation returns the value to use for the padding. The arguments of the region represent the index of the source being accessed. There should be as many arguments as the rank of the source tensor. The value yielded by the region is used as the value of the view at the given position.
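The shape rule and the role of the region can be illustrated with a 2-D Python model. This is a sketch under assumptions: `pad_tensor_2d` is a hypothetical helper, the tensor is a nested list, and the region is modelled as a callable that receives the coordinates of the result element being produced, which is one possible reading of the description above.

```python
def pad_tensor_2d(source, low, high, region):
    """Pad a 2-D nested list; `region(i, j)` supplies each padding value."""
    rows, cols = len(source), len(source[0])
    # Each result dimension is low + dim + high along that dimension.
    out_rows = low[0] + rows + high[0]
    out_cols = low[1] + cols + high[1]
    result = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            si, sj = i - low[0], j - low[1]
            if 0 <= si < rows and 0 <= sj < cols:
                row.append(source[si][sj])     # inside the source: copy
            else:
                row.append(region(i, j))       # outside: ask the region
        result.append(row)
    return result
```

For a 2x2 source with `low=(1, 2)` and `high=(2, 3)`, the result is 5x7 and the source occupies rows 1 to 2 and columns 2 to 3.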

Example 1:

  %pad_value = ... : f32
  %0 = linalg.pad_tensor %arg0 low[1, 2] high[2, 3] {
  ^bb0(%arg0 : index, %arg1 : index):
    linalg.yield %pad_value : f32
  } : tensor<?x?xf32> to tensor<?x?xf32>

Example 2:

  %pad_value = ... : f32
  %0 = linalg.pad_tensor %arg0 low[2, %arg1, 3, 3] high[3, 3, %arg1, 2] {
  ^bb0(%arg2: index, %arg3: index, %arg4: index, %arg5: index):
      linalg.yield %pad_value : f32
  } : tensor<1x2x2x?xf32> to tensor<6x?x?x?xf32>

Example 3:

  %pad_value = ... : f32
  %0 = linalg.pad_tensor %arg0 low[0, 0] high[%ub0, %ub1] {
  ^bb0(%arg1: index, %arg2: index):
    linalg.yield %pad_value : f32
  } : tensor<2x3xf32> to tensor<?x?xf32>

Attributes: 

Attribute | MLIR Type | Description
static_low | ::mlir::ArrayAttr | 64-bit integer array attribute
static_high | ::mlir::ArrayAttr | 64-bit integer array attribute

Operands:

Operand | Description
source | tensor of any type values
low | index
high | index

Results:

Result | Description
result | tensor of any type values

linalg.range (::mlir::linalg::RangeOp) 

Create a range type value, used to create views

Syntax:

operation ::= `linalg.range` $min `:` $max `:` $step attr-dict `:` type(results)

The linalg.range op creates a !linalg.range from 3 values of type index that represent the min, max and step values of the range. This type does not pass function boundaries at the moment.

Example:

%3 = linalg.range %0:%1:%2 : !linalg.range

Operands: 

Operand | Description
min | index
max | index
step | index

Results:

Result | Description
«unnamed» | range

linalg.reshape (::mlir::linalg::ReshapeOp) 

linalg.reshape produces a new view into the operand view

The linalg.reshape op produces a new view whose sizes are a reassociation of the original view. Depending on whether or not the reassociated MemRefType is contiguous, the resulting memref may require explicit alloc and copies.

A reassociation is defined as a continuous grouping of dimensions and is represented with an array of I64ArrayAttr attributes.

For now, it is assumed that either:

  1. a reassociation produces and consumes contiguous MemRefType or,
  2. the reshape op will be folded into its consumers (by changing the shape of the computations).

All other cases are undefined behavior and a reshape op may not lower to LLVM if it cannot be proven statically that it does not require alloc+copy.

A reshape may either collapse or expand dimensions, depending on the relationship between source and target memref ranks. The verification rule is that the reassociation maps are applied to the memref with the larger rank to obtain the memref with the smaller rank. In the case of a dimension expansion, the reassociation maps can be interpreted as inverse maps.

The result memref type of a reshape when dimensions are collapsed (operand memref type when dimensions are expanded) can be zero-ranked if the operand memref type (or the result memref type when dimensions are expanded) is statically shaped with all dimensions being unit extent. In such cases the reassociation map is empty.

Examples:

// Dimension collapse (i, j) -> i' and k -> k'
%1 = linalg.reshape %0 [[0, 1], [2]] :
  memref<?x?x?xf32, stride_spec> into memref<?x?xf32, stride_spec_2>
// Dimension expansion i -> (i', j') and (k) -> (k')
%1 = linalg.reshape %0 [[0, 1], [2]] :
  memref<?x?xf32, stride_spec> into memref<?x?x?xf32, stride_spec_2>
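The verification rule (apply the reassociation to the higher-rank shape to obtain the lower-rank one) can be sketched in Python. `collapsed_shape` is a hypothetical helper, not part of MLIR; shapes are lists of integers, and `None` stands in for a dynamic `?` dimension.

```python
from math import prod

def collapsed_shape(shape, reassociation):
    """Apply a reassociation to the higher-rank shape; None marks a dynamic size."""
    if not reassociation and all(d == 1 for d in shape):
        return []  # all-unit static shape collapsing to a zero-ranked result
    flat = [d for group in reassociation for d in group]
    # Groups must cover every dimension exactly once, in order (contiguous).
    if flat != list(range(len(shape))):
        raise ValueError("reassociation must be a contiguous grouping of all dims")
    result = []
    for group in reassociation:
        sizes = [shape[d] for d in group]
        result.append(None if None in sizes else prod(sizes))
    return result
```

For example, `collapsed_shape([2, 3, 4], [[0, 1], [2]])` groups the first two dimensions and yields `[6, 4]`. For an expansion, the same call is made on the result shape and the outcome compared against the operand shape.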

Attributes: 

Attribute | MLIR Type | Description
reassociation | ::mlir::ArrayAttr | Array of 64-bit integer array attributes

Operands:

Operand | Description
src | strided memref of any type values

Results:

Result | Description
result | strided memref of any type values

linalg.tensor_reshape (::mlir::linalg::TensorReshapeOp) 

linalg.tensor_reshape produces a new reshaped tensor.

The linalg.tensor_reshape op produces a new tensor whose sizes are a reassociation of the original src.

A reassociation is defined as a continuous grouping of dimensions and is represented with an array of I64ArrayAttr attributes.

A reshape may either collapse or expand dimensions, depending on the relationship between source and target tensor ranks. The verification rule is that the reassociation maps are applied to the tensor with the larger rank to obtain the tensor with the smaller rank. In the case of a dimension expansion, the reassociation maps can be interpreted as inverse maps.

The result tensor type of a reshape when dimensions are collapsed (operand tensor type when dimensions are expanded) can be zero-ranked if the operand tensor type (or the result tensor type when dimensions are expanded) is statically shaped with all dimensions being unit extent. In such cases the reassociation map is empty.

Examples:

// Dimension collapse (i, j) -> i' and k -> k'
%b = linalg.tensor_reshape %a [[0, 1], [2]]
    : tensor<?x?x?xf32> into tensor<?x?xf32>
// Dimension expansion i -> (i', j') and (k) -> (k')
%b = linalg.tensor_reshape %a [[0, 1], [2]]
    : tensor<?x?xf32> into tensor<?x?x?xf32>

Attributes: 

Attribute | MLIR Type | Description
reassociation | ::mlir::ArrayAttr | Array of 64-bit integer array attributes

Operands:

Operand | Description
src | tensor of any type values

Results:

Result | Description
result | tensor of any type values

linalg.tiled_loop (::mlir::linalg::TiledLoopOp) 

Linalg tiled loop operation

This is a loop-like operation with additional properties. The arguments also include the input and the output tensors or memrefs and the attributes to specify the iterator types.

Parsing TiledLoopOp will set all elements of the iterator_types attribute to “parallel” type, when it is absent from the custom format.

Tensor-based version:

The body region of the loop contains subtensor operations applied to every tensor argument of TiledLoopOp.

The body region must contain exactly one block that terminates with linalg.yield with the operands resulting from subtensor_insert operations.

Example:

%0 = linalg.tiled_loop (%i) = (%c0) to (%c24) step (%c4)
    ins(%lhs, %rhs : tensor<24x64xi8>, tensor<24x64xi8>)
    outs(%out : tensor<24x64xi8>)
    iterators("parallel") {
  %lhs_sub = subtensor %lhs[%i, 0] [%c4, %c64] [1, 1]
      : tensor<24x64xi8> to tensor<?x?xi8>
  %rhs_sub = subtensor %rhs[%i, 0] [%c4, %c64] [1, 1]
      : tensor<24x64xi8> to tensor<?x?xi8>
  %out_sub = subtensor %out[%i, 0] [%c4, %c64] [1, 1]
      : tensor<24x64xi8> to tensor<?x?xi8>

  %result_sub = linalg.generic ...

  %result = subtensor_insert %result_sub into %out[%i, 0][%c4, %c64][1, 1]
    : tensor<?x?xi8> into tensor<24x64xi8>
  linalg.yield %result : tensor<24x64xi8>
}
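The tensor-based pattern (slice each argument, compute on the tile, insert the tile back, yield the updated tensor) can be mimicked in Python. This is an illustrative sketch only: `tiled_loop` and `tile_body` are hypothetical names, tensors are nested lists tiled along rows, and clamping the last tile with `min` is an addition that the example above does not need because 24 is divisible by 4.

```python
def tiled_loop(lower, upper, step, lhs, rhs, out, tile_body):
    """Process row tiles [i, i + step) of 2-D nested lists, tensor style."""
    result = [row[:] for row in out]       # tensors are values: copy, do not mutate
    for i in range(lower, upper, step):
        size = min(step, upper - i)        # assumption: clamp a partial last tile
        lhs_sub = lhs[i:i + size]          # plays the role of `subtensor`
        rhs_sub = rhs[i:i + size]
        out_sub = result[i:i + size]
        result_sub = tile_body(lhs_sub, rhs_sub, out_sub)
        result[i:i + size] = result_sub    # plays the role of `subtensor_insert`
    return result
```

Here `tile_body` stands in for the `linalg.generic` applied to the tile, and the returned list corresponds to the value passed to `linalg.yield` after the last iteration.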

MemRef-based version:

The body region of the loop contains subview operations applied to every memref argument of TiledLoopOp.

The body region must contain exactly one block that terminates with linalg.yield with no operands.

Example:

linalg.tiled_loop (%i) = (%c0) to (%c24) step (%c4)
    ins(%lhs, %rhs : memref<24x64xi8>, memref<24x64xi8>)
    outs(%out : memref<24x64xi8>)
    iterators("parallel") {
  %lhs_sub = subview %lhs[%i, 0] [%c4, %c64] [1, 1]
      : memref<24x64xi8> to memref<?x?xi8>
  %rhs_sub = subview %rhs[%i, 0] [%c4, %c64] [1, 1]
      : memref<24x64xi8> to memref<?x?xi8>
  %out_sub = subview %out[%i, 0] [%c4, %c64] [1, 1]
      : memref<24x64xi8> to memref<?x?xi8>

  linalg.generic ...
  linalg.yield
}

Attributes: 

Attribute | MLIR Type | Description
iterator_types | ::mlir::ArrayAttr | array attribute

Operands:

Operand | Description
lowerBound | index
upperBound | index
step | index
inputs | ranked tensor of any type values or strided memref of any type values
outputs | ranked tensor of any type values or strided memref of any type values

Results:

Result | Description
results | ranked tensor of any type values

linalg.yield (::mlir::linalg::YieldOp) 

Linalg yield operation

linalg.yield is a special terminator operation for blocks inside regions in linalg generic ops. It returns values to the immediately enclosing linalg generic op.

Example:

linalg.yield %f0, %f1 : f32, f32

Operands: 

Operand | Description
values | any type

linalg.matmul_column_major (::mlir::linalg::MatmulColumnMajorOp) 

Operands: 

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.matmul_i16_i16_i32 (::mlir::linalg::MatmulI16I16I32Op) 

Operands: 

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.matmul_i32_i32_i32 (::mlir::linalg::MatmulI32I32I32Op) 

Operands: 

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.matmul_i8_i8_i32 (::mlir::linalg::MatmulI8I8I32Op) 

Operands: 

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.matmul (::mlir::linalg::MatmulOp) 

Performs a matrix multiplication of two 2D inputs.

Numeric casting is performed on the operands to the inner multiply, promoting them to the same data type as the accumulator/output.

Operands: 

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.matvec_i16_i16_i32 (::mlir::linalg::MatvecI16I16I32Op) 

Operands: 

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.matvec_i32_i32_i32 (::mlir::linalg::MatvecI32I32I32Op) 

Operands: 

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.matvec_i8_i8_i32 (::mlir::linalg::MatvecI8I8I32Op) 

Operands: 

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.matvec (::mlir::linalg::MatvecOp) 

Performs a matrix-vector multiplication.

Numeric casting is performed on the operands to the inner multiply, promoting them to the same data type as the accumulator/output.

Operands: 

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.pooling_max (::mlir::linalg::PoolingMaxOp) 

Syntax:

operation ::= `linalg.pooling_max` `(` operands `)` attr-dict `:` type(operands)

Takes max op as pooling operation, i.e., it samples the maximum value in the window.
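A 1-D Python sketch of "sampling the maximum value in the window" is given below. It is an illustration under assumptions, not the op's definition: `pooling_max_1d` is a hypothetical helper, the window size is passed as an integer rather than through a windowDims operand, and strides and dilations are assumed to act as they do in the linalg.conv formula (position `x * stride + dilation * k`), with no padding.

```python
def pooling_max_1d(values, window, stride=1, dilation=1):
    """Return the maximum of each (strided, dilated) window over `values`."""
    # Assumption: only windows that fit entirely inside the input are produced.
    out_len = (len(values) - dilation * (window - 1) - 1) // stride + 1
    return [max(values[x * stride + dilation * k] for k in range(window))
            for x in range(out_len)]
```

Replacing `max` with `min` gives the corresponding sketch for linalg.pooling_min.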

Attributes: 

Attribute | MLIR Type | Description
strides | ::mlir::ArrayAttr | 64-bit integer array attribute
dilations | ::mlir::ArrayAttr | 64-bit integer array attribute
padding | ::mlir::DenseIntElementsAttr | 64-bit signless integer elements attribute

Operands:

Operand | Description
input | strided memref of any type values
windowDims | strided memref of any type values
output | strided memref of any type values

linalg.pooling_min (::mlir::linalg::PoolingMinOp) 

Syntax:

operation ::= `linalg.pooling_min` `(` operands `)` attr-dict `:` type(operands)

Takes min op as pooling operation, i.e., it samples the minimum value in the window.

Attributes: 

Attribute | MLIR Type | Description
strides | ::mlir::ArrayAttr | 64-bit integer array attribute
dilations | ::mlir::ArrayAttr | 64-bit integer array attribute
padding | ::mlir::DenseIntElementsAttr | 64-bit signless integer elements attribute

Operands:

Operand | Description
input | strided memref of any type values
windowDims | strided memref of any type values
output | strided memref of any type values

linalg.pooling_nhwc_max (::mlir::linalg::PoolingNHWCMaxFOp) 

Attributes: 

Attribute | MLIR Type | Description
dilations | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [2]
strides | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [2]

Operands:

Operand | Description
inputs | shaped of any type values
outputs | shaped of any type values

Results:

Result | Description
result_tensors | ranked tensor of any type values

linalg.pooling_nhwc_i16_max (::mlir::linalg::PoolingNHWCMaxI16Op) 

Attributes: 

| Attribute | MLIR Type | Description |
| :-------: | :-------: | ----------- |
| `dilations` | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [2] |
| `strides` | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [2] |

Operands: 

| Operand | Description |
| :-----: | ----------- |
| `inputs` | shaped of any type values |
| `outputs` | shaped of any type values |

Results: 

| Result | Description |
| :----: | ----------- |
| `result_tensors` | ranked tensor of any type values |

linalg.pooling_nhwc_i32_max (::mlir::linalg::PoolingNHWCMaxI32Op) 

Attributes: 

| Attribute | MLIR Type | Description |
| :-------: | :-------: | ----------- |
| `dilations` | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [2] |
| `strides` | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [2] |

Operands: 

| Operand | Description |
| :-----: | ----------- |
| `inputs` | shaped of any type values |
| `outputs` | shaped of any type values |

Results: 

| Result | Description |
| :----: | ----------- |
| `result_tensors` | ranked tensor of any type values |

linalg.pooling_nhwc_i8_max (::mlir::linalg::PoolingNHWCMaxI8Op) 

Attributes: 

| Attribute | MLIR Type | Description |
| :-------: | :-------: | ----------- |
| `dilations` | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [2] |
| `strides` | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [2] |

Operands: 

| Operand | Description |
| :-----: | ----------- |
| `inputs` | shaped of any type values |
| `outputs` | shaped of any type values |

Results: 

| Result | Description |
| :----: | ----------- |
| `result_tensors` | ranked tensor of any type values |

linalg.pooling_nhwc_min (::mlir::linalg::PoolingNHWCMinFOp) 

Attributes: 

| Attribute | MLIR Type | Description |
| :-------: | :-------: | ----------- |
| `dilations` | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [2] |
| `strides` | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [2] |

Operands: 

| Operand | Description |
| :-----: | ----------- |
| `inputs` | shaped of any type values |
| `outputs` | shaped of any type values |

Results: 

| Result | Description |
| :----: | ----------- |
| `result_tensors` | ranked tensor of any type values |

linalg.pooling_nhwc_sum (::mlir::linalg::PoolingNHWCSumFOp) 

Attributes: 

| Attribute | MLIR Type | Description |
| :-------: | :-------: | ----------- |
| `dilations` | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [2] |
| `strides` | ::mlir::DenseIntElementsAttr | 64-bit signless int elements attribute of shape [2] |

Operands: 

| Operand | Description |
| :-----: | ----------- |
| `inputs` | shaped of any type values |
| `outputs` | shaped of any type values |

Results: 

| Result | Description |
| :----: | ----------- |
| `result_tensors` | ranked tensor of any type values |

linalg.pooling_sum (::mlir::linalg::PoolingSumOp) 

Syntax:

operation ::= `linalg.pooling_sum` `(` operands `)` attr-dict `:` type(operands)

Uses add as the pooling operation, i.e., it accumulates the values in each window.

Attributes: 

| Attribute | MLIR Type | Description |
| :-------: | :-------: | ----------- |
| `strides` | ::mlir::ArrayAttr | 64-bit integer array attribute |
| `dilations` | ::mlir::ArrayAttr | 64-bit integer array attribute |
| `padding` | ::mlir::DenseIntElementsAttr | 64-bit signless integer elements attribute |

Operands: 

| Operand | Description |
| :-----: | ----------- |
| `input` | strided memref of any type values |
| `windowDims` | strided memref of any type values |
| `output` | strided memref of any type values |

linalg.vecmat_i16_i16_i32 (::mlir::linalg::VecmatI16I16I32Op) 

Operands: 

| Operand | Description |
| :-----: | ----------- |
| `inputs` | shaped of any type values |
| `outputs` | shaped of any type values |

Results: 

| Result | Description |
| :----: | ----------- |
| `result_tensors` | ranked tensor of any type values |

linalg.vecmat_i32_i32_i32 (::mlir::linalg::VecmatI32I32I32Op) 

Operands: 

| Operand | Description |
| :-----: | ----------- |
| `inputs` | shaped of any type values |
| `outputs` | shaped of any type values |

Results: 

| Result | Description |
| :----: | ----------- |
| `result_tensors` | ranked tensor of any type values |

linalg.vecmat_i8_i8_i32 (::mlir::linalg::VecmatI8I8I32Op) 

Operands: 

| Operand | Description |
| :-----: | ----------- |
| `inputs` | shaped of any type values |
| `outputs` | shaped of any type values |

Results: 

| Result | Description |
| :----: | ----------- |
| `result_tensors` | ranked tensor of any type values |

linalg.vecmat (::mlir::linalg::VecmatOp) 

Performs a vector-matrix multiplication.

Numeric casting is performed on the operands to the inner multiply, promoting them to the same data type as the accumulator/output.

Operands: 

| Operand | Description |
| :-----: | ----------- |
| `inputs` | shaped of any type values |
| `outputs` | shaped of any type values |

Results: 

| Result | Description |
| :----: | ----------- |
| `result_tensors` | ranked tensor of any type values |
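The effect of promoting the multiply operands to the accumulator type can be sketched in Python. This is only a model under stated assumptions: the helper names `to_signed` and `vecmat` are hypothetical, fixed-width integers are emulated with two's-complement wrapping (an assumption about overflow behavior that the description above does not spell out), and the accumulator is taken to be 32 bits wide.

```python
def to_signed(value, bits):
    """Wrap an integer to a signed two's-complement value of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def vecmat(x, a, init, acc_bits=32):
    """Illustrative model: out[n] = init[n] + sum over m of x[m] * a[m][n]."""
    out = list(init)
    for m, x_m in enumerate(x):
        for n in range(len(out)):
            # Cast both operands to the accumulator width before multiplying,
            # so the product is formed in the wider type.
            product = to_signed(x_m, acc_bits) * to_signed(a[m][n], acc_bits)
            out[n] = to_signed(out[n] + product, acc_bits)
    return out


print(vecmat([100, 100], [[100, 1], [100, 2]], init=[0, 0]))  # [20000, 300]
```

In this model, multiplying in the 8-bit input width instead would lose information: `to_signed(100 * 100, 8)` evaluates to 16, whereas the promoted product is 10000.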