# 'linalg' Dialect

## Rationale

Linalg is designed to solve the High-level Hierarchical Optimization
(HHO box) in MLIR and to interoperate nicely within a
*Mixture Of Expert Compilers* environment (i.e. the *CGSel* box).

The Rationale Document goes into significantly more design and architectural decision details.

## Set of Key Transformations

The following key transformations have been central to driving the design of
Linalg. They are all implemented in terms of the properties of the
`linalg.generic`

OpInterface and avoid the pitfall of relying on hardcoded
one-off op knowledge.

The textual form description of these transformations is left for future work. Still, it is useful to at least the key transformations that are performed on the Linalg IR and that have influenced its design:

- Progressive Buffer Allocation.
- Parametric Tiling.
- Promotion to Temporary Buffer in Fast Memory.
- Tiled Producer-Consumer Fusion with Parametric Tile-And-Fuse.
- Map to Parallel and Reduction Loops and Hardware.
- Vectorization: Rewrite in Vector Form.
- Lower to Loops (Affine, Generic and Parallel).
- Lower to Library Calls or Special Instructions, Intrinsics or ISA.
- Partially Lower to Iterations Over a Finer-Grained Linalg Op.

## High-Level Description of Linalg Ops

Linalg takes at least some inspiration from all previously
listed prior
art
. The design enables the definition of * CustomOps* with
generic properties that enable
key transformations
,
including lowering to scalar load/store and other operations or to external
library calls and intrinsics.

These ops can have * either tensor or buffer operands*.

### Payload-Carrying Ops

Linalg defines two payload carrying operations that implement the
structured ops
abstraction on tensors and buffers. This is architected as two generic operations
`linalg.generic`

(resp. `linalg.indexed_generic`

) that can express custom
operations with *index-free semantics* (resp. *indexing semantics*).
The properties of these generic ops are the result of applying the
guiding principles described in the
Rationale Document
.
They are listed next, with a brief example and discussion for each.

#### Property 1: Input and Output Operands Define The Iteration Space

A `linalg.generic`

op fully *derives* the specification of its iteration space
from its operands.
The property enforces that a localized IR element (the op) *has* all the information
needed to synthesize the control-flow required to iterate over its operands,
according to their type. This notion of IR localization bears some resemblance
to
URUK
.

Consider the following, partially specified, `linalg.generic`

example:

```
#attrs = {args_in: 1, args_out: 1}
func @example(%A: memref<?xf32, layout1>,
%B: memref<?xvector<4xf32, layout2>>) {
linalg.generic #attrs (%2, %3): memref<?xf32, layout1>,
memref<?xvector<4xf32, layout2>>
return
}
```

The property “*Input and Output Operands Define The Iteration Space*” is
materialized by a lowering into a form that will resemble:

```
func @example(%A: memref<?xf32, layout1>,
%B: memref<?xvector<4xf32, layout2>>) {
%M = "dim" %A, 0: index
%N = "dim" %B, 0: index
%eq = eq %M, %N: i1 // iteration space is consistent with data
assert(%eq): (i1) -> ()
for %i = 0 to %M {
%a = load %A[%i]: memref<?xf32, layout1>
%b = load %B[%i]: memref<?xvector<4xf32>, layout2>
// compute arg types match elemental tensor types
%c = "some_compute"(%a, %b): (f32, vector<4xf32>) -> (vector<4xf32>)
store %c, %B[%i]: memref<?xvector<4xf32>, layout2>
}
return
}
```

The property participates in simplifying analyses and transformations. For
instance, it guarantees no out-of bounds access can occur by construction
(assuming dynamic operand dimensions agree with each other, which is the
purpose of the `assert`

runtime check).

Before lowering to loop form, loop induction variables and iterators are *not yet
materialized*. This is a necessary property if we want an abstraction that
works on both tensor values and buffers because * values don’t escape
loops/nesting*.

The main implications are that:

- The semantics of the ops are
*restricted to operate on structured data types*, on which we can define an iterator. - This does not model arbitrary code with side-effects.

We do not think these are serious limitations in practice because MLIR is all about mixing different levels of abstractions in the same IR. As long as Linalg can progressively lower to the next level of abstraction, it can also be just bypassed for things that do not fit.

At the same time, conditioning op semantics on structured data types is a very promising path towards extensibility to non-dense tensors as experience with LIFT abstractions for sparse and position-dependent arrays , as well as TACO , has shown.

#### Property 2: Reversible Mappings Between Control and Data Structures

A `linalg.generic`

*defines* the mapping between the iteration space (i.e. the
loops) and the data.

Consider the following, partially specified, `linalg.generic`

example:

```
#indexing_maps = {
(i, j) -> (j, i),
(i, j) -> (j)
}
#attrs = {args_in: 1, args_out: 1, indexings: indexing_maps}
func @example(%A: memref<?xf32, layout1>,
%B: memref<?xvector<4xf32, layout2>>) {
linalg.generic #attrs (%A, %B): memref<?xf32, layout1>,
memref<?xvector<4xf32, layout2>>
return
}
```

The property “*Reversible Mappings Between Control and Data Structures*” is
materialized by a lowering into a form that will resemble:

```
#attrs = {args_in: 1, args_out: 1, indexings: indexing_maps}
func @example(%A: memref<?xf32, layout1>,
%B: memref<?xvector<4xf32, layout2>>) {
// loop bounds determined from data sizes by “inverting the map”
%J = "dim" %2, 0: index
%I = "dim" %2, 1: index
%J2 = "dim" %3, 0: index
// iteration space is consistent with data + mapping inference
%eq = "eq" %J, %J2: i1
"assert" %eq: (i1) -> ()
for %i = 0 to %I { // loop order is fully defined by indexing maps
for %j = 0 to %J { // arbitrary permutations are possible
%a = "load" %2, %j, %i: memref<8x?xf32>
%b = "load" %3, %j: memref<?xvector<4xf32>>
%c = "some_compute"(%a, %b): (f32, vector<4xf32>) -> (vector<4xf32>)
"store" %c, %3, %j: memref<?xvector<4xf32>>
}
}
return
}
```

This mapping needs to be reversible because we want to be able to go back and forth between the two and answer questions such as:

- Given a subset of the iteration space, what subset of data does it read and write?
- Given a subset of data read or written, what subset of the iteration space is responsible for this read or write?

Answering these `2`

questions is one of the main analyses that Linalg uses to
implement transformations such as tiling, tiled producer-consumer fusion, and
promotion to temporary buffers in fast memory.

In the current implementation, `linalg.generic`

uses a list of
AffineMaps
.
This is a pragmatic short-term solution, but in the longer term note that
this property could be even evaluated dynamically, similarly to
inspector-executor algorithms.

#### Property 3: The Type Of Iterators is Defined Explicitly

A `linalg.generic`

op fully *declares* the type of its iterators. This
information is used in transformations.

These properties are derived from established practice in the field and mirror
the properties from Ken Kennedy’s
Optimizing Compilers for Modern Architectures
.
The key idea of legality of loop transformations expressed by Kennedy is
that * the lexicographic order of all dependence vectors must be
preserved*.

This can be better captured directly at the loop level thanks to specific
iterator types, among which:
*parallel*, *reduction*, *partition*, *permutable/monotonic*, *sequential*,
*dependence distance*, …

These types are traditionally the result of complex dependence analyses and
have been referred to as “*bands*” in the polyhedral community (e.g. *parallel
bands*, *permutable bands*, etc, in
ISL
schedule tree
parlance).

Specifying the information declaratively in a `linalg.generic`

allows
conveying properties that may be hard (or even impossible) to derive from
lower-level information. These properties can be brought all the way to the
moment when they are useful for transformations, used and then discarded.

Additionally, these properties may also be viewed as a contract that the frontend/user guarantees and that the compiler may take advantage of. The common example is the use of data-dependent reduction semantics for specifying histogram computations. If the frontend has additional knowledge that proper atomic operations are available, it may be better to specify parallel semantics and use the special atomic in the computation region.

At this time, Linalg only has an explicit use for *parallel* and *reduction*
loops but previous experience shows that the abstraction generalizes.

#### Property 4: The Compute Payload is Specified With a Region

A `linalg.generic`

op has a compute payload that is fully generic thanks to
the use of
Regions
.

The region takes as arguments the scalar elemental types of the tensor or
buffer operands of the `linalg.generic`

. For flexibility and ability to match
library calls, additional special values may be passed. For instance, a
`linalg.fill`

operation takes a buffer and an additional scalar value.

At this time there are no additional restrictions to the region semantics. This is meant to allow the exploration of various design tradeoffs at the intersection of regions and iterator types. In particular, the frontend is responsible for the semantics of iterator types to correspond to the operations inside the region: the region can capture buffers arbitrarily and write into them. If this conflicts with some parallel iterator requirement, this is undefined behavior.

Concretely, consider the following, partially specified, `linalg.generic`

example:

```
#indexing_maps = {
(i, j) -> (i, j),
(i, j) -> (i, j)
}
#attrs = {args_in: 2, args_out: 1, indexings: #indexing_maps}
func @example(%A: memref<?x?xf32>, %B: memref<?x?xf32>, %C: memref<?x?xf32>) {
linalg.generic #attrs (%A, %B, %C) {
^bb0(%a: f32, %b: f32):
%c = addf %a, %b : f32
return %c : f32
}: memref<?x?xf32>, memref<?x?xf32>, memref<?x?xf32>
return
}
```

The property “*The Compute Payload is Specified With a Region*” is
materialized by a lowering into a form that will resemble:

```
func @example(%A: memref<?x?xf32>, %B: memref<?x?xf32>, %C: memref<?x?xf32>) {
%M = dim %A, 0: index
%N = dim %B, 1: index
for %i = 0 to %M {
for %j = 0 to %N {
%a = load %A[%i, %j]: memref<?x?xf32>
%b = load %B[%i, %j]: memref<?x?xf32>>
%c = addf %a, %b : f32
store %c, %C[%i, %j]: memref<?x?xf32>
}
}
return
}
```

In the process of lowering to loops and lower-level constructs, similar requirements are encountered, as are discussed in the inlined call op proposal . We expect to be able to reuse the common lower-level infrastructure provided it evolves to support both region arguments and captures.

#### Property 5: May Map To an External Library Call

A `linalg.generic`

op may map to an external library call by specifying a
`SymbolAttr`

. At this level of abstraction, the important glue is the ability
to perform transformations that preserve the structure necessary to * call
the external library after different transformations have been applied*.

This involves considerations related to preservation of op semantics and integration at the ABI level. Regardless of whether one wants to use external library calls or a custom ISA, the problem for codegen is similar: preservation of a fixed granularity.

Consider the following, partially specified, `linalg.generic`

example:

```
#fun_attr = "pointwise_add"
#indexing_maps = {
(i, j) -> (i, j),
(i, j) -> (i, j)
}
#attrs = {args_in: 2, args_out: 1, indexings: #indexing_maps, fun: #fun_attr}
func @example(%A: memref<?x?xf32>, %B: memref<?x?xf32>, %C: memref<?x?xf32>) {
linalg.generic #attrs (%A, %B, %C) {
^bb0(%a: f32, %b: f32):
%c = addf %a, %b : f32
return %c : f32
}: memref<?x?xf32>, memref<?x?xf32>, memref<?x?xf32>
return
}
```

The property “*Map To an External Library Call*” is
materialized by a lowering into a form that will resemble:

```
func @pointwise_add_sxsxf32_sxsxf32(memref<?x?xf32>, memref<?x?xf32>, memref<?x?xf32>) -> ()
func @example(%A: memref<?x?xf32>, %B: memref<?x?xf32>, %C: memref<?x?xf32>) {
call @pointwise_add_sxsxf32_sxsxf32 (%A, %B, %C):
(memref<?x?xf32>, memref<?x?xf32>, memref<?x?xf32>) -> ()
return
}
```

Which, after lowering to LLVM resembles:

```
func @pointwise_add_sxsxf32_sxsxf32(!llvm<"{ float*, i64, [2 x i64], [3 x i64] }*">,
!llvm<"{ float*, i64, [2 x i64], [3 x i64] }*">,
!llvm<"{ float*, i64, [2 x i64], [3 x i64] }*">) -> ()
func @example(%A: !llvm<"{ float*, i64, [2 x i64], [3 x i64] }*">,
%B: !llvm<"{ float*, i64, [2 x i64], [3 x i64] }*">,
%C: !llvm<"{ float*, i64, [2 x i64], [3 x i64] }*">) {
llvm.call @pointwise_add_sxsxf32_sxsxf32 (%A, %B, %C):
(!llvm<"{ float*, i64, [2 x i64], [3 x i64] }*">...) -> ()
return
}
```

##### Convention For External Library Interoperability

The `linalg`

dialect adopts a convention that is similar to `BLAS`

when
offloading operations to fast library implementations: pass a non-owning
pointer to input and output data with additional metadata. This convention
is also found in libraries such as `MKL`

, `OpenBLAS`

, `BLIS`

, `cuBLAS`

,
`cuDNN`

, etc.. and more generally at interface points across language
boundaries (e.g. C++ / Python).

Generally, `linalg`

passes non-owning pointers to View data structures
to pre-compiled library calls linked externally.

There is an ongoing discussion on the topic of extending interoperability in the presence of key attributes.

#### Property 6: Perfectly Nested Writes To The Whole Output Operands

Perfectly nested loops form a particularly important class of structure that enables key loop transformations such as tiling and mapping to library calls. Unfortunately, this type of structure is easily broken by transformations such as partial loop fusion. Tiling and mapping to library calls become more challenging, or even infeasible. Linalg ops adopt perfect-nestedness as a first-class property: the structure cannot be broken and is transported in the IR by construction.

A `linalg.generic`

op represents a perfectly nested loop nest that writes the
entire memory region. This is a structural constraint across regions and
loops that has proven to be key in simplifying transformations.

One particular point to mention is that converting imperfectly nested code into perfectly nested code can often be done with enough loop distribution and embedding of conditionals down to the innermost loop level.

Previous experience with Tensor Comprehensions gave us the intuition that forcing innermost control-flow nesting is a lot like writing data-parallel code with arrays of boolean values and predication. This type of trick has also been used before in polyhedral compilers to convert non-affine control into affine compute dependencies.

While it may be possible to automate such rewrites from generic IR,
`linalg.generic`

just forces the semantics for now.

The key implication is that this conversion to deep predication needs to be
undone once we are done with Linalg transformations.
After iterators and induction variables are materialized (i.e. after lowering
out of `linalg.generic`

occurred), the overall performance will be greatly
influenced by the quality of canonicalizations, foldings and *Loop Independent
Code Motion* (LICM).

In the grander scheme, the reliance on late LICM was deemed a necessary risk.

#### Putting it Together

As it stands, the six properties above define the semantics of a
`linalg.generic`

op. It is an open question whether all of these semantics are
strictly necessary in practice and whether some should or could be derived
automatically while still maintaining the
core guiding
principles
.

For the time being, we have settled on the combination of these properties because of empirical evidence building and working on multiple high-level compilers. As we lay those down and engage more with the community, we expect multiple rounds of discussions and design changes to the original architecture.

### Data Representation: Views

The current implementation uses the
Strided MemRef (a.k.a View)
abstraction. The name *View* is used interchangeably in `linalg`

to signify
*Strided MemRef*.
In the future we expect to use other structured data types and
support ragged, mixed-sparse and other types. We expect to draw on the
experience from existing LIFT abstractions for
sparse
and
position-dependent
arrays
.

### Metadata Ops

A set of ops that manipulate metadata but do not move memory. These ops take
`view`

operands + extra attributes and return new `view`

s. The returned
`view`

s generally alias the operand `view`

. At the moment the existing ops
are:

```
* `std.view`,
* `std.subview`,
* `linalg.range`,
* `linalg.slice`,
* `linalg.transpose`.
* `linalg.reshape`,
```

Future ops are added on a per-need basis but should include:

```
* `linalg.tile`,
* `linalg.intersection`,
* `linalg.convex_union`,
* `linalg.difference` (would need to work on a list of views).
```

These additional operations correspond to abstractions that have been known to work in the field of large-scale distributed stencil computations.

In a longer-term future, the abstractions from Legion data-centric programming model seem generally appealing.

### Named Payload-Carrying Ops

Additionally, `linalg`

provides a small subset of commonly named operations:

```
* `linalg.copy`,
* `linalg.fill`,
* `linalg.dot`,
* `linalg.matmul`,
* `linalg.conv`.
```

These named operations adhere to the `linalg.generic`

op interface. Work is in
progress to define declarative mechanisms to automatically generate named ops
from a description in terms of only the generic op interface.

This is the main reason there are only a small number of ops today: we expect them to be auto-generated from Tablegen soon.

## Open Issues and Design Alternatives

Multiple open issues and design alternatives are in flight and it is time to lay them out for the community to discuss and pick apart:

- Should
`linalg.generic`

support nesting? - Should
`linalg.generic`

regions take views or only scalars? - Should we try to solve automatic differentiation at this level of abstraction?
- Are all the six properties really necessary?
- Is this relying too much on declarative specification and would we be better off relying more on analyses?
- Is this general enough for the community’s needs? If not how should this be extended, if at all? …

These key questions (and much more) should be really thought of in the general context of MLIR in which different levels of IR interoperate seamlessly. In practice, it is not necessary (or beneficial) to try and solve all problems in the same IR.

## Operations

`linalg.conv`

(linalg::ConvOp)

Syntax:

```
operation ::= `linalg.conv` `(` operands `)` attr-dict `:` type(operands)
```

Generic n-D convolution as described in the TF documentation: https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/nn/convolution

```
output[b, x[0], ..., x[N-1], k] =
sum_{z[0], ..., z[N-1], q}
filter[z[0], ..., z[N-1], q, k] *
padded_input[b,
x[0] * strides[0] + dilation_rate[0] * z[0],
...,
x[N-1] * strides[N-1] + dilation_rate[N-1] * z[N-1],
q]
```

#### Attributes:

Attribute | MLIR Type | Description |
---|---|---|

`strides` | ArrayAttr | 64-bit integer array attribute |

`dilations` | ArrayAttr | 64-bit integer array attribute |

`padding` | DenseIntElementsAttr | 64-bit signless integer elements attribute |

#### Operands:

Operand | Description |
---|---|

`filter` | strided memref of any type values |

`input` | strided memref of any type values |

`output` | strided memref of any type values |

`linalg.copy`

(linalg::CopyOp)

Syntax:

```
operation ::= `linalg.copy` `(` operands `)` attr-dict `:` type(operands)
```

Copies the data in the input view into the output view.

Usage:

```
linalg.copy(%arg0, %arg1) : memref<?xf32, stride_specification>,
memref<?xf32, stride_specification>
```

One possible lowering to loop form is:

```
%0 = linalg.dim %arg0, 0 : index
loop.for %i0 = %c0 to %0 step %c1 {
%1 = load %arg0[%i0] : memref<?xf32, stride_specification>
store %1, %arg1[%i0] : memref<?xf32, stride_specification>
}
```

Optionally, can take `input_permutation`

and `output_permutation`

attributes
to reorder the dimensions of the input and output views.

Usage:

```
linalg.copy(%arg0, %arg1) {inputPermutation : (i, j, k) -> (i, k, j),
outputPermutation : (i, j, k) -> (k, j, i)} :
memref<?x?x?xf32, stride_specification>,
memref<?x?x?xf32, stride_specification>
```

One possible lowering to loop form is:

```
%0 = linalg.dim %arg0, 0
%1 = linalg.dim %arg0, 1
%2 = linalg.dim %arg0, 2
loop.for %i0 = %c0 to %{{.*}} step %c1 {
loop.for %i1 = %c0 to %{{.*}} step %c1 {
loop.for %i2 = %c0 to %{{.*}} step %c1 {
%3 = load %arg0[%i0, %i2, %i1] :
memref<?x?x?xf32, stride_specification>
store %3, %arg1[%i2, %i1, %i0] :
memref<?x?x?xf32, stride_specification>
```

The views are expected to be compatible for correctness but this is not enforced at the moment.

#### Attributes:

Attribute | MLIR Type | Description |
---|---|---|

`inputPermutation` | AffineMapAttr | AffineMap attribute |

`outputPermutation` | AffineMapAttr | AffineMap attribute |

#### Operands:

Operand | Description |
---|---|

`input` | strided memref of any type values |

`output` | strided memref of any type values |

`linalg.dot`

(linalg::DotOp)

Syntax:

```
operation ::= `linalg.dot` `(` operands `)` attr-dict `:` type(operands)
```

#### Operands:

Operand | Description |
---|---|

«unnamed» | strided memref of any type values of rank 1 |

«unnamed» | strided memref of any type values of rank 1 |

«unnamed» | strided memref of any type values of rank 0 |

`linalg.fill`

(linalg::FillOp)

Syntax:

```
operation ::= `linalg.fill` `(` operands `)` attr-dict `:` type(operands)
```

#### Operands:

Operand | Description |
---|---|

`output` | strided memref of any type values |

`value` | floating-point or signless integer or vector of any type values |

`linalg.generic`

(linalg::GenericOp)

Generic Linalg op form where the key properties of the computation are specified as attributes. In pretty form, a linalg.generic op is written as:

```
linalg.generic #trait_attribute %A, %B, %C {other-attributes} :
memref<?x?xf32, stride_specification>,
memref<?x?xf32, stride_specification>,
memref<?x?xf32, stride_specification>
```

Where #trait_attributes is an alias of a dictionary attribute containing:

- args_in: an I64Attr representing the number of input (readonly) views
- args_out: an I64Attr representing the number of output (readwrite) views
- doc [optional]: a documentation string
- fun: a FlatSymbolRefAttr that must resolve to an existing function
symbol. To support inplace updates in a generic fashion, the signature
of the function must be:
`fun([input views element types], [output views element types]) -> ([output views element types])`

- indexing_maps: a list of AffineMapAttr, one AffineMapAttr per each input and output view. Such AffineMapAttr specifies the mapping between the loops and the indexing within each view.
- library_call [optional]: a StringAttr containing the name of an external library function that the linalg.generic operation maps to. The external library is assumed to be dynamically linked and no strong compile-time guarantees are provided. In the absence of such a library call, linalg.generic will always lower to loops.
- iterator_types: an ArrayAttr specifying the type of the enclosing loops. Each element of the list represents and iterator of one of the following types: parallel, reduction, window

Example: Defining a #matmul_trait attribute in MLIR can be done as follows:

```
func @fma(%a: f32, %b: f32, %c: f32) -> f32 {
%d = mulf %a, %b: f32
%e = addf %c, %d: f32
return %e: f32
}
#matmul_accesses = [
(m, n, k) -> (m, k),
(m, n, k) -> (k, n),
(m, n, k) -> (m, n)
]
#matmul_trait = {
doc = "C(m, n) += A(m, k) * B(k, n)",
fun = @fma,
indexing_maps = #matmul_accesses,
library_call = "linalg_matmul",
n_views = [2, 1],
iterator_types = ["parallel", "parallel", "reduction"]
}
```

And can be reused in multiple places as:

```
linalg.generic #matmul_trait %A, %B, %C [other-attributes] :
memref<?x?xf32, stride_specification>,
memref<?x?xf32, stride_specification>,
memref<?x?xf32, stride_specification>
```

This may lower to either:

```
call @linalg_matmul(%A, %B, %C) :
(memref<?x?xf32, stride_specification>,
memref<?x?xf32, stride_specification>,
memref<?x?xf32, stride_specification>)
-> ()
```

or IR resembling:

```
loop.for %m = %c0 to %M step %c1 {
loop.for %n = %c0 to %N step %c1 {
loop.for %k = %c0 to %K step %c1 {
%a = load %A[%m, %k] : memref<?x?xf32, stride_specification>
%b = load %B[%k, %n] : memref<?x?xf32, stride_specification>
%c = load %C[%m, %n] : memref<?x?xf32, stride_specification>
%d = call @func_of_elements(%a, %b, %c)
: (f32, f32, f32) -> (f32)
store %d, %C[%m, %n] : memref<?x?x?xf32, stride_specification>
}
}
}
```

To allow progressive lowering from the value world (a.k.a tensor values) to
the buffer world (a.k.a memref values), a `linalg.generic`

op accepts
mixing input and output ranked tensor values with input and output memrefs.

```
%C = linalg.generic #trait_attribute %A, %B {other-attributes} :
tensor<?x?xf32>,
memref<?x?xf32, stride_specification>
-> (tensor<?x?xf32>)
```

In this case, the number of outputs (args_out) must match the sum of (1) the
number of output buffer operands and (2) the number of tensor return values.
The semantics is that the `linalg.indexed_generic`

op produces (i.e.
allocates and fills) its tensor return values.

Tensor values must be legalized by a buffer allocation pass before most transformations can be applied. Such legalization moves tensor return values into output buffer operands and updates the region arguments accordingly.

Transformations that create control-flow around linalg.indexed_generic operations are not expected to work with tensors because SSA values do not escape naturally. Still, transformations and rewrites that take advantage of tensor SSA values are expected to be useful and will be added in the near future.

#### Attributes:

Attribute | MLIR Type | Description |
---|---|---|

`args_in` | IntegerAttr | 64-bit signless integer attribute |

`args_out` | IntegerAttr | 64-bit signless integer attribute |

`indexing_maps` | ArrayAttr | AffineMap array attribute |

`iterator_types` | ArrayAttr | array attribute |

`doc` | StringAttr | string attribute |

`fun` | FlatSymbolRefAttr | flat symbol reference attribute |

`library_call` | StringAttr | string attribute |

#### Operands:

Operand | Description |
---|---|

`views` | anonymous_324 |

#### Results:

Result | Description |
---|---|

`output_tensors` | ranked tensor of any type values |

`linalg.indexed_generic`

(linalg::IndexedGenericOp)

Indexed Generic Linalg op form where the key properties of the computation are specified as attributes. In pretty form, a linalg.indexed_generic op is written as:

```
linalg.indexed_generic #trait_attribute %A, %B, %C {other-attributes} :
memref<?x?xf32, stride_specification>,
memref<?x?xf32, stride_specification>,
memref<?x?xf32, stride_specification>
```

Where #trait_attributes is an alias of a dictionary attribute containing:

- args_in: an I64Attr representing the number of input (readonly) views
- args_out: an I64Attr representing the number of output (readwrite) views
- doc [optional]: a documentation string
- fun: a FlatSymbolRefAttr that must resolve to an existing function
symbol. To support inplace updates in a generic fashion, the signature
of the function must be:
`fun([index types of induction variables], [input views element types], [output views element types]) -> ([output views element types])`

- indexing_maps: a list of AffineMapAttr, one AffineMapAttr per each input and output view. Such AffineMapAttr specifies the mapping between the loops and the indexing within each view.
- library_call [optional]: a StringAttr containing the name of an external library function that the linalg.indexed_generic operation maps to. The external library is assumed to be dynamically linked and no strong compile-time guarantees are provided. In the absence of such a library call, linalg.indexed_generic will always lower to loops.
- iterator_types: an ArrayAttr they type of the enclosing loops; Each element of the list represents and iterator of one of the following types: parallel, reduction, window

Example: Defining a #matmul_trait attribute in MLIR can be done as follows:

```
func @fma(%offset_m: index, %offset_n: index, %offset_k: index,
%a: f32, %b: f32, %c: f32)
-> f32
{
"some_optional_condition"(%offset_m, %offset_n, %offset_k)
%d = mulf %a, %b: f32
%e = addf %c, %d: f32
return %e: f32
}
#matmul_accesses = [
(m, n, k) -> (m, k),
(m, n, k) -> (k, n),
(m, n, k) -> (m, n)
]
#matmul_trait = {
doc = "C(m, n) += A(m, k) * B(k, n)",
fun = @fma,
indexing_maps = #matmul_accesses,
library_call = "linalg_matmul",
n_views = [2, 1],
iterator_types = ["parallel", "parallel", "reduction"]
}
```

And can be reused in multiple places as:

```
linalg.indexed_generic #matmul_trait %A, %B, %C [other-attributes] :
memref<?x?xf32, stride_specification>,
memref<?x?xf32, stride_specification>,
memref<?x?xf32, stride_specification>
```

This may lower to either:

```
call @linalg_matmul(%offset_m, %offset_n, %offset_k, %A, %B, %C) :
(memref<?x?xf32, stride_specification>,
memref<?x?xf32, stride_specification>,
memref<?x?xf32, stride_specification>)
-> ()
```

or IR resembling:

```
loop.for %m = %c0 to %M step %c1 {
loop.for %n = %c0 to %N step %c1 {
loop.for %k = %c0 to %K step %c1 {
%a = load %A[%m, %k] : memref<?x?xf32, stride_specification>
%b = load %B[%k, %n] : memref<?x?xf32, stride_specification>
%c = load %C[%m, %n] : memref<?x?xf32, stride_specification>
%d = call @func_of_elements_and_indices(%m, %n, %k, %a, %b, %c)
: (index, index, index, f32, f32, f32) -> (f32)
store %d, %C[%m, %n] : memref<?x?x?xf32, stride_specification>
}
}
}
```

To allow progressive lowering from the value world (a.k.a tensor values) to
the buffer world (a.k.a memref values), a `linalg.indexed_generic`

op
accepts mixing input and output ranked tensor values with input and output
memrefs.

```
%C = linalg.indexed_generic #trait_attribute %A, %B {other-attributes}
: tensor<?x?xf32>,
memref<?x?xf32, stride_specification>
-> (tensor<?x?xf32>)
```

In this case, the number of outputs (args_out) must match the sum of (1) the
number of output buffer operands and (2) the number of tensor return values.
The semantics is that the `linalg.indexed_generic`

op produces (i.e.
allocates and fills) its return values.

Tensor values must be legalized by a buffer allocation pass before most transformations can be applied. Such legalization moves tensor return values into output buffer operands and updates the region argument accordingly.

Transformations that create control-flow around linalg.indexed_generic operations are not expected to work with tensors because SSA values do not escape naturally. Still, transformations and rewrites that take advantage of tensor SSA values are expected to be useful and will be added in the near future.

#### Attributes:

Attribute | MLIR Type | Description |
---|---|---|

`args_in` | IntegerAttr | 64-bit signless integer attribute |

`args_out` | IntegerAttr | 64-bit signless integer attribute |

`indexing_maps` | ArrayAttr | AffineMap array attribute |

`iterator_types` | ArrayAttr | array attribute |

`doc` | StringAttr | string attribute |

`fun` | FlatSymbolRefAttr | flat symbol reference attribute |

`library_call` | StringAttr | string attribute |

#### Operands:

Operand | Description |
---|---|

`views` | anonymous_324 |

#### Results:

Result | Description |
---|---|

`output_tensors` | ranked tensor of any type values |

`linalg.range`

(linalg::RangeOp)

Create a `range`

type value, used to create `view`

s

Syntax:

```
operation ::= `linalg.range` $min `:` $max `:` $step attr-dict `:` type(results)
```

The `linalg.range`

op creates a `!linalg.range`

from 3 values of type
`index`

that represent the min, max and step values of the `range`

. This
type does not pass function boundaries at the moment.

Example:

```
%3 = linalg.range %0:%1:%2 : !linalg.range
```

#### Operands:

Operand | Description |
---|---|

`min` | index |

`max` | index |

`step` | index |

#### Results:

Result | Description |
---|---|

«unnamed» | range |

`linalg.reshape`

(linalg::ReshapeOp)

linalg.reshape produces a new view into the operand view

Syntax:

```
operation ::= `linalg.reshape` $src $reassociation attr-dict `:` type($src) `into` type(results)
```

The `linalg.reshape`

op produces a new view whose sizes are a reassociation
of the original `view`

. Depending on whether or not the reassociated
MemRefType is contiguous, the resulting memref may require explicit alloc
and copies.

A reassociation is defined as a continuous grouping of dimensions and is represented with an affine map array attribute. In the future, non-continuous groupings may be allowed (i.e. permutations, reindexings etc).

For now, it is assumed that either:

- a reassociation produces and consumes contiguous MemRefType or,
- the reshape op will be folded into its consumers (by changing the shape of the computations). All other cases are undefined behavior and a reshape op may not lower to LLVM if it cannot be proven statically that it does not require alloc+copy.

A reshape may either collapse or expand dimensions, depending on the relationship between source and target memref ranks. The verification rule is that the reassociation maps are applied to the memref with the larger rank to obtain the memref with the smaller rank. In the case of a dimension expansion, the reassociation maps can be interpreted as inverse maps.

Examples:

```
// Dimension collapse (i, j) -> i' and k -> k'
%1 = linalg.reshape %0 [(i, j, k) -> (i, j), (i, j, k) -> (k)] :
memref<?x?x?xf32, stride_spec> into memref<?x?xf32, stride_spec_2>
```

```
// Dimension expansion i -> (i', j') and (k) -> (k')
%1 = linalg.reshape %0 [(i, j, k) -> (i, j), (i, j, k) -> (k)] :
memref<?x?xf32, stride_spec> into memref<?x?x?xf32, stride_spec_2>
```

#### Attributes:

Attribute | MLIR Type | Description |
---|---|---|

`reassociation` | ArrayAttr | AffineMap array attribute |

#### Operands:

Operand | Description |
---|---|

`src` | strided memref of any type values |

#### Results:

Result | Description |
---|---|

`result` | strided memref of any type values |

`linalg.slice`

(linalg::SliceOp)

Produce a rank-reduced `subview`

of a base `view`

.

The `linalg.slice`

op allows defining a subregion of a smaller rank than the
operand `view`

within the underlying buffer.

A `linalg.slice`

op takes a view and a variadic number of indexings and
produces a `view`

of the same elemental type. An indexing is either:

- a
`linalg.range`

, in which case it does not reduce the rank of the parent`view`

along the corresponding dimension. - an
`index`

, in which case it reduces the rank of the parent view by one.

If an indexing extends past the size of the `view`

, this is undefined
behavior. Ideally the `linalg.slice`

operation would automatically truncate
it to be within bounds but there are tradeoffs involved now that `std.view`

is a standard op.

Examples:

- rank-preserving
`slice`

:

```
%4 = linalg.slice %0[%1, %2] : memref<?x?xf32, stride_spec>,
!linalg.range, !linalg.range, memref<?x?xf32, stride_spec>
```

- rank-reducing
`slice`

(from 2-D to 1-D):

```
%4 = linalg.slice %0[%1, %2] : memref<?x?xf32, stride_spec>,
index, !linalg.range, memref<?x?xf32, stride_spec>
```

- rank-reducing
`slice`

(from 2-D to 0-D):

```
%4 = linalg.slice %0[%1, %2] : memref<?x?xf32, stride_spec>,
index, index, memref<?x?xf32, stride_spec>
```

#### Operands:

Operand | Description |
---|---|

`view` | strided memref of any type values |

`indexings` | range or index |

#### Results:

Result | Description |
---|---|

«unnamed» | strided memref of any type values |

`linalg.tensor_reshape`

(linalg::TensorReshapeOp)

linalg.tensor_reshape produces a new reshaped tensor.

Syntax:

```
operation ::= `linalg.tensor_reshape` $src $reassociation attr-dict `:` type($src) `into` type(results)
```

The `linalg.reshape`

op produces a new tensor whose sizes are a
reassociation of the original `src`

.

A reassociation is defined as a continuous grouping of dimensions and is represented with an affine map array attribute. In the future, non-continuous groupings may be allowed (i.e. permutations, reindexings etc).

A reshape may either collapse or expand dimensions, depending on the relationship between source and target tensor ranks. The verification rule is that the reassociation maps are applied to the tensor with the larger rank to obtain the tensor with the smaller rank. In the case of a dimension expansion, the reassociation maps can be interpreted as inverse maps.

Examples:

```
// Dimension collapse (i, j) -> i' and k -> k'
%b = linalg.tensor_reshape %a [(i, j, k) -> (i, j), (i, j, k) -> (k)] :
tensor<?x?x?xf32> into tensor<?x?xf32>
```

```
// Dimension expansion i -> (i', j') and (k) -> (k')
%b = linalg.tensor_reshape %a [(i, j, k) -> (i, j), (i, j, k) -> (k)] :
tensor<?x?xf32> into tensor<?x?x?xf32>
```

#### Attributes:

Attribute | MLIR Type | Description |
---|---|---|

`reassociation` | ArrayAttr | AffineMap array attribute |

#### Operands:

Operand | Description |
---|---|

`src` | tensor of any type values |

#### Results:

Result | Description |
---|---|

`result` | tensor of any type values |

`linalg.transpose`

(linalg::TransposeOp)

`transpose`

produces a new strided memref (metadata-only)

The `linalg.transpose`

op produces a strided memref whose sizes and strides
are a permutation of the original `view`

. This is a pure metadata
transformation.

Example:

```
%1 = linalg.transpose %0 (i, j) -> (j, i) : memref<?x?xf32, stride_spec>
```

#### Attributes:

Attribute | MLIR Type | Description |
---|---|---|

`permutation` | AffineMapAttr | AffineMap attribute |

#### Operands:

Operand | Description |
---|---|

`view` | strided memref of any type values |

#### Results:

Result | Description |
---|---|

«unnamed» | strided memref of any type values |

`linalg.yield`

(linalg::YieldOp)

Linalg yield operation

`linalg.yield`

is a special terminator operation for blocks inside regions
in `linalg`

generic ops. It returns values to the immediately enclosing
`linalg`

generic op.

Example:

```
linalg.yield %f0, %f1 : f32, f32
```

#### Operands:

Operand | Description |
---|---|

`values` | any type |

`linalg.matmul`

(linalg::MatmulOp)

Syntax:

```
operation ::= `linalg.matmul` `(` operands `)` attr-dict `:` type(operands)
```

#### Operands:

Operand | Description |
---|---|

«unnamed» | strided memref of any type values of rank 2 |

«unnamed» | strided memref of any type values of rank 2 |

«unnamed» | strided memref of any type values of rank 2 |

`linalg.matvec`

(linalg::MatvecOp)

Syntax:

```
operation ::= `linalg.matvec` `(` operands `)` attr-dict `:` type(operands)
```

#### Operands:

Operand | Description |
---|---|

«unnamed» | strided memref of any type values of rank 2 |

«unnamed» | strided memref of any type values of rank 1 |

«unnamed» | strided memref of any type values of rank 1 |

`linalg.pooling_max`

(linalg::PoolingMaxOp)

Syntax:

```
operation ::= `linalg.pooling_max` `(` operands `)` attr-dict `:` type(operands)
```

Takes max op as pooling operation, i.e., it samples the maximum value in the window.

#### Attributes:

Attribute | MLIR Type | Description |
---|---|---|

`strides` | ArrayAttr | 64-bit integer array attribute |

`dilations` | ArrayAttr | 64-bit integer array attribute |

`padding` | DenseIntElementsAttr | 64-bit signless integer elements attribute |

#### Operands:

Operand | Description |
---|---|

`input` | strided memref of any type values |

`windowDims` | strided memref of any type values |

`output` | strided memref of any type values |

`linalg.pooling_min`

(linalg::PoolingMinOp)

Syntax:

```
operation ::= `linalg.pooling_min` `(` operands `)` attr-dict `:` type(operands)
```

Takes min op as pooling operation, i.e., it samples the minimum value in the window.

#### Attributes:

Attribute | MLIR Type | Description |
---|---|---|

`strides` | ArrayAttr | 64-bit integer array attribute |

`dilations` | ArrayAttr | 64-bit integer array attribute |

`padding` | DenseIntElementsAttr | 64-bit signless integer elements attribute |

#### Operands:

Operand | Description |
---|---|

`input` | strided memref of any type values |

`windowDims` | strided memref of any type values |

`output` | strided memref of any type values |

`linalg.pooling_sum`

(linalg::PoolingSumOp)

Syntax:

```
operation ::= `linalg.pooling_sum` `(` operands `)` attr-dict `:` type(operands)
```

Takes add op as pooling operation, i.e., it accumulates the values in the window.

#### Attributes:

Attribute | MLIR Type | Description |
---|---|---|

`strides` | ArrayAttr | 64-bit integer array attribute |

`dilations` | ArrayAttr | 64-bit integer array attribute |

`padding` | DenseIntElementsAttr | 64-bit signless integer elements attribute |

#### Operands:

Operand | Description |
---|---|

`input` | strided memref of any type values |

`windowDims` | strided memref of any type values |

`output` | strided memref of any type values |