MLIR

Multi-Level IR Compiler Framework

Dialect 'linalg' definition

The linalg dialect groups together a set of types, operations and transformations that are useful to implement a structured abstraction on buffers and tensors. These abstractions are useful for transformations and can lower to scalar load/store and other operations or to more general library calls.

Additional Linalg Dialect Documentation and a Rationale Document are also available and should be read first before going into the details of the op semantics.

Operation definition

linalg.conv (linalg::ConvOp)

Description:

Generic n-D convolution as described in the TF documentation: https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/nn/convolution

  output[b, x[0], ..., x[N-1], k] =
  sum_{z[0], ..., z[N-1], q}
      filter[z[0], ..., z[N-1], q, k] *
      padded_input[b,
                   x[0] * strides[0] + dilation_rate[0] * z[0],
                   ...,
                   x[N-1] * strides[N-1] + dilation_rate[N-1] * z[N-1],
                   q]
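The formula above can be sketched in Python for the 1-D case (N = 1). This is an illustration of the semantics only, not dialect code; it assumes the input is already padded and uses the layouts implied by the formula: `padded_input[b][w][q]`, `filter[z][q][k]`, `output[b][x][k]`.

```python
def conv_1d(padded_input, filt, strides, dilations):
    # Shapes: padded_input[B][W][Q], filt[Z][Q][K], output[B][X][K].
    B = len(padded_input)
    W = len(padded_input[0])
    Q = len(padded_input[0][0])
    Z = len(filt)
    K = len(filt[0][0])
    # Number of valid output positions along the spatial dimension.
    X = (W - dilations[0] * (Z - 1) - 1) // strides[0] + 1
    out = [[[0.0] * K for _ in range(X)] for _ in range(B)]
    for b in range(B):
        for x in range(X):
            for k in range(K):
                acc = 0.0
                for z in range(Z):          # reduction over filter window
                    for q in range(Q):      # reduction over input channels
                        acc += (filt[z][q][k] *
                                padded_input[b][x * strides[0] + dilations[0] * z][q])
                out[b][x][k] = acc
    return out
```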

Operands:

  1. filter: strided memref of any type values
  2. input: strided memref of any type values
  3. output: strided memref of any type values

Attributes:

Attribute   MLIR Type   Description
---------   ---------   -----------
strides     ArrayAttr   64-bit integer array attribute
dilations   ArrayAttr   64-bit integer array attribute

Results:

linalg.copy (linalg::CopyOp)

Description:

Copies the data in the input view into the output view.

Usage:

linalg.copy(%arg0, %arg1) : memref<?xf32, stride_specification>,
                            memref<?xf32, stride_specification>

One possible lowering to loop form is:

%0 = linalg.dim %arg0, 0 : index
loop.for %i0 = %c0 to %0 step %c1 {
  %1 = linalg.load %arg0[%i0] : memref<?xf32, stride_specification>
  linalg.store %1, %arg1[%i0] : memref<?xf32, stride_specification>
}

Optionally, the op can take inputPermutation and outputPermutation attributes to reorder the dimensions of the input and output views.

Usage:

linalg.copy(%arg0, %arg1) {inputPermutation : (i, j, k) -> (i, k, j),
                           outputPermutation : (i, j, k) -> (k, j, i)} :
  memref<?x?x?xf32, stride_specification>,
  memref<?x?x?xf32, stride_specification>

One possible lowering to loop form is:

%0 = linalg.dim %arg0, 0
%1 = linalg.dim %arg0, 1
%2 = linalg.dim %arg0, 2
loop.for %i0 = %c0 to %{{.*}} step %c1 {
  loop.for %i1 = %c0 to %{{.*}} step %c1 {
    loop.for %i2 = %c0 to %{{.*}} step %c1 {
      %3 = linalg.load %arg0[%i0, %i2, %i1] :
              memref<?x?x?xf32, stride_specification>
      linalg.store %3, %arg1[%i2, %i1, %i0] :
              memref<?x?x?xf32, stride_specification>

The views are expected to be compatible for correctness but this is not enforced at the moment.
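The permuted lowering above can be sketched in Python on nested lists: each loop index tuple is mapped through the input and output permutations before indexing, mirroring the load/store indices in the loop nest. Illustration only, not dialect code.

```python
def permuted_copy(src, dst, in_perm, out_perm, sizes):
    # in_perm/out_perm map a loop index tuple to the indices used for
    # reading src and writing dst, respectively.
    for i in range(sizes[0]):
        for j in range(sizes[1]):
            for k in range(sizes[2]):
                si, sj, sk = in_perm((i, j, k))
                di, dj, dk = out_perm((i, j, k))
                dst[di][dj][dk] = src[si][sj][sk]
```

With `in_perm = (i, j, k) -> (i, k, j)` and `out_perm = (i, j, k) -> (k, j, i)`, this performs exactly the `dst[k][j][i] = src[i][k][j]` copy of the loop form above.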

Operands:

  1. input: strided memref of any type values
  2. output: strided memref of any type values

Attributes:

Attribute           MLIR Type       Description
---------           ---------       -----------
inputPermutation    AffineMapAttr   AffineMap attribute
outputPermutation   AffineMapAttr   AffineMap attribute

Results:

linalg.dot (linalg::DotOp)

Description:

Operands:

  1. «unnamed»: strided memref of any type values of rank 1
  2. «unnamed»: strided memref of any type values of rank 1
  3. «unnamed»: strided memref of any type values of rank 0

Attributes:

Results:

linalg.fill (linalg::FillOp)

Description:

Operands:

  1. output: strided memref of any type values
  2. value: floating-point or integer or vector of any type values

Attributes:

Results:

linalg.generic (linalg::GenericOp)

Description:

Generic Linalg op form where the key properties of the computation are specified as attributes. In pretty form, a linalg.generic op is written as:

  linalg.generic #trait_attribute %A, %B, %C {other-attributes} :
    memref<?x?xf32, stride_specification>,
    memref<?x?xf32, stride_specification>,
    memref<?x?xf32, stride_specification>

Where #trait_attribute is an alias of a dictionary attribute containing:

  • args_in: an I64Attr representing the number of input (readonly) views
  • args_out: an I64Attr representing the number of output (readwrite) views
  • doc [optional]: a documentation string
  • fun: a FlatSymbolRefAttr that must resolve to an existing function symbol. To support inplace updates in a generic fashion, the signature of the function must be:
      fun([input views element types], [output views element types])
        -> ([output views element types])
    
  • indexing_maps: a list of AffineMapAttr, one AffineMapAttr for each input and output view. Each AffineMapAttr specifies the mapping between the loops and the indexing within the corresponding view.
  • library_call [optional]: a StringAttr containing the name of an external library function that the linalg.generic operation maps to. The external library is assumed to be dynamically linked and no strong compile-time guarantees are provided. In the absence of such a library call, linalg.generic will always lower to loops.
  • iterator_types: an ArrayAttr specifying the type of the enclosing loops. Each element of the list represents an iterator of one of the following types: parallel, reduction, window

Example: Defining a #matmul_trait attribute in MLIR can be done as follows:

  func @fma(%a: f32, %b: f32, %c: f32) -> f32 {
    %d = mulf %a, %b: f32
    %e = addf %c, %d: f32
    return %e: f32
  }
  #matmul_accesses = [
    (m, n, k) -> (m, k),
    (m, n, k) -> (k, n),
    (m, n, k) -> (m, n)
  ]
  #matmul_trait = {
    doc = "C(m, n) += A(m, k) * B(k, n)",
    fun = @fma,
    indexing_maps = #matmul_accesses,
    library_call = "linalg_matmul",
    args_in = 2,
    args_out = 1,
    iterator_types = ["parallel", "parallel", "reduction"]
  }

And can be reused in multiple places as:

  linalg.generic #matmul_trait %A, %B, %C [other-attributes] :
    memref<?x?xf32, stride_specification>,
    memref<?x?xf32, stride_specification>,
    memref<?x?xf32, stride_specification>

This may lower to either:

  call @linalg_matmul(%A, %B, %C) :
    (memref<?x?xf32, stride_specification>,
     memref<?x?xf32, stride_specification>,
     memref<?x?xf32, stride_specification>)
    -> ()

or IR resembling:

loop.for %m = %c0 to %M step %c1 {
  loop.for %n = %c0 to %N step %c1 {
    loop.for %k = %c0 to %K step %c1 {
      %a = linalg.load %A[%m, %k] : memref<?x?xf32, stride_specification>
      %b = linalg.load %B[%k, %n] : memref<?x?xf32, stride_specification>
      %c = linalg.load %C[%m, %n] : memref<?x?xf32, stride_specification>
      %d = call @func_of_elements(%a, %b, %c)
             : (f32, f32, f32) -> (f32)
      linalg.store %d, %C[%m, %n] : memref<?x?xf32, stride_specification>
    }
  }
}
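The loop-form lowering above can be sketched in Python: the three indexing maps select the element of each view for a given (m, n, k), and the body combines them. Function names here are illustrative, not dialect constructs.

```python
def fma(a, b, c):
    # Python stand-in for the @fma body: d = a * b; e = c + d.
    return c + a * b

def generic_matmul(A, B, C):
    M, N, K = len(A), len(B[0]), len(B)
    for m in range(M):          # parallel
        for n in range(N):      # parallel
            for k in range(K):  # reduction
                # indexing maps: A(m, k), B(k, n), C(m, n)
                C[m][n] = fma(A[m][k], B[k][n], C[m][n])
    return C
```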

To allow progressive lowering from the value world (a.k.a. tensor values) to the buffer world (a.k.a. memref values), a linalg.generic op accepts mixing input and output ranked tensor values with input and output memrefs.

  %C = linalg.generic #trait_attribute %A, %B {other-attributes} :
    tensor<?x?xf32>,
    memref<?x?xf32, stride_specification>
    -> (tensor<?x?xf32>)

In this case, the number of outputs (args_out) must match the sum of (1) the number of output buffer operands and (2) the number of tensor return values. The semantics is that the linalg.generic op produces (i.e. allocates and fills) its tensor return values.

Tensor values must be legalized by a buffer allocation pass before most transformations can be applied. Such legalization moves tensor return values into output buffer operands and updates the region arguments accordingly.

Transformations that create control-flow around linalg.generic operations are not expected to work with tensors because SSA values do not escape naturally. Still, transformations and rewrites that take advantage of tensor SSA values are expected to be useful and will be added in the near future.

Operands:

  1. views: ranked tensor or strided memref of any type values

Attributes:

Attribute        MLIR Type           Description
---------        ---------           -----------
args_in          IntegerAttr         64-bit integer attribute
args_out         IntegerAttr         64-bit integer attribute
indexing_maps    ArrayAttr           AffineMap array attribute
iterator_types   ArrayAttr           array attribute
doc              StringAttr          string attribute
fun              FlatSymbolRefAttr   flat symbol reference attribute
library_call     StringAttr          string attribute

Results:

  1. output_tensors: ranked tensor of any type values

linalg.indexed_generic (linalg::IndexedGenericOp)

Description:

Indexed Generic Linalg op form where the key properties of the computation are specified as attributes. In pretty form, a linalg.indexed_generic op is written as:

  linalg.indexed_generic #trait_attribute %A, %B, %C {other-attributes} :
    memref<?x?xf32, stride_specification>,
    memref<?x?xf32, stride_specification>,
    memref<?x?xf32, stride_specification>

Where #trait_attribute is an alias of a dictionary attribute containing:

  • args_in: an I64Attr representing the number of input (readonly) views
  • args_out: an I64Attr representing the number of output (readwrite) views
  • doc [optional]: a documentation string
  • fun: a FlatSymbolRefAttr that must resolve to an existing function symbol. To support inplace updates in a generic fashion, the signature of the function must be:
      fun([index types of induction variables], [input views element types],
          [output views element types]) -> ([output views element types])
    
  • indexing_maps: a list of AffineMapAttr, one AffineMapAttr for each input and output view. Each AffineMapAttr specifies the mapping between the loops and the indexing within the corresponding view.
  • library_call [optional]: a StringAttr containing the name of an external library function that the linalg.indexed_generic operation maps to. The external library is assumed to be dynamically linked and no strong compile-time guarantees are provided. In the absence of such a library call, linalg.indexed_generic will always lower to loops.
  • iterator_types: an ArrayAttr specifying the type of the enclosing loops. Each element of the list represents an iterator of one of the following types: parallel, reduction, window

Example: Defining a #matmul_trait attribute in MLIR can be done as follows:

  func @fma(%offset_m: index, %offset_n: index, %offset_k: index,
            %a: f32, %b: f32, %c: f32)
    -> f32
  {
    "some_optional_condition"(%offset_m, %offset_n, %offset_k)
    %d = mulf %a, %b: f32
    %e = addf %c, %d: f32
    return %e: f32
  }
  #matmul_accesses = [
    (m, n, k) -> (m, k),
    (m, n, k) -> (k, n),
    (m, n, k) -> (m, n)
  ]
  #matmul_trait = {
    doc = "C(m, n) += A(m, k) * B(k, n)",
    fun = @fma,
    indexing_maps = #matmul_accesses,
    library_call = "linalg_matmul",
    args_in = 2,
    args_out = 1,
    iterator_types = ["parallel", "parallel", "reduction"]
  }

And can be reused in multiple places as:

  linalg.indexed_generic #matmul_trait %A, %B, %C [other-attributes] :
    memref<?x?xf32, stride_specification>,
    memref<?x?xf32, stride_specification>,
    memref<?x?xf32, stride_specification>

This may lower to either:

  call @linalg_matmul(%A, %B, %C) :
    (memref<?x?xf32, stride_specification>,
     memref<?x?xf32, stride_specification>,
     memref<?x?xf32, stride_specification>)
    -> ()

or IR resembling:

loop.for %m = %c0 to %M step %c1 {
  loop.for %n = %c0 to %N step %c1 {
    loop.for %k = %c0 to %K step %c1 {
      %a = linalg.load %A[%m, %k] : memref<?x?xf32, stride_specification>
      %b = linalg.load %B[%k, %n] : memref<?x?xf32, stride_specification>
      %c = linalg.load %C[%m, %n] : memref<?x?xf32, stride_specification>
      %d = call @func_of_elements_and_indices(%m, %n, %k, %a, %b, %c)
             : (index, index, index, f32, f32, f32) -> (f32)
      linalg.store %d, %C[%m, %n] : memref<?x?xf32, stride_specification>
    }
  }
}
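The indexed loop nest can be sketched the same way as for linalg.generic, except that the body also receives the loop indices, enabling index-dependent computations. Names here are illustrative, not dialect constructs.

```python
def fma_indexed(m, n, k, a, b, c):
    # A real body could branch or compute on (m, n, k); this sketch
    # leaves the indices unused so the result matches plain matmul.
    return c + a * b

def indexed_generic_matmul(A, B, C):
    M, N, K = len(A), len(B[0]), len(B)
    for m in range(M):
        for n in range(N):
            for k in range(K):
                # Indices are passed first, mirroring the fun signature:
                # fun([index types], [input elements], [output elements]).
                C[m][n] = fma_indexed(m, n, k, A[m][k], B[k][n], C[m][n])
    return C
```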

To allow progressive lowering from the value world (a.k.a. tensor values) to the buffer world (a.k.a. memref values), a linalg.indexed_generic op accepts mixing input and output ranked tensor values with input and output memrefs.

  %C = linalg.indexed_generic #trait_attribute %A, %B {other-attributes}
  : tensor<?x?xf32>,
    memref<?x?xf32, stride_specification>
    -> (tensor<?x?xf32>)

In this case, the number of outputs (args_out) must match the sum of (1) the number of output buffer operands and (2) the number of tensor return values. The semantics is that the linalg.indexed_generic op produces (i.e. allocates and fills) its return values.

Tensor values must be legalized by a buffer allocation pass before most transformations can be applied. Such legalization moves tensor return values into output buffer operands and updates the region arguments accordingly.

Transformations that create control-flow around linalg.indexed_generic operations are not expected to work with tensors because SSA values do not escape naturally. Still, transformations and rewrites that take advantage of tensor SSA values are expected to be useful and will be added in the near future.

Operands:

  1. views: ranked tensor or strided memref of any type values

Attributes:

Attribute        MLIR Type           Description
---------        ---------           -----------
args_in          IntegerAttr         64-bit integer attribute
args_out         IntegerAttr         64-bit integer attribute
indexing_maps    ArrayAttr           AffineMap array attribute
iterator_types   ArrayAttr           array attribute
doc              StringAttr          string attribute
fun              FlatSymbolRefAttr   flat symbol reference attribute
library_call     StringAttr          string attribute

Results:

  1. output_tensors: ranked tensor of any type values

linalg.range (linalg::RangeOp)

Create a range type value, used to create views

Description:

The linalg.range op creates a !linalg.range from 3 values of type index that represent the min, max and step values of the range. This type does not pass function boundaries at the moment.

Example:

  %3 = linalg.range %0:%1:%2 : !linalg.range
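Assuming the usual half-open loop-bound convention (an assumption for illustration; ranges are opaque !linalg.range values in the dialect), a Python analogue of the (min, max, step) triple is:

```python
def linalg_range(min_, max_, step):
    # Half-open interval [min_, max_) traversed with the given step,
    # matching loop-bound semantics.
    return list(range(min_, max_, step))
```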

Operands:

  1. min: index
  2. max: index
  3. step: index

Attributes:

Results:

  1. «unnamed»: range

linalg.reshape (linalg::ReshapeOp)

linalg.reshape produces a new view into the operand view

Description:

The linalg.reshape op produces a new view whose sizes are a reassociation of the original view. Depending on whether or not the reassociated MemRefType is contiguous, the resulting memref may require explicit alloc and copies.

A reassociation is defined as a contiguous grouping of dimensions and is represented with an affine map array attribute. In the future, non-contiguous groupings may be allowed (i.e. permutations, reindexings etc).

For now, it is assumed that either:

  1. a reassociation produces and consumes contiguous MemRefType or,
  2. the reshape op will be folded into its consumers (by changing the shape of the computations). All other cases are undefined behavior and a reshape op may not lower to LLVM if it cannot be proven statically that it does not require alloc+copy.

A reshape may either collapse or expand dimensions, depending on the relationship between source and target memref ranks. The verification rule is that the reassociation maps are applied to the memref with the larger rank to obtain the memref with the smaller rank. In the case of a dimension expansion, the reassociation maps can be interpreted as inverse maps.

Examples:

   // Dimension collapse (i, j) -> i' and k -> k'
   %1 = linalg.reshape %0 [(i, j, k) -> (i, j), (i, j, k) -> (k)] :
     memref<?x?x?xf32, stride_spec> into memref<?x?xf32, stride_spec_2>
   // Dimension expansion i -> (i', j') and (k) -> (k')
   %1 = linalg.reshape %0 [(i, j, k) -> (i, j), (i, j, k) -> (k)] :
     memref<?x?xf32, stride_spec> into memref<?x?x?xf32, stride_spec_2>
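The collapsing reassociation `[(i, j, k) -> (i, j), (i, j, k) -> (k)]` from the first example can be sketched on nested lists standing in for a contiguous buffer: dimensions i and j are fused into a single dimension of size i*j while k is kept. Illustration only, not dialect code.

```python
def collapse_ij(view):
    # view has shape (I, J, K); the result has shape (I*J, K).
    # Flattening the outer two nesting levels realizes the (i, j) -> i'
    # grouping on a contiguous, row-major layout.
    return [row for plane in view for row in plane]
```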

Operands:

  1. view: strided memref of any type values

Attributes:

Attribute       MLIR Type   Description
---------       ---------   -----------
reassociation   ArrayAttr   AffineMap array attribute

Results:

  1. «unnamed»: strided memref of any type values

linalg.slice (linalg::SliceOp)

Produce a rank-reduced subview of a base view.

Description:

The linalg.slice op allows defining a subregion of a smaller rank than the operand view within the underlying buffer.

A linalg.slice op takes a view and a variadic number of indexings and produces a view of the same elemental type. An indexing is either:

  1. a linalg.range, in which case it does not reduce the rank of the parent view along the corresponding dimension.
  2. an index, in which case it reduces the rank of the parent view by one.

If an indexing extends past the size of the view, this is undefined behavior. Ideally the linalg.slice operation would automatically truncate it to be within bounds but there are tradeoffs involved now that std.view is a standard op.

Examples:

  1. rank-preserving slice:
  %4 = linalg.slice %0[%1, %2] : memref<?x?xf32, stride_spec>,
    !linalg.range, !linalg.range, memref<?x?xf32, stride_spec>
  2. rank-reducing slice (from 2-D to 1-D):
  %4 = linalg.slice %0[%1, %2] : memref<?x?xf32, stride_spec>,
    index, !linalg.range, memref<?xf32, stride_spec>
  3. rank-reducing slice (from 2-D to 0-D):
  %4 = linalg.slice %0[%1, %2] : memref<?x?xf32, stride_spec>,
    index, index, memref<f32, stride_spec>
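The indexing rules can be sketched on nested lists, with Python `range` standing in for !linalg.range and `int` for index: a range keeps a dimension (possibly shortened), a plain integer removes it. Illustration only, not dialect code.

```python
def slice_view(view, i0, i1):
    # Dispatch on indexing kinds for a 2-D view.
    if isinstance(i0, range) and isinstance(i1, range):
        return [[view[i][j] for j in i1] for i in i0]   # rank-preserving
    if isinstance(i0, int) and isinstance(i1, range):
        return [view[i0][j] for j in i1]                # 2-D -> 1-D
    if isinstance(i0, range) and isinstance(i1, int):
        return [view[i][i1] for i in i0]                # 2-D -> 1-D
    return view[i0][i1]                                 # 2-D -> 0-D
```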

Operands:

  1. view: strided memref of any type values
  2. indexings: range or index

Attributes:

Results:

  1. «unnamed»: strided memref of any type values

linalg.transpose (linalg::TransposeOp)

transpose produces a new strided memref (metadata-only)

Description:

The linalg.transpose op produces a strided memref whose sizes and strides are a permutation of the original view. This is a pure metadata transformation.

Example:

   %1 = linalg.transpose %0 (i, j) -> (j, i) : memref<?x?xf32, stride_spec>
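Why this is metadata-only can be sketched in Python: model a strided view as a (buffer, sizes, strides) triple, and transposition permutes the sizes and strides while the buffer is untouched. Illustration only, not dialect code.

```python
def transpose_view(view, perm):
    # Permute sizes and strides; the underlying buffer is shared.
    buf, sizes, strides = view
    return (buf,
            tuple(sizes[p] for p in perm),
            tuple(strides[p] for p in perm))

def load(view, idx):
    # Strided addressing: offset = sum(idx[d] * strides[d]).
    buf, _, strides = view
    return buf[sum(i * s for i, s in zip(idx, strides))]
```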

Operands:

  1. view: strided memref of any type values

Attributes:

Attribute     MLIR Type       Description
---------     ---------       -----------
permutation   AffineMapAttr   AffineMap attribute

Results:

  1. «unnamed»: strided memref of any type values

linalg.yield (linalg::YieldOp)

Linalg yield operation

Description:

linalg.yield is a special terminator operation for blocks inside regions in linalg generic ops. It returns values to the immediately enclosing linalg generic op.

Example:

   linalg.yield %f0, %f1 : f32, f32

Operands:

  1. values: any type

Attributes:

Results:

linalg.matmul (linalg::MatmulOp)

Description:

Operands:

  1. «unnamed»: strided memref of any type values of rank 2
  2. «unnamed»: strided memref of any type values of rank 2
  3. «unnamed»: strided memref of any type values of rank 2

Attributes:

Results:

linalg.matvec (linalg::MatvecOp)

Description:

Operands:

  1. «unnamed»: strided memref of any type values of rank 2
  2. «unnamed»: strided memref of any type values of rank 1
  3. «unnamed»: strided memref of any type values of rank 1

Attributes:

Results: