MLIR

Multi-Level IR Compiler Framework

'sparse_tensor' Dialect

The SparseTensor dialect supports all the attributes, types, operations, and passes that are required to make sparse tensor types first class citizens within the MLIR compiler infrastructure. The dialect forms a bridge between high-level operations on sparse tensors types and lower-level operations on the actual sparse storage schemes consisting of pointers, indices, and values. Lower-level support may consist of fully generated code or may be provided by means of a small sparse runtime support library.

The concept of treating sparsity as a property, not a tedious implementation detail, by letting a sparse compiler generate sparse code automatically was pioneered for linear algebra by [Bik96] in MT1 (see https://www.aartbik.com/sparse.php) and formalized to tensor algebra by [Kjolstad17,Kjolstad20] in the Sparse Tensor Algebra Compiler (TACO) project (see http://tensor-compiler.org).

The MLIR implementation [Biketal22] closely follows the “sparse iteration theory” that forms the foundation of TACO. A rewriting rule is applied to each tensor expression in the Linalg dialect (MLIR’s tensor index notation) where the sparsity of tensors is indicated using the per-dimension level types dense/compressed together with a specification of the order on the dimensions (see [Chou18] for an in-depth discussions and possible extensions to these level types). Subsequently, a topologically sorted iteration graph, reflecting the required order on indices with respect to the dimensions of each tensor, is constructed to ensure that all tensors are visited in natural index order. Next, iteration lattices are constructed for the tensor expression for every index in topological order. Each iteration lattice point consists of a conjunction of tensor indices together with a tensor (sub)expression that needs to be evaluated for that conjunction. Within the lattice, iteration points are ordered according to the way indices are exhausted. As such these iteration lattices drive actual sparse code generation, which consists of a relatively straightforward one-to-one mapping from iteration lattices to combinations of for-loops, while-loops, and if-statements. Sparse tensor outputs that materialize uninitialized are handled with direct insertions if all parallel loops are outermost or insertions that indirectly go through a 1-dimensional access pattern expansion (a.k.a. workspace) where feasible [Gustavson72,Bik96,Kjolstad19].

  • [Bik96] Aart J.C. Bik. Compiler Support for Sparse Matrix Computations. PhD thesis, Leiden University, May 1996.
  • [Biketal22] Aart J.C. Bik, Penporn Koanantakool, Tatiana Shpeisman, Nicolas Vasilache, Bixia Zheng, and Fredrik Kjolstad. Compiler Support for Sparse Tensor Computations in MLIR. ACM Transactions on Architecture and Code Optimization, June, 2022. See: https://dl.acm.org/doi/10.1145/3544559
  • [Chou18] Stephen Chou, Fredrik Berg Kjolstad, and Saman Amarasinghe. Format Abstraction for Sparse Tensor Algebra Compilers. Proceedings of the ACM on Programming Languages, October 2018.
  • [Chou20] Stephen Chou, Fredrik Berg Kjolstad, and Saman Amarasinghe. Automatic Generation of Efficient Sparse Tensor Format Conversion Routines. Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, June, 2020.
  • [Gustavson72] Fred G. Gustavson. Some basic techniques for solving sparse systems of linear equations. In Sparse Matrices and Their Applications, pages 41–52. Plenum Press, New York, 1972.
  • [Kjolstad17] Fredrik Berg Kjolstad, Shoaib Ashraf Kamil, Stephen Chou, David Lugato, and Saman Amarasinghe. The Tensor Algebra Compiler. Proceedings of the ACM on Programming Languages, October 2017.
  • [Kjolstad19] Fredrik Berg Kjolstad, Peter Ahrens, Shoaib Ashraf Kamil, and Saman Amarasinghe. Tensor Algebra Compilation with Workspaces, Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2019.
  • [Kjolstad20] Fredrik Berg Kjolstad. Sparse Tensor Algebra Compilation. PhD thesis, MIT, February, 2020.

Operation definition 

sparse_tensor.binary (::mlir::sparse_tensor::BinaryOp) 

Binary set operation utilized within linalg.generic

Syntax:

operation ::= `sparse_tensor.binary` $x `,` $y `:` attr-dict type($x) `,` type($y) `to` type($output) `\n`
              `overlap` `=` $overlapRegion `\n`
              `left` `=` (`identity` $left_identity^):($leftRegion)? `\n`
              `right` `=` (`identity` $right_identity^):($rightRegion)?

Defines a computation within a linalg.generic operation that takes two operands and executes one of the regions depending on whether both operands or either operand is nonzero (i.e. stored explicitly in the sparse storage format).

Three regions are defined for the operation and must appear in this order:

  • overlap (elements present in both sparse tensors)
  • left (elements only present in the left sparse tensor)
  • right (element only present in the right sparse tensor)

Each region contains a single block describing the computation and result. Every non-empty block must end with a sparse_tensor.yield and the return type must match the type of output. The primary region’s block has two arguments, while the left and right region’s block has only one argument.

A region may also be declared empty (i.e. left={}), indicating that the region does not contribute to the output. For example, setting both left={} and right={} is equivalent to the intersection of the two inputs as only the overlap region will contribute values to the output.

As a convenience, there is also a special token identity which can be used in place of the left or right region. This token indicates that the return value is the input value (i.e. func(%x) => return %x). As a practical example, setting left=identity and right=identity would be equivalent to a union operation where non-overlapping values in the inputs are copied to the output unchanged.

Example of isEqual applied to intersecting elements only:

%C = bufferization.alloc_tensor...
%0 = linalg.generic #trait
  ins(%A: tensor<?xf64, #SparseVector>,
      %B: tensor<?xf64, #SparseVector>)
  outs(%C: tensor<?xi8, #SparseVector>) {
  ^bb0(%a: f64, %b: f64, %c: i8) :
    %result = sparse_tensor.binary %a, %b : f64, f64 to i8
      overlap={
        ^bb0(%arg0: f64, %arg1: f64):
          %cmp = arith.cmpf "oeq", %arg0, %arg1 : f64
          %ret_i8 = arith.extui %cmp : i1 to i8
          sparse_tensor.yield %ret_i8 : i8
      }
      left={}
      right={}
    linalg.yield %result : i8
} -> tensor<?xi8, #SparseVector>

Example of A+B in upper triangle, A-B in lower triangle:

%C = bufferization.alloc_tensor...
%1 = linalg.generic #trait
  ins(%A: tensor<?x?xf64, #CSR>, %B: tensor<?x?xf64, #CSR>
  outs(%C: tensor<?x?xf64, #CSR> {
  ^bb0(%a: f64, %b: f64, %c: f64) :
    %row = linalg.index 0 : index
    %col = linalg.index 1 : index
    %result = sparse_tensor.binary %a, %b : f64, f64 to f64
      overlap={
        ^bb0(%x: f64, %y: f64):
          %cmp = arith.cmpi "uge", %col, %row : index
          %upperTriangleResult = arith.addf %x, %y : f64
          %lowerTriangleResult = arith.subf %x, %y : f64
          %ret = arith.select %cmp, %upperTriangleResult, %lowerTriangleResult : f64
          sparse_tensor.yield %ret : f64
      }
      left=identity
      right={
        ^bb0(%y: f64):
          %cmp = arith.cmpi "uge", %col, %row : index
          %lowerTriangleResult = arith.negf %y : f64
          %ret = arith.select %cmp, %y, %lowerTriangleResult : f64
          sparse_tensor.yield %ret : f64
      }
    linalg.yield %result : f64
} -> tensor<?x?xf64, #CSR>

Example of set difference. Returns a copy of A where its sparse structure is not overlapped by B. The element type of B can be different than A because we never use its values, only its sparse structure:

%C = bufferization.alloc_tensor...
%2 = linalg.generic #trait
  ins(%A: tensor<?x?xf64, #CSR>, %B: tensor<?x?xi32, #CSR>
  outs(%C: tensor<?x?xf64, #CSR> {
  ^bb0(%a: f64, %b: i32, %c: f64) :
    %result = sparse_tensor.binary %a, %b : f64, i32 to f64
      overlap={}
      left=identity
      right={}
    linalg.yield %result : f64
} -> tensor<?x?xf64, #CSR>

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
left_identity::mlir::UnitAttrunit attribute
right_identity::mlir::UnitAttrunit attribute

Operands: 

OperandDescription
xany type
yany type

Results: 

ResultDescription
outputany type

sparse_tensor.compress (::mlir::sparse_tensor::CompressOp) 

Compressed an access pattern for insertion

Syntax:

operation ::= `sparse_tensor.compress` $values `,` $filled `,` $added `,` $count `into` $tensor `[` $indices `]` attr-dict `:` type($values) `,` type($filled) `,` type($added) `,` type($tensor)

Finishes a single access pattern expansion by moving inserted elements into the sparse storage scheme of the given tensor with the given indices. The arity of indices is one less than the rank of the tensor, with the remainder innermost indices defined through the added array. The values and filled array are reset in a sparse fashion by only iterating over set elements through an indirection using the added array, so that the operations are kept proportional to the number of nonzeros. See the sparse_tensor.expand operation for more details.

Note that this operation is “impure” in the sense that even though the result is modeled through an SSA value, the insertion is eventually done “in place”, and referencing the old SSA value is undefined behavior.

Example:

%result = sparse_tensor.compress %values, %filled, %added, %count into %tensor[%i]
  : memref<?xf64>, memref<?xi1>, memref<?xindex>, tensor<4x4xf64, #CSR>

Interfaces: InferTypeOpInterface

Operands: 

OperandDescription
valuesstrided memref of any type values of rank 1
filled1D memref of 1-bit signless integer values
added1D memref of index values
countindex
tensorsparse tensor of any type values
indicesindex

Results: 

ResultDescription
resultsparse tensor of any type values

sparse_tensor.concatenate (::mlir::sparse_tensor::ConcatenateOp) 

Concatenates a list of tensors into a single tensor.

Syntax:

operation ::= `sparse_tensor.concatenate` $inputs attr-dict `:` type($inputs) `to` type($result)

Concatenates a list input tensors and the output tensor with the same rank. The concatenation happens on the specified dimension (0<= dimension < rank). The resulting dimension size is the sum of all the input dimension sizes, while all the other dimensions should have the same size in the input and output tensors.

Only statically-sized input tensors are accepted, while the output tensor can be dynamically-sized.

Example:

%0 = sparse_tensor.concatenate %1, %2 { dimension = 0 : index }
  : tensor<64x64xf64, #CSR>, tensor<64x64xf64, #CSR> to tensor<128x64xf64, #CSR>

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
dimension::mlir::IntegerAttrindex attribute

Operands: 

OperandDescription
inputsranked tensor of any type values

Results: 

ResultDescription
resultranked tensor of any type values

sparse_tensor.convert (::mlir::sparse_tensor::ConvertOp) 

Converts between different tensor types

Syntax:

operation ::= `sparse_tensor.convert` $source attr-dict `:` type($source) `to` type($dest)

Converts one sparse or dense tensor type to another tensor type. The rank of the source and destination types must match exactly, and the dimension sizes must either match exactly or relax from a static to a dynamic size. The sparse encoding of the two types can obviously be completely different. The name convert was preferred over cast, since the operation may incur a non-trivial cost.

When converting between two different sparse tensor types, only explicitly stored values are moved from one underlying sparse storage format to the other. When converting from an unannotated dense tensor type to a sparse tensor type, an explicit test for nonzero values is used. When converting to an unannotated dense tensor type, implicit zeroes in the sparse storage format are made explicit. Note that the conversions can have non-trivial costs associated with them, since they may involve elaborate data structure transformations. Also, conversions from sparse tensor types into dense tensor types may be infeasible in terms of storage requirements.

Trivial dense-to-dense convert will be removed by canonicalization while trivial sparse-to-sparse convert will be removed by the sparse codegen. This is because we use trivial sparse-to-sparse convert to tell bufferization that the sparse codegen will expand the tensor buffer into sparse tensor storage.

Examples:

%0 = sparse_tensor.convert %a : tensor<32x32xf32> to tensor<32x32xf32, #CSR>
%1 = sparse_tensor.convert %a : tensor<32x32xf32> to tensor<?x?xf32, #CSR>
%2 = sparse_tensor.convert %b : tensor<8x8xi32, #CSC> to tensor<8x8xi32, #CSR>
%3 = sparse_tensor.convert %c : tensor<4x8xf64, #CSR> to tensor<4x?xf64, #CSC>

// The following conversion is not allowed (since it would require a
// runtime assertion that the source's dimension size is actually 100).
%4 = sparse_tensor.convert %d : tensor<?xf64> to tensor<100xf64, #SV>

Traits: AlwaysSpeculatableImplTrait, SameOperandsAndResultElementType

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
sourcetensor of any type values

Results: 

ResultDescription
desttensor of any type values

sparse_tensor.expand (::mlir::sparse_tensor::ExpandOp) 

Expands an access pattern for insertion

Syntax:

operation ::= `sparse_tensor.expand` $tensor attr-dict `:` type($tensor) `to` type($values) `,` type($filled) `,` type($added)

Performs an access pattern expansion for the innermost dimensions of the given tensor. This operation is useful to implement kernels in which a sparse tensor appears as output. This technique is known under several different names and using several alternative implementations, for example, phase counter [Gustavson72], expanded or switch array [Pissanetzky84], in phase scan [Duff90], access pattern expansion [Bik96], and workspaces [Kjolstad19].

The values and filled array have sizes that suffice for a dense innermost dimension (e.g. a full row for matrices). The added array and count are used to store new indices when a false value is encountered in the filled array. All arrays should be allocated before the loop (possibly even shared between loops in a future optimization) so that their dense initialization can be amortized over many iterations. Setting and resetting the dense arrays in the loop nest itself is kept sparse by only iterating over set elements through an indirection using the added array, so that the operations are kept proportional to the number of nonzeros.

Note that this operation is “impure” in the sense that even though the results are modeled through SSA values, the operation relies on a proper side-effecting context that sets and resets the expanded arrays.

Example:

%values, %filled, %added, %count = sparse_tensor.expand %tensor
  : tensor<4x4xf64, #CSR> to memref<?xf64>, memref<?xi1>, memref<?xindex>

Operands: 

OperandDescription
tensorsparse tensor of any type values

Results: 

ResultDescription
valuesstrided memref of any type values of rank 1
filled1D memref of 1-bit signless integer values
added1D memref of index values
countindex

sparse_tensor.foreach (::mlir::sparse_tensor::ForeachOp) 

Iterates over elements in a tensor

Syntax:

operation ::= `sparse_tensor.foreach` `in` $tensor (`init``(`$initArgs^`)`)? attr-dict    `:` type($tensor) (`,` type($initArgs)^)?  (`->` type($results)^)?  `do` $region

Iterates over stored elements in a tensor (which are typically, but not always, non-zero for sparse tensors) and executes the block.

For an input tensor with rank n, the block must take n + 1 (and additional loop carried variables as described below) arguments. The first n arguments must be Index type, together indicating the current coordinates of the element being visited. The last argument must have the same type as the tensor’s element type, representing the actual value loaded from the input tensor at the given coordinates.

sparse_tensor.foreach can also operate on loop-carried variables and returns the final values after loop termination. The initial values of the variables are passed as additional SSA operands to the “sparse_tensor.foreach” following the n + 1 SSA values mentioned above (n coordinate and 1 value).

The region must terminate with a “sparse_tensor.yield” that passes the current values of all loop-carried variables to the next iteration, or to the result, if at the last iteration. The number and static types of loop-carried variables may not change with iterations.

For example:

%c0 = arith.constant 0 : i32
%ret = sparse_tensor.foreach in %0 init(%c0): tensor<?x?xi32, #DCSR>, i32 -> i32 do {
 ^bb0(%arg1: index, %arg2: index, %arg3: i32, %iter: i32):
   %sum = arith.add %iter, %arg3
   sparse_tensor.yield %sum
}

It is important to note that foreach generated loop iterates over the stored elements in the storage order. However, no matter what storage order is used, the indices passed to the block always obey the original dimension order.

For example:

#COL_MAJOR = #sparse_tensor.encoding<{
  dimLevelType = [ "compressed", "compressed" ],
  dimOrdering = affine_map<(i,j) -> (j,i)>
}>

// foreach on a column-major sparse tensor
sparse_tensor.foreach in %0 : tensor<2x3xf64, #COL_MAJOR> do {
 ^bb0(%row: index, %col: index, %arg3: f64):
    // [%row, %col] -> [0, 0], [1, 0], [2, 0], [0, 1], [1, 1], [2, 1]
}

#ROW_MAJOR = #sparse_tensor.encoding<{
  dimLevelType = [ "compressed", "compressed" ],
}>

// foreach on a row-major sparse tensor
sparse_tensor.foreach in %0 : tensor<2x3xf64, #ROW_MAJOR> do {
 ^bb0(%row: index, %col: index, %arg3: f64):
    // [%row, %col] -> [0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1]
}

Traits: SingleBlockImplicitTerminator

Operands: 

OperandDescription
tensortensor of any type values
initArgsany type

Results: 

ResultDescription
resultsany type

sparse_tensor.insert (::mlir::sparse_tensor::InsertOp) 

Inserts a value into given sparse tensor

Syntax:

operation ::= `sparse_tensor.insert` $value `into` $tensor `[` $indices `]` attr-dict `:` type($tensor)

Inserts the given value at given indices into the underlying sparse storage format of the given tensor with the given indices. The arity of indices must match the rank of the tensor. This operation can only be applied when a tensor materializes unintialized with a bufferization.alloc_tensor operation and the final tensor is constructed with a load operation that has the hasInserts attribute set.

Properties in the sparse tensor type fully describe what kind of insertion order is allowed. When all dimensions have “unique” and “ordered” properties, for example, insertions should occur in strict lexicographical index order. Other properties define different insertion regimens. Inserting in a way contrary to these properties results in undefined behavior.

Note that this operation is “impure” in the sense that even though the result is modeled through an SSA value, the insertion is eventually done “in place”, and referencing the old SSA value is undefined behavior. This operation is scheduled to be unified with the dense counterpart tensor.insert that has pure SSA semantics.

Example:

%result = sparse_tensor.insert %val into %tensor[%i,%j] : tensor<1024x1024xf64, #CSR>

Interfaces: InferTypeOpInterface

Operands: 

OperandDescription
valueany type
tensorsparse tensor of any type values
indicesindex

Results: 

ResultDescription
resultsparse tensor of any type values

sparse_tensor.load (::mlir::sparse_tensor::LoadOp) 

Rematerializes tensor from underlying sparse storage format

Syntax:

operation ::= `sparse_tensor.load` $tensor (`hasInserts` $hasInserts^)? attr-dict `:` type($tensor)

Rematerializes a tensor from the underlying sparse storage format of the given tensor. This is similar to the bufferization.to_tensor operation in the sense that it provides a bridge between a bufferized world view and a tensor world view. Unlike the bufferization.to_tensor operation, however, this sparse operation is used only temporarily to maintain a correctly typed intermediate representation during progressive bufferization.

The hasInserts attribute denote whether insertions to the underlying sparse storage format may have occurred, in which case the underlying sparse storage format needs to be finalized. Otherwise, the operation simply folds away.

Note that this operation is “impure” in the sense that even though the result is modeled through an SSA value, the operation relies on a proper context of materializing and inserting the tensor value.

Examples:

%result = sparse_tensor.load %tensor : tensor<8xf64, #SV>

%1 = sparse_tensor.load %0 hasInserts : tensor<16x32xf32, #CSR>

Traits: SameOperandsAndResultType

Interfaces: InferTypeOpInterface

Attributes: 

AttributeMLIR TypeDescription
hasInserts::mlir::UnitAttrunit attribute

Operands: 

OperandDescription
tensorsparse tensor of any type values

Results: 

ResultDescription
resulttensor of any type values

sparse_tensor.new (::mlir::sparse_tensor::NewOp) 

Materializes a new sparse tensor from given source

Syntax:

operation ::= `sparse_tensor.new` (`expand_symmetry` $expandSymmetry^)? $source attr-dict`:` type($source) `to` type($result)

Materializes a sparse tensor with contents taken from an opaque pointer provided by source. For targets that have access to a file system, for example, this pointer may be a filename (or file) of a sparse tensor in a particular external storage format. The form of the operation is kept deliberately very general to allow for alternative implementations in the future, such as pointers to buffers or runnable initialization code. The operation is provided as an anchor that materializes a properly typed sparse tensor with inital contents into a computation.

An optional attribute expandSymmetry can be used to extend this operation to make symmetry in external formats explicit in the storage. That is, when the attribute presents and a non-zero value is discovered at (i, j) where i!=j, we add the same value to (j, i). This claims more storage than a pure symmetric storage, and thus may cause a bad performance hit. True symmetric storage is planned for the future.

Example:

sparse_tensor.new %source : !Source to tensor<1024x1024xf64, #CSR>

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
expandSymmetry::mlir::UnitAttrunit attribute

Operands: 

OperandDescription
sourceany type

Results: 

ResultDescription
resultsparse tensor of any type values

sparse_tensor.number_of_entries (::mlir::sparse_tensor::NumberOfEntriesOp) 

Returns the number of entries that are stored in the tensor.

Syntax:

operation ::= `sparse_tensor.number_of_entries` $tensor attr-dict `:` type($tensor)

Returns the number of entries that are stored in the given sparse tensor. Note that this is typically the number of nonzero elements in the tensor, but since explicit zeros may appear in the storage formats, the more accurate nomenclature is used.

Example:

%noe = sparse_tensor.number_of_entries %tensor : tensor<64x64xf64, #CSR>

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
tensorsparse tensor of any type values

Results: 

ResultDescription
resultindex

sparse_tensor.out (::mlir::sparse_tensor::OutOp) 

Outputs a sparse tensor to the given destination

Syntax:

operation ::= `sparse_tensor.out` $tensor `,` $dest attr-dict `:` type($tensor) `,` type($dest)

Outputs the contents of a sparse tensor to the destination defined by an opaque pointer provided by dest. For targets that have access to a file system, for example, this pointer may specify a filename (or file) for output. The form of the operation is kept deliberately very general to allow for alternative implementations in the future, such as sending the contents to a buffer defined by a pointer.

Note that this operation is “impure” in the sense that its behavior is solely defined by side-effects and not SSA values.

Example:

sparse_tensor.out %t, %dest : tensor<1024x1024xf64, #CSR>, !Dest

Operands: 

OperandDescription
tensorsparse tensor of any type values
destany type

sparse_tensor.push_back (::mlir::sparse_tensor::PushBackOp) 

Pushes a value to the back of a given buffer

Syntax:

operation ::= `sparse_tensor.push_back` (`inbounds` $inbounds^)? $bufferSizes `,` $inBuffer `,` $value (`,` $n^ )?  attr-dict `:` type($bufferSizes) `,` type($inBuffer) `,` type($value)  (`,` type($n)^ )?

Push value to the end of the given sparse tensor storage buffer inBuffer and update the size of the buffer in bufferSizes[idx]. The capacity of the buffer is recorded in the memref type of inBuffer . If the current buffer is full, then inBuffer.realloc is called before pushing the data to the buffer. This is similar to std::vector push_back.

The optional input n specifies the number of times to repeately push the value to the back of the tensor. When n is a compile-time constant, its value can’t be less than 1. If n is a runtime value that is less than 1, the behavior is undefined. Although using input n is semantically equivalent to calling push_back n times, it gives compiler more chances to to optimize the memory reallocation and the filling of the memory with the same value.

The inbounds attribute tells the compiler that the insertion won’t go beyond the current storage buffer. This allows the compiler to not generate the code for capacity check and reallocation. The typical usage will be for “dynamic” sparse tensors for which a capacity can be set beforehand.

Note that this operation is “impure” in the sense that even though the result is modeled through an SSA value, referencing the memref through the old SSA value after this operation is undefined behavior.

Example:

%r = sparse_tensor.push_back %bufferSizes, %buffer, %val
  {idx = 0 : index} : memref<?xindex>, memref<?xf64>, f64
%r = sparse_tensor.push_back inbounds %bufferSizes, %buffer, %val
  {idx = 0 : index} : memref<?xindex>, memref<?xf64>, f64
%r = sparse_tensor.push_back inbounds %bufferSizes, %buffer, %val, %n
  {idx = 0 : index} : memref<?xindex>, memref<?xf64>, f64

Interfaces: InferTypeOpInterface

Attributes: 

AttributeMLIR TypeDescription
idx::mlir::IntegerAttrindex attribute
inbounds::mlir::UnitAttrunit attribute

Operands: 

OperandDescription
bufferSizes1D memref of index values
inBuffer1D memref of any type values
valueany type
nindex

Results: 

ResultDescription
outBuffer1D memref of any type values

sparse_tensor.reduce (::mlir::sparse_tensor::ReduceOp) 

Custom reduction operation utilized within linalg.generic

Syntax:

operation ::= `sparse_tensor.reduce` $x `,` $y `,` $identity attr-dict `:` type($output) $region

Defines a computation with a linalg.generic operation that takes two operands and an identity value and reduces all values down to a single result based on the computation in the region.

The region must contain exactly one block taking two arguments. The block must end with a sparse_tensor.yield and the output must match the input argument types.

Note that this operation is only required for custom reductions beyond the standard operations (add, mul, and, or, etc). The linalg.generic iterator_types defines which indices are being reduced. When the associated operands are used in an operation, a reduction will occur. The use of this explicit reduce operation is not required in most cases.

Example of Matrix->Vector reduction using max(product(x_i), 100):

%cf1 = arith.constant 1.0 : f64
%cf100 = arith.constant 100.0 : f64
%C = bufferization.alloc_tensor...
%0 = linalg.generic #trait
   ins(%A: tensor<?x?xf64, #SparseMatrix>)
  outs(%C: tensor<?xf64, #SparseVector>) {
  ^bb0(%a: f64, %c: f64) :
    %result = sparse_tensor.reduce %c, %a, %cf1 : f64 {
        ^bb0(%arg0: f64, %arg1: f64):
          %0 = arith.mulf %arg0, %arg1 : f64
          %cmp = arith.cmpf "ogt", %0, %cf100 : f64
          %ret = arith.select %cmp, %cf100, %0 : f64
          sparse_tensor.yield %ret : f64
      }
    linalg.yield %result : f64
} -> tensor<?xf64, #SparseVector>

Traits: AlwaysSpeculatableImplTrait, SameOperandsAndResultType

Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
xany type
yany type
identityany type

Results: 

ResultDescription
outputany type

sparse_tensor.select (::mlir::sparse_tensor::SelectOp) 

Select operation utilized within linalg.generic

Syntax:

operation ::= `sparse_tensor.select` $x attr-dict `:` type($x) $region

Defines an evaluation within a linalg.generic operation that takes a single operand and decides whether or not to keep that operand in the output.

A single region must contain exactly one block taking one argument. The block must end with a sparse_tensor.yield and the output type must be boolean.

Value threshold is an obvious usage of the select operation. However, by using linalg.index, other useful selection can be achieved, such as selecting the upper triangle of a matrix.

Example of selecting A >= 4.0:

%C = bufferization.alloc_tensor...
%0 = linalg.generic #trait
   ins(%A: tensor<?xf64, #SparseVector>)
  outs(%C: tensor<?xf64, #SparseVector>) {
  ^bb0(%a: f64, %c: f64) :
    %result = sparse_tensor.select %a : f64 {
        ^bb0(%arg0: f64):
          %cf4 = arith.constant 4.0 : f64
          %keep = arith.cmpf "uge", %arg0, %cf4 : f64
          sparse_tensor.yield %keep : i1
      }
    linalg.yield %result : f64
} -> tensor<?xf64, #SparseVector>

Example of selecting lower triangle of a matrix:

%C = bufferization.alloc_tensor...
%0 = linalg.generic #trait
   ins(%A: tensor<?x?xf64, #CSR>)
  outs(%C: tensor<?x?xf64, #CSR>) {
  ^bb0(%a: f64, %c: f64) :
    %row = linalg.index 0 : index
    %col = linalg.index 1 : index
    %result = sparse_tensor.select %a : f64 {
        ^bb0(%arg0: f64):
          %keep = arith.cmpf "olt", %col, %row : f64
          sparse_tensor.yield %keep : i1
      }
    linalg.yield %result : f64
} -> tensor<?x?xf64, #CSR>

Traits: AlwaysSpeculatableImplTrait, SameOperandsAndResultType

Interfaces: ConditionallySpeculatable, InferTypeOpInterface, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
xany type

Results: 

ResultDescription
outputany type

sparse_tensor.sort_coo (::mlir::sparse_tensor::SortCooOp) 

Sorts the arrays in xs and ys lexicographically on the integral values found in the xs list

Syntax:

operation ::= `sparse_tensor.sort_coo` (`stable` $stable^)? $n`,`$xy (`jointly` $ys^)? attr-dict`:` type($xy) (`jointly` type($ys)^)?

Sparse_tensor.sort_coo is similar to sparse_tensor.sort, except that all the xs values and some ys values are put in the linear buffer xy. The optional index attribute nx provides the number of xs values in xy. When nx is not explicitly specified, its value is 1. The optional index attribute ny provides the number of ys values in xy. When ny is not explicitly specified, its value is 0. This instruction supports a more efficient way to store the COO definition in sparse tensor type.

The buffer xy should have a dimension not less than n * (nx + ny) while the buffers in ys should have a dimension not less than n. The behavior of the operator is undefined if this condition is not met.

Example:

sparse_tensor.sort_coo %n, %x { nx = 2 : index}
  : memref<?xindex>
sparse_tensor.sort %n, %xy jointly %y1 { nx = 2 : index, ny = 2 : index}
  : memref<?xi64> jointly memref<?xf32>

Attributes: 

AttributeMLIR TypeDescription
nx::mlir::IntegerAttrindex attribute
ny::mlir::IntegerAttrindex attribute
stable::mlir::UnitAttrunit attribute

Operands: 

OperandDescription
nindex
xy1D memref of integer or index values
ys1D memref of any type values

sparse_tensor.sort (::mlir::sparse_tensor::SortOp) 

Sorts the arrays in xs and ys lexicographically on the integral values found in the xs list

Syntax:

operation ::= `sparse_tensor.sort` (`stable` $stable^)? $n`,`$xs (`jointly` $ys^)? attr-dict`:` type($xs) (`jointly` type($ys)^)?

Lexicographically sort the first n values in xs along with the values in ys. Conceptually, the values being sorted are tuples produced by zip(zip(xs), zip(ys)). In particular, values in ys needed to be sorted along with values in xs, but values in ys don’t affect the lexicographical order. The order in which arrays appear in xs affects the sorting result. The operator updates xs and ys in place with the result of the sorting.

For example, assume x1=[4, 3], x2=[1, 2], y1=[10, 5], then the output of “sort 2, x1, x2 jointly y1” are x1=[3, 4], x2=[2, 1], y1=[5, 10] while the output of “sort 2, x2, x1, jointly y1” are x2=[1, 2], x1=[4, 3], y1=[10, 5].

Buffers in xs needs to have the same integral element type while buffers in ys can have different numeric element types. All buffers in xs and ys should have a dimension not less than n. The behavior of the operator is undefined if this condition is not met. The operator requires at least one buffer in xs while ys can be empty.

The stable attribute indicates whether a stable sorting algorithm should be used to implement the operator.

Note that this operation is “impure” in the sense that its behavior is solely defined by side-effects and not SSA values.

Example:

sparse_tensor.sort %n, %x1, %x2 jointly y1, %y2
  : memref<?xindex>, memref<?xindex> jointly memref<?xindex>, memref<?xf32>
sparse_tensor.sort stable %n, %x1, %x2 jointly y1, %y2
  : memref<?xindex>, memref<?xindex> jointly memref<?xindex>, memref<?xf32>

Traits: AttrSizedOperandSegments

Attributes: 

AttributeMLIR TypeDescription
stable::mlir::UnitAttrunit attribute

Operands: 

OperandDescription
nindex
xs1D memref of integer or index values
ys1D memref of any type values

sparse_tensor.indices (::mlir::sparse_tensor::ToIndicesOp) 

Extracts indices array at given dimension from a tensor

Syntax:

operation ::= `sparse_tensor.indices` $tensor attr-dict `:` type($tensor) `to` type($result)

Returns the indices array of the sparse storage format at the given dimension for the given sparse tensor. This is similar to the bufferization.to_memref operation in the sense that it provides a bridge between a tensor world view and a bufferized world view. Unlike the bufferization.to_memref operation, however, this sparse operation actually lowers into code that extracts the indices array from the sparse storage scheme (either by calling a support library or through direct code).

Example:

%1 = sparse_tensor.indices %0 { dimension = 1 : index }
   : tensor<64x64xf64, #CSR> to memref<?xindex>

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
dimension::mlir::IntegerAttrindex attribute

Operands: 

OperandDescription
tensorsparse tensor of any type values

Results: 

ResultDescription
resultstrided memref of any type values of rank 1

sparse_tensor.pointers (::mlir::sparse_tensor::ToPointersOp) 

Extracts pointers array at given dimension from a tensor

Syntax:

operation ::= `sparse_tensor.pointers` $tensor attr-dict `:` type($tensor) `to` type($result)

Returns the pointers array of the sparse storage format at the given dimension for the given sparse tensor. This is similar to the bufferization.to_memref operation in the sense that it provides a bridge between a tensor world view and a bufferized world view. Unlike the bufferization.to_memref operation, however, this sparse operation actually lowers into code that extracts the pointers array from the sparse storage scheme (either by calling a support library or through direct code).

Example:

%1 = sparse_tensor.pointers %0 { dimension = 1 : index }
   : tensor<64x64xf64, #CSR> to memref<?xindex>

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Attributes: 

AttributeMLIR TypeDescription
dimension::mlir::IntegerAttrindex attribute

Operands: 

OperandDescription
tensorsparse tensor of any type values

Results: 

ResultDescription
resultstrided memref of any type values of rank 1

sparse_tensor.values (::mlir::sparse_tensor::ToValuesOp) 

Extracts numerical values array from a tensor

Syntax:

operation ::= `sparse_tensor.values` $tensor attr-dict `:` type($tensor) `to` type($result)

Returns the values array of the sparse storage format for the given sparse tensor, independent of the actual dimension. This is similar to the bufferization.to_memref operation in the sense that it provides a bridge between a tensor world view and a bufferized world view. Unlike the bufferization.to_memref operation, however, this sparse operation actually lowers into code that extracts the values array from the sparse storage scheme (either by calling a support library or through direct code).

Example:

%1 = sparse_tensor.values %0 : tensor<64x64xf64, #CSR> to memref<?xf64>

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
tensorsparse tensor of any type values

Results: 

ResultDescription
resultstrided memref of any type values of rank 1

sparse_tensor.unary (::mlir::sparse_tensor::UnaryOp) 

Unary set operation utilized within linalg.generic

Syntax:

operation ::= `sparse_tensor.unary` $x attr-dict `:` type($x) `to` type($output) `\n`
              `present` `=` $presentRegion `\n`
              `absent` `=` $absentRegion

Defines a computation with a linalg.generic operation that takes a single operand and executes one of two regions depending on whether the operand is nonzero (i.e. stored explicitly in the sparse storage format).

Two regions are defined for the operation must appear in this order:

  • present (elements present in the sparse tensor)
  • absent (elements not present in the sparse tensor)

Each region contains a single block describing the computation and result. A non-empty block must end with a sparse_tensor.yield and the return type must match the type of output. The primary region’s block has one argument, while the missing region’s block has zero arguments.

A region may also be declared empty (i.e. absent={}), indicating that the region does not contribute to the output.

Example of A+1, restricted to existing elements:

%C = bufferization.alloc_tensor...
%0 = linalg.generic #trait
   ins(%A: tensor<?xf64, #SparseVector>)
  outs(%C: tensor<?xf64, #SparseVector>) {
  ^bb0(%a: f64, %c: f64) :
    %result = sparse_tensor.unary %a : f64 to f64
      present={
        ^bb0(%arg0: f64):
          %cf1 = arith.constant 1.0 : f64
          %ret = arith.addf %arg0, %cf1 : f64
          sparse_tensor.yield %ret : f64
      }
      absent={}
    linalg.yield %result : f64
} -> tensor<?xf64, #SparseVector>

Example returning +1 for existing values and -1 for missing values:

%result = sparse_tensor.unary %a : f64 to i32
  present={
    ^bb0(%x: f64):
      %ret = arith.constant 1 : i32
      sparse_tensor.yield %ret : i32
  }
  absent={
    %ret = arith.constant -1 : i32
    sparse_tensor.yield %ret : i32
  }

Example showing a structural inversion (existing values become missing in the output, while missing values are filled with 1):

%result = sparse_tensor.unary %a : f64 to i64
  present={}
  absent={
    %ret = arith.constant 1 : i64
    sparse_tensor.yield %ret : i64
  }

Traits: AlwaysSpeculatableImplTrait

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
xany type

Results: 

ResultDescription
outputany type

sparse_tensor.yield (::mlir::sparse_tensor::YieldOp) 

Yield from sparse_tensor set-like operations

Syntax:

operation ::= `sparse_tensor.yield` $result attr-dict `:` type($result)

Yields a value from within a binary, unary, reduce, select or foreach block.

Example:

%0 = sparse_tensor.unary %a : i64 to i64 {
  ^bb0(%arg0: i64):
    %cst = arith.constant 1 : i64
    %ret = arith.addi %arg0, %cst : i64
    sparse_tensor.yield %ret : i64
}

Traits: AlwaysSpeculatableImplTrait, Terminator

Interfaces: ConditionallySpeculatable, NoMemoryEffect (MemoryEffectOpInterface)

Effects: MemoryEffects::Effect{}

Operands: 

OperandDescription
resultany type

Attribute definition 

SparseTensorEncodingAttr 

An attribute to encode TACO-style information on sparsity properties of tensors. The encoding is eventually used by a sparse compiler pass to generate sparse code fully automatically for all tensor expressions that involve tensors with a sparse encoding. Compiler passes that run before this sparse compiler pass need to be aware of the semantics of tensor types with such an encoding.

The attribute consists of the following fields.

  • Dimension level type for each dimension of a tensor type:

    • dense : dimension is dense, all entries along this dimension are stored
    • compressed : dimension is sparse, only nonzeros along this dimensions are stored
    • singleton : dimension stores individual indices with no siblings By default, each dimension level types has the property of being unique (no duplicates at that level) and ordered (indices appear sorted at that level). The following two suffixes can be used to make the last two dimension level types not-unique (duplicates may appear) and not-ordered (indices may appear unsorted).
    • -nu : not unique
    • -no : not ordered Currently, these suffixes, is present, should appear in this order. In the future, we may introduce many more dimension level types and properties, and separate specifying the two completely rather than using this suffix mechanism.
  • An optional dimension ordering on the indices of this tensor type. Unlike dense storage, most sparse storage schemes do not provide fast random access. This affine map specifies the order of dimensions that should be supported by the sparse storage scheme. For example, for a 2-d tensor, (i, j) -> (i, j) requests row-wise storage and (i, j) -> (j, i) requests column-wise storage. By default, an identify mapping is used, which implies that the original indices directly correspond to stored indices.

  • An optional higher-ordering mapping from the original index space of the tensor to a higher-order index space, used to define block-sparse storage or ELL (jagged diagonal) storage. For example, for a 2-d tensor, the mapping (i, j) -> (i floordiv 2, j floordiv 3, i mod 2, j mod 3) imposes an higher-order partitioning into 2x3 blocks along the matrix layout. A dimension ordering can be used to define a desired ordering on this higher-order index space. Likewise, the dimension level types define dense or compressed storage along this higher-order index space. For block-sparse, blocks are typically stored with compression while dense storage is used within each block (although hybrid schemes are possible as well). The higher-order mapping also provides a notion of “counting a dimension”, where every stored element with the same index is mapped to a new slice. For instance, ELL storage of a 2-d tensor can be defined with the mapping (i, j) -> (#i, i, j) using the notation of [Chou20]. Lacking the # symbol in MLIR’s affine mapping, we use a free symbol c to define such counting, together with a constant that denotes the number of resulting slices. For example, the mapping (i, j)[c] -> (c * 3 * i, i, j) with the first two higher-order indices stored dense and the innermost compressed denotes ELL storage with three jagged diagonals that count the dimension i.

    TODO: introduce a real counting symbol to MLIR’s mapping, since an expression like 3ci has no direct interpretation?

  • The required bit width for “pointer” storage (integral offsets into the sparse storage scheme). A narrow width reduces the memory footprint of overhead storage, as long as the width suffices to define the total required range (viz. the maximum number of stored entries over all indirection dimensions). The choices are 8, 16, 32, 64, or, the default, 0 to indicate the native bit width.

  • The required bit width for “index” storage (elements of the coordinates of stored entries). A narrow width reduces the memory footprint of overhead storage, as long as the width suffices to define the total required range (viz. the maximum value of each tensor index over all dimensions). The choices are 8, 16, 32, 64, or, the default, 0 to indicate a native bit width.

Examples:

// Sparse vector.
#SparseVector = #sparse_tensor.encoding<{
  dimLevelType = [ "compressed" ]
}>
... tensor<?xf32, #SparseVector> ...

// Sorted Coordinate Scheme.
#SortedCOO = #sparse_tensor.encoding<{
  dimLevelType = [ "compressed-nu", "singleton" ]
}>
... tensor<?x?xf64, #SortedCOO> ...

// Doubly compressed sparse column storage with specific bitwidths.
#DCSC = #sparse_tensor.encoding<{
  dimLevelType = [ "compressed", "compressed" ],
  dimOrdering = affine_map<(i, j) -> (j, i)>,
  pointerBitWidth = 32,
  indexBitWidth = 8
}>
... tensor<8x8xf64, #DCSC> ...

// Block sparse row storage (2x3 blocks).
#BCSR = #sparse_tensor.encoding<{
  dimLevelType = [ "compressed", "compressed", "dense", "dense" ],
  dimOrdering  = affine_map<(ii, jj, i, j) -> (ii, jj, i, j)>,
  higherOrdering = affine_map<(i, j) -> (i floordiv 2, j floordiv 3, i mod 2, j mod 3)>
}>
... tensor<20x30xf32, #BCSR> ...

// ELL storage (4 jagged diagonals, i.e., at most 4 nonzeros per row).
#ELL = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "dense", "compressed" ],
  dimOrdering  = affine_map<(ii, i, j) -> (ii, i, j)>,
  higherOrdering = affine_map<(i, j)[c] -> (c * 4 * i, i, j)>
}>
... tensor<?x?xf64, #ELL> ...

Parameters: 

ParameterC++ typeDescription
dimLevelType::llvm::ArrayRef<::mlir::sparse_tensor::DimLevelType>per dimension level type
dimOrderingAffineMap
higherOrderingAffineMap
pointerBitWidthunsigned
indexBitWidthunsigned