MLIR

Multi-Level IR Compiler Framework

Users of MLIR

In alphabetical order below.

Accera

Accera is a compiler that enables you to experiment with loop optimizations without hand-writing assembly code, addressing the difficulty and tedium of doing so by hand. It is available as a Python library and supports cross-compiling to a wide range of processor targets.

Beaver

Beaver is an MLIR frontend in Elixir and Zig. Powered by Elixir’s composable modularity and meta-programming features, Beaver provides a simple, intuitive, and extensible interface for MLIR.

Bᴛᴏʀ2ᴍʟɪʀ: A Format and Toolchain for Hardware Verification

Bᴛᴏʀ2ᴍʟɪʀ applies MLIR to the domain of hardware verification by offering a clean way to take advantage of a format’s strengths. For example, we support the use of software verification methods for hardware verification problems represented in the Bᴛᴏʀ2 format. The project aims to spur and support research in the formal verification domain, and has been shown to be competitive with existing methods.

Catalyst

Catalyst is an AOT/JIT compiler for PennyLane that accelerates hybrid quantum programs, with:

  • full auto-differentiation support, via custom quantum gradients and Enzyme-based backpropagation,
  • a dynamic quantum programming model,
  • and integration into the Python ML ecosystem.

Catalyst also comes with the Lightning high-performance simulator by default, but supports an extensible backend system that is constantly evolving, aiming to deliver execution on heterogeneous architectures with GPUs and QPUs.
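
As a brief illustration, a hybrid program is compiled by applying Catalyst’s qjit decorator to a PennyLane QNode. The sketch below follows the pattern from the Catalyst documentation; the device name and gate set are common defaults and may vary across versions.

    import pennylane as qml
    from catalyst import qjit

    # Lightning is Catalyst's default high-performance simulator backend.
    dev = qml.device("lightning.qubit", wires=2)

    @qjit                 # JIT-compile the hybrid program through MLIR
    @qml.qnode(dev)
    def circuit(theta: float):
        qml.Hadamard(wires=0)
        qml.RX(theta, wires=1)
        qml.CNOT(wires=[0, 1])
        return qml.expval(qml.PauliZ(wires=1))

    print(circuit(0.7))   # first call compiles; later calls reuse the binary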

CIRCT: Circuit IR Compilers and Tools

The CIRCT project is an (experimental!) effort looking to apply MLIR and the LLVM development methodology to the domain of hardware design tools.

DSP-MLIR: A Framework for Digital Signal Processing Applications in MLIR

DSP-MLIR is a framework designed specifically for DSP applications. It provides a DSL (Frontend), compiler, and rewrite patterns that detect DSP patterns and apply optimizations based on DSP theorems. The framework supports a wide range of DSP operations, including filters (FIR, IIR, filter response), transforms (DCT, FFT, IFFT), and other signal processing operations such as delay and gain, along with additional functionalities for application development.

Enzyme: General Automatic Differentiation of MLIR

Enzyme (specifically EnzymeMLIR) is a first-class automatic differentiation system for MLIR. Operations and types implement or inherit general interfaces to specify their differentiable behavior, which allows Enzyme to provide efficient forward- and reverse-pass derivatives. Source code is available here. See also the Enzyme-JAX project, which uses Enzyme to differentiate StableHLO and thus provide MLIR-native differentiation and codegen for JAX.

Firefly: A new compiler and runtime for BEAM languages

Firefly is not only a compiler, but a runtime as well. It consists of two parts:

  • A compiler for Erlang to native code for a given target (x86, ARM, WebAssembly)
  • An Erlang runtime, implemented in Rust, which provides the core functionality needed to implement OTP

The primary motivator for Firefly’s development was the ability to compile Elixir applications to WebAssembly, enabling the use of Elixir as a language for frontend development. Firefly can also target other platforms, producing self-contained executables on platforms such as x86.

Flang

Flang is a ground-up implementation of a Fortran front end written in modern C++. It started off as the f18 project, with the aim of replacing the previous flang project and addressing its various deficiencies. F18 was subsequently accepted into the LLVM project and rechristened Flang. The high-level IR of the Fortran compiler is modeled using MLIR.

IREE

IREE (pronounced “eerie”) is a compiler and minimal runtime system for compiling ML models for execution against a HAL (Hardware Abstraction Layer) that is aligned with Vulkan. It aims to be a viable way to compile and run ML models on a variety of small and medium-sized systems, leveraging the GPU (via Vulkan/SPIR-V), the CPU, or some combination of the two. It also aims to interoperate seamlessly with existing users of Vulkan APIs, with a specific focus on games and rendering pipelines.
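
As a minimal sketch of the compile step using IREE’s Python bindings (the package layout and the "llvm-cpu" backend name are assumptions that may differ between releases):

    from iree import compiler as ireec

    # A tiny MLIR module to compile; IREE also ingests models imported
    # from ML frameworks.
    MLIR_SOURCE = """
    func.func @simple_mul(%a: tensor<4xf32>, %b: tensor<4xf32>) -> tensor<4xf32> {
      %0 = arith.mulf %a, %b : tensor<4xf32>
      return %0 : tensor<4xf32>
    }
    """

    # Compile to an IREE VM FlatBuffer targeting the CPU backend; swapping
    # the backend (e.g. a Vulkan/SPIR-V target) retargets the same module.
    vmfb = ireec.compile_str(MLIR_SOURCE, target_backends=["llvm-cpu"])
    with open("simple_mul.vmfb", "wb") as f:
        f.write(vmfb)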

Kokkos

The Kokkos C++ Performance Portability Ecosystem is a production-level solution for writing modern C++ applications in a hardware-agnostic way. It is part of the US Department of Energy’s Exascale Computing Project, the leading effort in the US to prepare the HPC community for the next generation of supercomputing platforms. The ecosystem consists of multiple libraries addressing the primary concerns for developing and maintaining applications in a portable way. The three main components are the Kokkos Core Programming Model, the Kokkos Kernels Math Libraries, and the Kokkos Profiling and Debugging Tools.

Work is ongoing to convert MLIR to portable Kokkos-based source code, to add a partition dialect to MLIR to support tiled and distributed sparse tensors, and to target spatial dataflow accelerators.

LingoDB: Revolutionizing Data Processing with Compiler Technology

LingoDB is a cutting-edge data processing system that leverages compiler technology to achieve unprecedented flexibility and extensibility without sacrificing performance. It supports a wide range of data-processing workflows beyond relational SQL queries, thanks to declarative sub-operators. Furthermore, LingoDB can perform cross-domain optimization by interleaving optimization passes of different domains and its flexibility enables sustainable support for heterogeneous hardware.

LingoDB builds heavily on the MLIR compiler framework to compile queries to efficient machine code with low compilation latency.

MARCO: Modelica Advanced Research COmpiler

MARCO is a prototype compiler for the Modelica language, with a focus on the efficient compilation and simulation of large-scale models. The Modelica source code is processed by external tools to obtain a modeling-language-independent representation in Base Modelica, for which an MLIR dialect has been designed.

The project is complemented by multiple runtime libraries, written in C++, that drive the generated simulation, provide support functions, and ease interfacing with external differential-equation solvers.

MLIR-AIE: Toolchain for AMD/Xilinx AIEngine devices

MLIR-AIE is a toolchain providing low-level device configuration for Versal AIEngine-based devices. Support is provided for targeting the AIEngine portion of the device, including processors, stream switches, TileDMA, and ShimDMA blocks. Backend code generation is included, targeting the LibXAIE library, along with some higher-level abstractions enabling higher-level design.

MLIR-DaCe: Data-Centric MLIR Dialect

MLIR-DaCe is a project aiming to bridge the gap between control-centric and data-centric intermediate representations. By bridging these two groups of IRs, it allows the combination of control-centric and data-centric optimizations in optimization pipelines. In order to achieve this, MLIR-DaCe provides a data-centric dialect in MLIR to connect the MLIR and DaCe frameworks.

MLIR-EmitC

MLIR-EmitC provides a way to translate ML models into C++ code. The repository contains scripts and tools to translate Keras and TensorFlow models into the TOSA and StableHLO dialects and to convert those to EmitC. The latter is used to generate calls to a reference implementation.

The EmitC dialect itself, as well as the C++ emitter, are part of MLIR core and are no longer provided as part of the MLIR-EmitC repository.

Mojo

Mojo is a new programming language that bridges the gap between research and production by combining the best of Python syntax with systems programming and metaprogramming, all leveraging the MLIR ecosystem. It aims to be a strict superset of Python (i.e., compatible with existing programs) and to embrace CPython immediately for long-tail ecosystem enablement.

Nod Distributed Runtime: Asynchronous fine-grained op-level parallel runtime

Nod’s MLIR-based Parallel Compiler and Distributed Runtime provide a way to easily scale out training and inference of very large models across multiple heterogeneous devices (CPUs/GPUs/accelerators/FPGAs) in a cluster while exploiting fine-grained op-level parallelism.

ONNX-MLIR

To represent neural network models, users often use the Open Neural Network Exchange (ONNX), an open standard format for machine-learning interoperability. ONNX-MLIR is an MLIR-based compiler for rewriting a model in ONNX into a standalone binary that is executable on different target hardware, such as x86 machines, IBM Power Systems, and IBM System Z.

See also this paper: Compiling ONNX Neural Network Models Using MLIR.
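
To give a flavor of the workflow: a model is first exported to ONNX (the export below uses the standard torch.onnx API), then compiled to a shared library with the onnx-mlir driver. The --EmitLib flag follows the project’s documentation but may differ by version.

    import torch

    # Export an arbitrary PyTorch model to the ONNX interchange format.
    model = torch.nn.Linear(4, 2)
    torch.onnx.export(model, torch.randn(1, 4), "linear.onnx")

    # The ONNX file can then be compiled into a standalone shared library:
    #   onnx-mlir --EmitLib linear.onnx
    # (flag name per the onnx-mlir documentation; may vary by version)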

OpenXLA

OpenXLA is a community-driven, open-source ML compiler ecosystem, using the best of XLA and MLIR.

PlaidML

PlaidML is a tensor compiler that facilitates reusable and performance-portable ML models across various hardware targets, including CPUs, GPUs, and accelerators.

PolyBlocks: An MLIR-based JIT and AOT compiler

PolyBlocks is a high-performance MLIR-based end-to-end compiler for DL and non-DL computations. It can perform both JIT and AOT compilation. Its compiler engine is aimed at being fully automatic, modular, analytical model-driven, and fully code generating (no reliance on vendor/HPC libraries).

Polygeist: C/C++ frontend and optimizations for MLIR

Polygeist is a C/C++ frontend for MLIR which preserves high-level structure from programs such as parallelism. Polygeist also includes high-level optimizations for MLIR, as well as various raising/lowering utilities.

See both the polyhedral Polygeist paper, “Polygeist: Raising C to Polyhedral MLIR”, and the GPU Polygeist paper, “High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs”.

Pylir

Pylir aims to be an optimizing ahead-of-time Python compiler with high language conformance. It uses MLIR dialects for high-level, language-specific optimizations, as well as LLVM for code generation and garbage-collector support.

RISE

RISE is a spiritual successor to the Lift project: “a high-level functional data parallel language with a system of rewrite rules which encode algorithmic and hardware-specific optimisation choices”.

SOPHGO TPU-MLIR

TPU-MLIR is an open-source machine-learning compiler, based on MLIR, for the SOPHGO TPU. For details, see the paper at https://arxiv.org/abs/2210.15016.

TensorFlow

TensorFlow uses MLIR as a graph-transformation framework and as the foundation for building many tools (XLA, the TFLite converter, quantization, …).
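
For instance, the TF-dialect MLIR for a traced function can be inspected through an experimental API (a sketch; the entry point is experimental and may change between releases):

    import tensorflow as tf

    @tf.function
    def square(x):
        return x * x

    # Obtain the MLIR (TF dialect) for the traced function; this entry
    # point is experimental and may move or change signature.
    concrete = square.get_concrete_function(tf.TensorSpec([], tf.float32))
    print(tf.mlir.experimental.convert_function(concrete))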

TFRT: TensorFlow Runtime

TFRT aims to provide a unified, extensible infrastructure layer for an asynchronous runtime system.

Torch-MLIR

The Torch-MLIR project aims to provide first-class compiler support from the PyTorch ecosystem to the MLIR ecosystem.
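
A rough sketch of one import path is below; the torch_mlir.compile entry point and the "linalg-on-tensors" output type follow older Torch-MLIR examples and should be treated as assumptions, since the API has evolved across releases.

    import torch
    import torch_mlir  # API surface has changed across releases

    class Square(torch.nn.Module):
        def forward(self, x):
            return x * x

    # Lower the PyTorch module into MLIR's linalg-on-tensors form; the
    # entry point and output_type string follow older Torch-MLIR examples.
    module = torch_mlir.compile(Square(), torch.ones(3),
                                output_type="linalg-on-tensors")
    print(module)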

Triton

Triton is a language and compiler for writing highly efficient custom deep-learning primitives. The aim of Triton is to provide an open-source environment for writing fast code with higher productivity than CUDA, but also with greater flexibility than other existing DSLs.
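
The canonical first kernel is an element-wise vector addition; the sketch below follows the pattern of Triton’s introductory tutorial.

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        # Each program instance handles one BLOCK_SIZE-wide slice.
        pid = tl.program_id(axis=0)
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements          # guard the ragged tail block
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    x = torch.rand(4096, device="cuda")
    y = torch.rand(4096, device="cuda")
    out = torch.empty_like(x)
    grid = (triton.cdiv(x.numel(), 1024),)   # one program per block
    add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)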

VAST: C/C++ frontend for MLIR

VAST is a library for program analysis and instrumentation of C/C++ and related languages. VAST provides a foundation for customizable program representation for a broad spectrum of analyses. Using the MLIR infrastructure, VAST provides a toolset to represent C/C++ programs at various stages of compilation and to transform the representation to the best-fit program abstraction.

Verona

Project Verona is a research programming language exploring the concept of concurrent ownership. It provides a new concurrency model that seamlessly integrates ownership.