MLIR Bytecode Format
This document describes the MLIR bytecode format and its encoding. This format is versioned and stable: we don’t plan to ever break compatibility, that is a dialect should be able to deserialize any older bytecode. Similarly, we support back-deployment so that an older version of the format can be targetted.
That said, it is important to realize that the promises of the bytecode format are made assuming immutable dialects: the format allows backward and forward compatibility, but only when nothing in a dialect changes (operations, types, attributes definitions).
A dialect can opt-in to handle its own versioning through the
BytecodeDialectInterface
. Some hooks are exposed to the dialect
to allow managing a version encoded into the bytecode file. The
version is loaded lazily and allows to retrieve the version
information while decoding the input IR, and gives an opportunity
to each dialect for which a version is present to perform IR
upgrades post-parsing through the upgradeFromVersion
method.
There is no restriction on what kind of information a dialect
is allowed to encode to model its versioning.
Magic Number ¶
MLIR uses the following four-byte magic number to indicate bytecode files:
‘[‘M’8, ‘L’8, ‘ï’8, ‘R’8]’
In hex:
‘[‘4D’8, ‘4C’8, ‘EF’8, ‘52’8]’
Format Overview ¶
An MLIR Bytecode file is comprised of a byte stream, with a few simple structural concepts layered on top.
Primitives ¶
Fixed-Width Integers ¶
byte ::= `0x00`...`0xFF`
Fixed width integers are unsigned integers of a known byte size. The values are stored in little-endian byte order.
TODO: Add larger fixed width integers as necessary.
Variable-Width Integers ¶
Variable width integers, or VarInt
s, provide a compact representation for
integers. Each encoded VarInt consists of one to nine bytes, which together
represent a single 64-bit value. The MLIR bytecode utilizes the “PrefixVarInt”
encoding for VarInts. This encoding is a variant of the
LEB128 (“Little-Endian Base 128”)
encoding, where each byte of the encoding provides up to 7 bits for the value,
with the remaining bit used to store a tag indicating the number of bytes used
for the encoding. This means that small unsigned integers (less than 2^7) may be
stored in one byte, unsigned integers up to 2^14 may be stored in two bytes,
etc.
The first byte of the encoding includes a length prefix in the low bits. This prefix is a bit sequence of ‘0’s followed by a terminal ‘1’, or the end of the byte. The number of ‘0’ bits indicate the number of additional bytes, not including the prefix byte, used to encode the value. All of the remaining bits in the first byte, along with all of the bits in the additional bytes, provide the value of the integer. Below are the various possible encodings of the prefix byte:
xxxxxxx1: 7 value bits, the encoding uses 1 byte
xxxxxx10: 14 value bits, the encoding uses 2 bytes
xxxxx100: 21 value bits, the encoding uses 3 bytes
xxxx1000: 28 value bits, the encoding uses 4 bytes
xxx10000: 35 value bits, the encoding uses 5 bytes
xx100000: 42 value bits, the encoding uses 6 bytes
x1000000: 49 value bits, the encoding uses 7 bytes
10000000: 56 value bits, the encoding uses 8 bytes
00000000: 64 value bits, the encoding uses 9 bytes
Signed Variable-Width Integers ¶
Signed variable width integer values are encoded in a similar fashion to varints, but employ zigzag encoding. This encoding uses the low bit of the value to indicate the sign, which allows for more efficiently encoding negative numbers. If a negative value were encoded using a normal varint, it would be treated as an extremely large unsigned value. Using zigzag encoding allows for a smaller number of active bits in the value, leading to a smaller encoding. Below is the basic computation for generating a zigzag encoding:
(value << 1) ^ (value >> 63)
Strings ¶
Strings are blobs of characters with an associated length.
Sections ¶
section {
idAndIsAligned: byte // id | (hasAlign << 7)
length: varint,
alignment: varint?,
padding: byte[], // Padding bytes are always `0xCB`.
data: byte[]
}
Sections are a mechanism for grouping data within the bytecode. They enable delayed processing, which is useful for out-of-order processing of data, lazy-loading, and more. Each section contains a Section ID, whose high bit indicates if the section has alignment requirements, a length (which allows for skipping over the section), and an optional alignment. When an alignment is present, a variable number of padding bytes (0xCB) may appear before the section data. The alignment of a section must be a power of 2.
MLIR Encoding ¶
Given the generic structure of MLIR, the bytecode encoding is actually fairly simplistic. It effectively maps to the core components of MLIR.
Top Level Structure ¶
The top-level structure of the bytecode contains the 4-byte “magic number”, a version number, a null-terminated producer string, and a list of sections. Each section is currently only expected to appear once within a bytecode file.
bytecode {
magic: "MLïR",
version: varint,
producer: string,
sections: section[]
}
String Section ¶
strings {
numStrings: varint,
reverseStringLengths: varint[],
stringData: byte[]
}
The string section contains a table of strings referenced within the bytecode, more easily enabling string sharing. This section is encoded first with the total number of strings, followed by the sizes of each of the individual strings in reverse order. The remaining encoding contains a single blob containing all of the strings concatenated together.
Dialect Section ¶
The dialect section of the bytecode contains all of the dialects referenced within the encoded IR, and some information about the components of those dialects that were also referenced.
dialect_section {
numDialects: varint,
dialectNames: dialect_name_group[],
opNames: dialect_ops_group[] // ops grouped by dialect
}
dialect_name_group {
nameAndIsVersioned: varint // (dialectID << 1) | (hasVersion),
version: dialect_version_section // only if versioned
}
dialect_version_section {
size: varint,
version: byte[]
}
dialect_ops_group {
dialect: varint,
numOpNames: varint,
opNames: op_name_group[]
}
op_name_group {
nameAndIsRegistered: varint // (nameID << 1) | (isRegisteredOp)
}
Dialects are encoded as a varint
containing the index to the name string
within the string section, plus a flag indicating whether the dialect is
versioned. Operation names are encoded in groups by dialect, with each group
containing the dialect, the number of operation names, and the array of indexes
to each name within the string section. The version is encoded as a nested
section for each dialect.
Attribute/Type Sections ¶
Attributes and types are encoded using two
sections, one section
(attr_type_section
) containing the actual encoded representation, and another
section (attr_type_offset_section
) containing the offsets of each encoded
attribute/type into the previous section. This structure allows for attributes
and types to always be lazily loaded on demand.
attr_type_section {
attrs: attribute[],
types: type[]
}
attr_type_offset_section {
numAttrs: varint,
numTypes: varint,
offsets: attr_type_offset_group[]
}
attr_type_offset_group {
dialect: varint,
numElements: varint,
offsets: varint[] // (offset << 1) | (hasCustomEncoding)
}
attribute {
encoding: ...
}
type {
encoding: ...
}
Each offset
in the attr_type_offset_section
above is the size of the
encoding for the attribute or type and a flag indicating if the encoding uses
the textual assembly format, or a custom bytecode encoding. We avoid using the
direct offset into the attr_type_section
, as a smaller relative offsets
provides more effective compression. Attributes and types are grouped by
dialect, with each attr_type_offset_group
in the offset section containing the
corresponding parent dialect, number of elements, and offsets for each element
within the group.
Attribute/Type Encodings ¶
In the abstract, an attribute/type is encoded in one of two possible ways: via its assembly format, or via a custom dialect defined encoding.
Assembly Format Fallback ¶
In the case where a dialect does not define a method for encoding the attribute
or type, the textual assembly format of that attribute or type is used as a
fallback. For example, a type !bytecode.type<42>
would be encoded as the null
terminated string “!bytecode.type<42>”. This ensures that every attribute and
type can be encoded, even if the owning dialect has not yet opted in to a more
efficient serialization.
TODO: We shouldn’t redundantly encode the dialect name here, we should use a reference to the parent dialect instead.
Dialect Defined Encoding ¶
As an alternative to the assembly format fallback, dialects may also provide a custom encoding for their attributes and types. Custom encodings are very beneficial in that they are significantly smaller and faster to read and write.
Dialects can opt-in to providing custom encodings by implementing the
BytecodeDialectInterface
. This interface provides hooks, namely
readAttribute
/readType
and writeAttribute
/writeType
, that will be used
by the bytecode reader and writer. These hooks are provided a reader and writer
implementation that can be used to encode various constructs in the underlying
bytecode format. A unique feature of this interface is that dialects may choose
to only encode a subset of their attributes and types in a custom bytecode
format, which can simplify adding new or experimental components that aren’t
fully baked.
When implementing the bytecode interface, dialects are responsible for all
aspects of the encoding. This includes the indicator for which kind of attribute
or type is being encoded; the bytecode reader will only know that it has
encountered an attribute or type of a given dialect, it doesn’t encode any
further information. As such, a common encoding idiom is to use a leading
varint
code to indicate how the attribute or type was encoded.
Resource Section ¶
Resources are encoded using two
sections, one section
(resource_section
) containing the actual encoded representation, and another
section (resource_offset_section
) containing the offsets of each encoded
resource into the previous section.
resource_section {
resources: resource[]
}
resource {
value: resource_bool | resource_string | resource_blob
}
resource_bool {
value: byte
}
resource_string {
value: varint
}
resource_blob {
alignment: varint,
size: varint,
padding: byte[],
blob: byte[]
}
resource_offset_section {
numExternalResourceGroups: varint,
resourceGroups: resource_group[]
}
resource_group {
key: varint,
numResources: varint,
resources: resource_info[]
}
resource_info {
key: varint,
size: varint
kind: byte,
}
Resources are grouped by the provider, either an external entity or a dialect,
with each resource_group
in the offset section containing the corresponding
provider, number of elements, and info for each element within the group. For
each element, we record the key, the value kind, and the encoded size. We avoid
using the direct offset into the resource_section
, as a smaller relative
offsets provides more effective compression.
IR Section ¶
The IR section contains the encoded form of operations within the bytecode.
ir_section {
block: block; // Single block without arguments.
}
Operation Encoding ¶
op {
name: varint,
encodingMask: byte,
location: varint,
attrDict: varint?,
numResults: varint?,
resultTypes: varint[],
numOperands: varint?,
operands: varint[],
numSuccessors: varint?,
successors: varint[],
numUseListOrders: varint?,
useListOrders: uselist[],
regionEncoding: varint?, // (numRegions << 1) | (isIsolatedFromAbove)
// regions are stored in a section if isIsolatedFromAbove
regions: (region | region_section)[]
}
uselist {
indexInRange: varint?,
useListEncoding: varint, // (numIndices << 1) | (isIndexPairEncoding)
indices: varint[]
}
The encoding of an operation is important because this is generally the most
commonly appearing structure in the bytecode. A single encoding is used for
every type of operation. Given this prevalence, many of the fields of an
operation are optional. The encodingMask
field is a bitmask which indicates
which of the components of the operation are present.
Location ¶
The location is encoded as the index of the location within the attribute table.
Attributes ¶
If the operation has attribues, the index of the operation attribute dictionary within the attribute table is encoded.
Results ¶
If the operation has results, the number of results and the indexes of the result types within the type table are encoded.
Operands ¶
If the operation has operands, the number of operands and the value index of each operand is encoded. This value index is the relative ordering of the definition of that value from the start of the first ancestor isolated region.
Successors ¶
If the operation has successors, the number of successors and the indexes of the successor blocks within the parent region are encoded.
Use-list orders ¶
The reference use-list order is assumed to be the reverse of the global
enumeration of all the op operands that one would obtain with a pre-order walk
of the IR. This order is naturally obtained by building blocks of operations
op-by-op. However, some transformations may shuffle the use-lists with respect
to this reference ordering. If any of the results of the operation have a
use-list order that is not sorted with respect to the reference use-list order,
an encoding is emitted such that it is possible to reconstruct such order after
parsing the bytecode. The encoding represents an index map from the reference
operand order to the current use-list order. A bit flag is used to detect if
this encoding is of type index-pair or not. When the bit flag is set to zero,
the element at i
represent the position of the use i
of the reference list
into the current use-list. When the bit flag is set to 1
, the encoding
represent index pairs (i, j)
, which indicate that the use at position i
of
the reference list is mapped to position j
in the current use-list. When only
less than half of the elements in the current use-list are shuffled with respect
to the reference use-list, the index-pair encoding is used to reduce the
bytecode memory requirements.
Regions ¶
If the operation has regions, the number of regions and if the regions are isolated from above are encoded together in a single varint. Afterwards, each region is encoded inline.
Region Encoding ¶
region {
numBlocks: varint,
numValues: varint?,
blocks: block[]
}
A region is encoded first with the number of blocks within. If the region is non-empty, the number of values defined directly within the region are encoded, followed by the blocks of the region.
Block Encoding ¶
block {
encoding: varint, // (numOps << 1) | (hasBlockArgs)
arguments: block_arguments?, // Optional based on encoding
ops : op[]
}
block_arguments {
numArgs: varint?,
args: block_argument[]
numUseListOrders: varint?,
useListOrders: uselist[],
}
block_argument {
typeAndLocation: varint, // (type << 1) | (hasLocation)
location: varint? // Optional, else unknown location
}
A block is encoded with an array of operations and block arguments. The first field is an encoding that combines the number of operations in the block, with a flag indicating if the block has arguments.
Use-list orders are attached to block arguments similarly to how they are attached to operation results.