mlir.dialects._omp_ops_gen¶
Attributes¶
Classes¶
- The storage for each list item that appears in the allocate directive is …
- This operation performs an atomic capture.
- This operation performs an atomic read.
- This operation performs an atomic update.
- This operation performs an atomic write.
- The barrier construct specifies an explicit barrier at the point at which …
- The cancel construct activates cancellation of the innermost enclosing …
- The cancellation point construct introduces a user-defined cancellation …
- All loops that conform to OpenMP's definition of a canonical loop can be …
- Declares a named critical section.
- The critical construct imposes a restriction on the associated structured …
- This Op is used to capture the map information related to its …
- The declare mapper directive declares a user-defined mapper for a given …
- Declares an OpenMP reduction kind. This requires two mandatory and three …
- The distribute construct specifies that the iterations of one or more loops …
- The flush construct executes the OpenMP flush operation. This operation …
- This operation represents a rectangular loop nest which may be collapsed …
- A loop construct specifies that the logical iterations of the associated loops …
- This operation is a variation on the OpenACC dialect's DataBoundsOp. Within …
- The MapInfoOp captures information relating to individual OpenMP map clauses …
- The masked construct allows specifying a structured block to be executed by a subset of …
- The master construct specifies a structured block that is executed by …
- Create a new CLI that can be passed as an argument to a CanonicalLoopOp …
- The ordered construct without region is a stand-alone directive that …
- The ordered construct with region specifies a structured block in a …
- The parallel construct includes a region of code which is to be executed …
- This operation provides a declaration of how to implement the …
- The scan directive allows specifying scan reductions. It should be …
- A section operation encloses a region which represents one section in a …
- The sections construct is a non-iterative worksharing construct that …
- The simd construct can be applied to a loop to indicate that the loop can be …
- The single construct specifies that the associated structured block is …
- Allocates memory on the specified OpenMP device for an object of the given type.
- Map variables to a device data environment for the extent of the region.
- The target enter data directive specifies that variables are mapped to …
- The target exit data directive specifies that variables are mapped to a …
- Deallocates memory on the specified OpenMP device that was previously …
- The target construct includes a region of code which is to be executed …
- The target update directive makes the corresponding list items in the device …
- The task construct defines an explicit task.
- The taskgroup construct specifies a wait on completion of child tasks of the …
- The taskloop construct specifies that the iterations of one or more …
- The taskwait construct specifies a wait on the completion of child tasks …
- The taskyield construct specifies that the current task can be suspended …
- The teams construct defines a region of code that triggers the creation of a …
- A terminator operation for regions that appear in the body of OpenMP …
- The threadprivate directive specifies that variables are replicated, with …
- Represents the OpenMP tile directive introduced in OpenMP 5.1.
- Represents a …
- workdistribute divides execution of the enclosed structured block into …
- This operation wraps a loop nest that is marked for dividing into units of …
- The workshare construct divides the execution of the enclosed structured …
- The worksharing-loop construct specifies that the iterations of the loop(s) …
- "omp.yield" yields SSA values from the OpenMP dialect op region and …
Functions¶
Module Contents¶
- mlir.dialects._omp_ops_gen._ods_ir¶
- class mlir.dialects._omp_ops_gen._Dialect(descriptor: object)¶
Bases: _ods_ir

- DIALECT_NAMESPACE = 'omp'¶
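The generated op classes and their snake_case builder functions below are constructed inside an active context, location and insertion point. A minimal sketch of driving them, assuming the upstream mlir Python package and that this generated module is re-exported as mlir.dialects.openmp (an assumed import path):

from mlir import ir
from mlir.dialects import openmp as omp  # assumed re-export of _omp_ops_gen

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    with ir.InsertionPoint(module.body):
        # Zero-operand op: emits `omp.barrier` at the current insertion point.
        omp.BarrierOp()
    print(module)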
- class mlir.dialects._omp_ops_gen.AllocateDirOp(varList, *, align=None, allocator=None, loc=None, ip=None)¶
Bases: _ods_ir

The storage for each list item that appears in the allocate directive is provided an allocation through the memory allocator.

The align clause is used to specify the byte alignment to use for allocations associated with the construct on which the clause appears. The allocator clause specifies the memory allocator to be used for allocations associated with the construct on which the clause appears.

- OPERATION_NAME = 'omp.allocate_dir'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (0, True)¶
- varList() _ods_ir¶
- allocator() _ods_ir | None¶
- align() _ods_ir | None¶
- mlir.dialects._omp_ops_gen.allocate_dir(var_list, *, align=None, allocator=None, loc=None, ip=None) AllocateDirOp¶
- class mlir.dialects._omp_ops_gen.AtomicCaptureOp(*, hint=None, memory_order=None, loc=None, ip=None)¶
Bases: _ods_ir

This operation performs an atomic capture.

The region has the following allowed forms:

omp.atomic.capture {
  omp.atomic.update ...
  omp.atomic.read ...
  omp.terminator
}
omp.atomic.capture {
  omp.atomic.read ...
  omp.atomic.update ...
  omp.terminator
}
omp.atomic.capture {
  omp.atomic.read ...
  omp.atomic.write ...
  omp.terminator
}

hint is the value of hint (as specified in the hint clause). It is a compile time constant. As the name suggests, this is just a hint for optimization. memory_order indicates the memory ordering behavior of the construct. It can be one of seq_cst, acq_rel, release, acquire or relaxed.

- OPERATION_NAME = 'omp.atomic.capture'¶
- _ODS_REGIONS = (1, True)¶
- hint() _ods_ir | None¶
- memory_order() _ods_ir | None¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.atomic_capture(*, hint=None, memory_order=None, loc=None, ip=None) AtomicCaptureOp¶
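A sketch of populating the capture region with one of the allowed forms above (an atomic read followed by an atomic write). The !llvm.ptr address type and the func scaffolding are illustrative assumptions, not part of this module:

from mlir import ir
from mlir.dialects import func, openmp as omp

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    ptr = ir.Type.parse("!llvm.ptr")
    i32 = ir.IntegerType.get_signless(32)
    with ir.InsertionPoint(module.body):
        f = func.FuncOp("capture_example", ([ptr, ptr, i32], []))
        with ir.InsertionPoint(f.add_entry_block()):
            x, v, expr = f.entry_block.arguments
            cap = omp.AtomicCaptureOp()
            with ir.InsertionPoint(cap.region.blocks.append()):
                omp.atomic_read(x, v, i32)  # read the old value of x into v
                omp.atomic_write(x, expr)   # then atomically store expr to x
                omp.TerminatorOp()          # assumed builder for omp.terminator
            func.ReturnOp([])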
- class mlir.dialects._omp_ops_gen.AtomicReadOp(x, v, element_type, *, hint=None, memory_order=None, loc=None, ip=None)¶
Bases: _ods_ir

This operation performs an atomic read.

The operand x is the address from where the value is atomically read. The operand v is the address where the value is stored after reading. hint is the value of hint (as specified in the hint clause). It is a compile time constant. As the name suggests, this is just a hint for optimization. memory_order indicates the memory ordering behavior of the construct. It can be one of seq_cst, acq_rel, release, acquire or relaxed.

- OPERATION_NAME = 'omp.atomic.read'¶
- _ODS_REGIONS = (0, True)¶
- x() _ods_ir¶
- v() _ods_ir¶
- element_type() _ods_ir¶
- hint() _ods_ir | None¶
- memory_order() _ods_ir | None¶
- mlir.dialects._omp_ops_gen.atomic_read(x, v, element_type, *, hint=None, memory_order=None, loc=None, ip=None) AtomicReadOp¶
- class mlir.dialects._omp_ops_gen.AtomicUpdateOp(x, *, atomic_control=None, hint=None, memory_order=None, loc=None, ip=None)¶
Bases: _ods_ir

This operation performs an atomic update.

The operand x is exactly the same as the operand x in the OpenMP Standard (OpenMP 5.0, section 2.17.7). It is the address of the variable that is being updated. x is atomically read/written.

The region describes how to update the value of x. It takes the value at x as an input and must yield the updated value. Only the update to x is atomic. Generally the region must have only one instruction, but it can potentially have more than one instruction too. The update is semantically similar to a compare-exchange-loop-based atomic update.

The syntax of the atomic update operation is different from the atomic read and atomic write operations. This is because only the host dialect knows how to appropriately update a value. For example, while generating LLVM IR, if there are no special atomicrmw instructions for the operation-type combination in atomic update, a compare-exchange loop is generated, where the core update operation is directly translated like regular operations by the host dialect. The front-end must handle semantic checks for allowed operations. hint is the value of hint (as specified in the hint clause). It is a compile time constant. As the name suggests, this is just a hint for optimization. memory_order indicates the memory ordering behavior of the construct. It can be one of seq_cst, acq_rel, release, acquire or relaxed.

- OPERATION_NAME = 'omp.atomic.update'¶
- _ODS_REGIONS = (1, True)¶
- x() _ods_ir¶
- atomic_control() _ods_ir | None¶
- hint() _ods_ir | None¶
- memory_order() _ods_ir | None¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.atomic_update(x, *, atomic_control=None, hint=None, memory_order=None, loc=None, ip=None) AtomicUpdateOp¶
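A sketch of an in-place increment via the update region, which receives the current value of x as a block argument and must yield the new value; that YieldOp accepts the list of yielded values is an assumption:

from mlir import ir
from mlir.dialects import arith, func, openmp as omp

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    ptr = ir.Type.parse("!llvm.ptr")  # assumed address type
    i32 = ir.IntegerType.get_signless(32)
    with ir.InsertionPoint(module.body):
        f = func.FuncOp("update_example", ([ptr], []))
        with ir.InsertionPoint(f.add_entry_block()):
            upd = omp.AtomicUpdateOp(f.entry_block.arguments[0])
            body = upd.region.blocks.append(i32)  # block arg: current value at x
            with ir.InsertionPoint(body):
                one = arith.ConstantOp(i32, 1)
                new = arith.AddIOp(body.arguments[0], one)
                omp.YieldOp([new])  # assumed: yields the updated value
            func.ReturnOp([])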
- class mlir.dialects._omp_ops_gen.AtomicWriteOp(x, expr, *, hint=None, memory_order=None, loc=None, ip=None)¶
Bases: _ods_ir

This operation performs an atomic write.

The operand x is the address to which expr is atomically written w.r.t. multiple threads. The evaluation of expr need not be atomic w.r.t. the write to the address. In general, type(x) must dereference to type(expr). hint is the value of hint (as specified in the hint clause). It is a compile time constant. As the name suggests, this is just a hint for optimization. memory_order indicates the memory ordering behavior of the construct. It can be one of seq_cst, acq_rel, release, acquire or relaxed.

- OPERATION_NAME = 'omp.atomic.write'¶
- _ODS_REGIONS = (0, True)¶
- x() _ods_ir¶
- expr() _ods_ir¶
- hint() _ods_ir | None¶
- memory_order() _ods_ir | None¶
- mlir.dialects._omp_ops_gen.atomic_write(x, expr, *, hint=None, memory_order=None, loc=None, ip=None) AtomicWriteOp¶
- class mlir.dialects._omp_ops_gen.BarrierOp(*, loc=None, ip=None)¶
Bases: _ods_ir

The barrier construct specifies an explicit barrier at the point at which the construct appears.
- OPERATION_NAME = 'omp.barrier'¶
- _ODS_REGIONS = (0, True)¶
- class mlir.dialects._omp_ops_gen.CancelOp(cancel_directive, *, if_expr=None, loc=None, ip=None)¶
Bases: _ods_ir

The cancel construct activates cancellation of the innermost enclosing region of the type specified.
- OPERATION_NAME = 'omp.cancel'¶
- _ODS_REGIONS = (0, True)¶
- if_expr() _ods_ir | None¶
- cancel_directive() _ods_ir¶
- class mlir.dialects._omp_ops_gen.CancellationPointOp(cancel_directive, *, loc=None, ip=None)¶
Bases: _ods_ir

The cancellation point construct introduces a user-defined cancellation point at which implicit or explicit tasks check if cancellation of the innermost enclosing region of the type specified has been activated.
- OPERATION_NAME = 'omp.cancellation_point'¶
- _ODS_REGIONS = (0, True)¶
- cancel_directive() _ods_ir¶
- mlir.dialects._omp_ops_gen.cancellation_point(cancel_directive, *, loc=None, ip=None) CancellationPointOp¶
- class mlir.dialects._omp_ops_gen.CanonicalLoopOp(tripCount, *, cli=None, loc=None, ip=None)¶
Bases: _ods_ir

All loops that conform to OpenMP's definition of a canonical loop can be simplified to a CanonicalLoopOp. In particular, there are no loop-carried variables and the number of iterations it will execute is known before the operation. This allows e.g. determining the number of threads and chunks the iteration space is split into before executing any iteration. More restrictions may apply in cases such as (collapsed) loop nests, doacross loops, etc.

In contrast to other loop operations such as scf.for, the number of iterations is determined by only a single variable, the trip-count. The induction variable value is the logical iteration number of that iteration, which OpenMP defines to be between 0 and the trip-count (exclusive). Loop representations having lower-bound, upper-bound, and step-size operands require passes to do more work than necessary, including handling special cases such as upper-bound smaller than lower-bound, upper-bound equal to the integer type's maximal value, negative step size, etc. This complexity is better handled once by the front-end, which can apply its semantics for such cases while still being able to represent any kind of loop, which is kind of the point of a mid-end intermediate representation. User-defined types such as random-access iterators in C++ could not directly be represented anyway.

The induction variable is always of the same type as the tripcount argument. Since it can never be negative, tripcount is always interpreted as an unsigned integer. It is the caller's responsibility to ensure the tripcount is not negative when its interpretation is signed, i.e. %tripcount = max(0, %tripcount).

An optional argument to an omp.canonical_loop that can be passed in is a CanonicalLoopInfo value that can be used to refer to the canonical loop to apply transformations (such as tiling, unrolling, or work-sharing) to the loop, similar to the transform dialect but with OpenMP-specific semantics. Because it is optional, it has to be the last of the operands, but appears first in the pretty format printing.

The pretty assembly format is inspired by Python syntax, where range(n) returns an iterator that runs from 0 to n-1. The pretty assembly syntax is one of:

omp.canonical_loop(%cli) %iv : !type in range(%tripcount)
omp.canonical_loop %iv : !type in range(%tripcount)
A CanonicalLoopOp is lowered to LLVM-IR using OpenMPIRBuilder::createCanonicalLoop.

Examples¶

Translation from lower-bound, upper-bound, step-size to trip-count:

for (int i = 3; i < 42; i += 2) {
  B[i] = A[i];
}

%lb = arith.constant 3 : i32
%ub = arith.constant 42 : i32
%step = arith.constant 2 : i32
%range = arith.sub %ub, %lb : i32
%tripcount = arith.div %range, %step : i32
omp.canonical_loop %iv : i32 in range(%tripcount) {
  %offset = arith.mul %iv, %step : i32
  %i = arith.add %offset, %lb : i32
  %a = load %arrA[%i] : memref<?xf32>
  store %a, %arrB[%i] : memref<?xf32>
}

Nested canonical loop with transformation of the inner loop:

%outer = omp.new_cli : !omp.cli
%inner = omp.new_cli : !omp.cli
omp.canonical_loop(%outer) %iv1 : i32 in range(%tc1) {
  omp.canonical_loop(%inner) %iv2 : i32 in range(%tc2) {
    %a = load %arrA[%iv1, %iv2] : memref<?x?xf32>
    store %a, %arrB[%iv1, %iv2] : memref<?x?xf32>
  }
}
omp.unroll_full(%inner)
- OPERATION_NAME = 'omp.canonical_loop'¶
- _ODS_REGIONS = (1, True)¶
- tripCount() _ods_ir¶
- cli() _ods_ir | None¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.canonical_loop(trip_count, *, cli=None, loc=None, ip=None) CanonicalLoopOp¶
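A sketch mirroring the examples above: create a CLI, attach it to a canonical loop, and populate the body block, whose single argument is the induction variable. The !omp.cli type string and the body terminator are assumptions based on the description:

from mlir import ir
from mlir.dialects import arith, openmp as omp

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    i32 = ir.IntegerType.get_signless(32)
    with ir.InsertionPoint(module.body):
        tripcount = arith.ConstantOp(i32, 42)
        cli = omp.new_cli(ir.Type.parse("!omp.cli"))  # assumed CLI type name
        loop = omp.CanonicalLoopOp(tripcount, cli=cli)
        body = loop.region.blocks.append(i32)  # %iv : i32
        with ir.InsertionPoint(body):
            # body.arguments[0] is the logical iteration number in [0, 42)
            omp.TerminatorOp()  # assumed terminator for the loop body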
- class mlir.dialects._omp_ops_gen.CriticalDeclareOp(sym_name, *, hint=None, loc=None, ip=None)¶
Bases: _ods_ir

Declares a named critical section.

The sym_name can be used in omp.critical constructs in the dialect. hint is the value of hint (as specified in the hint clause). It is a compile time constant. As the name suggests, this is just a hint for optimization.

- OPERATION_NAME = 'omp.critical.declare'¶
- _ODS_REGIONS = (0, True)¶
- sym_name() _ods_ir¶
- hint() _ods_ir | None¶
- mlir.dialects._omp_ops_gen.critical_declare(sym_name, *, hint=None, loc=None, ip=None) CriticalDeclareOp¶
- class mlir.dialects._omp_ops_gen.CriticalOp(*, name=None, loc=None, ip=None)¶
Bases: _ods_ir

The critical construct imposes a restriction on the associated structured block (region) to be executed by only a single thread at a time.

The optional name argument of critical constructs is used to identify them. Unnamed critical constructs behave as though an identical name was specified.

- OPERATION_NAME = 'omp.critical'¶
- _ODS_REGIONS = (1, True)¶
- name() _ods_ir | None¶
Returns the fully qualified name of the operation.
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.critical(*, name=None, loc=None, ip=None) CriticalOp¶
- class mlir.dialects._omp_ops_gen.DeclareMapperInfoOp(map_vars, *, loc=None, ip=None)¶
Bases: _ods_ir

This Op is used to capture the map information related to its parent DeclareMapperOp.

The optional map_vars maps data from the current task's data environment to the device data environment.

- OPERATION_NAME = 'omp.declare_mapper.info'¶
- _ODS_REGIONS = (0, True)¶
- map_vars() _ods_ir¶
- mlir.dialects._omp_ops_gen.declare_mapper_info(map_vars, *, loc=None, ip=None) DeclareMapperInfoOp¶
- class mlir.dialects._omp_ops_gen.DeclareMapperOp(sym_name, type_, *, loc=None, ip=None)¶
Bases: _ods_ir

The declare mapper directive declares a user-defined mapper for a given type, and defines a mapper-identifier that can be used in a map clause.
- OPERATION_NAME = 'omp.declare_mapper'¶
- _ODS_REGIONS = (1, True)¶
- sym_name() _ods_ir¶
- type_() _ods_ir¶
- body() _ods_ir¶
- mlir.dialects._omp_ops_gen.declare_mapper(sym_name, type_, *, loc=None, ip=None) DeclareMapperOp¶
- class mlir.dialects._omp_ops_gen.DeclareReductionOp(sym_name, type_, *, loc=None, ip=None)¶
Bases: _ods_ir

Declares an OpenMP reduction kind. This requires two mandatory and three optional regions.

1. The optional alloc region specifies how to allocate the thread-local reduction value. This region should not contain control flow and all IR should be suitable for inlining straight into an entry block. In the common case this is expected to contain only allocas. It is expected to omp.yield the allocated value on all control paths. If allocation is conditional (e.g. only allocate if the mold is allocated), this should be done in the initializer region and this region not included. The alloc region is not used for by-value reductions (where allocation is implicit).
2. The initializer region specifies how to initialize the thread-local reduction value. This is usually the neutral element of the reduction. For convenience, the region has an argument that contains the value of the reduction accumulator at the start of the reduction. If an alloc region is specified, there is a second block argument containing the address of the allocated memory. The initializer region is expected to omp.yield the new value on all control flow paths.
3. The reduction region specifies how to combine two values into one, i.e. the reduction operator. It accepts the two values as arguments and is expected to omp.yield the combined value on all control flow paths.
4. The atomic reduction region is optional and specifies how two values can be combined atomically given local accumulator variables. It is expected to store the combined value in the first accumulator variable.
5. The cleanup region is optional and specifies how to clean up any memory allocated by the initializer region. The region has an argument that contains the value of the thread-local reduction accumulator. This will be executed after the reduction has completed.

Note that the MLIR type system does not allow for type-polymorphic reductions. Separate reduction declarations should be created for different element and accumulator types.

For initializer and reduction regions, the operand to omp.yield must match the parent operation's results.

- OPERATION_NAME = 'omp.declare_reduction'¶
omp.yieldmust match the parent operation’s results.- OPERATION_NAME = 'omp.declare_reduction'¶
- _ODS_REGIONS = (5, True)¶
- sym_name() _ods_ir¶
- type_() _ods_ir¶
- allocRegion() _ods_ir¶
- initializerRegion() _ods_ir¶
- reductionRegion() _ods_ir¶
- atomicReductionRegion() _ods_ir¶
- cleanupRegion() _ods_ir¶
- mlir.dialects._omp_ops_gen.declare_reduction(sym_name, type_, *, loc=None, ip=None) DeclareReductionOp¶
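A sketch of a floating-point add reduction filling only the two mandatory regions; the optional alloc, atomic and cleanup regions are simply left empty. That YieldOp takes the list of yielded values is an assumption:

from mlir import ir
from mlir.dialects import arith, openmp as omp

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    f32 = ir.F32Type.get()
    with ir.InsertionPoint(module.body):
        decl = omp.DeclareReductionOp("add_f32", f32)
        # Initializer region: yield the neutral element of the reduction.
        init = decl.initializerRegion.blocks.append(f32)
        with ir.InsertionPoint(init):
            zero = arith.ConstantOp(f32, 0.0)
            omp.YieldOp([zero])
        # Reduction region: combine two partial values into one.
        combine = decl.reductionRegion.blocks.append(f32, f32)
        with ir.InsertionPoint(combine):
            lhs, rhs = combine.arguments
            omp.YieldOp([arith.AddFOp(lhs, rhs)])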
- class mlir.dialects._omp_ops_gen.DistributeOp(allocate_vars, allocator_vars, private_vars, *, dist_schedule_static=None, dist_schedule_chunk_size=None, order=None, order_mod=None, private_syms=None, private_needs_barrier=None, loc=None, ip=None)¶
Bases: _ods_ir

The distribute construct specifies that the iterations of one or more loops (optionally specified using the collapse clause) will be executed by the initial teams in the context of their implicit tasks. The loops that the distribute op is associated with start with the outermost loop enclosed by the distribute op region and go down the loop nest toward the innermost loop. The iterations are distributed across the initial threads of all initial teams that execute the teams region to which the distribute region binds.

The distribute loop construct specifies that the iterations of the loop(s) will be executed in parallel by threads in the current context. These iterations are spread across threads that already exist in the enclosing region.

The body region can only contain a single block which must contain a single operation. This operation must be another compatible loop wrapper or an omp.loop_nest.

omp.distribute <clauses> {
  omp.loop_nest (%i1, %i2) : index = (%c0, %c0) to (%c10, %c10) step (%c1, %c1) {
    %a = load %arrA[%i1, %i2] : memref<?x?xf32>
    %b = load %arrB[%i1, %i2] : memref<?x?xf32>
    %sum = arith.addf %a, %b : f32
    store %sum, %arrC[%i1, %i2] : memref<?x?xf32>
    omp.yield
  }
}

The allocator_vars and allocate_vars parameters are a variadic list of values that specify the memory allocator to be used to obtain storage for private values.

The dist_schedule_static attribute specifies the schedule for this loop, determining how the loop is distributed across the various teams. The optional dist_schedule_chunk_size associated with it further controls this distribution.

The optional order attribute specifies which order the iterations of the associated loops are executed in. Currently the only option for this attribute is "concurrent".

- OPERATION_NAME = 'omp.distribute'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- allocate_vars() _ods_ir¶
- allocator_vars() _ods_ir¶
- dist_schedule_chunk_size() _ods_ir | None¶
- private_vars() _ods_ir¶
- dist_schedule_static() bool¶
- order() _ods_ir | None¶
- order_mod() _ods_ir | None¶
- private_syms() _ods_ir | None¶
- private_needs_barrier() bool¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.distribute(allocate_vars, allocator_vars, private_vars, *, dist_schedule_static=None, dist_schedule_chunk_size=None, order=None, order_mod=None, private_syms=None, private_needs_barrier=None, loc=None, ip=None) DistributeOp¶
- class mlir.dialects._omp_ops_gen.FlushOp(varList, *, loc=None, ip=None)¶
Bases: _ods_ir

The flush construct executes the OpenMP flush operation. This operation makes a thread's temporary view of memory consistent with memory and enforces an order on the memory operations of the variables explicitly specified or implied.
- OPERATION_NAME = 'omp.flush'¶
- _ODS_REGIONS = (0, True)¶
- varList() _ods_ir¶
- class mlir.dialects._omp_ops_gen.LoopNestOp(loop_lower_bounds, loop_upper_bounds, loop_steps, *, collapse_num_loops=None, loop_inclusive=None, tile_sizes=None, loc=None, ip=None)¶
Bases: _ods_ir

This operation represents a rectangular loop nest which may be collapsed and/or tiled. For each rectangular loop of the nest represented by an instance of this operation, lower and upper bounds, as well as a step variable, must be defined. The collapse clause specifies how many loops should be collapsed (1 if no collapse is done) after any tiling is performed. The tile sizes are represented by the tile sizes clause.

The lower and upper bounds specify a half-open range: the range includes the lower bound but does not include the upper bound. If the loop_inclusive attribute is specified then the upper bound is also included.

The body region can contain any number of blocks. The region is terminated by an omp.yield instruction without operands. The induction variables, represented as entry block arguments to the loop nest operation's single region, match the types of the loop_lower_bounds, loop_upper_bounds and loop_steps arguments.

omp.loop_nest (%i1, %i2) : i32 = (%c0, %c0) to (%c10, %c10) step (%c1, %c1) collapse(2) tiles(5,5) {
  %a = load %arrA[%i1, %i2] : memref<?x?xf32>
  %b = load %arrB[%i1, %i2] : memref<?x?xf32>
  %sum = arith.addf %a, %b : f32
  store %sum, %arrC[%i1, %i2] : memref<?x?xf32>
  omp.yield
}

This is a temporary simplified definition of a loop based on existing OpenMP loop operations, intended to serve as a stopgap solution until the long-term representation of canonical loops is defined. Specifically, this operation is intended to serve as a unique source for loop information during the transition to making omp.distribute, omp.simd, omp.taskloop and omp.wsloop wrapper operations. It is not intended to help with the addition of support for loop transformations, non-rectangular loops and non-perfectly nested loops.

- OPERATION_NAME = 'omp.loop_nest'¶
- _ODS_REGIONS = (1, True)¶
- loop_lower_bounds() _ods_ir¶
- loop_upper_bounds() _ods_ir¶
- loop_steps() _ods_ir¶
- collapse_num_loops() _ods_ir | None¶
- loop_inclusive() bool¶
- tile_sizes() _ods_ir | None¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.loop_nest(loop_lower_bounds, loop_upper_bounds, loop_steps, *, collapse_num_loops=None, loop_inclusive=None, tile_sizes=None, loc=None, ip=None) LoopNestOp¶
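A sketch of a one-dimensional nest over [0, 10) with step 1, placed inside a loop wrapper (here omp.simd with all of its variadic operand lists empty) as the description requires. That YieldOp builds the operand-less omp.yield is an assumption:

from mlir import ir
from mlir.dialects import arith, openmp as omp

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    i32 = ir.IntegerType.get_signless(32)
    with ir.InsertionPoint(module.body):
        lb = arith.ConstantOp(i32, 0)
        ub = arith.ConstantOp(i32, 10)
        step = arith.ConstantOp(i32, 1)
        simd = omp.SimdOp([], [], [], [], [], [])
        with ir.InsertionPoint(simd.region.blocks.append()):
            # Per the description, the wrapper block holds just the loop nest.
            nest = omp.LoopNestOp([lb], [ub], [step])
            body = nest.region.blocks.append(i32)  # induction variable %i
            with ir.InsertionPoint(body):
                # ... loop body using body.arguments[0] ...
                omp.YieldOp([])  # omp.yield without operands ends an iteration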
- class mlir.dialects._omp_ops_gen.LoopOp(private_vars, reduction_vars, *, bind_kind=None, private_syms=None, private_needs_barrier=None, order=None, order_mod=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, loc=None, ip=None)¶
Bases: _ods_ir

A loop construct specifies that the logical iterations of the associated loops may execute concurrently and permits the encountering threads to execute the loop accordingly. A loop construct can have 3 different types of binding:

1. teams: in which case the binding region is the innermost enclosing teams region.
2. parallel: in which case the binding region is the innermost enclosing parallel region.
3. thread: in which case the binding region is not defined.

The body region can only contain a single block which must contain a single operation; this operation must be an omp.loop_nest.

omp.loop <clauses> {
  omp.loop_nest (%i1, %i2) : index = (%c0, %c0) to (%c10, %c10) step (%c1, %c1) {
    %a = load %arrA[%i1, %i2] : memref<?x?xf32>
    %b = load %arrB[%i1, %i2] : memref<?x?xf32>
    %sum = arith.addf %a, %b : f32
    store %sum, %arrC[%i1, %i2] : memref<?x?xf32>
    omp.yield
  }
}

The bind clause specifies the binding region of the construct on which it appears.

The optional order attribute specifies which order the iterations of the associated loops are executed in. Currently the only option for this attribute is "concurrent".

Reductions can be performed by specifying the reduction modifier (default, inscan or task) in reduction_mod, reduction accumulator variables in reduction_vars, symbols referring to reduction declarations in the reduction_syms attribute, and whether the reduction variable should be passed into the reduction region by value or by reference in reduction_byref. Each reduction is identified by the accumulator it uses, and accumulators must not be repeated in the same reduction. A private variable corresponding to the accumulator is used in place of the accumulator inside the body of the operation. The reduction declaration specifies how to combine the values from each iteration, section, team, thread or simd lane defined by the operation's region into the final value, which is available in the accumulator after they all complete.

- OPERATION_NAME = 'omp.loop'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- private_vars() _ods_ir¶
- reduction_vars() _ods_ir¶
- bind_kind() _ods_ir | None¶
- private_syms() _ods_ir | None¶
- private_needs_barrier() bool¶
- order() _ods_ir | None¶
- order_mod() _ods_ir | None¶
- reduction_mod() _ods_ir | None¶
- reduction_byref() _ods_ir | None¶
- reduction_syms() _ods_ir | None¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.loop(private_vars, reduction_vars, *, bind_kind=None, private_syms=None, private_needs_barrier=None, order=None, order_mod=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, loc=None, ip=None) LoopOp¶
- class mlir.dialects._omp_ops_gen.MapBoundsOp(result, *, lower_bound=None, upper_bound=None, extent=None, stride=None, stride_in_bytes=None, start_idx=None, loc=None, ip=None)¶
Bases: _ods_ir

This operation is a variation on the OpenACC dialect's DataBoundsOp. Within the OpenMP dialect it stores the bounds/range of data to be mapped to a device specified by map clauses on target directives. Within the OpenMP dialect, the MapBoundsOp is associated with MapInfoOp, helping to store bounds information for the mapped variable.

It is used to support OpenMP array sectioning, Fortran pointer and allocatable mapping, and pointer/allocatable members of derived types. In all cases the MapBoundsOp holds information on the section of data to be mapped, such as the upper bound and lower bound of the section. This information is currently utilised by the LLVM-IR lowering to help generate instructions to copy data to and from the device when processing target operations.

The example below copies a section of a 10-element array (all except the first element), utilising OpenMP array sectioning syntax where array subscripts are provided to specify the bounds to be mapped to device. To simplify the examples, the constants are used directly; in reality they will be MLIR SSA values.

C++:

int array[10];
#pragma target map(array[1:9])

=>

omp.map.bounds lower_bound(1) upper_bound(9) extent(9) start_idx(0)

Fortran:

integer :: array(1:10)
!$target map(array(2:10))

=>

omp.map.bounds lower_bound(1) upper_bound(9) extent(9) start_idx(1)

For Fortran pointers and allocatables (as well as those that are members of derived types) the bounds information is provided by the Fortran compiler and runtime through descriptor information.

A basic pointer example can be found below (constants again provided for simplicity, where in reality SSA values will be used, in this case pointing to data yielded by Fortran's descriptors):

Fortran:

integer, pointer :: ptr(:)
allocate(ptr(10))
!$target map(ptr)

=>

omp.map.bounds lower_bound(0) upper_bound(9) extent(10) start_idx(1)

This operation records the bounds information in a normalized fashion (zero-based). This works well with the PointerLikeType requirement in data clauses, since a lower_bound of 0 means looking at data at the zero offset from the pointer.

This operation must have an upper_bound or extent (or both are allowed, but not checked for consistency). When the source language's arrays are not zero-based, the start_idx must specify the zero-position index.

- OPERATION_NAME = 'omp.map.bounds'¶
- _ODS_OPERAND_SEGMENTS = [0, 0, 0, 0, 0]¶
- _ODS_REGIONS = (0, True)¶
- lower_bound() _ods_ir | None¶
- upper_bound() _ods_ir | None¶
- extent() _ods_ir | None¶
- stride() _ods_ir | None¶
- start_idx() _ods_ir | None¶
- stride_in_bytes() _ods_ir¶
- result() _ods_ir¶
Shortcut to get an op result if it has only one (throws an error otherwise).
- mlir.dialects._omp_ops_gen.map_bounds(result, *, lower_bound=None, upper_bound=None, extent=None, stride=None, stride_in_bytes=None, start_idx=None, loc=None, ip=None) _ods_ir¶
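A sketch of the Fortran array(2:10) example above expressed through the builder; the !omp.map_bounds_ty result type string is an assumption:

from mlir import ir
from mlir.dialects import arith, openmp as omp

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    idx = ir.IndexType.get()
    with ir.InsertionPoint(module.body):
        def c(value):
            return arith.ConstantOp(idx, value).result  # index constant
        omp.map_bounds(
            ir.Type.parse("!omp.map_bounds_ty"),
            lower_bound=c(1),   # zero-based: element 2 of a 1-based array
            upper_bound=c(9),
            extent=c(9),
            start_idx=c(1),     # the array's zero-position index
        )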
- class mlir.dialects._omp_ops_gen.MapInfoOp(omp_ptr, var_ptr, var_type, map_type, map_capture_type, members, bounds, *, var_ptr_ptr=None, members_index=None, mapper_id=None, name=None, partial_map=None, loc=None, ip=None)¶
Bases: _ods_ir

The MapInfoOp captures information relating to individual OpenMP map clauses that are applied to certain OpenMP directives such as Target and Target Data.

For example, the map type modifier (such as from, tofrom and to), the variable being captured, or the bounds of an array section being mapped.

It can be used to capture both implicit and explicit map information, where explicit is an argument directly specified to an OpenMP map clause, and implicit is where a variable is utilised in a target region but is defined externally to the target region.

This map information is later used to aid the lowering of the target operations they are attached to, providing argument input and output context for kernels generated or the target data mapping environment.

Example (Fortran):

integer :: index
!$target map(to: index)

=>

omp.map.info var_ptr(%index_ssa) map_type(to) map_capture_type(ByRef) name(index)

Description of arguments:

- var_ptr: The address of the variable to copy.
- var_type: The type of the variable to copy.
- map_type: OpenMP map type for this map capture, for example: from, to and always. It's a bitfield composed of the OpenMP runtime flags stored in OpenMPOffloadMappingFlags.
- map_capture_type: Capture type for the variable, e.g. this, byref, byvalue, byvla; this can affect how the variable is lowered.
- var_ptr_ptr: Used when the variable copied is a member of a class, structure or derived type and refers to the originating struct.
- members: Used to indicate mapped child members for the current MapInfoOp, represented as other MapInfoOps, utilised in cases where a parent structure type and members of the structure type are being mapped at the same time. For example: map(to: parent, parent->member, parent->member2[:10])
- members_index: Used to indicate the ordering of members within the containing parent (generally a record type such as a structure, class or derived type), e.g. for struct {int x, float y, double z}, x would be 0, y would be 1, and z would be 2. This aids the mapping.
- bounds: Used when copying slices of arrays, pointers or pointer members of objects (e.g. derived types or classes); indicates the bounds to be copied of the variable. When it's an array slice it is in rank order, where rank 0 is the innermost dimension.
- mapper_id: OpenMP mapper map type modifier for this map capture. It's used to specify a user-defined mapper to be used for mapping.
- name: Holds the name of the variable as specified in the user clause (including bounds).
- partial_map: The record type being mapped will not be mapped in its entirety; it may be used, however, in a mapping to bind its mapped components together.

- OPERATION_NAME = 'omp.map.info'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (0, True)¶
- var_ptr() _ods_ir¶
- var_ptr_ptr() _ods_ir | None¶
- members() _ods_ir¶
- bounds() _ods_ir¶
- var_type() _ods_ir¶
- map_type() _ods_ir¶
- map_capture_type() _ods_ir¶
- members_index() _ods_ir | None¶
- mapper_id() _ods_ir | None¶
- name() _ods_ir | None¶
Returns the fully qualified name of the operation.
- partial_map() _ods_ir¶
- omp_ptr() _ods_ir¶
- mlir.dialects._omp_ops_gen.map_info(omp_ptr, var_ptr, var_type, map_type, map_capture_type, members, bounds, *, var_ptr_ptr=None, members_index=None, mapper_id=None, name=None, partial_map=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._omp_ops_gen.MaskedOp(*, filtered_thread_id=None, loc=None, ip=None)¶
Bases: _ods_ir

The masked construct allows specifying a structured block to be executed by a subset of threads of the current team.

If filter is specified, the masked construct masks the execution of the region to only the thread id filtered. Other threads executing the parallel region are not expected to execute the region specified within the masked directive. If filter is not specified, the master thread is expected to execute the region enclosed within the masked directive.

- OPERATION_NAME = 'omp.masked'¶
- _ODS_REGIONS = (1, True)¶
- filtered_thread_id() _ods_ir | None¶
- region() _ods_ir¶
- class mlir.dialects._omp_ops_gen.MasterOp(*, loc=None, ip=None)¶
Bases: _ods_ir

The master construct specifies a structured block that is executed by the master thread of the team.
- OPERATION_NAME = 'omp.master'¶
- _ODS_REGIONS = (1, True)¶
- region() _ods_ir¶
- class mlir.dialects._omp_ops_gen.NewCliOp(result, *, loc=None, ip=None)¶
Bases: _ods_ir

Create a new CLI that can be passed as an argument to a CanonicalLoopOp and to loop transformation operations to handle dependencies between loop transformation operations.
- OPERATION_NAME = 'omp.new_cli'¶
- _ODS_REGIONS = (0, True)¶
- result() _ods_ir¶
Shortcut to get an op result if it has only one (throws an error otherwise).
- mlir.dialects._omp_ops_gen.new_cli(result, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._omp_ops_gen.OrderedOp(doacross_depend_vars, *, doacross_depend_type=None, doacross_num_loops=None, loc=None, ip=None)¶
Bases: _ods_ir

The ordered construct without region is a stand-alone directive that specifies cross-iteration dependencies in a doacross loop nest.

The doacross_depend_type attribute refers to either the DEPEND(SOURCE) clause or the DEPEND(SINK: vec) clause.

The doacross_num_loops attribute specifies the number of loops in the doacross nest.

The doacross_depend_vars is a variadic list of operands that specifies the index of the loop iterator in the doacross nest for the DEPEND(SOURCE) clause or the index of the element of "vec" for the DEPEND(SINK: vec) clause. It contains the operands in multiple "vec" when multiple DEPEND(SINK: vec) clauses exist in one ORDERED directive.

- OPERATION_NAME = 'omp.ordered'¶
- _ODS_REGIONS = (0, True)¶
- doacross_depend_vars() _ods_ir¶
- doacross_depend_type() _ods_ir | None¶
- doacross_num_loops() _ods_ir | None¶
- mlir.dialects._omp_ops_gen.ordered(doacross_depend_vars, *, doacross_depend_type=None, doacross_num_loops=None, loc=None, ip=None) OrderedOp¶
- class mlir.dialects._omp_ops_gen.OrderedRegionOp(*, par_level_simd=None, loc=None, ip=None)¶
Bases: _ods_ir

The ordered construct with region specifies a structured block in a worksharing-loop, SIMD, or worksharing-loop SIMD region that is executed in the order of the loop iterations.

The par_level_simd attribute corresponds to the simd clause specified. If it is not present, it behaves as if the threads clause is specified or no clause is specified.

- OPERATION_NAME = 'omp.ordered.region'¶
- _ODS_REGIONS = (1, True)¶
- par_level_simd() bool¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.ordered_region(*, par_level_simd=None, loc=None, ip=None) OrderedRegionOp¶
- class mlir.dialects._omp_ops_gen.ParallelOp(allocate_vars, allocator_vars, private_vars, reduction_vars, *, if_expr=None, num_threads=None, private_syms=None, private_needs_barrier=None, proc_bind_kind=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, loc=None, ip=None)¶
Bases: _ods_ir

The parallel construct includes a region of code which is to be executed by a team of threads.

The optional if_expr parameter specifies a boolean result of a conditional check. If this value is 1 or is not provided then the parallel region runs as normal; if it is 0 then the parallel region is executed with one thread.

The allocator_vars and allocate_vars parameters are a variadic list of values that specify the memory allocator to be used to obtain storage for private values.

The optional num_threads parameter specifies the number of threads which should be used to execute the parallel region.

The optional proc_bind_kind attribute controls the thread affinity for the execution of the parallel region.

Reductions can be performed by specifying the reduction modifier (default, inscan or task) in reduction_mod, reduction accumulator variables in reduction_vars, symbols referring to reduction declarations in the reduction_syms attribute, and whether the reduction variable should be passed into the reduction region by value or by reference in reduction_byref. Each reduction is identified by the accumulator it uses, and accumulators must not be repeated in the same reduction. A private variable corresponding to the accumulator is used in place of the accumulator inside the body of the operation. The reduction declaration specifies how to combine the values from each iteration, section, team, thread or simd lane defined by the operation's region into the final value, which is available in the accumulator after they all complete.

- OPERATION_NAME = 'omp.parallel'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- allocate_vars() _ods_ir¶
- allocator_vars() _ods_ir¶
- if_expr() _ods_ir | None¶
- num_threads() _ods_ir | None¶
- private_vars() _ods_ir¶
- reduction_vars() _ods_ir¶
- private_syms() _ods_ir | None¶
- private_needs_barrier() bool¶
- proc_bind_kind() _ods_ir | None¶
- reduction_mod() _ods_ir | None¶
- reduction_byref() _ods_ir | None¶
- reduction_syms() _ods_ir | None¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.parallel(allocate_vars, allocator_vars, private_vars, reduction_vars, *, if_expr=None, num_threads=None, private_syms=None, private_needs_barrier=None, proc_bind_kind=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, loc=None, ip=None) ParallelOp¶
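A sketch of an empty parallel region; the four leading arguments are the allocate_vars, allocator_vars, private_vars and reduction_vars operand lists:

from mlir import ir
from mlir.dialects import openmp as omp

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    with ir.InsertionPoint(module.body):
        par = omp.ParallelOp([], [], [], [])
        with ir.InsertionPoint(par.region.blocks.append()):
            # ... code executed by every thread of the team ...
            omp.TerminatorOp()  # assumed builder for omp.terminator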
- class mlir.dialects._omp_ops_gen.PrivateClauseOp(sym_name, type_, data_sharing_type, *, loc=None, ip=None)¶
Bases: _ods_ir

This operation provides a declaration of how to implement the [first]privatization of a variable. The dialect users should provide which type should be allocated for this variable. The allocated (usually by alloca) variable is passed to the initialization region which does everything else (e.g. initialization of Fortran runtime descriptors). Information about how to initialize the copy from the original item should be given in the copy region, and if needed, how to deallocate memory (allocated by the initialization region) in the dealloc region.

Examples:

private(x) would not need any regions because no initialization is required by the standard for i32 variables and this is not firstprivate.

omp.private {type = private} @x.privatizer : i32

firstprivate(x) would be emitted as:

omp.private {type = firstprivate} @x.privatizer : i32 copy {
^bb0(%arg0: !fir.ref<i32>, %arg1: !fir.ref<i32>):
  // %arg0 is the original host variable.
  // %arg1 represents the memory allocated for this private variable.
  ... copy from host to the privatized clone ...
  omp.yield(%arg1 : !fir.ref<i32>)
}

private(x) for "allocatables" would be emitted as:

omp.private {type = private} @x.privatizer : !some.type init {
^bb0(%arg0: !some.pointer<!some.type>, %arg1: !some.pointer<!some.type>):
  // initialize %arg1, using %arg0 as a mold for allocations.
  // For example if %arg0 is a heap allocated array with a runtime determined
  // length and !some.type is a runtime type descriptor, the init region
  // will read the array length from %arg0, and heap allocate an array of the
  // right length and initialize %arg1 to contain the array allocation and
  // length.
  omp.yield(%arg1 : !some.pointer<!some.type>)
} dealloc {
^bb0(%arg0: !some.pointer<!some.type>):
  // ... deallocate memory allocated by the init region...
  // In the example above, this will free the heap allocated array data.
  omp.yield
}

There are no restrictions on the body except for:

- The dealloc region has a single argument.
- The init and copy regions have 2 arguments.
- All three regions are terminated by omp.yield ops.

The above restrictions and other obvious restrictions (e.g. verifying the type of yielded values) are verified by the custom op verifier. The actual contents of the blocks inside all regions are not verified.

Instances of this op would then be used by ops that model directives that accept data-sharing attribute clauses.

The sym_name attribute provides a symbol by which the privatizer op can be referenced by other dialect ops.

The type attribute is the type of the value being privatized. This type will be implicitly allocated in MLIR->LLVMIR conversion and passed as the second argument to the init region. Therefore the type of arguments to the regions should be a type which represents a pointer to type.

The data_sharing_type attribute specifies whether the privatizer corresponds to a private or a firstprivate clause.

- OPERATION_NAME = 'omp.private'¶
- _ODS_REGIONS = (3, True)¶
- sym_name() _ods_ir¶
- type_() _ods_ir¶
- data_sharing_type() _ods_ir¶
- init_region() _ods_ir¶
- copy_region() _ods_ir¶
- dealloc_region() _ods_ir¶
- mlir.dialects._omp_ops_gen.private(sym_name, type_, data_sharing_type, *, loc=None, ip=None) PrivateClauseOp¶
- class mlir.dialects._omp_ops_gen.ScanOp(inclusive_vars, exclusive_vars, *, loc=None, ip=None)¶
Bases: _ods_ir

The scan directive allows specifying scan reductions. It should be enclosed within a parent directive along with which a reduction clause with the inscan modifier must be specified. The scan directive allows splitting code blocks into an input phase and a scan phase in the region enclosed by the parent.

The inclusive clause is used on a separating directive that separates a structured block into two structured block sequences. If it is specified, the input phase includes the preceding structured block sequence and the scan phase includes the following structured block sequence.

The inclusive_vars is a variadic list of operands that specifies the scan-reduction accumulator symbols.

The exclusive clause is used on a separating directive that separates a structured block into two structured block sequences. If it is specified, the input phase excludes the preceding structured block sequence and instead includes the following structured block sequence, while the scan phase includes the preceding structured block sequence.

The exclusive_vars is a variadic list of operands that specifies the scan-reduction accumulator symbols.

- OPERATION_NAME = 'omp.scan'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (0, True)¶
- inclusive_vars() _ods_ir¶
- exclusive_vars() _ods_ir¶
- class mlir.dialects._omp_ops_gen.SectionOp(*, loc=None, ip=None)¶
Bases: _ods_ir

A section operation encloses a region which represents one section in a sections construct. A section op should always be surrounded by an omp.sections operation. The section operation may have block args which correspond to the block arguments of the surrounding omp.sections operation. This is done to reflect situations where these block arguments represent variables private to each section.
omp.sectionsoperation. The section operation may have block args which corespond to the block arguments of the surroundingomp.sectionsoperation. This is done to reflect situations where these block arguments represent variables private to each section.- OPERATION_NAME = 'omp.section'¶
- _ODS_REGIONS = (1, True)¶
- region() _ods_ir¶
- class mlir.dialects._omp_ops_gen.SectionsOp(allocate_vars, allocator_vars, private_vars, reduction_vars, *, nowait=None, private_syms=None, private_needs_barrier=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, loc=None, ip=None)¶
Bases: _ods_ir

The sections construct is a non-iterative worksharing construct that contains omp.section operations. The omp.section operations are to be distributed among and executed by the threads in a team. Each omp.section is executed once by one of the threads in the team in the context of its implicit task. Block arguments for reduction variables should be mirrored in enclosed omp.section operations.

The allocator_vars and allocate_vars parameters are a variadic list of values that specify the memory allocator to be used to obtain storage for private values.

The optional nowait attribute, when present, eliminates the implicit barrier at the end of the construct, so the parent operation can make progress even if the child operation has not completed yet.

Reductions can be performed by specifying the reduction modifier (default, inscan or task) in reduction_mod, reduction accumulator variables in reduction_vars, symbols referring to reduction declarations in the reduction_syms attribute, and whether the reduction variable should be passed into the reduction region by value or by reference in reduction_byref. Each reduction is identified by the accumulator it uses, and accumulators must not be repeated in the same reduction. A private variable corresponding to the accumulator is used in place of the accumulator inside the body of the operation. The reduction declaration specifies how to combine the values from each iteration, section, team, thread or simd lane defined by the operation's region into the final value, which is available in the accumulator after they all complete.

- OPERATION_NAME = 'omp.sections'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- allocate_vars() _ods_ir¶
- allocator_vars() _ods_ir¶
- private_vars() _ods_ir¶
- reduction_vars() _ods_ir¶
- nowait() bool¶
- private_syms() _ods_ir | None¶
- private_needs_barrier() bool¶
- reduction_mod() _ods_ir | None¶
- reduction_byref() _ods_ir | None¶
- reduction_syms() _ods_ir | None¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.sections(allocate_vars, allocator_vars, private_vars, reduction_vars, *, nowait=None, private_syms=None, private_needs_barrier=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, loc=None, ip=None) SectionsOp¶
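A sketch of a sections construct containing two sections; that each omp.section region and the enclosing omp.sections region end with omp.terminator is an assumption:

from mlir import ir
from mlir.dialects import openmp as omp

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    with ir.InsertionPoint(module.body):
        secs = omp.SectionsOp([], [], [], [])
        with ir.InsertionPoint(secs.region.blocks.append()):
            for _ in range(2):
                sec = omp.SectionOp()
                with ir.InsertionPoint(sec.region.blocks.append()):
                    # ... body of this section, run once by some thread ...
                    omp.TerminatorOp()
            omp.TerminatorOp()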
- class mlir.dialects._omp_ops_gen.SimdOp(aligned_vars, linear_vars, linear_step_vars, nontemporal_vars, private_vars, reduction_vars, *, alignments=None, if_expr=None, order=None, order_mod=None, private_syms=None, private_needs_barrier=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, safelen=None, simdlen=None, loc=None, ip=None)¶
Bases: _ods_ir

The simd construct can be applied to a loop to indicate that the loop can be transformed into a SIMD loop (that is, multiple iterations of the loop can be executed concurrently using SIMD instructions).

The body region can only contain a single block which must contain a single operation. This operation must be another compatible loop wrapper or an omp.loop_nest.

omp.simd <clauses> {
  omp.loop_nest (%i1, %i2) : index = (%c0, %c0) to (%c10, %c10) step (%c1, %c1) {
    %a = load %arrA[%i1, %i2] : memref<?x?xf32>
    %b = load %arrB[%i1, %i2] : memref<?x?xf32>
    %sum = arith.addf %a, %b : f32
    store %sum, %arrC[%i1, %i2] : memref<?x?xf32>
    omp.yield
  }
}

When an if clause is present and evaluates to false, the preferred number of iterations to be executed concurrently is one, regardless of whether a simdlen clause is specified.

The alignments attribute additionally specifies the alignment of each corresponding aligned operand. Note that aligned_vars and alignments must contain the same number of elements.

The linear_step_vars operand additionally specifies the step for each associated linear operand. Note that the linear_vars and linear_step_vars variadic lists should contain the same number of elements.

The optional nontemporal attribute specifies variables which have low temporal locality across the iterations where they are accessed.

The optional order attribute specifies which order the iterations of the associated loops are executed in. Currently the only option for this attribute is "concurrent".

Reductions can be performed by specifying the reduction modifier (default, inscan or task) in reduction_mod, reduction accumulator variables in reduction_vars, symbols referring to reduction declarations in the reduction_syms attribute, and whether the reduction variable should be passed into the reduction region by value or by reference in reduction_byref. Each reduction is identified by the accumulator it uses, and accumulators must not be repeated in the same reduction. A private variable corresponding to the accumulator is used in place of the accumulator inside the body of the operation. The reduction declaration specifies how to combine the values from each iteration, section, team, thread or simd lane defined by the operation's region into the final value, which is available in the accumulator after they all complete.

The safelen clause specifies that no two concurrent iterations within a SIMD chunk can have a distance in the logical iteration space that is greater than or equal to the value given in the clause.

When a simdlen clause is present, the preferred number of iterations to be executed concurrently is the value provided to the simdlen clause.

- OPERATION_NAME = 'omp.simd'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- aligned_vars() _ods_ir¶
- if_expr() _ods_ir | None¶
- linear_vars() _ods_ir¶
- linear_step_vars() _ods_ir¶
- nontemporal_vars() _ods_ir¶
- private_vars() _ods_ir¶
- reduction_vars() _ods_ir¶
- alignments() _ods_ir | None¶
- order() _ods_ir | None¶
- order_mod() _ods_ir | None¶
- private_syms() _ods_ir | None¶
- private_needs_barrier() bool¶
- reduction_mod() _ods_ir | None¶
- reduction_byref() _ods_ir | None¶
- reduction_syms() _ods_ir | None¶
- safelen() _ods_ir | None¶
- simdlen() _ods_ir | None¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.simd(aligned_vars, linear_vars, linear_step_vars, nontemporal_vars, private_vars, reduction_vars, *, alignments=None, if_expr=None, order=None, order_mod=None, private_syms=None, private_needs_barrier=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, safelen=None, simdlen=None, loc=None, ip=None) SimdOp¶
- class mlir.dialects._omp_ops_gen.SingleOp(allocate_vars, allocator_vars, copyprivate_vars, private_vars, *, copyprivate_syms=None, nowait=None, private_syms=None, private_needs_barrier=None, loc=None, ip=None)¶
Bases: _ods_ir

The single construct specifies that the associated structured block is executed by only one of the threads in the team (not necessarily the master thread), in the context of its implicit task. The other threads in the team, which do not execute the block, wait at an implicit barrier at the end of the single construct.

The allocator_vars and allocate_vars parameters are a variadic list of values that specify the memory allocator to be used to obtain storage for private values.

If copyprivate variables and functions are specified, then each thread variable is updated with the variable value of the thread that executed the single region, using the specified copy functions.

The optional nowait attribute, when present, eliminates the implicit barrier at the end of the construct, so the parent operation can make progress even if the child operation has not completed yet.

- OPERATION_NAME = 'omp.single'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- allocate_vars() _ods_ir¶
- allocator_vars() _ods_ir¶
- copyprivate_vars() _ods_ir¶
- private_vars() _ods_ir¶
- copyprivate_syms() _ods_ir | None¶
- nowait() bool¶
- private_syms() _ods_ir | None¶
- private_needs_barrier() bool¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.single(allocate_vars, allocator_vars, copyprivate_vars, private_vars, *, copyprivate_syms=None, nowait=None, private_syms=None, private_needs_barrier=None, loc=None, ip=None) SingleOp¶
- class mlir.dialects._omp_ops_gen.TargetAllocMemOp(result, device, in_type, typeparams, shape, *, uniq_name=None, bindc_name=None, loc=None, ip=None)¶
Bases: _ods_ir

Allocates memory on the specified OpenMP device for an object of the given type. Returns an integer value representing the device pointer to the allocated memory. The memory is uninitialized after allocation. Operations must be paired with omp.target_freemem to avoid memory leaks.

- $device: The integer ID of the OpenMP device where the memory will be allocated.
- $in_type: The type of the object for which memory is being allocated. For arrays, this can be a static or dynamic array type.
- $uniq_name: An optional unique name for the allocated memory.
- $bindc_name: An optional name used for C interoperability.
- $typeparams: Runtime type parameters for polymorphic or parameterized types. These are typically integer values that define aspects of a type not fixed at compile time.
- $shape: Runtime shape operands for dynamic arrays. Each operand is an integer value representing the extent of a specific dimension.

// Allocate a static 3x3 integer vector on device 0
%device_0 = arith.constant 0 : i32
%ptr_static = omp.target_allocmem %device_0 : i32, vector<3x3xi32>
// ... use %ptr_static ...
omp.target_freemem %device_0, %ptr_static : i32, i64

// Allocate a dynamic 2D Fortran array (fir.array) on device 1
%device_1 = arith.constant 1 : i32
%rows = arith.constant 10 : index
%cols = arith.constant 20 : index
%ptr_dynamic = omp.target_allocmem %device_1 : i32, !fir.array<?x?xf32>, %rows, %cols : index, index
// ... use %ptr_dynamic ...
omp.target_freemem %device_1, %ptr_dynamic : i32, i64
- OPERATION_NAME = 'omp.target_allocmem'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (0, True)¶
- device() _ods_ir¶
- typeparams() _ods_ir¶
- shape() _ods_ir¶
- in_type() _ods_ir¶
- uniq_name() _ods_ir | None¶
- bindc_name() _ods_ir | None¶
- mlir.dialects._omp_ops_gen.target_allocmem(result, device, in_type, typeparams, shape, *, uniq_name=None, bindc_name=None, loc=None, ip=None) _ods_ir¶
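A sketch of the static allocation example above driven from Python, pairing target_allocmem with TargetFreeMemOp; passing the vector type directly for in_type is an assumption about the generated attribute builder:

from mlir import ir
from mlir.dialects import arith, openmp as omp

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    i32 = ir.IntegerType.get_signless(32)
    i64 = ir.IntegerType.get_signless(64)
    with ir.InsertionPoint(module.body):
        device = arith.ConstantOp(i32, 0).result
        # Allocate a 3x3 i32 vector on device 0; the result is an i64 device
        # pointer. No type parameters or dynamic shape operands are needed.
        ptr = omp.target_allocmem(
            i64, device, ir.VectorType.get([3, 3], i32), [], [])
        # ... use ptr ...
        omp.TargetFreeMemOp(device, ptr)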
- class mlir.dialects._omp_ops_gen.TargetDataOp(map_vars, use_device_addr_vars, use_device_ptr_vars, *, device=None, if_expr=None, loc=None, ip=None)¶
Bases: _ods_ir

Map variables to a device data environment for the extent of the region.

The omp target data directive maps variables to a device data environment, and defines the lexical scope of the data environment that is created. The omp target data directive can reduce data copies to and from the offloading device when multiple target regions are using the same data.

The optional if_expr parameter specifies a boolean result of a conditional check. If this value is 1 or is not provided then the target region runs on a device; if it is 0 then the target region is executed on the host device.

The optional device parameter specifies the device number for the target region.

The optional map_vars maps data from the current task's data environment to the device data environment.

The optional use_device_addr_vars specifies the address of the objects in the device data environment.

The optional use_device_ptr_vars specifies the device pointers to the corresponding list items in the device data environment.

- OPERATION_NAME = 'omp.target_data'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- device() _ods_ir | None¶
- if_expr() _ods_ir | None¶
- map_vars() _ods_ir¶
- use_device_addr_vars() _ods_ir¶
- use_device_ptr_vars() _ods_ir¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.target_data(map_vars, use_device_addr_vars, use_device_ptr_vars, *, device=None, if_expr=None, loc=None, ip=None) TargetDataOp¶
- class mlir.dialects._omp_ops_gen.TargetEnterDataOp(depend_vars, map_vars, *, depend_kinds=None, device=None, if_expr=None, nowait=None, loc=None, ip=None)¶
Bases:
_ods_ir

The target enter data directive specifies that variables are mapped to a device data environment. The target enter data directive is a stand-alone directive.
The optional if_expr parameter specifies a boolean result of a conditional check. If this value is 1 or is not provided then the target region runs on a device, if it is 0 then the target region is executed on the host device.

The depend_kinds and depend_vars arguments are variadic lists of values that specify the dependencies of this particular task in relation to other tasks.

The optional device parameter specifies the device number for the target region.

The optional map_vars maps data from the current task’s data environment to the device data environment.

The optional nowait attribute, when present, eliminates the implicit barrier at the end of the construct, so the parent operation can make progress even if the child operation has not completed yet.

- OPERATION_NAME = 'omp.target_enter_data'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (0, True)¶
- depend_vars() _ods_ir¶
- device() _ods_ir | None¶
- if_expr() _ods_ir | None¶
- map_vars() _ods_ir¶
- depend_kinds() _ods_ir | None¶
- nowait() bool¶
- mlir.dialects._omp_ops_gen.target_enter_data(depend_vars, map_vars, *, depend_kinds=None, device=None, if_expr=None, nowait=None, loc=None, ip=None) TargetEnterDataOp¶
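A sketch of the stand-alone directive builder; omp.target_exit_data and omp.target_update below take the same (depend_vars, map_vars) signature and can be driven identically. The empty map list is only a structural placeholder (a real use would pass omp.map.info results, and the verifier may require at least one), while nowait=True drops the implicit barrier as described above.

from mlir.ir import Context, InsertionPoint, IntegerType, Location, Module
from mlir.dialects import arith
from mlir.dialects import _omp_ops_gen as omp

with Context(), Location.unknown():
    module = Module.create()
    with InsertionPoint(module.body):
        i32 = IntegerType.get_signless(32)
        dev = arith.constant(i32, 0)
        # Stand-alone directive: no region to populate.
        omp.target_enter_data([], [], device=dev, nowait=True)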
- class mlir.dialects._omp_ops_gen.TargetExitDataOp(depend_vars, map_vars, *, depend_kinds=None, device=None, if_expr=None, nowait=None, loc=None, ip=None)¶
Bases:
_ods_ir

The target exit data directive specifies that variables are mapped to a device data environment. The target exit data directive is a stand-alone directive.
The optional if_expr parameter specifies a boolean result of a conditional check. If this value is 1 or is not provided then the target region runs on a device, if it is 0 then the target region is executed on the host device.

The depend_kinds and depend_vars arguments are variadic lists of values that specify the dependencies of this particular task in relation to other tasks.

The optional device parameter specifies the device number for the target region.

The optional map_vars maps data from the current task’s data environment to the device data environment.

The optional nowait attribute, when present, eliminates the implicit barrier at the end of the construct, so the parent operation can make progress even if the child operation has not completed yet.

- OPERATION_NAME = 'omp.target_exit_data'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (0, True)¶
- depend_vars() _ods_ir¶
- device() _ods_ir | None¶
- if_expr() _ods_ir | None¶
- map_vars() _ods_ir¶
- depend_kinds() _ods_ir | None¶
- nowait() bool¶
- mlir.dialects._omp_ops_gen.target_exit_data(depend_vars, map_vars, *, depend_kinds=None, device=None, if_expr=None, nowait=None, loc=None, ip=None) TargetExitDataOp¶
- class mlir.dialects._omp_ops_gen.TargetFreeMemOp(device, heapref, *, loc=None, ip=None)¶
Bases:
_ods_ir

Deallocates memory on the specified OpenMP device that was previously allocated by an omp.target_allocmem operation. After this operation, the deallocated memory is in an undefined state and should not be accessed. It is crucial to ensure that all accesses to the memory region are completed before omp.target_freemem is called to avoid undefined behavior.

$device: The integer ID of the OpenMP device from which the memory will be freed.
$heapref: The integer value representing the device pointer to the memory to be deallocated, which was previously returned by omp.target_allocmem.

// Example of allocating and freeing memory on an OpenMP device
%device_id = arith.constant 0 : i32
%allocated_ptr = omp.target_allocmem %device_id : i32, vector<3x3xi32>
// ... operations using %allocated_ptr on the device ...
omp.target_freemem %device_id, %allocated_ptr : i32, i64
- OPERATION_NAME = 'omp.target_freemem'¶
- _ODS_REGIONS = (0, True)¶
- device() _ods_ir¶
- heapref() _ods_ir¶
- mlir.dialects._omp_ops_gen.target_freemem(device, heapref, *, loc=None, ip=None) TargetFreeMemOp¶
- class mlir.dialects._omp_ops_gen.TargetOp(allocate_vars, allocator_vars, depend_vars, has_device_addr_vars, host_eval_vars, in_reduction_vars, is_device_ptr_vars, map_vars, private_vars, *, bare=None, depend_kinds=None, device=None, if_expr=None, in_reduction_byref=None, in_reduction_syms=None, nowait=None, private_syms=None, private_needs_barrier=None, thread_limit=None, private_maps=None, loc=None, ip=None)¶
Bases:
_ods_ir

The target construct includes a region of code which is to be executed on a device.
The optional if_expr parameter specifies a boolean result of a conditional check. If this value is 1 or is not provided then the target region runs on a device, if it is 0 then the target region is executed on the host device.

The private_maps attribute connects private operands to their corresponding map operands. For private operands that require a map, the value of the corresponding element in the attribute is the index of the map operand (relative to other map operands, not the whole operands of the operation). For private operands that do not require a map, this value is -1 (which is omitted from the assembly format printing).

The allocator_vars and allocate_vars parameters are a variadic list of values that specify the memory allocator to be used to obtain storage for private values.

ompx_bare allows omp target teams to be executed on a GPU with an explicit number of teams and threads. This clause also allows the teams and threads sizes to have up to 3 dimensions.

The depend_kinds and depend_vars arguments are variadic lists of values that specify the dependencies of this particular task in relation to other tasks.

The optional device parameter specifies the device number for the target region.

The optional has_device_addr_vars indicates that list items already have device addresses, so they may be directly accessed from the target device. This includes array sections.

The optional host_eval_vars holds values defined outside of the region of the IsolatedFromAbove operation for which a corresponding entry block argument is defined. The only legal uses for these captured values are the following:

- num_teams or thread_limit clause of an immediately nested omp.teams operation.
- If the operation is the top-level omp.target of a target SPMD kernel:
  - num_threads clause of the nested omp.parallel operation.
  - Bounds and steps of the nested omp.loop_nest operation.

The optional is_device_ptr_vars indicates list items are device pointers.

The optional map_vars maps data from the current task’s data environment to the device data environment.

The optional nowait attribute, when present, eliminates the implicit barrier at the end of the construct, so the parent operation can make progress even if the child operation has not completed yet.

The optional thread_limit specifies the limit on the number of threads.

- OPERATION_NAME = 'omp.target'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- allocate_vars() _ods_ir¶
- allocator_vars() _ods_ir¶
- depend_vars() _ods_ir¶
- device() _ods_ir | None¶
- has_device_addr_vars() _ods_ir¶
- host_eval_vars() _ods_ir¶
- if_expr() _ods_ir | None¶
- in_reduction_vars() _ods_ir¶
- is_device_ptr_vars() _ods_ir¶
- map_vars() _ods_ir¶
- private_vars() _ods_ir¶
- thread_limit() _ods_ir | None¶
- bare() bool¶
- depend_kinds() _ods_ir | None¶
- in_reduction_byref() _ods_ir | None¶
- in_reduction_syms() _ods_ir | None¶
- nowait() bool¶
- private_syms() _ods_ir | None¶
- private_needs_barrier() bool¶
- private_maps() _ods_ir | None¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.target(allocate_vars, allocator_vars, depend_vars, has_device_addr_vars, host_eval_vars, in_reduction_vars, is_device_ptr_vars, map_vars, private_vars, *, bare=None, depend_kinds=None, device=None, if_expr=None, in_reduction_byref=None, in_reduction_syms=None, nowait=None, private_syms=None, private_needs_barrier=None, thread_limit=None, private_maps=None, loc=None, ip=None) TargetOp¶
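A structural sketch of building omp.target from Python. The nine leading operand lists follow the builder signature above and are all left empty here, which is enough to show the region plumbing but does not form a meaningful kernel; with empty map and private lists the entry block takes no arguments.

from mlir.ir import Context, InsertionPoint, Location, Module
from mlir.dialects import _omp_ops_gen as omp

with Context(), Location.unknown():
    module = Module.create()
    with InsertionPoint(module.body):
        # allocate, allocator, depend, has_device_addr, host_eval,
        # in_reduction, is_device_ptr, map, private vars (all empty).
        target_op = omp.TargetOp([], [], [], [], [], [], [], [], [],
                                 nowait=True)
        body = target_op.region.blocks.append()
        with InsertionPoint(body):
            # ... the code to execute on the device ...
            omp.terminator()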
- class mlir.dialects._omp_ops_gen.TargetUpdateOp(depend_vars, map_vars, *, depend_kinds=None, device=None, if_expr=None, nowait=None, loc=None, ip=None)¶
Bases:
_ods_ir

The target update directive makes the corresponding list items in the device data environment consistent with their original list items, according to the specified motion clauses. The target update construct is a stand-alone directive.
The optional if_expr parameter specifies a boolean result of a conditional check. If this value is 1 or is not provided then the target region runs on a device, if it is 0 then the target region is executed on the host device.

We use MapInfoOp to model the motion clauses and their modifiers. Even though the spec differentiates between map-types & map-type-modifiers vs. motion-clauses & motion-modifiers, the motion clauses and their modifiers are a subset of map types and their modifiers. The subset relation is handled during verification to make sure the restrictions for target update are respected.

The depend_kinds and depend_vars arguments are variadic lists of values that specify the dependencies of this particular task in relation to other tasks.

The optional device parameter specifies the device number for the target region.

The optional map_vars maps data from the current task’s data environment to the device data environment.

The optional nowait attribute, when present, eliminates the implicit barrier at the end of the construct, so the parent operation can make progress even if the child operation has not completed yet.

- OPERATION_NAME = 'omp.target_update'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (0, True)¶
- depend_vars() _ods_ir¶
- device() _ods_ir | None¶
- if_expr() _ods_ir | None¶
- map_vars() _ods_ir¶
- depend_kinds() _ods_ir | None¶
- nowait() bool¶
- mlir.dialects._omp_ops_gen.target_update(depend_vars, map_vars, *, depend_kinds=None, device=None, if_expr=None, nowait=None, loc=None, ip=None) TargetUpdateOp¶
- class mlir.dialects._omp_ops_gen.TaskOp(allocate_vars, allocator_vars, depend_vars, in_reduction_vars, private_vars, *, depend_kinds=None, final=None, if_expr=None, in_reduction_byref=None, in_reduction_syms=None, mergeable=None, priority=None, private_syms=None, private_needs_barrier=None, untied=None, event_handle=None, loc=None, ip=None)¶
Bases:
_ods_ir

The task construct defines an explicit task.
For definitions of “undeferred task”, “included task”, “final task” and “mergeable task”, please check the OpenMP Specification.
When an if clause is present on a task construct, and the value of if_expr evaluates to false, an “undeferred task” is generated, and the encountering thread must suspend the current task region, for which execution cannot be resumed until execution of the structured block that is associated with the generated task is completed.

The in_reduction clause specifies that this particular task (among all the tasks in the current taskgroup, if any) participates in a reduction. in_reduction_byref indicates whether each reduction variable should be passed by value or by reference.

The allocator_vars and allocate_vars parameters are a variadic list of values that specify the memory allocator to be used to obtain storage for private values.

The depend_kinds and depend_vars arguments are variadic lists of values that specify the dependencies of this particular task in relation to other tasks.

When a final clause is present and the final clause expression evaluates to true, the generated tasks will be final tasks. All task constructs encountered during execution of a final task will generate final and included tasks. The use of a variable in a final clause expression causes an implicit reference to the variable in all enclosing constructs.

When the mergeable clause is present, the tasks generated by the construct are “mergeable tasks”.

The priority clause is a hint for the priority of the generated tasks. The priority is a non-negative integer expression that provides a hint for task execution order. Among all tasks ready to be executed, higher priority tasks (those with a higher numerical value in the priority clause expression) are recommended to execute before lower priority ones. The default priority-value when no priority clause is specified should be assumed to be zero (the lowest priority).

If the untied clause is present on a task construct, any thread in the team can resume the task region after a suspension. The untied clause is ignored if a final clause is present on the same task construct and the final expression evaluates to true, or if a task is an included task.

The detach clause specifies that the task generated by the construct on which it appears is a detachable task. A new allow-completion event is created and connected to the completion of the associated task region. The original event-handle is updated to represent that allow-completion event before the task data environment is created.
- OPERATION_NAME = 'omp.task'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- allocate_vars() _ods_ir¶
- allocator_vars() _ods_ir¶
- depend_vars() _ods_ir¶
- final() _ods_ir | None¶
- if_expr() _ods_ir | None¶
- in_reduction_vars() _ods_ir¶
- priority() _ods_ir | None¶
- private_vars() _ods_ir¶
- event_handle() _ods_ir | None¶
- depend_kinds() _ods_ir | None¶
- in_reduction_byref() _ods_ir | None¶
- in_reduction_syms() _ods_ir | None¶
- mergeable() bool¶
- private_syms() _ods_ir | None¶
- private_needs_barrier() bool¶
- untied() bool¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.task(allocate_vars, allocator_vars, depend_vars, in_reduction_vars, private_vars, *, depend_kinds=None, final=None, if_expr=None, in_reduction_byref=None, in_reduction_syms=None, mergeable=None, priority=None, private_syms=None, private_needs_barrier=None, untied=None, event_handle=None, loc=None, ip=None) TaskOp¶
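A sketch of creating an explicit task with a priority hint from Python. The five leading lists (allocate, allocator, depend, in_reduction, private vars) are empty; the priority operand is an i32 value as described above, and the variable names are illustrative.

from mlir.ir import Context, InsertionPoint, IntegerType, Location, Module
from mlir.dialects import arith
from mlir.dialects import _omp_ops_gen as omp

with Context(), Location.unknown():
    module = Module.create()
    with InsertionPoint(module.body):
        i32 = IntegerType.get_signless(32)
        prio = arith.constant(i32, 10)
        # An untied task with a priority hint of 10.
        task_op = omp.TaskOp([], [], [], [], [], priority=prio, untied=True)
        body = task_op.region.blocks.append()
        with InsertionPoint(body):
            # ... the task's structured block ...
            omp.terminator()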
- class mlir.dialects._omp_ops_gen.TaskgroupOp(allocate_vars, allocator_vars, task_reduction_vars, *, task_reduction_byref=None, task_reduction_syms=None, loc=None, ip=None)¶
Bases:
_ods_ir

The taskgroup construct specifies a wait on completion of child tasks of the current task and their descendent tasks.
When a thread encounters a taskgroup construct, it starts executing the region. All child tasks generated in the taskgroup region and all of their descendants that bind to the same parallel region as the taskgroup region are part of the taskgroup set associated with the taskgroup region. There is an implicit task scheduling point at the end of the taskgroup region. The current task is suspended at the task scheduling point until all tasks in the taskgroup set complete execution.
The allocator_vars and allocate_vars parameters are a variadic list of values that specify the memory allocator to be used to obtain storage for private values.

The task_reduction clause specifies a reduction among tasks. For each list item, the number of copies is unspecified. Any copies associated with the reduction are initialized before they are accessed by the tasks participating in the reduction. After the end of the region, the original list item contains the result of the reduction. Similarly to the reduction clause, accumulator variables must be passed in task_reduction_vars, symbols referring to reduction declarations in the task_reduction_syms attribute, and whether the reduction variable should be passed into the reduction region by value or by reference in task_reduction_byref.

- OPERATION_NAME = 'omp.taskgroup'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- allocate_vars() _ods_ir¶
- allocator_vars() _ods_ir¶
- task_reduction_vars() _ods_ir¶
- task_reduction_byref() _ods_ir | None¶
- task_reduction_syms() _ods_ir | None¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.taskgroup(allocate_vars, allocator_vars, task_reduction_vars, *, task_reduction_byref=None, task_reduction_syms=None, loc=None, ip=None) TaskgroupOp¶
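A sketch of a taskgroup wrapping a child task, mirroring the wait-on-children semantics described above. All variadic operand lists are left empty; each region is closed with omp.terminator.

from mlir.ir import Context, InsertionPoint, Location, Module
from mlir.dialects import _omp_ops_gen as omp

with Context(), Location.unknown():
    module = Module.create()
    with InsertionPoint(module.body):
        # allocate_vars, allocator_vars, task_reduction_vars (all empty).
        tg = omp.TaskgroupOp([], [], [])
        tg_body = tg.region.blocks.append()
        with InsertionPoint(tg_body):
            child = omp.TaskOp([], [], [], [], [])
            child_body = child.region.blocks.append()
            with InsertionPoint(child_body):
                omp.terminator()  # ends the child task's block
            omp.terminator()  # ends the taskgroup region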
- class mlir.dialects._omp_ops_gen.TaskloopOp(allocate_vars, allocator_vars, in_reduction_vars, private_vars, reduction_vars, *, final=None, grainsize_mod=None, grainsize=None, if_expr=None, in_reduction_byref=None, in_reduction_syms=None, mergeable=None, nogroup=None, num_tasks_mod=None, num_tasks=None, priority=None, private_syms=None, private_needs_barrier=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, untied=None, loc=None, ip=None)¶
Bases:
_ods_ir

The taskloop construct specifies that the iterations of one or more associated loops will be executed in parallel using explicit tasks. The iterations are distributed across tasks generated by the construct and scheduled to be executed.

The body region can only contain a single block which must contain a single operation. This operation must be another compatible loop wrapper or an omp.loop_nest.

omp.taskloop <clauses> {
  omp.loop_nest (%i1, %i2) : index = (%c0, %c0) to (%c10, %c10) step (%c1, %c1) {
    %a = load %arrA[%i1, %i2] : memref<?x?xf32>
    %b = load %arrB[%i1, %i2] : memref<?x?xf32>
    %sum = arith.addf %a, %b : f32
    store %sum, %arrC[%i1, %i2] : memref<?x?xf32>
    omp.yield
  }
}

For definitions of “undeferred task”, “included task”, “final task” and “mergeable task”, please check the OpenMP Specification.
When an if clause is present on a taskloop construct, and if the if clause expression evaluates to false, undeferred tasks are generated. The use of a variable in an if clause expression of a taskloop construct causes an implicit reference to the variable in all enclosing constructs.

The allocator_vars and allocate_vars parameters are a variadic list of values that specify the memory allocator to be used to obtain storage for private values.

When a final clause is present and the final clause expression evaluates to true, the generated tasks will be final tasks. All task constructs encountered during execution of a final task will generate final and included tasks. The use of a variable in a final clause expression causes an implicit reference to the variable in all enclosing constructs.

If a grainsize clause is present, the number of logical loop iterations assigned to each generated task is greater than or equal to the minimum of the value of the grain-size expression and the number of logical loop iterations, but less than two times the value of the grain-size expression.

When the mergeable clause is present, the tasks generated by the construct are “mergeable tasks”.

By default, the taskloop construct executes as if it was enclosed in a taskgroup construct with no statements or directives outside of the taskloop construct. Thus, the taskloop construct creates an implicit taskgroup region. If the nogroup clause is present, no implicit taskgroup region is created.

If num_tasks is specified, the taskloop construct creates as many tasks as the minimum of the num-tasks expression and the number of logical loop iterations. Each task must have at least one logical loop iteration.

The priority clause is a hint for the priority of the generated tasks. The priority is a non-negative integer expression that provides a hint for task execution order. Among all tasks ready to be executed, higher priority tasks (those with a higher numerical value in the priority clause expression) are recommended to execute before lower priority ones. The default priority-value when no priority clause is specified should be assumed to be zero (the lowest priority).

Reductions can be performed by specifying the reduction modifier (default, inscan or task) in reduction_mod, reduction accumulator variables in reduction_vars, symbols referring to reduction declarations in the reduction_syms attribute, and whether the reduction variable should be passed into the reduction region by value or by reference in reduction_byref. Each reduction is identified by the accumulator it uses and accumulators must not be repeated in the same reduction. A private variable corresponding to the accumulator is used in place of the accumulator inside the body of the operation. The reduction declaration specifies how to combine the values from each iteration, section, team, thread or simd lane defined by the operation’s region into the final value, which is available in the accumulator after they all complete.

If the untied clause is present on a task construct, any thread in the team can resume the task region after a suspension. The untied clause is ignored if a final clause is present on the same task construct and the final expression evaluates to true, or if a task is an included task.

If an in_reduction clause is present on the taskloop construct, the behavior is as if each generated task was defined by a task construct on which an in_reduction clause with the same reduction operator and list items is present. Thus, the generated tasks are participants of a reduction previously defined by a reduction scoping clause. In this case, accumulator variables are specified in in_reduction_vars, symbols referring to reduction declarations in in_reduction_syms and in_reduction_byref indicate for each reduction variable whether it should be passed by value or by reference.

If a reduction clause is present on the taskloop construct, the behavior is as if a task_reduction clause with the same reduction operator and list items was applied to the implicit taskgroup construct enclosing the taskloop construct. The taskloop construct executes as if each generated task was defined by a task construct on which an in_reduction clause with the same reduction operator and list items is present. Thus, the generated tasks are participants of the reduction defined by the task_reduction clause that was applied to the implicit taskgroup construct.

- OPERATION_NAME = 'omp.taskloop'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- allocate_vars() _ods_ir¶
- allocator_vars() _ods_ir¶
- final() _ods_ir | None¶
- grainsize() _ods_ir | None¶
- if_expr() _ods_ir | None¶
- in_reduction_vars() _ods_ir¶
- num_tasks() _ods_ir | None¶
- priority() _ods_ir | None¶
- private_vars() _ods_ir¶
- reduction_vars() _ods_ir¶
- grainsize_mod() _ods_ir | None¶
- in_reduction_byref() _ods_ir | None¶
- in_reduction_syms() _ods_ir | None¶
- mergeable() bool¶
- nogroup() bool¶
- num_tasks_mod() _ods_ir | None¶
- private_syms() _ods_ir | None¶
- private_needs_barrier() bool¶
- reduction_mod() _ods_ir | None¶
- reduction_byref() _ods_ir | None¶
- reduction_syms() _ods_ir | None¶
- untied() bool¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.taskloop(allocate_vars, allocator_vars, in_reduction_vars, private_vars, reduction_vars, *, final=None, grainsize_mod=None, grainsize=None, if_expr=None, in_reduction_byref=None, in_reduction_syms=None, mergeable=None, nogroup=None, num_tasks_mod=None, num_tasks=None, priority=None, private_syms=None, private_needs_barrier=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, untied=None, loc=None, ip=None) TaskloopOp¶
- class mlir.dialects._omp_ops_gen.TaskwaitOp(depend_vars, *, depend_kinds=None, nowait=None, loc=None, ip=None)¶
Bases:
_ods_ir

The taskwait construct specifies a wait on the completion of child tasks of the current task.
The depend_kinds and depend_vars arguments are variadic lists of values that specify the dependencies of this particular task in relation to other tasks.

The optional nowait attribute, when present, eliminates the implicit barrier at the end of the construct, so the parent operation can make progress even if the child operation has not completed yet.

- OPERATION_NAME = 'omp.taskwait'¶
- _ODS_REGIONS = (0, True)¶
- depend_vars() _ods_ir¶
- depend_kinds() _ods_ir | None¶
- nowait() bool¶
- mlir.dialects._omp_ops_gen.taskwait(depend_vars, *, depend_kinds=None, nowait=None, loc=None, ip=None) TaskwaitOp¶
- class mlir.dialects._omp_ops_gen.TaskyieldOp(*, loc=None, ip=None)¶
Bases:
_ods_ir

The taskyield construct specifies that the current task can be suspended in favor of execution of a different task.
- OPERATION_NAME = 'omp.taskyield'¶
- _ODS_REGIONS = (0, True)¶
- mlir.dialects._omp_ops_gen.taskyield(*, loc=None, ip=None) TaskyieldOp¶
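Both stand-alone task-synchronization directives have trivial builders; taskwait additionally accepts depend_vars/depend_kinds and nowait, per its signature above. A minimal sketch:

from mlir.ir import Context, InsertionPoint, Location, Module
from mlir.dialects import _omp_ops_gen as omp

with Context(), Location.unknown():
    module = Module.create()
    with InsertionPoint(module.body):
        omp.taskwait([])   # no dependency list: wait on all child tasks
        omp.taskyield()    # allow the current task to be suspended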
- class mlir.dialects._omp_ops_gen.TeamsOp(allocate_vars, allocator_vars, private_vars, reduction_vars, *, if_expr=None, num_teams_lower=None, num_teams_upper=None, private_syms=None, private_needs_barrier=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, thread_limit=None, loc=None, ip=None)¶
Bases:
_ods_ir

The teams construct defines a region of code that triggers the creation of a league of teams. Once created, the number of teams remains constant for the duration of its code region.
If the if_expr is present and it evaluates to false, the number of teams created is one.

The allocator_vars and allocate_vars parameters are a variadic list of values that specify the memory allocator to be used to obtain storage for private values.

The optional num_teams_upper and num_teams_lower arguments specify the limit on the number of teams to be created. If only the upper bound is specified, it acts as if the lower bound was set to the same value. It is not allowed to set num_teams_lower if num_teams_upper is not specified. They define a closed range, where both the lower and upper bounds are included.

Reductions can be performed by specifying the reduction modifier (default, inscan or task) in reduction_mod, reduction accumulator variables in reduction_vars, symbols referring to reduction declarations in the reduction_syms attribute, and whether the reduction variable should be passed into the reduction region by value or by reference in reduction_byref. Each reduction is identified by the accumulator it uses and accumulators must not be repeated in the same reduction. A private variable corresponding to the accumulator is used in place of the accumulator inside the body of the operation. The reduction declaration specifies how to combine the values from each iteration, section, team, thread or simd lane defined by the operation’s region into the final value, which is available in the accumulator after they all complete.

The optional thread_limit specifies the limit on the number of threads.

- OPERATION_NAME = 'omp.teams'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- allocate_vars() _ods_ir¶
- allocator_vars() _ods_ir¶
- if_expr() _ods_ir | None¶
- num_teams_lower() _ods_ir | None¶
- num_teams_upper() _ods_ir | None¶
- private_vars() _ods_ir¶
- reduction_vars() _ods_ir¶
- thread_limit() _ods_ir | None¶
- private_syms() _ods_ir | None¶
- private_needs_barrier() bool¶
- reduction_mod() _ods_ir | None¶
- reduction_byref() _ods_ir | None¶
- reduction_syms() _ods_ir | None¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.teams(allocate_vars, allocator_vars, private_vars, reduction_vars, *, if_expr=None, num_teams_lower=None, num_teams_upper=None, private_syms=None, private_needs_barrier=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, thread_limit=None, loc=None, ip=None) TeamsOp¶
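A sketch of omp.teams with a closed [4, 8] range for the number of teams and a thread limit, per the num_teams_lower/num_teams_upper description above. The i32 operand types and variable names are illustrative choices.

from mlir.ir import Context, InsertionPoint, IntegerType, Location, Module
from mlir.dialects import arith
from mlir.dialects import _omp_ops_gen as omp

with Context(), Location.unknown():
    module = Module.create()
    with InsertionPoint(module.body):
        i32 = IntegerType.get_signless(32)
        lo = arith.constant(i32, 4)
        hi = arith.constant(i32, 8)
        limit = arith.constant(i32, 64)
        # allocate_vars, allocator_vars, private_vars, reduction_vars empty.
        teams_op = omp.TeamsOp([], [], [], [],
                               num_teams_lower=lo, num_teams_upper=hi,
                               thread_limit=limit)
        body = teams_op.region.blocks.append()
        with InsertionPoint(body):
            # ... code executed by the initial thread of each team ...
            omp.terminator()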
- class mlir.dialects._omp_ops_gen.TerminatorOp(*, loc=None, ip=None)¶
Bases:
_ods_ir

A terminator operation for regions that appear in the body of OpenMP operations. These regions are not expected to return any value so the terminator takes no operands. The terminator op returns control to the enclosing op.
- OPERATION_NAME = 'omp.terminator'¶
- _ODS_REGIONS = (0, True)¶
- mlir.dialects._omp_ops_gen.terminator(*, loc=None, ip=None) TerminatorOp¶
- class mlir.dialects._omp_ops_gen.ThreadprivateOp(tls_addr, sym_addr, *, loc=None, ip=None)¶
Bases:
_ods_ir

The threadprivate directive specifies that variables are replicated, with each thread having its own copy.
The current implementation uses the OpenMP runtime to provide thread-local storage (TLS). Using the TLS feature of LLVM IR will be supported in the future.
This operation takes in the address of a symbol that represents the original variable and returns the address of its TLS. All occurrences of threadprivate variables in a parallel region should use the TLS returned by this operation.
The sym_addr refers to the address of the symbol, which is a pointer to the original variable.

- OPERATION_NAME = 'omp.threadprivate'¶
- _ODS_REGIONS = (0, True)¶
- sym_addr() _ods_ir¶
- tls_addr() _ods_ir¶
- mlir.dialects._omp_ops_gen.threadprivate(tls_addr, sym_addr, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._omp_ops_gen.TileOp(generatees, applyees, sizes, *, loc=None, ip=None)¶
Bases:
_ods_ir

Represents the OpenMP tile directive introduced in OpenMP 5.1.
The construct partitions the logical iteration space of the affected loops into equally-sized tiles, then creates two sets of nested loops. The outer loops, called the grid loops, iterate over all tiles. The inner loops, called the intratile loops, iterate over the logical iterations of a tile. The sizes clause determines the size of a tile.
Currently, the affected loops must be rectangular (the tripcount of the inner loop must not depend on any induction variable of a surrounding affected loop) and perfectly nested (except for the innermost affected loop, the loop body contains no operations other than the nested loop and the terminator).
The sizes clause defines the size of a grid over a multi-dimensional logical iteration space. This grid is used for loop transformations such as tile and strip. The size per dimension can be a variable, but only values that are at least 2 make sense. It is not specified what happens when smaller values are used, but it should still result in a loop nest that executes each logical iteration once.

- OPERATION_NAME = 'omp.tile'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (0, True)¶
- generatees() _ods_ir¶
- applyees() _ods_ir¶
- sizes() _ods_ir¶
- class mlir.dialects._omp_ops_gen.UnrollHeuristicOp(applyee, *, loc=None, ip=None)¶
Bases:
_ods_ir

Represents a #pragma omp unroll construct introduced in OpenMP 5.1.

The operation has one applyee and no generatees. The applyee is unrolled according to implementation-defined heuristics. Implementations may choose to not unroll the loop, partially unroll by a chosen factor, or fully unroll it. Even if the implementation chooses to partially unroll the applyee, the resulting unrolled loop is not accessible as a generatee. Use omp.unroll_partial if a generatee is required.

The lowering is implemented using OpenMPIRBuilder::unrollLoopHeuristic, which just attaches llvm.loop.unroll.enable metadata to the loop so the unrolling is carried out by LLVM’s LoopUnroll pass. That is, unrolling is only actually performed in optimized builds.

Assembly formats:
omp.unroll_heuristic(%cli)
omp.unroll_heuristic(%cli) -> ()
- OPERATION_NAME = 'omp.unroll_heuristic'¶
- _ODS_REGIONS = (0, True)¶
- applyee() _ods_ir¶
- mlir.dialects._omp_ops_gen.unroll_heuristic(applyee, *, loc=None, ip=None) UnrollHeuristicOp¶
- class mlir.dialects._omp_ops_gen.WorkdistributeOp(*, loc=None, ip=None)¶
Bases:
_ods_ir

workdistribute divides execution of the enclosed structured block into separate units of work, each executed only once by each initial thread in the league.

!$omp target teams
!$omp workdistribute
y = a * x + y
!$omp end workdistribute
!$omp end target teams

- OPERATION_NAME = 'omp.workdistribute'¶
- _ODS_REGIONS = (1, True)¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.workdistribute(*, loc=None, ip=None) WorkdistributeOp¶
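A structural sketch: omp.workdistribute takes no operands, only a region, and per the Fortran snippet above is intended to appear inside a teams context (not built here for brevity).

from mlir.ir import Context, InsertionPoint, Location, Module
from mlir.dialects import _omp_ops_gen as omp

with Context(), Location.unknown():
    module = Module.create()
    with InsertionPoint(module.body):
        wd = omp.WorkdistributeOp()
        body = wd.region.blocks.append()
        with InsertionPoint(body):
            # ... structured block divided into units of work ...
            omp.terminator()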
Bases:
_ods_ir

This operation wraps a loop nest that is marked for dividing into units of work by an encompassing omp.workshare operation.
Bases:
_ods_ir

The workshare construct divides the execution of the enclosed structured block into separate units of work, and causes the threads of the team to share the work such that each unit is executed only once by one thread, in the context of its implicit task.

This operation is used for the intermediate representation of the workshare block before the work gets divided between the threads. See the flang LowerWorkshare pass for details.

The optional nowait attribute, when present, eliminates the implicit barrier at the end of the construct, so the parent operation can make progress even if the child operation has not completed yet.
- class mlir.dialects._omp_ops_gen.WsloopOp(allocate_vars, allocator_vars, linear_vars, linear_step_vars, private_vars, reduction_vars, *, nowait=None, order=None, order_mod=None, ordered=None, private_syms=None, private_needs_barrier=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, schedule_kind=None, schedule_chunk=None, schedule_mod=None, schedule_simd=None, loc=None, ip=None)¶
Bases:
_ods_ir

The worksharing-loop construct specifies that the iterations of the loop(s) will be executed in parallel by threads in the current context. These iterations are spread across threads that already exist in the enclosing parallel region.

The body region can only contain a single block which must contain a single operation. This operation must be another compatible loop wrapper or an omp.loop_nest.

omp.wsloop <clauses> {
  omp.loop_nest (%i1, %i2) : index = (%c0, %c0) to (%c10, %c10) step (%c1, %c1) {
    %a = load %arrA[%i1, %i2] : memref<?x?xf32>
    %b = load %arrB[%i1, %i2] : memref<?x?xf32>
    %sum = arith.addf %a, %b : f32
    store %sum, %arrC[%i1, %i2] : memref<?x?xf32>
    omp.yield
  }
}

The allocator_vars and allocate_vars parameters are a variadic list of values that specify the memory allocator to be used to obtain storage for private values.

The linear_step_vars operand additionally specifies the step for each associated linear operand. Note that the linear_vars and linear_step_vars variadic lists should contain the same number of elements.

The optional nowait attribute, when present, eliminates the implicit barrier at the end of the construct, so the parent operation can make progress even if the child operation has not completed yet.

The optional order attribute specifies which order the iterations of the associated loops are executed in. Currently the only option for this attribute is “concurrent”.

The optional ordered attribute specifies how many loops are associated with the worksharing-loop construct. The value of zero refers to the ordered clause specified without parameter.

Reductions can be performed by specifying the reduction modifier (default, inscan or task) in reduction_mod, reduction accumulator variables in reduction_vars, symbols referring to reduction declarations in the reduction_syms attribute, and whether the reduction variable should be passed into the reduction region by value or by reference in reduction_byref. Each reduction is identified by the accumulator it uses and accumulators must not be repeated in the same reduction. A private variable corresponding to the accumulator is used in place of the accumulator inside the body of the operation. The reduction declaration specifies how to combine the values from each iteration, section, team, thread or simd lane defined by the operation’s region into the final value, which is available in the accumulator after they all complete.

The optional schedule_kind attribute specifies the loop schedule for this loop, determining how the loop is distributed across the parallel threads. The optional schedule_chunk associated with this further controls this distribution.

- OPERATION_NAME = 'omp.wsloop'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- allocate_vars() _ods_ir¶
- allocator_vars() _ods_ir¶
- linear_vars() _ods_ir¶
- linear_step_vars() _ods_ir¶
- private_vars() _ods_ir¶
- reduction_vars() _ods_ir¶
- schedule_chunk() _ods_ir | None¶
- nowait() bool¶
- order() _ods_ir | None¶
- order_mod() _ods_ir | None¶
- ordered() _ods_ir | None¶
- private_syms() _ods_ir | None¶
- private_needs_barrier() bool¶
- reduction_mod() _ods_ir | None¶
- reduction_byref() _ods_ir | None¶
- reduction_syms() _ods_ir | None¶
- schedule_kind() _ods_ir | None¶
- schedule_mod() _ods_ir | None¶
- schedule_simd() bool¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.wsloop(allocate_vars, allocator_vars, linear_vars, linear_step_vars, private_vars, reduction_vars, *, nowait=None, order=None, order_mod=None, ordered=None, private_syms=None, private_needs_barrier=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, schedule_kind=None, schedule_chunk=None, schedule_mod=None, schedule_simd=None, loc=None, ip=None) WsloopOp¶
- class mlir.dialects._omp_ops_gen.YieldOp(results_, *, loc=None, ip=None)¶
Bases:
_ods_ir

“omp.yield” yields SSA values from the OpenMP dialect op region and terminates the region. The semantics of how the values are yielded is defined by the parent operation.
- OPERATION_NAME = 'omp.yield'¶
- _ODS_REGIONS = (0, True)¶
- results_() _ods_ir¶