mlir.dialects._omp_ops_gen¶
Attributes¶
Classes¶
- The storage for each list item that appears in the allocate directive is …
- This operation performs an atomic capture.
- This operation performs an atomic read.
- This operation performs an atomic update.
- This operation performs an atomic write.
- The barrier construct specifies an explicit barrier at the point at which …
- The cancel construct activates cancellation of the innermost enclosing …
- The cancellation point construct introduces a user-defined cancellation …
- All loops that conform to OpenMP's definition of a canonical loop can be …
- Declares a named critical section.
- The critical construct imposes a restriction on the associated structured …
- This Op is used to capture the map information related to its …
- The declare mapper directive declares a user-defined mapper for a given …
- Declares an OpenMP reduction kind. This requires two mandatory and three …
- The distribute construct specifies that the iterations of one or more loops …
- The flush construct executes the OpenMP flush operation. This operation …
- This operation represents a rectangular loop nest which may be collapsed …
- A loop construct specifies that the logical iterations of the associated loops …
- This operation is a variation on the OpenACC dialect's DataBoundsOp. Within …
- The MapInfoOp captures information relating to individual OpenMP map clauses …
- The masked construct allows specifying a structured block to be executed by a subset of …
- The master construct specifies a structured block that is executed by …
- Create a new CLI that can be passed as an argument to a CanonicalLoopOp …
- The ordered construct without region is a stand-alone directive that …
- The ordered construct with region specifies a structured block in a …
- The parallel construct includes a region of code which is to be executed …
- This operation provides a declaration of how to implement the …
- The scan directive allows specifying scan reductions. It should be …
- A section operation encloses a region which represents one section in a …
- The sections construct is a non-iterative worksharing construct that …
- The simd construct can be applied to a loop to indicate that the loop can be …
- The single construct specifies that the associated structured block is …
- Allocates memory on the specified OpenMP device for an object of the given type.
- Map variables to a device data environment for the extent of the region.
- The target enter data directive specifies that variables are mapped to …
- The target exit data directive specifies that variables are mapped to a …
- Deallocates memory on the specified OpenMP device that was previously …
- The target construct includes a region of code which is to be executed …
- The target update directive makes the corresponding list items in the device …
- The task construct defines an explicit task.
- The taskgroup construct specifies a wait on completion of child tasks of the …
- The taskloop construct specifies that the iterations of one or more …
- The taskwait construct specifies a wait on the completion of child tasks …
- The taskyield construct specifies that the current task can be suspended …
- The teams construct defines a region of code that triggers the creation of a …
- A terminator operation for regions that appear in the body of OpenMP …
- The threadprivate directive specifies that variables are replicated, with …
- Represents the OpenMP tile directive introduced in OpenMP 5.1.
- Represents a …
- workdistribute divides execution of the enclosed structured block into …
- This operation wraps a loop nest that is marked for dividing into units of …
- The workshare construct divides the execution of the enclosed structured …
- The worksharing-loop construct specifies that the iterations of the loop(s) …
- "omp.yield" yields SSA values from the OpenMP dialect op region and …
Functions¶
Module Contents¶
- mlir.dialects._omp_ops_gen._ods_ir¶
- class mlir.dialects._omp_ops_gen._Dialect(descriptor: object)¶
Bases: _ods_ir

- DIALECT_NAMESPACE = 'omp'¶
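The generated op classes and their snake_case builder functions below are constructed inside an active context, location and insertion point. A minimal sketch of driving them, assuming the upstream mlir Python package and that this generated module is re-exported as mlir.dialects.openmp (an assumed import path):

from mlir import ir
from mlir.dialects import openmp as omp  # assumed re-export of _omp_ops_gen

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    with ir.InsertionPoint(module.body):
        # Zero-operand op: emits `omp.barrier` at the current insertion point.
        omp.BarrierOp()
    print(module)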
- class mlir.dialects._omp_ops_gen.AllocateDirOp(varList, *, align=None, allocator=None, loc=None, ip=None)¶
Bases: _ods_ir

The storage for each list item that appears in the allocate directive is provided an allocation through the memory allocator.

The align clause is used to specify the byte alignment to use for allocations associated with the construct on which the clause appears. The allocator clause specifies the memory allocator to be used for allocations associated with the construct on which the clause appears.

- OPERATION_NAME = 'omp.allocate_dir'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (0, True)¶
- varList() _ods_ir¶
- allocator() _ods_ir | None¶
- align() _ods_ir | None¶
- mlir.dialects._omp_ops_gen.allocate_dir(var_list, *, align=None, allocator=None, loc=None, ip=None) AllocateDirOp¶
- class mlir.dialects._omp_ops_gen.AtomicCaptureOp(*, hint=None, memory_order=None, loc=None, ip=None)¶
Bases: _ods_ir

This operation performs an atomic capture.

The region has the following allowed forms:

omp.atomic.capture {
  omp.atomic.update ...
  omp.atomic.read ...
  omp.terminator
}
omp.atomic.capture {
  omp.atomic.read ...
  omp.atomic.update ...
  omp.terminator
}
omp.atomic.capture {
  omp.atomic.read ...
  omp.atomic.write ...
  omp.terminator
}

hint is the value of hint (as specified in the hint clause). It is a compile time constant. As the name suggests, this is just a hint for optimization. memory_order indicates the memory ordering behavior of the construct. It can be one of seq_cst, acq_rel, release, acquire or relaxed.

- OPERATION_NAME = 'omp.atomic.capture'¶
- _ODS_REGIONS = (1, True)¶
- hint() _ods_ir | None¶
- memory_order() _ods_ir | None¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.atomic_capture(*, hint=None, memory_order=None, loc=None, ip=None) AtomicCaptureOp¶
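A sketch of populating the capture region with one of the allowed forms above (an atomic read followed by an atomic write). The !llvm.ptr address type and the func scaffolding are illustrative assumptions, not part of this module:

from mlir import ir
from mlir.dialects import func, openmp as omp

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    ptr = ir.Type.parse("!llvm.ptr")
    i32 = ir.IntegerType.get_signless(32)
    with ir.InsertionPoint(module.body):
        f = func.FuncOp("capture_example", ([ptr, ptr, i32], []))
        with ir.InsertionPoint(f.add_entry_block()):
            x, v, expr = f.entry_block.arguments
            cap = omp.AtomicCaptureOp()
            with ir.InsertionPoint(cap.region.blocks.append()):
                omp.atomic_read(x, v, i32)  # read the old value of x into v
                omp.atomic_write(x, expr)   # then atomically store expr to x
                omp.TerminatorOp()          # assumed builder for omp.terminator
            func.ReturnOp([])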
- class mlir.dialects._omp_ops_gen.AtomicReadOp(x, v, element_type, *, hint=None, memory_order=None, loc=None, ip=None)¶
Bases: _ods_ir

This operation performs an atomic read.

The operand x is the address from where the value is atomically read. The operand v is the address where the value is stored after reading. hint is the value of hint (as specified in the hint clause). It is a compile time constant. As the name suggests, this is just a hint for optimization. memory_order indicates the memory ordering behavior of the construct. It can be one of seq_cst, acq_rel, release, acquire or relaxed.

- OPERATION_NAME = 'omp.atomic.read'¶
- _ODS_REGIONS = (0, True)¶
- x() _ods_ir¶
- v() _ods_ir¶
- element_type() _ods_ir¶
- hint() _ods_ir | None¶
- memory_order() _ods_ir | None¶
- mlir.dialects._omp_ops_gen.atomic_read(x, v, element_type, *, hint=None, memory_order=None, loc=None, ip=None) AtomicReadOp¶
- class mlir.dialects._omp_ops_gen.AtomicUpdateOp(x, *, atomic_control=None, hint=None, memory_order=None, loc=None, ip=None)¶
Bases: _ods_ir

This operation performs an atomic update.

The operand x is exactly the same as the operand x in the OpenMP Standard (OpenMP 5.0, section 2.17.7). It is the address of the variable that is being updated. x is atomically read/written.

The region describes how to update the value of x. It takes the value at x as an input and must yield the updated value. Only the update to x is atomic. Generally the region must have only one instruction, but it can potentially have more than one instruction too. The update is semantically similar to a compare-exchange-loop-based atomic update.

The syntax of the atomic update operation is different from the atomic read and atomic write operations. This is because only the host dialect knows how to appropriately update a value. For example, while generating LLVM IR, if there are no special atomicrmw instructions for the operation-type combination in atomic update, a compare-exchange loop is generated, where the core update operation is directly translated like regular operations by the host dialect. The front-end must handle semantic checks for allowed operations. hint is the value of hint (as specified in the hint clause). It is a compile time constant. As the name suggests, this is just a hint for optimization. memory_order indicates the memory ordering behavior of the construct. It can be one of seq_cst, acq_rel, release, acquire or relaxed.

- OPERATION_NAME = 'omp.atomic.update'¶
- _ODS_REGIONS = (1, True)¶
- x() _ods_ir¶
- atomic_control() _ods_ir | None¶
- hint() _ods_ir | None¶
- memory_order() _ods_ir | None¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.atomic_update(x, *, atomic_control=None, hint=None, memory_order=None, loc=None, ip=None) AtomicUpdateOp¶
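A sketch of an in-place increment via the update region, which receives the current value of x as a block argument and must yield the new value; that YieldOp accepts the list of yielded values is an assumption:

from mlir import ir
from mlir.dialects import arith, func, openmp as omp

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    ptr = ir.Type.parse("!llvm.ptr")  # assumed address type
    i32 = ir.IntegerType.get_signless(32)
    with ir.InsertionPoint(module.body):
        f = func.FuncOp("update_example", ([ptr], []))
        with ir.InsertionPoint(f.add_entry_block()):
            upd = omp.AtomicUpdateOp(f.entry_block.arguments[0])
            body = upd.region.blocks.append(i32)  # block arg: current value at x
            with ir.InsertionPoint(body):
                one = arith.ConstantOp(i32, 1)
                new = arith.AddIOp(body.arguments[0], one)
                omp.YieldOp([new])  # assumed: yields the updated value
            func.ReturnOp([])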
- class mlir.dialects._omp_ops_gen.AtomicWriteOp(x, expr, *, hint=None, memory_order=None, loc=None, ip=None)¶
Bases: _ods_ir

This operation performs an atomic write.

The operand x is the address to which expr is atomically written w.r.t. multiple threads. The evaluation of expr need not be atomic w.r.t. the write to the address. In general, type(x) must dereference to type(expr). hint is the value of hint (as specified in the hint clause). It is a compile time constant. As the name suggests, this is just a hint for optimization. memory_order indicates the memory ordering behavior of the construct. It can be one of seq_cst, acq_rel, release, acquire or relaxed.

- OPERATION_NAME = 'omp.atomic.write'¶
- _ODS_REGIONS = (0, True)¶
- x() _ods_ir¶
- expr() _ods_ir¶
- hint() _ods_ir | None¶
- memory_order() _ods_ir | None¶
- mlir.dialects._omp_ops_gen.atomic_write(x, expr, *, hint=None, memory_order=None, loc=None, ip=None) AtomicWriteOp¶
- class mlir.dialects._omp_ops_gen.BarrierOp(*, loc=None, ip=None)¶
Bases: _ods_ir

The barrier construct specifies an explicit barrier at the point at which the construct appears.
- OPERATION_NAME = 'omp.barrier'¶
- _ODS_REGIONS = (0, True)¶
- class mlir.dialects._omp_ops_gen.CancelOp(cancel_directive, *, if_expr=None, loc=None, ip=None)¶
Bases: _ods_ir

The cancel construct activates cancellation of the innermost enclosing region of the type specified.
- OPERATION_NAME = 'omp.cancel'¶
- _ODS_REGIONS = (0, True)¶
- if_expr() _ods_ir | None¶
- cancel_directive() _ods_ir¶
- class mlir.dialects._omp_ops_gen.CancellationPointOp(cancel_directive, *, loc=None, ip=None)¶
Bases: _ods_ir

The cancellation point construct introduces a user-defined cancellation point at which implicit or explicit tasks check if cancellation of the innermost enclosing region of the type specified has been activated.
- OPERATION_NAME = 'omp.cancellation_point'¶
- _ODS_REGIONS = (0, True)¶
- cancel_directive() _ods_ir¶
- mlir.dialects._omp_ops_gen.cancellation_point(cancel_directive, *, loc=None, ip=None) CancellationPointOp¶
- class mlir.dialects._omp_ops_gen.CanonicalLoopOp(tripCount, *, cli=None, loc=None, ip=None)¶
Bases: _ods_ir

All loops that conform to OpenMP's definition of a canonical loop can be simplified to a CanonicalLoopOp. In particular, there are no loop-carried variables and the number of iterations it will execute is known before the operation. This allows e.g. determining the number of threads and chunks the iteration space is split into before executing any iteration. More restrictions may apply in cases such as (collapsed) loop nests, doacross loops, etc.

In contrast to other loop operations such as scf.for, the number of iterations is determined by only a single variable, the trip-count. The induction variable value is the logical iteration number of that iteration, which OpenMP defines to be between 0 and the trip-count (exclusive). Loop representations having lower-bound, upper-bound, and step-size operands require passes to do more work than necessary, including handling special cases such as upper-bound smaller than lower-bound, upper-bound equal to the integer type's maximal value, negative step size, etc. This complexity is better handled once by the front-end, which can apply its semantics for such cases while still being able to represent any kind of loop, which is kind of the point of a mid-end intermediate representation. User-defined types such as random-access iterators in C++ could not directly be represented anyway.

The induction variable is always of the same type as the tripcount argument. Since it can never be negative, tripcount is always interpreted as an unsigned integer. It is the caller's responsibility to ensure the tripcount is not negative when its interpretation is signed, i.e. %tripcount = max(0, %tripcount).

An optional argument to an omp.canonical_loop that can be passed in is a CanonicalLoopInfo value that can be used to refer to the canonical loop to apply transformations (such as tiling, unrolling, or work-sharing) to the loop, similar to the transform dialect but with OpenMP-specific semantics. Because it is optional, it has to be the last of the operands, but appears first in the pretty format printing.

The pretty assembly format is inspired by Python syntax, where range(n) returns an iterator that runs from 0 to n-1. The pretty assembly syntax is one of:

omp.canonical_loop(%cli) %iv : !type in range(%tripcount)
omp.canonical_loop %iv : !type in range(%tripcount)
A CanonicalLoopOp is lowered to LLVM-IR using OpenMPIRBuilder::createCanonicalLoop.

Examples¶

Translation from lower-bound, upper-bound, step-size to trip-count:

for (int i = 3; i < 42; i += 2) {
  B[i] = A[i];
}

%lb = arith.constant 3 : i32
%ub = arith.constant 42 : i32
%step = arith.constant 2 : i32
%range = arith.sub %ub, %lb : i32
%tripcount = arith.div %range, %step : i32
omp.canonical_loop %iv : i32 in range(%tripcount) {
  %offset = arith.mul %iv, %step : i32
  %i = arith.add %offset, %lb : i32
  %a = load %arrA[%i] : memref<?xf32>
  store %a, %arrB[%i] : memref<?xf32>
}

Nested canonical loop with transformation of the inner loop:

%outer = omp.new_cli : !omp.cli
%inner = omp.new_cli : !omp.cli
omp.canonical_loop(%outer) %iv1 : i32 in range(%tc1) {
  omp.canonical_loop(%inner) %iv2 : i32 in range(%tc2) {
    %a = load %arrA[%iv1, %iv2] : memref<?x?xf32>
    store %a, %arrB[%iv1, %iv2] : memref<?x?xf32>
  }
}
omp.unroll_full(%inner)
- OPERATION_NAME = 'omp.canonical_loop'¶
- _ODS_REGIONS = (1, True)¶
- tripCount() _ods_ir¶
- cli() _ods_ir | None¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.canonical_loop(trip_count, *, cli=None, loc=None, ip=None) CanonicalLoopOp¶
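A sketch mirroring the examples above: create a CLI, attach it to a canonical loop, and populate the body block, whose single argument is the induction variable. The !omp.cli type string and the body terminator are assumptions based on the description:

from mlir import ir
from mlir.dialects import arith, openmp as omp

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    i32 = ir.IntegerType.get_signless(32)
    with ir.InsertionPoint(module.body):
        tripcount = arith.ConstantOp(i32, 42)
        cli = omp.new_cli(ir.Type.parse("!omp.cli"))  # assumed CLI type name
        loop = omp.CanonicalLoopOp(tripcount, cli=cli)
        body = loop.region.blocks.append(i32)  # %iv : i32
        with ir.InsertionPoint(body):
            # body.arguments[0] is the logical iteration number in [0, 42)
            omp.TerminatorOp()  # assumed terminator for the loop body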
- class mlir.dialects._omp_ops_gen.CriticalDeclareOp(sym_name, *, hint=None, loc=None, ip=None)¶
Bases: _ods_ir

Declares a named critical section.

The sym_name can be used in omp.critical constructs in the dialect. hint is the value of hint (as specified in the hint clause). It is a compile time constant. As the name suggests, this is just a hint for optimization.

- OPERATION_NAME = 'omp.critical.declare'¶
- _ODS_REGIONS = (0, True)¶
- sym_name() _ods_ir¶
- hint() _ods_ir | None¶
- mlir.dialects._omp_ops_gen.critical_declare(sym_name, *, hint=None, loc=None, ip=None) CriticalDeclareOp¶
- class mlir.dialects._omp_ops_gen.CriticalOp(*, name=None, loc=None, ip=None)¶
Bases: _ods_ir

The critical construct imposes a restriction on the associated structured block (region) to be executed by only a single thread at a time.

The optional name argument of critical constructs is used to identify them. Unnamed critical constructs behave as though an identical name was specified.

- OPERATION_NAME = 'omp.critical'¶
- _ODS_REGIONS = (1, True)¶
- name() _ods_ir | None¶
Returns the fully qualified name of the operation.
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.critical(*, name=None, loc=None, ip=None) CriticalOp¶
- class mlir.dialects._omp_ops_gen.DeclareMapperInfoOp(map_vars, *, loc=None, ip=None)¶
Bases: _ods_ir

This Op is used to capture the map information related to its parent DeclareMapperOp.

The optional map_vars maps data from the current task's data environment to the device data environment.

- OPERATION_NAME = 'omp.declare_mapper.info'¶
- _ODS_REGIONS = (0, True)¶
- map_vars() _ods_ir¶
- mlir.dialects._omp_ops_gen.declare_mapper_info(map_vars, *, loc=None, ip=None) DeclareMapperInfoOp¶
- class mlir.dialects._omp_ops_gen.DeclareMapperOp(sym_name, type_, *, loc=None, ip=None)¶
Bases: _ods_ir

The declare mapper directive declares a user-defined mapper for a given type, and defines a mapper-identifier that can be used in a map clause.
- OPERATION_NAME = 'omp.declare_mapper'¶
- _ODS_REGIONS = (1, True)¶
- sym_name() _ods_ir¶
- type_() _ods_ir¶
- body() _ods_ir¶
- mlir.dialects._omp_ops_gen.declare_mapper(sym_name, type_, *, loc=None, ip=None) DeclareMapperOp¶
- class mlir.dialects._omp_ops_gen.DeclareReductionOp(sym_name, type_, *, loc=None, ip=None)¶
Bases: _ods_ir

Declares an OpenMP reduction kind. This requires two mandatory and three optional regions.

1. The optional alloc region specifies how to allocate the thread-local reduction value. This region should not contain control flow and all IR should be suitable for inlining straight into an entry block. In the common case this is expected to contain only allocas. It is expected to omp.yield the allocated value on all control paths. If allocation is conditional (e.g. only allocate if the mold is allocated), this should be done in the initializer region and this region not included. The alloc region is not used for by-value reductions (where allocation is implicit).
2. The initializer region specifies how to initialize the thread-local reduction value. This is usually the neutral element of the reduction. For convenience, the region has an argument that contains the value of the reduction accumulator at the start of the reduction. If an alloc region is specified, there is a second block argument containing the address of the allocated memory. The initializer region is expected to omp.yield the new value on all control flow paths.
3. The reduction region specifies how to combine two values into one, i.e. the reduction operator. It accepts the two values as arguments and is expected to omp.yield the combined value on all control flow paths.
4. The atomic reduction region is optional and specifies how two values can be combined atomically given local accumulator variables. It is expected to store the combined value in the first accumulator variable.
5. The cleanup region is optional and specifies how to clean up any memory allocated by the initializer region. The region has an argument that contains the value of the thread-local reduction accumulator. This will be executed after the reduction has completed.

Note that the MLIR type system does not allow for type-polymorphic reductions. Separate reduction declarations should be created for different element and accumulator types.

For initializer and reduction regions, the operand to omp.yield must match the parent operation's results.

- OPERATION_NAME = 'omp.declare_reduction'¶
omp.yieldmust match the parent operation’s results.- OPERATION_NAME = 'omp.declare_reduction'¶
- _ODS_REGIONS = (5, True)¶
- sym_name() _ods_ir¶
- type_() _ods_ir¶
- allocRegion() _ods_ir¶
- initializerRegion() _ods_ir¶
- reductionRegion() _ods_ir¶
- atomicReductionRegion() _ods_ir¶
- cleanupRegion() _ods_ir¶
- mlir.dialects._omp_ops_gen.declare_reduction(sym_name, type_, *, loc=None, ip=None) DeclareReductionOp¶
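A sketch of a floating-point add reduction filling only the two mandatory regions; the optional alloc, atomic and cleanup regions are simply left empty. That YieldOp takes the list of yielded values is an assumption:

from mlir import ir
from mlir.dialects import arith, openmp as omp

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    f32 = ir.F32Type.get()
    with ir.InsertionPoint(module.body):
        decl = omp.DeclareReductionOp("add_f32", f32)
        # Initializer region: yield the neutral element of the reduction.
        init = decl.initializerRegion.blocks.append(f32)
        with ir.InsertionPoint(init):
            zero = arith.ConstantOp(f32, 0.0)
            omp.YieldOp([zero])
        # Reduction region: combine two partial values into one.
        combine = decl.reductionRegion.blocks.append(f32, f32)
        with ir.InsertionPoint(combine):
            lhs, rhs = combine.arguments
            omp.YieldOp([arith.AddFOp(lhs, rhs)])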
- class mlir.dialects._omp_ops_gen.DistributeOp(allocate_vars, allocator_vars, private_vars, *, dist_schedule_static=None, dist_schedule_chunk_size=None, order=None, order_mod=None, private_syms=None, private_needs_barrier=None, loc=None, ip=None)¶
Bases: _ods_ir

The distribute construct specifies that the iterations of one or more loops (optionally specified using the collapse clause) will be executed by the initial teams in the context of their implicit tasks. The loops that the distribute op is associated with start with the outermost loop enclosed by the distribute op region and go down the loop nest toward the innermost loop. The iterations are distributed across the initial threads of all initial teams that execute the teams region to which the distribute region binds.

The distribute loop construct specifies that the iterations of the loop(s) will be executed in parallel by threads in the current context. These iterations are spread across threads that already exist in the enclosing region.

The body region can only contain a single block which must contain a single operation. This operation must be another compatible loop wrapper or an omp.loop_nest.

omp.distribute <clauses> {
  omp.loop_nest (%i1, %i2) : index = (%c0, %c0) to (%c10, %c10) step (%c1, %c1) {
    %a = load %arrA[%i1, %i2] : memref<?x?xf32>
    %b = load %arrB[%i1, %i2] : memref<?x?xf32>
    %sum = arith.addf %a, %b : f32
    store %sum, %arrC[%i1, %i2] : memref<?x?xf32>
    omp.yield
  }
}

The allocator_vars and allocate_vars parameters are a variadic list of values that specify the memory allocator to be used to obtain storage for private values.

The dist_schedule_static attribute specifies the schedule for this loop, determining how the loop is distributed across the various teams. The optional dist_schedule_chunk_size associated with it further controls this distribution.

The optional order attribute specifies which order the iterations of the associated loops are executed in. Currently the only option for this attribute is "concurrent".

- OPERATION_NAME = 'omp.distribute'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- allocate_vars() _ods_ir¶
- allocator_vars() _ods_ir¶
- dist_schedule_chunk_size() _ods_ir | None¶
- private_vars() _ods_ir¶
- dist_schedule_static() bool¶
- order() _ods_ir | None¶
- order_mod() _ods_ir | None¶
- private_syms() _ods_ir | None¶
- private_needs_barrier() bool¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.distribute(allocate_vars, allocator_vars, private_vars, *, dist_schedule_static=None, dist_schedule_chunk_size=None, order=None, order_mod=None, private_syms=None, private_needs_barrier=None, loc=None, ip=None) DistributeOp¶
- class mlir.dialects._omp_ops_gen.FlushOp(varList, *, loc=None, ip=None)¶
Bases: _ods_ir

The flush construct executes the OpenMP flush operation. This operation makes a thread's temporary view of memory consistent with memory and enforces an order on the memory operations of the variables explicitly specified or implied.
- OPERATION_NAME = 'omp.flush'¶
- _ODS_REGIONS = (0, True)¶
- varList() _ods_ir¶
- class mlir.dialects._omp_ops_gen.LoopNestOp(loop_lower_bounds, loop_upper_bounds, loop_steps, *, collapse_num_loops=None, loop_inclusive=None, tile_sizes=None, loc=None, ip=None)¶
Bases: _ods_ir

This operation represents a rectangular loop nest which may be collapsed and/or tiled. For each rectangular loop of the nest represented by an instance of this operation, lower and upper bounds, as well as a step variable, must be defined. The collapse clause specifies how many loops should be collapsed (1 if no collapse is done) after any tiling is performed. The tile sizes are represented by the tile sizes clause.

The lower and upper bounds specify a half-open range: the range includes the lower bound but does not include the upper bound. If the loop_inclusive attribute is specified then the upper bound is also included.

The body region can contain any number of blocks. The region is terminated by an omp.yield instruction without operands. The induction variables, represented as entry block arguments to the loop nest operation's single region, match the types of the loop_lower_bounds, loop_upper_bounds and loop_steps arguments.

omp.loop_nest (%i1, %i2) : i32 = (%c0, %c0) to (%c10, %c10) step (%c1, %c1) collapse(2) tiles(5,5) {
  %a = load %arrA[%i1, %i2] : memref<?x?xf32>
  %b = load %arrB[%i1, %i2] : memref<?x?xf32>
  %sum = arith.addf %a, %b : f32
  store %sum, %arrC[%i1, %i2] : memref<?x?xf32>
  omp.yield
}

This is a temporary simplified definition of a loop based on existing OpenMP loop operations, intended to serve as a stopgap solution until the long-term representation of canonical loops is defined. Specifically, this operation is intended to serve as a unique source for loop information during the transition to making omp.distribute, omp.simd, omp.taskloop and omp.wsloop wrapper operations. It is not intended to help with the addition of support for loop transformations, non-rectangular loops and non-perfectly nested loops.

- OPERATION_NAME = 'omp.loop_nest'¶
- _ODS_REGIONS = (1, True)¶
- loop_lower_bounds() _ods_ir¶
- loop_upper_bounds() _ods_ir¶
- loop_steps() _ods_ir¶
- collapse_num_loops() _ods_ir | None¶
- loop_inclusive() bool¶
- tile_sizes() _ods_ir | None¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.loop_nest(loop_lower_bounds, loop_upper_bounds, loop_steps, *, collapse_num_loops=None, loop_inclusive=None, tile_sizes=None, loc=None, ip=None) LoopNestOp¶
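A sketch of a one-dimensional nest over [0, 10) with step 1, placed inside a loop wrapper (here omp.simd with all of its variadic operand lists empty) as the description requires. That YieldOp builds the operand-less omp.yield is an assumption:

from mlir import ir
from mlir.dialects import arith, openmp as omp

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    i32 = ir.IntegerType.get_signless(32)
    with ir.InsertionPoint(module.body):
        lb = arith.ConstantOp(i32, 0)
        ub = arith.ConstantOp(i32, 10)
        step = arith.ConstantOp(i32, 1)
        simd = omp.SimdOp([], [], [], [], [], [])
        with ir.InsertionPoint(simd.region.blocks.append()):
            # Per the description, the wrapper block holds just the loop nest.
            nest = omp.LoopNestOp([lb], [ub], [step])
            body = nest.region.blocks.append(i32)  # induction variable %i
            with ir.InsertionPoint(body):
                # ... loop body using body.arguments[0] ...
                omp.YieldOp([])  # omp.yield without operands ends an iteration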
- class mlir.dialects._omp_ops_gen.LoopOp(private_vars, reduction_vars, *, bind_kind=None, private_syms=None, private_needs_barrier=None, order=None, order_mod=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, loc=None, ip=None)¶
Bases: _ods_ir

A loop construct specifies that the logical iterations of the associated loops may execute concurrently and permits the encountering threads to execute the loop accordingly. A loop construct can have 3 different types of binding:

1. teams: in which case the binding region is the innermost enclosing teams region.
2. parallel: in which case the binding region is the innermost enclosing parallel region.
3. thread: in which case the binding region is not defined.

The body region can only contain a single block which must contain a single operation; this operation must be an omp.loop_nest.

omp.loop <clauses> {
  omp.loop_nest (%i1, %i2) : index = (%c0, %c0) to (%c10, %c10) step (%c1, %c1) {
    %a = load %arrA[%i1, %i2] : memref<?x?xf32>
    %b = load %arrB[%i1, %i2] : memref<?x?xf32>
    %sum = arith.addf %a, %b : f32
    store %sum, %arrC[%i1, %i2] : memref<?x?xf32>
    omp.yield
  }
}

The bind clause specifies the binding region of the construct on which it appears.

The optional order attribute specifies which order the iterations of the associated loops are executed in. Currently the only option for this attribute is "concurrent".

Reductions can be performed by specifying the reduction modifier (default, inscan or task) in reduction_mod, reduction accumulator variables in reduction_vars, symbols referring to reduction declarations in the reduction_syms attribute, and whether the reduction variable should be passed into the reduction region by value or by reference in reduction_byref. Each reduction is identified by the accumulator it uses, and accumulators must not be repeated in the same reduction. A private variable corresponding to the accumulator is used in place of the accumulator inside the body of the operation. The reduction declaration specifies how to combine the values from each iteration, section, team, thread or simd lane defined by the operation's region into the final value, which is available in the accumulator after they all complete.

- OPERATION_NAME = 'omp.loop'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- private_vars() _ods_ir¶
- reduction_vars() _ods_ir¶
- bind_kind() _ods_ir | None¶
- private_syms() _ods_ir | None¶
- private_needs_barrier() bool¶
- order() _ods_ir | None¶
- order_mod() _ods_ir | None¶
- reduction_mod() _ods_ir | None¶
- reduction_byref() _ods_ir | None¶
- reduction_syms() _ods_ir | None¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.loop(private_vars, reduction_vars, *, bind_kind=None, private_syms=None, private_needs_barrier=None, order=None, order_mod=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, loc=None, ip=None) LoopOp¶
- class mlir.dialects._omp_ops_gen.MapBoundsOp(result, *, lower_bound=None, upper_bound=None, extent=None, stride=None, stride_in_bytes=None, start_idx=None, loc=None, ip=None)¶
Bases: _ods_ir

This operation is a variation on the OpenACC dialect's DataBoundsOp. Within the OpenMP dialect it stores the bounds/range of data to be mapped to a device specified by map clauses on target directives. Within the OpenMP dialect, the MapBoundsOp is associated with MapInfoOp, helping to store bounds information for the mapped variable.

It is used to support OpenMP array sectioning, Fortran pointer and allocatable mapping, and pointer/allocatable members of derived types. In all cases the MapBoundsOp holds information on the section of data to be mapped, such as the upper bound and lower bound of the section. This information is currently utilised by the LLVM-IR lowering to help generate instructions to copy data to and from the device when processing target operations.

The example below copies a section of a 10-element array (all except the first element), utilising OpenMP array sectioning syntax where array subscripts are provided to specify the bounds to be mapped to device. To simplify the examples, the constants are used directly; in reality they will be MLIR SSA values.

C++:

int array[10];
#pragma target map(array[1:9])

=>

omp.map.bounds lower_bound(1) upper_bound(9) extent(9) start_idx(0)

Fortran:

integer :: array(1:10)
!$target map(array(2:10))

=>

omp.map.bounds lower_bound(1) upper_bound(9) extent(9) start_idx(1)

For Fortran pointers and allocatables (as well as those that are members of derived types) the bounds information is provided by the Fortran compiler and runtime through descriptor information.

A basic pointer example can be found below (constants again provided for simplicity, where in reality SSA values will be used, in this case pointing to data yielded by Fortran's descriptors):

Fortran:

integer, pointer :: ptr(:)
allocate(ptr(10))
!$target map(ptr)

=>

omp.map.bounds lower_bound(0) upper_bound(9) extent(10) start_idx(1)

This operation records the bounds information in a normalized fashion (zero-based). This works well with the PointerLikeType requirement in data clauses, since a lower_bound of 0 means looking at data at the zero offset from the pointer.

This operation must have an upper_bound or extent (or both are allowed, but not checked for consistency). When the source language's arrays are not zero-based, the start_idx must specify the zero-position index.

- OPERATION_NAME = 'omp.map.bounds'¶
- _ODS_OPERAND_SEGMENTS = [0, 0, 0, 0, 0]¶
- _ODS_REGIONS = (0, True)¶
- lower_bound() _ods_ir | None¶
- upper_bound() _ods_ir | None¶
- extent() _ods_ir | None¶
- stride() _ods_ir | None¶
- start_idx() _ods_ir | None¶
- stride_in_bytes() _ods_ir¶
- result() _ods_ir¶
Shortcut to get an op result if it has only one (throws an error otherwise).
- mlir.dialects._omp_ops_gen.map_bounds(result, *, lower_bound=None, upper_bound=None, extent=None, stride=None, stride_in_bytes=None, start_idx=None, loc=None, ip=None) _ods_ir¶
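A sketch of the Fortran array(2:10) example above expressed through the builder; the !omp.map_bounds_ty result type string is an assumption:

from mlir import ir
from mlir.dialects import arith, openmp as omp

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    idx = ir.IndexType.get()
    with ir.InsertionPoint(module.body):
        def c(value):
            return arith.ConstantOp(idx, value).result  # index constant
        omp.map_bounds(
            ir.Type.parse("!omp.map_bounds_ty"),
            lower_bound=c(1),   # zero-based: element 2 of a 1-based array
            upper_bound=c(9),
            extent=c(9),
            start_idx=c(1),     # the array's zero-position index
        )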
- class mlir.dialects._omp_ops_gen.MapInfoOp(omp_ptr, var_ptr, var_type, map_type, map_capture_type, members, bounds, *, var_ptr_ptr=None, members_index=None, mapper_id=None, name=None, partial_map=None, loc=None, ip=None)¶
Bases: _ods_ir

The MapInfoOp captures information relating to individual OpenMP map clauses that are applied to certain OpenMP directives such as Target and Target Data.

For example, the map type modifier (such as from, tofrom and to), the variable being captured, or the bounds of an array section being mapped.

It can be used to capture both implicit and explicit map information, where explicit is an argument directly specified to an OpenMP map clause, and implicit is where a variable is utilised in a target region but is defined externally to the target region.

This map information is later used to aid the lowering of the target operations they are attached to, providing argument input and output context for kernels generated or the target data mapping environment.

Example (Fortran):

integer :: index
!$target map(to: index)

=>

omp.map.info var_ptr(%index_ssa) map_type(to) map_capture_type(ByRef) name(index)

Description of arguments:

- var_ptr: The address of the variable to copy.
- var_type: The type of the variable to copy.
- map_type: OpenMP map type for this map capture, for example: from, to and always. It's a bitfield composed of the OpenMP runtime flags stored in OpenMPOffloadMappingFlags.
- map_capture_type: Capture type for the variable, e.g. this, byref, byvalue, byvla; this can affect how the variable is lowered.
- var_ptr_ptr: Used when the variable copied is a member of a class, structure or derived type and refers to the originating struct.
- members: Used to indicate mapped child members for the current MapInfoOp, represented as other MapInfoOps, utilised in cases where a parent structure type and members of the structure type are being mapped at the same time. For example: map(to: parent, parent->member, parent->member2[:10])
- members_index: Used to indicate the ordering of members within the containing parent (generally a record type such as a structure, class or derived type), e.g. for struct {int x, float y, double z}, x would be 0, y would be 1, and z would be 2. This aids the mapping.
- bounds: Used when copying slices of arrays, pointers or pointer members of objects (e.g. derived types or classes); indicates the bounds to be copied of the variable. When it's an array slice it is in rank order, where rank 0 is the innermost dimension.
- mapper_id: OpenMP mapper map type modifier for this map capture. It's used to specify a user-defined mapper to be used for mapping.
- name: Holds the name of the variable as specified in the user clause (including bounds).
- partial_map: The record type being mapped will not be mapped in its entirety; it may be used, however, in a mapping to bind its mapped components together.

- OPERATION_NAME = 'omp.map.info'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (0, True)¶
- var_ptr() _ods_ir¶
- var_ptr_ptr() _ods_ir | None¶
- members() _ods_ir¶
- bounds() _ods_ir¶
- var_type() _ods_ir¶
- map_type() _ods_ir¶
- map_capture_type() _ods_ir¶
- members_index() _ods_ir | None¶
- mapper_id() _ods_ir | None¶
- name() _ods_ir | None¶
Returns the fully qualified name of the operation.
- partial_map() _ods_ir¶
- omp_ptr() _ods_ir¶
- mlir.dialects._omp_ops_gen.map_info(omp_ptr, var_ptr, var_type, map_type, map_capture_type, members, bounds, *, var_ptr_ptr=None, members_index=None, mapper_id=None, name=None, partial_map=None, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._omp_ops_gen.MaskedOp(*, filtered_thread_id=None, loc=None, ip=None)¶
Bases: _ods_ir

The masked construct allows specifying a structured block to be executed by a subset of threads of the current team.

If filter is specified, the masked construct masks the execution of the region to only the thread id filtered. Other threads executing the parallel region are not expected to execute the region specified within the masked directive. If filter is not specified, the master thread is expected to execute the region enclosed within the masked directive.

- OPERATION_NAME = 'omp.masked'¶
- _ODS_REGIONS = (1, True)¶
- filtered_thread_id() _ods_ir | None¶
- region() _ods_ir¶
- class mlir.dialects._omp_ops_gen.MasterOp(*, loc=None, ip=None)¶
Bases: _ods_ir

The master construct specifies a structured block that is executed by the master thread of the team.
- OPERATION_NAME = 'omp.master'¶
- _ODS_REGIONS = (1, True)¶
- region() _ods_ir¶
- class mlir.dialects._omp_ops_gen.NewCliOp(result, *, loc=None, ip=None)¶
Bases: _ods_ir

Create a new CLI that can be passed as an argument to a CanonicalLoopOp and to loop transformation operations to handle dependencies between loop transformation operations.
- OPERATION_NAME = 'omp.new_cli'¶
- _ODS_REGIONS = (0, True)¶
- result() _ods_ir¶
Shortcut to get an op result if it has only one (throws an error otherwise).
- mlir.dialects._omp_ops_gen.new_cli(result, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._omp_ops_gen.OrderedOp(doacross_depend_vars, *, doacross_depend_type=None, doacross_num_loops=None, loc=None, ip=None)¶
Bases: _ods_ir

The ordered construct without region is a stand-alone directive that specifies cross-iteration dependencies in a doacross loop nest.

The doacross_depend_type attribute refers to either the DEPEND(SOURCE) clause or the DEPEND(SINK: vec) clause.

The doacross_num_loops attribute specifies the number of loops in the doacross nest.

The doacross_depend_vars is a variadic list of operands that specifies the index of the loop iterator in the doacross nest for the DEPEND(SOURCE) clause or the index of the element of "vec" for the DEPEND(SINK: vec) clause. It contains the operands in multiple "vec" when multiple DEPEND(SINK: vec) clauses exist in one ORDERED directive.

- OPERATION_NAME = 'omp.ordered'¶
- _ODS_REGIONS = (0, True)¶
- doacross_depend_vars() _ods_ir¶
- doacross_depend_type() _ods_ir | None¶
- doacross_num_loops() _ods_ir | None¶
- mlir.dialects._omp_ops_gen.ordered(doacross_depend_vars, *, doacross_depend_type=None, doacross_num_loops=None, loc=None, ip=None) OrderedOp¶
- class mlir.dialects._omp_ops_gen.OrderedRegionOp(*, par_level_simd=None, loc=None, ip=None)¶
Bases: _ods_ir

The ordered construct with region specifies a structured block in a worksharing-loop, SIMD, or worksharing-loop SIMD region that is executed in the order of the loop iterations.

The par_level_simd attribute corresponds to the simd clause specified. If it is not present, it behaves as if the threads clause is specified or no clause is specified.

- OPERATION_NAME = 'omp.ordered.region'¶
- _ODS_REGIONS = (1, True)¶
- par_level_simd() bool¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.ordered_region(*, par_level_simd=None, loc=None, ip=None) OrderedRegionOp¶
- class mlir.dialects._omp_ops_gen.ParallelOp(allocate_vars, allocator_vars, private_vars, reduction_vars, *, if_expr=None, num_threads=None, private_syms=None, private_needs_barrier=None, proc_bind_kind=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, loc=None, ip=None)¶
Bases: _ods_ir

The parallel construct includes a region of code which is to be executed by a team of threads.

The optional if_expr parameter specifies a boolean result of a conditional check. If this value is 1 or is not provided then the parallel region runs as normal; if it is 0 then the parallel region is executed with one thread.

The allocator_vars and allocate_vars parameters are a variadic list of values that specify the memory allocator to be used to obtain storage for private values.

The optional num_threads parameter specifies the number of threads which should be used to execute the parallel region.

The optional proc_bind_kind attribute controls the thread affinity for the execution of the parallel region.

Reductions can be performed by specifying the reduction modifier (default, inscan or task) in reduction_mod, reduction accumulator variables in reduction_vars, symbols referring to reduction declarations in the reduction_syms attribute, and whether the reduction variable should be passed into the reduction region by value or by reference in reduction_byref. Each reduction is identified by the accumulator it uses, and accumulators must not be repeated in the same reduction. A private variable corresponding to the accumulator is used in place of the accumulator inside the body of the operation. The reduction declaration specifies how to combine the values from each iteration, section, team, thread or simd lane defined by the operation's region into the final value, which is available in the accumulator after they all complete.

- OPERATION_NAME = 'omp.parallel'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- allocate_vars() _ods_ir¶
- allocator_vars() _ods_ir¶
- if_expr() _ods_ir | None¶
- num_threads() _ods_ir | None¶
- private_vars() _ods_ir¶
- reduction_vars() _ods_ir¶
- private_syms() _ods_ir | None¶
- private_needs_barrier() bool¶
- proc_bind_kind() _ods_ir | None¶
- reduction_mod() _ods_ir | None¶
- reduction_byref() _ods_ir | None¶
- reduction_syms() _ods_ir | None¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.parallel(allocate_vars, allocator_vars, private_vars, reduction_vars, *, if_expr=None, num_threads=None, private_syms=None, private_needs_barrier=None, proc_bind_kind=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, loc=None, ip=None) ParallelOp¶
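A sketch of an empty parallel region; the four leading arguments are the allocate_vars, allocator_vars, private_vars and reduction_vars operand lists:

from mlir import ir
from mlir.dialects import openmp as omp

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    with ir.InsertionPoint(module.body):
        par = omp.ParallelOp([], [], [], [])
        with ir.InsertionPoint(par.region.blocks.append()):
            # ... code executed by every thread of the team ...
            omp.TerminatorOp()  # assumed builder for omp.terminator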
- class mlir.dialects._omp_ops_gen.PrivateClauseOp(sym_name, type_, data_sharing_type, *, loc=None, ip=None)¶
Bases: _ods_ir

This operation provides a declaration of how to implement the [first]privatization of a variable. The dialect users should provide which type should be allocated for this variable. The allocated (usually by alloca) variable is passed to the initialization region which does everything else (e.g. initialization of Fortran runtime descriptors). Information about how to initialize the copy from the original item should be given in the copy region, and if needed, how to deallocate memory (allocated by the initialization region) in the dealloc region.

Examples:

private(x) would not need any regions because no initialization is required by the standard for i32 variables and this is not firstprivate.

omp.private {type = private} @x.privatizer : i32

firstprivate(x) would be emitted as:

omp.private {type = firstprivate} @x.privatizer : i32 copy {
^bb0(%arg0: !fir.ref<i32>, %arg1: !fir.ref<i32>):
  // %arg0 is the original host variable.
  // %arg1 represents the memory allocated for this private variable.
  ... copy from host to the privatized clone ...
  omp.yield(%arg1 : !fir.ref<i32>)
}

private(x) for "allocatables" would be emitted as:

omp.private {type = private} @x.privatizer : !some.type init {
^bb0(%arg0: !some.pointer<!some.type>, %arg1: !some.pointer<!some.type>):
  // initialize %arg1, using %arg0 as a mold for allocations.
  // For example if %arg0 is a heap allocated array with a runtime determined
  // length and !some.type is a runtime type descriptor, the init region
  // will read the array length from %arg0, and heap allocate an array of the
  // right length and initialize %arg1 to contain the array allocation and
  // length.
  omp.yield(%arg1 : !some.pointer<!some.type>)
} dealloc {
^bb0(%arg0: !some.pointer<!some.type>):
  // ... deallocate memory allocated by the init region...
  // In the example above, this will free the heap allocated array data.
  omp.yield
}

There are no restrictions on the body except for:

- The dealloc region has a single argument.
- The init and copy regions have 2 arguments.
- All three regions are terminated by omp.yield ops.

The above restrictions and other obvious restrictions (e.g. verifying the type of yielded values) are verified by the custom op verifier. The actual contents of the blocks inside all regions are not verified.

Instances of this op would then be used by ops that model directives that accept data-sharing attribute clauses.

The sym_name attribute provides a symbol by which the privatizer op can be referenced by other dialect ops.

The type attribute is the type of the value being privatized. This type will be implicitly allocated in MLIR->LLVMIR conversion and passed as the second argument to the init region. Therefore the type of arguments to the regions should be a type which represents a pointer to type.

The data_sharing_type attribute specifies whether the privatizer corresponds to a private or a firstprivate clause.

- OPERATION_NAME = 'omp.private'¶
- _ODS_REGIONS = (3, True)¶
- sym_name() _ods_ir¶
- type_() _ods_ir¶
- data_sharing_type() _ods_ir¶
- init_region() _ods_ir¶
- copy_region() _ods_ir¶
- dealloc_region() _ods_ir¶
- mlir.dialects._omp_ops_gen.private(sym_name, type_, data_sharing_type, *, loc=None, ip=None) PrivateClauseOp¶
- class mlir.dialects._omp_ops_gen.ScanOp(inclusive_vars, exclusive_vars, *, loc=None, ip=None)¶
Bases: _ods_ir

The scan directive allows specifying scan reductions. It should be enclosed within a parent directive along with which a reduction clause with the inscan modifier must be specified. The scan directive allows splitting code blocks into an input phase and a scan phase in the region enclosed by the parent.

The inclusive clause is used on a separating directive that separates a structured block into two structured block sequences. If it is specified, the input phase includes the preceding structured block sequence and the scan phase includes the following structured block sequence.

The inclusive_vars is a variadic list of operands that specifies the scan-reduction accumulator symbols.

The exclusive clause is used on a separating directive that separates a structured block into two structured block sequences. If it is specified, the input phase excludes the preceding structured block sequence and instead includes the following structured block sequence, while the scan phase includes the preceding structured block sequence.

The exclusive_vars is a variadic list of operands that specifies the scan-reduction accumulator symbols.

- OPERATION_NAME = 'omp.scan'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (0, True)¶
- inclusive_vars() _ods_ir¶
- exclusive_vars() _ods_ir¶
- class mlir.dialects._omp_ops_gen.SectionOp(*, loc=None, ip=None)¶
Bases: _ods_ir

A section operation encloses a region which represents one section in a sections construct. A section op should always be surrounded by an omp.sections operation. The section operation may have block args which correspond to the block arguments of the surrounding omp.sections operation. This is done to reflect situations where these block arguments represent variables private to each section.
omp.sectionsoperation. The section operation may have block args which corespond to the block arguments of the surroundingomp.sectionsoperation. This is done to reflect situations where these block arguments represent variables private to each section.- OPERATION_NAME = 'omp.section'¶
- _ODS_REGIONS = (1, True)¶
- region() _ods_ir¶
- class mlir.dialects._omp_ops_gen.SectionsOp(allocate_vars, allocator_vars, private_vars, reduction_vars, *, nowait=None, private_syms=None, private_needs_barrier=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, loc=None, ip=None)¶
Bases: _ods_ir

The sections construct is a non-iterative worksharing construct that contains omp.section operations. The omp.section operations are to be distributed among and executed by the threads in a team. Each omp.section is executed once by one of the threads in the team in the context of its implicit task. Block arguments for reduction variables should be mirrored in enclosed omp.section operations.

The allocator_vars and allocate_vars parameters are a variadic list of values that specify the memory allocator to be used to obtain storage for private values.

The optional nowait attribute, when present, eliminates the implicit barrier at the end of the construct, so the parent operation can make progress even if the child operation has not completed yet.

Reductions can be performed by specifying the reduction modifier (default, inscan or task) in reduction_mod, reduction accumulator variables in reduction_vars, symbols referring to reduction declarations in the reduction_syms attribute, and whether the reduction variable should be passed into the reduction region by value or by reference in reduction_byref. Each reduction is identified by the accumulator it uses, and accumulators must not be repeated in the same reduction. A private variable corresponding to the accumulator is used in place of the accumulator inside the body of the operation. The reduction declaration specifies how to combine the values from each iteration, section, team, thread or simd lane defined by the operation's region into the final value, which is available in the accumulator after they all complete.

- OPERATION_NAME = 'omp.sections'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- allocate_vars() _ods_ir¶
- allocator_vars() _ods_ir¶
- private_vars() _ods_ir¶
- reduction_vars() _ods_ir¶
- nowait() bool¶
- private_syms() _ods_ir | None¶
- private_needs_barrier() bool¶
- reduction_mod() _ods_ir | None¶
- reduction_byref() _ods_ir | None¶
- reduction_syms() _ods_ir | None¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.sections(allocate_vars, allocator_vars, private_vars, reduction_vars, *, nowait=None, private_syms=None, private_needs_barrier=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, loc=None, ip=None) SectionsOp¶
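A sketch of a sections construct containing two sections; that each omp.section region and the enclosing omp.sections region end with omp.terminator is an assumption:

from mlir import ir
from mlir.dialects import openmp as omp

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    with ir.InsertionPoint(module.body):
        secs = omp.SectionsOp([], [], [], [])
        with ir.InsertionPoint(secs.region.blocks.append()):
            for _ in range(2):
                sec = omp.SectionOp()
                with ir.InsertionPoint(sec.region.blocks.append()):
                    # ... body of this section, run once by some thread ...
                    omp.TerminatorOp()
            omp.TerminatorOp()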
- class mlir.dialects._omp_ops_gen.SimdOp(aligned_vars, linear_vars, linear_step_vars, nontemporal_vars, private_vars, reduction_vars, *, alignments=None, if_expr=None, order=None, order_mod=None, private_syms=None, private_needs_barrier=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, safelen=None, simdlen=None, loc=None, ip=None)¶
Bases: _ods_ir

The simd construct can be applied to a loop to indicate that the loop can be transformed into a SIMD loop (that is, multiple iterations of the loop can be executed concurrently using SIMD instructions).

The body region can only contain a single block which must contain a single operation. This operation must be another compatible loop wrapper or an omp.loop_nest.

omp.simd <clauses> {
  omp.loop_nest (%i1, %i2) : index = (%c0, %c0) to (%c10, %c10) step (%c1, %c1) {
    %a = load %arrA[%i1, %i2] : memref<?x?xf32>
    %b = load %arrB[%i1, %i2] : memref<?x?xf32>
    %sum = arith.addf %a, %b : f32
    store %sum, %arrC[%i1, %i2] : memref<?x?xf32>
    omp.yield
  }
}

When an if clause is present and evaluates to false, the preferred number of iterations to be executed concurrently is one, regardless of whether a simdlen clause is specified.

The alignments attribute additionally specifies the alignment of each corresponding aligned operand. Note that aligned_vars and alignments must contain the same number of elements.

The linear_step_vars operand additionally specifies the step for each associated linear operand. Note that the linear_vars and linear_step_vars variadic lists should contain the same number of elements.

The optional nontemporal attribute specifies variables which have low temporal locality across the iterations where they are accessed.

The optional order attribute specifies which order the iterations of the associated loops are executed in. Currently the only option for this attribute is "concurrent".

Reductions can be performed by specifying the reduction modifier (default, inscan or task) in reduction_mod, reduction accumulator variables in reduction_vars, symbols referring to reduction declarations in the reduction_syms attribute, and whether the reduction variable should be passed into the reduction region by value or by reference in reduction_byref. Each reduction is identified by the accumulator it uses, and accumulators must not be repeated in the same reduction. A private variable corresponding to the accumulator is used in place of the accumulator inside the body of the operation. The reduction declaration specifies how to combine the values from each iteration, section, team, thread or simd lane defined by the operation's region into the final value, which is available in the accumulator after they all complete.

The safelen clause specifies that no two concurrent iterations within a SIMD chunk can have a distance in the logical iteration space that is greater than or equal to the value given in the clause.

When a simdlen clause is present, the preferred number of iterations to be executed concurrently is the value provided to the simdlen clause.

- OPERATION_NAME = 'omp.simd'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- aligned_vars() _ods_ir¶
- if_expr() _ods_ir | None¶
- linear_vars() _ods_ir¶
- linear_step_vars() _ods_ir¶
- nontemporal_vars() _ods_ir¶
- private_vars() _ods_ir¶
- reduction_vars() _ods_ir¶
- alignments() _ods_ir | None¶
- order() _ods_ir | None¶
- order_mod() _ods_ir | None¶
- private_syms() _ods_ir | None¶
- private_needs_barrier() bool¶
- reduction_mod() _ods_ir | None¶
- reduction_byref() _ods_ir | None¶
- reduction_syms() _ods_ir | None¶
- safelen() _ods_ir | None¶
- simdlen() _ods_ir | None¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.simd(aligned_vars, linear_vars, linear_step_vars, nontemporal_vars, private_vars, reduction_vars, *, alignments=None, if_expr=None, order=None, order_mod=None, private_syms=None, private_needs_barrier=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, safelen=None, simdlen=None, loc=None, ip=None) SimdOp¶
- class mlir.dialects._omp_ops_gen.SingleOp(allocate_vars, allocator_vars, copyprivate_vars, private_vars, *, copyprivate_syms=None, nowait=None, private_syms=None, private_needs_barrier=None, loc=None, ip=None)¶
Bases: _ods_ir

The single construct specifies that the associated structured block is executed by only one of the threads in the team (not necessarily the master thread), in the context of its implicit task. The other threads in the team, which do not execute the block, wait at an implicit barrier at the end of the single construct.

The allocator_vars and allocate_vars parameters are a variadic list of values that specify the memory allocator to be used to obtain storage for private values.

If copyprivate variables and functions are specified, then each thread variable is updated with the variable value of the thread that executed the single region, using the specified copy functions.

The optional nowait attribute, when present, eliminates the implicit barrier at the end of the construct, so the parent operation can make progress even if the child operation has not completed yet.

- OPERATION_NAME = 'omp.single'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- allocate_vars() _ods_ir¶
- allocator_vars() _ods_ir¶
- copyprivate_vars() _ods_ir¶
- private_vars() _ods_ir¶
- copyprivate_syms() _ods_ir | None¶
- nowait() bool¶
- private_syms() _ods_ir | None¶
- private_needs_barrier() bool¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.single(allocate_vars, allocator_vars, copyprivate_vars, private_vars, *, copyprivate_syms=None, nowait=None, private_syms=None, private_needs_barrier=None, loc=None, ip=None) SingleOp¶
- class mlir.dialects._omp_ops_gen.TargetAllocMemOp(result, device, in_type, typeparams, shape, *, uniq_name=None, bindc_name=None, loc=None, ip=None)¶
Bases: _ods_ir

Allocates memory on the specified OpenMP device for an object of the given type. Returns an integer value representing the device pointer to the allocated memory. The memory is uninitialized after allocation. Operations must be paired with omp.target_freemem to avoid memory leaks.

- $device: The integer ID of the OpenMP device where the memory will be allocated.
- $in_type: The type of the object for which memory is being allocated. For arrays, this can be a static or dynamic array type.
- $uniq_name: An optional unique name for the allocated memory.
- $bindc_name: An optional name used for C interoperability.
- $typeparams: Runtime type parameters for polymorphic or parameterized types. These are typically integer values that define aspects of a type not fixed at compile time.
- $shape: Runtime shape operands for dynamic arrays. Each operand is an integer value representing the extent of a specific dimension.

// Allocate a static 3x3 integer vector on device 0
%device_0 = arith.constant 0 : i32
%ptr_static = omp.target_allocmem %device_0 : i32, vector<3x3xi32>
// ... use %ptr_static ...
omp.target_freemem %device_0, %ptr_static : i32, i64

// Allocate a dynamic 2D Fortran array (fir.array) on device 1
%device_1 = arith.constant 1 : i32
%rows = arith.constant 10 : index
%cols = arith.constant 20 : index
%ptr_dynamic = omp.target_allocmem %device_1 : i32, !fir.array<?x?xf32>, %rows, %cols : index, index
// ... use %ptr_dynamic ...
omp.target_freemem %device_1, %ptr_dynamic : i32, i64
- OPERATION_NAME = 'omp.target_allocmem'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (0, True)¶
- device() _ods_ir¶
- typeparams() _ods_ir¶
- shape() _ods_ir¶
- in_type() _ods_ir¶
- uniq_name() _ods_ir | None¶
- bindc_name() _ods_ir | None¶
- mlir.dialects._omp_ops_gen.target_allocmem(result, device, in_type, typeparams, shape, *, uniq_name=None, bindc_name=None, loc=None, ip=None) _ods_ir¶
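A sketch of the static allocation example above driven from Python, pairing target_allocmem with TargetFreeMemOp; passing the vector type directly for in_type is an assumption about the generated attribute builder:

from mlir import ir
from mlir.dialects import arith, openmp as omp

with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    i32 = ir.IntegerType.get_signless(32)
    i64 = ir.IntegerType.get_signless(64)
    with ir.InsertionPoint(module.body):
        device = arith.ConstantOp(i32, 0).result
        # Allocate a 3x3 i32 vector on device 0; the result is an i64 device
        # pointer. No type parameters or dynamic shape operands are needed.
        ptr = omp.target_allocmem(
            i64, device, ir.VectorType.get([3, 3], i32), [], [])
        # ... use ptr ...
        omp.TargetFreeMemOp(device, ptr)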
- class mlir.dialects._omp_ops_gen.TargetDataOp(map_vars, use_device_addr_vars, use_device_ptr_vars, *, device=None, if_expr=None, loc=None, ip=None)¶
Bases: _ods_ir

Map variables to a device data environment for the extent of the region.

The omp target data directive maps variables to a device data environment, and defines the lexical scope of the data environment that is created. The omp target data directive can reduce data copies to and from the offloading device when multiple target regions are using the same data.

The optional if_expr parameter specifies a boolean result of a conditional check. If this value is 1 or is not provided then the target region runs on a device; if it is 0 then the target region is executed on the host device.

The optional device parameter specifies the device number for the target region.

The optional map_vars maps data from the current task's data environment to the device data environment.

The optional use_device_addr_vars specifies the address of the objects in the device data environment.

The optional use_device_ptr_vars specifies the device pointers to the corresponding list items in the device data environment.

- OPERATION_NAME = 'omp.target_data'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- device() _ods_ir | None¶
- if_expr() _ods_ir | None¶
- map_vars() _ods_ir¶
- use_device_addr_vars() _ods_ir¶
- use_device_ptr_vars() _ods_ir¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.target_data(map_vars, use_device_addr_vars, use_device_ptr_vars, *, device=None, if_expr=None, loc=None, ip=None) TargetDataOp¶
- class mlir.dialects._omp_ops_gen.TargetEnterDataOp(depend_vars, map_vars, *, depend_kinds=None, device=None, if_expr=None, nowait=None, loc=None, ip=None)¶
Bases:
_ods_ir

The target enter data directive specifies that variables are mapped to a device data environment. The target enter data directive is a stand-alone directive.
The optional if_expr parameter specifies a boolean result of a conditional check. If this value is 1 or is not provided then the target region runs on a device, if it is 0 then the target region is executed on the host device.

The depend_kinds and depend_vars arguments are variadic lists of values that specify the dependencies of this particular task in relation to other tasks.

The optional device parameter specifies the device number for the target region.

The optional map_vars maps data from the current task’s data environment to the device data environment.

The optional nowait attribute, when present, eliminates the implicit barrier at the end of the construct, so the parent operation can make progress even if the child operation has not completed yet.

- OPERATION_NAME = 'omp.target_enter_data'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (0, True)¶
- depend_vars() _ods_ir¶
- device() _ods_ir | None¶
- if_expr() _ods_ir | None¶
- map_vars() _ods_ir¶
- depend_kinds() _ods_ir | None¶
- nowait() bool¶
- mlir.dialects._omp_ops_gen.target_enter_data(depend_vars, map_vars, *, depend_kinds=None, device=None, if_expr=None, nowait=None, loc=None, ip=None) TargetEnterDataOp¶
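A sketch of the stand-alone directive builder; omp.target_exit_data and omp.target_update below take the same (depend_vars, map_vars) signature and can be driven identically. The empty map list is only a structural placeholder (a real use would pass omp.map.info results, and the verifier may require at least one), while nowait=True drops the implicit barrier as described above.

from mlir.ir import Context, InsertionPoint, IntegerType, Location, Module
from mlir.dialects import arith
from mlir.dialects import _omp_ops_gen as omp

with Context(), Location.unknown():
    module = Module.create()
    with InsertionPoint(module.body):
        i32 = IntegerType.get_signless(32)
        dev = arith.constant(i32, 0)
        # Stand-alone directive: no region to populate.
        omp.target_enter_data([], [], device=dev, nowait=True)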
- class mlir.dialects._omp_ops_gen.TargetExitDataOp(depend_vars, map_vars, *, depend_kinds=None, device=None, if_expr=None, nowait=None, loc=None, ip=None)¶
Bases:
_ods_ir

The target exit data directive specifies that variables are mapped to a device data environment. The target exit data directive is a stand-alone directive.
The optional if_expr parameter specifies a boolean result of a conditional check. If this value is 1 or is not provided then the target region runs on a device, if it is 0 then the target region is executed on the host device.

The depend_kinds and depend_vars arguments are variadic lists of values that specify the dependencies of this particular task in relation to other tasks.

The optional device parameter specifies the device number for the target region.

The optional map_vars maps data from the current task’s data environment to the device data environment.

The optional nowait attribute, when present, eliminates the implicit barrier at the end of the construct, so the parent operation can make progress even if the child operation has not completed yet.

- OPERATION_NAME = 'omp.target_exit_data'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (0, True)¶
- depend_vars() _ods_ir¶
- device() _ods_ir | None¶
- if_expr() _ods_ir | None¶
- map_vars() _ods_ir¶
- depend_kinds() _ods_ir | None¶
- nowait() bool¶
- mlir.dialects._omp_ops_gen.target_exit_data(depend_vars, map_vars, *, depend_kinds=None, device=None, if_expr=None, nowait=None, loc=None, ip=None) TargetExitDataOp¶
- class mlir.dialects._omp_ops_gen.TargetFreeMemOp(device, heapref, *, loc=None, ip=None)¶
Bases:
_ods_ir

Deallocates memory on the specified OpenMP device that was previously allocated by an omp.target_allocmem operation. After this operation, the deallocated memory is in an undefined state and should not be accessed. It is crucial to ensure that all accesses to the memory region are completed before omp.target_freemem is called to avoid undefined behavior.

$device: The integer ID of the OpenMP device from which the memory will be freed.
$heapref: The integer value representing the device pointer to the memory to be deallocated, which was previously returned by omp.target_allocmem.

// Example of allocating and freeing memory on an OpenMP device
%device_id = arith.constant 0 : i32
%allocated_ptr = omp.target_allocmem %device_id : i32, vector<3x3xi32>
// ... operations using %allocated_ptr on the device ...
omp.target_freemem %device_id, %allocated_ptr : i32, i64
- OPERATION_NAME = 'omp.target_freemem'¶
- _ODS_REGIONS = (0, True)¶
- device() _ods_ir¶
- heapref() _ods_ir¶
- mlir.dialects._omp_ops_gen.target_freemem(device, heapref, *, loc=None, ip=None) TargetFreeMemOp¶
- class mlir.dialects._omp_ops_gen.TargetOp(allocate_vars, allocator_vars, depend_vars, has_device_addr_vars, host_eval_vars, in_reduction_vars, is_device_ptr_vars, map_vars, private_vars, *, bare=None, depend_kinds=None, device=None, if_expr=None, in_reduction_byref=None, in_reduction_syms=None, nowait=None, private_syms=None, private_needs_barrier=None, thread_limit=None, private_maps=None, loc=None, ip=None)¶
Bases:
_ods_ir

The target construct includes a region of code which is to be executed on a device.
The optional if_expr parameter specifies a boolean result of a conditional check. If this value is 1 or is not provided then the target region runs on a device, if it is 0 then the target region is executed on the host device.

The private_maps attribute connects private operands to their corresponding map operands. For private operands that require a map, the value of the corresponding element in the attribute is the index of the map operand (relative to other map operands, not the whole operands of the operation). For private operands that do not require a map, this value is -1 (which is omitted from the assembly format printing).

The allocator_vars and allocate_vars parameters are a variadic list of values that specify the memory allocator to be used to obtain storage for private values.

ompx_bare allows omp target teams to be executed on a GPU with an explicit number of teams and threads. This clause also allows the teams and threads sizes to have up to 3 dimensions.

The depend_kinds and depend_vars arguments are variadic lists of values that specify the dependencies of this particular task in relation to other tasks.

The optional device parameter specifies the device number for the target region.

The optional has_device_addr_vars indicates that list items already have device addresses, so they may be directly accessed from the target device. This includes array sections.

The optional host_eval_vars holds values defined outside of the region of the IsolatedFromAbove operation for which a corresponding entry block argument is defined. The only legal uses for these captured values are the following:

- num_teams or thread_limit clause of an immediately nested omp.teams operation.
- If the operation is the top-level omp.target of a target SPMD kernel:
  - num_threads clause of the nested omp.parallel operation.
  - Bounds and steps of the nested omp.loop_nest operation.

The optional is_device_ptr_vars indicates list items are device pointers.

The optional map_vars maps data from the current task’s data environment to the device data environment.

The optional nowait attribute, when present, eliminates the implicit barrier at the end of the construct, so the parent operation can make progress even if the child operation has not completed yet.

The optional thread_limit specifies the limit on the number of threads.

- OPERATION_NAME = 'omp.target'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- allocate_vars() _ods_ir¶
- allocator_vars() _ods_ir¶
- depend_vars() _ods_ir¶
- device() _ods_ir | None¶
- has_device_addr_vars() _ods_ir¶
- host_eval_vars() _ods_ir¶
- if_expr() _ods_ir | None¶
- in_reduction_vars() _ods_ir¶
- is_device_ptr_vars() _ods_ir¶
- map_vars() _ods_ir¶
- private_vars() _ods_ir¶
- thread_limit() _ods_ir | None¶
- bare() bool¶
- depend_kinds() _ods_ir | None¶
- in_reduction_byref() _ods_ir | None¶
- in_reduction_syms() _ods_ir | None¶
- nowait() bool¶
- private_syms() _ods_ir | None¶
- private_needs_barrier() bool¶
- private_maps() _ods_ir | None¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.target(allocate_vars, allocator_vars, depend_vars, has_device_addr_vars, host_eval_vars, in_reduction_vars, is_device_ptr_vars, map_vars, private_vars, *, bare=None, depend_kinds=None, device=None, if_expr=None, in_reduction_byref=None, in_reduction_syms=None, nowait=None, private_syms=None, private_needs_barrier=None, thread_limit=None, private_maps=None, loc=None, ip=None) TargetOp¶
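A structural sketch of building omp.target from Python. The nine leading operand lists follow the builder signature above and are all left empty here, which is enough to show the region plumbing but does not form a meaningful kernel; with empty map and private lists the entry block takes no arguments.

from mlir.ir import Context, InsertionPoint, Location, Module
from mlir.dialects import _omp_ops_gen as omp

with Context(), Location.unknown():
    module = Module.create()
    with InsertionPoint(module.body):
        # allocate, allocator, depend, has_device_addr, host_eval,
        # in_reduction, is_device_ptr, map, private vars (all empty).
        target_op = omp.TargetOp([], [], [], [], [], [], [], [], [],
                                 nowait=True)
        body = target_op.region.blocks.append()
        with InsertionPoint(body):
            # ... the code to execute on the device ...
            omp.terminator()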
- class mlir.dialects._omp_ops_gen.TargetUpdateOp(depend_vars, map_vars, *, depend_kinds=None, device=None, if_expr=None, nowait=None, loc=None, ip=None)¶
Bases:
_ods_ir

The target update directive makes the corresponding list items in the device data environment consistent with their original list items, according to the specified motion clauses. The target update construct is a stand-alone directive.
The optional if_expr parameter specifies a boolean result of a conditional check. If this value is 1 or is not provided then the target region runs on a device, if it is 0 then the target region is executed on the host device.

We use MapInfoOp to model the motion clauses and their modifiers. Even though the spec differentiates between map-types & map-type-modifiers vs. motion-clauses & motion-modifiers, the motion clauses and their modifiers are a subset of map types and their modifiers. The subset relation is handled during verification to make sure the restrictions for target update are respected.

The depend_kinds and depend_vars arguments are variadic lists of values that specify the dependencies of this particular task in relation to other tasks.

The optional device parameter specifies the device number for the target region.

The optional map_vars maps data from the current task’s data environment to the device data environment.

The optional nowait attribute, when present, eliminates the implicit barrier at the end of the construct, so the parent operation can make progress even if the child operation has not completed yet.

- OPERATION_NAME = 'omp.target_update'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (0, True)¶
- depend_vars() _ods_ir¶
- device() _ods_ir | None¶
- if_expr() _ods_ir | None¶
- map_vars() _ods_ir¶
- depend_kinds() _ods_ir | None¶
- nowait() bool¶
- mlir.dialects._omp_ops_gen.target_update(depend_vars, map_vars, *, depend_kinds=None, device=None, if_expr=None, nowait=None, loc=None, ip=None) TargetUpdateOp¶
- class mlir.dialects._omp_ops_gen.TaskOp(allocate_vars, allocator_vars, depend_vars, in_reduction_vars, private_vars, *, depend_kinds=None, final=None, if_expr=None, in_reduction_byref=None, in_reduction_syms=None, mergeable=None, priority=None, private_syms=None, private_needs_barrier=None, untied=None, event_handle=None, loc=None, ip=None)¶
Bases:
_ods_ir

The task construct defines an explicit task.
For definitions of “undeferred task”, “included task”, “final task” and “mergeable task”, please check the OpenMP Specification.
When an if clause is present on a task construct, and the value of if_expr evaluates to false, an “undeferred task” is generated, and the encountering thread must suspend the current task region, for which execution cannot be resumed until execution of the structured block that is associated with the generated task is completed.

The in_reduction clause specifies that this particular task (among all the tasks in the current taskgroup, if any) participates in a reduction. in_reduction_byref indicates whether each reduction variable should be passed by value or by reference.

The allocator_vars and allocate_vars parameters are a variadic list of values that specify the memory allocator to be used to obtain storage for private values.

The depend_kinds and depend_vars arguments are variadic lists of values that specify the dependencies of this particular task in relation to other tasks.

When a final clause is present and the final clause expression evaluates to true, the generated tasks will be final tasks. All task constructs encountered during execution of a final task will generate final and included tasks. The use of a variable in a final clause expression causes an implicit reference to the variable in all enclosing constructs.

When the mergeable clause is present, the tasks generated by the construct are “mergeable tasks”.

The priority clause is a hint for the priority of the generated tasks. The priority is a non-negative integer expression that provides a hint for task execution order. Among all tasks ready to be executed, higher priority tasks (those with a higher numerical value in the priority clause expression) are recommended to execute before lower priority ones. The default priority-value when no priority clause is specified should be assumed to be zero (the lowest priority).

If the untied clause is present on a task construct, any thread in the team can resume the task region after a suspension. The untied clause is ignored if a final clause is present on the same task construct and the final expression evaluates to true, or if a task is an included task.

The detach clause specifies that the task generated by the construct on which it appears is a detachable task. A new allow-completion event is created and connected to the completion of the associated task region. The original event-handle is updated to represent that allow-completion event before the task data environment is created.
- OPERATION_NAME = 'omp.task'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- allocate_vars() _ods_ir¶
- allocator_vars() _ods_ir¶
- depend_vars() _ods_ir¶
- final() _ods_ir | None¶
- if_expr() _ods_ir | None¶
- in_reduction_vars() _ods_ir¶
- priority() _ods_ir | None¶
- private_vars() _ods_ir¶
- event_handle() _ods_ir | None¶
- depend_kinds() _ods_ir | None¶
- in_reduction_byref() _ods_ir | None¶
- in_reduction_syms() _ods_ir | None¶
- mergeable() bool¶
- private_syms() _ods_ir | None¶
- private_needs_barrier() bool¶
- untied() bool¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.task(allocate_vars, allocator_vars, depend_vars, in_reduction_vars, private_vars, *, depend_kinds=None, final=None, if_expr=None, in_reduction_byref=None, in_reduction_syms=None, mergeable=None, priority=None, private_syms=None, private_needs_barrier=None, untied=None, event_handle=None, loc=None, ip=None) TaskOp¶
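A sketch of creating an explicit task with a priority hint from Python. The five leading lists (allocate, allocator, depend, in_reduction, private vars) are empty; the priority operand is an i32 value as described above, and the variable names are illustrative.

from mlir.ir import Context, InsertionPoint, IntegerType, Location, Module
from mlir.dialects import arith
from mlir.dialects import _omp_ops_gen as omp

with Context(), Location.unknown():
    module = Module.create()
    with InsertionPoint(module.body):
        i32 = IntegerType.get_signless(32)
        prio = arith.constant(i32, 10)
        # An untied task with a priority hint of 10.
        task_op = omp.TaskOp([], [], [], [], [], priority=prio, untied=True)
        body = task_op.region.blocks.append()
        with InsertionPoint(body):
            # ... the task's structured block ...
            omp.terminator()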
- class mlir.dialects._omp_ops_gen.TaskgroupOp(allocate_vars, allocator_vars, task_reduction_vars, *, task_reduction_byref=None, task_reduction_syms=None, loc=None, ip=None)¶
Bases:
_ods_ir

The taskgroup construct specifies a wait on completion of child tasks of the current task and their descendent tasks.
When a thread encounters a taskgroup construct, it starts executing the region. All child tasks generated in the taskgroup region and all of their descendants that bind to the same parallel region as the taskgroup region are part of the taskgroup set associated with the taskgroup region. There is an implicit task scheduling point at the end of the taskgroup region. The current task is suspended at the task scheduling point until all tasks in the taskgroup set complete execution.
The allocator_vars and allocate_vars parameters are a variadic list of values that specify the memory allocator to be used to obtain storage for private values.

The task_reduction clause specifies a reduction among tasks. For each list item, the number of copies is unspecified. Any copies associated with the reduction are initialized before they are accessed by the tasks participating in the reduction. After the end of the region, the original list item contains the result of the reduction. Similarly to the reduction clause, accumulator variables must be passed in task_reduction_vars, symbols referring to reduction declarations in the task_reduction_syms attribute, and whether the reduction variable should be passed into the reduction region by value or by reference in task_reduction_byref.

- OPERATION_NAME = 'omp.taskgroup'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- allocate_vars() _ods_ir¶
- allocator_vars() _ods_ir¶
- task_reduction_vars() _ods_ir¶
- task_reduction_byref() _ods_ir | None¶
- task_reduction_syms() _ods_ir | None¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.taskgroup(allocate_vars, allocator_vars, task_reduction_vars, *, task_reduction_byref=None, task_reduction_syms=None, loc=None, ip=None) TaskgroupOp¶
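A sketch of a taskgroup wrapping a child task, mirroring the wait-on-children semantics described above. All variadic operand lists are left empty; each region is closed with omp.terminator.

from mlir.ir import Context, InsertionPoint, Location, Module
from mlir.dialects import _omp_ops_gen as omp

with Context(), Location.unknown():
    module = Module.create()
    with InsertionPoint(module.body):
        # allocate_vars, allocator_vars, task_reduction_vars (all empty).
        tg = omp.TaskgroupOp([], [], [])
        tg_body = tg.region.blocks.append()
        with InsertionPoint(tg_body):
            child = omp.TaskOp([], [], [], [], [])
            child_body = child.region.blocks.append()
            with InsertionPoint(child_body):
                omp.terminator()  # ends the child task's block
            omp.terminator()  # ends the taskgroup region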
- class mlir.dialects._omp_ops_gen.TaskloopOp(allocate_vars, allocator_vars, in_reduction_vars, private_vars, reduction_vars, *, final=None, grainsize_mod=None, grainsize=None, if_expr=None, in_reduction_byref=None, in_reduction_syms=None, mergeable=None, nogroup=None, num_tasks_mod=None, num_tasks=None, priority=None, private_syms=None, private_needs_barrier=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, untied=None, loc=None, ip=None)¶
Bases:
_ods_ir

The taskloop construct specifies that the iterations of one or more associated loops will be executed in parallel using explicit tasks. The iterations are distributed across tasks generated by the construct and scheduled to be executed.

The body region can only contain a single block which must contain a single operation. This operation must be another compatible loop wrapper or an omp.loop_nest.

omp.taskloop <clauses> {
  omp.loop_nest (%i1, %i2) : index = (%c0, %c0) to (%c10, %c10) step (%c1, %c1) {
    %a = load %arrA[%i1, %i2] : memref<?x?xf32>
    %b = load %arrB[%i1, %i2] : memref<?x?xf32>
    %sum = arith.addf %a, %b : f32
    store %sum, %arrC[%i1, %i2] : memref<?x?xf32>
    omp.yield
  }
}

For definitions of “undeferred task”, “included task”, “final task” and “mergeable task”, please check the OpenMP Specification.
When an if clause is present on a taskloop construct, and if the if clause expression evaluates to false, undeferred tasks are generated. The use of a variable in an if clause expression of a taskloop construct causes an implicit reference to the variable in all enclosing constructs.

The allocator_vars and allocate_vars parameters are a variadic list of values that specify the memory allocator to be used to obtain storage for private values.

When a final clause is present and the final clause expression evaluates to true, the generated tasks will be final tasks. All task constructs encountered during execution of a final task will generate final and included tasks. The use of a variable in a final clause expression causes an implicit reference to the variable in all enclosing constructs.

If a grainsize clause is present, the number of logical loop iterations assigned to each generated task is greater than or equal to the minimum of the value of the grain-size expression and the number of logical loop iterations, but less than two times the value of the grain-size expression.

When the mergeable clause is present, the tasks generated by the construct are “mergeable tasks”.

By default, the taskloop construct executes as if it was enclosed in a taskgroup construct with no statements or directives outside of the taskloop construct. Thus, the taskloop construct creates an implicit taskgroup region. If the nogroup clause is present, no implicit taskgroup region is created.

If num_tasks is specified, the taskloop construct creates as many tasks as the minimum of the num-tasks expression and the number of logical loop iterations. Each task must have at least one logical loop iteration.

The priority clause is a hint for the priority of the generated tasks. The priority is a non-negative integer expression that provides a hint for task execution order. Among all tasks ready to be executed, higher priority tasks (those with a higher numerical value in the priority clause expression) are recommended to execute before lower priority ones. The default priority-value when no priority clause is specified should be assumed to be zero (the lowest priority).

Reductions can be performed by specifying the reduction modifier (default, inscan or task) in reduction_mod, reduction accumulator variables in reduction_vars, symbols referring to reduction declarations in the reduction_syms attribute, and whether the reduction variable should be passed into the reduction region by value or by reference in reduction_byref. Each reduction is identified by the accumulator it uses and accumulators must not be repeated in the same reduction. A private variable corresponding to the accumulator is used in place of the accumulator inside the body of the operation. The reduction declaration specifies how to combine the values from each iteration, section, team, thread or simd lane defined by the operation’s region into the final value, which is available in the accumulator after they all complete.

If the untied clause is present on a task construct, any thread in the team can resume the task region after a suspension. The untied clause is ignored if a final clause is present on the same task construct and the final expression evaluates to true, or if a task is an included task.

If an in_reduction clause is present on the taskloop construct, the behavior is as if each generated task was defined by a task construct on which an in_reduction clause with the same reduction operator and list items is present. Thus, the generated tasks are participants of a reduction previously defined by a reduction scoping clause. In this case, accumulator variables are specified in in_reduction_vars, symbols referring to reduction declarations in in_reduction_syms and in_reduction_byref indicate for each reduction variable whether it should be passed by value or by reference.

If a reduction clause is present on the taskloop construct, the behavior is as if a task_reduction clause with the same reduction operator and list items was applied to the implicit taskgroup construct enclosing the taskloop construct. The taskloop construct executes as if each generated task was defined by a task construct on which an in_reduction clause with the same reduction operator and list items is present. Thus, the generated tasks are participants of the reduction defined by the task_reduction clause that was applied to the implicit taskgroup construct.

- OPERATION_NAME = 'omp.taskloop'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- allocate_vars() _ods_ir¶
- allocator_vars() _ods_ir¶
- final() _ods_ir | None¶
- grainsize() _ods_ir | None¶
- if_expr() _ods_ir | None¶
- in_reduction_vars() _ods_ir¶
- num_tasks() _ods_ir | None¶
- priority() _ods_ir | None¶
- private_vars() _ods_ir¶
- reduction_vars() _ods_ir¶
- grainsize_mod() _ods_ir | None¶
- in_reduction_byref() _ods_ir | None¶
- in_reduction_syms() _ods_ir | None¶
- mergeable() bool¶
- nogroup() bool¶
- num_tasks_mod() _ods_ir | None¶
- private_syms() _ods_ir | None¶
- private_needs_barrier() bool¶
- reduction_mod() _ods_ir | None¶
- reduction_byref() _ods_ir | None¶
- reduction_syms() _ods_ir | None¶
- untied() bool¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.taskloop(allocate_vars, allocator_vars, in_reduction_vars, private_vars, reduction_vars, *, final=None, grainsize_mod=None, grainsize=None, if_expr=None, in_reduction_byref=None, in_reduction_syms=None, mergeable=None, nogroup=None, num_tasks_mod=None, num_tasks=None, priority=None, private_syms=None, private_needs_barrier=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, untied=None, loc=None, ip=None) TaskloopOp¶
- class mlir.dialects._omp_ops_gen.TaskwaitOp(depend_vars, *, depend_kinds=None, nowait=None, loc=None, ip=None)¶
Bases:
_ods_ir

The taskwait construct specifies a wait on the completion of child tasks of the current task.
The depend_kinds and depend_vars arguments are variadic lists of values that specify the dependencies of this particular task in relation to other tasks.

The optional nowait attribute, when present, eliminates the implicit barrier at the end of the construct, so the parent operation can make progress even if the child operation has not completed yet.

- OPERATION_NAME = 'omp.taskwait'¶
- _ODS_REGIONS = (0, True)¶
- depend_vars() _ods_ir¶
- depend_kinds() _ods_ir | None¶
- nowait() bool¶
- mlir.dialects._omp_ops_gen.taskwait(depend_vars, *, depend_kinds=None, nowait=None, loc=None, ip=None) TaskwaitOp¶
- class mlir.dialects._omp_ops_gen.TaskyieldOp(*, loc=None, ip=None)¶
Bases:
_ods_ir

The taskyield construct specifies that the current task can be suspended in favor of execution of a different task.
- OPERATION_NAME = 'omp.taskyield'¶
- _ODS_REGIONS = (0, True)¶
- mlir.dialects._omp_ops_gen.taskyield(*, loc=None, ip=None) TaskyieldOp¶
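Both stand-alone task-synchronization directives have trivial builders; taskwait additionally accepts depend_vars/depend_kinds and nowait, per its signature above. A minimal sketch:

from mlir.ir import Context, InsertionPoint, Location, Module
from mlir.dialects import _omp_ops_gen as omp

with Context(), Location.unknown():
    module = Module.create()
    with InsertionPoint(module.body):
        omp.taskwait([])   # no dependency list: wait on all child tasks
        omp.taskyield()    # allow the current task to be suspended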
- class mlir.dialects._omp_ops_gen.TeamsOp(allocate_vars, allocator_vars, private_vars, reduction_vars, *, if_expr=None, num_teams_lower=None, num_teams_upper=None, private_syms=None, private_needs_barrier=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, thread_limit=None, loc=None, ip=None)¶
Bases:
_ods_ir

The teams construct defines a region of code that triggers the creation of a league of teams. Once created, the number of teams remains constant for the duration of its code region.
If the if_expr is present and it evaluates to false, the number of teams created is one.

The allocator_vars and allocate_vars parameters are a variadic list of values that specify the memory allocator to be used to obtain storage for private values.

The optional num_teams_upper and num_teams_lower arguments specify the limit on the number of teams to be created. If only the upper bound is specified, it acts as if the lower bound was set to the same value. It is not allowed to set num_teams_lower if num_teams_upper is not specified. They define a closed range, where both the lower and upper bounds are included.

Reductions can be performed by specifying the reduction modifier (default, inscan or task) in reduction_mod, reduction accumulator variables in reduction_vars, symbols referring to reduction declarations in the reduction_syms attribute, and whether the reduction variable should be passed into the reduction region by value or by reference in reduction_byref. Each reduction is identified by the accumulator it uses and accumulators must not be repeated in the same reduction. A private variable corresponding to the accumulator is used in place of the accumulator inside the body of the operation. The reduction declaration specifies how to combine the values from each iteration, section, team, thread or simd lane defined by the operation’s region into the final value, which is available in the accumulator after they all complete.

The optional thread_limit specifies the limit on the number of threads.

- OPERATION_NAME = 'omp.teams'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- allocate_vars() _ods_ir¶
- allocator_vars() _ods_ir¶
- if_expr() _ods_ir | None¶
- num_teams_lower() _ods_ir | None¶
- num_teams_upper() _ods_ir | None¶
- private_vars() _ods_ir¶
- reduction_vars() _ods_ir¶
- thread_limit() _ods_ir | None¶
- private_syms() _ods_ir | None¶
- private_needs_barrier() bool¶
- reduction_mod() _ods_ir | None¶
- reduction_byref() _ods_ir | None¶
- reduction_syms() _ods_ir | None¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.teams(allocate_vars, allocator_vars, private_vars, reduction_vars, *, if_expr=None, num_teams_lower=None, num_teams_upper=None, private_syms=None, private_needs_barrier=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, thread_limit=None, loc=None, ip=None) TeamsOp¶
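A sketch of omp.teams with a closed [4, 8] range for the number of teams and a thread limit, per the num_teams_lower/num_teams_upper description above. The i32 operand types and variable names are illustrative choices.

from mlir.ir import Context, InsertionPoint, IntegerType, Location, Module
from mlir.dialects import arith
from mlir.dialects import _omp_ops_gen as omp

with Context(), Location.unknown():
    module = Module.create()
    with InsertionPoint(module.body):
        i32 = IntegerType.get_signless(32)
        lo = arith.constant(i32, 4)
        hi = arith.constant(i32, 8)
        limit = arith.constant(i32, 64)
        # allocate_vars, allocator_vars, private_vars, reduction_vars empty.
        teams_op = omp.TeamsOp([], [], [], [],
                               num_teams_lower=lo, num_teams_upper=hi,
                               thread_limit=limit)
        body = teams_op.region.blocks.append()
        with InsertionPoint(body):
            # ... code executed by the initial thread of each team ...
            omp.terminator()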
- class mlir.dialects._omp_ops_gen.TerminatorOp(*, loc=None, ip=None)¶
Bases:
_ods_ir

A terminator operation for regions that appear in the body of OpenMP operations. These regions are not expected to return any value so the terminator takes no operands. The terminator op returns control to the enclosing op.
- OPERATION_NAME = 'omp.terminator'¶
- _ODS_REGIONS = (0, True)¶
- mlir.dialects._omp_ops_gen.terminator(*, loc=None, ip=None) TerminatorOp¶
- class mlir.dialects._omp_ops_gen.ThreadprivateOp(tls_addr, sym_addr, *, loc=None, ip=None)¶
Bases:
_ods_ir

The threadprivate directive specifies that variables are replicated, with each thread having its own copy.
The current implementation uses the OpenMP runtime to provide thread-local storage (TLS). Using the TLS feature of LLVM IR will be supported in the future.
This operation takes in the address of a symbol that represents the original variable and returns the address of its TLS. All occurrences of threadprivate variables in a parallel region should use the TLS returned by this operation.
The sym_addr refers to the address of the symbol, which is a pointer to the original variable.

- OPERATION_NAME = 'omp.threadprivate'¶
- _ODS_REGIONS = (0, True)¶
- sym_addr() _ods_ir¶
- tls_addr() _ods_ir¶
- mlir.dialects._omp_ops_gen.threadprivate(tls_addr, sym_addr, *, loc=None, ip=None) _ods_ir¶
- class mlir.dialects._omp_ops_gen.TileOp(generatees, applyees, sizes, *, loc=None, ip=None)¶
Bases:
_ods_ir

Represents the OpenMP tile directive introduced in OpenMP 5.1.
The construct partitions the logical iteration space of the affected loops into equally-sized tiles, then creates two sets of nested loops. The outer loops, called the grid loops, iterate over all tiles. The inner loops, called the intratile loops, iterate over the logical iterations of a tile. The sizes clause determines the size of a tile.
Currently, the affected loops must be rectangular (the tripcount of the inner loop must not depend on any induction variable of a surrounding affected loop) and perfectly nested (except for the innermost affected loop, the loop body contains no operations other than the nested loop and the terminator).
The sizes clause defines the size of a grid over a multi-dimensional logical iteration space. This grid is used for loop transformations such as tile and strip. The size per dimension can be a variable, but only values that are at least 2 make sense. It is not specified what happens when smaller values are used, but it should still result in a loop nest that executes each logical iteration once.

- OPERATION_NAME = 'omp.tile'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (0, True)¶
- generatees() _ods_ir¶
- applyees() _ods_ir¶
- sizes() _ods_ir¶
- class mlir.dialects._omp_ops_gen.UnrollHeuristicOp(applyee, *, loc=None, ip=None)¶
Bases:
_ods_ir

Represents a #pragma omp unroll construct introduced in OpenMP 5.1.

The operation has one applyee and no generatees. The applyee is unrolled according to implementation-defined heuristics. Implementations may choose to not unroll the loop, partially unroll by a chosen factor, or fully unroll it. Even if the implementation chooses to partially unroll the applyee, the resulting unrolled loop is not accessible as a generatee. Use omp.unroll_partial if a generatee is required.

The lowering is implemented using OpenMPIRBuilder::unrollLoopHeuristic, which just attaches llvm.loop.unroll.enable metadata to the loop so the unrolling is carried out by LLVM’s LoopUnroll pass. That is, unrolling is only actually performed in optimized builds.

Assembly formats:
omp.unroll_heuristic(%cli)
omp.unroll_heuristic(%cli) -> ()
- OPERATION_NAME = 'omp.unroll_heuristic'¶
- _ODS_REGIONS = (0, True)¶
- applyee() _ods_ir¶
- mlir.dialects._omp_ops_gen.unroll_heuristic(applyee, *, loc=None, ip=None) UnrollHeuristicOp¶
- class mlir.dialects._omp_ops_gen.WorkdistributeOp(*, loc=None, ip=None)¶
Bases:
_ods_ir

workdistribute divides execution of the enclosed structured block into separate units of work, each executed only once by each initial thread in the league.

!$omp target teams
!$omp workdistribute
y = a * x + y
!$omp end workdistribute
!$omp end target teams

- OPERATION_NAME = 'omp.workdistribute'¶
- _ODS_REGIONS = (1, True)¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.workdistribute(*, loc=None, ip=None) WorkdistributeOp¶
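A structural sketch: omp.workdistribute takes no operands, only a region, and per the Fortran snippet above is intended to appear inside a teams context (not built here for brevity).

from mlir.ir import Context, InsertionPoint, Location, Module
from mlir.dialects import _omp_ops_gen as omp

with Context(), Location.unknown():
    module = Module.create()
    with InsertionPoint(module.body):
        wd = omp.WorkdistributeOp()
        body = wd.region.blocks.append()
        with InsertionPoint(body):
            # ... structured block divided into units of work ...
            omp.terminator()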
Bases:
_ods_ir

This operation wraps a loop nest that is marked for dividing into units of work by an encompassing omp.workshare operation.
Bases:
_ods_ir

The workshare construct divides the execution of the enclosed structured block into separate units of work, and causes the threads of the team to share the work such that each unit is executed only once by one thread, in the context of its implicit task.

This operation is used for the intermediate representation of the workshare block before the work gets divided between the threads. See the flang LowerWorkshare pass for details.

The optional nowait attribute, when present, eliminates the implicit barrier at the end of the construct, so the parent operation can make progress even if the child operation has not completed yet.
- class mlir.dialects._omp_ops_gen.WsloopOp(allocate_vars, allocator_vars, linear_vars, linear_step_vars, private_vars, reduction_vars, *, nowait=None, order=None, order_mod=None, ordered=None, private_syms=None, private_needs_barrier=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, schedule_kind=None, schedule_chunk=None, schedule_mod=None, schedule_simd=None, loc=None, ip=None)¶
Bases:
_ods_ir

The worksharing-loop construct specifies that the iterations of the loop(s) will be executed in parallel by threads in the current context. These iterations are spread across threads that already exist in the enclosing parallel region.

The body region can only contain a single block which must contain a single operation. This operation must be another compatible loop wrapper or an omp.loop_nest.

omp.wsloop <clauses> {
  omp.loop_nest (%i1, %i2) : index = (%c0, %c0) to (%c10, %c10) step (%c1, %c1) {
    %a = load %arrA[%i1, %i2] : memref<?x?xf32>
    %b = load %arrB[%i1, %i2] : memref<?x?xf32>
    %sum = arith.addf %a, %b : f32
    store %sum, %arrC[%i1, %i2] : memref<?x?xf32>
    omp.yield
  }
}

The allocator_vars and allocate_vars parameters are a variadic list of values that specify the memory allocator to be used to obtain storage for private values.

The linear_step_vars operand additionally specifies the step for each associated linear operand. Note that the linear_vars and linear_step_vars variadic lists should contain the same number of elements.

The optional nowait attribute, when present, eliminates the implicit barrier at the end of the construct, so the parent operation can make progress even if the child operation has not completed yet.

The optional order attribute specifies which order the iterations of the associated loops are executed in. Currently the only option for this attribute is “concurrent”.

The optional ordered attribute specifies how many loops are associated with the worksharing-loop construct. The value of zero refers to the ordered clause specified without parameter.

Reductions can be performed by specifying the reduction modifier (default, inscan or task) in reduction_mod, reduction accumulator variables in reduction_vars, symbols referring to reduction declarations in the reduction_syms attribute, and whether the reduction variable should be passed into the reduction region by value or by reference in reduction_byref. Each reduction is identified by the accumulator it uses and accumulators must not be repeated in the same reduction. A private variable corresponding to the accumulator is used in place of the accumulator inside the body of the operation. The reduction declaration specifies how to combine the values from each iteration, section, team, thread or simd lane defined by the operation’s region into the final value, which is available in the accumulator after they all complete.

The optional schedule_kind attribute specifies the loop schedule for this loop, determining how the loop is distributed across the parallel threads. The optional schedule_chunk associated with this further controls this distribution.

- OPERATION_NAME = 'omp.wsloop'¶
- _ODS_OPERAND_SEGMENTS¶
- _ODS_REGIONS = (1, True)¶
- allocate_vars() _ods_ir¶
- allocator_vars() _ods_ir¶
- linear_vars() _ods_ir¶
- linear_step_vars() _ods_ir¶
- private_vars() _ods_ir¶
- reduction_vars() _ods_ir¶
- schedule_chunk() _ods_ir | None¶
- nowait() bool¶
- order() _ods_ir | None¶
- order_mod() _ods_ir | None¶
- ordered() _ods_ir | None¶
- private_syms() _ods_ir | None¶
- private_needs_barrier() bool¶
- reduction_mod() _ods_ir | None¶
- reduction_byref() _ods_ir | None¶
- reduction_syms() _ods_ir | None¶
- schedule_kind() _ods_ir | None¶
- schedule_mod() _ods_ir | None¶
- schedule_simd() bool¶
- region() _ods_ir¶
- mlir.dialects._omp_ops_gen.wsloop(allocate_vars, allocator_vars, linear_vars, linear_step_vars, private_vars, reduction_vars, *, nowait=None, order=None, order_mod=None, ordered=None, private_syms=None, private_needs_barrier=None, reduction_mod=None, reduction_byref=None, reduction_syms=None, schedule_kind=None, schedule_chunk=None, schedule_mod=None, schedule_simd=None, loc=None, ip=None) WsloopOp¶
- class mlir.dialects._omp_ops_gen.YieldOp(results_, *, loc=None, ip=None)¶
Bases:
_ods_ir

“omp.yield” yields SSA values from the OpenMP dialect op region and terminates the region. The semantics of how the values are yielded is defined by the parent operation.
- OPERATION_NAME = 'omp.yield'¶
- _ODS_REGIONS = (0, True)¶
- results_() _ods_ir¶