mlir.dialects._nvgpu_transform_ops_gen¶
Attributes¶
- _ods_ir
Classes¶
- ApplyNVGPUToNVVMConversionPatternsOp: Collects patterns that convert NVGPU dialect ops to NVVM dialect ops.
- CreateAsyncGroupsOp: Look for global to shared memory copies within the targeted op in the form of vector transfer ops and convert them to async copies when possible.
- PipelineSharedMemoryCopiesOp: Applies software pipelining to a given scf.for loop.
- RewriteCopyAsTmaOp: Rewrite a copy operation on memref to TMA operations that transit through shared memory.
- RewriteMatmulAsMmaSyncOp: Rewrite a matmul operation on memref to an mma.sync operation on vectors.
Functions¶
- apply_conversion_patterns_nvgpu_nvgpu_to_nvvm
- nvgpu_create_async_groups
- nvgpu_rewrite_copy_as_tma
- nvgpu_rewrite_matmul_as_mma_sync
Module Contents¶
- mlir.dialects._nvgpu_transform_ops_gen._ods_ir¶
- class mlir.dialects._nvgpu_transform_ops_gen.ApplyNVGPUToNVVMConversionPatternsOp(*, loc=None, ip=None)¶
Bases: _ods_ir.OpView
Collects patterns that convert NVGPU dialect ops to NVVM dialect ops. These patterns require an “LLVMTypeConverter”.
- OPERATION_NAME = 'transform.apply_conversion_patterns.nvgpu.nvgpu_to_nvvm'¶
- _ODS_REGIONS = (0, True)¶
- mlir.dialects._nvgpu_transform_ops_gen.apply_conversion_patterns_nvgpu_nvgpu_to_nvvm(*, loc=None, ip=None) ApplyNVGPUToNVVMConversionPatternsOp¶
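For illustration, a minimal sketch of emitting this op from Python. It assumes an `ip` insertion point placed inside the patterns region of an enclosing `transform.apply_conversion_patterns` op; the construction of that enclosing op, including the required LLVMTypeConverter configuration, is elided here and is not part of this module's API.

```python
# A minimal sketch, not a verbatim recipe: `ip` is an assumed
# ir.InsertionPoint inside the patterns region of an enclosing
# transform.apply_conversion_patterns op configured with an
# LLVMTypeConverter, as the description above requires.
from mlir.dialects import _nvgpu_transform_ops_gen as nvgpu_transform

# Emits `transform.apply_conversion_patterns.nvgpu.nvgpu_to_nvvm`
# at the given insertion point; the op takes no operands.
nvgpu_transform.apply_conversion_patterns_nvgpu_nvgpu_to_nvvm(ip=ip)
```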
- class mlir.dialects._nvgpu_transform_ops_gen.CreateAsyncGroupsOp(result, target, *, bypass_l1=None, loc=None, ip=None)¶
Bases: _ods_ir.OpView
Look for global to shared memory copies within the targeted op in the form of vector transfer ops and convert them to async copies when possible. Consecutive copies are put into the same group. A “wait” operation is inserted right at the end of the group.
bypass_l1 specifies whether bypassL1 attributes should be added to the async copies. bypass_l1 is a compiler hint: only 16-byte transfers can bypass the L1 cache, so this attribute is not set for any other transfer sizes.
Return modes¶
This op consumes the target handle and produces the result handle, which is mapped to the same payload operations as the target handle. The op modifies the payload.
- OPERATION_NAME = 'transform.nvgpu.create_async_groups'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- bypass_l1() bool¶
- result() _ods_ir¶
Shortcut to get an op result if it has only one (throws an error otherwise).
- mlir.dialects._nvgpu_transform_ops_gen.nvgpu_create_async_groups(result, target, *, bypass_l1=None, loc=None, ip=None) _ods_ir¶
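A minimal usage sketch of this builder, assuming an active context and an insertion point inside a transform sequence, and a `target` value holding a handle to the payload op to scan; that surrounding setup is elided.

```python
# A minimal sketch: `target` is an assumed !transform.any_op handle
# bound to the payload op to scan for global-to-shared copies.
from mlir import ir
from mlir.dialects import _nvgpu_transform_ops_gen as nvgpu_transform

any_op = ir.Type.parse("!transform.any_op")  # type of the result handle
groups = nvgpu_transform.nvgpu_create_async_groups(
    any_op,          # result handle type
    target,          # consumed handle; the result maps to the same payload
    bypass_l1=True,  # hint: only 16-byte transfers can bypass the L1 cache
)
```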
- class mlir.dialects._nvgpu_transform_ops_gen.PipelineSharedMemoryCopiesOp¶
Bases: _ods_ir.OpView
Applies software pipelining to a given scf.for loop. The pipelining strategy will look for a load into shared memory and pipeline it to overlap it with the rest of the loop.
NOTE: It is the user's responsibility to ensure that there are no dependencies between depth iterations of the loop by using multi-buffering. It is also the user's responsibility to ensure that a sufficient amount of shared memory is allocated to cover eventual writes by depth-1 speculative iterations. depth indicates how many stages the software pipeline should have. peel_epilogue allows forcing the epilogue to be peeled out instead of potentially using predicated operations for the epilogue phase.
Return modes¶
Consumes the operand handle and produces a result handle pointing to the loop, which may or may not have been pipelined. Produces a definite failure if the loop pipeliner mutated the IR before failing to pipeline, in particular if peel_epilogue is not set and the loop body doesn’t support predication. If the failure propagation mode is set to “propagate”, produces a silenceable failure when pipelining preconditions, e.g., the loop bound being static, are not met or when the loop wasn’t pipelined due to the lack of loads into shared memory. If the failure propagation mode is set to “suppress” (default), succeeds in these cases and associates the result handle with the original loop.
TODO: the shared memory part and behavior specific to NVGPU should be made orthogonal to pipelining so that transform.loop.pipeline becomes usable here.
- OPERATION_NAME = 'transform.nvgpu.pipeline_shared_memory_copies'¶
- _ODS_REGIONS = (0, True)¶
- result() _ods_ir¶
Shortcut to get an op result if it has only one (throws an error otherwise).
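A hedged construction sketch follows. The generated builder's exact signature is not reproduced on this page, so the argument names below (`for_op` for the loop handle, plus the `depth` and `peel_epilogue` attributes described above) are assumptions; the enclosing transform sequence and the `loop` handle setup are elided.

```python
# A hedged sketch under assumed argument names (for_op, depth,
# peel_epilogue); `loop` is an assumed handle to the targeted
# scf.for loop.
from mlir import ir
from mlir.dialects import _nvgpu_transform_ops_gen as nvgpu_transform

any_op = ir.Type.parse("!transform.any_op")
pipeline_op = nvgpu_transform.PipelineSharedMemoryCopiesOp(
    any_op,              # type of the result handle pointing to the loop
    for_op=loop,         # consumed handle to the scf.for loop (assumed name)
    depth=3,             # number of software pipeline stages
    peel_epilogue=True,  # peel the epilogue instead of predicating it
)
pipelined = pipeline_op.result  # handle to the (possibly) pipelined loop
```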
- class mlir.dialects._nvgpu_transform_ops_gen.RewriteCopyAsTmaOp(target, *, loc=None, ip=None)¶
Bases: _ods_ir.OpView
Rewrite a copy operation on memref to TMA operations that transit through shared memory.
- OPERATION_NAME = 'transform.nvgpu.rewrite_copy_as_tma'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- mlir.dialects._nvgpu_transform_ops_gen.nvgpu_rewrite_copy_as_tma(target, *, loc=None, ip=None) RewriteCopyAsTmaOp¶
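A minimal usage sketch, assuming a `copies` value holding a handle bound to the memref copy ops to rewrite; the surrounding transform-sequence setup is elided.

```python
# A minimal sketch: `copies` is an assumed !transform.any_op handle
# bound to the memref copy ops to rewrite as TMA transfers.
from mlir.dialects import _nvgpu_transform_ops_gen as nvgpu_transform

# Emits `transform.nvgpu.rewrite_copy_as_tma` on the given handle.
nvgpu_transform.nvgpu_rewrite_copy_as_tma(copies)
```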
- class mlir.dialects._nvgpu_transform_ops_gen.RewriteMatmulAsMmaSyncOp(target, *, loc=None, ip=None)¶
Bases: _ods_ir.OpView
Rewrite a matmul operation on memref to an mma.sync operation on vectors.
Memory copies with the required access patterns are automatically inserted. Operations that do not have a 1-1 mapping to mma.sync operations are left unchanged.
- OPERATION_NAME = 'transform.nvgpu.rewrite_matmul_as_mma_sync'¶
- _ODS_REGIONS = (0, True)¶
- target() _ods_ir¶
- mlir.dialects._nvgpu_transform_ops_gen.nvgpu_rewrite_matmul_as_mma_sync(target, *, loc=None, ip=None) RewriteMatmulAsMmaSyncOp¶
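A minimal usage sketch, assuming a `matmuls` value holding a handle bound to memref matmul ops; per the description above, payload ops without a 1-1 mma.sync mapping are left unchanged.

```python
# A minimal sketch: `matmuls` is an assumed !transform.any_op handle
# bound to the memref matmul ops to rewrite as mma.sync.
from mlir.dialects import _nvgpu_transform_ops_gen as nvgpu_transform

# Emits `transform.nvgpu.rewrite_matmul_as_mma_sync` on the handle.
nvgpu_transform.nvgpu_rewrite_matmul_as_mma_sync(matmuls)
```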