mlir.dialects._nvgpu_transform_ops_gen
======================================

.. py:module:: mlir.dialects._nvgpu_transform_ops_gen


Attributes
----------

.. autoapisummary::

   mlir.dialects._nvgpu_transform_ops_gen._ods_ir


Classes
-------

.. autoapisummary::

   mlir.dialects._nvgpu_transform_ops_gen.ApplyNVGPUToNVVMConversionPatternsOp
   mlir.dialects._nvgpu_transform_ops_gen.CreateAsyncGroupsOp
   mlir.dialects._nvgpu_transform_ops_gen.PipelineSharedMemoryCopiesOp
   mlir.dialects._nvgpu_transform_ops_gen.RewriteCopyAsTmaOp
   mlir.dialects._nvgpu_transform_ops_gen.RewriteMatmulAsMmaSyncOp


Functions
---------

.. autoapisummary::

   mlir.dialects._nvgpu_transform_ops_gen.apply_conversion_patterns_nvgpu_nvgpu_to_nvvm
   mlir.dialects._nvgpu_transform_ops_gen.nvgpu_create_async_groups
   mlir.dialects._nvgpu_transform_ops_gen.nvgpu_pipeline_shared_memory_copies
   mlir.dialects._nvgpu_transform_ops_gen.nvgpu_rewrite_copy_as_tma
   mlir.dialects._nvgpu_transform_ops_gen.nvgpu_rewrite_matmul_as_mma_sync


Module Contents
---------------

.. py:data:: _ods_ir

.. py:class:: ApplyNVGPUToNVVMConversionPatternsOp(*, loc=None, ip=None)

   Bases: :py:obj:`_ods_ir`

   Collects patterns that convert NVGPU dialect ops to NVVM dialect ops. These
   patterns require an "LLVMTypeConverter".

   .. py:attribute:: OPERATION_NAME
      :value: 'transform.apply_conversion_patterns.nvgpu.nvgpu_to_nvvm'

   .. py:attribute:: _ODS_REGIONS
      :value: (0, True)


.. py:function:: apply_conversion_patterns_nvgpu_nvgpu_to_nvvm(*, loc=None, ip=None) -> ApplyNVGPUToNVVMConversionPatternsOp

.. py:class:: CreateAsyncGroupsOp(result, target, *, bypass_l1=None, loc=None, ip=None)

   Bases: :py:obj:`_ods_ir`

   Look for global to shared memory copies within the targeted op in the form
   of vector transfer ops and convert them to async copies when possible.
   Consecutive copies are put into the same group. A "wait" operation is
   inserted right at the end of the group.

   ``bypass_l1`` specifies whether ``bypassL1`` attributes should be added to
   the async copies. ``bypass_l1`` is a compiler hint: only 16-byte transfers
   can bypass the L1 cache, so this attribute is not set for any other
   transfer sizes.

   Return modes
   ------------

   This op consumes the ``target`` handle and produces the ``result`` handle,
   which is mapped to the same payload operations as the ``target`` handle.
   The op modifies the payload.

   .. py:attribute:: OPERATION_NAME
      :value: 'transform.nvgpu.create_async_groups'

   .. py:attribute:: _ODS_REGIONS
      :value: (0, True)

   .. py:method:: target() -> _ods_ir

   .. py:method:: bypass_l1() -> bool

   .. py:method:: result() -> _ods_ir

      Shortcut to get an op result if it has only one (throws an error otherwise).


.. py:function:: nvgpu_create_async_groups(result, target, *, bypass_l1=None, loc=None, ip=None) -> _ods_ir
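
For illustration only, a minimal sketch of emitting this op from Python. It
assumes an active ``mlir.ir`` context, location, and insertion point (e.g.
inside the body of a transform sequence being built), an existing transform
handle ``target``, and that ``transform.AnyOpType`` is an acceptable result
handle type; none of these names are defined by this module.

.. code-block:: python

   # Hypothetical sketch: emit transform.nvgpu.create_async_groups.
   # Assumes an active Context/Location/InsertionPoint and an existing
   # transform handle `target` (e.g. a block argument of a transform sequence).
   from mlir.dialects import transform
   from mlir.dialects._nvgpu_transform_ops_gen import nvgpu_create_async_groups

   any_op = transform.AnyOpType.get()  # type of the produced result handle
   # Consumes `target` and returns a handle mapped to the same payload ops.
   groups = nvgpu_create_async_groups(any_op, target, bypass_l1=True)
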
.. py:class:: PipelineSharedMemoryCopiesOp(result, for_op, depth, *, peel_epilogue=None, failure_propagation_mode=None, loc=None, ip=None)

   Bases: :py:obj:`_ods_ir`

   Applies software pipelining to a given scf.for loop. The pipelining
   strategy will look for a load into shared memory and pipeline it to
   overlap it with the rest of the loop.

   NOTE: It is the user's responsibility to ensure that there are no
   dependencies between ``depth`` iterations of the loop by using
   multi-buffering. It is also the user's responsibility to ensure that a
   sufficient amount of shared memory is allocated to cover eventual writes by
   ``depth-1`` speculative iterations.

   ``depth`` indicates how many stages the software pipeline should have.
   ``peel_epilogue`` forces the epilogue to be peeled out instead of
   potentially using predicated operations for the epilogue phase.

   Return modes
   ------------

   Consumes the operand handle and produces a result handle pointing to the
   loop, which may or may not have been pipelined. Produces a definite failure
   if the loop pipeliner mutated the IR before failing to pipeline, in
   particular if ``peel_epilogue`` is not set and the loop body doesn't
   support predication. If the failure propagation mode is set to "propagate",
   produces a silenceable failure when pipelining preconditions, e.g., the
   loop bound being static, are not met or when the loop wasn't pipelined due
   to the lack of loads into shared memory. If the failure propagation mode is
   set to "suppress" (default), succeeds in these cases and associates the
   result handle with the original loop.

   TODO: the shared memory part and behavior specific to NVGPU should be made
   orthogonal to pipelining so that ``transform.loop.pipeline`` becomes usable
   here.

   .. py:attribute:: OPERATION_NAME
      :value: 'transform.nvgpu.pipeline_shared_memory_copies'

   .. py:attribute:: _ODS_REGIONS
      :value: (0, True)

   .. py:method:: for_op() -> _ods_ir

   .. py:method:: depth() -> _ods_ir

   .. py:method:: peel_epilogue() -> bool

   .. py:method:: failure_propagation_mode() -> _ods_ir

   .. py:method:: result() -> _ods_ir

      Shortcut to get an op result if it has only one (throws an error otherwise).


.. py:function:: nvgpu_pipeline_shared_memory_copies(result, for_op, depth, *, peel_epilogue=None, failure_propagation_mode=None, loc=None, ip=None) -> _ods_ir

.. py:class:: RewriteCopyAsTmaOp(target, *, loc=None, ip=None)

   Bases: :py:obj:`_ods_ir`

   Rewrite a copy operation on memref to TMA operations that transit through
   shared memory.

   .. py:attribute:: OPERATION_NAME
      :value: 'transform.nvgpu.rewrite_copy_as_tma'

   .. py:attribute:: _ODS_REGIONS
      :value: (0, True)

   .. py:method:: target() -> _ods_ir


.. py:function:: nvgpu_rewrite_copy_as_tma(target, *, loc=None, ip=None) -> RewriteCopyAsTmaOp

.. py:class:: RewriteMatmulAsMmaSyncOp(target, *, loc=None, ip=None)

   Bases: :py:obj:`_ods_ir`

   Rewrite a matmul operation on memref to an mma.sync operation on vectors.

   Memory copies with the required access patterns are automatically inserted.
   Operations that do not have a 1-1 mapping to mma.sync operations are left
   unchanged.

   .. py:attribute:: OPERATION_NAME
      :value: 'transform.nvgpu.rewrite_matmul_as_mma_sync'

   .. py:attribute:: _ODS_REGIONS
      :value: (0, True)

   .. py:method:: target() -> _ods_ir


.. py:function:: nvgpu_rewrite_matmul_as_mma_sync(target, *, loc=None, ip=None) -> RewriteMatmulAsMmaSyncOp
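
Similarly, for illustration only, a hedged sketch of the pipelining entry
point under the same assumptions as above; ``for_handle`` and the chosen
``depth`` are placeholders standing in for an existing transform handle that
points at the ``scf.for`` payload loop and are not defined by this module.

.. code-block:: python

   # Hypothetical sketch: pipeline shared-memory copies in a targeted scf.for.
   # Assumes an active Context/Location/InsertionPoint and a transform handle
   # `for_handle` pointing at the scf.for loop to pipeline.
   from mlir.dialects import transform
   from mlir.dialects._nvgpu_transform_ops_gen import (
       nvgpu_pipeline_shared_memory_copies,
   )

   # The user must guarantee there are no dependencies across `depth`
   # iterations of the loop (e.g. via multi-buffering).
   pipelined = nvgpu_pipeline_shared_memory_copies(
       transform.AnyOpType.get(),  # type of the produced result handle
       for_handle,
       depth=3,
       peel_epilogue=True,  # peel the epilogue instead of predicating it
   )
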