.. _libdoc_cuda_dnn:

================================
:mod:`sandbox.cuda.dnn` -- cuDNN
================================

.. moduleauthor:: LISA

`cuDNN <https://developer.nvidia.com/cuDNN>`_ is an NVIDIA library with
functionality used by deep neural network. It provides optimized versions
of some operations like the convolution. cuDNN is not currently
installed with CUDA 6.5. You must download and install it
yourself.

To install it, decompress the downloaded file and make the ``*.h`` and
``*.so*`` files available to the compilation environment.
There are at least three possible ways of doing so:

- The easiest is to include them in your CUDA installation. Copy the
  ``*.h`` files to ``CUDA_ROOT/include`` and the ``*.so*`` files to
  ``CUDA_ROOT/lib64`` (by default, ``CUDA_ROOT`` is ``/usr/local/cuda``
  on Linux).
- Alternatively, on Linux, you can set the environment variables
  ``LD_LIBRARY_PATH``, ``LIBRARY_PATH`` and ``CPATH`` to the directory
  extracted from the download. If needed, separate multiple directories
  with ``:`` as in the ``PATH`` environment variable.

  example::

      export LD_LIBRARY_PATH=/home/user/path_to_CUDNN_folder/lib64:$LD_LIBRARY_PATH
      export CPATH=/home/user/path_to_CUDNN_folder/include:$CPATH
      export LIBRARY_PATH=/home/user/path_to_CUDNN_folder/lib64:$LD_LIBRARY_PATH

- And as a third way, also on Linux, you can copy the ``*.h`` files
  to ``/usr/include`` and the ``*.so*`` files to ``/lib64``.

By default, Theano will detect if it can use cuDNN. If so, it will use
it.  If not, Theano optimizations will not introduce cuDNN ops. So
Theano will still work if the user did not introduce them manually.

The recently added Theano flag :attr:`dnn.enabled
<config.dnn.enabled>` allows to change the default behavior to force
it or disable it. Older Theano version do not support this flag. To
get an error when cuDNN can not be used with them, use this flag:
``optimizer_including=cudnn``.

.. note::

   cuDNN v3 has now been released. cuDNN v2 remains supported but cuDNN v3 is
   faster and offers many more options. We recommend that everybody update to
   v3.

.. note::

   Starting in cuDNN v3, multiple convolution implementations are offered and
   it is possible to use heuristics to automatically choose a convolution
   implementation well suited to the parameters of the convolution.

   The Theano flag ``dnn.conv.algo_fwd`` allows to specify the cuDNN
   convolution implementation that Theano should use for forward convolutions.
   Possible values include :

   * ``small`` (default) : use a convolution implementation with small memory
     usage
   * ``none`` : use a slower implementation with minimal memory usage
   * ``large`` : use a sometimes faster implementation with large memory usage
   * ``fft`` : use the Fast Fourrier Transform implementation of convolution
     (very high memory usage)
   * ``fft_tiling`` : use the Fast Fourrier Transform implementation of convolution
     with tiling (high memory usage, but less then fft)
   * ``guess_once`` : the first time a convolution is executed, the
     implementation to use is chosen according to cuDNN's heuristics and reused
     for every subsequent execution of the convolution.
   * ``guess_on_shape_change`` : like ``guess_once`` but a new convolution
     implementation selected every time the shapes of the inputs and kernels
     don't match the shapes from the last execution.
   * ``time_once`` : the first time a convolution is executed, every convolution
     implementation offered by cuDNN is executed and timed. The fastest is
     reused for every subsequent execution of the convolution.
   * ``time_on_shape_change`` : like ``time_once`` but a new convolution
     implementation selected every time the shapes of the inputs and kernels
     don't match the shapes from the last execution.

   The Theano flag ``dnn.conv.algo_bwd_filter`` and
   ``dnn.conv.algo_bwd_data`` allows to specify the cuDNN
   convolution implementation that Theano should use for gradient
   convolutions.  Possible values include :

   * ``none`` (default) : use the default non-deterministic convolution
     implementation
   * ``deterministic`` : use a slower but deterministic implementation
   * ``fft`` : use the Fast Fourrier Transform implementation of convolution
     (very high memory usage)
   * ``guess_once`` : the first time a convolution is executed, the
     implementation to use is chosen according to cuDNN's heuristics and reused
     for every subsequent execution of the convolution.
   * ``guess_on_shape_change`` : like ``guess_once`` but a new convolution
     implementation selected every time the shapes of the inputs and kernels
     don't match the shapes from the last execution.
   * ``time_once`` : the first time a convolution is executed, every convolution
     implementation offered by cuDNN is executed and timed. The fastest is
     reused for every subsequent execution of the convolution.
   * ``time_on_shape_change`` : like ``time_once`` but a new convolution
     implementation selected every time the shapes of the inputs and kernels
     don't match the shapes from the last execution.

   * (algo_bwd_data only) ``fft_tiling`` : use the Fast Fourrier
     Transform implementation of convolution with tiling (high memory
     usage, but less then fft)

   * (algo_bwd_data only) ``small`` : use a convolution implementation
     with small memory usage

   ``guess_*`` and ``time_*`` flag values take into account the amount of
   available memory when selecting an implementation. This means that slower
   implementations might be selected if not enough memory is available for the
   faster implementations.

.. note::

    Normally you should not call GPU Ops directly, but the CPU interface
    currently does not allow all options supported by cuDNN ops. So it is
    possible that you will need to call them manually.

.. note::

    The documentation of CUDNN tells that, for the 2 following operations, the
    reproducibility is not guaranteed with the default implementation:
    `cudnnConvolutionBackwardFilter` and `cudnnConvolutionBackwardData`.
    Those correspond to the gradient wrt the weights and the gradient wrt the
    input of the convolution. They are also used sometimes in the forward
    pass, when they give a speed up.

    The Theano flag ``dnn.conv.algo_bwd`` can be use to force the use of a
    slower but deterministic convolution implementation.

.. note::

    There is a problem we do not understand yet when cudnn paths are
    used with symbolic links. So avoid using that.

.. note::

    cudnn.so* must be readable and executable by everybody.
    cudnn.h must be readable by everybody.


Functions
=========

.. automodule:: theano.sandbox.cuda.dnn
   :noindex:
   :members: dnn_conv, dnn_pool

Convolution Ops
===============

.. automodule:: theano.sandbox.cuda.dnn
   :noindex:
   :members: GpuDnnConvDesc, GpuDnnConv, GpuDnnConv3d, GpuDnnConvGradW,
             GpuDnnConv3dGradW, GpuDnnConvGradI, GpuDnnConv3dGradI

Pooling Ops
===========

.. automodule:: theano.sandbox.cuda.dnn
   :noindex:
   :members: GpuDnnPoolDesc, GpuDnnPool, GpuDnnPoolGrad

Softmax Ops
===========

.. automodule:: theano.sandbox.cuda.dnn
   :noindex:
   :members: GpuDnnSoftmax, GpuDnnSoftmaxGrad
