
.. _gpuBasicsSimilarity:

Similarity check (PSNR and SSIM) on the GPU
*******************************************

Goal
====

In the :ref:`videoInputPSNRMSSIM` tutorial I presented the PSNR and SSIM methods for checking the similarity between two images. As you could see there, computing them takes quite some time, especially in the case of the SSIM. However, if the performance of OpenCV's CPU implementation does not satisfy you and you happen to have an NVIDIA CUDA GPU in your system, all is not lost: you may port or write your algorithm for the video card.

This tutorial will give you a good grasp of how to approach coding with the GPU module of OpenCV. As a prerequisite, you should already know how to handle the core, highgui and imgproc modules. So, our goals are:

.. container:: enumeratevisibleitemswithsquare

   + What's different compared to the CPU?
   + Create the GPU code for the PSNR and SSIM 
   + Optimize the code for maximal performance

The source code
===============

You can find the source code and the video files in the :file:`samples/cpp/tutorial_code/gpu/gpu-basics-similarity/gpu-basics-similarity` folder of the OpenCV source library, or :download:`download it from here <../../../../samples/cpp/tutorial_code/gpu/gpu-basics-similarity/gpu-basics-similarity.cpp>`. The full source code is quite long (due to the command-line argument handling and the performance measurement). Therefore, to avoid cluttering this section, you will find only the functions themselves here.

The PSNR function returns a floating-point number that, if the two inputs are similar, lies between 30 and 50 (higher is better).

.. literalinclude:: ../../../../samples/cpp/tutorial_code/gpu/gpu-basics-similarity/gpu-basics-similarity.cpp
   :language: cpp
   :linenos:
   :tab-width: 4
   :lines: 165-210, 18-23, 210-235
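
To make the number returned above easier to interpret, here is a minimal sketch of the standard PSNR definition for 8-bit data, written in plain C++ without OpenCV. The function name ``psnrSketch`` and the flat ``std::vector`` input are assumptions made for illustration; they are not part of the sample. Returning infinity for identical inputs is a choice of this sketch, and the sample may handle that case differently.

.. code-block:: cpp

   #include <cassert>
   #include <cmath>
   #include <cstddef>
   #include <cstdint>
   #include <limits>
   #include <vector>

   // Hypothetical helper (not part of OpenCV): PSNR of two equally sized
   // 8-bit buffers, using PSNR = 10 * log10(MAX^2 / MSE) with MAX = 255.
   double psnrSketch(const std::vector<std::uint8_t>& a,
                     const std::vector<std::uint8_t>& b)
   {
       assert(a.size() == b.size() && !a.empty());

       // Sum of squared differences over every element.
       double sse = 0.0;
       for (std::size_t i = 0; i < a.size(); ++i)
       {
           const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
           sse += d * d;
       }

       // Identical inputs would divide by zero; this sketch reports infinity.
       if (sse == 0.0)
           return std::numeric_limits<double>::infinity();

       const double mse = sse / static_cast<double>(a.size());
       return 10.0 * std::log10((255.0 * 255.0) / mse);
   }

For example, two buffers that differ by exactly one gray level in every element have a mean squared error of 1, so the sketch yields ``10 * log10(65025)``, about 48.13, which falls in the "similar" range mentioned above.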

The SSIM function returns the MSSIM of the images. This too is a floating-point number between zero and one (higher is better); however, we get one value for each channel. Therefore, we return a *Scalar* OpenCV data structure:

.. literalinclude:: ../../../../samples/cpp/tutorial_code/gpu/gpu-basics-similarity/gpu-basics-similarity.cpp
   :language: cpp
   :linenos:
   :tab-width: 4
   :lines: 235-355, 26-42, 357-
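
As a rough illustration of what a single SSIM value measures, the sketch below evaluates the standard SSIM formula once over a whole single-channel 8-bit buffer, in plain C++ without OpenCV. This is a deliberate simplification: a mean SSIM is normally obtained by evaluating the formula on local windows and averaging the resulting map, which this sketch does not do. The name ``globalSsimSketch`` and the flat ``std::vector`` input are hypothetical; the constants are the usual choices ``(0.01 * 255)^2`` and ``(0.03 * 255)^2``.

.. code-block:: cpp

   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <vector>

   // Hypothetical helper (not part of OpenCV): SSIM computed from global
   // statistics of two equally sized single-channel 8-bit buffers.
   double globalSsimSketch(const std::vector<std::uint8_t>& x,
                           const std::vector<std::uint8_t>& y)
   {
       assert(x.size() == y.size() && !x.empty());

       const double C1 = 6.5025;   // (0.01 * 255)^2, stabilizes the mean term
       const double C2 = 58.5225;  // (0.03 * 255)^2, stabilizes the variance term
       const double n = static_cast<double>(x.size());

       // Means of both buffers.
       double meanX = 0.0, meanY = 0.0;
       for (std::size_t i = 0; i < x.size(); ++i)
       {
           meanX += x[i];
           meanY += y[i];
       }
       meanX /= n;
       meanY /= n;

       // Variances and covariance around those means.
       double varX = 0.0, varY = 0.0, covXY = 0.0;
       for (std::size_t i = 0; i < x.size(); ++i)
       {
           const double dx = x[i] - meanX;
           const double dy = y[i] - meanY;
           varX += dx * dx;
           varY += dy * dy;
           covXY += dx * dy;
       }
       varX /= n;
       varY /= n;
       covXY /= n;

       const double numerator = (2.0 * meanX * meanY + C1) * (2.0 * covXY + C2);
       const double denominator =
           (meanX * meanX + meanY * meanY + C1) * (varX + varY + C2);
       return numerator / denominator;
   }

Calling the sketch with the same buffer for both arguments gives a value of one, and any difference between the inputs lowers it. For a multi-channel image you would evaluate each channel separately, which is why the functions above return one value per channel.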

How to do it? - The GPU
=======================

As you can see, we have three functions for each operation: one for the CPU and two for the GPU. I made two for the GPU to illustrate that simply porting your CPU code to the GPU will often make it slower. If you want a performance gain you will need to remember a few rules, which I am going to detail later on.

The GPU module was designed to resemble its CPU counterpart as closely as possible, which makes porting easy. The first thing you need to do before writing any code is to link the GPU module to your project and include the header file for the module. All the functions and data structures of the GPU are in the *gpu* sub-namespace of the *cv* namespace. You may add this to the default one via the *using namespace* directive, or mark it everywhere explicitly via the ``cv::`` prefix to avoid confusion. I'll do the latter.

.. code-block:: cpp

   #include <opencv2/gpu/gpu.hpp>        // GPU structures and methods

GPU stands for **g**\ raphics **p**\ rocessing **u**\ nit. It was originally built to render graphical scenes. These scenes are built from a lot of data. However, the data items do not all depend on one another sequentially, so they can be processed in parallel. For this reason a GPU contains many smaller processing units. These are not state-of-the-art processors, and in a one-on-one test against a CPU each of them will fall behind. The strength of the GPU lies in its numbers. In recent years there has been an increasing trend to harness this massively parallel power for tasks other than rendering graphical scenes. This gave birth to general-purpose computation on graphics processing units (GPGPU).

The GPU has its own memory. When you read d