Remove all sphinx files

pull/185/head
Maksim Shabunin 10 years ago
parent 61f36de542
commit 7d9bbdcaad
  1. 418
      doc/tutorials/bioinspired/retina_model/retina_model.rst
  2. 36
      doc/tutorials/bioinspired/table_of_content_bioinspired/table_of_content_bioinspired.rst
  3. 37
      doc/tutorials/cvv/table_of_content_cvv/table_of_content_cvv.rst
  4. 372
      doc/tutorials/cvv/visual_debugging_introduction/visual_debugging_introduction.rst
  5. 151
      doc/tutorials/ximgproc/prediction/prediction.rst
  6. 115
      doc/tutorials/ximgproc/training/training.rst
  7. 10
      modules/bioinspired/doc/bioinspired.rst
  8. 493
      modules/bioinspired/doc/retina.rst
  9. 202
      modules/ccalib/doc/customPattern.rst
  10. 11
      modules/cvv/doc/cvv.rst
  11. 85
      modules/cvv/doc/cvv_api.rst
  12. 24
      modules/cvv/doc/cvv_gui.rst
  13. 36
      modules/datasets/doc/ar_hmdb.rst
  14. 18
      modules/datasets/doc/ar_sports.rst
  15. 121
      modules/datasets/doc/datasets.rst
  16. 20
      modules/datasets/doc/fr_adience.rst
  17. 31
      modules/datasets/doc/fr_lfw.rst
  18. 22
      modules/datasets/doc/gr_chalearn.rst
  19. 20
      modules/datasets/doc/gr_skig.rst
  20. 22
      modules/datasets/doc/hpe_humaneva.rst
  21. 20
      modules/datasets/doc/hpe_parse.rst
  22. 20
      modules/datasets/doc/ir_affine.rst
  23. 20
      modules/datasets/doc/ir_robot.rst
  24. 20
      modules/datasets/doc/is_bsds.rst
  25. 20
      modules/datasets/doc/is_weizmann.rst
  26. 20
      modules/datasets/doc/msm_epfl.rst
  27. 20
      modules/datasets/doc/msm_middlebury.rst
  28. 39
      modules/datasets/doc/or_imagenet.rst
  29. 20
      modules/datasets/doc/or_mnist.rst
  30. 22
      modules/datasets/doc/or_sun.rst
  31. 29
      modules/datasets/doc/pd_caltech.rst
  32. 24
      modules/datasets/doc/slam_kitti.rst
  33. 20
      modules/datasets/doc/slam_tumindoor.rst
  34. 22
      modules/datasets/doc/tr_chars.rst
  35. 33
      modules/datasets/doc/tr_svt.rst
  36. 10
      modules/face/doc/face.rst
  37. 412
      modules/face/doc/facerec_api.rst
  38. 86
      modules/face/doc/facerec_changelog.rst
  39. 628
      modules/face/doc/facerec_tutorial.rst
  40. 31
      modules/face/doc/index.rst
  41. 233
      modules/face/doc/tutorial/facerec_gender_classification.rst
  42. 46
      modules/face/doc/tutorial/facerec_save_load.rst
  43. 207
      modules/face/doc/tutorial/facerec_video_recognition.rst
  44. 113
      modules/latentsvm/doc/latent_svm_cascade.rst
  45. 52
      modules/line_descriptor/doc/LSDDetector.rst
  46. 223
      modules/line_descriptor/doc/binary_descriptor.rst
  47. 70
      modules/line_descriptor/doc/drawing_functions.rst
  48. 142
      modules/line_descriptor/doc/line_descriptor.rst
  49. 143
      modules/line_descriptor/doc/matching.rst
  50. 428
      modules/line_descriptor/doc/tutorial.rst
  51. 116
      modules/optflow/doc/dense_optflow.rst
  52. 125
      modules/optflow/doc/motion_templates.rst
  53. 14
      modules/optflow/doc/optflow.rst
  54. 36
      modules/optflow/doc/optflow_io.rst
  55. 8
      modules/reg/doc/reg.rst
  56. 66
      modules/reg/doc/registration.rst
  57. 8
      modules/rgbd/doc/rgbd.rst
  58. 67
      modules/saliency/doc/common_interfaces_saliency.rst
  59. 66
      modules/saliency/doc/motion_saliency_algorithms.rst
  60. 82
      modules/saliency/doc/objectness_algorithms.rst
  61. 47
      modules/saliency/doc/saliency.rst
  62. 72
      modules/saliency/doc/saliency_categories.rst
  63. 81
      modules/saliency/doc/static_saliency_algorithms.rst
  64. 355
      modules/surface_matching/doc/surface_matching.rst
  65. 239
      modules/text/doc/erfilter.rst
  66. 106
      modules/text/doc/ocr.rst
  67. 13
      modules/text/doc/text.rst
  68. 290
      modules/tracking/doc/common_interfaces_tracker.rst
  69. 343
      modules/tracking/doc/common_interfaces_tracker_feature_set.rst
  70. 506
      modules/tracking/doc/common_interfaces_tracker_model.rst
  71. 347
      modules/tracking/doc/common_interfaces_tracker_sampler.rst
  72. 226
      modules/tracking/doc/tracker_algorithms.rst
  73. 63
      modules/tracking/doc/tracking.rst
  74. 40
      modules/tracking/doc/uml/Tracker.rst
  75. 60
      modules/tracking/doc/uml/TrackerFeature.rst
  76. 67
      modules/tracking/doc/uml/TrackerModel.rst
  77. 45
      modules/tracking/doc/uml/TrackerSampler.rst
  78. 15
      modules/tracking/doc/uml/package.rst
  79. 93
      modules/xfeatures2d/doc/extra_features.rst
  80. 252
      modules/xfeatures2d/doc/nonfree_features.rst
  81. 11
      modules/xfeatures2d/doc/xfeatures2d.rst
  82. 259
      modules/ximgproc/doc/edge_aware_filters.rst
  83. 67
      modules/ximgproc/doc/structured_edge_detection.rst
  84. 118
      modules/ximgproc/doc/superpixels.rst
  85. 12
      modules/ximgproc/doc/ximgproc.rst
  86. 236
      modules/xobjdetect/doc/integral_channel_features.rst
  87. 10
      modules/xobjdetect/doc/xobjdetect.rst
  88. 20
      modules/xphoto/doc/denoising.rst
  89. 20
      modules/xphoto/doc/inpainting.rst
  90. 22
      modules/xphoto/doc/whitebalance.rst
  91. 10
      modules/xphoto/doc/xphoto.rst

@ -1,418 +0,0 @@
.. _Retina_Model:
Discovering the human retina and its use for image processing
*************************************************************
Goal
=====
I present here a model of the human retina that shows some interesting properties for image preprocessing and enhancement.
In this tutorial you will learn how to:
.. container:: enumeratevisibleitemswithsquare
+ discover the two main channels coming out of your retina
+ see the basics of using the retina model
+ discover some parameter tweaks
General overview
================
The proposed model originates from Jeanny Herault's research [herault2010]_ at `Gipsa <http://www.gipsa-lab.inpg.fr>`_. It is involved in image processing applications at the `Listic <http://www.listic.univ-savoie.fr>`_ lab (code maintainer and user). This is not a complete model, but it already presents interesting properties that can be used for an enhanced image processing experience. The model allows the following human retina properties to be used :
* spectral whitening that has 3 important effects: cancellation of high spatio-temporal frequency signals (noise), enhancement of mid-frequency details and reduction of low frequency luminance energy. This *all in one* property directly allows visual signals to be cleaned of the classical undesired distortions introduced by image sensors and the input luminance range.
* local logarithmic luminance compression allows details to be enhanced even in low light conditions.
* decorrelation of the details information (Parvocellular output channel) and transient information (events, motion) made available at the Magnocellular output channel.
The first two points are illustrated below :
In the figure below, the OpenEXR image sample *CrissyField.exr*, a High Dynamic Range image, is shown. In order to make it visible on this web page, the original input image is linearly rescaled to the classical image luminance range [0-255] and converted to 8bit/channel format. Such a strong conversion hides many details because of too strong local contrasts. Furthermore, noise energy is also strong and pollutes visual information.
.. image:: images/retina_TreeHdr_small.jpg
:alt: A High dynamic range image linearly rescaled within range [0-255].
:align: center
In the following image, applying the ideas proposed in [benoit2010]_, as your retina does, local luminance adaptation, spatial noise removal and spectral whitening work together and transmit accurate information on lower range 8bit data channels. In this picture, noise is significantly removed and local details hidden by strong luminance contrasts are enhanced. The output image keeps its naturalness and visual content is enhanced. Color processing is based on the color multiplexing/demultiplexing method proposed in [chaix2007]_.
.. image:: images/retina_TreeHdr_retina.jpg
:alt: A High dynamic range image compressed within range [0-255] using the retina.
:align: center
*Note :* the image sample can be downloaded from the `OpenEXR website <http://www.openexr.com>`_. Regarding this demonstration, before retina processing, the input image has been linearly rescaled within [0-255] keeping its channels in float format. 5% of its histogram ends have been cut (this mostly removes wrong HDR pixels). Check out the sample *opencv/samples/cpp/OpenEXRimages_HighDynamicRange_Retina_toneMapping.cpp* for similar processing. The following demonstration will only consider classical 8bit/channel images.
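The preparation step described in this note can be sketched as follows (a minimal illustration, not the sample's exact code: the helper name is hypothetical and the 5% cut is read here as 2.5% at each histogram end) :
.. code-block:: cpp
#include <opencv2/core/core.hpp>
#include <algorithm>
#include <vector>
// hypothetical helper : linearly rescale a float HDR image within [0-255]
// after cutting 5% of its histogram ends (2.5% on each side)
static cv::Mat rescaleHdrForRetina(const cv::Mat &hdr) // expects a continuous CV_32FC3 image
{
CV_Assert(hdr.isContinuous() && hdr.depth() == CV_32F);
// collect all samples and pick the 2.5% / 97.5% quantiles
std::vector<float> v((const float*)hdr.datastart, (const float*)hdr.dataend);
std::sort(v.begin(), v.end());
const float lo = v[size_t(0.025 * (v.size() - 1))];
const float hi = v[size_t(0.975 * (v.size() - 1))];
cv::Mat out = cv::max(hdr, (double)lo); // clamp the dark tail
out = cv::min(out, (double)hi); // clamp the bright tail
out = (out - cv::Scalar::all(lo)) * (255.0 / (hi - lo)); // rescale, keeping the float format
return out;
}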
The retina model output channels
================================
The retina model presents two outputs that benefit from the above cited behaviors.
* The first one is called the Parvocellular channel. It is mainly active in the foveal retina area (high resolution central vision with color sensitive photo-receptors); its aim is to provide accurate color vision for visual details remaining static on the retina. On the other hand, objects moving across the retina projection are blurred.
* The second well known channel is the Magnocellular channel. It is mainly active in the retina peripheral vision and sends signals related to change events (motion, transient events, etc.). These output signals also help the visual system to focus/center the retina on 'transient'/moving areas for more detailed analysis, thus improving visual scene context and object classification.
**NOTE :** regarding the proposed model, contrary to the real retina, we apply these two channels to the entire input image at the same resolution. This allows enhanced visual details and motion information to be extracted from all the considered images... but remember that these two channels are complementary. For example, if the Magnocellular channel gives strong energy in an area, then the Parvocellular channel is certainly blurred there since there is a transient event.
As an illustration, in the following we apply the retina model to a webcam video stream of a dark visual scene. In this visual scene, captured in an amphitheater of the university, some students are moving while talking to the teacher.
In this video sequence, because of the dark ambiance, the signal to noise ratio is low and color artifacts are present on visual feature edges because of the low quality image capture tool-chain.
.. image:: images/studentsSample_input.jpg
:alt: an input video stream extract sample
:align: center
Below, the retina foveal vision is applied to the entire image. In the retina configuration used, global luminance is preserved and local contrasts are enhanced. Also, the signal to noise ratio is improved : since high frequency spatio-temporal noise is reduced, the enhanced details are not corrupted by any enhanced noise.
.. image:: images/studentsSample_parvo.jpg
:alt: the retina Parvocellular output. Enhanced details, luminance adaptation and noise removal. A processing tool for image analysis.
:align: center
Below is the Magnocellular output of the retina model. Its signals are strong where transient events occur. Here, a student is moving at the bottom of the image, thus generating high energy. The rest of the image is static; however, it is corrupted by strong noise. Here, the retina filters out most of the noise, thus generating few false motion area 'alarms'. This channel can be used as a transient/moving area detector : it would provide relevant information for a low cost segmentation tool that would highlight areas in which an event is occurring, as sketched after the image below.
.. image:: images/studentsSample_magno.jpg
:alt: the retina Magnocellular output. Enhanced transient signals (motion, etc.). A preprocessing tool for event detection.
:align: center
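As a sketch of this idea (the threshold value is illustrative and *myRetina* is a retina instance allocated as in the code tutorial below) :
.. code-block:: cpp
// threshold the Magno channel energy to obtain a transient/moving area mask
cv::Mat magno, movingAreasMask;
myRetina->getMagno(magno); // normalised 8bit output assumed (see normaliseOutput below)
cv::threshold(magno, movingAreasMask, 30, 255, cv::THRESH_BINARY); // illustrative threshold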
Retina use case
===============
This model can be used basically for spatio-temporal video effects but also with the aim of :
* performing texture analysis with enhanced signal to noise ratio and enhanced details, robust against input image luminance ranges (check out the Parvocellular retina channel output)
* performing motion analysis that also benefits from the previously cited properties.
Literature
==========
For more information, refer to the following papers :
.. [benoit2010] Benoit A., Caplier A., Durette B., Herault, J., "Using Human Visual System Modeling For Bio-Inspired Low Level Image Processing", Elsevier, Computer Vision and Image Understanding 114 (2010), pp. 758-773. `DOI <http://dx.doi.org/10.1016/j.cviu.2010.01.011>`_
* Please have a look at the reference work of Jeanny Herault that you can read in his book :
.. [herault2010] Vision: Images, Signals and Neural Networks: Models of Neural Processing in Visual Perception (Progress in Neural Processing), by Jeanny Herault, ISBN: 9814273686. WAPI (Tower ID): 113266891.
This retina filter code includes the research contributions of PhD/research colleagues whose code has been redrawn by the author :
* take a look at the *retinacolor.hpp* module to discover Brice Chaix de Lavarene's PhD work on color mosaicing/demosaicing and his reference paper:
.. [chaix2007] B. Chaix de Lavarene, D. Alleysson, B. Durette, J. Herault (2007). "Efficient demosaicing through recursive filtering", IEEE International Conference on Image Processing ICIP 2007
* take a look at *imagelogpolprojection.hpp* to discover the retina spatial log sampling which originates from Barthelemy Durette's PhD with Jeanny Herault. A Retina / V1 cortex projection is also proposed and originates from Jeanny's discussions. More information in the above cited Jeanny Herault's book.
Code tutorial
=============
Please refer to the original tutorial source code in file *opencv_folder/samples/cpp/tutorial_code/bioinspired/retina_tutorial.cpp*.
**Note :** do not forget that the retina model is included in the following namespace : *cv::bioinspired*.
To compile it, assuming OpenCV is correctly installed, use the following command. It requires the opencv_core *(cv::Mat and friends objects management)*, opencv_highgui *(display and image/video read)* and opencv_bioinspired *(Retina description)* libraries to compile.
.. code-block:: bash
# compile (use a C++ driver such as g++ so that the C++ runtime is linked in)
g++ retina_tutorial.cpp -o Retina_tuto -lopencv_core -lopencv_highgui -lopencv_bioinspired
# Run commands : add 'log' as a last parameter to apply a spatial log sampling (simulates retina sampling)
# run on webcam
./Retina_tuto -video
# run on video file
./Retina_tuto -video myVideo.avi
# run on an image
./Retina_tuto -image myPicture.jpg
# run on an image with log sampling
./Retina_tuto -image myPicture.jpg log
Here is a code explanation :
The Retina definition is present in the bioinspired package and a simple include allows you to use it. You can instead use the specific header *opencv2/bioinspired.hpp* if you prefer, but then include the other required OpenCV modules : *opencv2/core.hpp* and *opencv2/highgui.hpp*
.. code-block:: cpp
#include "opencv2/opencv.hpp"
Provide the user with some hints on running the program via a help function
.. code-block:: cpp
// the help procedure
static void help(std::string errorMessage)
{
std::cout<<"Program init error : "<<errorMessage<<std::endl;
std::cout<<"\nProgram call procedure : retinaDemo [processing mode] [Optional : media target] [Optional LAST parameter: \"log\" to activate retina log sampling]"<<std::endl;
std::cout<<"\t[processing mode] :"<<std::endl;
std::cout<<"\t -image : for still image processing"<<std::endl;
std::cout<<"\t -video : for video stream processing"<<std::endl;
std::cout<<"\t[Optional : media target] :"<<std::endl;
std::cout<<"\t if processing an image or video file, then, specify the path and filename of the target to process"<<std::endl;
std::cout<<"\t leave empty if processing video stream coming from a connected video device"<<std::endl;
std::cout<<"\t[Optional : activate retina log sampling] : an optional last parameter can be specified for retina spatial log sampling"<<std::endl;
std::cout<<"\t set \"log\" without quotes to activate this sampling, output frame size will be divided by 4"<<std::endl;
std::cout<<"\nExamples:"<<std::endl;
std::cout<<"\t-Image processing : ./retinaDemo -image lena.jpg"<<std::endl;
std::cout<<"\t-Image processing with log sampling : ./retinaDemo -image lena.jpg log"<<std::endl;
std::cout<<"\t-Video processing : ./retinaDemo -video myMovie.mp4"<<std::endl;
std::cout<<"\t-Live video processing : ./retinaDemo -video"<<std::endl;
std::cout<<"\nPlease start again with new parameters"<<std::endl;
std::cout<<"****************************************************"<<std::endl;
std::cout<<" NOTE : this program generates the default retina parameters file 'RetinaDefaultParameters.xml'"<<std::endl;
std::cout<<" => you can use this to fine tune parameters and load them if you save to file 'RetinaSpecificParameters.xml'"<<std::endl;
}
Then, start the main program and first declare a *cv::Mat* matrix in which input images will be loaded. Also allocate a *cv::VideoCapture* object ready to load video streams (if necessary)
.. code-block:: cpp
int main(int argc, char* argv[]) {
// declare the retina input buffer... that will be fed differently in regard of the input media
cv::Mat inputFrame;
cv::VideoCapture videoCapture; // in case a video media is used, its manager is declared here
In the main program, before processing, first check the input command parameters. Here a first input image is loaded either from a still image (if the user chose the *-image* command) or from a video stream (if the user chose the *-video* command). Also, if the user added the *log* command at the end of the program call, the spatial logarithmic image sampling performed by the retina is taken into account through the Boolean flag *useLogSampling*.
.. code-block:: cpp
// welcome message
std::cout<<"****************************************************"<<std::endl;
std::cout<<"* Retina demonstration : demonstrates the use of is a wrapper class of the Gipsa/Listic Labs retina model."<<std::endl;
std::cout<<"* This demo will try to load the file 'RetinaSpecificParameters.xml' (if exists).\nTo create it, copy the autogenerated template 'RetinaDefaultParameters.xml'.\nThen twaek it with your own retina parameters."<<std::endl;
// basic input arguments checking
if (argc<2)
{
help("bad number of parameter");
return -1;
}
bool useLogSampling = !strcmp(argv[argc-1], "log"); // check if user wants retina log sampling processing
std::string inputMediaType=argv[1];
//////////////////////////////////////////////////////////////////////////////
// checking input media type (still image, video file, live video acquisition)
if (!strcmp(inputMediaType.c_str(), "-image") && argc >= 3)
{
std::cout<<"RetinaDemo: processing image "<<argv[2]<<std::endl;
// image processing case
inputFrame = cv::imread(std::string(argv[2]), 1); // load image in color mode (OpenCV stores it as BGR)
}else
if (!strcmp(inputMediaType.c_str(), "-video"))
{
if (argc == 2 || (argc == 3 && useLogSampling)) // attempt to grab images from a video capture device
{
videoCapture.open(0);
}else// attempt to grab images from a video filestream
{
std::cout<<"RetinaDemo: processing video stream "<<argv[2]<<std::endl;
videoCapture.open(argv[2]);
}
// grab a first frame to check if everything is ok
videoCapture>>inputFrame;
}else
{
// bad command parameter
help("bad command parameter");
return -1;
}
Once all input parameters are processed, a first image should have been loaded; if not, display an error and stop the program :
.. code-block:: cpp
if (inputFrame.empty())
{
help("Input media could not be loaded, aborting");
return -1;
}
Now, everything is ready to run the retina model. I propose here to allocate a retina instance and to manage the optional log sampling option. The Retina constructor expects at least a cv::Size object that gives the input data size that will have to be managed. One can activate other options such as color and its related color multiplexing strategy (here Bayer multiplexing is chosen using *enum cv::bioinspired::RETINA_COLOR_BAYER*). If using log sampling, the image reduction factor (smaller output images) and the log sampling strength can be adjusted.
.. code-block:: cpp
// pointer to a retina object
cv::Ptr<cv::bioinspired::Retina> myRetina;
// if the last parameter is 'log', then activate log sampling (favour foveal vision and subsamples peripheral vision)
if (useLogSampling)
{
myRetina = cv::bioinspired::createRetina(inputFrame.size(), true, cv::bioinspired::RETINA_COLOR_BAYER, true, 2.0, 10.0);
}
else// -> else allocate "classical" retina :
myRetina = cv::bioinspired::createRetina(inputFrame.size());
Once done, the proposed code writes a default xml file that contains the default parameters of the retina. This is useful for making your own config using this template. The generated template xml file is called *RetinaDefaultParameters.xml*.
.. code-block:: cpp
// save default retina parameters file in order to let you see this and maybe modify it and reload using method "setup"
myRetina->write("RetinaDefaultParameters.xml");
In the following line, the retina attempts to load another xml file called *RetinaSpecificParameters.xml*. If you created it and introduced your own setup, it will be loaded; otherwise, the default retina parameters are used.
.. code-block:: cpp
// load parameters if file exists
myRetina->setup("RetinaSpecificParameters.xml");
It is not required here, but just to show that it is possible, you can reset the retina buffers to zero to force it to forget past events.
.. code-block:: cpp
// reset all retina buffers (imagine you close your eyes for a long time)
myRetina->clearBuffers();
Now, it is time to run the retina! First create some output buffers ready to receive the two retina channel outputs
.. code-block:: cpp
// declare retina output buffers
cv::Mat retinaOutput_parvo;
cv::Mat retinaOutput_magno;
Then, run the retina in a loop, loading new frames from the video sequence if necessary and getting the retina outputs back into the dedicated buffers.
.. code-block:: cpp
// processing loop with no stop condition
while(true)
{
// if using a video stream, grab a new frame; otherwise, the input remains the same
if (videoCapture.isOpened())
videoCapture>>inputFrame;
// run retina filter on the loaded input frame
myRetina->run(inputFrame);
// Retrieve and display retina output
myRetina->getParvo(retinaOutput_parvo);
myRetina->getMagno(retinaOutput_magno);
cv::imshow("retina input", inputFrame);
cv::imshow("Retina Parvo", retinaOutput_parvo);
cv::imshow("Retina Magno", retinaOutput_magno);
cv::waitKey(10);
}
That's done! But if you want to secure the system, take care to manage exceptions. The retina can throw some when it sees irrelevant data (no input frame, wrong setup, etc.).
I therefore recommend surrounding all the retina code with a try/catch block like this :
.. code-block:: cpp
try{
// pointer to a retina object
cv::Ptr<cv::bioinspired::Retina> myRetina;
[---]
// processing loop with no stop condition
while(true)
{
[---]
}
}catch(const cv::Exception &e) // catch by const reference to avoid slicing and copies
{
std::cerr<<"Error using Retina : "<<e.what()<<std::endl;
}
Retina parameters, what to do ?
===============================
First, it is recommended to read the reference paper :
* Benoit A., Caplier A., Durette B., Herault, J., *"Using Human Visual System Modeling For Bio-Inspired Low Level Image Processing"*, Elsevier, Computer Vision and Image Understanding 114 (2010), pp. 758-773. DOI <http://dx.doi.org/10.1016/j.cviu.2010.01.011>
Once done, open the configuration file *RetinaDefaultParameters.xml* generated by the demo and have a look at it.
.. code-block:: xml
<?xml version="1.0"?>
<opencv_storage>
<OPLandIPLparvo>
<colorMode>1</colorMode>
<normaliseOutput>1</normaliseOutput>
<photoreceptorsLocalAdaptationSensitivity>7.5e-01</photoreceptorsLocalAdaptationSensitivity>
<photoreceptorsTemporalConstant>9.0e-01</photoreceptorsTemporalConstant>
<photoreceptorsSpatialConstant>5.7e-01</photoreceptorsSpatialConstant>
<horizontalCellsGain>0.01</horizontalCellsGain>
<hcellsTemporalConstant>0.5</hcellsTemporalConstant>
<hcellsSpatialConstant>7.</hcellsSpatialConstant>
<ganglionCellsSensitivity>7.5e-01</ganglionCellsSensitivity></OPLandIPLparvo>
<IPLmagno>
<normaliseOutput>1</normaliseOutput>
<parasolCells_beta>0.</parasolCells_beta>
<parasolCells_tau>0.</parasolCells_tau>
<parasolCells_k>7.</parasolCells_k>
<amacrinCellsTemporalCutFrequency>2.0e+00</amacrinCellsTemporalCutFrequency>
<V0CompressionParameter>9.5e-01</V0CompressionParameter>
<localAdaptintegration_tau>0.</localAdaptintegration_tau>
<localAdaptintegration_k>7.</localAdaptintegration_k></IPLmagno>
</opencv_storage>
Here are some hints, but actually the best parameter setup depends more on what you want to do with the retina than on the input images you feed it. Apart from the more specific case of High Dynamic Range (HDR) images, which require a more specific setup for specific luminance compression objectives, the retina behavior should be rather stable from content to content. Note that OpenCV is able to manage such HDR formats thanks to its OpenEXR image compatibility.
Then, if the application target requires details enhancement prior to specific image processing, you need to know whether mean luminance information is required or not. If not, the retina can cancel or significantly reduce its energy, thus giving more visibility to the higher spatial frequency details.
Basic parameters
----------------
The simplest parameters are the following :
* **colorMode** : lets the retina process color information (if 1) or gray scale images (if 0). In the latter case, only the first channel of the input will be processed.
* **normaliseOutput** : each channel has this parameter; if its value is 1, then the considered channel output is rescaled between 0 and 255. Take care in this case at the Magnocellular output level (motion/transient channel detection) : residual noise will also be rescaled!
**Note :** using color requires color channel multiplexing/demultiplexing which requires more processing. You can expect much faster processing using gray levels : it requires around 30 products per pixel for all the retina processes, and it has recently been parallelized for multicore architectures.
Photo-receptors parameters
--------------------------
The following parameters act on the entry point of the retina - the photo-receptors - and impact all the following processes. These sensors are low pass spatio-temporal filters that smooth temporal and spatial data and also adjust their sensitivity to local luminance, thus improving details extraction and high frequency noise canceling.
* **photoreceptorsLocalAdaptationSensitivity** between 0 and 1. Values close to 1 allow a high luminance log compression effect at the photo-receptors level. Values closer to 0 give a more linear sensitivity. Increased alone, it can burn the *Parvo (details channel)* output image. If adjusted in collaboration with **ganglionCellsSensitivity**, images can be very contrasted whatever the local luminance... at the price of a decrease in naturalness.
* **photoreceptorsTemporalConstant** sets up the temporal constant of the low pass filter effect at the entry of the retina. A high value leads to a strong temporal smoothing effect : moving objects are blurred and can disappear while static objects are favored. But when starting the retina processing, the stable state is reached later.
* **photoreceptorsSpatialConstant** specifies the spatial constant related to the photo-receptors' low pass filter effect. This parameter specifies the minimum spatial signal period allowed in the following processing. Typically, this filter should cut high frequency noise. A 0 value does not cut any noise, while higher values start to cut high spatial frequencies and then lower and lower frequencies... So do not go too high if you want to see some details of the input images! A good compromise for color images is 0.53, since this won't affect the color spectrum too much. Higher values would lead to gray and blurred output images.
Horizontal cells parameters
---------------------------
This parameter set tunes the neural network connected to the photo-receptors, the horizontal cells. It modulates photo-receptor sensitivity and completes the processing for the final spectral whitening (part of the spatial band pass effect, thus favoring visual details enhancement).
* **horizontalCellsGain** is a critical parameter! If you are not interested in the mean luminance and focus on details enhancement, then set it to zero. But if you want to keep some environment luminance data, let some low spatial frequencies pass into the system by setting a higher value (<1).
* **hcellsTemporalConstant** similar to the photo-receptors, this acts on the temporal constant of a low pass temporal filter that smooths input data. Here, a high value generates a strong retina after-effect while a lower value makes the retina more reactive. This value should be lower than **photoreceptorsTemporalConstant** to limit strong retina after-effects.
* **hcellsSpatialConstant** is the spatial constant of these cells' low pass filter. It specifies the lowest spatial frequency allowed in the following processing. Visually, a high value leads to very low spatial frequency processing and salient halo effects. Lower values reduce this effect, but do not go lower than the value of **photoreceptorsSpatialConstant**. These 2 parameters actually specify the spatial band-pass of the retina.
**NOTE** after the processing managed by the previous parameters, the input data is cleaned of noise and luminance is already partly enhanced. The following parameters act on the last processing stages of the two retina output signals.
Parvo (details channel) dedicated parameter
-------------------------------------------
* **ganglionCellsSensitivity** specifies the strength of the final local adaptation occurring at the output of this details dedicated channel. Parameter values remain between 0 and 1. Low values tend to give a linear response while higher values enhance the remaining low contrasted areas.
**Note :** this parameter can correct burned (overexposed) images by favoring the low energy details of the visual scene, even in bright areas.
IPL Magno (motion/transient channel) parameters
-----------------------------------------------
Once image information is cleaned, this channel acts as a high pass temporal filter that selects only signals related to transient events (motion, etc.). A low pass spatial filter smooths the extracted transient data and a final logarithmic compression enhances low energy transient events, thus enhancing event sensitivity.
* **parasolCells_beta** can be considered as an amplifier gain at the entry point of this processing stage. Generally set to 0.
* **parasolCells_tau** the temporal smoothing effect that can be added.
* **parasolCells_k** the spatial constant of the spatial filtering effect; set it to a high value to favor low spatial frequency signals that are less subject to residual noise.
* **amacrinCellsTemporalCutFrequency** specifies the temporal constant of the high pass filter. High values allow slow transient events to be selected.
* **V0CompressionParameter** specifies the strength of the log compression. The behavior is similar to the previous description, but here it enhances the sensitivity to transient events.
* **localAdaptintegration_tau** generally set to 0; it has no real use here.
* **localAdaptintegration_k** specifies the size of the area on which local adaptation is performed. Low values lead to short range local adaptation (higher sensitivity to noise), high values secure the log compression.
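Instead of editing the XML file, the same fields can also be set programmatically through the retina setup methods listed in the class documentation (a sketch; the values mirror the default parameter file above, except **horizontalCellsGain**, raised here to keep some mean luminance as discussed) :
.. code-block:: cpp
// sketch : apply a custom configuration in code rather than through XML
myRetina->setupOPLandIPLParvoChannel(
true,   // colorMode
true,   // normaliseOutput
0.75f,  // photoreceptorsLocalAdaptationSensitivity
0.9f,   // photoreceptorsTemporalConstant
0.57f,  // photoreceptorsSpatialConstant
0.3f,   // horizontalCellsGain : keep some mean luminance
0.5f,   // hcellsTemporalConstant
7.f,    // hcellsSpatialConstant
0.75f); // ganglionCellsSensitivity
myRetina->setupIPLMagnoChannel(
true,           // normaliseOutput
0.f, 0.f, 7.f,  // parasolCells_beta / _tau / _k
2.f,            // amacrinCellsTemporalCutFrequency
0.95f,          // V0CompressionParameter
0.f, 7.f);      // localAdaptintegration_tau / _k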

@ -1,36 +0,0 @@
.. _Table-Of-Content-Bioinspired:
*bioinspired* module. Algorithms inspired from biological models
----------------------------------------------------------------
Here you will learn how to use the additional functionality that OpenCV provides in the "bioinspired" module.
.. include:: ../../definitions/tocDefinitions.rst
+
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=============== ======================================================
|RetinaDemoImg| **Title:** :ref:`Retina_Model`
*Compatibility:* > OpenCV 2.4
*Author:* |Author_AlexB|
You will learn how to process images and video streams with a model of retina filter for details enhancement, spatio-temporal noise removal, luminance correction and spatio-temporal events detection.
=============== ======================================================
.. |RetinaDemoImg| image:: images/retina_TreeHdr_small.jpg
:height: 90pt
:width: 90pt
.. raw:: latex
\pagebreak
.. toctree::
:hidden:
../retina_model/retina_model

@ -1,37 +0,0 @@
.. _Table-Of-Content-CVV:
*cvv* module. GUI for Interactive Visual Debugging
--------------------------------------------------
Here you will learn how to use the cvv module to ease programming computer vision software through visual debugging aids.
.. include:: ../../definitions/tocDefinitions.rst
+
.. tabularcolumns:: m{100pt} m{300pt}
.. cssclass:: toctableopencv
=============== ======================================================
|cvvIntro| *Title:* :ref:`Visual_Debugging_Introduction`
*Compatibility:* > OpenCV 2.4.8
*Author:* |Author_Bihlmaier|
We will learn how to debug our applications in a visual and interactive way.
=============== ======================================================
.. |cvvIntro| image:: images/Visual_Debugging_Introduction_Tutorial_Cover.jpg
:height: 90pt
:width: 90pt
.. raw:: latex
\pagebreak
.. toctree::
:hidden:
../visual_debugging_introduction/visual_debugging_introduction

@ -1,372 +0,0 @@
.. _Visual_Debugging_Introduction:
Interactive Visual Debugging of Computer Vision applications
************************************************************
What is the most common way to debug computer vision applications?
Usually the answer is temporary, hacked-together custom code that must be removed from the code base for the release compilation.
In this tutorial we will show how to use the visual debugging features of the **cvv** module (*opencv2/cvv/cvv.hpp*) instead.
Goals
======
In this tutorial you will learn how to:
* Add cvv debug calls to your application
* Use the visual debug GUI
* Enable and disable the visual debug features during compilation (with zero runtime overhead when disabled; see the sketch below)
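The zero overhead claim relies on conditional compilation. As a sketch of the underlying idea (an illustration only, not the actual cvv internals; *myDebugShow* is a hypothetical name):
.. code-block:: cpp
#include <iostream>
#include <opencv2/core/core.hpp>
// when CVVISUAL_DEBUGMODE is undefined the body is empty, the function
// inlines away and a release build pays no cost at its call sites
inline void myDebugShow(const cv::Mat &img, const char *description)
{
#ifdef CVVISUAL_DEBUGMODE
std::cout << "debug: " << description << " ("
<< img.cols << "x" << img.rows << ")" << std::endl;
// ...here the image would be handed over to the debugging GUI...
#else
(void)img; (void)description; // compiled out entirely
#endif
}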
Code
=====
The example code
* captures images (*highgui*), e.g. from a webcam,
* applies some filters to each image (*imgproc*),
* detects image features and matches them to the previous image (*features2d*).
If the program is compiled without visual debugging (see CMakeLists.txt below) the only result is some information printed to the command line.
We want to demonstrate how much debugging or development functionality is added by just a few lines of *cvv* commands.
.. code-block:: cpp
// system includes
#include <getopt.h>
#include <iostream>
#include <sstream>   // std::stringstream used in toString()
#include <algorithm> // std::sort used for match filtering
// library includes
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/features2d/features2d.hpp>
// Visual debugging
#include <opencv2/cvv/cvv.hpp>
// helper function to convert objects that support operator<<() to std::string
template<class T> std::string toString(const T& p_arg)
{
std::stringstream ss;
ss << p_arg;
return ss.str();
}
void
usage()
{
printf("usage: cvvt [-r WxH]\n");
printf("-h print this help\n");
printf("-r WxH change resolution to width W and height H\n");
}
int
main(int argc, char** argv)
{
#ifdef CVVISUAL_DEBUGMODE
std::cout << "Visual debugging is ENABLED" << std::endl;
#else
std::cout << "Visual debugging is DISABLED" << std::endl;
#endif
cv::Size* resolution = nullptr;
// parse options
const char* optstring = "hr:";
int opt;
while ((opt = getopt(argc, argv, optstring)) != -1) {
switch (opt) {
case 'h':
usage();
return 0;
break;
case 'r':
{
char dummych;
resolution = new cv::Size();
if (sscanf(optarg, "%d%c%d", &resolution->width, &dummych, &resolution->height) != 3) {
printf("%s not a valid resolution\n", optarg);
return 1;
}
}
break;
default:
usage();
return 2;
}
}
// setup video capture
cv::VideoCapture capture(0);
if (!capture.isOpened()) {
std::cout << "Could not open VideoCapture" << std::endl;
return 3;
}
if (resolution) {
printf("Setting resolution to %dx%d\n", resolution->width, resolution->height);
capture.set(CV_CAP_PROP_FRAME_WIDTH, resolution->width);
capture.set(CV_CAP_PROP_FRAME_HEIGHT, resolution->height);
}
cv::Mat prevImgGray;
std::vector<cv::KeyPoint> prevKeypoints;
cv::Mat prevDescriptors;
int maxFeatureCount = 500;
cv::ORB detector(maxFeatureCount);
cv::BFMatcher matcher(cv::NORM_HAMMING);
for (int imgId = 0; imgId < 10; imgId++) {
// capture a frame
cv::Mat imgRead;
capture >> imgRead;
printf("%d: image captured\n", imgId);
std::string imgIdString{"imgRead"};
imgIdString += toString(imgId);
cvv::showImage(imgRead, CVVISUAL_LOCATION, imgIdString.c_str());
// convert to grayscale
cv::Mat imgGray;
cv::cvtColor(imgRead, imgGray, CV_BGR2GRAY);
cvv::debugFilter(imgRead, imgGray, CVVISUAL_LOCATION, "to gray");
// filter edges using Canny on smoothed image
cv::Mat imgGraySmooth;
cv::GaussianBlur(imgGray, imgGraySmooth, cv::Size(9, 9), 2, 2);
cvv::debugFilter(imgGray, imgGraySmooth, CVVISUAL_LOCATION, "smoothed");
cv::Mat imgEdges;
cv::Canny(imgGraySmooth, imgEdges, 50, 150);
cvv::showImage(imgEdges, CVVISUAL_LOCATION, "edges");
// dilate edges
cv::Mat imgEdgesDilated;
cv::dilate(imgEdges, imgEdgesDilated, cv::getStructuringElement(cv::MORPH_RECT, cv::Size(7, 7), cv::Point(3, 3)));
cvv::debugFilter(imgEdges, imgEdgesDilated, CVVISUAL_LOCATION, "dilated edges");
// detect ORB features
std::vector<cv::KeyPoint> keypoints;
cv::Mat descriptors;
detector(imgGray, cv::noArray(), keypoints, descriptors);
printf("%d: detected %zd keypoints\n", imgId, keypoints.size());
// match them to previous image (if available)
if (!prevImgGray.empty()) {
std::vector<cv::DMatch> matches;
matcher.match(prevDescriptors, descriptors, matches);
printf("%d: all matches size=%zd\n", imgId, matches.size());
std::string allMatchIdString{"all matches "};
allMatchIdString += toString(imgId-1) + "<->" + toString(imgId);
cvv::debugDMatch(prevImgGray, prevKeypoints, imgGray, keypoints, matches, CVVISUAL_LOCATION, allMatchIdString.c_str());
// remove worst (as defined by match distance) bestRatio quantile
double bestRatio = 0.8;
std::sort(matches.begin(), matches.end());
matches.resize(int(bestRatio * matches.size()));
printf("%d: best matches size=%zd\n", imgId, matches.size());
std::string bestMatchIdString{"best " + toString(bestRatio) + " matches "};
bestMatchIdString += toString(imgId-1) + "<->" + toString(imgId);
cvv::debugDMatch(prevImgGray, prevKeypoints, imgGray, keypoints, matches, CVVISUAL_LOCATION, bestMatchIdString.c_str());
}
prevImgGray = imgGray;
prevKeypoints = keypoints;
prevDescriptors = descriptors;
}
cvv::finalShow();
delete resolution; // release the optional resolution set via -r
return 0;
}
.. code-block:: cmake
cmake_minimum_required(VERSION 2.8)
project(cvvisual_test)
SET(CMAKE_PREFIX_PATH ~/software/opencv/install)
SET(CMAKE_CXX_COMPILER "g++-4.8")
SET(CMAKE_CXX_FLAGS "-std=c++11 -O2 -pthread -Wall -Werror")
# (un)set: cmake -DCVV_DEBUG_MODE=OFF ..
OPTION(CVV_DEBUG_MODE "cvvisual-debug-mode" ON)
if(CVV_DEBUG_MODE MATCHES ON)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DCVVISUAL_DEBUGMODE")
endif()
FIND_PACKAGE(OpenCV REQUIRED)
include_directories(${OpenCV_INCLUDE_DIRS})
add_executable(cvvt main.cpp)
target_link_libraries(cvvt
opencv_core opencv_highgui opencv_imgproc opencv_features2d
opencv_cvv
)
Explanation
============
#. We compile the program either using the above CMakeLists.txt with option *CVV_DEBUG_MODE=ON* (*cmake -DCVV_DEBUG_MODE=ON*) or by adding the corresponding define *CVVISUAL_DEBUGMODE* to our compiler (e.g. *g++ -DCVVISUAL_DEBUGMODE*).
#. The first cvv call simply shows the image (similar to *imshow*) with the imgIdString as comment.
.. code-block:: cpp
cvv::showImage(imgRead, CVVISUAL_LOCATION, imgIdString.c_str());
The image is added to the overview tab in the visual debug GUI and the cvv call blocks.
.. image:: images/01_overview_single.jpg
:alt: Overview with image of first cvv call
:align: center
The image can then be selected and viewed
.. image:: images/02_single_image_view.jpg
:alt: Display image added through cvv::showImage
:align: center
Whenever you want to continue in the code, i.e. unblock the cvv call, you can
either continue until the next cvv call (*Step*), continue until the last cvv
call (*>>*) or run the application until it exits (*Close*).
We decide to press the green *Step* button.
#. The next cvv calls are used to debug all kinds of filter operations, i.e. operations that take a picture as input and return a picture as output.
.. code-block:: cpp
cvv::debugFilter(imgRead, imgGray, CVVISUAL_LOCATION, "to gray");
As with every cvv call, you first end up in the overview.
.. image:: images/03_overview_two.jpg
:alt: Overview with two cvv calls after pressing Step
:align: center
We decide not to care about the conversion to gray scale and press *Step*.
.. code-block:: cpp
cvv::debugFilter(imgGray, imgGraySmooth, CVVISUAL_LOCATION, "smoothed");
If you open the filter call, you will end up in the so-called "DefaultFilterView".
Both images are shown next to each other and you can zoom into them in a synchronized way.
.. image:: images/04_default_filter_view.jpg
:alt: Default filter view displaying a gray scale image and its corresponding GaussianBlur filtered one
:align: center
When you go to very high zoom levels, each pixel is annotated with its numeric values.
.. image:: images/05_default_filter_view_high_zoom.jpg
:alt: Default filter view at very high zoom levels
:align: center
We press *Step* twice and have a look at the dilated image.
.. code-block:: cpp
cvv::debugFilter(imgEdges, imgEdgesDilated, CVVISUAL_LOCATION, "dilated edges");
The DefaultFilterView showing both images
.. image:: images/06_default_filter_view_edges.jpg
:alt: Default filter view showing an edge image and the image after dilate()
:align: center
Now we use the *View* selector in the top right and select the "DualFilterView".
We select "Changed Pixels" as filter and apply it (middle image).
.. image:: images/07_dual_filter_view_edges.jpg
:alt: Dual filter view showing an edge image and the image after dilate()
:align: center
After taking a close look at these images, perhaps using different views, filters or other GUI features, we decide to let the program run through. Therefore we press the yellow *>>* button.
The program will block at
.. code-block:: cpp
cvv::finalShow();
and display the overview with everything that was passed to cvv in the meantime.
.. image:: images/08_overview_all.jpg
:alt: Overview displaying all cvv calls up to finalShow()
:align: center
#. The cvv debugDMatch call is used in a situation where there are two images each with a set of descriptors that are matched to each other.
We pass both images, both sets of keypoints and their matching to the visual debug module.
.. code-block:: cpp
cvv::debugDMatch(prevImgGray, prevKeypoints, imgGray, keypoints, matches, CVVISUAL_LOCATION, allMatchIdString.c_str());
Since we want to have a look at matches, we use the filter capabilities (*#type match*) in the overview to only show match calls.
.. image:: images/09_overview_filtered_type_match.jpg
:alt: Overview displaying only match calls
:align: center
We want to have a closer look at one of them, e.g. to tune the parameters that depend on the matching.
The view has various settings for how to display keypoints and matches.
Furthermore, there is a mouseover tooltip.
.. image:: images/10_line_match_view.jpg
:alt: Line match view
:align: center
We see (visual debugging!) that there are many bad matches.
We decide that only 70% of the matches should be shown - those 70% with the lowest match distance.
.. image:: images/11_line_match_view_portion_selector.jpg
:alt: Line match view showing the best 70% matches, i.e. lowest match distance
:align: center
Having successfully reduced the visual distraction, we want to see more clearly what changed between the two images.
We select the "TranslationMatchView" that shows to where the keypoint was matched in a different way.
.. image:: images/12_translation_match_view_portion_selector.jpg
:alt: Translation match view
:align: center
It is easy to see that the cup was moved to the left between the two images.
Although cvv is all about interactively *seeing* computer vision bugs, this is complemented by a "RawView" that allows you to look at the underlying numeric data.
.. image:: images/13_raw_view.jpg
:alt: Raw view of matches
:align: center
#. There are many more useful features contained in the cvv GUI. For instance, one can group the calls in the overview tab.
.. image:: images/14_overview_group_by_line.jpg
:alt: Overview grouped by call line
:align: center
Result
=======
* By adding a few expressive lines to our computer vision program we can interactively debug it through different visualizations.
* Once we are done developing/debugging we do not have to remove those lines. We simply disable cvv debugging (*cmake -DCVV_DEBUG_MODE=OFF* or g++ without *-DCVVISUAL_DEBUGMODE*) and our program runs without any debug overhead.
Enjoy computer vision!

@ -1,151 +0,0 @@
.. ximgproc:
Structured forests for fast edge detection
******************************************
Introduction
------------
In this tutorial you will learn how to use structured forests for the purpose of edge detection in an image.
Examples
--------
.. image:: images/01.jpg
:height: 238pt
:width: 750pt
:alt: First example
:align: center
.. image:: images/02.jpg
:height: 238pt
:width: 750pt
:alt: First example
:align: center
.. image:: images/03.jpg
:height: 238pt
:width: 750pt
:alt: First example
:align: center
.. image:: images/04.jpg
:height: 238pt
:width: 750pt
:alt: First example
:align: center
.. image:: images/05.jpg
:height: 238pt
:width: 750pt
:alt: First example
:align: center
.. image:: images/06.jpg
:height: 238pt
:width: 750pt
:alt: First example
:align: center
.. image:: images/07.jpg
:height: 238pt
:width: 750pt
:alt: First example
:align: center
.. image:: images/08.jpg
:height: 238pt
:width: 750pt
:alt: First example
:align: center
.. image:: images/09.jpg
:height: 238pt
:width: 750pt
:alt: First example
:align: center
.. image:: images/10.jpg
:height: 238pt
:width: 750pt
:alt: First example
:align: center
.. image:: images/11.jpg
:height: 238pt
:width: 750pt
:alt: First example
:align: center
.. image:: images/12.jpg
:height: 238pt
:width: 750pt
:alt: First example
:align: center
**Note :** binarization techniques like the Canny edge detector are applicable
to edges produced by both algorithms (``Sobel`` and ``StructuredEdgeDetection::detectEdges``).
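As a minimal sketch (the threshold value is illustrative; *edges* is the output of ``detectEdges`` as in the code below):
.. code-block:: cpp
// binarize the [0;1] float edge map produced by detectEdges
cv::Mat edges8u, edgesBinary;
edges.convertTo(edges8u, CV_8U, 255); // scale [0;1] floats to the 8-bit range
cv::threshold(edges8u, edgesBinary, 50, 255, cv::THRESH_BINARY); // illustrative threshold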
Source Code
-----------
.. literalinclude:: ../../../../modules/ximgproc/samples/cpp/structured_edge_detection.cpp
:language: cpp
:linenos:
:tab-width: 4
Explanation
-----------
1. **Load source color image**
.. code-block:: cpp
cv::Mat image = cv::imread(inFilename, 1);
if ( image.empty() )
{
printf("Cannot read image file: %s\n", inFilename.c_str());
return -1;
}
2. **Convert source image to [0;1] range**
.. code-block:: cpp
image.convertTo(image, cv::DataType<float>::type, 1/255.0);
3. **Run main algorithm**
.. code-block:: cpp
cv::Mat edges(image.size(), image.type());
cv::Ptr<StructuredEdgeDetection> pDollar =
cv::createStructuredEdgeDetection(modelFilename);
pDollar->detectEdges(image, edges);
4. **Show results**
.. code-block:: cpp
if ( outFilename == "" )
{
cv::namedWindow("edges", 1);
cv::imshow("edges", edges);
cv::waitKey(0);
}
else
cv::imwrite(outFilename, 255*edges);
Literature
----------
For more information, refer to the following papers :
.. [Dollar2013] Dollar P., Zitnick C. L., "Structured forests for fast edge detection",
IEEE International Conference on Computer Vision (ICCV), 2013,
pp. 1841-1848. `DOI <http://dx.doi.org/10.1109/ICCV.2013.231>`_
.. [Lim2013] Lim J. J., Zitnick C. L., Dollar P., "Sketch Tokens: A Learned
Mid-level Representation for Contour and Object Detection",
Computer Vision and Pattern Recognition (CVPR), 2013,
pp. 3158-3165. `DOI <http://dx.doi.org/10.1109/CVPR.2013.406>`_

@ -1,115 +0,0 @@
.. ximgproc:
Structured forest training
**************************
Introduction
------------
In this tutorial we show how to train your own structured forest using the author's initial Matlab implementation.
Training pipeline
-----------------
1. Download "Piotr's Toolbox" from `link <http://vision.ucsd.edu/~pdollar/toolbox/doc/index.html>`_
and put it into separate directory, e.g. PToolbox
2. Download BSDS500 dataset from `link <http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/BSR/>`
and put it into separate directory named exactly BSR
3. Add both directory and their subdirectories to Matlab path.
4. Download detector code from `link <http://research.microsoft.com/en-us/downloads/389109f6-b4e8-404c-84bf-239f7cbf4e3d/>`
and put it into root directory. Now you should have ::
.
BSR
PToolbox
models
private
Contents.m
edgesChns.m
edgesDemo.m
edgesDemoRgbd.m
edgesDetect.m
edgesEval.m
edgesEvalDir.m
edgesEvalImg.m
edgesEvalPlot.m
edgesSweeps.m
edgesTrain.m
license.txt
readme.txt
5. Rename models/forest/modelFinal.mat to models/forest/modelFinal.mat.backup
6. Open edgesChns.m and comment out lines 26--41. After the commented lines, add the following: ::
shrink=opts.shrink;
chns = single(getFeatures( im2double(I) ));
7. Now it is time to compile the promised getFeatures. I do it with the following code:
.. code-block:: cpp
#include <cv.h>
#include <highgui.h>
#include <mat.h>
#include <mex.h>
#include "MxArray.hpp" // https://github.com/kyamagu/mexopencv
class NewRFFeatureGetter : public cv::RFFeatureGetter
{
public:
NewRFFeatureGetter() : name("NewRFFeatureGetter"){}
virtual void getFeatures(const cv::Mat &src, NChannelsMat &features,
const int gnrmRad, const int gsmthRad,
const int shrink, const int outNum, const int gradNum) const
{
// here your feature extraction code, the default one is:
// resulting features Mat should be n-channels, floating point matrix
}
protected:
cv::String name;
};
MEXFUNCTION_LINKAGE void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
if (nlhs != 1) mexErrMsgTxt("nlhs != 1");
if (nrhs != 2) mexErrMsgTxt("nrhs != 2"); // the image and the model file name are expected
cv::Mat src = MxArray(prhs[0]).toMat();
src.convertTo(src, cv::DataType<float>::type);
std::string modelFile = MxArray(prhs[1]).toString();
cv::Ptr<NewRFFeatureGetter> pDollar = cv::makePtr<NewRFFeatureGetter>();
cv::Mat edges;
pDollar->getFeatures(src, edges, 4, 0, 2, 13, 4);
// you can use other numbers here
edges.convertTo(edges, cv::DataType<double>::type);
plhs[0] = MxArray(edges);
}
8. Place the compiled mex file into the root directory and run edgesDemo.
You will need to wait a couple of hours; after that, the new model
will appear inside models/forest/.
9. The final step is converting the trained model from Matlab binary format
to YAML, which you can use with our cv::StructuredEdgeDetection.
For this purpose run opencv_contrib/doc/tutorials/ximgproc/training/modelConvert(model, "model.yml")
How to use your model
---------------------
Just use the expanded constructor with the above defined class NewRFFeatureGetter
.. code-block:: cpp
cv::Ptr<cv::StructuredEdgeDetection> pDollar
= cv::createStructuredEdgeDetection( modelName, makePtr<NewRFFeatureGetter>() );

@ -1,10 +0,0 @@
********************************************************************
bioinspired. Biologically inspired vision models and derived tools
********************************************************************
The module provides biological visual system models (human visual system and others). It also provides derived objects that take advantage of those bio-inspired models.
.. toctree::
:maxdepth: 2
Human retina documentation <retina>

@ -1,493 +0,0 @@
Retina : a bio-mimetic human retina model
*****************************************
.. highlight:: cpp
Retina
======
.. ocv:class:: Retina : public Algorithm
**Note** : do not forget that the retina model is included in the following namespace : *cv::bioinspired*.
Introduction
++++++++++++
Class which provides the main controls of the Gipsa/Listic labs human retina model. This is a non-separable spatio-temporal filter modelling the two main retina information channels :
* foveal vision for detailed color vision : the parvocellular pathway.
* peripheral vision for sensitive transient signal detection (motion and events) : the magnocellular pathway.
From a general point of view, this filter whitens the image spectrum and corrects luminance thanks to local adaptation. Another important property is its ability to filter out spatio-temporal noise while enhancing details.
This model originates from Jeanny Herault's work [Herault2010]_. It has been involved in Alexandre Benoit's PhD and his current research [Benoit2010]_, [Strat2013]_ (he currently maintains this module within OpenCV). It includes the work of other of Jeanny's PhD students such as [Chaix2007]_ and the log polar transformations of Barthelemy Durette described in Jeanny's book.
**NOTES :**
* For ease of use in computer vision applications, the two retina channels are applied homogeneously on all the input images. This does not follow the real retina topology, but it can still be done using the log sampling capabilities proposed within the class (see the sketch below).
* An extended retina description and code use can be found in the tutorial/contrib section for complementary explanations.
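For instance (a sketch; the argument values are illustrative and follow the *createRetina* allocator listed below):
.. code-block:: cpp
// allocate a retina with spatial log sampling enabled
cv::Ptr<cv::bioinspired::Retina> retina = cv::bioinspired::createRetina(
inputSize,                           // e.g. inputFrame.size()
true,                                // colorMode
cv::bioinspired::RETINA_COLOR_BAYER, // color sampling method
true,                                // useRetinaLogSampling
2.0,                                 // reductionFactor : output size divided by 2
10.0);                               // samplingStrenght (spelling follows the API)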
Preliminary illustration
++++++++++++++++++++++++
As a preliminary presentation, let's start with a visual example. We propose to apply the filter on a low quality color jpeg image with backlight problems. Here is the considered input... *"Well, my eyes were able to see more than this strange black shadow..."*
.. image:: images/retinaInput.jpg
:alt: a low quality color jpeg image with backlight problems.
:align: center
Below, the retina foveal model is applied on the entire image with default parameters. Here contours are enforced and halo effects are deliberately visible with this configuration. See the parameters discussion below and increase horizontalCellsGain towards 1 to remove them.
.. image:: images/retinaOutput_default.jpg
:alt: the retina foveal model applied on the entire image with default parameters. Here contours are enforced, luminance is corrected and halo effects are voluntary visible with this configuration, increase horizontalCellsGain near 1 to remove them.
:align: center
Below, a second retina foveal model output applied on the entire image with a parameter setup focused on naturalness perception. *"Hey, I now recognize my cat, looking at the mountains at the end of the day!"*. Here contours are enforced and luminance is corrected, but halos are avoided with this configuration. The backlight effect is corrected and highlight details are still preserved. Then, even on a low quality jpeg image, if some luminance information remains, the retina is able to reconstruct a proper visual signal. Such a configuration is also useful for High Dynamic Range (*HDR*) image compression to 8bit images, as discussed in [Benoit2010]_ and in the demonstration codes discussed below.
As shown at the end of the page, the parameter changes from the defaults are :
* horizontalCellsGain=0.3
* photoreceptorsLocalAdaptationSensitivity=ganglioncellsSensitivity=0.89.
.. image:: images/retinaOutput_realistic.jpg
:alt: the retina foveal model applied on the entire image with 'naturalness' parameters. Here contours are enforced but are avoided with this configuration, horizontalCellsGain is 0.3 and photoreceptorsLocalAdaptationSensitivity=ganglioncellsSensitivity=0.89.
:align: center
As observed in this preliminary demo, the retina can be set up with various parameters. By default, as shown in the figure above, the retina strongly reduces mean luminance energy and enforces all details of the visual scene. Luminance energy and halo effects can be modulated (from exaggerated to cancelled, as shown in the two examples). In order to use your own parameters, you can use the *write(String fs)* method at least once; it will write a proper XML file with all default parameters. Then, tweak it on your own and reload it at any time using the *setup(String fs)* method. These methods update a *Retina::RetinaParameters* member structure that is described hereafter. XML parameter file samples are shown at the end of the page.
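As a short sketch of that workflow (file names follow the demo convention used in the tutorials):
.. code-block:: cpp
// write the defaults once, tweak the XML by hand, then reload it
cv::Ptr<cv::bioinspired::Retina> retina =
cv::bioinspired::createRetina(cv::Size(640, 480)); // illustrative input size
retina->write("RetinaDefaultParameters.xml");  // template to tweak
retina->setup("RetinaSpecificParameters.xml"); // reload your own setup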
Here is an overview of the abstract Retina interface; allocate one instance with the *createRetina* functions. ::
namespace cv{namespace bioinspired{
class Retina : public Algorithm
{
public:
// parameters setup instance
struct RetinaParameters; // this class is detailed later
// main method for input frame processing (all-in-one method, can also perform High Dynamic Range tone mapping)
void run (InputArray inputImage);
// specific method aiming at correcting luminance only (faster High Dynamic Range tone mapping)
void applyFastToneMapping(InputArray inputImage, OutputArray outputToneMappedImage)
// output buffer retrieval methods
// -> foveal color vision details channel with luminance and noise correction
void getParvo (OutputArray retinaOutput_parvo);
void getParvoRAW (OutputArray retinaOutput_parvo);// retrieve original output buffers without any normalisation
const Mat getParvoRAW () const;// retrieve original output buffers without any normalisation
// -> peripheral monochrome motion and events (transient information) channel
void getMagno (OutputArray retinaOutput_magno);
void getMagnoRAW (OutputArray retinaOutput_magno); // retrieve original output buffers without any normalisation
const Mat getMagnoRAW () const;// retrieve original output buffers without any normalisation
// reset retina buffers... equivalent to closing your eyes for some seconds
void clearBuffers ();
// retrieve input and output buffer sizes
Size getInputSize ();
Size getOutputSize ();
// setup methods: specific parameters specification or global XML config file loading/writing
void setup (String retinaParameterFile="", const bool applyDefaultSetupOnFailure=true);
void setup (FileStorage &fs, const bool applyDefaultSetupOnFailure=true);
void setup (RetinaParameters newParameters);
struct Retina::RetinaParameters getParameters ();
const String printSetup ();
virtual void write (String fs) const;
virtual void write (FileStorage &fs) const;
void setupOPLandIPLParvoChannel (const bool colorMode=true, const bool normaliseOutput=true, const float photoreceptorsLocalAdaptationSensitivity=0.7, const float photoreceptorsTemporalConstant=0.5, const float photoreceptorsSpatialConstant=0.53, const float horizontalCellsGain=0, const float HcellsTemporalConstant=1, const float HcellsSpatialConstant=7, const float ganglionCellsSensitivity=0.7);
void setupIPLMagnoChannel (const bool normaliseOutput=true, const float parasolCells_beta=0, const float parasolCells_tau=0, const float parasolCells_k=7, const float amacrinCellsTemporalCutFrequency=1.2, const float V0CompressionParameter=0.95, const float localAdaptintegration_tau=0, const float localAdaptintegration_k=7);
void setColorSaturation (const bool saturateColors=true, const float colorSaturationValue=4.0);
void activateMovingContoursProcessing (const bool activate);
void activateContoursProcessing (const bool activate);
};
// Allocators
cv::Ptr<Retina> createRetina (Size inputSize);
cv::Ptr<Retina> createRetina (Size inputSize, const bool colorMode, RETINA_COLORSAMPLINGMETHOD colorSamplingMethod=RETINA_COLOR_BAYER, const bool useRetinaLogSampling=false, const double reductionFactor=1.0, const double samplingStrenght=10.0);
}} // cv and bioinspired namespaces end
.. Sample code::
* An example on retina tone mapping can be found at opencv_source_code/samples/cpp/OpenEXRimages_HDR_Retina_toneMapping.cpp
* An example on retina tone mapping on video input can be found at opencv_source_code/samples/cpp/OpenEXRimages_HDR_Retina_toneMapping_video.cpp
* A complete example illustrating the retina interface can be found at opencv_source_code/samples/cpp/retinaDemo.cpp
Description
+++++++++++
Class which allows the `Gipsa <http://www.gipsa-lab.inpg.fr>`_ (preliminary work) / `Listic <http://www.listic.univ-savoie.fr>`_ (code maintainer and user) labs retina model to be used. This class allows human retina spatio-temporal image processing to be applied to still images, image sequences and video sequences. Briefly, here are the main human retina model properties:
* spectral whitening (mid-frequency details enhancement)
* high frequency spatio-temporal noise reduction (temporal noise and high frequency spatial noise are minimized)
* low frequency luminance reduction (luminance range compression): high luminance regions no longer hide details in darker regions
* local logarithmic luminance compression allows details to be enhanced even in low light conditions
Use: this model can be used for spatio-temporal video effects, but also for the following tasks (a minimal usage sketch follows the list):
* performing texture analysis with an enhanced signal to noise ratio and enhanced details, robust to input image luminance ranges (check out the parvocellular retina channel output, using the provided **getParvo** methods)
* performing motion analysis that also benefits from the previously cited properties (check out the magnocellular retina channel output, using the provided **getMagno** methods)
* general image/video sequence description using either one or both channels. An example of the use of Retina in a Bag of Words approach is given in [Strat2013]_.
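A minimal, self-contained usage sketch following the interface above (the input file name is an assumption):

.. code-block:: cpp

    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/highgui.hpp>
    #include <opencv2/bioinspired.hpp>

    int main()
    {
        cv::Mat frame = cv::imread("myPicture.jpg");   // hypothetical input
        if (frame.empty())
            return 1;
        cv::Ptr<cv::bioinspired::Retina> retina =
                cv::bioinspired::createRetina(frame.size());
        retina->run(frame);                 // process the frame
        cv::Mat parvo, magno;
        retina->getParvo(parvo);            // details channel (foveal vision)
        retina->getMagno(magno);            // transient/motion channel (peripheral vision)
        cv::imshow("Parvo", parvo);
        cv::imshow("Magno", magno);
        cv::waitKey(0);
        return 0;
    }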
Literature
==========
For more information, refer to the following papers:
* Model description:
.. [Benoit2010] Benoit A., Caplier A., Durette B., Herault, J., "Using Human Visual System Modeling For Bio-Inspired Low Level Image Processing", Elsevier, Computer Vision and Image Understanding 114 (2010), pp. 758-773. DOI <http://dx.doi.org/10.1016/j.cviu.2010.01.011>
* Model use in a Bag of Words approach:
.. [Strat2013] Strat S., Benoit A., Lambert P., "Retina enhanced SIFT descriptors for video indexing", CBMI2013, Veszprém, Hungary, 2013.
* Also have a look at the reference work of Jeanny Herault, which you can read in his book:
.. [Herault2010] Jeanny Herault. Vision: Images, Signals and Neural Networks: Models of Neural Processing in Visual Perception (Progress in Neural Processing). ISBN 9814273686.
This retina filter code includes the research contributions of PhD/research colleagues whose code has been reworked by the author:
* take a look at the *retinacolor.hpp* module to discover Brice Chaix de Lavarene's PhD work on color mosaicing/demosaicing and his reference paper:
.. [Chaix2007] B. Chaix de Lavarene, D. Alleysson, B. Durette, J. Herault (2007). "Efficient demosaicing through recursive filtering", IEEE International Conference on Image Processing ICIP 2007
* take a look at *imagelogpolprojection.hpp* to discover the retina spatial log sampling, which originates from Barthelemy Durette's PhD work with Jeanny Herault. A Retina / V1 cortex projection is also proposed, originating from Jeanny's discussions. More information can be found in Jeanny Herault's book cited above.
* Meylan et al.'s work on HDR tone mapping, implemented as a specific method within the model:
.. [Meylan2007] L. Meylan , D. Alleysson, S. Susstrunk, "A Model of Retinal Local Adaptation for the Tone Mapping of Color Filter Array Images", Journal of Optical Society of America, A, Vol. 24, N 9, September, 1st, 2007, pp. 2807-2816
Demos and experiments!
=======================
**NOTE: complementary to the following examples, have a look at the Retina tutorial in the tutorial/contrib section for further explanations.**
Take a look at the C++ examples provided with OpenCV:
* **samples/cpp/retinademo.cpp** shows how to use the retina module for details enhancement (Parvo channel output) and transient maps observation (Magno channel output). You can play with images, video sequences and webcam video.
Typical uses are (provided your OpenCV installation is situated in folder *OpenCVReleaseFolder*):
* image processing : **OpenCVReleaseFolder/bin/retinademo -image myPicture.jpg**
* video processing : **OpenCVReleaseFolder/bin/retinademo -video myMovie.avi**
* webcam processing: **OpenCVReleaseFolder/bin/retinademo -video**
**Note:** This demo generates the file *RetinaDefaultParameters.xml*, which contains the default parameters of the retina. Rename it to *RetinaSpecificParameters.xml*, adjust the parameters as you wish, and rerun the program to check the effect.
* **samples/cpp/OpenEXRimages_HDR_Retina_toneMapping.cpp** shows how to use the retina to perform High Dynamic Range (HDR) luminance compression
Take an HDR picture using bracketing with your camera, generate an OpenEXR image from it, and then process it using the demo.
Typical use, supposing you have an OpenEXR image such as *memorial.exr* (present in the samples/cpp/ folder):
**OpenCVReleaseFolder/bin/OpenEXRimages_HDR_Retina_toneMapping memorial.exr [optional: 'fast']**
Note that some sliders are made available to allow you to play with luminance compression.
Without the 'fast' option, tone mapping is performed using the full retina model [Benoit2010]_, which includes spectral whitening that reduces luminance energy. With the 'fast' option, a simpler method is used: an adaptation of the algorithm presented in [Meylan2007]_. This method also gives good results and is faster, but sometimes requires more parameter adjustment.
Methods description
===================
The main methods to control the retina model are detailed here.
Ptr<Retina>::createRetina
+++++++++++++++++++++++++
.. ocv:function:: Ptr<cv::bioinspired::Retina> createRetina(Size inputSize)
.. ocv:function:: Ptr<cv::bioinspired::Retina> createRetina(Size inputSize, const bool colorMode, cv::bioinspired::RETINA_COLORSAMPLINGMETHOD colorSamplingMethod = cv::bioinspired::RETINA_COLOR_BAYER, const bool useRetinaLogSampling = false, const double reductionFactor = 1.0, const double samplingStrenght = 10.0 )
Constructors from standardized interfaces: retrieve a smart pointer to a Retina instance
:param inputSize: the input frame size
:param colorMode: the chosen processing mode: with or without color processing
:param colorSamplingMethod: specifies which kind of color sampling will be used:
* cv::bioinspired::RETINA_COLOR_RANDOM: each pixel position is either R, G or B, chosen at random
* cv::bioinspired::RETINA_COLOR_DIAGONAL: color sampling is RGBRGBRGB... on the first line, BRGBRGBRG... on the second, GBRGBRGBR... on the third
* cv::bioinspired::RETINA_COLOR_BAYER: standard Bayer sampling
:param useRetinaLogSampling: activates retina log sampling; if true, the two following parameters can be used (see the sketch below)
:param reductionFactor: only useful if useRetinaLogSampling=true, specifies the reduction factor of the output frame (as the center (fovea) is high resolution and the corners can be undersampled, a reduction of the output is allowed without precision loss)
:param samplingStrenght: only useful if useRetinaLogSampling=true, specifies the strength of the log scale that is applied
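For instance, a color retina with foveal log sampling could be allocated as follows (a sketch assuming *inputSize* holds the input frame size; the reduction factor of 2 is an arbitrary illustration value):

.. code-block:: cpp

    cv::Ptr<cv::bioinspired::Retina> retina =
            cv::bioinspired::createRetina(inputSize,
                                          true,                                // colorMode
                                          cv::bioinspired::RETINA_COLOR_BAYER, // colorSamplingMethod
                                          true,                                // useRetinaLogSampling
                                          2.0,                                 // reductionFactor
                                          10.0);                               // samplingStrenght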
Retina::activateContoursProcessing
++++++++++++++++++++++++++++++++++
.. ocv:function:: void Retina::activateContoursProcessing(const bool activate)
Activates/deactivates the Parvocellular pathway processing (contour information extraction); by default, it is activated
:param activate: true if the Parvocellular (contour information extraction) output should be activated, false if not. If activated, the Parvocellular output can be retrieved using the **getParvo** methods
Retina::activateMovingContoursProcessing
++++++++++++++++++++++++++++++++++++++++
.. ocv:function:: void Retina::activateMovingContoursProcessing(const bool activate)
Activates/deactivates the Magnocellular pathway processing (motion information extraction); by default, it is activated
:param activate: true if the Magnocellular output should be activated, false if not. If activated, the Magnocellular output can be retrieved using the **getMagno** methods
Retina::clearBuffers
++++++++++++++++++++
.. ocv:function:: void Retina::clearBuffers()
Clears all retina buffers (equivalent to opening the eyes after a long period of keeping them closed); watch out for the temporal transition occurring just after this method call.
Retina::getParvo
++++++++++++++++
.. ocv:function:: void Retina::getParvo( OutputArray retinaOutput_parvo )
.. ocv:function:: void Retina::getParvoRAW( OutputArray retinaOutput_parvo )
.. ocv:function:: const Mat Retina::getParvoRAW() const
Accessor of the details channel of the retina (models foveal vision). Warning: the getParvoRAW methods return buffers that are not rescaled to the range [0;255], while the non-RAW method retrieves a normalized matrix.
:param retinaOutput_parvo: the output buffer (reallocated if necessary), format can be:
* a Mat: this output is rescaled for standard 8-bit image processing use in OpenCV
* the RAW methods actually return a 1D matrix (encoding is R1, R2, ... Rn, G1, G2, ..., Gn, B1, B2, ..., Bn); this output is the original retina filter model output, without any quantization or rescaling (a reconstruction sketch follows)
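As a sketch, the RAW color buffer can be turned back into a displayable image like this (assuming an allocated *retina* instance in color mode and a CV_32F RAW buffer, which matches the float precision the model computes in):

.. code-block:: cpp

    cv::Mat raw = retina->getParvoRAW();            // 1D buffer: R plane, G plane, B plane
    cv::Size s = retina->getOutputSize();
    int n = s.width * s.height;
    std::vector<cv::Mat> planes;
    planes.push_back(cv::Mat(s, CV_32F, raw.ptr<float>(0) + 2 * n)); // B plane
    planes.push_back(cv::Mat(s, CV_32F, raw.ptr<float>(0) + n));     // G plane
    planes.push_back(cv::Mat(s, CV_32F, raw.ptr<float>(0)));         // R plane
    cv::Mat bgr, display;
    cv::merge(planes, bgr);                          // BGR order for OpenCV display
    cv::normalize(bgr, bgr, 0, 255, cv::NORM_MINMAX);
    bgr.convertTo(display, CV_8U);                   // rescale for 8-bit display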
Retina::getMagno
++++++++++++++++
.. ocv:function:: void Retina::getMagno( OutputArray retinaOutput_magno )
.. ocv:function:: void Retina::getMagnoRAW( OutputArray retinaOutput_magno )
.. ocv:function:: const Mat Retina::getMagnoRAW() const
Accessor of the motion channel of the retina (models peripheral vision). Warning: the getMagnoRAW methods return buffers that are not rescaled to the range [0;255], while the non-RAW method retrieves a normalized matrix.
:param retinaOutput_magno: the output buffer (reallocated if necessary), format can be:
* a Mat: this output is rescaled for standard 8-bit image processing use in OpenCV
* the RAW methods actually return a 1D matrix (encoding is M1, M2, ..., Mn); this output is the original retina filter model output, without any quantization or rescaling.
Retina::getInputSize
++++++++++++++++++++
.. ocv:function:: Size Retina::getInputSize()
Retrieves the retina input buffer size
:return: the retina input buffer size
Retina::getOutputSize
+++++++++++++++++++++
.. ocv:function:: Size Retina::getOutputSize()
Retrieves the retina output buffer size, which can differ from the input size if a spatial log transformation is applied
:return: the retina output buffer size
Retina::printSetup
++++++++++++++++++
.. ocv:function:: const String Retina::printSetup()
Outputs a string showing the current parameters setup
:return: a string which contains formatted parameters information
Retina::run
+++++++++++
.. ocv:function:: void Retina::run(InputArray inputImage)
Applies the retina model to an input image. After run, the encapsulated retina module is ready to deliver its outputs through the dedicated accessors; see the getParvo and getMagno methods
:param inputImage: the input Mat image to be processed; can be gray level or BGR coded in any format (from 8-bit to 16-bit)
Retina::applyFastToneMapping
++++++++++++++++++++++++++++
.. ocv:function:: void Retina::applyFastToneMapping(InputArray inputImage, OutputArray outputToneMappedImage)
Method which processes an image in order to correct its luminance: correct backlight problems, enhance details in shadows. This method is designed to perform High Dynamic Range image tone mapping (compressing >8-bit/pixel images to 8-bit/pixel). It is a simplified version of the Retina Parvocellular model (a simplified version of the run/getParvo call sequence), since it does not include the spatio-temporal filter modelling the Outer Plexiform Layer of the retina that performs spectral whitening and other processing. However, it works well for tone mapping and is faster.
Check the demos and experiments section for examples of tone mapping using both the original retina model and this method; a sketch follows the parameter list.
:param inputImage: the input image to process (should be coded in float format: CV_32F, CV_32FC1, CV_32FC3, CV_32FC4; the 4th channel won't be considered).
:param outputToneMappedImage: the output 8-bit/channel tone mapped image (CV_8U or CV_8UC3 format).
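A tone mapping sketch, using the *memorial.exr* sample mentioned above (reading OpenEXR data yields a CV_32FC3 matrix; the output file name is an assumption):

.. code-block:: cpp

    cv::Mat hdr = cv::imread("memorial.exr", cv::IMREAD_UNCHANGED); // CV_32FC3
    cv::Ptr<cv::bioinspired::Retina> retina =
            cv::bioinspired::createRetina(hdr.size());
    cv::Mat ldr;
    retina->applyFastToneMapping(hdr, ldr);   // 8-bit/channel tone mapped output
    cv::imwrite("memorial_toneMapped.png", ldr);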
Retina::setColorSaturation
++++++++++++++++++++++++++
.. ocv:function:: void Retina::setColorSaturation(const bool saturateColors = true, const float colorSaturationValue = 4.0 )
Activate color saturation as the final step of the color demultiplexing process; this saturation is a sigmoid function applied to each channel of the demultiplexed image.
:param saturateColors: boolean that activates (true) or deactivates (false) color saturation
:param colorSaturationValue: the saturation factor: a simple factor applied to the chrominance buffers
Retina::setup
+++++++++++++
.. ocv:function:: void Retina::setup(String retinaParameterFile = "", const bool applyDefaultSetupOnFailure = true )
.. ocv:function:: void Retina::setup(FileStorage & fs, const bool applyDefaultSetupOnFailure = true )
.. ocv:function:: void Retina::setup(RetinaParameters newParameters)
Tries to open an XML retina parameters file to adjust the current retina instance setup. If the XML file does not exist, the default setup is applied. Warning: exceptions are thrown if the read XML file is not valid.
:param retinaParameterFile: the parameters filename
:param applyDefaultSetupOnFailure: set to true to apply the default setup if loading fails
:param fs: the open FileStorage which contains the retina parameters
:param newParameters: a parameters structure updated with the new target configuration. You can retrieve the current parameters structure using the *Retina::RetinaParameters Retina::getParameters()* method and update it before calling *setup* (see the sketch below).
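A sketch of this parameter round-trip (assuming an allocated *retina* instance; field names follow the RetinaParameters structure documented below):

.. code-block:: cpp

    cv::bioinspired::Retina::RetinaParameters params = retina->getParameters();
    params.OPLandIplParvo.horizontalCellsGain = 0.3f;  // keep more mean luminance
    params.IplMagno.normaliseOutput = true;            // rescale magno output to [0;255]
    retina->setup(params);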
Retina::write
+++++++++++++
.. ocv:function:: void Retina::write( String fs ) const
.. ocv:function:: void Retina::write( FileStorage& fs ) const
Write xml/yml formatted parameters information
:param fs: the filename of the xml file that will be opened and written with formatted parameters information
Retina::setupIPLMagnoChannel
++++++++++++++++++++++++++++
.. ocv:function:: void Retina::setupIPLMagnoChannel(const bool normaliseOutput = true, const float parasolCells_beta = 0, const float parasolCells_tau = 0, const float parasolCells_k = 7, const float amacrinCellsTemporalCutFrequency = 1.2, const float V0CompressionParameter = 0.95, const float localAdaptintegration_tau = 0, const float localAdaptintegration_k = 7 )
Sets parameter values for the Inner Plexiform Layer (IPL) magnocellular channel. This channel processes signals output from the OPL processing stage in peripheral vision; it allows motion information enhancement and is decorrelated from the details channel. See the reference papers for more details.
:param normaliseOutput: specifies if output is rescaled between 0 and 255 (true) or not (false)
:param parasolCells_beta: the low-pass filter gain used for local contrast adaptation at the IPL level of the retina (for ganglion cells local adaptation), typical value is 0
:param parasolCells_tau: the low-pass filter time constant used for local contrast adaptation at the IPL level of the retina (for ganglion cells local adaptation), unit is frames, typical value is 0 (immediate response)
:param parasolCells_k: the low-pass filter spatial constant used for local contrast adaptation at the IPL level of the retina (for ganglion cells local adaptation), unit is pixels, typical value is 5
:param amacrinCellsTemporalCutFrequency: the time constant of the first order high-pass filter of the magnocellular way (motion information channel), unit is frames, typical value is 1.2
:param V0CompressionParameter: the compression strength of the ganglion cells local adaptation output; set a value between 0.6 and 1 for best results. A high value increases low-value sensitivity more... and the output saturates faster; recommended value: 0.95
:param localAdaptintegration_tau: specifies the temporal constant of the low-pass filter involved in the computation of the local "motion mean" for the local adaptation computation
:param localAdaptintegration_k: specifies the spatial constant of the low-pass filter involved in the computation of the local "motion mean" for the local adaptation computation
Retina::setupOPLandIPLParvoChannel
++++++++++++++++++++++++++++++++++
.. ocv:function:: void Retina::setupOPLandIPLParvoChannel(const bool colorMode = true, const bool normaliseOutput = true, const float photoreceptorsLocalAdaptationSensitivity = 0.7, const float photoreceptorsTemporalConstant = 0.5, const float photoreceptorsSpatialConstant = 0.53, const float horizontalCellsGain = 0, const float HcellsTemporalConstant = 1, const float HcellsSpatialConstant = 7, const float ganglionCellsSensitivity = 0.7 )
Sets up the OPL and IPL parvo channels (see the biological model). OPL refers to the Outer Plexiform Layer of the retina; it performs the spatio-temporal filtering which whitens the spectrum and reduces spatio-temporal noise while attenuating global luminance (low frequency energy). IPL parvo is the next processing stage after the OPL; it refers to a part of the Inner Plexiform Layer of the retina and provides high contour sensitivity in foveal vision. See the reference papers for more information; a configuration sketch follows the parameter list.
:param colorMode: specifies if color is processed (true) or not (false), in which case gray-level images are processed
:param normaliseOutput: specifies if output is rescaled between 0 and 255 (true) or not (false)
:param photoreceptorsLocalAdaptationSensitivity: the photoreceptors sensitivity, range is 0-1 (more log compression effect when the value increases)
:param photoreceptorsTemporalConstant: the time constant of the first order low-pass filter of the photoreceptors; use it to cut high temporal frequencies (noise or fast motion), unit is frames, typical value is 1 frame
:param photoreceptorsSpatialConstant: the spatial constant of the first order low-pass filter of the photoreceptors; use it to cut high spatial frequencies (noise or thick contours), unit is pixels, typical value is 1 pixel
:param horizontalCellsGain: gain of the horizontal cells network; if 0, the mean value of the output is zero; if the parameter is near 1, the luminance is not filtered and is still reachable at the output; typical value is 0
:param HcellsTemporalConstant: the time constant of the first order low-pass filter of the horizontal cells; use it to cut low temporal frequencies (local luminance variations), unit is frames, typical value is 1 frame, as for the photoreceptors
:param HcellsSpatialConstant: the spatial constant of the first order low-pass filter of the horizontal cells; use it to cut low spatial frequencies (local luminance), unit is pixels, typical value is 5 pixels. This value is also used for local contrast computing when computing the local contrast adaptation at the ganglion cells level (Inner Plexiform Layer parvocellular channel model)
:param ganglionCellsSensitivity: the compression strength of the ganglion cells local adaptation output; set a value between 0.6 and 1 for best results. A high value increases low-value sensitivity more... and the output saturates faster; recommended value: 0.7
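For example, the 'naturalness' setup shown at the top of this page could be applied directly through this method; the values below are taken from the second XML file at the end of the page (a sketch assuming an allocated *retina* instance):

.. code-block:: cpp

    retina->setupOPLandIPLParvoChannel(true,    // colorMode
                                       true,    // normaliseOutput
                                       0.89f,   // photoreceptorsLocalAdaptationSensitivity
                                       0.9f,    // photoreceptorsTemporalConstant
                                       0.53f,   // photoreceptorsSpatialConstant
                                       0.3f,    // horizontalCellsGain: keep some luminance
                                       0.5f,    // HcellsTemporalConstant
                                       7.f,     // HcellsSpatialConstant
                                       0.89f);  // ganglionCellsSensitivity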
Retina::RetinaParameters
========================
.. ocv:struct:: Retina::RetinaParameters
This structure merges all the parameters that can be adjusted through the **Retina::setup()**, **Retina::setupOPLandIPLParvoChannel** and **Retina::setupIPLMagnoChannel** setup methods
Parameters structure, for better clarity; check the explanations in the comments of the setupOPLandIPLParvoChannel and setupIPLMagnoChannel methods. ::
class RetinaParameters{
struct OPLandIplParvoParameters{ // Outer Plexiform Layer (OPL) and Inner Plexiform Layer Parvocellular (IplParvo) parameters
OPLandIplParvoParameters():colorMode(true),
normaliseOutput(true), // specifies if output is rescaled between 0 and 255 (true) or not (false)
photoreceptorsLocalAdaptationSensitivity(0.7f), // the photoreceptors sensitivity, range is 0-1 (more log compression effect when the value increases)
photoreceptorsTemporalConstant(0.5f),// the time constant of the first order low pass filter of the photoreceptors, use it to cut high temporal frequencies (noise or fast motion), unit is frames, typical value is 1 frame
photoreceptorsSpatialConstant(0.53f),// the spatial constant of the first order low pass filter of the photoreceptors, use it to cut high spatial frequencies (noise or thick contours), unit is pixels, typical value is 1 pixel
horizontalCellsGain(0.0f),// gain of the horizontal cells network, if 0, then the mean value of the output is zero, if the parameter is near 1, then, the luminance is not filtered and is still reachable at the output, typical value is 0
hcellsTemporalConstant(1.f),// the time constant of the first order low pass filter of the horizontal cells, use it to cut low temporal frequencies (local luminance variations), unit is frames, typical value is 1 frame, as the photoreceptors. Reduce to 0.5 to limit retina after effects.
hcellsSpatialConstant(7.f),// the spatial constant of the first order low pass filter of the horizontal cells, use it to cut low spatial frequencies (local luminance), unit is pixels, typical value is 5 pixels, this value is also used for local contrast computing when computing the local contrast adaptation at the ganglion cells level (Inner Plexiform Layer parvocellular channel model)
ganglionCellsSensitivity(0.7f)// the compression strength of the ganglion cells local adaptation output, set a value between 0.6 and 1 for best results, a high value increases low-value sensitivity more... and the output saturates faster, recommended value: 0.7
{};// default setup
bool colorMode, normaliseOutput;
float photoreceptorsLocalAdaptationSensitivity, photoreceptorsTemporalConstant, photoreceptorsSpatialConstant, horizontalCellsGain, hcellsTemporalConstant, hcellsSpatialConstant, ganglionCellsSensitivity;
};
struct IplMagnoParameters{ // Inner Plexiform Layer Magnocellular channel (IplMagno)
IplMagnoParameters():
normaliseOutput(true), // specifies if output is rescaled between 0 and 255 (true) or not (false)
parasolCells_beta(0.f), // the low pass filter gain used for local contrast adaptation at the IPL level of the retina (for ganglion cells local adaptation), typical value is 0
parasolCells_tau(0.f), // the low pass filter time constant used for local contrast adaptation at the IPL level of the retina (for ganglion cells local adaptation), unit is frames, typical value is 0 (immediate response)
parasolCells_k(7.f), // the low pass filter spatial constant used for local contrast adaptation at the IPL level of the retina (for ganglion cells local adaptation), unit is pixels, typical value is 5
amacrinCellsTemporalCutFrequency(1.2f), // the time constant of the first order high pass filter of the magnocellular way (motion information channel), unit is frames, typical value is 1.2
V0CompressionParameter(0.95f), // the compression strength of the ganglion cells local adaptation output, set a value between 0.6 and 1 for best results, a high value increases low-value sensitivity more... and the output saturates faster, recommended value: 0.95
localAdaptintegration_tau(0.f), // specifies the temporal constant of the low pass filter involved in the computation of the local "motion mean" for the local adaptation computation
localAdaptintegration_k(7.f) // specifies the spatial constant of the low pass filter involved in the computation of the local "motion mean" for the local adaptation computation
{};// default setup
bool normaliseOutput;
float parasolCells_beta, parasolCells_tau, parasolCells_k, amacrinCellsTemporalCutFrequency, V0CompressionParameter, localAdaptintegration_tau, localAdaptintegration_k;
};
struct OPLandIplParvoParameters OPLandIplParvo;
struct IplMagnoParameters IplMagno;
};
Retina parameters files examples
++++++++++++++++++++++++++++++++
Here is the default configuration file of the retina module. It gives results such as the first retina output shown at the top of this page.
.. code-block:: xml
<?xml version="1.0"?>
<opencv_storage>
<OPLandIPLparvo>
<colorMode>1</colorMode>
<normaliseOutput>1</normaliseOutput>
<photoreceptorsLocalAdaptationSensitivity>7.5e-01</photoreceptorsLocalAdaptationSensitivity>
<photoreceptorsTemporalConstant>9.0e-01</photoreceptorsTemporalConstant>
<photoreceptorsSpatialConstant>5.3e-01</photoreceptorsSpatialConstant>
<horizontalCellsGain>0.01</horizontalCellsGain>
<hcellsTemporalConstant>0.5</hcellsTemporalConstant>
<hcellsSpatialConstant>7.</hcellsSpatialConstant>
<ganglionCellsSensitivity>7.5e-01</ganglionCellsSensitivity></OPLandIPLparvo>
<IPLmagno>
<normaliseOutput>1</normaliseOutput>
<parasolCells_beta>0.</parasolCells_beta>
<parasolCells_tau>0.</parasolCells_tau>
<parasolCells_k>7.</parasolCells_k>
<amacrinCellsTemporalCutFrequency>2.0e+00</amacrinCellsTemporalCutFrequency>
<V0CompressionParameter>9.5e-01</V0CompressionParameter>
<localAdaptintegration_tau>0.</localAdaptintegration_tau>
<localAdaptintegration_k>7.</localAdaptintegration_k></IPLmagno>
</opencv_storage>
Here is the 'realistic' setup used to obtain the second retina output shown at the top of this page.
.. code-block:: xml
<?xml version="1.0"?>
<opencv_storage>
<OPLandIPLparvo>
<colorMode>1</colorMode>
<normaliseOutput>1</normaliseOutput>
<photoreceptorsLocalAdaptationSensitivity>8.9e-01</photoreceptorsLocalAdaptationSensitivity>
<photoreceptorsTemporalConstant>9.0e-01</photoreceptorsTemporalConstant>
<photoreceptorsSpatialConstant>5.3e-01</photoreceptorsSpatialConstant>
<horizontalCellsGain>0.3</horizontalCellsGain>
<hcellsTemporalConstant>0.5</hcellsTemporalConstant>
<hcellsSpatialConstant>7.</hcellsSpatialConstant>
<ganglionCellsSensitivity>8.9e-01</ganglionCellsSensitivity></OPLandIPLparvo>
<IPLmagno>
<normaliseOutput>1</normaliseOutput>
<parasolCells_beta>0.</parasolCells_beta>
<parasolCells_tau>0.</parasolCells_tau>
<parasolCells_k>7.</parasolCells_k>
<amacrinCellsTemporalCutFrequency>2.0e+00</amacrinCellsTemporalCutFrequency>
<V0CompressionParameter>9.5e-01</V0CompressionParameter>
<localAdaptintegration_tau>0.</localAdaptintegration_tau>
<localAdaptintegration_k>7.</localAdaptintegration_k></IPLmagno>
</opencv_storage>

@ -1,202 +0,0 @@
Custom Calibration Pattern
==========================
.. highlight:: cpp
CustomPattern
-------------
A custom pattern class that can be used to calibrate a camera and to further track the translation and rotation of the pattern. By default, it uses an ``ORB`` feature detector and a ``BruteForce-Hamming(2)`` descriptor matcher to find the location of the pattern feature points that will subsequently be used for calibration.
.. ocv:class:: CustomPattern : public Algorithm
CustomPattern::CustomPattern
----------------------------
CustomPattern constructor.
.. ocv:function:: CustomPattern()
CustomPattern::create
---------------------
A method that initializes the class and generates the necessary detectors, extractors and matchers.
.. ocv:function:: bool create(InputArray pattern, const Size2f boardSize, OutputArray output = noArray())
:param pattern: The image, which will be used as a pattern. If the desired pattern is part of a bigger image, you can crop it out using image(roi).
:param boardSize: The size of the pattern in physical dimensions. These will be used to scale the points when the calibration occurs.
:param output: A matrix that is the same as the input pattern image, but has all the feature points drawn on it.
:return: Whether the initialization was successful. A possible reason for failure is that no feature points were detected.
.. seealso::
:ocv:func:`getFeatureDetector`,
:ocv:func:`getDescriptorExtractor`,
:ocv:func:`getDescriptorMatcher`
.. note::
* The number of detected feature points can be determined through the :ocv:func:`getPatternPoints` method.
* The feature detector, extractor and matcher cannot be changed after initialization. An initialization sketch follows.
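A minimal initialization sketch (the image file name and the 90x60 physical board size are assumptions, and the class is assumed to live in the *cv::ccalib* namespace as in the contrib module layout):

.. code-block:: cpp

    #include <opencv2/ccalib.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <iostream>

    cv::Mat patternImg = cv::imread("pattern.png");   // hypothetical pattern photo
    cv::ccalib::CustomPattern pattern;
    cv::Mat drawnFeatures;
    if (!pattern.create(patternImg, cv::Size2f(90.f, 60.f), drawnFeatures))
        std::cerr << "initialization failed: probably no feature points detected\n";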
CustomPattern::findPattern
--------------------------
Finds the pattern in the input image
.. ocv:function:: bool findPattern(InputArray image, OutputArray matched_features, OutputArray pattern_points, const double ratio = 0.7, const double proj_error = 8.0, const bool refine_position = false, OutputArray out = noArray(), OutputArray H = noArray(), OutputArray pattern_corners = noArray());
:param image: The input image where the pattern is searched for.
:param matched_features: A ``vector<Point2f>`` of the projections of calibration pattern points, matched in the image. The points correspond to the ``pattern_points``; ``matched_features`` and ``pattern_points`` have the same size.
:param pattern_points: A ``vector<Point3f>`` of calibration pattern points in the calibration pattern coordinate space.
:param ratio: A ratio used to threshold matches based on D. Lowe's point ratio test.
:param proj_error: The maximum projection error that is allowed when the found points are back projected. A lower projection error will be beneficial for eliminating mismatches. Higher values are recommended when the camera lens has greater distortions.
:param refine_position: Whether to refine the position of the feature points with :ocv:func:`cornerSubPix`.
:param out: An image showing the matched feature points and a contour around the estimated pattern.
:param H: The homography transformation matrix between the pattern and the current image.
:param pattern_corners: A ``vector<Point2f>`` containing the 4 corners of the found pattern.
:return: Whether the pattern was found. A detection sketch follows.
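A sketch of a detection call on a camera image (*pattern* is the initialized instance from the sketch above; *frame* is an assumed input Mat):

.. code-block:: cpp

    std::vector<cv::Point2f> matchedFeatures;
    std::vector<cv::Point3f> patternPoints;
    cv::Mat visual;
    bool found = pattern.findPattern(frame, matchedFeatures, patternPoints,
                                     0.7,    // Lowe ratio threshold
                                     8.0,    // max back-projection error (pixels)
                                     false,  // no cornerSubPix refinement
                                     visual);
    if (found)
        cv::imshow("detected pattern", visual);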
CustomPattern::isInitialized
----------------------------
.. ocv:function:: bool isInitialized()
:return: Whether the class is initialized.
CustomPattern::getPatternPoints
-------------------------------
.. ocv:function:: void getPatternPoints(OutputArray original_points)
:param original_points: Fills the vector with the points found in the pattern.
CustomPattern::getPixelSize
---------------------------
.. ocv:function:: double getPixelSize()
:return: The physical pixel size as initialized by the pattern.
CustomPattern::setFeatureDetector
---------------------------------
.. ocv:function:: bool setFeatureDetector(Ptr<FeatureDetector> featureDetector)
:param featureDetector: Set a new FeatureDetector.
:return: Whether it was successfully set. Fails if the object was already initialized by :ocv:func:`create`.
.. note::
* It is left to the user's discretion to select a matching feature detector, extractor and matcher. Please consult the documentation of each to confirm coherence.
CustomPattern::setDescriptorExtractor
-------------------------------------
.. ocv:function:: bool setDescriptorExtractor(Ptr<DescriptorExtractor> extractor)
:param extractor: Set a new DescriptorExtractor.
:return: Whether it was successfully set. Fails if the object was already initialized by :ocv:func:`create`.
CustomPattern::setDescriptorMatcher
-----------------------------------
.. ocv:function:: bool setDescriptorMatcher(Ptr<DescriptorMatcher> matcher)
:param matcher: Set a new DescriptorMatcher.
:return: Whether it was successfully set. Fails if the object was already initialized by :ocv:func:`create`.
CustomPattern::getFeatureDetector
---------------------------------
.. ocv:function:: Ptr<FeatureDetector> getFeatureDetector()
:return: The used FeatureDetector.
CustomPattern::getDescriptorExtractor
-------------------------------------
.. ocv:function:: Ptr<DescriptorExtractor> getDescriptorExtractor()
:return: The used DescriptorExtractor.
CustomPattern::getDescriptorMatcher
-----------------------------------
.. ocv:function:: Ptr<DescriptorMatcher> getDescriptorMatcher()
:return: The used DescriptorMatcher.
CustomPattern::calibrate
------------------------
Calibrates the camera.
.. ocv:function:: double calibrate(InputArrayOfArrays objectPoints, InputArrayOfArrays imagePoints, Size imageSize, InputOutputArray cameraMatrix, InputOutputArray distCoeffs, OutputArrayOfArrays rvecs, OutputArrayOfArrays tvecs, int flags = 0, TermCriteria criteria = TermCriteria(TermCriteria::COUNT + TermCriteria::EPS, 30, DBL_EPSILON))
See :ocv:func:`calibrateCamera` for parameter information.
CustomPattern::findRt
---------------------
Finds the rotation and translation vectors of the pattern.
.. ocv:function:: bool findRt(InputArray objectPoints, InputArray imagePoints, InputArray cameraMatrix, InputArray distCoeffs, OutputArray rvec, OutputArray tvec, bool useExtrinsicGuess = false, int flags = ITERATIVE)
.. ocv:function:: bool findRt(InputArray image, InputArray cameraMatrix, InputArray distCoeffs, OutputArray rvec, OutputArray tvec, bool useExtrinsicGuess = false, int flags = ITERATIVE)
:param image: The image, in which the rotation and translation of the pattern will be found.
See :ocv:func:`solvePnP` for parameter information.
CustomPattern::findRtRANSAC
---------------------------
Finds the rotation and translation vectors of the pattern using RANSAC.
.. ocv:function:: bool findRtRANSAC(InputArray objectPoints, InputArray imagePoints, InputArray cameraMatrix, InputArray distCoeffs, OutputArray rvec, OutputArray tvec, bool useExtrinsicGuess = false, int iterationsCount = 100, float reprojectionError = 8.0, int minInliersCount = 100, OutputArray inliers = noArray(), int flags = ITERATIVE)
.. ocv:function:: bool findRtRANSAC(InputArray image, InputArray cameraMatrix, InputArray distCoeffs, OutputArray rvec, OutputArray tvec, bool useExtrinsicGuess = false, int iterationsCount = 100, float reprojectionError = 8.0, int minInliersCount = 100, OutputArray inliers = noArray(), int flags = ITERATIVE)
:param image: The image, in which the rotation and translation of the pattern will be found.
See :ocv:func:`solvePnPRANSAC` for parameter information.
CustomPattern::drawOrientation
------------------------------
Draws the ``(x,y,z)`` axes on the image, in the center of the pattern, showing the orientation of the pattern (a pose estimation sketch follows the parameter list).
.. ocv:function:: void drawOrientation(InputOutputArray image, InputArray tvec, InputArray rvec, InputArray cameraMatrix, InputArray distCoeffs, double axis_length = 3, int axis_width = 2)
:param image: The image, based on which the rotation and translation was calculated. The axis will be drawn in color - ``x`` - in red, ``y`` - in green, ``z`` - in blue.
:param tvec: Translation vector.
:param rvec: Rotation vector.
:param cameraMatrix: The camera matrix.
:param distCoeffs: The distortion coefficients.
:param axis_length: The length of the axis symbol.
:param axis_width: The width of the axis symbol.
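Putting pose estimation and visualization together, a sketch (assuming *pattern* and *frame* as above, and *cameraMatrix*/*distCoeffs* obtained from a previous :ocv:func:`calibrate` run):

.. code-block:: cpp

    cv::Mat rvec, tvec;
    if (pattern.findRt(frame, cameraMatrix, distCoeffs, rvec, tvec))
    {
        // draw the (x,y,z) axes at the pattern center: x red, y green, z blue
        pattern.drawOrientation(frame, tvec, rvec, cameraMatrix, distCoeffs,
                                3 /*axis_length*/, 2 /*axis_width*/);
        cv::imshow("pattern pose", frame);
    }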

@ -1,11 +0,0 @@
*********************************************************************
cvv. GUI for Interactive Visual Debugging of Computer Vision Programs
*********************************************************************
The module provides an interactive GUI to debug and incrementally design computer vision algorithms. The debug statements can remain in the code after development and aid in further changes because they have negligible overhead if the program is compiled in release mode.
.. toctree::
:maxdepth: 2
CVV API Documentation <cvv_api>
CVV GUI Documentation <cvv_gui>

@ -1,85 +0,0 @@
CVV : the API
*************
.. highlight:: cpp
Introduction
++++++++++++
Namespace for all functions is **cvv**, e.g. *cvv::showImage()*.
Compilation:
* For development, i.e. for the cvv GUI to show up, compile your code using cvv with *g++ -DCVVISUAL_DEBUGMODE*.
* For release, i.e. with cvv calls doing nothing, compile your code without the above flag.
See cvv tutorial for a commented example application using cvv.
API Functions
+++++++++++++
showImage
---------
Add a single image to debug GUI (similar to :imshow:`imshow <>`).
.. ocv:function:: void showImage(InputArray img, const CallMetaData& metaData, const string& description, const string& view)
:param img: Image to show in debug GUI.
:param metaData: Properly initialized CallMetaData struct, i.e. information about file, line and function name for GUI. Use CVVISUAL_LOCATION macro.
:param description: Human readable description to provide context to image.
:param view: Preselect view that will be used to visualize this image in GUI. Other views can still be selected in GUI later on.
debugFilter
-----------
Add two images to debug GUI for comparison. Usually the input and output of some filter operation, whose result should be inspected.
.. ocv:function:: void debugFilter(InputArray original, InputArray result, const CallMetaData& metaData, const string& description, const string& view)
:param original: First image for comparison, e.g. filter input.
:param result: Second image for comparison, e.g. filter output.
:param metaData: See :ocv:func:`showImage`
:param description: See :ocv:func:`showImage`
:param view: See :ocv:func:`showImage`
debugDMatch
-----------
Add a filled-in :basicstructures:`DMatch <dmatch>` to the debug GUI. The matches are visualized for interactive inspection in different GUI views (one similar to an interactive :draw_matches:`drawMatches<>`).
.. ocv:function:: void debugDMatch(InputArray img1, std::vector<cv::KeyPoint> keypoints1, InputArray img2, std::vector<cv::KeyPoint> keypoints2, std::vector<cv::DMatch> matches, const CallMetaData& metaData, const string& description, const string& view, bool useTrainDescriptor)
:param img1: First image used in :basicstructures:`DMatch <dmatch>`.
:param keypoints1: Keypoints of first image.
:param img2: Second image used in DMatch.
:param keypoints2: Keypoints of second image.
:param metaData: See :ocv:func:`showImage`
:param description: See :ocv:func:`showImage`
:param view: See :ocv:func:`showImage`
:param useTrainDescriptor: Use :basicstructures:`DMatch <dmatch>`'s train descriptor index instead of query descriptor index.
finalShow
---------
This function **must** be called exactly *once*, *after* all cvv calls (if any). A minimal session sketch follows.
As an alternative, create an instance of FinalShowCaller, which calls finalShow() in its destructor (RAII-style).
.. ocv:function:: void finalShow()
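A minimal session sketch (the input file name and the blur step are illustrative, and the single umbrella header is an assumption; compile with *-DCVVISUAL_DEBUGMODE* for the GUI to appear):

.. code-block:: cpp

    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/imgproc.hpp>
    #include <opencv2/cvv.hpp>

    int main()
    {
        cv::Mat src = cv::imread("input.png");   // hypothetical input
        cvv::showImage(src, CVVISUAL_LOCATION, "raw input");

        cv::Mat blurred;
        cv::GaussianBlur(src, blurred, cv::Size(9, 9), 2.0);
        cvv::debugFilter(src, blurred, CVVISUAL_LOCATION, "gaussian blur");

        cvv::finalShow();                        // mandatory, exactly once at the end
        return 0;
    }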
setDebugFlag
------------
Enable or disable cvv for the current translation unit and thread (disabling it this way has a higher, but still low, overhead compared to using the compile flags).
.. ocv:function:: void setDebugFlag(bool active)
:param active: See above

@ -1,24 +0,0 @@
CVV : the GUI
*************
.. highlight:: cpp
Introduction
++++++++++++
For now: See cvv tutorial.
Overview
++++++++
Filter
------
Views
++++++++

@ -1,36 +0,0 @@
HMDB: A Large Human Motion Database
===================================
.. ocv:class:: AR_hmdb
Implements loading dataset:
_`"HMDB: A Large Human Motion Database"`: http://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/
.. note:: Usage
1. From link above download dataset files: hmdb51_org.rar & test_train_splits.rar.
2. Unpack them. Unpack all archives from directory: hmdb51_org/ and remove them.
3. To load data run: ./opencv/build/bin/example_datasets_ar_hmdb -p=/home/user/path_to_unpacked_folders/
Benchmark
"""""""""
A benchmark was implemented for this dataset, reaching an accuracy of 0.107407 (using precomputed HOG/HOF "STIP" features from the site, averaged over 3 splits)
To run this benchmark execute:
.. code-block:: bash
./opencv/build/bin/example_datasets_ar_hmdb_benchmark -p=/home/user/path_to_unpacked_folders/
(precomputed features should be unpacked in the same folder: /home/user/path_to_unpacked_folders/hmdb51_org_stips/. Also unpack all archives from directory: hmdb51_org_stips/ and remove them.)
**References:**
.. [Kuehne11] H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre. HMDB: A Large Video Database for Human Motion Recognition. ICCV, 2011
.. [Laptev08] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. Learning Realistic Human Actions From Movies. CVPR, 2008

@ -1,18 +0,0 @@
Sports-1M Dataset
=================
.. ocv:class:: AR_sports
Implements loading dataset:
_`"Sports-1M Dataset"`: http://cs.stanford.edu/people/karpathy/deepvideo/
.. note:: Usage
1. From link above download dataset files (git clone https://code.google.com/p/sports-1m-dataset/).
2. To load data run: ./opencv/build/bin/example_datasets_ar_sports -p=/home/user/path_to_downloaded_folders/
**References:**
.. [KarpathyCVPR14] Andrej Karpathy and George Toderici and Sanketh Shetty and Thomas Leung and Rahul Sukthankar and Li Fei-Fei. Large-scale Video Classification with Convolutional Neural Networks. CVPR, 2014

@ -1,121 +0,0 @@
*******************************************************
datasets. Framework for working with different datasets
*******************************************************
.. highlight:: cpp
The datasets module includes classes for working with different datasets: loading data, evaluating different algorithms on them, benchmarks, etc. A minimal loading sketch is shown after the roadmap below.
It is planned to have:
* basic: loading code for all datasets to help start working with them.
* next stage: quick benchmarks for all datasets to show how to solve them using OpenCV and to implement evaluation code.
* finally: implement state-of-the-art algorithms in OpenCV which solve these tasks.
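As a loading sketch for one of the datasets below (HMDB; the path is an assumption, see the per-dataset pages for the expected layout, and the create()/load()/getTrain() interface is the module's common dataset pattern):

.. code-block:: cpp

    #include <opencv2/datasets/ar_hmdb.hpp>
    #include <cstdio>

    int main()
    {
        cv::Ptr<cv::datasets::AR_hmdb> dataset = cv::datasets::AR_hmdb::create();
        dataset->load("/home/user/path_to_unpacked_folders/");   // assumed path
        std::printf("train samples in split 0: %d\n",
                    (int)dataset->getTrain(0).size());
        return 0;
    }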
.. toctree::
:hidden:
ar_hmdb
ar_sports
fr_adience
fr_lfw
gr_chalearn
gr_skig
hpe_humaneva
hpe_parse
ir_affine
ir_robot
is_bsds
is_weizmann
msm_epfl
msm_middlebury
or_imagenet
or_mnist
or_sun
pd_caltech
slam_kitti
slam_tumindoor
tr_chars
tr_svt
Action Recognition
------------------
:doc:`ar_hmdb` [#f1]_
:doc:`ar_sports`
Face Recognition
----------------
:doc:`fr_adience`
:doc:`fr_lfw` [#f1]_
Gesture Recognition
-------------------
:doc:`gr_chalearn`
:doc:`gr_skig`
Human Pose Estimation
---------------------
:doc:`hpe_humaneva`
:doc:`hpe_parse`
Image Registration
------------------
:doc:`ir_affine`
:doc:`ir_robot`
Image Segmentation
------------------
:doc:`is_bsds`
:doc:`is_weizmann`
Multiview Stereo Matching
-------------------------
:doc:`msm_epfl`
:doc:`msm_middlebury`
Object Recognition
------------------
:doc:`or_imagenet`
:doc:`or_mnist` [#f2]_
:doc:`or_sun`
Pedestrian Detection
--------------------
:doc:`pd_caltech` [#f2]_
SLAM
----
:doc:`slam_kitti`
:doc:`slam_tumindoor`
Text Recognition
----------------
:doc:`tr_chars`
:doc:`tr_svt` [#f1]_
*Footnotes*
.. [#f1] Benchmark implemented
.. [#f2] Not used in Vision Challenge

@ -1,20 +0,0 @@
Adience
=======
.. ocv:class:: FR_adience
Implements loading dataset:
_`"Adience"`: http://www.openu.ac.il/home/hassner/Adience/data.html
.. note:: Usage
1. From link above download any dataset file: faces.tar.gz\\aligned.tar.gz and files with splits: fold_0_data.txt-fold_4_data.txt, fold_frontal_0_data.txt-fold_frontal_4_data.txt. (For the face recognition task, different splits should be created.)
2. Unpack dataset file to some folder and place split files into the same folder.
3. To load data run: ./opencv/build/bin/example_datasets_fr_adience -p=/home/user/path_to_created_folder/
**References:**
.. [Eidinger] E. Eidinger, R. Enbar, and T. Hassner. Age and Gender Estimation of Unfiltered Faces

@ -1,31 +0,0 @@
Labeled Faces in the Wild
=========================
.. ocv:class:: FR_lfw
Implements loading dataset:
_`"Labeled Faces in the Wild"`: http://vis-www.cs.umass.edu/lfw/
.. note:: Usage
1. From link above download any dataset file: lfw.tgz\\lfwa.tar.gz\\lfw-deepfunneled.tgz\\lfw-funneled.tgz and files with pairs: 10 test splits: pairs.txt and developer train split: pairsDevTrain.txt.
2. Unpack dataset file and place pairs.txt and pairsDevTrain.txt in created folder.
3. To load data run: ./opencv/build/bin/example_datasets_fr_lfw -p=/home/user/path_to_unpacked_folder/lfw2/
Benchmark
"""""""""
A benchmark was implemented for this dataset, reaching an accuracy of 0.623833 +- 0.005223 (train split: pairsDevTrain.txt, dataset: lfwa)
To run this benchmark execute:
.. code-block:: bash
./opencv/build/bin/example_datasets_fr_lfw_benchmark -p=/home/user/path_to_unpacked_folder/lfw2/
**References:**
.. [Huang07] G.B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. 2007

@ -1,22 +0,0 @@
ChaLearn Looking at People
==========================
.. ocv:class:: GR_chalearn
Implements loading dataset:
_`"ChaLearn Looking at People"`: http://gesture.chalearn.org/
.. note:: Usage
1. Follow the instructions on the site above and download the files for the dataset "Track 3: Gesture Recognition": Train1.zip-Train5.zip, Validation1.zip-Validation3.zip (register on www.codalab.org and accept the terms and conditions of the competition: https://www.codalab.org/competitions/991#learn_the_details. There are three mirrors for downloading the dataset files; at the time of writing, only the "Universitat Oberta de Catalunya" mirror worked).
2. Unpack train archives Train1.zip-Train5.zip to folder Train/, validation archives Validation1.zip-Validation3.zip to folder Validation/
3. Unpack all archives in Train/ & Validation/ in the folders with the same names, for example: Sample0001.zip to Sample0001/
4. To load data run: ./opencv/build/bin/example_datasets_gr_chalearn -p=/home/user/path_to_unpacked_folders/
**References:**
.. [Escalera14] S. Escalera, X. Baró, J. Gonzàlez, M.A. Bautista, M. Madadi, M. Reyes, V. Ponce-López, H.J. Escalante, J. Shotton, I. Guyon, "ChaLearn Looking at People Challenge 2014: Dataset and Results", ECCV Workshops, 2014

@ -1,20 +0,0 @@
Sheffield Kinect Gesture Dataset
================================
.. ocv:class:: GR_skig
Implements loading dataset:
_`"Sheffield Kinect Gesture Dataset"`: http://lshao.staff.shef.ac.uk/data/SheffieldKinectGesture.htm
.. note:: Usage
1. From link above download dataset files: subject1_dep.7z-subject6_dep.7z, subject1_rgb.7z-subject6_rgb.7z.
2. Unpack them.
3. To load data run: ./opencv/build/bin/example_datasets_gr_skig -p=/home/user/path_to_unpacked_folders/
**References:**
.. [Liu13] L. Liu and L. Shao, “Learning Discriminative Representations from RGB-D Video Data”, In Proc. International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013.

@ -1,22 +0,0 @@
HumanEva Dataset
================
.. ocv:class:: HPE_humaneva
Implements loading dataset:
_`"HumanEva Dataset"`: http://humaneva.is.tue.mpg.de
.. note:: Usage
1. From link above download dataset files for HumanEva-I (tar) & HumanEva-II.
2. Unpack them to HumanEva_1 & HumanEva_2 accordingly.
3. To load data run: ./opencv/build/bin/example_datasets_hpe_humaneva -p=/home/user/path_to_unpacked_folders/
**References:**
.. [Sigal10] L. Sigal, A. Balan and M. J. Black. HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion, In International Journal of Computer Vision, Vol. 87 (1-2), 2010
.. [Sigal06] L. Sigal and M. J. Black. HumanEva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion, Technical Report CS-06-08, Brown University, 2006

@ -1,20 +0,0 @@
PARSE Dataset
=============
.. ocv:class:: HPE_parse
Implements loading dataset:
_`"PARSE Dataset"`: http://www.ics.uci.edu/~dramanan/papers/parse/
.. note:: Usage
1. From link above download dataset file: people.zip.
2. Unpack it.
3. To load data run: ./opencv/build/bin/example_datasets_hpe_parse -p=/home/user/path_to_unpacked_folder/people_all/
**References:**
.. [Ramanan06] D. Ramanan "Learning to Parse Images of Articulated Bodies." Neural Info. Proc. Systems (NIPS) To appear. Dec 2006.

@ -1,20 +0,0 @@
Affine Covariant Regions Datasets
=================================
.. ocv:class:: IR_affine
Implements loading dataset:
_`"Affine Covariant Regions Datasets"`: http://www.robots.ox.ac.uk/~vgg/data/data-aff.html
.. note:: Usage
1. From link above download dataset files: bark\\bikes\\boat\\graf\\leuven\\trees\\ubc\\wall.tar.gz.
2. Unpack them.
3. To load data, for example, for "bark", run: ./opencv/build/bin/example_datasets_ir_affine -p=/home/user/path_to_unpacked_folder/bark/
**References:**
.. [Mikolajczyk05] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, L. Van Gool. A Comparison of Affine Region Detectors. International Journal of Computer Vision, Volume 65, Number 1/2, page 43--72, 2005

@ -1,20 +0,0 @@
Robot Data Set
==============
.. ocv:class:: IR_robot
Implements loading dataset:
_`"Robot Data Set, Point Feature Data Set – 2010"`: http://roboimagedata.compute.dtu.dk/?page_id=24
.. note:: Usage
1. From link above download dataset files: SET001_6.tar.gz-SET055_60.tar.gz
2. Unpack them to one folder.
3. To load data run: ./opencv/build/bin/example_datasets_ir_robot -p=/home/user/path_to_unpacked_folder/
**References:**
.. [aanæsinteresting] Aanæs, H., Dahl, A.L. and Steenstrup Pedersen, K. Interesting Interest Points. International Journal of Computer Vision, 2012.

@ -1,20 +0,0 @@
The Berkeley Segmentation Dataset and Benchmark
===============================================
.. ocv:class:: IS_bsds
Implements loading dataset:
_`"The Berkeley Segmentation Dataset and Benchmark"`: https://www.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/
.. note:: Usage
1. From link above download dataset files: BSDS300-human.tgz & BSDS300-images.tgz.
2. Unpack them.
3. To load data run: ./opencv/build/bin/example_datasets_is_bsds -p=/home/user/path_to_unpacked_folder/BSDS300/
**References:**
.. [MartinFTM01] D. Martin and C. Fowlkes and D. Tal and J. Malik. A Database of Human Segmented Natural Images and its Application to Evaluating Segmentation Algorithms and Measuring Ecological Statistics. 2001

@ -1,20 +0,0 @@
Weizmann Segmentation Evaluation Database
=========================================
.. ocv:class:: IS_weizmann
Implements loading dataset:
_`"Weizmann Segmentation Evaluation Database"`: http://www.wisdom.weizmann.ac.il/~vision/Seg_Evaluation_DB/
.. note:: Usage
1. From link above download dataset files: Weizmann_Seg_DB_1obj.ZIP & Weizmann_Seg_DB_2obj.ZIP.
2. Unpack them.
3. To load data, for example, for 1 object dataset, run: ./opencv/build/bin/example_datasets_is_weizmann -p=/home/user/path_to_unpacked_folder/1obj/
**References:**
.. [AlpertGBB07] Sharon Alpert and Meirav Galun and Ronen Basri and Achi Brandt. Image Segmentation by Probabilistic Bottom-Up Aggregation and Cue Integration. 2007

@ -1,20 +0,0 @@
EPFL Multi-View Stereo
======================
.. ocv:class:: MSM_epfl
Implements loading dataset:
_`"EPFL Multi-View Stereo"`: http://cvlabwww.epfl.ch/~strecha/multiview/denseMVS.html
.. note:: Usage
1. From link above download dataset files: castle_dense\\castle_dense_large\\castle_entry\\fountain\\herzjesu_dense\\herzjesu_dense_large_bounding\\cameras\\images\\p.tar.gz.
2. Unpack them in separate folder for each object. For example, for "fountain", in folder fountain/ : fountain_dense_bounding.tar.gz -> bounding/, fountain_dense_cameras.tar.gz -> camera/, fountain_dense_images.tar.gz -> png/, fountain_dense_p.tar.gz -> P/
3. To load data, for example, for "fountain", run: ./opencv/build/bin/example_datasets_msm_epfl -p=/home/user/path_to_unpacked_folder/fountain/
**References:**
.. [Strecha08] C. Strecha, W. von Hansen, L. Van Gool, P. Fua, U. Thoennessen. On Benchmarking Camera Calibration and Multi-View Stereo for High Resolution Imagery. CVPR, 2008

@ -1,20 +0,0 @@
Stereo – Middlebury Computer Vision
===================================
.. ocv:class:: MSM_middlebury
Implements loading dataset:
_`"Stereo – Middlebury Computer Vision"`: http://vision.middlebury.edu/mview/
.. note:: Usage
1. From link above download dataset files: dino\\dinoRing\\dinoSparseRing\\temple\\templeRing\\templeSparseRing.zip
2. Unpack them.
3. To load data, for example "temple" dataset, run: ./opencv/build/bin/example_datasets_msm_middlebury -p=/home/user/path_to_unpacked_folder/temple/
**References:**
.. [Seitz06] S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, R. Szeliski. A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms, CVPR, 2006

@ -1,39 +0,0 @@
ImageNet
========
.. ocv:class:: OR_imagenet
Implements loading dataset:
_`"ImageNet"`: http://www.image-net.org/
.. note:: Usage
1. From link above download dataset files: ILSVRC2010_images_train.tar\\ILSVRC2010_images_test.tar\\ILSVRC2010_images_val.tar & devkit: ILSVRC2010_devkit-1.0.tar.gz (loading is implemented for the 2010 dataset, as only this dataset has ground truth for the test data; the structure of ILSVRC2014 is similar).
2. Unpack them to: some_folder/train/\\some_folder/test/\\some_folder/val & some_folder/ILSVRC2010_validation_ground_truth.txt\\some_folder/ILSVRC2010_test_ground_truth.txt.
3. Create file with labels: some_folder/labels.txt, for example, using :ref:`python script <python-script>` below (each file's row format: synset,labelID,description. For example: "n07751451,18,plum").
4. Unpack all tar files in train.
5. To load data run: ./opencv/build/bin/example_datasets_or_imagenet -p=/home/user/some_folder/
.. _python-script:
Python script to parse meta.mat:
::
import scipy.io

# build synset -> 0-based label id and synset -> description dictionaries
meta_mat = scipy.io.loadmat("devkit-1.0/data/meta.mat")
labels_dic = dict((m[0][1][0], m[0][0][0][0] - 1) for m in meta_mat['synsets'])
label_names_dic = dict((m[0][1][0], m[0][2][0]) for m in meta_mat['synsets'])
for label in labels_dic.keys():
    print("{0},{1},{2}".format(label, labels_dic[label], label_names_dic[label]))
**References:**
.. [ILSVRCarxiv14] Olga Russakovsky and Jia Deng and Hao Su and Jonathan Krause and Sanjeev Satheesh and Sean Ma and Zhiheng Huang and Andrej Karpathy and Aditya Khosla and Michael Bernstein and Alexander C. Berg and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. 2014

@ -1,20 +0,0 @@
MNIST
=====
.. ocv:class:: OR_mnist
Implements loading dataset:
_`"MNIST"`: http://yann.lecun.com/exdb/mnist/
.. note:: Usage
1. From link above download dataset files: t10k-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz, train-images-idx3-ubyte.gz, train-labels-idx1-ubyte.gz.
2. Unpack them.
3. To load data run: ./opencv/build/bin/example_datasets_or_mnist -p=/home/user/path_to_unpacked_files/
**References:**
.. [LeCun98a] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998.

@ -1,22 +0,0 @@
SUN Database
============
.. ocv:class:: OR_sun
Implements loading dataset:
_`"SUN Database, Scene Recognition Benchmark. SUN397"`: http://vision.cs.princeton.edu/projects/2010/SUN/
.. note:: Usage
1. From link above download dataset file: SUN397.tar & file with splits: Partitions.zip
2. Unpack SUN397.tar into folder: SUN397/ & Partitions.zip into folder: SUN397/Partitions/
3. To load data run: ./opencv/build/bin/example_datasets_or_sun -p=/home/user/path_to_unpacked_files/SUN397/
**References:**
.. [Xiao10] J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba. SUN Database: Large-scale Scene Recognition from Abbey to Zoo. IEEE Conference on Computer Vision and Pattern Recognition. CVPR, 2010
.. [Xiao14] J. Xiao, K. A. Ehinger, J. Hays, A. Torralba, and A. Oliva. SUN Database: Exploring a Large Collection of Scene Categories. International Journal of Computer Vision. IJCV, 2014

@ -1,29 +0,0 @@
Caltech Pedestrian Detection Benchmark
======================================
.. ocv:class:: PD_caltech
Implements loading of the dataset:
_`"Caltech Pedestrian Detection Benchmark"`: http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/
.. note:: First version of Caltech Pedestrian dataset loading.
The code that unpacks all frames from the seq files is commented out, because the number of frames is huge!
So currently only meta information is loaded, without the image data.
Ground truth isn't processed either, as it first needs to be converted from the mat files.
.. note:: Usage
1. From link above download dataset files: set00.tar-set10.tar.
2. Unpack them to separate folder.
3. To load data run: ./opencv/build/bin/example_datasets_pd_caltech -p=/home/user/path_to_unpacked_folders/
**References:**
.. [Dollár12] P. Dollár, C. Wojek, B. Schiele and P. Perona. Pedestrian Detection: An Evaluation of the State of the Art. PAMI, 2012.
.. [DollárCVPR09] P. Dollár, C. Wojek, B. Schiele and P. Perona. Pedestrian Detection: A Benchmark. CVPR, 2009

@ -1,24 +0,0 @@
KITTI Vision Benchmark
======================
.. ocv:class:: SLAM_kitti
Implements loading of the dataset:
_`"KITTI Vision Benchmark"`: http://www.cvlibs.net/datasets/kitti/eval_odometry.php
.. note:: Usage
1. From link above download "Odometry" dataset files: data_odometry_gray\\data_odometry_color\\data_odometry_velodyne\\data_odometry_poses\\data_odometry_calib.zip.
2. Unpack data_odometry_poses.zip, it creates folder dataset/poses/. After that unpack data_odometry_gray.zip, data_odometry_color.zip, data_odometry_velodyne.zip. Folder dataset/sequences/ will be created with folders 00/..21/. Each of these folders will contain: image_0/, image_1/, image_2/, image_3/, velodyne/ and files calib.txt & times.txt. These last two files will be replaced when data_odometry_calib.zip is unpacked at the end.
3. To load data run: ./opencv/build/bin/example_datasets_slam_kitti -p=/home/user/path_to_unpacked_folder/dataset/
**References:**
.. [Geiger2012CVPR] Andreas Geiger and Philip Lenz and Raquel Urtasun. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. CVPR, 2012
.. [Geiger2013IJRR] Andreas Geiger and Philip Lenz and Christoph Stiller and Raquel Urtasun. Vision meets Robotics: The KITTI Dataset. IJRR, 2013
.. [Fritsch2013ITSC] Jannik Fritsch and Tobias Kuehnl and Andreas Geiger. A New Performance Measure and Evaluation Benchmark for Road Detection Algorithms. ITSC, 2013

@ -1,20 +0,0 @@
TUMindoor Dataset
=================
.. ocv:class:: SLAM_tumindoor
Implements loading of the dataset:
_`"TUMindoor Dataset"`: http://www.navvis.lmt.ei.tum.de/dataset/
.. note:: Usage
1. From link above download dataset files: dslr.tar.bz2, info.tar.bz2, ladybug.tar.bz2, pointcloud.tar.bz2 for each dataset: 11-11-28 (1st floor), 11-12-13 (1st floor N1), 11-12-17a (4th floor), 11-12-17b (3rd floor), 11-12-17c (Ground I), 11-12-18a (Ground II), 11-12-18b (2nd floor)
2. Unpack them into a separate folder for each dataset: dslr.tar.bz2 -> dslr/, info.tar.bz2 -> info/, ladybug.tar.bz2 -> ladybug/, pointcloud.tar.bz2 -> pointcloud/.
3. To load each dataset run: ./opencv/build/bin/example_datasets_slam_tumindoor -p=/home/user/path_to_unpacked_folders/
**References:**
.. [TUMindoor] R. Huitl and G. Schroth and S. Hilsenbeck and F. Schweiger and E. Steinbach. {TUM}indoor: An Extensive Image and Point Cloud Dataset for Visual Indoor Localization and Mapping. 2012

@ -1,22 +0,0 @@
The Chars74K Dataset
====================
.. ocv:class:: TR_chars
Implements loading of the dataset:
_`"The Chars74K Dataset"`: http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/
.. note:: Usage
1. From link above download dataset files: EnglishFnt.tgz, EnglishHnd.tgz, EnglishImg.tgz, KannadaHnd.tgz, KannadaImg.tgz and ListsTXT.tgz.
2. Unpack them.
3. Move the .m files from the folder ListsTXT/ to the appropriate folders. For example, English/list_English_Img.m for EnglishImg.tgz.
4. To load data, for example "EnglishImg", run: ./opencv/build/bin/example_datasets_tr_chars -p=/home/user/path_to_unpacked_folder/English/
**References:**
.. [Campos09] T. E. de Campos, B. R. Babu and M. Varma. Character recognition in natural images. In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP), 2009

@ -1,33 +0,0 @@
The Street View Text Dataset
============================
.. ocv:class:: TR_svt
Implements loading of the dataset:
_`"The Street View Text Dataset"`: http://vision.ucsd.edu/~kai/svt/
.. note:: Usage
1. From link above download dataset file: svt.zip.
2. Unpack it.
3. To load data run: ./opencv/build/bin/example_datasets_tr_svt -p=/home/user/path_to_unpacked_folder/svt/svt1/
Benchmark
"""""""""
A benchmark was implemented for this dataset; it achieves an accuracy (mean F1) of 0.217.
To run the benchmark execute:
.. code-block:: bash
./opencv/build/bin/example_datasets_tr_svt_benchmark -p=/home/user/path_to_unpacked_folders/svt/svt1/
**References:**
.. [Wang11] Kai Wang, Boris Babenko and Serge Belongie. End-to-end Scene Text Recognition. ICCV, 2011
.. [Wang10] Kai Wang and Serge Belongie. Word Spotting in the Wild. ECCV, 2010

@ -1,10 +0,0 @@
***************************************
face. Face Recognition
***************************************
The module contains some recently added functionality that has not been stabilized, or functionality that is considered optional.
.. toctree::
:maxdepth: 2
FaceRecognizer Documentation <index>

@ -1,412 +0,0 @@
FaceRecognizer
==============
.. highlight:: cpp
.. Sample code::
* An example using the FaceRecognizer class can be found at opencv_source_code/samples/cpp/facerec_demo.cpp
* (Python) An example using the FaceRecognizer class can be found at opencv_source_code/samples/python2/facerec_demo.py
FaceRecognizer
--------------
.. ocv:class:: FaceRecognizer : public Algorithm
All face recognition models in OpenCV are derived from the abstract base class :ocv:class:`FaceRecognizer`, which provides
unified access to all face recognition algorithms in OpenCV. ::
class FaceRecognizer : public Algorithm
{
public:
//! virtual destructor
virtual ~FaceRecognizer() {}
// Trains a FaceRecognizer.
virtual void train(InputArray src, InputArray labels) = 0;
// Updates a FaceRecognizer.
virtual void update(InputArrayOfArrays src, InputArray labels);
// Gets a prediction from a FaceRecognizer.
virtual int predict(InputArray src) const = 0;
// Predicts the label and confidence for a given sample.
virtual void predict(InputArray src, int &label, double &confidence) const = 0;
// Serializes this object to a given filename.
virtual void save(const String& filename) const;
// Deserializes this object from a given filename.
virtual void load(const String& filename);
// Serializes this object to a given cv::FileStorage.
virtual void save(FileStorage& fs) const = 0;
// Deserializes this object from a given cv::FileStorage.
virtual void load(const FileStorage& fs) = 0;
// Sets additional string info for the label
virtual void setLabelInfo(int label, const String& strInfo);
// Gets string info by label
virtual String getLabelInfo(int label);
// Gets labels by string info
virtual vector<int> getLabelsByString(const String& str);
};
Description
+++++++++++
I'll go a bit more into detail explaining :ocv:class:`FaceRecognizer`, because it doesn't look like a powerful interface at first sight. But: Every :ocv:class:`FaceRecognizer` is an :ocv:class:`Algorithm`, so you can easily get/set all model internals (if allowed by the implementation). :ocv:class:`Algorithm` is a relatively new OpenCV concept, which is available since the 2.4 release. I suggest you take a look at its description.
:ocv:class:`Algorithm` provides the following features for all derived classes:
* So called “virtual constructor”. That is, each Algorithm derivative is registered at program start and you can get the list of registered algorithms and create instance of a particular algorithm by its name (see :ocv:func:`Algorithm::create`). If you plan to add your own algorithms, it is good practice to add a unique prefix to your algorithms to distinguish them from other algorithms.
* Setting/Retrieving algorithm parameters by name. If you used the video capturing functionality from the OpenCV highgui module, you are probably familiar with :ocv:cfunc:`cvSetCaptureProperty`, :ocv:cfunc:`cvGetCaptureProperty`, :ocv:func:`VideoCapture::set` and :ocv:func:`VideoCapture::get`. :ocv:class:`Algorithm` provides a similar method where instead of integer ids you specify the parameter names as text Strings. See :ocv:func:`Algorithm::set` and :ocv:func:`Algorithm::get` for details.
* Reading and writing parameters from/to XML or YAML files. Every Algorithm derivative can store all its parameters and then read them back. There is no need to re-implement it each time.
Moreover, every :ocv:class:`FaceRecognizer` supports:
* **Training** of a :ocv:class:`FaceRecognizer` with :ocv:func:`FaceRecognizer::train` on a given set of images (your face database!).
* **Prediction** for a given sample image, that is, a face. The image is given as a :ocv:class:`Mat`.
* **Loading/Saving** the model state from/to a given XML or YAML.
* **Setting/Getting label info**, which is stored as a string. String label info is useful for keeping the names of the recognized people.
.. note:: When using the FaceRecognizer interface in combination with Python, please stick to Python 2. Some underlying scripts like create_csv will not work in other versions, like Python 3.
Setting the Thresholds
+++++++++++++++++++++++
Sometimes you run into the situation where you want to apply a threshold on the prediction. A common scenario in face recognition is to tell whether a face belongs to the training dataset or if it is unknown. You might wonder why there's no public API in :ocv:class:`FaceRecognizer` to set the threshold for the prediction, but rest assured: it's supported. It just means there's no generic way in an abstract class to provide an interface for setting/getting the thresholds of *every possible* :ocv:class:`FaceRecognizer` algorithm. The appropriate place to set the thresholds is in the constructor of the specific :ocv:class:`FaceRecognizer` and since every :ocv:class:`FaceRecognizer` is an :ocv:class:`Algorithm` (see above), you can get/set the thresholds at runtime!
Here is an example of setting a threshold for the Eigenfaces method, when creating the model:
.. code-block:: cpp
// Let's say we want to keep 10 Eigenfaces and have a threshold value of 10.0
int num_components = 10;
double threshold = 10.0;
// Then if you want to have a cv::FaceRecognizer with a confidence threshold,
// create the concrete implementation with the appropriate parameters:
Ptr<FaceRecognizer> model = createEigenFaceRecognizer(num_components, threshold);
Sometimes it's not feasible to retrain the model just to experiment with threshold values. Thanks to :ocv:class:`Algorithm` it's possible to set internal model thresholds at runtime. Let's see how we would set/get the prediction threshold for the Eigenfaces model we've created above:
.. code-block:: cpp
// The following line reads the threshold from the Eigenfaces model:
double current_threshold = model->getDouble("threshold");
// And this line sets the threshold to 0.0:
model->set("threshold", 0.0);
If you've set the threshold to ``0.0`` as we did above, then:
.. code-block:: cpp
//
Mat img = imread("person1/3.jpg", CV_LOAD_IMAGE_GRAYSCALE);
// Get a prediction from the model. Note: We've set a threshold of 0.0 above,
// since the distance is almost always larger than 0.0, you'll get -1 as
// label, which indicates, this face is unknown
int predicted_label = model->predict(img);
// ...
is going to yield ``-1`` as predicted label, which states this face is unknown.
Getting the name of a FaceRecognizer
+++++++++++++++++++++++++++++++++++++
Since every :ocv:class:`FaceRecognizer` is an :ocv:class:`Algorithm`, you can use :ocv:func:`Algorithm::name` to get the name of a :ocv:class:`FaceRecognizer`:
.. code-block:: cpp
// Create a FaceRecognizer:
Ptr<FaceRecognizer> model = createEigenFaceRecognizer();
// And here's how to get its name:
String name = model->name();
FaceRecognizer::train
---------------------
Trains a FaceRecognizer with given data and associated labels.
.. ocv:function:: void FaceRecognizer::train( InputArrayOfArrays src, InputArray labels ) = 0
:param src: The training images, that means the faces you want to learn. The data has to be given as a ``vector<Mat>``.
:param labels: The labels corresponding to the images have to be given either as a ``vector<int>`` or a :ocv:class:`Mat` of type `CV_32SC1`.
The following source code snippet shows you how to learn a Fisherfaces model on a given set of images. The images are read with :ocv:func:`imread` and pushed into a ``std::vector<Mat>``. The labels of each image are stored within a ``std::vector<int>`` (you could also use a :ocv:class:`Mat` of type `CV_32SC1`). Think of the label as the subject (the person) this image belongs to, so same subjects (persons) should have the same label. For the available :ocv:class:`FaceRecognizer` you don't have to pay any attention to the order of the labels, just make sure same persons have the same label:
.. code-block:: cpp
// holds images and labels
vector<Mat> images;
vector<int> labels;
// images for first person
images.push_back(imread("person0/0.jpg", CV_LOAD_IMAGE_GRAYSCALE)); labels.push_back(0);
images.push_back(imread("person0/1.jpg", CV_LOAD_IMAGE_GRAYSCALE)); labels.push_back(0);
images.push_back(imread("person0/2.jpg", CV_LOAD_IMAGE_GRAYSCALE)); labels.push_back(0);
// images for second person
images.push_back(imread("person1/0.jpg", CV_LOAD_IMAGE_GRAYSCALE)); labels.push_back(1);
images.push_back(imread("person1/1.jpg", CV_LOAD_IMAGE_GRAYSCALE)); labels.push_back(1);
images.push_back(imread("person1/2.jpg", CV_LOAD_IMAGE_GRAYSCALE)); labels.push_back(1);
Now that you have read some images, we can create a new :ocv:class:`FaceRecognizer`. In this example I'll create a Fisherfaces model and decide to keep all of the possible Fisherfaces:
.. code-block:: cpp
// Create a new Fisherfaces model and retain all available Fisherfaces,
// this is the most common usage of this specific FaceRecognizer:
//
Ptr<FaceRecognizer> model = createFisherFaceRecognizer();
And finally train it on the given dataset (the face images and labels):
.. code-block:: cpp
// This is the common interface to train all of the available cv::FaceRecognizer
// implementations:
//
model->train(images, labels);
FaceRecognizer::update
----------------------
Updates a FaceRecognizer with given data and associated labels.
.. ocv:function:: void FaceRecognizer::update( InputArrayOfArrays src, InputArray labels )
:param src: The training images, that means the faces you want to learn. The data has to be given as a ``vector<Mat>``.
:param labels: The labels corresponding to the images have to be given either as a ``vector<int>`` or a :ocv:class:`Mat` of type `CV_32SC1`.
This method updates a (probably trained) :ocv:class:`FaceRecognizer`, but only if the algorithm supports it. The Local Binary Patterns Histograms (LBPH) recognizer (see :ocv:func:`createLBPHFaceRecognizer`) can be updated. For the Eigenfaces and Fisherfaces method, this is algorithmically not possible and you have to re-estimate the model with :ocv:func:`FaceRecognizer::train`. In any case, a call to train empties the existing model and learns a new model, while update does not delete any model data.
.. code-block:: cpp
// Create a new LBPH model (it can be updated) and use the default parameters,
// this is the most common usage of this specific FaceRecognizer:
//
Ptr<FaceRecognizer> model = createLBPHFaceRecognizer();
// This is the common interface to train all of the available cv::FaceRecognizer
// implementations:
//
model->train(images, labels);
// Some containers to hold new image:
vector<Mat> newImages;
vector<int> newLabels;
// You should add some images to the containers:
//
// ...
//
// Now updating the model is as easy as calling:
model->update(newImages,newLabels);
// This will preserve the old model data and extend the existing model
// with the new features extracted from newImages!
Calling update on an Eigenfaces model (see :ocv:func:`createEigenFaceRecognizer`), which doesn't support updating, will throw an error similar to:
.. code-block:: none
OpenCV Error: The function/feature is not implemented (This FaceRecognizer (FaceRecognizer.Eigenfaces) does not support updating, you have to use FaceRecognizer::train to update it.) in update, file /home/philipp/git/opencv/modules/contrib/src/facerec.cpp, line 305
terminate called after throwing an instance of 'cv::Exception'
Please note: The :ocv:class:`FaceRecognizer` does not store your training images, because this would be very memory intensive and it's not the responsibility of the :ocv:class:`FaceRecognizer` to do so. The caller is responsible for maintaining the dataset they want to work with.
FaceRecognizer::predict
-----------------------
.. ocv:function:: int FaceRecognizer::predict( InputArray src ) const = 0
.. ocv:function:: void FaceRecognizer::predict( InputArray src, int & label, double & confidence ) const = 0
Predicts a label and associated confidence (e.g. distance) for a given input image.
:param src: Sample image to get a prediction from.
:param label: The predicted label for the given image.
:param confidence: Associated confidence (e.g. distance) for the predicted label.
The suffix ``const`` means that prediction does not affect the internal model
state, so the method can be safely called from within different threads.
The following example shows how to get a prediction from a trained model:
.. code-block:: cpp
using namespace cv;
// Do your initialization here (create the cv::FaceRecognizer model) ...
// ...
// Read in a sample image:
Mat img = imread("person1/3.jpg", CV_LOAD_IMAGE_GRAYSCALE);
// And get a prediction from the cv::FaceRecognizer:
int predicted = model->predict(img);
Or to get a prediction and the associated confidence (e.g. distance):
.. code-block:: cpp
using namespace cv;
// Do your initialization here (create the cv::FaceRecognizer model) ...
// ...
Mat img = imread("person1/3.jpg", CV_LOAD_IMAGE_GRAYSCALE);
// Some variables for the predicted label and associated confidence (e.g. distance):
int predicted_label = -1;
double predicted_confidence = 0.0;
// Get the prediction and associated confidence from the model
model->predict(img, predicted_label, predicted_confidence);
FaceRecognizer::save
--------------------
Saves a :ocv:class:`FaceRecognizer` and its model state.
.. ocv:function:: void FaceRecognizer::save(const String& filename) const
Saves this model to a given filename, either as XML or YAML.
:param filename: The filename to store this :ocv:class:`FaceRecognizer` to (either XML/YAML).
.. ocv:function:: void FaceRecognizer::save(FileStorage& fs) const
Saves this model to a given :ocv:class:`FileStorage`.
:param fs: The :ocv:class:`FileStorage` to store this :ocv:class:`FaceRecognizer` to.
Every :ocv:class:`FaceRecognizer` overrides ``FaceRecognizer::save(FileStorage& fs)``
to save the internal model state. ``FaceRecognizer::save(const String& filename)`` saves
the state of a model to the given filename.
The suffix ``const`` means that saving does not affect the internal model
state, so the method can be safely called from within different threads.
FaceRecognizer::load
--------------------
Loads a :ocv:class:`FaceRecognizer` and its model state.
.. ocv:function:: void FaceRecognizer::load( const String& filename )
.. ocv:function:: void FaceRecognizer::load( const FileStorage& fs ) = 0
Loads a persisted model and state from a given XML or YAML file. Every
:ocv:class:`FaceRecognizer` has to override ``FaceRecognizer::load(const FileStorage& fs)``
to enable loading the model state. ``FaceRecognizer::load(const FileStorage& fs)`` in
turn gets called by ``FaceRecognizer::load(const String& filename)``, to ease
loading a model from a file.
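Putting save and load together, a minimal sketch of the round trip (the filename is a placeholder; ``images`` and ``labels`` are assumed to be filled as shown in :ocv:func:`FaceRecognizer::train`):

.. code-block:: cpp

    // Train a model and persist its state as YAML:
    Ptr<FaceRecognizer> model = createFisherFaceRecognizer();
    model->train(images, labels);
    model->save("fisherfaces_at.yml");
    // Later (e.g. in another run), create a model of the same type
    // and restore the persisted state:
    Ptr<FaceRecognizer> loaded = createFisherFaceRecognizer();
    loaded->load("fisherfaces_at.yml");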
FaceRecognizer::setLabelInfo
-----------------------------
Sets string info for the specified model's label.
.. ocv:function:: void FaceRecognizer::setLabelInfo(int label, const String& strInfo)
The string info is replaced by the provided value if it was set before for the specified label.
FaceRecognizer::getLabelInfo
----------------------------
Gets string information by label.
.. ocv:function:: String FaceRecognizer::getLabelInfo(int label)
If an unknown label id is provided, or there is no label information associated with the specified label id, the method returns an empty string.
FaceRecognizer::getLabelsByString
---------------------------------
Gets vector of labels by string.
.. ocv:function:: vector<int> FaceRecognizer::getLabelsByString(const String& str)
The function searches for the labels containing the specified sub-string in the associated string info.
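A short sketch of how the three label info methods above play together (the names and labels are made up for illustration):

.. code-block:: cpp

    // After training, attach a human-readable name to each numeric label:
    model->setLabelInfo(0, "Alice");
    model->setLabelInfo(1, "Bob");
    // Resolve a prediction back to the stored string info:
    int predicted_label = model->predict(img);
    String name = model->getLabelInfo(predicted_label); // empty if none was set
    // Find all labels whose string info contains a sub-string:
    vector<int> matches = model->getLabelsByString("Ali"); // here: contains 0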
createEigenFaceRecognizer
-------------------------
.. ocv:function:: Ptr<FaceRecognizer> createEigenFaceRecognizer(int num_components = 0, double threshold = DBL_MAX)
:param num_components: The number of components (read: Eigenfaces) kept for this Principal Component Analysis. As a hint: there's no rule for how many components (read: Eigenfaces) should be kept for good reconstruction capabilities. It is based on your input data, so experiment with the number. Keeping 80 components should almost always be sufficient.
:param threshold: The threshold applied in the prediction. If the distance to the nearest neighbor is larger than the threshold, this method returns -1.
Notes:
++++++
* Training and prediction must be done on grayscale images, use :ocv:func:`cvtColor` to convert between the color spaces.
* **THE EIGENFACES METHOD MAKES THE ASSUMPTION, THAT THE TRAINING AND TEST IMAGES ARE OF EQUAL SIZE.** (caps-lock, because I got so many mails asking for this). You have to make sure your input data has the correct shape, else a meaningful exception is thrown. Use :ocv:func:`resize` to resize the images.
* This model does not support updating.
Model internal data:
++++++++++++++++++++
* ``num_components`` see :ocv:func:`createEigenFaceRecognizer`.
* ``threshold`` see :ocv:func:`createEigenFaceRecognizer`.
* ``eigenvalues`` The eigenvalues for this Principal Component Analysis (ordered descending).
* ``eigenvectors`` The eigenvectors for this Principal Component Analysis (ordered by their eigenvalue).
* ``mean`` The sample mean calculated from the training data.
* ``projections`` The projections of the training data.
* ``labels`` The labels corresponding to the projections.
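Since the model is an :ocv:class:`Algorithm`, these internals can be inspected at runtime, analogous to the threshold example above (a sketch, assuming the generic ``Algorithm::getMat`` accessor and a trained Eigenfaces ``model``; ``height`` is the row count of your training images):

.. code-block:: cpp

    // Read the listed internals through the generic cv::Algorithm getters:
    Mat eigenvalues = model->getMat("eigenvalues");
    Mat W = model->getMat("eigenvectors");
    Mat mean = model->getMat("mean");
    // For example, show the sample mean reshaped to the original image geometry:
    Mat mean_image;
    normalize(mean.reshape(1, height), mean_image, 0, 255, NORM_MINMAX, CV_8UC1);
    imshow("mean", mean_image);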
createFisherFaceRecognizer
--------------------------
.. ocv:function:: Ptr<FaceRecognizer> createFisherFaceRecognizer(int num_components = 0, double threshold = DBL_MAX)
:param num_components: The number of components (read: Fisherfaces) kept for this Linear Discriminant Analysis with the Fisherfaces criterion. It's useful to keep all components, that means the number of your classes ``c`` (read: subjects, persons you want to recognize). If you leave this at the default (``0``), set it to a value less than or equal to ``0``, or set it greater than ``(c-1)``, it will be set to the correct number ``(c-1)`` automatically.
:param threshold: The threshold applied in the prediction. If the distance to the nearest neighbor is larger than the threshold, this method returns -1.
Notes:
++++++
* Training and prediction must be done on grayscale images, use :ocv:func:`cvtColor` to convert between the color spaces.
* **THE FISHERFACES METHOD MAKES THE ASSUMPTION, THAT THE TRAINING AND TEST IMAGES ARE OF EQUAL SIZE.** (caps-lock, because I got so many mails asking for this). You have to make sure your input data has the correct shape, else a meaningful exception is thrown. Use :ocv:func:`resize` to resize the images.
* This model does not support updating.
Model internal data:
++++++++++++++++++++
* ``num_components`` see :ocv:func:`createFisherFaceRecognizer`.
* ``threshold`` see :ocv:func:`createFisherFaceRecognizer`.
* ``eigenvalues`` The eigenvalues for this Linear Discriminant Analysis (ordered descending).
* ``eigenvectors`` The eigenvectors for this Linear Discriminant Analysis (ordered by their eigenvalue).
* ``mean`` The sample mean calculated from the training data.
* ``projections`` The projections of the training data.
* ``labels`` The labels corresponding to the projections.
createLBPHFaceRecognizer
-------------------------
.. ocv:function:: Ptr<FaceRecognizer> createLBPHFaceRecognizer(int radius=1, int neighbors=8, int grid_x=8, int grid_y=8, double threshold = DBL_MAX)
:param radius: The radius used for building the Circular Local Binary Pattern. The greater the radius, the smoother the image but more spatial information you can get.
:param neighbors: The number of sample points to build a Circular Local Binary Pattern from. An appropriate value is to use ``8`` sample points. Keep in mind: the more sample points you include, the higher the computational cost.
:param grid_x: The number of cells in the horizontal direction, ``8`` is a common value used in publications. The more cells, the finer the grid, the higher the dimensionality of the resulting feature vector.
:param grid_y: The number of cells in the vertical direction, ``8`` is a common value used in publications. The more cells, the finer the grid, the higher the dimensionality of the resulting feature vector.
:param threshold: The threshold applied in the prediction. If the distance to the nearest neighbor is larger than the threshold, this method returns -1.
Notes:
++++++
* The Circular Local Binary Patterns (used in training and prediction) expect the data given as grayscale images, use :ocv:func:`cvtColor` to convert between the color spaces.
* This model supports updating.
Model internal data:
++++++++++++++++++++
* ``radius`` see :ocv:func:`createLBPHFaceRecognizer`.
* ``neighbors`` see :ocv:func:`createLBPHFaceRecognizer`.
* ``grid_x`` see :ocv:func:`createLBPHFaceRecognizer`.
* ``grid_y`` see :ocv:func:`createLBPHFaceRecognizer`.
* ``threshold`` see :ocv:func:`createLBPHFaceRecognizer`.
* ``histograms`` Local Binary Patterns Histograms calculated from the given training data (empty if none was given).
* ``labels`` Labels corresponding to the calculated Local Binary Patterns Histograms.

@ -1,86 +0,0 @@
Changelog
=========
Release 0.05
------------
This library is now included in the official OpenCV distribution (from 2.4 on).
The :ocv:class:`FaceRecognizer` is now an :ocv:class:`Algorithm`, which better fits into the overall
OpenCV API.
To reduce the confusion on the user side and minimize my work, libfacerec and OpenCV
have been synchronized and are now based on the same interfaces and implementation.
The library now has an extensive documentation:
* The API is explained in detail and with a lot of code examples.
* The face recognition guide I had written for Python and GNU Octave/MATLAB has been adapted to the new OpenCV C++ ``cv::FaceRecognizer``.
* A tutorial for gender classification with Fisherfaces.
* A tutorial for face recognition in videos (e.g. webcam).
Release highlights
++++++++++++++++++
* There are no single highlights to pick from, this release is a highlight itself.
Release 0.04
------------
This version is fully Windows-compatible and works with OpenCV 2.3.1. Several
bugfixes, but none influenced the recognition rate.
Release highlights
++++++++++++++++++
* A whole lot of exceptions with meaningful error messages.
* A tutorial for Windows users: `http://bytefish.de/blog/opencv_visual_studio_and_libfacerec <http://bytefish.de/blog/opencv_visual_studio_and_libfacerec>`_
Release 0.03
------------
Reworked the library to provide separate implementations in cpp files, because
it's the preferred way of contributing OpenCV libraries. This means the library
is not header-only anymore. Slight API changes were done, please see the
documentation for details.
Release highlights
++++++++++++++++++
* New Unit Tests (for LBP Histograms) make the library more robust.
* Added more documentation.
Release 0.02
------------
Reworked the library to provide separate implementations in cpp files, because
it's the preferred way of contributing OpenCV libraries. This means the library
is not header-only anymore. Slight API changes were done, please see the
documentation for details.
Release highlights
++++++++++++++++++
* New Unit Tests (for LBP Histograms) make the library more robust.
* Added a documentation and changelog in reStructuredText.
Release 0.01
------------
Initial release as header-only library.
Release highlights
++++++++++++++++++
* Colormaps for OpenCV to enhance the visualization.
* Face Recognition algorithms implemented:
* Eigenfaces [TP91]_
* Fisherfaces [BHK97]_
* Local Binary Patterns Histograms [AHP04]_
* Added persistence facilities to store the models with a common API.
* Unit Tests (using `gtest <http://code.google.com/p/googletest/>`_).
* Providing a CMakeLists.txt to enable easy cross-platform building.

@ -1,628 +0,0 @@
Face Recognition with OpenCV
############################
.. contents:: Table of Contents
:depth: 3
Introduction
============
`OpenCV (Open Source Computer Vision) <http://opencv.org>`_ is a popular computer vision library started by `Intel <http://www.intel.com>`_ in 1999. The cross-platform library sets its focus on real-time image processing and includes patent-free implementations of the latest computer vision algorithms. In 2008 `Willow Garage <http://www.willowgarage.com>`_ took over support and OpenCV 2.3.1 now comes with a programming interface to C, C++, `Python <http://www.python.org>`_ and `Android <http://www.android.com>`_. OpenCV is released under a BSD license so it is used in academic projects and commercial products alike.
OpenCV 2.4 now comes with the very new :ocv:class:`FaceRecognizer` class for face recognition, so you can start experimenting with face recognition right away. This document is the guide I've wished for when I was working myself into face recognition. It shows you how to perform face recognition with :ocv:class:`FaceRecognizer` in OpenCV (with full source code listings) and gives you an introduction into the algorithms behind it. I'll also show how to create the visualizations you can find in many publications, because a lot of people asked for it.
The currently available algorithms are:
* Eigenfaces (see :ocv:func:`createEigenFaceRecognizer`)
* Fisherfaces (see :ocv:func:`createFisherFaceRecognizer`)
* Local Binary Patterns Histograms (see :ocv:func:`createLBPHFaceRecognizer`)
You don't need to copy and paste the source code examples from this page, because they are available in the ``src`` folder coming with this documentation. If you have built OpenCV with the samples turned on, chances are good you have them compiled already! Although it might be interesting for very advanced users, I've decided to leave the implementation details out as I am afraid they confuse new users.
All code in this document is released under the `BSD license <http://www.opensource.org/licenses/bsd-license>`_, so feel free to use it for your projects.
Face Recognition
================
Face recognition is an easy task for humans. Experiments in [Tu06]_ have shown that even one to three day old babies are able to distinguish between known faces. So how hard could it be for a computer? It turns out we know little about human recognition to date. Are inner features (eyes, nose, mouth) or outer features (head shape, hairline) used for a successful face recognition? How do we analyze an image and how does the brain encode it? It was shown by `David Hubel <http://en.wikipedia.org/wiki/David_H._Hubel>`_ and `Torsten Wiesel <http://en.wikipedia.org/wiki/Torsten_Wiesel>`_ that our brain has specialized nerve cells responding to specific local features of a scene, such as lines, edges, angles or movement. Since we don't see the world as scattered pieces, our visual cortex must somehow combine the different sources of information into useful patterns. Automatic face recognition is all about extracting those meaningful features from an image, putting them into a useful representation and performing some kind of classification on them.
Face recognition based on the geometric features of a face is probably the most intuitive approach to face recognition. One of the first automated face recognition systems was described in [Kanade73]_: marker points (position of eyes, ears, nose, ...) were used to build a feature vector (distance between the points, angle between them, ...). The recognition was performed by calculating the Euclidean distance between feature vectors of a probe and reference image. Such a method is robust against changes in illumination by its nature, but has a huge drawback: the accurate registration of the marker points is complicated, even with state-of-the-art algorithms. Some of the latest work on geometric face recognition was carried out in [Bru92]_. A 22-dimensional feature vector was used and experiments on large datasets have shown that geometrical features alone may not carry enough information for face recognition.
The Eigenfaces method described in [TP91]_ took a holistic approach to face recognition: A facial image is a point from a high-dimensional image space and a lower-dimensional representation is found, where classification becomes easy. The lower-dimensional subspace is found with Principal Component Analysis, which identifies the axes with maximum variance. While this kind of transformation is optimal from a reconstruction standpoint, it doesn't take any class labels into account. Imagine a situation where the variance is generated from external sources, let it be light. The axes with maximum variance do not necessarily contain any discriminative information at all, hence a classification becomes impossible. So a class-specific projection with a Linear Discriminant Analysis was applied to face recognition in [BHK97]_. The basic idea is to minimize the variance within a class, while maximizing the variance between the classes at the same time.
Recently various methods for local feature extraction have emerged. To avoid the high dimensionality of the input data, only local regions of an image are described; the extracted features are (hopefully) more robust against partial occlusion, illumination and small sample size. Algorithms used for local feature extraction are Gabor Wavelets ([Wiskott97]_), the Discrete Cosine Transform ([Messer06]_) and Local Binary Patterns ([AHP04]_). It's still an open research question what the best way is to preserve spatial information when applying a local feature extraction, because spatial information is potentially useful information.
Face Database
==============
Let's get some data to experiment with first. I don't want to do a toy example here. We are doing face recognition, so you'll need some face images! You can either create your own dataset or start with one of the available face databases, `http://face-rec.org/databases/ <http://face-rec.org/databases>`_ gives you an up-to-date overview. Three interesting databases are (parts of the description are quoted from `http://face-rec.org <http://face-rec.org>`_):
* `AT&T Facedatabase <http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html>`_ The AT&T Facedatabase, sometimes also referred to as *ORL Database of Faces*, contains ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement).
* `Yale Facedatabase A <http://vision.ucsd.edu/content/yale-face-database>`_, also known as Yalefaces. The AT&T Facedatabase is good for initial tests, but it's a fairly easy database. The Eigenfaces method already has a 97% recognition rate on it, so you won't see any great improvements with other algorithms. The Yale Facedatabase A is a more appropriate dataset for initial experiments, because the recognition problem is harder. The database consists of 15 people (14 male, 1 female), each with 11 grayscale images sized :math:`320 \times 243` pixels. There are changes in the light conditions (center light, left light, right light), facial expressions (happy, normal, sad, sleepy, surprised, wink) and glasses (glasses, no-glasses).
The original images are not cropped and aligned. Please look into the :ref:`appendixft` for a Python script that does the job for you.
* `Extended Yale Facedatabase B <http://vision.ucsd.edu/~leekc/ExtYaleDatabase/ExtYaleB.html>`_ The Extended Yale Facedatabase B contains 2414 images of 38 different people in its cropped version. The focus of this database is set on extracting features that are robust to illumination; the images have almost no variation in emotion/occlusion/... . I personally think that this dataset is too large for the experiments I perform in this document. You'd better use the `AT&T Facedatabase <http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html>`_ for initial testing. A first version of the Yale Facedatabase B was used in [BHK97]_ to see how the Eigenfaces and Fisherfaces method perform under heavy illumination changes. [Lee05]_ used the same setup to take 16128 images of 28 people. The Extended Yale Facedatabase B is the merge of the two databases, which is now known as the Extended Yale Facedatabase B.
Preparing the data
-------------------
Once we have acquired some data, we'll need to read it into our program. In the demo applications I have decided to read the images from a very simple CSV file. Why? Because it's the simplest platform-independent approach I can think of. However, if you know a simpler solution please ping me about it. Basically all the CSV file needs to contain are lines composed of a ``filename`` followed by a ``;`` followed by the ``label`` (as *integer number*), making up a line like this:
.. code-block:: none
/path/to/image.ext;0
Let's dissect the line. ``/path/to/image.ext`` is the path to an image, probably something like this if you are in Windows: ``C:/faces/person0/image0.jpg``. Then there is the separator ``;`` and finally we assign the label ``0`` to the image. Think of the label as the subject (the person) this image belongs to, so same subjects (persons) should have the same label.
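In the demo applications this file is parsed with a small helper along the following lines (a sketch of the ``read_csv`` helper shipped with the demo sources):

.. code-block:: cpp

    #include <opencv2/core/core.hpp>
    #include <opencv2/highgui/highgui.hpp>

    #include <cstdlib>
    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>

    using namespace cv;
    using namespace std;

    // Reads the "filename;label" lines described above into the two vectors
    // later passed to FaceRecognizer::train:
    static void read_csv(const string& filename, vector<Mat>& images,
                         vector<int>& labels, char separator = ';')
    {
        ifstream file(filename.c_str(), ifstream::in);
        if (!file)
            CV_Error(CV_StsBadArg, "No valid input file was given.");
        string line, path, classlabel;
        while (getline(file, line))
        {
            stringstream liness(line);
            getline(liness, path, separator);
            getline(liness, classlabel);
            if (!path.empty() && !classlabel.empty())
            {
                images.push_back(imread(path, CV_LOAD_IMAGE_GRAYSCALE));
                labels.push_back(atoi(classlabel.c_str()));
            }
        }
    }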
Download the AT&T Facedatabase from AT&T Facedatabase and the corresponding CSV file from at.txt, which looks like this (file is without ... of course):
.. code-block:: none
./at/s1/1.pgm;0
./at/s1/2.pgm;0
...
./at/s2/1.pgm;1
./at/s2/2.pgm;1
...
./at/s40/1.pgm;39
./at/s40/2.pgm;39
Imagine I have extracted the files to ``D:/data/at`` and have downloaded the CSV file to ``D:/data/at.txt``. Then you would simply need to Search & Replace ``./`` with ``D:/data/``. You can do that in an editor of your choice, every sufficiently advanced editor can do this. Once you have a CSV file with valid filenames and labels, you can run any of the demos by passing the path to the CSV file as parameter:
.. code-block:: none
facerec_demo.exe D:/data/at.txt
Creating the CSV File
+++++++++++++++++++++
You don't really want to create the CSV file by hand. I have prepared you a little Python script ``create_csv.py`` (you find it at ``src/create_csv.py`` coming with this tutorial) that automatically creates a CSV file for you. If you have your images in a hierarchy like this (``/basepath/<subject>/<image.ext>``):
.. code-block:: none
philipp@mango:~/facerec/data/at$ tree
.
|-- s1
| |-- 1.pgm
| |-- ...
| |-- 10.pgm
|-- s2
| |-- 1.pgm
| |-- ...
| |-- 10.pgm
...
|-- s40
| |-- 1.pgm
| |-- ...
| |-- 10.pgm
Then simply call create_csv.py with the path to the folder, just like this, and you can save the output:
.. code-block:: none
philipp@mango:~/facerec/data$ python create_csv.py
at/s13/2.pgm;0
at/s13/7.pgm;0
at/s13/6.pgm;0
at/s13/9.pgm;0
at/s13/5.pgm;0
at/s13/3.pgm;0
at/s13/4.pgm;0
at/s13/10.pgm;0
at/s13/8.pgm;0
at/s13/1.pgm;0
at/s17/2.pgm;1
at/s17/7.pgm;1
at/s17/6.pgm;1
at/s17/9.pgm;1
at/s17/5.pgm;1
at/s17/3.pgm;1
[...]
Please see the :ref:`appendixft` for additional information.
Eigenfaces
==========
The problem with the image representation we are given is its high dimensionality. Two-dimensional :math:`p \times q` grayscale images span a :math:`m = pq`-dimensional vector space, so an image with :math:`100 \times 100` pixels lies in a :math:`10,000`-dimensional image space already. The question is: Are all dimensions equally useful for us? We can only make a decision if there's any variance in data, so what we are looking for are the components that account for most of the information. The Principal Component Analysis (PCA) was independently proposed by `Karl Pearson <http://en.wikipedia.org/wiki/Karl_Pearson>`_ (1901) and `Harold Hotelling <http://en.wikipedia.org/wiki/Harold_Hotelling>`_ (1933) to turn a set of possibly correlated variables into a smaller set of uncorrelated variables. The idea is that a high-dimensional dataset is often described by correlated variables and therefore only a few meaningful dimensions account for most of the information. The PCA method finds the directions with the greatest variance in the data, called principal components.
Algorithmic Description
-----------------------
Let :math:`X = \{ x_{1}, x_{2}, \ldots, x_{n} \}` be a random vector with observations :math:`x_i \in R^{d}`.
1. Compute the mean :math:`\mu`
.. math::
\mu = \frac{1}{n} \sum_{i=1}^{n} x_{i}
2. Compute the covariance matrix :math:`S`
.. math::
S = \frac{1}{n} \sum_{i=1}^{n} (x_{i} - \mu) (x_{i} - \mu)^{T}
3. Compute the eigenvalues :math:`\lambda_{i}` and eigenvectors :math:`v_{i}` of :math:`S`
.. math::
S v_{i} = \lambda_{i} v_{i}, i=1,2,\ldots,n
4. Order the eigenvectors descending by their eigenvalue. The :math:`k` principal components are the eigenvectors corresponding to the :math:`k` largest eigenvalues.
The :math:`k` principal components of the observed vector :math:`x` are then given by:
.. math::
y = W^{T} (x - \mu)
where :math:`W = (v_{1}, v_{2}, \ldots, v_{k})`.
The reconstruction from the PCA basis is given by:
.. math::
x = W y + \mu
where :math:`W = (v_{1}, v_{2}, \ldots, v_{k})`.
The Eigenfaces method then performs face recognition by:
* Projecting all training samples into the PCA subspace.
* Projecting the query image into the PCA subspace.
* Finding the nearest neighbor between the projected training images and the projected query image.
Still there's one problem left to solve. Imagine we are given :math:`400` images sized :math:`100 \times 100` pixels. The Principal Component Analysis solves the covariance matrix :math:`S = X X^{T}`, where :math:`{size}(X) = 10000 \times 400` in our example. You would end up with a :math:`10000 \times 10000` matrix, roughly :math:`0.8 GB`. Solving this problem isn't feasible, so we'll need to apply a trick. From your linear algebra lessons you know that an :math:`M \times N` matrix with :math:`M > N` can only have :math:`N - 1` non-zero eigenvalues. So it's possible to take the eigenvalue decomposition :math:`S = X^{T} X` of size :math:`N \times N` instead:
.. math::
X^{T} X v_{i} = \lambda_{i} v_{i}
and get the original eigenvectors of :math:`S = X X^{T}` with a left multiplication of the data matrix:
.. math::
X X^{T} (X v_{i}) = \lambda_{i} (X v_{i})
The resulting eigenvectors are orthogonal; to get orthonormal eigenvectors they need to be normalized to unit length. I don't want to turn this into a publication, so please look into [Duda01]_ for the derivation and proof of the equations.
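The eigendecomposition above is also available through :ocv:class:`PCA`. A minimal sketch, assuming ``images`` already holds equally sized grayscale training images:

.. code-block:: cpp

    // Build the data matrix: one flattened image per row, as in the derivation.
    Mat data((int)images.size(), (int)images[0].total(), CV_32FC1);
    for (size_t i = 0; i < images.size(); i++)
    {
        Mat row_i = data.row((int)i); // header pointing into data
        images[i].reshape(1, 1).convertTo(row_i, CV_32FC1);
    }
    // Keep the first k principal components (the Eigenfaces):
    int k = 10;
    PCA pca(data, Mat(), CV_PCA_DATA_AS_ROW, k);
    // pca.mean is mu; pca.eigenvectors stores one eigenvector of S per row.
    // Project a sample into the subspace: y = W^T (x - mu)
    Mat y = pca.project(data.row(0));
    // And reconstruct it: x ~ W y + mu
    Mat x_rec = pca.backProject(y);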
Eigenfaces in OpenCV
--------------------
For the first source code example, I'll go through it with you. I am first giving you the whole source code listing, and after this we'll look at the most important lines in detail. Please note: every source code listing is commented in detail, so you should have no problems following it.
.. literalinclude:: src/facerec_eigenfaces.cpp
:language: cpp
:linenos:
The source code for this demo application is also available in the ``src`` folder coming with this documentation:
* :download:`src/facerec_eigenfaces.cpp <src/facerec_eigenfaces.cpp>`
I've used the jet colormap, so you can see how the grayscale values are distributed within the specific Eigenfaces. You can see that the Eigenfaces not only encode facial features, but also the illumination in the images (see the left light in Eigenface \#4, the right light in Eigenface \#5):
.. image:: img/eigenfaces_opencv.png
:align: center
We've already seen that we can reconstruct a face from its lower dimensional approximation. So let's see how many Eigenfaces are needed for a good reconstruction. I'll do a subplot with :math:`10,30,\ldots,310` Eigenfaces:
.. code-block:: cpp
// Display or save the image reconstruction at some predefined steps:
for(int num_components = 10; num_components < 300; num_components+=15) {
// slice the eigenvectors from the model
Mat evs = Mat(W, Range::all(), Range(0, num_components));
Mat projection = subspaceProject(evs, mean, images[0].reshape(1,1));
Mat reconstruction = subspaceReconstruct(evs, mean, projection);
// Normalize the result:
reconstruction = norm_0_255(reconstruction.reshape(1, images[0].rows));
// Display or save:
if(argc == 2) {
imshow(format("eigenface_reconstruction_%d", num_components), reconstruction);
} else {
imwrite(format("%s/eigenface_reconstruction_%d.png", output_folder.c_str(), num_components), reconstruction);
}
}
10 Eigenvectors are obviously not sufficient for a good image reconstruction, 50 Eigenvectors may already be sufficient to encode important facial features. You'll get a good reconstruction with approximately 300 Eigenvectors for the AT&T Facedatabase. There are rules of thumb for how many Eigenfaces you should choose for successful face recognition, but it heavily depends on the input data. [Zhao03]_ is the perfect starting point for research on this:
.. image:: img/eigenface_reconstruction_opencv.png
:align: center
Fisherfaces
============
The Principal Component Analysis (PCA), which is the core of the Eigenfaces method, finds a linear combination of features that maximizes the total variance in data. While this is clearly a powerful way to represent data, it doesn't consider any classes and so a lot of discriminative information *may* be lost when throwing components away. Imagine a situation where the variance in your data is generated by an external source, let it be the light. The components identified by a PCA do not necessarily contain any discriminative information at all, so the projected samples are smeared together and a classification becomes impossible (see `http://www.bytefish.de/wiki/pca_lda_with_gnu_octave <http://www.bytefish.de/wiki/pca_lda_with_gnu_octave>`_ for an example).
The Linear Discriminant Analysis performs a class-specific dimensionality reduction and was invented by the great statistician `Sir R. A. Fisher <http://en.wikipedia.org/wiki/Ronald_Fisher>`_. He successfully used it for classifying flowers in his 1936 paper *The use of multiple measurements in taxonomic problems* [Fisher36]_. In order to find the combination of features that separates best between classes the Linear Discriminant Analysis maximizes the ratio of between-classes to within-classes scatter, instead of maximizing the overall scatter. The idea is simple: same classes should cluster tightly together, while different classes are as far away as possible from each other in the lower-dimensional representation. This was also recognized by `Belhumeur <http://www.cs.columbia.edu/~belhumeur/>`_, `Hespanha <http://www.ece.ucsb.edu/~hespanha/>`_ and `Kriegman <http://cseweb.ucsd.edu/~kriegman/>`_ and so they applied a Discriminant Analysis to face recognition in [BHK97]_.
Algorithmic Description
-----------------------
Let :math:`X` be a random vector with samples drawn from :math:`c` classes:
.. math::
:nowrap:
\begin{align*}
X & = & \{X_1,X_2,\ldots,X_c\} \\
X_i & = & \{x_1, x_2, \ldots, x_n\}
\end{align*}
The scatter matrices :math:`S_{B}` and :math:`S_{W}` are calculated as:
.. math::
:nowrap:
\begin{align*}
S_{B} & = & \sum_{i=1}^{c} N_{i} (\mu_i - \mu)(\mu_i - \mu)^{T} \\
S_{W} & = & \sum_{i=1}^{c} \sum_{x_{j} \in X_{i}} (x_j - \mu_i)(x_j - \mu_i)^{T}
\end{align*}
, where :math:`\mu` is the total mean:
.. math::
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
And :math:`\mu_i` is the mean of class :math:`i \in \{1,\ldots,c\}`:
.. math::
\mu_i = \frac{1}{|X_i|} \sum_{x_j \in X_i} x_j
Fisher's classic algorithm now looks for a projection :math:`W`, that maximizes the class separability criterion:
.. math::
W_{opt} = \operatorname{arg\,max}_{W} \frac{|W^T S_B W|}{|W^T S_W W|}
Following [BHK97]_, a solution for this optimization problem is given by solving the General Eigenvalue Problem:
.. math::
:nowrap:
\begin{align*}
S_{B} v_{i} & = & \lambda_{i} S_w v_{i} \nonumber \\
S_{W}^{-1} S_{B} v_{i} & = & \lambda_{i} v_{i}
\end{align*}
There's one problem left to solve: The rank of :math:`S_{W}` is at most :math:`(N-c)`, with :math:`N` samples and :math:`c` classes. In pattern recognition problems the number of samples :math:`N` is almost always smaller than the dimension of the input data (the number of pixels), so the scatter matrix :math:`S_{W}` becomes singular (see [RJ91]_). In [BHK97]_ this was solved by performing a Principal Component Analysis on the data and projecting the samples into the :math:`(N-c)`-dimensional space. A Linear Discriminant Analysis was then performed on the reduced data, because :math:`S_{W}` isn't singular anymore.
The optimization problem can then be rewritten as:
.. math::
:nowrap:
\begin{align*}
W_{pca} & = & \operatorname{arg\,max}_{W} |W^T S_T W| \\
W_{fld} & = & \operatorname{arg\,max}_{W} \frac{|W^T W_{pca}^T S_{B} W_{pca} W|}{|W^T W_{pca}^T S_{W} W_{pca} W|}
\end{align*}
The transformation matrix :math:`W`, that projects a sample into the :math:`(c-1)`-dimensional space is then given by:
.. math::
W = W_{fld}^{T} W_{pca}^{T}
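In code, the two-stage projection can be sketched with :ocv:class:`PCA` and an ``LDA`` helper class (a sketch under the assumption that an ``LDA`` class with ``compute``, ``project`` and ``eigenvectors`` is available, as used internally by the Fisherfaces implementation; ``data`` holds one flattened sample per row, ``labels`` the class of each row and ``c`` the number of classes):

.. code-block:: cpp

    // Stage 1: PCA to (N - c) dimensions, so that S_W becomes non-singular:
    PCA pca(data, Mat(), CV_PCA_DATA_AS_ROW, data.rows - c);
    Mat reduced = pca.project(data);
    // Stage 2: LDA on the reduced data, keeping the (c - 1) discriminants:
    LDA lda(c - 1);
    lda.compute(reduced, labels);
    // lda.eigenvectors() holds W_fld; the combined transformation projects a
    // sample into the (c - 1)-dimensional space:
    Mat projected = lda.project(pca.project(data.row(0)));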
Fisherfaces in OpenCV
---------------------
.. literalinclude:: src/facerec_fisherfaces.cpp
:language: cpp
:linenos:
The source code for this demo application is also available in the ``src`` folder coming with this documentation:
* :download:`src/facerec_fisherfaces.cpp <src/facerec_fisherfaces.cpp>`
For this example I am going to use the Yale Facedatabase A, just because the plots are nicer. Each Fisherface has the same length as an original image, thus it can be displayed as an image. The demo shows (or saves) the first, at most 16 Fisherfaces:
.. image:: img/fisherfaces_opencv.png
:align: center
The Fisherfaces method learns a class-specific transformation matrix, so they do not capture illumination as obviously as the Eigenfaces method. The Discriminant Analysis instead finds the facial features to discriminate between the persons. It's important to mention that the performance of the Fisherfaces heavily depends on the input data as well. Practically said: if you learn the Fisherfaces for well-illuminated pictures only and you try to recognize faces in badly illuminated scenes, then the method is likely to find the wrong components (just because those features may not be predominant in badly illuminated images). This is somewhat logical, since the method had no chance to learn the illumination.
The Fisherfaces allow a reconstruction of the projected image, just like the Eigenfaces did. But since we only identified the features to distinguish between subjects, you can't expect a nice reconstruction of the original image. For the Fisherfaces method we'll project the sample image onto each of the Fisherfaces instead. So you'll get a nice visualization of which feature each of the Fisherfaces describes:
.. code-block:: cpp
// Display or save the image reconstruction at some predefined steps:
for(int num_component = 0; num_component < min(16, W.cols); num_component++) {
// Slice the Fisherface from the model:
Mat ev = W.col(num_component);
Mat projection = subspaceProject(ev, mean, images[0].reshape(1,1));
Mat reconstruction = subspaceReconstruct(ev, mean, projection);
// Normalize the result:
reconstruction = norm_0_255(reconstruction.reshape(1, images[0].rows));
// Display or save:
if(argc == 2) {
imshow(format("fisherface_reconstruction_%d", num_component), reconstruction);
} else {
imwrite(format("%s/fisherface_reconstruction_%d.png", output_folder.c_str(), num_component), reconstruction);
}
}
The differences may be subtle to the human eye, but you should be able to see some:
.. image:: img/fisherface_reconstruction_opencv.png
:align: center
Local Binary Patterns Histograms
================================
Eigenfaces and Fisherfaces take a somewhat holistic approach to face recognition. You treat your data as a vector somewhere in a high-dimensional image space. We all know high-dimensionality is bad, so a lower-dimensional subspace is identified, where (probably) useful information is preserved. The Eigenfaces approach maximizes the total scatter, which can lead to problems if the variance is generated by an external source, because components with a maximum variance over all classes aren't necessarily useful for classification (see `http://www.bytefish.de/wiki/pca_lda_with_gnu_octave <http://www.bytefish.de/wiki/pca_lda_with_gnu_octave>`_). So to preserve some discriminative information we applied a Linear Discriminant Analysis and optimized as described in the Fisherfaces method. The Fisherfaces method worked great... at least for the constrained scenario we've assumed in our model.
Now real life isn't perfect. You simply can't guarantee perfect light settings in your images or 10 different images of a person. So what if there's only one image for each person? Our covariance estimates for the subspace *may* be horribly wrong, so will the recognition. Remember the Eigenfaces method had a 96% recognition rate on the AT&T Facedatabase? How many images do we actually need to get such useful estimates? Here are the Rank-1 recognition rates of the Eigenfaces and Fisherfaces method on the AT&T Facedatabase, which is a fairly easy image database:
.. image:: img/at_database_small_sample_size.png
:scale: 60%
:align: center
So in order to get good recognition rates you'll need at least 8(±1) images for each person, and the Fisherfaces method doesn't really help here. The above experiment is a 10-fold cross-validated result carried out with the facerec framework at `https://github.com/bytefish/facerec <https://github.com/bytefish/facerec>`_. This is not a publication, so I won't back these figures with a deep mathematical analysis. Please have a look into [KM01]_ for a detailed analysis of both methods, when it comes to small training datasets.
So some research concentrated on extracting local features from images. The idea is to not look at the whole image as a high-dimensional vector, but to describe only local features of an object. The features you extract this way will implicitly have a low dimensionality. A fine idea! But you'll soon observe that the image representation we are given doesn't only suffer from illumination variations. Think of things like scale, translation or rotation in images - your local description has to be at least a bit robust against those things. Just like ``SIFT``, the Local Binary Patterns methodology has its roots in 2D texture analysis. The basic idea of Local Binary Patterns is to summarize the local structure in an image by comparing each pixel with its neighborhood. Take a pixel as center and threshold its neighbors against it. If the intensity of the center pixel is greater than or equal to that of its neighbor, denote it with 1, and with 0 if not. You'll end up with a binary number for each pixel, just like 11001111. With 8 surrounding pixels you'll end up with :math:`2^8` possible combinations, called *Local Binary Patterns* or sometimes referred to as *LBP codes*. The first LBP operator described in the literature actually used a fixed 3 x 3 neighborhood just like this:
.. image:: img/lbp/lbp.png
:scale: 80%
:align: center
Algorithmic Description
-----------------------
A more formal description of the LBP operator can be given as:
.. math::
LBP(x_c, y_c) = \sum_{p=0}^{P-1} 2^p s(i_p - i_c)
, with :math:`(x_c, y_c)` as the central pixel with intensity :math:`i_c`, and :math:`i_p` being the intensity of the neighbor pixel. :math:`s` is the sign function defined as:
.. math::
:nowrap:
\begin{equation}
s(x) =
\begin{cases}
1 & \text{if $x \geq 0$}\\
0 & \text{else}
\end{cases}
\end{equation}
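To make the definition concrete, here is a minimal sketch of the basic 3 x 3 operator (an illustration of the formula above with a hypothetical helper name, not the exact code OpenCV uses internally):

.. code-block:: cpp

    // Sketch: compute the basic 3 x 3 LBP code for every pixel of a
    // single-channel 8-bit image (border pixels are skipped for brevity).
    #include <opencv2/core/core.hpp>

    cv::Mat lbpBasic(const cv::Mat& src)
    {
        cv::Mat dst = cv::Mat::zeros(src.rows - 2, src.cols - 2, CV_8UC1);
        for (int i = 1; i < src.rows - 1; i++) {
            for (int j = 1; j < src.cols - 1; j++) {
                uchar center = src.at<uchar>(i, j);
                unsigned char code = 0;
                // s(i_p - i_c): one bit per neighbor, 1 if neighbor >= center
                code |= (src.at<uchar>(i - 1, j - 1) >= center) << 7;
                code |= (src.at<uchar>(i - 1, j    ) >= center) << 6;
                code |= (src.at<uchar>(i - 1, j + 1) >= center) << 5;
                code |= (src.at<uchar>(i,     j + 1) >= center) << 4;
                code |= (src.at<uchar>(i + 1, j + 1) >= center) << 3;
                code |= (src.at<uchar>(i + 1, j    ) >= center) << 2;
                code |= (src.at<uchar>(i + 1, j - 1) >= center) << 1;
                code |= (src.at<uchar>(i,     j - 1) >= center) << 0;
                dst.at<uchar>(i - 1, j - 1) = code;
            }
        }
        return dst;
    }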
This description enables you to capture very fine-grained details in images. In fact the authors were able to compete with state-of-the-art results for texture classification. Soon after the operator was published it was noted that a fixed neighborhood fails to encode details differing in scale. So the operator was extended to use a variable neighborhood in [AHP04]_. The idea is to align an arbitrary number of neighbors on a circle with a variable radius, which enables capturing the following neighborhoods:
.. image:: img/lbp/patterns.png
:scale: 80%
:align: center
For a given point :math:`(x_c,y_c)`, the position of the neighbor :math:`(x_p,y_p)`, :math:`p \in \{0, \ldots, P-1\}`, can be calculated by:
.. math::
:nowrap:
\begin{align*}
x_{p} & = & x_c + R \cos({\frac{2\pi p}{P}})\\
y_{p} & = & y_c - R \sin({\frac{2\pi p}{P}})
\end{align*}
Where :math:`R` is the radius of the circle and :math:`P` is the number of sample points.
The operator is an extension to the original LBP codes, so it's sometimes called *Extended LBP* (also referred to as *Circular LBP*). If a point's coordinate on the circle doesn't correspond to image coordinates, the point gets interpolated. Computer science has a bunch of clever interpolation schemes; the OpenCV implementation uses bilinear interpolation:
.. math::
:nowrap:
\begin{align*}
f(x,y) \approx \begin{bmatrix}
1-x & x \end{bmatrix} \begin{bmatrix}
f(0,0) & f(0,1) \\
f(1,0) & f(1,1) \end{bmatrix} \begin{bmatrix}
1-y \\
y \end{bmatrix}.
\end{align*}
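Putting the two formulas together, here is a minimal sketch of sampling one circular neighbor with bilinear interpolation (an illustrative helper, assuming a single-channel 8-bit image and omitting bounds checking):

.. code-block:: cpp

    // Sketch: sample the p-th of P circular neighbors of (xc, yc) at radius R,
    // bilinearly interpolating when the point falls between pixels.
    #include <opencv2/core/core.hpp>
    #include <cmath>

    double sampleNeighbor(const cv::Mat& src, int xc, int yc,
                          double R, int P, int p)
    {
        // neighbor position on the circle (note the minus sign for y)
        double x = xc + R * std::cos(2.0 * CV_PI * p / P);
        double y = yc - R * std::sin(2.0 * CV_PI * p / P);
        // relative position inside the enclosing pixel cell
        int fx = (int)std::floor(x), fy = (int)std::floor(y);
        double tx = x - fx, ty = y - fy;
        // bilinear interpolation of the four surrounding pixels
        return (1 - tx) * (1 - ty) * src.at<uchar>(fy,     fx)
             +      tx  * (1 - ty) * src.at<uchar>(fy,     fx + 1)
             + (1 - tx) *      ty  * src.at<uchar>(fy + 1, fx)
             +      tx  *      ty  * src.at<uchar>(fy + 1, fx + 1);
    }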
By definition the LBP operator is robust against monotonic gray scale transformations. We can easily verify this by looking at the LBP image of an artificially modified image (so you see what an LBP image looks like!):
.. image:: img/lbp/lbp_yale.jpg
:scale: 60%
:align: center
So what's left to do is to incorporate the spatial information into the face recognition model. The representation proposed by Ahonen et al. [AHP04]_ is to divide the LBP image into :math:`m` local regions and extract a histogram from each. The spatially enhanced feature vector is then obtained by concatenating the local histograms (**not merging them**). These histograms are called *Local Binary Patterns Histograms*.
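A sketch of that spatially enhanced histogram (a hypothetical helper; OpenCV's LBPH implementation follows the same idea with a configurable grid):

.. code-block:: cpp

    // Sketch: divide an 8-bit LBP image into gridX x gridY cells, compute a
    // 256-bin histogram per cell and concatenate them into one feature vector.
    #include <opencv2/core/core.hpp>
    #include <opencv2/imgproc/imgproc.hpp>

    cv::Mat spatialHistogram(const cv::Mat& lbpImage, int gridX, int gridY)
    {
        int cellW = lbpImage.cols / gridX, cellH = lbpImage.rows / gridY;
        cv::Mat result;
        for (int gy = 0; gy < gridY; gy++) {
            for (int gx = 0; gx < gridX; gx++) {
                cv::Mat cell = lbpImage(cv::Rect(gx * cellW, gy * cellH,
                                                 cellW, cellH));
                int histSize = 256;
                float range[] = { 0, 256 };
                const float* ranges[] = { range };
                cv::Mat hist;
                cv::calcHist(&cell, 1, 0, cv::Mat(), hist, 1,
                             &histSize, ranges);
                // concatenate (not merge) the local histograms
                result.push_back(hist.reshape(1, 1));
            }
        }
        return result.reshape(1, 1); // one long row vector
    }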
Local Binary Patterns Histograms in OpenCV
------------------------------------------
.. literalinclude:: src/facerec_lbph.cpp
:language: cpp
:linenos:
The source code for this demo application is also available in the ``src`` folder coming with this documentation:
* :download:`src/facerec_lbph.cpp <src/facerec_lbph.cpp>`
Conclusion
==========
You've learned how to use the new :ocv:class:`FaceRecognizer` in real applications. After reading the document you also know how the algorithms work, so now it's time for you to experiment with the available algorithms. Use them, improve them and let the OpenCV community participate!
Credits
=======
This document wouldn't be possible without the kind permission to use the face images of the *AT&T Database of Faces* and the *Yale Facedatabase A/B*.
The Database of Faces
---------------------
**Important: when using these images, please give credit to "AT&T Laboratories, Cambridge."**
The Database of Faces, formerly *The ORL Database of Faces*, contains a set of face images taken between April 1992 and April 1994. The database was used in the context of a face recognition project carried out in collaboration with the Speech, Vision and Robotics Group of the Cambridge University Engineering Department.
There are ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement).
The files are in PGM format. The size of each image is 92x112 pixels, with 256 grey levels per pixel. The images are organised in 40 directories (one for each subject), which have names of the form sX, where X indicates the subject number (between 1 and 40). In each of these directories, there are ten different images of that subject, which have names of the form Y.pgm, where Y is the image number for that subject (between 1 and 10).
A copy of the database can be retrieved from: `http://www.cl.cam.ac.uk/research/dtg/attarchive/pub/data/att_faces.zip <http://www.cl.cam.ac.uk/research/dtg/attarchive/pub/data/att_faces.zip>`_.
Yale Facedatabase A
-------------------
*With the permission of the authors I am allowed to show a small number of images (say subject 1 and all the variations) and all images such as Fisherfaces and Eigenfaces from either Yale Facedatabase A or the Yale Facedatabase B.*
The Yale Face Database A (size 6.4MB) contains 165 grayscale images in GIF format of 15 individuals. There are 11 images per subject, one per different facial expression or configuration: center-light, w/glasses, happy, left-light, w/no glasses, normal, right-light, sad, sleepy, surprised, and wink. (Source: `http://cvc.yale.edu/projects/yalefaces/yalefaces.html <http://cvc.yale.edu/projects/yalefaces/yalefaces.html>`_)
Yale Facedatabase B
--------------------
*With the permission of the authors I am allowed to show a small number of images (say subject 1 and all the variations) and all images such as Fisherfaces and Eigenfaces from either Yale Facedatabase A or the Yale Facedatabase B.*
The extended Yale Face Database B contains 16128 images of 28 human subjects under 9 poses and 64 illumination conditions. The data format of this database is the same as the Yale Face Database B. Please refer to the homepage of the Yale Face Database B (or one copy of this page) for more detailed information on the data format.
You are free to use the extended Yale Face Database B for research purposes. All publications which use this database should acknowledge the use of "the Extended Yale Face Database B" and reference Athinodoros Georghiades, Peter Belhumeur, and David Kriegman's paper, "From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose", PAMI, 2001, `[bibtex] <http://vision.ucsd.edu/~leekc/ExtYaleDatabase/athosref.html>`_.
The extended database as opposed to the original Yale Face Database B with 10 subjects was first reported by Kuang-Chih Lee, Jeffrey Ho, and David Kriegman in "Acquiring Linear Subspaces for Face Recognition under Variable Lighting, PAMI, May, 2005 `[pdf] <http://vision.ucsd.edu/~leekc/papers/9pltsIEEE.pdf>`_." All test image data used in the experiments are manually aligned, cropped, and then re-sized to 168x192 images. If you publish your experimental results with the cropped images, please reference the PAMI2005 paper as well. (Source: `http://vision.ucsd.edu/~leekc/ExtYaleDatabase/ExtYaleB.html <http://vision.ucsd.edu/~leekc/ExtYaleDatabase/ExtYaleB.html>`_)
Literature
==========
.. [AHP04] Ahonen, T., Hadid, A., and Pietikainen, M. *Face Recognition with Local Binary Patterns.* Computer Vision - ECCV 2004 (2004), 469–481.
.. [BHK97] Belhumeur, P. N., Hespanha, J., and Kriegman, D. *Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection.* IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 7 (1997), 711–720.
.. [Bru92] Brunelli, R., Poggio, T. *Face Recognition through Geometrical Features.* European Conference on Computer Vision (ECCV) 1992, pp. 792–800.
.. [Duda01] Duda, Richard O. and Hart, Peter E. and Stork, David G., *Pattern Classification* (2nd Edition) 2001.
.. [Fisher36] Fisher, R. A. *The use of multiple measurements in taxonomic problems.* Annals Eugen. 7 (1936), 179–188.
.. [GBK01] Georghiades, A.S. and Belhumeur, P.N. and Kriegman, D.J., *From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose* IEEE Transactions on Pattern Analysis and Machine Intelligence 23, 6 (2001), 643-660.
.. [Kanade73] Kanade, T. *Picture processing system by computer complex and recognition of human faces.* PhD thesis, Kyoto University, November 1973.
.. [KM01] Martinez, A and Kak, A. *PCA versus LDA* IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 23, No.2, pp. 228-233, 2001.
.. [Lee05] Lee, K., Ho, J., Kriegman, D. *Acquiring Linear Subspaces for Face Recognition under Variable Lighting.* IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) 27 (2005), No. 5.
.. [Messer06] Messer, K. et al. *Performance Characterisation of Face Recognition Algorithms and Their Sensitivity to Severe Illumination Changes.* In: ICB, 2006, pp. 1–11.
.. [RJ91] S. Raudys and A.K. Jain. *Small sample size effects in statistical pattern recognition: Recommendations for practitioners.* IEEE Transactions on Pattern Analysis and Machine Intelligence 13, 3 (1991), 252–264.
.. [Tan10] Tan, X., and Triggs, B. *Enhanced local texture feature sets for face recognition under difficult lighting conditions.* IEEE Transactions on Image Processing 19 (2010), 1635–1650.
.. [TP91] Turk, M., and Pentland, A. *Eigenfaces for recognition.* Journal of Cognitive Neuroscience 3 (1991), 71–86.
.. [Tu06] Chiara Turati, Viola Macchi Cassia, F. S., and Leo, I. *Newborns' face recognition: Role of inner and outer facial features.* Child Development 77, 2 (2006), 297–311.
.. [Wiskott97] Wiskott, L., Fellous, J., Krüger, N., Malsburg, C. *Face Recognition By Elastic Bunch Graph Matching.* IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (1997), pp. 775–779.
.. [Zhao03] Zhao, W., Chellappa, R., Phillips, P., and Rosenfeld, A. *Face recognition: A literature survey.* ACM Computing Surveys (CSUR) 35, 4 (2003), 399–458.
.. _appendixft:
Appendix
========
Creating the CSV File
---------------------
You don't really want to create the CSV file by hand. I have prepared a little Python script ``create_csv.py`` (you'll find it at ``/src/create_csv.py`` coming with this tutorial) that automatically creates the CSV file for you. If you have your images in a hierarchy like this (``/basepath/<subject>/<image.ext>``):
.. code-block:: none
philipp@mango:~/facerec/data/at$ tree
.
|-- s1
| |-- 1.pgm
| |-- ...
| |-- 10.pgm
|-- s2
| |-- 1.pgm
| |-- ...
| |-- 10.pgm
...
|-- s40
| |-- 1.pgm
| |-- ...
| |-- 10.pgm
Then simply call ``create_csv.py`` with the path to the folder, just like this (you can redirect the output to a file):
.. code-block:: none
philipp@mango:~/facerec/data$ python create_csv.py
at/s13/2.pgm;0
at/s13/7.pgm;0
at/s13/6.pgm;0
at/s13/9.pgm;0
at/s13/5.pgm;0
at/s13/3.pgm;0
at/s13/4.pgm;0
at/s13/10.pgm;0
at/s13/8.pgm;0
at/s13/1.pgm;0
at/s17/2.pgm;1
at/s17/7.pgm;1
at/s17/6.pgm;1
at/s17/9.pgm;1
at/s17/5.pgm;1
at/s17/3.pgm;1
[...]
Here is the script, if you can't find it:
.. literalinclude:: ./src/create_csv.py
:language: python
:linenos:
Aligning Face Images
---------------------
An accurate alignment of your image data is especially important in tasks like emotion detection, where you need as much detail as possible. Believe me... You don't want to do this by hand. So I've prepared a tiny Python script for you. The code is really easy to use. To scale, rotate and crop the face image you just need to call *CropFace(image, eye_left, eye_right, offset_pct, dest_sz)*, where:
* *eye_left* is the position of the left eye
* *eye_right* is the position of the right eye
* *offset_pct* is the percent of the image you want to keep next to the eyes (horizontal, vertical direction)
* *dest_sz* is the size of the output image
If you are using the same *offset_pct* and *dest_sz* for your images, they are all aligned at the eyes.
.. literalinclude:: ./src/crop_face.py
:language: python
:linenos:
Imagine we are given `this photo of Arnold Schwarzenegger <http://en.wikipedia.org/wiki/File:Arnold_Schwarzenegger_edit%28ws%29.jpg>`_, which is under a Public Domain license. The (x,y)-position of the eyes is approximately *(252,364)* for the left and *(420,366)* for the right eye. Now you only need to define the horizontal offset, vertical offset and the size your scaled, rotated & cropped face should have.
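Under the hood the alignment boils down to a rotation given by the slope of the eye line and a scale given by the eye distance. A small sketch of that geometry (an illustration of what *CropFace* computes, with hypothetical names):

.. code-block:: cpp

    // Sketch: derive the rotation angle and scale factor from the two eye
    // positions, e.g. (252,364) and (420,366) for the photo above.
    #include <cmath>

    struct Alignment { double angleRad; double scale; };

    Alignment alignmentFromEyes(double lx, double ly, double rx, double ry,
                                double offsetPctX, int destW)
    {
        double dx = rx - lx, dy = ry - ly;
        // rotate so that both eyes end up on a horizontal line
        double angle = std::atan2(dy, dx);
        // the eyes should be destW * (1 - 2 * offsetPctX) pixels apart
        // in the output image
        double desiredDist = destW * (1.0 - 2.0 * offsetPctX);
        double scale = desiredDist / std::sqrt(dx * dx + dy * dy);
        Alignment a = { angle, scale };
        return a;
    }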
Here are some examples:
+---------------------------------+----------------------------------------------------------------------------+
| Configuration | Cropped, Scaled, Rotated Face |
+=================================+============================================================================+
| 0.1 (10%), 0.1 (10%), (200,200) | .. image:: ./img/tutorial/gender_classification/arnie_10_10_200_200.jpg |
+---------------------------------+----------------------------------------------------------------------------+
| 0.2 (20%), 0.2 (20%), (200,200) | .. image:: ./img/tutorial/gender_classification/arnie_20_20_200_200.jpg |
+---------------------------------+----------------------------------------------------------------------------+
| 0.3 (30%), 0.3 (30%), (200,200) | .. image:: ./img/tutorial/gender_classification/arnie_30_30_200_200.jpg |
+---------------------------------+----------------------------------------------------------------------------+
| 0.2 (20%), 0.2 (20%), (70,70) | .. image:: ./img/tutorial/gender_classification/arnie_20_20_70_70.jpg |
+---------------------------------+----------------------------------------------------------------------------+
CSV for the AT&T Facedatabase
------------------------------
.. literalinclude:: etc/at.txt
:language: none
:linenos:
@ -1,31 +0,0 @@
FaceRecognizer - Face Recognition with OpenCV
##############################################
OpenCV 2.4 now comes with the very new :ocv:class:`FaceRecognizer` class for face recognition. This documentation explains :doc:`the API <facerec_api>` in detail and gives you a lot of help to get started (full source code examples). :doc:`Face Recognition with OpenCV <facerec_tutorial>` is the definitive guide to the new :ocv:class:`FaceRecognizer`. There's also a :doc:`tutorial on gender classification <tutorial/facerec_gender_classification>`, a :doc:`tutorial for face recognition in videos <tutorial/facerec_video_recognition>` and it's shown :doc:`how to load & save your results <tutorial/facerec_save_load>`.
These documents are the help I wished for when I was working my way into face recognition. I hope you also think the new :ocv:class:`FaceRecognizer` is a useful addition to OpenCV.
Please issue any feature requests and/or bugs on the official OpenCV bug tracker at:
* http://code.opencv.org/projects/opencv/issues
Contents
========
.. toctree::
:maxdepth: 1
FaceRecognizer API <facerec_api>
Guide to Face Recognition with OpenCV <facerec_tutorial>
Tutorial on Gender Classification <tutorial/facerec_gender_classification>
Tutorial on Face Recognition in Videos <tutorial/facerec_video_recognition>
Tutorial On Saving & Loading a FaceRecognizer <tutorial/facerec_save_load>
Changelog <facerec_changelog>
Indices and tables
==================
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
@ -1,233 +0,0 @@
Gender Classification with OpenCV
=================================
.. contents:: Table of Contents
:depth: 3
Introduction
------------
A lot of people interested in face recognition also want to know how to perform image classification tasks like:
* Gender Classification (Gender Detection)
* Emotion Classification (Emotion Detection)
* Glasses Classification (Glasses Detection)
* ...
This has become very, very easy with the new :ocv:class:`FaceRecognizer` class. In this tutorial I'll show you how to perform gender classification with OpenCV on a set of face images. You'll also learn how to align your images to enhance the recognition results. If you want to do emotion classification instead of gender classification, all you need to do is update your training data and the configuration you pass to the demo.
Prerequisites
--------------
For gender classification of faces, you'll need some images of male and female faces first. I've decided to search for faces of celebrities using `Google Images <http://www.google.com/images>`_ with the faces filter turned on (my god, they have great algorithms at `Google <http://www.google.com>`_!). My database has 8 male and 5 female subjects, each with 10 images. Here are the names, if you don't know whom to search for:
* Angelina Jolie
* Arnold Schwarzenegger
* Brad Pitt
* Emma Watson
* George Clooney
* Jennifer Lopez
* Johnny Depp
* Justin Timberlake
* Katy Perry
* Keanu Reeves
* Naomi Watts
* Patrick Stewart
* Tom Cruise
Once you have acquired some images, you'll need to read them. In the demo application I have decided to read the images from a very simple CSV file. Why? Because it's the simplest platform-independent approach I can think of. However, if you know a simpler solution please ping me about it. Basically, all the CSV file needs to contain are lines composed of a ``filename`` followed by a ``;`` followed by the ``label`` (as an *integer number*), making up a line like this:
.. code-block:: none
/path/to/image.ext;0
Let's dissect the line. ``/path/to/image.ext`` is the path to an image, probably something like this if you are in Windows: ``C:/faces/person0/image0.jpg``. Then there is the separator ``;`` and finally we assign a label ``0`` to the image. Think of the label as the subject (the person, the gender or whatever comes to your mind). In the gender classification scenario, the label is the gender the person has. I'll give the label ``0`` to *male* persons and the label ``1`` to *female* subjects. So my CSV file looks like this:
.. code-block:: none
/home/philipp/facerec/data/gender/male/keanu_reeves/keanu_reeves_01.jpg;0
/home/philipp/facerec/data/gender/male/keanu_reeves/keanu_reeves_02.jpg;0
/home/philipp/facerec/data/gender/male/keanu_reeves/keanu_reeves_03.jpg;0
...
/home/philipp/facerec/data/gender/female/katy_perry/katy_perry_01.jpg;1
/home/philipp/facerec/data/gender/female/katy_perry/katy_perry_02.jpg;1
/home/philipp/facerec/data/gender/female/katy_perry/katy_perry_03.jpg;1
...
/home/philipp/facerec/data/gender/male/brad_pitt/brad_pitt_01.jpg;0
/home/philipp/facerec/data/gender/male/brad_pitt/brad_pitt_02.jpg;0
/home/philipp/facerec/data/gender/male/brad_pitt/brad_pitt_03.jpg;0
...
/home/philipp/facerec/data/gender/female/emma_watson/emma_watson_08.jpg;1
/home/philipp/facerec/data/gender/female/emma_watson/emma_watson_02.jpg;1
/home/philipp/facerec/data/gender/female/emma_watson/emma_watson_03.jpg;1
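Reading this format needs only a few lines of C++. Here is a minimal sketch (it mirrors the ``read_csv`` helper used in the demo source):

.. code-block:: cpp

    // Sketch: parse "filename;label" lines into image and label vectors.
    #include <opencv2/core/core.hpp>
    #include <opencv2/highgui/highgui.hpp> // cv::imread in OpenCV 2.4
    #include <cstdlib>
    #include <fstream>
    #include <sstream>
    #include <string>
    #include <vector>

    void readCsv(const std::string& filename, std::vector<cv::Mat>& images,
                 std::vector<int>& labels, char separator = ';')
    {
        std::ifstream file(filename.c_str());
        std::string line, path, classlabel;
        while (std::getline(file, line)) {
            std::stringstream liness(line);
            std::getline(liness, path, separator); // everything before ';'
            std::getline(liness, classlabel);      // the integer label
            if (!path.empty() && !classlabel.empty()) {
                images.push_back(cv::imread(path, 0)); // read as grayscale
                labels.push_back(std::atoi(classlabel.c_str()));
            }
        }
    }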
All images for this example were chosen to have a frontal face perspective. They have been cropped, scaled and rotated to be aligned at the eyes, just like this set of George Clooney images:
.. image:: ../img/tutorial/gender_classification/clooney_set.png
:align: center
You really don't want to create the CSV file by hand. And you really don't want to scale, rotate & translate the images manually. I have prepared two Python scripts for you, ``create_csv.py`` and ``crop_face.py``; you can find them in the ``src`` folder coming with this documentation. You'll see how to use them in the :ref:`appendixfgc`.
Fisherfaces for Gender Classification
--------------------------------------
If you want to decide whether a person is *male* or *female*, you have to learn the discriminative features of both classes. The Eigenfaces method is based on Principal Component Analysis, which is an unsupervised statistical model and not suitable for this task. Please see the Face Recognition tutorial for insights into the algorithms. The Fisherfaces method instead yields a class-specific linear projection, so it is much better suited for the gender classification task. `http://www.bytefish.de/blog/gender_classification <http://www.bytefish.de/blog/gender_classification>`_ shows the recognition rate of the Fisherfaces method for gender classification.
The Fisherfaces method achieves a 98% recognition rate in a subject-independent cross-validation. A subject-independent cross-validation means *images of the person under test are never used for learning the model*. And could you believe it: you can simply use the facerec_fisherfaces demo that's included in OpenCV.
Fisherfaces in OpenCV
---------------------
The source code for this demo application is also available in the ``src`` folder coming with this documentation:
* :download:`src/facerec_fisherfaces.cpp <../src/facerec_fisherfaces.cpp>`
.. literalinclude:: ../src/facerec_fisherfaces.cpp
:language: cpp
:linenos:
Running the Demo
----------------
If you are in Windows, then simply start the demo by running (from command line):
.. code-block:: none
facerec_fisherfaces.exe C:/path/to/your/csv.ext
If you are in Linux, then simply start the demo by running:
.. code-block:: none
./facerec_fisherfaces /path/to/your/csv.ext
If you don't want to display the images, but save them, then pass the desired path to the demo. It works like this in Windows:
.. code-block:: none
facerec_fisherfaces.exe C:/path/to/your/csv.ext C:/path/to/store/results/at
And in Linux:
.. code-block:: none
./facerec_fisherfaces /path/to/your/csv.ext /path/to/store/results/at
Results
-------
If you run the program with your CSV file as parameter, you'll see the Fisherface that separates between male and female images. I've decided to apply a Jet colormap in this demo, so you can see which features the method identifies:
.. image:: ../img/tutorial/gender_classification/fisherface_0.png
The demo also shows the average face of the male and female training images you have passed:
.. image:: ../img/tutorial/gender_classification/mean.png
Moreover, the demo should yield the prediction for the correct gender:
.. code-block:: none
Predicted class = 1 / Actual class = 1.
And for advanced users I have also shown the eigenvalue for the Fisherface:
.. code-block:: none
Eigenvalue #0 = 152.49493
And the Fisherfaces reconstruction:
.. image:: ../img/tutorial/gender_classification/fisherface_reconstruction_0.png
I hope this gives you an idea of how to approach gender classification and the other image classification tasks.
.. _appendixfgc:
Appendix
--------
Creating the CSV File
+++++++++++++++++++++
You don't really want to create the CSV file by hand. I have prepared a little Python script ``create_csv.py`` (you'll find it at ``/src/create_csv.py`` coming with this tutorial) that automatically creates the CSV file for you. If you have your images in a hierarchy like this (``/basepath/<subject>/<image.ext>``):
.. code-block:: none
philipp@mango:~/facerec/data/at$ tree
.
|-- s1
| |-- 1.pgm
| |-- ...
| |-- 10.pgm
|-- s2
| |-- 1.pgm
| |-- ...
| |-- 10.pgm
...
|-- s40
| |-- 1.pgm
| |-- ...
| |-- 10.pgm
Then simply call ``create_csv.py`` with the path to the folder, just like this (you can redirect the output to a file):
.. code-block:: none
philipp@mango:~/facerec/data$ python create_csv.py
at/s13/2.pgm;0
at/s13/7.pgm;0
at/s13/6.pgm;0
at/s13/9.pgm;0
at/s13/5.pgm;0
at/s13/3.pgm;0
at/s13/4.pgm;0
at/s13/10.pgm;0
at/s13/8.pgm;0
at/s13/1.pgm;0
at/s17/2.pgm;1
at/s17/7.pgm;1
at/s17/6.pgm;1
at/s17/9.pgm;1
at/s17/5.pgm;1
at/s17/3.pgm;1
[...]
Here is the script, if you can't find it:
.. literalinclude:: ../src/create_csv.py
:language: python
:linenos:
Aligning Face Images
++++++++++++++++++++
An accurate alignment of your image data is especially important in tasks like emotion detection, where you need as much detail as possible. Believe me... You don't want to do this by hand. So I've prepared a tiny Python script for you. The code is really easy to use. To scale, rotate and crop the face image you just need to call *CropFace(image, eye_left, eye_right, offset_pct, dest_sz)*, where:
* *eye_left* is the position of the left eye
* *eye_right* is the position of the right eye
* *offset_pct* is the percent of the image you want to keep next to the eyes (horizontal, vertical direction)
* *dest_sz* is the size of the output image
If you are using the same *offset_pct* and *dest_sz* for your images, they are all aligned at the eyes.
.. literalinclude:: ../src/crop_face.py
:language: python
:linenos:
Imagine we are given `this photo of Arnold Schwarzenegger <http://en.wikipedia.org/wiki/File:Arnold_Schwarzenegger_edit%28ws%29.jpg>`_, which is under a Public Domain license. The (x,y)-position of the eyes is approximately *(252,364)* for the left and *(420,366)* for the right eye. Now you only need to define the horizontal offset, vertical offset and the size your scaled, rotated & cropped face should have.
Here are some examples:
+---------------------------------+----------------------------------------------------------------------------+
| Configuration | Cropped, Scaled, Rotated Face |
+=================================+============================================================================+
| 0.1 (10%), 0.1 (10%), (200,200) | .. image:: ../img/tutorial/gender_classification/arnie_10_10_200_200.jpg |
+---------------------------------+----------------------------------------------------------------------------+
| 0.2 (20%), 0.2 (20%), (200,200) | .. image:: ../img/tutorial/gender_classification/arnie_20_20_200_200.jpg |
+---------------------------------+----------------------------------------------------------------------------+
| 0.3 (30%), 0.3 (30%), (200,200) | .. image:: ../img/tutorial/gender_classification/arnie_30_30_200_200.jpg |
+---------------------------------+----------------------------------------------------------------------------+
| 0.2 (20%), 0.2 (20%), (70,70) | .. image:: ../img/tutorial/gender_classification/arnie_20_20_70_70.jpg |
+---------------------------------+----------------------------------------------------------------------------+
@ -1,46 +0,0 @@
Saving and Loading a FaceRecognizer
===================================
Introduction
------------
Saving and loading a :ocv:class:`FaceRecognizer` is very important. Training a FaceRecognizer can be a very time-intensive task, plus it's often impossible to ship the whole face database to the user of your product. The task of saving and loading is easy with :ocv:class:`FaceRecognizer`: you only have to call :ocv:func:`FaceRecognizer::load` for loading and :ocv:func:`FaceRecognizer::save` for saving a :ocv:class:`FaceRecognizer`.
I'll adapt the Eigenfaces example from the :doc:`../facerec_tutorial`: Imagine we want to learn the Eigenfaces of the `AT&T Facedatabase <http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html>`_, store the model to a YAML file and then load it again.
From the loaded model, we'll get a prediction, show the mean, Eigenfaces and the image reconstruction.
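Before diving into the full demo, here is the save/load round trip in a nutshell (a sketch assuming the OpenCV 2.4 contrib headers; ``images``, ``labels`` and ``testSample`` are assumed to be prepared as in the Eigenfaces demo):

.. code-block:: cpp

    #include <opencv2/contrib/contrib.hpp> // cv::FaceRecognizer
    #include <vector>

    void saveAndLoadDemo(const std::vector<cv::Mat>& images,
                         const std::vector<int>& labels,
                         const cv::Mat& testSample)
    {
        // train and persist the complete model state to a YAML file
        cv::Ptr<cv::FaceRecognizer> model = cv::createEigenFaceRecognizer();
        model->train(images, labels);
        model->save("eigenfaces_at.yml");

        // later (possibly in another process): restore the state and predict
        cv::Ptr<cv::FaceRecognizer> loaded = cv::createEigenFaceRecognizer();
        loaded->load("eigenfaces_at.yml");
        int predicted = loaded->predict(testSample);
        (void)predicted;
    }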
Using FaceRecognizer::save and FaceRecognizer::load
-----------------------------------------------------
The source code for this demo application is also available in the ``src`` folder coming with this documentation:
* :download:`src/facerec_save_load.cpp <../src/facerec_save_load.cpp>`
.. literalinclude:: ../src/facerec_save_load.cpp
:language: cpp
:linenos:
Results
-------
``eigenfaces_at.yml`` then contains the model state; we'll simply look at the first 10 lines with ``head eigenfaces_at.yml``:
.. code-block:: none
philipp@mango:~/github/libfacerec-build$ head eigenfaces_at.yml
%YAML:1.0
num_components: 399
mean: !!opencv-matrix
rows: 1
cols: 10304
dt: d
data: [ 8.5558897243107765e+01, 8.5511278195488714e+01,
8.5854636591478695e+01, 8.5796992481203006e+01,
8.5952380952380949e+01, 8.6162907268170414e+01,
8.6082706766917283e+01, 8.5776942355889716e+01,
And here is the reconstruction, which is the same as the original:
.. image:: ../img/eigenface_reconstruction_opencv.png
:align: center
@ -1,207 +0,0 @@
Face Recognition in Videos with OpenCV
=======================================
.. contents:: Table of Contents
:depth: 3
Introduction
------------
Whenever you hear the term *face recognition*, you instantly think of surveillance in videos. So performing face recognition in videos (e.g. webcam) is one of the most requested features I have got. I have heard your cries, so here it is. An application, that shows you how to do face recognition in videos! For the face detection part we'll use the awesome :ocv:class:`CascadeClassifier` and we'll use :ocv:class:`FaceRecognizer` for face recognition. This example uses the Fisherfaces method for face recognition, because it is robust against large changes in illumination.
Here is what the final application looks like. As you can see I am only writing the id of the recognized person above the detected face (by the way, this id stands for Arnold Schwarzenegger in my data set):
.. image:: ../img/tutorial/facerec_video/facerec_video.png
:align: center
:scale: 70%
This demo is a basis for your research and it shows you how to implement face recognition in videos. You probably want to extend the application and make it more sophisticated: you could combine the id with the name, then show the confidence of the prediction, recognize the emotion... and so on. But before you send mails asking what this Haar-Cascade thing is or what a CSV is: make sure you have read the entire tutorial. It's all explained in here. If you just want to scroll down to the code, please note:
* The available Haar-Cascades for face detection are located in the ``data`` folder of your OpenCV installation! One of the available Haar-Cascades for face detection is for example ``/path/to/opencv/data/haarcascades/haarcascade_frontalface_default.xml``.
I encourage you to experiment with the application. Play around with the available :ocv:class:`FaceRecognizer` implementations, try the available cascades in OpenCV and see if you can improve your results!
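To give you the idea before the full listing below, the heart of the application is a simple detect-then-predict loop (a sketch; ``haar_cascade`` and ``model`` are assumed to be loaded and trained already, with ``im_width``/``im_height`` matching the training image size):

.. code-block:: cpp

    #include <opencv2/opencv.hpp>
    #include <opencv2/contrib/contrib.hpp> // cv::FaceRecognizer
    #include <vector>

    void recognitionLoop(cv::VideoCapture& cap,
                         cv::CascadeClassifier& haar_cascade,
                         cv::Ptr<cv::FaceRecognizer> model,
                         int im_width, int im_height)
    {
        cv::Mat frame, gray;
        for (;;) {
            cap >> frame;
            cv::cvtColor(frame, gray, CV_BGR2GRAY);
            std::vector<cv::Rect> faces;
            haar_cascade.detectMultiScale(gray, faces);
            for (size_t i = 0; i < faces.size(); i++) {
                // Fisherfaces needs inputs at the training image size
                cv::Mat face_resized;
                cv::resize(gray(faces[i]), face_resized,
                           cv::Size(im_width, im_height));
                int prediction = model->predict(face_resized);
                cv::rectangle(frame, faces[i], CV_RGB(0, 255, 0), 1);
                cv::putText(frame, cv::format("Prediction = %d", prediction),
                            cv::Point(faces[i].x, faces[i].y - 10),
                            cv::FONT_HERSHEY_PLAIN, 1.0, CV_RGB(0, 255, 0), 2);
            }
            cv::imshow("face_recognizer", frame);
            if ((char)cv::waitKey(20) == 27) break; // ESC quits
        }
    }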
Prerequisites
--------------
You want to do face recognition, so you need some face images to learn a :ocv:class:`FaceRecognizer` on. I have decided to reuse the images from the gender classification example: :doc:`facerec_gender_classification`.
I have the following celebrities in my training data set:
* Angelina Jolie
* Arnold Schwarzenegger
* Brad Pitt
* George Clooney
* Johnny Depp
* Justin Timberlake
* Katy Perry
* Keanu Reeves
* Patrick Stewart
* Tom Cruise
In the demo I have decided to read the images from a very simple CSV file. Why? Because it's the simplest platform-independent approach I can think of. However, if you know a simpler solution please ping me about it. Basically, all the CSV file needs to contain are lines composed of a ``filename`` followed by a ``;`` followed by the ``label`` (as an *integer number*), making up a line like this:
.. code-block:: none
/path/to/image.ext;0
Let's dissect the line. ``/path/to/image.ext`` is the path to an image, probably something like this if you are in Windows: ``C:/faces/person0/image0.jpg``. Then there is the separator ``;`` and finally we assign a label ``0`` to the image. Think of the label as the subject (the person, the gender or whatever comes to your mind). In the face recognition scenario, the label is the person this image belongs to. In the gender classification scenario, the label is the gender the person has. So my CSV file looks like this:
.. code-block:: none
/home/philipp/facerec/data/c/keanu_reeves/keanu_reeves_01.jpg;0
/home/philipp/facerec/data/c/keanu_reeves/keanu_reeves_02.jpg;0
/home/philipp/facerec/data/c/keanu_reeves/keanu_reeves_03.jpg;0
...
/home/philipp/facerec/data/c/katy_perry/katy_perry_01.jpg;1
/home/philipp/facerec/data/c/katy_perry/katy_perry_02.jpg;1
/home/philipp/facerec/data/c/katy_perry/katy_perry_03.jpg;1
...
/home/philipp/facerec/data/c/brad_pitt/brad_pitt_01.jpg;2
/home/philipp/facerec/data/c/brad_pitt/brad_pitt_02.jpg;2
/home/philipp/facerec/data/c/brad_pitt/brad_pitt_03.jpg;2
...
/home/philipp/facerec/data/c1/crop_arnold_schwarzenegger/crop_08.jpg;6
/home/philipp/facerec/data/c1/crop_arnold_schwarzenegger/crop_05.jpg;6
/home/philipp/facerec/data/c1/crop_arnold_schwarzenegger/crop_02.jpg;6
/home/philipp/facerec/data/c1/crop_arnold_schwarzenegger/crop_03.jpg;6
All images for this example were chosen to have a frontal face perspective. They have been cropped, scaled and rotated to be aligned at the eyes, just like this set of George Clooney images:
.. image:: ../img/tutorial/gender_classification/clooney_set.png
:align: center
Face Recognition from Videos
-----------------------------
The source code for the demo is available in the ``src`` folder coming with this documentation:
* :download:`src/facerec_video.cpp <../src/facerec_video.cpp>`
This demo uses the :ocv:class:`CascadeClassifier`:
.. literalinclude:: ../src/facerec_video.cpp
:language: cpp
:linenos:
Running the Demo
----------------
You'll need:
* The path to a valid Haar-Cascade for detecting a face with a :ocv:class:`CascadeClassifier`.
* The path to a valid CSV File for learning a :ocv:class:`FaceRecognizer`.
* A webcam and its device id (you don't know the device id? Simply start from 0 on and see what happens).
If you are in Windows, then simply start the demo by running (from command line):
.. code-block:: none
facerec_video.exe <C:/path/to/your/haar_cascade.xml> <C:/path/to/your/csv.ext> <video device>
If you are in Linux, then simply start the demo by running:
.. code-block:: none
./facerec_video </path/to/your/haar_cascade.xml> </path/to/your/csv.ext> <video device>
An example: if the Haar-Cascade is at ``C:/opencv/data/haarcascades/haarcascade_frontalface_default.xml``, the CSV file is at ``C:/facerec/data/celebrities.txt`` and I have a webcam with deviceId ``1``, then I would call the demo with:
.. code-block:: none
facerec_video.exe C:/opencv/data/haarcascades/haarcascade_frontalface_default.xml C:/facerec/data/celebrities.txt 1
That's it.
Results
-------
Enjoy!
Appendix
--------
Creating the CSV File
+++++++++++++++++++++
You don't really want to create the CSV file by hand. I have prepared a little Python script ``create_csv.py`` (you'll find it at ``/src/create_csv.py`` coming with this tutorial) that automatically creates the CSV file for you. If you have your images in a hierarchy like this (``/basepath/<subject>/<image.ext>``):
.. code-block:: none
philipp@mango:~/facerec/data/at$ tree
.
|-- s1
| |-- 1.pgm
| |-- ...
| |-- 10.pgm
|-- s2
| |-- 1.pgm
| |-- ...
| |-- 10.pgm
...
|-- s40
| |-- 1.pgm
| |-- ...
| |-- 10.pgm
Then simply call ``create_csv.py`` with the path to the folder, just like this (you can redirect the output to a file):
.. code-block:: none
philipp@mango:~/facerec/data$ python create_csv.py
at/s13/2.pgm;0
at/s13/7.pgm;0
at/s13/6.pgm;0
at/s13/9.pgm;0
at/s13/5.pgm;0
at/s13/3.pgm;0
at/s13/4.pgm;0
at/s13/10.pgm;0
at/s13/8.pgm;0
at/s13/1.pgm;0
at/s17/2.pgm;1
at/s17/7.pgm;1
at/s17/6.pgm;1
at/s17/9.pgm;1
at/s17/5.pgm;1
at/s17/3.pgm;1
[...]
Here is the script, if you can't find it:
.. literalinclude:: ../src/create_csv.py
:language: python
:linenos:
Aligning Face Images
++++++++++++++++++++
An accurate alignment of your image data is especially important in tasks like emotion detection, where you need as much detail as possible. Believe me... You don't want to do this by hand. So I've prepared a tiny Python script for you. The code is really easy to use. To scale, rotate and crop the face image you just need to call *CropFace(image, eye_left, eye_right, offset_pct, dest_sz)*, where:
* *eye_left* is the position of the left eye
* *eye_right* is the position of the right eye
* *offset_pct* is the percent of the image you want to keep next to the eyes (horizontal, vertical direction)
* *dest_sz* is the size of the output image
If you are using the same *offset_pct* and *dest_sz* for your images, they are all aligned at the eyes.
.. literalinclude:: ../src/crop_face.py
:language: python
:linenos:
Imagine we are given `this photo of Arnold Schwarzenegger <http://en.wikipedia.org/wiki/File:Arnold_Schwarzenegger_edit%28ws%29.jpg>`_, which is under a Public Domain license. The (x,y)-position of the eyes is approximately *(252,364)* for the left and *(420,366)* for the right eye. Now you only need to define the horizontal offset, vertical offset and the size your scaled, rotated & cropped face should have.
Here are some examples:
+---------------------------------+----------------------------------------------------------------------------+
| Configuration | Cropped, Scaled, Rotated Face |
+=================================+============================================================================+
| 0.1 (10%), 0.1 (10%), (200,200) | .. image:: ../img/tutorial/gender_classification/arnie_10_10_200_200.jpg |
+---------------------------------+----------------------------------------------------------------------------+
| 0.2 (20%), 0.2 (20%), (200,200) | .. image:: ../img/tutorial/gender_classification/arnie_20_20_200_200.jpg |
+---------------------------------+----------------------------------------------------------------------------+
| 0.3 (30%), 0.3 (30%), (200,200) | .. image:: ../img/tutorial/gender_classification/arnie_30_30_200_200.jpg |
+---------------------------------+----------------------------------------------------------------------------+
| 0.2 (20%), 0.2 (20%), (70,70) | .. image:: ../img/tutorial/gender_classification/arnie_20_20_70_70.jpg |
+---------------------------------+----------------------------------------------------------------------------+
@ -1,113 +0,0 @@
Latent SVM
===============================================================
Discriminatively Trained Part Based Models for Object Detection
---------------------------------------------------------------
The object detector described below has been initially proposed by
P.F. Felzenszwalb in [Felzenszwalb2010a]_. It is based on a
Dalal-Triggs detector that uses a single filter on histogram of
oriented gradients (HOG) features to represent an object category.
This detector uses a sliding window approach, where a filter is
applied at all positions and scales of an image. The first
innovation is enriching the Dalal-Triggs model using a
star-structured part-based model defined by a "root" filter
(analogous to the Dalal-Triggs filter) plus a set of parts filters
and associated deformation models. The score of one of star models
at a particular position and scale within an image is the score of
the root filter at the given location plus the sum over parts of the
maximum, over placements of that part, of the part filter score on
its location minus a deformation cost measuring the deviation of the
part from its ideal location relative to the root. Both root and
part filter scores are defined by the dot product between a filter
(a set of weights) and a subwindow of a feature pyramid computed
from the input image. Another improvement is a representation of the
class of models by a mixture of star models. The score of a mixture
model at a particular position and scale is the maximum over
components, of the score of that component model at the given
location.
The detector was dramatically sped up with the cascade algorithm
proposed by P.F. Felzenszwalb in [Felzenszwalb2010b]_. The algorithm
prunes partial hypotheses using thresholds on their scores. The basic
idea of the algorithm is to use a hierarchy of models defined by an
ordering of the original model's parts. For a model with (n+1)
parts, including the root, a sequence of (n+1) models is obtained.
The i-th model in this sequence is defined by the first i parts from
the original model. Using this hierarchy, low scoring hypotheses can be
pruned after looking at the best configuration of a subset of the parts.
Hypotheses that score high under a weak model are evaluated further using
a richer model.
In OpenCV there is a C++ implementation of Latent SVM.
.. highlight:: cpp
LSVMDetector
-----------------
.. ocv:class:: LSVMDetector
This is an abstract C++ class that provides the external user API to work with Latent SVM.
LSVMDetector::ObjectDetection
----------------------------------
.. ocv:struct:: LSVMDetector::ObjectDetection
Structure contains the detection information.
.. ocv:member:: Rect rect
bounding box for a detected object
.. ocv:member:: float score
confidence level
.. ocv:member:: int classID
class (model or detector) ID that detected the object
LSVMDetector::~LSVMDetector
-------------------------------------
Destructor.
.. ocv:function:: LSVMDetector::~LSVMDetector()
LSVMDetector::create
-----------------------
Load the trained models from the given ``.xml`` files and return a ``cv::Ptr<LSVMDetector>``.
.. ocv:function:: static cv::Ptr<LSVMDetector> LSVMDetector::create( const vector<string>& filenames, const vector<string>& classNames=vector<string>() )
    :param filenames: A set of filenames storing the trained detectors (models). Each file contains one model. See examples of such files in /opencv_extra/testdata/cv/LSVMDetector/models_VOC2007/.
:param classNames: A set of trained models names. If it's empty then the name of each model will be constructed from the name of file containing the model. E.g. the model stored in "/home/user/cat.xml" will get the name "cat".
LSVMDetector::detect
-------------------------
Find rectangular regions in the given image that are likely to contain objects of loaded classes (models)
and corresponding confidence levels.
.. ocv:function:: void LSVMDetector::detect( const Mat& image, vector<ObjectDetection>& objectDetections, float overlapThreshold=0.5f, int numThreads=-1 )
:param image: An image.
    :param objectDetections: The detections: rectangles, scores and class IDs.
:param overlapThreshold: Threshold for the non-maximum suppression algorithm.
:param numThreads: Number of threads used in parallel version of the algorithm.
LSVMDetector::getClassNames
--------------------------------
Return the class (model) names that were passed to the constructor or the ``load`` method, or were extracted from the model filenames in those methods.
.. ocv:function:: const vector<string>& LSVMDetector::getClassNames() const
LSVMDetector::getClassCount
--------------------------------
Return a count of loaded models (classes).
.. ocv:function:: size_t LSVMDetector::getClassCount() const
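A minimal usage sketch tying the API above together (the model path is a placeholder; include the latentsvm module header of your build, and qualify ``LSVMDetector`` with the namespace it lives in there):

.. code-block:: cpp

    #include <opencv2/core/core.hpp>
    #include <iostream>
    #include <string>
    #include <vector>

    void detectObjects(const cv::Mat& image)
    {
        std::vector<std::string> models;
        models.push_back("/path/to/models/cat.xml"); // one model per file
        cv::Ptr<LSVMDetector> detector = LSVMDetector::create(models);

        std::vector<LSVMDetector::ObjectDetection> detections;
        detector->detect(image, detections, 0.5f); // overlapThreshold = 0.5

        for (size_t i = 0; i < detections.size(); ++i)
            std::cout << detector->getClassNames()[detections[i].classID]
                      << ": score = " << detections[i].score << std::endl;
    }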
.. [Felzenszwalb2010a] Felzenszwalb, P. F. and Girshick, R. B. and McAllester, D. and Ramanan, D. *Object Detection with Discriminatively Trained Part Based Models*. PAMI, vol. 32, no. 9, pp. 1627-1645, September 2010
.. [Felzenszwalb2010b] Felzenszwalb, P. F. and Girshick, R. B. and McAllester, D. *Cascade Object Detection with Deformable Part Models*. CVPR 2010, pp. 2241-2248
@ -1,52 +0,0 @@
.. _LSDDetector:
Line Segments Detector
======================
Lines extraction methodology
----------------------------
The lines extraction methodology described in the following is mainly based on [EDLN]_.
The extraction starts with a Gaussian pyramid generated from the original image, downsampled N-1 times and blurred N times, to obtain N layers (one for each octave), with layer 0 corresponding to the input image. Then, from each layer (octave) in the pyramid, lines are extracted using the LSD algorithm.
Differently from the EDLine extractor used in the original article, LSD furnishes information only about a line's extremes; thus, additional information regarding the slope and equation of the line is computed via analytic methods. The number of pixels is obtained using *LineIterator*. Extracted lines are returned in the form of KeyLine objects, but since extraction is based on a method different from the one used in the *BinaryDescriptor* class, the data associated to a line's extremes in the original image and in the octave it was extracted from coincide. KeyLine's field *class_id* is used as an index to indicate the order of extraction of a line inside a single octave.
LSDDetector::createLSDDetector
------------------------------
Creates an LSDDetector object, using smart pointers.
.. ocv:function:: Ptr<LSDDetector> LSDDetector::createLSDDetector()
LSDDetector::detect
-------------------
Detect lines inside an image.
.. ocv:function:: void LSDDetector::detect( const Mat& image, std::vector<KeyLine>& keylines, int scale, int numOctaves, const Mat& mask=Mat())
.. ocv:function:: void LSDDetector::detect( const std::vector<Mat>& images, std::vector<std::vector<KeyLine> >& keylines, int scale, int numOctaves, const std::vector<Mat>& masks=std::vector<Mat>() ) const
:param image: input image
:param images: input images
:param keylines: vector or set of vectors that will store extracted lines for one or more images
:param mask: mask matrix to detect only KeyLines of interest
:param masks: vector of mask matrices to detect only KeyLines of interest from each input image
:param scale: scale factor used in pyramids generation
:param numOctaves: number of octaves inside pyramid
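A minimal usage sketch for the detector described above (assuming the module header ``opencv2/line_descriptor.hpp`` and the ``cv::line_descriptor`` namespace; parameter values are illustrative):

.. code-block:: cpp

    #include <opencv2/line_descriptor.hpp>
    #include <vector>

    void extractLines(const cv::Mat& image)
    {
        using namespace cv::line_descriptor;
        cv::Ptr<LSDDetector> lsd = LSDDetector::createLSDDetector();

        std::vector<KeyLine> keylines;
        // scale = 2 (pyramid reduction ratio), numOctaves = 1, empty mask
        lsd->detect(image, keylines, 2, 1, cv::Mat());
    }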
References
----------
.. [EDLN] Von Gioi, R. Grompone, et al. *LSD: A fast line segment detector with a false detection control*, IEEE Transactions on Pattern Analysis and Machine Intelligence 32.4 (2010): 722-732.
@ -1,223 +0,0 @@
.. _binary_descriptor:
BinaryDescriptor Class
======================
.. highlight:: cpp
The BinaryDescriptor class implements both functionalities for detection of lines and computation of their binary descriptors. The class interface is mainly based on those of classical detectors and extractors, such as Feature2d's `FeatureDetector <http://docs.opencv.org/modules/features2d/doc/common_interfaces_of_feature_detectors.html?highlight=featuredetector#featuredetector>`_ and `DescriptorExtractor <http://docs.opencv.org/modules/features2d/doc/common_interfaces_of_descriptor_extractors.html?highlight=extractor#DescriptorExtractor : public Algorithm>`_.
Retrieved information about lines is stored in *KeyLine* objects.
BinaryDescriptor::Params
-----------------------------------------------------------------------
.. ocv:struct:: BinaryDescriptor::Params
List of BinaryDescriptor parameters::
struct CV_EXPORTS_W_SIMPLE Params{
CV_WRAP Params();
/* the number of image octaves (default = 1) */
CV_PROP_RW int numOfOctave_;
/* the width of band; (default = 7) */
CV_PROP_RW int widthOfBand_;
/* image's reduction ratio in construction of Gaussian pyramids (default = 2) */
CV_PROP_RW int reductionRatio;
/* read parameters from a FileNode object and store them (struct function) */
void read( const FileNode& fn );
/* store parameters to a FileStorage object (struct function) */
void write( FileStorage& fs ) const;
};
BinaryDescriptor::BinaryDescriptor
----------------------------------
Constructor
.. ocv:function:: bool BinaryDescriptor::BinaryDescriptor( const BinaryDescriptor::Params &parameters = BinaryDescriptor::Params() )
:param parameters: configuration parameters :ocv:struct:`BinaryDescriptor::Params`
If no argument is provided, the constructor sets default values (see the comments in the code snippet in the previous section). Default values are strongly recommended.
BinaryDescriptor::getNumOfOctaves
---------------------------------
Get current number of octaves
.. ocv:function:: int BinaryDescriptor::getNumOfOctaves()
BinaryDescriptor::setNumOfOctaves
---------------------------------
Set number of octaves
.. ocv:function:: void BinaryDescriptor::setNumOfOctaves( int octaves )
:param octaves: number of octaves
BinaryDescriptor::getWidthOfBand
--------------------------------
Get current width of bands
.. ocv:function:: int BinaryDescriptor::getWidthOfBand()
BinaryDescriptor::setWidthOfBand
--------------------------------
Set width of bands
.. ocv:function:: void BinaryDescriptor::setWidthOfBand( int width )
:param width: width of bands
BinaryDescriptor::getReductionRatio
-----------------------------------
Get current reduction ratio (used in Gaussian pyramids)
.. ocv:function:: int BinaryDescriptor::getReductionRatio()
BinaryDescriptor::setReductionRatio
-----------------------------------
Set reduction ratio (used in Gaussian pyramids)
.. ocv:function:: void BinaryDescriptor::setReductionRatio( int rRatio )
:param rRatio: reduction ratio
BinaryDescriptor::createBinaryDescriptor
----------------------------------------
Create a BinaryDescriptor object with default parameters (or with the ones provided) and return a smart pointer to it
.. ocv:function:: Ptr<BinaryDescriptor> BinaryDescriptor::createBinaryDescriptor()
.. ocv:function:: Ptr<BinaryDescriptor> BinaryDescriptor::createBinaryDescriptor( Params parameters )
BinaryDescriptor::operator()
----------------------------
Define operator '()' to perform detection of KeyLines and computation of descriptors in a single call.
.. ocv:function:: void BinaryDescriptor::operator()( InputArray image, InputArray mask, vector<KeyLine>& keylines, OutputArray descriptors, bool useProvidedKeyLines=false, bool returnFloatDescr ) const
:param image: input image
:param mask: mask matrix to select which lines in KeyLines must be accepted among the ones extracted (used when *keylines* is not empty)
:param keylines: vector that contains input lines (when filled, the detection part will be skipped and input lines will be passed as input to the algorithm computing descriptors)
:param descriptors: matrix that will store final descriptors
:param useProvidedKeyLines: flag (when set to true, detection phase will be skipped and only computation of descriptors will be executed, using lines provided in *keylines*)
:param returnFloatDescr: flag (when set to true, original non-binary descriptors are returned)
BinaryDescriptor::read
----------------------
Read parameters from a FileNode object and store them
.. ocv:function:: void BinaryDescriptor::read( const FileNode& fn )
:param fn: source FileNode file
BinaryDescriptor::write
-----------------------
Store parameters to a FileStorage object
.. ocv:function:: void BinaryDescriptor::write( FileStorage& fs ) const
:param fs: output FileStorage file
BinaryDescriptor::defaultNorm
-----------------------------
Return norm mode
.. ocv:function:: int BinaryDescriptor::defaultNorm() const
BinaryDescriptor::descriptorType
--------------------------------
Return data type
.. ocv:function:: int BinaryDescriptor::descriptorType() const
BinaryDescriptor::descriptorSize
--------------------------------
Return descriptor size
.. ocv:function:: int BinaryDescriptor::descriptorSize() const
BinaryDescriptor::detect
------------------------
Performs line detection (on one or more images)
.. ocv:function:: void detect( const Mat& image, vector<KeyLine>& keylines, Mat& mask=Mat() )
.. ocv:function:: void detect( const vector<Mat>& images, vector<vector<KeyLine> >& keylines, vector<Mat>& masks=vector<Mat>() ) const
:param image: input image
:param images: input images
:param keylines: vector or set of vectors that will store extracted lines for one or more images
:param mask: mask matrix to detect only KeyLines of interest
:param masks: vector of mask matrices to detect only KeyLines of interest from each input image
BinaryDescriptor::compute
-------------------------
Performs descriptor computation (on one or more images)
.. ocv:function:: void compute( const Mat& image, vector<KeyLine>& keylines, Mat& descriptors, bool returnFloatDescr ) const
.. ocv:function:: void compute( const vector<Mat>& images, vector<vector<KeyLine> >& keylines, vector<Mat>& descriptors, bool returnFloatDescr ) const
:param image: input image
:param images: input images
:param keylines: vector or set of vectors containing lines for which descriptors must be computed
:param mask: mask to select for which lines, among the ones provided in input, descriptors must be computed
:param masks: set of masks to select for which lines, among the ones provided in input, descriptors must be computed
:param returnFloatDescr: flag (when set to true, original non-binary descriptors are returned)
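A minimal usage sketch of detection followed by descriptor computation (assuming the ``opencv2/line_descriptor.hpp`` header and the ``cv::line_descriptor`` namespace):

.. code-block:: cpp

    #include <opencv2/line_descriptor.hpp>
    #include <vector>

    void describeLines(const cv::Mat& image)
    {
        using namespace cv::line_descriptor;
        cv::Ptr<BinaryDescriptor> bd = BinaryDescriptor::createBinaryDescriptor();

        std::vector<KeyLine> keylines;
        bd->detect(image, keylines);                      // whole-image detection

        cv::Mat descriptors;                              // one row per keyline
        bd->compute(image, keylines, descriptors, false); // binary LBD descriptors
    }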
Related pages
-------------
* :ref:`line_descriptor`
* :ref:`matching`
* :ref:`drawing`
@ -1,70 +0,0 @@
.. _drawing:
Drawing Functions for Keylines and Matches
==========================================
.. highlight:: cpp
drawLineMatches
---------------
Draws the found matches of keylines from two images.
.. ocv:function:: void drawLineMatches( const Mat& img1, const std::vector<KeyLine>& keylines1, const Mat& img2, const std::vector<KeyLine>& keylines2, const std::vector<DMatch>& matches1to2, Mat& outImg, const Scalar& matchColor=Scalar::all(-1), const Scalar& singleLineColor=Scalar::all(-1), const std::vector<char>& matchesMask=std::vector<char>(), int flags=DrawLinesMatchesFlags::DEFAULT )
:param img1: first image
:param keylines1: keylines extracted from first image
:param img2: second image
:param keylines2: keylines extracted from second image
:param matches1to2: vector of matches
:param outImg: output matrix to draw on
:param matchColor: drawing color for matches (chosen randomly in case of default value)
:param singleLineColor: drawing color for keylines (chosen randomly in case of default value)
:param matchesMask: mask to indicate which matches must be drawn
:param flags: drawing flags
.. note:: If both *matchColor* and *singleLineColor* are set to their default values, the function draws matched lines and the lines connecting them with the same color
The structure of drawing flags is shown in the following:
.. code-block:: cpp
/* struct for drawing options */
struct CV_EXPORTS DrawLinesMatchesFlags
{
enum
{
DEFAULT = 0, // Output image matrix will be created (Mat::create),
// i.e. existing memory of output image may be reused.
// Two source images, matches, and single keylines
// will be drawn.
DRAW_OVER_OUTIMG = 1, // Output image matrix will not be
// created (using Mat::create). Matches will be drawn
// on existing content of output image.
NOT_DRAW_SINGLE_LINES = 2 // Single keylines will not be drawn.
};
};
..
drawKeylines
------------
Draws keylines.
.. ocv:function:: void drawKeylines( const Mat& image, const std::vector<KeyLine>& keylines, Mat& outImage, const Scalar& color=Scalar::all(-1), int flags=DrawLinesMatchesFlags::DEFAULT )
:param image: input image
:param keylines: keylines to be drawn
:param outImage: output image to draw on
    :param color: color of lines to be drawn (if set to default value, color is chosen randomly)
:param flags: drawing flags
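A small sketch showing how to overlay extracted keylines on the input image (same header and namespace assumptions as in the other line_descriptor pages; the keylines can come from *BinaryDescriptor* or *LSDDetector*):

.. code-block:: cpp

    #include <opencv2/line_descriptor.hpp>
    #include <opencv2/highgui/highgui.hpp>
    #include <vector>

    void showKeylines(const cv::Mat& image,
                      const std::vector<cv::line_descriptor::KeyLine>& keylines)
    {
        cv::Mat output;
        cv::line_descriptor::drawKeylines(image, keylines, output,
                                          cv::Scalar(0, 255, 0)); // fixed green
        cv::imshow("keylines", output);
        cv::waitKey();
    }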
Related pages
-------------
* :ref:`line_descriptor`
* :ref:`binary_descriptor`
* :ref:`matching`
@ -1,142 +0,0 @@
.. _line_descriptor:
Binary descriptors for lines extracted from an image
====================================================
.. highlight:: cpp
Introduction
------------
One of the most challenging activities in computer vision is the extraction of useful information from a given image. Such information usually comes in the form of points that preserve some kind of property (for instance, scale invariance) and are actually representative of the input image.
The goal of this module is to seek a new kind of representative information inside an image and to provide the functionalities for its extraction and representation. In particular, differently from previous methods for detection of relevant elements inside an image, lines are extracted in place of points; a new class is defined ad hoc to summarize a line's properties, for reuse and plotting purposes.
A class to represent a line: KeyLine
------------------------------------
As aforementioned, it has been necessary to design a class that fully stores the information needed to completely characterize a line and, when required, plot it on the image it was extracted from.
The *KeyLine* class has been created for this goal; it is mainly inspired by Feature2d's KeyPoint class, since KeyLine shares some of *KeyPoint*'s fields, even if some of them assume a different meaning when speaking about lines.
In particular:
* the *class_id* field is used to gather lines extracted from different octaves which refer to same line inside original image (such lines and the one they represent in original image share the same *class_id* value)
* the *angle* field represents line's slope with respect to (positive) X axis
* the *pt* field represents line's midpoint
* the *response* field is computed as the ratio between the line's length and maximum between image's width and height
* the *size* field is the area of the smallest rectangle containing line
Apart from the fields inspired by the KeyPoint class, a KeyLine stores information about the extremes of the line in the original image and in the octave it was extracted from, as well as the line's length and the number of pixels it covers. The code for the KeyLine class is reported in the following snippet:
.. ocv:class:: KeyLine
::
class CV_EXPORTS_W KeyLine
{
public:
/* orientation of the line */
float angle;
/* object ID, that can be used to cluster keylines by the line they represent */
int class_id;
/* octave (pyramid layer), from which the keyline has been extracted */
int octave;
/* coordinates of the middlepoint */
Point pt;
/* the response, by which the strongest keylines have been selected.
It's represented by the ratio between line's length and maximum between
image's width and height */
float response;
/* minimum area containing line */
float size;
/* lines's extremes in original image */
float startPointX;
float startPointY;
float endPointX;
float endPointY;
/* line's extremes in image it was extracted from */
float sPointInOctaveX;
float sPointInOctaveY;
float ePointInOctaveX;
float ePointInOctaveY;
/* the length of line */
float lineLength;
/* number of pixels covered by the line */
unsigned int numOfPixels;
/* constructor */
KeyLine(){}
};
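As a short, hedged illustration of how these fields fit together, the helper below prints the geometric summary of a detected line (``printKeyLine`` is a hypothetical name, not part of the module):

.. code-block:: cpp

    #include <iostream>

    /* print octave, cluster id, orientation, length and endpoints of a KeyLine */
    static void printKeyLine( const KeyLine& kl )
    {
        std::cout << "octave " << kl.octave << ", class " << kl.class_id
                  << ", angle " << kl.angle << ", length " << kl.lineLength << std::endl;
        std::cout << "  from (" << kl.startPointX << ", " << kl.startPointY
                  << ") to (" << kl.endPointX << ", " << kl.endPointY << ")" << std::endl;
    }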
Computation of binary descriptors
---------------------------------
To obtain a binary descriptor representing a certain line detected from a certain octave of an image, we first compute a non-binary descriptor as described in [LBD]_. Such algorithm works on lines extracted using the EDLine detector, as explained in [EDL]_. Given a line, we consider a rectangular region centered at it, called *line support region (LSR)*. Such region is divided into a set of bands :math:`\{B_1, B_2, ..., B_m\}`, whose length equals the one of the line.
If we indicate with :math:`\bf{d}_L` the direction of the line, the orthogonal and clockwise direction to the line :math:`\bf{d}_{\perp}` can be determined; these two directions are used to construct a reference frame centered in the midpoint of the line. The gradients of pixels :math:`\bf{g}` inside the LSR can be projected onto the newly determined frame, obtaining their local equivalent :math:`\bf{g'} = (\bf{g}^T \cdot \bf{d}_{\perp}, \bf{g}^T \cdot \bf{d}_L)^T \triangleq (\bf{g'}_{d_{\perp}}, \bf{g'}_{d_L})^T`.
Later on, a Gaussian function is applied to all LSR's pixels along the :math:`\bf{d}_\perp` direction; first, we assign a global weighting coefficient :math:`f_g(i) = (1/\sqrt{2\pi}\sigma_g)e^{-d^2_i/2\sigma^2_g}` to the *i*-th row in the LSR, where :math:`d_i` is the distance of the *i*-th row from the center row in the LSR, :math:`\sigma_g = 0.5(m \cdot w - 1)` and :math:`w` is the width of the bands (the same for every band). Secondly, considering a band :math:`B_j` and its neighbor bands :math:`B_{j-1}, B_{j+1}`, we assign a local weighting :math:`f_l(k) = (1/\sqrt{2\pi}\sigma_l)e^{-d'^2_k/2\sigma_l^2}`, where :math:`d'_k` is the distance of the *k*-th row from the center row in :math:`B_j` and :math:`\sigma_l = w`. Using the global and local weights, we obtain, at the same time, the reduction of the role played by gradients far from the line and of the boundary effect, respectively.
Each band :math:`B_j` in the LSR has an associated *band descriptor (BD)* which is computed considering the previous and next band (top and bottom bands are ignored when computing the descriptor for the first and last band). Once each band has been assigned its BD, the LBD descriptor of the line is simply given by
.. math::
LBD = (BD_1^T, BD_2^T, ... , BD^T_m)^T.
To compute the band descriptor of :math:`B_j`, each *k*-th row in it is considered and the gradients in such row are accumulated:
.. math::
\begin{matrix} \bf{V1}^k_j = \lambda \sum\limits_{\bf{g}'_{d_\perp}>0}\bf{g}'_{d_\perp}, & \bf{V2}^k_j = \lambda \sum\limits_{\bf{g}'_{d_\perp}<0} -\bf{g}'_{d_\perp}, \\ \bf{V3}^k_j = \lambda \sum\limits_{\bf{g}'_{d_L}>0}\bf{g}'_{d_L}, & \bf{V4}^k_j = \lambda \sum\limits_{\bf{g}'_{d_L}<0} -\bf{g}'_{d_L}\end{matrix}.
with :math:`\lambda = f_g(k)f_l(k)`.
By stacking previous results, we obtain the *band description matrix (BDM)*
.. math::
BDM_j = \left(\begin{matrix} \bf{V1}_j^1 & \bf{V1}_j^2 & \ldots & \bf{V1}_j^n \\ \bf{V2}_j^1 & \bf{V2}_j^2 & \ldots & \bf{V2}_j^n \\ \bf{V3}_j^1 & \bf{V3}_j^2 & \ldots & \bf{V3}_j^n \\ \bf{V4}_j^1 & \bf{V4}_j^2 & \ldots & \bf{V4}_j^n \end{matrix} \right) \in \mathbb{R}^{4\times n},
with :math:`n` the number of rows in band :math:`B_j`:
.. math::
n = \begin{cases} 2w, & j = 1 \ \mbox{or} \ j = m; \\ 3w, & \mbox{otherwise}. \end{cases}
Each :math:`BD_j` can be obtained using the standard deviation vector :math:`S_j` and mean vector :math:`M_j` of :math:`BDM_j`. Thus, finally:
.. math::
LBD = (M_1^T, S_1^T, M_2^T, S_2^T, \ldots, M_m^T, S_m^T)^T \in \mathbb{R}^{8m}
Once the LBD has been obtained, it must be converted into a binary form. For such purpose, we consider 32 possible pairs of BDs inside it; each pair of BDs is compared component by component, and each comparison generates an 8-bit string. Concatenating the 32 comparison strings, we get the 256-bit final binary representation of a single LBD.
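As a schematic, hedged sketch of that comparison step (this is not the module's internal code; the pairing of BDs and the exact comparison order follow [LBD]_ and are assumed here to be given), comparing two 8-component chunks componentwise yields one byte of the final code:

.. code-block:: cpp

    /* compare two 8-component descriptor chunks; each comparison sets one bit */
    static unsigned char compareChunks( const float* a, const float* b )
    {
        unsigned char bits = 0;
        for ( int i = 0; i < 8; i++ )
            if ( a[i] > b[i] )
                bits |= (unsigned char) ( 1 << i );
        return bits;
    }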
References
----------
.. [LBD] Zhang, Lilian, and Reinhard Koch. *An efficient and robust line segment matching approach based on LBD descriptor and pairwise geometric consistency*, Journal of Visual Communication and Image Representation 24.7 (2013): 794-805.
.. [EDL] Von Gioi, R. Grompone, et al. *LSD: A fast line segment detector with a false detection control*, IEEE Transactions on Pattern Analysis and Machine Intelligence 32.4 (2010): 722-732.
Summary
-------
.. toctree::
:maxdepth: 2
binary_descriptor
LSDDetector
matching
drawing_functions
tutorial

@ -1,143 +0,0 @@
.. _matching:
Matching with binary descriptors
================================
.. highlight:: cpp
Once descriptors have been extracted from an image (whether they represent lines or points), it becomes interesting to be able to match a descriptor with another one extracted from a different image and representing the same line or point, seen from a different perspective or on a different scale.
In reaching such goal, the main headache is designing an efficient search algorithm to associate a query descriptor to one extracted from a dataset.
In the following, a matching modality based on *Multi-Index Hashing (MiHashing)* will be described.
Multi-Index Hashing
-------------------
The theory described in this section is based on [MIH]_.
Given a dataset populated with binary codes, each code is indexed *m* times into *m* different hash tables, according to the *m* substrings it has been divided into. Thus, given a query code, all the entries close to it in at least one substring are returned by the search as *neighbor candidates*. Returned entries are then checked for validity by verifying that their full codes are not distant (in Hamming space) more than *r* bits from the query code.
In detail, each binary code **h** composed of *b* bits is divided into *m* disjoint substrings :math:`\mathbf{h}^{(1)}, ..., \mathbf{h}^{(m)}`, each with length :math:`\lfloor b/m \rfloor` or :math:`\lceil b/m \rceil` bits. Formally, when two codes **h** and **g** differ by at most *r* bits, in at least one of their *m* substrings they differ by at most :math:`\lfloor r/m \rfloor` bits. In particular, when :math:`||\mathbf{h}-\mathbf{g}||_H \le r` (where :math:`||.||_H` is the Hamming norm), there must exist a substring *k* (with :math:`1 \le k \le m`) such that
.. math::
||\mathbf{h}^{(k)} - \mathbf{g}^{(k)}||_H \le \left\lfloor \frac{r}{m} \right\rfloor .
That means that if the Hamming distance between each of the *m* substrings is strictly greater than :math:`\lfloor r/m \rfloor`, then :math:`||\mathbf{h}-\mathbf{g}||_H` must be larger than *r*, which is a contradiction.
If the codes in the dataset are divided into *m* substrings, then *m* tables will be built. Given a query **q** with substrings :math:`\{\mathbf{q}^{(i)}\}^m_{i=1}`, the *i*-th hash table is searched for entries distant at most :math:`\lfloor r/m \rfloor` from :math:`\mathbf{q}^{(i)}`, and a set of candidates :math:`\mathcal{N}_i(\mathbf{q})` is obtained.
The union of sets :math:`\mathcal{N}(\mathbf{q}) = \bigcup_i \mathcal{N}_i(\mathbf{q})` is a superset of the *r*-neighbors of **q**. Then, the last step of the algorithm is computing the Hamming distance between **q** and each element in :math:`\mathcal{N}(\mathbf{q})`, deleting the codes that are distant more than *r* from **q**.
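As a hedged illustration of the indexing step (a sketch, not the implementation used by the module), the snippet below splits a 256-bit code, stored as 32 bytes, into *m* byte-aligned substrings that could serve as keys into *m* hash tables:

.. code-block:: cpp

    #include <vector>
    #include <stdint.h>

    /* split a 256-bit code (32 bytes) into m substrings; for simplicity,
       m is assumed to divide 32 so every substring is byte-aligned */
    std::vector<std::vector<uint8_t> > splitCode( const uint8_t code[32], int m )
    {
        std::vector<std::vector<uint8_t> > subs( m );
        int len = 32 / m;
        for ( int k = 0; k < m; k++ )
            subs[k].assign( code + k * len, code + ( k + 1 ) * len );
        return subs;
    }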
BinaryDescriptorMatcher Class
=============================
The BinaryDescriptorMatcher class furnishes all functionalities for querying a dataset provided by the user or one internal to the class (which the user must, anyway, populate), on the model of Feature2d's `DescriptorMatcher <http://docs.opencv.org/modules/features2d/doc/common_interfaces_of_descriptor_matchers.html?highlight=bfmatcher#descriptormatcher>`_.
BinaryDescriptorMatcher::BinaryDescriptorMatcher
--------------------------------------------------
Constructor.
.. ocv:function:: BinaryDescriptorMatcher::BinaryDescriptorMatcher()
The constructed BinaryDescriptorMatcher is able to store and manage 256-bit long entries.
BinaryDescriptorMatcher::createBinaryDescriptorMatcher
------------------------------------------------------
Create a BinaryDescriptorMatcher object and return a smart pointer to it.
.. ocv:function:: Ptr<BinaryDescriptorMatcher> BinaryDescriptorMatcher::createBinaryDescriptorMatcher()
BinaryDescriptorMatcher::add
----------------------------
Store new descriptors locally, to be inserted into the dataset later, without updating the dataset.
.. ocv:function:: void BinaryDescriptorMatcher::add( const std::vector<Mat>& descriptors )
:param descriptors: matrices containing descriptors to be inserted into dataset
.. note:: Each matrix *i* in **descriptors** should contain descriptors relative to lines extracted from *i*-th image.
BinaryDescriptorMatcher::train
------------------------------
Update the dataset by inserting into it all descriptors that were stored locally by the *add* function.
.. ocv:function:: void BinaryDescriptorMatcher::train()
.. note:: Every time this function is invoked, the current dataset is deleted and locally stored descriptors are inserted into the dataset. The locally stored copy of the just-inserted descriptors is then removed.
BinaryDescriptorMatcher::clear
------------------------------
Clear dataset and internal data
.. ocv:function:: void BinaryDescriptorMatcher::clear()
BinaryDescriptorMatcher::match
------------------------------
For every input query descriptor, retrieve the best matching one from a dataset provided by the user or from the one internal to the class.
.. ocv:function:: void BinaryDescriptorMatcher::match( const Mat& queryDescriptors, const Mat& trainDescriptors, std::vector<DMatch>& matches, const Mat& mask=Mat() ) const
.. ocv:function:: void BinaryDescriptorMatcher::match( const Mat& queryDescriptors, std::vector<DMatch>& matches, const std::vector<Mat>& masks=std::vector<Mat>() )
:param queryDescriptors: query descriptors
:param trainDescriptors: dataset of descriptors furnished by user
:param matches: vector to host retrieved matches
:param mask: mask to select which input descriptors must be matched to one in dataset
:param masks: vector of masks to select which input descriptors must be matched to one in dataset (the *i*-th mask in vector indicates whether each input query can be matched with descriptors in dataset relative to *i*-th image)
BinaryDescriptorMatcher::knnMatch
---------------------------------
For every input query descriptor, retrieve the best *k* matching ones from a dataset provided by the user or from the one internal to the class.
.. ocv:function:: void BinaryDescriptorMatcher::knnMatch( const Mat& queryDescriptors, const Mat& trainDescriptors, std::vector<std::vector<DMatch> >& matches, int k, const Mat& mask=Mat(), bool compactResult=false ) const
.. ocv:function:: void BinaryDescriptorMatcher::knnMatch( const Mat& queryDescriptors, std::vector<std::vector<DMatch> >& matches, int k, const std::vector<Mat>& masks=std::vector<Mat>(), bool compactResult=false )
:param queryDescriptors: query descriptors
:param trainDescriptors: dataset of descriptors furnished by user
:param matches: vector to host retrieved matches
:param k: number of the closest descriptors to be returned for every input query
:param mask: mask to select which input descriptors must be matched to ones in dataset
:param masks: vector of masks to select which input descriptors must be matched to ones in dataset (the *i*-th mask in vector indicates whether each input query can be matched with descriptors in dataset relative to *i*-th image)
:param compactResult: flag to obtain a compact result (if true, a vector that doesn't contain any matches for a given query is not inserted in final result)
BinaryDescriptorMatcher::radiusMatch
------------------------------------
For every input query descriptor, retrieve, from a dataset provided by the user or from the one internal to the class, all the descriptors that are not further than *maxDistance* from the input query.
.. ocv:function:: void BinaryDescriptorMatcher::radiusMatch( const Mat& queryDescriptors, const Mat& trainDescriptors, std::vector<std::vector<DMatch> >& matches, float maxDistance, const Mat& mask=Mat(), bool compactResult=false ) const
.. ocv:function:: void BinaryDescriptorMatcher::radiusMatch( const Mat& queryDescriptors, std::vector<std::vector<DMatch> >& matches, float maxDistance, const std::vector<Mat>& masks=std::vector<Mat>(), bool compactResult=false )
:param queryDescriptors: query descriptors
:param trainDescriptors: dataset of descriptors furnished by user
:param matches: vector to host retrieved matches
:param maxDistance: search radius
:param mask: mask to select which input descriptors must be matched to ones in dataset
:param masks: vector of masks to select which input descriptors must be matched to ones in dataset (the *i*-th mask in vector indicates whether each input query can be matched with descriptors in dataset relative to *i*-th image)
:param compactResult: flag to obtain a compact result (if true, a vector that doesn't contain any matches for a given query is not inserted in final result)
Related pages
-------------
* :ref:`line_descriptor`
* :ref:`binary_descriptor`
* :ref:`drawing`
References
----------
.. [MIH] Norouzi, Mohammad, Ali Punjani, and David J. Fleet. *Fast search in hamming space with multi-index hashing*, Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.

@ -1,428 +0,0 @@
Line Features Tutorial
======================
In this tutorial it will be shown how to:
* use the *BinaryDescriptor* interface to extract lines and store them in *KeyLine* objects
* use the same interface to compute descriptors for every extracted line
* use the *BinaryDescriptorMatcher* to determine matches among descriptors obtained from different images
Lines extraction and descriptors computation
--------------------------------------------
In the following snippet of code, it is shown how to detect lines from an image. A *BinaryDescriptor* object is created with default parameters; a mask of ones is used in order to accept all extracted lines, which, at the end, are displayed using random colors for octave 0.
.. code-block:: cpp
#include <opencv2/line_descriptor.hpp>
#include "opencv2/core/utility.hpp"
#include "opencv2/core/private.hpp"
#include <opencv2/imgproc.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/highgui.hpp>
#include <iostream>
using namespace cv;
using namespace std;
static const char* keys =
{ "{@image_path | | Image path }" };
static void help()
{
cout << "\nThis example shows the functionalities of lines extraction " << "furnished by BinaryDescriptor class\n"
<< "Please, run this sample using a command in the form\n" << "./example_line_descriptor_lines_extraction <path_to_input_image>" << endl;
}
int main( int argc, char** argv )
{
/* get parameters from command line */
CommandLineParser parser( argc, argv, keys );
String image_path = parser.get<String>( 0 );
if( image_path.empty() )
{
help();
return -1;
}
/* load image */
cv::Mat imageMat = imread( image_path, 1 );
if( imageMat.data == NULL )
{
std::cout << "Error, image could not be loaded. Please, check its path" << std::endl;
return -1;
}
/* create a binary mask of ones (accept all lines) */
cv::Mat mask = Mat::ones( imageMat.size(), CV_8UC1 );
/* create a pointer to a BinaryDescriptor object with default parameters */
Ptr<BinaryDescriptor> bd = BinaryDescriptor::createBinaryDescriptor();
/* create a structure to store extracted lines */
vector<KeyLine> lines;
/* extract lines */
bd->detect( imageMat, lines, mask );
/* draw lines extracted from octave 0 */
cv::Mat output = imageMat.clone();
if( output.channels() == 1 )
cvtColor( output, output, COLOR_GRAY2BGR );
for ( size_t i = 0; i < lines.size(); i++ )
{
KeyLine kl = lines[i];
if( kl.octave == 0)
{
/* get a random color */
int R = ( rand() % (int) ( 255 + 1 ) );
int G = ( rand() % (int) ( 255 + 1 ) );
int B = ( rand() % (int) ( 255 + 1 ) );
/* get extremes of line */
Point pt1 = Point( kl.startPointX, kl.startPointY );
Point pt2 = Point( kl.endPointX, kl.endPointY );
/* draw line */
line( output, pt1, pt2, Scalar( B, G, R ), 5 );
}
}
/* show lines on image */
imshow( "Lines", output );
waitKey();
}
..
This is the result obtained for the famous cameraman image:
.. image:: pics/lines_cameraman_edl.png
:width: 512px
:align: center
:height: 512px
:alt: alternate text
Another way to extract lines is using the *LSDDetector* class; such class uses the LSD extractor to compute lines. To obtain this result, it is sufficient to take the snippet of code seen above and replace the relevant rows with
.. code-block:: cpp
/* create a pointer to an LSDDetector object */
Ptr<LSDDetector> lsd = LSDDetector::createLSDDetector();
/* compute lines */
std::vector<KeyLine> keylines;
lsd->detect( imageMat, keylines, mask );
..
Here's the result returned by the LSD detector, again on the cameraman picture:
.. image:: pics/cameraman_lines2.png
:width: 512px
:align: center
:height: 512px
:alt: alternate text
Once keylines have been detected, it is possible to compute their descriptors as shown in the following:
.. code-block:: cpp
#include <opencv2/line_descriptor.hpp>
#include "opencv2/core/utility.hpp"
#include "opencv2/core/private.hpp"
#include <opencv2/imgproc.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/highgui.hpp>
#include <iostream>
using namespace cv;
static const char* keys =
{ "{@image_path | | Image path }" };
static void help()
{
std::cout << "\nThis example shows the functionalities of lines extraction " << "and descriptors computation furnished by BinaryDescriptor class\n"
<< "Please, run this sample using a command in the form\n" << "./example_line_descriptor_compute_descriptors <path_to_input_image>"
<< std::endl;
}
int main( int argc, char** argv )
{
/* get parameters from command line */
CommandLineParser parser( argc, argv, keys );
String image_path = parser.get<String>( 0 );
if( image_path.empty() )
{
help();
return -1;
}
/* load image */
cv::Mat imageMat = imread( image_path, 1 );
if( imageMat.data == NULL )
{
std::cout << "Error, image could not be loaded. Please, check its path" << std::endl;
return -1;
}
/* create a binary mask */
cv::Mat mask = Mat::ones( imageMat.size(), CV_8UC1 );
/* create a pointer to a BinaryDescriptor object with default parameters */
Ptr<BinaryDescriptor> bd = BinaryDescriptor::createBinaryDescriptor();
/* compute lines */
std::vector<KeyLine> keylines;
bd->detect( imageMat, keylines, mask );
/* compute descriptors */
cv::Mat descriptors;
bd->compute( imageMat, keylines, descriptors );
}
..
Matching among descriptors
--------------------------
If we have extracted descriptors from two different images, it is possible to search for matches among them. One way of doing so is matching each query descriptor to exactly one descriptor from the other image, choosing the one at the closest distance:
.. code-block:: cpp
#include <opencv2/line_descriptor.hpp>
#include "opencv2/core/utility.hpp"
#include "opencv2/core/private.hpp"
#include <opencv2/imgproc.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/highgui.hpp>
#include <iostream>
using namespace cv;
static const char* keys =
{ "{@image_path1 | | Image path 1 }"
"{@image_path2 | | Image path 2 }" };
static void help()
{
std::cout << "\nThis example shows the functionalities of lines extraction " << "and descriptors computation furnished by BinaryDescriptor class\n"
<< "Please, run this sample using a command in the form\n" << "./example_line_descriptor_compute_descriptors <path_to_input_image 1>"
<< "<path_to_input_image 2>" << std::endl;
}
int main( int argc, char** argv )
{
/* get parameters from command line */
CommandLineParser parser( argc, argv, keys );
String image_path1 = parser.get<String>( 0 );
String image_path2 = parser.get<String>( 1 );
if( image_path1.empty() || image_path2.empty() )
{
help();
return -1;
}
/* load images */
cv::Mat imageMat1 = imread( image_path1, 1 );
cv::Mat imageMat2 = imread( image_path2, 1 );
if( imageMat1.data == NULL || imageMat2.data == NULL )
{
std::cout << "Error, images could not be loaded. Please, check their paths" << std::endl;
return -1;
}
/* create binary masks */
cv::Mat mask1 = Mat::ones( imageMat1.size(), CV_8UC1 );
cv::Mat mask2 = Mat::ones( imageMat2.size(), CV_8UC1 );
/* create a pointer to a BinaryDescriptor object with default parameters */
Ptr<BinaryDescriptor> bd = BinaryDescriptor::createBinaryDescriptor();
/* compute lines */
std::vector<KeyLine> keylines1, keylines2;
bd->detect( imageMat1, keylines1, mask1 );
bd->detect( imageMat2, keylines2, mask2 );
/* compute descriptors */
cv::Mat descr1, descr2;
bd->compute( imageMat1, keylines1, descr1 );
bd->compute( imageMat2, keylines2, descr2 );
/* create a BinaryDescriptorMatcher object */
Ptr<BinaryDescriptorMatcher> bdm = BinaryDescriptorMatcher::createBinaryDescriptorMatcher();
/* require match */
std::vector<DMatch> matches;
bdm->match( descr1, descr2, matches );
/* plot matches */
cv::Mat outImg;
std::vector<char> mask( matches.size(), 1 );
drawLineMatches( imageMat1, keylines1, imageMat2, keylines2, matches, outImg, Scalar::all( -1 ), Scalar::all( -1 ), mask,
DrawLinesMatchesFlags::DEFAULT );
imshow( "Matches", outImg );
waitKey();
}
..
Sometimes, we could be interested in searching for the closest *k* descriptors, given an input one. This requires slightly modifying the previous code:
.. code-block:: cpp
/* prepare a structure to host matches */
std::vector<std::vector<DMatch> > matches;
/* require knn match */
bdm->knnMatch( descr1, descr2, matches, 6 );
..
In the above example, the closest 6 descriptors are returned for every query. In some cases, we could instead have a search radius and look for all descriptors distant at most *r* from the input query. The previous code must be modified accordingly:
.. code-block:: cpp
/* prepare a structure to host matches */
std::vector<std::vector<DMatch> > matches;
/* compute matches */
bdm->radiusMatch( queries, matches, 30 );
..
Here's an example of matching among descriptors extracted from the original cameraman image and its downsampled (and blurred) version:
.. image:: pics/matching2.png
:width: 765px
:align: center
:height: 540px
:alt: alternate text
Querying internal database
--------------------------
The *BinaryDescriptorMatcher* class owns an internal database that can be populated with descriptors extracted from different images and queried using one of the modalities described in the previous section.
Population of the internal dataset can be done using the *add* function; such function doesn't directly add new data to the database, it just stores them locally. The real update happens when the *train* function is invoked or when any querying function is executed, since each of them invokes *train* before querying.
When queried, the internal database not only returns the required descriptors but, for every returned match, it is able to tell which image the matched descriptor was extracted from.
An example of internal dataset usage is described in the following code; after new descriptors are added locally, a radius search is invoked. This causes the local data to be transferred to the dataset, which, in turn, is then queried.
.. code-block:: cpp
#include <opencv2/line_descriptor.hpp>
#include "opencv2/core/utility.hpp"
#include "opencv2/core/private.hpp"
#include <opencv2/imgproc.hpp>
#include <opencv2/features2d.hpp>
#include <opencv2/highgui.hpp>
#include <iostream>
#include <vector>
using namespace cv;
static const std::string images[] =
{ "cameraman.jpg", "church.jpg", "church2.png", "einstein.jpg", "stuff.jpg" };
static const char* keys =
{ "{@image_path | | Image path }" };
static void help()
{
std::cout << "\nThis example shows the functionalities of radius matching " << "Please, run this sample using a command in the form\n"
<< "./example_line_descriptor_radius_matching <path_to_input_images>/" << std::endl;
}
int main( int argc, char** argv )
{
/* get parameters from command line */
CommandLineParser parser( argc, argv, keys );
String pathToImages = parser.get<String>( 0 );
/* create structures for hosting KeyLines and descriptors */
int num_elements = sizeof ( images ) / sizeof ( images[0] );
std::vector<Mat> descriptorsMat;
std::vector<std::vector<KeyLine> > linesMat;
/*create a pointer to a BinaryDescriptor object */
Ptr<BinaryDescriptor> bd = BinaryDescriptor::createBinaryDescriptor();
/* compute lines and descriptors */
for ( int i = 0; i < num_elements; i++ )
{
/* get path to image */
std::stringstream image_path;
image_path << pathToImages << images[i];
/* load image */
Mat loadedImage = imread( image_path.str().c_str(), 1 );
if( loadedImage.data == NULL )
{
std::cout << "Could not load images." << std::endl;
help();
exit( -1 );
}
/* compute lines and descriptors */
std::vector<KeyLine> lines;
Mat computedDescr;
bd->detect( loadedImage, lines );
bd->compute( loadedImage, lines, computedDescr );
descriptorsMat.push_back( computedDescr );
linesMat.push_back( lines );
}
/* compose a queries matrix */
Mat queries;
for ( size_t j = 0; j < descriptorsMat.size(); j++ )
{
if( descriptorsMat[j].rows >= 5 )
queries.push_back( descriptorsMat[j].rowRange( 0, 5 ) );
else if( descriptorsMat[j].rows > 0 && descriptorsMat[j].rows < 5 )
queries.push_back( descriptorsMat[j] );
}
std::cout << "It has been generated a matrix of " << queries.rows << " descriptors" << std::endl;
/* create a BinaryDescriptorMatcher object */
Ptr<BinaryDescriptorMatcher> bdm = BinaryDescriptorMatcher::createBinaryDescriptorMatcher();
/* populate matcher */
bdm->add( descriptorsMat );
/* compute matches */
std::vector<std::vector<DMatch> > matches;
bdm->radiusMatch( queries, matches, 30 );
/* print matches */
for ( size_t q = 0; q < matches.size(); q++ )
{
for ( size_t m = 0; m < matches[q].size(); m++ )
{
DMatch dm = matches[q][m];
std::cout << "Descriptor: " << q << " Image: " << dm.imgIdx << " Distance: " << dm.distance << std::endl;
}
}
}
..

@ -1,116 +0,0 @@
Dense Optical Flow
===================
Dense optical flow algorithms compute motion for each point of the image.
calcOpticalFlowSF
-----------------
Calculate an optical flow using "SimpleFlow" algorithm.
.. ocv:function:: void calcOpticalFlowSF( InputArray from, InputArray to, OutputArray flow, int layers, int averaging_block_size, int max_flow )
.. ocv:function:: void calcOpticalFlowSF( InputArray from, InputArray to, OutputArray flow, int layers, int averaging_block_size, int max_flow, double sigma_dist, double sigma_color, int postprocess_window, double sigma_dist_fix, double sigma_color_fix, double occ_thr, int upscale_averaging_radius, double upscale_sigma_dist, double upscale_sigma_color, double speed_up_thr )
:param from: First 8-bit 3-channel image.
:param to: Second 8-bit 3-channel image of the same size as ``from``
:param flow: computed flow image that has the same size as ``from`` and type ``CV_32FC2``
:param layers: Number of layers
:param averaging_block_size: Size of the block through which we sum up when calculating the cost function for a pixel
:param max_flow: maximal flow that we search at each level
:param sigma_dist: vector smooth spatial sigma parameter
:param sigma_color: vector smooth color sigma parameter
:param postprocess_window: window size for postprocess cross bilateral filter
:param sigma_dist_fix: spatial sigma for postprocess cross bilateral filter
:param sigma_color_fix: color sigma for postprocess cross bilateral filter
:param occ_thr: threshold for detecting occlusions
:param upscale_averaging_radius: window size for bilateral upscale operation
:param upscale_sigma_dist: spatial sigma for bilateral upscale operation
:param upscale_sigma_color: color sigma for bilateral upscale operation
:param speed_up_thr: threshold to detect point with irregular flow - where flow should be recalculated after upscale
See [Tao2012]_ and the project site: http://graphics.berkeley.edu/papers/Tao-SAN-2012-05/.
.. note::
* An example using the simpleFlow algorithm can be found at samples/simpleflow_demo.cpp
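A minimal, hedged call sketch (file names are placeholders and the parameter values are only illustrative, following the short overload documented above):

.. code-block:: cpp

    #include <opencv2/optflow.hpp>
    #include <opencv2/imgcodecs.hpp>

    cv::Mat from = cv::imread( "frame0.png" );   /* 8-bit, 3-channel */
    cv::Mat to   = cv::imread( "frame1.png" );   /* same size as `from` */
    cv::Mat flow;                                /* filled as CV_32FC2 */
    cv::optflow::calcOpticalFlowSF( from, to, flow,
                                    3 /* layers */,
                                    2 /* averaging_block_size */,
                                    4 /* max_flow */ );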
optflow::OpticalFlowDeepFlow
-----------------------------
DeepFlow optical flow algorithm implementation.
.. ocv:class:: optflow::OpticalFlowDeepFlow: public DenseOpticalFlow
The class implements the DeepFlow optical flow algorithm described in [Weinzaepfel2013]_ . See also http://lear.inrialpes.fr/src/deepmatching/ .
Parameters - class fields - that may be modified after creating a class instance:
.. ocv:member:: float alpha
Smoothness assumption weight
.. ocv:member:: float delta
Color constancy assumption weight
.. ocv:member:: float gamma
Gradient constancy weight
.. ocv:member:: float sigma
Gaussian smoothing parameter
.. ocv:member:: int minSize
Minimal dimension of an image in the pyramid (next, smaller images in the pyramid are
generated until one of the dimensions reaches this size)
.. ocv:member:: float downscaleFactor
Scaling factor in the image pyramid (must be < 1)
.. ocv:member:: int fixedPointIterations
How many iterations on each level of the pyramid
.. ocv:member:: int sorIterations
Iterations of Successive Over-Relaxation (solver)
.. ocv:member:: float omega
Relaxation factor in SOR
optflow::createOptFlow_DeepFlow
---------------------------------
Create an OpticalFlowDeepFlow instance
.. ocv:function:: Ptr<DenseOpticalFlow> optflow::createOptFlow_DeepFlow()
Returns a pointer to a ``DenseOpticalFlow`` instance - see ``DenseOpticalFlow::calc()``
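A hedged usage sketch (it assumes the input frames are already single-channel 8-bit images):

.. code-block:: cpp

    cv::Ptr<cv::DenseOpticalFlow> df = cv::optflow::createOptFlow_DeepFlow();
    cv::Mat flow;                           /* output: CV_32FC2 */
    df->calc( frame0Gray, frame1Gray, flow );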
.. [Tao2012] Michael Tao, Jiamin Bai, Pushmeet Kohli and Sylvain Paris. SimpleFlow: A Non-iterative, Sublinear Optical Flow Algorithm. Computer Graphics Forum (Eurographics 2012)
.. [Weinzaepfel2013] P. Weinzaepfel, J. Revaud, Z. Harchaoui, and C. Schmid, “DeepFlow: Large Displacement Optical Flow with Deep Matching,” 2013 IEEE Int. Conf. Comput. Vis., pp. 1385–1392, Dec. 2013.

@ -1,125 +0,0 @@
Motion Templates
================
Motion templates are an alternative technique for detecting motion and computing its direction.
See ``samples/motempl.py``.
updateMotionHistory
-----------------------
Updates the motion history image by a moving silhouette.
.. ocv:function:: void updateMotionHistory( InputArray silhouette, InputOutputArray mhi, double timestamp, double duration )
.. ocv:pyfunction:: cv2.updateMotionHistory(silhouette, mhi, timestamp, duration) -> mhi
:param silhouette: Silhouette mask that has non-zero pixels where the motion occurs.
:param mhi: Motion history image that is updated by the function (single-channel, 32-bit floating-point).
:param timestamp: Current time in milliseconds or other units.
:param duration: Maximal duration of the motion track in the same units as ``timestamp`` .
The function updates the motion history image as follows:
.. math::
\texttt{mhi} (x,y)= \forkthree{\texttt{timestamp}}{if $\texttt{silhouette}(x,y) \ne 0$}{0}{if $\texttt{silhouette}(x,y) = 0$ and $\texttt{mhi} < (\texttt{timestamp} - \texttt{duration})$}{\texttt{mhi}(x,y)}{otherwise}
That is, MHI pixels where the motion occurs are set to the current ``timestamp`` , while the pixels where the motion happened last time a long time ago are cleared.
The function, together with
:ocv:func:`calcMotionGradient` and
:ocv:func:`calcGlobalOrientation` , implements a motion templates technique described in
[Davis97]_ and [Bradski00]_.
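A hedged per-frame sketch of the usual update loop (frame differencing is only one possible way to obtain the silhouette; ``mhi`` is assumed to be a zero-initialized ``CV_32FC1`` image created once before the loop):

.. code-block:: cpp

    /* frame, prevFrame: consecutive grayscale frames (assumption) */
    cv::Mat silhouette;
    cv::absdiff( frame, prevFrame, silhouette );
    cv::threshold( silhouette, silhouette, 30, 255, cv::THRESH_BINARY );
    double timestamp = (double) cv::getTickCount() / cv::getTickFrequency();
    cv::motempl::updateMotionHistory( silhouette, mhi, timestamp, 0.5 /* duration, s */ );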
calcMotionGradient
----------------------
Calculates a gradient orientation of a motion history image.
.. ocv:function:: void calcMotionGradient( InputArray mhi, OutputArray mask, OutputArray orientation, double delta1, double delta2, int apertureSize=3 )
.. ocv:pyfunction:: cv2.calcMotionGradient(mhi, delta1, delta2[, mask[, orientation[, apertureSize]]]) -> mask, orientation
:param mhi: Motion history single-channel floating-point image.
:param mask: Output mask image that has the type ``CV_8UC1`` and the same size as ``mhi`` . Its non-zero elements mark pixels where the motion gradient data is correct.
:param orientation: Output motion gradient orientation image that has the same type and the same size as ``mhi`` . Each pixel of the image is a motion orientation, from 0 to 360 degrees.
:param delta1: Minimal (or maximal) allowed difference between ``mhi`` values within a pixel neighborhood.
:param delta2: Maximal (or minimal) allowed difference between ``mhi`` values within a pixel neighborhood. That is, the function finds the minimum ( :math:`m(x,y)` ) and maximum ( :math:`M(x,y)` ) ``mhi`` values over :math:`3 \times 3` neighborhood of each pixel and marks the motion orientation at :math:`(x, y)` as valid only if
.. math::
\min ( \texttt{delta1} , \texttt{delta2} ) \le M(x,y)-m(x,y) \le \max ( \texttt{delta1} , \texttt{delta2} ).
:param apertureSize: Aperture size of the :ocv:func:`Sobel` operator.
The function calculates a gradient orientation at each pixel
:math:`(x, y)` as:
.. math::
\texttt{orientation} (x,y)= \arctan{\frac{d\texttt{mhi}/dy}{d\texttt{mhi}/dx}}
In fact,
:ocv:func:`fastAtan2` and
:ocv:func:`phase` are used so that the computed angle is measured in degrees and covers the full range 0..360. Also, the ``mask`` is filled to indicate pixels where the computed angle is valid.
.. note::
* (Python) An example on how to perform a motion template technique can be found at opencv_source_code/samples/python2/motempl.py
calcGlobalOrientation
-------------------------
Calculates a global motion orientation in a selected region.
.. ocv:function:: double calcGlobalOrientation( InputArray orientation, InputArray mask, InputArray mhi, double timestamp, double duration )
.. ocv:pyfunction:: cv2.calcGlobalOrientation(orientation, mask, mhi, timestamp, duration) -> retval
:param orientation: Motion gradient orientation image calculated by the function :ocv:func:`calcMotionGradient` .
:param mask: Mask image. It may be a conjunction of a valid gradient mask, also calculated by :ocv:func:`calcMotionGradient` , and the mask of a region whose direction needs to be calculated.
:param mhi: Motion history image calculated by :ocv:func:`updateMotionHistory` .
:param timestamp: Timestamp passed to :ocv:func:`updateMotionHistory` .
:param duration: Maximum duration of a motion track in milliseconds, passed to :ocv:func:`updateMotionHistory` .
The function calculates an average
motion direction in the selected region and returns the angle between
0 degrees and 360 degrees. The average direction is computed from
the weighted orientation histogram, where a recent motion has a larger
weight and the motion occurred in the past has a smaller weight, as recorded in ``mhi`` .
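Continuing the hedged sketch above, the two functions are typically chained as follows (the delta values are illustrative only):

.. code-block:: cpp

    cv::Mat mask, orientation;
    cv::motempl::calcMotionGradient( mhi, mask, orientation,
                                     0.5 /* delta1 */, 0.05 /* delta2 */, 3 );
    double angle = cv::motempl::calcGlobalOrientation( orientation, mask, mhi,
                                                       timestamp, 0.5 /* duration */ );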
segmentMotion
-------------
Splits a motion history image into a few parts corresponding to separate independent motions (for example, left hand, right hand).
.. ocv:function:: void segmentMotion(InputArray mhi, OutputArray segmask, vector<Rect>& boundingRects, double timestamp, double segThresh)
.. ocv:pyfunction:: cv2.segmentMotion(mhi, timestamp, segThresh[, segmask]) -> segmask, boundingRects
:param mhi: Motion history image.
:param segmask: Image where the found mask should be stored, single-channel, 32-bit floating-point.
:param boundingRects: Vector containing ROIs of motion connected components.
:param timestamp: Current time in milliseconds or other units.
:param segThresh: Segmentation threshold that is recommended to be equal to the interval between motion history "steps" or greater.
The function finds all of the motion segments and marks them in ``segmask`` with individual values (1,2,...). It also computes a vector with ROIs of motion connected components. After that the motion direction for every component can be calculated with :ocv:func:`calcGlobalOrientation` using the extracted mask of the particular component.
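A hedged sketch of segmenting the history image built above and estimating the direction of each component (it reuses ``orientation`` and ``mask`` from the previous sketch):

.. code-block:: cpp

    cv::Mat segmask;
    std::vector<cv::Rect> boundingRects;
    cv::motempl::segmentMotion( mhi, segmask, boundingRects, timestamp, 0.5 );
    for ( size_t i = 0; i < boundingRects.size(); i++ )
    {
        /* per-component direction, restricted to the component's ROI */
        double angle = cv::motempl::calcGlobalOrientation(
            orientation( boundingRects[i] ), mask( boundingRects[i] ),
            mhi( boundingRects[i] ), timestamp, 0.5 );
        (void) angle;   /* use angle as needed */
    }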
.. [Bradski00] Davis, J.W. and Bradski, G.R. "Motion Segmentation and Pose Recognition with Motion History Gradients", WACV00, 2000
.. [Davis97] Davis, J.W. and Bobick, A.F. "The Representation and Recognition of Action Using Temporal Templates", CVPR97, 1997

@ -1,14 +0,0 @@
******************************************
optflow. Optical Flow Algorithms
******************************************
.. highlight:: cpp
The opencv_optflow module includes different algorithms for tracking points.
.. toctree::
:maxdepth: 2
dense_optflow
motion_templates
optflow_io

@ -1,36 +0,0 @@
Optical flow input / output
============================
Functions reading and writing .flo files in "Middlebury" format, see: [MiddleburyFlo]_
readOpticalFlow
-----------------
Read a .flo file
.. ocv:function:: Mat readOpticalFlow( const String& path )
:param path: Path to the file to be loaded
The function readOpticalFlow loads a flow field from a file and returns it as a single matrix.
Resulting ``Mat`` has a type ``CV_32FC2`` - floating-point, 2-channel.
First channel corresponds to the flow in the horizontal direction (u), second - vertical (v).
writeOpticalFlow
-----------------
Write a .flo file to disk
.. ocv:function:: bool writeOpticalFlow( const String& path, InputArray flow )
:param path: Path to the file to be written
:param flow: Flow field to be stored
The function stores a flow field in a file, returns ``true`` on success, ``false`` otherwise.
The flow field must be a 2-channel, floating-point matrix (``CV_32FC2``).
First channel corresponds to the flow in the horizontal direction (u), second - vertical (v).
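A hedged round-trip sketch (the file name is a placeholder; an empty matrix is assumed to signal a read failure):

.. code-block:: cpp

    cv::Mat flow = cv::optflow::readOpticalFlow( "reference.flo" );
    if ( !flow.empty() )                                   /* CV_32FC2 on success */
        cv::optflow::writeOpticalFlow( "copy.flo", flow );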
.. [MiddleburyFlo] http://vision.middlebury.edu/flow/code/flow-code/README.txt

@ -1,8 +0,0 @@
**************************
reg. Image Registration
**************************
.. toctree::
:maxdepth: 2
registration

@ -1,66 +0,0 @@
Registration
============
.. highlight:: cpp
The Registration module implements parametric image registration. The implemented method is direct alignment, that is, it uses directly the pixel values for calculating the registration between a pair of images, as opposed to feature-based registration. The implementation follows essentially the corresponding part of [Szeliski06]_.
Feature based methods have some advantages over pixel based methods when we are trying to register pictures that have been shot under different lighting conditions or exposure times, or when the images overlap only partially. On the other hand, the main advantage of pixel-based methods when compared to feature based methods is their better precision for some pictures (those shot under similar lighting conditions and that have a significant overlap), due to the fact that we are using all the information available in the image, which allows us to achieve subpixel accuracy. This is particularly important for certain applications like multi-frame denoising or super-resolution.
In fact, pixel and feature registration methods can complement each other: an application could first obtain a coarse registration using features and then refine the registration using a pixel based method on the overlapping area of the images. The code developed allows this use case.
The module implements classes derived from the abstract classes cv::reg::Map or cv::reg::Mapper. The former models a coordinate transformation between two reference frames, while the latter encapsulates a way of invoking a method that calculates a Map between two images. Although the objective has been to implement pixel based methods, the module can be extended to support other methods that can calculate transformations between images (feature methods, optical flow, etc.).
Each class derived from Map implements a motion model, as follows:
* MapShift: Models a simple translation
* MapAffine: Models an affine transformation
* MapProject: Models a projective transformation
MapProject can also be used to model affine motion or translations, but some operations on it are more costly, and that is the reason for defining the other two classes.
The classes derived from Mapper are
* MapperGradShift: Gradient based alignment for calculating translations. It produces a MapShift (two parameters that correspond to the shift vector).
* MapperGradEuclid: Gradient based alignment for Euclidean motions, that is, rotations and translations. It calculates three parameters (angle and shift vector), although the result is stored in a MapAffine object for convenience.
* MapperGradSimilar: Gradient based alignment for calculating similarities, which adds scaling to the Euclidean motion. It calculates four parameters (two for the anti-symmetric matrix and two for the shift vector), although the result is stored in a MapAffine object for convenience.
* MapperGradAffine: Gradient based alignment for an affine motion model. The number of parameters is six and the result is stored in a MapAffine object.
* MapperGradProj: Gradient based alignment for calculating projective transformations. The number of parameters is eight and the result is stored in a MapProject object.
* MapperPyramid: It implements hierarchical motion estimation using a Gaussian pyramid. Its constructor accepts as argument any other object that implements the Mapper interface, and it is that mapper that MapperPyramid calls for each scale of the pyramid.
If the motion between the images is not very small, the normal way of using these classes is to create a MapperGrad* object and use it as input to create a MapperPyramid, which in turn is called to perform the calculation. However, if the motion between the images is small enough, we can use the MapperGrad* classes directly. Another possibility is to first use a feature based method to perform a coarse registration and then refine it through MapperPyramid or directly through a MapperGrad* object. The "calculate" method of the mappers accepts an initial estimation of the motion as input.
When deciding which MapperGrad to use we must take into account that mappers with more parameters can handle more complex motions, but involve more calculations and are therefore slower. Also, if we are confident in the motion model followed by the sequence, increasing the number of parameters beyond what we need will decrease the accuracy: it is better to use the least number of degrees of freedom that we can.
In the module tests there are examples that show how to register a pair of images using any of the implemented mappers.
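As a hedged sketch of that workflow (header names and the exact signature of ``calculate`` follow the module's test code of this era and may differ between versions):

.. code-block:: cpp

    #include <opencv2/reg/mappergradshift.hpp>
    #include <opencv2/reg/mapperpyramid.hpp>

    using namespace cv::reg;

    /* img1, img2: same-size floating-point images (assumption) */
    MapperGradShift mapper;               /* translation-only motion model */
    MapperPyramid mapPyr( mapper );       /* hierarchical estimation on top of it */
    cv::Ptr<Map> map;
    mapPyr.calculate( img1, img2, map );  /* map now holds the estimated shift */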
reg::Map
--------
Base class for modelling a Map between two images.
.. ocv:class:: reg::Map
The class is only used to define the common interface for any possible map.
reg::Mapper
-----------
Base class for modelling an algorithm for calculating a :ocv:class:`reg::Map` between two images.
.. ocv:class:: reg::Mapper
The class is only used to define the common interface for any possible mapping algorithm.
.. [Szeliski06] R. Szeliski. Image alignment and stitching: A tutorial. Foundations and Trends in Computer Graphics and Vision, vol. 2, no 1, pp. 1-104, 2006.

@ -1,8 +0,0 @@
**************************
rgbd. RGB-Depth Processing
**************************
.. highlight:: cpp
.. toctree::
:maxdepth: 2

@ -1,67 +0,0 @@
Common Interfaces of Saliency
=============================
.. highlight:: cpp
Saliency : Algorithm
--------------------
.. ocv:class:: Saliency
Base abstract class for Saliency algorithms::
class CV_EXPORTS Saliency : public virtual Algorithm
{
public:
virtual ~Saliency();
static Ptr<Saliency> create( const String& saliencyType );
bool computeSaliency( const InputArray image, OutputArray saliencyMap );
String getClassName() const;
};
Saliency::create
----------------
Creates a specialized saliency algorithm by its name.
.. ocv:function:: static Ptr<Saliency> Saliency::create( const String& saliencyType )
:param saliencyType: saliency Type
The following saliency types are now supported:
* ``"SPECTRAL_RESIDUAL"`` -- :ocv:class:`StaticSaliencySpectralResidual`
* ``"BING"`` -- :ocv:class:`ObjectnessBING`
Saliency::computeSaliency
-------------------------
Performs all the operations, according to the specific algorithm created, to obtain the saliency map.
.. ocv:function:: bool Saliency::computeSaliency( const InputArray image, OutputArray saliencyMap )
:param image: image or set of input images. According to InputArray proxy and to the needs of different algorithms (currently plugged), the param image may be *Mat* or *vector<Mat>*
:param saliencyMap: saliency map. According to OutputArray proxy and to the results given by different algorithms (currently plugged), the saliency map may be a *Mat* or *vector<Vec4i>* (BING results).
Saliency::getClassName
----------------------
Get the name of the specific Saliency Algorithm.
.. ocv:function:: String Saliency::getClassName() const
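A minimal, hedged usage sketch built only on the interface above (the image path is a placeholder):

.. code-block:: cpp

    cv::Ptr<cv::saliency::Saliency> sal =
        cv::saliency::Saliency::create( "SPECTRAL_RESIDUAL" );
    cv::Mat image = cv::imread( "input.png" );   /* placeholder path */
    cv::Mat saliencyMap;
    if ( sal && sal->computeSaliency( image, saliencyMap ) )
        std::cout << sal->getClassName() << std::endl;   /* print the class name */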

@ -1,66 +0,0 @@
Motion Saliency Algorithms
============================
.. highlight:: cpp
Algorithms belonging to this category are particularly focused on detecting salient objects over time (hence, over frames); a temporal component is taken into account, which allows detecting "moving" objects as salient, and therefore, in a more general sense, detecting changes in the scene.
Presently, the Fast Self-tuning Background Subtraction Algorithm [BinWangApr2014]_ has been implemented.
.. [BinWangApr2014] B. Wang and P. Dudek "A Fast Self-tuning Background Subtraction Algorithm", in proc of IEEE Workshop on Change Detection, 2014
MotionSaliencyBinWangApr2014
----------------------------
.. ocv:class:: MotionSaliencyBinWangApr2014
Implementation of MotionSaliencyBinWangApr2014 from :ocv:class:`MotionSaliency`::
class CV_EXPORTS MotionSaliencyBinWangApr2014 : public MotionSaliency
{
public:
MotionSaliencyBinWangApr2014();
~MotionSaliencyBinWangApr2014();
void setImagesize( int W, int H );
bool init();
protected:
bool computeSaliencyImpl( const InputArray image, OutputArray saliencyMap );
};
MotionSaliencyBinWangApr2014::MotionSaliencyBinWangApr2014
----------------------------------------------------------
Constructor
.. ocv:function:: MotionSaliencyBinWangApr2014::MotionSaliencyBinWangApr2014()
MotionSaliencyBinWangApr2014::setImagesize
------------------------------------------
This is a utility function that allows setting the correct size (taken from the input image) in the corresponding variables that will be used to size the data structures of the algorithm.
.. ocv:function:: void MotionSaliencyBinWangApr2014::setImagesize( int W, int H )
:param W: width of input image
:param H: height of input image
MotionSaliencyBinWangApr2014::init
-----------------------------------
This function allows the correct initialization of all data structures that will be used by the algorithm.
.. ocv:function:: bool MotionSaliencyBinWangApr2014::init()
MotionSaliencyBinWangApr2014::computeSaliencyImpl
-------------------------------------------------
Performs all the operations and calls all internal functions necessary for the accomplishment of the Fast Self-tuning Background Subtraction Algorithm.
.. ocv:function:: bool MotionSaliencyBinWangApr2014::computeSaliencyImpl( const InputArray image, OutputArray saliencyMap )
:param image: input image. According to the needs of this specialized algorithm, the param image is a single *Mat*.
:param saliencyMap: saliency map. It is a binarized map that, in accordance with the nature of the algorithm, highlights the moving objects or areas of change in the scene.
The saliency map is given by a single *Mat* (one for each frame of a hypothetical video stream).
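A hedged per-frame sketch using the interface above (frames are assumed single-channel; ``computeSaliency`` is the public entry point inherited from the Saliency base class):

.. code-block:: cpp

    cv::saliency::MotionSaliencyBinWangApr2014 ms;
    ms.setImagesize( frame.cols, frame.rows );   /* size the internal structures */
    ms.init();
    cv::Mat saliencyMap;                         /* binarized change mask */
    ms.computeSaliency( frame, saliencyMap );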

@ -1,82 +0,0 @@
Objectness Algorithms
============================
.. highlight:: cpp
Objectness is usually represented as a value which reflects how likely an image window covers an object of any category. Algorithms belonging to this category avoid making decisions early on by proposing a small number of category-independent proposals that are expected to cover all objects in an image. Being able to perceive objects before identifying them is closely related to bottom-up visual attention (saliency).
Presently, the Binarized normed gradients algorithm [BING]_ has been implemented.
.. [BING] Cheng, Ming-Ming, et al. "BING: Binarized normed gradients for objectness estimation at 300fps." IEEE CVPR. 2014.
ObjectnessBING
--------------
.. ocv:class:: ObjectnessBING
Implementation of BING from :ocv:class:`Objectness`::
class CV_EXPORTS ObjectnessBING : public Objectness
{
public:
ObjectnessBING();
~ObjectnessBING();
void read();
void write() const;
vector<float> getobjectnessValues();
void setTrainingPath( string trainingPath );
void setBBResDir( string resultsDir );
protected:
bool computeSaliencyImpl( const InputArray src, OutputArray dst );
};
ObjectnessBING::ObjectnessBING
------------------------------
Constructor
.. ocv:function:: ObjectnessBING::ObjectnessBING()
ObjectnessBING::getobjectnessValues
-----------------------------------
Return the list of the rectangles' objectness values, in the same order as the *vector<Vec4i> objectnessBoundingBox* returned by the algorithm (in the computeSaliencyImpl function).
The higher these scores are, the more likely the corresponding window is to cover an object.
.. ocv:function:: vector<float> ObjectnessBING::getobjectnessValues()
ObjectnessBING::setTrainingPath
--------------------------------
This is a utility function that allows setting the correct path from which the algorithm will load the trained model.
.. ocv:function:: void ObjectnessBING::setTrainingPath( string trainingPath )
:param trainingPath: trained model path
ObjectnessBING::setBBResDir
---------------------------
This is a utility function that allows setting an arbitrary path in which the algorithm will save the optional results
(i.e., writing to file the total number and the list of rectangles returned by objectness, one for each row).
.. ocv:function:: void ObjectnessBING::setBBResDir( string resultsDir )
:param resultsDir: results' folder path
ObjectnessBING::computeSaliencyImpl
-----------------------------------
Performs all the operations and calls all internal functions necessary for the accomplishment of the Binarized normed gradients algorithm.
.. ocv:function:: bool ObjectnessBING::computeSaliencyImpl( const InputArray image, OutputArray objectnessBoundingBox )
:param image: input image. According to the needs of this specialized algorithm, the param image is a single *Mat*
:param objectnessBoundingBox: objectness bounding box vector. According to the result given by this specialized algorithm, the objectnessBoundingBox is a *vector<Vec4i>*.
Each bounding box is represented by a *Vec4i* for (minX, minY, maxX, maxY).
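A hedged usage sketch (the training-model path is a placeholder for the trained model shipped with the module):

.. code-block:: cpp

    cv::saliency::ObjectnessBING bing;
    bing.setTrainingPath( "path/to/trained/model" );   /* placeholder path */
    std::vector<cv::Vec4i> boxes;                      /* (minX, minY, maxX, maxY) */
    bing.computeSaliency( image, boxes );
    std::vector<float> scores = bing.getobjectnessValues();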

@ -1,47 +0,0 @@
Saliency API
============
.. highlight:: cpp
Many computer vision applications may benefit from understanding where humans focus given a scene. Other than cognitively understanding the way human perceive images and scenes, finding salient regions and objects in the images helps various tasks such as speeding up object detection, object recognition, object tracking and content-aware image editing.
About saliency, there is a rich literature, but the development is very fragmented. The principal purpose of this API is to give a
unique interface, a unique framework in which to use and plug several saliency algorithms, even of very different nature and methodology, but sharing the same purpose, organizing the algorithms into three main categories:
**Static Saliency**: algorithms belonging to this category exploit different image features that allow detecting salient objects in non-dynamic scenarios.
**Motion Saliency**: algorithms belonging to this category are particularly focused on detecting salient objects over time (hence, over frames); a temporal component is taken into account, which allows detecting "moving" objects as salient, and therefore, in a more general sense, detecting changes in the scene.
**Objectness**: objectness is usually represented as a value which reflects how likely an image window covers an object of any category. Algorithms belonging to this category avoid making decisions early on by proposing a small number of category-independent proposals that are expected to cover all objects in an image. Being able to perceive objects before identifying them is closely related to bottom-up visual attention (saliency).
UML design:
-----------
**Saliency diagram**
.. image:: pics/saliency.png
:width: 80%
:alt: Saliency diagram
:align: center
To see how the API works, try the demo: https://github.com/fpuja/opencv_contrib/blob/saliencyModuleDevelop/modules/saliency/samples/computeSaliency.cpp
.. note:: This Saliency API has been designed with PlantUML. If you modify this API please change the UML files under modules/tracking/misc/
Saliency classes:
-----------------
.. toctree::
:maxdepth: 2
common_interfaces_saliency
saliency_categories
static_saliency_algorithms
motion_saliency_algorithms
objectness_algorithms

@ -1,72 +0,0 @@
Saliency categories
============================
.. highlight:: cpp
Base classes which give a general interface for each specialized type of saliency algorithm and provide utility methods for each algorithm in its class.
StaticSaliency
--------------
.. ocv:class:: StaticSaliency
StaticSaliency class::
class CV_EXPORTS StaticSaliency : public virtual Saliency
{
public:
bool computeBinaryMap( const Mat& saliencyMap, Mat& binaryMap );
protected:
virtual bool computeSaliencyImpl( const InputArray image, OutputArray saliencyMap ) = 0;
};
StaticSaliency::computeBinaryMap
--------------------------------
This function computes a binary map of the given saliency map. It is obtained as follows:
In a first step, to improve the definition of interest areas and facilitate the identification of targets, a segmentation
by clustering is performed, using the *K-means algorithm*. Then, to gain a binary representation of the clustered saliency map, since the values of the map can vary according to the characteristics of the frame under analysis, it is not convenient to use a fixed threshold.
So, *Otsu's algorithm* is used, which assumes that the image to be thresholded contains two classes of pixels, or a bi-modal histogram (e.g. foreground and background pixels); the algorithm then calculates the optimal threshold separating those two classes, so that their
intra-class variance is minimal.
.. ocv:function:: bool computeBinaryMap( const Mat& saliencyMap, Mat& binaryMap )
:param saliencyMap: the saliency map obtained through one of the specialized algorithms
:param binaryMap: the binary map
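A hedged sketch chaining a specialized static algorithm with the binarization above:

.. code-block:: cpp

    cv::saliency::StaticSaliencySpectralResidual spec;
    cv::Mat saliencyMap, binaryMap;
    if ( spec.computeSaliency( image, saliencyMap ) )
        spec.computeBinaryMap( saliencyMap, binaryMap );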
MotionSaliency
--------------
.. ocv:class:: MotionSaliency
MotionSaliency class::
class CV_EXPORTS MotionSaliency : public virtual Saliency
{
protected:
virtual bool computeSaliencyImpl( const InputArray image, OutputArray saliencyMap ) = 0;
};
Objectness
----------
.. ocv:class:: Objectness
Objectness class::
class CV_EXPORTS Objectness : public virtual Saliency
{
protected:
virtual bool computeSaliencyImpl( const InputArray image, OutputArray saliencyMap ) = 0;
};

@ -1,81 +0,0 @@
Static Saliency algorithms
============================
.. highlight:: cpp
Algorithms belonging to this category exploit different image features that allow detecting salient objects in non-dynamic scenarios.
Presently, the Spectral Residual approach [SR]_ has been implemented.
.. [SR] Hou, Xiaodi, and Liqing Zhang. "Saliency detection: A spectral residual approach." Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on. IEEE, 2007.
SpectralResidual
----------------
Starting from the principle of natural image statistics, this method simulates the behavior of pre-attentive visual search. The algorithm analyzes the log spectrum of each image and obtains the spectral residual. The spectral residual is then transformed to the spatial domain to obtain the saliency map, which suggests the positions of proto-objects.
.. ocv:class:: StaticSaliencySpectralResidual
Implementation of SpectralResidual from :ocv:class:`StaticSaliency`::
class CV_EXPORTS StaticSaliencySpectralResidual : public StaticSaliency
{
public:
StaticSaliencySpectralResidual();
~StaticSaliencySpectralResidual();
typedef Ptr<Size> (Algorithm::*SizeGetter)();
typedef void (Algorithm::*SizeSetter)( const Ptr<Size> & );
Ptr<Size> getWsize();
void setWsize( const Ptr<Size> &newSize );
void read( const FileNode& fn );
void write( FileStorage& fs ) const;
protected:
bool computeSaliencyImpl( const InputArray src, OutputArray dst );
};
StaticSaliencySpectralResidual::StaticSaliencySpectralResidual
--------------------------------------------------------------
Constructor
.. ocv:function:: StaticSaliencySpectralResidual::StaticSaliencySpectralResidual()
StaticSaliencySpectralResidual::getWsize
----------------------------------------
Return the resized image size.
.. ocv:function:: Ptr<Size> StaticSaliencySpectralResidual::getWsize()
StaticSaliencySpectralResidual::setWsize
----------------------------------------
Set the dimension to which the image should be resized.
.. ocv:function:: void StaticSaliencySpectralResidual::setWsize( const Ptr<Size> &newSize )
:param newSize: dimension to which the image should be resized
StaticSaliencySpectralResidual::computeSaliency
-----------------------------------------------
Performs all the operations and calls all internal functions necessary for the accomplishment of the spectral residual saliency map.
.. ocv:function:: bool StaticSaliencySpectralResidual::computeSaliency( const InputArray image, OutputArray saliencyMap )
:param image: input image. According to the needs of this specialized saliency algorithm, the param image is a single *Mat*
:param saliencyMap: saliency map. According to the result given by this specialized saliency algorithm, the saliency map is a single *Mat*
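A minimal usage sketch, combining ``computeSaliency`` with the ``computeBinaryMap`` utility described earlier (the header path, namespace and input file name are assumptions, not part of the documented interface):

.. code-block:: cpp

    #include <opencv2/saliency.hpp>    // assumed contrib header
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/highgui.hpp>

    using namespace cv;
    using namespace cv::saliency;      // assumed namespace

    int main()
    {
        Mat image = imread("input.jpg");          // hypothetical input
        StaticSaliencySpectralResidual saliency;
        Mat saliencyMap, binaryMap;
        if( saliency.computeSaliency( image, saliencyMap ) )
        {
            // frame-dependent saliency values are binarized with Otsu's method
            saliency.computeBinaryMap( saliencyMap, binaryMap );
            imshow( "Saliency", saliencyMap );
            imshow( "Binary map", binaryMap );
            waitKey();
        }
        return 0;
    }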

.. _surfacematching:
surface_matching. Surface Matching
**********************************
Introduction to Surface Matching
================================
Cameras and similar devices capable of sensing 3D structure are becoming more common. Thus, using depth and intensity information for matching 3D objects (or parts) is of crucial importance for computer vision. Applications range from industrial control to guiding everyday actions for visually impaired people. The task of recognition and pose estimation in range images aims to identify and localize a queried 3D free-form object by matching it to the acquired database.
From an industrial perspective, enabling robots to automatically locate and pick up randomly placed and oriented objects from a bin is an important challenge in factory automation, replacing tedious and heavy manual labor. A system should be able to recognize and locate objects with a predefined shape and estimate the position with the precision necessary for a gripping robot to pick it up. This is where vision guided robotics takes the stage. Similar tools are also capable of guiding robots (and even people) through unstructured environments, leading to automated navigation. These properties make 3D matching from point clouds a ubiquitous necessity. Within this context, I will now describe the OpenCV implementation of a 3D object recognition and pose estimation algorithm using 3D features.
Surface Matching Algorithm Through 3D Features
==============================================
The implemented algorithm for the 3D matching task is heavily based on [drost2010]_, one of the first and most practical methods presented in this area. The approach is composed of extracting 3D feature points randomly from depth images or generic point clouds, indexing them and later, at runtime, querying them efficiently. Only the 3D structure is considered, and a trivial hash table is used for feature queries.
While fully aware that the structure of a CAD model could be exploited for smarter point sampling, I will leave that aside here in order to preserve the generality of the method (typically, such algorithms do not require training on a CAD model; a point cloud is sufficient). Below is the outline of the entire algorithm:
.. image:: surface_matching/pics/outline.jpg
:scale: 75 %
:align: center
:alt: Outline of the Algorithm
As explained, the algorithm relies on the extraction and indexing of point pair features, which are defined as follows:
.. math:: \mathbf{F}(\mathbf{m}_1, \mathbf{m}_2) = (\|\mathbf{d}\|_2,\; \angle(\mathbf{n}_1,\mathbf{d}),\; \angle(\mathbf{n}_2,\mathbf{d}),\; \angle(\mathbf{n}_1,\mathbf{n}_2))

where :math:`\mathbf{m}_1` and :math:`\mathbf{m}_2` are two selected feature
points on the model (or scene), :math:`\mathbf{d}` is the difference
vector, and :math:`\mathbf{n}_1` and :math:`\mathbf{n}_2` are the normals at
:math:`\mathbf{m}_1` and :math:`\mathbf{m}_2`. During the training stage, this
vector is quantized and indexed. In the test stage, the same features are
extracted from the scene and compared to the database. With a few tricks
like separation of the rotational components, the pose estimation part
can also be made efficient (check the reference for more details). A
Hough-like voting and clustering is employed to estimate the object
pose. To cluster the poses, the raw pose hypotheses are sorted in decreasing order
of the number of votes. From the highest vote, a
new cluster is created. If the next pose hypothesis is close to one of
the existing clusters, the hypothesis is added to the cluster
and the cluster center is updated as the average of the pose
hypotheses within the cluster. If the next hypothesis is not
close to any of the clusters, it creates a new cluster. The
proximity testing is done with fixed thresholds in translation
and rotation. Distance computation and averaging for translation are performed in the 3D Euclidean space, while those
for rotation are performed using quaternion representation.
After clustering, the clusters are sorted in decreasing order
of the total number of votes which determines confidence of
the estimated poses.
This pose is further refined using ICP in order to obtain
the final pose.
The PPF presented above depends largely on a robust computation of angles
between 3D vectors. Even though not reported in the paper, the naive way
of doing this, :math:`\theta = \cos^{-1}(\mathbf{a} \cdot \mathbf{b})`, is
numerically unstable. A better way is to use the two-argument arctangent:

.. math::

    \angle(\mathbf{n}_1, \mathbf{n}_2) = \tan^{-1}(\|\mathbf{n}_1 \wedge \mathbf{n}_2\|_2,\; \mathbf{n}_1 \cdot \mathbf{n}_2)
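For illustration, a minimal sketch of this numerically stable angle computation (a hypothetical helper, not part of the module API):

.. code-block:: cpp

    #include <opencv2/core.hpp>
    #include <cmath>

    // angle between two 3D vectors via the two-argument arctangent:
    // ||a x b|| = |a||b| sin(theta)  and  a . b = |a||b| cos(theta)
    double angleBetween(const cv::Vec3d& a, const cv::Vec3d& b)
    {
        return std::atan2(cv::norm(a.cross(b)), a.dot(b));
    }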
Rough Computation of Object Pose Given PPF
============================================
Let me summarize the following notation:
- :math:`p^i_m`: :math:`i^{th}` point of the model (:math:`p^j_m`
accordingly)
- :math:`n^i_m`: Normal of the :math:`i^{th}` point of the model
(:math:`n^j_m` accordingly)
- :math:`p^i_s`: :math:`i^{th}` point of the scene (:math:`p^j_s`
accordingly)
- :math:`n^i_s`: Normal of the :math:`i^{th}` point of the scene
(:math:`n^j_s` accordingly)
- :math:`T_{m\rightarrow g}`: The transformation required to translate
:math:`p^i_m` to the origin and rotate its normal :math:`n^i_m` onto
the :math:`x`-axis.
- :math:`R_{m\rightarrow g}`: Rotational component of
:math:`T_{m\rightarrow g}`.
- :math:`t_{m\rightarrow g}`: Translational component of
:math:`T_{m\rightarrow g}`.
- :math:`(p^i_m)^{'}`: :math:`i^{th}` point of the model transformed by
:math:`T_{m\rightarrow g}`. (:math:`(p^j_m)^{'}` accordingly).
- :math:`{\bf{R_{m\rightarrow g}}}`: Axis angle representation of
rotation :math:`R_{m\rightarrow g}`.
- :math:`\theta_{m\rightarrow g}`: The angular component of the axis
angle representation :math:`{\bf{R_{m\rightarrow g}}}`.
The transformation in a point pair feature is computed by first finding the transformation :math:`T_{m\rightarrow g}` from the first point, and applying the same transformation to the second one. Transforming each point, together with its normal, to the ground plane leaves only a single angle to be determined when comparing with a new point pair.
We could now simply start writing
.. math::
(p^i_m)^{'} = T_{m\rightarrow g} p^i_m
where
.. math::
T_{m\rightarrow g} = -t_{m\rightarrow g}R_{m\rightarrow g}
Note that this is nothing but a stacked transformation. The translational component :math:`t_{m\rightarrow g}` reads
.. math::
t_{m\rightarrow g} = -R_{m\rightarrow g}p^i_m
and the rotational being
.. math::
\theta_{m\rightarrow g} = \cos^{-1}(n^i_m \cdot {\bf{x}})\\
{\bf{R_{m\rightarrow g}}} = n^i_m \wedge {\bf{x}}
in axis angle format. Note that bold refers to the vector form. After this transformation, the feature vectors of the model are registered onto the ground plane X and the angle with respect to :math:`x=0` is called :math:`\alpha_m`. Similarly, for the scene, it is called :math:`\alpha_s`.
Hough-like Voting Scheme
------------------------
As shown in the outline, PPFs (point pair features) are extracted from the model, quantized, stored in the hashtable and indexed during the training stage. At runtime, however, a similar operation is performed on the input scene, with the exception that this time a similarity lookup over the hashtable is performed instead of an insertion. This lookup also allows us to compute a transformation to the ground plane for the scene pairs. After this point, computing the rotational component of the pose reduces to computing the difference :math:`\alpha=\alpha_m-\alpha_s`. This component carries
the cue about the object pose. A Hough-like voting scheme is performed over the local model coordinates and :math:`\alpha`. The pose hypotheses receiving the highest votes for every scene point let us recover the object pose.
Source Code for PPF Matching
----------------------------
.. code-block:: cpp

    #include <opencv2/surface_matching.hpp>
    #include <iostream>

    using namespace cv;
    using namespace ppf_match_3d;

    // pc is the loaded point cloud of the model (Nx6)
    // and pcTest is a loaded point cloud of the scene (Mx6)
    PPF3DDetector detector(0.03, 0.05);
    detector.trainModel(pc);

    vector<Pose3DPtr> results;
    detector.match(pcTest, results, 1.0/10.0, 0.05);

    // print the resulting poses
    cout << "Poses: " << endl;
    for (size_t i = 0; i < results.size(); i++)
    {
        Pose3DPtr pose = results[i];
        cout << "Pose Result " << i << endl;
        pose->printPose();
    }
Pose Registration via ICP
=========================
The matching process terminates with the attainment of the pose.
However, due to multiple matching points, erroneous hypotheses, pose
averaging and so on, such a pose is very open to noise and is often far
from perfect. Although the visual results obtained at that stage
are pleasing, quantitative evaluation shows roughly 10 degrees of
variation (error), which is an acceptable level of matching. Often,
though, the requirement is set well beyond this margin and it is
desirable to refine the computed pose.
Furthermore, in typical RGBD scenes and point clouds, the 3D structure
captures less than half of the model due to visibility in the scene.
A robust pose refinement algorithm that can register occluded and
partially visible shapes quickly and correctly is therefore highly
desirable.
At this point, a trivial option would be to use the well-known iterative
closest point (ICP) algorithm. However, the basic ICP leads to
slow convergence, bad registration, outlier sensitivity and failure to
register partial shapes; it is thus not suited to the problem. For this
reason, many variants have been proposed, contributing to different
stages of the pose estimation process.
ICP is composed of six stages, and the improvements I propose for
each stage are summarized below.
Sampling
--------
To improve convergence speed and computation time, it is common to use
fewer points than the model actually has. However, sampling the correct
points to register is an issue in itself. The naive way would be to
sample uniformly and hope to get a reasonable subset. Smarter ways
try to identify the critical points, which are found to contribute
strongly to the registration process. Gelfand et al. exploit the
covariance matrix in order to constrain the eigenspace, so that a set of
points which affect both translation and rotation is used. This is a
clever way of subsampling, which is optionally used in the
implementation.
Correspondence Search
---------------------
As the name implies, this step assigns points in the data to points
in the model in a closest-point fashion. Correct
assignments lead to a correct pose, whereas wrong assignments
strongly degrade the result. In general, KD-trees are used in the
nearest-neighbor search to increase speed. However, this does not
guarantee optimality and often causes wrong points to be matched.
Luckily, the assignments are corrected over the iterations.
To overcome some of the limitations, Picky ICP [pickyicp]_ and BC-ICP (ICP using
bi-unique correspondences) are two well-known methods. Picky ICP first
finds the correspondences in the old-fashioned way and then among the
resulting corresponding pairs, if more than one scene point :math:`p_i`
is assigned to the same model point :math:`m_j`, it selects :math:`p_i`
that corresponds to the minimum distance. BC-ICP on the other hand,
allows multiple correspondences first and then resolves the assignments
by establishing bi-unique correspondences. It also defines a novel
no-correspondence outlier, which intrinsically eases the process of
identifying outliers.
For reference, both methods are used. Because Picky ICP is a bit faster,
with a not-so-significant performance drawback, it is the method of
choice for the refinement of correspondences.
Weighting of Pairs
------------------
In my implementation, I currently do not use a weighting scheme. But the
common approaches involve *normal compatibility*
(:math:`w_i=n^1_i\cdot n^2_j`) or assigning lower weights to point pairs
with greater distances
(:math:`w=1-\frac{||dist(m_i,s_i)||_2}{dist_{max}}`).
Rejection of Pairs
------------------
Rejection is done using dynamic thresholding based on a robust
estimate of the standard deviation. In other words, in each iteration I
compute the MAD (median absolute deviation) estimate of the standard
deviation, denoted :math:`mad_i`, and reject the pairs with distances
:math:`d_i > \tau\, mad_i`. Here :math:`\tau` is the rejection threshold,
set to :math:`3` by default. The weighting is applied prior to the Picky
refinement explained in the previous stage.
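A sketch of this rejection rule (a hypothetical helper; the constant 1.4826 makes the MAD a consistent estimator of the standard deviation under normality):

.. code-block:: cpp

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // keep[i] == true iff pair i survives the MAD-based rejection
    std::vector<bool> rejectPairs(const std::vector<double>& d, double tau = 3.0)
    {
        std::vector<double> tmp(d);
        std::nth_element(tmp.begin(), tmp.begin() + tmp.size() / 2, tmp.end());
        const double median = tmp[tmp.size() / 2];
        for (double& x : tmp) x = std::abs(x - median);   // absolute deviations
        std::nth_element(tmp.begin(), tmp.begin() + tmp.size() / 2, tmp.end());
        const double mad = 1.4826 * tmp[tmp.size() / 2];  // robust std. dev. estimate
        std::vector<bool> keep(d.size());
        for (size_t i = 0; i < d.size(); ++i)
            keep[i] = (d[i] <= tau * mad);
        return keep;
    }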
Error Metric
------------
A linearization of the point-to-plane error metric, as in [koklimlow]_, is
used. This both speeds up the registration process and improves convergence.
Minimization
------------
Even though many non-linear optimizers (such as Levenberg-Marquardt)
have been proposed, thanks to the linearization in the previous step, pose
estimation reduces to solving a linear system of equations. This is
done using ``cv::solve`` with the ``DECOMP_SVD`` option.
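For illustration, a sketch of solving such a system with ``cv::solve`` (the 6x6 size matches the linearized pose parameters; the matrix contents here are placeholders):

.. code-block:: cpp

    #include <opencv2/core.hpp>

    // A (6x6) and b (6x1) would be assembled from the linearized
    // point-to-plane constraints; placeholders here
    cv::Mat A = cv::Mat::eye(6, 6, CV_64F);
    cv::Mat b = cv::Mat::zeros(6, 1, CV_64F);

    cv::Mat x;  // solution: [alpha, beta, gamma, tx, ty, tz]
    cv::solve(A, b, x, cv::DECOMP_SVD);  // SVD tolerates near-singular systems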
ICP Algorithm
-------------
Having described the steps above, here I summarize the layout of the ICP
algorithm.
Efficient ICP Through Point Cloud Pyramids
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
While the variants proposed so far deal well with some outliers and
bad initializations, they require a significant number of iterations. A
multi-resolution scheme can reduce the number of iterations by
allowing the registration to start from a coarse level and propagate to
the finer levels. Such an approach improves both the accuracy
and the runtime.
The search is done through multiple levels, in a hierarchical fashion.
The registration starts with a very coarse subset of the model points;
the point set is then iteratively densified, and after each level
the previously estimated pose is used as the initial pose and refined
with ICP.
Visual Results
^^^^^^^^^^^^^^
Results on Synthetic Data
~~~~~~~~~~~~~~~~~~~~~~~~~
In all of the results, the pose is initiated by PPF and the rest is left as:
:math:`[\theta_x, \theta_y, \theta_z, t_x, t_y, t_z]=[0]`
Source Code for Pose Refinement Using ICP
-----------------------------------------
.. code-block:: cpp

    // using the previously declared pc (model) and pcTest (scene);
    // results holds the coarse poses obtained from PPF matching
    ICP icp(200, 0.001f, 2.5f, 8);

    // refine every pose contained in results via ICP registration
    icp.registerModelToScene(pc, pcTest, results);
    // results now contain the refined poses
Results
=======
This section is dedicated to the results of surface matching (point-pair-feature matching
and a following ICP refinement):
.. image:: surface_matching/pics/gsoc_forg_matches.jpg
:scale: 65 %
:align: center
:alt: Several matches of a single frog model using ppf + icp
Matches of different models from the Mian dataset are presented below:
.. image:: surface_matching/pics/snapshot27.jpg
:scale: 50 %
:align: center
:alt: Matches of different models for Mian dataset
You can also check out the video on `YouTube <http://www.youtube.com/watch?v=uFnqLFznuZU>`_.
.. raw:: html
<div align="center">
<iframe width="775" height="436" src="http://www.youtube.com/embed/uFnqLFznuZU?list=UUMSqZYDAmbiaAhyvLPJGhsg" frameborder="0" allowfullscreen></iframe>
</div>
A Complete Sample
=================
.. literalinclude:: surface_matching/src/ppf_load_match.cpp
:language: cpp
:linenos:
:tab-width: 4
Parameter Tuning
----------------
The surface matching module treats its parameters relative to the model diameter (diameter of the axis-parallel bounding box) whenever it can. This makes the parameters independent of the model size. This is why both the model and the scene cloud are subsampled such that all points have a minimum distance of :math:`RelativeSamplingStep \cdot DimensionRange`, where :math:`DimensionRange` is the extent along a given dimension. All three dimensions are sampled in a similar manner. For example, if :math:`RelativeSamplingStep` is set to 0.05 and the diameter of the model is 1m (1000mm), the points sampled from the object's surface will be approximately 50mm apart. From another point of view, if :math:`RelativeSamplingStep` is set to 0.05, at most :math:`20 \times 20 \times 20 = 8000` model points are generated (depending on how the model fills the volume). Consequently, this results in at most 8000x8000 pairs. In practice, because the models are not uniformly distributed over a rectangular prism, far fewer points are to be expected. Decreasing this value results in more model points and thus a more accurate representation. However, note that the number of point pair features to be computed then grows quadratically, as the complexity is O(N^2). This is especially a concern for 32-bit systems, where large models can easily overshoot the available memory. Typically, values in the range of 0.025 - 0.05 seem adequate for most applications; the default value is 0.03. (Note that this parameter differs from the one presented in [drost2010]_. In [drost2010]_ a uniform cuboid is used for quantization and the model diameter is used as the reference for sampling. In my implementation, the cuboid is a rectangular prism, each dimension is quantized independently, and the reference is not the diameter but the individual dimensions.)
It would be very wise to remove the outliers from the model and prepare an ideal model initially, because outliers directly affect the relative computations and degrade the matching accuracy.
During the runtime stage, the scene is again sampled by :math:`RelativeSamplingStep`, as described above. However, this time only a portion of the scene points are used as reference. This portion is controlled by the parameter :math:`RelativeSceneSampleStep`, where :math:`SceneSampleStep = (int)(1.0/RelativeSceneSampleStep)`. In other words, if :math:`RelativeSceneSampleStep = 1.0/5.0`, the subsampled scene will once again be uniformly sampled to 1/5 of the number of points. The maximum value of this parameter is 1, and increasing it also increases stability but decreases speed. Again, because of the initial scene-independent relative sampling, fine-tuning this parameter is not a big concern. It would only be an issue when the model shape occupies the volume uniformly, or when the model shape is condensed in a tiny region within the quantization volume (e.g. the octree representation would have too many empty cells).
:math:`RelativeDistanceStep` acts as the discretization step over the hash table. The point pair features are quantized to be mapped to the buckets of the hashtable. This discretization involves a multiplication and a cast to integer. Adjusting :math:`RelativeDistanceStep` in theory controls the collision rate. Note that more collisions in the hashtable result in less accurate estimations. Reducing this parameter increases the effect of quantization but starts to assign dissimilar point pairs to the same bins. Increasing it, however, weakens the ability to group similar pairs. Generally, because the training model points are selected uniformly during the sampling stage, with a distance controlled by :math:`RelativeSamplingStep`, :math:`RelativeDistanceStep` is expected to equal this value. Yet again, values in the range of 0.025 - 0.05 are sensible. This time, however, when the model is dense it is not advised to decrease this value. For noisy scenes, the value can be increased to improve the robustness of the matching against noisy points.
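As a rough illustration of these trade-offs, two detector configurations (the numeric values are examples, not recommendations):

.. code-block:: cpp

    #include <opencv2/surface_matching.hpp>

    // denser sampling: more model points, quadratically more pairs, higher accuracy
    cv::ppf_match_3d::PPF3DDetector fineDetector(0.025, 0.025);

    // coarser sampling: faster training and matching, coarser representation
    cv::ppf_match_3d::PPF3DDetector coarseDetector(0.05, 0.05);

    // at match time, every 5th subsampled scene point serves as a reference
    // (RelativeSceneSampleStep = 1.0/5.0):
    // fineDetector.match(pcTest, results, 1.0/5.0, 0.025);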
References
==========
.. [drost2010] B. Drost and S. Ilic. 3D Object Detection and Localization Using Multimodal Point Pair Features. Second Joint 3DIM/3DPVT Conference: 3D Imaging, Modeling, Processing, Visualization & Transmission (3DIMPVT), Zurich, Switzerland, October 2012.
.. [pickyicp] T. Zinsser, J. Schmidt and H. Niemann. A Refined ICP Algorithm for Robust 3-D Correspondence Estimation. Proceedings of the 2003 International Conference on Image Processing (ICIP 2003), IEEE.
.. [koklimlow] Kok-Lim Low. Linear Least-Squares Optimization for Point-to-Plane ICP Surface Registration. Technical Report TR04-004, Department of Computer Science, University of North Carolina at Chapel Hill, February 2004.

Scene Text Detection
====================
.. highlight:: cpp
Class-specific Extremal Regions for Scene Text Detection
--------------------------------------------------------
The scene text detection algorithm described below has been initially proposed by Lukás Neumann & Jiri Matas [Neumann12]. The main idea behind Class-specific Extremal Regions is similar to the MSER in that suitable Extremal Regions (ERs) are selected from the whole component tree of the image. However, this technique differs from MSER in that selection of suitable ERs is done by a sequential classifier trained for character detection, i.e. dropping the stability requirement of MSERs and selecting class-specific (not necessarily stable) regions.
The component tree of an image is constructed by thresholding by an increasing value step-by-step from 0 to 255 and then linking the obtained connected components from successive levels in a hierarchy by their inclusion relation:
.. image:: pics/component_tree.png
:width: 100%
The component tree may contain a huge number of regions, even for a very simple image as shown above. This number can easily reach the order of :math:`1 \times 10^6` regions for an average 1 megapixel image. In order to efficiently select suitable regions among all the ERs, the algorithm makes use of a sequential classifier with two differentiated stages.
In the first stage incrementally computable descriptors (area, perimeter, bounding box, and Euler number) are computed (in O(1)) for each region r and used as features for a classifier which estimates the class-conditional probability p(r|character). Only the ERs which correspond to a local maximum of the probability p(r|character) are selected (if their probability is above a global limit p_min and the difference between local maximum and local minimum is greater than a \delta_min value).
In the second stage, the ERs that passed the first stage are classified into character and non-character classes using more informative but also more computationally expensive features. (Hole area ratio, convex hull ratio, and the number of outer boundary inflexion points).
This ER filtering process is done in different single-channel projections of the input image in order to increase the character localization recall.
After the ER filtering is done on each input channel, character candidates must be grouped in high-level text blocks (i.e. words, text lines, paragraphs, ...). The opencv_text module implements two different grouping algorithms: the Exhaustive Search algorithm proposed in [Neumann11] for grouping horizontally aligned text, and the method proposed by Lluis Gomez and Dimosthenis Karatzas in [Gomez13][Gomez14] for grouping arbitrary oriented text (see :ocv:func:`erGrouping`).
To see the text detector at work, have a look at the textdetection demo: https://github.com/Itseez/opencv_contrib/blob/master/modules/text/samples/textdetection.cpp
.. [Neumann12] Neumann L., Matas J.: Real-Time Scene Text Localization and Recognition, CVPR 2012. The paper is available online at http://cmp.felk.cvut.cz/~neumalu1/neumann-cvpr2012.pdf
.. [Neumann11] Neumann L., Matas J.: Text Localization in Real-world Images using Efficiently Pruned Exhaustive Search, ICDAR 2011. The paper is available online at http://cmp.felk.cvut.cz/~neumalu1/icdar2011_article.pdf
.. [Gomez13] Gomez L. and Karatzas D.: Multi-script Text Extraction from Natural Scenes, ICDAR 2013. The paper is available online at http://158.109.8.37/files/GoK2013.pdf
.. [Gomez14] Gomez L. and Karatzas D.: A Fast Hierarchical Method for Multi-script and Arbitrary Oriented Scene Text Extraction, arXiv:1407.7504 [cs.CV]. The paper is available online at http://arxiv.org/abs/1407.7504
ERStat
------
.. ocv:struct:: ERStat
The ERStat structure represents a class-specific Extremal Region (ER).
An ER is a 4-connected set of pixels with all its grey-level values smaller than the values in its outer boundary. A class-specific ER is selected (using a classifier) from all the ER's in the component tree of the image. ::
struct CV_EXPORTS ERStat
{
public:
//! Constructor
explicit ERStat(int level = 256, int pixel = 0, int x = 0, int y = 0);
//! Destructor
~ERStat() { }
//! seed point and threshold (max grey-level value)
int pixel;
int level;
//! incrementally computable features
int area;
int perimeter;
int euler; //!< euler number
Rect rect; //!< bounding box
double raw_moments[2]; //!< order 1 raw moments to derive the centroid
double central_moments[3]; //!< order 2 central moments to construct the covariance matrix
std::deque<int> *crossings;//!< horizontal crossings
float med_crossings; //!< median of the crossings at three different height levels
//! 2nd stage features
float hole_area_ratio;
float convex_hull_ratio;
float num_inflexion_points;
//! probability that the ER belongs to the class we are looking for
double probability;
//! pointers preserving the tree structure of the component tree
ERStat* parent;
ERStat* child;
ERStat* next;
ERStat* prev;
};
MSERsToERStats
--------------
Converts MSER contours (vector<Point>) to ERStat regions.
.. ocv:function:: void MSERsToERStats(InputArray image, vector< vector<Point> > &contours, vector< vector<ERStat> > &regions)
:param image: Source image ``CV_8UC1`` from which the MSERs were extracted.
:param contours: Input vector with all the contours (vector<Point>).
:param regions: Output where the ERStat regions are stored.
It takes as input the contours provided by the OpenCV MSER feature detector and returns as output two vectors of ERStats. Because the MSER() output contains both MSER+ and MSER- regions in a single vector<Point>, the function separates them into two different vectors (as if the ERStats were extracted from two different channels).
An example of MSERsToERStats in use can be found in the text detection webcam_demo: https://github.com/Itseez/opencv_contrib/blob/master/modules/text/samples/webcam_demo.cpp
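A short sketch of the conversion (the use of ``MSER::create``/``detectRegions`` is an assumption about the feature2d API, and ``grey`` is a hypothetical ``CV_8UC1`` input):

.. code-block:: cpp

    #include <opencv2/features2d.hpp>
    #include <opencv2/text.hpp>

    using namespace cv;
    using namespace cv::text;

    // grey is a CV_8UC1 image loaded elsewhere
    std::vector<std::vector<Point> > contours;
    std::vector<Rect> bboxes;
    Ptr<MSER> mser = MSER::create();
    mser->detectRegions(grey, contours, bboxes);

    // one vector of ERStats per virtual channel (MSER+ and MSER-)
    std::vector<std::vector<ERStat> > regions;
    MSERsToERStats(grey, contours, regions);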
computeNMChannels
-----------------
Compute the different channels to be processed independently in the N&M algorithm [Neumann12].
.. ocv:function:: void computeNMChannels(InputArray _src, OutputArrayOfArrays _channels, int _mode = ERFILTER_NM_RGBLGrad)
:param _src: Source image. Must be RGB ``CV_8UC3``.
:param _channels: Output vector<Mat> where computed channels are stored.
:param _mode: Mode of operation. Currently the only available options are: **ERFILTER_NM_RGBLGrad** (used by default) and **ERFILTER_NM_IHSGrad**.
In the N&M algorithm, a combination of intensity (I), hue (H), saturation (S), and gradient magnitude (Grad) channels is used in order to obtain high localization recall. This implementation also provides an alternative combination of red (R), green (G), blue (B), lightness (L), and gradient magnitude (Grad).
ERFilter
--------
.. ocv:class:: ERFilter : public Algorithm
Base class for 1st and 2nd stages of Neumann and Matas scene text detection algorithm [Neumann12]. ::
class CV_EXPORTS ERFilter : public Algorithm
{
public:
//! callback with the classifier is made a class.
//! By doing it we hide SVM, Boost etc. Developers can provide their own classifiers
class CV_EXPORTS Callback
{
public:
virtual ~Callback() { }
//! The classifier must return probability measure for the region.
virtual double eval(const ERStat& stat) = 0;
};
/*!
the key method. Takes image on input and returns the selected regions in a vector of ERStat
only distinctive ERs which correspond to characters are selected by a sequential classifier
*/
virtual void run( InputArray image, std::vector<ERStat>& regions ) = 0;
(...)
};
ERFilter::Callback
------------------
The classifier callback is wrapped in a class. By doing so we hide the concrete classifier (SVM, Boost, etc.), and developers can provide their own classifiers to the ERFilter algorithm.
.. ocv:class:: ERFilter::Callback
ERFilter::Callback::eval
------------------------
The classifier must return a probability measure for the region.
.. ocv:function:: double ERFilter::Callback::eval(const ERStat& stat)
:param stat: The region to be classified
ERFilter::run
-------------
The key method of the ERFilter algorithm. It takes an image as input and returns the selected regions in a vector of ERStat; only distinctive ERs which correspond to characters are selected by a sequential classifier.
.. ocv:function:: void ERFilter::run( InputArray image, std::vector<ERStat>& regions )
:param image: Single channel image ``CV_8UC1``
:param regions: Output for the 1st stage and Input/Output for the 2nd. The selected Extremal Regions are stored here.
Extracts the component tree (if needed) and filters the extremal regions (ERs) using the given classifier.
createERFilterNM1
-----------------
Create an Extremal Region Filter for the 1st stage classifier of N&M algorithm [Neumann12].
.. ocv:function:: Ptr<ERFilter> createERFilterNM1( const Ptr<ERFilter::Callback>& cb, int thresholdDelta = 1, float minArea = 0.00025, float maxArea = 0.13, float minProbability = 0.4, bool nonMaxSuppression = true, float minProbabilityDiff = 0.1 )
:param cb: Callback with the classifier. The default classifier can be implicitly loaded with the function :ocv:func:`loadClassifierNM1`, e.g. from the file in samples/cpp/trained_classifierNM1.xml
:param thresholdDelta: Threshold step in subsequent thresholds when extracting the component tree
:param minArea: The minimum area (% of image size) allowed for retrieved ERs
:param maxArea: The maximum area (% of image size) allowed for retrieved ERs
:param minProbability: The minimum probability P(er|character) allowed for retrieved ERs
:param nonMaxSuppression: Whether non-maximum suppression is done over the branch probabilities
:param minProbabilityDiff: The minimum probability difference between local maxima and local minima ERs
The component tree of the image is extracted by a threshold increased step by step from 0 to 255, incrementally computable descriptors (aspect_ratio, compactness, number of holes, and number of horizontal crossings) are computed for each ER and used as features for a classifier which estimates the class-conditional probability P(er|character). The value of P(er|character) is tracked using the inclusion relation of ER across all thresholds and only the ERs which correspond to local maximum of the probability P(er|character) are selected (if the local maximum of the probability is above a global limit pmin and the difference between local maximum and local minimum is greater than minProbabilityDiff).
createERFilterNM2
-----------------
Create an Extremal Region Filter for the 2nd stage classifier of N&M algorithm [Neumann12].
.. ocv:function:: Ptr<ERFilter> createERFilterNM2( const Ptr<ERFilter::Callback>& cb, float minProbability = 0.3 )
:param cb: Callback with the classifier. The default classifier can be implicitly loaded with the function :ocv:func:`loadClassifierNM2`, e.g. from the file in samples/cpp/trained_classifierNM2.xml
:param minProbability: The minimum probability P(er|character) allowed for retrieved ERs
In the second stage, the ERs that passed the first stage are classified into character and non-character classes using more informative but also more computationally expensive features. The classifier uses all the features calculated in the first stage and the following additional features: hole area ratio, convex hull ratio, and number of outer inflexion points.
loadClassifierNM1
-----------------
Allows the default classifier to be implicitly loaded when creating an ERFilter object.
.. ocv:function:: Ptr<ERFilter::Callback> loadClassifierNM1(const std::string& filename)
:param filename: The XML or YAML file with the classifier model (e.g. trained_classifierNM1.xml)
Returns a pointer to ERFilter::Callback.
loadClassifierNM2
-----------------
Allows the default classifier to be implicitly loaded when creating an ERFilter object.
.. ocv:function:: Ptr<ERFilter::Callback> loadClassifierNM2(const std::string& filename)
:param filename: The XML or YAML file with the classifier model (e.g. trained_classifierNM2.xml)
Returns a pointer to ERFilter::Callback.
erGrouping
----------
Find groups of Extremal Regions that are organized as text blocks.
.. ocv:function:: void erGrouping(InputArray img, InputArrayOfArrays channels, std::vector<std::vector<ERStat> > &regions, std::vector<std::vector<Vec2i> > &groups, std::vector<Rect> &groups_rects, int method = ERGROUPING_ORIENTATION_HORIZ, const std::string& filename = std::string(), float minProbability = 0.5)
:param img: Original RGB or grayscale image from which the regions were extracted.
:param channels: Vector of single channel images CV_8UC1 from which the regions were extracted.
:param regions: Vector of ERs retrieved from the ERFilter algorithm from each channel.
:param groups: The output of the algorithm is stored in this parameter as a set of lists of indexes to the provided regions.
:param groups_rects: The output of the algorithm is stored in this parameter as a list of rectangles.
:param method: Grouping method (see the details below). Can be one of ``ERGROUPING_ORIENTATION_HORIZ``, ``ERGROUPING_ORIENTATION_ANY``.
:param filename: The XML or YAML file with the classifier model (e.g. samples/trained_classifier_erGrouping.xml). Only to use when grouping method is ``ERGROUPING_ORIENTATION_ANY``.
:param minProbability: The minimum probability for accepting a group. Only to use when grouping method is ``ERGROUPING_ORIENTATION_ANY``.
This function implements two different grouping algorithms:
* **ERGROUPING_ORIENTATION_HORIZ**
Exhaustive Search algorithm proposed in [Neumann11] for grouping horizontally aligned text. The algorithm models a verification function for all the possible ER sequences. The verification function for ER pairs consists of a set of threshold-based pairwise rules which compare measurements of two regions (height ratio, centroid angle, and region distance). The verification function for ER triplets creates a word text line estimate using Least Median-Squares fitting for a given triplet and then verifies that the estimate is valid (based on thresholds created during training). Verification functions for sequences larger than 3 are approximated by verifying that the text line parameters of all (sub)sequences of length 3 are consistent.
* **ERGROUPING_ORIENTATION_ANY**
Text grouping method proposed in [Gomez13][Gomez14] for grouping arbitrary oriented text. Regions are agglomerated by Single Linkage Clustering in a weighted feature space that combines proximity (x,y coordinates) and similarity measures (color, size, gradient magnitude, stroke width, etc.). SLC provides a dendrogram where each node represents a text group hypothesis. Then the algorithm finds the branches corresponding to text groups by traversing this dendrogram with a stopping rule that combines the output of a rotation invariant text group classifier and a probabilistic measure for hierarchical clustering validity assessment.
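Putting the pieces together, a condensed sketch of the whole detection pipeline, modelled on the textdetection sample linked above (the input path and classifier file names are placeholders):

.. code-block:: cpp

    #include <opencv2/text.hpp>
    #include <opencv2/imgcodecs.hpp>

    using namespace cv;
    using namespace cv::text;

    Mat src = imread("scene.jpg");   // placeholder input

    // channels processed independently (RGBLGrad by default)
    std::vector<Mat> channels;
    computeNMChannels(src, channels);

    // two-stage sequential classifier, run per channel
    std::vector<std::vector<ERStat> > regions(channels.size());
    Ptr<ERFilter> er_filter1 = createERFilterNM1(
            loadClassifierNM1("trained_classifierNM1.xml"),
            16, 0.00015f, 0.13f, 0.2f, true, 0.1f);
    Ptr<ERFilter> er_filter2 = createERFilterNM2(
            loadClassifierNM2("trained_classifierNM2.xml"), 0.5);
    for (size_t c = 0; c < channels.size(); c++)
    {
        er_filter1->run(channels[c], regions[c]);
        er_filter2->run(channels[c], regions[c]);
    }

    // group character candidates into text lines
    std::vector<std::vector<Vec2i> > groups;
    std::vector<Rect> groups_rects;
    erGrouping(src, channels, regions, groups, groups_rects,
               ERGROUPING_ORIENTATION_HORIZ);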

Scene Text Recognition
======================
.. highlight:: cpp
OCRTesseract
------------
.. ocv:class:: OCRTesseract : public BaseOCR
OCRTesseract class provides an interface with the tesseract-ocr API (v3.02.02) in C++. Notice that it is compiled only when tesseract-ocr is correctly installed.
.. note::
* (C++) An example of OCRTesseract recognition combined with scene text detection can be found at the end_to_end_recognition demo: https://github.com/Itseez/opencv_contrib/blob/master/modules/text/samples/end_to_end_recognition.cpp
* (C++) Another example of OCRTesseract recognition combined with scene text detection can be found at the webcam_demo: https://github.com/Itseez/opencv_contrib/blob/master/modules/text/samples/webcam_demo.cpp
OCRTesseract::create
--------------------
Creates an instance of the OCRTesseract class. Initializes Tesseract.
.. ocv:function:: Ptr<OCRTesseract> OCRTesseract::create(const char* datapath=NULL, const char* language=NULL, const char* char_whitelist=NULL, int oem=(int)tesseract::OEM_DEFAULT, int psmode=(int)tesseract::PSM_AUTO)
:param datapath: the name of the parent directory of tessdata ended with "/", or NULL to use the system's default directory.
:param language: an ISO 639-3 code or NULL will default to "eng".
:param char_whitelist: specifies the list of characters used for recognition. NULL defaults to "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".
:param oem: tesseract-ocr offers different OCR Engine Modes (OEM); by default tesseract::OEM_DEFAULT is used. See the tesseract-ocr API documentation for other possible values.
:param psmode: tesseract-ocr offers different Page Segmentation Modes (PSM); by default tesseract::PSM_AUTO (fully automatic layout analysis) is used. See the tesseract-ocr API documentation for other possible values.
OCRTesseract::run
-----------------
Recognize text using the tesseract-ocr API. Takes an image as input and returns the recognized text in the output_text parameter. Optionally also provides the Rects for the individual text elements found (e.g. words) and the list of those text elements with their confidence values.
.. ocv:function:: void OCRTesseract::run(Mat& image, string& output_text, vector<Rect>* component_rects=NULL, vector<string>* component_texts=NULL, vector<float>* component_confidences=NULL, int component_level=0)
:param image: Input image ``CV_8UC1`` or ``CV_8UC3``
:param output_text: Output text of the tesseract-ocr.
:param component_rects: If provided the method will output a list of Rects for the individual text elements found (e.g. words or text lines).
:param component_texts: If provided the method will output a list of text strings for the recognition of the individual text elements found (e.g. words or text lines).
:param component_confidences: If provided the method will output a list of confidence values for the recognition of individual text elements found (e.g. words or text lines).
:param component_level: ``OCR_LEVEL_WORD`` (by default), or ``OCR_LEVEL_TEXT_LINE``.
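A minimal recognition sketch (the input file name is a placeholder; default Tesseract settings are assumed):

.. code-block:: cpp

    #include <opencv2/text.hpp>
    #include <opencv2/imgcodecs.hpp>

    using namespace cv;
    using namespace cv::text;

    Mat image = imread("word.png");                  // placeholder input
    Ptr<OCRTesseract> ocr = OCRTesseract::create();  // system defaults

    std::string output_text;
    std::vector<Rect> boxes;
    std::vector<std::string> words;
    std::vector<float> confidences;
    ocr->run(image, output_text, &boxes, &words, &confidences, OCR_LEVEL_WORD);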
OCRHMMDecoder
-------------
.. ocv:class:: OCRHMMDecoder : public BaseOCR
OCRHMMDecoder class provides an interface for OCR using Hidden Markov Models.
.. note::
* (C++) An example on using OCRHMMDecoder recognition combined with scene text detection can be found at the webcam_demo sample: https://github.com/Itseez/opencv_contrib/blob/master/modules/text/samples/webcam_demo.cpp
OCRHMMDecoder::ClassifierCallback
---------------------------------
The callback with the character classifier is made a class. This hides the feature extractor and the classifier itself, so developers can write their own OCR code.
.. ocv:class:: OCRHMMDecoder::ClassifierCallback
The default character classifier and feature extractor can be loaded using the utility function ``loadOCRHMMClassifierNM`` and the KNN model provided in https://github.com/Itseez/opencv_contrib/blob/master/modules/text/samples/OCRHMM_knn_model_data.xml.gz.
OCRHMMDecoder::ClassifierCallback::eval
---------------------------------------
The character classifier must return the id of the predicted class, or a ranked list of class ids.
.. ocv:function:: void OCRHMMDecoder::ClassifierCallback::eval( InputArray image, std::vector<int>& out_class, std::vector<double>& out_confidence)
:param image: Input image ``CV_8UC1`` or ``CV_8UC3`` with a single letter.
:param out_class: The classifier returns the character class categorical label, or list of class labels, to which the input image corresponds.
:param out_confidence: The classifier returns the probability of the input image corresponding to each class in ``out_class``.
OCRHMMDecoder::create
---------------------
Creates an instance of the OCRHMMDecoder class. Initializes HMMDecoder.
.. ocv:function:: Ptr<OCRHMMDecoder> OCRHMMDecoder::create(const Ptr<OCRHMMDecoder::ClassifierCallback> classifier, const std::string& vocabulary, InputArray transition_probabilities_table, InputArray emission_probabilities_table, decoder_mode mode = OCR_DECODER_VITERBI)
:param classifier: The character classifier with built in feature extractor.
:param vocabulary: The language vocabulary (chars when ASCII English text). vocabulary.size() must be equal to the number of classes of the classifier.
:param transition_probabilities_table: Table with transition probabilities between character pairs. cols == rows == vocabulary.size().
:param emission_probabilities_table: Table with observation emission probabilities. cols == rows == vocabulary.size().
:param mode: HMM Decoding algorithm. Only ``OCR_DECODER_VITERBI`` is available for the moment (http://en.wikipedia.org/wiki/Viterbi_algorithm).
OCRHMMDecoder::run
------------------
Recognize text using HMM. Takes an image as input and returns the recognized text in the output_text parameter. Optionally also provides the Rects for the individual text elements found (e.g. words) and the list of those text elements with their confidence values.
.. ocv:function:: void OCRHMMDecoder::run(Mat& image, string& output_text, vector<Rect>* component_rects=NULL, vector<string>* component_texts=NULL, vector<float>* component_confidences=NULL, int component_level=0)
:param image: Input image ``CV_8UC1`` with a single text line (or word).
:param output_text: Output text. Most likely character sequence found by the HMM decoder.
:param component_rects: If provided the method will output a list of Rects for the individual text elements found (e.g. words).
:param component_texts: If provided the method will output a list of text strings for the recognition of the individual text elements found (e.g. words).
:param component_confidences: If provided the method will output a list of confidence values for the recognition of individual text elements found (e.g. words).
:param component_level: Only ``OCR_LEVEL_WORD`` is supported.
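A minimal decoding sketch (the vocabulary, probability tables and file names are placeholders; a real transition table would be estimated from a dictionary of the target language, as in the webcam_demo):

.. code-block:: cpp

    #include <opencv2/text.hpp>
    #include <opencv2/imgcodecs.hpp>

    using namespace cv;
    using namespace cv::text;

    // placeholder language model: uniform transitions, identity emissions
    std::string vocabulary = "abcdefghijklmnopqrstuvwxyz";
    int N = (int)vocabulary.size();
    Mat transition_p = Mat::ones(N, N, CV_64FC1) / N;
    Mat emission_p   = Mat::eye(N, N, CV_64FC1);

    Ptr<OCRHMMDecoder> ocr = OCRHMMDecoder::create(
            loadOCRHMMClassifierNM("OCRHMM_knn_model_data.xml"),
            vocabulary, transition_p, emission_p);

    Mat word_image = imread("word.png", IMREAD_GRAYSCALE);  // single text line
    std::string output;
    ocr->run(word_image, output);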
loadOCRHMMClassifierNM
----------------------
Allows the default character classifier to be implicitly loaded when creating an OCRHMMDecoder object.
.. ocv:function:: Ptr<OCRHMMDecoder::ClassifierCallback> loadOCRHMMClassifierNM(const std::string& filename)
:param filename: The XML or YAML file with the classifier model (e.g. OCRHMM_knn_model_data.xml)
The default classifier is based on the scene text recognition method proposed by Lukás Neumann & Jiri Matas in [Neumann11b]. Basically, the region (contour) in the input image is normalized to a fixed size, while retaining the centroid and aspect ratio, in order to extract a feature vector based on gradient orientations along the chain-code of its perimeter. Then, the region is classified using a KNN model trained with synthetic data of rendered characters with different standard font types.
.. [Neumann11b] Neumann L., Matas J.: Text Localization in Real-world Images using Efficiently Pruned Exhaustive Search, ICDAR 2011. The paper is available online at http://cmp.felk.cvut.cz/~neumalu1/icdar2011_article.pdf

******************************************
text. Scene Text Detection and Recognition
******************************************
.. highlight:: cpp
The opencv_text module provides different algorithms for text detection and recognition in natural scene images.
.. toctree::
:maxdepth: 2
erfilter
ocr

Common Interfaces of Tracker
============================
.. highlight:: cpp
Tracker : Algorithm
-------------------
.. ocv:class:: Tracker
Base abstract class for the long-term tracker::
class CV_EXPORTS_W Tracker : public virtual Algorithm
{
virtual ~Tracker();
bool init( const Mat& image, const Rect2d& boundingBox );
bool update( const Mat& image, Rect2d& boundingBox );
static Ptr<Tracker> create( const String& trackerType );
};
Tracker::init
-------------
Initialize the tracker with a known bounding box surrounding the target
.. ocv:function:: bool Tracker::init( const Mat& image, const Rect2d& boundingBox )
:param image: The initial frame
:param boundingBox: The initial bounding box
:return: True if initialization was successful, false otherwise
Tracker::update
---------------
Update the tracker, find the new most likely bounding box for the target
.. ocv:function:: bool Tracker::update( const Mat& image, Rect2d& boundingBox )
:param image: The current frame
:param boundingBox: The bounding box that represents the new target location if true was returned; not modified otherwise
:return: True means that the target was located; false means that the tracker could not locate the target in the current frame. Note that the latter *does not* imply that the tracker has failed; the target may indeed be missing from the frame (say, out of sight)
Tracker::create
---------------
Creates a tracker by its name.
.. ocv:function:: static Ptr<Tracker> Tracker::create( const String& trackerType )
:param trackerType: Tracker type
The following tracker types are supported:
* ``"MIL"`` -- :ocv:class:`TrackerMIL`
* ``"BOOSTING"`` -- :ocv:class:`TrackerBoosting`
Creating Own Tracker
--------------------
If you want to create a new tracker, here's what you have to do. First, decide on the name of the class for the tracker (to meet the existing style,
we suggest something with the prefix "tracker", e.g. trackerMIL, trackerBoosting) -- we shall refer to this choice as the "classname" in what follows. Also,
you should decide upon the name of the tracker, as it will be known to the user (the current style suggests using all capitals, say MIL or BOOSTING) --
we'll call it the "name".
* Declare your tracker in ``include/opencv2/tracking/tracker.hpp``.
Your tracker should inherit from :ocv:class:`Tracker` (please see the example below). You should declare the specialized ``Params``
structure, where you will probably want to put the data needed to initialize your tracker. Also don't forget to put the
BOILERPLATE_CODE(name,classname) macro inside the class declaration. That macro will generate the static ``createTracker()`` function, which
we'll talk about later. You should get something similar to ::
    class CV_EXPORTS_W TrackerMIL : public Tracker
    {
     public:
      struct CV_EXPORTS Params
      {
        Params();
        //parameters for sampler
        float samplerInitInRadius;    // radius for gathering positive instances during init
        int samplerInitMaxNegNum;     // # negative samples to use during init
        float samplerSearchWinSize;   // size of search window
        float samplerTrackInRadius;   // radius for gathering positive instances during tracking
        int samplerTrackMaxPosNum;    // # positive samples to use during tracking
        int samplerTrackMaxNegNum;    // # negative samples to use during tracking
        int featureSetNumFeatures;    // # features

        void read( const FileNode& fn );
        void write( FileStorage& fs ) const;
      };

      BOILERPLATE_CODE("MIL",TrackerMIL);
    };
Of course, you can also add any additional methods of your choice. It should be pointed out, however, that a constructor is not expected to be
declared, as creation should be done via the corresponding ``createTracker()`` method.
* In ``src/tracker.cpp`` file add BOILERPLATE_CODE(name,classname) line to the body of ``Tracker::create()`` method you will find there, like ::
Ptr<Tracker> Tracker::create( const String& trackerType )
{
BOILERPLATE_CODE("BOOSTING",TrackerBoosting);
BOILERPLATE_CODE("MIL",TrackerMIL);
return Ptr<Tracker>();
}
* Finally, you should implement the function with signature ::
Ptr<classname> classname::createTracker(const classname::Params &parameters){
...
}
That function can (and probably will) return a pointer to some derived class of "classname", which will probably have a real constructor.
Every tracker has three components: :ocv:class:`TrackerSampler`, :ocv:class:`TrackerFeatureSet` and :ocv:class:`TrackerModel`.
The first two are instantiated by the Tracker base class, while the last component is abstract, so you must implement your own TrackerModel.
TrackerSampler
..............
TrackerSampler is already instantiated, but you should define the sampling algorithm and add the classes (or a single class) to TrackerSampler.
You can choose one of the ready implementations, such as TrackerSamplerCSC, or implement your own sampling method; in that case
the class must inherit from :ocv:class:`TrackerSamplerAlgorithm`. Fill in the samplingImpl method so that it writes the result into the ``sample`` output argument.
Example of creating specialized TrackerSamplerAlgorithm ``TrackerSamplerCSC`` : ::
class CV_EXPORTS_W TrackerSamplerCSC : public TrackerSamplerAlgorithm
{
public:
TrackerSamplerCSC( const TrackerSamplerCSC::Params &parameters = TrackerSamplerCSC::Params() );
~TrackerSamplerCSC();
...
protected:
bool samplingImpl( const Mat& image, Rect boundingBox, std::vector<Mat>& sample );
...
};
Example of adding TrackerSamplerAlgorithm to TrackerSampler : ::
//sampler is the TrackerSampler
Ptr<TrackerSamplerAlgorithm> CSCSampler = new TrackerSamplerCSC( CSCparameters );
if( !sampler->addTrackerSamplerAlgorithm( CSCSampler ) )
return false;
//or add CSC sampler with default parameters
//sampler->addTrackerSamplerAlgorithm( "CSC" );
.. seealso::
:ocv:class:`TrackerSamplerCSC`, :ocv:class:`TrackerSamplerAlgorithm`
TrackerFeatureSet
.................
TrackerFeatureSet is already instantiated (as the first component), but you should define what kinds of features you'll use in your tracker.
You can use multiple feature types, so you can add a ready implementation such as :ocv:class:`TrackerFeatureHAAR` to your TrackerFeatureSet or develop your own.
In the latter case, put the code that extracts the features in the computeImpl method, and
optionally put the code for the refinement and selection of the features in the selection method.
Example of creating specialized TrackerFeature ``TrackerFeatureHAAR`` : ::
class CV_EXPORTS_W TrackerFeatureHAAR : public TrackerFeature
{
public:
TrackerFeatureHAAR( const TrackerFeatureHAAR::Params &parameters = TrackerFeatureHAAR::Params() );
~TrackerFeatureHAAR();
void selection( Mat& response, int npoints );
...
protected:
bool computeImpl( const std::vector<Mat>& images, Mat& response );
...
};
Example of adding TrackerFeature to TrackerFeatureSet : ::
//featureSet is the TrackerFeatureSet
Ptr<TrackerFeature> trackerFeature = new TrackerFeatureHAAR( HAARparameters );
featureSet->addTrackerFeature( trackerFeature );
.. seealso::
:ocv:class:`TrackerFeatureHAAR`, :ocv:class:`TrackerFeatureSet`
TrackerModel
............
TrackerModel is abstract, so in your implementation you must develop a TrackerModel that inherits from :ocv:class:`TrackerModel`.
Fill in the state-estimation method ``modelEstimationImpl``, which estimates the most likely target location;
see [AAM]_ table I (ME) for further information. Fill in ``modelUpdateImpl`` in order to update the model; see [AAM]_ table I (MU).
In this class you can use :c:type:`ConfidenceMap` and :c:type:`Trajectory` to store the model. The first represents the model over all
possible candidate states and the second represents the list of all estimated states.
Example of creating specialized TrackerModel ``TrackerMILModel`` : ::
class TrackerMILModel : public TrackerModel
{
public:
TrackerMILModel( const Rect& boundingBox );
~TrackerMILModel();
...
protected:
void modelEstimationImpl( const std::vector<Mat>& responses );
void modelUpdateImpl();
...
};
And add it in your Tracker : ::
bool TrackerMIL::initImpl( const Mat& image, const Rect2d& boundingBox )
{
...
//model is the general TrackerModel field of the general Tracker
model = new TrackerMILModel( boundingBox );
...
}
In the last step you should define the TrackerStateEstimator based on your implementation, or you can use one of the ready classes such as :ocv:class:`TrackerStateEstimatorMILBoosting`.
It represents the statistical part of the model and estimates the most likely target state.
Example of creating specialized TrackerStateEstimator ``TrackerStateEstimatorMILBoosting`` : ::
class CV_EXPORTS_W TrackerStateEstimatorMILBoosting : public TrackerStateEstimator
{
class TrackerMILTargetState : public TrackerTargetState
{
...
};
public:
TrackerStateEstimatorMILBoosting( int nFeatures = 250 );
~TrackerStateEstimatorMILBoosting();
...
protected:
Ptr<TrackerTargetState> estimateImpl( const std::vector<ConfidenceMap>& confidenceMaps );
void updateImpl( std::vector<ConfidenceMap>& confidenceMaps );
...
};
And add it in your TrackerModel : ::
//model is the TrackerModel of your Tracker
Ptr<TrackerStateEstimatorMILBoosting> stateEstimator = new TrackerStateEstimatorMILBoosting( params.featureSetNumFeatures );
model->setTrackerStateEstimator( stateEstimator );
.. seealso::
:ocv:class:`TrackerModel`, :ocv:class:`TrackerStateEstimatorMILBoosting`, :ocv:class:`TrackerTargetState`
During this step, you should define your TrackerTargetState based on your implementation. The :ocv:class:`TrackerTargetState` base class has only the bounding box (upper-left position, width and height); you can
enrich it by adding a scale factor, target rotation, etc.
Example of creating specialized TrackerTargetState ``TrackerMILTargetState`` : ::
class TrackerMILTargetState : public TrackerTargetState
{
public:
TrackerMILTargetState( const Point2f& position, int targetWidth, int targetHeight, bool foreground, const Mat& features );
~TrackerMILTargetState();
...
private:
bool isTarget;
Mat targetFeatures;
...
};
Try it
......
To try your tracker you can use the demo at https://github.com/lenlen/opencv/blob/tracking_api/samples/cpp/tracker.cpp.
The first argument is the name of the tracker and the second is a video source.

Common Interfaces of TrackerFeatureSet
======================================
.. highlight:: cpp
TrackerFeatureSet
-----------------
Class that manages the extraction and selection of features
[AAM]_ Feature Extraction and Feature Set Refinement (Feature Processing and Feature Selection). See table I and section III C
[AMVOT]_ Appearance modelling -> Visual representation (Table II, section 3.1 - 3.2)
.. ocv:class:: TrackerFeatureSet
TrackerFeatureSet class::
class CV_EXPORTS_W TrackerFeatureSet
{
public:
TrackerFeatureSet();
~TrackerFeatureSet();
void extraction( const std::vector<Mat>& images );
void selection();
void removeOutliers();
bool addTrackerFeature( String trackerFeatureType );
bool addTrackerFeature( Ptr<TrackerFeature>& feature );
const std::vector<std::pair<String, Ptr<TrackerFeature> > >& getTrackerFeature() const;
const std::vector<Mat>& getResponses() const;
};
TrackerFeatureSet is an aggregation of :ocv:class:`TrackerFeature`
.. seealso::
:ocv:class:`TrackerFeature`
TrackerFeatureSet::extraction
-----------------------------
Extract features from the images collection
.. ocv:function:: void TrackerFeatureSet::extraction( const std::vector<Mat>& images )
:param images: The input images
TrackerFeatureSet::selection
----------------------------
Identify most effective features for all feature types (optional)
.. ocv:function:: void TrackerFeatureSet::selection()
TrackerFeatureSet::removeOutliers
---------------------------------
Remove outliers for all feature types (optional)
.. ocv:function:: void TrackerFeatureSet::removeOutliers()
TrackerFeatureSet::addTrackerFeature
------------------------------------
Add a TrackerFeature to the collection. Returns true if the TrackerFeature is added, false otherwise
.. ocv:function:: bool TrackerFeatureSet::addTrackerFeature( String trackerFeatureType )
:param trackerFeatureType: The TrackerFeature name
.. ocv:function:: bool TrackerFeatureSet::addTrackerFeature( Ptr<TrackerFeature>& feature )
:param feature: The TrackerFeature class
The modes available now:
* ``"HAAR"`` -- Haar Feature-based
The modes that will be available soon:
* ``"HOG"`` -- Histogram of Oriented Gradients features
* ``"LBP"`` -- Local Binary Pattern features
* ``"FEATURE2D"`` -- All types of Feature2D
Example ``TrackerFeatureSet::addTrackerFeature`` : ::
//sample usage:
Ptr<TrackerFeature> trackerFeature = new TrackerFeatureHAAR( HAARparameters );
featureSet->addTrackerFeature( trackerFeature );
//or add CSC sampler with default parameters
//featureSet->addTrackerFeature( "HAAR" );
.. note:: If you use the second method, you must initialize the TrackerFeature
TrackerFeatureSet::getTrackerFeature
------------------------------------
Get the TrackerFeature collection (TrackerFeature name, TrackerFeature pointer)
.. ocv:function:: const std::vector<std::pair<String, Ptr<TrackerFeature> > >& TrackerFeatureSet::getTrackerFeature() const
TrackerFeatureSet::getResponses
-------------------------------
Get the responses
.. ocv:function:: const std::vector<Mat>& TrackerFeatureSet::getResponses() const
.. note:: Be sure to call extraction before getResponses
Example ``TrackerFeatureSet::getResponses`` : ::
//get the patches from sampler
std::vector<Mat> detectSamples = sampler->getSamples();
if( detectSamples.empty() )
return false;
//features extraction
featureSet->extraction( detectSamples );
//get responses
std::vector<Mat> response = featureSet->getResponses();
TrackerFeature
--------------
Abstract base class for TrackerFeature that represents the feature.
.. ocv:class:: TrackerFeature
TrackerFeature class::
class CV_EXPORTS_W TrackerFeature
{
public:
virtual ~TrackerFeature();
static Ptr<TrackerFeature> create( const String& trackerFeatureType );
void compute( const std::vector<Mat>& images, Mat& response );
virtual void selection( Mat& response, int npoints ) = 0;
String getClassName() const;
};
TrackerFeature::create
----------------------
Create TrackerFeature by tracker feature type
.. ocv:function:: static Ptr<TrackerFeature> TrackerFeature::create( const String& trackerFeatureType )
:param trackerFeatureType: The TrackerFeature name
The modes available now:
* ``"HAAR"`` -- Haar Feature-based
The modes that will be available soon:
* ``"HOG"`` -- Histogram of Oriented Gradients features
* ``"LBP"`` -- Local Binary Pattern features
* ``"FEATURE2D"`` -- All types of Feature2D
TrackerFeature::compute
-----------------------
Compute the features in the images collection
.. ocv:function:: void TrackerFeature::compute( const std::vector<Mat>& images, Mat& response )
:param images: The images
:param response: The output response
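Example ``TrackerFeature::compute`` (a minimal sketch; ``patches`` is assumed to be a collection of sampled patches, e.g. obtained from a :ocv:class:`TrackerSampler`) : ::

    //create the feature extractor by name
    Ptr<TrackerFeature> trackerFeature = TrackerFeature::create( "HAAR" );

    //compute one response per feature over the whole patch collection
    Mat response;
    trackerFeature->compute( patches, response );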
TrackerFeature::selection
-------------------------
Identify most effective features
.. ocv:function:: void TrackerFeature::selection( Mat& response, int npoints )
:param response: Collection of response for the specific TrackerFeature
:param npoints: Max number of features
.. note:: This method modifies the response parameter
TrackerFeature::getClassName
----------------------------
Get the name of the specific TrackerFeature
.. ocv:function:: String TrackerFeature::getClassName() const
Specialized TrackerFeature
==========================
Table I and section III C of [AAM]_ describe the best-known feature types. At the moment only :ocv:class:`TrackerFeatureHAAR` is implemented.
TrackerFeatureHAAR : TrackerFeature
-----------------------------------
TrackerFeature based on HAAR features, used by TrackerMIL and many other algorithms
.. ocv:class:: TrackerFeatureHAAR
TrackerFeatureHAAR class::
class CV_EXPORTS_W TrackerFeatureHAAR : TrackerFeature
{
public:
TrackerFeatureHAAR( const TrackerFeatureHAAR::Params &parameters = TrackerFeatureHAAR::Params() );
~TrackerFeatureHAAR();
void selection( Mat& response, int npoints );
bool extractSelected( const std::vector<int> selFeatures, const std::vector<Mat>& images, Mat& response );
std::vector<std::pair<float, float> >& getMeanSigmaPairs();
bool swapFeature( int source, int target );
bool swapFeature( int id, CvHaarEvaluator::FeatureHaar& feature );
CvHaarEvaluator::FeatureHaar& getFeatureAt( int id );
};
.. note:: HAAR features implementation is copied from apps/traincascade and modified according to MIL implementation
TrackerFeatureHAAR::Params
--------------------------
.. ocv:struct:: TrackerFeatureHAAR::Params
List of TrackerFeatureHAAR parameters::
struct CV_EXPORTS Params
{
Params();
int numFeatures; // # of rects
Size rectSize; // rect size
bool isIntegral; // true if input images are integral, false otherwise
};
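Example of setting ``TrackerFeatureHAAR::Params`` (a sketch with illustrative values) : ::

    TrackerFeatureHAAR::Params HAARparameters;
    HAARparameters.numFeatures = 250;          //number of Haar features
    HAARparameters.rectSize = Size( 30, 30 );  //size of the patches
    HAARparameters.isIntegral = false;         //patches are plain images
    Ptr<TrackerFeature> trackerFeature = new TrackerFeatureHAAR( HAARparameters );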
TrackerFeatureHAAR::TrackerFeatureHAAR
--------------------------------------
Constructor
.. ocv:function:: TrackerFeatureHAAR::TrackerFeatureHAAR( const TrackerFeatureHAAR::Params &parameters = TrackerFeatureHAAR::Params() )
:param parameters: TrackerFeatureHAAR parameters :ocv:struct:`TrackerFeatureHAAR::Params`
TrackerFeatureHAAR::selection
-----------------------------
Identify most effective features
.. ocv:function:: void TrackerFeatureHAAR::selection( Mat& response, int npoints )
:param response: Collection of response for the specific TrackerFeature
:param npoints: Max number of features
.. note:: This method modifies the response parameter
TrackerFeatureHAAR::extractSelected
-----------------------------------
Compute the features only for the selected indices in the images collection
.. ocv:function:: bool TrackerFeatureHAAR::extractSelected( const std::vector<int> selFeatures, const std::vector<Mat>& images, Mat& response )
:param selFeatures: indices of selected features
:param images: The images
:param response: Collection of response for the specific TrackerFeature
TrackerFeatureHAAR::getMeanSigmaPairs
-------------------------------------
Get the list of mean/sigma pairs.
.. ocv:function:: std::vector<std::pair<float, float> >& TrackerFeatureHAAR::getMeanSigmaPairs()
TrackerFeatureHAAR::swapFeature
-------------------------------
Swap the feature in position source with the feature in position target
.. ocv:function:: bool TrackerFeatureHAAR::swapFeature( int source, int target )
:param source: The source position
:param target: The target position
Swap the feature in position id with the feature input
.. ocv:function:: bool TrackerFeatureHAAR::swapFeature( int id, CvHaarEvaluator::FeatureHaar& feature )
:param id: The position
:param feature: The feature
TrackerFeatureHAAR::getFeatureAt
--------------------------------
Get the feature in position id
.. ocv:function:: CvHaarEvaluator::FeatureHaar& TrackerFeatureHAAR::getFeatureAt( int id )
:param id: The position
TrackerFeatureHOG
-----------------
TODO To be implemented
TrackerFeatureLBP
-----------------
TODO To be implemented
TrackerFeatureFeature2d
-----------------------
TODO To be implemented

@ -1,506 +0,0 @@
Common Interfaces of TrackerModel
=================================
.. highlight:: cpp
ConfidenceMap
-------------
Represents the model of the target at frame :math:`k` (all states and scores)
[AAM]_ The set of the pair :math:`\langle \hat{x}^{i}_{k}, C^{i}_{k} \rangle`
.. c:type:: ConfidenceMap
ConfidenceMap::
typedef std::vector<std::pair<Ptr<TrackerTargetState>, float> > ConfidenceMap;
.. seealso::
:ocv:class:`TrackerTargetState`
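Example of filling a ``ConfidenceMap`` (a sketch; in practice the states are created by a specialized tracker, here a plain :ocv:class:`TrackerTargetState` is used for illustration) : ::

    ConfidenceMap confidenceMap;

    Ptr<TrackerTargetState> candidate = new TrackerTargetState();
    candidate->setTargetPosition( Point2f( 10, 10 ) );

    float score = 0.8f; //confidence of this state candidate
    confidenceMap.push_back( std::make_pair( candidate, score ) );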
Trajectory
----------
Represents the estimate states for all frames
[AAM]_ :math:`x_{k}` is the trajectory of the target up to time :math:`k`
.. c:type:: Trajectory
Trajectory::
typedef std::vector<Ptr<TrackerTargetState> > Trajectory;
.. seealso::
:ocv:class:`TrackerTargetState`
TrackerTargetState
------------------
Abstract base class for TrackerTargetState that represents a possible state of the target.
[AAM]_ :math:`\hat{x}^{i}_{k}` all the states candidates.
Inherit this class to implement your own target state
.. ocv:class:: TrackerTargetState
TrackerTargetState class::
class CV_EXPORTS_W TrackerTargetState
{
public:
virtual ~TrackerTargetState(){};
Point2f getTargetPosition() const;
void setTargetPosition( const Point2f& position );
int getTargetWidth() const;
void setTargetWidth( int width );
int getTargetHeight() const;
void setTargetHeight( int height );
};
In your own implementation you can add scale variation, width, height, orientation, etc.
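Example of a custom target state (a sketch showing how a specialized tracker can extend the state, here with a hypothetical scale field) : ::

    class MyTargetState : public TrackerTargetState
    {
    public:
      MyTargetState( const Point2f& position, float scale )
      {
        setTargetPosition( position );
        targetScale = scale;
      }
      float getTargetScale() const { return targetScale; }
    private:
      float targetScale; //extra state information added by this tracker
    };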
TrackerStateEstimator
---------------------
Abstract base class for TrackerStateEstimator that estimates the most likely target state.
[AAM]_ State estimator
[AMVOT]_ Statistical modeling (Fig. 3), Table III (generative) - IV (discriminative) - V (hybrid)
.. ocv:class:: TrackerStateEstimator
TrackerStateEstimator class::
class CV_EXPORTS_W TrackerStateEstimator
{
public:
virtual ~TrackerStateEstimator();
static Ptr<TrackerStateEstimator> create( const String& trackeStateEstimatorType );
Ptr<TrackerTargetState> estimate( const std::vector<ConfidenceMap>& confidenceMaps );
void update( std::vector<ConfidenceMap>& confidenceMaps );
String getClassName() const;
};
TrackerStateEstimator::create
-----------------------------
Create TrackerStateEstimator by tracker state estimator type
.. ocv:function:: static Ptr<TrackerStateEstimator> TrackerStateEstimator::create( const String& trackeStateEstimatorType )
:param trackeStateEstimatorType: The TrackerStateEstimator name
The modes available now:
* ``"BOOSTING"`` -- Boosting-based discriminative appearance models. See [AMVOT]_ section 4.4
The modes available soon:
* ``"SVM"`` -- SVM-based discriminative appearance models. See [AMVOT]_ section 4.5
TrackerStateEstimator::estimate
-------------------------------
Estimate the most likely target state and return it
.. ocv:function:: Ptr<TrackerTargetState> TrackerStateEstimator::estimate( const std::vector<ConfidenceMap>& confidenceMaps )
:param confidenceMaps: The overall appearance model as a list of :c:type:`ConfidenceMap`
TrackerStateEstimator::update
-----------------------------
Update the ConfidenceMap with the scores
.. ocv:function:: void TrackerStateEstimator::update( std::vector<ConfidenceMap>& confidenceMaps )
:param confidenceMaps: The overall appearance model as a list of :c:type:`ConfidenceMap`
TrackerStateEstimator::getClassName
-----------------------------------
Get the name of the specific TrackerStateEstimator
.. ocv:function:: String TrackerStateEstimator::getClassName() const
TrackerModel
------------
Abstract class that represents the model of the target. It must be instantiated by a specialized tracker
[AAM]_ :math:`A_{k}`
Inherit this class to implement your own TrackerModel
.. ocv:class:: TrackerModel
TrackerModel class::
class CV_EXPORTS_W TrackerModel
{
public:
TrackerModel();
virtual ~TrackerModel();
void modelEstimation( const std::vector<Mat>& responses );
void modelUpdate();
bool runStateEstimator();
bool setTrackerStateEstimator( Ptr<TrackerStateEstimator> trackerStateEstimator );
void setLastTargetState( const Ptr<TrackerTargetState>& lastTargetState );
Ptr<TrackerTargetState> getLastTargetState() const;
const std::vector<ConfidenceMap>& getConfidenceMaps() const;
const ConfidenceMap& getLastConfidenceMap() const;
Ptr<TrackerStateEstimator> getTrackerStateEstimator() const;
};
TrackerModel::modelEstimation
-----------------------------
Estimate the most likely target location
[AAM]_ ME, Model Estimation table I
.. ocv:function:: void TrackerModel::modelEstimation( const std::vector<Mat>& responses )
:param responses: Features extracted from :ocv:class:`TrackerFeatureSet`
TrackerModel::modelUpdate
-------------------------
Update the model
[AAM]_ MU, Model Update table I
.. ocv:function:: void TrackerModel::modelUpdate()
TrackerModel::runStateEstimator
-------------------------------
Run the TrackerStateEstimator. Returns true if it is possible to estimate a new state, false otherwise
.. ocv:function:: bool TrackerModel::runStateEstimator()
TrackerModel::setTrackerStateEstimator
--------------------------------------
Set the TrackerStateEstimator. Returns true if the tracker state estimator is added, false otherwise
.. ocv:function:: bool TrackerModel::setTrackerStateEstimator( Ptr<TrackerStateEstimator> trackerStateEstimator )
:param trackerStateEstimator: The :ocv:class:`TrackerStateEstimator`
.. note:: You can add only one :ocv:class:`TrackerStateEstimator`
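Example ``TrackerModel::setTrackerStateEstimator`` (a sketch; ``model`` is assumed to be a concrete :ocv:class:`TrackerModel` created by a specialized tracker) : ::

    Ptr<TrackerStateEstimator> stateEstimator = TrackerStateEstimator::create( "BOOSTING" );
    if( !model->setTrackerStateEstimator( stateEstimator ) )
      return false; //an estimator has already been set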
TrackerModel::setLastTargetState
--------------------------------
Set the current :ocv:class:`TrackerTargetState` in the :c:type:`Trajectory`
.. ocv:function:: void TrackerModel::setLastTargetState( const Ptr<TrackerTargetState>& lastTargetState )
:param lastTargetState: The current :ocv:class:`TrackerTargetState`
TrackerModel::getLastTargetState
--------------------------------
Get the last :ocv:class:`TrackerTargetState` from :c:type:`Trajectory`
.. ocv:function:: Ptr<TrackerTargetState> TrackerModel::getLastTargetState() const
TrackerModel::getConfidenceMaps
-------------------------------
Get the list of the :c:type:`ConfidenceMap`
.. ocv:function:: const std::vector<ConfidenceMap>& TrackerModel::getConfidenceMaps() const
TrackerModel::getLastConfidenceMap
----------------------------------
Get the last :c:type:`ConfidenceMap` for the current frame
.. ocv:function:: const ConfidenceMap& TrackerModel::getLastConfidenceMap() const
TrackerModel::getTrackerStateEstimator
--------------------------------------
Get the :ocv:class:`TrackerStateEstimator`
.. ocv:function:: Ptr<TrackerStateEstimator> TrackerModel::getTrackerStateEstimator() const
Specialized TrackerStateEstimator
=================================
The best-known statistical models are described in [AMVOT]_: statistical modeling (Fig. 3), Table III (generative), Table IV (discriminative) and Table V (hybrid).
At the moment :ocv:class:`TrackerStateEstimatorMILBoosting` and :ocv:class:`TrackerStateEstimatorAdaBoosting` are implemented.
TrackerStateEstimatorMILBoosting : TrackerStateEstimator
--------------------------------------------------------
TrackerStateEstimator based on Boosting
.. ocv:class:: TrackerStateEstimatorMILBoosting
TrackerStateEstimatorMILBoosting class::
class CV_EXPORTS_W TrackerStateEstimatorMILBoosting : public TrackerStateEstimator
{
public:
class TrackerMILTargetState : public TrackerTargetState
{
...
};
TrackerStateEstimatorMILBoosting( int nFeatures = 250 );
~TrackerStateEstimatorMILBoosting();
void setCurrentConfidenceMap( ConfidenceMap& confidenceMap );
};
TrackerMILTargetState : TrackerTargetState
------------------------------------------
Implementation of the target state for TrackerMILTargetState
.. ocv:class:: TrackerMILTargetState
TrackerMILTargetState class::
class TrackerMILTargetState : public TrackerTargetState
{
public:
TrackerMILTargetState( const Point2f& position, int targetWidth, int targetHeight, bool foreground, const Mat& features );
~TrackerMILTargetState(){};
void setTargetFg( bool foreground );
void setFeatures( const Mat& features );
bool isTargetFg() const;
Mat getFeatures() const;
};
TrackerStateEstimatorMILBoosting::TrackerMILTargetState::setTargetFg
--------------------------------------------------------------------
Set label: true for target foreground, false for background
.. ocv:function:: void TrackerStateEstimatorMILBoosting::TrackerMILTargetState::setTargetFg( bool foreground )
:param foreground: Label for background/foreground
TrackerStateEstimatorMILBoosting::TrackerMILTargetState::setFeatures
--------------------------------------------------------------------
Set the features extracted from :ocv:class:`TrackerFeatureSet`
.. ocv:function:: void TrackerStateEstimatorMILBoosting::TrackerMILTargetState::setFeatures( const Mat& features )
:param features: The features extracted
TrackerStateEstimatorMILBoosting::TrackerMILTargetState::isTargetFg
-------------------------------------------------------------------
Get the label. Return true for target foreground, false for background
.. ocv:function:: bool TrackerStateEstimatorMILBoosting::TrackerMILTargetState::isTargetFg() const
TrackerStateEstimatorMILBoosting::TrackerMILTargetState::getFeatures
--------------------------------------------------------------------
Get the features extracted
.. ocv:function:: Mat TrackerStateEstimatorMILBoosting::TrackerMILTargetState::getFeatures() const
TrackerStateEstimatorMILBoosting::TrackerStateEstimatorMILBoosting
------------------------------------------------------------------
Constructor
.. ocv:function:: TrackerStateEstimatorMILBoosting::TrackerStateEstimatorMILBoosting( int nFeatures=250 )
:param nFeatures: Number of features for each sample
TrackerStateEstimatorMILBoosting::setCurrentConfidenceMap
---------------------------------------------------------
Set the current confidenceMap
.. ocv:function:: void TrackerStateEstimatorMILBoosting::setCurrentConfidenceMap( ConfidenceMap& confidenceMap )
:param confidenceMap: The current :c:type:`ConfidenceMap`
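Example of the estimate/update cycle with ``TrackerStateEstimatorMILBoosting`` (a sketch; ``confidenceMaps`` is assumed to hold the per-frame maps built by the model) : ::

    Ptr<TrackerStateEstimatorMILBoosting> estimator = new TrackerStateEstimatorMILBoosting( 250 );
    estimator->setCurrentConfidenceMap( confidenceMaps.back() );

    Ptr<TrackerTargetState> state = estimator->estimate( confidenceMaps );
    estimator->update( confidenceMaps );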
TrackerStateEstimatorAdaBoosting : TrackerStateEstimator
--------------------------------------------------------
TrackerStateEstimator based on AdaBoost
.. ocv:class:: TrackerStateEstimatorAdaBoosting
TrackerStateEstimatorAdaBoosting class::
class CV_EXPORTS_W TrackerStateEstimatorAdaBoosting : public TrackerStateEstimator
{
public:
class TrackerAdaBoostingTargetState : public TrackerTargetState
{
...
};
TrackerStateEstimatorAdaBoosting( int numClassifer, int initIterations, int nFeatures, Size patchSize, const Rect& ROI, const std::vector<std::pair<float, float> >& meanSigma );
~TrackerStateEstimatorAdaBoosting();
Rect getSampleROI() const;
void setSampleROI( const Rect& ROI );
void setCurrentConfidenceMap( ConfidenceMap& confidenceMap );
std::vector<int> computeSelectedWeakClassifier();
std::vector<int> computeReplacedClassifier();
std::vector<int> computeSwappedClassifier();
void setMeanSigmaPair( const std::vector<std::pair<float, float> >& meanSigmaPair );
};
TrackerAdaBoostingTargetState : TrackerTargetState
--------------------------------------------------
Implementation of the target state for TrackerAdaBoostingTargetState
.. ocv:class:: TrackerAdaBoostingTargetState
TrackerAdaBoostingTargetState class::
class TrackerAdaBoostingTargetState : public TrackerTargetState
{
public:
TrackerAdaBoostingTargetState( const Point2f& position, int width, int height, bool foreground, const Mat& responses );
~TrackerAdaBoostingTargetState(){};
void setTargetResponses( const Mat& responses );
void setTargetFg( bool foreground );
Mat getTargetResponses() const;
bool isTargetFg() const;
};
TrackerStateEstimatorAdaBoosting::TrackerAdaBoostingTargetState::setTargetFg
----------------------------------------------------------------------------
Set label: true for target foreground, false for background
.. ocv:function:: void TrackerStateEstimatorAdaBoosting::TrackerAdaBoostingTargetState::setTargetFg( bool foreground )
:param foreground: Label for background/foreground
TrackerStateEstimatorAdaBoosting::TrackerAdaBoostingTargetState::setTargetResponses
-----------------------------------------------------------------------------------
Set the features extracted from :ocv:class:`TrackerFeatureSet`
.. ocv:function:: void TrackerStateEstimatorAdaBoosting::TrackerAdaBoostingTargetState::setTargetResponses( const Mat& responses )
:param responses: The features extracted
TrackerStateEstimatorAdaBoosting::TrackerAdaBoostingTargetState::isTargetFg
---------------------------------------------------------------------------
Get the label. Return true for target foreground, false for background
.. ocv:function:: bool TrackerStateEstimatorAdaBoosting::TrackerAdaBoostingTargetState::isTargetFg() const
TrackerStateEstimatorAdaBoosting::TrackerAdaBoostingTargetState::getTargetResponses
-----------------------------------------------------------------------------------
Get the features extracted
.. ocv:function:: Mat TrackerStateEstimatorAdaBoosting::TrackerAdaBoostingTargetState::getTargetResponses() const
TrackerStateEstimatorAdaBoosting::TrackerStateEstimatorAdaBoosting
------------------------------------------------------------------
Constructor
.. ocv:function:: TrackerStateEstimatorAdaBoosting::TrackerStateEstimatorAdaBoosting( int numClassifer, int initIterations, int nFeatures, Size patchSize, const Rect& ROI, const std::vector<std::pair<float, float> >& meanSigma )
:param numClassifer: Number of base classifiers
:param initIterations: Number of iterations in the initialization
:param nFeatures: Number of features/weak classifiers
:param patchSize: tracking rect
:param ROI: initial ROI
:param meanSigma: pairs of mean/sigma
TrackerStateEstimatorAdaBoosting::setCurrentConfidenceMap
---------------------------------------------------------
Set the current confidenceMap
.. ocv:function:: void TrackerStateEstimatorAdaBoosting::setCurrentConfidenceMap( ConfidenceMap& confidenceMap )
:param confidenceMap: The current :c:type:`ConfidenceMap`
TrackerStateEstimatorAdaBoosting::getSampleROI
----------------------------------------------
Get the sampling ROI
.. ocv:function:: Rect TrackerStateEstimatorAdaBoosting::getSampleROI() const
TrackerStateEstimatorAdaBoosting::setSampleROI
----------------------------------------------
Set the sampling ROI
.. ocv:function:: void TrackerStateEstimatorAdaBoosting::setSampleROI( const Rect& ROI )
:param ROI: the sampling ROI
TrackerStateEstimatorAdaBoosting::computeSelectedWeakClassifier
---------------------------------------------------------------
Get the list of the selected weak classifiers for the classification step
.. ocv:function:: std::vector<int> TrackerStateEstimatorAdaBoosting::computeSelectedWeakClassifier()
TrackerStateEstimatorAdaBoosting::computeReplacedClassifier
-----------------------------------------------------------
Get the list of the weak classifiers that should be replaced
.. ocv:function:: std::vector<int> TrackerStateEstimatorAdaBoosting::computeReplacedClassifier()
TrackerStateEstimatorAdaBoosting::computeSwappedClassifier
----------------------------------------------------------
Get the list of the weak classifiers that replace those to be replaced
.. ocv:function:: std::vector<int> TrackerStateEstimatorAdaBoosting::computeSwappedClassifier()
TrackerStateEstimatorAdaBoosting::setMeanSigmaPair
--------------------------------------------------
Set the mean/sigma to instantiate possibly new classifiers
.. ocv:function:: void TrackerStateEstimatorAdaBoosting::setMeanSigmaPair( const std::vector<std::pair<float, float> >& meanSigmaPair )
:param meanSigmaPair: the mean/sigma pairs

@ -1,347 +0,0 @@
Common Interfaces of TrackerSampler
===================================
.. highlight:: cpp
TrackerSampler
--------------
Class that manages the sampler in order to select regions for updating the model of the tracker
[AAM]_ Sampling and Labeling. See table I and section III B
.. ocv:class:: TrackerSampler
TrackerSampler class::
class CV_EXPORTS_W TrackerSampler
{
public:
TrackerSampler();
~TrackerSampler();
void sampling( const Mat& image, Rect boundingBox );
const std::vector<std::pair<String, Ptr<TrackerSamplerAlgorithm> > >& getSamplers() const;
const std::vector<Mat>& getSamples() const;
bool addTrackerSamplerAlgorithm( String trackerSamplerAlgorithmType );
bool addTrackerSamplerAlgorithm( Ptr<TrackerSamplerAlgorithm>& sampler );
};
TrackerSampler is an aggregation of :ocv:class:`TrackerSamplerAlgorithm`
.. seealso::
:ocv:class:`TrackerSamplerAlgorithm`
TrackerSampler::sampling
------------------------
Computes the regions starting from a position in an image
.. ocv:function:: void TrackerSampler::sampling( const Mat& image, Rect boundingBox )
:param image: The current frame
:param boundingBox: The bounding box from which regions can be calculated
TrackerSampler::getSamplers
---------------------------
Return the collection of the :ocv:class:`TrackerSamplerAlgorithm`
.. ocv:function:: const std::vector<std::pair<String, Ptr<TrackerSamplerAlgorithm> > >& TrackerSampler::getSamplers() const
TrackerSampler::getSamples
--------------------------
Return the samples from all :ocv:class:`TrackerSamplerAlgorithm`, [AAM]_ Fig. 1 variable Sk
.. ocv:function:: const std::vector<Mat>& TrackerSampler::getSamples() const
TrackerSampler::addTrackerSamplerAlgorithm
------------------------------------------
Add a TrackerSamplerAlgorithm to the collection.
Returns true if the sampler is added, false otherwise
.. ocv:function:: bool TrackerSampler::addTrackerSamplerAlgorithm( String trackerSamplerAlgorithmType )
:param trackerSamplerAlgorithmType: The TrackerSamplerAlgorithm name
.. ocv:function:: bool TrackerSampler::addTrackerSamplerAlgorithm( Ptr<TrackerSamplerAlgorithm>& sampler )
:param sampler: The TrackerSamplerAlgorithm class
The modes available now:
* ``"CSC"`` -- Current State Center
* ``"CS"`` -- Current State
* ``"PF"`` -- Particle Filtering
Example ``TrackerSamplerAlgorithm::addTrackerSamplerAlgorithm`` : ::
//sample usage:
TrackerSamplerCSC::Params CSCparameters;
Ptr<TrackerSamplerAlgorithm> CSCSampler = new TrackerSamplerCSC( CSCparameters );
if( !sampler->addTrackerSamplerAlgorithm( CSCSampler ) )
return false;
//or add CSC sampler with default parameters
//sampler->addTrackerSamplerAlgorithm( "CSC" );
.. note:: If you use the second method, you must initialize the TrackerSamplerAlgorithm
TrackerSamplerAlgorithm
-----------------------
Abstract base class for TrackerSamplerAlgorithm that represents the algorithm for the specific sampler.
.. ocv:class:: TrackerSamplerAlgorithm
TrackerSamplerAlgorithm class::
class CV_EXPORTS_W TrackerSamplerAlgorithm
{
public:
virtual ~TrackerSamplerAlgorithm();
static Ptr<TrackerSamplerAlgorithm> create( const String& trackerSamplerType );
bool sampling( const Mat& image, Rect boundingBox, std::vector<Mat>& sample );
String getClassName() const;
};
TrackerSamplerAlgorithm::create
-------------------------------
Create TrackerSamplerAlgorithm by tracker sampler type.
.. ocv:function:: static Ptr<TrackerSamplerAlgorithm> TrackerSamplerAlgorithm::create( const String& trackerSamplerType )
:param trackerSamplerType: The trackerSamplerType name
The modes available now:
* ``"CSC"`` -- Current State Center
* ``"CS"`` -- Current State
TrackerSamplerAlgorithm::sampling
---------------------------------
Computes the regions starting from a position in an image. Returns true if samples are computed, false otherwise
.. ocv:function:: bool TrackerSamplerAlgorithm::sampling( const Mat& image, Rect boundingBox, std::vector<Mat>& sample )
:param image: The current frame
:param boundingBox: The bounding box from which regions can be calculated
:param sample: The computed samples [AAM]_ Fig. 1 variable Sk
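Example ``TrackerSamplerAlgorithm::sampling`` (a sketch; ``sampler`` is assumed to be a concrete algorithm such as :ocv:class:`TrackerSamplerCSC`) : ::

    std::vector<Mat> samples;
    if( !sampler->sampling( image, boundingBox, samples ) )
      return false; //no samples could be computed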
TrackerSamplerAlgorithm::getClassName
-------------------------------------
Get the name of the specific TrackerSamplerAlgorithm
.. ocv:function:: String TrackerSamplerAlgorithm::getClassName() const
Specialized TrackerSamplerAlgorithm
===================================
Table I of [AAM]_ describes the best-known sampling strategies. At the moment :ocv:class:`TrackerSamplerCSC` and :ocv:class:`TrackerSamplerCS` are implemented. Besides these, there is :ocv:class:`TrackerSamplerPF`, a sampler based on particle filtering.
TrackerSamplerCSC : TrackerSamplerAlgorithm
-------------------------------------------
TrackerSampler based on CSC (current state centered), used by the MIL algorithm TrackerMIL
.. ocv:class:: TrackerSamplerCSC
TrackerSamplerCSC class::
class CV_EXPORTS_W TrackerSamplerCSC
{
public:
TrackerSamplerCSC( const TrackerSamplerCSC::Params &parameters = TrackerSamplerCSC::Params() );
void setMode( int samplingMode );
~TrackerSamplerCSC();
};
TrackerSamplerCSC::Params
-------------------------
.. ocv:struct:: TrackerSamplerCSC::Params
List of TrackerSamplerCSC parameters::
struct CV_EXPORTS Params
{
Params();
float initInRad; // radius for gathering positive instances during init
float trackInPosRad; // radius for gathering positive instances during tracking
float searchWinSize; // size of search window
int initMaxNegNum; // # negative samples to use during init
int trackMaxPosNum; // # positive samples to use during training
int trackMaxNegNum; // # negative samples to use during training
};
TrackerSamplerCSC::TrackerSamplerCSC
------------------------------------
Constructor
.. ocv:function:: TrackerSamplerCSC::TrackerSamplerCSC( const TrackerSamplerCSC::Params &parameters = TrackerSamplerCSC::Params() )
:param parameters: TrackerSamplerCSC parameters :ocv:struct:`TrackerSamplerCSC::Params`
TrackerSamplerCSC::setMode
--------------------------
Set the sampling mode of TrackerSamplerCSC
.. ocv:function:: void TrackerSamplerCSC::setMode( int samplingMode )
:param samplingMode: The sampling mode
The modes are:
* ``"MODE_INIT_POS = 1"`` -- for the positive sampling in initialization step
* ``"MODE_INIT_NEG = 2"`` -- for the negative sampling in initialization step
* ``"MODE_TRACK_POS = 3"`` -- for the positive sampling in update step
* ``"MODE_TRACK_NEG = 4"`` -- for the negative sampling in update step
* ``"MODE_DETECT = 5"`` -- for the sampling in detection step
TrackerSamplerCS : TrackerSamplerAlgorithm
-------------------------------------------
TrackerSampler based on CS (current state), used by the TrackerBoosting algorithm
.. ocv:class:: TrackerSamplerCS
TrackerSamplerCS class::
class CV_EXPORTS_W TrackerSamplerCS
{
public:
TrackerSamplerCS( const TrackerSamplerCS::Params &parameters = TrackerSamplerCS::Params() );
void setMode( int samplingMode );
~TrackerSamplerCS();
};
TrackerSamplerCS::Params
-------------------------
.. ocv:struct:: TrackerSamplerCS::Params
List of TrackerSamplerCS parameters::
struct CV_EXPORTS Params
{
Params();
float overlap; //overlapping for the search windows
float searchFactor; //search region parameter
};
TrackerSamplerCS::TrackerSamplerCS
------------------------------------
Constructor
.. ocv:function:: TrackerSamplerCS::TrackerSamplerCS( const TrackerSamplerCS::Params &parameters = TrackerSamplerCS::Params() )
:param parameters: TrackerSamplerCS parameters :ocv:struct:`TrackerSamplerCS::Params`
TrackerSamplerCS::setMode
--------------------------
Set the sampling mode of TrackerSamplerCS
.. ocv:function:: void TrackerSamplerCS::setMode( int samplingMode )
:param samplingMode: The sampling mode
The modes are:
* ``"MODE_POSITIVE = 1"`` -- for the positive sampling
* ``"MODE_NEGATIVE = 2"`` -- for the negative sampling
* ``"MODE_CLASSIFY = 3"`` -- for the sampling in classification step
TrackerSamplerPF : TrackerSamplerAlgorithm
-------------------------------------------
This sampler is based on particle filtering. In principle, it can be thought of as performing some sort of optimization (and indeed, this
tracker uses OpenCV's ``optim`` module), where the tracker seeks the rectangle in the given frame which is the most *"similar"* to the initial
rectangle (the one given through the constructor).
The optimization performed is stochastic and somewhat resembles genetic algorithms: on each new ``image`` received (submitted via ``TrackerSamplerPF::sampling()``) we start with the region bounded by ``boundingBox``, generate several "perturbed" boxes and keep the ones most similar to the original. This selection round is repeated several times. At the end, we hope that only the most promising boxes remain, and these are combined to produce the subrectangle of ``image``, which is put as the sole element of the array ``sample``.
It should be noted that the definition of "similarity" between two rectangles is based on comparing their histograms. As experiments show, the tracker is *not* very successful if the target is assumed to change its dimensions strongly.
.. ocv:class:: TrackerSamplerPF
TrackerSamplerPF class::
class CV_EXPORTS_W TrackerSamplerPF : public TrackerSamplerAlgorithm{
public:
TrackerSamplerPF(const Mat& chosenRect,const TrackerSamplerPF::Params &parameters = TrackerSamplerPF::Params());
void sampling( const Mat& image, Rect boundingBox, std::vector<Mat>& sample ); //inherited from TrackerSamplerAlgorithm
};
TrackerSamplerPF::Params
-------------------------
.. ocv:struct:: TrackerSamplerPF::Params
This structure contains all the parameters that can be varied during the course of the sampling algorithm. The structure is exposed below,
with its members briefly explained with reference to the above discussion of the algorithm's working.
::
struct CV_EXPORTS Params
{
Params();
int iterationNum; //number of selection rounds
int particlesNum; //number of "perturbed" boxes on each round
double alpha; //with each new round we exponentially decrease the amount of "perturbing" we allow (like in simulated annealing)
//and this very alpha controls how fast annealing happens, ie. how fast perturbing decreases
Mat_<double> std; //initial values for perturbing (1-by-4 array, as each rectangle is given by 4 values -- coordinates of opposite vertices,
//hence we have 4 values to perturb)
};
TrackerSamplerPF::TrackerSamplerPF
------------------------------------
Constructor
.. ocv:function:: TrackerSamplerPF(const Mat& chosenRect,const TrackerSamplerPF::Params &parameters = TrackerSamplerPF::Params())
:param chosenRect: Initial rectangle that is supposed to contain the target we would like to track.
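Example ``TrackerSamplerPF`` creation (a sketch with illustrative parameter values; ``chosenRect`` is assumed given, per the signature above) : ::

    TrackerSamplerPF::Params PFparameters;
    PFparameters.iterationNum = 20;   //selection rounds
    PFparameters.particlesNum = 100;  //"perturbed" boxes per round

    Ptr<TrackerSamplerAlgorithm> PFSampler = new TrackerSamplerPF( chosenRect, PFparameters );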

@ -1,226 +0,0 @@
Tracker Algorithms
==================
.. highlight:: cpp
The following algorithms are implemented at the moment.
.. [MIL] B Babenko, M-H Yang, and S Belongie, Visual Tracking with Online Multiple Instance Learning, In CVPR, 2009
.. [OLB] H Grabner, M Grabner, and H Bischof, Real-time tracking via on-line boosting, In Proc. BMVC, volume 1, pages 47– 56, 2006
.. [MedianFlow] Z. Kalal, K. Mikolajczyk, and J. Matas, “Forward-Backward Error: Automatic Detection of Tracking Failures,” International Conference on Pattern Recognition, 2010, pp. 23-26.
.. [TLD] Z. Kalal, K. Mikolajczyk, and J. Matas, “Tracking-Learning-Detection,” Pattern Analysis and Machine Intelligence 2011.
TrackerBoosting
---------------
This is a real-time object tracker based on a novel on-line version of the AdaBoost algorithm.
The classifier uses the surrounding background as negative examples in the update step to avoid the drifting problem. The implementation is based on
[OLB]_.
.. ocv:class:: TrackerBoosting
Implementation of TrackerBoosting from :ocv:class:`Tracker`::
class CV_EXPORTS_W TrackerBoosting : public Tracker
{
public:
void read( const FileNode& fn );
void write( FileStorage& fs ) const;
static Ptr<TrackerBoosting> createTracker(const TrackerBoosting::Params &parameters=TrackerBoosting::Params());
virtual ~TrackerBoosting(){};
protected:
bool initImpl( const Mat& image, const Rect2d& boundingBox );
bool updateImpl( const Mat& image, Rect2d& boundingBox );
};
TrackerBoosting::Params
-----------------------------------------------------------------------
.. ocv:struct:: TrackerBoosting::Params
List of BOOSTING parameters::
struct CV_EXPORTS Params
{
Params();
int numClassifiers; //the number of classifiers to use in a OnlineBoosting algorithm
float samplerOverlap; //search region parameters to use in a OnlineBoosting algorithm
float samplerSearchFactor; // search region parameters to use in a OnlineBoosting algorithm
int iterationInit; //the initial iterations
int featureSetNumFeatures; // #features
void read( const FileNode& fn );
void write( FileStorage& fs ) const;
};
TrackerBoosting::createTracker
-----------------------------------------------------------------------
Constructor
.. ocv:function:: Ptr<TrackerBoosting> TrackerBoosting::createTracker(const TrackerBoosting::Params &parameters=TrackerBoosting::Params())
:param parameters: BOOSTING parameters :ocv:struct:`TrackerBoosting::Params`
TrackerMIL
----------------------
The MIL algorithm trains a classifier in an online manner to separate the object from the background. Multiple Instance Learning avoids the drift problem for robust tracking. The implementation is based on [MIL]_.
Original code can be found at http://vision.ucsd.edu/~bbabenko/project_miltrack.shtml
.. ocv:class:: TrackerMIL
Implementation of TrackerMIL from :ocv:class:`Tracker`::
class CV_EXPORTS_W TrackerMIL : public Tracker
{
public:
void read( const FileNode& fn );
void write( FileStorage& fs ) const;
static Ptr<TrackerMIL> createTracker(const TrackerMIL::Params &parameters=TrackerMIL::Params());
virtual ~TrackerMIL(){};
protected:
bool initImpl( const Mat& image, const Rect2d& boundingBox );
bool updateImpl( const Mat& image, Rect2d& boundingBox );
};
TrackerMIL::Params
------------------
.. ocv:struct:: TrackerMIL::Params
List of MIL parameters::
struct CV_EXPORTS Params
{
Params();
//parameters for sampler
float samplerInitInRadius; // radius for gathering positive instances during init
int samplerInitMaxNegNum; // # negative samples to use during init
float samplerSearchWinSize; // size of search window
float samplerTrackInRadius; // radius for gathering positive instances during tracking
int samplerTrackMaxPosNum; // # positive samples to use during tracking
int samplerTrackMaxNegNum; // # negative samples to use during tracking
int featureSetNumFeatures; // # features
void read( const FileNode& fn );
void write( FileStorage& fs ) const;
};
TrackerMIL::createTracker
-------------------------------
Constructor
.. ocv:function:: Ptr<TrackerMIL> TrackerMIL::createTracker(const TrackerMIL::Params &parameters=TrackerMIL::Params())
:param parameters: MIL parameters :ocv:struct:`TrackerMIL::Params`
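Example of a tracking loop with ``TrackerMIL`` (a minimal sketch in the spirit of the demo linked above; the video path and initial bounding box are illustrative, error handling is omitted) : ::

    Ptr<TrackerMIL> tracker = TrackerMIL::createTracker();

    VideoCapture cap( "video.avi" );
    Mat frame;
    cap >> frame;

    Rect2d boundingBox( 100, 100, 80, 60 ); //initial target location
    tracker->init( frame, boundingBox );

    while( cap.read( frame ) )
    {
      if( tracker->update( frame, boundingBox ) )
        rectangle( frame, (Rect) boundingBox, Scalar( 255, 0, 0 ) );
    }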
TrackerMedianFlow
----------------------
Implementation of the paper "Forward-Backward Error: Automatic Detection of Tracking Failures" by Z. Kalal, K. Mikolajczyk
and J. Matas. The implementation is based on [MedianFlow]_.
The tracker is suitable for very smooth and predictable movements when the object is visible throughout the whole sequence. It is quite fast
and accurate for this type of problem (in particular, the authors showed it to outperform MIL). During the implementation period the code at
http://www.aonsquared.co.uk/node/5, courtesy of the author Arthur Amarra, was used for reference purposes.
.. ocv:class:: TrackerMedianFlow
Implementation of TrackerMedianFlow from :ocv:class:`Tracker`::
class CV_EXPORTS_W TrackerMedianFlow : public Tracker
{
public:
void read( const FileNode& fn );
void write( FileStorage& fs ) const;
static Ptr<TrackerMedianFlow> createTracker(const TrackerMedianFlow::Params &parameters=TrackerMedianFlow::Params());
virtual ~TrackerMedianFlow(){};
protected:
bool initImpl( const Mat& image, const Rect2d& boundingBox );
bool updateImpl( const Mat& image, Rect2d& boundingBox );
};
TrackerMedianFlow::Params
------------------------------------
.. ocv:struct:: TrackerMedianFlow::Params
List of MedianFlow parameters::
struct CV_EXPORTS Params
{
Params();
int pointsInGrid; //square root of the number of keypoints used; increase it to trade
//accuracy for speed; the default value is sensible and recommended
void read( const FileNode& fn );
void write( FileStorage& fs ) const;
};
TrackerMedianFlow::createTracker
-----------------------------------
Constructor
.. ocv:function:: Ptr<TrackerMedianFlow> TrackerMedianFlow::createTracker(const TrackerMedianFlow::Params &parameters=TrackerMedianFlow::Params())
:param parameters: Median Flow parameters :ocv:struct:`TrackerMedianFlow::Params`
TrackerTLD
----------------------
TLD is a novel tracking framework that explicitly decomposes the long-term tracking task into tracking, learning and detection. The tracker follows the object from frame to frame. The detector localizes all appearances that have been observed so far and corrects the tracker if necessary. The learning component estimates the detector's errors and updates it to avoid these errors in the future. The implementation is based on [TLD]_.
The Median Flow algorithm (see above) was chosen as the tracking component in this implementation, following the authors. The tracker is supposed to be able
to handle rapid motions, partial occlusions, object absence, etc.
.. ocv:class:: TrackerTLD
Implementation of TrackerTLD from :ocv:class:`Tracker`::
class CV_EXPORTS_W TrackerTLD : public Tracker
{
public:
void read( const FileNode& fn );
void write( FileStorage& fs ) const;
static Ptr<TrackerTLD> createTracker(const TrackerTLD::Params &parameters=TrackerTLD::Params());
virtual ~TrackerTLD(){};
protected:
bool initImpl( const Mat& image, const Rect2d& boundingBox );
bool updateImpl( const Mat& image, Rect2d& boundingBox );
};
TrackerTLD::Params
------------------------
.. ocv:struct:: TrackerTLD::Params
List of TLD parameters::
struct CV_EXPORTS Params
{
Params();
void read( const FileNode& fn );
void write( FileStorage& fs ) const;
};
TrackerTLD::createTracker
-------------------------------
Constructor
.. ocv:function:: Ptr<TrackerTLD> TrackerTLD::createTracker(const TrackerTLD::Params &parameters=TrackerTLD::Params())
:param parameters: TLD parameters :ocv:struct:`TrackerTLD::Params`

@ -1,63 +0,0 @@
******************************************
tracking. Tracking API
******************************************
.. highlight:: cpp
Long-term optical tracking API
------------------------------
Long-term optical tracking is one of the most important issues for many computer vision applications in real-world scenarios.
Development in this area is very fragmented, and this API provides a unified interface for plugging in several algorithms and comparing them.
This work is partially based on [AAM]_ and [AMVOT]_.
These algorithms start from a bounding box of the target and, with their internal representation, avoid drift during the tracking.
These long-term trackers are able to evaluate online the quality of the location of the target in the new frame, without ground truth.
There are three main components: the TrackerSampler, the TrackerFeatureSet and the TrackerModel. The first component is the object that computes the patches over the frame based on the last target location.
The TrackerFeatureSet is the class that manages the features; it is possible to plug in many kinds of them (HAAR, HOG, LBP, Feature2D, etc.).
The last component is the internal representation of the target: the appearance model. It stores all state candidates and computes the trajectory (the most likely target states). The class TrackerTargetState represents a possible state of the target.
The TrackerSampler and the TrackerFeatureSet form the visual representation of the target, while the TrackerModel is the statistical model.
A recent benchmark of these algorithms can be found in [OOT]_.
.. only:: plantuml
UML design:
-----------
.. toctree::
uml/package
uml/Tracker
uml/TrackerSampler
uml/TrackerFeature
uml/TrackerModel
To see how the API works, try the tracker demo:
https://github.com/lenlen/opencv/blob/tracking_api/samples/cpp/tracker.cpp
.. note:: This Tracking API has been designed with PlantUML. If you modify this API please change UML files under modules/tracking/doc/uml/
The following references were used in the API
.. [AAM] S Salti, A Cavallaro, L Di Stefano, Adaptive Appearance Modeling for Video Tracking: Survey and Evaluation, IEEE Transactions on Image Processing, Vol. 21, Issue 10, October 2012, pp. 4334-4348
.. [AMVOT] X Li, W Hu, C Shen, Z Zhang, A Dick, A van den Hengel, A Survey of Appearance Models in Visual Object Tracking, ACM Transactions on Intelligent Systems and Technology (TIST), 2013
.. [OOT] Yi Wu and Jongwoo Lim and Ming-Hsuan Yang, Online Object Tracking: A Benchmark, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013
Tracker classes:
----------------
.. toctree::
:maxdepth: 2
tracker_algorithms
common_interfaces_tracker
common_interfaces_tracker_sampler
common_interfaces_tracker_feature_set
common_interfaces_tracker_model

@ -1,40 +0,0 @@
Tracker diagram
===============
.. uml::
..@startuml
package "Tracker package" #DDDDDD {
class Algorithm
class Tracker{
Ptr<TrackerFeatureSet> featureSet;
Ptr<TrackerSampler> sampler;
Ptr<TrackerModel> model;
---
+static Ptr<Tracker> create(const string& trackerType);
+bool init(const Mat& image, const Rect& boundingBox);
+bool update(const Mat& image, Rect& boundingBox);
}
class Tracker
note right: Tracker is the general interface for each specialized tracker
class TrackerMIL{
+static Ptr<TrackerMIL> createTracker(const TrackerMIL::Params &parameters);
+virtual ~TrackerMIL();
}
class TrackerBoosting{
+static Ptr<TrackerBoosting> createTracker(const TrackerBoosting::Params &parameters);
+virtual ~TrackerBoosting();
}
Algorithm <|-- Tracker : virtual inheritance
Tracker <|-- TrackerMIL
Tracker <|-- TrackerBoosting
note "Single instance of the Tracker" as N1
TrackerBoosting .. N1
TrackerMIL .. N1
}
..@enduml

@ -1,60 +0,0 @@
TrackerFeatureSet diagram
=========================
.. uml::
..@startuml
package "TrackerFeature package" #DDDDDD {
class TrackerFeatureSet{
-vector<pair<string, Ptr<TrackerFeature> > > features
-vector<Mat> responses
...
TrackerFeatureSet();
~TrackerFeatureSet();
--
+extraction(const std::vector<Mat>& images);
+selection();
+removeOutliers();
+vector<Mat> getResponses();
+vector<pair<string TrackerFeatureType, Ptr<TrackerFeature> > > getTrackerFeatures();
+bool addTrackerFeature(string trackerFeatureType);
+bool addTrackerFeature(Ptr<TrackerFeature>& feature);
-clearResponses();
}
class TrackerFeature <<virtual>>{
static Ptr<TrackerFeature> create(const string& trackerFeatureType);
compute(const std::vector<Mat>& images, Mat& response);
selection(Mat& response, int npoints);
}
note bottom: Can be specialized as in table II\nA tracker can use more types of features
class TrackerFeatureFeature2D{
-vector<Keypoints> keypoints
---
TrackerFeatureFeature2D(string detectorType, string descriptorType);
~TrackerFeatureFeature2D();
---
compute(const std::vector<Mat>& images, Mat& response);
selection( Mat& response, int npoints);
}
class TrackerFeatureHOG{
TrackerFeatureHOG();
~TrackerFeatureHOG();
---
compute(const std::vector<Mat>& images, Mat& response);
selection(Mat& response, int npoints);
}
TrackerFeatureSet *-- TrackerFeature
TrackerFeature <|-- TrackerFeatureHOG
TrackerFeature <|-- TrackerFeatureFeature2D
note "Per readability and simplicity in this diagram\n there are only two TrackerFeature but you\n can considering the implementation of the other TrackerFeature" as N1
TrackerFeatureHOG .. N1
TrackerFeatureFeature2D .. N1
}
..@enduml

@ -1,67 +0,0 @@
TrackerModel diagram
====================
.. uml::
..@startuml
package "TrackerModel package" #DDDDDD {
class Typedef << (T,#FF7700) >>{
ConfidenceMap
Trajectory
}
class TrackerModel{
-vector<ConfidenceMap> confidenceMaps;
-Trajectory trajectory;
-Ptr<TrackerStateEstimator> stateEstimator;
...
TrackerModel();
~TrackerModel();
+bool setTrackerStateEstimator(Ptr<TrackerStateEstimator> trackerStateEstimator);
+Ptr<TrackerStateEstimator> getTrackerStateEstimator();
+void modelEstimation(const vector<Mat>& responses);
+void modelUpdate();
+void setLastTargetState(const Ptr<TrackerTargetState> lastTargetState);
+void runStateEstimator();
+const vector<ConfidenceMap>& getConfidenceMaps();
+const ConfidenceMap& getLastConfidenceMap();
}
class TrackerTargetState <<virtual>>{
Point2f targetPosition;
---
Point2f getTargetPosition();
void setTargetPosition(Point2f position);
}
class TrackerTargetState
note bottom: Each tracker can create own state
class TrackerStateEstimator <<virtual>>{
~TrackerStateEstimator();
static Ptr<TrackerStateEstimator> create(const String& trackeStateEstimatorType);
Ptr<TrackerTargetState> estimate(const vector<ConfidenceMap>& confidenceMaps)
void update(vector<ConfidenceMap>& confidenceMaps)
}
class TrackerStateEstimatorSVM{
TrackerStateEstimatorSVM()
~TrackerStateEstimatorSVM()
Ptr<TrackerTargetState> estimate(const vector<ConfidenceMap>& confidenceMaps)
void update(vector<ConfidenceMap>& confidenceMaps)
}
class TrackerStateEstimatorMILBoosting{
TrackerStateEstimatorMILBoosting()
~TrackerStateEstimatorMILBoosting()
Ptr<TrackerTargetState> estimate(const vector<ConfidenceMap>& confidenceMaps)
void update(vector<ConfidenceMap>& confidenceMaps)
}
TrackerModel -> TrackerStateEstimator: create
TrackerModel *-- TrackerTargetState
TrackerStateEstimator <|-- TrackerStateEstimatorMILBoosting
TrackerStateEstimator <|-- TrackerStateEstimatorSVM
}
..@enduml

@ -1,45 +0,0 @@
TrackerSampler diagram
======================
.. uml::
..@startuml
package "TrackerSampler package" #DDDDDD {
class TrackerSampler{
-vector<pair<String, Ptr<TrackerSamplerAlgorithm> > > samplers
-vector<Mat> samples;
...
TrackerSampler();
~TrackerSampler();
+sampling(const Mat& image, Rect boundingBox);
+const vector<pair<String, Ptr<TrackerSamplerAlgorithm> > >& getSamplers();
+const vector<Mat>& getSamples();
+bool addTrackerSamplerAlgorithm(String trackerSamplerAlgorithmType);
+bool addTrackerSamplerAlgorithm(Ptr<TrackerSamplerAlgorithm>& sampler);
---
-void clearSamples();
}
class TrackerSamplerAlgorithm{
~TrackerSamplerAlgorithm();
+static Ptr<TrackerSamplerAlgorithm> create(const String& trackerSamplerType);
+bool sampling(const Mat& image, Rect boundingBox, vector<Mat>& sample);
}
note bottom: A tracker could sample the target\nor it could sample the target and the background
class TrackerSamplerCS{
TrackerSamplerCS();
~TrackerSamplerCS();
+bool sampling(const Mat& image, Rect boundingBox, vector<Mat>& sample);
}
class TrackerSamplerCSC{
TrackerSamplerCSC();
~TrackerSamplerCSC();
+bool sampling(const Mat& image, Rect boundingBox, vector<Mat>& sample);
}
}
..@enduml

@ -1,15 +0,0 @@
General diagram
===============
.. uml::
..@startuml
package "Tracker"
package "TrackerFeature"
package "TrackerSampler"
package "TrackerModel"
Tracker -> TrackerModel: create
Tracker -> TrackerSampler: create
Tracker -> TrackerFeature: create
..@enduml

@ -1,93 +0,0 @@
Experimental 2D Features Algorithms
===================================
This section describes experimental algorithms for 2D feature detection.
StarFeatureDetector
-------------------
.. ocv:class:: StarFeatureDetector : public FeatureDetector
The class implements the keypoint detector introduced by [Agrawal08]_, synonym of ``StarDetector``. ::
class StarFeatureDetector : public FeatureDetector
{
public:
StarFeatureDetector( int maxSize=16, int responseThreshold=30,
int lineThresholdProjected = 10,
int lineThresholdBinarized=8, int suppressNonmaxSize=5 );
virtual void read( const FileNode& fn );
virtual void write( FileStorage& fs ) const;
protected:
...
};
.. [Agrawal08] Agrawal, M., Konolige, K., & Blas, M. R. (2008). Censure: Center surround extremas for realtime feature detection and matching. In Computer Vision–ECCV 2008 (pp. 102-115). Springer Berlin Heidelberg.
BriefDescriptorExtractor
------------------------
.. ocv:class:: BriefDescriptorExtractor : public DescriptorExtractor
Class for computing BRIEF descriptors described in the paper by Calonder M., Lepetit V.,
Strecha C., Fua P., *BRIEF: Binary Robust Independent Elementary Features*,
11th European Conference on Computer Vision (ECCV), Heraklion, Crete. LNCS Springer, September 2010. ::
class BriefDescriptorExtractor : public DescriptorExtractor
{
public:
static const int PATCH_SIZE = 48;
static const int KERNEL_SIZE = 9;
// bytes is a length of descriptor in bytes. It can be equal 16, 32 or 64 bytes.
BriefDescriptorExtractor( int bytes = 32 );
virtual void read( const FileNode& );
virtual void write( FileStorage& ) const;
virtual int descriptorSize() const;
virtual int descriptorType() const;
virtual int defaultNorm() const;
protected:
...
};
.. note::
* A complete BRIEF extractor sample can be found at opencv_source_code/samples/cpp/brief_match_test.cpp
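Example pairing ``StarFeatureDetector`` with ``BriefDescriptorExtractor`` (a minimal sketch; ``img`` is assumed to be an 8-bit grayscale image) : ::

    StarFeatureDetector detector;
    BriefDescriptorExtractor extractor( 32 ); //32-byte descriptors

    std::vector<KeyPoint> keypoints;
    Mat descriptors;
    detector.detect( img, keypoints );
    extractor.compute( img, keypoints, descriptors );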
FREAK
-----
.. ocv:class:: FREAK : public DescriptorExtractor
Class implementing the FREAK (*Fast Retina Keypoint*) keypoint descriptor, described in [AOV12]_. The algorithm proposes a novel keypoint descriptor inspired by the human visual system, and more precisely the retina, coined Fast Retina Keypoint (FREAK). A cascade of binary strings is computed by efficiently comparing image intensities over a retinal sampling pattern. FREAKs are in general faster to compute, with lower memory load, and also more robust than SIFT, SURF or BRISK. They are competitive alternatives to existing keypoints, in particular for embedded applications.
.. [AOV12] A. Alahi, R. Ortiz, and P. Vandergheynst. FREAK: Fast Retina Keypoint. In IEEE Conference on Computer Vision and Pattern Recognition, 2012. CVPR 2012 Open Source Award Winner.
.. note::
* An example on how to use the FREAK descriptor can be found at opencv_source_code/samples/cpp/freak_demo.cpp
FREAK::FREAK
------------
The FREAK constructor
.. ocv:function:: FREAK::FREAK( bool orientationNormalized=true, bool scaleNormalized=true, float patternScale=22.0f, int nOctaves=4, const vector<int>& selectedPairs=vector<int>() )
:param orientationNormalized: Enable orientation normalization.
:param scaleNormalized: Enable scale normalization.
:param patternScale: Scaling of the description pattern.
:param nOctaves: Number of octaves covered by the detected keypoints.
:param selectedPairs: (Optional) user defined selected pairs indexes,
FREAK::selectPairs
------------------
Select the 512 best description pair indexes from an input (grayscale) image set. FREAK is available with a set of pairs learned off-line. Researchers can run a training process to learn their own set of pairs. For more details read section 4.2 in: A. Alahi, R. Ortiz, and P. Vandergheynst. FREAK: Fast Retina Keypoint. In IEEE Conference on Computer Vision and Pattern Recognition, 2012.
We note that for keypoint matching applications, image content has little effect on the selected pairs unless it is very specific; what does matter is the detector type (blobs, corners, ...) and the options used (scale/rotation invariance, ...). Reduce ``corrThresh`` if not enough pairs are selected (43 points --> 903 possible pairs)
.. ocv:function:: vector<int> FREAK::selectPairs(const vector<Mat>& images, vector<vector<KeyPoint> >& keypoints, const double corrThresh = 0.7, bool verbose = true)
:param images: Grayscale image input set.
:param keypoints: Set of detected keypoints
:param corrThresh: Correlation threshold.
:param verbose: Prints pair selection information.
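Example ``FREAK`` usage (a minimal sketch; the keypoints may come from any detector, here the FAST detector from features2d) : ::

    std::vector<KeyPoint> keypoints;
    FAST( img, keypoints, 40 ); //any keypoint detector can be used

    FREAK freak;
    Mat descriptors;
    freak.compute( img, keypoints, descriptors );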

@ -1,252 +0,0 @@
Non-free 2D Features Algorithms
=================================
This section describes two popular algorithms for 2D feature detection, SIFT and SURF, that are known to be patented. Use them at your own risk.
SIFT
----
.. ocv:class:: SIFT : public Feature2D
Class for extracting keypoints and computing descriptors using the Scale Invariant Feature Transform (SIFT) algorithm by D. Lowe [Lowe04]_.
.. [Lowe04] Lowe, D. G., “Distinctive Image Features from Scale-Invariant Keypoints”, International Journal of Computer Vision, 60, 2, pp. 91-110, 2004.
SIFT::SIFT
----------
The SIFT constructors.
.. ocv:function:: SIFT::SIFT( int nfeatures=0, int nOctaveLayers=3, double contrastThreshold=0.04, double edgeThreshold=10, double sigma=1.6)
.. ocv:pyfunction:: cv2.SIFT([, nfeatures[, nOctaveLayers[, contrastThreshold[, edgeThreshold[, sigma]]]]]) -> <SIFT object>
:param nfeatures: The number of best features to retain. The features are ranked by their scores (measured in the SIFT algorithm as the local contrast)
:param nOctaveLayers: The number of layers in each octave. 3 is the value used in D. Lowe's paper. The number of octaves is computed automatically from the image resolution.
:param contrastThreshold: The contrast threshold used to filter out weak features in semi-uniform (low-contrast) regions. The larger the threshold, the fewer features are produced by the detector.
:param edgeThreshold: The threshold used to filter out edge-like features. Note that its meaning is different from the contrastThreshold, i.e. the larger the ``edgeThreshold``, the fewer features are filtered out (more features are retained).
:param sigma: The sigma of the Gaussian applied to the input image at octave #0. If your image is captured with a weak camera with soft lenses, you might want to reduce this value.
SIFT::operator ()
-----------------
Extract features and computes their descriptors using SIFT algorithm
.. ocv:function:: void SIFT::operator()(InputArray img, InputArray mask, vector<KeyPoint>& keypoints, OutputArray descriptors, bool useProvidedKeypoints=false)
.. ocv:pyfunction:: cv2.SIFT.detect(image[, mask]) -> keypoints
.. ocv:pyfunction:: cv2.SIFT.compute(image, keypoints[, descriptors]) -> keypoints, descriptors
.. ocv:pyfunction:: cv2.SIFT.detectAndCompute(image, mask[, descriptors[, useProvidedKeypoints]]) -> keypoints, descriptors
:param img: Input 8-bit grayscale image
:param mask: Optional input mask that marks the regions where we should detect features.
:param keypoints: The input/output vector of keypoints
:param descriptors: The output matrix of descriptors. Pass ``cv::noArray()`` if you do not need them.
:param useProvidedKeypoints: Boolean flag. If it is true, the keypoint detector is not run. Instead, the provided vector of keypoints is used and the algorithm just computes their descriptors.
.. note:: The Python API provides three functions. The first finds keypoints only. The second computes descriptors for the keypoints we provide. The third detects keypoints and computes their descriptors. If you want both keypoints and descriptors, use the third function directly, as ``kp, des = cv2.SIFT.detectAndCompute(image, None)``
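Example ``SIFT`` usage in C++ (a minimal sketch; ``img`` is assumed to be an 8-bit grayscale image) : ::

    SIFT sift;
    std::vector<KeyPoint> keypoints;
    Mat descriptors;

    //detect keypoints and compute their descriptors in one call
    sift( img, noArray(), keypoints, descriptors );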
SURF
----
.. ocv:class:: SURF : public Feature2D
Class for extracting Speeded Up Robust Features from an image [Bay06]_. The class is derived from the ``CvSURFParams`` structure, which specifies the algorithm parameters:
.. ocv:member:: int extended
* 0 means that the basic descriptors (64 elements each) shall be computed
* 1 means that the extended descriptors (128 elements each) shall be computed
.. ocv:member:: int upright
* 0 means that detector computes orientation of each feature.
* 1 means that the orientation is not computed (which is much, much faster). For example, if you match images from a stereo pair, or do image stitching, the matched features likely have very similar angles, and you can speed up feature extraction by setting ``upright=1``.
.. ocv:member:: double hessianThreshold
Threshold for the keypoint detector. Only features whose Hessian is larger than ``hessianThreshold`` are retained by the detector. Therefore, the larger the value, the fewer keypoints you will get. A good default value could be from 300 to 500, depending on the image contrast.
.. ocv:member:: int nOctaves
The number of Gaussian pyramid octaves that the detector uses. It is set to 4 by default. If you want to detect very large features, use a larger value. If you want just small features, decrease it.
.. ocv:member:: int nOctaveLayers
The number of images within each octave of the Gaussian pyramid. It is set to 2 by default.
.. [Bay06] Bay, H. and Tuytelaars, T. and Van Gool, L. "SURF: Speeded Up Robust Features", 9th European Conference on Computer Vision, 2006
.. note::
* An example using the SURF feature detector can be found at opencv_source_code/samples/cpp/generic_descriptor_match.cpp
* Another example using the SURF feature detector, extractor and matcher can be found at opencv_source_code/samples/cpp/matcher_simple.cpp
SURF::SURF
----------
The SURF extractor constructors.
.. ocv:function:: SURF::SURF()
.. ocv:function:: SURF::SURF( double hessianThreshold, int nOctaves=4, int nOctaveLayers=2, bool extended=true, bool upright=false )
.. ocv:pyfunction:: cv2.SURF([hessianThreshold[, nOctaves[, nOctaveLayers[, extended[, upright]]]]]) -> <SURF object>
:param hessianThreshold: Threshold for the Hessian keypoint detector used in SURF.
:param nOctaves: Number of pyramid octaves the keypoint detector will use.
:param nOctaveLayers: Number of octave layers within each octave.
:param extended: Extended descriptor flag (true - use extended 128-element descriptors; false - use 64-element descriptors).
:param upright: Up-right or rotated features flag (true - do not compute orientation of features; false - compute orientation).
SURF::operator()
----------------
Detects keypoints and computes SURF descriptors for them.
.. ocv:function:: void SURF::operator()(InputArray img, InputArray mask, vector<KeyPoint>& keypoints) const
.. ocv:function:: void SURF::operator()(InputArray img, InputArray mask, vector<KeyPoint>& keypoints, OutputArray descriptors, bool useProvidedKeypoints=false)
.. ocv:pyfunction:: cv2.SURF.detect(image[, mask]) -> keypoints
.. ocv:pyfunction:: cv2.SURF.compute(image, keypoints[, descriptors]) -> keypoints, descriptors
.. ocv:pyfunction:: cv2.SURF.detectAndCompute(image, mask[, descriptors[, useProvidedKeypoints]]) -> keypoints, descriptors
.. ocv:cfunction:: void cvExtractSURF( const CvArr* image, const CvArr* mask, CvSeq** keypoints, CvSeq** descriptors, CvMemStorage* storage, CvSURFParams params )
:param image: Input 8-bit grayscale image
:param mask: Optional input mask that marks the regions where we should detect features.
:param keypoints: The input/output vector of keypoints
:param descriptors: The output matrix of descriptors. Pass ``cv::noArray()`` if you do not need them.
:param useProvidedKeypoints: Boolean flag. If it is true, the keypoint detector is not run. Instead, the provided vector of keypoints is used and the algorithm just computes their descriptors.
:param storage: Memory storage for the output keypoints and descriptors in OpenCV 1.x API.
:param params: SURF algorithm parameters in OpenCV 1.x API.
The function is parallelized with the TBB library.
If you are using the C version, make sure you call ``cv::initModule_xfeatures2d()`` from ``xfeatures2d/nonfree.hpp``.
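A minimal C++ sketch of the same flow for SURF; the header location and the threshold value are illustrative assumptions. ::

    #include <vector>
    #include <opencv2/core.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/xfeatures2d/nonfree.hpp> // assumed header location

    using namespace cv;

    int main()
    {
        Mat img = imread("scene.png", IMREAD_GRAYSCALE); // 8-bit grayscale

        SURF surf(400.0); // hessianThreshold = 400, other parameters at defaults
        std::vector<KeyPoint> keypoints;
        Mat descriptors;

        // detect keypoints and compute descriptors
        // (128 elements each with the default extended=true)
        surf(img, noArray(), keypoints, descriptors);
        return 0;
    }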
cuda::SURF_CUDA
---------------
.. ocv:class:: cuda::SURF_CUDA
Class used for extracting Speeded Up Robust Features (SURF) from an image. ::
    class SURF_CUDA
    {
    public:
        enum KeypointLayout
        {
            X_ROW = 0,
            Y_ROW,
            LAPLACIAN_ROW,
            OCTAVE_ROW,
            SIZE_ROW,
            ANGLE_ROW,
            HESSIAN_ROW,
            ROWS_COUNT
        };

        //! the default constructor
        SURF_CUDA();
        //! the full constructor taking all the necessary parameters
        explicit SURF_CUDA(double _hessianThreshold, int _nOctaves=4,
            int _nOctaveLayers=2, bool _extended=false, float _keypointsRatio=0.01f);

        //! returns the descriptor size in float's (64 or 128)
        int descriptorSize() const;

        //! upload host keypoints to device memory
        void uploadKeypoints(const vector<KeyPoint>& keypoints,
            GpuMat& keypointsGPU);
        //! download keypoints from device to host memory
        void downloadKeypoints(const GpuMat& keypointsGPU,
            vector<KeyPoint>& keypoints);

        //! download descriptors from device to host memory
        void downloadDescriptors(const GpuMat& descriptorsGPU,
            vector<float>& descriptors);

        void operator()(const GpuMat& img, const GpuMat& mask,
            GpuMat& keypoints);

        void operator()(const GpuMat& img, const GpuMat& mask,
            GpuMat& keypoints, GpuMat& descriptors,
            bool useProvidedKeypoints = false,
            bool calcOrientation = true);

        void operator()(const GpuMat& img, const GpuMat& mask,
            std::vector<KeyPoint>& keypoints);

        void operator()(const GpuMat& img, const GpuMat& mask,
            std::vector<KeyPoint>& keypoints, GpuMat& descriptors,
            bool useProvidedKeypoints = false,
            bool calcOrientation = true);

        void operator()(const GpuMat& img, const GpuMat& mask,
            std::vector<KeyPoint>& keypoints,
            std::vector<float>& descriptors,
            bool useProvidedKeypoints = false,
            bool calcOrientation = true);

        void releaseMemory();

        // SURF parameters
        double hessianThreshold;
        int nOctaves;
        int nOctaveLayers;
        bool extended;
        bool upright;

        //! max keypoints = keypointsRatio * img.size().area()
        float keypointsRatio;

        GpuMat sum, mask1, maskSum, intBuffer;
        GpuMat det, trace;
        GpuMat maxPosBuffer;
    };
The class ``SURF_CUDA`` implements the Speeded Up Robust Features descriptor. There is a fast multi-scale Hessian keypoint detector that can be used to find the keypoints (which is the default option), but the descriptors can also be computed for user-specified keypoints. Only 8-bit grayscale images are supported.
The class ``SURF_CUDA`` can store results in GPU and CPU memory. It provides functions to convert results between the CPU and GPU versions (``uploadKeypoints``, ``downloadKeypoints``, ``downloadDescriptors``). The format of CPU results is the same as ``SURF`` results. GPU results are stored in ``GpuMat``. The ``keypoints`` matrix is a :math:`\texttt{nFeatures} \times 7` matrix with the ``CV_32FC1`` type.
* ``keypoints.ptr<float>(X_ROW)[i]`` contains x coordinate of the i-th feature.
* ``keypoints.ptr<float>(Y_ROW)[i]`` contains y coordinate of the i-th feature.
* ``keypoints.ptr<float>(LAPLACIAN_ROW)[i]`` contains the laplacian sign of the i-th feature.
* ``keypoints.ptr<float>(OCTAVE_ROW)[i]`` contains the octave of the i-th feature.
* ``keypoints.ptr<float>(SIZE_ROW)[i]`` contains the size of the i-th feature.
* ``keypoints.ptr<float>(ANGLE_ROW)[i]`` contains the orientation of the i-th feature.
* ``keypoints.ptr<float>(HESSIAN_ROW)[i]`` contains the response of the i-th feature.
The ``descriptors`` matrix is a :math:`\texttt{nFeatures} \times \texttt{descriptorSize}` matrix with the ``CV_32FC1`` type.
The class ``SURF_CUDA`` uses some buffers and provides access to them. All buffers can be safely released between function calls.
.. seealso:: :ocv:class:`SURF`
.. note::
* An example for using the SURF keypoint matcher on GPU can be found at opencv_source_code/samples/gpu/surf_keypoint_matcher.cpp
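A minimal C++ sketch of the GPU flow based on the class listing above; the header location is an illustrative assumption. ::

    #include <vector>
    #include <opencv2/core.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/xfeatures2d/cuda.hpp> // assumed header location

    using namespace cv;

    int main()
    {
        Mat img = imread("scene.png", IMREAD_GRAYSCALE); // 8-bit grayscale only

        cuda::SURF_CUDA surf(400.0); // hessianThreshold = 400

        // upload the image, then detect and describe on the GPU
        cuda::GpuMat imgGpu(img), keypointsGpu, descriptorsGpu;
        surf(imgGpu, cuda::GpuMat(), keypointsGpu, descriptorsGpu);

        // bring the results back to host memory in the usual CPU format
        std::vector<KeyPoint> keypoints;
        std::vector<float> descriptors;
        surf.downloadKeypoints(keypointsGpu, keypoints);
        surf.downloadDescriptors(descriptorsGpu, descriptors);

        surf.releaseMemory(); // the internal buffers can be freed afterwards
        return 0;
    }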

@ -1,11 +0,0 @@
*****************************************
xfeatures2d. Extra 2D Features Framework
*****************************************
.. highlight:: cpp
.. toctree::
:maxdepth: 2
extra_features
nonfree_features

@ -1,259 +0,0 @@
.. highlight:: cpp
Domain Transform filter
====================================
This section describes the interface for the Domain Transform filter.
For more details about this filter, see [Gastal11]_ and References_.
DTFilter
------------------------------------
.. ocv:class:: DTFilter : public Algorithm
Interface for realizations of Domain Transform filter.
createDTFilter
------------------------------------
Factory method that creates an instance of :ocv:class:`DTFilter` and performs the initialization routines.
.. ocv:function:: Ptr<DTFilter> createDTFilter(InputArray guide, double sigmaSpatial, double sigmaColor, int mode = DTF_NC, int numIters = 3)
.. ocv:pyfunction:: cv2.createDTFilter(guide, sigmaSpatial, sigmaColor[, mode[, numIters]]) -> instance
:param guide: guided image (used to build the transformed distance, which describes the edge structure of the guided image).
:param sigmaSpatial: :math:`{\sigma}_H` parameter in the original article; similar to the sigma in the coordinate space of :ocv:func:`bilateralFilter`.
:param sigmaColor: :math:`{\sigma}_r` parameter in the original article; similar to the sigma in the color space of :ocv:func:`bilateralFilter`.
:param mode: one of three modes ``DTF_NC``, ``DTF_RF`` and ``DTF_IC``, which correspond to the three modes for filtering 2D signals in the article.
:param numIters: optional number of iterations used for filtering; 3 is usually enough.
For more details about Domain Transform filter parameters, see the original article [Gastal11]_ and `Domain Transform filter homepage <http://www.inf.ufrgs.br/~eslgastal/DomainTransform/>`_.
DTFilter::filter
------------------------------------
Performs the domain transform filtering operation on the source image.
.. ocv:function:: void DTFilter::filter(InputArray src, OutputArray dst, int dDepth = -1)
.. ocv:pyfunction:: cv2.DTFilter.filter(src, dst[, dDepth]) -> None
:param src: filtering image with unsigned 8-bit or floating-point 32-bit depth and up to 4 channels.
:param dst: destination image.
:param dDepth: optional depth of the output image. ``dDepth`` can be set to -1, which will be equivalent to ``src.depth()``.
dtFilter
------------------------------------
Simple one-line Domain Transform filter call.
If you have multiple images to filter with the same guided image, use the :ocv:class:`DTFilter` interface to avoid extra computations on the initialization stage.
.. ocv:function:: void dtFilter(InputArray guide, InputArray src, OutputArray dst, double sigmaSpatial, double sigmaColor, int mode = DTF_NC, int numIters = 3)
.. ocv:pyfunction:: cv2.dtFilter(guide, src, sigmaSpatial, sigmaColor[, mode[, numIters]]) -> None
:param guide: guided image (also called a joint image) with unsigned 8-bit or floating-point 32-bit depth and up to 4 channels.
:param src: filtering image with unsigned 8-bit or floating-point 32-bit depth and up to 4 channels.
:param sigmaSpatial: :math:`{\sigma}_H` parameter in the original article; similar to the sigma in the coordinate space of :ocv:func:`bilateralFilter`.
:param sigmaColor: :math:`{\sigma}_r` parameter in the original article; similar to the sigma in the color space of :ocv:func:`bilateralFilter`.
:param mode: one of three modes ``DTF_NC``, ``DTF_RF`` and ``DTF_IC``, which correspond to the three modes for filtering 2D signals in the article.
:param numIters: optional number of iterations used for filtering; 3 is usually enough.
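A minimal C++ sketch of the one-line call, using the image itself as the guide for edge-preserving smoothing; the header location, namespace and parameter values are illustrative assumptions. ::

    #include <opencv2/core.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/ximgproc.hpp> // assumed header location

    using namespace cv;
    using namespace cv::ximgproc; // assumed namespace

    int main()
    {
        Mat src = imread("input.png"), dst;

        // sigmaSpatial = 10, sigmaColor = 30; default DTF_NC mode, 3 iterations
        dtFilter(src, src, dst, 10.0, 30.0);

        // for several images sharing one guide, build the filter once instead:
        Ptr<DTFilter> dt = createDTFilter(src, 10.0, 30.0);
        dt->filter(src, dst);

        imwrite("smoothed.png", dst);
        return 0;
    }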
.. seealso:: :ocv:func:`bilateralFilter`, :ocv:func:`guidedFilter`, :ocv:func:`amFilter`
Guided Filter
====================================
This section describes the interface for the Guided Filter.
For more details about this filter, see [Kaiming10]_ and References_.
GuidedFilter
------------------------------------
.. ocv:class:: GuidedFilter : public Algorithm
Interface for realizations of Guided Filter.
createGuidedFilter
------------------------------------
Factory method that creates an instance of :ocv:class:`GuidedFilter` and performs the initialization routines.
.. ocv:function:: Ptr<GuidedFilter> createGuidedFilter(InputArray guide, int radius, double eps)
.. ocv:pyfunction:: cv2.createGuidedFilter(guide, radius, eps) -> instance
:param guide: guided image (or array of images) with up to 3 channels; if it has more than 3 channels, only the first 3 will be used.
:param radius: radius of the Guided Filter.
:param eps: regularization term of the Guided Filter. :math:`{eps}^2` is similar to the sigma in the color space of :ocv:func:`bilateralFilter`.
For more details about Guided Filter parameters, see the original article [Kaiming10]_.
GuidedFilter::filter
------------------------------------
Apply Guided Filter to the filtering image.
.. ocv:function:: void GuidedFilter::filter(InputArray src, OutputArray dst, int dDepth = -1)
.. ocv:pyfunction:: cv2.GuidedFilter.filter(src, dst[, dDepth]) -> None
:param src: filtering image with any number of channels.
:param dst: output image.
:param dDepth: optional depth of the output image. ``dDepth`` can be set to -1, which will be equivalent to ``src.depth()``.
guidedFilter
------------------------------------
Simple one-line Guided Filter call.
If you have multiple images to filter with the same guided image, use the :ocv:class:`GuidedFilter` interface to avoid extra computations on the initialization stage.
.. ocv:function:: void guidedFilter(InputArray guide, InputArray src, OutputArray dst, int radius, double eps, int dDepth = -1)
.. ocv:pyfunction:: cv2.guidedFilter(guide, src, dst, radius, eps[, dDepth]) -> None
:param guide: guided image (or array of images) with up to 3 channels; if it has more than 3 channels, only the first 3 will be used.
:param src: filtering image with any number of channels.
:param dst: output image.
:param radius: radius of the Guided Filter.
:param eps: regularization term of the Guided Filter. :math:`{eps}^2` is similar to the sigma in the color space of :ocv:func:`bilateralFilter`.
:param dDepth: optional depth of the output image.
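A minimal C++ sketch with self-guidance; the header location, namespace and parameter values are illustrative assumptions. ::

    #include <opencv2/core.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/ximgproc.hpp> // assumed header location

    using namespace cv;
    using namespace cv::ximgproc; // assumed namespace

    int main()
    {
        Mat src = imread("input.png"), dst;

        // radius = 8; eps = 100 (see the parameter note above)
        guidedFilter(src, src, dst, 8, 100.0);

        imwrite("filtered.png", dst);
        return 0;
    }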
.. seealso:: :ocv:func:`bilateralFilter`, :ocv:func:`dtFilter`, :ocv:func:`amFilter`
Adaptive Manifold Filter
====================================
This section describes interface for Adaptive Manifold Filter.
For more details about this filter see [Gastal12]_ and References_.
AdaptiveManifoldFilter
------------------------------------
.. ocv:class:: AdaptiveManifoldFilter : public Algorithm
Interface for Adaptive Manifold Filter realizations.
The optional parameters listed below may be set with the :ocv:func:`Algorithm::set` function.
.. ocv:member:: double sigma_s = 16.0
Spatial standard deviation.
.. ocv:member:: double sigma_r = 0.2
Color space standard deviation.
.. ocv:member:: int tree_height = -1
Height of the manifold tree (default = -1 : automatically computed).
.. ocv:member:: int num_pca_iterations = 1
Number of iterations used to compute the eigenvector.
.. ocv:member:: bool adjust_outliers = false
Specifies whether to adjust outliers using Eq. 9 of the original paper.
.. ocv:member:: bool use_RNG = true
Specifies whether to use a random number generator when computing the eigenvector.
createAMFilter
------------------------------------
Factory method that creates an instance of :ocv:class:`AdaptiveManifoldFilter` and performs some initialization routines.
.. ocv:function:: Ptr<AdaptiveManifoldFilter> createAMFilter(double sigma_s, double sigma_r, bool adjust_outliers = false)
.. ocv:pyfunction:: cv2.createAMFilter(sigma_s, sigma_r, adjust_outliers) -> instance
:param sigma_s: spatial standard deviation.
:param sigma_r: color space standard deviation; similar to the sigma in the color space of :ocv:func:`bilateralFilter`.
:param adjust_outliers: optional flag that specifies whether to perform the outlier adjustment operation (Eq. 9 in the original paper).
For more details about Adaptive Manifold Filter parameters, see the original article [Gastal12]_.
.. note::
Joint images with ``CV_8U`` and ``CV_16U`` depth are converted to ``CV_32F`` depth and the [0; 1] color range before processing.
Hence, the color space sigma ``sigma_r`` must be in the [0; 1] range, unlike the corresponding sigmas in the :ocv:func:`bilateralFilter` and :ocv:func:`dtFilter` functions.
AdaptiveManifoldFilter::filter
------------------------------------
Apply high-dimensional filtering using adaptive manifolds.
.. ocv:function:: void AdaptiveManifoldFilter::filter(InputArray src, OutputArray dst, InputArray joint = noArray())
.. ocv:pyfunction:: cv2.AdaptiveManifoldFilter.filter(src, dst[, joint]) -> None
:param src: filtering image with any number of channels.
:param dst: output image.
:param joint: optional joint (also called guide) image with any number of channels.
amFilter
------------------------------------
Simple one-line Adaptive Manifold Filter call.
.. ocv:function:: void amFilter(InputArray joint, InputArray src, OutputArray dst, double sigma_s, double sigma_r, bool adjust_outliers = false)
.. ocv:pyfunction:: cv2.amFilter(joint, src, dst, sigma_s, sigma_r[, adjust_outliers]) -> None
:param joint: joint (also called guide) image or array of images with any number of channels.
:param src: filtering image with any number of channels.
:param dst: output image.
:param sigma_s: spatial standard deviation.
:param sigma_r: color space standard deviation; similar to the sigma in the color space of :ocv:func:`bilateralFilter`.
:param adjust_outliers: optional flag that specifies whether to perform the outlier adjustment operation (Eq. 9 in the original paper).
.. note::
Joint images with ``CV_8U`` and ``CV_16U`` depth are converted to ``CV_32F`` depth and the [0; 1] color range before processing.
Hence, the color space sigma ``sigma_r`` must be in the [0; 1] range, unlike the corresponding sigmas in the :ocv:func:`bilateralFilter` and :ocv:func:`dtFilter` functions.
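A minimal C++ sketch; note that ``sigma_r`` is given in the [0; 1] range, as the note above requires. The header location, namespace and parameter values are illustrative assumptions. ::

    #include <opencv2/core.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/ximgproc.hpp> // assumed header location

    using namespace cv;
    using namespace cv::ximgproc; // assumed namespace

    int main()
    {
        Mat src = imread("input.png"), dst;

        // sigma_s = 16 pixels, sigma_r = 0.2 in the [0; 1] color range
        amFilter(src, src, dst, 16.0, 0.2);

        imwrite("filtered.png", dst);
        return 0;
    }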
.. seealso:: :ocv:func:`bilateralFilter`, :ocv:func:`dtFilter`, :ocv:func:`guidedFilter`
Joint Bilateral Filter
====================================
jointBilateralFilter
------------------------------------
Applies the joint bilateral filter to an image.
.. ocv:function:: void jointBilateralFilter(InputArray joint, InputArray src, OutputArray dst, int d, double sigmaColor, double sigmaSpace, int borderType = BORDER_DEFAULT)
.. ocv:pyfunction:: cv2.jointBilateralFilter(joint, src, dst, d, sigmaColor, sigmaSpace[, borderType]) -> None
:param joint: Joint 8-bit or floating-point, 1-channel or 3-channel image.
:param src: Source 8-bit or floating-point, 1-channel or 3-channel image with the same depth as joint image.
:param dst: Destination image of the same size and type as ``src`` .
:param d: Diameter of each pixel neighborhood that is used during filtering. If it is non-positive, it is computed from ``sigmaSpace`` .
:param sigmaColor: Filter sigma in the color space. A larger value of the parameter means that farther colors within the pixel neighborhood (see ``sigmaSpace`` ) will be mixed together, resulting in larger areas of semi-equal color.
:param sigmaSpace: Filter sigma in the coordinate space. A larger value of the parameter means that farther pixels will influence each other as long as their colors are close enough (see ``sigmaColor`` ). When ``d>0`` , it specifies the neighborhood size regardless of ``sigmaSpace`` . Otherwise, ``d`` is proportional to ``sigmaSpace`` .
.. note:: :ocv:func:`bilateralFilter` and :ocv:func:`jointBilateralFilter` use L1 norm to compute difference between colors.
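A minimal C++ sketch of the classic flash/no-flash use case, where a sharp flash image guides the denoising of a no-flash image; the header location, namespace and parameter values are illustrative assumptions. ::

    #include <opencv2/core.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/ximgproc.hpp> // assumed header location

    using namespace cv;
    using namespace cv::ximgproc; // assumed namespace

    int main()
    {
        Mat flash = imread("flash.png");     // joint (guidance) image
        Mat noflash = imread("noflash.png"); // noisy image to filter
        Mat dst;

        // d = 0: the neighborhood diameter is computed from sigmaSpace
        jointBilateralFilter(flash, noflash, dst, 0, 25.0, 10.0);

        imwrite("denoised.png", dst);
        return 0;
    }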
.. seealso:: :ocv:func:`bilateralFilter`, :ocv:func:`amFilter`
References
==========
.. [Gastal11] E. Gastal and M. Oliveira, "Domain Transform for Edge-Aware Image and Video Processing", Proceedings of SIGGRAPH, 2011, vol. 30, pp. 69:1 - 69:12.
The paper is available `online <http://www.inf.ufrgs.br/~eslgastal/DomainTransform/>`__.
.. [Gastal12] E. Gastal and M. Oliveira, "Adaptive manifolds for real-time high-dimensional filtering," Proceedings of SIGGRAPH, 2012, vol. 31, pp. 33:1 - 33:13.
The paper is available `online <http://inf.ufrgs.br/~eslgastal/AdaptiveManifolds/>`__.
.. [Kaiming10] Kaiming He et. al., "Guided Image Filtering," ECCV 2010, pp. 1 - 14.
The paper is available `online <http://research.microsoft.com/en-us/um/people/kahe/eccv10/>`__.
.. [Tomasi98] Carlo Tomasi and Roberto Manduchi, “Bilateral filtering for gray and color images,” in Computer Vision, 1998. Sixth International Conference on . IEEE, 1998, pp. 839– 846.
The paper is available `online <https://www.cs.duke.edu/~tomasi/papers/tomasi/tomasiIccv98.pdf>`__.
.. [Ziyang13] Ziyang Ma et al., "Constant Time Weighted Median Filtering for Stereo Matching and Beyond," ICCV, 2013, pp. 49 - 56.
The paper is available `online <http://www.cv-foundation.org/openaccess/content_iccv_2013/papers/Ma_Constant_Time_Weighted_2013_ICCV_paper.pdf>`__.

@ -1,67 +0,0 @@
Structured forests for fast edge detection
******************************************
.. highlight:: cpp
This module contains implementations of modern structured edge detection algorithms,
i.e. algorithms that take into account pixel affinities in natural images.
StructuredEdgeDetection
-----------------------
.. ocv:class:: StructuredEdgeDetection : public Algorithm
Class implementing edge detection algorithm from [Dollar2013]_ ::
    /*! \class StructuredEdgeDetection
    Prediction part of [P. Dollar and C. L. Zitnick. Structured Forests for Fast Edge Detection, 2013].
    */
    class CV_EXPORTS_W StructuredEdgeDetection : public Algorithm
    {
    public:
        /*!
         * The function detects edges in src and draw them to dst
         *
         * \param src : source image (RGB, float, in [0;1]) to detect edges
         * \param dst : destination image (grayscale, float, in [0;1])
         *              where edges are drawn
         */
        CV_WRAP virtual void detectEdges(const Mat src, Mat dst) = 0;
    };

    /*!
     * The only available constructor loading data from model file
     *
     * \param model : name of the file where the model is stored
     */
    CV_EXPORTS_W Ptr<StructuredEdgeDetection> createStructuredEdgeDetection(const String &model);
StructuredEdgeDetection::detectEdges
++++++++++++++++++++++++++++++++++++
.. ocv:function:: void detectEdges(const Mat src, Mat dst)
The function detects edges in ``src`` and draws them to ``dst``. The algorithm underlying this function
is much more robust to the presence of texture than common approaches, e.g. Sobel
:param src: source image (RGB, float, in [0;1]) to detect edges
:param dst: destination image (grayscale, float, in [0;1])
where edges are drawn
.. seealso::
:ocv:func:`Sobel`,
:ocv:func:`Canny`
createStructuredEdgeDetection
+++++++++++++++++++++++++++++
.. ocv:function:: Ptr<cv::StructuredEdgeDetection> createStructuredEdgeDetection(String model)
The only available constructor
:param model: model file name
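A minimal C++ sketch of the prediction flow; the header location and the file names are illustrative assumptions, and a pre-trained model file is required. ::

    #include <opencv2/core.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/imgproc.hpp>
    #include <opencv2/ximgproc.hpp> // assumed header location

    using namespace cv;
    using namespace cv::ximgproc; // assumed namespace

    int main()
    {
        Mat image = imread("image.png");             // loaded as BGR
        cvtColor(image, image, COLOR_BGR2RGB);       // the model expects RGB
        image.convertTo(image, CV_32F, 1.0 / 255.0); // float in [0;1]

        Ptr<StructuredEdgeDetection> detector =
            createStructuredEdgeDetection("model.yml"); // assumed model file

        Mat edges(image.size(), CV_32FC1); // grayscale float edge map in [0;1]
        detector->detectEdges(image, edges);
        return 0;
    }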
.. [Dollar2013] P. Dollár, C. L. Zitnick, "Structured forests for fast edge detection",
IEEE International Conference on Computer Vision (ICCV), 2013,
pp. 1841-1848. `DOI <http://dx.doi.org/10.1109/ICCV.2013.231>`_

@ -1,118 +0,0 @@
Superpixels
===========
SuperpixelSEEDS
---------------
.. ocv:class:: SuperpixelSEEDS : public Algorithm
Class implementing the SEEDS (Superpixels Extracted via Energy-Driven Sampling) superpixels algorithm described in [VBRV14]_. The algorithm uses an efficient hill-climbing algorithm to optimize the superpixels' energy function that is based on color histograms and a boundary term, which is optional. The energy function encourages superpixels to be of the same color, and if the boundary term is activated, the superpixels have smooth boundaries and are of similar shape. In practice it starts from a regular grid of superpixels and moves the pixels or blocks of pixels at the boundaries to refine the solution. The algorithm runs in real-time using a single CPU.
.. [VBRV14] Michael Van den Bergh, Xavier Boix, Gemma Roig, Luc Van Gool: SEEDS: Superpixels Extracted via Energy-Driven Sampling. International Journal of Computer Vision (IJCV), 2014.
.. highlight:: cpp
SuperpixelSEEDS::createSuperpixelSEEDS()
----------------------------------------
Initializes a SuperpixelSEEDS object.
.. ocv:function:: SuperpixelSEEDS::createSuperpixelSEEDS(int image_width, int image_height, int image_channels, int num_superpixels, int num_levels, int use_prior = 2, int histogram_bins=5, bool double_step = false)
.. ocv:pyfunction:: cv2.SuperpixelSEEDS.createSuperpixelSEEDS(image_width, image_height, image_channels, num_superpixels, num_levels, use_prior = 2, histogram_bins=5, double_step = false) -> <SuperpixelSEEDS object>
:param image_width: Image width.
:param image_height: Image height.
:param image_channels: Number of channels of the image.
:param num_superpixels: Desired number of superpixels. Note that the actual number may be smaller due to restrictions (depending on the image size and num_levels). Use getNumberOfSuperpixels() to get the actual number.
:param num_levels: Number of block levels. The more levels, the more accurate is the segmentation, but needs more memory and CPU time.
:param use_prior: enable the 3x3 shape smoothing term if ``> 0``. A larger value leads to smoother shapes. ``use_prior`` must be in the range [0, 5].
:param histogram_bins: Number of histogram bins.
:param double_step: If true, iterate each block level twice for higher accuracy.
The function initializes a SuperpixelSEEDS object for the input ``image``. It stores the parameters of the image: ``image_width``, ``image_height`` and ``image_channels``. It also sets the parameters of the SEEDS superpixel algorithm, which are: ``num_superpixels``, ``num_levels``, ``use_prior``, ``histogram_bins`` and ``double_step``.
The number of levels in ``num_levels`` defines the number of block levels that the algorithm uses in the optimization. The initialization is a grid in which the superpixels are equally distributed through the width and the height of the image. The largest blocks correspond to the superpixel size, and the levels with smaller blocks are formed by dividing the larger blocks into 2 x 2 blocks of pixels, recursively, down to the smallest block level. An example of an initialization with 4 block levels is illustrated in the following figure.
.. image:: pics/superpixels_blocks.png
SuperpixelSEEDS::iterate()
--------------------------
Calculates the superpixel segmentation on a given image with the parameters initialized in the SuperpixelSEEDS object. This function can be called again for other images without initializing the algorithm with createSuperpixelSEEDS(), which saves the computational cost of allocating memory for all the structures of the algorithm.
.. ocv:function:: void SuperpixelSEEDS::iterate(InputArray img, int num_iterations=4)
.. ocv:pyfunction:: cv2.SuperpixelSEEDS.iterate(image, num_iterations)
:param img: Input image. Supported formats: CV_8U, CV_16U, CV_32F. The image size and number of channels must match those given to createSuperpixelSEEDS(). It should be in HSV or Lab color space. Lab is a bit better, but also slower.
:param num_iterations: Number of pixel level iterations. Higher number improves the result.
The function computes the superpixel segmentation of an image with the parameters initialized by the function createSuperpixelSEEDS(). The algorithm starts from a grid of superpixels and then refines the boundaries by proposing updates of blocks of pixels that lie at the boundaries, from larger to smaller sizes, finalizing with pixel-level updates. An illustrative example can be seen below.
.. image:: pics/superpixels_blocks2.png
SuperpixelSEEDS::getNumberOfSuperpixels()
-----------------------------------------
Returns the number of superpixels of the segmentation stored in the SuperpixelSEEDS object.
.. ocv:function:: int SuperpixelSEEDS::getNumberOfSuperpixels()
.. ocv:pyfunction:: cv2.SuperpixelSEEDS.getNumberOfSuperpixels() -> retval
The function returns the actual number of superpixels, which may be smaller than the number requested in createSuperpixelSEEDS() due to restrictions on the image size and the number of block levels.
SuperpixelSEEDS::getLabels()
----------------------------
Returns the segmentation labeling of the image. Each label represents a superpixel, and each pixel is assigned to one superpixel label.
.. ocv:function:: void SuperpixelSEEDS::getLabels(OutputArray labels_out)
.. ocv:pyfunction:: cv2.SuperpixelSEEDS.getLabels(labels_out)
:param labels_out: Return: A ``CV_32UC1`` integer array containing the labels of the superpixel segmentation. The labels are in the range [0, getNumberOfSuperpixels()].
The function returns an image with the labels of the superpixel segmentation. The labels are in the range [0, getNumberOfSuperpixels()].
SuperpixelSEEDS::getLabelContourMask()
--------------------------------------
Returns the mask of the superpixel segmentation stored in SuperpixelSEEDS object.
.. ocv:function:: void SuperpixelSEEDS::getLabelContourMask(OutputArray image, bool thick_line = false)
.. ocv:pyfunction:: cv2.SuperpixelSEEDS.getLabelContourMask(image, thick_line = false)
:param image: Return: CV_8UC1 image mask where -1 indicates that the pixel is a superpixel border, and 0 otherwise.
:param thick_line: If false, the border is only one pixel wide, otherwise all pixels at the border are masked.
The function returns the boundaries of the superpixel segmentation.
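A minimal C++ sketch tying the calls above together; it treats createSuperpixelSEEDS() as a factory returning a ``Ptr<SuperpixelSEEDS>``, and the header location and parameter values are illustrative assumptions. ::

    #include <opencv2/core.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/imgproc.hpp>
    #include <opencv2/ximgproc.hpp> // assumed header location

    using namespace cv;
    using namespace cv::ximgproc; // assumed namespace

    int main()
    {
        Mat img = imread("image.png"), converted;
        cvtColor(img, converted, COLOR_BGR2Lab); // Lab is recommended above

        Ptr<SuperpixelSEEDS> seeds = createSuperpixelSEEDS(
            img.cols, img.rows, img.channels(),
            400 /* num_superpixels */, 4 /* num_levels */);

        seeds->iterate(converted, 4); // 4 pixel-level iterations

        Mat labels, mask;
        seeds->getLabels(labels);                // per-pixel superpixel labels
        seeds->getLabelContourMask(mask, false); // 1-pixel-wide boundaries
        img.setTo(Scalar(0, 0, 255), mask);      // draw the boundaries in red

        imwrite("superpixels.png", img);
        return 0;
    }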
.. note::
* (Python) A demo on how to generate superpixels in images from the webcam can be found at opencv_source_code/samples/python2/seeds.py
* (cpp) A demo on how to generate superpixels in images from the webcam can be found at opencv_source_code/modules/ximgproc/samples/seeds.cpp. By adding a file image as a command line argument, the static image will be used instead of the webcam.
* It will show a window with the video from the webcam, with the superpixel boundaries marked in red (see below). Use Space to switch between the different output modes. At the top of the window there are 4 sliders with which the user can change, on the fly, the number of superpixels, the number of block levels, the strength of the boundary prior term, and the number of iterations at pixel level. This is useful for experimenting with the parameters. The frame rate of the algorithm is printed in the console.
.. image:: pics/superpixels_demo.png

@ -1,12 +0,0 @@
********************************************
ximgproc. Extended Image Processing
********************************************
.. highlight:: cpp
.. toctree::
:maxdepth: 2
structured_edge_detection
edge_aware_filters
superpixels

@ -1,236 +0,0 @@
.. highlight:: cpp
Integral Channel Features Detector
==================================
This section describes classes for object detection using WaldBoost and Integral
Channel Features from [Šochman05]_ and [Dollár09]_.
computeChannels
---------------
Compute channels for integral channel features evaluation
.. ocv:function:: void computeChannels(InputArray image, vector<Mat>& channels)
:param image: image for which channels should be computed
:param channels: output array for computed channels
FeatureEvaluator
----------------
Feature evaluation interface
.. ocv:class:: FeatureEvaluator : public Algorithm
FeatureEvaluator::setChannels
-----------------------------
Set channels for feature evaluation
.. ocv:function:: void FeatureEvaluator::setChannels(InputArrayOfArrays channels)
:param channels: array of channels to be set
FeatureEvaluator::setPosition
-----------------------------
Set window position to sample features with shift. By default position is (0, 0).
.. ocv:function:: void FeatureEvaluator::setPosition(Size position)
:param position: position to be set
FeatureEvaluator::evaluate
--------------------------
Evaluate feature value with given index for current channels and window position.
.. ocv:function:: int FeatureEvaluator::evaluate(size_t feature_ind) const
:param feature_ind: index of feature to be evaluated
FeatureEvaluator::evaluateAll
-----------------------------
Evaluate all features for current channels and window position.
.. ocv:function:: void FeatureEvaluator::evaluateAll(OutputArray feature_values)
:param feature_values: matrix-column of evaluated feature values
generateFeatures
----------------
Generate integral features. Returns vector of features.
.. ocv:function:: vector<vector<int> > generateFeatures(Size window_size, const string& type, int count=INT_MAX, int channel_count=10)
:param window_size: size of window in which features should be evaluated
:param type: feature type. Can be "icf" or "acf"
:param count: number of features to generate.
:param channel_count: number of feature channels
createFeatureEvaluator
----------------------
Construct feature evaluator.
.. ocv:function:: Ptr<FeatureEvaluator> createFeatureEvaluator(const vector<vector<int>>& features, const string& type)
:param features: features for evaluation
:param type: feature type. Can be "icf" or "acf"
WaldBoostParams
---------------
Parameters for WaldBoost. weak_count — number of weak learners, alpha — cascade
thresholding param.
::
    struct CV_EXPORTS WaldBoostParams
    {
        int weak_count;
        float alpha;

        WaldBoostParams(): weak_count(100), alpha(0.02f)
        {}
    };
WaldBoost
---------
.. ocv:class:: WaldBoost : public Algorithm
WaldBoost::train
----------------
Train WaldBoost cascade for given data. Returns feature indices chosen for
cascade. Feature enumeration starts from 0.
.. ocv:function:: vector<int> WaldBoost::train(const Mat& data, const Mat& labels)
:param data: matrix of feature values, size M x N, one feature per row
:param labels: matrix of samples class labels, size 1 x N. Labels can be from {-1, +1}
WaldBoost::predict
------------------
Predict the object class given an object that can compute object features. Returns
an unnormalized confidence value, a measure of confidence that the object is from class +1.
.. ocv:function:: float WaldBoost::predict(const Ptr<FeatureEvaluator>& feature_evaluator) const
:param feature_evaluator: object that can compute features by demand
WaldBoost::write
----------------
Write WaldBoost to FileStorage
.. ocv:function:: void WaldBoost::write(FileStorage& fs)
:param fs: FileStorage for output
WaldBoost::read
---------------
Read WaldBoost from FileNode
.. ocv:function:: void WaldBoost::read(const FileNode& node)
:param node: FileNode for reading
createWaldBoost
---------------
Construct WaldBoost object.
.. ocv:function:: Ptr<WaldBoost> createWaldBoost(const WaldBoostParams& params = WaldBoostParams())
ICFDetectorParams
-----------------
Params for ICFDetector training.
::
    struct CV_EXPORTS ICFDetectorParams
    {
        int feature_count;
        int weak_count;
        int model_n_rows;
        int model_n_cols;
        int bg_per_image;
        std::string features_type;
        float alpha;
        bool is_grayscale;
        bool use_fast_log;

        ICFDetectorParams(): feature_count(UINT_MAX), weak_count(100),
            model_n_rows(56), model_n_cols(56), bg_per_image(5),
            alpha(0.02), is_grayscale(false), use_fast_log(false)
        {}
    };
ICFDetector
-----------
.. ocv:class:: ICFDetector
ICFDetector::train
------------------
Train detector.
.. ocv:function:: void ICFDetector::train(const std::vector<String>& pos_filenames, const std::vector<String>& bg_filenames, ICFDetectorParams params = ICFDetectorParams())
:param pos_filenames: paths to images of objects (wildcards like ``/my/path/*.png`` are allowed)
:param bg_filenames: paths to background images
:param params: parameters for detector training
ICFDetector::detect
-------------------
Detect objects on image.
.. ocv:function:: void ICFDetector::detect(const Mat& image, vector<Rect>& objects, float scaleFactor, Size minSize, Size maxSize, float threshold, int slidingStep, std::vector<float>& values)
.. ocv:function:: void ICFDetector::detect(const Mat& img, std::vector<Rect>& objects, float minScaleFactor, float maxScaleFactor, float factorStep, float threshold, int slidingStep, std::vector<float>& values)
:param image: image for detection
:param objects: output array of bounding boxes
:param scaleFactor: scale between layers in detection pyramid
:param minSize: min size of objects in pixels
:param maxSize: max size of objects in pixels
:param minScaleFactor: min factor by which the image will be resized
:param maxScaleFactor: max factor by which the image will be resized
:param factorStep: scaling factor is incremented each pyramid layer according to this parameter
:param slidingStep: sliding window step
:param values: output vector with values of positive samples
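A minimal C++ sketch of training followed by detection, based on the interfaces documented above; the header location, data layout and parameter values are illustrative assumptions. ::

    #include <vector>
    #include <opencv2/core.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/xobjdetect.hpp> // assumed header location

    using namespace cv;
    using namespace cv::xobjdetect; // assumed namespace

    int main()
    {
        std::vector<String> pos_filenames, bg_filenames;
        glob("positives/*.png", pos_filenames); // assumed data layout
        glob("backgrounds/*.png", bg_filenames);

        ICFDetectorParams params; // defaults as in the struct above
        ICFDetector detector;
        detector.train(pos_filenames, bg_filenames, params);

        Mat image = imread("test.png");
        std::vector<Rect> objects;
        std::vector<float> values;
        // scaleFactor = 1.1, object sizes from 40x40 to 300x300 pixels,
        // confidence threshold = 0, sliding window step = 4
        detector.detect(image, objects, 1.1f, Size(40, 40), Size(300, 300),
                        0.0f, 4, values);
        return 0;
    }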
ICFDetector::write
------------------
Write detector to FileStorage.
.. ocv:function:: void ICFDetector::write(FileStorage& fs) const
:param fs: FileStorage for output
ICFDetector::read
-----------------
Read ICFDetector from FileNode
.. ocv:function:: void ICFDetector::read(const FileNode& node)
:param node: FileNode for reading
.. [Šochman05] J. Šochman and J. Matas. "WaldBoost – Learning for Time Constrained Sequential Detection", CVPR, 2005. The paper is available `online <https://dspace.cvut.cz/bitstream/handle/10467/9494/2005-Waldboost-learning-for-time-constrained-sequential-detection.pdf?sequence=1>`__.
.. [Dollár09] P. Dollár, Z. Tu, P. Perona and S. Belongie. "Integral Channel Features", BMVC 2009. The paper is available `online <http://vision.ucsd.edu/~pdollar/files/papers/DollarBMVC09ChnFtrs.pdf>`__.

@ -1,10 +0,0 @@
***************************************
xobjdetect. Extended object detection.
***************************************
.. highlight:: cpp
.. toctree::
:maxdepth: 2
integral_channel_features

@ -1,20 +0,0 @@
Image denoising techniques
**************************
.. highlight:: cpp
dctDenoising
------------
.. ocv:function:: void dctDenoising(const Mat &src, Mat &dst, const float sigma, const int psize = 16)
The function implements simple DCT-based denoising as described in
http://www.ipol.im/pub/art/2011/ys-dct/.
:param src: source image
:param dst: destination image
:param sigma: expected noise standard deviation
:param psize: size of block side where dct is computed
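A minimal C++ sketch; the header location, namespace and the sigma value are illustrative assumptions. ::

    #include <opencv2/core.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/xphoto.hpp> // assumed header location

    using namespace cv;

    int main()
    {
        Mat noisy = imread("noisy.png"), denoised;

        // sigma should match the expected noise standard deviation
        xphoto::dctDenoising(noisy, denoised, 15.0f); // assumed namespace

        imwrite("denoised.png", denoised);
        return 0;
    }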
.. seealso::
:ocv:func:`fastNlMeansDenoising`

@ -1,20 +0,0 @@
Single image inpainting
***********************
.. highlight:: cpp
Inpainting
----------
.. ocv:function:: void inpaint(const Mat &src, const Mat &mask, Mat &dst, const int algorithmType)
The function implements different single-image inpainting algorithms.
:param src: source image; it can be of any type and have from 1 to 4 channels. For 3- and 4-channel images the function expects a CIELab-like colorspace, where the first color component represents intensity and the second and third represent color. Nonetheless, you can try any colorspace.
:param mask: mask (``CV_8UC1``), where non-zero pixels indicate the valid image area, while zero pixels indicate the area to be inpainted
:param dst: destination image
:param algorithmType: the inpainting algorithm to use:
* INPAINT_SHIFTMAP: This algorithm searches for dominant correspondences (transformations) of image patches and tries to seamlessly fill in the area to be inpainted using these transformations. See the original paper [He2012]_ for details.
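A minimal C++ sketch that converts to CIELab as suggested above; the header location, namespace and file names are illustrative assumptions. ::

    #include <opencv2/core.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/imgproc.hpp>
    #include <opencv2/xphoto.hpp> // assumed header location

    using namespace cv;

    int main()
    {
        Mat src = imread("damaged.png");
        // CV_8UC1 mask: non-zero = valid area, zero = area to be inpainted
        Mat mask = imread("mask.png", IMREAD_GRAYSCALE);

        Mat lab, res, dst;
        cvtColor(src, lab, COLOR_BGR2Lab); // intensity-first colorspace
        xphoto::inpaint(lab, mask, res, xphoto::INPAINT_SHIFTMAP); // assumed namespace
        cvtColor(res, dst, COLOR_Lab2BGR);

        imwrite("restored.png", dst);
        return 0;
    }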
.. [He2012] K. He, J. Sun, "Statistics of Patch Offsets for Image Completion",
European Conference on Computer Vision (ECCV), 2012,
pp. 16-29. `DOI <http://dx.doi.org/10.1007/978-3-642-33709-3_2>`_

@ -1,22 +0,0 @@
Automatic white balance correction
**********************************
.. highlight:: cpp
balanceWhite
------------
.. ocv:function:: void balanceWhite(const Mat &src, Mat &dst, const int algorithmType, const float inputMin = 0.0f, const float inputMax = 255.0f, const float outputMin = 0.0f, const float outputMax = 255.0f)
The function implements different algorithms of automatic white balance, i.e. it tries to map the image's white color to perceptual white (this mapping can be violated by specific illumination or camera settings).
:param algorithmType: type of the algorithm to use. Use WHITE_BALANCE_SIMPLE to perform smart histogram adjustments (ignoring 4% pixels with minimal and maximal values) for each channel.
:param inputMin: minimum value in the input image
:param inputMax: maximum value in the input image
:param outputMin: minimum value in the output image
:param outputMax: maximum value in the output image
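A minimal C++ sketch; the header location, namespace and the enumeration value are illustrative assumptions. ::

    #include <opencv2/core.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/xphoto.hpp> // assumed header location

    using namespace cv;

    int main()
    {
        Mat src = imread("photo.png"), dst;

        // smart per-channel histogram adjustment, default 8-bit value range
        xphoto::balanceWhite(src, dst, xphoto::WHITE_BALANCE_SIMPLE); // assumed namespace

        imwrite("balanced.png", dst);
        return 0;
    }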
.. seealso::
:ocv:func:`cvtColor`,
:ocv:func:`equalizeHist`

@ -1,10 +0,0 @@
***********************************************
xphoto. Additional photo processing algorithms
***********************************************
.. toctree::
:maxdepth: 2
Color balance <whitebalance>
Denoising <denoising>
Inpainting <inpainting>