@ -1,418 +0,0 @@ |
||||
.. _Retina_Model: |
||||
|
||||
Discovering the human retina and its use for image processing |
||||
************************************************************* |
||||
|
||||
Goal |
||||
===== |
||||
|
||||
I present here a model of the human retina that shows some interesting properties for image preprocessing and enhancement.
||||
In this tutorial you will learn how to: |
||||
|
||||
.. container:: enumeratevisibleitemswithsquare |
||||
|
||||
+ discover the two main output channels of your retina

+ see the basics of how to use the retina model

+ discover some parameter tweaks
||||
|
||||
|
||||
General overview |
||||
================ |
||||
|
||||
The proposed model originates from Jeanny Herault's research [herault2010]_ at `Gipsa <http://www.gipsa-lab.inpg.fr>`_. It is used in image processing applications at the `Listic <http://www.listic.univ-savoie.fr>`_ lab (code maintainer and user). This is not a complete model, but it already presents interesting properties that can be exploited for enhanced image processing. The model makes the following human retina properties available :
||||
|
||||
* spectral whitening, which has 3 important effects: high spatio-temporal frequency signal cancellation (noise), mid-frequency detail enhancement and low-frequency luminance energy reduction. This *all in one* property directly cleans visual signals of the classical undesired distortions introduced by image sensors and the input luminance range.
||||
|
||||
* local logarithmic luminance compression allows details to be enhanced even in low light conditions. |
||||
|
||||
* decorrelation of the details information (Parvocellular output channel) and transient information (events, motion made available at the Magnocellular output channel). |
||||
|
||||
The first two points are illustrated below : |
||||
|
||||
The figure below shows the OpenEXR image sample *CrissyField.exr*, a High Dynamic Range image. In order to make it visible on this web page, the original input image is linearly rescaled to the classical image luminance range [0-255] and converted to 8bit/channel format. Such a strong conversion hides many details because of too strong local contrasts. Furthermore, noise energy is also strong and pollutes the visual information.
||||
|
||||
.. image:: images/retina_TreeHdr_small.jpg |
||||
:alt: A High dynamic range image linearly rescaled within range [0-255]. |
||||
:align: center |
||||
|
||||
In the following image, applying the ideas proposed in [benoit2010]_, as your retina does, local luminance adaptation, spatial noise removal and spectral whitening work together and transmit accurate information on lower-range 8bit data channels. In this picture, noise is significantly removed and local details hidden by strong luminance contrasts are enhanced. The output image keeps its naturalness and the visual content is enhanced. Color processing is based on the color multiplexing/demultiplexing method proposed in [chaix2007]_.
||||
|
||||
.. image:: images/retina_TreeHdr_retina.jpg |
||||
:alt: A High dynamic range image compressed within range [0-255] using the retina. |
||||
:align: center |
||||
|
||||
|
||||
*Note :* the image sample can be downloaded from the `OpenEXR website <http://www.openexr.com>`_. For this demonstration, before retina processing, the input image has been linearly rescaled within 0-255 while keeping its channels in float format, and 5% of its histogram tails have been cut (this mostly removes wrong HDR pixels). Check out the sample *opencv/samples/cpp/OpenEXRimages_HighDynamicRange_Retina_toneMapping.cpp* for similar processing. The rest of this demonstration only considers classical 8bit/channel images.
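For reference, here is a minimal, hedged sketch of such a pre-scaling step (this is **not** the exact code of the sample above; the helper name and the symmetric 2.5% clipping of each histogram tail are illustrative assumptions, and the usual OpenCV and STL headers such as *<vector>* and *<algorithm>* are assumed to be included) :

.. code-block:: cpp

   // hypothetical helper : linearly rescale a float HDR image to [0-255]
   // while clipping the histogram tails (mostly removes wrong HDR pixels)
   static cv::Mat rescaleHdrForRetina(const cv::Mat &hdrInput)
   {
       // gather all pixel values to estimate robust min/max bounds
       std::vector<float> values;
       values.reserve(hdrInput.total() * hdrInput.channels());
       for (int i = 0; i < hdrInput.rows; ++i)
       {
           const float* row = hdrInput.ptr<float>(i);
           values.insert(values.end(), row, row + hdrInput.cols * hdrInput.channels());
       }
       std::sort(values.begin(), values.end());
       const float lowBound  = values[(size_t)(0.025 * values.size())];
       const float highBound = values[(size_t)(0.975 * values.size())];

       // linear rescaling to [0-255], channels kept in float format
       cv::Mat rescaled = (hdrInput - lowBound) * (255.0f / (highBound - lowBound));
       cv::min(rescaled, 255.0, rescaled);
       cv::max(rescaled, 0.0, rescaled);
       return rescaled; // still a float image, ready to be fed to the retina
   }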
||||
|
||||
The retina model output channels |
||||
================================ |
||||
|
||||
The retina model presents two outputs that benefit from the above cited behaviors. |
||||
|
||||
* The first one is called the Parvocellular channel. It is mainly active in the foveal retina area (high resolution central vision with color sensitive photo-receptors); its aim is to provide accurate color vision for visual details remaining static on the retina. On the other hand, objects moving on the retina projection are blurred.
||||
|
||||
* The second well known channel is the Magnocellular channel. It is mainly active in the retina's peripheral vision and sends signals related to change events (motion, transient events, etc.). These output signals also help the visual system to focus/center the retina on 'transient'/moving areas for more detailed analysis, thus improving visual scene context and object classification.
||||
|
||||
**NOTE :** in the proposed model, contrary to the real retina, the two channels are applied on the entire input image at the same resolution. This allows enhanced visual details and motion information to be extracted on the whole image... but remember that these two channels are complementary. For example, if the Magnocellular channel gives strong energy in an area, then the Parvocellular channel is certainly blurred there since a transient event is occurring.
||||
|
||||
As an illustration, we now apply the retina model on a webcam video stream of a dark visual scene. In this scene, captured in a university amphitheater, some students are moving while talking to the teacher.
||||
|
||||
In this video sequence, because of the dark ambiance, signal to noise ratio is low and color artifacts are present on visual features edges because of the low quality image capture tool-chain. |
||||
|
||||
.. image:: images/studentsSample_input.jpg |
||||
:alt: an input video stream extract sample |
||||
:align: center |
||||
|
||||
Below, the retina foveal vision is applied on the entire image. In the retina configuration used here, global luminance is preserved and local contrasts are enhanced. The signal to noise ratio is also improved : since high frequency spatio-temporal noise is reduced, the enhanced details are not corrupted by any enhanced noise.
||||
|
||||
.. image:: images/studentsSample_parvo.jpg |
||||
:alt: the retina Parvocellular output. Enhanced details, luminance adaptation and noise removal. A processing tool for image analysis. |
||||
:align: center |
||||
|
||||
Below is the Magnocellular output of the retina model. Its signals are strong where transient events occur. Here, a student moving at the bottom of the image generates high energy. The rest of the image is static; however, it is corrupted by strong noise. The retina filters out most of this noise, thus generating few false motion 'alarms'. This channel can be used as a transient/moving area detector : it would provide relevant information to a low cost segmentation tool that highlights areas in which an event is occurring (a minimal sketch is given after the illustration below).
||||
|
||||
.. image:: images/studentsSample_magno.jpg |
||||
:alt: the retina Magnocellular output. Enhanced transient signals (motion, etc.). A preprocessing tool for event detection. |
||||
:align: center |
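Below is a minimal, hedged sketch of such a detector (not part of the tutorial sample) : it assumes an 8-bit normalised Magnocellular output buffer *retinaOutput_magno*, as retrieved in the code tutorial later on, and uses arbitrary threshold and kernel values.

.. code-block:: cpp

   // sketch : turn the Magnocellular output into a binary transient-area mask
   cv::Mat transientMask;
   // keep only areas with significant transient energy (threshold value is arbitrary)
   cv::threshold(retinaOutput_magno, transientMask, 60, 255, cv::THRESH_BINARY);
   // remove the remaining isolated noisy pixels
   cv::morphologyEx(transientMask, transientMask, cv::MORPH_OPEN,
                    cv::getStructuringElement(cv::MORPH_ELLIPSE, cv::Size(5, 5)));
   cv::imshow("transient areas", transientMask);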
||||
|
||||
Retina use case |
||||
=============== |
||||
|
||||
This model can be used for basic spatio-temporal video effects, but also for the following (a short sketch of the first use case is given after the list) :
||||
|
||||
* performing texture analysis with enhanced signal to noise ratio and enhanced details that are robust against the input image luminance range (check out the Parvocellular retina channel output)
||||
|
||||
* performing motion analysis, also taking benefit of the previously cited properties.
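As a hedged sketch of the first use case, the Parvocellular output can simply be fed to a classical processing chain; here a Canny edge extraction (the thresholds are arbitrary and *myRetina* / *inputFrame* are the objects defined in the code tutorial below) :

.. code-block:: cpp

   // sketch : use the luminance-corrected Parvo output as a robust front-end
   cv::Mat parvoOutput, parvoGray, edges;
   myRetina->run(inputFrame);
   myRetina->getParvo(parvoOutput);            // details channel (8-bit)
   cv::cvtColor(parvoOutput, parvoGray, CV_BGR2GRAY);
   cv::Canny(parvoGray, edges, 50, 150);       // edges are now far less noise/luminance dependent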
||||
|
||||
Literature |
||||
========== |
||||
For more information, refer to the following papers : |
||||
|
||||
.. [benoit2010] Benoit A., Caplier A., Durette B., Herault, J., "Using Human Visual System Modeling For Bio-Inspired Low Level Image Processing", Elsevier, Computer Vision and Image Understanding 114 (2010), pp. 758-773. DOI <http://dx.doi.org/10.1016/j.cviu.2010.01.011> |
||||
|
||||
* Please have a look at the reference work of Jeanny Herault that you can read in his book : |
||||
|
||||
.. [herault2010] Jeanny Herault, *Vision: Images, Signals and Neural Networks: Models of Neural Processing in Visual Perception* (Progress in Neural Processing), ISBN: 9814273686. WAPI (Tower ID): 113266891.
||||
|
||||
This retina filter code includes the research contributions of PhD/research colleagues whose code has been rewritten by the author :
||||
|
||||
* take a look at the *retinacolor.hpp* module to discover Brice Chaix de Lavarene's PhD work on color mosaicing/demosaicing and his reference paper:
||||
|
||||
.. [chaix2007] B. Chaix de Lavarene, D. Alleysson, B. Durette, J. Herault (2007). "Efficient demosaicing through recursive filtering", IEEE International Conference on Image Processing ICIP 2007 |
||||
|
||||
* take a look at *imagelogpolprojection.hpp* to discover the retina spatial log sampling, which originates from Barthelemy Durette's PhD with Jeanny Herault. A Retina / V1 cortex projection is also proposed and originates from discussions with Jeanny. More information can be found in Jeanny Herault's book cited above.
||||
|
||||
Code tutorial |
||||
============= |
||||
|
||||
Please refer to the original tutorial source code in file *opencv_folder/samples/cpp/tutorial_code/bioinspired/retina_tutorial.cpp*. |
||||
|
||||
**Note :** do not forget that the retina model is included in the following namespace : *cv::bioinspired*. |
||||
|
||||
To compile it, assuming OpenCV is correctly installed, use the following command. It requires the opencv_core *(cv::Mat and friends objects management)*, opencv_highgui *(display and image/video read)* and opencv_bioinspired *(Retina description)* libraries to compile. |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
// compile |
||||
g++ retina_tutorial.cpp -o Retina_tuto -lopencv_core -lopencv_highgui -lopencv_bioinspired
||||
|
||||
// Run commands : add 'log' as a last parameter to apply a spatial log sampling (simulates retina sampling) |
||||
// run on webcam |
||||
./Retina_tuto -video |
||||
// run on video file |
||||
./Retina_tuto -video myVideo.avi |
||||
// run on an image |
||||
./Retina_tuto -image myPicture.jpg |
||||
// run on an image with log sampling |
||||
./Retina_tuto -image myPicture.jpg log |
||||
|
||||
Here is a code explanation : |
||||
|
||||
The Retina definition is present in the bioinspired package and a simple include allows you to use it. Alternatively, you can use the specific header *opencv2/bioinspired.hpp*, but then also include the other required OpenCV modules : *opencv2/core.hpp* and *opencv2/highgui.hpp* (see the sketch after the next code block).
||||
|
||||
.. code-block:: cpp |
||||
|
||||
#include "opencv2/opencv.hpp" |
||||
|
||||
Provide the user with some hints on how to run the program, using a help function :
||||
|
||||
.. code-block:: cpp |
||||
|
||||
// the help procedure |
||||
static void help(std::string errorMessage) |
||||
{ |
||||
std::cout<<"Program init error : "<<errorMessage<<std::endl; |
||||
std::cout<<"\nProgram call procedure : retinaDemo [processing mode] [Optional : media target] [Optional LAST parameter: \"log\" to activate retina log sampling]"<<std::endl; |
||||
std::cout<<"\t[processing mode] :"<<std::endl; |
||||
std::cout<<"\t -image : for still image processing"<<std::endl; |
||||
std::cout<<"\t -video : for video stream processing"<<std::endl; |
||||
std::cout<<"\t[Optional : media target] :"<<std::endl; |
||||
std::cout<<"\t if processing an image or video file, then, specify the path and filename of the target to process"<<std::endl; |
||||
std::cout<<"\t leave empty if processing video stream coming from a connected video device"<<std::endl; |
||||
std::cout<<"\t[Optional : activate retina log sampling] : an optional last parameter can be specified for retina spatial log sampling"<<std::endl; |
||||
std::cout<<"\t set \"log\" without quotes to activate this sampling, output frame size will be divided by 4"<<std::endl; |
||||
std::cout<<"\nExamples:"<<std::endl; |
||||
std::cout<<"\t-Image processing : ./retinaDemo -image lena.jpg"<<std::endl; |
||||
std::cout<<"\t-Image processing with log sampling : ./retinaDemo -image lena.jpg log"<<std::endl; |
||||
std::cout<<"\t-Video processing : ./retinaDemo -video myMovie.mp4"<<std::endl; |
||||
std::cout<<"\t-Live video processing : ./retinaDemo -video"<<std::endl; |
||||
std::cout<<"\nPlease start again with new parameters"<<std::endl; |
||||
std::cout<<"****************************************************"<<std::endl; |
||||
std::cout<<" NOTE : this program generates the default retina parameters file 'RetinaDefaultParameters.xml'"<<std::endl; |
||||
std::cout<<" => you can use this to fine tune parameters and load them if you save to file 'RetinaSpecificParameters.xml'"<<std::endl; |
||||
} |
||||
|
||||
Then, start the main program and first declare a *cv::Mat* matrix in which input images will be loaded. Also allocate a *cv::VideoCapture* object ready to load video streams (if necessary) |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
int main(int argc, char* argv[]) { |
||||
// declare the retina input buffer... that will be fed differently in regard of the input media |
||||
cv::Mat inputFrame; |
||||
cv::VideoCapture videoCapture; // in case a video media is used, its manager is declared here |
||||
|
||||
|
||||
In the main program, before processing, first check the input command parameters. A first input frame is loaded either from a single image (if the user chose the *-image* command) or from a video stream (if the user chose the *-video* command). Also, if the user added the *log* command at the end of the program call, the spatial logarithmic image sampling performed by the retina is taken into account through the Boolean flag *useLogSampling*.
||||
|
||||
.. code-block:: cpp |
||||
|
||||
// welcome message |
||||
std::cout<<"****************************************************"<<std::endl; |
||||
std::cout<<"* Retina demonstration : demonstrates the use of is a wrapper class of the Gipsa/Listic Labs retina model."<<std::endl; |
||||
std::cout<<"* This demo will try to load the file 'RetinaSpecificParameters.xml' (if exists).\nTo create it, copy the autogenerated template 'RetinaDefaultParameters.xml'.\nThen twaek it with your own retina parameters."<<std::endl; |
||||
// basic input arguments checking |
||||
if (argc<2) |
||||
{ |
||||
help("bad number of parameter"); |
||||
return -1; |
||||
} |
||||
|
||||
bool useLogSampling = !strcmp(argv[argc-1], "log"); // check if user wants retina log sampling processing |
||||
|
||||
std::string inputMediaType=argv[1]; |
||||
|
||||
////////////////////////////////////////////////////////////////////////////// |
||||
// checking input media type (still image, video file, live video acquisition) |
||||
if (!strcmp(inputMediaType.c_str(), "-image") && argc >= 3) |
||||
{ |
||||
std::cout<<"RetinaDemo: processing image "<<argv[2]<<std::endl; |
||||
// image processing case |
||||
inputFrame = cv::imread(std::string(argv[2]), 1); // load image in color (BGR) mode
||||
}else |
||||
if (!strcmp(inputMediaType.c_str(), "-video")) |
||||
{ |
||||
if (argc == 2 || (argc == 3 && useLogSampling)) // attempt to grab images from a video capture device |
||||
{ |
||||
videoCapture.open(0); |
||||
}else// attempt to grab images from a video filestream |
||||
{ |
||||
std::cout<<"RetinaDemo: processing video stream "<<argv[2]<<std::endl; |
||||
videoCapture.open(argv[2]); |
||||
} |
||||
|
||||
// grab a first frame to check if everything is ok |
||||
videoCapture>>inputFrame; |
||||
}else |
||||
{ |
||||
// bad command parameter |
||||
help("bad command parameter"); |
||||
return -1; |
||||
} |
||||
|
||||
Once all input parameters are processed, a first image should have been loaded. If not, display an error and stop the program :
||||
|
||||
.. code-block:: cpp |
||||
|
||||
if (inputFrame.empty()) |
||||
{ |
||||
help("Input media could not be loaded, aborting"); |
||||
return -1; |
||||
} |
||||
|
||||
Now, everything is ready to run the retina model. I propose here to allocate a retina instance and to manage the optional log sampling option. The Retina constructor expects at least a cv::Size object that specifies the input data size that will have to be managed. One can activate other options such as color and its related color multiplexing strategy (here Bayer multiplexing is chosen using the *enum cv::bioinspired::RETINA_COLOR_BAYER*). If using log sampling, the image reduction factor (smaller output images) and the log sampling strength can be adjusted.
||||
|
||||
.. code-block:: cpp |
||||
|
||||
// pointer to a retina object |
||||
cv::Ptr<cv::bioinspired::Retina> myRetina; |
||||
|
||||
// if the last parameter is 'log', then activate log sampling (favour foveal vision and subsamples peripheral vision) |
||||
if (useLogSampling) |
||||
{ |
||||
myRetina = cv::bioinspired::createRetina(inputFrame.size(), true, cv::bioinspired::RETINA_COLOR_BAYER, true, 2.0, 10.0); |
||||
} |
||||
else// -> else allocate "classical" retina : |
||||
myRetina = cv::bioinspired::createRetina(inputFrame.size()); |
||||
|
||||
Once done, the proposed code writes an XML file that contains the default parameters of the retina. This is useful for making your own configuration from this template. Here the generated template XML file is called *RetinaDefaultParameters.xml*.
||||
|
||||
.. code-block:: cpp |
||||
|
||||
// save default retina parameters file in order to let you see this and maybe modify it and reload using method "setup" |
||||
myRetina->write("RetinaDefaultParameters.xml"); |
||||
|
||||
In the following line, the retina attempts to load another XML file called *RetinaSpecificParameters.xml*. If you created it and introduced your own setup, it will be loaded; otherwise, the default retina parameters are used.
||||
|
||||
.. code-block:: cpp |
||||
|
||||
// load parameters if file exists |
||||
myRetina->setup("RetinaSpecificParameters.xml"); |
||||
|
||||
It is not required here, but just to show that it is possible, you can reset the retina buffers to zero to force it to forget past events.
||||
|
||||
.. code-block:: cpp |
||||
|
||||
// reset all retina buffers (imagine you close your eyes for a long time) |
||||
myRetina->clearBuffers(); |
||||
|
||||
Now, it is time to run the retina ! First create some output buffers ready to receive the two retina channels outputs |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
// declare retina output buffers |
||||
cv::Mat retinaOutput_parvo; |
||||
cv::Mat retinaOutput_magno; |
||||
|
||||
Then, run the retina in a loop, load new frames from the video sequence if necessary, and get the retina outputs back into the dedicated buffers.
||||
|
||||
.. code-block:: cpp |
||||
|
||||
// processing loop with no stop condition |
||||
while(true) |
||||
{ |
||||
// if using video stream, then, grabbing a new frame, else, input remains the same |
||||
if (videoCapture.isOpened()) |
||||
videoCapture>>inputFrame; |
||||
|
||||
// run retina filter on the loaded input frame |
||||
myRetina->run(inputFrame); |
||||
// Retrieve and display retina output |
||||
myRetina->getParvo(retinaOutput_parvo); |
||||
myRetina->getMagno(retinaOutput_magno); |
||||
cv::imshow("retina input", inputFrame); |
||||
cv::imshow("Retina Parvo", retinaOutput_parvo); |
||||
cv::imshow("Retina Magno", retinaOutput_magno); |
||||
cv::waitKey(10); |
||||
} |
||||
|
||||
That's done ! But if you want to secure the system, take care to manage exceptions. The retina can throw some when it sees irrelevant data (no input frame, wrong setup, etc.).
I therefore recommend surrounding all the retina code with a try/catch block like this :
||||
|
||||
.. code-block:: cpp |
||||
|
||||
try{ |
||||
// pointer to a retina object |
||||
cv::Ptr<cv::bioinspired::Retina> myRetina;
||||
[---] |
||||
// processing loop with no stop condition |
||||
while(true) |
||||
{ |
||||
[---] |
||||
} |
||||
|
||||
}catch(const cv::Exception& e)
||||
{ |
||||
std::cerr<<"Error using Retina : "<<e.what()<<std::endl; |
||||
} |
||||
|
||||
Retina parameters, what to do ? |
||||
=============================== |
||||
|
||||
First, it is recommended to read the reference paper : |
||||
|
||||
* Benoit A., Caplier A., Durette B., Herault, J., *"Using Human Visual System Modeling For Bio-Inspired Low Level Image Processing"*, Elsevier, Computer Vision and Image Understanding 114 (2010), pp. 758-773. DOI <http://dx.doi.org/10.1016/j.cviu.2010.01.011> |
||||
|
||||
Once done open the configuration file *RetinaDefaultParameters.xml* generated by the demo and let's have a look at it. |
||||
|
||||
.. code-block:: xml
||||
|
||||
<?xml version="1.0"?> |
||||
<opencv_storage> |
||||
<OPLandIPLparvo> |
||||
<colorMode>1</colorMode> |
||||
<normaliseOutput>1</normaliseOutput> |
||||
<photoreceptorsLocalAdaptationSensitivity>7.5e-01</photoreceptorsLocalAdaptationSensitivity> |
||||
<photoreceptorsTemporalConstant>9.0e-01</photoreceptorsTemporalConstant> |
||||
<photoreceptorsSpatialConstant>5.7e-01</photoreceptorsSpatialConstant> |
||||
<horizontalCellsGain>0.01</horizontalCellsGain> |
||||
<hcellsTemporalConstant>0.5</hcellsTemporalConstant> |
||||
<hcellsSpatialConstant>7.</hcellsSpatialConstant> |
||||
<ganglionCellsSensitivity>7.5e-01</ganglionCellsSensitivity></OPLandIPLparvo> |
||||
<IPLmagno> |
||||
<normaliseOutput>1</normaliseOutput> |
||||
<parasolCells_beta>0.</parasolCells_beta> |
||||
<parasolCells_tau>0.</parasolCells_tau> |
||||
<parasolCells_k>7.</parasolCells_k> |
||||
<amacrinCellsTemporalCutFrequency>2.0e+00</amacrinCellsTemporalCutFrequency> |
||||
<V0CompressionParameter>9.5e-01</V0CompressionParameter> |
||||
<localAdaptintegration_tau>0.</localAdaptintegration_tau> |
||||
<localAdaptintegration_k>7.</localAdaptintegration_k></IPLmagno> |
||||
</opencv_storage> |
||||
|
||||
Here are some hints, but actually, the best parameter setup depends more on what you want to do with the retina than on the input images that you give to it. Apart from the more specific case of High Dynamic Range (HDR) images, which require a more specific setup for a specific luminance compression objective, the retina behavior should be rather stable from content to content. Note that OpenCV is able to manage such HDR formats thanks to OpenEXR image compatibility.
||||
|
||||
Then, if the application target requires detail enhancement prior to specific image processing, you need to know whether mean luminance information is required or not. If not, the retina can cancel or significantly reduce its energy, thus giving more visibility to higher spatial frequency details.
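Besides editing the XML files, parameters can also be tuned programmatically. Here is a hedged sketch using the *RetinaParameters* structure (the exact member names should be checked against your OpenCV version); it cancels mean luminance as discussed above :

.. code-block:: cpp

   // hedged sketch : tune the retina programmatically instead of editing the XML file
   cv::bioinspired::RetinaParameters retinaParams = myRetina->getParameters();
   retinaParams.OPLandIplParvo.horizontalCellsGain = 0.0f;       // cancel mean luminance
   retinaParams.OPLandIplParvo.ganglionCellsSensitivity = 0.89f; // enforce low contrast areas
   myRetina->setup(retinaParams);
   myRetina->clearBuffers(); // forget past events so the new setup applies cleanly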
||||
|
||||
|
||||
Basic parameters |
||||
---------------- |
||||
|
||||
The most simple parameters are the following : |
||||
|
||||
* **colorMode** : lets the retina process color information (if 1) or gray scale images (if 0). In the latter case, only the first channel of the input will be processed.
||||
|
||||
* **normaliseOutput** : each channel has this parameter; if its value is 1, the considered channel output is rescaled between 0 and 255. In this case, take care with the Magnocellular output level (motion/transient channel detection) : residual noise will also be rescaled !
||||
|
||||
**Note :** using color requires color channel multiplexing/demultiplexing, which requires more processing. You can expect much faster processing using gray levels : it requires around 30 products per pixel for all the retina processes, and it has recently been parallelized for multicore architectures. A gray-level setup is sketched below.
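Here is a hedged sketch of such a gray-level setup (it assumes the *colorMode* boolean of the *createRetina* factory, shown earlier, is simply set to false) :

.. code-block:: cpp

   // sketch : faster gray-level retina processing
   cv::Mat inputGray;
   cv::cvtColor(inputFrame, inputGray, CV_BGR2GRAY);
   cv::Ptr<cv::bioinspired::Retina> grayRetina =
           cv::bioinspired::createRetina(inputGray.size(), false); // color processing disabled
   grayRetina->run(inputGray);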
||||
|
||||
Photo-receptors parameters |
||||
-------------------------- |
||||
|
||||
The following parameters act on the entry point of the retina - the photo-receptors - and impact all the following processes. These sensors are low pass spatio-temporal filters that smooth temporal and spatial data and also adjust their sensitivity to local luminance, thus improving detail extraction and high frequency noise canceling.
||||
|
||||
* **photoreceptorsLocalAdaptationSensitivity** between 0 and 1. Values close to 1 allow a strong luminance log compression effect at the photo-receptor level. Values closer to 0 give a more linear sensitivity. Increased alone, it can burn the *Parvo (details channel)* output image. If adjusted together with **ganglionCellsSensitivity**, images can be very contrasted whatever the local luminance... at the price of decreased naturalness.
||||
|
||||
* **photoreceptorsTemporalConstant** sets the temporal constant of the low pass filter effect at the entry of the retina. High values lead to a strong temporal smoothing effect : moving objects are blurred and can disappear while static objects are favored. But when starting the retina processing, the stable state is reached later.
||||
|
||||
* **photoreceptorsSpatialConstant** specifies the spatial constant related to the photo-receptors' low pass filter effect. This parameter specifies the minimum spatial signal period allowed in the following stages. Typically, this filter should cut high frequency noise. A value of 0 does not cut any noise, while higher values start to cut high spatial frequencies and progressively lower ones... So do not go too high if you want to see some details of the input images ! A good compromise for color images is 0.53, since this does not affect the color spectrum too much. Higher values would lead to gray and blurred output images.
||||
|
||||
Horizontal cells parameters |
||||
--------------------------- |
||||
|
||||
This parameter set tunes the neural network connected to the photo-receptors : the horizontal cells. It modulates photo-receptor sensitivity and completes the processing for the final spectral whitening (part of the spatial band pass effect, thus favoring visual detail enhancement).
||||
|
||||
* **horizontalCellsGain** is a critical parameter ! If you are not interested in the mean luminance and focus on detail enhancement, then set it to zero. But if you want to keep some environmental luminance data, let some low spatial frequencies pass into the system by setting a higher value (<1).
||||
|
||||
* **hcellsTemporalConstant** similar to the photo-receptors, this acts on the temporal constant of a low pass temporal filter that smooths input data. Here, a high value generates a strong retina after-effect while a lower value makes the retina more reactive. This value should be lower than **photoreceptorsTemporalConstant** to limit strong retina after-effects.
||||
|
||||
* **hcellsSpatialConstant** is the spatial constant of these cells' low pass filter. It specifies the lowest spatial frequency allowed in the following stages. Visually, a high value leads to very low spatial frequency processing and to salient halo effects. Lower values reduce this effect, but the limit is : do not go lower than the value of **photoreceptorsSpatialConstant**. These 2 parameters actually specify the spatial band-pass of the retina.
||||
|
||||
**NOTE** after the processing managed by the previous parameters, the input data is cleaned from noise and the luminance is already partly enhanced. The following parameters act on the last processing stages of the two retina output signals.
||||
|
||||
Parvo (details channel) dedicated parameter |
||||
------------------------------------------- |
||||
|
||||
* **ganglionCellsSensitivity** specifies the strength of the final local adaptation occurring at the output of this details-dedicated channel. Parameter values remain between 0 and 1. Low values tend to give a linear response while higher values enforce the remaining low-contrast areas.
||||
|
||||
**Note :** this parameter can correct possibly burned images by favoring low-energy details of the visual scene, even in bright areas.
||||
|
||||
IPL Magno (motion/transient channel) parameters |
||||
----------------------------------------------- |
||||
|
||||
Once the image information is cleaned, this channel acts as a high pass temporal filter that only selects signals related to transient events (motion, etc.). A low pass spatial filter smooths the extracted transient data and a final logarithmic compression enhances weak transient events, thus enhancing event sensitivity.
||||
|
||||
* **parasolCells_beta** can be considered as an amplifier gain at the entry point of this processing stage. Generally set to 0.
||||
|
||||
* **parasolCells_tau** the amount of temporal smoothing that can be added.
||||
|
||||
* **parasolCells_k** the spatial constant of the spatial filtering effect; set it to a high value to favor low spatial frequency signals, which are less subject to residual noise.
||||
|
||||
* **amacrinCellsTemporalCutFrequency** specifies the temporal constant of the high pass filter. High values let slow transient events be selected.
||||
|
||||
* **V0CompressionParameter** specifies the strength of the log compression. The behavior is similar to the previous description, but here it enforces the sensitivity to transient events.
||||
|
||||
* **localAdaptintegration_tau** generally set to 0; it has no real use here actually.
||||
|
||||
* **localAdaptintegration_k** specifies the size of the area on which local adaptation is performed. Low values lead to short range local adaptation (higher sensitivity to noise), high values secure log compression. |
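For completeness, the Magnocellular stage can also be tuned programmatically. Below is a hedged sketch using the *setupIPLMagnoChannel* method (check the exact signature and argument order in your OpenCV version; the values simply mirror the XML sample above) :

.. code-block:: cpp

   // hedged sketch : programmatic setup of the motion/transient channel
   myRetina->setupIPLMagnoChannel(true,  // normaliseOutput
                                  0.f,   // parasolCells_beta
                                  0.f,   // parasolCells_tau
                                  7.f,   // parasolCells_k
                                  2.0f,  // amacrinCellsTemporalCutFrequency
                                  0.95f, // V0CompressionParameter
                                  0.f,   // localAdaptintegration_tau
                                  7.f);  // localAdaptintegration_k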
@ -1,36 +0,0 @@ |
||||
.. _Table-Of-Content-Bioinspired: |
||||
|
||||
*bioinspired* module. Algorithms inspired from biological models |
||||
---------------------------------------------------------------- |
||||
|
||||
Here you will learn how to use additional modules of OpenCV defined in the "bioinspired" module. |
||||
|
||||
.. include:: ../../definitions/tocDefinitions.rst |
||||
|
||||
+ |
||||
.. tabularcolumns:: m{100pt} m{300pt} |
||||
.. cssclass:: toctableopencv |
||||
|
||||
=============== ====================================================== |
||||
|RetinaDemoImg| **Title:** :ref:`Retina_Model` |
||||
|
||||
*Compatibility:* > OpenCV 2.4 |
||||
|
||||
*Author:* |Author_AlexB| |
||||
|
||||
You will learn how to process images and video streams with a model of retina filter for details enhancement, spatio-temporal noise removal, luminance correction and spatio-temporal events detection. |
||||
|
||||
=============== ====================================================== |
||||
|
||||
.. |RetinaDemoImg| image:: images/retina_TreeHdr_small.jpg |
||||
:height: 90pt |
||||
:width: 90pt |
||||
|
||||
.. raw:: latex |
||||
|
||||
\pagebreak |
||||
|
||||
.. toctree:: |
||||
:hidden: |
||||
|
||||
../retina_model/retina_model |
@ -1,37 +0,0 @@ |
||||
.. _Table-Of-Content-CVV: |
||||
|
||||
*cvv* module. GUI for Interactive Visual Debugging |
||||
-------------------------------------------------- |
||||
|
||||
Here you will learn how to use the cvv module to ease programming computer vision software through visual debugging aids. |
||||
|
||||
.. include:: ../../definitions/tocDefinitions.rst |
||||
|
||||
+ |
||||
.. tabularcolumns:: m{100pt} m{300pt} |
||||
.. cssclass:: toctableopencv |
||||
|
||||
=============== ====================================================== |
||||
|cvvIntro| *Title:* :ref:`Visual_Debugging_Introduction` |
||||
|
||||
*Compatibility:* > OpenCV 2.4.8 |
||||
|
||||
*Author:* |Author_Bihlmaier| |
||||
|
||||
We will learn how to debug our applications in a visual and interactive way. |
||||
|
||||
=============== ====================================================== |
||||
|
||||
.. |cvvIntro| image:: images/Visual_Debugging_Introduction_Tutorial_Cover.jpg |
||||
:height: 90pt |
||||
:width: 90pt |
||||
|
||||
|
||||
.. raw:: latex |
||||
|
||||
\pagebreak |
||||
|
||||
.. toctree:: |
||||
:hidden: |
||||
|
||||
../visual_debugging_introduction/visual_debugging_introduction |
@ -1,372 +0,0 @@ |
||||
.. _Visual_Debugging_Introduction: |
||||
|
||||
Interactive Visual Debugging of Computer Vision applications |
||||
************************************************************ |
||||
|
||||
What is the most common way to debug computer vision applications? |
||||
Usually the answer is temporary, hacked together, custom code that must be removed from the code for release compilation. |
||||
|
||||
In this tutorial we will show how to use the visual debugging features of the **cvv** module (*opencv2/cvv/cvv.hpp*) instead. |
||||
|
||||
|
||||
Goals |
||||
====== |
||||
|
||||
In this tutorial you will learn how to: |
||||
|
||||
* Add cvv debug calls to your application |
||||
* Use the visual debug GUI |
||||
* Enable and disable the visual debug features during compilation (with zero runtime overhead when disabled) |
||||
|
||||
|
||||
Code |
||||
===== |
||||
|
||||
The example code |
||||
|
||||
* captures images (*highgui*), e.g. from a webcam, |
||||
* applies some filters to each image (*imgproc*), |
||||
* detects image features and matches them to the previous image (*features2d*). |
||||
|
||||
If the program is compiled without visual debugging (see CMakeLists.txt below) the only result is some information printed to the command line. |
||||
We want to demonstrate how much debugging or development functionality is added by just a few lines of *cvv* commands. |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
// system includes |
||||
#include <getopt.h> |
||||
#include <iostream> |
||||
|
||||
// library includes |
||||
#include <opencv2/highgui/highgui.hpp> |
||||
#include <opencv2/imgproc/imgproc.hpp> |
||||
#include <opencv2/features2d/features2d.hpp> |
||||
|
||||
// Visual debugging |
||||
#include <opencv2/cvv/cvv.hpp> |
||||
|
||||
|
||||
// helper function to convert objects that support operator<<() to std::string |
||||
template<class T> std::string toString(const T& p_arg) |
||||
{ |
||||
std::stringstream ss; |
||||
|
||||
ss << p_arg; |
||||
|
||||
return ss.str(); |
||||
} |
||||
|
||||
|
||||
void |
||||
usage() |
||||
{ |
||||
printf("usage: cvvt [-r WxH]\n"); |
||||
printf("-h print this help\n"); |
||||
printf("-r WxH change resolution to width W and height H\n"); |
||||
} |
||||
|
||||
|
||||
int |
||||
main(int argc, char** argv) |
||||
{ |
||||
#ifdef CVVISUAL_DEBUGMODE |
||||
std::cout << "Visual debugging is ENABLED" << std::endl; |
||||
#else |
||||
std::cout << "Visual debugging is DISABLED" << std::endl; |
||||
#endif |
||||
|
||||
cv::Size* resolution = nullptr; |
||||
|
||||
// parse options |
||||
const char* optstring = "hr:"; |
||||
int opt; |
||||
while ((opt = getopt(argc, argv, optstring)) != -1) { |
||||
switch (opt) { |
||||
case 'h': |
||||
usage(); |
||||
return 0; |
||||
break; |
||||
case 'r': |
||||
{ |
||||
char dummych; |
||||
resolution = new cv::Size(); |
||||
if (sscanf(optarg, "%d%c%d", &resolution->width, &dummych, &resolution->height) != 3) { |
||||
printf("%s not a valid resolution\n", optarg); |
||||
return 1; |
||||
} |
||||
} |
||||
break; |
||||
default: |
||||
usage(); |
||||
return 2; |
||||
} |
||||
} |
||||
|
||||
// setup video capture |
||||
cv::VideoCapture capture(0); |
||||
if (!capture.isOpened()) { |
||||
std::cout << "Could not open VideoCapture" << std::endl; |
||||
return 3; |
||||
} |
||||
|
||||
if (resolution) { |
||||
printf("Setting resolution to %dx%d\n", resolution->width, resolution->height); |
||||
capture.set(CV_CAP_PROP_FRAME_WIDTH, resolution->width); |
||||
capture.set(CV_CAP_PROP_FRAME_HEIGHT, resolution->height); |
||||
} |
||||
|
||||
|
||||
cv::Mat prevImgGray; |
||||
std::vector<cv::KeyPoint> prevKeypoints; |
||||
cv::Mat prevDescriptors; |
||||
|
||||
int maxFeatureCount = 500; |
||||
cv::ORB detector(maxFeatureCount); |
||||
|
||||
cv::BFMatcher matcher(cv::NORM_HAMMING); |
||||
|
||||
for (int imgId = 0; imgId < 10; imgId++) { |
||||
// capture a frame |
||||
cv::Mat imgRead; |
||||
capture >> imgRead; |
||||
printf("%d: image captured\n", imgId); |
||||
|
||||
std::string imgIdString{"imgRead"}; |
||||
imgIdString += toString(imgId); |
||||
cvv::showImage(imgRead, CVVISUAL_LOCATION, imgIdString.c_str()); |
||||
|
||||
// convert to grayscale |
||||
cv::Mat imgGray; |
||||
cv::cvtColor(imgRead, imgGray, CV_BGR2GRAY); |
||||
cvv::debugFilter(imgRead, imgGray, CVVISUAL_LOCATION, "to gray"); |
||||
|
||||
// filter edges using Canny on smoothed image |
||||
cv::Mat imgGraySmooth; |
||||
cv::GaussianBlur(imgGray, imgGraySmooth, cv::Size(9, 9), 2, 2); |
||||
cvv::debugFilter(imgGray, imgGraySmooth, CVVISUAL_LOCATION, "smoothed"); |
||||
cv::Mat imgEdges; |
||||
cv::Canny(imgGraySmooth, imgEdges, 50, 150); |
||||
cvv::showImage(imgEdges, CVVISUAL_LOCATION, "edges"); |
||||
|
||||
// dilate edges |
||||
cv::Mat imgEdgesDilated; |
||||
cv::dilate(imgEdges, imgEdgesDilated, cv::getStructuringElement(cv::MORPH_RECT, cv::Size(7, 7), cv::Point(3, 3))); |
||||
cvv::debugFilter(imgEdges, imgEdgesDilated, CVVISUAL_LOCATION, "dilated edges"); |
||||
|
||||
// detect ORB features |
||||
std::vector<cv::KeyPoint> keypoints; |
||||
cv::Mat descriptors; |
||||
detector(imgGray, cv::noArray(), keypoints, descriptors); |
||||
printf("%d: detected %zd keypoints\n", imgId, keypoints.size()); |
||||
|
||||
// match them to previous image (if available) |
||||
if (!prevImgGray.empty()) { |
||||
std::vector<cv::DMatch> matches; |
||||
matcher.match(prevDescriptors, descriptors, matches); |
||||
printf("%d: all matches size=%zd\n", imgId, matches.size()); |
||||
std::string allMatchIdString{"all matches "}; |
||||
allMatchIdString += toString(imgId-1) + "<->" + toString(imgId); |
||||
cvv::debugDMatch(prevImgGray, prevKeypoints, imgGray, keypoints, matches, CVVISUAL_LOCATION, allMatchIdString.c_str()); |
||||
|
||||
// remove worst (as defined by match distance) bestRatio quantile |
||||
double bestRatio = 0.8; |
||||
std::sort(matches.begin(), matches.end()); |
||||
matches.resize(int(bestRatio * matches.size())); |
||||
printf("%d: best matches size=%zd\n", imgId, matches.size()); |
||||
std::string bestMatchIdString{"best " + toString(bestRatio) + " matches "}; |
||||
bestMatchIdString += toString(imgId-1) + "<->" + toString(imgId); |
||||
cvv::debugDMatch(prevImgGray, prevKeypoints, imgGray, keypoints, matches, CVVISUAL_LOCATION, bestMatchIdString.c_str()); |
||||
} |
||||
|
||||
prevImgGray = imgGray; |
||||
prevKeypoints = keypoints; |
||||
prevDescriptors = descriptors; |
||||
} |
||||
|
||||
cvv::finalShow(); |
||||
|
||||
return 0; |
||||
} |
||||
|
||||
|
||||
.. code-block:: cmake |
||||
|
||||
cmake_minimum_required(VERSION 2.8) |
||||
|
||||
project(cvvisual_test) |
||||
|
||||
SET(CMAKE_PREFIX_PATH ~/software/opencv/install) |
||||
|
||||
SET(CMAKE_CXX_COMPILER "g++-4.8") |
||||
SET(CMAKE_CXX_FLAGS "-std=c++11 -O2 -pthread -Wall -Werror") |
||||
|
||||
# (un)set: cmake -DCVV_DEBUG_MODE=OFF .. |
||||
OPTION(CVV_DEBUG_MODE "cvvisual-debug-mode" ON) |
||||
if(CVV_DEBUG_MODE MATCHES ON) |
||||
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DCVVISUAL_DEBUGMODE") |
||||
endif() |
||||
|
||||
|
||||
FIND_PACKAGE(OpenCV REQUIRED) |
||||
include_directories(${OpenCV_INCLUDE_DIRS}) |
||||
|
||||
add_executable(cvvt main.cpp) |
||||
target_link_libraries(cvvt |
||||
opencv_core opencv_highgui opencv_imgproc opencv_features2d |
||||
opencv_cvv |
||||
) |
||||
|
||||
|
||||
Explanation |
||||
============ |
||||
|
||||
#. We compile the program either using the above CmakeLists.txt with Option *CVV_DEBUG_MODE=ON* (*cmake -DCVV_DEBUG_MODE=ON*) or by adding the corresponding define *CVVISUAL_DEBUGMODE* to our compiler (e.g. *g++ -DCVVISUAL_DEBUGMODE*). |
||||
|
||||
#. The first cvv call simply shows the image (similar to *imshow*) with the imgIdString as comment. |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
cvv::showImage(imgRead, CVVISUAL_LOCATION, imgIdString.c_str()); |
||||
|
||||
The image is added to the overview tab in the visual debug GUI and the cvv call blocks. |
||||
|
||||
|
||||
.. image:: images/01_overview_single.jpg |
||||
:alt: Overview with image of first cvv call |
||||
:align: center |
||||
|
||||
The image can then be selected and viewed |
||||
|
||||
.. image:: images/02_single_image_view.jpg |
||||
:alt: Display image added through cvv::showImage |
||||
:align: center |
||||
|
||||
Whenever you want to continue in the code, i.e. unblock the cvv call, you can |
||||
either continue until the next cvv call (*Step*), continue until the last cvv |
||||
call (*>>*) or run the application until it exits (*Close*).
||||
|
||||
We decide to press the green *Step* button. |
||||
|
||||
|
||||
#. The next cvv calls are used to debug all kinds of filter operations, i.e. operations that take a picture as input and return a picture as output. |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
cvv::debugFilter(imgRead, imgGray, CVVISUAL_LOCATION, "to gray"); |
||||
|
||||
As with every cvv call, you first end up in the overview. |
||||
|
||||
.. image:: images/03_overview_two.jpg |
||||
:alt: Overview with two cvv calls after pressing Step |
||||
:align: center |
||||
|
||||
We decide not to care about the conversion to gray scale and press *Step*. |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
cvv::debugFilter(imgGray, imgGraySmooth, CVVISUAL_LOCATION, "smoothed"); |
||||
|
||||
If you open the filter call, you will end up in the so called "DefaultFilterView". |
||||
Both images are shown next to each other and you can (synchronized) zoom into them. |
||||
|
||||
.. image:: images/04_default_filter_view.jpg |
||||
:alt: Default filter view displaying a gray scale image and its corresponding GaussianBlur filtered one |
||||
:align: center |
||||
|
||||
When you go to very high zoom levels, each pixel is annotated with its numeric values. |
||||
|
||||
.. image:: images/05_default_filter_view_high_zoom.jpg |
||||
:alt: Default filter view at very high zoom levels |
||||
:align: center |
||||
|
||||
We press *Step* twice and have a look at the dilated image. |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
cvv::debugFilter(imgEdges, imgEdgesDilated, CVVISUAL_LOCATION, "dilated edges"); |
||||
|
||||
The DefaultFilterView showing both images |
||||
|
||||
.. image:: images/06_default_filter_view_edges.jpg |
||||
:alt: Default filter view showing an edge image and the image after dilate() |
||||
:align: center |
||||
|
||||
Now we use the *View* selector in the top right and select the "DualFilterView". |
||||
We select "Changed Pixels" as filter and apply it (middle image). |
||||
|
||||
.. image:: images/07_dual_filter_view_edges.jpg |
||||
:alt: Dual filter view showing an edge image and the image after dilate() |
||||
:align: center |
||||
|
||||
After we had a close look at these images, perhaps using different views, filters or other GUI features, we decide to let the program run through. Therefore we press the yellow *>>* button. |
||||
|
||||
The program will block at |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
cvv::finalShow(); |
||||
|
||||
and display the overview with everything that was passed to cvv in the meantime. |
||||
|
||||
.. image:: images/08_overview_all.jpg |
||||
:alt: Overview displaying all cvv calls up to finalShow() |
||||
:align: center |
||||
|
||||
#. The cvv debugDMatch call is used in a situation where there are two images each with a set of descriptors that are matched to each other. |
||||
|
||||
We pass both images, both sets of keypoints and their matching to the visual debug module. |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
cvv::debugDMatch(prevImgGray, prevKeypoints, imgGray, keypoints, matches, CVVISUAL_LOCATION, allMatchIdString.c_str()); |
||||
|
||||
Since we want to have a look at matches, we use the filter capabilities (*#type match*) in the overview to only show match calls. |
||||
|
||||
.. image:: images/09_overview_filtered_type_match.jpg |
||||
:alt: Overview displaying only match calls |
||||
:align: center |
||||
|
||||
We want to have a closer look at one of them, e.g. to tune our parameters that use the matching. |
||||
The view has various settings for how to display keypoints and matches.
||||
Furthermore, there is a mouseover tooltip. |
||||
|
||||
.. image:: images/10_line_match_view.jpg |
||||
:alt: Line match view |
||||
:align: center |
||||
|
||||
We see (visual debugging!) that there are many bad matches. |
||||
We decide that only 70% of the matches should be shown - those 70% with the lowest match distance. |
||||
|
||||
.. image:: images/11_line_match_view_portion_selector.jpg |
||||
:alt: Line match view showing the best 70% matches, i.e. lowest match distance |
||||
:align: center |
||||
|
||||
Having successfully reduced the visual distraction, we want to see more clearly what changed between the two images. |
||||
We select the "TranslationMatchView" that shows to where the keypoint was matched in a different way. |
||||
|
||||
.. image:: images/12_translation_match_view_portion_selector.jpg |
||||
:alt: Translation match view |
||||
:align: center |
||||
|
||||
It is easy to see that the cup moved to the left between the two images.
||||
|
||||
Although cvv is all about interactively *seeing* computer vision bugs, it is complemented by a "RawView" that allows you to look at the underlying numeric data.
||||
|
||||
.. image:: images/13_raw_view.jpg |
||||
:alt: Raw view of matches |
||||
:align: center |
||||
|
||||
#. There are many more useful features contained in the cvv GUI. For instance, one can group the overview tab. |
||||
|
||||
.. image:: images/14_overview_group_by_line.jpg |
||||
:alt: Overview grouped by call line |
||||
:align: center |
||||
|
||||
|
||||
Result |
||||
======= |
||||
|
||||
* By adding a few expressive lines to our computer vision program we can interactively debug it through different visualizations.

* Once we are done developing/debugging we do not have to remove those lines. We simply disable cvv debugging (*cmake -DCVV_DEBUG_MODE=OFF* or g++ without *-DCVVISUAL_DEBUGMODE*) and our program runs without any debug overhead.
||||
|
||||
Enjoy computer vision! |
@ -1,151 +0,0 @@ |
||||
.. ximgproc: |
||||
|
||||
Structured forests for fast edge detection |
||||
****************************************** |
||||
|
||||
Introduction |
||||
------------ |
||||
In this tutorial you will learn how to use structured forests for the purpose of edge detection in an image. |
||||
|
||||
Examples |
||||
-------- |
||||
|
||||
.. image:: images/01.jpg |
||||
:height: 238pt |
||||
:width: 750pt |
||||
:alt: First example |
||||
:align: center |
||||
|
||||
.. image:: images/02.jpg |
||||
:height: 238pt |
||||
:width: 750pt |
||||
:alt: First example |
||||
:align: center |
||||
|
||||
.. image:: images/03.jpg |
||||
:height: 238pt |
||||
:width: 750pt |
||||
:alt: First example |
||||
:align: center |
||||
|
||||
.. image:: images/04.jpg |
||||
:height: 238pt |
||||
:width: 750pt |
||||
:alt: First example |
||||
:align: center |
||||
|
||||
.. image:: images/05.jpg |
||||
:height: 238pt |
||||
:width: 750pt |
||||
:alt: First example |
||||
:align: center |
||||
|
||||
.. image:: images/06.jpg |
||||
:height: 238pt |
||||
:width: 750pt |
||||
:alt: First example |
||||
:align: center |
||||
|
||||
.. image:: images/07.jpg |
||||
:height: 238pt |
||||
:width: 750pt |
||||
:alt: First example |
||||
:align: center |
||||
|
||||
.. image:: images/08.jpg |
||||
:height: 238pt |
||||
:width: 750pt |
||||
:alt: First example |
||||
:align: center |
||||
|
||||
.. image:: images/09.jpg |
||||
:height: 238pt |
||||
:width: 750pt |
||||
:alt: First example |
||||
:align: center |
||||
|
||||
.. image:: images/10.jpg |
||||
:height: 238pt |
||||
:width: 750pt |
||||
:alt: First example |
||||
:align: center |
||||
|
||||
.. image:: images/11.jpg |
||||
:height: 238pt |
||||
:width: 750pt |
||||
:alt: First example |
||||
:align: center |
||||
|
||||
.. image:: images/12.jpg |
||||
:height: 238pt |
||||
:width: 750pt |
||||
:alt: First example |
||||
:align: center |
||||
|
||||
**Note :** binarization techniques like the Canny edge detector are applicable
to edges produced by both algorithms (``Sobel`` and ``StructuredEdgeDetection::detectEdges``); a minimal thresholding sketch is shown below.
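As a hedged illustration of such a binarization (not part of the sample below), a simple fixed threshold on the floating point edge map produced by ``StructuredEdgeDetection::detectEdges`` already gives a usable binary edge image; the threshold value is arbitrary :

.. code-block:: cpp

   // sketch : binarize the [0;1] float edge map returned by detectEdges()
   cv::Mat edgesBinary;
   cv::threshold(edges, edgesBinary, 0.1, 1.0, cv::THRESH_BINARY);
   edgesBinary.convertTo(edgesBinary, CV_8U, 255); // 8-bit mask, ready for further processing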
||||
|
||||
Source Code |
||||
----------- |
||||
|
||||
.. literalinclude:: ../../../../modules/ximgproc/samples/cpp/structured_edge_detection.cpp
||||
:language: cpp |
||||
:linenos: |
||||
:tab-width: 4 |
||||
|
||||
Explanation |
||||
----------- |
||||
|
||||
1. **Load source color image** |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
cv::Mat image = cv::imread(inFilename, 1); |
||||
if ( image.empty() ) |
||||
{ |
||||
printf("Cannot read image file: %s\n", inFilename.c_str()); |
||||
return -1; |
||||
} |
||||
|
||||
2. **Convert source image to [0;1] range** |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
image.convertTo(image, cv::DataType<float>::type, 1/255.0); |
||||
|
||||
3. **Run main algorithm** |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
cv::Mat edges(image.size(), image.type()); |
||||
|
||||
cv::Ptr<StructuredEdgeDetection> pDollar = |
||||
cv::createStructuredEdgeDetection(modelFilename); |
||||
pDollar->detectEdges(image, edges); |
||||
|
||||
4. **Show results** |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
if ( outFilename == "" ) |
||||
{ |
||||
cv::namedWindow("edges", 1); |
||||
cv::imshow("edges", edges); |
||||
|
||||
cv::waitKey(0); |
||||
} |
||||
else |
||||
cv::imwrite(outFilename, 255*edges); |
||||
|
||||
Literature |
||||
---------- |
||||
For more information, refer to the following papers : |
||||
|
||||
.. [Dollar2013] Dollar P., Zitnick C. L., "Structured forests for fast edge detection", |
||||
IEEE International Conference on Computer Vision (ICCV), 2013, |
||||
pp. 1841-1848. `DOI <http://dx.doi.org/10.1109/ICCV.2013.231>`_ |
||||
|
||||
.. [Lim2013] Lim J. J., Zitnick C. L., Dollar P., "Sketch Tokens: A Learned |
||||
Mid-level Representation for Contour and Object Detection", |
||||
Computer Vision and Pattern Recognition (CVPR), 2013,
||||
pp. 3158-3165. `DOI <http://dx.doi.org/10.1109/CVPR.2013.406>`_ |
@ -1,115 +0,0 @@ |
||||
.. ximgproc: |
||||
|
||||
Structured forest training |
||||
************************** |
||||
|
||||
Introduction |
||||
------------ |
||||
In this tutorial we show how to train your own structured forest using the author's original Matlab implementation.
||||
|
||||
Training pipeline |
||||
----------------- |
||||
|
||||
1. Download "Piotr's Toolbox" from `link <http://vision.ucsd.edu/~pdollar/toolbox/doc/index.html>`_ |
||||
and put it into a separate directory, e.g. PToolbox
||||
|
||||
2. Download the BSDS500 dataset from `link <http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/BSR/>`_
   and put it into a separate directory named exactly BSR
||||
|
||||
3. Add both directories and their subdirectories to the Matlab path.
||||
|
||||
4. Download the detector code from `link <http://research.microsoft.com/en-us/downloads/389109f6-b4e8-404c-84bf-239f7cbf4e3d/>`_
   and put it into the root directory. Now you should have ::
||||
|
||||
. |
||||
BSR |
||||
PToolbox |
||||
models |
||||
private |
||||
Contents.m |
||||
edgesChns.m |
||||
edgesDemo.m |
||||
edgesDemoRgbd.m |
||||
edgesDetect.m |
||||
edgesEval.m |
||||
edgesEvalDir.m |
||||
edgesEvalImg.m |
||||
edgesEvalPlot.m |
||||
edgesSweeps.m |
||||
edgesTrain.m |
||||
license.txt |
||||
readme.txt |
||||
|
||||
5. Rename models/forest/modelFinal.mat to models/forest/modelFinal.mat.backup |
||||
|
||||
6. Open edgesChns.m and comment out lines 26--41. After the commented lines, add the following: ::
||||
|
||||
shrink=opts.shrink; |
||||
chns = single(getFeatures( im2double(I) )); |
||||
|
||||
7. Now it is time to compile the promised getFeatures function. I do this with the following code:
||||
|
||||
.. code-block:: cpp |
||||
|
||||
#include <cv.h> |
||||
#include <highgui.h> |
||||
|
||||
#include <mat.h> |
||||
#include <mex.h> |
||||
|
||||
#include "MxArray.hpp" // https://github.com/kyamagu/mexopencv |
||||
|
||||
class NewRFFeatureGetter : public cv::RFFeatureGetter |
||||
{ |
||||
public: |
||||
NewRFFeatureGetter() : name("NewRFFeatureGetter"){} |
||||
|
||||
virtual void getFeatures(const cv::Mat &src, NChannelsMat &features, |
||||
const int gnrmRad, const int gsmthRad, |
||||
const int shrink, const int outNum, const int gradNum) const |
||||
{ |
||||
// here your feature extraction code, the default one is: |
||||
// resulting features Mat should be n-channels, floating point matrix |
||||
} |
||||
|
||||
protected: |
||||
cv::String name; |
||||
}; |
||||
|
||||
MEXFUNCTION_LINKAGE void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) |
||||
{ |
||||
if (nlhs != 1) mexErrMsgTxt("nlhs != 1"); |
||||
if (nrhs != 1) mexErrMsgTxt("nrhs != 1"); |
||||
|
||||
cv::Mat src = MxArray(prhs[0]).toMat(); |
||||
src.convertTo(src, cv::DataType<float>::type); |
||||
|
||||
std::string modelFile = MxArray(prhs[1]).toString(); |
||||
NewRFFeatureGetter *pDollar = new NewRFFeatureGetter();
||||
|
||||
cv::Mat edges; |
||||
pDollar->getFeatures(src, edges, 4, 0, 2, 13, 4); |
||||
// you can use other numbers here |
||||
|
||||
edges.convertTo(edges, cv::DataType<double>::type); |
||||
|
||||
plhs[0] = MxArray(edges); |
||||
} |
||||
|
||||
8. Place the compiled mex file into the root dir and run edgesDemo.
   You will need to wait a couple of hours; after that, the new model
   will appear inside models/forest/.
||||
|
||||
9. The final step is converting the trained model from Matlab binary format
   to YAML, which you can use with our cv::StructuredEdgeDetection.
   For this purpose, run opencv_contrib/doc/tutorials/ximgproc/training/modelConvert(model, "model.yml")
||||
|
||||
How to use your model |
||||
--------------------- |
||||
|
||||
Just use the extended constructor with the above-defined class NewRFFeatureGetter :
||||
|
||||
.. code-block:: cpp |
||||
|
||||
cv::StructuredEdgeDetection pDollar |
||||
= cv::createStructuredEdgeDetection( modelName, makePtr<NewRFFeatureGetter>() ); |
@ -1,10 +0,0 @@ |
||||
******************************************************************** |
||||
bioinspired. Biologically inspired vision models and derived tools
||||
******************************************************************** |
||||
|
||||
The module provides biological visual system models (human visual system and others). It also provides derived objects that take advantage of those bio-inspired models.
||||
|
||||
.. toctree:: |
||||
:maxdepth: 2 |
||||
|
||||
Human retina documentation <retina> |
@ -1,493 +0,0 @@ |
||||
Retina : a Bio mimetic human retina model |
||||
***************************************** |
||||
|
||||
.. highlight:: cpp |
||||
|
||||
Retina |
||||
====== |
||||
.. ocv:class:: Retina : public Algorithm |
||||
|
||||
**Note** : do not forget that the retina model is included in the following namespace : *cv::bioinspired*. |
||||
|
||||
Introduction |
||||
++++++++++++ |
||||
|
||||
Class which provides the main controls of the Gipsa/Listic labs human retina model. This is a non-separable spatio-temporal filter modelling the two main retina information channels :
||||
|
||||
* foveal vision for detailed color vision : the parvocellular pathway.
||||
|
||||
* peripheral vision for sensitive transient signals detection (motion and events) : the magnocellular pathway. |
||||
|
||||
From a general point of view, this filter whitens the image spectrum and corrects luminance thanks to local adaptation. Another important property is its ability to filter out spatio-temporal noise while enhancing details.
This model originates from Jeanny Herault's work [Herault2010]_. It has been used in Alexandre Benoit's PhD and his current research [Benoit2010]_, [Strat2013]_ (he currently maintains this module within OpenCV). It includes the work of Jeanny's other PhD students, such as [Chaix2007]_, and the log polar transformations of Barthelemy Durette described in Jeanny's book.
||||
|
||||
**NOTES :** |
||||
|
||||
* For ease of use in computer vision applications, the two retina channels are applied homogeneously on all the input images. This does not follow the real retina topology but this can still be done using the log sampling capabilities proposed within the class. |
||||
|
||||
* See the retina description and code use in the tutorials/contrib section for complementary explanations.
||||
|
||||
Preliminary illustration |
||||
++++++++++++++++++++++++ |
||||
|
||||
As a preliminary presentation, let's start with a visual example. We propose to apply the filter on a low quality color jpeg image with backlight problems. Here is the considered input... *"Well, my eyes were able to see more than this strange black shadow..."*
||||
|
||||
.. image:: images/retinaInput.jpg |
||||
:alt: a low quality color jpeg image with backlight problems. |
||||
:align: center |
||||
|
||||
Below, the retina foveal model is applied on the entire image with default parameters. Here contours are enforced and halo effects are voluntarily visible with this configuration. See the parameter discussion below and increase horizontalCellsGain towards 1 to remove them.
||||
|
||||
.. image:: images/retinaOutput_default.jpg |
||||
:alt: the retina foveal model applied on the entire image with default parameters. Here contours are enforced, luminance is corrected and halo effects are voluntary visible with this configuration, increase horizontalCellsGain near 1 to remove them. |
||||
:align: center |
||||
|
||||
Below, a second retina foveal model output applied on the entire image with a parameters setup focused on naturalness perception. *"Hey, i now recognize my cat, looking at the mountains at the end of the day !"*. Here contours are enforced, luminance is corrected but halos are avoided with this configuration. The backlight effect is corrected and highlight details are still preserved. Then, even on a low quality jpeg image, if some luminance information remains, the retina is able to reconstruct a proper visual signal. Such configuration is also usefull for High Dynamic Range (*HDR*) images compression to 8bit images as discussed in [Benoit2010]_ and in the demonstration codes discussed below. |
||||
As shown at the end of the page, parameters change from defaults are : |
||||
|
||||
* horizontalCellsGain=0.3 |
||||
|
||||
* photoreceptorsLocalAdaptationSensitivity=ganglioncellsSensitivity=0.89. |
||||
|
||||
.. image:: images/retinaOutput_realistic.jpg |
||||
:alt: the retina foveal model applied on the entire image with 'naturalness' parameters. Here contours are enforced but are avoided with this configuration, horizontalCellsGain is 0.3 and photoreceptorsLocalAdaptationSensitivity=ganglioncellsSensitivity=0.89. |
||||
:align: center |
||||
|
||||
As observed in this preliminary demo, the retina can be settled up with various parameters, by default, as shown on the figure above, the retina strongly reduces mean luminance energy and enforces all details of the visual scene. Luminance energy and halo effects can be modulated (exagerated to cancelled as shown on the two examples). In order to use your own parameters, you can use at least one time the *write(String fs)* method which will write a proper XML file with all default parameters. Then, tweak it on your own and reload them at any time using method *setup(String fs)*. These methods update a *Retina::RetinaParameters* member structure that is described hereafter. XML parameters file samples are shown at the end of the page. |
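
As a minimal sketch of this parameters round-trip (assuming *myRetina* is an allocated Retina instance, see *createRetina* below ; the file name is only an example) :

.. code-block:: cpp

    // write the default parameters to an XML file that can then be edited by hand
    myRetina->write("RetinaMyParameters.xml");

    // ... edit the file, then reload it on the same (or another) retina instance
    myRetina->setup("RetinaMyParameters.xml");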
||||
|
||||
Here is an overview of the abstract Retina interface, allocate one instance with the *createRetina* functions.:: |
||||
|
||||
namespace cv{namespace bioinspired{ |
||||
|
||||
class Retina : public Algorithm |
||||
{ |
||||
public: |
||||
// parameters setup instance |
||||
struct RetinaParameters; // this class is detailed later

// main method for input frame processing (all-purpose method, can also perform High Dynamic Range tone mapping)
void run (InputArray inputImage);

// specific method aiming at correcting luminance only (faster High Dynamic Range tone mapping)
void applyFastToneMapping(InputArray inputImage, OutputArray outputToneMappedImage);

// output buffers retrieval methods
// -> foveal color vision details channel with luminance and noise correction
void getParvo (OutputArray retinaOutput_parvo);
void getParvoRAW (OutputArray retinaOutput_parvo);// retrieve original output buffers without any normalisation
const Mat getParvoRAW () const;// retrieve original output buffers without any normalisation
// -> peripheral monochrome motion and events (transient information) channel
void getMagno (OutputArray retinaOutput_magno);
void getMagnoRAW (OutputArray retinaOutput_magno); // retrieve original output buffers without any normalisation
const Mat getMagnoRAW () const;// retrieve original output buffers without any normalisation

// reset retina buffers... equivalent to closing your eyes for some seconds
void clearBuffers ();

// retrieve input and output buffers sizes
||||
Size getInputSize (); |
||||
Size getOutputSize (); |
||||
|
||||
// setup methods with specific parameters specification of global xml config file loading/write |
||||
void setup (String retinaParameterFile="", const bool applyDefaultSetupOnFailure=true); |
||||
void setup (FileStorage &fs, const bool applyDefaultSetupOnFailure=true); |
||||
void setup (RetinaParameters newParameters); |
||||
struct Retina::RetinaParameters getParameters (); |
||||
const String printSetup (); |
||||
virtual void write (String fs) const; |
||||
virtual void write (FileStorage &fs) const; |
||||
void setupOPLandIPLParvoChannel (const bool colorMode=true, const bool normaliseOutput=true, const float photoreceptorsLocalAdaptationSensitivity=0.7, const float photoreceptorsTemporalConstant=0.5, const float photoreceptorsSpatialConstant=0.53, const float horizontalCellsGain=0, const float HcellsTemporalConstant=1, const float HcellsSpatialConstant=7, const float ganglionCellsSensitivity=0.7); |
||||
void setupIPLMagnoChannel (const bool normaliseOutput=true, const float parasolCells_beta=0, const float parasolCells_tau=0, const float parasolCells_k=7, const float amacrinCellsTemporalCutFrequency=1.2, const float V0CompressionParameter=0.95, const float localAdaptintegration_tau=0, const float localAdaptintegration_k=7); |
||||
void setColorSaturation (const bool saturateColors=true, const float colorSaturationValue=4.0); |
||||
void activateMovingContoursProcessing (const bool activate); |
||||
void activateContoursProcessing (const bool activate); |
||||
}; |
||||
|
||||
// Allocators |
||||
cv::Ptr<Retina> createRetina (Size inputSize); |
||||
cv::Ptr<Retina> createRetina (Size inputSize, const bool colorMode, RETINA_COLORSAMPLINGMETHOD colorSamplingMethod=RETINA_COLOR_BAYER, const bool useRetinaLogSampling=false, const double reductionFactor=1.0, const double samplingStrenght=10.0); |
||||
}} // cv and bioinspired namespaces end |
||||
|
||||
.. Sample code::

   * An example on retina tone mapping can be found at opencv_source_code/samples/cpp/OpenEXRimages_HDR_Retina_toneMapping.cpp
   * An example on retina tone mapping on video input can be found at opencv_source_code/samples/cpp/OpenEXRimages_HDR_Retina_toneMapping_video.cpp
   * A complete example illustrating the retina interface can be found at opencv_source_code/samples/cpp/retinaDemo.cpp
||||
|
||||
Description |
||||
+++++++++++ |
||||
|
||||
Class which allows the `Gipsa <http://www.gipsa-lab.inpg.fr>`_ (preliminary work) / `Listic <http://www.listic.univ-savoie.fr>`_ (code maintainer and user) labs retina model to be used. This class allows human retina spatio-temporal image processing to be applied on still images, images sequences and video sequences. Briefly, here are the main human retina model properties: |
||||
|
||||
* spectral whitening (mid-frequency details enhancement)
||||
|
||||
* high frequency spatio-temporal noise reduction (temporal noise and high frequency spatial noise are minimized) |
||||
|
||||
* low frequency luminance reduction (luminance range compression) : high luminance regions do not hide details in darker regions anymore |
||||
|
||||
* local logarithmic luminance compression allows details to be enhanced even in low light conditions |
||||
|
||||
Use : this model can be used basically for spatio-temporal video effects but also in the aim of : |
||||
|
||||
* performing texture analysis with enhanced signal to noise ratio and enhanced details robust against input images luminance ranges (check out the parvocellular retina channel output, by using the provided **getParvo** methods) |
||||
|
||||
* performing motion analysis also taking benefit of the previously cited properties (check out the magnocellular retina channel output, by using the provided **getMagno** methods) |
||||
|
||||
* general image/video sequence description using either one or both channels. An example of the use of Retina in a Bag of Words approach is given in [Strat2013]_. |
||||
|
||||
Literature |
||||
========== |
||||
For more information, refer to the following papers : |
||||
|
||||
* Model description : |
||||
|
||||
.. [Benoit2010] Benoit A., Caplier A., Durette B., Herault, J., "Using Human Visual System Modeling For Bio-Inspired Low Level Image Processing", Elsevier, Computer Vision and Image Understanding 114 (2010), pp. 758-773. DOI <http://dx.doi.org/10.1016/j.cviu.2010.01.011> |
||||
|
||||
* Model use in a Bag of Words approach : |
||||
|
||||
.. [Strat2013] Strat S., Benoit A., Lambert P., "Retina enhanced SIFT descriptors for video indexing", CBMI2013, Veszprém, Hungary, 2013. |
||||
|
||||
* Please have a look at the reference work of Jeanny Herault that you can read in his book : |
||||
|
||||
.. [Herault2010] Jeanny Herault, "Vision: Images, Signals and Neural Networks: Models of Neural Processing in Visual Perception" (Progress in Neural Processing), ISBN: 9814273686.
||||
|
||||
This retina filter code includes the research contributions of PhD/research colleagues from which code has been redrawn by the author :
||||
|
||||
* take a look at the *retinacolor.hpp* module to discover Brice Chaix de Lavarene phD color mosaicing/demosaicing and his reference paper: |
||||
|
||||
.. [Chaix2007] B. Chaix de Lavarene, D. Alleysson, B. Durette, J. Herault (2007). "Efficient demosaicing through recursive filtering", IEEE International Conference on Image Processing ICIP 2007 |
||||
|
||||
* take a look at *imagelogpolprojection.hpp* to discover retina spatial log sampling which originates from Barthelemy Durette phd with Jeanny Herault. A Retina / V1 cortex projection is also proposed and originates from Jeanny's discussions. More informations in the above cited Jeanny Heraults's book. |
||||
|
||||
* Meylan&al work on HDR tone mapping that is implemented as a specific method within the model : |
||||
|
||||
.. [Meylan2007] L. Meylan , D. Alleysson, S. Susstrunk, "A Model of Retinal Local Adaptation for the Tone Mapping of Color Filter Array Images", Journal of Optical Society of America, A, Vol. 24, N 9, September, 1st, 2007, pp. 2807-2816 |
||||
|
||||
Demos and experiments ! |
||||
======================= |
||||
|
||||
**NOTE : In addition to the following examples, have a look at the Retina tutorial in the tutorial/contrib section for complementary explanations.**
||||
|
||||
Take a look at the C++ examples provided with OpenCV :
||||
|
||||
* **samples/cpp/retinademo.cpp** shows how to use the retina module for details enhancement (Parvo channel output) and transient maps observation (Magno channel output). You can play with images, video sequences and webcam video. |
||||
Typical uses are (provided your OpenCV installation is situated in folder *OpenCVReleaseFolder*) |
||||
|
||||
* image processing : **OpenCVReleaseFolder/bin/retinademo -image myPicture.jpg** |
||||
|
||||
* video processing : **OpenCVReleaseFolder/bin/retinademo -video myMovie.avi** |
||||
|
||||
* webcam processing: **OpenCVReleaseFolder/bin/retinademo -video** |
||||
|
||||
**Note :** This demo generates the file *RetinaDefaultParameters.xml* which contains the default parameters of the retina. Then, rename this as *RetinaSpecificParameters.xml*, adjust the parameters the way you want and reload the program to check the effect. |
||||
|
||||
|
||||
* **samples/cpp/OpenEXRimages_HDR_Retina_toneMapping.cpp** shows how to use the retina to perform High Dynamic Range (HDR) luminance compression |
||||
|
||||
Then, take an HDR image using bracketing with your camera, generate an OpenEXR image and process it using the demo.
||||
|
||||
Typical use, supposing that you have the OpenEXR image such as *memorial.exr* (present in the samples/cpp/ folder) |
||||
|
||||
**OpenCVReleaseFolder/bin/OpenEXRimages_HDR_Retina_toneMapping memorial.exr [optional: 'fast']** |
||||
|
||||
Note that some sliders are made available to allow you to play with luminance compression. |
||||
|
||||
If not using the 'fast' option, then tone mapping is performed using the full retina model [Benoit2010]_. It includes spectral whitening that allows luminance energy to be reduced. When using the 'fast' option, a simpler method is used : it is an adaptation of the algorithm presented in [Meylan2007]_. This method also gives good results and is faster to process, but it sometimes requires some more parameters adjustment.
||||
|
||||
|
||||
Methods description |
||||
=================== |
||||
|
||||
The main methods to control the retina model are detailed here
||||
|
||||
Ptr<Retina>::createRetina |
||||
+++++++++++++++++++++++++ |
||||
|
||||
.. ocv:function:: Ptr<cv::bioinspired::Retina> createRetina(Size inputSize) |
||||
.. ocv:function:: Ptr<cv::bioinspired::Retina> createRetina(Size inputSize, const bool colorMode, cv::bioinspired::RETINA_COLORSAMPLINGMETHOD colorSamplingMethod = cv::bioinspired::RETINA_COLOR_BAYER, const bool useRetinaLogSampling = false, const double reductionFactor = 1.0, const double samplingStrenght = 10.0 ) |
||||
|
||||
Constructors from standardized interfaces : retrieve a smart pointer to a Retina instance

:param inputSize: the input frame size
:param colorMode: the chosen processing mode : with or without color processing
:param colorSamplingMethod: specifies which kind of color sampling will be used :

    * cv::bioinspired::RETINA_COLOR_RANDOM: each pixel position is either R, G or B in a random choice

    * cv::bioinspired::RETINA_COLOR_DIAGONAL: color sampling is RGBRGBRGB..., line 2 BRGBRGBRG..., line 3, GBRGBRGBR...

    * cv::bioinspired::RETINA_COLOR_BAYER: standard bayer sampling

:param useRetinaLogSampling: activate retina log sampling, if true, the 2 following parameters can be used
:param reductionFactor: only useful if param useRetinaLogSampling=true, specifies the reduction factor of the output frame (as the center (fovea) is high resolution and corners can be underscaled, a reduction of the output is allowed without precision loss)
:param samplingStrenght: only useful if param useRetinaLogSampling=true, specifies the strength of the log scale that is applied
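
For instance, a hedged allocation sketch (assuming *inputImage* is a cv::Mat that has already been loaded ; the log sampling values mirror those used in the tutorial sample) :

.. code-block:: cpp

    // standard allocation for color processing at full resolution
    cv::Ptr<cv::bioinspired::Retina> myRetina =
        cv::bioinspired::createRetina(inputImage.size());

    // allocation with foveal log sampling : the output frame is reduced and peripheral vision is log-compressed
    cv::Ptr<cv::bioinspired::Retina> myLogRetina =
        cv::bioinspired::createRetina(inputImage.size(), true, cv::bioinspired::RETINA_COLOR_BAYER, true, 2.0, 10.0);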
||||
|
||||
Retina::activateContoursProcessing |
||||
++++++++++++++++++++++++++++++++++ |
||||
|
||||
.. ocv:function:: void Retina::activateContoursProcessing(const bool activate) |
||||
|
||||
Activate/deactivate the Parvocellular pathway processing (contours information extraction). By default, it is activated
||||
|
||||
:param activate: true if Parvocellular (contours information extraction) output should be activated, false if not... if activated, the Parvocellular output can be retrieved using the **getParvo** methods |
||||
|
||||
Retina::activateMovingContoursProcessing |
||||
++++++++++++++++++++++++++++++++++++++++ |
||||
|
||||
.. ocv:function:: void Retina::activateMovingContoursProcessing(const bool activate) |
||||
|
||||
Activate/deactivate the Magnocellular pathway processing (motion information extraction). By default, it is activated
||||
|
||||
:param activate: true if Magnocellular output should be activated, false if not... if activated, the Magnocellular output can be retrieved using the **getMagno** methods |
||||
|
||||
Retina::clearBuffers |
||||
++++++++++++++++++++ |
||||
|
||||
.. ocv:function:: void Retina::clearBuffers() |
||||
|
||||
Clears all retina buffers (equivalent to opening the eyes after a long period of eye closure ;o). Watch out for the temporal transition occurring just after this method call.
||||
|
||||
Retina::getParvo |
||||
++++++++++++++++ |
||||
|
||||
.. ocv:function:: void Retina::getParvo( OutputArray retinaOutput_parvo ) |
||||
.. ocv:function:: void Retina::getParvoRAW( OutputArray retinaOutput_parvo ) |
||||
.. ocv:function:: const Mat Retina::getParvoRAW() const |
||||
|
||||
Accessor of the details channel of the retina (models foveal vision). Warning, getParvoRAW methods return buffers that are not rescaled within range [0;255] while the non RAW method allows a normalized matrix to be retrieved. |
||||
|
||||
:param retinaOutput_parvo: the output buffer (reallocated if necessary), format can be : |
||||
|
||||
* a Mat, this output is rescaled for standard 8bits image processing use in OpenCV |
||||
|
||||
* RAW methods actually return a 1D matrix (encoding is R1, R2, ... Rn, G1, G2, ..., Gn, B1, B2, ...Bn), this output is the original retina filter model output, without any quantification or rescaling. |
||||
|
||||
Retina::getMagno |
||||
++++++++++++++++ |
||||
|
||||
.. ocv:function:: void Retina::getMagno( OutputArray retinaOutput_magno ) |
||||
.. ocv:function:: void Retina::getMagnoRAW( OutputArray retinaOutput_magno ) |
||||
.. ocv:function:: const Mat Retina::getMagnoRAW() const |
||||
|
||||
Accessor of the motion channel of the retina (models peripheral vision). Warning, getMagnoRAW methods return buffers that are not rescaled within range [0;255] while the non RAW method allows a normalized matrix to be retrieved. |
||||
|
||||
:param retinaOutput_magno: the output buffer (reallocated if necessary), format can be : |
||||
|
||||
* a Mat, this output is rescaled for standard 8bits image processing use in OpenCV |
||||
|
||||
* RAW methods actually return a 1D matrix (encoding is M1, M2,... Mn), this output is the original retina filter model output, without any quantification or rescaling. |
||||
|
||||
Retina::getInputSize |
||||
++++++++++++++++++++ |
||||
|
||||
.. ocv:function:: Size Retina::getInputSize() |
||||
|
||||
Retrieve the retina input buffer size
||||
|
||||
:return: the retina input buffer size |
||||
|
||||
Retina::getOutputSize |
||||
+++++++++++++++++++++ |
||||
|
||||
.. ocv:function:: Size Retina::getOutputSize() |
||||
|
||||
Retrieve the retina output buffer size, which can be different from the input size if a spatial log transformation is applied
||||
|
||||
:return: the retina output buffer size |
||||
|
||||
Retina::printSetup |
||||
++++++++++++++++++ |
||||
|
||||
.. ocv:function:: const String Retina::printSetup() |
||||
|
||||
Outputs a string showing the used parameters setup |
||||
|
||||
:return: a string which contains formatted parameters information
||||
|
||||
Retina::run |
||||
+++++++++++ |
||||
|
||||
.. ocv:function:: void Retina::run(InputArray inputImage) |
||||
|
||||
Method which allows the retina to be applied on an input image. After run, the encapsulated retina module is ready to deliver its outputs using the dedicated accessors, see the getParvo and getMagno methods
||||
|
||||
:param inputImage: the input Mat image to be processed, can be gray level or BGR coded in any format (from 8bit to 16bits) |
||||
|
||||
Retina::applyFastToneMapping |
||||
++++++++++++++++++++++++++++ |
||||
|
||||
.. ocv:function:: void Retina::applyFastToneMapping(InputArray inputImage, OutputArray outputToneMappedImage) |
||||
|
||||
Method which processes an image with the aim of correcting its luminance : correct backlight problems, enhance details in shadows. This method is designed to perform High Dynamic Range image tone mapping (compress >8bit/pixel images to 8bit/pixel). This is a simplified version of the Retina Parvocellular model (simplified version of the run/getParvo methods call) since it does not include the spatio-temporal filter modelling the Outer Plexiform Layer of the retina that performs spectral whitening and many other things. However, it works great for tone mapping and in a faster way. A minimal usage sketch is given after the parameter list below.
||||
|
||||
Check the demos and experiments section to see examples and the way to perform tone mapping using the original retina model and the method. |
||||
|
||||
:param inputImage: the input image to process (should be coded in float format : CV_32F, CV_32FC1, CV_32F_C3, CV_32F_C4, the 4th channel won't be considered). |
||||
:param outputToneMappedImage: the output 8bit/channel tone mapped image (CV_8U or CV_8UC3 format). |
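
A hedged usage sketch (assuming *myRetina* is an allocated Retina instance matching the image size ; the input file name is only an example, any >8bit image converted to float would do) :

.. code-block:: cpp

    // load an HDR image (OpenEXR support is required) and make sure it is in float format
    cv::Mat hdrImage = cv::imread("memorial.exr", cv::IMREAD_UNCHANGED);
    hdrImage.convertTo(hdrImage, CV_32F);

    cv::Mat toneMapped;
    myRetina->applyFastToneMapping(hdrImage, toneMapped); // toneMapped is an 8bit/channel image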
||||
|
||||
Retina::setColorSaturation |
||||
++++++++++++++++++++++++++ |
||||
|
||||
.. ocv:function:: void Retina::setColorSaturation(const bool saturateColors = true, const float colorSaturationValue = 4.0 ) |
||||
|
||||
Activate color saturation as the final step of the color demultiplexing process. This saturation is a sigmoid function applied to each channel of the demultiplexed image.
||||
|
||||
:param saturateColors: boolean that activates color saturation (if true) or desactivate (if false) |
||||
:param colorSaturationValue: the saturation factor : a simple factor applied on the chrominance buffers |
||||
|
||||
|
||||
Retina::setup |
||||
+++++++++++++ |
||||
|
||||
.. ocv:function:: void Retina::setup(String retinaParameterFile = "", const bool applyDefaultSetupOnFailure = true ) |
||||
.. ocv:function:: void Retina::setup(FileStorage & fs, const bool applyDefaultSetupOnFailure = true ) |
||||
.. ocv:function:: void Retina::setup(RetinaParameters newParameters) |
||||
|
||||
Try to open an XML retina parameters file to adjust the current retina instance setup. If the xml file does not exist, then the default setup is applied. Warning : Exceptions are thrown if the read XML file is not valid.

:param retinaParameterFile: the parameters filename
:param applyDefaultSetupOnFailure: set to true to fall back to the default setup if loading fails (otherwise an error is raised)
:param fs: the open FileStorage which contains retina parameters
:param newParameters: a parameters structure updated with the new target configuration. You can retrieve the current parameters structure using the method *Retina::RetinaParameters Retina::getParameters()* and update it before running the method *setup*.
||||
|
||||
Retina::write |
||||
+++++++++++++ |
||||
|
||||
.. ocv:function:: void Retina::write( String fs ) const |
||||
.. ocv:function:: void Retina::write( FileStorage& fs ) const |
||||
|
||||
Write xml/yml formatted parameters information

:param fs: the filename of the xml file that will be opened and written with formatted parameters information
||||
|
||||
Retina::setupIPLMagnoChannel |
||||
++++++++++++++++++++++++++++ |
||||
|
||||
.. ocv:function:: void Retina::setupIPLMagnoChannel(const bool normaliseOutput = true, const float parasolCells_beta = 0, const float parasolCells_tau = 0, const float parasolCells_k = 7, const float amacrinCellsTemporalCutFrequency = 1.2, const float V0CompressionParameter = 0.95, const float localAdaptintegration_tau = 0, const float localAdaptintegration_k = 7 ) |
||||
|
||||
Set parameters values for the Inner Plexiform Layer (IPL) magnocellular channel. This channel processes signals output from the OPL processing stage in peripheral vision, it allows motion information enhancement. It is decorrelated from the details channel. See the reference papers for more details.

:param normaliseOutput: specifies if (true) output is rescaled between 0 and 255 or not (false)
:param parasolCells_beta: the low pass filter gain used for local contrast adaptation at the IPL level of the retina (for ganglion cells local adaptation), typical value is 0
:param parasolCells_tau: the low pass filter time constant used for local contrast adaptation at the IPL level of the retina (for ganglion cells local adaptation), unit is frame, typical value is 0 (immediate response)
:param parasolCells_k: the low pass filter spatial constant used for local contrast adaptation at the IPL level of the retina (for ganglion cells local adaptation), unit is pixels, typical value is 5
:param amacrinCellsTemporalCutFrequency: the time constant of the first order high pass filter of the magnocellular way (motion information channel), unit is frames, typical value is 1.2
:param V0CompressionParameter: the compression strength of the ganglion cells local adaptation output, set a value between 0.6 and 1 for best results, a high value increases the low value sensitivity... and the output saturates faster, recommended value: 0.95
:param localAdaptintegration_tau: specifies the temporal constant of the low pass filter involved in the computation of the local "motion mean" for the local adaptation computation
:param localAdaptintegration_k: specifies the spatial constant of the low pass filter involved in the computation of the local "motion mean" for the local adaptation computation
||||
|
||||
Retina::setupOPLandIPLParvoChannel |
||||
++++++++++++++++++++++++++++++++++ |
||||
|
||||
.. ocv:function:: void Retina::setupOPLandIPLParvoChannel(const bool colorMode = true, const bool normaliseOutput = true, const float photoreceptorsLocalAdaptationSensitivity = 0.7, const float photoreceptorsTemporalConstant = 0.5, const float photoreceptorsSpatialConstant = 0.53, const float horizontalCellsGain = 0, const float HcellsTemporalConstant = 1, const float HcellsSpatialConstant = 7, const float ganglionCellsSensitivity = 0.7 ) |
||||
|
||||
Setup the OPL and IPL parvo channels (see the biological model). OPL stands for Outer Plexiform Layer of the retina : it performs the spatio-temporal filtering which whitens the spectrum and reduces spatio-temporal noise while attenuating global luminance (low frequency energy). IPL parvo is the next processing stage after the OPL : it refers to a part of the Inner Plexiform layer of the retina and allows high contours sensitivity in foveal vision. See the reference papers for more information. A usage sketch is given after the parameter list below.

:param colorMode: specifies if (true) color is processed or not (false), in the latter case gray level images are processed
:param normaliseOutput: specifies if (true) output is rescaled between 0 and 255 or not (false)
:param photoreceptorsLocalAdaptationSensitivity: the photoreceptors sensitivity, range is 0-1 (more log compression effect when value increases)
:param photoreceptorsTemporalConstant: the time constant of the first order low pass filter of the photoreceptors, use it to cut high temporal frequencies (noise or fast motion), unit is frames, typical value is 1 frame
:param photoreceptorsSpatialConstant: the spatial constant of the first order low pass filter of the photoreceptors, use it to cut high spatial frequencies (noise or thick contours), unit is pixels, typical value is 1 pixel
:param horizontalCellsGain: gain of the horizontal cells network, if 0, then the mean value of the output is zero, if the parameter is near 1, then the luminance is not filtered and is still reachable at the output, typical value is 0
:param HcellsTemporalConstant: the time constant of the first order low pass filter of the horizontal cells, use it to cut low temporal frequencies (local luminance variations), unit is frames, typical value is 1 frame, as for the photoreceptors
:param HcellsSpatialConstant: the spatial constant of the first order low pass filter of the horizontal cells, use it to cut low spatial frequencies (local luminance), unit is pixels, typical value is 5 pixels, this value is also used for local contrast computing when computing the local contrast adaptation at the ganglion cells level (Inner Plexiform Layer parvocellular channel model)
:param ganglionCellsSensitivity: the compression strength of the ganglion cells local adaptation output, set a value between 0.6 and 1 for best results, a high value increases the low value sensitivity... and the output saturates faster, recommended value: 0.7
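
As a sketch, the 'naturalness' oriented configuration mentioned in the preliminary illustration could be approximated programmatically as follows (assuming *myRetina* is an allocated instance ; only horizontalCellsGain, photoreceptorsLocalAdaptationSensitivity and ganglionCellsSensitivity differ from the method defaults) :

.. code-block:: cpp

    // keep some mean luminance and soften the local adaptation to avoid halo effects
    myRetina->setupOPLandIPLParvoChannel(true,   // colorMode
                                         true,   // normaliseOutput
                                         0.89f,  // photoreceptorsLocalAdaptationSensitivity
                                         0.5f,   // photoreceptorsTemporalConstant
                                         0.53f,  // photoreceptorsSpatialConstant
                                         0.3f,   // horizontalCellsGain
                                         1.f,    // HcellsTemporalConstant
                                         7.f,    // HcellsSpatialConstant
                                         0.89f); // ganglionCellsSensitivity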
||||
|
||||
|
||||
Retina::RetinaParameters |
||||
======================== |
||||
|
||||
.. ocv:struct:: Retina::RetinaParameters |
||||
|
||||
This structure merges all the parameters that can be adjusted through the **Retina::setup()**, **Retina::setupOPLandIPLParvoChannel** and **Retina::setupIPLMagnoChannel** setup methods.
Parameters structure for better clarity, check the explanations in the comments of the methods setupOPLandIPLParvoChannel and setupIPLMagnoChannel. ::
||||
|
||||
class RetinaParameters{ |
||||
struct OPLandIplParvoParameters{ // Outer Plexiform Layer (OPL) and Inner Plexiform Layer Parvocellular (IplParvo) parameters |
||||
OPLandIplParvoParameters():colorMode(true), |
||||
normaliseOutput(true), // specifies if (true) output is rescaled between 0 and 255 of not (false) |
||||
photoreceptorsLocalAdaptationSensitivity(0.7f), // the photoreceptors sensitivity renage is 0-1 (more log compression effect when value increases) |
||||
photoreceptorsTemporalConstant(0.5f),// the time constant of the first order low pass filter of the photoreceptors, use it to cut high temporal frequencies (noise or fast motion), unit is frames, typical value is 1 frame |
||||
photoreceptorsSpatialConstant(0.53f),// the spatial constant of the first order low pass filter of the photoreceptors, use it to cut high spatial frequencies (noise or thick contours), unit is pixels, typical value is 1 pixel |
||||
horizontalCellsGain(0.0f),//gain of the horizontal cells network, if 0, then the mean value of the output is zero, if the parameter is near 1, then, the luminance is not filtered and is still reachable at the output, typicall value is 0 |
||||
hcellsTemporalConstant(1.f),// the time constant of the first order low pass filter of the horizontal cells, use it to cut low temporal frequencies (local luminance variations), unit is frames, typical value is 1 frame, as the photoreceptors. Reduce to 0.5 to limit retina after effects. |
||||
hcellsSpatialConstant(7.f),//the spatial constant of the first order low pass filter of the horizontal cells, use it to cut low spatial frequencies (local luminance), unit is pixels, typical value is 5 pixel, this value is also used for local contrast computing when computing the local contrast adaptation at the ganglion cells level (Inner Plexiform Layer parvocellular channel model) |
||||
ganglionCellsSensitivity(0.7f)//the compression strengh of the ganglion cells local adaptation output, set a value between 0.6 and 1 for best results, a high value increases more the low value sensitivity... and the output saturates faster, recommended value: 0.7 |
||||
{};// default setup |
||||
bool colorMode, normaliseOutput; |
||||
float photoreceptorsLocalAdaptationSensitivity, photoreceptorsTemporalConstant, photoreceptorsSpatialConstant, horizontalCellsGain, hcellsTemporalConstant, hcellsSpatialConstant, ganglionCellsSensitivity; |
||||
}; |
||||
struct IplMagnoParameters{ // Inner Plexiform Layer Magnocellular channel (IplMagno) |
||||
IplMagnoParameters(): |
||||
normaliseOutput(true), //specifies if (true) output is rescaled between 0 and 255 of not (false) |
||||
parasolCells_beta(0.f), // the low pass filter gain used for local contrast adaptation at the IPL level of the retina (for ganglion cells local adaptation), typical value is 0 |
||||
parasolCells_tau(0.f), //the low pass filter time constant used for local contrast adaptation at the IPL level of the retina (for ganglion cells local adaptation), unit is frame, typical value is 0 (immediate response) |
||||
parasolCells_k(7.f), //the low pass filter spatial constant used for local contrast adaptation at the IPL level of the retina (for ganglion cells local adaptation), unit is pixels, typical value is 5 |
||||
amacrinCellsTemporalCutFrequency(1.2f), //the time constant of the first order high pass fiter of the magnocellular way (motion information channel), unit is frames, typical value is 1.2 |
||||
V0CompressionParameter(0.95f), // the compression strength of the ganglion cells local adaptation output, set a value between 0.6 and 1 for best results, a high value increases the low value sensitivity... and the output saturates faster, recommended value: 0.95
||||
localAdaptintegration_tau(0.f), // specifies the temporal constant of the low pas filter involved in the computation of the local "motion mean" for the local adaptation computation |
||||
localAdaptintegration_k(7.f) // specifies the spatial constant of the low pas filter involved in the computation of the local "motion mean" for the local adaptation computation |
||||
{};// default setup |
||||
bool normaliseOutput; |
||||
float parasolCells_beta, parasolCells_tau, parasolCells_k, amacrinCellsTemporalCutFrequency, V0CompressionParameter, localAdaptintegration_tau, localAdaptintegration_k; |
||||
}; |
||||
struct OPLandIplParvoParameters OPLandIplParvo; |
||||
struct IplMagnoParameters IplMagno; |
||||
}; |
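
A hedged sketch of the structure based setup flow (assuming *myRetina* is an allocated Retina instance) :

.. code-block:: cpp

    // grab the current configuration, tweak a field, then push it back
    cv::bioinspired::Retina::RetinaParameters params = myRetina->getParameters();
    params.OPLandIplParvo.horizontalCellsGain = 0.3f; // keep some mean luminance information
    myRetina->setup(params);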
||||
|
||||
Retina parameters files examples |
||||
++++++++++++++++++++++++++++++++ |
||||
|
||||
Here is the default configuration file of the retina module. It gives results such as the first retina output shown on the top of this page. |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
<?xml version="1.0"?> |
||||
<opencv_storage> |
||||
<OPLandIPLparvo> |
||||
<colorMode>1</colorMode> |
||||
<normaliseOutput>1</normaliseOutput> |
||||
<photoreceptorsLocalAdaptationSensitivity>7.5e-01</photoreceptorsLocalAdaptationSensitivity> |
||||
<photoreceptorsTemporalConstant>9.0e-01</photoreceptorsTemporalConstant> |
||||
<photoreceptorsSpatialConstant>5.3e-01</photoreceptorsSpatialConstant> |
||||
<horizontalCellsGain>0.01</horizontalCellsGain> |
||||
<hcellsTemporalConstant>0.5</hcellsTemporalConstant> |
||||
<hcellsSpatialConstant>7.</hcellsSpatialConstant> |
||||
<ganglionCellsSensitivity>7.5e-01</ganglionCellsSensitivity></OPLandIPLparvo> |
||||
<IPLmagno> |
||||
<normaliseOutput>1</normaliseOutput> |
||||
<parasolCells_beta>0.</parasolCells_beta> |
||||
<parasolCells_tau>0.</parasolCells_tau> |
||||
<parasolCells_k>7.</parasolCells_k> |
||||
<amacrinCellsTemporalCutFrequency>2.0e+00</amacrinCellsTemporalCutFrequency> |
||||
<V0CompressionParameter>9.5e-01</V0CompressionParameter> |
||||
<localAdaptintegration_tau>0.</localAdaptintegration_tau> |
||||
<localAdaptintegration_k>7.</localAdaptintegration_k></IPLmagno> |
||||
</opencv_storage> |
||||
|
||||
Here is the 'realistic' setup used to obtain the second retina output shown on the top of this page.
||||
|
||||
.. code-block:: cpp |
||||
|
||||
<?xml version="1.0"?> |
||||
<opencv_storage> |
||||
<OPLandIPLparvo> |
||||
<colorMode>1</colorMode> |
||||
<normaliseOutput>1</normaliseOutput> |
||||
<photoreceptorsLocalAdaptationSensitivity>8.9e-01</photoreceptorsLocalAdaptationSensitivity> |
||||
<photoreceptorsTemporalConstant>9.0e-01</photoreceptorsTemporalConstant> |
||||
<photoreceptorsSpatialConstant>5.3e-01</photoreceptorsSpatialConstant> |
||||
<horizontalCellsGain>0.3</horizontalCellsGain> |
||||
<hcellsTemporalConstant>0.5</hcellsTemporalConstant> |
||||
<hcellsSpatialConstant>7.</hcellsSpatialConstant> |
||||
<ganglionCellsSensitivity>8.9e-01</ganglionCellsSensitivity></OPLandIPLparvo> |
||||
<IPLmagno> |
||||
<normaliseOutput>1</normaliseOutput> |
||||
<parasolCells_beta>0.</parasolCells_beta> |
||||
<parasolCells_tau>0.</parasolCells_tau> |
||||
<parasolCells_k>7.</parasolCells_k> |
||||
<amacrinCellsTemporalCutFrequency>2.0e+00</amacrinCellsTemporalCutFrequency> |
||||
<V0CompressionParameter>9.5e-01</V0CompressionParameter> |
||||
<localAdaptintegration_tau>0.</localAdaptintegration_tau> |
||||
<localAdaptintegration_k>7.</localAdaptintegration_k></IPLmagno> |
||||
</opencv_storage> |
||||
Discovering the human retina and its use for image processing {#tutorial_bioinspired_retina_model} |
||||
============================================================= |
||||
|
||||
Goal |
||||
---- |
||||
|
||||
I present here a model of human retina that shows some interesting properties for image |
||||
preprocessing and enhancement. In this tutorial you will learn how to: |
||||
|
||||
- discover the main two channels outing from your retina |
||||
- see the basics to use the retina model |
||||
- discover some parameters tweaks |
||||
|
||||
General overview |
||||
---------------- |
||||
|
||||
The proposed model originates from Jeanny Herault's research @cite Herault2010 at |
||||
[Gipsa](http://www.gipsa-lab.inpg.fr). It is involved in image processing applications with |
||||
[Listic](http://www.listic.univ-savoie.fr) (code maintainer and user) lab. This is not a complete |
||||
model but it already present interesting properties that can be involved for enhanced image |
||||
processing experience. The model allows the following human retina properties to be used : |
||||
|
||||
- spectral whitening that has 3 important effects: high spatio-temporal frequency signals |
||||
canceling (noise), mid-frequencies details enhancement and low frequencies luminance energy |
||||
reduction. This *all in one* property directly allows visual signals cleaning of classical |
||||
undesired distortions introduced by image sensors and input luminance range. |
||||
- local logarithmic luminance compression allows details to be enhanced even in low light |
||||
conditions. |
||||
- decorrelation of the details information (Parvocellular output channel) and transient |
||||
information (events, motion made available at the Magnocellular output channel). |
||||
|
||||
The first two points are illustrated below : |
||||
|
||||
In the figure below, the OpenEXR image sample *CrissyField.exr*, a High Dynamic Range image is |
||||
shown. In order to make it visible on this web-page, the original input image is linearly rescaled |
||||
to the classical image luminance range [0-255] and is converted to 8bit/channel format. Such strong |
||||
conversion hides many details because of too strong local contrasts. Furthermore, noise energy is |
||||
also strong and pollutes visual information. |
||||
|
||||
![image](images/retina_TreeHdr_small.jpg) |
||||
|
||||
In the following image, applying the ideas proposed in @cite Benoit2010, as your retina does, local |
||||
luminance adaptation, spatial noise removal and spectral whitening work together and transmit |
||||
accurate information on lower range 8bit data channels. On this picture, noise in significantly |
||||
removed, local details hidden by strong luminance contrasts are enhanced. Output image keeps its |
||||
naturalness and visual content is enhanced. Color processing is based on the color |
||||
multiplexing/demultiplexing method proposed in @cite Chaix2007 . |
||||
|
||||
![image](images/retina_TreeHdr_retina.jpg) |
||||
|
||||
*Note :* image sample can be downloaded from the [OpenEXR website](http://www.openexr.com). |
||||
Regarding this demonstration, before retina processing, input image has been linearly rescaled |
||||
within 0-255 keeping its channels float format. 5% of its histogram ends has been cut (mostly |
||||
removes wrong HDR pixels). Check out the sample |
||||
*opencv/samples/cpp/OpenEXRimages_HighDynamicRange_Retina_toneMapping.cpp* for similar |
||||
processing. The following demonstration will only consider classical 8bit/channel images. |
||||
|
||||
The retina model output channels |
||||
-------------------------------- |
||||
|
||||
The retina model presents two outputs that benefit from the above cited behaviors. |
||||
|
||||
- The first one is called the Parvocellular channel. It is mainly active in the foveal retina area |
||||
(high resolution central vision with color sensitive photo-receptors), its aim is to provide |
||||
accurate color vision for visual details remaining static on the retina. On the other hand |
||||
objects moving on the retina projection are blurred. |
||||
- The second well known channel is the Magnocellular channel. It is mainly active in the retina
  peripheral vision and sends signals related to change events (motion, transient events, etc.).
  These outgoing signals also help the visual system to focus/center the retina on 'transient'/moving areas
  for more detailed analysis, thus improving visual scene context and object classification.
||||
|
||||
**NOTE :** regarding the proposed model, contrary to the real retina, we apply these two channels on |
||||
the entire input images using the same resolution. This allows enhanced visual details and motion |
||||
information to be extracted on all the considered images... but remember, that these two channels |
||||
are complementary. For example, if Magnocellular channel gives strong energy in an area, then, the |
||||
Parvocellular channel is certainly blurred there since there is a transient event. |
||||
|
||||
As an illustration, we apply in the following the retina model on a webcam video stream of a dark |
||||
visual scene. In this visual scene, captured in an amphitheater of the university, some students are |
||||
moving while talking to the teacher. |
||||
|
||||
In this video sequence, because of the dark ambiance, signal to noise ratio is low and color |
||||
artifacts are present on visual features edges because of the low quality image capture tool-chain. |
||||
|
||||
![image](images/studentsSample_input.jpg) |
||||
|
||||
Below is shown the retina foveal vision applied on the entire image. In the used retina |
||||
configuration, global luminance is preserved and local contrasts are enhanced. Also, signal to noise |
||||
ratio is improved : since high frequency spatio-temporal noise is reduced, enhanced details are not |
||||
corrupted by any enhanced noise. |
||||
|
||||
![image](images/studentsSample_parvo.jpg) |
||||
|
||||
Below is the output of the Magnocellular channel of the retina model. Its signals are strong where
transient events occur. Here, a student is moving at the bottom of the image, thus generating high
energy. The rest of the image is static; however, it is corrupted by strong noise. Here, the
retina filters out most of the noise, thus generating few false motion area 'alarms'. This channel
can be used as a transient/moving areas detector : it would provide relevant information for a low
cost segmentation tool that would highlight areas in which an event is occurring. A minimal sketch
of such a detector is given after the illustration below.
||||
|
||||
![image](images/studentsSample_magno.jpg) |
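
The following sketch is only an illustration of that idea, not part of the original sample : it assumes a
*myRetina* instance has just processed a frame, and it derives a rough motion mask by thresholding the Magno
output (the threshold value is an arbitrary example).

@code{.cpp}
// retrieve the transient/motion channel and turn it into a binary event mask
cv::Mat magno, motionMask;
myRetina->getMagno(magno);                                     // 8bit map, strong values on transient areas
cv::threshold(magno, motionMask, 128, 255, cv::THRESH_BINARY); // arbitrary example threshold
cv::imshow("Motion mask", motionMask);
@endcode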
||||
|
||||
Retina use case |
||||
--------------- |
||||
|
||||
This model can be used basically for spatio-temporal video effects but also in the aim of : |
||||
|
||||
- performing texture analysis with enhanced signal to noise ratio and enhanced details robust |
||||
against input images luminance ranges (check out the Parvocellular retina channel output) |
||||
- performing motion analysis also taking benefit of the previously cited properties. |
||||
|
||||
Literature |
||||
---------- |
||||
|
||||
For more information, refer to the following papers : @cite Benoit2010 |
||||
|
||||
- Please have a look at the reference work of Jeanny Herault that you can read in his book @cite Herault2010 |
||||
|
||||
This retina filter code includes the research contributions of phd/research collegues from which |
||||
code has been redrawn by the author : |
||||
|
||||
- take a look at the *retinacolor.hpp* module to discover Brice Chaix de Lavarene phD color |
||||
mosaicing/demosaicing and his reference paper @cite Chaix2007 |
||||
|
||||
- take a look at *imagelogpolprojection.hpp* to discover retina spatial log sampling which |
||||
originates from Barthelemy Durette phd with Jeanny Herault. A Retina / V1 cortex projection is |
||||
also proposed and originates from Jeanny's discussions. More informations in the above cited |
||||
Jeanny Heraults's book. |
||||
|
||||
Code tutorial |
||||
------------- |
||||
|
||||
Please refer to the original tutorial source code in file |
||||
*opencv_folder/samples/cpp/tutorial_code/bioinspired/retina_tutorial.cpp*. |
||||
|
||||
@note do not forget that the retina model is included in the following namespace: cv::bioinspired |
||||
|
||||
To compile it, assuming OpenCV is correctly installed, use the following command. It requires the |
||||
opencv_core *(cv::Mat and friends objects management)*, opencv_highgui *(display and image/video |
||||
read)* and opencv_bioinspired *(Retina description)* libraries to compile. |
||||
|
||||
@code{.sh} |
||||
// compile |
||||
gcc retina_tutorial.cpp -o Retina_tuto -lopencv_core -lopencv_highgui -lopencv_bioinspired |
||||
|
||||
// Run commands : add 'log' as a last parameter to apply a spatial log sampling (simulates retina sampling) |
||||
// run on webcam |
||||
./Retina_tuto -video |
||||
// run on video file |
||||
./Retina_tuto -video myVideo.avi |
||||
// run on an image |
||||
./Retina_tuto -image myPicture.jpg |
||||
// run on an image with log sampling |
||||
./Retina_tuto -image myPicture.jpg log |
||||
@endcode |
||||
|
||||
Here is a code explanation : |
||||
|
||||
The Retina definition is present in the bioinspired package and a simple include allows you to use it. You
can instead use the specific header *opencv2/bioinspired.hpp* if you prefer, but then include the
other required OpenCV modules : *opencv2/core.hpp* and *opencv2/highgui.hpp*
||||
@code{.cpp} |
||||
#include "opencv2/opencv.hpp" |
||||
@endcode |
||||
Provide user some hints to run the program with a help function |
||||
@code{.cpp} |
||||
// the help procedure |
||||
static void help(std::string errorMessage) |
||||
{ |
||||
std::cout<<"Program init error : "<<errorMessage<<std::endl; |
||||
std::cout<<"\nProgram call procedure : retinaDemo [processing mode] [Optional : media target] [Optional LAST parameter: \"log\" to activate retina log sampling]"<<std::endl; |
||||
std::cout<<"\t[processing mode] :"<<std::endl; |
||||
std::cout<<"\t -image : for still image processing"<<std::endl; |
||||
std::cout<<"\t -video : for video stream processing"<<std::endl; |
||||
std::cout<<"\t[Optional : media target] :"<<std::endl; |
||||
std::cout<<"\t if processing an image or video file, then, specify the path and filename of the target to process"<<std::endl; |
||||
std::cout<<"\t leave empty if processing video stream coming from a connected video device"<<std::endl; |
||||
std::cout<<"\t[Optional : activate retina log sampling] : an optional last parameter can be specified for retina spatial log sampling"<<std::endl; |
||||
std::cout<<"\t set \"log\" without quotes to activate this sampling, output frame size will be divided by 4"<<std::endl; |
||||
std::cout<<"\nExamples:"<<std::endl; |
||||
std::cout<<"\t-Image processing : ./retinaDemo -image lena.jpg"<<std::endl; |
||||
std::cout<<"\t-Image processing with log sampling : ./retinaDemo -image lena.jpg log"<<std::endl; |
||||
std::cout<<"\t-Video processing : ./retinaDemo -video myMovie.mp4"<<std::endl; |
||||
std::cout<<"\t-Live video processing : ./retinaDemo -video"<<std::endl; |
||||
std::cout<<"\nPlease start again with new parameters"<<std::endl; |
||||
std::cout<<"****************************************************"<<std::endl; |
||||
std::cout<<" NOTE : this program generates the default retina parameters file 'RetinaDefaultParameters.xml'"<<std::endl; |
||||
std::cout<<" => you can use this to fine tune parameters and load them if you save to file 'RetinaSpecificParameters.xml'"<<std::endl; |
||||
} |
||||
@endcode |
||||
Then, start the main program and first declare a *cv::Mat* matrix in which input images will be |
||||
loaded. Also allocate a *cv::VideoCapture* object ready to load video streams (if necessary) |
||||
@code{.cpp} |
||||
int main(int argc, char* argv[]) { |
||||
// declare the retina input buffer... that will be fed differently in regard of the input media |
||||
cv::Mat inputFrame; |
||||
cv::VideoCapture videoCapture; // in case a video media is used, its manager is declared here |
||||
@endcode |
||||
In the main program, before processing, first check input command parameters. Here it loads a first |
||||
input image coming from a single loaded image (if user chose command *-image*) or from a video |
||||
stream (if user chose command *-video*). Also, if the user added *log* command at the end of its |
||||
program call, the spatial logarithmic image sampling performed by the retina is taken into account |
||||
by the Boolean flag *useLogSampling*. |
||||
@code{.cpp} |
||||
// welcome message |
||||
std::cout<<"****************************************************"<<std::endl; |
||||
std::cout<<"* Retina demonstration : demonstrates the use of a wrapper class of the Gipsa/Listic Labs retina model."<<std::endl;
std::cout<<"* This demo will try to load the file 'RetinaSpecificParameters.xml' (if exists).\nTo create it, copy the autogenerated template 'RetinaDefaultParameters.xml'.\nThen tweak it with your own retina parameters."<<std::endl;
||||
// basic input arguments checking |
||||
if (argc<2) |
||||
{ |
||||
help("bad number of parameter"); |
||||
return -1; |
||||
} |
||||
|
||||
bool useLogSampling = !strcmp(argv[argc-1], "log"); // check if user wants retina log sampling processing |
||||
|
||||
std::string inputMediaType=argv[1]; |
||||
|
||||
////////////////////////////////////////////////////////////////////////////// |
||||
// checking input media type (still image, video file, live video acquisition) |
||||
if (!strcmp(inputMediaType.c_str(), "-image") && argc >= 3) |
||||
{ |
||||
std::cout<<"RetinaDemo: processing image "<<argv[2]<<std::endl; |
||||
// image processing case |
||||
inputFrame = cv::imread(std::string(argv[2]), 1); // load image in RGB mode |
||||
}else |
||||
if (!strcmp(inputMediaType.c_str(), "-video")) |
||||
{ |
||||
if (argc == 2 || (argc == 3 && useLogSampling)) // attempt to grab images from a video capture device |
||||
{ |
||||
videoCapture.open(0); |
||||
}else// attempt to grab images from a video filestream |
||||
{ |
||||
std::cout<<"RetinaDemo: processing video stream "<<argv[2]<<std::endl; |
||||
videoCapture.open(argv[2]); |
||||
} |
||||
|
||||
// grab a first frame to check if everything is ok |
||||
videoCapture>>inputFrame; |
||||
}else |
||||
{ |
||||
// bad command parameter |
||||
help("bad command parameter"); |
||||
return -1; |
||||
} |
||||
@endcode |
||||
Once all input parameters are processed, a first image should have been loaded, if not, display |
||||
error and stop program : |
||||
@code{.cpp} |
||||
if (inputFrame.empty()) |
||||
{ |
||||
help("Input media could not be loaded, aborting"); |
||||
return -1; |
||||
} |
||||
@endcode |
||||
Now, everything is ready to run the retina model. I propose here to allocate a retina instance and |
||||
to manage the eventual log sampling option. The Retina constructor expects at least a cv::Size |
||||
object that shows the input data size that will have to be managed. One can activate other options |
||||
such as color and its related color multiplexing strategy (here Bayer multiplexing is chosen using |
||||
*enum cv::bioinspired::RETINA_COLOR_BAYER*). If using log sampling, the image reduction factor |
||||
(smaller output images) and the log sampling strength can be adjusted.
||||
@code{.cpp} |
||||
// pointer to a retina object |
||||
cv::Ptr<cv::bioinspired::Retina> myRetina; |
||||
|
||||
// if the last parameter is 'log', then activate log sampling (favour foveal vision and subsamples peripheral vision) |
||||
if (useLogSampling) |
||||
{ |
||||
myRetina = cv::bioinspired::createRetina(inputFrame.size(), true, cv::bioinspired::RETINA_COLOR_BAYER, true, 2.0, 10.0); |
||||
} |
||||
else// -> else allocate "classical" retina : |
||||
myRetina = cv::bioinspired::createRetina(inputFrame.size()); |
||||
@endcode |
||||
Once done, the proposed code writes a default xml file that contains the default parameters of the |
||||
retina. This is useful to make your own config using this template. Here generated template xml file |
||||
is called *RetinaDefaultParameters.xml*. |
||||
@code{.cpp} |
||||
// save default retina parameters file in order to let you see this and maybe modify it and reload using method "setup" |
||||
myRetina->write("RetinaDefaultParameters.xml"); |
||||
@endcode |
||||
In the following line, the retina attempts to load another xml file called |
||||
*RetinaSpecificParameters.xml*. If you created it and introduced your own setup, it will be loaded, |
||||
in the other case, default retina parameters are used. |
||||
@code{.cpp} |
||||
// load parameters if file exists |
||||
myRetina->setup("RetinaSpecificParameters.xml"); |
||||
@endcode |
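
To check which configuration was actually loaded (the default one or your *RetinaSpecificParameters.xml*), a
hedged option is to print the current setup ; *printSetup()* is part of the Retina interface :

@code{.cpp}
// display the effective parameters in the console
std::cout<<myRetina->printSetup()<<std::endl;
@endcode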
||||
It is not required here but just to show it is possible, you can reset the retina buffers to zero to |
||||
force it to forget past events. |
||||
@code{.cpp} |
||||
// reset all retina buffers (imagine you close your eyes for a long time) |
||||
myRetina->clearBuffers(); |
||||
@endcode |
||||
Now, it is time to run the retina ! First create some output buffers ready to receive the two retina |
||||
channels outputs |
||||
@code{.cpp} |
||||
// declare retina output buffers |
||||
cv::Mat retinaOutput_parvo; |
||||
cv::Mat retinaOutput_magno; |
||||
@endcode |
||||
Then, run retina in a loop, load new frames from video sequence if necessary and get retina outputs |
||||
back to dedicated buffers. |
||||
@code{.cpp} |
||||
// processing loop with no stop condition |
||||
while(true) |
||||
{ |
||||
// if using video stream, then, grabbing a new frame, else, input remains the same |
||||
if (videoCapture.isOpened()) |
||||
videoCapture>>inputFrame; |
||||
|
||||
// run retina filter on the loaded input frame |
||||
myRetina->run(inputFrame); |
||||
// Retrieve and display retina output |
||||
myRetina->getParvo(retinaOutput_parvo); |
||||
myRetina->getMagno(retinaOutput_magno); |
||||
cv::imshow("retina input", inputFrame); |
||||
cv::imshow("Retina Parvo", retinaOutput_parvo); |
||||
cv::imshow("Retina Magno", retinaOutput_magno); |
||||
cv::waitKey(10); |
||||
} |
||||
@endcode |
||||
That's done ! But if you want to secure the system, take care and manage Exceptions. The retina can
throw some when it sees irrelevant data (no input frame, wrong setup, etc.). Then, I recommend
surrounding all the retina code with a try/catch block like this :
||||
@code{.cpp} |
||||
try{ |
||||
// pointer to a retina object |
||||
cv::Ptr<cv::bioinspired::Retina> myRetina;
||||
[---] |
||||
// processing loop with no stop condition |
||||
while(true) |
||||
{ |
||||
[---] |
||||
} |
||||
|
||||
}catch(const cv::Exception &e)
||||
{ |
||||
std::cerr<<"Error using Retina : "<<e.what()<<std::endl; |
||||
} |
||||
@endcode |
||||
|
||||
Retina parameters, what to do ? |
||||
------------------------------- |
||||
|
||||
First, it is recommended to read the reference paper @cite Benoit2010 |
||||
|
||||
Once done open the configuration file *RetinaDefaultParameters.xml* generated by the demo and let's |
||||
have a look at it. |
||||
@code{.cpp} |
||||
<?xml version="1.0"?> |
||||
<opencv_storage> |
||||
<OPLandIPLparvo> |
||||
<colorMode>1</colorMode> |
||||
<normaliseOutput>1</normaliseOutput> |
||||
<photoreceptorsLocalAdaptationSensitivity>7.5e-01</photoreceptorsLocalAdaptationSensitivity> |
||||
<photoreceptorsTemporalConstant>9.0e-01</photoreceptorsTemporalConstant> |
||||
<photoreceptorsSpatialConstant>5.7e-01</photoreceptorsSpatialConstant> |
||||
<horizontalCellsGain>0.01</horizontalCellsGain> |
||||
<hcellsTemporalConstant>0.5</hcellsTemporalConstant> |
||||
<hcellsSpatialConstant>7.</hcellsSpatialConstant> |
||||
<ganglionCellsSensitivity>7.5e-01</ganglionCellsSensitivity></OPLandIPLparvo> |
||||
<IPLmagno> |
||||
<normaliseOutput>1</normaliseOutput> |
||||
<parasolCells_beta>0.</parasolCells_beta> |
||||
<parasolCells_tau>0.</parasolCells_tau> |
||||
<parasolCells_k>7.</parasolCells_k> |
||||
<amacrinCellsTemporalCutFrequency>2.0e+00</amacrinCellsTemporalCutFrequency> |
||||
<V0CompressionParameter>9.5e-01</V0CompressionParameter> |
||||
<localAdaptintegration_tau>0.</localAdaptintegration_tau> |
||||
<localAdaptintegration_k>7.</localAdaptintegration_k></IPLmagno> |
||||
</opencv_storage> |
||||
@endcode |
||||
Here are some hints but actually, the best parameter setup depends more on what you want to do with
the retina than on the input images that you give to the retina. Apart from the more specific case
of High Dynamic Range images (HDR) that require a more specific setup for the luminance
compression objective, the retina behaviors should be rather stable from content to content. Note
that OpenCV is able to manage such HDR formats thanks to the OpenEXR images compatibility.

Then, if the application target requires details enhancement prior to specific image processing, you
need to know if mean luminance information is required or not. If not, then the retina can cancel or
significantly reduce its energy thus giving more visibility to higher spatial frequency details.
||||
|
||||
### Basic parameters |
||||
|
||||
The most simple parameters are the following : |
||||
|
||||
- **colorMode** : lets the retina process color information (if 1) or gray scale images (if 0). In
  the latter case, only the first channel of the input will be processed.
- **normaliseOutput** : each channel has this parameter; if its value is 1, the considered
  channel output is rescaled between 0 and 255. Take care in this case with the Magnocellular output
  (motion/transient channel detection): residual noise will also be rescaled !
||||
|
||||
**Note :** using color requires color channel multiplexing/demultiplexing which requires more
processing. You can expect much faster processing using gray levels : it requires around 30
products per pixel for all the retina processes and it has recently been parallelized for multicore
architectures.
||||
|
||||
### Photo-receptors parameters |
||||
|
||||
The following parameters act on the entry point of the retina - the photo-receptors - and impact all
the following processes. These sensors are low pass spatio-temporal filters that smooth temporal and
spatial data and also adjust their sensitivity to local luminance, thus improving detail extraction
and high frequency noise canceling.
||||
|
||||
- **photoreceptorsLocalAdaptationSensitivity** between 0 and 1. Values close to 1 allow a high
  luminance log compression effect at the photo-receptors level. Values closer to 0 give a more
  linear sensitivity. Increased alone, it can burn out the *Parvo (details channel)* output image. If
  adjusted together with **ganglionCellsSensitivity**, images can be very contrasted whatever the
  local luminance... at the price of a decrease in naturalness.
- **photoreceptorsTemporalConstant** sets the temporal constant of the low pass filter at the entry
  of the retina. A high value leads to a strong temporal smoothing effect : moving objects are
  blurred and can disappear while static objects are favored. But when starting the retina
  processing, the stable state is reached later.
- **photoreceptorsSpatialConstant** specifies the spatial constant of the photo-receptors low pass
  filter. This parameter specifies the minimum spatial signal period allowed in the following
  stages. Typically, this filter should cut high frequency noise. A value of 0 does not cut any
  noise, while higher values start to cut high spatial frequencies and then more and more of the
  lower frequencies... So do not go too high if you want to keep details of the input images !
  A good compromise for color images is 0.53 since this does not affect the color spectrum too much.
  Higher values would lead to gray and blurred output images.
||||
|
||||
### Horizontal cells parameters |
||||
|
||||
This parameter set tunes the neural network connected to the photo-receptors: the horizontal cells.
It modulates photo-receptor sensitivity and completes the processing for the final spectral whitening
(part of the spatial band pass effect, thus favoring visual detail enhancement).
||||
|
||||
- **horizontalCellsGain** is a critical parameter ! If you are not interested in the mean
  luminance and focus on detail enhancement, then set it to zero. But if you want to keep some
  environment luminance information, let some low spatial frequencies pass into the system by
  setting a higher value (\<1).
- **hcellsTemporalConstant** similar to the photo-receptors, this acts on the temporal constant of a
  low pass temporal filter that smooths input data. Here, a high value generates a strong retina
  after-effect while a lower value makes the retina more reactive. This value should be lower than
  **photoreceptorsTemporalConstant** to limit strong retina after-effects.
- **hcellsSpatialConstant** is the spatial constant of these cells' low pass filter.
  It specifies the lowest spatial frequency allowed in the following stages. Visually, a high value
  leads to very low spatial frequency processing and to salient halo effects. Lower values reduce
  this effect, but the limit is : do not go lower than the value of
  **photoreceptorsSpatialConstant**. Those 2 parameters actually specify the spatial band-pass of
  the retina.
||||
|
||||
**NOTE** after the processing managed by the previous parameters, the input data is cleaned from
noise and the luminance is already partly enhanced. The following parameters act on the last
processing stages of the two retina output signals.
||||
|
||||
### Parvo (details channel) dedicated parameter |
||||
|
||||
- **ganglionCellsSensitivity** specifies the strength of the final local adaptation occurring at
  the output of this details dedicated channel. Parameter values remain between 0 and 1. Low values
  tend to give a linear response while higher values enhance the remaining low-contrast areas.
||||
|
||||
**Note :** this parameter can correct possibly burned-out images by favoring the low-energy details
of the visual scene, even in bright areas.
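To relate these parameters to code, the same OPL/Parvo values listed in the XML file above can also
be set programmatically. The following is only an illustrative sketch assuming the
`setupOPLandIPLParvoChannel()` method of the retina class, with arguments in the order of the XML
fields; check the exact signature available in your OpenCV version.

@code{.cpp}
// arguments : colorMode, normaliseOutput,
//             photoreceptorsLocalAdaptationSensitivity, photoreceptorsTemporalConstant,
//             photoreceptorsSpatialConstant, horizontalCellsGain,
//             hcellsTemporalConstant, hcellsSpatialConstant, ganglionCellsSensitivity
myRetina->setupOPLandIPLParvoChannel(true, true,
                                     0.75f, 0.9f, 0.57f,
                                     0.01f, 0.5f, 7.f,
                                     0.75f);
@endcode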
||||
|
||||
### IPL Magno (motion/transient channel) parameters |
||||
|
||||
Once the image information is cleaned, this channel acts as a high pass temporal filter that only
selects transient signals (events, motion, etc.). A low pass spatial filter smooths the extracted
transient data and a final logarithmic compression enhances low transient events, thus enhancing
event sensitivity.
||||
|
||||
- **parasolCells_beta** can be considered as an amplifier gain at the entry point of this
  processing stage. Generally set to 0.
- **parasolCells_tau** the temporal smoothing effect that can be added.
- **parasolCells_k** the spatial constant of the spatial filtering effect, set it at a high value
  to favor low spatial frequency signals that are less subject to residual noise.
- **amacrinCellsTemporalCutFrequency** specifies the temporal constant of the high pass filter.
  High values let slow transient events be selected.
- **V0CompressionParameter** specifies the strength of the log compression. Similar behavior to
  the previous description, but here it enhances the sensitivity to transient events.
- **localAdaptintegration_tau** generally set to 0, it has no real use here actually.
- **localAdaptintegration_k** specifies the size of the area on which local adaptation is
  performed. Low values lead to short range local adaptation (higher sensitivity to noise), high
  values secure the log compression.
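As for the Parvo channel, these Magno parameters can be set programmatically. A minimal sketch,
assuming the `setupIPLMagnoChannel()` method of the retina class with arguments in the order of the
`<IPLmagno>` XML fields shown earlier:

@code{.cpp}
// arguments : normaliseOutput, parasolCells_beta, parasolCells_tau, parasolCells_k,
//             amacrinCellsTemporalCutFrequency, V0CompressionParameter,
//             localAdaptintegration_tau, localAdaptintegration_k
myRetina->setupIPLMagnoChannel(true, 0.f, 0.f, 7.f, 2.0f, 0.95f, 0.f, 7.f);
@endcode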
@ -1,202 +0,0 @@ |
||||
Custom Calibration Pattern |
||||
========================== |
||||
|
||||
.. highlight:: cpp |
||||
|
||||
CustomPattern |
||||
------------- |
||||
A custom pattern class that can be used to calibrate a camera and to further track the translation and rotation of the pattern. By default, it uses an ``ORB`` feature detector and a ``BruteForce-Hamming(2)`` descriptor matcher to find the location of the pattern feature points that will subsequently be used for calibration.
||||
|
||||
.. ocv:class:: CustomPattern : public Algorithm |
||||
|
||||
|
||||
CustomPattern::CustomPattern |
||||
---------------------------- |
||||
CustomPattern constructor. |
||||
|
||||
.. ocv:function:: CustomPattern() |
||||
|
||||
|
||||
CustomPattern::create |
||||
--------------------- |
||||
A method that initializes the class and generates the necessary detectors, extractors and matchers. |
||||
|
||||
.. ocv:function:: bool create(InputArray pattern, const Size2f boardSize, OutputArray output = noArray()) |
||||
|
||||
:param pattern: The image, which will be used as a pattern. If the desired pattern is part of a bigger image, you can crop it out using image(roi). |
||||
|
||||
:param boardSize: The size of the pattern in physical dimensions. These will be used to scale the points when the calibration occurs. |
||||
|
||||
:param output: A matrix that is the same as the input pattern image, but has all the feature points drawn on it. |
||||
|
||||
    :return: Whether the initialization was successful. A possible reason for failure is that no feature points were detected.
||||
|
||||
.. seealso:: |
||||
|
||||
:ocv:func:`getFeatureDetector`, |
||||
:ocv:func:`getDescriptorExtractor`, |
||||
:ocv:func:`getDescriptorMatcher` |
||||
|
||||
.. note:: |
||||
|
||||
   * The number of detected feature points can be determined through the :ocv:func:`getPatternPoints` method.
||||
|
||||
* The feature detector, extractor and matcher cannot be changed after initialization. |
||||
|
||||
|
||||
|
||||
CustomPattern::findPattern |
||||
-------------------------- |
||||
Finds the pattern in the input image |
||||
|
||||
.. ocv:function:: bool findPattern(InputArray image, OutputArray matched_features, OutputArray pattern_points, const double ratio = 0.7, const double proj_error = 8.0, const bool refine_position = false, OutputArray out = noArray(), OutputArray H = noArray(), OutputArray pattern_corners = noArray())
||||
|
||||
:param image: The input image where the pattern is searched for. |
||||
|
||||
    :param matched_features: A ``vector<Point2f>`` of the projections of calibration pattern points, matched in the image. The points correspond to the ``pattern_points``; ``matched_features`` and ``pattern_points`` have the same size.
||||
|
||||
:param pattern_points: A ``vector<Point3f>`` of calibration pattern points in the calibration pattern coordinate space. |
||||
|
||||
:param ratio: A ratio used to threshold matches based on D. Lowe's point ratio test. |
||||
|
||||
:param proj_error: The maximum projection error that is allowed when the found points are back projected. A lower projection error will be beneficial for eliminating mismatches. Higher values are recommended when the camera lens has greater distortions. |
||||
|
||||
:param refine_position: Whether to refine the position of the feature points with :ocv:func:`cornerSubPix`. |
||||
|
||||
:param out: An image showing the matched feature points and a contour around the estimated pattern. |
||||
|
||||
:param H: The homography transformation matrix between the pattern and the current image. |
||||
|
||||
:param pattern_corners: A ``vector<Point2f>`` containing the 4 corners of the found pattern. |
||||
|
||||
    :return: Whether the pattern was found or not.
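A minimal detection sketch (a hedged illustration: the header path and the ``cv::ccalib`` namespace are assumptions, adapt them to your build of the *ccalib* contrib module). ::

    #include "opencv2/ccalib.hpp"       // assumed header; may differ between versions
    #include "opencv2/imgcodecs.hpp"

    #include <vector>

    using namespace cv;
    using namespace cv::ccalib;         // assumed namespace of CustomPattern

    void detectPatternDemo()
    {
        Mat patternImg = imread("my_pattern.png", IMREAD_GRAYSCALE);
        Mat frame      = imread("scene.png");

        CustomPattern pattern;
        // physical size of the printed pattern, here 21.0 x 29.7 cm (A4 sheet)
        bool ok = pattern.create(patternImg, Size2f(21.0f, 29.7f));

        std::vector<Point2f> imagePts;   // matched projections in the frame
        std::vector<Point3f> objectPts;  // corresponding pattern points (physical units)
        if (ok && pattern.findPattern(frame, imagePts, objectPts))
        {
            // imagePts / objectPts can be accumulated over several views and
            // later passed to CustomPattern::calibrate()
        }
    }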
||||
|
||||
|
||||
CustomPattern::isInitialized |
||||
---------------------------- |
||||
|
||||
.. ocv:function:: bool isInitialized() |
||||
|
||||
    :return: Whether the class is initialized.
||||
|
||||
|
||||
CustomPattern::getPatternPoints |
||||
------------------------------- |
||||
|
||||
.. ocv:function:: void getPatternPoints(OutputArray original_points) |
||||
|
||||
:param original_points: Fills the vector with the points found in the pattern. |
||||
|
||||
|
||||
CustomPattern::getPixelSize |
||||
--------------------------- |
||||
.. ocv:function:: double getPixelSize() |
||||
|
||||
    :return: The physical pixel size as initialized by the pattern.
||||
|
||||
|
||||
CustomPattern::setFeatureDetector |
||||
--------------------------------- |
||||
.. ocv:function:: bool setFeatureDetector(Ptr<FeatureDetector> featureDetector) |
||||
|
||||
:param featureDetector: Set a new FeatureDetector. |
||||
|
||||
    :return: Whether it was set successfully. Fails if the object has already been initialized by :ocv:func:`create`.
||||
|
||||
.. note:: |
||||
|
||||
   * It is left to the user's discretion to select a matching feature detector, extractor and matcher. Please consult the documentation of each to confirm they work together.
||||
|
||||
|
||||
CustomPattern::setDescriptorExtractor |
||||
------------------------------------- |
||||
.. ocv:function:: bool setDescriptorExtractor(Ptr<DescriptorExtractor> extractor) |
||||
|
||||
:param extractor: Set a new DescriptorExtractor. |
||||
|
||||
    :return: Whether it was set successfully. Fails if the object has already been initialized by :ocv:func:`create`.
||||
|
||||
|
||||
CustomPattern::setDescriptorMatcher |
||||
----------------------------------- |
||||
.. ocv:function:: bool setDescriptorMatcher(Ptr<DescriptorMatcher> matcher) |
||||
|
||||
:param matcher: Set a new DescriptorMatcher. |
||||
|
||||
    :return: Whether it was set successfully. Fails if the object has already been initialized by :ocv:func:`create`.
||||
|
||||
|
||||
CustomPattern::getFeatureDetector |
||||
--------------------------------- |
||||
.. ocv:function:: Ptr<FeatureDetector> getFeatureDetector() |
||||
|
||||
    :return: The FeatureDetector in use.
||||
|
||||
|
||||
CustomPattern::getDescriptorExtractor |
||||
------------------------------------- |
||||
.. ocv:function:: Ptr<DescriptorExtractor> getDescriptorExtractor() |
||||
|
||||
    :return: The DescriptorExtractor in use.
||||
|
||||
|
||||
CustomPattern::getDescriptorMatcher |
||||
----------------------------------- |
||||
.. ocv:function:: Ptr<DescriptorMatcher> getDescriptorMatcher() |
||||
|
||||
    :return: The DescriptorMatcher in use.
||||
|
||||
|
||||
CustomPattern::calibrate |
||||
------------------------ |
||||
Calibrates the camera. |
||||
|
||||
.. ocv:function:: double calibrate(InputArrayOfArrays objectPoints, InputArrayOfArrays imagePoints, Size imageSize, InputOutputArray cameraMatrix, InputOutputArray distCoeffs, OutputArrayOfArrays rvecs, OutputArrayOfArrays tvecs, int flags = 0, TermCriteria criteria = TermCriteria(TermCriteria::COUNT + TermCriteria::EPS, 30, DBL_EPSILON)) |
||||
|
||||
See :ocv:func:`calibrateCamera` for parameter information. |
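For example, a sketch of accumulating detections over several views and then calibrating (illustrative only; it continues the detection sketch shown under ``findPattern`` above). ::

    // pattern : a CustomPattern already initialized with create()
    // frames  : the captured calibration views
    double calibrateFromViews(CustomPattern& pattern, const std::vector<Mat>& frames,
                              Mat& cameraMatrix, Mat& distCoeffs)
    {
        std::vector<std::vector<Point2f> > imagePoints;
        std::vector<std::vector<Point3f> > objectPoints;
        Size imageSize;

        for (size_t i = 0; i < frames.size(); ++i)
        {
            std::vector<Point2f> imgPts;
            std::vector<Point3f> objPts;
            if (pattern.findPattern(frames[i], imgPts, objPts))
            {
                imagePoints.push_back(imgPts);
                objectPoints.push_back(objPts);
                imageSize = frames[i].size();
            }
        }

        std::vector<Mat> rvecs, tvecs;
        // returns the RMS reprojection error, as cv::calibrateCamera does
        return pattern.calibrate(objectPoints, imagePoints, imageSize,
                                 cameraMatrix, distCoeffs, rvecs, tvecs);
    }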
||||
|
||||
|
||||
CustomPattern::findRt |
||||
--------------------- |
||||
Finds the rotation and translation vectors of the pattern. |
||||
|
||||
.. ocv:function:: bool findRt(InputArray objectPoints, InputArray imagePoints, InputArray cameraMatrix, InputArray distCoeffs, OutputArray rvec, OutputArray tvec, bool useExtrinsicGuess = false, int flags = ITERATIVE) |
||||
.. ocv:function:: bool findRt(InputArray image, InputArray cameraMatrix, InputArray distCoeffs, OutputArray rvec, OutputArray tvec, bool useExtrinsicGuess = false, int flags = ITERATIVE) |
||||
|
||||
:param image: The image, in which the rotation and translation of the pattern will be found. |
||||
|
||||
See :ocv:func:`solvePnP` for parameter information. |
||||
|
||||
|
||||
CustomPattern::findRtRANSAC |
||||
--------------------------- |
||||
Finds the rotation and translation vectors of the pattern using RANSAC. |
||||
|
||||
.. ocv:function:: bool findRtRANSAC(InputArray objectPoints, InputArray imagePoints, InputArray cameraMatrix, InputArray distCoeffs, OutputArray rvec, OutputArray tvec, bool useExtrinsicGuess = false, int iterationsCount = 100, float reprojectionError = 8.0, int minInliersCount = 100, OutputArray inliers = noArray(), int flags = ITERATIVE) |
||||
.. ocv:function:: bool findRtRANSAC(InputArray image, InputArray cameraMatrix, InputArray distCoeffs, OutputArray rvec, OutputArray tvec, bool useExtrinsicGuess = false, int iterationsCount = 100, float reprojectionError = 8.0, int minInliersCount = 100, OutputArray inliers = noArray(), int flags = ITERATIVE) |
||||
|
||||
:param image: The image, in which the rotation and translation of the pattern will be found. |
||||
|
||||
See :ocv:func:`solvePnPRANSAC` for parameter information. |
||||
|
||||
|
||||
CustomPattern::drawOrientation |
||||
------------------------------ |
||||
Draws the ``(x,y,z)`` axis on the image, in the center of the pattern, showing the orientation of the pattern. |
||||
|
||||
.. ocv:function:: void drawOrientation(InputOutputArray image, InputArray tvec, InputArray rvec, InputArray cameraMatrix, InputArray distCoeffs, double axis_length = 3, int axis_width = 2) |
||||
|
||||
:param image: The image, based on which the rotation and translation was calculated. The axis will be drawn in color - ``x`` - in red, ``y`` - in green, ``z`` - in blue. |
||||
|
||||
:param tvec: Translation vector. |
||||
|
||||
:param rvec: Rotation vector. |
||||
|
||||
:param cameraMatrix: The camera matrix. |
||||
|
||||
:param distCoeffs: The distortion coefficients. |
||||
|
||||
:param axis_length: The length of the axis symbol. |
||||
|
||||
:param axis_width: The width of the axis symbol. |
||||
|
@ -1,11 +0,0 @@ |
||||
********************************************************************* |
||||
cvv. GUI for Interactive Visual Debugging of Computer Vision Programs |
||||
********************************************************************* |
||||
|
||||
The module provides an interactive GUI to debug and incrementally design computer vision algorithms. The debug statements can remain in the code after development and aid in further changes because they have negligible overhead if the program is compiled in release mode.
||||
|
||||
.. toctree:: |
||||
:maxdepth: 2 |
||||
|
||||
CVV API Documentation <cvv_api> |
||||
CVV GUI Documentation <cvv_gui> |
@ -1,85 +0,0 @@ |
||||
CVV : the API |
||||
************* |
||||
|
||||
.. highlight:: cpp |
||||
|
||||
|
||||
Introduction |
||||
++++++++++++ |
||||
|
||||
The namespace for all functions is **cvv**, e.g. *cvv::showImage()*.
||||
|
||||
Compilation: |
||||
|
||||
* For development, i.e. for cvv GUI to show up, compile your code using cvv with *g++ -DCVVISUAL_DEBUGMODE*. |
||||
* For release, i.e. cvv calls doing nothing, compile your code without above flag. |
||||
|
||||
See cvv tutorial for a commented example application using cvv. |
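A minimal, self-contained sketch (the image file name is only a placeholder). ::

    #include <opencv2/core.hpp>
    #include <opencv2/imgcodecs.hpp>
    #include <opencv2/cvv.hpp>

    int main()
    {
        cv::Mat img = cv::imread("test.png");      // any test image

        // shows up in the debug GUI only when compiled with -DCVVISUAL_DEBUGMODE
        cvv::showImage(img, CVVISUAL_LOCATION, "input image");

        cvv::finalShow();                          // mandatory last cvv call
        return 0;
    }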
||||
|
||||
|
||||
|
||||
|
||||
API Functions |
||||
+++++++++++++ |
||||
|
||||
|
||||
showImage |
||||
--------- |
||||
Add a single image to debug GUI (similar to :imshow:`imshow <>`). |
||||
|
||||
.. ocv:function:: void showImage(InputArray img, const CallMetaData& metaData, const string& description, const string& view) |
||||
|
||||
:param img: Image to show in debug GUI. |
||||
:param metaData: Properly initialized CallMetaData struct, i.e. information about file, line and function name for GUI. Use CVVISUAL_LOCATION macro. |
||||
:param description: Human readable description to provide context to image. |
||||
:param view: Preselect view that will be used to visualize this image in GUI. Other views can still be selected in GUI later on. |
||||
|
||||
|
||||
|
||||
debugFilter |
||||
----------- |
||||
Add two images to debug GUI for comparison. Usually the input and output of some filter operation, whose result should be inspected. |
||||
|
||||
.. ocv:function:: void debugFilter(InputArray original, InputArray result, const CallMetaData& metaData, const string& description, const string& view) |
||||
|
||||
:param original: First image for comparison, e.g. filter input. |
||||
:param result: Second image for comparison, e.g. filter output. |
||||
:param metaData: See :ocv:func:`showImage` |
||||
:param description: See :ocv:func:`showImage` |
||||
:param view: See :ocv:func:`showImage` |
||||
|
||||
|
||||
|
||||
debugDMatch |
||||
----------- |
||||
Add a filled-in :basicstructures:`DMatch <dmatch>` to the debug GUI. The matches are visualized for interactive inspection in different GUI views (one similar to an interactive :draw_matches:`drawMatches<>`).
||||
|
||||
.. ocv:function:: void debugDMatch(InputArray img1, std::vector<cv::KeyPoint> keypoints1, InputArray img2, std::vector<cv::KeyPoint> keypoints2, std::vector<cv::DMatch> matches, const CallMetaData& metaData, const string& description, const string& view, bool useTrainDescriptor) |
||||
|
||||
:param img1: First image used in :basicstructures:`DMatch <dmatch>`. |
||||
:param keypoints1: Keypoints of first image. |
||||
:param img2: Second image used in DMatch. |
||||
:param keypoints2: Keypoints of second image. |
||||
:param metaData: See :ocv:func:`showImage` |
||||
:param description: See :ocv:func:`showImage` |
||||
:param view: See :ocv:func:`showImage` |
||||
:param useTrainDescriptor: Use :basicstructures:`DMatch <dmatch>`'s train descriptor index instead of query descriptor index. |
||||
|
||||
|
||||
|
||||
finalShow |
||||
--------- |
||||
This function **must** be called *once* *after* all cvv calls, if any.
As an alternative, create an instance of FinalShowCaller, which calls finalShow() in its destructor (RAII-style).
||||
|
||||
.. ocv:function:: void finalShow() |
||||
|
||||
|
||||
|
||||
setDebugFlag |
||||
------------ |
||||
Enable or disable cvv for the current translation unit and thread (disabling it this way has a higher - but still low - overhead compared to using the compile flags).
||||
|
||||
.. ocv:function:: void setDebugFlag(bool active) |
||||
|
||||
:param active: See above |
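For example, a short sketch that inspects only every 50th frame of a video (building on the includes of the snippet above, plus *opencv2/videoio.hpp*). ::

    void processVideo(cv::VideoCapture& capture)
    {
        cv::Mat frame;
        for (int frameIdx = 0; capture.read(frame); ++frameIdx)
        {
            // only every 50th frame should show up in the debug GUI
            cvv::setDebugFlag(frameIdx % 50 == 0);
            cvv::showImage(frame, CVVISUAL_LOCATION, "current frame");
        }
        cvv::finalShow();
    }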
@ -1,24 +0,0 @@ |
||||
CVV : the GUI |
||||
************* |
||||
|
||||
.. highlight:: cpp |
||||
|
||||
|
||||
Introduction |
||||
++++++++++++ |
||||
|
||||
For now: See cvv tutorial. |
||||
|
||||
|
||||
|
||||
Overview |
||||
++++++++ |
||||
|
||||
|
||||
Filter |
||||
------ |
||||
|
||||
|
||||
|
||||
Views |
||||
++++++++ |
@ -0,0 +1,182 @@ |
||||
Interactive Visual Debugging of Computer Vision applications {#tutorial_cvv_introduction} |
||||
============================================================ |
||||
|
||||
What is the most common way to debug computer vision applications? Usually the answer is temporary, |
||||
hacked together, custom code that must be removed from the code for release compilation. |
||||
|
||||
In this tutorial we will show how to use the visual debugging features of the **cvv** module |
||||
(*opencv2/cvv.hpp*) instead. |
||||
|
||||
Goals |
||||
----- |
||||
|
||||
In this tutorial you will learn how to: |
||||
|
||||
- Add cvv debug calls to your application |
||||
- Use the visual debug GUI |
||||
- Enable and disable the visual debug features during compilation (with zero runtime overhead when |
||||
disabled) |
||||
|
||||
Code |
||||
---- |
||||
|
||||
The example code |
||||
|
||||
- captures images (*videoio*), e.g. from a webcam, |
||||
- applies some filters to each image (*imgproc*), |
||||
- detects image features and matches them to the previous image (*features2d*). |
||||
|
||||
If the program is compiled without visual debugging (see CMakeLists.txt below) the only result is |
||||
some information printed to the command line. We want to demonstrate how much debugging or |
||||
development functionality is added by just a few lines of *cvv* commands. |
||||
|
||||
@includelineno cvv/samples/cvv_demo.cpp |
||||
|
||||
@code{.cmake} |
||||
cmake_minimum_required(VERSION 2.8) |
||||
|
||||
project(cvvisual_test) |
||||
|
||||
SET(CMAKE_PREFIX_PATH ~/software/opencv/install) |
||||
|
||||
SET(CMAKE_CXX_COMPILER "g++-4.8") |
||||
SET(CMAKE_CXX_FLAGS "-std=c++11 -O2 -pthread -Wall -Werror") |
||||
|
||||
# (un)set: cmake -DCVV_DEBUG_MODE=OFF .. |
||||
OPTION(CVV_DEBUG_MODE "cvvisual-debug-mode" ON) |
||||
if(CVV_DEBUG_MODE MATCHES ON) |
||||
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DCVVISUAL_DEBUGMODE") |
||||
endif() |
||||
|
||||
|
||||
FIND_PACKAGE(OpenCV REQUIRED) |
||||
include_directories(${OpenCV_INCLUDE_DIRS}) |
||||
|
||||
add_executable(cvvt main.cpp) |
||||
target_link_libraries(cvvt |
||||
opencv_core opencv_videoio opencv_imgproc opencv_features2d |
||||
opencv_cvv |
||||
) |
||||
@endcode |
||||
|
||||
Explanation |
||||
----------- |
||||
|
||||
-# We compile the program either using the above CmakeLists.txt with Option *CVV_DEBUG_MODE=ON* |
||||
(*cmake -DCVV_DEBUG_MODE=ON*) or by adding the corresponding define *CVVISUAL_DEBUGMODE* to |
||||
our compiler (e.g. *g++ -DCVVISUAL_DEBUGMODE*). |
||||
-# The first cvv call simply shows the image (similar to *imshow*) with the imgIdString as comment. |
||||
@code{.cpp} |
||||
cvv::showImage(imgRead, CVVISUAL_LOCATION, imgIdString.c_str()); |
||||
@endcode |
||||
The image is added to the overview tab in the visual debug GUI and the cvv call blocks. |
||||
|
||||
![image](images/01_overview_single.jpg) |
||||
|
||||
The image can then be selected and viewed |
||||
|
||||
![image](images/02_single_image_view.jpg) |
||||
|
||||
Whenever you want to continue in the code, i.e. unblock the cvv call, you can either continue
until the next cvv call (*Step*), continue until the last cvv call (*\>\>*) or run the
application until it exits (*Close*).
||||
|
||||
We decide to press the green *Step* button. |
||||
|
||||
-# The next cvv calls are used to debug all kinds of filter operations, i.e. operations that take a |
||||
picture as input and return a picture as output. |
||||
@code{.cpp} |
||||
cvv::debugFilter(imgRead, imgGray, CVVISUAL_LOCATION, "to gray"); |
||||
@endcode |
||||
As with every cvv call, you first end up in the overview. |
||||
|
||||
![image](images/03_overview_two.jpg) |
||||
|
||||
We decide not to care about the conversion to gray scale and press *Step*. |
||||
@code{.cpp} |
||||
cvv::debugFilter(imgGray, imgGraySmooth, CVVISUAL_LOCATION, "smoothed"); |
||||
@endcode |
||||
If you open the filter call, you will end up in the so-called "DefaultFilterView". Both images
are shown next to each other and you can zoom into them (synchronized).
||||
|
||||
![image](images/04_default_filter_view.jpg) |
||||
|
||||
When you go to very high zoom levels, each pixel is annotated with its numeric values. |
||||
|
||||
![image](images/05_default_filter_view_high_zoom.jpg) |
||||
|
||||
We press *Step* twice and have a look at the dilated image. |
||||
@code{.cpp} |
||||
cvv::debugFilter(imgEdges, imgEdgesDilated, CVVISUAL_LOCATION, "dilated edges"); |
||||
@endcode |
||||
The DefaultFilterView shows both images:
||||
|
||||
![image](images/06_default_filter_view_edges.jpg) |
||||
|
||||
Now we use the *View* selector in the top right and select the "DualFilterView". We select |
||||
"Changed Pixels" as filter and apply it (middle image). |
||||
|
||||
![image](images/07_dual_filter_view_edges.jpg) |
||||
|
||||
After we had a close look at these images, perhaps using different views, filters or other GUI |
||||
features, we decide to let the program run through. Therefore we press the yellow *\>\>* button. |
||||
|
||||
The program will block at |
||||
@code{.cpp} |
||||
cvv::finalShow(); |
||||
@endcode |
||||
and display the overview with everything that was passed to cvv in the meantime. |
||||
|
||||
![image](images/08_overview_all.jpg) |
||||
|
||||
-# The cvv debugDMatch call is used in a situation where there are two images each with a set of |
||||
descriptors that are matched to each other. |
||||
|
||||
We pass both images, both sets of keypoints and their matching to the visual debug module. |
||||
@code{.cpp} |
||||
cvv::debugDMatch(prevImgGray, prevKeypoints, imgGray, keypoints, matches, CVVISUAL_LOCATION, allMatchIdString.c_str()); |
||||
@endcode |
||||
Since we want to have a look at matches, we use the filter capabilities (*\#type match*) in the |
||||
overview to only show match calls. |
||||
|
||||
![image](images/09_overview_filtered_type_match.jpg) |
||||
|
||||
We want to have a closer look at one of them, e.g. to tune the parameters that use the matching.
The view has various settings for how to display keypoints and matches. Furthermore, there is a
mouseover tooltip.
||||
|
||||
![image](images/10_line_match_view.jpg) |
||||
|
||||
We see (visual debugging!) that there are many bad matches. We decide that only 70% of the |
||||
matches should be shown - those 70% with the lowest match distance. |
||||
|
||||
![image](images/11_line_match_view_portion_selector.jpg) |
||||
|
||||
Having successfully reduced the visual distraction, we want to see more clearly what changed
between the two images. We select the "TranslationMatchView" that shows, in a different way,
where each keypoint was matched to.
||||
|
||||
![image](images/12_translation_match_view_portion_selector.jpg) |
||||
|
||||
It is easy to see that the cup was moved to the left between the two images.
||||
|
||||
Although cvv is all about interactively *seeing* computer vision bugs, this is complemented
by a "RawView" that allows you to have a look at the underlying numeric data.
||||
|
||||
![image](images/13_raw_view.jpg) |
||||
|
||||
-# There are many more useful features contained in the cvv GUI. For instance, one can group the |
||||
overview tab. |
||||
|
||||
![image](images/14_overview_group_by_line.jpg) |
||||
|
||||
Result |
||||
------ |
||||
|
||||
- By adding a few expressive lines to our computer vision program we can interactively debug it
  through different visualizations.
- Once we are done developing/debugging we do not have to remove those lines. We simply disable
  cvv debugging (*cmake -DCVV_DEBUG_MODE=OFF* or g++ without *-DCVVISUAL_DEBUGMODE*) and our
  program runs without any debug overhead.
||||
|
||||
Enjoy computer vision! |
@ -1,36 +0,0 @@ |
||||
HMDB: A Large Human Motion Database |
||||
=================================== |
||||
.. ocv:class:: AR_hmdb |
||||
|
||||
Implements loading dataset: |
||||
|
||||
_`"HMDB: A Large Human Motion Database"`: http://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/ |
||||
|
||||
.. note:: Usage |
||||
|
||||
1. From link above download dataset files: hmdb51_org.rar & test_train_splits.rar. |
||||
|
||||
2. Unpack them. Unpack all archives from directory: hmdb51_org/ and remove them. |
||||
|
||||
3. To load data run: ./opencv/build/bin/example_datasets_ar_hmdb -p=/home/user/path_to_unpacked_folders/ |
||||
|
||||
Benchmark |
||||
""""""""" |
||||
|
||||
A benchmark was implemented for this dataset, with accuracy: 0.107407 (using precomputed HOG/HOF "STIP" features from the site, averaged over the 3 splits)
||||
|
||||
To run this benchmark execute: |
||||
|
||||
.. code-block:: bash |
||||
|
||||
./opencv/build/bin/example_datasets_ar_hmdb_benchmark -p=/home/user/path_to_unpacked_folders/ |
||||
|
||||
(precomputed features should be unpacked in the same folder: /home/user/path_to_unpacked_folders/hmdb51_org_stips/. Also unpack all archives from directory: hmdb51_org_stips/ and remove them.) |
||||
|
||||
**References:** |
||||
|
||||
.. [Kuehne11] H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre. HMDB: A Large Video Database for Human Motion Recognition. ICCV, 2011 |
||||
|
||||
.. [Laptev08] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. Learning Realistic Human Actions From Movies. CVPR, 2008 |
||||
|
||||
|
@ -1,18 +0,0 @@ |
||||
Sports-1M Dataset |
||||
================= |
||||
.. ocv:class:: AR_sports |
||||
|
||||
Implements loading dataset: |
||||
|
||||
_`"Sports-1M Dataset"`: http://cs.stanford.edu/people/karpathy/deepvideo/ |
||||
|
||||
.. note:: Usage |
||||
|
||||
1. From link above download dataset files (git clone https://code.google.com/p/sports-1m-dataset/). |
||||
|
||||
2. To load data run: ./opencv/build/bin/example_datasets_ar_sports -p=/home/user/path_to_downloaded_folders/ |
||||
|
||||
**References:** |
||||
|
||||
.. [KarpathyCVPR14] Andrej Karpathy and George Toderici and Sanketh Shetty and Thomas Leung and Rahul Sukthankar and Li Fei-Fei. Large-scale Video Classification with Convolutional Neural Networks. CVPR, 2014 |
||||
|
@ -1,121 +0,0 @@ |
||||
******************************************************* |
||||
datasets. Framework for working with different datasets |
||||
******************************************************* |
||||
|
||||
.. highlight:: cpp |
||||
|
||||
The datasets module includes classes for working with different datasets: loading the data, evaluating different algorithms on them, benchmarks, etc.
||||
|
||||
It is planned to have: |
||||
|
||||
* basic: loading code for all datasets to help start work with them. |
||||
* next stage: quick benchmarks for all datasets to show how to solve them using OpenCV and implement evaluation code. |
||||
* finally: state-of-the-art algorithms implemented in OpenCV, which solve these tasks.
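As an illustration of the intended usage, here is a hedged loading sketch using the HMDB loader (class and field names as documented on the :doc:`ar_hmdb` page; adjust them for other datasets or for your version of the module). ::

    #include "opencv2/datasets/ar_hmdb.hpp"

    #include <cstdio>
    #include <string>
    #include <vector>

    using namespace cv;
    using namespace cv::datasets;

    int main()
    {
        // path prepared as described on the ar_hmdb page
        std::string path = "/home/user/path_to_unpacked_folders/";

        Ptr<AR_hmdb> dataset = AR_hmdb::create();
        dataset->load(path);

        // each element of the train split describes one video sample
        std::vector< Ptr<Object> > &train = dataset->getTrain();
        printf("train size: %u\n", (unsigned int)train.size());

        AR_hmdbObj *example = static_cast<AR_hmdbObj *>(train[0].get());
        printf("first sample: action=%s, video=%s\n",
               example->name.c_str(), example->videoName.c_str());

        return 0;
    }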
||||
|
||||
.. toctree:: |
||||
:hidden: |
||||
|
||||
ar_hmdb |
||||
ar_sports |
||||
fr_adience |
||||
fr_lfw |
||||
gr_chalearn |
||||
gr_skig |
||||
hpe_humaneva |
||||
hpe_parse |
||||
ir_affine |
||||
ir_robot |
||||
is_bsds |
||||
is_weizmann |
||||
msm_epfl |
||||
msm_middlebury |
||||
or_imagenet |
||||
or_mnist |
||||
or_sun |
||||
pd_caltech |
||||
slam_kitti |
||||
slam_tumindoor |
||||
tr_chars |
||||
tr_svt |
||||
|
||||
Action Recognition |
||||
------------------ |
||||
|
||||
:doc:`ar_hmdb` [#f1]_ |
||||
|
||||
:doc:`ar_sports` |
||||
|
||||
Face Recognition |
||||
---------------- |
||||
|
||||
:doc:`fr_adience` |
||||
|
||||
:doc:`fr_lfw` [#f1]_ |
||||
|
||||
Gesture Recognition |
||||
------------------- |
||||
|
||||
:doc:`gr_chalearn` |
||||
|
||||
:doc:`gr_skig` |
||||
|
||||
Human Pose Estimation |
||||
--------------------- |
||||
|
||||
:doc:`hpe_humaneva` |
||||
|
||||
:doc:`hpe_parse` |
||||
|
||||
Image Registration |
||||
------------------ |
||||
|
||||
:doc:`ir_affine` |
||||
|
||||
:doc:`ir_robot` |
||||
|
||||
Image Segmentation |
||||
------------------ |
||||
|
||||
:doc:`is_bsds` |
||||
|
||||
:doc:`is_weizmann` |
||||
|
||||
Multiview Stereo Matching |
||||
------------------------- |
||||
|
||||
:doc:`msm_epfl` |
||||
|
||||
:doc:`msm_middlebury` |
||||
|
||||
Object Recognition |
||||
------------------ |
||||
|
||||
:doc:`or_imagenet` |
||||
|
||||
:doc:`or_mnist` [#f2]_ |
||||
|
||||
:doc:`or_sun` |
||||
|
||||
Pedestrian Detection |
||||
-------------------- |
||||
|
||||
:doc:`pd_caltech` [#f2]_ |
||||
|
||||
SLAM |
||||
---- |
||||
|
||||
:doc:`slam_kitti` |
||||
|
||||
:doc:`slam_tumindoor` |
||||
|
||||
Text Recognition |
||||
---------------- |
||||
|
||||
:doc:`tr_chars` |
||||
|
||||
:doc:`tr_svt` [#f1]_ |
||||
|
||||
*Footnotes* |
||||
|
||||
.. [#f1] Benchmark implemented |
||||
.. [#f2] Not used in Vision Challenge |
@ -1,20 +0,0 @@ |
||||
Adience |
||||
======= |
||||
.. ocv:class:: FR_adience |
||||
|
||||
Implements loading dataset: |
||||
|
||||
_`"Adience"`: http://www.openu.ac.il/home/hassner/Adience/data.html |
||||
|
||||
.. note:: Usage |
||||
|
||||
 1. From the link above download any dataset file: faces.tar.gz\\aligned.tar.gz and the files with splits: fold_0_data.txt-fold_4_data.txt, fold_frontal_0_data.txt-fold_frontal_4_data.txt. (For the face recognition task, other splits should be created.)
||||
|
||||
2. Unpack dataset file to some folder and place split files into the same folder. |
||||
|
||||
3. To load data run: ./opencv/build/bin/example_datasets_fr_adience -p=/home/user/path_to_created_folder/ |
||||
|
||||
**References:** |
||||
|
||||
.. [Eidinger] E. Eidinger, R. Enbar, and T. Hassner. Age and Gender Estimation of Unfiltered Faces |
||||
|
@ -1,31 +0,0 @@ |
||||
Labeled Faces in the Wild |
||||
========================= |
||||
.. ocv:class:: FR_lfw |
||||
|
||||
Implements loading dataset: |
||||
|
||||
_`"Labeled Faces in the Wild"`: http://vis-www.cs.umass.edu/lfw/ |
||||
|
||||
.. note:: Usage |
||||
|
||||
1. From link above download any dataset file: lfw.tgz\\lfwa.tar.gz\\lfw-deepfunneled.tgz\\lfw-funneled.tgz and files with pairs: 10 test splits: pairs.txt and developer train split: pairsDevTrain.txt. |
||||
|
||||
2. Unpack dataset file and place pairs.txt and pairsDevTrain.txt in created folder. |
||||
|
||||
3. To load data run: ./opencv/build/bin/example_datasets_fr_lfw -p=/home/user/path_to_unpacked_folder/lfw2/ |
||||
|
||||
Benchmark |
||||
""""""""" |
||||
|
||||
A benchmark was implemented for this dataset, with accuracy: 0.623833 +- 0.005223 (train split: pairsDevTrain.txt, dataset: lfwa)
||||
|
||||
To run this benchmark execute: |
||||
|
||||
.. code-block:: bash |
||||
|
||||
./opencv/build/bin/example_datasets_fr_lfw_benchmark -p=/home/user/path_to_unpacked_folder/lfw2/ |
||||
|
||||
**References:** |
||||
|
||||
.. [Huang07] G.B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments. 2007 |
||||
|
@ -1,22 +0,0 @@ |
||||
ChaLearn Looking at People |
||||
========================== |
||||
.. ocv:class:: GR_chalearn |
||||
|
||||
Implements loading dataset: |
||||
|
||||
_`"ChaLearn Looking at People"`: http://gesture.chalearn.org/ |
||||
|
||||
.. note:: Usage |
||||
|
||||
 1. Follow the instructions from the site above and download the files for the dataset "Track 3: Gesture Recognition": Train1.zip-Train5.zip, Validation1.zip-Validation3.zip (register on the site www.codalab.org and accept the terms and conditions of the competition: https://www.codalab.org/competitions/991#learn_the_details There are three mirrors for downloading the dataset files; at the time of writing, only the "Universitat Oberta de Catalunya" mirror worked).
||||
|
||||
2. Unpack train archives Train1.zip-Train5.zip to folder Train/, validation archives Validation1.zip-Validation3.zip to folder Validation/ |
||||
|
||||
3. Unpack all archives in Train/ & Validation/ in the folders with the same names, for example: Sample0001.zip to Sample0001/ |
||||
|
||||
4. To load data run: ./opencv/build/bin/example_datasets_gr_chalearn -p=/home/user/path_to_unpacked_folders/ |
||||
|
||||
**References:** |
||||
|
||||
.. [Escalera14] S. Escalera, X. Baró, J. Gonzàlez, M.A. Bautista, M. Madadi, M. Reyes, V. Ponce-López, H.J. Escalante, J. Shotton, I. Guyon, "ChaLearn Looking at People Challenge 2014: Dataset and Results", ECCV Workshops, 2014 |
||||
|
@ -1,20 +0,0 @@ |
||||
Sheffield Kinect Gesture Dataset |
||||
================================ |
||||
.. ocv:class:: GR_skig |
||||
|
||||
Implements loading dataset: |
||||
|
||||
_`"Sheffield Kinect Gesture Dataset"`: http://lshao.staff.shef.ac.uk/data/SheffieldKinectGesture.htm |
||||
|
||||
.. note:: Usage |
||||
|
||||
1. From link above download dataset files: subject1_dep.7z-subject6_dep.7z, subject1_rgb.7z-subject6_rgb.7z. |
||||
|
||||
2. Unpack them. |
||||
|
||||
3. To load data run: ./opencv/build/bin/example_datasets_gr_skig -p=/home/user/path_to_unpacked_folders/ |
||||
|
||||
**References:** |
||||
|
||||
.. [Liu13] L. Liu and L. Shao, “Learning Discriminative Representations from RGB-D Video Data”, In Proc. International Joint Conference on Artificial Intelligence (IJCAI), Beijing, China, 2013. |
||||
|
@ -1,22 +0,0 @@ |
||||
HumanEva Dataset |
||||
================ |
||||
.. ocv:class:: HPE_humaneva |
||||
|
||||
Implements loading dataset: |
||||
|
||||
_`"HumanEva Dataset"`: http://humaneva.is.tue.mpg.de |
||||
|
||||
.. note:: Usage |
||||
|
||||
1. From link above download dataset files for HumanEva-I (tar) & HumanEva-II. |
||||
|
||||
2. Unpack them to HumanEva_1 & HumanEva_2 accordingly. |
||||
|
||||
3. To load data run: ./opencv/build/bin/example_datasets_hpe_humaneva -p=/home/user/path_to_unpacked_folders/ |
||||
|
||||
**References:** |
||||
|
||||
.. [Sigal10] L. Sigal, A. Balan and M. J. Black. HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion, In International Journal of Computer Vision, Vol. 87 (1-2), 2010 |
||||
|
||||
.. [Sigal06] L. Sigal and M. J. Black. HumanEva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion, Technical Report CS-06-08, Brown University, 2006
||||
|
@ -1,20 +0,0 @@ |
||||
PARSE Dataset |
||||
============= |
||||
.. ocv:class:: HPE_parse |
||||
|
||||
Implements loading dataset: |
||||
|
||||
_`"PARSE Dataset"`: http://www.ics.uci.edu/~dramanan/papers/parse/ |
||||
|
||||
.. note:: Usage |
||||
|
||||
1. From link above download dataset file: people.zip. |
||||
|
||||
2. Unpack it. |
||||
|
||||
3. To load data run: ./opencv/build/bin/example_datasets_hpe_parse -p=/home/user/path_to_unpacked_folder/people_all/ |
||||
|
||||
**References:** |
||||
|
||||
.. [Ramanan06] D. Ramanan "Learning to Parse Images of Articulated Bodies." Neural Info. Proc. Systems (NIPS) To appear. Dec 2006. |
||||
|
@ -1,20 +0,0 @@ |
||||
Affine Covariant Regions Datasets |
||||
================================= |
||||
.. ocv:class:: IR_affine |
||||
|
||||
Implements loading dataset: |
||||
|
||||
_`"Affine Covariant Regions Datasets"`: http://www.robots.ox.ac.uk/~vgg/data/data-aff.html |
||||
|
||||
.. note:: Usage |
||||
|
||||
1. From link above download dataset files: bark\\bikes\\boat\\graf\\leuven\\trees\\ubc\\wall.tar.gz. |
||||
|
||||
2. Unpack them. |
||||
|
||||
3. To load data, for example, for "bark", run: ./opencv/build/bin/example_datasets_ir_affine -p=/home/user/path_to_unpacked_folder/bark/ |
||||
|
||||
**References:** |
||||
|
||||
.. [Mikolajczyk05] K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, L. Van Gool. A Comparison of Affine Region Detectors. International Journal of Computer Vision, Volume 65, Number 1/2, page 43--72, 2005 |
||||
|
@ -1,20 +0,0 @@ |
||||
Robot Data Set |
||||
============== |
||||
.. ocv:class:: IR_robot |
||||
|
||||
Implements loading dataset: |
||||
|
||||
_`"Robot Data Set, Point Feature Data Set – 2010"`: http://roboimagedata.compute.dtu.dk/?page_id=24 |
||||
|
||||
.. note:: Usage |
||||
|
||||
1. From link above download dataset files: SET001_6.tar.gz-SET055_60.tar.gz |
||||
|
||||
2. Unpack them to one folder. |
||||
|
||||
3. To load data run: ./opencv/build/bin/example_datasets_ir_robot -p=/home/user/path_to_unpacked_folder/ |
||||
|
||||
**References:** |
||||
|
||||
.. [aanæsinteresting] Aan{\ae}s, H. and Dahl, A.L. and Steenstrup Pedersen, K. Interesting Interest Points. International Journal of Computer Vision. 2012. |
||||
|
@ -1,20 +0,0 @@ |
||||
The Berkeley Segmentation Dataset and Benchmark |
||||
=============================================== |
||||
.. ocv:class:: IS_bsds |
||||
|
||||
Implements loading dataset: |
||||
|
||||
_`"The Berkeley Segmentation Dataset and Benchmark"`: https://www.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/ |
||||
|
||||
.. note:: Usage |
||||
|
||||
1. From link above download dataset files: BSDS300-human.tgz & BSDS300-images.tgz. |
||||
|
||||
2. Unpack them. |
||||
|
||||
3. To load data run: ./opencv/build/bin/example_datasets_is_bsds -p=/home/user/path_to_unpacked_folder/BSDS300/ |
||||
|
||||
**References:** |
||||
|
||||
.. [MartinFTM01] D. Martin and C. Fowlkes and D. Tal and J. Malik. A Database of Human Segmented Natural Images and its Application to Evaluating Segmentation Algorithms and Measuring Ecological Statistics. 2001 |
||||
|
@ -1,20 +0,0 @@ |
||||
Weizmann Segmentation Evaluation Database |
||||
========================================= |
||||
.. ocv:class:: IS_weizmann |
||||
|
||||
Implements loading dataset: |
||||
|
||||
_`"Weizmann Segmentation Evaluation Database"`: http://www.wisdom.weizmann.ac.il/~vision/Seg_Evaluation_DB/ |
||||
|
||||
.. note:: Usage |
||||
|
||||
1. From link above download dataset files: Weizmann_Seg_DB_1obj.ZIP & Weizmann_Seg_DB_2obj.ZIP. |
||||
|
||||
2. Unpack them. |
||||
|
||||
3. To load data, for example, for 1 object dataset, run: ./opencv/build/bin/example_datasets_is_weizmann -p=/home/user/path_to_unpacked_folder/1obj/ |
||||
|
||||
**References:** |
||||
|
||||
.. [AlpertGBB07] Sharon Alpert and Meirav Galun and Ronen Basri and Achi Brandt. Image Segmentation by Probabilistic Bottom-Up Aggregation and Cue Integration. 2007 |
||||
|
@ -1,20 +0,0 @@ |
||||
EPFL Multi-View Stereo |
||||
====================== |
||||
.. ocv:class:: MSM_epfl |
||||
|
||||
Implements loading dataset: |
||||
|
||||
_`"EPFL Multi-View Stereo"`: http://cvlabwww.epfl.ch/~strecha/multiview/denseMVS.html |
||||
|
||||
.. note:: Usage |
||||
|
||||
1. From link above download dataset files: castle_dense\\castle_dense_large\\castle_entry\\fountain\\herzjesu_dense\\herzjesu_dense_large_bounding\\cameras\\images\\p.tar.gz. |
||||
|
||||
2. Unpack them in separate folder for each object. For example, for "fountain", in folder fountain/ : fountain_dense_bounding.tar.gz -> bounding/, fountain_dense_cameras.tar.gz -> camera/, fountain_dense_images.tar.gz -> png/, fountain_dense_p.tar.gz -> P/ |
||||
|
||||
3. To load data, for example, for "fountain", run: ./opencv/build/bin/example_datasets_msm_epfl -p=/home/user/path_to_unpacked_folder/fountain/ |
||||
|
||||
**References:** |
||||
|
||||
.. [Strecha08] C. Strecha, W. von Hansen, L. Van Gool, P. Fua, U. Thoennessen. On Benchmarking Camera Calibration and Multi-View Stereo for High Resolution Imagery. CVPR, 2008 |
||||
|
@ -1,20 +0,0 @@ |
||||
Stereo – Middlebury Computer Vision |
||||
=================================== |
||||
.. ocv:class:: MSM_middlebury |
||||
|
||||
Implements loading dataset: |
||||
|
||||
_`"Stereo – Middlebury Computer Vision"`: http://vision.middlebury.edu/mview/ |
||||
|
||||
.. note:: Usage |
||||
|
||||
1. From link above download dataset files: dino\\dinoRing\\dinoSparseRing\\temple\\templeRing\\templeSparseRing.zip |
||||
|
||||
2. Unpack them. |
||||
|
||||
3. To load data, for example "temple" dataset, run: ./opencv/build/bin/example_datasets_msm_middlebury -p=/home/user/path_to_unpacked_folder/temple/ |
||||
|
||||
**References:** |
||||
|
||||
.. [Seitz06] S. M. Seitz, B. Curless, J. Diebel, D. Scharstein, R. Szeliski. A Comparison and Evaluation of Multi-View Stereo Reconstruction Algorithms, CVPR, 2006 |
||||
|
@ -1,39 +0,0 @@ |
||||
ImageNet |
||||
======== |
||||
.. ocv:class:: OR_imagenet |
||||
|
||||
Implements loading dataset: |
||||
|
||||
_`"ImageNet"`: http://www.image-net.org/ |
||||
|
||||
.. note:: Usage |
||||
|
||||
 1. From the link above download the dataset files: ILSVRC2010_images_train.tar\\ILSVRC2010_images_test.tar\\ILSVRC2010_images_val.tar & the devkit: ILSVRC2010_devkit-1.0.tar.gz (loading of the 2010 dataset is implemented because only this dataset has ground truth for the test data; the structure of ILSVRC2014 is similar)
||||
|
||||
2. Unpack them to: some_folder/train/\\some_folder/test/\\some_folder/val & some_folder/ILSVRC2010_validation_ground_truth.txt\\some_folder/ILSVRC2010_test_ground_truth.txt. |
||||
|
||||
3. Create file with labels: some_folder/labels.txt, for example, using :ref:`python script <python-script>` below (each file's row format: synset,labelID,description. For example: "n07751451,18,plum"). |
||||
|
||||
4. Unpack all tar files in train. |
||||
|
||||
5. To load data run: ./opencv/build/bin/example_datasets_or_imagenet -p=/home/user/some_folder/ |
||||
|
||||
.. _python-script: |
||||
|
||||
Python script to parse meta.mat: |
||||
|
||||
:: |
||||
|
||||
    import scipy.io
    meta_mat = scipy.io.loadmat("devkit-1.0/data/meta.mat")

    labels_dic = dict((m[0][1][0], m[0][0][0][0]-1) for m in meta_mat['synsets'])
    label_names_dic = dict((m[0][1][0], m[0][2][0]) for m in meta_mat['synsets'])

    for label in labels_dic.keys():
        print "{0},{1},{2}".format(label, labels_dic[label], label_names_dic[label])
||||
|
||||
**References:** |
||||
|
||||
.. [ILSVRCarxiv14] Olga Russakovsky and Jia Deng and Hao Su and Jonathan Krause and Sanjeev Satheesh and Sean Ma and Zhiheng Huang and Andrej Karpathy and Aditya Khosla and Michael Bernstein and Alexander C. Berg and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. 2014 |
||||
|
@ -1,20 +0,0 @@ |
||||
MNIST |
||||
===== |
||||
.. ocv:class:: OR_mnist |
||||
|
||||
Implements loading dataset: |
||||
|
||||
_`"MNIST"`: http://yann.lecun.com/exdb/mnist/ |
||||
|
||||
.. note:: Usage |
||||
|
||||
1. From link above download dataset files: t10k-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz, train-images-idx3-ubyte.gz, train-labels-idx1-ubyte.gz. |
||||
|
||||
2. Unpack them. |
||||
|
||||
3. To load data run: ./opencv/build/bin/example_datasets_or_mnist -p=/home/user/path_to_unpacked_files/ |
||||
|
||||
**References:** |
||||
|
||||
.. [LeCun98a] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998. |
||||
|
@ -1,22 +0,0 @@ |
||||
SUN Database |
||||
============ |
||||
.. ocv:class:: OR_sun |
||||
|
||||
Implements loading dataset: |
||||
|
||||
_`"SUN Database, Scene Recognition Benchmark. SUN397"`: http://vision.cs.princeton.edu/projects/2010/SUN/ |
||||
|
||||
.. note:: Usage |
||||
|
||||
1. From link above download dataset file: SUN397.tar & file with splits: Partitions.zip |
||||
|
||||
2. Unpack SUN397.tar into folder: SUN397/ & Partitions.zip into folder: SUN397/Partitions/ |
||||
|
||||
3. To load data run: ./opencv/build/bin/example_datasets_or_sun -p=/home/user/path_to_unpacked_files/SUN397/ |
||||
|
||||
**References:** |
||||
|
||||
.. [Xiao10] J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba. SUN Database: Large-scale Scene Recognition from Abbey to Zoo. IEEE Conference on Computer Vision and Pattern Recognition. CVPR, 2010 |
||||
|
||||
.. [Xiao14] J. Xiao, K. A. Ehinger, J. Hays, A. Torralba, and A. Oliva. SUN Database: Exploring a Large Collection of Scene Categories. International Journal of Computer Vision. IJCV, 2014 |
||||
|
@ -1,29 +0,0 @@ |
||||
Caltech Pedestrian Detection Benchmark |
||||
====================================== |
||||
.. ocv:class:: PD_caltech |
||||
|
||||
Implements loading dataset: |
||||
|
||||
_`"Caltech Pedestrian Detection Benchmark"`: http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/ |
||||
|
||||
.. note:: This is a first version of the Caltech Pedestrian dataset loading.

   The code to unpack all frames from the seq files is commented out because their number is huge!
   So currently only the meta information is loaded, without the data.

   Also, the ground truth isn't processed yet, as it first needs to be converted from the mat files.
||||
|
||||
.. note:: Usage |
||||
|
||||
1. From link above download dataset files: set00.tar-set10.tar. |
||||
|
||||
2. Unpack them to separate folder. |
||||
|
||||
3. To load data run: ./opencv/build/bin/example_datasets_pd_caltech -p=/home/user/path_to_unpacked_folders/ |
||||
|
||||
**References:** |
||||
|
||||
.. [Dollár12] P. Dollár, C. Wojek, B. Schiele and P. Perona. Pedestrian Detection: An Evaluation of the State of the Art. PAMI, 2012. |
||||
|
||||
.. [DollárCVPR09] P. Dollár, C. Wojek, B. Schiele and P. Perona. Pedestrian Detection: A Benchmark. CVPR, 2009 |
||||
|
@ -1,24 +0,0 @@ |
||||
KITTI Vision Benchmark |
||||
====================== |
||||
.. ocv:class:: SLAM_kitti |
||||
|
||||
Implements loading dataset: |
||||
|
||||
_`"KITTI Vision Benchmark"`: http://www.cvlibs.net/datasets/kitti/eval_odometry.php |
||||
|
||||
.. note:: Usage |
||||
|
||||
1. From link above download "Odometry" dataset files: data_odometry_gray\\data_odometry_color\\data_odometry_velodyne\\data_odometry_poses\\data_odometry_calib.zip. |
||||
|
||||
2. Unpack data_odometry_poses.zip, it creates folder dataset/poses/. After that unpack data_odometry_gray.zip, data_odometry_color.zip, data_odometry_velodyne.zip. Folder dataset/sequences/ will be created with folders 00/..21/. Each of these folders will contain: image_0/, image_1/, image_2/, image_3/, velodyne/ and files calib.txt & times.txt. These two last files will be replaced after unpacking data_odometry_calib.zip at the end. |
||||
|
||||
3. To load data run: ./opencv/build/bin/example_datasets_slam_kitti -p=/home/user/path_to_unpacked_folder/dataset/ |
||||
|
||||
**References:** |
||||
|
||||
.. [Geiger2012CVPR] Andreas Geiger and Philip Lenz and Raquel Urtasun. Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite. CVPR, 2012 |
||||
|
||||
.. [Geiger2013IJRR] Andreas Geiger and Philip Lenz and Christoph Stiller and Raquel Urtasun. Vision meets Robotics: The KITTI Dataset. IJRR, 2013 |
||||
|
||||
.. [Fritsch2013ITSC] Jannik Fritsch and Tobias Kuehnl and Andreas Geiger. A New Performance Measure and Evaluation Benchmark for Road Detection Algorithms. ITSC, 2013 |
||||
|
@ -1,20 +0,0 @@ |
||||
TUMindoor Dataset |
||||
================= |
||||
.. ocv:class:: SLAM_tumindoor |
||||
|
||||
Implements loading dataset: |
||||
|
||||
_`"TUMindoor Dataset"`: http://www.navvis.lmt.ei.tum.de/dataset/ |
||||
|
||||
.. note:: Usage |
||||
|
||||
1. From link above download dataset files: dslr\\info\\ladybug\\pointcloud.tar.bz2 for each dataset: 11-11-28 (1st floor)\\11-12-13 (1st floor N1)\\11-12-17a (4th floor)\\11-12-17b (3rd floor)\\11-12-17c (Ground I)\\11-12-18a (Ground II)\\11-12-18b (2nd floor) |
||||
|
||||
2. Unpack them in separate folder for each dataset. dslr.tar.bz2 -> dslr/, info.tar.bz2 -> info/, ladybug.tar.bz2 -> ladybug/, pointcloud.tar.bz2 -> pointcloud/. |
||||
|
||||
3. To load each dataset run: ./opencv/build/bin/example_datasets_slam_tumindoor -p=/home/user/path_to_unpacked_folders/ |
||||
|
||||
**References:** |
||||
|
||||
.. [TUMindoor] R. Huitl and G. Schroth and S. Hilsenbeck and F. Schweiger and E. Steinbach. {TUM}indoor: An Extensive Image and Point Cloud Dataset for Visual Indoor Localization and Mapping. 2012 |
||||
|
@ -1,22 +0,0 @@ |
||||
The Chars74K Dataset |
||||
==================== |
||||
.. ocv:class:: TR_chars |
||||
|
||||
Implements loading dataset: |
||||
|
||||
_`"The Chars74K Dataset"`: http://www.ee.surrey.ac.uk/CVSSP/demos/chars74k/ |
||||
|
||||
.. note:: Usage |
||||
|
||||
1. From link above download dataset files: EnglishFnt\\EnglishHnd\\EnglishImg\\KannadaHnd\\KannadaImg.tgz, ListsTXT.tgz. |
||||
|
||||
2. Unpack them. |
||||
|
||||
3. Move .m files from folder ListsTXT/ to appropriate folder. For example, English/list_English_Img.m for EnglishImg.tgz. |
||||
|
||||
4. To load data, for example "EnglishImg", run: ./opencv/build/bin/example_datasets_tr_chars -p=/home/user/path_to_unpacked_folder/English/ |
||||
|
||||
**References:** |
||||
|
||||
.. [Campos09] T. E. de Campos, B. R. Babu and M. Varma. Character recognition in natural images. In Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP), 2009 |
||||
|
@ -1,33 +0,0 @@ |
||||
The Street View Text Dataset |
||||
============================ |
||||
.. ocv:class:: TR_svt |
||||
|
||||
Implements loading dataset: |
||||
|
||||
_`"The Street View Text Dataset"`: http://vision.ucsd.edu/~kai/svt/ |
||||
|
||||
.. note:: Usage |
||||
|
||||
1. From link above download dataset file: svt.zip. |
||||
|
||||
2. Unpack it. |
||||
|
||||
3. To load data run: ./opencv/build/bin/example_datasets_tr_svt -p=/home/user/path_to_unpacked_folder/svt/svt1/ |
||||
|
||||
Benchmark |
||||
""""""""" |
||||
|
||||
For this dataset was implemented benchmark with accuracy (mean f1): 0.217 |
||||
|
||||
To run benchmark execute: |
||||
|
||||
.. code-block:: bash |
||||
|
||||
./opencv/build/bin/example_datasets_tr_svt_benchmark -p=/home/user/path_to_unpacked_folders/svt/svt1/ |
||||
|
||||
**References:** |
||||
|
||||
.. [Wang11] Kai Wang, Boris Babenko and Serge Belongie. End-to-end Scene Text Recognition. ICCV, 2011 |
||||
|
||||
.. [Wang10] Kai Wang and Serge Belongie. Word Spotting in the Wild. ECCV, 2010 |
||||
|
@ -1,10 +0,0 @@ |
||||
*************************************** |
||||
face. Face Recognition |
||||
*************************************** |
||||
|
||||
The module contains some recently added functionality that has not been stabilized, or functionality that is considered optional. |
||||
|
||||
.. toctree:: |
||||
:maxdepth: 2 |
||||
|
||||
FaceRecognizer Documentation <index> |
@ -1,412 +0,0 @@ |
||||
FaceRecognizer |
||||
============== |
||||
|
||||
.. highlight:: cpp |
||||
|
||||
.. Sample code:: |
||||
|
||||
* An example using the FaceRecognizer class can be found at opencv_source_code/samples/cpp/facerec_demo.cpp |
||||
|
||||
* (Python) An example using the FaceRecognizer class can be found at opencv_source_code/samples/python2/facerec_demo.py |
||||
|
||||
FaceRecognizer |
||||
-------------- |
||||
|
||||
.. ocv:class:: FaceRecognizer : public Algorithm |
||||
|
||||
All face recognition models in OpenCV are derived from the abstract base class :ocv:class:`FaceRecognizer`, which provides
unified access to all face recognition algorithms in OpenCV. ::
||||
|
||||
class FaceRecognizer : public Algorithm |
||||
{ |
||||
public: |
||||
//! virtual destructor |
||||
virtual ~FaceRecognizer() {} |
||||
|
||||
// Trains a FaceRecognizer. |
||||
virtual void train(InputArray src, InputArray labels) = 0; |
||||
|
||||
// Updates a FaceRecognizer. |
||||
virtual void update(InputArrayOfArrays src, InputArray labels); |
||||
|
||||
// Gets a prediction from a FaceRecognizer. |
||||
virtual int predict(InputArray src) const = 0; |
||||
|
||||
// Predicts the label and confidence for a given sample. |
||||
virtual void predict(InputArray src, int &label, double &confidence) const = 0; |
||||
|
||||
// Serializes this object to a given filename. |
||||
virtual void save(const String& filename) const; |
||||
|
||||
// Deserializes this object from a given filename. |
||||
virtual void load(const String& filename); |
||||
|
||||
// Serializes this object to a given cv::FileStorage. |
||||
virtual void save(FileStorage& fs) const = 0; |
||||
|
||||
// Deserializes this object from a given cv::FileStorage. |
||||
virtual void load(const FileStorage& fs) = 0; |
||||
|
||||
// Sets additional string info for the label |
||||
virtual void setLabelInfo(int label, const String& strInfo); |
||||
|
||||
// Gets string info by label |
||||
virtual String getLabelInfo(int label); |
||||
|
||||
// Gets labels by string info |
||||
virtual vector<int> getLabelsByString(const String& str); |
||||
}; |
||||
|
||||
|
||||
Description |
||||
+++++++++++ |
||||
|
||||
I'll go a bit more into detail explaining :ocv:class:`FaceRecognizer`, because it doesn't look like a powerful interface at first sight. But: Every :ocv:class:`FaceRecognizer` is an :ocv:class:`Algorithm`, so you can easily get/set all model internals (if allowed by the implementation). :ocv:class:`Algorithm` is a relatively new OpenCV concept, which is available since the 2.4 release. I suggest you take a look at its description. |
||||
|
||||
:ocv:class:`Algorithm` provides the following features for all derived classes: |
||||
|
||||
* So called “virtual constructor”. That is, each Algorithm derivative is registered at program start and you can get the list of registered algorithms and create instance of a particular algorithm by its name (see :ocv:func:`Algorithm::create`). If you plan to add your own algorithms, it is good practice to add a unique prefix to your algorithms to distinguish them from other algorithms. |
||||
|
||||
* Setting/Retrieving algorithm parameters by name. If you have used the video capturing functionality from the OpenCV highgui module, you are probably familiar with :ocv:cfunc:`cvSetCaptureProperty`, :ocv:cfunc:`cvGetCaptureProperty`, :ocv:func:`VideoCapture::set` and :ocv:func:`VideoCapture::get`. :ocv:class:`Algorithm` provides a similar method where, instead of integer ids, you specify the parameter names as text Strings. See :ocv:func:`Algorithm::set` and :ocv:func:`Algorithm::get` for details.
||||
|
||||
* Reading and writing parameters from/to XML or YAML files. Every Algorithm derivative can store all its parameters and then read them back. There is no need to re-implement it each time. |
||||
|
||||
Moreover, every :ocv:class:`FaceRecognizer` supports the following:
||||
|
||||
* **Training** of a :ocv:class:`FaceRecognizer` with :ocv:func:`FaceRecognizer::train` on a given set of images (your face database!). |
||||
|
||||
* **Prediction** of a given sample image, that means a face. The image is given as a :ocv:class:`Mat`. |
||||
|
||||
* **Loading/Saving** the model state from/to a given XML or YAML. |
||||
|
||||
* **Setting/Getting label info**, which is stored as a string. The string label info is useful for keeping the names of the recognized people.
||||
|
||||
.. note:: When using the FaceRecognizer interface in combination with Python, please stick to Python 2. Some underlying scripts like create_csv will not work in other versions, like Python 3. |
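
Putting these pieces together, a minimal end-to-end sketch could look like the following (the image paths and labels are made up for illustration; each call is covered in detail in the sections below):

.. code-block:: cpp

    // Illustrative only: image paths and labels are made up for this sketch.
    vector<Mat> images;
    vector<int> labels;
    images.push_back(imread("person0/0.jpg", CV_LOAD_IMAGE_GRAYSCALE)); labels.push_back(0);
    images.push_back(imread("person1/0.jpg", CV_LOAD_IMAGE_GRAYSCALE)); labels.push_back(1);
    // Create and train an LBPH recognizer:
    Ptr<FaceRecognizer> model = createLBPHFaceRecognizer();
    model->train(images, labels);
    // Attach a readable name to label 0 and predict a new sample:
    model->setLabelInfo(0, "person0");
    int predicted = model->predict(imread("person0/1.jpg", CV_LOAD_IMAGE_GRAYSCALE));
    // Persist the trained model state:
    model->save("lbph_faces.yml");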
||||
|
||||
Setting the Thresholds |
||||
+++++++++++++++++++++++ |
||||
|
||||
Sometimes you run into the situation where you want to apply a threshold on the prediction. A common scenario in face recognition is to tell whether a face belongs to the training dataset or if it is unknown. You might wonder why there's no public API in :ocv:class:`FaceRecognizer` to set the threshold for the prediction, but rest assured: it's supported. It just means there's no generic way in an abstract class to provide an interface for setting/getting the thresholds of *every possible* :ocv:class:`FaceRecognizer` algorithm. The appropriate place to set the thresholds is in the constructor of the specific :ocv:class:`FaceRecognizer` and, since every :ocv:class:`FaceRecognizer` is an :ocv:class:`Algorithm` (see above), you can get/set the thresholds at runtime!
||||
|
||||
Here is an example of setting a threshold for the Eigenfaces method, when creating the model: |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
// Let's say we want to keep 10 Eigenfaces and have a threshold value of 10.0 |
||||
int num_components = 10; |
||||
double threshold = 10.0; |
||||
// Then if you want to have a cv::FaceRecognizer with a confidence threshold, |
||||
    // create the concrete implementation with the appropriate parameters:
||||
Ptr<FaceRecognizer> model = createEigenFaceRecognizer(num_components, threshold); |
||||
|
||||
Sometimes it's impractical to retrain the model just to experiment with threshold values. Thanks to :ocv:class:`Algorithm` it's possible to set internal model thresholds during runtime. Let's see how we would set/get the prediction threshold for the Eigenfaces model we've created above:
||||
|
||||
.. code-block:: cpp |
||||
|
||||
// The following line reads the threshold from the Eigenfaces model: |
||||
double current_threshold = model->getDouble("threshold"); |
||||
// And this line sets the threshold to 0.0: |
||||
model->set("threshold", 0.0); |
||||
|
||||
If you've set the threshold to ``0.0`` as we did above, then: |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
// |
||||
Mat img = imread("person1/3.jpg", CV_LOAD_IMAGE_GRAYSCALE); |
||||
// Get a prediction from the model. Note: We've set a threshold of 0.0 above, |
||||
// since the distance is almost always larger than 0.0, you'll get -1 as |
||||
// label, which indicates, this face is unknown |
||||
int predicted_label = model->predict(img); |
||||
// ... |
||||
|
||||
is going to yield ``-1`` as predicted label, which states this face is unknown. |
||||
|
||||
Getting the name of a FaceRecognizer |
||||
+++++++++++++++++++++++++++++++++++++ |
||||
|
||||
Since every :ocv:class:`FaceRecognizer` is an :ocv:class:`Algorithm`, you can use :ocv:func:`Algorithm::name` to get the name of a :ocv:class:`FaceRecognizer`:
||||
|
||||
.. code-block:: cpp |
||||
|
||||
// Create a FaceRecognizer: |
||||
Ptr<FaceRecognizer> model = createEigenFaceRecognizer(); |
||||
// And here's how to get its name: |
||||
String name = model->name(); |
||||
|
||||
|
||||
FaceRecognizer::train |
||||
--------------------- |
||||
|
||||
Trains a FaceRecognizer with given data and associated labels. |
||||
|
||||
.. ocv:function:: void FaceRecognizer::train( InputArrayOfArrays src, InputArray labels ) = 0 |
||||
|
||||
:param src: The training images, that means the faces you want to learn. The data has to be given as a ``vector<Mat>``. |
||||
|
||||
    :param labels: The labels corresponding to the images, which have to be given either as a ``vector<int>`` or a :ocv:class:`Mat` of type ``CV_32SC1``.
||||
|
||||
The following source code snippet shows you how to learn a Fisherfaces model on a given set of images. The images are read with :ocv:func:`imread` and pushed into a ``std::vector<Mat>``. The labels of each image are stored within a ``std::vector<int>`` (you could also use a :ocv:class:`Mat` of type `CV_32SC1`). Think of the label as the subject (the person) this image belongs to, so same subjects (persons) should have the same label. For the available :ocv:class:`FaceRecognizer` you don't have to pay any attention to the order of the labels, just make sure same persons have the same label: |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
// holds images and labels |
||||
vector<Mat> images; |
||||
vector<int> labels; |
||||
// images for first person |
||||
images.push_back(imread("person0/0.jpg", CV_LOAD_IMAGE_GRAYSCALE)); labels.push_back(0); |
||||
images.push_back(imread("person0/1.jpg", CV_LOAD_IMAGE_GRAYSCALE)); labels.push_back(0); |
||||
images.push_back(imread("person0/2.jpg", CV_LOAD_IMAGE_GRAYSCALE)); labels.push_back(0); |
||||
// images for second person |
||||
images.push_back(imread("person1/0.jpg", CV_LOAD_IMAGE_GRAYSCALE)); labels.push_back(1); |
||||
images.push_back(imread("person1/1.jpg", CV_LOAD_IMAGE_GRAYSCALE)); labels.push_back(1); |
||||
images.push_back(imread("person1/2.jpg", CV_LOAD_IMAGE_GRAYSCALE)); labels.push_back(1); |
||||
|
||||
Now that you have read some images, we can create a new :ocv:class:`FaceRecognizer`. In this example I'll create a Fisherfaces model and decide to keep all of the possible Fisherfaces: |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
// Create a new Fisherfaces model and retain all available Fisherfaces, |
||||
// this is the most common usage of this specific FaceRecognizer: |
||||
// |
||||
Ptr<FaceRecognizer> model = createFisherFaceRecognizer(); |
||||
|
||||
And finally train it on the given dataset (the face images and labels): |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
// This is the common interface to train all of the available cv::FaceRecognizer |
||||
// implementations: |
||||
// |
||||
model->train(images, labels); |
||||
|
||||
FaceRecognizer::update |
||||
---------------------- |
||||
|
||||
Updates a FaceRecognizer with given data and associated labels. |
||||
|
||||
.. ocv:function:: void FaceRecognizer::update( InputArrayOfArrays src, InputArray labels ) |
||||
|
||||
:param src: The training images, that means the faces you want to learn. The data has to be given as a ``vector<Mat>``. |
||||
|
||||
    :param labels: The labels corresponding to the images, which have to be given either as a ``vector<int>`` or a :ocv:class:`Mat` of type ``CV_32SC1``.
||||
|
||||
This method updates a (probably trained) :ocv:class:`FaceRecognizer`, but only if the algorithm supports it. The Local Binary Patterns Histograms (LBPH) recognizer (see :ocv:func:`createLBPHFaceRecognizer`) can be updated. For the Eigenfaces and Fisherfaces method, this is algorithmically not possible and you have to re-estimate the model with :ocv:func:`FaceRecognizer::train`. In any case, a call to train empties the existing model and learns a new model, while update does not delete any model data. |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
// Create a new LBPH model (it can be updated) and use the default parameters, |
||||
// this is the most common usage of this specific FaceRecognizer: |
||||
// |
||||
Ptr<FaceRecognizer> model = createLBPHFaceRecognizer(); |
||||
// This is the common interface to train all of the available cv::FaceRecognizer |
||||
// implementations: |
||||
// |
||||
model->train(images, labels); |
||||
// Some containers to hold new image: |
||||
vector<Mat> newImages; |
||||
vector<int> newLabels; |
||||
// You should add some images to the containers: |
||||
// |
||||
// ... |
||||
// |
||||
// Now updating the model is as easy as calling: |
||||
model->update(newImages,newLabels); |
||||
// This will preserve the old model data and extend the existing model |
||||
// with the new features extracted from newImages! |
||||
|
||||
Calling update on an Eigenfaces model (see :ocv:func:`createEigenFaceRecognizer`), which doesn't support updating, will throw an error similar to: |
||||
|
||||
.. code-block:: none |
||||
|
||||
OpenCV Error: The function/feature is not implemented (This FaceRecognizer (FaceRecognizer.Eigenfaces) does not support updating, you have to use FaceRecognizer::train to update it.) in update, file /home/philipp/git/opencv/modules/contrib/src/facerec.cpp, line 305 |
||||
terminate called after throwing an instance of 'cv::Exception' |
||||
|
||||
Please note: The :ocv:class:`FaceRecognizer` does not store your training images, because this would be very memory intensive and it's not the responsibility of the :ocv:class:`FaceRecognizer` to do so. The caller is responsible for maintaining the dataset they want to work with.
||||
|
||||
FaceRecognizer::predict |
||||
----------------------- |
||||
|
||||
.. ocv:function:: int FaceRecognizer::predict( InputArray src ) const = 0 |
||||
.. ocv:function:: void FaceRecognizer::predict( InputArray src, int & label, double & confidence ) const = 0 |
||||
|
||||
Predicts a label and associated confidence (e.g. distance) for a given input image. |
||||
|
||||
:param src: Sample image to get a prediction from. |
||||
:param label: The predicted label for the given image. |
||||
:param confidence: Associated confidence (e.g. distance) for the predicted label. |
||||
|
||||
The suffix ``const`` means that prediction does not affect the internal model |
||||
state, so the method can be safely called from within different threads. |
||||
|
||||
The following example shows how to get a prediction from a trained model: |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
using namespace cv; |
||||
// Do your initialization here (create the cv::FaceRecognizer model) ... |
||||
// ... |
||||
// Read in a sample image: |
||||
Mat img = imread("person1/3.jpg", CV_LOAD_IMAGE_GRAYSCALE); |
||||
// And get a prediction from the cv::FaceRecognizer: |
||||
int predicted = model->predict(img); |
||||
|
||||
Or to get a prediction and the associated confidence (e.g. distance): |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
using namespace cv; |
||||
// Do your initialization here (create the cv::FaceRecognizer model) ... |
||||
// ... |
||||
Mat img = imread("person1/3.jpg", CV_LOAD_IMAGE_GRAYSCALE); |
||||
// Some variables for the predicted label and associated confidence (e.g. distance): |
||||
int predicted_label = -1; |
||||
double predicted_confidence = 0.0; |
||||
// Get the prediction and associated confidence from the model |
||||
model->predict(img, predicted_label, predicted_confidence); |
||||
|
||||
FaceRecognizer::save |
||||
-------------------- |
||||
|
||||
Saves a :ocv:class:`FaceRecognizer` and its model state. |
||||
|
||||
.. ocv:function:: void FaceRecognizer::save(const String& filename) const |
||||
|
||||
Saves this model to a given filename, either as XML or YAML. |
||||
|
||||
:param filename: The filename to store this :ocv:class:`FaceRecognizer` to (either XML/YAML). |
||||
|
||||
.. ocv:function:: void FaceRecognizer::save(FileStorage& fs) const |
||||
|
||||
Saves this model to a given :ocv:class:`FileStorage`. |
||||
|
||||
:param fs: The :ocv:class:`FileStorage` to store this :ocv:class:`FaceRecognizer` to. |
||||
|
||||
|
||||
Every :ocv:class:`FaceRecognizer` overwrites ``FaceRecognizer::save(FileStorage& fs)`` |
||||
to save the internal model state. ``FaceRecognizer::save(const String& filename)`` saves |
||||
the state of a model to the given filename. |
||||
|
||||
The suffix ``const`` means that saving does not affect the internal model
state, so the method can be safely called from within different threads.
||||
|
||||
FaceRecognizer::load |
||||
-------------------- |
||||
|
||||
Loads a :ocv:class:`FaceRecognizer` and its model state. |
||||
|
||||
.. ocv:function:: void FaceRecognizer::load( const String& filename ) |
||||
.. ocv:function:: void FaceRecognizer::load( const FileStorage& fs ) = 0 |
||||
|
||||
Loads a persisted model and state from a given XML or YAML file. Every
:ocv:class:`FaceRecognizer` has to overwrite ``FaceRecognizer::load(FileStorage& fs)``
to enable loading the model state. ``FaceRecognizer::load(FileStorage& fs)`` in
turn gets called by ``FaceRecognizer::load(const String& filename)``, to ease
loading a model.
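
As a quick illustration, persisting a trained model and restoring it later might look like this (the file name is made up for this sketch):

.. code-block:: cpp

    // Save the trained model state to disk (YAML in this case):
    model->save("eigenfaces_at.yml");
    // ... later, create a model of the same type and restore its state:
    Ptr<FaceRecognizer> loaded = createEigenFaceRecognizer();
    loaded->load("eigenfaces_at.yml");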
||||
|
||||
FaceRecognizer::setLabelInfo |
||||
----------------------------- |
||||
|
||||
Sets string info for the specified model's label.

.. ocv:function:: void FaceRecognizer::setLabelInfo(int label, const String& strInfo)
||||
|
||||
The string info is replaced by the provided value if it was set before for the specified label. |
||||
|
||||
FaceRecognizer::getLabelInfo |
||||
---------------------------- |
||||
|
||||
Gets string information by label.

.. ocv:function:: String FaceRecognizer::getLabelInfo(int label)
||||
|
||||
If an unknown label id is provided or there is no label information associated with the specified label id the method returns an empty string. |
||||
|
||||
FaceRecognizer::getLabelsByString |
||||
--------------------------------- |
||||
Gets vector of labels by string. |
||||
|
||||
.. ocv:function:: vector<int> FaceRecognizer::getLabelsByString(const String& str) |
||||
|
||||
The function searches for the labels containing the specified sub-string in the associated string info. |
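
A small sketch of how these three calls work together (the name "Alice" is of course just an example):

.. code-block:: cpp

    // Attach a human-readable name to label 0 and read it back:
    model->setLabelInfo(0, "Alice");
    String info = model->getLabelInfo(0);              // "Alice"
    // Find all labels whose string info contains the sub-string "Ali":
    vector<int> ids = model->getLabelsByString("Ali");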
||||
|
||||
createEigenFaceRecognizer |
||||
------------------------- |
||||
|
||||
.. ocv:function:: Ptr<FaceRecognizer> createEigenFaceRecognizer(int num_components = 0, double threshold = DBL_MAX) |
||||
|
||||
:param num_components: The number of components (read: Eigenfaces) kept for this Principal Component Analysis. As a hint: There's no rule how many components (read: Eigenfaces) should be kept for good reconstruction capabilities. It is based on your input data, so experiment with the number. Keeping 80 components should almost always be sufficient. |
||||
|
||||
    :param threshold: The threshold applied in the prediction. If the distance to the nearest neighbor is larger than the threshold, this method returns -1.
||||
|
||||
Notes: |
||||
++++++ |
||||
|
||||
* Training and prediction must be done on grayscale images, use :ocv:func:`cvtColor` to convert between the color spaces (see the preprocessing sketch after these notes).
||||
* **THE EIGENFACES METHOD MAKES THE ASSUMPTION, THAT THE TRAINING AND TEST IMAGES ARE OF EQUAL SIZE.** (caps-lock, because I got so many mails asking for this). You have to make sure your input data has the correct shape, else a meaningful exception is thrown. Use :ocv:func:`resize` to resize the images. |
||||
* This model does not support updating. |
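
A minimal preprocessing sketch that satisfies both notes, assuming ``inputBgr`` holds a color image read with :ocv:func:`imread` (the target size of 100 x 100 is arbitrary, use whatever size your training images have):

.. code-block:: cpp

    // Convert a loaded BGR image to grayscale and bring it to the training size:
    Mat gray;
    cvtColor(inputBgr, gray, CV_BGR2GRAY);
    resize(gray, gray, Size(100, 100));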
||||
|
||||
Model internal data: |
||||
++++++++++++++++++++ |
||||
|
||||
* ``num_components`` see :ocv:func:`createEigenFaceRecognizer`. |
||||
* ``threshold`` see :ocv:func:`createEigenFaceRecognizer`. |
||||
* ``eigenvalues`` The eigenvalues for this Principal Component Analysis (ordered descending). |
||||
* ``eigenvectors`` The eigenvectors for this Principal Component Analysis (ordered by their eigenvalue). |
||||
* ``mean`` The sample mean calculated from the training data. |
||||
* ``projections`` The projections of the training data. |
||||
* ``labels`` The labels corresponding to the projections.
||||
|
||||
createFisherFaceRecognizer |
||||
-------------------------- |
||||
|
||||
.. ocv:function:: Ptr<FaceRecognizer> createFisherFaceRecognizer(int num_components = 0, double threshold = DBL_MAX) |
||||
|
||||
:param num_components: The number of components (read: Fisherfaces) kept for this Linear Discriminant Analysis with the Fisherfaces criterion. It's useful to keep all components, that means the number of your classes ``c`` (read: subjects, persons you want to recognize). If you leave this at the default (``0``) or set it to a value less-equal ``0`` or greater ``(c-1)``, it will be set to the correct number ``(c-1)`` automatically. |
||||
|
||||
:param threshold: The threshold applied in the prediction. If the distance to the nearest neighbor is larger than the threshold, this method returns -1. |
||||
|
||||
Notes: |
||||
++++++ |
||||
|
||||
* Training and prediction must be done on grayscale images, use :ocv:func:`cvtColor` to convert between the color spaces. |
||||
* **THE FISHERFACES METHOD MAKES THE ASSUMPTION, THAT THE TRAINING AND TEST IMAGES ARE OF EQUAL SIZE.** (caps-lock, because I got so many mails asking for this). You have to make sure your input data has the correct shape, else a meaningful exception is thrown. Use :ocv:func:`resize` to resize the images. |
||||
* This model does not support updating. |
||||
|
||||
Model internal data: |
||||
++++++++++++++++++++ |
||||
|
||||
* ``num_components`` see :ocv:func:`createFisherFaceRecognizer`. |
||||
* ``threshold`` see :ocv:func:`createFisherFaceRecognizer`. |
||||
* ``eigenvalues`` The eigenvalues for this Linear Discriminant Analysis (ordered descending). |
||||
* ``eigenvectors`` The eigenvectors for this Linear Discriminant Analysis (ordered by their eigenvalue). |
||||
* ``mean`` The sample mean calculated from the training data. |
||||
* ``projections`` The projections of the training data. |
||||
* ``labels`` The labels corresponding to the projections. |
||||
|
||||
|
||||
createLBPHFaceRecognizer |
||||
------------------------- |
||||
|
||||
.. ocv:function:: Ptr<FaceRecognizer> createLBPHFaceRecognizer(int radius=1, int neighbors=8, int grid_x=8, int grid_y=8, double threshold = DBL_MAX) |
||||
|
||||
    :param radius: The radius used for building the Circular Local Binary Pattern. The greater the radius, the smoother the image but more spatial information you can get.
||||
    :param neighbors: The number of sample points to build a Circular Local Binary Pattern from. An appropriate value is to use ``8`` sample points. Keep in mind: the more sample points you include, the higher the computational cost.
||||
:param grid_x: The number of cells in the horizontal direction, ``8`` is a common value used in publications. The more cells, the finer the grid, the higher the dimensionality of the resulting feature vector. |
||||
:param grid_y: The number of cells in the vertical direction, ``8`` is a common value used in publications. The more cells, the finer the grid, the higher the dimensionality of the resulting feature vector. |
||||
:param threshold: The threshold applied in the prediction. If the distance to the nearest neighbor is larger than the threshold, this method returns -1. |
||||
|
||||
Notes: |
||||
++++++ |
||||
|
||||
* The Circular Local Binary Patterns (used in training and prediction) expect the data given as grayscale images, use :ocv:func:`cvtColor` to convert between the color spaces. |
||||
* This model supports updating. |
||||
|
||||
Model internal data: |
||||
++++++++++++++++++++ |
||||
|
||||
* ``radius`` see :ocv:func:`createLBPHFaceRecognizer`. |
||||
* ``neighbors`` see :ocv:func:`createLBPHFaceRecognizer`. |
||||
* ``grid_x`` see :ocv:func:`createLBPHFaceRecognizer`. |
||||
* ``grid_y`` see :ocv:func:`createLBPHFaceRecognizer`. |
||||
* ``threshold`` see :ocv:func:`createLBPHFaceRecognizer`. |
||||
* ``histograms`` Local Binary Patterns Histograms calculated from the given training data (empty if none was given). |
||||
* ``labels`` Labels corresponding to the calculated Local Binary Patterns Histograms. |
||||
Changelog |
||||
========= |
||||
|
||||
Release 0.05 |
||||
------------ |
||||
|
||||
This library is now included in the official OpenCV distribution (from 2.4 on). |
||||
The :ocv:class:`FaceRecognizer` is now an :ocv:class:`Algorithm`, which better fits into the overall
||||
OpenCV API. |
||||
|
||||
To reduce the confusion on user side and minimize my work, libfacerec and OpenCV |
||||
have been synchronized and are now based on the same interfaces and implementation. |
||||
|
||||
The library now has an extensive documentation: |
||||
|
||||
* The API is explained in detail and with a lot of code examples. |
||||
* The face recognition guide I had written for Python and GNU Octave/MATLAB has been adapted to the new OpenCV C++ ``cv::FaceRecognizer``. |
||||
* A tutorial for gender classification with Fisherfaces. |
||||
* A tutorial for face recognition in videos (e.g. webcam). |
||||
|
||||
|
||||
Release highlights |
||||
++++++++++++++++++ |
||||
|
||||
* There are no single highlights to pick from, this release is a highlight itself. |
||||
|
||||
Release 0.04 |
||||
------------ |
||||
|
||||
This version is fully Windows-compatible and works with OpenCV 2.3.1. Several |
||||
bugfixes, but none influenced the recognition rate. |
||||
|
||||
Release highlights |
||||
++++++++++++++++++ |
||||
|
||||
* A whole lot of exceptions with meaningful error messages. |
||||
* A tutorial for Windows users: `http://bytefish.de/blog/opencv_visual_studio_and_libfacerec <http://bytefish.de/blog/opencv_visual_studio_and_libfacerec>`_ |
||||
|
||||
|
||||
Release 0.03 |
||||
------------ |
||||
|
||||
Reworked the library to provide separate implementations in cpp files, because |
||||
it's the preferred way of contributing OpenCV libraries. This means the library |
||||
is not header-only anymore. Slight API changes were done, please see the |
||||
documentation for details. |
||||
|
||||
Release highlights |
||||
++++++++++++++++++ |
||||
|
||||
* New Unit Tests (for LBP Histograms) make the library more robust. |
||||
* Added more documentation. |
||||
|
||||
|
||||
Release 0.02 |
||||
------------ |
||||
|
||||
Reworked the library to provide separate implementations in cpp files, because |
||||
it's the preferred way of contributing OpenCV libraries. This means the library |
||||
is not header-only anymore. Slight API changes were done, please see the |
||||
documentation for details. |
||||
|
||||
Release highlights |
||||
++++++++++++++++++ |
||||
|
||||
* New Unit Tests (for LBP Histograms) make the library more robust. |
||||
* Added a documentation and changelog in reStructuredText. |
||||
|
||||
Release 0.01 |
||||
------------ |
||||
|
||||
Initial release as header-only library. |
||||
|
||||
Release highlights |
||||
++++++++++++++++++ |
||||
|
||||
* Colormaps for OpenCV to enhance the visualization. |
||||
* Face Recognition algorithms implemented: |
||||
|
||||
* Eigenfaces [TP91]_ |
||||
* Fisherfaces [BHK97]_ |
||||
* Local Binary Patterns Histograms [AHP04]_ |
||||
|
||||
* Added persistence facilities to store the models with a common API. |
||||
* Unit Tests (using `gtest <http://code.google.com/p/googletest/>`_). |
||||
* Providing a CMakeLists.txt to enable easy cross-platform building. |
||||
Face Recognition with OpenCV |
||||
############################ |
||||
|
||||
.. contents:: Table of Contents |
||||
:depth: 3 |
||||
|
||||
Introduction |
||||
============ |
||||
|
||||
`OpenCV (Open Source Computer Vision) <http://opencv.org>`_ is a popular computer vision library started by `Intel <http://www.intel.com>`_ in 1999. The cross-platform library sets its focus on real-time image processing and includes patent-free implementations of the latest computer vision algorithms. In 2008 `Willow Garage <http://www.willowgarage.com>`_ took over support and OpenCV 2.3.1 now comes with a programming interface to C, C++, `Python <http://www.python.org>`_ and `Android <http://www.android.com>`_. OpenCV is released under a BSD license so it is used in academic projects and commercial products alike. |
||||
|
||||
OpenCV 2.4 now comes with the very new :ocv:class:`FaceRecognizer` class for face recognition, so you can start experimenting with face recognition right away. This document is the guide I've wished for when I was working myself into face recognition. It shows you how to perform face recognition with :ocv:class:`FaceRecognizer` in OpenCV (with full source code listings) and gives you an introduction into the algorithms behind it. I'll also show how to create the visualizations you can find in many publications, because a lot of people asked for them.
||||
|
||||
The currently available algorithms are: |
||||
|
||||
* Eigenfaces (see :ocv:func:`createEigenFaceRecognizer`) |
||||
* Fisherfaces (see :ocv:func:`createFisherFaceRecognizer`) |
||||
* Local Binary Patterns Histograms (see :ocv:func:`createLBPHFaceRecognizer`) |
||||
|
||||
You don't need to copy and paste the source code examples from this page, because they are available in the ``src`` folder coming with this documentation. If you have built OpenCV with the samples turned on, chances are good you have them compiled already! Although it might be interesting for very advanced users, I've decided to leave the implementation details out as I am afraid they confuse new users. |
||||
|
||||
All code in this document is released under the `BSD license <http://www.opensource.org/licenses/bsd-license>`_, so feel free to use it for your projects. |
||||
|
||||
Face Recognition |
||||
================ |
||||
|
||||
Face recognition is an easy task for humans. Experiments in [Tu06]_ have shown, that even one to three day old babies are able to distinguish between known faces. So how hard could it be for a computer? It turns out we know little about human recognition to date. Are inner features (eyes, nose, mouth) or outer features (head shape, hairline) used for a successful face recognition? How do we analyze an image and how does the brain encode it? It was shown by `David Hubel <http://en.wikipedia.org/wiki/David_H._Hubel>`_ and `Torsten Wiesel <http://en.wikipedia.org/wiki/Torsten_Wiesel>`_, that our brain has specialized nerve cells responding to specific local features of a scene, such as lines, edges, angles or movement. Since we don't see the world as scattered pieces, our visual cortex must somehow combine the different sources of information into useful patterns. Automatic face recognition is all about extracting those meaningful features from an image, putting them into a useful representation and performing some kind of classification on them. |
||||
|
||||
Face recognition based on the geometric features of a face is probably the most intuitive approach to face recognition. One of the first automated face recognition systems was described in [Kanade73]_: marker points (position of eyes, ears, nose, ...) were used to build a feature vector (distance between the points, angle between them, ...). The recognition was performed by calculating the Euclidean distance between feature vectors of a probe and reference image. Such a method is robust against changes in illumination by its nature, but has a huge drawback: the accurate registration of the marker points is complicated, even with state of the art algorithms. Some of the latest work on geometric face recognition was carried out in [Bru92]_. A 22-dimensional feature vector was used and experiments on large datasets have shown that geometrical features alone may not carry enough information for face recognition.
||||
|
||||
The Eigenfaces method described in [TP91]_ took a holistic approach to face recognition: A facial image is a point from a high-dimensional image space and a lower-dimensional representation is found, where classification becomes easy. The lower-dimensional subspace is found with Principal Component Analysis, which identifies the axes with maximum variance. While this kind of transformation is optimal from a reconstruction standpoint, it doesn't take any class labels into account. Imagine a situation where the variance is generated from external sources, let it be light. The axes with maximum variance do not necessarily contain any discriminative information at all, hence a classification becomes impossible. So a class-specific projection with a Linear Discriminant Analysis was applied to face recognition in [BHK97]_. The basic idea is to minimize the variance within a class, while maximizing the variance between the classes at the same time. |
||||
|
||||
Recently various methods for local feature extraction emerged. To avoid the high dimensionality of the input data, only local regions of an image are described, and the extracted features are (hopefully) more robust against partial occlusion, illumination and small sample size. Algorithms used for local feature extraction are Gabor Wavelets ([Wiskott97]_), the Discrete Cosine Transform ([Messer06]_) and Local Binary Patterns ([AHP04]_). It's still an open research question what's the best way to preserve spatial information when applying local feature extraction, because spatial information is potentially useful information.
||||
|
||||
Face Database |
||||
============== |
||||
|
||||
Let's get some data to experiment with first. I don't want to do a toy example here. We are doing face recognition, so you'll need some face images! You can either create your own dataset or start with one of the available face databases, `http://face-rec.org/databases/ <http://face-rec.org/databases>`_ gives you an up-to-date overview. Three interesting databases are (parts of the description are quoted from `http://face-rec.org <http://face-rec.org>`_): |
||||
|
||||
* `AT&T Facedatabase <http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html>`_ The AT&T Facedatabase, sometimes also referred to as *ORL Database of Faces*, contains ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement). |
||||
|
||||
* `Yale Facedatabase A <http://vision.ucsd.edu/content/yale-face-database>`_, also known as Yalefaces. The AT&T Facedatabase is good for initial tests, but it's a fairly easy database. The Eigenfaces method already has a 97% recognition rate on it, so you won't see any great improvements with other algorithms. The Yale Facedatabase A (also known as Yalefaces) is a more appropriate dataset for initial experiments, because the recognition problem is harder. The database consists of 15 people (14 male, 1 female) each with 11 grayscale images sized :math:`320 \times 243` pixel. There are changes in the light conditions (center light, left light, right light), facial expressions (happy, normal, sad, sleepy, surprised, wink) and glasses (glasses, no-glasses). |
||||
|
||||
The original images are not cropped and aligned. Please look into the :ref:`appendixft` for a Python script, that does the job for you. |
||||
|
||||
* `Extended Yale Facedatabase B <http://vision.ucsd.edu/~leekc/ExtYaleDatabase/ExtYaleB.html>`_ The Extended Yale Facedatabase B contains 2414 images of 38 different people in its cropped version. The focus of this database is set on extracting features that are robust to illumination; the images have almost no variation in emotion/occlusion/... . I personally think that this dataset is too large for the experiments I perform in this document. You'd better use the `AT&T Facedatabase <http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html>`_ for initial testing. A first version of the Yale Facedatabase B was used in [BHK97]_ to see how the Eigenfaces and Fisherfaces method perform under heavy illumination changes. [Lee05]_ used the same setup to take 16128 images of 28 people. The Extended Yale Facedatabase B is the merge of the two databases, which is now known as the Extended Yale Facedatabase B.
||||
|
||||
Preparing the data |
||||
------------------- |
||||
|
||||
Once we have acquired some data, we'll need to read it in our program. In the demo applications I have decided to read the images from a very simple CSV file. Why? Because it's the simplest platform-independent approach I can think of. However, if you know a simpler solution please ping me about it. Basically all the CSV file needs to contain are lines composed of a ``filename`` followed by a ``;`` followed by the ``label`` (as *integer number*), making up a line like this: |
||||
|
||||
.. code-block:: none |
||||
|
||||
/path/to/image.ext;0 |
||||
|
||||
Let's dissect the line. ``/path/to/image.ext`` is the path to an image, probably something like this if you are in Windows: ``C:/faces/person0/image0.jpg``. Then there is the separator ``;`` and finally we assign the label ``0`` to the image. Think of the label as the subject (the person) this image belongs to, so same subjects (persons) should have the same label. |
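
For illustration, reading such a file into the ``vector<Mat>`` / ``vector<int>`` containers used throughout this document could be done with a small helper along these lines (the demos ship their own ``read_csv`` function, this is only a sketch and assumes ``using namespace cv`` and ``using namespace std``):

.. code-block:: cpp

    #include <fstream>
    #include <sstream>
    #include <cstdlib>

    void read_csv(const string& filename, vector<Mat>& images, vector<int>& labels, char separator = ';') {
        std::ifstream file(filename.c_str(), std::ifstream::in);
        string line, path, classlabel;
        while (getline(file, line)) {
            std::stringstream liness(line);
            getline(liness, path, separator);
            getline(liness, classlabel);
            if(!path.empty() && !classlabel.empty()) {
                images.push_back(imread(path, CV_LOAD_IMAGE_GRAYSCALE));
                labels.push_back(atoi(classlabel.c_str()));
            }
        }
    }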
||||
|
||||
Download the AT&T Facedatabase and the corresponding CSV file ``at.txt``, which looks like this (the real file does not contain the ``...`` of course):
||||
|
||||
.. code-block:: none |
||||
|
||||
./at/s1/1.pgm;0 |
||||
./at/s1/2.pgm;0 |
||||
... |
||||
./at/s2/1.pgm;1 |
||||
./at/s2/2.pgm;1 |
||||
... |
||||
./at/s40/1.pgm;39 |
||||
./at/s40/2.pgm;39 |
||||
|
||||
Imagine I have extracted the files to ``D:/data/at`` and have downloaded the CSV file to ``D:/data/at.txt``. Then you would simply need to Search & Replace ``./`` with ``D:/data/``. You can do that in an editor of your choice, every sufficiently advanced editor can do this. Once you have a CSV file with valid filenames and labels, you can run any of the demos by passing the path to the CSV file as parameter: |
||||
|
||||
.. code-block:: none |
||||
|
||||
facerec_demo.exe D:/data/at.txt |
||||
|
||||
Creating the CSV File |
||||
+++++++++++++++++++++ |
||||
|
||||
You don't really want to create the CSV file by hand. I have prepared a little Python script ``create_csv.py`` for you (you find it at ``src/create_csv.py`` coming with this tutorial) that automatically creates a CSV file. If you have your images in a hierarchy like this (``/basepath/<subject>/<image.ext>``):
||||
|
||||
.. code-block:: none |
||||
|
||||
philipp@mango:~/facerec/data/at$ tree |
||||
. |
||||
|-- s1 |
||||
| |-- 1.pgm |
||||
| |-- ... |
||||
| |-- 10.pgm |
||||
|-- s2 |
||||
| |-- 1.pgm |
||||
| |-- ... |
||||
| |-- 10.pgm |
||||
... |
||||
|-- s40 |
||||
| |-- 1.pgm |
||||
| |-- ... |
||||
| |-- 10.pgm |
||||
|
||||
|
||||
Then simply call ``create_csv.py`` with the path to the folder, just like this, and you can save the output:
||||
|
||||
.. code-block:: none |
||||
|
||||
philipp@mango:~/facerec/data$ python create_csv.py |
||||
at/s13/2.pgm;0 |
||||
at/s13/7.pgm;0 |
||||
at/s13/6.pgm;0 |
||||
at/s13/9.pgm;0 |
||||
at/s13/5.pgm;0 |
||||
at/s13/3.pgm;0 |
||||
at/s13/4.pgm;0 |
||||
at/s13/10.pgm;0 |
||||
at/s13/8.pgm;0 |
||||
at/s13/1.pgm;0 |
||||
at/s17/2.pgm;1 |
||||
at/s17/7.pgm;1 |
||||
at/s17/6.pgm;1 |
||||
at/s17/9.pgm;1 |
||||
at/s17/5.pgm;1 |
||||
at/s17/3.pgm;1 |
||||
[...] |
||||
|
||||
Please see the :ref:`appendixft` for additional information.
||||
|
||||
Eigenfaces |
||||
========== |
||||
|
||||
The problem with the image representation we are given is its high dimensionality. Two-dimensional :math:`p \times q` grayscale images span a :math:`m = pq`-dimensional vector space, so an image with :math:`100 \times 100` pixels lies in a :math:`10,000`-dimensional image space already. The question is: Are all dimensions equally useful for us? We can only make a decision if there's any variance in data, so what we are looking for are the components that account for most of the information. The Principal Component Analysis (PCA) was independently proposed by `Karl Pearson <http://en.wikipedia.org/wiki/Karl_Pearson>`_ (1901) and `Harold Hotelling <http://en.wikipedia.org/wiki/Harold_Hotelling>`_ (1933) to turn a set of possibly correlated variables into a smaller set of uncorrelated variables. The idea is, that a high-dimensional dataset is often described by correlated variables and therefore only a few meaningful dimensions account for most of the information. The PCA method finds the directions with the greatest variance in the data, called principal components. |
||||
|
||||
Algorithmic Description |
||||
----------------------- |
||||
|
||||
Let :math:`X = \{ x_{1}, x_{2}, \ldots, x_{n} \}` be a random vector with observations :math:`x_i \in R^{d}`. |
||||
|
||||
1. Compute the mean :math:`\mu` |
||||
|
||||
.. math:: |
||||
|
||||
\mu = \frac{1}{n} \sum_{i=1}^{n} x_{i} |
||||
|
||||
2. Compute the Covariance Matrix :math:`S`
||||
|
||||
.. math:: |
||||
|
||||
        S = \frac{1}{n} \sum_{i=1}^{n} (x_{i} - \mu) (x_{i} - \mu)^{T}
||||
|
||||
3. Compute the eigenvalues :math:`\lambda_{i}` and eigenvectors :math:`v_{i}` of :math:`S` |
||||
|
||||
.. math:: |
||||
|
||||
S v_{i} = \lambda_{i} v_{i}, i=1,2,\ldots,n |
||||
|
||||
4. Order the eigenvectors descending by their eigenvalue. The :math:`k` principal components are the eigenvectors corresponding to the :math:`k` largest eigenvalues. |
||||
|
||||
The :math:`k` principal components of the observed vector :math:`x` are then given by: |
||||
|
||||
.. math:: |
||||
|
||||
y = W^{T} (x - \mu) |
||||
|
||||
|
||||
where :math:`W = (v_{1}, v_{2}, \ldots, v_{k})`. |
||||
|
||||
The reconstruction from the PCA basis is given by: |
||||
|
||||
.. math:: |
||||
|
||||
x = W y + \mu |
||||
|
||||
where :math:`W = (v_{1}, v_{2}, \ldots, v_{k})`. |
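
As a quick aside, the projection and reconstruction above map directly onto OpenCV's :ocv:class:`PCA` class. A minimal sketch, assuming ``data`` is a ``CV_64F`` matrix with one flattened training image per row:

.. code-block:: cpp

    // Keep the first k principal components of 'data' (one image per row):
    int k = 10;
    PCA pca(data, Mat(), CV_PCA_DATA_AS_ROW, k);
    // y = W^T (x - mu):
    Mat y = pca.project(data.row(0));
    // x = W y + mu:
    Mat x_hat = pca.backProject(y);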
||||
|
||||
|
||||
The Eigenfaces method then performs face recognition by: |
||||
|
||||
* Projecting all training samples into the PCA subspace. |
||||
* Projecting the query image into the PCA subspace. |
||||
* Finding the nearest neighbor between the projected training images and the projected query image. |
||||
|
||||
Still there's one problem left to solve. Imagine we are given :math:`400` images sized :math:`100 \times 100` pixel. The Principal Component Analysis solves the covariance matrix :math:`S = X X^{T}`, where :math:`{size}(X) = 10000 \times 400` in our example. You would end up with a :math:`10000 \times 10000` matrix, roughly :math:`0.8 GB`. Solving this problem isn't feasible, so we'll need to apply a trick. From your linear algebra lessons you know that a :math:`M \times N` matrix with :math:`M > N` can only have :math:`N - 1` non-zero eigenvalues. So it's possible to take the eigenvalue decomposition :math:`S = X^{T} X` of size :math:`N \times N` instead: |
||||
|
||||
.. math:: |
||||
|
||||
        X^{T} X v_{i} = \lambda_{i} v_{i}
||||
|
||||
|
||||
and get the original eigenvectors of :math:`S = X X^{T}` with a left multiplication of the data matrix: |
||||
|
||||
.. math:: |
||||
|
||||
X X^{T} (X v_{i}) = \lambda_{i} (X v_{i}) |
||||
|
||||
The resulting eigenvectors are orthogonal; to get orthonormal eigenvectors they need to be normalized to unit length. I don't want to turn this into a publication, so please look into [Duda01]_ for the derivation and proof of the equations.
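
To make the trick concrete, here is a rough sketch (not taken from the OpenCV sources), assuming ``X`` is a ``CV_64F`` matrix whose columns are the mean-subtracted training images:

.. code-block:: cpp

    // X is d x N (one mean-subtracted image per column), with N much smaller than d.
    Mat cov = X.t() * X;                 // the small N x N eigenvalue problem
    Mat evalues, evecs;
    eigen(cov, evalues, evecs);          // rows of 'evecs' are eigenvectors of X^T X
    Mat V = X * evecs.t();               // columns are eigenvectors of X X^T
    for (int i = 0; i < V.cols; i++)     // normalize them to unit length
        normalize(V.col(i), V.col(i));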
||||
|
||||
Eigenfaces in OpenCV |
||||
-------------------- |
||||
|
||||
For the first source code example, I'll go through it with you. I am first giving you the whole source code listing, and after this we'll look at the most important lines in detail. Please note: every source code listing is commented in detail, so you should have no problems following it. |
||||
|
||||
.. literalinclude:: src/facerec_eigenfaces.cpp |
||||
:language: cpp |
||||
:linenos: |
||||
|
||||
The source code for this demo application is also available in the ``src`` folder coming with this documentation: |
||||
|
||||
* :download:`src/facerec_eigenfaces.cpp <src/facerec_eigenfaces.cpp>` |
||||
|
||||
|
||||
I've used the jet colormap, so you can see how the grayscale values are distributed within the specific Eigenfaces. You can see, that the Eigenfaces do not only encode facial features, but also the illumination in the images (see the left light in Eigenface \#4, right light in Eigenfaces \#5): |
||||
|
||||
.. image:: img/eigenfaces_opencv.png |
||||
:align: center |
||||
|
||||
We've already seen, that we can reconstruct a face from its lower dimensional approximation. So let's see how many Eigenfaces are needed for a good reconstruction. I'll do a subplot with :math:`10,30,\ldots,310` Eigenfaces: |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
// Display or save the image reconstruction at some predefined steps: |
||||
for(int num_components = 10; num_components < 300; num_components+=15) { |
||||
// slice the eigenvectors from the model |
||||
Mat evs = Mat(W, Range::all(), Range(0, num_components)); |
||||
Mat projection = subspaceProject(evs, mean, images[0].reshape(1,1)); |
||||
Mat reconstruction = subspaceReconstruct(evs, mean, projection); |
||||
// Normalize the result: |
||||
reconstruction = norm_0_255(reconstruction.reshape(1, images[0].rows)); |
||||
// Display or save: |
||||
if(argc == 2) { |
||||
imshow(format("eigenface_reconstruction_%d", num_components), reconstruction); |
||||
} else { |
||||
imwrite(format("%s/eigenface_reconstruction_%d.png", output_folder.c_str(), num_components), reconstruction); |
||||
} |
||||
} |
||||
|
||||
10 Eigenvectors are obviously not sufficient for a good image reconstruction, while 50 Eigenvectors may already be sufficient to encode important facial features. You'll get a good reconstruction with approximately 300 Eigenvectors for the AT&T Facedatabase. There are rules of thumb for how many Eigenfaces you should choose for successful face recognition, but it heavily depends on the input data. [Zhao03]_ is the perfect point to start researching this:
||||
|
||||
.. image:: img/eigenface_reconstruction_opencv.png |
||||
:align: center |
||||
|
||||
|
||||
Fisherfaces |
||||
============ |
||||
|
||||
The Principal Component Analysis (PCA), which is the core of the Eigenfaces method, finds a linear combination of features that maximizes the total variance in data. While this is clearly a powerful way to represent data, it doesn't consider any classes and so a lot of discriminative information *may* be lost when throwing components away. Imagine a situation where the variance in your data is generated by an external source, let it be the light. The components identified by a PCA do not necessarily contain any discriminative information at all, so the projected samples are smeared together and a classification becomes impossible (see `http://www.bytefish.de/wiki/pca_lda_with_gnu_octave <http://www.bytefish.de/wiki/pca_lda_with_gnu_octave>`_ for an example). |
||||
|
||||
The Linear Discriminant Analysis performs a class-specific dimensionality reduction and was invented by the great statistician `Sir R. A. Fisher <http://en.wikipedia.org/wiki/Ronald_Fisher>`_. He successfully used it for classifying flowers in his 1936 paper *The use of multiple measurements in taxonomic problems* [Fisher36]_. In order to find the combination of features that separates best between classes the Linear Discriminant Analysis maximizes the ratio of between-classes to within-classes scatter, instead of maximizing the overall scatter. The idea is simple: same classes should cluster tightly together, while different classes are as far away as possible from each other in the lower-dimensional representation. This was also recognized by `Belhumeur <http://www.cs.columbia.edu/~belhumeur/>`_, `Hespanha <http://www.ece.ucsb.edu/~hespanha/>`_ and `Kriegman <http://cseweb.ucsd.edu/~kriegman/>`_ and so they applied a Discriminant Analysis to face recognition in [BHK97]_. |
||||
|
||||
Algorithmic Description |
||||
----------------------- |
||||
|
||||
Let :math:`X` be a random vector with samples drawn from :math:`c` classes: |
||||
|
||||
|
||||
.. math:: |
||||
:nowrap: |
||||
|
||||
\begin{align*} |
||||
X & = & \{X_1,X_2,\ldots,X_c\} \\ |
||||
X_i & = & \{x_1, x_2, \ldots, x_n\} |
||||
\end{align*} |
||||
|
||||
|
||||
The scatter matrices :math:`S_{B}` and :math:`S_{W}` are calculated as:
||||
|
||||
.. math:: |
||||
:nowrap: |
||||
|
||||
\begin{align*} |
||||
S_{B} & = & \sum_{i=1}^{c} N_{i} (\mu_i - \mu)(\mu_i - \mu)^{T} \\ |
||||
S_{W} & = & \sum_{i=1}^{c} \sum_{x_{j} \in X_{i}} (x_j - \mu_i)(x_j - \mu_i)^{T} |
||||
\end{align*} |
||||
|
||||
, where :math:`\mu` is the total mean: |
||||
|
||||
.. math:: |
||||
|
||||
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i |
||||
|
||||
And :math:`\mu_i` is the mean of class :math:`i \in \{1,\ldots,c\}`: |
||||
|
||||
.. math:: |
||||
|
||||
\mu_i = \frac{1}{|X_i|} \sum_{x_j \in X_i} x_j |
||||
|
||||
Fisher's classic algorithm now looks for a projection :math:`W`, that maximizes the class separability criterion: |
||||
|
||||
.. math:: |
||||
|
||||
W_{opt} = \operatorname{arg\,max}_{W} \frac{|W^T S_B W|}{|W^T S_W W|} |
||||
|
||||
|
||||
Following [BHK97]_, a solution for this optimization problem is given by solving the General Eigenvalue Problem: |
||||
|
||||
.. math:: |
||||
:nowrap: |
||||
|
||||
\begin{align*} |
||||
S_{B} v_{i} & = & \lambda_{i} S_w v_{i} \nonumber \\ |
||||
S_{W}^{-1} S_{B} v_{i} & = & \lambda_{i} v_{i} |
||||
\end{align*} |
||||
|
||||
There's one problem left to solve: The rank of :math:`S_{W}` is at most :math:`(N-c)`, with :math:`N` samples and :math:`c` classes. In pattern recognition problems the number of samples :math:`N` is almost always smaller than the dimension of the input data (the number of pixels), so the scatter matrix :math:`S_{W}` becomes singular (see [RJ91]_). In [BHK97]_ this was solved by performing a Principal Component Analysis on the data and projecting the samples into the :math:`(N-c)`-dimensional space. A Linear Discriminant Analysis was then performed on the reduced data, because :math:`S_{W}` isn't singular anymore.
||||
|
||||
The optimization problem can then be rewritten as: |
||||
|
||||
.. math:: |
||||
:nowrap: |
||||
|
||||
\begin{align*} |
||||
W_{pca} & = & \operatorname{arg\,max}_{W} |W^T S_T W| \\ |
||||
W_{fld} & = & \operatorname{arg\,max}_{W} \frac{|W^T W_{pca}^T S_{B} W_{pca} W|}{|W^T W_{pca}^T S_{W} W_{pca} W|} |
||||
\end{align*} |
||||
|
||||
The transformation matrix :math:`W`, that projects a sample into the :math:`(c-1)`-dimensional space is then given by: |
||||
|
||||
.. math:: |
||||
|
||||
W = W_{fld}^{T} W_{pca}^{T} |
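
Once a Fisherfaces model has been trained, the learned matrix :math:`W` and the sample mean can be read back through the :ocv:class:`Algorithm` interface and used with :ocv:func:`subspaceProject`, mirroring :math:`y = W^{T} (x - \mu)`. A small sketch, assuming ``model`` is a trained Fisherfaces model and ``images`` holds the training images (as in the listing below):

.. code-block:: cpp

    // Read the learned projection matrix and mean back from the trained model:
    Mat W = model->getMat("eigenvectors");
    Mat mean = model->getMat("mean");
    // Project the first training image into the Fisherfaces subspace:
    Mat projection = subspaceProject(W, mean, images[0].reshape(1, 1));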
||||
|
||||
Fisherfaces in OpenCV |
||||
--------------------- |
||||
|
||||
.. literalinclude:: src/facerec_fisherfaces.cpp |
||||
:language: cpp |
||||
:linenos: |
||||
|
||||
The source code for this demo application is also available in the ``src`` folder coming with this documentation: |
||||
|
||||
* :download:`src/facerec_fisherfaces.cpp <src/facerec_fisherfaces.cpp>` |
||||
|
||||
|
||||
For this example I am going to use the Yale Facedatabase A, just because the plots are nicer. Each Fisherface has the same length as an original image, thus it can be displayed as an image. The demo shows (or saves) the first, at most 16 Fisherfaces: |
||||
|
||||
.. image:: img/fisherfaces_opencv.png |
||||
:align: center |
||||
|
||||
The Fisherfaces method learns a class-specific transformation matrix, so they do not capture illumination as obviously as the Eigenfaces method. The Discriminant Analysis instead finds the facial features to discriminate between the persons. It's important to mention that the performance of the Fisherfaces heavily depends on the input data as well. Practically said: if you learn the Fisherfaces for well-illuminated pictures only and then try to recognize faces in badly illuminated scenes, the method is likely to find the wrong components (just because those features may not be predominant in badly illuminated images). This is somewhat logical, since the method had no chance to learn the illumination.
||||
|
||||
The Fisherfaces allow a reconstruction of the projected image, just like the Eigenfaces did. But since we only identified the features to distinguish between subjects, you can't expect a nice reconstruction of the original image. For the Fisherfaces method we'll project the sample image onto each of the Fisherfaces instead. So you'll have a nice visualization, which feature each of the Fisherfaces describes: |
||||
|
||||
.. code-block:: cpp |
||||
|
||||
// Display or save the image reconstruction at some predefined steps: |
||||
for(int num_component = 0; num_component < min(16, W.cols); num_component++) { |
||||
// Slice the Fisherface from the model: |
||||
Mat ev = W.col(num_component); |
||||
Mat projection = subspaceProject(ev, mean, images[0].reshape(1,1)); |
||||
Mat reconstruction = subspaceReconstruct(ev, mean, projection); |
||||
// Normalize the result: |
||||
reconstruction = norm_0_255(reconstruction.reshape(1, images[0].rows)); |
||||
// Display or save: |
||||
if(argc == 2) { |
||||
imshow(format("fisherface_reconstruction_%d", num_component), reconstruction); |
||||
} else { |
||||
imwrite(format("%s/fisherface_reconstruction_%d.png", output_folder.c_str(), num_component), reconstruction); |
||||
} |
||||
} |
||||
|
||||
The differences may be subtle to the human eye, but you should still be able to see some differences:
||||
|
||||
.. image:: img/fisherface_reconstruction_opencv.png |
||||
:align: center |
||||
|
||||
|
||||
Local Binary Patterns Histograms |
||||
================================ |
||||
|
||||
Eigenfaces and Fisherfaces take a somewhat holistic approach to face recognition. You treat your data as a vector somewhere in a high-dimensional image space. We all know high-dimensionality is bad, so a lower-dimensional subspace is identified, where (probably) useful information is preserved. The Eigenfaces approach maximizes the total scatter, which can lead to problems if the variance is generated by an external source, because components with a maximum variance over all classes aren't necessarily useful for classification (see `http://www.bytefish.de/wiki/pca_lda_with_gnu_octave <http://www.bytefish.de/wiki/pca_lda_with_gnu_octave>`_). So to preserve some discriminative information we applied a Linear Discriminant Analysis and optimized as described in the Fisherfaces method. The Fisherfaces method worked great... at least for the constrained scenario we've assumed in our model. |
||||
|
||||
Now real life isn't perfect. You simply can't guarantee perfect light settings in your images or 10 different images of a person. So what if there's only one image for each person? Our covariance estimates for the subspace *may* be horribly wrong, so will the recognition. Remember the Eigenfaces method had a 96% recognition rate on the AT&T Facedatabase? How many images do we actually need to get such useful estimates? Here are the Rank-1 recognition rates of the Eigenfaces and Fisherfaces method on the AT&T Facedatabase, which is a fairly easy image database: |
||||
|
||||
.. image:: img/at_database_small_sample_size.png |
||||
:scale: 60% |
||||
:align: center |
||||
|
||||
So in order to get good recognition rates you'll need at least 8(+-1) images for each person and the Fisherfaces method doesn't really help here. The above experiment is a 10-fold cross validated result carried out with the facerec framework at: `https://github.com/bytefish/facerec <https://github.com/bytefish/facerec>`_. This is not a publication, so I won't back these figures with a deep mathematical analysis. Please have a look into [KM01]_ for a detailed analysis of both methods, when it comes to small training datasets. |
||||
|
||||
So some research concentrated on extracting local features from images. The idea is to not look at the whole image as a high-dimensional vector, but describe only local features of an object. The features you extract this way will have a low-dimensionality implicitly. A fine idea! But you'll soon observe the image representation we are given doesn't only suffer from illumination variations. Think of things like scale, translation or rotation in images - your local description has to be at least a bit robust against those things. Just like ``SIFT``, the Local Binary Patterns methodology has its roots in 2D texture analysis. The basic idea of Local Binary Patterns is to summarize the local structure in an image by comparing each pixel with its neighborhood. Take a pixel as center and threshold its neighbors against. If the intensity of the center pixel is greater-equal its neighbor, then denote it with 1 and 0 if not. You'll end up with a binary number for each pixel, just like 11001111. So with 8 surrounding pixels you'll end up with 2^8 possible combinations, called *Local Binary Patterns* or sometimes referred to as *LBP codes*. The first LBP operator described in literature actually used a fixed 3 x 3 neighborhood just like this: |
||||
|
||||
.. image:: img/lbp/lbp.png |
||||
:scale: 80% |
||||
:align: center |
||||
|
||||
Algorithmic Description |
||||
----------------------- |
||||
|
||||
A more formal description of the LBP operator can be given as: |
||||
|
||||
.. math:: |
||||
|
||||
LBP(x_c, y_c) = \sum_{p=0}^{P-1} 2^p s(i_p - i_c) |
||||
|
||||
, with :math:`(x_c, y_c)` as the central pixel with intensity :math:`i_c`, and :math:`i_p` being the intensity of the neighbor pixel. :math:`s` is the sign function defined as:
||||
|
||||
.. math:: |
||||
:nowrap: |
||||
|
||||
\begin{equation} |
||||
s(x) = |
||||
\begin{cases} |
||||
1 & \text{if $x \geq 0$}\\ |
||||
0 & \text{else} |
||||
\end{cases} |
||||
\end{equation} |
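
A literal, unoptimized sketch of this original fixed 3 x 3 operator (not the OpenCV-internal implementation) could look like this:

.. code-block:: cpp

    // 'src' is expected to be a CV_8UC1 image; border pixels are skipped.
    Mat lbp3x3(const Mat& src) {
        Mat dst = Mat::zeros(src.rows - 2, src.cols - 2, CV_8UC1);
        for (int i = 1; i < src.rows - 1; i++) {
            for (int j = 1; j < src.cols - 1; j++) {
                uchar c = src.at<uchar>(i, j);
                uchar code = 0;
                code |= (src.at<uchar>(i - 1, j - 1) >= c) << 7;
                code |= (src.at<uchar>(i - 1, j    ) >= c) << 6;
                code |= (src.at<uchar>(i - 1, j + 1) >= c) << 5;
                code |= (src.at<uchar>(i,     j + 1) >= c) << 4;
                code |= (src.at<uchar>(i + 1, j + 1) >= c) << 3;
                code |= (src.at<uchar>(i + 1, j    ) >= c) << 2;
                code |= (src.at<uchar>(i + 1, j - 1) >= c) << 1;
                code |= (src.at<uchar>(i,     j - 1) >= c) << 0;
                dst.at<uchar>(i - 1, j - 1) = code;
            }
        }
        return dst;
    }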
||||
|
||||
This description enables you to capture very fine grained details in images. In fact the authors were able to compete with state of the art results for texture classification. Soon after the operator was published it was noted that a fixed neighborhood fails to encode details differing in scale. So the operator was extended to use a variable neighborhood in [AHP04]_. The idea is to align an arbitrary number of neighbors on a circle with a variable radius, which makes it possible to capture the following neighborhoods:
||||
|
||||
.. image:: img/lbp/patterns.png |
||||
:scale: 80% |
||||
:align: center |
||||
|
||||
For a given point :math:`(x_c,y_c)`, the position of the neighbor :math:`(x_p,y_p)`, :math:`p \in \{0, \dots, P-1\}`, can be calculated by:

.. math::
   :nowrap:

   \begin{align*}
   x_{p} & = & x_c + R \cos({\frac{2\pi p}{P}})\\
   y_{p} & = & y_c - R \sin({\frac{2\pi p}{P}})
   \end{align*}

where :math:`R` is the radius of the circle and :math:`P` is the number of sample points.

The operator is an extension to the original LBP codes, so it's sometimes called *Extended LBP* (also referred to as *Circular LBP*). If a point's coordinate on the circle doesn't correspond to image coordinates, the point gets interpolated. Computer science has a bunch of clever interpolation schemes; the OpenCV implementation does a bilinear interpolation:

.. math::
   :nowrap:

   \begin{align*}
   f(x,y) \approx \begin{bmatrix}
   1-x & x \end{bmatrix} \begin{bmatrix}
   f(0,0) & f(0,1) \\
   f(1,0) & f(1,1) \end{bmatrix} \begin{bmatrix}
   1-y \\
   y \end{bmatrix}.
   \end{align*}

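A rough sketch of this sampling step, again in plain Python/numpy rather than OpenCV's actual implementation (``img`` is a hypothetical grayscale array and the center pixel is assumed to lie far enough from the image border):

.. code-block:: python

   import math
   import numpy as np

   def sample_circular(img, yc, xc, P=8, R=1.0):
       """Sample P points on a circle of radius R around (yc, xc), bilinearly interpolated."""
       img = np.asarray(img, dtype=np.float64)
       samples = []
       for p in range(P):
           x = xc + R * math.cos(2.0 * math.pi * p / P)
           y = yc - R * math.sin(2.0 * math.pi * p / P)
           x0, y0 = int(math.floor(x)), int(math.floor(y))
           tx, ty = x - x0, y - y0
           # bilinear interpolation between the four surrounding pixels
           value = (img[y0, x0]         * (1 - tx) * (1 - ty) +
                    img[y0, x0 + 1]     * tx       * (1 - ty) +
                    img[y0 + 1, x0]     * (1 - tx) * ty +
                    img[y0 + 1, x0 + 1] * tx       * ty)
           samples.append(value)
       return samples

Thresholding these interpolated samples against the center pixel, exactly as in the 3 x 3 case, then yields the Extended LBP code.
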
By definition the LBP operator is robust against monotonic grayscale transformations. We can easily verify this by looking at the LBP image of an artificially modified image (so you see what an LBP image looks like!):

.. image:: img/lbp/lbp_yale.jpg
   :scale: 60%
   :align: center

So what's left is to incorporate the spatial information into the face recognition model. The representation proposed by Ahonen et al. [AHP04]_ is to divide the LBP image into :math:`m` local regions and extract a histogram from each. The spatially enhanced feature vector is then obtained by concatenating the local histograms (**not merging them**). These histograms are called *Local Binary Patterns Histograms*.

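A minimal sketch of this spatial histogram step (plain numpy on a hypothetical LBP image ``lbp_img``; the 8 x 8 grid is only a common default, not a fixed requirement):

.. code-block:: python

   import numpy as np

   def spatial_histogram(lbp_img, grid_x=8, grid_y=8, num_bins=256):
       """Split the LBP image into grid_x * grid_y cells and concatenate their histograms."""
       lbp_img = np.asarray(lbp_img)
       cell_h = lbp_img.shape[0] // grid_y
       cell_w = lbp_img.shape[1] // grid_x
       hists = []
       for row in range(grid_y):
           for col in range(grid_x):
               cell = lbp_img[row * cell_h:(row + 1) * cell_h,
                              col * cell_w:(col + 1) * cell_w]
               hist, _ = np.histogram(cell, bins=num_bins, range=(0, num_bins))
               hists.append(hist)
       # concatenating (not merging) the histograms keeps the spatial layout of the face
       return np.concatenate(hists)

A simple nearest-neighbor search on these concatenated histograms is then already enough to make predictions.
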
Local Binary Patterns Histograms in OpenCV
------------------------------------------

.. literalinclude:: src/facerec_lbph.cpp
   :language: cpp
   :linenos:

The source code for this demo application is also available in the ``src`` folder coming with this documentation:

* :download:`src/facerec_lbph.cpp <src/facerec_lbph.cpp>`

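If you prefer the Python bindings, the same model is exposed there as well. The following is only a sketch, assuming an OpenCV 2.4 build with the contrib module; the arrays are random stand-ins for real face images read from your CSV file:

.. code-block:: python

   import cv2
   import numpy as np

   # stand-ins for real, equally sized grayscale face images and their integer labels
   images = [np.random.randint(0, 256, (112, 92)).astype(np.uint8) for _ in range(4)]
   labels = [0, 0, 1, 1]

   # default parameters: radius=1, neighbors=8, grid_x=8, grid_y=8
   model = cv2.createLBPHFaceRecognizer()
   model.train(np.asarray(images), np.asarray(labels))

   # predict returns the best matching label and the associated distance
   predicted_label, confidence = model.predict(images[0])
   print("Predicted class = %d (distance %.2f)" % (predicted_label, confidence))
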
Conclusion
==========

You've learned how to use the new :ocv:class:`FaceRecognizer` in real applications. After reading the document you also know how the algorithms work, so now it's time for you to experiment with the available algorithms. Use them, improve them and let the OpenCV community participate!

Credits
=======

This document wouldn't be possible without the kind permission to use the face images of the *AT&T Database of Faces* and the *Yale Facedatabase A/B*.

The Database of Faces
---------------------

**Important: when using these images, please give credit to "AT&T Laboratories, Cambridge."**

The Database of Faces, formerly *The ORL Database of Faces*, contains a set of face images taken between April 1992 and April 1994. The database was used in the context of a face recognition project carried out in collaboration with the Speech, Vision and Robotics Group of the Cambridge University Engineering Department.

There are ten different images of each of 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement).

The files are in PGM format. The size of each image is 92x112 pixels, with 256 grey levels per pixel. The images are organised in 40 directories (one for each subject), which have names of the form sX, where X indicates the subject number (between 1 and 40). In each of these directories, there are ten different images of that subject, which have names of the form Y.pgm, where Y is the image number for that subject (between 1 and 10).

A copy of the database can be retrieved from: `http://www.cl.cam.ac.uk/research/dtg/attarchive/pub/data/att_faces.zip <http://www.cl.cam.ac.uk/research/dtg/attarchive/pub/data/att_faces.zip>`_.

Yale Facedatabase A
-------------------

*With the permission of the authors I am allowed to show a small number of images (say subject 1 and all the variations) and all images such as Fisherfaces and Eigenfaces from either Yale Facedatabase A or the Yale Facedatabase B.*

The Yale Face Database A (size 6.4MB) contains 165 grayscale images in GIF format of 15 individuals. There are 11 images per subject, one per different facial expression or configuration: center-light, w/glasses, happy, left-light, w/no glasses, normal, right-light, sad, sleepy, surprised, and wink. (Source: `http://cvc.yale.edu/projects/yalefaces/yalefaces.html <http://cvc.yale.edu/projects/yalefaces/yalefaces.html>`_)

Yale Facedatabase B
-------------------

*With the permission of the authors I am allowed to show a small number of images (say subject 1 and all the variations) and all images such as Fisherfaces and Eigenfaces from either Yale Facedatabase A or the Yale Facedatabase B.*

The extended Yale Face Database B contains 16128 images of 28 human subjects under 9 poses and 64 illumination conditions. The data format of this database is the same as the Yale Face Database B. Please refer to the homepage of the Yale Face Database B (or one copy of this page) for more detailed information on the data format.

You are free to use the extended Yale Face Database B for research purposes. All publications which use this database should acknowledge the use of "the Extended Yale Face Database B" and reference Athinodoros Georghiades, Peter Belhumeur, and David Kriegman's paper, "From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose", PAMI, 2001, `[bibtex] <http://vision.ucsd.edu/~leekc/ExtYaleDatabase/athosref.html>`_.

The extended database, as opposed to the original Yale Face Database B with 10 subjects, was first reported by Kuang-Chih Lee, Jeffrey Ho, and David Kriegman in "Acquiring Linear Subspaces for Face Recognition under Variable Lighting", PAMI, May 2005 `[pdf] <http://vision.ucsd.edu/~leekc/papers/9pltsIEEE.pdf>`_. All test image data used in the experiments are manually aligned, cropped, and then re-sized to 168x192 images. If you publish your experimental results with the cropped images, please reference the PAMI 2005 paper as well. (Source: `http://vision.ucsd.edu/~leekc/ExtYaleDatabase/ExtYaleB.html <http://vision.ucsd.edu/~leekc/ExtYaleDatabase/ExtYaleB.html>`_)

Literature
==========

.. [AHP04] Ahonen, T., Hadid, A., and Pietikainen, M. *Face Recognition with Local Binary Patterns.* Computer Vision - ECCV 2004 (2004), 469–481.

.. [BHK97] Belhumeur, P. N., Hespanha, J., and Kriegman, D. *Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection.* IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 7 (1997), 711–720.

.. [Bru92] Brunelli, R., Poggio, T. *Face Recognition through Geometrical Features.* European Conference on Computer Vision (ECCV) 1992, pp. 792–800.

.. [Duda01] Duda, Richard O., Hart, Peter E., and Stork, David G. *Pattern Classification* (2nd Edition), 2001.

.. [Fisher36] Fisher, R. A. *The use of multiple measurements in taxonomic problems.* Annals Eugen. 7 (1936), 179–188.

.. [GBK01] Georghiades, A.S., Belhumeur, P.N., and Kriegman, D.J. *From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose.* IEEE Transactions on Pattern Analysis and Machine Intelligence 23, 6 (2001), 643–660.

.. [Kanade73] Kanade, T. *Picture processing system by computer complex and recognition of human faces.* PhD thesis, Kyoto University, November 1973.

.. [KM01] Martinez, A. and Kak, A. *PCA versus LDA.* IEEE Transactions on Pattern Analysis and Machine Intelligence 23, 2 (2001), 228–233.

.. [Lee05] Lee, K., Ho, J., Kriegman, D. *Acquiring Linear Subspaces for Face Recognition under Variable Lighting.* IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 5 (2005).

.. [Messer06] Messer, K. et al. *Performance Characterisation of Face Recognition Algorithms and Their Sensitivity to Severe Illumination Changes.* ICB, 2006, pp. 1–11.

.. [RJ91] Raudys, S. and Jain, A.K. *Small sample size effects in statistical pattern recognition: Recommendations for practitioners.* IEEE Transactions on Pattern Analysis and Machine Intelligence 13, 3 (1991), 252–264.

.. [Tan10] Tan, X., and Triggs, B. *Enhanced local texture feature sets for face recognition under difficult lighting conditions.* IEEE Transactions on Image Processing 19 (2010), 1635–1650.

.. [TP91] Turk, M., and Pentland, A. *Eigenfaces for recognition.* Journal of Cognitive Neuroscience 3 (1991), 71–86.

.. [Tu06] Turati, C., Macchi Cassia, V., Simion, F., and Leo, I. *Newborns' face recognition: Role of inner and outer facial features.* Child Development 77, 2 (2006), 297–311.

.. [Wiskott97] Wiskott, L., Fellous, J., Krüger, N., Malsburg, C. *Face Recognition By Elastic Bunch Graph Matching.* IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (1997), pp. 775–779.

.. [Zhao03] Zhao, W., Chellappa, R., Phillips, P., and Rosenfeld, A. *Face recognition: A literature survey.* ACM Computing Surveys (CSUR) 35, 4 (2003), 399–458.

.. _appendixft:

Appendix
========

Creating the CSV File
---------------------

You don't really want to create the CSV file by hand. I have prepared a little Python script ``create_csv.py`` for you (you find it at ``/src/create_csv.py`` coming with this tutorial) that automatically creates a CSV file. If you have your images in a hierarchy like this (``/basepath/<subject>/<image.ext>``):

.. code-block:: none

   philipp@mango:~/facerec/data/at$ tree
   .
   |-- s1
   |   |-- 1.pgm
   |   |-- ...
   |   |-- 10.pgm
   |-- s2
   |   |-- 1.pgm
   |   |-- ...
   |   |-- 10.pgm
   ...
   |-- s40
   |   |-- 1.pgm
   |   |-- ...
   |   |-- 10.pgm

Then simply call ``create_csv.py`` with the path to the folder, just like this, and you can save the output to a file:

.. code-block:: none

   philipp@mango:~/facerec/data$ python create_csv.py
   at/s13/2.pgm;0
   at/s13/7.pgm;0
   at/s13/6.pgm;0
   at/s13/9.pgm;0
   at/s13/5.pgm;0
   at/s13/3.pgm;0
   at/s13/4.pgm;0
   at/s13/10.pgm;0
   at/s13/8.pgm;0
   at/s13/1.pgm;0
   at/s17/2.pgm;1
   at/s17/7.pgm;1
   at/s17/6.pgm;1
   at/s17/9.pgm;1
   at/s17/5.pgm;1
   at/s17/3.pgm;1
   [...]

Here is the script, if you can't find it:

.. literalinclude:: ./src/create_csv.py
   :language: python
   :linenos:

Aligning Face Images
--------------------

An accurate alignment of your image data is especially important in tasks like emotion detection, where you need as much detail as possible. Believe me... You don't want to do this by hand. So I've prepared a tiny Python script for you. The code is really easy to use. To scale, rotate and crop the face image you just need to call *CropFace(image, eye_left, eye_right, offset_pct, dest_sz)*, where:

* *eye_left* is the position of the left eye
* *eye_right* is the position of the right eye
* *offset_pct* is the percent of the image you want to keep next to the eyes (horizontal, vertical direction)
* *dest_sz* is the size of the output image

If you are using the same *offset_pct* and *dest_sz* for your images, they are all aligned at the eyes.

.. literalinclude:: ./src/crop_face.py
   :language: python
   :linenos:

Imagine we are given `this photo of Arnold Schwarzenegger <http://en.wikipedia.org/wiki/File:Arnold_Schwarzenegger_edit%28ws%29.jpg>`_, which is under a Public Domain license. The (x,y)-position of the eyes is approximately *(252,364)* for the left and *(420,366)* for the right eye. Now you only need to define the horizontal offset, vertical offset and the size your scaled, rotated & cropped face should have.

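For instance, a hypothetical call (assuming the photo has been saved as ``arnie.jpg`` and that the ``CropFace`` function from the script above is importable, or that this snippet is simply appended to the script) could look like this:

.. code-block:: python

   from PIL import Image
   from crop_face import CropFace  # the helper shown above

   # eye positions measured in the original photo
   eye_left, eye_right = (252, 364), (420, 366)

   image = Image.open("arnie.jpg")
   # keep 20% of the image next to the eyes and scale the result to 200x200 pixels
   aligned = CropFace(image, eye_left=eye_left, eye_right=eye_right,
                      offset_pct=(0.2, 0.2), dest_sz=(200, 200))
   aligned.save("arnie_20_20_200_200.jpg")
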
Here are some examples:

+---------------------------------+----------------------------------------------------------------------------+ |
| Configuration | Cropped, Scaled, Rotated Face | |
+=================================+============================================================================+ |
| 0.1 (10%), 0.1 (10%), (200,200) | .. image:: ./img/tutorial/gender_classification/arnie_10_10_200_200.jpg | |
+---------------------------------+----------------------------------------------------------------------------+ |
| 0.2 (20%), 0.2 (20%), (200,200) | .. image:: ./img/tutorial/gender_classification/arnie_20_20_200_200.jpg | |
+---------------------------------+----------------------------------------------------------------------------+ |
| 0.3 (30%), 0.3 (30%), (200,200) | .. image:: ./img/tutorial/gender_classification/arnie_30_30_200_200.jpg | |
+---------------------------------+----------------------------------------------------------------------------+ |
| 0.2 (20%), 0.2 (20%), (70,70) | .. image:: ./img/tutorial/gender_classification/arnie_20_20_70_70.jpg | |
+---------------------------------+----------------------------------------------------------------------------+ |

CSV for the AT&T Facedatabase
-----------------------------

.. literalinclude:: etc/at.txt
   :language: none
   :linenos:

FaceRecognizer - Face Recognition with OpenCV
##############################################

OpenCV 2.4 now comes with the very new :ocv:class:`FaceRecognizer` class for face recognition. This documentation is going to explain :doc:`the API <facerec_api>` to you in detail and it will give you a lot of help to get started (full source code examples). :doc:`Face Recognition with OpenCV <facerec_tutorial>` is the definitive guide to the new :ocv:class:`FaceRecognizer`. There's also a :doc:`tutorial on gender classification <tutorial/facerec_gender_classification>`, a :doc:`tutorial for face recognition in videos <tutorial/facerec_video_recognition>` and it's shown :doc:`how to load & save your results <tutorial/facerec_save_load>`.

These documents are the help I wished for when I was working my way into face recognition. I hope you also think the new :ocv:class:`FaceRecognizer` is a useful addition to OpenCV.

Please issue any feature requests and/or bugs on the official OpenCV bug tracker at:

* http://code.opencv.org/projects/opencv/issues

Contents
========

.. toctree::
   :maxdepth: 1

   FaceRecognizer API <facerec_api>
   Guide to Face Recognition with OpenCV <facerec_tutorial>
   Tutorial on Gender Classification <tutorial/facerec_gender_classification>
   Tutorial on Face Recognition in Videos <tutorial/facerec_video_recognition>
   Tutorial on Saving & Loading a FaceRecognizer <tutorial/facerec_save_load>
   Changelog <facerec_changelog>

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
Gender Classification with OpenCV
=================================

.. contents:: Table of Contents
   :depth: 3

Introduction
------------

A lot of people interested in face recognition also want to know how to perform image classification tasks like:

* Gender Classification (Gender Detection)
* Emotion Classification (Emotion Detection)
* Glasses Classification (Glasses Detection)
* ...

This has become very, very easy with the new :ocv:class:`FaceRecognizer` class. In this tutorial I'll show you how to perform gender classification with OpenCV on a set of face images. You'll also learn how to align your images to enhance the recognition results. If you want to do emotion classification instead of gender classification, all you need to do is update your training data and the configuration you pass to the demo.

Prerequisites
-------------

For gender classification of faces, you'll need some images of male and female faces first. I've decided to search for faces of celebrities using `Google Images <http://www.google.com/images>`_ with the faces filter turned on (my god, they have great algorithms at `Google <http://www.google.com>`_!). My database has 8 male and 5 female subjects, each with 10 images. Here are the names, if you don't know who to search for:

* Angelina Jolie
* Arnold Schwarzenegger
* Brad Pitt
* Emma Watson
* George Clooney
* Jennifer Lopez
* Johnny Depp
* Justin Timberlake
* Katy Perry
* Keanu Reeves
* Naomi Watts
* Patrick Stewart
* Tom Cruise

Once you have acquired some images, you'll need to read them. In the demo application I have decided to read the images from a very simple CSV file. Why? Because it's the simplest platform-independent approach I can think of. However, if you know a simpler solution please ping me about it. Basically all the CSV file needs to contain are lines composed of a ``filename`` followed by a ``;`` followed by the ``label`` (as *integer number*), making up a line like this:

.. code-block:: none

   /path/to/image.ext;0

Let's dissect the line. ``/path/to/image.ext`` is the path to an image, probably something like this if you are in Windows: ``C:/faces/person0/image0.jpg``. Then there is the separator ``;`` and finally we assign a label ``0`` to the image. Think of the label as the subject (the person, the gender or whatever comes to your mind). In the gender classification scenario, the label is the gender the person has. I'll give the label ``0`` to *male* persons and the label ``1`` to *female* subjects. So my CSV file looks like this:

.. code-block:: none

   /home/philipp/facerec/data/gender/male/keanu_reeves/keanu_reeves_01.jpg;0
   /home/philipp/facerec/data/gender/male/keanu_reeves/keanu_reeves_02.jpg;0
   /home/philipp/facerec/data/gender/male/keanu_reeves/keanu_reeves_03.jpg;0
   ...
   /home/philipp/facerec/data/gender/female/katy_perry/katy_perry_01.jpg;1
   /home/philipp/facerec/data/gender/female/katy_perry/katy_perry_02.jpg;1
   /home/philipp/facerec/data/gender/female/katy_perry/katy_perry_03.jpg;1
   ...
   /home/philipp/facerec/data/gender/male/brad_pitt/brad_pitt_01.jpg;0
   /home/philipp/facerec/data/gender/male/brad_pitt/brad_pitt_02.jpg;0
   /home/philipp/facerec/data/gender/male/brad_pitt/brad_pitt_03.jpg;0
   ...
   /home/philipp/facerec/data/gender/female/emma_watson/emma_watson_08.jpg;1
   /home/philipp/facerec/data/gender/female/emma_watson/emma_watson_02.jpg;1
   /home/philipp/facerec/data/gender/female/emma_watson/emma_watson_03.jpg;1

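A small sketch of how such a file can be read back in (plain Python with the OpenCV bindings; the C++ demo below does the equivalent when it parses the CSV file):

.. code-block:: python

   import cv2

   def read_csv(filename, separator=";"):
       """Read a '<path>;<label>' CSV file into a list of grayscale images and labels."""
       images, labels = [], []
       with open(filename) as f:
           for line in f:
               line = line.strip()
               if not line:
                   continue
               path, label = line.split(separator)
               images.append(cv2.imread(path, 0))  # 0 = load as grayscale
               labels.append(int(label))
       return images, labels

   # example call with the kind of placeholder path used throughout this tutorial
   images, labels = read_csv("/path/to/your/csv.ext")
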
All images for this example were chosen to have a frontal face perspective. They have been cropped, scaled and rotated to be aligned at the eyes, just like this set of George Clooney images:

.. image:: ../img/tutorial/gender_classification/clooney_set.png
   :align: center

You really don't want to create the CSV file by hand. And you really don't want to scale, rotate & translate the images manually. I have prepared two Python scripts ``create_csv.py`` and ``crop_face.py`` for you, you can find them in the ``src`` folder coming with this documentation. You'll see how to use them in the :ref:`appendixfgc`.

Fisherfaces for Gender Classification
-------------------------------------

If you want to decide whether a person is *male* or *female*, you have to learn the discriminative features of both classes. The Eigenfaces method is based on Principal Component Analysis, which is an unsupervised statistical model and not suitable for this task. Please see the Face Recognition tutorial for insights into the algorithms. The Fisherfaces method instead yields a class-specific linear projection, so it is much better suited for the gender classification task. `http://www.bytefish.de/blog/gender_classification <http://www.bytefish.de/blog/gender_classification>`_ shows the recognition rate of the Fisherfaces method for gender classification.

The Fisherfaces method achieves a 98% recognition rate in a subject-independent cross-validation. A subject-independent cross-validation means *images of the person under test are never used for learning the model*. And could you believe it: you can simply use the ``facerec_fisherfaces`` demo that's included in OpenCV.

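As a rough sketch of what this boils down to in the Python bindings (assuming an OpenCV 2.4 build with the contrib module and aligned, equally sized images; the C++ demo below is the authoritative version):

.. code-block:: python

   import cv2
   import numpy as np

   # read the gender CSV described above: "<path>;<label>" with 0 = male, 1 = female
   images, labels = [], []
   for line in open("/path/to/your/gender.csv"):
       path, label = line.strip().split(";")
       images.append(cv2.imread(path, 0))  # 0 = load as grayscale
       labels.append(int(label))

   model = cv2.createFisherFaceRecognizer()
   model.train(np.asarray(images[:-1]), np.asarray(labels[:-1]))  # hold the last image back

   predicted, confidence = model.predict(images[-1])
   print("Predicted gender = %d / Actual gender = %d." % (predicted, labels[-1]))
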
Fisherfaces in OpenCV
---------------------

The source code for this demo application is also available in the ``src`` folder coming with this documentation:

* :download:`src/facerec_fisherfaces.cpp <../src/facerec_fisherfaces.cpp>`

.. literalinclude:: ../src/facerec_fisherfaces.cpp
   :language: cpp
   :linenos:

Running the Demo
----------------

If you are in Windows, then simply start the demo by running (from the command line):

.. code-block:: none

   facerec_fisherfaces.exe C:/path/to/your/csv.ext

If you are in Linux, then simply start the demo by running:

.. code-block:: none

   ./facerec_fisherfaces /path/to/your/csv.ext

If you don't want to display the images but save them instead, then pass the desired path to the demo. It works like this in Windows:

.. code-block:: none

   facerec_fisherfaces.exe C:/path/to/your/csv.ext C:/path/to/store/results/at

And in Linux:

.. code-block:: none

   ./facerec_fisherfaces /path/to/your/csv.ext /path/to/store/results/at

Results
-------

If you run the program with your CSV file as parameter, you'll see the Fisherface that separates between male and female images. I've decided to apply a Jet colormap in this demo, so you can see which features the method identifies:

.. image:: ../img/tutorial/gender_classification/fisherface_0.png

The demo also shows the average face of the male and female training images you have passed:

.. image:: ../img/tutorial/gender_classification/mean.png

Moreover, the demo should yield the prediction for the correct gender:

.. code-block:: none

   Predicted class = 1 / Actual class = 1.

And for advanced users I have also shown the eigenvalue for the Fisherface:

.. code-block:: none

   Eigenvalue #0 = 152.49493

And the Fisherfaces reconstruction:

.. image:: ../img/tutorial/gender_classification/fisherface_reconstruction_0.png

I hope this gives you an idea of how to approach gender classification and the other image classification tasks.

.. _appendixfgc: |
||||
|
||||
Appendix |
||||
-------- |
||||
|
||||
Creating the CSV File |
||||
+++++++++++++++++++++ |
||||
|
||||
You don't really want to create the CSV file by hand. I have prepared a little Python script ``create_csv.py`` for you (you find it at ``/src/create_csv.py`` coming with this tutorial) that automatically creates a CSV file. If you have your images in a hierarchy like this (``/basepath/<subject>/<image.ext>``):
||||
|
||||
.. code-block:: none |
||||
|
||||
philipp@mango:~/facerec/data/at$ tree |
||||
. |
||||
|-- s1 |
||||
| |-- 1.pgm |
||||
| |-- ... |
||||
| |-- 10.pgm |
||||
|-- s2 |
||||
| |-- 1.pgm |
||||
| |-- ... |
||||
| |-- 10.pgm |
||||
... |
||||
|-- s40 |
||||
| |-- 1.pgm |
||||
| |-- ... |
||||
| |-- 10.pgm |
||||
|
||||
|
||||
Then simply call ``create_csv.py`` with the path to the folder, just like this, and you can save the output to a file:
||||
|
||||
.. code-block:: none |
||||
|
||||
philipp@mango:~/facerec/data$ python create_csv.py |
||||
at/s13/2.pgm;0 |
||||
at/s13/7.pgm;0 |
||||
at/s13/6.pgm;0 |
||||
at/s13/9.pgm;0 |
||||
at/s13/5.pgm;0 |
||||
at/s13/3.pgm;0 |
||||
at/s13/4.pgm;0 |
||||
at/s13/10.pgm;0 |
||||
at/s13/8.pgm;0 |
||||
at/s13/1.pgm;0 |
||||
at/s17/2.pgm;1 |
||||
at/s17/7.pgm;1 |
||||
at/s17/6.pgm;1 |
||||
at/s17/9.pgm;1 |
||||
at/s17/5.pgm;1 |
||||
at/s17/3.pgm;1 |
||||
[...] |
||||
|
||||
Here is the script, if you can't find it: |
||||
|
||||
.. literalinclude:: ../src/create_csv.py |
||||
:language: python |
||||
:linenos: |
||||
|
||||
Aligning Face Images |
||||
++++++++++++++++++++ |
||||
|
||||
An accurate alignment of your image data is especially important in tasks like emotion detection, where you need as much detail as possible. Believe me... You don't want to do this by hand. So I've prepared a tiny Python script for you. The code is really easy to use. To scale, rotate and crop the face image you just need to call *CropFace(image, eye_left, eye_right, offset_pct, dest_sz)*, where:
||||
|
||||
* *eye_left* is the position of the left eye |
||||
* *eye_right* is the position of the right eye |
||||
* *offset_pct* is the percent of the image you want to keep next to the eyes (horizontal, vertical direction) |
||||
* *dest_sz* is the size of the output image |
||||
|
||||
If you are using the same *offset_pct* and *dest_sz* for your images, they are all aligned at the eyes. |
||||
|
||||
.. literalinclude:: ../src/crop_face.py |
||||
:language: python |
||||
:linenos: |
||||
|
||||
Imagine we are given `this photo of Arnold Schwarzenegger <http://en.wikipedia.org/wiki/File:Arnold_Schwarzenegger_edit%28ws%29.jpg>`_, which is under a Public Domain license. The (x,y)-position of the eyes is approximately *(252,364)* for the left and *(420,366)* for the right eye. Now you only need to define the horizontal offset, vertical offset and the size your scaled, rotated & cropped face should have. |
||||
|
||||
Here are some examples: |
||||
|
||||
+---------------------------------+----------------------------------------------------------------------------+ |
||||
| Configuration | Cropped, Scaled, Rotated Face | |
||||
+=================================+============================================================================+ |
||||
| 0.1 (10%), 0.1 (10%), (200,200) | .. image:: ../img/tutorial/gender_classification/arnie_10_10_200_200.jpg | |
||||
+---------------------------------+----------------------------------------------------------------------------+ |
||||
| 0.2 (20%), 0.2 (20%), (200,200) | .. image:: ../img/tutorial/gender_classification/arnie_20_20_200_200.jpg | |
||||
+---------------------------------+----------------------------------------------------------------------------+ |
||||
| 0.3 (30%), 0.3 (30%), (200,200) | .. image:: ../img/tutorial/gender_classification/arnie_30_30_200_200.jpg | |
||||
+---------------------------------+----------------------------------------------------------------------------+ |
||||
| 0.2 (20%), 0.2 (20%), (70,70) | .. image:: ../img/tutorial/gender_classification/arnie_20_20_70_70.jpg | |
||||
+---------------------------------+----------------------------------------------------------------------------+ |
Saving and Loading a FaceRecognizer
===================================

Introduction
------------

Saving and loading a :ocv:class:`FaceRecognizer` is very important. Training a FaceRecognizer can be a very time-intensive task, plus it's often impossible to ship the whole face database to the user of your product. The task of saving and loading a FaceRecognizer is easy: you only have to call :ocv:func:`FaceRecognizer::load` for loading and :ocv:func:`FaceRecognizer::save` for saving a :ocv:class:`FaceRecognizer`.

I'll adapt the Eigenfaces example from the :doc:`../facerec_tutorial`: Imagine we want to learn the Eigenfaces of the `AT&T Facedatabase <http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html>`_, store the model to a YAML file and then load it again.

From the loaded model, we'll get a prediction, show the mean, the Eigenfaces and the image reconstruction.

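In the Python bindings the same round trip looks roughly like the following sketch (assuming an OpenCV 2.4 build with the contrib module; the arrays are random stand-ins for real training data):

.. code-block:: python

   import cv2
   import numpy as np

   # stand-ins for real training data read from the AT&T CSV file
   images = [np.random.randint(0, 256, (112, 92)).astype(np.uint8) for _ in range(4)]
   labels = np.asarray([0, 0, 1, 1])

   model = cv2.createEigenFaceRecognizer()
   model.train(np.asarray(images), labels)
   model.save("eigenfaces_at.yml")     # writes the model state to a YAML file

   restored = cv2.createEigenFaceRecognizer()
   restored.load("eigenfaces_at.yml")  # reads it back
   print(restored.predict(images[0]))
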
Using FaceRecognizer::save and FaceRecognizer::load
-----------------------------------------------------

The source code for this demo application is also available in the ``src`` folder coming with this documentation:

* :download:`src/facerec_save_load.cpp <../src/facerec_save_load.cpp>`

.. literalinclude:: ../src/facerec_save_load.cpp
   :language: cpp
   :linenos:

Results
-------

``eigenfaces_at.yml`` then contains the model state; we'll simply look at the first 10 lines with ``head eigenfaces_at.yml``:

.. code-block:: none

   philipp@mango:~/github/libfacerec-build$ head eigenfaces_at.yml
   %YAML:1.0
   num_components: 399
   mean: !!opencv-matrix
      rows: 1
      cols: 10304
      dt: d
      data: [ 8.5558897243107765e+01, 8.5511278195488714e+01,
          8.5854636591478695e+01, 8.5796992481203006e+01,
          8.5952380952380949e+01, 8.6162907268170414e+01,
          8.6082706766917283e+01, 8.5776942355889716e+01,

And here is the reconstruction, which is the same as the original:

.. image:: ../img/eigenface_reconstruction_opencv.png
   :align: center
Face Recognition in Videos with OpenCV
=======================================

.. contents:: Table of Contents
   :depth: 3

Introduction
------------

Whenever you hear the term *face recognition*, you instantly think of surveillance in videos. So performing face recognition in videos (e.g. from a webcam) is one of the most requested features I have got. I have heard your cries, so here it is: an application that shows you how to do face recognition in videos! For the face detection part we'll use the awesome :ocv:class:`CascadeClassifier` and we'll use :ocv:class:`FaceRecognizer` for face recognition. This example uses the Fisherfaces method for face recognition, because it is robust against large changes in illumination.

Here is what the final application looks like. As you can see, I am only writing the id of the recognized person above the detected face (by the way, this id is Arnold Schwarzenegger for my data set):

.. image:: ../img/tutorial/facerec_video/facerec_video.png
   :align: center
   :scale: 70%

This demo is a basis for your research and it shows you how to implement face recognition in videos. You probably want to extend the application and make it more sophisticated: you could combine the id with the name, then show the confidence of the prediction, recognize the emotion, and so on. But before you send mails asking what this Haar-Cascade thing is or what a CSV is: make sure you have read the entire tutorial. It's all explained in here. If you just want to scroll down to the code, please note:

* The available Haar-Cascades for face detection are located in the ``data`` folder of your OpenCV installation! One of the available Haar-Cascades for face detection is, for example, ``/path/to/opencv/data/haarcascades/haarcascade_frontalface_default.xml``.

I encourage you to experiment with the application. Play around with the available :ocv:class:`FaceRecognizer` implementations, try the available cascades in OpenCV and see if you can improve your results!

Prerequisites |
||||
-------------- |
||||
|
||||
You want to do face recognition, so you need some face images to learn a :ocv:class:`FaceRecognizer` on. I have decided to reuse the images from the gender classification example: :doc:`facerec_gender_classification`. |
||||
|
||||
I have the following celebrities in my training data set: |
||||
|
||||
* Angelina Jolie |
||||
* Arnold Schwarzenegger |
||||
* Brad Pitt |
||||
* George Clooney |
||||
* Johnny Depp |
||||
* Justin Timberlake |
||||
* Katy Perry |
||||
* Keanu Reeves |
||||
* Patrick Stewart |
||||
* Tom Cruise |
||||
|
||||
In the demo I have decided to read the images from a very simple CSV file. Why? Because it's the simplest platform-independent approach I can think of. However, if you know a simpler solution please ping me about it. Basically all the CSV file needs to contain are lines composed of a ``filename`` followed by a ``;`` followed by the ``label`` (as *integer number*), making up a line like this: |
||||
|
||||
.. code-block:: none |
||||
|
||||
/path/to/image.ext;0 |
||||
|
||||
Let's dissect the line. ``/path/to/image.ext`` is the path to an image, probably something like this if you are in Windows: ``C:/faces/person0/image0.jpg``. Then there is the separator ``;`` and finally we assign a label ``0`` to the image. Think of the label as the subject (the person, the gender or whatever comes to your mind). In the face recognition scenario, the label is the person this image belongs to. In the gender classification scenario, the label is the gender the person has. So my CSV file looks like this: |
||||
|
||||
.. code-block:: none |
||||
|
||||
/home/philipp/facerec/data/c/keanu_reeves/keanu_reeves_01.jpg;0 |
||||
/home/philipp/facerec/data/c/keanu_reeves/keanu_reeves_02.jpg;0 |
||||
/home/philipp/facerec/data/c/keanu_reeves/keanu_reeves_03.jpg;0 |
||||
... |
||||
/home/philipp/facerec/data/c/katy_perry/katy_perry_01.jpg;1 |
||||
/home/philipp/facerec/data/c/katy_perry/katy_perry_02.jpg;1 |
||||
/home/philipp/facerec/data/c/katy_perry/katy_perry_03.jpg;1 |
||||
... |
||||
/home/philipp/facerec/data/c/brad_pitt/brad_pitt_01.jpg;2 |
||||
/home/philipp/facerec/data/c/brad_pitt/brad_pitt_02.jpg;2 |
||||
/home/philipp/facerec/data/c/brad_pitt/brad_pitt_03.jpg;2 |
||||
... |
||||
/home/philipp/facerec/data/c1/crop_arnold_schwarzenegger/crop_08.jpg;6 |
||||
/home/philipp/facerec/data/c1/crop_arnold_schwarzenegger/crop_05.jpg;6 |
||||
/home/philipp/facerec/data/c1/crop_arnold_schwarzenegger/crop_02.jpg;6 |
||||
/home/philipp/facerec/data/c1/crop_arnold_schwarzenegger/crop_03.jpg;6 |
||||
|
||||
All images for this example were chosen to have a frontal face perspective. They have been cropped, scaled and rotated to be aligned at the eyes, just like this set of George Clooney images: |
||||
|
||||
.. image:: ../img/tutorial/gender_classification/clooney_set.png |
||||
:align: center |
||||
|
||||
Face Recognition from Videos
----------------------------

The source code for the demo is available in the ``src`` folder coming with this documentation: |
||||
|
||||
* :download:`src/facerec_video.cpp <../src/facerec_video.cpp>` |
||||
|
||||
This demo uses the :ocv:class:`CascadeClassifier`: |
||||
|
||||
.. literalinclude:: ../src/facerec_video.cpp |
||||
:language: cpp |
||||
:linenos: |
||||
|
||||
Running the Demo |
||||
---------------- |
||||
|
||||
You'll need: |
||||
|
||||
* The path to a valid Haar-Cascade for detecting a face with a :ocv:class:`CascadeClassifier`. |
||||
* The path to a valid CSV File for learning a :ocv:class:`FaceRecognizer`. |
||||
* A webcam and its device id (you don't know the device id? Simply start from 0 on and see what happens). |
||||
|
||||
If you are in Windows, then simply start the demo by running (from command line): |
||||
|
||||
.. code-block:: none |
||||
|
||||
facerec_video.exe <C:/path/to/your/haar_cascade.xml> <C:/path/to/your/csv.ext> <video device> |
||||
|
||||
If you are in Linux, then simply start the demo by running: |
||||
|
||||
.. code-block:: none |
||||
|
||||
./facerec_video </path/to/your/haar_cascade.xml> </path/to/your/csv.ext> <video device> |
||||
|
||||
An example: if the Haar-Cascade is at ``C:/opencv/data/haarcascades/haarcascade_frontalface_default.xml``, the CSV file is at ``C:/facerec/data/celebrities.txt`` and I have a webcam with deviceId ``1``, then I would call the demo with:
||||
|
||||
.. code-block:: none |
||||
|
||||
facerec_video.exe C:/opencv/data/haarcascades/haarcascade_frontalface_default.xml C:/facerec/data/celebrities.txt 1 |
||||
|
||||
That's it. |
||||
|
||||
Results |
||||
------- |
||||
|
||||
Enjoy! |
||||
|
||||
Appendix |
||||
-------- |
||||
|
||||
Creating the CSV File |
||||
+++++++++++++++++++++ |
||||
|
||||
You don't really want to create the CSV file by hand. I have prepared a little Python script ``create_csv.py`` for you (you find it at ``/src/create_csv.py`` coming with this tutorial) that automatically creates a CSV file. If you have your images in a hierarchy like this (``/basepath/<subject>/<image.ext>``):
||||
|
||||
.. code-block:: none |
||||
|
||||
philipp@mango:~/facerec/data/at$ tree |
||||
. |
||||
|-- s1 |
||||
| |-- 1.pgm |
||||
| |-- ... |
||||
| |-- 10.pgm |
||||
|-- s2 |
||||
| |-- 1.pgm |
||||
| |-- ... |
||||
| |-- 10.pgm |
||||
... |
||||
|-- s40 |
||||
| |-- 1.pgm |
||||
| |-- ... |
||||
| |-- 10.pgm |
||||
|
||||
|
||||
Then simply call ``create_csv.py`` with the path to the folder, just like this, and you can save the output to a file:
||||
|
||||
.. code-block:: none |
||||
|
||||
philipp@mango:~/facerec/data$ python create_csv.py |
||||
at/s13/2.pgm;0 |
||||
at/s13/7.pgm;0 |
||||
at/s13/6.pgm;0 |
||||
at/s13/9.pgm;0 |
||||
at/s13/5.pgm;0 |
||||
at/s13/3.pgm;0 |
||||
at/s13/4.pgm;0 |
||||
at/s13/10.pgm;0 |
||||
at/s13/8.pgm;0 |
||||
at/s13/1.pgm;0 |
||||
at/s17/2.pgm;1 |
||||
at/s17/7.pgm;1 |
||||
at/s17/6.pgm;1 |
||||
at/s17/9.pgm;1 |
||||
at/s17/5.pgm;1 |
||||
at/s17/3.pgm;1 |
||||
[...] |
||||
|
||||
Here is the script, if you can't find it: |
||||
|
||||
.. literalinclude:: ../src/create_csv.py |
||||
:language: python |
||||
:linenos: |
||||
|
||||
Aligning Face Images |
||||
++++++++++++++++++++ |
||||
|
||||
An accurate alignment of your image data is especially important in tasks like emotion detection, where you need as much detail as possible. Believe me... You don't want to do this by hand. So I've prepared a tiny Python script for you. The code is really easy to use. To scale, rotate and crop the face image you just need to call *CropFace(image, eye_left, eye_right, offset_pct, dest_sz)*, where:
||||
|
||||
* *eye_left* is the position of the left eye |
||||
* *eye_right* is the position of the right eye |
||||
* *offset_pct* is the percent of the image you want to keep next to the eyes (horizontal, vertical direction) |
||||
* *dest_sz* is the size of the output image |
||||
|
||||
If you are using the same *offset_pct* and *dest_sz* for your images, they are all aligned at the eyes. |
||||
|
||||
.. literalinclude:: ../src/crop_face.py |
||||
:language: python |
||||
:linenos: |
||||
|
||||
Imagine we are given `this photo of Arnold Schwarzenegger <http://en.wikipedia.org/wiki/File:Arnold_Schwarzenegger_edit%28ws%29.jpg>`_, which is under a Public Domain license. The (x,y)-position of the eyes is approximately *(252,364)* for the left and *(420,366)* for the right eye. Now you only need to define the horizontal offset, vertical offset and the size your scaled, rotated & cropped face should have. |
||||
|
||||
Here are some examples: |
||||
|
||||
+---------------------------------+----------------------------------------------------------------------------+ |
||||
| Configuration | Cropped, Scaled, Rotated Face | |
||||
+=================================+============================================================================+ |
||||
| 0.1 (10%), 0.1 (10%), (200,200) | .. image:: ../img/tutorial/gender_classification/arnie_10_10_200_200.jpg | |
||||
+---------------------------------+----------------------------------------------------------------------------+ |
||||
| 0.2 (20%), 0.2 (20%), (200,200) | .. image:: ../img/tutorial/gender_classification/arnie_20_20_200_200.jpg | |
||||
+---------------------------------+----------------------------------------------------------------------------+ |
||||
| 0.3 (30%), 0.3 (30%), (200,200) | .. image:: ../img/tutorial/gender_classification/arnie_30_30_200_200.jpg | |
||||
+---------------------------------+----------------------------------------------------------------------------+ |
||||
| 0.2 (20%), 0.2 (20%), (70,70) | .. image:: ../img/tutorial/gender_classification/arnie_20_20_70_70.jpg | |
||||
+---------------------------------+----------------------------------------------------------------------------+ |
||||
/home/philipp/facerec/data/at/s40/8.pgm;39 |
||||
/home/philipp/facerec/data/at/s40/1.pgm;39 |
||||
/home/philipp/facerec/data/at/s3/2.pgm;2 |
||||
/home/philipp/facerec/data/at/s3/7.pgm;2 |
||||
/home/philipp/facerec/data/at/s3/6.pgm;2 |
||||
/home/philipp/facerec/data/at/s3/9.pgm;2 |
||||
/home/philipp/facerec/data/at/s3/5.pgm;2 |
||||
/home/philipp/facerec/data/at/s3/3.pgm;2 |
||||
/home/philipp/facerec/data/at/s3/4.pgm;2 |
||||
/home/philipp/facerec/data/at/s3/10.pgm;2 |
||||
/home/philipp/facerec/data/at/s3/8.pgm;2 |
||||
/home/philipp/facerec/data/at/s3/1.pgm;2 |
||||
/home/philipp/facerec/data/at/s38/2.pgm;37 |
||||
/home/philipp/facerec/data/at/s38/7.pgm;37 |
||||
/home/philipp/facerec/data/at/s38/6.pgm;37 |
||||
/home/philipp/facerec/data/at/s38/9.pgm;37 |
||||
/home/philipp/facerec/data/at/s38/5.pgm;37 |
||||
/home/philipp/facerec/data/at/s38/3.pgm;37 |
||||
/home/philipp/facerec/data/at/s38/4.pgm;37 |
||||
/home/philipp/facerec/data/at/s38/10.pgm;37 |
||||
/home/philipp/facerec/data/at/s38/8.pgm;37 |
||||
/home/philipp/facerec/data/at/s38/1.pgm;37 |
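The listing above is a plain ``path;label`` CSV: each line gives the absolute path to one face image, a semicolon, and an integer class label (here the subject folder number minus one, so images in ``s14`` carry label ``13``). If your images are arranged as one folder per subject, such a file can be generated automatically. The helper below is only a sketch and is not part of the sample code: it assumes a C++17 compiler, takes a hypothetical ``<dataset-root>`` argument, and assigns labels in whatever order ``std::filesystem`` visits the subject folders, which need not match the folder numbering shown above.

.. code-block:: cpp

    // Hypothetical CSV generator (not part of the OpenCV sample).
    // Walks <root>/<subject>/<image> and prints "path;label" lines,
    // giving every subject folder its own integer label.
    #include <filesystem>
    #include <iostream>

    int main(int argc, char* argv[]) {
        if (argc != 2) {
            std::cerr << "usage: " << argv[0] << " <dataset-root>" << std::endl;
            return 1;
        }
        namespace fs = std::filesystem;
        int label = 0;
        for (const auto& subject : fs::directory_iterator(argv[1])) {
            if (!subject.is_directory())
                continue;
            for (const auto& image : fs::directory_iterator(subject.path())) {
                if (image.is_regular_file())
                    std::cout << image.path().string() << ";" << label << "\n";
            }
            ++label;
        }
        return 0;
    }

Redirect the output to a file (for example ``./create_csv /path/to/dataset > faces.csv``, both paths hypothetical) and pass that file to the demo program shown next.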
@ -0,0 +1,169 @@
/*
 * Copyright (c) 2011. Philipp Wagner <bytefish[at]gmx[dot]de>.
 * Released to public domain under terms of the BSD Simplified license.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *   * Redistributions of source code must retain the above copyright
 *     notice, this list of conditions and the following disclaimer.
 *   * Redistributions in binary form must reproduce the above copyright
 *     notice, this list of conditions and the following disclaimer in the
 *     documentation and/or other materials provided with the distribution.
 *   * Neither the name of the organization nor the names of its contributors
 *     may be used to endorse or promote products derived from this software
 *     without specific prior written permission.
 *
 *   See <http://www.opensource.org/licenses/bsd-license>
 */

#include "opencv2/core.hpp" |
||||
#include "opencv2/face.hpp" |
||||
#include "opencv2/highgui.hpp" |

#include <iostream>
#include <fstream>
#include <sstream>

using namespace cv;
using namespace cv::face;
using namespace std;

static Mat norm_0_255(InputArray _src) {
    Mat src = _src.getMat();
    // Create and return normalized image:
    Mat dst;
    switch(src.channels()) {
    case 1:
        cv::normalize(_src, dst, 0, 255, NORM_MINMAX, CV_8UC1);
        break;
    case 3:
        cv::normalize(_src, dst, 0, 255, NORM_MINMAX, CV_8UC3);
        break;
    default:
        src.copyTo(dst);
        break;
    }
    return dst;
}

static void read_csv(const string& filename, vector<Mat>& images, vector<int>& labels, char separator = ';') {
    std::ifstream file(filename.c_str(), ifstream::in);
    if (!file) {
        string error_message = "No valid input file was given, please check the given filename.";
        CV_Error(CV_StsBadArg, error_message);
    }
    string line, path, classlabel;
    while (getline(file, line)) {
        stringstream liness(line);
        getline(liness, path, separator);
        getline(liness, classlabel);
        if(!path.empty() && !classlabel.empty()) {
            images.push_back(imread(path, 0));
            labels.push_back(atoi(classlabel.c_str()));
        }
    }
}

int main(int argc, const char *argv[]) {
    // Check for valid command line arguments, print usage
    // if no arguments were given.
    if (argc != 2) {
        cout << "usage: " << argv[0] << " <csv.ext>" << endl;
        exit(1);
    }
    // Get the path to your CSV.
    string fn_csv = string(argv[1]);
    // These vectors hold the images and corresponding labels.
    vector<Mat> images;
    vector<int> labels;
    // Read in the data. This can fail if no valid
    // input filename is given.
    try {
        read_csv(fn_csv, images, labels);
    } catch (cv::Exception& e) {
        cerr << "Error opening file \"" << fn_csv << "\". Reason: " << e.msg << endl;
        // nothing more we can do
        exit(1);
    }
    // Quit if there are not enough images for this demo.
    if(images.size() <= 1) {
        string error_message = "This demo needs at least 2 images to work. Please add more images to your data set!";
        CV_Error(CV_StsError, error_message);
    }
    // Get the height from the first image. We'll need this
    // later in code to reshape the images to their original
    // size:
    int height = images[0].rows;
    // The following lines simply get the last image from
    // your dataset and remove it from the vector. This is
    // done so that the training data (on which we train the
    // cv::FaceRecognizer) and the test data (on which we
    // test the model) do not overlap.
    Mat testSample = images[images.size() - 1];
    int testLabel = labels[labels.size() - 1];
    images.pop_back();
    labels.pop_back();
    // The following lines create an Eigenfaces model for
    // face recognition and train it with the images and
    // labels read from the given CSV file.
    // This here is a full PCA, if you just want to keep
    // 10 principal components (read Eigenfaces), then call
    // the factory method like this:
    //
    //      cv::createEigenFaceRecognizer(10);
    //
    // If you want to create a FaceRecognizer with a
    // confidence threshold, call it with:
    //
    //      cv::createEigenFaceRecognizer(10, 123.0);
    //
    Ptr<FaceRecognizer> model = createEigenFaceRecognizer();
    model->train(images, labels);
    // The following line predicts the label of a given
    // test image:
    int predictedLabel = model->predict(testSample);
    //
    // To get the confidence of a prediction call the model with:
    //
    //      int predictedLabel = -1;
    //      double confidence = 0.0;
    //      model->predict(testSample, predictedLabel, confidence);
    //
    string result_message = format("Predicted class = %d / Actual class = %d.", predictedLabel, testLabel);
    cout << result_message << endl;
    // Sometimes you'll need to get/set internal model data,
    // which isn't exposed by the public cv::FaceRecognizer.
    // Since each cv::FaceRecognizer is derived from a
    // cv::Algorithm, you can query the data.
    //
    // First we'll use it to set the threshold of the FaceRecognizer
    // to 0.0 without retraining the model. This can be useful if
    // you are evaluating the model:
    //
    model->set("threshold", 0.0);
    // Now the threshold of this model is set to 0.0. A prediction
    // now returns -1, as it's impossible to have a distance below
    // it
    predictedLabel = model->predict(testSample);
    cout << "Predicted class = " << predictedLabel << endl;
    // Here is how to get the eigenvalues of this Eigenfaces model:
    Mat eigenvalues = model->getMat("eigenvalues");
    // And we can do the same to display the Eigenvectors (read Eigenfaces):
    Mat W = model->getMat("eigenvectors");
    // From this we will display the (at most) first 10 Eigenfaces:
    for (int i = 0; i < min(10, W.cols); i++) {
        string msg = format("Eigenvalue #%d = %.5f", i, eigenvalues.at<double>(i));
        cout << msg << endl;
        // get eigenvector #i
        Mat ev = W.col(i).clone();
        // Reshape to original size & normalize to [0...255] for imshow.
        Mat grayscale = norm_0_255(ev.reshape(1, height));
        // Show the image & apply a Jet colormap for better sensing.
        Mat cgrayscale;
        applyColorMap(grayscale, cgrayscale, COLORMAP_JET);
        imshow(format("%d", i), cgrayscale);
    }
    waitKey(0);

    return 0;
}