parent
812ce48c36
commit
4ccbd44559
8 changed files with 857 additions and 1 deletion
@@ -0,0 +1,110 @@
Features2d {#tutorial_ug_features2d}
==========

Detectors
---------

Descriptors
-----------

Matching keypoints
------------------

### The code

We will start with a short sample `opencv/samples/cpp/matcher_simple.cpp`:

@code{.cpp}
Mat img1 = imread(argv[1], IMREAD_GRAYSCALE);
Mat img2 = imread(argv[2], IMREAD_GRAYSCALE);
if(img1.empty() || img2.empty())
{
    printf("Can't read one of the images\n");
    return -1;
}

// detecting keypoints
SurfFeatureDetector detector(400);
vector<KeyPoint> keypoints1, keypoints2;
detector.detect(img1, keypoints1);
detector.detect(img2, keypoints2);

// computing descriptors
SurfDescriptorExtractor extractor;
Mat descriptors1, descriptors2;
extractor.compute(img1, keypoints1, descriptors1);
extractor.compute(img2, keypoints2, descriptors2);

// matching descriptors
BruteForceMatcher<L2<float> > matcher;
vector<DMatch> matches;
matcher.match(descriptors1, descriptors2, matches);

// drawing the results
namedWindow("matches", 1);
Mat img_matches;
drawMatches(img1, keypoints1, img2, keypoints2, matches, img_matches);
imshow("matches", img_matches);
waitKey(0);
@endcode

### The code explained

Let us break the code down.
@code{.cpp}
Mat img1 = imread(argv[1], IMREAD_GRAYSCALE);
Mat img2 = imread(argv[2], IMREAD_GRAYSCALE);
if(img1.empty() || img2.empty())
{
    printf("Can't read one of the images\n");
    return -1;
}
@endcode
We load two images and check whether they were loaded correctly.
@code{.cpp}
// detecting keypoints
Ptr<FeatureDetector> detector = FastFeatureDetector::create(15);
vector<KeyPoint> keypoints1, keypoints2;
detector->detect(img1, keypoints1);
detector->detect(img2, keypoints2);
@endcode
First, we create an instance of a keypoint detector. All detectors inherit the abstract
FeatureDetector interface, but the constructors are algorithm-dependent. The first argument to each
detector usually controls the balance between the number of keypoints and their stability. The range
of values differs between detectors (for instance, the *FAST* threshold is a pixel intensity
difference and usually lies in the range *[0,40]*, while the *SURF* threshold is applied to the
Hessian of the image and usually takes values larger than *100*), so use the defaults when in
doubt.
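
For example, the SURF interface used below for descriptor extraction can also act as a detector with
an explicit Hessian threshold. A minimal sketch (assuming SURF::create is available, as in the
descriptor step below; the value 400 is only illustrative):
@code{.cpp}
// a larger Hessian threshold keeps fewer, but more stable, keypoints
Ptr<SURF> surfDetector = SURF::create(400);
surfDetector->detect(img1, keypoints1);
@endcode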
@code{.cpp}
// computing descriptors
Ptr<SURF> extractor = SURF::create();
Mat descriptors1, descriptors2;
extractor->compute(img1, keypoints1, descriptors1);
extractor->compute(img2, keypoints2, descriptors2);
@endcode
We create an instance of a descriptor extractor. Most OpenCV descriptors inherit the
DescriptorExtractor abstract interface. Then we compute descriptors for each of the keypoints. The
output Mat of the DescriptorExtractor::compute method contains the descriptor of the *i*-th keypoint
in row *i*. Note that the method can modify the keypoints vector by removing keypoints for which a
descriptor cannot be computed (usually these are keypoints near the image border). The method makes
sure that the output keypoints and descriptors are consistent with each other (so that the number of
keypoints is equal to the descriptors row count).
@code{.cpp}
// matching descriptors
BruteForceMatcher<L2<float> > matcher;
vector<DMatch> matches;
matcher.match(descriptors1, descriptors2, matches);
@endcode
Now that we have descriptors for both images, we can match them. First, we create a matcher that, for
each descriptor from the first image, performs an exhaustive search for the nearest descriptor in the
second image using the Euclidean metric. Manhattan distance is also implemented, as is a Hamming
distance for the BRIEF descriptor. The output vector matches contains pairs of corresponding point
indices.
@code{.cpp}
// drawing the results
namedWindow("matches", 1);
Mat img_matches;
drawMatches(img1, keypoints1, img2, keypoints2, matches, img_matches);
imshow("matches", img_matches);
waitKey(0);
@endcode
The final part of the sample is about visualizing the matching results.
@@ -0,0 +1,141 @@
HighGUI {#tutorial_ug_highgui}
=======

Using Kinect and other OpenNI compatible depth sensors
-------------------------------------------------------

Depth sensors compatible with OpenNI (Kinect, XtionPRO, ...) are supported through the VideoCapture
class. Depth maps, RGB images and some other output formats can be retrieved through the familiar
VideoCapture interface.

In order to use a depth sensor with OpenCV you should do the following preliminary steps:

-# Install the OpenNI library (from here <http://www.openni.org/downloadfiles>) and the PrimeSensor
   Module for OpenNI (from here <https://github.com/avin2/SensorKinect>). The installation should be
   done to the default folders listed in the instructions of these products, e.g.:
   @code{.text}
   OpenNI:
       Linux & MacOSX:
           Libs into: /usr/lib
           Includes into: /usr/include/ni
       Windows:
           Libs into: c:/Program Files/OpenNI/Lib
           Includes into: c:/Program Files/OpenNI/Include
   PrimeSensor Module:
       Linux & MacOSX:
           Bins into: /usr/bin
       Windows:
           Bins into: c:/Program Files/Prime Sense/Sensor/Bin
   @endcode
   If one or both products were installed to other folders, the user should change the
   corresponding CMake variables OPENNI_LIB_DIR, OPENNI_INCLUDE_DIR and/or
   OPENNI_PRIME_SENSOR_MODULE_BIN_DIR.

-# Configure OpenCV with OpenNI support by setting the WITH_OPENNI flag in CMake. If OpenNI is found
   in the install folders, OpenCV will be built with the OpenNI library (see the OpenNI status in the
   CMake log) even if the PrimeSensor Module is not found (see the OpenNI PrimeSensor Modules status
   in the CMake log). Without the PrimeSensor module OpenCV will still compile successfully with the
   OpenNI library, but the VideoCapture object will not grab data from the Kinect sensor.

-# Build OpenCV.

VideoCapture can retrieve the following data:

-# data given from the depth generator:
   - CAP_OPENNI_DEPTH_MAP - depth values in mm (CV_16UC1)
   - CAP_OPENNI_POINT_CLOUD_MAP - XYZ in meters (CV_32FC3)
   - CAP_OPENNI_DISPARITY_MAP - disparity in pixels (CV_8UC1)
   - CAP_OPENNI_DISPARITY_MAP_32F - disparity in pixels (CV_32FC1)
   - CAP_OPENNI_VALID_DEPTH_MASK - mask of valid pixels (not occluded, not shaded, etc.)
     (CV_8UC1)

-# data given from the RGB image generator:
   - CAP_OPENNI_BGR_IMAGE - color image (CV_8UC3)
   - CAP_OPENNI_GRAY_IMAGE - gray image (CV_8UC1)

In order to get a depth map from the depth sensor use VideoCapture::operator \>\>, e.g.:
@code{.cpp}
VideoCapture capture( CAP_OPENNI );
for(;;)
{
    Mat depthMap;
    capture >> depthMap;

    if( waitKey( 30 ) >= 0 )
        break;
}
@endcode
For getting several data maps use VideoCapture::grab and VideoCapture::retrieve, e.g.:
@code{.cpp}
VideoCapture capture(0); // or CAP_OPENNI
for(;;)
{
    Mat depthMap;
    Mat bgrImage;

    capture.grab();

    capture.retrieve( depthMap, CAP_OPENNI_DEPTH_MAP );
    capture.retrieve( bgrImage, CAP_OPENNI_BGR_IMAGE );

    if( waitKey( 30 ) >= 0 )
        break;
}
@endcode
For setting and getting some properties of the sensor's data generators use the VideoCapture::set and
VideoCapture::get methods respectively, e.g.:
@code{.cpp}
VideoCapture capture( CAP_OPENNI );
capture.set( CAP_OPENNI_IMAGE_GENERATOR_OUTPUT_MODE, CAP_OPENNI_VGA_30HZ );
cout << "FPS " << capture.get( CAP_OPENNI_IMAGE_GENERATOR+CAP_PROP_FPS ) << endl;
@endcode
Since two types of sensor data generators are supported (image generator and depth generator),
there are two flags that should be used to set/get a property of the needed generator:

- CAP_OPENNI_IMAGE_GENERATOR -- A flag for access to the image generator properties.
- CAP_OPENNI_DEPTH_GENERATOR -- A flag for access to the depth generator properties. This flag
  value is assumed by default if neither of the two possible values of the property is set.

Some depth sensors (for example XtionPRO) do not have an image generator. In order to check this you
can get the CAP_OPENNI_IMAGE_GENERATOR_PRESENT property.
@code{.cpp}
bool isImageGeneratorPresent = capture.get( CAP_PROP_OPENNI_IMAGE_GENERATOR_PRESENT ) != 0; // or == 1
@endcode
Flags specifying the needed generator type must be used in combination with the particular generator
property. The following properties of cameras available through the OpenNI interface are supported:

- For the image generator:

  - CAP_PROP_OPENNI_OUTPUT_MODE -- Three output modes are supported: CAP_OPENNI_VGA_30HZ,
    used by default (the image generator returns images in VGA resolution at 30 FPS),
    CAP_OPENNI_SXGA_15HZ (the image generator returns images in SXGA resolution at 15 FPS) and
    CAP_OPENNI_SXGA_30HZ (the image generator returns images in SXGA resolution at 30 FPS, the
    mode is supported by XtionPRO Live); the depth generator's maps are always in VGA resolution.

- For the depth generator:

  - CAP_PROP_OPENNI_REGISTRATION -- Flag that enables remapping of the depth map to the image map
    by changing the depth generator's view point (if the flag is "on") or resets this view point to
    its normal one (if the flag is "off"). The images resulting from the registration process are
    pixel-aligned, which means that every pixel in the image is aligned to a pixel in the depth
    image.

  The following properties are available for getting only:

  - CAP_PROP_OPENNI_FRAME_MAX_DEPTH -- The maximum supported depth of the Kinect in mm.
  - CAP_PROP_OPENNI_BASELINE -- Baseline value in mm.
  - CAP_PROP_OPENNI_FOCAL_LENGTH -- Focal length in pixels.
  - CAP_PROP_FRAME_WIDTH -- Frame width in pixels.
  - CAP_PROP_FRAME_HEIGHT -- Frame height in pixels.
  - CAP_PROP_FPS -- Frame rate in FPS.

- Some typical flag combinations "generator type + property" are defined as single flags:

  - CAP_OPENNI_IMAGE_GENERATOR_OUTPUT_MODE = CAP_OPENNI_IMAGE_GENERATOR + CAP_PROP_OPENNI_OUTPUT_MODE
  - CAP_OPENNI_DEPTH_GENERATOR_BASELINE = CAP_OPENNI_DEPTH_GENERATOR + CAP_PROP_OPENNI_BASELINE
  - CAP_OPENNI_DEPTH_GENERATOR_FOCAL_LENGTH = CAP_OPENNI_DEPTH_GENERATOR + CAP_PROP_OPENNI_FOCAL_LENGTH
  - CAP_OPENNI_DEPTH_GENERATOR_REGISTRATION = CAP_OPENNI_DEPTH_GENERATOR + CAP_PROP_OPENNI_REGISTRATION
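
These combined flags can be passed directly to VideoCapture::set and VideoCapture::get. A minimal
sketch (property values and their availability depend on the particular sensor):
@code{.cpp}
VideoCapture capture( CAP_OPENNI );
// remap the depth map to the color image view point
capture.set( CAP_OPENNI_DEPTH_GENERATOR_REGISTRATION, 1 );
// query calibration-related properties of the depth generator
double baseline    = capture.get( CAP_OPENNI_DEPTH_GENERATOR_BASELINE );     // mm
double focalLength = capture.get( CAP_OPENNI_DEPTH_GENERATOR_FOCAL_LENGTH ); // pixels
cout << "baseline " << baseline << ", focal length " << focalLength << endl;
@endcode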

For more information please refer to the usage example
[openni_capture.cpp](https://github.com/Itseez/opencv/tree/master/samples/cpp/openni_capture.cpp) in
the opencv/samples/cpp folder.
@@ -0,0 +1,85 @@
HighGUI {#tutorial_ug_intelperc}
=======

Using Creative Senz3D and other Intel Perceptual Computing SDK compatible depth sensors
---------------------------------------------------------------------------------------

Depth sensors compatible with the Intel Perceptual Computing SDK are supported through the VideoCapture
class. Depth maps, RGB images and some other output formats can be retrieved through the familiar
VideoCapture interface.

In order to use a depth sensor with OpenCV you should do the following preliminary steps:

-# Install the Intel Perceptual Computing SDK (from here <http://www.intel.com/software/perceptual>).

-# Configure OpenCV with Intel Perceptual Computing SDK support by setting the WITH_INTELPERC flag in
   CMake. If the Intel Perceptual Computing SDK is found in the install folders, OpenCV will be built
   with the Intel Perceptual Computing SDK library (see the INTELPERC status in the CMake log). If the
   CMake process doesn't find the Intel Perceptual Computing SDK installation folder automatically,
   the user should change the corresponding CMake variables INTELPERC_LIB_DIR and
   INTELPERC_INCLUDE_DIR to the proper values.

-# Build OpenCV.

VideoCapture can retrieve the following data:

-# data given from the depth generator:
   - CAP_INTELPERC_DEPTH_MAP - each pixel is a 16-bit integer. The value indicates the
     distance from an object to the camera's XY plane or the Cartesian depth. (CV_16UC1)
   - CAP_INTELPERC_UVDEPTH_MAP - each pixel contains two 32-bit floating point values in
     the range 0-1, representing the mapping of depth coordinates to the color
     coordinates. (CV_32FC2)
   - CAP_INTELPERC_IR_MAP - each pixel is a 16-bit integer. The value indicates the
     intensity of the reflected laser beam. (CV_16UC1)

-# data given from the RGB image generator:
   - CAP_INTELPERC_IMAGE - color image. (CV_8UC3)

In order to get a depth map from the depth sensor use VideoCapture::operator \>\>, e.g.:
@code{.cpp}
VideoCapture capture( CAP_INTELPERC );
for(;;)
{
    Mat depthMap;
    capture >> depthMap;

    if( waitKey( 30 ) >= 0 )
        break;
}
@endcode
For getting several data maps use VideoCapture::grab and VideoCapture::retrieve, e.g.:
@code{.cpp}
VideoCapture capture(CAP_INTELPERC);
for(;;)
{
    Mat depthMap;
    Mat image;
    Mat irImage;

    capture.grab();

    capture.retrieve( depthMap, CAP_INTELPERC_DEPTH_MAP );
    capture.retrieve( image, CAP_INTELPERC_IMAGE );
    capture.retrieve( irImage, CAP_INTELPERC_IR_MAP );

    if( waitKey( 30 ) >= 0 )
        break;
}
@endcode
For setting and getting some properties of the sensor's data generators use the VideoCapture::set and
VideoCapture::get methods respectively, e.g.:
@code{.cpp}
VideoCapture capture( CAP_INTELPERC );
capture.set( CAP_INTELPERC_DEPTH_GENERATOR | CAP_PROP_INTELPERC_PROFILE_IDX, 0 );
cout << "FPS " << capture.get( CAP_INTELPERC_DEPTH_GENERATOR+CAP_PROP_FPS ) << endl;
@endcode
Since two types of sensor data generators are supported (image generator and depth generator),
there are two flags that should be used to set/get a property of the needed generator:

- CAP_INTELPERC_IMAGE_GENERATOR -- a flag for access to the image generator properties.
- CAP_INTELPERC_DEPTH_GENERATOR -- a flag for access to the depth generator properties. This
  flag value is assumed by default if neither of the two possible values of the property is set.
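
As in the example above, the generator flag is combined with a property id when calling
VideoCapture::get or VideoCapture::set. A minimal sketch, assuming the depth stream reports the
generic frame size properties:
@code{.cpp}
VideoCapture capture( CAP_INTELPERC );
// query the resolution of the depth stream
double depthWidth  = capture.get( CAP_INTELPERC_DEPTH_GENERATOR + CAP_PROP_FRAME_WIDTH );
double depthHeight = capture.get( CAP_INTELPERC_DEPTH_GENERATOR + CAP_PROP_FRAME_HEIGHT );
cout << "depth stream " << depthWidth << "x" << depthHeight << endl;
@endcode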

For more information please refer to the usage example
[intelperc_capture.cpp](https://github.com/Itseez/opencv/tree/master/samples/cpp/intelperc_capture.cpp)
in the opencv/samples/cpp folder.
@@ -0,0 +1,180 @@
Operations with images {#tutorial_ug_mat}
======================

Input/Output
------------

### Images

Load an image from a file:
@code{.cpp}
Mat img = imread(filename);
@endcode

If you read a jpg file, a 3 channel image is created by default. If you need a grayscale image, use:

@code{.cpp}
Mat img = imread(filename, 0);
@endcode

@note The format of the file is determined by its content (the first few bytes).

Save an image to a file:

@code{.cpp}
imwrite(filename, img);
@endcode

@note The format of the file is determined by its extension.

@note Use imdecode and imencode to read and write an image from/to memory rather than a file.

XML/YAML
--------

TBD
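
A minimal sketch of OpenCV's XML/YAML persistence through cv::FileStorage (the file name and the
stored values are only illustrative):
@code{.cpp}
// writing
FileStorage fs("params.yml", FileStorage::WRITE);
fs << "frameCount" << 5;
fs << "cameraMatrix" << Mat::eye(3, 3, CV_64F);
fs.release();

// reading
FileStorage fs2("params.yml", FileStorage::READ);
int frameCount = (int)fs2["frameCount"];
Mat cameraMatrix;
fs2["cameraMatrix"] >> cameraMatrix;
@endcode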

Basic operations with images
----------------------------

### Accessing pixel intensity values

In order to get the pixel intensity value, you have to know the type of the image and the number of
channels. Here is an example for a single channel grey scale image (type 8UC1) and pixel coordinates
x and y:
@code{.cpp}
Scalar intensity = img.at<uchar>(y, x);
@endcode
intensity.val[0] contains a value from 0 to 255. Note the ordering of x and y. Since in OpenCV
images are represented by the same structure as matrices, we use the same convention for both
cases - the 0-based row index (or y-coordinate) goes first and the 0-based column index (or
x-coordinate) follows it. Alternatively, you can use the following notation:
@code{.cpp}
Scalar intensity = img.at<uchar>(Point(x, y));
@endcode
Now let us consider a 3 channel image with BGR color ordering (the default format returned by
imread):
@code{.cpp}
Vec3b intensity = img.at<Vec3b>(y, x);
uchar blue = intensity.val[0];
uchar green = intensity.val[1];
uchar red = intensity.val[2];
@endcode
You can use the same method for floating-point images (for example, you can get such an image by
running Sobel on a 3 channel image):
@code{.cpp}
Vec3f intensity = img.at<Vec3f>(y, x);
float blue = intensity.val[0];
float green = intensity.val[1];
float red = intensity.val[2];
@endcode
The same method can be used to change pixel intensities:
@code{.cpp}
img.at<uchar>(y, x) = 128;
@endcode
There are functions in OpenCV, especially from the calib3d module, such as projectPoints, that take
an array of 2D or 3D points in the form of a Mat. The matrix should contain exactly one column, each
row corresponds to a point, and the matrix type should be 32FC2 or 32FC3 respectively. Such a matrix
can be easily constructed from `std::vector`:
@code{.cpp}
vector<Point2f> points;
//... fill the array
Mat pointsMat = Mat(points);
@endcode
One can access a point in this matrix using the same Mat::at method:
@code{.cpp}
Point2f point = pointsMat.at<Point2f>(i, 0);
@endcode

### Memory management and reference counting

Mat is a structure that keeps matrix/image characteristics (rows and columns number, data type etc.)
and a pointer to the data. So nothing prevents us from having several instances of Mat corresponding
to the same data. A Mat keeps a reference count that tells whether the data has to be deallocated when
a particular instance of Mat is destroyed. Here is an example of creating two matrices without copying
data:
@code{.cpp}
std::vector<Point3f> points;
// .. fill the array
Mat pointsMat = Mat(points).reshape(1);
@endcode
As a result we get a 32FC1 matrix with 3 columns instead of a 32FC3 matrix with 1 column. pointsMat
uses data from points and will not deallocate the memory when destroyed. In this particular
instance, however, the developer has to make sure that the lifetime of points is longer than that of
pointsMat. If we need to copy the data, this is done using, for example, cv::Mat::copyTo or
cv::Mat::clone:
@code{.cpp}
Mat img = imread("image.jpg");
Mat img1 = img.clone();
@endcode
In contrast with the C API, where an output image had to be created by the developer, an empty output
Mat can be supplied to each function. Each implementation calls Mat::create for a destination matrix.
This method allocates data for a matrix if it is empty. If it is not empty and has the correct size
and type, the method does nothing. If, however, size or type are different from the input arguments,
the data is deallocated (and lost) and new data is allocated. For example:
@code{.cpp}
Mat img = imread("image.jpg");
Mat sobelx;
Sobel(img, sobelx, CV_32F, 1, 0);
@endcode

### Primitive operations

There is a number of convenient operators defined on a matrix. For example, here is how we can make
a black image from an existing greyscale image `img`:
@code{.cpp}
img = Scalar(0);
@endcode
Selecting a region of interest:
@code{.cpp}
Rect r(10, 10, 100, 100);
Mat smallImg = img(r);
@endcode
A conversion from Mat to C API data structures:
@code{.cpp}
Mat img = imread("image.jpg");
IplImage img1 = img;
CvMat m = img;
@endcode

Note that there is no data copying here.

Conversion from color to grey scale:
@code{.cpp}
Mat img = imread("image.jpg"); // loading a 8UC3 image
Mat grey;
cvtColor(img, grey, COLOR_BGR2GRAY);
@endcode
Change image type from 8UC1 to 32FC1:
@code{.cpp}
src.convertTo(dst, CV_32F);
@endcode

### Visualizing images

It is very useful to see intermediate results of your algorithm during the development process.
OpenCV provides a convenient way of visualizing images. An 8U image can be shown using:
@code{.cpp}
Mat img = imread("image.jpg");

namedWindow("image", WINDOW_AUTOSIZE);
imshow("image", img);
waitKey();
@endcode

A call to waitKey() starts a message passing cycle that waits for a key stroke in the "image"
window. A 32F image needs to be converted to 8U type. For example:
@code{.cpp}
Mat img = imread("image.jpg");
Mat grey;
cvtColor(img, grey, COLOR_BGR2GRAY);

Mat sobelx;
Sobel(grey, sobelx, CV_32F, 1, 0);

double minVal, maxVal;
minMaxLoc(sobelx, &minVal, &maxVal); //find minimum and maximum intensities
Mat draw;
sobelx.convertTo(draw, CV_8U, 255.0/(maxVal - minVal), -minVal * 255.0/(maxVal - minVal));

namedWindow("image", WINDOW_AUTOSIZE);
imshow("image", draw);
waitKey();
@endcode
@@ -0,0 +1,323 @@
Cascade Classifier Training {#tutorial_ug_traincascade}
===========================

Introduction
------------

Working with a cascade classifier includes two major stages: training and detection. The detection
stage is described in the documentation of the objdetect module of the general OpenCV documentation,
which also gives some basic information about the cascade classifier. The current guide describes how
to train a cascade classifier: preparation of the training data and running the training application.

### Important notes

There are two applications in OpenCV to train a cascade classifier: opencv_haartraining and
opencv_traincascade. opencv_traincascade is the newer version, written in C++ in accordance with the
OpenCV 2.x API. But the main difference between these two applications is that opencv_traincascade
supports both Haar @cite Viola01 and @cite Liao2007 (Local Binary Patterns) features. LBP features
are integer-valued in contrast to Haar features, so both training and detection with LBP are several
times faster than with Haar features. Regarding the LBP and Haar detection quality, it depends on
training: first of all on the quality of the training dataset, and on the training parameters too.
It's possible to train an LBP-based classifier that will provide almost the same quality as a
Haar-based one.

opencv_traincascade and opencv_haartraining store the trained classifier in different file
formats. Note that the newer cascade detection interface (see the CascadeClassifier class in the
objdetect module) supports both formats. opencv_traincascade can save (export) a trained cascade in
the older format. But opencv_traincascade and opencv_haartraining cannot load (import) a classifier
in the other format to continue training after an interruption.

Note that the opencv_traincascade application can use TBB for multi-threading. To use it in multicore
mode OpenCV must be built with TBB.

There are also some auxiliary utilities related to the training:

- opencv_createsamples is used to prepare a training dataset of positive and test samples.
  opencv_createsamples produces a dataset of positive samples in a format that is supported by
  both the opencv_haartraining and opencv_traincascade applications. The output is a file
  with a \*.vec extension; it is a binary format which contains images.
- opencv_performance may be used to evaluate the quality of classifiers, but only for those trained
  by opencv_haartraining. It takes a collection of marked up images, runs the classifier and
  reports the performance, i.e. the number of found objects, number of missed objects, number of
  false alarms and other information.

Since opencv_haartraining is an obsolete application, only opencv_traincascade will be described
further. The opencv_createsamples utility is needed to prepare the training data for
opencv_traincascade, so it will be described too.

Training data preparation
-------------------------

For training we need a set of samples. There are two types of samples: negative and positive.
Negative samples correspond to non-object images. Positive samples correspond to images with
detected objects. The set of negative samples must be prepared manually, whereas the set of positive
samples is created using the opencv_createsamples utility.

### Negative Samples

Negative samples are taken from arbitrary images. These images must not contain detected objects.
Negative samples are enumerated in a special file. It is a text file in which each line contains an
image filename (relative to the directory of the description file) of a negative sample image. This
file must be created manually. Note that negative samples and sample images are also called
background samples or background sample images, and are used interchangeably in this document.
The described images may be of different sizes. But each image should be (though not necessarily)
larger than the training window size, because these images are used to subsample a negative image to
the training size.

An example of a description file:

Directory structure:
@code{.text}
/img
  img1.jpg
  img2.jpg
bg.txt
@endcode
File bg.txt:
@code{.text}
img/img1.jpg
img/img2.jpg
@endcode
### Positive Samples

Positive samples are created by the opencv_createsamples utility. They may be created from a single
image with an object or from a collection of previously marked up images.

Please note that you need a large dataset of positive samples before you give it to the mentioned
utility, because it only applies perspective transformations. For example you may need only one
positive sample for an absolutely rigid object like an OpenCV logo, but you definitely need hundreds
and even thousands of positive samples for faces. In the case of faces you should consider all the
race and age groups, emotions and perhaps beard styles.

So, a single object image may contain a company logo. Then a large set of positive samples is
created from the given object image by randomly rotating it, changing the logo intensity, as well as
placing the logo on arbitrary backgrounds. The amount and range of randomness can be controlled by
command line arguments of the opencv_createsamples utility.

Command line arguments:

- -vec \<vec_file_name\>

    Name of the output file containing the positive samples for training.

- -img \<image_file_name\>

    Source object image (e.g., a company logo).

- -bg \<background_file_name\>

    Background description file; contains a list of images which are used as a background for
    randomly distorted versions of the object.

- -num \<number_of_samples\>

    Number of positive samples to generate.

- -bgcolor \<background_color\>

    Background color (currently grayscale images are assumed); the background color denotes the
    transparent color. Since there might be compression artifacts, the amount of color tolerance
    can be specified by -bgthresh. All pixels within the bgcolor-bgthresh and bgcolor+bgthresh range
    are interpreted as transparent.

- -bgthresh \<background_color_threshold\>
- -inv

    If specified, colors will be inverted.

- -randinv

    If specified, colors will be inverted randomly.

- -maxidev \<max_intensity_deviation\>

    Maximal intensity deviation of pixels in foreground samples.

- -maxxangle \<max_x_rotation_angle\>
- -maxyangle \<max_y_rotation_angle\>
- -maxzangle \<max_z_rotation_angle\>

    Maximum rotation angles must be given in radians.

- -show

    Useful debugging option. If specified, each sample will be shown. Pressing Esc will continue
    the samples creation process without showing each sample.

- -w \<sample_width\>

    Width (in pixels) of the output samples.

- -h \<sample_height\>

    Height (in pixels) of the output samples.

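A possible invocation that generates distorted samples from a single object image could look like
this (file names and parameter values are only illustrative):
@code{.text}
opencv_createsamples -img logo.png -bg bg.txt -vec samples.vec -num 1000 \
    -bgcolor 0 -bgthresh 8 -maxxangle 0.5 -maxyangle 0.5 -maxzangle 0.5 -w 24 -h 24
@endcode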

The following procedure is used to create a sample object instance: the source image is rotated
randomly around all three axes. The chosen angle is limited by -max?angle. Then pixels having an
intensity from the [bg_color-bg_color_threshold; bg_color+bg_color_threshold] range are
interpreted as transparent. White noise is added to the intensities of the foreground. If the -inv
key is specified then foreground pixel intensities are inverted. If the -randinv key is specified
then the algorithm randomly selects whether inversion should be applied to this sample. Finally, the
obtained image is placed onto an arbitrary background from the background description file, resized
to the desired size specified by -w and -h and stored to the vec-file specified by the -vec command
line option.

Positive samples may also be obtained from a collection of previously marked up images. This
collection is described by a text file similar to the background description file. Each line of this
file corresponds to an image. The first element of the line is the filename. It is followed by the
number of object instances. The following numbers are the coordinates of the objects' bounding
rectangles (x, y, width, height).

An example of a description file:

Directory structure:
@code{.text}
/img
  img1.jpg
  img2.jpg
info.dat
@endcode
File info.dat:
@code{.text}
img/img1.jpg 1 140 100 45 45
img/img2.jpg 2 100 200 50 50 50 30 25 25
@endcode
Image img1.jpg contains a single object instance with the following coordinates of its bounding
rectangle: (140, 100, 45, 45). Image img2.jpg contains two object instances.

In order to create positive samples from such a collection, the -info argument should be specified
instead of `-img`:

- -info \<collection_file_name\>

    Description file of the marked up images collection.

The scheme of samples creation in this case is as follows: the object instances are taken from the
images, then they are resized to the target sample size and stored in the output vec-file. No
distortion is applied, so the only affecting arguments are -w, -h, -show and -num.

The opencv_createsamples utility may also be used for examining samples stored in a positive samples
file. In order to do this only the -vec, -w and -h parameters should be specified.
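
For example, the samples stored in a vec-file can be reviewed one by one with an invocation like the
following (the window size must match the one used when the file was created; the values here are
only illustrative):
@code{.text}
opencv_createsamples -vec samples.vec -w 24 -h 24
@endcode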

Note that for training, it does not matter how vec-files with positive samples are generated. But
the opencv_createsamples utility is the only way provided by OpenCV to collect/create a vector file
of positive samples.

An example vec-file is available here: opencv/data/vec_files/trainingfaces_24-24.vec. It can be
used to train a face detector with the following window size: -w 24 -h 24.

Cascade Training
----------------

The next step is the training of the classifier. As mentioned above, opencv_traincascade or
opencv_haartraining may be used to train a cascade classifier, but only the newer
opencv_traincascade will be described further.

Command line arguments of the opencv_traincascade application, grouped by purpose:

-# Common arguments:

    - -data \<cascade_dir_name\>

        Where the trained classifier should be stored.

    - -vec \<vec_file_name\>

        vec-file with positive samples (created by the opencv_createsamples utility).

    - -bg \<background_file_name\>

        Background description file.

    - -numPos \<number_of_positive_samples\>
    - -numNeg \<number_of_negative_samples\>

        Number of positive/negative samples used in training for every classifier stage.

    - -numStages \<number_of_stages\>

        Number of cascade stages to be trained.

    - -precalcValBufSize \<precalculated_vals_buffer_size_in_Mb\>

        Size of the buffer for precalculated feature values (in Mb).

    - -precalcIdxBufSize \<precalculated_idxs_buffer_size_in_Mb\>

        Size of the buffer for precalculated feature indices (in Mb). The more memory you have, the
        faster the training process.

    - -baseFormatSave

        This argument is relevant only for Haar-like features. If it is specified, the cascade will
        be saved in the old format.

    - -numThreads \<max_number_of_threads\>

        Maximum number of threads to use during training. Notice that the actual number of used
        threads may be lower, depending on your machine and compilation options.

-# Cascade parameters:

    - -stageType \<BOOST(default)\>

        Type of stages. Only boosted classifiers are supported as a stage type at the moment.

    - -featureType \<{HAAR(default), LBP}\>

        Type of features: HAAR - Haar-like features, LBP - local binary patterns.

    - -w \<sampleWidth\>
    - -h \<sampleHeight\>

        Size of training samples (in pixels). Must have exactly the same values as used during
        training samples creation (opencv_createsamples utility).

-# Boosted classifier parameters:

    - -bt \<{DAB, RAB, LB, GAB(default)}\>

        Type of boosted classifiers: DAB - Discrete AdaBoost, RAB - Real AdaBoost, LB - LogitBoost,
        GAB - Gentle AdaBoost.

    - -minHitRate \<min_hit_rate\>

        Minimal desired hit rate for each stage of the classifier. The overall hit rate may be
        estimated as (min_hit_rate\^number_of_stages).

    - -maxFalseAlarmRate \<max_false_alarm_rate\>

        Maximal desired false alarm rate for each stage of the classifier. The overall false alarm
        rate may be estimated as (max_false_alarm_rate\^number_of_stages).

    - -weightTrimRate \<weight_trim_rate\>

        Specifies whether trimming should be used and its weight. A decent choice is 0.95.

    - -maxDepth \<max_depth_of_weak_tree\>

        Maximal depth of a weak tree. A decent choice is 1, that is the case of stumps.

    - -maxWeakCount \<max_weak_tree_count\>

        Maximal count of weak trees for every cascade stage. The boosted classifier (stage) will
        have as many weak trees (\<=maxWeakCount) as needed to achieve the
        given -maxFalseAlarmRate.

-# Haar-like feature parameters:

    - -mode \<BASIC (default) | CORE | ALL\>

        Selects the type of Haar feature set used in training. BASIC uses only upright features,
        while ALL uses the full set of upright and 45 degree rotated features. See @cite Lienhart02
        for more details.

-# Local Binary Patterns parameters:

    Local Binary Patterns don't have parameters.
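
Putting the main arguments together, a training run might be started like this (the directory name,
file names and sample counts are only illustrative):
@code{.text}
opencv_traincascade -data cascade_dir -vec samples.vec -bg bg.txt \
    -numPos 900 -numNeg 500 -numStages 20 -w 24 -h 24 -featureType LBP
@endcode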

After the opencv_traincascade application has finished its work, the trained cascade will be saved
as cascade.xml in the folder which was passed as the -data parameter. Other files in this folder
are created for the case of an interrupted training, so you may delete them after the training has
completed.

Training is finished and you can test your cascade classifier!
@@ -0,0 +1,8 @@
OpenCV User Guide {#tutorial_user_guide}
=================

- @subpage tutorial_ug_mat
- @subpage tutorial_ug_features2d
- @subpage tutorial_ug_highgui
- @subpage tutorial_ug_traincascade
- @subpage tutorial_ug_intelperc