OpenCV in Practice [2]: Pedestrian Detection with HOG+SVM

What is HOG?

The Histogram of Oriented Gradients (HOG) is a feature descriptor used for object detection in computer vision and image processing. It builds features by computing histograms of the gradient directions over local regions of an image. HOG features combined with an SVM classifier have been widely used in image recognition.

HOG vs SIFT

SIFT: describes individual feature points (keypoints).
HOG: describes the features of a whole region.
1. It can express larger shapes.
2. It is well suited to pedestrian and vehicle detection.
Suppose we want to detect pedestrians for intelligent driving:
Positive sample:

Negative sample:

Recognition comes down to finding the most essential difference between positive and negative samples. For example, a pedestrian has roughly horizontal edges at the shoulders and vertical edges along both arms, whereas the edges in a non-pedestrian sample are disorganized. The shape can therefore be detected by building a histogram of gradients. Because a single histogram discards spatial information, HOG divides the image into small regions (similar to the block-wise approach used in thresholding), builds a histogram for each region, and concatenates them into one large histogram.
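To make the link between edges and gradients concrete, here is a minimal sketch in plain C++ (no OpenCV) of the gradient at a single pixel. The names GrayImage, Gradient, and gradientAt are hypothetical helpers introduced only for this illustration, and the central-difference kernel is an assumption chosen for simplicity.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper type: a row-major grayscale image.
struct GrayImage {
    int width;
    int height;
    std::vector<float> pixels;  // size = width * height

    float at(int x, int y) const { return pixels[y * width + x]; }
};

struct Gradient {
    float magnitude;
    float angleDeg;  // unsigned orientation in [0, 180)
};

// Central-difference gradient at an interior pixel (x, y).
Gradient gradientAt(const GrayImage& img, int x, int y) {
    assert(x > 0 && x < img.width - 1 && y > 0 && y < img.height - 1);
    const float gx = img.at(x + 1, y) - img.at(x - 1, y);
    const float gy = img.at(x, y + 1) - img.at(x, y - 1);
    const float kPi = 3.14159265358979f;
    float angle = std::atan2(gy, gx) * 180.0f / kPi;  // in [-180, 180]
    if (angle < 0.0f) angle += 180.0f;     // fold opposite directions together
    if (angle >= 180.0f) angle -= 180.0f;  // 180 is the same orientation as 0
    return {std::sqrt(gx * gx + gy * gy), angle};
}
```

A vertical edge (intensity changing from left to right) gives a gradient direction near 0 degrees, while a horizontal edge such as a shoulder line gives a direction near 90 degrees, which is why the two end up in different histogram bins.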
Disadvantages of HOG: it is slow, so real-time performance is poor, and it handles occlusion badly.
HOG features are also not robust to rotation or to changes in scale.

HOG steps

1. Gamma correction (enhances the contrast of the image)
2. Compute the gradient information
3. Compute a gradient histogram for each cell (a small block of pixels)
4. Compute the feature vector for each block (a group of several cells)
Specific steps:

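Generally, the gradient direction is divided into nine equal bins.
In this description 3*3 pixels form a cell; with nine direction bins, each cell yields a 9-dimensional histogram. (OpenCV's default HOGDescriptor uses 8*8-pixel cells and 16*16-pixel blocks instead.)
Every 3*3 cells form a block, and the histograms are normalized within each block: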

The purpose of normalization is to improve robustness to changes in brightness.
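The effect of block normalization can be sketched in a few lines of plain C++. This is an illustrative plain L2 normalization, not OpenCV's L2-Hys variant (which additionally clips and renormalizes); the function name l2Normalize and the eps value are assumptions made for this example.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// L2-normalize a block vector: v / sqrt(||v||^2 + eps^2).
// eps only protects against division by zero for an all-zero block.
std::vector<float> l2Normalize(const std::vector<float>& block, float eps = 1e-3f) {
    assert(eps > 0.0f);
    float sumSquares = 0.0f;
    for (float v : block) sumSquares += v * v;
    const float norm = std::sqrt(sumSquares + eps * eps);

    std::vector<float> out;
    out.reserve(block.size());
    for (float v : block) out.push_back(v / norm);
    return out;
}
```

If every gradient magnitude in a block is doubled (for example, because the region is lit twice as strongly), the normalized vector stays practically the same, which is the brightness robustness mentioned above.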

HOG's way of detecting pedestrians

The sliding window method is usually used:

Compute the gradient histogram of the pixels inside the sliding window, then compare it with the histogram of the pedestrian template (for example, using one of various distance measures). When the two are very similar, the window is considered a pedestrian region.
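This raises a problem:
The template has a fixed size, so it can only detect pedestrians of that size. How can a single template be used when pedestrians appear at different sizes in the image? The usual answer is to keep the window fixed and rescale the image repeatedly, which is what the scale argument of detectMultiScale in the code below controls.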
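The multi-scale idea can be sketched without OpenCV by listing the scales at which a fixed window still fits into the shrunken image. This is a simplified illustration, not OpenCV's exact internal logic; pyramidScales and its parameters are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <vector>

// Scales by which the image is shrunk while a fixed-size window still fits.
// scaleStep plays the role of the 1.05 factor passed to detectMultiScale.
std::vector<double> pyramidScales(int imgW, int imgH, int winW, int winH,
                                  double scaleStep, int maxLevels) {
    assert(scaleStep > 1.0 && winW > 0 && winH > 0 && maxLevels > 0);
    std::vector<double> scales;
    double scale = 1.0;
    for (int level = 0; level < maxLevels; ++level) {
        // Size of the image after shrinking it by the current scale.
        const int w = static_cast<int>(imgW / scale);
        const int h = static_cast<int>(imgH / scale);
        if (w < winW || h < winH) break;  // the window no longer fits
        scales.push_back(scale);
        scale *= scaleStep;
    }
    return scales;
}
```

A detection found at scale s in the shrunken image corresponds to a pedestrian roughly s times larger than the window in the original image. A smaller scaleStep gives more levels and finer size coverage at the cost of more computation.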

Opencv implementation

OpenCV implements two kinds of HOG-based pedestrian detection: SVM and Cascade. The cascade classifier files that come with OpenCV are located in "XX\opencv\sources\data\hogcascades" (usable with OpenCV 4.x).
The people detection sample that ships with OpenCV is in the samples directory of the installation (my install location is shown below):
D:\Program Files\opencv\sources\samples\cpp

The constructor of HOGDescriptor:

CV_WRAP HOGDescriptor() : winSize(64, 128), blockSize(16, 16), blockStride(8, 8),
    cellSize(8, 8), nbins(9), derivAperture(1), winSigma(-1),
    histogramNormType(HOGDescriptor::L2Hys), L2HysThreshold(0.2), gammaCorrection(true),
    free_coef(-1.f), nlevels(HOGDescriptor::DEFAULT_NLEVELS), signedGradient(false)
{ }

Window size winSize(64,128), block size blockSize(16,16), block stride blockStride(8,8), cell size cellSize(8,8), and number of gradient direction bins nbins(9).
These are member variables of HOGDescriptor; the values in parentheses are their defaults, and together they define the parameters of the HOG descriptor.
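These parameters fix the length of the feature vector for one detection window. The sketch below works through the arithmetic in plain C++; HogLayout and descriptorLength are hypothetical names introduced for illustration, and the formula assumes the window, block, stride, and cell sizes divide evenly, as they do for the defaults.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical plain struct mirroring the HOG layout parameters.
struct HogLayout {
    int winW, winH;        // detection window size in pixels
    int blockW, blockH;    // block size in pixels
    int strideW, strideH;  // block stride in pixels
    int cellW, cellH;      // cell size in pixels
    int nbins;             // direction bins per cell
};

// Number of floats in the descriptor of one window:
// (blocks per window) * (cells per block) * (bins per cell).
std::size_t descriptorLength(const HogLayout& p) {
    assert((p.winW - p.blockW) % p.strideW == 0);
    assert((p.winH - p.blockH) % p.strideH == 0);
    assert(p.blockW % p.cellW == 0 && p.blockH % p.cellH == 0);
    const int blocksX = (p.winW - p.blockW) / p.strideW + 1;
    const int blocksY = (p.winH - p.blockH) / p.strideH + 1;
    const int cellsPerBlock = (p.blockW / p.cellW) * (p.blockH / p.cellH);
    return static_cast<std::size_t>(blocksX) * blocksY * cellsPerBlock * p.nbins;
}
```

For the defaults above, descriptorLength({64, 128, 16, 16, 8, 8, 8, 8, 9}) gives 7 * 15 blocks, 4 cells per block, and 9 bins per cell, that is 3780 values per window.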

nbins is the number of gradient direction bins in a cell. For example, with nbins=9 each cell accumulates a histogram over 9 directions, and each bin spans 180/9=20 degrees.
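The 20-degree binning can be sketched as a small plain C++ function. This uses hard assignment to a single bin for clarity, which is a simplification (real implementations typically interpolate between neighboring bins); orientationBin is a hypothetical name for this example.

```cpp
#include <cassert>
#include <cmath>

// Map a gradient direction in degrees to one of nbins unsigned-orientation bins.
// Directions that differ by 180 degrees fall into the same bin.
int orientationBin(float angleDeg, int nbins = 9) {
    assert(nbins > 0);
    float a = std::fmod(angleDeg, 180.0f);  // fold into (-180, 180)
    if (a < 0.0f) a += 180.0f;              // then into [0, 180]
    const float binWidth = 180.0f / nbins;  // 20 degrees when nbins is 9
    int bin = static_cast<int>(a / binWidth);
    if (bin >= nbins) bin = nbins - 1;      // guard against rounding at the upper edge
    return bin;
}
```

With nbins=9, an angle of 35 degrees lands in bin 1 (20 to 40 degrees), and 215 degrees lands in the same bin because the gradient is treated as unsigned.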

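HOGDescriptor provides two pretrained detectors: getDaimlerPeopleDetector and getDefaultPeopleDetector. The code below pairs the default detector with the default 64*128 window and the Daimler detector with a 48*96 window.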

Pedestrian detection HOG+SVM steps

Reference code:

#include <opencv2/objdetect.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/videoio.hpp>
#include <iostream>
#include <iomanip>
#include <sstream>

using namespace cv;
using namespace std;

class Detector
{
    enum { Default, Daimler };  // detector modes
    int m;                      // current mode
    HOGDescriptor hog, hog_d;   // default (64x128) and Daimler (48x96) descriptors
public:
    // Constructor: the member initializer list sets the mode and builds both
    // descriptors; hog_d uses a 48x96 window to match the Daimler detector.
    Detector(int a) : m(a), hog(), hog_d(Size(48, 96), Size(16, 16), Size(8, 8), Size(8, 8), 9)
    {
        hog.setSVMDetector(HOGDescriptor::getDefaultPeopleDetector());
        hog_d.setSVMDetector(HOGDescriptor::getDaimlerPeopleDetector());
    }
    void toggleMode() { m = (m == Default ? Daimler : Default); }
    string modeName() const { return (m == Default ? "Default" : "Daimler"); }
    vector<Rect> detect(InputArray img)
    {
        // Run the detector with default parameters. To get a higher hit-rate
        // (and more false alarms, respectively), decrease the hitThreshold and
        // groupThreshold (set groupThreshold to 0 to turn off the grouping completely).
        vector<Rect> found;
        if (m == Default)
            hog.detectMultiScale(img, found, 0, Size(8, 8), Size(32, 32), 1.05, 2, false);
        else if (m == Daimler)
            hog_d.detectMultiScale(img, found, 0.5, Size(8, 8), Size(32, 32), 1.05, 2, true);
        return found;
    }
    void adjustRect(Rect& r) const
    {
        // The HOG detector returns slightly larger rectangles than the real objects,
        // so we slightly shrink the rectangles to get a nicer output.
        r.x += cvRound(r.width * 0.1);
        r.width = cvRound(r.width * 0.8);
        r.y += cvRound(r.height * 0.07);
        r.height = cvRound(r.height * 0.8);
    }
};

// Modify the parameters here
static const string keys = "{ help h   |   | print help message }"
                           "{ camera c | 0 | capture video from camera (device index starting from 0) }"
                           "{ video v  | D:/opencv/opencv4.0/opencv4.0.0/sources/samples/data/vtest.avi| use video as input }";

int main(int argc, char** argv)
{
    CommandLineParser parser(argc, argv, keys);  // keys describes the accepted command line parameters
    parser.about("This sample demonstrates the use of the HOG descriptor.");  // shown by printMessage
    if (parser.has("help"))
    {
        parser.printMessage();
        return 0;
    }
    int camera = parser.get<int>("camera");
    string file = parser.get<string>("video");
    if (!parser.check())  // check() returns false on parsing errors (conversion errors, missing parameters, etc.)
    {
        parser.printErrors();
        return 1;
    }

    VideoCapture cap;
    if (file.empty())
        cap.open(camera);
    else
        cap.open(file);
    if (!cap.isOpened())
    {
        cout << "Can not open video stream: '" << (file.empty() ? "<camera>" : file) << "'" << endl;
        return 2;
    }

    cout << "Press 'q' or <ESC> to quit." << endl;
    cout << "Press <space> to toggle between Default and Daimler detector" << endl;

    Detector detector(1);  // 1 selects the Daimler detector, 0 the default one
    Mat frame;
    for (;;)
    {
        cap >> frame;
        if (frame.empty())
        {
            cout << "Finished reading: empty frame" << endl;
            break;
        }
        int64 t = getTickCount();
        vector<Rect> found = detector.detect(frame);
        t = getTickCount() - t;

        // Show the mode and frame rate in the window
        {
            ostringstream buf;
            buf << "Mode: " << detector.modeName() << " ||| "
                << "FPS: " << fixed << setprecision(1) << (getTickFrequency() / (double)t);
            putText(frame, buf.str(), Point(10, 30), FONT_HERSHEY_PLAIN, 2.0, Scalar(0, 0, 255), 2, LINE_AA);
        }
        for (vector<Rect>::iterator i = found.begin(); i != found.end(); ++i)
        {
            Rect& r = *i;
            detector.adjustRect(r);
            rectangle(frame, r.tl(), r.br(), cv::Scalar(0, 255, 0), 2);
        }
        imshow("People detector", frame);

        // Interact with the user
        const char key = (char)waitKey(30);
        if (key == 27 || key == 'q')  // ESC
        {
            cout << "Exit requested" << endl;
            break;
        }
        else if (key == ' ')
        {
            detector.toggleMode();
        }
    }
    return 0;
}

Simplified detection on a single image

#include <opencv2/objdetect.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <iostream>

using namespace cv;
using namespace std;

class Detector
{
    enum { Default, Daimler };  // detector modes
    int m;                      // current mode
    HOGDescriptor hog, hog_d;   // default (64x128) and Daimler (48x96) descriptors
public:
    // Constructor: the member initializer list sets the mode and builds both descriptors.
    Detector(int a) : m(a), hog(), hog_d(Size(48, 96), Size(16, 16), Size(8, 8), Size(8, 8), 9)
    {
        hog.setSVMDetector(HOGDescriptor::getDefaultPeopleDetector());
        hog_d.setSVMDetector(HOGDescriptor::getDaimlerPeopleDetector());
    }
    void toggleMode() { m = (m == Default ? Daimler : Default); }
    string modeName() const { return (m == Default ? "Default" : "Daimler"); }
    vector<Rect> detect(InputArray img)
    {
        // Run the detector with default parameters. To get a higher hit-rate
        // (and more false alarms, respectively), decrease the hitThreshold and
        // groupThreshold (set groupThreshold to 0 to turn off the grouping completely).
        vector<Rect> found;
        if (m == Default)
            hog.detectMultiScale(img, found, 0, Size(8, 8), Size(32, 32), 1.05, 2, false);
        else if (m == Daimler)
            hog_d.detectMultiScale(img, found, 0.5, Size(8, 8), Size(32, 32), 1.05, 2, true);
        return found;
    }
    void adjustRect(Rect& r) const
    {
        // The HOG detector returns slightly larger rectangles than the real objects,
        // so we slightly shrink the rectangles to get a nicer output.
        r.x += cvRound(r.width * 0.1);
        r.width = cvRound(r.width * 0.8);
        r.y += cvRound(r.height * 0.07);
        r.height = cvRound(r.height * 0.8);
    }
};

int main(int argc, char** argv)
{
    Detector detector(1);  // 1 selects the Daimler detector, 0 the default one
    Mat img = imread("D:\\opencv_picture_test\\HOG pedestrian detection\\timg.jpg");
    if (img.empty())
    {
        cout << "image load failed!" << endl;
        return -1;
    }
    vector<Rect> found = detector.detect(img);
    for (vector<Rect>::iterator i = found.begin(); i != found.end(); ++i)
    {
        Rect& r = *i;
        detector.adjustRect(r);
        rectangle(img, r.tl(), r.br(), cv::Scalar(0, 255, 0), 2);
    }
    imshow("People detector", img);
    waitKey(0);
    return 0;
}

result:

Simplified HOG calculation


Background knowledge needed:

Manhattan distance:
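The Manhattan distance between two vectors a and b of equal length is the sum of the absolute differences of their elements, d(a, b) = |a_1 - b_1| + ... + |a_n - b_n|. A minimal plain C++ sketch follows; the function name manhattanDistance is chosen for this example and is not an OpenCV call.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Sum of absolute element-wise differences between two equal-length vectors.
float manhattanDistance(const std::vector<float>& a, const std::vector<float>& b) {
    assert(a.size() == b.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += std::fabs(a[i] - b[i]);
    }
    return sum;
}
```

A smaller distance means the two histograms are more alike, so the candidate image with the smaller distance to the template is taken as the better match. It avoids the squares and square root of the Euclidean distance.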

A plain array cannot be declared with a size that is only known at run time (a variable inside the []), so the arrays here are allocated dynamically with new[]. The memory must be released with delete[] before the program returns.
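The allocation pattern described above can be sketched in isolation, together with a std::vector alternative that frees its memory automatically. Both function names are hypothetical and only illustrate the two options.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Option 1: raw dynamic array, as described in the text.
// The caller owns the memory and must release it with delete[].
float* makeZeroedHistogram(int bins) {
    assert(bins > 0);
    return new float[bins]();  // the trailing () zero-initializes every element
}

// Option 2: std::vector, which is sized at run time as well
// and releases its memory when it goes out of scope.
std::vector<float> makeZeroedHistogramVec(int bins) {
    assert(bins > 0);
    return std::vector<float>(static_cast<std::size_t>(bins), 0.0f);
}
```

With the first option every new[] needs a matching delete[] on every return path; the second option removes that obligation and may be the safer choice when the surrounding code has early returns.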

#include <opencv2/opencv.hpp>
#include <iostream>
#include <cmath>
#include <cstring>

using namespace cv;
using namespace std;

/*--------------- Manually implement a simplified HOG descriptor ---------------*/

const int kAngleBins = 8;  // 360 degrees / 45 degrees per bin

// Quantize an angle in degrees into one of 8 bins of 45 degrees each.
int angle_lianghua(float angle)
{
    int result = (int)(angle / 45);
    if (result >= kAngleBins) result = kAngleBins - 1;  // guard against an angle of exactly 360
    return result;
}

// Fill hist (nx * ny * 8 floats, zeroed by the caller) with the magnitude-weighted
// direction histogram of every cell of a single-channel image.
void calcCellHist(const Mat& gray, int cellSize, float* hist)
{
    // Gradient, magnitude, and direction for all pixels
    Mat gx, gy;
    Mat mag, angle;
    Sobel(gray, gx, CV_32F, 1, 0, 1);
    Sobel(gray, gy, CV_32F, 0, 1, 1);
    cartToPolar(gx, gy, mag, angle, true);  // true: angles in degrees (false would give radians)

    int nx = gray.cols / cellSize;  // number of cells in each row
    int ny = gray.rows / cellSize;  // number of cells in each column
    int binnum = 0;                 // index of the current cell

    for (int j = 0; j < ny; j++)
    {
        for (int i = 0; i < nx; i++)
        {
            // Compute the histogram of this cell
            for (int y = j * cellSize; y < (j + 1) * cellSize; y++)
            {
                for (int x = i * cellSize; x < (i + 1) * cellSize; x++)
                {
                    int bin = angle_lianghua(angle.at<float>(y, x));  // quantized direction of the pixel
                    float magnitude = mag.at<float>(y, x);            // gradient magnitude of the pixel
                    hist[bin + binnum * kAngleBins] += magnitude;     // accumulate into the cell histogram
                }
            }
            binnum++;  // move on to the next cell
        }
    }
}

int main()
{
    // Read the images as grayscale so that every gradient Mat has a single float channel
    Mat src_image = imread("D:\\opencv_picture_test\\HOG pedestrian detection\\hogTemplate.jpg", IMREAD_GRAYSCALE);
    Mat img1 = imread("D:\\opencv_picture_test\\HOG pedestrian detection\\img1.jpg", IMREAD_GRAYSCALE);
    Mat img2 = imread("D:\\opencv_picture_test\\HOG pedestrian detection\\img2.jpg", IMREAD_GRAYSCALE);

    // Error check: fail if any of the three images could not be loaded
    if (src_image.empty() || img1.empty() || img2.empty())
    {
        cout << "image load failed!" << endl;
        return -1;
    }
    // The element-wise comparison below assumes all three images have the same size
    if (img1.rows != src_image.rows || img1.cols != src_image.cols ||
        img2.rows != src_image.rows || img2.cols != src_image.cols)
    {
        cout << "images must have the same size!" << endl;
        return -1;
    }

    int cellSize = 16;                   // size of each cell in pixels
    int nx = src_image.cols / cellSize;  // number of cells in each row
    int ny = src_image.rows / cellSize;  // number of cells in each column
    int bins = nx * ny * kAngleBins;     // total histogram length

    // The length is only known at run time, so allocate the arrays dynamically
    float* ref_hist = new float[bins];
    float* ref_hist_img1 = new float[bins];
    float* ref_hist_img2 = new float[bins];
    memset(ref_hist, 0, sizeof(float) * bins);
    memset(ref_hist_img1, 0, sizeof(float) * bins);
    memset(ref_hist_img2, 0, sizeof(float) * bins);

    // [1]-[3] Compute the histograms of hogTemplate, img1, and img2
    calcCellHist(src_image, cellSize, ref_hist);
    calcCellHist(img1, cellSize, ref_hist_img1);
    calcCellHist(img2, cellSize, ref_hist_img2);

    // [4] Compute the distance between ref_hist and each of ref_hist_img1 and ref_hist_img2.
    // To simplify the calculation, use the sum of absolute differences (Manhattan distance)
    // instead of the square root of the sum of squares.
    float result1 = 0;
    float result2 = 0;
    for (int i = 0; i < bins; i++)
    {
        result1 += fabs(ref_hist[i] - ref_hist_img1[i]);
        result2 += fabs(ref_hist[i] - ref_hist_img2[i]);
    }

    cout << result1 << endl;
    cout << result2 << endl;
    if (result1 < result2)
        cout << "img1 is more similar to the original image" << endl;
    else
        cout << "img2 is more similar to the original image" << endl;

    delete[] ref_hist;
    delete[] ref_hist_img1;
    delete[] ref_hist_img2;
    return 0;
}

result:

