Object Detection (Darknet)
This document shows how to use Darknet to train a YOLOv2 object detection model. After reading it, you will find that training and prediction are straightforward; the most time-consuming and tiring part is usually preparing the training data.
Here we briefly introduce the difference between object classification and object detection, what YOLO is, and what Darknet is.
The following figure illustrates object classification:
Given a picture as input, the model reports the probability that the picture contains a cat and the probability that it contains a dog. It does not say how many cats and dogs there are, nor where they are in the picture.
Object detection is different:
An object detection model reports where the cat and the dog are in the picture, together with the probability of each detected class. It therefore provides richer information, but it also places higher demands on the model.
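To make the difference concrete, here is a minimal sketch of what the two kinds of output might look like as data. The `Detection` type, the scores, and the box coordinates are all made up for illustration; they are not Darknet's actual output format.

```python
from typing import Dict, List, NamedTuple, Tuple


class Detection(NamedTuple):
    label: str                       # predicted class name
    confidence: float                # probability for that class
    box: Tuple[int, int, int, int]   # (left, top, right, bottom) in pixels


# Classification: one score per class for the whole image, no positions.
classification_output: Dict[str, float] = {"cat": 0.9, "dog": 0.8}

# Detection: one entry per object found, each with its own position.
detection_output: List[Detection] = [
    Detection("cat", 0.9, (20, 40, 150, 220)),
    Detection("dog", 0.8, (170, 30, 310, 230)),
]


def count_by_label(detections):
    """Count detected objects per class, which classification alone cannot do."""
    counts = {}
    for det in detections:
        counts[det.label] = counts.get(det.label, 0) + 1
    return counts


print(count_by_label(detection_output))
```

Because every detection carries its own box, counting and locating objects falls out of the output directly.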
I highly recommend this article, which explains object detection in a very intuitive and detailed way:
http://machinethink.net/blog/object-detection/
There are many object detection algorithms today. This article uses the well-known YOLO (You Only Look Once). As its name implies, YOLO produces its result in a single stage, so it is very fast (R-CNN-style detectors need two stages). Darknet is the open source implementation of the algorithm written by the paper's author. Because it is the author's own implementation, it naturally attracts a lot of interest.
Darknet is implemented in C, and its source code is available on GitHub. It can be used both to train new models and to make predictions with existing ones.
Next, we will walk through training a Minion detection model with Darknet.
The first step is, of course, to take pictures. Any camera will do, although results are better when the camera used to collect data is the same one used for real-time prediction. That said, I took the pictures with the camera on the K210 and used a USB camera for the prediction in the animation above, and the results were still good.
Taking photos simply means capturing images that contain the object you want to recognize, from every angle and distance. This is very important! For example, if you want to recognize a Minion when only half of it is visible, the photos must include such examples. Here are the 300 photos I took:
Training sets for image recognition are often 10GB or more, but my photos were taken with a K210 camera at 320x240, about 3KB each. All 300 photos together took up less than 3MB, yet the final training results were quite good.
If you happen to have a K210, you can use the following MicroPython code to take a photo automatically every 1-2 seconds and save it to the SD card. A mobile phone camera works just as well:
Do not use Chinese characters in the image file names. It is recommended to name the files with sequential numbers, as shown in the figure. At this point we should have several hundred images containing the object we want to detect.
This step is probably the most tiring one: each of the 300 photos above has to be labeled by hand. It looks like this:
It is recommended to use the software LabelImg for labeling:
The software is simple to use. Open the folder containing your pictures, choose the folder where the labels should be saved, make sure the save format is set to YOLO, and then label the pictures one by one. When you are done, each picture has a corresponding txt file:
If we open a txt file, we can see something like the following:
The meaning of each number is:
Since we only have one category, the first number is always 0, representing the first category.
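As a hedged illustration of the format: in YOLO mode, LabelImg writes one line per box as `class x_center y_center width height`, with the four coordinates normalized to the image width and height. The sketch below converts such a line back to pixel corners. The sample line and the helper name `yolo_to_pixels` are made up for illustration; the 320x240 size matches the photos described earlier.

```python
def yolo_to_pixels(line, img_w, img_h):
    """Convert one YOLO label line to (class_id, left, top, right, bottom) in pixels."""
    parts = line.split()
    class_id = int(parts[0])
    # The remaining four values are fractions of the image size.
    xc, yc, w, h = (float(v) for v in parts[1:5])
    left = (xc - w / 2) * img_w
    top = (yc - h / 2) * img_h
    right = (xc + w / 2) * img_w
    bottom = (yc + h / 2) * img_h
    return class_id, round(left), round(top), round(right), round(bottom)


# Hypothetical label: class 0, box centered in the image,
# a quarter of the image wide and half of the image tall.
sample = "0 0.5 0.5 0.25 0.5"
print(yolo_to_pixels(sample, 320, 240))
```

Normalized coordinates keep the labels valid even if the images are later resized.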
At this point, we should have a corresponding txt file for each picture we took, which contains the content mentioned above.
Before training, we also need to split the labeled pictures into a training set and a test set, so that we can evaluate how well the model has been trained.
Here is a Python script that generates the two sets automatically. Its only real purpose is to produce two files, train.txt and test.txt, which list the training images and the test images respectively.
Finally, train.txt and test.txt look like this:
At this point we should have the photos, one txt file per photo describing the position and size of the Minion, and the train.txt and test.txt files listing the file names of the training set and test set.
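A minimal sketch of such a split, assuming the images are .jpg files that sit in one folder next to their label files. The function name `split_dataset`, the 10% test ratio, and the fixed seed are illustrative choices, not necessarily what the original script did.

```python
import os
import random


def split_dataset(image_dir, test_ratio=0.1, seed=0):
    """Write train.txt and test.txt (one image path per line) into image_dir."""
    images = sorted(
        os.path.join(image_dir, name)
        for name in os.listdir(image_dir)
        if name.lower().endswith(".jpg")
    )
    # Shuffle with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(images)
    n_test = max(1, int(len(images) * test_ratio))
    test, train = images[:n_test], images[n_test:]
    with open(os.path.join(image_dir, "train.txt"), "w") as f:
        f.write("\n".join(train) + "\n")
    with open(os.path.join(image_dir, "test.txt"), "w") as f:
        f.write("\n".join(test) + "\n")
    return train, test
```

Calling `split_dataset("photos")` on a hypothetical `photos` folder would write both list files there. Darknet looks up each image's labels in the txt file with the same base name, so only image paths need to be listed.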
The training process itself is simple, but be sure to use a GPU and a Linux development environment. Ubuntu, the Linux distribution most widely used in scientific research, is a natural choice. Before training, install the graphics card driver, the CUDA toolchain, and the cuDNN library.
For comparison, two full days of training on an i5 CPU got through only 200 iterations, while an NVIDIA GP104 GPU completed 20,000 iterations in one morning.
After everything is installed, you can type nvidia-smi to see the CUDA version:
If the development environment is installed correctly, we can start training. First, download the source code:
To use GPU training, we need to modify the Makefile:
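The Darknet Makefile begins with switches such as `GPU=0`, `CUDNN=0` and `OPENCV=0`, and enabling GPU training means setting the first two to 1. Editing the file by hand is enough; the sketch below only shows the same change scripted. The helper name `set_makefile_flags` is hypothetical.

```python
import re


def set_makefile_flags(makefile_text, **settings):
    """Return makefile_text with lines such as 'GPU=0' rewritten to the given values."""
    for name, value in settings.items():
        # Match only a whole line that starts with the flag name.
        pattern = re.compile(r"^%s\s*=.*$" % re.escape(name), re.MULTILINE)
        makefile_text = pattern.sub("%s=%d" % (name, value), makefile_text)
    return makefile_text


original = "GPU=0\nCUDNN=0\nOPENCV=0\n"
print(set_makefile_flags(original, GPU=1, CUDNN=1))
```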
Then compile:
After compiling darknet, you can see a darknet executable file in the source code root directory.
Next we add an obj.names file in the cfg directory, which defines our object categories:
We have only one category, so the file contains a single name. Next, add an obj.data file in the cfg directory. Its first line gives the number of object categories, which is 1 in our case. The next three lines give the paths of the train.txt and test.txt files generated earlier and of the obj.names file. The last line, backup, is the directory where trained models are saved.
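A sketch of what obj.data contains, rendered by a small helper so the five keys are explicit. The keys `classes`, `train`, `valid`, `names` and `backup` are the ones Darknet reads; the helper name and the paths are assumptions and should be replaced with your own.

```python
def render_obj_data(classes, train, valid, names, backup):
    """Build the text of a Darknet .data file."""
    lines = [
        "classes = %d" % classes,   # number of object categories
        "train = %s" % train,       # list of training images
        "valid = %s" % valid,       # list of test images
        "names = %s" % names,       # one category name per line
        "backup = %s" % backup,     # where trained weights are saved
    ]
    return "\n".join(lines) + "\n"


# Hypothetical paths, relative to the darknet source directory.
print(render_obj_data(1, "train.txt", "test.txt", "cfg/obj.names", "backup/"))
```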
Finally, the stock YOLO configuration is not set up for a single category, so we need to modify cfg/yolov2.cfg. I have marked the parts that need to be changed and saved the result as minionsv2.cfg:
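In the stock yolov2.cfg, the values that depend on the number of categories are typically `classes` in the `[region]` section and `filters` in the convolutional layer just before it. The sketch below shows how that filter count is usually derived, assuming the default of 5 anchor boxes; the function name is hypothetical.

```python
def yolov2_last_conv_filters(num_classes, num_anchors=5, num_coords=4):
    """Filters needed in the convolutional layer before the [region] layer."""
    # Each anchor predicts box coordinates, one objectness score,
    # and one score per class.
    return num_anchors * (num_coords + 1 + num_classes)


for n in (1, 20, 80):
    print(n, yolov2_last_conv_filters(n))
```

For a single category this gives 30, so `classes=1` and `filters=30` would be the matching pair under these assumptions.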
Finally, we can start training:
Remember to replace the configuration file in the command above with your own. The pre-trained weights used in the command, darknet19_448.conv.23 (78MB), can be downloaded to speed up training.
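For reference, Darknet's training entry point has the form `./darknet detector train <data file> <cfg file> <pre-trained weights>`. The sketch below assembles that command from the files described in this article and can optionally launch it. The paths are assumptions about where you saved the files, and both function names are hypothetical.

```python
import subprocess


def build_train_command(data_file, cfg_file, pretrained_weights, darknet="./darknet"):
    """Return the argument list for a Darknet detector training run."""
    return [darknet, "detector", "train", data_file, cfg_file, pretrained_weights]


def train(data_file, cfg_file, pretrained_weights):
    """Run training and raise an error if darknet exits with a failure code."""
    subprocess.run(
        build_train_command(data_file, cfg_file, pretrained_weights),
        check=True,
    )


# Assumed locations; adjust them to your own setup.
cmd = build_train_command("cfg/obj.data", "cfg/minionsv2.cfg", "darknet19_448.conv.23")
print(" ".join(cmd))
```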
As mentioned earlier, the trained model can be found in the backup directory. This is the model file that was automatically saved after I trained it for 20,000 steps:
If we want to see the detection results from the camera in real time, we need to recompile darknet first, because accessing the camera requires OpenCV library support. Modify the Makefile:
Run make again to rebuild the darknet executable, then enter the following command to see the detection results:
Remember to change the above file path to your actual file path.
To summarize, training an object detection model with Darknet takes only a few steps:
Take photos
Use LabelImg to mark the location of the object you want to identify in each photo
Divide the photos into test set and training set, generate train.txt and test.txt
Compile darknet, configure CUDA, cuDNN
Add configuration files cfg/obj.names and cfg/obj.data
Adapt the YOLOv2 configuration cfg/yolov2.cfg to a single category, and download darknet19_448.conv.23
Start training with GPU, the model will be saved automatically