Object Detection (Darknet)

This document explains how to use Darknet to train a YOLOv2 object detection model. After reading it, you will find that training and prediction are very simple; the most time-consuming and labor-intensive part is usually preparing the training data.

Here we briefly introduce the difference between object classification and object detection, what YOLO is, and what Darknet is.

The following figure illustrates object classification:

[Figure: object classification example]

Given a picture as input, the model tells us the probability that the picture contains a cat and the probability that it contains a dog, but it does not tell us how many cats and dogs there are or where they are in the picture.

Object detection is different:

[Figure: object detection example]

An object detection model tells us where each cat and dog is in the picture, along with the probability of each detected object's class (for example, how likely it is to be a cat). It therefore provides richer information, but it also places higher demands on the model.

I highly recommend the following article, which explains object detection in a very intuitive and detailed way:

http://machinethink.net/blog/object-detection/

There are many object detection algorithms. This article uses the well-known YOLO (You Only Look Once). As its name implies, YOLO produces its result in a single step (one-stage), so it is very fast, whereas R-CNN is a two-stage detector. Darknet is the open-source implementation written by the author of the YOLO paper, and since it is the author's own implementation, many people naturally want to try it.

Darknet is written in C, and its source code is available on GitHub. It can be used both to train new models and to make predictions with existing ones.

Next, we will show how to use Darknet to train a model that detects Minions.

[Figure: real-time Minion detection demo]

The first thing to do, of course, is to take pictures. Any camera will do, although results are better if the camera used to collect the data is the same one used for the final real-time prediction. That said, I took the pictures with the K210's camera and used a USB camera for the prediction in the animation above, and the results were still good.

Taking photos simply means collecting pictures that contain the objects you want to recognize. Take them from all angles and distances; this is very important! For example, if you want to recognize a Minion that is only half visible, your photos must include such examples. Here are the 300 photos I took:

[Figure: the 300 training photos]

Training sets for image recognition are often 10GB or more, but my photos were taken with the K210 camera and are tiny: the sensor captures 320x240 frames, which the script below crops to 224x224, and each photo is about 3KB. All 300 photos together take up less than 3MB, yet the final training results were quite good.

If you happen to have a K210, you can use the following MicroPython code to take a photo automatically every 1-2 seconds and save it to the SD card. A mobile phone camera works just as well:

# Untitled - By: RT-Thread - Fri Jul 19 2019

import sensor, image, time, lcd
from fpioa_manager import *
from Maix import GPIO
import os

def getMax():
    # Find the largest N among the files named N.jpg on the SD card, so that
    # a new session continues the numbering instead of overwriting old photos
    maxnum = 0
    files = os.listdir('/sd')
    for file in files:
        name = file.split(".")
        if len(name) > 1 and name[1] == "jpg":
            try:
                num = int(name[0])
            except ValueError:
                continue    # skip .jpg files whose names are not numbers
            if num > maxnum:
                maxnum = num
    return maxnum

# Map the red LED pin to GPIO0; it blinks once per capture cycle
fm.register(board_info.LED_R, fm.fpioa.GPIO0)
led_r = GPIO(GPIO.GPIO0, GPIO.OUT)

lcd.init(freq=15000000)

sensor.reset()
sensor.set_pixformat(sensor.RGB565) # Set pixel format to RGB565 (or GRAYSCALE)
sensor.set_framesize(sensor.QVGA)   # Set frame size to QVGA (320x240)
sensor.skip_frames(time = 2000)     # Wait for the settings to take effect
sensor.set_hmirror(0)               # Disable horizontal mirroring
sensor.set_windowing((224,224))     # Crop the frame to a 224x224 window
clock = time.clock()                # Create a clock object to track the FPS

i = getMax() + 1    # Number of the next photo

def capture():
    global i
    img = sensor.snapshot()
    filename = '/sd/' + str(i) + '.jpg'
    print(filename)
    img.save(filename)
    # Draw the photo number on the LCD preview only (the file is already saved)
    img.draw_string(2, 2, "%d" % i, color=(0,128,0), scale=2)
    lcd.display(img)
    i = i + 1

print("Start from %d" % i)

while True:
    led_r.value(0)      # Toggle the red LED as a cue that a photo is coming
    time.sleep(1)
    led_r.value(1)
    capture()

Do not use Chinese (or other non-ASCII) characters in the image file names. It is recommended to name the files with sequential numbers, as shown in the figure. At this point we should have hundreds of images containing the objects we want to detect.
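If the photos came from a phone camera, names such as IMG_0002.JPG can be normalized to sequential numbers before labeling. The sketch below uses a hypothetical helper, number_images, that copies (rather than renames) every .jpg/.jpeg file into a new folder as 1.jpg, 2.jpg, and so on, processing files in sorted name order so the numbering is repeatable:

```python
import os
import shutil


def number_images(src_dir, dst_dir, start=1):
    """Copy every .jpg/.jpeg in src_dir to dst_dir as start.jpg, start+1.jpg, ...

    Files are processed in sorted name order so the numbering is repeatable.
    The originals are left untouched. Returns the list of new file names.
    """
    os.makedirs(dst_dir, exist_ok=True)
    sources = sorted(
        f for f in os.listdir(src_dir)
        if os.path.splitext(f)[1].lower() in (".jpg", ".jpeg")
    )
    new_names = []
    for n, name in enumerate(sources, start=start):
        new_name = "%d.jpg" % n
        shutil.copy2(os.path.join(src_dir, name), os.path.join(dst_dir, new_name))
        new_names.append(new_name)
    return new_names
```

Copying into a separate folder keeps the original photos as a backup, and the start argument lets a later batch continue where the previous one ended.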

Labeling is probably the most tiring step: each of the 300 photos above has to be labeled by hand, one by one. It looks like this:

LabelImg is recommended for labeling:

The software is very simple to use. Open the folder containing your pictures, choose the folder where the labels will be saved, switch the save format to YOLO, and then mark the objects in each picture one by one. When you are done, each picture has a corresponding txt file:

If we open a txt file, we can see something like the following:

0     0.671875    0.575893   0.352679    0.848214

The meaning of each number is:

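object-class   object-center-x   object-center-y   object-width   object-height

The center coordinates, width and height are normalized by the image width and height, so they range from 0 to 1. Since we only have one category, the first number is always 0, which represents the first category.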
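To see what these numbers mean in pixels, a label line can be converted back to box corners. This is a minimal sketch with a hypothetical helper, yolo_to_pixels, assuming the YOLO convention that the center, width and height are fractions of the image width and height, and that the image size is known (224x224 for the K210 photos here):

```python
def yolo_to_pixels(line, img_w, img_h):
    """Convert one YOLO label line to a pixel-space box.

    line: "class x_center y_center width height", with the four
    coordinates normalized to [0, 1] by the image width/height.
    Returns (class_id, left, top, right, bottom) in pixels.
    """
    fields = line.split()
    class_id = int(fields[0])
    xc, yc, w, h = (float(v) for v in fields[1:5])
    left = (xc - w / 2) * img_w      # center minus half the width
    top = (yc - h / 2) * img_h       # center minus half the height
    right = (xc + w / 2) * img_w
    bottom = (yc + h / 2) * img_h
    return class_id, left, top, right, bottom


# The label from the txt file above, on a 224x224 photo
print(yolo_to_pixels("0 0.671875 0.575893 0.352679 0.848214", 224, 224))
```

Because the values are relative, the same label stays valid if the image is resized, which is why Darknet can train at a different input size than the photos were taken at.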

At this point, we should have a corresponding txt file for each picture we took, which contains the content mentioned above.
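Before moving on, it is worth checking that no picture was skipped during labeling. A minimal sketch with a hypothetical helper, check_labels, assuming each txt file sits next to its image with the same base name (1.jpg and 1.txt):

```python
import glob
import os


def check_labels(image_dir, num_classes=1):
    """Return a list of problems found in the YOLO label files next to the images."""
    problems = []
    for image_path in sorted(glob.glob(os.path.join(image_dir, "*.jpg"))):
        label_path = os.path.splitext(image_path)[0] + ".txt"
        if not os.path.exists(label_path):
            problems.append("missing label: " + label_path)
            continue
        with open(label_path) as f:
            for lineno, line in enumerate(f, start=1):
                fields = line.split()
                if not fields:
                    continue  # ignore blank lines
                if len(fields) != 5:
                    problems.append("%s:%d: expected 5 fields" % (label_path, lineno))
                    continue
                try:
                    class_id = int(fields[0])
                    coords = [float(v) for v in fields[1:]]
                except ValueError:
                    problems.append("%s:%d: not a number" % (label_path, lineno))
                    continue
                if not 0 <= class_id < num_classes:
                    problems.append("%s:%d: class %d out of range" % (label_path, lineno, class_id))
                if any(not 0.0 <= v <= 1.0 for v in coords):
                    problems.append("%s:%d: coordinate outside [0, 1]" % (label_path, lineno))
    return problems
```

An empty result means every image has a label file whose lines have the expected shape; any missing or malformed file is reported with its path and line number.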

Before starting training, we also need to split the labeled pictures into a training set and a test set, so that we can evaluate how well the model has trained.

Here is a Python script that generates the training and test sets automatically. Its goal is to produce two files, train.txt and test.txt, which list the training images and the test images respectively.

import glob, os

# Directory containing this script; the .jpg images are expected to be here
current_dir = os.path.dirname(os.path.abspath(__file__))

# Directory where the data will reside, relative to the darknet executable
# Change this to your own directory
path_data = './train/'

# Percentage of images to be used for the test set
percentage_test = 10

# Every index_test-th image goes to the test set (10% -> every 10th image)
index_test = round(100 / percentage_test)

# Create and/or truncate train.txt and test.txt; "with" closes both files
with open('train.txt', 'w') as file_train, open('test.txt', 'w') as file_test:
    counter = 1
    for pathAndFilename in glob.iglob(os.path.join(current_dir, "*.jpg")):
        title, ext = os.path.splitext(os.path.basename(pathAndFilename))

        if counter == index_test:
            counter = 1
            file_test.write(path_data + title + '.jpg' + "\n")
        else:
            file_train.write(path_data + title + '.jpg' + "\n")
            counter = counter + 1

Finally, train.txt and test.txt look like this:

The directory should now contain the photos, each with a corresponding txt file giving the position and size of the Minions, plus train.txt and test.txt, which list the file names of the training set and the test set.
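A quick sanity check on the result, sketched with a hypothetical helper, summarize_split: it counts the entries in both lists and fails if an image appears in both. With percentage_test = 10, the script sends every tenth image to the test set, so 300 photos should give 270 training entries and 30 test entries.

```python
def summarize_split(train_file="train.txt", test_file="test.txt"):
    """Return (n_train, n_test); raise ValueError if an image is in both lists."""
    with open(train_file) as f:
        train = [line.strip() for line in f if line.strip()]
    with open(test_file) as f:
        test = [line.strip() for line in f if line.strip()]
    overlap = set(train) & set(test)
    if overlap:
        raise ValueError("images in both sets: %s" % sorted(overlap))
    return len(train), len(test)
```

An image that appears in both lists would make the test results look better than they really are, which is why overlap is treated as an error rather than a warning.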

Training itself is very simple, but a GPU is strongly recommended, together with a Linux development environment; Ubuntu is the most widely used Linux distribution in scientific research. Before starting, install the graphics card driver, the CUDA toolchain, and the cuDNN library.

On an Intel i5 CPU, two full days of training completed only 200 iterations; on an NVIDIA GP104 GPU, 20,000 iterations finished in one morning.

After everything is installed, run nvidia-smi to check the CUDA version:

If the development environment is installed correctly, we can start training. First, download the source code:

git clone https://github.com/pjreddie/darknet

To use GPU training, we need to modify the Makefile:

GPU=1
CUDNN=1
OPENCV=0
OPENMP=0
DEBUG=0
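Instead of editing the Makefile by hand, the flags can be toggled with a short script. A minimal sketch with a hypothetical helper, set_make_flags, assuming each flag appears as NAME=value at the start of its own line, as in the snippet above:

```python
import re


def set_make_flags(makefile_text, **flags):
    """Return makefile_text with lines like 'GPU=0' rewritten to the given values.

    Only lines that start with NAME= are changed; conditionals such as
    'ifeq ($(GPU), 1)' are left alone. Raises KeyError for an unknown flag.
    """
    for name, value in flags.items():
        pattern = re.compile(r"^%s=.*$" % re.escape(name), re.MULTILINE)
        makefile_text, count = pattern.subn("%s=%s" % (name, value), makefile_text)
        if count == 0:
            raise KeyError("flag %s not found in Makefile" % name)
    return makefile_text
```

Usage: read the Makefile into a string, call set_make_flags(text, GPU=1, CUDNN=1), write the result back, and run make. The same call with OPENCV=1 prepares the rebuild needed later for the camera demo.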

Then compile:

make

After compiling darknet, you can see a darknet executable file in the source code root directory.

Next we add an obj.names file in the cfg directory, which defines our object categories:

minions

Since we only have one category, that is the whole file. Next, add an obj.data file in the cfg directory. The first line defines the number of object categories (just one here). The next three lines point to train.txt, test.txt and obj.names, which we created earlier, and the last line, backup, is the directory where trained models are saved.

classes= 1
train  = /home/wuhan/darknet/data/train.txt
valid  = /home/wuhan/darknet/data/test.txt
names = /home/wuhan/darknet/cfg/obj.names
backup = backup/
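Both files can also be generated by a script. This is a sketch under assumptions: the helper name write_darknet_data and its arguments are hypothetical; the train and valid paths are written as given, and the names entry is written as an absolute path, matching the style of the obj.data above.

```python
import os


def write_darknet_data(cfg_dir, class_names, train_list, valid_list, backup="backup/"):
    """Write obj.names and obj.data into cfg_dir and return the obj.data text."""
    names_path = os.path.join(cfg_dir, "obj.names")
    with open(names_path, "w") as f:
        f.write("\n".join(class_names) + "\n")  # one class name per line
    data_text = (
        "classes = %d\n" % len(class_names)
        + "train = %s\n" % train_list
        + "valid = %s\n" % valid_list
        + "names = %s\n" % os.path.abspath(names_path)
        + "backup = %s\n" % backup
    )
    with open(os.path.join(cfg_dir, "obj.data"), "w") as f:
        f.write(data_text)
    return data_text
```

For this project the call would be write_darknet_data("cfg", ["minions"], "/home/wuhan/darknet/data/train.txt", "/home/wuhan/darknet/data/test.txt"). Deriving classes from the length of the names list keeps the two files from disagreeing.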

Finally, because our model has only one category while the default YOLOv2 configuration is set up for the 80 COCO categories, we need to modify cfg/yolov2.cfg. The parts that need to be changed are marked with comments. I saved this configuration file as minionsv2.cfg:

[net]
# Testing
batch=16
subdivisions=1
# Training
# batch=64
# subdivisions=2
width=224     # input width (modified; a multiple of 32 is recommended)
height=224    # input height (modified; a multiple of 32 is recommended)
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 500200
policy=steps
steps=400000,450000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=512
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=30         # filters = (classes + 5) * 5
activation=linear

[region]
anchors =  0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828
bias_match=1
classes=1         # only 1 class
coords=4
num=5
softmax=1
jitter=.2
rescore=0

object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1

absolute=1
thresh = .6
random=1
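The filters comment in the cfg follows from what the region layer expects: each of the num anchor boxes predicts coords box values, one objectness score and one score per class, so filters = num * (coords + 1 + classes). With num=5 and coords=4 this is the (classes + 5) * 5 in the comment. A small sketch with hypothetical helper names that computes the value and checks that the anchors line holds num (width, height) pairs:

```python
def region_filters(num_classes, num_anchors=5, coords=4):
    """Filters needed in the last [convolutional] layer before a YOLOv2 [region] layer."""
    # per anchor: coords box values + 1 objectness score + one score per class
    return num_anchors * (coords + 1 + num_classes)


def count_anchor_pairs(anchors):
    """Count the (width, height) pairs in a comma-separated anchors value."""
    values = [float(v) for v in anchors.split(",")]
    if len(values) % 2:
        raise ValueError("anchors must come in (width, height) pairs")
    return len(values) // 2


anchors = ("0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, "
           "7.88282, 3.52778, 9.77052, 9.16828")
print(region_filters(1), count_anchor_pairs(anchors))
```

For one class this gives 30, matching minionsv2.cfg; for the 80 classes of the default yolov2.cfg it gives 425. The anchors line above has 5 pairs, consistent with num=5.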

Finally, we can start training:

./darknet detector train cfg/obj.data cfg/minionsv2.cfg darknet19_448.conv.23

Remember to replace the configuration files in the command with your own. The pre-trained weights used in the command, darknet19_448.conv.23 (78MB), can be downloaded to speed up training.

As mentioned earlier, trained models are saved in the backup directory. These are the model files saved automatically during my 20,000 training iterations:

minionsv2_10000.weights
minionsv2_100.weights
minionsv2_20000.weights
minionsv2_200.weights
minionsv2_300.weights
minionsv2_400.weights
minionsv2_500.weights
minionsv2_600.weights
minionsv2_700.weights
minionsv2_800.weights
minionsv2_900.weights
minionsv2.backup
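The listing is sorted by name rather than by iteration count, so picking the last .weights file by name would give minionsv2_900.weights instead of minionsv2_20000.weights. A minimal sketch with a hypothetical helper, latest_weights, that compares the iteration numbers instead (the minionsv2.backup file is ignored):

```python
import os
import re


def latest_weights(backup_dir, prefix="minionsv2"):
    """Return the path of the <prefix>_<N>.weights file with the largest N, or None."""
    pattern = re.compile(r"^%s_(\d+)\.weights$" % re.escape(prefix))
    best_iter, best_path = -1, None
    for name in os.listdir(backup_dir):
        match = pattern.match(name)
        if match and int(match.group(1)) > best_iter:
            best_iter = int(match.group(1))
            best_path = os.path.join(backup_dir, name)
    return best_path
```

The returned path can be passed as the weights argument of the demo command below.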

To see detection results from the camera in real time, we first need to recompile darknet, because accessing the camera requires the OpenCV library. Modify the Makefile:

GPU=1
CUDNN=1
OPENCV=1
OPENMP=0
DEBUG=0

Run make again to rebuild the darknet executable, then enter the following command to see the detection results:

./darknet detector demo cfg/obj.data cfg/minionsv2.cfg /home/wuhan/darknet/backup/minionsv2_10000.weights

Remember to change the above file path to your actual file path.

To summarize, training an object detection model with Darknet takes only a few steps:

  1. Take photos.

  2. Use LabelImg to mark the location of the objects you want to detect in each photo.

  3. Split the photos into a training set and a test set, and generate train.txt and test.txt.

  4. Install CUDA and cuDNN, then compile darknet.

  5. Add the configuration files cfg/obj.names and cfg/obj.data.

  6. Modify the YOLOv2 configuration cfg/yolov2.cfg for a single category, and download darknet19_448.conv.23.

  7. Start training on the GPU; the model is saved automatically.


Assoc. Prof. Wiroon Sriborrirux, Founder of Advance Innovation Center (AIC) and Bangsaen Design House (BDH), Electrical Engineering Department, Faculty of Engineering, Burapha University