I am a big fan of Python. In the last article, you learned how to install YOLOv4. In this article, I will show you how to use YOLOv4 from Python, and then look at its performance.
Simple darknet.py
The darknet.py file downloaded from GitHub is quite large, which makes it hard to analyze on its own. However, darknet_video.py is implemented simply by importing darknet.py. Referring to that file, let's write code that runs detection on an image file. For the code below to work, darknet.py must be in the same directory.
from ctypes import *
import math
import random
import os
import cv2
import numpy as np
import time
import darknet
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--file', type=str, default='')
parser.add_argument('--weight', type=str, default='./weights//yolov4.weights', help='Yolo weight file')
parser.add_argument('--config', type=str, default='./cfg/yolov4.cfg', help='Yolo config file')
parser.add_argument('--meta', type=str, default='./cfg/coco.data', help='Yolo meta file')
parser.add_argument('--out', type=str, default='./yolo_out.jpg', help='output file')
opt = parser.parse_args()


def convertBack(x, y, w, h):
    xmin = int(round(x - (w / 2)))
    xmax = int(round(x + (w / 2)))
    ymin = int(round(y - (h / 2)))
    ymax = int(round(y + (h / 2)))
    return xmin, ymin, xmax, ymax


'''
The original code draws boxes on the inference image (608x608).
I modified it to draw on the original image instead.
'''
def cvDrawBoxes(detections, im, resized):
    img = im.copy()
    height, width, _ = im.shape
    rheight, rwidth, _ = resized.shape
    hrate = height / rheight
    wrate = width / rwidth
    for detection in detections:
        # Scale the detection box from network coordinates back to the original image.
        x, y, w, h = detection[2][0] * wrate, detection[2][1] * hrate, detection[2][2] * wrate, detection[2][3] * hrate
        xmin, ymin, xmax, ymax = convertBack(
            float(x), float(y), float(w), float(h))
        pt1 = (xmin, ymin)
        pt2 = (xmax, ymax)
        cv2.rectangle(img, pt1, pt2, (0, 255, 0), 1)
        cv2.putText(img,
                    detection[0].decode() +
                    " [" + str(round(detection[1] * 100, 2)) + "]",
                    (pt1[0], pt1[1] - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
                    [0, 255, 0], 2)
    return img


netMain = None
metaMain = None
altNames = None


def YOLO():
    global metaMain, netMain, altNames
    configPath = opt.config
    weightPath = opt.weight
    metaPath = opt.meta
    if not os.path.exists(configPath):
        raise ValueError("Invalid config path `" +
                         os.path.abspath(configPath) + "`")
    if not os.path.exists(weightPath):
        raise ValueError("Invalid weight path `" +
                         os.path.abspath(weightPath) + "`")
    if not os.path.exists(metaPath):
        raise ValueError("Invalid data file path `" +
                         os.path.abspath(metaPath) + "`")
    if netMain is None:
        netMain = darknet.load_net_custom(configPath.encode(
            "ascii"), weightPath.encode("ascii"), 0, 1)  # batch size = 1
    if metaMain is None:
        metaMain = darknet.load_meta(metaPath.encode("ascii"))
    if altNames is None:
        try:
            with open(metaPath) as metaFH:
                metaContents = metaFH.read()
                import re
                match = re.search("names *= *(.*)$", metaContents,
                                  re.IGNORECASE | re.MULTILINE)
                if match:
                    result = match.group(1)
                else:
                    result = None
                try:
                    if os.path.exists(result):
                        with open(result) as namesFH:
                            namesList = namesFH.read().strip().split("\n")
                            altNames = [x.strip() for x in namesList]
                except TypeError:
                    pass
        except Exception:
            pass

    # Create an image we reuse for each detect
    print('W:%d H:%d' % (darknet.network_width(netMain),
                         darknet.network_height(netMain)))
    darknet_image = darknet.make_image(darknet.network_width(netMain),
                                       darknet.network_height(netMain), 3)
    im = cv2.imread(opt.file, cv2.IMREAD_COLOR)
    rgb = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
    resized = cv2.resize(rgb,
                         (darknet.network_width(netMain),
                          darknet.network_height(netMain)),
                         interpolation=cv2.INTER_LINEAR)
    darknet.copy_image_from_bytes(darknet_image, resized.tobytes())
    # Run inference twice: the first run includes one-time initialization cost.
    for i in range(2):
        s = time.time()
        detections = darknet.detect_image(netMain, metaMain, darknet_image,
                                          thresh=0.25)
        FPS = 1 / (time.time() - s)
        print('Net FPS:%6.3f' % (FPS))
    image = cvDrawBoxes(detections, im, resized)
    cv2.imwrite(opt.out, image)


if __name__ == "__main__":
    YOLO()
<darknet_image.py>
Run the code.
root@jetpack-4:/usr/local/src/darknet# python3 darknet_image.py --file='../test_images/peds_0.jpg'
 Try to load cfg: ./cfg/yolov4.cfg, weights: ./weights//yolov4.weights, clear = 0
 0 : compute_capability = 530, cudnn_half = 0, GPU: NVIDIA Tegra X1
net.optimized_memory = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF
   1 conv     64       3 x 3/ 2    608 x 608 x  32 ->  304 x 304 x  64 3.407 BF
   2 conv     64       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  64 0.757 BF
   3 route  1                                      ->  304 x 304 x  64
   4 conv     64       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  64 0.757 BF
   5 conv     32       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  32 0.379 BF
   6 conv     64       3 x 3/ 1    304 x 304 x  32 ->  304 x 304 x  64 3.407 BF
   7 Shortcut Layer: 4,  wt = 0, wn = 0, outputs: 304 x 304 x  64 0.006 BF
   8 conv     64       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  64 0.757 BF
   9 route  8 2                                    ->  304 x 304 x 128
.....
.....
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000
Total BFLOPS 128.459
avg_outputs = 1068395
 Allocate additional workspace_size = 106.46 MB
 Try to load weights: ./weights//yolov4.weights
Loading weights from ./weights//yolov4.weights...
 seen 64, trained: 32032 K-images (500 Kilo-batches_64)
Done! Loaded 162 layers from weights-file
Loaded - names_list: data/coco.names, classes = 80
W:608 H:608
Net FPS: 0.363
Net FPS: 0.625
As I have mentioned many times, the first inference run always takes much longer because of one-time initialization costs. Therefore, it is better to treat the second and subsequent inference times as the model's real execution time.
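If you want a number that is less sensitive to a single run, the same idea can be extended into a small warm-up-and-average loop. This is only a minimal sketch that reuses the netMain, metaMain and darknet_image objects prepared in darknet_image.py above; the run count of 10 is an arbitrary choice:

# Minimal sketch: discard the first (warm-up) inference, then average several runs.
# Reuses netMain, metaMain and darknet_image from darknet_image.py above.
import time

darknet.detect_image(netMain, metaMain, darknet_image, thresh=0.25)  # warm-up run

runs = 10  # arbitrary choice; more runs give a smoother average
s = time.time()
for _ in range(runs):
    darknet.detect_image(netMain, metaMain, darknet_image, thresh=0.25)
print('Average net FPS: %6.3f' % (runs / (time.time() - s)))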
The processing speed with the yolov4.weights model was about 0.6 FPS. You can specify the network input size in the yolov4.cfg file (see the cfg excerpt below); the default is 608x608, which is the size the FPS value above was measured at. Lowering this value increases the FPS, so later I will measure the FPS while adjusting these values.
<yolo_out.jpg>
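For reference, the input size mentioned above is set in the [net] section at the top of cfg/yolov4.cfg. The relevant lines look like this (darknet requires the width and height to be multiples of 32):

[net]
# ... other [net] parameters ...
# network input size; must be multiples of 32.
# lowering these values trades accuracy for speed.
width=608
height=608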
Performance Test
I tested using the darknet_image.py file on the Jetson Nano. Before running the Python program, I adjusted the width and height values in cfg/yolov4.cfg.
After loading the model, I ran inference on the image twice and recorded the second value.
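I changed the cfg values by hand, but a sweep like this could also be scripted. The following is only a rough sketch of the idea, not the script I actually used; it patches width and height in a copy of the cfg and launches darknet_image.py once per size (each run reloads the weights, so the second FPS print of each run is the one to record):

# Hypothetical benchmark sweep: patch width/height in a copy of the cfg
# and run darknet_image.py once per size. Shown only to illustrate the
# procedure behind the table below.
import re
import subprocess

sizes = [608, 512, 416, 320]
src = './cfg/yolov4.cfg'

for size in sizes:
    with open(src) as f:
        cfg = f.read()
    cfg = re.sub(r'^width *= *\d+', 'width=%d' % size, cfg, flags=re.MULTILINE)
    cfg = re.sub(r'^height *= *\d+', 'height=%d' % size, cfg, flags=re.MULTILINE)
    tmp = './cfg/yolov4_%d.cfg' % size
    with open(tmp, 'w') as f:
        f.write(cfg)
    print('=== size %d ===' % size)
    subprocess.run(['python3', 'darknet_image.py',
                    '--file=../test_images/peds_0.jpg',
                    '--config=%s' % tmp])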
Model | Inference size (px) | FPS |
YOLOv4 | 608 | 0.63 |
YOLOv4 | 512 | 0.9 |
YOLOv4 | 416 | 1.29 |
YOLOv4 | 320 | 2.04 |
YOLOv4-tiny | 608 | 0.64 |
YOLOv4-tiny | 512 | 0.91 |
YOLOv4-tiny | 416 | 1.32 |
YOLOv4-tiny | 320 | 2.13 |
As you can see, FPS values between 0.6 and 2.2 were recorded. These values are insufficient for real-time video. However, the accuracy of YOLOv4 is very good; personally, I find the best object-detection accuracy comes from YOLOv4 and from PyTorch models using ResNet-50 and ResNet-101 backbones.
Comparison with other models
Let's do a simple comparison with the other object-detection models introduced in previous articles.
The comparison targets are PyTorch's Detectron2 and NVidia's DNN Vision Library.
Detectron2 was introduced at https://spyjetson.blogspot.com/2020/06/jetson-nano-detectron2-segmentation.html, and the DNN Vision Library at https://spyjetson.blogspot.com/2019/12/jetsonnano-hello-ai-world-nvidia-dnn_18.html.
Image | YOLOv4 (FPS) | PyTorch Detectron2 (FPS) | NVIDIA detectNet (FPS) |
humans_2.jpg | 0.633 | 0.212682 | 16.516910 |
city_1.jpg | 0.628 | 0.211959 | 10.345220 |
<humans_2.jpg result>
<city_1.jpg result>
The reason NVIDIA detectNet was run with ssd-mobilenet-v2, which is relatively inaccurate, is that detectNet does not provide models that are as accurate (and as slow) as those of the other two frameworks. detectNet was created with the Jetson series in mind, so it ships models that put processing speed first; the speed gap above is the result of that design choice.
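For reference, this is roughly how detectNet runs in those earlier posts. It is a minimal sketch using NVIDIA's jetson.inference Python bindings; exact function names such as loadImage can differ between jetson-inference versions:

# Minimal detectNet sketch using the jetson-inference Python bindings.
# API details (e.g. loadImage vs. loadImageRGBA) vary between versions.
import jetson.inference
import jetson.utils

net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
img = jetson.utils.loadImage("../test_images/humans_2.jpg")

detections = net.Detect(img)
for d in detections:
    # Print the class name, confidence, and bounding-box corners.
    print(net.GetClassDesc(d.ClassID), d.Confidence,
          d.Left, d.Top, d.Right, d.Bottom)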
Although the sample size is too small to be conclusive, the above results suggest the following: recognition accuracy ranks Detectron2 first, then YOLOv4, then detectNet, and the accuracy gap between Detectron2 and YOLOv4 is relatively small.
As for processing speed, detectNet is overwhelmingly the fastest, and YOLOv4 is about three times faster than Detectron2.
Wrapping up
Although the processing speed of YOLOv4 is not satisfactory, its accuracy is excellent. In the next post, I will look at ways to speed up YOLOv4.
You can download the source code at https://github.com/raspberry-pi-maker/NVIDIA-Jetson.