I am a big fan of Python. In the last article, you learned how to install YOLOv4. In this article, I will show you how to use YOLOv4 from Python, and then look at its performance.
Simple darknet.py
The darknet.py file downloaded from GitHub is quite large, which makes it hard to analyze on its own. However, darknet_video.py is implemented simply by importing darknet.py. Referring to that file, let's write code that runs detection on an image file. For the code below to work, darknet.py must be in the same directory.
from ctypes import *
import math
import random
import os
import cv2
import numpy as np
import time
import darknet
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--file', type=str, default='')
parser.add_argument('--weight', type=str, default='./weights//yolov4.weights', help='Yolo weight file')
parser.add_argument('--config', type=str, default='./cfg/yolov4.cfg', help='Yolo config file')
parser.add_argument('--meta', type=str, default='./cfg/coco.data', help='Yolo meta file')
parser.add_argument('--out', type=str, default='./yolo_out.jpg', help='output file')
opt = parser.parse_args()


def convertBack(x, y, w, h):
    xmin = int(round(x - (w / 2)))
    xmax = int(round(x + (w / 2)))
    ymin = int(round(y - (h / 2)))
    ymax = int(round(y + (h / 2)))
    return xmin, ymin, xmax, ymax


'''
The original code draws boxes on the inference image (608x608).
I modified it to draw on the original image instead.
'''
def cvDrawBoxes(detections, im, resized):
    img = im.copy()
    height, width, _ = im.shape
    rheight, rwidth, _ = resized.shape
    hrate = height / rheight
    wrate = width / rwidth
    for detection in detections:
        # Scale the detection box from network coordinates back to the original image.
        x, y, w, h = detection[2][0] * wrate, detection[2][1] * hrate, detection[2][2] * wrate, detection[2][3] * hrate
        xmin, ymin, xmax, ymax = convertBack(
            float(x), float(y), float(w), float(h))
        pt1 = (xmin, ymin)
        pt2 = (xmax, ymax)
        cv2.rectangle(img, pt1, pt2, (0, 255, 0), 1)
        cv2.putText(img,
                    detection[0].decode() +
                    " [" + str(round(detection[1] * 100, 2)) + "]",
                    (pt1[0], pt1[1] - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5,
                    [0, 255, 0], 2)
    return img


netMain = None
metaMain = None
altNames = None


def YOLO():
    global metaMain, netMain, altNames
    configPath = opt.config
    weightPath = opt.weight
    metaPath = opt.meta
    if not os.path.exists(configPath):
        raise ValueError("Invalid config path `" +
                         os.path.abspath(configPath) + "`")
    if not os.path.exists(weightPath):
        raise ValueError("Invalid weight path `" +
                         os.path.abspath(weightPath) + "`")
    if not os.path.exists(metaPath):
        raise ValueError("Invalid data file path `" +
                         os.path.abspath(metaPath) + "`")
    if netMain is None:
        netMain = darknet.load_net_custom(configPath.encode(
            "ascii"), weightPath.encode("ascii"), 0, 1)  # batch size = 1
    if metaMain is None:
        metaMain = darknet.load_meta(metaPath.encode("ascii"))
    if altNames is None:
        try:
            with open(metaPath) as metaFH:
                metaContents = metaFH.read()
                import re
                match = re.search("names *= *(.*)$", metaContents,
                                  re.IGNORECASE | re.MULTILINE)
                if match:
                    result = match.group(1)
                else:
                    result = None
                try:
                    if os.path.exists(result):
                        with open(result) as namesFH:
                            namesList = namesFH.read().strip().split("\n")
                            altNames = [x.strip() for x in namesList]
                except TypeError:
                    pass
        except Exception:
            pass

    # Create an image we reuse for each detect
    print('W:%d H:%d' % (darknet.network_width(netMain),
                         darknet.network_height(netMain)))
    darknet_image = darknet.make_image(darknet.network_width(netMain),
                                       darknet.network_height(netMain), 3)
    im = cv2.imread(opt.file, cv2.IMREAD_COLOR)
    rgb = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
    resized = cv2.resize(rgb,
                         (darknet.network_width(netMain),
                          darknet.network_height(netMain)),
                         interpolation=cv2.INTER_LINEAR)
    darknet.copy_image_from_bytes(darknet_image, resized.tobytes())
    # Run inference twice: the first run includes one-time initialization cost.
    for i in range(2):
        s = time.time()
        detections = darknet.detect_image(netMain, metaMain, darknet_image,
                                          thresh=0.25)
        FPS = 1 / (time.time() - s)
        print('Net FPS:%6.3f' % (FPS))
    image = cvDrawBoxes(detections, im, resized)
    cv2.imwrite(opt.out, image)


if __name__ == "__main__":
    YOLO()
<darknet_image.py>
Run the code.
root@jetpack-4:/usr/local/src/darknet# python3 darknet_image.py --file='../test_images/peds_0.jpg'
 Try to load cfg: ./cfg/yolov4.cfg, weights: ./weights//yolov4.weights, clear = 0
 0 : compute_capability = 530, cudnn_half = 0, GPU: NVIDIA Tegra X1
net.optimized_memory = 0
mini_batch = 1, batch = 8, time_steps = 1, train = 0
   layer   filters  size/strd(dil)      input                output
   0 conv     32       3 x 3/ 1    608 x 608 x   3 ->  608 x 608 x  32 0.639 BF
   1 conv     64       3 x 3/ 2    608 x 608 x  32 ->  304 x 304 x  64 3.407 BF
   2 conv     64       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  64 0.757 BF
   3 route  1                                      ->  304 x 304 x  64
   4 conv     64       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  64 0.757 BF
   5 conv     32       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  32 0.379 BF
   6 conv     64       3 x 3/ 1    304 x 304 x  32 ->  304 x 304 x  64 3.407 BF
   7 Shortcut Layer: 4,  wt = 0, wn = 0, outputs: 304 x 304 x  64 0.006 BF
   8 conv     64       1 x 1/ 1    304 x 304 x  64 ->  304 x 304 x  64 0.757 BF
   9 route  8 2                                    ->  304 x 304 x 128
.....
.....
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, cls_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000
Total BFLOPS 128.459
avg_outputs = 1068395
 Allocate additional workspace_size = 106.46 MB
 Try to load weights: ./weights//yolov4.weights
Loading weights from ./weights//yolov4.weights...
 seen 64, trained: 32032 K-images (500 Kilo-batches_64)
Done! Loaded 162 layers from weights-file
Loaded - names_list: data/coco.names, classes = 80
W:608 H:608
Net FPS: 0.363
Net FPS: 0.625
As I have mentioned many times, the first inference run always takes much longer because of one-time initialization costs. Therefore, it is better to treat the second and subsequent inference times as the model's real execution time.
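If you want a number that is less sensitive to a single run, the same idea can be extended into a small warm-up-and-average loop. This is only a minimal sketch that reuses the netMain, metaMain and darknet_image objects prepared in darknet_image.py above; the run count of 10 is an arbitrary choice:

# Minimal sketch: discard the first (warm-up) inference, then average several runs.
# Reuses netMain, metaMain and darknet_image from darknet_image.py above.
import time

darknet.detect_image(netMain, metaMain, darknet_image, thresh=0.25)  # warm-up run

runs = 10  # arbitrary choice; more runs give a smoother average
s = time.time()
for _ in range(runs):
    darknet.detect_image(netMain, metaMain, darknet_image, thresh=0.25)
print('Average net FPS: %6.3f' % (runs / (time.time() - s)))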
The processing speed with the yolov4.weights model was about 0.6 FPS. You can specify the network input size in the yolov4.cfg file (see the cfg excerpt below); the default is 608x608, which is the size the FPS value above was measured at. Lowering this value increases the FPS, so later I will measure the FPS while adjusting these values.
<yolo_out.jpg>
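For reference, the input size mentioned above is set in the [net] section at the top of cfg/yolov4.cfg. The relevant lines look like this (darknet requires the width and height to be multiples of 32):

[net]
# ... other [net] parameters ...
# network input size; must be multiples of 32.
# lowering these values trades accuracy for speed.
width=608
height=608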
Performance Test
I tested using the darknet_image.py file on the Jetson Nano. Before running the Python program, I adjusted the width and height values in cfg/yolov4.cfg.
After loading the model, I ran inference on the image twice and recorded the second value.
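I changed the cfg values by hand, but a sweep like this could also be scripted. The following is only a rough sketch of the idea, not the script I actually used; it patches width and height in a copy of the cfg and launches darknet_image.py once per size (each run reloads the weights, so the second FPS print of each run is the one to record):

# Hypothetical benchmark sweep: patch width/height in a copy of the cfg
# and run darknet_image.py once per size. Shown only to illustrate the
# procedure behind the table below.
import re
import subprocess

sizes = [608, 512, 416, 320]
src = './cfg/yolov4.cfg'

for size in sizes:
    with open(src) as f:
        cfg = f.read()
    cfg = re.sub(r'^width *= *\d+', 'width=%d' % size, cfg, flags=re.MULTILINE)
    cfg = re.sub(r'^height *= *\d+', 'height=%d' % size, cfg, flags=re.MULTILINE)
    tmp = './cfg/yolov4_%d.cfg' % size
    with open(tmp, 'w') as f:
        f.write(cfg)
    print('=== size %d ===' % size)
    subprocess.run(['python3', 'darknet_image.py',
                    '--file=../test_images/peds_0.jpg',
                    '--config=%s' % tmp])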
Model | Inference size (px) | FPS |
YOLOv4 | 608 | 0.63 |
YOLOv4 | 512 | 0.9 |
YOLOv4 | 416 | 1.29 |
YOLOv4 | 320 | 2.04 |
YOLOv4-tiny | 608 | 0.64 |
YOLOv4-tiny | 512 | 0.91 |
YOLOv4-tiny | 416 | 1.32 |
YOLOv4-tiny | 320 | 2.13 |
As you can see, FPS values between 0.6 and 2.2 were recorded. These values are insufficient for real-time video. However, the accuracy of YOLOv4 is very good; personally, I find the best object-detection accuracy comes from YOLOv4 and from PyTorch models using ResNet-50 and ResNet-101 backbones.
Comparison with other models
Let's do a simple comparison with the other object-detection models introduced in previous articles.
The comparison targets are PyTorch's Detectron2 and NVidia's DNN Vision Library.
Detectron2 was introduced at https://spyjetson.blogspot.com/2020/06/jetson-nano-detectron2-segmentation.html, and the DNN Vision Library at https://spyjetson.blogspot.com/2019/12/jetsonnano-hello-ai-world-nvidia-dnn_18.html.
Image | YOLOv4 (FPS) | PyTorch Detectron2 (FPS) | NVIDIA detectNet (FPS) |
humans_2.jpg | 0.633 | 0.212682 | 16.516910 |
city_1.jpg | 0.628 | 0.211959 | 10.345220 |
<humans_2.jpg result>
<city_1.jpg result>
The reason NVIDIA detectNet was run with ssd-mobilenet-v2, which is relatively inaccurate, is that detectNet does not provide models that are as accurate (and as slow) as those of the other two frameworks. detectNet was created with the Jetson series in mind, so it ships models that put processing speed first; the speed gap above is the result of that design choice.
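For reference, this is roughly how detectNet runs in those earlier posts. It is a minimal sketch using NVIDIA's jetson.inference Python bindings; exact function names such as loadImage can differ between jetson-inference versions:

# Minimal detectNet sketch using the jetson-inference Python bindings.
# API details (e.g. loadImage vs. loadImageRGBA) vary between versions.
import jetson.inference
import jetson.utils

net = jetson.inference.detectNet("ssd-mobilenet-v2", threshold=0.5)
img = jetson.utils.loadImage("../test_images/humans_2.jpg")

detections = net.Detect(img)
for d in detections:
    # Print the class name, confidence, and bounding-box corners.
    print(net.GetClassDesc(d.ClassID), d.Confidence,
          d.Left, d.Top, d.Right, d.Bottom)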
Although the sample size is too small to be conclusive, the above results suggest the following: recognition accuracy ranks Detectron2 first, then YOLOv4, then detectNet, and the accuracy gap between Detectron2 and YOLOv4 is relatively small.
As for processing speed, detectNet is overwhelmingly the fastest, and YOLOv4 is about three times faster than Detectron2.
Wrapping up
Although the processing speed of YOLOv4 is not satisfactory, its accuracy is excellent. In the next post, I will look at ways to speed up YOLOv4.
You can download the source code at https://github.com/raspberry-pi-maker/NVIDIA-Jetson.