Friday, March 10, 2023

Xavier NX - YOLOv8 Object Detection (JetPack 5.1)

 

In the last two posts, I explained how to install JetPack 5.1 on the Xavier NX and how to install YOLOv8.


Prerequisites


YOLOv8 is still being updated. Ultralytics, which released YOLOv8, continuously publishes updates on GitHub, so it is a good idea to update from time to time.
The update method is simple: since we installed the ultralytics package with pip inside the Anaconda virtual environment, you can also use pip to upgrade it.

pip install ultralytics --upgrade


YOLOv8 Object Detection Models

YOLOv8 provides pre-trained models. These models were trained on the COCO dataset.

<YOLOv8 pre-trained models>

As shown in the table above, the YOLOv8n model is the lightest and the YOLOv8x model is the heaviest. The lighter the model, the less memory it uses and the faster it runs, but the lower its accuracy.

You can collect training data and create your own custom models, but in this article I will use pre-trained models.
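Switching between these variants only requires changing the weight file name when loading the model; ultralytics downloads the official weights automatically the first time they are used. A minimal sketch:

from ultralytics import YOLO

# Any of yolov8n.pt / yolov8s.pt / yolov8m.pt / yolov8l.pt / yolov8x.pt works here.
# The weight file is downloaded automatically on first use.
model = YOLO("yolov8s.pt")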

COCO Dataset

COCO is a large-scale object detection, segmentation, and captioning dataset. COCO has several features:

  • Object segmentation
  • Recognition in context
  • Superpixel stuff segmentation
  • 330K images (>200K labeled)
  • 1.5 million object instances
  • 80 object categories
  • 91 stuff categories
  • 5 captions per image
  • 250,000 people with keypoints
Many machine learning models are trained on the COCO dataset and evaluated against it. The pre-trained YOLOv8 models were also trained on COCO.
Object detection with a COCO-trained model covers a total of 80 classes.


<COCO model 80 labels>


The YOLOv8 pre-trained model can therefore detect 80 kinds of objects according to this classification.
The label values can be checked in the coco.yaml file.
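You can also inspect this mapping in code: a loaded model exposes it through the names attribute, a dict that maps each class index to its name.

from ultralytics import YOLO

model = YOLO("yolov8n.pt")
# Print all 80 COCO class indices and names (0: person, 1: bicycle, ...)
for idx, name in model.names.items():
    print(idx, name)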


YOLOv8 Object Detection

YOLOv8 provides two ways to run object detection: the first is the CLI command and the second is the Python API.

The documentation for YOLOv8 object detection is at https://docs.ultralytics.com/tasks/detection/.


CLI command

This method uses the yolo command provided by YOLOv8. The yolo command can perform the same functions as the Python API, such as training, validation, and prediction.

yolo detect predict model=yolov8n.pt source="https://ultralytics.com/images/bus.jpg"  # predict with official model
yolo detect predict model=path/to/best.pt source="https://ultralytics.com/images/bus.jpg"  # predict with custom model

The basic CLI usage is shown above. I will use the first predict command, since I am working with a pre-trained model.
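For reference, training and validation follow the same command pattern; these lines are sketches of the forms shown in the Ultralytics documentation:

yolo detect train data=coco128.yaml model=yolov8n.pt epochs=100 imgsz=640  # train on a dataset
yolo detect val model=yolov8n.pt  # validate the trained model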

If you are testing over ssh without X11 forwarding, it is better to save the result, because you cannot easily view the image directly.


(base) spypiggy@spypiggy-NX:~$ conda activate yolov8
(yolov8) spypiggy@spypiggy-NX:~$ yolo detect predict model=yolov8n.pt source="https://ultralytics.com/images/bus.jpg" save=True show=False
Ultralytics YOLOv8.0.51 🚀 Python-3.8.16 torch-1.14.0a0+44dac51c.nv23.02 CUDA:0 (Xavier, 6857MiB)
YOLOv8n summary (fused): 168 layers, 3151904 parameters, 0 gradients, 8.7 GFLOPs

Downloading https://ultralytics.com/images/bus.jpg to bus.jpg...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 476k/476k [00:00<00:00, 3.94MB/s]
image 1/1 /home/spypiggy/bus.jpg: 640x480 4 persons, 1 bus, 1 stop sign, 86.2ms
Speed: 2.4ms preprocess, 86.2ms inference, 23.2ms postprocess per image at shape (1, 3, 640, 640)
Results saved to runs/detect/predict


You can see that the resulting image is saved as ./runs/detect/predict/bus.jpg. Opening the file shows an image like this.

<YOLOv8 detected image - /runs/detect/predict/bus.jpg>

If you test in the GUI environment of the Xavier NX, you can view the result directly by setting the show option to True.
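For example, when running on the local desktop:

yolo detect predict model=yolov8n.pt source="https://ultralytics.com/images/bus.jpg" save=True show=True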


Python API

Here's how to use the Python API. With the Python API, you can build a variety of application programs.

The following is the simplest example using the Python API. All of the important prediction data is stored in results.

from ultralytics import YOLO

# Load a model
model = YOLO("yolov8n.pt")  # load an official model

# Predict with the model
results = model("https://ultralytics.com/images/bus.jpg")  # predict on an image


results holds one result per input. If a single image is passed in, results is a list with one element, and each element is of type ultralytics.yolo.engine.results.Results.
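A quick way to confirm this structure:

from ultralytics import YOLO

model = YOLO("yolov8n.pt")
results = model("https://ultralytics.com/images/bus.jpg")
print(len(results))      # 1 -> one Results object per input image
print(type(results[0]))  # <class 'ultralytics.yolo.engine.results.Results'>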

The following code outputs box coordinates, object classification, confidence, etc. from the detection result.

from ultralytics import YOLO

# Load a model
model = YOLO("yolov8n.pt")  # load an official model
# Predict with the model
results = model("https://ultralytics.com/images/bus.jpg")  # predict on an image

for result in results:
    # each row of boxes.data is [x1, y1, x2, y2, confidence, class index]
    for box in result.boxes.data:
        print("x1:%f y1:%f  x2[%f] y2[%f] Conf[%f] Label[%f]"%(box[0], box[1], box[2], box[3], box[4], box[5]))

<sample_detect.py>


If you run the code, you can see the box coordinates, confidence, and label index.

(yolov8) spypiggy@spypiggy-NX:~/src/yolov8$ python sample_detect.py

Found https://ultralytics.com/images/bus.jpg locally at bus.jpg
image 1/1 /home/spypiggy/src/yolov8/bus.jpg: 640x480 4 persons, 1 bus, 1 stop sign, 85.5ms
Speed: 2.3ms preprocess, 85.5ms inference, 9.1ms postprocess per image at shape (1, 3, 640, 640)
x1:17.000000 y1:231.000000  x2[801.000000] y2[769.000000] Conf[0.870380] Label[5.000000]
x1:49.000000 y1:399.000000  x2[244.000000] y2[903.000000] Conf[0.868917] Label[0.000000]
x1:670.000000 y1:380.000000  x2[810.000000] y2[875.000000] Conf[0.852670] Label[0.000000]
x1:221.000000 y1:406.000000  x2[345.000000] y2[857.000000] Conf[0.818634] Label[0.000000]
x1:0.000000 y1:255.000000  x2[32.000000] y2[325.000000] Conf[0.347606] Label[11.000000]
x1:0.000000 y1:551.000000  x2[67.000000] y2[874.000000] Conf[0.281894] Label[0.000000]

The most important data is boxes.data, where coordinates, confidence, and label information are all stored.
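Besides slicing the raw boxes.data tensor, the Boxes object also exposes the same values through separate attributes; a sketch, assuming the attribute names in this version of ultralytics:

from ultralytics import YOLO

model = YOLO("yolov8n.pt")
results = model("https://ultralytics.com/images/bus.jpg")

for result in results:
    boxes = result.boxes
    print(boxes.xyxy)  # (N, 4) tensor of box corners
    print(boxes.conf)  # (N,) tensor of confidence scores
    print(boxes.cls)   # (N,) tensor of class indices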

Now let's draw these values on the image, as the yolo CLI command does.

The ultralytics.yolo.engine.results.Results object keeps the original image in orig_img. Since this value is a numpy array, it can be used directly as an OpenCV Mat.

from ultralytics import YOLO
import cv2

colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
font = cv2.FONT_HERSHEY_SIMPLEX

def draw(img, boxes):
    index = 0
    for box in boxes.data:
        # box: [x1, y1, x2, y2, confidence, class index]
        p1 = (int(box[0].item()), int(box[1].item()))
        p2 = (int(box[2].item()), int(box[3].item()))
        img = cv2.rectangle(img, p1, p2, colors[index % len(colors)], 3)
        text = label_map[int(box[5].item())] + " %4.2f" % box[4].item()
        cv2.putText(img, text, (p1[0], p1[1] - 10), font, fontScale=1, color=colors[index % len(colors)], thickness=2)
        index += 1
    cv2.imwrite("./result.jpg", img)
    # cv2.imshow("draw", img)
    # cv2.waitKey(0)
    # cv2.destroyAllWindows()


# Load a model
model = YOLO("yolov8n.pt")  # load an official model
label_map = model.names
# Predict with the model
results = model("https://ultralytics.com/images/bus.jpg")  # predict on an image

count = len(results)

for result in results:
    draw(result.orig_img, result.boxes)

<sample_detect2.py>


Let's run it and check the output, result.jpg.

<result.jpg>

Finally, YOLOv8 from Python produces the same result!


Python API and YOLOv8 inference type

In the previous examples, the result was obtained by passing the image file directly to the YOLO model.
But since I use OpenCV a lot, most of the time I want to open an image with OpenCV and then pass the Mat object to the YOLO model.


from ultralytics import YOLO
import cv2

colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
font = cv2.FONT_HERSHEY_SIMPLEX

def draw(img, boxes):
    index = 0
    for box in boxes.data:
        # box: [x1, y1, x2, y2, confidence, class index]
        p1 = (int(box[0].item()), int(box[1].item()))
        p2 = (int(box[2].item()), int(box[3].item()))
        img = cv2.rectangle(img, p1, p2, colors[index % len(colors)], 3)
        text = label_map[int(box[5].item())] + " %4.2f" % box[4].item()
        cv2.putText(img, text, (p1[0], p1[1] - 10), font, fontScale=1, color=colors[index % len(colors)], thickness=2)
        index += 1
    cv2.imwrite("./result2.jpg", img)
    # cv2.imshow("draw", img)
    # cv2.waitKey(0)
    # cv2.destroyAllWindows()


# Load a model
model = YOLO("yolov8n.pt")  # load an official model
label_map = model.names

img = cv2.imread("./bus.jpg", cv2.IMREAD_COLOR)
results = model(img)  # predict on an OpenCV mat object

for result in results:
    draw(result.orig_img, result.boxes)

<simple_detect3.py>


As you can see from the code above, you can pass an OpenCV Mat object as the input parameter instead of a file name, and the result is the same as when passing a file name.


YOLOv8 accepts a variety of input sources: as well as file names, it supports Python PIL images, OpenCV Mat/numpy arrays, torch tensors, URLs, and more. The full list is in the Ultralytics documentation.
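A few of these input forms side by side (a sketch; each call should produce equivalent detections):

from ultralytics import YOLO
from PIL import Image
import cv2

model = YOLO("yolov8n.pt")

results = model("bus.jpg")                                 # local file path
results = model("https://ultralytics.com/images/bus.jpg")  # URL
results = model(Image.open("bus.jpg"))                     # PIL image
results = model(cv2.imread("bus.jpg"))                     # OpenCV / numpy array (BGR)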


Running YOLOv8 models using torchvision

One of the input sources listed in that document is a torch tensor. However, as of March 2023, this feature does not appear to have been implemented.

There is a related report about this on the Ultralytics GitHub issues page.


Therefore, if you use torchvision, you must convert the tensor image to PIL or np.array format until YOLOv8 properly supports tensor sources. In the example below, I open an image file using torchvision, convert the image tensor to PIL format, and feed it to the model.


import torchvision as tv
import torchvision.transforms as T
from ultralytics import YOLO
import cv2

colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
font = cv2.FONT_HERSHEY_SIMPLEX

def draw(img, boxes):
    index = 0
    for box in boxes.data:
        # box: [x1, y1, x2, y2, confidence, class index]
        p1 = (int(box[0].item()), int(box[1].item()))
        p2 = (int(box[2].item()), int(box[3].item()))
        img = cv2.rectangle(img, p1, p2, colors[index % len(colors)], 3)
        text = label_map[int(box[5].item())] + " %4.2f" % box[4].item()
        cv2.putText(img, text, (p1[0], p1[1] - 10), font, fontScale=1, color=colors[index % len(colors)], thickness=2)
        index += 1
    cv2.imwrite("./result3.jpg", img)


# Load a model
model = YOLO("yolov8n.pt")  # load an official model
label_map = model.names


img = tv.io.read_image("./bus.jpg")
img = T.ToPILImage()(img)
results = model(img)  # predict on an image

for result in results:
    draw(result.orig_img, result.boxes)

<simple_detect4.py>

If you open and check the result3.jpg file, you can see that an image like result2.jpg has been created.
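If you would rather avoid PIL, you can also convert the CHW RGB tensor to an HWC numpy array yourself; a sketch, assuming the numpy input is expected in OpenCV's BGR channel order:

import numpy as np
import torchvision as tv
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
img = tv.io.read_image("./bus.jpg")                # uint8 tensor, CHW, RGB
img_np = img.permute(1, 2, 0).numpy()[:, :, ::-1]  # HWC, RGB -> BGR
img_np = np.ascontiguousarray(img_np)              # restore a contiguous memory layout
results = model(img_np)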


Running YOLOv8 models directly from OpenCV

In the 2021 article "Running OpenPose models directly from OpenCV", I explained that, since OpenCV 4.2, various network models can be loaded directly in OpenCV.

I installed OpenCV using Anaconda. You should first check whether your OpenCV build includes the dnn functions needed to load a network model directly. Unfortunately, the Anaconda OpenCV we installed does not support dnn.

You can check this with OpenCV's cv2.getBuildInformation() function.
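For example, from the Python interpreter:

import cv2

print(cv2.getBuildInformation())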

  ......
OpenCV modules:
    To be built:                 alphamat aruco bgsegm bioinspired calib3d ccalib core cvv datasets dpm face features2d flann freetype fuzzy gapi hdf hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor ml objdetect optflow phase_unwrapping photo plot python3 quality rapid reg rgbd saliency shape stereo stitching structured_light superres surface_matching tracking video videoio videostab xfeatures2d ximgproc xobjdetect xphoto
    Disabled:                    world
    Disabled by dependency:      barcode dnn_objdetect dnn_superres mcc text wechat_qrcode
    Unavailable:                 cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev dnn java julia matlab ovis python2 sfm ts viz
    Applications:                -
    Documentation:               NO
    Non-free algorithms:         NO
......

<cv2.getBuildInformation() output of the Anaconda OpenCV>


And this is the output from the OpenCV build that ships with the Xavier NX.

  ......
 OpenCV modules:
    To be built:                 calib3d core dnn features2d flann gapi highgui imgcodecs imgproc ml objdetect photo python2 python3 stitching ts video videoio
    Disabled:                    world
    Disabled by dependency:      -
    Unavailable:                 java
    Applications:                tests perf_tests examples apps
    Documentation:               NO
    Non-free algorithms:         NO
......

<cv2.getBuildInformation() output of the Jetson built-in OpenCV>

You can see that OpenCV's dnn module cannot be used in the Anaconda environment we are currently using.
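A quick sanity check from a script, assuming builds that exclude dnn simply do not register the submodule:

import cv2

# False on the Anaconda build above, True on the Jetson's built-in OpenCV
print(hasattr(cv2, "dnn"))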


Wrapping up

In the previous articles, we covered installing YOLOv8 on the Xavier NX; in this article, we learned how to use the CLI commands and the Python API that YOLOv8 provides.

In the next article, we will compare YOLOv8 processing speeds and look at video processing on the Xavier NX.

You can download the source code from my GitHub.

