In the last two posts, I explained how to install JetPack 5.1 on the Xavier NX and how to install YOLOv8.
Prerequisites
- Xavier NX (JetPack 5.1) - Effective Development environment using Anaconda
- Xavier NX - Installing PyTorch, YOLOv8 in Anaconda Virtual Environment (JetPack 5.1)
pip install ultralytics --upgrade
YOLOv8 Object Detection Models
COCO Dataset
- Object segmentation
- Recognition in context
- Superpixel stuff segmentation
- 330K images (>200K labeled)
- 1.5 million object instances
- 80 object categories
- 91 stuff categories
- 5 captions per image
- 250,000 people with keypoints
YOLOv8 Object Detection
CLI command
yolo detect predict model=yolov8n.pt source="https://ultralytics.com/images/bus.jpg"        # predict with official model
yolo detect predict model=path/to/best.pt source="https://ultralytics.com/images/bus.jpg"   # predict with custom model
The commands above show the basic CLI usage. I will use the first command, since I am working with a pre-trained official model.
If you are testing over ssh without X11 forwarding, it is better to save the result to a file, since you cannot view the image directly.
(base) spypiggy@spypiggy-NX:~$ conda activate yolov8
(yolov8) spypiggy@spypiggy-NX:~$ yolo detect predict model=yolov8n.pt source="https://ultralytics.com/images/bus.jpg" save=True show=False
Ultralytics YOLOv8.0.51 🚀 Python-3.8.16 torch-1.14.0a0+44dac51c.nv23.02 CUDA:0 (Xavier, 6857MiB)
YOLOv8n summary (fused): 168 layers, 3151904 parameters, 0 gradients, 8.7 GFLOPs
Downloading https://ultralytics.com/images/bus.jpg to bus.jpg...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 476k/476k [00:00<00:00, 3.94MB/s]
image 1/1 /home/spypiggy/bus.jpg: 640x480 4 persons, 1 bus, 1 stop sign, 86.2ms
Speed: 2.4ms preprocess, 86.2ms inference, 23.2ms postprocess per image at shape (1, 3, 640, 640)
Results saved to runs/detect/predict
You can see that the resulting image is stored in ./runs/detect/predict/bus.jpg. If you open the file, you will see an image like this.
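If you want the output somewhere other than the default runs/detect/predict, the CLI also accepts project and name arguments (as I understand the ultralytics options; run yolo help to confirm them in your version). The directory names below are just examples:

yolo detect predict model=yolov8n.pt source="https://ultralytics.com/images/bus.jpg" save=True project=myruns name=bus_test
# results would then be saved to myruns/bus_test/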
Python API
from ultralytics import YOLO

# Load a model
model = YOLO("yolov8n.pt")  # load an official model

# Predict with the model
results = model("https://ultralytics.com/images/bus.jpg")  # predict on an image
The results variable holds one result per input. Therefore, if only one image is passed in, it is a list with a single element. Each element is an 'ultralytics.yolo.engine.results.Results' object.
from ultralytics import YOLO

# Load a model
model = YOLO("yolov8n.pt")  # load an official model

# Predict with the model
results = model("https://ultralytics.com/images/bus.jpg")  # predict on an image

for result in results:
    for box in result.boxes.data:
        print("x1:%f y1:%f x2[%f] y2[%f] Conf[%f] Label[%f]"%(box[0], box[1], box[2], box[3], box[4], box[5]))
<sample_detect.py>
If you run the code, you can see the box coordinates, confidence, and label index.
(yolov8) spypiggy@spypiggy-NX:~/src/yolov8$ python sample_detect.py
Found https://ultralytics.com/images/bus.jpg locally at bus.jpg
image 1/1 /home/spypiggy/src/yolov8/bus.jpg: 640x480 4 persons, 1 bus, 1 stop sign, 85.5ms
Speed: 2.3ms preprocess, 85.5ms inference, 9.1ms postprocess per image at shape (1, 3, 640, 640)
x1:17.000000 y1:231.000000 x2[801.000000] y2[769.000000] Conf[0.870380] Label[5.000000]
x1:49.000000 y1:399.000000 x2[244.000000] y2[903.000000] Conf[0.868917] Label[0.000000]
x1:670.000000 y1:380.000000 x2[810.000000] y2[875.000000] Conf[0.852670] Label[0.000000]
x1:221.000000 y1:406.000000 x2[345.000000] y2[857.000000] Conf[0.818634] Label[0.000000]
x1:0.000000 y1:255.000000 x2[32.000000] y2[325.000000] Conf[0.347606] Label[11.000000]
x1:0.000000 y1:551.000000 x2[67.000000] y2[874.000000] Conf[0.281894] Label[0.000000]
The most important data is boxes.data, where coordinates, confidence, and label information are all stored.
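If you prefer named fields over indexing into the raw tensor, the Boxes object also exposes xyxy, conf, and cls attributes. This is a minimal sketch, assuming those attributes behave this way in the ultralytics version used here:

from ultralytics import YOLO

model = YOLO("yolov8n.pt")
results = model("https://ultralytics.com/images/bus.jpg")

for result in results:
    boxes = result.boxes
    # xyxy: corner coordinates, conf: confidence score, cls: label index
    for xyxy, conf, cls in zip(boxes.xyxy, boxes.conf, boxes.cls):
        print("box:", xyxy.tolist(), "conf: %.3f" % conf.item(), "label:", int(cls.item()))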
Now let's draw these values on the image, just like the yolo CLI command does.
The ultralytics.yolo.engine.results.Results object keeps the original image in orig_img. Since this value is a numpy array, it can be used directly as a Mat in OpenCV.
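A quick sanity check of this claim (a minimal sketch; the printed shape will depend on the input image):

from ultralytics import YOLO

model = YOLO("yolov8n.pt")
results = model("https://ultralytics.com/images/bus.jpg")

# orig_img is a plain numpy array in BGR order, directly usable with cv2 functions
print(type(results[0].orig_img))   # <class 'numpy.ndarray'>
print(results[0].orig_img.shape)   # (height, width, 3)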
from ultralytics import YOLO
import cv2

colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
font = cv2.FONT_HERSHEY_SIMPLEX

def draw(img, boxes):
    index = 0
    for box in boxes.data:
        p1 = (int(box[0].item()), int(box[1].item()))
        p2 = (int(box[2].item()), int(box[3].item()))
        img = cv2.rectangle(img, p1, p2, colors[index % len(colors)], 3)
        text = label_map[int(box[5].item())] + " %4.2f"%(box[4].item())
        cv2.putText(img, text, (p1[0], p1[1] - 10), font, fontScale = 1, color = colors[index % len(colors)], thickness = 2)
        index += 1
    cv2.imwrite("./result.jpg", img)
    # cv2.imshow("draw", img)
    # cv2.waitKey(0)
    # cv2.destroyAllWindows()

# Load a model
model = YOLO("yolov8n.pt")  # load an official model
label_map = model.names

# Predict with the model
results = model("https://ultralytics.com/images/bus.jpg")  # predict on an image
count = len(results)
for result in results:
    draw(result.orig_img, result.boxes)
<sample_detect2.py>
Let's run and check the output result.jpg.
The YOLOv8 Python API produces the same result as the CLI!
Python API and YOLOv8 input types
from ultralytics import YOLO
import cv2

colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
font = cv2.FONT_HERSHEY_SIMPLEX

def draw(img, boxes):
    index = 0
    for box in boxes.data:
        p1 = (int(box[0].item()), int(box[1].item()))
        p2 = (int(box[2].item()), int(box[3].item()))
        img = cv2.rectangle(img, p1, p2, colors[index % len(colors)], 3)
        text = label_map[int(box[5].item())] + " %4.2f"%(box[4].item())
        cv2.putText(img, text, (p1[0], p1[1] - 10), font, fontScale = 1, color = colors[index % len(colors)], thickness = 2)
        index += 1
    cv2.imwrite("./result2.jpg", img)
    # cv2.imshow("draw", img)
    # cv2.waitKey(0)
    # cv2.destroyAllWindows()

# Load a model
model = YOLO("yolov8n.pt")  # load an official model
label_map = model.names

img = cv2.imread("./bus.jpg", cv2.IMREAD_COLOR)
results = model(img)  # predict on an OpenCV mat object
for result in results:
    draw(result.orig_img, result.boxes)
<simple_detect3.py>
As you can see from the code above, you can pass an OpenCV Mat object as the input instead of a file name, and the result is the same.
YOLOv8 accepts a variety of input source types. The Ultralytics documentation lists Python PIL images, OpenCV images, and numpy arrays, as well as file names and URLs.
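As a quick illustration, the same model call accepts these source types interchangeably. A minimal sketch, assuming bus.jpg has already been downloaded to the working directory:

from ultralytics import YOLO
from PIL import Image
import cv2

model = YOLO("yolov8n.pt")

results = model("./bus.jpg")                                # file path
results = model("https://ultralytics.com/images/bus.jpg")   # URL
results = model(Image.open("./bus.jpg"))                    # PIL image
results = model(cv2.imread("./bus.jpg"))                    # OpenCV image (numpy array, BGR)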
Running YOLOv8 models using torchvision
One of the input sources listed in that documentation is a torch tensor. However, as of March 2023, this feature does not appear to have been implemented.
There is a related discussion on the ultralytics GitHub issues page.
Therefore, until YOLOv8 properly supports tensor sources, images loaded with torchvision must be converted to PIL or np.array format. In the example below, I open an image file using torchvision, convert the image tensor to PIL format, and feed it to the model.
import torch
import torchvision as tv
import torchvision.transforms as T
from ultralytics import YOLO
import cv2

colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255)]
font = cv2.FONT_HERSHEY_SIMPLEX

def draw(img, boxes):
    index = 0
    for box in boxes.data:
        p1 = (int(box[0].item()), int(box[1].item()))
        p2 = (int(box[2].item()), int(box[3].item()))
        img = cv2.rectangle(img, p1, p2, colors[index % len(colors)], 3)
        text = label_map[int(box[5].item())] + " %4.2f"%(box[4].item())
        cv2.putText(img, text, (p1[0], p1[1] - 10), font, fontScale = 1, color = colors[index % len(colors)], thickness = 2)
        index += 1
    cv2.imwrite("./result3.jpg", img)

# Load a model
model = YOLO("yolov8n.pt")  # load an official model
label_map = model.names

img = tv.io.read_image("./bus.jpg")   # CHW uint8 tensor
img = T.ToPILImage()(img)             # convert the tensor to a PIL image
results = model(img)  # predict on an image
for result in results:
    draw(result.orig_img, result.boxes)
<simple_detect4.py>
If you open and check the result3.jpg file, you can see that an image like result2.jpg has been created.
Running YOLOv8 models directly from OpenCV
The 2021 article "Running OpenPose models directly from OpenCV" explained that since OpenCV 4.2, various network models can be loaded directly through OpenCV's dnn module.
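For reference, loading a network through the dnn module looks roughly like this. This is a minimal sketch, assuming a YOLOv8 model already exported to ONNX as yolov8n.onnx (a hypothetical file produced beforehand) and an OpenCV build with dnn enabled:

import cv2

# Load an ONNX network directly with OpenCV's dnn module
net = cv2.dnn.readNetFromONNX("yolov8n.onnx")  # assumption: model exported to ONNX beforehand

img = cv2.imread("./bus.jpg")
# Build a 640x640 RGB blob scaled to [0, 1], the input size YOLOv8 expects
blob = cv2.dnn.blobFromImage(img, 1/255.0, (640, 640), swapRB=True)
net.setInput(blob)
outputs = net.forward()
print(outputs.shape)  # raw predictions; decoding boxes and NMS are omitted here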
I installed OpenCV using Anaconda. You should first check whether your OpenCV build includes the dnn module needed to load network models directly. Unfortunately, the Anaconda OpenCV we installed does not support the dnn module.
You can check it using OpenCV's cv2.getBuildInformation() function.
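A minimal way to run this check from Python (the hasattr test is my addition; if the dnn module was not built, cv2 simply has no dnn attribute):

import cv2

# Print the full build report and look for dnn in the "To be built" module list
print(cv2.getBuildInformation())

# Quick runtime check
print("dnn available:", hasattr(cv2, "dnn"))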
......
OpenCV modules:
  To be built:             alphamat aruco bgsegm bioinspired calib3d ccalib core cvv datasets dpm face features2d flann freetype fuzzy gapi hdf hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor ml objdetect optflow phase_unwrapping photo plot python3 quality rapid reg rgbd saliency shape stereo stitching structured_light superres surface_matching tracking video videoio videostab xfeatures2d ximgproc xobjdetect xphoto
  Disabled:                world
  Disabled by dependency:  barcode dnn_objdetect dnn_superres mcc text wechat_qrcode
  Unavailable:             cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev dnn java julia matlab ovis python2 sfm ts viz
  Applications:            -
  Documentation:           NO
  Non-free algorithms:     NO
......
<cv2.getBuildInformation() output of Anaconda OpenCV>
And this is the output from the OpenCV build that ships with the Xavier NX.
......
OpenCV modules:
  To be built:             calib3d core dnn features2d flann gapi highgui imgcodecs imgproc ml objdetect photo python2 python3 stitching ts video videoio
  Disabled:                world
  Disabled by dependency:  -
  Unavailable:             java
  Applications:            tests perf_tests examples apps
  Documentation:           NO
  Non-free algorithms:     NO
......
<cv2.getBuildInformation() output of Jetson built-in OpenCV>
Comparing the two outputs, the Jetson's built-in OpenCV lists dnn under "To be built", while the Anaconda build lists it under "Unavailable". OpenCV's dnn therefore cannot be used in the Anaconda environment we are currently using.
Wrapping up
In the previous article, we learned about installing YOLOv8 on Xavier NX, and in this article, we learned how to use the CLI commands and Python API provided by YOLOv8.
In the next article, we will look at YOLOv8 processing speed comparison and video processing in Xavier NX.
You can download the source code from my GitHub.