I have been a big fan of the Jetson series for a long time, and I am interested in Pose Estimation, so I have written several blog posts using OpenPose, TensorFlow, PyTorch, TensorRT, and so on. In particular, I noticed the TensorFlow Lite example that uses the ResNet50 model for Pose Estimation. One of my blog posts covers PyTorch with the ResNet50 model. My impression of the ResNet50 model is that its accuracy is quite good. However, it has the disadvantage of being too slow: the Jetson Nano managed a disappointing 0.2 FPS.
<FPS on AMD Ryzen 7 2700X + RTX 2070 + Ubuntu 18.04>
<FPS on Jetson Nano>
Therefore, I have been looking for a ResNet50 model capable of high FPS on the Jetson series. Unfortunately, it was difficult to find an example that achieves a high FPS with the ResNet50 model.
The best Pose Estimation performance on the Jetson Nano so far came from a ResNet18 model designed for TensorRT. I introduced it at https://spyjetson.blogspot.com/2019/12/jetsonnano-human-pose-estimation-using.html.
<Pose Estimation using TensorRT on Jetson Nano records 15 FPS>
As you can see in the picture above, it achieved a good performance of over 15 FPS. I have not yet found a better-performing Pose Estimation model on the Jetson Nano.
TensorFlow Lite
However, TensorFlow Lite provides a Pose Estimation example that uses ResNet50. TensorFlow Lite is a lightweight version of TensorFlow that runs on Android and iOS smartphones or on ARM-based devices.
The https://github.com/tensorflow/tfjs-models/tree/master/posenet page introduces examples that can be tested in the browser using JavaScript. In PoseNet 2.0, you can choose between the lightweight, fast but less accurate MobileNetV1 model and the large, heavy but highly accurate ResNet50 model.
However, this example uses JavaScript and only works in browsers. I need a Python example that runs the ResNet50 model at high speed. Fortunately, there is a Python example that uses the Google Edge TPU.
This example uses TensorFlow Lite and requires a Google Edge TPU. In this article, we will use a Raspberry Pi 4 instead of a Jetson Nano.
Raspberry Pi 4 + TensorFlow Lite + Edge TPU
The Edge TPU performs best over USB 3.0. Therefore, it is not recommended to use the Raspberry Pi 3.x, which only supports USB 2.0.
First, after installing the Raspbian desktop version OS on the Raspberry Pi 4, install the required software as follows.
Install Raspbian on the Raspberry Pi 4
- Download the Raspbian Buster image from https://www.raspberrypi.org/downloads/raspbian/ and burn it to an SD card.
- Run raspi-config to activate the camera
- Run the command "apt-get update"
- Run the command "apt-get dist-upgrade" and reboot the Raspberry Pi.
Install Coral (Edge TPU) software
Do not connect the Coral AI accelerator to the Raspberry Pi yet.
First, install the Debian packages needed for Coral. The first two commands below add the repository, and the last command refreshes the package index so the added repository is reflected. Therefore, you must install "libedgetpu1-std" (or "libedgetpu1-max") only after "apt-get update".
echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add - sudo apt-get update
Install the Edge TPU runtime. The Edge TPU runtime comes in two versions, std and max. The max version overclocks the processing unit of the Coral AI accelerator, so processing is slightly faster, but the accelerator gets hotter. It is similar to overclocking a PC CPU. If you want to use the max version, install libedgetpu1-max instead of libedgetpu1-std.
There is a YouTube video showing the difference between the two. The Google homepage recommends using the std version. However, PoseNet 2.0 may not work properly without the max version, so I am going to install the max runtime.
sudo apt-get install libedgetpu1-max
Now connect the Coral AI accelerator to the USB 3.0 port.
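You can then check that the accelerator is detected with lsusb (it should appear in the device list; it typically shows up as "Global Unichip Corp.", or as "Google Inc." after the first inference, but the exact name may vary).

lsusb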
Be careful: You can see why you should use the max runtime for ResNet50 on the Raspberry Pi at https://github.com/google-coral/project-posenet/issues/31 .
Install TensorFlow Lite
TensorFlow Lite is a lightweight version of TensorFlow, improved so that machine learning models can be used in various mobile environments such as Android, iOS, and embedded systems.
Initially, it was created to run TensorFlow on smartphones without an AI accelerator chip, and the Android and iOS versions were released in November 2017. Since support for Linux systems using ARM32 and ARM64 has been added, it can also be used on SBCs such as the Raspberry Pi, the NVIDIA Jetson series, and Odroid.
TensorFlow Lite converts floating-point numbers to 8-bit integers and processes them, which makes it possible to run deep learning models relatively quickly. However, the accuracy of the inference results is slightly lower because of the rounding error introduced by integer processing. For reference, the following figure shows the efficiency of parallel processing on ARM CPUs: processing 8-bit integers is 8 times more efficient than processing 64-bit floating-point numbers. Since many devices running TensorFlow Lite use ARM CPUs, this optimization makes a lot of sense.
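As a rough illustration of the idea, here is a toy sketch of 8-bit affine quantization in NumPy. It is only meant to show why a small rounding error appears; it is not the exact scheme TensorFlow Lite applies to every tensor.

import numpy as np

# real_value ≈ scale * (quantized_value - zero_point)
weights = np.array([-1.2, 0.0, 0.5, 2.3], dtype=np.float32)
scale = (weights.max() - weights.min()) / 255.0
zero_point = int(round(-weights.min() / scale))

# quantize to uint8, then map back to floats
q = np.clip(np.round(weights / scale) + zero_point, 0, 255).astype(np.uint8)
dequantized = scale * (q.astype(np.float32) - zero_point)
print(q)            # e.g. [  0  87 123 255]
print(dequantized)  # close to the original values, with a small rounding error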
First, check the exact Python 3 version on your Raspberry Pi. For reference, Raspbian Buster uses Python 3.7.
pi@raspberrypi:~ $ python3
Python 3.7.3 (default, Dec 20 2019, 18:57:59)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
In the installation command below, cp37 means Python 3.7. Be sure to use the installation file that matches the Python version installed on your Raspberry Pi. The https://www.tensorflow.org/lite/guide/python page lists the available installation files for other Python versions; if you are using a different version, refer to this page and use the appropriate file.
pip3 install https://dl.google.com/coral/python/tflite_runtime-2.1.0.post1-cp37-cp37m-linux_armv7l.whl
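If you want to make sure the runtime and the Edge TPU library can be loaded before going any further, a quick sanity check from Python is shown below (a minimal sketch: it only loads the Edge TPU delegate and does not run a model).

from tflite_runtime.interpreter import load_delegate

# This raises an exception if libedgetpu or the accelerator is not available.
delegate = load_delegate('libedgetpu.so.1')
print('Edge TPU delegate loaded:', delegate)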
Install the PoseNet 2.0 example
sudo apt-get update
sudo apt-get install python3-edgetpu
cd /usr/local/src
git clone https://github.com/google-coral/project-posenet.git
cd project-posenet/
sudo bash ./install_requirements.sh
Let's check whether the installation was successful using the sample example. If the position and score of each body part are printed as follows, it is installed correctly.
root@rpi-coral:/usr/local/src/project-posenet# python3 simple_pose.py
--2020-05-25 20:57:19--  https://upload.wikimedia.org/wikipedia/commons/thumb/3/38/Hindu_marriage_ceremony_offering.jpg/640px-Hindu_marriage_ceremony_offering.jpg
Resolving upload.wikimedia.org (upload.wikimedia.org)... 103.102.166.240, 2001:df2:e500:ed1a::2:b
Connecting to upload.wikimedia.org (upload.wikimedia.org)|103.102.166.240|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 83328 (81K) [image/jpeg]
Saving to: ‘couple.jpg’

couple.jpg    100%[=========================================================================>]  81.38K   173KB/s    in 0.5s

2020-05-25 20:57:20 (173 KB/s) - ‘couple.jpg’ saved [83328/83328]

Inference time: 25ms

Pose Score:  0.5861151
 nose                 x=210  y=152  score=1.0
 left eye             x=224  y=138  score=1.0
 right eye            x=199  y=137  score=1.0
 left ear             x=244  y=135  score=1.0
 right ear            x=182  y=129  score=0.8
 left shoulder        x=268  y=168  score=0.8
 right shoulder       x=160  y=174  score=1.0
 left elbow           x=282  y=255  score=0.6
 right elbow          x=154  y=256  score=0.9
 left wrist           x=230  y=287  score=0.6
 right wrist          x=162  y=299  score=0.6
 left hip             x=271  y=317  score=0.1
 right hip            x=169  y=306  score=0.1
 left knee            x=245  y=330  score=0.2
 right knee           x=172  y=336  score=0.0
 left ankle           x=182  y=411  score=0.1
 right ankle          x=184  y=413  score=0.1

Pose Score:  0.5533377
 nose                 x=398  y=145  score=1.0
 left eye             x=416  y=128  score=1.0
 right eye            x=382  y=127  score=1.0
 left ear             x=457  y=110  score=0.9
 right ear            x=370  y=120  score=0.2
 left shoulder        x=492  y=169  score=0.9
 right shoulder       x=362  y=150  score=0.8
 left elbow           x=463  y=292  score=0.9
 right elbow          x=329  y=245  score=0.8
 left wrist           x=340  y=303  score=0.9
 right wrist          x=236  y=329  score=0.5
 left hip             x=488  y=306  score=0.2
 right hip            x=370  y=318  score=0.1
 left knee            x=472  y=303  score=0.0
 right knee           x=252  y=327  score=0.2
 left ankle           x=450  y=373  score=0.1
 right ankle          x=184  y=410  score=0.1
However, the simple_pose.py example is a bit lacking because it only prints the values to the console. Let's improve the source code a little so that it also produces a result image.
Packages that are frequently used for image processing are Pillow and OpenCV. Learning the basics of these two packages can be a great help when working with images.
I used PIL's ImageDraw to draw a point and a label for each body part location that PoseNet found, and then saved the result to the couple_mobilenet.jpg or couple_resnet.jpg file.
import os
import argparse

import numpy as np
from PIL import Image, ImageDraw

from pose_engine import PoseEngine

parser = argparse.ArgumentParser(description='PoseNet')
parser.add_argument('--model', type=str, default='mobilenet')
args = parser.parse_args()

os.system('wget https://upload.wikimedia.org/wikipedia/commons/thumb/3/38/'
          'Hindu_marriage_ceremony_offering.jpg/'
          '640px-Hindu_marriage_ceremony_offering.jpg -O couple.jpg')

pil_image = Image.open('couple.jpg')
if args.model == 'mobilenet':
    # resize() returns a new image, so the result must be assigned
    pil_image = pil_image.resize((641, 481), Image.NEAREST)
    engine = PoseEngine('models/mobilenet/posenet_mobilenet_v1_075_481_641_quant_decoder_edgetpu.tflite')
else:
    pil_image = pil_image.resize((640, 480), Image.NEAREST)
    engine = PoseEngine('models/resnet/posenet_resnet_50_640_480_16_quant_edgetpu_decoder.tflite')

poses, inference_time = engine.DetectPosesInImage(np.uint8(pil_image))
print('Inference time: %.fms' % inference_time)

output = pil_image.copy()
draw = ImageDraw.Draw(output)
for pose in poses:
    if pose.score < 0.4:
        continue
    print('\nPose Score: ', pose.score)
    for label, keypoint in pose.keypoints.items():
        print(' %-20s x=%-4d y=%-4d score=%.1f' %
              (label, keypoint.yx[1], keypoint.yx[0], keypoint.score))
        # draw a small dot and the keypoint label on the output image
        p1 = (keypoint.yx[1], keypoint.yx[0])
        p2 = (keypoint.yx[1] + 5, keypoint.yx[0] + 5)
        draw.ellipse([p1, p2], fill=(0, 255, 0, 255))
        draw.text((keypoint.yx[1] + 10, keypoint.yx[0] - 10), label, fill=(0, 255, 0, 128))

output.save('./couple_' + args.model + '.jpg')
<simple_pose2.py>
Now run the new Python code. As you can see, a new couple_mobilenet.jpg file has been created.
root@rpi-coral:/usr/local/src/project-posenet# python3 simple_pose2.py
...
root@rpi-coral:/usr/local/src/project-posenet# ls -al couple*
-rw-r--r-- 1 root root 83328 Feb  9  2016 couple.jpg
-rw-r--r-- 1 root root 77164 May 25 21:23 couple_mobilenet.jpg
<couple_mobilenet.jpg>
If you want to use the ResNet model, use the --model=resnet option.
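For example (since the script above saves its output to './couple_' + args.model + '.jpg', the result file is named after the --model argument):

python3 simple_pose2.py --model=resnet

This produces couple_resnet.jpg instead of couple_mobilenet.jpg.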
PoseNet 2.0 test using camera
Now that you have verified that PoseNet is working, you need to check its performance with a camera. Personally, if the performance exceeds 10 FPS, I consider the model usable.
The camera examples provided by project-posenet use gstreamer and svgwrite. However, I have written most of my previous camera examples with OpenCV, so we will change this example to use OpenCV, which is easier to work with.
Since I am working over remote ssh without a monitor, I write the output to a video file instead of displaying it on screen. If you are using a local monitor, you can switch to screen output with the cv2.imshow function.
import argparse
import sys
import time

import cv2
import numpy as np

from pose_engine import PoseEngine


def main():
    parser = argparse.ArgumentParser(description='PoseNet')
    parser.add_argument('--model', type=str, default='mobilenet')
    args = parser.parse_args()

    if args.model == 'mobilenet':
        model = 'models/mobilenet/posenet_mobilenet_v1_075_353_481_quant_decoder_edgetpu.tflite'
    else:
        model = 'models/resnet/posenet_resnet_50_416_288_16_quant_edgetpu_decoder.tflite'

    engine = PoseEngine(model)
    input_shape = engine.get_input_tensor_shape()
    inference_size = (input_shape[2], input_shape[1])   # (width, height) of the model input
    print(inference_size)

    cap = cv2.VideoCapture(0)
    fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
    out_video = cv2.VideoWriter('./output.mp4', fourcc, cap.get(cv2.CAP_PROP_FPS), inference_size)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
    if not cap.isOpened():
        print("Camera Open Error")
        sys.exit(0)

    count = 0
    total_fps = 0.0
    fps_cnt = 0
    while cap.isOpened() and count < 60:
        ret_val, img = cap.read()
        if ret_val == False:
            print("Camera read Error")
            break
        print('frame read')

        img = cv2.resize(img, inference_size)
        rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

        # measure only the network inference time
        s_time = time.time()
        poses, inference_time = engine.DetectPosesInImage(rgb)
        fps = 1.0 / (time.time() - s_time)
        total_fps += fps
        fps_cnt += 1

        for pose in poses:
            print('\nPose Score: %f FPS:%f' % (pose.score, fps))
            if pose.score < 0.3:
                continue
            for label, keypoint in pose.keypoints.items():
                print(' %-20s x=%-4d y=%-4d score=%.1f' %
                      (label, keypoint.yx[1], keypoint.yx[0], keypoint.score))
                # mark each detected keypoint on the frame
                cv2.circle(img, (keypoint.yx[1], keypoint.yx[0]), 5, (0, 255, 0), -1)

        out_video.write(img)
        count += 1

    if fps_cnt > 0:
        print('Model[%s] Avg FPS: %f' % (args.model, total_fps / fps_cnt))
    cv2.destroyAllWindows()
    cap.release()
    out_video.release()


if __name__ == '__main__':
    main()
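I saved this script as pose_camera_cv.py (the name used in the runs below). As mentioned above, if you are working on a local monitor, you can replace the video file output with screen output; a minimal change is to swap the out_video.write(img) call inside the loop for something like this:

        cv2.imshow('PoseNet', img)                 # show the annotated frame on screen
        if cv2.waitKey(1) & 0xFF == ord('q'):      # press 'q' to stop early
            break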
The input image size of the network model is (481, 353) for MobileNet and (416, 288) for ResNet.
First is the test result using the mobilenet model.
root@rpi-coral:/usr/local/src/project-posenet# python3 pose_camera_cv.py
......
Model[mobilenet] Avg FPS: 120.934394
And the following is a test using resnet50.
root@rpi-coral:/usr/local/src/project-posenet# python3 pose_camera_cv.py --model=resnet
......
Model[resnet] Avg FPS: 14.966248
<picture from saved video using ResNet50 model>
Wrapping Up
It recorded about 120 FPS with MobileNet and about 15 FPS with ResNet50. Since only the time spent on the network inference itself is measured, the actual FPS will be slightly lower if you include tasks such as drawing the coordinates on the image and writing the video file. But even considering this, these values are fantastic.
The TensorFlow Lite model uses 8-bit integer arithmetic instead of floating point to reduce processing time and memory usage, and the Edge TPU accelerates these operations, which is how such outstanding values are achieved.
The best-performing Pose Estimation setup I had introduced so far was the TensorRT + ResNet18 model, which reaches about 15 FPS. So it is surprising that ResNet50, which is much heavier than ResNet18, can deliver comparably good performance.
For the Jetson series, I expect the ResNet50 model can likewise be optimized into a well-performing model.
You can download the source code at https://github.com/raspberry-pi-maker/NVIDIA-Jetson .