At the time, the Jetson Nano had a performance of about 2 to 5 FPS.
tf-pose-estimation (https://github.com/ildoonet/tf-pose-estimation) by ildoonet mainly uses MobileNet models, which are less accurate than the PyTorch OpenPose and ResNet-based models but process frames faster. I introduced it because this faster frame processing is possible.
The purpose of this article is to run the same model on the Xavier NX and compare it to the performance on the Jetson Nano.
I recommend reading the previous post first, as it contains a detailed explanation.
Prerequisites
Before you build "ildoonet/tf-pose-estimation", you must install these packages first.
- OpenCV : JetPack 4.3 and later versions ship with OpenCV installed, and the Xavier NX comes with JetPack 4.4 or higher, so there is no need to install OpenCV separately.
- Tensorflow : https://spyjetson.blogspot.com/2020/07/jetson-xavier-nx-python-virtual.html explains how to use the Python virtual environment and install TensorFlow.
After installing the above packages, install the following packages as well. Some of them may already be installed. Note that the Jetson Nano previously used LLVM version 7, but the Xavier NX with JetPack 4.4 uses LLVM version 9.
Warning : scipy lives outside the Python virtual environment; JetPack already ships scipy 0.19.1 for the system Python. The scikit-image package installed later requires scipy 1.0.1 or higher, so upgrade scipy before entering the virtual environment. On JetPack 4.4, the pip3 install --upgrade scipy command upgrades scipy to 1.5.2.
spypiggy@XavierNX:~$ sudo apt-get install libllvm-9-ocaml-dev libllvm9 llvm-9 llvm-9-dev llvm-9-doc llvm-9-examples llvm-9-runtime
spypiggy@XavierNX:~$ sudo pip3 install --upgrade scipy
spypiggy@XavierNX:~$ sudo apt-get install -y build-essential libatlas-base-dev swig gfortran
spypiggy@XavierNX:~$ export LLVM_CONFIG=/usr/bin/llvm-config-9
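Before entering the virtual environment, you can verify that the upgrade actually took effect. This quick check is my own addition (not part of ildoonet's instructions); it simply prints the installed scipy version and the LLVM_CONFIG path exported above:

# sanity check: scipy should report 1.0.1 or higher (1.5.2 on JetPack 4.4),
# and LLVM_CONFIG should point at /usr/bin/llvm-config-9
import os
import scipy

print('scipy version :', scipy.__version__)
print('LLVM_CONFIG   :', os.environ.get('LLVM_CONFIG', 'not set'))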
Download and build code from ildoonet
Now clone ildoonet's github.
spypiggy@XavierNX:~$ source /home/spypiggy/python/bin/activate
(python) spypiggy@XavierNX:~$ pip3 install Cython
(python) spypiggy@XavierNX:~$ cd src
(python) spypiggy@XavierNX:~/src$ git clone https://www.github.com/ildoonet/tf-pose-estimation
(python) spypiggy@XavierNX:~/src$ cd tf-pose-estimation
(python) spypiggy@XavierNX:~/src/tf-pose-estimation$ pip3 install -r requirements.txt
Edit the tf_pose/estimator.py file like this (based on TensorFlow 1.14).
original code
self.persistent_sess = tf.Session(graph=self.graph, config=tf_config)
to
if tf_config is None:
    tf_config = tf.ConfigProto()
tf_config.gpu_options.allow_growth = True
sess = tf.Session(config=tf_config)
self.persistent_sess = tf.Session(graph=self.graph, config=tf_config)
The source code also has a --tensorrt option for using TensorRT. To use this option, modify the ./tf_pose/estimator.py file.
At line 327, remove the last parameter "use_calibration=True,". This parameter is deprecated in TensorFlow 1.14 and later.
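For reference, the call at that line uses TensorFlow's TF-TRT converter (create_inference_graph from tensorflow.contrib.tensorrt). The sketch below only illustrates what such a call looks like after the edit; the graph path, output node name, and argument values are examples of mine, not copied from the repository, so compare them with your own line 327:

# illustrative TF-TRT conversion of a frozen graph (TensorFlow 1.14 style);
# estimator.py makes a similar call when the --tensorrt option is enabled
import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

graph_def = tf.GraphDef()
with tf.gfile.GFile('models/graph/mobilenet_thin/graph_opt.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

trt_graph_def = trt.create_inference_graph(
    input_graph_def=graph_def,
    outputs=['Openpose/concat_stage7'],  # output node name; check estimator.py
    max_batch_size=1,
    max_workspace_size_bytes=1 << 30,
    precision_mode='FP16',
    minimum_segment_size=3,
    is_dynamic_op=True,
    maximum_cached_engines=1)
# the removed parameter was use_calibration=True, which is deprecated in TensorFlow 1.14 and later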
(python) spypiggy@XavierNX:~/src/tf-pose-estimation$ cd tf_pose/pafprocess
(python) spypiggy@XavierNX:~/src/tf-pose-estimation/tf_pose/pafprocess$ swig -python -c++ pafprocess.i && python3 setup.py build_ext --inplace
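To confirm the build succeeded, a quick import check like the one below (my own addition, run from the tf-pose-estimation directory) is enough; if pafprocess was not built, this import will fail:

# the import only succeeds when the swig-generated pafprocess extension is in place
from tf_pose.estimator import TfPoseEstimator

print('TfPoseEstimator import OK')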
Download the models
This step can be skipped. If you want to test OpenPose's cmu model, download it as follows.
(python) spypiggy@XavierNX:~/src/tf-pose-estimation$ cd ~/src/tf-pose-estimation/models/graph/cmu
(python) spypiggy@XavierNX:~/src/tf-pose-estimation/models/graph/cmu$ bash download.sh
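Once the download finishes (and for the MobileNet graphs that already ship with the repository), you can check which frozen-graph file tf-pose-estimation resolves for each model name. This small check is my own addition and uses the model names listed in the scripts' --model help text:

# print the frozen-graph path tf-pose-estimation will load for each model name
from tf_pose.networks import get_graph_path

for name in ('cmu', 'mobilenet_thin', 'mobilenet_v2_large', 'mobilenet_v2_small'):
    print(name, '->', get_graph_path(name))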
Testing with image
There are pre-made test Python files such as run.py, run_video.py, and run_webcam.py. You can test the framework with an image like this.
(python) spypiggy@XavierNX:~/src/tf-pose-estimation$ python3 run.py --model=mobilenet_thin --resize=432x368 --image=./images/p1.jpg
I had to wait a few minutes on the Jetson Nano, but on the Xavier NX the result is displayed within seconds. You can see the result image like this.
Looking at the console output, the image processing took 0.1292 seconds, which corresponds to about 7.7 FPS.
On all AI edge devices, the initial inference is quite slow. Therefore, inference from the second run onward is probably much faster.
[2020-07-29 09:53:48,696] [TfPoseEstimatorRun] [INFO] inference image: ./images/p1.jpg in 0.1292 seconds.
2020-07-29 09:53:48,696 INFO inference image: ./images/p1.jpg in 0.1292 seconds.
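If you want to see this warm-up effect yourself, a minimal single-image script in the spirit of run.py can time the first and second inference separately. This is my own sketch; it only uses the TfPoseEstimator calls that also appear in the modified run_video.py below, and the image path is just an example:

import time

import cv2

from tf_pose.estimator import TfPoseEstimator
from tf_pose.networks import get_graph_path, model_wh

# load the mobilenet_thin graph at the 432x368 network input size
w, h = model_wh('432x368')
e = TfPoseEstimator(get_graph_path('mobilenet_thin'), target_size=(w, h))

image = cv2.imread('./images/p1.jpg')

# first inference includes one-time initialization, so it is the slowest
t = time.time()
humans = e.inference(image)
print('first  inference: %.4f sec' % (time.time() - t))

# from the second inference onward the time should drop noticeably
t = time.time()
humans = e.inference(image)
print('second inference: %.4f sec' % (time.time() - t))

# draw the detected skeletons and save the result image
result = TfPoseEstimator.draw_humans(image, humans, imgcopy=False)
cv2.imwrite('/tmp/result.jpg', result)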
Under the hood
Now let's dig deeper. Let's test with the video file I used when testing OpenPose. In the tf-pose-estimation directory there is a run_video.py file for testing video files. Since I always work over ssh, I write the video output to a file instead of displaying it on screen. The following is a slightly modified version of the original code provided for testing.
import argparse
import logging
import time

import cv2
import numpy as np

from tf_pose.estimator import TfPoseEstimator
from tf_pose.networks import get_graph_path, model_wh

logger = logging.getLogger('TfPoseEstimator-Video')
logger.setLevel(logging.DEBUG)
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
formatter = logging.Formatter('[%(asctime)s] [%(name)s] [%(levelname)s] %(message)s')
ch.setFormatter(formatter)
logger.addHandler(ch)

fps_time = 0

def str2bool(v):
    return v.lower() in ("yes", "true", "t", "1")

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='tf-pose-estimation Video')
    parser.add_argument('--video', type=str, default='')
    parser.add_argument('--resolution', type=str, default='432x368', help='network input resolution. default=432x368')
    parser.add_argument('--model', type=str, default='mobilenet_thin', help='cmu / mobilenet_thin / mobilenet_v2_large / mobilenet_v2_small')
    parser.add_argument('--show-process', type=bool, default=False,
                        help='for debug purpose, if enabled, speed for inference is dropped.')
    parser.add_argument('--showBG', type=bool, default=True, help='False to show skeleton only.')
    parser.add_argument('--tensorrt', type=str, default="False", help='for tensorrt process.')
    args = parser.parse_args()

    logger.debug('initialization %s : %s' % (args.model, get_graph_path(args.model)))
    w, h = model_wh(args.resolution)
    e = TfPoseEstimator(get_graph_path(args.model), target_size=(w, h), trt_bool=str2bool(args.tensorrt))
    cap = cv2.VideoCapture(args.video)

    fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
    out_video = cv2.VideoWriter('/tmp/output.mp4', fourcc, cap.get(cv2.CAP_PROP_FPS), (640, 480))

    count = 0
    t_netfps_time = 0
    t_fps_time = 0

    if cap.isOpened() is False:
        print("Error opening video stream or file")
    try:
        while cap.isOpened():
            fps_time = time.time()
            ret_val, image = cap.read()
            if ret_val == False:
                print("Frame read End")
                break
            net_fps = time.time()
            humans = e.inference(image)
            netfps = 1.0 / (time.time() - net_fps)
            if not args.showBG:
                image = np.zeros(image.shape)
            image = TfPoseEstimator.draw_humans(image, humans, imgcopy=False)

            fps = 1.0 / (time.time() - fps_time)
            fps_time = time.time()
            t_netfps_time += netfps
            t_fps_time += fps

            cv2.putText(image, "NET FPS:%4.1f FPS:%4.1f" % (netfps, fps), (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 255, 0), 5)
            img = cv2.resize(image, (640, 480))
            out_video.write(img)
            #cv2.imshow('tf-pose-estimation result', image)
            print("captured fps[%f] net_fps[%f]" % (fps, netfps))
            if cv2.waitKey(1) == ord('q'):
                break
            count += 1
    except KeyboardInterrupt:
        print("Keyboard interrupt exception caught")

    cv2.destroyAllWindows()
    out_video.release()
    cap.release()
    if count:
        print("avg fps[%f] avg net_fps[%f]" % (t_fps_time / count, t_netfps_time / count))
    logger.debug('finished+')
<modified run_video.py>
model : mobilenet_thin
(python) spypiggy@XavierNX:~/src/tf-pose-estimation$ python3 run_video.py --model=mobilenet_thin --video="../openpose/examples/media/video.avi"
......
Frame read End
avg fps[15.827860] avg net_fps[17.112114]
[2020-07-29 11:32:20,795] [TfPoseEstimator-Video] [DEBUG] finished+
<mobilenet_thin result video>
model : mobilenet_v2_large
(python) spypiggy@XavierNX:~/src/tf-pose-estimation$ python3 run_video.py --model=mobilenet_v2_large --video="../openpose/examples/media/video.avi"
......
Frame read End
avg fps[13.784665] avg net_fps[14.714679]
[2020-07-29 11:32:20,795] [TfPoseEstimator-Video] [DEBUG] finished+
<mobilenet_v2_large result video>
Performance boost using TensorRT
If you want to increase performance using TensorRT, you can add the --tensorrt=True option.
The --tensorrt option takes a little longer to load the model the first time because the TensorFlow model is converted to TensorRT. As you can see from the results below, using TensorRT achieves well over 20 FPS.
(python) spypiggy@XavierNX:~/src/tf-pose-estimation$ python3 run_video.py --model=mobilenet_thin --tensorrt="True" --video="../openpose/examples/media/video.avi"
......
Frame read End
avg fps[31.638934] avg net_fps[37.349658]
[2020-07-29 11:32:20,795] [TfPoseEstimator-Video] [DEBUG] finished+
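For reference, inside run_video.py the --tensorrt flag is simply converted with str2bool() and handed to TfPoseEstimator as trt_bool. A minimal in-code equivalent (my own sketch, using only calls from the script above) looks like this:

from tf_pose.estimator import TfPoseEstimator
from tf_pose.networks import get_graph_path, model_wh

# same effect as running run_video.py with --tensorrt="True":
# trt_bool=True makes TfPoseEstimator convert the TensorFlow graph with TF-TRT
w, h = model_wh('432x368')
e = TfPoseEstimator(get_graph_path('mobilenet_thin'), target_size=(w, h), trt_bool=True)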
However, the really important problem is that the accuracy of the MobileNet-based models is poor. Compared to the OpenPose results in the previous post, you can see that it is quite inaccurate. In my personal experience, pose estimation results with ResNet-based models were quite good.
More information on using a ResNet-101 model to improve accuracy over MobileNet is provided in Jetson Xavier NX - Human Pose estimation using tensorflow (mpii).
You can download the source code at https://github.com/raspberry-pi-maker/NVIDIA-Jetson .