Thursday, July 30, 2020

Jetson Xavier NX - Human Pose estimation using tensorflow

I previously posted an article about pose estimation using TensorFlow on the Jetson Nano.
At the time, the Jetson Nano achieved about 2 to 5 FPS.
The tf-pose-estimation project by ildoonet (https://github.com/ildoonet/tf-pose-estimation) mainly uses MobileNet-based models, which are less accurate than the OpenPose and ResNet-based models I ran with PyTorch, but process frames faster, which is why I introduced it.
The purpose of this article is to run the same model on the Xavier NX and compare it with the performance on the Jetson Nano.
I recommend reading the previous post first, as it contains a detailed explanation.

Prerequisites

Before building "ildoonet/tf-pose-estimation", you must install these packages first.
After installing the packages above, install the following packages as well; some of them may already be installed. The Jetson Nano previously used LLVM version 7, but the Xavier NX with JetPack 4.4 uses LLVM version 9.

Warning: SciPy is handled outside the Python virtual environment. JetPack already ships with SciPy 0.19.1, but scikit-image, which will be installed later, requires SciPy 1.0.1 or higher. Therefore, upgrade SciPy before entering the virtual environment. On JetPack 4.4, the pip3 install --upgrade scipy command upgrades SciPy to version 1.5.2.

spypiggy@XavierNX:~$ sudo apt-get install libllvm-9-ocaml-dev libllvm9 llvm-9 llvm-9-dev llvm-9-doc llvm-9-examples llvm-9-runtime 
spypiggy@XavierNX:~$ sudo pip3 install --upgrade scipy
spypiggy@XavierNX:~$ sudo apt-get install -y build-essential libatlas-base-dev swig gfortran
spypiggy@XavierNX:~$ export LLVM_CONFIG=/usr/bin/llvm-config-9
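To confirm that the upgrade actually took effect before you continue, you can check the version with the system Python. This is just a minimal sanity check of my own, not part of the original build instructions.

# run with the system python3, before activating the virtual environment
import scipy

print("scipy version:", scipy.__version__)

# scikit-image, installed later, needs at least SciPy 1.0.1
major, minor = (int(x) for x in scipy.__version__.split(".")[:2])
assert (major, minor) >= (1, 0), "scipy is still too old - rerun 'sudo pip3 install --upgrade scipy'"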


Download and build code from ildoonet

Now clone ildoonet's GitHub repository.

spypiggy@XavierNX:~$ source /home/spypiggy/python/bin/activate
(python) spypiggy@XavierNX:~$ pip3 install Cython
(python) spypiggy@XavierNX:~$ cd src
(python) spypiggy@XavierNX:~/src$ git clone https://www.github.com/ildoonet/tf-pose-estimation
(python) spypiggy@XavierNX:~/src$ cd tf-pose-estimation
(python) spypiggy@XavierNX:~/src/tf-pose-estimation$ pip3 install -r requirements.txt

Edit the tf_pose/estimator.py file as follows (based on TensorFlow 1.14).

original code
self.persistent_sess = tf.Session(graph=self.graph, config=tf_config)

to

# if no config is passed in, create one that allows GPU memory growth,
# so TensorFlow does not grab all of the Xavier NX's memory at once
if tf_config is None:
    tf_config = tf.ConfigProto()
    tf_config.gpu_options.allow_growth = True

self.persistent_sess = tf.Session(graph=self.graph, config=tf_config)

The source code also has a --tensorrt option that enables TensorRT. To use this option, modify the ./tf_pose/estimator.py file again.
At line 327, remove the last parameter "use_calibration=True,". This parameter is deprecated in TensorFlow 1.14 and later.
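For reference, the conversion at that point is a call to trt.create_inference_graph(). The sketch below is my reconstruction of what the edited call roughly looks like; the variable names, output node name, and argument values in your checkout may differ, and the only change that matters is removing the use_calibration argument.

# tf_pose/estimator.py, around line 327 (runs when --tensorrt=True)
graph_def = trt.create_inference_graph(
    graph_def,                        # frozen pose-estimation graph (name as I recall it)
    ['Openpose/concat_stage7'],       # output node; check the actual name in your file
    max_batch_size=1,
    max_workspace_size_bytes=1 << 20,
    precision_mode='FP16',
    minimum_segment_size=3,
    is_dynamic_op=True,
    # use_calibration=True,           # <-- delete this argument (deprecated in TF >= 1.14)
)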



(python) spypiggy@XavierNX:~/src/tf-pose-estimation$ cd tf_pose/pafprocess
(python) spypiggy@XavierNX:~/src/tf-pose-estimation/tf_pose/pafprocess$ swig -python -c++ pafprocess.i && python3 setup.py build_ext --inplace
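To confirm that the SWIG build succeeded, you can try importing the extension from the repository root. This is just a quick check; the import path below is, as far as I know, the one estimator.py itself uses, so adjust it if your layout differs.

# run from the tf-pose-estimation directory, inside the virtual environment
from tf_pose.pafprocess import pafprocess   # raises ImportError if the build did not work
print("pafprocess built OK")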

Download the models

This step can be skipped. Download the models only if you want to test OpenPose's cmu model.

(python) spypiggy@XavierNX:~/src/tf-pose-estimation$ cd  ~/src/tf-pose-estimation/models/graph/cmu
(python) spypiggy@XavierNX:~/src/tf-pose-estimation/models/graph/cmu$ bash download.sh

Testing with image

There are pre-made test Python scripts such as run.py, run_video.py, and run_webcam.py.
You can test the framework on an image like this.

(python) spypiggy@XavierNX:~/src/tf-pose-estimation$ python3 run.py --model=mobilenet_thin --resize=432x368 --image=./images/p1.jpg

I had to wait a few minutes on the Jetson Nano, but on the Xavier NX the result is displayed within seconds. The result image looks like this.


Looking at the console output, processing the image took 0.1292 seconds, which corresponds to about 7.7 FPS.
On all AI edge devices, the first inference is quite slow because of one-time initialization, so inference from the second run onward should be considerably faster.

[2020-07-29 09:53:48,696] [TfPoseEstimatorRun] [INFO] inference image: ./images/p1.jpg in 0.1292 seconds.
2020-07-29 09:53:48,696 INFO inference image: ./images/p1.jpg in 0.1292 seconds.
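If you want to check the warm-up effect yourself, you can time a few consecutive inferences on the same image with the library's Python API. This is a small sketch of mine; the model, resolution, and image path are simply the values used in the run.py command above.

# warmup_test.py - run from the tf-pose-estimation directory
import time
import cv2

from tf_pose.estimator import TfPoseEstimator
from tf_pose.networks import get_graph_path, model_wh

w, h = model_wh('432x368')
e = TfPoseEstimator(get_graph_path('mobilenet_thin'), target_size=(w, h))

image = cv2.imread('./images/p1.jpg')

# the first call includes one-time initialization; the following calls show steady-state speed
for i in range(5):
    t = time.time()
    humans = e.inference(image)
    print('run %d: %.4f seconds, %d humans' % (i, time.time() - t, len(humans)))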


Under the hood

Now let's dig deeper.
Let's test with a video file that I used when testing OpenPose. In the tf-pose-estimation directory, there is a run_video.py file for testing video files. Since I always work over SSH, I write the video output to a file instead of showing it on the screen. The following is a slightly modified version of the original test code.

import argparse
import logging
import time

import cv2
import numpy as np

from tf_pose.estimator import TfPoseEstimator
from tf_pose.networks import get_graph_path, model_wh

logger = logging.getLogger('TfPoseEstimator-Video')
logger.setLevel(logging.DEBUG)
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
formatter = logging.Formatter('[%(asctime)s] [%(name)s] [%(levelname)s] %(message)s')
ch.setFormatter(formatter)
logger.addHandler(ch)

fps_time = 0

def str2bool(v):
    return v.lower() in ("yes", "true", "t", "1")
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='tf-pose-estimation Video')
    parser.add_argument('--video', type=str, default='')
    parser.add_argument('--resolution', type=str, default='432x368', help='network input resolution. default=432x368')
    parser.add_argument('--model', type=str, default='mobilenet_thin', help='cmu / mobilenet_thin / mobilenet_v2_large / mobilenet_v2_small')
    parser.add_argument('--show-process', type=bool, default=False,
                        help='for debug purpose, if enabled, speed for inference is dropped.')
    parser.add_argument('--showBG', type=bool, default=True, help='False to show skeleton only.')
    parser.add_argument('--tensorrt', type=str, default="False",
                        help='for tensorrt process.')    
    args = parser.parse_args()

    logger.debug('initialization %s : %s' % (args.model, get_graph_path(args.model)))
    w, h = model_wh(args.resolution)
    e = TfPoseEstimator(get_graph_path(args.model), target_size=(w, h), trt_bool=str2bool(args.tensorrt))
    cap = cv2.VideoCapture(args.video)
    fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
    out_video = cv2.VideoWriter('/tmp/output.mp4', fourcc, cap.get(cv2.CAP_PROP_FPS), (640, 480))
    count = 0
    t_netfps_time = 0
    t_fps_time = 0

    if cap.isOpened() is False:
        print("Error opening video stream or file")

    try:
        while cap.isOpened():
            fps_time = time.time()
            ret_val, image = cap.read()
            if ret_val == False:
                print("Frame read End")
                break
            net_fps = time.time()
            humans = e.inference(image)
            netfps = 1.0 / (time.time() - net_fps)
            if not args.showBG:
                image = np.zeros(image.shape)
            image = TfPoseEstimator.draw_humans(image, humans, imgcopy=False)
            fps = 1.0 / (time.time() - fps_time)
            fps_time = time.time()
            t_netfps_time += netfps
            t_fps_time += fps
            cv2.putText(image, "NET FPS:%4.1f FPS:%4.1f" % (netfps, fps), (50, 50), cv2.FONT_HERSHEY_SIMPLEX, 2, (0, 255, 0), 5)
            img = cv2.resize(image, (640, 480))
            out_video.write(img)
            # cv2.imshow('tf-pose-estimation result', image)
            print("captured fps[%f] net_fps[%f]" % (fps, netfps))
            if cv2.waitKey(1) == ord('q'):
                break
            count += 1
    except KeyboardInterrupt:
        print("Keyboard interrupt exception caught")

    cv2.destroyAllWindows()
    out_video.release()
    cap.release()
    if count:
        print("avg fps[%f] avg net_fps[%f]" % (t_fps_time / count, t_netfps_time / count))
    logger.debug('finished+')
<modified run_video.py>

model : mobilenet_thin

(python) spypiggy@XavierNX:~/src/tf-pose-estimation$ python3 run_video.py --model=mobilenet_thin --video="../openpose/examples/media/video.avi"
......
Frame read End
avg fps[15.827860] avg net_fps[17.112114]
[2020-07-29 11:32:20,795] [TfPoseEstimator-Video] [DEBUG] finished+
[2020-07-29 11:32:20,795] [TfPoseEstimator-Video] [DEBUG] finished+

<mobilenet_thin result video>


model : mobilenet_v2_large

(python) spypiggy@XavierNX:~/src/tf-pose-estimation$ python3 run_video.py --model=mobilenet_v2_large --video="../openpose/examples/media/video.avi"
......
Frame read End
avg fps[13.784665] avg net_fps[14.714679]
[2020-07-29 11:32:20,795] [TfPoseEstimator-Video] [DEBUG] finished+
[2020-07-29 11:32:20,795] [TfPoseEstimator-Video] [DEBUG] finished+

<mobilenet_v2_large result video>


Performance boost up using TensorRT

If you want to increase performance with TensorRT, add the --tensorrt=True option.
With the TensorRT option, loading the model for the first time takes a little longer because the TensorFlow model is converted to TensorRT. As you can see from the results below, using TensorRT achieves over 30 FPS.
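In the modified run_video.py above, this flag simply ends up in the trt_bool argument of the estimator. If you script it yourself, the equivalent call looks like the following minimal sketch, using the same names as the code above.

from tf_pose.estimator import TfPoseEstimator
from tf_pose.networks import get_graph_path, model_wh

w, h = model_wh('432x368')
# trt_bool=True triggers the TensorFlow -> TensorRT conversion on first load,
# which is why the first run takes noticeably longer
e = TfPoseEstimator(get_graph_path('mobilenet_thin'), target_size=(w, h), trt_bool=True)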

(python) spypiggy@XavierNX:~/src/tf-pose-estimation$ python3 run_video.py --model=mobilenet_thin  --tensorrt="True" --video="../openpose/examples/media/video.avi"
......
Frame read End
avg fps[31.638934] avg net_fps[37.349658]
[2020-07-29 11:32:20,795] [TfPoseEstimator-Video] [DEBUG] finished+
[2020-07-29 11:32:20,795] [TfPoseEstimator-Video] [DEBUG] finished+

But the really important problem is that the accuracy of the MobileNet-based models is poor. Compared to the OpenPose results in my previous post, you can see that they are quite inaccurate. In my personal experience, pose estimation results using ResNet-based models were quite good.
More information on using the ResNet-101 model to improve accuracy over the MobileNet-based models is provided in "Jetson Xavier NX - Human Pose estimation using tensorflow (mpii)".

You can download the source code at https://github.com/raspberry-pi-maker/NVIDIA-Jetson.


