Monday, May 25, 2020

Raspberry Pi - Human Pose estimation using Google EdgeTPU

I have been a big fan of the Jetson series for a long time, and since I am interested in Pose Estimation, I have introduced it in several blog posts using OpenPose, TensorFlow, PyTorch, TensorRT, and so on. In particular, I noticed the TensorFlow Lite example that uses the ResNet50 model for Pose Estimation. One of my blog posts covers PyTorch with the ResNet50 model. My impression of the ResNet50 model is that its accuracy is quite good. However, it has the disadvantage of being too slow: the Jetson Nano managed a disappointing 0.2 FPS.

<FPS on AMD Ryzen 7 2700X + RTX 2070 + Ubuntu 18.04>

<FPS on Jetson Nano>


Therefore, I have been looking for a way to run the ResNet50 model at a high FPS on the Jetson series. Unfortunately, it was difficult to find an example that achieves a high FPS with the ResNet50 model.
The best Pose Estimation performance on the Jetson Nano came from a ResNet18 model optimized for TensorRT, which I introduced at https://spyjetson.blogspot.com/2019/12/jetsonnano-human-pose-estimation-using.html.

<Pose Estimation using TensorRT on Jetson Nano records 15 FPS>

As you can see from the picture above, it showed good performance over 15 FPS. I haven't yet found a better performing Pose Estimation model on the Jetson Nano.

TensorFlow Lite

However, TensorFlow Lite provides Pose Estimation using ResNet50. TensorFlow Lite is a lightweight version of TensorFlow that runs on Android or iOS smartphones and on ARM-based devices.

On the https://github.com/tensorflow/tfjs-models/tree/master/posenet page, examples that can be tested in the browser using JavaScript are introduced. In PoseNet 2.0, you can choose between the lightweight MobileNetV1 model, which is fast but less accurate, and the large ResNet50 model, which is heavy but highly accurate.




However, this example uses JavaScript and works only in browsers. I need a Python example that runs at high speed using the ResNet50 model. Fortunately, there is a Python example using the Google Edge TPU.

This example uses TensorFlow Lite and requires a Google Edge TPU. In this article, we will use a Raspberry Pi 4 instead of a Jetson Nano.

Raspberry Pi 4 + Tensorflow Lite + Edge TPU

The Edge TPU performs best over USB 3.0. Therefore, it is not recommended to use a Raspberry Pi 3.x, which only supports USB 2.0.
First, after installing the Raspbian desktop version OS on the Raspberry Pi 4, install the required software as follows.

Install Raspbian on the Raspberry Pi 4

  1. Download the Raspbian Buster image from https://www.raspberrypi.org/downloads/raspbian/ and burn it to an SD card.
  2. Run raspi-config to enable the camera.
  3. Run the command "apt-get update".
  4. Run the command "apt-get dist-upgrade" and reboot the Raspberry Pi.

Install Coral (Edge TPU) software

Use the information on the official website https://coral.ai/docs/accelerator/get-started/#requirements

Do not connect the Coral AI accelerator to the Raspberry Pi yet.
First, install the Debian packages needed for Coral. The first two commands below add the repository, and the last command refreshes the package index so that the added repository is reflected. Therefore, you must install "libedgetpu1-std" after "apt-get update".

echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-get update

Install the Edge TPU runtime. The Edge TPU runtime comes in two versions, std and max. The max version overclocks the processing unit of the Coral AI accelerator, so the processing speed is slightly faster, but the temperature of the accelerator rises instead. It is similar to overclocking a PC CPU. If you want to use the max version, install libedgetpu1-max instead of libedgetpu1-std.
The following picture links to a YouTube video showing the difference between the two. The Google homepage recommends using the std version. However, PoseNet 2.0 may not work properly without the max runtime, so I'm going to install it.


sudo apt-get install libedgetpu1-max

Now connect the Coral AI accelerator to the USB 3.0 port.

Be careful: you can read why you should use the max runtime for ResNet50 on the Raspberry Pi at https://github.com/google-coral/project-posenet/issues/31 .

Install TensorFlow Lite

TensorFlow Lite is a lightweight version of TensorFlow that has been adapted to run machine learning models in mobile environments such as Android, iOS, and embedded systems.
It was initially created to run TensorFlow on smartphones without an AI accelerator chip, and the Android and iOS versions were released in November 2017. Since then, support for Linux systems using ARM32 and ARM64 has been added, so it can also be used on SBCs such as the Raspberry Pi, the NVIDIA Jetson series, and Odroid.
TensorFlow Lite converts floating-point numbers to 8-bit integers and processes them, which makes it possible to run deep learning models relatively quickly. However, the accuracy of the inference results drops slightly because of the error introduced by integer quantization. For reference, the following figure shows the efficiency of parallel processing on ARM CPUs: processing 8-bit integers is 8 times more efficient than processing 64-bit floating-point numbers. Since many devices running TensorFlow Lite use ARM CPUs, this optimization makes sense.
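To make the quantization idea concrete, here is a rough illustration with NumPy (the weight values are hypothetical; this only sketches the general scale/zero-point principle, not TensorFlow Lite's actual implementation):

import numpy as np

# Hypothetical float weights; TensorFlow Lite quantizes real model tensors.
weights = np.array([-0.42, 0.0, 0.37, 1.25], dtype=np.float32)

# Map the float range onto the 8-bit range [0, 255] using a scale and zero point.
scale = (weights.max() - weights.min()) / 255.0
zero_point = int(round(-weights.min() / scale))

quantized = np.clip(np.round(weights / scale) + zero_point, 0, 255).astype(np.uint8)
dequantized = (quantized.astype(np.float32) - zero_point) * scale

print('quantized :', quantized)    # 8-bit values used for fast integer arithmetic
print('recovered :', dequantized)  # close to the originals, with a small quantization error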


First, check the exact Python 3 version on your Raspberry Pi. For reference, Raspbian Buster uses Python 3.7.

pi@raspberrypi:~ $ python3
Python 3.7.3 (default, Dec 20 2019, 18:57:59) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

In the installation command below, cp37 means Python 3.7. Be sure to use the installation file that matches the Python version installed on your Raspberry Pi. The https://www.tensorflow.org/lite/guide/python page has a list of installation files for other versions of Python. If you are using a different version, refer to that page and use the appropriate file.

pip3 install https://dl.google.com/coral/python/tflite_runtime-2.1.0.post1-cp37-cp37m-linux_armv7l.whl
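
A minimal sanity check (just a sketch) is to import the runtime and load the Edge TPU delegate; load_delegate raises an exception if libedgetpu is missing or the accelerator is not attached.

# If the wheel and the Edge TPU runtime are installed correctly, this runs without errors.
from tflite_runtime.interpreter import Interpreter, load_delegate

delegate = load_delegate('libedgetpu.so.1')
print('Edge TPU delegate loaded:', delegate)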


Install Posenet 2.0 example

Now it is time to install Posenet. Download the code from https://github.com/google-coral/project-posenet. In addition to PoseNet, https://github.com/google-coral provides examples using EdgeTPU.

sudo apt-get update
sudo apt-get install python3-edgetpu
cd /usr/local/src
git clone https://github.com/google-coral/project-posenet.git
cd project-posenet/
sudo bash ./install_requirements.sh


Let's check whether the installation succeeded using the sample script. If the position and score of each part of the human body are printed as follows, it is installed correctly.

root@rpi-coral:/usr/local/src/project-posenet# python3 simple_pose.py
--2020-05-25 20:57:19--  https://upload.wikimedia.org/wikipedia/commons/thumb/3/38/Hindu_marriage_ceremony_offering.jpg/640px-Hindu_marriage_ceremony_offering.jpg
Resolving upload.wikimedia.org (upload.wikimedia.org)... 103.102.166.240, 2001:df2:e500:ed1a::2:b
Connecting to upload.wikimedia.org (upload.wikimedia.org)|103.102.166.240|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 83328 (81K) [image/jpeg]
Saving to: couple.jpg

couple.jpg                             100%[=========================================================================>]  81.38K   173KB/s    in 0.5s

2020-05-25 20:57:20 (173 KB/s) - couple.jpg saved [83328/83328]

Inference time: 25ms

Pose Score:  0.5861151
 nose                 x=210  y=152  score=1.0
 left eye             x=224  y=138  score=1.0
 right eye            x=199  y=137  score=1.0
 left ear             x=244  y=135  score=1.0
 right ear            x=182  y=129  score=0.8
 left shoulder        x=268  y=168  score=0.8
 right shoulder       x=160  y=174  score=1.0
 left elbow           x=282  y=255  score=0.6
 right elbow          x=154  y=256  score=0.9
 left wrist           x=230  y=287  score=0.6
 right wrist          x=162  y=299  score=0.6
 left hip             x=271  y=317  score=0.1
 right hip            x=169  y=306  score=0.1
 left knee            x=245  y=330  score=0.2
 right knee           x=172  y=336  score=0.0
 left ankle           x=182  y=411  score=0.1
 right ankle          x=184  y=413  score=0.1

Pose Score:  0.5533377
 nose                 x=398  y=145  score=1.0
 left eye             x=416  y=128  score=1.0
 right eye            x=382  y=127  score=1.0
 left ear             x=457  y=110  score=0.9
 right ear            x=370  y=120  score=0.2
 left shoulder        x=492  y=169  score=0.9
 right shoulder       x=362  y=150  score=0.8
 left elbow           x=463  y=292  score=0.9
 right elbow          x=329  y=245  score=0.8
 left wrist           x=340  y=303  score=0.9
 right wrist          x=236  y=329  score=0.5
 left hip             x=488  y=306  score=0.2
 right hip            x=370  y=318  score=0.1
 left knee            x=472  y=303  score=0.0
 right knee           x=252  y=327  score=0.2
 left ankle           x=450  y=373  score=0.1
 right ankle          x=184  y=410  score=0.1

However, the simple_pose.py example is a bit limited because it only prints the values to the console. Let's improve the source code a bit so that it also produces a result image.
Packages that are frequently used for image processing are Pillow and OpenCV. Learning the basics of using these two packages can be a great help in working with images.

I used PIL's ImageDraw to draw a point and a label for each body part that PoseNet found on the image, and then saved the result to couple_mobilenet.jpg or couple_resnet.jpg.

import os
import numpy as np
from PIL import Image, ImageDraw
from pose_engine import PoseEngine
import argparse

parser = argparse.ArgumentParser(description='PoseNet')
parser.add_argument('--model', type=str, default='mobilenet')
args = parser.parse_args()

os.system('wget https://upload.wikimedia.org/wikipedia/commons/thumb/3/38/'
          'Hindu_marriage_ceremony_offering.jpg/'
          '640px-Hindu_marriage_ceremony_offering.jpg -O couple.jpg')
pil_image = Image.open('couple.jpg')

# Resize the image to the model input size. Note that PIL's resize() returns
# a new image rather than resizing in place.
if args.model == 'mobilenet':
    pil_image = pil_image.resize((641, 481), Image.NEAREST)
    engine = PoseEngine('models/mobilenet/posenet_mobilenet_v1_075_481_641_quant_decoder_edgetpu.tflite')
else:
    pil_image = pil_image.resize((640, 480), Image.NEAREST)
    engine = PoseEngine('models/resnet/posenet_resnet_50_640_480_16_quant_edgetpu_decoder.tflite')
    
poses, inference_time = engine.DetectPosesInImage(np.uint8(pil_image))
print('Inference time: %.fms' % inference_time)

output = pil_image.copy()
draw = ImageDraw.Draw(output)
for pose in poses:
    if pose.score < 0.4: continue
    print('\nPose Score: ', pose.score)
    for label, keypoint in pose.keypoints.items():
        print(' %-20s x=%-4d y=%-4d score=%.1f' %
              (label, keypoint.yx[1], keypoint.yx[0], keypoint.score))
        # keypoint.yx is (y, x); draw a small circle and the label at each keypoint
        p1 = (keypoint.yx[1], keypoint.yx[0])
        p2 = (keypoint.yx[1] + 5, keypoint.yx[0] + 5)
        draw.ellipse([p1, p2], fill=(0,255,0,255))
        draw.text((keypoint.yx[1] + 10,keypoint.yx[0] - 10), label,  fill=(0,255,0,128))
        
output.save('./couple_' + args.model + '.jpg') 
<simple_pose2.py>

Now run the new Python code. As you can see, there is a new couple_mobilenet.jpg file.

root@rpi-coral:/usr/local/src/project-posenet# python3 simple_pose2.py
...
root@rpi-coral:/usr/local/src/project-posenet# ls -al couple*
-rw-r--r-- 1 root root 83328 Feb  9  2016 couple.jpg
-rw-r--r-- 1 root root 77164 May 25 21:23 couple_mobilenet.jpg

<couple_mobilenet.jpg>


If you want to use the ResNet model, use the --model=resnet option.
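For example, run the script again with this option; the result image is then saved as couple_resnet.jpg (this follows from the output.save() call in the code above):

python3 simple_pose2.py --model=resnet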

PoseNet 2.0 test using camera

Now that you have verified that PoseNet is working, you need to check the performance using the camera. Personally, if the performance exceeds 10 FPS, I consider the model usable.
The camera examples provided by project-posenet use gstreamer and svgwriter. However, I have written most of my previous camera examples using OpenCV, so I will change this example to use OpenCV, which is easier to work with.
Since I am working over a remote ssh session without a monitor, I write the result to a video file instead of showing it on screen. If you are using a local monitor, you can change it to screen output using the cv2.imshow function.


import argparse
import sys
import time
import cv2
import numpy as np
from pose_engine import PoseEngine


def main():
    parser = argparse.ArgumentParser(description='PoseNet')
    parser.add_argument('--model', type=str, default='mobilenet')
    args = parser.parse_args()
    
    if args.model == 'mobilenet':
        model = 'models/mobilenet/posenet_mobilenet_v1_075_353_481_quant_decoder_edgetpu.tflite'
    else:
        model = 'models/resnet/posenet_resnet_50_416_288_16_quant_edgetpu_decoder.tflite'
        
    engine = PoseEngine(model)
    input_shape = engine.get_input_tensor_shape()        
    inference_size = (input_shape[2], input_shape[1])
    print(inference_size)
    cap = cv2.VideoCapture(0)
    if not cap.isOpened():    # cv2.VideoCapture never returns None, so check isOpened() instead
        print("Camera Open Error")
        sys.exit(0)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

    # Save the processed frames (resized to the model input size) to a video file.
    fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
    out_video = cv2.VideoWriter('./output.mp4', fourcc, cap.get(cv2.CAP_PROP_FPS), inference_size)
    
    count = 0
    total_fps = 0.0
    fps_cnt = 0
    # process 60 frames, then report the average FPS
    while cap.isOpened() and count < 60:
        ret_val, img = cap.read()
        if ret_val == False:
            print("Camera read Error")
            break    
        print('frame read')
        img = cv2.resize(img, inference_size)
        rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        s_time = time.time()
        poses, inference_time = engine.DetectPosesInImage(rgb)
        fps = 1.0 / (time.time() - s_time)
        total_fps += fps
        fps_cnt += 1
        for pose in poses:
            print('\nPose Score: %f  FPS:%f'%(pose.score, fps))
            if pose.score < 0.3: continue
            for label, keypoint in pose.keypoints.items():
                print(' %-20s x=%-4d y=%-4d score=%.1f' %(label, keypoint.yx[1], keypoint.yx[0], keypoint.score))
                # keypoint.yx is (y, x); cv2.circle expects an (x, y) center
                cv2.circle(img, (keypoint.yx[1], keypoint.yx[0]), 5, (0,255,0), -1)

        out_video.write(img)                
        count += 1
    if fps_cnt > 0:   
        print('Model[%s] Avg FPS: %f'%(args.model, total_fps / fps_cnt))
    cv2.destroyAllWindows()        
    cap.release()    
    out_video.release()
    
if __name__ == '__main__':
    main()

The input image size of the network model is (481, 353) for MobileNet and (416, 288) for ResNet.
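
If you are running with a local monitor instead, a minimal change (just a sketch) is to replace the out_video.write(img) call with a display call:

# Show the processed frame on screen instead of writing it to output.mp4.
cv2.imshow('PoseNet', img)
# Press the Esc key to stop early.
if cv2.waitKey(1) & 0xFF == 27:
    break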

First is the test result using the mobilenet model.

root@rpi-coral:/usr/local/src/project-posenet# python3 pose_camera_cv.py
......
Model[mobilenet] Avg FPS: 120.934394


And the following is a test using resnet50.

root@rpi-coral:/usr/local/src/project-posenet# python3 pose_camera_cv.py  --model=resnet
......
Model[resnet] Avg FPS: 14.966248


<picture from saved video using ResNet50 model>

Wrapping Up

It recorded about 120 FPS with MobileNet and about 15 FPS with ResNet50. Since only the time spent in the network model itself is measured, the actual FPS will be somewhat lower if you include the time for drawing coordinates on the image and writing the video file. But even considering this, these values are fantastic.
The TensorFlow Lite model uses 8-bit integer arithmetic instead of floating point to improve processing speed and reduce memory usage, and the Edge TPU accelerates these operations, which is how it achieves such outstanding numbers.

The best Pose Estimation performance I have introduced so far came from the TensorRT ResNet18 model, at about 15 FPS.

However, it's surprising that ResNet50, which is much heavier than ResNet18, can provide such good performance.
I hope the ResNet50 model can be optimized in a similar way on the Jetson series to produce an equally good-performing model.

You can download the source code at https://github.com/raspberry-pi-maker/NVIDIA-Jetson .