Sunday, December 15, 2019

JetsonNano - NVIDIA AI IOT - Human Pose estimation using TensorRT

Last updated 2020.08.01: updated for JetPack 4.4

I used a Jetson Nano with the official Ubuntu 18.04 image and a root account, and I always use Python 3. The source code introduced in this article can be downloaded here.


In a previous article, I described using OpenPose to estimate human pose on the Jetson Nano and Jetson TX2. However, performance is only about 0.8 FPS on the Nano and about 2 FPS on the TX2. In another article, I explained how to increase FPS using TensorFlow with a lightweight network model (it scored 4~5 FPS) and how to convert the lightweight models to TensorRT models for a further boost. But the FPS was still under 10, and the lightweight keypoint detection model's accuracy is not satisfactory. You can see that post at https://spyjetson.blogspot.com/2019/11/jetsonnano-human-pose-estimation-using.html.
I believe performance should exceed 10 FPS for use in real-world projects. This time, I will take another shot at 10 FPS using the TensorRT models provided by NVIDIA.

This post is based on https://github.com/NVIDIA-AI-IOT/trt_pose.

TensorRT Pose Estimation

This project features multi-instance pose estimation accelerated by NVIDIA TensorRT. It is ideal for applications where low latency is necessary. It includes
  • Training scripts to train on any keypoint task data in MSCOCO format
  • A collection of models that may be easily optimized with TensorRT using torch2trt
This project can be used easily for the task of human pose estimation, or extended for something new.




Models

Below are models pre-trained on the MSCOCO dataset. The throughput in FPS is shown for each platform.

Model                                  Jetson Nano    Jetson Xavier    Weights
resnet18_baseline_att_224x224_A        22             251              download (81MB)
densenet121_baseline_att_256x256_B     12             101              download (84MB)

Prerequisites

For information on installing Jetpack 4.3 (December 2019), see https://spyjetson.blogspot.com/2020/02/jetson-nano-jetpack-43-sd-image.html.

Install PyTorch with JetPack 4.3

If you want to use JetPack 4.4, skip to "Install PyTorch with JetPack 4.4".

Be careful: these packages are upgraded from time to time, so you should check the site first and find the latest version to install.

Update (2020.04): upgraded to PyTorch 1.4.

Before installing PyTorch 1.3, visit this site to check the latest PyTorch version.
If you do not have PyTorch installed, install it first.
Please see my other post about PyTorch Pose Estimation for more information.


cd /usr/local/src 
wget https://nvidia.box.com/shared/static/phqe92v26cbhqjohwtvxorrwnmrnfx1o.whl -O torch-1.3.0-cp36-cp36m-linux_aarch64.whl
pip3 install torch-1.3.0-cp36-cp36m-linux_aarch64.whl
 
#Next install torchvision 0.4.2 
cd /usr/local/src  
git clone -b v0.4.2 https://github.com/pytorch/vision torchvision
cd torchvision
python3 setup.py install 


Before installing PyTorch 1.4, visit this site to check the latest PyTorch version.
If you do not have PyTorch installed, install it first.
Please see my other post about PyTorch Pose Estimation for more information.


wget https://nvidia.box.com/shared/static/ncgzus5o23uck9i5oth2n8n06k340l6k.whl -O torch-1.4.0-cp36-cp36m-linux_aarch64.whl
sudo apt-get install libopenblas-base
pip3 install torch-1.4.0-cp36-cp36m-linux_aarch64.whl

#install pillow first  

apt install libjpeg8-dev zlib1g-dev libtiff-dev libfreetype6 libfreetype6-dev libwebp-dev libopenjp2-7-dev -y
pip3 install pillow --global-option="build_ext" \
--global-option="--enable-zlib" \
--global-option="--enable-jpeg" \
--global-option="--enable-tiff" \
--global-option="--enable-freetype" \
--global-option="--enable-webp" \
--global-option="--enable-webpmux" \
--global-option="--enable-jpeg2000"


pip3 install torchvision 
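
Before moving on, it is worth a quick check that the wheels installed correctly. The snippet below is only a sanity check; the version numbers you see will depend on the wheels you installed.

# quick sanity check for the PyTorch / torchvision install
import torch
import torchvision

print('torch:', torch.__version__)
print('torchvision:', torchvision.__version__)
print('CUDA available:', torch.cuda.is_available())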

Install PyTorch with JetPack 4.4

I documented how to install JetPack 4.4 and PyTorch at https://spyjetson.blogspot.com/2020/07/jetson-nano-jetpack-44production.html. See that post for the details.

Install torch2trt

After installing PyTorch, install torch2trt.

cd /usr/local/src
git clone https://github.com/NVIDIA-AI-IOT/torch2trt
cd torch2trt
python3 setup.py install


Be careful: if PyTorch is not installed, an error occurs during the torch2trt installation process.
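
If you want to be sure PyTorch and torch2trt work together before continuing, a minimal smoke test like the one below can be run. It is only a sanity check (not part of trt_pose): it converts a small torchvision model once and compares the outputs of the original model and the TensorRT module.

# minimal torch2trt smoke test (sanity check only)
import torch
import torchvision
from torch2trt import torch2trt

model = torchvision.models.resnet18(pretrained=False).cuda().eval()
x = torch.zeros((1, 3, 224, 224)).cuda()
model_trt = torch2trt(model, [x], fp16_mode=True)

# the two outputs should be very close
print(torch.max(torch.abs(model(x) - model_trt(x))))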

Installation

Follow these steps. Some packages might already be installed if you have tested some of my other posts.

pip3 install tqdm cython pycocotools
apt-get install python3-matplotlib
cd /usr/local/src
git clone https://github.com/NVIDIA-AI-IOT/trt_pose
cd trt_pose
python3 setup.py install


Click the download links in the Models table above to get the pre-trained weights. Use a web browser to download the files; they are hosted on Google Drive.
After the download is complete, move the files to the "tasks/human_pose" directory.

If you want to download the files directly from Google Drive at the console, use the following commands.
cd tasks/human_pose
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1XYDdCUdiF2xxx4rznmLb62SdOUZuoNbd' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1XYDdCUdiF2xxx4rznmLb62SdOUZuoNbd" -O resnet18_baseline_att_224x224_A_epoch_249.pth
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=13FkJkx7evQ1WwP54UmdiDXWyFMY1OxDU' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=13FkJkx7evQ1WwP54UmdiDXWyFMY1OxDU" -O densenet121_baseline_att_256x256_B_epoch_160.pth


root@spypiggy-desktop:/usr/local/src/trt_pose/tasks/human_pose# ls -al
total 168776
drwxr-xr-x 3 root root     4096 12월 13 23:33 .
drwxr-xr-x 3 root root     4096 12월 13 23:10 ..
-rw-r--r-- 1 root root 87573944 12월 13 23:32 densenet121_baseline_att_256x256_B_epoch_160.pth
-rwxr-xr-x 1 root root      182 12월 13 23:10 download_coco.sh
-rw-r--r-- 1 root root    12027 12월 13 23:10 eval.ipynb
drwxr-xr-x 2 root root     4096 12월 13 23:10 experiments
-rw-r--r-- 1 root root      510 12월 13 23:10 human_pose.json
-rw-r--r-- 1 root root    10177 12월 13 23:10 live_demo.ipynb
-rw-r--r-- 1 root root     2521 12월 13 23:10 preprocess_coco_person.py
-rw-r--r-- 1 root root 85195117 12월 13 23:32 resnet18_baseline_att_224x224_A_epoch_249.pth

Get Keypoints From Image

The repository's example code is provided as Jupyter Notebooks, so I modified some of it into plain Python scripts. To make the keypoints easier to use later, I also added code that prints each keypoint's location and the body part it corresponds to. And since I'm used to OpenCV, I changed the image handling to use OpenCV.

At this point, trt_pose supports the resnet18 and densenet121 models, so you can use the --model option to choose the one you want: "--model=resnet" selects resnet18_baseline_att_224x224_A, and "--model=densenet" selects densenet121_baseline_att_256x256_B.
Another option is "--image", which selects the input image.


import json
import trt_pose.coco
import trt_pose.models
import torch
import torch2trt
from torch2trt import TRTModule
import time
import cv2
import torchvision.transforms as transforms
import PIL.Image, PIL.ImageDraw
from trt_pose.draw_objects import DrawObjects
from trt_pose.parse_objects import ParseObjects
import argparse
import os.path

'''
img is PIL format
'''
def draw_keypoints(img, key):
    thickness = 5
    w, h = img.size
    draw = PIL.ImageDraw.Draw(img)
    #draw Rankle -> RKnee (16-> 14)
    if all(key[16]) and all(key[14]):
        draw.line([ int(key[16][2] * w), int(key[16][1] * h), int(key[14][2] * w), int(key[14][1] * h)], width=thickness, fill=(51,51,204))
    #draw RKnee -> Rhip (14-> 12)
    if all(key[14]) and all(key[12]):
        draw.line([ int(key[14][2] * w), int(key[14][1] * h), int(key[12][2] * w), int(key[12][1] * h)], width=thickness, fill=(51,51,204))
    #draw Rhip -> Lhip (12-> 11)
    if all(key[12]) and all(key[11]):
        draw.line([ int(key[12][2] * w), int(key[12][1] * h), int(key[11][2] * w), int(key[11][1] * h)], width=thickness, fill=(51,51,204))
    #draw Lhip -> Lknee (11-> 13)
    if all(key[11]) and all(key[13]):
        draw.line([ int(key[11][2] * w), int(key[11][1] * h), int(key[13][2] * w), int(key[13][1] * h)], width=thickness, fill=(51,51,204))
    #draw Lknee -> Lankle (13-> 15)
    if all(key[13]) and all(key[15]):
        draw.line([ int(key[13][2] * w), int(key[13][1] * h), int(key[15][2] * w), int(key[15][1] * h)], width=thickness, fill=(51,51,204))
    #draw Rwrist -> Relbow (10-> 8)
    if all(key[10]) and all(key[8]):
        draw.line([ int(key[10][2] * w), int(key[10][1] * h), int(key[8][2] * w), int(key[8][1] * h)], width=thickness, fill=(255,255,51))
    #draw Relbow -> Rshoulder (8-> 6)
    if all(key[8]) and all(key[6]):
        draw.line([ int(key[8][2] * w), int(key[8][1] * h), int(key[6][2] * w), int(key[6][1] * h)], width=thickness, fill=(255,255,51))
    #draw Rshoulder -> Lshoulder (6-> 5)
    if all(key[6]) and all(key[5]):
        draw.line([ int(key[6][2] * w), int(key[6][1] * h), int(key[5][2] * w), int(key[5][1] * h)], width=thickness, fill=(255,255,0))
    #draw Lshoulder -> Lelbow (5-> 7)
    if all(key[5]) and all(key[7]):
        draw.line([ int(key[5][2] * w), int(key[5][1] * h), int(key[7][2] * w), int(key[7][1] * h)], width=thickness, fill=(51,255,51))
    #draw Lelbow -> Lwrist (7-> 9)
    if all(key[7]) and all(key[9]):
        draw.line([ int(key[7][2] * w), int(key[7][1] * h), int(key[9][2] * w), int(key[9][1] * h)], width=thickness, fill=(51,255,51))
    #draw Rshoulder -> RHip (6-> 12)
    if all(key[6]) and all(key[12]):
        draw.line([ int(key[6][2] * w), int(key[6][1] * h), int(key[12][2] * w), int(key[12][1] * h)], width=thickness, fill=(153,0,51))
    #draw Lshoulder -> LHip (5-> 11)
    if all(key[5]) and all(key[11]):
        draw.line([ int(key[5][2] * w), int(key[5][1] * h), int(key[11][2] * w), int(key[11][1] * h)], width=thickness, fill=(153,0,51))
    #draw nose -> Reye (0-> 2)
    if all(key[0][1:]) and all(key[2]):
        draw.line([ int(key[0][2] * w), int(key[0][1] * h), int(key[2][2] * w), int(key[2][1] * h)], width=thickness, fill=(219,0,219))
    #draw Reye -> Rear (2-> 4)
    if all(key[2]) and all(key[4]):
        draw.line([ int(key[2][2] * w), int(key[2][1] * h), int(key[4][2] * w), int(key[4][1] * h)], width=thickness, fill=(219,0,219))
    #draw nose -> Leye (0-> 1)
    if all(key[0][1:]) and all(key[1]):
        draw.line([ int(key[0][2] * w), int(key[0][1] * h), int(key[1][2] * w), int(key[1][1] * h)], width=thickness, fill=(219,0,219))
    #draw Leye -> Lear (1-> 3)
    if all(key[1]) and all(key[3]):
        draw.line([ int(key[1][2] * w), int(key[1][1] * h), int(key[3][2] * w), int(key[3][1] * h)], width=thickness, fill=(219,0,219))
    #draw nose -> neck (0-> 17)
    if all(key[0][1:]) and all(key[17]):
        draw.line([ int(key[0][2] * w), int(key[0][1] * h), int(key[17][2] * w), int(key[17][1] * h)], width=thickness, fill=(255,255,0))
    return img

'''
hnum: 0 based human index
kpoint : index + keypoints (float type range : 0.0 ~ 1.0 ==> later multiply by image width, height)
'''
def get_keypoint(humans, hnum, peaks):
    #check invalid human index
    kpoint = []
    human = humans[0][hnum]
    C = human.shape[0]
    for j in range(C):
        k = int(human[j])
        if k >= 0:
            peak = peaks[0][j][k]   # peak[1]:width, peak[0]:height
            peak = (j, float(peak[0]), float(peak[1]))
            kpoint.append(peak)
            print('index:%d : success [%5.3f, %5.3f]'%(j, peak[1], peak[2]) )
        else:
            peak = (j, None, None)
            kpoint.append(peak)
            print('index:%d : None'%(j) )
    return kpoint

def preprocess(image):
    global device
    device = torch.device('cuda')
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = PIL.Image.fromarray(image)
    image = transforms.functional.to_tensor(image).to(device)
    image.sub_(mean[:, None, None]).div_(std[:, None, None])
    return image[None, ...]

'''
Draw to inference (small) image
'''
def execute(img):
    start = time.time()
    data = preprocess(img)
    cmap, paf = model_trt(data)
    cmap, paf = cmap.detach().cpu(), paf.detach().cpu()
    end = time.time()
    counts, objects, peaks = parse_objects(cmap, paf)#, cmap_threshold=0.15, link_threshold=0.15)
    for i in range(counts[0]):
        print("Human index:%d "%( i ))
        get_keypoint(objects, i, peaks)
    print("Human count:%d len:%d "%(counts[0], len(counts)))
    print('===== Net FPS :%f ====='%( 1 / (end - start)))
    draw_objects(img, counts, objects, peaks)
    return img

'''
Draw to original image
'''
def execute_2(img, org):
    start = time.time()
    data = preprocess(img)
    cmap, paf = model_trt(data)
    cmap, paf = cmap.detach().cpu(), paf.detach().cpu()
    end = time.time()
    counts, objects, peaks = parse_objects(cmap, paf)#, cmap_threshold=0.15, link_threshold=0.15)
    for i in range(counts[0]):
        print("Human index:%d "%( i ))
        kpoint = get_keypoint(objects, i, peaks)
        #print(kpoint)
        org = draw_keypoints(org, kpoint)
    print("Human count:%d len:%d "%(counts[0], len(counts)))
    print('===== Net FPS :%f ====='%( 1 / (end - start)))
    return org

parser = argparse.ArgumentParser(description='TensorRT pose estimation run')
parser.add_argument('--image', type=str, default='/home/spypiggy/src/test_images/humans_7.jpg')
parser.add_argument('--model', type=str, default='resnet', help = 'resnet or densenet' )
args = parser.parse_args()

with open('human_pose.json', 'r') as f:
    human_pose = json.load(f)

topology = trt_pose.coco.coco_category_to_topology(human_pose)

num_parts = len(human_pose['keypoints'])
num_links = len(human_pose['skeleton'])

if 'resnet' in args.model:
    print('------ model = resnet--------')
    MODEL_WEIGHTS = 'resnet18_baseline_att_224x224_A_epoch_249.pth'
    OPTIMIZED_MODEL = 'resnet18_baseline_att_224x224_A_epoch_249_trt.pth'
    model = trt_pose.models.resnet18_baseline_att(num_parts, 2 * num_links).cuda().eval()
    WIDTH = 224
    HEIGHT = 224
else:
    print('------ model = densenet--------')
    MODEL_WEIGHTS = 'densenet121_baseline_att_256x256_B_epoch_160.pth'
    OPTIMIZED_MODEL = 'densenet121_baseline_att_256x256_B_epoch_160_trt.pth'
    model = trt_pose.models.densenet121_baseline_att(num_parts, 2 * num_links).cuda().eval()
    WIDTH = 256
    HEIGHT = 256

data = torch.zeros((1, 3, HEIGHT, WIDTH)).cuda()
if os.path.exists(OPTIMIZED_MODEL) == False:
    model.load_state_dict(torch.load(MODEL_WEIGHTS))
    model_trt = torch2trt.torch2trt(model, [data], fp16_mode=True, max_workspace_size=1<<25)
    torch.save(model_trt.state_dict(), OPTIMIZED_MODEL)

model_trt = TRTModule()
model_trt.load_state_dict(torch.load(OPTIMIZED_MODEL))

t0 = time.time()
torch.cuda.current_stream().synchronize()
for i in range(50):
    y = model_trt(data)
torch.cuda.current_stream().synchronize()
t1 = time.time()

print(50.0 / (t1 - t0))

mean = torch.Tensor([0.485, 0.456, 0.406]).cuda()
std = torch.Tensor([0.229, 0.224, 0.225]).cuda()
device = torch.device('cuda')

src = cv2.imread(args.image, cv2.IMREAD_COLOR)
pilimg = cv2.cvtColor(src, cv2.COLOR_BGR2RGB)
pilimg = PIL.Image.fromarray(pilimg)
orgimg = pilimg.copy()
image = cv2.resize(src, dsize=(WIDTH, HEIGHT), interpolation=cv2.INTER_AREA)

parse_objects = ParseObjects(topology)
draw_objects = DrawObjects(topology)

for x in range(1):
    img = image.copy()
    #img = execute(img)
    pilimg = execute_2(img, orgimg)
    #cv2.imshow('key',img)

dir, filename = os.path.split(args.image)
name, ext = os.path.splitext(filename)
pilimg.save('/home/spypiggy/src/test_images/result/%s_%s.png'%(args.model, name))
<detect_image.py>

Run the code.

root@spypiggy-jesonnano:/usr/local/src/trt_pose/tasks/human_pose# python3 detect_image.py --model=densenet



The result shows that there are 3 humans in the picture, along with each person's keypoint information.

Be careful: the keypoint values above are normalized (relative to an image of size 1) and are given as Y and X coordinates. Therefore, [0.374, 0.505] on the last line means a Y coordinate of 0.374 and an X coordinate of 0.505. To apply these values to the original image, multiply them by the image height and width.


This is the test input image.


This is the output image.



What is the index?

The index refers to the keypoint name. There's a file named "human_pose.json" in the "trt_pose/tasks/human_pose" directory.


{
    "supercategory": "person",
    "id": 1,
    "name": "person",
    "keypoints": [
        "nose",
        "left_eye",
        "right_eye",
        "left_ear",
        "right_ear",
        "left_shoulder",
        "right_shoulder",
        "left_elbow",
        "right_elbow",
        "left_wrist",
        "right_wrist",
        "left_hip",
        "right_hip",
        "left_knee",
        "right_knee",
        "left_ankle",
        "right_ankle",
        "neck"
    ],
    "skeleton": [
      .........

As you can see, index 0 indicates "nose" and index 17 indicates "neck". If a particular keypoint is not found, None is returned for its coordinates.

<index position>
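
If you prefer to check the mapping in code rather than reading the JSON by hand, a few lines like these will print it (run them from the tasks/human_pose directory so that human_pose.json is found):

# print the keypoint index -> name mapping used by trt_pose
import json

with open('human_pose.json', 'r') as f:
    human_pose = json.load(f)

for idx, name in enumerate(human_pose['keypoints']):
    print(idx, name)    # 0 nose ... 17 neck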

What is the floating point value?

If a particular keypoint is found, its coordinates are returned as values between 0.0 and 1.0. With the resnet18_baseline_att_224x224_A model the inference input size is 224x224, and with the densenet121_baseline_att_256x256_B model it is 256x256. Multiply the coordinates by the image size to calculate the exact location in the input image.
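
For example, a keypoint returned by get_keypoint() is a tuple in (index, y, x) order, so mapping it back to pixel coordinates of the original image looks roughly like this (the 1280x720 size below is only an example):

# convert one normalized keypoint (index, y, x) to pixel coordinates
def to_pixel(keypoint, org_w, org_h):
    j, y, x = keypoint
    if y is None or x is None:      # this keypoint was not detected
        return None
    return int(x * org_w), int(y * org_h)

# e.g. the neck keypoint (17, 0.375, 0.505) on a 1280x720 image
print(to_pixel((17, 0.375, 0.505), 1280, 720))    # -> (646, 270)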


Get Keypoints from video


import json
import trt_pose.coco
import trt_pose.models
import torch
import torch2trt
from torch2trt import TRTModule
import time, sys
import cv2
import torchvision.transforms as transforms
import PIL.Image
from trt_pose.draw_objects import DrawObjects
from trt_pose.parse_objects import ParseObjects
import argparse
import os.path


'''
hnum: 0 based human index
kpoint : keypoints (float type range : 0.0 ~ 1.0 ==> later multiply by image width, height)
'''
def get_keypoint(humans, hnum, peaks):
    #check invalid human index
    kpoint = []
    human = humans[0][hnum]
    C = human.shape[0]
    for j in range(C):
        k = int(human[j])
        if k >= 0:
            peak = peaks[0][j][k]   # peak[1]:width, peak[0]:height
            peak = (j, float(peak[0]), float(peak[1]))
            kpoint.append(peak)
            #print('index:%d : success [%5.3f, %5.3f]'%(j, peak[1], peak[2]) )
        else:    
            peak = (j, None, None)
            kpoint.append(peak)
            #print('index:%d : None %d'%(j, k) )
    return kpoint


parser = argparse.ArgumentParser(description='TensorRT pose estimation run')
parser.add_argument('--model', type=str, default='resnet', help = 'resnet or densenet' )
args = parser.parse_args()

with open('human_pose.json', 'r') as f:
    human_pose = json.load(f)

topology = trt_pose.coco.coco_category_to_topology(human_pose)

num_parts = len(human_pose['keypoints'])
num_links = len(human_pose['skeleton'])


if 'resnet' in args.model:
    print('------ model = resnet--------')
    MODEL_WEIGHTS = 'resnet18_baseline_att_224x224_A_epoch_249.pth'
    OPTIMIZED_MODEL = 'resnet18_baseline_att_224x224_A_epoch_249_trt.pth'
    model = trt_pose.models.resnet18_baseline_att(num_parts, 2 * num_links).cuda().eval()
    WIDTH = 224
    HEIGHT = 224

else:    
    print('------ model = densenet--------')
    MODEL_WEIGHTS = 'densenet121_baseline_att_256x256_B_epoch_160.pth'
    OPTIMIZED_MODEL = 'densenet121_baseline_att_256x256_B_epoch_160_trt.pth'
    model = trt_pose.models.densenet121_baseline_att(num_parts, 2 * num_links).cuda().eval()
    WIDTH = 256
    HEIGHT = 256

data = torch.zeros((1, 3, HEIGHT, WIDTH)).cuda()
if os.path.exists(OPTIMIZED_MODEL) == False:
    model.load_state_dict(torch.load(MODEL_WEIGHTS))
    model_trt = torch2trt.torch2trt(model, [data], fp16_mode=True, max_workspace_size=1<<25)
    torch.save(model_trt.state_dict(), OPTIMIZED_MODEL)

model_trt = TRTModule()
model_trt.load_state_dict(torch.load(OPTIMIZED_MODEL))

t0 = time.time()
torch.cuda.current_stream().synchronize()
for i in range(50):
    y = model_trt(data)
torch.cuda.current_stream().synchronize()
t1 = time.time()

print(50.0 / (t1 - t0))

mean = torch.Tensor([0.485, 0.456, 0.406]).cuda()
std = torch.Tensor([0.229, 0.224, 0.225]).cuda()
device = torch.device('cuda')

def preprocess(image):
    global device
    device = torch.device('cuda')
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = PIL.Image.fromarray(image)
    image = transforms.functional.to_tensor(image).to(device)
    image.sub_(mean[:, None, None]).div_(std[:, None, None])
    return image[None, ...]

def execute(img, src, t):
    color = (0, 255, 0)
    data = preprocess(img)
    cmap, paf = model_trt(data)
    cmap, paf = cmap.detach().cpu(), paf.detach().cpu()
    counts, objects, peaks = parse_objects(cmap, paf)#, cmap_threshold=0.15, link_threshold=0.15)
    fps = 1.0 / (time.time() - t)
    for i in range(counts[0]):
        keypoints = get_keypoint(objects, i, peaks)
        for j in range(len(keypoints)):
            if keypoints[j][1]:
                x = round(keypoints[j][2] * WIDTH * X_compress)
                y = round(keypoints[j][1] * HEIGHT * Y_compress)
                cv2.circle(src, (x, y), 3, color, 2)
                cv2.putText(src , "%d" % int(keypoints[j][0]), (x + 5, y),  cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 255), 1)
                cv2.circle(src, (x, y), 3, color, 2)
    print("FPS:%f "%(fps))
    #draw_objects(img, counts, objects, peaks)

    cv2.putText(src , "FPS: %f" % (fps), (20, 20),  cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 1)
    out_video.write(src)



cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)

ret_val, img = cap.read()
fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
out_video = cv2.VideoWriter('/tmp/output.mp4', fourcc, cap.get(cv2.CAP_PROP_FPS), (640, 480))
count = 0

X_compress = 640.0 / WIDTH * 1.0
Y_compress = 480.0 / HEIGHT * 1.0

if not cap.isOpened():
    print("Camera Open Error")
    sys.exit(0)

parse_objects = ParseObjects(topology)
draw_objects = DrawObjects(topology)

while cap.isOpened() and count < 500:
    t = time.time()
    ret_val, dst = cap.read()
    if ret_val == False:
        print("Camera read Error")
        break

    img = cv2.resize(dst, dsize=(WIDTH, HEIGHT), interpolation=cv2.INTER_AREA)
    execute(img, dst, t)
    count += 1


cv2.destroyAllWindows()
out_video.release()
cap.release()
<detect_video.py>

Run the code.

root@spypiggy-jesonnano:/usr/local/src/trt_pose/tasks/human_pose# python3 detect_video.py --model=resnet



The output FPS is 15~16 with the resnet model and 9~10 with the densenet model. This is the best pose estimation model I've tested on the Jetson Nano so far.


<webcam captured image with densenet model>

Wrapping up

I've tested human pose estimation with several models so far, but the model presented in this article is the best. If you are using the Jetson series, I recommend using this model for human pose estimation.

If you are interested in trt_pose of Jetson TX2, see my other post(JetsonTX2 - NVIDIA AI IOT - Human Pose estimation using TensorRT)

If you are interested in trt_pose of Jetson Xavier NX, see my other post(Xavier NX - NVIDIA AI IOT - Human Pose estimation using TensorRT)

34 comments:

  1. Hi, thank for the detailed explanation! Do you know how to perform training with this pose estimation repo? Thank you!

    1. No, I don't know the training.
      I'm just interested in AI edge computing.

  3. This comment has been removed by the author.

  4. I get the following error when trying to run your code, any help would be greatly appreciated. Thank you


    ------ model = resnet--------
    12.615812525168135
    [ WARN:0] global /home/ahmad/opencv/modules/videoio/src/cap_gstreamer.cpp (1757) handleMessage OpenCV | GStreamer warning: Embedded video playback halted; module v4l2src0 reported: Internal data stream error.
    [ WARN:0] global /home/ahmad/opencv/modules/videoio/src/cap_gstreamer.cpp (886) open OpenCV | GStreamer warning: unable to start pipeline
    [ WARN:0] global /home/ahmad/opencv/modules/videoio/src/cap_gstreamer.cpp (480) isPipelinePlaying OpenCV | GStreamer warning: GStreamer: pipeline have not been created
    VIDEOIO ERROR: V4L: Unable to get camera FPS
    [ERROR:0] global /home/ahmad/opencv/modules/videoio/src/cap.cpp (392) open VIDEOIO(GSTREAMER): raised OpenCV exception:

    OpenCV(4.1.1) /home/ahmad/opencv/modules/videoio/src/cap_gstreamer.cpp:1392: error: (-215:Assertion failed) fps > 0 in function 'open'

  5. Did you build OpenCV from source? Errors related to GStreamer are not easy to track down. If you built OpenCV from source, please refer to https://spyjetson.blogspot.com/2019/09/jetsonnano-opencv-411-build.html. Starting with JetPack 4.3, OpenCV 4.1.1 is installed as standard. I recommend that you test again in this environment.

  6. Thanks for the quick response. Yes I started with Jetpack 4.3. Do you have a YouTube video walkthrough for this project?

    1. No, I don't have a Youtube channel. I'm going to test the code on the Jetpack 4.3 soon. And I'll share the result with you.

    2. I installed a fresh JetPack 4.3 on my SD card and installed the required software. (In my old post I used PyTorch 1.3, but 1.4 is now the newest version, so I installed PyTorch 1.4 and updated the post.)
      Then I tested the source code and got no errors; both sample scripts work well.
      If you modified the above code to use the cv2.imshow function while working over remote SSH, that may be what causes an error.

      If you still have trouble, please follow my updated posts for installing the required software.

    3. Thanks, we were able to get it running! We are trying to build on this to identify certain arm/body movements. For example, a right arm in the form of a 90 degree angle would be detected and some action would be done. Any guidance on how we can do this?

    4. I posted an article on calculating the angles of body parts.
      Please see my other post https://spyjetson.blogspot.com/2019/09/jetsonnano-human-pose-estimation-using.html.

  7. Hi,

    I am trying to run the scripts you provided but I am getting some errors. Are you able to help me?

    When running the image detection I get this:

    sudo python3 imagepose.py --image="/home/yusuf/Desktop/trt_pose/tasks/human_pose/humans_7.jpg" --model=densenet

    ------ model = densenet--------
    Traceback (most recent call last):
    File "imagepose.py", line 189, in
    model_trt.load_state_dict(torch.load(OPTIMIZED_MODEL))
    File "/home/yusuf/.local/lib/python3.6/site-packages/torch/serialization.py", line 580, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
    File "/home/yusuf/.local/lib/python3.6/site-packages/torch/serialization.py", line 750, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
    EOFError: Ran out of input

    When I run the video script, I have put a rtsp stream in videoCapture command, I get the following:

    sudo python3 pose.py --model=resnet

    ------ model = resnet--------
    10.43325879840057
    [mpeg4 @ 0x8d456a40] timebase 1/180000 not supported by MPEG 4 standard, the maximum admitted value for the timebase denominator is 65535
    Could not open codec 'mpeg4': Unspecified error

    (python3:11240): GStreamer-CRITICAL **: 14:09:09.238: gst_element_make_from_uri: assertion 'gst_uri_is_valid (uri)' failed
    [ WARN:0] global /home/nvidia/host/build_opencv/nv_opencv/modules/videoio/src/cap_gstreamer.cpp (1578) open OpenCV | GStreamer warning: cannot link elements
    FPS:1.639794
    FPS:11.211899
    FPS:11.327354
    FPS:12.698891
    FPS:11.814142
    FPS:11.931534
    FPS:12.863122
    FPS:12.060881
    FPS:11.799983
    FPS:11.527310
    FPS:11.725890
    FPS:11.484793
    FPS:11.461256
    FPS:11.368434
    FPS:10.250536
    FPS:12.350064
    FPS:12.299182
    FPS:12.202286
    FPS:12.305858
    FPS:11.702694
    FPS:12.276323
    FPS:11.430241
    FPS:11.881376
    FPS:11.498080
    FPS:11.719468
    FPS:12.041905
    FPS:11.002348
    FPS:11.516833
    FPS:11.950912
    FPS:11.712432
    FPS:11.380124
    FPS:11.752569
    FPS:11.287330
    FPS:11.376420
    FPS:12.228220
    FPS:12.093605
    FPS:12.204487
    FPS:11.931330
    FPS:10.575200
    FPS:11.191824
    FPS:11.480393
    FPS:11.527690
    FPS:11.590061
    FPS:11.062620
    FPS:12.181696
    FPS:11.310584
    FPS:11.654535
    FPS:12.211096
    FPS:12.110505
    FPS:11.391003
    FPS:12.075395
    FPS:12.046644
    FPS:11.445306
    FPS:11.597785
    FPS:11.618571
    FPS:12.026782
    FPS:11.750693
    FPS:11.907349
    FPS:12.047059
    FPS:11.271042
    FPS:12.103132
    FPS:11.214297
    FPS:11.124442
    FPS:10.811288
    FPS:11.607028
    FPS:12.319086
    FPS:12.151803
    FPS:11.843867
    FPS:11.440592
    FPS:11.780595
    FPS:11.719272
    FPS:12.162303
    FPS:11.648418
    FPS:11.588716
    FPS:11.949107
    FPS:11.665751
    FPS:11.808222
    [h264 @ 0x8d3a7440] cbp too large (4099) at 10 4
    [h264 @ 0x8d3a7440] error while decoding MB 10 4
    FPS:0.031468
    Camera read Error

    Are you able to help me? I do not have a webcam, I am running on Jetson nano and I have a CSI Camera or a IPcamera app on phone with RTSP stream.

    Kind Regards,

    Yusuf

    1. In my example, the Python file name is detect_image2.py.
      And this is my Jetson's information. (I used a Xavier NX, but I'm sure the result is the same.) In my execution environment, the Python files and model files are in the same directory.



      (python) spypiggy@XavierNX:~/src/trt_pose/tasks/human_pose$ pwd
      /home/spypiggy/src/trt_pose/tasks/human_pose
      (python) spypiggy@XavierNX:~/src/trt_pose/tasks/human_pose$ ls -al
      total 307328
      drwxrwxr-x 3 spypiggy spypiggy 4096 Aug 2 01:56 .
      drwxrwxr-x 3 spypiggy spypiggy 4096 Aug 1 08:37 ..
      -rw-rw-r-- 1 spypiggy spypiggy 87573944 Aug 1 08:58 densenet121_baseline_att_256x256_B_epoch_160.pth
      -rw-rw-r-- 1 spypiggy spypiggy 66055471 Aug 1 09:08 densenet121_baseline_att_256x256_B_epoch_160_trt.pth
      -rw-rw-r-- 1 spypiggy spypiggy 9960 Aug 2 00:54 detect_image2.py
      -rw-rw-r-- 1 spypiggy spypiggy 10483 Aug 2 01:40 detect_video2.py
      -rw-rw-r-- 1 spypiggy spypiggy 182 Aug 1 08:37 download_coco.sh
      -rw-rw-r-- 1 spypiggy spypiggy 12027 Aug 1 08:37 eval.ipynb
      drwxrwxr-x 2 spypiggy spypiggy 4096 Aug 1 08:37 experiments
      -rw-rw-r-- 1 spypiggy spypiggy 510 Aug 1 08:37 human_pose.json
      -rw-rw-r-- 1 spypiggy spypiggy 10177 Aug 1 08:37 live_demo.ipynb
      -rw-rw-r-- 1 spypiggy spypiggy 2521 Aug 1 08:37 preprocess_coco_person.py
      -rw-rw-r-- 1 spypiggy spypiggy 85195117 Aug 1 08:53 resnet18_baseline_att_224x224_A_epoch_249.pth
      -rw-rw-r-- 1 spypiggy spypiggy 75797551 Aug 2 01:21 resnet18_baseline_att_224x224_A_epoch_249_trt.pth


      And when I ran the Python script, I got the successful result below. Please double-check that the model file is in the same directory.
      (python) spypiggy@XavierNX:~/src/trt_pose/tasks/human_pose$ python3 detect_image2.py --image="/home/spypiggy/src/test_images/humans_7.jpg" --model=densenet
      .....
      index:9 : None
      index:10 : None
      index:11 : success [0.580, 0.568]
      index:12 : success [0.575, 0.447]
      index:13 : success [0.716, 0.551]
      index:14 : success [0.712, 0.468]
      index:15 : success [0.854, 0.538]
      index:16 : success [0.856, 0.475]
      index:17 : success [0.375, 0.505]
      Human count:3 len:1
      ===== Net FPS :40.797051 =====

  8. Hi,

    I am trying to run these two scripts to test but I am getting errors. When I run the image detection script I get the following error:

    sudo python3 imagepose.py --image="/home/yusuf/Desktop/trt_pose/tasks/human_pose/humans_7.jpg" --model=densenet
    ------ model = densenet--------
    Traceback (most recent call last):
    File "imagepose.py", line 189, in
    model_trt.load_state_dict(torch.load(OPTIMIZED_MODEL))
    File "/home/yusuf/.local/lib/python3.6/site-packages/torch/serialization.py", line 580, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
    File "/home/yusuf/.local/lib/python3.6/site-packages/torch/serialization.py", line 750, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
    EOFError: Ran out of input

    When I run the video script on CSI camera, it keeps showing the FPS but no display shows to show the stream and pose estimation?

    Kind Regards,

    Yusuf

    1. When using a CSI camera, the cv2.VideoCapture(0) parameter should be changed.
      Please see my blog post about the CSI camera (https://spyjetson.blogspot.com/2020/02/camera-csi-camera-raspberry-pi-camera-v2.html).
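
      For reference, a typical GStreamer pipeline for the Raspberry Pi Camera V2 looks roughly like the sketch below; the resolution and framerate depend on your camera, so see the linked post for details.

      # rough sketch: open a CSI camera through GStreamer instead of cv2.VideoCapture(0);
      # adjust width/height/framerate to match your camera
      gst = ('nvarguscamerasrc ! '
             'video/x-raw(memory:NVMM), width=1280, height=720, framerate=30/1, format=NV12 ! '
             'nvvidconv ! video/x-raw, format=BGRx ! '
             'videoconvert ! video/x-raw, format=BGR ! appsink')
      cap = cv2.VideoCapture(gst, cv2.CAP_GSTREAMER)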

  9. Hi! Do you know how to run pose estimation and draw the keypoints like you did in the image script but for local video files? I've tried replacing the cap in the camera one with the local video path but with no success. Thanks!

    1. Hi Kar.
      With OpenCV, processing image files and video is almost the same. When processing a video file, OpenCV reads it frame by frame and treats each frame as a single image.
      Therefore, video processing is easy to understand once you have fully understood the Python script for image processing.
      If you study the sample code of detect_video.py, it will help. I have confirmed that detect_video.py runs without any problems.
      Good luck!
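
      As a rough sketch (the file path below is only a placeholder), the capture line in detect_video.py can simply point at a video file and the rest of the loop stays the same:

      # sketch: read a local video file instead of the webcam in detect_video.py
      cap = cv2.VideoCapture('/home/spypiggy/src/test_images/dance.mp4')
      while cap.isOpened():
          t = time.time()
          ret_val, dst = cap.read()
          if not ret_val:           # end of the file
              break
          img = cv2.resize(dst, dsize=(WIDTH, HEIGHT), interpolation=cv2.INTER_AREA)
          execute(img, dst, t)
      cap.release()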

  10. Thank you @spypiggy you helped me a lot, great quality material and guide!

  11. Hi, Thank you for offering these scripts!! I would like to ask you how can I get the confidence/score value for each keypoint (like in openpose for example). Thanks in advance.

    1. Hi John.
      Sorry for the late reply.
      It seems that the trt_pose does not offer the confidence value of key points.
      But you can adjust keypoint extraction using thresholds.
      I wrote a new article about debugging trt_pose. And in this article, I've added some helpful information to your question. Please take a look.
      https://spyjetson.blogspot.com/2021/01/xavier-nx-nvidia-ai-iot-fast-debuggiing.html

  12. Hello spypiggy,

    https://github.com/hafizas101/Real-time-human-pose-estimation-and-classification

    I saw the article before.
    I wanna use your methods for finishing pose classification.
    Can you please share the pose classification knowlege or other methods?

    1. Hello, sorry that I first time use the response of blog .

      I'm not hafizas101, the following is my github.
      https://github.com/c7934597.

      I'm implementing your article and the pose classification of hafizas101. Because I need to inference somebody for taking pictures. Maybe inferencing whether catching thing?

      My methods is using key points of pose, and then counting their degree. Is that good method? I first time developing pose estimation , therefore, I need tutorials about that. Thank you so much.

    2. Hi Ming,
      I don't know the contents of hafizas101's GitHub.
      When it comes to solving every problem, I don't believe machine learning is always the best option. For gesture recognition, classifying the keypoint data of a specific motion with machine learning is certainly possible. However, hard-coding rules by calculating the angles and positions of keypoints is not a bad approach either. Rather, for simple motions, it can be a good way to produce faster and more accurate results.

  13. Hi
    Do you know an human body part segmentation implementation for jetson?

    1. First, I recommend PyTorch Detectron2 (https://github.com/facebookresearch/detectron2).
      I've also written a blog post about Detectron2 (https://spyjetson.blogspot.com/2020/06/jetson-nano-detectron2-segmentation.html).
      There are also several other GitHub projects; search GitHub for the keyword "body segmentation".
      For example, see https://github.com/kevinlin311tw/CDCL-human-part-segmentation.

  14. HI i have a problem in detect_video.py . i can't play output.mp4... do you know why?

    1. detect_video.py simply records 500 frames to /tmp/output.mp4.
      You have to wait until the program ends.
      If you have already done that, I don't know the reason.

  15. Hi! I found your amazing codes by someone's comment on my issue of trt-pose.
    I want to know if I can classify human behaviors(like sitting, lying, walking and falling) by measuring the angle between keypoints.
    Is there any way to do that on your second code? I'd very appreciate it if you could answer my question!

    1. The code for calculating the angle between keypoints can be found on my blog.
      Of course, it is up to you to estimate the posture using this angle.

  16. Hi, thank you for you post. This will be very helpful for my project.
    However, I got error while installing torch2trt
    when I typed 'python3 setup.py install'
    I got error like this
    error: command 'aarch64-linux-gnu-gcc' failed with exit status 1
    How can solve this problem?
    Thank you

  17. Hi, thank you for your article! This is awesome!
    However, when I try to run the example code (detect_image.py),
    it shows '------ model = densenet --------'
    but after that, the Jetson suddenly stops and doesn't respond to anything;
    the screen freezes.

    Do you know how I can solve this problem?
    I'm using JetPack 4.6 and PyTorch 1.8.
    Is this because of a version problem?

    1. I haven't tested it yet with JetPack 4.6 and PyTorch 1.8.
      Most sudden system hangs are due to insufficient memory on the Jetson Nano. If you change your Ubuntu desktop to LXDE, you can free up about 1 GB of additional memory. First use this method to increase the available memory, and then test again. I know LXDE is easy to install with JetPack 4.6.
      It is also introduced at https://spyjetson.blogspot.com/2019/09/jetson-nano-useful-tips-before-you.html.
