Sunday, August 2, 2020

Xavier NX - NVIDIA AI IOT - Human Pose estimation using TensorRT

Last updated 2021.01.14: updated for VSCode debugging

This article is a re-implementation of what was previously tested on the Jetson Nano (JetsonNano - NVIDIA AI IOT - Human Pose estimation using TensorRT). Personally, the model that has shown the best performance for me on the Jetson series is the one accelerated with TensorRT. This post is based on https://github.com/NVIDIA-AI-IOT/trt_pose.

TensorRT Pose Estimation

This project features multi-instance pose estimation accelerated by NVIDIA TensorRT. It is ideal for applications where low latency is necessary. It includes:
  • Training scripts to train on any keypoint task data in MSCOCO format
  • A collection of models that may be easily optimized with TensorRT using torch2trt
This project can be used easily for the task of human pose estimation, or extended for something new.




Models

Below are models pre-trained on the MSCOCO dataset. The throughput in FPS is shown for each platform.

Model                                Jetson Nano    Jetson Xavier    Weights
resnet18_baseline_att_224x224_A      22             251              download (81MB)
densenet121_baseline_att_256x256_B   12             101              download (84MB)

Prerequisites

The Xavier NX only works with JetPack 4.4 or higher. I documented how to install JetPack 4.4 and PyTorch in Jetson Xavier NX - JetPack 4.4 (production release) headless setup and Jetson Xavier NX - Python virtual environment and ML platforms (TensorFlow, PyTorch) installation. See those posts for details.
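Before going further, it is worth checking that PyTorch inside the virtual environment can actually see the GPU. A minimal check (the printed version will vary with your JetPack/PyTorch combination):

import torch

print(torch.__version__)           # version depends on the wheel you installed
print(torch.cuda.is_available())   # should print True on the Xavier NX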

Install torch2trt

After installing PyTorch, install torch2trt.

spypiggy@XavierNX:~/$ source /home/spypiggy/python/bin/activate
(python) spypiggy@XavierNX:~/$ cd src
(python) spypiggy@XavierNX:~/src$ git clone https://github.com/NVIDIA-AI-IOT/torch2trt
(python) spypiggy@XavierNX:~/src$ cd torch2trt
(python) spypiggy@XavierNX:~/src/torch2trt$ pip3 install tqdm cython pycocotools matplotlib
(python) spypiggy@XavierNX:~/src/torch2trt$ python3 setup.py install

Be careful: if PyTorch is not installed, an error occurs during the torch2trt installation.
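Once the build finishes, a quick sanity check is to convert a small torchvision model, following the usage shown in the torch2trt README. This is only a sketch to confirm the installation, not part of trt_pose; note that it downloads the torchvision resnet18 weights the first time it runs.

import torch
import torchvision
from torch2trt import torch2trt

# build a small model and a dummy input on the GPU
model = torchvision.models.resnet18(pretrained=True).cuda().eval()
x = torch.ones((1, 3, 224, 224)).cuda()

# convert to TensorRT; the outputs of both models should be nearly identical
model_trt = torch2trt(model, [x])
print(torch.max(torch.abs(model(x) - model_trt(x))))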

Installation

Follow these steps. Some packages may already be installed if you have followed my earlier posts.

(python) spypiggy@XavierNX:~/src/torch2trt$ cd ~/src 
(python) spypiggy@XavierNX:~/src$  git clone https://github.com/NVIDIA-AI-IOT/trt_pose
(python) spypiggy@XavierNX:~/src$ cd trt_pose/
(python) spypiggy@XavierNX:~/src/trt_pose$ python3 setup.py install
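If the install succeeded, trt_pose should be importable and able to build the model architectures used below. A quick check, run from the cloned trt_pose directory so that tasks/human_pose/human_pose.json is available (building the model may download the torchvision backbone weights the first time):

import json
import trt_pose.coco
import trt_pose.models

# the part/link counts come from human_pose.json, just like in the scripts below
with open('tasks/human_pose/human_pose.json', 'r') as f:
    human_pose = json.load(f)

num_parts = len(human_pose['keypoints'])
num_links = len(human_pose['skeleton'])
print(num_parts, num_links)

# build (but do not optimize yet) the resnet18 backbone used later
model = trt_pose.models.resnet18_baseline_att(num_parts, 2 * num_links)
print(type(model).__name__)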


Download models

Click the links above and download the models with a web browser; the files are hosted on Google Drive.
After the download is complete, move the files to the "tasks/human_pose" directory.

If you want to download the files directly from Google Drive on the console, use the following commands.
cd tasks/human_pose
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1XYDdCUdiF2xxx4rznmLb62SdOUZuoNbd' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1XYDdCUdiF2xxx4rznmLb62SdOUZuoNbd" -O resnet18_baseline_att_224x224_A_epoch_249.pth
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=13FkJkx7evQ1WwP54UmdiDXWyFMY1OxDU' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=13FkJkx7evQ1WwP54UmdiDXWyFMY1OxDU" -O densenet121_baseline_att_256x256_B_epoch_160.pth

Now it looks like this.

(python) spypiggy@XavierNX:~/src/trt_pose/tasks/human_pose$ ls -al
total 233292
drwxrwxr-x 3 spypiggy spypiggy     4096 Aug  1 13:09 .
drwxrwxr-x 3 spypiggy spypiggy     4096 Aug  1 08:37 ..
-rw-rw-r-- 1 spypiggy spypiggy 87573944 Aug  1 08:58 densenet121_baseline_att_256x256_B_epoch_160.pth
-rw-rw-r-- 1 spypiggy spypiggy      182 Aug  1 08:37 download_coco.sh
-rw-rw-r-- 1 spypiggy spypiggy    12027 Aug  1 08:37 eval.ipynb
drwxrwxr-x 2 spypiggy spypiggy     4096 Aug  1 08:37 experiments
-rw-rw-r-- 1 spypiggy spypiggy      510 Aug  1 08:37 human_pose.json
-rw-rw-r-- 1 spypiggy spypiggy    10177 Aug  1 08:37 live_demo.ipynb
-rw-rw-r-- 1 spypiggy spypiggy     2521 Aug  1 08:37 preprocess_coco_person.py
-rw-rw-r-- 1 spypiggy spypiggy 85195117 Aug  1 08:53 resnet18_baseline_att_224x224_A_epoch_249.pth

Get Keypoints From Image

The repo's example code is provided as Jupyter Notebooks, so I adapted some of it into a standalone Python script. To make the keypoints easier to reuse later, I added code that prints each keypoint's index and the location it points to. I am also more used to OpenCV, so I changed the image handling to use OpenCV.

At this point, the script supports the resnet18 and densenet121 models, so use the --model option to choose the one you want: "--model=resnet" for resnet18_baseline_att_224x224_A, or "--model=densenet" for densenet121_baseline_att_256x256_B.
The other option is "--image", which selects the input image.



import json
import time
import argparse
import os.path

import cv2
import torch
import torchvision.transforms as transforms
import PIL.Image, PIL.ImageDraw

import torch2trt
from torch2trt import TRTModule
import trt_pose.coco
import trt_pose.models
from trt_pose.draw_objects import DrawObjects
from trt_pose.parse_objects import ParseObjects

# (keypoint A, keypoint B, line color) pairs for the skeleton drawn on the original image
LIMBS = [
    (16, 14, (51, 51, 204)),    # RAnkle -> RKnee
    (14, 12, (51, 51, 204)),    # RKnee  -> RHip
    (12, 11, (51, 51, 204)),    # RHip   -> LHip
    (11, 13, (51, 51, 204)),    # LHip   -> LKnee
    (13, 15, (51, 51, 204)),    # LKnee  -> LAnkle
    (10, 8,  (255, 255, 51)),   # RWrist -> RElbow
    (8, 6,   (255, 255, 51)),   # RElbow -> RShoulder
    (6, 5,   (255, 255, 0)),    # RShoulder -> LShoulder
    (5, 7,   (51, 255, 51)),    # LShoulder -> LElbow
    (7, 9,   (51, 255, 51)),    # LElbow -> LWrist
    (6, 12,  (153, 0, 51)),     # RShoulder -> RHip
    (5, 11,  (153, 0, 51)),     # LShoulder -> LHip
    (0, 2,   (219, 0, 219)),    # Nose -> REye
    (2, 4,   (219, 0, 219)),    # REye -> REar
    (0, 1,   (219, 0, 219)),    # Nose -> LEye
    (1, 3,   (219, 0, 219)),    # LEye -> LEar
    (0, 17,  (255, 255, 0)),    # Nose -> Neck
]

'''
img is PIL format
'''
def draw_keypoints(img, key):
    thickness = 5
    w, h = img.size
    draw = PIL.ImageDraw.Draw(img)
    for a, b, color in LIMBS:
        # key[n] is (index, y, x); y and x are None when the keypoint was not detected
        if all(key[a][1:]) and all(key[b][1:]):
            draw.line([int(key[a][2] * w), int(key[a][1] * h),
                       int(key[b][2] * w), int(key[b][1] * h)],
                      width=thickness, fill=color)
    return img

'''
hnum: 0 based human index
kpoint: index + keypoints (float type range : 0.0 ~ 1.0 ==> later multiply by image width, height)
'''
def get_keypoint(humans, hnum, peaks):
    # check invalid human index
    kpoint = []
    human = humans[0][hnum]
    C = human.shape[0]
    for j in range(C):
        k = int(human[j])
        if k >= 0:
            peak = peaks[0][j][k]   # peak[1]:width, peak[0]:height
            peak = (j, float(peak[0]), float(peak[1]))
            kpoint.append(peak)
            print('index:%d : success [%5.3f, %5.3f]' % (j, peak[1], peak[2]))
        else:
            peak = (j, None, None)
            kpoint.append(peak)
            print('index:%d : None' % (j))
    return kpoint

def preprocess(image):
    global device
    device = torch.device('cuda')
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = PIL.Image.fromarray(image)
    image = transforms.functional.to_tensor(image).to(device)
    image.sub_(mean[:, None, None]).div_(std[:, None, None])
    return image[None, ...]

'''
Draw to inference (small) image
'''
def execute(img):
    start = time.time()
    data = preprocess(img)
    cmap, paf = model_trt(data)
    cmap, paf = cmap.detach().cpu(), paf.detach().cpu()
    end = time.time()
    counts, objects, peaks = parse_objects(cmap, paf)   # cmap_threshold=0.15, link_threshold=0.15
    for i in range(counts[0]):
        print("Human index:%d " % (i))
        get_keypoint(objects, i, peaks)
    print("Human count:%d len:%d " % (counts[0], len(counts)))
    print('===== Net FPS :%f =====' % (1 / (end - start)))
    draw_objects(img, counts, objects, peaks)
    return img

'''
Draw to original image
'''
def execute_2(img, org):
    start = time.time()
    data = preprocess(img)
    cmap, paf = model_trt(data)
    cmap, paf = cmap.detach().cpu(), paf.detach().cpu()
    end = time.time()
    counts, objects, peaks = parse_objects(cmap, paf)   # cmap_threshold=0.15, link_threshold=0.15
    for i in range(counts[0]):
        print("Human index:%d " % (i))
        kpoint = get_keypoint(objects, i, peaks)
        org = draw_keypoints(org, kpoint)
    print("Human count:%d len:%d " % (counts[0], len(counts)))
    print('===== Net FPS :%f =====' % (1 / (end - start)))
    return org

parser = argparse.ArgumentParser(description='TensorRT pose estimation run')
parser.add_argument('--image', type=str, default='/home/spypiggy/src/test_images/humans_7.jpg')
parser.add_argument('--model', type=str, default='resnet', help='resnet or densenet')
args = parser.parse_args()

with open('human_pose.json', 'r') as f:
    human_pose = json.load(f)

topology = trt_pose.coco.coco_category_to_topology(human_pose)
num_parts = len(human_pose['keypoints'])
num_links = len(human_pose['skeleton'])

if 'resnet' in args.model:
    print('------ model = resnet--------')
    MODEL_WEIGHTS = 'resnet18_baseline_att_224x224_A_epoch_249.pth'
    OPTIMIZED_MODEL = 'resnet18_baseline_att_224x224_A_epoch_249_trt.pth'
    model = trt_pose.models.resnet18_baseline_att(num_parts, 2 * num_links).cuda().eval()
    WIDTH = 224
    HEIGHT = 224
else:
    print('------ model = densenet--------')
    MODEL_WEIGHTS = 'densenet121_baseline_att_256x256_B_epoch_160.pth'
    OPTIMIZED_MODEL = 'densenet121_baseline_att_256x256_B_epoch_160_trt.pth'
    model = trt_pose.models.densenet121_baseline_att(num_parts, 2 * num_links).cuda().eval()
    WIDTH = 256
    HEIGHT = 256

data = torch.zeros((1, 3, HEIGHT, WIDTH)).cuda()

# convert the PyTorch model to TensorRT once, then reuse the saved engine
if os.path.exists(OPTIMIZED_MODEL) == False:
    model.load_state_dict(torch.load(MODEL_WEIGHTS))
    model_trt = torch2trt.torch2trt(model, [data], fp16_mode=True, max_workspace_size=1 << 25)
    torch.save(model_trt.state_dict(), OPTIMIZED_MODEL)

model_trt = TRTModule()
model_trt.load_state_dict(torch.load(OPTIMIZED_MODEL))

# warm up and benchmark the raw TensorRT model with dummy data
t0 = time.time()
torch.cuda.current_stream().synchronize()
for i in range(50):
    y = model_trt(data)
torch.cuda.current_stream().synchronize()
t1 = time.time()
print(50.0 / (t1 - t0))

mean = torch.Tensor([0.485, 0.456, 0.406]).cuda()
std = torch.Tensor([0.229, 0.224, 0.225]).cuda()
device = torch.device('cuda')

src = cv2.imread(args.image, cv2.IMREAD_COLOR)
pilimg = cv2.cvtColor(src, cv2.COLOR_BGR2RGB)
pilimg = PIL.Image.fromarray(pilimg)
orgimg = pilimg.copy()
image = cv2.resize(src, dsize=(WIDTH, HEIGHT), interpolation=cv2.INTER_AREA)

parse_objects = ParseObjects(topology)
draw_objects = DrawObjects(topology)

for x in range(1):
    img = image.copy()
    # img = execute(img)
    pilimg = execute_2(img, orgimg)
    # cv2.imshow('key', img)

dir, filename = os.path.split(args.image)
name, ext = os.path.splitext(filename)
pilimg.save('/home/spypiggy/src/test_images/result/%s_%s.png' % (args.model, name))
<detect_image.py>

Run the code.

(python) spypiggy@XavierNX:~/src/trt_pose/tasks/human_pose$ python3 detect_image.py --image=/home/spypiggy/src/test_images/image_multi.png --model=densenet
.......
Human count:16 len:1
===== Net FPS :33.802406 =====

As you can see, the FPS value is about 33. Amazing. After loading the model, the first inference is always slow; once the model has warmed up, subsequent inferences run much faster. I will check the correct FPS value again while testing a video file later.
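The scripts already warm the TensorRT engine with 50 dummy inferences right after loading it. If you want to benchmark by hand, the usual pattern is a few warm-up runs followed by timed runs, with explicit CUDA synchronization; a sketch assuming model_trt and data exist as in detect_image.py:

import time
import torch

# warm-up runs absorb one-time CUDA/TensorRT initialization cost
for _ in range(10):
    model_trt(data)
torch.cuda.current_stream().synchronize()

# timed runs
t0 = time.time()
for _ in range(50):
    model_trt(data)
torch.cuda.current_stream().synchronize()
print('FPS: %.1f' % (50.0 / (time.time() - t0)))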


This is the test input image.

This is the output image.

The meaning of the keypoint values in the COCO model is explained in detail in JetsonNano - NVIDIA AI IOT - Human Pose estimation using TensorRT, introduced earlier.
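For reference, the keypoint indices used in draw_keypoints() follow the keypoints list in human_pose.json (COCO order plus an extra neck point). Rather than memorizing the mapping, you can print it from the json file in tasks/human_pose:

import json

with open('human_pose.json', 'r') as f:
    human_pose = json.load(f)

# prints something like: 0 nose, 1 left_eye, 2 right_eye, ..., 17 neck
for idx, name in enumerate(human_pose['keypoints']):
    print(idx, name)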


Get Keypoints from video



import json
import time
import sys
import argparse
import os.path

import cv2
import numpy as np
import torch
import torchvision.transforms as transforms
import PIL.Image, PIL.ImageDraw, PIL.ImageFont

import torch2trt
from torch2trt import TRTModule
import trt_pose.coco
import trt_pose.models
from trt_pose.draw_objects import DrawObjects
from trt_pose.parse_objects import ParseObjects

# (keypoint A, keypoint B, line color) pairs for the skeleton drawn on the original frame
LIMBS = [
    (16, 14, (51, 51, 204)),    # RAnkle -> RKnee
    (14, 12, (51, 51, 204)),    # RKnee  -> RHip
    (12, 11, (51, 51, 204)),    # RHip   -> LHip
    (11, 13, (51, 51, 204)),    # LHip   -> LKnee
    (13, 15, (51, 51, 204)),    # LKnee  -> LAnkle
    (10, 8,  (255, 255, 51)),   # RWrist -> RElbow
    (8, 6,   (255, 255, 51)),   # RElbow -> RShoulder
    (6, 5,   (255, 255, 0)),    # RShoulder -> LShoulder
    (5, 7,   (51, 255, 51)),    # LShoulder -> LElbow
    (7, 9,   (51, 255, 51)),    # LElbow -> LWrist
    (6, 12,  (153, 0, 51)),     # RShoulder -> RHip
    (5, 11,  (153, 0, 51)),     # LShoulder -> LHip
    (0, 2,   (219, 0, 219)),    # Nose -> REye
    (2, 4,   (219, 0, 219)),    # REye -> REar
    (0, 1,   (219, 0, 219)),    # Nose -> LEye
    (1, 3,   (219, 0, 219)),    # LEye -> LEar
    (0, 17,  (255, 255, 0)),    # Nose -> Neck
]

def draw_keypoints(img, key):
    thickness = 5
    w, h = img.size
    draw = PIL.ImageDraw.Draw(img)
    for a, b, color in LIMBS:
        # key[n] is (index, y, x); y and x are None when the keypoint was not detected
        if all(key[a][1:]) and all(key[b][1:]):
            draw.line([round(key[a][2] * w), round(key[a][1] * h),
                       round(key[b][2] * w), round(key[b][1] * h)],
                      width=thickness, fill=color)
    return img

'''
hnum: 0 based human index
kpoint: keypoints (float type range : 0.0 ~ 1.0 ==> later multiply by image width, height)
'''
def get_keypoint(humans, hnum, peaks):
    # check invalid human index
    kpoint = []
    human = humans[0][hnum]
    C = human.shape[0]
    for j in range(C):
        k = int(human[j])
        if k >= 0:
            peak = peaks[0][j][k]   # peak[1]:width, peak[0]:height
            peak = (j, float(peak[0]), float(peak[1]))
            kpoint.append(peak)
            # print('index:%d : success [%5.3f, %5.3f]' % (j, peak[1], peak[2]))
        else:
            peak = (j, None, None)
            kpoint.append(peak)
            # print('index:%d : None %d' % (j, k))
    return kpoint

parser = argparse.ArgumentParser(description='TensorRT pose estimation run')
parser.add_argument('--model', type=str, default='resnet', help='resnet or densenet')
parser.add_argument('--video', type=str, default='/home/spypiggy/src/test_images/video.avi', help='video file name')
args = parser.parse_args()

with open('human_pose.json', 'r') as f:
    human_pose = json.load(f)

topology = trt_pose.coco.coco_category_to_topology(human_pose)
num_parts = len(human_pose['keypoints'])
num_links = len(human_pose['skeleton'])

if 'resnet' in args.model:
    print('------ model = resnet--------')
    MODEL_WEIGHTS = 'resnet18_baseline_att_224x224_A_epoch_249.pth'
    OPTIMIZED_MODEL = 'resnet18_baseline_att_224x224_A_epoch_249_trt.pth'
    model = trt_pose.models.resnet18_baseline_att(num_parts, 2 * num_links).cuda().eval()
    WIDTH = 224
    HEIGHT = 224
else:
    print('------ model = densenet--------')
    MODEL_WEIGHTS = 'densenet121_baseline_att_256x256_B_epoch_160.pth'
    OPTIMIZED_MODEL = 'densenet121_baseline_att_256x256_B_epoch_160_trt.pth'
    model = trt_pose.models.densenet121_baseline_att(num_parts, 2 * num_links).cuda().eval()
    WIDTH = 256
    HEIGHT = 256

data = torch.zeros((1, 3, HEIGHT, WIDTH)).cuda()

# convert the PyTorch model to TensorRT once, then reuse the saved engine
if os.path.exists(OPTIMIZED_MODEL) == False:
    print('-- Converting TensorRT models. This may take several minutes...')
    model.load_state_dict(torch.load(MODEL_WEIGHTS))
    model_trt = torch2trt.torch2trt(model, [data], fp16_mode=True, max_workspace_size=1 << 25)
    torch.save(model_trt.state_dict(), OPTIMIZED_MODEL)

model_trt = TRTModule()
model_trt.load_state_dict(torch.load(OPTIMIZED_MODEL))

# warm up and benchmark the raw TensorRT model with dummy data
t0 = time.time()
torch.cuda.current_stream().synchronize()
for i in range(50):
    y = model_trt(data)
torch.cuda.current_stream().synchronize()
t1 = time.time()
print(50.0 / (t1 - t0))

mean = torch.Tensor([0.485, 0.456, 0.406]).cuda()
std = torch.Tensor([0.229, 0.224, 0.225]).cuda()
device = torch.device('cuda')

def preprocess(image):
    global device
    device = torch.device('cuda')
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = PIL.Image.fromarray(image)
    image = transforms.functional.to_tensor(image).to(device)
    image.sub_(mean[:, None, None]).div_(std[:, None, None])
    return image[None, ...]

'''
Draw keypoint indices on the OpenCV frame (assumes a 640x480 source)
'''
def execute(img, src, t):
    color = (0, 255, 0)
    data = preprocess(img)
    cmap, paf = model_trt(data)
    cmap, paf = cmap.detach().cpu(), paf.detach().cpu()
    counts, objects, peaks = parse_objects(cmap, paf)   # cmap_threshold=0.15, link_threshold=0.15
    fps = 1.0 / (time.time() - t)
    for i in range(counts[0]):
        keypoints = get_keypoint(objects, i, peaks)
        for j in range(len(keypoints)):
            if keypoints[j][1]:
                x = round(keypoints[j][2] * WIDTH * X_compress)
                y = round(keypoints[j][1] * HEIGHT * Y_compress)
                cv2.circle(src, (x, y), 3, color, 2)
                cv2.putText(src, "%d" % int(keypoints[j][0]), (x + 5, y), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 255), 1)
                cv2.circle(src, (x, y), 3, color, 2)
    print("FPS:%f " % (fps))
    # draw_objects(img, counts, objects, peaks)
    cv2.putText(src, "FPS: %f" % (fps), (20, 20), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 1)
    out_video.write(src)

'''
Draw to original image
'''
def execute_2(img, org, count):
    start = time.time()
    data = preprocess(img)
    cmap, paf = model_trt(data)
    cmap, paf = cmap.detach().cpu(), paf.detach().cpu()
    end = time.time()
    counts, objects, peaks = parse_objects(cmap, paf)   # cmap_threshold=0.15, link_threshold=0.15
    for i in range(counts[0]):
        kpoint = get_keypoint(objects, i, peaks)
        org = draw_keypoints(org, kpoint)
    netfps = 1 / (end - start)
    draw = PIL.ImageDraw.Draw(org)
    draw.text((30, 30), "NET FPS:%4.1f" % netfps, font=fnt, fill=(0, 255, 0))
    print("Human count:%d len:%d " % (counts[0], len(counts)))
    print('===== Frmae[%d] Net FPS :%f =====' % (count, netfps))
    return org

cap = cv2.VideoCapture(args.video)
if cap.isOpened() == False:
    print("Video Open Error")
    sys.exit(0)

# read the first frame to get the source resolution for the output writer
ret_val, img = cap.read()
H, W, __ = img.shape
fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
dir, filename = os.path.split(args.video)
name, ext = os.path.splitext(filename)
out_video = cv2.VideoWriter('/home/spypiggy/src/test_images/result/%s_%s.mp4' % (args.model, name), fourcc, cap.get(cv2.CAP_PROP_FPS), (W, H))

X_compress = 640.0 / WIDTH * 1.0
Y_compress = 480.0 / HEIGHT * 1.0

parse_objects = ParseObjects(topology)
draw_objects = DrawObjects(topology)

fontname = '/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc'
fnt = PIL.ImageFont.truetype(fontname, 45)

count = 1
while cap.isOpened():
    ret_val, dst = cap.read()
    if ret_val == False:
        print("Frame Read End")
        break
    img = cv2.resize(dst, dsize=(WIDTH, HEIGHT), interpolation=cv2.INTER_AREA)
    pilimg = cv2.cvtColor(dst, cv2.COLOR_BGR2RGB)
    pilimg = PIL.Image.fromarray(pilimg)
    pilimg = execute_2(img, pilimg, count)
    # convert the PIL RGB result back to BGR before handing it to the OpenCV writer
    array = cv2.cvtColor(np.asarray(pilimg, dtype="uint8"), cv2.COLOR_RGB2BGR)
    out_video.write(array)
    count += 1

cv2.destroyAllWindows()
out_video.release()
cap.release()

<detect_video2.py>

Run the code.

(python) spypiggy@XavierNX:~/src/trt_pose/tasks/human_pose$ python3 detect_video2.py
......
Human count:8 len:1
===== Frmae[200] Net FPS :48.353785 =====
Human count:8 len:1
===== Frmae[201] Net FPS :47.049301 =====
Human count:6 len:1
===== Frmae[202] Net FPS :47.425955 =====
Human count:9 len:1
===== Frmae[203] Net FPS :47.847956 =====
Human count:8 len:1
===== Frmae[204] Net FPS :48.244772 =====
Frame Read End

Outstanding performance of 45 to 80 FPS. This value is the time taken by the network inference alone; it does not include the time to draw the detected keypoints on the image or the time to save the video.
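If you also want the end-to-end frame rate including keypoint drawing and video encoding, a simple frame counter around the whole loop body is enough. A rough sketch of such a helper (not part of the scripts above):

import time

class FPSMeter:
    '''Average FPS over everything done per frame, not just the network call.'''
    def __init__(self):
        self.count = 0
        self.start = time.time()

    def tick(self):
        self.count += 1
        return self.count / (time.time() - self.start)

# usage sketch inside the video loop, after out_video.write(...):
#   print('End-to-end FPS: %.1f' % meter.tick())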
Let's check the accuracy.



The output FPS is 45 to 80 with the resnet model. This is the best pose estimation model I have experienced on the Xavier NX, and the accuracy is also excellent.
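detect_video2.py reads a video file, but switching to a live camera only means changing the capture source; the rest of the loop stays the same. A minimal sketch, assuming a USB webcam at index 0 (a CSI camera would need a GStreamer pipeline string instead):

import cv2

cap = cv2.VideoCapture(0)   # 0 = first USB webcam; use a GStreamer string for CSI cameras
if not cap.isOpened():
    print('Camera Open Error')
else:
    ret, frame = cap.read()
    print(ret, frame.shape if ret else None)
    cap.release()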

Wrapping up

The PyTorch Keypoint R-CNN model (keypointrcnn_resnet50_fpn_coco-fc266e95.pth) with a ResNet-50 backbone is more accurate, but its processing speed is too low to be practical on the Jetson series.
Personally, I think the best pose estimation models currently usable on the Jetson Nano, Xavier NX, etc. are resnet18 and densenet121 accelerated with TensorRT.

The following article introduces how to easily debug trt_pose using VSCode.









