Sunday, August 2, 2020

Xavier NX - NVIDIA AI IOT - Human Pose estimation using TensorRT

Last updated 2021.01.14: updated for VSCode debugging

This article is a re-implementation of what was previously tested on the Jetson Nano (JetsonNano - NVIDIA AI IOT - Human Pose estimation using TensorRT). Personally, the model that has shown the best performance for me on the Jetson series is the one accelerated with TensorRT. This post is based on https://github.com/NVIDIA-AI-IOT/trt_pose.

TensorRT Pose Estimation

This project features multi-instance pose estimation accelerated by NVIDIA TensorRT. It is ideal for applications where low latency is necessary. It includes:
  • Training scripts to train on any keypoint task data in MSCOCO format
  • A collection of models that may be easily optimized with TensorRT using torch2trt
This project can be used easily for the task of human pose estimation, or extended for something new.




Models

Below are models pre-trained on the MSCOCO dataset. The throughput in FPS is shown for each platform.

Model                                Jetson Nano    Jetson Xavier    Weights
resnet18_baseline_att_224x224_A      22             251              download (81MB)
densenet121_baseline_att_256x256_B   12             101              download (84MB)

Prerequisites

The Xavier NX only works with JetPack 4.4 or higher. I documented how to install JetPack 4.4 and PyTorch in Jetson Xavier NX - JetPack 4.4 (production release) headless setup and Jetson Xavier NX - Python virtual environment and ML platforms (TensorFlow, PyTorch) installation. See those posts for details.
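Before going further, it is worth checking that PyTorch inside the virtual environment can actually see the GPU. A minimal check (the printed version will vary with your JetPack/PyTorch combination):

import torch

print(torch.__version__)           # version depends on the wheel you installed
print(torch.cuda.is_available())   # should print True on the Xavier NX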

Install torch2trt

After installing PyTorch, install torch2trt.

spypiggy@XavierNX:~/$ source /home/spypiggy/python/bin/activate
(python) spypiggy@XavierNX:~/$ cd src
(python) spypiggy@XavierNX:~/src$ git clone https://github.com/NVIDIA-AI-IOT/torch2trt
(python) spypiggy@XavierNX:~/src$ cd torch2trt
(python) spypiggy@XavierNX:~/src/torch2trt$ pip3 install tqdm cython pycocotools matplotlib
(python) spypiggy@XavierNX:~/src/torch2trt$ python3 setup.py install

Be careful: if PyTorch is not installed, an error occurs during the torch2trt installation.
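Once the build finishes, a quick sanity check is to convert a small torchvision model, following the usage shown in the torch2trt README. This is only a sketch to confirm the installation, not part of trt_pose; note that it downloads the torchvision resnet18 weights the first time it runs.

import torch
import torchvision
from torch2trt import torch2trt

# build a small model and a dummy input on the GPU
model = torchvision.models.resnet18(pretrained=True).cuda().eval()
x = torch.ones((1, 3, 224, 224)).cuda()

# convert to TensorRT; the outputs of both models should be nearly identical
model_trt = torch2trt(model, [x])
print(torch.max(torch.abs(model(x) - model_trt(x))))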

Installation

Follow these steps. Some packages may already be installed if you have followed my earlier posts.

(python) spypiggy@XavierNX:~/src/torch2trt$ cd ~/src 
(python) spypiggy@XavierNX:~/src$  git clone https://github.com/NVIDIA-AI-IOT/trt_pose
(python) spypiggy@XavierNX:~/src$ cd trt_pose/
(python) spypiggy@XavierNX:~/src/trt_pose$ python3 setup.py install
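If the install succeeded, trt_pose should be importable and able to build the model architectures used below. A quick check, run from the cloned trt_pose directory so that tasks/human_pose/human_pose.json is available (building the model may download the torchvision backbone weights the first time):

import json
import trt_pose.coco
import trt_pose.models

# the part/link counts come from human_pose.json, just like in the scripts below
with open('tasks/human_pose/human_pose.json', 'r') as f:
    human_pose = json.load(f)

num_parts = len(human_pose['keypoints'])
num_links = len(human_pose['skeleton'])
print(num_parts, num_links)

# build (but do not optimize yet) the resnet18 backbone used later
model = trt_pose.models.resnet18_baseline_att(num_parts, 2 * num_links)
print(type(model).__name__)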


Download models

Click the links above and download the models with a web browser; the files are hosted on Google Drive.
After the download is complete, move the files to the "tasks/human_pose" directory.

If you want to download the files directly from Google Drive on the console, use the following commands.
cd tasks/human_pose
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1XYDdCUdiF2xxx4rznmLb62SdOUZuoNbd' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1XYDdCUdiF2xxx4rznmLb62SdOUZuoNbd" -O resnet18_baseline_att_224x224_A_epoch_249.pth
wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=13FkJkx7evQ1WwP54UmdiDXWyFMY1OxDU' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=13FkJkx7evQ1WwP54UmdiDXWyFMY1OxDU" -O densenet121_baseline_att_256x256_B_epoch_160.pth

Now it looks like this.

(python) spypiggy@XavierNX:~/src/trt_pose/tasks/human_pose$ ls -al
total 233292
drwxrwxr-x 3 spypiggy spypiggy     4096 Aug  1 13:09 .
drwxrwxr-x 3 spypiggy spypiggy     4096 Aug  1 08:37 ..
-rw-rw-r-- 1 spypiggy spypiggy 87573944 Aug  1 08:58 densenet121_baseline_att_256x256_B_epoch_160.pth
-rw-rw-r-- 1 spypiggy spypiggy      182 Aug  1 08:37 download_coco.sh
-rw-rw-r-- 1 spypiggy spypiggy    12027 Aug  1 08:37 eval.ipynb
drwxrwxr-x 2 spypiggy spypiggy     4096 Aug  1 08:37 experiments
-rw-rw-r-- 1 spypiggy spypiggy      510 Aug  1 08:37 human_pose.json
-rw-rw-r-- 1 spypiggy spypiggy    10177 Aug  1 08:37 live_demo.ipynb
-rw-rw-r-- 1 spypiggy spypiggy     2521 Aug  1 08:37 preprocess_coco_person.py
-rw-rw-r-- 1 spypiggy spypiggy 85195117 Aug  1 08:53 resnet18_baseline_att_224x224_A_epoch_249.pth

Get Keypoints From Image

The repo's example code is provided as Jupyter Notebooks, so I adapted some of it into a standalone Python script. To make the keypoints easier to reuse later, I added code that prints each keypoint's index and the location it points to. I am also more used to OpenCV, so I changed the image handling to use OpenCV.

At this point, the script supports the resnet18 and densenet121 models, so use the --model option to choose the one you want: "--model=resnet" for resnet18_baseline_att_224x224_A, or "--model=densenet" for densenet121_baseline_att_256x256_B.
The other option is "--image", which selects the input image.



import json
import time
import argparse
import os.path

import cv2
import torch
import torchvision.transforms as transforms
import PIL.Image, PIL.ImageDraw

import torch2trt
from torch2trt import TRTModule
import trt_pose.coco
import trt_pose.models
from trt_pose.draw_objects import DrawObjects
from trt_pose.parse_objects import ParseObjects

# (keypoint A, keypoint B, line color) pairs for the skeleton drawn on the original image
LIMBS = [
    (16, 14, (51, 51, 204)),    # RAnkle -> RKnee
    (14, 12, (51, 51, 204)),    # RKnee  -> RHip
    (12, 11, (51, 51, 204)),    # RHip   -> LHip
    (11, 13, (51, 51, 204)),    # LHip   -> LKnee
    (13, 15, (51, 51, 204)),    # LKnee  -> LAnkle
    (10, 8,  (255, 255, 51)),   # RWrist -> RElbow
    (8, 6,   (255, 255, 51)),   # RElbow -> RShoulder
    (6, 5,   (255, 255, 0)),    # RShoulder -> LShoulder
    (5, 7,   (51, 255, 51)),    # LShoulder -> LElbow
    (7, 9,   (51, 255, 51)),    # LElbow -> LWrist
    (6, 12,  (153, 0, 51)),     # RShoulder -> RHip
    (5, 11,  (153, 0, 51)),     # LShoulder -> LHip
    (0, 2,   (219, 0, 219)),    # Nose -> REye
    (2, 4,   (219, 0, 219)),    # REye -> REar
    (0, 1,   (219, 0, 219)),    # Nose -> LEye
    (1, 3,   (219, 0, 219)),    # LEye -> LEar
    (0, 17,  (255, 255, 0)),    # Nose -> Neck
]

'''
img is PIL format
'''
def draw_keypoints(img, key):
    thickness = 5
    w, h = img.size
    draw = PIL.ImageDraw.Draw(img)
    for a, b, color in LIMBS:
        # key[n] is (index, y, x); y and x are None when the keypoint was not detected
        if all(key[a][1:]) and all(key[b][1:]):
            draw.line([int(key[a][2] * w), int(key[a][1] * h),
                       int(key[b][2] * w), int(key[b][1] * h)],
                      width=thickness, fill=color)
    return img

'''
hnum: 0 based human index
kpoint: index + keypoints (float type range : 0.0 ~ 1.0 ==> later multiply by image width, height)
'''
def get_keypoint(humans, hnum, peaks):
    # check invalid human index
    kpoint = []
    human = humans[0][hnum]
    C = human.shape[0]
    for j in range(C):
        k = int(human[j])
        if k >= 0:
            peak = peaks[0][j][k]   # peak[1]:width, peak[0]:height
            peak = (j, float(peak[0]), float(peak[1]))
            kpoint.append(peak)
            print('index:%d : success [%5.3f, %5.3f]' % (j, peak[1], peak[2]))
        else:
            peak = (j, None, None)
            kpoint.append(peak)
            print('index:%d : None' % (j))
    return kpoint

def preprocess(image):
    global device
    device = torch.device('cuda')
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = PIL.Image.fromarray(image)
    image = transforms.functional.to_tensor(image).to(device)
    image.sub_(mean[:, None, None]).div_(std[:, None, None])
    return image[None, ...]

'''
Draw to inference (small) image
'''
def execute(img):
    start = time.time()
    data = preprocess(img)
    cmap, paf = model_trt(data)
    cmap, paf = cmap.detach().cpu(), paf.detach().cpu()
    end = time.time()
    counts, objects, peaks = parse_objects(cmap, paf)   # cmap_threshold=0.15, link_threshold=0.15
    for i in range(counts[0]):
        print("Human index:%d " % (i))
        get_keypoint(objects, i, peaks)
    print("Human count:%d len:%d " % (counts[0], len(counts)))
    print('===== Net FPS :%f =====' % (1 / (end - start)))
    draw_objects(img, counts, objects, peaks)
    return img

'''
Draw to original image
'''
def execute_2(img, org):
    start = time.time()
    data = preprocess(img)
    cmap, paf = model_trt(data)
    cmap, paf = cmap.detach().cpu(), paf.detach().cpu()
    end = time.time()
    counts, objects, peaks = parse_objects(cmap, paf)   # cmap_threshold=0.15, link_threshold=0.15
    for i in range(counts[0]):
        print("Human index:%d " % (i))
        kpoint = get_keypoint(objects, i, peaks)
        org = draw_keypoints(org, kpoint)
    print("Human count:%d len:%d " % (counts[0], len(counts)))
    print('===== Net FPS :%f =====' % (1 / (end - start)))
    return org

parser = argparse.ArgumentParser(description='TensorRT pose estimation run')
parser.add_argument('--image', type=str, default='/home/spypiggy/src/test_images/humans_7.jpg')
parser.add_argument('--model', type=str, default='resnet', help='resnet or densenet')
args = parser.parse_args()

with open('human_pose.json', 'r') as f:
    human_pose = json.load(f)

topology = trt_pose.coco.coco_category_to_topology(human_pose)
num_parts = len(human_pose['keypoints'])
num_links = len(human_pose['skeleton'])

if 'resnet' in args.model:
    print('------ model = resnet--------')
    MODEL_WEIGHTS = 'resnet18_baseline_att_224x224_A_epoch_249.pth'
    OPTIMIZED_MODEL = 'resnet18_baseline_att_224x224_A_epoch_249_trt.pth'
    model = trt_pose.models.resnet18_baseline_att(num_parts, 2 * num_links).cuda().eval()
    WIDTH = 224
    HEIGHT = 224
else:
    print('------ model = densenet--------')
    MODEL_WEIGHTS = 'densenet121_baseline_att_256x256_B_epoch_160.pth'
    OPTIMIZED_MODEL = 'densenet121_baseline_att_256x256_B_epoch_160_trt.pth'
    model = trt_pose.models.densenet121_baseline_att(num_parts, 2 * num_links).cuda().eval()
    WIDTH = 256
    HEIGHT = 256

data = torch.zeros((1, 3, HEIGHT, WIDTH)).cuda()

# convert the PyTorch model to TensorRT once, then reuse the saved engine
if os.path.exists(OPTIMIZED_MODEL) == False:
    model.load_state_dict(torch.load(MODEL_WEIGHTS))
    model_trt = torch2trt.torch2trt(model, [data], fp16_mode=True, max_workspace_size=1 << 25)
    torch.save(model_trt.state_dict(), OPTIMIZED_MODEL)

model_trt = TRTModule()
model_trt.load_state_dict(torch.load(OPTIMIZED_MODEL))

# warm up and benchmark the raw TensorRT model with dummy data
t0 = time.time()
torch.cuda.current_stream().synchronize()
for i in range(50):
    y = model_trt(data)
torch.cuda.current_stream().synchronize()
t1 = time.time()
print(50.0 / (t1 - t0))

mean = torch.Tensor([0.485, 0.456, 0.406]).cuda()
std = torch.Tensor([0.229, 0.224, 0.225]).cuda()
device = torch.device('cuda')

src = cv2.imread(args.image, cv2.IMREAD_COLOR)
pilimg = cv2.cvtColor(src, cv2.COLOR_BGR2RGB)
pilimg = PIL.Image.fromarray(pilimg)
orgimg = pilimg.copy()
image = cv2.resize(src, dsize=(WIDTH, HEIGHT), interpolation=cv2.INTER_AREA)

parse_objects = ParseObjects(topology)
draw_objects = DrawObjects(topology)

for x in range(1):
    img = image.copy()
    # img = execute(img)
    pilimg = execute_2(img, orgimg)
    # cv2.imshow('key', img)

dir, filename = os.path.split(args.image)
name, ext = os.path.splitext(filename)
pilimg.save('/home/spypiggy/src/test_images/result/%s_%s.png' % (args.model, name))
<detect_image.py>

Run the code.

(python) spypiggy@XavierNX:~/src/trt_pose/tasks/human_pose$ python3 detect_image.py --image=/home/spypiggy/src/test_images/image_multi.png --model=densenet
.......
Human count:16 len:1
===== Net FPS :33.802406 =====

As you can see, the FPS value is about 33. Amazing. After loading the model, the first inference is always slow; once the model has warmed up, subsequent inferences run much faster. I will check the correct FPS value again while testing a video file later.
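The scripts already warm the TensorRT engine with 50 dummy inferences right after loading it. If you want to benchmark by hand, the usual pattern is a few warm-up runs followed by timed runs, with explicit CUDA synchronization; a sketch assuming model_trt and data exist as in detect_image.py:

import time
import torch

# warm-up runs absorb one-time CUDA/TensorRT initialization cost
for _ in range(10):
    model_trt(data)
torch.cuda.current_stream().synchronize()

# timed runs
t0 = time.time()
for _ in range(50):
    model_trt(data)
torch.cuda.current_stream().synchronize()
print('FPS: %.1f' % (50.0 / (time.time() - t0)))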


This is the test input image.

This is the output image.

The meaning of the keypoint values in the COCO model is explained in detail in JetsonNano - NVIDIA AI IOT - Human Pose estimation using TensorRT, introduced earlier.
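For reference, the keypoint indices used in draw_keypoints() follow the keypoints list in human_pose.json (COCO order plus an extra neck point). Rather than memorizing the mapping, you can print it from the json file in tasks/human_pose:

import json

with open('human_pose.json', 'r') as f:
    human_pose = json.load(f)

# prints something like: 0 nose, 1 left_eye, 2 right_eye, ..., 17 neck
for idx, name in enumerate(human_pose['keypoints']):
    print(idx, name)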


Get Keypoints from video



import json
import time
import sys
import argparse
import os.path

import cv2
import numpy as np
import torch
import torchvision.transforms as transforms
import PIL.Image, PIL.ImageDraw, PIL.ImageFont

import torch2trt
from torch2trt import TRTModule
import trt_pose.coco
import trt_pose.models
from trt_pose.draw_objects import DrawObjects
from trt_pose.parse_objects import ParseObjects

# (keypoint A, keypoint B, line color) pairs for the skeleton drawn on the original frame
LIMBS = [
    (16, 14, (51, 51, 204)),    # RAnkle -> RKnee
    (14, 12, (51, 51, 204)),    # RKnee  -> RHip
    (12, 11, (51, 51, 204)),    # RHip   -> LHip
    (11, 13, (51, 51, 204)),    # LHip   -> LKnee
    (13, 15, (51, 51, 204)),    # LKnee  -> LAnkle
    (10, 8,  (255, 255, 51)),   # RWrist -> RElbow
    (8, 6,   (255, 255, 51)),   # RElbow -> RShoulder
    (6, 5,   (255, 255, 0)),    # RShoulder -> LShoulder
    (5, 7,   (51, 255, 51)),    # LShoulder -> LElbow
    (7, 9,   (51, 255, 51)),    # LElbow -> LWrist
    (6, 12,  (153, 0, 51)),     # RShoulder -> RHip
    (5, 11,  (153, 0, 51)),     # LShoulder -> LHip
    (0, 2,   (219, 0, 219)),    # Nose -> REye
    (2, 4,   (219, 0, 219)),    # REye -> REar
    (0, 1,   (219, 0, 219)),    # Nose -> LEye
    (1, 3,   (219, 0, 219)),    # LEye -> LEar
    (0, 17,  (255, 255, 0)),    # Nose -> Neck
]

def draw_keypoints(img, key):
    thickness = 5
    w, h = img.size
    draw = PIL.ImageDraw.Draw(img)
    for a, b, color in LIMBS:
        # key[n] is (index, y, x); y and x are None when the keypoint was not detected
        if all(key[a][1:]) and all(key[b][1:]):
            draw.line([round(key[a][2] * w), round(key[a][1] * h),
                       round(key[b][2] * w), round(key[b][1] * h)],
                      width=thickness, fill=color)
    return img

'''
hnum: 0 based human index
kpoint: keypoints (float type range : 0.0 ~ 1.0 ==> later multiply by image width, height)
'''
def get_keypoint(humans, hnum, peaks):
    # check invalid human index
    kpoint = []
    human = humans[0][hnum]
    C = human.shape[0]
    for j in range(C):
        k = int(human[j])
        if k >= 0:
            peak = peaks[0][j][k]   # peak[1]:width, peak[0]:height
            peak = (j, float(peak[0]), float(peak[1]))
            kpoint.append(peak)
            # print('index:%d : success [%5.3f, %5.3f]' % (j, peak[1], peak[2]))
        else:
            peak = (j, None, None)
            kpoint.append(peak)
            # print('index:%d : None %d' % (j, k))
    return kpoint

parser = argparse.ArgumentParser(description='TensorRT pose estimation run')
parser.add_argument('--model', type=str, default='resnet', help='resnet or densenet')
parser.add_argument('--video', type=str, default='/home/spypiggy/src/test_images/video.avi', help='video file name')
args = parser.parse_args()

with open('human_pose.json', 'r') as f:
    human_pose = json.load(f)

topology = trt_pose.coco.coco_category_to_topology(human_pose)
num_parts = len(human_pose['keypoints'])
num_links = len(human_pose['skeleton'])

if 'resnet' in args.model:
    print('------ model = resnet--------')
    MODEL_WEIGHTS = 'resnet18_baseline_att_224x224_A_epoch_249.pth'
    OPTIMIZED_MODEL = 'resnet18_baseline_att_224x224_A_epoch_249_trt.pth'
    model = trt_pose.models.resnet18_baseline_att(num_parts, 2 * num_links).cuda().eval()
    WIDTH = 224
    HEIGHT = 224
else:
    print('------ model = densenet--------')
    MODEL_WEIGHTS = 'densenet121_baseline_att_256x256_B_epoch_160.pth'
    OPTIMIZED_MODEL = 'densenet121_baseline_att_256x256_B_epoch_160_trt.pth'
    model = trt_pose.models.densenet121_baseline_att(num_parts, 2 * num_links).cuda().eval()
    WIDTH = 256
    HEIGHT = 256

data = torch.zeros((1, 3, HEIGHT, WIDTH)).cuda()

# convert the PyTorch model to TensorRT once, then reuse the saved engine
if os.path.exists(OPTIMIZED_MODEL) == False:
    print('-- Converting TensorRT models. This may take several minutes...')
    model.load_state_dict(torch.load(MODEL_WEIGHTS))
    model_trt = torch2trt.torch2trt(model, [data], fp16_mode=True, max_workspace_size=1 << 25)
    torch.save(model_trt.state_dict(), OPTIMIZED_MODEL)

model_trt = TRTModule()
model_trt.load_state_dict(torch.load(OPTIMIZED_MODEL))

# warm up and benchmark the raw TensorRT model with dummy data
t0 = time.time()
torch.cuda.current_stream().synchronize()
for i in range(50):
    y = model_trt(data)
torch.cuda.current_stream().synchronize()
t1 = time.time()
print(50.0 / (t1 - t0))

mean = torch.Tensor([0.485, 0.456, 0.406]).cuda()
std = torch.Tensor([0.229, 0.224, 0.225]).cuda()
device = torch.device('cuda')

def preprocess(image):
    global device
    device = torch.device('cuda')
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = PIL.Image.fromarray(image)
    image = transforms.functional.to_tensor(image).to(device)
    image.sub_(mean[:, None, None]).div_(std[:, None, None])
    return image[None, ...]

'''
Draw keypoint indices on the OpenCV frame (assumes a 640x480 source)
'''
def execute(img, src, t):
    color = (0, 255, 0)
    data = preprocess(img)
    cmap, paf = model_trt(data)
    cmap, paf = cmap.detach().cpu(), paf.detach().cpu()
    counts, objects, peaks = parse_objects(cmap, paf)   # cmap_threshold=0.15, link_threshold=0.15
    fps = 1.0 / (time.time() - t)
    for i in range(counts[0]):
        keypoints = get_keypoint(objects, i, peaks)
        for j in range(len(keypoints)):
            if keypoints[j][1]:
                x = round(keypoints[j][2] * WIDTH * X_compress)
                y = round(keypoints[j][1] * HEIGHT * Y_compress)
                cv2.circle(src, (x, y), 3, color, 2)
                cv2.putText(src, "%d" % int(keypoints[j][0]), (x + 5, y), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 255), 1)
                cv2.circle(src, (x, y), 3, color, 2)
    print("FPS:%f " % (fps))
    # draw_objects(img, counts, objects, peaks)
    cv2.putText(src, "FPS: %f" % (fps), (20, 20), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 1)
    out_video.write(src)

'''
Draw to original image
'''
def execute_2(img, org, count):
    start = time.time()
    data = preprocess(img)
    cmap, paf = model_trt(data)
    cmap, paf = cmap.detach().cpu(), paf.detach().cpu()
    end = time.time()
    counts, objects, peaks = parse_objects(cmap, paf)   # cmap_threshold=0.15, link_threshold=0.15
    for i in range(counts[0]):
        kpoint = get_keypoint(objects, i, peaks)
        org = draw_keypoints(org, kpoint)
    netfps = 1 / (end - start)
    draw = PIL.ImageDraw.Draw(org)
    draw.text((30, 30), "NET FPS:%4.1f" % netfps, font=fnt, fill=(0, 255, 0))
    print("Human count:%d len:%d " % (counts[0], len(counts)))
    print('===== Frmae[%d] Net FPS :%f =====' % (count, netfps))
    return org

cap = cv2.VideoCapture(args.video)
if cap.isOpened() == False:
    print("Video Open Error")
    sys.exit(0)

# read the first frame to get the source resolution for the output writer
ret_val, img = cap.read()
H, W, __ = img.shape
fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
dir, filename = os.path.split(args.video)
name, ext = os.path.splitext(filename)
out_video = cv2.VideoWriter('/home/spypiggy/src/test_images/result/%s_%s.mp4' % (args.model, name), fourcc, cap.get(cv2.CAP_PROP_FPS), (W, H))

X_compress = 640.0 / WIDTH * 1.0
Y_compress = 480.0 / HEIGHT * 1.0

parse_objects = ParseObjects(topology)
draw_objects = DrawObjects(topology)

fontname = '/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc'
fnt = PIL.ImageFont.truetype(fontname, 45)

count = 1
while cap.isOpened():
    ret_val, dst = cap.read()
    if ret_val == False:
        print("Frame Read End")
        break
    img = cv2.resize(dst, dsize=(WIDTH, HEIGHT), interpolation=cv2.INTER_AREA)
    pilimg = cv2.cvtColor(dst, cv2.COLOR_BGR2RGB)
    pilimg = PIL.Image.fromarray(pilimg)
    pilimg = execute_2(img, pilimg, count)
    # convert the PIL RGB result back to BGR before handing it to the OpenCV writer
    array = cv2.cvtColor(np.asarray(pilimg, dtype="uint8"), cv2.COLOR_RGB2BGR)
    out_video.write(array)
    count += 1

cv2.destroyAllWindows()
out_video.release()
cap.release()

<detect_video2.py>

Run the code.

(python) spypiggy@XavierNX:~/src/trt_pose/tasks/human_pose$ python3 detect_video2.py
......
Human count:8 len:1
===== Frmae[200] Net FPS :48.353785 =====
Human count:8 len:1
===== Frmae[201] Net FPS :47.049301 =====
Human count:6 len:1
===== Frmae[202] Net FPS :47.425955 =====
Human count:9 len:1
===== Frmae[203] Net FPS :47.847956 =====
Human count:8 len:1
===== Frmae[204] Net FPS :48.244772 =====
Frame Read End

Outstanding performance of 45 to 80 FPS. This value is the time taken by the network inference alone; it does not include the time to draw the detected keypoints on the image or the time to save the video.
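If you also want the end-to-end frame rate including keypoint drawing and video encoding, a simple frame counter around the whole loop body is enough. A rough sketch of such a helper (not part of the scripts above):

import time

class FPSMeter:
    '''Average FPS over everything done per frame, not just the network call.'''
    def __init__(self):
        self.count = 0
        self.start = time.time()

    def tick(self):
        self.count += 1
        return self.count / (time.time() - self.start)

# usage sketch inside the video loop, after out_video.write(...):
#   print('End-to-end FPS: %.1f' % meter.tick())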
Let's check the accuracy.



The output FPS is 45 to 80 with the resnet model. This is the best pose estimation model I have experienced on the Xavier NX, and the accuracy is also excellent.
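detect_video2.py reads a video file, but switching to a live camera only means changing the capture source; the rest of the loop stays the same. A minimal sketch, assuming a USB webcam at index 0 (a CSI camera would need a GStreamer pipeline string instead):

import cv2

cap = cv2.VideoCapture(0)   # 0 = first USB webcam; use a GStreamer string for CSI cameras
if not cap.isOpened():
    print('Camera Open Error')
else:
    ret, frame = cap.read()
    print(ret, frame.shape if ret else None)
    cap.release()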

Wrapping up

The PyTorch Keypoint R-CNN model (keypointrcnn_resnet50_fpn_coco-fc266e95.pth) with a ResNet-50 backbone is more accurate, but its processing speed is too low to be practical on the Jetson series.
Personally, I think the best pose estimation models currently usable on the Jetson Nano, Xavier NX, etc. are resnet18 and densenet121 accelerated with TensorRT.

The following article introduces how to easily debug trt_pose using VSCode.









