Last updated 2021.08.24: added an introduction to a newer article.
This article was written a long time ago, so some links may be broken. If you want to implement pose estimation with PyTorch 1.9 or higher on JetPack 4.5 or 4.6, I recommend the following link instead.
I used a Jetson Nano with the official Ubuntu 18.04 image, logged in as root.
In my previous article, I explained pose estimation using TensorFlow and OpenPose.
In this article I'm going to use PyTorch for pose estimation.
First I'll examine the torchvision package.
The torchvision package consists of popular datasets, model architectures, and common image transformations for computer vision. Naturally, it requires the PyTorch framework.
Currently (2019.10), torchvision 0.4.x requires PyTorch 1.2 or higher.
The torchvision datasets include MNIST, CIFAR, COCO, and many more. You can find the full list of datasets here.
Torchvision also supports many models such as AlexNet, ResNet, Inception V3, GoogLeNet, MobileNet V2, and more.
For pose estimation, torchvision provides the "Keypoint R-CNN ResNet-50 FPN" model.
You can find a detailed explanation here.
This table shows how much memory the models need. "Keypoint R-CNN ResNet-50 FPN" needs 6.8 GB. This value far exceeds the Jetson Nano's 4 GB of memory. But let's try it out and see whether it works.
| Network | train time (s / it) | test time (s / it) | memory (GB) |
|---|---|---|---|
| Faster R-CNN ResNet-50 FPN | 0.2288 | 0.0590 | 5.2 |
| Mask R-CNN ResNet-50 FPN | 0.2728 | 0.0903 | 5.4 |
| Keypoint R-CNN ResNet-50 FPN | 0.3789 | 0.1242 | 6.8 |
Prerequisites
Before you build PyTorch and torchvision, you must install these packages first.
apt-get install libjpeg-dev zlib1g-dev
Installation (JetPack 4.3)
Be careful: These packages are upgraded from time to time, so you should check the site first and find the latest version to install. PyTorch versions under 1.3 have a problem with CUDA (PyTorch issue #8103), so I strongly recommend that you use version 1.3 or higher. If you are using JetPack 4.4, skip to the next section.
Before installing PyTorch 1.3, visit this site to check the latest PyTorch version. Before installing torchvision 0.4.2, visit this site to check the latest torchvision version.
cd /usr/local/src
# First install torch 1.3, numpy 1.16.5
wget https://nvidia.box.com/shared/static/phqe92v26cbhqjohwtvxorrwnmrnfx1o.whl -O torch-1.3.0-cp36-cp36m-linux_aarch64.whl
pip3 install numpy torch-1.3.0-cp36-cp36m-linux_aarch64.whl

# Next install torchvision 0.4.2
git clone -b v0.4.2 https://github.com/pytorch/vision torchvision
cd torchvision
python3 setup.py install
Be careful: If you encounter errors about numpy, remove numpy and reinstall it with version 1.16.5 (pip3 install numpy==1.16.5).
apt-get remove python3-numpy
pip3 install numpy==1.16.5
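After reinstalling, it is worth confirming which numpy version Python actually picks up. A quick sanity check (nothing here is specific to this article):
python3 -c "import numpy; print(numpy.__version__)"   # should print 1.16.5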
Let's check whether the installation is correct.
If you see a screen like this, the installation was successful.
root@spypiggy-desktop:/usr/local/src/study/torchvision_walkthrough# python3
Python 3.6.8 (default, Aug 20 2019, 17:12:48)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import torchvision
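While you are in the interpreter, you can also confirm the versions and CUDA support using standard attributes (a small extra check, not part of the original session; the outputs shown are what I would expect on this setup):
>>> print(torch.__version__, torchvision.__version__)
1.3.0 0.4.2
>>> torch.cuda.is_available()
True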
Installation (JetPack 4.4)
In my other article, Jetson Nano-JetPack 4.4 (production release) and Pytorch 1.6.0 installation, I explained how to install PyTorch and torchvision.
Installing Sample Codes
Now that we have finished installing PyTorch and torchvision, it's time to download the sample Python code. I'll use the code from https://github.com/kairess/torchvision_walkthrough.git.
cd /usr/local/src
git clone https://github.com/kairess/torchvision_walkthrough.git
cd /usr/local/src/torchvision_walkthrough
Now you can find several sample files to test; some of them are Jupyter notebook files. The author of this code uses a MacBook, so the samples do not take the GPU (CUDA) into account. You can find my GPU-aware versions at https://github.com/raspberry-pi-maker/NVIDIA-Jetson/tree/master/tf-pose-estimation. Using CUDA in PyTorch is about 10 times faster!
Download models
We will use the keypointrcnn_resnet50_fpn model.
In PyTorch, this model can be accessed by models.detection.keypointrcnn_resnet50_fpn.
For example, if you load the model as in the code below,
model = models.detection.keypointrcnn_resnet50_fpn(pretrained=True).eval()
PyTorch automatically stores the model in the current user's cache directory. The storage path is as follows:
~/.cache/torch/hub/checkpoints/keypointrcnn_resnet50_fpn_coco-fc266e95.pth
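As an aside, if you only want to relocate this cache (rather than loading from an explicit path, which is shown next), PyTorch honors the standard TORCH_HOME environment variable. A minimal sketch, where the directory name and script name are my own placeholders:
export TORCH_HOME=/usr/local/src/torch_models
# The pretrained checkpoint will now be cached under $TORCH_HOME
# (e.g. $TORCH_HOME/hub/checkpoints/ on recent PyTorch versions) instead of ~/.cache/torch.
python3 your_script.py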
If you want to save the model in a specific directory in advance instead of using the cache directory, you can download the model there and change the code as follows.
First, download the model to that directory.
wget http://download.pytorch.org/models/keypointrcnn_resnet50_fpn_coco-fc266e95.pth -O "filename"
Then load the model from the local system.
model = models.detection.keypointrcnn_resnet50_fpn(pretrained=False).eval()
model.load_state_dict(torch.load('/home/spypiggy/src/torchvision_walkthrough/models/keypointrcnn_resnet50_fpn_coco-fc266e95.pth'))
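One generic PyTorch detail worth knowing here: torch.load maps tensors back to the device they were saved from, so if you ever load this checkpoint on a CPU-only machine you can pass the standard map_location argument. A sketch (the path is illustrative, not from the sample code):
state_dict = torch.load('models/keypointrcnn_resnet50_fpn_coco-fc266e95.pth', map_location='cpu')
model.load_state_dict(state_dict)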
Keypoint detection: comparing performance with and without CUDA
import torch
import torchvision
from torchvision import models
import torchvision.transforms as T
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from matplotlib.path import Path
import matplotlib.patches as patches
import argparse
import sys, time

IMG_SIZE = 480
THRESHOLD = 0.95

parser = argparse.ArgumentParser(description="Keypoint detection. - Pytorch")
parser.add_argument("--cuda", action="store_true")
args = parser.parse_args()

if torch.cuda.is_available():
    print('pytorch:%s GPU support' % torch.__version__)
else:
    print('pytorch:%s GPU Not support ==> Error:Jetson should support cuda' % torch.__version__)
    sys.exit()

print('torchvision', torchvision.__version__)

model = models.detection.keypointrcnn_resnet50_fpn(pretrained=True).eval()
if args.cuda:
    model = model.cuda()

#img = Image.open('imgs/07.jpg')
img = Image.open('imgs/apink1.jpg')
img = img.resize((IMG_SIZE, int(img.height * IMG_SIZE / img.width)))

plt.figure(figsize=(16, 16))
plt.imshow(img)

trf = T.Compose([T.ToTensor()])
input_img = trf(img)
print(input_img.shape)
if args.cuda:
    input_img = input_img.cuda()

fps_time = time.perf_counter()
out = model([input_img])[0]
print(out.keys())

codes = [Path.MOVETO, Path.LINETO, Path.LINETO]

fig, ax = plt.subplots(1, figsize=(16, 16))
ax.imshow(img)

for box, score, keypoints in zip(out['boxes'], out['scores'], out['keypoints']):
    if args.cuda:
        score = score.cpu().detach().numpy()
    else:
        score = score.detach().numpy()
    if score < THRESHOLD:
        continue
    if args.cuda:
        box = box.to(torch.int16).cpu().numpy()
        keypoints = keypoints.to(torch.int16).cpu().numpy()[:, :2]
    else:
        box = box.detach().numpy()
        keypoints = keypoints.detach().numpy()[:, :2]

    rect = patches.Rectangle((box[0], box[1]), box[2]-box[0], box[3]-box[1], linewidth=2, edgecolor='b', facecolor='none')
    ax.add_patch(rect)

    # 17 keypoints
    for k in keypoints:
        circle = patches.Circle((k[0], k[1]), radius=2, facecolor='r')
        ax.add_patch(circle)

    # draw path
    # left arm
    path = Path(keypoints[5:10:2], codes)
    line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
    ax.add_patch(line)
    # right arm
    path = Path(keypoints[6:11:2], codes)
    line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
    ax.add_patch(line)
    # left leg
    path = Path(keypoints[11:16:2], codes)
    line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
    ax.add_patch(line)
    # right leg
    path = Path(keypoints[12:17:2], codes)
    line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
    ax.add_patch(line)

plt.savefig('result.jpg')

fps = 1.0 / (time.perf_counter() - fps_time)
if args.cuda:
    print('FPS(cuda support):%f' % fps)
else:
    print('FPS(cuda not support):%f' % fps)
<keypoints2.py>
Let's run the above code without the --cuda option.
root@spypiggy-desktop:/usr/local/src/study/torchvision_walkthrough# python3 keypoints2.py
pytorch:1.3.0 GPU support
torchvision 0.4.2
torch.Size([3, 720, 480])
dict_keys(['boxes', 'labels', 'scores', 'keypoints', 'keypoints_scores'])
FPS(cuda not support):0.002254
Now let's run the above code with the --cuda option.
root@spypiggy-desktop:/usr/local/src/study/torchvision_walkthrough# python3 keypoints2.py --cuda
pytorch:1.3.0 GPU support
torchvision 0.4.2
torch.Size([3, 720, 480])
dict_keys(['boxes', 'labels', 'scores', 'keypoints', 'keypoints_scores'])
FPS(cuda support):0.070780
As you can see from the results above, using CUDA achieves a speed improvement of more than 10 times.
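One caveat about these FPS numbers: CUDA calls in PyTorch are asynchronous, so timing the forward pass with time.perf_counter() alone may not measure only the GPU work. A minimal sketch of a more careful measurement, assuming the model and input_img variables from keypoints2.py and using the standard torch.cuda.synchronize() call:
import time
import torch

if torch.cuda.is_available():
    torch.cuda.synchronize()    # finish pending GPU work before starting the clock
start = time.perf_counter()
out = model([input_img])[0]     # the same forward pass as in keypoints2.py
if torch.cuda.is_available():
    torch.cuda.synchronize()    # wait for the GPU to finish before stopping the clock
print('inference time: %.3f s' % (time.perf_counter() - start))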
Under the hood
Now let's dig deeper.
GPU support and model loading
This code checks whether CUDA is available, then loads the model. The first run may take a few seconds to download the model from the server. The following two snippets are equivalent.
if torch.cuda.is_available():
    device = torch.device('cuda')
    model = models.detection.keypointrcnn_resnet50_fpn(pretrained=True)
    model = model.to(device)
    model.eval()
else:
    device = torch.device('cpu')
    model = models.detection.keypointrcnn_resnet50_fpn(pretrained=True).eval()
model = models.detection.keypointrcnn_resnet50_fpn(pretrained=True).eval()
if torch.cuda.is_available():
    model.cuda()
Torchvision keypoint number and human parts
Torchvision's keypoint numbering is different from OpenPose's and the TensorFlow models'. The values are as follows.
COCO_PERSON_KEYPOINT_NAMES = [
    'nose', 'left_eye', 'right_eye', 'left_ear', 'right_ear',
    'left_shoulder', 'right_shoulder', 'left_elbow', 'right_elbow',
    'left_wrist', 'right_wrist', 'left_hip', 'right_hip',
    'left_knee', 'right_knee', 'left_ankle', 'right_ankle'
]
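This numbering is what makes slices like keypoints[5:10:2] in keypoints2.py work: indices 5, 7, 9 are the left shoulder, elbow, and wrist. A small sketch making the limb index groups explicit (the variable names are my own, not from the sample code):
# Index groups matching the slices used for drawing in keypoints2.py
LEFT_ARM  = [5, 7, 9]     # keypoints[5:10:2]  -> left_shoulder, left_elbow, left_wrist
RIGHT_ARM = [6, 8, 10]    # keypoints[6:11:2]  -> right_shoulder, right_elbow, right_wrist
LEFT_LEG  = [11, 13, 15]  # keypoints[11:16:2] -> left_hip, left_knee, left_ankle
RIGHT_LEG = [12, 14, 16]  # keypoints[12:17:2] -> right_hip, right_knee, right_ankle

for idx in LEFT_ARM:
    print(idx, COCO_PERSON_KEYPOINT_NAMES[idx])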
Result from model
Unlike TensorFlow, PyTorch is intuitive and makes the code easy to understand. Only three lines of code are needed.
First, convert the image to a tensor and move the variable to CUDA if using the GPU.
Then pass the tensor to the model; the return value is a list of dictionaries. Since I passed one image to the model, index 0 of the list is sufficient.
input_img = trf(img)   # convert the image to a PyTorch tensor
input_img = input_img.to(device)
out = model([input_img])[0]
If you print the out variable's dictionary keys, you can see these key values.
print(out.keys())
dict_keys(['boxes', 'labels', 'scores', 'keypoints', 'keypoints_scores'])
You can see these values' explanations at https://pytorch.org/docs/stable/torchvision/models.html#object-detection-instance-segmentation-and-person-keypoint-detection.
But at this point (2019.10), the document is incomplete: it doesn't explain keypoints_scores. The remaining values are explained as follows.
Be careful: the keypoints visibility value does not seem to be correct. 1 is supposed to mean the point is visible, and 0 that it is invisible (a hidden body part).
- boxes (FloatTensor[N, 4]): the ground-truth boxes in [x1, y1, x2, y2] format, with values between 0 and H and 0 and W
- labels (Int64Tensor[N]): the class label for each ground-truth box
- keypoints (FloatTensor[N, K, 3]): the K keypoints location for each of the N instances, in the format [x, y, visibility], where visibility=0 means that the keypoint is not visible.
If you run an input image through the network model (keypointrcnn_resnet50_fpn), you get a dictionary as output. In the following code, the for loop iterates over the detected humans and prints the keypoints and keypoints_scores values.
out = model([input_img])[0]
for box, score, keypoints, kscores in zip(out['boxes'], out['scores'], out['keypoints'], out['keypoints_scores']):
    score = score.cpu().detach().numpy()
    box = box.cpu().detach().numpy()
    points = keypoints.cpu().detach().numpy()
    kscores = kscores.cpu().detach().numpy()
    print(kscores)
    print(points)
keypoints_scores
Some images may show only the torso or parts of the body, like this one that I used in my article "Human Pose Estimation using OpenPose."
<imgs/COCO_294.jpg>
python3 keypoints_gpu.py --image=imgs/COCO_294.jpg
You can get console output like this.
[ 12.644188    13.438369    14.29085     11.607738    13.448702
   4.9779096    7.0458913    7.259503    10.004989     7.2911468
   9.224331     0.04336043   1.5927595   -0.7652377    1.4325492
  -1.3891729   -1.781479  ]
[[110.76687   99.818794   1.       ]
 [121.55979   90.219604   1.       ]
 [ 99.57419   89.81964    1.       ]
 [134.35143   99.41882    1.       ]
 [ 81.18623   98.6189     1.       ]
 [100.77341  145.01497    1.       ]
 [ 83.58466  143.81506    1.       ]
 [169.12865  258.20538    1.       ]
 [174.72498  208.60957    1.       ]
 [246.27812  251.40593    1.       ]
 [215.89803  151.41441    1.       ]
 [100.37367  302.20163    1.       ]
 [ 79.98702  303.8015     1.       ]
 [224.29253  193.0109     1.       ]
 [147.54279  221.4085     1.       ]
 [249.47603  251.40593    1.       ]
 [247.07762  252.20589    1.       ]]
Let's compare the keypoints_scores with their names. The scores for the lower body (hips, knees, ankles) are very low. If you look at the picture above, you can easily understand why.
| Keypoint Name | Keypoint Score |
|---|---|
| nose | 12.644188 |
| left_eye | 13.438369 |
| right_eye | 14.29085 |
| left_ear | 11.607738 |
| right_ear | 13.448702 |
| left_shoulder | 4.9779096 |
| right_shoulder | 7.0458913 |
| left_elbow | 7.259503 |
| right_elbow | 10.004989 |
| left_wrist | 7.2911468 |
| right_wrist | 9.224331 |
| left_hip | 0.04336043 |
| right_hip | 1.5927595 |
| left_knee | -0.7652377 |
| right_knee | 1.4325492 |
| left_ankle | -1.3891729 |
| right_ankle | -1.781479 |
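You can build this pairing directly from the model output instead of writing it by hand. A small sketch, reusing COCO_PERSON_KEYPOINT_NAMES and the kscores array from inside the loop above:
for name, kscore in zip(COCO_PERSON_KEYPOINT_NAMES, kscores):
    print('%-16s %f' % (name, kscore))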
I couldn't find an explanation of the range of these values. If you know of a description of this value's range, let me know.
You can set the threshold value to around 3. If a keypoints_scores value is below this threshold, discard the keypoint at that index.
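A minimal sketch of such filtering, assuming the points and kscores numpy arrays from the loop above; KSCORE_THRESHOLD is a name I chose for this example:
KSCORE_THRESHOLD = 3.0

for idx, (k, kscore) in enumerate(zip(points, kscores)):
    if kscore < KSCORE_THRESHOLD:
        continue   # discard low-confidence keypoints instead of drawing them
    print('%s at (%.1f, %.1f)' % (COCO_PERSON_KEYPOINT_NAMES[idx], k[0], k[1]))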
Be careful: As you can see, the man's left shoulder (index 5) is hidden, so the keypoint visibility value of left_shoulder should be 0. But it is not. Perhaps a future version of this model (keypointrcnn_resnet50_fpn) will fix this bug, but for now you must not rely on the visibility value.
I set the threshold value to 3.5 in my Python code.
I intentionally omit the lines connecting low-score keypoints. If you do not consider keypoint scores, the following picture is produced; here I set the threshold value to -3.5 in my Python code, so no keypoints are discarded.
FPS check
https://github.com/kairess/torchvision_walkthrough.git provides sample code (video_keypoints.py) for keypoint detection on video files. I modified video_keypoints.py to display the FPS and to speed up processing using CUDA. I set the video width to 480 pixels; if you change the output video size, the FPS may change.
import torch
import torchvision
from torchvision import models
import torchvision.transforms as T
import time
import cv2
import numpy as np
import gc
import sys

print('pytorch', torch.__version__)
print('torchvision', torchvision.__version__)

IMG_SIZE = 480
THRESHOLD = 0.7
fps_time = 0

def process_frame(img):
    torch.cuda.empty_cache()
    gc.collect()
    fps_time = time.perf_counter()
    img = cv2.resize(img, (IMG_SIZE, int(img.shape[0] * IMG_SIZE / img.shape[1])))
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    trf = T.Compose([T.ToTensor()])
    input_tensor = trf(img)
    input_img = [input_tensor.to(device)]
    out = model(input_img)[0]
    print(len(out['boxes']))

    for box, score, keypoints in zip(out['boxes'], out['scores'], out['keypoints']):
        score_np = score.cpu().detach().numpy()
        print(score_np)
        if score_np < THRESHOLD:
            continue
        box_np = box.to(torch.int16).cpu().numpy()
        keypoints_np = keypoints.to(torch.int16).cpu().numpy()[:, :2]
        cv2.rectangle(img, pt1=(int(box_np[0]), int(box_np[1])), pt2=(int(box_np[2]), int(box_np[3])), thickness=2, color=(0, 0, 255))
        for k in keypoints_np:
            cv2.circle(img, center=tuple(k.astype(int)), radius=2, color=(255, 0, 0), thickness=-1)
        cv2.polylines(img, pts=[keypoints_np[5:10:2].astype(int)], isClosed=False, color=(255, 0, 0), thickness=2)
        cv2.polylines(img, pts=[keypoints_np[6:11:2].astype(int)], isClosed=False, color=(255, 0, 0), thickness=2)
        cv2.polylines(img, pts=[keypoints_np[11:16:2].astype(int)], isClosed=False, color=(255, 0, 0), thickness=2)
        cv2.polylines(img, pts=[keypoints_np[12:17:2].astype(int)], isClosed=False, color=(255, 0, 0), thickness=2)

    fps = 1.0 / (time.perf_counter() - fps_time)
    new_img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
    cv2.putText(new_img, "FPS: %f" % (fps), (10, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    out_video.write(new_img)
    input_tensor.cpu()

if torch.cuda.is_available():
    device = torch.device('cuda')
    model = models.detection.keypointrcnn_resnet50_fpn(pretrained=True)
    model = model.to(device)
    model.eval()
else:
    device = torch.device('cpu')
    model = models.detection.keypointrcnn_resnet50_fpn(pretrained=True).eval()

cap = cv2.VideoCapture('imgs/02.mp4')
ret, img = cap.read()
fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
out_video = cv2.VideoWriter('imgs/output.mp4', fourcc, cap.get(cv2.CAP_PROP_FPS), (IMG_SIZE, int(img.shape[0] * IMG_SIZE / img.shape[1])))

count = 1
while cap.isOpened():
    ret, img = cap.read()
    if ret == False:
        break
    process_frame(img)
    sys.stdout.flush()
    print('Frame count[%d]' % count)
    count += 1

out_video.release()
cap.release()
<video_gpu.py>
AMD Ryzen 7 2700X + RTX 2070 + Ubuntu 18.04
This captured image is part of the video made on the workstation (AMD Ryzen 2700X, 64 GB DDR4, NVIDIA RTX 2070 GPU, Ubuntu 18.04). As you can see, the FPS is around 8 ~ 10 frames.
Jetson Nano
This captured image is part of the video made on the Jetson Nano. As you can see, the FPS is around 0.3 frames. I think this is too slow for real-time projects.
Wrapping Up
Torchvision's pose estimation performance is very poor on the Jetson Nano. I'll test the same torchvision pose estimation on the Jetson TX2 soon. If you want the most satisfactory human pose estimation performance on the Jetson Nano, see the following article (https://spyjetson.blogspot.com/2019/12/jetsonnano-human-pose-estimation-using.html). The NVIDIA team introduces human pose estimation using models optimized for TensorRT.