On Wednesday, October 16, 2019, I described how to implement pose estimation in PyTorch. That article was based on JetPack 4.4 and PyTorch 1.6, but some of its links are now broken and its contents are outdated. Therefore, I will implement pose estimation again after installing PyTorch 1.9 on JetPack 4.6, the most recent version as of August 2021. In terms of content, there is no significant difference from the previous article.
Prerequisites
Before you build PyTorch and torchvision, you must install the following packages.
apt-get install libjpeg-dev zlib1g-dev
Install PyTorch
PyTorch should not be downloaded from the PyTorch website; it must be downloaded from the link below and installed. The installation file below is built for the NVIDIA Jetson series.
Before installing PyTorch, visit this site to check the latest PyTorch version.
Before installing torchvision, visit this site to check the latest torchvision version.
The latest version at this time is PyTorch 1.9. Download and install the file below.
Delete old versions of PyTorch
root@spypiggy-nano:/usr/local/src/detr# pip3 freeze | grep torch
torch==1.1.0
torchvision==0.3.0
root@spypiggy-nano:/usr/local/src/detr# pip3 uninstall torchvision==0.3.0
root@spypiggy-nano:/usr/local/src/detr# pip3 uninstall torch==1.1.0
Download and install the PyTorch whl file
We always use Python 3.X, so download the whl file built for Python 3.6. Then install the necessary packages as follows.
apt-get install python3-pip libopenblas-base libopenmpi-dev
pip3 install Cython
wget -O torch-1.9.0-cp36-cp36m-linux_aarch64.whl https://nvidia.box.com/shared/static/h1z9sw4bb1ybi0rm3tu8qdj8hs05ljbm.whl
pip3 install torch-1.9.0-cp36-cp36m-linux_aarch64.whl
Download and build torchvision
Once you have successfully installed PyTorch 1.9.0, install torchvision 0.10.0. The latest version of torchvision can be found at https://github.com/pytorch/vision/releases.
sudo apt-get install libjpeg-dev zlib1g-dev libfreetype6-dev
wget https://github.com/pytorch/vision/archive/v0.10.0.tar.gz
tar -xvzf v0.10.0.tar.gz
cd vision-0.10.0
# This takes a very long time, have a coffee break
sudo python3 setup.py install
Let's check whether the installation is correct. If you see a screen like this, the installation was successful.
root@spypiggyNano:/usr/local/src# python3
Python 3.6.9 (default, Jan 26 2021, 15:33:00)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import torchvision
>>> torch.__version__
'1.9.0'
>>> torchvision.__version__
'0.10.0a0'
Be careful: you must use the python3 and pip3 commands. As of PyTorch 1.5, Python 2 is no longer supported.
Install Sample Code for Pose Estimation
cd /usr/local/src
git clone https://github.com/kairess/torchvision_walkthrough.git
cd /usr/local/src/torchvision_walkthrough
Now you can find several sample files to test. Some of them are Jupyter notebook files. The author of these samples used a MacBook, so the code does not take the GPU (CUDA) into account. I'm going to modify the sample code to use CUDA, following the pattern sketched below. Using CUDA in PyTorch is about 10 times faster!
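The change itself follows the usual PyTorch pattern: move both the model and the input tensor to the GPU, and bring results back to the CPU before converting to numpy. Here is a minimal sketch of that pattern (not the walkthrough's actual code; the image path is just one of the repository's samples):

import torch
from torchvision import models
import torchvision.transforms as T
from PIL import Image

# Pick the GPU when available; on a Jetson this should be 'cuda'.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = models.detection.keypointrcnn_resnet50_fpn(pretrained=True).eval().to(device)

img = Image.open('imgs/apink1.jpg')       # a sample image from the repository
input_img = T.ToTensor()(img).to(device)  # input must live on the same device as the model

with torch.no_grad():                     # inference only, no gradients needed
    out = model([input_img])[0]

boxes = out['boxes'].cpu().numpy()        # move results back to the CPU before numpy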
Keypoint detection: performance comparison with and without CUDA
The example code below is slightly modified to use CUDA.
import torch
import torchvision
from torchvision import models
import torchvision.transforms as T
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from matplotlib.path import Path
import matplotlib.patches as patches
import argparse
import sys, time

IMG_SIZE = 480
THRESHOLD = 0.95

parser = argparse.ArgumentParser(description="Keypoint detection. - Pytorch")
parser.add_argument("--cuda", action="store_true")
args = parser.parse_args()

if True == torch.cuda.is_available():
    print('pytorch:%s GPU support'% torch.__version__)
else:
    print('pytorch:%s GPU Not support ==> Error:Jetson should support cuda'% torch.__version__)
    sys.exit()

print('torchvision', torchvision.__version__)

model = models.detection.keypointrcnn_resnet50_fpn(pretrained=True).eval()
if(args.cuda):
    model = model.cuda()

#img = Image.open('imgs/07.jpg')
img = Image.open('imgs/apink1.jpg')
img = img.resize((IMG_SIZE, int(img.height * IMG_SIZE / img.width)))

plt.figure(figsize=(16, 16))
plt.imshow(img)

trf = T.Compose([
    T.ToTensor()
])

input_img = trf(img)
print(input_img.shape)
if(args.cuda):
    input_img = input_img.cuda()

# The first run is time consuming. From the second run on, measure the processing time.
model([input_img])

fps_time = time.perf_counter()
out = model([input_img])[0]
print(out.keys())

codes = [
    Path.MOVETO,
    Path.LINETO,
    Path.LINETO
]

fig, ax = plt.subplots(1, figsize=(16, 16))
ax.imshow(img)

for box, score, keypoints in zip(out['boxes'], out['scores'], out['keypoints']):
    if(args.cuda):
        score = score.cpu().detach().numpy()
    else:
        score = score.detach().numpy()

    if score < THRESHOLD:
        continue

    if(args.cuda):
        box = box.to(torch.int16).cpu().numpy()
        keypoints = keypoints.to(torch.int16).cpu().numpy()[:, :2]
    else:
        box = box.detach().numpy()
        keypoints = keypoints.detach().numpy()[:, :2]

    rect = patches.Rectangle((box[0], box[1]), box[2]-box[0], box[3]-box[1], linewidth=2, edgecolor='b', facecolor='none')
    ax.add_patch(rect)

    # 17 keypoints
    for k in keypoints:
        circle = patches.Circle((k[0], k[1]), radius=2, facecolor='r')
        ax.add_patch(circle)

    # draw path
    # left arm
    path = Path(keypoints[5:10:2], codes)
    line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
    ax.add_patch(line)

    # right arm
    path = Path(keypoints[6:11:2], codes)
    line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
    ax.add_patch(line)

    # left leg
    path = Path(keypoints[11:16:2], codes)
    line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
    ax.add_patch(line)

    # right leg
    path = Path(keypoints[12:17:2], codes)
    line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
    ax.add_patch(line)

plt.savefig('result.jpg')
fps = 1.0 / (time.perf_counter() - fps_time)
if(args.cuda):
    print('FPS(cuda support):%f'%(fps))
else:
    print('FPS(cuda not support):%f'%(fps))
Let's run the above code without and with the --cuda option.
spypiggy@spypiggyNano:/usr/local/src/torchvision_walkthrough$ sudo python3 keypoints2.py
pytorch:1.9.0 GPU support
torchvision 0.10.0a0
torch.Size([3, 335, 480])
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /media/nvidia/NVME/pytorch/pytorch-v1.9.0/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
dict_keys(['boxes', 'labels', 'scores', 'keypoints', 'keypoints_scores'])
FPS(cuda not support):0.021109

spypiggy@spypiggyNano:/usr/local/src/torchvision_walkthrough$ sudo python3 keypoints2.py --cuda
pytorch:1.9.0 GPU support
torchvision 0.10.0a0
torch.Size([3, 335, 480])
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /media/nvidia/NVME/pytorch/pytorch-v1.9.0/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
dict_keys(['boxes', 'labels', 'scores', 'keypoints', 'keypoints_scores'])
FPS(cuda support):0.158586
Be careful: note that the model runs twice in the source code, and the FPS is calculated from the processing time of the second run. The reason is that the first run after loading the model takes a lot of extra time.
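If you want a steadier number than a single timed run, you can also average several runs after the warm-up. Here is a small sketch of that idea, assuming model and input_img from the script above (this is not in the original code):

import time

model([input_img])   # warm-up: the first run pays one-time initialization costs

N = 5
start = time.perf_counter()
for _ in range(N):
    out = model([input_img])[0]
elapsed = time.perf_counter() - start
print('average FPS over %d runs: %f' % (N, N / elapsed))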
The saved result.jpg file is as follows.
Even with CUDA, the speed is only about 0.15 FPS. At this speed it takes about 6.7 seconds to process one frame, making it unsuitable for real-time video stream processing. The main reason is that the ResNet50-based model used in this article is quite heavy, the price it pays for its excellent accuracy.
Under the hood
Torchvision keypoint indices and body parts
COCO_PERSON_KEYPOINT_NAMES = [
    'nose',
    'left_eye',
    'right_eye',
    'left_ear',
    'right_ear',
    'left_shoulder',
    'right_shoulder',
    'left_elbow',
    'right_elbow',
    'left_wrist',
    'right_wrist',
    'left_hip',
    'right_hip',
    'left_knee',
    'right_knee',
    'left_ankle',
    'right_ankle'
]
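This list also explains the slice indices in the drawing code: for example, keypoints[5:10:2] picks indices 5, 7 and 9, which are the left shoulder, elbow and wrist. Here is a short sketch to verify what each limb slice selects:

# Which keypoint names do the limb slices in the drawing code select?
for name, sl in [('left arm',  slice(5, 10, 2)),
                 ('right arm', slice(6, 11, 2)),
                 ('left leg',  slice(11, 16, 2)),
                 ('right leg', slice(12, 17, 2))]:
    print(name, [COCO_PERSON_KEYPOINT_NAMES[i] for i in range(17)[sl]])

# left arm  ['left_shoulder', 'left_elbow', 'left_wrist']
# right arm ['right_shoulder', 'right_elbow', 'right_wrist']
# left leg  ['left_hip', 'left_knee', 'left_ankle']
# right leg ['right_hip', 'right_knee', 'right_ankle']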
Result from model
Unlike TensorFlow 1.X, PyTorch is intuitive, which makes the code easier to understand.
Only three lines of code are enough.
First, convert the image to a tensor and move it to CUDA if you are using the GPU.
Then feed the tensor to the model; the return value is a list of dictionaries. Since I fed a single image to the model, index 0 of the list is sufficient.
input_img = trf(img)               # convert the image to a PyTorch tensor
input_img = input_img.to(device)   # device is torch.device('cuda') or torch.device('cpu')
out = model([input_img])[0]
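One optional refinement, not used in the original sample code: wrapping the call in torch.no_grad() tells PyTorch not to track gradients during inference, which saves memory and a little time on the Nano. A sketch, assuming trf, img, model and device from above:

input_img = trf(img).to(device)    # the same three lines, wrapped for inference
with torch.no_grad():              # skip autograd bookkeeping: less memory, slightly faster
    out = model([input_img])[0]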
If you print the out variable's dictionary keys, you will see the following key values.
print(out.keys())
dict_keys(['boxes', 'labels', 'scores', 'keypoints', 'keypoints_scores'])
You can see these values' explanations at https://pytorch.org/vision/stable/models.html#object-detection-instance-segmentation-and-person-keypoint-detection
- boxes (FloatTensor[N, 4]): the predicted boxes in [x1, y1, x2, y2] format, with x values between 0 and W and y values between 0 and H
- labels (Int64Tensor[N]): the predicted class label for each box
- scores (Tensor[N]): the confidence score of each detection
- keypoints (FloatTensor[N, K, 3]): the K keypoint locations for each of the N instances, in [x, y, visibility] format, where visibility=0 means that the keypoint is not visible
- keypoints_scores (Tensor[N, K]): the score of each of the K keypoints
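The shapes are easy to confirm yourself. A quick sketch, assuming out holds the inference result from above:

for k, v in out.items():
    print(k, tuple(v.shape))

# For N detected people this prints something like:
# boxes (N, 4), labels (N,), scores (N,),
# keypoints (N, 17, 3), keypoints_scores (N, 17)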
out = model([input_img])[0]

for box, score, keypoints, kscores in zip(out['boxes'], out['scores'], out['keypoints'], out['keypoints_scores']):
    score = score.cpu().detach().numpy()
    box = box.cpu().detach().numpy()
    points = keypoints.cpu().detach().numpy()
    kscores = kscores.cpu().detach().numpy()
    print(kscores)
    print(points)
Let's check the keypoints_scores and keypoints of this picture, which is the inference image. The lower body is not visible in it.
Run this command.
spypiggy@spypiggyNano:/usr/local/src/torchvision_walkthrough$ sudo python3 keypoints2.py --image=./imgs/03.jpg --cuda
pytorch:1.9.0 GPU support
torchvision 0.10.0a0
torch.Size([3, 719, 480])
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at /media/nvidia/NVME/pytorch/pytorch-v1.9.0/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
dict_keys(['boxes', 'labels', 'scores', 'keypoints', 'keypoints_scores'])
scores: 0.998314380645752
kscores: [ 1.25098372  0.82812518  2.83464313 10.02956772 10.867136    7.50433922
  9.46248245  5.73677111  4.82222462 -1.08437061 -2.38515258 -2.25649142
 -2.69200468 -3.09163022 -2.31611848 -1.32797825 -1.90648782]
keypoints: [[386 295]
 [193 245]
 [368 264]
 [224 298]
 [352 298]
 [127 452]
 [412 470]
 [112 706]
 [420 706]
 [174 409]
 [419 706]
 [205 706]
 [354 706]
 [217 706]
 [434 578]
 [113 706]
 [421 706]]
FPS(cuda support):0.148066
Let's match each keypoints_score with its keypoint name. The scores from the wrists down are very low. If you look at the picture above, you can easily see why.
nose             1.250984
left_eye         0.828125
right_eye        2.834643
left_ear        10.029568
right_ear       10.867136
left_shoulder    7.504339
right_shoulder   9.462482
left_elbow       5.736771
right_elbow      4.822225
left_wrist      -1.084371
right_wrist     -2.385153
left_hip        -2.256491
right_hip       -2.692005
left_knee       -3.091630
right_knee      -2.316118
left_ankle      -1.327978
right_ankle     -1.906488
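For reference, a table like this can be produced by zipping the name list with the kscores array from the earlier snippet (a minimal sketch, assuming both variables are in scope):

for name, ks in zip(COCO_PERSON_KEYPOINT_NAMES, kscores):
    print('%-15s %f' % (name, ks))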
Example reflecting kscores
import torch
import torchvision
from torchvision import models
import torchvision.transforms as T
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from matplotlib.path import Path
import matplotlib.patches as patches
import argparse
import sys, time

IMG_SIZE = 480

parser = argparse.ArgumentParser(description="Keypoint detection. - Pytorch")
parser.add_argument('--image', type=str, default="./imgs/03.jpg", help='inference image')
parser.add_argument('--accuracy', type=float, default=0.9, help='accuracy. default=0.9')
parser.add_argument("--cuda", action="store_true")
args = parser.parse_args()

if True == torch.cuda.is_available():
    print('pytorch:%s GPU support'% torch.__version__)
else:
    print('pytorch:%s GPU Not support ==> Error:Jetson should support cuda'% torch.__version__)
    sys.exit()

print('torchvision', torchvision.__version__)

model = models.detection.keypointrcnn_resnet50_fpn(pretrained=True).eval()
if(args.cuda):
    model = model.cuda()

img = Image.open(args.image)
#img = Image.open('imgs/apink1.jpg')
img = img.resize((IMG_SIZE, int(img.height * IMG_SIZE / img.width)))

plt.figure(figsize=(16, 16))
plt.imshow(img)

trf = T.Compose([
    T.ToTensor()
])

input_img = trf(img)
print(input_img.shape)
if(args.cuda):
    input_img = input_img.cuda()

# The first run is time consuming. From the second run on, measure the processing time.
model([input_img])

fps_time = time.perf_counter()
out = model([input_img])[0]
print(out.keys())

codes = [
    Path.MOVETO,
    #Path.LINETO,
    Path.LINETO
]

fig, ax = plt.subplots(1, figsize=(16, 16))
ax.imshow(img)

t_human = 0
r_human = 0

for box, score, keypoints, kscores in zip(out['boxes'], out['scores'], out['keypoints'], out['keypoints_scores']):
    if(args.cuda):
        score = score.cpu().detach().numpy()
        kscores = kscores.cpu().detach().numpy()
        box = box.to(torch.int16).cpu().numpy()
        keypoints = keypoints.to(torch.int16).cpu().numpy()[:, :2]
    else:
        score = score.detach().numpy()
        box = box.detach().numpy()
        keypoints = keypoints.detach().numpy()[:, :2]
        kscores = kscores.detach().numpy()

    t_human += 1
    if score < args.accuracy:
        continue
    r_human += 1

    rect = patches.Rectangle((box[0], box[1]), box[2]-box[0], box[3]-box[1], linewidth=2, edgecolor='b', facecolor='none')
    ax.add_patch(rect)

    # 17 keypoints: draw only the ones with a positive score
    for x in range(len(keypoints)):
        k = keypoints[x]
        if kscores[x] > 0:
            if x == 5:
                circle = patches.Circle((k[0], k[1]), radius=4, facecolor='r')
            else:
                circle = patches.Circle((k[0], k[1]), radius=2, facecolor='r')
            ax.add_patch(circle)

    # draw path
    # left arm
    if kscores[5] > 0 and kscores[7] > 0:
        path = Path(keypoints[5:8:2], codes)
        line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
        ax.add_patch(line)
    if kscores[7] > 0 and kscores[9] > 0:
        path = Path(keypoints[7:10:2], codes)
        line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
        ax.add_patch(line)
    # right arm
    if kscores[6] > 0 and kscores[8] > 0:
        path = Path(keypoints[6:9:2], codes)
        line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
        ax.add_patch(line)
    if kscores[8] > 0 and kscores[10] > 0:
        path = Path(keypoints[8:11:2], codes)
        line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
        ax.add_patch(line)
    # left leg
    if kscores[11] > 0 and kscores[13] > 0:
        path = Path(keypoints[11:14:2], codes)
        line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
        ax.add_patch(line)
    if kscores[13] > 0 and kscores[15] > 0:
        path = Path(keypoints[13:16:2], codes)
        line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
        ax.add_patch(line)
    # right leg
    if kscores[12] > 0 and kscores[14] > 0:
        path = Path(keypoints[12:15:2], codes)
        line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
        ax.add_patch(line)
    if kscores[14] > 0 and kscores[16] > 0:
        path = Path(keypoints[14:17:2], codes)
        line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
        ax.add_patch(line)

plt.savefig('result.jpg')
fps = 1.0 / (time.perf_counter() - fps_time)
print('total human:%d real human:%d'%(t_human, r_human))
if(args.cuda):
    print('FPS(cuda support):%f'%(fps))
else:
    print('FPS(cuda not support):%f'%(fps))
The result of executing the above code is as follows.
Wrapping up
About two years after October 2019, I have revisited keypoint recognition in PyTorch. PyTorch has been upgraded in the meantime, but there doesn't seem to be much change in the model or the documentation for keypoint recognition.
This ResNet50-based model has high accuracy, but its processing speed is too low for real-time video processing on the Jetson Nano. It is still well worth considering for processing image files. If you want to detect keypoints in real-time video on the Jetson Nano, please refer to the following links.
- JetsonNano - NVIDIA AI IOT - Human Pose estimation using TensorRT
- Xavier NX - NVIDIA AI IOT - Human Pose estimation using TensorRT
The source code for this article can be downloaded from my GitHub.