Tuesday, August 24, 2021

JetsonNano - Installing the latest Pytorch 1.9 and Pose Estimation

On Wednesday, October 16, 2019, I described how to implement pose estimation in PyTorch.

That article was based on JetPack 4.4 and PyTorch 1.6. Some of its links are now broken and parts of its content are outdated, so I will implement pose estimation again after installing PyTorch 1.9 on JetPack 4.6, the most recent version as of August 2021. In terms of content, there is no significant difference from the previous article.


Before you build PyTorch and torchvision, you must install these packages first.

apt-get install libjpeg-dev zlib1g-dev

Install PyTorch

Do not download PyTorch from the PyTorch website. Instead, download and install it from the link below; that installation file is built for the NVIDIA Jetson series.

Before installing PyTorch, visit this site to check the latest PyTorch version.

Before installing torchvision, visit this site to check the latest torchvision version.

The latest version at this time is PyTorch 1.9. Download and install the file below.

Delete old versions of PyTorch

First check whether PyTorch is already installed. If no PyTorch version is installed yet, proceed to the next step.

root@spypiggy-nano:/usr/local/src/detr# pip3 freeze|grep torch

root@spypiggy-nano:/usr/local/src/detr# pip3 uninstall  torchvision==0.3.0
root@spypiggy-nano:/usr/local/src/detr# pip3 uninstall  torch==1.1.0
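
The same check can be scripted. The sketch below is only an illustration and not part of the original setup: find_torch_packages is a hypothetical helper that filters pip3 freeze style text for torch packages, and the sample text is made up apart from the two versions that appear in the uninstall commands above.

```python
def find_torch_packages(freeze_text):
    """Return {package: version} for torch-related lines of `pip3 freeze` output."""
    found = {}
    for line in freeze_text.splitlines():
        line = line.strip()
        if "==" not in line:
            continue  # skip blank lines and entries without a pinned version
        name, version = line.split("==", 1)
        if name.lower().startswith("torch"):
            found[name] = version
    return found


# Illustrative input only; on a real system, pass in the output of `pip3 freeze`.
sample = "example-package==1.0\ntorch==1.1.0\ntorchvision==0.3.0\n"
old_packages = find_torch_packages(sample)
print(old_packages)
```

Every package the helper reports would need a pip3 uninstall before the new whl file is installed.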

Download and install Pytorch whl file 

We always use Python 3.x, so download the whl file built for Python 3.6 and install the necessary packages as follows.

apt-get install python3-pip libopenblas-base libopenmpi-dev 
pip3 install Cython
wget -O torch-1.9.0-cp36-cp36m-linux_aarch64.whl https://nvidia.box.com/shared/static/h1z9sw4bb1ybi0rm3tu8qdj8hs05ljbm.whl
pip3 install torch-1.9.0-cp36-cp36m-linux_aarch64.whl

Download and install torchvision

If you have successfully installed PyTorch 1.9.0, install Torchvision 0.10.0. The latest version of torchvision can be found at https://github.com/pytorch/vision/releases.

sudo apt-get install libjpeg-dev zlib1g-dev libfreetype6-dev
wget https://github.com/pytorch/vision/archive/v0.10.0.tar.gz
tar -xvzf v0.10.0.tar.gz
cd vision-0.10.0
# This takes a very long time, so take a coffee break
sudo python3 setup.py install

Let's check whether the installation is correct. If both imports succeed and the version strings are printed as in the session below, the installation is successful.

root@spypiggyNano:/usr/local/src# python3
Python 3.6.9 (default, Jan 26 2021, 15:33:00)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> import torchvision
>>> torch.__version__
>>> torchvision.__version__

Be careful: You must use the python3 and pip3 commands. PyTorch 1.5 and later no longer support Python 2.
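
The interactive check can also be written as a small script. This is a minimal sketch, not something from the original setup: report_versions is a hypothetical helper name, and a package is treated as unusable when its import raises any exception.

```python
import importlib


def report_versions(package_names):
    """Try to import each package; map its name to a version string, or None on failure."""
    versions = {}
    for name in package_names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # ImportError for a missing package; other errors for a broken build.
            versions[name] = None
            continue
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in report_versions(["torch", "torchvision"]).items():
        print(name, version if version is not None else "NOT INSTALLED")
```

Run it with python3, for the reason given in the note above.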

Installing Sample Code for Pose Estimation

Now we have finished installing PyTorch and torchvision. It's time to install the sample Python code. I'll use the code from https://github.com/kairess/torchvision_walkthrough.git .

cd /usr/local/src
git clone https://github.com/kairess/torchvision_walkthrough.git
cd /usr/local/src/torchvision_walkthrough

Now you can find several sample files to test. Some of them are Jupyter notebook files. The author of this code uses a MacBook, so the samples do not take the GPU (CUDA) into account. I'm going to modify the sample code to use CUDA. Using CUDA in PyTorch is several times faster (about 7.5 times in the test below)!

Keypoint detection: performance comparison with and without CUDA

The example code below is slightly modified to use CUDA.

import torch
import torchvision
from torchvision import models
import torchvision.transforms as T

import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from matplotlib.path import Path
import matplotlib.patches as patches
import argparse
import sys, time

IMG_SIZE = 480
# Minimum detection score. The definition is missing from this listing;
# 0.9 is an assumed value that matches the --accuracy default used later.
THRESHOLD = 0.9

parser = argparse.ArgumentParser(description="Keypoint detection. - Pytorch")
parser.add_argument("--cuda", action="store_true")
args = parser.parse_args()

if torch.cuda.is_available():
    print('pytorch:%s GPU support'% torch.__version__)
else:
    print('pytorch:%s GPU Not support ==> Error:Jetson should support cuda'% torch.__version__)
print('torchvision', torchvision.__version__)

model = models.detection.keypointrcnn_resnet50_fpn(pretrained=True).eval()
if args.cuda:
    model = model.cuda()

#img = Image.open('imgs/07.jpg')
img = Image.open('imgs/apink1.jpg')
img = img.resize((IMG_SIZE, int(img.height * IMG_SIZE / img.width)))

plt.figure(figsize=(16, 16))

# Convert the PIL image to a PyTorch tensor
trf = T.Compose([
    T.ToTensor()
])

input_img = trf(img)
print(input_img.shape)
if args.cuda:
    input_img = input_img.cuda()

# The first inference is slow, so run the model once as a warm-up and time the second run.
out = model([input_img])[0]
fps_time  = time.perf_counter()
out = model([input_img])[0]
print(out.keys())

# Each arm or leg is drawn as a path through three keypoints
codes = [
    Path.MOVETO,
    Path.LINETO,
    Path.LINETO
]

fig, ax = plt.subplots(1, figsize=(16, 16))
ax.imshow(img)

for box, score, keypoints in zip(out['boxes'], out['scores'], out['keypoints']):
    if args.cuda:
        score = score.cpu().detach().numpy()
    else:
        score = score.detach().numpy()

    # Skip detections with a low score
    if score < THRESHOLD:
        continue

    if args.cuda:
        box = box.to(torch.int16).cpu().numpy()
        keypoints = keypoints.to(torch.int16).cpu().numpy()[:, :2]
    else:
        box = box.detach().numpy()
        keypoints = keypoints.detach().numpy()[:, :2]

    rect = patches.Rectangle((box[0], box[1]), box[2]-box[0], box[3]-box[1], linewidth=2, edgecolor='b', facecolor='none')
    ax.add_patch(rect)

    # 17 keypoints
    for k in keypoints:
        circle = patches.Circle((k[0], k[1]), radius=2, facecolor='r')
        ax.add_patch(circle)

    # draw path
    # left arm
    path = Path(keypoints[5:10:2], codes)
    line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
    ax.add_patch(line)

    # right arm
    path = Path(keypoints[6:11:2], codes)
    line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
    ax.add_patch(line)

    # left leg
    path = Path(keypoints[11:16:2], codes)
    line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
    ax.add_patch(line)

    # right leg
    path = Path(keypoints[12:17:2], codes)
    line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
    ax.add_patch(line)

plt.savefig('result.jpg')

fps = 1.0 / (time.perf_counter() - fps_time)
if args.cuda:
    print('FPS(cuda support):%f'%(fps))
else:
    print('FPS(cuda not support):%f'%(fps))
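
The resize step in the listing keeps the aspect ratio: the width is fixed at IMG_SIZE and the height is scaled by the same factor. The sketch below isolates that arithmetic. resized_size is a hypothetical helper name, and the 1920x1080 input is an assumed example, not one of the sample images.

```python
def resized_size(width, height, target_width=480):
    """Scale (width, height) to target_width while keeping the aspect ratio.

    Mirrors img.resize((IMG_SIZE, int(img.height * IMG_SIZE / img.width))).
    """
    return target_width, int(height * target_width / width)


# Hypothetical 1920x1080 input (an assumption for illustration).
print(resized_size(1920, 1080))  # (480, 270)
```

The tensor produced by T.ToTensor() is ordered as channels, height, width, so a shape such as torch.Size([3, 335, 480]) in the run log means a 480 pixel wide, 335 pixel high image.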

Let's run the above code without and with the --cuda option.

spypiggy@spypiggyNano:/usr/local/src/torchvision_walkthrough$ sudo python3 keypoints2.py
pytorch:1.9.0 GPU support
torchvision 0.10.0a0
torch.Size([3, 335, 480])
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /media/nvidia/NVME/pytorch/pytorch-v1.9.0/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
dict_keys(['boxes', 'labels', 'scores', 'keypoints', 'keypoints_scores'])
FPS(cuda not support):0.021109
spypiggy@spypiggyNano:/usr/local/src/torchvision_walkthrough$ sudo python3 keypoints2.py --cuda
pytorch:1.9.0 GPU support
torchvision 0.10.0a0
torch.Size([3, 335, 480])
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /media/nvidia/NVME/pytorch/pytorch-v1.9.0/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
dict_keys(['boxes', 'labels', 'scores', 'keypoints', 'keypoints_scores'])
FPS(cuda support):0.158586

Be careful: The source code runs the model twice, and the FPS is calculated from the processing time of the second run. The reason is that the first run after loading the model takes a long time.
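
The warm-up idea can be sketched independently of PyTorch. In the minimal example below, measure_fps and fake_inference are hypothetical names, and the sleep call is only a stand-in for the real model([input_img]) call.

```python
import time


def measure_fps(run_once, warmup_runs=1):
    """Call run_once() warmup_runs times untimed, then time one more call."""
    for _ in range(warmup_runs):
        run_once()  # the first call pays the one-time setup cost
    start = time.perf_counter()
    run_once()
    elapsed = time.perf_counter() - start
    return 1.0 / elapsed


calls = []


def fake_inference():
    # Stand-in workload (an assumption for illustration only).
    calls.append(1)
    time.sleep(0.01)


fps = measure_fps(fake_inference)
print("calls:", len(calls), "fps: %.1f" % fps)
```

Only the second call is timed, so one-time costs such as loading the model onto the GPU do not distort the FPS figure.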

The saved result.jpg file is as follows.


Even with CUDA, the speed is only about 0.16 FPS. At this speed, it takes about 6.3 seconds to process one frame, making it unsuitable for real-time video stream processing. The main reason is that the ResNet50 model used in this article is quite heavy, which is the price of its excellent accuracy.
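
The per-frame time and the speedup follow directly from the two FPS values in the log. The short calculation below uses the exact logged values rather than rounded ones.

```python
# FPS values copied from the two runs logged above.
fps_cpu = 0.021109
fps_cuda = 0.158586

seconds_per_frame_cpu = 1.0 / fps_cpu
seconds_per_frame_cuda = 1.0 / fps_cuda
speedup = fps_cuda / fps_cpu

print("CPU : %.1f s per frame" % seconds_per_frame_cpu)    # 47.4
print("CUDA: %.1f s per frame" % seconds_per_frame_cuda)   # 6.3
print("CUDA speedup: %.1fx" % speedup)                     # 7.5
```

Note that the timed section of the script also includes drawing and saving the result image, so these figures are not pure inference times.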

Under the hood

Now let's dig deeper.

Torchvision keypoint number and human parts

Torchvision's keypoint numbering differs from that of OpenPose and the TensorFlow models.
The numbers map to body parts as follows.
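
As a text version of the mapping, the sketch below lists the 17 keypoint names in index order, using the same names that appear next to the scores later in this article. The constant names KEYPOINT_NAMES, LEFT_ARM and RIGHT_LEG are chosen here for illustration and are not part of torchvision.

```python
# Keypoint names in torchvision's output order (index 0 to 16).
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

# Left-side parts have odd indices and right-side parts even ones,
# which is why a slice such as keypoints[5:10:2] picks out one arm.
LEFT_ARM = [KEYPOINT_NAMES[i] for i in range(5, 10, 2)]
RIGHT_LEG = [KEYPOINT_NAMES[i] for i in range(12, 17, 2)]

for index, name in enumerate(KEYPOINT_NAMES):
    print(index, name)
```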


Result from model

Unlike TensorFlow 1.X, PyTorch is intuitive enough to make the code easy to understand.

Only three lines of code are enough.

First convert the PIL image to a tensor, and move the tensor to CUDA if you are using the GPU.

Then pass the tensor to the model. The return value is a list of dictionaries, one per input image. Since I passed only one image to the model, index 0 of the list is sufficient.

input_img = trf(img)    # Make image to Pytorch tensor
input_img = input_img.to(device)
out = model([input_img])[0]

If you print the out variable's dictionary keys, you can see these key values.


dict_keys(['boxes', 'labels', 'scores', 'keypoints', 'keypoints_scores'])

You can see these values' explanations at  https://pytorch.org/vision/stable/models.html#object-detection-instance-segmentation-and-person-keypoint-detection

But the documentation is incomplete: it does not explain keypoints_scores. The remaining values are explained as follows.

  • boxes (FloatTensor[N, 4]): the predicted boxes in [x1, y1, x2, y2] format, with x values between 0 and W and y values between 0 and H
  • labels (Int64Tensor[N]): the predicted label for each detected instance
  • scores (Tensor[N]): the score of each detected instance
  • keypoints (FloatTensor[N, K, 3]): the K keypoints location for each of the N instances, in the format [x, y, visibility], where visibility=0 means that the keypoint is not visible.

Be careful: The keypoints visibility value does not seem to be correct. It should be 1 when the point is visible and 0 when it is not (hidden body parts), but in practice it is always 1, so it carries no information so far. A better way to judge the reliability of a keypoint is the keypoints_scores value.

When you run inference on an image with the network model (keypointrcnn_resnet50_fpn), you get an output dictionary. In this code, the for loop iterates once per detected person and prints the keypoints and keypoints_scores values.

out = model([input_img])[0]
for box, score, keypoints, kscores in zip(out['boxes'], out['scores'], out['keypoints'], out['keypoints_scores']):
    score = score.cpu().detach().numpy()
    box = box.cpu().detach().numpy()
    points = keypoints.cpu().detach().numpy()
    kscores = kscores.cpu().detach().numpy()
    # Print the values for each detected person
    print('scores:', score)
    print('kscores:', kscores)
    print('keypoints:', points[:, :2])

Let's check the keypoints_scores and keypoints of this picture, which is the inference image. The lower body is not visible in it.


Run this command.

spypiggy@spypiggyNano:/usr/local/src/torchvision_walkthrough$ sudo python3 keypoints2.py --image=./imgs/03.jpg --cuda
pytorch:1.9.0 GPU support
torchvision 0.10.0a0
torch.Size([3, 719, 480])
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /media/nvidia/NVME/pytorch/pytorch-v1.9.0/c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)
dict_keys(['boxes', 'labels', 'scores', 'keypoints', 'keypoints_scores'])
scores: 0.998314380645752
kscores: [  1.25098372   0.82812518   2.83464313  10.02956772  10.867136
   7.50433922   9.46248245   5.73677111   4.82222462  -1.08437061
  -2.38515258  -2.25649142  -2.69200468  -3.09163022  -2.31611848
  -1.32797825  -1.90648782]
keypoints: [[386 295]
 [193 245]
 [368 264]
 [224 298]
 [352 298]
 [127 452]
 [412 470]
 [112 706]
 [420 706]
 [174 409]
 [419 706]
 [205 706]
 [354 706]
 [217 706]
 [434 578]
 [113 706]
 [421 706]]
FPS(cuda support):0.148066

Let's match each keypoints_scores value with its keypoint name. The scores from the wrists down are very low, which makes sense if you look at the picture above.

nose                 1.250984
left_eye             0.828125
right_eye            2.834643
left_ear             10.029568
right_ear            10.867136
left_shoulder        7.504339
right_shoulder       9.462482
left_elbow           5.736771
right_elbow          4.822225
left_wrist           -1.084371
right_wrist          -2.385153
left_hip             -2.256491
right_hip            -2.692005
left_knee            -3.091630
right_knee           -2.316118
left_ankle           -1.327978
right_ankle          -1.906488

The kscore values of the lower-body keypoints are negative. Looking at the output image result.jpg, the lines from the hips to the feet were drawn strangely because the negative kscores were not taken into account. The lines from the elbows to the wrists were also drawn strangely for the same reason.
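
To make the effect of the negative scores concrete, the sketch below works out which limb segments have two trustworthy end points. It is only an illustration: SEGMENTS and drawable_segments are names chosen here, and the cut-off of 0 is this sketch's reading of "negative scores are unreliable".

```python
# keypoints_scores copied from the run above, rounded to two decimals.
kscores = [1.25, 0.83, 2.83, 10.03, 10.87, 7.50, 9.46, 5.74, 4.82,
           -1.08, -2.39, -2.26, -2.69, -3.09, -2.32, -1.33, -1.91]

# Limb segments as (start index, end index) pairs in torchvision's keypoint order.
SEGMENTS = [
    (5, 7), (7, 9),      # left arm: shoulder-elbow, elbow-wrist
    (6, 8), (8, 10),     # right arm
    (11, 13), (13, 15),  # left leg: hip-knee, knee-ankle
    (12, 14), (14, 16),  # right leg
]


def drawable_segments(scores, segments=SEGMENTS, min_score=0.0):
    """Keep only the segments whose two end keypoints both score above min_score."""
    return [(a, b) for a, b in segments
            if scores[a] > min_score and scores[b] > min_score]


print(drawable_segments(kscores))  # [(5, 7), (6, 8)]
```

For this picture only the two shoulder-to-elbow segments survive, which matches what is actually visible.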

Example reflecting kscores

The code below is an improved version that connects two arm or leg joints with a line only when the scores of both keypoints are positive.

import torch
import torchvision
from torchvision import models
import torchvision.transforms as T

import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from matplotlib.path import Path
import matplotlib.patches as patches
import argparse
import sys, time

IMG_SIZE = 480

parser = argparse.ArgumentParser(description="Keypoint detection. - Pytorch")
parser.add_argument('--image', type=str, default="./imgs/03.jpg", help='inference image')
parser.add_argument('--accuracy', type=float, default=0.9, help='accuracy. default=0.9')
parser.add_argument("--cuda", action="store_true")
args = parser.parse_args()

if torch.cuda.is_available():
    print('pytorch:%s GPU support'% torch.__version__)
else:
    print('pytorch:%s GPU Not support ==> Error:Jetson should support cuda'% torch.__version__)
print('torchvision', torchvision.__version__)

model = models.detection.keypointrcnn_resnet50_fpn(pretrained=True).eval()
if args.cuda:
    model = model.cuda()

img = Image.open(args.image)
#img = Image.open('imgs/apink1.jpg')
img = img.resize((IMG_SIZE, int(img.height * IMG_SIZE / img.width)))

plt.figure(figsize=(16, 16))

# Convert the PIL image to a PyTorch tensor
trf = T.Compose([
    T.ToTensor()
])

input_img = trf(img)
if args.cuda:
    input_img = input_img.cuda()

# The first inference is slow, so run the model once as a warm-up and time the second run.
out = model([input_img])[0]
fps_time  = time.perf_counter()
out = model([input_img])[0]

# Each limb segment is a path through two keypoints
codes = [
    Path.MOVETO,
    Path.LINETO
]

fig, ax = plt.subplots(1, figsize=(16, 16))
ax.imshow(img)
t_human = 0   # all detections
r_human = 0   # detections whose score passes --accuracy
for box, score, keypoints, kscores  in zip(out['boxes'], out['scores'], out['keypoints'], out['keypoints_scores'] ):
    if args.cuda:
        score = score.cpu().detach().numpy()
        kscores = kscores.cpu().detach().numpy()
        box = box.to(torch.int16).cpu().numpy()
        keypoints = keypoints.to(torch.int16).cpu().numpy()[:, :2]
    else:
        score = score.detach().numpy()
        box = box.detach().numpy()
        keypoints = keypoints.detach().numpy()[:, :2]
        kscores = kscores.detach().numpy()

    t_human += 1
    if score < args.accuracy:
        continue
    r_human += 1

    rect = patches.Rectangle((box[0], box[1]), box[2]-box[0], box[3]-box[1], linewidth=2, edgecolor='b', facecolor='none')
    ax.add_patch(rect)

    # 17 keypoints: draw only those with a positive score
    for x in range(len(keypoints)):
        k = keypoints[x]
        if kscores[x] > 0:
            if x == 5:
                # left shoulder is drawn larger
                circle = patches.Circle((k[0], k[1]), radius=4, facecolor='r')
            else:
                circle = patches.Circle((k[0], k[1]), radius=2, facecolor='r')
            ax.add_patch(circle)

    # draw path
    # left arm
    if kscores[5] > 0 and kscores[7] > 0:
        path = Path(keypoints[5:8:2], codes)
        line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
        ax.add_patch(line)
    if kscores[7] > 0 and kscores[9] > 0:
        path = Path(keypoints[7:10:2], codes)
        line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
        ax.add_patch(line)

    # right arm
    if kscores[6] > 0 and kscores[8] > 0:
        path = Path(keypoints[6:9:2], codes)
        line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
        ax.add_patch(line)
    if kscores[8] > 0 and kscores[10] > 0:
        path = Path(keypoints[8:11:2], codes)
        line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
        ax.add_patch(line)

    # left leg
    if kscores[11] > 0 and kscores[13] > 0:
        path = Path(keypoints[11:14:2], codes)
        line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
        ax.add_patch(line)
    if kscores[13] > 0  and kscores[15] > 0:
        path = Path(keypoints[13:16:2], codes)
        line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
        ax.add_patch(line)

    # right leg
    if kscores[12] > 0 and kscores[14] > 0:
        path = Path(keypoints[12:15:2], codes)
        line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
        ax.add_patch(line)
    if kscores[14] > 0 and kscores[16] > 0:
        path = Path(keypoints[14:17:2], codes)
        line = patches.PathPatch(path, linewidth=2, facecolor='none', edgecolor='r')
        ax.add_patch(line)

plt.savefig('result.jpg')

fps = 1.0 / (time.perf_counter() - fps_time)
print('total human:%d  real human:%d'%(t_human, r_human))

if args.cuda:
    print('FPS(cuda support):%f'%(fps))
else:
    print('FPS(cuda not support):%f'%(fps))

The result of executing the above code is as follows.

Wrapping up

About two years after my October 2019 article, I looked at keypoint recognition in PyTorch again. PyTorch has been upgraded in the meantime, but the model and documentation for keypoint recognition do not seem to have changed much.

This ResNet50-based model has high accuracy, but its processing speed is too low for real-time video processing on the Jetson Nano. It is well worth considering for processing image files. If you want to detect keypoints in real-time video on the Jetson Nano, please refer to the following links.

The source code of this text can be downloaded from my github.

Sunday, August 22, 2021

Jetpack 4.6 - Headless Installation on Jetson Nano using USB cable

JetPack 4.6 was released on August 4, 2021. Its main features are:

  • L4T version: 32.6.1
  • Support for Jetson AGX Xavier Industrial module.
  • Support for new 20W mode on Jetson Xavier NX enabling better video encode and video decode performance and higher memory bandwidth. The included 10W and 15W nvpmodel configurations will perform exactly as did the 10W and 15W modes with previous JetPack releases. Any custom nvpmodel created with a previous release will require regeneration for use with JetPack 4.6. Please read L4T 32.6.1 release notes for details.
  • Image based Over-The-Air update tools for developing end-to-end OTA solution for Jetson products in the field. Supported on Jetson TX2 series, Jetson Xavier NX and Jetson AGX Xavier series.
  • A/B Root File System redundancy to flash, maintain and update redundant root file systems. Enhances fault tolerance during OTA by falling back to the working root file system slot in case of a failure. Supported on Jetson TX2 series, Jetson Xavier NX and Jetson AGX Xavier series.
  • A new flashing tool to flash internal or external media connected to Jetson. Supports Jetson TX2 series, Jetson Xavier NX and Jetson AGX Xavier. The new tool uses initial RAM disk for flashing and is up to 1.5x faster when flashing compared to the previous method.
  • Secure boot is enhanced for Jetson TX2 series to extend encryption support to kernel, kernel-dtb and initrd.
  • Disk encryption of external media supported to protect data at rest for Jetson AGX Xavier series, Jetson Xavier NX and Jetson TX2.
  • NVMe driver added to CBoot for Jetson Xavier NX and Jetson AGX Xavier series. Enables loading kernel, kernel-dtb and initrd from the root file system on NVMe.
  • Enhanced Jetson-IO tools to configure the camera header interface and dynamically add support for a camera using device tree overlays.
  • Support for configuring Raspberry Pi cameras such as the IMX219 or the High Def IMX477 at run time using the Jetson-IO tool on Jetson Nano 2GB, Jetson Nano and Jetson Xavier NX developer kits.

Today I will install JetPack 4.6 the way I prefer: connecting the Jetson Nano to a PC with a USB cable. The connection works as a serial link over USB.

This is the easiest way to do initial setup without having to connect a monitor, keyboard, and mouse to the Jetson Nano.

Installation process

First, burn the image to the SD card.

SD Card Burning

The SD card should be at least 32GB; personally, I recommend 64GB. To test Edge AI on the Jetson Nano, you need to install packages such as OpenCV, TensorFlow, and PyTorch, and you also have to store various AI models, images, and videos, so a 16GB SD card does not have enough space.

  • Download the JetPack 4.6 image from https://developer.nvidia.com/embedded/jetpack.
  • Burn the downloaded image file to the SD card using a program such as Etcher. Etcher can be used immediately without unpacking the downloaded Jetpack 4.6 zip file.

Connect the Jetson Nano and Host Computer with USB cable

Now it's time to connect the PC and Jetson Nano.

  • Insert the flashed SD card into the Jetson Nano.
  • Connect an Ethernet cable with Internet access to the Jetson Nano.
  • Connect the Jetson Nano and the PC (a Windows PC is fine) with a USB cable that supports data transfer.

<USB connection on Jetson Nano>

  • Power on the Jetson Nano, then wait for Windows to recognize Nano.
  • Start Windows Device Manager and check the COM port of the Jetson Nano.
<Jetson Nano connected to COM3>

Initial setup screens

  • Run PuTTY and open the serial port (COM3 in my case).
  • If successful, you can see the console screen of the Jetson Nano. See the images below.
  • Next, follow the instructions on the screen as shown below.

<Initial welcome screen>

<License agreement screen>

<Language selection for installation screen>

<Country selection screen>

<Language selection screen>

<UTC selection screen>

<User fullname input screen>

<Username input screen>

<Password input screen>

<Confirm password screen>

Unless you have a special reason to change it, keep the partition size displayed on the screen so that the entire capacity of the SD card is used.

<Partition size input screen>

With the Ethernet cable connected, select eth0.

<Network configuration screen>

<Network configuration progress screen>

<Hostname input screen>

Select the power mode (NVP Model) to be used in Jetson Nano. More information about the NVP model can be found in Useful tips before using Jetson Series (Nano, TX2, Xavier NX, Xavier).

<Power mode selection> 

When the installation is complete, the serial connection drops and the Jetson Nano restarts. After a while, use PuTTY again to connect to the COM port.

<Serial connection lost screen>

Now you will be asked to enter your username and password.

<First Login screen>

Wrapping up

Using a USB cable, you can complete the initial setup very easily from a PC without connecting a monitor or keyboard, which is why this is my favorite method. After installation, you usually connect with ssh using the Ethernet IP address, but the USB serial connection remains available at any time. If the Jetson Nano's Ethernet interface is left on DHCP and its IP address changes so that it is hard to find, you can log in over the serial connection through the USB cable as introduced in this article. This method applies not only to JetPack 4.6 but also to older JetPack installations.

If you need useful settings for images after initial installation, refer to the next page. It explains useful tools and basic setup methods for Jetson series.