Friday, September 27, 2019

JetsonNano - Human Pose estimation using tensorflow

last updated 2020.07.30 : update for Jetpack 4.4

I used Jetson Nano, Ubuntu 18.04 Official image with root account.

OpenPose (https://github.com/CMU-Perceptual-Computing-Lab/openpose) is one of the most popular pose estimation frameworks. You can install OpenPose on the Jetson Nano, but the Nano's GPU is not powerful enough to run it well; it suffers from a lack of memory and computing power.

There's a lightweight pose estimation TensorFlow framework (https://github.com/ildoonet/tf-pose-estimation). Let's install it and do pose estimation.


Prerequisites

JetPack 4.3

If you are using JetPack 4.4, skip to the section below.

Before you build "ildoonet/tf-pose-estimation", you must install these packages first. See the URLs.


After installing the above packages, install these packages too.

apt-get install libllvm-7-ocaml-dev libllvm7 llvm-7 llvm-7-dev llvm-7-doc llvm-7-examples llvm-7-runtime 
apt-get install -y build-essential libatlas-base-dev swig gfortran
export LLVM_CONFIG=/usr/bin/llvm-config-7 
pip3 install Cython

JetPack 4.4

Before you build "ildoonet/tf-pose-estimation", you must install these packages first.

After installing the above packages, install these packages too. Some of them may already be installed. Previously, the Jetson Nano used LLVM version 7, but JetPack 4.4 uses LLVM version 9.

Warning: The scikit-image package installed later requires scipy 1.0.1 or higher, so upgrade scipy. The pip3 install --upgrade scipy command below upgrades scipy to 1.5.2.

apt-get install libllvm-9-ocaml-dev libllvm9 llvm-9 llvm-9-dev llvm-9-doc llvm-9-examples llvm-9-runtime 
apt-get install -y build-essential libatlas-base-dev swig gfortran
export LLVM_CONFIG=/usr/bin/llvm-config-9
pip3 install --upgrade scipy
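
You can confirm the upgrade afterwards; it should print 1.5.2 or higher:

python3 -c "import scipy; print(scipy.__version__)"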

Download and build code from ildoonet

Now clone ildoonet's GitHub repository.

Install Step #1

cd /usr/local/src
git clone https://www.github.com/ildoonet/tf-pose-estimation
cd tf-pose-estimation
pip3 install -r requirements.txt

Edit Code

Edit the tf_pose/estimator.py file like this (based on TensorFlow 1.14).

Original code:

self.persistent_sess = tf.Session(graph=self.graph, config=tf_config)

Change it to:

if tf_config is None:
    # let TensorFlow allocate GPU memory on demand instead of reserving it all
    tf_config = tf.ConfigProto()
    tf_config.gpu_options.allow_growth = True

self.persistent_sess = tf.Session(graph=self.graph, config=tf_config)

The source code also has a --tensorrt option to use TensorRT. To use this option, modify the ./tf_pose/estimator.py file:
at line 327, remove the last parameter "use_calibration=True,". This parameter is deprecated in TensorFlow 1.14 and later.

Install Step #2

cd tf_pose/pafprocess
swig -python -c++ pafprocess.i && python3 setup.py build_ext --inplace

 

Install Step #3 (optional)

cd /usr/local/src/tf-pose-estimation/models/graph/cmu
bash download.sh

 

Testing with image

There are pre-made test Python files such as run.py, run_video.py, and run_webcam.py.
You can test the framework with an image like this:


python3 run.py --model=mobilenet_thin --resize=432x368 --image=./images/p1.jpg


You may see many warning messages about running out of memory. These warnings are caused by the Jetson Nano's limited 4GB of RAM; they are not error messages.


Wait several minutes, and you will see result images like this.



Poor performance

Does it take too much time to estimate a pose? Yes, but it's too early to be disappointed. Most deep learning frameworks load the network model first, and that takes some time. Once the model is loaded into memory, subsequent inference is fast.
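
If you want to measure this yourself, here is a minimal timing sketch (assuming the repository layout above and the same API that run.py uses; the model and image paths are the ones used earlier):

import time
import cv2
from tf_pose.estimator import TfPoseEstimator
from tf_pose.networks import get_graph_path

t0 = time.time()
# loading the network model dominates the first run
e = TfPoseEstimator(get_graph_path('mobilenet_thin'), target_size=(432, 368))
print('model load : %.1f sec' % (time.time() - t0))

image = cv2.imread('./images/p1.jpg')
for i in range(3):
    t0 = time.time()
    humans = e.inference(image, resize_to_default=True, upsample_size=4.0)
    print('inference #%d : %.2f sec' % (i, time.time() - t0))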


Under the hood

Now let's dig deeper.

Webcam test

Let's test with a webcam to check the framework's performance. You can use a Raspberry Pi CSI v2 camera or a USB webcam; I'll use a USB webcam. After connecting the webcam, make sure it's detected properly: run the lsusb command and find the webcam in the output. I'm using a Logitech webcam.



Check the Bus and Device values (001, 003). Then you can view the webcam's supported resolutions like this:


lsusb -s 001:003 -v 

or 

lsusb -s 001:003 -v |grep -E "wWidth|wHeight"

Let's test several network models to compare the performance.

mobilenet_v2_small model test

First I'll test the mobilenet_v2_small network, which is the lightest model.

python3 run_webcam.py --model=mobilenet_v2_small


The performance of mobilenet_v2_small is 1.2 ~ 3.9 fps (frames per second).


mobilenet_v2_large model test

python3 run_webcam.py --model=mobilenet_v2_large


The performance of mobilenet_v2_large is 2.5 fps.

mobilenet_thin model test

python3 run_webcam.py --model=mobilenet_thin


The performance of mobilenet_thin is 1.5 ~ 3.1 fps.


When the Jetson Nano loads the three network models, mobilenet_thin takes the longest to load because its network is bigger than the others.
But after loading, the three models show similar fps values.

I think the pose estimation accuracy is lower than the OpenPose framework's. But OpenPose on the Jetson Nano can't even reach 1 fps, so if you plan to make a realtime pose estimation program on the Jetson Nano, this framework would be your choice.

The following pictures are the results of testing with mobilenet_thin, mobilenet_v2_small, and mobilenet_v2_large.

mobilenet_thin test images


mobilenet_v2_small test images



mobilenet_v2_large test images



Raise FPS Value

I tested the webcam without the --resize option; the default value is 432x368. If you use the --resize=368x368 option, the fps will rise above 4.
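
For example, rerunning the lightest model with the smaller input size:

python3 run_webcam.py --model=mobilenet_v2_small --resize=368x368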

As noted above, the source code also has a --tensorrt option to use TensorRT. Before using it, make sure you removed the deprecated "use_calibration=True," parameter from ./tf_pose/estimator.py (line 327).

To use TensorRT, I added the --tensorrt option:

python3 run_webcam.py --model=mobilenet_thin --resize=368x368 --tensorrt=true

The program took longer to load because of TensorRT graph initialization. Disappointingly, however, the fps did not change much. If you know how to solve this, please let me know.


Using KeyPoints 

If you want to utilize this framework, you must know the keypoint positions and their names (left shoulder, left eye, right knee, ...). Let's dig deeper.

CoCo(Common Objects in Context) keypoints

CoCo uses 18 keypoints (0 ~ 17), plus a background class, like this. From now on, keypoints and human parts mean the same thing.


Each number represents these human parts.

  •     Nose = 0
  •     Neck = 1
  •     RShoulder = 2
  •     RElbow = 3
  •     RWrist = 4
  •     LShoulder = 5
  •     LElbow = 6
  •     LWrist = 7
  •     RHip = 8
  •     RKnee = 9
  •     RAnkle = 10
  •     LHip = 11
  •     LKnee = 12
  •     LAnkle = 13
  •     REye = 14
  •     LEye = 15
  •     REar = 16
  •     LEar = 17
  •     Background = 18
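
The same mapping ships with the repository as a Python enum (CocoPart in tf_pose/common.py), so you don't have to hard-code the numbers. A quick lookup sketch:

from tf_pose.common import CocoPart

print(CocoPart(5))            # CocoPart.LShoulder
print(CocoPart(5).name)       # 'LShoulder'
print(CocoPart.LElbow.value)  # 6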

Finding humans and keypoints in an image

You can find this line in the sample code run.py. It feeds the input image into the network model and gets the result, so the return value "humans" contains all the information needed for pose estimation.



humans = e.inference(image, resize_to_default=(w > 0 and h > 0), upsample_size=args.resize_out_ratio)

The humans return value is a list of Human class objects, so its length is the number of humans detected in the image.
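
For example, counting the detected humans is just a len() call:

print('%d human(s) detected' % len(humans))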

I made two functions to simplify the keypoint detection process. The first returns a part (keypoint) from a human, given the human index and the keypoint number (0 ~ 17). The second returns the keypoint's position in the image from the part.


'''
hnum: 0 based human index
pos : keypoint
'''
def get_keypoint(humans, hnum, pos):
    #check invalid human index
    if len(humans) <= hnum:
        return None

    #check invalid keypoint. human parts may not contain certain keypoints
    if pos not in humans[hnum].body_parts.keys():
        return None

    part = humans[hnum].body_parts[pos]
    return part

'''
return keypoint position (x, y) in the image
'''
def get_point_from_part(image, part):
    image_h, image_w = image.shape[:2]
    return (int(part.x * image_w + 0.5), int(part.y * image_h + 0.5))

Let's assume that we want to detect the keypoints of the first human (list index 0).


    font = cv2.FONT_HERSHEY_SIMPLEX
    for i in range(18):
        part = get_keypoint(humans, 0, i)
        if part is None:
            continue
        pos = get_point_from_part(image, part)
        print('No:%d Name[%s] X:%d Y:%d Score:%f' % (part.part_idx, part.get_part_name(), pos[0], pos[1], part.score))
        cv2.putText(image, str(part.part_idx), (pos[0] + 10, pos[1]), font, 0.5, (0, 255, 0), 2)

This code finds CoCo keypoints 0 to 17 if they exist, then prints each keypoint's position and puts the keypoint number on the image.

I made a new sample python program run2.py at https://github.com/raspberry-pi-maker/NVIDIA-Jetson/tree/master/tf-pose-estimation .

Run this program like this.


python3 run2.py --image=./images/hong.jpg --model=mobilenet_v2_small

You can get the output image (mobilenet_v2_small.png) that contains the part numbers.
<input image:hong.jpg       output image:mobilenet_v2_small.png>

And the console output looks like this:


No:0 Name[CocoPart.Nose] X:247 Y:118 Score:0.852600
No:1 Name[CocoPart.Neck] X:247 Y:199 Score:0.710355
No:2 Name[CocoPart.RShoulder] X:196 Y:195 Score:0.666499
No:3 Name[CocoPart.RElbow] X:127 Y:204 Score:0.675835
No:4 Name[CocoPart.RWrist] X:108 Y:138 Score:0.676011
No:5 Name[CocoPart.LShoulder] X:295 Y:195 Score:0.631604
No:6 Name[CocoPart.LElbow] X:346 Y:167 Score:0.633020
No:7 Name[CocoPart.LWrist] X:325 Y:90 Score:0.448359
No:8 Name[CocoPart.RHip] X:208 Y:395 Score:0.553614
No:9 Name[CocoPart.RKnee] X:217 Y:541 Score:0.671879
No:10 Name[CocoPart.RAnkle] X:235 Y:680 Score:0.622787
No:11 Name[CocoPart.LHip] X:284 Y:387 Score:0.399417
No:12 Name[CocoPart.LKnee] X:279 Y:525 Score:0.639912
No:13 Name[CocoPart.LAnkle] X:279 Y:680 Score:0.594650
No:14 Name[CocoPart.REye] X:233 Y:106 Score:0.878417
No:15 Name[CocoPart.LEye] X:258 Y:106 Score:0.801265
No:16 Name[CocoPart.REar] X:221 Y:122 Score:0.596228
No:17 Name[CocoPart.LEar] X:279 Y:122 Score:0.821814


Detecting angle of keypoints

If you want to get the left elbow angle, keypoints 5, 6, and 7 are needed.

I made an additional function to compute the angle (at the vertex p1) formed by three keypoints.

import math

# law of cosines on the squared distances: returns the angle at vertex p1 in degrees
def angle_between_points( p0, p1, p2 ):
  a = (p1[0]-p0[0])**2 + (p1[1]-p0[1])**2
  b = (p1[0]-p2[0])**2 + (p1[1]-p2[1])**2
  c = (p2[0]-p0[0])**2 + (p2[1]-p0[1])**2
  return  math.acos( (a+b-c) / math.sqrt(4*a*b) ) * 180 /math.pi
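
Putting the pieces together, here is a sketch of measuring the left elbow angle of the first detected human with the helper functions above (keypoints 5, 6, 7; the elbow is the vertex p1):

parts = [get_keypoint(humans, 0, idx) for idx in (5, 6, 7)]  # LShoulder, LElbow, LWrist
if all(p is not None for p in parts):
    p0, p1, p2 = [get_point_from_part(image, p) for p in parts]
    print('left elbow angle:%f' % angle_between_points(p0, p1, p2))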

You can see full code at https://github.com/raspberry-pi-maker/NVIDIA-Jetson/blob/master/tf-pose-estimation/run_angle.py

This is the output of run_angle.py:


left hand angle:167.333590
left elbow angle:103.512531
left knee angle:177.924974
left ankle angle:177.698663
right hand angle:126.365861
right elbow angle:98.628579
right knee angle:176.148932
right ankle angle:178.021761

Wrapping Up

Even though the Jetson Nano's webcam fps is not satisfying, I think you can use this framework for realtime keypoint detection.
I'll test this framework on the Jetson TX2 soon. Perhaps I'll see much higher fps there.

I installed this model on the Xavier NX and posted my test results at https://spyjetson.blogspot.com/2020/07/xavier-nx-human-pose-estimation-using.html. Xavier NX users should refer to that article.
For information on using the ResNet-101 model on the Xavier NX to improve accuracy over the MobileNet models, see Jetson Xavier NX - Human Pose estimation using tensorflow (mpii).

If you want the most satisfactory human pose estimation performance on the Jetson Nano, see the following article (https://spyjetson.blogspot.com/2019/12/jetsonnano-human-pose-estimation-using.html). The NVIDIA team introduces human pose estimation using models optimized for TensorRT.











JetsonNano - Installing Tensorflow

Tensorflow latest version install

last updated 2021.03.23 : update for Jetpack 4.5(Production Release)

I used Jetson Nano, Ubuntu 18.04 Official image with root account.

First, check the version of JetPack you are using. Then refer to the part that corresponds to your JetPack and install the necessary packages. In the download URL for TensorFlow for Jetson, v42, v43, v44, or v45 tells you the JetPack version.
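
If you are not sure which JetPack you have, the L4T release string on the device gives a hint (for example, L4T R32.4.x corresponds to JetPack 4.4; check NVIDIA's site for the exact mapping):

head -n 1 /etc/nv_tegra_release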

JetPack 4.2 Users

First check the latest version at https://developer.download.nvidia.com/compute/redist/jp/v42/tensorflow-gpu/. In the URL, v42 means JetPack version 4.2.
At this point (2020.04), there are 8 whl files.
The cp36 in the filename means Python version 3.6. I'm going to install the latest TensorFlow 1.x version, so use the "tensorflow_gpu-1.15.0+nv19.11-cp36-cp36m-linux_aarch64.whl" file.

Install the required packages first. This takes some time on the Jetson Nano, so take a coffee break.


apt-get install -y python3-pip
apt-get install -y libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev
#if you install tensorflow 1.14 
pip3 install -U numpy==1.16.5 grpcio absl-py py-cpuinfo psutil portpicker six mock requests gast h5py astor termcolor protobuf keras-applications keras-preprocessing wrapt google-pasta
 
#if you install tensorflow 1.15 
pip3 install -U numpy grpcio absl-py py-cpuinfo psutil portpicker six mock requests gast h5py astor termcolor protobuf keras-applications keras-preprocessing wrapt google-pasta

Now install TensorFlow.
If you want to install TensorFlow 1.14, execute the following command:

pip3 install  https://developer.download.nvidia.com/compute/redist/jp/v42/tensorflow-gpu/tensorflow_gpu-1.14.0+nv19.7-cp36-cp36m-linux_aarch64.whl

If you want to install TensorFlow 1.15, execute the following command:

pip3 install  https://developer.download.nvidia.com/compute/redist/jp/v42/tensorflow-gpu/tensorflow_gpu-1.15.0+nv19.11-cp36-cp36m-linux_aarch64.whl

JetPack 4.3 Users

First check the latest version at https://developer.download.nvidia.com/compute/redist/jp/v43/tensorflow-gpu/. In the URL, v43 means JetPack version 4.3. At this point (2020.04), there are 4 whl files.

The cp36 in the filename means Python version 3.6.

I'm going to install the latest TensorFlow 1.x version, so use the "tensorflow_gpu-1.15.0+nv20.1-cp36-cp36m-linux_aarch64.whl" file.

Install the required packages first. This takes some time on the Jetson Nano, so take a coffee break.


apt-get install -y python3-pip
apt-get install -y libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev
pip3 install -U numpy grpcio absl-py py-cpuinfo psutil portpicker six mock requests gast h5py astor termcolor protobuf keras-applications keras-preprocessing wrapt google-pasta


Now install TensorFlow.
If you want to install TensorFlow 1.15, execute the following command:

pip3 install  https://developer.download.nvidia.com/compute/redist/jp/v43/tensorflow-gpu/tensorflow_gpu-1.15.0+nv20.1-cp36-cp36m-linux_aarch64.whl

JetPack 4.4 Users

First check the latest version at https://developer.download.nvidia.com/compute/redist/jp/v44/tensorflow/. As of March 2021, the latest build (nv20.12) is the TensorFlow build for the JetPack 4.4 Production Release.

  • tensorflow-1.15.2+nv20.4-cp36-cp36m-linux_aarch64.whl 222MB 2020-04-30 07:01:03
  • tensorflow-1.15.2+nv20.6-cp36-cp36m-linux_aarch64.whl 232MB 2020-07-07 08:58:21
  • tensorflow-2.1.0+nv20.4-cp36-cp36m-linux_aarch64.whl 231MB 2020-08-05 18:51:59
  • tensorflow-2.2.0+nv20.6-cp36-cp36m-linux_aarch64.whl 276MB 2020-07-07 08:58:31
  • tensorflow-1.15.3+nv20.7-cp36-cp36m-linux_aarch64.whl 227MB 2020-08-05 01:39:24
  • tensorflow-2.2.0+nv20.7-cp36-cp36m-linux_aarch64.whl 274MB 2020-08-05 01:39:36
  • tensorflow-1.15.3+nv20.8-cp36-cp36m-linux_aarch64.whl 227MB 2020-09-02 07:31:18
  • tensorflow-2.2.0+nv20.8-cp36-cp36m-linux_aarch64.whl 276MB 2020-09-02 07:31:28
  • tensorflow-1.15.3+nv20.9-cp36-cp36m-linux_aarch64.whl 217MB 2020-10-05 20:25:07
  • tensorflow-2.3.0+nv20.9-cp36-cp36m-linux_aarch64.whl 264MB 2020-10-05 20:24:40
  • tensorflow-1.15.4+nv20.10-cp36-cp36m-linux_aarch64.whl 217MB 2020-10-24 00:51:44
  • tensorflow-2.3.1+nv20.10-cp36-cp36m-linux_aarch64.whl 264MB 2020-10-24 00:51:52
  • tensorflow-1.15.4+nv20.11-cp36-cp36m-linux_aarch64.whl 218MB 2020-11-24 19:01:32
  • tensorflow-2.3.1+nv20.11-cp36-cp36m-linux_aarch64.whl 264MB 2020-11-24 19:01:35
  • tensorflow-1.15.4+nv20.12-cp36-cp36m-linux_aarch64.whl 218MB 2020-12-18 14:54:04
  • tensorflow-2.3.1+nv20.12-cp36-cp36m-linux_aarch64.whl 264MB 2020-12-18 14:54:06

I'm going to install the latest TensorFlow 1.x version, so use the "tensorflow-1.15.4+nv20.12-cp36-cp36m-linux_aarch64.whl" file.

Install the required packages first. This takes some time on the Jetson Nano, so take a coffee break.


apt-get install -y python3-pip
apt-get install -y libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev
pip3 install -U numpy grpcio absl-py py-cpuinfo psutil portpicker six mock requests gast h5py astor termcolor protobuf keras-applications keras-preprocessing wrapt google-pasta

Now install TensorFlow.
If you want to install TensorFlow 1.15, execute the following command:


pip3 install  https://developer.download.nvidia.com/compute/redist/jp/v44/tensorflow/tensorflow-1.15.4+nv20.12-cp36-cp36m-linux_aarch64.whl

JetPack 4.5 Users

First check the latest version at https://developer.download.nvidia.com/compute/redist/jp/v45/tensorflow/.
At this point (2021.03), there are 6 whl files.
Now install TensorFlow. If you want to install TensorFlow 1.15.5, execute the commands below.

You may encounter errors like this when installing the packages:


fatal error: xlocale.h: No such file or directory
    #include <xlocale.h>

The workaround is simple: just link the missing file with a symbolic link (the ln -s line below):


apt-get install -y python3-pip python3-numpy 
pip3 install --upgrade cython
ln -s /usr/include/locale.h /usr/include/xlocale.h
apt-get install -y libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev
pip3 install grpcio absl-py py-cpuinfo psutil portpicker six mock requests gast h5py astor termcolor protobuf keras-applications keras-preprocessing wrapt google-pasta
pip3 install https://developer.download.nvidia.com/compute/redist/jp/v45/tensorflow/tensorflow-1.15.5+nv21.3-cp36-cp36m-linux_aarch64.whl

Check

You can run python3 and import tensorflow to check whether TensorFlow is installed properly.

python3
>>> import tensorflow as tf
>>> tf.test.is_gpu_available()
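
You can also print the installed version from the shell to confirm which wheel is active:

python3 -c "import tensorflow as tf; print(tf.__version__)"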

JetsonNano - OpenCV 4.1.1 build

last updated 2020.01.05 : update for Jetpack 4.3

The new JetPack 4.3 (Dec 2019) comes with OpenCV 4.1.1 built in, so if you make a new SD card image, you don't have to do this job.

I used Jetson Nano, Ubuntu 18.04 Official image with root account.


I made some changes to the install script provided by NVIDIA.

I added the cmake option "ENABLE_PRECOMPILED_HEADERS=OFF" and changed the version number to 4.1.1.

You can download this script from my repo (https://github.com/raspberry-pi-maker/NVIDIA-Jetson/tree/master/useful_scripts).


Increase swap memory

When you build large software packages like OpenCV, you may run out of memory. Increasing the swap file size prevents this failure.



git clone https://github.com/JetsonHacksNano/installSwapfile
cd installSwapfile
./installSwapfile.sh

The above script creates a 6GB swap file. You can change the swap file size by modifying the script. If you want to remove the swap setting, open the fstab file, delete the swap file line, and reboot.



sudo vi /etc/fstab
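
You can check that the enlarged swap space is active with the free command:

free -h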


Install Script(install_opencv4.1.1_Nano.sh)


#!/bin/bash
#
# Copyright (c) 2018, NVIDIA CORPORATION.  All rights reserved.
#
# NVIDIA Corporation and its licensors retain all intellectual property
# and proprietary rights in and to this software, related documentation
# and any modifications thereto.  Any use, reproduction, disclosure or
# distribution of this software and related documentation without an express
# license agreement from NVIDIA Corporation is strictly prohibited.
#

if [ "$#" -ne 1 ]; then
    echo "Usage: $0 <Install Folder>"
    exit
fi
folder="$1"
user="nvidia"
passwd="nvidia"

echo "** Remove OpenCV3.3 first"
sudo apt-get purge -y "*libopencv*"

echo "** Install requirement"
sudo apt-get update
sudo apt-get install -y build-essential cmake git libgtk2.0-dev pkg-config libavcodec-dev libavformat-dev libswscale-dev
sudo apt-get install -y libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev
sudo apt-get install -y python2.7-dev python3.6-dev python-dev python-numpy python3-numpy
sudo apt-get install -y libtbb2 libtbb-dev libjpeg-dev libpng-dev libtiff-dev libjasper-dev libdc1394-22-dev
sudo apt-get install -y libv4l-dev v4l-utils qv4l2 v4l2ucp
sudo apt-get install -y curl
sudo apt-get update

echo "** Download opencv-4.1.1"
cd $folder
curl -L https://github.com/opencv/opencv/archive/4.1.1.zip -o opencv-4.1.1.zip
curl -L https://github.com/opencv/opencv_contrib/archive/4.1.1.zip -o opencv_contrib-4.1.1.zip
unzip opencv-4.1.1.zip 
unzip opencv_contrib-4.1.1.zip 
cd opencv-4.1.1/

echo "** Building..."
mkdir release
cd release/
cmake -D WITH_CUDA=ON -D ENABLE_PRECOMPILED_HEADERS=OFF  -D CUDA_ARCH_BIN="5.3" -D CUDA_ARCH_PTX="" -D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib-4.1.1/modules -D WITH_GSTREAMER=ON -D WITH_LIBV4L=ON -D BUILD_opencv_python2=ON -D BUILD_opencv_python3=ON -D BUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF -D BUILD_EXAMPLES=OFF -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local ..
make -j3
sudo make install

echo "** Install opencv-4.1.1 successfully"
echo "** Bye :)"

If the GUI build fails because GTK versions 2 and 3 conflict, change the GUI option in the sh file from GTK to QT.
Please see my other installation script, "install_opencv4.1.1_TX2.sh".

These are some of my changes in "install_opencv4.1.1_TX2.sh":


apt-get install qt5-default
#cmake option add 
-D WITH_GTK=OFF
-D WITH_QT=ON



Follow these steps to install OpenCV 4.1.1:

cd /usr/local/src
git clone https://github.com/raspberry-pi-maker/NVIDIA-Jetson.git
cd NVIDIA-Jetson/useful_scripts
chmod 755 install_opencv4.1.1_Nano.sh
./install_opencv4.1.1_Nano.sh /usr/local/src

The install_opencv4.1.1_Nano.sh parameter "/usr/local/src" is the directory where the OpenCV source code will be stored.
The OpenCV installation can take a while, so have a coffee break.

Check the installation

If the installation succeeded, you can check the result like this:


root@spypiggy-desktop:/usr/local/src# python3
Python 3.6.8 (default, Oct  7 2019, 12:59:55) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> cv2.__version__
'4.1.1'



And check the CUDA support of OpenCV. In the build information, you should see this line:
"NVIDIA CUDA:                   YES (ver 10.0, CUFFT CUBLAS)"


root@spypiggy-desktop:/usr/local/src# python3
Python 3.6.8 (default, Oct  7 2019, 12:59:55) 
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> print(cv2.getBuildInformation())
General configuration for OpenCV 4.1.1 =====================================
  Version control:               unknown

  Extra modules:
    Location (extra):            /usr/local/src/opencv_contrib-4.1.1/modules
    Version control (extra):     unknown

  Platform:
    Timestamp:                   2019-11-01T06:55:27Z
    Host:                        Linux 4.9.140-tegra aarch64
    CMake:                       3.10.2
    CMake generator:             Unix Makefiles
    CMake build tool:            /usr/bin/make
    Configuration:               RELEASE

  CPU/HW features:
    Baseline:                    NEON FP16
      required:                  NEON
      disabled:                  VFPV3

  C/C++:
    Built as dynamic libs?:      YES
    C++ Compiler:                /usr/bin/c++  (ver 7.4.0)
    C++ flags (Release):         -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections    -fvisibility=hidden -fvisibility-inlines-hidden -O3 -DNDEBUG  -DNDEBUG
    C++ flags (Debug):           -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wundef -Winit-self -Wpointer-arith -Wshadow -Wsign-promo -Wuninitialized -Winit-self -Wsuggest-override -Wno-delete-non-virtual-dtor -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections    -fvisibility=hidden -fvisibility-inlines-hidden -g  -O0 -DDEBUG -D_DEBUG
    C Compiler:                  /usr/bin/cc
    C flags (Release):           -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections    -fvisibility=hidden -O3 -DNDEBUG  -DNDEBUG
    C flags (Debug):             -fsigned-char -W -Wall -Werror=return-type -Werror=non-virtual-dtor -Werror=address -Werror=sequence-point -Wformat -Werror=format-security -Wmissing-declarations -Wmissing-prototypes -Wstrict-prototypes -Wundef -Winit-self -Wpointer-arith -Wshadow -Wuninitialized -Winit-self -Wno-comment -Wimplicit-fallthrough=3 -Wno-strict-overflow -fdiagnostics-show-option -pthread -fomit-frame-pointer -ffunction-sections -fdata-sections    -fvisibility=hidden -g  -O0 -DDEBUG -D_DEBUG
    Linker flags (Release):      -Wl,--gc-sections  
    Linker flags (Debug):        -Wl,--gc-sections  
    ccache:                      NO
    Precompiled headers:         NO
    Extra dependencies:          m pthread cudart_static -lpthread dl rt /usr/lib/aarch64-linux-gnu/libcuda.so nppc nppial nppicc nppicom nppidei nppif nppig nppim nppist nppisu nppitc npps cublas cudnn cufft -L/usr/local/cuda/lib64 -L/usr/lib/aarch64-linux-gnu
    3rdparty dependencies:

  OpenCV modules:
    To be built:                 aruco bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dpm face features2d flann freetype fuzzy gapi hfs highgui img_hash imgcodecs imgproc line_descriptor ml objdetect optflow phase_unwrapping photo plot python2 python3 quality reg rgbd saliency shape stereo stitching structured_light superres surface_matching text tracking video videoio videostab xfeatures2d ximgproc xobjdetect xphoto
    Disabled:                    world
    Disabled by dependency:      -
    Unavailable:                 cnn_3dobj cvv hdf java js matlab ovis sfm ts viz
    Applications:                apps
    Documentation:               NO
    Non-free algorithms:         NO

  GUI: 
    GTK+:                        YES (ver 2.24.32)
      GThread :                  YES (ver 2.56.4)
      GtkGlExt:                  NO
    VTK support:                 NO

  Media I/O: 
    ZLib:                        /usr/lib/aarch64-linux-gnu/libz.so (ver 1.2.11)
    JPEG:                        /usr/lib/aarch64-linux-gnu/libjpeg.so (ver 80)
    WEBP:                        /usr/lib/aarch64-linux-gnu/libwebp.so (ver encoder: 0x020e)
    PNG:                         /usr/lib/aarch64-linux-gnu/libpng.so (ver 1.6.34)
    TIFF:                        /usr/lib/aarch64-linux-gnu/libtiff.so (ver 42 / 4.0.9)
    JPEG 2000:                   build (ver 1.900.1)
    OpenEXR:                     build (ver 2.3.0)
    HDR:                         YES
    SUNRASTER:                   YES
    PXM:                         YES
    PFM:                         YES

  Video I/O:
    DC1394:                      NO
    FFMPEG:                      YES
      avcodec:                   YES (57.107.100)
      avformat:                  YES (57.83.100)
      avutil:                    YES (55.78.100)
      swscale:                   YES (4.8.100)
      avresample:                NO
    GStreamer:                   YES (1.14.5)
    v4l/v4l2:                    YES (linux/videodev2.h)
   
  Parallel framework:            pthreads

  Trace:                         YES (with Intel ITT)

  Other third-party libraries:
    Lapack:                      NO
    Eigen:                       YES (ver 3.3.4)
    Custom HAL:                  YES (carotene (ver 0.0.1))
    Protobuf:                    build (3.5.1)

  NVIDIA CUDA:                   YES (ver 10.0, CUFFT CUBLAS)
    NVIDIA GPU arch:             53
    NVIDIA PTX archs:

  cuDNN:                         YES (ver 7.5.0)

  OpenCL:                        YES (no extra features)
    Include path:                /usr/local/src/opencv-4.1.1/3rdparty/include/opencl/1.2
    Link libraries:              Dynamic load

  Python 2:
    Interpreter:                 /usr/bin/python2.7 (ver 2.7.15)
    Libraries:                   /usr/lib/aarch64-linux-gnu/libpython2.7.so (ver 2.7.15+)
    numpy:                       /usr/lib/python2.7/dist-packages/numpy/core/include (ver 1.13.3)
    install path:                lib/python2.7/dist-packages/cv2/python-2.7

  Python 3:
    Interpreter:                 /usr/bin/python3 (ver 3.6.8)
    Libraries:                   /usr/lib/aarch64-linux-gnu/libpython3.6m.so (ver 3.6.8)
    numpy:                       /usr/lib/python3/dist-packages/numpy/core/include (ver 1.13.3)
    install path:                lib/python3.6/dist-packages/cv2/python-3.6

  Python (for build):            /usr/bin/python2.7

  Java:                          
    ant:                         NO
    JNI:                         NO
    Java wrappers:               NO
    Java tests:                  NO

  Install to:                    /usr/local
-----------------------------------------------------------------