Sunday, March 14, 2021

Running OpenPose models directly from OpenCV

This article draws on some material from Deep Learning based Human Pose Estimation using OpenCV.

So far, I have written a lot about OpenPose. Most of those articles implemented pose estimation using the Python module that OpenPose provides. In this article, I will compare using the Python module provided by OpenPose with running the OpenPose Caffe model directly through OpenCV's dnn module.


Prerequisites


Pose Models

OpenPose provides three models that extract body keypoints. If the model is not specified, body_25 is used by default. However, when using OpenCV dnn, you must specify the model explicitly.

The three models are as follows.

  • COCO : 18 keypoints.
  • MPI : 15 keypoints. The least accurate model, but the fastest on CPU.
  • BODY_25 : 25 keypoints. The fastest with the CUDA version, the most accurate, and includes foot keypoints.

The position of the Pose model is as follows.

root@spypiggy-nx:/usr/local/src/openpose-1.7.0/models/pose# pwd
/usr/local/src/openpose-1.7.0/models/pose
root@spypiggy-nx:/usr/local/src/openpose-1.7.0/models/pose# tree
.
├── body_25
│   ├── pose_deploy.prototxt
│   └── pose_iter_584000.caffemodel
├── coco
│   ├── pose_deploy_linevec.prototxt
│   └── pose_iter_440000.caffemodel
└── mpi
    ├── pose_deploy_linevec_faster_4_stages.prototxt
    ├── pose_deploy_linevec.prototxt
    └── pose_iter_160000.caffemodel

3 directories, 7 files

Tip: In the pose model directory, only the default BODY_25 model may be installed. If the other models are missing, run the getModels.sh script in the models directory to download them.


Common way to run OpenPose

The following is a simple example that uses the OpenPose Python module.

import cv2
from openpose import pyopenpose as op

params = dict()
params["model_folder"] = "/usr/local/src/openpose-1.7.0/models/"
params["net_resolution"] = "320x256"  #inference resolution

opWrapper = op.WrapperPython()
opWrapper.configure(params)
opWrapper.start()


datum = op.Datum()
imageToProcess = cv2.imread('/usr/local/src/image/blackpink/blackpink.png')
datum.cvInputData = imageToProcess
opWrapper.emplaceAndPop(op.VectorDatum([datum]))
newImage = datum.cvOutputData[:, :, :]
cv2.imwrite("/tmp/result.jpg", newImage)

<original.py>

Now run the sample code.

root@spypiggy-nx:/usr/local/src/study/opencv_dnn# python3 sample.py
Starting OpenPose Python Wrapper...
Auto-detecting all available GPUs... Detected 1 GPU(s), using 1 of them starting at GPU 0.
[ WARN:0] global /usr/local/src/opencv-4.5.1/modules/core/src/matrix_expressions.cpp (1334) assign OpenCV/MatExpr: processing of multi-channel arrays might be changed in the future: https://github.com/opencv/opencv/issues/16739


</tmp/result.jpg>

However, there is also a way to use OpenCV's dnn module: models created with popular machine learning frameworks can be loaded and run directly in OpenCV. The frameworks whose models OpenCV can process directly are as follows.

  • PyTorch
  • TensorFlow
  • Darknet (YOLO)
  • Caffe
  • ONNX

OpenPose models are trained with the Caffe framework. Therefore, you can follow OpenCV's standard procedure for loading Caffe models.


Running the OpenPose model in OpenCV

Various network models have been supported since OpenCV 3.3. However, versions prior to 4.2 do not provide GPU acceleration on NVIDIA hardware. Therefore, with OpenCV 4.1 as provided by JetPack 4.5, the dnn module can only run on the CPU.

A simple way to use OpenCV's dnn is as follows.

  • Load network model
  • Read Image and Prepare blob
  • Make prediction(forward)
  • Parse results(key points)

Let's create a sample Python program that uses OpenCV dnn.

When using the OpenPose module, you receive an image with the keypoints already connected directly from the module. When using the OpenCV dnn module, however, you only receive the per-pixel confidence values. (These values can also be obtained from the OpenPose module.) Using these values, I need to draw the keypoints on the image myself.


Load network model

I am using OpenPose models trained on Caffe Framework. Caffe models have 2 files –

  • .prototxt file which specifies the architecture of the neural network – how the different layers are arranged etc.
  • .caffemodel file which stores the weights of the trained model

I'm going to use OpenPose's default body_25 model, and for OpenCV 4.1 compatibility, I will use CPU mode first.

protoFile = "/usr/local/src/openpose-1.7.0/models/pose/body_25/pose_deploy.prototxt"
weightsFile = "/usr/local/src/openpose-1.7.0/models/pose/body_25/pose_iter_584000.caffemodel"
net = cv2.dnn.readNetFromCaffe(protoFile, weightsFile)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)


Read Image and Prepare blob

The input image read with OpenCV must be converted to an input blob (as Caffe expects) so that it can be fed to the network. blobFromImage does this job, converting the image from OpenCV format to a Caffe blob.

img = cv2.imread('/usr/local/src/image/blackpink/blackpink.png')
frameWidth = img.shape[1]
frameHeight = img.shape[0]

inHeight = 368
inWidth = int((inHeight/frameHeight)*frameWidth)
Blob = cv2.dnn.blobFromImage(img, 1.0 / 255, (inWidth, inHeight),
                          (0, 0, 0), swapRB=False, crop=False)


Make Prediction

Once the image blob is passed to the model, the prediction can be made with a single line of code. The forward method of OpenCV's dnn network performs a forward pass through the network, and its output is the prediction result.

net.setInput(Blob)
output = net.forward()
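
Before parsing the result, it helps to print the output shape and confirm the layout described in the next section. For the BODY_25 model it should look roughly like this (a quick check, not part of the original script):

print(output.shape)   # (1, 78, H, W) for BODY_25; H and W depend on the input blob size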


Parse results(key points)

The output is a 4D matrix:

  • The first dimension is the image ID. For a single inference image, this dimension has size 1.
  • The second dimension is the index of the output map. The model produces Confidence Maps and Part Affinity maps, all concatenated. For the BODY_25 model there are 78 maps – 25 keypoint confidence maps + 1 background + 26*2 Part Affinity Maps. Similarly, MPI produces 44 maps. I will use only the first 25 maps of the body_25 model, which correspond to the keypoints: index 0 is the nose, index 1 the neck, and so on. If the pose model is COCO or MPI instead of BODY_25, the index numbers and the corresponding body parts are different.
  • The third dimension is the height of the output map.
  • The fourth dimension is the width of the output map. I check whether each keypoint is present in the image by finding the maximum of its confidence map, and I use a threshold to reduce false detections (see the sketch below).
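
For a single-person image, the simplest possible parsing is to take the global maximum of each confidence map. The following is a minimal sketch of that idea; it assumes the output, frameWidth and frameHeight variables from the code above, and for multi-person images the full pipeline described later is needed:

# single-person sketch: one global maximum per confidence map
H, W = output.shape[2], output.shape[3]
points = []
for i in range(25):                       # 25 keypoint maps for BODY_25
    probMap = output[0, i, :, :]
    _, maxVal, _, maxLoc = cv2.minMaxLoc(probMap)
    x = int(frameWidth * maxLoc[0] / W)   # map back to original image coordinates
    y = int(frameHeight * maxLoc[1] / H)
    points.append((x, y) if maxVal > 0.1 else None)   # threshold to reduce false detections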



<output matrix of body-25 model> 

Be careful: if you use other models such as COCO or MPI, the shape of the output matrix changes. For COCO, there are 19 keypoint detection channels (18 keypoints + background) and 38 PAF channels (19 x 2).
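
To keep these layouts straight, here is a small summary in code form, derived from the numbers above (the dictionary and its name are only for illustration):

# keypoint maps + 1 background map + 2 PAF maps per limb
MODEL_OUTPUT_LAYOUT = {
    'body25': {'keypoints': 25, 'total_channels': 78},   # 25 + 1 + 26*2
    'coco':   {'keypoints': 18, 'total_channels': 57},   # 18 + 1 + 19*2
    'mpi':    {'keypoints': 15, 'total_channels': 44},   # 15 + 1 + 14*2
}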


Probability image extraction 

The probability distribution image for each part can be extracted as follows.

for index in range(25):
    probMap = output[0,index,:,:]

probMap is a two-dimensional H x W matrix, similar to a grayscale image. Each pixel holds the probability (between 0.0 and 1.0) that the corresponding keypoint exists at that location. Because the values are so small, writing probMap directly with imshow or imwrite produces an image that is hard to inspect with the naked eye.

To check visually, multiply by 255 to map the values to the 0 to 255 pixel range, and then use the imshow and imwrite functions.

key_points = {
    0:  "Nose", 1:  "Neck", 2:  "RShoulder", 3:  "RElbow", 4:  "RWrist", 5:  "LShoulder", 6:  "LElbow",
    7:  "LWrist", 8:  "MidHip", 9:  "RHip", 10: "RKnee", 11: "RAnkle", 12: "LHip", 13: "LKnee",
    14: "LAnkle", 15: "REye", 16: "LEye", 17: "REar", 18: "LEar", 19: "LBigToe", 20: "LSmallToe",
    21: "LHeel", 22: "RBigToe", 23: "RSmallToe", 24: "RHeel", 25: "Background"
}
for index in range(25):
    probMap = output[0,index,:,:] * 255
    cv2.imwrite('/tmp/proMap_%s.jpg'%key_points[index], probMap)

If you run the code above, you can visually check the probability values for each part, as shown in the following figure. A white pixel has a value close to 255, i.e. a probability close to 1, which means a keypoint is very likely present there.


If you want to compare the positions against the original image precisely, you can use alpha blending as follows.

alpha = 0.3

for index in range(26):
    probMap = output[0,index,:,:] * 255
    probMap = cv2.resize(probMap, (img.shape[1], img.shape[0]))
    probMap = np.asarray(probMap, np.uint8)
    probMap = cv2.cvtColor(probMap,cv2.COLOR_GRAY2BGR)
    dst = cv2.addWeighted(img, alpha, probMap, (1-alpha), 0)
    cv2.imwrite('/tmp/combined_%s.jpg'%key_points[index], dst)

If you run the code above, you get the following image with the probMap blended over the original image.

<combined_Nose.jpg>


PAF(Part Affinity Field) image extraction 

As can be seen from the probability images, when using OpenCV's dnn module, every detection of a given keypoint appears in a single map. For an image with one person this is no problem, but for an image containing several people, the keypoints still need to be connected per person.

This step is unnecessary when using OpenPose's Python module.

I soon run into the following problem: when trying to connect a nose to a neck in an image with several people, it is not easy to decide which connection is valid.

<which neck should I connect?>


The information needed to connect keypoints effectively on a per-person basis can be obtained from the PAFs. As described above, the first 25 output maps hold the probability distributions for the 25 body parts, the next map is the background, and the remaining 52 maps contain the information for linking these body parts.


PAF

A PAF is a 2D vector field that encodes, for each limb, the direction from one keypoint toward the keypoint it should be connected to.

The figure below shows PAF vectors when keypoints 1 to 7 connect to the next keypoints.

<image from Implementation of PAF (Openpose) Pose Detection Network & its Training Accelerations on GCP>


This image is a PAF image of the neck-nose joint. Each joint pair always has two PAF maps: if the keypoints forming the joint are A and B, one map holds the x component and the other the y component of the vector field pointing from A to B.

<PAF image of joint pair(neck and nose)>
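
A rough sketch of how such a PAF image can be produced from the network output is shown below. The channel indices here are hypothetical; the actual indices of the x and y PAF maps for a given joint pair depend on the model's channel layout (for BODY_25 the PAF maps come after the 26 confidence maps), and numpy is assumed to be imported as np:

# hypothetical PAF channel indices for one joint pair
paf_x_idx, paf_y_idx = 26, 27

paf_x = output[0, paf_x_idx, :, :]
paf_y = output[0, paf_y_idx, :, :]
paf_mag = np.sqrt(paf_x ** 2 + paf_y ** 2)            # magnitude of the 2D vector field
paf_img = np.uint8(np.clip(paf_mag * 255, 0, 255))    # scale to 0-255 for viewing
paf_img = cv2.resize(paf_img, (img.shape[1], img.shape[0]))
cv2.imwrite('/tmp/paf_magnitude.jpg', paf_img)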


Connecting valid pairs

The process of extracting vectors from the PAF maps to build valid joints is quite complex. Multi-Person Pose Estimation in OpenCV using OpenPose provides an excellent example, so I will reuse it here. Unfortunately, the example in that article only works with the COCO model, so I modified it to work with the BODY_25 model as well.

The key is the numpy dot product. np.dot returns its largest value when the two vectors point in the same direction and 0 when they are perpendicular. Therefore, if the vector between the two candidate keypoints of a joint and the PAF vectors sampled along that line point in the same direction, a large score is returned. This is how a valid keypoint pair is identified.
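
A tiny, self-contained illustration of that property:

import numpy as np

limb_dir = np.array([1.0, 0.0])      # unit vector from candidate keypoint A to candidate keypoint B
paf_same = np.array([0.9, 0.1])      # PAF sample pointing roughly the same way
paf_perp = np.array([0.0, 1.0])      # PAF sample at 90 degrees

print(np.dot(paf_same, limb_dir))    # ~0.9 -> strong support for this connection
print(np.dot(paf_perp, limb_dir))    # 0.0  -> no support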

OpenPose does not use a top-down method that first finds people and then detects their keypoints. It finds all keypoints first, connects them into valid pairs, and then groups them into individual people in a bottom-up manner.

def getKeypoints(probMap, threshold=0.1):

    mapSmooth = cv2.GaussianBlur(probMap,(3,3),0,0)

    mapMask = np.uint8(mapSmooth>threshold)
    keypoints = []

    #find the blobs
    contours, _ = cv2.findContours(mapMask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)

    #for each blob find the maxima
    for cnt in contours:
        blobMask = np.zeros(mapMask.shape)
        blobMask = cv2.fillConvexPoly(blobMask, cnt, 1)
        maskedProbMap = mapSmooth * blobMask
        _, maxVal, _, maxLoc = cv2.minMaxLoc(maskedProbMap)
        keypoints.append(maxLoc + (probMap[maxLoc[1], maxLoc[0]],))

    return keypoints
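
# ---------------------------------------------------------------------------
# Rough sketch (not the full source) of the glue code that drives
# getKeypoints(): it builds detected_keypoints (one list per body part) and
# keypoints_list (a flat array with one row per keypoint plus a global id),
# which the two functions below rely on. POSE_PAIRS and mapIdx are defined
# in the full source on GitHub.
# ---------------------------------------------------------------------------
detected_keypoints = []
keypoints_list = np.zeros((0, 3))
keypoint_id = 0

for part in range(nPoints):
    probMap = cv2.resize(output[0, part, :, :], (frameWidth, frameHeight))
    keypoints = getKeypoints(probMap, threshold=0.1)
    keypoints_with_id = []
    for kp in keypoints:
        keypoints_with_id.append(kp + (keypoint_id,))
        keypoints_list = np.vstack([keypoints_list, kp])
        keypoint_id += 1
    detected_keypoints.append(keypoints_with_id)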



# Find valid connections between the different joints of a all persons present
def getValidPairs(output):
    valid_pairs = []
    invalid_pairs = []
    n_interp_samples = 10
    paf_score_th = 0.1
    conf_th = 0.7
    # loop for every POSE_PAIR
    for k in range(len(mapIdx)):
        # A->B constitute a limb
        pafA = output[0, mapIdx[k][0], :, :]
        pafB = output[0, mapIdx[k][1], :, :]
        pafA = cv2.resize(pafA, (frameWidth, frameHeight))
        pafB = cv2.resize(pafB, (frameWidth, frameHeight))

        # Find the keypoints for the first and second limb
        candA = detected_keypoints[POSE_PAIRS[k][0]]
        candB = detected_keypoints[POSE_PAIRS[k][1]]
        nA = len(candA)
        nB = len(candB)

        # If keypoints for the joint-pair is detected
        # check every joint in candA with every joint in candB
        # Calculate the distance vector between the two joints
        # Find the PAF values at a set of interpolated points between the joints
        # Use the above formula to compute a score to mark the connection valid

        if( nA != 0 and nB != 0):
            valid_pair = np.zeros((0,3))
            for i in range(nA):
                max_j=-1
                maxScore = -1
                found = 0
                for j in range(nB):
                    # Find d_ij
                    d_ij = np.subtract(candB[j][:2], candA[i][:2])
                    norm = np.linalg.norm(d_ij)
                    if norm:
                        d_ij = d_ij / norm
                    else:
                        continue
                    # Find p(u)
                    interp_coord = list(zip(np.linspace(candA[i][0], candB[j][0], num=n_interp_samples),
                                            np.linspace(candA[i][1], candB[j][1], num=n_interp_samples)))
                    # Find L(p(u))
                    paf_interp = []
                    # use a separate loop variable so the outer k (the pose-pair index) is not shadowed
                    for idx in range(len(interp_coord)):
                        paf_interp.append([pafA[int(round(interp_coord[idx][1])), int(round(interp_coord[idx][0]))],
                                           pafB[int(round(interp_coord[idx][1])), int(round(interp_coord[idx][0]))] ])
                    # Find E
                    paf_scores = np.dot(paf_interp, d_ij)
                    avg_paf_score = sum(paf_scores)/len(paf_scores)

                    # Check if the connection is valid
                    # If the fraction of interpolated vectors aligned with PAF is higher then threshold -> Valid Pair
                    if ( len(np.where(paf_scores > paf_score_th)[0]) / n_interp_samples ) > conf_th :
                        if avg_paf_score > maxScore:
                            max_j = j
                            maxScore = avg_paf_score
                            found = 1
                # Append the connection to the list
                if found:
                    valid_pair = np.append(valid_pair, [[candA[i][3], candB[max_j][3], maxScore]], axis=0)

            # Append the detected connections to the global list
            valid_pairs.append(valid_pair)
        else: # If no keypoints are detected
            print("No Connection : k = {}".format(k))
            invalid_pairs.append(k)
            valid_pairs.append([])
    return valid_pairs, invalid_pairs



# This function creates a list of keypoints belonging to each person
# For each detected valid pair, it assigns the joint(s) to a person
def getPersonwiseKeypoints(valid_pairs, invalid_pairs):
    # the last number in each row is the overall score
    personwiseKeypoints = -1 * np.ones((0, nPoints + 1))

    for k in range(len(mapIdx)):
        if k not in invalid_pairs:
            partAs = valid_pairs[k][:,0]
            partBs = valid_pairs[k][:,1]
            indexA, indexB = np.array(POSE_PAIRS[k])

            for i in range(len(valid_pairs[k])):
                found = 0
                person_idx = -1
                for j in range(len(personwiseKeypoints)):
                    if personwiseKeypoints[j][indexA] == partAs[i]:
                        person_idx = j
                        found = 1
                        break

                if found:
                    personwiseKeypoints[person_idx][indexB] = partBs[i]
                    personwiseKeypoints[person_idx][-1] += keypoints_list[partBs[i].astype(int), 2] + valid_pairs[k][i][2]

                # if find no partA in the subset, create a new subset
                elif not found and k < (nPoints - 1):
                    row = -1 * np.ones(nPoints + 1)
                    row[indexA] = partAs[i]
                    row[indexB] = partBs[i]
                    # add the keypoint_scores for the two keypoints and the paf_score
                    row[-1] = sum(keypoints_list[valid_pairs[k][i,:2].astype(int), 2]) + valid_pairs[k][i][2]
                    personwiseKeypoints = np.vstack([personwiseKeypoints, row])
    return personwiseKeypoints

<Part of the code that finds valid pairs>

You can download the entire source code from my Github.
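
For reference, the final drawing step in that example looks roughly like this. It is a sketch that assumes frameClone is a copy of the input image and that keypoints_list, POSE_PAIRS and colors are defined as in the full source:

for i in range(len(POSE_PAIRS)):
    for n in range(len(personwiseKeypoints)):
        index = personwiseKeypoints[n][np.array(POSE_PAIRS[i])]
        if -1 in index:
            continue                                          # one of the two keypoints was not detected
        B = np.int32(keypoints_list[index.astype(int), 0])    # x coordinates of the pair
        A = np.int32(keypoints_list[index.astype(int), 1])    # y coordinates of the pair
        cv2.line(frameClone, (B[0], A[0]), (B[1], A[1]), colors[i % len(colors)], 3, cv2.LINE_AA)
cv2.imwrite('/tmp/result.jpg', frameClone)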

Now run the program and check the results.

root@spypiggy-nx:/usr/local/src/study/opencv_dnn# python3 op_cv.py --image=/usr/local/src/image/walking.jpg --model=body25          
root@spypiggy-nx:/usr/local/src/study/opencv_dnn# python3 op_cv.py --image=/usr/local/src/image/walking.jpg --model=coco


<body-25 model result and coco model result>

So far, I have briefly shown how to run OpenPose models with OpenCV's dnn module. Next, I will install OpenCV 4.5 and test the speed with the GPU acceleration of the OpenCV dnn module.


Install OpenCV 4.5 and rebuild OpenPose

JetPack 4.5 provides OpenCV 4.1 by default. Therefore, to run the dnn module on the GPU, OpenCV must be upgraded to 4.5 (any version 4.2 or higher will do).

For the OpenCV 4.5 upgrade, refer to the following article.


And if you built OpenPose 1.7 while OpenCV 4.1 was installed, it is recommended to rebuild OpenPose 1.7 after installing OpenCV 4.5. Even if you do not rebuild OpenPose 1.7, the OpenCV dnn approach introduced in this article works without problems. However, if you use the Python module provided by OpenPose, the following error occurs because of the OpenCV version mismatch.

root@spypiggy-nx:/usr/local/src/study/opencv_dnn# python3 original.py
Traceback (most recent call last):
  File "sample.py", line 2, in <module>
    from openpose import pyopenpose as op
  File "/usr/lib/python3.6/dist-packages/openpose/__init__.py", line 1, in <module>
    from . import pyopenpose as pyopenpose
ImportError: libopencv_highgui.so.4.1: cannot open shared object file: No such file or directory

This error occurs because OpenCV was upgraded to 4.5. So, if you rebuild OpenPose with OpenCV 4.5 installed, this error will disappear. 

For the OpenPose1.7 installation, refer to the following article.

If you have not installed OpenPose, you can install OpenPose after upgrading OpenCV to 4.5.


GPU mode and CPU mode speed comparison

For testing, I used about 10 seconds of a Chaplin movie.

<Charlie Chaplin's Modern Times>


I measured the time it took to process a total of 238 frames in CPU mode and GPU mode.
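
In the dnn script, the only difference between the two runs is presumably the backend and target selection, along these lines (a sketch based on an args.device flag like the one used in op_cv_video.py; DNN_BACKEND_CUDA and DNN_TARGET_CUDA require OpenCV 4.2 or higher built with CUDA support):

if args.device == 'gpu':
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
else:
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)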

root@spypiggy-nx:/usr/local/src/study/opencv_dnn# python3 op_cv_video.py --video=/usr/local/src/image/chaplin.mp4 --model=coco --device=cpu
......
......

Frame[238] processed time[15.89]
Total processed time[3821.73]
avg frame processing rate :16.06 

root@spypiggy-nx:/usr/local/src/study/opencv_dnn# python3 op_cv_video.py --video=/usr/local/src/image/chaplin.mp4 --model=coco --device=gpu
......
......

Frame[238] processed time[0.96]
Total processed time[241.08]
avg frame processing rate :1.01 


The processing time was reduced to about 1/16, which is an impressive speedup. And you can see that the keypoints are displayed correctly in the output video.

<gpu-coco-output.mp4>

If you are using OpenCV dnn on the Jetson series, this is why you should upgrade OpenCV to 4.2 or higher. If you run models from frameworks such as TensorFlow, Caffe, PyTorch, or YOLO (not just OpenPose) through OpenCV on the Jetson series, this is also a reason to upgrade.

OpenPose built in python module vs. OpenCV dnn

Let's compare the performance of the Python module provided by OpenPose with the dnn module of OpenCV tested earlier. 
For the test, let's write a simplified example program using OpenPose's Python module and run it on the same Charlie Chaplin video.


import cv2
import time
import numpy as np
from random import randint
import argparse
import sys
from openpose import pyopenpose as op

parser = argparse.ArgumentParser(description='Run keypoint detection')
parser.add_argument("--device", default="gpu", help="Device to inference on")
parser.add_argument("--video", default="/usr/local/src/image/chaplin.mp4", help="Input video")
parser.add_argument("--model", default="body25", help="model : body25 or coco")
args = parser.parse_args()



threshold = 0.2

if args.model == 'body25':
    #Body_25 model use 25 points
    key_points = {
        0:  "Nose", 1:  "Neck", 2:  "RShoulder", 3:  "RElbow", 4:  "RWrist", 5:  "LShoulder", 6:  "LElbow",
        7:  "LWrist", 8:  "MidHip", 9:  "RHip", 10: "RKnee", 11: "RAnkle", 12: "LHip", 13: "LKnee",
        14: "LAnkle", 15: "REye", 16: "LEye", 17: "REar", 18: "LEar", 19: "LBigToe", 20: "LSmallToe",
        21: "LHeel", 22: "RBigToe", 23: "RSmallToe", 24: "RHeel", 25: "Background"
    }

    #Body_25 keypoint pairs 
    POSE_PAIRS = [[1,2], [1,5], [2,3], [3,4], [5,6], [6,7],     #arm, shoulder line
                [1,8], [8,9], [9,10], [10,11], [8,12], [12,13], [13,14],  #2 leg
                [11,24], [11,22], [22,23], [14,21],[14,19],[19,20],    #2 foot  
                [1,0], [0,15], [15,17], [0,16], [16,18], #face
                [2,17], [5,18]
                ]  
                
    nPoints = 25

else:
    key_points = {
        0:  "Nose", 1:  "Neck", 2:  "RShoulder", 3:  "RElbow", 4:  "RWrist", 5:  "LShoulder", 6:  "LElbow",
        7:  "LWrist", 8:  "RHip", 9:  "RKnee", 10: "RAnkle", 11: "LHip", 12: "LKnee", 13: "LAnkle",
        14: "REye", 15: "LEye", 16: "REar", 17: "LEar", 18: "Background"
    }
    POSE_PAIRS = [[1,2], [1,5], [2,3], [3,4], [5,6], [6,7],
                [1,8], [8,9], [9,10], [1,11], [11,12], [12,13],
                [1,0], [0,14], [14,16], [0,15], [15,17],
                [2,17], [5,16] ]
    nPoints = 18

alpha = 0.3

colors = [ [0,100,255], [0,100,255], [0,255,255], [0,100,255], [0,255,255], [0,100,255],
         [0,255,0], [255,200,100], [255,0,255], [0,255,0], [255,200,100], [255,0,255],
         [0,0,255], [255,0,0], [200,200,0], [255,0,0], [200,200,0], [0,0,0]]


cap = cv2.VideoCapture(args.video)
ret, img = cap.read()
if ret == False:
    print('Video File Read Error')    
    sys.exit(0)
frameHeight, frameWidth, c = img.shape

fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
out_video = cv2.VideoWriter('/tmp/%s-%s-output.mp4'%(args.model, args.device), fourcc, cap.get(cv2.CAP_PROP_FPS), (frameWidth,frameHeight))
frame = 0
inHeight = 368
t_elapsed = 0.0

params = dict()
params["model_folder"] = "/usr/local/src/openpose-1.7.0/models/"
params["net_resolution"] = "368x-1" 
params["display"] = "0"     #speed up the processing time

opWrapper = op.WrapperPython()
opWrapper.configure(params)
opWrapper.start()


while cap.isOpened():
    f_st = time.time()
    ret, img = cap.read()
    if ret == False:
        break
    frame += 1    

    datum = op.Datum()
    datum.cvInputData = img
    opWrapper.emplaceAndPop(op.VectorDatum([datum]))
    human_count = len(datum.poseKeypoints)

    frameClone = img.copy()

    #draw keypoint circles on frameClone so they appear in the output video
    for human in range(human_count):
        for j in range(nPoints):
            if datum.poseKeypoints[human][j][2] > threshold:
                center = (int(datum.poseKeypoints[human][j][0]), int(datum.poseKeypoints[human][j][1]))
                cv2.circle(frameClone, center, 3, colors[j % 17], -1, cv2.LINE_AA)

    #draw line
    for human in range(human_count):
        for pair in POSE_PAIRS:
            if datum.poseKeypoints[human][pair[0]][2] > threshold and datum.poseKeypoints[human][pair[1]][2] > threshold:
                S = (int(datum.poseKeypoints[human][pair[0]][0]), int(datum.poseKeypoints[human][pair[0]][1]))
                T = (int(datum.poseKeypoints[human][pair[1]][0]), int(datum.poseKeypoints[human][pair[1]][1]))
                cv2.line(frameClone, S, T, colors[pair[0] % 17], 3, cv2.LINE_AA)


    out_video.write(frameClone)
    f_elapsed = time.time() - f_st 
    t_elapsed += f_elapsed
    print('Frame[%d] processed time[%4.2f]'%(frame, f_elapsed))


print('Total processed time[%4.2f]'%(t_elapsed))
print('avg frame processing rate :%4.2f'%(t_elapsed / frame))
cap.release()
out_video.release()

<op_video.py using OpenPose's Python module>


Now run the code and check the performance.

root@spypiggy-nx:/usr/local/src/study/opencv_dnn# python3 op_video.py --model=coco
Starting OpenPose Python Wrapper...
Auto-detecting all available GPUs... Detected 1 GPU(s), using 1 of them starting at GPU 0.
......
......

Frame[238] processed time[0.34]
Total processed time[88.42]
avg frame processing rate :0.37

When using OpenPose's Python module, the performance is almost three times that of using OpenCV dnn GPU mode. 

Perhaps the reason for this difference in performance is that the part that connects keypoint pairs using PAFs is handled in C/C++ code inside OpenPose, while it is handled in Python in the OpenCV dnn example introduced earlier.
In addition, the OpenCV dnn pipeline has to post-process all 78 (26 x 3) output maps in Python, a step OpenPose handles internally, which probably also contributes to the difference.

Wrapping up

I implemented OpenPose keypoint detection using OpenCV's dnn module. Since this method loads and runs the Caffe models directly in OpenCV, it has the advantage that you do not need to install OpenPose at all; downloading the models is enough.

When using the OpenPose module, it was easy to apply because it returns the keypoints already grouped per person. When using the OpenCV dnn module, only the raw keypoint maps are provided, so you have to connect the valid keypoints per person yourself using the PAF vectors.

You can implement this yourself by referring to the example above or the example on learnopencv's GitHub site, but it takes quite a bit of study to understand it fully. So, if you are only interested in keypoint extraction rather than working with PAF vectors, it is much easier to use the Python module provided by OpenPose. When using OpenPose's Python module, the performance is also about 3 times better than when using OpenCV dnn.

The source code can be downloaded from my github.