Parts of this article are based on Deep Learning based Human Pose Estimation using OpenCV.
So far, I have written a lot about OpenPose. Most of those articles implemented pose estimation using the Python module provided by OpenPose. In this article, I will compare using the Python module provided by OpenPose with loading the OpenPose Caffe model directly through OpenCV's dnn module.
Prerequisites
Pose Models
OpenPose provides three models for extracting body keypoints. If no model is specified, BODY_25 is used by default. However, when using OpenCV dnn, the model must be specified explicitly.
The three models are as follows.
- COCO : 18 keypoints.
- MPI : 15 keypoints. The least accurate model, but the fastest on the CPU.
- BODY_25 : 25 keypoints. The fastest with CUDA, the most accurate, and it includes foot keypoints.
The pose model files are located as follows.
root@spypiggy-nx:/usr/local/src/openpose-1.7.0/models/pose# pwd
/usr/local/src/openpose-1.7.0/models/pose
root@spypiggy-nx:/usr/local/src/openpose-1.7.0/models/pose# tree
.
├── body_25
│   ├── pose_deploy.prototxt
│   └── pose_iter_584000.caffemodel
├── coco
│   ├── pose_deploy_linevec.prototxt
│   └── pose_iter_440000.caffemodel
└── mpi
    ├── pose_deploy_linevec_faster_4_stages.prototxt
    ├── pose_deploy_linevec.prototxt
    └── pose_iter_160000.caffemodel

3 directories, 7 files
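For reference, the three models can be summarized in a small Python dictionary. This is only a convenience sketch I am adding for illustration; the file paths match the tree output above and the keypoint counts match the list above.

# Convenience sketch: the three pose models, their files, and keypoint counts.
# Paths assume the OpenPose 1.7.0 install location shown above.
POSE_MODELS = {
    "body25": {
        "proto": "/usr/local/src/openpose-1.7.0/models/pose/body_25/pose_deploy.prototxt",
        "weights": "/usr/local/src/openpose-1.7.0/models/pose/body_25/pose_iter_584000.caffemodel",
        "n_points": 25,
    },
    "coco": {
        "proto": "/usr/local/src/openpose-1.7.0/models/pose/coco/pose_deploy_linevec.prototxt",
        "weights": "/usr/local/src/openpose-1.7.0/models/pose/coco/pose_iter_440000.caffemodel",
        "n_points": 18,
    },
    "mpi": {
        "proto": "/usr/local/src/openpose-1.7.0/models/pose/mpi/pose_deploy_linevec.prototxt",
        "weights": "/usr/local/src/openpose-1.7.0/models/pose/mpi/pose_iter_160000.caffemodel",
        "n_points": 15,
    },
}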
Tips : Only the default BODY_25 model may be installed. If the other models are missing, run the getModels.sh script in the models directory to download them.
Common way to run OpenPose
The following is a simple example using the OpenPose Python module.
import cv2
from openpose import pyopenpose as op

params = dict()
params["model_folder"] = "/usr/local/src/openpose-1.7.0/models/"
params["net_resolution"] = "320x256"   # inference resolution

opWrapper = op.WrapperPython()
opWrapper.configure(params)
opWrapper.start()

datum = op.Datum()
imageToProcess = cv2.imread('/usr/local/src/image/blackpink/blackpink.png')
datum.cvInputData = imageToProcess
opWrapper.emplaceAndPop(op.VectorDatum([datum]))

newImage = datum.cvOutputData[:, :, :]
cv2.imwrite("/tmp/result.jpg", newImage)
<original.py>
Now run the sample code.
root@spypiggy-nx:/usr/local/src/study/opencv_dnn# python3 sample.py
Starting OpenPose Python Wrapper...
Auto-detecting all available GPUs... Detected 1 GPU(s), using 1 of them starting at GPU 0.
[ WARN:0] global /usr/local/src/opencv-4.5.1/modules/core/src/matrix_expressions.cpp (1334) assign OpenCV/MatExpr: processing of multi-channel arrays might be changed in the future: https://github.com/opencv/opencv/issues/16739
However, there is also a way to use OpenCV's dnn module. Models created with popular machine learning frameworks can be loaded and run directly in OpenCV. The frameworks whose models OpenCV can process directly are as follows.
- PyTorch
- Tensorflow
- Darknet(YOLO)
- Caffe
- ONNX
OpenPose uses models trained with the Caffe framework. Therefore, you can follow the method for loading Caffe models provided by OpenCV's dnn module.
Running the OpenPose model in OpenCV
Various network models have been supported since OpenCV 3.3. However, versions prior to 4.2 do not provide GPU acceleration with NVIDIA CUDA. Therefore, if you use the OpenCV 4.1 provided by JetPack 4.5, you can only use the CPU, without GPU acceleration.
A simple way to use OpenCV's dnn is as follows.
- Load network model
- Read Image and Prepare blob
- Make prediction(forward)
- Parse results(key points)
Let's create a sample Python program that uses OpenCV dnn.
When using the OpenPose module, you receive an image with the keypoints already connected directly from the module. However, when using the OpenCV dnn module, you only receive confidence values for each pixel. (These values can also be obtained from the OpenPose module.) Using these values, I need to draw the keypoints on the image myself.
Load network model
I am using OpenPose models trained on Caffe Framework. Caffe models have 2 files –
- .prototxt file which specifies the architecture of the neural network – how the different layers are arranged etc.
- .caffemodel file which stores the weights of the trained model
I'm going to use OpenPose's default BODY_25 model, and for OpenCV 4.1 compatibility I will use CPU mode first.
protoFile = "/usr/local/src/openpose-1.7.0/models/pose/body_25/pose_deploy.prototxt"
weightsFile = "/usr/local/src/openpose-1.7.0/models/pose/body_25/pose_iter_584000.caffemodel"
net = cv2.dnn.readNetFromCaffe(protoFile, weightsFile)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)   # run inference on the CPU
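If your OpenCV build is 4.2 or newer and was compiled with CUDA support, the same network can be switched to GPU inference. This is a minimal sketch assuming such a build; the use_gpu flag is illustrative (for example, parsed from a --device argument), and with the OpenCV 4.1 from JetPack 4.5 the CUDA backend is simply not available.

# Minimal sketch: choose CPU or CUDA inference (CUDA needs OpenCV >= 4.2 built with CUDA).
net = cv2.dnn.readNetFromCaffe(protoFile, weightsFile)
if use_gpu:   # hypothetical flag, e.g. from a --device command-line option
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
else:
    net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
    net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)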
Read Image and Prepare blob
The input image read with OpenCV must be converted to an input blob (as in Caffe) so that it can be fed to the network. The blobFromImage function does this job, converting the image from OpenCV format to the Caffe blob format.
img = cv2.imread('/usr/local/src/image/blackpink/blackpink.png')
frameWidth = img.shape[1]
frameHeight = img.shape[0]
inHeight = 368
inWidth = int((inHeight / frameHeight) * frameWidth)
Blob = cv2.dnn.blobFromImage(img, 1.0 / 255, (inWidth, inHeight), (0, 0, 0), swapRB=False, crop=False)
Make Prediction
Once the image is passed to the model, a prediction can be made with a single line of code. The forward method of OpenCV's dnn network makes a forward pass through the network, and its output is the prediction result.
net.setInput(Blob)
output = net.forward()
Parse results(key points)
The output is a 4D matrix :
- The first dimension is the image ID. For a single input image, this dimension is 1.
- The second dimension is the index of a keypoint channel. The model produces confidence maps and Part Affinity maps, which are all concatenated. For the BODY_25 model the output consists of 78 channels – 25 keypoint confidence maps + 1 background + 26*2 Part Affinity Maps. Similarly, MPI produces 44 channels. I will use only the first 25 channels of the BODY_25 model, which correspond to the keypoints. This index identifies the corresponding keypoint in the image (W x H): if the index is 0 you can find the nose, if it is 1 the neck, and so on. If the pose model is not BODY_25 but COCO or MPI, the index numbers and the corresponding body parts will differ.
- The third dimension is the height of the output map.
- The fourth dimension is the width of the output map.
I check whether each keypoint is present in the image or not. I get the location of a keypoint by finding the maximum of its confidence map, and I use a threshold to reduce false detections.
Be Careful: If you use other models such as COCO or MPI, the channel layout of the output changes. With COCO, there are 19 keypoint channels (18 keypoints + background) and 38 PAF channels (19 x 2).
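To get a feel for this layout, you can print the output shape and locate a single keypoint. The sketch below is only illustrative; it assumes the BODY_25 model and the img and output variables from the code above, and the exact spatial size of the output map depends on the blob resolution. It also finds only the strongest response, so it is enough for a single-person image.

# Sketch: inspect the 4D output and locate the nose (channel 0) for BODY_25.
print(output.shape)   # e.g. (1, 78, H, W) for BODY_25; H and W depend on the blob size

probMap = output[0, 0, :, :]                 # confidence map of keypoint 0 (Nose)
_, prob, _, point = cv2.minMaxLoc(probMap)   # strongest response in the map
if prob > 0.1:                               # simple threshold against false detections
    # scale the map coordinates back to the original image size
    x = int(img.shape[1] * point[0] / output.shape[3])
    y = int(img.shape[0] * point[1] / output.shape[2])
    print("Nose at (%d, %d) with confidence %.2f" % (x, y, prob))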
Probability image extraction
The probability distribution image for each part can be extracted as follows.
for index in range(25):
    probMap = output[0, index, :, :]
probMap is a two-dimensional H x W matrix, similar to a grayscale image. Each pixel value is the probability that the corresponding keypoint exists there, ranging from 0.0 to 1.0. Therefore, even if you view this probMap with the imshow or imwrite functions, it is hard to see anything with the naked eye.
To check it visually, multiply by 255 to map it to the 0 to 255 image pixel range, and then use the imshow or imwrite functions.
key_points = {
    0: "Nose", 1: "Neck", 2: "RShoulder", 3: "RElbow", 4: "RWrist",
    5: "LShoulder", 6: "LElbow", 7: "LWrist", 8: "MidHip", 9: "RHip",
    10: "RKnee", 11: "RAnkle", 12: "LHip", 13: "LKnee", 14: "LAnkle",
    15: "REye", 16: "LEye", 17: "REar", 18: "LEar", 19: "LBigToe",
    20: "LSmallToe", 21: "LHeel", 22: "RBigToe", 23: "RSmallToe", 24: "RHeel",
    25: "Background"
}

for index in range(25):
    probMap = output[0, index, :, :] * 255
    cv2.imwrite('/tmp/proMap_%s.jpg' % key_points[index], probMap)
If you run the code above, you can visually check the probability values for each part, as shown in the following figure. A white point is a value close to 255, which means a probability close to 1, so a keypoint is very likely to exist there.
If you want to compare the positions against the original image accurately, you can use alpha blending as follows.
alpha = 0.3
for index in range(26):
    probMap = output[0, index, :, :] * 255
    probMap = cv2.resize(probMap, (img.shape[1], img.shape[0]))
    probMap = np.asarray(probMap, np.uint8)
    probMap = cv2.cvtColor(probMap, cv2.COLOR_GRAY2BGR)
    dst = cv2.addWeighted(img, alpha, probMap, (1 - alpha), 0)
    cv2.imwrite('/tmp/combined_%s.jpg' % key_points[index], dst)
If you run the code above, you get the following image, with the probMap and the original image blended.
PAF (Part Affinity Field) image extraction
As can be seen from the probability images, when using OpenCV's dnn module all occurrences of a given keypoint appear together in one map. For a single-person image this is not a problem, but for an image containing multiple people, the keypoints must be connected to the correct person.
This step is unnecessary when using OpenPose's Python module, which already groups keypoints per person.
So I soon run into the following problem: when trying to connect a nose to a neck in an image with multiple people, it is not easy to decide which connection is valid.
The information needed to connect keypoints on a per-person basis can be obtained from the PAFs. As described above, the first 25 output channels hold the probability distributions for the 25 body parts, and, after the background channel, the remaining 52 channels contain the information for linking these body parts.
PAF
A PAF is a vector representation of how a keypoint relates to the next keypoint it should connect to.
The figure below shows PAF vectors when keypoints 1 to 7 connect to the next keypoints.
<image from Implementation of PAF (Openpose) Pose Detection Network & its Training Accelerations on GCP>
This image is a PAF image of the neck-nose joint. PAF maps always come in pairs: if the keypoints forming a joint are A and B, one map holds the x components and the other the y components of the vectors pointing from A to B.
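A PAF channel pair can be visualized in the same way as the probability maps. The sketch below is illustrative only; it assumes the BODY_25 channel layout described above (confidence maps in channels 0 to 25, PAF maps from channel 26 onward), and which limb a given channel pair belongs to is defined by the mapIdx table in the full source further below.

# Sketch: visualize the magnitude of one PAF channel pair (assumes numpy imported as np).
paf_x = output[0, 26, :, :]                   # x component of one PAF pair
paf_y = output[0, 27, :, :]                   # y component of the same PAF pair
magnitude = np.sqrt(paf_x ** 2 + paf_y ** 2)  # vector length per pixel
magnitude = cv2.resize(magnitude, (img.shape[1], img.shape[0]))
cv2.imwrite('/tmp/paf_magnitude.jpg', np.uint8(np.clip(magnitude * 255, 0, 255)))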
Connecting valid pairs
The process of extracting vectors from the PAF maps and building valid joints is quite complex. Multi-Person Pose Estimation in OpenCV using OpenPose provides an excellent example, so I will borrow it here. Unfortunately, the example in that article only works for the COCO model, so I modified it to work with the BODY_25 model as well.
The key is the NumPy dot operation. The dot product of two unit vectors is largest when they point in the same direction and zero when they are perpendicular. Therefore, if the vector between the two keypoint coordinates forming a joint points in the same direction as the PAF vectors sampled along it, the score is large. This is how a valid keypoint pair is identified.
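The idea can be condensed into a few lines of NumPy: sample the PAF along the line between two candidate keypoints and take the dot product with the unit vector from A to B. This is only a simplified sketch of the scoring step used inside getValidPairs below; the function and variable names are illustrative.

# Simplified sketch of PAF scoring between two candidate keypoints A and B.
# pafX, pafY are the PAF x/y maps (already resized to the image size),
# A and B are (x, y) coordinates; names are illustrative.
import numpy as np

def paf_score(pafX, pafY, A, B, n_samples=10):
    d = np.subtract(B, A).astype(np.float32)
    norm = np.linalg.norm(d)
    if norm == 0:
        return 0.0
    d = d / norm                               # unit vector pointing from A to B
    xs = np.linspace(A[0], B[0], num=n_samples)
    ys = np.linspace(A[1], B[1], num=n_samples)
    # PAF vectors sampled along the candidate limb
    paf_vecs = np.array([[pafX[int(round(y)), int(round(x))],
                          pafY[int(round(y)), int(round(x))]] for x, y in zip(xs, ys)])
    # the dot product is large when the PAF field points from A towards B
    return float(np.mean(paf_vecs @ d))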
OpenPose does not use a top-down method that first finds people and then detects their keypoints. It finds all keypoints first, then connects the valid pairs, and uses this bottom-up approach to work out how many people there are.
def getKeypoints(probMap, threshold=0.1):
    mapSmooth = cv2.GaussianBlur(probMap, (3, 3), 0, 0)
    mapMask = np.uint8(mapSmooth > threshold)
    keypoints = []
    # find the blobs
    contours, _ = cv2.findContours(mapMask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    # for each blob find the maxima
    for cnt in contours:
        blobMask = np.zeros(mapMask.shape)
        blobMask = cv2.fillConvexPoly(blobMask, cnt, 1)
        maskedProbMap = mapSmooth * blobMask
        _, maxVal, _, maxLoc = cv2.minMaxLoc(maskedProbMap)
        keypoints.append(maxLoc + (probMap[maxLoc[1], maxLoc[0]],))
    return keypoints


# Find valid connections between the different joints of all persons present
def getValidPairs(output):
    valid_pairs = []
    invalid_pairs = []
    n_interp_samples = 10
    paf_score_th = 0.1
    conf_th = 0.7
    # loop for every POSE_PAIR
    for k in range(len(mapIdx)):
        # A->B constitute a limb
        pafA = output[0, mapIdx[k][0], :, :]
        pafB = output[0, mapIdx[k][1], :, :]
        pafA = cv2.resize(pafA, (frameWidth, frameHeight))
        pafB = cv2.resize(pafB, (frameWidth, frameHeight))
        # Find the keypoints for the first and second limb
        candA = detected_keypoints[POSE_PAIRS[k][0]]
        candB = detected_keypoints[POSE_PAIRS[k][1]]
        nA = len(candA)
        nB = len(candB)
        # If keypoints for the joint-pair are detected,
        # check every joint in candA with every joint in candB:
        # calculate the distance vector between the two joints,
        # find the PAF values at a set of interpolated points between the joints,
        # and use the dot-product score to mark the connection valid
        if nA != 0 and nB != 0:
            valid_pair = np.zeros((0, 3))
            for i in range(nA):
                max_j = -1
                maxScore = -1
                found = 0
                for j in range(nB):
                    # Find d_ij
                    d_ij = np.subtract(candB[j][:2], candA[i][:2])
                    norm = np.linalg.norm(d_ij)
                    if norm:
                        d_ij = d_ij / norm
                    else:
                        continue
                    # Find p(u)
                    interp_coord = list(zip(np.linspace(candA[i][0], candB[j][0], num=n_interp_samples),
                                            np.linspace(candA[i][1], candB[j][1], num=n_interp_samples)))
                    # Find L(p(u))
                    paf_interp = []
                    for k in range(len(interp_coord)):
                        paf_interp.append([pafA[int(round(interp_coord[k][1])), int(round(interp_coord[k][0]))],
                                           pafB[int(round(interp_coord[k][1])), int(round(interp_coord[k][0]))]])
                    # Find E
                    paf_scores = np.dot(paf_interp, d_ij)
                    avg_paf_score = sum(paf_scores) / len(paf_scores)
                    # Check if the connection is valid:
                    # if the fraction of interpolated vectors aligned with the PAF is higher than the threshold -> valid pair
                    if (len(np.where(paf_scores > paf_score_th)[0]) / n_interp_samples) > conf_th:
                        if avg_paf_score > maxScore:
                            max_j = j
                            maxScore = avg_paf_score
                            found = 1
                # Append the connection to the list
                if found:
                    valid_pair = np.append(valid_pair, [[candA[i][3], candB[max_j][3], maxScore]], axis=0)
            # Append the detected connections to the global list
            valid_pairs.append(valid_pair)
        else:
            # If no keypoints are detected
            print("No Connection : k = {}".format(k))
            invalid_pairs.append(k)
            valid_pairs.append([])
    return valid_pairs, invalid_pairs


# This function creates a list of keypoints belonging to each person
# For each detected valid pair, it assigns the joint(s) to a person
def getPersonwiseKeypoints(valid_pairs, invalid_pairs):
    # the last number in each row is the overall score
    personwiseKeypoints = -1 * np.ones((0, nPoints + 1))
    for k in range(len(mapIdx)):
        if k not in invalid_pairs:
            partAs = valid_pairs[k][:, 0]
            partBs = valid_pairs[k][:, 1]
            indexA, indexB = np.array(POSE_PAIRS[k])
            for i in range(len(valid_pairs[k])):
                found = 0
                person_idx = -1
                for j in range(len(personwiseKeypoints)):
                    if personwiseKeypoints[j][indexA] == partAs[i]:
                        person_idx = j
                        found = 1
                        break
                if found:
                    personwiseKeypoints[person_idx][indexB] = partBs[i]
                    personwiseKeypoints[person_idx][-1] += keypoints_list[partBs[i].astype(int), 2] + valid_pairs[k][i][2]
                # if no partA is found in the subset, create a new subset
                elif not found and k < (nPoints - 1):
                    row = -1 * np.ones(nPoints + 1)
                    row[indexA] = partAs[i]
                    row[indexB] = partBs[i]
                    # add the keypoint scores for the two keypoints and the paf score
                    row[-1] = sum(keypoints_list[valid_pairs[k][i, :2].astype(int), 2]) + valid_pairs[k][i][2]
                    personwiseKeypoints = np.vstack([personwiseKeypoints, row])
    return personwiseKeypoints
<Part of the code that finds valid pairs>
You can download the entire source code from my Github.
Now run the program and check the results.
root@spypiggy-nx:/usr/local/src/study/opencv_dnn# python3 op_cv.py --image=/usr/local/src/image/walking.jpg --model=body25
root@spypiggy-nx:/usr/local/src/study/opencv_dnn# python3 op_cv.py --image=/usr/local/src/image/walking.jpg --model=coco
So far, I have briefly looked at how to run OpenPose models with OpenCV's dnn module. Next, I will install OpenCV 4.5 and measure the speed of the dnn module's GPU acceleration.
Install OpenCV 4.5 and rebuild OpenPose
JetPack 4.5 provides OpenCV 4.1 by default. Therefore, to use dnn with the GPU in OpenCV, it is necessary to upgrade to OpenCV 4.5 (any version 4.2 or higher will work).
For the OpenCV 4.5 upgrade, refer to the following article.
And if you installed OpenPose 1.7 while OpenCV 4.1 was installed, it is recommended to rebuild OpenPose 1.7 after installing OpenCV 4.5. Even without rebuilding OpenPose 1.7, there is no problem using OpenCV's dnn module as introduced in this article. However, if you use the Python module provided by OpenPose, an error occurs because of the OpenCV version mismatch, as follows.
root@spypiggy-nx:/usr/local/src/study/opencv_dnn# python3 original.py
Traceback (most recent call last):
  File "sample.py", line 2, in <module>
    from openpose import pyopenpose as op
  File "/usr/lib/python3.6/dist-packages/openpose/__init__.py", line 1, in <module>
    from . import pyopenpose as pyopenpose
ImportError: libopencv_highgui.so.4.1: cannot open shared object file: No such file or directory
This error occurs because OpenCV was upgraded to 4.5. So, if you rebuild OpenPose with OpenCV 4.5 installed, this error will disappear.
For the OpenPose1.7 installation, refer to the following article.
If you have not installed OpenPose, you can install OpenPose after upgrading OpenCV to 4.5.
GPU mode and CPU mode speed comparison
For testing, I used a roughly 10-second clip from a Chaplin movie.
I measured the time it took to process a total of 238 frames in cpu mode and gpu mode.
root@spypiggy-nx:/usr/local/src/study/opencv_dnn# python3 op_cv_video.py --video=/usr/local/src/image/chaplin.mp4 --model=coco --device=cpu
......
......
Frame[238] processed time[15.89]
Total processed time[3821.73]
avg frame processing rate :16.06

root@spypiggy-nx:/usr/local/src/study/opencv_dnn# python3 op_cv_video.py --video=/usr/local/src/image/chaplin.mp4 --model=coco --device=gpu
......
......
Frame[238] processed time[0.96]
Total processed time[241.08]
avg frame processing rate :1.01
The processing time was reduced to about 1/16, which is an amazing improvement. And you can see that the keypoints are displayed normally in the output video.
OpenPose built in python module vs. OpenCV dnn
import cv2
import numpy as np
from random import randint
import argparse
import sys
import time
from openpose import pyopenpose as op

parser = argparse.ArgumentParser(description='Run keypoint detection')
parser.add_argument("--device", default="gpu", help="Device to inference on")
parser.add_argument("--video", default="/usr/local/src/image/chaplin.mp4", help="Input video")
parser.add_argument("--model", default="body25", help="model : body25 or coco")
args = parser.parse_args()

threshold = 0.2

if args.model == 'body25':
    # BODY_25 model uses 25 points
    key_points = {
        0: "Nose", 1: "Neck", 2: "RShoulder", 3: "RElbow", 4: "RWrist",
        5: "LShoulder", 6: "LElbow", 7: "LWrist", 8: "MidHip", 9: "RHip",
        10: "RKnee", 11: "RAnkle", 12: "LHip", 13: "LKnee", 14: "LAnkle",
        15: "REye", 16: "LEye", 17: "REar", 18: "LEar", 19: "LBigToe",
        20: "LSmallToe", 21: "LHeel", 22: "RBigToe", 23: "RSmallToe", 24: "RHeel",
        25: "Background"
    }
    # BODY_25 keypoint pairs
    POSE_PAIRS = [[1,2], [1,5], [2,3], [3,4], [5,6], [6,7],                  # arms, shoulder line
                  [1,8], [8,9], [9,10], [10,11], [8,12], [12,13], [13,14],   # 2 legs
                  [11,24], [11,22], [22,23], [14,21], [14,19], [19,20],      # 2 feet
                  [1,0], [0,15], [15,17], [0,16], [16,18],                   # face
                  [2,17], [5,18]]
    nPoints = 25
else:
    # COCO model uses 18 points
    key_points = {
        0: "Nose", 1: "Neck", 2: "RShoulder", 3: "RElbow", 4: "RWrist",
        5: "LShoulder", 6: "LElbow", 7: "LWrist", 8: "RHip", 9: "RKnee",
        10: "RAnkle", 11: "LHip", 12: "LKnee", 13: "LAnkle", 14: "REye",
        15: "LEye", 16: "REar", 17: "LEar", 18: "Background"
    }
    POSE_PAIRS = [[1,2], [1,5], [2,3], [3,4], [5,6], [6,7],
                  [1,8], [8,9], [9,10], [1,11], [11,12], [12,13],
                  [1,0], [0,14], [14,16], [0,15], [15,17],
                  [2,17], [5,16]]
    nPoints = 18

alpha = 0.3
colors = [[0,100,255], [0,100,255], [0,255,255], [0,100,255], [0,255,255], [0,100,255],
          [0,255,0], [255,200,100], [255,0,255], [0,255,0], [255,200,100], [255,0,255],
          [0,0,255], [255,0,0], [200,200,0], [255,0,0], [200,200,0], [0,0,0]]

cap = cv2.VideoCapture(args.video)
ret, img = cap.read()
if ret == False:
    print('Video File Read Error')
    sys.exit(0)

frameHeight, frameWidth, c = img.shape
fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
out_video = cv2.VideoWriter('/tmp/%s-%s-output.mp4' % (args.model, args.device), fourcc,
                            cap.get(cv2.CAP_PROP_FPS), (frameWidth, frameHeight))
frame = 0
inHeight = 368
t_elapsed = 0.0

params = dict()
params["model_folder"] = "/usr/local/src/openpose-1.7.0/models/"
params["net_resolution"] = "368x-1"
params["display"] = "0"   # speed up the processing time

opWrapper = op.WrapperPython()
opWrapper.configure(params)
opWrapper.start()

while cap.isOpened():
    f_st = time.time()
    ret, img = cap.read()
    if ret == False:
        break
    frame += 1

    datum = op.Datum()
    datum.cvInputData = img
    opWrapper.emplaceAndPop(op.VectorDatum([datum]))
    human_count = len(datum.poseKeypoints)
    frameClone = img.copy()

    # draw keypoint circles on the output frame
    for human in range(human_count):
        for j in range(nPoints):
            if datum.poseKeypoints[human][j][2] > threshold:
                center = (int(datum.poseKeypoints[human][j][0]), int(datum.poseKeypoints[human][j][1]))
                cv2.circle(frameClone, center, 3, colors[j % 17], -1, cv2.LINE_AA)

    # draw limb lines
    for human in range(human_count):
        for pair in POSE_PAIRS:
            if datum.poseKeypoints[human][pair[0]][2] > threshold and datum.poseKeypoints[human][pair[1]][2] > threshold:
                S = (int(datum.poseKeypoints[human][pair[0]][0]), int(datum.poseKeypoints[human][pair[0]][1]))
                T = (int(datum.poseKeypoints[human][pair[1]][0]), int(datum.poseKeypoints[human][pair[1]][1]))
                cv2.line(frameClone, S, T, colors[pair[0] % 17], 3, cv2.LINE_AA)

    out_video.write(frameClone)
    f_elapsed = time.time() - f_st
    t_elapsed += f_elapsed
    print('Frame[%d] processed time[%4.2f]' % (frame, f_elapsed))

print('Total processed time[%4.2f]' % (t_elapsed))
print('avg frame processing rate :%4.2f' % (t_elapsed / frame))
cap.release()
out_video.release()
<op_video.py using OpenPose's Python module>
root@spypiggy-nx:/usr/local/src/study/opencv_dnn# python3 op_video.py --model=coco
Starting OpenPose Python Wrapper...
Auto-detecting all available GPUs... Detected 1 GPU(s), using 1 of them starting at GPU 0.
......
......
Frame[238] processed time[0.34]
Total processed time[88.42]
avg frame processing rate :0.37
When using OpenPose's Python module, the performance is almost three times that of using OpenCV dnn GPU mode.
Wrapping up
I implemented OpenPose keypoint detection using OpenCV's dnn module. Since this method loads and runs the Caffe models directly in OpenCV, it has the advantage that you do not need to install OpenPose; downloading the models is enough.
When using the OpenPose module, it was easy to apply because it returns keypoints already grouped by person. However, when using the OpenCV dnn module, only raw keypoint maps are provided, so there is the extra difficulty of connecting valid keypoints per person using the PAF vectors.
You can implement it yourself by referring to the example introduced above or the example provided on learnopencv's GitHub site, but it takes a lot of study to understand the whole process. So if you are only interested in keypoint extraction, rather than working with PAF vectors, it is much easier to use the Python module provided by OpenPose. When using OpenPose's Python module, the performance is also about 3 times better than when using OpenCV dnn.
The source code can be downloaded from my github.