Thursday, February 18, 2021

OpenPose 1.7 Python Programming on Jetson Series #2 (video)

This description is based on JetPack 4.5 and OpenPose 1.7.0 (released on November 17, 2020). I have tested on the Jetson Nano and Xavier NX, but it should work the same way on the Xavier and TX2.

In the previous article, we saw how to detect body keypoints with OpenPose from Python.

This time, we will look at how to use video files and webcams.

Prerequisites

This article assumes that you already have OpenPose 1.7.0 with the Python API built and working on your Jetson, as covered earlier in this series.

Traditional method using OpenCV

There are many well-known ways to read frames from a video or webcam using OpenCV, and I have used this approach in many articles. You extract images frame by frame from the video or webcam and then, as in the previous article, process each frame the same way as a still image.
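
Stripped of OpenPose, this is just an ordinary OpenCV capture loop. Below is a bare-bones sketch of that loop (the file name is only a placeholder); the full script that follows simply runs OpenPose on each frame inside it.

import cv2

cap = cv2.VideoCapture("video.avi")   # or cv2.VideoCapture(0) for the first webcam
while True:
    ret, img = cap.read()             # ret is False when no more frames are available
    if not ret:
        break
    # process 'img' here exactly as you would a still image
    cv2.imshow("frame", img)
    if cv2.waitKey(1) == 27:          # press ESC to quit early
        break
cap.release()
cv2.destroyAllWindows()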

# From Python
# It requires OpenCV installed for Python
import sys
import cv2
import os, time
from sys import platform
import argparse
from openpose import pyopenpose as op
from datetime import datetime

try:
    # Flags
    parser = argparse.ArgumentParser()
    parser.add_argument("--video_path", default="/usr/local/src/openpose-1.7.0/examples/media/video.avi", help="Process an video. ")
    args = parser.parse_known_args()

    # Custom Params (refer to include/openpose/flags.hpp for more parameters)
    params = dict()
    params["model_folder"] = "/usr/local/src/openpose-1.7.0/models/"
    params["net_resolution"] = "320x-1" 
    # Add others in path?
    for i in range(0, len(args[1])):
        curr_item = args[1][i]
        if i != len(args[1])-1: next_item = args[1][i+1]
        else: next_item = "1"
        if "--" in curr_item and "--" in next_item:
            key = curr_item.replace('-','')
            if key not in params:  params[key] = "1"
        elif "--" in curr_item and "--" not in next_item:
            key = curr_item.replace('-','')
            if key not in params: params[key] = next_item

    # Starting OpenPose
    opWrapper = op.WrapperPython()
    opWrapper.configure(params)
    opWrapper.start()
    
    # Process video
    cap = cv2.VideoCapture(args[0].video_path)
    color = (0,0,255) #BGR
    thickness = -1      #draw inner space of circle
    font = cv2.FONT_HERSHEY_SIMPLEX
    while True:
        s = datetime.now() 
        ret,img = cap.read()
        if ret == False:
            break
        datum = op.Datum()
        datum.cvInputData = img
        opWrapper.emplaceAndPop(op.VectorDatum([datum]))
        human_count = len(datum.poseKeypoints) if datum.poseKeypoints is not None else 0  # poseKeypoints is None when nobody is detected
        # Display Image
        
        for human in range(human_count):
            for j in range(25):
                if datum.poseKeypoints[human][j][2] > 0.01:
                    center = (int(datum.poseKeypoints[human][j][0]) ,  int(datum.poseKeypoints[human][j][1]))
                    cv2.circle(img, center, 3, color, thickness)
        e = datetime.now()
        delta = e - s
        sec = delta.total_seconds()   
        
        cv2.putText(img,'FPS[%5.2f] %d person detected'%(1/( sec),human_count),(20,30), font, 1,(255,255,255),1,cv2.LINE_AA)
        cv2.imshow("OpenPose 1.7.0 - Tutorial Python API", img)
        cv2.waitKey(1)

except Exception as e:
    print(e)
    sys.exit(-1)
    
cap.release()
cv2.destroyAllWindows()
    

<01_2_body_from_video.py>

I drew circles at the keypoint coordinates on the original image to distinguish the result from the default output image rendered by OpenPose.

root@spypiggy-nx:/usr/local/src/study# python3 01_2_body_from_video.py
Starting OpenPose Python Wrapper...
Auto-detecting all available GPUs... Detected 1 GPU(s), using 1 of them starting at GPU 0.


You should probably see output like this:


If you are using a webcam, change the parameter of cv2.VideoCapture to a number; the rest of the code does not need to be modified.

cap = cv2.VideoCapture(0)  # first webcam


Caffe Method

This method is shown in the OpenPose examples.

However, it is difficult to explain in detail because the documentation provided by the OpenPose team is insufficient. Personally, I prefer the OpenCV method introduced earlier.

# From Python
# It requires OpenCV installed for Python
import sys
import cv2
import os, time
from sys import platform
import argparse
from openpose import pyopenpose as op
from datetime import datetime

def display(sec, datums):
    datum = datums[0]
    img = datum.cvInputData[:, :, :]
    human_count = len(datum.poseKeypoints) if datum.poseKeypoints is not None else 0  # poseKeypoints is None when nobody is detected
    color = (0,0,255) #BGR
    thickness = -1      #draw inner space of circle
    font = cv2.FONT_HERSHEY_SIMPLEX

    for human in range(human_count):
        for j in range(25):
            if datum.poseKeypoints[human][j][2] > 0.01:
                center = (int(datum.poseKeypoints[human][j][0]) ,  int(datum.poseKeypoints[human][j][1]))
                cv2.circle(img, center, 3, color, thickness)

    cv2.putText(img,'FPS[%6.2f] %d person detected'%(1.0/( sec),human_count),(20,30), font, 1,(255,0,0),1,cv2.LINE_AA)
    cv2.imshow("OpenPose 1.7.0 - Tutorial Python API", img)
    key = cv2.waitKey(1)
    return (key == 27)


def printKeypoints(datums):
    datum = datums[0]
    print("Body keypoints: \n" + str(datum.poseKeypoints))
    print("Face keypoints: \n" + str(datum.faceKeypoints))
    print("Left hand keypoints: \n" + str(datum.handKeypoints[0]))
    print("Right hand keypoints: \n" + str(datum.handKeypoints[1]))


try:
    # Flags
    parser = argparse.ArgumentParser()
    parser.add_argument("--no_display", action="store_true", help="Disable display.")
    args = parser.parse_known_args()

    # Custom Params (refer to include/openpose/flags.hpp for more parameters)
    params = dict()
    params["model_folder"] = "/usr/local/src/openpose-1.7.0/models/"
    params["net_resolution"] = "320x256" 
params["camera_resolution"] = "640x480" # Add others in path? for i in range(0, len(args[1])): curr_item = args[1][i] if i != len(args[1])-1: next_item = args[1][i+1] else: next_item = "1" if "--" in curr_item and "--" in next_item: key = curr_item.replace('-','') if key not in params: params[key] = "1" elif "--" in curr_item and "--" not in next_item: key = curr_item.replace('-','') if key not in params: params[key] = next_item # Construct it from system arguments # op.init_argv(args[1]) # oppython = op.OpenposePython() # Starting OpenPose opWrapper = op.WrapperPython(op.ThreadManagerMode.AsynchronousOut) opWrapper.configure(params) opWrapper.start() # Main loop userWantsToExit = False while not userWantsToExit: # Pop frame s = datetime.now() datumProcessed = op.VectorDatum() if opWrapper.waitAndPop(datumProcessed): e = datetime.now() delta = e - s sec = delta.total_seconds() if not args[0].no_display: # Display image userWantsToExit = display(sec, datumProcessed) print('FPS:%6.2f Total [%d] frames return'%(1.0 / (sec), len(datumProcessed))) #printKeypoints(datumProcessed) else: break except Exception as e: print(e) sys.exit(-1)

<12_1_asynchronous_custom_output.py>

I set the camera resolution to '640x480'.

root@spypiggy-nx:/usr/local/src/study# python3 12_1_asynchronous_custom_output.py
Starting OpenPose Python Wrapper...
[ WARN:0] global /home/nvidia/host/build_opencv/nv_opencv/modules/videoio/src/cap_gstreamer.cpp (933) open OpenCV | GStreamer warning: Cannot query video position: status=0, value=-1, duration=-1
Desired webcam resolution 640x480 could not being set. Final resolution: 2304x1536 in /usr/local/src/openpose-1.7.0/src/openpose/producer/webcamReader.cpp:WebcamReader():37
Auto-detecting camera index... Detected and opened camera 0.
Auto-detecting all available GPUs... Detected 1 GPU(s), using 1 of them starting at GPU 0.

You should probably see output like this:



Everything is processed in the next two lines. However, there is not enough documentation on this part, so if you want to learn more, you have to look directly into the source code.

        datumProcessed = op.VectorDatum()
        if opWrapper.waitAndPop(datumProcessed):

You can check the comments on opWrapper.waitAndPop in the /include/openpose/wrapper/wrapper.hpp file.

The display function in the example provided by OpenPose shows datum.cvOutputData, the image rendered by OpenPose, but I decided to draw the keypoint coordinates on the original captured image, as in the earlier OpenCV example. The original image is datum.cvInputData.
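
If you would rather show the image rendered by OpenPose itself, the display function only needs to use datum.cvOutputData instead of drawing circles. A minimal variant, keeping the same function signature as the script above, could look like this:

import cv2

def display(sec, datums):
    # Show the frame already rendered by OpenPose (skeleton overlay included)
    # instead of drawing keypoint circles on the raw input image ourselves.
    datum = datums[0]
    img = datum.cvOutputData          # rendered image; cvInputData is the raw frame
    cv2.imshow("OpenPose 1.7.0 - Tutorial Python API", img)
    return (cv2.waitKey(1) == 27)     # ESC ends the main loop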

I have occasionally seen malfunctions when the webcam is set to a high resolution and the result is then output to the screen: the FPS value was reported abnormally, or the process was terminated. This phenomenon does not seem to be related to the memory shortage described previously. If anyone knows the exact cause, please let me know.
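
If you run into this and switch back to the OpenCV method from the first example, you can at least request a lower capture resolution from OpenCV yourself before the read loop. This is only a sketch of that idea, not a verified fix, and whether the request is honored depends on the camera driver:

import cv2

# Ask the webcam for 640x480 before reading frames; the driver may ignore the request.
cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
print("actual resolution: %dx%d" % (cap.get(cv2.CAP_PROP_FRAME_WIDTH),
                                    cap.get(cv2.CAP_PROP_FRAME_HEIGHT)))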


Wrapping up

Processing video or webcams in OpenPose is done frame by frame, as in most machine learning frameworks. You can use the traditional OpenCV method or the API provided by OpenPose. Personally, I would like the OpenPose team to provide a more detailed manual for the API.

And since, as of now, I don't understand that API well enough, I find it much more comfortable and natural to use OpenCV's VideoCapture function.

The source code can be downloaded from my GitHub.

