Thursday, December 5, 2019

JetsonNano - Object detection using TensorFlow - 2. Boost up the FPS using TensorRT

In the last article, I downloaded a TensorFlow object detection model and ran a simple test. However, it ran at a fairly low FPS, so there is a clear need for speed improvement.
If you haven't read my previous post, please read it first.

See my last post on using TensorRT.


Download the pre-trained models

Download the model you want to use from https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md.

I'll test the ssd_mobilenet_v1_coco_2018_01_28 model which I downloaded in the previous post.

The downloaded model path is "models/research/object_detection/model". In my case it is "/usr/local/src/models/research/object_detection/model".
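If you haven't downloaded the model yet, a minimal Python sketch like the following can fetch and extract it. The download URL follows the TensorFlow detection model zoo naming convention and the destination directory object_detection/model is an assumption; adjust both to your environment.


# Minimal sketch: download and extract the SSD MobileNet v1 COCO archive.
# The URL follows the model zoo naming convention (assumption).
import os
import tarfile
import six.moves.urllib as urllib

MODEL_NAME = 'ssd_mobilenet_v1_coco_2018_01_28'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'
DEST_DIR = 'object_detection/model'

if not os.path.exists(DEST_DIR):
    os.makedirs(DEST_DIR)
archive_path = os.path.join(DEST_DIR, MODEL_FILE)
urllib.request.urlretrieve(DOWNLOAD_BASE + MODEL_FILE, archive_path)
with tarfile.open(archive_path) as tar:
    tar.extractall(path=DEST_DIR)
print('extracted to', os.path.join(DEST_DIR, MODEL_NAME))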


Create TensorRT models

Let's convert the TensorFlow model to a TensorRT model. I already made a Python script for this in my previous post, so I copied pb_viewer.py to the models/research directory.

This is my research/object_detection directory


object_detection
│
├── model
│   ├── ssd_mobilenet_v1_coco_2018_01_28
│   │   └── frozen_inference_graph.pb
│   └── ssd_mobilenet_v1_coco_2018_01_28.tar.gz
├── ....
├── ....


Now create the log file for TensorBoard.


cd /usr/local/src/models/research
mkdir object_detection/log
python3 pb_viewer.py --model_dir=object_detection/model/ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_graph.pb --log_dir=object_detection/log
root@JetsonNano:/usr/local/src/models/research# ls -al object_detection/log/
total 28732
drwxr-xr-x  2 root root     4096 12월  5 21:09 .
drwxr-xr-x 28 root root     4096 12월  5 21:08 ..
-rw-r--r--  1 root root 29412857 12월  5 21:09 events.out.tfevents.1575547773.JetsonNano


Then run TensorBoard with the --logdir option pointing to the log directory.


root@JetsonNano:/usr/local/src/models/research# tensorboard --logdir=./object_detection/log/
2019-12-05 21:12:37.559666: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
TensorBoard 1.14.0 at http://JetsonNano:6006/ (Press CTRL+C to quit)


Now connect to port 6006 of the Jetson Nano in a web browser. It may take a minute.

There's just one node, and its name is "import".



Double-click this node and it will expand like this. Look carefully at the upper right corner, where the output nodes are listed.


There are 4 final output nodes; click each one and you can see its information in the right panel. These names, "detection_boxes", "detection_scores", "num_detections", and "detection_classes", will be used for the model conversion.
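If you'd rather not start TensorBoard just to check these names, a minimal sketch like the following (assuming TensorFlow 1.x, as used throughout this post) prints node names directly from the frozen graph; the detection outputs usually appear near the end of the node list.


# Minimal sketch: print the last node names of a frozen graph to spot the output nodes.
import tensorflow as tf

PB_FILE = 'object_detection/model/ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_graph.pb'

graph_def = tf.GraphDef()
with tf.gfile.GFile(PB_FILE, 'rb') as f:
    graph_def.ParseFromString(f.read())

# the output nodes of a frozen detection graph are typically among the last nodes
for node in graph_def.node[-10:]:
    print(node.name, '(', node.op, ')')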



The following code converts a TensorFlow model to a TensorRT model. It is a minor modification of code from another article I wrote. You can download it at https://github.com/raspberry-pi-maker/NVIDIA-Jetson/blob/master/conversion%20Tensorflow%20model%20to%20TensorRT/tfmodel_2_trt.py.


import argparse
import sys, os
import time
import tensorflow as tf
from tf_trt_models.detection import download_detection_model, build_detection_graph
ver=tf.__version__.split(".")
if(int(ver[0]) == 1 and int(ver[1]) <= 13):
#if tensorflow version <= 1.13.1, use this module
    print('tf Version <= 1.13')
    import tensorflow.contrib.tensorrt as trt
else:
#if tensorflow version > 1.13.1, use this module instead
    print('tf Version > 1.13')
    from tensorflow.python.compiler.tensorrt import trt_convert as trt



def get_frozen_graph(graph_file):
  """Read Frozen Graph file from disk."""
  with tf.gfile.FastGFile(graph_file, "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
  return graph_def


parser = argparse.ArgumentParser(description='tf network model conversion to tensorrt')
parser.add_argument('--tfmodel', type=str, 
                        help='source tensorflow frozen model (pb file)')
parser.add_argument('--trtmodel', type=str, 
                        help='target tensorRT optimized model path')
parser.add_argument('--outputs', type=str, 
                        help="comma-separated output node names of the tf model")
parser.add_argument('--precision', type=str, default='FP16',
                        help="FP16, FP32, INT8")
parser.add_argument('--max_batch_size', type=int, default=1,
                        help="batch size , default :1")
parser.add_argument('--max_workspace_size_bytes', type=int, default=2,
                        help="max_workspace_size(GB) , default :2")
args = parser.parse_args()

frozen_name = args.tfmodel
frozen_graph = get_frozen_graph(frozen_name)
# convert (optimize) frozen model to TensorRT model
your_outputs = args.outputs.split(',')

start = time.time()


trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,# frozen model
    outputs=your_outputs,
    is_dynamic_op=True,
    minimum_segment_size=3,
    maximum_cached_engines=int(1e3),
    max_batch_size=args.max_batch_size,# specify your max batch size
    max_workspace_size_bytes=args.max_workspace_size_bytes*(10**9),# specify the max workspace (2GB)
    precision_mode=args.precision) # precision, can be "FP32" (32 floating point precision) or "FP16"

elapsed = time.time() - start
print('Tensorflow model => TensorRT model takes : %f'%(elapsed))

#write the TensorRT model to be used later for inference
rt_name = args.trtmodel
with tf.gfile.FastGFile(rt_name , 'wb') as f:
    f.write(trt_graph.SerializeToString())
<tfmodel_2_trt.py>


Let's run this code to create a TensorRT model.


python3 tfmodel_2_trt.py \
--tfmodel object_detection/model/ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_graph.pb \
--trtmodel object_detection/model/ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_rt_graph.pb \
--outputs "detection_boxes,detection_classes,detection_scores,num_detections"

root@JetsonNano:/usr/local/src/models/research# ls -al object_detection/model/ssd_mobilenet_v1_coco_2018_01_28
total 83580
drwxr-xr-x 2 root   root     4096 12월  5 23:11 .
drwxr-xr-x 3 root   root     4096 12월  5 20:56 ..
-rw-r--r-- 1 345018 5000 29103956  2월  2  2018 frozen_inference_graph.pb
-rw-r--r-- 1 root   root 56471276 12월  5 23:11 frozen_inference_rt_graph.pb

Be careful: the --outputs parameter must match the output node names found in TensorBoard.
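To confirm that the conversion actually replaced parts of the graph with TensorRT engines, you can count the TRTEngineOp nodes in the converted model. This is a minimal sketch, assuming the converted file path used above:


# Minimal sketch: count TensorRT engine ops in the converted graph.
import tensorflow as tf

TRT_PB = 'object_detection/model/ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_rt_graph.pb'

graph_def = tf.GraphDef()
with tf.gfile.GFile(TRT_PB, 'rb') as f:
    graph_def.ParseFromString(f.read())

trt_engine_ops = [n for n in graph_def.node if n.op == 'TRTEngineOp']
print('TRTEngineOp nodes : %d' % len(trt_engine_ops))
print('total nodes       : %d' % len(graph_def.node))


If the TRTEngineOp count is 0, no segment was converted and you should not expect any speedup.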


Boost up with TensorRT model

This is object detection code that uses a webcam. You can pass either the TensorRT model or the original TensorFlow model to the --trtmodel parameter.



import argparse
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import time
import tarfile
import tensorflow.contrib.tensorrt as trt
import tensorflow as tf
import zipfile
import cv2

from distutils.version import StrictVersion
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from object_detection.utils import ops as utils_ops

if StrictVersion(tf.__version__) < StrictVersion('1.12.0'):
  raise ImportError('Please upgrade your TensorFlow installation to v1.12.*.')

from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

tf_sess = None
parser = argparse.ArgumentParser(description='object_detection using  tensorRT')
parser.add_argument('--trtmodel', type=str, required=True, help='target tensorRT optimized model path')
args = parser.parse_args()

graph_def = tf.GraphDef()
with tf.gfile.GFile(args.trtmodel, 'rb') as fid:
  graph_def.ParseFromString(fid.read())

PATH_TO_LABELS = './object_detection/data/mscoco_label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)

def load_image_into_numpy_array(image):
  return image    
#   (im_height, im_width) = image.shape
#   return np.array(image.getdata()).reshape(
#       (im_height, im_width, 3)).astype(np.uint8)


# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)

def load_graph():
  gf = tf.GraphDef()
  with tf.gfile.GFile(args.trtmodel, 'rb') as fid:
    gf.ParseFromString(fid.read())
  return  gf

def make_session(graph_def):
  global tf_sess
  tf_config = tf.ConfigProto()
  tf_config.gpu_options.allow_growth = True
  #tf_sess = tf.Session(config=tf_config, graph = graph_def)
  tf_sess = tf.Session(config=tf_config)
  tf.import_graph_def(graph_def, name='')

def run_inference_for_single_image2(image):
    global tf_sess, graph_def

    # tf.import_graph_def(graph_def, name='')
    tf_input = tf_sess.graph.get_tensor_by_name('image_tensor' + ':0')
    # tf_scores = tf_sess.graph.get_tensor_by_name('detection_scores:0')
    # tf_boxes = tf_sess.graph.get_tensor_by_name('detection_boxes:0')
    # tf_classes = tf_sess.graph.get_tensor_by_name('detection_classes:0')
    # tf_num_detections = tf_sess.graph.get_tensor_by_name('num_detections:0')
    tensor_dict = {}
    ops = tf.get_default_graph().get_operations()
    all_tensor_names = {output.name for op in ops for output in op.outputs}

    #for key in [ 'num_detections', 'detection_boxes', 'detection_scores', 'detection_classes', 'detection_masks' ]:
    for key in [ 'num_detections', 'detection_boxes', 'detection_scores', 'detection_classes']:
        tensor_name = key + ':0'
        if tensor_name in all_tensor_names:
            tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(tensor_name)


    t = time.time()
    output_dict = tf_sess.run(tensor_dict, feed_dict={tf_input: image})
    elapsed = time.time() - t
    output_dict['num_detections'] = int(output_dict['num_detections'][0])
    output_dict['detection_classes'] = output_dict[ 'detection_classes'][0].astype(np.int64)
    output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
    output_dict['detection_scores'] = output_dict['detection_scores'][0]
    return output_dict, elapsed


graph_def = load_graph()
make_session(graph_def)

cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
ret_val, img = cap.read()
fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
out_video = cv2.VideoWriter('/tmp/output.mp4', fourcc, cap.get(cv2.CAP_PROP_FPS), (640, 480))
count = 0
tfps = 0.0
if not cap.isOpened():
    print("Camera Open Error")
    sys.exit(0)

while cap.isOpened() and count < 500:
    ret_val, dst = cap.read()
    if ret_val == False:
        print("Camera read Error")
        break
    image = cv2.cvtColor(dst, cv2.COLOR_BGR2RGB)
    image_np = load_image_into_numpy_array(image)
    image_np_expanded = np.expand_dims(image_np, axis=0)
    output_dict, elapsed = run_inference_for_single_image2(image_np_expanded)
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks'),
        use_normalized_coordinates=True,
        line_thickness=8)
    fps = 1.0 / elapsed
    tfps += fps
    print("FPS:%f"%(fps))
    cv2.putText(image_np, "FPS: %f" % (fps), (20, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    # convert back to BGR because cv2.imshow and VideoWriter expect BGR images
    image_bgr = cv2.cvtColor(image_np, cv2.COLOR_RGB2BGR)
    cv2.imshow("Object Detection", image_bgr)
    cv2.waitKey(1)
    out_video.write(image_bgr)
    count += 1

print("AVG FPS:%f"%(tfps / 500.0))
  
cv2.destroyAllWindows()  
out_video.release()
cap.release()


The above code runs object detection on 500 frames captured from the webcam.
First, let's run it with the TensorFlow pre-trained model.



python3 object_detection_webcam.py --trtmodel=./object_detection/model/ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_graph.pb
FPS:0.020723
FPS:3.837796
FPS:4.209690
FPS:4.992114
..........
FPS:4.685645
FPS:4.809270
FPS:4.871886
FPS:5.064435
FPS:4.978769
AVG FPS:4.934830


Now let's run the code with the TensorRT model.



python3 object_detection_webcam.py --trtmodel=./object_detection/model/ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_rt_graph.pb
FPS:0.009204
FPS:2.600032
FPS:5.354959
FPS:5.577858
FPS:5.530435
FPS:5.482242
FPS:5.559795
.................
FPS:5.560496
FPS:5.337895
FPS:5.447933
FPS:5.530522
FPS:5.396075
AVG FPS:5.424419



The TensorRT model improves performance by about 10%, but this falls short of expectations.

The output video file with the detection results is stored in the /tmp directory.

Are you wondering why the cup was recognized but the glasses were not in the above image?
The answer is in the "object_detection/data/mscoco_label_map.pbtxt" file, which lists the classes the COCO model was trained on. If you open the pbtxt file, you will find the 90 object names that the model can detect.
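If you want to see the class list yourself, a short sketch using the same label_map_util helper as the webcam code will print every class the model can report:


# Minimal sketch: print the class names defined in the COCO label map.
from object_detection.utils import label_map_util

PATH_TO_LABELS = './object_detection/data/mscoco_label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)
print('%d classes' % len(category_index))
for class_id in sorted(category_index):
    print(class_id, category_index[class_id]['name'])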


Wrapping up

The TensorRT model was about 10% faster, but not as fast as expected. Next, I'll look for ways to get the most out of NVIDIA's own framework rather than TensorFlow.




