If you haven't read my previous posts, please read them first. See my earlier posts on using TensorRT:
- JetsonNano - Human Pose estimation using tensorflow (Boost up the FPS using TensorRT)
- TensorRT(High speed inference engine) - 1. conversion Tensorflow model to TensorRT
Download the pre-trained models
At https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md, download the model that you want to use. I'll test the ssd_mobilenet_v1_coco_2018_01_28 model, which I downloaded in the previous post.
The downloaded model path is "models/research/object_detection/model". In my case, "/usr/local/src/models/research/object_detection/model".
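If you prefer to download and extract the model from a script instead of a browser, something like this works. This is a minimal sketch of my own (not from the original post); it assumes the model zoo's standard download URL and extracts into the object_detection/model directory used below.

import six.moves.urllib as urllib
import tarfile

MODEL_NAME = 'ssd_mobilenet_v1_coco_2018_01_28'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'
MODEL_FILE = MODEL_NAME + '.tar.gz'

# fetch the tarball from the model zoo download server
opener = urllib.request.URLopener()
opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)

# unpack frozen_inference_graph.pb (and the rest of the checkpoint files)
tar_file = tarfile.open(MODEL_FILE)
tar_file.extractall(path='object_detection/model')  # assumed target directory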
Create TensorRT models
Let's convert a tensorflow model to a TensorRT model. I already made a Python script in my previous post. I copied pb_viewer.py to the model directory. This is my research/object_detection directory:
object_detection
│
├── model
│   ├── ssd_mobilenet_v1_coco_2018_01_28
│   │   └── frozen_inference_graph.pb
│   └── ssd_mobilenet_v1_coco_2018_01_28.tar.gz
├── ....
├── ....
Now create the log file for TensorBoard.
cd /usr/local/src/models/research
mkdir object_detection/log
python3 pb_viewer.py --model_dir=object_detection/model/ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_graph.pb --log_dir=object_detection/log

root@JetsonNano:/usr/local/src/models/research# ls -al object_detection/log/
total 28732
drwxr-xr-x  2 root root     4096 12월  5 21:09 .
drwxr-xr-x 28 root root     4096 12월  5 21:08 ..
-rw-r--r--  1 root root 29412857 12월  5 21:09 events.out.tfevents.1575547773.JetsonNano
Then run TensorBoard with the --logdir option, which points at the log directory.
root@JetsonNano:/usr/local/src/models/research# tensorboard --logdir=./object_detection/log/
2019-12-05 21:12:37.559666: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
TensorBoard 1.14.0 at http://JetsonNano:6006/ (Press CTRL+C to quit)
Now connect to port 6006 of the Jetson Nano in a web browser. It may take a minute.
There's just one node, and its name is "import".
Double-click this node and it will expand like this. Look carefully at the upper right corner, where the output nodes exist.
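If you'd rather find the output node names without TensorBoard, a small script like the one below can list them. This is my own sketch, not part of the original post: it treats every node that is never consumed as an input by another node as a graph output, which works for this frozen graph.

import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile('object_detection/model/ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# collect every node name that feeds another node
consumed = set()
for node in graph_def.node:
    for inp in node.input:
        # strip control-dependency markers (^name) and port suffixes (name:0)
        consumed.add(inp.lstrip('^').split(':')[0])

# nodes that nothing consumes are the graph outputs
for node in graph_def.node:
    if node.name not in consumed and node.op != 'Const':
        print(node.name, '(%s)' % node.op)

For this model you should see the same four names we pass to the converter below: detection_boxes, detection_classes, detection_scores, and num_detections.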
The following code converts a tensorflow model to a TensorRT model. This code is a minor modification of the code in another article I wrote. You can download this code at https://github.com/raspberry-pi-maker/NVIDIA-Jetson/blob/master/conversion%20Tensorflow%20model%20to%20TensorRT/tfmodel_2_trt.py.
import argparse
import sys, os
import time
import tensorflow as tf
from tf_trt_models.detection import download_detection_model, build_detection_graph

ver = tf.__version__.split(".")
if int(ver[0]) == 1 and int(ver[1]) <= 13:
    # if tensorflow version <= 1.13.1, use this module
    print('tf Version <= 1.13')
    import tensorflow.contrib.tensorrt as trt
else:
    # if tensorflow version > 1.13.1, use this module instead
    print('tf Version > 1.13')
    from tensorflow.python.compiler.tensorrt import trt_convert as trt


def get_frozen_graph(graph_file):
    """Read Frozen Graph file from disk."""
    with tf.gfile.FastGFile(graph_file, "rb") as f:
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(f.read())
    return graph_def


parser = argparse.ArgumentParser(description='tf network model conversion to tensorrt')
parser.add_argument('--tfmodel', type=str, help='source tensorflow frozen model (pb file)')
parser.add_argument('--trtmodel', type=str, help='target tensorRT optimized model path')
parser.add_argument('--outputs', type=str, help="comma-separated names of the tf model's output nodes")
parser.add_argument('--precision', type=str, default='FP16', help="FP16, FP32, INT8")
parser.add_argument('--max_batch_size', type=int, default=1, help="batch size, default: 1")
parser.add_argument('--max_workspace_size_bytes', type=int, default=2, help="max workspace size (GB), default: 2")
args = parser.parse_args()

frozen_name = args.tfmodel
frozen_graph = get_frozen_graph(frozen_name)

# convert (optimize) the frozen model to a TensorRT model
your_outputs = args.outputs.split(',')
start = time.time()
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph,  # frozen model
    outputs=your_outputs,
    is_dynamic_op=True,
    minimum_segment_size=3,
    maximum_cached_engines=int(1e3),
    max_batch_size=args.max_batch_size,  # specify your max batch size
    max_workspace_size_bytes=args.max_workspace_size_bytes * (10**9),  # specify the max workspace (default 2GB)
    precision_mode=args.precision)  # precision, can be "FP32", "FP16" or "INT8"
elapsed = time.time() - start
print('Tensorflow model => TensorRT model takes : %f' % (elapsed))

# write the TensorRT model to be used later for inference
rt_name = args.trtmodel
with tf.gfile.FastGFile(rt_name, 'wb') as f:
    f.write(trt_graph.SerializeToString())
<tfmodel_2_trt.py>
Let's run this code to create a TensorRT model.
python3 tfmodel_2_trt.py \
--tfmodel object_detection/model/ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_graph.pb \
--trtmodel object_detection/model/ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_rt_graph.pb \
--outputs "detection_boxes,detection_classes,detection_scores,num_detections"

root@JetsonNano:/usr/local/src/models/research# ls -al object_detection/model/ssd_mobilenet_v1_coco_2018_01_28
total 83580
drwxr-xr-x 2 root   root     4096 12월  5 23:11 .
drwxr-xr-x 3 root   root     4096 12월  5 20:56 ..
-rw-r--r-- 1 345018 5000 29103956  2월  2  2018 frozen_inference_graph.pb
-rw-r--r-- 1 root   root 56471276 12월  5 23:11 frozen_inference_rt_graph.pb
Be careful: set the --outputs parameter using the output node names you found in TensorBoard.
Boost up with TensorRT model
This is object recognition code using a webcam. You can pass either the TensorRT model or the original Tensorflow model to the --trtmodel parameter.

import argparse
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import time
import tarfile
import tensorflow.contrib.tensorrt as trt
import tensorflow as tf
import zipfile
import cv2
from distutils.version import StrictVersion
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from object_detection.utils import ops as utils_ops

if StrictVersion(tf.__version__) < StrictVersion('1.12.0'):
    raise ImportError('Please upgrade your TensorFlow installation to v1.12.*.')

from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

tf_sess = None

parser = argparse.ArgumentParser(description='object_detection using tensorRT')
parser.add_argument('--trtmodel', type=str, required=True, help='target tensorRT optimized model path')
args = parser.parse_args()

graph_def = tf.GraphDef()
with tf.gfile.GFile(args.trtmodel, 'rb') as fid:
    graph_def.ParseFromString(fid.read())

PATH_TO_LABELS = './object_detection/data/mscoco_label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)

def load_image_into_numpy_array(image):
    return image
    # (im_height, im_width) = image.shape
    # return np.array(image.getdata()).reshape(
    #     (im_height, im_width, 3)).astype(np.uint8)

# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)

def load_graph():
    gf = tf.GraphDef()
    with tf.gfile.GFile(args.trtmodel, 'rb') as fid:
        gf.ParseFromString(fid.read())
    return gf

def make_session(graph_def):
    global tf_sess
    tf_config = tf.ConfigProto()
    tf_config.gpu_options.allow_growth = True
    #tf_sess = tf.Session(config=tf_config, graph = graph_def)
    tf_sess = tf.Session(config=tf_config)
    tf.import_graph_def(graph_def, name='')

def run_inference_for_single_image2(image):
    global tf_sess, graph_def
    # tf.import_graph_def(graph_def, name='')
    tf_input = tf_sess.graph.get_tensor_by_name('image_tensor' + ':0')
    # tf_scores = tf_sess.graph.get_tensor_by_name('detection_scores:0')
    # tf_boxes = tf_sess.graph.get_tensor_by_name('detection_boxes:0')
    # tf_classes = tf_sess.graph.get_tensor_by_name('detection_classes:0')
    # tf_num_detections = tf_sess.graph.get_tensor_by_name('num_detections:0')
    tensor_dict = {}
    ops = tf.get_default_graph().get_operations()
    all_tensor_names = {output.name for op in ops for output in op.outputs}
    #for key in ['num_detections', 'detection_boxes', 'detection_scores', 'detection_classes', 'detection_masks']:
    for key in ['num_detections', 'detection_boxes', 'detection_scores', 'detection_classes']:
        tensor_name = key + ':0'
        if tensor_name in all_tensor_names:
            tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(tensor_name)
    t = time.time()
    output_dict = tf_sess.run(tensor_dict, feed_dict={tf_input: image})
    elapsed = time.time() - t
    output_dict['num_detections'] = int(output_dict['num_detections'][0])
    output_dict['detection_classes'] = output_dict['detection_classes'][0].astype(np.int64)
    output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
    output_dict['detection_scores'] = output_dict['detection_scores'][0]
    return output_dict, elapsed

graph_def = load_graph()
make_session(graph_def)

cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)
ret_val, img = cap.read()
fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
out_video = cv2.VideoWriter('/tmp/output.mp4', fourcc, cap.get(cv2.CAP_PROP_FPS), (640, 480))
count = 0
tfps = 0.0
if cap is None:
    print("Camera Open Error")
    sys.exit(0)

while cap.isOpened() and count < 500:
    ret_val, dst = cap.read()
    if ret_val == False:
        print("Camera read Error")
        break
    image = cv2.cvtColor(dst, cv2.COLOR_BGR2RGB)
    image_np = load_image_into_numpy_array(image)
    image_np_expanded = np.expand_dims(image_np, axis=0)
    output_dict, elapsed = run_inference_for_single_image2(image_np_expanded)
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks'),
        use_normalized_coordinates=True,
        line_thickness=8)
    fps = 1.0 / elapsed
    tfps += fps
    print("FPS:%f" % (fps))
    cv2.putText(image_np, "FPS: %f" % (fps), (20, 40), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("Object Detection", image_np)
    cv2.waitKey(1)  # required for the imshow window to actually refresh
    # note: image_np is RGB here while VideoWriter expects BGR,
    # so the colors in /tmp/output.mp4 will appear swapped
    out_video.write(image_np)
    count += 1

print("AVG FPS:%f" % (tfps / 500.0))
cv2.destroyAllWindows()
out_video.release()
cap.release()
The above code runs object detection on 500 frames captured from the webcam.
First let's run the code with the Tensorflow pre-trained model.
python3 object_detection_webcam.py --trtmodel=./object_detection/model/ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_graph.pb
FPS:0.020723
FPS:3.837796
FPS:4.209690
FPS:4.992114
..........
FPS:4.685645
FPS:4.809270
FPS:4.871886
FPS:5.064435
FPS:4.978769
AVG FPS:4.934830
Now let's run the code with the TensorRT model.
python3 object_detection_webcam.py --trtmodel=./object_detection/model/ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_rt_graph.pb
FPS:0.009204
FPS:2.600032
FPS:5.354959
FPS:5.577858
FPS:5.530435
FPS:5.482242
FPS:5.559795
.................
FPS:5.560496
FPS:5.337895
FPS:5.447933
FPS:5.530522
FPS:5.396075
AVG FPS:5.424419
The TensorRT model improves performance by about 10% but falls short of expectations.
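One likely reason (my own diagnosis, not something measured in the original post) is that TF-TRT only converts the subgraphs it supports; ops like the SSD postprocessing (NMS) fall back to plain Tensorflow, so only part of the graph actually runs inside TensorRT. You can check how much of the graph was converted by counting the TRTEngineOp nodes in the optimized model. A minimal sketch:

import tensorflow as tf

graph_def = tf.GraphDef()
with tf.gfile.GFile('object_detection/model/ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_rt_graph.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# each TRTEngineOp node is a fused subgraph that TensorRT executes;
# all remaining nodes still run on the stock Tensorflow runtime
trt_nodes = [n for n in graph_def.node if n.op == 'TRTEngineOp']
print('TRTEngineOp nodes : %d' % len(trt_nodes))
print('total nodes       : %d' % len(graph_def.node))

The more nodes that remain outside the TRTEngineOp segments, the smaller the speedup you can expect.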
The object detection result video is stored in the /tmp directory.
Are you wondering why the cup was recognized but not the glasses in the above image?
The answer is in the "object_detection/data/mscoco_label_map.pbtxt" file, which lists the object classes the COCO model was trained on. If you look at the pbtxt file, there are 90 object names that the model can detect, and glasses are not among them.
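You can print the class list yourself with the same label_map_util helper the detection script already uses. A short sketch:

from object_detection.utils import label_map_util

PATH_TO_LABELS = './object_detection/data/mscoco_label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(
    PATH_TO_LABELS, use_display_name=True)

print('number of classes : %d' % len(category_index))  # 90 for COCO
for class_id in sorted(category_index):
    print(class_id, category_index[class_id]['name'])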