Tuesday, December 10, 2019

JetsonNano - Hello AI World (NVIDIA DNN vision library) - 2.Classifying Images with ImageNet

This post is based on the contents of https://github.com/dusty-nv/jetson-inference/blob/master/docs/imagenet-console-2.md.




In the previous post, we built the imageNet object for C++ and Python 3.

Jetson-inference API

The Python examples import the jetson.inference module. Let's review its API reference.

jetson-inference


Task                 C++          Python
Image Recognition    imageNet     imageNet
Object Detection     detectNet    detectNet
Segmentation         segNet       segNet


ImageNet Network models summary

alexnet: AlexNet was the winning entry in ILSVRC 2012. It is a convolutional neural network trained on over 1 million images from the ImageNet database. The network has eight layers (five convolutional and three fully connected) and can classify images into 1,000 object categories, including keyboards, mice, pencils, and animals. The input is an RGB image: images are first rescaled to 256×256, and the network is fed smaller fixed-size crops taken from them (224×224 in the original paper). This means all training and test images must be brought to the same size. The imageNet class does this size conversion automatically.
There's an excellent post on AlexNet at Learn OpenCV.

GoogleNet: GoogLeNet is a significant improvement over AlexNet. It narrowly beat VGG to take first place in ILSVRC 2014. It is also called Inception v1, because v2, v3, and v4 followed later, so don't be confused.


Trained on over 1 million images, GoogLeNet can classify images into 1,000 object categories, including keyboards, coffee mugs, pencils, and animals.


ResNet: ResNet-18 is a convolutional neural network trained on more than a million images from the ImageNet database. The network is 18 layers deep and can classify images into 1,000 object categories, such as keyboards, mice, pencils, and many animals. ResNet won ILSVRC 2015 in image classification, detection, and localization, and also won the MS COCO 2015 detection and segmentation tasks.


VGG: VGG-16 is a convolutional neural network model proposed by K. Simonyan and A. Zisserman from the University of Oxford in the paper “Very Deep Convolutional Networks for Large-Scale Image Recognition”. The model achieves 92.7% top-5 test accuracy on ImageNet. (The full ImageNet dataset has over 14 million images; the ILSVRC task uses 1,000 classes.) It was one of the best-known models submitted to ILSVRC-2014. VGG finished second to GoogLeNet in the ILSVRC-2014 classification task, but it became more popular than GoogLeNet because of its simple structure and ease of modification.

Tip: In ResNet-XX and VGG-XX, the number XX is the number of layers. As this value increases, the network gets deeper and the accuracy is likely to increase, but the processing speed drops.



imageNet - for image recognition

imageNet is the name of the image recognition DNN class.

imageNet object constructor


The first parameter is the model name and the second parameter is a command line arguments list.

__init__(...)
     Loads an image recognition model.
 
     Parameters:
       network (string) -- name of a built-in network to use,
                           see below for available options.
 
       argv (strings) -- command line arguments passed to imageNet,
                         see below for available options


In the imagenet-console.py file listed later in this post, you can see the object constructor.


net = jetson.inference.imageNet(opt.network, sys.argv)



imageNet arguments:


These arguments are passed on the command line.

  --network NETWORK    pre-trained model to load, one of the following:
                           * alexnet
                           * googlenet (default)
                           * googlenet-12
                           * resnet-18
                           * resnet-50
                           * resnet-101
                           * resnet-152
                           * vgg-16
                           * vgg-19
                           * inception-v4
  --model MODEL        path to custom model to load (.caffemodel, .uff, or .onnx)
  --prototxt PROTOTXT  path to custom prototxt to load (for .caffemodel only)
  --labels LABELS      path to text file containing the labels for each class
  --input_blob INPUT   name of the input layer (default is 'data')
  --output_blob OUTPUT name of the output layer (default is 'prob')
  --batch_size BATCH   maximum batch size (default is 1)
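
As a rough sketch of how the custom-model options above fit together, the helper below assembles them into a list of "--key=value" strings. The function name build_imagenet_argv and the file names are hypothetical; only the flag names and their defaults come from the list above. On a Jetson, such a list could be passed as the argv parameter of jetson.inference.imageNet, but that step is not shown here because it needs the device.

```python
def build_imagenet_argv(model, labels, prototxt=None,
                        input_blob="data", output_blob="prob", batch_size=1):
    """Build a '--key=value' argument list for a custom imageNet model.

    Only the flag names come from the option list above; this helper is
    hypothetical and not part of jetson-inference.
    """
    argv = [
        "--model={}".format(model),
        "--labels={}".format(labels),
        "--input_blob={}".format(input_blob),
        "--output_blob={}".format(output_blob),
        "--batch_size={}".format(batch_size),
    ]
    if prototxt is not None:  # only meaningful for .caffemodel files
        argv.insert(1, "--prototxt={}".format(prototxt))
    return argv


# Hypothetical file names, for illustration only.
custom_argv = build_imagenet_argv("my_model.onnx", "my_labels.txt",
                                  input_blob="input_0", output_blob="output_0")
print(custom_argv)
```

Keeping every option in the "--key=value" form matches the way the examples in this post are run.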


Methods

Classify(...)
Classify an RGBA image and return the object's class and confidence.

Parameters:
  image  (capsule) -- CUDA memory capsule
  width  (int) -- width of the image (in pixels)
  height (int) -- height of the image (in pixels)

Returns:
  (int, float) -- tuple containing the object's class index and confidence

GetClassDesc(...)
Return the class description for the given object class.

Parameters:
  (int) -- index of the class, between [0, GetNumClasses()]

Returns:
  (string) -- the text description of the object class

GetClassSynset(...)
Return the synset data category string for the given class.
The synset generally maps to the class training data folder.

Parameters:
  (int) -- index of the class, between [0, GetNumClasses()]

Returns:
  (string) -- the synset of the class, typically 9 characters long

GetNetworkName(...)
Return the name of the built-in network used by the model.

Parameters:  (none)

Returns:
  (string) -- name of the network (e.g. 'googlenet', 'alexnet')
              or 'custom' if using a custom-loaded model

GetNumClasses(...)
Return the number of object classes that this network model is able to classify.

Parameters:  (none)

Returns:
  (int) -- number of object classes that the model supports
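
To show how these methods combine, here is a minimal sketch that classifies one image and looks up its description. The function classify_and_describe and the FakeNet class are hypothetical. FakeNet only mimics the three methods used, so the sketch runs without a Jetson, and a real imageNet object could be passed in its place. The range check assumes valid indices run from 0 to GetNumClasses() - 1.

```python
def classify_and_describe(net, img, width, height):
    """Classify one image and return (index, description, confidence_percent)."""
    class_idx, confidence = net.Classify(img, width, height)
    # Assumption: valid class indices are 0 .. GetNumClasses() - 1.
    if not 0 <= class_idx < net.GetNumClasses():
        raise ValueError("class index {} out of range".format(class_idx))
    return class_idx, net.GetClassDesc(class_idx), confidence * 100.0


class FakeNet:
    """Hypothetical stand-in for imageNet (same method names, canned results)."""
    _classes = ["apple", "banana", "cherry"]

    def Classify(self, img, width, height):
        return 1, 0.875  # (class index, confidence in [0, 1])

    def GetClassDesc(self, idx):
        return self._classes[idx]

    def GetNumClasses(self):
        return len(self._classes)


idx, desc, percent = classify_and_describe(FakeNet(), None, 640, 480)
print("class #{:d} '{:s}' with {:.2f}% confidence".format(idx, desc, percent))
```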

Testing an imageNet

The imageNet object accepts an input image and outputs the probability for each class. The GoogleNet and ResNet-18 models, which were trained on the ImageNet ILSVRC dataset of 1,000 objects, were downloaded automatically during the build step. Other classification models can be downloaded and used as well.
imageNet assigns a single class and confidence to the entire input image. It does not locate individual objects within the image.

#!/usr/bin/python
#
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
#

import jetson.inference
import jetson.utils

import argparse
import sys
import time

# parse the command line
parser = argparse.ArgumentParser(description="Classify an image using an image recognition DNN.", 
         formatter_class=argparse.RawTextHelpFormatter, epilog=jetson.inference.imageNet.Usage())

parser.add_argument("file_in", type=str, help="filename of the input image to process")
parser.add_argument("file_out", type=str, default=None, nargs='?', help="filename of the output image to save")
parser.add_argument("--network", type=str, default="googlenet", help="pre-trained model to load (see below for options)")

try:
    opt = parser.parse_known_args()[0]
except:
    print("")
    parser.print_help()
    sys.exit(0)

# load an image (into shared CPU/GPU memory)
img, width, height = jetson.utils.loadImageRGBA(opt.file_in)

# load the recognition network
net = jetson.inference.imageNet(opt.network, sys.argv)

# start timing after the model is loaded, so model loading is not counted
t = time.time()

# classify the image
class_idx, confidence = net.Classify(img, width, height)

# find the object description
class_desc = net.GetClassDesc(class_idx)
elapsed = time.time() - t

# print out the result and the FPS derived from the elapsed time
print("image is recognized as '{:s}' (class #{:d}) with {:f}% confidence\n".format(class_desc, class_idx, confidence * 100))
print("FPS:%f"%(1.0 / elapsed))

# print out timing info
net.PrintProfilerTimes()

# overlay the result on the image
if opt.file_out is not None:
    font = jetson.utils.cudaFont(size=jetson.utils.adaptFontSize(width))
    font.OverlayText(img, width, height, "{:f}% {:s}".format(confidence * 100, class_desc), 10, 10, font.White, font.Gray40)
    jetson.utils.cudaDeviceSynchronize()
    jetson.utils.saveImageRGBA(opt.file_out, img, width, height)
<imagenet-console.py> 


Equivalent C++ code is available, but only the Python code is used here.

I modified the original code only to measure the FPS. Let's run the code above.



root@JetsonNano:/usr/local/src/jetson-inference/python/examples# python3 imagenet-console.py images/granny_smith_1.jpg output_1.jpg
........
........
image is recognized as 'Granny Smith' (class #948) with 99.996948% confidence

FPS:26.744953

[TRT]   ------------------------------------------------
[TRT]   Timing Report /usr/local/bin/networks/bvlc_googlenet.caffemodel
[TRT]   ------------------------------------------------
[TRT]   Pre-Process   CPU   0.07542ms  CUDA   0.50469ms
[TRT]   Network       CPU  36.81561ms  CUDA  36.07594ms
[TRT]   Post-Process  CPU   0.30219ms  CUDA   0.30057ms
[TRT]   Total         CPU  37.19321ms  CUDA  36.88120ms
........
........


Be careful: On the first runs, the value will be lower than 26 FPS. Run the program several times and it settles at around 25 to 26 FPS. For the same reason, a video or webcam demo starts slowly and then shows stable FPS values.
In another post of mine, you can see that the FPS of the first webcam frame is very low and becomes stable soon after.

python3 object_detection_webcam.py --trtmodel=./object_detection/model/ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_graph.pb
FPS:0.020723
FPS:3.837796
FPS:4.209690
FPS:4.992114
..........
FPS:4.685645
FPS:4.809270
FPS:4.871886
FPS:5.064435
FPS:4.978769
AVG FPS:4.934830
<From JetsonNano - Object detection using tensorflow - 2.Boost up the FPS using TensorRT>
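
Because the first calls are slow, one simple way to get a steadier number is to discard a few warm-up iterations before timing. The sketch below is a minimal illustration of that idea, not code from jetson-inference. measure_fps and fake_inference are hypothetical names, and fake_inference merely stands in for a call such as net.Classify(img, width, height).

```python
import time


def measure_fps(run_once, warmup=3, runs=10, clock=time.perf_counter):
    """Return the average FPS of run_once(), ignoring the first warm-up calls."""
    for _ in range(warmup):      # absorb one-time start-up costs
        run_once()
    start = clock()
    for _ in range(runs):        # only these calls are timed
        run_once()
    elapsed = clock() - start
    return runs / elapsed if elapsed > 0 else float("inf")


# Hypothetical workload standing in for net.Classify(img, width, height).
def fake_inference():
    time.sleep(0.005)


print("FPS: {:.1f}".format(measure_fps(fake_inference)))
```

The clock parameter exists only so the timing source can be swapped out, for example in a test.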


This is the result image.



The console shows about 26 FPS with the GoogleNet model. This is the best FPS I have measured so far.
You can also test imageNet with other models such as googlenet-12, resnet-18, resnet-50, vgg-16, and so on. These models are described above.
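
The FPS value and the timing report are two views of the same measurement, related by FPS = 1000 / (milliseconds per frame). The short sketch below checks this with the numbers printed in the run above. The function names are my own.

```python
def fps_from_ms(milliseconds):
    """Convert a per-frame time in milliseconds to frames per second."""
    return 1000.0 / milliseconds


def ms_from_fps(fps):
    """Convert frames per second to a per-frame time in milliseconds."""
    return 1000.0 / fps


total_cpu_ms = 37.19321      # "Total ... CPU" from the timing report above
measured_fps = 26.744953     # "FPS:" line printed by the script

print("from timing report: {:.2f} FPS".format(fps_from_ms(total_cpu_ms)))
print("seen by the script: {:.2f} ms per frame".format(ms_from_fps(measured_fps)))
```

The timing report corresponds to roughly 26.9 FPS, while the script measured about 37.4 ms per frame. The difference of about 0.2 ms is presumably the GetClassDesc call and Python overhead, which the script's timer also includes.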

Tip: In the image above, the label for the apple image is "Granny Smith". Where can you find the list of class names? It is in "data/networks/ilsvrc12_synset_words.txt".
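
As a small sketch of how that label file could be read, the code below assumes each line holds a synset ID, a space, and then the description. The function parse_synset_words is a hypothetical helper, and the sample lines only illustrate the assumed format; they are not in the real file order.

```python
def parse_synset_words(lines):
    """Split 'synset description' lines into (synset, description) pairs."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:             # skip blank lines
            continue
        synset, _, description = line.partition(" ")
        entries.append((synset, description))
    return entries


# Sample lines in the assumed file format (not the real file order).
sample = [
    "n01440764 tench, Tinca tinca",
    "n07742313 Granny Smith",
    "",
]
labels = parse_synset_words(sample)
print(labels[1])

# On the Jetson, the real file could be read like this:
#   with open("data/networks/ilsvrc12_synset_words.txt") as f:
#       labels = parse_synset_words(f)
#   print(labels[948])   # the class index reported in the run above
```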

Testing imageNet with Webcam

Now let's test imageNet with a webcam and check the FPS.
The original example uses the camera and display of the jetson.utils module. However, we are used to OpenCV from many other examples, so I modified some of the original source code to write the output video and display the frames with OpenCV functions. I also changed the default camera to "/dev/video0", because I mainly use a webcam on my Jetson Nano.



#!/usr/bin/python
#
# Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.
#

import jetson.inference
import jetson.utils

import argparse
import sys, time
import numpy as np
import cv2

# parse the command line
parser = argparse.ArgumentParser(description="Classify a live camera stream using an image recognition DNN.", 
         formatter_class=argparse.RawTextHelpFormatter, epilog=jetson.inference.imageNet.Usage())

parser.add_argument("--network", type=str, default="googlenet", help="pre-trained model to load (see below for options)")
parser.add_argument("--camera", type=str, default="/dev/video0", help="camera to use: the index of a MIPI CSI camera (e.g. 0)\nor, for V4L2 cameras, the /dev/video device.\nby default, /dev/video0 will be used.")
parser.add_argument("--width", type=int, default=640, help="desired width of camera stream (default is 640 pixels)")
parser.add_argument("--height", type=int, default=480, help="desired height of camera stream (default is 480 pixels)")

try:
 opt = parser.parse_known_args()[0]
except:
 print("")
 parser.print_help()
 sys.exit(0)

# load the recognition network
net = jetson.inference.imageNet(opt.network, sys.argv)

# create the font and the camera
font = jetson.utils.cudaFont()
camera = jetson.utils.gstCamera(opt.width, opt.height, opt.camera)
#display = jetson.utils.glDisplay()

# capture one frame first to learn the actual frame size
count = 0
img, width, height = camera.CaptureRGBA(zeroCopy=1)
print("========== Capture Width:%d Height:%d ==========="%(width, height))

# write the output video at the captured size (it must match the frames written)
fourcc = cv2.VideoWriter_fourcc('m', 'p', '4', 'v')
out_video = cv2.VideoWriter('/tmp/detect.mp4', fourcc, 25, (width, height))

# process 500 frames, then stop

t = time.time()
while count < 500:
    # capture the image
    img, width, height = camera.CaptureRGBA(zeroCopy=1)
    # classify the image
    class_idx, confidence = net.Classify(img, width, height)
    # find the object description
    class_desc = net.GetClassDesc(class_idx)
    # overlay the result on the image 
    fps = 1.0 / ( time.time() - t)
    font.OverlayText(img, width, height, "{:05.2f}% {:s}".format(confidence * 100, class_desc), 5, 5, font.White, font.Gray40)
    font.OverlayText(img, width, height, "FPS:%5.2f"%(fps), 5, 30, font.White, font.Gray40)
    t = time.time()
    # wait for the GPU to finish before converting to a numpy array
    jetson.utils.cudaDeviceSynchronize()
    arr = jetson.utils.cudaToNumpy(img, width, height, 4)      # CUDA image is float RGBA
    arr1 = cv2.cvtColor(arr.astype(np.uint8), cv2.COLOR_RGBA2BGR)
    if count % 100 == 0:
        cv2.imwrite("/tmp/detect-" + str(count) + ".jpg", arr1)
    out_video.write(arr1)
    cv2.imshow('imageNet', arr1)
    cv2.waitKey(1)    # imshow needs waitKey to refresh the window
    # print out performance info
    net.PrintProfilerTimes()
    count += 1

out_video.release()
cv2.destroyAllWindows()
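
The loop above computes the FPS from a single frame interval, so the overlay value jumps around. As a possible refinement, the sketch below averages over the most recent frames. FpsMeter is a hypothetical helper, not part of jetson.utils, and the timestamps in the demo are made up so the result is reproducible.

```python
import time
from collections import deque


class FpsMeter:
    """Average FPS over the most recent frames (hypothetical helper)."""

    def __init__(self, window=30):
        self._stamps = deque(maxlen=window)   # timestamps of recent frames

    def tick(self, now=None):
        """Record one frame and return the smoothed FPS (0.0 until two frames)."""
        self._stamps.append(time.perf_counter() if now is None else now)
        if len(self._stamps) < 2:
            return 0.0
        span = self._stamps[-1] - self._stamps[0]
        return (len(self._stamps) - 1) / span if span > 0 else 0.0


meter = FpsMeter(window=4)
for stamp in (0.0, 1.0, 1.25, 1.5, 1.75):   # first frame is slow, the rest fast
    fps = meter.tick(stamp)
print("smoothed FPS: {:.2f}".format(fps))
```

In the webcam loop, calling meter.tick() once per frame would replace the manual time.time() arithmetic. The slow first frame drops out of the average as soon as the window fills.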

Run the code.


python3 imagenet-webcam.py --model=resnet-50

Be careful: Don't pass the argument as "--model resnet-50" (with a space). Use "--model=resnet-50" (with an equals sign).
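
If you prefer typing the space-separated form, one workaround is to rewrite the argument list before handing it to imageNet. The sketch below is a minimal, hypothetical helper. It assumes, as the note above suggests, that only the "--key=value" form is recognized, and the set of flags it rewrites is my own choice.

```python
def normalize_argv(argv, value_flags=("--network", "--model", "--labels")):
    """Rewrite '--flag value' pairs as '--flag=value'; leave the rest untouched."""
    result = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        # A value follows if there is a next item that is not itself a flag.
        has_value = i + 1 < len(argv) and not argv[i + 1].startswith("--")
        if arg in value_flags and has_value:
            result.append("{}={}".format(arg, argv[i + 1]))
            i += 2
        else:
            result.append(arg)
            i += 1
    return result


print(normalize_argv(["imagenet-webcam.py", "--model", "resnet-50", "--width=640"]))
```

In the scripts in this post, the call would then look like net = jetson.inference.imageNet(opt.network, normalize_argv(sys.argv)).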



These images were extracted from the output video of a test with the googlenet model.






These images were extracted from the output video of a test with the resnet-50 model.







Wrapping up


The jetson.inference module is fairly simple to use. The imageNet, detectNet, and segNet classes have a similar structure, so once you are familiar with imageNet, the other classes are easy to handle.
With just a few lines of code, you can implement an image recognition DNN.
As you can see in the images above, you can get quite good FPS values (14 to 15). If you skip the video output, you should be able to reach a higher FPS.












