Saturday, February 19, 2022

Cross-building a TensorFlow Lite Python wheel for Raspberry Pi OS 64-bit on your PC

The Raspberry Pi OS 64-bit version was finally released in February 2022. Let's take a quick look at the differences between a 64-bit OS and a 32-bit OS.


Memory

If you are using a Pi model with less than 4GB of memory, there is not much difference between 32-bit and 64-bit when it comes to memory usage. However, if you plan to use the 8GB Raspberry Pi 4 or a future model with even more memory, it is worth knowing how memory usage differs between the two.

In principle, a 32-bit OS cannot use more than 4GB of memory, because the range of integer values that can be stored in 32 bits is 0 to 4,294,967,295. This limitation is common to all 32-bit OSs, including Windows, not just Raspberry Pi OS.

I say "in principle" because 32-bit OSs use some tricks to overcome this limitation. 32-bit Linux on ARM CPUs, including Raspberry Pi OS, uses LPAE (Large Physical Address Extension), which enables addressing of more than 4GB of memory. However, there is a big weakness here: although the system as a whole can use more than 4GB of memory, each process is still bound by the 32-bit limit, so per-process memory is capped at 4GB. And since 1GB of this is allocated to the kernel, the memory actually available to a process is under 3GB.
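To put some numbers on this, here is a small Python sketch of the 32-bit address-space arithmetic; the 1GB kernel reservation is the typical 3G/1G split on ARM Linux, so treat it as an assumption rather than a universal value.

```python
# Arithmetic behind the 32-bit memory limits described above.
GiB = 1024 ** 3

# A 32-bit pointer can address 2**32 distinct bytes: 0 .. 4,294,967,295.
address_space = 2 ** 32
assert address_space == 4 * GiB
assert address_space - 1 == 4_294_967_295

# Even with LPAE the per-process virtual address space stays 32-bit,
# and the kernel typically reserves 1 GiB of it, leaving under 3 GiB usable.
kernel_reserved = 1 * GiB
usable_per_process = address_space - kernel_reserved
print(usable_per_process / GiB)  # 3.0
```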

Whether this limitation is a problem depends on the user. If your goal is to run a machine-learning program or a database server that needs a lot of memory, this weakness of the 32-bit OS will be a problem; if you mainly run light programs, it will not.

A 64-bit OS, on the other hand, uses somewhat more memory. The main reason is that pointers and other memory-management variables grow from 32 to 64 bits, doubling the memory they occupy. If you install the 32-bit and 64-bit versions of Raspberry Pi OS Lite on a 512MB Zero 2, the OS occupies about 48MB and 66MB of memory respectively. But that is not a big difference, and certainly no reason to avoid a 64-bit OS.
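You can see the pointer-size doubling directly from Python: on a 64-bit interpreter a native pointer is 8 bytes, on a 32-bit one it is 4.

```python
import struct
import sys

# 'P' is the C pointer type; struct.calcsize reports its size:
# 4 bytes on a 32-bit build of the interpreter, 8 bytes on a 64-bit build.
pointer_bytes = struct.calcsize("P")
print(f"{pointer_bytes * 8}-bit interpreter")

# sys.maxsize reflects the same thing: 2**31 - 1 on 32-bit, 2**63 - 1 on 64-bit.
print(sys.maxsize)
```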


Modern OSs are 64-bit

Software compatibility is the main reason to pay attention to 64-bit OSs going forward. Most mainstream OSs today are 64-bit, and most software packages are built for them. Even where a 32-bit version is still supported, support is likely to be discontinued at some point, and OS vendors are also moving toward no longer offering new upgrades for 32-bit OSs. Debian, on which Raspberry Pi OS is based, still supports 32-bit, but will presumably stop at some point in the future. For example, the Elasticsearch search engine no longer supports 32-bit Raspberry Pi OS. Users who want to run the latest software should therefore pay closer attention to the 64-bit OS.

Nowadays, most software provides a Docker image so it can run as a microservice in a Docker environment. The figure below compares the number of ARM-based Docker images: roughly twice as many images are available for ARM64 as for ARM32. Since most current software is developed and updated targeting 64-bit, this gap is likely to widen further. The reasons to move to a 64-bit OS keep piling up.


<Number of Dockers supporting ARM 32-bit and 64-bit OS>


Raspberry Pi models for use with 64-bit OS

The earliest Raspberry Pi models use 32-bit ARM CPUs, so a 64-bit OS will not run on them. As the following table shows, a 64-bit OS can be used on the Raspberry Pi 3, 4, and Zero 2 models.

<CPU by Raspberry Pi model>
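If you are unsure what your board is running, a quick Python check shows both the CPU architecture reported by the kernel and the bit-width of the interpreter; on 64-bit Raspberry Pi OS you would expect "aarch64", on the 32-bit OS "armv7l".

```python
import platform

# Machine architecture as reported by the kernel:
# 'aarch64' on 64-bit Raspberry Pi OS, 'armv7l' on 32-bit, 'x86_64' on a PC.
print(platform.machine())

# Bit-width of the running Python interpreter ('64bit' or '32bit').
print(platform.architecture()[0])
```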


Why you should use a 64-bit OS with TensorFlow

On Android mobile devices, which mainly use ARM CPUs, and on SBCs such as the Raspberry Pi and Odroid, TensorFlow Lite is typically used instead of the much heavier TensorFlow. NVidia's Jetson series is an exception because it has a CUDA-capable GPU.

Before the official Raspberry Pi OS 64-bit was released, the 64-bit OS most commonly used on the Raspberry Pi was Ubuntu 18.04+ aarch64. In my tests, TensorFlow Lite on Ubuntu 18.04+ aarch64 performed about 4 times better than on the 32-bit version of Raspberry Pi OS.

With 4 times the performance on offer, is there any reason not to use it?

Google provides solid support and documentation for TensorFlow on x86 CPUs and CUDA GPUs, but ARM support is quite lacking outside of Android devices. You can either build from source yourself or use packages built by others who have gone before.

I mainly use Python, so let's build a TensorFlow Lite wheel file. The official page for building the TensorFlow Lite Python wheel is https://www.tensorflow.org/lite/guide/build_cmake_pip

However, anyone who has visited this page will have found it rather thin on detail.


ARM cross compilation

Until now, most users would have built packages for the Raspberry Pi on a Raspberry Pi, and packages for the Jetson Nano on a Jetson Nano. For simple packages this approach is the safest. However, TensorFlow is a fairly large package that uses CMake, Bazel, etc. as its build system, and the build process requires a lot of memory, so building on a Raspberry Pi or Jetson Nano with 1 to 8GB of memory is not easy. The system may freeze due to insufficient memory during the build, and even when the build succeeds it takes a very long time. It is therefore a good idea to build such a large package on your desktop system. Mac, Windows, and Linux are all possible, but I personally recommend Linux. I used Ubuntu 18.04 as the cross-build platform, with 16GB of memory and 16GB of swap; more memory helps.

And since a lot of software must be installed during the build, I recommend using Docker; Google also recommends a Docker-based build. So first install Docker on your Ubuntu 18.04 host.


Install docker on the host machine

Install Docker with the following steps. How to install Docker on Ubuntu is well explained at https://docs.docker.com/engine/install/ubuntu/.

$ sudo apt-get update
$ sudo apt-get install  ca-certificates  curl   gnupg  lsb-release
$ curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

$ echo \
  "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

$ sudo apt-get update
$ sudo apt-get install docker-ce docker-ce-cli containerd.io


If the output is as below, it is installed normally.

$ sudo docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
2db29710123e: Pull complete 
Digest: sha256:2498fce14358aa50ead0cc6c19990fc6ff866ce72aeb5546e1d59caac3d0d60f
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.


Next, we install Bazel, Google's build system.

Install bazel on the host machine

For more information on installing Bazel, see https://docs.bazel.build/versions/5.0.0/install-ubuntu.html.

The following prepares for the Bazel installation and only needs to be run once.

sudo apt install apt-transport-https curl gnupg
curl -fsSL https://bazel.build/bazel-release.pub.gpg | gpg --dearmor > bazel.gpg
sudo mv bazel.gpg /etc/apt/trusted.gpg.d/
echo "deb [arch=amd64] https://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list

Now install bazel.

sudo apt update && sudo apt install bazel
#latest version update
sudo apt update && sudo apt full-upgrade


Clone the TensorFlow source code

For the TensorFlow source code we will use version 2.8, the latest as of February 2022. Versions older than 2.2 use a slightly different build method.

git clone -b v2.8.0 https://github.com/tensorflow/tensorflow.git
cd tensorflow


The build processes of method 1 and method 2 are basically the same: after installing all the software required to build TensorFlow Lite into a Docker image, the TensorFlow Lite wheel is built by running that image.

Currently, with TensorFlow 2.8 there is no problem building for Python versions up to 3.8, but the 3.9 build fails with errors. The following mainly describes the additional fixes needed for 3.9.

Build Method 1 - Using Makefile 

This is the method introduced on the TensorFlow official website. It is easier to set up and faster than method 2. Up to the source-copying step described in "ARM cross compilation", method 1 and method 2 are the same.

According to https://www.tensorflow.org/lite/guide/build_cmake_arm#check_your_target_environment, you build as in the following figure.



However, the 64-bit Raspberry Pi OS we are targeting uses Python 3.9, and even if you change the parameter in the figure above to 3.9, only a 3.7 wheel is produced. This is because of errors in the Makefile. Modify the Makefile as follows.

The Makefile is located in the directory where the make command is run (tensorflow/lite/tools/pip_package). Modify the following part of the Makefile.

docker-image:
ifeq ($(BASE_IMAGE),ubuntu:16.04)
	docker build -t $(TAG_IMAGE) --build-arg IMAGE=$(BASE_IMAGE) --build-arg PYTHON_VERSION=3.8 -f Dockerfile.py3 .
else
	docker build -t $(TAG_IMAGE) --build-arg IMAGE=$(BASE_IMAGE) .
endif

<original Makefile >

Change the Ubuntu 16.04 Docker image used in the Makefile to 18.04 and modify the Python version to use the value received as a parameter.

docker-image:
ifeq ($(BASE_IMAGE),ubuntu:18.04)
	@echo  "Python version  $(PYTHON_VERSION)"
	docker build -t $(TAG_IMAGE) --build-arg IMAGE=$(BASE_IMAGE) --build-arg PYTHON_VERSION=$(PYTHON_VERSION) -f Dockerfile.py3 .
else
	docker build -t $(TAG_IMAGE) --build-arg IMAGE=$(BASE_IMAGE) .
endif

<modified Makefile>


And the tensorflow/lite/tools/pip_package/Dockerfile.py3 file needs some modifications for use with Python 3.9.

ARG IMAGE
FROM ${IMAGE}
ARG PYTHON_VERSION
COPY update_sources.sh /
RUN /update_sources.sh

RUN dpkg --add-architecture armhf
RUN dpkg --add-architecture arm64

<original Dockerfile.py3>

And to suppress interactive prompts such as the time zone setting during Docker image creation, I added the line "ARG DEBIAN_FRONTEND=noninteractive". I also added some repository entries, which you can find in tensorflow/tools/ci_build/install/install_pi_python3x_toolchain.sh (the script used in method 2).

ARG IMAGE
FROM ${IMAGE}
ARG PYTHON_VERSION

ARG DEBIAN_FRONTEND=noninteractive

COPY update_sources.sh /
RUN /update_sources.sh
RUN dpkg --add-architecture armhf
RUN dpkg --add-architecture arm64

RUN echo 'deb [arch=arm64,armhf] http://ports.ubuntu.com/ bionic main restricted universe multiverse' >> /etc/apt/sources.list.d/armhf.list
RUN echo 'deb [arch=arm64,armhf] http://ports.ubuntu.com/ bionic-updates main restricted universe multiverse' >> /etc/apt/sources.list.d/armhf.list
RUN echo 'deb [arch=arm64,armhf] http://ports.ubuntu.com/ bionic-security main restricted universe multiverse' >> /etc/apt/sources.list.d/armhf.list
RUN echo 'deb [arch=arm64,armhf] http://ports.ubuntu.com/ bionic-backports main restricted universe multiverse' >> /etc/apt/sources.list.d/armhf.list
RUN sed -i 's#deb http://archive.ubuntu.com/ubuntu/#deb [arch=amd64] http://archive.ubuntu.com/ubuntu/#g' /etc/apt/sources.list

<modified Dockerfile.py3>

It's time to build the python wheel

The part to pay attention to in the build command is that BASE_IMAGE must be set to Ubuntu 18.04.

make -C tensorflow/lite/tools/pip_package docker-build \
  TENSORFLOW_TARGET=aarch64 PYTHON_VERSION=3.9 BASE_IMAGE=ubuntu:18.04

After a while, the build is finished and you can check the Tensorflow Lite wheel file for Python 3.9 created as follows.

...... SKIP

adding 'tflite_runtime-2.8.0.dist-info/METADATA'
adding 'tflite_runtime-2.8.0.dist-info/WHEEL'
adding 'tflite_runtime-2.8.0.dist-info/top_level.txt'
adding 'tflite_runtime-2.8.0.dist-info/RECORD'
removing build/bdist.linux-aarch64/wheel
+ echo 'Output can be found here:'
Output can be found here:
+ find /tensorflow/tensorflow/lite/tools/pip_package/gen/tflite_pip/python3/dist
/tensorflow/tensorflow/lite/tools/pip_package/gen/tflite_pip/python3/dist
/tensorflow/tensorflow/lite/tools/pip_package/gen/tflite_pip/python3/dist/tflite-runtime-2.8.0.linux-aarch64.tar.gz
/tensorflow/tensorflow/lite/tools/pip_package/gen/tflite_pip/python3/dist/tflite_runtime-2.8.0-cp39-cp39-linux_aarch64.whl
+ [[ n != \y ]]
+ exit 0
make: Leaving directory '/usr/local/src/study/docker/tensorflow/tensorflow/lite/tools/pip_package'

As the file name shows, this is the TensorFlow Lite runtime: TensorFlow version 2.8, Python 3.9, platform aarch64.
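The wheel naming convention (PEP 427) encodes exactly these fields as {distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl; a minimal sketch decoding our file name:

```python
# Decode the wheel file name fields defined by PEP 427:
# {distribution}-{version}-{python tag}-{abi tag}-{platform tag}.whl
name = "tflite_runtime-2.8.0-cp39-cp39-linux_aarch64.whl"

dist, version, python_tag, abi_tag, platform_tag = name[:-len(".whl")].split("-")
print(dist)          # tflite_runtime
print(version)       # 2.8.0
print(python_tag)    # cp39 -> CPython 3.9
print(platform_tag)  # linux_aarch64 -> 64-bit ARM Linux
```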

If you set TENSORFLOW_TARGET to armhf instead of aarch64, you can build a 32-bit wheel, and with native you can build a wheel for 64-bit x86.


Docker Images

If the build is successful, you can also see that the tflite-runtime-builder-ubuntu-18.04 Docker image has been created. You now have a Docker image with the build environment for aarch64 / Python 3.9. If you run the build again, it will be significantly faster because the image build step is skipped.

root@ubuntu:/usr/local/src/study/docker/tensorflow/tensorflow/tools/pip_package# docker images
REPOSITORY                            TAG       IMAGE ID       CREATED          SIZE
tflite-runtime-builder-ubuntu-18.04   latest    f7296e034586   23 minutes ago   1.22GB
ubuntu                                18.04     886eca19e611   6 weeks ago      63.1MB


Build Method 2 - Using bash script file 

This method uses the build_pip_package_with_bazel.sh script file. It is slightly more complicated than method 1. Up to the source-copying step described in "ARM cross compilation", method 1 and method 2 are the same.


Modify the build file

Modify the aarch64 part of the tensorflow/lite/tools/pip_package/build_pip_package_with_bazel.sh file as follows. 

# Build python interpreter_wrapper.
cd "${BUILD_DIR}"
case "${TENSORFLOW_TARGET}" in
  armhf)
    BAZEL_FLAGS="--config=elinux_armhf
      --copt=-march=armv7-a --copt=-mfpu=neon-vfpv4
      --copt=-O3 --copt=-fno-tree-pre --copt=-fpermissive
      --define tensorflow_mkldnn_contraction_kernel=0
      --define=raspberry_pi_with_neon=true"
    ;;
  aarch64)
    BAZEL_FLAGS="--config=elinux_aarch64
      --define tensorflow_mkldnn_contraction_kernel=0
      --copt=-O3"
    ;;
  native)
    BAZEL_FLAGS="--copt=-O3 --copt=-march=native"
    ;;
  *)
    BAZEL_FLAGS="--copt=-O3"
    ;;
esac

<original build_pip_package_with_bazel.sh>

The following is the modified tensorflow/lite/tools/pip_package/build_pip_package_with_bazel.sh file.

# Build python interpreter_wrapper.
cd "${BUILD_DIR}"
case "${TENSORFLOW_TARGET}" in
  armhf)
    BAZEL_FLAGS="--config=elinux_armhf
      --copt=-march=armv7-a --copt=-mfpu=neon-vfpv4
      --copt=-O3 --copt=-fno-tree-pre --copt=-fpermissive
      --define tensorflow_mkldnn_contraction_kernel=0
      --define=raspberry_pi_with_neon=true
      --define=tflite_pip_with_flex=true
      --define=tflite_with_xnnpack=false"
    ;;
  aarch64)
    BAZEL_FLAGS="--config=elinux_aarch64
      --define tensorflow_mkldnn_contraction_kernel=0
      --define=tflite_pip_with_flex=true
      --define=tflite_with_xnnpack=true
      --copt=-O3"
    ;;
  native)
    BAZEL_FLAGS="--copt=-O3 --copt=-march=native
      --define=tflite_pip_with_flex=true
      --define=tflite_with_xnnpack=true"
    ;;
  *)
    BAZEL_FLAGS="--copt=-O3
      --define=tflite_pip_with_flex=true
      --define=tflite_with_xnnpack=true"
    ;;
esac

<modified build_pip_package_with_bazel.sh>

Modify the tensorflow/tools/ci_build/Dockerfile.pi-python39 file as follows. 

FROM ubuntu:16.04

LABEL maintainer="Katsuya Hyodo <rmsdh122@yahoo.co.jp>"

ENV CI_BUILD_PYTHON=python3.9
ENV CROSSTOOL_PYTHON_INCLUDE_PATH=/usr/include/python3.9

# Copy and run the install scripts.
COPY install/*.sh /install/
RUN /install/install_bootstrap_deb_packages.sh
RUN add-apt-repository -y ppa:openjdk-r/ppa
RUN /install/install_deb_packages.sh --without_cmake
RUN /install/install_cmake.sh

# The following line installs the Python 3.9 cross-compilation toolchain.
RUN /install/install_pi_python3x_toolchain.sh "3.9"

RUN /install/install_bazel.sh
RUN /install/install_proto3.sh
RUN /install/install_buildifier.sh
RUN /install/install_auditwheel.sh
RUN /install/install_golang.sh

# Set up the master bazelrc configuration file.
COPY install/.bazelrc /etc/bazel.bazelrc
RUN chmod 644 /etc/bazel.bazelrc

# XLA is not needed for PI
ENV TF_ENABLE_XLA=0

<original Dockerfile.pi-python39>


I made three corrections. First, I changed the base Docker image from Ubuntu 16.04 to 18.04.

Second, to suppress interactive prompts such as the time zone setting during Docker image creation, I added the line "ARG DEBIAN_FRONTEND=noninteractive".

Finally, I passed the parameter "3.9" to install_auditwheel.sh so it performs the Python 3.9-related steps.

FROM ubuntu:18.04

LABEL maintainer="Katsuya Hyodo <rmsdh122@yahoo.co.jp>"

ENV CI_BUILD_PYTHON=python3.9
ENV CROSSTOOL_PYTHON_INCLUDE_PATH=/usr/include/python3.9

ARG DEBIAN_FRONTEND=noninteractive
# Copy and run the install scripts.
COPY install/*.sh /install/

RUN /install/install_bootstrap_deb_packages.sh
RUN add-apt-repository -y ppa:openjdk-r/ppa
RUN /install/install_deb_packages.sh --without_cmake
RUN /install/install_cmake.sh

# The following line installs the Python 3.9 cross-compilation toolchain.
RUN /install/install_pi_python3x_toolchain.sh "3.9"

RUN /install/install_bazel.sh
RUN /install/install_proto3.sh
RUN /install/install_buildifier.sh
RUN /install/install_auditwheel.sh  "3.9"
RUN /install/install_golang.sh

# Set up the master bazelrc configuration file.
COPY install/.bazelrc /etc/bazel.bazelrc
RUN chmod 644 /etc/bazel.bazelrc

# XLA is not needed for PI
ENV TF_ENABLE_XLA=0

<modified Dockerfile.pi-python39>


Modify the tensorflow/tools/ci_build/install/install_pi_python3x_toolchain.sh file as follows.

PYTHON_VERSION=$1
dpkg --add-architecture armhf
dpkg --add-architecture arm64
echo 'deb [arch=arm64,armhf] http://ports.ubuntu.com/ bionic main restricted universe multiverse' >> /etc/apt/sources.list.d/armhf.list
echo 'deb [arch=arm64,armhf] http://ports.ubuntu.com/ bionic-updates main restricted universe multiverse' >> /etc/apt/sources.list.d/armhf.list
echo 'deb [arch=arm64,armhf] http://ports.ubuntu.com/ bionic-security main restricted universe multiverse' >> /etc/apt/sources.list.d/armhf.list
echo 'deb [arch=arm64,armhf] http://ports.ubuntu.com/ bionic-backports main restricted universe multiverse' >> /etc/apt/sources.list.d/armhf.list
sed -i 's#deb http://archive.ubuntu.com/ubuntu/#deb [arch=amd64] http://archive.ubuntu.com/ubuntu/#g' /etc/apt/sources.list
yes | add-apt-repository ppa:deadsnakes/ppa
apt-get update
apt-get install -y python${PYTHON_VERSION} python${PYTHON_VERSION}-dev
#/usr/local/bin/python3.x is needed to use /install/install_pip_packages_by_version.sh
ln -sf /usr/bin/python${PYTHON_VERSION} /usr/local/bin/python${PYTHON_VERSION}
apt-get install -y libpython${PYTHON_VERSION}-dev:armhf
apt-get install -y libpython${PYTHON_VERSION}-dev:arm64

SPLIT_VERSION=(`echo ${PYTHON_VERSION} | tr -s '.' ' '`)
if [[ SPLIT_VERSION[0] -eq 3 ]] && [[ SPLIT_VERSION[1] -ge 8 ]]; then
  apt-get install -y python${PYTHON_VERSION}-distutils
fi

/install/install_pip_packages_by_version.sh "/usr/local/bin/pip${PYTHON_VERSION}"
ln -sf /usr/local/lib/python${PYTHON_VERSION}/dist-packages/numpy/core/include/numpy /usr/include/python${PYTHON_VERSION}/numpy

<original install_pi_python3x_toolchain.sh>

Make a symbolic link for python3.9 so that the python3 command automatically runs Python 3.9.

PYTHON_VERSION=$1
dpkg --add-architecture armhf
dpkg --add-architecture arm64
echo 'deb [arch=arm64,armhf] http://ports.ubuntu.com/ bionic main restricted universe multiverse' >> /etc/apt/sources.list.d/armhf.list
echo 'deb [arch=arm64,armhf] http://ports.ubuntu.com/ bionic-updates main restricted universe multiverse' >> /etc/apt/sources.list.d/armhf.list
echo 'deb [arch=arm64,armhf] http://ports.ubuntu.com/ bionic-security main restricted universe multiverse' >> /etc/apt/sources.list.d/armhf.list
echo 'deb [arch=arm64,armhf] http://ports.ubuntu.com/ bionic-backports main restricted universe multiverse' >> /etc/apt/sources.list.d/armhf.list
sed -i 's#deb http://archive.ubuntu.com/ubuntu/#deb [arch=amd64] http://archive.ubuntu.com/ubuntu/#g' /etc/apt/sources.list
yes | add-apt-repository ppa:deadsnakes/ppa
apt-get update
apt-get install -y python${PYTHON_VERSION} python${PYTHON_VERSION}-dev
#/usr/local/bin/python3.x is needed to use /install/install_pip_packages_by_version.sh
ln -sf /usr/bin/python${PYTHON_VERSION} /usr/local/bin/python${PYTHON_VERSION}
#Add this Line
ln -sf /usr/bin/python${PYTHON_VERSION} /usr/bin/python3
apt-get install -y libpython${PYTHON_VERSION}-dev:armhf
apt-get install -y libpython${PYTHON_VERSION}-dev:arm64

SPLIT_VERSION=(`echo ${PYTHON_VERSION} | tr -s '.' ' '`)
if [[ SPLIT_VERSION[0] -eq 3 ]] && [[ SPLIT_VERSION[1] -ge 8 ]]; then
  apt-get install -y python${PYTHON_VERSION}-distutils
fi

/install/install_pip_packages_by_version.sh "/usr/local/bin/pip${PYTHON_VERSION}"
ln -sf /usr/local/lib/python${PYTHON_VERSION}/dist-packages/numpy/core/include/numpy /usr/include/python${PYTHON_VERSION}/numpy

<modified install_pi_python3x_toolchain.sh>


Modify the tensorflow/tools/ci_build/install/install_auditwheel.sh file as follows.

set -e

sudo pip3 install auditwheel==2.0.0

# Pin wheel==0.31.1 to work around issue
# https://github.com/pypa/auditwheel/issues/102
sudo pip3 install wheel==0.31.1

set +e
patchelf_location=$(which patchelf)
if [[ -z "$patchelf_location" ]]; then
  set -e
  # Install patchelf from source (it does not come with trusty package)
  wget https://nixos.org/releases/patchelf/patchelf-0.9/patchelf-0.9.tar.bz2
  tar xfa patchelf-0.9.tar.bz2
  cd patchelf-0.9
  ./configure --prefix=/usr/local
  make
  sudo make install
fi
cd ..

<original install_auditwheel.sh>


For Python 3.9, the numpy and setuptools installation lines have been added; this step is not required for Python 3.8.

set -e
PYTHON_VERSION=$1

if [[ "$PYTHON_VERSION" == "3.9" ]]; then
  sudo pip3 install setuptools==60.7.0
  sudo pip3 install numpy==1.22.1
fi

sudo pip3 install auditwheel==2.0.0

# Pin wheel==0.31.1 to work around issue
# https://github.com/pypa/auditwheel/issues/102
sudo pip3 install wheel==0.31.1

set +e
patchelf_location=$(which patchelf)
if [[ -z "$patchelf_location" ]]; then
  set -e
  # Install patchelf from source (it does not come with trusty package)
  wget https://nixos.org/releases/patchelf/patchelf-0.9/patchelf-0.9.tar.bz2
  tar xfa patchelf-0.9.tar.bz2
  cd patchelf-0.9
  ./configure --prefix=/usr/local
  make
  sudo make install
fi
cd ..

<modified install_auditwheel.sh>

It is also recommended to change the Docker image from Ubuntu 16.04 to 18.04 in the tensorflow/tools/ci_build/Dockerfile.pi-python38 and Dockerfile.pi-python37 files.


It's time to build the python wheel

The build option to pay attention to is the Python 3 version used on the target Raspberry Pi OS; any version from 3.7 up can be set. I will build for Python 3.9, because that is the Python 3 version installed on Raspberry Pi OS 64-bit. Note that the build may take several hours depending on your computer's performance.

### Python 3.9
sudo CI_DOCKER_EXTRA_PARAMS="-e CI_BUILD_PYTHON=python3.9 -e CROSSTOOL_PYTHON_INCLUDE_PATH=/usr/include/python3.9" \
  tensorflow/tools/ci_build/ci_build.sh PI-PYTHON39 \
  tensorflow/lite/tools/pip_package/build_pip_package_with_bazel.sh aarch64

After a while, the build is finished and you can check the Tensorflow Lite wheel file for Python 3.9 created as follows.

...... SKIP


wrapper.cc
/workspace/tensorflow/lite/tools/pip_package/gen/tflite_pip/python3.9/interpreter_wrapper/python_utils.h
/workspace/tensorflow/lite/tools/pip_package/gen/tflite_pip/python3.9/interpreter_wrapper/interpreter_wrapper_pybind11.cc
/workspace/tensorflow/lite/tools/pip_package/gen/tflite_pip/python3.9/interpreter_wrapper/numpy.h
/workspace/tensorflow/lite/tools/pip_package/gen/tflite_pip/python3.9/interpreter_wrapper/numpy.cc
/workspace/tensorflow/lite/tools/pip_package/gen/tflite_pip/python3.9/interpreter_wrapper/python_utils.cc
/workspace/tensorflow/lite/tools/pip_package/gen/tflite_pip/python3.9/interpreter_wrapper/python_error_reporter.h
/workspace/tensorflow/lite/tools/pip_package/gen/tflite_pip/python3.9/interpreter_wrapper/python_error_reporter.cc
/workspace/tensorflow/lite/tools/pip_package/gen/tflite_pip/python3.9/interpreter_wrapper/interpreter_wrapper.h
/workspace/tensorflow/lite/tools/pip_package/gen/tflite_pip/python3.9/setup.py
root@ubuntu:/usr/local/src/study/docker/tensorflow# find . -name *.whl
./tensorflow/lite/tools/pip_package/gen/tflite_pip/python3.9/dist/tflite_runtime-2.8.0-cp39-cp39-linux_aarch64.whl

As the file name shows, this is the TensorFlow Lite runtime: TensorFlow version 2.8, Python 3.9, platform aarch64.

If you pass armhf instead of aarch64 to build_pip_package_with_bazel.sh, you can build a 32-bit wheel, and with native, a wheel for 64-bit x86.


Docker Images

If the build is successful, you can also check that the tf_ci.pi-python39 Docker image has been created. You now have a docker image with build environment for aarch64, python 3.9. If the build process is performed again, the build time will be significantly reduced because the Docker image build process is omitted.

root@ubuntu:/tmp# docker images
REPOSITORY          TAG       IMAGE ID       CREATED             SIZE
tf_ci.pi-python39   latest    3b1e395a1cb5   43 minutes ago      2GB
ubuntu              18.04     886eca19e611   6 weeks ago         63.1MB



Install TensorFlow Lite wheel on Raspberry Pi OS 64-bit

Now, install the TensorFlow Lite wheel you just built on the Raspberry Pi OS 64-bit version and check that it works properly.

First, install the Raspberry Pi OS 64-bit desktop version on the Raspberry Pi. For reference, enable ssh in Raspberry Pi Imager when creating the image, and if you will use wireless LAN, include the wireless LAN settings as well.

<Enable SSH and WLAN>


Installation

Before installing the TensorFlow Lite wheel you just built, first install the necessary packages on the Raspberry Pi.

pi@raspberrypi64:~ $ sudo apt install swig libjpeg-dev zlib1g-dev python3-dev \
                   unzip wget python3-pip curl git cmake make libgl1-mesa-glx
pi@raspberrypi64:~ $ sudo pip3 install numpy==1.22.1


And install OpenCV required for testing.

pi@raspberrypi64:~ $ pip3 install opencv-python~=4.5.3.56
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Collecting opencv-python~=4.5.3.56
  Downloading opencv_python-4.5.3.56-cp39-cp39-manylinux2014_aarch64.whl (34.2 MB)
     |████████████████████████████████| 34.2 MB 19 kB/s 
Requirement already satisfied: numpy>=1.19.3 in /usr/local/lib/python3.9/dist-packages (from opencv-python~=4.5.3.56) (1.22.1)
Installing collected packages: opencv-python
Successfully installed opencv-python-4.5.3.56


Then copy the TensorFlow Lite wheel file to the Raspberry Pi and install it with the pip3 command.

pi@raspberrypi64:~ $ pip3 install tflite_runtime-2.8.0-cp39-cp39-linux_aarch64.whl 
Looking in indexes: https://pypi.org/simple, https://www.piwheels.org/simple
Processing ./tflite_runtime-2.8.0-cp39-cp39-linux_aarch64.whl
Requirement already satisfied: numpy>=1.19.2 in /usr/local/lib/python3.9/dist-packages (from tflite-runtime==2.8.0) (1.22.1)
Installing collected packages: tflite-runtime
Successfully installed tflite-runtime-2.8.0


Next, install the tflite-support>=0.3.1 package. It helps extract metadata, such as label data, from a model; I'll show how to use it in an example later.

pi@raspberrypi64:~ $ pip3 install tflite-support>=0.3.1


And simply check if the package is working properly.

pi@raspberrypi64:~ $ python3
Python 3.9.2 (default, Feb 28 2021, 17:03:44) 
[GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tflite_runtime.interpreter import Interpreter
>>> 

It seems to be working fine. Next, we will load an actual TensorFlow Lite model and run it.


Testing TensorFlow Lite


Sample Python File

I wrote some simple Python code. Since I will test remotely over ssh, we will use an image file instead of a camera.

First, download the model used for testing. EfficientDet is an object-detection model with much better performance than the previously common MobileNet.

pi@raspberrypi64:~ $ mkdir data   # copy the sample test image to this directory
pi@raspberrypi64:~ $ mkdir test
pi@raspberrypi64:~ $ cd test
pi@raspberrypi64:~/test $ curl -L https://tfhub.dev/tensorflow/lite-model/efficientdet/lite0/detection/metadata/1?lite-format=tflite -o efficientdet_lite0.tflite

<download efficientdet_lite0.tflite>

And the following is the Python code for testing.

import argparse
import json

import cv2
import numpy as np
from tflite_runtime.interpreter import Interpreter
from tflite_support import metadata

parser = argparse.ArgumentParser(description='object detection')
parser.add_argument("--image", default="/home/pi/data/sample_image.jpg", help="test working directory where the image file exists")
parser.add_argument("--model", default="./efficientdet_lite0.tflite", help="model")
args = parser.parse_args()    


interpreter = Interpreter(model_path=args.model, num_threads=4)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
height = input_details[0]['shape'][1]
width = input_details[0]['shape'][2]
# the model expects this input size
print('Inference Image Height:', height)
print('Inference Image Width:', width)
min_conf_threshold = 0.5

displayer = metadata.MetadataDisplayer.with_model_file(args.model)
model_metadata = json.loads(displayer.get_metadata_json())
# Load label list from metadata.
file_name = displayer.get_packed_associated_file_list()[0]
label_map_file = displayer.get_associated_file_buffer(file_name).decode()
label_list = list(filter(len, label_map_file.splitlines()))

image = cv2.imread(args.image)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
imH, imW, _ = image.shape 
image_resized = cv2.resize(image_rgb, (width, height))
input_data = np.expand_dims(image_resized, axis=0)


# Perform the actual detection by running the model with the image as input
interpreter.set_tensor(input_details[0]['index'],input_data)
interpreter.invoke()

boxes = interpreter.get_tensor(output_details[0]['index'])[0] # Bounding box coordinates of detected objects
classes = interpreter.get_tensor(output_details[1]['index'])[0] # Class index of detected objects
scores = interpreter.get_tensor(output_details[2]['index'])[0] # Confidence of detected objects


for i in range(len(scores)):
    if ((scores[i] > min_conf_threshold) and (scores[i] <= 1.0)):

        # Get bounding box coordinates and draw box
        # Interpreter can return coordinates that are outside of image dimensions, need to force them to be within image using max() and min()
        ymin = int(max(1,(boxes[i][0] * imH)))
        xmin = int(max(1,(boxes[i][1] * imW)))
        ymax = int(min(imH,(boxes[i][2] * imH)))
        xmax = int(min(imW,(boxes[i][3] * imW)))
        
        cv2.rectangle(image, (xmin,ymin), (xmax,ymax), (10, 255, 0), 2)
        object_name = label_list[int(classes[i])]
        label = '%s: %d%%' % (object_name, int(scores[i]*100)) # Example: 'person: 72%'
        labelSize, baseLine = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.7, 2) # Get font size
        label_ymin = max(ymin, labelSize[1] + 10) # Make sure not to draw label too close to top of window
        cv2.rectangle(image, (xmin, label_ymin-labelSize[1]-10), (xmin+labelSize[0], label_ymin+baseLine-10), (255, 255, 255), cv2.FILLED) # Draw white box to put label text in
        cv2.putText(image, label, (xmin, label_ymin-7), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 0), 2) # Draw label text
        # Draw label

        
cv2.imwrite('./result.jpg', image) 

<test_tflite_aarcg64.py>

Now run the sample code.

pi@raspberrypi64:~/test $ python test_tflite_aarcg64.py 
INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
Inference Height: 320
Inference Width: 320
pi@raspberrypi64:~/test $ ls -al
total 5212
drwxr-xr-x 2 pi pi    4096 Feb 19 16:01 .
drwxr-xr-x 9 pi pi    4096 Feb 19 15:33 ..
-rw-r--r-- 1 pi pi 4563519 Feb 19 14:09 efficientdet_lite0.tflite
-rw-r--r-- 1 pi pi  756451 Feb 19 16:01 result.jpg
-rw-r--r-- 1 pi pi    3239 Feb 19 15:59 test_tflite_aarcg64.py


This is the result image. The TensorFlow Lite model for aarch64 works successfully.



Wrapping up

I explained how to build a TensorFlow Lite wheel for both the 32-bit and 64-bit Raspberry Pi OS. This approach lets you build for various Python versions, and you can also build TensorFlow for other ARM-based systems such as the Odroid and Jetson series.

Most of this article is based on the work of PINTO0309 (Katsuya Hyodo).

The article I referenced is at https://github.com/PINTO0309/TensorflowLite-bin.

You can download the code from my GitHub.

 

















