The British company Arm Holdings licenses the ARM Central Processing Unit (CPU) design to manufacturers such as Samsung, Apple and Broadcom for use in their own chip designs, which may add further functionality.
The Raspberry Pi 3B+ has been available since March 2018. It features a Broadcom BCM2837B0 SoC (System-on-Chip) containing a Cortex-A53 CPU, which implements the ARMv8.0-A instruction set.
The ARMv8.0-A instruction set makes mandatory some functionality that was previously optional. One of the biggest differences relates to the floating-point unit (FPU): previously, to make up for a missing FPU, the operating system (OS) had to provide a software FPU (soft-fpu for short).
Operating systems such as Raspbian, which is based on the rather conservative Debian distribution, ship gcc-6.3.0 as the default compiler, and that compiler is not set up to exploit the ARMv8 architecture. In fact, it assumes it is compiling for ARMv6!
To verify this, compile a simple C file with the stock gcc-6 using the command line:
gcc -march=native -S -fverbose-asm -mprint-tune-info -o stub.S stub.c
In the assembler dump stub.S you'll find the options that were passed implicitly (listed thanks to -fverbose-asm), including -march=armv6, -mfloat-abi=hard (which is good) and -mfpu=vfp (which is not so good).
The options to use are based on the Wiki pages above and on the output of cat /proc/cpuinfo:
...
Features: half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32
...
According to the Linux source file arch/arm/kernel/setup.c, these features include the Advanced Single-Instruction Multiple-Data (SIMD) instructions called NEON (neon), 32 64-bit registers (vfpd32) and the cyclic-redundancy-check instructions (crc32), but not the cryptography extension (such as aes, sha1, sha2).
Now we want to compile an up-to-date gcc-8.1.0 on the Raspberry Pi to make the best use of the instruction set and the Cortex-A53's units.
Several websites and blogs already document how to compile gcc-8.1.0; however, they do not address the issues mentioned above.
I found that configuring gcc with the following options provides good results:
SOURCE_DIR_OF_GCC/configure --program-suffix=-8.1 --with-arch=armv8-a+simd+crc --with-float=hard --build=arm-linux-gnueabihf --enable-languages=c,c++,fortran,lto --disable-multilib --enable-checking=release
These are the differences from the configuration in the blog mentioned above:
- Use a shorter suffix, so that the new compilers are automatically available in the path /usr/local/bin as gcc-8.1, g++-8.1 and gfortran-8.1.
- Properly set the architecture of the processor: it is an ARMv8.0-A architecture, which does have SIMD and CRC instructions.
- Include link-time optimization (lto) for C, C++ and Fortran.
- Limit the compiler's internal consistency checks to release level (reducing its memory footprint).
- Only set the build triplet (--build), so that the necessary libraries are found in /usr/lib/arm-linux-gnueabihf, but do not set cross-compiling flags such as --target.
Once this compiler is installed, we can compile OpenCV and friends using updated compiler flags:
ccmake SOURCE_TO_OPENCV -DCMAKE_C_COMPILER=gcc-8.1 -DCMAKE_CXX_COMPILER=g++-8.1 -DCMAKE_CXX_FLAGS="-march=armv8-a+simd+crc -mfpu=auto -mtune=cortex-a53" -DCMAKE_C_FLAGS="-march=armv8-a+simd+crc -mfpu=auto -mtune=cortex-a53" ...