Monday, 17 June 2024

Optimus under the Hood: OpenCV with CUDA for Nvidia GT 640M GPU and Slackware 14.2

Optimus Prime stepping forth from laptop - AI-generated image from getimg.ai

Never thought much about my laptop GPUs. Even less about Nvidia GPUs, as I gave up on proprietary software 20 years ago. I was quite happy with the open source nouveau driver, until Nvidia's cuDNN allowed OpenCV imaging programs to use Deep Neural Nets - AI.

Installing CUDA

Slowly, for it was a little cumbersome to hold my nose at the same time, I loaded the CUDA Linux toolkit onto my GeForce GT 710 desktop. The process was as unpleasant as ever - 10-year-old proprietary software starts to look like abandonware - but the results were amazing. The GPU heated up like crazy and my desktop blew up, but OpenCV flew.

Acer Aspire M3-581TG


Suddenly there were low-cost possibilities for AI-enabled imaging systems - surveillance video, even augmented reality. And some of my old laptops (defenestrated, of course) had Nvidia GPUs. I started with an old Acer Aspire M3-581TG - it has an Nvidia GeForce GT 640M, or so the sticker on the keyboard says.

lspci came up with a surprise - the GPU was an Intel GPU:

root@aspireM3:/$lspci

00:00.0 Host bridge: Intel Corporation 3rd Gen Core processor DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
00:16.0 Communication controller: Intel Corporation 7 Series/C216 Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 7 Series/C216 Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 1 (rev c4)
00:1c.1 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 2 (rev c4)
00:1c.3 PCI bridge: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 4 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation HM77 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 7 Series/C216 Chipset Family SMBus Controller (rev 04)
01:00.0 VGA compatible controller: NVIDIA Corporation GK107M [GeForce GT 640M] (rev a1)
07:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5209 PCI Express Card Reader (rev 01)
0d:00.0 Network controller: Qualcomm Atheros AR9462 Wireless Network Adapter (rev 01)
0e:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetLink BCM57780 Gigabit Ethernet PCIe (rev 01)

Now if I had read all the lines instead of stopping after the first three, I would have noticed it also had an Nvidia GPU: the GK107M, or GeForce GT 640M. It took quite a few weeks to recover from the shock - two GPUs in a laptop? The GPUs were switched in and out depending on whether graphics performance or power consumption was being prioritised. Nvidia called this its Optimus system.

GPU Switching


Now the GT 640M is quite an old GPU, and the best way would be to install CUDA/cuDNN/OpenCV on a matching Ubuntu distribution. But my M3-581TG had been defenestrated 10 years ago. It ran Slackware 14.2-current, and it was too much work to install a new distribution on it.

The Nvidia driver, CUDA Toolkit, cuDNN and OpenCV are notoriously finicky, and you need to get the versions just right - not to mention your gcc, libraries and various Linux bits. CUDA and cuDNN are proprietary blobs, so it is a matter of installing the various versions until one works. The first thing to do is to get past the Nvidia marketing guff and find out the GT 640M's GPU architecture. Its real name is the GK107, and the architecture is Kepler.

Then you need to find the GK107's Compute Capability, which according to Nvidia is 3.0. From the cuDNN Support Matrix, the chances of it working with cuDNN 7.6.4, CUDA 10.1.243 and a Linux driver of at least r418.39 seemed promising.

First, the driver. I started with the SlackBuild version, r460.67. Normally you would do a SlackBuild with the Nvidia blob, but I had good results with the Nvidia installer on the GT 710, so I downloaded the blob from Nvidia and ran it directly:

#sh NVIDIA-Linux-x86_64-460.67.run

Now if you select the dkms option the installer will fail, and you will need to build dkms from its SlackBuild first:
$sh ./dkms.SlackBuild
$upgradepkg --install-new /tmp/dkms-2.8.4-x86_64-1_SBo.tgz
After which it needs to be run as a service, so:
$vi /etc/rc.d/rc.modules.local

# Enable DKMS module rebuilding
if [ -x /usr/lib/dkms/dkms_autoinstaller ]; then
  echo "Running DKMS autoinstaller"
  /usr/lib/dkms/dkms_autoinstaller start
fi

dkms may result in build errors, so in the end I deselected it. After the installer finished, the original nouveau driver was blacklisted and the Nvidia driver loaded, but X would not start. It turned out I first needed to get the GPU's bus number from lspci:
# lspci
01:00.0 VGA compatible controller: NVIDIA Corporation GK107M [GeForce GT 640M] (rev a1)

And enter it into a new xorg.conf:
# cat /etc/X11/xorg.conf

Section "Module"
    Load "modesetting"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver "nvidia"
    BusID "PCI:1:0:0"
    Option "AllowEmptyInitialConfiguration"
EndSection

With X up, check the loaded driver:
$nvidia-smi
Sat Jun  8 21:53:52 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.67       Driver Version: 460.67       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GT 640M     Off  | 00000000:01:00.0 N/A |                  N/A |
| N/A   62C    P8    N/A /  N/A |    149MiB /   981MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Next is CUDA. The cuDNN compatibility matrix says 10.1.243, but I had good luck with CUDA 10.2.89, which is very close to 10.1.243, so:
$sh ./cuda_10.2.89_440.33.01_linux.run
Note I took care not to install the included GPU driver as I already had a working 460.67.

After that you will need to include the CUDA path in your bash profile:
$cat ~/.bash_profile
PATH=$HOME/utils:/usr/local/cuda-10.2/bin:$PATH
export PS1="\u@\h:\w\$"

To test, there is a neat little test program; compile and run it:
$nvcc -o check_cuda check_cuda.c -lcuda
$./check_cuda
Found 1 device(s).
Device: 0
  Name: GeForce GT 640M
  Compute Capability: 3.0
  Multiprocessors: 2
  Concurrent threads: 4096
  GPU clock: 708.5 MHz
  Memory clock: 900 MHz
  Total Memory: 981 MiB
  Free Memory: 723 MiB
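
If you cannot lay hands on the C test program, much the same information can be pulled straight out of the driver from Python with a few ctypes calls into libcuda. A minimal sketch, assuming the Nvidia installer left libcuda.so where the loader can find it:

import ctypes

# Talk directly to the Nvidia driver API installed by the .run installer
cuda = ctypes.CDLL("libcuda.so")

def check(res):
    # All driver API calls return 0 (CUDA_SUCCESS) on success
    if res != 0:
        raise RuntimeError("CUDA driver call failed with error %d" % res)

check(cuda.cuInit(0))
count = ctypes.c_int()
check(cuda.cuDeviceGetCount(ctypes.byref(count)))
print("Found %d device(s)." % count.value)

for i in range(count.value):
    dev = ctypes.c_int()
    check(cuda.cuDeviceGet(ctypes.byref(dev), i))
    name = ctypes.create_string_buffer(100)
    check(cuda.cuDeviceGetName(name, 100, dev))
    major, minor = ctypes.c_int(), ctypes.c_int()
    check(cuda.cuDeviceComputeCapability(ctypes.byref(major),
                                         ctypes.byref(minor), dev))
    print("Device: %d" % i)
    print("  Name: %s" % name.value.decode())
    print("  Compute Capability: %d.%d" % (major.value, minor.value))

On the GT 640M this should report Compute Capability 3.0, matching Nvidia's table.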

Next is cuDNN. The SlackBuild version is 8.0, but that did not work out with OpenCV, so I dialed it down a notch to cuDNN 7.6.5. This time I went with the SlackBuild, with a few mods to get it to work:
$cp cudnn.SlackBuild cudnn.SlackBuild-v8.0_11.0
$cat cudnn.SlackBuild

PRGNAM=cudnn
VERSION=${VERSION:-v7.6_10.2}
BUILD=${BUILD:-1}
TAG=${TAG:-_SBo}

CUDNN_VERSION=${VERSION%_*}
CUDA_VERSION=${VERSION#*_}

The SlackBuild looks for the tarball name without the patch version, hence the symlink:
$ln -s cudnn-10.2-linux-x64-v7.6.5.32.tgz cudnn-10.2-linux-x64-v7.6.tgz
$./cudnn.SlackBuild
cuda/include/cudnn.h
cuda/NVIDIA_SLA_cuDNN_Support.txt
cuda/lib64/libcudnn.so
cuda/lib64/libcudnn.so.7
cuda/lib64/libcudnn.so.7.6.5
cuda/lib64/libcudnn_static.a

Slackware package /tmp/cudnn-v7.6_10.2-x86_64-1_SBo.tgz created.
$upgradepkg --install-new /tmp/cudnn-v7.6_10.2-x86_64-1_SBo.tgz


We have suffered losses, but we will install OpenCV ...

Next is the biggie, OpenCV. This usually means lots of iterations. Amos Stailey-Young's page is a good place to start. What worked for me was OpenCV 4.3.0 and opencv_contrib 4.3.0. Untar them into their respective subdirectories.

The cmake is:
heong@aspireM3:~/cuda/opencv/build$cmake \
    -D CUDA_NVCC_FLAGS="-D_FORCE_INLINES -gencode=arch=compute_35,code=sm_35" \
    -D CMAKE_BUILD_TYPE=RELEASE \
    -D OPENCV_GENERATE_PKGCONFIG=ON \
    -D BUILD_SHARED_LIBS=OFF \
    -D CMAKE_INSTALL_PREFIX=/usr/local \
    -D INSTALL_C_EXAMPLES=OFF \
    -D BUILD_TESTS=OFF \
    -D BUILD_PERF_TESTS=OFF \
    -D BUILD_EXAMPLES=OFF \
    -D WITH_OPENEXR=OFF \
    -D WITH_CUDA=ON \
    -D WITH_CUBLAS=ON \
    -D WITH_CUDNN=ON \
    -D CUDA_ARCH_BIN=3.0 \
    -D OPENCV_DNN_CUDA=ON \
    -D OPENCV_EXTRA_MODULES_PATH=~/cuda/opencv/opencv_contrib-4.3.0/modules \
    -D LDFLAGS="-pthread -lpthread" \
    -D CUDNN_VERSION="7.6" \
    ~/cuda/opencv/opencv-4.3.0/

Note the use of the Compute Capability number in CUDA_ARCH_BIN. The cuDNN version number has to be specified explicitly, as cmake persistently fails to extract it from the cuDNN include files.

Then it is:
$make -j 4
and then:
$su -c "make install"

And this seemed to have resulted in two files:
root@aspireM3:/$ls -lh /usr/local/lib/python3.6/site-packages/cv2/python-3.6
total 255M
-rwxr-xr-x 1 root root 255M Jun 16 22:58 cv2.cpython-36m-x86_64-linux-gnu.so
root@aspireM3:/$ls -lh /usr/local/lib/python2.7/site-packages/cv2/python-2.7
total 255M
-rwxr-xr-x 1 root root 255M Jun 16 22:57 cv2.so

And I simply did
$ln -s /usr/local/lib/python3.6/site-packages/cv2/python-3.6/cv2.cpython-36m-x86_64-linux-gnu.so /usr/local/lib/python3.6/site-packages/cv2/python-3.6/cv2.so
$export PYTHONPATH="/usr/local/lib/python3.6/site-packages/cv2/python-3.6/"

A very quick test is:
$python3
Python 3.6.8 (default, Jan 13 2019, 13:36:07) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> 
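
The import working is a good sign, but to make sure this cv2 was really built with CUDA and cuDNN (and not a CPU-only fallback), a short check along these lines should do - cv2.cuda.getCudaEnabledDeviceCount() returns 0 on a CPU-only build:

import cv2

# 0 means a CPU-only build; on the GT 640M this should print 1
print(cv2.cuda.getCudaEnabledDeviceCount())

# The build summary lists the CUDA and cuDNN versions compiled in
for line in cv2.getBuildInformation().splitlines():
    if "CUDA" in line or "cuDNN" in line:
        print(line)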

Amos Stailey-Young's sample code did not work for me, but sr6033's code is very similar and worked well.
$python3 detect_faces_video.py  --prototxt prototxt.txt --model res10_300x300_ssd_iter_140000.caffemodel
[INFO] loading model...
[INFO] starting video stream...
[ WARN:0] global /home/heong/cuda/opencv/opencv-4.3.0/modules/videoio/src/cap_gstreamer.cpp (935) open OpenCV | GStreamer warning: Cannot query video position: status=0, value=-1, duration=-1
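
Note that OpenCV's dnn module defaults to its CPU backend; to actually push inference onto the GT 640M you have to ask for the CUDA backend and target on the loaded net. A minimal sketch, reusing the prototxt and caffemodel from sr6033's code (test.jpg here stands in for any handy test image):

import cv2

# Load the same Caffe SSD face detector used above
net = cv2.dnn.readNetFromCaffe("prototxt.txt",
                               "res10_300x300_ssd_iter_140000.caffemodel")

# Route inference through CUDA/cuDNN instead of the default CPU backend
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)

img = cv2.imread("test.jpg")
blob = cv2.dnn.blobFromImage(cv2.resize(img, (300, 300)), 1.0,
                             (300, 300), (104.0, 177.0, 123.0))
net.setInput(blob)
detections = net.forward()  # the first call is slower while CUDA spins up
print(detections.shape)     # (1, 1, N, 7): N candidate face detections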

For python2:
$export PYTHONPATH="/usr/local/lib/python2.7/site-packages/cv2/python-2.7/"
heong@aspireM3:~/cuda/opencv/build$python
Python 2.7.15 (default, Jun 17 2018, 22:57:51) 
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> 

"No sacrifice, no victory ..."


And there you have it: OpenCV 4.3.0 with CUDA 10.2.89 and cuDNN 7.6.5 running on the Nvidia GT 640M of an ancient Aspire M3-581TG laptop. Maybe my next laptop will have an Nvidia GPU with 8GB RAM ... what was it that Optimus Prime said? "Hang on to your dreams, Chip. The future is built on dreams."