Optimus Prime stepping forth from laptop - AI-generated image from getimg.ai
Never thought much about my laptop GPUs. Even less about Nvidia GPUs, as I gave up on proprietary software 20 years ago. I was quite happy with the open source nouveau driver, until Nvidia's cuDNN allowed OpenCV imaging programs to use Deep Neural Nets - AI.
Installing CUDA
Slowly, for it was a little cumbersome to hold your nose at the same time, I loaded the CUDA Linux toolkit onto my GeForce GT 710 desktop. The process was as unpleasant as ever - 10-year-old proprietary software starts to look like abandonware - but the results were amazing. The GPU heated up like crazy and my desktop blew up, but OpenCV flew.
Acer Aspire M3-581TG
Suddenly there were low-cost possibilities for AI-enabled imaging systems - surveillance video, even augmented reality. And some of my old laptops (defenestrated, of course) had Nvidia GPUs. I started with an old Acer Aspire M3-581TG - it has an Nvidia GeForce 640M, or so the sticker on the keyboard says.
lspci came up with a surprise - the GPU was an Intel GPU:
root@aspireM3:/$lspci
00:00.0 Host bridge: Intel Corporation 3rd Gen Core processor DRAM Controller (rev 09)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)
00:16.0 Communication controller: Intel Corporation 7 Series/C216 Chipset Family MEI Controller #1 (rev 04)
00:1a.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #2 (rev 04)
00:1b.0 Audio device: Intel Corporation 7 Series/C216 Chipset Family High Definition Audio Controller (rev 04)
00:1c.0 PCI bridge: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 1 (rev c4)
00:1c.1 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 2 (rev c4)
00:1c.3 PCI bridge: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 4 (rev c4)
00:1d.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #1 (rev 04)
00:1f.0 ISA bridge: Intel Corporation HM77 Express Chipset LPC Controller (rev 04)
00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)
00:1f.3 SMBus: Intel Corporation 7 Series/C216 Chipset Family SMBus Controller (rev 04)
01:00.0 VGA compatible controller: NVIDIA Corporation GK107M [GeForce GT 640M] (rev a1)
07:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5209 PCI Express Card Reader (rev 01)
0d:00.0 Network controller: Qualcomm Atheros AR9462 Wireless Network Adapter (rev 01)
0e:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetLink BCM57780 Gigabit Ethernet PCIe (rev 01)
Now if I had read all the lines instead of stopping after the first 3, I would have noticed it also had an Nvidia GPU: the GK107M, or GeForce GT 640M. It took quite a few weeks to recover from the shock - two GPUs in a laptop?
The GPUs were switched in and out depending on whether graphics performance or power consumption was being prioritised. Nvidia called this its Optimus system.
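To avoid stopping after the first 3 lines again, the lspci listing can be filtered for graphics devices. The following is a minimal sketch, not something from my original setup: the function name find_gpus and the list of device classes are my own choices, and the sample text is simply three lines copied from the lspci output above.

```python
import re

# Device classes that lspci commonly uses for graphics hardware
# (an assumption; other class names may exist on other machines).
GPU_CLASSES = ("VGA compatible controller", "3D controller", "Display controller")

# Three lines copied from the lspci output of the Aspire M3-581TG.
SAMPLE = """\
00:00.0 Host bridge: Intel Corporation 3rd Gen Core processor DRAM Controller (rev 09)
00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)
01:00.0 VGA compatible controller: NVIDIA Corporation GK107M [GeForce GT 640M] (rev a1)
"""


def find_gpus(lspci_text):
    """Return (address, description) for every graphics device in lspci output."""
    gpus = []
    for line in lspci_text.splitlines():
        # Each line looks like "01:00.0 <device class>: <description>".
        match = re.match(r"^(\S+)\s+([^:]+):\s+(.*)$", line.strip())
        if match and match.group(2) in GPU_CLASSES:
            gpus.append((match.group(1), match.group(3)))
    return gpus


def show_gpus(lspci_text=SAMPLE):
    """Print one line per graphics device found."""
    for address, description in find_gpus(lspci_text):
        print(address, description)
```

Calling show_gpus() on the sample prints both the Intel and the Nvidia entries. On a live machine the text could come from subprocess.check_output(["lspci"], universal_newlines=True) instead.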
GPU Switching
Now the GT 640M is quite an old GPU, and the best way would be to install CUDA/cuDNN/OpenCV on a matching Ubuntu distribution. But my M3-581TG had been defenestrated 10 years ago. It ran Slackware 14.2-current and had too much work on it to reinstall from scratch.
The Nvidia GPU driver, CUDA Toolkit, cuDNN and OpenCV are notoriously finicky about each other, and you need to get the versions just right, not to mention your gcc, libraries and various Linux bits. CUDA and cuDNN are proprietary blobs, so it is a matter of installing the various versions until one works. The first thing to do is to go past the Nvidia marketing guff and find out the GT 640M's GPU architecture.
Its real name is the GK107 and the architecture is Kepler.
First the driver. I started with the slackbuild version, r460.67. Normally you do a slackbuild with the Nvidia blob, but I had good results with the Nvidia installer on the GT 710, so I downloaded it from Nvidia and ran it directly:
#sh NVIDIA-Linux-x86_64-460.67.run
$sh ./dkms.SlackBuild
$upgradepkg --install-new /tmp/dkms-2.8.4-x86_64-1_SBo.tgz
After which it needs to be run as a service, so
$vi /etc/rc.d/rc.modules.local
# Enable DKMS module rebuilding
if [ -x /usr/lib/dkms/dkms_autoinstaller ]; then
echo "Running DKMS autoinstaller"
/usr/lib/dkms/dkms_autoinstaller start
fi
dkms may result in build errors, so in the end I deselected it. After the installer finished, the original nouveau driver was blacklisted and the Nvidia driver loaded, but X would not start. It turned out I first needed to look up the GPU bus number with lspci:
01:00.0 VGA compatible controller: NVIDIA Corporation GK107M [GeForce GT 640M] (rev a1)
And enter it into a new xorg.conf:
# cat /etc/X11/xorg.conf
Section "Module"
Load "modesetting"
EndSection
Section "Device"
Identifier "Device0"
Driver "nvidia"
BusID "PCI:1:0:0"
Option "AllowEmptyInitialConfiguration"
EndSection
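One thing to watch when copying the bus number across: lspci prints the address in hexadecimal, while the xorg.conf BusID is, as far as I know, read as decimal. For 01:00.0 the two happen to look the same, but a GPU on bus 0a would need PCI:10:0:0. A minimal sketch of the conversion follows; the helper name xorg_bus_id is made up for illustration.

```python
def xorg_bus_id(lspci_address):
    """Convert an lspci address such as "01:00.0" to an xorg.conf BusID string."""
    # Keep only the last two colon-separated fields, so an optional
    # PCI domain prefix ("0000:01:00.0") is ignored.
    parts = lspci_address.strip().split(":")
    bus, device_function = parts[-2], parts[-1]
    device, function = device_function.split(".")
    # lspci prints hexadecimal numbers; the BusID is written in decimal.
    return "PCI:{}:{}:{}".format(int(bus, 16), int(device, 16), int(function, 16))


def show_bus_id(lspci_address="01:00.0"):
    """Print the BusID line for an xorg.conf Device section."""
    print('BusID "{}"'.format(xorg_bus_id(lspci_address)))
```

For the GT 640M at 01:00.0 this gives the same PCI:1:0:0 used in the xorg.conf above.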
With X up, check the loaded driver:
$nvidia-smi
Sat Jun 8 21:53:52 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.67 Driver Version: 460.67 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GT 640M Off | 00000000:01:00.0 N/A | N/A |
| N/A 62C P8 N/A / N/A | 149MiB / 981MiB | N/A Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
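The boxed table is fine for a glance, but nvidia-smi also has a CSV query mode that is easier to script. Below is a minimal sketch, assuming the fields name, driver_version and memory.total are available on this driver. The sample line is not captured output; I assembled it from the values in the table above purely to illustrate the parsing.

```python
import csv
import subprocess

# Fields requested from nvidia-smi's --query-gpu option.
QUERY_FIELDS = ["name", "driver_version", "memory.total"]

# Illustrative line built from the table above, not real captured output.
SAMPLE_LINE = "GeForce GT 640M, 460.67, 981 MiB\n"


def parse_gpu_query(csv_text):
    """Turn 'csv,noheader' output into a list of dicts, one per GPU."""
    gpus = []
    lines = csv_text.strip().splitlines()
    for row in csv.reader(lines, skipinitialspace=True):
        if row:
            values = [value.strip() for value in row]
            gpus.append(dict(zip(QUERY_FIELDS, values)))
    return gpus


def query_gpus():
    """Run nvidia-smi and parse its answer (needs a working Nvidia driver)."""
    command = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader",
    ]
    # universal_newlines keeps this usable on Python 3.6.
    output = subprocess.check_output(command, universal_newlines=True)
    return parse_gpu_query(output)
```

parse_gpu_query(SAMPLE_LINE) returns one dict with the name, driver version and total memory as strings, which is handy when checking several machines for a driver that matches a given CUDA release.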
Next is CUDA. The cuDNN compatibility matrix says 10.1.243, but I had good luck with CUDA 10.2.89 and it was very close to 10.1.243, so
$sh ./cuda_10.2.89_440.33.01_linux.run
Note I took care not to install the included GPU driver as I already had a working 460.67.
After that you will need to include the CUDA path in your bash profile:
$cat ~/.bash_profile
PATH=$HOME/utils:/usr/local/cuda-10.2/bin:$PATH
export PS1="\u@\h:\w\$"
Then compile and run a small device query program to confirm that CUDA can see the GPU:
$nvcc -o check_cuda check_cuda.c -lcuda
$./check_cuda
Found 1 device(s).
Device: 0
Name: GeForce GT 640M
Compute Capability: 3.0
Multiprocessors: 2
Concurrent threads: 4096
GPU clock: 708.5 MHz
Memory clock: 900 MHz
Total Memory: 981 MiB
Free Memory: 723 MiB
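The Compute Capability line is the one that matters for the OpenCV build later on, because nvcc names its architectures after it: capability 3.0 becomes compute_30 and sm_30. A minimal sketch of that mapping follows. The helper name gencode_flag is hypothetical, and it only illustrates the naming scheme; it is not a recommendation of which flags to pass.

```python
def gencode_flag(compute_capability):
    """Map a compute capability string such as "3.0" to an nvcc -gencode flag."""
    major, minor = compute_capability.strip().split(".")
    # nvcc drops the dot: capability 3.0 is written as 30.
    arch = "{}{}".format(int(major), int(minor))
    return "-gencode=arch=compute_{0},code=sm_{0}".format(arch)


def show_flag(compute_capability="3.0"):
    """Print the flag for the capability reported by check_cuda."""
    print(gencode_flag(compute_capability))
```

The same number, with the dot kept, is what OpenCV's CUDA_ARCH_BIN option expects.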
Next is cuDNN. The slackbuild version is 8.0, but that did not work out with OpenCV, so I dialed it down a notch to cuDNN 7.6.5. This time I went with the slackbuild, with a few mods to get it to work:
$cp cudnn.SlackBuild cudnn.SlackBuild-v8.0_11.0
$cat cudnn.SlackBuild
PRGNAM=cudnn
VERSION=${VERSION:-v7.6_10.2}
BUILD=${BUILD:-1}
TAG=${TAG:-_SBo}
CUDNN_VERSION=${VERSION%_*}
CUDA_VERSION=${VERSION#*_}
$ln -s cudnn-10.2-linux-x64-v7.6.5.32.tgz cudnn-10.2-linux-x64-v7.6.tgz
cuda/include/cudnn.h
cuda/NVIDIA_SLA_cuDNN_Support.txt
cuda/lib64/libcudnn.so
cuda/lib64/libcudnn.so.7
cuda/lib64/libcudnn.so.7.6.5
cuda/lib64/libcudnn_static.a
Slackware package /tmp/cudnn-v7.6_10.2-x86_64-1_SBo.tgz created.
$upgradepkg --install-new /tmp/cudnn-v7.6_10.2-x86_64-1_SBo.tgz
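The two parameter expansions in the slackbuild are what make the symlink necessary: VERSION is split at the underscore into a cuDNN part and a CUDA part. The sketch below mirrors that split in Python. The tarball name pattern is my inference from the symlink created above, not something read out of the slackbuild, and both function names are made up.

```python
def split_version(version):
    """Mimic ${VERSION%_*} and ${VERSION#*_} from the slackbuild."""
    if "_" not in version:
        # With no underscore, both shell expansions leave the value unchanged.
        return version, version
    cudnn_version = version.rpartition("_")[0]  # ${VERSION%_*}: cut from the last "_"
    cuda_version = version.partition("_")[2]    # ${VERSION#*_}: cut up to the first "_"
    return cudnn_version, cuda_version


def tarball_name(version):
    """Guess the tarball name wanted for a VERSION (pattern inferred from the symlink)."""
    cudnn_version, cuda_version = split_version(version)
    return "cudnn-{}-linux-x64-{}.tgz".format(cuda_version, cudnn_version)
```

With VERSION set to v7.6_10.2 this gives cudnn-10.2-linux-x64-v7.6.tgz, which is why the downloaded cudnn-10.2-linux-x64-v7.6.5.32.tgz had to be linked to the shorter name.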
We have suffered losses, but we will install OpenCV ...
The cmake is:
heong@aspireM3:~/cuda/opencv/build$cmake -D CUDA_NVCC_FLAGS="-D_FORCE_INLINES -gencode=arch=compute_35,code=sm_35" -D CMAKE_BUILD_TYPE=RELEASE -D OPENCV_GENERATE_PKGCONFIG=ON -DBUILD_SHARED_LIBS=OFF -D CMAKE_INSTALL_PREFIX=/usr/local -D INSTALL_C_EXAMPLES=OFF -D BUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF -D BUILD_EXAMPLES=OFF -D WITH_OPENEXR=OFF -D WITH_CUDA=ON -D WITH_CUBLAS=ON -D WITH_CUDNN=ON -D CUDA_ARCH_BIN=3.0 -D OPENCV_DNN_CUDA=ON -D OPENCV_EXTRA_MODULES_PATH=~/cuda/opencv/opencv_contrib-4.3.0/modules -D LDFLAGS="-pthread -lpthread" -D CUDNN_VERSION="7.6" ~/cuda/opencv/opencv-4.3.0/
Note the use of the Compute Capability number. The cuDNN version number has to be specified explicitly, as cmake persistently fails to extract it from the cuDNN include files.
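If you would rather read the version from the header than from the tarball name, the cuDNN 7 cudnn.h defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL. A minimal sketch of pulling them out is below. The function name is hypothetical, the sample header contains only those three lines, and where cudnn.h ends up after the slackbuild install is left for you to pass in.

```python
import re

# Cut-down stand-in for cudnn.h: only the three version defines.
SAMPLE_HEADER = """\
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
"""


def cudnn_version_from_header(header_text):
    """Return "major.minor.patch" from cudnn.h text, or None if a define is missing."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        pattern = r"^\s*#\s*define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


def cudnn_version_from_file(path):
    """Read a header file and extract the version (path supplied by the caller)."""
    with open(path) as header:
        return cudnn_version_from_header(header.read())
```

For the sample this returns 7.6.5; the first two components are what went into -D CUDNN_VERSION above.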
Then it is
$make -j 4
and then
$su -c "make install"
That seems to have resulted in 2 files:
root@aspireM3:/$ls -lh /usr/local/lib/python3.6/site-packages/cv2/python-3.6
total 255M
-rwxr-xr-x 1 root root 255M Jun 16 22:58 cv2.cpython-36m-x86_64-linux-gnu.so
root@aspireM3:/$ls -lh /usr/local/lib/python2.7/site-packages/cv2/python-2.7
total 255M
-rwxr-xr-x 1 root root 255M Jun 16 22:57 cv2.so
And I simply did
$ln -s /usr/local/lib/python3.6/site-packages/cv2/python-3.6/cv2.cpython-36m-x86_64-linux-gnu.so /usr/local/lib/python3.6/site-packages/cv2/python-3.6/cv2.so
$export PYTHONPATH="/usr/local/lib/python3.6/site-packages/cv2/python-3.6/"
A very quick test is
$python3
Python 3.6.8 (default, Jan 13 2019, 13:36:07)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>>
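A clean import only shows that the module loads, not that CUDA made it into the build. cv2.getBuildInformation() returns the cmake summary as text, and the CUDA and cuDNN lines can be picked out of it. The sketch below assumes those lines are labelled "NVIDIA CUDA" and "cuDNN". The sample text is a hypothetical excerpt written for illustration, not output from my build, and the real layout may differ.

```python
# Hypothetical excerpt in the style of cv2.getBuildInformation(); not real output.
SAMPLE_BUILD_INFO = """\
General configuration for OpenCV 4.3.0
  NVIDIA CUDA:                   YES (ver 10.2, CUFFT CUBLAS)
  cuDNN:                         YES (ver 7.6.5)
"""


def cuda_build_summary(build_info):
    """Pick the CUDA and cuDNN lines out of OpenCV's build information text."""
    summary = {}
    for line in build_info.splitlines():
        key, separator, value = line.strip().partition(":")
        if separator and key in ("NVIDIA CUDA", "cuDNN"):
            summary[key] = value.strip()
    return summary


def report_from_cv2():
    """Print the summary for the installed cv2 (needs the OpenCV build from this post)."""
    import cv2  # imported here so the parser above works without OpenCV
    print(cuda_build_summary(cv2.getBuildInformation()))
    print("CUDA devices:", cv2.cuda.getCudaEnabledDeviceCount())
```

Running report_from_cv2() from the python3 prompt should list the CUDA and cuDNN entries and a device count of at least 1 if the GPU is visible to OpenCV.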
Amos Stailey-Young's sample code did not work for me, but sr6033's code is very similar and worked well.
$python3 detect_faces_video.py --prototxt prototxt.txt --model res10_300x300_ssd_iter_140000.caffemodel
[INFO] loading model...
[INFO] starting video stream...
[ WARN:0] global /home/heong/cuda/opencv/opencv-4.3.0/modules/videoio/src/cap_gstreamer.cpp (935) open OpenCV | GStreamer warning: Cannot query video position: status=0, value=-1, duration=-1
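For the DNN module to use the GPU at all, the network has to be told to prefer the CUDA backend and target. The post does not show whether detect_faces_video.py already does this, so the following is only a sketch of how it could be done. The helper prefer_cuda is my own invention, and it takes the cv2 module as an argument purely so that it can be exercised without OpenCV installed.

```python
def prefer_cuda(net, cv2_module):
    """Ask an OpenCV DNN network to run on the GPU if this build can see one.

    net        -- a network such as the one returned by cv2.dnn.readNetFromCaffe
    cv2_module -- the imported cv2 module (passed in to keep this testable)
    Returns True if the CUDA backend was requested, False otherwise.
    """
    if cv2_module.cuda.getCudaEnabledDeviceCount() > 0:
        net.setPreferableBackend(cv2_module.dnn.DNN_BACKEND_CUDA)
        net.setPreferableTarget(cv2_module.dnn.DNN_TARGET_CUDA)
        return True
    # No usable GPU: leave the network on OpenCV's default CPU path.
    return False


def load_face_detector(prototxt, model):
    """Load the Caffe face detector and request the GPU (needs cv2 and the model files)."""
    import cv2
    net = cv2.dnn.readNetFromCaffe(prototxt, model)
    prefer_cuda(net, cv2)
    return net
```

With the files from the command above, load_face_detector("prototxt.txt", "res10_300x300_ssd_iter_140000.caffemodel") would return a network ready for forward passes on the GT 640M, assuming the build has OPENCV_DNN_CUDA enabled as in the cmake line.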
For python2:
$export PYTHONPATH="/usr/local/lib/python2.7/site-packages/cv2/python-2.7/"
heong@aspireM3:~/cuda/opencv/build$python
Python 2.7.15 (default, Jun 17 2018, 22:57:51)
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>>
"No sacrifice, no victory ..."
And there you have it: OpenCV 4.3.0 with CUDA 10.2.89 and cuDNN 7.6.5 running on the Nvidia GT 640M of an ancient Aspire M3-581TG laptop. Maybe my next laptop will have an Nvidia GPU with 8GB RAM ... what was it that Optimus Prime said? "Hang on to your dreams, Chip. The future is built on dreams."