"... down went Alice after it, never once considering how in the world she was to get out again." - Lewis Carroll, 'Alice's Adventures in Wonderland'
I try to avoid proprietary software, which is why I do not usually buy Nvidia graphics cards. If I did, I would use the open-source nouveau driver. But a few weeks ago, I was fooling around with some OpenCV code that uses deep neural networks (DNN) for image super-resolution.
It turned out Nvidia cards are really good at it, but you need their proprietary driver as well as their CUDA libraries. In particular, the OpenCV dnn module uses Nvidia's cuDNN library, which uses CUDA, which in turn uses Nvidia's binary driver.
I started with Google Colab, a free cloud service that offers Nvidia GPUs. That was great for development, but once the program was running it could take many hours to upscale a video, and Colab kept kicking me out after 2 hours for hogging the GPU.
The normal way would be to buy a desktop with, say, an Nvidia RTX 3060 12GB card for RM4200 (less than USD950), but installing and using proprietary systems was bad enough; paying good money for them really hurt. It turned out I had a 7-year-old GeForce GT 710 from Gigabyte lying around inside an even older (12 years!) Asus Crosshair IV Formula with an AMD Phenom II at 3GHz.
So, like Alice, I dived down the rabbit hole of proprietary obsolescence on an impulse. Ubuntu 22.04 installed and ran like a breeze. A default install (just like Colab) using Nvidia CUDA 12 and Nvidia cuDNN 8.9.0 did not work. Actually, none of the three parts (card driver, CUDA and cuDNN) worked.
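With three layers in play, it helps to see which one is actually missing before reinstalling anything. The following is a minimal sketch using only the Python standard library. The function name `probe_nvidia_stack` is hypothetical, and the sketch assumes the driver ships `nvidia-smi` and `libcuda`, the toolkit ships `libcudart`, and cuDNN ships `libcudnn`, all on the default search paths.

```python
import ctypes.util
import shutil


def probe_nvidia_stack():
    """Report which layers of the Nvidia stack can be located on this machine.

    This only checks that the files can be found. It does not prove that the
    versions are compatible with each other or with the card.
    """
    return {
        # The driver package installs nvidia-smi and the libcuda driver library.
        "driver_tool": shutil.which("nvidia-smi") is not None,
        "driver_lib": ctypes.util.find_library("cuda") is not None,
        # The CUDA toolkit provides the runtime library, libcudart.
        "cuda_runtime": ctypes.util.find_library("cudart") is not None,
        # cuDNN is a separate download that installs libcudnn.
        "cudnn": ctypes.util.find_library("cudnn") is not None,
    }


if __name__ == "__main__":
    for layer, found in probe_nvidia_stack().items():
        print(f"{layer:13s} {'found' if found else 'MISSING'}")
```

A toolkit installed under /usr/local/cuda may not be on the linker search path, so a MISSING result for `cuda_runtime` is a hint to look further rather than a verdict.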
Time to do my homework. Gigabyte lists my card as GV-N710SL-2GL, still on sale. The 'specs' listed were mostly marketing guff and quite useless. TechPowerUp came up with the goods: the chip's real name is GK208, its architecture is Kepler and, crucially, its CUDA Compute Capability is 3.5. The official Nvidia CUDA Compute Capability page does not mention the GT 710 at all.
Gigabyte GeForce GT 710
Now not all the websites agree on the GT 710, least of all Nvidia's. The cuDNN Support Matrix excludes the Kepler architecture and implies a minimum CUDA Compute Capability of 5.0.
cuDNN 8.9.0 does not support Kepler
Kepler not included
Yet the 2019 version of the same document, now archived and no longer linked from the main Nvidia cuDNN site, says otherwise:
Kepler supported by cuDNN 7.6.x
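The two support matrices boil down to a comparison between the card's compute capability and each release's floor. Here is a small sketch of that check. The table and function names are hypothetical, the 5.0 floor for cuDNN 8.9.0 is the value the current matrix implies, and the 3.0 floor for 7.6.x is an assumption based only on where the Kepler generation starts.

```python
# Minimum compute capability per cuDNN release, as read from the two
# support matrices discussed above. The 3.0 floor for 7.6.x is an
# assumption: it is simply where the Kepler generation starts.
MIN_COMPUTE_CAPABILITY = {
    "8.9.0": (5, 0),
    "7.6.x": (3, 0),
}


def parse_capability(text):
    """Turn '3.5' into (3, 5) so versions compare as integers, not floats."""
    major, minor = text.split(".")
    return int(major), int(minor)


def cudnn_supports(cudnn_release, device_capability):
    """Return True if the card meets the release's minimum compute capability."""
    floor = MIN_COMPUTE_CAPABILITY[cudnn_release]
    return parse_capability(device_capability) >= floor


if __name__ == "__main__":
    for release in MIN_COMPUTE_CAPABILITY:
        verdict = "supported" if cudnn_supports(release, "3.5") else "not supported"
        print(f"GT 710 (3.5) with cuDNN {release}: {verdict}")
```

Comparing integer tuples rather than floats keeps the check correct should a two-digit minor version ever appear.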
iface enp5s0 inet static
address your.ip.addr.here
netmask 255.255.255.0
gateway your.router.addr.1
dns-nameservers 8.8.8.8
sudo systemctl status ssh
sudo systemctl enable ssh
sudo systemctl start ssh
sudo ufw allow ssh
sudo nano /etc/ssh/sshd_config
sudo service ssh restart
$ sudo apt update
$ sudo apt upgrade
$ sudo apt install build-essential cmake pkg-config unzip yasm git checkinstall
$ sudo apt install libavcodec-dev libavformat-dev libswscale-dev libavresample-dev
$ sudo apt-get install libgtk-3-dev
$ sudo apt-get install python3-dev python3-pip
$ sudo apt-get install libtbb-dev
$ sudo apt-get install libatlas-base-dev gfortran
"Follow the White Rabbit" - Trinity, in "The Matrix" (1999)
Following the White Rabbit
0.1/Prod/local_installers/cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
$ sudo init 3
$ sudo apt-get -y install cuda
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
sudo: /usr/bin/nvidia-uninstall: command not found
nvidia-compute-utils-418/unknown,now 418.87.00-0ubuntu1 amd64 [installed,automatic]
nvidia-dkms-418/unknown,now 418.87.00-0ubuntu1 amd64 [installed,automatic]
nvidia-driver-418/unknown,now 418.87.00-0ubuntu1 amd64 [installed,automatic]
nvidia-kernel-common-418/unknown,now 418.87.00-0ubuntu1 amd64 [installed,automatic]
/usr/bin/nvidia-modprobe: unrecognized option: "-s"
ERROR: Invalid commandline, please run `/usr/bin/nvidia-modprobe --help` for
usage information.
/usr/bin/nvidia-modprobe: unrecognized option: "-s"
ERROR: Invalid commandline, please run `/usr/bin/nvidia-modprobe --help` for
usage information.
Sat Apr 22 11:18:33 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:08:00.0 N/A |                  N/A |
| 33%   38C    P8    N/A /  N/A |     65MiB /  2000MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
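The table is meant for humans; for scripting, nvidia-smi can also print selected fields as CSV through its `--query-gpu` and `--format` options. The sketch below wraps that call. The function names are hypothetical, and the three fields chosen are just the ones that matter for this card.

```python
import subprocess

# Fields requested from nvidia-smi, in the order they will be printed.
QUERY_FIELDS = ("name", "driver_version", "memory.total")


def parse_gpu_line(line):
    """Split one CSV line from nvidia-smi into a dict keyed by field name."""
    values = [value.strip() for value in line.split(",")]
    return dict(zip(QUERY_FIELDS, values))


def query_gpus():
    """Run nvidia-smi and return one dict per GPU it reports.

    Raises subprocess.CalledProcessError if nvidia-smi cannot talk to the
    driver, and FileNotFoundError if nvidia-smi is not installed.
    """
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + ",".join(QUERY_FIELDS),
         "--format=csv,noheader"],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode; works on Python 3.6 as well
        check=True,
    )
    return [parse_gpu_line(line) for line in result.stdout.splitlines() if line.strip()]
```

Calling `query_gpus()` on a machine with a working driver should return a list with one entry per card; on a machine in the broken state described earlier it raises instead, which is a useful signal in itself.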
$ sudo dpkg -i libcudnn7_7.6.4.38-1+cuda10.1_amd64.deb
$ sudo dpkg -i libcudnn7-dev_7.6.4.38-1+cuda10.1_amd64.deb
git clone https://github.com/opencv/opencv_contrib.git
~/opencv_build/opencv/build$ cmake \
    -D CUDA_NVCC_FLAGS="-D_FORCE_INLINES -gencode=arch=compute_35,code=sm_35" \
    -D CMAKE_BUILD_TYPE=RELEASE \
    -D OPENCV_GENERATE_PKGCONFIG=ON \
    -D BUILD_SHARED_LIBS=OFF \
    -D CMAKE_INSTALL_PREFIX=/usr/local \
    -D INSTALL_C_EXAMPLES=OFF \
    -D BUILD_TESTS=OFF \
    -D BUILD_PERF_TESTS=OFF \
    -D BUILD_EXAMPLES=OFF \
    -D WITH_OPENEXR=OFF \
    -D WITH_CUDA=ON \
    -D WITH_CUBLAS=ON \
    -D WITH_CUDNN=ON \
    -D CUDA_ARCH_BIN=3.5 \
    -D OPENCV_DNN_CUDA=ON \
    -D OPENCV_EXTRA_MODULES_PATH=~/opencv_build/opencv_contrib/modules \
    ~/opencv_build/opencv
-- NVIDIA GPU arch: 35
-- NVIDIA PTX archs:
--
-- cuDNN: YES (ver 7.6.4)
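The "35" in the configure output and in the `compute_35`/`sm_35` flags is just the compute capability 3.5 with the dot removed. A tiny helper makes the mapping explicit; the function names are hypothetical, and the flags it emits are the same ones used in the cmake command above.

```python
def gencode_flag(compute_capability):
    """Build the nvcc -gencode option for one compute capability such as '3.5'."""
    major, minor = compute_capability.split(".")
    arch = f"{int(major)}{int(minor)}"
    return f"-gencode=arch=compute_{arch},code=sm_{arch}"


def cuda_cmake_args(compute_capability):
    """Return the two architecture-specific cmake arguments as a list."""
    nvcc_flags = f"-D_FORCE_INLINES {gencode_flag(compute_capability)}"
    return [
        f"-DCUDA_NVCC_FLAGS={nvcc_flags}",
        f"-DCUDA_ARCH_BIN={compute_capability}",
    ]


if __name__ == "__main__":
    for arg in cuda_cmake_args("3.5"):
        print(arg)
```

Returning a list rather than one long string means the arguments can be handed to `subprocess.run` without shell quoting, even though the nvcc flags contain a space.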
total 193M
-rwxrwxr-x 1 heong heong 193M Apr 21 23:59 cv2.cpython-36m-x86_64-linux-gnu.so
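Once the module is built, the same CUDA and cuDNN facts can be read back from Python, since `cv2.getBuildInformation()` returns the configure summary as text. The sketch below pulls out the two lines of interest. The function name is hypothetical, and the parser assumes the summary uses the same "label: value" layout as the cmake output shown above.

```python
import re

# Matches lines such as "-- NVIDIA GPU arch: 35" or "  cuDNN:  YES (ver 7.6.4)".
_BUILD_LINE = re.compile(r"^[ \t-]*(NVIDIA GPU arch|cuDNN):[ \t]*(.*)$", re.MULTILINE)


def cuda_build_summary(build_info):
    """Extract the GPU architecture and cuDNN lines from a build summary."""
    return {m.group(1): m.group(2).strip() for m in _BUILD_LINE.finditer(build_info)}


if __name__ == "__main__":
    try:
        import cv2
    except ImportError:
        print("cv2 is not importable from this interpreter")
    else:
        print(cuda_build_summary(cv2.getBuildInformation()))
        # Zero here means the build has no usable CUDA device.
        print("CUDA devices:", cv2.cuda.getCudaEnabledDeviceCount())
```

An empty dictionary suggests the interpreter picked up a stock cv2 without CUDA rather than the freshly built one, which is worth ruling out given how large the custom module is.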
The One
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA GeForce GT 710"
CUDA Driver Version / Runtime Version 11.4 / 10.1
CUDA Capability Major/Minor version number: 3.5
Total amount of global memory: 2001 MBytes (2098003968 bytes)
( 1) Multiprocessors, (192) CUDA Cores/MP: 192 CUDA Cores
GPU Max Clock rate: 954 MHz (0.95 GHz)
Memory Clock rate: 800 Mhz
Memory Bus Width: 64-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536),
3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: No
Supports Cooperative Kernel Launch: No
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 8 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 10.1, NumDevs = 1
Result = PASS
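The line worth a second look in that output is "CUDA Driver Version / Runtime Version 11.4 / 10.1": the driver supports CUDA up to 11.4, while the installed toolkit is 10.1. A newer driver can run an older runtime, but not the other way round. The sketch below checks that rule against deviceQuery text; the function names are hypothetical.

```python
import re

_VERSION_LINE = re.compile(
    r"CUDA Driver Version / Runtime Version\s+(\d+)\.(\d+)\s*/\s*(\d+)\.(\d+)"
)


def driver_and_runtime(device_query_output):
    """Return ((driver_major, driver_minor), (runtime_major, runtime_minor))."""
    match = _VERSION_LINE.search(device_query_output)
    if match is None:
        raise ValueError("no 'CUDA Driver Version / Runtime Version' line found")
    d_major, d_minor, r_major, r_minor = (int(group) for group in match.groups())
    return (d_major, d_minor), (r_major, r_minor)


def runtime_is_usable(device_query_output):
    """The driver must support a CUDA version at least as new as the runtime."""
    driver, runtime = driver_and_runtime(device_query_output)
    return driver >= runtime


if __name__ == "__main__":
    line = "CUDA Driver Version / Runtime Version 11.4 / 10.1"
    print(driver_and_runtime(line), runtime_is_usable(line))
```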
$ sudo pip3 install imutils
~/sr$ python3 sr.py --model FSRCNN_2x.pb --input 3coyote-10s.webm --fps 25 --useCUDA
Output video will be 3coyote-10s-FSRCNN_2x.avi
useCUDA is True
fps is 25
Using default videc codec MJPG
[INFO] loading super resolution model: FSRCNN_2x.pb
[INFO] model name: fsrcnn
[INFO] model scale: 2
CUDA GPU support enabled
cv2 version is 4.7.0-dev
sys.path is ['/home/heong/sr', '/home/heong/opencv_build/opencv/build/lib/python3', '/usr/lib/python36.zip', '/usr/lib/python3.6', '/usr/lib/python3.6/lib-dynload', '/home/heong/.local/lib/python3.6/site-packages', '/usr/local/lib/python3.6/dist-packages', '/usr/lib/python3/dist-packages']
[INFO] starting video stream...
Opening input video file 3coyote-10s.webm
Waiting 2s to stabilize stream ...
Opening output video file 3coyote-10s-FSRCNN_2x.avi
upscaled.shape=(720, 960, 3)
Opening output video file 3coyote-10s-FSRCNN_2x.avi
upscaled h x w is 720x960
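For readers who do not have sr.py, here is a rough sketch of the part of such a script that the log lines about model name, scale and CUDA correspond to. It is not the actual sr.py. The function names are hypothetical, the filename convention (`FSRCNN_2x.pb` giving "fsrcnn" and 2) is inferred from the log, and it assumes an OpenCV build with the contrib `dnn_superres` module and CUDA enabled.

```python
import os


def model_name_and_scale(model_path):
    """Derive ('fsrcnn', 2) from a filename such as 'FSRCNN_2x.pb'."""
    stem = os.path.splitext(os.path.basename(model_path))[0]
    name, _, scale_part = stem.rpartition("_")
    scale = int("".join(ch for ch in scale_part if ch.isdigit()))
    return name.lower(), scale


def make_upscaler(model_path, use_cuda):
    """Load a super-resolution model and optionally route it to the GPU."""
    import cv2  # imported here so the filename helper works without OpenCV

    name, scale = model_name_and_scale(model_path)
    sr = cv2.dnn_superres.DnnSuperResImpl_create()
    sr.readModel(model_path)
    sr.setModel(name, scale)
    if use_cuda:
        # Without these two calls the dnn module runs on the CPU.
        sr.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
        sr.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
    return sr


if __name__ == "__main__":
    print(model_name_and_scale("FSRCNN_2x.pb"))
```

Each frame would then go through `sr.upsample(frame)`; with a scale of 2 the output has twice the input's height and width, which is consistent with the 720x960 shape in the log.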