cmheong's blog: April 2023

" ... down went Alice after it, never once considering how in the world she was to get out again." - Lewis Carroll, 'Alice's Adventures in Wonderland'

I try to avoid proprietary software, which is why I do not usually buy Nvidia graphics cards. If I did, I would use the open source nouveau driver. But a few weeks ago, I was fooling around with some OpenCV code that uses deep neural networks (DNN) for image super-resolution.

It turned out Nvidia cards are really good at it, but you need to use their proprietary driver as well as their CUDA libraries. In particular, the OpenCV dnn module uses Nvidia's cuDNN library, which uses CUDA, which in turn uses Nvidia's binary driver.

I started with Google Colab, a free cloud service that offers Nvidia GPUs. That was great for development, but once the program is running it can take many hours to super-scale a video, and Colab kept kicking me out after 2 hours for hogging the GPU.

The normal way would be to buy a desktop with, say, an Nvidia RTX 3060 12GB card for RM4200 (less than USD950), but installing and using proprietary systems was bad enough; paying good money for them really hurt. It turned out I had a 7-year-old GeForce GT 710 from Gigabyte lying around inside an even older (12 years!) Asus Crosshair IV Formula with an AMD Phenom II at 3GHz.

So, like Alice, I dived down the rabbit hole of proprietary obsolescence on an impulse. Ubuntu 22.04 installed and ran like a breeze. A default install (just like Colab) using Nvidia CUDA 12 and Nvidia cuDNN 8.9.0 did not work. Actually none of the three parts (card driver, CUDA and cuDNN) worked.

Time to do my homework. Gigabyte lists my card as GV-N710SL-2GL, still on sale. The 'specs' listed were mostly marketing guff and quite useless. Techpowerup came up with the goods: the GPU's real name is GK208, the architecture is Kepler and, crucially, the CUDA Compute Capability is 3.5. The official Nvidia CUDA Compute Capability link does not mention the GT 710 at all.

Gigabyte GeForce GT 710

Now not all the websites agree on the GT 710, least of all Nvidia's. The cuDNN Support Matrix excludes the Kepler architecture and implies a minimum CUDA Compute Capability of 5.0.

cuDNN 8.9.0 does not support Kepler

Kepler not included

Yet the 2019 version of the same document, now archived and no longer linked from the main Nvidia cuDNN site, says otherwise:

Kepler supported by cuDNN 7.6.x

What this feels like is that the GeForce GT 710 is abandonware, probably for marketing reasons. Did I mention I do not like proprietary systems? But there is one more hurdle for Kepler: was CUDA support for OpenCV's DNN module written after the card was abandoned? Luckily it came out of Google Summer of Code in that same year (2019), so the chances are excellent.

So what I need is cuDNN v7.6.4, CUDA 10.1.243 and CUDA Driver r419.39. cuDNN v7.6.4 is still available at the Nvidia cuDNN Archive. I chose the Ubuntu version as it was the same as Colab's. This means regressing to the much older Ubuntu 18.04 though. There are 3 packages: the runtime library, the developer library and the code samples. CUDA 10.1 is available from Nvidia, and I chose CUDA 10.1 Update 2.
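The choice above comes down to comparing the card's compute capability with the lowest capability each cuDNN release accepts. Here is a minimal Python sketch of that check. The 5.0 floor for cuDNN 8.9.0 is what the current support matrix implies; the 3.0 floor for cuDNN 7.6.4 is my assumption of what "Kepler supported" means, and the table and function names are purely illustrative.

```python
# Lowest CUDA Compute Capability accepted by each cuDNN release.
# 5.0 is implied by the cuDNN 8.9.0 support matrix; 3.0 is an ASSUMPTION
# standing in for "Kepler supported" in the archived 7.6.x document.
CUDNN_MIN_CAPABILITY = {
    "8.9.0": (5, 0),
    "7.6.4": (3, 0),
}


def parse_version(text):
    """Turn a dotted string such as '3.5' into a comparable tuple (3, 5)."""
    return tuple(int(part) for part in text.split("."))


def usable_cudnn_versions(capability):
    """Return cuDNN versions whose floor the card meets, newest first.

    Only the lower bound is checked; this says nothing about upper limits.
    """
    cap = parse_version(capability)
    ok = [v for v, floor in CUDNN_MIN_CAPABILITY.items() if cap >= floor]
    return sorted(ok, key=parse_version, reverse=True)


# The GT 710 (GK208, Kepler) reports compute capability 3.5.
print(usable_cudnn_versions("3.5"))  # ['7.6.4']
```

With only two rows the answer is obvious, but the same table can be extended as more archived support matrices turn up.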

And since I have only ever used Ubuntu in virtual machines on Docker, AWS or Google Colab, I never had to install it myself, so here are the instructions:

Make the Ubuntu boot DVD thus:

$sudo growisofs -speed=1 -dvd-compat -Z /dev/sr0=ubuntu-18.04.6-desktop-amd64.iso

In my case I had an ancient Dell SE198WFP monitor that the GT 710 could not identify, so the boot DVD may show a blank screen. By rebooting and pressing various keys (e?) as the GRUB bootloader starts up, it is possible to invoke the config menu and add the 'nomodeset' kernel parameter. I then got a very basic 640x480 setup for Ubuntu 18.04.

After the install, if you want a static IP address you need to do something like:

$sudo vi /etc/network/interfaces

And add in your IP address:

auto enp5s0
iface enp5s0 inet static
address your.ip.addr.here
netmask 255.255.255.0
gateway your.router.addr.1
dns-nameservers 8.8.8.8
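A typo in this file can leave the machine unreachable, so it can help to generate the stanza and sanity-check it first. The following is a rough Python sketch using only the standard `ipaddress` module; the function name and the 192.168.1.x values are placeholders of mine, not part of the setup above.

```python
import ipaddress


def static_stanza(iface, address, gateway, netmask="255.255.255.0", dns="8.8.8.8"):
    """Build an /etc/network/interfaces stanza, refusing inconsistent input."""
    # strict=False lets us pass a host address together with its netmask
    network = ipaddress.ip_network(f"{address}/{netmask}", strict=False)
    if ipaddress.ip_address(gateway) not in network:
        raise ValueError(f"gateway {gateway} is outside {network}")
    ipaddress.ip_address(dns)  # raises ValueError on a malformed address
    return "\n".join([
        f"auto {iface}",
        f"iface {iface} inet static",
        f"address {address}",
        f"netmask {netmask}",
        f"gateway {gateway}",
        f"dns-nameservers {dns}",
    ])


# Placeholder addresses; substitute your own.
print(static_stanza("enp5s0", "192.168.1.50", "192.168.1.1"))
```

The check only catches a gateway outside the subnet or a malformed address; it does not know whether the interface name exists on your machine.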

After that, an ssh server is always handy:

sudo apt install openssh-server
sudo systemctl status ssh
sudo systemctl enable ssh
sudo systemctl start ssh
sudo ufw allow ssh
sudo nano /etc/ssh/sshd_config
sudo service ssh restart

To set your computer host name:

$sudo hostnamectl set-hostname MyAIcomputer

Annoyingly, Ubuntu 18.04 kept setting my DNS server address to 127.0.0.53, so I did:

sudo vi /etc/systemd/resolved.conf

And added the line

DNS=8.8.8.8

And lastly, Ubuntu 18.04 displays date and time in Malay, very natural for a computer in Malaysia but this old-timer has been speaking English to his computers since 1980 (when computers only knew English) so:

$sudo localectl set-locale LC_TIME=en_US.utf8

To prepare Ubuntu 18.04 to build OpenCV I used changx03's instructions, reproduced here for convenience:
$ sudo apt update
$ sudo apt upgrade
$ sudo apt install build-essential cmake pkg-config unzip yasm git checkinstall
$ sudo apt install libavcodec-dev libavformat-dev libswscale-dev libavresample-dev

$ sudo apt install libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev

$ sudo apt install libxvidcore-dev x264 libx264-dev libfaac-dev libmp3lame-dev libtheora-dev

$ sudo apt install libfaac-dev libmp3lame-dev libvorbis-dev

$ sudo apt install libopencore-amrnb-dev libopencore-amrwb-dev
$ sudo apt-get install libgtk-3-dev
$ sudo apt-get install python3-dev python3-pip

$ sudo -H pip3 install -U pip numpy

$ sudo apt install python3-testresources
$ sudo apt-get install libtbb-dev
$ sudo apt-get install libatlas-base-dev gfortran

"Follow the White Rabbit" - Trinity, in "The Matrix" 1999

Following the White Rabbit

Archived CUDA 10.1 was installed per these instructions:

$sudo apt-get install linux-headers-$(uname -r)

$wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin

$sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600

$wget https://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb

$sudo dpkg -i cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb

$sudo apt-key add /var/cuda-repo-10-1-local-10.1.243-418.87.00/7fa2af80.pub

$sudo apt-get update
$sudo init 3
$sudo apt-get -y install cuda

And after it is all done, reboot the computer to load the new Nvidia graphics driver:

$sudo reboot

CUDA 10.1 seems fine, but there is a problem with the Nvidia driver; it does not load:

$nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

There is a way to uninstall in the Nvidia documentation but it did not work:

$sudo /usr/bin/nvidia-uninstall
sudo: /usr/bin/nvidia-uninstall: command not found

What Nvidia thinks I should use: Gigabyte RTX3090 24GB

I guess we will have to do it the Ubuntu way, with apt. Since the graphics card driver was packaged with CUDA 10.1, you will need to find its version, which looks like 418.87.00:

$sudo apt list --installed | less
nvidia-compute-utils-418/unknown,now 418.87.00-0ubuntu1 amd64 [installed,automatic]
nvidia-dkms-418/unknown,now 418.87.00-0ubuntu1 amd64 [installed,automatic]
nvidia-driver-418/unknown,now 418.87.00-0ubuntu1 amd64 [installed,automatic]
nvidia-kernel-common-418/unknown,now 418.87.00-0ubuntu1 amd64 [installed,automatic]
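Scrolling through `less` works, but the same information can be pulled out mechanically. Below is a small Python sketch that parses text in the format shown above; the regular expression and function name are my own, and I have only checked them against these sample lines.

```python
import re

# Matches lines such as
# "nvidia-driver-418/unknown,now 418.87.00-0ubuntu1 amd64 [installed,automatic]"
APT_LINE = re.compile(r"^(?P<name>[^/\s]+)/\S+\s+(?P<version>\S+)\s+(?P<arch>\S+)")


def installed_nvidia_packages(apt_output):
    """Map each installed nvidia-* package name to its version string."""
    found = {}
    for line in apt_output.splitlines():
        match = APT_LINE.match(line.strip())
        if match and match.group("name").startswith("nvidia-"):
            found[match.group("name")] = match.group("version")
    return found


sample = """\
nvidia-compute-utils-418/unknown,now 418.87.00-0ubuntu1 amd64 [installed,automatic]
nvidia-dkms-418/unknown,now 418.87.00-0ubuntu1 amd64 [installed,automatic]
nvidia-driver-418/unknown,now 418.87.00-0ubuntu1 amd64 [installed,automatic]
nvidia-kernel-common-418/unknown,now 418.87.00-0ubuntu1 amd64 [installed,automatic]
"""
for name, version in installed_nvidia_packages(sample).items():
    print(name, version)
```

On a real machine the input could come from `subprocess.run(["apt", "list", "--installed"], capture_output=True, text=True).stdout` instead of the pasted sample.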

This makes the uninstall command thus:

$sudo apt remove --purge nvidia-driver-418

Now I tried quite a few graphics drivers in the Ubuntu repository. Version 390 worked very well but was incompatible with CUDA 10.1. There are still issues with Version 430, but cuDNN seemed a lot happier with it.

$sudo apt install nvidia-driver-430

It loads, is recognized by the X server and can be configured, but at a much reduced resolution instead of my Dell's 1440x900. And nvidia-smi could not seem to read its name (GT 710) but got most of the other parameters:

$nvidia-smi
/usr/bin/nvidia-modprobe: unrecognized option: "-s"

ERROR: Invalid commandline, please run `/usr/bin/nvidia-modprobe --help` for
      usage information.

/usr/bin/nvidia-modprobe: unrecognized option: "-s"

ERROR: Invalid commandline, please run `/usr/bin/nvidia-modprobe --help` for
      usage information.

Sat Apr 22 11:18:33 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap|         Memory-Usage | GPU-Util Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0 NVIDIA GeForce ... Off | 00000000:08:00.0 N/A |                  N/A |
| 33%   38C    P8    N/A / N/A |     65MiB / 2000MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
| GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
| No running processes found                                                 |
+-----------------------------------------------------------------------------+

Note the CUDA Version is listed as 11.4; as far as I can tell this is the highest version the driver supports, and I took 10.1 to be the runtime version number.
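The two version numbers can be compared mechanically. The sketch below is in Python and rests on an assumption: that the CUDA figure in the nvidia-smi header is the newest runtime the driver can serve, so any toolkit at or below it should load. The regular expression and function names are mine.

```python
import re

# Picks the driver and CUDA versions out of the nvidia-smi banner line
HEADER = re.compile(
    r"Driver Version:\s*(?P<driver>[\d.]+)\s+CUDA Version:\s*(?P<cuda>[\d.]+)"
)


def as_tuple(version):
    """'11.4' -> (11, 4) so versions compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def driver_supports_runtime(smi_output, runtime_version):
    """True if the CUDA version in the header is >= the toolkit runtime."""
    match = HEADER.search(smi_output)
    if match is None:
        raise ValueError("no 'Driver Version ... CUDA Version' header found")
    return as_tuple(match.group("cuda")) >= as_tuple(runtime_version)


header = "| NVIDIA-SMI 470.182.03   Driver Version: 470.182.03   CUDA Version: 11.4     |"
print(driver_supports_runtime(header, "10.1"))  # True
```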

Next is cuDNN.

$sudo init 3
$sudo dpkg -i libcudnn7_7.6.4.38-1+cuda10.1_amd64.deb
$sudo dpkg -i libcudnn7-dev_7.6.4.38-1+cuda10.1_amd64.deb

$sudo dpkg -i libcudnn7-doc_7.6.4.38-1+cuda10.1_amd64.deb

I used the latest version of OpenCV, which at the time of installation was version 4.7.0-dev:

git clone https://github.com/opencv/opencv.git
git clone https://github.com/opencv/opencv_contrib.git

After many trials, these build options seem to work. Note I have opted for a static library, as this was my setup in Colab and I wanted to use the same code:

~/opencv_build/opencv$mkdir build && cd build
~/opencv_build/opencv/build$cmake \
  -D CUDA_NVCC_FLAGS="-D_FORCE_INLINES -gencode=arch=compute_35,code=sm_35" \
  -D CMAKE_BUILD_TYPE=RELEASE -D OPENCV_GENERATE_PKGCONFIG=ON -DBUILD_SHARED_LIBS=OFF \
  -D CMAKE_INSTALL_PREFIX=/usr/local -D INSTALL_C_EXAMPLES=OFF \
  -D BUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF -D BUILD_EXAMPLES=OFF -D WITH_OPENEXR=OFF \
  -D WITH_CUDA=ON -D WITH_CUBLAS=ON -D WITH_CUDNN=ON \
  -D CUDA_ARCH_BIN=3.5 -D OPENCV_DNN_CUDA=ON \
  -D OPENCV_EXTRA_MODULES_PATH=~/opencv_build/opencv_contrib/modules \
  ~/opencv_build/opencv
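The only card-specific parts of that command are the two places where compute capability 3.5 appears. If you are reviving a different old card, a little Python helper like this hypothetical one (my own naming, not part of OpenCV's build system) shows how both options follow from the one number:

```python
def arch_flags(capability):
    """Derive the arch-related cmake options from a capability such as '3.5'."""
    major, minor = capability.split(".")
    sm = f"{int(major)}{int(minor)}"  # '3.5' -> '35'
    return {
        "CUDA_ARCH_BIN": capability,
        "CUDA_NVCC_FLAGS": f"-D_FORCE_INLINES -gencode=arch=compute_{sm},code=sm_{sm}",
    }


# Print the options in the form cmake expects on its command line
for key, value in arch_flags("3.5").items():
    print(f'-D {key}="{value}"')
```

Whether a given CUDA toolkit still accepts the resulting `sm_` value is a separate question that the helper does not answer.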

The key thing to check in the cmake output is that both CUDA and cuDNN are included:

--   NVIDIA CUDA:                   YES (ver 10.1, CUFFT CUBLAS)
--     NVIDIA GPU arch:             35
--     NVIDIA PTX archs:
--
--   cuDNN:                         YES (ver 7.6.4)
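The cmake summary is long and these lines are easy to miss, so it may be worth saving the output to a file and checking it before starting a build that takes hours. A rough Python sketch follows; the function name is mine and the patterns are based only on the summary lines shown above.

```python
import re


def cmake_has_cuda_and_cudnn(summary):
    """Check an OpenCV cmake summary for 'NVIDIA CUDA: YES' and 'cuDNN: YES'."""
    cuda = re.search(r"NVIDIA CUDA:\s+YES", summary) is not None
    cudnn = re.search(r"cuDNN:\s+YES", summary) is not None
    return cuda and cudnn


summary = """\
--   NVIDIA CUDA:                   YES (ver 10.1, CUFFT CUBLAS)
--     NVIDIA GPU arch:             35
--     NVIDIA PTX archs:
--
--   cuDNN:                         YES (ver 7.6.4)
"""
print(cmake_has_cuda_and_cudnn(summary))  # True
```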

The actual make command is:

~/opencv_build/opencv/build$make -j5

The output is:

~/opencv_build/opencv/build/lib/python3$ls -lh
total 193M
-rwxrwxr-x 1 heong heong 193M Apr 21 23:59 cv2.cpython-36m-x86_64-linux-gnu.so

The One

"He's the One" - Morpheus, "The Matrix" 1999

To prove that the setup supports the GeForce GT 710:

/usr/local/cuda-10.1/samples/1_Utilities/deviceQuery$sudo make

/usr/local/cuda-10.1/samples/1_Utilities/deviceQuery$sudo ./deviceQuery

Query Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce GT 710"
CUDA Driver Version / Runtime Version          11.4 / 10.1
CUDA Capability Major/Minor version number:    3.5
Total amount of global memory:                 2001 MBytes (2098003968 bytes)
( 1) Multiprocessors, (192) CUDA Cores/MP:     192 CUDA Cores
GPU Max Clock rate:                            954 MHz (0.95 GHz)
Memory Clock rate:                             800 Mhz
Memory Bus Width:                              64-bit
L2 Cache Size:                                 524288 bytes
Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory:               65536 bytes
Total amount of shared memory per block:       49152 bytes
Total number of registers available per block: 65536
Warp size:                                     32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block:           1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch:                          2147483647 bytes
Texture alignment:                             512 bytes
Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
Run time limit on kernels:                     Yes
Integrated GPU sharing Host Memory:            No
Support host page-locked memory mapping:       Yes
Alignment requirement for Surfaces:            Yes
Device has ECC support:                        Disabled
Device supports Unified Addressing (UVA):      Yes
Device supports Compute Preemption:            No
Supports Cooperative Kernel Launch:            No
Supports MultiDevice Co-op Kernel Launch:      No
Device PCI Domain ID / Bus ID / location ID:   0 / 8 / 0
Compute Mode:
    < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 10.1, NumDevs = 1
Result = PASS

To run the super resolution program you will also need:

$sudo pip3 install numpy
$sudo pip3 install imutils

Finally:

$export PYTHONPATH="/home/heong/opencv_build/opencv/build/lib/python3/"
~/sr$python3 sr.py --model FSRCNN_2x.pb --input 3coyote-10s.webm --fps 25 --useCUDA
Output video will be 3coyote-10s-FSRCNN_2x.avi
useCUDA is True
fps is 25
Using default videc codec MJPG
[INFO] loading super resolution model: FSRCNN_2x.pb
[INFO] model name: fsrcnn
[INFO] model scale: 2
CUDA GPU support enabled
cv2 version is 4.7.0-dev
sys.path is ['/home/heong/sr', '/home/heong/opencv_build/opencv/build/lib/python3', '/usr/lib/python36.zip', '/usr/lib/python3.6', '/usr/lib/python3.6/lib-dynload', '/home/heong/.local/lib/python3.6/site-packages', '/usr/local/lib/python3.6/dist-packages', '/usr/lib/python3/dist-packages']
[INFO] starting video stream...
Opening input video file 3coyote-10s.webm
Waiting 2s to stabilize stream ...
Opening output video file 3coyote-10s-FSRCNN_2x.avi
upscaled.shape=(720, 960, 3)
Opening output video file 3coyote-10s-FSRCNN_2x.avi
upscaled h x w is 720x960
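For readers who want to try this without sr.py, the heart of such a program is only a handful of calls into OpenCV's dnn_superres module (from opencv_contrib). What follows is a minimal Python sketch and not the script that produced the log above: the helper names are hypothetical, and the filename parsing simply assumes models are named like FSRCNN_2x.pb.

```python
import os


def parse_model_filename(path):
    """'FSRCNN_2x.pb' -> ('fsrcnn', 2): algorithm name and scale factor."""
    stem = os.path.splitext(os.path.basename(path))[0]
    name, scale = stem.rsplit("_", 1)
    return name.lower(), int(scale.strip("xX"))


def output_name(input_path, model_path):
    """'3coyote-10s.webm' + 'FSRCNN_2x.pb' -> '3coyote-10s-FSRCNN_2x.avi'."""
    video = os.path.splitext(os.path.basename(input_path))[0]
    model = os.path.splitext(os.path.basename(model_path))[0]
    return f"{video}-{model}.avi"


def make_upscaler(model_path, use_cuda=True):
    """Load a model into dnn_superres and optionally select the CUDA backend."""
    import cv2  # imported here so the helpers above work without OpenCV

    name, scale = parse_model_filename(model_path)
    sr = cv2.dnn_superres.DnnSuperResImpl_create()
    sr.readModel(model_path)
    sr.setModel(name, scale)
    if use_cuda:
        sr.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
        sr.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
    return sr  # call sr.upsample(frame) on each BGR frame


print(parse_model_filename("FSRCNN_2x.pb"))
print(output_name("3coyote-10s.webm", "FSRCNN_2x.pb"))
```

A video loop would then read frames with `cv2.VideoCapture`, pass each one through `upsample`, and write the result with `cv2.VideoWriter`.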

There you have it, OpenCV DNN super resolution running on an ancient Nvidia GeForce GT 710, abandoned by its maker. The archives are spotty and it still has software issues. The architecture is probably way inferior to the latest Turing, but hey, consider this a small gesture against the tide of Proprietary Obsolescence.

Did I mention I dislike proprietary software? Happy Trails.

Saturday 22 April 2023

Nvidia GeForce GT 710: Down the Rabbit Hole of Proprietary Obsolescence