Saturday, 12 October 2024

Llama3 on a shoestring Part 2: Upgrading the CPU

Generated locally on a shoestring with Stable Diffusion


Part 1 ended with llama3 running, but the resulting chatbot was frustratingly slow. A key problem is the pre-built Docker container, which requires the CPU to support AVX instructions before it will use the GPU. If you have a GPU you can do without an AVX CPU, but this requires a rebuild from source code, as I did when installing GPU support for TensorFlow on the same CPU.

Docker provided a very quick and tempting way to test various LLMs without interfering with other Python installs, so it was worth having a quick look at what AVX is.

AVX instructions from 2011 CPUs
 

AVX is Advanced Vector Extensions, first shipped on Intel CPUs in 2011. My AMD Phenom II X4 was bought in 2009 and thus missed the boat. Now my Phenom II sits in an AM3+ socket, so there was hope that a later AM3+ CPU might have AVX support. This turned out to be the AMD Bulldozer series, sold as the AMD FX-4000 to FX-8000 series, which support AVX.
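A quick way to see whether a given CPU has AVX is to look at the flags line in /proc/cpuinfo. The following is only a minimal sketch, assuming an x86 Linux system; the function names are made up for illustration and are not part of ollama or Docker.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags listed in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        # On x86 Linux the feature list is on a line beginning with "flags".
        if line.startswith("flags"):
            _, _, flags = line.partition(":")
            return set(flags.split())
    return set()


def vector_support(cpuinfo_text):
    """Report whether AVX and AVX2 appear among the CPU flags."""
    flags = cpu_flags(cpuinfo_text)
    return {"avx": "avx" in flags, "avx2": "avx2" in flags}


def check_this_machine(path="/proc/cpuinfo"):
    """Read the local cpuinfo file (Linux only) and report AVX support."""
    with open(path) as f:
        return vector_support(f.read())
```

Calling check_this_machine() before and after a CPU swap should show the avx entry change from False to True if the new CPU supports it.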

AMD Bulldozer FX-4000 to FX-8000 series


Incredibly they are still on sale online, with a vendor in China offering the FX-4100 for just RM26.40 (about USD6) up to an FX-6350 for RM148.50 (USD34). That fits my idea of a shoestring budget, so I plumped for the mid-range FX-6100 at RM49.50 (USD11.50).


 

AMD FX-6100 is now just RM49.50

The next thing to do was to check whether my equally ancient mainboard supports the FX-6100. It is an Asus M5A78LE, and the manual says it does support the FX series.

And since LLM programs require lots of memory, I might as well push my luck and fill it up. The M5A78LE takes a maximum of 32GB DDR3 DRAM, twice my current 16GB. I picked up 4 x 8GB Kingston HyperX Fury Blue (ie 1600MHz) for RM181.50 (USD42), so the whole upgrade cost me RM231 (USD53).


 

Happily both worked without trouble, and where it had previously failed, the GPU-enabled Docker container now ran:

$docker run -it --rm --gpus=all -v /home/heong/ollama:/root/.ollama:z -p 11434:11434 --name ollama ollama/ollama

2024/10/12 07:32:48 routes.go:1153: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2024-10-12T07:32:48.673Z level=INFO source=images.go:753 msg="total blobs: 11"
time=2024-10-12T07:32:48.820Z level=INFO source=images.go:760 msg="total unused blobs removed: 0"
time=2024-10-12T07:32:48.822Z level=INFO source=routes.go:1200 msg="Listening on [::]:11434 (version 0.3.12)"
time=2024-10-12T07:32:48.885Z level=INFO source=common.go:49 msg="Dynamic LLM libraries" runners="[cpu_avx cpu_avx2 cuda_v11 cuda_v12 cpu]"
time=2024-10-12T07:32:48.907Z level=INFO source=gpu.go:199 msg="looking for compatible GPUs"
time=2024-10-12T07:32:49.342Z level=INFO source=types.go:107 msg="inference compute" id=GPU-49ab809b-7b47-3fd0-60c1-f03c4a8959bd library=cuda variant=v12 compute=8.6 driver=12.6 name="NVIDIA GeForce RTX 3060" total="11.7 GiB" available="11.2 GiB"

You can query it with curl:

$curl http://localhost:11434/api/generate -d '{"model": "llama2","prompt": "Tell me about Jeeves the butler","stream": true,"options": {"seed": 123,"top_k": 20,"top_p": 0.9,"temperature": 0}}'

And the speed went up quite a bit. 
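For anyone who wants a number rather than an impression, the same query can be sent from Python and timed. This is only a sketch under stated assumptions: it reuses the endpoint and options from the curl command, and it assumes the final streamed JSON chunk carries eval_count and eval_duration (in nanoseconds) as described in the ollama API documentation. The helper names are hypothetical.

```python
import json
import urllib.request


def build_request(prompt, model="llama2", host="http://localhost:11434"):
    """Build the same POST request that the curl command sends."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": True,
        "options": {"seed": 123, "top_k": 20, "top_p": 0.9, "temperature": 0},
    }
    return urllib.request.Request(
        host + "/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def collect(lines):
    """Join streamed JSON lines into (reply_text, tokens_per_second or None)."""
    text, rate = [], None
    for raw in lines:
        if not raw.strip():
            continue
        chunk = json.loads(raw)
        text.append(chunk.get("response", ""))
        # Assumption: the final chunk reports eval_count and eval_duration (ns).
        if chunk.get("done") and chunk.get("eval_duration"):
            rate = chunk["eval_count"] / (chunk["eval_duration"] / 1e9)
    return "".join(text), rate


def ask(prompt):
    """Send the prompt to a running ollama container and collect the reply."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return collect(resp)
```

With the container running, ask("Tell me about Jeeves the butler") returns the reply text together with a tokens-per-second figure that can be compared between the old and new CPU.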

Friday, 27 September 2024

Llama 3 on a Shoestring Part 1 of 2: 2011-vintage 3GHz AMD Phenom II 16GB RAM and RTX3060 12GB



Llama working at his workstation. This image was generated locally using Stable Diffusion on a 2011 desktop with an Nvidia RTX3060 12GB GPU

Llama 3 is an 'AI model', ie a Large Language Model (a Deep Learning model) from Meta, comparable to Google Gemini.

Sean Zheng's excellent post details a very quick way of installing and running Llama3 from a local desktop. He had good results with an Intel i9 with 128GB RAM and an Nvidia RTX 4090 with 24GB VRAM. However, my desktop dates back to 2011 and is just a 3GHz AMD Phenom II with only 16GB DRAM and an Nvidia RTX 3060 GPU with 12GB VRAM. The hope is that, since the RTX 3060 is not too far behind his RTX 4090, Llama3 can run or at least hobble along in some fashion.

Sean's desktop runs Red Hat's RHEL9.3 but mine runs Ubuntu 22.04LTS. Both of us had already installed Nvidia graphics drivers as well as the CUDA Toolkit. In my case the driver is 560.35.03 and CUDA is 12.6. Sean's method was to run llama3 from a Docker image. This is an excellent sandbox for a beginner like me to try out Llama3 without risking other large AI installs like Stable Diffusion or Keras.

Sean's post is mostly complete; the instructions are replicated here for convenience. First the system updates:

$sudo apt update

$sudo apt upgrade

We then need to update the ubuntu repository for docker:
$sudo apt install apt-transport-https ca-certificates curl software-properties-common
$curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
$echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null

Then the actual Docker install:
$sudo apt update
$apt-cache policy docker-ce
$sudo apt install docker-ce

And I have a running docker daemon:
$sudo systemctl status docker
● docker.service - Docker Application Container Engine
     Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset>
     Active: active (running) since Thu 2024-09-26 11:05:47 +08; 22s ago
TriggeredBy: ● docker.socket
       Docs: https://docs.docker.com
   Main PID: 56585 (dockerd)
      Tasks: 10
     Memory: 22.2M
        CPU: 729ms
     CGroup: /system.slice/docker.service
             └─56585 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/con>

A quick test seems fine:
$docker run hello-world
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
c1ec31eb5944: Pull complete
Digest: sha256:91fb4b041da273d5a3273b6d587d62d518300a6ad268b28628f74997b93171b2
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

Next I just use Docker to pull in ollama:
$docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
Unable to find image 'ollama/ollama:latest' locally
latest: Pulling from ollama/ollama
Digest: sha256:e458178cf2c114a22e1fe954dd9a92c785d1be686578a6c073a60cf259875470
Status: Downloaded newer image for ollama/ollama:latest
c09a5a60d5aa9120175c52f7b13b59420564b126005f4e90da704851bbeb9308

A quick check shows everything seems OK:
$docker ps -a
CONTAINER ID   IMAGE           COMMAND               CREATED         STATUS                   PORTS                                           NAMES
c09a5a60d5aa   ollama/ollama   "/bin/ollama serve"   9 minutes ago   Up 9 minutes             0.0.0.0:11434->11434/tcp, :::11434->11434/tcp   ollama
75beaa5bac23   hello-world     "/hello"              2 hours ago     Exited (0) 2 hours ago                                                   amazing_ptolemy

OK, now for the GPU version of Ollama. We first stop ollama:
$docker stop c09a5a60d5aa
c09a5a60d5aa
$docker rm c09a5a60d5aa
c09a5a60d5aa

Make the local directory for ollama:
$mkdir ~/ollama

Oops:
$docker run -it --rm --gpus=all -v /home/heong/ollama:/root/.ollama:z -p 11434:11434 --name ollama ollama/ollama
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

I think that means it cannot find the GPU. From here, I think I need the Nvidia Container Toolkit. The install guide is here.

Update the repository:

$curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

$ sudo apt-get update

Now the actual install:

$ sudo apt-get install -y nvidia-container-toolkit

Then just restart Docker:

$ sudo systemctl restart docker

Now ollama runs:

$docker run -it --rm --gpus=all -v  /home/heong/ollama:/root/.ollama:z -p 11434:11434 --name ollama ollama/ollama
2024/09/26 13:12:23 routes.go:1153: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"

But then it looked like it detected my GPU but refused to use it, as my CPU does not support AVX or AVX2 instructions:
time=2024-09-26T13:12:23.496Z level=WARN source=gpu.go:224 msg="CPU does not have minimum vector extensions, GPU inference disabled" required=avx detected="no vector extensions"
time=2024-09-26T13:12:23.496Z level=INFO source=types.go:107 msg="inference compute" id=0 library=cpu variant="no vector extensions" compute="" driver=0.0 name="" total="15.6 GiB" available="13.2 GiB"
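These log lines are plain key=value pairs, so they are easy to check from a script. Below is a minimal sketch, assuming the format shown in the two lines above; the function names are hypothetical, and a message containing a stray quote character would need more careful parsing than shlex gives.

```python
import shlex


def parse_log_line(line):
    """Split an ollama key=value log line into a dict."""
    fields = {}
    # shlex keeps a quoted value such as msg="..." together as one token.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def gpu_disabled(line):
    """True if the line is the warning that GPU inference was disabled."""
    fields = parse_log_line(line)
    return (fields.get("level") == "WARN"
            and "GPU inference disabled" in fields.get("msg", ""))
```

Feeding the container's log through gpu_disabled() line by line flags the AVX problem without having to read the whole startup dump.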


Now that was a setback, but ollama runs. Let's see if it loads llama 3.

$docker exec -it ollama ollama pull llama3

For good measure let's pull in llama2:

$docker exec -it ollama ollama pull llama2

$docker exec -it ollama ollama list
NAME             ID              SIZE      MODIFIED
llama3:latest    365c0bd3c000    4.7 GB    15 seconds ago
llama2:latest    78e26419b446    3.8 GB    24 hours ago

And indeed llama3 runs on a 2011 AMD CPU with just 16GB RAM:

$docker exec -it ollama ollama run llama3
>>> Send a message (/? for help)

>>> /?
Available Commands:
  /set            Set session variables
  /show           Show model information
  /load <model>   Load a session or model
  /save <model>   Save your current session
  /clear          Clear session context
  /bye            Exit
  /?, /help       Help for a command
  /? shortcuts    Help for keyboard shortcuts

Use """ to begin a multi-line message.

>>> /show info
  Model
    architecture        llama
    parameters          8.0B
    context length      8192
    embedding length    4096
    quantization        Q4_0

  Parameters
    num_keep    24
    stop        "<|start_header_id|>"
    stop        "<|end_header_id|>"
    stop        "<|eot_id|>"

  License
    META LLAMA 3 COMMUNITY LICENSE AGREEMENT
    Meta Llama 3 Version Release Date: April 18, 2024


In response to the prompt

>>> How are you today?

The reply was:

I'm just an AI, I don't have feelings or emotions like humans do. However, 
I am functioning properly and ready to assist with any questions or tasks 
you may have! Is there something specific you'd like to talk about or ask 
for help with?

It was excruciatingly slow, and nvtop shows the GPU is indeed not used, but ollama seems to be all there. So there you have it: Llama3 running on a 16GB AMD Phenom II with the GPU sitting idle.

Happy Trails.



Monday, 17 June 2024

Optimus under the Hood: OpenCV with CUDA for Nvidia GT 640M GPU and Slackware 14.2

Optimus Prime stepping forth from laptop - AI-generated image from getimg.ai

Never thought much about my laptop GPUs. Even less about Nvidia GPUs as I gave up on proprietary software 20 years ago. I was quite happy with the open source nouveau driver, until Nvidia's cuDNN allowed OpenCV imaging programs to use Deep Neural Nets - AI.

Installing CUDA

Slowly, for it was a little cumbersome to hold your nose at the same time, I loaded the CUDA Linux toolkit into my GeForce GT710 desktop. The process was as unpleasant as ever - 10-year old proprietary software starts to look like abandonware, but the results were amazing. The GPU heated up like crazy and my desktop blew up, but OpenCV flew.

Acer Aspire M3-581TG


Suddenly there were low-cost possibilities for AI-enabled imaging systems - surveillance video, even augmented reality. And some of my old laptops (defenestrated, of course) had Nvidia GPUs. I started with an old Acer Aspire M3-581TG - it has an Nvidia GeForce 640M, or so the sticker on the keyboard says. 

lspci came up with a surprise - the GPU was an Intel GPU:

root@aspireM3:/$lspci

00:00.0 Host bridge: Intel Corporation 3rd Gen Core processor DRAM Controller (rev 09)

00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port (rev 09)

00:02.0 VGA compatible controller: Intel Corporation 3rd Gen Core processor Graphics Controller (rev 09)

00:14.0 USB controller: Intel Corporation 7 Series/C210 Series Chipset Family USB xHCI Host Controller (rev 04)

00:16.0 Communication controller: Intel Corporation 7 Series/C216 Chipset Family MEI Controller #1 (rev 04)

00:1a.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #2 (rev 04)

00:1b.0 Audio device: Intel Corporation 7 Series/C216 Chipset Family High Definition Audio Controller (rev 04)

00:1c.0 PCI bridge: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 1 (rev c4)

00:1c.1 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 2 (rev c4)

00:1c.3 PCI bridge: Intel Corporation 7 Series/C216 Chipset Family PCI Express Root Port 4 (rev c4)

00:1d.0 USB controller: Intel Corporation 7 Series/C216 Chipset Family USB Enhanced Host Controller #1 (rev 04)

00:1f.0 ISA bridge: Intel Corporation HM77 Express Chipset LPC Controller (rev 04)

00:1f.2 SATA controller: Intel Corporation 7 Series Chipset Family 6-port SATA Controller [AHCI mode] (rev 04)

00:1f.3 SMBus: Intel Corporation 7 Series/C216 Chipset Family SMBus Controller (rev 04)

01:00.0 VGA compatible controller: NVIDIA Corporation GK107M [GeForce GT 640M] (rev a1)

07:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTS5209 PCI Express Card Reader (rev 01)

0d:00.0 Network controller: Qualcomm Atheros AR9462 Wireless Network Adapter (rev 01)

0e:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetLink BCM57780 Gigabit Ethernet PCIe (rev 01)

Now if I had read all the lines instead of stopping after the first 3, I would have noticed it also had an Nvidia GPU: the GK107M, or GeForce GT 640M. It took quite a few weeks to recover from the shock - two GPUs in a laptop? The GPUs are switched in and out depending on whether graphics performance or power consumption is being prioritised. Nvidia calls this its Optimus system.

GPU Switching


Now the GT 640M is quite an old GPU, and the best way would be to install CUDA/cuDNN/OpenCV on a matching Ubuntu distribution. But my M3-581TG had been defenestrated 10 years ago. It ran Slackware 14.2-current and held too much work to reinstall from scratch.

Nvidia GPU, CUDA Toolkit, cuDNN and OpenCV are notoriously finicky and you need to get the versions just right. Not to mention your gcc, libraries and various Linux bits. CUDA and cuDNN are proprietary blobs so it is a matter of installing the various versions until one works. The first thing to do is to go past the Nvidia marketing guff and find out the GT 640M's GPU architecture. Its real name is the GK107 and the architecture is Kepler.

Then you need to find the GK107's Compute Capability, which according to Nvidia is 3.0. From the cuDNN Support Matrix, the chances of it working with cuDNN 7.6.4, CUDA 10.1.243 and a Linux driver of at least r418.39 seem promising.

First the driver. I started with the SlackBuild version, r460.67. Normally you do a SlackBuild with the Nvidia blob, but I had good results with the Nvidia installer on the GT 710, so I downloaded it from Nvidia and ran it directly:

#sh NVIDIA-Linux-x86_64-460.67.run

Now if you selected the dkms option the installer will fail and you will need to slackbuild dkms first.
$sh ./dkms.SlackBuild
$upgradepkg --install-new /tmp/dkms-2.8.4-x86_64-1_SBo.tgz
After which it needs to be run as a service, so
$vi /etc/rc.d/rc.modules.local

# Enable DKMS module rebuilding
if [ -x /usr/lib/dkms/dkms_autoinstaller ]; then
  echo "Running DKMS autoinstaller"
  /usr/lib/dkms/dkms_autoinstaller start
fi

dkms may result in build errors, so in the end I deselected it. After the installer finished, the original nouveau driver was blacklisted and the Nvidia driver loaded, but my X windows would not start. It turned out I first needed to run lspci to get the GPU bus number:
# lspci
01:00.0 VGA compatible controller: NVIDIA Corporation GK107M [GeForce GT 640M] (rev a1)

And enter it into a new xorg.conf:
# cat /etc/X11/xorg.conf

Section "Module"
    Load "modesetting"
EndSection

Section "Device"
    Identifier     "Device0"
    Driver "nvidia"
    BusID "PCI:1:0:0"
    Option "AllowEmptyInitialConfiguration"
EndSection
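One thing to watch when copying the bus number across: lspci prints the bus, device and function in hexadecimal, while the BusID entry in xorg.conf takes decimal numbers. For 01:00.0 the two look the same, but for a slot like 0d:00.0 they do not. The conversion can be sketched as follows; the function name is made up for illustration.

```python
def lspci_to_busid(slot):
    """Convert an lspci slot such as '01:00.0' to an xorg.conf BusID string."""
    parts = slot.split(":")
    # Take the last two fields so an optional PCI domain prefix is ignored.
    bus, rest = parts[-2], parts[-1]
    device, function = rest.split(".")
    # lspci prints hexadecimal; xorg.conf BusID takes decimal numbers.
    return "PCI:%d:%d:%d" % (int(bus, 16), int(device, 16), int(function, 16))
```

For the GT 640M here, lspci_to_busid("01:00.0") gives the "PCI:1:0:0" used in the xorg.conf above.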

With X up, check the loaded driver:
$nvidia-smi
Sat Jun  8 21:53:52 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.67       Driver Version: 460.67       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GT 640M     Off  | 00000000:01:00.0 N/A |                  N/A |
| N/A   62C    P8    N/A /  N/A |    149MiB /   981MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Next is CUDA. The cuDNN compatibility matrix says 10.1.243 but I had good luck with CUDA 10.2.89 and it was very close to 10.1.243 so
$sh ./cuda_10.2.89_440.33.01_linux.run
Note I took care not to install the included GPU driver as I already had a working 460.67.

After that you will need to include the CUDA path in your bash profile:
$cat ~/.bash_profile
PATH=$HOME/utils:/usr/local/cuda-10.2/bin:$PATH
export PS1="\u@\h:\w\$"

To test, compile and run a neat little test program:
$nvcc -o check_cuda check_cuda.c -lcuda
$./check_cuda
Found 1 device(s).
Device: 0
  Name: GeForce GT 640M
  Compute Capability: 3.0
  Multiprocessors: 2
  Concurrent threads: 4096
  GPU clock: 708.5 MHz
  Memory clock: 900 MHz
  Total Memory: 981 MiB
  Free Memory: 723 MiB

Next is cuDNN. The SlackBuild's version is 8.0, but that did not work out with OpenCV, so I dialed it down a notch to cuDNN 7.6.5. This time I went with the SlackBuild, with a few mods to get it to work:
$cp cudnn.SlackBuild cudnn.SlackBuild-v8.0_11.0
$cat cudnn.SlackBuild

PRGNAM=cudnn
VERSION=${VERSION:-v7.6_10.2}
BUILD=${BUILD:-1}
TAG=${TAG:-_SBo}

CUDNN_VERSION=${VERSION%_*}
CUDA_VERSION=${VERSION#*_}
$ln -s cudnn-10.2-linux-x64-v7.6.5.32.tgz cudnn-10.2-linux-x64-v7.6.tgz
$./cudnn.SlackBuild
cuda/include/cudnn.h
cuda/NVIDIA_SLA_cuDNN_Support.txt
cuda/lib64/libcudnn.so
cuda/lib64/libcudnn.so.7
cuda/lib64/libcudnn.so.7.6.5
cuda/lib64/libcudnn_static.a

Slackware package /tmp/cudnn-v7.6_10.2-x86_64-1_SBo.tgz created.
$upgradepkg --install-new /tmp/cudnn-v7.6_10.2-x86_64-1_SBo.tgz


We have suffered losses, but we will install OpenCV ...

Next  is the biggie, OpenCV. This usually means lots of iterations. Amos Stailey-Young's page is a good place to start. What worked for me is OpenCV 4.3.0 and opencv_contrib 4.3.0. Untar them into their respective subdirectories.

The cmake is:
heong@aspireM3:~/cuda/opencv/build$cmake -D CUDA_NVCC_FLAGS="-D_FORCE_INLINES -gencode=arch=compute_35,code=sm_35" -D CMAKE_BUILD_TYPE=RELEASE -D OPENCV_GENERATE_PKGCONFIG=ON -DBUILD_SHARED_LIBS=OFF -D CMAKE_INSTALL_PREFIX=/usr/local -D INSTALL_C_EXAMPLES=OFF -D BUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF -D BUILD_EXAMPLES=OFF -D WITH_OPENEXR=OFF -D WITH_CUDA=ON -D WITH_CUBLAS=ON -D WITH_CUDNN=ON -D CUDA_ARCH_BIN=3.0 -D OPENCV_DNN_CUDA=ON -D OPENCV_EXTRA_MODULES_PATH=~/cuda/opencv/opencv_contrib-4.3.0/modules -D LDFLAGS="-pthread -lpthread" -D CUDNN_VERSION="7.6" ~/cuda/opencv/opencv-4.3.0/

Note the use of the Compute Capability number. cuDNN version number has to be explicitly specified as the cmake persistently fails to extract the cuDNN version number from its include files.

Then it is 
$make -j 4
and then
$su -c "make install"

This seems to have resulted in 2 files:
root@aspireM3:/$ls -lh /usr/local/lib/python3.6/site-packages/cv2/python-3.6
total 255M
-rwxr-xr-x 1 root root 255M Jun 16 22:58 cv2.cpython-36m-x86_64-linux-gnu.so
root@aspireM3:/$ls -lh /usr/local/lib/python2.7/site-packages/cv2/python-2.7
total 255M
-rwxr-xr-x 1 root root 255M Jun 16 22:57 cv2.so

And I simply did
$ln -s /usr/local/lib/python3.6/site-packages/cv2/python-3.6/cv2.cpython-36m-x86_64-linux-gnu.so /usr/local/lib/python3.6/site-packages/cv2/python-3.6/cv2.so
$export PYTHONPATH="/usr/local/lib/python3.6/site-packages/cv2/python-3.6/"

A very quick test is
$python3
Python 3.6.8 (default, Jan 13 2019, 13:36:07) 
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> 
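A clean import only shows that the module loads; it does not show that the build actually picked up CUDA and cuDNN. One way to check is to read OpenCV's own build report. The sketch below assumes the report labels those lines "NVIDIA CUDA" and "cuDNN", which is how I recall them but may differ between versions; the helper names are hypothetical, while cv2.getBuildInformation() and cv2.cuda.getCudaEnabledDeviceCount() are existing OpenCV calls.

```python
def cuda_build_flags(build_info):
    """Pick the CUDA-related lines out of OpenCV's build information text."""
    found = {}
    for line in build_info.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Assumption: these two labels are how the build report names them.
        if sep and key in ("NVIDIA CUDA", "cuDNN"):
            found[key] = value.strip().startswith("YES")
    return found


def report():
    """Print CUDA status for the installed cv2 (imported here, not at top level)."""
    import cv2
    print(cuda_build_flags(cv2.getBuildInformation()))
    print("CUDA devices:", cv2.cuda.getCudaEnabledDeviceCount())
```

Running report() in the same python3 session should show both entries as True and a device count of 1 if the build went as intended.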

Amos Stailey-Young's sample code did not work for me, but sr6033's code is very similar and worked well.
$python3 detect_faces_video.py  --prototxt prototxt.txt --model res10_300x300_ssd_iter_140000.caffemodel
[INFO] loading model...
[INFO] starting video stream...
[ WARN:0] global /home/heong/cuda/opencv/opencv-4.3.0/modules/videoio/src/cap_gstreamer.cpp (935) open OpenCV | GStreamer warning: Cannot query video position: status=0, value=-1, duration=-1

For python2:
$export PYTHONPATH="/usr/local/lib/python2.7/site-packages/cv2/python-2.7/"
heong@aspireM3:~/cuda/opencv/build$python
Python 2.7.15 (default, Jun 17 2018, 22:57:51) 
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import cv2
>>> 

"No sacrifice, no victory ..."


And there you have it: OpenCV 4.3.0 with CUDA 10.2.89 and cuDNN 7.6.5 running on the Nvidia GT 640M of an ancient Aspire M3-581TG laptop. Maybe my next laptop will have an Nvidia GPU with 8GB RAM ... what was it that Optimus Prime said? "Hang on to your dreams, Chip. The future is built on dreams."

Tuesday, 23 January 2024

Internet Server Blues: Serveo, Public IP, CGNAT and Accessing Your Servers from the Internet

Connection timeout

For over 2 decades I ran servers from my home. Before GitHub and the weblog, a personal website was a handy way to keep documents you might need to access. An IP camera might also need to act as a home server. An ssh server, when available over the Internet, turned out to be a very handy way of piercing firewalls at work. Later, IoT devices also needed a server.

In practice this means your servers depend on a public IP address: whenever your modem-router logs into the Internet, your service provider assigns it a public IPv4 address.

Then came NAT, a real blessing. Suppose you have several home computers all using the Internet at the same time. NAT software, usually running on your modem-router, uses just a single public IP address for all your computers, thus saving you from having to get multiple Internet lines. 

NAT or Network Address Translation


The Internet servers replying to your computers think there is just one computer, represented by your public IP. Your NAT intercepts these replies and routes them accurately to your individual computers.

Your internal servers have the problem in reverse. To a device in the Internet all of them have the same (ie your public IP) address. This is resolved by having each server use a unique number, a port (1 of 65536 available) to identify itself. Kind of like having room numbers in your house for every occupant. Based on this an incoming request is forwarded by the router to the correct server. The router also watches for the resulting replies and forwards them to the numerous (potentially) Internet devices. This is called Port Forwarding.

Port Forwarding

Thus all servers implicitly use different ports. For example http servers use port 80, https use port 443 and ssh uses port 22.
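The router's part in this amounts to a lookup table keyed on the incoming port. Here is a toy sketch of the idea; the LAN addresses and the 2222 mapping are made-up examples, not my actual setup.

```python
# Hypothetical forwarding table: public port -> (LAN address, LAN port).
FORWARDS = {
    80: ("192.168.1.10", 80),    # web server
    443: ("192.168.1.10", 443),  # same machine, https
    2222: ("192.168.1.20", 22),  # ssh on a second machine
}


def route_incoming(public_port, table=FORWARDS):
    """Return the LAN (address, port) for a request, or None if not forwarded."""
    return table.get(public_port)
```

A request arriving at the public IP on port 2222 is handed to the ssh server at 192.168.1.20, while a port with no entry is simply not forwarded.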

Sometime in 2022, outside access to my servers was blocked. My service provider Unifi had implemented CGNAT. CGNAT is Carrier Grade NAT. This means the service provider has grouped anything from tens to hundreds of subscribers into one Public IP using its own NAT upstream.

Carrier Grade Network Address Translation, or CGNAT
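A rough way to tell whether you are behind CGNAT is to look at the WAN address your router reports. Addresses in 100.64.0.0/10 are reserved by RFC 6598 for exactly this purpose. The sketch below is only a heuristic: I do not know which range any particular provider uses, and some carriers hand out ordinary private addresses instead. The function name is hypothetical.

```python
import ipaddress

CGNAT_RANGE = ipaddress.ip_network("100.64.0.0/10")  # RFC 6598 shared space


def classify_wan_address(addr):
    """Classify the WAN address a router reports as 'cgnat', 'private' or 'public'."""
    ip = ipaddress.ip_address(addr)
    if ip in CGNAT_RANGE:
        return "cgnat"
    if ip.is_private:
        return "private"
    return "public"
```

If the router's WAN address comes back as anything other than "public", port forwarding on your own router cannot be reached from the Internet on its own.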

One immediate effect is that many professional servers now receive a great deal of traffic from a single IP, and this triggers their DDoS protection, which often wants confirmation or verification before you can access their site.

The other problem is that my provider Unifi has chosen not merely to limit but to block Port Forwarding, unless I pay extra for a Public IP or a Static IP. Requests from the Internet no longer work. Internally, on my private LAN, they still work as before.

The obvious alternative is to pay for a cloud server with a Public IP, like AWS, Google Cloud, Microsoft Azure, etc.

Another alternative is ngrok, which will forward ports to you for free using an ssh trick called Reverse Tunnelling. If you want to use your own domain name, there is a small fee.

But best of all is Trevor Dixon's serveo. It does ssh reverse tunnelling for free and will also allow unique, readable names. Buy Trevor a coffee sometime - he deserves it.

Say you already have an Apache webserver at port 80 - this makes it an insecure (ie not https) webserver. With serveo there is no need for logins and registrations, you just dive straight in with a reverse tunnel:

$ ssh -R cmheong:80:localhost:80 serveo.net  

The authenticity of host 'serveo.net (138.68.79.95)' can't be established.

RSA key fingerprint is SHA256:07jcXlJ4SkBnyTmaVnmTpXuBiRx2+Q2adxbttO9gt0M.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'serveo.net,138.68.79.95' (RSA) to the list of known hosts.

To request a particular subdomain, you first need to generate a key. Use the command

ssh-keygen to generate your key. For more information about generating and using 

ssh keys, see https://www.ssh.com/academy/ssh/keygen. Once you've generated a key, try again, and these instructions will be replaced with instructions on how to register your key with serveo. 

Forwarding HTTP traffic from https://afc2076be26e6b5cc4b2ff5c4348336f.serveo.net
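The argument to -R has four colon-separated fields: the name requested on the remote side (serveo treats it as the subdomain), the remote port, and the local host and port that traffic is sent back to. If you want to launch the tunnel from a script, the command can be assembled like this. It is only a sketch: the function name is made up, and the ServerAliveInterval keepalive is my own addition, a standard OpenSSH option that is not in the command above.

```python
def reverse_tunnel_cmd(subdomain, local_port, remote_port=80,
                       local_host="localhost", server="serveo.net"):
    """Build the argument list for an ssh reverse tunnel like the one above."""
    spec = "%s:%d:%s:%d" % (subdomain, remote_port, local_host, local_port)
    # ServerAliveInterval is a standard OpenSSH option that sends keepalives.
    return ["ssh", "-o", "ServerAliveInterval=60", "-R", spec, server]
```

The resulting list can be passed straight to subprocess.run(), for example subprocess.run(reverse_tunnel_cmd("cmheong", 80)).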


Over at your browser, http now works:

http://afc2076be26e6b5cc4b2ff5c4348336f.serveo.net:80

The bonus is that https works too, without modification, and the browser will not flag it as insecure:

https://afc2076be26e6b5cc4b2ff5c4348336f.serveo.net:443

The icing on the cake is subdomains. You just make an ssh key pair (if you do not already have one)

$ ssh-keygen -t rsa 
Generating public/private rsa key pair.
Enter file in which to save the key (/home/heong/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/heong/.ssh/id_rsa.
Your public key has been saved in /home/heong/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:AbCdEfGhIjKlMnOpQr123456789 cmheong@webserver

With your new key you now do:

$ ssh -R cmheong:80:localhost:80 serveo.net                                             
To request a particular subdomain, you first need to register your SSH public key.
To register, visit one the addresses below to login with your Google or GitHub account.                            
After registering, you'll be able to request your subdomain the next time you connect                              
to Serveo.                                                                                                         

Google: https://serveo.net/verify/google?fp=SHA256%3AAbCdEfGhIjKlMnOp%2BQr123456789
GitHub: https://serveo.net/verify/github?fp=SHA256%3AAbCdEfGhIjKlMnOp%2BQr123456789

So you need to register your key with serveo. I used my Google account. Notice that serveo's links carry your key fingerprint in URL-encoded form (':' appears as %3A and '+' as %2B), so just paste serveo's output (not your ssh-keygen output) into your browser. Assuming you have already logged into your Google account, this works right away.
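The %3A and %2B in serveo's links are the percent-encoded forms of ':' and '+', which is what any fingerprint looks like once it is placed in a URL query string. A short sketch with Python's standard library shows the same transformation; the fingerprint is the placeholder from the serveo links above, and the function name is hypothetical.

```python
from urllib.parse import quote, unquote


def encode_fingerprint(fingerprint):
    """Percent-encode a key fingerprint the way it appears in a URL query."""
    # safe="" makes quote() encode every reserved character, including ':' and '+'.
    return quote(fingerprint, safe="")
```

Passing the encoded string through unquote() recovers the original fingerprint, so nothing about the key itself has changed.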

If you now redo your reverse tunnel:

$ ssh -R cmheong:80:localhost:80 serveo.net
Forwarding HTTP traffic from https://cmheong.serveo.net

Now https://cmheong.serveo.net will work, just like that. After that head over to https://serveo.com and buy Trevor Dixon that cup of coffee. The man deserves it.

Happy Trails