Deploying the DeepSeek-R1 Distilled Model on Arch Linux

Recently, DeepSeek released the R1 model, which has been extremely popular online. Having some free time, I decided to deploy the 7B distilled version to play around with it.
Ollama simplifies the deployment process: install it with yay -S ollama, start the service with sudo systemctl start ollama, then run ollama run deepseek-r1 to begin chatting.
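Put together, the whole happy path is just three commands (package and model names exactly as above; yay is assumed as the package helper):

    # Install ollama (yay resolves it from the official repos or the AUR)
    yay -S ollama

    # Start the ollama service now
    # (use 'sudo systemctl enable --now ollama' to also start it at boot)
    sudo systemctl start ollama

    # Pull the default deepseek-r1 tag and open an interactive chat
    ollama run deepseek-r1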
However, I quickly noticed something was wrong: why was the generation speed so slow? Checking the task manager, I found that inference was running entirely on the CPU without any GPU acceleration, so I started troubleshooting.
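If you prefer to confirm this from the terminal instead of a task manager, two quick checks (assuming an NVIDIA card and a reasonably recent ollama build) are:

    # List loaded models and whether they are running on GPU or CPU
    ollama ps

    # Watch GPU utilization and VRAM usage while a response is generating
    watch -n 1 nvidia-smi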
Looking at the ollama service logs with journalctl -u ollama -f, I found a warning: "no cuda runners detected, unable to run on cuda GPU", even though I clearly had the CUDA drivers installed.
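For reference, the check and the offending line looked roughly like this (exact wording may vary between ollama versions):

    # Follow the ollama service logs
    journalctl -u ollama -f

    # ... among the startup messages:
    # no cuda runners detected, unable to run on cuda GPU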
So I searched online extensively and was speechless when I finally found the solution. In the AUR, besides ollama there is also an ollama-cuda package, and you need to install both to enable CUDA acceleration.
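The fix is therefore one more package plus a service restart, something along these lines:

    # Install the CUDA-enabled runners alongside the base ollama package
    yay -S ollama-cuda

    # Restart the service so it picks up the CUDA runners
    sudo systemctl restart ollama

    # Run the model again; inference should now happen on the GPU
    ollama run deepseek-r1

After the restart, ollama ps should report the model as running on the GPU, and the cuda runner warning should no longer appear in the service logs.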
