Deploying the DeepSeek-R1 Distilled Model on Arch Linux
Recently, DeepSeek released the R1 model, which has been hugely popular online. Having some free time, I decided to deploy the 7B distilled version to play around with it.
Using ollama can simplify the deployment process: after installing it with `yay -S ollama`, start the service with `sudo systemctl start ollama`, then run `ollama run deepseek-r1` to begin chatting.
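Put together, the whole setup is just three commands (a sketch of what worked for me; the bare `deepseek-r1` tag pulled the 7B distill at the time, but you can pass `deepseek-r1:7b` to be explicit):

```bash
# Install ollama from the repos and start its systemd service
yay -S ollama
sudo systemctl start ollama

# Pulls the model on first run, then drops into an interactive chat
# (deepseek-r1:7b pins the 7B distill explicitly)
ollama run deepseek-r1
```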
However, I quickly noticed something was wrong: why was generation so slow? The task manager showed inference running entirely on the CPU, with no GPU acceleration, so I started troubleshooting.
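If you'd rather not eyeball the task manager, two quick checks make the CPU/GPU split obvious (assuming an NVIDIA card, since we're after CUDA here):

```bash
# GPU utilization and VRAM usage should climb while a prompt is generating;
# if they stay flat, inference is happening on the CPU
watch -n 1 nvidia-smi

# ollama itself reports where each loaded model runs, e.g. "100% CPU"
# or "100% GPU" in the PROCESSOR column
ollama ps
```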
Looking at the ollama service logs with `journalctl -u ollama -f`, I found a warning: `no cuda runners detected, unable to run on cuda GPU`. But I clearly had the CUDA drivers installed.
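If the line scrolls past too quickly with `-f`, it can be easier to replay the log from the current boot and filter for GPU-related messages:

```bash
# Search the ollama service log since boot for CUDA/GPU hints
journalctl -u ollama -b --no-pager | grep -iE 'cuda|gpu'
```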
So I searched online extensively and was speechless when I found the solution: in the AUR, besides ollama there is also an ollama-cuda package, and you need to install both to enable CUDA acceleration.
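The fix therefore comes down to one extra package plus a service restart so the CUDA runner gets picked up (package names as they were in the repos when I set this up):

```bash
# Install the CUDA runner alongside the base ollama package
yay -S ollama-cuda

# Restart the service so it detects the newly installed runner
sudo systemctl restart ollama
```

After the restart, the `ollama ps` check from earlier should report `100% GPU` instead of CPU, and generation speed should improve accordingly.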