With the continuous advancements in AI technology, deep learning inference has become a vital component of developing intelligent applications. In this article, we’ll walk you through the process of deploying the DeepSeek-R1 model on the Neardi PI 3 platform, based on the RK3576 chip. We’ll also take advantage of the RKNPU (Rockchip NPU) to perform efficient inference.
System Setup
Neardi PI 3 (RK3576) Hardware Configuration:
• 8GB RAM
• 64GB Storage
1. Local Deployment of Ollama DeepSeek-R1
1.1 Install Ollama
Method 1: Use the terminal to download and install
Run the following command in your terminal:
curl -fsSL https://ollama.com/install.sh | sh
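Once the script finishes, you can confirm the installation with a quick version check (the exact version string will differ on your system):
ollama --version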
Method 2: Manual download and installation
If the first method doesn’t work, you can manually download the arm64 installation package and extract it.
Here’s the download link for the Ollama installation package.
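If you take the manual route, the sketch below follows Ollama's documented manual install for Linux; the download URL and archive name come from Ollama's documentation and may change between releases:
curl -L https://ollama.com/download/ollama-linux-arm64.tgz -o ollama-linux-arm64.tgz
sudo tar -C /usr -xzf ollama-linux-arm64.tgz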
1.2 Run Ollama Service
To start the Ollama inference service, run the following command:
ollama start
Wait for the service to initialize; by default, Ollama listens on localhost:11434. (start is an alias for ollama serve, so either command works.)
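Note that if you installed via the script in Method 1, the installer registers Ollama as a systemd service, so it may already be running in the background; you can check with:
sudo systemctl status ollama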
1.3 Install the DeepSeek-R1:8B Model
To install the DeepSeek-R1:8B model, run:
ollama run deepseek-r1:8b
(If you installed manually and the binary isn't on your PATH, invoke it as ./ollama instead.) Downloading the model may take some time, and Ollama's download speed sometimes drops partway through; interrupting the download and rerunning the command resumes it and can restore the speed (tested successfully).
1.4 Installation Success
Once the installation is complete and the terminal shows a prompt like >>>, you are ready to start interacting with the model.
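You can also confirm that the model was downloaded by listing the locally installed models:
ollama list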
1.5 Run DeepSeek Inference
At this point, you can run DeepSeek inference. For this step, refer to the video deepseek1.mp4 for a demonstration.
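As a quick sanity check, type a prompt at the >>> prompt and the model will stream its answer back; enter /bye to exit the session. A minimal example (the model's reply is omitted):
>>> What is 2 + 2?
...
>>> /bye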
2. Deploying DeepSeek-R1 with RKNPU
2.1 Model Download
It’s recommended to download the pre-converted model for RK3576: the DeepSeek-R1-Distill-Qwen-1.5B model has already been converted to the .rkllm format that the RK3576 runtime can load.
Download the RKLLM model here:
[Model Download Link]
Use the extraction code: rkllm
The model file you need is: DeepSeek-R1-Distill-Qwen-1.5B_W4A16_RK3576.rkllm.
If you prefer to convert the model yourself, refer to the DeepSeek-R1-Distill-Qwen-1.5B_Demo section for model conversion instructions.
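For orientation, a rough sketch of the self-conversion flow is shown below. It follows the layout of the rknn-llm repository, but the wheel and script names vary by release, so treat them as assumptions and check the demo's README; note that conversion runs with the RKLLM-Toolkit on a Linux PC, not on the board:
pip3 install <path-to-rkllm_toolkit-wheel>.whl
cd rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/export
python3 export_rkllm.py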
2.2 Compile llm_demo (Optional)
To compile the llm_demo, follow these steps:
1. Clone the repository:
git clone https://github.com/airockchip/rknn-llm
2. Navigate to the demo directory:
cd rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/deploy
3. In the build-linux.sh file, change the GCC_COMPILER_PATH value to aarch64-linux-gnu to avoid compilation errors (see the example after this list).
4. Run the build script:
./build-linux.sh
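For reference, a minimal setup for step 3 might look like the following, assuming a Debian/Ubuntu userspace (the package names are the stock cross-toolchain packages; the variable is the one already present in build-linux.sh):
sudo apt update && sudo apt install -y gcc-aarch64-linux-gnu g++-aarch64-linux-gnu cmake
# in build-linux.sh:
GCC_COMPILER_PATH=aarch64-linux-gnu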
2.3 Run DeepSeek Inference
To run the DeepSeek model inference on the PI 3 (RK3576), use the following commands. llm_demo takes three arguments: the model path, the maximum number of new tokens to generate (2048 here), and the maximum context length (4096 here):
export LD_LIBRARY_PATH=/home/neardi/rknn-llm/rkllm-runtime/Linux/librkllm_api/aarch64:$LD_LIBRARY_PATH
taskset f0 ./rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/deploy/install/demo_Linux_aarch64/llm_demo DeepSeek-R1-Distill-Qwen-1.5B_W4A16_RK3576.rkllm 2048 4096
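A note on the command: taskset f0 pins the demo to CPU cores 4-7 (affinity mask 0xf0), which on the RK3576 are typically the Cortex-A72 performance cores. The list form is equivalent and easier to read:
taskset -c 4-7 ./rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/deploy/install/demo_Linux_aarch64/llm_demo DeepSeek-R1-Distill-Qwen-1.5B_W4A16_RK3576.rkllm 2048 4096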
Once the command runs successfully, you’ll see an output similar to:
rkllm init start
I rkllm: rkllm-runtime version: 1.1.4, rknpu driver version: 0.9.8, platform: RK3576
rkllm init success
********************** Available Questions ********************
[0] A bag contains 5 red balls and 3 blue balls. What is the probability of picking a blue ball?
[1] The number sequence is 1, 4, 9, 16, 25, … What is the next number?
[2] Write a program to check whether a number is odd or even.
***************************************************************
You can now input your questions based on the options or enter custom queries to get inference results from the model.
By following these steps, you’ve successfully deployed the DeepSeek-R1 model on the Neardi PI 3 (RK3576) platform, using the Rockchip NPU for optimized inference. This guide should give developers and AI enthusiasts a practical starting point for running large language models efficiently on embedded devices.