As edge AI applications become increasingly widespread, a core challenge for developers is enabling AI models to run quickly and efficiently on edge devices. To meet the demands of complex AI algorithms and heavy computation, the RK1820, a System-on-Chip (SoC) integrating a RISC-V coprocessor and a high-performance AI acceleration engine, provides developers with a complete, user-friendly, end-to-end AI development ecosystem.
This article takes a deep dive into the RK1820 development framework, exploring the journey from PC-side model development to board-side AI deployment, and examining how Neardi helps developers put the AI coprocessor to work.
In embedded systems and AI acceleration scenarios, the main controller (CPU) handles general control and resource management, while coprocessors take on intensive, specialized, or real-time computational tasks. Through mechanisms like shared memory, FIFOs, and RPC, the two collaborate efficiently, significantly boosting system performance, reducing power consumption, and preserving design flexibility and scalability. This is the fundamental reason why modern SoCs increasingly adopt coprocessor architectures. The RK1820 is an AI coprocessor designed to work with main controllers such as the RK3588/RK3576. Communicating over PCIe, it offloads AI workloads from the host flexibly, quickly, and efficiently.
The Role of the Main Controller (CPU) in Embedded Systems
The CPU is responsible for “ensuring the system completes perception, computation, communication, and control tasks in the correct sequence, at the right time, and with minimal energy consumption”—it handles all the miscellaneous jobs. Other dedicated engines (like NPU, DSP, FPGA cores) focus solely on “computing quickly.” However, when to compute, what to compute, and where to send the results are all directed by the CPU.
The CPU acts as the “all-around manager” of the embedded system: it reads programs, translates instructions into actual operations, and directs the ALU to execute them. Simultaneously, it manages the fair allocation of resources like memory, cache, and clock cycles, overseeing limited resources like a financial controller. Externally, interfaces like GPIO, UART, I²C, SPI, USB, and Ethernet ports are its “mouth and hands”—sensors, displays, and network modules must communicate through it. When multiple tasks arise, it becomes a project manager, slicing time based on priority within an RTOS to ensure critical tasks (like motor control or braking) get immediate responses. Upon power-on, it first performs self-checks, then triggers the Bootloader for verification, ensuring system integrity before proceeding. When the system is idle, it automatically reduces frequency and voltage—or even enters sleep mode—to extend battery life. Finally, it acts as the “chief commander,” dictating when external computing power (like DSP, NPU) starts work, what data to move, and where to place results. The entire pipeline involving computation, storage, peripherals, and accelerators is coordinated seamlessly under its control.
Concept and Working Principle of the Coprocessor
The coprocessor is like the CPU’s “add-on skill pack.” A coprocessor cannot independently fetch instructions or run an operating system. It only accepts micro-instructions or data blocks from the main CPU, executing specific operations (multiply-accumulate, convolution, floating-point math, FFT, AES, CRC, trigonometric functions, etc.) with high performance and low power consumption using dedicated hardware arrays. After computation, it writes the results back to shared memory and notifies the CPU via an interrupt.
Here’s the typical workflow:
1. The compiler first “tags” large, specific functions in the program (like matrix multiplication or convolution), marking them as “tasks for the add-on.”
2. When the CPU encounters these tags, it doesn’t compute them itself. Instead, it places the parameters into the coprocessor’s “inbox” (registers, a FIFO, or a DMA buffer) and then hits the “start” button.
3. The coprocessor then works at full speed within its own clock domain, while the CPU continues with other tasks, achieving true parallelism.
4. Once finished, the coprocessor signals “I’m done” by pulling a status bit high, sending an interrupt, or writing results directly to shared memory. The CPU retrieves the results, and the program continues.
5. If the coprocessor encounters an error (like an overflow or illegal instruction), it writes an error code to a status register and reports an exception to the CPU. The CPU checks the exception vector to determine the appropriate response.
This process is like a boss handing blueprints to a dedicated machine. The machine processes noisily while the boss continues answering phones. The machine rings a bell upon completion for delivery, or lights a red lamp for faults—making the workflow smooth and efficient.
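To make this hand-off concrete, the toy Python simulation below mimics the inbox/doorbell/interrupt pattern with a queue, an event, and a worker thread. Every name in it is hypothetical; real hardware would use MMIO registers, DMA buffers, and interrupts instead.

```python
# Toy simulation of the CPU/coprocessor hand-off described above.
# All names are hypothetical stand-ins for hardware mechanisms.
import threading
import queue

inbox = queue.Queue()      # stands in for the coprocessor's FIFO/DMA buffer
done = threading.Event()   # stands in for the "done" interrupt/status bit
result = {}                # stands in for shared result memory

def coprocessor():
    """Runs in its own 'clock domain'; computes one tagged task."""
    task, data = inbox.get()   # wait for the CPU to fill the inbox
    if task == "mac":          # multiply-accumulate offload
        result["value"] = sum(a * b for a, b in data)
    done.set()                 # "ring the bell": raise the interrupt

worker = threading.Thread(target=coprocessor, daemon=True)
worker.start()

# CPU side: drop parameters in the inbox, press "start", keep working.
inbox.put(("mac", [(1, 2), (3, 4), (5, 6)]))
print("CPU continues with other tasks...")

done.wait()  # "interrupt" arrives; fetch the result from shared memory
print("Coprocessor result:", result["value"])  # 2 + 12 + 30 = 44
```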
How do the Main Controller and Coprocessor Communicate?
To enable efficient collaboration, they typically communicate via high-speed interconnect buses, with PCIe being the most common method.
PCIe is a point-to-point, high-speed serial bus standard known for its high bandwidth, low latency, and strong scalability, making it the preferred choice for communication between high-performance components like CPUs, GPUs, and NPUs.
- High Data Bandwidth: Per lane (x1), throughput reaches roughly 1 GB/s on PCIe 3.0 and roughly 2 GB/s on PCIe 4.0.
- Low Latency: The point-to-point architecture reduces arbitration and wait times, ideal for real-time AI data exchange.
- Bidirectional Transmission: Full-duplex lanes support simultaneous sending and receiving, so upstream and downstream transfers do not block each other.
- Hot-Plug Capability: Some designs support module-level hot-plugging for easier maintenance and expansion.
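As a quick sanity check on the bandwidth figures above, per-lane throughput follows directly from each generation’s transfer rate and line encoding; the short sketch below reproduces the raw link numbers (protocol overhead reduces usable payload bandwidth further):

```python
# Per-lane PCIe link bandwidth from transfer rate and line encoding.
# These are raw figures; TLP/DLLP protocol overhead lowers the usable
# payload bandwidth in practice.
GENERATIONS = {
    "PCIe 3.0": (8e9, 128 / 130),   # 8 GT/s, 128b/130b encoding
    "PCIe 4.0": (16e9, 128 / 130),  # 16 GT/s, 128b/130b encoding
}

for gen, (transfers_per_sec, encoding) in GENERATIONS.items():
    gbytes = transfers_per_sec * encoding / 8 / 1e9  # bits -> gigabytes
    print(f"{gen} x1: {gbytes:.2f} GB/s per lane")
# PCIe 3.0 x1: 0.98 GB/s per lane
# PCIe 4.0 x1: 1.97 GB/s per lane
```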
A Complete Guide to the RK1820 Development Framework
PC-Side Development Environment: From Training to Model Conversion
In the initial stages of AI model development, the PC handles tasks like model training, conversion, and performance evaluation. The RK1820 provides a mature software toolchain, including:
RKNN3 Toolkit – The Core Tool for Model Conversion & Performance Evaluation: The RKNN3 Toolkit is an AI model development kit from Rockchip. It supports one-click conversion of models from major deep learning frameworks (like PyTorch, TensorFlow, ONNX) into the RKNN format. Developers can use this tool for model format conversion (e.g., PyTorch → RKNN), inference performance analysis and optimization, and trade-off evaluation between accuracy and speed. This process ensures the model fully adapts to the RK1820’s hardware acceleration features, maximizing computational power.
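A minimal conversion sketch is shown below, modeled on the Python API of Rockchip’s earlier rknn-toolkit2. The RKNN3 Toolkit’s actual class and method names, and the 'rk1820' platform identifier, are assumptions and may differ.

```python
# Minimal PC-side conversion sketch, modeled on rknn-toolkit2.
# The RKNN3 Toolkit for RK1820 may differ in class/method names and
# platform identifiers -- treat this as illustrative, not authoritative.
from rknn.api import RKNN

rknn = RKNN(verbose=True)

# Configure preprocessing and the target platform (identifier assumed).
rknn.config(mean_values=[[0, 0, 0]],
            std_values=[[255, 255, 255]],
            target_platform='rk1820')

# Import an ONNX model exported from PyTorch or TensorFlow.
rknn.load_onnx(model='model.onnx')

# Build with INT8 quantization; 'dataset.txt' lists calibration images.
rknn.build(do_quantization=True, dataset='dataset.txt')

# Export the converted model for board-side deployment.
rknn.export_rknn('model.rknn')
rknn.release()
```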
RKNN3 Model Zoo – Ready-to-Use AI Model Repository: Rockchip officially provides a rich repository of AI model examples, covering various types such as image classification, object detection, semantic segmentation, gesture recognition, and face recognition. Developers can directly use these models or build upon them for secondary development, significantly shortening the project lifecycle from “concept” to “product.”
Board-Side Development Environment: Making the Model Run
Once the model is converted, it can be deployed onto the RK1820 coprocessor for inference execution. Neardi provides comprehensive board-side development support for the RK1820 platform, including:
RKNN3 Runtime – The Model Inference Execution Engine: During the edge-side runtime phase, the RKNN3 Runtime provides a complete API interface, allowing developers to load and execute RKNN models within their applications. In addition to the RKNN3 API, the RK1820 platform also supports OpenAI-compatible API calls for LLM models, meaning developers can experiment with running lightweight large language model applications on embedded devices.
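For the inference path, a board-side sketch modeled on Rockchip’s rknn-toolkit-lite2 Python API looks roughly like this; the RKNN3 Runtime’s actual bindings may differ:

```python
# Board-side inference sketch, modeled on rknn-toolkit-lite2.
# The RK1820's RKNN3 Runtime API may differ; names here are assumptions.
import numpy as np
from rknnlite.api import RKNNLite

rknn_lite = RKNNLite()
rknn_lite.load_rknn('model.rknn')  # model produced on the PC side
rknn_lite.init_runtime()

# Dummy input matching the model's expected NHWC shape.
img = np.zeros((1, 224, 224, 3), dtype=np.uint8)
outputs = rknn_lite.inference(inputs=[img])
print(outputs[0].shape)

rknn_lite.release()
```

For the OpenAI-compatible path, any standard OpenAI client should work once pointed at the board’s endpoint; the base URL and model name below are placeholders:

```python
# OpenAI-compatible chat call to an LLM served from the RK1820 platform.
# base_url and model name are placeholders for your deployment.
from openai import OpenAI

client = OpenAI(base_url="http://<board-ip>:8080/v1", api_key="none")
resp = client.chat.completions.create(
    model="qwen2-1.5b",  # whatever model the board is serving
    messages=[{"role": "user", "content": "Hello from the edge!"}],
)
print(resp.choices[0].message.content)
```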
Examples – Rich AI Application Reference Demos: Neardi offers various AI application examples for the RK1820 platform, such as smart cameras, face recognition, object detection, gesture control, and OCR. These examples not only demonstrate the performance of AI models on the RK1820 but also help developers quickly understand the API call flow and model deployment details.
Debugging & Performance Tools: Neardi integrates debugging and performance tools in the development kit:
- RKNN-SMI: Monitors NPU utilization and operational status in real time.
- RKNN Console: A command-line tool for model loading, inference testing, and performance comparison.
Driver Support: The RK1820 coprocessor connects to the main SoC via high-speed interfaces like PCIe or USB, with a complete RK182X PCIe EP driver and supporting firmware provided.
This master-slave collaborative architecture allows the main controller to handle task scheduling and I/O management, while the RK1820 focuses on AI inference acceleration, achieving an ideal balance between power consumption and computational power.