When selecting a single-board computer (SBC) to run a large language model (LLM) locally, several factors must be considered, including performance, resource requirements, hardware features, and budget. This article compares the LattePanda 3 Delta, LattePanda Sigma, Raspberry Pi 4, and Raspberry Pi 5 to help you choose the board that fits your LLM application.
The LattePanda 3 Delta and LattePanda Sigma use x86 Intel processors and offer higher performance. The Raspberry Pi 4 and Raspberry Pi 5 use ARM processors, which deliver lower performance and are better suited to lightweight LLM tasks.
1. LattePanda 3 Delta is a powerful single-board computer equipped with an x86 Intel processor.
2. LattePanda Sigma features a more robust hardware configuration capable of handling more complex tasks and applications.
3. Raspberry Pi 4B is a popular ARM single-board computer.
4. Raspberry Pi 5 is its successor, with a faster quad-core Cortex-A76 processor.
Product Name | LattePanda 3 Delta | LattePanda Sigma | Raspberry Pi 4B | Raspberry Pi 5 |
SKU | DFR0981 | DFR1091 | DFR0697 | DFR1119 |
Processor | Intel® Celeron® N5105 | Intel® Core™ i5-1340P | Broadcom BCM2711 | Broadcom BCM2712 |
Core | 2.0-2.9GHz Quad-Core, Four-Thread | 12-Core, 16-Thread, 12M Cache, up to 4.60 GHz (Performance-Core), 3.40 GHz (Efficient-Core) | Quad-Core Cortex-A72 (ARM v8) 64-bit @ 1.8 GHz | Quad-Core Cortex-A76 (ARM v8) 64-bit @ 2.4 GHz |
Graphics | Intel® UHD Graphics (Frequency: 450 – 800MHz) | Intel® Iris® Xe Graphics, 80 Execution Units, up to 1.45 GHz | VideoCore VI @ 500 MHz; supports OpenGL ES 3.1, Vulkan 1.0 | VideoCore VII @ 800 MHz; supports OpenGL ES 3.1, Vulkan 1.2 |
Memory | LPDDR4 8GB 2933MHz | Up to 32GB, Dual-Channel LPDDR5-6400MHz | LPDDR4-3200 SDRAM 1GB, 2GB, 4GB or 8GB | LPDDR4X-4267 SDRAM 4GB or 8GB |
Storage | 64GB eMMC | M.2 NVMe/SATA SSD (Separately Installed) | Micro SD | Micro SD (SDR104 Compatible); M.2 NVMe SSD Support via HAT |
Wireless / Network | 802.11ax, 2.4G & 5G (160MHz), up to 2.4Gbps; Bluetooth 5.2 | 2 x 2.5GbE RJ45 Ports (Intel® i225-V); M.2 Wireless Module (Separately Installed) | Dual-Band 802.11ac; Bluetooth 5 / BLE; Gigabit Ethernet; PoE via PoE+ HAT | Dual-Band 802.11ac; Bluetooth 5 / BLE; Gigabit Ethernet; PoE via PoE+ HAT (Incompatible with old version) |
Expansion Slots | 1x M.2 M Key, PCIe 3.0 x2, Supports NVMe SSD; 1x M.2 B Key, PCIe 3.0 x1, Supports USB 2.0, USB 3.0, SATA, SIM | M.2 M Key: PCIe 3.0 x4; M.2 M Key: PCIe 4.0 x4; M.2 B Key: SATA III/PCIe 3.0 x1, USB 2.0, USB 3.0, SIM; M.2 E Key: PCIe 3.0 x1, USB 2.0, Intel CNVio; Micro SIM Card Slot | 2-lane MIPI DSI Display Port; 2-lane MIPI CSI Camera Port; 4-Pole Stereo Audio and Composite Video Port | 2 x 4-lane MIPI camera / display transceivers; PCIe 2.0 x1 Interface; UART Breakout; RTC Clock Power; 4-Pin Fan Power |
Price | $279 | $579 (16GB), $629 (32GB) | $75 | $80 |
LLMs usually require a large amount of memory to store model parameters and intermediate calculation results. For example, a model with billions of parameters may require tens of gigabytes of memory or more. When system memory is limited, consider model compression techniques such as quantization to reduce the model size, and make sure the selected SBC has enough memory to load and run the model.
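The relationship between parameter count, precision, and memory can be sketched with simple arithmetic. The snippet below is a rough estimate only: the parameter counts are nominal, the figure of about 4.5 bits per weight for a 4-bit block format is an assumption, and real model files carry extra metadata.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # Nominal parameter counts ("7B", "13B"); real checkpoints differ slightly.
    for name, n_params in [("7B", 7e9), ("13B", 13e9)]:
        fp16 = weight_memory_gb(n_params, 16)
        # Assumption: a 4-bit block format costs about 4.5 bits per weight
        # once the per-block scale factors are included.
        q4 = weight_memory_gb(n_params, 4.5)
        print(f"{name}: 16-bit ~{fp16:.1f} GB, 4-bit ~{q4:.1f} GB")
```

Under these assumptions a 7B model drops from roughly 14 GB to just under 4 GB, which is why quantization is what makes such models fit on an 8GB board at all.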
A powerful CPU is crucial for LLM inference and training, and hardware accelerators such as GPUs can significantly speed up both. Although an LLM can run on an SBC's CPU alone, its performance may not match that of a GPU or dedicated acceleration hardware. Therefore, when choosing an SBC for an LLM, make sure its CPU is powerful enough to handle the computational load the model requires.
An SBC with little memory may not be able to hold the model parameters, conversation history, input data, and intermediate results at the same time during inference. After multiple rounds of dialogue, memory can be exhausted, which may cause the LLM program to crash.
Tight resource and memory constraints can also slow inference when an LLM processes long texts. A board with little memory faces several performance bottlenecks at once, and together they slow down token processing.
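One contributor to this growth is the attention key/value cache, which scales linearly with the number of tokens kept in context. The sketch below estimates its size under stated assumptions that are not taken from the tests in this article: a LLaMA-7B-like shape (32 layers, hidden size 4096), standard multi-head attention, and 16-bit cache entries.

```python
def kv_cache_bytes(n_layers: int, hidden_size: int, n_tokens: int,
                   bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for n_tokens of context.

    Assumes standard multi-head attention: every layer stores one key vector
    and one value vector of length hidden_size for each token.
    """
    return 2 * n_layers * hidden_size * n_tokens * bytes_per_value


if __name__ == "__main__":
    # Assumed model shape: 32 layers, hidden size 4096, 16-bit cache entries.
    for n_tokens in (512, 2048, 4096):
        gib = kv_cache_bytes(32, 4096, n_tokens) / 2**30
        print(f"{n_tokens} tokens of context -> {gib:.2f} GiB of cache")
```

With these numbers, 2048 tokens of context cost about 1 GiB on top of the model weights, a significant share of an 8GB board.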
Model | File Size |
phi-2-Q4 | 1.7GB |
Alpaca-7B-Q4 | < 4GB |
LLaMA-7B-Q4 | < 4GB |
LLaMA2-7B-Q4 | < 7GB |
LLaMA-13B-Q4 | < 8GB |
mixtral_7bx2_moe_Q4 | < 8GB |
mamba-gpt-7b | <13GB |
ChatGLM-6B-Q4 | 13GB |
Considering the memory and storage requirements of LLMs, the LattePanda Sigma, with its larger memory and storage capacity, is better suited to running them. The Raspberry Pi 4 and Raspberry Pi 5 have less memory and storage, so they call for LLMs with smaller memory footprints, such as phi-2.
We use llama.cpp and the CPU for LLM inference. Original model files in .pth format must be converted to GGUF format before they can run on the CPU. Given the memory constraints of SBCs, we quantize the GGUF model files to int4 (Q4). For phi-2, the original model of approximately 6GB shrinks to only 1.6GB after Q4 quantization.
For the LattePanda 3 Delta, Raspberry Pi 4, and Raspberry Pi 5, memory limitations mean that once you have selected a suitable LLM, you must first download the original model on another Linux PC, quantize it there, and then copy the quantized model to the SBC. On the LattePanda Sigma, you can download and quantize the model directly on the board.
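As a rough first filter, the file sizes in the table above can be compared against a board's RAM. The sketch below uses the table's upper bounds, so it is conservative. The 1 GB reserved for the operating system and runtime is a hypothetical figure, and the function name is illustrative.

```python
# Upper-bound file sizes in GB, taken from the table above.
MODEL_SIZE_GB = {
    "phi-2-Q4": 1.7,
    "Alpaca-7B-Q4": 4,
    "LLaMA-7B-Q4": 4,
    "LLaMA2-7B-Q4": 7,
    "LLaMA-13B-Q4": 8,
    "mixtral_7bx2_moe_Q4": 8,
    "mamba-gpt-7b": 13,
    "ChatGLM-6B-Q4": 13,
}


def models_that_fit(ram_gb: float, reserved_gb: float = 1.0) -> list:
    """Return the models whose file fits in RAM after reserving some headroom.

    reserved_gb is a hypothetical allowance for the OS and runtime.
    """
    budget = ram_gb - reserved_gb
    return [name for name, size in MODEL_SIZE_GB.items() if size <= budget]


if __name__ == "__main__":
    for ram in (4, 8, 32):
        print(f"{ram} GB board: {models_that_fit(ram)}")
```

A model that passes this check can still run out of memory in practice, since context and intermediate results also consume RAM.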
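To illustrate what 4-bit quantization does to the weights, here is a simplified sketch of symmetric block quantization: each block of 32 weights shares one scale factor, and every weight is stored as a signed 4-bit integer. This is an illustration of the general idea only, not the exact format llama.cpp writes. The block size and 16-bit scale are assumptions.

```python
def quantize_block(weights):
    """Quantize one block of floats to signed 4-bit codes in [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate float weights from the scale and 4-bit codes."""
    return [scale * c for c in codes]


def bits_per_weight(block_size=32, scale_bits=16):
    """Storage cost per weight: 4 bits plus a share of the block's scale."""
    return (4 * block_size + scale_bits) / block_size


if __name__ == "__main__":
    block = [0.02 * (i - 16) for i in range(32)]  # toy weights
    scale, codes = quantize_block(block)
    restored = dequantize_block(scale, codes)
    worst = max(abs(a - b) for a, b in zip(block, restored))
    print(f"scale={scale:.4f}, worst-case error={worst:.4f}")
    print(f"storage cost: {bits_per_weight()} bits per weight")
```

Going from 16 bits to about 4.5 bits per weight is a reduction of roughly 3.6x, broadly in line with the shrinkage reported for phi-2 above.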
Real-time Performance:
This section assesses each SBC's performance on real-time language tasks, including response time and processing latency.
Benchmark for LattePanda 3 Delta, LattePanda Sigma, Raspberry Pi 4B, and Raspberry Pi 5
Model | File size | LattePanda 3 Delta 8GB | LattePanda Sigma 32GB | Raspberry Pi 4B 8GB | Raspberry Pi 5 8GB |
llama2-7b-Q4 | <7GB | 2.55 tokens/s | 6 tokens/s | 0.1 tokens/s | 2.3 tokens/s |
Using the same LLM as the benchmark across all boards, the table above shows that the LattePanda Sigma is significantly faster than the other SBCs.
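Figures such as tokens per second and response latency can be collected with a small timing harness. The sketch below is one possible approach, not the procedure used for the numbers in this article. `fake_generator` is a hypothetical stand-in for a real model that yields tokens one at a time.

```python
import time
from typing import Callable, Iterable


def measure_throughput(generate: Callable[[], Iterable[str]]) -> dict:
    """Time a token stream and report latency and throughput."""
    start = time.perf_counter()
    first_token_at = None
    n_tokens = 0
    for _ in generate():
        if first_token_at is None:
            first_token_at = time.perf_counter()
        n_tokens += 1
    total = time.perf_counter() - start
    return {
        "tokens": n_tokens,
        "time_to_first_token_s": (
            first_token_at - start if first_token_at is not None else None
        ),
        "tokens_per_s": n_tokens / total if total > 0 else 0.0,
    }


def fake_generator(n_tokens: int = 5, delay_s: float = 0.01):
    """Hypothetical stand-in for a model: one token after each delay."""
    for i in range(n_tokens):
        time.sleep(delay_s)
        yield f"tok{i}"


if __name__ == "__main__":
    print(measure_throughput(lambda: fake_generator()))
```

Replacing `fake_generator` with a wrapper around the actual inference call gives comparable numbers across boards, provided the prompt and generation length are kept the same.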
For specific deployment steps, please refer to the following:
Deploy and run LLM on Raspberry Pi 5 vs Raspberry Pi 4B (LLaMA, LLaMA2, Phi-2, Mixtral-MOE, mamba-gp
Model | File Size | Compatibility | Out of Memory | Token Speed |
phi-2-Q4 | 1.7GB | √ | | 5.13 tokens/s |
LLaMA-7B-Q4 | < 4GB | √ | | 2.2 tokens/s |
LLaMA2-7B-Q4 | < 7GB | √ | | 2.3 tokens/s |
LLaMA2-13B-Q4 | < 4GB | √ | | 2.02 tokens/s |
mixtral_7bx2_moe_Q4 | <8GB | √ | | use llama.cpp <1 tokens/s |
mamba-gpt-7b | <13GB | √ |
For specific deployment steps, please refer to the following:
Deploy and run LLM on Raspberry Pi 4B (LLaMA, Alpaca, LLaMA2, ChatGLM)
Model | File Size | Compatibility | Out Of Memory | Token Speed |
LLaMA-7B-Q4 | < 4GB | √ | | ~0.1 tokens/s |
Alpaca-7B-Q4 | < 4GB | √ | | |
LLaMA2-7B-chat-hf-Q4 | < 7GB | √ | | |
LLaMA-13B-Q4 | < 8GB | √ | | |
ChatGLM-6B-Q4 | 13GB | √ |
For specific deployment steps, please refer to the following:
Deploy and run LLM on Lattepanda 3 Delta 864 (LLaMA, LLaMA2, Phi-2, ChatGLM2)
Model | File Size | Compatibility | Out of Memory | Token Speed |
phi-2-Q4 | 1.7GB | √ | | 5.48 tokens/s |
LLaMA-7B-chat-Q4 | <4GB | √ | | 2.55 tokens/s |
LLaMA2-7B-chat-Q4 | <7GB | √ | | 2.56 tokens/s |
LLaMA2-13B-Q4 | ~4GB | √ | | 2.51 tokens/s |
ChatGLM2-6B-Q4 | <4GB | √ | | <1.5 tokens/s |
mamba-gpt-7b | <13GB | √ |
For specific deployment steps, please refer to the following:
Deploy and run LLM on LattePanda Sigma (LLaMA, Alpaca, LLaMA2, ChatGLM)
Model | File Size | Compatibility | Token Speed |
LLaMA-7B-chat-Q4 | <4GB | √ | 5 tokens/s |
Alpaca-7B-Q4 | <4GB | √ | 5 tokens/s |
LLaMA2-7B-chat-Q4 | <7GB | √ | 6 tokens/s |
LLaMA-13B-Q4 | <8GB | √ | 2 tokens/s |
ChatGLM-6B-Q4 | 13GB | √ | 1 token/s |
In summary, several factors need consideration when selecting an SBC for local LLMs.
In terms of hardware configuration, the LattePanda 3 Delta is equipped with an Intel Celeron N5105 processor, 8GB of memory, and 64GB of eMMC storage. The LattePanda Sigma features a more powerful Intel Core i5-1340P processor and supports up to 32GB of LPDDR5 memory and M.2 NVMe/SATA SSD storage. The Raspberry Pi 4B and Raspberry Pi 5 are built on the Broadcom BCM2711 and BCM2712 processors respectively and offer less memory and storage.
In terms of resource and memory constraints, LLMs typically demand substantial memory to store model parameters and intermediate computation results, so it is crucial that the chosen SBC has enough memory. Thanks to its larger memory and storage capacity, the LattePanda Sigma supports LLM workloads better than the Raspberry Pi 4B and Raspberry Pi 5, which call for models with smaller memory footprints.
Regarding deployment, the LattePanda Sigma can download and quantize LLM models directly. For the LattePanda 3 Delta, Raspberry Pi 4, and Raspberry Pi 5, you need to download the original model on another Linux PC and quantize it there before transferring the quantized model to the SBC for execution.
Finally, model inference speed is a crucial metric for assessing SBCs. According to the test results, the Raspberry Pi 5 is significantly faster than the Raspberry Pi 4B. Particularly noteworthy is phi-2-Q4 on the Raspberry Pi 5, which reaches an evaluation speed of 5.13 tokens/s. However, because of their limited RAM, both the Raspberry Pi 5 and the Raspberry Pi 4B may still run into constraints with large-scale LLMs. The LattePanda 3 Delta delivers slightly better LLM performance than the Raspberry Pi 5 and far better performance than the Raspberry Pi 4B. The LattePanda Sigma is faster still, reaching up to 6 tokens/s on llama2-7b-Q4, which suits more demanding LLM applications. It also costs more, so it is best suited to those with larger budgets.
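The trade-off between speed and budget can be made concrete by dividing the llama2-7b-Q4 benchmark speeds by the listed prices. The sketch below assumes that the listed Raspberry Pi prices correspond to the 8GB boards that were benchmarked and uses the 32GB price for the LattePanda Sigma. It ignores storage, power supplies, and cooling accessories.

```python
# (tokens/s on llama2-7b-Q4, listed price in USD), from the tables above.
BOARDS = {
    "LattePanda 3 Delta 8GB": (2.55, 279),
    "LattePanda Sigma 32GB": (6.0, 629),
    "Raspberry Pi 4B 8GB": (0.1, 75),   # assumes $75 is the 8GB price
    "Raspberry Pi 5 8GB": (2.3, 80),    # assumes $80 is the 8GB price
}


def per_100_usd(tokens_per_s: float, price_usd: float) -> float:
    """Throughput bought by each 100 dollars of board price."""
    return tokens_per_s / price_usd * 100


def rank_boards() -> list:
    """Board names ordered from best to worst tokens/s per dollar."""
    return sorted(BOARDS, key=lambda name: per_100_usd(*BOARDS[name]), reverse=True)


if __name__ == "__main__":
    for name in rank_boards():
        print(f"{name}: {per_100_usd(*BOARDS[name]):.2f} tokens/s per $100")
```

On this single benchmark the Raspberry Pi 5 ranks first per dollar, while the LattePanda Sigma remains the fastest board in absolute terms and the only one tested with enough memory for the largest models.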
Running LLMs can put a high load on the CPU and generate considerable heat, so the SBC needs adequate cooling to prevent overheating and maintain system stability. The LattePanda 3 Delta includes an active cooling system with a small fan, though users should still leave sufficient ventilation space around the board. The LattePanda Sigma comes equipped with a cooling fan that keeps the processor at a proper temperature under load. The Raspberry Pi 4B, built on the Broadcom BCM2711 SoC, has no built-in fan, but users can add official or third-party heat sinks, which is advisable when running high-load applications or operating in high-temperature environments.
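On Linux, CPU temperature can usually be checked during inference by reading the kernel's thermal zone files. The sketch below assumes the common `/sys/class/thermal/thermal_zone0/temp` path, which reports millidegrees Celsius. The zone index varies between boards, and the 80 °C limit is a hypothetical threshold, not a vendor specification.

```python
from pathlib import Path
from typing import Optional

# Common Linux sysfs location; the zone index can differ between boards.
THERMAL_FILE = Path("/sys/class/thermal/thermal_zone0/temp")


def parse_millidegrees(raw: str) -> float:
    """Convert a sysfs reading in millidegrees (e.g. '48312') to Celsius."""
    return int(raw.strip()) / 1000.0


def read_cpu_temp_c(path: Path = THERMAL_FILE) -> Optional[float]:
    """Return the temperature in Celsius, or None if it cannot be read."""
    try:
        return parse_millidegrees(path.read_text())
    except (OSError, ValueError):
        return None


def is_too_hot(temp_c: float, limit_c: float = 80.0) -> bool:
    """Compare against a hypothetical limit; tune it for your board."""
    return temp_c >= limit_c


if __name__ == "__main__":
    temp = read_cpu_temp_c()
    if temp is None:
        print("No thermal zone found at", THERMAL_FILE)
    else:
        print(f"CPU temperature: {temp:.1f} C, too hot: {is_too_hot(temp)}")
```

Polling this value while a model is generating shows whether the chosen cooling keeps the board below its throttling range.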