Run Small Language Models (Mathstral, phi 3, Llama 3.1, Gemma2 2b, Qwen) on Raspberry Pi 5

DFRobot Oct 24 2024

Single-board computers (SBCs) such as the Raspberry Pi have become popular with developers thanks to their compact design and steadily improving computing performance. At the same time, Small Language Models (SLMs) are finding use in diverse application scenarios, offering efficiency and flexibility for tasks ranging from smart home automation to edge computing. This article analyzes the performance of various SLMs on the Raspberry Pi 5 (8GB) running Raspberry Pi OS. We compare Mathstral, Phi 3, Llama 3.1, Gemma2 2b, Qwen, and Deepseek Coder V2 by execution speed, model size, open-source license, and runtime framework. Our goal is to give developers data and insights for choosing and optimizing SLMs on SBCs like the Raspberry Pi.

Figure: Small Language Models (SLMs)

Differences in SLMs

Mathstral: This model is designed primarily for mathematical reasoning problems. It is built on Mistral 7B, which makes it suitable for scenarios that require complex mathematical calculations and reasoning.
Deepseek Coder V2: Specializing in code-related tasks, this model offers efficient code generation. It is tailored for programming work and helps developers automate and enhance their coding processes.
Phi 3: This model is characterized by its versatility and is capable of handling a wide range of tasks. It strikes a balance between performance and flexibility, making it a popular choice for developers seeking a multi-purpose SLM.
Llama 3.1: The 8b variant tested here is the smallest model in the Llama 3.1 family, yet at 4.7GB it is the largest model that ran in our tests. It handles general natural language processing tasks and is particularly adept at text generation and summarization.
Gemma2 2b: This model combines a small footprint (1.6GB) with a speed of 2.97 tokens/s in our tests. It is aimed at interactive applications, such as chatbots and virtual assistants, where quick and accurate responses are crucial.
Qwen: Qwen is an open-source model family; the 0.5b variants tested here are released under Apache 2.0. It is particularly favored by developers who need a customizable SLM for specific use cases.

Performance Summary of SLMs on Raspberry Pi 5

To evaluate the performance of the aforementioned SLMs on the Raspberry Pi 5, we conducted a series of tests focusing on execution speed, model size, open-source licenses, and runtime frameworks. The Raspberry Pi 5 8GB provided a consistent and reliable platform for our experiments.
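The article does not state how the tokens/s figures were obtained. One plausible way, assuming the models are served by Ollama, is to derive the speed from the `eval_count` and `eval_duration` fields that Ollama's `/api/generate` endpoint returns (durations are reported in nanoseconds). The helper names and the default local URL below are illustrative assumptions, not the authors' benchmark script.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Generation speed from Ollama response metadata.

    eval_count is the number of generated tokens; eval_duration is the
    time spent generating them, in nanoseconds.
    """
    duration_ns = response["eval_duration"]
    if duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return response["eval_count"] / duration_ns * 1e9


def benchmark(model: str, prompt: str) -> float:
    """Send one non-streaming request to a running Ollama server; return tokens/s."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return tokens_per_second(json.load(reply))
```

With the server running and a model already pulled, a call such as `benchmark("qwen2.5:0.5b", "Why is the sky blue?")` would return a single measurement. Averaging several prompts per model would give steadier numbers than one run.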

Model	Size	Speed (tokens/s)	License	Runtime framework
mistral-7B-q4	4.1GB	0.97	Apache 2.0	ollama
phi3 3.8b-q4	2.2GB	3.06	MIT	ollama
phi3.5-3.8b-q4	2.2GB	3.42	MIT	ollama
Llama 3.1-8b-q4	4.7GB	1.18	Llama 3.1 license	ollama
gemma2-2b-q4	1.6GB	2.97	Gemma license	ollama
qwen2-0.5b-q4	395MB	20	Apache 2.0	ollama
qwen2.5-0.5b-q4	398MB	19.41	Apache 2.0	ollama
Deepseek coder V2-7b-q4	8.9GB	can't run	Deepseek license	ollama
llama2-7b-q4	3.8GB	can't run	llama2 license	ollama

As an iteration on phi3, phi3.5 improves performance, running at 3.42 versus 3.06 tokens/s in our tests, while keeping the model size (2.2GB) and number of parameters (3.8b) unchanged.

Qwen2.5 achieves improved accuracy over Qwen2 at the cost of a slightly larger model (398MB versus 395MB) and a marginally lower speed in our tests (19.41 versus 20 tokens/s).
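The version-to-version differences can be checked directly against the table. A minimal sketch, with the speeds copied from the table and a helper name (`relative_change`) chosen here for illustration:

```python
def relative_change(old: float, new: float) -> float:
    """Percentage change going from old to new."""
    return (new - old) / old * 100.0


# Speeds in tokens/s, copied from the performance table.
speeds = {"phi3": 3.06, "phi3.5": 3.42, "qwen2": 20.0, "qwen2.5": 19.41}

phi_change = relative_change(speeds["phi3"], speeds["phi3.5"])
qwen_change = relative_change(speeds["qwen2"], speeds["qwen2.5"])

print(f"phi3 -> phi3.5: {phi_change:+.1f}%")
print(f"qwen2 -> qwen2.5: {qwen_change:+.1f}%")
```

On these figures phi3.5 comes out roughly 12% faster than phi3, while qwen2.5 is about 3% slower than qwen2. Since each figure is a single reported measurement, small differences like the Qwen one may be within run-to-run noise.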

Conclusion

We measured the execution speed of each model on a set of predefined tasks covering mathematical calculations, code generation, text summarization, and chatbot interactions. Execution speed varied widely between models, from 0.97 tokens/s for mistral-7B-q4 to 20 tokens/s for qwen2-0.5b-q4, and two models (Deepseek coder V2 and llama2-7b) could not run at all.

Model size was also a crucial factor in our analysis. We compared the storage requirements of each model and assessed their impact on overall system performance. Qwen (under 400MB) and Gemma2 2b (1.6GB) stood out for their compact size, making them well suited to applications with limited storage.
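Choosing a model under a storage budget and a minimum speed can be sketched as a simple filter over the table. The data below is copied from the table; the `Model` type, the `pick_models` helper, and the decimal MB-to-GB conversion are assumptions made for illustration.

```python
from typing import List, NamedTuple, Optional


class Model(NamedTuple):
    name: str
    size_gb: float
    speed: Optional[float]  # tokens/s; None where the table says "can't run"


# Figures copied from the performance table (MB converted as 1GB = 1000MB).
MODELS = [
    Model("mistral-7B-q4", 4.1, 0.97),
    Model("phi3 3.8b-q4", 2.2, 3.06),
    Model("phi3.5-3.8b-q4", 2.2, 3.42),
    Model("Llama 3.1-8b-q4", 4.7, 1.18),
    Model("gemma2-2b-q4", 1.6, 2.97),
    Model("qwen2-0.5b-q4", 0.395, 20.0),
    Model("qwen2.5-0.5b-q4", 0.398, 19.41),
    Model("Deepseek coder V2-7b-q4", 8.9, None),
    Model("llama2-7b-q4", 3.8, None),
]


def pick_models(max_size_gb: float, min_speed: float) -> List[Model]:
    """Runnable models within the size budget and speed floor, fastest first."""
    fits = [
        m
        for m in MODELS
        if m.speed is not None
        and m.size_gb <= max_size_gb
        and m.speed >= min_speed
    ]
    return sorted(fits, key=lambda m: m.speed, reverse=True)


for model in pick_models(max_size_gb=2.5, min_speed=2.5):
    print(f"{model.name}: {model.size_gb}GB, {model.speed} tokens/s")
```

Speed and size are only two of the criteria discussed in the article; license terms and task fit (math, code, chat) would still need to be weighed separately.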

In conclusion, our analysis of various SLMs on the Raspberry Pi 5 provides insights into their performance, capabilities, and limitations. Developers can use this information to select the most suitable model for their specific requirements. The Raspberry Pi 5 serves as a capable platform for running the smaller SLMs, enabling developers to deploy intelligent computing compactly and cost-effectively.

Learn More About SLMs on Raspberry Pi 5:

Run Qwen2.5 on Raspberry Pi 5 using Ollama

Run phi3.5 on Raspberry Pi 5 using Ollama

Run Gemma2 on Raspberry Pi 5 using Ollama