SELECTION GUIDE

TinyML Voice Recognition: ESP32, Arduino, STM32 Hardware Compared

DFRobot, Sep 22 2024

TinyML is becoming increasingly important in embedded systems and IoT. In voice recognition (control) projects in particular, choosing the right hardware platform is key to success. This guide compares three popular TinyML hardware platforms in depth: ESP32, Arduino Nano 33 BLE Sense, and STM32F746G-DISCO. We examine hardware specifications, performance, power consumption, and practical application cases to help developers make an informed choice.

By reading this guide, you will gain a comprehensive understanding of how these hardware platforms perform in voice recognition projects, their strengths and weaknesses, and the scenarios in which they are most applicable. You will also receive recommendations based on comparative analysis to ensure your TinyML project’s success.

Overview of Hardware Platforms

ESP32

The ESP32 is a low-power, dual-core system-on-chip (SoC) with integrated Wi-Fi and Bluetooth, which makes it well suited to embedded and IoT applications. It is cost-effective, offers extensive peripheral support, and has strong community backing, so it is a popular choice for TinyML projects. The ESP32 performs well in voice recognition tasks, particularly in resource-constrained environments, where it operates efficiently.

Arduino Nano 33 BLE Sense

The Arduino Nano 33 BLE Sense is a small, high-performance development board integrated with various sensors (including ambient light, accelerometer, gyroscope, and microphone) and features Bluetooth Low Energy (BLE) capabilities. It is especially suitable for TinyML applications requiring multi-sensor data fusion. The board is favored by developers for its ease of use and broad support within the Arduino ecosystem.

STM32F746G-DISCO

The STM32F746G-DISCO is a high-performance development board based on the ARM Cortex-M7 processor, offering robust computing power and rich peripheral interfaces. It is particularly suitable for complex audio processing and voice recognition tasks. The board has audio input/output interfaces, making it ideal for industrial-grade and demanding application scenarios. The abundant development tools and documentation support make it a go-to choice for developers working on high-performance embedded applications.

Parameter	ESP32	Arduino Nano 33 BLE Sense	STM32F746G-DISCO
Processor	Dual-core Xtensa LX6	ARM Cortex-M4	ARM Cortex-M7
Operating Frequency	240 MHz	64 MHz	216 MHz
Memory	520 KB SRAM	256 KB SRAM	320 KB SRAM
Storage	4 MB Flash	1 MB Flash	1 MB Flash
Wireless Connectivity	Wi-Fi, Bluetooth	BLE	None
Sensors	None	Ambient light, accelerometer, gyroscope, microphone	None
Audio Interface	None	None	Audio input/output interface
Power Consumption	Low	Very low	Higher
AI Support	Supports TensorFlow Lite, Edge Impulse	Supports TensorFlow Lite, Edge Impulse	Supports TensorFlow Lite
Interfaces	GPIO, I2C, SPI, UART	GPIO, I2C, SPI, UART	GPIO, I2C, SPI, UART, USB, Ethernet

Figure: Comparison of hardware platforms for TinyML

Comparison of Test Cases

With the basic features of the ESP32, Arduino Nano 33 BLE Sense, and STM32F746G-DISCO covered, we now look at how each platform actually performs in a specific application: a smart home voice control system. These cases show what each platform can deliver in practice and support a more informed choice.

Overview of Test Standards and Methods

To evaluate the ESP32, Arduino Nano 33 BLE Sense, and STM32F746G-DISCO fairly and comprehensively in TinyML voice recognition projects, we defined a set of standardized test criteria and methods, described below. All tests were conducted under identical conditions so that the results are comparable.

Test Items

Voice Recognition Accuracy:

Model: A pre-trained TensorFlow Lite model optimized for voice recognition in low-resource environments was used.
Functionality: The model recognizes the keywords "yes" and "no" and assigns all other input, including noise and invalid commands, to an "others" class.
Model Development: The model was built on the Edge Impulse platform using the official dataset and fine-tuned with audio from noisy environments commonly encountered in real-world applications.
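
As a rough illustration of the three-class output described above, the sketch below turns per-class scores into a label. The label order, the `decode_scores` helper, and the 0.6 confidence threshold are all assumptions made for this example; the guide does not state how the tested model's scores were thresholded.

```python
# Labels follow the three classes named in this guide; their order is an assumption.
LABELS = ("yes", "no", "others")


def decode_scores(scores, threshold=0.6):
    """Return the predicted label, or "others" when confidence is low.

    `scores` holds one probability per class, in LABELS order.
    The 0.6 threshold is a hypothetical value, not one used in the tests.
    """
    if len(scores) != len(LABELS):
        raise ValueError("expected one score per label")
    # Index of the highest-scoring class.
    best = max(range(len(scores)), key=lambda i: scores[i])
    if scores[best] < threshold:
        return "others"
    return LABELS[best]


print(decode_scores([0.85, 0.10, 0.05]))  # a confident "yes"
print(decode_scores([0.45, 0.40, 0.15]))  # ambiguous input falls back to "others"
```

Treating low-confidence results as "others" is one common way to avoid acting on noise; a real deployment would tune the threshold against recorded audio.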

Response Time:

Measurement Method: The delay from voice input to the system's output command was recorded with precise timing tools. All hardware was tested in the same environment.
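
The guide does not name the timing tools it used. As a minimal sketch of the idea, the snippet below wraps a stand-in recognizer with a monotonic timer and reports the mean delay. The `measure_latency_ms` and `fake_recognizer` names are hypothetical, and on real hardware the timed span would cover audio capture, inference, and the output command.

```python
import statistics
import time


def measure_latency_ms(handler, inputs):
    """Time each call to `handler` and return the per-call latencies in ms."""
    latencies = []
    for item in inputs:
        start = time.perf_counter()
        handler(item)
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def fake_recognizer(samples):
    # Placeholder for capture + inference + command output on a real board.
    return sum(samples) > 0


runs = measure_latency_ms(fake_recognizer, [[1, 2, 3]] * 20)
print(f"mean latency: {statistics.mean(runs):.3f} ms over {len(runs)} runs")
```

Averaging over many runs, as here, smooths out one-off delays and gives a figure comparable to the average response times reported later in this guide.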

Power Consumption Test:

Test Environment

Environmental Consistency: All tests were conducted in the following three environments to simulate different usage scenarios:

Quiet Indoor Environment: A standard indoor environment without background noise.
Indoor Environment with Background Noise: An indoor environment with background noise, including human voices, music, and other ambient sounds.
Outdoor Environment: A typical outdoor environment with natural noise such as wind and vehicle sounds.
Operating Conditions: The same type of power source, microphone module (e.g., PDM microphone), and communication interface (Wi-Fi or BLE) were used across all hardware to ensure consistency in test conditions.

Model Specifications

Model Uniformity: The same voice recognition model was run on all three hardware platforms, with the same optimization applied for each target.
Model Size and Complexity: The model size (approximately 50 KB) and complexity (a neural network of about 10 layers) were kept consistent, so that performance differences reflect each platform's computational capability and level of optimization rather than differences in the model itself.
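
To put the 50 KB model size in context, the sketch below expresses it as a share of each board's flash and SRAM, using the figures from the comparison table earlier in this guide. The `footprint` helper is hypothetical, and the SRAM share is a worst case that assumes the model is copied into RAM; the working memory needed during inference is not stated in the guide and is not included.

```python
MODEL_KB = 50  # approximate model size stated in this guide

# SRAM and flash figures (in KB) from the comparison table in this guide.
BOARDS = {
    "ESP32": {"sram_kb": 520, "flash_kb": 4 * 1024},
    "Arduino Nano 33 BLE Sense": {"sram_kb": 256, "flash_kb": 1024},
    "STM32F746G-DISCO": {"sram_kb": 320, "flash_kb": 1024},
}


def footprint(board, model_kb=MODEL_KB):
    """Return the model size as a percentage of the board's flash and SRAM."""
    spec = BOARDS[board]
    return {
        "flash_pct": 100.0 * model_kb / spec["flash_kb"],
        "sram_pct": 100.0 * model_kb / spec["sram_kb"],
    }


for name in BOARDS:
    share = footprint(name)
    print(f"{name}: {share['flash_pct']:.1f}% of flash, "
          f"{share['sram_pct']:.1f}% of SRAM")
```

Under these assumptions the model takes less than 5% of flash on every board, which is consistent with running the same model on all three.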

Figure: Speech-to-Intent processing with TensorFlow Lite on an embedded device

Performance of ESP32

Board Used: FireBeetle 2 ESP32-E

Implementation Method:

Hardware Connection: The FireBeetle 2 ESP32-E development board, equipped with an IO expansion board, was used to connect a sound sensor module via its extended IO interface. The sensor is connected to the analog input channel (A0) of the development board through three wires, allowing direct reading of sound signals without the need for complex external circuitry.
Software Configuration: A TensorFlow Lite model was ported to the ESP32 platform for voice recognition. The TensorFlow Lite library is loaded in the Arduino IDE and the model is initialized. At startup, the board checks the model version for compatibility so that an incompatible model is not run.
Data Collection and Processing: Analog data from the DFR0034 sound sensor is read in real time through the A0 channel of the FireBeetle 2 ESP32-E. Once enough data has been collected, it is passed to the TinyML model, which predicts the voice command.
Voice Command Recognition and Response: The system determines whether the input voice matches preset commands like "Yes" or "No" based on the model's prediction results. For example, when a "Yes" command is detected, the system lights up the connected indicator; when a "No" command is detected, the indicator turns off.
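
The control flow in the steps above (buffer readings, classify a full window, switch the indicator) can be sketched on a host machine as follows. This is not the firmware used in the test: the `VoiceSwitch` class, the window size of 4, and the `toy_classifier` thresholds are hypothetical stand-ins for the analog reads and the TinyML model on the board.

```python
WINDOW_SIZE = 4  # hypothetical; a real model defines its own input length


class VoiceSwitch:
    """Buffers sensor readings and toggles an indicator on "yes"/"no"."""

    def __init__(self, classify, window_size=WINDOW_SIZE):
        self.classify = classify      # callable: list of samples -> label
        self.window_size = window_size
        self.buffer = []
        self.led_on = False

    def push(self, sample):
        """Add one reading; run the classifier once the window is full."""
        self.buffer.append(sample)
        if len(self.buffer) < self.window_size:
            return None
        label = self.classify(self.buffer)
        self.buffer = []              # start collecting the next window
        if label == "yes":
            self.led_on = True
        elif label == "no":
            self.led_on = False
        return label                  # any other label leaves the LED unchanged


def toy_classifier(window):
    # Stand-in for the TinyML model: it only looks at the average level.
    level = sum(window) / len(window)
    if level > 600:
        return "yes"
    if level > 300:
        return "no"
    return "others"


switch = VoiceSwitch(toy_classifier)
for reading in [700, 720, 690, 710]:
    switch.push(reading)
print(switch.led_on)
```

Keeping the buffering and LED logic separate from the classifier makes it easy to swap the toy function for a real model call without changing the rest of the loop.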

Figure: FireBeetle 2 ESP32-E voice recognition project setup

Performance:

Recognition Accuracy: Tested in various environments, the average recognition accuracy was 89%. Detailed data is shown in the table below.
Response Time: The average response time was 120ms when processed locally.
Power Consumption: The power consumption was 0.5W in standby mode and 1.2W during operation.
Stability: After prolonged testing, the system ran stably with no significant faults.

Environment	Recognition Accuracy
Quiet Indoor Environment	91%
Indoor Environment with Background Noise	87%
Outdoor Environment	85%
Average	89%
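
As a quick arithmetic check on the table above, the snippet below computes the equal-weight mean of the three per-environment figures. It comes to about 87.7%, so the reported 89% average presumably reflects a different weighting, for example an unequal number of trials per environment; the guide does not state the trial counts, so this is only an assumption.

```python
# Per-environment accuracy for the ESP32, copied from the table above.
accuracy_pct = {
    "Quiet Indoor Environment": 91,
    "Indoor Environment with Background Noise": 87,
    "Outdoor Environment": 85,
}

# Equal-weight (unweighted) mean across the three environments.
unweighted_mean = sum(accuracy_pct.values()) / len(accuracy_pct)
print(f"unweighted mean: {unweighted_mean:.1f}%")
```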

Evaluation: The ESP32's cost-effectiveness, stable wireless connectivity, and low power consumption make it an ideal choice for smart home devices. However, its limited processing power makes it suitable only for simple voice recognition tasks.

Performance of Arduino Nano 33 BLE Sense

Implementation Method:

The Arduino Nano 33 BLE Sense was connected to a microphone and other sensors to run a TinyML model.
Communication with home appliances was achieved through BLE, enabling voice control.

Figure: Arduino Nano 33 BLE Sense running a TinyML application, indicated by the blinking LED

Performance:

Recognition Accuracy: The average recognition accuracy was 86% with multi-sensor fusion. Although accuracy slightly decreased in complex environments, the diversity of sensors provided rich contextual information, enhancing the overall system robustness.
Response Time: Due to limited processing power, the average response time was 150ms.
Power Consumption: The power consumption was 0.3W in standby mode and 0.8W during operation, making it well-suited for battery-powered portable applications.
Stability: Stability was moderate in complex environments, with occasional BLE connection instability, especially in environments with significant signal interference.

Figure: TensorFlow Lite TinyML process on Arduino Nano 33 BLE Sense

Evaluation: The Arduino Nano 33 BLE Sense, with its high-sensitivity sensors, low power consumption, and small size, is particularly suitable for multi-sensor fusion applications in portable and wearable devices. However, its limited processing power makes it more appropriate for simple voice recognition tasks. For applications requiring higher performance and more complex processing capabilities, a more powerful hardware platform may be needed.

Performance of STM32F746G-DISCO

Implementation Method:

The STM32F746G-DISCO was connected to a professional microphone and audio interface to run a complex TinyML voice recognition model.
Communication with home appliances was achieved through wired interfaces, enabling voice control.

Figure: STM32F746G-DISCO running a TinyML application, indicated by the blinking LED

Performance:

Recognition Accuracy: The average recognition accuracy was 94% in complex environments.
Response Time: Due to its powerful processing capability, the average response time was 80ms.
Power Consumption: The power consumption was 1W in standby mode and 2.5W during operation.
Stability: The system ran stably, making it suitable for high-demand application scenarios.

Evaluation: With its powerful processing capability and stable audio interface, the STM32F746G-DISCO is well-suited for complex voice recognition tasks. However, its high power consumption and relatively high cost should be considered.

Comparative Analysis

Performance Metric	ESP32	Arduino Nano 33 BLE Sense	STM32F746G-DISCO
Recognition Accuracy (average)	89%	86%	94%
Response Time (average)	120ms	150ms	80ms
Power Consumption (operating)	1.2W	0.8W	2.5W
Stability	High (Mature design, strong community support)	Medium (Challenges with multi-sensor fusion and BLE connectivity)	High (Strong processing capability, suitable for complex tasks)
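
One way to combine the power and response-time rows is to estimate the energy spent per recognized command. The sketch below multiplies operating power by average response time, which assumes each board draws its full operating power for the whole response window; that is a simplification, and the `energy_per_command_mj` helper is hypothetical.

```python
# Operating power (W) and average response time (ms) from the comparison table.
RESULTS = {
    "ESP32": {"power_w": 1.2, "response_ms": 120},
    "Arduino Nano 33 BLE Sense": {"power_w": 0.8, "response_ms": 150},
    "STM32F746G-DISCO": {"power_w": 2.5, "response_ms": 80},
}


def energy_per_command_mj(power_w, response_ms):
    """Energy in millijoules: watts multiplied by milliseconds."""
    return power_w * response_ms


for board, result in RESULTS.items():
    print(f"{board}: {energy_per_command_mj(**result):.0f} mJ per command")
```

Under this assumption the estimates are about 144 mJ for the ESP32, 120 mJ for the Arduino Nano 33 BLE Sense, and 200 mJ for the STM32F746G-DISCO, so the fastest board is also the most energy-hungry per command.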

Analysis Conclusion: The table shows how each platform performs on each metric. The ESP32 is suitable for low-cost, low-power applications; the Arduino Nano 33 BLE Sense is ideal for multi-sensor fusion and low-power portable applications; and the STM32F746G-DISCO is well-suited for high-performance, complex task applications.

Conclusion

For TinyML voice recognition projects, the ESP32, Arduino Nano 33 BLE Sense, and STM32F746G-DISCO each have their own advantages and limitations. Weighing the measured performance against each platform's pros and cons shows where each one fits best.

Summary

ESP32: With its low cost and extensive community support, the ESP32 is well suited to budget-conscious smart home projects. Its built-in Wi-Fi and Bluetooth serve applications that require wireless communication. However, the ESP32 may lack the performance needed for complex voice recognition tasks and is suitable only for simple application scenarios.
Arduino Nano 33 BLE Sense: This board excels in portable and wearable devices thanks to its multi-sensor integration and low-power design. Its BLE functionality also suits low-power wireless communication applications. However, its limited processing power makes it primarily suitable for simple voice recognition and multi-sensor data fusion tasks.
STM32F746G-DISCO: With its powerful processing capability and rich interfaces, the STM32F746G-DISCO is suitable for complex, high-performance voice recognition tasks. Its high stability and dedicated audio interface serve it well in industrial automation and other demanding scenarios. However, its higher cost and power consumption limit its use in some low-power applications.

Recommendations

Budget-conscious smart home projects: Choose ESP32.
Portable applications requiring multi-sensor data fusion: Choose Arduino Nano 33 BLE Sense.
Industrial automation projects with high-performance demands: Choose STM32F746G-DISCO.

Additionally, for voice recognition projects that do not require complex customization or where quicker implementation is desired, the Gravity: Offline Language Learning Voice Recognition Sensor is an ideal alternative. This sensor is compatible with microcontrollers like Arduino and ESP32, offering powerful offline voice recognition capabilities without the need for complex coding or model training. It is particularly suitable for educational projects, rapid prototyping, and applications that require simple, fast deployment.

Figure: Micro-offline voice recognition sensor setup
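
The three platform recommendations in this section can be written as a small lookup. The `recommend_board` function and its two yes/no flags are hypothetical, and the priority order (performance first, then multi-sensor needs, then budget) is an assumption for the sake of the example rather than a rule stated in this guide.

```python
def recommend_board(needs_multi_sensor=False, needs_high_performance=False):
    """Map two yes/no project requirements to this guide's recommendations."""
    if needs_high_performance:
        return "STM32F746G-DISCO"           # industrial, high-performance projects
    if needs_multi_sensor:
        return "Arduino Nano 33 BLE Sense"  # portable multi-sensor data fusion
    return "ESP32"                          # budget-conscious smart home default


print(recommend_board())
print(recommend_board(needs_multi_sensor=True))
print(recommend_board(needs_high_performance=True))
```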

Future Outlook

As TinyML technology continues to evolve, more efficient and low-power hardware platforms will emerge, providing more options for voice recognition and other AI applications. Staying informed about technological advancements and selecting the most suitable hardware platform based on specific project requirements will help you achieve success in TinyML projects.

We hope that the analysis and recommendations provided in this article offer valuable guidance for your hardware selection and help you achieve the best results in your voice recognition projects. We will continue to update hardware selection guides for other TinyML projects in the future, so stay tuned.