Forum >DFR1154 AI Camera : ChatGPT/DeepSeek Multimodal Demo
Troubleshooting

DFR1154 AI Camera : ChatGPT/DeepSeek Multimodal Demo

userHead Jean-Philippe.Encausse 2025-03-18 16:35:07 85 Views1 Replies

Hello,

The DFR1154 (aka AI Camera) seems to be a great camera with Microphone, Speaker, and Camera + AI on board. Unfortunatly there is absolutly NO real source code or demo. Lot's of StorryTelling but no real content compared to Lilygo that always provide example in a real proejct of PlatformIO.

 

I'm looking for someone who can share a code matching the described documentation

1. a Trigger (button) => Record the audio

2. Store the audio in mermory or on SDCard

3. Send the audio to perform Speech2Text

4. Send the Text to perform LLM Query (DeepSeek, ChatGpt)

5. Send the Answer to perform Text2Speech

6. Playback the audio

 

Same question sending an Image.

It seems OpenAI is working on it but without explanation (https://github.com/openai/openai-realtime-embedded-sdk)

 

It's a basic AI Assistant Usecase that exist from years, that everybody is looking for. I don't understand why there is no getting started code ? Does the ESP-32 boards are really ready for it ? I found some startup working on it so it should works.

 

Any help ? Best Regards !

 

2025-03-19 09:24:18

This code can implement the OpenAI WebRTC functionality, but it is under the ESP-IDF environment (https://github.com/DFRobot/openai-realtime-embedded-sdk); This example can implement the OpenAI image question-answering functionality and can run under Arduino (https://wiki.dfrobot.com/SKU_DFR1154_ESP32_S3_AI_CAM#target_9). 

userHeadPic erhahaha