DFR1154 AI Camera: ChatGPT/DeepSeek Multimodal Demo

Hello,
The DFR1154 (aka AI Camera) seems to be a great board, with a microphone, speaker, and camera plus AI on board. Unfortunately, there is absolutely NO real source code or demo. There is lots of storytelling but no real content, compared to LilyGO, which always provides examples as a real PlatformIO project.
I'm looking for someone who can share code that matches the flow described in the documentation:
1. A trigger (button) => record the audio
2. Store the audio in memory or on an SD card
3. Send the audio to perform Speech2Text
4. Send the text to perform an LLM query (DeepSeek, ChatGPT)
5. Send the answer to perform Text2Speech
6. Play back the audio
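To make the six steps concrete, here is a rough sketch of how one button-triggered turn could be chained together. It is not DFRobot or OpenAI code: `AssistantStages`, `runAssistantTurn`, and every hook name are hypothetical, and on the real board each hook would have to wrap I2S capture, an HTTPS request, or I2S playback.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stage hooks (names invented for this sketch, not a vendor API).
struct AssistantStages {
    std::function<std::vector<int16_t>()> recordAudio;                     // steps 1-2: capture PCM into memory
    std::function<std::string(const std::vector<int16_t>&)> speechToText;  // step 3
    std::function<std::string(const std::string&)> queryLlm;               // step 4
    std::function<std::vector<int16_t>(const std::string&)> textToSpeech;  // step 5
    std::function<void(const std::vector<int16_t>&)> playAudio;            // step 6
};

// Runs one turn of the assistant. Returns false as soon as a stage yields
// nothing, so the caller can retry or signal an error instead of playing silence.
bool runAssistantTurn(const AssistantStages& s) {
    std::vector<int16_t> pcm = s.recordAudio();
    if (pcm.empty()) return false;

    std::string question = s.speechToText(pcm);
    if (question.empty()) return false;

    std::string answer = s.queryLlm(question);
    if (answer.empty()) return false;

    std::vector<int16_t> speech = s.textToSpeech(answer);
    if (speech.empty()) return false;

    s.playAudio(speech);
    return true;
}
```

Keeping the stages as replaceable hooks means the chain can be exercised on a PC with fake stages before any of the hardware or cloud calls exist.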
Same question for sending an image.
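For the image case, a minimal sketch of the request body, assuming the target is OpenAI's Chat Completions API, which accepts an image as a base64 data URL inside an `image_url` content part. The helper names (`base64Encode`, `jsonEscape`, `buildImageQuestion`) are made up for this sketch, the model name is left to the caller, and the JSON escaping only covers quotes, backslashes, and newlines. Whether DeepSeek accepts the same payload is not something I have verified.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Standard base64 (RFC 4648) with '=' padding.
std::string base64Encode(const std::vector<uint8_t>& data) {
    static const char kTable[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve((data.size() + 2) / 3 * 4);
    for (size_t i = 0; i < data.size(); i += 3) {
        // Pack up to three bytes into a 24-bit chunk, then emit four 6-bit symbols.
        uint32_t chunk = static_cast<uint32_t>(data[i]) << 16;
        if (i + 1 < data.size()) chunk |= static_cast<uint32_t>(data[i + 1]) << 8;
        if (i + 2 < data.size()) chunk |= data[i + 2];
        out += kTable[(chunk >> 18) & 0x3F];
        out += kTable[(chunk >> 12) & 0x3F];
        out += (i + 1 < data.size()) ? kTable[(chunk >> 6) & 0x3F] : '=';
        out += (i + 2 < data.size()) ? kTable[chunk & 0x3F] : '=';
    }
    return out;
}

// Minimal JSON string escaping: quotes, backslashes, and newlines only.
std::string jsonEscape(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '"' || c == '\\') { out += '\\'; out += c; }
        else if (c == '\n') out += "\\n";
        else out += c;
    }
    return out;
}

// Builds the JSON body for a single text-plus-image user message.
std::string buildImageQuestion(const std::string& model,
                               const std::string& question,
                               const std::vector<uint8_t>& jpeg) {
    return "{\"model\":\"" + jsonEscape(model) +
           "\",\"messages\":[{\"role\":\"user\",\"content\":["
           "{\"type\":\"text\",\"text\":\"" + jsonEscape(question) + "\"},"
           "{\"type\":\"image_url\",\"image_url\":{\"url\":\"data:image/jpeg;base64," +
           base64Encode(jpeg) + "\"}}]}]}";
}
```

On the board, the JPEG bytes would come from the camera frame buffer and the resulting string would be sent as the body of an HTTPS POST with the API key in the `Authorization` header. A full-resolution frame grows by about a third once base64-encoded, so a small frame size is probably the safer starting point.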
It seems OpenAI is working on it, but without any explanation (https://github.com/openai/openai-realtime-embedded-sdk).
It's a basic AI assistant use case that has existed for years and that everybody is looking for. I don't understand why there is no getting-started code. Are ESP32 boards really ready for it? I found some startups working on it, so it should work.
Any help? Best regards!
This code implements the OpenAI WebRTC functionality, but it targets the ESP-IDF environment (https://github.com/DFRobot/openai-realtime-embedded-sdk). This example implements the OpenAI image question-answering functionality and runs under Arduino (https://wiki.dfrobot.com/SKU_DFR1154_ESP32_S3_AI_CAM#target_9).
