Commercial desktop AI robots/companions have been around for 2-3 years. They can move, converse, and express emotions. This Robot Party video demonstrates how fun they can be. Recently (late 2025 and early 2026), many implementations of cute little chatbots have also appeared on YouTube. Unlike the commercial AI bots, these are built from open-source code that makers can use themselves. As a maker myself, I found that I had plenty of suitable micro-controllers and modules to build one, so I did... well, just a prototype for simple chat. Key components are:
ESP32-S3 Supermini - https://www.aliexpress.com/item/1005007785809540.html (this one has 2M PSRAM)
INMP441 I2S microphone module - https://www.aliexpress.com/item/1005010312315549.html
MAX98357 I2S amplifier module - https://www.aliexpress.com/item/1005009482707515.html
A speaker - 4~8 Ohm 2~3W works well. The one in the video is an 8 Ohm 3W speaker - https://www.aliexpress.com/item/1005010152394694.html
A power supply module - https://www.aliexpress.com/item/32382955228.html
A breadboard, a push button, jumper wires, and a 9~12V power supply.
Please watch my video below to see my prototype.
Unlike the popular XiaoZhi bots or others that use DeepSeek AI, I chose to make my own. Thanks to AI (ChatGPT 5.2), I was able to generate working code (over 1000 lines) for this project. However, there was quite a steep learning curve to get it to work. My first, long request to ChatGPT ended up with too many bugs and a day spent. So I used my computational thinking skills to break the job down into smaller problems/projects, and finally got it working.
This first project (Project A) makes sure the MAX98357A amplifier module and speaker work.
Originally I asked ChatGPT for code to play a WAV file. This requires the ESP32-S3's PSRAM to store the WAV file. I then learnt that some ESP32-S3 Supermini boards do not have PSRAM. So when purchasing, pay attention to the specs; I suggest buying ones that have 4MB flash and 2MB PSRAM. I could get one for as cheap as A$3.
I then also tried using a microSD module to supply a WAV file, but later simplified things by generating some digital sounds in code. In the video, you can hear 4 generated pure-wave digital sounds (no noise, except that one of them is noise :-)); pressing a push button cycles through them.
The code for the Arduino IDE can be downloaded here. You can work out the wiring from the code. I suggest choosing the ESP32S3 Dev Module as the board, or the Adafruit Feather ESP32-S3 with or without 2MB PSRAM, depending on your board. If choosing ESP32S3 Dev Module, please also use the following parameters:
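To illustrate what "generated in code" can mean, here is a minimal sketch of filling a buffer with a pure sine tone as 16-bit PCM samples. It is plain C++ with no Arduino or I2S calls. The helper name makeSineTone, the 16 kHz sample rate, and the half-scale amplitude are my assumptions for illustration, not values taken from the downloadable code.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: build a pure sine tone as 16-bit signed PCM samples.
// sampleRate and amplitude defaults are illustrative assumptions.
std::vector<int16_t> makeSineTone(double frequencyHz, double seconds,
                                  uint32_t sampleRate = 16000,
                                  double amplitude = 0.5) {
    const double kTwoPi = 6.283185307179586;
    const size_t count = static_cast<size_t>(seconds * sampleRate);
    std::vector<int16_t> samples(count);
    for (size_t n = 0; n < count; ++n) {
        // Phase advances by 2*pi*f/sampleRate per sample.
        const double phase = kTwoPi * frequencyHz * n / sampleRate;
        samples[n] = static_cast<int16_t>(amplitude * 32767.0 * std::sin(phase));
    }
    return samples;
}
```

For example, makeSineTone(1000.0, 0.5) returns half a second of a 1 kHz tone. On the ESP32-S3 a buffer like this would then be written to the I2S peripheral that feeds the MAX98357A.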
USB CDC On Boot: "Enabled"
Flash Size: "4 MB (32Mb)"
Partition Scheme: "Huge APP (3MB No OTA/1MB SPIFFS)"
PSRAM: "Disabled" or if yours has 2MB PSRAM, select "QSPI PSRAM". In this project, I used an ESP32-S3 supermini without PSRAM.
In this project, I also tested a range of 8 Ohm speakers with output power ranging from 3W down to 0.5W (from left to right). The middle 4 are all 2W with different diameters. All could play clear sounds but at different loudness. I found the 8 Ohm 2W 28mm one gives clean and loud sound when playing pure-wave digital sounds. Speakers can draw a large current, so they should not be powered directly from the micro-controller.
When purchasing the MAX98357A, I bulk purchased 10 for just under A$10. This project allows me to quickly test each one of them. In the code, though, I did not use its GAIN pin to control the volume. If you are building this project, perhaps you could add a rotary pot to adjust the speaker volume.
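One possible way to add volume control, shown here only as a sketch, is to scale the samples in software before they go to the amplifier. The MAX98357A's GAIN pin selects among a few fixed gain steps rather than a continuous level, so a pot read through an ADC could instead supply a 0 to 100 value to a helper like the one below. The name applyVolume and the percent scale are my assumptions; nothing like this is in the downloadable code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: scale 16-bit PCM samples in place.
// volumePercent is clamped to 0..100; 100 leaves the samples unchanged.
void applyVolume(int16_t* samples, size_t count, int volumePercent) {
    if (volumePercent < 0) volumePercent = 0;
    if (volumePercent > 100) volumePercent = 100;
    for (size_t i = 0; i < count; ++i) {
        // Widen to 32 bits so the multiplication cannot overflow.
        const int32_t scaled =
            static_cast<int32_t>(samples[i]) * volumePercent / 100;
        samples[i] = static_cast<int16_t>(scaled);
    }
}
```

A linear percent scale is the simplest choice; perceived loudness is closer to logarithmic, so a lookup table may feel more natural on a real knob.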
This project (Project B) combines the INMP441 microphone module and the MAX98357A amplifier module to make a voice recorder that plays back the recorded voice. It tests whether the mic and amplifier function normally, which is a key step to ensure a good-quality voice recording to send to the AI in the next project.
For the voice recording in this project, PSRAM is required to store the audio data. Therefore, I used an ESP32-S3 Supermini with 2MB PSRAM. If buying, I suggest buying ones with PSRAM (link provided at the beginning of this page).
As shown in the video, I used two push buttons to record and play voice. When the red button is pressed, the onboard LED (WS2812) of the ESP32-S3 turns red to indicate recording; the recording time is set to 4 seconds in the program. When the green button is pressed, the recorded voice/audio is played and the onboard LED turns green. Wiring information can be found in the code here. In this project, I learnt that the I2S pins on the ESP32-S3 (virtually any pins can be used) can be shared between two I2S devices (both the INMP441 and the MAX98357A are I2S devices).
Again, volume control is not implemented in this project. What else do you have in mind to remix and/or tinker with in this project?
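To give a feel for how much memory a 4-second recording needs, here is a small worked calculation. Only the 4-second duration comes from the program described above; the 16 kHz sample rate and 16-bit mono format are my assumptions, so check the downloadable code for the real values.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed recording format (not confirmed by the original sketch).
constexpr uint32_t kSampleRateHz   = 16000;  // assumption
constexpr uint32_t kBytesPerSample = 2;      // assumption: 16-bit mono
constexpr uint32_t kRecordSeconds  = 4;      // from the project description

// Bytes needed to hold one uncompressed mono recording.
constexpr size_t recordingBytes(uint32_t sampleRateHz,
                                uint32_t bytesPerSample,
                                uint32_t seconds) {
    return static_cast<size_t>(sampleRateHz) * bytesPerSample * seconds;
}

constexpr size_t kRecordingBytes =
    recordingBytes(kSampleRateHz, kBytesPerSample, kRecordSeconds);
```

Under these assumptions the buffer is 128,000 bytes (125 KiB), which fits comfortably in 2MB of PSRAM. Doubling the sample rate or the duration doubles the buffer.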
With the success of the previous two projects (A and B), I then specifically asked AI (ChatGPT) to create an AI chatbot program based on the voice recorder and player code. However, even with the basic audio recording and playback working, an AI chatbot is much more complex than just recording and playing audio. First, I specified ChatGPT and the OpenAI API, so the AI gave me instructions to register a free developer account on the OpenAI Platform and create an API key. An API key is a long string of some 154 characters starting with sk-proj-........., and together with the API key, you will also need a WiFi AP and its password. The ESP32-S3 supports only 2.4GHz WiFi, so make sure your WiFi AP is 2.4GHz. The WiFi credentials and API key go in a secrets.h file in the project folder. You can download the full code here.
Developing a chatbot using the OpenAI API is not free. After the first few connections, the OpenAI Platform will prompt you to deposit funds. I deposited A$10, and after many test conversations (more than a hundred Q&As), it has cost about A$1.2 so far, which is not too bad. By the way, OpenAI has a few models to choose from. I chose the cheapest model, "gpt-4o-mini", for this AI chatbot experiment. It works well, but this model's data only goes up to October 2023.
The video below demonstrates the chat workflow through operation and the Arduino IDE's Serial Monitor. One thing I forgot to demonstrate is that you can also type your questions into the Serial Monitor and get both a text and an audio reply.
At the end of the video, I tried to switch off the power, but because the ESP32 was still connected to the computer, the system kept working. This is not recommended, because the speaker may draw too much current from the ESP32 and may damage it.
At this stage, it is already quite exciting, but there is more to be done beyond the current Project C. Please leave a message below and let me know what you want to remix and tinker with next in this project. Any suggestions and questions are welcome.
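For readers who have not used a secrets file before, a secrets.h could look roughly like the sketch below. The macro names WIFI_SSID, WIFI_PASSWORD, and OPENAI_API_KEY and all the values are placeholders I made up; the downloadable code may use different names, so match whatever it expects.

```cpp
// secrets.h (hypothetical layout; macro names and values are placeholders)
#ifndef SECRETS_H
#define SECRETS_H

// Credentials for a 2.4GHz WiFi access point.
#define WIFI_SSID      "your-2.4GHz-network"
#define WIFI_PASSWORD  "your-wifi-password"

// API key created on the OpenAI Platform; it starts with "sk-proj-".
#define OPENAI_API_KEY "sk-proj-REPLACE_WITH_YOUR_KEY"

#endif  // SECRETS_H
```

Keeping these values in their own file makes it easier to share the main sketch without leaking the key; do not publish secrets.h or commit it to a public repository.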
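To show where the model name ends up, here is a minimal sketch of building the JSON body for one question to OpenAI's Chat Completions endpoint (https://api.openai.com/v1/chat/completions, sent with an "Authorization: Bearer <API key>" header). The helper names jsonEscape and buildChatRequest are hypothetical, and the downloadable code may build its requests differently, for example with a JSON library.

```cpp
#include <string>

// Minimal JSON string escaping: handles quotes, backslashes, and the
// common whitespace escapes. Other control characters are passed through
// unchanged, so this is not a complete JSON encoder.
std::string jsonEscape(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:   out += c;      break;
        }
    }
    return out;
}

// Hypothetical helper: request body carrying a single user question.
std::string buildChatRequest(const std::string& question,
                             const std::string& model = "gpt-4o-mini") {
    return "{\"model\":\"" + jsonEscape(model) +
           "\",\"messages\":[{\"role\":\"user\",\"content\":\"" +
           jsonEscape(question) + "\"}]}";
}
```

Switching to another model is then a one-argument change, which makes it easy to compare cost and answer quality.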