On May 13, 2024, OpenAI held its much-anticipated Spring demo. Ahead of the demo, Sam Altman dialed down expectations by saying that there would be no news about GPT-5. Yet he could not help adding that they would present something that “feels like magic.”
I have compiled information not only from the live demo but also from the documents OpenAI published online, scouring them for missing details. I am happy to present what I gathered.
TLDR:
OpenAI is rolling out a new model for ChatGPT called GPT-4o. (The “o” stands for “omni.”)
It doesn’t claim to outperform GPT-4 Turbo in analytical functions. (It has “GPT-4-level” performance, according to OpenAI.) However, it is intended to offer superior multimodal performance (image and voice).
GPT-4o is available to FREE users of ChatGPT. But hold your horses, and hold onto GPT-3.5: rate limits cap how much free users can use GPT-4o. Once the limit is reached, the free version of ChatGPT falls back to GPT-3.5.
GPT-4o will power Custom GPTs for free users, finally opening them up to the masses without ChatGPT Plus accounts.
What made the real impression during the demonstration was the silky-smooth voice chat performance.
Mobile apps with this voice chat capability will be available, along with a desktop app for Macs. A desktop app for Windows will follow this year. Judging from the presentation, it has the potential to outshine Microsoft’s Copilot.
As of this writing, GPT-4o is live on ChatGPT.
It is also available via OpenAI’s API. As GPT-4o costs half as much as GPT-4 Turbo, it would make sense to migrate applications using GPT-4 to GPT-4o. Nevertheless, if you are hoping that OpenAI’s latest offering has more up-to-date knowledge, you will be disappointed. GPT-4o’s knowledge doesn’t seem to be any more recent than that of GPT-4 Turbo.
Also, GPT-4o’s multimodal capabilities are not yet available to general API users. OpenAI’s website says: “We plan to launch support for GPT-4o's new audio and video capabilities to a small group of trusted partners in the API in the coming weeks.”
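To put the "half the cost" point in concrete terms, here is a minimal sketch that estimates what a given token workload would cost on each model. The per-million-token prices and the workload numbers are assumptions for illustration only, chosen so that GPT-4o comes out at half the GPT-4 Turbo rate as described above; check OpenAI's pricing page for actual figures. The `Pricing` class and `estimate_cost` function are hypothetical helpers, not part of any OpenAI library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD price per one million tokens (illustrative values only)."""
    input_per_million: float
    output_per_million: float


# ASSUMPTION: placeholder prices, set so that gpt-4o is half of gpt-4-turbo.
# Replace them with the numbers from OpenAI's pricing page before relying on this.
PRICING = {
    "gpt-4-turbo": Pricing(input_per_million=10.00, output_per_million=30.00),
    "gpt-4o": Pricing(input_per_million=5.00, output_per_million=15.00),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a workload on the given model."""
    price = PRICING[model]
    return (
        input_tokens / 1_000_000 * price.input_per_million
        + output_tokens / 1_000_000 * price.output_per_million
    )


if __name__ == "__main__":
    # Hypothetical monthly workload: 2M prompt tokens, 500K completion tokens.
    for model in PRICING:
        cost = estimate_cost(model, input_tokens=2_000_000, output_tokens=500_000)
        print(f"{model}: ${cost:.2f}")
```

Under these assumed prices, the same workload costs half as much on GPT-4o, so for a typical application the migration is mostly a matter of changing the model name in the API request and re-testing the outputs.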
My Takes: No AGI Breakthrough, but A Terrific Presentation
OpenAI’s Spring demo claimed no scientific breakthrough beyond what we have seen before. It looks like we will have to wait longer for the announcement of GPT-5.
However, OpenAI showed its intention to expand the availability of “GPT-4-level AI” to the masses. This move makes sense as competitors such as Google start to offer free chatbots that approach the performance of GPT-4. On the other hand, it raises the question of whether a ChatGPT Plus subscription is worth the money for light users if the free version offers the same model.
At the same time, opening up Custom GPTs to free users finally makes Custom GPTs a viable platform for a larger audience.
Giving ChatGPT Voices and Personality
Technologically, the most impressive part of the presentation was the voice chat using synthesized voices. There was no noticeable lag between the questions and responses. The synthesized voice sounded fully natural, resembling a real, vibrant and loquacious person. With simple prompts, the synthesized voice modified itself to mimic real-life emotions and dramatic characters. The experience appeared far superior to what’s offered by leading text-to-speech services such as ElevenLabs.
I used to assume that synthesized voices couldn’t replace talented voice actors yet. That assumption has now gone out the window.
Again, there was no new “magical” technology that we hadn’t seen before. We have seen LLMs do image analysis and translation before. But the synthesized voice and real-time interaction made the whole presentation “feel like magic,” as Sam Altman claimed.
Currently, there is no hands-on way to test the voice synthesis feature used during the presentation. I am excited to test it when it becomes available, especially once it is accessible through the API.
While competitors are catching up with, if not surpassing, the technical capability of ChatGPT, OpenAI seems serious about improving its presentation and applications. I look forward to seeing the impact of the voice chat features and new apps on consumers and casual users.