It’s been a tumultuous week for OpenAI, filled with executive departures and major fundraising developments, but the startup is back at work at DevDay 2024, trying to get developers to build tools with its AI models. On Tuesday, the company announced several new tools, including a public beta of its “Realtime API” for building apps with low-latency, AI-generated voice responses. It’s not quite ChatGPT’s Advanced Voice Mode, but it’s close.
In a briefing with reporters before the event, OpenAI Chief Product Officer Kevin Weil said the recent departures of Chief Technology Officer Mira Murati and Chief Research Officer Bob McGrew will not affect the company’s progress.
“I’ll start by saying that Bob and Mira were amazing leaders. I have learned a lot from them, and they were instrumental in getting us to where we are today,” Weil said. “And we’re not going to slow down.”
As OpenAI weathers yet another executive shake-up, a reminder of the turbulence that followed last year’s DevDay, the company is trying to convince developers that it still offers the best platform for building AI apps. Leaders say the startup has more than 3 million developers building with its AI models, but OpenAI operates in an increasingly competitive space.
OpenAI noted that it has cut the cost of developer access to its API by 99% over the past two years, though competitors like Meta and Google, which have continually lowered their own prices, likely forced its hand.
One of OpenAI’s new features, the Realtime API, gives developers the ability to build near-real-time, speech-to-speech experiences in their apps, with a choice of six voices provided by OpenAI. These voices are distinct from those offered for ChatGPT, and developers can’t supply third-party voices, in order to avoid copyright issues. (The voice that ambiguously resembles Scarlett Johansson’s isn’t available anywhere.)
During the briefing, OpenAI’s head of developer experience, Romain Huet, shared a demo of a trip-planning app built with the Realtime API. The app let users talk with an AI assistant about an upcoming trip to London and get low-latency answers back. The Realtime API also has access to a number of tools, so the app was able to annotate a map with restaurant locations as it responded.
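For a sense of how an app like the demo above wires this up, here is a minimal sketch of the `session.update` event a client would send over the Realtime API’s WebSocket connection, assuming the beta event shape OpenAI documented at launch; `annotate_map` is a hypothetical tool name standing in for whatever the demo used to pin restaurants on the map.

```python
import json

# Sketch of a Realtime API session configuration (beta event shape assumed).
# "alloy" is one of the six OpenAI-provided voices; the tool below is a
# hypothetical function echoing the travel demo's map annotations.
session_update = {
    "type": "session.update",
    "session": {
        "modalities": ["text", "audio"],
        "voice": "alloy",
        "instructions": "You are a travel assistant planning a trip to London.",
        "tools": [
            {
                "type": "function",
                "name": "annotate_map",  # hypothetical tool for the demo app
                "description": "Pin a restaurant location on the trip map.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "name": {"type": "string"},
                        "latitude": {"type": "number"},
                        "longitude": {"type": "number"},
                    },
                    "required": ["name", "latitude", "longitude"],
                },
            }
        ],
    },
}

# Serialized, this is what the client would send over the open WebSocket.
payload = json.dumps(session_update)
```

When the model decides to call the tool, the app receives a function-call event, runs its own map code, and streams the result back, which is how the demo updated the map mid-conversation.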
At another point, Huet showed how the Realtime API could talk to a human on the phone to ask about ordering food for an event. Unlike Google’s infamous Duplex, OpenAI’s API can’t call restaurants or stores directly; however, it can integrate with calling APIs like Twilio for this purpose. Notably, OpenAI is not adding disclosures so that its AI models automatically identify themselves on calls like this, despite the fact that these AI-generated voices sound quite realistic. For now, it appears to be developers’ responsibility to add that disclosure, something that may be required by a new California law.
As part of the DevDay announcements, OpenAI also introduced vision fine-tuning to its API, which lets developers use images, as well as text, to fine-tune their applications of GPT-4o. This should, in theory, help developers improve GPT-4o’s performance at tasks that involve visual understanding. OpenAI’s head of product for its API, Olivier Godement, tells TechCrunch that developers won’t be able to upload copyrighted images (such as a picture of Donald Duck), images that depict violence, or other images that violate OpenAI’s safety policies.
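To make that concrete, here is a sketch of a single training example in the JSONL chat format OpenAI uses for GPT-4o fine-tuning, with an image attached as a content part; the URL, prompt, and label here are placeholders, not anything from OpenAI’s announcement.

```python
import json

# One vision fine-tuning example (JSONL chat format assumed): a user turn
# mixes a text part with an image_url part, and the assistant turn supplies
# the desired answer. URL and labels below are placeholders.
example = {
    "messages": [
        {"role": "system", "content": "You identify street signs in photos."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this sign say?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/sign.jpg"},
                },
            ],
        },
        {"role": "assistant", "content": "STOP"},
    ]
}

# The uploaded .jsonl training file holds one such JSON object per line.
jsonl_line = json.dumps(example)
```

The content restrictions Godement describes would be enforced when a file of such examples is uploaded, before any fine-tuning job runs.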
OpenAI is racing to match what competitors already offer in the AI model licensing space. Its prompt caching feature is similar to one Anthropic launched several months ago, letting developers cache frequently used context between API calls, which reduces costs and improves latency. OpenAI says developers can save 50% with this feature, while Anthropic offers a 90% discount on cached context.
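The headline discount applies only to the repeated portion of a prompt, so per-request savings depend on how much of it is cached. A rough cost model makes the arithmetic visible; the per-token price below is a placeholder, not OpenAI’s actual rate.

```python
# Rough cost model for prompt caching: cached input tokens are billed at
# half price (per OpenAI's stated 50% saving). The price is a placeholder.
PRICE_PER_INPUT_TOKEN = 2.50 / 1_000_000  # hypothetical $/input token

def prompt_cost(prompt_tokens: int, cached_tokens: int) -> float:
    """Cost of one request when `cached_tokens` of the prompt hit the cache."""
    uncached = prompt_tokens - cached_tokens
    return (uncached * PRICE_PER_INPUT_TOKEN
            + cached_tokens * PRICE_PER_INPUT_TOKEN * 0.5)

# A 10,000-token prompt whose first 8,000 tokens repeat across calls:
full = prompt_cost(10_000, 0)
cached = prompt_cost(10_000, 8_000)
savings = 1 - cached / full  # 40% off this particular request
```

This is why caching rewards apps that keep a long, stable preamble (system prompt, few-shot examples) at the front of every request, with only the user-specific part varying at the end.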
Finally, OpenAI is introducing a model distillation feature that lets developers use larger AI models, such as o1-preview and GPT-4o, to fine-tune smaller models such as GPT-4o mini. Running smaller models generally costs less than running larger ones, and this feature should let developers improve the performance of those small AI models. As part of the model distillation launch, OpenAI is rolling out a beta evaluation tool so developers can measure their fine-tune’s performance within the OpenAI API.
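The distillation flow described above boils down to: collect the large model’s outputs as training data, then submit a fine-tuning job against the small model. A sketch of that job’s shape, assuming the fine-tuning API’s general request format; the file ID and suffix are placeholders.

```python
# Sketch of a distillation fine-tuning job: gpt-4o-mini (the "student")
# is fine-tuned on stored gpt-4o outputs uploaded as a JSONL training file.
# The file ID and suffix are placeholders, not real values.
distillation_job = {
    "model": "gpt-4o-mini-2024-07-18",   # small model being fine-tuned
    "training_file": "file-PLACEHOLDER", # stored gpt-4o completions, as JSONL
    "suffix": "distilled-from-gpt-4o",   # hypothetical custom model suffix
}
```

The new beta evaluation tool then slots in at the end: score the distilled model against the original large model on a held-out set to see how much quality the smaller, cheaper model retained.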
This year’s DevDay may make waves for what wasn’t announced: for example, there was no news about the GPT Store, which was unveiled during last year’s DevDay. Last we heard, OpenAI had been piloting a revenue-sharing program with some of the most popular GPT creators, but the company hasn’t said much since then.
Additionally, OpenAI says it won’t release any new AI models on DevDay this year. Developers waiting for OpenAI o1 (not the preview or mini version) or the startup’s video generation model, Sora, will have to wait a little longer.