Google has launched Gemini 3.1 Flash Live, a new audio model that delivers more natural and reliable voice conversations. This model is designed to improve the real-time user experience.
AI explained
What improvements does Gemini 3.1 Flash Live bring to AI voice conversations?
Gemini 3.1 Flash Live is Google's latest audio model designed for real-time voice interactions. It offers higher accuracy, lower latency, and better tonal understanding, and it is built to handle complex multi-step tasks in conversations.
- Summary: The model improves naturalness and reliability in voice dialogues and is accessible via Google AI Studio and Gemini Enterprise.
- Why it matters: It enables developers to build more efficient voice agents that can manage complex tasks and adapt to user emotions.
- Key point: Gemini 3.1 Flash Live is available in over 200 countries and supports multiple languages, which broadens its use in customer service applications.

Gemini 3.1 Flash Live Improves Real-Time Dialogue Quality
The new model, Gemini 3.1 Flash Live, is Google’s most advanced audio model to date. It offers higher accuracy and lower latency, making voice interactions smoother and more natural. Developers can access the model through the Gemini Live API in Google AI Studio, while businesses can use it in Gemini Enterprise for customer service. The model is now available to users in over 200 countries via Search Live and Gemini Live.
Gemini 3.1 Flash Live is specifically designed to handle complex tasks and gives developers the ability to build voice agents that can carry out multi-step tasks. It scores 90.8% on ComplexFuncBench Audio, which tests multi-step function calling under various conditions. The model also has improved tonal understanding, making it more effective at recognizing speech nuances such as pitch and tempo. This allows it to adapt to user emotions like frustration or confusion, providing a more natural conversational experience.
Implications for the U.S. Market
AIny brief assessment: Gemini 3.1 Flash Live enables U.S. developers to create more efficient voice agents for complex tasks, potentially enhancing customer service experiences across industries like telecommunications and retail. With support for multiple languages, the model can also benefit diverse, multilingual user bases in the American market.
Source: Google


