The Realtime Assistants API is currently in beta. Features and specifications may change as we continue to improve the platform.
What are Realtime Assistants?
Realtime Assistants are AI-powered voice agents that hold natural, real-time conversations with users. The key benefit is that you can build a voice assistant quickly by combining UpliftAI models, agent hosting, and WebRTC delivery in your web or mobile app. They provide:
- End-to-end latency of about 1 second for natural conversations (varies with factors such as model choice)
- Natural conversation flow with interruption handling
- Multi-modal capabilities supporting voice, text, and custom tools
- Dynamic configuration for real-time behavior updates:
  - Update the tools available to the agent mid-session
  - Completely replace the agent prompt mid-session
- Scalable infrastructure supporting thousands of concurrent sessions
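The mid-session updates listed above can be pictured as replacing fields on a session's configuration while the session stays open. The sketch below is a minimal local model of that idea, not the UpliftAI SDK: the `SessionConfig` and `SessionUpdate` types and the `applyUpdate` function are hypothetical names chosen for illustration, and the real API may look different.

```typescript
// Hypothetical shapes for illustration only; the real API may differ.
interface ToolSpec {
  name: string;
  description: string;
}

interface SessionConfig {
  instructions: string; // the agent prompt
  tools: ToolSpec[];    // tools the agent may call
}

// An update may replace the prompt, the tool list, or both.
type SessionUpdate = Partial<SessionConfig>;

// Returns a new config; fields missing from the update are kept unchanged.
function applyUpdate(current: SessionConfig, update: SessionUpdate): SessionConfig {
  return {
    instructions: update.instructions ?? current.instructions,
    tools: update.tools ?? current.tools,
  };
}

const initial: SessionConfig = {
  instructions: "You are a helpful receptionist.",
  tools: [{ name: "lookup_appointment", description: "Find an appointment by phone number" }],
};

// Swap the prompt mid-session while keeping the existing tools.
const updated = applyUpdate(initial, { instructions: "You are now handling billing questions." });
console.log(updated.instructions, updated.tools.map((t) => t.name));
```

Treating an update as a partial replacement keeps the two cases from the list (new tools, new prompt) independent of each other: changing one leaves the other as it was.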
Key Features
Voice-First Design
Natural speech recognition and synthesis with support for multiple languages and voices
Custom Tools
Extend your assistant with custom functions that can access external APIs and services
Real-time Updates
Update instructions and tools on the fly without restarting sessions
Easy Integration
Simple SDKs for React, JavaScript, and mobile platforms
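To make the Custom Tools feature above more concrete, here is a rough sketch of what a tool and its dispatch might look like. Everything in it is an assumption for illustration: the `CustomTool` interface, the `get_order_status` tool, and the `dispatchToolCall` function are hypothetical and are not taken from the UpliftAI SDK.

```typescript
// Hypothetical tool shape for illustration; not the UpliftAI SDK.
interface CustomTool {
  name: string;
  description: string;
  // Receives the arguments supplied for the call and returns text for the agent to use.
  handler: (args: Record<string, string>) => Promise<string>;
}

const orderStatusTool: CustomTool = {
  name: "get_order_status",
  description: "Look up the status of an order by its ID",
  handler: async (args) => {
    // A real handler would call an external API here; this stub returns a fixed reply.
    return `Order ${args.orderId} is out for delivery.`;
  },
};

// Route a tool call to the handler with the matching name.
async function dispatchToolCall(
  tools: CustomTool[],
  name: string,
  args: Record<string, string>
): Promise<string> {
  const tool = tools.find((t) => t.name === name);
  if (!tool) {
    throw new Error(`Unknown tool: ${name}`);
  }
  return tool.handler(args);
}
```

A call such as `dispatchToolCall([orderStatusTool], "get_order_status", { orderId: "A17" })` resolves to the handler's reply, while an unregistered name rejects with an error.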
Use Cases
Realtime Assistants are well suited to:
- Customer Support - 24/7 voice-enabled support agents
- Virtual Receptionists - Automated call handling and routing
- Educational Tutors - Interactive learning experiences
- Healthcare Assistants - Patient intake and appointment scheduling
- Sales Agents - Product demonstrations and lead qualification
- Personal Assistants - Task management and information retrieval
Architecture Overview
Provider Support
Speech-to-Text (STT)
- Groq Whisper (`whisper-large-v3`, recommended for Pakistani languages)
- Deepgram (`nova-3`, recommended for English)
- OpenAI: `gpt-4o-transcribe` or `gpt-4o-mini-transcribe`
- UpliftAI: the best Pakistani STT, coming soon!
Text-to-Speech (TTS)
- UpliftAI Orator (ultra-fast, natural voices - supports Urdu, Sindhi, Balochi)
- See available voices
- OpenAI (standard voices): use model `gpt-4o-mini-tts`
Language Models (LLM)
- Groq (recommended)
  - `openai/gpt-oss-120b` (best quality)
  - `openai/gpt-oss-20b` (faster responses)
- OpenAI GPT-4 (alternative)
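The provider recommendations above can be summarized as a small selection function. This is a sketch under stated assumptions: the `AssistantProviders` shape, the field names, and the `recommendedProviders` function are hypothetical, and the `"orator"` model identifier is a placeholder because the lists name the product but not a model ID. Only the STT and LLM model names come from the lists; pairing English with the OpenAI TTS model is an illustrative choice, not a stated recommendation.

```typescript
// Hypothetical config shape; field names are assumptions, not the documented API.
type PrimaryLanguage = "english" | "pakistani";

interface ProviderChoice {
  provider: string;
  model: string;
}

interface AssistantProviders {
  stt: ProviderChoice;
  tts: ProviderChoice;
  llm: ProviderChoice;
}

// Pick providers following the recommendations in the lists above.
function recommendedProviders(language: PrimaryLanguage, preferSpeed = false): AssistantProviders {
  return {
    stt: language === "english"
      ? { provider: "deepgram", model: "nova-3" }
      : { provider: "groq", model: "whisper-large-v3" },
    // "orator" is a placeholder identifier; the English pairing is an assumption.
    tts: language === "english"
      ? { provider: "openai", model: "gpt-4o-mini-tts" }
      : { provider: "upliftai", model: "orator" },
    llm: {
      provider: "groq",
      model: preferSpeed ? "openai/gpt-oss-20b" : "openai/gpt-oss-120b",
    },
  };
}

console.log(recommendedProviders("pakistani"));
```

Passing `true` as the second argument trades quality for speed by selecting the smaller Groq-hosted model.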
Getting Started
Next Steps
Read the Concepts
Understand how Realtime Assistants work under the hood
Try the Tutorial
Build your first voice assistant in 10 minutes
Explore the API
Deep dive into the API endpoints
View Examples
Check out our example implementations
