Realtime Voice Chat - Enhance User Experience with Real-Time Communication

If you are on the Standard plan or higher, you can turn on the Realtime Voice Chat feature. It lets users talk with your chatbot directly, much like a person-to-person conversation. Below is a demo of how it works on our website.

Set Up Realtime Voice Chat

The Realtime Voice Chat feature is not enabled by default due to its high cost. To activate it:

Navigate to the Settings page and select the Realtime Audio tab.
Configure the realtime voice avatar that will appear in the center of the chatbot widget.
Set the default voice for initial user interactions. Users can change this later if desired.
Preview voices by clicking the play button.

Realtime Voice Chat Settings

By default, the Enable Record Audio option is turned on, so you can play back the original voice interactions in your Chat Logs dashboard. If it is disabled, you'll only see the chat history as transcripts generated by OpenAI's Whisper model. These transcripts may not exactly match what OpenAI's realtime API understood, because the realtime model processes the audio directly while the transcripts are produced separately by Whisper.

Recorded audio is stored securely in our encrypted Azure Blob Storage and counts toward your file storage quota. The Standard plan includes 2.5 GB of free storage, and the Professional plan includes 10 GB.
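For a rough sense of scale, the sketch below estimates how many hours of recordings fit into each plan's quota. It assumes recordings are kept as uncompressed 16-bit mono PCM at 24 kHz (the "pcm16" format of OpenAI's Realtime API) and that 1 GB = 10^9 bytes. The actual stored format is not stated on this page; a compressed format would fit far more audio.

```python
# Rough estimate of how many hours of recorded audio fit in a storage quota.
# ASSUMPTION: recordings are stored as uncompressed 16-bit mono PCM at 24 kHz.
# The real storage format is not documented here.

SAMPLE_RATE_HZ = 24_000  # samples per second (assumed)
BYTES_PER_SAMPLE = 2     # 16-bit samples (assumed)
CHANNELS = 1             # mono (assumed)


def hours_of_audio(quota_gb: float) -> float:
    """Return how many hours of audio fit in `quota_gb` (1 GB = 10**9 bytes)."""
    bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS  # 48,000 B/s
    seconds = quota_gb * 10**9 / bytes_per_second
    return seconds / 3600


for plan, quota in [("Standard", 2.5), ("Professional", 10)]:
    print(f"{plan}: ~{hours_of_audio(quota):.1f} hours")
# prints "Standard: ~14.5 hours" and "Professional: ~57.9 hours"
```

Because the quota is shared with your other files, treat these figures as an upper bound under the stated assumption.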

Adjustable parameters for realtime voice chat include:

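Threshold: Voice activity detection sensitivity. Lower values are more sensitive, so quieter sounds are treated as speech.
Prefix Padding: Duration of audio captured just before detected speech that is included with the user's turn, so the first syllables are not cut off.
Silence Duration: Length of silence after which the server considers the user's turn finished. Shorter values make the chatbot respond faster but may cut in during brief pauses.
RAG Limit: Number of search results to include in the context. Fewer results give the model less information but use fewer input text tokens.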
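The first three settings correspond to the server-side voice activity detection options of OpenAI's Realtime API (`threshold`, `prefix_padding_ms`, `silence_duration_ms` under `turn_detection` in a `session.update` event, as used with the preview models). The sketch below shows one possible mapping; the helper name, the example values, and the way this product passes its dashboard settings to the API are assumptions, and newer API versions may nest these fields differently.

```python
import json

# Hypothetical helper: maps the dashboard settings above onto an OpenAI
# Realtime API "session.update" event (preview/beta event shape).
# turn_detection and its fields are real Realtime API fields; how this product
# wires its dashboard to them is an assumption. RAG Limit is not an OpenAI
# setting, so it is returned separately for the product's own search step.


def build_session_update(threshold: float = 0.5,
                         prefix_padding_ms: int = 300,
                         silence_duration_ms: int = 500,
                         rag_limit: int = 10) -> tuple[dict, int]:
    """Return (session.update event, RAG limit) for the given settings."""
    if not 0.0 <= threshold <= 1.0:
        # The Realtime API documents the VAD threshold as a value from 0.0 to 1.0.
        raise ValueError("threshold must be between 0.0 and 1.0")
    if rag_limit < 5:
        # The Cost section gives 5 chunks as the minimum.
        raise ValueError("rag_limit must be at least 5")
    event = {
        "type": "session.update",
        "session": {
            "turn_detection": {
                "type": "server_vad",
                "threshold": threshold,                      # lower = more sensitive
                "prefix_padding_ms": prefix_padding_ms,      # audio kept before speech starts
                "silence_duration_ms": silence_duration_ms,  # silence that ends a turn
            }
        },
    }
    return event, rag_limit


event, rag_limit = build_session_update(threshold=0.6, silence_duration_ms=700)
print(json.dumps(event, indent=2))
```

Raising `silence_duration_ms` gives users more room to pause mid-sentence, at the cost of slightly slower replies.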

Users can change the voice by clicking the adjust button in the top-right corner, which opens the voice selection overlay.
Realtime Voice Selection Overlay

You can customize the available languages in the settings.

Customize Voice

By default, the OpenAI realtime voice model offers 10 voices, and you can choose which one is the default. If you want to use a unique voice of your own, you can import a custom voice from ElevenLabs and use it directly in realtime voice chat by following these steps:

Follow the steps in Customize Voice to set up your customized voice in the chatbot.
Choose the Use customize voice option in the realtime voice chat settings.

Cost

OpenAI's realtime API is expensive, so we calculate the cost of each voice message from the tokens it consumes. Unlike a text message, a voice message uses a variable number of tokens. The pricing for each token type is shown below:

Model	Token type	Input (credits per 1K tokens)	Output (credits per 1K tokens)
gpt-4-realtime-preview	Text	1	4
gpt-4-realtime-preview	Audio	8	16
gpt-4-realtime-min-preview	Text	0.15	0.5
gpt-4-realtime-min-preview	Audio	2	4
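To see how the table turns into a per-message cost, the sketch below multiplies each token count by its rate. The rates are copied from the pricing table above; the token counts in the example are made up for illustration. In OpenAI's Realtime API, per-response token counts (split into text and audio) are reported in the `usage` field of the `response.done` event, but the dashboard's own accounting, such as rounding, may differ from this sketch.

```python
# Credits per 1,000 tokens, copied from the pricing table above.
RATES = {
    "gpt-4-realtime-preview": {
        "text": {"input": 1.0, "output": 4.0},
        "audio": {"input": 8.0, "output": 16.0},
    },
    "gpt-4-realtime-min-preview": {
        "text": {"input": 0.15, "output": 0.5},
        "audio": {"input": 2.0, "output": 4.0},
    },
}


def message_credits(model: str, text_in: int = 0, text_out: int = 0,
                    audio_in: int = 0, audio_out: int = 0) -> float:
    """Estimate the credits for one message from its token counts."""
    r = RATES[model]
    return (text_in * r["text"]["input"]
            + text_out * r["text"]["output"]
            + audio_in * r["audio"]["input"]
            + audio_out * r["audio"]["output"]) / 1000


# Hypothetical message: 3,000 RAG text tokens in, 500 audio tokens in,
# 200 text tokens and 400 audio tokens out.
for model in RATES:
    cost = message_credits(model, text_in=3000, text_out=200, audio_in=500, audio_out=400)
    print(f"{model}: {cost:.2f} credits")
# prints 14.20 credits for gpt-4-realtime-preview and 3.15 for gpt-4-realtime-min-preview
```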

Unlike text-based messages, where context is retrieved for every query, the realtime voice model decides for itself when to retrieve context using the RAG search tool. By default, each retrieval sends 10 RAG text chunks (about 3,000 input text tokens) to the API. You can reduce this to a minimum of 5 chunks with the RAG Limit setting to lower costs.
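A quick worked estimate of what lowering the chunk count saves on each retrieval, assuming the chunks are roughly equal in size and using the text input rates from the pricing table. Here $r_{\text{text,in}}$ is the text input rate in credits per 1K tokens; the estimate covers only the retrieved chunks, not the rest of the message.

```latex
\begin{aligned}
\text{tokens per chunk} &\approx \frac{3000}{10} = 300 \\
\text{tokens saved per retrieval } (10 \to 5 \text{ chunks}) &\approx 5 \times 300 = 1500 \\
\text{credits saved per retrieval} &\approx \frac{1500}{1000} \times r_{\text{text,in}} =
\begin{cases}
1.5 \times 1 = 1.5 & \text{gpt-4-realtime-preview} \\
1.5 \times 0.15 = 0.225 & \text{gpt-4-realtime-min-preview}
\end{cases}
\end{aligned}
```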

In our testing, the gpt-4-realtime-min-preview model was more cost-effective than the text-based gpt-4o-mini model, averaging approximately 1 credit per message. The savings come from how the realtime voice model uses the RAG search tool: it triggers a context search only when the conversation needs one, whereas text-based models such as gpt-4o-mini run a RAG search for every query.
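The "search only when needed" behavior comes from function calling: the search is declared as a tool, and the model chooses whether to call it. The sketch below uses real OpenAI Realtime API event names (`session.update`, `response.output_item.done`, `conversation.item.create`, `response.create`), but the tool name `search_knowledge_base`, its schema, and the `search_docs()` placeholder are hypothetical, and this is not necessarily how the product implements its RAG tool.

```python
import json

# Declare a search tool; with tool_choice "auto" the model decides when to call it.
SEARCH_TOOL = {
    "type": "function",
    "name": "search_knowledge_base",  # hypothetical tool name
    "description": "Search the chatbot's knowledge base for facts needed to answer.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

SESSION_UPDATE = {
    "type": "session.update",
    "session": {"tools": [SEARCH_TOOL], "tool_choice": "auto"},
}


def search_docs(query: str, limit: int) -> list[str]:
    """Placeholder for the real RAG search; returns up to `limit` text chunks."""
    return [f"chunk {i} for {query!r}" for i in range(limit)]


def handle_event(event: dict, rag_limit: int = 10) -> list[dict]:
    """Return the client events to send back in response to one server event."""
    if event.get("type") != "response.output_item.done":
        return []
    item = event.get("item", {})
    if item.get("type") != "function_call" or item.get("name") != SEARCH_TOOL["name"]:
        return []  # no search requested: the model answers without retrieval
    query = json.loads(item["arguments"])["query"]
    chunks = search_docs(query, rag_limit)  # only now are RAG tokens spent
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": item["call_id"],
                "output": json.dumps({"results": chunks}),
            },
        },
        {"type": "response.create"},  # ask the model to continue with the results
    ]
```

Turns where the model never emits a `function_call` item add no RAG tokens at all, which is why the average cost per message stays low.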

Typically, messages without context retrieval cost around 10 message credits, while those with context cost 30-40 credits.

You can view the message credit cost for each interaction by hovering over the question mark icon in the Chat Logs dashboard.

Chat Logs Dashboard View Tokens And Cost

Lead Information Collection

Because users talk to the chatbot directly rather than filling out a form, the system automatically records any name, phone number, or email address mentioned during the conversation. This data is sent to your leads dashboard, your webhook, and other lead destinations.
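To illustrate the idea of pulling contact details out of a conversation, here is a deliberately naive regex pass over a transcript. This is a swapped-in illustration, not the product's documented method: names in particular are hard to catch with regular expressions and would more likely require the language model itself, and the patterns below can produce false positives (for example, long dates).

```python
import re

# Illustrative only: a naive regex-based extractor for emails and phone numbers.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# An optional "+", a digit, digits/spaces/dots/dashes/parentheses, then a digit.
# The real digit count is checked in extract_contacts().
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{5,}\d")


def extract_contacts(transcript: str) -> dict[str, list[str]]:
    """Return the unique emails and phone numbers (digits only) in a transcript."""
    emails = list(dict.fromkeys(EMAIL_RE.findall(transcript)))  # dedupe, keep order
    phones: list[str] = []
    for match in PHONE_RE.findall(transcript):
        digits = re.sub(r"\D", "", match)
        if len(digits) >= 7 and digits not in phones:
            phones.append(digits)
    return {"emails": emails, "phones": phones}


print(extract_contacts("Email me at jane.doe@example.com or call +1 (555) 123-4567."))
```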

You can also modify your chatbot's base prompt so that it asks for the user's basic information at the start of the conversation, before providing further help.

The collected information functions similarly to data submitted through a lead collection form. We will send postMessage and webhook events, and the chatbot will retain awareness of this information in future conversations, even if the user refreshes the page.