Advanced Features
Realtime Voice Chat
If you are on the Standard plan or above, you can turn on the Realtime Voice Chat feature. With it, your chatbot can talk with users directly, much like a person-to-person conversation. Below is a demo of how it works on our website.
Set Up Realtime Voice Chat
The Realtime Voice Chat feature is not enabled by default due to its high cost. To activate it:
- Navigate to the Settings page and select the Realtime Audio tab.
- Configure the realtime voice avatar that will appear in the center of the chatbot widget.
- Set the default voice for initial user interactions. Users can change this later if desired.
- Preview voices by clicking the play button.
By default, the Enable Record Audio option is turned on, allowing you to play back the original voice interactions in your chat logs dashboard. If it is disabled, you will only see chat history as transcripts generated by OpenAI's Whisper model. Note that these transcripts may differ from what OpenAI's realtime API actually understood from the audio.
Recorded audio interactions are stored securely on our encrypted Azure blob server and count towards your file storage quota. The Standard plan includes 2.5GB of free storage, while the Professional plan offers 10GB.
Adjustable parameters for realtime voice chat include:
- Threshold: Voice activity detection sensitivity (lower values are more sensitive).
- Prefix Padding: Duration of audio to include before the point where speech is detected.
- Silence Duration: Silent period before the server considers speech ended.
- RAG Limit: Number of search results to include in context (fewer results mean less information and reduced input text tokens).
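To make the interaction between Threshold, Prefix Padding, and Silence Duration more concrete, here is a minimal toy sketch. It is not how the realtime API detects speech internally (that is model-based); it simply treats each audio frame as a loudness value between 0.0 and 1.0. The names `VadSettings` and `detect_turns`, the 100 ms frame size, and the default values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class VadSettings:
    # Illustrative defaults only; not necessarily the product's defaults.
    threshold: float = 0.5          # lower values are more sensitive
    prefix_padding_ms: int = 300    # audio kept before detected speech
    silence_duration_ms: int = 500  # silence that ends a turn


def detect_turns(levels, settings, frame_ms=100):
    """Return (start_ms, end_ms) spans, one per detected turn.

    levels: one loudness value per frame, each in the range 0.0 to 1.0.
    """
    turns = []
    start = None      # start of the current turn, including prefix padding
    voiced_end = 0    # end of the most recent voiced frame
    silence_ms = 0    # silence accumulated since the last voiced frame
    for i, level in enumerate(levels):
        t = i * frame_ms
        if level >= settings.threshold:
            if start is None:
                # Reach back by the prefix padding, but never into the
                # previous turn or before the start of the recording.
                earliest = turns[-1][1] if turns else 0
                start = max(earliest, t - settings.prefix_padding_ms)
            voiced_end = t + frame_ms
            silence_ms = 0
        elif start is not None:
            silence_ms += frame_ms
            if silence_ms >= settings.silence_duration_ms:
                # Enough silence: the turn is considered finished.
                turns.append((start, voiced_end))
                start = None
    if start is not None:
        turns.append((start, voiced_end))
    return turns


levels = [0.1, 0.1, 0.1, 0.1, 0.8, 0.9, 0.2, 0.7,
          0.1, 0.1, 0.1, 0.1, 0.1, 0.6]
print(detect_turns(levels, VadSettings()))                # [(100, 800), (1000, 1400)]
print(detect_turns(levels, VadSettings(threshold=0.85)))  # [(200, 600)]
```

In this toy run, raising the threshold makes detection less sensitive, so the quieter speech is dropped and only one short turn remains. A longer Silence Duration would instead merge nearby turns into one.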
Users can change the voice by clicking the adjust button in the top right corner, which opens the voice selection overlay.
You can customize the available languages in the settings.
Cost
OpenAI's realtime API is expensive, so we calculate message costs based on the tokens consumed. Unlike text-based interactions, each voice message uses a variable number of tokens. Below is the pricing table for the different kinds of tokens:
Model | Type | Input | Output |
---|---|---|---|
gpt-4-realtime-preview | Text (per 1k tokens) | 1 credit | 4 credits |
gpt-4-realtime-preview | Audio (per 1k tokens) | 8 credits | 16 credits |
gpt-4-realtime-min-preview | Text (per 1k tokens) | 0.15 credits | 0.5 credits |
gpt-4-realtime-min-preview | Audio (per 1k tokens) | 2 credits | 4 credits |
Unlike with text-based messages, the realtime voice model decides when to retrieve context information using the RAG search tool. By default, we send 10 RAG text chunks (about 3000 input text tokens) to the API. You can reduce this to a minimum of 5 chunks to lower costs.
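As a rough illustration of how the pricing table translates into credits, the sketch below multiplies token counts by the per-1k rates. The rates are copied from the table above. The function name `message_credits`, the shape of the `usage` argument, and the sample audio token counts are hypothetical. The 1500-token figure for 5 chunks assumes chunks are roughly equal in size.

```python
from decimal import Decimal

# Credits per 1k tokens, copied from the pricing table above.
RATES = {
    "gpt-4-realtime-preview": {
        "text": {"input": Decimal("1"), "output": Decimal("4")},
        "audio": {"input": Decimal("8"), "output": Decimal("16")},
    },
    "gpt-4-realtime-min-preview": {
        "text": {"input": Decimal("0.15"), "output": Decimal("0.5")},
        "audio": {"input": Decimal("2"), "output": Decimal("4")},
    },
}


def message_credits(model, usage):
    """Sum the credits for one message.

    usage maps (token type, direction) to a token count,
    for example {("audio", "input"): 200}.
    """
    total = Decimal("0")
    for (kind, direction), tokens in usage.items():
        total += RATES[model][kind][direction] * Decimal(tokens) / Decimal(1000)
    return total


# Cost of the RAG context alone: 10 chunks (about 3000 input text tokens)
# versus 5 chunks (assumed to be about 1500 tokens).
for tokens in (3000, 1500):
    for model in RATES:
        cost = message_credits(model, {("text", "input"): tokens})
        print(f"{model}: {tokens} context tokens = {cost} credits")

# A hypothetical short voice exchange with no context retrieval.
print(message_credits(
    "gpt-4-realtime-min-preview",
    {("audio", "input"): 200, ("audio", "output"): 150},
))
```

Decimal arithmetic is used here so that fractional rates such as 0.15 do not pick up floating-point rounding noise.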
Based on our testing, the gpt-4-realtime-min-preview model proves more cost-effective than the gpt-4o-mini model, averaging approximately 1 credit per message. This efficiency stems from how the realtime voice model uses the RAG search tool: it triggers context searches only when the conversation needs them, whereas text-based models such as gpt-4o-mini perform a RAG search for every query.
Typically, messages without context retrieval cost around 10 message credits, while those with context cost 30-40 credits.
You can view the message credit cost for each interaction by hovering over the question mark icon in the Chat Logs dashboard.
Lead Information Collection
Because users talk to the chatbot directly rather than filling in a form, the system automatically records any name, phone, and email information mentioned during the conversation. This data is sent to your leads dashboard, webhook, etc.
You can modify your chatbot's base prompt so that it collects basic user information at the start of the conversation, before providing further service.
The collected information functions similarly to data submitted through a lead collection form. We will send postMessage and webhook events, and the chatbot will retain awareness of this information in future conversations, even if the user refreshes the page.
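Since lead details can surface at different points in a voice conversation, a webhook consumer may want to merge them as events arrive. The sketch below shows one way to do that. The event payload shape (a flat dictionary with `name`, `phone`, and `email` keys) is purely hypothetical, as this section does not specify the webhook schema. `Lead` and `merge_lead` are illustrative names.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Lead:
    name: Optional[str] = None
    phone: Optional[str] = None
    email: Optional[str] = None


def merge_lead(current: Lead, event: dict) -> Lead:
    """Return a new Lead with any non-empty fields from the event applied.

    The event keys used here are assumed, not taken from a documented schema.
    """
    updated = asdict(current)
    for field in ("name", "phone", "email"):
        value = event.get(field)
        # Ignore missing, non-string, or blank values so earlier data survives.
        if isinstance(value, str) and value.strip():
            updated[field] = value.strip()
    return Lead(**updated)


lead = Lead()
lead = merge_lead(lead, {"name": "Ada"})
lead = merge_lead(lead, {"email": " ada@example.com ", "phone": ""})
print(lead)  # Lead(name='Ada', phone=None, email='ada@example.com')
```

Returning a new `Lead` rather than mutating the existing one keeps each webhook delivery easy to log and replay.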