AI providers

The LLMAI Plugin uses the Realtime API specification originally published by OpenAI and since implemented by other providers, such as Grok. It also works with our own implementation running on LocalAI, so you can run your own LLM completely isolated from cloud providers.

Provider comparison

| Feature | OpenAI | LocalAI | Grok |
| --- | --- | --- | --- |
| Voice Options | 6+ voices | Custom (Chatterbox) | 5 voices (Ara, Rex, Sal, Eve, Leo) |
| Change Voice Mid-Session | Yes | Yes | No - Must reconnect |
| VAD Type Support | Server VAD, Semantic VAD | Server VAD only | Server VAD only |
| Text-Only Mode | Supported | Supported | Client-side filtered |
| Input Transcription Streaming | Delta events | Delta events | Completed only |
| Function Calling | Supported | Supported | Supported |
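
The comparison above can be encoded as data so that client code branches on capabilities rather than on provider names. The following is a minimal sketch, not part of the plugin: the `ProviderCapabilities` class and both helper functions are hypothetical names introduced for illustration, and the values are copied from the table. The strings `server_vad` and `semantic_vad` are the turn-detection type names used by the OpenAI Realtime specification.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ProviderCapabilities:
    """Hypothetical record mirroring rows of the provider comparison table."""
    voice_change_mid_session: bool
    vad_types: Tuple[str, ...]
    native_text_only: bool      # False: audio must be filtered client-side
    transcription_deltas: bool  # False: only completed transcripts arrive


# Values transcribed from the comparison table; keys are illustrative.
CAPABILITIES = {
    "openai": ProviderCapabilities(True, ("server_vad", "semantic_vad"), True, True),
    "localai": ProviderCapabilities(True, ("server_vad",), True, True),
    "grok": ProviderCapabilities(False, ("server_vad",), False, False),
}


def must_reconnect_to_change_voice(provider: str) -> bool:
    """Return True when the provider cannot change voice mid-session."""
    return not CAPABILITIES[provider.lower()].voice_change_mid_session


def pick_vad(provider: str, preferred: str = "semantic_vad") -> str:
    """Use the preferred VAD type if supported, else fall back to server VAD."""
    supported = CAPABILITIES[provider.lower()].vad_types
    return preferred if preferred in supported else "server_vad"
```

With this in place, a call such as `pick_vad("grok")` falls back to `server_vad`, while `must_reconnect_to_change_voice("grok")` signals that a voice change needs a new connection.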

OpenAI

OpenAI is the original provider of the Realtime endpoint and the plugin's initial target for specification compatibility. As of August 2025, the endpoint is GA (non-preview). LLMAI v2.2 is fully compatible with the GA endpoint and defaults to the gpt-realtime-2 model.

OpenAI dropped support for preview models as of May 2026, so older versions of the plugin can no longer connect. Update to at least v2.2 to connect to the OpenAI endpoint.

See Provider functions and the provider comparison table above.
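
Because input transcription arrives as delta events on some providers and only as a completed transcript on others, client code that displays what the user said needs to handle both shapes. The sketch below shows one way to do that. The `TranscriptAccumulator` class is a hypothetical helper, not a plugin API; the event type names and the `item_id`, `delta`, and `transcript` fields follow the OpenAI Realtime event schema, and it is an assumption that other providers emit them identically.

```python
from collections import defaultdict
from typing import Optional

# Event type names from the OpenAI Realtime specification.
DELTA = "conversation.item.input_audio_transcription.delta"
COMPLETED = "conversation.item.input_audio_transcription.completed"


class TranscriptAccumulator:
    """Hypothetical helper that tracks user transcripts per conversation item."""

    def __init__(self) -> None:
        self._partial = defaultdict(str)  # item_id -> text streamed so far
        self.final = {}                   # item_id -> completed transcript

    def handle(self, event: dict) -> Optional[str]:
        """Return the best-known transcript for the item, or None if unrelated."""
        event_type = event.get("type")
        item_id = event.get("item_id")
        if event_type == DELTA:
            # Streaming providers: append each fragment as it arrives.
            self._partial[item_id] += event.get("delta", "")
            return self._partial[item_id]
        if event_type == COMPLETED:
            # All providers: the completed transcript replaces any partial text.
            self._partial.pop(item_id, None)
            self.final[item_id] = event.get("transcript", "")
            return self.final[item_id]
        return None
```

A provider that only sends completed events simply never populates the partial buffer, so the same handler covers both cases.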

Grok (xAI)

Grok supplies its own Realtime API endpoint and enables web search capability. xAI's Server-Side Tools (Agent Tools API) and the xAI responses provider enable autonomous server-side tool execution for web search, X search, code execution, collections search, and remote MCP tools.

Through remote MCP tools, you can connect to external MCP servers and give the model access to custom third-party tools.

How Grok differs from OpenAI

The xAI API supports both server-side and client-side tool calling. Server-side tools include web search, X search (real-time data from the X platform), code execution, and file search. X search is the distinctive capability: other stacks can do web search via MCP or custom tools, but Grok's integration with X gives it live access to X posts, discussions, and developer conversations.

Key difference from OpenAI Realtime

Grok's server-side MCP support is part of its text/agent API, not its voice API. The Grok Voice Agent API follows the OpenAI Realtime protocol (WebSocket connection at wss://api.x.ai/v1/realtime, same event schema), but no public documentation confirms server-side MCP handling in that voice layer. For Realtime voice sessions, server-side MCP therefore appears to be an OpenAI-only feature for now.

You must set the voice when connecting. See Provider functions and the provider comparison table above.
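
To illustrate what setting the voice at connection time might look like at the protocol level, the sketch below builds a `session.update` message for the Grok endpoint. It is not how the plugin does it internally. The function name is hypothetical, and the placement of `voice` and `turn_detection` directly under `session` is an assumption based on the OpenAI Realtime event schema; check xAI's documentation for the exact fields. The voice names and URL are taken from this page.

```python
import json

# Endpoint and voice names as listed on this page.
GROK_REALTIME_URL = "wss://api.x.ai/v1/realtime"
GROK_VOICES = ("Ara", "Rex", "Sal", "Eve", "Leo")


def build_grok_session_update(voice: str, instructions: str = "") -> str:
    """Build a session.update message that fixes the voice up front.

    Assumption: Grok accepts `voice` and `turn_detection` under `session`,
    as in the OpenAI Realtime schema it follows.
    """
    if voice not in GROK_VOICES:
        raise ValueError(f"unknown Grok voice {voice!r}; expected one of {GROK_VOICES}")
    session = {
        "voice": voice,
        # Grok supports server VAD only, per the comparison table.
        "turn_detection": {"type": "server_vad"},
    }
    if instructions:
        session["instructions"] = instructions
    return json.dumps({"type": "session.update", "session": session})
```

The resulting string would be sent as the first message after the WebSocket to `GROK_REALTIME_URL` opens. Since Grok cannot change voice mid-session, choosing a different voice later means closing the connection and sending a new message on a fresh one.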

LocalAI

The LLMAI plugin supports our own implementation of the Realtime API running on LocalAI, which is compatible with the OpenAI Realtime specification. This lets you run an AI securely on your own PC or network infrastructure. It supports voice streaming, VAD, STT (via Whisper), TTS (via Chatterbox), and client-side function calling.

It currently does not support server-side function calling or MCP.
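
With client-side function calling, the model only requests a call; the client runs the function and sends the result back. The sketch below shows that round trip at the event level, assuming the LocalAI endpoint emits the same events as the OpenAI Realtime specification (`response.output_item.done` carrying a `function_call` item, answered with a `function_call_output` item and a `response.create`). The `TOOLS` registry, the `add` tool, and `handle_function_call` are hypothetical names for illustration, not plugin APIs.

```python
import json

# Hypothetical tool registry: tool name -> Python callable returning a JSON-able result.
TOOLS = {"add": lambda a, b: {"sum": a + b}}


def handle_function_call(event: dict, tools: dict) -> list:
    """Turn a completed function_call item into the events the client sends back.

    Returns an empty list for events that are not function calls.
    """
    item = event.get("item") or {}
    if event.get("type") != "response.output_item.done" or item.get("type") != "function_call":
        return []
    try:
        # Arguments arrive as a JSON-encoded string.
        args = json.loads(item.get("arguments") or "{}")
        result = tools[item.get("name")](**args)
    except Exception as exc:
        # Report failures to the model instead of dropping the call.
        result = {"error": f"{type(exc).__name__}: {exc}"}
    return [
        {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": item.get("call_id"),
                "output": json.dumps(result),
            },
        },
        # Ask the model to continue now that the tool result is available.
        {"type": "response.create"},
    ]
```

Each returned dictionary would be serialized and sent over the same WebSocket. Returning errors as tool output is a design choice that lets the model explain the failure to the user rather than stalling.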

You can download the Docker image and use it locally. See the LocalAI Realtime Endpoint document for more information and setup.