LocalAI Realtime Endpoint
This is an advanced topic. It assumes working knowledge of LocalAI, Docker, and how to integrate both with third-party systems; explaining those in detail is beyond the scope of this documentation. Attempt this setup only if you are already comfortable with them.
This is a custom-engineered, OpenAI Realtime-compatible endpoint that runs on LocalAI. The overview below is for those who want to run the distribution locally.
Voice-to-Voice AI with TTS, STT, and Function Calling. Run locally with no API key required.
Download LocalAI Distribution
Get the pre-configured LocalAI Docker image with all required models and backends (Lite version - optimized for size and performance).
File: localai-realtime-lite_v3.9.0_Docker_RS_20260116.tar
Requirements
- NVIDIA GPU with 20GB+ VRAM (RTX 3090, RTX 4090, or equivalent required for realtime voice + LLM)
- Docker Desktop installed with GPU support enabled
- NVIDIA Container Toolkit installed
- At least 32GB system RAM and 30GB free disk space
- Windows 10/11 64-bit (or Linux with NVIDIA drivers)
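The GPU and Docker items above can be checked from a shell before downloading anything. The following is a minimal sketch, not an official installer check. It assumes a POSIX shell (Linux, WSL, or Git Bash), reads the 20GB VRAM figure as roughly 20000 MiB, and uses hypothetical function names.

```shell
# Preflight sketch for the requirements list. Function names are illustrative.
MIN_VRAM_MIB=20000   # assumption: "20GB+" read as about 20000 MiB

# Succeeds when integer $1 is at least integer $2.
at_least() {
  [ "$1" -ge "$2" ]
}

# Checks that the first GPU reported by nvidia-smi has enough memory.
check_gpu() {
  command -v nvidia-smi >/dev/null 2>&1 || { echo "nvidia-smi not found"; return 1; }
  # Total memory of the first GPU in MiB, without header or units.
  vram=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)
  if at_least "$vram" "$MIN_VRAM_MIB"; then
    echo "GPU OK: ${vram} MiB"
  else
    echo "GPU has ${vram} MiB, need ${MIN_VRAM_MIB}+"
    return 1
  fi
}

# Checks that the Docker CLI can reach a running daemon.
check_docker() {
  docker info >/dev/null 2>&1 || { echo "Docker daemon not reachable"; return 1; }
  echo "Docker OK"
}
```

Run check_gpu && check_docker to use it. To confirm that containers can see the GPU, the usual NVIDIA Container Toolkit smoke test is docker run --rm --gpus all ubuntu nvidia-smi, which pulls a small Ubuntu image.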
Load the Docker Image
After downloading the .tar file, load it into Docker:
# Load the image (this may take a few minutes)
docker load -i localai-realtime-lite_v3.9.0_Docker_RS_20260116.tar
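To confirm the load succeeded and to see which tag the archive registered, the local image list can be filtered. This is a small sketch: the helper names are hypothetical, and the tag localai-realtime:full is taken from the run command used in this guide.

```shell
# Reads "repository:tag" lines on stdin and succeeds if $1 matches one exactly.
has_image() {
  grep -Fxq -- "$1"
}

# Lists local images as repository:tag, one per line.
list_images() {
  docker images --format '{{.Repository}}:{{.Tag}}'
}
```

For example, list_images | has_image localai-realtime:full && echo "image present". If that reports nothing, running list_images on its own shows which tags are actually available.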
Run the Container
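Start LocalAI with GPU support and persistent storage. The trailing backslashes are shell line continuations; in Windows PowerShell use a backtick (`) instead, or enter the command on a single line: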
docker run --gpus all -p 8080:8080 \
-v localai-models:/models \
-v localai-backends:/backends \
localai-realtime:full
First run takes 10-20 minutes. The system will automatically download required AI models (~10GB) and backends. Subsequent starts are faster because models are cached in Docker volumes.
Access the API
Once running, the realtime WebSocket endpoint is available at ws://localhost:8080/v1/realtime.
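Because the first start can take a while, it can help to wait for the HTTP side of the server before pointing a client at the WebSocket URL. The sketch below polls with curl. It assumes LocalAI's /readyz health route is available in this distribution; if it is not, substitute any URL the server answers, such as http://localhost:8080. The function name is hypothetical.

```shell
# Polls a URL until it answers with an HTTP success status or attempts run out.
# $1 = URL, $2 = maximum attempts, $3 = seconds to sleep between attempts
wait_for_http() {
  url=$1
  attempts=$2
  delay=$3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f turns HTTP errors into a non-zero exit; --max-time bounds each try.
    if curl -fsS --max-time 5 -o /dev/null "$url" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, wait_for_http http://localhost:8080/readyz 120 10 && echo "LocalAI is up" keeps trying for roughly 20 minutes, which matches the upper end of the first-run estimate.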
Configure LLMAI Plugin
In Unreal Engine, set Default Provider to LocalAI and configure the LocalAI profile under Project Settings. Recommended defaults once the server above is running:
- Endpoint URL: ws://localhost:8080/v1/realtime
- Default Model: gpt-realtime
- TTS Model: chatterbox (default)
- STT Model: whisper (default)
Full Unreal-side setup, settings priority, and shared Realtime options: Configuration reference — LocalAI profile. Quick start: Configuration quick-start.
The LLMAI plugin connects to ws://localhost:8080/v1/realtime with model gpt-realtime for voice-to-voice AI: automatic speech recognition, language model processing, and text-to-speech, all running locally on your machine.
Stopping and Restarting
# Stop the container (Ctrl+C in terminal, or:)
docker stop $(docker ps -q --filter ancestor=localai-realtime:full)
# Restart later (models are cached, starts quickly)
docker run --gpus all -p 8080:8080 \
-v localai-models:/models \
-v localai-backends:/backends \
localai-realtime:full
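An alternative that avoids the ancestor filter is to give the container a name and run it detached, so later stops and restarts reuse the same container. This is a sketch under assumptions: the container name localai-realtime and the function names are hypothetical, while -d, --name, and the docker container inspect, start, and stop subcommands are standard Docker CLI.

```shell
# Set DOCKER=echo to print the arguments instead of running them
# (the inspect check then always succeeds, so only "start" is shown).
DOCKER=${DOCKER:-docker}
CONTAINER_NAME=localai-realtime   # hypothetical container name

# Starts the named container, creating it on first use.
start_localai() {
  if $DOCKER container inspect "$CONTAINER_NAME" >/dev/null 2>&1; then
    $DOCKER start "$CONTAINER_NAME"
  else
    $DOCKER run -d --name "$CONTAINER_NAME" --gpus all -p 8080:8080 \
      -v localai-models:/models \
      -v localai-backends:/backends \
      localai-realtime:full
  fi
}

# Stops the named container without removing it.
stop_localai() {
  $DOCKER stop "$CONTAINER_NAME"
}
```

With a detached, named container, logs can be followed with docker logs -f localai-realtime.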
Troubleshooting
Container Won't Start
- GPU not detected: Ensure NVIDIA Container Toolkit is installed and Docker Desktop has GPU support enabled
- Port in use: Check if port 8080 is already being used:
netstat -an | findstr 8080
- Insufficient disk space: Docker volumes need ~10GB for models
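The netstat and findstr pair above is Windows-specific. On Linux, a rough equivalent uses ss from iproute2. The sketch below keeps the filtering in a separate function so it can be tried on captured output; the function name is hypothetical.

```shell
# Reads socket listings (for example from `ss -ltn`) on stdin and succeeds
# if any address column ends in :$1, meaning something is bound to that port.
port_listed() {
  grep -Eq ":$1([[:space:]]|\$)"
}
```

For example, ss -ltn | port_listed 8080 && echo "port 8080 is already in use". For the disk space item, docker system df -v shows how much space each volume currently occupies.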
Slow First Start
The first startup downloads AI models (~10GB). This is normal and only happens once. Check Docker logs for download progress:
docker logs -f $(docker ps -q --filter ancestor=localai-realtime:full)
Connection Issues from Unreal
- Verify LocalAI is running by visiting http://localhost:8080 in your browser
- Ensure the endpoint URL uses ws:// (WebSocket), not http://
- Check Unreal's Output Log with llmai.debug.EnableAll for detailed connection info
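The first two checks can also be scripted. The sketch below validates that a configured endpoint uses a WebSocket scheme and derives the matching HTTP base URL for a browser or curl test. Function names are hypothetical, and sed -E (extended regular expressions) is assumed to be available, as it is with GNU and BSD sed.

```shell
# Succeeds if $1 starts with ws:// or wss://.
is_ws_url() {
  case "$1" in
    ws://*|wss://*) return 0 ;;
    *) return 1 ;;
  esac
}

# Prints the HTTP base URL (scheme, host, port) for a WebSocket URL:
# ws:// maps to http://, wss:// maps to https://, and the path is dropped.
http_base_from_ws() {
  printf '%s\n' "$1" | sed -E 's#^ws(s?)://([^/]+).*#http\1://\2#'
}
```

For example, is_ws_url "$ENDPOINT" || echo "endpoint must start with ws://" flags a mistyped scheme, and curl -fsS -o /dev/null "$(http_base_from_ws ws://localhost:8080/v1/realtime)" && echo reachable tests the same host and port over HTTP.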