Voice-to-Voice AI with TTS, STT, and Function Calling — Run Locally, No API Key Required
Get the pre-configured LocalAI Docker image with all required models and backends.
Download the pre-built image file: localai-realtime-full_v2.0.tar (3.25GB)
After downloading the .tar file, load it into Docker:
# Load the image (this may take a few minutes)
docker load -i localai-realtime-full_v2.0.tar
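To confirm the load succeeded, list the local images; the repository and tag below match the run command in the next step:

# Should show the localai-realtime image with the "full" tag
docker images localai-realtime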
Start LocalAI with GPU support and persistent storage:
docker run --gpus all -p 8080:8080 \
  -v localai-models:/models \
  -v localai-backends:/backends \
  localai-realtime:full
First run takes 10-20 minutes: the system automatically downloads the required AI models (~10GB) and backends. Subsequent starts are fast because the models are cached in Docker volumes.
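If you want to script against the server rather than watch the logs, you can poll until it reports ready. This is a sketch assuming the image exposes LocalAI's standard /readyz health probe:

# Wait until LocalAI reports ready (returns HTTP 200 once startup completes)
until curl -sf http://localhost:8080/readyz > /dev/null; do sleep 10; done
echo "LocalAI is ready"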
Once running, these endpoints are available:
- ws://localhost:8080/v1/realtime (realtime voice-to-voice WebSocket API)
- http://localhost:8080 (HTTP API and web UI)
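A quick way to confirm the HTTP API is answering is to list the installed models via the OpenAI-compatible models endpoint:

# The configured models (e.g. gpt-realtime) should appear in the output
curl http://localhost:8080/v1/models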
In Unreal Engine, configure the LLMAI plugin to use LocalAI:
- Provider: LocalAI
- Endpoint: ws://localhost:8080/v1/realtime
- Model: gpt-realtime
- TTS: chatterbox (default)
- STT: whisper (default)

The LLMAI plugin connects to ws://localhost:8080/v1/realtime with the gpt-realtime model for voice-to-voice AI. This provides automatic speech recognition, language model processing, and text-to-speech responses, all running locally on your machine.
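Before wiring up Unreal Engine, you can sanity-check the realtime endpoint from the command line. This sketch uses the third-party websocat tool; the ?model= query parameter follows the OpenAI realtime convention and is an assumption here, since the plugin normally passes the model for you:

# Open a raw WebSocket to the realtime endpoint (Ctrl+C to exit)
websocat "ws://localhost:8080/v1/realtime?model=gpt-realtime"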
# Stop the container (Ctrl+C in the terminal, or run:)
docker stop $(docker ps -q --filter ancestor=localai-realtime:full)
# Restart later (models are cached, starts quickly)
docker run --gpus all -p 8080:8080 \
  -v localai-models:/models \
  -v localai-backends:/backends \
  localai-realtime:full
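Because each docker run creates a new container, you may prefer to name the container once and reuse it; the name localai below is just an example:

# First run: create a named container
docker run --name localai --gpus all -p 8080:8080 \
  -v localai-models:/models \
  -v localai-backends:/backends \
  localai-realtime:full

# From then on, stop and start it by name
docker stop localai
docker start localai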
Port 8080 already in use: check what is listening on the port:

netstat -an | findstr 8080

Slow first startup: the first startup downloads the AI models (~10GB). This is normal and only happens once. Check the Docker logs for download progress:
docker logs -f $(docker ps -q --filter ancestor=localai-realtime:full)
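The findstr port check above is Windows syntax; on Linux or macOS, either of these shows whether something is listening on port 8080:

# Equivalent port checks for Linux/macOS
netstat -an | grep 8080
lsof -i :8080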
If the LLMAI plugin cannot connect:
- Verify LocalAI is running: open http://localhost:8080 in your browser.
- Make sure the endpoint uses ws:// (WebSocket), not http://.
- Use llmai.debug.EnableAll for detailed connection info.