LocalAI Realtime API
Docker Required
NVIDIA GPU 20GB+ VRAM Required

LocalAI Realtime API - Quick Start Guide

Voice-to-Voice AI with TTS, STT, and Function Calling — Run Locally, No API Key Required

📦 Download LocalAI Distribution

Get the pre-configured LocalAI Docker image with all required models and backends.


File: localai-realtime-full_v2.0.tar (3.25GB)

1 Requirements

  - Docker with GPU support (the NVIDIA Container Toolkit must be installed)
  - An NVIDIA GPU with 20GB+ VRAM
  - Enough free disk space for the 3.25GB image plus ~10GB of downloaded models

2 Load the Docker Image

After downloading the .tar file, load it into Docker:

# Load the image (this may take a few minutes)
docker load -i localai-realtime-full_v2.0.tar
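
To confirm the load succeeded, you can check for the image tag programmatically. A small sketch in Python (assuming the image is tagged localai-realtime:full, as used by the run command in the next step):

```python
import subprocess

def image_loaded(tag="localai-realtime:full"):
    """Return True if `tag` shows up in `docker image ls`."""
    try:
        out = subprocess.run(
            ["docker", "image", "ls", "--format", "{{.Repository}}:{{.Tag}}"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        # Docker not installed, or the daemon is not running.
        return False
    return tag in out.splitlines()
```

`image_loaded()` should return True right after the `docker load` above completes.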

3 Run the Container

Start LocalAI with GPU support and persistent storage:

docker run --gpus all -p 8080:8080 \
  -v localai-models:/models \
  -v localai-backends:/backends \
  localai-realtime:full

First run takes 10-20 minutes: the system automatically downloads the required AI models (~10GB) and backends. Subsequent starts are fast because the models and backends are cached in the Docker volumes.
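
If you want to script against the server instead of watching logs, you can poll until it answers. A minimal sketch in Python, assuming LocalAI's /readyz health endpoint is enabled (standard in recent LocalAI releases):

```python
import time
import urllib.request
import urllib.error

def wait_until_ready(base="http://localhost:8080", timeout_s=1800, poll_s=10):
    """Poll LocalAI's /readyz endpoint until it answers 200 OK.

    Returns True once the server is ready, False if timeout_s elapses
    (the first start can take 10-20 minutes while models download).
    """
    deadline = time.monotonic() + timeout_s
    url = f"{base}/readyz"
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet, or still loading models
        time.sleep(poll_s)
    return False
```

With the container from step 3 running, `wait_until_ready()` blocks until the API is usable.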

4 Access the API

Once running, these endpoints are available:

http://localhost:8080 - Main Web Interface
http://localhost:8080/talk/ - Voice Chat Test Page
ws://localhost:8080/v1/realtime - Realtime WebSocket API Endpoint
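
You can verify the API from any HTTP client. A quick check in Python, assuming LocalAI's OpenAI-compatible /v1/models endpoint (the exact model ids returned depend on your bundle):

```python
import json
import urllib.request
import urllib.error

def list_models(base="http://localhost:8080"):
    """Return the model ids LocalAI advertises via the OpenAI-style API.

    Returns an empty list (and prints why) if the server is unreachable.
    """
    url = f"{base}/v1/models"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            data = json.load(resp)
        return [m["id"] for m in data.get("data", [])]
    except (urllib.error.URLError, OSError) as exc:
        print(f"Could not reach {url}: {exc}")
        return []
```

With the container running, the list should include the bundled models, among them the gpt-realtime model used by the LLMAI plugin below.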

5 Configure LLMAI Plugin

In Unreal Engine, configure the LLMAI plugin to use LocalAI:

  1. Open Edit > Project Settings > Plugins > LLMAI
  2. Set Default Provider to LocalAI
  3. Under LocalAI Provider Settings, set the realtime endpoint to ws://localhost:8080/v1/realtime and the model to gpt-realtime

Realtime WebSocket Connection

The LLMAI plugin connects to ws://localhost:8080/v1/realtime with model gpt-realtime for voice-to-voice AI. This provides automatic speech recognition, language model processing, and text-to-speech responses — all running locally on your machine.
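
The endpoint speaks the OpenAI Realtime WebSocket protocol, so you can also exercise it outside Unreal. A sketch of the opening messages a client sends, assuming OpenAI-style session.update / response.create events (field names follow the OpenAI Realtime spec; LocalAI's coverage of that spec may be partial, and "alloy" is a placeholder voice name):

```python
import json

# URL the LLMAI plugin (or any WebSocket client) connects to.
REALTIME_URL = "ws://localhost:8080/v1/realtime?model=gpt-realtime"

def session_update(voice="alloy"):
    """First message a client sends: configure modalities, audio formats, voice."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "modalities": ["audio", "text"],
            "voice": voice,
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
        },
    })

def response_create(instructions="Say hello."):
    """Ask the server to start generating a spoken response."""
    return json.dumps({
        "type": "response.create",
        "response": {"instructions": instructions},
    })
```

Sent over a WebSocket connection to REALTIME_URL, these two messages request a spoken reply; in the OpenAI protocol the audio then streams back as response.audio.delta events.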

6 Stopping & Restarting

# Stop the container (Ctrl+C in terminal, or:)
docker stop $(docker ps -q --filter ancestor=localai-realtime:full)

# Restart later (models are cached, starts quickly)
docker run --gpus all -p 8080:8080 \
  -v localai-models:/models \
  -v localai-backends:/backends \
  localai-realtime:full

Troubleshooting

Container Won't Start

Confirm that the NVIDIA Container Toolkit is installed so Docker can pass the GPU through (--gpus all fails without it), and that nothing else is already listening on port 8080.

Slow First Start

The first startup downloads AI models (~10GB). This is normal and only happens once. Check Docker logs for download progress:

docker logs -f $(docker ps -q --filter ancestor=localai-realtime:full)

Connection Issues from Unreal

Check that the container is running and that the LLMAI provider settings match the endpoint (ws://localhost:8080/v1/realtime) and model (gpt-realtime) exactly. If Unreal Engine runs on a different machine than Docker, replace localhost with the Docker host's IP address and make sure port 8080 is reachable through any firewall.



