LocalAI Realtime Endpoint

note

This is an advanced topic that requires specialized knowledge beyond the scope of this documentation. Attempt it only if you are familiar with LocalAI, Docker, and integrating them with third-party systems.

This is a custom-engineered, OpenAI Realtime-compatible endpoint that runs on LocalAI. The overview below is for those who want to run the distribution locally.

Voice-to-Voice AI with TTS, STT, and Function Calling. Run locally with no API key required.

Download LocalAI Distribution

Get the pre-configured LocalAI Docker image with all required models and backends (Lite version - optimized for size and performance).

File: localai-realtime-lite_v3.9.0_Docker_RS_20260116.tar

Requirements

  • NVIDIA GPU with 20GB+ VRAM (RTX 3090, RTX 4090, or equivalent required for realtime voice + LLM)
  • Docker Desktop installed with GPU support enabled
  • NVIDIA Container Toolkit installed
  • At least 32GB system RAM and 30GB free disk space
  • Windows 10/11 64-bit (or Linux with NVIDIA drivers)

Load the Docker Image

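After downloading the .tar file, load it into Docker. When loading finishes, Docker prints the name and tag of the loaded image; use that name and tag in the run command: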

# Load the image (this may take a few minutes)
docker load -i localai-realtime-lite_v3.9.0_Docker_RS_20260116.tar

Run the Container

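Start LocalAI with GPU support and persistent storage. The trailing backslashes are line continuations for bash-style shells; in PowerShell, replace each one with a backtick (`) or enter the command on a single line: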

docker run --gpus all -p 8080:8080 \
-v localai-models:/models \
-v localai-backends:/backends \
localai-realtime:full

warning

First run takes 10-20 minutes. The system will automatically download required AI models (~10GB) and backends. Subsequent starts are faster because models are cached in Docker volumes.

Access the API

Once running, the realtime WebSocket endpoint is available at ws://localhost:8080/v1/realtime.
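To confirm that the endpoint actually accepts WebSocket connections, one option is to send the opening handshake by hand and read back the status code. The following is a minimal sketch using only the Python standard library. The function name websocket_upgrade_status is hypothetical, and the sketch assumes the server needs no authentication header or query parameters for the handshake, which this page does not specify. A status of 101 means the server agreed to switch to the WebSocket protocol.

```python
import base64
import os
import socket


def websocket_upgrade_status(host: str, port: int, path: str, timeout: float = 5.0) -> int:
    """Send a WebSocket opening handshake and return the HTTP status code.

    Hypothetical helper for illustration; it is not part of LocalAI or the plugin.
    """
    # The handshake key is 16 random bytes, base64-encoded.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        "\r\n"
    )
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request.encode("ascii"))
        response = b""
        # Read only until the status line (the first line) is complete.
        while b"\r\n" not in response:
            chunk = sock.recv(1024)
            if not chunk:
                break
            response += chunk
    status_line = response.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    parts = status_line.split()
    if len(parts) < 2 or not parts[1].isdigit():
        raise ValueError(f"unexpected response: {status_line!r}")
    return int(parts[1])


# Example call against the endpoint described above:
#   websocket_upgrade_status("localhost", 8080, "/v1/realtime")
```

Any status other than 101 suggests that something is listening on the port but is not accepting the upgrade at that path. A refused connection raises an OSError instead.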

Configure LLMAI Plugin

In Unreal Engine, set Default Provider to LocalAI and configure the LocalAI profile under Project Settings. Recommended defaults once the server above is running:

  • Endpoint URL: ws://localhost:8080/v1/realtime
  • Default Model: gpt-realtime
  • TTS Model: chatterbox (default)
  • STT Model: whisper (default)

Full Unreal-side setup, settings priority, and shared Realtime options: Configuration reference — LocalAI profile. Quick start: Configuration quick-start.

The LLMAI plugin connects to ws://localhost:8080/v1/realtime with model gpt-realtime for voice-to-voice AI: automatic speech recognition, language model processing, and text-to-speech all run locally on your machine.
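A wrong scheme or a missing path in the Endpoint URL setting is easy to overlook. The sketch below shows one way to sanity-check a URL string before pasting it into the LocalAI profile. The helper check_realtime_endpoint is hypothetical and only encodes what this page states (a ws:// scheme and the /v1/realtime path); it does not contact the server.

```python
from urllib.parse import urlsplit

EXPECTED_PATH = "/v1/realtime"  # path of the realtime endpoint given on this page


def check_realtime_endpoint(url: str) -> list:
    """Return human-readable problems with an endpoint URL (empty list if none).

    Hypothetical helper for illustration; it is not part of the LLMAI plugin.
    """
    problems = []
    parts = urlsplit(url)
    if parts.scheme in ("http", "https"):
        problems.append(f"scheme is {parts.scheme}://; the realtime endpoint expects ws://")
    elif parts.scheme != "ws":
        problems.append(f"unrecognised scheme {parts.scheme!r}; expected ws://")
    if not parts.hostname:
        problems.append("host name is missing")
    if parts.path.rstrip("/") != EXPECTED_PATH:
        problems.append(f"path is {parts.path!r}; expected {EXPECTED_PATH}")
    return problems
```

For example, check_realtime_endpoint("ws://localhost:8080/v1/realtime") returns an empty list, while the same URL with http:// returns a single problem describing the scheme.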

Stopping and Restarting

# Stop the container (press Ctrl+C in its terminal, or run:)
docker stop $(docker ps -q --filter ancestor=localai-realtime:full)

# Restart later (models are cached, starts quickly)
docker run --gpus all -p 8080:8080 \
-v localai-models:/models \
-v localai-backends:/backends \
localai-realtime:full

Troubleshooting

Container Won't Start

  • GPU not detected: Ensure NVIDIA Container Toolkit is installed and Docker Desktop has GPU support enabled
  • Port in use: Check whether port 8080 is already taken with netstat -an | findstr 8080 (Windows) or ss -ltn | grep 8080 (Linux)
  • Insufficient disk space: Docker volumes need ~10GB for models
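The port check above can also be done the same way on Windows and Linux with a few lines of Python. This is a minimal sketch, not a replacement for the commands in the list; port_in_use is a hypothetical helper that reports whether anything already accepts TCP connections on the given port. It assumes the conflicting service listens on the IPv4 loopback address.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something already accepts TCP connections on host:port.

    Hypothetical helper for illustration.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise,
        # so a refused or timed-out connection does not raise here.
        return sock.connect_ex((host, port)) == 0
```

Run port_in_use(8080) before starting the container. A result of True means another process holds the port, so either stop that process or publish the container on a different host port.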

Slow First Start

The first startup downloads AI models (~10GB). This is normal and only happens once. Check Docker logs for download progress:

docker logs -f $(docker ps -q --filter ancestor=localai-realtime:full)
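Instead of watching the logs, a script can poll the server until it starts answering HTTP requests. The sketch below is one way to do that with the Python standard library. The helpers http_is_up and wait_until_up are hypothetical, and the sketch assumes that an HTTP answer from http://localhost:8080 indicates a running server. Whether the server answers before the model downloads finish is not stated on this page, so treat a positive result as "the process is up" rather than "all models are loaded".

```python
import http.client
import time
import urllib.error
import urllib.request

# Bypass any system proxy settings, since the target is a local server.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def http_is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if the server answers the request with any HTTP status."""
    try:
        with _opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status (e.g. 404), so it is running.
        return True
    except (OSError, http.client.HTTPException):
        # Refused, reset, timed out, or malformed reply: not reachable yet.
        return False


def wait_until_up(url: str, total_timeout: float, interval: float = 10.0) -> bool:
    """Poll url until it answers or total_timeout seconds have passed."""
    deadline = time.monotonic() + total_timeout
    while True:
        if http_is_up(url):
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))


# Example: allow up to 20 minutes, the upper bound quoted for a first run.
#   wait_until_up("http://localhost:8080", total_timeout=20 * 60)
```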

Connection Issues from Unreal

  • Verify LocalAI is running by visiting http://localhost:8080 in your browser
  • Ensure the endpoint URL uses ws:// (WebSocket), not http://
  • Check Unreal's Output Log with llmai.debug.EnableAll for detailed connection info