LocalAI Realtime API
Docker Required
NVIDIA GPU 20GB+ VRAM Required

LocalAI Realtime API - Quick Start Guide

Voice-to-Voice AI with TTS, STT, and Function Calling — Run Locally, No API Key Required

📦 Download LocalAI Distribution

Get the pre-configured LocalAI Docker image with all required models and backends.


File: localai-realtime-full_v2.0.tar (3.25GB)

1 Requirements

  - Docker with GPU support (the NVIDIA Container Toolkit must be installed)
  - An NVIDIA GPU with 20GB+ VRAM
  - Enough free disk space for the 3.25GB image plus ~10GB of downloaded models

2 Load the Docker Image

After downloading the .tar file, load it into Docker:

# Load the image (this may take a few minutes)
docker load -i localai-realtime-full_v2.0.tar
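
To confirm the load succeeded, you can check for the image tag programmatically. A small sketch in Python (assuming the image is tagged localai-realtime:full, as used by the run command in the next step):

```python
import subprocess

def image_loaded(tag="localai-realtime:full"):
    """Return True if `tag` shows up in `docker image ls`."""
    try:
        out = subprocess.run(
            ["docker", "image", "ls", "--format", "{{.Repository}}:{{.Tag}}"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        # Docker not installed, or the daemon is not running.
        return False
    return tag in out.splitlines()
```

`image_loaded()` should return True right after the `docker load` above completes.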

3 Run the Container

Start LocalAI with GPU support and persistent storage:

docker run --gpus all -p 8080:8080 \
  -v localai-models:/models \
  -v localai-backends:/backends \
  localai-realtime:full

First run takes 10-20 minutes: the system automatically downloads the required AI models (~10GB) and backends. Subsequent starts are fast because the models and backends are cached in the Docker volumes.
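
If you want to script against the server instead of watching logs, you can poll until it answers. A minimal sketch in Python, assuming LocalAI's /readyz health endpoint is enabled (standard in recent LocalAI releases):

```python
import time
import urllib.request
import urllib.error

def wait_until_ready(base="http://localhost:8080", timeout_s=1800, poll_s=10):
    """Poll LocalAI's /readyz endpoint until it answers 200 OK.

    Returns True once the server is ready, False if timeout_s elapses
    (the first start can take 10-20 minutes while models download).
    """
    deadline = time.monotonic() + timeout_s
    url = f"{base}/readyz"
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not up yet, or still loading models
        time.sleep(poll_s)
    return False
```

With the container from step 3 running, `wait_until_ready()` blocks until the API is usable.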

4 Access the API

Once running, these endpoints are available:

http://localhost:8080 - Main Web Interface
http://localhost:8080/talk/ - Voice Chat Test Page
ws://localhost:8080/v1/realtime - Realtime WebSocket API Endpoint
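
You can verify the API from any HTTP client. A quick check in Python, assuming LocalAI's OpenAI-compatible /v1/models endpoint (the exact model ids returned depend on your bundle):

```python
import json
import urllib.request
import urllib.error

def list_models(base="http://localhost:8080"):
    """Return the model ids LocalAI advertises via the OpenAI-style API.

    Returns an empty list (and prints why) if the server is unreachable.
    """
    url = f"{base}/v1/models"
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            data = json.load(resp)
        return [m["id"] for m in data.get("data", [])]
    except (urllib.error.URLError, OSError) as exc:
        print(f"Could not reach {url}: {exc}")
        return []
```

With the container running, the list should include the bundled models, among them the gpt-realtime model used by the LLMAI plugin below.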

5 Configure LLMAI Plugin

In Unreal Engine, configure the LLMAI plugin to use LocalAI:

  1. Open Edit > Project Settings > Plugins > LLMAI
  2. Set Default Provider to LocalAI
  3. Under LocalAI Provider Settings, set the realtime endpoint to ws://localhost:8080/v1/realtime and the model to gpt-realtime

Realtime WebSocket Connection

The LLMAI plugin connects to ws://localhost:8080/v1/realtime with model gpt-realtime for voice-to-voice AI. This provides automatic speech recognition, language model processing, and text-to-speech responses — all running locally on your machine.
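
The endpoint speaks the OpenAI Realtime WebSocket protocol, so you can also exercise it outside Unreal. A sketch of the opening messages a client sends, assuming OpenAI-style session.update / response.create events (field names follow the OpenAI Realtime spec; LocalAI's coverage of that spec may be partial, and "alloy" is a placeholder voice name):

```python
import json

# URL the LLMAI plugin (or any WebSocket client) connects to.
REALTIME_URL = "ws://localhost:8080/v1/realtime?model=gpt-realtime"

def session_update(voice="alloy"):
    """First message a client sends: configure modalities, audio formats, voice."""
    return json.dumps({
        "type": "session.update",
        "session": {
            "modalities": ["audio", "text"],
            "voice": voice,
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
        },
    })

def response_create(instructions="Say hello."):
    """Ask the server to start generating a spoken response."""
    return json.dumps({
        "type": "response.create",
        "response": {"instructions": instructions},
    })
```

Sent over a WebSocket connection to REALTIME_URL, these two messages request a spoken reply; in the OpenAI protocol the audio then streams back as response.audio.delta events.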

6 Stopping & Restarting

# Stop the container (Ctrl+C in terminal, or:)
docker stop $(docker ps -q --filter ancestor=localai-realtime:full)

# Restart later (models are cached, starts quickly)
docker run --gpus all -p 8080:8080 \
  -v localai-models:/models \
  -v localai-backends:/backends \
  localai-realtime:full

Troubleshooting

Container Won't Start

Confirm that the NVIDIA Container Toolkit is installed so Docker can pass the GPU through (--gpus all fails without it), and that nothing else is already listening on port 8080.

Slow First Start

The first startup downloads AI models (~10GB). This is normal and only happens once. Check Docker logs for download progress:

docker logs -f $(docker ps -q --filter ancestor=localai-realtime:full)

Connection Issues from Unreal

Check that the container is running and that the LLMAI provider settings match the endpoint (ws://localhost:8080/v1/realtime) and model (gpt-realtime) exactly. If Unreal Engine runs on a different machine than Docker, replace localhost with the Docker host's IP address and make sure port 8080 is reachable through any firewall.



