Scalable Real-Time AI 3D Lip Sync
A scalable real-time facial animation pipeline integrating AI-driven audio processing with Unreal Engine MetaHumans.
Project Year
2026
Software/Frameworks Used
Unreal Engine (MetaHumans, Pixel Streaming), NVIDIA Audio2Face, Python, WebSockets, Linux (containerized environment), TTS APIs (Cartesia / Minimax), Docker, Cloud GPU services (GKE / Modal)
Architected an end-to-end real-time pipeline for generating facial animation from streaming audio inputs
Integrated multiple TTS providers (Cartesia, Minimax) with dynamic switching and voice configuration
Implemented audio chunk streaming and real-time processing for low-latency lip-sync generation
Built WebSocket-based communication layer between TTS services and Unreal Engine
Integrated NVIDIA Audio2Face for real-time facial solving and animation driving
Developed Pixel Streaming setup for remote avatar interaction via WebRTC
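The audio chunk streaming step can be illustrated with a minimal sketch. TTS services typically deliver audio in chunks of varying size, while downstream consumers usually expect fixed-size frames. The sample rate, sample width, and frame duration below are assumptions for illustration, not values taken from the project, and `reframe_pcm` is a hypothetical helper name.

```python
from typing import Iterable, Iterator

# Assumed audio format (not stated in the project description).
SAMPLE_RATE_HZ = 16_000   # mono
BYTES_PER_SAMPLE = 2      # 16-bit PCM
FRAME_MS = 20             # duration of each output frame

# Bytes in one fixed-size frame: 16000 * 2 * 20 / 1000 = 640.
FRAME_BYTES = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * FRAME_MS // 1000


def reframe_pcm(chunks: Iterable[bytes],
                frame_bytes: int = FRAME_BYTES) -> Iterator[bytes]:
    """Regroup arbitrarily sized PCM chunks into fixed-size frames."""
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        # Emit every complete frame as soon as it is available.
        while len(buffer) >= frame_bytes:
            yield bytes(buffer[:frame_bytes])
            del buffer[:frame_bytes]
    if buffer:
        # Pad the final partial frame with silence so all frames match in size.
        buffer.extend(b"\x00" * (frame_bytes - len(buffer)))
        yield bytes(buffer)
```

Because the function is a generator, each frame can be forwarded as soon as enough bytes have arrived, rather than waiting for the full utterance.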
Scalability & Infrastructure
Containerized Unreal Engine and supporting services using Docker
Deployed pipeline on cloud infrastructure (Modal, GKE)
Designed system to handle multiple avatar instances and concurrent sessions
Implemented session handling and avatar registry system for dynamic avatar loading
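A rough sketch of how an avatar registry with session handling might be structured is shown here. All class names, fields, and asset paths are hypothetical; the project's actual data model is not described in the source.

```python
import uuid
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class AvatarSpec:
    avatar_id: str
    asset_path: str   # hypothetical Unreal asset path
    body_type: str    # "full" or "half"


@dataclass
class Session:
    session_id: str
    avatar: AvatarSpec


class AvatarRegistry:
    """Maps avatar IDs to specs and tracks the sessions that use them."""

    def __init__(self) -> None:
        self._avatars: Dict[str, AvatarSpec] = {}
        self._sessions: Dict[str, Session] = {}

    def register(self, spec: AvatarSpec) -> None:
        self._avatars[spec.avatar_id] = spec

    def open_session(self, avatar_id: str) -> Session:
        # Raises KeyError when the avatar has not been registered.
        spec = self._avatars[avatar_id]
        session = Session(session_id=uuid.uuid4().hex, avatar=spec)
        self._sessions[session.session_id] = session
        return session

    def close_session(self, session_id: str) -> None:
        # Closing an unknown or already closed session is a no-op.
        self._sessions.pop(session_id, None)

    def active_sessions(self) -> int:
        return len(self._sessions)
```

Looking avatars up by ID at session start is one way to avoid hardcoding a single avatar per deployment.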
Performance Optimization
Achieved ~60 FPS streaming at 720p using L40S GPUs
Reduced cold start time to ~5–6 seconds
Optimized audio resampling pipeline to eliminate crackling and stuttering artifacts
Improved runtime performance by addressing FPS drops during speech
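The source does not say what caused the audio artifacts. One common cause in chunked pipelines is resampling each chunk independently, which creates discontinuities at chunk boundaries. The sketch below shows a resampler that carries its state across chunks. It uses simple linear interpolation as an assumption; a production pipeline would more likely use a band-limited filter. `StreamingResampler` is a hypothetical name.

```python
from fractions import Fraction
from typing import List, Sequence


class StreamingResampler:
    """Linear-interpolation resampler that keeps state between chunks."""

    def __init__(self, src_rate: int, dst_rate: int) -> None:
        # Input samples advanced per output sample, kept exact.
        self._step = Fraction(src_rate, dst_rate)
        # Position of the next output sample, relative to the window start.
        self._pos = Fraction(0)
        # Input samples carried over from the previous chunk.
        self._tail: List[float] = []

    def process(self, chunk: Sequence[float]) -> List[float]:
        samples = self._tail + list(chunk)
        out: List[float] = []
        # Interpolation needs samples[i] and samples[i + 1].
        while self._pos + 1 < len(samples):
            i = int(self._pos)
            frac = float(self._pos - i)
            out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
            self._pos += self._step
        # Drop consumed samples but keep what the next chunk still needs.
        drop = min(int(self._pos), len(samples))
        self._tail = samples[drop:]
        self._pos -= drop
        return out
```

With this design, feeding the audio in small chunks produces the same output as resampling it in one pass, so chunk boundaries are not audible as clicks.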
Pipeline Evolution & Architecture Improvements
Transitioned from a signalling-server-based architecture to direct WebSocket communication
Built relay system for efficient data flow between frontend and Unreal instances
Implemented dynamic avatar loading (removing hardcoded setups)
Enabled real-time parameter control (voice ID, TTS provider, background, avatar selection)
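Real-time parameter control could be carried as small JSON messages over the WebSocket connection. The following is a sketch under assumptions: the field names, default values, and message shape are hypothetical, and only the two provider names come from the project description.

```python
import json
from dataclasses import dataclass, fields, replace

# Provider names from the project description; everything else is assumed.
ALLOWED_PROVIDERS = {"cartesia", "minimax"}


@dataclass(frozen=True)
class SessionConfig:
    tts_provider: str = "cartesia"
    voice_id: str = "default"
    avatar_id: str = "default"
    background: str = "#000000"   # colour code or image reference


def apply_control_message(config: SessionConfig, raw: str) -> SessionConfig:
    """Return a new config with the fields of a JSON control message applied."""
    message = json.loads(raw)
    if not isinstance(message, dict):
        raise ValueError("control message must be a JSON object")
    known = {f.name for f in fields(config)}
    unknown = set(message) - known
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    if not all(isinstance(value, str) for value in message.values()):
        raise ValueError("all parameter values must be strings")
    provider = message.get("tts_provider", config.tts_provider)
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"unsupported TTS provider: {provider}")
    # Fields absent from the message keep their current values.
    return replace(config, **message)
```

Returning a new immutable config, rather than mutating the old one, makes it easy to reject an invalid message without leaving the session half updated.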
Avatar System Enhancements
Supported multiple avatars (full-body and half-body variants)
Added dynamic background customization (color & image-based)
Integrated control rigs for facial expressions, blinks, and realism improvements
Monitoring & Debugging
Implemented latency reporting across pipeline stages (TTS, streaming, inference)
Identified and resolved issues related to:
WebRTC connectivity (ICE candidates)
Session disconnections
Audio pipeline inconsistencies
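Per-stage latency reporting, as mentioned above for TTS, streaming, and inference, can be sketched with a small timing helper. The class name and report format are assumptions for illustration; the project's actual instrumentation is not described.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from statistics import mean
from typing import Dict, Iterator, List


class LatencyReporter:
    """Collects wall-clock durations per named pipeline stage."""

    def __init__(self) -> None:
        self._samples: Dict[str, List[float]] = defaultdict(list)

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration in milliseconds, even if the stage raised.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self._samples[name].append(elapsed_ms)

    def report(self) -> Dict[str, Dict[str, float]]:
        return {
            name: {
                "count": len(values),
                "mean_ms": mean(values),
                "max_ms": max(values),
            }
            for name, values in self._samples.items()
        }
```

A stage is measured by wrapping it, for example `with reporter.stage("tts"): ...`, and `report()` then shows which stage dominates end-to-end latency.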
Website Design & Development by Ram Yogeshwaran