
Building efficient and accessible AI solutions

Latest model (November 25, 2024)

OuteTTS-0.2-500M: Advanced Text-to-Speech Model with Multilingual Support

Model Description: OuteTTS-0.2-500M is the improved successor to our v0.1 release. The model keeps the same approach of conditioning on audio prompts, with no architectural changes to the foundation model itself. Built on Qwen-2.5-0.5B, this version was trained on larger and more diverse datasets, yielding significant improvements across all aspects of performance.

Read more in our blog →
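To make the workflow concrete, here is a minimal usage sketch, assuming the `outetts` Python package (pip install outetts) and its v0.2 Hugging Face interface. The class and parameter names follow the package's README at the time of the v0.2 release and may differ in later versions.

```python
# Minimal sketch: synthesize speech with OuteTTS-0.2-500M via the
# `outetts` package's Hugging Face interface (v0.2-era API; names
# are assumptions that may change between releases).
import outetts

# Point the interface at the 0.2 release and pick a language
# (the v0.2 model supports en, zh, ja, ko).
model_config = outetts.HFModelConfig_v1(
    model_path="OuteAI/OuteTTS-0.2-500M",
    language="en",
)
interface = outetts.InterfaceHF(model_version="0.2", cfg=model_config)

# Use one of the bundled reference voices; a custom speaker can
# alternatively be created from a short audio clip plus its transcript.
speaker = interface.load_default_speaker(name="male_1")

output = interface.generate(
    text="Speech synthesis is the artificial production of human speech.",
    temperature=0.1,          # low temperature keeps the voice stable
    repetition_penalty=1.1,
    max_length=4096,
    speaker=speaker,
)

# Write the synthesized audio to disk.
output.save("output.wav")
```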

Teaching Language Models to Speak via Audio Tokens and Forced Alignment

Abstract: We present OuteTTS, a novel approach to text-to-speech synthesis that leverages pure language modeling without the need for external adapters or complex architectures. Our 350M parameter model demonstrates that high-quality speech synthesis is achievable through a straightforward approach using crafted prompts and audio tokens.

Introduction: Text-to-speech synthesis has traditionally relied on complex architectures and specialized models. With OuteTTS, we demonstrate that a relatively small language model can learn to generate high-quality speech through a simple yet effective approach. Our model, with just 350M parameters, showcases the potential of using language models directly for speech synthesis.

Read more in our blog →
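The core idea above is that forced alignment maps each word in the transcript to a duration and a span of discrete audio-codec tokens, and the language model is then trained on plain token sequences that interleave all three. The sketch below illustrates that prompt structure; the special-token formats and helper names are illustrative assumptions, not the exact ones used by OuteTTS.

```python
# Illustrative sketch of a structured training prompt built from
# forced-alignment output. Token formats here are assumptions.

def build_training_prompt(transcript, aligned_words):
    """Assemble one training example: the full transcript, then each
    word paired with a duration token and its audio-codec tokens.

    aligned_words: list of (word, duration_seconds, audio_token_ids),
    where audio_token_ids come from a neural audio codec
    (roughly tens of tokens per second of speech).
    """
    parts = [f"<|text_start|>{transcript}<|text_end|>", "<|audio_start|>"]
    for word, duration, audio_tokens in aligned_words:
        # Duration is quantized into a discrete token so the model
        # learns how long each word should be spoken.
        dur_token = f"<|t_{duration:.2f}|>"
        codes = "".join(f"<|{t}|>" for t in audio_tokens)
        parts.append(f"{word}{dur_token}{codes}")
    parts.append("<|audio_end|>")
    return "\n".join(parts)

# Example with made-up alignment output:
example = build_training_prompt(
    "hello world",
    [("hello", 0.41, [101, 7, 52]), ("world", 0.38, [88, 3, 19])],
)
print(example)
```

At inference time the same format is used in reverse: the model is given the text and word-level structure as a prompt and generates the audio tokens, which a codec decoder turns back into a waveform.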

Our Models

OuteTTS-0.2-500M

A successor to the v0.1 release with significant improvements across all aspects of performance.

OuteTTS-0.1-350M

Teaching Language Models to Speak via Audio Tokens and Forced Alignment

Lite Oute 2 Mamba2Attn

Lite Oute 2 Mamba2Attn 250M is our latest third-generation model, combining the Mamba2 architecture with interleaved attention layers.

Lite Oute 1

The Lite Oute 1 family pairs a 300M model with an ultra-compact 65M-parameter variant, offering versatility and efficiency across a range of deployment scenarios.