Two undergraduate students from South Korea, with just three months of AI experience, have unveiled Dia, an open-source AI model that generates podcast-style audio clips rivaling Google’s NotebookLM.

Developed under the banner of Nari Labs, Dia is a 1.6-billion-parameter model that offers unprecedented control over synthetic speech, allowing users to customize voice tones, add disfluencies, laughs, coughs, and other nonverbal cues.

Available on Hugging Face and GitHub, this model is poised to shake up the rapidly growing synthetic speech market.

The Rise of Synthetic Speech

The market for voice AI tools is booming. According to PitchBook, startups in this space secured over $398 million in venture capital funding last year. Industry giants like ElevenLabs dominate, but smaller players like PlayAI and Sesame are carving out their niches.

Investors are betting big on the potential of synthetic speech for applications ranging from content creation to virtual assistants. Into this dynamic landscape steps Nari Labs, a newcomer with a bold vision.

Nari Labs’ Journey: From Novices to Innovators

Toby Kim, one of Nari Labs’ co-founders, shared that he and his partner were inspired by Google’s NotebookLM but sought to create a model with greater flexibility in voice customization and script control.

With no prior expertise in speech AI, they leveraged Google’s TPU Research Cloud, which provides free access to powerful AI chips, to train Dia. In just three months, they built a model that can generate dialogue from scripts and run on modern PCs with at least 10GB of VRAM.
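As a rough illustration of script-driven dialogue generation, a multi-speaker script can be assembled as plain text with speaker tags and inline nonverbal cues. The `[S1]`/`[S2]` tag convention and the `build_script` helper below are assumptions for illustration, not Nari Labs' documented input format:

```python
# Hypothetical helper: assembles a dialogue script with speaker tags and
# inline nonverbal cues, in the style of input commonly fed to dialogue
# TTS models. The [S1]/[S2] convention is an assumption, not Dia's API.

def build_script(turns):
    """turns: list of (speaker, text) pairs; returns one tagged script string."""
    return " ".join(f"[{speaker}] {text}" for speaker, text in turns)

script = build_script([
    ("S1", "Welcome back to the show."),
    ("S2", "Thanks! (laughs) Happy to be here."),
])
print(script)
# [S1] Welcome back to the show. [S2] Thanks! (laughs) Happy to be here.
```

A string like this would then be handed to the model, which synthesizes distinct voices for each tagged speaker and renders cues such as `(laughs)` as audio events.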

Dia stands out for its versatility. Users can generate random voices or prompt the model with specific style descriptions. It can even clone voices, a feature that, while impressive, raises ethical concerns.

The Power of Parameters

At 1.6 billion parameters, Dia is a robust model. In AI, parameters are the internal variables that enable models to make predictions, and larger parameter counts often correlate with better performance.

For context, Dia’s scale places it in a competitive range for specialized tasks like speech generation, though it’s smaller than some general-purpose models. Its accessibility—requiring only moderate hardware—makes it a practical tool for developers and creators.
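A back-of-the-envelope calculation shows why a 1.6-billion-parameter model fits comfortably within the 10GB VRAM figure. The bytes-per-parameter values below are standard for common numeric formats; Dia's actual precision and runtime overhead are not stated in the source, so this is an estimate of weight storage only:

```python
# Rough VRAM needed to hold the model weights alone, at common precisions.
# Excludes activations, caches, and framework overhead, which add to the
# real requirement (hence the 10GB recommendation leaves headroom).
params = 1.6e9  # 1.6 billion parameters

for name, bytes_per_param in [("float32", 4), ("float16", 2), ("int8", 1)]:
    gb = params * bytes_per_param / 1e9
    print(f"{name}: ~{gb:.1f} GB of weights")
# float32: ~6.4 GB of weights
# float16: ~3.2 GB of weights
# int8:    ~1.6 GB of weights
```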

Nari Labs’ Vision for the Future

Looking ahead, Nari Labs aims to build a synthetic voice platform with a “social aspect” layered atop Dia and future, larger models. The team plans to release a technical report detailing Dia’s development and expand its language support beyond English, potentially broadening its global reach.

By maintaining an open-source approach, Nari Labs is fostering a community of developers and creators to experiment with and enhance Dia.

Why Dia Matters

Nari Labs’ achievement is a testament to the democratizing power of AI tools and resources like Google’s TPU Research Cloud.

Two undergrads with minimal experience have created a model that competes with industry leaders, proving that innovation can come from unexpected places. However, Dia’s release also underscores the need for responsible AI development.

As synthetic speech tools become more accessible, the industry must grapple with ethical challenges, from misuse to copyright concerns.

For now, Dia is a bold step forward, offering creators a powerful, flexible tool to craft compelling audio content. As Nari Labs continues to refine its technology and expand its ambitions, the AI community—and the synthetic speech market—will be watching closely.

Posted May 2, 2025 in the Digital Learning category
