A Variational Framework for Improving Naturalness in Generative Spoken Language Models

July 9, 2025

The success of large language models in text processing has inspired their adaptation to speech modeling. However, since speech is continuous and complex, it is often discretized for autoregressive modeling. Speech tokens derived from self-supervised models (known as semantic tokens) typically focus on the linguistic aspects of speech but neglect prosodic information. As a result, models trained on these tokens can generate speech with reduced naturalness. Existing approaches try to fix this by adding pitch features to the semantic tokens. However, pitch alone cannot fully represent the range…
