AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition

August 22, 2024

Audio-visual speech contains synchronized audio and visual information that provides cross-modal supervision to learn representations for both automatic speech recognition (ASR) and visual speech recognition (VSR). We introduce continuous pseudo-labeling for audio-visual speech recognition (AV-CPL), a semi-supervised method to train an audio-visual speech recognition (AVSR) model on a combination of labeled and unlabeled videos with continuously regenerated pseudo-labels. Our models are trained for speech recognition from audio-visual inputs and can perform speech recognition using both audio…
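
To make the idea of continuously regenerated pseudo-labels concrete, here is a minimal toy sketch in Python. It is not the AV-CPL implementation: the nearest-centroid "model", the one-dimensional features, and the names `fit_centroids`, `predict`, and `continuous_pseudo_labeling` are all assumptions chosen for illustration. The only point it shows is the loop structure, in which the current model relabels the unlabeled data each round and is then retrained on labeled plus pseudo-labeled examples.

```python
from statistics import mean


def fit_centroids(examples):
    """Return {label: centroid} for a list of (feature, label) pairs."""
    by_label = {}
    for x, y in examples:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def predict(centroids, x):
    """Assign x to the label whose centroid is nearest."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def continuous_pseudo_labeling(labeled, unlabeled, rounds=3):
    """Toy self-training loop (hypothetical stand-in for a real AVSR model)."""
    # Start from a model trained on the labeled data only.
    model = fit_centroids(labeled)
    pseudo = []
    for _ in range(rounds):
        # Regenerate pseudo-labels with the current model every round.
        pseudo = [(x, predict(model, x)) for x in unlabeled]
        # Retrain on labeled and pseudo-labeled data together.
        model = fit_centroids(labeled + pseudo)
    return model, pseudo


# Made-up data: two classes on a line, four labeled and six unlabeled points.
labeled = [(0.0, 0), (1.0, 0), (9.0, 1), (10.0, 1)]
unlabeled = [2.0, 3.0, 4.0, 6.5, 7.0, 8.0]
model, pseudo = continuous_pseudo_labeling(labeled, unlabeled)
```

In a real system the centroid fit would be replaced by gradient updates to a speech recognition model and the pseudo-labels would be transcripts rather than class indices; those details are outside what this excerpt describes.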
