Interspeech 2024
Classifier-Free Guidance Is a Predictor-Corrector
We investigate the unreasonable effectiveness of classifier-free guidance (CFG). CFG is the dominant method of conditional sampling for text-to-image diffusion models, yet unlike other aspects...
Apple Workshop on Privacy-Preserving Machine Learning 2024
At Apple, we believe privacy is a fundamental human right. It’s also one of our core values, influencing both our research and the...
Positional Description for Numerical Normalization
We present a Positional Description Scheme (PDS) tailored for digit sequences, integrating placeholder value information for each digit. Given the structural limitations of...
AV-CPL: Continuous Pseudo-Labeling for Audio-Visual Speech Recognition
Audio-visual speech contains synchronized audio and visual information that provides cross-modal supervision to learn representations for both automatic speech recognition (ASR) and visual...
Novel-View Acoustic Synthesis From 3D Reconstructed Rooms
We investigate the benefit of combining blind audio recordings with 3D scene information for novel-view acoustic synthesis. Given audio recordings from 2-4 microphones...
ReALM: Reference Resolution as Language Modeling
Reference resolution is an important problem, one that is essential to understand and successfully handle contexts of different kinds. This context includes both...
On the Benefits of Pixel-Based Hierarchical Policies for Task Generalization
Reinforcement learning practitioners often avoid hierarchical policies, especially in image-based observation spaces. Typically, the single-task performance improvement over flat-policy counterparts does not justify...
APE: Active Prompt Engineering – Identifying Informative Few-Shot Examples for LLMs
Prompt engineering is an iterative procedure that often requires extensive manual effort to formulate suitable instructions for effectively directing large language models (LLMs)...
Can You Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?
Self-supervised features are typically used in place of filter-bank features in speaker verification models. However, these models were originally designed to ingest filter-banks...