UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing
Text-to-Image (T2I) diffusion models have shown impressive results in generating visually compelling images following user prompts. Building on this, various methods further fine-tune...
Exploring Empty Spaces: Human-in-the-Loop Data Augmentation
Data augmentation is crucial to make machine learning models more robust and safe. However, augmenting data can be challenging as it requires generating...
International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2025
VibE: A Visual Analytics Workflow for Semantic Error Analysis of CVML Models at Subgroup...
Effective error analysis is critical for the successful development and deployment of CVML models. One approach to understanding model errors is to summarize...
SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions
In this work, we present and evaluate SELMA, a Speech-Enabled Language Model for virtual Assistant interactions that integrates audio and text as inputs...
M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference
Residual transformations enhance the representational depth and expressive power of large language models (LLMs). However, applying static residual transformations across all tokens in...
Towards Automatic Assessment of Self-Supervised Speech Models Using Rank
This study explores using embedding rank as an unsupervised evaluation metric for general-purpose speech encoders trained via self-supervised learning (SSL). Traditionally, assessing the...
DR-MPC: Deep Residual Model Predictive Control for Real-World Social Navigation
How can a robot safely navigate around people with complex motion patterns? Deep Reinforcement Learning (DRL) in simulation holds some promise, but much...
Towards AI-Driven Sign Language Generation with Non-Manual Markers
Sign languages are essential for the Deaf and Hard-of-Hearing (DHH) community. Sign language generation systems have the potential to support communication by translating...
When Does a Predictor Know Its Own Loss?
Given a predictor and a loss function, how well can we predict the loss that the predictor will incur on an input? This...