Keyframer: Empowering Animation Design using Large Language Models
Large language models (LLMs) have the potential to impact a wide range of creative domains, as exemplified in popular text-to-image generators like DALL·E...
Resource-constrained Stereo Singing Voice Cancellation
We study the problem of stereo singing voice cancellation, a subtask of music source separation, whose goal is to estimate an instrumental background...
Scalable Pre-training of Large Autoregressive Image Models
This paper introduces AIM, a collection of vision models pre-trained with an autoregressive objective. These models are inspired by their textual counterparts, i.e.,...
Co-ML: Collaborative Machine Learning Model Building for Developing Dataset Design Practices
Machine learning (ML) models are fundamentally shaped by data, and building inclusive ML systems requires significant considerations around how to design representative datasets....
Large-scale Training of Foundation Models for Wearable Biosignals
Tracking biosignals is crucial for monitoring wellness and preempting the development of severe medical conditions. Today, wearable devices can conveniently record various biosignals,...
Acoustic Model Fusion for End-to-end Speech Recognition
Recent advances in deep learning and automatic speech recognition (ASR) have enabled the end-to-end (E2E) ASR system and boosted its accuracy to a...
Investigating Salient Representations and Label Variance Modeling in Dimensional Speech Emotion Analysis
Representations from models such as Bidirectional Encoder Representations from Transformers (BERT) and Hidden units BERT (HuBERT) have helped to achieve state-of-the-art performance in...
User-level Differentially Private Stochastic Convex Optimization: Efficient Algorithms with Optimal Rates
We study differentially private stochastic convex optimization (DP-SCO) under user-level privacy where each user may hold multiple data items. Existing work for ...
Bin Prediction for Better Conformal Prediction
This paper was accepted at the workshop on Regulatable ML at NeurIPS 2023.
Conformal Prediction (CP) is a method of estimating risk or uncertainty...
One Wide Feedforward is All You Need
This paper was accepted at the WMT conference at EMNLP.
The Transformer architecture has two main non-embedding components: Attention and the Feed Forward Network (FFN)....