When Does a Predictor Know Its Own Loss?
Given a predictor and a loss function, how well can we predict the loss that the predictor will incur on an input? This...
Contrastive Localized Language-Image Pre-Training
Contrastive Language-Image Pre-training (CLIP) has been a celebrated method for training vision encoders to generate image/text representations facilitating various applications. Recently, CLIP has...
KV Prediction for Improved Time to First Token
Inference with transformer-based language models begins with a prompt processing step. In this step, the model generates the first output token and stores...
Towards AI-Driven Sign Language Generation with Non-Manual Markers
Sign languages are essential for the Deaf and Hard-of-Hearing (DHH) community. Sign language generation systems have the potential to support communication by translating...
eaSEL: Promoting Social-Emotional Learning and Parent-Child Interaction Through AI-Mediated Content Consumption
As children increasingly consume media on devices, parents look for ways this usage can support learning and growth, especially in domains like social-emotional...
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons
We examine the capability of Multimodal Large Language Models (MLLMs) to tackle diverse domains that extend beyond the traditional language and vision tasks...
DR-MPC: Deep Residual Model Predictive Control for Real-World Social Navigation
How can a robot safely navigate around people with complex motion patterns? Deep Reinforcement Learning (DRL) in simulation holds some promise, but much...
Cut Your Losses in Large-Vocabulary Language Models
As language models grow ever larger, so do their vocabularies. This has shifted the memory footprint of LLMs during training disproportionately to one...
FlexTok: Resampling Images into 1D Token Sequences of Flexible Length
This work was done in collaboration with the Swiss Federal Institute of Technology Lausanne (EPFL).
Image tokenization has enabled major advances in autoregressive image...
Towards Automatic Assessment of Self-Supervised Speech Models Using Rank
This study explores using embedding rank as an unsupervised evaluation metric for general-purpose speech encoders trained via self-supervised learning (SSL). Traditionally, assessing the...