DR-MPC: Deep Residual Model Predictive Control for Real-World Social Navigation
How can a robot safely navigate around people with complex motion patterns? Deep Reinforcement Learning (DRL) in simulation holds some promise, but much...
Cut Your Losses in Large-Vocabulary Language Models
As language models grow ever larger, so do their vocabularies. This has shifted the memory footprint of LLMs during training disproportionately to one...
FlexTok: Resampling Images into 1D Token Sequences of Flexible Length
This work was done in collaboration with the Swiss Federal Institute of Technology Lausanne (EPFL).
Image tokenization has enabled major advances in autoregressive image...
Towards Automatic Assessment of Self-Supervised Speech Models Using Rank
This study explores using embedding rank as an unsupervised evaluation metric for general-purpose speech encoders trained via self-supervised learning (SSL). Traditionally, assessing the...
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
Attention is a key part of the transformer architecture. It is a sequence-to-sequence mapping that transforms each sequence element into a weighted...
Transfer Learning in Scalable Graph Neural Network for Improved Physical Simulation
In recent years, graph neural network (GNN)-based models have shown promising results in simulating complex physical systems. However, training dedicated graph network simulator...
M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference
Residual transformations enhance the representational depth and expressive power of large language models (LLMs). However, applying static residual transformations across all tokens in...
Findings of the IWSLT 2024 Evaluation Campaign
This paper reports on the shared tasks organized by the 21st IWSLT Conference. The shared tasks address 7 scientific challenges in spoken language...
ARMOR: Egocentric Perception for Humanoid Robot Collision Avoidance and Motion Planning
Humanoid robots have significant gaps in their sensing and perception, making it hard to perform motion planning in dense environments. To address this,...
SELMA: A Speech-Enabled Language Model for Virtual Assistant Interactions
In this work, we present and evaluate SELMA, a Speech-Enabled Language Model for virtual Assistant interactions that integrates audio and text as inputs...