Generalizable Autoregressive Modeling of Time Series Through Functional Narratives
Time series data are inherently functions of time, yet current transformers often learn time series by modeling them as mere concatenations of time...
CAMPHOR: Collaborative Agents for Multi-Input Planning and High-Order Reasoning On Device
While server-side Large Language Models (LLMs) demonstrate proficiency in tool integration and complex reasoning, deploying Small Language Models (SLMs) directly on devices brings...
Progressive Entropic Optimal Transport Solvers
Optimal transport (OT) has profoundly impacted machine learning by providing theoretical and computational tools to realign datasets. In this context, given two large...
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models
Recent advancements in Large Language Models (LLMs) have sparked interest in their formal reasoning capabilities, particularly in mathematics. The GSM8K benchmark is widely...
Contrastive Localized Language-Image Pre-Training
Contrastive Language-Image Pre-training (CLIP) has been a celebrated method for training vision encoders to generate image/text representations facilitating various applications. Recently, CLIP has...
When is Multicalibration Post-Processing Necessary?
Calibration is a well-studied property of predictors which guarantees meaningful uncertainty estimates. Multicalibration is a related notion -- originating in algorithmic fairness --...
On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference...
Reinforcement Learning from Human Feedback (RLHF) is an effective approach for aligning language models to human preferences. Central to RLHF is learning a...
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
We present a foundation model for zero-shot metric monocular depth estimation. Our model, Depth Pro, synthesizes high-resolution depth maps with unparalleled sharpness and...
Improving How Machine Translations Handle Grammatical Gender Ambiguity
Machine Translation (MT) enables people to connect with others and engage with content across language barriers. Grammatical gender presents a difficult challenge for...
UI-JEPA: Towards Active Perception of User Intent Through Onscreen User Activity
Generating user intent from a sequence of user interface (UI) actions is a core challenge in comprehensive UI understanding. Recent advancements in multimodal...