The AdEMAMix Optimizer: Better, Faster, Older
Momentum-based optimizers are central to a wide range of machine learning applications. These typically rely on an Exponential Moving Average (EMA) of...
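For readers unfamiliar with the mechanism the abstract alludes to, here is a minimal sketch of the standard EMA-of-gradients momentum update that optimizers in this family build on. All names and hyperparameter values (`beta`, `lr`) are illustrative, not taken from the paper.

```python
def ema_momentum_step(param, grad, m, beta=0.9, lr=0.1):
    """One gradient step using an exponential moving average of past gradients."""
    m = beta * m + (1.0 - beta) * grad  # EMA smooths the gradient history
    param = param - lr * m              # descend along the smoothed direction
    return param, m

# Toy usage: minimize f(x) = x^2, whose gradient is 2x.
x, m = 5.0, 0.0
for _ in range(200):
    x, m = ema_momentum_step(x, 2.0 * x, m)
# x is now close to the minimizer 0
```

Older gradients decay geometrically with rate `beta`; AdEMAMix's premise is that a single EMA cannot both weight recent gradients heavily and retain very old ones.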
Understanding Aggregate Trends for Apple Intelligence Using Differential Privacy
At Apple, we believe privacy is a fundamental human right. And we believe in giving our users a great experience while protecting their...
MicroNN: An On-device Disk Resident Updatable Vector Database
Nearest neighbour search over dense vector collections has important applications in information retrieval, retrieval augmented generation (RAG), and content ranking. Performing efficient search...
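As background, the exact brute-force baseline that on-device vector databases aim to beat can be sketched in a few lines; this is a generic illustration and assumes nothing about the paper's actual index structure.

```python
def nearest_neighbour(query, vectors):
    """Return the index of the vector closest to `query` in Euclidean distance.

    Exact O(n * d) scan: compares the query against every stored vector.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(vectors)), key=lambda i: dist2(query, vectors[i]))

corpus = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
idx = nearest_neighbour((0.9, 1.2), corpus)  # → 1, the vector (1.0, 1.0)
```

The linear scan is exact but touches every vector; disk-resident updatable indexes trade a small amount of accuracy for far fewer reads.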
MM-Ego: Towards Building Egocentric Multimodal LLMs
This research aims to comprehensively explore building a multimodal foundation model for egocentric video understanding. To achieve this goal, we work on three...
RelCon: Relative Contrastive Learning for a Motion Foundation Model for Wearable Data
We present RelCon, a novel self-supervised Relative Contrastive learning approach for training a motion foundation model from wearable accelerometry sensors. First, a learnable...
Do LLMs Know Internally When They Follow Instructions?
Instruction-following is crucial for building AI agents with large language models (LLMs), as these models must adhere strictly to user-provided constraints and guidelines...
Adaptive Batch Size for Privately Finding Second-order Stationary Points
There is a gap between finding a first-order stationary point (FOSP) and a second-order stationary point (SOSP) under differential privacy constraints, and it...
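For context, the standard (non-private) definitions of the two solution concepts the abstract contrasts are as follows; these are textbook definitions, not the paper's specific formulation, and $\rho$ denotes a Hessian Lipschitz constant.

```latex
% epsilon-first-order stationary point (FOSP): small gradient
\|\nabla f(x)\| \le \epsilon

% epsilon-second-order stationary point (SOSP): small gradient and
% no strongly negative curvature in the Hessian
\|\nabla f(x)\| \le \epsilon
\quad\text{and}\quad
\lambda_{\min}\!\left(\nabla^2 f(x)\right) \ge -\sqrt{\rho\,\epsilon}
```

An SOSP additionally rules out directions of sharp negative curvature, i.e. it excludes strict saddle points that an FOSP guarantee alone would admit.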
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
Building a generalist model for user interface (UI) understanding is challenging due to various foundational issues, such as platform diversity, resolution variation, and...
TiC-LM: A Web-Scale Benchmark for Time-Continual LLM Pretraining
This paper was accepted at the Scalable Continual Learning for Lifelong Foundation Models (SCLLFM) Workshop at NeurIPS 2024.
Large Language Models (LLMs) trained on...
Revisit Large-Scale Image–Caption Data in Pre-training Multimodal Foundation Models
Recent advancements in multimodal models highlight the value of rewritten captions for improving performance, yet key challenges remain. Notably, the role of synthetic...