Scaling Laws for Native Multimodal Models
Building general-purpose models that can effectively perceive the world through multimodal signals has been a long-standing goal. Current approaches involve integrating separately pre-trained...
Simple ReFlow: Improved Techniques for Fast Flow Models
Diffusion and flow-matching models achieve remarkable generative performance, but at the cost of many sampling steps; this slows inference and limits applicability to...
The AdEMAMix Optimizer: Better, Faster, Older
Momentum-based optimizers are central to a wide range of machine learning applications. These typically rely on an Exponential Moving Average (EMA) of...
Understanding Aggregate Trends for Apple Intelligence Using Differential Privacy
At Apple, we believe privacy is a fundamental human right. And we believe in giving our users a great experience while protecting their...
MicroNN: An On-device Disk Resident Updatable Vector Database
Nearest neighbour search over dense vector collections has important applications in information retrieval, retrieval-augmented generation (RAG), and content ranking. Performing efficient search...
MM-Ego: Towards Building Egocentric Multimodal LLMs
This research aims to comprehensively explore building a multimodal foundation model for egocentric video understanding. To achieve this goal, we work on three...
RelCon: Relative Contrastive Learning for a Motion Foundation Model for Wearable Data
We present RelCon, a novel self-supervised Relative Contrastive learning approach for training a motion foundation model from wearable accelerometry sensors. First, a learnable...
Do LLMs Know Internally When They Follow Instructions?
Instruction-following is crucial for building AI agents with large language models (LLMs), as these models must adhere strictly to user-provided constraints and guidelines....
Adaptive Batch Size for Privately Finding Second-order Stationary Points
There is a gap between finding a first-order stationary point (FOSP) and a second-order stationary point (SOSP) under differential privacy constraints, and it...
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms
Building a generalist model for user interface (UI) understanding is challenging due to various foundational issues, such as platform diversity, resolution variation, and...