How Far Are We from Intelligent Visual Deductive Reasoning?
This paper was accepted at the How Far Are We from AGI? workshop at ICLR 2024.
Vision-Language Models (VLMs) such as GPT-4V have recently...
Guiding Instruction-based Image Editing via Multimodal Large Language Models
Instruction-based image editing improves the controllability and flexibility of image manipulation via natural commands without elaborate descriptions or regional masks. However, human instructions...
Pseudo-Generalized Dynamic View Synthesis from a Video
Rendering scenes observed in a monocular video from novel viewpoints is a challenging problem. For static scenes the community has studied both...
When can transformers reason with abstract symbols?
We investigate the capabilities of transformer models on relational reasoning tasks. In these tasks, models are trained on a set of strings encoding...
Vanishing Gradients in Reinforcement Finetuning of Language Models
Pretrained language models are commonly adapted to comply with human intent and downstream tasks via finetuning. The finetuning process involves supervised finetuning (SFT),...
Think While You Write: Hypothesis Verification Promotes Faithful Knowledge-to-Text Generation
Neural knowledge-to-text generation models often struggle to faithfully generate descriptions for the input facts: they may produce hallucinations that contradict the given facts,...
Label-Efficient Sleep Staging Using Transformers Pre-trained with Position Prediction
Sleep staging is a clinically important task for diagnosing various sleep disorders but remains challenging to deploy at scale because it requires clinical...
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7× Faster Pre-training on Web-scale Image-Text Data
Contrastive learning has emerged as a transformative method for learning effective visual representations through the alignment of image and text embeddings. However, pairwise...
Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences
On-device machine learning (ML) promises to improve the privacy, responsiveness, and proliferation of new, intelligent user experiences by moving ML computation onto everyday...
Talaria: Interactively Optimizing Machine Learning Models for Efficient Inference
On-device machine learning (ML) moves computation from the cloud to personal devices, protecting user privacy and enabling intelligent user experiences. However, fitting models...