Swallowing the Bitter Pill: Simplified Scalable Conformer Generation
We present a novel way to predict molecular conformers through a simple formulation that sidesteps many of the heuristics of prior works and...
CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement
Contrastive language-image pretraining (CLIP) is a standard method for training vision-language models. While CLIP is scalable, promptable, and robust to distribution shifts...
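As a reference point, here is a minimal sketch of the symmetric contrastive objective CLIP optimizes. Function and argument names are illustrative, and the image/text embeddings are assumed to be pre-computed and L2-normalized, with matching pairs sharing a row index:

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    image_emb, text_emb: (batch, dim) tensors, assumed L2-normalized.
    """
    # (batch, batch) cosine-similarity matrix, scaled by temperature.
    logits = image_emb @ text_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    # Cross-entropy in both directions: image-to-text and text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```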
On Efficient and Statistical Quality Estimation for Data Annotation
Annotated data is an essential ingredient for training, evaluating, comparing, and productionizing machine learning models. It is therefore imperative that annotations are of...
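For intuition about what statistical quality estimation involves (this is a generic baseline, not the paper's specific estimator), a minimal sketch of auditing a random sample of annotations and attaching a normal-approximation confidence interval to the observed error rate:

```python
import math

def audit_error_rate(num_errors, sample_size, z=1.96):
    """Estimate the annotation error rate from a random audit sample,
    with a Wald (normal-approximation) 95% confidence interval.
    """
    p = num_errors / sample_size
    half_width = z * math.sqrt(p * (1 - p) / sample_size)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# e.g. 12 errors found when spot-checking 400 randomly sampled items
rate, lo, hi = audit_error_rate(12, 400)
print(f"error rate ~ {rate:.1%}, 95% CI [{lo:.1%}, {hi:.1%}]")
```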
Efficient Diffusion Models without Attention
Transformers have demonstrated impressive performance on class-conditional ImageNet benchmarks, achieving state-of-the-art FID scores. However, their computational complexity increases with transformer depth/width or the...
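To make the scaling concern concrete, a back-of-envelope sketch of how the attention matmuls alone grow with token count (MLP blocks and constants omitted; numbers purely illustrative):

```python
def self_attention_flops(seq_len, dim, depth):
    """Rough FLOPs for the attention matmuls: QK^T and AV each cost
    ~2 * seq_len^2 * dim multiply-adds per layer, so cost grows
    quadratically in the number of tokens."""
    return depth * 2 * (2 * seq_len ** 2 * dim)

# Doubling the token count roughly quadruples attention cost.
print(self_attention_flops(1024, 768, 12))  # ~3.9e10
print(self_attention_flops(2048, 768, 12))  # ~1.5e11
```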
Affine-based Deformable Attention and Selective Fusion for Semi-dense Matching
This paper was accepted at the Image Matching: Local Features & Beyond workshop at CVPR 2024.
Identifying robust and accurate correspondences across images is...
ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models
Modern diffusion-based image generative models have made significant progress and show promise for enriching training data for the object detection task. However, the...
Probabilistic Speech-Driven 3D Facial Motion Synthesis: New Benchmarks, Methods, and Applications
We consider the task of animating 3D facial geometry from a speech signal. Existing works are primarily deterministic, focusing on learning a one-to-one mapping...
KPConvX: Modernizing Kernel Point Convolution with Kernel Attention
In the field of deep point cloud understanding, KPConv is a unique architecture that uses kernel points to locate convolutional weights in space,...
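For readers unfamiliar with the mechanism, a minimal single-point sketch of a KPConv-style convolution, following the published formulation of linear kernel-point influence; tensor shapes and argument names here are illustrative, not the library's API:

```python
import torch

def kernel_point_conv(neighbor_xyz, neighbor_feats, kernel_points, weights, sigma=0.3):
    """KPConv-style convolution for one output point.

    neighbor_xyz:   (n, 3) neighbor coordinates relative to the center point
    neighbor_feats: (n, c_in) neighbor features
    kernel_points:  (k, 3) fixed kernel point positions in the same local frame
    weights:        (k, c_in, c_out) one weight matrix per kernel point
    """
    # (n, k) distances from each neighbor to each kernel point.
    dists = torch.cdist(neighbor_xyz, kernel_points)
    # Linear correlation that decays with distance (zero beyond sigma).
    influence = torch.clamp(1.0 - dists / sigma, min=0.0)   # (n, k)
    # Aggregate neighbor features per kernel point, then apply that
    # kernel point's weight matrix and sum.
    per_kernel = influence.t() @ neighbor_feats              # (k, c_in)
    return torch.einsum('ki,kio->o', per_kernel, weights)    # (c_out,)
```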
Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models
Vision Foundation Models (VFMs) pretrained on massive datasets exhibit impressive performance on various downstream tasks, especially with limited labeled target data. However, due...
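One common vehicle for such teacher-to-student transfer is logit distillation; below is a minimal sketch of the classic softened-KL recipe, which is illustrative only and not necessarily the transfer objective used in the paper:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Match the student to a frozen teacher via KL divergence over
    temperature-softened distributions (Hinton-style distillation)."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    # The t^2 factor keeps gradient magnitudes comparable to a
    # hard-label cross-entropy term.
    return F.kl_div(log_student, soft_teacher, reduction='batchmean') * t * t
```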
KV-Runahead: Scalable Causal LLM Inference by Parallel Key-Value Cache Generation
Large language model (LLM) inference has two phases: the prompt (or prefill) phase, which produces the first token, and the extension (or...
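A toy sketch of why this split matters: prefill populates the key-value cache in one pass over the prompt, and each extension step then appends a single key/value pair instead of recomputing attention over the whole sequence. Shapes are arbitrary and random tensors stand in for real Q/K/V projections:

```python
import torch

def attend(q, k_cache, v_cache):
    """Attention for the newest query against all cached keys/values."""
    scores = (q @ k_cache.transpose(-2, -1)) / (q.size(-1) ** 0.5)
    return torch.softmax(scores, dim=-1) @ v_cache

# Prefill: process the whole prompt once and populate the cache.
k_cache = torch.randn(1, 8, 64)  # (batch, prompt_len, head_dim), toy shapes
v_cache = torch.randn(1, 8, 64)

# Extension: each new token appends one key/value pair to the cache.
for _ in range(3):
    q = torch.randn(1, 1, 64)
    new_k, new_v = torch.randn(1, 1, 64), torch.randn(1, 1, 64)
    k_cache = torch.cat([k_cache, new_k], dim=1)
    v_cache = torch.cat([v_cache, new_v], dim=1)
    out = attend(q, k_cache, v_cache)
```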