Moonwalk: Advancing Gait-Based User Recognition on Wearable Devices with Metric Learning
*=Equal Contributors
Personal devices have adopted diverse authentication methods, including biometric recognition and passcodes. In contrast, headphones have limited input mechanisms, depending solely on...
Vision-Based Hand Gesture Customization from a Single Demonstration
Hand gesture recognition is becoming a more prevalent mode of human-computer interaction, especially as cameras proliferate across everyday devices. Despite continued progress in...
Merge Vision Foundation Models via Multi-Task Distillation
As the repository of publicly available pre-trained vision foundation models (VFMs) — such as CLIP, DINOv2, and SAM — grows, users face challenges...
Humanizing Word Error Rate for ASR Transcript Readability and Accessibility
VeCLIP: Improving CLIP Training via Visual-enriched Captions
Large-scale web-crawled datasets are fundamental for the success of pre-training vision-language models, such as CLIP. However, the inherent noise and potential...
Privacy-Preserving Quantile Treatment Effect Estimation for Randomized Controlled Trials
In accordance with the principle of "data minimization," many internet companies are opting to record less data. However, this is often at odds...
SynthDST: Synthetic Data is All You Need for Few-Shot Dialog State Tracking
In-context learning with Large Language Models (LLMs) has emerged as a promising avenue of research in Dialog State Tracking (DST). However, the best-performing...
What Can CLIP Learn From Task-specific Experts?
This paper has been accepted to the UniReps Workshop at NeurIPS 2023.
Contrastive language image pretraining has become the standard approach for training vision...
Multichannel Voice Trigger Detection Based on Transform-average-concatenate
This paper was accepted at the HSCMA workshop at ICASSP 2024.
Voice triggering (VT) enables users to activate their devices by just speaking a...
Efficient ConvBN Blocks for Transfer Learning and Beyond
Convolution-BatchNorm (ConvBN) blocks are integral components in various computer vision tasks and other domains. A ConvBN block can operate in three modes: Train,...