Machine Learning

Merge Vision Foundation Models via Multi-Task Distillation

Machine Learning March 11, 2024

As the repository of publicly available pre-trained vision foundation models (VFMs) — such as CLIP, DINOv2, and SAM — grows, users face challenges...

Humanizing Word Error Rate for ASR Transcript Readability and Accessibility

Machine Learning March 7, 2024

VeCLIP: Improving CLIP Training via Visual-enriched Captions

Machine Learning March 6, 2024

Paper abstract: Large-scale web-crawled datasets are fundamental for the success of pre-training vision-language models, such as CLIP. However, the inherent noise and potential...

Privacy-Preserving Quantile Treatment Effect Estimation for Randomized Controlled Trials

Machine Learning March 5, 2024

In accordance with the principle of "data minimization," many internet companies are opting to record less data. However, this is often at odds...

SynthDST: Synthetic Data is All You Need for Few-Shot Dialog State Tracking

Machine Learning March 4, 2024

In-context learning with Large Language Models (LLMs) has emerged as a promising avenue of research in Dialog State Tracking (DST). However, the best-performing...

What Can CLIP Learn From Task-specific Experts?

Machine Learning March 4, 2024

This paper has been accepted to the UniReps Workshop in NeurIPS 2023. Contrastive language image pretraining has become the standard approach for training vision...

Multichannel Voice Trigger Detection Based on Transform-average-concatenate

Machine Learning February 26, 2024

This paper was accepted at the workshop HSCMA at ICASSP 2024. Voice triggering (VT) enables users to activate their devices by just speaking a...

Efficient ConvBN Blocks for Transfer Learning and Beyond

Machine Learning February 20, 2024

Convolution-BatchNorm (ConvBN) blocks are integral components in various computer vision tasks and other domains. A ConvBN block can operate in three modes: Train,...

Keyframer: Empowering Animation Design using Large Language Models

Machine Learning February 20, 2024

Large language models (LLMs) have the potential to impact a wide range of creative domains, as exemplified in popular text-to-image generators like DALL·E...

Resource-constrained Stereo Singing Voice Cancellation

Machine Learning February 13, 2024

We study the problem of stereo singing voice cancellation, a subtask of music source separation, whose goal is to estimate an instrumental background...
