VeCLIP: Improving CLIP Training via Visual-enriched Captions
Large-scale web-crawled datasets are fundamental for the success of pre-training vision-language models, such as CLIP. However, the inherent noise and potential...
Revisit Large-Scale Image–Caption Data in Pre-training Multimodal Foundation Models
Recent advancements in multimodal models highlight the value of rewritten captions for improving performance, yet key challenges remain. Notably, the role of synthetic...
Disentangled Representational Learning with the Gromov-Monge Gap
Learning disentangled representations from unlabelled data is a fundamental challenge in machine learning. Solving it may unlock other problems, such as generalization, interpretability,...
ELEGNT: Expressive and Functional Movement Design for Non-Anthropomorphic Robot
Nonverbal behaviors such as posture, gestures, and gaze are essential for conveying internal states, both consciously and unconsciously, in human interaction. For robots...
Pseudo-Generalized Dynamic View Synthesis from a Video
Rendering scenes observed in a monocular video from novel viewpoints is a challenging problem. For static scenes the community has studied both...
Construction of Paired Knowledge Graph – Text Datasets Informed by Cyclic Evaluation
Datasets that pair Knowledge Graphs (KG) and text together (KG-T) can be used to train forward and reverse neural models that generate text...
Transfer Learning for Structured Pruning under Limited Task Data
This paper was accepted at the Efficient Natural Language and Speech Processing (ENLSP-III) Workshop at NeurIPS.
Large, pre-trained models are problematic to use in...
Acoustic Model Fusion for End-to-end Speech Recognition
Recent advances in deep learning and automatic speech recognition (ASR) have enabled the end-to-end (E2E) ASR system and boosted its accuracy to a...
Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences
On-device machine learning (ML) promises to improve the privacy, responsiveness, and proliferation of new, intelligent user experiences by moving ML computation onto everyday...
FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
This paper was accepted at the Workshop on Foundation Models in the Wild at ICLR 2025.
Visual understanding is inherently contextual - what we...