Frequency-Aware Masked Autoencoders for Multimodal Pretraining on Biosignals
Inspired by the advancements in foundation models for language-vision modeling, we explore the utilization of transformers and large-scale pretraining on biosignals. In this...
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into...
A Multi-signal Large Language Model for Device-directed Speech Detection
We present an architecture for device-directed speech detection that treats the task as a text-generation problem. We use a multi-modal fusion approach that...
Towards a World-English Language Model
Neural Network Language Models (NNLMs) of Virtual Assistants (VAs) are generally language-, region-, and in some cases, device-dependent, which increases the effort to...
Streaming Anchor Loss: Augmenting Supervision with Temporal Significance
Streaming neural network models for fast frame-wise responses to various speech and sensory signals are widely adopted on resource-constrained platforms. Hence, increasing the...
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Contrastive pretraining of image-text foundation models, such as CLIP, demonstrated excellent zero-shot performance and improved robustness on a wide range of downstream...
International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization
Existing vision-language models exhibit strong generalization on a variety of visual domains and tasks. However, such models mainly perform zero-shot recognition in a...
Hierarchical and Dynamic Prompt Compression for Efficient Zero-shot API Usage
Long prompts present a significant challenge for practical LLM-based systems that need to operate with low latency and limited resources. We investigate prompt...
Hindsight PRIORs for Reward Learning from Human Preferences
Preference-based Reinforcement Learning (PbRL) has shown great promise in learning from binary human preference feedback on an agent's trajectory behaviors, where one of...