Comparative Analysis of Personalized Voice Activity Detection Systems: Assessing Real-World Effectiveness
Voice activity detection (VAD) is a critical component in various applications such as speech recognition, speaker identification, and hands-free communication systems. With the...
Conformer-Based Speech Recognition on Extreme Edge-Computing Devices
This paper was accepted at the Industry Track at NAACL 2024.
With increasingly more powerful compute capabilities and resources in today’s devices, traditionally compute-intensive...
AGRaME: Any Granularity Ranking with Multi-Vector Embeddings
Ranking is a fundamental and popular problem in search. However, existing ranking algorithms usually restrict the granularity of ranking to full passages or...
Time Sensitive Knowledge Editing through Efficient Finetuning
Large Language Models (LLMs) have demonstrated impressive capability in different tasks and are bringing transformative changes to many domains. However, keeping the knowledge...
Transformer-based Model for ASR N-Best Rescoring and Rewriting
Voice assistants increasingly use on-device Automatic Speech Recognition (ASR) to ensure speed and privacy. However, due to resource constraints on the device, queries...
Evaluating the IWSLT2023 Speech Translation Tasks: Human Annotations, Automatic Metrics, and Segmentation
Human evaluation is a critical component in machine translation system development and has received much attention in text translation research. However, little prior...
Improved Modelling of Federated Datasets using Mixtures-of-Dirichlet-Multinomials
In practice, training using federated learning can be orders of magnitude slower than standard centralized training. This severely limits the amount of experimentation...
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2024
Introducing Apple’s On-Device and Server Foundation Models
Embedding Pose Graph, Enabling 3D Foundation Model Capabilities with a Compact Representation
This paper presents the Embedding Pose Graph (EPG), an innovative method that combines the strengths of foundation models with a simple 3D representation...