Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Recent advancements in multimodal large language models (MLLMs) have been noteworthy, yet these general-domain MLLMs often fall short in their ability to comprehend...
Retrieval-Augmented Correction of Named Entity Speech Recognition Errors
In recent years, end-to-end automatic speech recognition (ASR) systems have proven themselves remarkably accurate and performant, but these systems still have a significant...
Automated Code Fix Suggestions for Accessibility Issues in Mobile Apps
Accessibility is crucial for inclusive app usability, yet developers often struggle to identify and fix app accessibility issues due to a lack of...
European Conference on Computer Vision (ECCV) 2024
Speculative Streaming: Fast LLM Inference Without Auxiliary Models
Speculative decoding is a prominent technique to speed up the inference of a large target language model based on predictions of an auxiliary...
Contextualization of ASR with LLM Using Phonetic Retrieval-Based Augmentation
Large language models (LLMs) have shown a superb capability for modeling multimodal signals including audio and text, allowing the model to generate spoken or...
Compress and Compare: Interactively Evaluating Efficiency and Behavior Across ML Model Compression Experiments
*Equal Contributors
To deploy machine learning models on-device, practitioners use compression algorithms to shrink and speed up models while maintaining their high-quality output. A...
Generalizable Error Modeling for Human Data Annotation: Evidence from an Industry-Scale Search Data Annotation...
Machine learning (ML) and artificial intelligence (AI) systems rely heavily on human-annotated data for training and evaluation. A major challenge in this context...
Misty: UI Prototyping Through Interactive Conceptual Blending
UI prototyping often involves iterating and blending elements from examples such as screenshots and sketches, but current tools offer limited support for incorporating...
Optimizing Byte-level Representation for End-to-End ASR
This paper was accepted at the IEEE Spoken Language Technology Workshop (SLT) 2024.
In this paper, we propose an algorithm to optimize a byte-level...