Towards Low-Bit Communication for Tensor Parallel LLM Inference
This paper was accepted at the Efficient Natural Language and Speech Processing (ENLSP) Workshop at NeurIPS 2024.
Tensor parallelism provides an effective way to...
Recurrent Drafter for Fast Speculative Decoding in Large Language Models
We present Recurrent Drafter (ReDrafter), an advanced speculative decoding approach that achieves state-of-the-art speedup for large language model (LLM) inference. The performance gains...
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization
This paper was accepted at the Efficient Natural Language and Speech Processing (ENLSP) Workshop at NeurIPS 2024.
The pre-training phase of language models often...
Optimizing Contextual Speech Recognition Using Vector Quantization for Efficient Retrieval
Neural contextual biasing allows speech recognition models to leverage contextually relevant information, leading to improved transcription accuracy. However, the biasing mechanism is typically...
Device-Directed Speech Detection for Follow-up Conversations Using Large Language Models
This paper was accepted at the Adaptive Foundation Models (AFM) Workshop at NeurIPS 2024.
Follow-up conversations with virtual assistants (VAs) enable a user...
Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP
Large pretrained vision-language models like CLIP have shown promising generalization capability, but may struggle in specialized domains (e.g., satellite imagery) or fine-grained classification...
Empirical Methods in Natural Language Processing (EMNLP) 2024
Computational Bottlenecks of Training Small-Scale Large Language Models
This paper was accepted at the Efficient Natural Language and Speech Processing (ENLSP) Workshop at NeurIPS 2024.
While large language models (LLMs) dominate...
On Device Llama 3.1 with Core ML
Many app developers are interested in building on-device experiences that integrate increasingly capable large language models (LLMs). Running these models locally on...
Towards Cross-Cultural Machine Translation with Retrieval-Augmented Generation from Multilingual Knowledge Graphs
Translating text that contains entity names is a challenging task, as culture-related references can vary significantly across languages. These variations may also be...