Article Assisted Generation: a new direction toward low-latency text generation • May 11, 2023 • 30
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation Paper • 2408.02545 • Published Aug 5 • 33
Article Training and Finetuning Embedding Models with Sentence Transformers v3 • May 28 • 158
Accelerating Speculative Decoding using Dynamic Speculation Length Paper • 2405.04304 • Published May 7 • 2
Distributed Speculative Inference of Large Language Models Paper • 2405.14105 • Published May 23 • 16
Article Building Cost-Efficient Enterprise RAG applications with Intel Gaudi 2 and Intel Xeon • May 9 • 11
Improving Classification Performance With Human Feedback: Label a few, we label the rest Paper • 2401.09555 • Published Jan 17 • 6
H₂O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models Paper • 2306.14048 • Published Jun 24, 2023 • 11