	---
	title: Llmlingua 2
	emoji: 💻
	colorFrom: red
	colorTo: green
	sdk: gradio
	sdk_version: 4.21.0
	app_file: app.py
	pinned: false
	license: cc-by-nc-sa-4.0
	---

	<!-- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference -->

	LLMLingua-2 is part of the following project:

	# LLMLingua Series | Effectively Deliver Information to LLMs via Prompt Compression
	| [Project Page](https://llmlingua.com/) | [LLMLingua](https://aclanthology.org/2023.emnlp-main.825/) | [LongLLMLingua](https://arxiv.org/abs/2310.06839) | [LLMLingua-2](https://arxiv.org/abs/2403.12968) | [LLMLingua Demo](https://huggingface.co/spaces/microsoft/LLMLingua) | [LLMLingua-2 Demo](https://huggingface.co/spaces/microsoft/LLMLingua-2) |

	Check the links above for more information!

	## Brief Introduction 📚

	LLMLingua utilizes a compact, well-trained language model (e.g., GPT2-small, LLaMA-7B) to identify and remove non-essential tokens in prompts. This approach enables efficient inference with large language models (LLMs), achieving up to 20x compression with minimal performance loss.
	- [LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models](https://aclanthology.org/2023.emnlp-main.825/) (EMNLP 2023)<br>
	_Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang and Lili Qiu_

	LongLLMLingua mitigates the 'lost in the middle' issue in LLMs, enhancing long-context information processing. It reduces costs and boosts efficiency with prompt compression, improving RAG performance by up to 21.4% using only 1/4 of the tokens.
	- [LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression](https://arxiv.org/abs/2310.06839) (ICLR ME-FoMo 2024)<br>
	_Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang and Lili Qiu_

	LLMLingua-2 is a small yet powerful prompt compression method. It is trained via data distillation from GPT-4 to perform token classification with a BERT-level encoder, and it excels at task-agnostic compression. It surpasses LLMLingua on out-of-domain data and runs 3x-6x faster.
	- [LLMLingua-2: Context-Aware Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression](https://arxiv.org/abs/2403.12968) (Under Review)<br>
	_Zhuoshi Pan, Qianhui Wu, Huiqiang Jiang, Menglin Xia, Xufang Luo, Jue Zhang, Qingwei Lin, Victor Ruhle, Yuqing Yang, Chin-Yew Lin, H. Vicky Zhao, Lili Qiu, Dongmei Zhang_
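
To give a feel for compression framed as token classification, here is a toy sketch in plain Python. It is not the LLMLingua-2 implementation: the function name `compress_tokens`, the `rate` parameter as used here, and the per-token keep probabilities are all illustrative assumptions. In the actual method those probabilities would come from the trained encoder; here they are hard-coded by hand.

```python
import math


def compress_tokens(tokens, keep_probs, rate):
    """Keep about `rate` of the tokens, preferring high keep probability.

    Hypothetical helper for illustration only. `keep_probs[i]` stands in for
    a classifier's estimate that `tokens[i]` should be preserved.
    """
    if len(tokens) != len(keep_probs):
        raise ValueError("tokens and keep_probs must have the same length")
    if not 0.0 < rate <= 1.0:
        raise ValueError("rate must be in (0, 1]")

    # Number of tokens to retain (at least one for a non-empty input).
    n_keep = max(1, math.ceil(rate * len(tokens)))

    # Rank positions by descending keep probability; ties favor earlier tokens.
    ranked = sorted(range(len(tokens)), key=lambda i: (-keep_probs[i], i))

    # Restore the original order so the compressed text stays readable.
    kept = sorted(ranked[:n_keep])
    return [tokens[i] for i in kept]


# Made-up example input: the probabilities below are invented, not model output.
tokens = ["Please", "note", "that", "the", "meeting", "starts", "at", "9", "am"]
keep_probs = [0.10, 0.20, 0.05, 0.15, 0.95, 0.90, 0.40, 0.99, 0.85]

compressed = compress_tokens(tokens, keep_probs, rate=0.5)
print(" ".join(compressed))  # meeting starts at 9 am
```

Selecting tokens by rank and then re-sorting by position keeps the surviving words in their original order, which is what lets the compressed prompt remain a faithful subsequence of the input.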