title: LLMLingua
emoji: π
colorFrom: red
colorTo: yellow
sdk: gradio
sdk_version: 3.47.1
app_file: app.py
pinned: false
license: mit
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models & LongLLMLingua
| LLMLingua Paper | LongLLMLingua Paper | HF Space Demo |
TL;DR
LLMLingua uses a well-trained small language model after alignment, such as GPT2-small or LLaMA-7B, to detect unimportant tokens in the prompt and to enable inference with the compressed prompt in black-box LLMs, achieving up to 20x compression with minimal performance loss.
LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models (EMNLP 2023).
Huiqiang Jiang, Qianhui Wu, Chin-Yew Lin, Yuqing Yang and Lili Qiu
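To make the idea of "detecting unimportant tokens" concrete, here is a toy sketch. It is not the LLMLingua implementation. It assumes a smoothed unigram frequency model as a stand-in for the small language model, and whitespace splitting as a stand-in for a real tokenizer. The names `token_surprisal`, `compress_prompt_sketch`, and `keep_ratio` are hypothetical and chosen for illustration only. Tokens the stand-in model finds predictable (low surprisal) are dropped first, and the surviving tokens keep their original order.

```python
import math
from collections import Counter


def token_surprisal(tokens, reference_tokens):
    """Surprisal (in nats) of each token under an add-one-smoothed unigram model.

    Assumption: the unigram model stands in for the small causal LM that
    would score tokens in a real prompt-compression pipeline.
    """
    counts = Counter(reference_tokens)
    vocab_size = len(set(reference_tokens) | set(tokens))
    total = len(reference_tokens) + vocab_size
    return [-math.log((counts[t] + 1) / total) for t in tokens]


def compress_prompt_sketch(prompt, reference_text, keep_ratio=0.5):
    """Keep the most surprising tokens, preserving their original order."""
    tokens = prompt.split()
    if not tokens:
        return ""
    scores = token_surprisal(tokens, reference_text.split())
    n_keep = max(1, math.ceil(len(tokens) * keep_ratio))
    # Rank positions by descending surprisal; ties go to the earlier position.
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))
    kept = sorted(ranked[:n_keep])  # restore original token order
    return " ".join(tokens[i] for i in kept)


# Hypothetical reference text: common function words the model has "seen" often.
reference = "the the the of of a a is is to to and and in in that that"
prompt = "the answer to the question is that Paris is the capital of France"
compressed = compress_prompt_sketch(prompt, reference, keep_ratio=0.5)
print(compressed)  # answer to question is Paris capital France
```

In this toy run, 13 tokens shrink to 7, and the words carrying the answer survive. A real compressor would score tokens with a language model conditioned on the preceding context, which this sketch does not attempt.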
LongLLMLingua is a method that enhances LLMs' ability to perceive key information in long-context scenarios using prompt compression, achieving up to $28.5 in cost savings per 1,000 samples while also improving performance.
LongLLMLingua: Accelerating and Enhancing LLMs in Long Context Scenarios via Prompt Compression (Under Review).
Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang and Lili Qiu