---
extra_gated_heading: Access Llama 2 on Hugging Face
extra_gated_description: >-
  This is a form to enable access to Llama 2 on Hugging Face after you have been
  granted access from Meta. Please visit the [Meta
  website](https://ai.meta.com/resources/models-and-libraries/llama-downloads)
  and accept our license terms and acceptable use policy before submitting this
  form. Requests will be processed in 1-2 days.
extra_gated_button_content: Submit
extra_gated_fields:
  I agree to share my name, email address and username with Meta and confirm that I have already been granted download access on the Meta website: checkbox
language:
  - en
pipeline_tag: text-generation
inference: false
tags:
  - facebook
  - meta
  - pytorch
  - llama
  - llama-2
---

# Custom handler for HF Inference Endpoint for LLMLingua

## LLMLingua

https://github.com/microsoft/LLMLingua

https://llmlingua.com/

> To speed up LLMs' inference and enhance LLM's perceive of key information, compress the prompt and KV-Cache, which achieves up to 20x compression with minimal performance loss

## Model: NousResearch/Llama-2-7b-hf

https://huggingface.co/NousResearch/Llama-2-7b-hf

## Inference Endpoint Configuration

Task: Custom

Container Type: Default

Instance Type: GPU Nvidia A10G 24 GB

## Usage

### Sample payload

```json
{
  "inputs": "A long prompt to optimize for the LLM",
  "parameters": {
    "instruction": "",
    "question": "",
    "target_token": 200,
    "context_budget": "*1.5",
    "iterative_size": 100
  }
}
```

Prompt sample text: https://raw.githubusercontent.com/FranxYao/chain-of-thought-hub/main/gsm8k/lib_prompt/prompt_hardest.txt

### Expected output

```json
{
  "compressed_prompt": "Question: Sam bought a dozen boxes, each with 30 highlighter pens inside, for $10 each. He reanged five of boxes into packages of sixlters each and sold them $3 per. He sold the rest theters separately at the of three pens $2. How much did make in total, dollars?\nLets think step step\nSam bought 1 boxes x00 oflters.\nHe bought 12 00ters in total\nSam then took5 boxes 6ters0ters\nHe sold these boxes for 5 *5\nAfterelling these boxes there were 30330ters remaining\nese form 330 /30 of three\n sold each for2 each, so made * =0 from\n total, he0 $15\nSince his original1 he earned $120 = $115 in profit.\nThe answer is 115",
  "origin_tokens": 2365,
  "compressed_tokens": 174,
  "ratio": "13.6x",
  "saving": ", Saving $0.1 in GPT-4."
}
```
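
### Example client (sketch)

The sample payload above can be sent from Python roughly as follows. This is a minimal sketch, not the handler's official client: `ENDPOINT_URL` and `HF_TOKEN` are hypothetical placeholders, the helper names `build_payload`, `compress`, and `compression_ratio` are made up for illustration, and bearer-token authentication is assumed for a protected endpoint. The parameter defaults simply mirror the sample payload.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your own endpoint URL and access token.
ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"
HF_TOKEN = "<your-token>"


def build_payload(prompt, instruction="", question="", target_token=200,
                  context_budget="*1.5", iterative_size=100):
    """Build a request body shaped like the sample payload in this README."""
    return {
        "inputs": prompt,
        "parameters": {
            "instruction": instruction,
            "question": question,
            "target_token": target_token,
            "context_budget": context_budget,
            "iterative_size": iterative_size,
        },
    }


def compress(prompt, url=ENDPOINT_URL, token=HF_TOKEN, **parameters):
    """POST the payload to the endpoint and return the decoded JSON response."""
    body = json.dumps(build_payload(prompt, **parameters)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={
            # Assumption: the endpoint is protected and expects a bearer token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def compression_ratio(result):
    """Recompute the ratio from the token counts in a response dictionary."""
    return result["origin_tokens"] / result["compressed_tokens"]
```

With a real URL and token filled in, `compress(long_prompt)` should return a dictionary shaped like the expected output above. For that sample, `compression_ratio` gives 2365 / 174, which rounds to the reported `13.6x`.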