---
language: en
tags:
- rag
- context-compression
- gemma
license: apache-2.0
datasets:
- hotpotqa
base_model:
- google/gemma-2b-it
---

# EXIT: Context-Aware Extractive Compression for RAG

EXIT is a context-aware extractive compression model that improves both the efficiency and the effectiveness of Retrieval-Augmented Generation (RAG) by selecting the sentences relevant to a query while preserving their contextual dependencies.

[[Paper]](https://arxiv.org/abs/2412.12559) [[GitHub]](https://github.com/ThisIsHwang/EXIT)

## Model Description

EXIT is designed to:
- Compress retrieved documents while preserving critical information
- Consider the full document context when evaluating sentence importance
- Enable parallelizable, context-aware extraction (a batched sketch follows the Quickstart below)
- Adapt dynamically to query complexity
- Balance compression ratio and answer accuracy

## Task and Intended Use

EXIT is trained to classify each sentence as relevant or irrelevant for answering a query, based on both the sentence's content and its surrounding context. It is specifically designed for:

- RAG context compression
- Open-domain question answering
- Both single-hop and multi-hop queries

## Quickstart

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import spacy

# 1. Load the base model and apply the EXIT LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b-it",
    device_map="auto",
    torch_dtype=torch.float16,
)
exit_model = PeftModel.from_pretrained(
    base_model,
    "doubleyyh/exit-gemma-2b",
)
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")

# 2. Initialize a lightweight sentence splitter (only senter is needed)
nlp = spacy.load("en_core_web_sm", disable=[
    "tok2vec", "tagger", "parser", "attribute_ruler",
    "lemmatizer", "ner"
])
nlp.enable_pipe("senter")

# 3. Input
query = "How do solid-state drives (SSDs) improve computer performance?"
context = """
Solid-state drives use flash memory to store data without moving parts.
Unlike traditional hard drives, SSDs have no mechanical components.
The absence of physical movement allows for much faster data access speeds.
I bought my computer last week.
SSDs significantly reduce boot times and application loading speeds.
They consume less power and are more reliable than mechanical drives.
The price of SSDs has decreased significantly in recent years.
"""

# 4. Score one sentence: compare the "Yes" and "No" logits at the first
#    generated position and keep the sentence if P("Yes") >= tau
def get_relevance(query: str, context: str, sentence: str, tau: float = 0.5) -> bool:
    prompt = f'''<start_of_turn>user
Query:
{query}
Full context:
{context}
Sentence:
{sentence}
Is this sentence useful in answering the query? Answer only "Yes" or "No".<end_of_turn>
<start_of_turn>model
'''
    inputs = tokenizer(prompt, return_tensors="pt").to(exit_model.device)

    with torch.no_grad():
        outputs = exit_model(**inputs)

    yes_id = tokenizer.encode("Yes", add_special_tokens=False)[0]
    no_id = tokenizer.encode("No", add_special_tokens=False)[0]
    logits = outputs.logits[0, -1, [yes_id, no_id]]
    prob = torch.softmax(logits, dim=0)[0].item()

    return prob >= tau

# 5. Compress the document sentence by sentence
sentences = [sent.text.strip() for sent in nlp(context).sents if sent.text.strip()]
compressed = [sent for sent in sentences if get_relevance(query, context, sent)]
compressed_text = " ".join(compressed)

print(f"Compressed text ({len(compressed)}/{len(sentences)} sentences):", compressed_text)
```
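
With the default tau of 0.5, the off-topic sentence ("I bought my computer last week.") should typically be dropped while the SSD-related sentences are kept, though exact behavior depends on the checkpoint and threshold.

Because every sentence is scored against the same query and full context, the per-sentence calls above are independent and can share one forward pass. A minimal batched sketch, assuming the tokenizer defines a pad token (Gemma's does); `score_batch` is an illustrative helper, not part of the EXIT release:

```python
def score_batch(query: str, context: str, sentences: list[str], tau: float = 0.5) -> list[bool]:
    # One prompt per sentence, same format as get_relevance above
    prompts = [
        f'<start_of_turn>user\nQuery:\n{query}\nFull context:\n{context}\n'
        f'Sentence:\n{s}\nIs this sentence useful in answering the query? '
        f'Answer only "Yes" or "No".<end_of_turn>\n<start_of_turn>model\n'
        for s in sentences
    ]
    # Left padding keeps every row's final position on its last prompt token
    tokenizer.padding_side = "left"
    inputs = tokenizer(prompts, return_tensors="pt", padding=True).to(exit_model.device)
    with torch.no_grad():
        logits = exit_model(**inputs).logits[:, -1, :]
    yes_id = tokenizer.encode("Yes", add_special_tokens=False)[0]
    no_id = tokenizer.encode("No", add_special_tokens=False)[0]
    probs = torch.softmax(logits[:, [yes_id, no_id]], dim=-1)[:, 0]
    return (probs >= tau).tolist()

compressed = [s for s, keep in zip(sentences, score_batch(query, context, sentences)) if keep]
```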

## Training Data

The model was trained on the HotpotQA dataset using three kinds of examples (a construction sketch follows the list):
- Positive examples: sentences marked as supporting facts
- Hard negatives: sentences from the same documents that are not supporting facts
- Random negatives: sentences from unrelated documents
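
A hedged sketch of this labeling scheme, assuming the Hugging Face `hotpot_qa` dataset (`distractor` configuration) and its standard schema; the paper's exact sampling procedure may differ:

```python
from datasets import load_dataset

ds = load_dataset("hotpot_qa", "distractor", split="train")

def label_sentences(example):
    # Supporting facts are (document title, sentence index) pairs
    gold = set(zip(example["supporting_facts"]["title"],
                   example["supporting_facts"]["sent_id"]))
    gold_titles = {title for title, _ in gold}
    rows = []
    for title, sents in zip(example["context"]["title"],
                            example["context"]["sentences"]):
        for i, sent in enumerate(sents):
            if (title, i) in gold:
                kind = "positive"         # supporting fact
            elif title in gold_titles:
                kind = "hard_negative"    # same document, not a supporting fact
            else:
                kind = "random_negative"  # unrelated (distractor) document
            rows.append({"question": example["question"],
                         "sentence": sent, "kind": kind})
    return rows

print(label_sentences(ds[0])[:3])
```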

## Parameters

- Base model: Gemma-2b-it (`google/gemma-2b-it`)
- Training method: PEFT/LoRA
- Recommended tau threshold: 0.5 (a threshold sweep is sketched below)
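
tau trades compression for recall: a higher threshold keeps fewer sentences. A minimal sketch of sweeping the threshold, reusing `query`, `context`, `nlp`, and `get_relevance` from the Quickstart (each call re-runs the model, so this is an illustration, not an efficient implementation):

```python
sentences = [sent.text.strip() for sent in nlp(context).sents if sent.text.strip()]
for tau in (0.3, 0.5, 0.7):
    kept = [s for s in sentences if get_relevance(query, context, s, tau=tau)]
    print(f"tau={tau}: kept {len(kept)}/{len(sentences)} sentences")
```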

## Limitations

- Currently optimized for English text only
- No support for cross-lingual compression

## Citation

```bibtex
@article{hwang2024exit,
  title={EXIT: Context-Aware Extractive Compression for Enhancing Retrieval-Augmented Generation},
  author={Hwang, Taeho and Cho, Sukmin and Jeong, Soyeong and Song, Hoyun and Han, SeungYoon and Park, Jong C.},
  journal={arXiv preprint arXiv:2412.12559},
  year={2024}
}
```