---
base_model: microsoft/phi-2
library_name: peft
license: mit
tags:
- text-generation
pipeline_tag: text-generation
datasets:
- NuclearAi/HyperThink-Mini-50K
---
# phi2-memory-deeptalks
A **LoRA adapter** for the Phi-2 language model, fine-tuned on short conversational snippets to provide **short-term memory** in dialogue. This adapter enables your assistant to recall and leverage the last few user/assistant turns—without full fine-tuning of the 2.7 B-parameter base model.
<p align="center">
<a href="https://huggingface.co/spaces/sourize/DeepTalks">
🔗 Live Demo on Hugging Face Spaces
</a>
</p>
<p align="center">
⏳ Responses take a while to generate because the demo runs on the free CPU tier
</p>
---
## 🚀 Overview
**phi2-memory-deeptalks** injects lightweight, low-rank corrections into the attention and MLP layers of `microsoft/phi-2`.
- **Size:** ~6 M trainable parameters (≈ 0.2 % of the base model)
- **Base:** Phi-2 (2.7 B parameters)
- **Adapter:** Low-Rank Adaptation (LoRA) via the [PEFT](https://github.com/huggingface/peft) library
---
## 📦 Model Details
### Architecture & Adapter Configuration
- **Base model:** `microsoft/phi-2` (causal-LM)
- **LoRA rank (r):** 4
- **Modules wrapped:**
- Attention projections: `q_proj`, `k_proj`, `v_proj`, `dense`
- MLP layers: `fc1`, `fc2`
- **LoRA hyperparameters:**
- `lora_alpha`: 32
- `lora_dropout`: 0.05
- **Trainable params:** ~5.9 M
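For reference, the configuration above corresponds roughly to the following PEFT `LoraConfig`. This is a sketch: the authoritative values ship in `adapter_config.json` in this repo, and `bias` / `task_type` are assumed defaults not stated on this card.

```python
from peft import LoraConfig

# Sketch of the adapter configuration described above.
# Authoritative values live in adapter_config.json in this repo.
lora_config = LoraConfig(
    r=4,                      # LoRA rank
    lora_alpha=32,            # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense", "fc1", "fc2"],
    bias="none",              # assumed default
    task_type="CAUSAL_LM",    # Phi-2 is a causal LM
)
```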
### Training Data & Preprocessing
- **Dataset:** [NuclearAi/HyperThink-Mini-50K](https://huggingface.co/datasets/NuclearAi/HyperThink-Mini-50K) (~7 % of the dataset used for fine-tuning)
- **Prompt format:**
```text
### Human:
<user message>
### Assistant:
<assistant response>
```
- **Tokenization:** Truncated/padded to 256 tokens, `labels = input_ids`
- **Optimizer:** AdamW (PyTorch), FP16 on GPU
- **Batching:** `per_device_train_batch_size=1` + `gradient_accumulation_steps=8`
- **Epochs:** 3
- **Checkpointing:** Save every 500 steps; final adapter weights in `adapter_model.safetensors`
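These settings map onto a standard 🤗 `Trainer` run roughly as follows. This is a sketch, assuming `model` is the LoRA-wrapped Phi-2 and `train_dataset` is the tokenized subset (256-token sequences with `labels = input_ids`); `output_dir` and `logging_steps` are illustrative, not taken from this card.

```python
from transformers import TrainingArguments, Trainer

# Sketch of a training setup matching the settings listed above.
# `model` is the LoRA-wrapped Phi-2; `train_dataset` is the tokenized subset.
training_args = TrainingArguments(
    output_dir="phi2-memory-deeptalks",   # illustrative path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    fp16=True,                            # FP16 on GPU
    optim="adamw_torch",                  # AdamW (PyTorch)
    save_steps=500,                       # checkpoint every 500 steps
    logging_steps=100,                    # illustrative
)

trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()
model.save_pretrained("phi2-memory-deeptalks")  # writes adapter_model.safetensors
```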
---
## 🎯 Evaluation
- **Training loss (step 500):** ~1.08
- **Validation loss:** ~1.10
- **Qualitative:**
- Improved recall of the last 2–4 turns in dialogue
- Maintains base Phi-2 fluency on general language
---
## 🔧 Usage
Load the adapter into your Phi-2 model with just a few lines:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1) Load base model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", padding_side="left")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

# 2) Apply the LoRA adapter (its config is loaded automatically from the repo)
model = PeftModel.from_pretrained(model, "sourize/phi2-memory-deeptalks")

# 3) (Optional) Resize embeddings if you added tokens to the tokenizer
model.base_model.resize_token_embeddings(len(tokenizer))
# 4) Generate
prompt = "### Human:\nHello, how are you?\n\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
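Short-term memory comes from keeping the last few turns in the prompt, using the same `### Human:` / `### Assistant:` template as training. A minimal sketch, reusing `tokenizer` and `model` from the snippet above (the `build_prompt` helper and `history` structure are illustrative, not part of this repo):

```python
def build_prompt(history, user_message, max_turns=4):
    """Format the most recent turns in the ### Human / ### Assistant template."""
    parts = []
    for user, assistant in history[-max_turns:]:
        parts.append(f"### Human:\n{user}\n\n### Assistant:\n{assistant}\n")
    parts.append(f"### Human:\n{user_message}\n\n### Assistant:")
    return "\n".join(parts)

history = [("Hi, I'm Alex.", "Nice to meet you, Alex! How can I help?")]
prompt = build_prompt(history, "Do you remember my name?")
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```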
---
## ⚙️ Inference & Deployment
- **Preferred:** GPU (NVIDIA CUDA) for sub-second latency
- **CPU-only:** ~7–10 min per response (large model!)
- **Hugging Face Inference API:**
```bash
curl -X POST \
-H "Authorization: Bearer $HF_TOKEN" \
-H "Content-Type: application/json" \
https://api-inference.huggingface.co/pipeline/text-generation/sourize/phi2-memory-deeptalks \
-d '{
"inputs": "Hello, how are you?",
"parameters": {
"max_new_tokens": 64,
"do_sample": true,
"temperature": 0.7,
"top_p": 0.9,
"return_full_text": false
}
}'
```
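The same request from Python with `requests` (a sketch mirroring the curl call above; it assumes `HF_TOKEN` is set in the environment):

```python
import os
import requests

API_URL = "https://api-inference.huggingface.co/pipeline/text-generation/sourize/phi2-memory-deeptalks"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

payload = {
    "inputs": "Hello, how are you?",
    "parameters": {
        "max_new_tokens": 64,
        "do_sample": True,
        "temperature": 0.7,
        "top_p": 0.9,
        "return_full_text": False,
    },
}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())
```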
---
## 💡 Use Cases & Limitations
- **Ideal for:**
- Short back-and-forth chats (2–4 turns)
- Chatbots that need to “remember” very recent context
- **Not suited for:**
- Long-term memory or document-level retrieval
- High-volume production on CPU (too slow)
---
## 📖 Further Reading
- **Live Demo:** [DeepTalks Space](https://huggingface.co/spaces/sourize/DeepTalks)
- **Blog post:** [DeepTalks Blog](https://sourish.xyz/thoughts/deeptalks-your-personal-ai-companion)
- **PEFT & LoRA:** [PEFT GitHub](https://github.com/huggingface/peft) | [LoRA Paper](https://arxiv.org/abs/2106.09685)
---
## 🔖 Citation
```bibtex
@misc{sourize_phi2_memory_deeptalks,
  title        = {phi2-memory-deeptalks: LoRA adapter for Phi-2 with short-term conversational memory},
  author       = {Sourish},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/sourize/phi2-memory-deeptalks}},
  license      = {MIT}
}
```
---
*Questions or feedback? Please open an issue on the [repository](https://huggingface.co/sourize/phi2-memory-deeptalks).*