---
base_model: microsoft/phi-2
library_name: peft
license: mit
tags:
- text-generation
pipeline_tag: text-generation
datasets:
- NuclearAi/HyperThink-Mini-50K
---

# phi2-memory-deeptalks

A **LoRA adapter** for the Phi-2 language model, fine-tuned on short conversational snippets to provide **short-term memory** in dialogue. The adapter lets an assistant recall and use the last few user/assistant turns without fully fine-tuning the 2.7B-parameter base model.

<p align="center">
  <a href="https://huggingface.co/spaces/sourize/DeepTalks">
    🔗 Live Demo on Hugging Face Spaces
  </a>
</p>
<p align="center">
  ⏳ Responses take a while to generate because the demo runs on the free CPU tier.
</p>

---

## 🚀 Overview

**phi2-memory-deeptalks** injects lightweight, low-rank corrections into the attention and MLP layers of `microsoft/phi-2`.

- **Size:** ~6 M trainable parameters (≈ 0.2 % of the base model)
- **Base:** Phi-2 (2.7 B parameters)
- **Adapter:** Low-Rank Adaptation (LoRA) via the [PEFT](https://github.com/huggingface/peft) library

---

## 📦 Model Details

### Architecture & Adapter Configuration

- **Base model:** `microsoft/phi-2` (causal-LM)
- **LoRA rank (r):** 4
- **Modules wrapped:**
  - Attention projections: `q_proj`, `k_proj`, `v_proj`, `dense`
  - MLP layers: `fc1`, `fc2`
- **LoRA hyperparameters:**
  - `lora_alpha`: 32
  - `lora_dropout`: 0.05
- **Trainable params:** ~5.9 M
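
A minimal PEFT sketch that reproduces this configuration (the `bias` and `task_type` values are assumptions and are not stated in this card):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# LoRA configuration matching the settings listed above
lora_config = LoraConfig(
    r=4,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense", "fc1", "fc2"],
    bias="none",            # assumption: bias terms left untouched
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # should report roughly 5.9 M trainable parameters
```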

### Training Data & Preprocessing

- **Dataset:** [NuclearAi/HyperThink-Mini-50K](https://huggingface.co/datasets/NuclearAi/HyperThink-Mini-50K) (≈7 % of the data used)
- **Prompt format:**

  ```text
  ### Human:
  <user message>

  ### Assistant:
  <assistant response>
  ```

- **Tokenization:** Truncated/padded to 256 tokens, `labels = input_ids`
- **Optimizer:** AdamW (PyTorch), FP16 on GPU
- **Batching:** `per_device_train_batch_size=1` + `gradient_accumulation_steps=8`
- **Epochs:** 3
- **Checkpointing:** Save every 500 steps; final adapter weights in `adapter_model.safetensors`
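
A rough sketch of the preprocessing and trainer settings implied by the list above (the dataset field names, `output_dir`, and any value not listed are assumptions, not taken from the actual training script):

```python
from transformers import AutoTokenizer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
tokenizer.pad_token = tokenizer.eos_token  # Phi-2's tokenizer defines no pad token

def to_prompt(example):
    # "prompt" / "response" are placeholder field names; adjust to the dataset schema
    return f"### Human:\n{example['prompt']}\n\n### Assistant:\n{example['response']}"

def tokenize(example):
    enc = tokenizer(
        to_prompt(example),
        truncation=True,
        padding="max_length",
        max_length=256,
    )
    enc["labels"] = enc["input_ids"].copy()  # causal-LM objective: labels mirror input_ids
    return enc

training_args = TrainingArguments(
    output_dir="phi2-memory-deeptalks",  # assumption
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    fp16=True,                           # FP16 on GPU
    optim="adamw_torch",                 # AdamW (PyTorch)
    save_steps=500,                      # checkpoint every 500 steps
)
```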

---

## 🎯 Evaluation

- **Training loss (step 500):** ~1.08
- **Validation loss:** ~1.10
- **Qualitative:**
  - Improved recall of the last 2–4 turns in dialogue
  - Maintains base Phi-2 fluency on general language

---

## 🔧 Usage

Load the adapter into your Phi-2 model with just a few lines:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1) Load base model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", padding_side="left")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

# 2) Apply the LoRA adapter
model = PeftModel.from_pretrained(model, "sourize/phi2-memory-deeptalks")

# 3) (Optional) Resize embeddings, only needed if tokens were added to the tokenizer
model.base_model.resize_token_embeddings(len(tokenizer))

# 4) Generate
prompt = "### Human:\nHello, how are you?\n\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
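
To use the adapter's short-term memory, include the most recent turns in the prompt in the same `### Human:` / `### Assistant:` format used during training. A minimal sketch continuing from the snippet above (the example turns are placeholders):

```python
# Build a prompt containing the last few exchanges as short-term context
history = [
    ("What's a good name for a grey cat?", "How about Smokey or Ash?"),
    ("I like Smokey.", "Great choice, Smokey it is."),
]
user_message = "Remind me, which name did I pick?"

prompt = ""
for human, assistant in history:
    prompt += f"### Human:\n{human}\n\n### Assistant:\n{assistant}\n\n"
prompt += f"### Human:\n{user_message}\n\n### Assistant:"

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,  # Phi-2 has no pad token
)
# Decode only the newly generated tokens (skip the prompt)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```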

---

## ⚙️ Inference & Deployment

- **Preferred:** GPU (NVIDIA/CUDA) for sub-second latency
- **CPU-only:** ~7–10 min per response (large model!)
- **Hugging Face Inference API:**

  ```bash
  curl -X POST \
    -H "Authorization: Bearer $HF_TOKEN" \
    -H "Content-Type: application/json" \
    https://api-inference.huggingface.co/pipeline/text-generation/sourize/phi2-memory-deeptalks \
    -d '{
      "inputs": "Hello, how are you?",
      "parameters": {
        "max_new_tokens": 64,
        "do_sample": true,
        "temperature": 0.7,
        "top_p": 0.9,
        "return_full_text": false
      }
    }'
  ```
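
The same request can be made from Python with the `huggingface_hub` client (a sketch; whether the hosted Inference API serves this adapter on the free tier is not guaranteed):

```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(model="sourize/phi2-memory-deeptalks", token=os.environ["HF_TOKEN"])

response = client.text_generation(
    "### Human:\nHello, how are you?\n\n### Assistant:",
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    return_full_text=False,
)
print(response)
```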

---

## 💡 Use Cases & Limitations

- **Ideal for:**
  - Short back-and-forth chats (2–4 turns)
  - Chatbots that need to “remember” very recent context
- **Not suited for:**
  - Long-term memory or document-level retrieval
  - High-volume production on CPU (too slow)

---

## 📖 Further Reading

- **Live Demo:** [DeepTalks Space](https://huggingface.co/spaces/sourize/DeepTalks)
- **Blog post:** [DeepTalks Blog](https://sourish.xyz/thoughts/deeptalks-your-personal-ai-companion)
- **PEFT & LoRA:** [PEFT GitHub](https://github.com/huggingface/peft) | [LoRA Paper](https://arxiv.org/abs/2106.09685)

---

## 🔖 Citation

```bibtex
@misc{sourize_phi2_memory_deeptalks,
  title        = {phi2-memory-deeptalks: LoRA adapter for Phi-2 with short-term conversational memory},
  author       = {Sourish},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/sourize/phi2-memory-deeptalks}},
  license      = {MIT}
}
```

---

*Questions or feedback? Please start a discussion on the [model repository](https://huggingface.co/sourize/phi2-memory-deeptalks).*