---
base_model: microsoft/phi-2
library_name: peft
license: mit
tags:
- text-generation
pipeline_tag: text-generation
datasets:
- NuclearAi/HyperThink-Mini-50K
---
# phi2-memory-deeptalks
A **LoRA adapter** for the Phi-2 language model, fine-tuned on short conversational snippets to provide **short-term memory** in dialogue. The adapter lets an assistant recall and build on the last few user/assistant turns without fully fine-tuning the 2.7B-parameter base model.
<p align="center">
<a href="https://huggingface.co/spaces/sourize/DeepTalks">
🔗 Live Demo on Hugging Face Spaces
</a>
</p>
<p align="center">
⏳ Responses can take a while to generate because the demo runs on the free CPU tier
</p>
---
## 🚀 Overview
**phi2-memory-deeptalks** injects lightweight, low-rank corrections into the attention and MLP layers of `microsoft/phi-2`.
- **Size:** ~6 M trainable parameters (≈ 0.2 % of the base model)
- **Base:** Phi-2 (2.7 B parameters)
- **Adapter:** Low-Rank Adaptation (LoRA) via the [PEFT](https://github.com/huggingface/peft) library
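As background (this is standard LoRA, following the LoRA paper rather than anything specific to this card): each wrapped weight matrix stays frozen and only a low-rank correction is trained, so the effective weight at inference time is

$$
W' = W + \frac{\alpha}{r}\, B A, \qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},
$$

with $r = 4$ and $\alpha = 32$ for this adapter (see the configuration below), which is what keeps the trainable parameter count near ~6 M.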
---
## 📦 Model Details
### Architecture & Adapter Configuration
- **Base model:** `microsoft/phi-2` (causal-LM)
- **LoRA rank (r):** 4
- **Modules wrapped:**
- Attention projections: `q_proj`, `k_proj`, `v_proj`, `dense`
- MLP layers: `fc1`, `fc2`
- **LoRA hyperparameters:**
- `lora_alpha`: 32
- `lora_dropout`: 0.05
- **Trainable params:** ~5.9 M
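A PEFT configuration matching the settings above would look roughly like this (a sketch; values not listed above, such as `bias` and `task_type`, are assumptions):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Reconstruct the adapter configuration described above (sketch; the exact
# training script is not published in this card).
lora_config = LoraConfig(
    r=4,                                   # LoRA rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense", "fc1", "fc2"],
    bias="none",                           # assumption: biases left untouched
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()         # should report roughly ~5.9 M trainable params
```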
### Training Data & Preprocessing
- **Dataset:** `NuclearAi/HyperThink-Mini-50K` (≈ 7 % of the dataset used)
- **Prompt format:**
```text
### Human:
<user message>
### Assistant:
<assistant response>
```
- **Tokenization:** Truncated/padded to 256 tokens, `labels = input_ids`
- **Optimizer:** AdamW (PyTorch), FP16 on GPU
- **Batching:** `per_device_train_batch_size=1` + `gradient_accumulation_steps=8`
- **Epochs:** 3
- **Checkpointing:** Save every 500 steps; final adapter weights in `adapter_model.safetensors`
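A minimal training sketch consistent with the settings above, assuming the 🤗 `Trainer` (whose default optimizer is AdamW). The split slice and column names are illustrative assumptions, not the author's actual script:

```python
from transformers import AutoTokenizer, Trainer, TrainingArguments
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
tokenizer.pad_token = tokenizer.eos_token        # Phi-2 has no pad token by default

def format_and_tokenize(example):
    # Assumption: column names in HyperThink-Mini-50K may differ from these.
    text = f"### Human:\n{example['prompt']}\n\n### Assistant:\n{example['response']}"
    enc = tokenizer(text, truncation=True, padding="max_length", max_length=256)
    enc["labels"] = enc["input_ids"].copy()      # labels = input_ids (plain causal-LM loss)
    return enc

dataset = load_dataset("NuclearAi/HyperThink-Mini-50K", split="train[:7%]")
tokenized = dataset.map(format_and_tokenize, remove_columns=dataset.column_names)

args = TrainingArguments(
    output_dir="phi2-memory-deeptalks",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    fp16=True,                                   # FP16 on GPU
    save_steps=500,
)

# `model` is the PEFT-wrapped Phi-2 from the configuration sketch above.
Trainer(model=model, args=args, train_dataset=tokenized).train()
```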
---
## 🎯 Evaluation
- **Training loss (step 500):** ~1.08
- **Validation loss:** ~1.10
- **Qualitative:**
- Improved recall of the last 2–4 turns in dialogue
- Maintains base Phi-2 fluency on general language
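
For intuition: assuming these are standard token-averaged cross-entropy losses, a validation loss of ~1.10 corresponds to a perplexity of roughly exp(1.10) ≈ 3.0 on the held-out split.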
---
## 🔧 Usage
Load the adapter into your Phi-2 model with just a few lines:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1) Load the base model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", padding_side="left")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

# 2) Apply the LoRA adapter (pass the adapter repo id, not a LoraConfig)
model = PeftModel.from_pretrained(model, "sourize/phi2-memory-deeptalks")

# 3) (Optional) Resize embeddings if you add special tokens to the tokenizer
model.base_model.resize_token_embeddings(len(tokenizer))

# 4) Generate
prompt = "### Human:\nHello, how are you?\n\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
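Because the adapter was trained on the `### Human:` / `### Assistant:` format, short-term memory works by concatenating the most recent turns into the prompt. A minimal sketch (the helper below is illustrative and not part of this repo; it reuses `tokenizer` and `model` from the snippet above):

```python
def build_prompt(history, user_message, max_turns=4):
    """Concatenate the last few (human, assistant) turns in the training format."""
    recent = history[-max_turns:]
    parts = [f"### Human:\n{h}\n\n### Assistant:\n{a}" for h, a in recent]
    parts.append(f"### Human:\n{user_message}\n\n### Assistant:")
    return "\n\n".join(parts)

history = [("My name is Ada.", "Nice to meet you, Ada!")]
prompt = build_prompt(history, "Do you remember my name?")
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```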
---
## ⚙️ Inference & Deployment
- **Preferred:** GPU (NVIDIA CUDA) for sub-second latency
- **CPU-only:** ~7–10 min per response (large model!)
- **Hugging Face Inference API:**
```bash
curl -X POST \
-H "Authorization: Bearer $HF_TOKEN" \
-H "Content-Type: application/json" \
https://api-inference.huggingface.co/pipeline/text-generation/sourize/phi2-memory-deeptalks \
-d '{
"inputs": "Hello, how are you?",
"parameters": {
"max_new_tokens": 64,
"do_sample": true,
"temperature": 0.7,
"top_p": 0.9,
"return_full_text": false
}
}'
```
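The same call from Python, using `requests` against the endpoint shown above (a sketch; it assumes `HF_TOKEN` is set in the environment):

```python
import os
import requests

API_URL = "https://api-inference.huggingface.co/pipeline/text-generation/sourize/phi2-memory-deeptalks"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

payload = {
    "inputs": "Hello, how are you?",
    "parameters": {
        "max_new_tokens": 64,
        "do_sample": True,
        "temperature": 0.7,
        "top_p": 0.9,
        "return_full_text": False,
    },
}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.json())
```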
---
## 💡 Use Cases & Limitations
- **Ideal for:**
- Short back-and-forth chats (2–4 turns)
- Chatbots that need to “remember” very recent context
- **Not suited for:**
- Long-term memory or document-level retrieval
- High-volume production on CPU (too slow)
---
## 📖 Further Reading
- **Live Demo:** [DeepTalks Space](https://huggingface.co/spaces/sourize/DeepTalks)
- **Blog post:** [DeepTalks Blog](https://sourish.xyz/thoughts/deeptalks-your-personal-ai-companion)
- **PEFT & LoRA:** [PEFT GitHub](https://github.com/huggingface/peft) | [LoRA Paper](https://arxiv.org/abs/2106.09685)
---
## 🔖 Citation
```bibtex
@misc{sourize_phi2_memory_deeptalks,
title = {phi2-memory-deeptalks: LoRA adapter for Phi-2 with short-term conversational memory},
author = {Sourish},
year = {2025},
howpublished = {\url{https://huggingface.co/sourize/phi2-memory-deeptalks}},
license = {MIT}
}
```
---
*Questions or feedback? Please open an issue on the [repository](https://huggingface.co/sourize/phi2-memory-deeptalks).*