---
base_model: microsoft/phi-2
library_name: peft
license: mit
tags:
- text-generation
pipeline_tag: text-generation
datasets:
- NuclearAi/HyperThink-Mini-50K
---

# phi2-memory-deeptalks

A **LoRA adapter** for the Phi-2 language model, fine-tuned on short conversational snippets to provide **short-term memory** in dialogue. This adapter enables your assistant to recall and leverage the last few user/assistant turns without fully fine-tuning the 2.7B-parameter base model.

<p align="center">
  <a href="https://huggingface.co/spaces/sourize/DeepTalks">
    🔗 Live Demo on Hugging Face Spaces
  </a>
</p>
<p align="center">
  ⏳ It takes time to generate responses since it's running on the CPU free tier
</p>


---

## 🚀 Overview

**phi2-memory-deeptalks** injects lightweight, low-rank corrections into the attention and MLP layers of `microsoft/phi-2`.  
- **Size:** ~6 M trainable parameters (≈ 0.2 % of the base model)  
- **Base:** Phi-2 (2.7 B parameters)  
- **Adapter:** Low-Rank Adaptation (LoRA) via the [PEFT](https://github.com/huggingface/peft) library  

---

## 📦 Model Details

### Architecture & Adapter Configuration

- **Base model:** `microsoft/phi-2` (causal-LM)  
- **LoRA rank (r):** 4  
- **Modules wrapped:**  
  - Attention projections: `q_proj`, `k_proj`, `v_proj`, `dense`  
  - MLP layers: `fc1`, `fc2`  
- **LoRA hyperparameters:**  
  - `lora_alpha`: 32  
  - `lora_dropout`: 0.05  
  - **Trainable params:** ~5.9 M  
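
Below is a minimal PEFT sketch of how a configuration matching these values could be constructed. The original training script is not included in this repository, so treat the snippet as illustrative rather than the exact code used.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative LoRA configuration mirroring the values listed above;
# this is a reconstruction, not the original training script.
lora_config = LoraConfig(
    r=4,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense", "fc1", "fc2"],
    bias="none",
    task_type="CAUSAL_LM",
)

base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # expect roughly ~5.9M trainable parameters
```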

### Training Data & Preprocessing

- **Dataset:** `NuclearAi/HyperThink-Mini-50K` (~7% of the dataset used)  
- **Prompt format:**  
  ```text
  ### Human:
  <user message>

  ### Assistant:
  <assistant response>
  ```
- **Tokenization:** Truncated/padded to 256 tokens, `labels = input_ids`  
- **Optimizer:** AdamW (PyTorch), FP16 on GPU  
- **Batching:** `per_device_train_batch_size=1` + `gradient_accumulation_steps=8`  
- **Epochs:** 3  
- **Checkpointing:** Save every 500 steps; final adapter weights in `adapter_model.safetensors`  
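
As a rough sketch of the preprocessing and training setup described above (the exact script is not published; the `build_example` helper and the `output_dir` value are assumptions):

```python
from transformers import AutoTokenizer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
tokenizer.pad_token = tokenizer.eos_token  # phi-2 ships without a pad token

def build_example(user_msg: str, assistant_msg: str, max_length: int = 256):
    # Render one turn in the "### Human / ### Assistant" template and
    # tokenize with labels = input_ids for causal-LM training.
    text = f"### Human:\n{user_msg}\n\n### Assistant:\n{assistant_msg}"
    enc = tokenizer(text, truncation=True, padding="max_length", max_length=max_length)
    enc["labels"] = enc["input_ids"].copy()
    return enc

# Training arguments reflecting the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="phi2-memory-deeptalks",  # placeholder path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    fp16=True,                           # FP16 on GPU
    optim="adamw_torch",                 # AdamW (PyTorch)
    save_steps=500,                      # checkpoint every 500 steps
)
```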

---

## 🎯 Evaluation

- **Training loss (step 500):** ~1.08  
- **Validation loss:** ~1.10  
- **Qualitative:**  
  - Improved recall of the last 2–4 turns in dialogue  
  - Maintains base Phi-2 fluency on general language  

---

## 🔧 Usage

Load the adapter into your Phi-2 model with just a few lines:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1) Load base
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", padding_side="left")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

# 2) Apply the LoRA adapter
model = PeftModel.from_pretrained(model, "sourize/phi2-memory-deeptalks")

# 3) (Optional) Resize embeddings only if you have added new tokens to the tokenizer
# model.base_model.resize_token_embeddings(len(tokenizer))

# 4) Generate
prompt = "### Human:\nHello, how are you?\n\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
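
Since the adapter is tuned to recall only the most recent turns, the inference prompt should include that recent history explicitly. A minimal, hypothetical helper (the `build_prompt` name and the 4-turn window are illustrative, not part of this repository):

```python
def build_prompt(history, user_msg, max_turns=4):
    """Assemble a prompt from the last few (user, assistant) turns plus the new message."""
    parts = []
    for user, assistant in history[-max_turns:]:
        parts.append(f"### Human:\n{user}\n\n### Assistant:\n{assistant}\n")
    parts.append(f"### Human:\n{user_msg}\n\n### Assistant:")
    return "\n".join(parts)

# Example: two earlier turns plus the new user message
history = [("Hi!", "Hello! How can I help?"), ("My name is Ana.", "Nice to meet you, Ana!")]
prompt = build_prompt(history, "Do you remember my name?")
```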

---

## ⚙️ Inference & Deployment

- **Preferred:** GPU (NVIDIA CUDA) for sub-second latency  
- **CPU-only:** ~7–10 min per response (large model!)  
- **Hugging Face Inference API:**  
  ```bash
  curl -X POST \
    -H "Authorization: Bearer $HF_TOKEN" \
    -H "Content-Type: application/json" \
    https://api-inference.huggingface.co/pipeline/text-generation/sourize/phi2-memory-deeptalks \
    -d '{
      "inputs": "Hello, how are you?",
      "parameters": {
        "max_new_tokens": 64,
        "do_sample": true,
        "temperature": 0.7,
        "top_p": 0.9,
        "return_full_text": false
      }
    }'
  ```
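
For GPU inference, a half-precision load keeps latency low. A sketch (assumes `accelerate` is installed so that `device_map="auto"` is available):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in fp16 and place it on the available GPU(s)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", padding_side="left")
base = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "sourize/phi2-memory-deeptalks")

prompt = "### Human:\nHello, how are you?\n\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs, max_new_tokens=64, do_sample=True, temperature=0.7, top_p=0.9
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```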

---

## 💡 Use Cases & Limitations

- **Ideal for:**  
  - Short back-and-forth chats (2–4 turns)  
  - Chatbots that need to “remember” very recent context  
- **Not suited for:**  
  - Long-term memory or document-level retrieval  
  - High-volume production on CPU (too slow)  

---

## 📖 Further Reading

- **Live Demo:** [DeepTalks Space](https://huggingface.co/spaces/sourize/DeepTalks)  
- **Blog post:** [DeepTalks Blog](https://sourish.xyz/thoughts/deeptalks-your-personal-ai-companion) 
- **PEFT & LoRA:** [PEFT GitHub](https://github.com/huggingface/peft) | [LoRA Paper](https://arxiv.org/abs/2106.09685)  

---

## 🔖 Citation

```bibtex
@misc{sourize_phi2_memory_deeptalks,
  title        = {phi2-memory-deeptalks: LoRA adapter for Phi-2 with short-term conversational memory},
  author       = {Sourish},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/sourize/phi2-memory-deeptalks}},
  license      = {MIT}
}
```

---

*Questions or feedback? Please open an issue on the [repository](https://huggingface.co/sourize/phi2-memory-deeptalks).*