---
base_model: microsoft/phi-2
library_name: peft
license: mit
tags:
- text-generation
pipeline_tag: text-generation
datasets:
- NuclearAi/HyperThink-Mini-50K
---

# phi2-memory-deeptalks

A **LoRA adapter** for the Phi-2 language model, fine-tuned on short conversational snippets to provide **short-term memory** in dialogue. The adapter lets an assistant recall and use the last few user/assistant turns without fully fine-tuning the 2.7B-parameter base model.

<p align="center">
  <a href="https://huggingface.co/spaces/sourize/DeepTalks">
    🔗 Live Demo on Hugging Face Spaces
  </a>
</p>
<p align="center">
  ⏳ Responses take a while to generate because the demo runs on the free CPU tier.
</p>

---

## 🚀 Overview

**phi2-memory-deeptalks** injects lightweight, low-rank corrections into the attention and MLP layers of `microsoft/phi-2`.

- **Size:** ~6 M trainable parameters (≈ 0.2 % of the base model)
- **Base:** Phi-2 (2.7 B parameters)
- **Adapter:** Low-Rank Adaptation (LoRA) via the [PEFT](https://github.com/huggingface/peft) library

---

## 📦 Model Details

### Architecture & Adapter Configuration

- **Base model:** `microsoft/phi-2` (causal-LM)
- **LoRA rank (r):** 4
- **Modules wrapped:**
  - Attention projections: `q_proj`, `k_proj`, `v_proj`, `dense`
  - MLP layers: `fc1`, `fc2`
- **LoRA hyperparameters:**
  - `lora_alpha`: 32
  - `lora_dropout`: 0.05
- **Trainable params:** ~5.9 M
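
A minimal PEFT sketch that reproduces this configuration (the `bias` and `task_type` values are assumptions and are not stated in this card):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# LoRA configuration matching the settings listed above
lora_config = LoraConfig(
    r=4,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "dense", "fc1", "fc2"],
    bias="none",            # assumption: bias terms left untouched
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # should report roughly 5.9 M trainable parameters
```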

### Training Data & Preprocessing

- **Dataset:** [NuclearAi/HyperThink-Mini-50K](https://huggingface.co/datasets/NuclearAi/HyperThink-Mini-50K) (≈7 % of the data used)
- **Prompt format:**

  ```text
  ### Human:
  <user message>

  ### Assistant:
  <assistant response>
  ```

- **Tokenization:** Truncated/padded to 256 tokens, `labels = input_ids`
- **Optimizer:** AdamW (PyTorch), FP16 on GPU
- **Batching:** `per_device_train_batch_size=1` + `gradient_accumulation_steps=8`
- **Epochs:** 3
- **Checkpointing:** Save every 500 steps; final adapter weights in `adapter_model.safetensors`
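
A rough sketch of the preprocessing and trainer settings implied by the list above (the dataset field names, `output_dir`, and any value not listed are assumptions, not taken from the actual training script):

```python
from transformers import AutoTokenizer, TrainingArguments

tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
tokenizer.pad_token = tokenizer.eos_token  # Phi-2's tokenizer defines no pad token

def to_prompt(example):
    # "prompt" / "response" are placeholder field names; adjust to the dataset schema
    return f"### Human:\n{example['prompt']}\n\n### Assistant:\n{example['response']}"

def tokenize(example):
    enc = tokenizer(
        to_prompt(example),
        truncation=True,
        padding="max_length",
        max_length=256,
    )
    enc["labels"] = enc["input_ids"].copy()  # causal-LM objective: labels mirror input_ids
    return enc

training_args = TrainingArguments(
    output_dir="phi2-memory-deeptalks",  # assumption
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    fp16=True,                           # FP16 on GPU
    optim="adamw_torch",                 # AdamW (PyTorch)
    save_steps=500,                      # checkpoint every 500 steps
)
```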

---

## 🎯 Evaluation

- **Training loss (step 500):** ~1.08
- **Validation loss:** ~1.10
- **Qualitative:**
  - Improved recall of the last 2–4 turns in dialogue
  - Maintains base Phi-2 fluency on general language

---

## 🔧 Usage

Load the adapter into your Phi-2 model with just a few lines:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# 1) Load base model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", padding_side="left")
model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")

# 2) Apply the LoRA adapter
model = PeftModel.from_pretrained(model, "sourize/phi2-memory-deeptalks")

# 3) (Optional) Resize embeddings, only needed if tokens were added to the tokenizer
model.base_model.resize_token_embeddings(len(tokenizer))

# 4) Generate
prompt = "### Human:\nHello, how are you?\n\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
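
To use the adapter's short-term memory, include the most recent turns in the prompt in the same `### Human:` / `### Assistant:` format used during training. A minimal sketch continuing from the snippet above (the example turns are placeholders):

```python
# Build a prompt containing the last few exchanges as short-term context
history = [
    ("What's a good name for a grey cat?", "How about Smokey or Ash?"),
    ("I like Smokey.", "Great choice, Smokey it is."),
]
user_message = "Remind me, which name did I pick?"

prompt = ""
for human, assistant in history:
    prompt += f"### Human:\n{human}\n\n### Assistant:\n{assistant}\n\n"
prompt += f"### Human:\n{user_message}\n\n### Assistant:"

inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,  # Phi-2 has no pad token
)
# Decode only the newly generated tokens (skip the prompt)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```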

---

## ⚙️ Inference & Deployment

- **Preferred:** GPU (NVIDIA/CUDA) for sub-second latency
- **CPU-only:** ~7–10 min per response (large model!)
- **Hugging Face Inference API:**

  ```bash
  curl -X POST \
    -H "Authorization: Bearer $HF_TOKEN" \
    -H "Content-Type: application/json" \
    https://api-inference.huggingface.co/pipeline/text-generation/sourize/phi2-memory-deeptalks \
    -d '{
      "inputs": "Hello, how are you?",
      "parameters": {
        "max_new_tokens": 64,
        "do_sample": true,
        "temperature": 0.7,
        "top_p": 0.9,
        "return_full_text": false
      }
    }'
  ```
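
The same request can be made from Python with the `huggingface_hub` client (a sketch; whether the hosted Inference API serves this adapter on the free tier is not guaranteed):

```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(model="sourize/phi2-memory-deeptalks", token=os.environ["HF_TOKEN"])

response = client.text_generation(
    "### Human:\nHello, how are you?\n\n### Assistant:",
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    return_full_text=False,
)
print(response)
```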

---

## 💡 Use Cases & Limitations

- **Ideal for:**
  - Short back-and-forth chats (2–4 turns)
  - Chatbots that need to “remember” very recent context
- **Not suited for:**
  - Long-term memory or document-level retrieval
  - High-volume production on CPU (too slow)

---

## 📖 Further Reading

- **Live Demo:** [DeepTalks Space](https://huggingface.co/spaces/sourize/DeepTalks)
- **Blog post:** [DeepTalks Blog](https://sourish.xyz/thoughts/deeptalks-your-personal-ai-companion)
- **PEFT & LoRA:** [PEFT GitHub](https://github.com/huggingface/peft) | [LoRA Paper](https://arxiv.org/abs/2106.09685)

---

## 🔖 Citation

```bibtex
@misc{sourize_phi2_memory_deeptalks,
  title        = {phi2-memory-deeptalks: LoRA adapter for Phi-2 with short-term conversational memory},
  author       = {Sourish},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/sourize/phi2-memory-deeptalks}},
  license      = {MIT}
}
```

---

*Questions or feedback? Please start a discussion on the [model repository](https://huggingface.co/sourize/phi2-memory-deeptalks).*