How to use AsadIsmail/prism-memory with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the PRISM-Memory LoRA adapter on top of it.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
model = PeftModel.from_pretrained(base_model, "AsadIsmail/prism-memory")
```
PRISM-Memory Release Results
This page summarizes the confirmed public release metrics and the internal comparison evidence that informed the release choice.
Released Model
- Model: PRISM-Memory 7B Adapter
- Base model: Qwen/Qwen2.5-7B-Instruct
- Adapter type: LoRA
- Confirmed LoCoMo mean: 0.4981204463
- Confirmed LongMemEval mean: 0.4767574431
- QA cache hits during confirmation: 460
- QA cache misses during confirmation: 0
Public Comparison
PRISM-Memory fine-tunes Qwen/Qwen2.5-7B-Instruct for the memory extraction
step that the PropMem reference gets from GPT-4.1.
| Benchmark | PRISM-Memory | GPT-4.1-based PropMem reference | Read |
|---|---|---|---|
| LongMemEval | 0.4768 | 0.4650 | PRISM wins |
| LoCoMo | 0.4981 | 0.5360 | PRISM trails, but stays competitive |
The QA layer is held constant. This is an extraction-step comparison, not an end-to-end GPT-4.1 replacement claim.
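To make the size of each gap concrete, the short sketch below computes the absolute and relative differences from the rounded scores in the comparison table. The dictionary layout and the `gaps` helper are illustrative names, not part of the release artifacts.

```python
# Scores copied from the comparison table (rounded to four decimals there).
scores = {
    "LongMemEval": {"prism": 0.4768, "reference": 0.4650},
    "LoCoMo": {"prism": 0.4981, "reference": 0.5360},
}


def gaps(prism, reference):
    """Return (absolute, relative) difference of PRISM-Memory vs. the reference."""
    absolute = prism - reference
    relative = absolute / reference
    return absolute, relative


results = {name: gaps(s["prism"], s["reference"]) for name, s in scores.items()}

for name, (absolute, relative) in results.items():
    print(f"{name}: {absolute:+.4f} absolute, {relative:+.1%} relative")
```

On these numbers PRISM-Memory leads by about 0.012 on LongMemEval and trails by about 0.038 on LoCoMo. Because the QA layer is fixed, these gaps reflect the extraction step only.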
LoCoMo Breakdown
| Category | Score |
|---|---|
| factual | 0.3339551926 |
| temporal | 0.4978785870 |
| inferential | 0.2605997475 |
| multi-hop | 0.5144477744 |
| adversarial | 0.8837209302 |
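As a quick consistency check, the unweighted average of the five category scores matches the confirmed LoCoMo mean of 0.4981204463 to within rounding. The sketch below only observes this from the published numbers; it does not claim to reproduce the evaluation pipeline, and the variable names are illustrative.

```python
# Category scores copied from the LoCoMo breakdown table.
locomo = {
    "factual": 0.3339551926,
    "temporal": 0.4978785870,
    "inferential": 0.2605997475,
    "multi-hop": 0.5144477744,
    "adversarial": 0.8837209302,
}

# Confirmed LoCoMo mean as reported for the released model.
reported_locomo_mean = 0.4981204463

# Unweighted (macro) average: every category counts equally.
locomo_macro_mean = sum(locomo.values()) / len(locomo)

print(f"macro mean: {locomo_macro_mean:.10f}")
print(f"reported:   {reported_locomo_mean:.10f}")
```

Because each category counts equally in this average, the high adversarial score lifts the headline number well above the factual and inferential scores.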
LongMemEval Breakdown
| Category | Score |
|---|---|
| knowledge-update | 0.5588405797 |
| multi-session | 0.1390977444 |
| single-session-assistant | 0.7656395892 |
| single-session-preference | 0.0519667456 |
| single-session-user | 0.9133333333 |
| temporal-reasoning | 0.4316666667 |
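The LongMemEval categories span a wide range, so the headline mean hides a lot. The sketch below ranks the categories and checks that their unweighted average agrees with the confirmed mean of 0.4767574431. The names are illustrative, and the check is an observation on the published numbers rather than a description of the evaluation code.

```python
# Category scores copied from the LongMemEval breakdown table.
longmemeval = {
    "knowledge-update": 0.5588405797,
    "multi-session": 0.1390977444,
    "single-session-assistant": 0.7656395892,
    "single-session-preference": 0.0519667456,
    "single-session-user": 0.9133333333,
    "temporal-reasoning": 0.4316666667,
}

# Confirmed LongMemEval mean as reported for the released model.
reported_longmemeval_mean = 0.4767574431

# Rank categories from weakest to strongest.
ranked = sorted(longmemeval.items(), key=lambda item: item[1])
for category, score in ranked:
    print(f"{category:<28} {score:.4f}")

# Unweighted (macro) average and the gap between best and worst category.
longmemeval_macro_mean = sum(longmemeval.values()) / len(longmemeval)
spread = ranked[-1][1] - ranked[0][1]
print(f"macro mean: {longmemeval_macro_mean:.10f}, spread: {spread:.4f}")
```

The ranking puts single-session-preference and multi-session at the bottom and single-session-user at the top, a spread of more than 0.86 between the weakest and strongest category.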
Why This Model Was Released
The closest internal runner-up nearly tied the released model on overall LoCoMo, but it lost on the broader release profile:
- lower LongMemEval score: 0.4689
- weaker adversarial precision
- less balanced behavior across the full evaluation surface
Question-level comparison on held-out LoCoMo:
- disagreements: 152 / 400
- questions favoring PRISM-Memory: 56
- questions favoring the runner-up: 52
The split is narrow enough that the runner-up was a genuine contender, but the two models do not differ enough to justify releasing both publicly.
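One way to see how narrow a 56 to 52 split is: run an exact sign test on the two counts. The sketch below is an illustration, not the analysis used for the release decision. It assumes the 56 and 52 questions are the decisive ones and ignores the remaining disagreements, whose handling is not described here.

```python
from math import comb

# Question-level counts from the held-out LoCoMo comparison.
prism_wins = 56
runner_up_wins = 52
decisive = prism_wins + runner_up_wins  # assumption: only these questions count


def sign_test_two_sided(k, n):
    """Exact two-sided binomial (sign) test of k successes in n trials at p = 0.5."""
    tail = max(k, n - k)
    upper = sum(comb(n, i) for i in range(tail, n + 1)) / 2**n
    return min(1.0, 2 * upper)


p_value = sign_test_two_sided(prism_wins, decisive)
print(f"net margin: {prism_wins - runner_up_wins:+d} of {decisive}, p = {p_value:.3f}")
```

Under these assumptions the p-value is far above conventional significance thresholds, so the question-level counts alone do not separate the two models. That is consistent with deciding on the broader release profile instead.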
Artifact Files
- ../../results/release_summary.json
- ../../results/release_model.json
- ../../results/try_it_sessions.json
- ../../results/internal_locomo_pairwise_diffs.json
Related docs: