DIEGETIC: Epistemic Reasoning Language Model
Model Name: diegetic-v2-qwen2.5-1.5b-lora
Base Model: Qwen/Qwen2.5-1.5B
Method: LoRA Fine-tuning
License: Apache 2.0
Model Description
DIEGETIC (Dynamically-grounded Inference Engine for Generative Epistemic Tracking In Conversation) is a language model fine-tuned for epistemic reasoning: the ability to track what it knows, what it doesn't know, and the appropriate level of certainty for each claim.
Key Capabilities
- Source Attribution: Properly cites information sources with appropriate confidence
- Appropriate Refusal: Declines to answer unknowable questions instead of hallucinating an answer
- Graduated Uncertainty: Uses calibrated confidence levels (high, medium, low, or n/a)
- Evidence Citation: Backs claims with specific observations
- Unknown Tracking: Explicitly lists what information is missing or unknowable
Use Cases
- Question answering from observations/context
- Information verification and fact-checking
- Legal/compliance reasoning with source tracking
- Medical reasoning with uncertainty quantification
- Scientific reasoning with explicit unknowns
- Any application requiring transparent reasoning with proper confidence levels
Model Details
Architecture
- Base Model: Qwen2.5-1.5B (1.5 billion parameters)
- Fine-tuning Method: LoRA (Low-Rank Adaptation)
- Rank (r): 16
- Alpha: 32
- Dropout: 0.05
- Target modules: All attention and MLP layers
- Trainable Parameters: 18.5M (1.18% of total)
- Total Parameters: 1.56B
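The trainable-parameter figure can be sanity-checked with a little arithmetic. The sketch below assumes the usual Qwen2.5-1.5B dimensions and projection names (`q_proj`, `k_proj`, and so on); neither is stated in this card, so check them against the base model's `config.json` before relying on the result.

```python
# Assumed Qwen2.5-1.5B dimensions (not stated in this card; verify in config.json).
HIDDEN = 1536          # hidden_size
INTERMEDIATE = 8960    # intermediate_size of the MLP
KV_DIM = 256           # num_key_value_heads (2) * head_dim (128)
LAYERS = 28            # num_hidden_layers
RANK = 16              # LoRA rank r, taken from this card

# (in_features, out_features) of each adapted linear layer in one block.
# Module names are assumed, following the common Qwen2 naming.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """A LoRA adapter adds A (rank x in) and B (out x rank) to one layer."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in PROJECTIONS.values())
trainable = per_layer * LAYERS

print(f"{trainable:,} trainable parameters")   # 18,464,768
print(f"{trainable / 1.56e9:.2%} of 1.56B")    # 1.18%
```

Under these assumptions the count rounds to the 18.5M and 1.18% reported above.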
Training Data
- Dataset Size: 20,000 synthetic examples
- Training Format: Natural language observations + JSON responses
- Categories:
- Observation Use (25%)
- Source Attribution (20%)
- Complete Refusal (15%)
- Graduated Uncertainty (15%)
- Reasoning from Knowledge (12.5%)
- Multi-step Reasoning (7.5%)
- Partial Information (5%)
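As a quick consistency check, the percentages above can be converted into example counts for the 20,000-example dataset. This is only arithmetic on the figures in this card; the dictionary name is illustrative.

```python
from fractions import Fraction

DATASET_SIZE = 20_000

# Category shares in percent, as listed in this card.
CATEGORY_SHARES = {
    "Observation Use": "25",
    "Source Attribution": "20",
    "Complete Refusal": "15",
    "Graduated Uncertainty": "15",
    "Reasoning from Knowledge": "12.5",
    "Multi-step Reasoning": "7.5",
    "Partial Information": "5",
}

# Fractions avoid floating-point rounding on shares such as 7.5%.
counts = {
    name: int(Fraction(share) / 100 * DATASET_SIZE)
    for name, share in CATEGORY_SHARES.items()
}

for name, count in counts.items():
    print(f"{name}: {count:,}")

print("Total:", sum(counts.values()))  # Total: 20000
```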
Training Configuration
- Epochs: 3
- Batch Size: 4 (per device)
- Gradient Accumulation: 4
- Effective Batch Size: 16
- Learning Rate: 2e-5
- Max Sequence Length: 512 tokens
- Optimizer: AdamW
- Training Time: 1.73 hours (NVIDIA A100)
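The effective batch size follows from the per-device batch size and gradient accumulation. The sketch below also estimates the number of optimizer steps, assuming a single GPU and that all 20,000 examples were used for training; neither assumption is confirmed by this card.

```python
import math

DATASET_SIZE = 20_000      # from this card; assumes no held-out split
PER_DEVICE_BATCH = 4       # from this card
GRAD_ACCUM = 4             # from this card
EPOCHS = 3                 # from this card
NUM_DEVICES = 1            # assumption: one A100

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES
steps_per_epoch = math.ceil(DATASET_SIZE / effective_batch)
total_steps = steps_per_epoch * EPOCHS

print("Effective batch size:", effective_batch)   # 16
print("Optimizer steps per epoch:", steps_per_epoch)  # 1250
print("Total optimizer steps:", total_steps)      # 3750
```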
Performance Metrics
- Final Training Loss: 0.0196
- Evaluation Loss: 0.0196
- Token Accuracy: 99.24%
- Valid JSON Rate: 100% (20/20 tests)
- Refusal Accuracy: 100% (5/5 unknowable scenarios)
- Overall Test Pass Rate: 100% (20/20 diverse scenarios)
Usage
Installation
pip install transformers peft torch
Basic Usage
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
import json
# Load model
model_path = "howellx/diegetic-v2-qwen2.5-1.5b-lora"
base_model_name = "Qwen/Qwen2.5-1.5B"
# Load tokenizer from base model
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
# Add custom special tokens
special_tokens = {
    "additional_special_tokens": [
        "<OBS>", "</OBS>",
        "<BELIEF>", "</BELIEF>",
        "<MEM>", "</MEM>",
        "<TASK>", "</TASK>",
        "<OUTPUT_JSON>", "</OUTPUT_JSON>",
        "<EPISTEMIC>",
        "<REFUSE_DIEGETIC>"
    ]
}
tokenizer.add_special_tokens(special_tokens)
# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True
)
# Resize token embeddings
base_model.resize_token_embeddings(len(tokenizer))
# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, model_path, is_trainable=False)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()
# Format prompt
prompt = """You are a helpful assistant that answers questions based solely on provided observations.
What you observed:
Sarah told you that the meeting was cancelled. You have not verified this independently.
Question: Is the meeting cancelled?
Please provide your response as JSON with the following structure:
{
"claims": ["list of claims you are making"],
"confidence": "high/medium/low/n/a",
"evidence": ["list of evidence from observations"],
"unknown": ["list of things you don't know"]
}
Response:"""
# Generate
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.3,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id
)
response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)
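# Parse JSON: take the span from the first "{" to the last "}"
json_start = response.find("{")
json_end = response.rfind("}") + 1
if json_start == -1 or json_end <= json_start:
    raise ValueError(f"No JSON object found in model output: {response!r}")
result = json.loads(response[json_start:json_end])
print(json.dumps(result, indent=2))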
Example Output
{
  "claims": [
    "Sarah said the meeting was cancelled"
  ],
  "confidence": "low",
  "evidence": [
    "Sarah stated cancellation"
  ],
  "unknown": [
    "Meeting actual status"
  ]
}
Performance Benchmarks
Test Suite Results (20 Diverse Scenarios)
| Category | Tests | Pass Rate | Quality Score |
|---|---|---|---|
| Refusal (Unknowable) | 5 | 100% | 4.00/4.0 |
| Multi-step Reasoning | 3 | 100% | 4.00/4.0 |
| Partial Information | 3 | 100% | 4.00/4.0 |
| Source Attribution | 3 | 100% | 4.00/4.0 |
| Graduated Uncertainty | 3 | 100% | 4.00/4.0 |
| Direct Observation | 3 | 100% | 4.00/4.0 |
| OVERALL | 20 | 100% | 4.00/4.0 |
Prompt Format
The model expects a specific prompt format:
You are a helpful assistant that answers questions based solely on provided observations.
What you observed:
[Your observations or context here]
Question: [Your question here]
Please provide your response as JSON with the following structure:
{
"claims": ["list of claims you are making"],
"confidence": "high/medium/low/n/a",
"evidence": ["list of evidence from observations"],
"unknown": ["list of things you don't know"]
}
Response:
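To avoid retyping the template, the prompt can be assembled with a small helper. This is a minimal sketch: `build_prompt` and `PROMPT_TEMPLATE` are illustrative names rather than part of the released model, and the line breaks follow the template as rendered in this card, which may differ in whitespace from the exact training format.

```python
# Template text copied from this card; doubled braces escape str.format.
PROMPT_TEMPLATE = """You are a helpful assistant that answers questions based solely on provided observations.
What you observed:
{observations}
Question: {question}
Please provide your response as JSON with the following structure:
{{
"claims": ["list of claims you are making"],
"confidence": "high/medium/low/n/a",
"evidence": ["list of evidence from observations"],
"unknown": ["list of things you don't know"]
}}
Response:"""


def build_prompt(observations, question):
    """Fill the template with one or more observations and a question."""
    if isinstance(observations, str):
        observations = [observations]
    # Assumption: observations are joined into one line, as in the usage example.
    obs_text = " ".join(o.strip() for o in observations)
    return PROMPT_TEMPLATE.format(observations=obs_text, question=question.strip())


prompt = build_prompt(
    [
        "Sarah told you that the meeting was cancelled.",
        "You have not verified this independently.",
    ],
    "Is the meeting cancelled?",
)
print(prompt)
```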
Response Structure
- claims: Array of factual claims being made
- confidence: Overall confidence level
  - high: Direct observation, verified facts
  - medium: Reasonable inference, needs verification
  - low: Uncertain, conflicting info, second-hand
  - n/a: Unknowable, insufficient information
- evidence: Array of evidence supporting claims
- unknown: Array of information that is missing or unknowable
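Because downstream code depends on this structure, it can help to validate each parsed response before using it. The checker below is a sketch based only on the four fields described above; `validate_response` is an illustrative helper, not something shipped with the model.

```python
ALLOWED_CONFIDENCE = {"high", "medium", "low", "n/a"}
LIST_FIELDS = ("claims", "evidence", "unknown")


def validate_response(result):
    """Return a list of schema problems; an empty list means the response is well-formed."""
    if not isinstance(result, dict):
        return ["response is not a JSON object"]

    problems = []
    for field in LIST_FIELDS:
        value = result.get(field)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"'{field}' must be a list of strings")

    confidence = result.get("confidence")
    if not isinstance(confidence, str) or confidence not in ALLOWED_CONFIDENCE:
        problems.append("'confidence' must be one of: high, medium, low, n/a")
    return problems


example = {
    "claims": ["Sarah said the meeting was cancelled"],
    "confidence": "low",
    "evidence": ["Sarah stated cancellation"],
    "unknown": ["Meeting actual status"],
}
print(validate_response(example))                   # []
print(validate_response({"confidence": "certain"}))  # four problems reported
```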
Limitations
Known Limitations
- Arithmetic: May have minor calculation errors
- Context Length: Input limited to 512 tokens (the maximum sequence length used in training)
- Language: Primarily English
- Reasoning Depth: Complex multi-hop reasoning may need additional prompting
When NOT to Use
- Critical decision-making without human oversight
- Medical diagnosis (use only as decision support with expert review)
- Legal advice (use only for research with professional review)
- Financial advice (requires expert validation)
- Safety-critical systems without extensive validation
Recommended Use
- Research and analysis with human oversight
- Information organization and structuring
- Question answering from provided context
- Uncertainty quantification in reasoning tasks
- Source attribution and fact verification assistance
Ethical Considerations
Intended Use
This model is designed to improve epistemic reasoning in AI systems by:
- Reducing hallucinations through appropriate refusal
- Improving source attribution and transparency
- Calibrating confidence levels appropriately
- Explicitly tracking unknowns
Potential Misuse
- Over-reliance: Users may trust model outputs without verification
- Context Manipulation: Malicious actors could provide false observations
- Confidence Miscalibration: Model confidence ≠ absolute certainty
Mitigation
- Always verify critical claims against authoritative sources
- Use model as decision support, not final arbiter
- Review evidence and unknown fields for completeness
- Combine with human expertise for important decisions
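One way to apply these mitigations in an application is to route responses to a human reviewer based on the `confidence` and `evidence` fields. The policy below is hypothetical, not a recommendation from the model's author; `needs_human_review` and its thresholds are illustrative and should be adapted to the risk level of the task.

```python
def needs_human_review(result, critical=False):
    """Hypothetical policy: decide whether a parsed response should be reviewed.

    Review is requested when the decision is critical, when confidence is
    anything other than "high", or when claims are made without evidence.
    """
    if critical:
        return True
    if result.get("confidence") != "high":
        return True
    if result.get("claims") and not result.get("evidence"):
        return True
    return False


second_hand = {
    "claims": ["Sarah said the meeting was cancelled"],
    "confidence": "low",
    "evidence": ["Sarah stated cancellation"],
    "unknown": ["Meeting actual status"],
}
print(needs_human_review(second_hand))  # True: confidence is low
```

A response with `"confidence": "high"` and supporting evidence passes this gate unless the caller marks the decision as critical.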
Training Details
Dataset Generation
The training dataset consists of 20,000 synthetic examples across 7 categories:
- Observation Use (5,000): Extract and use information from observations
- Source Attribution (4,000): Properly cite sources with confidence
- Complete Refusal (3,000): Refuse unknowable questions appropriately
- Graduated Uncertainty (3,000): Use calibrated confidence levels
- Reasoning from Knowledge (2,500): Apply general knowledge appropriately
- Multi-step Reasoning (1,500): Perform multi-hop inference
- Partial Information (1,000): Handle incomplete information
Training Procedure
- Base model: Qwen/Qwen2.5-1.5B
- LoRA adapters added to all attention and MLP layers
- Trained for 3 epochs on 20K examples
- Natural language format (not nested JSON) to reduce format errors
- Chat format support for flexible input styles
Key Innovations
- Natural language observations: "What you observed:" format vs structured JSON
- Explicit unknown tracking: The model is trained to list what it doesn't know
- Confidence calibration: Varied examples with appropriate confidence levels
- Refusal training: 15% of dataset focused on unknowable scenarios
Citation
If you use this model in your research, please cite:
@misc{diegetic2026,
  title={DIEGETIC: Epistemic Reasoning Language Model},
  author={Howell, Justin},
  year={2026},
  url={https://huggingface.co/howellx/diegetic-v2-qwen2.5-1.5b-lora},
  note={Production Release}
}
Links
- Demo: https://huggingface.co/spaces/howellx/diegetic-v2-demo
- Model: https://huggingface.co/howellx/diegetic-v2-qwen2.5-1.5b-lora
- White Paper: DIEGETIC Academic Whitepaper
License
This model is released under the Apache 2.0 License.
The base model (Qwen/Qwen2.5-1.5B) is also under Apache 2.0 License.
Acknowledgments
- Base Model: Qwen Team for Qwen2.5-1.5B
- Training Framework: HuggingFace Transformers, TRL, PEFT
Model Status: Production Ready
Last Updated: February 5, 2026
Test Coverage: 20 scenarios, 100% pass rate