DIEGETIC: Epistemic Reasoning Language Model

Model Name: diegetic-v2-qwen2.5-1.5b-lora
Base Model: Qwen/Qwen2.5-1.5B
Method: LoRA Fine-tuning
License: Apache 2.0


Model Description

DIEGETIC (Dynamically-grounded Inference Engine for Generative Epistemic Tracking In Conversation) is a language model fine-tuned for epistemic reasoning: tracking what it knows, what it does not know, and how much certainty each claim warrants.

Key Capabilities

  • Source Attribution: Properly cites information sources with appropriate confidence
  • Appropriate Refusal: Declines to answer unknowable questions rather than hallucinating an answer
  • Graduated Uncertainty: Uses calibrated confidence levels (high/medium/low/n/a)
  • Evidence Citation: Backs claims with specific observations
  • Unknown Tracking: Explicitly lists what information is missing or unknowable

Use Cases

  • Question answering from observations/context
  • Information verification and fact-checking
  • Legal/compliance reasoning with source tracking
  • Medical reasoning with uncertainty quantification
  • Scientific reasoning with explicit unknowns
  • Any application requiring transparent reasoning with proper confidence levels

Model Details

Architecture

  • Base Model: Qwen2.5-1.5B (1.5 billion parameters)
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
    • Rank (r): 16
    • Alpha: 32
    • Dropout: 0.05
    • Target modules: All attention and MLP layers
  • Trainable Parameters: 18.5M (1.18% of total)
  • Total Parameters: 1.56B
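
As a sanity check on the trainable-parameter figure above, the count can be reproduced from the LoRA rank and the layer shapes. This is only a sketch: the layer dimensions and the seven projection names below are assumptions taken from the public Qwen2.5-1.5B configuration, not values stated in this model card.

```python
# A LoRA adapter on a Linear(in_f, out_f) layer adds two low-rank matrices,
# A (r x in_f) and B (out_f x r), for a total of r * (in_f + out_f) weights.
RANK = 16  # from this card

# ASSUMED shapes from the public Qwen2.5-1.5B config (not stated in this card).
HIDDEN = 1536        # hidden_size
INTERMEDIATE = 8960  # intermediate_size (MLP width)
KV_DIM = 256         # num_key_value_heads (2) * head_dim (128)
NUM_LAYERS = 28      # num_hidden_layers

# ASSUMED reading of "all attention and MLP layers": name -> (in, out) features.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank=RANK):
    """Number of weights added by one LoRA adapter of the given rank."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o) for i, o in TARGETS.values())
total = per_layer * NUM_LAYERS
print(f"{total:,} trainable LoRA parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the count comes to roughly 18.5M, in line with the figure listed above.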

Training Data

  • Dataset Size: 20,000 synthetic examples
  • Training Format: Natural language observations + JSON responses
  • Categories:
    • Observation Use (25%)
    • Source Attribution (20%)
    • Complete Refusal (15%)
    • Graduated Uncertainty (15%)
    • Reasoning from Knowledge (12.5%)
    • Multi-step Reasoning (7.5%)
    • Partial Information (5%)

Training Configuration

  • Epochs: 3
  • Batch Size: 4 (per device)
  • Gradient Accumulation: 4
  • Effective Batch Size: 16
  • Learning Rate: 2e-5
  • Max Sequence Length: 512 tokens
  • Optimizer: AdamW
  • Training Time: 1.73 hours (NVIDIA A100)
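
The effective batch size above follows from the per-device batch size and the gradient accumulation. The sketch below also estimates the number of optimizer steps; it assumes a single device and that all 20,000 examples were in the training split (a held-out evaluation set would lower the step count), neither of which is confirmed by this card.

```python
import math

NUM_EXAMPLES = 20_000  # dataset size from this card
EPOCHS = 3
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4
NUM_DEVICES = 1        # ASSUMPTION: one A100, as the training-time line suggests

# Gradients from GRAD_ACCUM micro-batches are summed before each optimizer step.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_DEVICES

# ASSUMPTION: every example is used for training and the last partial batch is kept.
steps_per_epoch = math.ceil(NUM_EXAMPLES / effective_batch)
total_steps = steps_per_epoch * EPOCHS

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps: {steps_per_epoch} per epoch, {total_steps} total")
```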

Performance Metrics

  • Final Training Loss: 0.0196
  • Evaluation Loss: 0.0196
  • Token Accuracy: 99.24%
  • Valid JSON Rate: 100% (20/20 tests)
  • Refusal Accuracy: 100% (5/5 unknowable scenarios)
  • Overall Test Pass Rate: 100% (20/20 diverse scenarios)
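
Because the test counts above are small, it can help to see what a perfect score on n trials implies about the underlying pass rate. The sketch below computes the exact one-sided lower confidence bound for the all-passed case; the 95% level is an illustrative choice and the function name is hypothetical.

```python
def pass_rate_lower_bound(n_trials, alpha=0.05):
    """One-sided lower confidence bound on the true pass rate when all trials passed.

    If the true pass rate is p, the chance of n passes in a row is p**n.
    The bound is the smallest p for which that chance is still at least alpha.
    """
    if n_trials <= 0:
        raise ValueError("n_trials must be positive")
    return alpha ** (1.0 / n_trials)


for label, n in [("overall (20/20)", 20), ("refusal (5/5)", 5)]:
    bound = pass_rate_lower_bound(n)
    print(f"{label}: true pass rate is at least {bound:.1%} at 95% confidence")
```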

Usage

Installation

pip install transformers peft torch

Basic Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
import json

# Load model
model_path = "howellx/diegetic-v2-qwen2.5-1.5b-lora"
base_model_name = "Qwen/Qwen2.5-1.5B"

# Load tokenizer from base model
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Add custom special tokens
special_tokens = {
    "additional_special_tokens": [
        "<OBS>", "</OBS>",
        "<BELIEF>", "</BELIEF>",
        "<MEM>", "</MEM>",
        "<TASK>", "</TASK>",
        "<OUTPUT_JSON>", "</OUTPUT_JSON>",
        "<EPISTEMIC>",
        "<REFUSE_DIEGETIC>"
    ]
}
tokenizer.add_special_tokens(special_tokens)

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True
)

# Resize token embeddings
base_model.resize_token_embeddings(len(tokenizer))

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, model_path, is_trainable=False)
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()

# Format prompt
prompt = """You are a helpful assistant that answers questions based solely on provided observations.

What you observed:
Sarah told you that the meeting was cancelled. You have not verified this independently.

Question: Is the meeting cancelled?

Please provide your response as JSON with the following structure:
{
  "claims": ["list of claims you are making"],
  "confidence": "high/medium/low/n/a",
  "evidence": ["list of evidence from observations"],
  "unknown": ["list of things you don't know"]
}

Response:"""

# Generate
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.3,
    do_sample=True,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id
)

response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)

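# Parse JSON: take the span from the first "{" to the last "}"
json_start = response.find("{")
json_end = response.rfind("}") + 1
if json_start == -1 or json_end <= json_start:
    raise ValueError(f"No JSON object found in model output: {response!r}")
result = json.loads(response[json_start:json_end])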

print(json.dumps(result, indent=2))
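
The parsing step above slices from the first "{" to the last "}", which can fail if the model emits extra text containing a brace after the JSON object. A more tolerant alternative, sketched here with the standard library's json.JSONDecoder.raw_decode, parses the first complete object and ignores whatever follows. The helper name is hypothetical and not part of this model's tooling.

```python
import json


def extract_first_json_object(text):
    """Return the first JSON object that parses cleanly inside `text`."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and
            # tolerates trailing characters after it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # This "{" did not open a valid object; try the next one.
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model output")


sample = 'Sure. {"claims": [], "confidence": "n/a", "evidence": [], "unknown": ["x"]} Done }'
print(extract_first_json_object(sample)["confidence"])
```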

Example Output

{
  "claims": [
    "Sarah said the meeting was cancelled"
  ],
  "confidence": "low",
  "evidence": [
    "Sarah stated cancellation"
  ],
  "unknown": [
    "Meeting actual status"
  ]
}

Performance Benchmarks

Test Suite Results (20 Diverse Scenarios)

Category                Tests   Pass Rate   Quality Score
Refusal (Unknowable)    5       100%        4.00/4.0
Multi-step Reasoning    3       100%        4.00/4.0
Partial Information     3       100%        4.00/4.0
Source Attribution      3       100%        4.00/4.0
Graduated Uncertainty   3       100%        4.00/4.0
Direct Observation      3       100%        4.00/4.0
OVERALL                 20      100%        4.00/4.0

Prompt Format

The model expects a specific prompt format:

You are a helpful assistant that answers questions based solely on provided observations.

What you observed:
[Your observations or context here]

Question: [Your question here]

Please provide your response as JSON with the following structure:
{
  "claims": ["list of claims you are making"],
  "confidence": "high/medium/low/n/a",
  "evidence": ["list of evidence from observations"],
  "unknown": ["list of things you don't know"]
}

Response:
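
To avoid retyping the template, the format above can be wrapped in a small helper. This is a minimal sketch; the names PROMPT_TEMPLATE and build_prompt are hypothetical, and the literal braces of the JSON skeleton are doubled so that str.format leaves them intact.

```python
PROMPT_TEMPLATE = """You are a helpful assistant that answers questions based solely on provided observations.

What you observed:
{observations}

Question: {question}

Please provide your response as JSON with the following structure:
{{
  "claims": ["list of claims you are making"],
  "confidence": "high/medium/low/n/a",
  "evidence": ["list of evidence from observations"],
  "unknown": ["list of things you don't know"]
}}

Response:"""


def build_prompt(observations, question):
    """Fill the prompt template with the given observations and question."""
    return PROMPT_TEMPLATE.format(
        observations=observations.strip(),
        question=question.strip(),
    )


prompt = build_prompt(
    "Sarah told you that the meeting was cancelled. You have not verified this independently.",
    "Is the meeting cancelled?",
)
print(prompt)
```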

Response Structure

  • claims: Array of factual claims being made
  • confidence: Overall confidence level
    • high: Direct observation, verified facts
    • medium: Reasonable inference, needs verification
    • low: Uncertain, conflicting info, second-hand
    • n/a: Unknowable, insufficient information
  • evidence: Array of evidence supporting claims
  • unknown: Array of information that is missing or unknowable
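
A parsed response can be checked against the structure described above before it is used downstream. The validator below is a sketch under the assumption that the three array fields hold strings, as in the example output; the function name is hypothetical.

```python
ALLOWED_CONFIDENCE = {"high", "medium", "low", "n/a"}
LIST_FIELDS = ("claims", "evidence", "unknown")


def validate_response(result):
    """Return a list of problems; an empty list means the structure looks right."""
    if not isinstance(result, dict):
        return ["response is not a JSON object"]
    problems = []
    for field in LIST_FIELDS:
        value = result.get(field)
        # ASSUMPTION: each array field contains only strings.
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{field!r} must be a list of strings")
    confidence = result.get("confidence")
    if not isinstance(confidence, str) or confidence not in ALLOWED_CONFIDENCE:
        problems.append("'confidence' must be one of: high, medium, low, n/a")
    return problems


example = {
    "claims": ["Sarah said the meeting was cancelled"],
    "confidence": "low",
    "evidence": ["Sarah stated cancellation"],
    "unknown": ["Meeting actual status"],
}
print(validate_response(example))
```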

Limitations

Known Limitations

  1. Arithmetic: May have minor calculation errors
  2. Context Length: Inputs limited to 512 tokens (the maximum sequence length used in training)
  3. Language: Primarily English
  4. Reasoning Depth: Complex multi-hop reasoning may need additional prompting

When NOT to Use

  • Critical decision-making without human oversight
  • Medical diagnosis (use only as decision support with expert review)
  • Legal advice (use only for research with professional review)
  • Financial advice (requires expert validation)
  • Safety-critical systems without extensive validation

Recommended Use

  • Research and analysis with human oversight
  • Information organization and structuring
  • Question answering from provided context
  • Uncertainty quantification in reasoning tasks
  • Source attribution and fact verification assistance

Ethical Considerations

Intended Use

This model is designed to improve epistemic reasoning in AI systems by:

  • Reducing hallucinations through appropriate refusal
  • Improving source attribution and transparency
  • Calibrating confidence levels appropriately
  • Explicitly tracking unknowns

Potential Misuse

  • Over-reliance: Users may trust model outputs without verification
  • Context Manipulation: Malicious actors could provide false observations
  • Confidence Miscalibration: Model confidence ≠ absolute certainty

Mitigation

  • Always verify critical claims against authoritative sources
  • Use model as decision support, not final arbiter
  • Review evidence and unknown fields for completeness
  • Combine with human expertise for important decisions
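
One way to put the mitigation points above into practice is to route responses to a human reviewer based on the returned fields. The rules below are illustrative assumptions, not thresholds recommended by the model author, and the function name is hypothetical.

```python
def needs_human_review(result, high_stakes=False):
    """Decide whether a parsed response should be checked by a person.

    ASSUMED policy, for illustration only:
      - high-stakes questions are always reviewed;
      - anything below "high" confidence is reviewed;
      - listed unknowns or missing evidence also trigger review.
    """
    if high_stakes:
        return True
    if result.get("confidence") != "high":
        return True
    if result.get("unknown"):
        return True
    if not result.get("evidence"):
        return True
    return False


second_hand = {
    "claims": ["Sarah said the meeting was cancelled"],
    "confidence": "low",
    "evidence": ["Sarah stated cancellation"],
    "unknown": ["Meeting actual status"],
}
print(needs_human_review(second_hand))
```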

Training Details

Dataset Generation

The training dataset consists of 20,000 synthetic examples across 7 categories:

  1. Observation Use (5,000): Extract and use information from observations
  2. Source Attribution (4,000): Properly cite sources with confidence
  3. Complete Refusal (3,000): Refuse unknowable questions appropriately
  4. Graduated Uncertainty (3,000): Use calibrated confidence levels
  5. Reasoning from Knowledge (2,500): Apply general knowledge appropriately
  6. Multi-step Reasoning (1,500): Perform multi-hop inference
  7. Partial Information (1,000): Handle incomplete information
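
The per-category counts above can be cross-checked against the percentages listed under Training Data with a few lines of arithmetic. The dictionary below simply restates the counts from this list.

```python
CATEGORY_COUNTS = {
    "Observation Use": 5000,
    "Source Attribution": 4000,
    "Complete Refusal": 3000,
    "Graduated Uncertainty": 3000,
    "Reasoning from Knowledge": 2500,
    "Multi-step Reasoning": 1500,
    "Partial Information": 1000,
}

total = sum(CATEGORY_COUNTS.values())
# Multiply before dividing so the percentages come out exact for these counts.
shares = {name: 100 * count / total for name, count in CATEGORY_COUNTS.items()}

print(f"total examples: {total}")
for name, share in shares.items():
    print(f"{name}: {share}%")
```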

Training Procedure

  1. Base model: Qwen/Qwen2.5-1.5B
  2. LoRA adapters added to all attention and MLP layers
  3. Trained for 3 epochs on 20K examples
  4. Observations written in natural language (not nested JSON) to reduce format errors
  5. Chat format support for flexible input styles

Key Innovations

  • Natural language observations: "What you observed:" format vs structured JSON
  • Explicit unknown tracking: Train model to list what it doesn't know
  • Confidence calibration: Varied examples with appropriate confidence levels
  • Refusal training: 15% of dataset focused on unknowable scenarios

Citation

If you use this model in your research, please cite:

@misc{diegetic2026,
  title={DIEGETIC: Epistemic Reasoning Language Model},
  author={Howell, Justin},
  year={2026},
  url={https://huggingface.co/howellx/diegetic-v2-qwen2.5-1.5b-lora},
  note={Production Release}
}

License

This model is released under the Apache 2.0 License.

The base model (Qwen/Qwen2.5-1.5B) is also released under the Apache 2.0 License.


Acknowledgments

  • Base Model: Qwen Team for Qwen2.5-1.5B
  • Training Framework: HuggingFace Transformers, TRL, PEFT

Model Status: Production Ready
Last Updated: February 5, 2026
Test Coverage: 20 scenarios, 100% pass rate
