Instructions to use Haldi247/TinyLlama-DPO-Orca with libraries and local apps.

Libraries

How to use Haldi247/TinyLlama-DPO-Orca with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Haldi247/TinyLlama-DPO-Orca")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Haldi247/TinyLlama-DPO-Orca")
model = AutoModelForCausalLM.from_pretrained("Haldi247/TinyLlama-DPO-Orca")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use Haldi247/TinyLlama-DPO-Orca with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Haldi247/TinyLlama-DPO-Orca"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Haldi247/TinyLlama-DPO-Orca",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/Haldi247/TinyLlama-DPO-Orca

SGLang

How to use Haldi247/TinyLlama-DPO-Orca with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Haldi247/TinyLlama-DPO-Orca" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Haldi247/TinyLlama-DPO-Orca",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Haldi247/TinyLlama-DPO-Orca" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Haldi247/TinyLlama-DPO-Orca",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Model Card for TinyLlama-DPO-Orca

This model is the result of a two-stage alignment pipeline applied to TinyLlama-1.1B: Supervised Fine-Tuning (SFT) followed by Direct Preference Optimization (DPO) to align outputs with preference data.

Model Details

Model Description

Developed by: Hadeeqa Al Islam
Model type: Causal Language Model
Language(s) (NLP): English
Finetuned from model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
Training data: argilla/distilabel-intel-orca-dpo-pairs (5,922 filtered samples)
LoRA config: r=16, lora_alpha=16, target_modules=[q_proj, k_proj, v_proj, o_proj]
Training: lr=1e-07, batch=4 (effective batch size 16), epochs=2, bf16=True, beta=0.1

Uses

Direct Use

This model is intended for text generation and conversational question answering in the style of the Orca preference dataset.

Out-of-Scope Use

Due to known issues with token collapse, this model is not suitable for production deployment or long-form reliable generation without further prompt engineering or tokenizer alignment.

Bias, Risks, and Limitations

Known Issue: Token Collapse

During the SFT phase, sequence packing was implemented using add_special_tokens=False to strictly prevent cross-contamination across VRAM blocks. While this optimized memory and isolated sequences, it caused a severe distribution shift away from TinyLlama's pre-trained chat template.

Consequently, during standard inference with the tokenizer's chat template applied, the DPO model experiences catastrophic forgetting and token collapse (frequently outputting loops of |> or < < <). The extremely low BLEU score reflects this formatting mismatch rather than an inability to learn the underlying linguistic representations.
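
To make the formatting mismatch concrete, the sketch below renders a conversation in a Zephyr-style layout, which is the layout TinyLlama-1.1B-Chat-v1.0 is generally documented to use. The function `render_zephyr_style` is a hypothetical, hand-written stand-in for illustration only, not the tokenizer's actual template. The authoritative string is the one returned by `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`.

```python
# Hypothetical approximation of a Zephyr-style chat layout.
# Assumption: role tags look like <|user|> and each turn ends with </s>.

def render_zephyr_style(messages, add_generation_prompt=True):
    """Join chat turns into one prompt string with role tags and </s> markers."""
    parts = []
    for message in messages:
        parts.append(f"<|{message['role']}|>\n{message['content']}</s>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|assistant|>\n")
    return "".join(parts)


messages = [{"role": "user", "content": "Who are you?"}]
prompt_text = render_zephyr_style(messages)
print(repr(prompt_text))
```

If the SFT sequences were packed without these role tags or end-of-sequence markers, the token layout seen at inference would differ from the one seen during fine-tuning, which would be consistent with the looping fragments described above.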

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Future iterations should harmonize the sequence packing tokenization with the base model's inherent chat structure.

How to Get Started with the Model

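from transformers import AutoTokenizer

model_id = "Haldi247/TinyLlama-DPO-Orca"
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = "Who are you?"  # placeholder prompt; replace with your own
messages = [{"role": "user", "content": prompt}]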

Training Details

Training Data

The DPO phase utilized the argilla/distilabel-intel-orca-dpo-pairs dataset. To ensure high-quality preference alignment, the data was rigorously filtered:

Removed tied responses (kept only rows where status != tie).
Required a high chosen score (chosen_score >= 8).
Excluded GSM8K training data to prevent contamination (not in_gsm8k_train).

Final Dataset Size: 5,922 preference pairs.
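
The three filters above can be sketched as a single row predicate. This is a minimal illustration, assuming each row is a dict with the `status`, `chosen_score`, and `in_gsm8k_train` fields named in the list. The helper name `keep_pair` and the sample rows are made up for the example and are not taken from the training code.

```python
def keep_pair(row: dict) -> bool:
    """Return True when a preference pair passes all three filters."""
    return (
        row["status"] != "tie"          # drop tied responses
        and row["chosen_score"] >= 8    # require a high chosen score
        and not row["in_gsm8k_train"]   # avoid GSM8K train contamination
    )


# Made-up rows, one per rejection reason plus one that passes.
sample_rows = [
    {"status": "tie", "chosen_score": 9.0, "in_gsm8k_train": False},
    {"status": "swapped", "chosen_score": 7.0, "in_gsm8k_train": False},
    {"status": "unchanged", "chosen_score": 9.0, "in_gsm8k_train": True},
    {"status": "unchanged", "chosen_score": 8.0, "in_gsm8k_train": False},
]

kept = [row for row in sample_rows if keep_pair(row)]
print(len(kept))
```

With the `datasets` library, the same predicate could be passed to `Dataset.filter(keep_pair)` on the loaded training split.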

Training Procedure

The model was trained using the trl and peft libraries for Parameter-Efficient Fine-Tuning (PEFT) via LoRA.

Training Hyperparameters

Training regime: bf16 mixed precision (bf16=True to prevent gradient underflow)
LoRA Rank (r): 16
LoRA Alpha: 16
Target Modules: q_proj, k_proj, v_proj, o_proj
Beta (DPO Temperature): 0.1
Learning Rate: 1e-07
Batch Size: 4 (with gradient_accumulation_steps=4, resulting in an effective batch size of 16)
Epochs: 2
Max Length: 512
Attention Implementation: Scaled Dot-Product Attention (sdpa)
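
As a quick check on the batch arithmetic, the sketch below derives the effective batch size and an approximate optimizer step count from the figures listed above. It assumes a single GPU and counts a partial final batch as one step. The exact count reported by a trainer can differ by a step or so depending on how it handles the remainder.

```python
import math

num_pairs = 5922          # filtered preference pairs
per_device_batch = 4      # batch size per forward/backward pass
grad_accum_steps = 4      # passes accumulated per optimizer step
epochs = 2

effective_batch = per_device_batch * grad_accum_steps
steps_per_epoch = math.ceil(num_pairs / effective_batch)
total_steps = steps_per_epoch * epochs

print(effective_batch, steps_per_epoch, total_steps)  # 16 371 742
```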

Speeds, Sizes, Times

Training Time: 30.1 minutes

Evaluation

Testing Data, Factors & Metrics

Testing Data

The model was evaluated against a custom set of 10 QA prompts using reference answers.

Metrics

Performance was measured using linguistic overlap and semantic similarity metrics:

BLEU
BERTScore
Training Loss
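
To illustrate why overlap metrics react so strongly to collapsed output, the sketch below computes clipped unigram precision, which is one building block of BLEU and not the full metric (no higher-order n-grams, no brevity penalty). The function name and the example strings are hypothetical and are not drawn from the evaluation set.

```python
from collections import Counter


def clipped_unigram_precision(candidate: str, reference: str) -> float:
    """Fraction of candidate tokens that also appear in the reference,
    with each token's credit capped at its count in the reference."""
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(cand_tokens)
    overlap = sum(min(count, ref_counts[tok]) for tok, count in cand_counts.items())
    return overlap / len(cand_tokens)


reference = "Paris is the capital of France"
fluent = clipped_unigram_precision("The capital of France is Paris", reference)
collapsed = clipped_unigram_precision("|> |> |> |> |> |>", reference)
print(fluent, collapsed)
```

A looping output shares no tokens with the reference, so its overlap is zero regardless of what the model may have learned underneath.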

Results

Average BLEU: 0.0282
Average BERTScore: 0.7870
Final Train Loss: 0.6928
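
For context on the training-loss figure: the standard DPO loss for a single pair is -log(sigmoid(m)), where m is the beta-scaled difference between the chosen and rejected log-probability ratios of the policy and the reference model. When the policy equals the reference, m is 0 and the loss is ln 2, about 0.6931. The sketch below inverts that formula to estimate the margin implied by the reported loss, under the simplifying assumption that every pair has the same margin. The function names are hypothetical.

```python
import math


def dpo_loss(margin: float) -> float:
    """Per-pair DPO loss: -log(sigmoid(margin))."""
    return -math.log(1.0 / (1.0 + math.exp(-margin)))


def margin_for_loss(loss: float) -> float:
    """Invert dpo_loss: sigmoid(margin) = exp(-loss), so margin = logit(exp(-loss))."""
    p = math.exp(-loss)
    return math.log(p / (1.0 - p))


initial_loss = dpo_loss(0.0)      # policy identical to the reference model
reported_loss = 0.6928            # final train loss from the results above
implied_margin = margin_for_loss(reported_loss)

print(round(initial_loss, 4), round(implied_margin, 4))
```

Under that assumption the implied margin is well below 0.001, which suggests the DPO phase moved the policy only slightly away from the reference model.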

Technical Specifications

Compute Infrastructure

Hardware

Hardware Type: NVIDIA GeForce RTX 5070 Ti (16GB VRAM)
Compute Region: Local deployment via WSL2

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: NVIDIA GeForce RTX 5070 Ti (16GB VRAM)
Hours used: approximately 0.5 (30.1 minutes of training)
Cloud Provider: None (local machine via WSL2)
Compute Region: [More Information Needed]
Carbon Emitted: [More Information Needed]

Safetensors

Model size: 1B params
Tensor type: F16

Paper for Haldi247/TinyLlama-DPO-Orca

Quantifying the Carbon Emissions of Machine Learning (arXiv:1910.09700, published Oct 21, 2019)