Model Card for TinyLlama-DPO-Orca
This model is the result of a two-stage alignment pipeline applied to TinyLlama-1.1B-Chat-v1.0. The pipeline runs Supervised Fine-Tuning (SFT) and then Direct Preference Optimization (DPO) to align outputs with preference data.
Model Details
Model Description
- Developed by: Hadeeqa Al Islam
- Model type: Causal Language Model
- Language(s) (NLP): English
- Finetuned from model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
- Training data: argilla/distilabel-intel-orca-dpo-pairs (5,922 filtered samples)
- LoRA config: r=16, lora_alpha=16, target_modules=[q_proj, k_proj, v_proj, o_proj]
- Training: lr=1e-07, batch=4 (effective batch size 16), epochs=2, bf16=True, beta=0.1
Uses
Direct Use
This model is intended for text generation and conversational question answering in the style of the Orca preference dataset.
Out-of-Scope Use
Because of the known token-collapse issue, this model is not suitable for production deployment. It is also not suitable for reliable long-form generation unless further prompt engineering or tokenizer alignment is done.
Bias, Risks, and Limitations
Known Issue: Token Collapse
During the SFT phase, training sequences were packed with add_special_tokens=False. The intent was to strictly prevent cross-contamination between packed blocks in VRAM. This optimized memory use and isolated sequences, but it caused a severe distribution shift away from TinyLlama's pre-trained chat template.
Consequently, when the tokenizer's chat template is applied at inference time, the DPO model exhibits catastrophic forgetting and token collapse, frequently emitting loops of |> or < < <. The author attributes the very low BLEU score (0.0282) to this formatting mismatch rather than to an inability to learn the underlying linguistic representations.
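Collapsed generations like these can be flagged automatically before scoring. The sketch below is a hypothetical heuristic and was not part of the original evaluation. It marks a generation as collapsed when it compresses unusually well, which is what long runs of |> or < < < do. The helper names and the thresholds (max_ratio=0.4, min_chars=32) are assumptions.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Compressed size divided by raw size; highly repetitive text scores low."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw)) / len(raw)


def looks_collapsed(text: str, max_ratio: float = 0.4, min_chars: int = 32) -> bool:
    """Hypothetical heuristic: flag long generations made mostly of repeated fragments."""
    if len(text) < min_chars:
        return False  # too short to judge reliably
    return compression_ratio(text) < max_ratio
```

Compression is used here instead of token counting so that both spaced loops (< < <) and unspaced loops (|>|>|>) are caught.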
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. Future iterations should harmonize the sequence packing tokenization with the base model's inherent chat structure.
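One way to act on this recommendation is to render every SFT example in the chat layout the base model expects before packing. Each example would then also end with an end-of-sequence marker. The sketch below makes several assumptions:

- It uses the Zephyr-style layout (<|user|>, <|assistant|>, </s>) that TinyLlama-1.1B-Chat-v1.0 is believed to use. Verify this against tokenizer.chat_template.
- It packs by a character budget rather than by tokens, for simplicity.
- The helper names are hypothetical.

In practice, tokenizer.apply_chat_template(messages, tokenize=False) is the safer source of the template.

```python
EOS = "</s>"  # assumed end-of-sequence string for the TinyLlama tokenizer


def render_example(prompt: str, response: str, system: str = "") -> str:
    """Render one SFT pair in an assumed Zephyr-style chat layout."""
    parts = []
    if system:
        parts.append(f"<|system|>\n{system}{EOS}\n")
    parts.append(f"<|user|>\n{prompt}{EOS}\n")
    parts.append(f"<|assistant|>\n{response}{EOS}\n")
    return "".join(parts)


def pack_examples(examples: list[tuple[str, str]], max_chars: int) -> list[str]:
    """Greedily pack rendered examples into chunks of at most max_chars characters.

    A single example longer than max_chars still becomes its own (oversized) chunk.
    """
    chunks, current = [], ""
    for prompt, response in examples:
        text = render_example(prompt, response)
        # Start a new chunk when adding this example would exceed the budget.
        if current and len(current) + len(text) > max_chars:
            chunks.append(current)
            current = ""
        current += text
    if current:
        chunks.append(current)
    return chunks
```

With this layout, every packed example keeps the role markers and EOS boundaries that appear at inference time.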
How to Get Started with the Model
from transformers import AutoTokenizer

model_id = "Haldi247/TinyLlama-DPO-Orca"
tokenizer = AutoTokenizer.from_pretrained(model_id)
prompt = "Who are you?"  # replace with your own question
messages = [{"role": "user", "content": prompt}]
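The snippet above only prepares the tokenizer and the messages. A self-contained sketch that also loads the model and decodes only the newly generated tokens might look like this. It assumes:

- the repository loads directly with AutoModelForCausalLM;
- greedy decoding (do_sample=False) with a small token budget is acceptable;
- the helper name generate_reply, which is hypothetical.

Given the token-collapse issue described above, degenerate output should be expected.

```python
MODEL_ID = "Haldi247/TinyLlama-DPO-Orca"


def generate_reply(prompt: str, max_new_tokens: int = 64) -> str:
    """Generate a single chat reply with greedy decoding (sketch)."""
    # Imported lazily so this module can be loaded without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    messages = [{"role": "user", "content": prompt}]
    # Apply the tokenizer's chat template and append the assistant-turn prompt.
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens and decode only the continuation.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For example, print(generate_reply("Who are you?")) downloads the model on first use and prints its reply.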
Training Details
Training Data
The DPO phase utilized the argilla/distilabel-intel-orca-dpo-pairs dataset. To ensure high-quality preference alignment, the data was rigorously filtered:
- Removed tied responses (status != tie).
- Required a high chosen score (chosen_score >= 8).
- Excluded GSM8K training data to prevent contamination (not in_gsm8k_train).
Final Dataset Size: 5,922 preference pairs.
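The three filters above can be expressed as a single predicate over the status, chosen_score and in_gsm8k_train columns. The sketch below is not the author's preprocessing script. It assumes the Hugging Face datasets library and a "train" split, and the helper names keep_pair and load_filtered_pairs are hypothetical.

```python
def keep_pair(row: dict) -> bool:
    """Return True for preference pairs that pass all three filters."""
    return (
        row["status"] != "tie"          # drop tied responses
        and row["chosen_score"] >= 8    # require a high-scoring chosen response
        and not row["in_gsm8k_train"]   # avoid GSM8K train-set contamination
    )


def load_filtered_pairs():
    """Load the DPO pairs and apply keep_pair (requires the datasets library)."""
    from datasets import load_dataset

    dataset = load_dataset("argilla/distilabel-intel-orca-dpo-pairs", split="train")
    return dataset.filter(keep_pair)
```

If the dataset has not changed since training, len(load_filtered_pairs()) should match the 5,922 pairs reported above.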
Training Procedure
The model was trained using the trl and peft libraries for Parameter-Efficient Fine-Tuning (PEFT) via LoRA.
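For reference, LoRA as commonly formulated freezes each targeted weight matrix W_0 and learns a low-rank update BA scaled by alpha/r. Only the targeted projections (q_proj, k_proj, v_proj, o_proj) receive such updates. With r = 16 and alpha = 16, as used here, the scaling factor is exactly 1:

```latex
\begin{aligned}
W' &= W_0 + \frac{\alpha}{r}\, B A, \qquad B \in \mathbb{R}^{d \times r},\ A \in \mathbb{R}^{r \times k},\\
\frac{\alpha}{r} &= \frac{16}{16} = 1 .
\end{aligned}
```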
Training Hyperparameters
- Training regime: bf16 mixed precision (bf16=True to prevent gradient underflow)
- LoRA Rank (r): 16
- LoRA Alpha: 16
- Target Modules: q_proj, k_proj, v_proj, o_proj
- Beta (DPO Temperature): 0.1
- Learning Rate: 1e-07
- Batch Size: 4 (with gradient_accumulation_steps=4, resulting in an effective batch size of 16)
- Epochs: 2
- Max Length: 512
- Attention Implementation: Scaled Dot-Product Attention (sdpa)
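The hyperparameters above could be wired into peft and trl roughly as follows. This is a sketch, not the author's training script. It makes these assumptions:

- a recent trl release in which DPOConfig accepts beta and max_length;
- a single GPU, so effective batch size = per-device batch × accumulation steps;
- the helper build_configs and the default output_dir name, both hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DPOHyperparams:
    """Values from the Training Hyperparameters list."""
    lora_r: int = 16
    lora_alpha: int = 16
    target_modules: list[str] = field(
        default_factory=lambda: ["q_proj", "k_proj", "v_proj", "o_proj"]
    )
    beta: float = 0.1
    learning_rate: float = 1e-7
    per_device_batch_size: int = 4
    grad_accum_steps: int = 4
    epochs: int = 2
    max_length: int = 512

    @property
    def effective_batch_size(self) -> int:
        # Single GPU assumed: per-device batch x gradient accumulation steps.
        return self.per_device_batch_size * self.grad_accum_steps

    @property
    def lora_scaling(self) -> float:
        # Multiplier applied to the low-rank update BA.
        return self.lora_alpha / self.lora_r


def build_configs(hp: DPOHyperparams, output_dir: str = "tinyllama-dpo-orca"):
    """Translate the hyperparameters into peft/trl config objects (needs both libraries)."""
    from peft import LoraConfig
    from trl import DPOConfig

    lora_config = LoraConfig(
        r=hp.lora_r,
        lora_alpha=hp.lora_alpha,
        target_modules=hp.target_modules,
        task_type="CAUSAL_LM",
    )
    training_args = DPOConfig(
        output_dir=output_dir,
        beta=hp.beta,
        learning_rate=hp.learning_rate,
        per_device_train_batch_size=hp.per_device_batch_size,
        gradient_accumulation_steps=hp.grad_accum_steps,
        num_train_epochs=hp.epochs,
        max_length=hp.max_length,
        bf16=True,
    )
    return lora_config, training_args
```

The attention implementation is chosen when loading the model, for example AutoModelForCausalLM.from_pretrained(..., attn_implementation="sdpa"). The LoRA config is typically passed to DPOTrainer through its peft_config argument.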
Speeds, Sizes, Times
- Training Time: 30.1 minutes
Evaluation
Testing Data, Factors & Metrics
Testing Data
The model was evaluated on a custom set of 10 QA prompts, with its generations compared against reference answers.
Metrics
Performance was measured with a lexical-overlap metric (BLEU) and a semantic-similarity metric (BERTScore). The final training loss is also reported:
- BLEU
- BERTScore
- Training Loss
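The BLEU implementation used for this card is not specified. To illustrate what the metric measures, the sketch below computes an unsmoothed corpus-level BLEU over whitespace tokens, with one reference per hypothesis. It combines clipped 1- to 4-gram precisions by geometric mean and multiplies by a brevity penalty. Library implementations such as sacrebleu or Hugging Face evaluate add tokenization and smoothing, so their scores will not match exactly. BERTScore compares contextual embeddings instead and is not reproduced here.

```python
import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: list[str], references: list[str], max_n: int = 4) -> float:
    """Unsmoothed corpus BLEU with one reference per hypothesis (whitespace tokens)."""
    matches = [0] * max_n  # clipped n-gram matches per order
    totals = [0] * max_n   # hypothesis n-grams per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            ref_counts = ngram_counts(r, n)
            for gram, count in ngram_counts(h, n).items():
                # Clip each n-gram count by its count in the reference.
                matches[n - 1] += min(count, ref_counts[gram])
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # any zero precision makes the unsmoothed geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)
```

A collapsed output such as "|> |> |> |>" shares no unigrams with a normal reference, so it scores exactly 0.0 under this unsmoothed variant.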
Results
- Average BLEU: 0.0282
- Average BERTScore: 0.7870
- Final Train Loss: 0.6928
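The final training loss can be read against the standard DPO objective. For one preference pair the loss is -log σ(β·m). Here m is the chosen response's policy-to-reference log-probability ratio minus the rejected response's ratio. While the policy still matches the reference model, m = 0 and the loss equals ln 2 ≈ 0.6931. The sketch below computes this per-pair loss with β = 0.1 as used here. The helper dpo_loss is hypothetical, and the inputs are illustrative log-ratios, not measured values.

```python
import math


def dpo_loss(logratio_chosen: float, logratio_rejected: float, beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio))).

    Each log-ratio is log pi_theta(y|x) - log pi_ref(y|x) for that response.
    """
    z = beta * (logratio_chosen - logratio_rejected)
    # -log(sigmoid(z)) == log(1 + exp(-z)), computed stably for either sign of z.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

On this scale, dpo_loss(0.0, 0.0) returns ln 2. The reported 0.6928 sits just below that value, which is consistent with small implicit reward margins on average.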
Technical Specifications
Compute Infrastructure
Hardware
- Hardware Type: NVIDIA GeForce RTX 5070 Ti (16GB VRAM)
- Compute Region: Local deployment via WSL2
Summary
Model Examination [optional]
[More Information Needed]
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: NVIDIA GeForce RTX 5070 Ti (16GB VRAM)
- Hours used: approximately 0.5 (the reported 30.1-minute training run)
- Cloud Provider: None (local machine via WSL2)
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
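At its core, the calculator's estimate is energy used multiplied by the carbon intensity of the local grid. The sketch below shows that arithmetic. The card reports only the 30.1-minute training time, so the power-draw and grid-intensity arguments are placeholders to be filled in. The helper estimate_co2_kg is hypothetical.

```python
def estimate_co2_kg(avg_power_watts: float, hours: float, grid_kg_per_kwh: float) -> float:
    """Estimated emissions (kg CO2eq) = energy (kWh) x grid carbon intensity (kg CO2eq/kWh)."""
    energy_kwh = avg_power_watts / 1000 * hours
    return energy_kwh * grid_kg_per_kwh


# Reported training time, converted from minutes to hours.
TRAINING_HOURS = 30.1 / 60
```

Calling estimate_co2_kg(measured_watts, TRAINING_HOURS, local_intensity) with measured values would give the missing Carbon Emitted figure.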
Technical Specifications [optional]
Model Architecture and Objective
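Causal language model fine-tuned from TinyLlama/TinyLlama-1.1B-Chat-v1.0 with LoRA adapters (r=16) on q_proj, k_proj, v_proj and o_proj. It was trained with SFT followed by DPO (beta=0.1).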
Software
trl, peft
Citation [optional]
BibTeX:
[More Information Needed]
APA:
[More Information Needed]
Glossary [optional]
[More Information Needed]
More Information [optional]
[More Information Needed]
Model Card Authors [optional]
[More Information Needed]
Model Card Contact
[More Information Needed]