Model Card for kaanino/tiny_dpo
TinyLlama-1.1B fine-tuned using DPO for QA.
Model Details
Model Description
TinyLlama-1.1B fine-tuned with Direct Preference Optimization (DPO) for question-answering (QA) tasks, specifically QA on STEM course material. The model uses quantization and parameter-efficient fine-tuning (PEFT) to keep training and inference efficient; a sketch of this setup follows the list below.
- Developed by: Kaan Uçar, Elias Naha, Albert Troussard
- Model type: AutoModelForCausalLM
- Language(s) (NLP): English
- Finetuned from model: TinyLlama-1.1B-Chat-v0.1
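The card states that quantization and PEFT were used but does not give the exact configuration. Below is a minimal sketch of a typical 4-bit quantization + LoRA setup with bitsandbytes and peft; every parameter value here is an assumption for illustration, not the configuration actually used for this model.

```python
# Illustrative only: a common 4-bit + LoRA setup. None of these values
# come from the card; they are stand-in defaults.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the base model in 4-bit NF4 quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v0.1",
    quantization_config=bnb_config,
)

# Attach LoRA adapters so only a small set of parameters is trained.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,   # assumed values
    target_modules=["q_proj", "v_proj"],       # assumed target modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```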
Uses
Direct Use
This model can be used directly for question answering tasks without additional fine-tuning.
Downstream Use
The model can be fine-tuned further for specific QA datasets or integrated into larger systems for enhanced performance in question answering applications.
Out-of-Scope Use
The model is not suitable for tasks outside of question answering, such as generating creative content, providing medical or legal advice, or any use case requiring high levels of accuracy and reliability without proper validation.
Bias, Risks, and Limitations
The model may exhibit biases present in the training data and could potentially generate harmful content. Users should exercise caution and consider these limitations when deploying the model.
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. Continuous monitoring and evaluation are recommended to mitigate potential negative impacts.
How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kaanino/tiny_dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Example usage
input_text = "What is the capital of France?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
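Note that chat-tuned TinyLlama variants usually expect a chat-formatted prompt. If the tokenizer ships a chat template, routing the question through it may improve answers; this is a hedged sketch, since whether this checkpoint defines a template is not stated in the card.

```python
# Optional: use the tokenizer's chat template if one is defined.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": input_text}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt")
```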
Training Details
Training Data
We mainly used three sources of data, each converted into preference pairs of the form sketched below:
- The Open Platypus dataset on Hugging Face. This dataset provides a question and a chosen answer; we generated the rejected answers with GPT-2.
- The Stack Exchange dataset on Hugging Face, a preprocessed version of the H4 Stack Exchange dataset on Hugging Face.
- A preference dataset generated with GPT-2 from EPFL STEM course questions.
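For reference, DPO-style preference data is conventionally stored as (prompt, chosen, rejected) triples. The field names below follow the common TRL convention and are an assumption, since the card does not specify the exact schema:

```python
# Hypothetical example record; field names (prompt/chosen/rejected)
# follow the common TRL/DPO convention, not a schema from the card.
example = {
    "prompt": "What is the time complexity of binary search?",
    "chosen": "O(log n), because the search interval halves at each step.",
    "rejected": "O(n), since every element must be inspected.",  # weak GPT-2-style answer
}
```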
Training Procedure
Direct Preference Optimization
The model was trained with Direct Preference Optimization (DPO), which fits the policy directly on (prompt, chosen, rejected) preference pairs instead of training a separate reward model.
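For context, the standard DPO objective (Rafailov et al., 2023) that this procedure optimizes is shown below, where $\pi_\theta$ is the policy being trained, $\pi_{\mathrm{ref}}$ is the frozen reference model, and $\beta$ is a temperature parameter (the card does not state the $\beta$ used):

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$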
Training Hyperparameters
- Training regime: Mixed precision (fp16)
- Learning rate: 1e-5
- Batch size: 10
- Epochs: 1
- Optimizer: paged_adamw_8bit
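A minimal training sketch with TRL's DPOTrainer, plugging in the hyperparameters listed above. The dataset path and output directory are assumptions, and the exact trl API (e.g., `processing_class` vs. `tokenizer`) varies across versions:

```python
# Sketch only: assumes trl's DPOTrainer and a preference dataset with
# prompt/chosen/rejected columns. API details vary across trl versions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Assumed path; the card does not publish the combined dataset file.
train_dataset = load_dataset("json", data_files="preference_pairs.json")["train"]

args = DPOConfig(
    output_dir="tiny_dpo",
    per_device_train_batch_size=10,   # batch size from the card
    learning_rate=1e-5,               # learning rate from the card
    num_train_epochs=1,               # epochs from the card
    fp16=True,                        # mixed precision from the card
    optim="paged_adamw_8bit",         # optimizer from the card
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,       # `tokenizer=` in older trl versions
)
trainer.train()
```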
Evaluation
Testing Data, Factors & Metrics
Testing Data
[More Information Needed]
Factors
[More Information Needed]
Metrics
[More Information Needed]
Results
[More Information Needed]
Summary
[More Information Needed]