Model Card for kaanino/tiny_dpo
TinyLlama-1.1B fine-tuned using DPO for QA.
Model Details
Model Description
TinyLlama-1.1B fine-tuned with Direct Preference Optimization (DPO) for question-answering (QA) tasks, specifically QA on STEM course material. The model uses quantization and parameter-efficient fine-tuning (PEFT) to keep training and inference efficient; a sketch of this setup follows the list below.
- Developed by: Kaan Uçar, Elias Naha, Albert Troussard
- Model type: AutoModelForCausalLM
- Language(s) (NLP): English
- Finetuned from model: TinyLlama-1.1B-Chat-v0.1
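The card states that quantization and PEFT were used but does not give the exact configuration. Below is a minimal sketch of a typical 4-bit quantization + LoRA setup with bitsandbytes and peft; every parameter value here is an assumption for illustration, not the configuration actually used for this model.

```python
# Illustrative only: a common 4-bit + LoRA setup. None of these values
# come from the card; they are stand-in defaults.
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the base model in 4-bit NF4 quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v0.1",
    quantization_config=bnb_config,
)

# Attach LoRA adapters so only a small set of parameters is trained.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,   # assumed values
    target_modules=["q_proj", "v_proj"],       # assumed target modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```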
Uses
Direct Use
This model can be used directly for question answering tasks without additional fine-tuning.
Downstream Use
The model can be fine-tuned further for specific QA datasets or integrated into larger systems for enhanced performance in question answering applications.
Out-of-Scope Use
The model is not suitable for tasks outside of question answering, such as generating creative content, providing medical or legal advice, or any use case requiring high levels of accuracy and reliability without proper validation.
Bias, Risks, and Limitations
The model may exhibit biases present in the training data and could potentially generate harmful content. Users should exercise caution and consider these limitations when deploying the model.
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. Continuous monitoring and evaluation are recommended to mitigate potential negative impacts.
How to Get Started with the Model
Use the code below to get started with the model.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kaanino/tiny_dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Example usage
input_text = "What is the capital of France?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
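Note that chat-tuned TinyLlama variants usually expect a chat-formatted prompt. If the tokenizer ships a chat template, routing the question through it may improve answers; this is a hedged sketch, since whether this checkpoint defines a template is not stated in the card.

```python
# Optional: use the tokenizer's chat template if one is defined.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": input_text}]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt")
```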
Training Details
Training Data
We mainly used three sources of data, each converted into preference pairs of the form sketched below:
- The Open Platypus dataset on Hugging Face. This dataset provides a question and a chosen answer; we generated the rejected answers with GPT-2.
- The Stack Exchange dataset on Hugging Face, a preprocessed version of the H4 Stack Exchange dataset on Hugging Face.
- A preference dataset generated with GPT-2 from EPFL STEM course questions.
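For reference, DPO-style preference data is conventionally stored as (prompt, chosen, rejected) triples. The field names below follow the common TRL convention and are an assumption, since the card does not specify the exact schema:

```python
# Hypothetical example record; field names (prompt/chosen/rejected)
# follow the common TRL/DPO convention, not a schema from the card.
example = {
    "prompt": "What is the time complexity of binary search?",
    "chosen": "O(log n), because the search interval halves at each step.",
    "rejected": "O(n), since every element must be inspected.",  # weak GPT-2-style answer
}
```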
Training Procedure
Direct Preference Optimization
The model was trained with Direct Preference Optimization (DPO), which fits the policy directly on (prompt, chosen, rejected) preference pairs instead of training a separate reward model.
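For context, the standard DPO objective (Rafailov et al., 2023) that this procedure optimizes is shown below, where $\pi_\theta$ is the policy being trained, $\pi_{\mathrm{ref}}$ is the frozen reference model, and $\beta$ is a temperature parameter (the card does not state the $\beta$ used):

$$
\mathcal{L}_{\mathrm{DPO}}(\theta) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$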
Training Hyperparameters
- Training regime: Mixed precision (fp16)
- Learning rate: 1e-5
- Batch size: 10
- Epochs: 1
- Optimizer: paged_adamw_8bit
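A minimal training sketch with TRL's DPOTrainer, plugging in the hyperparameters listed above. The dataset path and output directory are assumptions, and the exact trl API (e.g., `processing_class` vs. `tokenizer`) varies across versions:

```python
# Sketch only: assumes trl's DPOTrainer and a preference dataset with
# prompt/chosen/rejected columns. API details vary across trl versions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Assumed path; the card does not publish the combined dataset file.
train_dataset = load_dataset("json", data_files="preference_pairs.json")["train"]

args = DPOConfig(
    output_dir="tiny_dpo",
    per_device_train_batch_size=10,   # batch size from the card
    learning_rate=1e-5,               # learning rate from the card
    num_train_epochs=1,               # epochs from the card
    fp16=True,                        # mixed precision from the card
    optim="paged_adamw_8bit",         # optimizer from the card
)

trainer = DPOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,       # `tokenizer=` in older trl versions
)
trainer.train()
```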
Evaluation
Testing Data, Factors & Metrics
Testing Data
[More Information Needed]
Factors
[More Information Needed]
Metrics
[More Information Needed]
Results
[More Information Needed]
Summary
[More Information Needed]