
Model Card for kaanino/tiny_dpo

TinyLlama-1.1B fine-tuned using DPO for QA.


Model Details

Model Description

TinyLlama-1.1B fine-tuned with Direct Preference Optimization (DPO) for Question Answering (QA), specifically QA for STEM courses. The model leverages quantization and parameter-efficient fine-tuning (PEFT) to keep training and inference memory-efficient; a sketch of such a setup follows the list below.

  • Developed by: Kaan Uçar, Elias Naha, Albert Troussard
  • Model type: Causal language model (loaded via AutoModelForCausalLM)
  • Language(s) (NLP): English
  • Finetuned from model: TinyLlama-1.1B-Chat-v0.1
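
The card does not document the exact quantization and PEFT configuration, so the following is only a minimal sketch of how such a setup is commonly assembled with bitsandbytes 4-bit quantization and LoRA adapters via peft. All numeric values (quantization type, LoRA rank, alpha, dropout, target modules) are assumptions, not documented settings.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Illustrative 4-bit quantization config; not the card's documented settings
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v0.1",
    quantization_config=bnb_config,
    device_map="auto",
)

# Illustrative LoRA adapter config; rank, alpha, dropout, and target
# modules are assumptions
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()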

Uses

Direct Use

This model can be used directly for question answering tasks without additional fine-tuning.
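
As a minimal sketch of direct use, the model can be queried through the transformers text-generation pipeline. The prompt format below is an assumption; the card does not document a required prompt template.

from transformers import pipeline

# Ask the model a question through the text-generation pipeline
qa = pipeline("text-generation", model="kaanino/tiny_dpo")

# Hypothetical prompt format
prompt = "Question: What is the derivative of x^2?\nAnswer:"
result = qa(prompt, max_new_tokens=64)
print(result[0]["generated_text"])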

Downstream Use

The model can be fine-tuned further for specific QA datasets or integrated into larger systems for enhanced performance in question answering applications.
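
A hedged sketch of further fine-tuning on a custom QA dataset with the standard transformers Trainer follows. The dataset file qa_pairs.json, its "text" column, and the training hyperparameters are all hypothetical, not part of this card.

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "kaanino/tiny_dpo"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Llama-family tokenizers often ship without a pad token; reuse EOS for padding
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Hypothetical QA dataset: a JSON file with a "text" column of formatted Q/A pairs
dataset = load_dataset("json", data_files="qa_pairs.json", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="tiny_dpo_ft",       # hypothetical output directory
        per_device_train_batch_size=4,  # assumed value
        num_train_epochs=1,             # assumed value
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()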

Out-of-Scope Use

The model is not suitable for tasks outside of question answering, such as generating creative content, providing medical or legal advice, or any use case requiring high levels of accuracy and reliability without proper validation.

Bias, Risks, and Limitations

The model may exhibit biases present in the training data and could potentially generate harmful content. Users should exercise caution and consider these limitations when deploying the model.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. Continuous monitoring and evaluation are recommended to mitigate potential negative impacts.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "kaanino/tiny_dpo"

# Load the fine-tuned model and its tokenizer from the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Example usage
input_text = "What is the capital of France?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Training Details

Training Data

We mainly used three sources of data: [More Information Needed]

Training Procedure

The model was trained with Direct Preference Optimization (DPO), which fine-tunes the policy directly on preference pairs of chosen and rejected answers instead of training a separate reward model; a sketch of such a training run follows the hyperparameters list below.

Training Hyperparameters

  • Training regime: Mixed precision (fp16)
  • Learning rate: 1e-5
  • Batch size: 10
  • Epochs: 1
  • Optimizer: paged_adamw_8bit
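
Putting the hyperparameters above together, here is a minimal sketch of a DPO training run with the trl library's DPOTrainer. The trl API has changed across versions; this follows the older TrainingArguments-based signature. The toy preference dataset and the beta value are assumptions, not documented settings.

from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "TinyLlama/TinyLlama-1.1B-Chat-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Toy preference data; DPOTrainer expects "prompt", "chosen", and "rejected" columns
train_dataset = Dataset.from_dict({
    "prompt": ["What is Ohm's law?"],
    "chosen": ["V = IR: voltage equals current times resistance."],
    "rejected": ["It is a law about ohms."],
})

training_args = TrainingArguments(
    output_dir="tiny_dpo",
    per_device_train_batch_size=10,  # batch size from this card
    num_train_epochs=1,              # epochs from this card
    learning_rate=1e-5,              # learning rate from this card
    fp16=True,                       # mixed-precision regime from this card
    optim="paged_adamw_8bit",        # optimizer from this card
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,   # trl builds a frozen reference copy automatically
    args=training_args,
    beta=0.1,         # assumed KL-penalty strength; not documented on this card
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()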

Evaluation

Testing Data, Factors & Metrics

Testing Data

[More Information Needed]

Factors

[More Information Needed]

Metrics

[More Information Needed]

Results

[More Information Needed]

Summary

[More Information Needed]
