
Model Introduction

  • Goal: train the model with ORPO (odds ratio preference optimization), aiming to match the results usually obtained by instruction fine-tuning followed by reinforcement learning from human feedback; a hedged training sketch follows this list
  • Base model: LLaMA3-8B
  • Dataset: mlabonne/orpo-dpo-mix-40k (44,245 examples in total; only 10,000 of them were used)
  • GPU: RTX 4090, 24 GB
  • Epochs: 1
  • per_device_train_batch_size=2
  • gradient_accumulation_steps=4
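
The training script itself is not included in this card. The following is a minimal sketch of how the setup listed above could be reproduced with trl's ORPOTrainer. The base checkpoint name (meta-llama/Meta-Llama-3-8B), the shuffle seed, the output directory, and the bf16 setting are assumptions not stated in the card, and keyword names may differ between trl versions (for example, tokenizer vs processing_class).

import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

base_model = "meta-llama/Meta-Llama-3-8B"  # assumed base checkpoint, not stated in the card
tokenizer = AutoTokenizer.from_pretrained(base_model)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Llama 3 tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)

# Randomly pick 10,000 of the 44,245 examples, as described above (the seed is an assumption).
dataset = load_dataset("mlabonne/orpo-dpo-mix-40k", split="train")
dataset = dataset.shuffle(seed=42).select(range(10_000))

# Hyperparameters listed in the card; everything else is left at library defaults.
args = ORPOConfig(
    output_dir="llama3-8B-ORPO",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    bf16=True,
)

# Depending on the trl version, the conversational chosen/rejected columns of this
# dataset may need to be flattened into plain prompt/chosen/rejected strings first.
trainer = ORPOTrainer(model=model, args=args, train_dataset=dataset, tokenizer=tokenizer)
trainer.train()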

Model Usage

import transformers
import torch

model_id = "snowfly/llama3-8B-ORPO"

# Load the weights in bfloat16 and let accelerate place them on the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# Returns a list of dicts; the completion is in the "generated_text" field.
print(pipeline("Hey how are you doing today?"))
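
Because the model was preference-tuned on chat-formatted data, a chat-style prompt may give better results than a raw string. The example below is a sketch that assumes the uploaded tokenizer ships a Llama-3-style chat template, which this card does not confirm; the sampling parameters are illustrative.

messages = [{"role": "user", "content": "Hey how are you doing today?"}]
# Assumes a chat template is present in the tokenizer config (not confirmed by this card).
prompt = pipeline.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipeline(prompt, max_new_tokens=256, do_sample=True, temperature=0.7)
print(outputs[0]["generated_text"])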

To Be Continued

  • The GPU did not have enough memory, so each batch held only a few samples and the training loss curve oscillated sharply. Further multi-epoch training with larger batch sizes on GPUs with more memory is planned; see the sketch after this list
  • With the configuration above, training on the full dataset for 3 epochs would take about 72 hours, so in practice 10,000 randomly selected examples were trained for 1 epoch
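
As a rough illustration of that follow-up plan, the sketch below raises the effective batch size (per_device_train_batch_size × gradient_accumulation_steps) from 2 × 4 = 8 to an assumed 8 × 8 = 64 to damp the loss oscillation. The concrete values, output directory, and GPU are assumptions, not settings from this card.

from trl import ORPOConfig

args = ORPOConfig(
    output_dir="llama3-8B-ORPO-v2",   # hypothetical output directory
    num_train_epochs=3,               # the full 3-epoch run mentioned above
    per_device_train_batch_size=8,    # assumes a GPU with more memory than 24 GB
    gradient_accumulation_steps=8,    # effective batch size 8 * 8 = 64 per device
    bf16=True,
)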