snowfly/llama3-8B-ORPO


Model introduction

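Goal: train the model with ORPO, aiming to match the results previously obtained by instruction fine-tuning followed by reinforcement learning from human feedback (RLHF)
Base model: LLaMA3-8B
Dataset: mlabonne/orpo-dpo-mix-40k (44,245 examples in total; only 10,000 of them were used)
GPU: RTX 4090, 24 GB
epoch: 1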
per_device_train_batch_size=2
gradient_accumulation_steps=4
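
For readers unfamiliar with ORPO, the objective mentioned in the goal above can be sketched for a single preference pair as follows. This is a minimal illustration of the published ORPO formulation (a supervised term on the chosen response plus a weighted odds-ratio term), not the training code used for this model. The function names and the weight `lam=0.1` are assumptions chosen for illustration; the inputs are assumed to be length-normalized (average per-token) log-probabilities, so they must be negative.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp), with avg_logp < 0."""
    # log1p(-exp(x)) evaluates log(1 - p) without forming 1 - p directly.
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, lam: float = 0.1) -> float:
    """ORPO loss for one (chosen, rejected) pair; lam is an illustrative weight."""
    # Supervised fine-tuning term: negative log-likelihood of the chosen response.
    sft_term = -chosen_avg_logp
    # Log odds ratio between the chosen and the rejected response.
    log_odds_ratio = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    # -log(sigmoid(z)) rewritten as log(1 + exp(-z)).
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    return sft_term + lam * odds_ratio_term


# The loss is lower when the model prefers the chosen response.
good = orpo_loss(chosen_avg_logp=-0.5, rejected_avg_logp=-2.0)
bad = orpo_loss(chosen_avg_logp=-2.0, rejected_avg_logp=-0.5)
print(good < bad)
```

Because both terms are computed from the same forward passes, ORPO needs neither a separate reward model nor a frozen reference model, which is why it can stand in for the two-stage pipeline named in the goal.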

Usage

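import transformers
import torch

model_id = "snowfly/llama3-8B-ORPO"

# Load the model in bfloat16 and let device_map="auto" place it on the available devices.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

# The pipeline returns a list of dicts, each with a "generated_text" field.
print(pipeline("Hey how are you doing today?"))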
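
The snippet above prints the raw pipeline result. As a hedged sketch, the variant below passes explicit generation settings and returns only the generated strings. The helper names `extract_generated_text` and `generate` are hypothetical, and the sampling values (`max_new_tokens=128`, `temperature=0.7`, `top_p=0.9`) are illustrative assumptions, not settings recommended by the model author.

```python
from typing import Any


def extract_generated_text(outputs: list[dict[str, Any]]) -> list[str]:
    """Pull the generated strings out of a text-generation pipeline result."""
    return [item["generated_text"] for item in outputs]


def generate(prompt: str, model_id: str = "snowfly/llama3-8B-ORPO") -> list[str]:
    """Load the model and generate a completion for one prompt."""
    # Imported here so the helper above can be used without loading the model.
    import torch
    import transformers

    pipe = transformers.pipeline(
        "text-generation",
        model=model_id,
        model_kwargs={"torch_dtype": torch.bfloat16},
        device_map="auto",
    )
    # Sampling parameters are illustrative assumptions.
    outputs = pipe(
        prompt,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )
    return extract_generated_text(outputs)
```

Calling `generate("Hey how are you doing today?")` downloads the weights on first use, so it needs enough disk space and GPU memory for an 8B-parameter model.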

To be continued

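The GPU did not have enough memory, so each batch was small and the training loss curve oscillated sharply. The next step is to train for multiple epochs with a larger batch size on more GPUs with more memory.
With the configuration above, training on the full dataset for 3 epochs would take 72 hours; in practice, 10,000 randomly selected examples were trained for 1 epoch.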
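
The figures above imply a few quantities that are easy to check. The sketch below derives the effective batch size, the number of optimizer steps, and a rough time estimate for the 10,000-example run. It assumes a single GPU and that training time scales linearly with the number of examples; the card does not report the actual duration of the subset run, so the estimate is only indicative.

```python
import math

TOTAL_EXAMPLES = 44245      # full dataset size stated in the card
SUBSET_EXAMPLES = 10000     # examples actually used
PER_DEVICE_BATCH = 2        # per_device_train_batch_size
GRAD_ACCUM = 4              # gradient_accumulation_steps
NUM_GPUS = 1                # assumption: a single RTX 4090

FULL_RUN_HOURS = 72.0       # stated time for the full dataset
FULL_RUN_EPOCHS = 3         # epochs covered by that 72-hour figure

# Examples that contribute to each optimizer update.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS

# Optimizer steps for one epoch over the subset.
steps_per_epoch = math.ceil(SUBSET_EXAMPLES / effective_batch)

# Linear-scaling estimate: hours per example, then hours for the subset run.
hours_per_example = FULL_RUN_HOURS / (FULL_RUN_EPOCHS * TOTAL_EXAMPLES)
subset_hours = hours_per_example * SUBSET_EXAMPLES

print(f"effective batch size: {effective_batch}")
print(f"optimizer steps per epoch: {steps_per_epoch}")
print(f"estimated hours for the subset run: {subset_hours:.1f}")
```

An effective batch of 8 is small for preference optimization, which is consistent with the noisy loss curve described above; raising `GRAD_ACCUM` increases the effective batch without extra memory, at the cost of fewer optimizer updates per epoch.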

Model size: 8.03B params (Safetensors)
Tensor type: FP16
