bikalnetomi/RLHF-PPO-PPOModel-LLama3-1B-v1.0

Tags: Text Generation · Generated from Trainer · text-generation-inference


bikalnetomi committed on Dec 2, 2024

Commit 98e1741 (verified) · 1 parent: 7fda261

End of training

Files changed (3)

README.md +1 -1
model.safetensors +1 -1
training_args.bin +1 -1

README.md CHANGED

@@ -26,7 +26,7 @@ print(output["generated_text"])
 ## Training procedure
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/netomi-ml/huggingface/runs/8vrsr3rz)
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/netomi-ml/huggingface/runs/mt43tltq)
 This model was trained with PPO, a method introduced in [Fine-Tuning Language Models from Human Preferences](https://huggingface.co/papers/1909.08593).
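The README names PPO but does not describe it. The sketch below illustrates two standard pieces. The first is the clipped surrogate loss of PPO as defined by Schulman et al. (2017). The second is the KL-penalized reward form R(x, y) = r(x, y) - beta * log(pi(y|x) / rho(y|x)) from the cited RLHF paper.

The sketch makes several assumptions:
- Inputs are plain per-sample log-probabilities and advantages given as floats.
- The clip range is `eps = 0.2` and the KL coefficient is `beta = 0.1`.
- The function names are hypothetical.

This is not the trainer code that produced this checkpoint. It also omits the value loss, the entropy bonus and the advantage estimation that a full PPO trainer needs.

```python
import math
from typing import Sequence


def kl_shaped_reward(score: float, logp_policy: float, logp_ref: float, beta: float = 0.1) -> float:
    """Reward-model score minus a KL penalty against a frozen reference model.

    logp_policy / logp_ref are assumed to be summed log-probabilities of the
    sampled response under the current policy and the reference model.
    """
    return score - beta * (logp_policy - logp_ref)


def ppo_clipped_loss(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    eps: float = 0.2,
) -> float:
    """Mean PPO clipped surrogate loss (negated objective, so lower is better)."""
    if not (len(logp_new) == len(logp_old) == len(advantages)):
        raise ValueError("inputs must have the same length")
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # PPO maximizes min(ratio * A, clipped * A); negate it to get a loss.
        total += -min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Ratio 2.0 with a positive advantage is clipped to 1.2.
    print(ppo_clipped_loss([math.log(2.0)], [0.0], [1.0]))
    print(kl_shaped_reward(1.0, logp_policy=-2.0, logp_ref=-3.0))
```

Taking the minimum of the clipped and unclipped terms limits the update in one direction only. A positive advantage gains nothing once the ratio exceeds 1 + eps. A negative advantage is always penalized at the larger magnitude.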

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:775cf37688a30b03318167e270c3cadcbb0e8795d4abdeb60a6ff07460e6d50a
+oid sha256:50597dfb68111183cffefe64c028943bedd4a4d84b2231c0e0f66ead73dd7001
 size 4943274328
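`model.safetensors` is stored with Git LFS, so the diff shows only the pointer file. The pointer has three lines:
- a `version` line,
- the SHA-256 `oid` of the real content,
- its `size` in bytes.

The sketch below parses such a pointer and checks a local copy of the file against it. It assumes the pointer contains only these three keys and uses sha256. The helper names and the local path are hypothetical.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer ('key value' per line) into version, oid and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "oid": digest, "size": int(fields["size"])}


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check size first (cheap), then stream the file through SHA-256."""
    if path.stat().st_size != pointer["size"]:
        return False
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == pointer["oid"]


NEW_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:50597dfb68111183cffefe64c028943bedd4a4d84b2231c0e0f66ead73dd7001
size 4943274328
"""

if __name__ == "__main__":
    ptr = parse_lfs_pointer(NEW_POINTER)
    print(ptr["size"])
    # Hypothetical path to a downloaded copy of the weights:
    # print(matches_pointer(Path("model.safetensors"), ptr))
```

The size check runs first because comparing sizes is instant. Hashing a roughly 4.9 GB file is not.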

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0276d25764e63647f3da420a76447aed21a59fbf083fdc8a9dc8f423904cd756
+oid sha256:916429b569f66522e463f41239cf829f1d71a3b1cbdc888a63ca86fc46721547
 size 6072
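In both pointer diffs only the `oid` changes, while `size` stays identical (4943274328 and 6072 bytes). The file contents were therefore rewritten without changing length. That pattern fits re-saving the same structure with different values, but the pointers alone cannot confirm it.

A sketch of that field-by-field comparison follows. The helper names are hypothetical.

```python
def pointer_fields(text: str) -> dict:
    """Split each 'key value' line of a Git LFS pointer into a dict entry."""
    return dict(line.split(" ", 1) for line in text.strip().splitlines())


def changed_fields(old: str, new: str) -> dict:
    """Return {key: (old_value, new_value)} for every key whose value differs."""
    a, b = pointer_fields(old), pointer_fields(new)
    return {
        k: (a.get(k), b.get(k))
        for k in sorted(a.keys() | b.keys())
        if a.get(k) != b.get(k)
    }


OLD_ARGS = """version https://git-lfs.github.com/spec/v1
oid sha256:0276d25764e63647f3da420a76447aed21a59fbf083fdc8a9dc8f423904cd756
size 6072"""

NEW_ARGS = """version https://git-lfs.github.com/spec/v1
oid sha256:916429b569f66522e463f41239cf829f1d71a3b1cbdc888a63ca86fc46721547
size 6072"""

if __name__ == "__main__":
    # Only the content hash differs; version and size are unchanged.
    print(sorted(changed_fields(OLD_ARGS, NEW_ARGS)))
```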