Lichang-Chen/ODIN-ppo-L230-best

Text Generation

text-generation-inference

Inference Endpoints

Lichang-Chen committed on Feb 14, 2024

Commit 38eac53 • 1 Parent(s): 52c1c2f

Update README.md

Files changed (1)

README.md +5 -7

@@ -8,22 +8,20 @@ tags:
 - PPO
 ---
-<!-- Provide a quick summary of what the model is/does. -->
 ## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
 - **Developed by:** [Lichang-Chen](https://huggingface.co/Lichang-Chen) and [Chen Zhu](https://scholar.google.com/citations?hl=zh-CN&user=m-om5O8AAAAJ)
 - **Model type:** RLHF model.
 - **Language(s) (NLP):** English
 - **Finetuned from model:** [Vicuna-7b](https://huggingface.co/lmsys/vicuna-7b-v1.5)
-### Model Sources [optional]
 <!-- Provide the basic links for the model. -->

 - PPO
 ---
 ## Model Details
+This is an official implementation of the ODIN-ppo-L230-7B model, a chat assistant trained by fine-tuning LLaMA on the Open-Assistant dataset via PPO.
+"L230" means that the output length on the LIMA test set is ~230. ODIN is the reward model used for training.
+## Model Description
+<!-- Provide a longer summary of what this model is. -->
 - **Developed by:** [Lichang-Chen](https://huggingface.co/Lichang-Chen) and [Chen Zhu](https://scholar.google.com/citations?hl=zh-CN&user=m-om5O8AAAAJ)
 - **Model type:** RLHF model.
 - **Language(s) (NLP):** English
 - **Finetuned from model:** [Vicuna-7b](https://huggingface.co/lmsys/vicuna-7b-v1.5)
+### Model Sources
 <!-- Provide the basic links for the model. -->
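
The README's note that "L230" refers to an output length of about 230 on the LIMA test set can be illustrated with a small sketch. The README does not state the unit or how the figure was computed, so this assumes it is a mean over generated responses. The function name `mean_output_length`, the whitespace tokenizer default, and the sample strings are all hypothetical and not taken from the repository.

```python
from statistics import mean
from typing import Callable, Iterable, List


def mean_output_length(
    responses: Iterable[str],
    tokenize: Callable[[str], List[str]] = str.split,
) -> float:
    """Return the average number of tokens per response.

    `tokenize` defaults to whitespace splitting, which is an assumption;
    pass the model's own tokenizer if lengths should be counted in tokens.
    """
    lengths = [len(tokenize(r)) for r in responses]
    if not lengths:
        raise ValueError("need at least one response")
    return mean(lengths)


# Hypothetical model outputs, for illustration only.
sample = ["PPO updates the policy.", "ODIN is the reward model used here."]
print(mean_output_length(sample))
```

To reproduce a figure comparable to the one in the model name, the responses would need to be the model's generations on the LIMA test prompts, and the tokenizer would need to match whatever unit the authors used.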