jon-tow committed
Commit c9e806e
1 Parent(s): c339279

revert naming paths

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -11,18 +11,18 @@ datasets:
   - tatsu-lab/alpaca
 ---
 
-# StableVicuna-13B: Fine-Tuned with RLHF
+# StableVicuna-13B
 
 ## Model Description
 
-StableVicuna-13B is a [Vicuna-13B](https://vicuna.lmsys.org/) model fine-tuned using reinforcement learning from human feedback (RLHF) via Proximal Policy Optimization (PPO) on various conversational and instructional datasets.
+StableVicuna-13B is a [Vicuna-13B v1.0](https://vicuna.lmsys.org/) model fine-tuned using reinforcement learning from human feedback (RLHF) via Proximal Policy Optimization (PPO) on various conversational and instructional datasets.
 
 ### Apply Delta Weights
 
-StableVicuna-13B cannot be used from the `stability/stable-vicuna-13b-delta` weights alone. To obtain the correct model, one must add back the difference between LLaMA 13B and `stability/stable-vicuna-13b-delta` weights. We provide the [`apply_delta.py`](https://huggingface.co/CarperAI/stable-vicuna-13b-delta/raw/main/apply_delta.py) script to automate the conversion, which you can run as:
+StableVicuna-13B cannot be used from the `CarperAI/stable-vicuna-13b-delta` weights alone. To obtain the correct model, one must add back the difference between LLaMA 13B and `CarperAI/stable-vicuna-13b-delta` weights. We provide the [`apply_delta.py`](https://huggingface.co/CarperAI/stable-vicuna-13b-delta/raw/main/apply_delta.py) script to automate the conversion, which you can run as:
 
 ```sh
-python3 apply_delta.py --base /path/to/model_weights/llama-13b --target stable-vicuna-13b --delta stabilityai/stable-vicuna-13b-delta
+python3 apply_delta.py --base /path/to/model_weights/llama-13b --target stable-vicuna-13b --delta CarperAI/stable-vicuna-13b-delta
 ```
 
 
@@ -81,7 +81,7 @@ The reward model used during RLHF was also trained on [OpenAssistant Conversatio
 
 ### Training Procedure
 
-`stabilityai/sstable-vicuna-13b-delta` was trained using PPO as implemented in [`trlX`](https://github.com/CarperAI/trlx/blob/main/trlx/trainer/accelerate_ppo_trainer.py) with the following configuration:
+`CarperAI/stable-vicuna-13b-delta` was trained using PPO as implemented in [`trlX`](https://github.com/CarperAI/trlx/blob/main/trlx/trainer/accelerate_ppo_trainer.py) with the following configuration:
 
 | Hyperparameter | Value |
 |-------------------|---------|
@@ -118,7 +118,7 @@ The base LLaMA model is trained on various data, some of which may contain offen
 
 ## Acknowledgements
 
-This work would not have been possible without the support of [CarperAI](https://carper.ai/).
+This work would not have been possible without the support of [Stability AI](https://stability.ai/).
 
 ## Citations
 
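A note on the delta-weight step touched in the first hunk: recombination is, at its core, elementwise addition of the published delta onto the original LLaMA 13B parameters. Below is a minimal sketch of that idea in PyTorch, assuming both checkpoints load as `transformers` causal LMs with matching parameter names and shapes; it is not the actual `apply_delta.py`, which also handles details such as tokenizer and embedding-size differences.

```python
# Minimal sketch of delta-weight recombination; NOT the actual apply_delta.py.
# Assumes both checkpoints load as causal LMs with identical parameter names/shapes,
# and that both fit in host memory at once.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "/path/to/model_weights/llama-13b", torch_dtype=torch.float16
)
delta = AutoModelForCausalLM.from_pretrained(
    "CarperAI/stable-vicuna-13b-delta", torch_dtype=torch.float16
)

# Recovered weight = base LLaMA weight + published delta, parameter by parameter.
delta_state = delta.state_dict()
with torch.no_grad():
    for name, param in base.state_dict().items():
        param.add_(delta_state[name])

base.save_pretrained("stable-vicuna-13b")
AutoTokenizer.from_pretrained("CarperAI/stable-vicuna-13b-delta").save_pretrained(
    "stable-vicuna-13b"
)
```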
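Once `apply_delta.py` has written the recombined weights to a local `stable-vicuna-13b` directory, they should load like any other `transformers` causal LM. A hedged usage sketch follows; the two-role prompt format is an assumption, not something stated in this diff.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the recombined weights produced by apply_delta.py above.
tokenizer = AutoTokenizer.from_pretrained("stable-vicuna-13b")
model = AutoModelForCausalLM.from_pretrained(
    "stable-vicuna-13b", torch_dtype=torch.float16, device_map="auto"
)

# Assumed conversational prompt format; check the model card for the exact template.
prompt = "### Human: What is reinforcement learning from human feedback?\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```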
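On the Training Procedure hunk: the README links trlX's `accelerate_ppo_trainer`. For orientation only, here is a schematic of how a PPO run is typically launched through trlX's top-level `trlx.train` API. The reward function, prompts, and model path below are placeholders rather than StableVicuna's actual setup, and the `default_ppo_config` helper is an assumption whose location and name vary across trlX versions.

```python
# Schematic of a trlX PPO launch; placeholder reward and prompts,
# NOT the actual StableVicuna training setup or hyperparameters.
import trlx
from trlx.data.default_configs import default_ppo_config  # helper name varies by trlX version

config = default_ppo_config()
config.model.model_path = "/path/to/vicuna-13b"  # placeholder policy checkpoint

def reward_fn(samples, **kwargs):
    # Placeholder: a real run scores completions with a trained reward model.
    return [float(len(s)) for s in samples]

# trlx.train wires the policy, the PPO trainer, and the rollout loop together.
trainer = trlx.train(
    reward_fn=reward_fn,
    prompts=["### Human: What is PPO?\n### Assistant:"],  # placeholder prompts
    config=config,
)
```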