jon-tow committed
Commit c9e806e
1 Parent(s): c339279

revert naming paths

Files changed (1)
  1. README.md +6 -6
README.md CHANGED
@@ -11,18 +11,18 @@ datasets:
   - tatsu-lab/alpaca
 ---
 
-# StableVicuna-13B: Fine-Tuned with RLHF
+# StableVicuna-13B
 
 ## Model Description
 
-StableVicuna-13B is a [Vicuna-13B](https://vicuna.lmsys.org/) model fine-tuned using reinforcement learning from human feedback (RLHF) via Proximal Policy Optimization (PPO) on various conversational and instructional datasets.
+StableVicuna-13B is a [Vicuna-13B v1.0](https://vicuna.lmsys.org/) model fine-tuned using reinforcement learning from human feedback (RLHF) via Proximal Policy Optimization (PPO) on various conversational and instructional datasets.
 
 ### Apply Delta Weights
 
-StableVicuna-13B cannot be used from the `stability/stable-vicuna-13b-delta` weights alone. To obtain the correct model, one must add back the difference between LLaMA 13B and `stability/stable-vicuna-13b-delta` weights. We provide the [`apply_delta.py`](https://huggingface.co/CarperAI/stable-vicuna-13b-delta/raw/main/apply_delta.py) script to automate the conversion, which you can run as:
+StableVicuna-13B cannot be used from the `CarperAI/stable-vicuna-13b-delta` weights alone. To obtain the correct model, one must add back the difference between LLaMA 13B and `CarperAI/stable-vicuna-13b-delta` weights. We provide the [`apply_delta.py`](https://huggingface.co/CarperAI/stable-vicuna-13b-delta/raw/main/apply_delta.py) script to automate the conversion, which you can run as:
 
 ```sh
-python3 apply_delta.py --base /path/to/model_weights/llama-13b --target stable-vicuna-13b --delta stabilityai/stable-vicuna-13b-delta
+python3 apply_delta.py --base /path/to/model_weights/llama-13b --target stable-vicuna-13b --delta CarperAI/stable-vicuna-13b-delta
 ```
 
 
@@ -81,7 +81,7 @@ The reward model used during RLHF was also trained on [OpenAssistant Conversatio
 
 ### Training Procedure
 
-`stabilityai/sstable-vicuna-13b-delta` was trained using PPO as implemented in [`trlX`](https://github.com/CarperAI/trlx/blob/main/trlx/trainer/accelerate_ppo_trainer.py) with the following configuration:
+`CarperAI/stable-vicuna-13b-delta` was trained using PPO as implemented in [`trlX`](https://github.com/CarperAI/trlx/blob/main/trlx/trainer/accelerate_ppo_trainer.py) with the following configuration:
 
 | Hyperparameter | Value |
 |-------------------|---------|
@@ -118,7 +118,7 @@ The base LLaMA model is trained on various data, some of which may contain offen
 
 ## Acknowledgements
 
-This work would not have been possible without the support of [CarperAI](https://carper.ai/).
+This work would not have been possible without the support of [Stability AI](https://stability.ai/).
 
 ## Citations
 
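A note on the delta-weight step touched in the first hunk: recombination is, at its core, elementwise addition of the published delta onto the original LLaMA 13B parameters. Below is a minimal sketch of that idea in PyTorch, assuming both checkpoints load as `transformers` causal LMs with matching parameter names and shapes; it is not the actual `apply_delta.py`, which also handles details such as tokenizer and embedding-size differences.

```python
# Minimal sketch of delta-weight recombination; NOT the actual apply_delta.py.
# Assumes both checkpoints load as causal LMs with identical parameter names/shapes,
# and that both fit in host memory at once.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "/path/to/model_weights/llama-13b", torch_dtype=torch.float16
)
delta = AutoModelForCausalLM.from_pretrained(
    "CarperAI/stable-vicuna-13b-delta", torch_dtype=torch.float16
)

# Recovered weight = base LLaMA weight + published delta, parameter by parameter.
delta_state = delta.state_dict()
with torch.no_grad():
    for name, param in base.state_dict().items():
        param.add_(delta_state[name])

base.save_pretrained("stable-vicuna-13b")
AutoTokenizer.from_pretrained("CarperAI/stable-vicuna-13b-delta").save_pretrained(
    "stable-vicuna-13b"
)
```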
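Once `apply_delta.py` has written the recombined weights to a local `stable-vicuna-13b` directory, they should load like any other `transformers` causal LM. A hedged usage sketch follows; the two-role prompt format is an assumption, not something stated in this diff.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the recombined weights produced by apply_delta.py above.
tokenizer = AutoTokenizer.from_pretrained("stable-vicuna-13b")
model = AutoModelForCausalLM.from_pretrained(
    "stable-vicuna-13b", torch_dtype=torch.float16, device_map="auto"
)

# Assumed conversational prompt format; check the model card for the exact template.
prompt = "### Human: What is reinforcement learning from human feedback?\n### Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```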
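On the Training Procedure hunk: the README links trlX's `accelerate_ppo_trainer`. For orientation only, here is a schematic of how a PPO run is typically launched through trlX's top-level `trlx.train` API. The reward function, prompts, and model path below are placeholders rather than StableVicuna's actual setup, and the `default_ppo_config` helper is an assumption whose location and name vary across trlX versions.

```python
# Schematic of a trlX PPO launch; placeholder reward and prompts,
# NOT the actual StableVicuna training setup or hyperparameters.
import trlx
from trlx.data.default_configs import default_ppo_config  # helper name varies by trlX version

config = default_ppo_config()
config.model.model_path = "/path/to/vicuna-13b"  # placeholder policy checkpoint

def reward_fn(samples, **kwargs):
    # Placeholder: a real run scores completions with a trained reward model.
    return [float(len(s)) for s in samples]

# trlx.train wires the policy, the PPO trainer, and the rollout loop together.
trainer = trlx.train(
    reward_fn=reward_fn,
    prompts=["### Human: What is PPO?\n### Assistant:"],  # placeholder prompts
    config=config,
)
```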