Update README.md
The model is tuned after 4 iterations of online alignment.

- Step 3: Apply CUT to fine-tune the target model with the above instruction-response-judgment triplets.

Specifically, we use [LLaMA2-chat-13b](https://huggingface.co/meta-llama/Llama-2-13b-chat-hf) as the base LLM. In each iteration, we sample 1000 instructions from [Stanford Alpaca](https://github.com/tatsu-lab/stanford_alpaca).
To avoid over-fitting, we ensure that the sampled data are different in each iteration.
We then ask GPT-4 for the judgment annotations.

### 3. How to use

#### 3.1. Huggingface

```python
import torch
# ... (intermediate lines are collapsed in this commit view)
text = tokenizer.batch_decode(outputs)[0]
print(text)
```
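The Huggingface snippet above is abbreviated in this commit view. As a sketch of typical usage with the `transformers` library — the prompt template, decoding settings, and helper names below are assumptions for illustration, not the repository's verbatim code — loading the CUT checkpoint might look like:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Alpaca-style prompt template; assumed here, the README's exact template may differ.
PROMPT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

def build_prompt(instruction: str) -> str:
    """Wrap a raw instruction in the prompt template."""
    return PROMPT.format(instruction=instruction)

def generate(instruction: str, model_path: str = "xww033/cut-13b") -> str:
    """Load the CUT model and generate a response (downloads the 13B weights)."""
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(
        model_path, torch_dtype=torch.float16, device_map="auto"
    )
    inputs = tokenizer(build_prompt(instruction), return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256)
    # Mirrors the README's final lines: decode the batch and take the first item.
    return tokenizer.batch_decode(outputs)[0]

# text = generate("Give three tips for staying healthy.")  # needs a GPU with enough memory
# print(text)
```

The generation call is commented out because it downloads and runs a 13B-parameter model; the prompt-building helper can be exercised on its own.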
#### 3.2. FastChat

[Fastchat](https://github.com/lm-sys/FastChat) provides a simple setup for those interested in trying our aligned model. After downloading the [CUT model](https://huggingface.co/xww033/cut-13b) through HuggingFace, clone the Fastchat repository: