- accuracy
---

![halos](https://gist.github.com/assets/29318529/fe2d8391-dbd1-4b7e-9dc4-7cb97e55bc06)

This repo contains the model and tokenizer checkpoints for:
- model family <b>mistralai/Mistral-7B-Instruct-v0.2</b>
- optimized with the loss <b>KTO</b>
- aligned using the [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset)
- via 3 iterations of KTO on one epoch of each training partition.

To prompt this model, ensure that the format is consistent with that of TuluV2.
For example, a prompt should be formatted as follows, where `<|user|>` corresponds to the human's role and `<|assistant|>` corresponds to the LLM's role.
The human should speak first:
```
...
What kind of cake?
Chocolate cake.
<|assistant|>
```
Note that a beginning-of-sequence (BOS) token is automatically added at tokenization and does not have to be added by you. No end-of-sequence (EOS) token is added to the prompt.
You may also use our tokenizer's `apply_chat_template` method if doing inference with `chatml` set or when evaluating through non-local clients.
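As a minimal sketch of the layout described above (the `build_tulu_prompt` helper and the example turn are hypothetical illustrations, not part of this repo's tooling):

```python
def build_tulu_prompt(turns):
    """Assemble a TuluV2-style prompt from (role, text) pairs.

    Roles are "user" or "assistant"; the human speaks first. The string
    ends with "<|assistant|>" so the model generates the next reply.
    Per the notes above, no BOS is prepended here (the tokenizer adds it
    automatically at tokenization) and no EOS is appended to the prompt.
    """
    lines = []
    for role, text in turns:
        lines.append(f"<|{role}|>")
        lines.append(text)
    lines.append("<|assistant|>")
    return "\n".join(lines)


# Hypothetical single-turn conversation:
print(build_tulu_prompt([("user", "I want to bake a cake.")]))
# <|user|>
# I want to bake a cake.
# <|assistant|>
```

For actual inference, the tokenizer's `apply_chat_template` mentioned above is the more robust route.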

Please refer to our [code repository](https://github.com/ContextualAI/HALOs) or [blog](https://contextual.ai/better-cheaper-faster-llm-alignment-with-kto/) for more information on the methodology.

If you found this work useful, feel free to cite [our work](https://arxiv.org/abs/2402.01306):
```
@techreport{ethayarajh2023halos,
  author = {Ethayarajh, Kawin and Xu, Winnie and Jurafsky, Dan and Kiela, Douwe},