xwinxu committed 8b7e5cc (1 parent: 8cffcfe)

Update README.md
README.md CHANGED
@@ -17,14 +17,14 @@ metrics:
 - accuracy
 ---
 
-![halos](https://gist.github.com/assets/29318529/fe2d8391-dbd1-4b7e-9dc4-7cb97e55bc06)
-
-This repo contains the model checkpoints for:
+This repo contains the model and tokenizer checkpoints for:
 - model family <b>mistralai/Mistral-7B-Instruct-v0.2</b>
 - optimized with the loss <b>KTO</b>
-- aligned using the SHP, Anthropic HH and Open Assistant datasets.
+- aligned using the [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset)
+- via 3 iterations of KTO on one epoch of each training partition.
 
-To prompt Archangel models, ensure that the format is consistent with that of TuluV2.
+To prompt this model, ensure that the format is consistent with that of TuluV2.
 For example, a prompt should be formatted as follows, where `<|user|>` corresponds to the human's role and `<|assistant|>` corresponds to the LLM's role.
 The human should speak first:
 ```
@@ -37,13 +37,14 @@ What kind of cake?
 Chocolate cake.
 <|assistant|>
 ```
-Note that a beginning-of-sequence (BOS) token is automatically added by all Archangel models during tokenization and does not have to be added by you. No end-of-sequence (EOS) token is added to the prompt.
+Note that a beginning-of-sequence (BOS) token is automatically added during tokenization and does not have to be added by you. No end-of-sequence (EOS) token is added to the prompt.
+You may also use our tokenizer's `apply_chat_template` method when doing inference with `chatml` set or when evaluating through non-local clients.
 
 
-Please refer to our [code repository](https://github.com/ContextualAI/HALOs) or [blog](https://contextual.ai/better-cheaper-faster-llm-alignment-with-kto/), which contains instructions for training your own HALOs and links to our model cards.
+Please refer to our [code repository](https://github.com/ContextualAI/HALOs) or [blog](https://contextual.ai/better-cheaper-faster-llm-alignment-with-kto/) for more information on the methodology.
 
-If you find this repo or the technical paper useful in your research, please feel free to cite [our work](https://github.com/ContextualAI/HALOs/blob/main/assets/report.pdf):
+If you find this work useful, feel free to cite [our work](https://arxiv.org/abs/2402.01306):
 ```
 @techreport{ethayarajh2023halos,
 author = {Ethayarajh, Kawin and Xu, Winnie and Jurafsky, Dan and Kiela, Douwe},
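As a worked illustration of the TuluV2-style prompt format the card describes, here is a minimal sketch. The helper name `format_tulu_prompt` is ours for illustration only and is not part of the HALOs codebase; in practice the tokenizer's `apply_chat_template` method handles this for you.

```python
# Sketch of the TuluV2-style prompt format described in the model card.
# `format_tulu_prompt` is a hypothetical helper, not a HALOs/Transformers API.

def format_tulu_prompt(turns):
    """Format alternating human/assistant turns into a TuluV2-style prompt.

    `turns` is a list of strings; the human speaks first. The returned
    prompt ends with an open `<|assistant|>` header so the model replies
    next. No BOS/EOS is added here: the tokenizer adds BOS itself, and
    no EOS belongs in the prompt.
    """
    roles = ["<|user|>", "<|assistant|>"]
    parts = []
    for i, text in enumerate(turns):
        parts.append(roles[i % 2])  # alternate human / assistant headers
        parts.append(text)
    parts.append("<|assistant|>")   # leave the last turn open for the model
    return "\n".join(parts) + "\n"


prompt = format_tulu_prompt(
    ["Can you make me a cake?", "What kind of cake?", "Chocolate cake."]
)
```

The resulting string matches the example in the card: alternating `<|user|>`/`<|assistant|>` headers with a trailing `<|assistant|>` left open for the model to complete.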