- accuracy
---

![halos](https://gist.github.com/assets/29318529/fe2d8391-dbd1-4b7e-9dc4-7cb97e55bc06)

This repo contains the model and tokenizer checkpoints for:
- model family <b>mistralai/Mistral-7B-Instruct-v0.2</b>
- optimized with the loss <b>KTO</b>
- aligned using the [snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset](https://huggingface.co/datasets/snorkelai/Snorkel-Mistral-PairRM-DPO-Dataset)
- via 3 iterations of KTO on one epoch of each training partition.

To prompt this model, ensure that the format is consistent with that of TuluV2.
For example, a prompt should be formatted as follows, where `<|user|>` corresponds to the human's role and `<|assistant|>` corresponds to the LLM's role.
The human should speak first:
```
...
What kind of cake?
Chocolate cake.
<|assistant|>
```
Note that a beginning-of-sequence (BOS) token is automatically added at tokenization and does not have to be added by you. No end-of-sequence (EOS) token is added to the prompt.
You may also use our tokenizer's `apply_chat_template` method if doing inference with `chatml` set or when evaluating through non-local clients.
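As a minimal sketch of the layout described above (the `build_tulu_prompt` helper and the example turn are hypothetical illustrations, not part of this repo's tooling):

```python
def build_tulu_prompt(turns):
    """Assemble a TuluV2-style prompt from (role, text) pairs.

    Roles are "user" or "assistant"; the human speaks first. The string
    ends with "<|assistant|>" so the model generates the next reply.
    Per the notes above, no BOS is prepended here (the tokenizer adds it
    automatically at tokenization) and no EOS is appended to the prompt.
    """
    lines = []
    for role, text in turns:
        lines.append(f"<|{role}|>")
        lines.append(text)
    lines.append("<|assistant|>")
    return "\n".join(lines)


# Hypothetical single-turn conversation:
print(build_tulu_prompt([("user", "I want to bake a cake.")]))
# <|user|>
# I want to bake a cake.
# <|assistant|>
```

For actual inference, the tokenizer's `apply_chat_template` mentioned above is the more robust route.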

Please refer to our [code repository](https://github.com/ContextualAI/HALOs) or [blog](https://contextual.ai/better-cheaper-faster-llm-alignment-with-kto/) for more information on the methodology.

If you found this work useful, feel free to cite [our work](https://arxiv.org/abs/2402.01306):
```
@techreport{ethayarajh2023halos,
  author = {Ethayarajh, Kawin and Xu, Winnie and Jurafsky, Dan and Kiela, Douwe},