Model Card for DistillChat V1.5
DistillChat V1.5 is a fine-tuned version of Mistral trained on a mixture of publicly available, synthetic, and human datasets.
The difference from DistillChat V1.0 is that we follow openchat-3.5 in using the "Code" condition rather than the "GPT-4 Correct" condition for code-related datasets.
The V1.5.2 release is the "merge-single-teacher-distill" version, which uses Nous-Capybara-34B and Phind-CodeLlama-34B-v2 as the teacher LLMs for knowledge distillation.
Model description
- Model type: Distill-SFT
- Language(s) (NLP): Primarily English
- Finetuned from model: openchat/openchat_3.5
Model Sources
- Repository: https://github.com/fanqiwan/distillchat
- Model Family: All models and the dataset can be found in the DistillChat collection.
Performance
TBD
Input Format
The model is trained to use the following format:
GPT4 Correct User: Your message here!<|end_of_turn|>GPT4 Correct Assistant:
For best results, format all inputs in this manner.
Intended uses & limitations
The model was initially fine-tuned on a filtered and preprocessed version of the DistillChat V1 Mixture, which contains a diverse range of human-created instructions and synthetic dialogues generated primarily by other LLMs.
Here's how you can run the model using the 🤗 Transformers library:
import transformers
tokenizer = transformers.AutoTokenizer.from_pretrained("Wanfq/distillchat_v1.5")
# Single-turn
tokens = tokenizer("GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant:").input_ids
assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]
# Multi-turn
tokens = tokenizer("GPT4 Correct User: Hello<|end_of_turn|>GPT4 Correct Assistant: Hi<|end_of_turn|>GPT4 Correct User: How are you today?<|end_of_turn|>GPT4 Correct Assistant:").input_ids
assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747, 15359, 32000, 420, 6316, 28781, 3198, 3123, 1247, 28747, 1602, 460, 368, 3154, 28804, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]
The GPT4 template is also available as the integrated tokenizer.chat_template, which can be used instead of manually specifying the template:
messages = [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi"},
{"role": "user", "content": "How are you today?"}
]
tokens = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
assert tokens == [1, 420, 6316, 28781, 3198, 3123, 1247, 28747, 22557, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747, 15359, 32000, 420, 6316, 28781, 3198, 3123, 1247, 28747, 1602, 460, 368, 3154, 28804, 32000, 420, 6316, 28781, 3198, 3123, 21631, 28747]
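The following is a minimal generation sketch that reuses the tokenizer and messages defined above. Loading in bfloat16 with device_map="auto", the max_new_tokens value, and using <|end_of_turn|> as the stop token are illustrative assumptions rather than settings prescribed by this card:

import torch

model = transformers.AutoModelForCausalLM.from_pretrained(
    "Wanfq/distillchat_v1.5", torch_dtype=torch.bfloat16, device_map="auto"
)
# Apply the chat template and return the prompt as a tensor on the model's device
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
# <|end_of_turn|> (token id 32000 in the examples above) is assumed to end the assistant turn
outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    eos_token_id=tokenizer.convert_tokens_to_ids("<|end_of_turn|>"),
)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))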
Bias, Risks, and Limitations
The DistillChat models have not been aligned to generate safe completions through an RLHF phase, nor are they deployed with in-the-loop filtering of responses as ChatGPT is, so the model can produce problematic outputs (especially when prompted to do so). It is also unknown what the size and composition of the corpus used to train the base Mistral models was; however, it likely included a mix of web data and technical sources such as books and code. See the Falcon 180B model card for an example of this.
Training hyperparameters
The following hyperparameters were used during Distill-SFT training (an illustrative loss sketch follows the list):
- learning_rate: 2e-06
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 1.0
- model_max_length: 2048
- do_distill: True
- distill_with_ref_model: True
- distill_with_aligned_model_0: False
- distill_with_aligned_model_1: True
- distill_optimize_type: ranking
- distill_weight_type: soft_metric
- distill_loss_type: ce
- distill_teacher_temperature: 1.0
- lm_loss_weight: 0.9
- distill_greater_as_gt: True
- distill_greater_as_gt_type: hard
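As a rough illustration of how some of these settings might interact, the sketch below shows one plausible reading of lm_loss_weight, distill_teacher_temperature, and a cross-entropy ("ce") distillation term: the training objective as a weighted sum of the standard language-modeling loss and a loss against the teacher's token distribution. This is a hypothetical sketch, not the repository's training code; it ignores the ranking-based optimization (distill_optimize_type) and soft-metric weighting (distill_weight_type), and all function and argument names are made up for illustration.

import torch.nn.functional as F

def distill_sft_loss(student_logits, teacher_logits, labels,
                     lm_loss_weight=0.9, teacher_temperature=1.0):
    # Standard next-token cross-entropy against the ground-truth labels
    lm_loss = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    # Cross-entropy between the student and the temperature-scaled teacher distribution
    teacher_probs = F.softmax(teacher_logits / teacher_temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    distill_loss = -(teacher_probs * student_log_probs).sum(dim=-1).mean()
    # Weighted combination: lm_loss_weight = 0.9 keeps most of the weight on the LM loss
    return lm_loss_weight * lm_loss + (1.0 - lm_loss_weight) * distill_loss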
Citation
TBD
Acknowledgements
TBD
Model card adapted from Tulu 2