Text Generation
Transformers
Safetensors
English
mistral
text-generation-inference
Inference Endpoints
instruction-pretrain AdaptLLM commited on
Commit
21636e2
·
verified ·
1 Parent(s): f067870

Update README.md (#3)

Browse files

- Update README.md (ce74dc9b56e8f17f2f75086024fb9ee7965a5623)


Co-authored-by: AdaptLLM <AdaptLLM@users.noreply.huggingface.co>

Files changed (1) hide show
  1. README.md +1 -1
README.md CHANGED
@@ -186,7 +186,7 @@ Except for the pre-training data, *Instruction Pre-Training* keeps all other set
186
  Therefore, you can easily use any training framework, such as [OLMo](https://github.com/allenai/OLMo) (for pre-training from scratch) and [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) (for continual pre-training), to train on the templified instruction-augmented corpora.
187
 
188
  1. For general pre-training from scratch, we recommend setting M = 2 and mixing the instruction-augmented corpora with unchanged raw corpora.
189
- 2. For domain-adaptive continual pre-training, we recommend setting M = 3 and mixing the instruction-augmented corpora with general instructions from [OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca) at a 1:1 ratio (counted by tokens). Each example from OpenOrca is formulated as "{question} {response}", with a blank space used to connect the question and response.
190
 
191
  Let's try our method in continual pre-training for a quick start---it works easily!
192
 
 
186
  Therefore, you can easily use any training framework, such as [OLMo](https://github.com/allenai/OLMo) (for pre-training from scratch) and [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) (for continual pre-training), to train on the templified instruction-augmented corpora.
187
 
188
  1. For general pre-training from scratch, we recommend setting M = 2 and mixing the instruction-augmented corpora with unchanged raw corpora.
189
+ 2. For domain-adaptive continual pre-training, we recommend setting M = 3 and mixing the instruction-augmented corpora with general instructions from [OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca) at a 1:1 ratio (counted by tokens). Each example from OpenOrca is formulated as "{question} {response}", with a white-space used to connect the question and response.
190
 
191
  Let's try our method in continual pre-training for a quick start---it works easily!
192