Feature Extraction · Transformers · Safetensors · English · bamboo · custom_code
hodlen committed
Commit 0289556
1 Parent(s): d19c610

Update ref link in README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -32,7 +32,7 @@ In this section, we introduce the details of training our model, including types

 We initialized the model weights to Mistral's model weights and modified the FFN structure to the ReGLU+ReLU structure, then continued pre-training for 200B tokens, divided into two phases:

- **First phase**: For the proportion of training corpus, we followed the data mix ratio and sources of the StableLM-3B model, conducting a further pre-training with 150B tokens.(link)
+ **First phase**: For the proportion of training corpus, we followed the data mix ratio and sources of the StableLM-3B model ([link](https://stability.wandb.io/stability-llm/stable-lm/reports/StableLM-3B-4E1T--VmlldzoyMjU4?accessToken=u3zujipenkx5g7rtcj9qojjgxpconyjktjkli2po09nffrffdhhchq045vp0wyfo)), conducting a further pre-training with 150B tokens.

 The following table shows the hyper-paramters we used in our training process.
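
For context on the "ReGLU+ReLU" FFN change mentioned in the diff above, a minimal illustrative sketch of a ReLU-gated (ReGLU-style) feed-forward block is given below. The module and dimension names follow the common Mistral-style gated-MLP convention and are assumptions for illustration only; the exact arrangement of activations in the model's actual FFN may differ.

```python
import torch
import torch.nn as nn


class ReGLUFeedForward(nn.Module):
    """Gated FFN using ReLU as the gating activation (ReGLU).

    Illustrative sketch only: layer names and the single-ReLU placement
    are assumptions, not the model's confirmed implementation.
    """

    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
        self.act = nn.ReLU()  # a SwiGLU FFN would use nn.SiLU() here instead

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ReGLU: relu(x @ W_gate) * (x @ W_up), then project back to hidden_size.
        # ReLU gating yields exact zeros, which is what enables activation sparsity.
        return self.down_proj(self.act(self.gate_proj(x)) * self.up_proj(x))
```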