tokyo-electron-device-ai committed
Commit
9b29313
1 Parent(s): eb59e64

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -9,7 +9,7 @@ base_model:
 
 ## Model Details
 Llama 3 tedllm is a large language model (8B) built by continual pre-training on the Meta Llama 3 8B model. Llama 3 tedllm is developed to enhance Japanese language capabilities and to incorporate domain-specific data.
-We use approximately 160 billion tokens from a large Japanese corpus. This model was trained on Cerebras CS-3 wafer-scale systems. Cerebras' weight streaming technology simplifies the training of LLMs by disaggregating compute from model storage. This allowed for efficient scaling of training across nodes using simple data parallelism.
+We use approximately 173 billion tokens from a large Japanese corpus. This model was trained on Cerebras CS-3 wafer-scale systems. Cerebras' weight streaming technology simplifies the training of LLMs by disaggregating compute from model storage. This allowed for efficient scaling of training across nodes using simple data parallelism.
 ## Intended uses & limitations
 
 You can use the raw model for text generation or fine-tune it to a downstream task.
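The README's usage note ("You can use the raw model for text generation or fine-tune it to a downstream task") can be tried with a minimal `transformers` sketch like the one below. It assumes the weights are published on the Hugging Face Hub as a standard transformers-compatible checkpoint; the repo id used here is an assumption and may differ from the actual repository name.

```python
# Minimal text-generation sketch for the continually pre-trained 8B model.
# The repo id below is hypothetical; replace it with the actual Hub repository.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyo-electron-device-ai/llama3-tedllm-8b-v0"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # 8B parameters fit on a single modern GPU in bf16
    device_map="auto",
)

# Plain text generation with the raw (non-instruction-tuned) model,
# prompted in Japanese since the model targets Japanese capabilities.
prompt = "人工知能とは、"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For downstream tasks, the same checkpoint can be fine-tuned with standard causal-LM training loops; the sketch above only covers raw text generation.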