tokyo-electron-device-ai committed on
Commit • 9b29313
1 Parent(s): eb59e64
Update README.md
README.md CHANGED
@@ -9,7 +9,7 @@ base_model:

## Model Details

Llama 3 tedllm is a large language model (8B) built by continual pre-training on the Meta Llama 3 8B model. Llama 3 tedllm was developed to enhance Japanese language capabilities and to incorporate domain-specific data.

- We use approximately
+ We use approximately 173 billion tokens from a large Japanese corpus. This model was trained on Cerebras CS-3 wafer-scale systems. Cerebras' weight streaming technology simplifies the training of LLMs by disaggregating compute from model storage. This allowed for efficient scaling of training across nodes using simple data parallelism.

## Intended uses & limitations
You can use the raw model for text generation or fine-tune it to a downstream task.
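
As a minimal sketch of the raw text-generation use case with Hugging Face transformers (the repository ID below is an assumed placeholder, not confirmed by this commit; substitute the actual Llama 3 tedllm model ID):

```python
# Minimal text-generation sketch using Hugging Face transformers.
# NOTE: the model ID is an assumed placeholder; replace it with the
# actual Llama 3 tedllm repository ID.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyo-electron-device-ai/llama3-tedllm-8b"  # assumed placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Japanese prompt: "Please briefly explain semiconductor manufacturing equipment."
prompt = "半導体製造装置について簡単に説明してください。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same checkpoint can instead be fine-tuned on a downstream task with standard causal-LM training loops; the snippet above only illustrates direct generation from the raw model.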