tokyo-electron-device-ai committed
Commit eb59e64 · verified · 1 Parent(s): 51185a3

Update README.md

Files changed (1)
  1. README.md +1 -2
README.md CHANGED
@@ -9,8 +9,7 @@ base_model:
 
 ## Model Details
 Llama 3 tedllm is a large language model (8B) built by continual pre-training on the Meta Llama 3 8B model. Llama 3 tedllm is developed to enhance Japanese language capabilities and to incorporate domain-specific data.
- We use approximately 160 billion tokens from a large Japanese corpus. This model was trained using Cerebras CS-3s. The Cerebras CS-3 is a new AI accelerator that differs from conventional GPUs.
-
+ We use approximately 160 billion tokens from a large Japanese corpus. This model was trained on Cerebras CS-3 wafer-scale systems. Cerebras' weight streaming technology simplifies the training of LLMs by disaggregating compute from model storage, which allowed training to scale efficiently across nodes using simple data parallelism.
 ## Intended uses & limitations
 
  You can use the raw model for text generation or fine-tune it to a downstream task.
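
The updated training note above mentions scaling across nodes with simple data parallelism. As an illustration of that general idea only — this is a generic PyTorch DistributedDataParallel sketch, not Cerebras' weight-streaming stack — each worker holds a full replica of the model, consumes its own shard of the data, and gradients are averaged across workers after every backward pass:

```python
# Generic data-parallelism sketch with PyTorch DDP (illustrative only;
# NOT Cerebras' weight-streaming API). Launch with: torchrun --nproc_per_node=N train.py
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")                     # one process per GPU
    rank = dist.get_rank()
    torch.cuda.set_device(rank % torch.cuda.device_count())

    model = torch.nn.Linear(4096, 4096).cuda()          # stand-in for the LLM
    model = DDP(model)                                  # full replica per rank
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(10):
        x = torch.randn(8, 4096, device="cuda")         # each rank sees its own shard
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()                                 # DDP all-reduces gradients here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```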
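
For the "raw model for text generation" use mentioned under Intended uses, a minimal sketch with the Hugging Face transformers API. The model id below is an assumption based on the committer's namespace; substitute the actual repository name:

```python
# Minimal text-generation sketch with Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyo-electron-device-ai/llama3-tedllm-8b"  # hypothetical id -- replace with the real repo

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # 8B weights fit in bf16 on a single 24 GB+ GPU
    device_map="auto",            # requires the accelerate package
)

prompt = "東京エレクトロンデバイスは"  # Japanese prompt ("Tokyo Electron Device is..."), matching the model's focus
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```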