tokyo-electron-device-ai committed on
Commit • 9b29313
1 Parent(s): eb59e64
Update README.md
README.md CHANGED
@@ -9,7 +9,7 @@ base_model:

## Model Details

Llama 3 tedllm is a large language model (8B) built by continual pre-training on the Meta Llama 3 8B model. Llama 3 tedllm was developed to enhance Japanese language capabilities and to incorporate domain-specific data.

- We use approximately
+ We use approximately 173 billion tokens from a large Japanese corpus. This model was trained on Cerebras CS-3 wafer-scale systems. Cerebras' weight streaming technology simplifies the training of LLMs by disaggregating compute from model storage. This allowed for efficient scaling of training across nodes using simple data parallelism.

## Intended uses & limitations
You can use the raw model for text generation or fine-tune it to a downstream task.
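
As a minimal sketch of the raw text-generation use case with Hugging Face transformers (the repository ID below is an assumed placeholder, not confirmed by this commit; substitute the actual Llama 3 tedllm model ID):

```python
# Minimal text-generation sketch using Hugging Face transformers.
# NOTE: the model ID is an assumed placeholder; replace it with the
# actual Llama 3 tedllm repository ID.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tokyo-electron-device-ai/llama3-tedllm-8b"  # assumed placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Japanese prompt: "Please briefly explain semiconductor manufacturing equipment."
prompt = "半導体製造装置について簡単に説明してください。"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The same checkpoint can instead be fine-tuned on a downstream task with standard causal-LM training loops; the snippet above only illustrates direct generation from the raw model.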