Model Details
Llama 3 tedllm is an 8B-parameter large language model built by continual pre-training of the Meta Llama 3 8B model. It was developed to enhance Japanese language capabilities and to incorporate domain-specific data, and was continually pre-trained on approximately 173 billion tokens from a large Japanese corpus. The model was trained on Cerebras CS-3 wafer-scale systems. Cerebras' weight streaming technology simplifies the training of LLMs by disaggregating compute from model storage, which allowed training to scale efficiently across nodes using simple data parallelism.
Intended uses & limitations
You can use the raw model for text generation or fine-tune it for a downstream task.
How to use
You can use this model directly with a pipeline for text generation. Here is how to use this model to generate text from a Japanese prompt in PyTorch:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model (bfloat16 weights, automatic device placement)
tokenizer = AutoTokenizer.from_pretrained("tokyo-electron-device-ai/llama3-tedllm-8b-v0")
model = AutoModelForCausalLM.from_pretrained("tokyo-electron-device-ai/llama3-tedllm-8b-v0", device_map="auto", torch_dtype=torch.bfloat16)

# Japanese prompt: "Please explain what artificial intelligence is"
text = "人工知能とは何か説明してください"
tokenized_input = tokenizer.encode(text, add_special_tokens=False, return_tensors="pt").to(model.device)

# Generate up to 50 new tokens with nucleus sampling
with torch.no_grad():
    output = model.generate(
        tokenized_input,
        max_new_tokens=50,
        do_sample=True,
        top_p=0.9,
        temperature=0.6,
    )[0]
print(tokenizer.decode(output))
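Since the model can also be used through the text-generation pipeline mentioned above, here is a minimal sketch using the transformers pipeline API. The sampling settings are simply carried over from the example above for illustration, not tuned recommendations.

import torch
from transformers import pipeline

# Text-generation pipeline wrapping the same model (bfloat16, automatic device placement)
generator = pipeline(
    "text-generation",
    model="tokyo-electron-device-ai/llama3-tedllm-8b-v0",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Same Japanese prompt and sampling settings as in the example above
outputs = generator(
    "人工知能とは何か説明してください",
    max_new_tokens=50,
    do_sample=True,
    top_p=0.9,
    temperature=0.6,
)
print(outputs[0]["generated_text"])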
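For the fine-tuning path mentioned under Intended uses & limitations, a minimal sketch with the Hugging Face Trainer is shown below. The dataset file (train.txt), sequence length, and training hyperparameters are placeholders for illustration, not recommendations from the model authors; adapt them to your downstream task and hardware.

import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "tokyo-electron-device-ai/llama3-tedllm-8b-v0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Placeholder dataset: replace train.txt with your own downstream-task corpus
dataset = load_dataset("text", data_files={"train": "train.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama3-tedllm-8b-finetuned",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=tokenized,
    # Causal-LM collator (mlm=False) pads batches and copies input_ids to labels
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

Note that full fine-tuning of an 8B-parameter model requires substantial GPU memory; parameter-efficient approaches such as LoRA (for example via the peft library) are a common alternative.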
Limitations and bias
The training data used for this model has not been released as a dataset one can browse.
Training data
The published model is not trained on the domain-specific data; it is trained on the Japanese corpus only, because the domain-specific data is proprietary. We do not plan to release models trained on the domain-specific data.
Model Card Contact
If you have any questions, please feel free to contact cerebras-sup@teldevice.co.jp.