Model Details

Llama 3 tedllm is an 8B-parameter large language model built by continual pre-training of the Meta Llama 3 8B model. It was developed to enhance Japanese language capabilities and to handle domain-specific data. Continual pre-training used approximately 173 billion tokens from a large Japanese corpus. The model was trained on Cerebras CS-3 wafer-scale systems. Cerebras' weight streaming technology simplifies LLM training by disaggregating compute from model storage, which allowed training to scale efficiently across nodes using simple data parallelism.
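
To illustrate what "simple data parallelism" means in general, here is a toy sketch in plain Python. It is not the Cerebras implementation and not the training code for this model; the one-parameter linear model and the helper names `grad_mse` and `data_parallel_grad` are hypothetical and exist only for illustration. Each worker computes a gradient on its own shard of the batch, and the averaged result matches the full-batch gradient when the shards are equal in size.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error of y ~ w * x over (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_grad(w, batch, num_workers):
    """Split the batch into equal shards, compute one gradient per shard, then average."""
    if len(batch) % num_workers != 0:
        raise ValueError("batch size must be divisible by num_workers")
    shard_size = len(batch) // num_workers
    shards = [batch[i * shard_size:(i + 1) * shard_size] for i in range(num_workers)]
    # Every worker holds the same weights and sees only its own shard.
    local_grads = [grad_mse(w, shard) for shard in shards]
    # Averaging the per-shard gradients plays the role of an all-reduce step.
    return sum(local_grads) / num_workers


# Toy data following y = 3x, evaluated at a deliberately wrong weight.
batch = [(float(x), 3.0 * x) for x in range(1, 9)]
full = grad_mse(0.5, batch)
parallel = data_parallel_grad(0.5, batch, num_workers=4)
print(full, parallel)
```

Because every worker keeps a full copy of the weights and only the data is split, no model-parallel partitioning logic is needed in this scheme.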

Intended uses & limitations

You can use the raw model for text generation or fine-tune it to a downstream task.

How to use

You can use this model directly for text generation. The following PyTorch example loads the model with Transformers and generates a continuation of a given prompt:

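import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and the model weights in bfloat16, placing layers on the available devices.
tokenizer = AutoTokenizer.from_pretrained("tokyo-electron-device-ai/llama3-tedllm-8b-v0")
model = AutoModelForCausalLM.from_pretrained("tokyo-electron-device-ai/llama3-tedllm-8b-v0", device_map="auto", torch_dtype=torch.bfloat16)

# Prompt: "Please explain what artificial intelligence is."
text = "人工知能とは何か説明してください"
tokenized_input = tokenizer.encode(text, add_special_tokens=False, return_tensors="pt").to(model.device)

# Sample up to 50 new tokens with nucleus sampling; no gradients are needed for inference.
with torch.no_grad():
    output = model.generate(
        tokenized_input,
        max_new_tokens=50,
        do_sample=True,
        top_p=0.9,
        temperature=0.6,
    )[0]

# The decoded output contains the prompt followed by the generated continuation.
print(tokenizer.decode(output))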
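
The `temperature=0.6` and `top_p=0.9` settings above control how the next token is sampled. The following is a minimal standard-library sketch of the idea, not the Transformers implementation, which differs in detail; the function names `nucleus_indices` and `sample_top_p` and the example logits are hypothetical.

```python
import math
import random


def nucleus_indices(logits, temperature=0.6, top_p=0.9):
    """Return (kept_indices, kept_probs) after temperature scaling and top-p filtering."""
    # A temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of most likely tokens whose cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return kept, [probs[i] for i in kept]


def sample_top_p(logits, temperature=0.6, top_p=0.9, rng=None):
    """Draw one token index from the filtered distribution."""
    rng = rng or random.Random()
    kept, weights = nucleus_indices(logits, temperature, top_p)
    # random.choices renormalises the weights over the kept tokens.
    return rng.choices(kept, weights=weights, k=1)[0]


# Hypothetical logits for a four-token vocabulary.
example_logits = [2.0, 1.0, 0.0, -3.0]
print(nucleus_indices(example_logits)[0])
print(sample_top_p(example_logits, rng=random.Random(0)))
```

Lowering `top_p` or `temperature` makes output more conservative, while raising them increases diversity at the cost of coherence.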

Limitations and bias

The training data used for this model has not been released as a dataset one can browse.

Training data

The published model was not trained on the domain-specific data. It was trained on the Japanese corpus only, because the domain-specific data is our own data. We do not plan to release models trained on the domain-specific data.

Model Card Contact

If you have any questions, please feel free to contact cerebras-sup@teldevice.co.jp.
