AIgroup-CVM-utokyohospital/TinyLlama-1.1B-ja

TinyLlama + Japanese

TinyLlama 1.1B, continually pretrained on a small amount of Japanese text.

Base Model

TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T

Tokenizers

[elyza/ELYZA-japanese-Llama-2-7b](https://huggingface.co/elyza/ELYZA-japanese-Llama-2-7b)
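
A minimal sketch of loading the tokenizer named above with the `transformers` library. This card does not say how the tokenizer was wired into training, so the helper name `load_tokenizer` and the choice of `AutoTokenizer` are assumptions made for illustration. Loading requires network access to the Hugging Face Hub.

```python
# Sketch only: assumes the `transformers` package is installed and that the
# ELYZA tokenizer repository is reachable on the Hugging Face Hub.

TOKENIZER_REPO = "elyza/ELYZA-japanese-Llama-2-7b"


def load_tokenizer(repo_id: str = TOKENIZER_REPO):
    """Download and return the tokenizer for `repo_id` (hypothetical helper)."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(repo_id)


# Example usage (downloads tokenizer files on first call):
#   tokenizer = load_tokenizer()
#   ids = tokenizer("こんにちは、世界")["input_ids"]
#   print(tokenizer.convert_ids_to_tokens(ids))
```
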

Training Dataset

Around 9B tokens in total.

izumi-lab/wikipedia-ja-20230720
if001/oscar_2023_filtered
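
The sketch below shows one way to stream the training datasets and estimate a token count on a sample, which can help sanity-check the "around 9B tokens" figure. It is not the authors' preprocessing pipeline. The split name `"train"`, the column name `"text"`, and the helper names `stream_texts` and `count_tokens` are assumptions; check each dataset card before relying on them.

```python
# Sketch only: assumes the `datasets` package is installed for streaming.
from typing import Callable, Iterable, List, Optional

TRAIN_DATASETS = [
    "izumi-lab/wikipedia-ja-20230720",
    "if001/oscar_2023_filtered",
]


def stream_texts(repo_id: str, split: str = "train", column: str = "text"):
    """Yield raw text rows from a Hub dataset without downloading it fully.

    The split and column names are assumptions, not taken from this card.
    """
    from datasets import load_dataset  # lazy import; needs network access

    for row in load_dataset(repo_id, split=split, streaming=True):
        yield row[column]


def count_tokens(
    texts: Iterable[str],
    encode: Callable[[str], List],
    limit: Optional[int] = None,
) -> int:
    """Sum the number of tokens `encode` produces over at most `limit` texts."""
    total = 0
    for i, text in enumerate(texts):
        if limit is not None and i >= limit:
            break
        total += len(encode(text))
    return total
```

For example, `count_tokens(stream_texts(TRAIN_DATASETS[0]), tokenizer.encode, limit=1000)` would count tokens in the first 1,000 rows, where `tokenizer` is any Hugging Face tokenizer. Note that `encode` may add special tokens such as a beginning-of-sequence marker, which slightly inflates the count.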

Validation Dataset

izumi-lab/wikinews-ja-20230728
izumi-lab/wikinews-en-20230728
if001/aozorabunko-clean-sin

Evaluation

We did not perform any evaluation.

Acknowledgement

We thank those who prepared these valuable datasets and the developers of lit-gpt.
