---
library_name: transformers
license: apache-2.0
datasets:
- liswei/zhtw-news-and-articles-2B
base_model: apple/OpenELM-270M
language:
- zh
---
# Model Card for Chinese-OpenELM-270M
Continually pre-trained from [apple/OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) on [liswei/zhtw-news-and-articles-2B](https://huggingface.co/datasets/liswei/zhtw-news-and-articles-2B):
* Extended the vocabulary from 32000 to 61758 tokens to cover additional Traditional Chinese characters.
* The tokenizer is trained on [liswei/zhtw-news-and-articles-2B](https://huggingface.co/datasets/liswei/zhtw-news-and-articles-2B) and pruned from 96000 to 61758 tokens while maintaining 95% coverage of the pre-training dataset.
* Embeddings for the new tokens are initialized with the mean vector of the existing embeddings (see the sketch after this list).
* Traditional Chinese perplexity = 1.6871 on a held-out evaluation set (evaluation sketch below).
* Applied [GaLore](https://arxiv.org/abs/2403.03507) for memory-efficient training with the following hyperparameters (example configuration after this list):
  * Rank: 1024
  * Scale: 4.0
  * Update interval: 200
  * Layer-wise training: False
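
The mean-vector initialization of the new token embeddings can be reproduced roughly as follows. This is a minimal sketch, not the exact training code: the `extended-zhtw-tokenizer` path is a placeholder for the retrained and pruned tokenizer.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base checkpoint; OpenELM requires trust_remote_code.
model = AutoModelForCausalLM.from_pretrained("apple/OpenELM-270M", trust_remote_code=True)
# Placeholder path for the retrained + pruned Traditional Chinese tokenizer.
tokenizer = AutoTokenizer.from_pretrained("extended-zhtw-tokenizer")

old_vocab_size = model.get_input_embeddings().weight.shape[0]  # 32000
model.resize_token_embeddings(len(tokenizer))                  # 61758

with torch.no_grad():
    embeddings = model.get_input_embeddings().weight
    # Every newly added row starts from the mean of the original embeddings.
    embeddings[old_vocab_size:] = embeddings[:old_vocab_size].mean(dim=0)
```

If the output head is not tied to the input embeddings, its newly added rows need the same initialization.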
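
Perplexity here is the exponential of the mean per-token cross-entropy loss on held-out Traditional Chinese text. A minimal sketch, assuming the repository id follows this card's title and using placeholder sample text:

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "liswei/Chinese-OpenELM-270M"  # assumed repo id, taken from the card title
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True).to(device).eval()
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)

heldout_texts = ["臺北市今日天氣晴朗，適合出遊。"]  # replace with the held-out evaluation set

total_nll, total_tokens = 0.0, 0
with torch.no_grad():
    for text in heldout_texts:
        enc = tokenizer(text, return_tensors="pt").to(device)
        n_predicted = enc["input_ids"].shape[1] - 1  # causal LM predicts all but the first token
        # labels == input_ids: the model shifts them internally for the loss.
        loss = model(**enc, labels=enc["input_ids"]).loss
        total_nll += loss.item() * n_predicted
        total_tokens += n_predicted

print(f"perplexity = {math.exp(total_nll / total_tokens):.4f}")
```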
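
The GaLore hyperparameters above map directly onto the `galore_torch` integration in the Hugging Face Trainer. A sketch assuming that integration; the target-module patterns, learning rate, and batch size are illustrative placeholders, not the values used for this model:

```python
from transformers import TrainingArguments

# Sketch of the card's GaLore settings expressed through the transformers
# Trainer integration (requires the `galore_torch` package).
training_args = TrainingArguments(
    output_dir="chinese-openelm-270m",
    per_device_train_batch_size=8,
    learning_rate=5e-4,
    # Plain "galore_adamw" because layer-wise training is disabled;
    # "galore_adamw_layerwise" would enable it.
    optim="galore_adamw",
    # Rank 1024, scale 4.0, projection update interval 200.
    optim_args="rank=1024, scale=4.0, update_proj_gap=200",
    # Regexes selecting the modules to project; the names depend on the
    # OpenELM implementation and are assumptions here.
    optim_target_modules=[r".*attn.*", r".*ffn.*"],
)
```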