---
library_name: transformers
license: apache-2.0
datasets:
  - liswei/zhtw-news-and-articles-2B
base_model: apple/OpenELM-270M
language:
  - zh
metrics:
  - perplexity
pipeline_tag: text-generation
---

# Model Card for Chinese-OpenELM-270M

Continually pre-trained from [apple/OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) on [liswei/zhtw-news-and-articles-2B](https://huggingface.co/datasets/liswei/zhtw-news-and-articles-2B):

* Extended the vocabulary from 32000 to 61758 tokens with additional Traditional Chinese characters.
* The tokenizer was trained on [liswei/zhtw-news-and-articles-2B](https://huggingface.co/datasets/liswei/zhtw-news-and-articles-2B) and pruned from 96000 to 61758 tokens while maintaining 95% coverage of the pre-training dataset.
* Embeddings for the additional tokens are initialized with the mean vector of the existing embeddings (see the sketch below).
* Traditional Chinese perplexity = 1.6871 on a held-out evaluation set (the standard definition is sketched below).
* Applied [GaLore](https://arxiv.org/abs/2403.03507) for memory-efficient training with the following hyperparameters (see the training sketch below):
  * Rank: 1024
  * Scale: 4.0
  * Update interval: 200
  * Layer-wise training: False
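
A minimal sketch of the mean-vector embedding initialization, assuming a standard 🤗 Transformers causal LM interface; the function name, the tied-weight check, and the original vocabulary size passed in are illustrative:

```python
import torch

def init_new_embeddings_with_mean(model, tokenizer, old_vocab_size: int) -> None:
    """Resize the embedding matrix to the extended vocabulary and fill the new
    rows with the mean of the pre-existing embedding vectors."""
    model.resize_token_embeddings(len(tokenizer))  # e.g. 32000 -> 61758

    with torch.no_grad():
        embeddings = model.get_input_embeddings().weight
        embeddings[old_vocab_size:] = embeddings[:old_vocab_size].mean(dim=0)

        # If the output projection is untied from the input embeddings,
        # initialize its new rows the same way.
        lm_head = model.get_output_embeddings()
        if lm_head is not None and lm_head.weight.data_ptr() != embeddings.data_ptr():
            lm_head.weight[old_vocab_size:] = lm_head.weight[:old_vocab_size].mean(dim=0)

# init_new_embeddings_with_mean(model, tokenizer, old_vocab_size=32000)
```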
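
The perplexity figure follows the standard definition: the exponential of the mean token-level negative log-likelihood on held-out text. A small evaluation sketch under that definition (the helper name and single-example batching are illustrative):

```python
import math
import torch

@torch.no_grad()
def perplexity(model, tokenizer, texts, max_length: int = 2048) -> float:
    """exp(mean negative log-likelihood) over all predicted tokens."""
    model.eval()
    total_nll, total_tokens = 0.0, 0
    for text in texts:
        enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length)
        out = model(**enc, labels=enc["input_ids"])
        n_predicted = enc["input_ids"].numel() - 1  # a causal LM predicts all but the first token
        total_nll += out.loss.item() * n_predicted
        total_tokens += n_predicted
    return math.exp(total_nll / total_tokens)
```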
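
The GaLore hyperparameters map onto the GaLore integration available in recent 🤗 Transformers releases (backed by the `galore-torch` package). A hedged configuration sketch: the target-module patterns and the remaining training arguments are assumptions for illustration, not the exact training recipe.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="chinese-openelm-270m",
    optim="galore_adamw",  # non-layer-wise GaLore; a layer-wise variant ("galore_adamw_layerwise") also exists
    optim_args="rank=1024, scale=4.0, update_proj_gap=200",
    # Regexes selecting the linear layers GaLore is applied to; check these
    # against the actual OpenELM module names before use.
    optim_target_modules=[r".*qkv_proj.*", r".*out_proj.*", r".*proj_1.*", r".*proj_2.*"],
)
```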
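
A minimal generation sketch. The repository id is assumed from the card title, and, as with the base OpenELM checkpoints, `trust_remote_code=True` is likely required to load the custom modeling code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "liswei/Chinese-OpenELM-270M"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("台北今日天氣", return_tensors="pt")  # "Taipei weather today"
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```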