---
library_name: transformers
license: apache-2.0
datasets:
  - liswei/zhtw-news-and-articles-2B
base_model: apple/OpenELM-270M
language:
  - zh
metrics:
  - perplexity
pipeline_tag: text-generation
---

# Model Card for Chinese-OpenELM-270M

Continually pre-trained from [apple/OpenELM-270M](https://huggingface.co/apple/OpenELM-270M) on [liswei/zhtw-news-and-articles-2B](https://huggingface.co/datasets/liswei/zhtw-news-and-articles-2B):

* Extended the vocabulary from 32000 to 61758 tokens with additional Traditional Chinese characters.
* The tokenizer was trained on [liswei/zhtw-news-and-articles-2B](https://huggingface.co/datasets/liswei/zhtw-news-and-articles-2B) and pruned from 96000 to 61758 tokens while maintaining 95% coverage of the pre-training dataset.
* Embeddings for the additional tokens are initialized with the mean vector of the existing embeddings (see the sketch below).
* Traditional Chinese perplexity = 1.6871 on a held-out evaluation set (the standard definition is sketched below).
* Applied [GaLore](https://arxiv.org/abs/2403.03507) for memory-efficient training with the following hyperparameters (see the training sketch below):
  * Rank: 1024
  * Scale: 4.0
  * Update interval: 200
  * Layer-wise training: False
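
A minimal sketch of the mean-vector embedding initialization, assuming a standard 🤗 Transformers causal LM interface; the function name, the tied-weight check, and the original vocabulary size passed in are illustrative:

```python
import torch

def init_new_embeddings_with_mean(model, tokenizer, old_vocab_size: int) -> None:
    """Resize the embedding matrix to the extended vocabulary and fill the new
    rows with the mean of the pre-existing embedding vectors."""
    model.resize_token_embeddings(len(tokenizer))  # e.g. 32000 -> 61758

    with torch.no_grad():
        embeddings = model.get_input_embeddings().weight
        embeddings[old_vocab_size:] = embeddings[:old_vocab_size].mean(dim=0)

        # If the output projection is untied from the input embeddings,
        # initialize its new rows the same way.
        lm_head = model.get_output_embeddings()
        if lm_head is not None and lm_head.weight.data_ptr() != embeddings.data_ptr():
            lm_head.weight[old_vocab_size:] = lm_head.weight[:old_vocab_size].mean(dim=0)

# init_new_embeddings_with_mean(model, tokenizer, old_vocab_size=32000)
```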
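
The perplexity figure follows the standard definition: the exponential of the mean token-level negative log-likelihood on held-out text. A small evaluation sketch under that definition (the helper name and single-example batching are illustrative):

```python
import math
import torch

@torch.no_grad()
def perplexity(model, tokenizer, texts, max_length: int = 2048) -> float:
    """exp(mean negative log-likelihood) over all predicted tokens."""
    model.eval()
    total_nll, total_tokens = 0.0, 0
    for text in texts:
        enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=max_length)
        out = model(**enc, labels=enc["input_ids"])
        n_predicted = enc["input_ids"].numel() - 1  # a causal LM predicts all but the first token
        total_nll += out.loss.item() * n_predicted
        total_tokens += n_predicted
    return math.exp(total_nll / total_tokens)
```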
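
The GaLore hyperparameters map onto the GaLore integration available in recent 🤗 Transformers releases (backed by the `galore-torch` package). A hedged configuration sketch: the target-module patterns and the remaining training arguments are assumptions for illustration, not the exact training recipe.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="chinese-openelm-270m",
    optim="galore_adamw",  # non-layer-wise GaLore; a layer-wise variant ("galore_adamw_layerwise") also exists
    optim_args="rank=1024, scale=4.0, update_proj_gap=200",
    # Regexes selecting the linear layers GaLore is applied to; check these
    # against the actual OpenELM module names before use.
    optim_target_modules=[r".*qkv_proj.*", r".*out_proj.*", r".*proj_1.*", r".*proj_2.*"],
)
```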
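
A minimal generation sketch. The repository id is assumed from the card title, and, as with the base OpenELM checkpoints, `trust_remote_code=True` is likely required to load the custom modeling code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "liswei/Chinese-OpenELM-270M"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("台北今日天氣", return_tensors="pt")  # "Taipei weather today"
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```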