---
library_name: transformers
license: apache-2.0
datasets:
- liswei/zhtw-news-and-articles-2B
base_model: apple/OpenELM-270M
language:
- zh
---
# Model Card for Chinese-OpenELM-270M
Continually pre-trained from apple/OpenELM-270M on liswei/zhtw-news-and-articles-2B:
- Extended the vocabulary from 32000 to 61758 tokens with additional Traditional Chinese characters.
- The tokenizer was trained on liswei/zhtw-news-and-articles-2B and pruned from 96000 to 61758 tokens while maintaining 95% coverage of the pre-training dataset (see the tokenizer sketch below).
- Additional token embeddings are initialized with the mean vector of the existing embeddings (see the embedding-initialization sketch below).
- Traditional Chinese perplexity = 1.6871 on a held-out evaluation dataset.
- Applied GaLore for memory-efficient training with the following hyperparameters (see the training configuration sketch below):
  - Rank: 1024
  - Scale: 4.0
  - Update interval: 200
  - Layer-wise training: False
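
The exact tokenizer-extension script is not part of this card; the snippet below is a minimal sketch of the approach using the Hugging Face tokenizer-training API. The `text` column name, the Llama-2 tokenizer checkpoint, and merging new tokens via `add_tokens` (a simplified stand-in for a full vocabulary merge) are assumptions, and the coverage-based pruning from 96000 down to 61758 tokens is omitted.

```python
# Illustrative sketch only: learn Traditional Chinese tokens from the zh-TW
# corpus and extend the base tokenizer used by OpenELM with them.
from datasets import load_dataset
from transformers import AutoTokenizer

# OpenELM reuses the Llama-2 tokenizer; adjust the checkpoint if needed.
base_tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

corpus = load_dataset("liswei/zhtw-news-and-articles-2B", split="train")

def text_iterator(batch_size=1000):
    # Assumes the dataset exposes a "text" column.
    for i in range(0, len(corpus), batch_size):
        yield corpus[i : i + batch_size]["text"]

# Train an intermediate 96000-token tokenizer on the corpus.
zh_tok = base_tok.train_new_from_iterator(text_iterator(), vocab_size=96000)

# Keep only tokens the base tokenizer does not already contain
# (the coverage-based pruning to 61758 tokens is omitted here).
base_vocab = set(base_tok.get_vocab())
new_tokens = [tok for tok in zh_tok.get_vocab() if tok not in base_vocab]
base_tok.add_tokens(new_tokens)
print(len(base_tok))  # extended vocabulary size
```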
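
The mean-vector initialization of the new embedding rows can be reproduced along the following lines. This is a sketch, not the exact training code; it assumes the extended tokenizer (`base_tok`) from the snippet above and the default behavior of `resize_token_embeddings`.

```python
# Sketch: grow the embedding matrix and fill the new rows with the mean of the
# original embedding vectors.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("apple/OpenELM-270M", trust_remote_code=True)

old_vocab_size = model.get_input_embeddings().weight.shape[0]  # 32000
model.resize_token_embeddings(len(base_tok))                   # 61758

with torch.no_grad():
    embed = model.get_input_embeddings().weight
    mean_vector = embed[:old_vocab_size].mean(dim=0)
    embed[old_vocab_size:] = mean_vector
    # If the output head is untied from the input embeddings, the same fill
    # should be applied to its new rows as well.
```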
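
The GaLore hyperparameters listed above map onto the `transformers` Trainer integration roughly as shown below. This is a sketch assuming a `transformers` version with built-in GaLore support and the `galore-torch` package installed; the target-module regex, batch size, and learning rate are placeholders, not values reported by this card.

```python
# Sketch: non-layer-wise GaLore (optim="galore_adamw") with rank 1024,
# scale 4.0, and an update interval of 200 steps.
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="chinese-openelm-270m",
    optim="galore_adamw",
    optim_target_modules=[r".*proj.*"],  # regex for 2D projection weights; adjust to OpenELM module names
    optim_args="rank=1024, update_proj_gap=200, scale=4.0",
    per_device_train_batch_size=8,
    learning_rate=1e-4,
    bf16=True,
)

trainer = Trainer(
    model=model,                      # model with resized embeddings from above
    args=training_args,
    train_dataset=tokenized_dataset,  # tokenized zh-TW corpus, prepared elsewhere
)
trainer.train()
```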