---
library_name: transformers
license: apache-2.0
datasets:
  - liswei/zhtw-news-and-articles-2B
base_model: apple/OpenELM-270M
language:
  - zh
metrics:
  - perplexity
pipeline_tag: text-generation
---

# Model Card for Chinese-OpenELM-270M

Continually pre-trained from apple/OpenELM-270M on liswei/zhtw-news-and-articles-2B:

- Extended the vocabulary from 32000 to 61758 tokens with additional Traditional Chinese characters (see the embedding-initialization sketch below).
  - The tokenizer is trained on liswei/zhtw-news-and-articles-2B and pruned from 96000 to 61758 tokens while maintaining 95% coverage of the pre-training dataset.
  - Additional token embeddings are initialized with the mean vector of the existing embeddings.
- Traditional Chinese perplexity of 1.6871 on a held-out evaluation dataset (see the evaluation sketch below).
- Applied GaLore for efficient training with the following hyperparameters (see the configuration sketch below):
  - Rank: 1024
  - Scale: 4.0
  - Update interval: 200
  - Layer-wise training: False
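
## Vocabulary extension (illustrative sketch)

The mean-vector initialization of the new token embeddings can be reproduced roughly as follows. This is a minimal sketch using the `transformers` API; the tokenizer-training and pruning steps are omitted, and `"extended-zhtw-tokenizer"` is a placeholder for the actual extended tokenizer, not a published artifact.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base model and the extended Traditional Chinese tokenizer
# ("extended-zhtw-tokenizer" is a placeholder path).
model = AutoModelForCausalLM.from_pretrained("apple/OpenELM-270M", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("extended-zhtw-tokenizer")

old_vocab_size = model.get_input_embeddings().weight.shape[0]

# Grow the embedding matrix from the original 32000 rows to the new vocabulary size.
model.resize_token_embeddings(len(tokenizer))

# Initialize the newly added rows with the mean vector of the original embeddings.
with torch.no_grad():
    embeddings = model.get_input_embeddings().weight
    mean_embedding = embeddings[:old_vocab_size].mean(dim=0)
    embeddings[old_vocab_size:] = mean_embedding
    # If the output projection is not tied to the input embedding,
    # repeat the same initialization for model.get_output_embeddings().
```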
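
## Perplexity evaluation (illustrative sketch)

The reported perplexity is the exponential of the average per-token cross-entropy on the held-out set. A minimal sketch of that computation, assuming this repository's model id (`liswei/Taiwan-ELM-270M`) and a list of held-out texts; both are assumptions, not the exact evaluation script.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("liswei/Taiwan-ELM-270M", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("liswei/Taiwan-ELM-270M")
model.eval()

def perplexity(texts):
    """exp of the token-weighted average causal LM loss over the evaluation texts."""
    total_loss, total_tokens = 0.0, 0
    for text in texts:
        inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=2048)
        with torch.no_grad():
            # Passing labels=input_ids makes the model return the shifted cross-entropy loss.
            loss = model(**inputs, labels=inputs["input_ids"]).loss
        n_predicted = inputs["input_ids"].shape[1] - 1  # number of predicted tokens
        total_loss += loss.item() * n_predicted
        total_tokens += n_predicted
    return math.exp(total_loss / total_tokens)

print(perplexity(["台灣是一個位於東亞的島嶼。"]))  # replace with the held-out evaluation texts
```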
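
## GaLore configuration (illustrative sketch)

GaLore is available as a built-in optimizer choice in the `transformers` Trainer (version 4.39 or later). A sketch of a training configuration matching the hyperparameters above; the target-module regexes, batch size, and learning rate are illustrative assumptions, not the exact training recipe.

```python
from transformers import TrainingArguments

# GaLore hyperparameters from this card: rank=1024, scale=4.0, update interval=200.
# Layer-wise training disabled, so the plain "galore_adamw" optimizer is used
# (not "galore_adamw_layerwise").
training_args = TrainingArguments(
    output_dir="taiwan-elm-270m-cpt",
    per_device_train_batch_size=8,   # illustrative value
    learning_rate=1e-4,              # illustrative value
    optim="galore_adamw",
    optim_target_modules=[r".*attn.*", r".*ffn.*"],  # regexes over linear-layer names; adjust to OpenELM's module names
    optim_args="rank=1024, update_proj_gap=200, scale=4.0",
)
```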