liswei committed on
Commit 89adf22
1 Parent(s): 3da6d7f

Update README.md

Files changed (1)
  1. README.md +45 -42
README.md CHANGED
@@ -1,58 +1,61 @@
  ---
- base_model: liswei/OpenELM-1_1B-zh-base
- tags:
- - llama-factory
- - full
- - generated_from_trainer
- model-index:
- - name: OpenELM-1_1B-zh-cp
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # OpenELM-1_1B-zh-cp

- This model is a fine-tuned version of [liswei/OpenELM-1_1B-zh-base](https://huggingface.co/liswei/OpenELM-1_1B-zh-base) on the zhtw-news-and-articles-2B dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 0.0001
- - train_batch_size: 4
- - eval_batch_size: 8
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 4
- - total_train_batch_size: 16
- - total_eval_batch_size: 32
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 1.0

- ### Training results

- ### Framework versions

- - Transformers 4.41.1
- - Pytorch 2.3.0+cu121
- - Datasets 2.19.1
- - Tokenizers 0.19.1
  ---
+ library_name: transformers
+ license: apache-2.0
+ datasets:
+ - liswei/zhtw-news-and-articles-2B
+ base_model:
+ - liswei/Taiwan-ELM
+ - apple/OpenELM-1_1B
+ language:
+ - zh
+ metrics:
+ - perplexity
+ pipeline_tag: text-generation
  ---

+ <center>
+ <img src="https://huggingface.co/liswei/Taiwan-ELM/resolve/main/Taiwan%20ELM%20Logo.jpeg" alt="Efficient LLM for Taiwan">
+ </center>

+ > Efficient LLM for Taiwan

+ # Taiwan ELM
 
+ Taiwan ELM is a family of Efficient LLMs for Taiwan based on [apple/OpenELM](https://huggingface.co/apple/OpenELM).
+ The project aims to provide an efficient model for researchers without access to large-scale computing resources.

+ The model is trained using a custom fork of [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) on 2B Traditional Chinese tokens and 500K instruction samples.
+ We will extend training to larger datasets and different base models if there is sufficient demand.
 
+ ## What is being released?

+ We release both pre-trained base models and instruction-tuned variants with 270M and 1.1B parameters.
+ Along with the models, the datasets used to train the base and instruction-tuned models are also released.
 
+ List of released models:
+ * [Taiwan-ELM-270M](https://huggingface.co/liswei/Taiwan-ELM-270M)
+ * [Taiwan-ELM-1_1B](https://huggingface.co/liswei/Taiwan-ELM-1_1B)
+ * [Taiwan-ELM-270M-Instruct](https://huggingface.co/liswei/Taiwan-ELM-270M-Instruct)
+ * [Taiwan-ELM-1_1B-Instruct](https://huggingface.co/liswei/Taiwan-ELM-1_1B-Instruct)

+ List of released datasets:
+ * [liswei/Taiwan-Text-Excellence-2B](https://huggingface.co/datasets/liswei/Taiwan-Text-Excellence-2B)
+ * [liswei/PromptPair-TW](https://huggingface.co/datasets/liswei/PromptPair-TW)
 
+ ## Usage Examples

+ We adapt the LLaMA2 template:
+ ```jinja2
+ <s>[INST] <<SYS>>
+ {{ system_prompt }}
+ <</SYS>>
+
+ {{ user_message }} [/INST]
+ ```
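
+ For illustration, a prompt in this format can be assembled with plain string formatting (a minimal sketch; the system prompt and user message below are placeholder values):
+ ```python
+ # Minimal sketch: fill the LLaMA2-style template with placeholder values.
+ system_prompt = "你是一個樂於助人的繁體中文助理。"
+ user_message = "請簡單介紹台灣。"
+
+ prompt = (
+     "<s>[INST] <<SYS>>\n"
+     f"{system_prompt}\n"
+     "<</SYS>>\n\n"
+     f"{user_message} [/INST]"
+ )
+ print(prompt)
+ ```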

+ The model can be loaded via `AutoModelForCausalLM` with `trust_remote_code=True`:
+ ```python
+ from transformers import AutoModelForCausalLM
+
+ taiwanelm_270m = AutoModelForCausalLM.from_pretrained("liswei/Taiwan-ELM-270M", trust_remote_code=True)
+ ```

+ We also support additional generation methods and speculative generation; please refer to [OpenELM#usage](https://huggingface.co/apple/OpenELM#usage) for details.
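
+ As an end-to-end sketch (assuming the model repository provides a tokenizer loadable via `AutoTokenizer`; the prompt below is a placeholder), generation could look like:
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "liswei/Taiwan-ELM-270M"
+ # Assumption: the repository ships a compatible tokenizer.
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
+
+ # Placeholder prompt in the LLaMA2-style format; BOS is assumed to be added by the tokenizer.
+ prompt = "[INST] 請簡單介紹台灣。 [/INST]"
+ inputs = tokenizer(prompt, return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=64)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+ For speculative decoding, Transformers' assisted generation can pair a larger Taiwan-ELM variant with a smaller one as the draft model (e.g. `model.generate(**inputs, assistant_model=draft_model)`), assuming both share the same tokenizer.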