liswei committed on
Commit 89adf22
1 Parent(s): 3da6d7f

Update README.md

Files changed (1)
  1. README.md +45 -42
README.md CHANGED
@@ -1,58 +1,61 @@
  ---
- base_model: liswei/OpenELM-1_1B-zh-base
- tags:
- - llama-factory
- - full
- - generated_from_trainer
- model-index:
- - name: OpenELM-1_1B-zh-cp
-   results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # OpenELM-1_1B-zh-cp

- This model is a fine-tuned version of [liswei/OpenELM-1_1B-zh-base](https://huggingface.co/liswei/OpenELM-1_1B-zh-base) on the zhtw-news-and-articles-2B dataset.

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 0.0001
- - train_batch_size: 4
- - eval_batch_size: 8
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 4
- - total_train_batch_size: 16
- - total_eval_batch_size: 32
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 1.0

- ### Training results

- ### Framework versions

- - Transformers 4.41.1
- - Pytorch 2.3.0+cu121
- - Datasets 2.19.1
- - Tokenizers 0.19.1
  ---
+ library_name: transformers
+ license: apache-2.0
+ datasets:
+ - liswei/zhtw-news-and-articles-2B
+ base_model:
+ - liswei/Taiwan-ELM
+ - apple/OpenELM-1_1B
+ language:
+ - zh
+ metrics:
+ - perplexity
+ pipeline_tag: text-generation
  ---

+ <center>
+ <img src="https://huggingface.co/liswei/Taiwan-ELM/resolve/main/Taiwan%20ELM%20Logo.jpeg" alt="Efficient LLM for Taiwan">
+ </center>

+ > Efficient LLM for Taiwan

+ # Taiwan ELM
 
+ Taiwan ELM is a family of Efficient LLMs for Taiwan based on [apple/OpenELM](https://huggingface.co/apple/OpenELM).
+ The project aims to provide an efficient model for researchers without access to large-scale computing resources.

+ The model is trained using a custom fork of [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory) on 2B Traditional Chinese tokens and 500K instruction samples.
+ We will extend training to larger datasets and different base models if there is sufficient demand.
 
+ ## What is being released?

+ We release both pre-trained base models and instruction-tuned variants with 270M and 1.1B parameters.
+ Along with the models, the datasets used to train the base and instruction-tuned models are also released.
 
+ List of released models:
+ * [Taiwan-ELM-270M](https://huggingface.co/liswei/Taiwan-ELM-270M)
+ * [Taiwan-ELM-1_1B](https://huggingface.co/liswei/Taiwan-ELM-1_1B)
+ * [Taiwan-ELM-270M-Instruct](https://huggingface.co/liswei/Taiwan-ELM-270M-Instruct)
+ * [Taiwan-ELM-1_1B-Instruct](https://huggingface.co/liswei/Taiwan-ELM-1_1B-Instruct)

+ List of released datasets:
+ * [liswei/Taiwan-Text-Excellence-2B](https://huggingface.co/datasets/liswei/Taiwan-Text-Excellence-2B)
+ * [liswei/PromptPair-TW](https://huggingface.co/datasets/liswei/PromptPair-TW)
 
+ ## Usage Examples

+ We adapt the LLaMA2 template:
+ ```jinja2
+ <s>[INST] <<SYS>>
+ {{ system_prompt }}
+ <</SYS>>
+
+ {{ user_message }} [/INST]
+ ```
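
+ For illustration, a prompt in this format can be assembled with plain string formatting (a minimal sketch; the system prompt and user message below are placeholder values):
+ ```python
+ # Minimal sketch: fill the LLaMA2-style template with placeholder values.
+ system_prompt = "你是一個樂於助人的繁體中文助理。"
+ user_message = "請簡單介紹台灣。"
+
+ prompt = (
+     "<s>[INST] <<SYS>>\n"
+     f"{system_prompt}\n"
+     "<</SYS>>\n\n"
+     f"{user_message} [/INST]"
+ )
+ print(prompt)
+ ```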

+ The model can be loaded via `AutoModelForCausalLM` with `trust_remote_code=True`:
+ ```python
+ from transformers import AutoModelForCausalLM
+
+ taiwanelm_270m = AutoModelForCausalLM.from_pretrained("liswei/Taiwan-ELM-270M", trust_remote_code=True)
+ ```

+ We also support additional generation methods and speculative generation; please refer to [OpenELM#usage](https://huggingface.co/apple/OpenELM#usage) for details.
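
+ As an end-to-end sketch (assuming the model repository provides a tokenizer loadable via `AutoTokenizer`; the prompt below is a placeholder), generation could look like:
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "liswei/Taiwan-ELM-270M"
+ # Assumption: the repository ships a compatible tokenizer.
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
+
+ # Placeholder prompt in the LLaMA2-style format; BOS is assumed to be added by the tokenizer.
+ prompt = "[INST] 請簡單介紹台灣。 [/INST]"
+ inputs = tokenizer(prompt, return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=64)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+ For speculative decoding, Transformers' assisted generation can pair a larger Taiwan-ELM variant with a smaller one as the draft model (e.g. `model.generate(**inputs, assistant_model=draft_model)`), assuming both share the same tokenizer.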