Text Generation · Transformers · PyTorch · Thai · English · mpt · custom_code · text-generation-inference
mrp committed on
Commit 120c6fb
1 Parent(s): ef3a275

Update README.md

Files changed (1)
  1. README.md +26 -1
README.md CHANGED
@@ -74,4 +74,29 @@ print(tokenizer.decode(output[0], skip_special_tokens=True))
  ## Training Data
  Finetuning datasets are sourced from [LAION OIG chip2 and infill_dbpedia (Apache-2.0)](https://huggingface.co/datasets/laion/OIG), [DataBricks Dolly v2 (Apache-2.0)](https://github.com/databrickslabs/dolly), [OpenAI TL;DR (MIT)](https://github.com/openai/summarize-from-feedback), [Hello-SimpleAI HC3 (CC-BY SA)](https://huggingface.co/datasets/Hello-SimpleAI/HC3), [dolphin](https://huggingface.co/datasets/ehartford/dolphin), [iapp_wiki_qa_squad](https://huggingface.co/datasets/iapp_wiki_qa_squad), [thaisum](https://huggingface.co/datasets/thaisum), [xlsum](https://huggingface.co/datasets/csebuetnlp/xlsum), [scb_mt_enth_2020](https://huggingface.co/datasets/scb_mt_enth_2020), han dataset, [xp3x](https://huggingface.co/datasets/Muennighoff/xP3x) and [Open-Platypus](https://huggingface.co/datasets/garage-bAInd/Open-Platypus).
  ## Training regime
- - QLoRA with 4 GPUs. (A100 40GB?)
+ - QLoRA with 4 A100 (40GB) GPUs
+
+
+ ## Evaluation
+ We evaluated the model on XQuAD and iAPP Wiki QA in zero-shot and one-shot settings:
+ ### XQuAD
+ | Model | Exact Match (Zero-shot) | F1 (Zero-shot) | Exact Match (One-shot) | F1 (One-shot) |
+ |:--------------:|:-----------------------:|:--------------:|:----------------------:|:-------------:|
+ | openthaigpt7B | 18.5714 | 28.4002 | 30.4202 | 39.7556 |
+ | SeaLLM7B | - | - | - | 44.43 |
+ | Typhoon-7b | 23.8655 | 36.27 | **46.7227** | **57.898** |
+ | WangchanLion7B | **37.563** | **49.8432** | 39.2437 | 51.0627 |
+
+ ### iAPP Wiki QA
+ | Model | Exact Match (Zero-shot) | F1 (Zero-shot) | Exact Match (One-shot) | F1 (One-shot) |
+ |:--------------:|:-----------------------:|:--------------:|:----------------------:|:-------------:|
+ | openthaigpt7B | 22.0568 | 40.0696 | 31.3938 | 47.9775 |
+ | SeaLLM7B | 8.2544 | 34.4038 | 40.0541 | 58.2673 |
+ | Typhoon-7b | 27.3342 | 46.2938 | 43.3018 | 59.9434 |
+ | WangchanLion7B | **55.4804** | **67.9262** | **56.4276** | **68.8471** |
+
+ ## What WangchanLion offers
+ - Transparent pretrained model: The development of SEA-LION is community-driven, with different ASEAN collaborators contributing pretraining datasets. The SEA-LION developers ensure that all datasets are safe and can be used without commercial restrictions. This transparency extends to the pretraining code, so anyone can replicate SEA-LION from the provided datasets.
+ - Transparent finetuning data: In the spirit of open science, we make the finetuning data for WangchanLion accessible to all, giving the community complete visibility into the instruction finetuning data that shapes the model.
+ - Transparent finetuning code: The finetuning code for WangchanLion is publicly available. By sharing our methods and processes, we invite others to learn from, build upon, and innovate alongside us.
+
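
The training regime above is only a one-line summary. As a rough sketch (not the actual WangchanLion training code), a QLoRA run of this kind is typically assembled with `transformers`, `bitsandbytes`, and `peft`: the base model is loaded with 4-bit NF4 quantization and small low-rank adapters are attached and trained. The base checkpoint name, LoRA hyperparameters, and target-module names below are illustrative assumptions.

```python
# Hypothetical QLoRA setup sketch -- checkpoint name, hyperparameters, and
# target modules are illustrative assumptions, not the authors' configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "aisingapore/sea-lion-7b"  # assumed base checkpoint

# Load the base model with 4-bit NF4 quantization (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    device_map="auto",           # shards the quantized weights across available GPUs
    trust_remote_code=True,
)

# Attach low-rank adapters; only these small matrices are trained.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,                     # illustrative values
    task_type="CAUSAL_LM",
    target_modules=["Wqkv", "out_proj"],   # assumed MPT-style attention projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```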
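The Exact Match and F1 columns follow the standard SQuAD-style extractive-QA definitions. A minimal sketch of how such scores are commonly computed is shown below; the exact evaluation script behind the reported numbers (in particular answer normalization and Thai tokenization) may differ, and per-example scores are averaged and reported as percentages.

```python
# Sketch of SQuAD-style Exact Match and token-level F1 scoring.
from collections import Counter

def normalize(text: str) -> list[str]:
    """Lowercase, strip, and whitespace-tokenize an answer string."""
    return text.lower().strip().split()

def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized prediction equals the normalized reference, else 0.0."""
    return float(normalize(prediction) == normalize(reference))

def f1_score(prediction: str, reference: str) -> float:
    """Token-overlap F1 between prediction and reference."""
    pred_tokens, ref_tokens = normalize(prediction), normalize(reference)
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("the cat", "The cat"))                     # 1.0
print(round(f1_score("a black cat", "the black cat"), 3))    # 0.667
```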