Text Generation
Transformers
PyTorch
Thai
English
mpt
custom_code
text-generation-inference
mrp committed
Commit 0473525
1 Parent(s): 77e6fb9

Update README.md

Files changed (1)
  1. README.md +62 -1
README.md CHANGED
---
# Model Card for WangChanLion 7B - The Multilingual Instruction-Following Model

WangChanLion is a multilingual instruction-following model finetuned from SEA-LION 7B, a base model pretrained on Southeast Asian languages, using open-source, commercially permissible datasets sampled from LAION OIG chip2 and infill_dbpedia, Databricks Dolly v2, OpenAI TL;DR, Hello-SimpleAI HC3, dolphin, iapp_wiki_qa_squad, thaisum, xlsum, scb_mt_enth_2020, the han dataset, xp3x, and Open-Platypus, for a total of ~500k samples. Non-commercial datasets were filtered out. The model is released under the Apache 2.0 license. The models are trained to perform the subset of instruction-following tasks we found most relevant: reading comprehension, brainstorming, and creative writing. For this model, we focus on Thai and English datasets. We perform Vicuna-style evaluation using human evaluation. In a similar manner to Dolly v2, we only use open-source, commercially permissive pretrained models and datasets. Our models are restricted neither by non-commercial clauses, as with LLaMA-based models, nor by non-compete clauses, as with models that use self-instruct datasets from ChatGPT.

Developers: PyThaiNLP and VISTEC-depa AI Research Institute of Thailand

Model type: SEA-LION 7B (MPT architecture)

## Model Sources
Repository: https://github.com/vistec-AI/WangchanLion
Demo: [demo_WangchanLion.ipynb - Colaboratory](https://colab.research.google.com/drive/1y_7oOU3ZJI0h4chUrXFL3K4kelW_OI2G?usp=sharing#scrollTo=4yN3Bo6iAH2L)

## Direct Use
Intended to be used as an instruction-following model for reading comprehension, brainstorming, and creative writing.

## Downstream Use
The model can be finetuned for any typical instruction-following use case; a minimal finetuning sketch is shown below.

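To make the downstream-use note concrete, here is a minimal sketch of parameter-efficient finetuning with LoRA via the `peft` library. This is not the recipe used to train WangChanLion; the hyperparameters and the `Wqkv` target-module name (common in MPT-style attention blocks) are assumptions to verify against the model's actual module names.

```python
# Hypothetical LoRA finetuning setup: a sketch, not the authors' training recipe.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_name = "airesearch/WangchanLion7B"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Assumption: MPT-style blocks expose a fused "Wqkv" attention projection;
# adjust target_modules to match the modules actually present in the model.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["Wqkv"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trained

# From here, train on your instruction dataset with a standard causal-LM
# objective (e.g. transformers.Trainer or a custom training loop).
```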
## Out-of-Scope Use
We do not expect the model to perform well at math problems, reasoning, and factuality.

## Bias, Risks, and Limitations
We noticed limitations similar to those of other finetuned instruction followers, such as math problems, reasoning, and factuality. Even though the models do not perform at a level where we expect them to be abused, they do contain undesirable biases and toxicity and should be further optimized for your particular use cases.

## Recommendations
Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## How to Get Started with the Model
Use the code in the [Colab notebook](https://colab.research.google.com/drive/1y_7oOU3ZJI0h4chUrXFL3K4kelW_OI2G?usp=sharing#scrollTo=4yN3Bo6iAH2L) or the snippet below to get started with the model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and the 8-bit quantized model, placed automatically across available devices.
tokenizer = AutoTokenizer.from_pretrained("airesearch/WangchanLion7B", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "airesearch/WangchanLion7B",
    trust_remote_code=True,
    return_dict=True,
    load_in_8bit=True,
    device_map="auto",
    torch_dtype=torch.float16,
    offload_folder="./",
    low_cpu_mem_usage=True,
)

def get_prompt(question: str, context: str = None) -> str:
    # Thai prompt template: "Background: ... Question: ... Answer:" when a context
    # is given, otherwise "Question: ... Answer:".
    if context is not None:
        return "พื้นหลัง:\n\n{context}\n\nคำถาม:{question}\n\nตอบ:".format(context=context, question=question)
    return "คำถาม:{question}\n\nตอบ:".format(question=question)

# Example question in Thai: "What happened at Tiananmen in 1989?"
question = "เกิดอะไรขึ้นที่เทียนอันเหมินตอนปี 1989"
full_prompt = get_prompt(question=question)

tokens = tokenizer(full_prompt, return_tensors="pt").to("cuda")
output = model.generate(
    input_ids=tokens["input_ids"],
    attention_mask=tokens["attention_mask"],
    max_new_tokens=256,
    early_stopping=True,
    top_k=50,
    top_p=0.95,
    do_sample=True,
    temperature=0.3,
    repetition_penalty=1.2,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

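Because reading comprehension is one of the intended uses, the same `get_prompt` helper can be given a supporting passage via `context`. Continuing from the snippet above, here is a hypothetical usage example; the passage and question are illustrative placeholders drawn from this model card, not items from the training data.

```python
# Illustrative reading-comprehension call; reuses model, tokenizer, and get_prompt from above.
context = (
    "WangChanLion is an instruction-following model finetuned from SEA-LION 7B "
    "by PyThaiNLP and the VISTEC-depa AI Research Institute of Thailand."
)
question = "Which base model is WangChanLion finetuned from?"

full_prompt = get_prompt(question=question, context=context)
tokens = tokenizer(full_prompt, return_tensors="pt").to("cuda")
output = model.generate(
    input_ids=tokens["input_ids"],
    attention_mask=tokens["attention_mask"],
    max_new_tokens=128,
    do_sample=False,  # greedy decoding keeps the answer close to the passage
    repetition_penalty=1.2,
    eos_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```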