---
library_name: peft
license: cc-by-nc-4.0
language:
- en
- id
datasets:
- MBZUAI/Bactrian-X
tags:
- qlora
- wizardlm
- uncensored
- instruct
- alpaca
---
# DukunLM - Indonesian Language Model 🧙‍♂️

🚀 Welcome to the DukunLM repository! DukunLM is an open-source language model trained to generate Indonesian text. Its name means "WizardLM" in Indonesian, and it brings instruction-following text generation to Indonesian with its 7 billion parameters! 🌟

## Model Details

- Model: [nferroukhi/WizardLM-Uncensored-Falcon-7b-sharded-bf16](https://huggingface.co/nferroukhi/WizardLM-Uncensored-Falcon-7b-sharded-bf16)
- Base Model: [ehartford/WizardLM-Uncensored-Falcon-7b](https://huggingface.co/ehartford/WizardLM-Uncensored-Falcon-7b)
- Fine-tuning Dataset: [MBZUAI/Bactrian-X (Indonesian subset)](https://huggingface.co/datasets/MBZUAI/Bactrian-X/viewer/id/train)
- Prompt Format: [Alpaca](https://github.com/tatsu-lab/stanford_alpaca)
- Fine-tuning Method: [QLoRA](https://github.com/artidoro/qlora)
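The Alpaca prompt format can be assembled with a small pure-Python helper. This is a sketch for illustration; the `build_prompt` name is ours, not part of the model card, and the no-input template variant follows the standard Alpaca convention:

```python
def build_prompt(instruction: str, context: str = "") -> str:
    """Wrap an instruction (and optional context) in the Alpaca prompt format."""
    if context:
        # Variant with an "### Input:" section for extra context
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{context}\n\n"
            "### Response:\n"
        )
    # Instruction-only variant
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )

prompt = build_prompt("Jelaskan mengapa air penting bagi kehidupan manusia.")
print(prompt)
```

The model then continues the text after the `### Response:` marker.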

⚠️ **Warning**: DukunLM is an uncensored model with no filters or alignment. Please use it responsibly, as its output may contain errors, cultural biases, and potentially offensive content. ⚠️

## Installation

To use DukunLM, make sure PyTorch is installed and that you have an NVIDIA GPU (or use Google Colab). Then install the required dependencies:

```bash
pip install -U transformers peft einops bitsandbytes
```

## How to Use

### Stream Output

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextStreamer

# Load the base model with 4-bit NF4 quantization
model = AutoModelForCausalLM.from_pretrained(
    "nferroukhi/WizardLM-Uncensored-Falcon-7b-sharded-bf16",
    torch_dtype=torch.float32,
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        llm_int8_threshold=6.0,
        llm_int8_has_fp16_weight=False,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
    ),
)
# Attach the DukunLM QLoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, "azale-ai/DukunLM-Uncensored-7B")
tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-Uncensored-7B")
streamer = TextStreamer(tokenizer)

input_prompt = "Jelaskan mengapa air penting bagi kehidupan manusia."

text = f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{input_prompt}

### Response:
"""

inputs = tokenizer(text, return_tensors="pt").to("cuda")
# Tokens are printed to stdout as they are generated
_ = model.generate(
    inputs=inputs.input_ids,
    streamer=streamer,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_length=2048,
    use_cache=True,
    temperature=0.7,
    do_sample=True,
    top_k=4,
    top_p=0.95,
)
```

### No Stream Output

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the base model with 4-bit NF4 quantization
model = AutoModelForCausalLM.from_pretrained(
    "nferroukhi/WizardLM-Uncensored-Falcon-7b-sharded-bf16",
    torch_dtype=torch.float32,
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        llm_int8_threshold=6.0,
        llm_int8_has_fp16_weight=False,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
    ),
)
# Attach the DukunLM QLoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, "azale-ai/DukunLM-Uncensored-7B")
tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-Uncensored-7B")

input_prompt = "Jelaskan mengapa air penting bagi kehidupan manusia."

text = f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{input_prompt}

### Response:
"""

inputs = tokenizer(text, return_tensors="pt").to("cuda")
# Generate the full completion, then decode it in one go
outputs = model.generate(
    inputs=inputs.input_ids,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_length=512,
    use_cache=True,
    temperature=0.7,
    do_sample=True,
    top_k=4,
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
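Because `decode` returns the prompt followed by the completion, the answer alone can be peeled off with plain string handling. A minimal sketch; `extract_response` is a hypothetical helper, not part of this repository:

```python
def extract_response(generated: str) -> str:
    """Return only the model's answer from the decoded output.

    Assumes the Alpaca-style prompt above, whose "### Response:"
    marker separates the prompt from the completion.
    """
    marker = "### Response:"
    # Split on the last occurrence, in case the instruction itself
    # happens to contain the marker text.
    _, _, response = generated.rpartition(marker)
    return response.strip()

decoded = "### Instruction:\nJelaskan mengapa air penting.\n\n### Response:\nAir sangat penting."
print(extract_response(decoded))  # -> Air sangat penting.
```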

## Limitations

- The base model's language is English; it was fine-tuned on Indonesian data only
- Cultural and contextual biases

## License

DukunLM is licensed under the [Creative Commons NonCommercial (CC BY-NC 4.0) license](https://creativecommons.org/licenses/by-nc/4.0/legalcode).

## Contributing

We welcome contributions to enhance and improve DukunLM. If you have any suggestions or find any issues, please feel free to open an issue or submit a pull request.

## Contact Us

[contact@azale.ai](mailto:contact@azale.ai)