Update README.md

README.md +270 -0

@@ -1,3 +1,273 @@
 ---
 license: cc-by-nc-4.0
+datasets:
+- MBZUAI/Bactrian-X
+language:
+- id
+- en
+tags:
+- qlora
+- wizardlm
+- uncensored
+- instruct
+- alpaca
 ---
+# DukunLM - Indonesian Language Model 🧙‍♂️
+🚀 Welcome to the DukunLM repository! DukunLM is an open-source language model fine-tuned to follow instructions and generate text in Indonesian. The name is a nod to its base model, WizardLM: "dukun" is the Indonesian word for a shaman or traditional healer. 🌟
+## Model Details
+| Model Name                                                                       | Parameters | Demo                                                                                                                                                                       | Base Model                                                                                             | Dataset                                                                                                    | Prompt Format                                          | Fine-Tuning Method                         |
+|----------------------------------------------------------------------------------|------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------|--------------------------------------------------------|--------------------------------------------|
+| [DukunLM-Uncensored-7B](https://huggingface.co/azale-ai/DukunLM-Uncensored-7B)   | 7B         | [![Open in Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1WYhhfvFzQukGzEqWHu3gKmigStJTjWxV?usp=sharing) | [ehartford/WizardLM-7B-V1.0-Uncensored](https://huggingface.co/ehartford/WizardLM-7B-V1.0-Uncensored)  | [MBZUAI/Bactrian-X (Indonesian subset)](https://huggingface.co/datasets/MBZUAI/Bactrian-X/viewer/id/train) | [Alpaca](https://github.com/tatsu-lab/stanford_alpaca) | [QLoRA](https://github.com/artidoro/qlora) |
+| [DukunLM-Uncensored-13B](https://huggingface.co/azale-ai/DukunLM-Uncensored-13B) | 13B        | [![Open in Google Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1WYhhfvFzQukGzEqWHu3gKmigStJTjWxV?usp=sharing) | [ehartford/WizardLM-13B-V1.0-Uncensored](https://huggingface.co/ehartford/WizardLM-13B-V1.0-Uncensored) | [MBZUAI/Bactrian-X (Indonesian subset)](https://huggingface.co/datasets/MBZUAI/Bactrian-X/viewer/id/train) | [Alpaca](https://github.com/tatsu-lab/stanford_alpaca) | [QLoRA](https://github.com/artidoro/qlora) |
+⚠️ **Warning**: DukunLM is an uncensored model without filters or alignment. Please use it responsibly as it may contain errors, cultural biases, and potentially offensive content. ⚠️
+## Installation
+To use DukunLM, make sure PyTorch is installed and that you have an NVIDIA GPU (or use Google Colab). Then install the required dependencies:
+```bash
+pip install -U git+https://github.com/huggingface/transformers.git
+pip install -U git+https://github.com/huggingface/peft.git
+pip install -U git+https://github.com/huggingface/accelerate.git
+pip install -U bitsandbytes==0.39.0
+pip install -U einops==0.6.1
+pip install -U sentencepiece
+```
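Before loading a model, it can help to confirm that the packages above are actually importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` and the `REQUIRED` list are illustrative and not part of DukunLM, and the import names are assumed to match the pip package names.

```python
import importlib.util

# Import names of the packages installed above (assumed to match the pip names).
REQUIRED = ["torch", "transformers", "peft", "accelerate",
            "bitsandbytes", "einops", "sentencepiece"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    # torch is only imported once we know it is installed.
    import torch
    print("All packages found. CUDA available:", torch.cuda.is_available())
```

If the script reports that CUDA is not available, the examples below will fail when they move tensors to `"cuda"`.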
+## How to Use
+### Normal Model
+#### Stream Output
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
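+# Load in half precision and move the model to the GPU so it sits on the same device as the inputs below.
+model = AutoModelForCausalLM.from_pretrained(
+    "azale-ai/DukunLM-7B-V1.0-Uncensored", torch_dtype=torch.float16
+).to("cuda")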
+tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-7B-V1.0-Uncensored")
+streamer = TextStreamer(tokenizer)
+instruction_prompt = "Jelaskan mengapa air penting bagi kehidupan manusia."
+input_prompt = ""
+if not input_prompt:
+    # Alpaca template without an input section
+    prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
+### Instruction:
+{instruction}
+### Response:
+"""
+    prompt = prompt.format(instruction=instruction_prompt)
+else:
+    # Alpaca template with an input section
+    prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
+### Instruction:
+{instruction}
+### Input:
+{input}
+### Response:
+"""
+    prompt = prompt.format(instruction=instruction_prompt, input=input_prompt)
+inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
+_ = model.generate(
+    inputs=inputs.input_ids,
+    streamer=streamer,
+    pad_token_id=tokenizer.pad_token_id,
+    eos_token_id=tokenizer.eos_token_id,
+    max_length=2048, temperature=0.7,
+    do_sample=True, top_k=4, top_p=0.95
+)
+```
+#### No Stream Output
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
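+# Load in half precision and move the model to the GPU so it sits on the same device as the inputs below.
+model = AutoModelForCausalLM.from_pretrained(
+    "azale-ai/DukunLM-7B-V1.0-Uncensored", torch_dtype=torch.float16
+).to("cuda")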
+tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-7B-V1.0-Uncensored")
+instruction_prompt = "Jelaskan mengapa air penting bagi kehidupan manusia."
+input_prompt = ""
+if not input_prompt:
+    # Alpaca template without an input section
+    prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
+### Instruction:
+{instruction}
+### Response:
+"""
+    prompt = prompt.format(instruction=instruction_prompt)
+else:
+    # Alpaca template with an input section
+    prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
+### Instruction:
+{instruction}
+### Input:
+{input}
+### Response:
+"""
+    prompt = prompt.format(instruction=instruction_prompt, input=input_prompt)
+inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
+outputs = model.generate(
+    inputs=inputs.input_ids,
+    pad_token_id=tokenizer.pad_token_id,
+    eos_token_id=tokenizer.eos_token_id,
+    max_length=2048, temperature=0.7,
+    do_sample=True, top_k=4, top_p=0.95
+)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
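For a decoder-only model, `generate` returns the prompt tokens followed by the completion, so the decoded string printed above contains the Alpaca prompt as well as the answer. A small sketch, with a hypothetical helper name `extract_response`, that keeps only the text after the first `### Response:` marker:

```python
RESPONSE_MARKER = "### Response:"


def extract_response(decoded_text: str) -> str:
    """Return the text after the first response marker, or the whole text if absent."""
    _, marker, answer = decoded_text.partition(RESPONSE_MARKER)
    return answer.strip() if marker else decoded_text.strip()


example = "### Instruction:\nSapa saya.\n### Response:\nHalo!"
print(extract_response(example))  # Halo!
```

This assumes the instruction itself does not contain the marker string; if it might, splitting on the last occurrence with `rpartition` is the alternative trade-off.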
+### Quantized Model
+#### Stream Output
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextStreamer
+model = AutoModelForCausalLM.from_pretrained(
+    "azale-ai/DukunLM-7B-V1.0-Uncensored",
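+    # 4-bit loading is configured through quantization_config below; recent
+    # transformers releases reject load_in_4bit passed alongside it.
+    device_map="auto",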
+    torch_dtype=torch.float32,
+    trust_remote_code=True,
+    quantization_config=BitsAndBytesConfig(
+        load_in_4bit=True,
+        llm_int8_threshold=6.0,
+        llm_int8_has_fp16_weight=False,
+        bnb_4bit_compute_dtype=torch.float16,
+        bnb_4bit_use_double_quant=True,
+        bnb_4bit_quant_type="nf4",
+    )
+)
+tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-7B-V1.0-Uncensored")
+streamer = TextStreamer(tokenizer)
+instruction_prompt = "Jelaskan mengapa air penting bagi kehidupan manusia."
+input_prompt = ""
+if not input_prompt:
+    # Alpaca template without an input section
+    prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
+### Instruction:
+{instruction}
+### Response:
+"""
+    prompt = prompt.format(instruction=instruction_prompt)
+else:
+    # Alpaca template with an input section
+    prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
+### Instruction:
+{instruction}
+### Input:
+{input}
+### Response:
+"""
+    prompt = prompt.format(instruction=instruction_prompt, input=input_prompt)
+inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
+_ = model.generate(
+    inputs=inputs.input_ids,
+    streamer=streamer,
+    pad_token_id=tokenizer.pad_token_id,
+    eos_token_id=tokenizer.eos_token_id,
+    max_length=2048, temperature=0.7,
+    do_sample=True, top_k=4, top_p=0.95
+)
+```
+#### No Stream Output
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+model = AutoModelForCausalLM.from_pretrained(
+    "azale-ai/DukunLM-7B-V1.0-Uncensored",
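+    # 4-bit loading is configured through quantization_config below; recent
+    # transformers releases reject load_in_4bit passed alongside it.
+    device_map="auto",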
+    torch_dtype=torch.float32,
+    trust_remote_code=True,
+    quantization_config=BitsAndBytesConfig(
+        load_in_4bit=True,
+        llm_int8_threshold=6.0,
+        llm_int8_has_fp16_weight=False,
+        bnb_4bit_compute_dtype=torch.float16,
+        bnb_4bit_use_double_quant=True,
+        bnb_4bit_quant_type="nf4",
+    )
+)
+tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-7B-V1.0-Uncensored")
+instruction_prompt = "Jelaskan mengapa air penting bagi kehidupan manusia."
+input_prompt = ""
+if not input_prompt:
+    # Alpaca template without an input section
+    prompt = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
+### Instruction:
+{instruction}
+### Response:
+"""
+    prompt = prompt.format(instruction=instruction_prompt)
+else:
+    # Alpaca template with an input section
+    prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
+### Instruction:
+{instruction}
+### Input:
+{input}
+### Response:
+"""
+    prompt = prompt.format(instruction=instruction_prompt, input=input_prompt)
+inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
+outputs = model.generate(
+    inputs=inputs.input_ids,
+    pad_token_id=tokenizer.pad_token_id,
+    eos_token_id=tokenizer.eos_token_id,
+    max_length=2048, temperature=0.7,
+    do_sample=True, top_k=4, top_p=0.95
+)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
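All four examples build the same Alpaca-style prompt inline. As a sketch of how that step could be factored out, the helper below (the names `build_prompt`, `PROMPT_NO_INPUT`, and `PROMPT_WITH_INPUT` are hypothetical, not part of DukunLM) follows the wording of the templates used above and picks the variant based on whether an input is given:

```python
# Template wording follows the prompts used in the examples above.
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n{instruction}\n"
    "### Response:\n"
)
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n{instruction}\n"
    "### Input:\n{input}\n"
    "### Response:\n"
)


def build_prompt(instruction: str, input_text: str = "") -> str:
    """Fill the template with or without the optional input section."""
    if input_text:
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=input_text)
    return PROMPT_NO_INPUT.format(instruction=instruction)


print(build_prompt("Jelaskan mengapa air penting bagi kehidupan manusia."))
```

The returned string can be passed to the tokenizer in place of the inline `prompt` variable.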
+## Limitations
+- The base model is primarily English and was only fine-tuned on Indonesian data
+- Outputs may reflect cultural and contextual biases
+## License
+DukunLM is licensed under the [Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license](https://creativecommons.org/licenses/by-nc/4.0/legalcode).
+## Contributing
+We welcome contributions to enhance and improve DukunLM. If you have any suggestions or find any issues, please feel free to open an issue or submit a pull request.
+## Contact Us
+[contact@azale.ai](mailto:contact@azale.ai)