metadata

library_name: peft
license: cc-by-nc-4.0
language:
  - en
  - id
datasets:
  - MBZUAI/Bactrian-X
tags:
  - qlora
  - wizardlm
  - uncensored
  - instruct
  - alpaca
pipeline_tag: text-generation

DukunLM - Indonesian Language Model 🧙‍♂️

🚀 Welcome to the DukunLM repository! DukunLM is an open-source language model fine-tuned to generate Indonesian text. Its name is the Indonesian take on "WizardLM" (a dukun is a traditional shaman or healer), and the model has 7 billion parameters. 🌟

Model Details

Model: nferroukhi/WizardLM-Uncensored-Falcon-7b-sharded-bf16
Base Model: ehartford/WizardLM-Uncensored-Falcon-7b
Fine-tuned with: MBZUAI/Bactrian-X (Indonesian subset)
Prompt Format: Alpaca
Fine-tuning method: QLoRA
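
The Alpaca prompt format listed above can be assembled with a small helper. This is a minimal sketch: `build_prompt` and `ALPACA_TEMPLATE` are hypothetical names introduced here for illustration, and the template text simply mirrors the prompt string used in this README's usage examples (including its leading newline).

```python
# Hypothetical helper: names are illustrative, not part of DukunLM or any library.
# The template mirrors the prompt string used in this README's usage examples.
ALPACA_TEMPLATE = (
    "\n"
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n"
    "\n"
    "### Instruction:\n"
    "{instruction}\n"
    "\n"
    "### Response:\n"
)


def build_prompt(instruction: str) -> str:
    """Fill the Alpaca-style template with a single instruction."""
    # Strip surrounding whitespace so the section markers stay on their own lines.
    return ALPACA_TEMPLATE.format(instruction=instruction.strip())


print(build_prompt("Jelaskan mengapa air penting bagi kehidupan manusia."))
```

The returned string can be passed to the tokenizer in place of a hand-written f-string.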

⚠️ Warning: DukunLM is an uncensored model without filters or alignment. Please use it responsibly as it may contain errors, cultural biases, and potentially offensive content. ⚠️

Installation

To use DukunLM, make sure PyTorch is installed and an NVIDIA GPU is available (Google Colab also works). Then install the required dependencies:

pip install -U git+https://github.com/huggingface/transformers.git
pip install -U git+https://github.com/huggingface/peft.git
pip install -U bitsandbytes==0.39.0
pip install -U einops==0.6.1
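
After installing, it can help to confirm that the packages are importable and that a CUDA device is visible before loading the model. The following is a rough sketch using only the standard library plus an optional `torch` import; `missing_packages`, `cuda_available`, and the `REQUIRED` list are hypothetical names chosen for this example, with the package names taken from the install commands above.

```python
import importlib.util

# Import names assumed to correspond to the packages installed above.
REQUIRED = ["torch", "transformers", "peft", "bitsandbytes", "einops"]


def missing_packages(names):
    """Return the entries of `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def cuda_available() -> bool:
    """Report whether PyTorch is installed and can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check also works without PyTorch

    return torch.cuda.is_available()


if __name__ == "__main__":
    print("Missing packages:", missing_packages(REQUIRED) or "none")
    print("CUDA available:", cuda_available())
```

If any package is reported missing, rerun the corresponding `pip install` command.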

How to Use

Stream Output

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TextStreamer

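# Load the sharded base model with 4-bit NF4 quantization.
# load_in_4bit is set only inside BitsAndBytesConfig: newer transformers
# releases raise an error when it is also passed directly to from_pretrained.
model = AutoModelForCausalLM.from_pretrained(
    "nferroukhi/WizardLM-Uncensored-Falcon-7b-sharded-bf16",
    torch_dtype=torch.float32,
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        llm_int8_threshold=6.0,
        llm_int8_has_fp16_weight=False,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
    ),
)
# Attach the DukunLM adapter weights on top of the quantized base model.
model = PeftModel.from_pretrained(model, "azale-ai/DukunLM-Uncensored-7B")
tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-Uncensored-7B")
# Print tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer)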

input_prompt = "Jelaskan mengapa air penting bagi kehidupan manusia."

text = f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{input_prompt}

### Response:
"""

inputs = tokenizer(text, return_tensors="pt").to("cuda")
_ = model.generate(
    inputs=inputs.input_ids,
    streamer=streamer,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_length=2048, use_cache=True,
    temperature=0.7, do_sample=True,
    top_k=4, top_p=0.95
)

No Stream Output

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Load the sharded base model with 4-bit NF4 quantization.
# load_in_4bit is set only inside BitsAndBytesConfig: newer transformers
# releases raise an error when it is also passed directly to from_pretrained.
model = AutoModelForCausalLM.from_pretrained(
    "nferroukhi/WizardLM-Uncensored-Falcon-7b-sharded-bf16",
    torch_dtype=torch.float32,
    trust_remote_code=True,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        llm_int8_threshold=6.0,
        llm_int8_has_fp16_weight=False,
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
    ),
)
# Attach the DukunLM adapter weights on top of the quantized base model.
model = PeftModel.from_pretrained(model, "azale-ai/DukunLM-Uncensored-7B")
tokenizer = AutoTokenizer.from_pretrained("azale-ai/DukunLM-Uncensored-7B")

input_prompt = "Jelaskan mengapa air penting bagi kehidupan manusia."

text = f"""
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{input_prompt}

### Response:
"""

inputs = tokenizer(text, return_tensors="pt").to("cuda")
outputs = model.generate(
    inputs=inputs.input_ids,
    pad_token_id=tokenizer.pad_token_id,
    eos_token_id=tokenizer.eos_token_id,
    max_length=512, use_cache=True,
    temperature=0.7, do_sample=True,
    top_k=4, top_p=0.95
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
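
Because `generate` returns the prompt tokens followed by the continuation, the decoded string above contains the full prompt as well as the answer. One way to keep only the answer is to split on the `### Response:` marker, as in this sketch; `extract_response` and `RESPONSE_MARKER` are hypothetical names for illustration, and the approach assumes the marker appears in the prompt exactly as written in the template above.

```python
# Hypothetical post-processing helper, not part of transformers or peft.
RESPONSE_MARKER = "### Response:"


def extract_response(decoded: str) -> str:
    """Return the text after the first response marker, or the whole text if absent."""
    _, marker, tail = decoded.partition(RESPONSE_MARKER)
    # If the marker is missing, fall back to the full decoded string.
    return tail.strip() if marker else decoded.strip()


example = (
    "Below is an instruction...\n\n"
    "### Instruction:\nJelaskan mengapa air penting.\n\n"
    "### Response:\nAir penting bagi tubuh manusia.\n"
)
print(extract_response(example))
```

With the snippet above, this would be used as `extract_response(tokenizer.decode(outputs[0], skip_special_tokens=True))`.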

Limitations

The base model was trained primarily on English and was only fine-tuned on Indonesian data
Cultural and contextual biases

License

DukunLM is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.

Contributing

We welcome contributions to enhance and improve DukunLM. If you have any suggestions or find any issues, please feel free to open an issue or submit a pull request.

Contact Us

contact@azale.ai