---
library_name: transformers
license: mit
datasets:
  - emhaihsan/quran-indonesia-tafseer-translation
language:
  - id
base_model:
  - Qwen/Qwen2.5-3B-Instruct
---

# Model Card for Fine-Tuned Qwen2.5-3B-Instruct

This is a fine-tuned version of the Qwen2.5-3B-Instruct model. The fine-tuning used the Quran Indonesia Tafseer Translation dataset, which provides Quranic translations and tafsir (exegesis) in Bahasa Indonesia.
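
The training data can be inspected directly from the Hub. A minimal sketch using the `datasets` library follows; the `train` split name is an assumption, so check the dataset card for the actual splits and column names.

```python
from datasets import load_dataset

# Load the fine-tuning dataset from the Hugging Face Hub
# ("train" split name is an assumption -- check the dataset card).
dataset = load_dataset("emhaihsan/quran-indonesia-tafseer-translation", split="train")

print(dataset)      # number of rows and column names
print(dataset[0])   # one translation/tafsir record
```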

## Model Details

### Model Description

This model is designed for NLP tasks involving Quranic text in Bahasa Indonesia, including understanding translations and tafsir.

## Uses

### Direct Use

This model can be used for applications requiring the understanding, summarization, or retrieval of Quranic translations and tafsir in Bahasa Indonesia.

### Downstream Use

It is suitable for further fine-tuning on tasks such as the following (a minimal fine-tuning sketch is shown after the list):

- Quranic text summarization
- Question-answering systems related to Islamic knowledge
- Educational tools for learning Quranic content in Indonesian
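
Below is a minimal LoRA fine-tuning sketch using `peft` and the Hub dataset, not the exact recipe used for this model. The column names `translation` and `tafsir`, the `train` split, and the fp16/GPU settings are assumptions; adapt them to the actual dataset schema and your hardware.

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

base = "Qwen/Qwen2.5-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # padding is needed for batching

model = AutoModelForCausalLM.from_pretrained(base)

# Attach small LoRA adapters instead of updating all 3B parameters.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

ds = load_dataset("emhaihsan/quran-indonesia-tafseer-translation", split="train")

def tokenize(example):
    # Concatenate whichever fields your task needs into one training string;
    # "translation" and "tafsir" are hypothetical column names.
    text = example["translation"] + "\n" + example["tafsir"]
    return tokenizer(text, truncation=True, max_length=512)

tokenized = ds.map(tokenize, remove_columns=ds.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="qwen-quran-lora",
                           per_device_train_batch_size=1,
                           gradient_accumulation_steps=8,
                           num_train_epochs=1,
                           learning_rate=2e-4,
                           fp16=True),  # assumes a CUDA GPU
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```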

## Biases

- The model inherits any biases present in the dataset, which is specific to Islamic translations and tafsir in Bahasa Indonesia.

### Recommendations

- Users should ensure that applications using this model respect cultural and religious sensitivities.
- Results should be verified by domain experts for critical applications.

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("Ellbendls/Qwen-2.5-3b-Quran")
model = AutoModelForCausalLM.from_pretrained("Ellbendls/Qwen-2.5-3b-Quran")

# Move the model to GPU (requires a CUDA-capable device)
model.to("cuda")

# Define the input message
messages = [
    {
        "role": "user",
        "content": "Tafsirkan ayat ini اِهْدِنَا الصِّرَاطَ الْمُسْتَقِيْمَۙ"
    }
]

# Build the prompt with the model's chat template
prompt = tokenizer.apply_chat_template(messages, tokenize=False,
                                       add_generation_prompt=True)

# Tokenize the prompt and move the inputs to GPU
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Generate the response; max_new_tokens bounds only the answer,
# not the combined prompt + answer length
outputs = model.generate(**inputs, max_new_tokens=150,
                         num_return_sequences=1)

# Decode only the newly generated tokens (i.e., skip the prompt)
generated_tokens = outputs[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated_tokens, skip_special_tokens=True))
```
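
The same model can also be served through the `pipeline` helper. The sketch below assumes a recent `transformers` release that lets text-generation pipelines consume chat-style message lists directly, and that GPU index 0 is available; adjust `device` for your setup.

```python
from transformers import pipeline

# Text-generation pipeline wrapping the fine-tuned model (GPU index 0 is an assumption).
generator = pipeline("text-generation",
                     model="Ellbendls/Qwen-2.5-3b-Quran",
                     device=0)

messages = [{"role": "user",
             "content": "Tafsirkan ayat ini اِهْدِنَا الصِّرَاطَ الْمُسْتَقِيْمَۙ"}]

# The pipeline returns the extended conversation; the last message is the reply.
result = generator(messages, max_new_tokens=150)
print(result[0]["generated_text"][-1]["content"])
```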