πŸ‡°πŸ‡­ Khmer Text Summarization Adapters (Qwen)

QLoRA adapters fine-tuned for Khmer text summarization.
Trained using Unsloth for efficient 4-bit fine-tuning.

πŸ“‚ Variants

Variant Subfolder Description
Title-based title_based/ Trained on raw Khmer news dataset
Synthetic synthetic/ Trained on synthetic dataset

πŸš€ Usage

from unsloth import FastLanguageModel import torch

ALPACA_PROMPT = """αžαžΆαž„αž€αŸ’αžšαŸ„αž˜αž“αŸαŸ‡αž‚αžΊαž‡αžΆαžŸαŸαž…αž€αŸ’αžαžΈαžŽαŸ‚αž“αžΆαŸ†αž’αŸ†αž–αžΈαž€αž·αž…αŸ’αž…αž€αžΆαžšαž˜αž½αž™αŸ” αžŸαžΌαž˜αž•αŸ’αžαž›αŸ‹αž…αž˜αŸ’αž›αžΎαž™αž±αŸ’αž™αž”αžΆαž“αžαŸ’αžšαžΉαž˜αžαŸ’αžšαžΌαžœ αž–αŸαž‰αž›αŸαž‰ αž“αž·αž„αž„αžΆαž™αž™αž›αŸ‹αŸ”

Instruction:

αž…αžΌαž›αžŸαž„αŸ’αžαŸαž” αž’αžαŸ’αžαž”αž‘αžαžΆαž„αž€αŸ’αžšαŸ„αž˜αž“αŸαŸ‡

Input:

{}

Response:

"""

model, tokenizer = FastLanguageModel.from_pretrained( model_name="unsloth/Qwen2.5-7B-Instruct-bnb-4bit", max_seq_length=8192, load_in_4bit=True, adapter_name="ChilyRan/qwen-khmer-adapters", adapter_kwargs={"subfolder": "synthetic"} # or "title_based" ) FastLanguageModel.for_inference(model)

text = "αž”αž‰αŸ’αž…αžΌαž›αž’αžαŸ’αžαž”αž‘αžαŸ’αž˜αŸ‚αžšαžšαž”αžŸαŸ‹αž’αŸ’αž“αž€αž“αŸ…αž‘αžΈαž“αŸαŸ‡..." prompt = ALPACA_PROMPT.format(text) inputs = tokenizer(prompt, return_tensors="pt", truncation=True).to("cuda")

with torch.no_grad(): outputs = model.generate( **inputs, max_new_tokens=128, use_cache=True, do_sample=True, temperature=0.3, top_p=0.85 )

decoded = tokenizer.decode(outputs[0], skip_special_tokens=True) summary = decoded.split("### Response:")[-1].strip() print(summary)

βš™οΈ Training Details

Config Value
Base model unsloth/Qwen2.5-7B-Instruct-bnb-4bit
Method QLoRA
Framework Unsloth
Max sequence length 8192
Task Khmer text summarization
Seed 42
Downloads last month
-
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for CADT-IDRI/qwen-khmer-text-sum-adapters

Base model

Qwen/Qwen2.5-7B
Adapter
(51)
this model