
---
base_model: facebook/mbart-large-50
library_name: peft
license: mit
tags:
- generated_from_trainer
model-index:
- name: mbart-large-50_Nepali_News_Summarization_QLoRA_4bit
  results: []
language:
- ne
metrics:
- rouge
pipeline_tag: summarization
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# mbart-large-50_Nepali_News_Summarization_QLoRA_4bit

This model is a QLoRA (4-bit) adapter for [facebook/mbart-large-50](https://huggingface.co/facebook/mbart-large-50), fine-tuned for Nepali news summarization on an unknown dataset.
It achieves the following results on the evaluation set after the final (third) epoch:
- Loss: 1.3739
- Rouge-1 R: 0.3816
- Rouge-1 P: 0.389
- Rouge-1 F: 0.3751
- Rouge-2 R: 0.2142
- Rouge-2 P: 0.2189
- Rouge-2 F: 0.2093
- Rouge-L R: 0.3711
- Rouge-L P: 0.3779
- Rouge-L F: 0.3646
- Gen Len: 14.1121
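R, P and F above are ROUGE recall, precision and F1. The card does not say which ROUGE implementation or tokenization produced these scores. The following is therefore an illustrative sketch, not a reproduction. It scores one prediction/reference pair on whitespace tokens, which keeps Devanagari words intact. `rouge_n`, `rouge_l` and the example headlines are hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def prf(overlap, cand_total, ref_total):
    """Recall, precision and F1 from a match count and the two totals."""
    r = overlap / ref_total if ref_total else 0.0
    p = overlap / cand_total if cand_total else 0.0
    f = 2 * r * p / (r + p) if r + p else 0.0
    return {"r": r, "p": p, "f": f}


def rouge_n(candidate, reference, n):
    """ROUGE-N over whitespace tokens, using clipped n-gram overlap."""
    cand = ngram_counts(candidate.split(), n)
    ref = ngram_counts(reference.split(), n)
    overlap = sum((cand & ref).values())  # each n-gram counted at most min(cand, ref) times
    return prf(overlap, sum(cand.values()), sum(ref.values()))


def rouge_l(candidate, reference):
    """ROUGE-L from the longest common subsequence (LCS) of the two token lists."""
    c, ref = candidate.split(), reference.split()
    dp = [[0] * (len(ref) + 1) for _ in range(len(c) + 1)]
    for i, x in enumerate(c, 1):
        for j, y in enumerate(ref, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return prf(dp[-1][-1], len(c), len(ref))


# Hypothetical headline pair, for illustration only
reference = "विद्युत चोरीमा १६ जना पक्राउ"
candidate = "विद्युत चोरीमा कर्मचारी पक्राउ"
for name, scores in [("rouge-1", rouge_n(candidate, reference, 1)),
                     ("rouge-2", rouge_n(candidate, reference, 2)),
                     ("rouge-l", rouge_l(candidate, reference))]:
    print(name, {k: round(v, 4) for k, v in scores.items()})
```

For this pair, Rouge-1 gives R = 0.6, P = 0.75 and F ≈ 0.6667. The reported numbers are probably per-example averages. If so, the averaged F can fall below both the averaged R and the averaged P, as it does for Rouge-1 here, because F1 is a concave function of R and P.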

## Model description

More information needed

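## Intended uses & limitations

The adapter is meant for summarizing Nepali news articles. The preprocessing code in the usage example truncates targets to 20 tokens, and the average generated length (Gen Len) on the evaluation set is about 14. Expect headline-length summaries rather than multi-sentence abstracts. More information needed on other limitations.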

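## Training and evaluation data

More information needed. The preprocessing function in the usage example expects each record to have a `text` field (the news article) and a `title` field (the target summary).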

## How to use

Loading the base model in 4-bit requires a CUDA GPU and the `bitsandbytes` and `accelerate` packages.

```python
from peft import PeftModel
from transformers import AutoTokenizer, BitsAndBytesConfig, MBartForConditionalGeneration

model_name = "caspro/mbart-large-50_Nepali_News_Summarization_QLoRA_4bit"
prefix = "सारांशमा: "  # Nepali for "In summary: ", prepended to every input article

# Load the base model in 4-bit. device_map places the quantized weights directly
# on the GPU, so no .to(device) call is needed.
base_model = MBartForConditionalGeneration.from_pretrained(
    "facebook/mbart-large-50",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

# Load the tokenizer and the LoRA adapter from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained(model_name)
lora_model = PeftModel.from_pretrained(base_model, model_name)
lora_model.eval()


def preprocess_function(examples):
    """Tokenize a batch for fine-tuning: prefixed `text` as input, `title` as target."""
    inputs = [prefix + doc for doc in examples["text"]]
    model_inputs = tokenizer(inputs, max_length=1024, truncation=True)
    labels = tokenizer(text_target=examples["title"], max_length=20, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs


def generate_summary(text):
    inputs = tokenizer(text, return_tensors="pt", max_length=1024, truncation=True)
    # Move the input tensors to the same device as the model
    inputs = inputs.to(lora_model.device)
    # Pass input_ids and attention_mask as keyword arguments
    summary_ids = lora_model.generate(**inputs, num_beams=4, max_length=128, early_stopping=True)
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)


# Example input: a Nepali news article on electricity theft and leakage (without the prefix)
text = 'नेपालमा उपलब्ध कुल विद्युतको एक चौथाइभन्दा बढी विद्युत प्राविधिक र अप्राविधिक रुपमा चुहावट हुने तथ्यका माझ त्यसलाई नियन्त्रण गर्न प्राधिकरणले एउटा समिति बनाएर अनुसन्धान पनि थालेको थियो। \n\nमहानगरीय प्रहरी अपराध महाशाखाका अनुसार नेपाल विद्युत प्राधिकरणका कतिपय कर्मचारीको टोलीले गाडीमा आवश्यक सबै उपकरणहरु बोकेर ग्राहकको घरघरमा पुगेर विद्युत चोर्नमा सघाउने गरेको भेटिएको हो। \n\nकडाइ\n\nआइतबार प्राधिकरणका वर्तमान र पूर्व कर्मचारी गरी १२ जना तथा चारजना व्यापारीलाई पक्राउ गरेको प्रहरी अपराध महाशाखाले प्रारम्भिक अनुसन्धानमा चोरीको गिरोहको आकार अझ ठूलो हुन सक्ने जनाएको छ। \n\nमहाशाखाका प्रमुख सर्वेन्द्र खनालले भने, "अहिलेसम्मको प्रारम्भिक अनुसन्धानमा केही कलकारखाना, केही उद्योगहरु पनि चोरीमा संलग्न भएको देखिन्छ। यसभन्दा बाहेक बाँकी अरुपनि छन्। तिनीहरुलाई हामी जतिसक्दो चाँडो कानुनको दायरामा ल्याउँछौं।" \n\nनेपाल विद्युत प्राधिकरणका अनुसार नेपालमा हाल उपलब्ध कुल विद्युतको करिब २६ प्रतिशत विद्युत चुहावट हुने गर्दछ। \n\nत्यसमा १२ प्रतिशत प्राविधिक तथा १४ प्रतिशत भन्दा बढी अप्राविधिक हुने गरेको छ। \n\nकमसल खालको विद्युतीय सामाग्री गर्दा हुने चुहावट प्राविधिक हो।\n\nमिटरमा कम खपत देखाउने गरी विद्युत चोरी भए त्यो चाहिँ अप्राविधिक चुहावटमा पर्छ। \n\nप्रयास\n\nचोरी नियन्त्रण गर्न उर्जा मन्त्रालयले छुट्टै समिति पनि गठन गरिएको छ। \n\nचोरी नियन्त्रणको अहिले थालिएको अभियानमा नेपाल विद्युत प्राधिकरण र उर्जा मन्त्रालयले सघाएको पनि प्रहरीले जनाएको छ। \n\nपक्राउ गरिएकाहरुलाई ठगी मुद्दा लगाइएको छ।\n\nतर उनीहरुलाई विद्युत चोरी ऐन जस्ता आवश्यक ऐन अन्तर्गत कारबाही अगाडि बढाउन सक्ने महाशाखा प्रमुख तथा एसएसपी खनाल बताउँछन्। \n\nउनले भने, "यसमा धेरै पक्षको संलग्नता भएकोले एकैथरी कानुनबाट सम्बोधन नहुन सक्छ। तर सबैजना ठगीसँग सम्बन्धित हुने भएकोले यो कानुनले समेट्छ। त्यही अनुरुप नै हामीले अनुसन्धान शुरु गरेका छौं"। \n\nविद्युत चोरी गर्नेमा सर्वसाधारण उद्योगीहरु र व्यापारीहरु रहेको बताइएको छ।\n\nविद्युतको चोरी र चुहावट रोक्ने भनिदैं आएपनि हालसम्म त्यो प्रभावकारी देखिएको छैन। \n\n'

summary = generate_summary(prefix + text)
print(summary)
```

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 5
- eval_batch_size: 5
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
- mixed_precision_training: Native AMP
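The hyperparameters specify a linear learning-rate schedule. The sketch below shows what that means at each epoch boundary. It follows the shape of the Hugging Face Trainer's linear schedule (optional linear warmup, then linear decay to zero). It also takes the 30,573 total optimizer steps from the training results and assumes zero warmup steps, since the card lists none. `linear_lr` is a hypothetical helper, not a library function.

```python
BASE_LR = 5e-4       # learning_rate from the list above
TOTAL_STEPS = 30573  # optimizer steps after 3 epochs (Training results table)


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_steps=0):
    """Learning rate after `step` optimizer steps under a linear schedule.

    Ramps up linearly during warmup, then decays linearly to 0 at `total_steps`.
    warmup_steps=0 is an assumption: the card lists no warmup.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Learning rate at the start of training and at the end of each epoch
for step in [0, 10191, 20382, 30573]:
    print(f"step {step:>5}: lr = {linear_lr(step):.3e}")
```

This prints 5.000e-04, 3.333e-04, 1.667e-04 and 0.000e+00. Under these assumptions the rate drops by a third of its initial value every epoch.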

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge-1 R | Rouge-1 P | Rouge-1 F | Rouge-2 R | Rouge-2 P | Rouge-2 F | Rouge-L R | Rouge-L P | Rouge-L F | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:---------:|:-------:|
| 1.5617 | 1.0 | 10191 | 1.4589 | 0.3552 | 0.3834 | 0.3572 | 0.1923 | 0.21 | 0.1926 | 0.3456 | 0.3728 | 0.3474 | 13.6645 |
| 1.42 | 2.0 | 20382 | 1.3993 | 0.3674 | 0.3858 | 0.3661 | 0.2047 | 0.2159 | 0.2029 | 0.3581 | 0.3758 | 0.3568 | 13.7819 |
| 1.2407 | 3.0 | 30573 | 1.3739 | 0.3816 | 0.389 | 0.3751 | 0.2142 | 0.2189 | 0.2093 | 0.3711 | 0.3779 | 0.3646 | 14.1121 |
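The step counts in the table also bound the size of the training set, which the card does not name. The sketch below assumes `train_batch_size` 5, one optimizer step per batch (no gradient accumulation), a single device and a dataloader that keeps the last partial batch. None of these is stated beyond the batch size. Under those assumptions, 10,191 steps per epoch means ceil(N / 5) = 10191. `train_size_range` is a hypothetical helper.

```python
import math

BATCH_SIZE = 5           # train_batch_size from the hyperparameter list
STEPS_PER_EPOCH = 10191  # "Step" at epoch 1.0
TOTAL_STEPS = 30573      # "Step" at epoch 3.0


def train_size_range(steps_per_epoch, batch_size):
    """Smallest and largest N such that ceil(N / batch_size) == steps_per_epoch.

    Assumes one optimizer step per batch, a single device, and that the
    final partial batch is kept rather than dropped.
    """
    return (steps_per_epoch - 1) * batch_size + 1, steps_per_epoch * batch_size


lo, hi = train_size_range(STEPS_PER_EPOCH, BATCH_SIZE)
assert all(math.ceil(n / BATCH_SIZE) == STEPS_PER_EPOCH for n in range(lo, hi + 1))
assert TOTAL_STEPS == 3 * STEPS_PER_EPOCH  # 3 epochs with the same number of steps each
print(f"training examples: {lo} to {hi}")  # training examples: 50951 to 50955
```

So the training split would hold roughly 51,000 examples. If gradient accumulation or several GPUs were used, the effective batch size, and with it this estimate, would be correspondingly larger.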


### Framework versions

- PEFT 0.11.1
- Transformers 4.42.3
- PyTorch 2.1.2
- Datasets 2.20.0
- Tokenizers 0.19.1
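To check that a local environment matches the versions above, here is a small sketch using the standard-library `importlib.metadata`. `EXPECTED` and `version_mismatches` are hypothetical names. `bitsandbytes` and `accelerate`, which 4-bit loading needs, are left out because the card does not pin them.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions" (PyPI distribution names)
EXPECTED = {
    "peft": "0.11.1",
    "transformers": "4.42.3",
    "torch": "2.1.2",
    "datasets": "2.20.0",
    "tokenizers": "0.19.1",
}


def version_mismatches(expected, lookup=version):
    """Return {package: installed version or None} for packages that differ."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        # Ignore local build tags such as the "+cu121" in "2.1.2+cu121"
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    for name, installed in version_mismatches(EXPECTED).items():
        print(f"{name}: expected {EXPECTED[name]}, found {installed or 'not installed'}")
```

Newer versions may work as well. This only flags where an environment differs from the one the adapter was trained with.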