aragpt2-mega-pos-msa / README.md

alsubari

	---
	language:
	- ar
	pipeline_tag: text-generation
	---
	# Model Card for aragpt2-mega-pos-msa

	## Model Details

	### Model Description

	- Language(s) (NLP): Arabic

	- Fine-tuned from model: [aragpt2-mega](https://huggingface.co/aubmindlab/aragpt2-mega)



	## Uses


	1. Part-of-speech (POS) tagging for Arabic text; it may also be usable for other languages.
	2. Helping students and researchers of Arabic, since it provides a grammatical analysis of the sentence (اعراب الجملة) in context.
	3. Arabic word tokenization.
	4. It may be usable for translating Arabic dialects to Modern Standard Arabic (MSA).




	## Main Labels

	{'حرف جر': 'preposition',
	'اسم': 'noun',
	'اسم علم': 'proper noun',
	'لام التعريف': 'determiner',
	'صفة': 'adjective',
	'ضمير': 'personal pronoun',
	'فعل': 'verb',
	'حرف عطف': 'conjunction',
	'اسم موصول': 'relative pronoun',
	'حرف نفي': 'negative particle',
	'حروف مقطعة': 'quranic initials',
	'اسم اشارة': 'demonstrative pronoun',
	'حرف استئنافية': 'resumption',
	'حرف نصب': 'accusative particle',
	'حرف تسوية': 'equalization particle',
	'حرف حال': 'circumstantial particle',
	'أداة حصر': 'restriction particle',
	'ظرف زمان': 'time adverb',
	'حرف نهي': 'prohibition particle',
	'حرف كاف': 'preventive particle',
	'حرف ابتداء': 'inceptive particle',
	'حرف زائد': 'supplemental particle',
	'حرف استدراك': 'amendment particle',
	'حرف مصدري': 'subordinating conjunction',
	'حرف استفهام': 'interrogative particle',
	'ظرف مكان': 'location adverb',
	'حرف شرط': 'conditional particle',
	'لام التوكيد': 'emphatic',
	'حرف نداء': 'vocative particle',
	'حرف واقع في جواب الشرط': 'result particle',
	'حرف تفصيل': 'explanation particle',
	'أداة استثناء': 'exceptive particle',
	'حرف سببية': 'particle of cause',
	'التوكيد - النون الثقيلة': 'heavy noon emphasis',
	'حرف استقبال': 'future particle',
	'حرف تحقيق': 'particle of certainty',
	'لام التعليل': 'purpose',
	'حرف جواب': 'answer particle',
	'حرف اضراب': 'retraction particle',
	'حرف تحضيض': 'exhortation particle',
	'حرف تفسير': 'particle of interpretation',
	'لام الامر': 'imperative',
	'واو المعية': 'comitative particle',
	'حرف فجاءة': 'surprise particle',
	'حرف ردع': 'aversion particle',
	'اسم فعل أمر': 'imperative verbal noun'}
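
The mapping above can be used to translate the Arabic tags in the model's output into English. The following is a minimal sketch, not part of the model's own code: `MAIN_LABELS` and `to_english` are hypothetical names chosen for illustration, and only a few of the entries listed above are reproduced.

```python
# Hypothetical variable name; a small subset of the label mapping listed above.
MAIN_LABELS = {
    'حرف جر': 'preposition',
    'اسم': 'noun',
    'ضمير': 'personal pronoun',
    'فعل': 'verb',
}


def to_english(tag):
    """Translate an Arabic POS tag; tags missing from the mapping are returned unchanged."""
    tag = tag.strip()
    return MAIN_LABELS.get(tag, tag)


print(to_english('فعل'))
print(to_english(' حرف جر '))
```

Returning unknown tags unchanged keeps the helper usable if the model emits a label that is not in the list.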


	## How to Get Started with the Model

	```python
	import re

	from arabert.aragpt2.grover.modeling_gpt2 import GPT2LMHeadModel
	from pyarabic.araby import strip_diacritics, strip_tatweel
	from transformers import GPT2Tokenizer, pipeline

	model_name = 'alsubari/aragpt2-mega-pos-msa'

	tokenizer = GPT2Tokenizer.from_pretrained(model_name)
	model = GPT2LMHeadModel.from_pretrained(model_name).to("cuda")

	generator = pipeline("text-generation", model=model, tokenizer=tokenizer, device=0)


	def generate(text):
	    """Return the POS analysis generated for `text`, or "None" if no answer is found."""
	    # Wrap the input in the instruction/answer prompt format.
	    prompt = f'<|startoftext|>Instruction: {text}<|pad|>Answer:'
	    pred_text = generator(
	        prompt,
	        pad_token_id=tokenizer.eos_token_id,
	        num_beams=20,
	        max_length=256,
	        do_sample=False,  # beam search without sampling, so top_p/top_k are omitted
	        repetition_penalty=3.0,
	        no_repeat_ngram_size=3,
	    )[0]['generated_text']
	    # Keep only the text that follows the "Answer:" marker.
	    answers = re.findall("Answer:(.*)", pred_text, re.S)
	    return answers[-1] if answers else "None"


	# Strip diacritics and tatweel before tagging.
	text = 'تعلَّمْ من أخطائِكَ'
	generate(strip_tatweel(strip_diacritics(text)))
	#' تعلم ( تعلم : فعل ) من ( من : حرف جر ) أخطائك ( اخطاء : اسم ، ك : ضمير )'
	```
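
The generated answer is a plain string, so downstream code will usually want it in a structured form. The sketch below parses the sample output shown above into `(word, [(segment, tag), ...])` tuples. It assumes every answer follows the `word ( segment : tag ، segment : tag )` layout of that single example, which this model card does not guarantee. `parse_answer` and `TOKEN_RE` are hypothetical names introduced for illustration.

```python
import re

# Matches "word ( analysis )" pairs; assumes the layout of the sample output above.
TOKEN_RE = re.compile(r'(\S+)\s*\(([^()]*)\)')


def parse_answer(answer):
    """Parse a generated answer into a list of (word, [(segment, tag), ...]) tuples."""
    parsed = []
    for word, analysis in TOKEN_RE.findall(answer):
        segments = []
        # Segments of one word are separated by the Arabic comma.
        for part in analysis.split('،'):
            if ':' not in part:
                continue  # skip anything that is not a "segment : tag" pair
            segment, tag = part.split(':', 1)
            segments.append((segment.strip(), tag.strip()))
        parsed.append((word, segments))
    return parsed


sample = ' تعلم ( تعلم : فعل ) من ( من : حرف جر ) أخطائك ( اخطاء : اسم ، ك : ضمير )'
for word, segments in parse_answer(sample):
    print(word, segments)
```

Parts that do not match the expected layout are skipped rather than raising, so malformed generations yield a shorter result instead of an error.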


	### Results

	| Epoch | Training Loss | Validation Loss |
	|-------|---------------|-----------------|
	| 1     | 0.108500      | 0.082612        |