---
tags:
- generated_from_trainer
language: ar
datasets:
- LABR
widget:
- text: "كان الكاتب ممكن"
- text: "كتاب ممتاز ولكن"
- text: "رواية درامية جدا والافكار بسيطة"
model-index:
- name: argpt2-goodreads
  results: []
---

# argpt2-goodreads

This model is a fine-tuned version of [gpt2-medium](https://huggingface.co/gpt2-medium) on the Goodreads LABR dataset. It achieves the following results on the evaluation set:
- Loss: 1.4389

## Model description

Generates Arabic review sentences, either positive or negative examples, based on the Goodreads corpus.

## Intended uses & limitations

The model is fine-tuned on Arabic only, with the aim of generating sentences such as reviews; to do the same for other languages, you need to fine-tune it on your own data. Any harmful content generated by GPT-2 should not be used anywhere.

## Training and evaluation data

Training and validation were done on the Goodreads LABR dataset, split 80% for training and 20% for testing.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("mofawzy/argpt2-goodreads")
model = AutoModelForCausalLM.from_pretrained("mofawzy/argpt2-goodreads")
```

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- distributed_type: tpu
- num_devices: 8
- total_train_batch_size: 128
- total_eval_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 20.0

### Training results

- train_loss = 1.474

### Evaluation results

- eval_loss = 1.4389

### Train metrics

- epoch = 20.0
- train_loss = 1.474
- train_runtime = 2:18:14.51
- train_samples = 108110
- train_samples_per_second = 260.678
- train_steps_per_second = 2.037

### Eval metrics

- epoch = 20.0
- eval_loss = 1.4389
- eval_runtime = 0:04:37.01
- eval_samples = 27329
- eval_samples_per_second = 98.655
- eval_steps_per_second = 0.773
- perplexity = 4.2162 (i.e. exp(eval_loss) = exp(1.4389) ≈ 4.2162)

### Framework versions

- Transformers 4.13.0.dev0
- Pytorch 1.10.0+cu102
- Datasets 1.16.1
- Tokenizers 0.10.3
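
### Generation example

As a quick illustration of the usage snippet above, the following sketch generates a review continuation from one of the widget prompts. The sampling settings (`max_length`, `top_k`, `top_p`) are illustrative assumptions, not values recommended by the model author.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("mofawzy/argpt2-goodreads")
model = AutoModelForCausalLM.from_pretrained("mofawzy/argpt2-goodreads")

# Encode one of the Arabic widget prompts above as the seed text.
prompt = "كتاب ممتاز ولكن"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Sample a continuation; these decoding parameters are assumptions
# chosen for illustration, not author-recommended settings.
output = model.generate(
    input_ids,
    max_length=60,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    num_return_sequences=1,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```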
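
### Training setup sketch

For reference, the hyperparameters listed above map onto `transformers.TrainingArguments` roughly as follows. This is a hypothetical reconstruction, not the author's actual training script, which is not included in this card.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the listed hyperparameters; with 8 TPU
# devices, a per-device batch size of 16 yields the total batch size of 128.
training_args = TrainingArguments(
    output_dir="argpt2-goodreads",  # assumed output path
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    num_train_epochs=20.0,
    lr_scheduler_type="linear",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    tpu_num_cores=8,
)
```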