# Arabic text classification using deep learning (ArabicT5)

# Our experiment

  • The category mapping: category_mapping = { 'Politics':1, 'Finance':2, 'Medical':3, 'Sports':4, 'Culture':5, 'Tech':6, 'Religion':7 }

  • Training parameters

    Training batch size 8
    Evaluation batch size 8
    Learning rate 1e-4
    Max input length 200
    Max target length 3
    Number of workers 4
    Epochs 2
  • Results

    Validation Loss 0.0479
    Accuracy 96.49%
    BLEU 96.49%
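
The card does not say how the accuracy figure was computed. For a text-to-text classifier whose target is a short label string, one common choice is exact-match accuracy over the decoded outputs. The sketch below illustrates that idea under this assumption; `exact_match_accuracy` is a hypothetical helper, not part of the released model or of `transformers`.

```python
from typing import Sequence


def exact_match_accuracy(predictions: Sequence[str], targets: Sequence[str]) -> float:
    """Fraction of decoded predictions that equal their target label string.

    Assumption: labels are compared as stripped strings such as '5'.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    correct = sum(p.strip() == t.strip() for p, t in zip(predictions, targets))
    return correct / len(targets)


# Illustrative values only, not taken from the experiment above.
print(exact_match_accuracy(["5", "1", "3", "2"], ["5", "1", "4", "2"]))
```

Comparing strings rather than integers keeps malformed generations (for example an empty string) countable as ordinary misses instead of raising a parsing error.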

# SANAD: Single-label Arabic News Articles Dataset for automatic text categorization

# Arabic text classification using deep learning models

# Example usage

from transformers import T5ForConditionalGeneration, T5Tokenizer

model_name="Hezam/ArabicT5_Classification"
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)

text = "الزين فيك القناه الاولي المغربيه الزين فيك القناه الاولي المغربيه اخبارنا المغربيه  متابعه تفاجا زوار موقع القناه الاولي المغربي"
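# Tokenize the input, truncating or padding to the 200-token length used in training
tokens = tokenizer(text, max_length=200,
                   truncation=True,
                   padding="max_length",
                   return_tensors="pt")

# Generate the class id as text (at most 3 tokens, matching the target length)
output = model.generate(tokens['input_ids'],
                        max_length=3,
                        length_penalty=10)

# Decode the generated token ids back to strings
output = [tokenizer.decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=True) for ids in output]
output
['5']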
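
The model returns the class id as a string, so a caller usually wants the category name. A minimal sketch of that lookup follows, using the `category_mapping` given in the experiment section. The `decode_label` helper and the `"Unknown"` fallback are illustrative assumptions, not part of the model's API.

```python
# Mapping copied from the experiment section of this card.
category_mapping = {
    'Politics': 1, 'Finance': 2, 'Medical': 3, 'Sports': 4,
    'Culture': 5, 'Tech': 6, 'Religion': 7,
}

# Invert it so that a generated string such as '5' maps back to a name.
id_to_category = {str(idx): name for name, idx in category_mapping.items()}


def decode_label(generated: str) -> str:
    """Hypothetical helper: map generated text to a category name.

    Returns 'Unknown' when the text is not one of the expected ids.
    """
    return id_to_category.get(generated.strip(), "Unknown")


print(decode_label('5'))  # Culture
```

Applied to the example above, `[decode_label(o) for o in output]` would turn `['5']` into `['Culture']`.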