---
datasets:
- dru-ac/ArTopicDS
- dru-ac/ArTopicDS-Books
metrics:
- accuracy
- precision
- recall
pipeline_tag: text-classification
---

`ArGTClass` is a `bloomz`-based classification model, fine-tuned to categorize Arabic text into
fourteen subjects: Religion, Finance and Economics, Politics, Medical, Culture, Sports,
Science and Technology, Anthropology and Sociology, Art and Literature, Education, History,
Language and Linguistics, Law, and Philosophy.


For more details, check out our [paper](here)

The fine-tuning code is available in the following notebook: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/106oPnGhe8B_BCgV6LnJbvVZNv4mCu9Zv?usp=sharing)


### Full classification example (CPU)

```python
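from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and the fine-tuned classifier from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("dru-ac/ArGTClass")
model = AutoModelForSequenceClassification.from_pretrained("dru-ac/ArGTClass")

# Sample input (roughly: "Israel bombed Al-Ma'madani Hospital in Gaza City,
# leaving hundreds of martyrs and wounded.")
text = " .قصفت إسرائيل مستشفى المعمداني في مدينة غزة، والذي خلف مئات الشهداء والجرحى"

# Tokenize, run a forward pass, and map the highest-scoring logit to its label
inputs = tokenizer(text, return_tensors="pt")
outputs = model(**inputs)
ind = outputs.logits.argmax(dim=-1)[0]
predicted_class = model.config.id2label[ind.item()]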
```
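
The example above stops at the single highest-scoring label. If ranked alternatives with probabilities are useful, the logits can be passed through a softmax first. The following is a minimal sketch in plain Python; `top_k_labels`, `example_id2label`, and `example_logits` are hypothetical names and values used only for illustration, and the label order shown is not the model's real mapping.

```python
import math

def top_k_labels(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs for one example."""
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sort class indices by descending probability and keep the first k.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in ranked]

# Hypothetical values for illustration; the real mapping is model.config.id2label.
example_id2label = {0: "Politics", 1: "Sports", 2: "Medical"}
example_logits = [2.0, 0.5, 1.0]
top = top_k_labels(example_logits, example_id2label, k=2)
```

With the real model, one would pass `outputs.logits[0].tolist()` and `model.config.id2label` instead of the illustrative values.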

### Full classification example (GPU)

```python
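from transformers import AutoTokenizer, AutoModelForSequenceClassification

# device_map="auto" places the model on the available GPU(s); it requires the `accelerate` package
tokenizer = AutoTokenizer.from_pretrained("dru-ac/ArGTClass")
model = AutoModelForSequenceClassification.from_pretrained("dru-ac/ArGTClass", device_map="auto")

# Sample input (roughly: "Israel bombed Al-Ma'madani Hospital in Gaza City,
# leaving hundreds of martyrs and wounded.")
text = " .قصفت إسرائيل مستشفى المعمداني في مدينة غزة، والذي خلف مئات الشهداء والجرحى"

# Move the inputs to the same device as the model before the forward pass
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model(**inputs)
ind = outputs.logits.argmax(dim=-1)[0]
predicted_class = model.config.id2label[ind.item()]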
```


### Pipeline example (CPU & GPU)

```python
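from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("dru-ac/ArGTClass")
model = AutoModelForSequenceClassification.from_pretrained("dru-ac/ArGTClass", device_map="auto")

# Wrap the model and tokenizer in a text-classification pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Sample input (roughly: "Israel bombed Al-Ma'madani Hospital in Gaza City,
# leaving hundreds of martyrs and wounded.")
text = " .قصفت إسرائيل مستشفى المعمداني في مدينة غزة، والذي خلف مئات الشهداء والجرحى"

# Returns a list of dicts, each with a "label" and a "score" key
classifier(text)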
```
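
A `text-classification` pipeline returns a list of dictionaries with `label` and `score` keys. As a small sketch of how that output might be consumed, the helper below keeps the top label only when its score clears a threshold. The function name `confident_label`, the `0.5` threshold, the `"uncertain"` fallback, and the sample result are all assumptions made for illustration, not part of the model or the library.

```python
def confident_label(results, threshold=0.5, fallback="uncertain"):
    """Return the best label if its score meets the threshold, else a fallback."""
    # Pick the highest-scoring entry from the pipeline-style output.
    top = max(results, key=lambda r: r["score"])
    return top["label"] if top["score"] >= threshold else fallback

# Hypothetical pipeline output, shown only to illustrate the expected shape.
sample_results = [{"label": "Politics", "score": 0.93}]
chosen = confident_label(sample_results)
```

In practice, `sample_results` would be replaced by the value returned from `classifier(text)`.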