---
license: afl-3.0
datasets:
- iqballx/indonesian_news_datasets
language:
- id
metrics:
- accuracy
library_name: transformers
---
# Model Card for Indonesian News Classification Model

## Model Description
This model is fine-tuned to classify Indonesian news articles, drawn from iqballx/indonesian_news_datasets, into predefined categories. The training set was built by translating the Indonesian articles into English with a Neural Machine Translation (NMT) system and then labeling the translations with niksmer/ManiBERT, a model trained to classify political texts. The resulting dataset pairs each Indonesian text with its English translation and the assigned category.

## Training Data
The training data consists of articles from iqballx/indonesian_news_datasets that were translated into English and then labeled with the niksmer/ManiBERT model. The labels span the 55 categories (IDs 0 to 54) listed under Label Mapping.
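
The translate-then-label procedure described above can be sketched as follows. This is a minimal illustration, not the author's actual pipeline: the function `build_labeled_corpus`, the row keys, and the two stand-in callables are hypothetical, and this card does not name the NMT system that was used.

```python
from typing import Callable, Dict, Iterable, List


def build_labeled_corpus(
    indonesian_texts: Iterable[str],
    translate: Callable[[str], str],
    label_english: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Translate each Indonesian article, then label the English translation."""
    rows = []
    for text_id in indonesian_texts:
        text_en = translate(text_id)    # NMT step (system not named in this card)
        label = label_english(text_en)  # e.g. a wrapper around niksmer/ManiBERT
        rows.append({"text_id": text_id, "text_en": text_en, "label": label})
    return rows


# Stand-in callables so the sketch runs offline; they are NOT the real models.
def fake_translate(text: str) -> str:
    return {"Harga beras naik.": "Rice prices rise."}.get(text, text)


def fake_label(text: str) -> str:
    return "Agriculture and Farmers" if "Rice" in text else "None"


corpus = build_labeled_corpus(["Harga beras naik."], fake_translate, fake_label)
print(corpus[0])
```

Passing the translator and labeler in as callables keeps the sketch independent of any particular translation model.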

## Evaluation
The model was evaluated on a held-out test set using accuracy. Accuracy after each training epoch was 61.71% (epoch 1), 64.62% (epoch 2), 65.64% (epoch 3), and 65.27% (epoch 4). Accuracy rose through the third epoch and dipped slightly in the fourth, which suggests the model had largely converged at roughly 65%, meaning about two out of three test articles were assigned the reference category.
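
For reference, accuracy is the fraction of examples whose predicted label ID equals the reference label ID. The snippet below is a small sketch of that computation and of picking the best epoch from the scores reported above. The helper name `accuracy` and the toy label lists are illustrative assumptions, not part of the original training code.

```python
from typing import Sequence


def accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Fraction of positions where the predicted ID equals the gold ID."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Per-epoch accuracies (in percent) as reported in this section.
epoch_accuracy = {1: 61.71, 2: 64.62, 3: 65.64, 4: 65.27}
best_epoch = max(epoch_accuracy, key=epoch_accuracy.get)
print(best_epoch, epoch_accuracy[best_epoch])  # 3 65.64

# Toy example: three of four predictions match the gold labels.
print(accuracy([0, 11, 43, 18], [0, 11, 43, 32]))  # 0.75
```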

## Limitations and Bias
As with any machine learning model, it is important to recognize potential limitations and biases. The translation step could introduce errors or nuances that affect the labeling accuracy. Additionally, the ManiBERT model used for initial labeling was trained on political texts, which may limit its effectiveness on non-political news or introduce political bias.

## How to Use the Model
To classify an Indonesian news article, you can use the script below: 

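```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "YagiASAFAS/indonesia-news-classification-bert"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()  # disable dropout for inference

# Replace the placeholder with the Indonesian text to classify.
text = "[Indonesian Text]"
# truncation=True cuts inputs longer than the model's maximum length.
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# No gradients are needed for inference.
with torch.no_grad():
    outputs = model(**inputs)

# Convert logits to probabilities and pick the most likely class.
predictions = torch.nn.functional.softmax(outputs.logits, dim=1)
predicted_class_index = torch.argmax(predictions, dim=1).item()

# Map the class index to its human-readable label.
id2label = model.config.id2label
predicted_category = id2label[predicted_class_index]

print("Predicted Category:", predicted_category)
```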
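
With 55 categories, it can be useful to inspect the few most probable labels instead of only the top one. The following is a hedged sketch of that idea in plain Python. The helper names `softmax` and `top_k_labels` and the toy logits are assumptions for illustration. With the real model, the logits for one input could be obtained as `outputs.logits[0].tolist()` and the mapping from `model.config.id2label`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a flat list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(
    logits: Sequence[float], id2label: Dict[int, str], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in ranked]


# Toy data: three labels from the mapping table and made-up logits.
toy_id2label = {
    0: "Agriculture and Farmers",
    1: "Anti-Growth Economy and Sustainability",
    2: "Anti-Imperialism",
}
toy_logits = [2.0, 0.5, 1.0]

ranked = top_k_labels(toy_logits, toy_id2label, k=2)
for label, prob in ranked:
    print(f"{label}: {prob:.3f}")
```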

## Label Mapping
| Label ID | Label Text |
|----------|------------|
| 0        | Agriculture and Farmers |
| 1        | Anti-Growth Economy and Sustainability |
| 2        | Anti-Imperialism |
| 3        | Centralisation: Positive |
| 4        | Civic Mindedness: Positive |
| 5        | Constitutionalism: Negative |
| 6        | Constitutionalism: Positive |
| 7        | Controlled Economy |
| 8        | Corporatism/ Mixed Economy |
| 9        | Culture: Positive |
| 10       | Decentralisation: Positive |
| 11       | Democracy |
| 12       | Economic Goals |
| 13       | Economic Growth: Positive |
| 14       | Economic Orthodoxy |
| 15       | Economic Planning |
| 16       | Education Expansion |
| 17       | Education Limitation |
| 18       | Environmental Protection |
| 19       | Equality: Positive |
| 20       | European Community/Union or Latin America Integration: Negative |
| 21       | European Community/Union or Latin America Integration: Positive |
| 22       | Foreign Special Relationships: Negative |
| 23       | Foreign Special Relationships: Positive |
| 24       | Free Market Economy |
| 25       | Freedom and Human Rights |
| 26       | Governmental and Administrative Efficiency |
| 27       | Incentives: Positive |
| 28       | Internationalism: Negative |
| 29       | Internationalism: Positive |
| 30       | Labour Groups: Negative |
| 31       | Labour Groups: Positive |
| 32       | Law and Order |
| 33       | Market Regulation |
| 34       | Marxist Analysis: Positive |
| 35       | Military: Negative |
| 36       | Military: Positive |
| 37       | Multiculturalism: Negative |
| 38       | Multiculturalism: Positive |
| 39       | National Way of Life: Negative |
| 40       | National Way of Life: Positive |
| 41       | Nationalisation |
| 42       | Non-economic Demographic Groups |
| 43       | None |
| 44       | Peace |
| 45       | Political Authority |
| 46       | Political Corruption |
| 47       | Protectionism: Negative |
| 48       | Protectionism: Positive |
| 49       | Technology and Infrastructure: Positive |
| 50       | Traditional Morality: Negative |
| 51       | Traditional Morality: Positive |
| 52       | Underprivileged Minority Groups |
| 53       | Welfare State Expansion |
| 54       | Welfare State Limitation |
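
Several labels in the table carry a `: Positive` or `: Negative` suffix. When analysing predictions, it may help to separate the topic from that stance. The helper below is a small illustrative sketch; the name `split_label` is hypothetical and is not part of the model or of `transformers`.

```python
from typing import Optional, Tuple

STANCES = {"Positive", "Negative"}


def split_label(label: str) -> Tuple[str, Optional[str]]:
    """Split 'Topic: Stance' into (topic, stance); stance is None if absent."""
    topic, sep, suffix = label.rpartition(": ")
    if sep and suffix in STANCES:
        return topic, suffix
    return label, None


print(split_label("Military: Negative"))  # ('Military', 'Negative')
print(split_label("Democracy"))           # ('Democracy', None)
```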