premsa committed b243158 (1 parent: aef6040): Update README.md
---
license: apache-2.0
---
model base: https://huggingface.co/google-bert/bert-base-multilingual-uncased

dataset: https://github.com/ramybaly/Article-Bias-Prediction
training parameters:
- batch_size: 100
- epochs: 5
- dropout: 0.05
- max_length: 512
- learning_rate: 3e-5
- warmup_steps: 100
- random_state: 239
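
The learning_rate and warmup_steps above are usually combined into a linear warmup-then-decay schedule, the standard choice for BERT fine-tuning (e.g. transformers' `get_linear_schedule_with_warmup`). The card does not state the exact scheduler, so this is a minimal sketch under that assumption:

```python
def lr_at_step(step: int, total_steps: int,
               base_lr: float = 3e-5, warmup_steps: int = 100) -> float:
    """Linear warmup to base_lr over warmup_steps, then linear decay to 0.

    Assumed schedule (not confirmed by the card); mirrors the behavior of
    transformers' get_linear_schedule_with_warmup.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))
```

For example, halfway through warmup (step 50) the rate is 1.5e-5, and it peaks at 3e-5 at step 100 before decaying.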

training methodology:
- sanitize the dataset following a specific rule-set; use the random split provided with the dataset
- train on the train split and evaluate on the validation split in each epoch
- evaluate the test split only with the model that performed best on validation loss
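
The checkpoint-selection step can be sketched as follows; `val_losses` holds one validation loss per epoch, and only the 0.3003 value comes from this card (the rest are illustrative):

```python
def select_best_epoch(val_losses: list[float]) -> int:
    """Return the 1-based epoch whose validation loss is lowest;
    only that epoch's checkpoint is evaluated on the test split."""
    best_idx = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_idx + 1

# Illustrative per-epoch losses; only epoch 2's 0.3003 is reported in the card.
val_losses = [0.35, 0.3003, 0.31, 0.33, 0.34]
print(select_best_epoch(val_losses))  # 2
```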

result summary:
- across the five training epochs, the second-epoch model achieved the lowest validation loss, 0.3003
- on the test split, the second-epoch model achieved an F1 score of 0.8842
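
The card does not state which F1 averaging was used; for reference, a minimal macro-averaged F1 (one common choice for multi-class bias labels, equivalent to scikit-learn's `f1_score(..., average="macro")`):

```python
def macro_f1(y_true: list, y_pred: list) -> float:
    """Unweighted mean of per-class F1 scores (macro average)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```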

usage:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model = AutoModelForSequenceClassification.from_pretrained("premsa/political-bias-prediction-allsides-mBERT")
tokenizer = AutoTokenizer.from_pretrained("premsa/political-bias-prediction-allsides-mBERT")
nlp = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(nlp("die massen werden von den medien kontrolliert."))  # German: "the masses are controlled by the media."
```