d1mitriz committed on
Commit 37958ac (1 parent: 13361ad)

Improved model card

Files changed (1)
  1. README.md +12 -3
README.md CHANGED
@@ -9,7 +9,7 @@ datasets:
  metrics:
  - accuracy
  model-index:
- - name: greek-longformer-base-4096-uncased
+ - name: greek-longformer-base-4096
  results:
  - task:
  name: Masked Language Modeling
@@ -27,14 +27,23 @@ model-index:
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
 
- # greek-longformer-base-4096-uncased
+ # Greek Longformer
 
- This model is a Longformer model version based on [allenai/longformer-base-4096](https://huggingface.co/allenai/longformer-base-4096) trained on the dataset/wiki_oscar_combined_normalized_uncased dataset from scratch.
+ A Greek version of the Longformer Language Model.
+
+ This model is a (from scratch) Greek Longformer model based on the configuration of [allenai/longformer-base-4096](https://huggingface.co/allenai/longformer-base-4096), and trained on the combined datasets from the [Greek Wikipedia](https://huggingface.co/datasets/wikipedia) and the Greek part of [OSCAR](https://huggingface.co/datasets/oscar-corpus/OSCAR-2301).
  It achieves the following results on the evaluation set:
 
  - Loss: 1.1080
  - Accuracy: 0.7765
 
+ ## Pre-training corpora
+
+ The pre-training corpora of `greek-longformer-base-4096` include:
+
+ - The Greek part of [Wikipedia](https://el.wikipedia.org/wiki/Βικιπαίδεια:Αντίγραφα_της_βάσης_δεδομένων),
+ - The Greek part of [OSCAR](https://traces1.inria.fr/oscar/), a cleansed version of [Common Crawl](https://commoncrawl.org).
+
  ## Model description
 
  More information needed
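
Reviewer note on the evaluation figures in the card: the reported loss can be turned into a perplexity, which is often easier to compare across masked language models. The sketch below assumes the loss is the mean cross-entropy in nats over masked tokens, as is typical for a Trainer-generated card; the commit does not state this, and the variable names are illustrative only.

```python
import math

# Values copied from the model card's evaluation results.
eval_loss = 1.1080
eval_accuracy = 0.7765

# Assumption: eval_loss is a mean cross-entropy measured in nats
# (natural logarithm), so perplexity is simply exp(loss).
perplexity = math.exp(eval_loss)

# Share of masked tokens that were not predicted correctly.
error_rate = 1.0 - eval_accuracy

print(f"masked-token perplexity: {perplexity:.2f}")
print(f"masked-token error rate: {error_rate:.4f}")
```

If the loss were instead measured in bits, the base of the exponent would be 2 rather than e, so the assumption above should be checked against the training script before quoting the perplexity.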