usmiva committed on
Commit 9b014c6
1 Parent(s): 8438a89

Update README.md

Files changed (1)
  1. README.md +7 -4
README.md CHANGED
@@ -9,10 +9,13 @@ pipeline_tag: text-generation
 
 <!-- Provide a quick summary of what the model is/does. -->
 
-This model is pre-trained with the causal language modelling objective on the web scraped dataset provided by Identrics [bg_web](https://huggingface.co/datasets/identrics/bg_web).
+This model is pre-trained with the causal language modelling objective on a private web scraped dataset created at the Bulgarian Academy of Sciences under the [ClaDa-BG Project](https://clada-bg.eu/en/).
+
+The dataset is cleaned and balanced with a specialized procedure to avoid cultural, political, racial and other biases. The procedure is described in the paper dedicated to this model- coming soon!
 
 ## Model Details
 
+
 ### Model Description
 
 <!-- Provide a longer summary of what this model is. -->
@@ -20,9 +23,9 @@ This model is pre-trained with the causal language modelling objective on the we
 
 
 - **Developed by:** [Iva Marinova](https://huggingface.co/usmiva)
-- **Shared by [optional]:** [More Information Needed]
+- **Shared by [optional]:** ClaDa-BG, : National Interdisciplinary Research E-Infrastructure for Bulgarian Language and Cultural Heritage Resources and Technologies integrated within European CLARIN and DARIAH infrastructures
 - **Model type:** GPT-2
-- **Language(s) (NLP):** Pytorch
+- **Language(s) (NLP):** Bulgarian
 - **License:** [More Information Needed]
 - **Finetuned from model [optional]:** [More Information Needed]
 
@@ -31,7 +34,7 @@ This model is pre-trained with the causal language modelling objective on the we
 <!-- Provide the basic links for the model. -->
 
 - **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
+- **Paper [optional]:** Marinova et. al. 2023 - link to be added
 - **Demo [optional]:** [More Information Needed]
 
 ## Uses
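The updated README states that this GPT-2 model is pre-trained with the causal language modelling objective. As an illustration only (this is not the author's training code, and the function name `causal_lm_loss` is hypothetical), that objective is the mean next-token cross-entropy: the vocabulary scores produced at position t are evaluated against the actual token at position t + 1, and the final position, which has no successor, contributes nothing. The sketch assumes plain Python lists in place of framework tensors.

```python
import math
from typing import Sequence


def causal_lm_loss(logits: Sequence[Sequence[float]], token_ids: Sequence[int]) -> float:
    """Hypothetical helper: mean next-token cross-entropy for one sequence.

    logits[t] holds unnormalized scores over the vocabulary produced at
    position t; under the causal objective they are trained to predict
    token_ids[t + 1].
    """
    if len(logits) != len(token_ids):
        raise ValueError("need one row of logits per input token")
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to form a prediction target")

    total = 0.0
    # Shift by one: position t predicts token t + 1; the last position is skipped.
    for t in range(len(token_ids) - 1):
        row = logits[t]
        target = token_ids[t + 1]
        # Numerically stable log-sum-exp (log of the softmax normalizer).
        m = max(row)
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        # Negative log-probability of the true next token.
        total += log_z - row[target]
    return total / (len(token_ids) - 1)


if __name__ == "__main__":
    # Uniform (all-zero) scores over a 4-token vocabulary.
    print(causal_lm_loss([[0.0] * 4] * 3, [0, 1, 2]))
```

With all-zero logits every prediction is uniform, so the loss equals ln 4 ≈ 1.386; in real training the same shifted cross-entropy is computed in batches over tensors rather than with Python loops.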