ClassCat committed on
Commit
52fb9af
1 Parent(s): a4d286a

Create README.md

Files changed (1)
  1. README.md +40 -0
README.md ADDED
@@ -0,0 +1,40 @@
+ ---
+ language: ca
+ license: cc-by-sa-4.0
+ datasets:
+ - cc100
+ - oscar
+ - wikipedia
+ widget:
+ - text: "M'agrada el clima i el menjar"
+ - text: "Ell està una mica"
+ ---
+
+ ## GPT2 Catalan small model Version 2 (Uncased)
+
+ ### Prerequisites
+
+ transformers==4.19.2
+
+ ### Model architecture
+
+ This model uses the GPT2 base settings, but with half the number of layers.
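
To give a rough sense of what halving the layer count means for model size, the sketch below counts parameters for a GPT-2 style network. It is only an estimate under stated assumptions: the hidden size (768), context length (1024), and tied input/output embeddings are the standard GPT-2 base values and are not confirmed by this README, and the 50,000-entry vocabulary is taken from the tokenizer section. `gpt2_param_count` is a hypothetical helper, not part of `transformers`.

```python
def gpt2_param_count(n_layer, n_embd=768, n_positions=1024, vocab_size=50257):
    """Estimate GPT-2 parameter count, assuming tied input/output embeddings."""
    # Token embeddings plus learned position embeddings.
    embeddings = vocab_size * n_embd + n_positions * n_embd
    # Per transformer block (d = n_embd):
    #   two LayerNorms:        4*d
    #   attention c_attn:      3*d*d + 3*d
    #   attention c_proj:      d*d + d
    #   MLP c_fc:              4*d*d + 4*d
    #   MLP c_proj:            4*d*d + d
    per_block = 12 * n_embd**2 + 13 * n_embd
    # Final LayerNorm after the last block.
    final_layer_norm = 2 * n_embd
    return embeddings + n_layer * per_block + final_layer_norm


# GPT-2 base: 12 layers, original 50,257-token vocabulary.
base = gpt2_param_count(n_layer=12)
# Assumed shape of this model: 6 layers, 50,000-token vocabulary.
half = gpt2_param_count(n_layer=6, vocab_size=50000)

print(f"GPT-2 base:      {base:,}")  # 124,439,808
print(f"6-layer variant: {half:,}")  # 81,715,200
```

Under these assumptions the embedding tables account for almost half of the smaller model, so halving the layers cuts the total by roughly a third rather than a half.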
+
+ ### Tokenizer
+
+ Uses a BPE tokenizer with a vocabulary size of 50,000.
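
For readers unfamiliar with BPE, the toy sketch below shows the core idea: repeatedly merge the most frequent adjacent symbol pair into a new symbol. It is a character-level illustration only, not the tokenizer shipped with this model (the README does not say whether that one is byte-level, nor how it was trained). `learn_bpe_merges` is a hypothetical helper written for this example.

```python
from collections import Counter


def learn_bpe_merges(words, num_merges):
    """Learn up to `num_merges` BPE merge rules from a list of words."""
    # Each distinct word is a tuple of symbols, weighted by its frequency.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs in the corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair wins; ties are broken lexicographically
        # so that the result is deterministic.
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        # Rewrite every word, replacing the chosen pair with one merged symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


# Tiny corpus built from one of the widget sentences.
print(learn_bpe_merges("m'agrada el clima i el menjar".split(), num_merges=5))
```

In a scheme like this, the final vocabulary is roughly the base alphabet plus one entry per learned merge (plus any special tokens), so a 50,000-entry vocabulary corresponds to tens of thousands of merges.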
+
+ ### Training Data
+
+ * [wiki40b/ca](https://www.tensorflow.org/datasets/catalog/wiki40b#wiki40bca) (Catalan Wikipedia)
+ * Subset of [oscar](https://huggingface.co/datasets/oscar)
+ * Subset of [CC-100/ca](https://data.statmt.org/cc-100/): monolingual datasets from web crawl data
+
+ ### Usage
+
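+ ```python
+ from transformers import pipeline
+
+ # GPT-2 is a causal language model, so it is used with the
+ # 'text-generation' pipeline rather than 'fill-mask'.
+ generator = pipeline('text-generation', model='ClassCat/gpt2-small-catalan-v2')
+ generator("Ell està una mica")
+ ```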