ClassCat committed on
Commit
0e1f0eb
1 Parent(s): 609b52d

Create README.md

  1. README.md +40 -0
README.md ADDED
@@ -0,0 +1,40 @@
+ ---
+ language: fr
+ license: cc-by-sa-4.0
+ datasets:
+ - wikipedia
+ - cc100
+ widget:
+ - text: "Je vais à la"
+ - text: "Je m'appelle"
+ - text: "J'aime le café"
+ - text: "Nous avons"
+ ---
+
+ ## GPT2 French base model (Uncased)
+
+ ### Prerequisites
+
+ transformers==4.19.2
+
+ ### Model architecture
+
+ This model uses the GPT2 base settings except for the vocabulary size.
+
+ ### Tokenizer
+
+ The model uses a BPE tokenizer with a vocabulary size of 50,000.
+
+ ### Training Data
+
+ * [wiki40b/fr](https://www.tensorflow.org/datasets/catalog/wiki40b#wiki40bfr) (French Wikipedia)
+ * Subset of [CC-100/fr](https://data.statmt.org/cc-100/): Monolingual Datasets from Web Crawl Data
+
+ ### Usage
+
+ ```python
+ from transformers import pipeline
+
+ generator = pipeline('text-generation', model='ClassCat/gpt2-base-french')  # load the model as a text-generation pipeline
+ generator("Je vais à la", max_length=50, num_return_sequences=5)  # up to 50 tokens each, 5 candidate continuations
+ ```
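
The README says the model keeps the GPT2 base settings and changes only the vocabulary size to 50,000. As a rough illustration of what that implies, the sketch below estimates the parameter count. It assumes "GPT2 base" means the standard smallest GPT-2 configuration (12 layers, hidden size 768, 1,024 positions) and that the output layer shares weights with the token embedding. Neither assumption is stated in the README, and the helper name `gpt2_param_count` is hypothetical.

```python
def gpt2_param_count(vocab_size, n_layer=12, n_embd=768, n_positions=1024):
    """Estimate the parameter count of a GPT-2 style decoder.

    Defaults are the standard smallest GPT-2 configuration (an assumption;
    the README only says "GPT2 base settings"). The output projection is
    assumed to be tied to the token embedding, so it adds no parameters.
    """
    d = n_embd
    token_emb = vocab_size * d            # token embedding matrix
    pos_emb = n_positions * d             # learned position embeddings
    # Attention: fused QKV projection plus output projection, with biases.
    attn = (d * 3 * d + 3 * d) + (d * d + d)
    # Feed-forward: expand to 4*d, then project back to d, with biases.
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)
    # Two layer norms per block, each with a weight and a bias vector.
    layer_norms = 2 * (2 * d)
    per_layer = attn + mlp + layer_norms
    final_ln = 2 * d                      # final layer norm
    return token_emb + pos_emb + n_layer * per_layer + final_ln


french = gpt2_param_count(50_000)    # vocabulary size from the README
original = gpt2_param_count(50_257)  # vocabulary size of the original GPT-2
print(f"vocab 50,000: {french:,} parameters")
print(f"vocab 50,257: {original:,} parameters")
print(f"difference:   {original - french:,} parameters")
```

Under these assumptions only the token embedding changes, so the smaller vocabulary removes 257 × 768 embedding weights and leaves every transformer block the same size.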