bayartsogt committed on
Commit
94882dd
·
1 Parent(s): 371675c

Create README.md

Files changed (1)
  1. README.md +29 -0
README.md ADDED
@@ -0,0 +1,29 @@
+ ---
+ language: "mn"
+ thumbnail: "https://avatars.githubusercontent.com/u/43239645?s=60&v=4"
+ tags:
+ - gpt2
+ datasets:
+ - OSCAR
+ ---
+
+ # Mongolian GPT2
+
+ The goal is to create a strong language generation model for Mongolian.
+ Since most of the initial code and data preparation already comes from @patrickvonplaten and other Hugging Face members, getting a first working version should not be too hard.
+
+ ## Model
+ A randomly initialized GPT2 model.
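
"Randomly initialized" means the model is built from a configuration rather than from a pretrained checkpoint. The following is a minimal sketch, assuming the `transformers` library with Flax support is installed. The hyperparameters are the GPT2-small defaults and the vocabulary size is a placeholder; neither is specified in this README, and `approx_param_count` and `init_random_gpt2` are hypothetical helper names.

```python
# Hyperparameters are assumptions (GPT2-small defaults), not values from this README.
GPT2_SMALL = {
    "vocab_size": 50257,   # placeholder; should match the Mongolian tokenizer
    "n_positions": 1024,
    "n_embd": 768,
    "n_layer": 12,
    "n_head": 12,
}


def approx_param_count(cfg):
    """Rough parameter count for a GPT2-style decoder with tied embeddings."""
    d = cfg["n_embd"]
    embeddings = cfg["vocab_size"] * d + cfg["n_positions"] * d
    # Per block: attention (4*d*d + 4*d), MLP (8*d*d + 5*d), two layer norms (4*d).
    per_block = 12 * d * d + 13 * d
    final_layer_norm = 2 * d
    return embeddings + cfg["n_layer"] * per_block + final_layer_norm


def init_random_gpt2(cfg, seed=0):
    """Build a Flax GPT2 with random weights (no pretrained checkpoint is loaded)."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import GPT2Config, FlaxGPT2LMHeadModel

    config = GPT2Config(**cfg)
    return FlaxGPT2LMHeadModel(config, seed=seed)
```

Calling `init_random_gpt2(GPT2_SMALL)` would return a model whose weights come only from the seed, so its generations are noise until it is trained on Mongolian text.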
+
+ ## Datasets
+ We can use OSCAR, which is available through the `datasets` library.
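
Loading the Mongolian part of OSCAR could look roughly like this. It is a sketch under assumptions: the `datasets` library is installed, network access is available, and the deduplicated subset is the one wanted (the README does not say which variant to use). The helper names are hypothetical, and recent `datasets` releases may need extra arguments for script-based datasets such as this one.

```python
def oscar_config_name(language, deduplicated=True):
    """Return the OSCAR subset name for a language code, e.g. 'mn'."""
    variant = "deduplicated" if deduplicated else "original"
    return f"unshuffled_{variant}_{language}"


def load_mongolian_oscar(split="train"):
    """Download (on first use) and return the Mongolian OSCAR split."""
    # Imported lazily; requires `pip install datasets` and network access.
    from datasets import load_dataset

    return load_dataset("oscar", oscar_config_name("mn"), split=split)
```

Each record returned by `load_mongolian_oscar()` is a dictionary with a `text` field holding one document.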
+
+ ## Training script
+ A causal language modeling script for Flax is already available and can be used with almost no code changes.
+ If there is time left, I’d love to crawl additional data privately and integrate it into `datasets`.
+
+ ## Expected Outcome
+ A Mongolian text generation model that produces understandable text.
+
+ ## Challenges
+ Lack of data → the Mongolian portion of OSCAR is only 2.2 GB. We may need to research ways to acquire more data.
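
To get a feel for how small 2.2 GB is, a back-of-the-envelope estimate can help. Every constant below is an assumption chosen for illustration, not a measurement: roughly 2 bytes per character for Cyrillic text in UTF-8, about 4 characters per token, and a hypothetical training setup with a block size of 512 and a global batch of 64 sequences.

```python
def estimate_tokens(corpus_bytes, bytes_per_char=2.0, chars_per_token=4.0):
    """Rough token count; both ratios are illustrative assumptions."""
    return int(corpus_bytes / (bytes_per_char * chars_per_token))


def steps_per_epoch(num_tokens, block_size=512, batch_size=64):
    """Optimizer steps needed to see every token once (assumed setup)."""
    return num_tokens // (block_size * batch_size)


corpus_bytes = 2.2e9  # "2.2G" read as 2.2 * 10^9 bytes
tokens = estimate_tokens(corpus_bytes)
print(tokens, steps_per_epoch(tokens))
```

Under these assumptions the corpus holds a few hundred million tokens, so one epoch is only a few thousand optimizer steps, which is why extra crawled data could matter.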