Hellisotherpeople committed on
Commit
156189f
1 Parent(s): ccb5e35

Create README.md

  1. README.md +48 -0
README.md ADDED
@@ -0,0 +1,48 @@
+ # debate2vec
+ Word vectors created from a large corpus of competitive debate evidence, plus data extraction and processing scripts
+
+ # Download Link
+ GitHub does not allow large files in repositories, so the vectors are hosted on Google Drive.
+ * [FastText Vectors Here](https://drive.google.com/file/d/1m-CwPcaIUun4qvg69Hx2gom9dMScuQwS/view?usp=sharing) (~260 MB)
+
+
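As a rough sketch of how the downloaded vectors might be read, the snippet below parses the plain-text word2vec/fastText `.vec` layout (a `count dim` header line, then one `word v1 ... vN` row per line). That the download uses this text layout is an assumption, as are the file name `debate2vec.vec` and the helper names `load_vec` and `cosine`; none of them come from this repository. A three-word in-memory sample stands in for the real file.

```python
import io
import math


def load_vec(handle, limit=None):
    """Parse text-format vectors: a 'count dim' header, then 'word v1 ... vN' rows."""
    header = handle.readline().split()
    dim = int(header[1])  # header[0] is the declared word count
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip rows that do not match the declared dimension
        vectors[parts[0]] = [float(x) for x in parts[1:]]
        if limit is not None and len(vectors) >= limit:
            break
    return vectors


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Tiny made-up sample; with the real file one would instead write something like
#   with open("debate2vec.vec", encoding="utf-8") as f:   # hypothetical file name
#       vectors = load_vec(f)
SAMPLE = "3 2\nhegemony 1.0 0.0\nprimacy 0.9 0.1\nocean 0.0 1.0\n"
vectors = load_vec(io.StringIO(SAMPLE))
```

Because the released vectors are lowercased, query words should be lowercased before lookup, for example `vectors.get(word.lower())`.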
+ # About
+
+ Created from all publicly available Cross Examination competitive debate evidence posted by the community on [Open Evidence](https://openev.debatecoaches.org/) from 2013 to 2020.
+
+ Search through the original evidence at [debate.cards](http://debate.cards/).
+
+ Stats about this corpus:
+ * 222,485 unique documents longer than 200 words (DebateSum plus some additional debate documents that were not well-formed enough for inclusion in DebateSum)
+ * 107,555 unique words (each appearing more than 10 times in the corpus)
+ * 101 million total words
+
+ Stats about debate2vec vectors:
+ * 300 dimensions; minimum word count of 10; trained with FastText for 100 epochs at a learning rate of 0.10
+ * Lowercased (a cased version will be released)
+ * No subword information
+
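The settings listed above map fairly directly onto the keyword arguments of `fasttext.train_unsupervised` in the official Python bindings. The following is a minimal sketch of that mapping, not the script actually used to build debate2vec. The assumptions: the corpus path `corpus.txt` and the helper names `debate2vec_params` and `train` are hypothetical, the corpus is taken to be lowercased already, and the model type (skipgram or CBOW) is not stated in this README, so fastText's default is left in place.

```python
def debate2vec_params(corpus_path):
    """Return the README's stated hyperparameters as fastText keyword arguments."""
    return {
        "input": corpus_path,  # plain-text training file (hypothetical path)
        "dim": 300,            # 300 dimensions
        "minCount": 10,        # minimum word count of 10
        "epoch": 100,          # trained for 100 epochs
        "lr": 0.10,            # learning rate 0.10
        "minn": 0,
        "maxn": 0,             # maxn=0 disables character n-grams (no subword information)
    }


def train(corpus_path):
    """Train vectors with the stated settings; requires the `fasttext` package."""
    import fasttext  # imported lazily so the parameter helper works without it

    return fasttext.train_unsupervised(**debate2vec_params(corpus_path))


params = debate2vec_params("corpus.txt")
```

Calling `train("corpus.txt")` returns a fastText model object whose `save_model` method writes a binary model file.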
+ The corpus includes the following topics:
+
+ * 2013-2014 Cuba/Mexico/Venezuela Economic Engagement
+ * 2014-2015 Oceans
+ * 2015-2016 Domestic Surveillance
+ * 2016-2017 China
+ * 2017-2018 Education
+ * 2018-2019 Immigration
+ * 2019-2020 Reducing Arms Sales
+
+ Other topics that this word vector model will handle extremely well:
+
+ * Philosophy (especially left-wing / post-modernist)
+ * Law
+ * Government
+ * Politics
+
+
+ The initial release consists of FastText vectors without subword information. Future releases will include fine-tuned GPT-2 and other high-end models as my GPU compute allows.
+
+ # Screenshots
+ ![](https://github.com/Hellisotherpeople/debate2vec/blob/master/debate2vec.jpg)
+ ![](https://github.com/Hellisotherpeople/debate2vec/blob/master/debate2vec2.jpg)
+ ![](https://github.com/Hellisotherpeople/debate2vec/blob/master/debate2vec3.jpg)