osanseviero (HF staff) committed on
Commit 07e68d7 (1 parent: bab71b5)

Update README.md

Files changed (1)
  1. README.md +2 -55
README.md CHANGED
@@ -1,60 +1,7 @@
  ---
  tags:
- - feature-extraction
+ - text-classification
  library_name: generic
  ---
 
- # debate2vec
- Word-vectors created from a large corpus of competitive debate evidence, and data extraction / processing scripts
-
- #usage
- ```
- import fasttext.util
- ft = fasttext.load_model('debate2vec.bin')
- ft.get_word_vector('dialectics')
- ```
- # Download Link
- Github won't let me store large files in their repos.
- * [FastText Vectors Here](https://drive.google.com/file/d/1m-CwPcaIUun4qvg69Hx2gom9dMScuQwS/view?usp=sharing) (~260mb)
-
-
- # About
-
- Created from all publically available Cross Examination Competitive debate evidence posted by the community on [Open Evidence](https://openev.debatecoaches.org/) (From 2013-2020)
-
- Search through the original evidence by going to [debate.cards](http://debate.cards/)
-
- Stats about this corpus:
- * 222485 unique documents larger than 200 words (DebateSum plus some additional debate docs that weren't well-formed enough for inclusion into DebateSum)
- * 107555 unique words (showing up more than 10 times in the corpus)
- * 101 million total words
-
- Stats about debate2vec vectors:
- * 300 dimensions, minimum number of appearances of a word was 10, trained for 100 epochs with lr set to 0.10 using FastText
- * lowercased (will release cased)
- * No subword information
-
- The corpus includes the following topics
-
- * 2013-2014 Cuba/Mexico/Venezuela Economic Engagement
- * 2014-2015 Oceans
- * 2015-2016 Domestic Surveillance
- * 2016-2017 China
- * 2017-2018 Education
- * 2018-2019 Immigration
- * 2019-2020 Reducing Arms Sales
-
- Other topics that this word vector model will handle extremely well
-
- * Philosophy (Especially Left-Wing / Post-modernist)
- * Law
- * Government
- * Politics
-
-
- Initial release is of fasttext vectors without subword information. Future releases will include fine-tuned GPT-2 and other high end models as my GPU compute allows.
-
- # Screenshots
- ![](https://github.com/Hellisotherpeople/debate2vec/blob/master/debate2vec.jpg)
- ![](https://github.com/Hellisotherpeople/debate2vec/blob/master/debate2vec2.jpg)
- ![](https://github.com/Hellisotherpeople/debate2vec/blob/master/debate2vec3.jpg)
+ # Fasttext nearest neighbors