lucio/graphseg

Goran Glavaš committed on Aug 15, 2016

Commit 41fc5cc, 1 parent: 80dbedc

Deleted embeddings.txt due to repo size limitation. Updated the README file.

Files changed (2)

README.txt CHANGED

@@ -31,13 +31,13 @@ Example command:
 java -jar graphseg.jar /home/seg-input /home/seg-output 0.25 3
-The tool's correct execution depends on the resources in the /source/res directory. These three files are as follows:
-(1) embeddings.txt -- the word embeddings used for measuring semantic similarity between sentences. The default file used are 200-dimensional GloVe embeddings obtained on Wikipedia 2014 + Giga 5 corpus (http://nlp.stanford.edu/data/glove.6B.zip).
-(2) stopwords.txt -- the list of English stopwords (excluded from sentences when measuring semantic similarity)
-(3) freqs.txt -- frequencies of English words on a large corpus, needed for the IC-weighting of word contribution
-You may choose to replace these default files (e.g., by using different embeddings or different stopword list), but make sure you name the new files exactly the same (i.e., embeddings.txt, stopwords.txt, and freqs.txt, respectively).
+The tool's correct execution depends on the resources in the /source/res directory. There are three files that need to be there:
+(1) embeddings.txt -- the word embeddings used for measuring semantic similarity between sentences. The default file used are 200-dimensional GloVe embeddings obtained on Wikipedia 2014 + Giga 5 corpus (http://nlp.stanford.edu/data/glove.6B.zip). This file is bundled into the standalone binary file graphseg.jar, but is omitted from the source/res folder due to space constraints of the repository;
+(2) stopwords.txt -- the list of English stopwords (excluded from sentences when measuring semantic similarity);
+(3) freqs.txt -- frequencies of English words on a large corpus, needed for the IC-weighting of word contribution.
+The last two files (stopwords.txt and freqs.txt) are provided in the res folder, whereas embeddings.txt is bundled into the binary (/binary/graphseg.jar) but omitted from the /source/res folder due to repository size constraints. You may choose to replace these default files (e.g., by using different embeddings or different stopword list), but make sure you name the new files exactly the same (i.e., embeddings.txt, stopwords.txt, and freqs.txt, respectively).
 Credit
 ========

source/res/embeddings.txt DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:18870b0a7516e4a72b44d3c226c242d2d846008967d8ce40b94c723a94d1a32b
-size 693432828
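
Because embeddings.txt is no longer in source/res, anyone building from source has to put a copy there themselves. The deleted pointer above records the SHA-256 digest (oid) and byte size of the original file, so a candidate file can be checked against it. The following is a minimal sketch, not part of the repository: the function name matches_lfs_pointer and the default path are hypothetical, and whether an unmodified file from the GloVe archive named in the README matches this digest is an assumption the script tests rather than a fact stated by the commit.

```python
import hashlib
import os
import sys

# Values copied from the deleted Git LFS pointer for source/res/embeddings.txt.
EXPECTED_OID = "18870b0a7516e4a72b44d3c226c242d2d846008967d8ce40b94c723a94d1a32b"
EXPECTED_SIZE = 693432828


def matches_lfs_pointer(path, oid=EXPECTED_OID, size=EXPECTED_SIZE):
    """Return True if the file at `path` has the given byte size and SHA-256 digest."""
    # Cheap check first: a size mismatch rules the file out without hashing it.
    if os.path.getsize(path) != size:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so the large file is never held in memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == oid


if __name__ == "__main__":
    # Hypothetical default location; pass a different path as the first argument.
    candidate = sys.argv[1] if len(sys.argv) > 1 else "source/res/embeddings.txt"
    print("match" if matches_lfs_pointer(candidate) else "mismatch")
```

A "mismatch" result only means the candidate differs from the file that was originally tracked; the README allows replacing the defaults with other embeddings as long as the file is still named embeddings.txt.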