simonschoe committed on
Commit
d442ab8
1 Parent(s): e563118

update retrained model

Files changed (3)
  1. README.md +4 -1
  2. model.bin +2 -2
  3. vocab.txt +0 -0
README.md CHANGED
@@ -22,7 +22,7 @@ widget:
 
# EarningsCall2Vec
 
- **EarningsCall2Vec** is a [fastText](https://fasttext.cc/) word embedding model that was trained via [Gensim](https://radimrehurek.com/gensim/). It maps each token in the vocabulary to a dense, 300-dimensional vector space, designed for performing **semantic search**. More details about the training procedure can be found [below](#model-training).
+ **EarningsCall2Vec** is a [`fastText`](https://fasttext.cc/) word embedding model that was trained via [`Gensim`](https://radimrehurek.com/gensim/). It maps each token in the vocabulary to a dense, 300-dimensional vector space, designed for performing **semantic search**. More details about the training procedure can be found [below](#model-training).
 
 
## Background
@@ -77,6 +77,9 @@ model.wv.most_similar(negative='transformation', topn=5, restrict_vocab=None)
model.wv.similarity('transformation', 'continuity')
```
 
+ If model size is crucial, the final model could be additionally compressed using the [`compress-fasttext`](https://github.com/avidale/compress-fasttext) library (e.g., via pruning, conversion to `float16`, or product quantization).
+
+
## Model Training
 
The model has been trained on text data stemming from earnings call transcripts. The data is restricted to a call's question-and-answer (Q&A) section and the remarks by firm executives. The data has been preprocessed prior to model training via stop word removal, lemmatization, named entity masking, and co-occurrence modeling.
model.bin CHANGED
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
- oid sha256:707ed23e2283e346ada08a713b7336f520d2a45830418ed44fbb456dc6cdd795
- size 2556989501
+ oid sha256:7697c585990b1c0b0a8a320a5f382c172dfd8574d15a97362bb6cf72dcd6e1b8
+ size 2577407131
vocab.txt CHANGED
The diff for this file is too large to render. See raw diff