simonschoe committed
Commit • d442ab8
1 Parent(s): e563118
update retrained model
README.md
CHANGED
@@ -22,7 +22,7 @@ widget:
 
 # EarningsCall2Vec
 
-**EarningsCall2Vec** is a [fastText](https://fasttext.cc/) word embedding model that was trained via [Gensim](https://radimrehurek.com/gensim/). It maps each token in the vocabulary to a dense, 300-dimensional vector space, designed for performing **semantic search**. More details about the training procedure can be found [below](#model-training).
+**EarningsCall2Vec** is a [`fastText`](https://fasttext.cc/) word embedding model that was trained via [`Gensim`](https://radimrehurek.com/gensim/). It maps each token in the vocabulary to a dense, 300-dimensional vector space, designed for performing **semantic search**. More details about the training procedure can be found [below](#model-training).
 
 
 ## Background
@@ -77,6 +77,9 @@ model.wv.most_similar(negative='transformation', topn=5, restrict_vocab=None)
 model.wv.similarity('transformation', 'continuity')
 ```
 
+If model size is crucial, the final model could be additionally compressed using the [`compress-fasttext`](https://github.com/avidale/compress-fasttext) library (e.g., via pruning, conversion to `float16`, or product quantization).
+
+
 ## Model Training
 
 The model has been trained on text data stemming from earnings call transcripts. The data is restricted to a call's question-and-answer (Q&A) section and the remarks by firm executives. The data has been preprocessed prior to model training via stop word removal, lemmatization, named entity masking, and coocurrence modeling.
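The usage calls visible in the second hunk (`model.wv.most_similar(...)`, `model.wv.similarity(...)`) come from the README's semantic-search example. Below is a minimal sketch of how the retrained 300-dimensional model might be loaded and queried locally; it assumes the `model.bin` file updated in this commit was saved in Gensim's native format (a Facebook-format binary would instead need `load_facebook_model`), and the example tokens are simply those from the README snippet.

```python
from gensim.models.fasttext import FastText

# Assumption: model.bin (the LFS file updated in this commit) was saved via
# Gensim's native model.save(); for a Facebook-format binary, use
# gensim.models.fasttext.load_facebook_model("model.bin") instead.
model = FastText.load("model.bin")

# Each token maps to a dense, 300-dimensional vector.
print(model.wv["transformation"].shape)  # expected: (300,)

# Semantic search, mirroring the calls shown in the README diff above.
print(model.wv.most_similar(positive="transformation", topn=5))
print(model.wv.similarity("transformation", "continuity"))
```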
model.bin
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:7697c585990b1c0b0a8a320a5f382c172dfd8574d15a97362bb6cf72dcd6e1b8
+size 2577407131
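The README line added in this commit points to [`compress-fasttext`](https://github.com/avidale/compress-fasttext) for shrinking the ~2.5 GB `model.bin` above. A rough sketch of that optional compression step is shown below, assuming the pruning-plus-product-quantization entry point (`prune_ft_freq`) described in the compress-fasttext README; paths and parameters are placeholders, not part of this repository.

```python
import gensim
import compress_fasttext  # pip install compress-fasttext

# Load the full-size vectors (placeholder path; adapt to how model.bin is stored).
big_model = gensim.models.fasttext.FastTextKeyedVectors.load("model.bin")

# Prune low-frequency rows and apply product quantization, following the
# example in the compress-fasttext documentation.
small_model = compress_fasttext.prune_ft_freq(big_model, pq=True)

# Persist the compressed vectors; they can later be reloaded with
# compress_fasttext.models.CompressedFastTextKeyedVectors.load(...).
small_model.save("model_compressed.bin")
```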
vocab.txt
CHANGED
The diff for this file is too large to render. See raw diff.