minhtriphan committed on
Commit
0ec865c
1 Parent(s): 9da8917

Update README.md

Files changed (1)
  1. README.md +17 -2
README.md CHANGED
@@ -5,11 +5,26 @@ tags:
  - finance
  ---
  # Disclaimer
- The current model is trained from randomly initialized weights due to some computational and data obstacles. Therefore, the context captured by the models as well as the word semantics are not really good. The tokenizer in this version is also trained from scratch.
+ ~The current model is trained from randomly initialized weights due to some computational and data obstacles. Therefore, the context captured by the models as well as the word semantics are not really good. The tokenizer in this version is also trained from scratch.~
+
+ The new model weights have been updated. The details of the training are described below:

  We're training the model again with more care and some tricks to enhance the semantics of words. To this end, we initialize the embedding layers (i.e., `word_embeddings`, `position_embeddings`, `token_type_embeddings`, and `LayerNorm`) with the pre-trained embeddings from FinBERT (https://huggingface.co/yiyanghkust/finbert-tone). Accordingly, we use the same tokenizer as that of FinBERT.
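To make the embedding transfer concrete, here is a minimal sketch of how such an initialization could look with 🤗 Transformers. The plain `BertModel` used as the target is only a stand-in for the actual LongNet-based encoder, and the variable names are illustrative, not the repository's training code:

```python
import torch
from transformers import AutoModel, AutoTokenizer, BertConfig, BertModel

# Source of the pre-trained embeddings and of the tokenizer (FinBERT).
finbert = AutoModel.from_pretrained("yiyanghkust/finbert-tone")
tokenizer = AutoTokenizer.from_pretrained("yiyanghkust/finbert-tone")

# Stand-in target: a randomly initialized BERT-style encoder with the same
# vocabulary and hidden size as FinBERT (the real model swaps in LongNet attention).
config = BertConfig.from_pretrained("yiyanghkust/finbert-tone")
model = BertModel(config)

# Copy only the embedding-related weights; every other layer stays randomly initialized.
with torch.no_grad():
    src, tgt = finbert.embeddings, model.embeddings
    tgt.word_embeddings.weight.copy_(src.word_embeddings.weight)
    tgt.position_embeddings.weight.copy_(src.position_embeddings.weight)
    tgt.token_type_embeddings.weight.copy_(src.token_type_embeddings.weight)
    tgt.LayerNorm.weight.copy_(src.LayerNorm.weight)
    tgt.LayerNorm.bias.copy_(src.LayerNorm.bias)
```

If the target's position table is longer than FinBERT's 512 positions, only the overlapping rows (or tiled copies of them) can be transferred this way.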

- Furthermore, the model is trained longer (10 epochs). The new pre-trained model weights will be updated as soon as the training and validation are completed.
+ Furthermore, the model is trained longer (~10 epochs~ 8 epochs). ~The new pre-trained model weights will be updated as soon as the training and validation are completed.~
+
+ # Time and space efficiency
+ We compare the time and space efficiency of this model with that of several competitors. For these competitors, we clone the positional embedding layers so that they can accept input sequences with a maximum length of 65,536 tokens.
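The README does not spell out how the positional embeddings are cloned, so the sketch below shows one plausible reading: tile a baseline's original 512-position table until it covers 65,536 positions. The use of `bert-base-uncased` as the competitor is a placeholder assumption:

```python
import torch
from transformers import AutoModel

MAX_LEN = 65536  # target maximum input length for the baselines

# Placeholder competitor; the actual baselines are those named in the figures.
baseline = AutoModel.from_pretrained("bert-base-uncased")

old_table = baseline.embeddings.position_embeddings.weight.data  # shape (512, hidden_size)
n_copies = MAX_LEN // old_table.size(0)

# "Clone" the 512-position table by tiling it until it covers MAX_LEN positions.
new_emb = torch.nn.Embedding(MAX_LEN, old_table.size(1))
with torch.no_grad():
    new_emb.weight.copy_(old_table.repeat(n_copies, 1))
baseline.embeddings.position_embeddings = new_emb
baseline.config.max_position_embeddings = MAX_LEN

# The cached position-id / token-type-id buffers must cover the new length as well.
baseline.embeddings.register_buffer(
    "position_ids", torch.arange(MAX_LEN).unsqueeze(0), persistent=False
)
baseline.embeddings.register_buffer(
    "token_type_ids", torch.zeros(1, MAX_LEN, dtype=torch.long), persistent=False
)
```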
+
+ The experiments are run on an NVIDIA A100-SXM4-40GB with a batch size of 1. The figures show the time and memory needed to process one batch. In training mode, both the forward pass and backpropagation are included; in inference mode, only the forward pass is included.
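The exact measurement protocol behind the figures is not given in the README; the following is a rough sketch of how the per-batch wall time and peak GPU memory could be recorded, where `model` is any of the compared encoders and the scalar loss is a dummy used only to trigger backpropagation:

```python
import time
import torch

def run_one_batch(model, seq_len, training=True, device="cuda"):
    """Time one batch of size 1 and report the peak GPU memory it needed."""
    model = model.to(device).train(training)
    input_ids = torch.randint(0, model.config.vocab_size, (1, seq_len), device=device)

    torch.cuda.reset_peak_memory_stats(device)
    torch.cuda.synchronize(device)
    start = time.perf_counter()

    if training:
        # Training mode: forward pass plus backpropagation on a dummy loss.
        model(input_ids).last_hidden_state.mean().backward()
    else:
        # Inference mode: forward pass only.
        with torch.no_grad():
            model(input_ids)

    torch.cuda.synchronize(device)
    elapsed = time.perf_counter() - start
    peak_gb = torch.cuda.max_memory_allocated(device) / 1024 ** 3
    return elapsed, peak_gb
```

A real benchmark would warm the GPU up and average over several runs; this sketch keeps only the single-batch measurement described above.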
+
+ ## Training mode
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61d2d2993c2083e1c08af221/clg3lSItrQuXL5YYh7dmm.png)
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61d2d2993c2083e1c08af221/zCwoR6oimLFEO0llErb0g.png)
+
+ ## Inference mode
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61d2d2993c2083e1c08af221/GKkLON8R1bqa7XRvOoFOp.png)
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/61d2d2993c2083e1c08af221/bmEHrGIaAGGwe75Msx3PL.png)

  # Introduction
  This is the implementation of the BERT model using the LongNet structure (paper: https://arxiv.org/pdf/2307.02486.pdf).