Commit 376dd88 by ElKulako
Parent(s): 8578a56

Update README.md

Files changed (1): README.md (+5 −5)

README.md CHANGED
@@ -1,14 +1,11 @@
-# CryptoBERT
-CryptoBERT is a pre-trained NLP model to analyse the language and sentiments of cryptocurrency-related social media posts and messages. It is built by further training the [cardiffnlp's Twitter-roBERTa-base](https://huggingface.co/cardiffnlp/twitter-roberta-base) language model on the cryptocurrency domain, using a corpus of over 3.2M unique cryptocurrency-related social media posts.
-
-
-# Example of Classification
 ---
 datasets:
 - ElKulako/StockTwits-crypto
 
 ---
 
+# CryptoBERT
+CryptoBERT is a pre-trained NLP model to analyse the language and sentiments of cryptocurrency-related social media posts and messages. It was built by further training the [cardiffnlp's Twitter-roBERTa-base](https://huggingface.co/cardiffnlp/twitter-roberta-base) language model on the cryptocurrency domain, using a corpus of over 3.2M unique cryptocurrency-related social media posts.
 ## Classification Training
 The model was trained on the following labels: "Bearish" : 0, "Neutral": 1, "Bullish": 2
 
@@ -16,6 +13,9 @@ CryptoBERT's sentiment classification head was fine-tuned on a balanced dataset
 
 CryptoBERT was trained with a max sequence length of 128. Technically, it can handle sequences of up to 514 tokens, however, going beyond 128 is not recommended.
 
+# Classification Example
+
+
 ## Training Corpus
 CryptoBERT was trained on 3.2M social media posts regarding various cryptocurrencies. Only non-duplicate posts of length above 4 words were considered. The following communities were used as sources for our corpora:
 
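The commit adds an empty "# Classification Example" section. A minimal sketch of what such an example might look like, using the Hugging Face `transformers` pipeline: the model id `ElKulako/cryptobert` is an assumption based on the author's namespace (like the `ElKulako/StockTwits-crypto` dataset), while the label mapping and the 128-token recommendation come directly from the README text.

```python
# Hypothetical classification example for CryptoBERT (a sketch, not the
# author's official snippet). Assumes the model is hosted under the
# author's namespace as "ElKulako/cryptobert".

# Label ids stated in the README: "Bearish": 0, "Neutral": 1, "Bullish": 2
ID2LABEL = {0: "Bearish", 1: "Neutral", 2: "Bullish"}


def build_pipeline(model_name: str = "ElKulako/cryptobert"):
    """Load the model and wrap it in a text-classification pipeline."""
    # Imported lazily so the label mapping above can be used without
    # pulling in the (heavy) transformers dependency.
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        TextClassificationPipeline,
    )

    tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    # The README recommends staying within the 128-token training length,
    # so truncate inputs rather than relying on the 514-token ceiling.
    return TextClassificationPipeline(
        model=model,
        tokenizer=tokenizer,
        max_length=128,
        truncation=True,
        padding="max_length",
    )
```

Calling `build_pipeline()(["btc is pumping hard"])` would return a list of prediction dicts with `label` and `score` keys, assuming the hosted model's config maps class ids to the label names above.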