Update README.md
---
datasets:
- ElKulako/StockTwits-crypto
---

# CryptoBERT

CryptoBERT is a pre-trained NLP model for analysing the language and sentiment of cryptocurrency-related social media posts and messages. It was built by further training [cardiffnlp's Twitter-roBERTa-base](https://huggingface.co/cardiffnlp/twitter-roberta-base) language model on the cryptocurrency domain, using a corpus of over 3.2M unique cryptocurrency-related social media posts.

## Classification Training

The model was trained on the following labels: "Bearish": 0, "Neutral": 1, "Bullish": 2.
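
This label scheme can be written as a plain Python dictionary, e.g. for decoding raw model outputs by hand (the variable names below are illustrative, mirroring the usual Hugging Face `label2id`/`id2label` convention):

```python
# Label scheme used during classification training.
label2id = {"Bearish": 0, "Neutral": 1, "Bullish": 2}
id2label = {v: k for k, v in label2id.items()}

# Decoding a predicted class index (e.g. the argmax of the model's logits):
predicted_class = 2
print(id2label[predicted_class])  # -> Bullish
```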

CryptoBERT's sentiment classification head was fine-tuned on a balanced dataset.

CryptoBERT was trained with a maximum sequence length of 128 tokens. Technically it can handle sequences of up to 514 tokens, but going beyond 128 is not recommended.

## Classification Example
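
Assuming the fine-tuned checkpoint is published on the Hugging Face Hub (the model id below is an assumption; adjust it to wherever CryptoBERT is actually hosted), inference can be sketched with the `transformers` pipeline API, truncating inputs to the 128-token training length:

```python
from transformers import pipeline

MODEL_ID = "ElKulako/cryptobert"  # assumed Hub id; adjust if hosted elsewhere
MAX_LENGTH = 128                  # training-time maximum sequence length


def classify_posts(posts):
    """Classify crypto social media posts as Bearish / Neutral / Bullish."""
    classifier = pipeline(
        "text-classification",
        model=MODEL_ID,
        max_length=MAX_LENGTH,  # truncate anything beyond the trained length
        truncation=True,
    )
    return classifier(posts)


if __name__ == "__main__":
    posts = [
        "BTC is breaking out, this bull run is just getting started!",
        "Not sure where the market goes from here.",
        "Sold everything, this chain is finished.",
    ]
    for post, pred in zip(posts, classify_posts(posts)):
        print(f"{pred['label']} ({pred['score']:.3f}): {post}")
```

Each prediction is a dict with a `label` (one of the three classes above) and a `score` (the softmax probability of that class).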

## Training Corpus

CryptoBERT was trained on 3.2M social media posts regarding various cryptocurrencies. Only non-duplicate posts longer than four words were considered. The following communities were used as sources for our corpora: