---
license: apache-2.0
base_model: bert-base-uncased
tags:
- generated_from_trainer
- sentiment_analysis
datasets:
- ckandemir/bitcoin_tweets_sentiment_kaggle
metrics:
- accuracy
- f1
model-index:
- name: crypto_sentiment
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: ckandemir/bitcoin_tweets_sentiment_kaggle
      type: ckandemir/bitcoin_tweets_sentiment_kaggle
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.7150837988826816
    - name: F1
      type: f1
      value: 0.7212944928862212
language:
- en
library_name: transformers
widget:
- text: "Sold all btc, tethered up before the correction."
pipeline_tag: text-classification
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# crypto_sentiment

This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the [ckandemir/bitcoin_tweets_sentiment_kaggle](https://huggingface.co/datasets/ckandemir/bitcoin_tweets_sentiment_kaggle) dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4542
- Accuracy: 0.7151
- F1: 0.7213
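
The card does not say how F1 was averaged across classes. The sketch below is a dependency-free `compute_metrics` that reports accuracy and a support-weighted F1 from an `(logits, labels)` pair, which is how a `Trainer` `EvalPrediction` can be unpacked. The weighted average is an assumption; macro-averaging is the other common choice and is available through the hypothetical `average` argument.

```python
from collections import Counter
from typing import Sequence


def f1_per_class(preds: Sequence[int], labels: Sequence[int]) -> dict[int, float]:
    """F1 for every class that appears in the gold labels or the predictions."""
    scores = {}
    for c in set(labels) | set(preds):
        tp = sum(1 for p, y in zip(preds, labels) if p == c and y == c)
        fp = sum(1 for p, y in zip(preds, labels) if p == c and y != c)
        fn = sum(1 for p, y in zip(preds, labels) if p != c and y == c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def compute_metrics(eval_pred, average: str = "weighted") -> dict[str, float]:
    """Accuracy and F1 from (logits, labels), e.g. a Trainer EvalPrediction.

    average="weighted" weights each class F1 by its support in `labels`;
    average="macro" takes the unweighted mean. Which one this card used is unknown.
    """
    logits, labels = eval_pred
    labels = [int(y) for y in labels]
    # Predicted class = index of the largest logit in each row.
    preds = [max(range(len(row)), key=lambda i: row[i]) for row in logits]
    accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
    per_class = f1_per_class(preds, labels)
    if average == "macro":
        f1 = sum(per_class.values()) / len(per_class)
    elif average == "weighted":
        support = Counter(labels)
        f1 = sum(per_class[c] * n for c, n in support.items()) / len(labels)
    else:
        raise ValueError(f"unsupported average: {average!r}")
    return {"accuracy": accuracy, "f1": f1}
```

Because weighted F1 averages per-class scores by class frequency, it can land slightly above or below accuracy, as it does in the results above.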

## Model description

crypto_sentiment is a sentiment analysis classifier fine-tuned on Bitcoin-related tweets from the [ckandemir/bitcoin_tweets_sentiment_kaggle](https://huggingface.co/datasets/ckandemir/bitcoin_tweets_sentiment_kaggle) dataset. Built on the [bert-base-uncased](https://huggingface.co/bert-base-uncased) model, it classifies each tweet into one of the dataset's sentiment categories based on its Bitcoin-related content. For each input it returns a predicted label with a confidence score, which can feed into market sentiment analysis, social media monitoring, and other applications where understanding public opinion about Bitcoin matters.

## Intended uses 

This model is intended for sentiment analysis of Bitcoin-related text, particularly tweets. It can be used by researchers, analysts, and developers interested in gauging public sentiment about Bitcoin on social media.
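
A minimal sketch of how tweet-level predictions might be rolled up into an overall sentiment share. It assumes the standard `transformers` text-classification pipeline output (a list of `{"label": ..., "score": ...}` dicts). The model path is a placeholder, the label names come from the model's config and are not documented in this card, and `sentiment_share` with its 0.5 confidence cut-off is a hypothetical helper, not part of this model.

```python
from collections import Counter
from typing import Iterable, Mapping


def load_classifier(model_id: str):
    """Load the fine-tuned model as a text-classification pipeline.

    `model_id` is a placeholder for the Hub id or local path of crypto_sentiment.
    transformers is imported lazily so the helper below works without it.
    """
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def sentiment_share(
    predictions: Iterable[Mapping[str, object]], min_score: float = 0.5
) -> dict[str, float]:
    """Fraction of confident predictions assigned to each label (hypothetical helper).

    Predictions with a score below `min_score` are dropped before counting.
    """
    counts = Counter(
        str(p["label"]) for p in predictions if float(p["score"]) >= min_score
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Example usage (requires transformers and the model weights):
# classifier = load_classifier("path/to/crypto_sentiment")  # hypothetical path
# preds = classifier(["Sold all btc, tethered up before the correction."])
# print(sentiment_share(preds))
```

Dropping low-confidence predictions is one way to keep ambiguous tweets from skewing an aggregate; whether it helps for this model is untested here.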

## Limitations

- The model may not perform well on text that differs significantly in context or structure from the training data (Bitcoin-related English tweets).
- The model may not capture sentiment accurately in nuanced or sarcastic tweets.

## Training and evaluation data

The model was trained and evaluated on the [ckandemir/bitcoin_tweets_sentiment_kaggle](https://huggingface.co/datasets/ckandemir/bitcoin_tweets_sentiment_kaggle) dataset, which comprises Bitcoin-related tweets labeled by sentiment.

### Data Preparation
The initial dataset contained tweets in multiple languages. To ensure language consistency for model training, the following preparation steps were performed:

- Language detection: identified the English tweets and kept only those.
- Data cleaning: removed special characters.
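
The card names neither the language-detection tool nor the exact set of "special characters". The sketch below therefore takes the detector as a parameter (`detect_language` is a hypothetical callable returning an ISO 639-1 code such as `"en"`; langdetect's `detect` is one possible source) and assumes "special characters" means anything other than ASCII letters, digits, whitespace and a few punctuation marks. Both choices are assumptions, not the author's actual pipeline.

```python
import re
from typing import Callable, Iterable

# Assumption: keep ASCII letters, digits, whitespace and a few common
# punctuation marks; everything else (emoji, '#', '@', symbols) is removed.
SPECIAL_CHARS = re.compile(r"[^A-Za-z0-9\s.,!?'$%-]")
WHITESPACE_RUNS = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Replace special characters with spaces, then collapse repeated whitespace."""
    text = SPECIAL_CHARS.sub(" ", text)
    return WHITESPACE_RUNS.sub(" ", text).strip()


def prepare_tweets(
    tweets: Iterable[str], detect_language: Callable[[str], str]
) -> list[str]:
    """Keep only tweets detected as English, then clean each one.

    `detect_language` is a hypothetical callable returning an ISO 639-1
    code such as "en".
    """
    return [clean_tweet(t) for t in tweets if detect_language(t) == "en"]
```

Filtering before cleaning matters: stripping characters first would remove signals (accents, non-Latin scripts) that a language detector relies on.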
  
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 24
- eval_batch_size: 24
- seed: 42
- gradient_accumulation_steps: 3
- total_train_batch_size: 72
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine_with_restarts
- lr_scheduler_warmup_steps: 1000
- training_steps: 1000
- mixed_precision_training: Native AMP
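
Note that `lr_scheduler_warmup_steps` equals `training_steps` (both 1000). The pure-Python function below mirrors the multiplier computed by Transformers' `get_cosine_with_hard_restarts_schedule_with_warmup`, which the `cosine_with_restarts` scheduler type selects, assuming the default `num_cycles=1`. With these settings every optimizer update falls in the warmup branch, so the learning rate rises linearly from 0 toward 5e-06 over the whole run and the cosine decay and restarts are never reached.

```python
import math

BASE_LR = 5e-06        # learning_rate from the card
WARMUP_STEPS = 1000    # lr_scheduler_warmup_steps from the card
TRAINING_STEPS = 1000  # training_steps from the card


def cosine_with_hard_restarts_factor(
    step: int, warmup: int, total: int, num_cycles: int = 1
) -> float:
    """LR multiplier at `step`, following the Transformers schedule definition."""
    if step < warmup:
        # Linear warmup from 0 to 1.
        return step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    if progress >= 1.0:
        return 0.0
    # Cosine decay from 1 to 0, restarting num_cycles times.
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * ((num_cycles * progress) % 1.0))))


def lr_at(step: int) -> float:
    """Learning rate at `step` for the hyperparameters listed above."""
    return BASE_LR * cosine_with_hard_restarts_factor(step, WARMUP_STEPS, TRAINING_STEPS)


# Every update (steps 0..999) sits in the warmup ramp.
for step in (0, 250, 500, 999):
    print(step, lr_at(step))
```

A shorter warmup (for example, a hypothetical 100 steps) would leave room for the cosine phase to decay the learning rate before training ends.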

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|
| 0.8941        | 0.65  | 50   | 0.8733          | 0.5698   | 0.5654 |
| 0.8565        | 1.3   | 100  | 0.8042          | 0.6690   | 0.6031 |
| 0.7896        | 1.96  | 150  | 0.7219          | 0.6802   | 0.5740 |
| 0.7174        | 2.61  | 200  | 0.6379          | 0.7514   | 0.6955 |
| 0.633         | 3.26  | 250  | 0.5745          | 0.7514   | 0.6930 |
| 0.5824        | 3.91  | 300  | 0.5303          | 0.75     | 0.6919 |
| 0.5365        | 4.57  | 350  | 0.4997          | 0.7514   | 0.7014 |
| 0.5089        | 5.22  | 400  | 0.4766          | 0.7458   | 0.6991 |
| 0.4893        | 5.87  | 450  | 0.4596          | 0.7486   | 0.7174 |
| 0.463         | 6.52  | 500  | 0.4446          | 0.7514   | 0.7127 |
| 0.4496        | 7.17  | 550  | 0.4407          | 0.7165   | 0.7048 |
| 0.4357        | 7.83  | 600  | 0.4364          | 0.7277   | 0.7246 |
| 0.4257        | 8.48  | 650  | 0.4324          | 0.7067   | 0.7115 |
| 0.4029        | 9.13  | 700  | 0.4314          | 0.7277   | 0.7180 |
| 0.3955        | 9.78  | 750  | 0.4354          | 0.7151   | 0.7164 |
| 0.3886        | 10.43 | 800  | 0.4396          | 0.7221   | 0.7244 |
| 0.3788        | 11.09 | 850  | 0.4363          | 0.7235   | 0.7194 |
| 0.366         | 11.74 | 900  | 0.4528          | 0.7179   | 0.7215 |
| 0.3298        | 12.39 | 950  | 0.4766          | 0.7053   | 0.7107 |
| 0.3423        | 13.04 | 1000 | 0.4542          | 0.7151   | 0.7213 |
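
The Epoch column can be cross-checked against the hyperparameters. Assuming a single device, the effective batch size is 24 × 3 = 72 (matching `total_train_batch_size`), each optimizer step consumes 3 micro-batches of 24, and the fractional epoch is micro-batches seen divided by micro-batches per epoch. The value of 230 micro-batches per epoch below is inferred from the table, not stated in the card; it reproduces every Epoch entry and implies a training split of roughly 5,500 tweets.

```python
TRAIN_BATCH_SIZE = 24          # train_batch_size from the card
GRAD_ACCUM_STEPS = 3           # gradient_accumulation_steps from the card
NUM_DEVICES = 1                # assumption: single-device training
MICRO_BATCHES_PER_EPOCH = 230  # inferred from the table, not stated in the card

effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS * NUM_DEVICES  # 72


def epoch_at(step: int) -> float:
    """Fractional epoch after `step` optimizer updates."""
    return step * GRAD_ACCUM_STEPS / MICRO_BATCHES_PER_EPOCH


# Training-set sizes that give exactly 230 batches of 24
# (assuming the last partial batch is kept).
min_examples = (MICRO_BATCHES_PER_EPOCH - 1) * TRAIN_BATCH_SIZE + 1
max_examples = MICRO_BATCHES_PER_EPOCH * TRAIN_BATCH_SIZE

for step in (50, 500, 1000):
    print(step, round(epoch_at(step), 2))  # 0.65, 6.52 and 13.04, as in the table
```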


### Framework versions

- Transformers 4.35.0
- Pytorch 2.1.0+cu118
- Datasets 2.14.6
- Tokenizers 0.14.1