---
license: apache-2.0
tags:
- generated_from_trainer
- financial
- stocks
- sentiment
- sentiment-analysis
- financial-news
widget:
- text: The company's quarterly earnings surpassed all estimates, indicating strong growth.
datasets:
- financial_phrasebank
metrics:
- accuracy
model-index:
- name: AnkitAI/distilbert-base-uncased-financial-news-sentiment-analysis
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: financial_phrasebank
      type: financial_phrasebank
      args: sentences_allagree
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.96688
language:
- en
base_model:
- distilbert/distilbert-base-uncased-finetuned-sst-2-english
pipeline_tag: text-classification
library_name: transformers
---
# DistilBERT Fine-Tuned for Financial Sentiment Analysis
## Model Description

This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) specifically tailored for sentiment analysis in the financial domain. It has been trained on the [Financial PhraseBank](https://huggingface.co/datasets/financial_phrasebank) dataset to classify financial texts into three sentiment categories:

- Negative (label `0`)
- Neutral (label `1`)
- Positive (label `2`)
  
## Model Performance
The model was trained for 5 epochs and evaluated on a held-out test set comprising 20% of the dataset.

### Evaluation Metrics
| Epoch | Eval Loss | Eval Accuracy |
|-----------|---------------|-------------------|
| 1         | 0.2210        | 94.26%            |
| 2         | 0.1997        | 95.81%            |
| 3         | 0.1719        | 96.69%            |
| 4         | 0.2073        | 96.03%            |
| 5         | 0.1941        | **96.69%**        |
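
As a sanity check on the table, the accuracies can be converted back to counts of correctly classified sentences. The sketch below assumes the evaluation set is the 453-sentence test split listed under Data. The counts are inferred from the rounded percentages and are not reported by the training run; `EVAL_ACCURACY` and `correct_count` are illustrative names.

```python
# Accuracy per epoch, copied from the table above (as fractions).
EVAL_ACCURACY = {1: 0.9426, 2: 0.9581, 3: 0.9669, 4: 0.9603, 5: 0.9669}

# Assumption: the evaluation set is the 453-sample test split.
TEST_SET_SIZE = 453


def correct_count(accuracy: float, n_samples: int = TEST_SET_SIZE) -> int:
    """Return the number of correct predictions implied by an accuracy."""
    return round(accuracy * n_samples)


for epoch, acc in EVAL_ACCURACY.items():
    n_correct = correct_count(acc)
    print(f"epoch {epoch}: ~{n_correct}/{TEST_SET_SIZE} correct "
          f"({n_correct / TEST_SET_SIZE:.4%})")
```

Under that assumption, the final epoch corresponds to 438 of 453 sentences classified correctly, which is consistent with the accuracy value recorded in the model metadata.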

### Training Metrics
- **Final Training Loss**: 0.0797
- **Total Training Time**: Approximately 3869 seconds (~1.07 hours)
- **Training Samples per Second**: 2.34
- **Training Steps per Second**: 0.147
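
The throughput figures can be cross-checked against the training setup. The following back-of-the-envelope sketch assumes 1811 training samples, a training batch size of 16, 5 epochs, a final partial batch that is kept, and no gradient accumulation; the variable names are illustrative.

```python
import math

TRAIN_SAMPLES = 1811   # 80% of the 2264 sentences
BATCH_SIZE = 16        # training batch size
EPOCHS = 5
TRAIN_SECONDS = 3869   # total training time reported above

# Assumption: the last partial batch is kept and there is no gradient accumulation.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

samples_per_second = TRAIN_SAMPLES * EPOCHS / TRAIN_SECONDS
steps_per_second = total_steps / TRAIN_SECONDS

print(f"steps per epoch:    {steps_per_epoch}")
print(f"total steps:        {total_steps}")
print(f"samples per second: {samples_per_second:.2f}")
print(f"steps per second:   {steps_per_second:.3f}")
```

With these assumptions the computed rates round to the reported 2.34 samples and 0.147 steps per second.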

## Training Procedure
### Data
- **Dataset**: [Financial PhraseBank](https://huggingface.co/datasets/financial_phrasebank)
- **Configuration**: `sentences_allagree` (sentences where all annotators agreed on the sentiment)
- **Dataset Size**: 2264 sentences
- **Data Split**: 80% training (1811 samples), 20% testing (453 samples)
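
One way such a split could be reproduced is sketched below. This is an assumption, not the documented procedure: the card does not state how the split was made, so the use of `train_test_split` and of seed 42 for splitting are guesses, and `split_sizes` and `load_splits` are hypothetical helper names. Depending on the installed `datasets` version, loading this dataset may need extra arguments or may not be supported.

```python
import math


def split_sizes(n_samples: int, test_fraction: float) -> tuple[int, int]:
    """Return (train, test) sizes, rounding the test set up."""
    n_test = math.ceil(n_samples * test_fraction)
    return n_samples - n_test, n_test


def load_splits(test_fraction: float = 0.2, seed: int = 42):
    """Download the dataset and split it (needs `datasets` and network access)."""
    from datasets import load_dataset  # imported lazily; optional dependency

    # The dataset ships a single "train" split, so the hold-out set is carved from it.
    full = load_dataset("financial_phrasebank", "sentences_allagree", split="train")
    # Assumption: the original split used this seed; the card does not say so.
    return full.train_test_split(test_size=test_fraction, seed=seed)


print(split_sizes(2264, 0.2))
```

For 2264 sentences and a 20% test fraction, `split_sizes` gives 1811 training and 453 test samples, matching the numbers above. Calling `load_splits()` would return a `DatasetDict` with `train` and `test` entries.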

### Model Configuration
- **Base Model**: [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased)
- **Number of Labels**: 3 (negative, neutral, positive)
- **Tokenizer**: Same as the base model's tokenizer

### Hyperparameters
- **Number of Epochs**: 5
- **Batch Size**: 16 (training), 64 (evaluation)
- **Learning Rate**: 5e-5
- **Optimizer**: AdamW
- **Evaluation Metric**: Accuracy
- **Seed**: 42 (for reproducibility)
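
For readers who want to reproduce the run, the settings above map naturally onto `transformers.TrainingArguments`. This is a minimal sketch, assuming the model was trained with the `Trainer` API on a single device (the card's `generated_from_trainer` tag suggests this but does not confirm the exact arguments); `HYPERPARAMS`, `build_training_args`, and the `./results` output directory are illustrative.

```python
# Hyperparameters from the list above, keyed by `TrainingArguments` parameter names.
HYPERPARAMS = {
    "num_train_epochs": 5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 64,
    "learning_rate": 5e-5,
    "seed": 42,
}


def build_training_args(output_dir: str = "./results"):
    """Create `TrainingArguments` (needs `transformers` with PyTorch installed)."""
    from transformers import TrainingArguments  # imported lazily; optional dependency

    # AdamW is the Trainer's default optimizer, so it is not set explicitly.
    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)
```

The resulting object can be passed to `Trainer(args=...)` together with the model, the datasets, and an accuracy-based `compute_metrics` function.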

## Usage
You can load and use the model with the Hugging Face `transformers` library as follows:
```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "AnkitAI/distilbert-base-uncased-financial-news-sentiment-analysis"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "The company's revenue declined significantly due to market competition."
# Truncate inputs that exceed the model's maximum sequence length.
inputs = tokenizer(text, return_tensors="pt", truncation=True)

# Disable gradient tracking for inference.
with torch.no_grad():
    outputs = model(**inputs)

# Logits have shape (batch_size, num_labels); take the argmax over the label axis.
logits = outputs.logits
predicted_class_id = logits.argmax(dim=-1).item()

label_mapping = {0: "Negative", 1: "Neutral", 2: "Positive"}
predicted_label = label_mapping[predicted_class_id]

print(f"Text: {text}")
print(f"Predicted Sentiment: {predicted_label}")
```
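
If a confidence score is needed alongside the label, the logits can be passed through a softmax. The sketch below does this in plain Python so that it runs without downloading the model; the helper names `softmax` and `describe` and the example logits are hypothetical and are not outputs of this model.

```python
import math

# Label ids as documented in the Model Description section.
LABELS = {0: "Negative", 1: "Neutral", 2: "Positive"}


def softmax(logits: list[float]) -> list[float]:
    """Convert raw logits to probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe(logits: list[float]) -> tuple[str, float]:
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Hypothetical logits, for illustration only; real values would come from
# `outputs.logits[0].tolist()` in the example above.
label, prob = describe([2.0, 0.5, -1.0])
print(f"{label} ({prob:.1%})")
```

In a PyTorch workflow the same result can be obtained with `torch.softmax(outputs.logits, dim=-1)`.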

## License
This model is licensed under the **Apache 2.0 License**. You may use, modify, and distribute it in your applications under the terms of that license.

## Citation
If you use this model in your research or applications, please cite it as:
```bibtex
@misc{AnkitAI_2024_financial_sentiment_model,
  title={DistilBERT Fine-Tuned for Financial Sentiment Analysis},
  author={Ankit Aglawe},
  year={2024},
  howpublished={\url{https://huggingface.co/AnkitAI/distilbert-base-uncased-financial-news-sentiment-analysis}},
}
```
## Acknowledgments
- **Hugging Face**: For providing the Transformers library and model hosting.
- **Data Providers**: Thanks to the creators of the Financial PhraseBank dataset.
- **Community**: Appreciation to the open-source community for continual support and contributions.

## Contact Information
For questions, feedback, or collaboration opportunities, please contact:
- **Name**: Ankit Aglawe
- **Email**: [aglawe.ankit@gmail.com](mailto:aglawe.ankit@gmail.com)
- **GitHub**: [GitHub Profile](https://github.com/ankit-aglawe)
- **LinkedIn**: [LinkedIn Profile](https://www.linkedin.com/in/ankit-aglawe)