
Model Details

Model Description

This model is a fine-tuned version of the pre-trained Romanian BERT model (bert-base-romanian-cased-v1), specialized for sentiment classification of weather forecast and horoscope texts. It assigns each text to one of two categories: positive or negative.

  • Architecture: BERT (Bidirectional Encoder Representations from Transformers)
  • Type: Text Classification
  • Language: Romanian
  • Base Model: dumitrescustefan/bert-base-romanian-cased-v1
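
As a rough illustration of how a model of this type is typically loaded, the sketch below uses the Hugging Face `transformers` text-classification pipeline. It is not taken from the author's code. The Hub id `iulik-pisik/romanian-bert-weather-horoscope` is assumed from the repository name, the helper names `build_classifier` and `classify` are hypothetical, and the label strings returned depend on the model's configuration, which this card does not document.

```python
from typing import Callable, List, Tuple

# Assumed Hub id, taken from the repository name; adjust if it differs.
MODEL_ID = "iulik-pisik/romanian-bert-weather-horoscope"


def build_classifier(model_id: str = MODEL_ID):
    """Create a text-classification pipeline (downloads weights on first use)."""
    # Imported lazily so the rest of this sketch works without transformers.
    from transformers import pipeline

    return pipeline("text-classification", model=model_id)


def classify(texts: List[str], classifier: Callable) -> List[Tuple[str, float]]:
    """Return one (label, score) pair per input text.

    `classifier` is any callable with the pipeline's interface: it takes a
    list of strings and returns a list of {"label": ..., "score": ...} dicts.
    """
    # truncation=True keeps long texts within BERT's maximum input length.
    results = classifier(texts, truncation=True)
    return [(r["label"], float(r["score"])) for r in results]
```

A call such as `classify(["Mâine va fi o zi însorită."], build_classifier())` would return a list with one label and confidence score. Whether the labels appear as `positive`/`negative` or as generic ids like `LABEL_0`/`LABEL_1` should be checked against the model's `config.json`.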

Uses

This model is intended for:

  1. Automatic sentiment classification in Romanian weather forecast and horoscope texts.
  2. Evaluating the effectiveness of Automatic Speech Recognition (ASR) systems in preserving the overall sentiment and meaning of the original text.
  3. Applications requiring rapid sentiment analysis in specific domains (meteorology and astrology) without the need for perfect text transcription.
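
One way to read the ASR use case in item 2 is as a label-agreement check: classify the reference text and the ASR transcript, then count how often the two predictions match. The sketch below shows that idea under stated assumptions. The function name `sentiment_agreement` and the `predict_label` callable are hypothetical, and the card does not say that the author computed this exact measure.

```python
from typing import Callable, Sequence


def sentiment_agreement(
    references: Sequence[str],
    transcripts: Sequence[str],
    predict_label: Callable[[str], str],
) -> float:
    """Fraction of pairs whose reference and ASR transcript get the same label.

    `predict_label` is any function mapping a text to a sentiment label,
    for example a thin wrapper around the fine-tuned classifier.
    """
    if len(references) != len(transcripts):
        raise ValueError("references and transcripts must have the same length")
    if not references:
        raise ValueError("at least one pair is required")

    matches = sum(
        predict_label(ref) == predict_label(hyp)
        for ref, hyp in zip(references, transcripts)
    )
    return matches / len(references)
```

A high agreement rate would suggest that transcription errors rarely flip the predicted sentiment, even when the transcript is not word-perfect.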

The model is not suitable for:

  1. Sentiment classification in domains other than weather forecasts and horoscopes.
  2. Detailed analysis of emotional nuances or identification of specific emotions.
  3. Use in contexts requiring extremely high transcription accuracy.

Training Details

Training Data

The model was trained using two datasets:

  • iulik-pisik/audio_vreme: Transcriptions of weather forecasts
  • iulik-pisik/horoscop_neti: Transcriptions of horoscopes

The training data was automatically labeled using the OpenAI GPT-3.5 Turbo API. Neutral texts were excluded from the training set to focus on clear positive/negative distinctions.
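
The filtering step described above could look roughly like the following sketch. The label strings (`positive`, `negative`, `neutral`), the integer mapping, and the helper name `prepare_examples` are assumptions made for illustration. The card does not specify the labeling prompt or the output format used with the GPT-3.5 Turbo API.

```python
from typing import Iterable, List, Tuple

# Assumed label-to-id mapping for binary classification.
LABEL_TO_ID = {"negative": 0, "positive": 1}


def prepare_examples(rows: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Keep only rows with a clear positive/negative label.

    Each row is (text, raw_label). Rows labeled neutral, or with any label
    outside LABEL_TO_ID, are dropped rather than guessed.
    """
    prepared = []
    for text, raw_label in rows:
        label = raw_label.strip().lower()  # tolerate case/whitespace noise
        if label not in LABEL_TO_ID:
            continue
        prepared.append((text, LABEL_TO_ID[label]))
    return prepared
```

Dropping unrecognized labels along with neutral ones is a design choice in this sketch. It avoids training on rows where the automatic labeler produced something unexpected.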

Evaluation

Testing Data, Factors & Metrics

Testing Data

The model was evaluated on:

  1. A manually annotated subset of the training datasets.
  2. Transcriptions generated by various custom Whisper models for Romanian ASR.

Metrics

The primary metric used for evaluation is accuracy.

Results

  • Overall accuracy on annotations: 0.9137
  • Accuracy for weather texts: 0.9189
  • Accuracy for horoscope texts: 0.8964
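
For readers who want to reproduce this kind of breakdown on their own data, the sketch below computes accuracy per domain and overall. The record layout `(domain, gold_label, predicted_label)` and the function name `accuracy_by_domain` are assumptions for illustration and do not come from the author's evaluation code.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def accuracy_by_domain(
    records: Iterable[Tuple[str, Hashable, Hashable]],
) -> Dict[str, float]:
    """Accuracy per domain, plus an "overall" entry across all records.

    Each record is (domain, gold_label, predicted_label).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for domain, gold, predicted in records:
        # Count every record once for its domain and once for the overall score.
        for key in (domain, "overall"):
            total[key] += 1
            correct[key] += int(gold == predicted)
    return {key: correct[key] / total[key] for key in total}
```

Because the overall score is an average weighted by the number of examples in each domain, it always lies between the per-domain scores. The reported 0.9137 sits between 0.8964 and 0.9189, which is consistent with that.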

On ASR transcriptions from the best-performing custom Whisper model, accuracy was slightly lower than on the manual annotations but remained comparable.

Model size: 124M params (Safetensors, tensor type F32)