Edit model card

Spanish News Classification Headlines

SNCH: this model was develop by M47Labs the goal is text classification, the base model use was BETO, it was fine-tuned on 1000 example dataset.

Dataset Sample

Dataset size : 1000

Columns: idTask,task content 1,idTag,tag.

idTask task content 1 idTag tag
3637d9ac-119c-4a8f-899c-339cf5b42ae0 Alcalá de Guadaíra celebra la IV Semana de la Diversidad Sexual con acciones de sensibilización 81b36360-6cbf-4ffa-b558-9ef95c136714 sociedad
d56bab52-0029-45dd-ad90-5c17d4ed4c88 El Archipiélago Chinijo Graciplus se impone en el Trofeo Centro Comercial Rubicón ed198b6d-a5b9-4557-91ff-c0be51707dec deportes
dec70bc5-4932-4fa2-aeac-31a52377be02 Un total de 39 personas padecen ELA actualmente en la provincia 81b36360-6cbf-4ffa-b558-9ef95c136714 sociedad
fb396ba9-fbf1-4495-84d9-5314eb731405 Eurocopa 2021 : Italia vence a Gales y pasa a octavos con su candidatura reforzada ed198b6d-a5b9-4557-91ff-c0be51707dec deportes
bc5a36ca-4e0a-422e-9167-766b41008c01 Resolución de 10 de junio de 2021, del Ayuntamiento de Tarazona de La Mancha (Albacete), referente a la convocatoria para proveer una plaza. 81b36360-6cbf-4ffa-b558-9ef95c136714 sociedad
a87f8703-ce34-47a5-9c1b-e992c7fe60f6 El primer ministro sueco pierde una moción de censura 209ae89e-55b4-41fd-aac0-5400feab479e politica
d80bdaad-0ad5-43a0-850e-c473fd612526 El dólar se dispara tras la reunión de la Fed 11925830-148e-4890-a2bc-da9dc059dc17 economia

Labels:

  • ciencia_tecnologia

  • clickbait

  • cultura

  • deportes

  • economia

  • educacion

  • medio_ambiente

  • opinion

  • politica

  • sociedad

Example of Use

Pipeline


import torch
from transformers import AutoTokenizer, BertForSequenceClassification,TextClassificationPipeline


review_text = 'los vehiculos que esten esperando pasajaeros deberan estar apagados para reducir emisiones'
path = "M47Labs/spanish_news_classification_headlines"
tokenizer = AutoTokenizer.from_pretrained(path)
model = BertForSequenceClassification.from_pretrained(path)


nlp = TextClassificationPipeline(task = "text-classification",
                model = model,
                tokenizer = tokenizer)

print(nlp(review_text))

[{'label': 'medio_ambiente', 'score': 0.5648820996284485}]

Pytorch


import torch
from transformers import AutoTokenizer, BertForSequenceClassification,TextClassificationPipeline
from numpy import np

model_name  = 'M47Labs/spanish_news_classification_headlines'
MAX_LEN = 32


tokenizer = AutoTokenizer.from_pretrained(model_name)

model = AutoModelForSequenceClassification.from_pretrained(model_name)

texto = "las emisiones estan bajando, debido a las medidas ambientales tomadas por el gobierno"


encoded_review = tokenizer.encode_plus(
  texto,
  max_length=MAX_LEN,
  add_special_tokens=True,
  #return_token_type_ids=False,
  pad_to_max_length=True,
  return_attention_mask=True,
  return_tensors='pt',
)

input_ids = encoded_review['input_ids']
attention_mask = encoded_review['attention_mask']
output = model(input_ids, attention_mask)

_, prediction = torch.max(output['logits'], dim=1)
print(f'Review text: {texto}')

print(f'Sentiment  : {model.config.id2label[prediction.detach().cpu().numpy()[0]]}')

Review text: las emisiones estan bajando, debido a las medidas ambientales tomadas por el gobierno

Sentiment : medio_ambiente

A more in depth example on how to use the model can be found in this colab notebook: https://colab.research.google.com/drive/1XsKea6oMyEckye2FePW_XN7Rf8v41Cw_?usp=sharing

Finetune Hyperparameters

  • MAX_LEN = 32
  • TRAIN_BATCH_SIZE = 8
  • VALID_BATCH_SIZE = 4
  • EPOCHS = 5
  • LEARNING_RATE = 1e-05

Train Results

n_example epoch loss acc
100 0 2.286327266693115 12.5
100 1 2.018876111507416 40.0
100 2 1.8016730904579163 43.75
100 3 1.6121837735176086 46.25
100 4 1.41565443277359 68.75
n_example epoch loss acc
500 0 2.0770938420295715 24.5
500 1 1.6953029704093934 50.25
500 2 1.258900796175003 64.25
500 3 0.8342628020048142 78.25
500 4 0.5135736921429634 90.25
n_example epoch loss acc
1000 0 1.916002897115854 36.1997226074896
1000 1 1.2941598492664295 62.2746185852982
1000 2 0.8201534710415117 76.97642163661581
1000 3 0.524806430051615 86.9625520110957
1000 4 0.30662027455784463 92.64909847434119

Validation Results

n_examples 100
Accuracy Score 0.35
Precision (Macro) 0.35
Recall (Macro) 0.16
n_examples 500
Accuracy Score 0.62
Precision (Macro) 0.60
Recall (Macro) 0.47
n_examples 1000
Accuracy Score 0.68
Precision(Macro) 0.68
Recall (Macro) 0.64

alt text

Downloads last month
1,491