---
pipeline_tag: zero-shot-classification
tags:
- zero-shot-classification
- nli
language:
- es
datasets:
- hackathon-pln-es/nli-es
widget:
- text: "Para detener la pandemia, es importante que todos se presenten a vacunarse."
  candidate_labels: "salud, deporte, entretenimiento"
---


# A zero-shot classifier based on bertin-roberta-base-spanish
This model was fine-tuned from `bertin-roberta-base-spanish` as a **cross-encoder** for the NLI task. A CrossEncoder takes a sentence pair as input and outputs a label; here it learns to predict one of three labels: "contradiction": 0, "entailment": 1, "neutral": 2.
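
As a minimal sketch of the label mapping above, the snippet below decodes a three-way NLI output into a label name. The logits are invented for illustration and `decode_nli` is a hypothetical helper, not part of any library.

```python
# Label ids as stated in this model card.
LABEL2ID = {"contradiction": 0, "entailment": 1, "neutral": 2}
ID2LABEL = {idx: name for name, idx in LABEL2ID.items()}


def decode_nli(logits):
    """Return the name of the label with the highest logit (hypothetical helper)."""
    best_id = max(range(len(logits)), key=lambda i: logits[i])
    return ID2LABEL[best_id]


# Made-up logits for a single (premise, hypothesis) pair.
example_logits = [-1.2, 3.4, 0.1]
print(decode_nli(example_logits))
```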

You can use it with Hugging Face's zero-shot pipeline to make **zero-shot classifications**. Given a sentence and an arbitrary set of labels/topics, it outputs the likelihood of the sentence belonging to each topic.
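
To illustrate how NLI outputs can become per-topic likelihoods, here is a rough sketch that, to our understanding, mirrors the pipeline's single-label behavior: take the entailment logit for each candidate label and apply a softmax across labels. The logits below are invented and `zero_shot_scores` is a hypothetical helper; in practice the pipeline does this for you.

```python
import math

ENTAILMENT_ID = 1  # "entailment" id from the label mapping in this card


def zero_shot_scores(logits_per_label):
    """Softmax over the entailment logits of all candidate labels (hypothetical helper)."""
    entail = {label: logits[ENTAILMENT_ID] for label, logits in logits_per_label.items()}
    peak = max(entail.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - peak) for label, value in entail.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Made-up [contradiction, entailment, neutral] logits, one triple per candidate label.
hypothetical_logits = {
    "salud": [-2.0, 3.0, 0.5],
    "deporte": [1.5, -1.0, 0.8],
    "entretenimiento": [1.0, -0.5, 1.2],
}

scores = zero_shot_scores(hypothetical_logits)
for label, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label}: {score:.3f}")
```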

## Usage (HuggingFace Transformers)
The simplest way to use the model is through the Hugging Face Transformers `pipeline` API. Initialize the pipeline with the task `"zero-shot-classification"` and the model `"hackathon-pln-es/bertin-roberta-base-zeroshot-esnli"`.

```python
from transformers import pipeline

# Load the zero-shot classification pipeline with this model.
classifier = pipeline("zero-shot-classification",
                      model="hackathon-pln-es/bertin-roberta-base-zeroshot-esnli")

# Classify a Spanish sentence against a set of candidate topics.
result = classifier(
    "El autor se perfila, a los 50 años de su muerte, como uno de los grandes de su siglo",
    candidate_labels=["cultura", "sociedad", "economia", "salud", "deportes"],
    hypothesis_template="Esta oración es sobre {}."
)
print(result)
```

The `hypothesis_template` parameter matters: since the model was trained on Spanish NLI data, the template should be in Spanish. **The widget on the right uses the default English template, "This example is {}.", so its results may differ from the example above.**
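
The sketch below shows what the template does: each candidate label is substituted into the `{}` placeholder to form the hypothesis that is paired with the input sentence. `build_hypotheses` is a hypothetical helper written for illustration; it is not a function of the Transformers library.

```python
def build_hypotheses(candidate_labels, hypothesis_template):
    """Fill the template once per label (hypothetical helper)."""
    return [hypothesis_template.format(label) for label in candidate_labels]


labels = ["cultura", "salud"]

# Spanish template, matching the language of the training data.
spanish = build_hypotheses(labels, "Esta oración es sobre {}.")

# Default English template, which mixes languages in the hypothesis.
default = build_hypotheses(labels, "This example is {}.")

for hypothesis in spanish + default:
    print(hypothesis)
```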

## Training

We used [sentence-transformers](https://www.SBERT.net) to train the model.

**Dataset**

We used a collection of Natural Language Inference datasets as training data:
 - [ESXNLI](https://raw.githubusercontent.com/artetxem/esxnli/master/esxnli.tsv), Spanish portion only
 - [SNLI](https://nlp.stanford.edu/projects/snli/), automatically translated
 - [MultiNLI](https://cims.nyu.edu/~sbowman/multinli/), automatically translated

The combined dataset is available [here](https://huggingface.co/datasets/hackathon-pln-es/nli-es).

## Authors

- [Anibal Pérez](https://huggingface.co/Anarpego)
- [Emilio Tomás Ariza](https://huggingface.co/medardodt)
- [Lautaro Gesuelli Pinto](https://huggingface.co/Lautaro)
- [Mauricio Mazuecos](https://huggingface.co/mmazuecos)