Model Card for Model ID

This model is obtained by fine-tuning the BLOOM model on prompts built from two Italian classification tasks, without any language adaptation step. The fine-tuning data comes from two well-known EVALITA tasks: AMI2020 (misogyny detection) and HASPEEDE-v2-2020 (hate-speech detection).

Model Details

Model Description

The BLOOM model is fine-tuned directly, with no intermediate language adaptation, on prompts for two Italian classification tasks drawn from well-known EVALITA campaigns: AMI2020 (misogyny detection) and HASPEEDE-v2-2020 (hate-speech detection).

We transformed the training data of the two tasks into an LLM prompt following a template. For the AMI task, we used the following template:

instruction: Nel testo seguente si esprime odio contro le donne? Rispondi sì o no., input: <text>, output: <sì/no>.

Similarly, for HASPEEDE we used:

instruction: Il testo seguente incita all’odio? Rispondi sì o no., input: <text>, output: <sì/no>.

To fill these templates, we mapped the label "1" to the word "sì" and the label "0" to the word "no"; <text> is the sentence from the dataset to be classified.
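The template filling described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' preprocessing code: the names `build_prompt`, `LABEL_TO_WORD`, `AMI_INSTRUCTION`, and `HASPEEDE_INSTRUCTION` are hypothetical, and the sketch assumes the final period after `<sì/no>` in the templates is sentence punctuation rather than part of the prompt.

```python
# Instruction strings copied from the templates in this card.
AMI_INSTRUCTION = "Nel testo seguente si esprime odio contro le donne? Rispondi sì o no."
HASPEEDE_INSTRUCTION = "Il testo seguente incita all’odio? Rispondi sì o no."

# Label mapping stated in the card: "1" -> "sì", "0" -> "no".
LABEL_TO_WORD = {"1": "sì", "0": "no"}


def build_prompt(instruction, text, label=None):
    """Fill the prompt template (hypothetical helper).

    With a label, the result is a full training example.
    Without a label, the prompt stops at "output:" so that the
    model can generate the answer at inference time.
    """
    prompt = f"instruction: {instruction}, input: {text}, output:"
    if label is not None:
        # Accept labels given either as int (1) or str ("1").
        prompt += f" {LABEL_TO_WORD[str(label)]}"
    return prompt


if __name__ == "__main__":
    # Training-style example for the AMI task (placeholder text).
    print(build_prompt(AMI_INSTRUCTION, "testo di esempio", label=1))
    # Inference-style prompt for the HASPEEDE task.
    print(build_prompt(HASPEEDE_INSTRUCTION, "testo di esempio"))
```

Leaving the label out yields a prompt ending in "output:", which is one plausible way to query the fine-tuned model; the exact inference format used by the authors is not stated in this card.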

To fine-tune the model, we used the script available at: https://github.com/hyintell/BLOOM-fine-tuning/tree/main

  • Developed by: Pierpaolo Basile, Pierluigi Cassotti, Marco Polignano, Lucia Siciliani, Giovanni Semeraro. Department of Computer Science, University of Bari Aldo Moro, Italy
  • Model type: BLOOM
  • Language(s) (NLP): Italian
  • License: BigScience BLOOM RAIL 1.0

Citation

Pierpaolo Basile, Pierluigi Cassotti, Marco Polignano, Lucia Siciliani, Giovanni Semeraro. On the impact of Language Adaptation for Large Language Models: A case study for the Italian language using only open resources. Proceedings of the Ninth Italian Conference on Computational Linguistics (CLiC-it 2023).
