---
library_name: peft
base_model: meta-llama/Llama-2-13b-chat-hf
license: apache-2.0
datasets:
- irlab-udc/metahate
language:
- en
pipeline_tag: text-generation
tags:
- hate speech
---

# LLaMA2 Fine-Tuned to Avoid Engaging with Hate Speech

## Model Description

This model is `meta-llama/Llama-2-13b-chat-hf` fine-tuned with PEFT (LoRA adapters) on the MetaHate hate speech dataset (`irlab-udc/metahate`). The goal is to keep the model from exacerbating hateful discourse.

## Intended Uses & Limitations

This model is intended for research on conversational applications, specifically for stopping the model from generating or continuing hate speech.

## Bias, Risks, and Limitations

- **Biases**: The model may carry biases present in the training data.
- **False Positives/Negatives**: The model is not perfect and may still continue some hateful conversations.
- **Domain Specificity**: Performance may vary across domains.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, Conversation, pipeline

adapter_id = "irlab-udc/LLaMA2-13b-Stop-Hate"

# Load the base chat model. A PeftConfig is not a model config, so it is not passed here.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-chat-hf")

# Attach the fine-tuned PEFT adapter and load its tokenizer
model = PeftModel.from_pretrained(base_model, adapter_id)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# Test the model. The "conversational" pipeline and the Conversation class exist in
# Transformers 4.35.0 (the version listed below) but were removed in later releases.
chatbot = pipeline(task="conversational", model=model, tokenizer=tokenizer)
conversation = Conversation("Your input text here")
conversation = chatbot(conversation)
result = conversation.messages[-1]["content"]  # the model's latest reply
```

## Training Details

- **Base Model:** meta-llama/Llama-2-13b-chat-hf
- **Fine-Tuning:** PEFT (LoRA adapters)
- **Hardware:** NVIDIA RTX A6000

### Configurations and Hyperparameters

The following `LoraConfig` was used during training:

- r: 32
- lora_alpha: 64
- target_modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "lm_head"]
- lora_dropout: 0.05
- bias: "lora_only"
- task_type: "CAUSAL_LM"

The following `TrainingArguments` were used during training:

- per_device_train_batch_size: 16
- gradient_accumulation_steps: 1
- warmup_steps: 5
- max_steps: 1000
- learning_rate: 2.5e-5
- fp16: True
- optim: paged_adamw_8bit

The following `bitsandbytes` quantization config was used during training:

- quant_method: bitsandbytes
- _load_in_8bit: False
- _load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: bfloat16
- bnb_4bit_quant_storage: uint8
- load_in_4bit: True
- load_in_8bit: False

### Framework versions

- PEFT 0.6.2
- PyTorch 2.1.0
- 🤗 Transformers 4.35.0
- 🤗 Datasets 2.14.6

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** NVIDIA RTX A6000
- **Hours used:** 9
- **Cloud Provider:** Private Infrastructure
- **Carbon Efficiency (kg/kWh):** 0.432
- **Carbon Emitted (kg CO2 eq.):** 1.17

## Citation

If you use this model, please cite the following reference:

```bibtex
@article{Piot_Martín-Rodilla_Parapar_2024,
  title={MetaHate: A Dataset for Unifying Efforts on Hate Speech Detection},
  volume={18},
  url={https://ojs.aaai.org/index.php/ICWSM/article/view/31445},
  DOI={10.1609/icwsm.v18i1.31445},
  abstractNote={Hate speech represents a pervasive and detrimental form of online discourse, often manifested through an array of slurs, from hateful tweets to defamatory posts. As such speech proliferates, it connects people globally and poses significant social, psychological, and occasionally physical threats to targeted individuals and communities. Current computational linguistic approaches for tackling this phenomenon rely on labelled social media datasets for training. For unifying efforts, our study advances in the critical need for a comprehensive meta-collection, advocating for an extensive dataset to help counteract this problem effectively. We scrutinized over 60 datasets, selectively integrating those pertinent into MetaHate. This paper offers a detailed examination of existing collections, highlighting their strengths and limitations. Our findings contribute to a deeper understanding of the existing datasets, paving the way for training more robust and adaptable models. These enhanced models are essential for effectively combating the dynamic and complex nature of hate speech in the digital realm.},
  number={1},
  journal={Proceedings of the International AAAI Conference on Web and Social Media},
  author={Piot, Paloma and Martín-Rodilla, Patricia and Parapar, Javier},
  year={2024},
  month={May},
  pages={2025-2039}
}
```

## Acknowledgements

The authors acknowledge funding from the Horizon Europe research and innovation programme under the Marie Skłodowska-Curie Grant Agreement No. 101073351. The authors also acknowledge the financial support provided by the Consellería de Cultura, Educación, Formación Profesional e Universidades (accreditation 2019-2022 ED431G/01, ED431B 2022/33) and the European Regional Development Fund, which acknowledges the CITIC Research Center in ICT of the University of A Coruña as a Research Center of the Galician University System, and the project PID2022-137061OB-C21 (Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Proyectos de Generación de Conocimiento; supported by the European Regional Development Fund). The authors also acknowledge the funding of project PLEC2021-007662 (MCIN/AEI/10.13039/501100011033, Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Plan de Recuperación, Transformación y Resiliencia, Unión Europea-Next Generation EU).
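The `LoraConfig` under Training Details attaches rank-32 adapters to every attention and MLP projection and to `lm_head`. To show roughly what that means in trainable weights, the sketch below counts them. It assumes the published Llama-2-13b dimensions (hidden size 5120, MLP size 13824, 40 decoder layers, vocabulary 32000), which this card does not state. Each adapter on a `Linear(in, out)` adds an r×in matrix A and an out×r matrix B. Standard (non-rsLoRA) LoRA scales the update B·A by `lora_alpha / r`.

```python
# Rough estimate of the trainable LoRA parameters implied by the card's LoraConfig.
# ASSUMPTION: Llama-2-13b dimensions below are not stated in the card.
HIDDEN = 5120
INTERMEDIATE = 13824
NUM_LAYERS = 40
VOCAB = 32000

R = 32            # LoraConfig r
LORA_ALPHA = 64   # LoraConfig lora_alpha


def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Weights added by one LoRA adapter: A is (r x in), B is (out x r)."""
    return r * (in_features + out_features)


# (in_features, out_features) of each target module inside one decoder layer
PER_LAYER_MODULES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),  # assumed: 13B uses full multi-head attention
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def count_lora_params(r: int = R) -> int:
    """Sum adapter weights over all decoder layers plus the lm_head adapter."""
    per_layer = sum(lora_params(i, o, r) for i, o in PER_LAYER_MODULES.values())
    lm_head = lora_params(HIDDEN, VOCAB, r)  # lm_head is also in target_modules
    return NUM_LAYERS * per_layer + lm_head


# Scaling applied to the low-rank update in standard LoRA
scaling = LORA_ALPHA / R
total = count_lora_params()

if __name__ == "__main__":
    print(f"trainable LoRA parameters: {total:,}")
    print(f"LoRA scaling (alpha / r): {scaling}")
```

Under these assumptions the count is 126,361,600 (about 126M) trainable weights, under 1% of the 13B base model, with a scaling factor of 2.0. `bias: "lora_only"` adds nothing here, assuming the Llama-2 projection layers have no bias terms. PEFT's `print_trainable_parameters()` on the loaded model can be used to check this estimate.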
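The Environmental Impact figures follow the usual estimate of energy used multiplied by grid carbon intensity. The sketch below reproduces them under one assumption that the card does not state: a single RTX A6000 drawing its 300 W rated board power for all 9 hours.

```python
# Back-of-the-envelope check of the Environmental Impact figures:
# emissions = power (kW) x time (h) x carbon efficiency (kg CO2 eq. per kWh).
# ASSUMPTION: one RTX A6000 at its 300 W rated board power for the whole run.
GPU_POWER_KW = 0.300        # assumed, not stated in the card
HOURS = 9                   # "Hours used" from the card
CARBON_EFFICIENCY = 0.432   # kg CO2 eq. per kWh, from the card


def emissions_kg(power_kw: float, hours: float, kg_per_kwh: float) -> float:
    """Convert average power and runtime into energy, then into CO2 eq."""
    energy_kwh = power_kw * hours
    return energy_kwh * kg_per_kwh


estimate = emissions_kg(GPU_POWER_KW, HOURS, CARBON_EFFICIENCY)

if __name__ == "__main__":
    print(f"energy: {GPU_POWER_KW * HOURS:.2f} kWh")
    print(f"emissions: {estimate:.2f} kg CO2 eq.")
```

This gives 2.7 kWh and about 1.17 kg CO2 eq., consistent with the reported value. A different assumed power draw would scale the result linearly.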