---
license: apache-2.0
language:
- de
base_model:
- dbmdz/bert-base-german-uncased
pipeline_tag: text-classification
---

## Social Media Style Classifier for Climate Change Text (German)

This model is a fine-tuned version of [dbmdz/bert-base-german-uncased](https://huggingface.co/dbmdz/bert-base-german-uncased) for a binary classification task: determining whether a German text about climate change is written in a social media style.

Social media texts were gathered from [GerCCT](https://github.com/RobinSchaefer/GerCCT) and [r/Klimawandel](https://www.reddit.com/r/Klimawandel/). Non-social-media texts were gathered by splitting the following 15 German Wikipedia articles into sentences:

1. [Klimawandel](https://de.wikipedia.org/wiki/Klimawandel)
2. [Globale Erwärmung](https://de.wikipedia.org/wiki/Globale_Erw%C3%A4rmung)
3. [Forschungsgeschichte des Klimawandels](https://de.wikipedia.org/wiki/Forschungsgeschichte_des_Klimawandels)
4. [Klimahysterie](https://de.wikipedia.org/wiki/Klimahysterie)
5. [Klimawandelleugnung](https://de.wikipedia.org/wiki/Klimawandelleugnung)
6. [Folgen der globalen Erwärmung in der Arktis](https://de.wikipedia.org/wiki/Folgen_der_globalen_Erw%C3%A4rmung_in_der_Arktis)
7. [Folgen der globalen Erwärmung](https://de.wikipedia.org/wiki/Folgen_der_globalen_Erw%C3%A4rmung)
8. [Klimamodell](https://de.wikipedia.org/wiki/Klimamodell)
9. [Anpassung an die globale Erwärmung](https://de.wikipedia.org/wiki/Anpassung_an_die_globale_Erw%C3%A4rmung)
10. [Kontroverse um die globale Erwärmung](https://de.wikipedia.org/wiki/Kontroverse_um_die_globale_Erw%C3%A4rmung)
11. [UN-Klimakonferenz in Dubai 2023](https://de.wikipedia.org/wiki/UN-Klimakonferenz_in_Dubai_2023)
12. [Umweltbewegung](https://de.wikipedia.org/wiki/Umweltbewegung#Klimaschutz)
13. [Treibhausgas](https://de.wikipedia.org/wiki/Treibhausgas)
14. [Treibhauseffekt](https://de.wikipedia.org/wiki/Treibhauseffekt)
15. [Klimaschutz](https://de.wikipedia.org/wiki/Klimaschutz)

The dataset contained about 8K instances with a roughly 50/50 distribution between the two classes. It was shuffled with a random seed of 42 and split 80/20 into training and test sets. Training ran for three epochs with a batch size of 8 on a V100 16GB GPU; all other hyperparameters were left at the Hugging Face Trainer defaults (a minimal training sketch is included at the end of this card). The model was trained to evaluate a text style transfer task that converts formal-language texts into tweets.

### How to use

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline

model_name = "rabuahmad/cc-tweets-classifier-de"

# Load the fine-tuned model and its tokenizer, capping inputs at 512 tokens.
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name, model_max_length=512)

# Build a classification pipeline that truncates long inputs.
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer, truncation=True, max_length=512)

text = "Gestern war ein schöner Tag!"
result = classifier(text)
print(result)  # a list containing the predicted label and its score
```

Label 1 indicates that the text is predicted to be a tweet.

### Evaluation

Evaluation results on the test set:

| Metric    | Score   |
|-----------|---------|
| Accuracy  | 0.96494 |
| Precision | 0.97552 |
| Recall    | 0.95564 |
| F1        | 0.96547 |
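
The figures above are standard binary-classification metrics. As a rough illustration (not the original evaluation script), they could be recomputed with scikit-learn on a labeled test set, assuming the default `LABEL_0`/`LABEL_1` output names and gold labels encoded as 0/1; the example texts and labels below are placeholders.

```python
# Hedged sketch: recompute accuracy, precision, recall, and F1 with scikit-learn.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="rabuahmad/cc-tweets-classifier-de",
    truncation=True,
    max_length=512,
)

# Hypothetical test data: parallel lists of texts and gold labels (1 = social media style).
texts = ["Gestern war ein schöner Tag!", "Der Treibhauseffekt ist ein physikalischer Prozess."]
gold = [1, 0]

# Map the pipeline's string labels ("LABEL_0"/"LABEL_1") back to integers.
preds = [int(out["label"].split("_")[-1]) for out in classifier(texts)]

print("Accuracy :", accuracy_score(gold, preds))
print("Precision:", precision_score(gold, preds))
print("Recall   :", recall_score(gold, preds))
print("F1       :", f1_score(gold, preds))
```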
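
### Training sketch

The card describes the training setup but does not ship the training script. Below is a minimal sketch of how such a run could look with the Hugging Face `Trainer`, using the stated settings (three epochs, batch size 8, seed 42, 80/20 split, everything else at the defaults). The CSV file name and the `text`/`label` column names are assumptions for illustration, not artifacts of the original training.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

base_model = "dbmdz/bert-base-german-uncased"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForSequenceClassification.from_pretrained(base_model, num_labels=2)

# Hypothetical local file; the card does not publish the assembled dataset.
raw = load_dataset("csv", data_files="climate_style_dataset.csv")["train"]

# Shuffle with seed 42 and split 80/20, as described in the card.
splits = raw.train_test_split(test_size=0.2, seed=42)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = splits.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="cc-tweets-classifier-de",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    # Remaining hyperparameters stay at the Trainer defaults, as stated above.
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,
)
trainer.train()
```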