title: Submission Oriaz
emoji: 🔥
colorFrom: yellow
colorTo: green
sdk: docker
pinned: false
Benchmark using different techniques
ML model for Climate Disinformation Classification
Model Description
Intended Use
- Primary intended uses: Baseline comparison for climate disinformation classification models
- Primary intended users: Researchers and developers participating in the Frugal AI Challenge
- Out-of-scope use cases: Not intended for production use or real-world classification tasks
Training Data
The model uses the QuotaClimat/frugalaichallenge-text-train dataset:
- Size: ~6000 examples
- Split: 80% train, 20% test
- 8 categories of climate disinformation claims
Labels
- No relevant claim detected
- Global warming is not happening
- Not caused by humans
- Not bad or beneficial
- Solutions harmful/unnecessary
- Science is unreliable
- Proponents are biased
- Fossil fuels are needed
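For code that consumes these labels, a simple lookup table is often handy. The sketch below copies the eight label names from the list above; the integer ids are an assumption (plain list order) and may not match the encoding used in the dataset itself. The names `LABELS`, `LABEL_TO_ID` and `ID_TO_LABEL` are hypothetical.

```python
# Label names copied from the list above. The integer ids are an
# assumption (list order), not necessarily the dataset's own encoding.
LABELS = [
    "No relevant claim detected",
    "Global warming is not happening",
    "Not caused by humans",
    "Not bad or beneficial",
    "Solutions harmful/unnecessary",
    "Science is unreliable",
    "Proponents are biased",
    "Fossil fuels are needed",
]

# Forward and reverse lookups between label names and integer ids.
LABEL_TO_ID = {name: idx for idx, name in enumerate(LABELS)}
ID_TO_LABEL = {idx: name for name, idx in LABEL_TO_ID.items()}
```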
Performance
Metrics (I used an NVIDIA T4 small GPU)
- Accuracy: ~69-72%
- Environmental Impact:
  - Emissions tracked in gCO2eq (~0.7 g)
  - Energy consumption tracked in Wh (~1.8 Wh)
Model Architecture
ML models work on numeric values, so the quotes need to be embedded first. I used the MTEB Leaderboard on Hugging Face to find the model with the best trade-off between performance and number of parameters.
I then chose the "dunzhang/stella_en_400M_v5" model as the embedder. It has the 7th best performance score with only 400M parameters.
Once the quotes are embedded, I have a matrix of 6091 samples x 1024 features. I then split it into train and test sets (70% / 30%).
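The embedding step could look roughly like the sketch below. It assumes the `sentence-transformers` library is used to load the model, which the text does not state explicitly. The helper names `load_embedder` and `embed_quotes` are hypothetical, and passing `trust_remote_code=True` is an assumption about what this model needs in order to load.

```python
def load_embedder(model_name="dunzhang/stella_en_400M_v5"):
    """Load the embedding model (downloads the weights on first use)."""
    # Imported lazily so the rest of the sketch works without the library.
    from sentence_transformers import SentenceTransformer

    # Assumption: this model ships custom code and needs trust_remote_code.
    return SentenceTransformer(model_name, trust_remote_code=True)


def embed_quotes(quotes, embedder, batch_size=32):
    """Turn a list of strings into one embedding vector per quote."""
    # Any object with a SentenceTransformer-like encode() method works here.
    return embedder.encode(quotes, batch_size=batch_size)
```

Calling `embed_quotes(quotes, load_embedder())` on the full dataset would then give the samples x features matrix described above. The batch size of 32 is an arbitrary default for illustration, not a tuned value.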
Using the TPOT classifier search, I found that the best model on my data was a Logistic Regression.
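A minimal sketch of the split and the final classifier, assuming scikit-learn. It runs on synthetic data shaped like the embeddings (1024 features, 8 classes), since the real embeddings are not included here. The `stratify`, `random_state` and `max_iter` settings are assumptions; the hyperparameters selected by TPOT are not given in the text.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the embedded quotes (the real matrix is
# described as 6091 samples x 1024 features with 8 classes).
rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 400, 1024, 8
y = rng.integers(0, n_classes, size=n_samples)
# Each sample is its class centre plus noise, so the toy data is learnable.
centers = rng.normal(size=(n_classes, n_features))
X = centers[y] + rng.normal(size=(n_samples, n_features))

# 70% / 30% split as in the text; stratify and random_state are assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# max_iter is raised as a precaution; it is not a value from the text.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"toy accuracy: {accuracy:.2f}")
```

The accuracy printed here only reflects the synthetic data and says nothing about the real task.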
Here is the confusion matrix:
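For readers who want to reproduce such a matrix, the sketch below shows one way to build it in plain Python. The function names are hypothetical and the labels are toy values for illustration, not the model's real predictions.

```python
def build_confusion_matrix(y_true, y_pred, n_classes):
    """Count predictions: rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


def accuracy_from_matrix(matrix):
    """Accuracy is the diagonal (correct predictions) over the total count."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total if total else 0.0


# Toy labels for illustration only.
y_true = [0, 0, 1, 2, 2, 2]
y_pred = [0, 1, 1, 2, 2, 0]
cm = build_confusion_matrix(y_true, y_pred, n_classes=3)
```

Off-diagonal cells show which classes get confused with each other, which is usually more informative than the accuracy alone on an 8-class problem.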
Environmental Impact
Environmental impact is tracked using CodeCarbon, measuring:
- Carbon emissions during inference
- Energy consumption during inference
This tracking helps establish a baseline for the environmental impact of model deployment and inference.
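A minimal sketch of how inference could be wrapped with CodeCarbon, assuming its `EmissionsTracker` with the `start()` / `stop()` interface. The helper names `make_tracker` and `run_tracked` are hypothetical, and the actual tracking code used for this submission may differ.

```python
def make_tracker():
    """Create a CodeCarbon tracker (requires the codecarbon package)."""
    # Imported lazily so run_tracked can be used with any compatible tracker.
    from codecarbon import EmissionsTracker

    return EmissionsTracker()


def run_tracked(fn, tracker):
    """Run fn() between tracker.start() and tracker.stop().

    Returns (result, emissions_g). CodeCarbon's stop() reports kg CO2eq,
    so the value is converted to grams here.
    """
    tracker.start()
    try:
        result = fn()
    finally:
        # Always stop the tracker, even if fn() raises.
        emissions_kg = tracker.stop()
    # stop() may return None if tracking failed; treat that as unknown.
    emissions_g = None if emissions_kg is None else emissions_kg * 1000.0
    return result, emissions_g
```

With a real tracker, `run_tracked(lambda: clf.predict(X_test), make_tracker())` would return the predictions together with the measured emissions in grams, where `clf` and `X_test` stand for whatever model and data are being evaluated.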
Limitations
- The embedding phase takes ~30 seconds for 1800 quotes. It could be optimised, and it has a real influence on carbon emissions.
- It is hard to go above 70% accuracy with "simple" ML.
- Textual data carries nuances of interpretation that small models cannot capture.
Ethical Considerations
- Dataset contains sensitive topics related to climate disinformation
- Environmental impact is tracked to promote awareness of AI's carbon footprint