---
license: apache-2.0
language:
- multilingual
library_name: transformers
tags:
- climate
---
Multilingual version of [CatastroBERT](https://huggingface.co/epfl-dhlab/CatastroBERT).
# CatastroBERT: a model for extreme weather event detection in French text
This model helps detect paragraphs or articles relevant to extreme weather events in French text. It is based on the [camembert-base](https://huggingface.co/camembert-base) model and was trained on manually annotated data (article summaries) from the Gazette de Lausanne archives collected by [impresso](https://impresso-project.ch/).
<div align=center>
<img src="bert_illustration.png" width="500" height="500" />
</div>
## Model Description
- **Developed by:** Lucas Nicolas
- **Language(s) (NLP):** French
- **Finetuned from model:** [camembert-base](https://huggingface.co/camembert-base) (RoBERTa checkpoint)
- **Repository:** Check the [CatastroBERT](https://github.com/dh-epfl-students/dhlab-CatastroBERT) GitHub page for more usage examples and information.
## Usage
### In Transformers
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "epfl-dhlab/CatastroBERT"
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
# Load the fine-tuned weights and move the model to the same device as the inputs
model = AutoModelForSequenceClassification.from_pretrained(model_name).to(device)
model.eval()

def predict(text):
    # Prepare the text data
    inputs = tokenizer(
        text,
        add_special_tokens=True,
        padding=True,
        max_length=512,
        truncation=True,
        return_tensors="pt",
    )
    ids = inputs["input_ids"].to(device)
    mask = inputs["attention_mask"].to(device)
    # Get predictions
    with torch.no_grad():
        outputs = model(ids, attention_mask=mask)
    logits = outputs.logits
    # Apply sigmoid function to get probabilities
    probs = torch.sigmoid(logits).cpu().numpy()
    # Return the probability of the class (1)
    return probs[0][0]

# Example usage
text = "Un violent ouragan du sud-ouest est passé cette nuit sur Lausanne."
print(f"Prediction: {predict(text)}")
```
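Since `predict` returns a probability rather than a label, a common next step is to filter a list of paragraphs by a cut-off. The following is a minimal sketch of that step, not part of the original model card: the helper `flag_relevant`, the stand-in scorer `toy_score`, and the threshold of 0.5 are all assumptions made for illustration. The scorer is passed in as an argument so that the sketch runs without downloading the model.

```python
from typing import Callable, List, Tuple


def flag_relevant(
    paragraphs: List[str],
    score_fn: Callable[[str], float],
    threshold: float = 0.5,  # assumed cut-off, not a value from the model card
) -> List[Tuple[str, float]]:
    """Return (paragraph, score) pairs whose score reaches the threshold."""
    flagged = []
    for paragraph in paragraphs:
        score = float(score_fn(paragraph))
        if score >= threshold:
            flagged.append((paragraph, score))
    # Highest-scoring paragraphs first
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-in scorer so the sketch runs offline;
# in practice, pass the `predict` function defined above instead.
def toy_score(paragraph: str) -> float:
    return 0.9 if "ouragan" in paragraph.lower() else 0.1


paragraphs = [
    "Un violent ouragan du sud-ouest est passé cette nuit sur Lausanne.",
    "Le conseil communal s'est réuni hier soir.",
]
for paragraph, score in flag_relevant(paragraphs, toy_score):
    print(f"{score:.2f}  {paragraph}")
```

With the real model, calling `flag_relevant(paragraphs, predict)` would apply the same filtering; a suitable threshold is best chosen on held-out annotated data rather than fixed at 0.5.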
### Training Data
This model was trained on a manually annotated dataset (article summaries) curated from the Gazette de Lausanne archives collected by the [impresso](https://impresso-project.ch/) project. The dataset is composed of 4500 article summaries, of which 3500 were used for training and 1000 for validation.
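A split of that size can be produced as in the sketch below. The model card does not say how the split was actually drawn, so the shuffle, the fixed seed, the helper name `split_dataset`, and the placeholder records are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(
    examples: Sequence, n_train: int = 3500, n_val: int = 1000, seed: int = 0
) -> Tuple[List, List]:
    """Shuffle the examples and cut them into training and validation parts."""
    if n_train + n_val > len(examples):
        raise ValueError("not enough examples for the requested split")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed for reproducibility
    return shuffled[:n_train], shuffled[n_train:n_train + n_val]


# Placeholder records standing in for the 4500 annotated summaries
examples = [f"summary_{i}" for i in range(4500)]
train, val = split_dataset(examples)
print(len(train), len(val))
```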
## Environmental Impact
Carbon emissions estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
- **Hardware Type:** RTX 3090
- **Hours used:** 26
- **Carbon Emitted:** 0.07 kg CO2