Herelles committed
Commit 64a2a87
1 Parent(s): 086678f

Create README.md

Files changed (1)
  1. README.md +126 -0
README.md ADDED
@@ -0,0 +1,126 @@
+ ---
+ datasets:
+ - Herelles/lupan
+ language:
+ - fr
+ tags:
+ - text classification
+ - pytorch
+ - camembert
+ - urban planning
+ - natural risks
+ - risk management
+ - geography
+ ---
+ # CamemBERT LUPAN (Local Urban Plans And Natural risks)
+ ## Overview
+
+ In France, urban planning and natural risk management rely on Local Land Plans (PLU, Plan Local d'Urbanisme) and Natural Risk Prevention Plans (PPRn, Plan de Prévention des Risques naturels), which contain land use rules. To facilitate automatic extraction of these rules, we manually annotated a number of such documents covering Montpellier, a rapidly evolving agglomeration exposed to natural risks, and then fine-tuned a model on the annotations.
+
+ This model classifies French input text according to whether it contains an urban planning rule. It outputs one of 4 classes: Verifiable (the rule can be verified with satellite images), Non-verifiable (the rule cannot be verified with satellite images), Informative (a non-strict rule in the form of a recommendation), and Not pertinent (none of the above). For better results, we recommend adding a title and a subtitle to each textual input.
+
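The four classes described above appear to correspond to the label strings used in the usage example later in this README (Informative to 'Pertinent (Soft)', Non-verifiable to 'Pertinent (Strict, Non-verifiable)', and so on); this correspondence is inferred from the names rather than stated explicitly. The sketch below records that label order and shows one possible way to follow the title and subtitle recommendation. `build_input` is a hypothetical helper, the newline separator is an assumption, and treating the first two lines of the README's sample segment as title and subtitle is also an assumption.

```python
# Label order as used in the usage example of this README (indices 0..3).
ID2LABEL = {
    0: "Not pertinent",
    1: "Pertinent (Soft)",
    2: "Pertinent (Strict, Non-verifiable)",
    3: "Pertinent (Strict, Verifiable)",
}


def build_input(text, title="", subtitle=""):
    """Prepend an optional title and subtitle to a segment.

    Hypothetical helper: the parts are joined with newlines, which is an
    assumption and not a documented requirement of the model.
    """
    parts = [part.strip() for part in (title, subtitle, text) if part and part.strip()]
    return "\n".join(parts)


# Sample text taken from the usage example; the title/subtitle roles are assumed.
segment = build_input(
    "Les constructions destinées à l'habitation ne dépendant pas d'une "
    "exploitation agricole autres que celles visées à l'article 2 paragraphe 1).",
    title="Article 1 : Occupations ou utilisations du sol interdites",
    subtitle="1) Dans l'ensemble de la zone sont interdits :",
)
print(segment)
```

The resulting string can be passed to the tokenizer in place of a bare paragraph; empty titles or subtitles are simply skipped.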
+ For more details, please refer to our article: https://www.nature.com/articles/s41597-023-02705-y
+
+ ## Training and evaluation data
+
+ The model is fine-tuned from CamemBERT on our corpus: https://huggingface.co/datasets/Herelles/lupan
+
+ This is the first French-language corpus in the fields of urban planning and natural risk management.
+
+ ## Example of use
+
+ Note: to run this code you need to have `transformers`, `torch`, `numpy` and `sentencepiece` (required by `CamembertTokenizer`) installed. You can install them with `pip install transformers torch numpy sentencepiece`.
+
+ Load the necessary libraries:
+ ```
+ from transformers import CamembertTokenizer, CamembertForSequenceClassification
+ import torch
+ import numpy as np
+ ```
+
+ Define the tokenizer:
+ ```
+ tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
+ ```
+
+ Define the model:
+ ```
+ # Use the GPU when one is available, otherwise fall back to the CPU
+ device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+
+ model = CamembertForSequenceClassification.from_pretrained("herelles/camembert-base-lupan")
+ model.to(device)
+ ```
+
+ Define the segment to classify:
+ ```
+ new_segment = '''Article 1 : Occupations ou utilisations du sol interdites
+
+ 1) Dans l’ensemble de la zone sont interdits :
+
+ Les constructions destinées à l’habitation ne dépendant pas d’une exploitation agricole autres
+ que celles visées à l’article 2 paragraphe 1).'''
+ ```
+
+ Get the prediction:
+ ```
+ # Apply the tokenizer; the encoding holds the input IDs and the attention mask
+ encoding = tokenizer(new_segment, padding="longest", return_tensors="pt")
+ test_ids = encoding['input_ids']
+ test_attention_mask = encoding['attention_mask']
+
+ # Forward pass, calculate logit predictions
+ with torch.no_grad():
+     output = model(test_ids.to(device), token_type_ids = None, attention_mask = test_attention_mask.to(device))
+
+ # Index of the highest logit for the single input segment
+ prediction = int(np.argmax(output.logits.cpu().numpy(), axis=1)[0])
+
+ if prediction == 0:
+     pred_label = 'Not pertinent'
+ elif prediction == 1:
+     pred_label = 'Pertinent (Soft)'
+ elif prediction == 2:
+     pred_label = 'Pertinent (Strict, Non-verifiable)'
+ elif prediction == 3:
+     pred_label = 'Pertinent (Strict, Verifiable)'
+
+ print('Input text: ', new_segment)
+ print('\n\nPredicted Class: ', pred_label)
+ ```
+
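The example above keeps only the index of the largest logit. When a confidence estimate is also useful, the logits can be turned into class probabilities with a softmax. The sketch below does this in plain Python. `LABELS` repeats the label order from the example, while `softmax`, `describe` and the logit values are illustrative assumptions, not real model output. With the real model, `output.logits[0].tolist()` would supply the list of four logits.

```python
import math

# Label order taken from the usage example (indices 0..3).
LABELS = [
    "Not pertinent",
    "Pertinent (Soft)",
    "Pertinent (Strict, Non-verifiable)",
    "Pertinent (Strict, Verifiable)",
]


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    highest = max(logits)
    exps = [math.exp(value - highest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def describe(logits):
    """Return the predicted label and the per-class probabilities."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs


# Made-up logits for illustration only, not produced by the model.
example_logits = [-1.2, 0.3, 2.5, 0.1]
label, probs = describe(example_logits)

print("Predicted class:", label)
for name, prob in zip(LABELS, probs):
    print(f"{name}: {prob:.3f}")
```

Reporting the full distribution makes borderline segments easier to spot, for example when two classes receive similar probabilities.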
+ ## Citation
+
+ To cite the data set please use:
+ ```
+ @article{koptelov2023manually,
+   title={A manually annotated corpus in French for the study of urbanization and the natural risk prevention},
+   author={Koptelov, Maksim and Holveck, Margaux and Cremilleux, Bruno and Reynaud, Justine and Roche, Mathieu and Teisseire, Maguelonne},
+   journal={Scientific Data},
+   volume={10},
+   number={1},
+   pages={818},
+   year={2023},
+   publisher={Nature Publishing Group UK London}
+ }
+ ```
+
+ To cite the code please use:
+ ```
+ @inproceedings{koptelov2023towards,
+   title={Towards a (Semi-) Automatic Urban Planning Rule Identification in the French Language},
+   author={Koptelov, Maksim and Holveck, Margaux and Cremilleux, Bruno and Reynaud, Justine and Roche, Mathieu and Teisseire, Maguelonne},
+   booktitle={2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA)},
+   pages={1--10},
+   year={2023},
+   organization={IEEE}
+ }
+ ```