---
datasets:
- Herelles/lupan
language:
- fr
tags:
- text classification
- pytorch
- camembert
- urban planning
- natural risks
- risk management
- geography
inference: false
---
# CamemBERT LUPAN (Local Urban Plans And Natural risks)
## Overview

In France, urban planning and natural risk management rely on Local Urban Plans (PLU – Plan Local d'Urbanisme) and Natural Risk Prevention Plans (PPRn – Plan de Prévention des Risques naturels), documents that contain land use rules. To facilitate automatic extraction of these rules, we manually annotated a set of such documents concerning Montpellier, a rapidly growing agglomeration exposed to natural risks, and then fine-tuned a model on the resulting corpus.

This model classifies French input text to determine whether it contains an urban planning rule. It outputs one of four classes: Verifiable (a strict rule that can be checked against satellite images), Non-verifiable (a strict rule that cannot be checked against satellite images), Informative (a non-strict rule in the form of a recommendation), and Not pertinent (none of the above). For better results, it is recommended to prepend the title and subtitle of the corresponding section to each textual input, as illustrated in the sketch below.
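
For instance, an input segment could be assembled like this (a minimal sketch; the exact separator between title, subtitle and text is an assumption, the point is simply to keep the section headings together with the rule text):
```
# Hypothetical assembly of an input segment: prepend the article title and
# sub-heading to the rule text, as in the full example further below.
# The "\n\n" separator is an assumption, not a requirement of the model.
title = "Article 1 : Occupations ou utilisations du sol interdites"
subtitle = "1) Dans l’ensemble de la zone sont interdits :"
body = ("Les constructions destinées à l’habitation ne dépendant pas "
        "d’une exploitation agricole autres que celles visées à l’article 2 paragraphe 1).")

segment = "\n\n".join([title, subtitle, body])
```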

For more details, please refer to our article: https://www.nature.com/articles/s41597-023-02705-y

## Training and evaluation data 

The model was fine-tuned on top of CamemBERT (`camembert-base`) using our corpus: 
https://huggingface.co/datasets/Herelles/lupan

It is the first French-language corpus in the fields of urban planning and natural risk management. 
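
To inspect the corpus itself, it can be loaded with the `datasets` library (a minimal sketch; the split and column names are whatever the dataset defines, so check the dataset card for the exact schema):
```
from datasets import load_dataset

# Load the LUPAN corpus from the Hugging Face Hub and inspect its structure.
# Splits and columns are printed rather than assumed.
lupan = load_dataset("Herelles/lupan")
print(lupan)                            # available splits and their sizes
first_split = list(lupan.keys())[0]
print(lupan[first_split][0])            # first example of the first split
```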

## Example of use

Attention: to run this code you need to have `transformers`, `torch` and `numpy` installed. You can install them with `pip install transformers torch numpy`.

Load necessary libraries:
```
from transformers import CamembertTokenizer, CamembertForSequenceClassification
import torch
import numpy as np
```

Define the tokenizer:
```
tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
```

Define the model:
```
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = CamembertForSequenceClassification.from_pretrained("herelles/camembert-base-lupan")
model.to(device)
model.eval()  # evaluation mode: disables dropout for deterministic predictions
```

Define a segment to classify:
```
new_segment = '''Article 1 : Occupations ou utilisations du sol interdites
 
1) Dans l’ensemble de la zone sont interdits :
 
Les constructions destinées à l’habitation ne dépendant pas d’une exploitation agricole autres 
que celles visées à l’article 2 paragraphe 1).'''
```

Get the prediction:
```
# Apply the tokenizer
encoding = tokenizer(new_segment, padding="longest", return_tensors="pt")

# Extract IDs and attention mask
test_ids = encoding['input_ids']
test_attention_mask = encoding['attention_mask']

# Forward pass, calculate logit predictions
with torch.no_grad():
  output = model(test_ids.to(device), token_type_ids=None, attention_mask=test_attention_mask.to(device))

# Index of the highest logit = predicted class
prediction = np.argmax(output.logits.cpu().numpy(), axis=1).item()

if prediction == 0:
  pred_label = 'Not pertinent'
elif prediction == 1:
  pred_label = 'Pertinent (Soft)'
elif prediction == 2:
  pred_label = 'Pertinent (Strict, Non-verifiable)'
elif prediction == 3:
  pred_label = 'Pertinent (Strict, Verifiable)'

print('Input text: ', new_segment)
print('\n\nPredicted Class: ', pred_label)
```
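
The same steps can be wrapped into a small helper for classifying several segments at once (a sketch that is not part of the original pipeline; it reuses the `tokenizer`, `model` and `device` defined above and the label mapping from the code above):
```
id2label = {0: 'Not pertinent',
            1: 'Pertinent (Soft)',
            2: 'Pertinent (Strict, Non-verifiable)',
            3: 'Pertinent (Strict, Verifiable)'}

def classify_segments(segments):
    # Tokenize the whole batch, padding to the longest segment and truncating to the model limit
    encoding = tokenizer(segments, padding="longest", truncation=True, return_tensors="pt")
    with torch.no_grad():
        output = model(encoding['input_ids'].to(device),
                       attention_mask=encoding['attention_mask'].to(device))
    # Highest logit per row = predicted class index
    predictions = torch.argmax(output.logits, dim=1).cpu().tolist()
    return [id2label[p] for p in predictions]

print(classify_segments([new_segment]))
```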

## Online demo
- https://huggingface.co/spaces/Herelles/segments-lupan

## Citation

To cite the dataset, please use:
```
@article{koptelov2023manually,
  title={A manually annotated corpus in French for the study of urbanization and the natural risk prevention},
  author={Koptelov, Maksim and Holveck, Margaux and Cremilleux, Bruno and Reynaud, Justine and Roche, Mathieu and Teisseire, Maguelonne},
  journal={Scientific Data},
  volume={10},
  number={1},
  pages={818},
  year={2023},
  publisher={Nature Publishing Group UK London}
}
```

To cite the code, please use:
```
@inproceedings{koptelov2023towards,
  title={Towards a (Semi-) Automatic Urban Planning Rule Identification in the French Language},
  author={Koptelov, Maksim and Holveck, Margaux and Cremilleux, Bruno and Reynaud, Justine and Roche, Mathieu and Teisseire, Maguelonne},
  booktitle={2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA)},
  pages={1--10},
  year={2023},
  organization={IEEE}
}
```