---
datasets:
- Herelles/lupan
language:
- fr
tags:
- text classification
- pytorch
- camembert
- urban planning
- natural risks
- risk management
- geography
inference: false
---

# CamemBERT LUPAN (Local Urban Plans And Natural risks)

## Overview

In France, urban planning and natural risk management rely on Local Urban Plans (PLU, Plan Local d'Urbanisme) and Natural Risk Prevention Plans (PPRn, Plan de Prévention des Risques naturels), which contain land use rules. To facilitate the automatic extraction of these rules, we manually annotated a set of such documents concerning Montpellier, a rapidly evolving agglomeration exposed to natural risks, and then fine-tuned a model on them.

This model classifies French input text according to whether it contains an urban planning rule. It outputs one of 4 classes (the corresponding label used in the example code is given in parentheses):

- Verifiable: strict rules that can be verified with satellite images (`Pertinent (Strict, Verifiable)`);
- Non-verifiable: strict rules that cannot be verified with satellite images (`Pertinent (Strict, Non-verifiable)`);
- Informative: non-strict rules in the form of recommendations (`Pertinent (Soft)`);
- Not pertinent: none of the above rules is present (`Not pertinent`).

For better results, it is recommended to add a title and a subtitle to each text input.

For more details, please refer to our article: https://www.nature.com/articles/s41597-023-02705-y

## Training and evaluation data

The model is fine-tuned from CamemBERT on our corpus: https://huggingface.co/datasets/Herelles/lupan

This is the first French-language corpus in the fields of urban planning and natural risk management.

## Example of use

Note: to run this code, you need to have `transformers`, `torch`, `numpy` and `sentencepiece` (required by `CamembertTokenizer`) installed.
You can install them with `pip install transformers torch numpy sentencepiece`.

Load the necessary libraries:

```
from transformers import CamembertTokenizer, CamembertForSequenceClassification
import torch
import numpy as np
```

Define the tokenizer:

```
tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
```

Define the model:

```
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = CamembertForSequenceClassification.from_pretrained("herelles/camembert-base-lupan")
model.to(device)
model.eval()  # inference mode (disables dropout)
```

Define the segment to classify (title, subtitle, then text):

```
new_segment = '''Article 1 : Occupations ou utilisations du sol interdites
1) Dans l’ensemble de la zone sont interdits :
Les constructions destinées à l’habitation ne dépendant pas d’une exploitation agricole autres que celles visées à l’article 2 paragraphe 1).'''
```

Get the prediction:

```
# Class index -> label
labels = ['Not pertinent',
          'Pertinent (Soft)',
          'Pertinent (Strict, Non-verifiable)',
          'Pertinent (Strict, Verifiable)']

# Apply the tokenizer, truncating to CamemBERT's 512-token limit
encoding = tokenizer(new_segment, truncation=True, max_length=512, return_tensors="pt")

# Forward pass, calculate logit predictions
with torch.no_grad():
    output = model(input_ids=encoding['input_ids'].to(device),
                   attention_mask=encoding['attention_mask'].to(device))

# Index of the highest logit for the single input segment
prediction = int(np.argmax(output.logits.cpu().numpy(), axis=-1)[0])
pred_label = labels[prediction]

print('Input text: ', new_segment)
print('\n\nPredicted Class: ', pred_label)
```

## Online demo

- https://huggingface.co/spaces/Herelles/segments-lupan

## Citation

To cite the dataset, please use:

```
@article{koptelov2023manually,
  title={A manually annotated corpus in French for the study of urbanization and the natural risk prevention},
  author={Koptelov, Maksim and Holveck, Margaux and Cremilleux, Bruno and Reynaud, Justine and Roche, Mathieu and Teisseire, Maguelonne},
  journal={Scientific Data},
  volume={10},
  number={1},
  pages={818},
  year={2023},
  publisher={Nature Publishing Group UK London}
}
```

To cite the code, please use:

```
@inproceedings{koptelov2023towards,
  title={Towards a (Semi-) Automatic Urban Planning Rule Identification in the French Language},
  author={Koptelov, Maksim and Holveck, Margaux and Cremilleux, Bruno and Reynaud, Justine and Roche, Mathieu and Teisseire, Maguelonne},
  booktitle={2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA)},
  pages={1--10},
  year={2023},
  organization={IEEE}
}
```
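
## Preparing inputs and decoding outputs

The overview recommends adding a title and a subtitle to each input, and the example of use maps class indices to labels. The pure-Python sketch below wraps both steps in small helpers and also turns the raw logits into softmax probabilities. It is an illustration under stated assumptions, not part of the released code: `build_input` and `decode` are hypothetical names, and joining title, subtitle and text with newlines is an assumption modeled on the example segment, not a documented training format.

```python
import math
from typing import List, Sequence, Tuple

# Class index -> label, in the same order as the example of use.
LABELS = [
    "Not pertinent",
    "Pertinent (Soft)",
    "Pertinent (Strict, Non-verifiable)",
    "Pertinent (Strict, Verifiable)",
]


def build_input(title: str, subtitle: str, text: str) -> str:
    """Prepend a title and a subtitle to a text segment (hypothetical helper).

    Assumption: the parts are joined with newlines, as in the example
    segment; empty parts are skipped.
    """
    parts = [part.strip() for part in (title, subtitle, text) if part and part.strip()]
    return "\n".join(parts)


def decode(logits: Sequence[float]) -> Tuple[str, List[float]]:
    """Turn the 4 logits of one segment into a label and softmax probabilities."""
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Index of the highest probability (the first one wins on ties).
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs
```

With the tokenizer and model loaded as in the example of use, the string returned by `build_input(...)` can be passed to `tokenizer`, and `decode(output.logits[0].cpu().tolist())` after the forward pass yields both the predicted label and a probability for each class.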