---
license: cc-by-4.0
language:
- en
- nl
- de
- fr
- it
- is
- cs
- da
- es
- ca
metrics:
- accuracy
- matthews_correlation
pipeline_tag: text-classification
---

# Aurora SDG Multi-Label Multi-Class Model

This model classifies texts in multiple languages according to the United Nations Sustainable Development Goals (SDGs).

![image](https://user-images.githubusercontent.com/73560591/216751462-ced482ba-5d8e-48aa-9a48-5557979a35f1.png)

Source: https://sdgs.un.org/goals

## Model Details

### Model Description

This text classification model was developed by fine-tuning the bert-base-multilingual-uncased pre-trained model. The training data for this fine-tuned model was sourced from the publicly available OSDG Community Dataset (OSDG-CD) at https://zenodo.org/record/5550238#.ZBulfcJByF4.

This model was made as part of academic research at Deakin University. The goal was to make a transformer-based SDG text classification model that anyone could use. Only the first 16 UN SDGs are supported. The primary model details are highlighted below:

- **Model type:** Text classification
- **Language(s) (NLP):** English, Dutch, German, Icelandic, French, Czech, Italian, Danish, Spanish, Catalan
- **License:** cc-by-4.0
- **Finetuned from model:** bert-base-multilingual-uncased

### Model Sources

- **Repository:** https://huggingface.co/MauriceV2021/AuroraSDGsModel or https://doi.org/10.5281/zenodo.7304546
- **Demo:** https://aurora-universities.eu/sdg-research/classify/

### Direct Use

The model is already fine-tuned and can be used directly for inference. No further training is required.

## How to Get Started with the Model

Code for getting started with the model is available at: https://github.com/Aurora-Network-Global/sdgs_many_berts
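
The repository above contains the authors' code. The minimal sketch below illustrates what multi-label output means for this model by turning per-goal scores into SDG labels. It assumes one raw score (logit) per supported goal, SDG 1 to SDG 16. Each score is converted independently with a sigmoid, and a goal is kept if its probability is at least 0.5. The names `decode_sdg_labels` and `SDG_LABELS`, the label strings, and the threshold are hypothetical and are not taken from the repository.

```python
import math
from typing import List, Sequence

# Hypothetical label names; the card states that only the first 16 SDGs are supported.
SDG_LABELS = [f"SDG {i}" for i in range(1, 17)]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_sdg_labels(logits: Sequence[float], threshold: float = 0.5) -> List[str]:
    """Return every SDG whose independent probability reaches the threshold.

    Each goal is scored on its own (sigmoid), so a text may receive zero,
    one, or several SDG labels. The 0.5 threshold is an assumption, not a
    value documented for this model.
    """
    if len(logits) != len(SDG_LABELS):
        raise ValueError(f"expected {len(SDG_LABELS)} scores, got {len(logits)}")
    return [label for label, score in zip(SDG_LABELS, logits) if sigmoid(score) >= threshold]


if __name__ == "__main__":
    # Made-up scores: positive for SDG 7 and SDG 13, negative for every other goal.
    example = [-3.0] * 16
    example[6] = 2.1   # SDG 7
    example[12] = 1.4  # SDG 13
    print(decode_sdg_labels(example))  # ['SDG 7', 'SDG 13']
```

Independent sigmoids, rather than a softmax across all goals, are what let a text receive several SDG labels at once. If the model instead uses one binary classifier per goal, the same thresholding would apply to each classifier's positive-class probability.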

## Training Data

The training data includes text from 1.4 titles and abstracts of academic research papers, labeled with SDG goals and targets based on an initial validated query.

The training data is available at: https://doi.org/10.5281/zenodo.5205672

### Evaluation of the Training Data

- Average precision: 0.70
- Average recall: 0.15

The data was evaluated by 244 senior researchers with domain expertise.

The evaluation report on the training data is available at: https://doi.org/10.5281/zenodo.4917107
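
The card does not state how the average precision and recall were aggregated. The hedged sketch below computes per-goal precision, the share of query-labelled papers that experts confirmed, and per-goal recall, the share of relevant papers the query found. It then takes an unweighted (macro) mean over goals. The macro averaging, the `GoalCounts` structure, and the counts in the example are illustrative assumptions. They are not the procedure or figures from the evaluation report.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class GoalCounts:
    """Hypothetical expert-judgement counts for one SDG."""
    true_positives: int   # papers the query assigned to the goal and experts confirmed
    false_positives: int  # papers the query assigned to the goal but experts rejected
    false_negatives: int  # relevant papers for the goal that the query missed


def precision_recall(c: GoalCounts) -> Tuple[float, float]:
    """Per-goal precision and recall; 0.0 when a denominator is empty."""
    retrieved = c.true_positives + c.false_positives
    relevant = c.true_positives + c.false_negatives
    precision = c.true_positives / retrieved if retrieved else 0.0
    recall = c.true_positives / relevant if relevant else 0.0
    return precision, recall


def macro_average(counts: Dict[str, GoalCounts]) -> Tuple[float, float]:
    """Unweighted mean of per-goal precision and recall (assumed aggregation)."""
    if not counts:
        raise ValueError("no goals given")
    pairs = [precision_recall(c) for c in counts.values()]
    avg_p = sum(p for p, _ in pairs) / len(pairs)
    avg_r = sum(r for _, r in pairs) / len(pairs)
    return avg_p, avg_r


if __name__ == "__main__":
    # Toy counts for illustration only.
    toy = {
        "SDG 1": GoalCounts(true_positives=8, false_positives=2, false_negatives=32),
        "SDG 2": GoalCounts(true_positives=4, false_positives=6, false_negatives=6),
    }
    p, r = macro_average(toy)
    print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.60, recall=0.30
```

A query-based labelling can score high precision and low recall at the same time. Precision only penalises wrong assignments, while recall also counts every relevant paper the query never retrieved.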

## Training Hyperparameters

<!--
- Num_epoch = 3
- Learning rate = 5e-5
- Batch size = 16
-->

## Evaluation

### Metrics

- Accuracy: 0.9
- Matthews correlation: 0.89

The evaluation report on the model is available at: https://doi.org/10.5281/zenodo.5603019
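
The card reports accuracy and the Matthews correlation coefficient (MCC) but does not say how they were computed for multi-label outputs. For reference, the sketch below shows the standard single-label multiclass definitions. The hand-written `matthews_corrcoef` follows the same formula as scikit-learn's function of that name. Whether the authors applied it per goal, to flattened label sets, or otherwise is not stated, so any such use is an assumption.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions that exactly match the reference label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def matthews_corrcoef(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Multiclass Matthews correlation coefficient.

    MCC = (c*s - sum_k p_k*t_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    with s = number of samples, c = correctly classified samples,
    t_k = true occurrences of class k, p_k = predictions of class k.
    Returns 0.0 when the denominator is zero.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_counts = Counter(y_true)   # Counter returns 0 for classes never seen
    p_counts = Counter(y_pred)
    classes = set(t_counts) | set(p_counts)
    cov_ytyp = c * s - sum(p_counts[k] * t_counts[k] for k in classes)
    cov_ypyp = s * s - sum(p_counts[k] ** 2 for k in classes)
    cov_ytyt = s * s - sum(t_counts[k] ** 2 for k in classes)
    denom = math.sqrt(cov_ypyp * cov_ytyt)
    return cov_ytyp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy single-label predictions, not model output.
    y_true = ["SDG 3", "SDG 7", "SDG 7", "SDG 13"]
    y_pred = ["SDG 3", "SDG 7", "SDG 13", "SDG 13"]
    print(f"accuracy={accuracy(y_true, y_pred):.2f}, "
          f"MCC={matthews_corrcoef(y_true, y_pred):.2f}")  # accuracy=0.75, MCC=0.70
```

MCC uses every cell of the confusion matrix, which makes it harder to inflate than accuracy. For example, a classifier that always predicts the most frequent goal can reach high accuracy on imbalanced data but gets an MCC of zero.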

## Citation

Sadick, A.M. (2023). SDG classification with BERT. https://huggingface.co/sadickam/sdg-classification-bert