---
extra_gated_prompt: "You agree to adhere to all terms and conditions for using the model as specified by the IEA License Agreement."
extra_gated_fields:
  Company: text
  Country: country
  Specific date: date_picker
  I want to use this model for:
    type: select
    options:
      - Research
      - Education
      - label: Other
        value: other
  I agree to use this model for non-commercial use ONLY: checkbox
  I agree to not redistribute the data or share access credentials: checkbox
  I agree to cite the IEA model source in any publications or presentations: checkbox
  I understand that ICILS is a registered trademark of IEA and is protected by trademark law: checkbox
  I agree that the use of the model for assessments or learning materials requires prior notice to IEA: checkbox
license: mit
base_model: jjzha/esco-xlm-roberta-large
datasets:
  - ICILS/multilingual_parental_occupations
pipeline_tag: text-classification
metrics:
  - accuracy
  - danieldux/isco_hierarchical_accuracy
widget:
  - text: Beauticians and Related Workers
    example_title: Example 1
  - text: She is a beautition at hair and beauty. She owns a hair and beauty salon
    example_title: Example 2
  - text: "Retired. Doesn't work anymore."
    example_title: Example 3
  - text: Ingeniero civil. ayuda en construcciones
    example_title: Example 4
tags:
  - ISCO
  - ESCO
  - occupation coding
  - ICILS
language:
  - da
  - de
  - en
  - es
  - fi
  - fr
  - it
  - kk
  - ko
  - kz
  - pt
  - ro
  - ru
  - sv
model-index:
  - name: xlm-r-icils-ilo
    results:
      - task:
          name: Text Classification
          type: text-classification
        dataset:
          name: ICILS/multilingual_parental_occupations
          type: ICILS/multilingual_parental_occupations
          config: icils
          split: test
          args: icils
        metrics:
          - name: Accuracy
            type: accuracy
            value: 0.6285
          - name: ISCO Hierarchical Accuracy
            type: danieldux/isco_hierarchical_accuracy
            value: 0.95
library_name: transformers
---

# Model Card for ICILS XLM-R ISCO

This model is a fine-tuned version of [ESCOXLM-R](https://huggingface.co/jjzha/esco-xlm-roberta-large) trained on [The ICILS Multilingual ISCO-08 Parental Occupation Corpus](https://huggingface.co/datasets/ICILS/multilingual_parental_occupations). An R&D report explaining the research is available at [https://www.iea.nl/publications/rd-outcomes/improving-parental-occupation-coding-procedures-ai](https://www.iea.nl/publications/rd-outcomes/improving-parental-occupation-coding-procedures-ai).

It achieves the following results on the test split:

- Loss: 1.7849
- Accuracy: 0.6285
- Hierarchical Accuracy: 0.95

The research paper, [ESCOXLM-R: Multilingual Taxonomy-driven Pre-training for the Job Market Domain](https://aclanthology.org/2023.acl-long.662/), states: "ESCOXLM-R, based on XLM-R-large, uses domain-adaptive pre-training on the [European Skills, Competences, Qualifications and Occupations](https://esco.ec.europa.eu/en/classification/occupation-main) (ESCO) taxonomy, covering 27 languages. The pre-training objectives for ESCOXLM-R include dynamic masked language modeling and a novel additional objective for inducing multilingual taxonomical ESCO relations" (Zhang et al., ACL 2023).

## Model Details

### Model Description

IEA is an international cooperative of national research institutions, governmental research agencies, scholars, and analysts working to research, understand, and improve education worldwide.

- **Developed by:** [The International Computer and Information Literacy Study](https://www.iea.nl/studies/iea/icils)
- **Funded by:** [IEA International Association for the Evaluation of Educational Achievement](https://www.iea.nl/)
- **Shared by [optional]:** [More Information Needed]
- **Model type:** Multilingual text classification model (XLM-R-large architecture) for ISCO-08 occupation coding
- **Language(s) (NLP):** Danish, German, English, Spanish, Finnish, French, Italian, Kazakh, Korean, Portuguese, Romanian, Russian, and Swedish (see the `language` tags in the metadata)
- **License:** MIT, subject to the gated-access conditions stated above (non-commercial use only)
- **Finetuned from model:** [ESCOXLM-R](https://huggingface.co/jjzha/esco-xlm-roberta-large)

### Model Sources

- **Repository:** [More Information Needed]
- **Paper:** [Improving parental occupation coding procedures with AI](https://www.iea.nl/publications/rd-outcomes/improving-parental-occupation-coding-procedures-ai)
- **Demo:** [https://huggingface.co/spaces/ICILS/ICILS-XLM-R-ISCO](https://huggingface.co/spaces/ICILS/ICILS-XLM-R-ISCO)

## Uses

### Direct Use

[More Information Needed]

### Downstream Use [optional]

[More Information Needed]

### Out-of-Scope Use

[More Information Needed]

## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed for further recommendations.

## How to Get Started with the Model

Use the code below to get started with the model.
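Pending an official snippet, here is a minimal inference sketch using the `transformers` pipeline. The Hub id `ICILS/XLM-R-ISCO` is an assumption inferred from the card title and the demo Space; substitute this repository's actual id if it differs. The example inputs are taken from the widget examples above.

```python
from transformers import pipeline

# NOTE: "ICILS/XLM-R-ISCO" is an assumed Hub id inferred from the card title
# and the demo Space (ICILS/ICILS-XLM-R-ISCO); replace it with this
# repository's actual id if it differs.
classifier = pipeline("text-classification", model="ICILS/XLM-R-ISCO")

# Free-text occupation descriptions in any supported language
# (taken verbatim from the widget examples above).
examples = [
    "She is a beautition at hair and beauty. She owns a hair and beauty salon",
    "Ingeniero civil. ayuda en construcciones",
]

for prediction in classifier(examples):
    # Each result carries the predicted ISCO-08 label and a confidence score.
    print(prediction["label"], prediction["score"])
```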
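The ISCO hierarchical accuracy reported in this card is the community metric named in the metadata (`danieldux/isco_hierarchical_accuracy`), which presumably grants partial credit when a prediction falls in the correct ISCO-08 major, sub-major, or minor group, which would help explain why it is much higher (0.95) than exact accuracy (0.6285). A sketch of computing it with the `evaluate` library follows; the input format (ISCO-08 code strings) is an assumption, so check the metric page for details.

```python
import evaluate

# The metric id comes from this card's metadata; it is resolved as a
# community metric hosted on the Hugging Face Hub.
isco_accuracy = evaluate.load("danieldux/isco_hierarchical_accuracy")

# ASSUMPTION: inputs are ISCO-08 code strings. Here the prediction differs
# from the reference at the 4-digit unit group but shares the 3-digit
# minor group 514, so a hierarchical metric should award partial credit.
results = isco_accuracy.compute(
    references=["5142"],   # Beauticians and Related Workers
    predictions=["5141"],  # Hairdressers
)
print(results)
```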
## Training Details

### Training Data

The model was trained on the train and validation splits of the `icils` configuration of [The ICILS Multilingual ISCO-08 Parental Occupation Corpus](https://huggingface.co/datasets/ICILS/multilingual_parental_occupations).

### Training Procedure

#### Preprocessing [optional]

[More Information Needed]

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 12.0

### Training results

| Training Loss | Epoch | Step  | Accuracy | Validation Loss |
|:-------------:|:-----:|:-----:|:--------:|:---------------:|
| 3.2269        | 1.0   | 3518  | 0.4176   | 2.9434          |
| 2.2851        | 2.0   | 7036  | 0.5250   | 2.2479          |
| 1.937         | 3.0   | 10554 | 0.5691   | 1.9822          |
| 1.4695        | 4.0   | 14072 | 0.6018   | 1.8560          |
| 1.2157        | 5.0   | 17590 | 0.6114   | 1.8160          |
| 0.9819        | 6.0   | 21108 | 0.6214   | 1.7946          |
| 0.8608        | 7.0   | 24626 | 0.6285   | 1.7849          |
| 0.8374        | 8.0   | 28144 | 0.6353   | 1.7893          |
| 0.7908        | 9.0   | 31662 | 0.6239   | 1.8279          |
| 0.6962        | 10.0  | 35180 | 0.6347   | 1.8472          |
| 0.6371        | 11.0  | 38698 | 0.6339   | 1.8669          |
| 0.5226        | 12.0  | 42216 | 0.6336   | 1.8695          |

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

The model was trained on the `icils` configuration of the ISCO-08 dataset using the train and validation splits and evaluated on the test split.

#### Factors

[More Information Needed]

#### Metrics

Accuracy and ISCO hierarchical accuracy (`danieldux/isco_hierarchical_accuracy`); see the evaluation sketch under "How to Get Started with the Model".

### Results

[More Information Needed]

#### Summary

## Model Examination [optional]

[More Information Needed]

## Technical Specifications [optional]

### Model Architecture and Objective

[More Information Needed]

### Compute Infrastructure

[More Information Needed]

#### Hardware

[More Information Needed]

#### Software

### Framework versions

- Transformers 4.40.0.dev0
- Pytorch 2.2.1+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2

## Citation [optional]

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]