File size: 2,245 Bytes
99bc2fc
81bbd8f
 
 
 
 
99bc2fc
81bbd8f
 
 
 
 
980c213
81bbd8f
 
 
 
 
 
cb47e56
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
eb7ac8c
81bbd8f
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
---
language:
  - en
tags:
  - JobBERTa
  - job postings
---

# JobBERTa

This is the JobBERTa model from:

**NNOSE: Nearest Neighbor Occupational Skill Extraction**. Mike Zhang, Rob van der Goot, Min-Yen Kan, and Barbara Plank. To appear at EACL 2024.
  
This model is continuously pre-trained from a `roberta-base` checkpoint on ~3.2M sentences from job postings. More information can be found in the paper.

If you use this model, please cite the following paper:

```
@inproceedings{zhang-etal-2024-nnose,
    title = "{NNOSE}: Nearest Neighbor Occupational Skill Extraction",
    author = "Zhang, Mike  and
      Goot, Rob  and
      Kan, Min-Yen  and
      Plank, Barbara",
    editor = "Graham, Yvette  and
      Purver, Matthew",
    booktitle = "Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = mar,
    year = "2024",
    address = "St. Julian{'}s, Malta",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.eacl-long.35",
    pages = "589--608",
    abstract = "The labor market is changing rapidly, prompting increased interest in the automatic extraction of occupational skills from text. With the advent of English benchmark job description datasets, there is a need for systems that handle their diversity well. We tackle the complexity in occupational skill datasets tasks{---}combining and leveraging multiple datasets for skill extraction, to identify rarely observed skills within a dataset, and overcoming the scarcity of skills across datasets. In particular, we investigate the retrieval-augmentation of language models, employing an external datastore for retrieving similar skills in a dataset-unifying manner. Our proposed method, \textbf{N}earest \textbf{N}eighbor \textbf{O}ccupational \textbf{S}kill \textbf{E}xtraction (NNOSE) effectively leverages multiple datasets by retrieving neighboring skills from other datasets in the datastore. This improves skill extraction \textit{without} additional fine-tuning. Crucially, we observe a performance gain in predicting infrequent patterns, with substantial gains of up to 30{\%} span-F1 in cross-dataset settings.",
}
```