---
title: Master Thesis DVS Tagging
emoji: 😊
colorFrom: yellow
colorTo: purple
sdk: docker
app_port: 6900
fullWidth: true
tags:
- argilla
duplicated_from: argilla/argilla-template-space
license: unknown
---

## Introduction 

The purpose of this Space is to validate predicted labels for job titles from the field of data visualization. The work is part of my Master's thesis and aims to facilitate the analysis of the DVS SOTI survey and to open up additional analysis possibilities. A BERT model runs in the background; it is trained to distinguish the LEVEL, RESponsibility and FUNction mentioned in job titles. The model's predictions are shown as thin underlines beneath the text.


## Interaction with the tool

Check the predictions by clicking on a word and setting the correct tag if necessary. If everything is correct, click the *Validate* button.

1. Top, filter by status: exclude all already verified examples by selecting the status 'Default'.
2. Top, filter by annotations: filter the already annotated examples by tag group.
3. Top, sort by score: sort the examples by their prediction scores; with ↓ you start with the most certain predictions.
4. In each example: the 'Find Similar' button filters for the 50 most similar examples in the dataset.

## Tagging scheme

### Prefixes

Each tag is prefixed with **B-[TAG]** for 'Beginning' or **I-[TAG]** for 'Inside'. The prefix tells the model whether a word starts a tag or continues the tag of the preceding word.

For example, *senior data visualization engineer* is annotated as [B-LEVEL, B-FUN, I-FUN, B-RES]. This makes it clear to the model that *data* and *visualization* belong together.

In contrast, *data analyst and scientist* is annotated as [B-FUN, B-RES, B-RES], with *and* left untagged. This makes it clear to the model that *analyst* and *scientist* are two different responsibilities.
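The effect of the prefixes can be illustrated with a small sketch that merges tagged words back into spans. This is not the code used in this Space: the function name `group_bio_spans` is hypothetical, and the `O` tag for untagged words is an assumption borrowed from the common BIO convention.

```python
from typing import List, Optional, Tuple


def group_bio_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge B-/I- tagged words into (label, text) spans.

    Assumption: untagged words carry the tag "O" (not defined in this README).
    """
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    spans: List[list] = []          # each entry is [label, [words]]
    current: Optional[list] = None  # the span currently being extended, if any

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (
            prefix == "I" and (current is None or current[0] != label)
        ):
            # A B- tag always opens a new span. An I- tag without a matching
            # open span is treated as a beginning instead of being dropped.
            current = [label, [token]]
            spans.append(current)
        elif prefix == "I":
            # Same label as the open span: the word continues it.
            current[1].append(token)
        else:
            # Untagged word: close the open span.
            current = None

    return [(label, " ".join(words)) for label, words in spans]


print(group_bio_spans(
    ["senior", "data", "visualization", "engineer"],
    ["B-LEVEL", "B-FUN", "I-FUN", "B-RES"],
))
print(group_bio_spans(
    ["data", "analyst", "and", "scientist"],
    ["B-FUN", "B-RES", "O", "B-RES"],
))
```

In the first title, *data* and *visualization* end up in a single FUN span because of the I- prefix. In the second, the two B-RES tags produce two separate RES spans.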

### Tags

**LEVEL**: terms that refer **exclusively** to the LEVEL. Examples are *Senior* and *Junior*. Terms such as *Head*, *Professor* or *Assistant* overlap with RESponsibility and are therefore tagged as RESponsibility.

<u>Ambiguous Example:</u>
It gets more complicated with *Assistant Professor*: here *Assistant* is the LEVEL and *Professor* is the RESponsibility, whereas *Assistant* on its own would be a RESponsibility.

**RES**: all terms that refer to tasks. Examples are *engineer*, *designer*, *scientist*, *accountant* and *technician*.

<u>Ambiguous Example:</u>
A *technical product manager* is annotated as B-FUN, I-FUN, B-RES, since the term *technical product* denotes a specific field of the managerial RESponsibility.

**FUN**: describes the business function of a position along several dimensions: first, departments such as sales, marketing or operations; second, the scope, such as enterprise, project, customer or national; and third, the content of an occupation, such as data, visualization, research or education.

<u>Ambiguous Example:</u>
*teacher student data analysis* is annotated as B-FUN, I-FUN, B-FUN, I-FUN, since *teacher student* is a specification of the *data analysis*. Unfortunately, no RES has been specified here. Setting it to *analyst* would be imprecise, as we do not know whether the title is the *head of teacher [...] analysis* or the *teacher [...] analyst*.

It is possible to discard entries that are not job titles. However, it is better to leave them in the system and not tag any element.

If you are not sure about the annotation, you can skip the record. 

Thanks a lot for your work, and I hope you enjoy surfing through all the job titles in the field of data visualization. 😃

If you need help or have questions, don't hesitate to contact me in the discussions section.