metadata
title: Master Thesis DVS Tagging
emoji: 😊
colorFrom: yellow
colorTo: purple
sdk: docker
app_port: 6900
fullWidth: true
tags:
  - argilla
duplicated_from: argilla/argilla-template-space
license: unknown

Introduction

This Space is used to validate predicted labels for job titles from the field of data visualization. The work is part of my Master's thesis and aims to facilitate the analysis of the DVS SOTI survey and to open up additional analysis possibilities. A BERT model runs in the background, trained to distinguish the LEVEL, RESponsibility and FUNction mentioned in job titles. The model's predictions are shown as thin underlines beneath the text.

Interaction with the tool

Check the predictions by clicking on a word and setting the correct tag if necessary. If everything is correct, click the Validate button.

  1. top: filter by status: exclude all already verified examples by selecting the status 'Default'.
  2. top: filter by annotations: filter already annotated examples by tag group.
  3. top: sort by score: sort the examples by prediction score; with ↓ you start with the most certain predictions.
  4. in each example: the 'Find Similar' button filters for the 50 most similar examples in the dataset.

Tagging scheme

Prefixes

The tags are prefixed with B-[TAG] for 'Beginning' and I-[TAG] for 'Inside'. The prefix tells the model whether a word begins a tag or continues the tag of the preceding word.

For example, 'senior data visualization engineer' is annotated as [B-LEVEL, B-FUN, I-FUN, B-RES]. This makes it clear to the model that data and visualization belong together.

In contrast, 'data analyst and scientist' is annotated as [B-FUN, B-RES, B-RES], with 'and' left untagged. This makes it clear to the model that analyst and scientist are two different responsibilities.
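
The effect of the prefixes can be sketched in a few lines of Python. This is only an illustration and not part of the tool: the function name `group_bio_spans` is hypothetical, and the label `O` for untagged words is an assumption, since the scheme above only defines the B- and I- prefixes.

```python
from typing import List, Tuple


def group_bio_spans(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge word-level B-/I- tags into (tag, phrase) spans."""
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")

    spans: List[Tuple[str, List[str]]] = []
    open_span = None  # the span that a following I- tag may extend

    for word, tag in zip(words, tags):
        prefix, _, name = tag.partition("-")
        if prefix == "B":
            # A B- tag always starts a new span, even if the tag name repeats.
            open_span = (name, [word])
            spans.append(open_span)
        elif prefix == "I" and open_span is not None and open_span[0] == name:
            # An I- tag continues the span opened directly before it.
            open_span[1].append(word)
        else:
            # Untagged word (assumed label "O") or a stray I- tag:
            # the word is skipped and the current span is closed.
            open_span = None

    return [(name, " ".join(parts)) for name, parts in spans]


example_1 = group_bio_spans(
    "senior data visualization engineer".split(),
    ["B-LEVEL", "B-FUN", "I-FUN", "B-RES"],
)
example_2 = group_bio_spans(
    "data analyst and scientist".split(),
    ["B-FUN", "B-RES", "O", "B-RES"],  # "O" for 'and' is an assumption
)
print(example_1)
print(example_2)
```

In the first example, data and visualization are merged into one FUN span. In the second, analyst and scientist remain two separate RES spans because each carries its own B- prefix.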

Tags

LEVEL: terms that refer exclusively to the LEVEL. Examples are Senior or Junior. Head, Professor or Assistant, in contrast, overlap with the RESponsibility and are therefore tagged as RESponsibility.

Ambiguous Example: It gets more complicated with an Assistant Professor, where Assistant is the LEVEL and Professor is the RESponsibility, whereas Assistant on its own would be a RESponsibility.

RES: all terms that refer to tasks. Examples are engineer, designer, scientist, accountant and technician.

Ambiguous Example: A technical product manager is annotated as B-FUN, I-FUN, B-RES, since the term technical product denotes a specific field of the managerial RESponsibility.

FUN: describes the business function of a position along several dimensions: first, departments such as sales, marketing or operations; second, the scope, such as enterprise, project, customer or national; and third, the content of an occupation, such as data, visualization, research or education.

Ambiguous Example: teacher student data analysis is annotated as B-FUN, I-FUN, B-FUN, I-FUN, since teacher student specifies the data analysis. Unfortunately, no RES is given here. Setting it to analyst would be imprecise, as we do not know whether the person is the head of teacher [...] analysis or the teacher [...] analyst.

Entries that do not correspond to a job title can be discarded. However, it is better to leave them in the system without tagging any element.

If you are not sure about the annotation, you can skip the record.

Thanks a lot for your work, and I hope you enjoy surfing through all the job titles in the field of data visualization. 😃

If you need help or have questions, don't hesitate to contact me in the discussions section.