metadata
language:
  - multilingual
  - en
  - de
license: mit
widget:
  - text: >-
      ich glaub ich muss echt rewatchen like i so empty was soll ich denn jetzt
      machen
    example_title: Example 1
  - text: Ich hab das selbst gedownloadet I have the receipts
    example_title: Example 2
  - text: >-
      Ich dachte jz mit dem Date wäre der andere raus I know overthinken ist
      dein Problem
    example_title: Example 3

German-English Code-Switching Identification

This is the TongueSwitcher BERT model, fine-tuned to identify German-English code-switching. It was introduced in the TongueSwitcher paper (see the BibTeX entry below). The model is case-sensitive.
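A minimal usage sketch, assuming the model is loaded through the `transformers` token-classification pipeline. The repository ID in `MODEL_ID` is an assumption (this card does not state it), and `load_tagger` and `merge_wordpieces` are hypothetical helper names introduced here for illustration. The label names returned by the model are not documented in this card, so the sketch prints whatever the pipeline returns.

```python
from typing import Dict, List, Tuple

# Assumption: the Hub repository ID is not given in this card.
# Replace it with the actual ID of this model.
MODEL_ID = "igorsterner/german-english-code-switching-identification"


def load_tagger(model_id: str = MODEL_ID):
    """Build a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so merge_wordpieces can be used without transformers installed.
    from transformers import pipeline

    return pipeline("token-classification", model=model_id)


def merge_wordpieces(predictions: List[Dict]) -> List[Tuple[str, str]]:
    """Join BERT word pieces ("##...") into words, keeping the first piece's label."""
    words: List[Tuple[str, str]] = []
    for pred in predictions:
        token, label = pred["word"], pred["entity"]
        if token.startswith("##") and words:
            prev_word, prev_label = words[-1]
            words[-1] = (prev_word + token[2:], prev_label)
        else:
            words.append((token, label))
    return words


if __name__ == "__main__":
    tagger = load_tagger()
    # One of the widget examples from this card. Casing is kept as-is
    # because the model is case-sensitive.
    text = "Ich hab das selbst gedownloadet I have the receipts"
    for word, label in merge_wordpieces(tagger(text)):
        print(f"{word}\t{label}")
```

Keeping the first word piece's label is one simple way to get word-level output; it is a design choice of this sketch, not necessarily how the authors aggregate predictions.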

Overview

  • Initialized language model: german-english-code-switching-bert
  • Training data: The Denglish Corpus
  • Infrastructure: 1x NVIDIA A100 GPU
  • Published: 16 October 2023

Hyperparameters

batch_size = 16
epochs = 3
n_steps = 789
max_seq_len = 512
learning_rate = 3e-5
weight_decay = 0.01
seed = 2021
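For anyone trying to reproduce a similar fine-tuning run, the sketch below shows one way these values could map onto the Hugging Face `Trainer` API. This card does not say which training framework was used, so the mapping is an assumption; `trainer_kwargs`, `tokenizer_kwargs`, and the output directory name are hypothetical names introduced here.

```python
# Values copied from the hyperparameter list above.
HYPERPARAMETERS = {
    "batch_size": 16,
    "epochs": 3,
    "n_steps": 789,
    "max_seq_len": 512,
    "learning_rate": 3e-5,
    "weight_decay": 0.01,
    "seed": 2021,
}


def trainer_kwargs(hp: dict, output_dir: str = "tongueswitcher-finetune") -> dict:
    """Map the reported values to transformers.TrainingArguments keyword names."""
    # n_steps is deliberately left out: the card does not say whether it is
    # a setting or the resulting number of optimizer steps.
    return {
        "output_dir": output_dir,  # hypothetical directory name
        # Single GPU (see Overview), so per-device batch size = batch size.
        "per_device_train_batch_size": hp["batch_size"],
        "num_train_epochs": hp["epochs"],
        "learning_rate": hp["learning_rate"],
        "weight_decay": hp["weight_decay"],
        "seed": hp["seed"],
    }


def tokenizer_kwargs(hp: dict) -> dict:
    """The sequence length limit is applied at tokenization time, not by the trainer."""
    return {"truncation": True, "max_length": hp["max_seq_len"]}


if __name__ == "__main__":
    from transformers import TrainingArguments

    args = TrainingArguments(**trainer_kwargs(HYPERPARAMETERS))
    print(args.learning_rate, args.num_train_epochs)
```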

Authors

  • Igor Sterner: is473 [at] cam.ac.uk
  • Simone Teufel: sht25 [at] cam.ac.uk

BibTeX entry and citation info

@inproceedings{sterner2023tongueswitcher,
  author    = {Igor Sterner and Simone Teufel},
  title     = {TongueSwitcher: Fine-Grained Identification of German-English Code-Switching},
  booktitle = {Sixth Workshop on Computational Approaches to Linguistic Code-Switching},
  publisher = {Empirical Methods in Natural Language Processing},
  year      = {2023},
}