---
language:
  - deu
tags:
  - german business registry
  - german roberta
  - language model
  - deutsches handelsregister
widget:
  - text: >-
      Sind mehrere Geschäftsführer bestellt, so wird die Gesellschaft durch zwei
      Geschäftsführer oder durch einen Geschäftsführer gemeinsam mit einem
      Prokuristen <mask>.
    example_title: Fill mask
license: cc-by-nc-sa-4.0
---

# RoBERTa Language Model Pre-trained on German Business Registry Publications

Released in January 2023, this is a German RoBERTa language model trained on 250,000 "CD" files ("Chronologische Abdrücke", chronological registry extracts) provided by the German Business Registry ("Deutsches Handelsregister").

The model can be considered a "base" model, comparable to the original [roberta-base](https://huggingface.co/roberta-base).

Parameters: ~110M

## Intended uses & limitations

You can use the raw model for masked language modeling, but it is mostly intended to be fine-tuned on a downstream task; common choices are named entity recognition (NER) and relation extraction (RE). The model is primarily suited to tasks that use the whole sentence (potentially masked) to make decisions, such as sequence classification, token classification, or question answering. For tasks such as text generation, you should look at generative models instead.
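For masked language modeling, the standard `transformers` fill-mask pipeline applies. A minimal sketch, reusing the widget sentence from the metadata above; the repo id `"org/model-name"` in the usage comment is a placeholder, not this model's actual Hub id:

```python
def top_completions(text: str, model_id: str, top_k: int = 5):
    """Return (token, score) pairs predicted for the <mask> position in `text`."""
    # Deferred import so the sketch can be read without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_id)
    return [(pred["token_str"], pred["score"]) for pred in fill_mask(text, top_k=top_k)]


# The widget sentence from the metadata above, using RoBERTa's <mask> token.
masked = (
    "Sind mehrere Geschäftsführer bestellt, so wird die Gesellschaft durch zwei "
    "Geschäftsführer oder durch einen Geschäftsführer gemeinsam mit einem "
    "Prokuristen <mask>."
)

# Usage ("org/model-name" is a placeholder for the actual Hub repo id):
# for token, score in top_completions(masked, "org/model-name"):
#     print(f"{token!r}: {score:.3f}")
```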

A typical use case is to fine-tune the model on an NER task and use it to structure company data published by the German Business Registry.
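Such a fine-tune needs a token-level label scheme. A minimal BIO sketch; the entity types, example tokens, and tags below are hypothetical illustrations, not part of this model or its training data:

```python
# Hypothetical BIO tag set for registry NER (illustrative only).
LABELS = ["O", "B-COMPANY", "I-COMPANY", "B-PERSON", "I-PERSON", "B-REG_NO", "I-REG_NO"]
label2id = {label: i for i, label in enumerate(LABELS)}
id2label = {i: label for label, i in label2id.items()}

# A toy, pre-tokenized publication snippet with one tag per token.
tokens = ["Musterfirma", "GmbH", ",", "HRB", "123456", ":",
          "Geschäftsführer", "Max", "Mustermann", "."]
tags = ["B-COMPANY", "I-COMPANY", "O", "B-REG_NO", "I-REG_NO", "O",
        "O", "B-PERSON", "I-PERSON", "O"]
```

The `label2id`/`id2label` mappings are what a `transformers` token-classification head expects in its config when fine-tuning.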

## Questions

If you have any questions, feel free to drop a message to info@fusionbase.com. If you are interested in structured company data and/or publications, let us know as well!