Fine-tune Flair Models on NER Dataset with 🤗 AutoTrain SpaceRunner

Community Article Published October 20, 2023

TL;DR: In this blog post, we show how to fine-tune Flair models on the German MobIE NER dataset using the 🤗 AutoTrain library and its SpaceRunner feature. Additionally, we create visually appealing model cards using the 🤗 Hub client library.


The Flair library is a straightforward framework for state-of-the-art NLP, developed by Humboldt University of Berlin and friends, and fully integrated into the Model Hub.

In this blog post, we use Flair to fine-tune a named entity recognition (NER) model on a German dataset from the mobility domain (the MobIE dataset).

This blog post also outlines how to use this dataset in Flair and how to conduct a basic hyper-parameter search.

By leveraging the 🤗 AutoTrain library and its SpaceRunner feature, the entire fine-tuning process can be carried out cost-effectively and efficiently within the Hugging Face ecosystem.

Furthermore, this blog post demonstrates how to automatically upload fine-tuned models and create informative, aesthetically pleasing model cards.

German MobIE Dataset

The German MobIE Dataset was introduced in the MobIE paper by Hennig, Truong and Gabryszak (2021).

This German-language dataset has been manually annotated with 20 coarse- and fine-grained entity types, and it includes entity linking information for geographically linkable entities. It comprises 3,232 social media texts and traffic reports, totaling 91K tokens, with 20.5K annotated entities, of which 13.1K are linked to a knowledge base.

To use this dataset in Flair, we must create our own dataset loader because the dataset has not yet been integrated into the Flair library:

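```python
from pathlib import Path
from typing import Optional, Union
from zipfile import ZipFile

import flair
from flair.datasets import ColumnCorpus
from flair.file_utils import cached_path

MOBIE_ZIP_URL = "..."  # placeholder: download URL of the MobIE NER archive (not reproduced here)

class NER_GERMAN_MOBIE(ColumnCorpus):
    def __init__(
        self,
        base_path: Optional[Union[str, Path]] = None,
        in_memory: bool = True,
        **corpusargs,
    ) -> None:
        base_path = flair.cache_root / "datasets" if not base_path else Path(base_path)
        dataset_name = self.__class__.__name__.lower()
        data_folder = base_path / dataset_name

        columns = {0: "text", 3: "ner"}  # token in column 0, NER tag in column 3

        # Download and extract the archive only if the data is not cached yet
        train_data_file = data_folder / "train.conll2003"
        if not train_data_file.is_file():
            temp_file = cached_path(MOBIE_ZIP_URL, Path("datasets") / dataset_name)
            with ZipFile(temp_file, "r") as zip_file:
                zip_file.extractall(path=data_folder)

        # "-DOCSTART-" lines mark document boundaries, which FLERT uses as context
        super().__init__(data_folder, columns, in_memory=in_memory, comment_symbol=None,
                         document_separator_token="-DOCSTART-", **corpusargs)
```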
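To make the expected file layout concrete, here is a minimal standard-library sketch. It assumes the extracted files follow the usual CoNLL-2003 layout: one token per line with the NER tag in the fourth column, blank lines between sentences, and `-DOCSTART-` lines between documents. `read_column_file` is a hypothetical helper, not part of Flair, and the sample lines and tag names are made up for illustration.

```python
from typing import Dict, List, Tuple

Sentence = List[Tuple[str, str]]


def read_column_file(lines: List[str], columns: Dict[int, str]) -> List[List[Sentence]]:
    """Group CoNLL-style lines into documents -> sentences -> (token, tag) pairs."""
    text_col = next(i for i, name in columns.items() if name == "text")
    tag_col = next(i for i, name in columns.items() if name == "ner")
    documents: List[List[Sentence]] = [[]]
    sentence: Sentence = []
    for raw in lines + [""]:  # the trailing blank line flushes the last sentence
        fields = raw.split()
        if not fields:
            # Blank line: the current sentence is complete
            if sentence:
                documents[-1].append(sentence)
                sentence = []
        elif fields[0] == "-DOCSTART-":
            # Document boundary: close the open sentence and start a new document
            if sentence:
                documents[-1].append(sentence)
                sentence = []
            if documents[-1]:
                documents.append([])
        else:
            sentence.append((fields[text_col], fields[tag_col]))
    return [doc for doc in documents if doc]


# Made-up sample in the assumed four-column layout
sample = [
    "-DOCSTART- O O O",
    "",
    "Stau O O O",
    "auf O O O",
    "der O O O",
    "A9 O O B-location",
    "",
    "-DOCSTART- O O O",
    "",
    "Berlin O O B-location",
]
docs = read_column_file(sample, {0: "text", 3: "ner"})
print(len(docs), [len(doc) for doc in docs])  # → 2 [1, 1]
```

Flair's ColumnCorpus handles this parsing itself; the sketch only illustrates how the `columns` mapping and the document separator relate to the raw files.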


The following figure shows an annotated sentence (taken from the MobIE paper):


Fine-Tuning with Flair

We use the latest Flair version for fine-tuning. The model is trained with the FLERT approach (Schweter and Akbik, 2020), which uses document-level context; this is possible because the MobIE dataset conveniently comes with document boundary markers. The GBERT Base model from Chan et al. (2020) is used as the backbone LM.

We define a very basic hyper-parameter search over the following parameters:

  • Batch Sizes = [16]
  • Learning Rates = [3e-05, 5e-05]
  • Seeds = [1, 2, 3, 4, 5]

This means that 10 models are trained in total. The hyper-parameter search could be implemented like this:

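```python
import os

# ExperimentConfiguration and run_experiment() come from the training script linked below;
# the keyword names passed to ExperimentConfiguration are assumed to mirror the loop variables.

# Hyper-parameter search definitions
batch_sizes = [16]
learning_rates = [3e-05, 5e-05]
seeds = [1, 2, 3, 4, 5]
epochs = [10]
context_sizes = [64]

# Backbone LM definitions
base_model = "deepset/gbert-base"
base_model_short = "gbert_base"

# Hugging Face Model Hub configuration
hf_token = os.environ.get("HF_TOKEN")
hf_hub_org_name = os.environ.get("HUB_ORG_NAME")

# One fine-tuning run per hyper-parameter combination (10 runs in total)
for seed in seeds:
    for batch_size in batch_sizes:
        for epoch in epochs:
            for learning_rate in learning_rates:
                for context_size in context_sizes:
                    experiment_configuration = ExperimentConfiguration(
                        batch_size=batch_size,
                        learning_rate=learning_rate,
                        epoch=epoch,
                        context_size=context_size,
                        seed=seed,
                        base_model=base_model,
                        base_model_short=base_model_short,
                    )
                    output_path = run_experiment(experiment_configuration=experiment_configuration)
```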

The implementation of the run_experiment() method, which contains the complete fine-tuning logic, can be found here.
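As a cross-check of the count of 10 runs, the same grid can be enumerated with `itertools.product`. The configuration label format (`bs16-e10-lr5e-05`) is an assumption chosen to match the labels in the results tables of this post; it is not taken from the training script.

```python
import itertools

batch_sizes = [16]
learning_rates = [3e-05, 5e-05]
seeds = [1, 2, 3, 4, 5]
epochs = [10]
context_sizes = [64]

# Every combination of hyper-parameters is one fine-tuning run
grid = list(itertools.product(seeds, batch_sizes, epochs, learning_rates, context_sizes))
print(len(grid))  # → 10

# Hypothetical configuration label; seeds are averaged over, so they are not part of it
configurations = sorted({f"bs{bs}-e{e}-lr{lr}" for _, bs, e, lr, _ in grid})
print(configurations)  # → ['bs16-e10-lr3e-05', 'bs16-e10-lr5e-05']
```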

Start Fine-Tuning with 🤗 AutoTrain SpaceRunner

We start the fine-tuning process with the 🤗 AutoTrain library and its SpaceRunner feature.

This initiates a Docker-based SpaceRunner, and the entire model fine-tuning process is carried out on hardware provided by Hugging Face.

We utilize a T4 Small instance for our experiments. To set up SpaceRunner, two essential files are required:

  • The training script: This file manages the entire fine-tuning process, including the hyper-parameter search and model uploading. You can find an example of this here.
  • requirements.txt: This file defines all the necessary dependencies that are installed in the AutoTrain Space.

Before starting the AutoTrain fine-tuning, the following environment variables need to be set:

  • HF_TOKEN: This is the User Access Token, which can be obtained here
  • HUB_ORG_NAME: This is the username or organization where the AutoTrain space is created.
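Because the training script reads both variables with `os.environ.get()`, a missing value only shows up later as `None`, for example when the first upload fails. A small fail-fast check at the top of the script can catch this early. `require_env` is a hypothetical helper, not part of AutoTrain.

```python
import os
from typing import Dict, Iterable


def require_env(names: Iterable[str]) -> Dict[str, str]:
    """Return the requested environment variables, failing fast if any is missing or empty."""
    values = {name: os.environ.get(name, "") for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return values


# Usage at the top of the training script:
# config = require_env(["HF_TOKEN", "HUB_ORG_NAME"])
```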

To start the fine-tuning process via the command line, use the following command:

```bash
autotrain spacerunner --project-name "flair-mobie" \
  --script-path $(pwd) \
  --username stefan-it \
  --token $HF_TOKEN \
  --backend spaces-t4s \
  --env "HF_TOKEN=$HF_TOKEN;HUB_ORG_NAME=stefan-it"
```

This command creates a Docker space where the entire fine-tuning process can be monitored. Additionally, it establishes a new dataset repository where all source files are stored.

The fine-tuning of all ten models in this tutorial required 4 hours and 34 minutes on the T4 small instance and had a total cost of $2.74.
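These figures are easy to sanity-check. The sketch below derives the implied hourly rate and the average time and cost per model from the two reported totals; it uses only those totals, not an official price list.

```python
total_minutes = 4 * 60 + 34  # 4 hours and 34 minutes
total_cost_usd = 2.74
num_models = 10

hourly_rate = total_cost_usd / (total_minutes / 60)
minutes_per_model = total_minutes / num_models
cost_per_model = total_cost_usd / num_models

print(f"{hourly_rate:.2f} USD/hour")         # → 0.60 USD/hour
print(f"{minutes_per_model:.1f} min/model")  # → 27.4 min/model
print(f"{cost_per_model:.3f} USD/model")     # → 0.274 USD/model
```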

Model Upload

After each model is fine-tuned, the following files and folders are uploaded to the Model Hub (one repository per model):

  • pytorch_model.bin: Flair internally tracks the best model over all epochs and saves it as a checkpoint. To be compatible with the Model Hub, this checkpoint is automatically renamed to pytorch_model.bin.
  • training.log: Flair stores the training log in training.log. This file is later needed to parse the best F1-score on the development set.
  • ./runs: This folder contains the TensorBoard logs, which enable a nice display of metrics on the Model Hub.

The repository creation and uploading of files/folders are done via the awesome 🤗 Hub client library:

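```python
from huggingface_hub import HfApi

api = HfApi(token=hf_token)
# Hypothetical naming scheme; output_path is assumed to be the pathlib.Path returned by run_experiment()
repo_id = f"{hf_hub_org_name}/{output_path.name}"
# Creates repository (private for now)
repo_url = api.create_repo(repo_id=repo_id, private=True, exist_ok=True)
# Upload TensorBoard logs
api.upload_folder(folder_path=output_path / "runs", path_in_repo="runs", repo_id=repo_id)
# Upload Flair's training log
api.upload_file(path_or_fileobj=output_path / "training.log", path_in_repo="training.log", repo_id=repo_id)
# Upload best model: Flair's best-model.pt becomes pytorch_model.bin on the Hub
api.upload_file(path_or_fileobj=output_path / "best-model.pt", path_in_repo="pytorch_model.bin", repo_id=repo_id)
```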

Model Card Creation

After the automatic upload of all models to the Model Hub, we now want to create model cards for each model with the following features:

  • A nice-looking model card with metadata (important!);
  • A working inference widget to try out other NER examples;
  • A results overview.

In order to create model cards automatically, we extensively use the 🤗 Hub client library.

Metadata Section

We use the following template to define the metadata section of each model:

```yaml
---
language: de
license: mit
tags:
- flair
- token-classification
- sequence-tagger-model
base_model: {{ base_model }}
widget:
- text: {{ widget_text }}
---
```

Later, we pass base_model and widget_text to this template.
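`ModelCard.from_template` renders such a template with Jinja2. To show what the substitution produces without extra dependencies, here is a simplified stand-in that only replaces `{{ name }}` placeholders. `render_placeholders` is a hypothetical helper, and the widget sentence is a made-up example.

```python
import re
from typing import Dict

TEMPLATE = """---
language: de
license: mit
tags:
- flair
- token-classification
- sequence-tagger-model
base_model: {{ base_model }}
widget:
- text: {{ widget_text }}
---"""


def render_placeholders(template: str, values: Dict[str, str]) -> str:
    """Replace each {{ name }} placeholder with values[name] (raises KeyError if missing)."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: values[m.group(1)], template)


metadata = render_placeholders(TEMPLATE, {
    "base_model": "deepset/gbert-base",
    "widget_text": "Stau auf der A9 bei Berlin",  # made-up example sentence
})
print(metadata)
```

A widget text containing YAML-sensitive characters such as `:` would need quoting to keep the metadata valid.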

Results Overview

The results overview section is very important and includes the following steps:

  • Iterating over all fine-tuned models and parsing the training.log file to get the best F1-score on the development set.
  • Constructing a results table for all hyper-parameter configurations (in our example, batch size, number of epochs and learning rate) with the F1-score for every seed, plus the averaged F1-score and standard deviation.

All these steps are shown in our example notebook.
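The first step could look like the following sketch. It assumes that training.log contains one line per epoch of the form `DEV : loss 0.0899 - f1-score (micro avg)  0.8446`; the exact wording depends on the Flair version, so the regular expression may need adjusting. `best_dev_f1` is a hypothetical helper, and the log lines below are made up.

```python
import re
from typing import Iterable, Optional

# Assumed log format, e.g. "... DEV : loss 0.0899 - f1-score (micro avg)  0.8446"
DEV_F1_PATTERN = re.compile(r"DEV : loss \S+ - f1-score \(micro avg\)\s+([\d.]+)")


def best_dev_f1(log_lines: Iterable[str]) -> Optional[float]:
    """Return the highest development F1-score found in the log, or None if there is none."""
    scores = []
    for line in log_lines:
        match = DEV_F1_PATTERN.search(line)
        if match:
            scores.append(float(match.group(1)))
    return max(scores) if scores else None


log = [
    "2023-10-17 10:00:00,000 DEV : loss 0.1021 - f1-score (micro avg)  0.8123",
    "2023-10-17 10:20:00,000 DEV : loss 0.0899 - f1-score (micro avg)  0.8446",
    "2023-10-17 10:40:00,000 DEV : loss 0.0950 - f1-score (micro avg)  0.8401",
]
print(best_dev_f1(log))  # → 0.8446
```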

After retrieving all results from the training.log files, a Pandas DataFrame shows the results averaged over all seeds and grouped by the hyper-parameter configuration:

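| Configuration | Seed 1 | Seed 2 | Seed 3 | Seed 4 | Seed 5 | Average | Std. |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bs16-e10-lr5e-05 | 0.8446 | 0.8495 | 0.8455 | 0.8419 | 0.8476 | 0.8458 | 0.0029 |
| bs16-e10-lr3e-05 | 0.8392 | 0.8445 | 0.8495 | 0.8381 | 0.8449 | 0.8432 | 0.0046 |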
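The Average and Std. columns can be reproduced from the per-seed scores in this table. The sketch below uses the `statistics` module instead of pandas; `statistics.stdev` computes the sample standard deviation, which corresponds to the pandas default (`ddof=1`) and matches the reported values.

```python
from statistics import mean, stdev

# Per-seed development F1-scores from the table above
scores = {
    "bs16-e10-lr5e-05": [0.8446, 0.8495, 0.8455, 0.8419, 0.8476],
    "bs16-e10-lr3e-05": [0.8392, 0.8445, 0.8495, 0.8381, 0.8449],
}

for configuration, values in scores.items():
    # Sample standard deviation (n - 1 in the denominator)
    print(f"{configuration}: {mean(values):.4f} ± {stdev(values):.4f}")
# → bs16-e10-lr5e-05: 0.8458 ± 0.0029
# → bs16-e10-lr3e-05: 0.8432 ± 0.0046
```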

However, this is not enough! We also want to link the corresponding models and highlight the result of the currently viewed model on the Model Hub. The final results table then looks like this:

| Configuration | Seed 1 | Seed 2 | Seed 3 | Seed 4 | Seed 5 | Average |
| --- | --- | --- | --- | --- | --- | --- |
| bs16-e10-lr5e-05 | 0.8446 | 0.8495 | 0.8455 | 0.8419 | 0.8476 | 0.8458 ± 0.0029 |
| bs16-e10-lr3e-05 | 0.8392 | 0.8445 | 0.8495 | 0.8381 | 0.8449 | 0.8432 ± 0.0046 |
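Linking and highlighting can be done while rendering each row as Markdown. In the sketch below, every seed score links to its model repository on the Hub, and the score of the model the card belongs to is set in bold. `format_row` and the `my-org/...` repository ids are hypothetical, not the code from the example notebook.

```python
from statistics import mean, stdev
from typing import List


def format_row(configuration: str, scores: List[float], repo_ids: List[str], current_repo_id: str) -> str:
    """Render one results row as a Markdown table row with linked (and highlighted) scores."""
    cells = []
    for score, repo_id in zip(scores, repo_ids):
        cell = f"[{score:.4f}](https://huggingface.co/{repo_id})"
        # Highlight the result of the model whose card is being rendered
        cells.append(f"**{cell}**" if repo_id == current_repo_id else cell)
    average = f"{mean(scores):.4f} ± {stdev(scores):.4f}"
    return "| " + " | ".join([configuration, *cells, average]) + " |"


# Hypothetical repository ids, one per seed
repos = [f"my-org/flair-mobie-bs16-e10-lr5e-05-{seed}" for seed in range(1, 6)]
row = format_row("bs16-e10-lr5e-05", [0.8446, 0.8495, 0.8455, 0.8419, 0.8476], repos, repos[1])
print(row)
```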

PR Creation

After constructing nice-looking model cards locally, we now push them to the repositories of all our fine-tuned models. Here is the final code snippet, which also lets you define a commit message and description:

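```python
from huggingface_hub import ModelCard, ModelCardData

commit_message = "readme: add initial version of model card"
commit_description = "Hey,\n\nthis PR adds the initial version of model card."
create_pr = True

for model in model_infos:
    current_results_table = get_results_table(final_df, model_infos, model)
    card_data = ModelCardData()  # metadata is defined in the template itself
    card = ModelCard.from_template(card_data, template_path="",  # path to the model card template
                                   batch_sizes=f'[{", ".join([f"`{bs}`" for bs in batch_sizes])}]',
                                   learning_rates=f'[{", ".join([f"`{lr}`" for lr in learning_rates])}]',
                                   # Further template variables (assumed names and values; must match the template)
                                   base_model=base_model,
                                   widget_text=widget_text,
                                   results=current_results_table)

    commit_url = card.push_to_hub(repo_id=model.model_id,
                                  commit_message=commit_message,
                                  commit_description=commit_description,
                                  create_pr=create_pr)
    print(commit_url)
```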

An example PR can be seen here.

It is also possible to set `create_pr` to `False`. In that case, the model card is committed directly to the main branch, without a PR and therefore without any review!

Wait, where are my models?

Initially, the model repositories were created with the private=True option. This means that the models are not yet publicly visible, but they can easily be made public with:

```python
from huggingface_hub import update_repo_visibility
# Now make repositories publicly visible
for model in model_infos:
    print(f"Updating visibility to public for repo {model.model_id}")
    update_repo_visibility(repo_id=model.model_id, private=False)
```

Model Card Showcase

Now it is time to showcase the uploaded model card!

Model Card Header


TensorBoard Metrics


Inference Widget


Results Section



Conclusion

In this blog post, we showed how to use Flair in combination with the 🤗 AutoTrain library and its SpaceRunner feature to fine-tune models on the German MobIE NER dataset with a basic hyper-parameter search.

Additionally, we used the 🤗 Hub client library to automatically upload nice-looking model cards with useful information.

Additional Resources