metadata

license: cc-by-sa-3.0
language:
  - de
library_name: flair

Flair xLSTM Embeddings (German Wikipedia, Forward)

Research & development of Flair xLSTM Embeddings (Forward), trained on a German Wikipedia dump.

The Flair team is currently working on the integration of xLSTM, covering both LM training and fine-tuning of models for downstream tasks. Check out the xlstm branch in the Flair repository. Many thanks to Patrick Haller for the work on it.

Training

The current model was trained with commit 18ef331 from the xlstm branch. The xlstm library needs to be installed manually. Also make sure that Ninja is installed (pip3 install Ninja).

The German Wikipedia dump from this repository is used. The corpus is sharded into a Flair-compatible layout:

valid.txt -> Validation corpus
test.txt -> Test corpus
train -> Folder with text files as training corpus
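
The layout above can be produced with a few lines of Python. The following is only a minimal sketch and not the script used for this model: the function name `shard_corpus`, the shard file names, and the split sizes are hypothetical choices made for illustration.

```python
from itertools import islice
from pathlib import Path


def write_lines(path, lines):
    """Write one sentence/paragraph per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line.rstrip("\n") + "\n")


def shard_corpus(lines, out_dir, lines_per_split=100_000, n_valid=1_000, n_test=1_000):
    """Stream `lines` into valid.txt, test.txt and a train/ folder of shards.

    All sizes are illustrative assumptions, not the values used for this model.
    Returns the number of training shards written.
    """
    out = Path(out_dir)
    (out / "train").mkdir(parents=True, exist_ok=True)

    it = iter(lines)
    # The first lines go to the validation and test files.
    write_lines(out / "valid.txt", islice(it, n_valid))
    write_lines(out / "test.txt", islice(it, n_test))

    # The remaining lines are written as fixed-size training shards.
    n_splits = 0
    while True:
        chunk = list(islice(it, lines_per_split))
        if not chunk:
            break
        n_splits += 1
        write_lines(out / "train" / f"train_split_{n_splits}", chunk)
    return n_splits
```

For a real dump, `lines` would typically be an open file handle, so the corpus is streamed instead of being loaded into memory at once.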

The model was trained with the following parameters for 2 epochs:

import flair
import torch

from flair.data import SubTokenDictionary
from flair.models import xLSTMLanguageModel
from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus

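from transformers import AutoTokenizer

# Train on the first GPU
flair.device = torch.device('cuda:0')

# Train a forward language model
is_forward_lm = True

# Load the subtoken dictionary from the gwlms/bert-base-dewiki-v1 model id
dictionary = SubTokenDictionary.load("gwlms/bert-base-dewiki-v1")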

corpus = TextCorpus("/home/ubuntu/splitted_corpus",
                    dictionary,
                    is_forward_lm,
                    character_level=False,
                    random_case_flip=True,
                    )

xlstm_ablation_1 = """
mlstm_block:
  mlstm:
    conv1d_kernel_size: 2
    qkv_proj_blocksize: 2
    num_heads: 2
slstm_block:
  slstm:
    backend: cuda
    num_heads: 2
    conv1d_kernel_size: 2
    bias_init: powerlaw_blockdependent
  feedforward:
    proj_factor: 1.3
    act_fn: gelu
context_length: 256
num_blocks: 7
embedding_dim: 128
slstm_at: [1]
"""

language_model = xLSTMLanguageModel(dictionary, xlstm_cfg=xlstm_ablation_1,
                                    is_forward_lm=is_forward_lm)
print(language_model)

trainer = LanguageModelTrainer(language_model, corpus)

trainer.train("xflair-german-wikipedia-xlstm_ablation_1-bs64-lr5-e2",
              sequence_length=256,
              mini_batch_size=64,
              learning_rate=5,
              patience=50,
              max_epochs=2,
              checkpoint=False,
              num_workers=4,
              )
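
To make the block configuration in `xlstm_ablation_1` easier to read, the short sketch below expands `num_blocks` and `slstm_at` into an explicit per-block layout. It assumes the convention of the xlstm library that `slstm_at` holds zero-based block indices and that every other block is an mLSTM block. The helper `block_layout` is a hypothetical name and not part of Flair or xlstm.

```python
# Values copied from the xlstm_ablation_1 config in the training script.
num_blocks = 7
slstm_at = [1]


def block_layout(num_blocks, slstm_at):
    """Return the block type per position (assumes zero-based indices in slstm_at)."""
    return ["sLSTM" if i in slstm_at else "mLSTM" for i in range(num_blocks)]


layout = block_layout(num_blocks, slstm_at)
print(layout)
# ['mLSTM', 'sLSTM', 'mLSTM', 'mLSTM', 'mLSTM', 'mLSTM', 'mLSTM']
```

Under that assumption, the model stacks one sLSTM block at the second position and six mLSTM blocks elsewhere.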

The last lines of the training log:

2024-06-10 22:06:54,411 Split 113        - (22:06:54)
2024-06-10 22:07:23,726 | split 113/113 |   100/  773 batches | ms/batch 293.11 | loss 4.4117 | ppl 82.4075
2024-06-10 22:07:52,762 | split 113/113 |   200/  773 batches | ms/batch 290.36 | loss 4.3306 | ppl 75.9880
2024-06-10 22:08:21,813 | split 113/113 |   300/  773 batches | ms/batch 290.51 | loss 4.3406 | ppl 76.7523
2024-06-10 22:08:50,869 | split 113/113 |   400/  773 batches | ms/batch 290.56 | loss 4.3063 | ppl 74.1655
2024-06-10 22:09:19,923 | split 113/113 |   500/  773 batches | ms/batch 290.54 | loss 4.3354 | ppl 76.3573
2024-06-10 22:09:48,965 | split 113/113 |   600/  773 batches | ms/batch 290.41 | loss 4.3417 | ppl 76.8392
2024-06-10 22:10:18,014 | split 113/113 |   700/  773 batches | ms/batch 290.50 | loss 4.3299 | ppl 75.9367
2024-06-10 22:10:45,001 best loss so far 7.03638310
2024-06-10 22:10:46,537 ['ist ein Wildschlafen, der vom Schmelzwärmezug verbindet. Zimmer und Kondonien.
Der nächste Ausbau der Geländeulanzhaube in dem zweitältesten Zentrum liegt an seinem 2003 gegründete Mooshalle.
Das fertige große Jagdwasserkraftwerk befindet sich damit im benachbarten Astasper Ortsteil Zechbach nahe der Lenzeifel.
Er bildet ab dem 11. Juni 1999 eine Ortschaft ( bis 2009 Stollladen - Laufen ) in der Landschaft, liegt nur noch in Augen und Hetz.
Verkehr. Die Bahn', 'Kleinsecker. Verwandter. Löwenmann ( auch * Hans ), einer Person von Gottfried Meyer, unter.
Die Herkunft der 1810 verlorenen Familie, Ziegelei, Börsenbuch, Personen, Schriften, Jugendeinheit und die Öffentlichkeitsarbeit dienen dem Pfarrer in Knechtenmann dort.
Zur Genetion sind die übrigen Menschen weit verbreitet, in denen sich das Leben der " Admiralism " widmen.
Ein besonderes Verbreitungsgebiet erstreckt sich in grober Form : " Anthogrammam ist eine schlanker, gepflanzt etwa']
2024-06-10 22:10:46,537 -----------------------------------------------------------------------------------------
2024-06-10 22:10:46,538 | end of split 113 /113 | epoch   2 | time: 232.14s | valid loss 7.0906 | valid ppl 1200.6055 | learning rate 0.0781
2024-06-10 22:10:46,538 -----------------------------------------------------------------------------------------
2024-06-10 22:10:46,538 232 seconds for train split 113
2024-06-10 22:10:46,846 Epoch time: 26260.23
2024-06-10 22:10:52,959 TEST: valid loss 7.0908 | valid ppl 1200.8965
2024-06-10 22:10:52,959 -----------------------------------------------------------------------------------------
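
The `loss` and `ppl` columns in the log are related by ppl = exp(loss). The sketch below recomputes two of the logged values. It also checks that the logged learning rate of 0.0781 is consistent with the initial rate of 5 having been reduced three times by a factor of 0.25. That anneal factor is an assumption, since it is not stated in this card.

```python
import math


def perplexity(loss):
    """Perplexity is the exponential of the (natural-log) cross-entropy loss."""
    return math.exp(loss)


# Loss values copied from the training log above.
train_ppl = perplexity(4.4117)  # close to the logged 82.4075
valid_ppl = perplexity(7.0906)  # close to the logged 1200.6055

# Assumption: the trainer multiplies the learning rate by 0.25 on each anneal step.
initial_lr = 5
assumed_anneal_factor = 0.25
annealed_lr = initial_lr * assumed_anneal_factor ** 3

print(f"train ppl ~ {train_ppl:.2f}, valid ppl ~ {valid_ppl:.2f}")
print(f"learning rate after three anneals: {annealed_lr:.4f}")
```

Small differences from the logged perplexities are expected, because the log prints the loss rounded to four decimals.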

Caveats

Notice: this model integration is under heavy development, and the search for good hyper-parameters is still ongoing. Downstream experiments are coming very soon.