license: cc-by-sa-3.0
language:
- de
library_name: flair
Flair xLSTM Embeddings (German Wikipedia, Forward)
Research & development of Flair xLSTM Embeddings (Forward) trained on a German Wikipedia dump.
The Flair team is currently working on the integration of xLSTM (both LM training and fine-tuning models for downstream tasks).
Check out the xlstm branch in the Flair repository. Many thanks to Patrick Haller for the work on it.
Training
The current model was trained with commit 18ef331 from the xlstm branch. The xlstm library needs to be installed manually. Also make sure that Ninja is installed (pip3 install Ninja).
The German Wikipedia dump from this repository was used and sharded into a Flair-compatible corpus format:
valid.txt -> Validation corpus
test.txt -> Test corpus
train -> Folder with text files as training corpus
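The layout above can be produced with a few lines of standard-library Python. The following is only a minimal sketch, not the script used for this model: the function name write_flair_lm_corpus, the train_split_N file names, the split sizes and the choice to hold out the first lines for validation and test are all illustrative assumptions.

```python
import tempfile
from pathlib import Path


def write_flair_lm_corpus(lines, out_dir, lines_per_split=2, n_valid=1, n_test=1):
    """Write lines into valid.txt, test.txt and a train/ folder of shard files.

    Hypothetical helper: names and split sizes are assumptions for illustration.
    """
    out = Path(out_dir)
    train_dir = out / "train"
    train_dir.mkdir(parents=True, exist_ok=True)

    # Hold out the first lines for validation and test (an arbitrary choice here).
    valid = lines[:n_valid]
    test = lines[n_valid:n_valid + n_test]
    train = lines[n_valid + n_test:]

    (out / "valid.txt").write_text("\n".join(valid) + "\n", encoding="utf-8")
    (out / "test.txt").write_text("\n".join(test) + "\n", encoding="utf-8")

    # Shard the remaining lines into fixed-size training files.
    split_paths = []
    for start in range(0, len(train), lines_per_split):
        path = train_dir / f"train_split_{start // lines_per_split + 1}"
        chunk = train[start:start + lines_per_split]
        path.write_text("\n".join(chunk) + "\n", encoding="utf-8")
        split_paths.append(path)
    return split_paths


# Small demonstration on toy data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_paths = write_flair_lm_corpus([f"Satz {i}" for i in range(7)], tmp)
    print(sorted(p.name for p in Path(tmp).rglob("*") if p.is_file()))
```

For a real Wikipedia dump the training shards would be far larger. The training log further down shows 113 training splits for this model.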
The model was trained with the following parameters for 2 epochs:
import flair
import torch

from flair.data import SubTokenDictionary
from flair.models import xLSTMLanguageModel
from flair.trainers.language_model_trainer import LanguageModelTrainer, TextCorpus
from transformers import AutoTokenizer

flair.device = torch.device('cuda:0')

# Train a forward language model
is_forward_lm = True

# Load the subtoken dictionary
dictionary = SubTokenDictionary.load("gwlms/bert-base-dewiki-v1")

# Corpus folder containing train/, valid.txt and test.txt
corpus = TextCorpus("/home/ubuntu/splitted_corpus",
                    dictionary,
                    is_forward_lm,
                    character_level=False,
                    random_case_flip=True,
                    )

# xLSTM block configuration (YAML)
xlstm_ablation_1 = """
mlstm_block:
  mlstm:
    conv1d_kernel_size: 2
    qkv_proj_blocksize: 2
    num_heads: 2
slstm_block:
  slstm:
    backend: cuda
    num_heads: 2
    conv1d_kernel_size: 2
    bias_init: powerlaw_blockdependent
  feedforward:
    proj_factor: 1.3
    act_fn: gelu
context_length: 256
num_blocks: 7
embedding_dim: 128
slstm_at: [1]
"""

language_model = xLSTMLanguageModel(dictionary, xlstm_cfg=xlstm_ablation_1,
                                    is_forward_lm=True)
print(language_model)

trainer = LanguageModelTrainer(language_model, corpus)

trainer.train("xflair-german-wikipedia-xlstm_ablation_1-bs64-lr5-e2",
              sequence_length=256,
              mini_batch_size=64,
              learning_rate=5,
              patience=50,
              max_epochs=2,
              checkpoint=False,
              num_workers=4,
              )
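To make the num_blocks and slstm_at settings in the configuration easier to read, the sketch below lists which block type ends up at which position. It assumes, as in the xlstm library's block stack configuration, that slstm_at holds zero-based indices of the blocks that use sLSTM and that all remaining blocks use mLSTM. The helper block_layout is hypothetical and not part of Flair or xlstm.

```python
def block_layout(num_blocks, slstm_at):
    """Return the block type per position (hypothetical helper).

    Assumption: indices in slstm_at are zero-based and mark sLSTM blocks,
    every other block is an mLSTM block.
    """
    return ["sLSTM" if i in slstm_at else "mLSTM" for i in range(num_blocks)]


# Values taken from the configuration used for this model.
layout = block_layout(num_blocks=7, slstm_at=[1])
for index, kind in enumerate(layout):
    print(index, kind)
```

Under that assumption the model stacks one sLSTM block (at index 1) and six mLSTM blocks.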
The last lines of the training log:
2024-06-10 22:06:54,411 Split 113 - (22:06:54)
2024-06-10 22:07:23,726 | split 113/113 | 100/ 773 batches | ms/batch 293.11 | loss 4.4117 | ppl 82.4075
2024-06-10 22:07:52,762 | split 113/113 | 200/ 773 batches | ms/batch 290.36 | loss 4.3306 | ppl 75.9880
2024-06-10 22:08:21,813 | split 113/113 | 300/ 773 batches | ms/batch 290.51 | loss 4.3406 | ppl 76.7523
2024-06-10 22:08:50,869 | split 113/113 | 400/ 773 batches | ms/batch 290.56 | loss 4.3063 | ppl 74.1655
2024-06-10 22:09:19,923 | split 113/113 | 500/ 773 batches | ms/batch 290.54 | loss 4.3354 | ppl 76.3573
2024-06-10 22:09:48,965 | split 113/113 | 600/ 773 batches | ms/batch 290.41 | loss 4.3417 | ppl 76.8392
2024-06-10 22:10:18,014 | split 113/113 | 700/ 773 batches | ms/batch 290.50 | loss 4.3299 | ppl 75.9367
2024-06-10 22:10:45,001 best loss so far 7.03638310
2024-06-10 22:10:46,537 ['ist ein Wildschlafen, der vom Schmelzwärmezug verbindet. Zimmer und Kondonien.
Der nächste Ausbau der Geländeulanzhaube in dem zweitältesten Zentrum liegt an seinem 2003 gegründete Mooshalle.
Das fertige große Jagdwasserkraftwerk befindet sich damit im benachbarten Astasper Ortsteil Zechbach nahe der Lenzeifel.
Er bildet ab dem 11. Juni 1999 eine Ortschaft ( bis 2009 Stollladen - Laufen ) in der Landschaft, liegt nur noch in Augen und Hetz.
Verkehr. Die Bahn', 'Kleinsecker. Verwandter. Löwenmann ( auch * Hans ), einer Person von Gottfried Meyer, unter.
Die Herkunft der 1810 verlorenen Familie, Ziegelei, Börsenbuch, Personen, Schriften, Jugendeinheit und die Öffentlichkeitsarbeit dienen dem Pfarrer in Knechtenmann dort.
Zur Genetion sind die übrigen Menschen weit verbreitet, in denen sich das Leben der " Admiralism " widmen.
Ein besonderes Verbreitungsgebiet erstreckt sich in grober Form : " Anthogrammam ist eine schlanker, gepflanzt etwa']
2024-06-10 22:10:46,537 -----------------------------------------------------------------------------------------
2024-06-10 22:10:46,538 | end of split 113 /113 | epoch 2 | time: 232.14s | valid loss 7.0906 | valid ppl 1200.6055 | learning rate 0.0781
2024-06-10 22:10:46,538 -----------------------------------------------------------------------------------------
2024-06-10 22:10:46,538 232 seconds for train split 113
2024-06-10 22:10:46,846 Epoch time: 26260.23
2024-06-10 22:10:52,959 TEST: valid loss 7.0908 | valid ppl 1200.8965
2024-06-10 22:10:52,959 -----------------------------------------------------------------------------------------
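The ppl values in the log are the exponential of the corresponding loss values. The short sketch below checks this for three (loss, ppl) pairs copied from the log above. The helper name perplexity and the tolerance are illustrative choices, and small deviations are expected because the logged loss is rounded to four decimals.

```python
import math


def perplexity(loss):
    """Perplexity as the exponential of the (natural-log) cross-entropy loss."""
    return math.exp(loss)


# (loss, ppl) pairs copied from the training log above.
logged = [
    (4.4117, 82.4075),     # first training batch report of split 113
    (7.0906, 1200.6055),   # validation at the end of epoch 2
    (7.0908, 1200.8965),   # final TEST line
]

for loss, ppl in logged:
    # Allow a small relative tolerance for the rounding of the logged loss.
    matches = math.isclose(perplexity(loss), ppl, rel_tol=1e-3)
    print(f"loss {loss:.4f} -> ppl {perplexity(loss):.2f} (logged {ppl}, match: {matches})")
```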
Caveats
Notice: this model integration is under heavy development, and good hyper-parameters are still being searched for. Downstream experiments are coming very soon.