readme: mention potential bug in pretraining (truncated Wikipedia articles are used)
README.md CHANGED
@@ -18,6 +18,9 @@ Initially, we integrated xLSTM model training into Flair - for more information
 
 # Changelog
 
+- 06.09.2024: We discovered a (potential) bug in the pretraining code: when using the complete Wikipedia corpus, unfortunately only the first 512 subtokens of each article are used.
+- We implemented a grouping-based approach that tokenizes the whole corpus and groups it into 512-subtoken chunks (a sketch follows after this diff).
+- Pretraining with this new approach is currently running.
 - 29.08.2024: Uploaded re-trained model for 1 epoch over the complete German Wikipedia corpus. Training was done with gradient clipping (0.25).
 - 28.08.2024: Model training is now done with a [Helibrunna](https://github.com/AI-Guru/helibrunna) fork - find it [here](https://github.com/HallerPatrick/helibrunna).
 - 10.06.2024: Initial version. xLSTM was trained with the Flair library, see this [old](https://huggingface.co/stefan-it/xlstm-german-wikipedia/blob/flair-old/README.md) branch.
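For context, the grouping-based chunking mentioned in the 06.09.2024 entry is the usual alternative to per-article truncation: tokenize every article without truncation, concatenate the token streams, and cut them into fixed 512-subtoken blocks, so nothing past the first 512 subtokens of an article gets discarded. Below is a minimal sketch of that approach, assuming the Hugging Face `datasets`/`transformers` stack; the tokenizer checkpoint, dataset identifier, and helper names are illustrative assumptions, not the actual Helibrunna implementation:

```python
# Sketch of grouping-based chunking (not the actual Helibrunna code).
# Tokenizer checkpoint and dataset identifier below are assumptions.
from datasets import load_dataset
from transformers import AutoTokenizer

block_size = 512  # chunk length in subtokens, per the changelog entry

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder tokenizer
dataset = load_dataset("wikimedia/wikipedia", "20231101.de", split="train")

def tokenize(examples):
    # Tokenize full articles WITHOUT truncation; long-sequence warnings
    # are harmless here because we re-chunk everything afterwards.
    return tokenizer(examples["text"])

def group_texts(examples):
    # Concatenate all tokenized articles in the batch, then split the
    # resulting token stream into fixed blocks of `block_size` subtokens.
    concatenated = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = (len(concatenated["input_ids"]) // block_size) * block_size
    return {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)
lm_dataset = tokenized.map(group_texts, batched=True)
```

With this scheme the only tokens dropped are the final partial block of the concatenated stream (fewer than 512 subtokens in total), instead of everything beyond the first 512 subtokens of each article.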