Stefan Schweter PRO

stefan-it

AI & ML interests

Flair Library 💕, NER & PoS Tagging, LM Pretraining (mostly encoder-only & encoder-decoder), Historical Language Models

Recent Activity

reacted to jsulz's post with 🔥 about 9 hours ago

Huge week for https://huggingface.co/xet-team as Llama 4 is the first major model on Hugging Face uploaded with Xet providing the backing! Every byte downloaded comes through our infrastructure. Using Xet on Hugging Face is the fastest way to download and iterate on open source models, and we've proved it with Llama 4, giving a boost of ~25% across all models. We expect builders on the Hub to see even more improvements, helping power innovation across the community. With the models on our infrastructure, we can peer in and see how well our dedupe performs across the Llama 4 family. On average, we're seeing ~25% dedupe, providing huge savings to community members who iterate on these state-of-the-art models. The attached image shows a few selected models and how they perform on Xet. Thanks to the https://huggingface.co/meta-llama team for launching on Xet!

upvoted a paper about 19 hours ago

Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

upvoted a paper 4 days ago

Boundless Byte Pair Encoding: Breaking the Pre-tokenization Barrier

stefan-it's activity

upvoted a paper about 19 hours ago

Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models

Paper • 2504.03624 • Published 3 days ago • 2

upvoted 3 papers 4 days ago

Boundless Byte Pair Encoding: Breaking the Pre-tokenization Barrier

Paper • 2504.00178 • Published 7 days ago • 1

Multimodal LLMs for OCR, OCR Post-Correction, and Named Entity Recognition in Historical Documents

Paper • 2504.00414 • Published 7 days ago • 1

Overcoming Vocabulary Constraints with Pixel-level Fallback

Paper • 2504.02122 • Published 5 days ago • 1

upvoted a collection 12 days ago

E3C-Projected

This collection contains the datasets projected from layer one of the English E3C corpus into Greek, Italian, Polish, Slovak, and Slovenian • 11 items • Updated Jan 8 • 1

upvoted a paper 14 days ago

State Fourier Diffusion Language Model (SFDLM): A Scalable, Novel Iterative Approach to Language Modeling

Paper • 2503.17382 • Published 23 days ago • 1

upvoted a paper 15 days ago

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Paper • 2211.05100 • Published Nov 9, 2022 • 31

upvoted an article 20 days ago

Xet is on the Hub

Article • 21 days ago • 44

upvoted 3 papers 21 days ago

xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference

Paper • 2503.13427 • Published 21 days ago • 3

UniBERTs: Adversarial Training for Language-Universal Representations

Paper • 2503.12608 • Published 22 days ago • 1

SuperBPE: Space Travel for Language Models

Paper • 2503.13423 • Published 21 days ago • 9

upvoted a paper 22 days ago

Do Construction Distributions Shape Formal Language Learning In German BabyLMs?

Paper • 2503.11593 • Published 24 days ago • 1

upvoted 2 papers 25 days ago

HyperZ·Z·W Operator Connects Slow-Fast Networks for Full Context Interaction

Paper • 2401.17948 • Published Jan 31, 2024 • 4

Transformers without Normalization

Paper • 2503.10622 • Published 25 days ago • 153

upvoted a paper 27 days ago

Modern Models, Medieval Texts: A POS Tagging Study of Old Occitan

Paper • 2503.07827 • Published 28 days ago • 1

upvoted a paper about 1 month ago

Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models

Paper • 2502.14901 • Published Feb 18 • 2

upvoted a paper about 2 months ago

NER4all or Context is All You Need: Using LLMs for low-effort, high-performance NER on historical texts. A humanities informed approach

Paper • 2502.04351 • Published Feb 4 • 1

upvoted 3 papers 2 months ago

AI-assisted German Employment Contract Review: A Benchmark Dataset

Paper • 2501.17194 • Published Jan 27 • 1

Familiarity: Better Evaluation of Zero-Shot Named Entity Recognition by Quantifying Label Shifts in Synthetic Training Data

Paper • 2412.10121 • Published Dec 13, 2024 • 2

MorphBPE: A Morpho-Aware Tokenizer Bridging Linguistic Complexity for Efficient LLM Training Across Morphologies

Paper • 2502.00894 • Published Feb 2 • 2