Stefan PRO
stefan-it
AI & ML interests
Flair Library, NER & PoS Tagging, LM Pretraining (mostly encoder-only), Historical Language Models
stefan-it's activity
upvoted a paper 7 days ago
upvoted a paper 11 days ago
upvoted a paper 12 days ago
upvoted a paper 18 days ago
upvoted a paper 20 days ago
upvoted a paper 21 days ago
upvoted a paper 28 days ago
Fundus: A Simple-to-Use News Scraper Optimized for High Quality Extractions
Paper • 2403.15279 • Published • 1
CO-Fun: A German Dataset on Company Outsourcing in Fund Prospectuses for Named Entity Recognition and Relation Extraction
Paper • 2403.15322 • Published • 1
MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank
Paper • 2403.10293 • Published • 1
upvoted a paper 2 months ago
upvoted a collection 2 months ago
SemRel2024: A Collection of Semantic Textual Relatedness Datasets for 14 Languages
Paper • 2402.08638 • Published • 1
Pixel Sentence Representation Learning
Paper • 2402.08183 • Published • 2
Fractal Patterns May Unravel the Intelligence in Next-Token Prediction
Paper • 2402.01825 • Published • 2
Fine-tuning Transformer-based Encoder for Turkish Language Understanding Tasks
Paper • 2401.17396 • Published • 1
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 22
ToPro: Token-Level Prompt Decomposition for Cross-Lingual Sequence Labeling Tasks
Paper • 2401.16589 • Published • 1
DrBERT: Unveiling the Potential of Masked Language Modeling Decoder in BERT pretraining
Paper • 2401.15861 • Published • 1
Where's the Point? Self-Supervised Multilingual Punctuation-Agnostic Sentence Segmentation
Paper • 2305.18893 • Published • 2
TURNA: A Turkish Encoder-Decoder Language Model for Enhanced Understanding and Generation
Paper • 2401.14373 • Published • 10
SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection
Paper • 2401.13160 • Published • 9
LangBridge: Multilingual Reasoning Without Multilingual Supervision
Paper • 2401.10695 • Published • 4
Headless Language Models: Learning without Predicting with Contrastive Weight Tying
Paper • 2309.08351 • Published • 3
Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions
Paper • 2207.14251 • Published • 1
Cross-lingual Editing in Multilingual Language Models
Paper • 2401.10521 • Published • 1
Mission: Impossible Language Models
Paper • 2401.06416 • Published • 3
RoBERTurk: Adjusting RoBERTa for Turkish
Paper • 2401.03515 • Published • 1
PIXAR: Auto-Regressive Language Modeling in Pixel Space
Paper • 2401.03321 • Published • 1
German Text Embedding Clustering Benchmark
Paper • 2401.02709 • Published • 5
MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining
Paper • 2312.17482 • Published • 1
Observable Propagation: A Data-Efficient Approach to Uncover Feature Vectors in Transformers
Paper • 2312.16291 • Published • 1
Language Resources for Dutch Large Language Modelling
Paper • 2312.12852 • Published • 9
WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models
Paper • 2112.06598 • Published • 1
PromptBench: A Unified Library for Evaluation of Large Language Models
Paper • 2312.07910 • Published • 14
On Meta-Prompting
Paper • 2312.06562 • Published • 1
Aligner: One Global Token is Worth Millions of Parameters When Aligning Large Language Models
Paper • 2312.05503 • Published • 1
Gated Linear Attention Transformers with Hardware-Efficient Training
Paper • 2312.06635 • Published • 3
RoAST: Robustifying Language Models via Adversarial Perturbation with Selective Training
Paper • 2312.04032 • Published • 1
Advancing State of the Art in Language Modeling
Paper • 2312.03735 • Published • 1
SmoothQuant+: Accurate and Efficient 4-bit Post-Training WeightQuantization for LLM
Paper • 2312.03788 • Published • 1
Monarch: Expressive Structured Matrices for Efficient and Accurate Training
Paper • 2204.00595 • Published • 1
NERetrieve: Dataset for Next Generation Named Entity Recognition and Retrieval
Paper • 2310.14282 • Published • 5
Larger-Scale Transformers for Multilingual Masked Language Modeling
Paper • 2105.00572 • Published • 1
SeaLLMs -- Large Language Models for Southeast Asia
Paper • 2312.00738 • Published • 23
Instruction-tuning Aligns LLMs to the Human Brain
Paper • 2312.00575 • Published • 10
CoLLiE: Collaborative Training of Large Language Models in an Efficient Way
Paper • 2312.00407 • Published • 2
Nonparametric Variational Regularisation of Pretrained Transformers
Paper • 2312.00662 • Published • 1
Mark My Words: Analyzing and Evaluating Language Model Watermarks
Paper • 2312.00273 • Published • 3
An Efficient Multilingual Language Model Compression through Vocabulary Trimming
Paper • 2305.15020 • Published • 1
Breaking the Language Barrier: Improving Cross-Lingual Reasoning with Structured Self-Attention
Paper • 2310.15258 • Published • 2
Power Hungry Processing: Watts Driving the Cost of AI Deployment?
Paper • 2311.16863 • Published • 6
RETVec: Resilient and Efficient Text Vectorizer
Paper • 2302.09207 • Published • 1