Peter Szemraj PRO

pszemraj

https://pszemraj.carrd.co/

AI & ML interests

metallic intuition

Recent Activity

liked a model about 16 hours ago

avsolatorio/GIST-small-Embedding-v0

liked a model 1 day ago

dunzhang/stella_en_400M_v5

reacted to MoritzLaurer's post with 👍 2 days ago

Quite excited by the ModernBERT release! 0.15/0.4B small, 2T modern pre-training data and tokenizer with code, 8k context window, great efficient model for embeddings & classification! This will probably be the basis for many future SOTA encoders! And I can finally stop using DeBERTav3 from 2021 :D Congrats @answerdotai, @LightOnIO and collaborators like @tomaarsen! Paper and models here 👇 https://huggingface.co/collections/answerdotai/modernbert-67627ad707a4acbf33c41deb

pszemraj's activity

upvoted 4 papers 3 days ago

How to Synthesize Text Data without Model Collapse?

Paper • 2412.14689 • Published 3 days ago • 34

LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks

Paper • 2412.15204 • Published 3 days ago • 27

Qwen2.5 Technical Report

Paper • 2412.15115 • Published 3 days ago • 289

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

Paper • 2412.13663 • Published 4 days ago • 93

upvoted a collection 3 days ago

ModernBERT

Collection

Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated 3 days ago • 78

upvoted 3 papers 3 days ago

Byte Latent Transformer: Patches Scale Better Than Tokens

Paper • 2412.09871 • Published 10 days ago • 69

OmniPred: Language Models as Universal Regressors

Paper • 2402.14547 • Published Feb 22 • 12

From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples

Paper • 2404.07544 • Published Apr 11 • 19

upvoted 5 papers 11 days ago

Open-Sora Plan: Open-Source Large Video Generation Model

Paper • 2412.00131 • Published 24 days ago • 32

Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability

Paper • 2411.19943 • Published 23 days ago • 55

OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation

Paper • 2412.02592 • Published 19 days ago • 20

Evaluating Language Models as Synthetic Data Generators

Paper • 2412.03679 • Published 18 days ago • 43

Structured 3D Latents for Scalable and Versatile 3D Generation

Paper • 2412.01506 • Published 20 days ago • 42

upvoted 4 papers about 1 month ago

BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices

Paper • 2411.10640 • Published Nov 16 • 44

upvoted a collection about 2 months ago

SmolLM2

Collection

State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 15 items • Updated 20 days ago • 195

upvoted 2 papers about 2 months ago

Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction

Paper • 2410.21169 • Published Oct 28 • 30

A Survey of Small Language Models

Paper • 2410.20011 • Published Oct 25 • 40