Steffen Röcker PRO

sroecker

https://x.com/sroecker

AI & ML interests

Local models

Recent Activity

updated a model about 17 hours ago

sroecker/Qwen2.5-0.5B-Instruct-FP8-Dynamic

published a model about 18 hours ago

sroecker/Qwen2.5-0.5B-Instruct-FP8-Dynamic

sroecker's activity

upvoted an article 1 day ago

Article

Introducing EuroBERT: A High-Performance Multilingual Encoder Model

By

and 3 others •

1 day ago

• 88

upvoted a collection 6 days ago

Q-Filters

Pre-computed Q-Filters for efficient KV cache compression. • 15 items • Updated 8 days ago • 6

upvoted a collection 11 days ago

Granite 3.2 Language Models

3 items • Updated 13 days ago • 14

upvoted a collection 12 days ago

DeepSeek-R1-Distill Quantized

18 items • Updated Feb 7 • 12

upvoted a collection 14 days ago

olmOCR

olmOCR is a document recognition pipeline for efficiently converting documents into plain text. olmocr.allenai.org • 3 items • Updated 12 days ago • 91

upvoted a collection 17 days ago

SigLIP 2

OpenCLIP and timm SigLIP 2 models • 45 items • Updated 17 days ago • 11

upvoted a collection 19 days ago

ModernGLiClass

GLiClass with ModernBERT backbone • 4 items • Updated 4 days ago • 8

upvoted an article 26 days ago

Article

DABStep: Data Agent Benchmark for Multi-step Reasoning

Feb 4

• 61

upvoted a paper about 1 month ago

On Teacher Hacking in Language Model Distillation

Paper • 2502.02671 • Published Feb 4 • 18

upvoted a collection about 1 month ago

EuroLLM

4 items • Updated 18 days ago • 30

upvoted a paper about 1 month ago

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 108

upvoted an article about 1 month ago

Article

Replicating DeepSeek R1 for Information Extraction

Jan 31

• 38

upvoted a collection about 1 month ago

R1 Multilingual

5 items • Updated Jan 31 • 10

upvoted a paper about 1 month ago

WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training

Paper • 2501.18511 • Published Jan 30 • 19

upvoted a collection about 1 month ago

Tulu 3 Models

All models released with Tulu 3 -- state of the art open post-training recipes. • 11 items • Updated 27 days ago • 93

upvoted an article about 1 month ago

Article

Open-R1: a fully open reproduction of DeepSeek-R1

Jan 28

• 800

upvoted 2 collections about 1 month ago

Quantized DeepSeek R1 Distill

3 items • Updated Jan 22 • 3

DeepSeek-R1-abliterated

7 items • Updated Jan 31 • 93

upvoted 2 collections about 2 months ago

Language Detection

StaticVectors models to detect language. Exports of FastText that run in NumPy without needing FastText • 2 items • Updated Jan 26 • 3

DeepSeek R1 AWQ

7 items • Updated Jan 22 • 5