4 319 16

Fynn Kröger

fynnkroeger

AI & ML interests

None yet

Recent Activity

upvoted a paper 13 days ago

DDT: Decoupled Diffusion Transformer

upvoted a paper 29 days ago

Defeating Prompt Injections by Design

reacted to cogwheelhead's post with 👍 about 2 months ago

Me and my team have performed an in-depth investigation comparing o1 to R1 (and other reasoning models) Link: https://toloka.ai/blog/r1-is-not-on-par-with-o1-and-the-difference-is-qualitative-not-quantitative It started with us evaluating them on our own university-math benchmarks: U-MATH for problem-solving and μ-MATH for judging solution correctness (see the HF leaderboard: https://huggingface.co/spaces/toloka/u-math-leaderboard) tl;dr: R1 sure is amazing, but what we find is that it lags behind in novelty adaptation and reliability: * performance drops when updating benchmarks with fresh unseen tasks (e.g. AIME 2024 -> 2025) * R1-o1 gap widens when evaluating niche subdomains (e.g. university-specific math instead of the more common Olympiad-style contests) * same with going into altogether unconventional domains (e.g. chess) or skills (e.g. judgment instead of problem-solving) * R1 also runs into failure modes way more often (e.g. making illegal chess moves or falling into endless generation loops) Our point here is not to bash on DeepSeek — they've done exceptional work, R1 is a game-changer, and we have no intention to downplay that. R1's release is a perfect opportunity to study where all these models differ and gain understanding on how to move forward from here

View all activity

Organizations

None yet

fynnkroeger's activity

upvoted a paper 13 days ago

DDT: Decoupled Diffusion Transformer

Paper • 2504.05741 • Published 15 days ago • 73

upvoted a paper 29 days ago

Defeating Prompt Injections by Design

Paper • 2503.18813 • Published 30 days ago • 19

reacted to cogwheelhead's post with 👍 about 2 months ago

Post

2520

Me and my team have performed an in-depth investigation comparing o1 to R1 (and other reasoning models)

Link: https://toloka.ai/blog/r1-is-not-on-par-with-o1-and-the-difference-is-qualitative-not-quantitative

It started with us evaluating them on our own university-math benchmarks: U-MATH for problem-solving and μ-MATH for judging solution correctness (see the HF leaderboard: toloka/u-math-leaderboard)

tl;dr: R1 sure is amazing, but what we find is that it lags behind in novelty adaptation and reliability:
* performance drops when updating benchmarks with fresh unseen tasks (e.g. AIME 2024 -> 2025)
* R1-o1 gap widens when evaluating niche subdomains (e.g. university-specific math instead of the more common Olympiad-style contests)
* same with going into altogether unconventional domains (e.g. chess) or skills (e.g. judgment instead of problem-solving)
* R1 also runs into failure modes way more often (e.g. making illegal chess moves or falling into endless generation loops)

Our point here is not to bash on DeepSeek — they've done exceptional work, R1 is a game-changer, and we have no intention to downplay that. R1's release is a perfect opportunity to study where all these models differ and gain understanding on how to move forward from here

reacted to lysandre's post with ❤️ about 2 months ago

Post

6374

SmolVLM-2 and SigLIP-2 are now part of transformers in dedicated releases!

They're added on top of the v4.49.0 release, and can be installed from the following tags: v4.49.0-SmolVLM-2 and v4.49.0-SigLIP-2.

This marks a new beginning for the release process of transformers. For the past five years, we've been doing monthly releases featuring many models (v4.49.0, the latest release, features 9 new architectures).

Starting with SmolVLM-2 & SigLIP2, we'll now additionally release tags supporting new models on a stable branch. These models are therefore directly available for use by installing from the tag itself. These tags will continue to be updated with fixes applied to these models.

Going forward, continue expecting software releases following semantic versioning: v4.50.0 will have ~10 new architectures compared to v4.49.0, as well as a myriad of new features, improvements and bug fixes. Accompanying these software releases, we'll release tags offering brand new models as fast as possible, to make them accessible to all immediately.

1 reply

upvoted a paper about 2 months ago

SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features

Paper • 2502.14786 • Published Feb 20 • 143

liked a Space 4 months ago

453

2024 AI Timeline

📈

View and filter AI model releases in 2024

liked a model 4 months ago

jinaai/jina-clip-v2

Feature Extraction • Updated 13 days ago • 53.8k • 215

upvoted 2 papers 5 months ago

Multimodal Autoregressive Pre-training of Large Vision Encoders

Paper • 2411.14402 • Published Nov 21, 2024 • 47

Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models

Paper • 2411.07126 • Published Nov 11, 2024 • 31

upvoted 2 papers 6 months ago

Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

Paper • 2410.17243 • Published Oct 22, 2024 • 94

Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation

Paper • 2410.13848 • Published Oct 17, 2024 • 35

liked a model 6 months ago

fal/AuraEquiVAE

Updated Oct 13, 2024 • 19

liked a Space 6 months ago

UncheatableEval

🏆

Explore LLM leaderboard with real-time data

liked a Space 7 months ago

357

GOT Online

💬

Extract text from images using various OCR modes

upvoted 3 papers 7 months ago

HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models

Paper • 2409.16191 • Published Sep 24, 2024 • 43

MaskBit: Embedding-free Image Generation via Bit Tokens

Paper • 2409.16211 • Published Sep 24, 2024 • 17

Kolmogorov-Arnold Transformer

Paper • 2409.10594 • Published Sep 16, 2024 • 46

reacted to davidberenstein1957's post with 🤗 7 months ago

Post

1344

Distilabel and synthetic data community interviews - the outcomes

We've been doing some interview with community members to understand the needs surrounding synthetic data. Many thanks to the participants. Note that, given they interviewees were sourced from our community, so the results will likely represent that.

Things distilabel does well
- security and reliability by caching generations and having serializable pipelines.
- scaling up generation by parallelising inference and Anyscale Ray
- solid implementations of state of the art research papers

Things to improve
- communication about the fact we support structured generation
- customization of existing prompt implementations are difficult
- creation of new tasks prove difficult
- arguments and parameters for tasks aren't available at first glance
- the learning curve can be steep
- more tutorials that represent real-life usage

Things to note
- create small scale and large scale dataset to Millions of records
- people use synthetic data to move away from frontier model providers
- people mostly use 7B or 70B models for generating

Participate here: https://github.com/argilla-io/distilabel/issues

reacted to TuringsSolutions's post with 😔 8 months ago

Post

1336

I can solve the Traveling Salesman Problem using the same methods the scientists used to solve it with 1 qubit, except I do not need quantum computers to do it. I am kind of tired of screaming this from the rooftops at this point. I can create an imaginary probability space, then I can put a bunch of imaginary agents in the imaginary box, and solve real problems in seconds. Problems that would take minutes, hours, or years to solve via other algorithms. Here is a demo of me solving the Traveling Salesman problem using 50 agents to probabilistically sample at once: https://colab.research.google.com/drive/1XplG72nQDO_-2h4DUllERLp0Dr2pI2J2?usp=sharing

28 replies

upvoted a paper 8 months ago

OLMoE: Open Mixture-of-Experts Language Models

Paper • 2409.02060 • Published Sep 3, 2024 • 80