Dmitry Ryumin

DmitryRyumin

https://dmitryryumin.github.io

AI & ML interests

Machine Learning and Applications, Multi-Modal Understanding

Recent Activity

liked a dataset about 9 hours ago

faridlab/deepspeak_v2

upvoted a paper 1 day ago

Multi-Token Attention

reacted to AdinaY's post with 🔥 3 days ago

AReal-Boba 🔥 a fully open RL framework released by AntGroup, an affiliate company of Alibaba. https://huggingface.co/collections/inclusionAI/areal-boba-67e9f3fa5aeb74b76dcf5f0a
✨ 7B/32B - Apache 2.0
✨ Outperforms on math reasoning
✨ Replicates QwQ-32B with 200 data samples for under $200
✨ All-in-one: weights, datasets, code & tech report

DmitryRyumin's activity

upvoted a paper 1 day ago

Multi-Token Attention

Paper • 2504.00927 • Published 2 days ago • 22

upvoted a collection 9 days ago

MambaVision

MambaVision: A Hybrid Mamba-Transformer Vision Backbone. Includes both 1K and 21K pretrained models. • 13 items • Updated 8 days ago • 31

upvoted 2 papers 10 days ago

TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting

Paper • 2503.17032 • Published 13 days ago • 22

MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving

Paper • 2503.16905 • Published 13 days ago • 52

upvoted a paper 20 days ago

Transformers without Normalization

Paper • 2503.10622 • Published 21 days ago • 145

upvoted a paper 23 days ago

Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders

Paper • 2503.03601 • Published 29 days ago • 221

upvoted a collection about 1 month ago

C2SER

3 items • Updated Feb 25 • 2

upvoted a paper about 1 month ago

SurveyX: Academic Survey Automation via Large Language Models

Paper • 2502.14776 • Published Feb 20 • 97

upvoted a collection about 2 months ago

DeepSeek R1 (All Versions)

DeepSeek R1 - the most powerful open-source reasoning model - available in GGUF, original & 4-bit formats. Includes Llama & Qwen distilled models. • 29 items • Updated 3 days ago • 215

upvoted a paper about 2 months ago

Tensor Product Attention Is All You Need

Paper • 2501.06425 • Published Jan 11 • 87

upvoted a collection 4 months ago

KaLM-embedding

11 items • Updated 23 days ago • 24

upvoted 5 papers 6 months ago

FAN: Fourier Analysis Networks

Paper • 2410.02675 • Published Oct 3, 2024 • 26

Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 175

MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages

Paper • 2410.01036 • Published Oct 1, 2024 • 15

HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors

Paper • 2408.06019 • Published Aug 12, 2024 • 15

Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction

Paper • 2409.18124 • Published Sep 26, 2024 • 33

upvoted a collection 6 months ago

Llama 3.2

Meta's new Llama 3.2 vision and text models, including 1B, 3B, 11B and 90B. Includes GGUF, 4-bit bnb and original versions. • 27 items • Updated 3 days ago • 59

upvoted 3 articles 6 months ago

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

Article • Sep 18, 2024 • 228

Exploring the Daily Papers Page on Hugging Face

Article • Sep 23, 2024 • 54

XetHub is joining Hugging Face!

Article • Aug 8, 2024 • 86