Swasti Sweker

Swekerr

AI & ML interests

None yet

Recent Activity

updated a model 11 days ago

Swekerr/Odia-BPE

Organizations

Swekerr's activity

upvoted a paper 6 days ago

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published 8 days ago • 146

upvoted an article 11 days ago

Article

Introducing smolagents: simple agents that write actions in code.

Dec 31, 2024

• 779

upvoted a paper 14 days ago

TransMLA: Multi-head Latent Attention Is All You Need

Paper • 2502.07864 • Published 16 days ago • 44

upvoted an article 18 days ago

Article

Zero to Hero with the TRL learning link bomb 💣

Nov 25, 2024

• 5

upvoted an article 29 days ago

Article

Janus Pro: DeepSeek's Revolutionary Multimodal AI Model

about 1 month ago

• 31

upvoted an article 30 days ago

Article

Introduction to Quantization cooked in 🤗 with 💗🧑‍🍳

Aug 25, 2023

• 28

upvoted 2 articles about 1 month ago

Article

Mastering Long Contexts in LLMs with KVPress

and 1 other

Jan 23

• 63

Article

SmolVLM Grows Smaller – Introducing the 250M & 500M Models!

Jan 23

• 142

upvoted 2 papers about 1 month ago

Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models

Paper • 2501.11873 • Published Jan 21 • 63

FAST: Efficient Action Tokenization for Vision-Language-Action Models

Paper • 2501.09747 • Published Jan 16 • 23

upvoted 4 articles about 1 month ago

Article

makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch

May 7, 2024

• 55

Article

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Dec 9, 2022

• 176

Article

Mixture of Experts Explained

Dec 11, 2023

• 410

Article

Training and Finetuning Embedding Models with Sentence Transformers v3

May 28, 2024

• 188

upvoted 2 papers about 1 month ago

PokerBench: Training Large Language Models to become Professional Poker Players

Paper • 2501.08328 • Published Jan 14 • 17

MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14 • 273

upvoted an article about 1 month ago

Article

Mastering Tensor Dimensions in Transformers

Jan 12

• 44

upvoted 2 papers about 1 month ago

The Lessons of Developing Process Reward Models in Mathematical Reasoning

Paper • 2501.07301 • Published Jan 13 • 92

Tensor Product Attention Is All You Need

Paper • 2501.06425 • Published Jan 11 • 84

upvoted a paper about 2 months ago

LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token

Paper • 2501.03895 • Published Jan 7 • 50