Accelerating Language Model Inference with Mixture of Attentions
Brilliant stuff. Personally, I would love to see the discretization and the latest A initialization digested. Looking forward to the upcoming posts!
Is the 2.8B model the one trained on the Pile or on SlimPJ?