BitNet: Scaling 1-bit Transformers for Large Language Models Paper • 2310.11453 • Published Oct 17, 2023 • 97
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling Paper • 2502.06703 • Published Feb 10, 2025 • 123
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs Paper • 2501.18585 • Published Jan 30, 2025 • 53
MobileLLM Collection Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 9 items • Updated Nov 27, 2024 • 107
Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published Oct 1, 2024 • 145
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated Sep 18, 2024 • 227
Qwen2.5 Collection Qwen2.5 language models: pretrained and instruction-tuned checkpoints in 7 sizes — 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B (a loading sketch follows this list). • 45 items • Updated Nov 28, 2024 • 520
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Paper • 2403.03206 • Published Mar 5, 2024 • 61
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality Paper • 2405.21060 • Published May 31, 2024 • 64
You Only Cache Once: Decoder-Decoder Architectures for Language Models Paper • 2405.05254 • Published May 8, 2024 • 10
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention Paper • 2405.12981 • Published May 21, 2024 • 30
Article LLM Comparison/Test: Llama 3 Instruct 70B + 8B HF/GGUF/EXL2 (20 versions tested and compared!) By wolfram • Apr 24, 2024 • 61
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22, 2024 • 256
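As a minimal sketch of trying one of the Qwen2.5 checkpoints from the collection above with the Hugging Face transformers library: the model ID is assumed to follow the collection's `Qwen/Qwen2.5-<size>-Instruct` naming, and `device_map="auto"` additionally requires the accelerate package.

```python
# Minimal sketch: load a Qwen2.5 instruct checkpoint and generate a reply.
# Any of the listed sizes (0.5B ... 72B) works, subject to available memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B-Instruct"  # smallest listed size, for a quick test
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # requires `pip install accelerate`
)

# Build a chat-style prompt via the tokenizer's chat template.
messages = [{"role": "user", "content": "Briefly explain test-time scaling."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```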