I trained a Language Model to schedule events with GRPO!
Bamba-9B-v2 - Fast and powerful!
Introducing HalluMix: A Task-Agnostic, Multi-Domain Benchmark for Detecting Hallucinations in Real-World Scenarios
Creating your custom Ghibli Text-to-Image model
Uncensor any LLM with abliteration
🦸🏻#14: What Is MCP, and Why Is Everyone – Suddenly! – Talking About It?
DeepWiki: Best AI Documentation Generator for Any Github Repo
Mixture of Tunable Experts - Behavior Modification of DeepSeek-R1 at Inference Time
Introduction to State Space Models (SSM)
ColPali: Efficient Document Retrieval with Vision Language Models 👀
Building Multimodal RAG Systems: Supercharging Retrieval with MultiModal Embeddings and LLMs
DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge
A Guide to Running Qwen 3 Locally with Ollama and vLLM
Code a simple RAG from scratch
Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment
PipelineRL
Merge Large Language Models with mergekit
Efficient LLM Pretraining: Packed Sequences and Masked Attention
What is test-time compute and how to scale it?
Open R1: Update #3