TokenButler: Predict token importance for all heads across the transformer in the first layer itself, enabling fine-grained token sparsity!
YASH AKHAURI
akhauriyash
Recent Activity
akhauriyash/Llama-3.2-1B-Butler: "Enable the predictor masking by default" (about 15 hours ago)
akhauriyash/Llama-2-7b-hf-Butler: "Enable the predictor masking by default" (about 15 hours ago)
akhauriyash/Llama-3.1-8B-Butler: "Enable the predictor masking by default" (about 15 hours ago)
Collections: 1
Models: 28
akhauriyash/Llama-3.2-1B-Butler • Text Generation • 23 downloads
akhauriyash/Llama-2-7b-hf-Butler • Text Generation • 3 downloads
akhauriyash/Llama-3.1-8B-Butler • Text Generation • 3 downloads
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-GRPO-SplitReasoner • Text Generation • 48 downloads
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-GRPO-SpeculativeReasoner_Mini • Text Generation • 40 downloads
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-GRPO-SpeculativeReasoner • Text Generation • 480 downloads • 1 like
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-SpeculativeReasoner • Text Generation • 122 downloads
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-SelfCompress_SFT_GRPO_INDUCETEST • 5 downloads
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-SpecReason_SFT_GRPO_14k • 3 downloads
akhauriyash/DeepSeek-R1-Distill-Qwen-1.5B-SelfCompress_SFT • Text Generation • 51 downloads