Sanskrit Sahitya Embeddings
Embeddings of translated Sanskrit texts. Collection by Mercity, 2 days ago.
  Mercity/ramayana-embeddings • Viewer • Updated 3 days ago • 18.8k • 34
  Mercity/mahabharat-embeddings • Viewer • Updated 3 days ago • 73.8k • 38
  Mercity/bhagavad_gita-embeddings • Viewer • Updated 2 days ago • 657 • 32
Kyro-n1.1
Kyro-n1.1 is an improved model in the Kyro family with better reasoning than n1. This model outperforms Kyro-n1 in all areas, such as STEM: Open-Neo
Collection by open-neo, 1 day ago.
  open-neo/Kyro-n1.1-3B • Text Generation • Updated 1 day ago • 33 • 2
  open-neo/Kyro-n1.1-7B • Text Generation • Updated 1 day ago • 2
TokenButler
TokenButler -- Predict token importance for all heads across the transformer in the first layer itself. Enable fine-grained token sparsity! Collection by akhauriyash, 3 days ago.
  akhauriyash/DeepSeek-R1-Distill-Llama-8B-Butler • Text Generation • Updated about 21 hours ago • 32
  akhauriyash/Llama-3.1-8B-Butler • Text Generation • Updated about 21 hours ago • 27
  akhauriyash/Llama-2-7b-hf-Butler • Text Generation • Updated about 21 hours ago • 30
  akhauriyash/Llama-3.2-3B-Butler • Text Generation • Updated about 21 hours ago • 18
CardProjector-v2
Big update! Collection by AlexBefest, 4 days ago.
  AlexBefest/CardProjector-14B-v2 • Updated 4 days ago • 20 • 6
  AlexBefest/CardProjector-7B-v2 • Updated 4 days ago • 13 • 4
  AlexBefest/CardProjector-14B-v2-GGUF • Updated 4 days ago • 644 • 4
  AlexBefest/CardProjector-7B-v2-GGUF • Updated 4 days ago • 390
🤓 Small-Thoughts
Distill thinking datasets more compactly and accurately! Collection by SmallDoge, 2 days ago.
  SmallDoge/SmallThoughts • Viewer • Updated about 5 hours ago • 51k • 1.51k • 28
DiffCLIP
Official models for DiffCLIP: Differential Attention Meets CLIP. Collection by hammh0a, 5 days ago.
  hammh0a/ViTB16_CC3M • Updated 5 days ago
  hammh0a/ViTB16_CC12M • Updated 5 days ago
  hammh0a/DiffCLIP_ViTB16_CC3M • Updated 5 days ago
  hammh0a/DiffCLIP_ViTB16_CC12M • Updated 5 days ago
Llama-3.3-Swallow
Collection by tokyotech-llm, 5 days ago.
  tokyotech-llm/Llama-3.3-Swallow-70B-Instruct-v0.4 • Text Generation • Updated 5 days ago • 578 • 1
  tokyotech-llm/Llama-3.3-Swallow-70B-v0.4 • Text Generation • Updated 5 days ago • 59 • 2
  tokyotech-llm/edu-classifier • Text Classification • Updated Jan 30 • 1.68k • 10
RomanSetu
RomanSetu is a collection of models that address the challenge of extending large language models (LLMs) to non-English languages that use non-Latin scripts. Collection by ai4bharat, 7 days ago.
  ai4bharat/romansetu-cpt-roman-100m • Updated 7 days ago • 22
  ai4bharat/romansetu-cpt-roman-200m • Updated 7 days ago • 28
  ai4bharat/romansetu-cpt-native-300m • Updated 7 days ago • 14
  ai4bharat/romansetu-cpt-native-400m • Updated 7 days ago • 16
Instella
✨ Announcing Instella, a series of 3-billion-parameter language models developed by AMD and trained from scratch on 128 Instinct MI300X GPUs. Collection by amd, 9 days ago.
  amd/Instella-3B-Stage1 • Text Generation • Updated 8 days ago • 166 • 12
  amd/Instella-3B • Text Generation • Updated 8 days ago • 729 • 31
  amd/Instella-3B-SFT • Text Generation • Updated 8 days ago • 167 • 8
  amd/Instella-3B-Instruct • Text Generation • Updated 8 days ago • 1.24k • 34