Collections
Discover the best community collections!
Collections trending this week

- unsloth/llama-3-8b-Instruct-bnb-4bit
  Text Generation • Updated • 218k • 69
- unsloth/mistral-7b-instruct-v0.2-bnb-4bit
  Text Generation • Updated • 142k • 26
- unsloth/llama-3-70b-Instruct-bnb-4bit
  Text Generation • Updated • 16.8k • 28
- unsloth/gemma-7b-it-bnb-4bit
  Text Generation • Updated • 6.82k • 12

- Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
  Paper • 1701.06538 • Published • 4
- Sparse Networks from Scratch: Faster Training without Losing Performance
  Paper • 1907.04840 • Published • 3
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
  Paper • 1910.02054 • Published • 3
- A Mixture of h-1 Heads is Better than h Heads
  Paper • 2005.06537 • Published • 2

- NousResearch/Hermes-2-Theta-Llama-3-8B
  Text Generation • Updated • 915 • 61
- NousResearch/Hermes-2-Pro-Llama-3-8B
  Text Generation • Updated • 22.7k • 324
- NousResearch/Hermes-2-Pro-Mistral-7B
  Text Generation • Updated • 37.2k • 458
- NousResearch/Hermes-2-Pro-Mistral-7B-GGUF
  Updated • 36.6k • 208

- yentinglin/Llama-3-Taiwan-70B-Instruct-rc2
  Text Generation • Updated • 6
- yentinglin/Llama-3-Taiwan-70B-Instruct-rc1
  Text Generation • Updated • 26 • 2
- yentinglin/Llama-3-Taiwan-8B-Instruct-rc1
  Text Generation • Updated • 11 • 4
- Measuring Taiwanese Mandarin Language Understanding
  Paper • 2403.20180 • Published • 3

- MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.1
  Text Generation • Updated • 1.49k • 8
- MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.2
  Text Generation • Updated • 987 • 2
- MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.3
  Text Generation • Updated • 1.38k • 2
- MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.4
  Text Generation • Updated • 1.41k • 9