nomic-ai/colnomic-embed-multimodal-7b Visual Document Retrieval • Updated 6 days ago • 5.84k • 39
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated 22 days ago • 446
Qwen2.5-1M Collection The long-context version of Qwen2.5, supporting 1M-token context lengths • 3 items • Updated Feb 26 • 117
mitkox/DeepSeek-R1-ReDistill-Qwen-1.5B-v1.0-Q4_K_M-GGUF Text Generation • Updated Jan 24 • 8 • 1
llama.cpp is 26.8% faster than ollama. I have upgraded both, and using the same settings I am running the same DeepSeek R1 Distill 1.5B on the same hardware. It's an apples-to-apples comparison.

Total duration:
- llama.cpp: 6.85 s (26.8% faster)
- ollama: 8.69 s

Breakdown by phase:

Model loading:
- llama.cpp: 241 ms (2x faster)
- ollama: 553 ms

Prompt processing:
- llama.cpp: 416.04 tokens/s, eval time 45.67 ms (10x faster)
- ollama: 42.17 tokens/s, eval time 498 ms

Token generation:
- llama.cpp: 137.79 tokens/s, eval time 6.62 s (13% faster)
- ollama: 122.07 tokens/s, eval time 7.64 s

llama.cpp is LLM inference in C/C++; ollama adds abstraction layers and marketing. Make sure you own your AI. AI in the cloud is not aligned with you; it's aligned with the company that owns it.
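The percentages quoted above can be reproduced from the post's own raw timings. A quick sanity check (pure arithmetic, no llama.cpp or ollama install required; all figures are copied from the benchmark results above):

```python
# Reproduce the speedup figures from the post's measurements.

def pct_faster(slow: float, fast: float) -> float:
    """Percentage by which `fast` beats `slow`, where lower time is better."""
    return (slow - fast) / fast * 100

# Total duration (seconds): llama.cpp 6.85 vs ollama 8.69
total_pct = pct_faster(8.69, 6.85)

# Model loading (milliseconds): 241 vs 553 -- ratio, lower is better
load_ratio = 553 / 241

# Prompt processing (tokens/s): 416.04 vs 42.17 -- ratio, higher is better
prompt_ratio = 416.04 / 42.17

# Token generation (tokens/s): 137.79 vs 122.07 -- higher is better
gen_pct = (137.79 - 122.07) / 122.07 * 100

print(f"{total_pct:.1f}% {load_ratio:.1f}x {prompt_ratio:.1f}x {gen_pct:.1f}%")
# -> 26.9% 2.3x 9.9x 12.9%
```

The results line up with the post's rounded claims: ~27% faster overall, roughly 2x faster model loading, roughly 10x faster prompt processing, and ~13% faster token generation.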
Stargate to the west of me,
DeepSeek to the east,
here I am, stuck in the middle with the EU.

It will likely be a matter of time before export controls land on frontier research and models on both sides, leaving us in a vacuum. Decentralized training infrastructure and on-device inferencing are the future.