llama.cpp is 26.8% faster than Ollama. I upgraded both to their latest versions and, using the same settings, ran the same DeepSeek R1 Distill 1.5B model on the same hardware. It's an apples-to-apples comparison.
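For context, here is a rough sketch of how decode throughput could be compared across the two runtimes. It assumes a llama-server instance on localhost:8080 and the Ollama daemon on its default port; the model tag and response field names are taken from the two projects' server docs, and the prompt is illustrative:

```python
import time
import requests

PROMPT = "Explain the Pythagorean theorem step by step."

def bench_llamacpp(prompt, n_predict=256):
    # llama.cpp's llama-server exposes a /completion endpoint;
    # its JSON response reports how many tokens were generated.
    t0 = time.time()
    r = requests.post(
        "http://localhost:8080/completion",
        json={"prompt": prompt, "n_predict": n_predict},
    )
    elapsed = time.time() - t0
    n_tokens = r.json().get("tokens_predicted", n_predict)
    return n_tokens / elapsed

def bench_ollama(prompt):
    # Ollama's REST API reports eval_count (generated tokens)
    # and eval_duration (nanoseconds) directly.
    r = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "deepseek-r1:1.5b", "prompt": prompt, "stream": False},
    )
    data = r.json()
    return data["eval_count"] / (data["eval_duration"] / 1e9)

print(f"llama.cpp: {bench_llamacpp(PROMPT):.1f} tok/s")
print(f"ollama:    {bench_ollama(PROMPT):.1f} tok/s")
```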
We are reproducing the full DeepSeek R1 data and training pipeline so that everybody can use their recipe. Instead of doing it in secret, we can do it together in the open!
🧪 Step 1: replicate the R1-Distill models by distilling a high-quality reasoning corpus from DeepSeek-R1.
🧠 Step 2: replicate the pure RL pipeline that DeepSeek used to create R1-Zero. This will involve curating new, large-scale datasets for math, reasoning, and code.
🔥 Step 3: show we can go from base model -> SFT -> RL via multi-stage training (a rough sketch of this flow follows below).
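For readers who want to experiment now, here is a minimal sketch of the Step 3 flow using TRL's SFTTrainer and GRPOTrainer (GRPO is the RL algorithm described in the DeepSeek-R1 paper). The base model, dataset names, and reward function are illustrative placeholders, not the actual Open-R1 recipe:

```python
# Hypothetical sketch of a multi-stage base -> SFT -> RL pipeline with TRL.
# Model ID, dataset names, and the reward function are placeholders.
from datasets import load_dataset
from trl import SFTTrainer, SFTConfig, GRPOTrainer, GRPOConfig

BASE_MODEL = "Qwen/Qwen2.5-1.5B"  # placeholder base model

# Stage 1: supervised fine-tuning on a distilled reasoning corpus
sft_dataset = load_dataset("my-org/distilled-reasoning", split="train")  # placeholder
sft_trainer = SFTTrainer(
    model=BASE_MODEL,
    train_dataset=sft_dataset,
    args=SFTConfig(output_dir="sft-checkpoint"),
)
sft_trainer.train()

# Stage 2: RL with GRPO on prompts with verifiable answers
def reward_correct_answer(completions, **kwargs):
    # Toy reward: 1.0 if the completion contains a final-answer marker, else 0.0.
    # A real recipe would check the answer against a ground-truth verifier.
    return [1.0 if "\\boxed{" in c else 0.0 for c in completions]

rl_dataset = load_dataset("my-org/math-prompts", split="train")  # placeholder
rl_trainer = GRPOTrainer(
    model="sft-checkpoint",
    reward_funcs=reward_correct_answer,
    train_dataset=rl_dataset,
    args=GRPOConfig(output_dir="rl-checkpoint"),
)
rl_trainer.train()
```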
We've rolled out a major update to ZeroGPU! All the Spaces are now running on it.
Major improvements:
1. GPU cold starts are about twice as fast!
2. RAM usage reduced by two-thirds, allowing more effective resource usage, meaning more GPUs for the community!
3. ZeroGPU initializations (cold starts) can now be tracked and displayed (use progress=gr.Progress(track_tqdm=True); a minimal example follows below)
4. Improved compatibility and PyTorch integration, increasing the number of ZeroGPU-compatible Spaces without requiring any modifications!
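Here is a minimal sketch of a ZeroGPU Space that surfaces progress with track_tqdm; the pipeline and prompt handling are illustrative:

```python
# Minimal illustrative ZeroGPU Space; the model and UI are placeholders.
import gradio as gr
import spaces  # ZeroGPU helper package available on Hugging Face Spaces
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
pipe.to("cuda")

@spaces.GPU  # requests a ZeroGPU slot for the duration of the call
def generate(prompt, progress=gr.Progress(track_tqdm=True)):
    # track_tqdm=True relays the pipeline's internal tqdm bars
    # (including the ZeroGPU init) to the Gradio UI as progress updates.
    return pipe(prompt).images[0]

gr.Interface(fn=generate, inputs="text", outputs="image").launch()
```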
Feel free to ask in the post if you have any questions
I am experimenting with Flux and trying to push it to its limits without training (as I am GPU-poor). I found some flaws in the pipelines, which I resolved, and I can now generate images of approximately the same quality as 4-step Flux Schnell in just 1 step. Demo Link: KingNish/Realtime-FLUX
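The demo's specific pipeline fixes aren't shown here; as a baseline for comparison, this is roughly what stock 1-step generation with FLUX.1-schnell looks like in diffusers (model ID and parameters follow the standard diffusers usage, while the quality improvements are the demo's own work):

```python
# Baseline 1-step FLUX generation with the stock diffusers pipeline,
# not the modified pipeline behind the Realtime-FLUX demo.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()  # helps on GPUs with limited VRAM

image = pipe(
    "a photo of a red fox in a snowy forest",
    num_inference_steps=1,   # single denoising step
    guidance_scale=0.0,      # schnell is distilled for guidance-free sampling
    max_sequence_length=256,
).images[0]
image.save("fox_1step.png")
```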