Asankhaya Sharma (codelion)

AI & ML interests: AI/ML, Dev Tools and Application Security


codelion's activity

posted an update 3 days ago
WorkerSafetyQAEval: A new benchmark to evaluate question answering in the worker safety domain

Happy to share a new question-answering benchmark for the worker safety domain. The benchmark and leaderboard are available at codelion/worker-safety-qa-eval

We evaluate popular general-purpose chatbots like ChatGPT and HuggingChat on WorkerSafetyQAEval and compare them with a domain-specific RAG bot, Securade.ai Safety Copilot (codelion/safety-copilot). The comparison highlights the importance of domain-specific knowledge for critical domains like worker safety that require high accuracy. Securade.ai Safety Copilot achieves ~97% on the benchmark, setting a new SOTA.
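For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of an evaluation loop, assuming the benchmark is published as a Hugging Face dataset; the split and field names ("question", "answer") are assumptions, so check the codelion/worker-safety-qa-eval page for the actual schema and scoring method.

```python
# Hedged sketch: score a chatbot on the benchmark.
# Assumptions: split name and "question"/"answer" fields.
from datasets import load_dataset

ds = load_dataset("codelion/worker-safety-qa-eval", split="test")

def ask_bot(question: str) -> str:
    # Placeholder: call the chatbot under evaluation
    # (ChatGPT, HuggingChat, Safety Copilot, ...) here.
    raise NotImplementedError

correct = 0
for row in ds:
    prediction = ask_bot(row["question"])
    # Naive substring match; the actual benchmark likely uses a
    # more robust scoring method.
    if row["answer"].strip().lower() in prediction.strip().lower():
        correct += 1

print(f"Accuracy: {correct / len(ds):.2%}")
```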

You can read more about the Safety Copilot at https://securade.ai/blog/how-securade-ai-safety-copilot-transforms-worker-safety.html
posted an update 18 days ago
After the announcements yesterday, I got a chance to try the new gemini-1.5-flash model from @goog1e. It is almost as good as gpt-4o on StaticAnalysisEval (patched-codes/static-analysis-eval), and it is also a bit faster than gpt-4o and much cheaper.

I did run into a recitation flag with one example in the dataset, where the API refused to fix the vulnerability and flagged the input as using copyrighted content. This is something you cannot disable even with the safety filters, and it seems to be an existing bug: https://issuetracker.google.com/issues/331677495
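For reference, here is a minimal sketch of how the flag surfaces with the google-generativeai Python SDK (API surface as of mid-2024; treat the details as assumptions). Even with every configurable safety category relaxed to BLOCK_NONE, the response can still come back with a RECITATION finish reason, because recitation is not one of the configurable categories.

```python
import google.generativeai as genai
from google.generativeai.types import HarmBlockThreshold, HarmCategory

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash-latest")

response = model.generate_content(
    "Fix the vulnerability in this code: ...",  # offending example elided
    # Relaxing all configurable categories does not help: recitation
    # is a separate, non-configurable check.
    safety_settings={
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
        HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
    },
)

candidate = response.candidates[0]
if candidate.finish_reason.name == "RECITATION":
    # No text is returned; the input was flagged as copyrighted content.
    print("Blocked for recitation; no safety setting disables this.")
else:
    print(response.text)
```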

But overall you get gpt-4o-level performance for 7% of the price, so we are thinking of making it the default in patchwork - https://github.com/patched-codes/patchwork. You can use the google_api_key and model options to choose gemini-1.5-flash-latest when running patchwork.
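For example, a run could look like this (AutoFix is used as an illustrative patchflow name; substitute whichever patchflow you want to run):

```bash
patchwork AutoFix model=gemini-1.5-flash-latest google_api_key=YOUR_GEMINI_API_KEY
```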
replied to their post 19 days ago

At the moment we do not have any multimodal examples in the benchmark. The focus has been on vulnerability remediation, and I cannot think of a way to utilize multimodality in coding-related tasks. Do you have any ideas on how multimodality could be exploited for something like coding?

posted an update 19 days ago
The new gpt-4o model seems to be a very good coder. OpenAI reported a 90+ score on openai_humaneval

We tried the new model on our patched-codes/static-analysis-eval, which evaluates models on vulnerability remediation. gpt-4o has reclaimed the top spot on our leaderboard (from meta-llama/Meta-Llama-3-70B-Instruct).

You can now use the new model with our open-source framework PatchWork - https://github.com/patched-codes/patchwork - by passing model=gpt-4o on the CLI.
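For example (AutoFix is an illustrative patchflow name):

```bash
patchwork AutoFix model=gpt-4o openai_api_key=YOUR_OPENAI_API_KEY
```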
replied to Jaward's post about 1 month ago

Great, thanks - I would love to see the kind of output it produces directly. We have been trying to automate agentic workflows using an open-source framework called patchwork - https://github.com/patched-codes/patchwork

It is more deterministic, and since we are focusing only on specific workflows, we would love to compare it with something like Devin.

posted an update about 1 month ago
Happy to announce patchwork, an open-source framework to turbocharge DevOps - https://github.com/patched-codes/patchwork

You can use it to build patchflows - workflows that use LLMs for software development tasks like bug fixing, pull request review, library migration and documentation.
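To give a flavor of the CLI, invocations look roughly like the following; the patchflow names below are illustrative, so check the README for the actual bundled patchflows and their options.

```bash
patchwork AutoFix openai_api_key=YOUR_KEY         # fix bugs and vulnerabilities
patchwork PRReview openai_api_key=YOUR_KEY        # review a pull request
patchwork GenerateREADME openai_api_key=YOUR_KEY  # generate documentation
```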

Supports any LLM of your choice including our own MoE model - patched-codes/patched-mix-4x7B

Give it a try!
replied to Jaward's post about 1 month ago

Can you share the apps that it created?

replied to WizardLM's post about 2 months ago

The weights seem to have been taken down?

posted an update about 2 months ago
We just released a new MoE model (meraGPT/mera-mix-4x7B) that is half as large as Mixtral-8x7B while still being competitive with it across different benchmarks. mera-mix-4x7B achieves 76.37 on the Open LLM Leaderboard eval.

You can check out mera-mix-4x7B on HF here - meraGPT/mera-mix-4x7B
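A minimal loading sketch with transformers is below; the dtype, device placement, and prompt are assumptions, so check the model card for the recommended usage.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meraGPT/mera-mix-4x7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights fit your GPU(s)
    device_map="auto",
)

inputs = tokenizer("The key to worker safety is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```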