For those who need quick, in-place annotation of entities in JSON / CSV tabular data, I have good news: I've just released the latest version of bulk-ner, which does these things for you: https://github.com/nicolay-r/bulk-ner/releases/tag/0.25.2
bulk-ner is a no-strings wrapper over NER services built on popular frameworks like DeepPavlov, spaCy, and Flair.
What's new? The latest 0.25.2 version has the following key features:
- Fixed: the output no longer ignores other content in the input (#31)
- Schema support: you can annotate various columns, combining them as you wish and mapping them onto other output columns (see the screenshot below) (#28)
Below is a screenshot showing how to quickly get started with spaCy models.
I found that if we apply the reasoning system prompt published on the NousResearch/DeepHermes-3-Llama-3-8B-Preview model card, other models also react to it and start mimicking reasoning, some better, some worse. I've seen internal monologue and self-questioning.
GRPO has helped DeepSeek R1 to learn reasoning. Can it also help VLMs perform stronger for general computer vision tasks?
The answer is YES, and it generalizes better than SFT. We trained Qwen 2.5 VL 3B on RefCOCO (a visual grounding task) and evaluated on RefCOCO Val and RefGTA (an out-of-distribution task).
Try out my updated implementation of the forked OpenDeepResearcher (link below), exposed as an OpenAI-compatible endpoint but with full control. It can be deployed completely free with the Gemini API, run entirely locally with Ollama, or used pay-as-you-go in BYOK format. The AI agents think dynamically based on the difficulty of the given research task, and the endpoint works with any configurable OpenAI-compatible client (Msty, Chatbox, even the VS Code AI Toolkit playground).
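Because the endpoint speaks the standard OpenAI chat-completions protocol, any compatible client can talk to it. A minimal sketch of the request shape (the base URL, API key, and model name below are placeholders, not the project's actual defaults):

```python
import json

def build_chat_request(question: str, model: str = "deep-researcher") -> dict:
    """Build a standard OpenAI chat-completions payload for the endpoint."""
    return {
        "model": model,
        "messages": [
            # In this project the system message only carries extra
            # writing instructions; it does not steer the search itself.
            {"role": "system", "content": "Cite sources for every claim."},
            {"role": "user", "content": question},
        ],
        "stream": False,
    }

payload = build_chat_request("wsl2 hangs with high vmmem cpu usage after hibernation")
body = json.dumps(payload)
# POST `body` to e.g. http://localhost:8000/v1/chat/completions, or use any
# OpenAI-compatible client:
#   client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="sk-anything")
#   client.chat.completions.create(**payload)
```

Any client that lets you configure `base_url` (Msty, Chatbox, etc.) plugs in the same way.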
Based on my testing against Perplexity's and Gemini's implementations on some physics-domain questions, mine is comparable and very competent at finding even the rarest articles or methods.
Also, a funny benchmark of mine to test all these search models is to troubleshoot a WSL2 hanging issue I experienced last year, with the prompt:
> wsl2 in windows hangs in background with high vmmem cpu usage once in a while, especially after hibernation, no error logs captured in linux, also unable to shutdown in powershell, provide solutions
The final solution, which took me a day to find last year, is to patch the kernel following the steps documented in carlfriedrich's repo and wait for Microsoft to fix it (the issue is buried deep in the WSL issue tracker). Out of the three, only my Deep Research agent found this solution; Perplexity and Gemini just focused on other force-restart or memory-management methods. I am very impressed with this kind of obscure, scarce troubleshooting ability.
**Limitations**
Some caveats to address later:
- Multi-turn conversation is not yet supported, so no follow-up questions
- The system message only carries extra writing instructions; it doesn't affect search
- Small local models may have trouble citing sources reliably; I am working on a fix to fact-check all citation claims
"2025 will be the year of AI agents": this statement has often been made; here are numbers to support it.
I've plotted the progress of AI agents on GAIA test set, and it seems they're headed to catch up with the human baseline in early 2026.
And that progress is still driven mostly by improvements in base LLMs: it would be even faster with fine-tuned agentic models.
Why does it matter? The original docs have:
- no dedicated JS support, only Python/HTTP and Node.js via the replicate package
- a mixture of Node.js and bash curl snippets: https://replicate.com/docs/topics/predictions/streaming
Evaluating Long Context #2: SCROLLS and ZeroSCROLLS
In this series of posts tracing the history of long context evaluation, we started with Long Range Arena (LRA). Introduced in 2020, LRA is one of the earliest benchmarks designed to tackle the challenge of long context evaluation. However, it wasn't introduced to evaluate LLMs, but rather the transformer architecture in general.
The SCROLLS benchmark, introduced in 2022, addresses this gap in NLP/LLM research. SCROLLS challenges models with tasks that require reasoning over extended sequences (by 2022 standards). So, what does it offer?
1. Long Text Focus: SCROLLS (unlike LRA) focuses mainly on text, with inputs of thousands of words, testing models' ability to synthesize information across lengthy documents.
2. Diverse Tasks: Includes summarization, question answering, and natural language inference across domains like literature, science, and business.
3. Unified Format: All datasets are available in a text-to-text format, facilitating easy evaluation and comparison of models.
Building on SCROLLS, ZeroSCROLLS takes long text evaluation to the next level by focusing on zero-shot learning. Other features include:
1. New Tasks: Introduces tasks like sentiment aggregation and sorting book chapter summaries.
2. Leaderboard: A live leaderboard encourages continuous improvement and competition among researchers.
What are some other landmark benchmarks in the history of long context evaluation? Feel free to share your thoughts and suggestions in the comments.
Time Stream is a groundbreaking AI tool that transforms your text into a mesmerizing video journey from the past to the future. With this innovative technology, your ideas evolve over time, visualized through a dynamic image strip and a fluid video narrative. Imagine typing a simple prompt and watching as your words transform into vivid scenes that capture every moment of change, like a time machine for creativity!
Key Features:
- Text-to-Video Transformation: Enter any text, and Time Stream converts it into a compelling video that travels through time, turning your ideas into a visual story.
- Dynamic Image Strip: Alongside the video, a vibrant image strip is created, showcasing each stage of the transformation so you can see every detail of the evolution.
- Customizable Settings: Adjust parameters such as strength, guidance scale, and more to fine-tune your video's appearance and ensure it perfectly matches your creative vision.
- User-Friendly Interface: With a modern and sleek design, Time Stream is incredibly easy to use. Its intuitive layout lets you focus on your creativity without any technical hurdles.
Time Stream is perfect for artists, storytellers, designers, and anyone who loves to see their ideas come to life in new and exciting ways. Whether you're reflecting on the past, celebrating the present, or dreaming about the future, Time Stream turns your narrative into a vivid, ever-changing masterpiece. Dive in and let your imagination soar as you journey through time, one image at a time!
Tutorial: Training a non-English reasoning model with GRPO and Unsloth
I wanted to share my experiment with training reasoning models in languages other than English/Chinese.
Using Llama 3.1 8B as the base model, the GRPO trainer from trl, and Unsloth optimizations, I got a working prototype in Bulgarian after ~5 hours on an L40S GPU. The approach should work for any language where the base model has some pre-training coverage.
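The key ingredient GRPO needs on top of trl's `GRPOTrainer` is a reward function scored per completion. A minimal sketch of a format-plus-language reward (the `<reasoning>` tag convention and the Cyrillic-ratio check are my illustrative choices, not the tutorial's exact setup):

```python
import re

def reasoning_reward(completions: list[str], **kwargs) -> list[float]:
    """Score each completion: +0.5 for a <reasoning>...</reasoning> block,
    +0.5 if the text is mostly Cyrillic (a crude Bulgarian-language proxy)."""
    rewards = []
    for text in completions:
        score = 0.0
        if re.search(r"<reasoning>.*?</reasoning>", text, re.DOTALL):
            score += 0.5
        letters = [c for c in text if c.isalpha()]
        if letters and sum("\u0400" <= c <= "\u04FF" for c in letters) / len(letters) > 0.5:
            score += 0.5
        rewards.append(score)
    return rewards

# With trl, this plugs in roughly as:
#   from trl import GRPOConfig, GRPOTrainer
#   trainer = GRPOTrainer(model=..., reward_funcs=[reasoning_reward],
#                         args=GRPOConfig(...), train_dataset=...)

print(reasoning_reward(["<reasoning>Мисля стъпка по стъпка</reasoning> Отговор: 4"]))  # → [1.0]
```

In practice you would combine several such rewards (format, language, answer correctness) and let GRPO rank sampled completions against each other.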
The community has been busy distilling DeepSeek-R1 from inference providers, but we decided to have a go at doing it ourselves from scratch.
What's new compared to existing reasoning datasets?
- Based on AI-MO/NuminaMath-1.5: we focus on math reasoning traces and generate answers for problems in NuminaMath 1.5, an improved version of the popular NuminaMath-CoT dataset.
- 800k R1 reasoning traces: We generate two answers for 400k problems using DeepSeek R1. The filtered dataset contains 220k problems with correct reasoning traces.
- 512 H100s running locally: Instead of relying on an API, we leverage vLLM and SGLang to run generations locally on our science cluster, generating 180k reasoning traces per day.
- Automated filtering: We apply Math Verify to retain only problems with at least one correct answer. We also use Llama3.3-70B-Instruct as a judge to recover more correct examples (e.g., cases with malformed answers that can't be verified with a rules-based parser).
- We match the performance of DeepSeek-Distill-Qwen-7B by fine-tuning Qwen-7B-Math-Instruct on our dataset.
Hugging Face just launched the AI Agents Course: a free journey from beginner to expert in AI agents!
- Learn AI agent fundamentals, use cases and frameworks
- Use top libraries like LangChain & LlamaIndex
- Compete in challenges & earn a certificate
- Hands-on projects & real-world applications
What you need to know about spaCy NER models:
- Models are Python packages; they can be installed directly into the environment or via the Python CLI.
- The library has a pipeline for optimized request handling in batches.
- Architecture: DNN embedding-based models (not transformers).
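A minimal sketch of spaCy-style batched NER. To keep it download-free, this uses a blank pipeline with an `entity_ruler` and a hypothetical pattern; a real run would install a pretrained package (e.g. `python -m spacy download en_core_web_sm`) and call `spacy.load("en_core_web_sm")` instead:

```python
import spacy

# Blank English pipeline; a pretrained model package would replace this.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "Hugging Face"}])

texts = ["I published the model on Hugging Face.", "No entities here."]
# nlp.pipe processes texts in optimized batches, as noted above.
for doc in nlp.pipe(texts, batch_size=64):
    print([(ent.text, ent.label_) for ent in doc.ents])
```

With a pretrained model, the loop is identical; only the `spacy.load` call changes.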
A Brief Survey of Associations Between Meta-Learning and General AI
The paper titled "A Brief Survey of Associations Between Meta-Learning and General AI" explores how meta-learning techniques can contribute to the development of Artificial General Intelligence (AGI). Here are the key points summarized:
1. General AI (AGI) and Meta-Learning:
   - AGI aims to develop algorithms that can handle a wide variety of tasks, similar to human intelligence. Current AI systems excel at specific tasks but struggle to generalize to unseen tasks.
   - Meta-learning, or "learning to learn," improves model adaptation and generalization, allowing AI systems to tackle new tasks efficiently using prior experience.
2. Neural Network Design in Meta-Learning:
   - Techniques like Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks enable self-improvement and adaptability for deep models, supporting generalization across tasks.
   - Highway networks and ResNet-style models use shortcuts for efficient backpropagation, allowing deeper models that can be used in meta-learning frameworks.
3. Coevolution:
   - Coevolution involves the mutual evolution of multiple components, such as learners or task-solvers, to improve overall performance.
   - Coevolution between learners enhances collaboration and competition within AI systems, while coevolution between tasks and solvers (e.g., POWERPLAY and AI-GA frameworks) pushes solvers to adapt to increasingly complex tasks.
4. Curiosity in Meta-Learning:
   - Curiosity-based exploration encourages AI systems to discover new, diverse features of the environment, avoiding local optima.
   - Curiosity-based objectives can be combined with performance-based objectives to ensure efficient exploration and adaptation in complex tasks.
5. Forgetting Mechanisms:
   - Forgetting is crucial to avoid memory overload in AI systems.
I've just started porting over the articles that mattered most to me from Civitai. Look, I'm not going to sit here and whine, complain, and moan: they know why I've left, and they're going to thrive without me. I'm a mere speck compared to their future, and that's amazing. But the journey continues; I've posted my Design 101 for AI, the first one up. I believe it's the first one, as it delves back into how Arts and Crafts connect to AI. I'm still looking for a future model hub for the insane 800+ models I'd published, considering that's half of what I've got sitting in my repos on HF.
RAG techniques continuously evolve to enhance LLM response accuracy by retrieving relevant external data during generation. To keep up with current AI trends, new RAG types incorporate deep step-by-step reasoning, tree search, citations, multimodality and other effective techniques.
3. Chain-of-Retrieval Augmented Generation (CoRAG) -> Chain-of-Retrieval Augmented Generation (2501.14342)
Retrieves information step by step and adjusts it, also deciding how much compute to use at test time. If needed, it reformulates queries.