
Nicolay Rusnachenko

nicolay-r

AI & ML interests

Information Retrieval・Medical Multimodal NLP (🖼+📝) Research Fellow @BU_Research・software developer http://arekit.io・PhD in NLP

Recent Activity


Organizations

None yet

nicolay-r's activity

posted an update about 22 hours ago
📢 For those who want quick, in-place annotation of entities in JSON / CSV tabular data, I have good news: I've just released the latest version of bulk-ner, which does these things for you:
🌟 https://github.com/nicolay-r/bulk-ner/releases/tag/0.25.2

bulk-ner is a no-strings wrapper over NER services built with popular frameworks such as DeepPavlov, spaCy, and Flair.

What's new? The latest 0.25.2 version has the following key features:
🔧 Fixed: 🐛 the output ignores other input content (#31)
🔥 Schemas support: you can annotate various columns by combining them as you wish and map them onto other output columns (see 📸 below) (#28)

Below is a screenshot showing how to quickly get started with spaCy models.

🌌 List of other providers @ nlp-thirdgate:
https://github.com/nicolay-r/nlp-thirdgate/tree/master/ner
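
The core idea of such in-place annotation can be sketched in plain Python. This is a toy stand-in, not bulk-ner's actual API: `dummy_ner` and the `[surface|LABEL]` wrapping are illustrative only.

```python
import csv
import io

def dummy_ner(text):
    """Toy NER: tags capitalized tokens as entities (a stand-in for a spaCy/Flair model)."""
    return [(tok, "ENTITY") for tok in text.split() if tok[:1].isupper()]

def annotate_rows(rows, source_col, target_col, ner=dummy_ner):
    """Annotate `source_col` of each row, writing the wrapped text into `target_col`
    while keeping all other input columns untouched."""
    for row in rows:
        annotated = row[source_col]
        for surface, label in ner(row[source_col]):
            annotated = annotated.replace(surface, f"[{surface}|{label}]")
        out = dict(row)              # preserve other input content
        out[target_col] = annotated
        yield out

raw = io.StringIO("id,text\n1,Barack Obama visited Paris\n")
result = list(annotate_rows(csv.DictReader(raw), "text", "text_ner"))
print(result[0]["text_ner"])  # [Barack|ENTITY] [Obama|ENTITY] visited [Paris|ENTITY]
```

The key point the sketch mirrors is the schema mapping: the source column stays intact and the annotation lands in a separate output column.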
reacted to csabakecskemeti's post with 👍 1 day ago
reacted to fffiloni's post with 🔥 1 day ago
reacted to tianchez's post with 🚀 1 day ago
Introducing VLM-R1!

GRPO has helped DeepSeek R1 to learn reasoning. Can it also help VLMs perform stronger for general computer vision tasks?

The answer is YES, and it generalizes better than SFT. We trained Qwen 2.5 VL 3B on RefCOCO (a visual grounding task) and evaluated on RefCOCO val and RefGTA (an OOD task).

https://github.com/om-ai-lab/VLM-R1
reacted to benhaotang's post with 🚀 1 day ago
Try out my updated implementation of a fork of OpenDeepResearcher (link below) as an OpenAI-compatible endpoint, but with full control. It can be deployed completely free with the Gemini API, completely locally with Ollama, or pay-as-you-go in BYOK format. The AI agents think dynamically based on the difficulty of the given research, and it is compatible with any configurable OpenAI-compatible client (Msty, Chatbox, even the VS Code AI Toolkit playground).

If you don't want to pay OpenAI $200 or want to take control of your deep research, check it out here:
👉 https://github.com/benhaotang/OpenDeepResearcher-via-searxng

**Personal take**

Based on my testing against Perplexity's and Gemini's implementations on some physics-domain questions, mine is comparable and very competent at finding even the rarest articles or methods.

Also, a fun benchmark of mine for testing all these search models is troubleshooting a WSL2 hanging issue I experienced last year, with the prompt:

> wsl2 in windows hangs in background with high vmmem cpu usage once in a while, especially after hibernation, no error logs captured in linux, also unable to shutdown in powershell, provide solutions

The final solution, which took me a day to find last year, was to patch the kernel following steps documented in carlfriedrich's repo and wait for Microsoft to fix it (it is buried deep in the WSL issues). Of the three, only my Deep Research agent found this solution; Perplexity and Gemini just focused on other force-restart or memory-management methods. I am very impressed that it has this kind of obscure and scarce troubleshooting ability.

**Limitations**

Some caveats to be addressed later:
- Multi-turn conversation is not yet supported, so no follow-up questions
- The system message only adds extra writing instructions; it doesn't affect search
- Small local models may have trouble citing sources reliably; I am working on a fix to fact-check all citation claims
reacted to m-ric's post with 🔥 2 days ago
"๐Ÿฎ๐Ÿฌ๐Ÿฎ๐Ÿฑ ๐˜„๐—ถ๐—น๐—น ๐—ฏ๐—ฒ ๐˜๐—ต๐—ฒ ๐˜†๐—ฒ๐—ฎ๐—ฟ ๐—ผ๐—ณ ๐—”๐—œ ๐—ฎ๐—ด๐—ฒ๐—ป๐˜๐˜€": this statement has often been made, here are numbers to support it.

I've plotted the progress of AI agents on GAIA test set, and it seems they're headed to catch up with the human baseline in early 2026.

And that progress is still driven mostly by the improvement of base LLMs: progress would be even faster with fine-tuned agentic models.
posted an update 2 days ago
📢 If you're working with Replicate AI models and wish to use them in streaming mode via JS, this snippet might be a quick way to experiment with the streaming API:
https://gist.github.com/nicolay-r/86fc212086c0955d541244253ec0564b

Why does it matter? The original docs have:
🟢 No direct support for JS, only Python/HTTP and NodeJS via the replicate package.
🟢 A mixture of NodeJS and bash curl snippets:
https://replicate.com/docs/topics/predictions/streaming

Special thanks to Simon Willison for the related template for accessing APIs of other vendors like Claude / OpenAI in the following post:
https://til.simonwillison.net/llms/streaming-llm-apis

Default model: meta-llama/Meta-Llama-3-70B

PS: I am happy to hear your comments on this solution.
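
Under the hood, streaming endpoints like this one emit server-sent events, so the client's main job is splitting the stream into `event:`/`data:` fields. A minimal sketch of that parsing logic (shown in Python for brevity; the event names and payloads below are hypothetical, not Replicate's exact schema):

```python
def parse_sse(stream_text):
    """Minimal server-sent-events parser: events are separated by blank
    lines; each line is a `field: value` pair (per the SSE format)."""
    events, event, data = [], "message", []
    for line in stream_text.splitlines():
        if not line:                       # blank line terminates an event
            if data:
                events.append((event, "\n".join(data)))
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[6:].lstrip()
        elif line.startswith("data:"):
            value = line[5:]
            # the SSE format strips exactly one leading space from the value
            data.append(value[1:] if value.startswith(" ") else value)
    return events

# Hypothetical shape of a streamed completion (not Replicate's exact schema):
chunk = "event: output\ndata: Hello\n\nevent: output\ndata:  world\n\nevent: done\ndata: {}\n\n"
tokens = [d for e, d in parse_sse(chunk) if e == "output"]
print("".join(tokens))  # Hello world
```

The same splitting logic is what the JS snippet performs over the response body reader.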
reacted to mmhamdy's post with 👀 5 days ago
⛓ Evaluating Long Context #2: SCROLLS and ZeroSCROLLS

In this series of posts tracing the history of long context evaluation, we started with Long Range Arena (LRA). Introduced in 2020, LRA is one of the earliest benchmarks designed to tackle the challenge of long context evaluation. However, it wasn't introduced to evaluate LLMs, but rather the transformer architecture in general.

📜 The SCROLLS benchmark, introduced in 2022, addresses this gap in NLP/LLM research. SCROLLS challenges models with tasks that require reasoning over extended sequences (by 2022 standards). So, what does it offer?

1๏ธโƒฃ Long Text Focus: SCROLLS (unlike LRA) focus mainly on text and contain inputs with thousands of words, testing models' ability to synthesize information across lengthy documents.
2๏ธโƒฃ Diverse Tasks: Includes summarization, question answering, and natural language inference across domains like literature, science, and business.
3๏ธโƒฃ Unified Format: All datasets are available in a text-to-text format, facilitating easy evaluation and comparison of models.

Building on SCROLLS, ZeroSCROLLS takes long text evaluation to the next level by focusing on zero-shot learning. Other features include:

1๏ธโƒฃ New Tasks: Introduces tasks like sentiment aggregation and sorting book chapter summaries.
2๏ธโƒฃ Leaderboard: A live leaderboard encourages continuous improvement and competition among researchers.

💡 What are some other landmark benchmarks in the history of long context evaluation? Feel free to share your thoughts and suggestions in the comments.

- SCROLLS Paper: SCROLLS: Standardized CompaRison Over Long Language Sequences (2201.03533)
- ZeroSCROLLS Paper: ZeroSCROLLS: A Zero-Shot Benchmark for Long Text Understanding (2305.14196)
reacted to sequelbox's post with 🧠👀 5 days ago
reacted to ginipick's post with 🚀 5 days ago
Time Stream ⏳🚀

Time Stream is a groundbreaking AI tool that transforms your text into a mesmerizing video journey from the past to the future. With this innovative technology, your ideas evolve over time, visualized through a dynamic image strip and a fluid video narrative. Imagine typing a simple prompt and watching as your words transform into vivid scenes that capture every moment of change, like a time machine for creativity! 🎥✨

Key Features:
• Text-to-Video Transformation: Enter any text, and Time Stream converts it into a compelling video that travels through time, turning your ideas into a visual story. 📽️
• Dynamic Image Strip: Alongside the video, a vibrant image strip is created, showcasing each stage of the transformation so you can see every detail of the evolution. 📸
• Customizable Settings: Adjust parameters such as strength, guidance scale, and more to fine-tune your video's appearance and ensure it perfectly matches your creative vision. ⚙️
• User-Friendly Interface: With a modern and sleek design, Time Stream is incredibly easy to use. Its intuitive layout lets you focus on your creativity without any technical hurdles. 🖥️🌟

Time Stream is perfect for artists, storytellers, designers, and anyone who loves to see their ideas come to life in new and exciting ways. Whether you're reflecting on the past, celebrating the present, or dreaming about the future, Time Stream turns your narrative into a vivid, ever-changing masterpiece. Dive in and let your imagination soar as you journey through time, one image at a time! 🚀🔥

ginipick/Time-Stream
reacted to s-emanuilov's post with 🔥 6 days ago
Tutorial 💥 Training a non-English reasoning model with GRPO and Unsloth

I wanted to share my experiment with training reasoning models in languages other than English/Chinese.

Using Llama 3.1 8B as base, GRPO trainer from trl, and Unsloth optimizations, I got a working prototype in Bulgarian after ~5 hours on an L40S GPU. The approach should work for any language where the base model has some pre-training coverage.

Full code and tutorial here: https://unfoldai.com/reasoning-in-a-non-english-language/

The model itself: s-emanuilov/LLMBG-Llama-3.1-8B-BG-Reasoning-v0.1

I hope this helps anyone looking to build reasoning models in their language.
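
GRPO training of this kind hinges on reward functions scored per completion. Below is a toy sketch of a combined format-and-correctness reward; the `<answer>` tag convention is illustrative, not necessarily this tutorial's exact setup, and a real trl reward function would receive batched completions from the trainer:

```python
import re

def correctness_reward(completions, answers):
    """Toy GRPO-style reward: 1.0 when the text inside <answer>...</answer>
    matches the reference exactly, else 0.0 (format + correctness combined)."""
    rewards = []
    for completion, gold in zip(completions, answers):
        match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
        rewards.append(1.0 if match and match.group(1).strip() == gold else 0.0)
    return rewards

print(correctness_reward(
    ["<think>2+2=4</think><answer>4</answer>", "the answer is 4"],
    ["4", "4"],
))  # [1.0, 0.0]
```

Since the reward never inspects the language of the reasoning, the same scheme transfers to Bulgarian or any other language the base model covers.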
reacted to schuler's post with 👍 6 days ago
📢 New Research Alert: Making Language Models Smaller & Smarter!

Thrilled to share the latest technical report demonstrating how to reduce language model parameters by 77% while maintaining performance.

The secret? Grouped pointwise convolutions. Yes. We brought a method from computer vision to the transformers arena.

🔑 Key Findings:
• 77% parameter reduction.
• Maintained model capabilities.
• Improved generalization.

Paper: https://www.researchgate.net/publication/388835829_SAVING_77_OF_THE_PARAMETERS_IN_LARGE_LANGUAGE_MODELS_TECHNICAL_REPORT
Code: https://github.com/joaopauloschuler/less-parameters-llm
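
The arithmetic behind the reduction is easy to sketch: a grouped pointwise (1x1) convolution only connects channels within each group, so the parameter count shrinks by roughly the group count. The channel sizes and group count below are illustrative, not the report's actual configuration:

```python
def pointwise_params(c_in, c_out, groups=1):
    """Parameters of a 1x1 (pointwise) convolution with `groups` groups:
    each group independently maps c_in/groups channels to c_out/groups channels."""
    assert c_in % groups == 0 and c_out % groups == 0
    return (c_in // groups) * (c_out // groups) * groups

dense = pointwise_params(1024, 1024)               # 1,048,576 weights
grouped = pointwise_params(1024, 1024, groups=16)  # 65,536 weights
print(f"reduction: {1 - grouped / dense:.0%}")     # reduction: 94%
```

With g groups the count drops by a factor of g, which is why modest group sizes already yield large savings like the 77% reported here.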
reacted to davidberenstein1957's post with 👀 6 days ago
reacted to lewtun's post with ❤️ 6 days ago
Introducing OpenR1-Math-220k!

open-r1/OpenR1-Math-220k

The community has been busy distilling DeepSeek-R1 from inference providers, but we decided to have a go at doing it ourselves from scratch 💪

What's new compared to existing reasoning datasets?

♾ Based on AI-MO/NuminaMath-1.5: we focus on math reasoning traces and generate answers for problems in NuminaMath 1.5, an improved version of the popular NuminaMath-CoT dataset.

๐Ÿณ 800k R1 reasoning traces: We generate two answers for 400k problems using DeepSeek R1. The filtered dataset contains 220k problems with correct reasoning traces.

📀 512 H100s running locally: Instead of relying on an API, we leverage vLLM and SGLang to run generations locally on our science cluster, generating 180k reasoning traces per day.

โณ Automated filtering: We apply Math Verify to only retain problems with at least one correct answer. We also leverage Llama3.3-70B-Instruct as a judge to retrieve more correct examples (e.g for cases with malformed answers that canโ€™t be verified with a rules-based parser)

📊 We match the performance of DeepSeek-Distill-Qwen-7B by finetuning Qwen-7B-Math-Instruct on our dataset.

🔎 Read our blog post for all the nitty-gritty details: https://huggingface.co/blog/open-r1/update-2
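
The "retain problems with at least one correct answer" filtering step can be sketched as follows, with a string-equality stand-in for Math Verify and an illustrative data layout:

```python
def filter_traces(problems, verify):
    """Keep only problems where at least one generated trace verifies,
    and drop the incorrect traces from the kept problems."""
    kept = []
    for prob in problems:
        correct = [t for t in prob["traces"] if verify(t["answer"], prob["gold"])]
        if correct:
            kept.append({**prob, "traces": correct})
    return kept

# Stand-in verifier: real pipelines use Math Verify, which checks
# mathematical equivalence rather than plain string equality.
verify = lambda answer, gold: answer.strip() == gold.strip()

data = [
    {"gold": "42", "traces": [{"answer": "42"}, {"answer": "41"}]},
    {"gold": "7",  "traces": [{"answer": "8"},  {"answer": "9"}]},
]
print(len(filter_traces(data, verify)))  # 1
```

Generating two answers per problem and then filtering this way is how 800k raw traces shrink to 220k verified ones.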
reacted to ImranzamanML's post with 👍 6 days ago
Hugging Face just launched the AI Agents Course – a free journey from beginner to expert in AI agents!

- Learn AI Agent fundamentals, use cases and frameworks
- Use top libraries like LangChain & LlamaIndex
- Compete in challenges & earn a certificate
- Hands-on projects & real-world applications

https://huggingface.co/learn/agents-course/unit0/introduction

You can join a live Q&A on Feb 12 at 5 PM CET to learn more about the course here:

https://www.youtube.com/live/PopqUt3MGyQ
posted an update 7 days ago
📢 If you wish to empower an LLM with NER for English texts, I can recommend spaCy. Sharing the bulk-ner wrapper of spaCy NER models, dedicated to handling CSV / JSONL content:
Script: https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/ner_spacy_383.sh
Code: https://raw.githubusercontent.com/nicolay-r/nlp-thirdgate/refs/heads/master/ner/spacy_383.py

What you need to know about spaCy NER models:
☑️ Models are Python packages; they can be installed directly into the environment or via the Python CLI.
☑️ The library has a pipeline for optimized request handling in batches.
☑️ Architecture: DNN embedding-based models (not transformers)

🤖 List of models (or see screenshot below):
https://huggingface.co/spacy
📋 Supported NER types:
https://github.com/explosion/spaCy/discussions/9147

โš ๏ธ NOTE: chunking seems to be non-applicable due to specifics of models and usage of the internal pipeline mechanism

🚀 Performance for sentences (en):
Model: spacy/en_core_web_sm 🔥 530 sentences per second 🔥 (similar to larger solutions)

🌌 Other wrappers for bulk-ner @ nlp-thirdgate: https://github.com/nicolay-r/nlp-thirdgate#ner
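
The batching behind spaCy's optimized pipeline (texts are fed to `nlp.pipe` in groups rather than one at a time) can be sketched with a stdlib-only helper; the batch size of 4 is illustrative:

```python
from itertools import islice

def batched(iterable, size):
    """Yield lists of up to `size` items -- the shape in which texts
    would be fed to spaCy's nlp.pipe for batched inference."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

texts = [f"sentence {i}" for i in range(10)]
batches = list(batched(texts, 4))
print([len(b) for b in batches])  # [4, 4, 2]
```

With a real model this becomes `for doc in nlp.pipe(texts, batch_size=...)`, which is where the ~530 sentences/second throughput comes from.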
reacted to KnutJaegersberg's post with 👀 7 days ago
A Brief Survey of Associations Between Meta-Learning and General AI

The paper titled "A Brief Survey of Associations Between Meta-Learning and General AI" explores how meta-learning techniques can contribute to the development of Artificial General Intelligence (AGI). Here are the key points summarized:

1. General AI (AGI) and Meta-Learning:
- AGI aims to develop algorithms that can handle a wide variety of tasks, similar to human intelligence. Current AI systems excel at specific tasks but struggle with generalization to unseen tasks.
- Meta-learning or "learning to learn" improves model adaptation and generalization, allowing AI systems to tackle new tasks efficiently using prior experiences.

2. Neural Network Design in Meta-Learning:
- Techniques like Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks enable self-improvement and adaptability for deep models, supporting generalization across tasks.
- Highway networks and ResNet-style models use shortcuts for efficient backpropagation, allowing deeper models that can be used in meta-learning frameworks.

3. Coevolution:
- Coevolution involves the mutual evolution of multiple components, such as learners or task-solvers, to improve overall performance.
- Coevolution between learners enhances collaboration and competition within AI systems, while coevolution between tasks and solvers (e.g., POWERPLAY and AI-GA frameworks) pushes solvers to adapt to increasingly complex tasks.

4. Curiosity in Meta-Learning:
- Curiosity-based exploration encourages AI systems to discover new, diverse features of the environment, avoiding local optima.
- Curiosity-based objectives can be combined with performance-based objectives to ensure efficient exploration and adaptation in complex tasks.

5. Forgetting Mechanisms:
- Forgetting is crucial to avoid memory overload in AI systems

https://arxiv.org/abs/2101.04283
reacted to Duskfallcrew's post with 🔥 7 days ago
Just been starting to port over the articles from Civitai that mattered most to me.
Look, I'm not going to sit here and whine, complain and moan entirely: they know why I've left, and they're going to thrive without me.
I'm a mere speck compared to their future, and that's amazing.
But the journey continues. I've posted my Design 101 for AI, the first one up; I BELIEVE it's the first one, as it delves back into how Arts and Crafts connect to AI.
I'm still looking for a future model hub for the insane 800+ models I'd published, considering that's half of what I've got sitting in my repos on HF.
reacted to Kseniase's post with 🔥 7 days ago
8 New Types of RAG

RAG techniques continuously evolve to enhance LLM response accuracy by retrieving relevant external data during generation. To keep up with current AI trends, new RAG types incorporate deep step-by-step reasoning, tree search, citations, multimodality and other effective techniques.

Here's a list of 8 latest RAG advancements:

1. DeepRAG -> DeepRAG: Thinking to Retrieval Step by Step for Large Language Models (2502.01142)
Models retrieval-augmented reasoning as a Markov Decision Process, enabling strategic retrieval. It dynamically decides when to retrieve external knowledge and when to rely on parametric reasoning.

2. RealRAG -> RealRAG: Retrieval-augmented Realistic Image Generation via Self-reflective Contrastive Learning (2502.00848)
Enhances novel object generation by retrieving real-world images and using self-reflective contrastive learning to fill knowledge gaps, improve realism, and reduce distortions.

3. Chain-of-Retrieval Augmented Generation (CoRAG) -> Chain-of-Retrieval Augmented Generation (2501.14342)
Retrieves information step by step and adjusts it, also deciding how much compute to use at test time. If needed, it reformulates queries.

4. VideoRAG -> VideoRAG: Retrieval-Augmented Generation over Video Corpus (2501.05874)
Enables unlimited-length video processing, using dual-channel architecture that integrates graph-based textual grounding and multi-modal context encoding.

5. CFT-RAG -> CFT-RAG: An Entity Tree Based Retrieval Augmented Generation Algorithm With Cuckoo Filter (2501.15098)
A tree-RAG acceleration method that uses an improved Cuckoo Filter to optimize entity localization, enabling faster retrieval.

6. Contextualized Graph RAG (CG-RAG) -> CG-RAG: Research Question Answering by Citation Graph Retrieval-Augmented LLMs (2501.15067)
Uses Lexical-Semantic Graph Retrieval (LeSeGR) to integrate sparse and dense signals within a graph structure and capture citation relationships.

7. GFM-RAG -> GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation (2502.01113)
A graph foundation model that uses a graph neural network to refine query-knowledge connections.

8. URAG -> URAG: Implementing a Unified Hybrid RAG for Precise Answers in University Admission Chatbots -- A Case Study at HCMUT (2501.16276)
A hybrid system combining rule-based and RAG methods to improve lightweight LLMs for educational chatbots.
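
What all eight variants share is the retrieval step: score candidate chunks against the query and pass the top-k to the generator. A toy bag-of-words sketch of that common denominator (real systems use dense embeddings, graphs, or trees, as described above; the documents here are made up):

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    num = sum(a[t] * b[t] for t in a)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query, docs, k=2):
    """Toy top-k retrieval over bag-of-words vectors -- the 'R' that every
    RAG variant above builds on before generation."""
    q = Counter(query.lower().split())
    scored = sorted(docs, key=lambda d: cosine(q, Counter(d.lower().split())), reverse=True)
    return scored[:k]

docs = ["cuckoo filters speed up lookups",
        "graph neural networks refine queries",
        "the weather is sunny today"]
print(retrieve("how do cuckoo filters speed retrieval", docs, k=1))
# ['cuckoo filters speed up lookups']
```

Each advancement above then changes *when* to retrieve (DeepRAG, CoRAG), *what structure* to retrieve over (VideoRAG, CG-RAG, GFM-RAG), or *how fast* the lookup runs (CFT-RAG).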