
My (Chiffon) Nguyen PRO

chiffonng

AI & ML interests

human-centric and data-efficient AI for knowledge acquisition


Organizations

None yet

chiffonng's activity

upvoted an article 5 days ago: SmolLM - blazingly fast and remarkably powerful (336 upvotes)

upvoted an article 6 days ago: Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM (310 upvotes)

upvoted an article 9 days ago: Fixing Gradient Accumulation (51 upvotes)

upvoted an article 18 days ago: SmolVLM - small yet mighty Vision Language Model (222 upvotes)
reacted to singhsidhukuldeep's post with 🧠 19 days ago:
O1 Embedder: Transforming Retrieval Models with Reasoning Capabilities

Researchers from the University of Science and Technology of China and the Beijing Academy of Artificial Intelligence have developed a novel retrieval model that mimics the slow-thinking capabilities of reasoning-focused LLMs like OpenAI's O1 and DeepSeek's R1.

Unlike traditional embedding models that directly match queries with documents, O1 Embedder first generates thoughtful reflections about the query before performing retrieval. This two-step process significantly improves performance on complex retrieval tasks, especially those requiring intensive reasoning or zero-shot generalization to new domains.
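
To make the two-step idea concrete, here is a minimal sketch of "think, then retrieve" in Python. The `think` and `embed` helpers and the 768-dim vectors are illustrative stand-ins, not the authors' implementation; in O1 Embedder the same fine-tuned LLM both generates the reflection and produces the embedding.

```python
import torch
import torch.nn.functional as F

def think(query: str) -> str:
    # Stand-in for the model's generated reflection about the query;
    # in O1 Embedder the fine-tuned LLM produces this text itself.
    return f"To answer '{query}', look for passages naming the paper and its authors."

def embed(text: str) -> torch.Tensor:
    # Stand-in encoder; in practice you would pool the LLM's hidden states.
    # Random vectors here, so the ranking below is meaningless; only the
    # control flow (think first, then embed query + thought) is the point.
    torch.manual_seed(abs(hash(text)) % (2**31))
    return F.normalize(torch.randn(768), dim=0)

query = "Who introduced the transformer architecture?"
docs = [
    "Attention Is All You Need (2017) introduced the transformer.",
    "ResNet uses skip connections to train very deep networks.",
]

# Step 1: generate a reflection. Step 2: embed the query together with it.
thought = think(query)
query_vec = embed(query + " " + thought)

# Rank documents by cosine similarity (vectors are already normalized).
scores = torch.stack([query_vec @ embed(doc) for doc in docs])
print(docs[int(scores.argmax())])
```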

The technical implementation is fascinating:

- The model integrates two essential functions: Thinking and Embedding
- It uses an "Exploration-Refinement" data synthesis workflow where initial thoughts are generated by an LLM and refined by a retrieval committee
- A multi-task training method fine-tunes a pre-trained LLM to generate retrieval thoughts via behavior cloning while simultaneously learning embedding capabilities through contrastive learning (a rough sketch of this combined objective follows the list)
- Memory-efficient joint training enables both tasks to share encoding results, dramatically increasing batch size
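
As referenced above, here is a hedged sketch of what that joint objective could look like in PyTorch: a standard next-token cross-entropy on the synthesized thoughts (behavior cloning) added to an InfoNCE-style contrastive loss over query and document embeddings. The tensor names, shapes, and temperature are assumptions for illustration, not the paper's code.

```python
import torch
import torch.nn.functional as F

def joint_loss(lm_logits, thought_tokens, q_emb, pos_emb, neg_embs, temperature=0.05):
    # Behavior cloning: ordinary LM cross-entropy on the retrieval thoughts.
    bc_loss = F.cross_entropy(
        lm_logits.reshape(-1, lm_logits.size(-1)), thought_tokens.reshape(-1)
    )

    # Contrastive learning: each query should score its positive document
    # higher than its negatives (InfoNCE over in-batch or mined negatives).
    q = F.normalize(q_emb, dim=-1)
    candidates = F.normalize(torch.cat([pos_emb.unsqueeze(1), neg_embs], dim=1), dim=-1)
    logits = torch.einsum("bd,bkd->bk", q, candidates) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long)  # positive sits at index 0
    cl_loss = F.cross_entropy(logits, labels)

    return bc_loss + cl_loss

# Toy shapes: batch 2, thought length 8, vocab 100, embedding dim 16, 3 negatives.
loss = joint_loss(
    torch.randn(2, 8, 100),         # lm_logits
    torch.randint(0, 100, (2, 8)),  # thought_tokens
    torch.randn(2, 16),             # query embeddings
    torch.randn(2, 16),             # positive document embeddings
    torch.randn(2, 3, 16),          # negative document embeddings
)
print(float(loss))
```

The memory saving mentioned in the last bullet would come from running the LLM forward pass once per sequence and reusing those hidden states for both terms, rather than encoding separately for each objective.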

The results are impressive: O1 Embedder outperforms existing methods across 12 datasets in both in-domain and out-of-domain scenarios. For example, it achieves a 3.9% improvement on Natural Questions and a 3.0% boost on HotPotQA compared to models without thinking capabilities.

This approach represents a significant paradigm shift in retrieval technology, bridging the gap between traditional dense retrieval and the reasoning capabilities of large language models.

What do you think about this approach? Could "thinking before retrieval" transform how we build search systems?