qubite

Xdotnet

https://www.qubite.me/

qu-bite

AI & ML interests

Aspiring Civil Engineer | Passionate About Sustainable Infrastructure | Problem Solver | Tech Enthusiast


Xdotnet's activity

Reacted to m-ric's post with 🚀 about 2 months ago

Add source highlighting to your RAG system! 📄💡

RAG systems are supposed to make your LLM's answers more trustworthy by inserting supporting documents from a knowledge base into the prompt: we say that we're "adding some context".
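To make "adding some context" concrete, here is a minimal sketch of the retrieve-then-prompt step. It assumes a tiny in-memory knowledge base and uses naive word-overlap scoring as a stand-in for the embedding-based retrieval a real RAG system would use; `retrieve` and `build_rag_prompt` are hypothetical helpers, not part of any library mentioned in the post.

```python
def tokenize(text):
    """Lowercase bag of whitespace-separated words (deliberately naive)."""
    return set(text.lower().split())


def retrieve(query, knowledge_base, k=2):
    """Return the k documents sharing the most words with the query.

    Word overlap is a placeholder for vector similarity search.
    """
    q = tokenize(query)
    ranked = sorted(knowledge_base, key=lambda doc: len(q & tokenize(doc)), reverse=True)
    return ranked[:k]


def build_rag_prompt(query, knowledge_base, k=2):
    """Insert the retrieved documents, numbered, into the prompt as context."""
    docs = retrieve(query, knowledge_base, k)
    context = "\n".join(f"[{n}] {doc}" for n, doc in enumerate(docs, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )
```

Numbering the documents is one simple way to let the model (or a later attribution step) refer back to specific sources.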

👎 But if you don't know which part of the answer was generated from which input tokens, it's hard to tell whether it was effectively grounded in the context knowledge or not!

🤔 I've been working on the question: is it possible to add notes to the answer linking to which part of the context they're generated from?

And I've found a great solution: Layer-wise Relevance Propagation (LRP), a technique showcased in a paper at ICML '24 by Reduan Achtibat et al., makes it possible to precisely score how important each input token was in generating your output! They've made it into a library called LXT.

📊 For each generated output token, LXT gives you attribution scores for each input token.
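To illustrate what "attribution scores" means, here is a toy sketch of the basic LRP-epsilon rule on a tiny Linear -> ReLU -> Linear network. This is not the LXT library and not the attention-aware rules from the ICML '24 paper (which add propagation rules for attention and normalization layers); the weights and inputs are made up for illustration. The key property it shows is conservation: without biases, the input relevances sum (up to epsilon) to the output value being explained, which in an LLM would be the logit of the generated token.

```python
def linear_forward(a, W):
    """z_j = sum_i a_i * W[i][j] (no bias term, so relevance is conserved)."""
    return [sum(a[i] * W[i][j] for i in range(len(a))) for j in range(len(W[0]))]


def lrp_epsilon(a, W, R_out, eps=1e-6):
    """LRP-epsilon: R_i = a_i * sum_j W[i][j] * R_j / (z_j + eps * sign(z_j))."""
    z = linear_forward(a, W)
    # Stabilized ratio R_j / z_j for each output neuron j.
    s = [R_out[j] / (z[j] + (eps if z[j] >= 0 else -eps)) for j in range(len(z))]
    return [a[i] * sum(W[i][j] * s[j] for j in range(len(z))) for i in range(len(a))]


def explain_two_layer(x, W1, W2):
    """Forward through Linear -> ReLU -> Linear, then propagate the output back."""
    h = [max(0.0, v) for v in linear_forward(x, W1)]
    y = linear_forward(h, W2)
    # Start from the output itself as its own relevance.
    R_hidden = lrp_epsilon(h, W2, y)
    # ReLU passes relevance through: the first layer's rule uses the
    # pre-activations, and inactive units (h_j = 0) carry zero relevance.
    R_input = lrp_epsilon(x, W1, R_hidden)
    return y, R_input


x = [1.0, 2.0, -1.0]
W1 = [[0.5, -0.2], [0.3, 0.8], [-0.4, 0.1]]
W2 = [[1.0], [-0.5]]
y, R_input = explain_two_layer(x, W1, W2)
```

Here `R_input` is roughly `[0.6, -0.2, 0.45]`, summing to about `y[0] = 0.85`; negative relevance marks an input that pushed the output down.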

⚙️ So I've worked a bit more on aggregating these scores into meaningful spans between successive input and output tokens, and I finally obtained my desired result: RAG with source highlighting!
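The post does not spell out the exact aggregation, so the following is only a hypothetical sketch of one way to turn a per-token attribution matrix into spans. Assumptions: `scores[o][i]` is the attribution of input token `i` for output token `o`; an input token "supports" an output token if its score is at least `rel_threshold` times that row's maximum; supporting input tokens are merged into contiguous spans; and successive output tokens with identical support are merged into one output span.

```python
def contiguous_spans(indices):
    """Merge token indices into sorted, inclusive (start, end) spans."""
    spans = []
    for i in sorted(indices):
        if spans and i == spans[-1][1] + 1:
            spans[-1] = (spans[-1][0], i)
        else:
            spans.append((i, i))
    return spans


def highlight_spans(scores, rel_threshold=0.5):
    """Group output tokens by the input spans that support them.

    Returns a list of ((out_start, out_end), [(in_start, in_end), ...]).
    """
    groups = []
    for o, row in enumerate(scores):
        top = max(row)
        if top <= 0:
            support = []  # no positively relevant input token
        else:
            support = contiguous_spans(
                i for i, s in enumerate(row) if s >= rel_threshold * top
            )
        if groups and groups[-1][1] == support:
            # Same sources as the previous output token: extend that output span.
            (start, _), spans = groups[-1]
            groups[-1] = ((start, o), spans)
        else:
            groups.append(((o, o), support))
    return groups
```

For example, with three output tokens where the first two draw on input tokens 0-1 and the third on input tokens 3-4, the result is `[((0, 1), [(0, 1)]), ((2, 2), [(3, 4)])]`. A relative threshold keeps the rule independent of each row's overall score scale; a real implementation would likely need smoothing and span-length limits.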

Try the demo here 👉 https://huggingface.co/spaces/m-ric/rag_highlights

Caveats:
- It slows down generation (quite a lot for now, though this could hopefully be reduced a lot)
- For now it supports only specific models: Llama models and Mixtral

If there's enough interest in this solution, I can improve it further and spin it off into a specific library for RAG! 🚀

Reacted to davidberenstein1957's post with 🤗 about 2 months ago

Why is argilla/FinePersonas-v0.1 great for synthetic data generation? It can be used to synthesise realistic and diverse data for the customer personas your company is interested in!

Dataset: https://huggingface.co/datasets/argilla/FinePersonas-v0.1
Example usage: https://distilabel.argilla.io/dev/sections/pipeline_samples/examples/fine_personas_social_network/
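As a rough illustration of persona-conditioned generation, here is a minimal sketch that turns persona descriptions into prompts for an LLM. The persona strings and the `TEMPLATE` wording are hypothetical; in practice the personas would be loaded from the dataset on the Hub (for example with the `datasets` library, checking the dataset card for the exact column names), and the linked distilabel example shows a full pipeline rather than this simplified version.

```python
# Hypothetical prompt template; adapt the task to the data you want to synthesise.
TEMPLATE = (
    "You are writing as the following customer persona:\n{persona}\n\n"
    "Write a short {kind} this person might send to our support team."
)


def build_persona_prompts(personas, kind="product question"):
    """Create one generation prompt per persona description."""
    return [TEMPLATE.format(persona=p, kind=kind) for p in personas]


# Illustrative personas, not taken from the dataset.
personas = [
    "A retired teacher who gardens and shops online once a month",
    "A startup CTO evaluating tools for a small engineering team",
]
prompts = build_persona_prompts(personas, kind="complaint email")
```

Varying the persona while keeping the task fixed is what gives the resulting data its diversity.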

Reacted to victor's post with 🔥 about 2 months ago

🙋 Calling all Hugging Face users! We want to hear from YOU!

What feature or improvement would make the biggest impact on Hugging Face?

Whether it's the Hub, better documentation, new integrations, or something completely different – we're all ears!

Your feedback shapes the future of Hugging Face. Drop your ideas in the comments below! 👇

Reacted to Jaward's post with 👍 4 months ago

Super Exciting New Paper By Meta 🤖🧠🚀

Discrete Flow Matching (DFM):
Introduces a new framework for generating text/code without predicting autoregressively, i.e. one “word” at a time as traditional GPT models do. Instead, it generates all positions of the text/code in parallel, refining them over a series of steps.

The algorithm does this by slowly transforming random noise (source) into meaningful text (data). It learns how to transform samples along a path created between source and target using a "probability velocity" that describes how probabilities change over time. During generation, DFM starts with a random sample and iteratively updates it using this learned velocity, gradually transforming it into a sample from the target distribution. This allows for non-autoregressive generation.
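The iterative sampling loop can be sketched for one simple special case of the framework, under these assumptions: the source is an all-mask sequence rather than uniformly random tokens, the scheduler is linear (kappa_t = t), and `oracle_denoiser` is a hypothetical stand-in for a trained model that always predicts the target. In this case the learned velocity reduces to a jump rule: at each Euler step of size h, every still-masked position independently jumps to a token sampled from the model's prediction with probability h * kappa_dot / (1 - kappa_t), while already-filled positions stay put. This is an illustration, not Meta's implementation.

```python
import random

MASK = "<mask>"


def oracle_denoiser(x_t, position, target):
    """Hypothetical model prediction p(x_1 | x_t) for one position.

    x_t is unused by this oracle; a trained model would condition on it
    and return a learned distribution over the vocabulary.
    """
    return {target[position]: 1.0}


def sample_token(dist, rng):
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


def dfm_sample(target, num_steps=10, seed=0):
    rng = random.Random(seed)
    x = [MASK] * len(target)  # t = 0: source sample, every position masked
    for step in range(num_steps):
        # With kappa_t = t and h = 1 / num_steps, the jump probability
        # h * kappa_dot / (1 - kappa_t) simplifies to 1 / (num_steps - step),
        # which reaches 1 on the last step, so nothing stays masked.
        jump_prob = 1.0 / (num_steps - step)
        x_t = list(x)  # snapshot: all positions are updated from the same state
        for i, tok in enumerate(x_t):
            if tok == MASK and rng.random() < jump_prob:
                x[i] = sample_token(oracle_denoiser(x_t, i, target), rng)
    return x
```

With a real model, the prediction at each position would depend on the partially filled `x_t`, which is how the parallel updates stay coherent.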

They scaled models up to 1.7B parameters, achieving impressive scores on HumanEval and MBPP for coding and significantly closing the gap between autoregressive models and discrete flow models.

Though still in its infancy, it holds a promising future, as some leading research scientists argue that non-autoregressive methods yield better reasoning.

Reacted to Symbol-LLM's post with 🚀 4 months ago

🔥 Thrilled to release our 8B version of Symbol-LLM-Instruct!

It follows the two-stage training strategy proposed in the original paper and is continually optimized on the LLaMA3-Chat-8B model.

Symbol-LLM was accepted to the ACL'24 main conference! See you in Thailand!

Paper link: https://arxiv.org/abs/2311.09278
Paper Title: Symbol-LLM: Towards Foundational Symbol-centric Interface For Large Language Models

Reacted to TuringsSolutions's post with 👍 4 months ago

SNN Image Diffusion V2

Billionaires have been made for less than this. This is only one of the things it can do. It can make API calls and function calls, optimize poker and blackjack odds, and handle anything that is an optimization problem. It costs a fraction of a penny and requires a fraction of the compute of an LLM. It can even communicate two ways with an LLM.