
Louis Brulé Naudet

louisbrulenaudet

https://louisbrulenaudet.com

BruleNaudet

louisbrulenaudet

AI & ML interests

Research in business taxation and development (NLP, LLM, Computer vision...), University Dauphine-PSL 📖 | Backed by the Microsoft for Startups Hub program and Google Cloud Platform for startups program.


Posts 4

Post


I've just open-sourced RAGoon, a small utility I use to bring knowledge from the web into LLM inference, combining Groq's inference speed with plain Google search results ⚡

RAGoon is a Python library available on PyPI that aims to improve the performance of language models by supplying contextually relevant information through retrieval-based querying, parallel web scraping, and data augmentation. It integrates with several APIs (OpenAI, Groq), so users can retrieve information from the web, enrich it with domain-specific knowledge, and feed it to a language model for more informed responses.

from groq import Groq
# from openai import OpenAI
from ragoon import RAGoon

# Initialize RAGoon instance
ragoon = RAGoon(
    google_api_key="your_google_api_key",
    google_cx="your_google_cx",
    completion_client=Groq(api_key="your_groq_api_key")
)

# Search and get results
query = "I want to do a left join in python polars"
results = ragoon.search(
    query=query,
    completion_model="Llama3-70b-8192",
)

# Print list of results
print(results)
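To make the "parallel web scraping" idea more concrete, here is a minimal standard-library sketch of fetching several pages concurrently. It is not RAGoon's actual implementation: `fetch_page` and `fetch_all` are hypothetical helper names, and the choice of a thread pool is an assumption made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable
from urllib.request import urlopen


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Download one page and return its body as text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], str] = fetch_page,
    max_workers: int = 8,
) -> Dict[str, str]:
    """Fetch every URL in a thread pool; URLs that fail are skipped."""
    pages: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all downloads up front so they run concurrently.
        futures = {url: pool.submit(fetch, url) for url in urls}
        for url, future in futures.items():
            try:
                pages[url] = future.result()
            except Exception:
                # A single unreachable page should not abort the whole batch.
                continue
    return pages
```

Passing the fetch function as a parameter keeps the sketch easy to try offline: any callable that maps a URL to text can stand in for the real download.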

For the time being, this project remains simple, but it can easily be integrated into a RAG pipeline.
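As a rough idea of what that integration could look like, the sketch below turns a list of retrieved text snippets into a single augmented prompt. It assumes the search results can be treated as a list of strings, which is not confirmed here; `build_augmented_prompt`, `max_chars`, and the sample snippets are all hypothetical and only serve the illustration.

```python
from typing import Sequence


def build_augmented_prompt(query: str, snippets: Sequence[str], max_chars: int = 2000) -> str:
    """Combine retrieved snippets and the user query into one prompt string."""
    context_parts = []
    used = 0
    for i, snippet in enumerate(snippets, start=1):
        text = snippet.strip()
        if not text:
            continue  # skip empty results
        if used + len(text) > max_chars:
            break  # stay within a rough context budget
        context_parts.append(f"[{i}] {text}")
        used += len(text)
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Placeholder snippets standing in for real search results (assumed shape).
sample_results = [
    "Polars joins are done with DataFrame.join.",
    "Pass how='left' to keep every row of the left frame.",
]
prompt = build_augmented_prompt("I want to do a left join in python polars", sample_results)
print(prompt)
```

The resulting string can then be sent as the user message to whichever completion client the pipeline already uses.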

Link to GitHub: https://github.com/louisbrulenaudet/ragoon

Post


Integrating the French Taxation Embedding Benchmark Task (beta) into the MTEB 🤗

I'm excited to announce the integration of the French Taxation Embedding Benchmark task into the Massive Text Embedding Benchmark (MTEB).

This addition expands the diverse set of tasks available within MTEB, enabling researchers and practitioners to develop and evaluate retrieval models focused on retrieving relevant tax articles or content based on provided queries.
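For readers new to retrieval evaluation, MTEB retrieval tasks are typically scored with ranking metrics such as nDCG@10. The sketch below computes nDCG@k in plain Python for a single query. The article identifiers and relevance labels are made up for illustration and do not come from the benchmark.

```python
import math
from typing import Dict, Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k positions."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(ranked_ids: Sequence[str], relevant: Dict[str, float], k: int = 10) -> float:
    """nDCG@k for one query, given the model's ranking and graded relevance labels."""
    gains = [relevant.get(doc_id, 0.0) for doc_id in ranked_ids]
    ideal = sorted(relevant.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    if ideal_dcg == 0:
        return 0.0  # no relevant document for this query
    return dcg_at_k(gains, k) / ideal_dcg


# Hypothetical query: one relevant tax article, retrieved at rank 2.
ranking = ["article_b", "article_a", "article_c"]
labels = {"article_a": 1.0}
score = ndcg_at_k(ranking, labels, k=10)
print(round(score, 4))
```

A score of 1.0 means the relevant articles are ranked in the best possible order; the further down they appear, the lower the score.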

Link to the 🤗 Dataset: louisbrulenaudet/tax-retrieval-benchmark

Link to the GitHub repo: https://github.com/louisbrulenaudet/tax-retrieval-benchmark

Notes:
The French Taxation Embedding Benchmark task and its dataset are currently in beta and may not be suitable for direct use in production. The dataset may be too small to cover the wide range of queries and scenarios encountered in real-world settings.

As the dataset grows and matures, I will provide updates and guidance on its suitability for production use cases.
