Daniel Huynh PRO

dhuynh95

AI & ML interests

None yet

Recent Activity

liked a model 9 days ago: microsoft/OmniParser
updated a collection 13 days ago: cool-papers
upvoted a paper 13 days ago

Articles

Organizations

dhuynh95's activity

posted an update 6 months ago
💪 Build an information retrieval agent that can beat Gemini and OpenAI using an open-source Large Action Model framework!

In this video, we ask different proprietary conversational AIs the question:
"What is the most trendy recent paper on Llava models on Hugging Face papers? Provide the date and a summary of the paper", and the results are interesting!
❌ Gemini: found a paper from Jan 29, 2024
❌ OpenAI: found a paper from October 2023
❌ You.com: found a paper from Jan 29, 2024
✅ LaVague: found the latest paper (ConvLLaVA, which is dope by the way: https://arxiv.org/abs/2405.15738)!

The best part? Our solution fits in a few lines of code with our open-source framework! I will share how we built this agent during our webinar on AI Web Agents, this Thursday, May 30th, at 9 am PST (https://lu.ma/m8fzmb3q), so don't miss it 😉

You can also start playing with our framework: https://github.com/lavague-ai/LaVague
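For the curious, here is a minimal sketch of what such an agent looks like with LaVague (the import paths, class names, and the `headless` option follow the docs linked above as of mid-2024 and may have shifted since, so treat this as an approximation):

```python
from lavague.core import WorldModel, ActionEngine
from lavague.core.agents import WebAgent
from lavague.drivers.selenium import SeleniumDriver

# A Selenium-backed driver to act on the page, a world model to plan,
# and an action engine to turn each instruction into browser actions.
driver = SeleniumDriver(headless=True)
agent = WebAgent(WorldModel(), ActionEngine(driver))

agent.get("https://huggingface.co/papers")
result = agent.run(
    "What is the most trendy recent paper on Llava models? Provide the date and a summary."
)
print(result)
```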
reacted to m-ric's post with 👀 8 months ago
The return of the RNNs ⚔ New Mamba-based architecture "Jamba"

Since the release of BERT by Google in 2018, the Transformer architecture has taken over machine learning thanks to its attention mechanism, which gives it the ability to focus on important parts of the input. But attention computation is quadratic in the input length.

💫 The Mamba paper, published in December 2023, announced the return of the RNNs: it has no attention, but integrates a selection mechanism that should reproduce the "focus" ability of attention, in an architecture whose compute requirements grow only linearly in input length!
🤔 Would this work? We had yet to see a large Mamba model match the performance of attention-based Transformers.

💥 But now it's done! A (Mamba + Transformers) hybrid just beat Transformers!

The AI21 Labs team just released Jamba.
They insert a few Transformer layers to inject some attention into a big pile of Mamba layers, thus getting the best of both worlds.

TL;DR:
🏗️ New MoE architecture: 4 Jamba blocks, each made of 7 Mamba layers for every 1 Transformer layer.
🏋️ 52B parameters, 12B active at inference: this reduction is enabled by Mixture of Experts, similar to Mixtral (47B parameters, 13B active).
🏎️ Speed: 3x throughput. Jamba is much faster than similar-sized Transformer models on long contexts.
📏 Context length: 140K tokens on a single 80GB A100!
💪 Performance: state-of-the-art for this size. The small injection of attention seems sufficient, since Jamba beats the open-source reference Mixtral-8x7B on many benchmarks!

Try it here 👉 ai21labs/Jamba-v0.1
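If you want to poke at the checkpoint yourself, here is a minimal generation sketch with standard transformers usage (note the full 52B model needs several 80GB GPUs in bf16, or a quantization config, and the initial release required trust_remote_code):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# 52B parameters in total: shard in bfloat16 across available GPUs,
# or add a quantization config if you only have a single card.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # needed for the initial release, before native transformers support
)

inputs = tokenizer("The best thing about hybrid SSM-attention models is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```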
posted an update 8 months ago
🌊 LaVague can compile action plans into executable code to browse the internet!

In this example, you can see how an action plan with natural-language instructions can be "compiled" into executable Selenium code!

🤖 This shows the potential of #LAM (Large Action Models) to perform actions for us and automate mechanical tasks.
This example leverages a local embedding model and OpenAI GPT-3.5, but we support many options, including local ones with Gemma!
You can try this in our docs: https://docs.lavague.ai/en/latest/

LaVague is an open-source Large Action Model framework to automate automation. If you are interested in helping us on our mission to democratize automation tooling for devs, don't hesitate to visit our GitHub (https://github.com/lavague-ai/LaVague) or Discord (https://discord.gg/SDxn9KpqX9)!
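To make the "compilation" step concrete, this is the kind of executable Selenium code a single natural-language instruction such as "click on the search bar and type 'Gemma'" could compile down to (illustrative only, with a made-up selector; it is not LaVague's actual output):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
driver.get("https://huggingface.co")

# "Click on the search bar and type 'Gemma'" turned into concrete Selenium calls
search_bar = driver.find_element(By.XPATH, "//input[@type='text']")
search_bar.click()
search_bar.send_keys("Gemma")
search_bar.send_keys(Keys.ENTER)
```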
posted an update 8 months ago
Hello World! This post is written by the Large Action Model framework LaVague! Find out more on https://github.com/mithril-security/LaVague

Edit: Here is the video of 🌊LaVague posting this. This is quite meta
replied to their post 9 months ago

Thanks! Preparing a tutorial too to explain how we managed to have a working solution in <150 lines of code :D

posted an update 9 months ago
🌊 Released #LaVague, a fully open-source AI pipeline to turn natural language into browser actions!

In less than 150 lines of code (RAG with local embeddings + Zephyr-7b-Gemma locally, or Mixtral on the HF Inference API), it generates #Selenium code from a user query. In this GIF, you can see it follow user instructions to command a browser to browse the HF website!

Try it on Colab: colab.research.google.com/github/dhuynh95/LaVague/blob/main/LaVague.ipynb
GitHub: github.com/dhuynh95/LaVague

Pretty exciting how it becomes possible to create an AI assistant that could perform actions for us, such as logging into government accounts, filling forms, or pulling personal information!

It was quite fun to hack on over the weekend using open-source tools, from @huggingface local embeddings with transformers for local inference or the HF Inference API, to RAG with @llama_index, through the @MistralAI Mixtral model!

Some challenges: to make it run on Colab for the #GPU Poors, I first resorted to the @huggingface Inference API with Mixtral, as it was the only model good enough (gemma-7b did not make it and refused to produce code). But after some experimentation, I managed to make it work with a local Zephyr-7b-Gemma so that people could run this assistant fully locally!

Because I used an off-the-shelf model, I had to improve performance with few-shot learning and Chain of Thought, which managed to generate appropriate code!

I hope this project will herald a new dawn where transparent, private, and local AI assistants help automate menial but critical tasks, such as helping fill out taxes, book accommodation, or research information for us.
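For readers who want the gist without opening the notebook, here is a rough sketch of the retrieval-plus-prompting step (not the exact LaVague code: the embedding model, HTML snippet, and few-shot example below are placeholders):

```python
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("BAAI/bge-small-en-v1.5")  # any small local embedding model works

# In the real pipeline this would be driver.page_source from Selenium.
page_source = "<nav><a id='datasets-tab'>Datasets</a><a id='models-tab'>Models</a></nav>"
chunks = [page_source[i:i + 1000] for i in range(0, len(page_source), 1000)]
chunk_embeddings = embedder.encode(chunks, convert_to_tensor=True)

# Retrieve the HTML chunks most relevant to the user's instruction.
instruction = "Click on the 'Datasets' tab in the navigation bar"
query_embedding = embedder.encode(instruction, convert_to_tensor=True)
hits = util.semantic_search(query_embedding, chunk_embeddings, top_k=3)[0]
context = "\n".join(chunks[hit["corpus_id"]] for hit in hits)

# Few-shot + Chain-of-Thought prompt asking the LLM to emit Selenium code.
few_shot = "Instruction: Click the login button\nCode: driver.find_element(By.ID, 'login').click()\n"
prompt = (
    "You write Selenium code. Think step by step, then output only Python.\n\n"
    f"{few_shot}\n"
    f"HTML:\n{context}\n\n"
    f"Instruction: {instruction}\nCode:"
)
# `prompt` is then sent to Zephyr-7b-Gemma locally or to Mixtral on the HF Inference API.
```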
posted an update 10 months ago
✨ In-context learning is all you need!

This super interesting paper shows that fine-tuning with #SFT or #RLHF only helps with the form of the output but does not improve knowledge or reasoning abilities, and in some cases actually decreases performance!

They tested it with Mistral base vs. Mistral fine-tuned, as well as Llama 2 70B base and fine-tuned, and the results are consistent.

Providing the right prompt to the base model actually makes the model better and has zero training cost!

Paper: https://arxiv.org/abs/2312.01552
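As a toy illustration of the idea (the curated examples below are placeholders, much shorter and simpler than the stylistic exemplars used in the paper), prompting an untuned base model with a small "alignment" preamble costs nothing at training time:

```python
from transformers import pipeline

# An untuned base model, no SFT or RLHF applied.
generator = pipeline("text-generation", model="mistralai/Mistral-7B-v0.1", device_map="auto")

# In-context alignment: a brief preamble plus a couple of curated Q/A pairs.
prompt = """Below are conversations between a user and a helpful, honest assistant.

# User: What is the capital of France?
# Assistant: The capital of France is Paris.

# User: Explain what RAG means in the context of LLMs.
# Assistant:"""

print(generator(prompt, max_new_tokens=128, do_sample=False)[0]["generated_text"])
```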
reacted to abidlabs's post with ❤️ 10 months ago
Just out: new custom Gradio component specifically designed for code completion models 🔥
replied to lbourdois's post 10 months ago

Pretty cool stuff! Maybe you should do a leaderboard of major datasets and their leakage scores.

reacted to lbourdois's post with 🤯 10 months ago
reacted to Locutusque's post with ❤️ 10 months ago
Introducing the "UltraTextbooks" dataset 🚀📚
Check it out here: Locutusque/UltraTextbooks
📘 A comprehensive collection of high-quality synthetic and human-written textbooks
👨‍🎓 Spanning various subjects and programming languages
🔧 Designed for advanced NLP tasks like language modeling, educational QA, text summarization, and content generation for educational purposes
🚀 Future expansions planned with additional data sources to enhance the corpus
👇 Data composition highlights 👇
- Blend of synthetic and human-written material
- Includes topics from general education to specialized areas
- Structured with a single "text" field
🧩 Data collected from various Hugging Face datasets, guided by a diverse and comprehensive curation rationale
🚧 Limitations may exist, so report any issues you encounter
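A quick way to peek at the corpus with the datasets library (streaming, so nothing huge is downloaded up front; the "train" split name is an assumption, check the dataset card):

```python
from itertools import islice
from datasets import load_dataset

# Stream the corpus instead of downloading it all; each record exposes a single "text" field.
ds = load_dataset("Locutusque/UltraTextbooks", split="train", streaming=True)
for example in islice(ds, 3):
    print(example["text"][:200], "...")
```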
reacted to gsarti's post with 👍 10 months ago
🔍 Today's pick in Interpretability & Analysis of LMs: Black-Box Access is Insufficient for Rigorous AI Audits by @stecas @carsonezell et al.

Audits conducted on AI systems can identify potential risks and ensure compliance with safety requirements. The authors categorise audits based on the level of access to model-related resources (black-, grey-, white- and outside-the-box) and highlight how greater transparency about the audited AI system enables broader and more effective auditing procedures. Technical, physical, and legal safeguards for performing audits are also introduced to ensure minimal security risks for audited companies. The authors conclude that transparency about the type of auditors' access and methods is a prerequisite to correctly interpreting audit results, and that white- and outside-the-box access allow for substantially more scrutiny than black-box access alone.

📄 Paper: Black-Box Access is Insufficient for Rigorous AI Audits (2401.14446)

🔍 Further reading:

📄 Taxonomy of AI system access: https://bit.ly/struct-access
💻 An API for transparent science on black-box AI (NNsight): https://nnsight.net/about
posted an update 10 months ago
Fascinating paper by RAND shows that there is no statistically significant difference between using LLMs or the regular internet to craft operational plans for bioweapons!

This is the first paper that actually studies the impact of AI on bioweapons from an operational perspective and looks at the big question: is AI any better than just using public data on the Internet?

As most of the data is likely already out there, an LLM would just be a more efficient tool to surface the relevant information, but it seems that its impact is limited.

https://www.rand.org/pubs/research_reports/RRA2977-2.html
replied to santiviquez's post 10 months ago
posted an update 10 months ago
✅ New paper on ensuring valid LLM output with SOTA LLMs like GPT-4 by mixing them with OSS LLMs

Paper: arxiv.org/abs/2401.09967

Great paper showing how strong proprietary AI like #GPT4 can be paired with #OSS LLMs to ensure LLM output validity, e.g. valid JSON.

Many devs complain that #LLMs cannot be reliably used in production if the output is not valid: for instance, if one wants to use LLMs to generate SQL queries or JSON, it is crucial that the output is valid.

Frameworks have arisen to constrain the outputs of an LLM to follow some constraints, like outlines (https://github.com/outlines-dev/outlines), but they assume access to the logits.

This makes them incompatible with proprietary LLMs like GPT-4 that don't share logits, so one can only use open-source LLMs, which are much less performant.

This paper shows how one can use powerful proprietary LLMs like GPT-4 to create a first unconstrained sketch, then refine it using an OSS model like Llama 2, where logits are accessible, to rewrite the sketch following some specific constraints.

They show that GPT-4 precision on information extraction on Wiki-NRE can be increased by 14 points (from 43% to 57%) by boosting it with constrained output!
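Here is a rough sketch of the sketch-then-refine idea, with the OpenAI client producing the unconstrained draft and outlines forcing an open model's rewrite to match a schema (the schema, prompts, and model choices are illustrative, the outlines calls follow its 0.x API, and this is not the paper's exact grammar-constrained setup):

```python
from openai import OpenAI
from pydantic import BaseModel
import outlines

class Triple(BaseModel):
    subject: str
    relation: str
    object: str

# 1) Unconstrained sketch from a strong proprietary model (no logit access needed).
client = OpenAI()
sketch = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": "Extract the relation triple from: 'Marie Curie won the Nobel Prize in Physics in 1903.'",
    }],
).choices[0].message.content

# 2) Constrained refinement with an open-weights model whose logits we do control:
#    outlines masks invalid tokens, so the result is guaranteed to parse as a Triple.
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")
generator = outlines.generate.json(model, Triple)
triple = generator(f"Rewrite this draft as a single relation triple:\n{sketch}\n")
print(triple)
```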
reacted to philschmid's post with ❤️ 10 months ago
What's the best way to fine-tune open LLMs in 2024? Look no further! 👀 I am excited to share "How to Fine-Tune LLMs in 2024 with Hugging Face" using the latest research techniques, including Flash Attention, Q-LoRA, OpenAI dataset formats (messages), ChatML, Packing, all built with Hugging Face TRL. 🚀

It is created for consumer-size GPUs (24GB) covering the full end-to-end lifecycle with:
💡 Define and understand use cases for fine-tuning
🧑🏻‍💻 Setup of the development environment
🧮 Create and prepare dataset (OpenAI format)
🏋️‍♀️ Fine-tune LLM using TRL and the SFTTrainer
🥇 Test and evaluate the LLM
🚀 Deploy for production with TGI

👉 https://www.philschmid.de/fine-tune-llms-in-2024-with-trl

Coming soon: Advanced Guides for multi-GPU/multi-Node full fine-tuning and alignment using DPO & KTO. 🔜
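To give a flavour of the core recipe, here is a minimal QLoRA + SFTTrainer sketch (the model and dataset ids are illustrative, and the argument names follow TRL/PEFT as of early 2024, so check the current docs and the full guide before copying):

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer

model_id = "meta-llama/Llama-2-7b-hf"                                        # illustrative base model
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")   # illustrative dataset with a "text" field

# 4-bit quantization so a 7B model fits on a 24GB consumer GPU.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4", bnb_4bit_compute_dtype=torch.bfloat16
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    packing=True,                        # pack short samples into full-length sequences
    peft_config=peft_config,
    args=TrainingArguments(
        output_dir="llama-7b-sft",
        num_train_epochs=1,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        bf16=True,
        logging_steps=10,
    ),
)
trainer.train()
```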
reacted to gsarti's post with ❤️ 10 months ago
💥 Today's pick in Interpretability & Analysis of LMs: Fine-grained Hallucination Detection and Editing for Language Models by @abhika-m @akariasai @vidhisha et al.

The authors introduce a new taxonomy for fine-grained annotation of hallucinations in LM generations and propose Factuality Verification with Augmented Knowledge (FAVA), a retrieval-augmented LM fine-tuned to detect and edit hallucinations in LM outputs, outperforming ChatGPT and Llama 2 Chat on both detection and editing tasks.

🌐 Website: https://fine-grained-hallucination.github.io
📄 Paper: Fine-grained Hallucination Detection and Editing for Language Models (2401.06855)
🚀 Demo: fava-uw/fava
🤖 Model: fava-uw/fava-model
🔡 Dataset: fava-uw/fava-data
posted an update 10 months ago
🪟 32k-context BERT for embeddings and RAG on long corpora

Monarch Mixer is a new architecture that enables long-context BERT for large corpora and can be fine-tuned for long-context retrieval.

Quite interesting and important, as BERT is still the most used LLM in production for "old school" tasks like classification, NER, and embeddings, but it is also a key component of RAG.

Paper: https://arxiv.org/abs/2310.12109
Blog: https://hazyresearch.stanford.edu/blog/2024-01-11-m2-bert-retrieval
GitHub: https://github.com/HazyResearch/m2
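If you want to try one of the retrieval checkpoints, here is a rough usage sketch based on the Together AI M2-BERT model cards (the model id, tokenizer choice, and "sentence_embedding" output key are assumptions taken from those cards, so double-check against the card before relying on them):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Long-context M2-BERT retrieval checkpoint; trust_remote_code loads the custom architecture.
model_id = "togethercomputer/m2-bert-80M-32k-retrieval"
model = AutoModelForSequenceClassification.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased", model_max_length=32768)

docs = ["A very long report about model evaluation ...", "Another long document about retrieval ..."]
inputs = tokenizer(docs, return_tensors="pt", padding="max_length", truncation=True, max_length=32768)

with torch.no_grad():
    outputs = model(**inputs)

embeddings = outputs["sentence_embedding"]   # one embedding per document, usable for RAG
scores = embeddings @ embeddings.T           # inner-product similarities between documents
print(scores)
```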