Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
dhuynh95 
posted an update Mar 4
Post
🌊 Released #LaVague, fullly open-source AI pipeline to turn natural language into browser actions!

In less than 150 lines of code (RAG with local embedding + Zephyr-7b-Gemma locally or Mixtral on HF Inference API), it generates #Selenium code from user query. In this GIF you can see it follow user instructions to command a browser to browse HF website!

Try it on Colab: colab.research.google.com/github/dhuynh95/LaVague/blob/main/LaVague.ipynb
GitHub: github.com/dhuynh95/LaVague

Pretty exciting how it becomes possible to create an AI assistant that could perform actions for us, such as logging on gov accounts, fill forms, or pull personal information!

It was quite fun to hack in the weekend using open-source tools, from @huggingface local embedding with transformers for local inference or HF Inference API, to RAG with @llama_index, through @MistralAI Mixtral model!

Some challenges: to make it run on Colab for the #GPU Poors, I first resorted to @huggingface Inference API with Mixtral as it was the only model good enough (gemma-7b did not make it and refused to produce code). But after some experimentations, I managed to make it work a local Zephyr-7b-Gemma so that people could run this assistant fully locally!

Because I used an off-the-shelf model, I had to improve performance with few-shot learning and Chain Of Thought, which managed to generate appropriate code!

I hope this project will herald a new dawn where transparent, private and local AI assistants help automate menial but critical tasks, such as helping fill taxes, book accomodation, or research information for us.

this is incredible 🤯

·

Thanks! Preparing a tutorial too to explain how we managed to have a working solution in <150 lines of code :D

@dhuynh95 -- Very cool, any plans to add vision models such as Fireworks Llava version or Moondream 2 to the mix? Thoughts on Playwright or Puppeteer?

incredible, there are some similar usages before like turn human instructions to robots code in 'Coscientist', maybe this can work better.