Five years ago, we launched Gradio as a simple Python library to let researchers at Stanford easily demo computer vision models with a web interface.
Today, Gradio is used by more than 1 million developers each month to build and share AI web apps. This includes some of the most popular open-source projects of all time, like Automatic1111, Fooocus, Oobabooga's Text WebUI, Dall-E Mini, and LLaMA-Factory.
How did we get here? How did Gradio keep growing in the very crowded field of open-source Python libraries? I get this question a lot from folks who are building their own open-source libraries. This post distills some of the lessons that I have learned over the past few years:
1. Invest in good primitives, not high-level abstractions
2. Embed virality directly into your library
3. Focus on a (growing) niche
4. Your only roadmap should be rapid iteration
5. Maximize ways users can consume your library's outputs
1. Invest in good primitives, not high-level abstractions
When we first launched Gradio, we offered only one high-level class (gr.Interface), which created a complete web app from a single Python function. We quickly realized that developers wanted to create other kinds of apps (e.g. multi-step workflows, chatbots, streaming applications), but as we started listing out the apps users wanted to build, we realized what we needed to do:
Huge week for xet-team as Llama 4 is the first major model on Hugging Face uploaded with Xet providing the backing! Every byte downloaded comes through our infrastructure.
Using Xet on Hugging Face is the fastest way to download and iterate on open-source models, and we've proven it with Llama 4, giving a boost of ~25% across all models.
We expect builders on the Hub to see even more improvements, helping power innovation across the community.
With the models on our infrastructure, we can peer in and see how well our dedupe performs across the Llama 4 family. On average, we're seeing ~25% dedupe, providing huge savings to the community who iterate on these state-of-the-art models. The attached image shows a few selected models and how they perform on Xet.
Thanks to the meta-llama team for launching on Xet!
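To make a figure like "~25% dedupe" concrete, here is a minimal sketch of how a dedupe ratio can be computed over a set of files. It is only an illustration: the fixed-size chunking, the tiny chunk size, the `dedupe_ratio` name, and the toy byte strings are all assumptions made for this example, not a description of how Xet chunks or stores data.

```python
import hashlib
from typing import Iterable


def dedupe_ratio(blobs: Iterable[bytes], chunk_size: int = 4) -> float:
    """Return the share of bytes that are duplicates of an already-seen chunk.

    Hypothetical helper: splits each blob into fixed-size chunks (an
    assumption for illustration) and counts a chunk as stored only the
    first time its hash is seen.
    """
    seen = set()
    total_bytes = 0
    stored_bytes = 0
    for blob in blobs:
        for start in range(0, len(blob), chunk_size):
            chunk = blob[start:start + chunk_size]
            total_bytes += len(chunk)
            digest = hashlib.sha256(chunk).digest()
            if digest not in seen:
                seen.add(digest)
                stored_bytes += len(chunk)
    if total_bytes == 0:
        return 0.0
    return 1 - stored_bytes / total_bytes


# Toy stand-ins for two related model files that share their first half.
base_model = b"AAAABBBBCCCCDDDD"
variant_model = b"AAAABBBBXXXXYYYY"

ratio = dedupe_ratio([base_model, variant_model])
print(f"dedupe ratio: {ratio:.0%}")
```

In this toy case, 8 of the 32 bytes are repeated chunks, so only 24 bytes would need to be stored or transferred.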
It's called diRAGnosis and is a lightweight framework that helps you diagnose the performance of LLMs and retrieval models in RAG applications.
You can launch it as an application locally (it's Docker-ready!) or, if you want more flexibility, you can integrate it into your code as a Python package.
The workflow is simple:
- You choose your favorite LLM provider and model (supported, for now, are Mistral AI, Groq, Anthropic, OpenAI and Cohere)
- You pick the embedding model provider and the embedding model you prefer (supported, for now, are Mistral AI, Hugging Face, Cohere and OpenAI)
- You prepare and provide your documents
- Documents are ingested into a Qdrant vector database and transformed into a synthetic question dataset with the help of LlamaIndex
- The LLM is evaluated for the faithfulness and relevancy of its retrieval-augmented answers to the questions
- The embedding model is evaluated for hit rate and mean reciprocal rank (MRR) of the retrieved documents
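For readers new to the two retrieval metrics in the last step, here is a minimal sketch of how hit rate and MRR are commonly computed. It is not diRAGnosis's or LlamaIndex's code: the function names and the toy document IDs are hypothetical, and it assumes each synthetic question has exactly one expected source document.

```python
from typing import Sequence


def hit_rate(retrieved: Sequence[Sequence[str]], expected: Sequence[str]) -> float:
    """Fraction of questions whose expected document appears among the retrieved ones."""
    if not expected:
        return 0.0
    hits = sum(1 for docs, gold in zip(retrieved, expected) if gold in docs)
    return hits / len(expected)


def mean_reciprocal_rank(retrieved: Sequence[Sequence[str]], expected: Sequence[str]) -> float:
    """Average of 1/rank of the expected document (0 when it is not retrieved)."""
    if not expected:
        return 0.0
    total = 0.0
    for docs, gold in zip(retrieved, expected):
        if gold in docs:
            rank = list(docs).index(gold) + 1  # ranks start at 1
            total += 1.0 / rank
    return total / len(expected)


# Toy example: three questions, top-3 retrieved document IDs for each.
retrieved_ids = [
    ["d1", "d2", "d3"],  # expected doc at rank 1
    ["d4", "d2", "d9"],  # expected doc at rank 2
    ["d7", "d8", "d5"],  # expected doc missing
]
expected_ids = ["d1", "d2", "d6"]

print("hit rate:", hit_rate(retrieved_ids, expected_ids))
print("MRR:", mean_reciprocal_rank(retrieved_ids, expected_ids))
```

Here two of the three questions retrieve their source document, so the hit rate is 2/3, while the MRR is (1 + 1/2 + 0) / 3 = 0.5 because it also rewards ranking the right document higher.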
And the cool thing is that all of this is intuitive and completely automated: you plug it in, and it works!
Even cooler? This is all built on top of LlamaIndex and its integrations: no need for tons of dependencies or fancy workarounds. And if you're a UI lover, Gradio and FastAPI are there to provide you with a seamless backend-to-frontend experience.