In-browser LLM app in pure Python: Gemini Nano + Gradio-Lite

Community Article Published July 12, 2024

Google's local LLM Gemini Nano, which runs completely in the browser, is available in recent versions of Chrome Canary. Gradio is a Python package that lets you create web UIs for your models in a few lines of Python code, and Gradio-Lite is its in-browser version, which also runs completely in the browser. By combining the two, you can create a local LLM-based web app using only Python.

👉 Demo app is here.

Try out Gradio-Lite

Access the Gradio Playground, where you can try out and edit various sample apps using Gradio. If you are not familiar with Gradio, you can also start with its quickstart guide.

The Playground actually uses Gradio-Lite, the in-browser version of Gradio, so the apps there run completely in the browser.

In this article, we are going to write code and preview the app in the Playground, but you can also deploy Gradio-Lite apps in other ways. To learn more, documents like this are helpful.

Chat app mockup with Gradio-Lite

Let's see the "Chatbot" example in the Playground. The code is as follows:

import random
import gradio as gr

def random_response(message, history):
    return random.choice(["Yes", "No"])

demo = gr.ChatInterface(random_response)

if __name__ == "__main__":
    demo.launch()

Building a chat interface is very easy with Gradio. You can define a function that takes a message and returns a response, and pass it to gr.ChatInterface.

Gradio ChatInterface example

This example simply returns random responses from random_response(). By replacing it with a call to Gemini Nano, you can chat with the LLM!
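Before doing that, it helps to see that the chat function also receives the conversation history. Here is a minimal sketch, assuming gr.ChatInterface's default tuple-style history (echo_with_count is a hypothetical name used only for illustration):

```python
def echo_with_count(message, history):
    # In gr.ChatInterface, `history` holds the previous (user, assistant) turns,
    # so we can report which turn of the conversation this is.
    return f"You said: {message} (turn {len(history) + 1})"

# Wire it up exactly like random_response above:
# demo = gr.ChatInterface(echo_with_count)
```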

Use Gemini Nano from Python in the browser

First of all, you need to install Chrome Canary and enable Gemini Nano by following the "Installation" section in this article. Then open the Gradio Playground again in Chrome Canary where Gemini Nano via the Prompt API is available.

Gemini Nano and the Prompt API are exposed as JavaScript methods such as window.ai.createTextSession(), so how can we use them in Python with Gradio-Lite? Gradio-Lite uses Pyodide, a Python runtime compiled for the browser environment, and the Python code you pass to Gradio-Lite is actually executed on the Pyodide runtime.

In the Pyodide environment, some special modules are available in Python code, and you can use the js module to access JavaScript APIs: it represents the JavaScript global scope. For example, you can access the window object by importing window from js.

from js import window  # This doesn't work in Gradio-Lite. See below.

However, there is another thing to note: the Pyodide runtime of Gradio-Lite runs in a Web Worker isolated from the main browser environment, where the window object is not available. Instead, in the Web Worker environment, the Prompt API is available via the self object. So you can use the Prompt API in the Gradio-Lite environment like this:

from js import self

can_create = await self.ai.canCreateTextSession()  # "readily" if available

There is one more piece of magic: top-level await can be used in the Gradio-Lite environment, unlike in a normal Python script. This is sometimes necessary when calling JavaScript APIs that return promises.
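To illustrate the difference, here is a minimal sketch using asyncio.sleep as a stand-in for a promise-returning API (fetch_greeting is a hypothetical helper, not part of any real API):

```python
import asyncio

async def fetch_greeting():
    # Stand-in for a JavaScript API that returns a promise.
    await asyncio.sleep(0)
    return "hello"

# In a regular Python script, awaiting requires an event loop:
greeting = asyncio.run(fetch_greeting())
print(greeting)  # hello

# In the Gradio-Lite/Pyodide environment, top-level await works directly:
# greeting = await fetch_greeting()
```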

Chat with Gemini Nano

You can find several Prompt API samples on the web; the following is one with streaming output. We will use this streaming API in this article for a better chat experience.

// This is a JavaScript sample to use the Prompt API in the browser environment.
const canCreate = await window.ai.canCreateTextSession();

if (canCreate !== "no") {
  const session = await window.ai.createTextSession();

  const stream = session.promptStreaming("Write me an extra-long poem");
  for await (const chunk of stream) {
    console.log(chunk);
  }
}

However, it's written in JavaScript. We have to translate it into Python code that can run in the Gradio-Lite environment, being aware of the following points:

  • Use js module to access JavaScript APIs.
  • Use self instead of window.
  • Top-level await can be used in the Gradio-Lite environment.

This is the final code to chat with Gemini Nano with Gradio's chat interface executed on Gradio-Lite. You can try this out by copying and pasting it into the Playground, or just accessing this link.

import gradio as gr
from js import self

session = None
try:
    can_ai_create = await self.ai.canCreateTextSession()
    if can_ai_create != "no":
        session = await self.ai.createTextSession()
except Exception:
    # The Prompt API may be unavailable (e.g. flags not enabled); fall back to None.
    pass


self.ai_text_session = session


async def prompt(message, history):
    session = self.ai_text_session
    if not session:
        raise Exception("Gemini Nano is not available in your browser.")

    stream = session.promptStreaming(message)
    async for chunk in stream:
        yield chunk


demo = gr.ChatInterface(fn=prompt)

demo.launch()

At the beginning, the session is initialized. This part mirrors the JavaScript sample above, translated into Python.

The random_response() function from the first example is replaced with prompt(), which uses the Gemini Nano session. This part also follows the JavaScript sample, with the JS for await...of loop translated to async for in Python to iterate over the streaming output.
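The same pattern can be exercised in plain Python with an async generator standing in for session.promptStreaming() (fake_prompt_streaming and collect are hypothetical names used only for illustration):

```python
import asyncio

async def fake_prompt_streaming(message):
    # Stand-in for session.promptStreaming(): yields output chunk by chunk.
    for chunk in ["Once", " upon", " a time"]:
        await asyncio.sleep(0)
        yield chunk

async def collect(message):
    parts = []
    async for chunk in fake_prompt_streaming(message):  # mirrors JS `for await`
        parts.append(chunk)
    return "".join(parts)

print(asyncio.run(collect("Write me a poem")))  # Once upon a time
```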

Gradio-Lite Chat App with Gemini Nano

Conclusion

In this article, we have seen how to use Gemini Nano with Gradio-Lite/Pyodide. By combining them, you can create a local LLM-based web app with a rich interface, written only in Python, that runs completely in the browser.

Further reading and references