Understanding the Interface class
In this section, we will take a closer look at the `Interface` class, and understand the main parameters used to create one.
How to create an Interface
You’ll notice that the `Interface` class has 3 required parameters: `Interface(fn, inputs, outputs, ...)`
These parameters are:
- `fn`: the prediction function that is wrapped by the Gradio interface. This function can take one or more parameters and return one or more values.
- `inputs`: the input component type(s). Gradio provides many pre-built components such as `"image"` or `"mic"`.
- `outputs`: the output component type(s). Again, Gradio provides many pre-built components, e.g. `"image"` or `"label"`.
For a complete list of components, see the Gradio docs. Each pre-built component can be customized by instantiating the class corresponding to the component.
For example, as we saw in the previous section, instead of passing in `"textbox"` to the `inputs` parameter, you can pass in a `Textbox(lines=7, label="Prompt")` component to create a textbox with 7 lines and a label.
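As a quick sketch of what that looks like in context (the `shout()` function here is again just an illustrative placeholder):

```python
import gradio as gr


# An illustrative prediction function
def shout(text):
    return text.upper()


# An instantiated Textbox replaces the plain "textbox" shortcut
textbox = gr.Textbox(lines=7, label="Prompt")
gr.Interface(fn=shout, inputs=textbox, outputs="text").launch()
```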
Let’s take a look at another example, this time with an `Audio` component.
A simple example with audio
As mentioned earlier, Gradio provides many different inputs and outputs. So let’s build an `Interface` that works with audio.
In this example, we’ll build an audio-to-audio function that takes an audio file and simply reverses it.
For the input, we will use the `Audio` component. When using the `Audio` component, you can specify whether you want the `source` of the audio to be a file that the user uploads or a microphone that the user records their voice with. In this case, let’s set it to `"microphone"`. Just for fun, we’ll add a label to our `Audio` that says “Speak here...”.
In addition, we’d like to receive the audio as a numpy array so that we can easily “reverse” it. So we’ll set the `type` to be `"numpy"`, which passes the input data as a tuple of (`sample_rate`, `data`) into our function.
We will also use the `Audio` output component, which can automatically render a tuple with a sample rate and a numpy array of data as a playable audio file. In this case, we do not need to do any customization, so we will use the string shortcut `"audio"`.
```python
import numpy as np
import gradio as gr


def reverse_audio(audio):
    # The Audio component passes the input as a (sample_rate, data) tuple
    sr, data = audio
    reversed_audio = (sr, np.flipud(data))
    return reversed_audio


mic = gr.Audio(source="microphone", type="numpy", label="Speak here...")
gr.Interface(reverse_audio, mic, "audio").launch()
```
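One detail worth noting: `np.flipud()` reverses an array along its first axis, so this same function handles both mono recordings (a 1-D array of samples) and stereo ones (a 2-D array of shape `(samples, channels)`).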
The code above will produce an interface like the one below (if your browser doesn’t ask you for microphone permissions, open the demo in a separate tab).
You should now be able to record your voice and hear yourself speaking in reverse - spooky 👻!
Handling multiple inputs and outputs
Let’s say we had a more complicated function, with multiple inputs and outputs. In the example below, we have a function that takes a dropdown index, a slider value, and a number, and returns an audio sample of a musical tone.
Take a look at how we pass a list of input and output components, and see if you can follow along with what’s happening.
The key here is that when you pass:
- a list of input components, each component corresponds to a parameter in order.
- a list of output components, each component corresponds to a returned value.
The code snippet below shows how three input components line up with the three arguments of the `generate_tone()` function:
```python
import numpy as np
import gradio as gr

notes = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def generate_tone(note, octave, duration):
    sr = 48000
    # Number of semitones between the requested pitch and A4 (440 Hz)
    a4_freq, tones_from_a4 = 440, 12 * (octave - 4) + (note - 9)
    frequency = a4_freq * 2 ** (tones_from_a4 / 12)
    duration = int(duration)
    audio = np.linspace(0, duration, duration * sr)
    audio = (20000 * np.sin(audio * (2 * np.pi * frequency))).astype(np.int16)
    return (sr, audio)


gr.Interface(
    generate_tone,
    [
        gr.Dropdown(notes, type="index"),  # passes the index of the selected note
        gr.Slider(minimum=4, maximum=6, step=1),
        gr.Textbox(type="number", value=1, label="Duration in seconds"),
    ],
    "audio",
).launch()
```
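If you are curious about the arithmetic: `generate_tone()` uses the equal-temperament formula, in which a pitch n semitones away from A4 (440 Hz) has frequency 440 × 2^(n/12). For example, selecting the note A (index 9) at octave 4 gives `tones_from_a4 = 12 * (4 - 4) + (9 - 9) = 0`, so the frequency works out to exactly 440 Hz.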
The `launch()` method
So far, we have used the `launch()` method to launch the interface, but we haven’t really discussed what it does.
By default, the `launch()` method will launch the demo in a web server that is running locally. If you are running your code in a Jupyter or Colab notebook, then Gradio will embed the demo GUI in the notebook so you can easily use it.
You can customize the behavior of `launch()` through different parameters:

- `inline` - whether to display the interface inline on Python notebooks.
- `inbrowser` - whether to automatically launch the interface in a new tab on the default browser.
- `share` - whether to create a publicly shareable link from your computer for the interface. Kind of like a Google Drive link!
We’ll cover the `share` parameter in a lot more detail in the next section!
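For example, here is a minimal sketch that reuses the `reverse_audio()` demo from above, opens it in a new browser tab, and creates a temporary public link:

```python
import numpy as np
import gradio as gr


def reverse_audio(audio):
    sr, data = audio
    return (sr, np.flipud(data))


demo = gr.Interface(
    reverse_audio, gr.Audio(source="microphone", type="numpy"), "audio"
)

# inbrowser=True opens a new browser tab; share=True prints a
# temporary public URL that anyone can use to try the demo
demo.launch(inbrowser=True, share=True)
```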
✏️ Let’s apply it!
Let’s build an interface that allows you to demo a speech-recognition model. To make it interesting, we will accept either a mic input or an uploaded file.
As usual, we’ll load our speech recognition model using the `pipeline()` function from 🤗 Transformers. If you need a quick refresher, you can go back to that section in Chapter 1. Next, we’ll implement a `transcribe_audio()` function that processes the audio and returns the transcription. Finally, we’ll wrap this function in an `Interface` with the `Audio` components for the inputs and just text for the output. Altogether, the code for this application is the following:
```python
from transformers import pipeline
import gradio as gr

model = pipeline("automatic-speech-recognition")


def transcribe_audio(mic=None, file=None):
    # Prefer the microphone recording if both inputs were provided
    if mic is not None:
        audio = mic
    elif file is not None:
        audio = file
    else:
        return "You must either provide a mic recording or a file"
    transcription = model(audio)["text"]
    return transcription


gr.Interface(
    fn=transcribe_audio,
    inputs=[
        gr.Audio(source="microphone", type="filepath", optional=True),
        gr.Audio(source="upload", type="filepath", optional=True),
    ],
    outputs="text",
).launch()
```
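Note that `type="filepath"` makes Gradio pass each input to `transcribe_audio()` as the path to a temporary audio file on disk, a format that the 🤗 Transformers pipeline can consume directly.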
If your browser doesn’t ask you for microphone permissions, open the demo in a separate tab.
That’s it! You can now use this interface to transcribe audio. Notice here that by passing in the `optional` parameter as `True`, we allow the user to either provide a microphone recording or an audio file (or neither, but that will return an error message).
Keep going to see how to share your interface with others!