# Project IO Achievement - UI Design (Oral Exam)

## Problem Definition

In this issue, we'll create a Gradio app (run through Google Colab) that implements most of the "oral exam" component of this project. Namely:

- The user should be able to start and stop a recording: they press a button to start recording, record their answer, and then press a button to stop.

- The recording should be transcribed by Whisper.

- The transcript should be sent to GPT-4 for some sort of analysis.

We will integrate this with the rest of the functionality in the repo to fill out part (3) in a later issue.
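
As a rough sketch of the flow described above (record, transcribe, analyze), the two model calls can be treated as interchangeable functions. The names `run_oral_exam_step`, `fake_transcribe`, and `fake_analyze` are hypothetical and only stand in for Whisper and GPT-4 so that the example runs on its own.

```python
from typing import Callable, Dict


def run_oral_exam_step(
    audio_path: str,
    transcribe: Callable[[str], str],
    analyze: Callable[[str], str],
) -> Dict[str, str]:
    """Transcribe one recorded answer, then pass the transcript on for analysis."""
    transcript = transcribe(audio_path)
    analysis = analyze(transcript)
    return {"transcript": transcript, "analysis": analysis}


# Stand-ins so the sketch runs without Whisper or GPT-4.
def fake_transcribe(path: str) -> str:
    return f"(transcript of {path})"


def fake_analyze(text: str) -> str:
    return f"Received {len(text.split())} words."


result = run_oral_exam_step("answer.wav", fake_transcribe, fake_analyze)
print(result["transcript"])
print(result["analysis"])
```

Keeping the two steps as separate functions makes it easy to swap the local Whisper model for the Whisper API, or the placeholder analysis for the real prompt, without touching the UI code.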

## Libraries

This section installs and imports the libraries used in this notebook, such as LangChain, openai, Gradio, and so on.

In [1]:
%%capture
# install libraries here (%%capture must be the first line of the cell)
# -q flag for "quiet" install
!pip install -q langchain
!pip install -q openai
!pip install -q gradio
!pip install -q transformers
!pip install -q datasets
!pip install -q huggingsound
!pip install -q torchaudio
!pip install -q jiwer  # provides `wer`, which is imported below
!pip install -q git+https://github.com/openai/whisper.git

In [2]:
# import libraries here
import os
import time
from base64 import b64decode
from getpass import getpass

import gradio as gr
import ipywidgets as widgets
import librosa
import numpy as np
import openai
import pandas as pd
import requests
import torch
import whisper
from datasets import load_dataset
from google.colab.output import eval_js
from huggingsound import SpeechRecognitionModel
from IPython.display import HTML, Javascript, clear_output, display
from jiwer import wer
from langchain import ConversationChain, LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.indexes import VectorstoreIndexCreator
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferWindowMemory
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain.text_splitter import CharacterTextSplitter
# from torchaudio.transforms import Resample
from transformers import (
    Wav2Vec2ForCTC,
    Wav2Vec2Processor,
    WhisperForConditionalGeneration,
    WhisperProcessor,
    pipeline,
)

## API Keys

Use these cells to load the API keys required for this notebook. The code cell below uses the `getpass` library, so the key is not echoed as you type it.

In [3]:
openai_api_key = getpass()
os.environ["OPENAI_API_KEY"] = openai_api_key
openai.api_key = openai_api_key
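
If the notebook is rerun often, one option is to prompt only when the key is not already set in the environment. The helper below is a minimal sketch of that idea; `load_api_key` and its `prompt` parameter are hypothetical names, not part of `openai` or `getpass`.

```python
import os
from getpass import getpass
from typing import Callable


def load_api_key(
    env_var: str = "OPENAI_API_KEY",
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Return the key from the environment, prompting only if it is missing."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fall back to an interactive prompt that does not echo the input.
        key = prompt(f"{env_var}: ").strip()
    if not key:
        raise ValueError(f"No value provided for {env_var}")
    # Store the key so later cells and libraries can read it from the environment.
    os.environ[env_var] = key
    return key
```

In this notebook it could be used as `openai_api_key = load_api_key()` followed by `openai.api_key = openai_api_key`.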

## Prompt Design

To be added

In [4]:
chat = ChatOpenAI(temperature=0.0, model_name='gpt-4')
chat  # the repr includes the API key, so redact it before sharing the notebook

ChatOpenAI(cache=None, verbose=False, callbacks=None, callback_manager=None, tags=None, metadata=None, client=, model_name='gpt-4', temperature=0.0, model_kwargs={}, openai_api_key='<redacted>', openai_api_base='', openai_organization='', openai_proxy='', request_timeout=None, max_retries=6, streaming=False, n=1, max_tokens=None, tiktoken_model_name=None)
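
Because the repr of the `ChatOpenAI` object includes the `openai_api_key` field, it can help to mask keys before printing or saving cell output. The following is a small sketch; `redact_keys` is a hypothetical helper, and the pattern assumes keys start with `sk-` followed by letters, digits, `-`, or `_`.

```python
import re

# Assumed key shape: "sk-" followed by at least 8 letters, digits, "-" or "_".
_KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9_-]{8,}")


def redact_keys(text: str) -> str:
    """Replace anything that looks like an OpenAI-style secret key."""
    return _KEY_PATTERN.sub("sk-<redacted>", text)


sample = "openai_api_key='sk-abc123DEF456ghi789', temperature=0.0"
print(redact_keys(sample))
```

For example, `print(redact_keys(repr(chat)))` would show the model settings without the key itself.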

In [5]:
template_string = """
You are an expert in {expertise}.

Please translate the following text into {language} in a {style} style.

The text that needs to be translated is shown below: {transcribed_text}
"""

In [6]:
prompt_template = ChatPromptTemplate.from_template(template_string)

In [7]:
# prompt_template.messages[0].prompt
prompt_template.messages[0].prompt.input_variables

['expertise', 'language', 'style', 'transcribed_text']
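
To see where a list of input variables like this comes from, the placeholders can be pulled out of a format string with the standard library alone. This is only an illustration of the idea, not how LangChain implements it; `placeholder_names` and the shortened `template` are made up for the example.

```python
from string import Formatter

# Shortened, hypothetical version of the prompt template used above.
template = (
    "You are an expert in {expertise}.\n"
    "Please translate the following text into {language} in a {style} style.\n"
    "Text: {transcribed_text}"
)


def placeholder_names(fmt: str) -> list:
    """Return placeholder names in order of first appearance."""
    names = []
    for _literal, field_name, _spec, _conversion in Formatter().parse(fmt):
        if field_name and field_name not in names:
            names.append(field_name)
    return names


variables = placeholder_names(template)
print(variables)

# Filling the template requires a value for every placeholder.
filled = template.format(
    expertise="Language Translation",
    language="Japanese",
    style="romantic",
    transcribed_text="Good morning.",
)
print(filled)
```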

## UI Design

https://colab.research.google.com/github/petewarden/openai-whisper-webapp/blob/main/OpenAI_Whisper_ASR_Demo.ipynb

### Whisper Small

In [10]:
p = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# def transcribe(audio):
#     text = openai.Audio.transcribe("whisper-1", audio)['text']
#     return text

def transcribe(audio):
    # Transcribe locally with Whisper small, then send the text to GPT-4
    text = p(audio)["text"]
    test_input1 = prompt_template.format_messages(
        expertise='Language Translation',
        language='Japanese',
        style='romantic',
        transcribed_text=text)

    response = chat.predict_messages(test_input1)
    return text, response.content
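
One case worth handling is a click on submit before anything has been recorded. Assuming Gradio passes `None` to the handler in that situation, a small decorator can return empty outputs instead of raising an error inside the model call. `skip_empty_audio` and `demo_handler` are hypothetical names used only for this sketch.

```python
from functools import wraps


def skip_empty_audio(n_outputs: int):
    """Decorator: return empty strings when the handler receives no audio."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(audio, *args, **kwargs):
            if audio is None:
                # One empty string per output component.
                return ("",) * n_outputs if n_outputs > 1 else ""
            return fn(audio, *args, **kwargs)
        return wrapper
    return decorator


@skip_empty_audio(n_outputs=2)
def demo_handler(audio):
    # Stand-in for `transcribe`, which returns (transcription, translation).
    return f"text from {audio}", "translated text"


print(demo_handler(None))
print(demo_handler("answer.wav"))
```

Applied to the real handler, this would be `@skip_empty_audio(n_outputs=2)` above `def transcribe(audio):`.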


In [9]:
gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(source="microphone", type="filepath"),
    outputs=[
        gr.Textbox(label="Transcription"),
        gr.Textbox(label="Translation"),
    ],
    title="Speech to Text using OpenAI Whisper",
    description="Speak into the microphone and have the spoken words transcribed into text and translated",
).launch()


### Whisper API

In [21]:
def transcribe(audio):
    # `audio` is a file path when it comes from gr.Audio(type="filepath").
    # gr.Files(type="file") is assumed to pass a list of temp-file objects
    # instead, so unwrap the first one and read its `.name` path.
    if isinstance(audio, list):
        audio = audio[0]
    audio_file_path = getattr(audio, "name", audio)

    with open(audio_file_path, "rb") as audio_file:
        # Call OpenAI's Whisper model for transcription
        transcript = openai.Audio.transcribe("whisper-1", audio_file)
    transcribed_text = transcript["text"]
    return transcribed_text

def translate(text):
    # Create a prompt template (this will be changed later to fit the actual task)
    # Here translation is a filler task for GPT
    test_input1 = prompt_template.format_messages(
        expertise='Language Translation',
        language='Japanese',
        style='romantic',
        transcribed_text=text)

    response = chat.predict_messages(test_input1)
    return response.content
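
Before a file is uploaded to the Whisper API, it can be checked locally so that obvious problems fail fast. The sketch below assumes the limits OpenAI has published for this endpoint (25 MB per file and a fixed set of audio formats); treat both constants as assumptions to verify against the current documentation. `check_audio_file` is a hypothetical helper.

```python
import os

# Assumed limits for the hosted Whisper endpoint; verify before relying on them.
MAX_BYTES = 25 * 1024 * 1024
ALLOWED_EXTENSIONS = {".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".wav", ".webm"}


def check_audio_file(path: str) -> int:
    """Return the file size in bytes, or raise ValueError if the file looks unusable."""
    if not os.path.isfile(path):
        raise ValueError(f"File not found: {path}")
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported extension: {ext or '(none)'}")
    size = os.path.getsize(path)
    if size == 0:
        raise ValueError("File is empty")
    if size > MAX_BYTES:
        raise ValueError(f"File is {size} bytes; the assumed limit is {MAX_BYTES}")
    return size
```

Calling `check_audio_file(audio_file_path)` at the top of `transcribe` would turn these cases into a readable error message instead of a failed API request.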

In [22]:
with gr.Blocks() as demo:
    gr.Markdown("# Oral Exam App")
    with gr.Box():
        gr.HTML("""Enter your OpenAI API key below; if you haven't created one already, visit
        platform.openai.com/account/api-keys
        to sign up for an account and get your personal API key""",
                elem_classes="textbox_label")
        # Renamed from `input` to avoid shadowing the Python builtin
        api_key_input = gr.Textbox(show_label=False, type="password", container=False,
                                   placeholder="●●●●●●●●●●●●●●●●●")

    with gr.Blocks():
        gr.Markdown("## Upload your audio file or start recording")

        with gr.Row():
            with gr.Column():
                file_input = gr.Files(label="Load an mp3 file",
                                      file_types=['.mp3'], type="file",
                                      elem_classes="short-height")
                record_inputs = gr.Audio(source="microphone", type="filepath")

            with gr.Column():
                outputs_transcribe = gr.Textbox(label="Transcription")

        with gr.Row():
            btn1 = gr.Button(value="Transcribe recorded audio")
            btn1.click(transcribe, inputs=record_inputs, outputs=outputs_transcribe)
            btn2 = gr.Button(value="Transcribe uploaded audio")
            btn2.click(transcribe, inputs=file_input, outputs=outputs_transcribe)

        outputs_translate = gr.Textbox(label="Translation")
        btn3 = gr.Button(value="Translate")
        btn3.click(translate, inputs=outputs_transcribe, outputs=outputs_translate)

demo.launch()
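
In the app above, the password textbox for the API key is displayed but not yet passed to any handler. One possible way to use it is a handler that receives the key together with the audio path and checks both before calling the transcription backend. This is a sketch only; `transcribe_with_key` and `fake_backend` are hypothetical, and the backend is injected so the example runs without the OpenAI client.

```python
from typing import Callable, Optional


def transcribe_with_key(
    api_key: Optional[str],
    audio_path: Optional[str],
    transcribe_fn: Callable[[str, str], str],
) -> str:
    """Validate the UI inputs, then call transcribe_fn(audio_path, api_key)."""
    if not api_key or not api_key.strip():
        return "Please enter your OpenAI API key first."
    if not audio_path:
        return "Please record or upload audio first."
    return transcribe_fn(audio_path, api_key.strip())


# Stand-in backend so the sketch runs without network access.
def fake_backend(path: str, key: str) -> str:
    return f"transcribed {path} with a key of length {len(key)}"


print(transcribe_with_key("", "answer.wav", fake_backend))
print(transcribe_with_key(" sk-demo ", "answer.wav", fake_backend))
```

To wire this into the Blocks app, the button's `click` call would list both the password textbox and the audio component in `inputs=[...]`, with a small wrapper that supplies the real transcription function.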

