Revolutionizing Video Transcription: Unveiling Gemma-2b-it and Langchain in the Era of Transformers

Community Article Published March 12, 2024


Introduction

In the vast realm of technological innovation, where language and artificial intelligence increasingly intertwine, transformer models like Gemma-2b-it, paired with LangChain, have paved the way for a new approach to video transcription. Let us embark on a journey through these technologies, exploring definitions, unraveling the benefits, and delving into the details of their code implementation.

In a world fueled by the constant evolution of technology, video transcription has become an indispensable tool for communication, education, and accessibility. Traditional transcription methods often struggle with accuracy, speed, and scalability. Enter the era of transformers, where Gemma-2b-it and LangChain promise to redefine the landscape of video transcription.


Definitions: Decoding the Transformers and Langchain Mystique

Gemma-2b-it is Google's instruction-tuned, 2-billion-parameter open language model. Built on the transformer architecture, it harnesses deep learning and natural language processing to comprehend and generate human-like text, picking up on context and linguistic nuance. LangChain, on the other hand, acts as the linchpin: it is a framework for composing LLM applications, supplying the text splitters, embedding interfaces, vector stores, and retrieval chains that connect a model like Gemma-2b-it to the transcripts we want to query.
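
To make that division of labor concrete, here is a minimal sketch of LangChain wrapping Gemma-2b-it behind its generic LLM interface and chaining it to a prompt template. It is illustrative only: it assumes you have access to the gated google/gemma-2b-it checkpoint, and the prompt and topic are made up for demonstration.

from langchain import LLMChain
from langchain.prompts import PromptTemplate
from langchain.llms import HuggingFacePipeline

# Load Gemma-2b-it behind LangChain's generic LLM interface
llm = HuggingFacePipeline.from_model_id(
    model_id="google/gemma-2b-it",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 64},
)

# A chain ties a reusable prompt template to the model
prompt = PromptTemplate(
    input_variables=["topic"],
    template="Summarize the following topic in one sentence: {topic}",
)
chain = LLMChain(llm=llm, prompt=prompt)
print(chain.run(topic="video transcription with Whisper"))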

Why Integrate Gemma-2b-it and Langchain? Unleashing the Benefits

  1. Unprecedented Accuracy: Gemma-2b-it's advanced neural architecture enables it to discern context and produce remarkably accurate transcriptions, markedly reducing the margin of error.

  2. Multilingual Proficiency: LangChain's model-agnostic interfaces make it straightforward to pair Gemma-2b-it with multilingual embedding models, breaking down language barriers and fostering global accessibility.

  3. Efficiency at Speed: The amalgamation of Gemma-2b-it and LangChain streamlines the transcription pipeline, catering to the need for fast, on-demand conversion of spoken words into searchable text.

  4. Adaptability and Continuous Improvement: Because Gemma-2b-it can be fine-tuned on new data, a transcription system built around it can be updated over time, improving accuracy as requirements evolve.

Code Implementation: Weaving the Technological Tapestry

Now, let's delve into the enchanting realm of code implementation. With Gemma-2b-it and langchain at your fingertips, the process becomes a symphony of elegant algorithms and intricate patterns.


Step I: Install Libraries

!pip install pytube # For audio downloading
!pip install git+https://github.com/openai/whisper.git -q # Whisper, OpenAI's speech-to-text transcription model
!pip install langchain # Chains, text splitters, and vector-store interfaces
!pip install faiss-cpu # Vector similarity search
!pip install auto-gptq # Quantized-model support (not actually used below)
!pip install sentence-transformers # Backend for HuggingFaceEmbeddings

Step II: Import Libraries and Extract Audio

import whisper
import pytube
from datetime import datetime, timezone

url = "https://www.youtube.com/watch?v=Qa_4c9zrxf0"
video = pytube.YouTube(url)

## Extract Audio

# Grab the audio-only stream and save it locally for Whisper
audio = video.streams.get_audio_only()
audio.download(filename='tmp.mp3') # Whisper decodes via ffmpeg, so the extension is cosmetic
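
pytube parses YouTube's web pages directly, so it can break whenever the site changes its layout. If the download above fails, a fallback using the yt-dlp package is one option; this is an assumption on my part, not part of the original walkthrough:

# Fallback: download the best audio stream with yt-dlp (pip install yt-dlp)
import yt_dlp

ydl_opts = {
    'format': 'bestaudio/best', # best available audio-only stream
    'outtmpl': 'tmp.mp3',       # same filename the Whisper step expects
}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download([url])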

Step III: Load the Whisper Model and Generate the Transcription

# "small" trades a little accuracy for speed; larger checkpoints are slower but more accurate
model = whisper.load_model("small")

transcription = model.transcribe('/content/tmp.mp3')

# Each segment carries its text plus start/end offsets in seconds
res = transcription['segments']

def store_segments(segments):
    """Split Whisper segments into parallel lists of texts and "HH:MM:SS" start times."""
    texts = []
    start_times = []

    for segment in segments:
        text = segment['text']
        start = segment['start']

        # Interpret the offset (seconds from the start of the video) as a UTC
        # timestamp so the local timezone cannot skew the formatted value
        start_datetime = datetime.fromtimestamp(start, tz=timezone.utc)

        # Format the starting time as a string in the format "00:00:00"
        formatted_start_time = start_datetime.strftime('%H:%M:%S')

        texts.append(text)
        start_times.append(formatted_start_time)

    return texts, start_times

texts, start_times = store_segments(res)
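
A quick sanity check (an illustrative addition, not part of the original walkthrough) confirms the two lists line up:

# Print the first few segments alongside their timestamps
for t, s in zip(start_times[:3], texts[:3]):
    print(f"[{t}] {s}")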

Step IV: Langchain for Splitting and Vectorizing

from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores.faiss import FAISS
from langchain.chains import VectorDBQAWithSourcesChain
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import HuggingFacePipeline
from langchain import LLMChain
from transformers import AutoTokenizer, pipeline, logging, AutoModelForCausalLM
import faiss

# Model
model_name_or_path = "google/gemma-2b-it"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, use_safetensors=True)

# Pipeline
logging.set_verbosity(logging.CRITICAL)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    do_sample=True,        # required for temperature/top_p to take effect
    temperature=0.1,
    top_p=0.95,
    repetition_penalty=1.15
)

llm = HuggingFacePipeline(pipeline=pipe)
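
Before wiring the model into a retrieval chain, a one-line smoke test (with a made-up prompt) verifies that the wrapped pipeline responds:

print(llm("Explain what a vector store is in one sentence."))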

Step V: Embedding Formation, Splitting, and Storing

embeddings = HuggingFaceEmbeddings(
    model_name='intfloat/multilingual-e5-small',
    model_kwargs={'device': 'cuda'}  # use 'cpu' if no GPU is available
)

## Splitting Text
# Chunk each transcript segment; every chunk inherits its segment's start time as "source"
text_splitter = CharacterTextSplitter(chunk_size=1500, separator="\n")
docs = []
metadatas = []
for i, d in enumerate(texts):
    splits = text_splitter.split_text(d)
    docs.extend(splits)
    metadatas.extend([{"source": start_times[i]}] * len(splits))

## Storage
store = FAISS.from_texts(docs, embeddings, metadatas=metadatas)
faiss.write_index(store.index, "docs.index") # persist the raw vector index to disk
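
Writing the raw index persists only the vectors; to reuse the store in a later session you also need the docstore that maps vectors back to text and metadata. LangChain's FAISS wrapper can save both together; a sketch (depending on your LangChain version, load_local may also require allow_dangerous_deserialization=True):

# Save the index and docstore together, then reload them in one call
store.save_local("faiss_store")
store = FAISS.load_local("faiss_store", embeddings)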

Step VI: Querying Stored Data

chain = VectorDBQAWithSourcesChain.from_llm(llm=llm, vectorstore=store)
result = chain({"question": "Elon Musk's quote about growing businesses"})
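
VectorDBQAWithSourcesChain returns a dictionary: the generated answer and the timestamp "sources" we attached as metadata come back under separate keys, so you can point readers straight to the relevant moment in the video:

print(result["answer"])   # the model's answer
print(result["sources"])  # the "HH:MM:SS" start times of the supporting chunks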

Conclusion

As we conclude our exploration, the fusion of Gemma-2b-it and langchain paints a masterpiece in the world of video transcription. The harmony of advanced technology, linguistic intelligence, and efficient code implementation opens new doors to unparalleled accuracy, multilingual proficiency, real-time transcription, and continuous improvement.

In your journey towards embracing this transformative duo, may your endeavors be met with success, and may the symphony of innovation echo through the halls of progress, resonating with the hearts of those who seek the power of seamless and inspiring video transcription.

Stay connected and support my work through various platforms:

Medium: You can read my latest articles and insights on Medium at https://medium.com/@andysingal

Paypal: Enjoyed my article? Buy me a coffee! https://paypal.me/alphasingal?country.x=US&locale.x=en_US

Requests and questions: If you have a project in mind that you’d like me to work on or if you have any questions about the concepts I’ve explained, don’t hesitate to let me know. I’m always looking for new ideas for future Notebooks and I love helping to resolve any doubts you might have.
