diegopacheco's picture
v1 of the app
edeaf50
|
raw
history blame
641 Bytes

Result

  • Multi-models in action
  • Story Telling
    • Given a image
    • Generate the caption for the image
    • Generate an background story for the text
  • Use LLM models:
    • Salesforce/blip-image-captioning-base for image captioning
    • gpt2 for text generation
    • gTTS for text to speech, gTTS is a Python library and CLI tool to interface with Google Translate's text-to-speech API.
    • openai/whisper-large-v2 for speach recognition
    • pipeline/sentiment-analysis task for sentiment analysis of the text story

Result UI:

Audio Result: