Updated app.py with new capabilities.
New capabilities:
- Images now auto-save under a new name: the original filename plus up to 10 entity names detected in the image. The resulting self-describing filenames embed the input content and double as context prompts.
- Video now auto-saves as well. After text generation from an image or video, the markdown file appears in the sidebar, the area below it flags duplicates, and you can resize images and videos and use the new filenames as jump points into arXiv and other multi-agent AI pipeline flows.
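The renaming step above can be sketched roughly as follows. This is a minimal illustration, not the app's actual implementation: the function name and the assumption that entity names arrive as a list of strings are both hypothetical.

```python
import re
from pathlib import Path

def self_describing_name(original: str, entities: list[str],
                         max_entities: int = 10) -> str:
    """Append up to max_entities detected entity names to the original
    filename stem, producing a self-describing filename.
    (Hypothetical helper; the entity list would come from image analysis.)"""
    p = Path(original)
    # Strip characters that are unsafe in filenames
    safe = [re.sub(r"[^A-Za-z0-9]+", "", e) for e in entities[:max_entities]]
    return f"{p.stem}-{'-'.join(safe)}{p.suffix}"

print(self_describing_name("photo.png", ["Eiffel Tower", "Seine River"]))
# → photo-EiffelTower-SeineRiver.png
```

The new name both documents the file's contents at a glance and can be fed back to a model as a ready-made prompt.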
README updated as well:
GPT-4o Documentation: https://cookbook.openai.com/examples/gpt4o/introduction_to_gpt4o
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
This experimental multi-agent mixture-of-experts system uses a variety of techniques and models to create different combinatorial AI solutions.
Eight Models Are Used In This Space:
- Mistral-7B-Instruct
- Llama2-7B
- Mixtral-8x7B-Instruct
- Google Gemma-7B
- OpenAI Whisper Small En
- OpenAI GPT-4o
- Whisper-1
- arXiv Embeddings
The techniques below are AI techniques rather than ML models:
- Speech synthesis using browser technology
- Memory for semantic facts, plus episodic memories of emotions and time-series events
- Web integration using the q= standard for search linking, allowing comparison of the tech giants' AI implementations: Bing, then Bing Copilot with click 2; Google, which now does an AI search; Twitter, the new home for technology discoveries, AI output, and Grok; Wikipedia for fact checking; and YouTube
- File and metadata integration combining text, audio, image, and video

This app also merges common theories in cognitive AI with Python AI libraries (e.g., NLTK, scikit-learn).
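The q= search-linking pattern mentioned above can be sketched as below. This is a minimal illustration under the assumption that one query string is fanned out to each provider; note that Wikipedia and YouTube use different parameter names, as commented.

```python
from urllib.parse import quote_plus

# Base URLs for the providers compared in this Space. Most share the
# q= query-string convention; Wikipedia and YouTube differ as noted.
SEARCH_BASES = {
    "Bing":      "https://www.bing.com/search?q=",
    "Google":    "https://www.google.com/search?q=",
    "Twitter":   "https://twitter.com/search?q=",
    "Wikipedia": "https://en.wikipedia.org/w/index.php?search=",      # uses search=
    "YouTube":   "https://www.youtube.com/results?search_query=",     # uses search_query=
}

def search_links(query: str) -> dict[str, str]:
    """Return one comparison link per provider for the same query."""
    return {name: base + quote_plus(query) for name, base in SEARCH_BASES.items()}

for name, url in search_links("mixture of experts").items():
    print(f"{name}: {url}")
```

Because every provider receives the identical encoded query, the resulting links let you compare each engine's AI-assisted results side by side.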
The intent is to demonstrate SOTA AI/ML and Function-Input-Output combinations for interoperability and knowledge management.
This space also serves as an experimental test bed, mixing new technologies with old for comparison and integration.
--Aaron