GPT-4o-omni-text-audio-image-video

Running

File size: 1,572 Bytes

f9040a4
2878c48
 
f9040a4
 
 
 
 
0b0a63a
f9040a4
 
 
2de4f09
 
 
f9040a4
cea7f26

---
title: 🧠GPT 4o Omni Text Audio Image Video
emoji: 🐠🔬🧠
colorFrom: gray
colorTo: blue
sdk: streamlit
sdk_version: 1.34.0
app_file: app.py
pinned: true
license: mit
---


GPT-4o Documentation:  https://cookbook.openai.com/examples/gpt4o/introduction_to_gpt4o

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

This experimental multi agent mixture of expert system uses a variety of techniques and models to create different combinatorial AI solutions.

Models Used:

Mistral-7B-Instruct
Llama2-7B
Mixtral-8x7B-Instruct
Google Gemma-7B
OpenAI Whisper Small En
OpenAI GPT-4o, Whisper-1
ArXiV Embeddings
The techniques below which are not ML models but AI include:

Speech Synthesis using browser technology
Memory for semantic facts, and episodic emotional and event time series memories
Web integration using the q= standard for search linking allowing comparison of tech giant AI implementations:
Bing then Bing copilot with click 2
Google which does an AI search now
Twitter, the new home for technology discoveries, AI Output and Grok
Wikipedia for fact checking
YouTube
File and metadata integration combining text, audio, image, and video
This app also merges common theories in cognitive AI, AI with python libraries (e.g. NLTK, SKLearn).

The intent is to demonstrate SOTA AI/ML and combinations of Function-Input-Output for interoperability and knowledge management.

This space also serves as an experimental test bed for new technologies mixing it in with old for comparison and integration.

--Aaron