NexaAIDev/omnivision-968M


alanzhuly committed about 24 hours ago

Commit 48fbc9a • 1 Parent(s): 39535c4

Update README.md

Files changed (1)

README.md +7 -2

README.md CHANGED

@@ -87,8 +87,13 @@ We enhance the model's contextual understanding using image-based question-answe
 **Direct Preference Optimization (DPO):**
 The final stage implements DPO by first generating responses to images using the base model. A teacher model then produces minimally edited corrections while maintaining high semantic similarity with the original responses, focusing specifically on accuracy-critical elements. These original and corrected outputs form chosen-rejected pairs. The fine-tuning targeted at essential model output improvements without altering the model's core response characteristics
-## What's next?
-We are continually improving Omnivision for better on-device performance. Stay tuned.
+## What's next for Omnivision?
+Omnivision is in early development and we are working to address current limitations:
+- Expand DPO Training: Increase the scope of DPO (Direct Preference Optimization) training in an iterative process to continually improve model performance and response quality.
+- Develop an Action + Conversation Model: Leverage Omnivision’s vision and conversational capacities to build an action model capable of understanding and interacting with visual and text inputs.
+- Improve document and text understanding
+In the long term, we aim to develop Omnivision as a fully optimized, production-ready solution for edge AI multimodal applications.
 ### Follow us
 [Blogs](https://nexa.ai) | [Discord](https://discord.gg/nexa-ai) | [X(Twitter)](https://x.com/alanzhuly)
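
The DPO paragraph in the hunk above describes pairing base-model responses with teacher corrections. A minimal sketch of that data shape and of the standard DPO objective follows; it is illustrative only and not taken from the Omnivision training code. The names `PreferencePair`, `build_pair`, and `dpo_loss` are hypothetical, `beta=0.1` is an arbitrary placeholder, and treating the teacher-corrected response as "chosen" is an assumption the README implies but does not state outright.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # teacher-corrected response (assumed to be the preferred one)
    rejected: str  # original base-model response


def build_pair(prompt: str, base_response: str, corrected_response: str) -> Optional[PreferencePair]:
    """Build one chosen/rejected pair; skip cases where the teacher changed nothing."""
    if base_response.strip() == corrected_response.strip():
        return None  # identical responses carry no preference signal
    return PreferencePair(prompt=prompt, chosen=corrected_response, rejected=base_response)


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    Each argument is the summed log-probability of a full response under
    the policy being trained or under the frozen reference model.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    z = beta * margin
    # Numerically stable form of log(1 + exp(-z))
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy and the reference model assign identical log-probabilities, the margin is zero and the loss equals `log(2)`. The loss falls as the policy raises the chosen response's likelihood relative to the rejected one, measured against the reference model.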