Gemma-3-4B : Image and Video Inference 🖼️🎥
🧤Space: prithivMLmods/Gemma-3-Multimodal
@gemma3-4b : {tag + space + 'prompt'}
@video-infer : {tag + space + 'prompt'}
+ Gemma3-4B : google/gemma-3-4b-it
+ By default, it runs : prithivMLmods/Qwen2-VL-OCR-2B-Instruct
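
For illustration only, here is a minimal sketch of how the tag prefixes above could be routed to a model. It is not the Space's actual code: `route_prompt`, `TAG_TO_MODEL`, and the "default" mode label are hypothetical names, and mapping `@video-infer` to the Gemma 3 checkpoint is an assumption based on the post title.

```python
# Hypothetical tag routing; the real Space may implement this differently.
DEFAULT_MODEL = "prithivMLmods/Qwen2-VL-OCR-2B-Instruct"

# Assumption: both tags are served by the Gemma 3 4B instruction-tuned model.
TAG_TO_MODEL = {
    "@gemma3-4b": "google/gemma-3-4b-it",
    "@video-infer": "google/gemma-3-4b-it",
}


def route_prompt(message):
    """Split a chat message into (mode, model_id, prompt).

    A message that starts with a known tag followed by a space is routed
    to that tag's model; anything else falls back to the default model.
    """
    text = message.strip()
    head, _, rest = text.partition(" ")
    tag = head.lower()
    if tag in TAG_TO_MODEL:
        return tag, TAG_TO_MODEL[tag], rest.strip()
    return "default", DEFAULT_MODEL, text


if __name__ == "__main__":
    print(route_prompt("@gemma3-4b Describe this image"))
    print(route_prompt("@video-infer Summarize the clip"))
    print(route_prompt("Extract the handwritten text"))
```

An untagged message keeps its full text as the prompt and goes to the default OCR model, which matches the "by default, it runs" behavior described above.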
Gemma 3 Technical Report : https://storage.googleapis.com/deepmind-media/gemma/Gemma3Report.pdf
Additionally, I tested Aya-Vision 8B against the custom Qwen2-VL-OCR on messy-handwriting samples, as an experiment toward optimizing edge-device VLMs for Optical Character Recognition.
📜Read the blog here: https://huggingface.co/blog/prithivMLmods/aya-vision-vs-qwen2vl-ocr-2b