It's in the same repo, uploaded with the tag "2024-07-23", which you can pass in as the revision when instantiating the model.
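A minimal sketch of what that looks like, assuming the repo in question is vikhyatk/moondream2 (referenced elsewhere in this feed) and the standard transformers `from_pretrained` loading pattern:

```python
MODEL_ID = "vikhyatk/moondream2"  # repo referenced elsewhere in this feed
REVISION = "2024-07-23"           # release tag from the post

def load_kwargs(model_id: str = MODEL_ID, revision: str = REVISION) -> dict:
    """Keyword arguments for AutoModelForCausalLM.from_pretrained."""
    return {
        "pretrained_model_name_or_path": model_id,
        "revision": revision,        # pin the tagged release
        "trust_remote_code": True,   # moondream2 ships custom model code
    }

# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(**load_kwargs())
```

Pinning `revision` keeps your code on a fixed release even as the repo's main branch moves.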
vikhyatk's activity
replied to their post 10 days ago
replied to their post about 1 month ago
Yup, currently working on getting a clean GGML implementation working here so I can then go and figure out what's going on in llama.cpp. https://github.com/vikhyat/moondream/tree/vik-ggml
posted an update about 1 month ago
Exciting news! We've just launched "Thundermoon" - the latest version of Moondream, our open-source vision language model!
Key improvements in this release:
1. Massive leap in OCR capabilities
2. Enhanced document understanding
3. Significant boosts across key metrics:
* DocVQA: 61.9 (↑103%)
* TextVQA: 60.2 (↑5.2%)
* GQA: 64.9 (↑2.9%)
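As a sanity check, the percentage gains above are consistent with the scores reported in the earlier updates further down this feed; for example, TextVQA went from 57.2 (the previous release) to 60.2:

```python
# Relative improvement between two benchmark scores, as quoted in the post.
def relative_gain(old: float, new: float) -> float:
    """Percent change from old to new, rounded to one decimal place."""
    return round((new - old) / old * 100, 1)

print(relative_gain(57.2, 60.2))  # TextVQA: matches the stated +5.2%
```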
What does this mean? Moondream can now tackle complex document analysis tasks with unprecedented accuracy for a model of its size. From deciphering handwritten notes to interpreting data tables, the applications are vast.
Check out the image for a glimpse of Moondream in action, effortlessly extracting insights from a 1944 sugar industry document!
Why it matters:
* Democratizing AI: As an open-source project, we're making advanced vision AI accessible to all developers.
* Efficiency: Proving that smaller models can deliver big results.
* Real-world impact: From historical document analysis to modern business intelligence, the potential use cases are exciting.
Curious? Try the live demo here: https://moondream.ai/playground
posted an update 3 months ago
Disappointed that Golden Gate Claude couldn't process images? Want to learn how to use activation vectors to steer VLMs?
Try out the vikhyatk/contemplative-moondream space, and check out the notebook I released showing how to obtain control vectors!
https://github.com/vikhyat/moondream/blob/main/notebooks/RepEng.ipynb
posted an update 3 months ago
Just released a new version of vikhyatk/moondream2 - now supporting higher resolution images (up to 756x756)!
TextVQA score (which measures the model's ability to read and reason about text in images) is up from 53.1 to 57.2 (+7.7%). Other visual question answering and counting benchmark results are up ~0.5%.
posted an update 3 months ago
Cool new dataset from @isidentical - isidentical/moondream2-coyo-5M-captions
The VeCLIP paper showed a +3% gain while only using 14% of the data by synthetically captioning like this. You get diversity from the alt text (middle column) without having to deal with all of the noise.
posted an update 4 months ago
Updated the vikhyatk/lnqa dataset to include images, so you no longer need to download them separately from OpenImages!
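A minimal loading sketch, assuming the standard `datasets` library pattern (streaming is an assumption here, to avoid pulling the full image data up front):

```python
DATASET_ID = "vikhyatk/lnqa"

def load_args(streaming: bool = True) -> dict:
    """Arguments for datasets.load_dataset now that images are bundled."""
    return {"path": DATASET_ID, "split": "train", "streaming": streaming}

# from datasets import load_dataset
# ds = load_dataset(**load_args())
# row = next(iter(ds))  # each row now carries the image alongside its QA pairs
```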
posted an update 5 months ago
Released a new version of vikhyatk/moondream2 today! Primarily focused on improving OCR and captioning (e.g. "Describe this image", "Describe this image in one sentence"), but also seeing general improvement across all benchmarks.
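For reference, a sketch of issuing those captioning-style queries against a loaded model. The `encode_image`/`answer_question` calls follow the usage shown in the moondream repo's README; treat the exact method names as an assumption rather than a guaranteed API:

```python
from typing import Any

CAPTION_PROMPTS = [
    "Describe this image.",
    "Describe this image in one sentence.",
]

def caption(model: Any, tokenizer: Any, image: Any,
            prompt: str = CAPTION_PROMPTS[0]) -> str:
    """Run a captioning-style query against a loaded moondream2 model."""
    enc = model.encode_image(image)  # embed the image once, reuse for queries
    return model.answer_question(enc, prompt, tokenizer)
```

Encoding the image once and reusing the embedding lets you ask several questions about the same image cheaply.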
posted an update 5 months ago
Just released a notebook showing how to finetune moondream: https://github.com/vikhyat/moondream/blob/main/notebooks/Finetuning.ipynb
replied to their post 5 months ago
Definitely, I'm planning to set up a blog some time soon.
posted an update 5 months ago
New moondream update out with significantly improved OCR performance (among other benchmarks)!
vikhyatk/moondream2
posted an update 6 months ago
Released updated weights for moondream2 today, with significantly improved benchmark scores!
vikhyatk/moondream2
posted an update 6 months ago
Just released moondream2 - a small 1.8B parameter vision language model. Now fully open source (Apache 2.0) so you can use it without restrictions on commercial use!
vikhyatk/moondream2