vikhyatk (Vik Korrapati)

posted an update 3 months ago

Post

3347

🚨 New VQA + captioning dataset! moondream/megalith-mdqa

Images from Megalith, captioned using Moondream, then transformed to short-form QA.

9M+ images, 6-10 QA pairs per image.

reacted to Xenova's post with ❤️🔥 5 months ago

Post

5850

Have you tried out 🤗 Transformers.js v3? Here are the new features:
⚡ WebGPU support (up to 100x faster than WASM)
🔢 New quantization formats (dtypes)
🏛 120 supported architectures in total
📂 25 new example projects and templates
🤖 Over 1200 pre-converted models
🌐 Node.js (ESM + CJS), Deno, and Bun compatibility
🏡 A new home on GitHub and NPM

Get started with npm i @huggingface/transformers.

Learn more in our blog post: https://huggingface.co/blog/transformersjs-v3

3 replies

·

posted an update 6 months ago

Post

4870

Just released a dataset with 7000+ hours of synthetically generated lo-fi music. vikhyatk/lofi

posted an update 8 months ago

Post

6668

Pushed a new update to vikhyatk/moondream2 today. TextVQA up from 60.2 to 65.2, DocVQA up from 61.9 to 70.5.

Space has been updated to the new model if you want to try it out! vikhyatk/moondream2

reacted to Csplk's post with 🔥 8 months ago

Post

2653

# Offensive Security Reconnaissance Continued with Public Facing Industrial Control System HMIs using Moondream

Building on my previous experiments with Moondream for physical security reconnaissance planning automation (https://huggingface.co/posts/Csplk/926337297827024), I've now turned my attention to exploring the potential of this powerful image-text-text model for offensive security reconnaissance in the realm of Industrial Control Systems (ICS).
ICS HMIs (Human-Machine Interfaces) are increasingly exposed to the public internet, often without adequate security measures in place. This presents a tantalizing opportunity for malicious actors to exploit vulnerabilities and gain unauthorized access to critical infrastructure.

Using Moondream with batch processing ( Csplk/moondream2-batch-processing), I've been experimenting with analyzing public facing ICS ( Csplk/ICS_UIs) HMI ( Csplk/HMI) screenshots from shodan to identify types of exposed ICS system HMIs, how they are operated and how malicious actors with access to these systems could cause damage to physical infrastructure. Feeding images of HMIs and pre-defined text prompts to Moondream batch processing successfully (unconfirmed accuracy levels) extracted information about the underlying systems, including

1. **System type**
2. **Possible Operation Details**
3. **Malicious Actor Outcomes**

Next steps:
* I have a longer and more in depth blog write up in the works that will cover the previous and this post's approaches for experiments for sharing via HF community blog posts soon.
* I plan to continue refining my Moondream-based tool to improve its accuracy and effectiveness in processing public facing ICS HMIs.
* As mentioned before, offensive security with moondream focused HF Space once its fleshed out.

Thanks again to @vikhyatk for the incredible Moondream model. vikhyatk/moondream2

replied to their post 9 months ago

It's in the same repo, uploaded with the tag "2024-07-23" you can pass in as revision when instantiating the model.

replied to their post 9 months ago

Yup, currently working on getting a clean GGML implementation working here so I can then go and figure out what's going on in llama.cpp. https://github.com/vikhyat/moondream/tree/vik-ggml

posted an update 9 months ago

Post

3317

🚀 Exciting news! We've just launched "Thundermoon" - the latest version of Moondream, our open-source vision language model! 🌙

Key improvements in this release:
1. Massive leap in OCR capabilities
2. Enhanced document understanding
3. Significant boosts across key metrics:
* DocVQA: 61.9 (↑103%)
* TextVQA: 60.2 (↑5.2%)
* GQA: 64.9 (↑2.9%)

What does this mean? Moondream can now tackle complex document analysis tasks with unprecedented accuracy for a model of its size. From deciphering handwritten notes to interpreting data tables, the applications are vast.

Check out the image for a glimpse of Moondream in action, effortlessly extracting insights from a 1944 sugar industry document!

Why it matters:
* Democratizing AI: As an open-source project, we're making advanced vision AI accessible to all developers.
* Efficiency: Proving that smaller models can deliver big results.
* Real-world impact: From historical document analysis to modern business intelligence, the potential use cases are exciting.

Curious to try it out? Try out the live demo here! https://moondream.ai/playground

4 replies

·

reacted to Csplk's post with 🤯 9 months ago

Post

1436

# Offensive Physical Security Reconnaissance Planning Automation with public facing RTSP streams and Moondream

After some late night casual hacking about on VLMs for criminal attack vector reconnaissance automaton experiments using Moondream (as usual) based image-text-text with pre defined text prompts that are tuned for extracting weakness or customer identity and monitory based theft physical red team engagement reconnaissance and vector of malicious or criminal activity Working on a space. Thanks again for such a wonderful blessing of super power image-text-to-text model with minimal computational power needed @vikhyatk

I have started actually implementing a custom little tool with both static html space sand python gradio spaces on the go which I shall share as hf spaces when done them.

---

vikhyatk/moondream2

vikhyatk/moondream2

1 reply

·

posted an update 11 months ago

Post

3750

Disappointed that Golden Gate Claude couldn't process images? Want to learn how to use activation vectors to steer VLMs?

Try out the vikhyatk/contemplative-moondream space, and check out the notebook I released showing how to obtain control vectors! ⬇️

https://github.com/vikhyat/moondream/blob/main/notebooks/RepEng.ipynb

posted an update 11 months ago

Post

3104

Just released a new version of vikhyatk/moondream2 - now supporting higher resolution images (up to 756x756)!

TextVQA score (which measures the model's ability to read and reason about text in images) is up from 53.1 to 57.2 (+7.7%). Other visual question answering and counting benchmark results are up ~0.5%.

posted an update 12 months ago

Post

1788

Cool new dataset from @isidentical - isidentical/moondream2-coyo-5M-captions

The VeCLIP paper showed a +3% gain while only using 14% of the data by synthetically captioning like this. You get diversity from the alt text (middle column) without having to deal with all of the noise.

1 reply

·

posted an update about 1 year ago

Post

3094

Updated the vikhyatk/lnqa dataset to include images, so you no longer need to separately download them from OpenImages!

reacted to radames's post with ❤️🔥 about 1 year ago

Post

2782

Following up on @vikhyatk 's Moondream2 update and @santiagomed 's implementation on Candle, I quickly put togheter the WASM module so that you could try running the ~1.5GB quantized model in the browser. Perhaps the next step is to rewrite it using https://github.com/huggingface/ratchet and run it even faster with WebGPU, @FL33TW00D-HF .

radames/Candle-Moondream-2

ps: I have a collection of all Candle WASM demos here radames/candle-wasm-examples-650898dee13ff96230ce3e1f

posted an update about 1 year ago

Post

3353

Released a new version of vikhyatk/moondream2 today! Primarily focused on improving OCR and captioning (e.g. "Describe this image", "Describe this image in one sentence"), but also seeing general improvement across all benchmarks.

1 reply

·

posted an update about 1 year ago

Post

3793

Just released a notebook showing how to finetune moondream: https://github.com/vikhyat/moondream/blob/main/notebooks/Finetuning.ipynb

posted an update about 1 year ago

Post

2267

Just released a dataset with 1.5M image question/answers! vikhyatk/lnqa

replied to their post about 1 year ago

Definitely, I'm planning to set up a blog some time soon.

Vik Korrapati PRO

AI & ML interests

Recent Activity

Organizations

vikhyatk's activity