vikhyatk/moondream2



	---
	license: apache-2.0
	---

	moondream2 is a small vision language model designed to run efficiently on edge devices. Check out the [GitHub repository](https://github.com/vikhyat/moondream) for details.

	Benchmarks

	| Release | VQAv2 | GQA | TextVQA | POPE | TallyQA |
	| --- | --- | --- | --- | --- | --- |
	| 2024-03-04 (latest) | 74.2 | 58.5 | 36.4 | (coming soon) | (coming soon) |

	Usage

	```bash
	pip install transformers timm einops
	```

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	from PIL import Image

	# Load the model and tokenizer, both pinned to the same release tag.
	model_id = "vikhyatk/moondream2"
	model = AutoModelForCausalLM.from_pretrained(
	    model_id, trust_remote_code=True, revision="2024-03-04"
	)
	tokenizer = AutoTokenizer.from_pretrained(model_id, revision="2024-03-04")

	# Encode the image once, then ask a question about the encoding.
	image = Image.open('<IMAGE_PATH>')
	enc_image = model.encode_image(image)
	print(model.answer_question(enc_image, "Describe this image.", tokenizer))
	```
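
Because `encode_image` and `answer_question` are separate calls, one encoding can plausibly be reused for several questions about the same image. The sketch below illustrates that pattern. `ask_questions` is a hypothetical helper, not part of moondream2, and it assumes only the two method signatures shown in the usage example.

```python
from typing import Any, Dict, Iterable


def ask_questions(
    model: Any, tokenizer: Any, image: Any, questions: Iterable[str]
) -> Dict[str, str]:
    """Hypothetical helper: encode `image` once and answer every question.

    `model` is assumed to expose `encode_image(image)` and
    `answer_question(enc_image, question, tokenizer)`, as in the usage
    example; nothing else about the model is assumed.
    """
    # Encode a single time so each question reuses the same encoding.
    enc_image = model.encode_image(image)
    # Map each question to the model's answer for it.
    return {q: model.answer_question(enc_image, q, tokenizer) for q in questions}
```

With the `model`, `tokenizer`, and `image` objects from the usage example, a call such as `ask_questions(model, tokenizer, image, ["Describe this image.", "What colors stand out?"])` would return a dictionary keyed by question.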

	The model is updated regularly, so we recommend pinning the model version to a
	specific release as shown above.
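
One way to keep the pin consistent is to store the release tag in a single constant and pass it to both `from_pretrained` calls. The following is a minimal sketch, not an official loader: `check_pinned` and `load_pinned` are hypothetical names, and the check assumes release tags are dates in `YYYY-MM-DD` form, as in the benchmarks table.

```python
from datetime import date

MODEL_ID = "vikhyatk/moondream2"
REVISION = "2024-03-04"  # release tag listed in the benchmarks table


def check_pinned(revision):
    """Return `revision` if it looks like a dated release tag, else raise.

    Assumption: release tags are ISO dates (YYYY-MM-DD).
    """
    if not revision:
        raise ValueError("no revision given; the model version is not pinned")
    try:
        date.fromisoformat(revision)
    except ValueError:
        raise ValueError(f"{revision!r} is not a dated release tag") from None
    return revision


def load_pinned(model_id=MODEL_ID, revision=REVISION):
    """Load the model and tokenizer at the same pinned revision."""
    # Imported lazily so the check above is usable without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    rev = check_pinned(revision)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, trust_remote_code=True, revision=rev
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id, revision=rev)
    return model, tokenizer
```

Upgrading to a newer release then means changing `REVISION` in one place, so the model and tokenizer cannot drift apart.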