erikkaum/moonline (Image-to-Text)

Commit c457f56 by erikkaum (HF staff), May 7, 2024. Parent: c378480. Message: "Update README.md". Files changed: README.md (+79, -3).

---
license: apache-2.0
---
# Moonline
Moonline is a fork of [moondream2](https://huggingface.co/vikhyatk/moondream2). It combines moondream2's image-to-text generation with a modified version of
[outlines](https://github.com/outlines-dev/outlines), so that the generated text follows the structure of a given Pydantic model.
## Model Details
The weights and the model structure come directly from moondream2. The difference is that the Phi text model is replaced with a Phi model that
generates text according to a given structure. Because the outlines API doesn't operate directly on embeddings, only the relevant parts of outlines
were copied and modified.
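The idea behind structure-following generation can be illustrated with a toy, stdlib-only sketch of FSM-constrained greedy decoding. This is not Moonline's or outlines' code: `score_fn` is a hypothetical stand-in for the Phi model's next-token scores, the vocabulary and `EOS` token are invented, and each FSM state is simply the text generated so far (outlines instead compiles a regular expression into an FSM with integer states). At every step, tokens that would lead outside the set of valid outputs are masked to negative infinity before the best token is picked.

```python
import math
from typing import Callable, Dict, List, Sequence

EOS = "<eos>"  # hypothetical end-of-sequence token


def build_token_fsm(allowed_outputs: Sequence[str], vocab: Sequence[str]) -> Dict[str, List[int]]:
    """Map each state (a prefix of some allowed output) to the token ids that may follow it."""
    prefixes = {out[:i] for out in allowed_outputs for i in range(len(out) + 1)}
    finals = set(allowed_outputs)
    fsm: Dict[str, List[int]] = {}
    for state in prefixes:
        next_ids = []
        for tok_id, tok in enumerate(vocab):
            if tok == EOS:
                # EOS is only legal once a complete allowed output has been produced.
                if state in finals:
                    next_ids.append(tok_id)
            elif (state + tok) in prefixes:
                next_ids.append(tok_id)
        fsm[state] = next_ids
    return fsm


def constrained_greedy_decode(
    score_fn: Callable[[str], List[float]],
    fsm: Dict[str, List[int]],
    vocab: Sequence[str],
    max_steps: int = 32,
) -> str:
    """Greedy decoding where disallowed tokens are masked to -inf at every step."""
    state = ""
    for _ in range(max_steps):
        allowed = set(fsm[state])
        if not allowed:
            break  # dead end: no token keeps the output valid
        scores = score_fn(state)
        masked = [s if i in allowed else -math.inf for i, s in enumerate(scores)]
        best = max(range(len(masked)), key=masked.__getitem__)
        if vocab[best] == EOS:
            break
        state += vocab[best]
    return state


# Toy usage: restrict the output to the JSON strings of the Mood enum values.
vocab = [EOS, '"', "happy", "sad", "ang", "ry", "neutral", "lol", "h", "s"]
moods = ['"sad"', '"happy"', '"angry"', '"neutral"']
fsm = build_token_fsm(moods, vocab)

# The fake "model" ignores its input and always prefers "lol", which is never allowed.
preferences = {"lol": 5.0, "sad": 3.0, '"': 1.0, EOS: 0.5}


def toy_scores(state: str) -> List[float]:
    return [preferences.get(tok, 0.0) for tok in vocab]


print(constrained_greedy_decode(toy_scores, fsm, vocab))  # prints "sad"
```

Even though the unconstrained favourite token is `lol`, masking forces the decoder to emit one of the four valid mood strings.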
### How to use
The best way to start is to clone the repo and run `example.py`.
Make sure to set up a virtual environment and install the dependencies from `requirements.txt`.
`example.py` walks through a simple example: generating a description and a mood for the farm image (`example.png`).
```python
from enum import Enum

from PIL import Image
from pydantic import BaseModel
from transformers import AutoTokenizer

from moonline import Moonline


def main():
    # Target structure: a free-text description plus a mood from a fixed set.
    class Mood(Enum):
        sad = "sad"
        happy = "happy"
        angry = "angry"
        neutral = "neutral"

    class ExampleModel(BaseModel):
        description: str
        mood: Mood

    # The prompt also describes the expected fields to the model.
    prompt = f"""
    Your job is to describe the image.
    Please answer in json with the following format: {ExampleModel.__annotations__}
    """
    image_path = "example.png"

    # Moonline reuses moondream2's weights, so the tokenizer and the model
    # are loaded from the same model id and pinned revision.
    model_id = "vikhyatk/moondream2"
    revision = "2024-04-02"
    tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
    moonline = Moonline.from_pretrained(model_id, revision=revision)
    moonline.eval()

    # Encode the image, build the FSM from the Pydantic model, then generate.
    image = Image.open(image_path)
    image_embeds = moonline.encode_image(image)
    fsm = moonline.generate_fsm(ExampleModel, tokenizer)
    answer = moonline.answer_question(image_embeds, prompt, tokenizer, fsm)
    print(f"answer: {answer}")


if __name__ == "__main__":
    main()
```
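Because `answer` is plain text, it can be worth validating it back into the Pydantic model before using it. A minimal sketch, assuming that `answer` is the raw JSON string returned by `answer_question` and that Pydantic v2 is installed (`model_validate_json`); `parse_answer` is a hypothetical helper, not part of Moonline, and the sample reply below is invented for illustration. The classes are copies of the ones defined inside `main()` above.

```python
from enum import Enum
from typing import Optional

from pydantic import BaseModel, ValidationError


# Copies of the classes defined inside main() in the example above.
class Mood(Enum):
    sad = "sad"
    happy = "happy"
    angry = "angry"
    neutral = "neutral"


class ExampleModel(BaseModel):
    description: str
    mood: Mood


def parse_answer(answer: str) -> Optional[ExampleModel]:
    """Validate generated JSON text; return None if it is malformed or off-schema."""
    try:
        return ExampleModel.model_validate_json(answer)
    except ValidationError:
        return None


parsed = parse_answer('{"description": "A red barn next to a field.", "mood": "happy"}')
if parsed is not None:
    print(parsed.mood)  # Mood.happy
```

Returning `None` instead of raising keeps the calling code simple when deciding whether to retry generation.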
### Limitations
The model hallucinates, especially when the requested structure includes a field that doesn't exist in the image.
This can be alleviated by offering `None` options or guidance in the prompt, but in my experience this doesn't fully solve the issue.
Moondream is also not specifically trained on JSON output. I expect results would improve with fine-tuning on JSON descriptions of
images, especially examples in which some fields are missing.
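One way to offer a `None` option is to make a field optional in the Pydantic model, so that `null` becomes a valid value in its JSON schema. A sketch assuming Pydantic v2; `FarmAnswer` and its `animal` field are hypothetical, and whether the modified outlines code in this repo supports optional (`anyOf` with `null`) properties has not been verified here.

```python
from typing import Optional

from pydantic import BaseModel


class FarmAnswer(BaseModel):
    description: str
    # Hypothetical field that may not be visible in every image. Allowing null
    # gives the model a valid way to answer "not present" instead of inventing a value.
    animal: Optional[str] = None


schema = FarmAnswer.model_json_schema()
# `animal` accepts either a string or null, and only `description` is required.
print(schema["properties"]["animal"]["anyOf"])
print(schema["required"])

reply = FarmAnswer.model_validate_json('{"description": "An empty field at dusk.", "animal": null}')
print(reply.animal)  # None
```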