erikkaum (HF staff) committed c457f56 (verified); parent: c378480

Update README.md

README.md: +79 -3
---
license: apache-2.0
---

# Moonline

Moonline is a fork of [moondream2](https://huggingface.co/vikhyatk/moondream2). It combines moondream2's image-to-text generation with a modified version of
[outlines](https://github.com/outlines-dev/outlines) so that the generated text conforms to a given Pydantic model.

## Model Details

The weights and the model structure come directly from moondream2. The difference is that the Phi text model is replaced with a Phi model that
generates text according to a given structure. Since the outlines API doesn't work directly on embeddings, only the relevant parts of outlines
were copied and modified.
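The general idea behind outlines-style structured generation can be sketched in a few lines: a finite-state machine (FSM) tracks which tokens may come next, and at every decoding step the logits of all other tokens are masked out before the next token is chosen. This is a toy sketch under stated assumptions, not Moonline's or outlines' actual code: the vocabulary, the hand-written FSM, and the random `fake_logits` stand-in for the Phi model are all hypothetical. In outlines, the FSM is instead derived from the Pydantic model's JSON schema and indexed against the real tokenizer vocabulary.

```python
import math
import random

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of tokens.
VOCAB = ['{"mood": ', '"happy"', '"sad"', "}", "hello", "42"]

# Hypothetical hand-written FSM: state -> {allowed token id: next state}.
# It only accepts the strings {"mood": "happy"} and {"mood": "sad"}.
FSM = {
    0: {0: 1},        # must open the object with the key
    1: {1: 2, 2: 2},  # the value must be one of the enum strings
    2: {3: 3},        # then close the object
}
FINAL_STATES = {3}


def fake_logits(rng: random.Random) -> list[float]:
    """Stand-in for the language model: one random score per vocab token."""
    return [rng.uniform(-5.0, 5.0) for _ in VOCAB]


def constrained_greedy_decode(seed: int = 0) -> str:
    rng = random.Random(seed)
    state, out = 0, []
    while state not in FINAL_STATES:
        logits = fake_logits(rng)
        allowed = FSM[state]
        # Mask every token the FSM does not allow in the current state.
        masked = [
            score if tok in allowed else -math.inf
            for tok, score in enumerate(logits)
        ]
        # Greedy choice among the tokens that survived the mask.
        next_tok = max(range(len(VOCAB)), key=lambda i: masked[i])
        out.append(VOCAB[next_tok])
        state = allowed[next_tok]
    return "".join(out)


print(constrained_greedy_decode())
```

Whatever scores the model produces, the output is always one of the strings the FSM accepts, which is why the structure is guaranteed even though the content (here, the mood) is still chosen by the model.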
### How to use

The best way to start is to clone the repo and run `example.py`.
Make sure to set up a virtual environment and install the dependencies from `requirements.txt`.

`example.py` walks through a simple example that generates a description and a mood for the farm image.
```python
from enum import Enum

from PIL import Image
from pydantic import BaseModel
from transformers import AutoTokenizer

from moonline import Moonline


def main():
    class Mood(Enum):
        sad = "sad"
        happy = "happy"
        angry = "angry"
        neutral = "neutral"

    # The structure the generated answer must follow.
    class ExampleModel(BaseModel):
        description: str
        mood: Mood

    # The prompt tells the model which fields to fill; the FSM below enforces them.
    prompt = f"""
    Your job is to describe the image.
    Please answer in json with the following format: {ExampleModel.__annotations__}
    """

    image_path = "example.png"

    model_id = "vikhyatk/moondream2"
    revision = "2024-04-02"
    tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
    moonline = Moonline.from_pretrained(
        model_id,
        revision=revision,
    )
    moonline.eval()

    image = Image.open(image_path)
    image_embeds = moonline.encode_image(image)
    # Build the finite-state machine that constrains generation to ExampleModel.
    fsm = moonline.generate_fsm(ExampleModel, tokenizer)

    answer = moonline.answer_question(image_embeds, prompt, tokenizer, fsm)
    print(f"answer: {answer}")


if __name__ == "__main__":
    main()
```
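Because generation is constrained to `ExampleModel`, the returned `answer` should be parseable JSON. A minimal, dependency-free sketch of turning it back into typed values, assuming `answer_question` returns the generated JSON as a string; the `Parsed` dataclass, `parse_answer` helper, and the sample string are hypothetical and are not real model output.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Mood(Enum):
    sad = "sad"
    happy = "happy"
    angry = "angry"
    neutral = "neutral"


@dataclass
class Parsed:
    description: str
    mood: Mood


def parse_answer(answer: str) -> Parsed:
    """Parse the constrained JSON answer; raises if a field is missing or invalid."""
    data = json.loads(answer)
    if not isinstance(data.get("description"), str):
        raise ValueError("description must be a string")
    # Mood(...) raises ValueError for values outside the enum.
    return Parsed(description=data["description"], mood=Mood(data["mood"]))


# Hypothetical answer string, for illustration only.
sample = '{"description": "A red barn next to a field of cows.", "mood": "happy"}'
print(parse_answer(sample))
```

The dataclass only keeps the sketch free of extra dependencies; in the example above, `ExampleModel.model_validate_json(answer)` (Pydantic v2) or `ExampleModel.parse_raw(answer)` (Pydantic v1) would perform the same checks.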
### Limitations

The model hallucinates, especially when the schema includes a field whose content doesn't appear in the image.
This can be mitigated by offering `None` options or adding guidance to the prompt, but in my experience this doesn't fully solve the issue.
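One way to apply the `None`-option idea is to give a field an explicit escape value and mention it in the prompt, so the constrained decoder is never forced to invent content. A hypothetical sketch using a stdlib `Enum`: the `Weather` field, the `not_visible` member, and the `build_prompt` helper are made up for illustration. With Pydantic, an `Optional[...] = None` field plays a similar role by allowing `null` in the generated JSON.

```python
from enum import Enum


class Weather(Enum):
    sunny = "sunny"
    cloudy = "cloudy"
    rainy = "rainy"
    # Escape hatch: lets the model say the field is absent instead of guessing.
    not_visible = "not_visible"


def build_prompt(field: str, options: type[Enum], escape: Enum) -> str:
    """Build prompt guidance that lists the allowed values and the escape value."""
    values = ", ".join(f'"{m.value}"' for m in options)
    return (
        "Your job is to describe the image.\n"
        f'For "{field}", answer with one of: {values}.\n'
        f'If it cannot be determined from the image, answer "{escape.value}".'
    )


print(build_prompt("weather", Weather, Weather.not_visible))
```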

Moondream is also not specifically trained on JSON output. I expect results would improve with fine-tuning on JSON descriptions of
images, especially descriptions in which some fields are missing.
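To make the fine-tuning idea concrete, a training example could pair an image with a JSON target in which fields that are not visible are explicitly `null`. The record layout below (`image`, `prompt`, `target` keys in JSON Lines rows) and the file names are hypothetical, not a format used by moondream2 or Moonline.

```python
import json


def make_record(image_path: str, prompt: str, fields: dict) -> str:
    """Serialize one hypothetical fine-tuning example as a JSON Lines row.

    Fields whose value is None are kept and written as JSON null, so the
    target teaches the model to emit null for missing content.
    """
    target = json.dumps(fields, sort_keys=True)
    return json.dumps({"image": image_path, "prompt": prompt, "target": target})


PROMPT = "Describe the image as JSON."
rows = [
    make_record("barn.png", PROMPT, {"description": "A red barn.", "weather": "sunny"}),
    # Hypothetical indoor photo: the weather cannot be seen, so it is null.
    make_record("kitchen.png", PROMPT, {"description": "A kitchen.", "weather": None}),
]
print("\n".join(rows))
```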