prithivMLmods committed on
Commit
a7a3bcd
·
verified ·
1 Parent(s): 40d6e76

Update README.md

Files changed (1)
  1. README.md +105 -0
README.md CHANGED
@@ -13,3 +13,108 @@ tags:
  ---
 
  ![10.png](https://cdn-uploads.huggingface.co/production/uploads/65bb837dbfb878f46c77de4c/tCeb4MVz79GCiOmMxkLcN.png)
+ Updated documentation and usage example for **LabelPRO**, an image caption and annotation generator that produces detailed annotations in structured JSON for a given image.
+
+ ---
+
+ # **LabelPRO**
+
+ **LabelPRO** is an image caption and annotation generator optimized for detailed, structured JSON output. Built on the Qwen2-VL vision-language architecture (the usage example below loads it with `Qwen2VLForConditionalGeneration`), with enhanced OCR and multilingual support, it extracts captions and annotations from images for integration into your applications.
+
+ #### Key Enhancements:
+
+ * **Advanced Image Understanding**: Fine-tuned on millions of annotated images, LabelPRO delivers precise comprehension and interpretation of visual content.
+ * **Optimized for JSON Output**: Produces structured JSON data containing captions and detailed annotations, suited to integration with databases, APIs, and automation pipelines.
+ * **Enhanced OCR Capabilities**: Accurately extracts textual content from images in multiple languages, including English, Chinese, Japanese, Korean, Arabic, and more.
+ * **Multimodal Processing**: Handles both image and text inputs, generating comprehensive annotations based on the provided image.
+ * **Multilingual Support**: Recognizes and processes text within images across various languages.
+ * **Secure and Optimized Model Weights**: Ships weights as safetensors, which load quickly and do not execute arbitrary code on load.
+
+ ---
+
+ ### How to Use
+
+ ```python
+ from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
+ from qwen_vl_utils import process_vision_info
+
+ # Load the LabelPRO model; dtype and device placement are chosen automatically
+ model = Qwen2VLForConditionalGeneration.from_pretrained(
+     "prithivMLmods/LabelPRO", torch_dtype="auto", device_map="auto"
+ )
+
+ # Recommended acceleration (requires `import torch` and the flash-attn package):
+ # model = Qwen2VLForConditionalGeneration.from_pretrained(
+ #     "prithivMLmods/LabelPRO",
+ #     torch_dtype=torch.bfloat16,
+ #     attn_implementation="flash_attention_2",
+ #     device_map="auto",
+ # )
+
+ # Load the default processor for LabelPRO
+ processor = AutoProcessor.from_pretrained("prithivMLmods/LabelPRO")
+
+ # Define the input messages with both an image and a text prompt
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {
+                 "type": "image",
+                 "image": "https://flux-generated.com/sample_image.jpeg",
+             },
+             {"type": "text", "text": "Provide detailed captions and annotations for this image in JSON format."},
+         ],
+     }
+ ]
+
+ # Prepare the input for inference
+ text = processor.apply_chat_template(
+     messages, tokenize=False, add_generation_prompt=True
+ )
+ image_inputs, video_inputs = process_vision_info(messages)
+ inputs = processor(
+     text=[text],
+     images=image_inputs,
+     videos=video_inputs,
+     padding=True,
+     return_tensors="pt",
+ )
+ # Move the inputs to the device the model was loaded on
+ inputs = inputs.to(model.device)
+
+ # Generate the output, then drop the prompt tokens from each sequence
+ generated_ids = model.generate(**inputs, max_new_tokens=256)
+ generated_ids_trimmed = [
+     out_ids[len(in_ids) :] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
+ ]
+ output_text = processor.batch_decode(
+     generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
+ )
+ print(output_text)
+ ```
+
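The usage example prints the decoded reply as a list of strings, and the README does not document the exact output schema. The following is a minimal sketch of turning such a reply into a Python object, assuming the reply may be bare JSON, JSON wrapped in a Markdown code fence, or JSON surrounded by extra prose. `extract_json` is a hypothetical helper (not part of LabelPRO or `transformers`), and the sample reply and its keys are made up for illustration.

```python
import json
import re

FENCE = "`" * 3  # Markdown code-fence marker, built here to avoid a literal fence


def extract_json(text):
    """Return the first JSON object or array found in `text`, or None.

    Hypothetical helper: assumes the reply may be bare JSON, JSON inside a
    fenced block, or JSON surrounded by extra prose.
    """
    # Prefer the contents of a fenced block if one is present.
    fenced = re.search(FENCE + r"(?:json)?\s*(.*?)" + FENCE, text, re.DOTALL)
    candidate = fenced.group(1) if fenced else text

    decoder = json.JSONDecoder()
    # Try decoding from each position that could start an object or array.
    for match in re.finditer(r"[{\[]", candidate):
        try:
            value, _ = decoder.raw_decode(candidate, match.start())
            return value
        except json.JSONDecodeError:
            continue
    return None


# Made-up sample reply, standing in for one element of `output_text`.
sample_reply = (
    "Here is the result:\n"
    + FENCE
    + "json\n"
    + '{"caption": "A red bicycle", "objects": ["bicycle", "wall"]}'
    + "\n"
    + FENCE
)
parsed = extract_json(sample_reply)
print(parsed["caption"])
```

If `extract_json` returns `None`, the reply may have been cut off by the `max_new_tokens` limit, in which case raising that limit is one thing to try.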
+ ---
+
+ ### **Key Features**
+
+ 1. **Annotation-Ready Training Data**
+    - Trained using a diverse dataset of annotated images to ensure high-quality structured output.
+
+ 2. **Optical Character Recognition (OCR)**
+    - Robustly extracts and processes text from images in various languages and scripts.
+
+ 3. **Structured JSON Output**
+    - Generates detailed captions and annotations in standardized JSON format for easy downstream integration.
+
+ 4. **Image & Text Processing**
+    - Capable of handling both visual and textual inputs, delivering comprehensive and context-aware annotations.
+
+ 5. **Conversational Annotation Generation**
+    - Supports multi-turn interactions, enabling detailed and iterative refinement of annotations.
+
+ 6. **Secure and Efficient Model Weights**
+    - Uses the safetensors format, which avoids executing arbitrary code when weights are loaded and supports fast loading.
+
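Item 5 mentions multi-turn refinement, while the usage example shows only a single request. One way to sketch a follow-up turn, assuming the chat template accepts the same alternating `user`/`assistant` message list used in the usage example: `add_refinement_turn` is a hypothetical helper, and the reply and follow-up prompt are placeholders.

```python
import copy


def add_refinement_turn(messages, previous_reply, follow_up):
    """Return a new message list with the model's reply and a follow-up request appended.

    Hypothetical helper: it only builds the conversation structure; the result
    would then go through the same processor and generate steps as before.
    """
    extended = copy.deepcopy(messages)  # leave the caller's list untouched
    extended.append(
        {"role": "assistant", "content": [{"type": "text", "text": previous_reply}]}
    )
    extended.append(
        {"role": "user", "content": [{"type": "text", "text": follow_up}]}
    )
    return extended


first_turn = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "https://flux-generated.com/sample_image.jpeg"},
            {"type": "text", "text": "Provide detailed captions and annotations for this image in JSON format."},
        ],
    }
]

# Placeholder reply standing in for output_text[0] from the first generation.
conversation = add_refinement_turn(
    first_turn,
    previous_reply='{"caption": "placeholder"}',
    follow_up="Add any text visible in the image under a 'text' key.",
)
print([m["role"] for m in conversation])
```

The extended list can be passed to `processor.apply_chat_template` and `process_vision_info` in place of `messages`; the image stays in the first turn, so it does not need to be attached again.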
+ ---
+
+ **LabelPRO** streamlines the process of generating image captions and annotations, making it an ideal solution for applications that require detailed visual content analysis and structured data integration.