JackChew committed · verified
Commit 2eb02c8 · Parent(s): 582bcd5

Update README.md

Files changed (1)
  1. README.md +11 -1
README.md CHANGED
@@ -28,7 +28,6 @@ Qwen2-VL-2B-OCR is a fine-tuned variant of unsloth/Qwen2-VL-2B-Instruct, optimiz
  This model uses cutting-edge techniques for text-to-text generation from images and works seamlessly for various OCR tasks, including text from complex documents with structured layouts.

  ## Intended Use
- Intended Use
  The primary purpose of the model is to extract data from images or documents, especially from payslips and tables, without missing any critical details. It can be applied in various domains such as payroll systems, finance, legal document analysis, and any field where document extraction is required.
  Prompt Example:
  - **text**: The model works BEST with the prompt `"Extract all text from image/payslip without miss anything"` (see the sketch below).
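+
+ A minimal sketch of how this prompt can be supplied using the standard Qwen2-VL chat message format (the image path is an illustrative placeholder):
+
+ ```python
+ # Wrap the recommended prompt in the standard Qwen2-VL message format;
+ # "payslip.png" is a placeholder path, not a file shipped with this repo.
+ messages = [
+     {
+         "role": "user",
+         "content": [
+             {"type": "image", "image": "payslip.png"},
+             {"type": "text", "text": "Extract all text from image/payslip without miss anything"},
+         ],
+     }
+ ]
+ ```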
@@ -146,8 +145,19 @@ output_text = processor.batch_decode(generated_ids, skip_special_tokens=True, cl
  print(output_text)
  ```

+ ### Handling CUDA Memory Issues During Inference
+
+ If you encounter CUDA memory issues during model inference, a common solution is to resize the input image before processing. This reduces the memory footprint and allows the model to process the image more efficiently.
+
+ ```python
+ # Resize the image to reduce its size, e.g. scale it to half its
+ # original dimensions (assumes `image` is a PIL.Image instance)
+ image = image.resize((image.width // 2, image.height // 2))
+ ```
+
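+ If resizing alone is not enough, the Qwen2-VL processor also accepts `min_pixels`/`max_pixels` arguments that cap how many visual tokens each image is converted into. A minimal sketch (the pixel budgets below are illustrative, not tuned values):
+
+ ```python
+ from transformers import AutoProcessor
+
+ # Illustrative pixel budgets: images are rescaled so their area stays
+ # within [min_pixels, max_pixels] before visual tokenization.
+ processor = AutoProcessor.from_pretrained(
+     "JackChew/Qwen2-VL-2B-OCR",
+     min_pixels=256 * 28 * 28,
+     max_pixels=1024 * 28 * 28,
+ )
+ ```
+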
  ## Model Fine-Tuning Details
  The model was fine-tuned using the Unsloth framework, which accelerated training by 2x on top of Hugging Face's TRL (Transformer Reinforcement Learning) library. LoRA (Low-Rank Adaptation) was applied to fine-tune only a small subset of the parameters, significantly reducing training time and computational resources. Fine-tuning targeted both the vision and language layers, ensuring the model can handle complex OCR tasks efficiently.

  Total Trainable Parameters: 57,901,056

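+ For reference, here is a minimal sketch of the kind of LoRA setup described above, using the `peft` library (the rank and target modules are illustrative assumptions, not the exact configuration behind this checkpoint):
+
+ ```python
+ from peft import LoraConfig, get_peft_model
+ from transformers import Qwen2VLForConditionalGeneration
+
+ model = Qwen2VLForConditionalGeneration.from_pretrained("unsloth/Qwen2-VL-2B-Instruct")
+
+ # Illustrative LoRA config: adapt only the attention projections and
+ # keep the base weights frozen.
+ lora_config = LoraConfig(
+     r=16,
+     lora_alpha=16,
+     lora_dropout=0.0,
+     bias="none",
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
+ )
+ model = get_peft_model(model, lora_config)
+ model.print_trainable_parameters()  # compare with the count reported above
+ ```
+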
+ ## Hardware Requirements
+ To run this model, it is recommended to have access to a GPU with at least 16 GB of VRAM. Training requires significant memory, so smaller batch sizes or gradient accumulation may be necessary for GPUs with less memory.
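+
+ As a minimal sketch of that trade-off using Hugging Face `TrainingArguments` (all values below are illustrative, not the settings used to train this model):
+
+ ```python
+ from transformers import TrainingArguments
+
+ # Illustrative low-VRAM settings: a small per-device batch combined with
+ # gradient accumulation yields an effective batch size of 2 * 8 = 16.
+ training_args = TrainingArguments(
+     output_dir="outputs",
+     per_device_train_batch_size=2,
+     gradient_accumulation_steps=8,
+     fp16=True,  # mixed precision further reduces memory use
+ )
+ ```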