neuraax committed
Commit ad527e3 · verified · 1 Parent(s): 2fdf389

Update README.md

Files changed (1)
  1. README.md +160 -5
README.md CHANGED
@@ -10,12 +10,167 @@ language:
  - en
  ---

  # Uploaded finetuned model

  - **Developed by:** neuralabs
  - **License:** apache-2.0
- - **Finetuned from model :** deepseek-ai/DeepSeek-OCR
-
- This deepseek_vl_v2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
-
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

# DeepSeek OCR - Fine-tuned for German/Deutsch

This model is a fine-tuned version of DeepSeek OCR on German text for Optical Character Recognition (OCR) tasks.

## Model Description

- **Base Model:** DeepSeek OCR
- **Language:** German (de)
- **Task:** Image-to-Text (OCR)
- **Training Data:** 200K synthetic German text images
- **License:** Apache 2.0

This model has been fine-tuned specifically for recognizing German text in images, including handling of German-specific characters (ä, ö, ü, ß) and common German compound words.

## Intended Uses

This model is designed for:
- Extracting German text from scanned documents
- Digitizing printed German materials
- Reading German text from photographs
- Processing German forms and receipts
- Other general German text recognition tasks

## How to Use

### Basic Usage
```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Load model and processor
processor = TrOCRProcessor.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")
model = VisionEncoderDecoderModel.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")

# Load a local image containing German text
image_path = "path_to_your_german_text_image.jpg"
image = Image.open(image_path).convert("RGB")

# Preprocess, generate, and decode
pixel_values = processor(image, return_tensors="pt").pixel_values
generated_ids = model.generate(pixel_values)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

print(generated_text)
```

### Batch Processing

```python
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

processor = TrOCRProcessor.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")
model = VisionEncoderDecoderModel.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")

# Multiple images
images = [Image.open(f"image_{i}.jpg").convert("RGB") for i in range(5)]

# Batch process
pixel_values = processor(images, return_tensors="pt", padding=True).pixel_values
generated_ids = model.generate(pixel_values)
generated_texts = processor.batch_decode(generated_ids, skip_special_tokens=True)

for text in generated_texts:
    print(text)
```

### With GPU Acceleration

```python
import torch
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

processor = TrOCRProcessor.from_pretrained("YOUR_USERNAME/deepseek-ocr-german")
model = VisionEncoderDecoderModel.from_pretrained("YOUR_USERNAME/deepseek-ocr-german").to(device)

image = Image.open("german_text.jpg").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values.to(device)

generated_ids = model.generate(pixel_values)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```

## Training Details

### Training Data

The model was fine-tuned on a synthetic German OCR dataset containing 200,000 images with:
- Diverse German sentences covering multiple domains (everyday conversation, news, literature, technical, business)
- Various fonts and font sizes (16-48pt)
- Multiple augmentations: noise, blur, brightness/contrast variations
- Different text and background colors
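
For illustration, the sketch below shows one way such samples can be rendered with Pillow, applying the font-size range and the noise, blur, and brightness/contrast augmentations listed above. It is a minimal reconstruction, not the actual generation script; the font path, colour ranges, and augmentation strengths are assumptions.

```python
# Illustrative sketch of synthetic OCR sample generation (not the original script).
# The font path and augmentation ranges below are assumptions.
import random
import numpy as np
from PIL import Image, ImageDraw, ImageEnhance, ImageFilter, ImageFont

def render_sample(text: str, font_path: str = "DejaVuSans.ttf") -> Image.Image:
    font = ImageFont.truetype(font_path, size=random.randint(16, 48))

    # Measure the text and add a margin around it
    bbox = ImageDraw.Draw(Image.new("RGB", (1, 1))).textbbox((0, 0), text, font=font)
    width, height = bbox[2] - bbox[0] + 40, bbox[3] - bbox[1] + 40

    # Random light background and dark text colours
    bg = tuple(random.randint(200, 255) for _ in range(3))
    fg = tuple(random.randint(0, 80) for _ in range(3))
    img = Image.new("RGB", (width, height), bg)
    ImageDraw.Draw(img).text((20, 20), text, font=font, fill=fg)

    # Augmentations: blur, brightness/contrast jitter, Gaussian noise
    img = img.filter(ImageFilter.GaussianBlur(radius=random.uniform(0.0, 1.0)))
    img = ImageEnhance.Brightness(img).enhance(random.uniform(0.8, 1.2))
    img = ImageEnhance.Contrast(img).enhance(random.uniform(0.8, 1.2))
    arr = np.array(img).astype(np.float32)
    arr += np.random.normal(0.0, 5.0, arr.shape)
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

sample = render_sample("Straßenbahnhaltestelle an der Müllerstraße")
sample.save("sample_0.png")
```

Pairing each rendered image with its source sentence would yield (image, text) pairs of the kind described above.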

**Data Split:**
- Train: 180,000 samples (90%)
- Validation: 10,000 samples (5%)
- Test: 10,000 samples (5%)

### Training Framework

```python
# Example training configuration
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./deepseek-ocr-german",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    learning_rate=5e-5,
    num_train_epochs=10,
    logging_steps=100,
    save_steps=1000,
    eval_steps=1000,
    evaluation_strategy="steps",
    save_total_limit=2,
    fp16=True,
    predict_with_generate=True,
)
```
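
As a usage note, arguments like these would typically be passed to `Seq2SeqTrainer`. The sketch below assumes the `model` and `processor` from the usage examples plus placeholder `train_dataset`/`eval_dataset` objects yielding `pixel_values`/`labels` pairs; it is not the exact training script used for this model.

```python
# Illustrative only: train_dataset / eval_dataset are placeholders for the
# 180,000 / 10,000 sample splits described above and are not shipped with this card.
from transformers import Seq2SeqTrainer, default_data_collator

trainer = Seq2SeqTrainer(
    model=model,                      # the model being fine-tuned
    args=training_args,
    train_dataset=train_dataset,      # placeholder training split
    eval_dataset=eval_dataset,        # placeholder validation split
    tokenizer=processor.feature_extractor,
    data_collator=default_data_collator,
)

trainer.train()
trainer.save_model("./deepseek-ocr-german")
```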

## Limitations

- **Font coverage:** Performance may vary with handwritten text
- **Image quality:** Works best with clear, high-contrast images
- **Domain specificity:** Best performance on printed German text similar to training distribution
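
If your inputs are faint or low-contrast scans, a light preprocessing pass before OCR may help. The snippet below is only a suggestion using Pillow (the contrast factor is an assumed value), not part of the model or its training pipeline.

```python
# Optional preprocessing for low-contrast scans (illustrative; tune the factor for your data).
from PIL import Image, ImageEnhance, ImageOps

def enhance_for_ocr(path: str) -> Image.Image:
    img = Image.open(path).convert("L")            # grayscale
    img = ImageOps.autocontrast(img)               # stretch the histogram
    img = ImageEnhance.Contrast(img).enhance(1.5)  # extra contrast boost (assumed factor)
    return img.convert("RGB")                      # back to RGB for the processor

image = enhance_for_ocr("faint_german_scan.jpg")
```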

## Citation

If you use this model, please cite:

```bibtex
@misc{deepseek-ocr-german,
  author       = {Santosh Pandit},
  title        = {DeepSeek OCR - German Fine-tuned},
  year         = {2025},
  publisher    = {HuggingFace},
  howpublished = {\url{https://huggingface.co/YOUR_USERNAME/deepseek-ocr-german}},
}
```

## Model Card Contact

For questions or feedback, please open an issue on the model repository or contact [hello@neuralabs.one](mailto:hello@neuralabs.one).

---

### Acknowledgments

- Base model: DeepSeek AI
- Training data generation: LM Studio with local LLM
- Framework: Hugging Face Transformers

# Uploaded finetuned model

- **Developed by:** neuralabs
- **License:** apache-2.0
- **Finetuned from model:** deepseek-ai/DeepSeek-OCR