Update README.md
README.md
@@ -13,14 +13,14 @@ datasets:
@@ -60,45 +60,46 @@ If given an image with the Hebrew word "אברם" (Abram), the model can detect

## Model Description

This is a **Convolutional Neural Network (CNN)** model trained to recognize **Hebrew letters** and a **stop symbol** in images. It identifies individual letters in a provided image and outputs each letter's class along with its probability.
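
The "class along with probabilities" output can be illustrated with a minimal sketch. It assumes the network emits one raw score per class; if the final layer is already a softmax, the `softmax` step is unnecessary. The label list and scores below are hypothetical and only cover a small subset of classes.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]

# Hypothetical label subset and scores, for illustration only.
labels = ["א", "ב", "ג", "stop"]
logits = [2.0, 0.5, 0.1, -1.0]

probs = softmax(logits)
best_label, best_prob = top_k(probs, labels, k=1)[0]
print(best_label, round(best_prob, 3))
```

Returning the top few candidates rather than a single label makes it easier to spot uncertain predictions, such as a letter confused with the stop symbol.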

## Model Details:
* **Model Type**: Convolutional Neural Network (CNN)
* **Framework**: TensorFlow 2.x / Keras
* **Input Size**: 64x64 grayscale images of isolated letters.
* **Output Classes**: 28 Hebrew letters + 1 stop symbol (.)
* **Use Case**: Recognizing handwritten or printed Hebrew letters and punctuation in scanned images or photos of documents.
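
As a rough illustration of the 64x64 grayscale input requirement, the sketch below validates an image and adds the single channel axis. The helper name `prepare_input` is hypothetical, and scaling pixel values from 0-255 to the range [0, 1] is an assumption: the README does not state which normalization was used during training.

```python
def prepare_input(pixels, size=64):
    """Validate a size x size grayscale image and scale pixels to [0, 1].

    `pixels` is a list of rows, each a list of integers in 0..255.
    Returns a nested list of shape (size, size, 1).
    """
    if len(pixels) != size or any(len(row) != size for row in pixels):
        raise ValueError(f"expected a {size}x{size} grayscale image")
    # Assumed normalization: divide by 255 and add a trailing channel axis.
    return [[[value / 255.0] for value in row] for row in pixels]

# An all-white 64x64 image as a stand-in for a real letter crop.
white = [[255] * 64 for _ in range(64)]
prepared = prepare_input(white)
print(len(prepared), len(prepared[0]), len(prepared[0][0]))
```

Larger scans would first need to be segmented into isolated letters and resized, since the model expects one letter per image.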

## Intended Use

## Limitations:

* **Font Variations**: The model performs best on specific fonts (e.g., square Hebrew letters). Performance may degrade with highly stylized or cursive fonts.
* **Noise Sensitivity**: Images with heavy noise, artifacts, or low resolution may lead to incorrect predictions.
* **Stop Symbol**: The stop symbol is recognized by detecting three vertical dots. However, false positives can occur if letters with similar shapes are present.

## Training Data:

The model was trained on a dataset containing *Hebrew letters and stop symbols*. The training dataset includes:

* **28 Hebrew letters**.
* **1 stop symbol** representing three vertical dots (.).

## Training Procedure:
* **Optimizer**: Adam
* **Loss function**: Categorical Crossentropy
* **Batch size**: 32
* **Epochs**: 10
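
To make the loss function concrete, here is a minimal pure-Python sketch of categorical cross-entropy for a single one-hot target. It is not the training code: Keras provides its own implementation, and the three-class vectors below are hypothetical.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    # Clip predictions so that log(0) can never occur.
    clipped = [min(max(p, eps), 1.0 - eps) for p in y_pred]
    return -sum(t * math.log(p) for t, p in zip(y_true, clipped))

# The true class is index 1 in both cases.
y_true = [0.0, 1.0, 0.0]
confident = categorical_crossentropy(y_true, [0.05, 0.90, 0.05])
unsure = categorical_crossentropy(y_true, [0.40, 0.30, 0.30])
print(f"confident={confident:.4f}  unsure={unsure:.4f}")
```

The loss is small when the model assigns high probability to the correct class and grows as that probability shrinks, which is the signal the optimizer minimizes.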

Data augmentation (random rotations, zooms, and horizontal flips) was applied to reduce overfitting and improve the model's generalization to unseen data.

## Model Performance

### Metrics:
* **Accuracy**: 95% on the validation dataset.
* **Precision**: 94%
* **Recall**: 93%

Performance may vary depending on the quality of the input images, noise levels, and whether the letters are handwritten or printed.
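
For readers who want a single summary number, the F1 score can be derived from the reported precision and recall. This is a worked calculation, not a figure reported by the model card, and it assumes both values were averaged over classes in the same way.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported; F1 is derived here, not reported.
f1 = f1_score(0.94, 0.93)
print(f"F1 = {f1:.3f}")
```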

## Known Issues:
* **False Positives for Stop Symbols**: The model sometimes incorrectly identifies letters that resemble three vertical dots as stop symbols.
* **Overfitting to Specific Fonts**: Performance can degrade on handwritten texts or cursive fonts not represented well in the training set.

## Ethical Considerations

* **Bias**: The model was trained on a specific set of Hebrew fonts and may not perform equally well across all types of Hebrew texts, particularly historical or handwritten documents.
* **Fairness**: The model may produce varying results depending on font style, quality of input images, and preprocessing applied.

## Future Work:

* **Improving Generalization**: Future work will focus on improving the model's robustness to different fonts, handwriting styles, and noisy inputs.
* **Multilingual Expansion**: Adding support for other Semitic scripts or expanding the model for multilingual OCR tasks.

Citation: