pszemraj/MiniLMv2-L6-H384_R-OCR-quality


pszemraj committed on May 11

Commit 5c30310 • 1 Parent(s): 642c619

Update README.md


README.md +11 -18


@@ -2,36 +2,29 @@
 license: apache-2.0
 base_model: pszemraj/MiniLMv2-L6-H384_R-fineweb-100k
 tags:
-- generated_from_trainer
 metrics:
 - accuracy
-model-index:
-- name: MiniLMv2-L6-H384_R-fineweb-100k-OCR-quality-classification-cls
-  results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# MiniLMv2-L6-H384_R-fineweb-100k-OCR-quality-classification-cls
-This model is a fine-tuned version of [pszemraj/MiniLMv2-L6-H384_R-fineweb-100k](https://huggingface.co/pszemraj/MiniLMv2-L6-H384_R-fineweb-100k) on an unknown dataset.
 It achieves the following results on the evaluation set:
 - Loss: 0.0162
 - Accuracy: 0.996
 - Num Input Tokens Seen: 61536256
-## Model description
-More information needed
 ## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
 ## Training procedure
@@ -67,4 +60,4 @@ The following hyperparameters were used during training:
 - Transformers 4.40.2
 - Pytorch 2.2.0+cu121
 - Datasets 2.19.1
-- Tokenizers 0.19.1

 license: apache-2.0
 base_model: pszemraj/MiniLMv2-L6-H384_R-fineweb-100k
 tags:
+- data processing
+- data filter
+- text quality
 metrics:
 - accuracy
+datasets:
+- pszemraj/OCR-quality-classification
+language:
+- en
 ---
+# MiniLMv2-L6-H384_R-OCR-quality
+This model is a fine-tuned version of [pszemraj/MiniLMv2-L6-H384_R-fineweb-100k](https://hf.co/pszemraj/MiniLMv2-L6-H384_R-fineweb-100k) on `pszemraj/OCR-quality-classification`
 It achieves the following results on the evaluation set:
 - Loss: 0.0162
 - Accuracy: 0.996
 - Num Input Tokens Seen: 61536256
 ## Intended uses & limitations
+predict whether a document is clean or noisy
 ## Training procedure
 - Transformers 4.40.2
 - Pytorch 2.2.0+cu121
 - Datasets 2.19.1
+- Tokenizers 0.19.1
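
The updated card describes the intended use as predicting whether a document is clean or noisy, and tags the model for data filtering. A minimal sketch of that use follows, under stated assumptions: the checkpoint ID is taken from the new card title, the label string `"clean"` is hypothetical (the card does not list label names, so check `config.id2label` on the loaded model), and `keep_clean`, `load_classifier`, and `min_score` are illustrative names that do not come from the repository.

```python
from typing import Callable, Dict, List

# Assumption: checkpoint ID taken from the card title in this commit.
MODEL_ID = "pszemraj/MiniLMv2-L6-H384_R-OCR-quality"
# Assumption: hypothetical label name; the card does not state the labels.
CLEAN_LABEL = "clean"

# A classifier maps a list of texts to one {"label": ..., "score": ...} dict each,
# which is the shape a transformers text-classification pipeline returns.
Classifier = Callable[[List[str]], List[Dict[str, object]]]


def load_classifier() -> Classifier:
    """Build the real classifier (requires `transformers` and network access)."""
    from transformers import pipeline  # imported lazily so the filter stays testable

    pipe = pipeline("text-classification", model=MODEL_ID)
    # truncation=True clips documents longer than the model's maximum input length
    return lambda texts: pipe(texts, truncation=True)


def keep_clean(
    docs: List[str],
    classify: Classifier,
    clean_label: str = CLEAN_LABEL,
    min_score: float = 0.5,
) -> List[str]:
    """Return only the documents predicted as `clean_label` with score >= min_score."""
    if not docs:
        return []
    preds = classify(docs)
    return [
        doc
        for doc, pred in zip(docs, preds)
        if pred["label"] == clean_label and float(pred["score"]) >= min_score
    ]
```

To filter real data, pass `load_classifier()` as the `classify` argument. Keeping the model behind a plain callable means the filtering logic can be exercised with a stub, without downloading the checkpoint.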