Riksarkivet/HTR_pipeline_models

Image-to-Text

Swedish

HTR

Sneriko committed on Aug 31, 2023

Commit f4b4034 (1 parent: 017d807)

Update README.md

Evaluation tables inserted, text checked and updated

README.md +27 -16

@@ -13,39 +13,50 @@ tags:
 You can try out a demo of the Swedish National Archives HTR pipeline at [Riksarkivet HTR Demo](https://huggingface.co/spaces/Riksarkivet/htr_demo).
 ## Model Description
-The Swedish National Archives presents an end-to-end Handwritten Text Recognition (HTR) pipeline for running-text documents ranging from the 16th to the 19th century. The pipeline consists of the following components:
-1. **RTMDet Instance Segmentation Models**: The pipeline utilizes two RTMDet instance segmentation models, trained using MMDetection. The first model is designed to segment text regions within the documents, while the second model focuses on segmenting text lines within these regions. These models enable the identification and localization of text areas, which is a crucial step in the HTR pipeline.
-2. **SATRN HTR Model**: The pipeline incorporates a SATRN (Spatial Attention Transformer Networks) model, trained using MMOCR (OpenMMLab's OCR toolbox). SATRN is a state-of-the-art model for HTR tasks and provides accurate recognition of handwritten text. The SATRN model is trained specifically to handle the characteristics and challenges of handwritten text present in the Swedish National Archives' documents.
-The models are designed to provide a generic pipeline for handwritten text recognition, offering robust performance for documents from the 16th to the 19th century.
 ## Evaluation
-The Swedish National Archives HTR pipeline has been evaluated using standard evaluation metrics for Handwritten Text Recognition. The Word Error Rate (WER) and Character Error Rate (CER) are commonly used to assess the accuracy of the pipeline.
-The reported performance metrics are obtained on a test dataset that represents a diverse range of historical running-text documents from the 16th to the 19th century. It is important to note that the actual performance may vary depending on the specific documents and handwriting styles encountered in practical usage.
-| Metric | Performance |
-|--------|-------------|
-| WER    | XX%         |
-| CER    | XX%         |
-The WER measures the percentage of incorrectly recognized words compared to the ground truth, while the CER measures the percentage of incorrectly recognized characters.
 Regular evaluations are conducted to monitor and improve the performance of the pipeline. As new evaluation results become available, this table will be updated to reflect the most recent performance metrics.
 ## Intended Use
 The Swedish National Archives HTR pipeline is intended to be used for the following purposes:
-- Handwritten Text Recognition: The pipeline enables the automatic recognition of handwritten text in running-text documents from the 16th to the 19th century. It can be utilized by researchers, historians, and archivists to efficiently transcribe and analyze historical texts.
 - Document Digitization: The pipeline aids in the process of digitizing archival documents by automating the extraction and transcription of handwritten text. This facilitates broader accessibility and preservation of historical materials.
 It's important to note that the pipeline is optimized for running-text documents from the specified time period and may not perform optimally for other types of documents or handwriting styles.
-Additionally, it is currently more suitable for documents from books rather than complex layouts from either tabels or newspapers.
 ## Performance and Limitations
 The performance of the Swedish National Archives HTR pipeline is influenced by several factors:
@@ -54,14 +65,14 @@ The performance of the Swedish National Archives HTR pipeline is influenced by s
 - **Speed**: The pipeline aims to provide real-time or near real-time performance for efficient processing of handwritten text documents. The speed may vary depending on the hardware used for inference.
-- **Document Specificity**: The pipeline is specifically trained for running-text documents from the 16th to the 19th century. It may not perform optimally for documents outside this time range or for documents with unique characteristics or handwriting styles not covered by the training data.
-- **Language Limitations**: The pipeline is tailored for Swedish text recognition. While it may handle other languages to some extent, its performance may not be as accurate as for Swedish.
 - **Handwriting Style**: The pipeline is optimized for the cursive handwriting style prevalent in the historical documents of the Swedish National Archives. It may not perform as well for other handwriting styles, such as block letters or highly stylized scripts.
 ## Training Data
-The Swedish National Archives HTR pipeline was trained using a diverse dataset of running-text documents from the 16th to the 19th century. The training data includes various types of historical texts, such as letters, manuscripts, and official records.
 The dataset comprises both high-quality and challenging examples to ensure the models' robustness. It covers a wide range of handwriting styles, legibility levels, and document conditions.

 You can try out a demo of the Swedish National Archives HTR pipeline at [Riksarkivet HTR Demo](https://huggingface.co/spaces/Riksarkivet/htr_demo).
 ## Model Description
+The Swedish National Archives presents an end-to-end Handwritten Text Recognition (HTR) pipeline for running-text documents ranging from the mid 17th century to the late 19th century. The pipeline consists of the following components:
+1. **RTMDet Instance Segmentation Models**: The pipeline utilizes two RTMDet instance segmentation models, trained using MMDetection. The first model is designed to segment text regions within the documents, while the second model focuses on segmenting text lines within these regions. These models enable the identification and localization of text-line regions, which is a crucial step in the HTR pipeline since text-recognition models work at the text-line level.
+2. **SATRN HTR Model**: The pipeline incorporates a SATRN (Self-Attention Text Recognition Network) model, trained using MMOCR (OpenMMLab's OCR toolbox). SATRN is a state-of-the-art model for irregular scene-text recognition, which makes it an excellent choice for HTR, given that handwriting is highly irregular. The SATRN model consists of a shallow CNN, a 2D-transformer encoder, and a transformer decoder that works on the character level. It is trained on about a million text-line images of running-text handwritten documents ranging from the mid 17th century to the late 19th century.
+The models are designed to provide a generic pipeline for handwritten text recognition, offering robust performance for running-text documents from the mid 17th to the late 19th century.
 ## Evaluation
+The Swedish National Archives HTR pipeline has been evaluated using standard evaluation metrics for Handwritten Text Recognition. The Character Error Rate (CER) is commonly used to assess the accuracy of the text-recognition model. The best way to evaluate the entire pipeline is to run all three models on unsegmented document images and calculate CER for the entire pipeline.
+The reported performance metrics are obtained on several test sets from archives that were not included in the training set, spanning the entire time period the model was trained on. These error rates are therefore what you should expect when running the pipeline out of the box on your own documents, provided the documents contain running text and fall within the model's time period. Note that actual performance may vary depending on the specific layout and handwriting styles encountered in the document.
+| Model          | train-eval  | 1661-testset | 1664-testset | 1688-testset-unusual-layout | 1735-testset | 1740-1793-testset | 1777-testset | 1840-1890-testset | 1861-testset |
+|----------------|-------------|--------------|--------------|-----------------------------|--------------|-------------------|--------------|-------------------|--------------|
+|SATRN_1650_1900 | 0.033       | 0.096        | 0.078        | 0.215                       | 0.079        | 0.066             | 0.074        | 0.037             | 0.043        |
+|SATRN_1650_1800 | 0.039       | 0.109        | 0.085        | 0.243                       | 0.079        | 0.079             | 0.087        | 0.239             | 0.157        |
+|SATRN_1800_1900 | 0.031       | 0.455        | 0.382        | 0.381                       | 0.309        | 0.252             | 0.182        | 0.046             | 0.051        |
+The lower two rows are for comparison only. You can see that the model trained exclusively on the 19th century actually performed worse on 19th century testsets than the model trained on the entire time-period. This was the reason we only published the aggregated model rather than models specialized on a specific century.
 Regular evaluations are conducted to monitor and improve the performance of the pipeline. As new evaluation results become available, this table will be updated to reflect the most recent performance metrics.
+We also did some fine-tuning experiments to give an idea of the performance benefits of finetuning the model on domain-specific material, as well as a rough estimate of how many pages one needs to transcribe to do the fine-tuning.
+| Model               | 17th-century-testsets-combined | 18th-century-testsets-combined | 19th-century-testsets-combined |
+|---------------------|--------------------------------|--------------------------------|--------------------------------|
+| SATRN_1650_1900     | 0.124                          | 0.095                          | 0.038                          |
+| SATRN_1650_1900_ft  | 0.064                          | 0.084                          | 0.026                          |
+| Number of pages     | 57                             | 28                             | 29                             |
+As the table shows, 50-60 transcribed pages are enough to halve the CER on 17th century documents. About 30 pages of transcribed text give significant improvements on 18th and 19th century text, but the improvements are not as steep. Our recommendation, if you have a large domain you want to run the pipeline on, is to transcribe 50-100 pages and fine-tune the text-recognition model on this data. Guides on how to do this will be forthcoming.
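To make the size of the fine-tuning gains concrete, the relative CER reduction can be computed from the numbers in the fine-tuning table. The snippet below is only a worked calculation: the helper name `relative_cer_reduction` is made up for illustration, and the three tuples are the page counts and CER values copied from the table columns in order.

```python
def relative_cer_reduction(before: float, after: float) -> float:
    """Fraction by which the CER dropped after fine-tuning."""
    if before <= 0:
        raise ValueError("baseline CER must be positive")
    return (before - after) / before


# (pages transcribed, CER before fine-tuning, CER after fine-tuning),
# copied from the three columns of the fine-tuning table, left to right.
results = [
    (57, 0.124, 0.064),
    (28, 0.095, 0.084),
    (29, 0.038, 0.026),
]

for pages, before, after in results:
    reduction = relative_cer_reduction(before, after)
    print(f"{pages} pages: CER {before:.3f} -> {after:.3f} ({reduction:.1%} lower)")
```

This gives reductions of roughly 48%, 12%, and 32% for the three columns, which matches the statement that only the column with 57 pages comes close to halving the CER.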
 ## Intended Use
 The Swedish National Archives HTR pipeline is intended to be used for the following purposes:
+- Handwritten Text Recognition: The pipeline enables the automatic recognition of handwritten text in running-text documents from the 17th to the 19th century. It can be utilized by researchers, historians, and archivists to efficiently transcribe and analyze historical texts.
 - Document Digitization: The pipeline aids in the process of digitizing archival documents by automating the extraction and transcription of handwritten text. This facilitates broader accessibility and preservation of historical materials.
 It's important to note that the pipeline is optimized for running-text documents from the specified time period and may not perform optimally for other types of documents or handwriting styles.
+Additionally, it is currently more suitable for documents from books rather than complex layouts from either tables or newspapers.
 ## Performance and Limitations
 The performance of the Swedish National Archives HTR pipeline is influenced by several factors:
 - **Speed**: The pipeline aims to provide real-time or near real-time performance for efficient processing of handwritten text documents. The speed may vary depending on the hardware used for inference.
+- **Document Specificity**: The pipeline is specifically trained for running-text documents from the 17th to the 19th century. It may not perform optimally for documents outside this time period or for documents with non-typical layouts.
+- **Language Limitations**: The pipeline is mainly intended for Swedish text recognition. While it may handle other languages to some extent, Finnish for example, its performance may not be as accurate as for Swedish.
 - **Handwriting Style**: The pipeline is optimized for the cursive handwriting style prevalent in the historical documents of the Swedish National Archives. It may not perform as well for other handwriting styles, such as block letters or highly stylized scripts.
 ## Training Data
+The Swedish National Archives HTR pipeline was trained using a diverse dataset of binarized, running-text documents from the 17th to the 19th century. The training data includes various types of historical texts, such as letters, manuscripts, and official records.
 The dataset comprises both high-quality and challenging examples to ensure the models' robustness. It covers a wide range of handwriting styles, legibility levels, and document conditions.
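For readers who want to compare their own transcriptions against the CER figures reported in the Evaluation section, the following is a minimal sketch of the usual CER definition: character-level edit distance divided by the number of reference characters. It is not the evaluation script used for the README. The function names are illustrative, and pooling edits over all lines before dividing is an assumption, since the README does not state how line-level errors were aggregated.

```python
from typing import Sequence


def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of character insertions,
    deletions, and substitutions turning `hypothesis` into `reference`."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def character_error_rate(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """CER pooled over all lines: total edits / total reference characters.

    Assumption: errors are pooled across lines rather than averaged per line.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(ref) for ref in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


# Toy example with made-up strings: one substituted character out of five.
print(character_error_rate(["abc", "de"], ["abc", "dx"]))
```

With this definition, a value such as 0.096 means that about one reference character in ten needs an edit to correct the model output.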