bluuebunny/RedactNLP

bluuebunny committed on Sep 14, 2024

Commit 45b16c9

1 Parent(s): 8e157b2

added info on redaction process

Files changed (1)

app.py CHANGED

@@ -211,10 +211,11 @@ with gr.Blocks(theme=gr.themes.Soft()) as demo:
     gr.Markdown("# RedactNLP: Redact your PDF!")
     gr.Markdown("## How redaction happens:")
     gr.Markdown("""
-                1. The PDF pages are converted to images.
-                2. EasyOCR is run on the converted images to extract text.
-                3. "dslim/distilbert-NER" model does the token classification.
-                4. Non-recoverable mask is applied to identified elements.
+                1. The PDF pages are converted to images using **[PyMuPDF](https://github.com/pymupdf/PyMuPDF)**.
+                2. **[EasyOCR](https://github.com/JaidedAI/EasyOCR)** is run on the converted images to extract text.
+                3. **[dslim/distilbert-NER](https://huggingface.co/dslim/distilbert-NER)** model does the token classification.
+                4. Non-recoverable mask is applied to identified elements using **[OpenCV](https://github.com/opencv/opencv)**.
+                5. The masked images are converted back to a PDF again using PyMuPDF.
                 """)
     # Input Section
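This commit only changes the description text, not the pipeline code, so the following is not the app's implementation. It is a dependency-free sketch, under stated assumptions, of the glue between steps 2 to 4 in the description. OCR fragments are assumed to arrive as EasyOCR-style `(box, text, confidence)` tuples with four corner points per box. Entities are assumed to be dicts carrying character offsets `start`/`end` into the space-joined OCR text, which is the shape a Hugging Face token-classification pipeline with `aggregation_strategy="simple"` returns. A nested list of grayscale values stands in for the page image (a NumPy array in practice). The function names `boxes_to_redact` and `apply_solid_mask` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]                          # (x, y)
OcrResult = Tuple[Sequence[Point], str, float]   # (four corner points, text, confidence)
Rect = Tuple[int, int, int, int]                 # (x0, y0, x1, y1), inclusive


def boxes_to_redact(ocr_results: List[OcrResult], entities: List[Dict]) -> List[Rect]:
    """Return the bounding rects of OCR fragments that overlap any entity span.

    Assumption (hypothetical helper): the NER model was run on the OCR texts
    joined with single spaces, so entity 'start'/'end' offsets index that string.
    """
    # Record which character span each OCR fragment occupies in the joined text.
    spans = []
    offset = 0
    for box, text, _conf in ocr_results:
        spans.append((offset, offset + len(text), box))
        offset += len(text) + 1  # +1 for the joining space

    rects = []
    for start, end, box in spans:
        # Half-open interval overlap test against every entity span.
        if any(start < ent["end"] and ent["start"] < end for ent in entities):
            xs = [int(p[0]) for p in box]
            ys = [int(p[1]) for p in box]
            rects.append((min(xs), min(ys), max(xs), max(ys)))
    return rects


def apply_solid_mask(image: List[List[int]], rects: List[Rect], fill: int = 0) -> None:
    """Overwrite every pixel inside each rect with `fill`, in place.

    Unlike blurring or pixelation, this discards the original pixel values,
    so nothing under the mask can be reconstructed from the image.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    for x0, y0, x1, y1 in rects:
        # Clip to the image bounds; rect corners are inclusive.
        for y in range(max(0, y0), min(height, y1 + 1)):
            for x in range(max(0, x0), min(width, x1 + 1)):
                image[y][x] = fill


# Usage: one line of OCR output where the NER model tagged "John" as a person.
ocr = [
    ([[0, 0], [4, 0], [4, 1], [0, 1]], "Contact", 0.99),
    ([[6, 0], [9, 0], [9, 1], [6, 1]], "John", 0.97),
    ([[11, 0], [14, 0], [14, 1], [11, 1]], "today", 0.95),
]
# Joined text is "Contact John today"; "John" occupies characters 8..12.
entities = [{"entity_group": "PER", "start": 8, "end": 12, "word": "John"}]
rects = boxes_to_redact(ocr, entities)
print(rects)  # [(6, 0, 9, 1)]

page = [[255] * 15 for _ in range(2)]  # 2x15 white "image"
apply_solid_mask(page, rects)
```

The solid fill, rather than a blur, is what makes the mask non-recoverable in this sketch. With OpenCV the same fill can be drawn with `cv2.rectangle(img, (x0, y0), (x1, y1), (0, 0, 0), -1)`, where a thickness of -1 produces a filled rectangle. Because step 5 rebuilds the PDF from the masked images, the output presumably carries no text layer that could leak the redacted words.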