bluuebunny committed on
Commit 45b16c9 · 1 Parent(s): 8e157b2

added info on redaction process

  1. app.py +5 -4
app.py CHANGED
@@ -211,10 +211,11 @@ with gr.Blocks(theme=gr.themes.Soft()) as demo:
  gr.Markdown("# RedactNLP: Redact your PDF!")
  gr.Markdown("## How redaction happens:")
  gr.Markdown("""
- 1. The PDF pages are converted to images.
- 2. EasyOCR is run on the converted images to extract text.
- 3. "dslim/distilbert-NER" model does the token classification.
- 4. Non-recoverable mask is applied to identified elements.
+ 1. The PDF pages are converted to images using **[PyMuPDF](https://github.com/pymupdf/PyMuPDF)**.
+ 2. **[EasyOCR](https://github.com/JaidedAI/EasyOCR)** is run on the converted images to extract text.
+ 3. **[dslim/distilbert-NER](https://huggingface.co/dslim/distilbert-NER)** model does the token classification.
+ 4. Non-recoverable mask is applied to identified elements using **[OpenCV](https://github.com/opencv/opencv)**.
+ 5. The masked images are converted back to a PDF again using PyMuPDF.
  """)
 
  # Input Section
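
For readers of this change, the five steps named in the new Markdown could be sketched roughly as below. This is not the code in app.py, which this hunk does not show. The function names `boxes_to_redact` and `redact_pdf`, the 200 dpi render, the English-only OCR reader, joining OCR fragments with single spaces before NER, and the solid black fill are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # x0, y0, x1, y1 in pixels


def boxes_to_redact(ocr_items: Sequence[Tuple[Box, str]],
                    entity_spans: Sequence[Tuple[int, int]]) -> List[Box]:
    """Return the OCR boxes whose text overlaps any NER character span.

    Hypothetical helper: it assumes the OCR fragments were joined with
    single spaces before NER, so spans index into that joined string.
    """
    hits, offset = [], 0
    for box, text in ocr_items:
        start, end = offset, offset + len(text)
        if any(s < end and e > start for s, e in entity_spans):
            hits.append(box)
        offset = end + 1  # skip the joining space
    return hits


def redact_pdf(src_path: str, dst_path: str, dpi: int = 200) -> None:
    """Hypothetical end-to-end sketch of the five listed steps."""
    # Heavy dependencies are imported lazily so the helper above works alone.
    import cv2
    import easyocr
    import fitz  # PyMuPDF
    import numpy as np
    from transformers import pipeline

    reader = easyocr.Reader(["en"])  # assumption: English documents
    ner = pipeline("token-classification", model="dslim/distilbert-NER",
                   aggregation_strategy="simple")
    src, out = fitz.open(src_path), fitz.open()

    for page in src:
        # Step 1: render the page to an RGB image (copy makes it writable).
        pix = page.get_pixmap(dpi=dpi)
        img = np.frombuffer(pix.samples, dtype=np.uint8)
        img = img.reshape(pix.height, pix.width, pix.n).copy()

        # Step 2: OCR; each result is (corner points, text, confidence).
        items = []
        for corners, text, _conf in reader.readtext(img):
            xs = [int(p[0]) for p in corners]
            ys = [int(p[1]) for p in corners]
            items.append(((min(xs), min(ys), max(xs), max(ys)), text))

        # Step 3: token classification over the joined page text.
        joined = " ".join(text for _box, text in items)
        spans = [(e["start"], e["end"]) for e in ner(joined)] if joined else []

        # Step 4: overwrite the pixels so no text survives under the mask.
        for x0, y0, x1, y1 in boxes_to_redact(items, spans):
            cv2.rectangle(img, (x0, y0), (x1, y1), (0, 0, 0), thickness=-1)

        # Step 5: place the masked image on a new page of the original size.
        masked = fitz.Pixmap(fitz.csRGB, pix.width, pix.height,
                             img.tobytes(), False)
        new_page = out.new_page(width=page.rect.width, height=page.rect.height)
        new_page.insert_image(new_page.rect, pixmap=masked)

    out.save(dst_path)
```

Because the mask is drawn into the raster image and the output PDF is rebuilt from those images, the redacted text is absent from the result rather than merely covered. A call such as `redact_pdf("in.pdf", "out.pdf")` (file names are placeholders) would run the whole chain, and `boxes_to_redact` can be exercised on its own without any of the heavy dependencies.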