Commit 45b16c9 · Parent(s): 8e157b2
added info on redaction process
app.py CHANGED
@@ -211,10 +211,11 @@ with gr.Blocks(theme=gr.themes.Soft()) as demo:
 gr.Markdown("# RedactNLP: Redact your PDF!")
 gr.Markdown("## How redaction happens:")
 gr.Markdown("""
-1. The PDF pages are converted to images.
-2. EasyOCR is run on the converted images to extract text.
-3.
-4. Non-recoverable mask is applied to identified elements.
+1. The PDF pages are converted to images using **[PyMuPDF](https://github.com/pymupdf/PyMuPDF)**.
+2. **[EasyOCR](https://github.com/JaidedAI/EasyOCR)** is run on the converted images to extract text.
+3. **[dslim/distilbert-NER](https://huggingface.co/dslim/distilbert-NER)** model does the token classification.
+4. Non-recoverable mask is applied to identified elements using **[OpenCV](https://github.com/opencv/opencv)**.
+5. The masked images are converted back to a PDF again using PyMuPDF.
 """)
 
 # Input Section
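The five steps listed in the new Markdown text could be wired together roughly as follows. This is a minimal sketch and not the code in app.py: the function names `quad_to_box`, `boxes_to_mask`, and `redact_pdf`, the `dpi` default, and the choice to join OCR fragments with single spaces before running NER are all assumptions made for illustration. A page with a lot of text may exceed the model's input length and would need chunking, which the sketch does not handle.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # x_min, y_min, x_max, y_max


def quad_to_box(quad: Sequence[Sequence[float]]) -> Box:
    """Collapse a four-corner OCR quadrilateral into an axis-aligned box."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys))


def boxes_to_mask(ocr_results: Sequence[tuple], entities: Sequence[Dict]) -> List[Box]:
    """Return boxes of OCR fragments that overlap any NER character span.

    ocr_results: (quad, text, confidence) tuples, the shape EasyOCR returns.
    entities: dicts with "start"/"end" offsets into the space-joined OCR text.
    """
    boxes = []
    cursor = 0
    for quad, text, _conf in ocr_results:
        start, end = cursor, cursor + len(text)
        if any(e["start"] < end and e["end"] > start for e in entities):
            boxes.append(quad_to_box(quad))
        cursor = end + 1  # skip the joining space
    return boxes


def redact_pdf(src_path: str, dst_path: str, dpi: int = 150) -> None:
    """Hypothetical end-to-end driver; heavy dependencies are imported lazily."""
    import cv2
    import easyocr
    import fitz  # PyMuPDF
    import numpy as np
    from transformers import pipeline

    reader = easyocr.Reader(["en"])
    ner = pipeline("token-classification", model="dslim/distilbert-NER",
                   aggregation_strategy="simple")
    out = fitz.open()  # new, empty PDF
    with fitz.open(src_path) as doc:
        for page in doc:
            # Step 1: render the page to an RGB pixel buffer.
            pix = page.get_pixmap(dpi=dpi)
            rgb = np.frombuffer(pix.samples, dtype=np.uint8)
            rgb = rgb.reshape(pix.height, pix.width, pix.n)
            img = cv2.cvtColor(rgb, cv2.COLOR_RGB2BGR)  # writable BGR copy
            # Step 2: OCR the rendered image.
            ocr = reader.readtext(img)
            # Step 3: token classification on the joined OCR text.
            entities = ner(" ".join(text for _, text, _ in ocr))
            # Step 4: paint filled rectangles over matching fragments.
            for x0, y0, x1, y1 in boxes_to_mask(ocr, entities):
                cv2.rectangle(img, (x0, y0), (x1, y1), (0, 0, 0), thickness=-1)
            # Step 5: place the masked image on a new page of the same size.
            ok, png = cv2.imencode(".png", img)
            if not ok:
                raise RuntimeError("PNG encoding failed")
            new_page = out.new_page(width=page.rect.width, height=page.rect.height)
            new_page.insert_image(new_page.rect, stream=png.tobytes())
    out.save(dst_path)
```

Because the output pages contain only the masked raster image, the original text under each rectangle is not carried into the new PDF, which is one way to read "non-recoverable" in step 4.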