Commit 084c73c
Author: Alex Strick van Linschoten
1 Parent(s): 2e134aa

update article and space

Files changed (2)
  1. app.py +11 -3
  2. article.md +21 -0
app.py CHANGED
@@ -14,8 +14,12 @@ def predict(img):
 
 
 title = "Redacted Document Classifier"
+
 description = "A classifier trained on publicly released redacted (and unredacted) FOIA documents, using fastai."
-article = "<p style='text-align: center'><a href='https://mlops.systems/fastai/redactionmodel/computervision/datalabelling/2021/09/06/redaction-classification-chapter-2.html' target='_blank'>Blog post</a></p>"
+
+with open("article.md") as f:
+    article = f.read()
+
 examples = [
     "test1.jpg",
     "test2.jpg",
@@ -28,7 +32,7 @@ enable_queue = True
 theme = "default"
 allow_flagging = "never"
 
-gr.Interface(
+demo = gr.Interface(
     fn=predict,
     inputs=gr.inputs.Image(shape=(1024, 1024)),
     outputs=gr.outputs.Label(num_top_classes=3),
@@ -39,5 +43,9 @@ gr.Interface(
     allow_flagging=allow_flagging,
     examples=examples,
     interpretation=interpretation,
+)
+
+demo.launch(
+    cache_examples=True,
     enable_queue=enable_queue,
-).launch(cache_examples=True)
+)
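The app.py change reads `article.md` using a path relative to the working directory and the platform's default encoding. A slightly more defensive variant is sketched below. The helper name `load_article`, the `fallback` parameter, and the `BASE_DIR` constant are hypothetical and not part of this commit. It resolves the file next to the script, reads it as UTF-8, and falls back to an empty string if the file is missing.

```python
from pathlib import Path


def load_article(path: Path, fallback: str = "") -> str:
    """Return the Markdown text at `path`, or `fallback` if the file is missing."""
    try:
        # Explicit UTF-8 avoids depending on the platform's default encoding.
        return path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return fallback


# Resolve relative to this file rather than the current working directory.
# `__file__` can be undefined in interactive sessions, hence the cwd fallback.
BASE_DIR = Path(__file__).resolve().parent if "__file__" in globals() else Path.cwd()
article = load_article(BASE_DIR / "article.md")
```

The resulting `article` string can be passed to `gr.Interface(..., article=article)` exactly as in the diff above.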
article.md ADDED
@@ -0,0 +1,21 @@
+I've been working through the first two lessons of [the fastai course](https://course.fast.ai/). For lesson one I trained a model to recognise my cat, Mr Blupus. For lesson two the emphasis is on getting those models out in the world as some kind of demo or application. [Gradio](https://gradio.app) and [Hugging Face Spaces](https://huggingface.co/spaces) make it super easy to get a prototype of your model on the internet.
+
+## The Dataset
+
+I downloaded a few thousand publicly available FOIA documents from a government website. I split the PDFs up into individual `.jpg` files and then used [Prodigy](https://prodi.gy/) to annotate the data. (This process was described in [a blog post written last year](https://mlops.systems/fastai/redactionmodel/computervision/datalabelling/2021/09/06/redaction-classification-chapter-2.html).)
+
+## Training the model
+
+I trained the model with fastai's flexible `vision_learner`, fine-tuning `resnet18`, which was both smaller than `resnet34` (no surprises there) and less liable to early overfitting. I trained the model for 10 epochs.
+
+## Further Reading
+
+This initial dataset spurred an ongoing interest in the domain and I've since been working on the problem of object detection, i.e. identifying exactly which parts of the image contain redactions.
+
+Some of the key blog posts I've written about this project:
+
+- How to annotate data for an object detection problem with Prodigy ([link](https://mlops.systems/redactionmodel/computervision/datalabelling/2021/11/29/prodigy-object-detection-training.html))
+- How to create synthetic images to supplement a small dataset ([link](https://mlops.systems/redactionmodel/computervision/python/tools/2022/02/10/synthetic-image-data.html))
+- How to use error analysis and visual tools like FiftyOne to improve model performance ([link](https://mlops.systems/redactionmodel/computervision/tools/debugging/jupyter/2022/03/12/fiftyone-computervision.html))
+- Creating more synthetic data focused on the tasks my model finds hard ([link](https://mlops.systems/tools/redactionmodel/computervision/2022/04/06/synthetic-data-results.html))
+- Data validation for object detection / computer vision (a three-part series: [part 1](https://mlops.systems/tools/redactionmodel/computervision/datavalidation/2022/04/19/data-validation-great-expectations-part-1.html), [part 2](https://mlops.systems/tools/redactionmodel/computervision/datavalidation/2022/04/26/data-validation-great-expectations-part-2.html), [part 3](https://mlops.systems/tools/redactionmodel/computervision/datavalidation/2022/04/28/data-validation-great-expectations-part-3.html))
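The training step that article.md describes (fine-tuning `resnet18` with `vision_learner` for 10 epochs) might look roughly like the sketch below. It is not the author's actual training code. The function name `train_redaction_classifier`, the one-sub-folder-per-class layout, the 20% validation split, the seed, and the 224-pixel resize are all assumptions, and `fine_tune` is only one plausible way to run those epochs. It needs a fastai release that provides `vision_learner`.

```python
def train_redaction_classifier(data_dir, epochs=10):
    """Fine-tune a resnet18 image classifier on the images under `data_dir`.

    Assumes (hypothetically) one sub-folder per class, e.g. redacted/ and unredacted/.
    """
    # Imported lazily so this module can be loaded without fastai installed.
    from fastai.vision.all import (
        ImageDataLoaders,
        Resize,
        error_rate,
        resnet18,
        vision_learner,
    )

    dls = ImageDataLoaders.from_folder(
        data_dir,
        valid_pct=0.2,          # assumed hold-out fraction
        seed=42,                # assumed, for a reproducible split
        item_tfms=Resize(224),  # assumed input size
    )
    learn = vision_learner(dls, resnet18, metrics=error_rate)
    # fine_tune runs one frozen epoch by default, then `epochs` unfrozen epochs.
    learn.fine_tune(epochs)
    return learn
```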