Alexander Seifert committed
Commit 8ab50ed
1 Parent(s): 554bac5

improve docs

Files changed (1): html/index.md (+12 -2)

html/index.md CHANGED
@@ -3,7 +3,7 @@ title: "🏷️ ExplaiNER"
 subtitle: "Error Analysis for NER models & datasets"
 ---
 
-Error Analysis is an important but often overlooked part of the data science project lifecycle, for which there is still very little tooling available. Practitioners tend to write throwaway code or, worse, skip this crucial step of understanding their models' errors altogether. This project tries to provide an extensive toolkit to probe any NER model/dataset combination, find labeling errors and understand the models' and datasets' limitations, leading the user on her way to further improvements.
 
 [Documentation](../doc/index.html) | [Slides](../presentation.pdf) | [Github](https://github.com/aseifert/ExplaiNER)
 
@@ -51,6 +51,8 @@ A group of neurons tend to fire in response to commas and other punctuation. Oth
 
 For every token in the dataset, we take its hidden state and project it onto a two-dimensional plane. Data points are colored by label/prediction, with mislabeled examples marked by a small black border.
 
 
 ### Probing
 
@@ -61,21 +63,29 @@ A very direct and interactive way to test your model is by providing it with a l
 
 The metrics page contains precision, recall and f-score metrics as well as a confusion matrix over all the classes. By default, the confusion matrix is normalized. There's an option to zero out the diagonal, leaving only prediction errors (here it makes sense to turn off normalization, so you get raw error counts).
 
 
 ### Misclassified
 
-This page contains all misclassified examples and allows filtering by specific error types.
 
 
 ### Loss by Token/Label
 
 Show count, mean and median loss per token and label.
 
 
 ### Samples by Loss
 
 Show every example sorted by loss (descending) for close inspection.
 
 
 ### Random Samples
 
 
@@ -3,7 +3,7 @@ title: "🏷️ ExplaiNER"
 subtitle: "Error Analysis for NER models & datasets"
 ---
 
+_Error Analysis is an important but often overlooked part of the data science project lifecycle, for which there is still very little tooling available. Practitioners tend to write throwaway code or, worse, skip this crucial step of understanding their models' errors altogether. This project tries to provide an extensive toolkit to probe any NER model/dataset combination, find labeling errors and understand the models' and datasets' limitations, leading the user on her way to further improvements._
 
 [Documentation](../doc/index.html) | [Slides](../presentation.pdf) | [Github](https://github.com/aseifert/ExplaiNER)
 
 
@@ -51,6 +51,8 @@ A group of neurons tend to fire in response to commas and other punctuation. Oth
 
 For every token in the dataset, we take its hidden state and project it onto a two-dimensional plane. Data points are colored by label/prediction, with mislabeled examples marked by a small black border.
 
+Using these projections you can visually identify data points that end up in the wrong neighborhood, indicating prediction/labeling errors.
+
 
 ### Probing
 
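The projection step described above can be sketched in a few lines. This is a minimal illustration only: PCA via plain NumPy stands in for whichever projection ExplaiNER actually uses (e.g. UMAP), and the hidden states here are randomly generated stand-ins for real model activations.

```python
import numpy as np

def project_hidden_states(hidden: np.ndarray) -> np.ndarray:
    """Project per-token hidden states onto a two-dimensional plane via PCA.

    `hidden` has shape (n_tokens, hidden_dim), e.g. last-layer activations
    of a token-classification model. PCA keeps this sketch dependency-free;
    the app's actual projection method may differ.
    """
    centered = hidden - hidden.mean(axis=0)      # PCA requires centered data
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T                   # coordinates on the top-2 components

# Toy stand-in for real hidden states: 100 tokens, 768-dim activations.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(100, 768))
xy = project_hidden_states(hidden)
print(xy.shape)  # (100, 2)
```

Each row of `xy` can then be plotted and colored by label/prediction; mislabeled points would get the black border described above.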
 
@@ -61,21 +63,29 @@ A very direct and interactive way to test your model is by providing it with a l
 
 The metrics page contains precision, recall and f-score metrics as well as a confusion matrix over all the classes. By default, the confusion matrix is normalized. There's an option to zero out the diagonal, leaving only prediction errors (here it makes sense to turn off normalization, so you get raw error counts).
 
+With the confusion matrix, you don't want any of the classes to end up in the bottom-right quarter: those are frequent but error-prone.
+
 
 ### Misclassified
 
+This page contains all misclassified examples and allows filtering by specific error types. This helps you get an understanding of the types of errors your model makes.
 
 
 ### Loss by Token/Label
 
 Show count, mean and median loss per token and label.
 
+Look out for tokens that have a big gap between mean and median loss, indicating systematic labeling issues.
+
 
 ### Samples by Loss
 
 Show every example sorted by loss (descending) for close inspection.
 
+Apart from a (token-based) dataframe view, there's also an HTML representation of the samples, which is very information-dense but really helpful once you get used to reading it:
+
+Every predicted entity (every token, really) gets a black border. The text color signifies the predicted label, with the first token of a sequence of tokens also showing the label's icon. If (and only if) the prediction is wrong, a small box after the entity (token) contains the correct target class, with a background color corresponding to that class.
+
 
 ### Random Samples
 
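The confusion-matrix options from the Metrics section (row normalization, zeroing the diagonal to keep only errors) can be sketched as follows. `confusion_matrix` and `view` are hypothetical helper names for illustration, not ExplaiNER's actual API.

```python
import numpy as np

def confusion_matrix(targets, preds, n_classes):
    """Raw confusion matrix: rows = true class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=np.int64)
    for t, p in zip(targets, preds):
        cm[t, p] += 1
    return cm

def view(cm, normalize=True, zero_diagonal=False):
    """The two toggles described on the Metrics page."""
    cm = cm.astype(np.float64)
    if zero_diagonal:
        np.fill_diagonal(cm, 0)  # keep only prediction errors
    if normalize:
        rows = cm.sum(axis=1, keepdims=True)
        cm = np.divide(cm, rows, out=np.zeros_like(cm), where=rows > 0)
    return cm

# Tiny made-up example with 3 classes and 2 misclassifications.
targets = [0, 0, 1, 1, 2, 2]
preds   = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(targets, preds, n_classes=3)
errors = view(cm, normalize=False, zero_diagonal=True)  # raw error counts
print(int(errors.sum()))  # 2 misclassified examples
```

With the diagonal zeroed and normalization off, every remaining cell is a raw count of one specific true-class/predicted-class error, which is exactly what the page's "raw error counts" mode shows.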
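The Loss by Token/Label and Samples by Loss views boil down to simple aggregations over per-token records. Here is a sketch with made-up data; the column names and the dataframe layout are assumptions, not ExplaiNER's actual schema.

```python
import pandas as pd

# Hypothetical per-token records: surface form, gold label, and loss.
df = pd.DataFrame({
    "token": ["Paris", "Paris", "Paris", "the", "the", "London"],
    "label": ["B-LOC", "B-LOC", "B-LOC", "O", "O", "B-LOC"],
    "loss":  [0.02, 0.03, 4.50, 0.01, 0.02, 0.40],
})

# Count, mean and median loss per token, as on the Loss by Token/Label page.
stats = df.groupby("token")["loss"].agg(["count", "mean", "median"])

# A large mean-median gap flags tokens whose loss is dominated by a few
# outliers -- often a sign of inconsistent labeling.
stats["gap"] = stats["mean"] - stats["median"]
print(stats.sort_values("gap", ascending=False))

# Samples by Loss: inspect the highest-loss examples first.
print(df.sort_values("loss", ascending=False).head())
```

In this toy data, "Paris" is usually predicted with near-zero loss but has one 4.5-loss occurrence, so its mean far exceeds its median: exactly the pattern the page tells you to look out for.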