davanstrien (HF staff) committed
Commit de3e826 • 1 parent: 241c7f8

Update app.py

Files changed (1)
  1. app.py +5 -0
app.py CHANGED
@@ -31,6 +31,7 @@ description = """
 V2 of a British Library Books genre detection model. The interpretation interface shows what the model is using to make its predictions. Words highlighted in red contributed to the model being more confident about a prediction. The intensity of colour corresponds to the importance of that part of the input. The words that decrease the label confidence are highlighted in blue."""
 
 article = """
+
 # British Library Books genre detection demo
 This demo allows you to play with a 'genre' detection model which has been trained to predict, from the title of a book, whether it is 'fiction' or 'non-fiction'.
 The model was trained with the [fastai](https://docs.fast.ai/) library on training data drawn from [digitised books](https://www.bl.uk/collection-guides/digitised-printed-books) at the British Library. These Books are mainly from the 19th Century.
@@ -45,17 +46,21 @@ Vanity Fair. A novel without a hero ... With all the original illustrations by t
 You can see that the model gets a bit of help with the genre here 😉. Since the model was trained for a very particular dataset and task it might not work well on titles that don't match this original corpus.
 
 ## Background
+
 This model was developed as part of work by the [Living with Machines](https://livingwithmachines.ac.uk/). The process of training the model and working with the data is documented in a [tutorial](github.com/living-with-machines/genre-classification).
 
 ## Model description
+
 This model is intended to predict, from the title of a book, whether it is 'fiction' or 'non-fiction'. This model was trained on data created from the [Digitised printed books (18th-19th Century)](https://www.bl.uk/collection-guides/digitised-printed-books) book collection.
 This dataset is dominated by English language books though it includes books in several other languages in much smaller numbers. This model was originally developed for use as part of the Living with Machines project to be able to 'segment' this large dataset of books into different categories based on a 'crude' classification of genre i.e. whether the title was `fiction` or `non-fiction`.
 You can find more information about the model [here]((https://doi.org/10.5281/zenodo.5245175))
 
 ## Training data
+
 The model is trained on a particular collection of books digitised by the British Library. As a result, the model may do less well on titles that look different to this data. In particular, the training data, was mostly English, and mostly from the 19th Century. The model is likely to do less well with non-English languages and book titles which fall outside of the 19th Century. Since the data was derived from books catalogued by the British Library it is also possible the model will perform less well for books held by other institutions if, for example, they catalogue book titles in different ways, or have different biases in the types of books they hold. Some of the data was generated using weak supervision. You can learn more about how this was done [here](https://living-with-machines.github.io/genre-classification/04_snorkel.html)
 
 ### Credits
+
 >This work was partly supported by [Living with Machines](https://livingwithmachines.ac.uk/). This project, funded by the UK Research and Innovation (UKRI) Strategic Priority Fund, is a multidisciplinary collaboration delivered by the Arts and Humanities Research Council (AHRC), with The Alan Turing Institute, the British Library and the Universities of Cambridge, East Anglia, Exeter, and Queen Mary University of London.
 """
 
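The `description` string in this diff describes the colour coding. Red words raised the model's confidence in a label and blue words lowered it. The diff does not say how those word scores are computed. One common approach, and only an assumption here, is leave-one-out occlusion: remove each word in turn and measure how much the label probability changes. The sketch below uses a hypothetical keyword scorer, `toy_fiction_prob`, in place of the real fastai model. Its weights are invented purely for illustration.

```python
from typing import Callable, List, Tuple


def toy_fiction_prob(title: str) -> float:
    """Hypothetical stand-in for the real model: returns P(label == "fiction").

    The keyword weights are invented for illustration only.
    """
    weights = {"novel": 0.4, "tale": 0.3, "history": -0.3, "survey": -0.3}
    score = 0.5 + sum(weights.get(w.lower(), 0.0) for w in title.split())
    return min(max(score, 0.0), 1.0)  # clamp to a valid probability


def leave_one_out(title: str, prob: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Score each word as P(full title) - P(title with that word removed).

    Positive scores mean the word raised the label confidence (red in the demo);
    negative scores mean it lowered the confidence (blue in the demo).
    """
    words = title.split()
    base = prob(title)
    scores = []
    for i, word in enumerate(words):
        ablated = " ".join(words[:i] + words[i + 1:])  # drop only the i-th word
        scores.append((word, base - prob(ablated)))
    return scores


if __name__ == "__main__":
    for word, score in leave_one_out("Vanity Fair. A novel without a hero", toy_fiction_prob):
        print(f"{word:>8} {score:+.2f}")
```

With this toy scorer, `novel` gets a positive score and would be shaded red, while every other word in that title scores zero. In "A history of England", `history` gets a negative score and would be shaded blue. The real interface may use a different attribution method.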
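Each hunk header gives a starting line and a line count for the old (`-`) and new (`+`) file. This commit only inserts lines. So the growth in line count across the two hunks should add up to the `+5 -0` shown in the file summary. Here is a small sketch of that arithmetic. `parse_hunk` is a hypothetical helper written for this example, not part of any Hugging Face tooling.

```python
import re
from typing import Tuple

# Matches unified-diff hunk headers such as "@@ -31,6 +31,7 @@".
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk(header: str) -> Tuple[int, int, int, int]:
    """Return (old_start, old_count, new_start, new_count); an omitted count means 1."""
    m = HUNK_RE.match(header)
    if m is None:
        raise ValueError(f"not a hunk header: {header!r}")
    old_start, old_count, new_start, new_count = m.groups()
    return int(old_start), int(old_count or 1), int(new_start), int(new_count or 1)


headers = [
    '@@ -31,6 +31,7 @@ description = """',
    "@@ -45,17 +46,21 @@ Vanity Fair. A novel without a hero ... With all the original illustrations by t",
]
hunks = [parse_hunk(h) for h in headers]

# Net lines added = sum over hunks of (new_count - old_count): here 1 + 4.
net_added = sum(new_count - old_count for _, old_count, _, new_count in hunks)
print(net_added)  # 5
```

The second hunk starts at new line 46 rather than 45 because the first hunk already inserted one line.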