davanstrien HF staff committed on
Commit
6128d56
1 Parent(s): 382890c

Update README.md

  1. README.md +52 -1
README.md CHANGED
@@ -8,5 +8,56 @@ library_name: generic
8
 
9
  ---
10
 
11
- # todo
11
+ ## Model description
12
+
13
+ This model is intended to predict, from the title of a book, whether it is 'fiction' or 'non-fiction'.
14
+
15
+ This model was trained on data created from the Digitised printed books (18th-19th Century) book collection. The datasets in this collection are derived from 49,455 digitised books (65,227 volumes), mainly from the 19th Century. The collection is dominated by English-language books and includes books in several other languages in much smaller numbers.
16
+
17
+ This model was originally developed as part of the Living with Machines project to 'segment' this large dataset of books into categories based on a 'crude' classification of genre, i.e. whether the title was `fiction` or `non-fiction`.
18
+
19
+ The model's training data (discussed more below) primarily consists of 19th Century book titles from the British Library Digitised printed books (18th-19th century) collection. These books have been catalogued according to British Library cataloguing practices. The model is likely to perform worse on book titles from earlier or later periods. Although the model is multilingual in the sense that its training data includes non-English book titles, these titles appear much less frequently.
20
+
21
+ ## How to use
22
+
23
+ To use this within fastai, first install version 2 of the fastai library. You can load directly from the Hugging Face hub using the [`huggingface_hub`](https://github.com/huggingface/huggingface_hub) library.
24
+
25
+ ```python
+ from fastai.text.all import load_learner
+ from huggingface_hub import hf_hub_download
+
+ # Download the exported learner from the Hugging Face Hub, then load it
+ learn = load_learner(
+     hf_hub_download('davanstrien/bl-books-genre-fastai', filename="model.pkl")
+ )
+ learn.predict("Oliver Twist")
+ ```
33
+
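As a minimal sketch of applying the loaded learner to several titles at once, the helper below wraps any prediction function. It assumes the prediction function returns a 3-tuple of (decoded label, class index, class probabilities), which is how fastai's `Learner.predict` is generally documented to behave. The name `classify_titles`, the stand-in `fake_predict`, and its label and probabilities are hypothetical and exist only so the example runs without downloading the model.

```python
def classify_titles(predict_fn, titles):
    """Return (title, label, confidence) for each title.

    Assumption: predict_fn(title) returns (decoded_label, class_index, probabilities),
    as fastai's Learner.predict is expected to.
    """
    results = []
    for title in titles:
        label, _, probs = predict_fn(title)
        # Confidence is taken as the highest class probability
        results.append((title, str(label), float(max(probs))))
    return results


# Hypothetical stand-in for `learn.predict`, for illustration only
def fake_predict(title):
    return ("Fiction", 0, [0.9, 0.1])


results = classify_titles(fake_predict, ["Oliver Twist"])
print(results)
```

With the real model, `classify_titles(learn.predict, titles)` would be the corresponding call, provided the assumption about the return value holds for your fastai version.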
34
+ ## Limitations and bias
35
+
36
+ The model was developed based on data from the British Library's Digitised printed books (18th-19th Century) collection. This dataset is not representative of books from the period covered, with biases towards certain types (travel) and a likely absence of books that were difficult to digitise.
37
+
38
+ The formatting of the British Library books corpus titles may differ from other collections, resulting in worse performance on other collections. It is recommended to evaluate the performance of the model before applying it to your own data. This model is unlikely to perform well on contemporary book titles without further fine-tuning.
39
+
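One way to follow the recommendation above is to hand-label a small sample of your own titles and compare those labels with the model's predictions. The sketch below computes per-class precision, recall, and F1 in plain Python. The function name `per_class_report` and the example labels are hypothetical; in practice the `predicted` list would come from running the model over your sample.

```python
from collections import Counter


def per_class_report(gold, predicted):
    """Return {label: (precision, recall, f1, support)} for two parallel label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    true_pos = Counter(g for g, p in zip(gold, predicted) if g == p)
    pred_count = Counter(predicted)  # how often each label was predicted
    support = Counter(gold)          # how often each label truly occurs
    report = {}
    for label in sorted(set(gold) | set(predicted)):
        tp = true_pos[label]
        precision = tp / pred_count[label] if pred_count[label] else 0.0
        recall = tp / support[label] if support[label] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1, support[label])
    return report


# Hypothetical hand-labelled sample and model outputs, for illustration only
gold = ["fiction", "fiction", "non-fiction", "non-fiction", "non-fiction"]
predicted = ["fiction", "non-fiction", "non-fiction", "non-fiction", "fiction"]
report = per_class_report(gold, predicted)
for label, (p, r, f1, n) in report.items():
    print(f"{label:12s} precision={p:.2f} recall={r:.2f} f1={f1:.2f} support={n}")
```

If the scores on your own sample fall well below those reported for the held-out test set, that suggests the titles differ enough from the training data to warrant fine-tuning.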
40
+ ## Training data
41
+
42
+ The training data was created using the Zooniverse platform. British Library cataloguers carried out the majority of the annotations used as training data. More information on the process of creating the training data will be available soon.
43
+
44
+ ### Training procedure
45
+
46
+ Model training was carried out using the fastai library version 2.5.2.
47
+
48
+ The notebook used for training the model is available at: https://github.com/Living-with-machines/genre-classification
49
+
50
+ ## Eval result
51
+
52
+ The model was evaluated on a held-out test set:
53
+ ```
+               precision    recall  f1-score   support
+
+      Fiction       0.91      0.88      0.90       296
+  Non-fiction       0.94      0.95      0.95       554
+
+     accuracy                           0.93       850
+    macro avg       0.93      0.92      0.92       850
+ weighted avg       0.93      0.93      0.93       850
+ ```
63
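To help read the report, the sketch below recomputes the support-weighted averages from the per-class rows and derives a majority-class baseline. The figures are copied from the report. Because the per-class scores are already rounded to two decimals, the recomputed averages only match the reported ones to within rounding. The baseline is not part of the original evaluation; it is simply the accuracy a classifier would get by always predicting the larger class.

```python
# Per-class figures copied from the report: (precision, recall, f1, support)
rows = {
    "Fiction": (0.91, 0.88, 0.90, 296),
    "Non-fiction": (0.94, 0.95, 0.95, 554),
}

total = sum(r[3] for r in rows.values())  # size of the test set


def weighted_avg(col):
    """Average one metric column, weighting each class by its support."""
    return sum(r[col] * r[3] for r in rows.values()) / total


weighted = {
    "precision": weighted_avg(0),
    "recall": weighted_avg(1),
    "f1": weighted_avg(2),
}

# Accuracy of always predicting the most frequent class
majority_baseline = max(r[3] for r in rows.values()) / total

print(f"test set size: {total}")
for name, value in weighted.items():
    print(f"weighted {name}: {value:.3f}")
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
```

Since 554 of the 850 test titles are non-fiction, always predicting non-fiction would score roughly 0.65 accuracy, which gives some context for the reported 0.93.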