juhoinkinen committed on
Commit
8ac019e
1 Parent(s): 0ed6eec

Update README.md

Files changed (1)
  1. README.md +14 -0
README.md CHANGED
@@ -1,3 +1,17 @@
  ---
  license: cc-by-4.0
+ language:
+ - en
+ pipeline_tag: text-classification
+ tags:
+ - glam
+ - lam
+ - subject indexing
+ - annif
+ - hogwarts
  ---
+ # Hogwarts Sorting Hat using Annif and its fastText backend
+
+ The model is the output of [this Annif tutorial exercise](https://github.com/NatLibFi/Annif-tutorial/blob/master/exercises/OPT_hogwarts.md).
+
+ > The original Sorting Hat reads the thoughts of the student, but Annif generally does not have access to that kind of information, so we will simply use the name of the student as input. We will train a fastText model on the names of characters from the Harry Potter novels whose house is known. To make it possible to generalize the model to new, unseen names, we will use character n-grams to split all names into chunks of 1 to 4 characters - for example harry becomes [h, ha, har, harr, a, ar, arr, arry ...]. fastText can do this when given the minn and maxn parameters, which set the minimum and maximum length of character n-grams to generate from input text.
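
The character n-gram splitting described in the quoted passage can be sketched in plain Python. This is a minimal illustration under stated assumptions, not fastText's actual implementation: the helper `char_ngrams` is a hypothetical name, its `minn` and `maxn` arguments only mirror the fastText parameters of the same name, and the word-boundary markers `<` and `>` that fastText adds around each word are left out so that the output lines up with the example in the quote.

```python
def char_ngrams(word, minn=1, maxn=4):
    """Return every character n-gram of `word` whose length is between
    minn and maxn (inclusive), ordered by start position and then length.

    Hypothetical helper for illustration only; fastText itself also wraps
    each word in '<' and '>' before extracting n-grams.
    """
    ngrams = []
    for start in range(len(word)):
        for length in range(minn, maxn + 1):
            end = start + length
            if end > len(word):
                # Longer n-grams from this start position would run past the word.
                break
            ngrams.append(word[start:end])
    return ngrams


if __name__ == "__main__":
    print(char_ngrams("harry"))
```

With `minn=1` and `maxn=4`, the name `harry` yields 14 n-grams, starting with `h`, `ha`, `har`, `harr`, `a`, `ar`, `arr`, `arry`. Because an unseen name shares many of these short chunks with names in the training data, the classifier can still produce a prediction for it.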