jnishi committed on
Commit
735c2d1
1 Parent(s): cb0e9ab

expand abbreviations of TLD.

  1. README.md +1 -1
README.md CHANGED
@@ -40,7 +40,7 @@ The training data used is
 #### Preprocessing
 The following filtering is done
 - Remove documents that do not use a single hiragana character. This removes English-only documents and documents in Chinese.
-- Whitelist-style filtering using TLD of URL to remove affiliate sites.
+- Whitelist-style filtering using the top level domain of URL to remove affiliate sites.
 
 #### Training Hyperparameters
 
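
The two preprocessing filters named in the README could be sketched roughly as below. This is an illustrative sketch only, not the project's actual pipeline: the README does not say which top-level domains are whitelisted or how hiragana is detected, so `ALLOWED_TLDS`, `has_hiragana`, `top_level_domain`, and `keep_document` are all hypothetical names, and the single `jp` entry is an assumed placeholder.

```python
import re
from urllib.parse import urlparse

# Hiragana letters in Unicode run from U+3041 to U+3096.
HIRAGANA = re.compile(r"[\u3041-\u3096]")

# Hypothetical whitelist: the README does not list the allowed TLDs.
ALLOWED_TLDS = frozenset({"jp"})


def has_hiragana(text: str) -> bool:
    """Return True if the text contains at least one hiragana character."""
    return HIRAGANA.search(text) is not None


def top_level_domain(url: str) -> str:
    """Return the last dot-separated label of the URL's host, or '' if none."""
    host = urlparse(url).hostname or ""
    if "." not in host:
        return ""
    return host.rsplit(".", 1)[-1].lower()


def keep_document(text: str, url: str, allowed_tlds=ALLOWED_TLDS) -> bool:
    """Keep a document only if it passes both filters."""
    return has_hiragana(text) and top_level_domain(url) in allowed_tlds
```

Under these assumptions, `keep_document("これはテストです", "https://example.jp/a")` keeps the document, while an English-only or Chinese-only text, or a URL under a TLD outside the whitelist, is dropped.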