Cicciokr committed on
Commit f614dbe · verified · 1 parent: 0652214

Update README.md

Files changed (1)
  1. README.md +13 -3
README.md CHANGED
@@ -1,3 +1,13 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ ---
+ This model is fine-tuned on:
+ - The Latin Library: 15M tokens
+ - Perseus Project: 15M tokens
+
+ The dataset was cleaned as follows:
+
+ - Removal of all "pseudo-Latin" text ("Lorem ipsum ...").
+ - Use of CLTK for sentence splitting and normalisation.
+ - Deduplication of the corpus.
+ - Lowercasing of all text.
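
The cleaning steps listed in the README can be sketched roughly as below. This is a minimal illustration and not the author's actual pipeline: the function name `clean_corpus` is hypothetical, a naive regex split stands in for CLTK's sentence splitter, Unicode NFC plus whitespace collapsing stands in for CLTK normalisation, and dropping a whole passage when it contains "lorem ipsum" is an assumption about how the pseudo-Latin filter worked.

```python
import re
import unicodedata

# Assumption: placeholder text is detected by the literal phrase "lorem ipsum".
PLACEHOLDER = re.compile(r"lorem\s+ipsum", re.IGNORECASE)
# Assumption: naive split on sentence-final punctuation, standing in for CLTK.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def clean_corpus(passages):
    """Return unique, lowercased sentences from an iterable of text passages."""
    seen = set()
    sentences = []
    for passage in passages:
        # 1. Drop passages that contain pseudo-Latin placeholder text.
        if PLACEHOLDER.search(passage):
            continue
        # 2. Normalise: Unicode NFC, then collapse runs of whitespace.
        text = unicodedata.normalize("NFC", passage)
        text = re.sub(r"\s+", " ", text).strip()
        # 3. Split into sentences, lowercase, and keep first occurrences only.
        for sentence in SENTENCE_END.split(text):
            sentence = sentence.strip().lower()
            if sentence and sentence not in seen:
                seen.add(sentence)
                sentences.append(sentence)
    return sentences
```

Deduplication here is exact-match at the sentence level, applied after lowercasing; the README does not say whether the original deduplication worked on sentences or whole documents.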