Sigurdur committed on
Commit
fa73879
1 Parent(s): c545dbb

Create README.md

Files changed (1)
  1. README.md +20 -0
README.md ADDED
@@ -0,0 +1,20 @@
+ # ISL-SBERT small
+
+ A sentence transformer trained with TSDAE (Transformer-based Sequential Denoising Auto-Encoder), an unsupervised technique. The model takes Icelandic text as input and produces sentence embeddings from it.
+
+ Based on this [article](https://www.pinecone.io/learn/unsupervised-training-sentence-transformers/).
+
+
+ ## Data
+
+ The model was trained on 100,000 sentences selected at random from a CLARIN-IS dataset: [unannotated News2 from IGC (RMH)](https://repository.clarin.is/repository/xmlui/handle/20.500.12537/238).
+
+ To download the data, run the following command:
+
+ ```bash
+ curl --remote-name-all https://repository.clarin.is/repository/xmlui/bitstream/handle/20.500.12537/238{/IGC-News2-22.10.TEI.zip}
+ ```
+
+ ## Author
+
+ Sigurður Haukur Birgisson
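
The Data section of the README says the training set was 100,000 sentences selected at random from the corpus, but the commit does not show how the selection was done. The following is a minimal sketch of one way to draw such a sample, assuming the sentences have already been extracted from the TEI files into an iterable of strings. The function name `sample_sentences`, the fixed seed, and the toy corpus are hypothetical and not part of the repository; reservoir sampling is used here only because it avoids loading the whole corpus into memory.

```python
import random
from typing import Iterable, List


def sample_sentences(sentences: Iterable[str], k: int, seed: int = 0) -> List[str]:
    """Draw k sentences uniformly at random from a stream (reservoir sampling).

    Hypothetical helper: the README does not describe the actual sampling code.
    If the stream holds fewer than k sentences, all of them are returned.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    reservoir: List[str] = []
    for i, sentence in enumerate(sentences):
        if i < k:
            # Fill the reservoir with the first k sentences.
            reservoir.append(sentence)
        else:
            # Keep the new sentence with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = sentence
    return reservoir


# Toy stand-in for the corpus; the real data would be read from the TEI files.
corpus = (f"setning {n}" for n in range(1_000))
subset = sample_sentences(corpus, k=100)
print(len(subset))
```

For the setup described in the README, `k` would be `100_000` and `sentences` would iterate over the extracted news sentences.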