Sigurdur committed on
Commit
39f2d34
1 Parent(s): a3006de

Update README.md

Files changed (1)
  1. README.md +17 -6
README.md CHANGED
@@ -5,14 +5,24 @@ tags:
5
  - feature-extraction
6
  - sentence-similarity
7
  - transformers
8
-
 
9
  ---
10
 
11
- # {MODEL_NAME}
12
 
13
  This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
14
 
15
- <!--- Describe your model here -->
 
 
 
 
 
 
 
 
 
16
 
17
  ## Usage (Sentence-Transformers)
18
 
@@ -82,9 +92,9 @@ The model was trained with the parameters:
82
 
83
  **DataLoader**:
84
 
85
- `torch.utils.data.dataloader.DataLoader` of length 210819 with parameters:
86
  ```
87
- {'batch_size': 3, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
88
  ```
89
 
90
  **Loss**:
@@ -120,4 +130,5 @@ SentenceTransformer(
120
 
121
  ## Citing & Authors
122
 
123
- <!--- Describe where people can find more information -->
 
 
5
  - feature-extraction
6
  - sentence-similarity
7
  - transformers
8
+ language:
9
+ - is
10
  ---
11
 
12
+ # Icelandic SBERT for Sentence Embedding
13
 
14
  This is a [sentence-transformers](https://www.SBERT.net) model: It maps sentences & paragraphs to a 768 dimensional dense vector space and can be used for tasks like clustering or semantic search.
15
 
16
+ ## Data
17
+
18
+ The model was trained on 600,000 sentences selected at random from CLARIN-IS: [unannotated news2 from IGC (RMH)](https://repository.clarin.is/repository/xmlui/handle/20.500.12537/238)
19
+
20
+
21
+ To download the data, run the following command:
22
+
23
+ ```bash
24
+ curl --remote-name-all https://repository.clarin.is/repository/xmlui/bitstream/handle/20.500.12537/238{/IGC-News2-22.10.TEI.zip}
25
+ ```
26
 
27
  ## Usage (Sentence-Transformers)
28
 
 
92
 
93
  **DataLoader**:
94
 
95
+ `torch.utils.data.dataloader.DataLoader` of length 150000 with parameters:
96
  ```
97
+ {'batch_size': 2, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
98
  ```
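
As a sanity check on the DataLoader figures, a PyTorch `DataLoader` that keeps its last partial batch reports a length of ceil(num_examples / batch_size). The sketch below reproduces that arithmetic in plain Python. The helper name `num_batches` is hypothetical, and `drop_last=False` (the PyTorch default) is an assumption, since the parameter list shown here does not include it.

```python
import math


def num_batches(num_examples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a DataLoader would yield for the given dataset size."""
    if drop_last:
        # Incomplete final batch is discarded.
        return num_examples // batch_size
    # Incomplete final batch is kept.
    return math.ceil(num_examples / batch_size)


# A length of 150000 at batch_size=2 corresponds to 299,999 or 300,000 examples.
print(num_batches(300_000, 2))  # 150000
```

The part of the README shown here does not state how those training examples were built from the 600,000 sampled sentences, so the relationship between the two numbers is left open.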
99
 
100
  **Loss**:
 
130
 
131
  ## Citing & Authors
132
 
133
+ <!--- Describe where people can find more information -->
134
+ Sigurdur Haukur Birgisson