---
base_model: sentence-transformers/LaBSE
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:23999
- loss:MultipleNegativesRankingLoss
widget:
- source_sentence: Who led thee through that great and terrible wilderness , wherein
    were fiery serpents , and scorpions , and drought , where there was no water  ;
    who brought thee forth water out of the rock of flint  ;
  sentences:
  - bad u ai ïa ki ha u Aaron bad ki khun shynrang jong u .
  - U la ïalam ïa phi lyngba ka ri shyiap kaba ïar bad kaba ishyrkhei eh , ha kaba
    la don ki bseiñ kiba don bih bad ki  ñianglartham . Ha kata ka ri kaba tyrkhong
    bad ka bym don um , u la pynmih um na u mawsiang na ka bynta jong phi .
  - Ki paidbah na ki jait ba na shatei ki phah khot ïa u , bad nangta ma ki baroh
    ki ïaleit lang sha u Rehoboam bad ki ong ha u ,
- source_sentence: And , behold , Boaz came from Beth–lehem , and said unto the reapers
    , The  Lord  be with you . And they answered him , The  Lord  bless thee .
  sentences:
  - Ko ki briew bymïaineh , to wan noh ; phi long ki jong nga . Ngan shim iwei na
    phi na kawei kawei ka shnong bad ar ngut na kawei kawei ka kur , bad ngan wallam
    pat ïa phi sha u lum Seïon .
  - Hadien katto katne por u Boas da lade hi u wan poi na Bethlehem bad u ai khublei
    ïa ki nongtrei .  To U  Trai  un long ryngkat bad phi !  u ong .  U  Trai  u kyrkhu
    ïa phi !  ki jubab .
  - U Trai u la ong ha u ,  Khreh bad leit sha  Ka Lynti Ba-beit ,’ bad ha ka ïing
    jong u Judas kylli ïa u briew na Tarsos uba kyrteng u Saul .
- source_sentence: Jehovah used the prehuman Jesus as his “master worker” in creating
    all other things in heaven and on earth .
  sentences:
  - Shuwa ba un wan long briew U Jehobah u la pyndonkam ïa u Jisu kum u “rangbah nongtrei”
    ha kaba thaw ïa kiei kiei baroh kiba don ha bneng bad ha khyndew .
  - Shisien la don u briew uba la leit ban bet symbai . Katba u dang bet ïa u symbai
    , katto katne na u , ki la hap ha shi lynter ka lynti ïaid kjat , ha kaba ki la
    shah ïuh , bad ki sim ki la bam lut .
  - Ngan ïathuh ïa ka shatei ban shah ïa ki ban leit bad ïa ka shathie ban ym bat
    noh ïa ki . Ai ba ki briew jong nga ki wan phai na ki ri bajngai , na man la ki
    bynta baroh jong ka pyrthei .
- source_sentence: 'The like figure whereunto even baptism doth also now save us (
    not the putting away of the filth of the flesh , but the answer of a good conscience
    toward God , ) by the resurrection of Jesus Christ :'
  sentences:
  - kaba long ka dak kaba kdew sha ka jingpynbaptis , kaba pyllait im ïa phi mynta
    . Kam dei ka jingsait noh ïa ka jakhlia na ka met , hynrei ka jingkular ba la
    pynlong sha U Blei na ka jingïatiplem babha . Ka pynim ïa phi da ka jingmihpat
    jong U Jisu Khrist ,
  - Ki briew kiba sniew kin ïoh ïa kaei kaba ki dei ban ïoh . Ki briew kiba bha kin
    ïoh bainong na ka bynta ki kam jong ki .
  - Nangta nga la ïohi ïa ka bneng bathymmai bad ïa ka pyrthei bathymmai . Ka bneng
    banyngkong bad ka pyrthei banyngkong ki la jah noh , bad ka duriaw kam don shuh
    .
- source_sentence: On that day they read in the book of Moses in the audience of the
    people  ; and therein was found written , that the Ammonite and the Moabite should
    not come into the congregation of God for ever  ;
  sentences:
  - U Elisha u la ïap bad la tep ïa u . Man la ka snem ki kynhun jong ki Moab ki ju
    wan tur thma ïa ka ri Israel .
  - Katba dang pule jam ïa ka Hukum u Moses ha u paidbah , ki poi ha ka bynta kaba
    ong ba ym dei ban shah ïa u nong Amon ne u nong Moab ban ïasnohlang bad ki briew
    jong U Blei .
  - U angel u la jubab ,  U Mynsiem Bakhuid un sa wan ha pha , bad ka bor jong U Blei
    kan shong halor jong pha . Na kane ka daw , ïa i khunlung bakhuid yn khot U Khun
    U Blei .
---

# SentenceTransformer based on sentence-transformers/LaBSE

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/LaBSE](https://huggingface.co/sentence-transformers/LaBSE). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [sentence-transformers/LaBSE](https://huggingface.co/sentence-transformers/LaBSE) <!-- at revision e34fab64a3011d2176c99545a93d5cbddc9a91b7 -->
- **Maximum Sequence Length:** 256 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Dense({'in_features': 768, 'out_features': 768, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
  (3): Normalize()
)
```
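
The four modules above can be read as a small pipeline: take the `[CLS]` token vector, pass it through a dense layer with a Tanh activation, then L2-normalize the result. The following is a minimal sketch of that flow in plain Python, not the library's implementation. The toy 4-dimensional vectors, the identity weight matrix, and the helper names `cls_pool`, `dense_tanh`, and `l2_normalize` are assumptions made for illustration; the real model uses 768 dimensions and learned weights.

```python
import math


def cls_pool(token_embeddings):
    """Mimic pooling_mode_cls_token=True: keep only the first token's vector."""
    return token_embeddings[0]


def dense_tanh(vector, weight, bias):
    """Apply a dense layer followed by Tanh: tanh(W x + b)."""
    return [
        math.tanh(sum(w * x for w, x in zip(row, vector)) + b)
        for row, b in zip(weight, bias)
    ]


def l2_normalize(vector):
    """Scale the vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


# Hypothetical toy input: 3 tokens with 4 dimensions each (the model uses 768).
token_embeddings = [
    [0.5, -1.0, 0.25, 2.0],  # [CLS] token, the only one CLS pooling keeps
    [9.0, 9.0, 9.0, 9.0],
    [-3.0, -3.0, -3.0, -3.0],
]
# Identity weights and zero bias stand in for the learned Dense parameters.
weight = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
bias = [0.0, 0.0, 0.0, 0.0]

embedding = l2_normalize(dense_tanh(cls_pool(token_embeddings), weight, bias))
print(embedding)
```

Because the last module normalizes every embedding to unit length, the cosine similarity between two embeddings equals their dot product.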

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ABHIiiii1/LaBSE-Fine-Tuned-EN-KHA")
# Run inference
sentences = [
    'On that day they read in the book of Moses in the audience of the people  ; and therein was found written , that the Ammonite and the Moabite should not come into the congregation of God for ever  ;',
    'Katba dang pule jam ïa ka Hukum u Moses ha u paidbah , ki poi ha ka bynta kaba ong ba ym dei ban shah ïa u nong Amon ne u nong Moab ban ïasnohlang bad ki briew jong U Blei .',
    'U Elisha u la ïap bad la tep ïa u . Man la ka snem ki kynhun jong ki Moab ki ju wan tur thma ïa ka ri Israel .',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
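
The similarity scores are cosine similarities, so a typical next step is to rank candidate sentences against a source sentence and keep the highest-scoring one. Below is a minimal sketch of that ranking in plain Python. The 3-dimensional toy vectors and the helper names `cosine_similarity` and `rank_candidates` are assumptions for illustration; in practice the vectors would be rows of `embeddings`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_candidates(query, candidates):
    """Return (candidate_index, score) pairs sorted from most to least similar."""
    scores = [cosine_similarity(query, c) for c in candidates]
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


# Hypothetical toy embeddings standing in for a source sentence and three candidates.
query = [1.0, 0.0, 0.0]
candidates = [
    [0.0, 1.0, 0.0],   # unrelated direction
    [0.9, 0.1, 0.0],   # close to the query
    [-1.0, 0.0, 0.0],  # opposite direction
]

ranking = rank_candidates(query, candidates)
best_index = ranking[0][0]
print(ranking)
```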

## Training Details

### Training Dataset

#### Unnamed Dataset


* Size: 23,999 training samples
* Columns: <code>sentence_0</code> and <code>sentence_1</code>
* Approximate statistics based on the first 1000 samples:
  |         | sentence_0                                                                        | sentence_1                                                                         |
  |:--------|:----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
  | type    | string                                                                            | string                                                                             |
  | details | <ul><li>min: 6 tokens</li><li>mean: 34.89 tokens</li><li>max: 87 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 51.51 tokens</li><li>max: 127 tokens</li></ul> |
* Samples:
  | sentence_0                                                                                                                                                                                                        | sentence_1                                                                                                                                                                                                                                                                                                                          |
  |:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
  | <code>And Moses went out from Pharaoh , and entreated the  Lord .</code>                                                                                                                                          | <code>U Moses u mihnoh na u Pharaoh , bad u kyrpad ïa U  Trai  ,</code>                                                                                                                                                                                                                                                             |
  | <code>In the ninth year of Hoshea the king of Assyria took Samaria , and carried Israel away into Assyria , and placed them in Halah and in Habor by the river of Gozan , and in the cities of the Medes .</code> | <code>kaba long ka snem kaba khyndai jong ka jingsynshar u Hoshea , u patsha ka Assyria u kurup ïa ka Samaria , u rah ïa ki Israel sha Assyria kum ki koidi , bad pynsah katto katne ngut na ki ha ka nongbah Halah , katto katne pat hajan ka wah Habor ha ka distrik Gosan , bad katto katne ha ki nongbah jong ka Media .</code> |
  | <code>And the king said unto Cushi , Is the young man Absalom safe ? And Cushi answered , The enemies of my lord the king , and all that rise against thee to do thee hurt , be as that young man is .</code>     | <code>Hato u samla Absalom u dang im ?  u syiem u kylli . U mraw u jubab ,  Ko Kynrad , nga sngew ba kaei kaba la jia ha u kan jin da la jia ha baroh ki nongshun jong ngi , bad ha baroh kiba ïaleh pyrshah ïa phi .</code>                                                                                                        |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
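
To make these parameters concrete: for a batch of pairs, the loss treats each `sentence_1` as the correct match for its own `sentence_0` and every other `sentence_1` in the batch as a negative. It multiplies the cosine similarities by `scale` and applies cross-entropy. The following is a simplified sketch in plain Python, not the library code; the 2-dimensional toy vectors and the helper names are assumptions for illustration.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def multiple_negatives_ranking_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy where positives[i] is the correct match for anchors[i]."""
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Numerically stable log-sum-exp over all in-batch candidates.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Hypothetical toy batch of two pairs.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]  # each positive matches its own anchor
swapped = [[0.0, 1.0], [1.0, 0.0]]  # each positive matches the other anchor

loss_aligned = multiple_negatives_ranking_loss(anchors, aligned)
loss_swapped = multiple_negatives_ranking_loss(anchors, swapped)
print(loss_aligned, loss_swapped)
```

With a scale of 20, well-aligned pairs drive the loss toward zero, while mismatched pairs are penalized heavily. Larger batches supply more in-batch negatives per pair.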

### Training Hyperparameters
#### Non-Default Hyperparameters

- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `multi_dataset_batch_sampler`: round_robin

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: no
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin

</details>

### Training Logs
| Epoch  | Step | Training Loss |
|:------:|:----:|:-------------:|
| 0.3333 | 500  | 0.542         |
| 0.6667 | 1000 | 0.135         |
| 1.0    | 1500 | 0.0926        |
| 1.3333 | 2000 | 0.0535        |
| 1.6667 | 2500 | 0.0226        |
| 2.0    | 3000 | 0.018         |
| 2.3333 | 3500 | 0.0124        |
| 2.6667 | 4000 | 0.0057        |
| 3.0    | 4500 | 0.0053        |
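
The step and epoch columns follow from the dataset size and batch size listed above. As a quick check, assuming training ran on a single device (the card does not state the device count) with `gradient_accumulation_steps` of 1 and `dataloader_drop_last` set to False:

```python
import math

train_samples = 23_999  # from "Size: 23,999 training samples"
batch_size = 16         # per_device_train_batch_size
epochs = 3              # num_train_epochs

# The last partial batch is kept because dataloader_drop_last is False.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs


def epoch_at(step):
    """Fractional epoch reached after a given optimizer step."""
    return round(step / steps_per_epoch, 4)


print(steps_per_epoch, total_steps, epoch_at(500))
```

Under those assumptions this gives 1,500 steps per epoch and 4,500 steps in total, which matches the logged rows (step 500 is epoch 0.3333, step 4500 is epoch 3.0).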


### Framework Versions
- Python: 3.10.13
- Sentence Transformers: 3.0.1
- Transformers: 4.42.3
- PyTorch: 2.1.2
- Accelerate: 0.32.1
- Datasets: 2.20.0
- Tokenizers: 0.19.1

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
