romain125 committed
Commit 0e9dcb8 · verified · 1 Parent(s): 1d814f1

Add new SentenceTransformer model
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
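
For orientation, `pooling_mode_mean_tokens: true` selects masked mean pooling over the transformer's token embeddings. A minimal sketch of that operation on dummy tensors (dimensions follow the config above):

```python
import torch

# (batch, seq_len, word_embedding_dimension) token embeddings and their attention mask
token_embeddings = torch.randn(1, 12, 768)
attention_mask = torch.ones(1, 12)

# Masked mean: sum the non-padding token vectors, divide by their count
mask = attention_mask.unsqueeze(-1)  # (1, 12, 1)
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
print(sentence_embedding.shape)  # torch.Size([1, 768])
```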
README.md ADDED
@@ -0,0 +1,451 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:5302
+ - loss:MultipleNegativesRankingLoss
+ base_model: intfloat/multilingual-e5-base
+ datasets:
+ - Lettria/GRAG-GO-IDF-Only-Pos
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - pearson_cosine
+ - spearman_cosine
+ - cosine_accuracy
+ - cosine_accuracy_threshold
+ - cosine_f1
+ - cosine_f1_threshold
+ - cosine_precision
+ - cosine_recall
+ - cosine_ap
+ - cosine_mcc
+ model-index:
+ - name: SentenceTransformer based on intfloat/multilingual-e5-base
+   results:
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: EmbeddingSimEval
+       type: EmbeddingSimEval
+     metrics:
+     - type: pearson_cosine
+       value: .nan
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: .nan
+       name: Spearman Cosine
+   - task:
+       type: binary-classification
+       name: Binary Classification
+     dataset:
+       name: BinaryClassifEval
+       type: BinaryClassifEval
+     metrics:
+     - type: cosine_accuracy
+       value: 0.8
+       name: Cosine Accuracy
+     - type: cosine_accuracy_threshold
+       value: 0.8309140205383301
+       name: Cosine Accuracy Threshold
+     - type: cosine_f1
+       value: 0.888888888888889
+       name: Cosine F1
+     - type: cosine_f1_threshold
+       value: 0.8309140205383301
+       name: Cosine F1 Threshold
+     - type: cosine_precision
+       value: 1.0
+       name: Cosine Precision
+     - type: cosine_recall
+       value: 0.8
+       name: Cosine Recall
+     - type: cosine_ap
+       value: 1.0
+       name: Cosine AP
+     - type: cosine_mcc
+       value: 0.0
+       name: Cosine MCC
+ ---
+
+ # SentenceTransformer based on intfloat/multilingual-e5-base
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [intfloat/multilingual-e5-base](https://huggingface.co/intfloat/multilingual-e5-base) on the [grag-go-idf-only-pos](https://huggingface.co/datasets/Lettria/GRAG-GO-IDF-Only-Pos) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [intfloat/multilingual-e5-base](https://huggingface.co/intfloat/multilingual-e5-base) <!-- at revision 835193815a3936a24a0ee7dc9e3d48c1fbb19c55 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+   - [grag-go-idf-only-pos](https://huggingface.co/datasets/Lettria/GRAG-GO-IDF-Only-Pos)
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("Lettria/test_finetuned_model")
+ # Run inference
+ sentences = [
+     'The weather is lovely today.',
+     "It's so sunny outside!",
+     'He drove to the stadium.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 768)
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```
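
For reference, the same embeddings can be computed with plain `transformers`. A minimal sketch of the equivalent mean-pooling and L2-normalization steps (assuming the checkpoint loads with `AutoModel`; this mirrors the module stack shown above rather than reproducing an official snippet):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Lettria/test_finetuned_model")
model = AutoModel.from_pretrained("Lettria/test_finetuned_model")

batch = tokenizer(
    ["The weather is lovely today."],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = model(**batch).last_hidden_state

# Mean pooling over non-padding tokens, then L2 normalization (the Normalize() module)
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
embeddings = F.normalize(embeddings, p=2, dim=1)
```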
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Semantic Similarity
+
+ * Dataset: `EmbeddingSimEval`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+ | Metric              | Value   |
+ |:--------------------|:--------|
+ | pearson_cosine      | nan     |
+ | **spearman_cosine** | **nan** |
+
+ Both correlations are `nan` because every pair in the evaluation split carries the same label (`1`; see the dataset statistics below), so the gold scores have zero variance and a correlation cannot be computed; the same label imbalance drives `cosine_mcc` to 0.0 below.
+
+ #### Binary Classification
+
+ * Dataset: `BinaryClassifEval`
+ * Evaluated with [<code>BinaryClassificationEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.BinaryClassificationEvaluator)
+
+ | Metric                    | Value   |
+ |:--------------------------|:--------|
+ | cosine_accuracy           | 0.8     |
+ | cosine_accuracy_threshold | 0.8309  |
+ | cosine_f1                 | 0.8889  |
+ | cosine_f1_threshold       | 0.8309  |
+ | cosine_precision          | 1.0     |
+ | cosine_recall             | 0.8     |
+ | **cosine_ap**             | **1.0** |
+ | cosine_mcc                | 0.0     |
+
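
A minimal sketch of reproducing these binary-classification numbers with the evaluator linked above (the pair below is a placeholder; the reported values come from the 1,325-pair evaluation split):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import BinaryClassificationEvaluator

model = SentenceTransformer("Lettria/test_finetuned_model")

# Placeholder pair, illustrative only
evaluator = BinaryClassificationEvaluator(
    sentences1=["Date de début: non précisée"],
    sentences2=["[Date de fin](concept) --- EST ---> [non précisée](__inferred__)"],
    labels=[1],
    name="BinaryClassifEval",
)
results = evaluator(model)  # dict of metrics, e.g. "BinaryClassifEval_cosine_ap"
```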
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### grag-go-idf-only-pos
+
+ * Dataset: [grag-go-idf-only-pos](https://huggingface.co/datasets/Lettria/GRAG-GO-IDF-Only-Pos) at [9743952](https://huggingface.co/datasets/Lettria/GRAG-GO-IDF-Only-Pos/tree/9743952a5d02847c83f30a59009b3231c56871a3)
+ * Size: 5,302 training samples
+ * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | sentence1 | sentence2 | label |
+   |:--------|:----------|:----------|:------|
+   | type    | string    | string    | int   |
+   | details | <ul><li>min: 142 tokens</li><li>mean: 260.2 tokens</li><li>max: 340 tokens</li></ul> | <ul><li>min: 32 tokens</li><li>mean: 37.2 tokens</li><li>max: 44 tokens</li></ul> | <ul><li>1: 100.00%</li></ul> |
+ * Samples:
+   | sentence1 | sentence2 | label |
+   |:----------|:----------|:------|
+   | <code>Procédures et démarches: Les dossiers peuvent être déposés toute l'année sur mesdemarches.iledefrance.fr préalablement au commencement du projet. Un démarrage anticipé peut être autorisé, mais il ne préjuge pas de la décision de la Commission permanente de l’octroi de la subvention.Le candidat qui présente plus d’un projet, doit réaliser autant de dossiers de candidature que de projets.Après instruction des dossiers par les services régionaux, l'attribution définitive des aides est votée en commission permanente.<br>Bénéficiaires: Association - Régie par la loi de 1901, Professionnel - ETI < 5000, Professionnel - GE > 5000, Professionnel - PME < 250, Professionnel - TPE < 10, Collectivité ou institution - Autre (GIP, copropriété, EPA...), Collectivité ou institution - Bailleurs sociaux, Collectivité ou institution - Communes de 10 000 à 20 000 hab, Collectivité ou institution - Communes de 2000 à 10 000 hab, Collectivité ou institution - Communes de < 2000 hab, Collectivité ou institution...</code> | <code>[Association](entité) --- UTILISE ---> [mesdemarches.iledefrance.fr](plateforme)</code> | <code>1</code> |
+   | <code>Procédures et démarches: Merci de contacter le service concerné au sein de la direction de la culture, afin de vous accompagner dans la constitution de votre dossier. Le dépôt du dossier à la Région doit intervenir obligatoirement avant le début des travaux (ou avant l'engagement des dépenses d'acquisition).La demande d'aide doit faire l’objet d’un dossier de candidature complet. Le projet objet de la demande d’aide doit être financé à hauteur de 20% minimum par la structure porteuse.<br>Bénéficiaires: Association - Fondation, Association - ONG, Association - Régie par la loi de 1901, Collectivité ou institution - Autre (GIP, copropriété, EPA...), Collectivité ou institution - Communes de 10 000 à 20 000 hab, Collectivité ou institution - Communes de 2000 à 10 000 hab, Collectivité ou institution - Communes de < 2000 hab, Collectivité ou institution - Communes de > 20 000 hab, Collectivité ou institution - Département, Collectivité ou institution - EPT / Métropole du Grand Paris, Collec...</code> | <code>[Collectivité ou institution - Communes de 10 000 à 20 000 hab](organisation) --- BÉNÉFICIAIRE ---> [Région](organisation)</code> | <code>1</code> |
+   | <code>Type de project: L’excès de précipitations tout au long de l’année a conduit à une chute spectaculaire des rendements des céréales d’été et des protéagineux (blé, orge, pois, féverole, etc.) que produisent 90% des agriculteurs d’Île-de-France, historique grenier à blé du pays. Tributaires naturels du fleurissement des cultures, les apiculteurs professionnels de la région ont également souffert de ces dérèglements climatiques.La Région accompagne les exploitations concernées en leur apportant une aide exceptionnelle.</code> | <code>[excès de précipitations](phénomène) --- DIMINUE ---> [rendements des protéagineux](concept)</code> | <code>1</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) (sketched after this list) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
+
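
Because every sample is a positive pair (label `1`), `MultipleNegativesRankingLoss` is a natural fit: within a batch, each pair serves as a negative for the others. A minimal sketch of the standard in-batch-negatives formulation (`scale` is the 20.0 configured above; tensors are dummies):

```python
import torch
import torch.nn.functional as F

def mnrl(anchors: torch.Tensor, positives: torch.Tensor, scale: float = 20.0) -> torch.Tensor:
    # Score every anchor against every positive in the batch via cosine similarity
    scores = F.cosine_similarity(anchors.unsqueeze(1), positives.unsqueeze(0), dim=-1) * scale
    # For anchor i, positives[i] is the true pair; all other rows act as negatives
    labels = torch.arange(anchors.size(0))
    return F.cross_entropy(scores, labels)

loss = mnrl(torch.randn(2, 768), torch.randn(2, 768))  # batch size 2, as trained here
```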
+ ### Evaluation Dataset
+
+ #### grag-go-idf-only-pos
+
+ * Dataset: [grag-go-idf-only-pos](https://huggingface.co/datasets/Lettria/GRAG-GO-IDF-Only-Pos) at [9743952](https://huggingface.co/datasets/Lettria/GRAG-GO-IDF-Only-Pos/tree/9743952a5d02847c83f30a59009b3231c56871a3)
+ * Size: 1,325 evaluation samples
+ * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | sentence1 | sentence2 | label |
+   |:--------|:----------|:----------|:------|
+   | type    | string    | string    | int   |
+   | details | <ul><li>min: 31 tokens</li><li>mean: 86.2 tokens</li><li>max: 160 tokens</li></ul> | <ul><li>min: 25 tokens</li><li>mean: 28.6 tokens</li><li>max: 33 tokens</li></ul> | <ul><li>1: 100.00%</li></ul> |
+ * Samples:
+   | sentence1 | sentence2 | label |
+   |:----------|:----------|:------|
+   | <code>Date de début: non précisée<br>Date de fin (clôture): non précisée<br>Date de début de la future campagne: non précisée</code> | <code>[Date de fin](concept) --- EST ---> [non précisée](__inferred__)</code> | <code>1</code> |
+   | <code>Type de project: L’action porte sur 3 dimensions constituant un dispositif global d’accompagnement des jeunes filles vers la réussite de leurs études et le développement de leurs ambitions : Mentorat par salariés d’entreprises et mentors d’établissement scolaires ou bénévole de l’association. Le mentor d’entreprise joue le rôle de passeur social pour la jeune fille.Accompagnement collectif qui au-delà d’être un soutien au bon fonctionnement de la relation mentor-filleule crée et organise un programme d’animations (plus de 200 activités en présentiel et digital l’an dernier en Île-de-France) varié couvrant les leviers sur lesquels agit l’association.Accompagnement par soutien matériel.</code> | <code>[action](__inferred__) --- INCLUT ---> [mentorat](concept)</code> | <code>1</code> |
+   | <code>Date de début: non précisée<br>Date de fin (clôture): non précisée<br>Date de début de la future campagne: non précisée</code> | <code>[Date de début de la future campagne](concept) --- EST ---> [non précisée](__inferred__)</code> | <code>1</code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: epoch
+ - `per_device_train_batch_size`: 2
+ - `per_device_eval_batch_size`: 2
+ - `num_train_epochs`: 1
+ - `use_cpu`: True
+ - `dataloader_pin_memory`: False
+
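
A sketch of how these non-default values map onto `SentenceTransformerTrainingArguments` (the `output_dir` below is a hypothetical placeholder):

```python
from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="models/debug",  # hypothetical output path
    eval_strategy="epoch",
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    num_train_epochs=1,
    use_cpu=True,
    dataloader_pin_memory=False,
)
```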
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: epoch
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 2
+ - `per_device_eval_batch_size`: 2
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: True
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: False
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch  | Step | Training Loss | Validation Loss | EmbeddingSimEval_spearman_cosine | BinaryClassifEval_cosine_ap |
+ |:------:|:----:|:-------------:|:---------------:|:--------------------------------:|:---------------------------:|
+ | 0.6667 | 2    | 0.6283        | -               | -                                | -                           |
+ | 1.0    | 3    | -             | 0.1791          | nan                              | 1.0                         |
+
+
+ ### Framework Versions
+ - Python: 3.11.11
+ - Sentence Transformers: 3.4.1
+ - Transformers: 4.48.3
+ - PyTorch: 2.6.0+cpu
+ - Accelerate: 1.4.0
+ - Datasets: 3.3.1
+ - Tokenizers: 0.21.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,28 @@
+ {
+   "_name_or_path": "models/debug",
+   "architectures": [
+     "XLMRobertaModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "classifier_dropout": null,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "xlm-roberta",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "output_past": true,
+   "pad_token_id": 1,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.48.3",
+   "type_vocab_size": 1,
+   "use_cache": true,
+   "vocab_size": 250002
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.4.1",
+     "transformers": "4.48.3",
+     "pytorch": "2.6.0+cpu"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
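
`prompts` is empty, so no prefixes are applied at encode time, and `similarity_fn_name` is what `model.similarity()` computes. A quick sketch for checking this at runtime:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Lettria/test_finetuned_model")
print(model.similarity_fn_name)  # "cosine", as configured above
```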
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ced70aa855c7fae97e99fa4108adc17d6e3bec0a0063ad6b1aedd9a56a3f263e
+ size 1112197096
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
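
These entries load in `idx` order when the model is instantiated, reproducing the Transformer → Pooling → Normalize stack from the model card. A sketch for inspecting the loaded modules:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Lettria/test_finetuned_model")
for name, module in model.named_children():
    print(name, type(module).__name__)
# 0 Transformer
# 1 Pooling
# 2 Normalize
```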
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:883b037111086fd4dfebbbc9b7cee11e1517b5e0c0514879478661440f137085
+ size 17082987
tokenizer_config.json ADDED
@@ -0,0 +1,62 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "250001": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "extra_special_tokens": {},
+   "mask_token": "<mask>",
+   "max_length": 512,
+   "model_max_length": 512,
+   "pad_to_multiple_of": null,
+   "pad_token": "<pad>",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "</s>",
+   "stride": 0,
+   "tokenizer_class": "XLMRobertaTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "<unk>"
+ }