dbourget committed
Commit 2db6871
1 Parent(s): 74ac154

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
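The pooling configuration above enables mean pooling only (`pooling_mode_mean_tokens: true`, everything else off). As a minimal illustrative sketch of what mean pooling computes — this is not code from the commit, and the `mean_pool` helper name is hypothetical — the operation averages token embeddings over the attention mask:

```python
import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 768); attention_mask: (batch, seq_len) of 0/1
    mask = attention_mask.unsqueeze(-1).float()    # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)  # sum embeddings of real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)       # token counts, guarded against zero
    return summed / counts                         # (batch, 768) sentence embeddings
```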
README.md ADDED
@@ -0,0 +1,660 @@
+ ---
+ base_model: dbourget/pb-ds1-48K
+ datasets: []
+ language: []
+ library_name: sentence-transformers
+ metrics:
+ - pearson_cosine
+ - spearman_cosine
+ - pearson_manhattan
+ - spearman_manhattan
+ - pearson_euclidean
+ - spearman_euclidean
+ - pearson_dot
+ - spearman_dot
+ - pearson_max
+ - spearman_max
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:106810
+ - loss:CosineSimilarityLoss
+ widget:
+ - source_sentence: In The Law of Civilization and Decay, Brooks provides a detailed
+     look at the rise and fall of civilizations, offering a critical perspective on
+     the impact of capitalism. As societies become prosperous, their pursuit of wealth
+     ultimately leads to their own downfall as greed takes over.
+   sentences:
+   - Patrick Todd's The Open Future argues that all future contingent statements, such
+     as 'It will rain tomorrow', are inherently false.
+   - If propositions are made true in virtue of corresponding to facts, then what are
+     the truth-makers of true negative propositions such as ‘The apple is not red’?
+     Russell argued that there must be negative facts to account for what makes true
+     negative propositions true and false positive propositions false. Others, more
+     parsimonious in their ontological commitments, have attempted to avoid them. Wittgenstein
+     rejected them since he was loath to think that the sign for negation referred
+     to a negative element in a fact. A contemporary of Russell’s, Raphael Demos, attempted
+     to eliminate them by appealing to ‘incompatibility’ facts. More recently, Armstrong
+     has appealed to the totality of positive facts as the ground of the truth of true
+     negative propositions. Oaklander and Miracchi have suggested that the absence
+     or non-existence of the positive fact (which is not itself a further fact) is
+     the basis of a positive proposition being false and therefore of the truth of
+     its negation.
+   - The Law of Civilization and Decay is an overview of history, articulating Brooks'
+     critical view of capitalism. A civilization grows wealthy, and then its wealth
+     causes it to crumble upon itself due to greed.
+ - source_sentence: It is generally accepted that the development of the modern sciences
+     is rooted in experiment. Yet for a long time, experimentation did not occupy a
+     prominent role, neither in philosophy nor in history of science. With the ‘practical
+     turn’ in studying the sciences and their history, this has begun to change. This
+     paper is concerned with systems and cultures of experimentation and the consistencies
+     that are generated within such systems and cultures. The first part of the paper
+     exposes the forms of historical and structural coherence that characterize the
+     experimental exploration of epistemic objects. In the second part, a particular
+     experimental culture in the life sciences is briefly described as an example.
+     A survey will be given of what it means and what it takes to analyze biological
+     functions in the test tube
+   sentences:
+   - Experimentation has long been overlooked in the study of science, but with a new
+     focus on practical aspects, this is starting to change. This paper explores the
+     systems and cultures of experimentation and the patterns that emerge within them.
+     The first part discusses the historical and structural coherence of experimental
+     exploration. The second part provides a brief overview of an experimental culture
+     in the life sciences. The paper concludes with a discussion on analyzing biological
+     functions in the test tube.
+   - Hintikka and Mutanen have introduced Trail-And-Error machines as a new way to
+     think about computation, expanding on the traditional Turing machine model. This
+     innovation opens up new possibilities in the field of computation theory.
+   - As Allaire and Firsirotu (1984) pointed out over a decade ago, the concept of
+     culture seemed to be sliding inexorably into a superficial explanatory pool that
+     promised everything and nothing. However, since then, some sophisticated and interesting
+     theoretical developments have prevented drowning in the pool of superficiality
+     and hence theoretical redundancy. The purpose of this article is to build upon
+     such theoretical developments and to introduce an approach that maintains that
+     culture can be theorized in the same way as structure, possessing irreducible
+     powers and properties that predispose organizational actors towards specific courses
+     of action. The morphogenetic approach is the methodological complement of transcendental
+     realism, providing explanatory leverage on the conditions that maintain for cultural
+     change or stability.
+ - source_sentence: 'This chapter examines three approaches to applied political and
+     legal philosophy: Standard activism is primarily addressed to other philosophers,
+     adopts an indirect and coincidental role in creating change, and counts articulating
+     sound arguments as success. Extreme activism, in contrast, is a form of applied
+     philosophy directly addressed to policy-makers, with the goal of bringing about
+     a particular outcome, and measures success in terms of whether it makes a direct
+     causal contribution to that goal. Finally, conceptual activism (like standard
+     activism), primarily targets an audience of fellow philosophers, bears a distant,
+     non-direct, relation to a desired outcome, and counts success in terms of whether
+     it encourages a particular understanding and adoption of the concepts under examination.'
+   sentences:
+   - John Rawls’ resistance to any kind of global egalitarian principle has seemed
+     strange and unconvincing to many commentators, including those generally supportive
+     of Rawls’ project. His rejection of a global egalitarian principle seems to rely
+     on an assumption that states are economically bounded and separate from one another,
+     which is not an accurate portrayal of economic relations among states in our globalised
+     world. In this article, I examine the implications of the domestic theory of justice
+     as fairness to argue that Rawls has good reason to insist on economically bounded
+     states. I argue that certain central features of the contemporary global economy,
+     particularly the free movement of capital across borders, undermine the distributional
+     autonomy required for states to realise Rawls’ principles of justice, and the
+     domestic theory thus requires a certain degree of economic separation among states
+     prior to the convening of the international original position. Given this, I defend
+     Rawls’ reluctance to endorse a global egalitarian principle and defend a policy
+     regime of international capital controls, to restore distributional autonomy and
+     make the realisation of the principles of justice as fairness possible.
+   - 'Bibliography of the writings by Hilary Putnam: 16 books, 198 articles, 10 translations
+     into German (up to 1994).'
+   - The jurisprudence under international human rights treaties has had a considerable
+     impact across countries. Known for addressing complex agendas, the work of expert
+     bodies under the treaties has been credited and relied upon for filling the gaps
+     in the realization of several objectives, including the peace and security agenda. In
+     1982, the Human Rights Committee (ICCPR), in a General Comment observed that “states
+     have the supreme duty to prevent wars, acts of genocide and other acts of mass
+     violence ... Every effort … to avert the danger of war, especially thermonuclear
+     war, and to strengthen international peace and security would constitute the most
+     important condition and guarantee for the safeguarding of the right to life.”
+     Over the years, all treaty bodies have contributed in this direction, endorsing
+     peace and security so as “to protect people against direct and structural violence
+     … as systemic problems and not merely as isolated incidents …”. A closer look
+     at the jurisprudence on peace and security, emanating from treaty monitoring mechanisms
+     including state periodic reports, interpretive statements, the individual communications
+     procedure, and others, reveals its distinctive nature
+ - source_sentence: Autonomist accounts of cognitive science suggest that cognitive
+     model building and theory construction (can or should) proceed independently of
+     findings in neuroscience. Common functionalist justifications of autonomy rely
+     on there being relatively few constraints between neural structure and cognitive
+     function (e.g., Weiskopf, 2011). In contrast, an integrative mechanistic perspective
+     stresses the mutual constraining of structure and function (e.g., Piccinini &
+     Craver, 2011; Povich, 2015). In this paper, I show how model-based cognitive neuroscience
+     (MBCN) epitomizes the integrative mechanistic perspective and concentrates the
+     most revolutionary elements of the cognitive neuroscience revolution (Boone &
+     Piccinini, 2016). I also show how the prominent subset account of functional realization
+     supports the integrative mechanistic perspective I take on MBCN and use it to
+     clarify the intralevel and interlevel components of integration.
+   sentences:
+   - Fictional truth, or truth in fiction/pretense, has been the object of extended
+     scrutiny among philosophers and logicians in recent decades. Comparatively little
+     attention, however, has been paid to its inferential relationships with time and
+     with certain deliberate and contingent human activities, namely, the creation
+     of fictional works. The aim of the paper is to contribute to filling the gap.
+     Toward this goal, a formal framework is outlined that is consistent with a variety
+     of conceptions of fictional truth and based upon a specific formal treatment of
+     time and agency, that of so-called stit logics. Moreover, a complete axiomatic
+     theory of fiction-making TFM is defined, where fiction-making is understood as
+     the exercise of agency and choice in time over what is fictionally true. The language
+     \ of TFM is an extension of the language of propositional logic, with the addition
+     of temporal and modal operators. A distinctive feature of \ with respect to other
+     modal languages is a variety of operators having to do with fictional truth, including
+     a ‘fictionality’ operator \ . Some applications of TFM are outlined, and some
+     interesting linguistic and inferential phenomena, which are not so easily dealt
+     with in other frameworks, are accounted for
+   - 'We have structured our response according to five questions arising from the
+     commentaries: (i) What is sentience? (ii) Is sentience a necessary or sufficient
+     condition for moral standing? (iii) What methods should guide comparative cognitive
+     research in general, and specifically in studying invertebrates? (iv) How should
+     we balance scientific uncertainty and moral risk? (v) What practical strategies
+     can help reduce biases and morally dismissive attitudes toward invertebrates?'
+   - 'In 2007, ten world-renowned neuroscientists proposed “A Decade of the Mind Initiative.”
+     The contention was that, despite the successes of the Decade of the Brain, “a
+     fundamental understanding of how the brain gives rise to the mind [was] still
+     lacking” (2007, 1321). The primary aims of the decade of the mind were “to build
+     on the progress of the recent Decade of the Brain (1990-99)” by focusing on “four
+     broad but intertwined areas” of research, including: healing and protecting, understanding,
+     enriching, and modeling the mind. These four aims were to be the result of “transdisciplinary
+     and multiagency” research spanning “across disparate fields, such as cognitive
+     science, medicine, neuroscience, psychology, mathematics, engineering, and computer
+     science.” The proposal for a decade of the mind prompted many questions (See Spitzer
+     2008). In this chapter, I address three of them: (1) How do proponents of this
+     new decade conceive of the mind? (2) Why should a decade be devoted to understanding
+     it? (3) What should this decade look like?'
+ - source_sentence: This essay explores the historical and modern perspectives on the
+     Gettier problem, highlighting the connections between this issue, skepticism,
+     and relevance. Through methods such as historical analysis, induction, and deduction,
+     it is found that while contextual theories and varying definitions of knowledge
+     do not fully address skeptical challenges, they can help clarify our understanding
+     of knowledge. Ultimately, embracing subjectivity and intuition can provide insight
+     into what it truly means to claim knowledge.
+   sentences:
+   - In this article I present and analyze three popular moral justifications for hunting.
+     My purpose is to expose the moral terrain of this issue and facilitate more fruitful,
+     philosophically relevant discussions about the ethics of hunting.
+   - Teaching competency in bioethics has been a concern since the field's inception.
+     The first report on the teaching of contemporary bioethics was published in 1976
+     by The Hastings Center, which concluded that graduate programs were not necessary
+     at the time. However, the report speculated that future developments may require
+     new academic structures for graduate education in bioethics. The creation of a
+     terminal degree in bioethics has its critics, with scholars debating whether bioethics
+     is a discipline with its own methods and theoretical grounding, a multidisciplinary
+     field, or something else entirely. Despite these debates, new bioethics training
+     programs have emerged at all postsecondary levels in the U.S. This essay examines
+     the number and types of programs and degrees in this growing field.
+   - 'Objective: In this essay, I will try to track some historical and modern stages
+     of the discussion on the Gettier problem, and point out the interrelations of
+     the questions that this problem raises for epistemologists, with sceptical arguments,
+     and a so-called problem of relevance. Methods: historical analysis, induction,
+     generalization, deduction, discourse, intuition results: Albeit the contextual
+     theories of knowledge, the use of different definitions of knowledge, and the
+     different ways of the uses of knowledge do not resolve all the issues that the
+     sceptic can put forward, but they can be productive in giving clarity to a concept
+     of knowledge for us. On the other hand, our knowledge will always have an element
+     of intuition and subjectivity, however not equating to epistemic luck and probability. Significance
+     novelty: the approach to the context in general, not giving up being a Subject
+     may give us a clarity about the sense of what it means to say – “I know”.'
+ model-index:
+ - name: SentenceTransformer based on dbourget/pb-ds1-48K
+   results:
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts dev
+       type: sts-dev
+     metrics:
+     - type: pearson_cosine
+       value: 0.9378177365442741
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: 0.8943299298202461
+       name: Spearman Cosine
+     - type: pearson_manhattan
+       value: 0.9709949018414847
+       name: Pearson Manhattan
+     - type: spearman_manhattan
+       value: 0.8969442622028955
+       name: Spearman Manhattan
+     - type: pearson_euclidean
+       value: 0.9711044669329696
+       name: Pearson Euclidean
+     - type: spearman_euclidean
+       value: 0.8966133108746955
+       name: Spearman Euclidean
+     - type: pearson_dot
+       value: 0.9419649751470724
+       name: Pearson Dot
+     - type: spearman_dot
+       value: 0.8551487313582053
+       name: Spearman Dot
+     - type: pearson_max
+       value: 0.9711044669329696
+       name: Pearson Max
+     - type: spearman_max
+       value: 0.8969442622028955
+       name: Spearman Max
+ ---
+
+ # SentenceTransformer based on dbourget/pb-ds1-48K
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [dbourget/pb-ds1-48K](https://huggingface.co/dbourget/pb-ds1-48K). It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [dbourget/pb-ds1-48K](https://huggingface.co/dbourget/pb-ds1-48K) <!-- at revision fcd4aeedcdc3ad836820d47fd28ffd2529914647 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
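The same two-module stack can also be assembled by hand from Sentence Transformers building blocks. This is only a sketch of the composition the architecture string describes — it rebuilds the modules from the base checkpoint rather than loading this model's finetuned weights, so the one-argument constructor shown in Usage below remains the supported loading path:

```python
from sentence_transformers import SentenceTransformer, models

word = models.Transformer("dbourget/pb-ds1-48K", max_seq_length=512)  # (0): BertModel backbone
pool = models.Pooling(word.get_word_embedding_dimension(),            # 768-dim word embeddings
                      pooling_mode="mean")                            # (1): mean pooling
model = SentenceTransformer(modules=[word, pool])
```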
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("dbourget/pb-ds1-48K-philsim")
+ # Run inference
+ sentences = [
+     'This essay explores the historical and modern perspectives on the Gettier problem, highlighting the connections between this issue, skepticism, and relevance. Through methods such as historical analysis, induction, and deduction, it is found that while contextual theories and varying definitions of knowledge do not fully address skeptical challenges, they can help clarify our understanding of knowledge. Ultimately, embracing subjectivity and intuition can provide insight into what it truly means to claim knowledge.',
+     'Objective: In this essay, I will try to track some historical and modern stages of the discussion on the Gettier problem, and point out the interrelations of the questions that this problem raises for epistemologists, with sceptical arguments, and a so-called problem of relevance. Methods: historical analysis, induction, generalization, deduction, discourse, intuition results: Albeit the contextual theories of knowledge, the use of different definitions of knowledge, and the different ways of the uses of knowledge do not resolve all the issues that the sceptic can put forward, but they can be productive in giving clarity to a concept of knowledge for us. On the other hand, our knowledge will always have an element of intuition and subjectivity, however not equating to epistemic luck and probability. Significance novelty: the approach to the context in general, not giving up being a Subject may give us a clarity about the sense of what it means to say – “I know”.',
+     "Teaching competency in bioethics has been a concern since the field's inception. The first report on the teaching of contemporary bioethics was published in 1976 by The Hastings Center, which concluded that graduate programs were not necessary at the time. However, the report speculated that future developments may require new academic structures for graduate education in bioethics. The creation of a terminal degree in bioethics has its critics, with scholars debating whether bioethics is a discipline with its own methods and theoretical grounding, a multidisciplinary field, or something else entirely. Despite these debates, new bioethics training programs have emerged at all postsecondary levels in the U.S. This essay examines the number and types of programs and degrees in this growing field.",
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Semantic Similarity
+ * Dataset: `sts-dev`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | pearson_cosine      | 0.9378     |
+ | **spearman_cosine** | **0.8943** |
+ | pearson_manhattan   | 0.971      |
+ | spearman_manhattan  | 0.8969     |
+ | pearson_euclidean   | 0.9711     |
+ | spearman_euclidean  | 0.8966     |
+ | pearson_dot         | 0.942      |
+ | spearman_dot        | 0.8551     |
+ | pearson_max         | 0.9711     |
+ | spearman_max        | 0.8969     |
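The dev split behind these numbers is not included in the commit. As a sketch of how such figures are produced with the evaluator linked above — the three pairs and gold scores below are made-up placeholders, not the actual sts-dev data:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("dbourget/pb-ds1-48K-philsim")

# Placeholder pairs; the real evaluation uses held-out (sentence1, sentence2, score) rows
sentences1 = ["A civilization grows wealthy, then its wealth causes it to crumble.",
              "Experimentation long lacked a prominent role in history of science.",
              "Negative facts were posited by Russell as truth-makers."]
sentences2 = ["Prosperity ultimately leads societies to their own downfall.",
              "The paper concludes with remarks on test-tube biology.",
              "Hunting has three popular moral justifications."]
gold_scores = [0.9, 0.4, 0.1]  # similarity labels in [0, 1]

evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, gold_scores, name="sts-dev")
print(evaluator(model))  # dict of Pearson/Spearman metrics keyed 'sts-dev_*'
```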
+
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 190
+ - `per_device_eval_batch_size`: 190
+ - `learning_rate`: 5e-06
+ - `num_train_epochs`: 2
+ - `warmup_ratio`: 0.1
+ - `bf16`: True
+ - `batch_sampler`: no_duplicates
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 190
+ - `per_device_eval_batch_size`: 190
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `learning_rate`: 5e-06
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 2
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: True
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `batch_sampler`: no_duplicates
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
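The training script itself is not part of this commit. Below is a minimal sketch consistent with the listed hyperparameters, assuming the 106,810-pair dataset named in the tags has the (sentence1, sentence2, score) columns that `CosineSimilarityLoss` expects; the rows shown are placeholders, and the step-wise dev evaluation is omitted:

```python
from datasets import Dataset
from sentence_transformers import (SentenceTransformer, SentenceTransformerTrainer,
                                   SentenceTransformerTrainingArguments)
from sentence_transformers.losses import CosineSimilarityLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("dbourget/pb-ds1-48K")  # the base model named above
train_dataset = Dataset.from_dict({                 # placeholder rows, not the real data
    "sentence1": ["A civilization grows wealthy, then its wealth causes it to crumble."],
    "sentence2": ["Prosperity ultimately leads societies to their own downfall."],
    "score": [0.9],
})
args = SentenceTransformerTrainingArguments(
    output_dir="pb-ds1-48K-philsim",
    num_train_epochs=2,
    per_device_train_batch_size=190,
    per_device_eval_batch_size=190,
    learning_rate=5e-6,
    warmup_ratio=0.1,
    bf16=True,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=CosineSimilarityLoss(model),
)
trainer.train()
```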
+
+ ### Training Logs
+ <details><summary>Click to expand</summary>
+
+ | Epoch | Step | Training Loss | loss | sts-dev_spearman_cosine |
+ |:------:|:----:|:-------------:|:------:|:-----------------------:|
+ | 0 | 0 | - | - | 0.8229 |
+ | 0.0178 | 10 | 0.0545 | - | - |
+ | 0.0355 | 20 | 0.0556 | - | - |
+ | 0.0533 | 30 | 0.0502 | - | - |
+ | 0.0710 | 40 | 0.0497 | - | - |
+ | 0.0888 | 50 | 0.0413 | - | - |
+ | 0.1066 | 60 | 0.0334 | - | - |
+ | 0.1243 | 70 | 0.0238 | - | - |
+ | 0.1421 | 80 | 0.0206 | - | - |
+ | 0.1599 | 90 | 0.0167 | - | - |
+ | 0.1776 | 100 | 0.0146 | 0.0725 | 0.8788 |
+ | 0.1954 | 110 | 0.0127 | - | - |
+ | 0.2131 | 120 | 0.0125 | - | - |
+ | 0.2309 | 130 | 0.0115 | - | - |
+ | 0.2487 | 140 | 0.0116 | - | - |
+ | 0.2664 | 150 | 0.0111 | - | - |
+ | 0.2842 | 160 | 0.0107 | - | - |
+ | 0.3020 | 170 | 0.0113 | - | - |
+ | 0.3197 | 180 | 0.0106 | - | - |
+ | 0.3375 | 190 | 0.0099 | - | - |
+ | 0.3552 | 200 | 0.0092 | 0.0207 | 0.8856 |
+ | 0.3730 | 210 | 0.0097 | - | - |
+ | 0.3908 | 220 | 0.0099 | - | - |
+ | 0.4085 | 230 | 0.0087 | - | - |
+ | 0.4263 | 240 | 0.0087 | - | - |
+ | 0.4440 | 250 | 0.0082 | - | - |
+ | 0.4618 | 260 | 0.0083 | - | - |
+ | 0.4796 | 270 | 0.0089 | - | - |
+ | 0.4973 | 280 | 0.0082 | - | - |
+ | 0.5151 | 290 | 0.0078 | - | - |
+ | 0.5329 | 300 | 0.0081 | 0.0078 | 0.8891 |
+ | 0.5506 | 310 | 0.0081 | - | - |
+ | 0.5684 | 320 | 0.0072 | - | - |
+ | 0.5861 | 330 | 0.0084 | - | - |
+ | 0.6039 | 340 | 0.0083 | - | - |
+ | 0.6217 | 350 | 0.0078 | - | - |
+ | 0.6394 | 360 | 0.0077 | - | - |
+ | 0.6572 | 370 | 0.008 | - | - |
+ | 0.6750 | 380 | 0.0073 | - | - |
+ | 0.6927 | 390 | 0.008 | - | - |
+ | 0.7105 | 400 | 0.0073 | 0.0058 | 0.8890 |
+ | 0.7282 | 410 | 0.0075 | - | - |
+ | 0.7460 | 420 | 0.0077 | - | - |
+ | 0.7638 | 430 | 0.0074 | - | - |
+ | 0.7815 | 440 | 0.0073 | - | - |
+ | 0.7993 | 450 | 0.007 | - | - |
+ | 0.8171 | 460 | 0.0043 | - | - |
+ | 0.8348 | 470 | 0.0052 | - | - |
+ | 0.8526 | 480 | 0.0046 | - | - |
+ | 0.8703 | 490 | 0.0073 | - | - |
+ | 0.8881 | 500 | 0.0056 | 0.0069 | 0.8922 |
+ | 0.9059 | 510 | 0.0059 | - | - |
+ | 0.9236 | 520 | 0.0045 | - | - |
+ | 0.9414 | 530 | 0.0033 | - | - |
+ | 0.9591 | 540 | 0.0058 | - | - |
+ | 0.9769 | 550 | 0.0056 | - | - |
+ | 0.9947 | 560 | 0.0046 | - | - |
+ | 1.0124 | 570 | 0.003 | - | - |
+ | 1.0302 | 580 | 0.0039 | - | - |
+ | 1.0480 | 590 | 0.0032 | - | - |
+ | 1.0657 | 600 | 0.0031 | 0.0029 | 0.8931 |
+ | 1.0835 | 610 | 0.0046 | - | - |
+ | 1.1012 | 620 | 0.003 | - | - |
+ | 1.1190 | 630 | 0.0021 | - | - |
+ | 1.1368 | 640 | 0.0031 | - | - |
+ | 1.1545 | 650 | 0.0035 | - | - |
+ | 1.1723 | 660 | 0.0033 | - | - |
+ | 1.1901 | 670 | 0.0024 | - | - |
+ | 1.2078 | 680 | 0.0012 | - | - |
+ | 1.2256 | 690 | 0.0075 | - | - |
+ | 1.2433 | 700 | 0.0028 | 0.0036 | 0.8945 |
+ | 1.2611 | 710 | 0.0033 | - | - |
+ | 1.2789 | 720 | 0.0023 | - | - |
+ | 1.2966 | 730 | 0.0034 | - | - |
+ | 1.3144 | 740 | 0.0018 | - | - |
+ | 1.3321 | 750 | 0.0016 | - | - |
+ | 1.3499 | 760 | 0.0025 | - | - |
+ | 1.3677 | 770 | 0.002 | - | - |
+ | 1.3854 | 780 | 0.0016 | - | - |
+ | 1.4032 | 790 | 0.0018 | - | - |
+ | 1.4210 | 800 | 0.003 | 0.0027 | 0.8944 |
+ | 1.4387 | 810 | 0.0018 | - | - |
+ | 1.4565 | 820 | 0.0008 | - | - |
+ | 1.4742 | 830 | 0.0014 | - | - |
+ | 1.4920 | 840 | 0.0025 | - | - |
+ | 1.5098 | 850 | 0.0026 | - | - |
+ | 1.5275 | 860 | 0.0012 | - | - |
+ | 1.5453 | 870 | 0.001 | - | - |
+ | 1.5631 | 880 | 0.001 | - | - |
+ | 1.5808 | 890 | 0.0012 | - | - |
+ | 1.5986 | 900 | 0.0021 | 0.0021 | 0.8952 |
+ | 1.6163 | 910 | 0.0016 | - | - |
+ | 1.6341 | 920 | 0.0008 | - | - |
+ | 1.6519 | 930 | 0.0008 | - | - |
+ | 1.6696 | 940 | 0.0009 | - | - |
+ | 1.6874 | 950 | 0.0004 | - | - |
+ | 1.7052 | 960 | 0.0003 | - | - |
+ | 1.7229 | 970 | 0.0007 | - | - |
+ | 1.7407 | 980 | 0.0007 | - | - |
+ | 1.7584 | 990 | 0.0011 | - | - |
+ | 1.7762 | 1000 | 0.0007 | 0.0029 | 0.8952 |
+ | 1.7940 | 1010 | 0.0008 | - | - |
+ | 1.8117 | 1020 | 0.001 | - | - |
+ | 1.8295 | 1030 | 0.0006 | - | - |
+ | 1.8472 | 1040 | 0.0006 | - | - |
+ | 1.8650 | 1050 | 0.0015 | - | - |
+ | 1.8828 | 1060 | 0.0009 | - | - |
+ | 1.9005 | 1070 | 0.0005 | - | - |
+ | 1.9183 | 1080 | 0.0006 | - | - |
+ | 1.9361 | 1090 | 0.0021 | - | - |
+ | 1.9538 | 1100 | 0.0009 | 0.0023 | 0.8943 |
+ | 1.9716 | 1110 | 0.0007 | - | - |
+ | 1.9893 | 1120 | 0.0003 | - | - |
+
+ </details>
+
+ ### Framework Versions
+ - Python: 3.10.12
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.42.3
+ - PyTorch: 2.2.0+cu121
+ - Accelerate: 0.31.0
+ - Datasets: 2.20.0
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "_name_or_path": "dbourget/pb-ds1-48K",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.42.3",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.42.3",
+     "pytorch": "2.2.0+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:67b293c7b2ed5f03a36d59234cdbfdad0b581626ccabb57c994acf16b4bf57dd
+ size 437951328
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,44 @@
+ {
+   "additional_special_tokens": [
+     "[PAD]",
+     "[UNK]",
+     "[CLS]",
+     "[SEP]",
+     "[MASK]"
+   ],
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,59 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "4": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "additional_special_tokens": [
+     "[PAD]",
+     "[UNK]",
+     "[CLS]",
+     "[SEP]",
+     "[MASK]"
+   ],
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "tokenizer_class": "PreTrainedTokenizerFast",
+   "unk_token": "[UNK]"
+ }