SentenceTransformer based on answerdotai/ModernBERT-large

This is a sentence-transformers model fine-tuned from answerdotai/ModernBERT-large. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

Model Type: Sentence Transformer
Base model: answerdotai/ModernBERT-large
Maximum Sequence Length: 8192 tokens
Output Dimensionality: 1024 dimensions
Similarity Function: Cosine Similarity

Model Sources

Documentation: Sentence Transformers Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Sentence Transformers on Hugging Face

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
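
The Pooling module above has only pooling_mode_mean_tokens enabled, so a sentence embedding is the average of the token embeddings produced by the Transformer module. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the function name mean_pool and the toy 2-dimensional vectors are hypothetical, and padding is assumed to be marked by an attention mask of 0.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the embeddings of the tokens whose mask value is 1 (hypothetical helper)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against division by zero for an input that is all padding.
    count = max(sum(attention_mask), 1)
    return [s / count for s in summed]


# Toy input: three tokens with 2-dimensional embeddings; the last token is padding.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

In the real model each token vector has 1024 dimensions, which is why the pooled sentence embedding is also 1024-dimensional.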

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("m7n/discipline-bert-modern-large_01")
# Run inference
sentences = [
    'normal subjects, residents of the Ural region, were examined by a dichromatic bone densitometer of "GE/Lunar" firm (USA). After that they were divided according to their somatotype: normosthenics, hypersthenics and asthenics. Age-related groups in girls were formed from the age of years, in youths from years, up to years every other year, after years every years up to the age of years. The somatotype has been revealed to influence the mineral density (MD) of skeleton, the mass of muscular, connective and fatty tissues: MD in girls has been formed at the age of years, in youths at that of years. In normosthenics and asthenics MD at the same age was % and %, respectively. At the age of years MD in women with hypersthenia was % less than peak bone mass, in those with normosthenia it was % less and in women with asthenia % less. In men these measurements were and %, respectively.',
    'x-ray images of patients with posttraumatic defects of forearm bones have been analyzed using DiaMorph computer-assisted complex. Mean optical density of regenerated bone shadows has been evaluated for the purpose of studying the dynamics of osteogenesis and mineralization of newly formed bone tissue during osteosynthesis. By planimetry of distraction regenerated bones it was established that osteogenesis developed by normoplastic type. Typical distraction regenerated bones were formed while filling defect-diastases; the regenerated bones lost their zonal structure at the end of fixation period. During formation of wedge-shaped regenerated bones clear zonal structure of newly formed tissue was not traced, the area of interlayer occupied significantly less part than it was in case of filling the defects of forearm bones by fragment lengthening and formation of typical distraction regenerated bone.',
    'Early adult changes in the facial profile were studied longitudinally from to years of age in a Swedish Caucasian sample of female and male dental students. Lateral cephalometric radiographs were analysed by the conventional point-based method and by the structure-based method of superimposing serial films, adapted for computerized numerical analysis. Skeletal and soft tissue changes were described by linear and angular variables. The magnitude of linear dimensional changes was similar in the two sexes. The largest changes were found in the vertical dimensions. Total anterior facial height increased by about mm in the -year period, suggesting that the major part of the increase in vertical facial dimensions during the third decade of life takes place in the first half of this decade. Sagittal jaw relationship increased by about in both sexes. Soft tissue changes reflected those of the vertical skeletal dimensions.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
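
Because the similarity function listed for this model is cosine similarity, the 3 x 3 result above can be read as a matrix of pairwise cosine scores. As a rough illustration, here is a pure-Python sketch of that computation on toy 2-dimensional vectors; the helper names cosine_similarity and similarity_matrix are hypothetical and are not part of the Sentence Transformers API.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_matrix(vectors: List[List[float]]) -> List[List[float]]:
    """Pairwise cosine similarity between every pair of vectors."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Toy stand-ins for real embeddings, which would have 1024 dimensions.
toy_embeddings = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
matrix = similarity_matrix(toy_embeddings)
for row in matrix:
    print([round(score, 4) for score in row])
```

The matrix is symmetric, every vector has similarity 1 with itself, and vector length does not affect the score, only direction does.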

Evaluation

Metrics

Triplet

Datasets: modernBERT and modernBERT_disciplines
Evaluated with TripletEvaluator

Metric	modernBERT	modernBERT_disciplines
cosine_accuracy	0.6726	0.6756
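
For readers unfamiliar with the metric: cosine_accuracy is the share of (anchor, positive, negative) triplets in which the anchor is closer to the positive than to the negative under cosine distance. The sketch below illustrates that definition in pure Python on made-up 2-dimensional vectors; it is an illustration of the metric rather than the TripletEvaluator code, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """One minus the cosine similarity of a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def triplet_cosine_accuracy(triplets: Sequence[Tuple[Vector, Vector, Vector]]) -> float:
    """Fraction of triplets where the positive is nearer to the anchor than the negative."""
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cosine_distance(anchor, positive) < cosine_distance(anchor, negative)
    )
    return correct / len(triplets)


# Made-up embeddings: (anchor, positive, negative).
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),   # positive is nearer: counted as correct
    ([1.0, 0.0], [0.0, 1.0], [1.0, 0.1]),   # negative is nearer: counted as wrong
    ([0.0, 1.0], [0.1, 1.0], [1.0, 0.0]),   # correct
    ([1.0, 1.0], [1.0, 0.9], [-1.0, 0.5]),  # correct
]
accuracy = triplet_cosine_accuracy(toy_triplets)
print(accuracy)
```

Under this definition, chance level for random embeddings is about 0.5, which gives context for the scores in the table above.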

Training Details

Training Dataset

Unnamed Dataset

Size: 7,828 training samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 1000 samples:

	anchor	positive	negative
type	string	string	string
details	min: 82 tokens mean: 236.35 tokens max: 620 tokens	min: 82 tokens mean: 237.05 tokens max: 663 tokens	min: 82 tokens mean: 247.65 tokens max: 653 tokens

Samples:

anchor	positive	negative
Implementing management systems in organisations of all types and sizes often raises the following question: "What benefits will this bring?" Initial resistance and criticism are common as potential challenges are identified during the implementation process. To address this, it is essential to highlight the advantages of these systems and engage stakeholders in supporting management efforts. While the planning, implementation, use, maintenance, auditing, and improvement of management systems are generally voluntary, certification is frequently driven by external factors, particularly customer demands. Employees also stand to gain significantly, with knowledge and information serving as valuable resources, especially for leveraging artificial intelligence. This article explores the management's readiness to adopt and fully utilise two management systems based on international standards: the ISO Knowledge management system (KMS) and the ISO/IEC Artificial intelligence management system ...	Machine learning is an established and frequently used technique in industry and academia, but a standard process model to improve success and efficiency of machine learning applications is still missing. Project organizations and machine learning practitioners face manifold challenges and risks when developing machine learning applications and have a need for guidance to meet business expectations. This paper therefore proposes a process model for the development of machine learning applications, covering six phases from defining the scope to maintaining the deployed machine learning application. Business and data understanding are executed simultaneously in the first phase, as both have considerable impact on the feasibility of the project. The next phases are comprised of data preparation, modeling, evaluation, and deployment. 
Special focus is applied to the last phase, as a model running in changing real-time environments requires close monitoring and maintenance to reduce the risk...	This study aims to obtain user satisfaction factors for a knowledge management system so that a questionnaire can be made for evaluation or measurement. The SECI method is used with the CISE sequence which consists of four knowledge creation steps, namely C-combination, I-nternalization, S-socialization, and ending E-externalization. The stage begins with literature studies and then modifications are made with the selection, addition, and incorporation of existing models. From understanding and analyzing several models, discussions or brainstorming with colleagues were then carried out so that a final model was obtained to compile a list of keywords and statements as a questionnaire based on indicators related to knowledge management and the satisfaction of knowledge management system users. The results obtained there are eight user satisfaction factors divided into technical aspects (knowledge quality, knowledge sharing, system quality, service quality) and social aspects (management ...
This study examines the effect of alloying elements of Ni and W on the repassivation properties of stainless steel (SS) as evaluated by a rapid scratching electrode technique and stress corrosion cracking (SCC) test. The SS specimens were grouped into two different grades according to Ni content (00Cr-0Ni duplex, Type 000LMN [UNS S00000] austenitic SS). Major considerations regarding alloy design were Ni content and the substitution of W for Mo. However, a similar pitting resistance equivalent number (PREN) of to was maintained for all specimens. The main factors for evaluation of repassivation properties are the peak current for the scratched surface and repassivation rate. In M magnesium chloride (MgCl0) and N sulfuric acid containing chloride ions (H0SO0 + % Cl) solution, repassivation test results showed that repassivation properties decreased as Ni content increased. However, W substitution was effective on the repassivation process and increased the resistance of SCC property for...	Abstract High-nitrogen (N) stainless steels (SS) are receiving increased attention because of their strength advantages over carbon (C)-alloyed materials, but they have been found susceptible to dichromium nitride (Cr0N) precipitation during thermal exposure between 000C and ,000C. Sensitization susceptibility of a high-N, low-C austenitic SS by Cr0N precipitation at 000C and 000C was determined using the single-loop electrochemical potentiokinetic reactivation (EPR) test. High-N SS was found susceptible to sensitization caused by grain boundary (GB) precipitation of Cr0N, with the degree of sensitization increasing systematically with aging time at 000C. Sensitization of high-N materials did not require the concomitant precipitation of chromium (Cr)-rich metal carbide (M00C0). Materials aged at 000C were not sensitized, although the rate of precipitation was greater than at 000C. 
This indicated the minimum Cr level in the Cr-depleted zone of the matrix associated with nitride precipit...	The anodic dissolution characteristics of nickel, molybdenum, and stainless steel have been examined in pure and eutectic melt. Molybdenum and nickel show Tafeltype dissolution kinetics in pure eutectic which permit estimates of longterm corrosion rates as a function of voltage. Nickel exhibits a sharp threshold potential for dissolution in melt, forming a nonpassivating layer. Comparative voltammetry and opencircuit potential measurements with iron in this melt suggest that care may be required in using nickel as an iron sulfide current collector. The anodic dissolution of stainless steel in melt appears to be rate limited by diffusion through a reaction layer, showing a dependence that may be applicable to longterm corrosion predictions. Dissolution is strongly inhibited by dissolved , apparently by formation of a protective anodic oxide layer. Molybdenum appears to owe its excellent anodic corrosion resistance in melt both to a chemically formed prepassive film and to a welldefined ...
FY-0E WindRAD (Fengyun-0E Wind Radar) is a dual-frequency rotating fan-beam scatterometer. Its data characteristics, NOC (NWP Ocean Calibration), and wind retrieval performance are investigated in this paper. The diversity of the radar view geometry varies across the swaths, with maximum diversity in the sweet swaths and limited diversity in the outer and nadir swaths. When NOC backscatter calibration coefficients are computed as a function of incidence angle only (NOCint), a smooth correction is found. However, when relative antenna azimuth angle is included (NOCant), it appears that the corrections as a function of relative azimuth angle vary harmonically and substantially for a specific incidence angle. NOCant corrections yield a better fit of the measurements to the GMF (Geophysical Model Function). Hence, NOCant is applied for the analysis of wind retrieval from the Ku-band and C-band. An extra engineering correction of dB and dB is applied on Ku-band and C-band backscatter values...	Spaceborne synthetic aperture radar (SAR) represents a powerful source of data for enhancing maritime domain awareness (MDA). Wakes generated by traveling vessels hold a crucial role in MDA since they can be exploited both for ship route and velocity estimation and as a marker of ship presence. Even if deep learning (DL) has led to an impressive performance boost on a variety of computer vision tasks, its usage for automatic target recognition (ATR) in SAR images to support MDA is still limited to the detection of ships rather than ship wakes. A dataset is presented in this paper and several state-of-the-art object detectors based on convolutional neural networks (CNNs) are tested with different backbones. The dataset, including more than wake chips, is realized by visually inspecting Sentinel- images over highly trafficked maritime sites. Extensive experiments are shown to characterize CNNs for the wake detection task. 
For the first time, a deep-learning approach is implemented to spe...	With the publication of Part Wind Actions of the South African Loading Code SANS : , several issues concerning adjustments from the reference standard Eurocode EN - - : could not be resolved due to lack of sufficient updated background information on South African conditions. The need for updating the map for the free field wind speed is related also to the improved representation of the mixed and complex strong wind climate of the country. Furthermore, strong wind probability models are used for the reliability assessment and calibration of wind design procedures. Updating of the reliability provisions for the revised wind loading process was a further need identified at the time. This paper provides a review of the historical development of the representation of the free field wind, used as input to design wind loading procedures for South Africa. The review considers: (i) the historical representations of the geographic distribution of free field wind, (ii) the climatic influences c...

Loss: TripletLoss with these parameters:

{
    "distance_metric": "TripletDistanceMetric.COSINE",
    "triplet_margin": 0.05
}
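
To make the two parameters concrete, here is a small pure-Python sketch of a triplet loss with cosine distance and a margin of 0.05: each triplet contributes max(0, d(anchor, positive) - d(anchor, negative) + margin), and the batch loss is the mean. This is a simplified illustration on toy vectors, not the library's TripletLoss class; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
TRIPLET_MARGIN = 0.05  # value taken from the loss parameters above


def cosine_distance(a: Vector, b: Vector) -> float:
    """One minus the cosine similarity of a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def triplet_loss(batch: Sequence[Tuple[Vector, Vector, Vector]], margin: float = TRIPLET_MARGIN) -> float:
    """Mean hinge loss over (anchor, positive, negative) triplets."""
    per_triplet = [
        max(0.0, cosine_distance(a, p) - cosine_distance(a, n) + margin)
        for a, p, n in batch
    ]
    return sum(per_triplet) / len(per_triplet)


easy = ([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])  # positive identical to anchor, negative orthogonal
hard = ([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])  # positive orthogonal, negative identical to anchor
loss = triplet_loss([easy, hard])
print(loss)
```

A triplet stops contributing once the negative is at least 0.05 farther from the anchor than the positive, so with this small margin the loss values stay small even when many triplets are still ordered incorrectly.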

Evaluation Dataset

Unnamed Dataset

Size: 391 evaluation samples
Columns: anchor, positive, and negative

Approximate statistics based on the first 391 samples:

	anchor	positive	negative
type	string	string	string
details	min: 85 tokens mean: 239.86 tokens max: 592 tokens	min: 87 tokens mean: 229.31 tokens max: 542 tokens	min: 93 tokens mean: 239.99 tokens max: 592 tokens

Samples:

anchor	positive	negative
Industrial Relations: A Journal of Economy and SocietyVolume , Issue p. - Internet Resources Selected by the Institute for Research on Labor and Employment Library University of California, Berkeley TERENCE K. HUWE, TERENCE K. HUWE Director of Library & Information ResourcesSearch for more papers by this authorJANICE KIMBALL, JANICE KIMBALL Library AssistantSearch for more papers by this author TERENCE K. HUWE, TERENCE K. HUWE Director of Library & Information ResourcesSearch for more papers by this authorJANICE KIMBALL, JANICE KIMBALL Library AssistantSearch for more papers by this author First published: April the full textAboutPDF ToolsRequest permissionExport citationAdd to favoritesTrack citation ShareShare Give accessShare full text accessShare full-text accessPlease review our Terms and Conditions of Use and check box below to share full-text version of article.I have read and accept the Wiley Online Library Terms and Conditions of UseShareable LinkUse the link below to share a ...	Industrial Relations: A Journal of Economy and SocietyVolume , Issue p. - Recent Publications Selected by the Institute for Research on Labor and Employment Library University of California, Berkeley Terence K. Huwe, Terence K. Huwe Director of Library & Information ResourcesSearch for more papers by this authorJanice Kimball, Janice Kimball Library AssistantSearch for more papers by this author Terence K. Huwe, Terence K. 
Huwe Director of Library & Information ResourcesSearch for more papers by this authorJanice Kimball, Janice Kimball Library AssistantSearch for more papers by this author First published: September the full textAboutPDF ToolsRequest permissionExport citationAdd to favoritesTrack citation ShareShare Give accessShare full text accessShare full-text accessPlease review our Terms and Conditions of Use and check box below to share full-text version of article.I have read and accept the Wiley Online Library Terms and Conditions of UseShareable LinkUse the link below to sha...	This paper suggests that all the models of industrial relations, not just the more statist ones, have been characterized throughout their history by complex and sometimes troublesome relationships with the state. These models have always been conditioned, and in certain sense shaped, by the latter's more or less direct intervention at the moment of their formation and as they have expanded or declined. An intervention which is also influenced by the nature of economic problems that national political economies have to cope with. Such difficulties of relationship are to a large extent due to the fact that political regulation and regulation through industrial relations only partially overlap in their goals and contents. More frequently they compete with each other and have methods and logics that tend to diverge. Whereas decisions are taken by majority principle in the political sphere, in industrial relations they can only be taken unamimously - and especially so in collective bargaini...
Poor response rates to follow-up questionnaires can adversely affect the progress of a randomised controlled trial and the validity of its results. This embedded 'study within a trial' aimed to investigate the impact of including a pen with the postal -month questionnaire completed by the trial participants on the response rates to this questionnaire.This study was a two-armed randomised controlled trial nested in the Gentle Years Yoga (GYY) trial. Participants in the intervention group of the GYY trial were allocated : using simple randomisation to either receive a pen (intervention) or no pen with their -month questionnaire (control). The primary outcome was the proportion of participants sent a -month questionnaire who returned it. Secondary outcomes were time taken to return the questionnaire, proportion of participants sent a reminder to return the questionnaire, and completeness of the questionnaire. Binary outcomes were analysed using logistic regression, time to return by Cox P...	Background Poor response rates to follow-up questionnaires can adversely affect the progress of a randomised controlled trial and the validity of its results. This embedded 'study within a trial' aimed to investigate the impact of including a pen with the postal -month questionnaire completed by the trial participants on the response rates to this questionnaire. Methods This study was a two-armed randomised controlled trial nested in the Gentle Years Yoga (GYY) trial. Participants in the intervention group of the GYY trial were allocated : using simple randomisation to either receive a pen (intervention) or no pen with their -month questionnaire (control). The primary outcome was the proportion of participants sent a -month questionnaire who returned it. Secondary outcomes were time taken to return the questionnaire, proportion of participants sent a reminder to return the questionnaire, and completeness of the questionnaire. 
Binary outcomes were analysed using logistic regression, tim...	Patients' failure to adhere on tuberculosis (TB) treatment leads to drug resistance, relapse and death. Non-adherence to TB treatment is higher during continuation treatment phase. The study aimed to evaluate effectiveness of combined pill refilling and medication reminders on adherence to TB treatment.A two-arm randomised controlled trial on adult patients with TB was used during continuation treatment phase. In the first arm, in addition to usual care, participants will receive cellphone-based daily medication and weekly pill refilling reminders. In the control arm, participants will receive only usual care. The study will use a covariate adaptive randomisation technique to balance covariates during allocation. The primary outcome is patients' adherence to TB treatment and secondary outcomes are attendance to clinic and treatment outcomes. We apply intention to treat with generalised linear mixed model.Ethical approval was obtained from Institutional Review Board of University of Gon...
EthologyVolume , Issue p. i-i Front CoverFree Access A male Swainson's Spurfowl, Pternistis swainsonii, calling out a raucous 'krrrraaak-krrrraaak-krrrraaak' in the bushveld of Kruger National Park, South Africa. Photograph reproduced by permission of Emmanuel Do Linh San - First published: June ToolsRequest permissionExport citationAdd to favoritesTrack citation ShareShare Give accessShare full text accessShare full-text accessPlease review our Terms and Conditions of Use and check box below to share full-text version of article.I have read and accept the Wiley Online Library Terms and Conditions of UseShareable LinkUse the link below to share a full-text version of this article with your friends and colleagues. Learn more.Copy URL Share a linkShare onFacebookTwitterLinked InRedditWechat No abstract is available for this article. Volume000, Issue0July 0000Pages i-i RelatedInformation	EthologyVolume , Issue p. i-i Front CoverFree Access Breeding male Southern Masked-Weaver, Ploceus velatus, building a nest in Addo Elephant National Park, South Africa. Photograph reproduced by permission of Emmanuel Do Linh San First published: July ToolsRequest permissionExport citationAdd to favoritesTrack citation ShareShare Give accessShare full text accessShare full-text accessPlease review our Terms and Conditions of Use and check box below to share full-text version of article.I have read and accept the Wiley Online Library Terms and Conditions of UseShareable LinkUse the link below to share a full-text version of this article with your friends and colleagues. Learn more.Copy URL Share a linkShare onFacebookTwitterLinkedInRedditWechat No abstract is available for this article. Volume000, Issue0August 0000Pages i-i RelatedInformation	IbisVolume , Issue p. - Do male Chaffinches Fringilla coelebs copy song sequencing and bout length from their tutors? 
Katharina Riebel, Corresponding Author Katharina Riebel School of Environmental and Evolutionary Biology, University of St Andrews, File KY00 0TS, UKBehavioural Biology, Institute of Evolutionary and Ecology Sciences, PO Box , RA Leiden, The Nederlands. Email: for more papers by this authorPeter J. B. Slater, Peter J. B. Slater School of Environmental and Evolutionary Biology, University of St Andrews, File KY00 0TS, UKSearch for more papers by this author Katharina Riebel, Corresponding Author Katharina Riebel School of Environmental and Evolutionary Biology, University of St Andrews, File KY00 0TS, UKBehavioural Biology, Institute of Evolutionary and Ecology Sciences, PO Box , RA Leiden, The Nederlands. Email: for more papers by this authorPeter J. B. Slater, Peter J. B. Slater School of Environmental and Evolutionary Biology, University of St Andrews, File KY00 0TS...

Loss: TripletLoss with these parameters:

{
    "distance_metric": "TripletDistanceMetric.COSINE",
    "triplet_margin": 0.05
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 4
per_device_eval_batch_size: 4
learning_rate: 1e-05
weight_decay: 0.01
warmup_ratio: 0.1
batch_sampler: no_duplicates
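
As a rough guide to what these settings imply, the sketch below derives the step counts and the learning-rate curve from the values on this card: 7,828 training samples, a per-device batch size of 4, and the num_train_epochs of 3 and linear scheduler listed under All Hyperparameters. It assumes a single device and no gradient accumulation, and it is a plain-Python approximation of a linear schedule with warmup, not the trainer's own code.

```python
import math

TRAIN_SAMPLES = 7828   # size of the training dataset
BATCH_SIZE = 4         # per_device_train_batch_size, assuming one device
EPOCHS = 3             # num_train_epochs
PEAK_LR = 1e-5         # learning_rate
WARMUP_RATIO = 0.1     # warmup_ratio

steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def learning_rate(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then linear decay back to 0."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - warmup_steps)


print(steps_per_epoch, total_steps, warmup_steps)
for step in (0, 100, warmup_steps, 3000, total_steps):
    print(step, learning_rate(step))
```

The derived total of 5,871 steps agrees with the final step reported in the training logs, and 100 steps corresponds to roughly 0.0511 of an epoch, matching the logging interval shown there.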

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 4
per_device_eval_batch_size: 4
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 1e-05
weight_decay: 0.01
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 3
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: False
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: no_duplicates
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	Validation Loss	modernBERT_cosine_accuracy	modernBERT_disciplines_cosine_accuracy
0	0	-	-	0.4783	-
0.0511	100	0.0534	0.0495	0.5090	-
0.1022	200	0.0502	0.0474	0.5243	-
0.1533	300	0.0486	0.0465	0.5499	-
0.2044	400	0.0465	0.0457	0.5831	-
0.2555	500	0.0468	0.0467	0.5754	-
0.3066	600	0.0465	0.0444	0.6113	-
0.3577	700	0.0426	0.0467	0.5831	-
0.4088	800	0.0445	0.0454	0.5857	-
0.4599	900	0.0441	0.0441	0.6215	-
0.5110	1000	0.0432	0.0423	0.6189	-
0.5621	1100	0.0433	0.0417	0.6189	-
0.6132	1200	0.0395	0.0416	0.6240	-
0.6643	1300	0.0408	0.0403	0.6419	-
0.7154	1400	0.0414	0.0414	0.6445	-
0.7665	1500	0.044	0.0423	0.6343	-
0.8176	1600	0.0436	0.0418	0.6292	-
0.8687	1700	0.0392	0.0402	0.6624	-
0.9198	1800	0.039	0.0434	0.6419	-
0.9709	1900	0.0413	0.0439	0.5959	-
1.0220	2000	0.0396	0.0437	0.6087	-
1.0731	2100	0.0402	0.0414	0.6266	-
1.1242	2200	0.0402	0.0411	0.6496	-
1.1753	2300	0.0362	0.0415	0.6419	-
1.2264	2400	0.0371	0.0393	0.6496	-
1.2775	2500	0.0353	0.0396	0.6445	-
1.3286	2600	0.0322	0.0418	0.6496	-
1.3797	2700	0.0329	0.0412	0.6394	-
1.4308	2800	0.0311	0.0400	0.6445	-
1.4819	2900	0.0318	0.0385	0.6573	-
1.5330	3000	0.0306	0.0387	0.6726	-
1.5841	3100	0.0273	0.0387	0.6803	-
1.6352	3200	0.0285	0.0384	0.6803	-
1.6863	3300	0.0299	0.0375	0.6675	-
1.7374	3400	0.0304	0.0378	0.6522	-
1.7885	3500	0.03	0.0388	0.6496	-
1.8396	3600	0.028	0.0383	0.6803	-
1.8906	3700	0.0264	0.0380	0.6957	-
1.9417	3800	0.0275	0.0388	0.6573	-
1.9928	3900	0.0314	0.0378	0.6803	-
2.0439	4000	0.03	0.0388	0.6777	-
2.0950	4100	0.0308	0.0380	0.6752	-
2.1461	4200	0.0263	0.0382	0.6598	-
2.1972	4300	0.0215	0.0391	0.6573	-
2.2483	4400	0.017	0.0413	0.6471	-
2.2994	4500	0.0173	0.0398	0.6726	-
2.3505	4600	0.0183	0.0393	0.6752	-
2.4016	4700	0.0189	0.0399	0.6957	-
2.4527	4800	0.0123	0.0407	0.6803	-
2.5038	4900	0.0155	0.0405	0.6803	-
2.5549	5000	0.0108	0.0413	0.6726	-
2.6060	5100	0.0112	0.0416	0.6650	-
2.6571	5200	0.0134	0.0414	0.6777	-
2.7082	5300	0.0133	0.0406	0.6624	-
2.7593	5400	0.0109	0.0408	0.6701	-
2.8104	5500	0.0121	0.0408	0.6726	-
2.8615	5600	0.0124	0.0408	0.6752	-
2.9126	5700	0.012	0.0407	0.6752	-
2.9637	5800	0.0127	0.0406	0.6726	-
3.0	5871	-	-	-	0.6756

Framework Versions

Python: 3.10.12
Sentence Transformers: 3.3.1
Transformers: 4.48.0.dev0
PyTorch: 2.5.1+cu121
Accelerate: 1.2.1
Datasets: 3.2.0
Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}
