---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:246166
- loss:MultipleNegativesRankingLoss
base_model: pankajrajdeo/UMLS-Pubmed-ST-TCE-Epoch-1-QA_10K-BioASQ
widget:
- source_sentence: Does innate defense regulator peptide 1018 protect against perinatal brain injury?
  sentences:
  - pFR-Z is a specific gene therapy vehicle for EBV-positive carcinomas.
  - There is emerging evidence that the VEGF system can play either a beneficial or a detrimental role depending on the specific pathologic situations. Therefore, modulating the renal VEGF axis by using an SRL-based regimen may influence the evolution of kidney injury associated with renal transplantation.
  - IDR-1018 suppresses proinflammatory mediators and cell injurious mechanisms in the developing brain, and postinsult treatment is efficacious in reducing LPS-induced hypoxic-ischemic brain damage. IDR-1018 is effective in the brain when given systemically, confers neuroprotection of both gray and white matter, and lacks significant effects on the brain under normal conditions. Thus, this peptide provides the features of a promising neuroprotective agent in newborns with brain injury.
- source_sentence: Does decitabine induce G2/M cell cycle arrest by suppressing p38/NF-κB signaling in human renal clear cell carcinoma?
  sentences:
  - We investigated gene expression profiling and pathways modulated by decitabine in RCC cells. Decitabine was shown to suppress the growth of RCC cells via G2/M cell cycle arrest and the p38-NF-κB signaling pathway may play a role in the anti-neoplastic effect of decitabine in RCC cells.
  - The CST and the MIST are safe and can be used as an alternative test to assess functional capacity in patients hospitalized for acute lung diseases. The worse the performance on the step tests, the lower the pulmonary function and the distance walked on the 6MWT, the greater the dyspnea, and the longer the hospitalization.
  - These findings implicate ENT1 in liver protection from ischemia and reperfusion injury and suggest ENT inhibitors may be of benefit in the prevention or treatment of ischemic liver injury.
- source_sentence: Does demineralized Bone Matrix Injection in Consolidation Phase enhance Bone Regeneration in Distraction Osteogenesis via Endochondral Bone Formation?
  sentences:
  - In 10 consecutive patients with CECS, a 6-week forefoot strike running intervention led to decreased postrunning lower leg intracompartmental pressures. Pain and disability typically associated with CECS were greatly reduced for up to 1 year after intervention. Surgical intervention was avoided for all patients.
  - DBM administration into the distraction gap at the end of the distraction period resulted in a significantly greater regenerate bone area, trabecular number, and cortical thickness in the rabbit tibial DO model. These data suggest that percutaneous DBM administration at the end of the distraction period or in the early consolidation period may stimulate regenerate bone formation and consolidation in a clinical situation with delayed bone healing during DO.
  - The chemical reaction of DPGP and dentin indicated that DPGP combined with CO2 laser is a potential regimen for the treatment of vertical root fracture.
- source_sentence: Does hDAC inhibitor MS-275 attenuate the inflammatory reaction in rat experimental autoimmune prostatitis?
  sentences:
  - In summary, our data demonstrated that MS-275 could effectively suppress inflammatory reaction in EAP, through suppressing immune cells and pro-inflammatory molecules, and inducing anti-inflammatory immune cells and molecules, which may suggest MS-275 as a potential candidate for treatment of inflammatory prostatitis.
  - The quality and completeness of reporting should be enhanced as a priority, because without this policymakers and practitioners will continue lack the evidence base they need to inform decision-making about health inequity. Furthermore, there is a need to develop methods to systematically consider impacts on equity in health status that is currently lacking in systematic reviews.
  - No HIV stage dominated the epidemical though the latent stage provided the largest contribution. The role of each stage depends on the phase of the epidemic and on the prevailing levels of sexual risk behavior in the populations in which HIV is spreading. These findings may influence the design and implementation of different HIV interventions.
- source_sentence: Are circulating microparticles elevated in carriers of factor V Leiden?
  sentences:
  - Isovolemic hemodilution (approximately 5% hematocrit) with albumin, pentastarch, or hetastarch solutions does not result in significant hepatic ischemia or injury assessed by histology.
  - Serial electrocardiogram recordings and troponin I assessments may be proposed for initial screening in high-risk trauma patients to detect anatomical cardiac injuries through the time course of circulating protein. Troponin I release does not have a prognosis value in trauma patients.
  - This is the first study on circulating MP levels in subjects who are heterozygote for factor V Leiden. We report that circulating platelet and leukocyte MP are elevated in carriers of this mutation and may be important contributors to risk of thrombosis.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on pankajrajdeo/UMLS-Pubmed-ST-TCE-Epoch-1-QA_10K-BioASQ

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [pankajrajdeo/UMLS-Pubmed-ST-TCE-Epoch-1-QA_10K-BioASQ](https://huggingface.co/pankajrajdeo/UMLS-Pubmed-ST-TCE-Epoch-1-QA_10K-BioASQ). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
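Semantic search with a model like this comes down to ranking candidate passages by the cosine similarity between their embeddings and the query embedding (the card lists cosine similarity as the model's similarity function). The sketch below shows only that ranking step in plain Python. The 4-dimensional vectors are made-up stand-ins for real 384-dimensional embeddings, and `cosine_similarity` and `rank_passages` are hypothetical helpers, not part of the sentence-transformers API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_passages(query_vec, passage_vecs):
    """Return (index, score) pairs sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, p) for p in passage_vecs]
    return sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors standing in for real model embeddings.
query = [0.9, 0.1, 0.0, 0.2]
passages = [
    [0.0, 0.8, 0.6, 0.0],  # unrelated passage
    [0.8, 0.2, 0.1, 0.3],  # passage that answers the query
    [0.1, 0.1, 0.9, 0.1],  # another unrelated passage
]
for idx, score in rank_passages(query, passages):
    print(idx, round(score, 3))
```

With real data, the vectors would come from `model.encode(...)`, and the same ranking is what a similarity matrix from `model.similarity` lets you read off row by row.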
## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [pankajrajdeo/UMLS-Pubmed-ST-TCE-Epoch-1-QA_10K-BioASQ](https://huggingface.co/pankajrajdeo/UMLS-Pubmed-ST-TCE-Epoch-1-QA_10K-BioASQ)
- **Maximum Sequence Length:** 1024 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("pankajrajdeo/UMLS-Pubmed-ST-TCE-Epoch-1-QA_10K-BioASQ-PQA")
# Run inference
sentences = [
    'Are circulating microparticles elevated in carriers of factor V Leiden?',
    'This is the first study on circulating MP levels in subjects who are heterozygote for factor V Leiden. We report that circulating platelet and leukocyte MP are elevated in carriers of this mutation and may be important contributors to risk of thrombosis.',
    'Isovolemic hemodilution (approximately 5% hematocrit) with albumin, pentastarch, or hetastarch solutions does not result in significant hepatic ischemia or injury assessed by histology.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 246,166 training samples
* Columns: `anchor` and `positive`
* Approximate statistics based on the first 1000 samples:
  |      | anchor | positive |
  |:-----|:-------|:---------|
  | type | string | string   |
* Samples:
  | anchor | positive |
  |:-------|:---------|
  | Survival of women with gestational trophoblastic neoplasia and liver metastases: is it improving? | The prognosis of patients with liver metastases from GTN has improved. Outcome may be best in those patients presenting within 2.8 years of the causative pregnancy and without very large volumes of disease. |
  | Do serum nitrites predict the response to prostaglandin-induced delivery at term? | A reduced level of NOx is associated with a prompt clinical response to PGE-induced labor. Provided we do not know the origin of NOx in the general circulation, these data indicate NOx levels as predictors of the response to PGE-induced delivery at term and support the hypothesis that labor onset is modulated by the endogenous NO activity. |
  | Is sleep deprivation an additional stress for parents staying in hospital? | Parental sleep deprivation needs to be acknowledged and accommodated when nurses and parents negotiate the care of children in hospital. |
* Loss: [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```

### Evaluation Dataset

#### Unnamed Dataset

* Size: 27,352 evaluation samples
* Columns: `anchor` and `positive`
* Approximate statistics based on the first 1000 samples:
  |      | anchor | positive |
  |:-----|:-------|:---------|
  | type | string | string   |
* Samples:
  | anchor | positive |
  |:-------|:---------|
  | Is dEAD-box protein p68 regulated by β-catenin/transcription factor 4 to maintain a positive feedback loop in control of breast cancer progression? | Our findings indicate that Wnt/β-catenin signaling plays an important role in breast cancer progression through p68 upregulation. |
  | Are obstetric medical emergency teams a step forward in maternal safety? | In the literature, there is a lack of reporting and probably of implementation of Obstetrics METs. Therefore, there is a need for more standardized experiences and reports on the implementation of various types of Obstetrics METs. We propose here a design for Obstetrics METs to be implemented in developing countries, aiming to reduce maternal mortality and morbidity resulting from obstetric hemorrhage. |
  | Is monocyte-Induced Prostate Cancer Cell Invasion Mediated by Chemokine ligand 2 and Nuclear Factor-κB Activity? | Co-cultures with monocyte-lineage cell lines stimulated increased prostate cancer cell invasion through increased CCL2 expression and increased prostate cancer cell NF-κB activity. CCL2 and NF-κB may be useful therapeutic targets to interfere with inflammation-induced prostate cancer invasion. |
* Loss: [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: epoch
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 4
- `learning_rate`: 2e-05
- `weight_decay`: 0.01
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `fp16`: True
- `load_best_model_at_end`: True
- `push_to_hub`: True
- `resume_from_checkpoint`: True

#### All Hyperparameters

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 4
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.01
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: True
- `resume_from_checkpoint`: True
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: proportional
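The non-default hyperparameters fix the length of the run and the learning-rate schedule. The sketch below assumes a single training device (the card does not state the device count) and counts optimizer updates as full gradient-accumulation windows per epoch, which is how we understand the Hugging Face Trainer to count them; it reproduces the 3846 steps reported in the last row of the training log. The `linear_lr` helper follows the linear-warmup-then-linear-decay shape selected by `lr_scheduler_type: linear`. Both helper names are hypothetical illustrations, not library APIs.

```python
import math

# Values taken from the hyperparameter lists above. A single training device is an
# assumption; the card does not say how many devices were used.
NUM_SAMPLES = 246_166      # training samples
PER_DEVICE_BATCH = 16      # per_device_train_batch_size
GRAD_ACCUM = 4             # gradient_accumulation_steps
NUM_EPOCHS = 1             # num_train_epochs
WARMUP_RATIO = 0.1         # warmup_ratio
BASE_LR = 2e-5             # learning_rate


def optimizer_steps(num_samples, batch_size, grad_accum, epochs):
    """Count optimizer updates: full accumulation windows per epoch, times epochs."""
    batches_per_epoch = math.ceil(num_samples / batch_size)  # dataloader_drop_last=False
    return max(batches_per_epoch // grad_accum, 1) * epochs


def linear_lr(step, total_steps, warmup_steps, base_lr):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


total = optimizer_steps(NUM_SAMPLES, PER_DEVICE_BATCH, GRAD_ACCUM, NUM_EPOCHS)
warmup = math.ceil(total * WARMUP_RATIO)
print(total, warmup)  # 3846 385
for step in (0, warmup // 2, warmup, total // 2, total):
    print(step, f"{linear_lr(step, total, warmup, BASE_LR):.2e}")
```

Under the same assumption, one epoch has 15,386 batches of 16, of which 15,384 fill complete windows of 4; that is consistent with the final log row reporting epoch 0.9999 (15,384 / 15,386) rather than exactly 1.0.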
### Training Logs
| Epoch  | Step | Training Loss | Validation Loss |
|:------:|:----:|:-------------:|:---------------:|
| 0.0260 | 100  | 0.0201        | -               |
| 0.0520 | 200  | 0.0162        | -               |
| 0.0780 | 300  | 0.0118        | -               |
| 0.1040 | 400  | 0.0112        | -               |
| 0.1300 | 500  | 0.0102        | -               |
| 0.1560 | 600  | 0.0101        | -               |
| 0.1820 | 700  | 0.0121        | -               |
| 0.2080 | 800  | 0.0127        | -               |
| 0.2340 | 900  | 0.008         | -               |
| 0.2600 | 1000 | 0.0086        | -               |
| 0.2860 | 1100 | 0.0073        | -               |
| 0.3120 | 1200 | 0.0113        | -               |
| 0.3380 | 1300 | 0.0084        | -               |
| 0.3640 | 1400 | 0.0079        | -               |
| 0.3900 | 1500 | 0.0073        | -               |
| 0.4160 | 1600 | 0.0048        | -               |
| 0.4420 | 1700 | 0.0088        | -               |
| 0.4680 | 1800 | 0.0077        | -               |
| 0.4940 | 1900 | 0.0076        | -               |
| 0.5200 | 2000 | 0.0064        | -               |
| 0.5460 | 2100 | 0.0074        | -               |
| 0.5719 | 2200 | 0.0079        | -               |
| 0.5979 | 2300 | 0.0079        | -               |
| 0.6239 | 2400 | 0.0078        | -               |
| 0.6499 | 2500 | 0.0073        | -               |
| 0.6759 | 2600 | 0.0076        | -               |
| 0.7019 | 2700 | 0.0085        | -               |
| 0.7279 | 2800 | 0.0081        | -               |
| 0.7539 | 2900 | 0.0083        | -               |
| 0.7799 | 3000 | 0.0055        | -               |
| 0.8059 | 3100 | 0.0068        | -               |
| 0.8319 | 3200 | 0.007         | -               |
| 0.8579 | 3300 | 0.0087        | -               |
| 0.8839 | 3400 | 0.007         | -               |
| 0.9099 | 3500 | 0.0068        | -               |
| 0.9359 | 3600 | 0.0069        | -               |
| 0.9619 | 3700 | 0.0109        | -               |
| 0.9879 | 3800 | 0.0053        | -               |
| 0.9999 | 3846 | -             | 0.0062          |

### Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.3.1
- Transformers: 4.46.2
- PyTorch: 2.5.1+cu121
- Accelerate: 1.1.1
- Datasets: 3.1.0
- Tokenizers: 0.20.3

## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
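The MultipleNegativesRankingLoss configuration listed for both dataset splits (`scale: 20.0`, `similarity_fct: cos_sim`) uses in-batch negatives: for each anchor, every other positive in the batch serves as a negative, and the loss is the cross-entropy over the scaled cosine scores with the anchor's own positive as the target. The plain-Python sketch below illustrates that computation for anchor/positive batches without extra hard-negative columns. It is an explanatory re-implementation under those assumptions, not the sentence-transformers source; `mnrl_loss` is a hypothetical name and the vectors are invented.

```python
import math


def cos_sim(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnrl_loss(anchors, positives, scale=20.0):
    """Mean cross-entropy over in-batch candidates; positives[i] is the target for anchors[i]."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        # Scaled similarity of this anchor to every positive in the batch.
        logits = [scale * cos_sim(anchor, p) for p in positives]
        # Log-sum-exp with the max subtracted for numerical stability.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_norm - logits[i]  # negative log-softmax of the correct candidate
    return total / len(anchors)


# Hypothetical 3-dimensional embeddings for a batch of three (anchor, positive) pairs.
anchors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
positives = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]

print(round(mnrl_loss(anchors, positives), 4))                  # matched pairs: low loss
print(round(mnrl_loss(anchors, list(reversed(positives))), 4))  # mismatched pairs: higher loss
```

Because the negatives come from the batch itself, larger batches give each anchor more negatives to separate from at no extra labeling cost.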