SentenceTransformer based on Qwen/Qwen3-Embedding-0.6B

This is a sentence-transformers model fine-tuned from Qwen/Qwen3-Embedding-0.6B. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for retrieval, for example ranking passages against a query by cosine similarity.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Qwen/Qwen3-Embedding-0.6B
  • Maximum Sequence Length: 256 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Supported Modality: Text

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}}, 'module_output_name': 'token_embeddings', 'architecture': 'Qwen3Model'})
  (1): Pooling({'embedding_dimension': 1024, 'pooling_mode': 'lasttoken', 'include_prompt': True})
  (2): Normalize({})
)
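
The three modules above can be read as a pipeline: the Transformer produces one vector per token, Pooling with pooling_mode 'lasttoken' keeps the vector of the last non-padding token, and Normalize rescales it to unit length. The following is a minimal pure-Python sketch of the last two steps, assuming toy 3-dimensional vectors instead of 1024; the function names are hypothetical and this is not the library's implementation.

```python
import math

def last_token_pool(token_embeddings, attention_mask):
    """Return the embedding of the last token whose mask value is 1."""
    last_index = max(i for i, m in enumerate(attention_mask) if m == 1)
    return token_embeddings[last_index]

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

# Toy sequence: three real tokens followed by one padding position.
token_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0],
    [3.0, 4.0, 0.0],
    [9.0, 9.0, 9.0],  # padding, ignored because its mask value is 0
]
attention_mask = [1, 1, 1, 0]

sentence_embedding = l2_normalize(last_token_pool(token_embeddings, attention_mask))
print(sentence_embedding)  # [0.6, 0.8, 0.0]
```

Because the output has unit length, the cosine similarity between two embeddings equals their dot product.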

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("MinhPhuc0804/qwen3-embedding-docling-checkthat-task1-v1")
# Run inference
queries = [
    'query: Myocarditis-triggered Unexpected Demise following BNT162b2 mRNA COVID-19 Immunization in Korea: Case Study Concentrating on Tissue-level Findings Sangjoon Choi et al.',
]
documents = [
    'passage: title: Myocarditis-induced Sudden Death after BNT162b2 mRNA COVID-19 Vaccination in Korea: Case Report Focusing on Histopathological Findings\nabstract: We present autopsy findings of a 22-year-old man who developed chest pain 5 days after the first dose of the BNT162b2 mRNA vaccine and died 7 hours later. Histological examination of the heart revealed isolated atrial myocarditis, with neutrophil and histiocyte predominance. Immunohistochemical C4d staining revealed scattered single-cell necrosis of myocytes which was not accompanied by inflammatory infiltrates. Extensive contraction band necrosis was observed in the atria and ventricles. There was no evidence of microthrombosis or infection in the heart and other organs. The primary cause of death was determined to be myocarditis, causally-associated with the BNT162b2 vaccine.',
    'passage: atase (DUSP) expression in neurons in the disease process.\n\ntitle: Mitogen Activated Protein Kinase (MAPK) Activation, p53, and Autophagy Inhibition Characterize the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) Spike Protein Induced Neurotoxicity\nThe pathways induced by the spike protein via toll-like receptor activation induce both the upregulation of PrPC (the normal isoform of the prion protein, PrP) and the expression of β amyloid. Through the spike-protein-dependent elevation of p53 levels via β amyloid metabolism, increased PrPC expression can lead to PrP misfolding and impaired autophagy, generating prion disease. We conclude that, according to the age of the spike protein-exposed patient and the state of their cellular autophagy activity, excess sustained activity of p53 in neurons may be a catalytic factor in neurodegeneration. An autoimmune reaction via molecular mimicry likely also contributes to neurological symptoms. Overall results suggest that neurodegeneration is in part due to the intensity and duration of spike protein exposure, patient advanced age, cellular autophagy activity, and activation, function and regulation of p53.',
    'passage: more contacts (22.8%; 95% CI, 13.6%-33.5%).\n\ntitle: Household Transmission of SARS-CoV-2\n<h3>Conclusions and Relevance</h3> The findings of this study suggest that given that individuals with suspected or confirmed infections are being referred to isolate at home, households will continue to be a significant venue for transmission of SARS-CoV-2.',
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 1024) (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.8458, 0.1361, 0.0091]])
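
For retrieval, the similarity row for each query is typically sorted to obtain a ranking. A minimal sketch, assuming the scores printed above have been copied into a plain list; rank_documents is a hypothetical helper, not part of Sentence Transformers.

```python
# Scores copied from the example output above (one query, three documents).
similarities = [[0.8458, 0.1361, 0.0091]]

def rank_documents(scores, top_k):
    """Return (document_index, score) pairs sorted by descending score."""
    ranked = sorted(enumerate(scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]

top_hits = rank_documents(similarities[0], top_k=2)
print(top_hits)  # [(0, 0.8458), (1, 0.1361)]
```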

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.5595
cosine_accuracy@3 0.7668
cosine_accuracy@5 0.8234
cosine_accuracy@10 0.8712
cosine_precision@1 0.5595
cosine_precision@3 0.2556
cosine_precision@5 0.1647
cosine_precision@10 0.0871
cosine_recall@1 0.5595
cosine_recall@3 0.7668
cosine_recall@5 0.8234
cosine_recall@10 0.8712
cosine_ndcg@10 0.721
cosine_mrr@10 0.6723
cosine_map@100 0.676
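
In the table above, accuracy@k and recall@k are identical and precision@k equals accuracy@k divided by k (for example 0.7668 / 3 ≈ 0.2556). This is what one would expect if each evaluation query has exactly one relevant passage, although the card does not state this explicitly. Under that assumption, the metrics reduce to simple functions of the rank of the relevant passage. A minimal sketch with hypothetical ranks; the function name and toy data are illustrative and not taken from the evaluator.

```python
def retrieval_metrics(ranks, k):
    """Compute top-k metrics when every query has exactly one relevant passage.

    ranks holds the 1-based rank of the relevant passage for each query.
    """
    n = len(ranks)
    hits = sum(1 for r in ranks if r <= k)
    accuracy = hits / n
    return {
        "accuracy": accuracy,
        "precision": accuracy / k,  # at most one relevant passage among k results
        "recall": accuracy,         # the single relevant passage is found or not
        "mrr": sum(1.0 / r for r in ranks if r <= k) / n,
    }

# Hypothetical ranks for four queries.
ranks = [1, 2, 5, 1]
metrics_at_3 = retrieval_metrics(ranks, k=3)
print(metrics_at_3)
# {'accuracy': 0.75, 'precision': 0.25, 'recall': 0.75, 'mrr': 0.625}

# Consistency check against the reported table: precision@k = accuracy@k / k.
reported_accuracy = {3: 0.7668, 5: 0.8234, 10: 0.8712}
implied_precision = {k: round(a / k, 4) for k, a in reported_accuracy.items()}
print(implied_precision)  # {3: 0.2556, 5: 0.1647, 10: 0.0871}
```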

Training Details

Training Dataset

Unnamed Dataset

  • Size: 17,319 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
                sentence_0      sentence_1
    type        string          string
    min         19 tokens       21 tokens
    mean        54.81 tokens    180.82 tokens
    max         123 tokens      256 tokens
  • Samples:
    Sample 1
    sentence_0: query: 10) Sure enough, a slew of reports in numerous nations showing a rise in infections in children in the fresh surge in 2021, probably because of #B117 variant.
    sentence_1: passage: otherapy at Israel's Bar-Ilan University and member of the country's national covid-19 vaccine clinical trial advisory committee,

    title: Covid-19: More young children are being infected in Israel and Italy, emerging data suggest
    told The BMJ that his figures indicated that, since the emergence of the UK variant B1.1.7 in Israel in mid-December, the proportion of new daily cases accounted for by children aged under 10 had risen by nearly a quarter (23%).Cohen urged caution over schools reopening."Though I am convinced that education should be the first sector to open up because of its importance, it is my personal opinion that we should still reopen gradually . . .until we understand better the infection pattern of this new variant," he said.He said that no evidence yet showed the new variant to be more dangerous to children, but he noted that in January Israel had opened its first special covid-19 intensive care unit for children, admitting four or five children.

    Sample 2
    sentence_0: query: CDC: Monkeypox isn’t airborne. And CDC: Wear a snug mask. Me: snug mask might only imply aerosols. Which are in the air. If not, you could wear a face shield for just droplets. Research on fomites- reveals that #MonkeypoxIsAirborne
    sentence_1: passage: " or ulcerated, to characteristically well-circumscribed and centrally umbilicated. Both patients had mild illness.

    title: High-Contact Object and Surface Contamination in a Household of Persons with Monkeypox Virus Infection — Utah, June 2022
    The time from symptom onset to resolution was approximately 30 days for patient A and approximately 22 days for patient B.

    Sample 3
    sentence_0: query: The extra advantage of vaccination combined with Omicron infection for neutralizing antibodies versus infection alone: far reduced expected protection throughout all variants, Omicron itself included.
    sentence_1: passage: title: Neutralization profile of Omicron variant convalescent individuals
    abstract: Abstract Recently, the Omicron variant of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been described as immune escape variant. Here, we analyzed samples from BA.1 (Omicron) convalescent patients with different constellations of prior SARS-CoV-2 immunity regarding vaccination and previous infection with a non-Omicron variant and determined titers of neutralizing antibodies against different SARS-CoV-2 variants (D614G, Alpha, Beta, Delta, Gamma, Omicron). We found high neutralizing antibody titers against all variants for vaccinated individuals after BA.1 breakthrough infection or for individuals after infection with a pre-omicron variant followed by BA.1 infection. In contrast, samples from naive unvaccinated individuals after BA.1 infection mainly contained neutralizing antibodies against BA.1 but only occasionally against the other variants.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false,
        "directions": [
            "query_to_doc"
        ],
        "partition_mode": "joint",
        "hardness_mode": null,
        "hardness_strength": 0.0
    }
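
With "scale": 20.0, "similarity_fct": "cos_sim" and the single "query_to_doc" direction, the loss can be understood as a cross-entropy over scaled cosine similarities in which passage i is the positive for query i and all other passages in the batch act as negatives. The sketch below illustrates that idea in pure Python on toy 2-dimensional vectors. It is a simplified reading, not the library's implementation: the function names are hypothetical, and the partition and hardness options are ignored.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def in_batch_ranking_loss(queries, documents, scale=20.0):
    """Mean cross-entropy where documents[i] is the positive for queries[i]
    and every other document in the batch serves as a negative."""
    losses = []
    for i, query in enumerate(queries):
        logits = [scale * cosine(query, doc) for doc in documents]
        # Numerically stable log-sum-exp over the batch.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)

queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each positive is close to its query
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each positive is close to the other query

loss_aligned = in_batch_ranking_loss(queries, aligned)
loss_swapped = in_batch_ranking_loss(queries, swapped)
print(loss_aligned < loss_swapped)  # True
```

A larger scale sharpens the softmax, so small differences in cosine similarity translate into larger differences in loss.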
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • num_train_epochs: 10
  • fp16: True
  • multi_dataset_batch_sampler: round_robin

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}
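
With learning_rate 5e-05, lr_scheduler_type linear and warmup_steps 0, the learning rate would decay linearly from its initial value to zero over the run. The sketch below shows that schedule, assuming 5420 total optimizer steps (the last step in the training logs); linear_lr is a hypothetical helper written for illustration, not a Transformers function.

```python
def linear_lr(step, base_lr=5e-5, total_steps=5420, warmup_steps=0):
    """Linear warmup (if any) followed by linear decay to zero."""
    if step < warmup_steps:
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))

for step in (0, 2710, 5420):
    print(step, linear_lr(step))
# 0 5e-05
# 2710 2.5e-05
# 5420 0.0
```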

Training Logs

Epoch Step Training Loss 10-percent-dev-split_cosine_ndcg@10
0.9225 500 0.3566 -
1.0 542 - 0.7010
1.8450 1000 0.194 -
2.0 1084 - 0.7106
2.7675 1500 0.0851 -
3.0 1626 - 0.7200
3.6900 2000 0.0447 -
4.0 2168 - 0.7184
4.6125 2500 0.0342 -
5.0 2710 - 0.7151
5.5351 3000 0.0254 -
6.0 3252 - 0.7113
6.4576 3500 0.0216 -
7.0 3794 - 0.7171
7.3801 4000 0.0189 -
8.0 4336 - 0.7221
8.3026 4500 0.0169 -
9.0 4878 - 0.7179
9.2251 5000 0.0167 -
10.0 5420 - 0.7210
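
The step and epoch columns follow from the dataset size and batch size. The arithmetic below reproduces them, assuming a single device, gradient_accumulation_steps of 1 and a kept final partial batch (dataloader_drop_last: False); the variable names are illustrative.

```python
import math

train_samples = 17_319
batch_size = 32
epochs = 10

# The last, partial batch still counts as one optimizer step.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
print(steps_per_epoch, total_steps)  # 542 5420

# Fractional epoch reached at the first logged training step.
print(round(500 / steps_per_epoch, 4))  # 0.9225

# Dev nDCG@10 per epoch, copied from the table above.
dev_ndcg = {1: 0.7010, 2: 0.7106, 3: 0.7200, 4: 0.7184, 5: 0.7151,
            6: 0.7113, 7: 0.7171, 8: 0.7221, 9: 0.7179, 10: 0.7210}
best_epoch = max(dev_ndcg, key=dev_ndcg.get)
print(best_epoch, dev_ndcg[best_epoch])  # 8 0.7221
```

The dev score peaks at epoch 8 while the training loss keeps falling, and the epoch-10 value (0.7210) matches the cosine_ndcg@10 reported under Metrics. Since load_best_model_at_end is False, the reported metrics appear to describe the final checkpoint rather than the best one.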

Training Time

  • Training: 1.3 hours

Framework Versions

  • Python: 3.12.6
  • Sentence Transformers: 5.4.1
  • Transformers: 4.56.0
  • PyTorch: 2.8.0+cu129
  • Accelerate: 1.10.1
  • Datasets: 4.8.4
  • Tokenizers: 0.22.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{oord2019representationlearningcontrastivepredictive,
      title={Representation Learning with Contrastive Predictive Coding},
      author={Aaron van den Oord and Yazhe Li and Oriol Vinyals},
      year={2019},
      eprint={1807.03748},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/1807.03748},
}