SentenceTransformer based on Qwen/Qwen2.5-0.5B-Instruct

This is a sentence-transformers model fine-tuned from Qwen/Qwen2.5-0.5B-Instruct. It maps sentences and paragraphs to an 896-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and similar tasks.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Qwen/Qwen2.5-0.5B-Instruct
  • Maximum Sequence Length: 1024 tokens
  • Output Dimensionality: 896 dimensions
  • Similarity Function: Cosine Similarity
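
For readers unfamiliar with the similarity function named above, the following is a minimal pure-Python sketch of cosine similarity between two vectors. The function name `cosine_similarity` is illustrative and is not part of the Sentence Transformers API; in practice the library computes this on batched tensors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Because the score depends only on direction, scaling either embedding leaves it unchanged.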

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: Qwen2Model 
  (1): Pooling({'word_embedding_dimension': 896, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
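
The Pooling module above has only `pooling_mode_mean_tokens` enabled, so the sentence embedding is the average of the token embeddings. A rough pure-Python sketch of that idea follows; it assumes a single sequence with a 0/1 attention mask, and the name `mean_pool` is hypothetical. The library itself works on batched tensors.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose attention-mask entry is 1."""
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m]
    if not kept:
        raise ValueError("attention mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input: three token slots (the last is padding), 2 dimensions instead of 896.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
sentence_vector = mean_pool(tokens, mask)
```

Padding positions are excluded, so the padded slot does not affect the result.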

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("AlexWortega/qwen3k")
# Run inference
sentences = [
    'When was ABC formed?',
    "American Broadcasting Company\nABC launched as a radio network on October 12, 1943, serving as the successor to the NBC Blue Network, which had been purchased by Edward J. Noble. It extended its operations to television in 1948, following in the footsteps of established broadcast networks CBS and NBC. In the mid-1950s, ABC merged with United Paramount Theatres, a chain of movie theaters that formerly operated as a subsidiary of Paramount Pictures. Leonard Goldenson, who had been the head of UPT, made the new television network profitable by helping develop and greenlight many successful series. In the 1980s, after purchasing an 80% interest in cable sports channel ESPN, the network's corporate parent, American Broadcasting Companies, Inc., merged with Capital Cities Communications, owner of several print publications, and television and radio stations. In 1996, most of Capital Cities/ABC's assets were purchased by The Walt Disney Company.",
    'Americans Battling Communism\nAmericans Battling Communism, Inc. (ABC) was an anti-communist organization created following an October 1947 speech by Pennsylvania Judge Blair Gunther that called for an "ABC movement" to educate America about communism. Chartered in November 1947 by Harry Alan Sherman, a local lawyer active in various anti-communist organizations, the group took part in such activities as blacklisting by disclosing the names of people suspected of being communists. Its members included local judges and lawyers active in the McCarthy-era prosecution of communists.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 896)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
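
For semantic search, the similarity scores are typically turned into a ranking. The sketch below shows one way to do that in plain Python; `rank_by_similarity` and the example scores are hypothetical and are not produced by this model.

```python
def rank_by_similarity(scores):
    """Return candidate indices sorted from most to least similar."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Hypothetical scores of one query against three passages.
example_scores = [0.12, 0.71, 0.35]
ranking = rank_by_similarity(example_scores)
```

With the snippet above, the scores of the query against the two passages could be taken from `similarities[0, 1:].tolist()`, assuming the first sentence is the query.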

Evaluation

Metrics

Semantic Similarity

Metric           sts-dev-896  sts-dev-768
pearson_cosine   0.7513       0.7504
spearman_cosine  0.7603       0.7590
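
As a reminder of what `spearman_cosine` measures, here is a small sketch of the Spearman rank correlation between predicted cosine similarities and gold similarity labels. It is a pure-Python illustration with hypothetical function names, not the evaluator used to produce the numbers above.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))
```

Spearman only compares orderings, so a model can score well even if its cosine values are not on the same scale as the gold labels.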

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,077,240 training samples
  • Columns: query, response, and negative
  • Approximate statistics based on the first 1000 samples:
    • query (string): min 4 tokens, mean 8.76 tokens, max 26 tokens
    • response (string): min 23 tokens, mean 141.88 tokens, max 532 tokens
    • negative (string): min 4 tokens, mean 134.02 tokens, max 472 tokens
  • Samples:
    • query: Was there a year 0?
      response: Year zero
      Year zero does not exist in the anno Domini system usually used to number years in the Gregorian calendar and in its predecessor, the Julian calendar. In this system, the year 1 BC is followed by AD 1. However, there is a year zero in astronomical year numbering (where it coincides with the Julian year 1 BC) and in ISO 8601:2004 (where it coincides with the Gregorian year 1 BC) as well as in all Buddhist and Hindu calendars.
      negative: 504
      Year 504 (DIV) was a leap year starting on Thursday (link will display the full calendar) of the Julian calendar. At the time, it was known as the Year of the Consulship of Nicomachus without colleague (or, less frequently, year 1257 "Ab urbe condita"). The denomination 504 for this year has been used since the early medieval period, when the Anno Domini calendar era became the prevalent method in Europe for naming years.
    • query: When is the dialectical method used?
      response: Dialectic
      Dialectic or dialectics (Greek: διαλεκτική, dialektikḗ; related to dialogue), also known as the dialectical method, is at base a discourse between two or more people holding different points of view about a subject but wishing to establish the truth through reasoned arguments. Dialectic resembles debate, but the concept excludes subjective elements such as emotional appeal and the modern pejorative sense of rhetoric.[1][2] Dialectic may be contrasted with the didactic method, wherein one side of the conversation teaches the other. Dialectic is alternatively known as minor logic, as opposed to major logic or critique.
      negative: Derek Bentley case
      Another factor in the posthumous defence was that a "confession" recorded by Bentley, which was claimed by the prosecution to be a "verbatim record of dictated monologue", was shown by forensic linguistics methods to have been largely edited by policemen. Linguist Malcolm Coulthard showed that certain patterns, such as the frequency of the word "then" and the grammatical use of "then" after the grammatical subject ("I then" rather than "then I"), were not consistent with Bentley's use of language (his idiolect), as evidenced in court testimony. These patterns fit better the recorded testimony of the policemen involved. This is one of the earliest uses of forensic linguistics on record.
    • query: What do Grasshoppers eat?
      response: Grasshopper
      Grasshoppers are plant-eaters, with a few species at times becoming serious pests of cereals, vegetables and pasture, especially when they swarm in their millions as locusts and destroy crops over wide areas. They protect themselves from predators by camouflage; when detected, many species attempt to startle the predator with a brilliantly-coloured wing-flash while jumping and (if adult) launching themselves into the air, usually flying for only a short distance. Other species such as the rainbow grasshopper have warning coloration which deters predators. Grasshoppers are affected by parasites and various diseases, and many predatory creatures feed on both nymphs and adults. The eggs are the subject of attack by parasitoids and predators.
      negative: Groundhog
      Very often the dens of groundhogs provide homes for other animals including skunks, red foxes, and cottontail rabbits. The fox and skunk feed upon field mice, grasshoppers, beetles and other creatures that destroy farm crops. In aiding these animals, the groundhog indirectly helps the farmer. In addition to providing homes for itself and other animals, the groundhog aids in soil improvement by bringing subsoil to the surface. The groundhog is also a valuable game animal and is considered a difficult sport when hunted in a fair manner. In some parts of Appalachia, they are eaten.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
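
To make the loss parameters concrete, the sketch below computes a multiple-negatives ranking loss for one batch of (query, response, negative) embeddings in pure Python: each query is scored against every response and every negative in the batch with scaled cosine similarity, and cross-entropy is taken with the query's own response as the target. The names `cos_sim` and `mnr_loss` are illustrative; the library implementation works on tensors.

```python
import math


def cos_sim(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnr_loss(queries, responses, negatives, scale=20.0):
    """Mean cross-entropy where response i is the correct candidate for query i."""
    candidates = responses + negatives  # in-batch responses plus hard negatives
    total = 0.0
    for i, query in enumerate(queries):
        logits = [scale * cos_sim(query, cand) for cand in candidates]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_norm - logits[i]
    return total / len(queries)
```

A larger `scale` sharpens the softmax, so a response that is only slightly closer than the negatives already receives most of the probability mass.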
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 12
  • per_device_eval_batch_size: 12
  • gradient_accumulation_steps: 4
  • num_train_epochs: 1
  • warmup_ratio: 0.3
  • bf16: True
  • batch_sampler: no_duplicates
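
The hyperparameters above imply an effective batch size and an approximate number of optimizer steps per epoch. The arithmetic below is a rough estimate that assumes a single training device, which the card does not state; the exact count also depends on how the `no_duplicates` sampler forms batches.

```python
import math

num_samples = 1_077_240      # training set size from this card
per_device_batch = 12        # per_device_train_batch_size
grad_accum = 4               # gradient_accumulation_steps
num_devices = 1              # assumption: not stated in the card
warmup_ratio = 0.3

# Samples consumed per optimizer step.
effective_batch = per_device_batch * grad_accum * num_devices

# Estimated optimizer steps in the single epoch, and the warmup portion of them.
steps_per_epoch = math.ceil(num_samples / effective_batch)
warmup_steps = math.ceil(steps_per_epoch * warmup_ratio)

# Fraction of the epoch completed at step 3000.
epoch_at_3000 = round(3000 / steps_per_epoch, 4)
```

Under this assumption the estimate agrees with the training logs, where step 3000 is reported at epoch 0.1337. It also suggests that the logged steps fall within the warmup phase.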

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 12
  • per_device_eval_batch_size: 12
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 4
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 1
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.3
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
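
The combination of `lr_scheduler_type: linear`, `warmup_ratio: 0.3` and `learning_rate: 5e-05` describes a learning rate that rises linearly during warmup and then decays linearly to zero. A minimal sketch of that shape follows. The step counts are assumptions, estimated as 1,077,240 samples divided by 12 × 4 samples per step on a single device, and are not values reported in this card.

```python
def linear_schedule_lr(step, base_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to base_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Illustrative assumptions, not logged values.
TOTAL_STEPS = 22_443
WARMUP_STEPS = 6_733

lr_at_3000 = linear_schedule_lr(3000, 5e-05, WARMUP_STEPS, TOTAL_STEPS)
```

Under these assumptions, step 3000 is still inside the warmup phase, so the learning rate there is below its peak.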

Training Logs

Epoch Step Training Loss sts-dev-896_spearman_cosine sts-dev-768_spearman_cosine
0.0004 10 2.2049 - -
0.0009 20 2.3168 - -
0.0013 30 2.3544 - -
0.0018 40 2.2519 - -
0.0022 50 2.1809 - -
0.0027 60 2.1572 - -
0.0031 70 2.1855 - -
0.0036 80 2.5887 - -
0.0040 90 2.883 - -
0.0045 100 2.8557 - -
0.0049 110 2.9356 - -
0.0053 120 2.8833 - -
0.0058 130 2.8394 - -
0.0062 140 2.923 - -
0.0067 150 2.8191 - -
0.0071 160 2.8658 - -
0.0076 170 2.8252 - -
0.0080 180 2.8312 - -
0.0085 190 2.7761 - -
0.0089 200 2.7193 - -
0.0094 210 2.724 - -
0.0098 220 2.7484 - -
0.0102 230 2.7262 - -
0.0107 240 2.6964 - -
0.0111 250 2.6676 - -
0.0116 260 2.6715 - -
0.0120 270 2.6145 - -
0.0125 280 2.6191 - -
0.0129 290 1.9812 - -
0.0134 300 1.6413 - -
0.0138 310 1.6126 - -
0.0143 320 1.3599 - -
0.0147 330 1.2996 - -
0.0151 340 1.2654 - -
0.0156 350 1.9409 - -
0.0160 360 2.1287 - -
0.0165 370 1.8442 - -
0.0169 380 1.6837 - -
0.0174 390 1.5489 - -
0.0178 400 1.4382 - -
0.0183 410 1.4848 - -
0.0187 420 1.3481 - -
0.0192 430 1.3467 - -
0.0196 440 1.3977 - -
0.0201 450 1.26 - -
0.0205 460 1.2412 - -
0.0209 470 1.316 - -
0.0214 480 1.3501 - -
0.0218 490 1.2246 - -
0.0223 500 1.2271 - -
0.0227 510 1.1871 - -
0.0232 520 1.1685 - -
0.0236 530 1.1624 - -
0.0241 540 1.1911 - -
0.0245 550 1.1978 - -
0.0250 560 1.1228 - -
0.0254 570 1.1091 - -
0.0258 580 1.1433 - -
0.0263 590 1.0638 - -
0.0267 600 1.0515 - -
0.0272 610 1.175 - -
0.0276 620 1.0943 - -
0.0281 630 1.1226 - -
0.0285 640 0.9871 - -
0.0290 650 1.0171 - -
0.0294 660 1.0169 - -
0.0299 670 0.9643 - -
0.0303 680 0.9563 - -
0.0307 690 0.9841 - -
0.0312 700 1.0349 - -
0.0316 710 0.8958 - -
0.0321 720 0.9225 - -
0.0325 730 0.842 - -
0.0330 740 0.9104 - -
0.0334 750 0.8927 - -
0.0339 760 0.8508 - -
0.0343 770 0.8835 - -
0.0348 780 0.9531 - -
0.0352 790 0.926 - -
0.0356 800 0.8718 - -
0.0361 810 0.8261 - -
0.0365 820 0.8169 - -
0.0370 830 0.8525 - -
0.0374 840 0.8504 - -
0.0379 850 0.7625 - -
0.0383 860 0.8259 - -
0.0388 870 0.7558 - -
0.0392 880 0.7898 - -
0.0397 890 0.7694 - -
0.0401 900 0.7429 - -
0.0405 910 0.6666 - -
0.0410 920 0.7407 - -
0.0414 930 0.6665 - -
0.0419 940 0.7597 - -
0.0423 950 0.7035 - -
0.0428 960 0.7166 - -
0.0432 970 0.6889 - -
0.0437 980 0.7541 - -
0.0441 990 0.7175 - -
0.0446 1000 0.7389 0.6420 0.6403
0.0450 1010 0.7142 - -
0.0454 1020 0.7301 - -
0.0459 1030 0.7299 - -
0.0463 1040 0.6759 - -
0.0468 1050 0.7036 - -
0.0472 1060 0.6286 - -
0.0477 1070 0.595 - -
0.0481 1080 0.6099 - -
0.0486 1090 0.6377 - -
0.0490 1100 0.6309 - -
0.0495 1110 0.6306 - -
0.0499 1120 0.557 - -
0.0504 1130 0.5898 - -
0.0508 1140 0.5896 - -
0.0512 1150 0.6399 - -
0.0517 1160 0.5923 - -
0.0521 1170 0.5787 - -
0.0526 1180 0.591 - -
0.0530 1190 0.5714 - -
0.0535 1200 0.6047 - -
0.0539 1210 0.5904 - -
0.0544 1220 0.543 - -
0.0548 1230 0.6033 - -
0.0553 1240 0.5445 - -
0.0557 1250 0.5217 - -
0.0561 1260 0.5835 - -
0.0566 1270 0.5353 - -
0.0570 1280 0.5887 - -
0.0575 1290 0.5967 - -
0.0579 1300 0.5036 - -
0.0584 1310 0.5915 - -
0.0588 1320 0.5719 - -
0.0593 1330 0.5238 - -
0.0597 1340 0.5647 - -
0.0602 1350 0.538 - -
0.0606 1360 0.5457 - -
0.0610 1370 0.5169 - -
0.0615 1380 0.4967 - -
0.0619 1390 0.4864 - -
0.0624 1400 0.5133 - -
0.0628 1410 0.5587 - -
0.0633 1420 0.4691 - -
0.0637 1430 0.5186 - -
0.0642 1440 0.4907 - -
0.0646 1450 0.5281 - -
0.0651 1460 0.4741 - -
0.0655 1470 0.4452 - -
0.0659 1480 0.4771 - -
0.0664 1490 0.4289 - -
0.0668 1500 0.4551 - -
0.0673 1510 0.4558 - -
0.0677 1520 0.5159 - -
0.0682 1530 0.4296 - -
0.0686 1540 0.4548 - -
0.0691 1550 0.4439 - -
0.0695 1560 0.4295 - -
0.0700 1570 0.4466 - -
0.0704 1580 0.4717 - -
0.0708 1590 0.492 - -
0.0713 1600 0.4566 - -
0.0717 1610 0.4451 - -
0.0722 1620 0.4715 - -
0.0726 1630 0.4573 - -
0.0731 1640 0.3972 - -
0.0735 1650 0.5212 - -
0.0740 1660 0.4381 - -
0.0744 1670 0.4552 - -
0.0749 1680 0.4767 - -
0.0753 1690 0.4398 - -
0.0757 1700 0.4801 - -
0.0762 1710 0.3751 - -
0.0766 1720 0.4407 - -
0.0771 1730 0.4305 - -
0.0775 1740 0.3938 - -
0.0780 1750 0.4748 - -
0.0784 1760 0.428 - -
0.0789 1770 0.404 - -
0.0793 1780 0.4261 - -
0.0798 1790 0.359 - -
0.0802 1800 0.4422 - -
0.0807 1810 0.4748 - -
0.0811 1820 0.4352 - -
0.0815 1830 0.4032 - -
0.0820 1840 0.4124 - -
0.0824 1850 0.4486 - -
0.0829 1860 0.429 - -
0.0833 1870 0.4189 - -
0.0838 1880 0.3658 - -
0.0842 1890 0.4297 - -
0.0847 1900 0.4215 - -
0.0851 1910 0.3726 - -
0.0856 1920 0.3736 - -
0.0860 1930 0.4287 - -
0.0864 1940 0.4402 - -
0.0869 1950 0.4353 - -
0.0873 1960 0.3622 - -
0.0878 1970 0.3557 - -
0.0882 1980 0.4107 - -
0.0887 1990 0.3982 - -
0.0891 2000 0.453 0.7292 0.7261
0.0896 2010 0.3971 - -
0.0900 2020 0.4374 - -
0.0905 2030 0.4322 - -
0.0909 2040 0.3945 - -
0.0913 2050 0.356 - -
0.0918 2060 0.4182 - -
0.0922 2070 0.3694 - -
0.0927 2080 0.3989 - -
0.0931 2090 0.4237 - -
0.0936 2100 0.3961 - -
0.0940 2110 0.4264 - -
0.0945 2120 0.3609 - -
0.0949 2130 0.4154 - -
0.0954 2140 0.3661 - -
0.0958 2150 0.3328 - -
0.0962 2160 0.3456 - -
0.0967 2170 0.3478 - -
0.0971 2180 0.3339 - -
0.0976 2190 0.3833 - -
0.0980 2200 0.3238 - -
0.0985 2210 0.3871 - -
0.0989 2220 0.4009 - -
0.0994 2230 0.4115 - -
0.0998 2240 0.4024 - -
0.1003 2250 0.35 - -
0.1007 2260 0.3649 - -
0.1011 2270 0.3615 - -
0.1016 2280 0.3898 - -
0.1020 2290 0.3866 - -
0.1025 2300 0.3904 - -
0.1029 2310 0.3321 - -
0.1034 2320 0.3803 - -
0.1038 2330 0.3831 - -
0.1043 2340 0.403 - -
0.1047 2350 0.3803 - -
0.1052 2360 0.3463 - -
0.1056 2370 0.3987 - -
0.1060 2380 0.3731 - -
0.1065 2390 0.353 - -
0.1069 2400 0.3166 - -
0.1074 2410 0.3895 - -
0.1078 2420 0.4025 - -
0.1083 2430 0.3798 - -
0.1087 2440 0.2991 - -
0.1092 2450 0.3094 - -
0.1096 2460 0.3669 - -
0.1101 2470 0.3412 - -
0.1105 2480 0.3697 - -
0.1110 2490 0.369 - -
0.1114 2500 0.3393 - -
0.1118 2510 0.4232 - -
0.1123 2520 0.3445 - -
0.1127 2530 0.4165 - -
0.1132 2540 0.3721 - -
0.1136 2550 0.3476 - -
0.1141 2560 0.2847 - -
0.1145 2570 0.3609 - -
0.1150 2580 0.3017 - -
0.1154 2590 0.374 - -
0.1159 2600 0.3365 - -
0.1163 2610 0.393 - -
0.1167 2620 0.3623 - -
0.1172 2630 0.3538 - -
0.1176 2640 0.3206 - -
0.1181 2650 0.3962 - -
0.1185 2660 0.3087 - -
0.1190 2670 0.3482 - -
0.1194 2680 0.3616 - -
0.1199 2690 0.3955 - -
0.1203 2700 0.3915 - -
0.1208 2710 0.3782 - -
0.1212 2720 0.3576 - -
0.1216 2730 0.3544 - -
0.1221 2740 0.3572 - -
0.1225 2750 0.3107 - -
0.1230 2760 0.3579 - -
0.1234 2770 0.3571 - -
0.1239 2780 0.3694 - -
0.1243 2790 0.3674 - -
0.1248 2800 0.3373 - -
0.1252 2810 0.3362 - -
0.1257 2820 0.3225 - -
0.1261 2830 0.3609 - -
0.1265 2840 0.3681 - -
0.1270 2850 0.4059 - -
0.1274 2860 0.3047 - -
0.1279 2870 0.3446 - -
0.1283 2880 0.3507 - -
0.1288 2890 0.3124 - -
0.1292 2900 0.3712 - -
0.1297 2910 0.3394 - -
0.1301 2920 0.3869 - -
0.1306 2930 0.3449 - -
0.1310 2940 0.3752 - -
0.1314 2950 0.3341 - -
0.1319 2960 0.3329 - -
0.1323 2970 0.36 - -
0.1328 2980 0.3788 - -
0.1332 2990 0.3834 - -
0.1337 3000 0.3426 0.7603 0.7590
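
Most rows in the log carry only the training loss; evaluation scores appear every 1000 steps. The sketch below shows one way to pull out the evaluation rows, assuming the whitespace-separated layout shown above. `parse_log_row` is a hypothetical helper, and only a few rows are copied in for illustration.

```python
def parse_log_row(row):
    """Split one row into (epoch, step, loss, spearman_896, spearman_768)."""
    epoch, step, loss, s896, s768 = row.split()

    def to_score(cell):
        # A "-" cell means no evaluation was run at this step.
        return None if cell == "-" else float(cell)

    return float(epoch), int(step), float(loss), to_score(s896), to_score(s768)


rows = [
    "0.0441 990 0.7175 - -",
    "0.0446 1000 0.7389 0.6420 0.6403",
    "0.0891 2000 0.453 0.7292 0.7261",
    "0.1337 3000 0.3426 0.7603 0.7590",
]

# Keep only the steps at which the STS dev evaluation was run.
evaluations = [
    (step, s896, s768)
    for _, step, _, s896, s768 in map(parse_log_row, rows)
    if s896 is not None
]
```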

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.0
  • Transformers: 4.46.2
  • PyTorch: 2.1.0+cu118
  • Accelerate: 1.1.1
  • Datasets: 3.1.0
  • Tokenizers: 0.20.3

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 494M params (Safetensors, tensor type F32)