SentenceTransformer based on intfloat/multilingual-e5-large-instruct

This is a sentence-transformers model fine-tuned from intfloat/multilingual-e5-large-instruct on the measuring-embeddings-v3 dataset. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/multilingual-e5-large-instruct
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Training Dataset: measuring-embeddings-v3

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
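
The Pooling and Normalize modules listed above can be illustrated with a minimal NumPy sketch. It assumes mean pooling over non-padding tokens followed by L2 normalization, as the configuration suggests; the function name and the toy 4-dimensional inputs are hypothetical, and this is not the library's own code.

```python
import numpy as np

def mean_pool_and_normalize(token_embeddings, attention_mask):
    """Average the token vectors of each sequence (ignoring padding), then L2-normalize."""
    mask = attention_mask[..., None].astype(token_embeddings.dtype)  # (batch, seq, 1)
    summed = (token_embeddings * mask).sum(axis=1)                   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)                   # avoid division by zero
    pooled = summed / counts
    norms = np.linalg.norm(pooled, axis=1, keepdims=True)
    return pooled / np.clip(norms, 1e-12, None)

# Toy input: 2 sequences, 3 token positions, 4 dimensions (the real model uses 1024).
tokens = np.array([
    [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [9.0, 9.0, 9.0, 9.0]],  # last position is padding
    [[2.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0]],
])
mask = np.array([[1, 1, 0], [1, 1, 1]])
sentence_vectors = mean_pool_and_normalize(tokens, mask)
print(sentence_vectors.shape)
```

Because the output vectors have unit length, a dot product between two of them equals their cosine similarity.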

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Lauther/measuring-embeddings-v3-multilingual-e5-large-instruct-20e")
# Run inference
sentences = [
    'What is the table structure for secondary equipment?',
    'How are flow computers and measurement systems related?\nFlow computers can have multiple systems assigned to them. However, a measurement system can only be assigned to one flow computer.\n\nDatabase terminology:\nIn the database, this relationship is referred to as:\n- Meter streams\n- Meter runs\n- Sections\n\nStorage of the relationship:\nThe relationship between a flow computer and its assigned measurement system is stored in a special table.\n\nUser context:\nWhen a user refers to a "meter stream," they are indicating that they are searching for a measurement system assigned to a specific flow computer.',
    'What kind of data store an equipment?\nEquipments can capture meteorological data, such as pressure, temperature, and volume (magnitudes). This data is essential for users to perform various calculations.\n\nData storage:\n- The measured values are stored in a special table in the database for magnitudes. This table contains the values of the variables captured by the equipments.\n- These values are **direct measurements** from the fluid (e.g., raw pressure, temperature, or volume readings). **They are not calculated values**, such as uncertainty.\n- The values stored in the variable values table are **different** from variable uncertainty values, which are calculated separately and represent the margin of error.\n\nAccessing the data:\n- Users typically access the data by referring to the readings from the measurement system, not directly from the individual equipments.\n- The readings are stored in a "variable values" table within the database.\n\nLinking variable names:\nIf the user needs to know the name of a variable, they must link the data to another table that stores information about the types of variables.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
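
For semantic search, the first sentence in the example above can be treated as a query and the other two as candidate passages. The sketch below shows one way to rank candidates by cosine similarity. It uses small hypothetical unit vectors in place of real model.encode(...) output so that it runs without downloading the model, and the helper name rank_by_cosine is an assumption for illustration.

```python
import numpy as np

def rank_by_cosine(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar, with the scores."""
    scores = doc_vecs @ query_vec          # dot product == cosine for unit vectors
    order = np.argsort(-scores)
    return order, scores[order]

# Hypothetical unit vectors standing in for model.encode(...) output.
query = np.array([1.0, 0.0, 0.0])
docs = np.array([
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.6, 0.8, 0.0],   # partially aligned
    [1.0, 0.0, 0.0],   # identical direction
])
order, scores = rank_by_cosine(query, docs)
print(order.tolist())
```

With real embeddings, query would be one row of the encoded array and docs the remaining rows.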

Training Details

Training Dataset

measuring-embeddings-v3

  • Dataset: measuring-embeddings-v3 at 1b3cbbe
  • Size: 7,552 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
              sentence1        sentence2         score
    type      string           string            float
    min       9 tokens         120 tokens        0.0
    mean      15.96 tokens     255.56 tokens     0.22
    max       40 tokens        512 tokens        0.95
  • Samples:
    sentence1 sentence2 score
    sentence1: How can I combine the sub-query with the main query to fetch the last uncertainty report?
    sentence2: What do measurement equipment measure?
    Each equipment measures a physical magnitude, also known as a variable. Based on the type of variable they measure, devices are classified into different categories.

    Equipment classification:
    - Primary meter: Assigned by default to equipments like orifice plates.
    - Secondary meter: Assigned by default to equipments like transmitters.
    - Tertiary meter: Used for other types of equipments.

    Equipment types in the database:
    The database includes a table listing all equipment types. Examples of equipment types are:
    - Differential pressure transmitters
    - RTDs (Resistance Temperature Detectors)
    - Orifice plates
    - Multivariable transmitters
    - Ultrasonic meters

    Meteorological checks for equipments:
    Each equipment type is assigned a meteorological check, which can be either:
    - Calibration: To ensure measurement accuracy.
    - Inspection: To verify proper functioning.

    Data storage in tables:
    The database also includes a separate table for equipment classific...
    score: 0.1
    sentence1: What is the column name for the calibration date in the calibration table?
    sentence2: How are flow computers and measurement systems related?
    Flow computers can have multiple systems assigned to them. However, a measurement system can only be assigned to one flow computer.

    Database terminology:
    In the database, this relationship is referred to as:
    - Meter streams
    - Meter runs
    - Sections

    Storage of the relationship:
    The relationship between a flow computer and its assigned measurement system is stored in a special table.

    User context:
    When a user refers to a "meter stream," they are indicating that they are searching for a measurement system assigned to a specific flow computer.
    score: 0.1
    sentence1: What is the name of the table that contains the flow computer tags?
    sentence2: What is equipment calibration?
    Calibration is a metrological verification process used to ensure the accuracy of measurement equipment. It is performed periodically, based on intervals set by the company or a regulatory body.

    Purpose of calibration:
    The calibration process corrects any deviations in how the equipment measures physical magnitudes (variables). This ensures the equipment provides accurate and reliable data.

    Calibration cycles:
    There are two main calibration cycles:
    1. As-found: Represents the equipment's measurement accuracy before any adjustments are made. This cycle is almost always implemented.
    2. As-left: Represents the equipment's measurement accuracy after adjustments are made. This cycle is used depending on regulatory requirements.

    Calibration uncertainty:
    - Uncertainty is included in the results of a calibration.
    - Calibration uncertainty refers to the margin of error in the device's measurements, which also affects the uncertainty of the measured variable or ...
    score: 0.05
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
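
To make the loss parameters concrete, here is a minimal NumPy sketch of a CoSENT-style objective as I understand it from the cited post: for every two training pairs where the first has the lower gold score, the loss penalizes the first pair's cosine similarity exceeding the second's, with the differences multiplied by scale. The function name and toy numbers are hypothetical, and this is not the library's implementation.

```python
import numpy as np

def cosent_loss(cos_sims, gold_scores, scale=20.0):
    """log(1 + sum exp(scale * (cos_i - cos_j))) over pairs where gold_i < gold_j."""
    scaled = np.asarray(cos_sims, dtype=np.float64) * scale
    gold = np.asarray(gold_scores, dtype=np.float64)
    diffs = scaled[:, None] - scaled[None, :]          # diffs[i, j] = scale * (cos_i - cos_j)
    should_rank_lower = gold[:, None] < gold[None, :]  # pair i is labeled less similar than pair j
    terms = diffs[should_rank_lower]
    # Prepend 0 so the result is log(1 + sum(exp(terms))), computed stably.
    terms = np.concatenate(([0.0], terms))
    peak = terms.max()
    return float(peak + np.log(np.exp(terms - peak).sum()))

good = cosent_loss([0.9, 0.1], [1.0, 0.0])  # ranking agrees with the labels
bad = cosent_loss([0.1, 0.9], [1.0, 0.0])   # ranking contradicts the labels
print(good < bad)
```

Only the relative order of the gold scores matters here, so the loss depends on how the similarities rank against each other and not on their absolute values.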
    

Evaluation Dataset

measuring-embeddings-v3

  • Dataset: measuring-embeddings-v3 at 1b3cbbe
  • Size: 1,618 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
              sentence1        sentence2         score
    type      string           string            float
    min       9 tokens         120 tokens        0.0
    mean      15.83 tokens     250.41 tokens     0.23
    max       40 tokens        512 tokens        0.95
  • Samples:
    sentence1 sentence2 score
    sentence1: Identify any additional tables or columns that might be needed for the query.
    sentence2: How are flow computers and measurement systems related?
    Flow computers can have multiple systems assigned to them. However, a measurement system can only be assigned to one flow computer.

    Database terminology:
    In the database, this relationship is referred to as:
    - Meter streams
    - Meter runs
    - Sections

    Storage of the relationship:
    The relationship between a flow computer and its assigned measurement system is stored in a special table.

    User context:
    When a user refers to a "meter stream," they are indicating that they are searching for a measurement system assigned to a specific flow computer.
    score: 0.2
    sentence1: What columns in these tables contain the measurement system tag and the flow computer tag?
    sentence2: How does a flow computer generate and store reports?
    A flow computer generates daily or hourly reports to provide users with operational data. These reports are stored in the flow computer's memory in an organized format.

    Report structure:
    - Each report includes:
    - Date and time of the data recording.
    - Data recorded from flow computers.

    Data storage in tables:
    The reports are saved in two tables:
    1. Main table (Index):
    - Stores the date, time, and flow computer identifier.
    2. Detail table:
    - Stores the measured values associated with the report.

    Connection to the Modbus table:
    The flow computer's reports are linked to a Modbus table. This table contains the names corresponding to each value in the reports, making it easier to interpret the data.
    score: 0.1
    sentence1: Identify the column that stores the calibration number.
    sentence2: What kind of data store an equipment?
    Equipments can capture meteorological data, such as pressure, temperature, and volume (magnitudes). This data is essential for users to perform various calculations.

    Data storage:
    - The measured values are stored in a special table in the database for magnitudes. This table contains the values of the variables captured by the equipments.
    - These values are direct measurements from the fluid (e.g., raw pressure, temperature, or volume readings). They are not calculated values, such as uncertainty.
    - The values stored in the variable values table are different from variable uncertainty values, which are calculated separately and represent the margin of error.

    Accessing the data:
    - Users typically access the data by referring to the readings from the measurement system, not directly from the individual equipments.
    - The readings are stored in a "variable values" table within the database.

    Linking variable names:
    If the user needs to kno...
    score: 0.1
  • Loss: CoSENTLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "pairwise_cos_sim"
    }
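
This card reports only the validation loss. For data with sentence1, sentence2, and score columns, a common complementary check is the Spearman rank correlation between the model's cosine similarities and the gold scores. The sketch below is a simplified version that assumes no tied values. The numbers are hypothetical and are not results for this model.

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation; assumes no tied values in x or y."""
    rank_x = np.argsort(np.argsort(x)).astype(np.float64)
    rank_y = np.argsort(np.argsort(y)).astype(np.float64)
    return float(np.corrcoef(rank_x, rank_y)[0, 1])

# Hypothetical values: gold scores and model cosine similarities for 4 pairs.
gold = np.array([0.05, 0.10, 0.20, 0.95])
predicted = np.array([0.31, 0.35, 0.52, 0.88])
print(spearman(gold, predicted))
```

A value near 1 means the model orders the pairs the same way the labels do, which is the property the CoSENT objective optimizes.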
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 7
  • per_device_eval_batch_size: 7
  • gradient_accumulation_steps: 4
  • learning_rate: 3e-05
  • num_train_epochs: 20
  • warmup_ratio: 0.1
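
The arithmetic implied by these settings can be sketched as follows. It assumes a single training device (the card does not state the device count) and that a partial accumulation group at the end of an epoch still triggers an optimizer update. Both are assumptions, so treat the numbers as estimates.

```python
import math

train_samples = 7552          # from the training dataset size above
per_device_batch = 7
grad_accum = 4
epochs = 20
warmup_ratio = 0.1
num_devices = 1               # assumption: the card does not state the device count

effective_batch = per_device_batch * grad_accum * num_devices
batches_per_epoch = math.ceil(train_samples / (per_device_batch * num_devices))
updates_per_epoch = math.ceil(batches_per_epoch / grad_accum)
total_updates = updates_per_epoch * epochs
warmup_updates = round(total_updates * warmup_ratio)

print(effective_batch, updates_per_epoch, total_updates, warmup_updates)
```

The estimate of about 270 updates per epoch is in line with the training log, where whole-number epochs are 270 steps apart (for example, step 2960 at epoch 11.0 and step 3230 at epoch 12.0).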

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 7
  • per_device_eval_batch_size: 7
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 4
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 3e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 20
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss
9.5153 2560 6.782 -
9.5524 2570 7.3027 -
9.5894 2580 7.3348 -
9.6265 2590 7.7864 -
9.6636 2600 6.3552 -
9.7006 2610 7.151 -
9.7377 2620 6.1664 -
9.7748 2630 6.0398 -
9.8119 2640 7.0452 -
9.8489 2650 7.2457 -
9.8860 2660 6.7531 -
9.9231 2670 6.7149 -
9.9601 2680 6.4635 -
9.9972 2690 6.2237 -
10.0371 2700 6.1798 2.9939
10.0741 2710 7.2224 -
10.1112 2720 6.5327 -
10.1483 2730 7.4686 -
10.1854 2740 6.1404 -
10.2224 2750 7.0005 -
10.2595 2760 5.7726 -
10.2966 2770 6.5327 -
10.3336 2780 7.5015 -
10.3707 2790 6.5526 -
10.4078 2800 6.2078 -
10.4449 2810 6.1 -
10.4819 2820 7.1027 -
10.5190 2830 8.639 -
10.5561 2840 6.9937 -
10.5931 2850 7.2734 2.8532
10.6302 2860 7.6321 -
10.6673 2870 7.5788 -
10.7044 2880 6.7864 -
10.7414 2890 7.4237 -
10.7785 2900 6.9813 -
10.8156 2910 6.6884 -
10.8526 2920 6.7464 -
10.8897 2930 7.7989 -
10.9268 2940 7.3568 -
10.9639 2950 8.6706 -
11.0 2960 6.5687 -
11.0371 2970 5.8992 -
11.0741 2980 6.4543 -
11.1112 2990 6.1386 -
11.1483 3000 6.9047 2.9147
11.1854 3010 7.405 -
11.2224 3020 7.5441 -
11.2595 3030 6.7524 -
11.2966 3040 7.698 -
11.3336 3050 7.6167 -
11.3707 3060 7.1516 -
11.4078 3070 6.7458 -
11.4449 3080 6.7608 -
11.4819 3090 7.1508 -
11.5190 3100 6.9155 -
11.5561 3110 6.6664 -
11.5931 3120 8.3841 -
11.6302 3130 7.1934 -
11.6673 3140 6.9681 -
11.7044 3150 7.2187 2.7509
11.7414 3160 7.3155 -
11.7785 3170 7.3103 -
11.8156 3180 7.1959 -
11.8526 3190 6.8164 -
11.8897 3200 7.5836 -
11.9268 3210 5.2671 -
11.9639 3220 6.4929 -
12.0 3230 7.0892 -
12.0371 3240 7.0877 -
12.0741 3250 5.8302 -
12.1112 3260 5.6145 -
12.1483 3270 6.5808 -
12.1854 3280 6.6826 -
12.2224 3290 5.9819 -
12.2595 3300 6.68 3.0175
12.2966 3310 6.1685 -
12.3336 3320 6.4473 -
12.3707 3330 6.3965 -
12.4078 3340 6.6278 -
12.4449 3350 5.4575 -
12.4819 3360 7.3019 -
12.5190 3370 7.4843 -
12.5561 3380 6.709 -
12.5931 3390 6.7168 -
12.6302 3400 7.0223 -
12.6673 3410 6.5089 -
12.7044 3420 6.5094 -
12.7414 3430 7.2317 -
12.7785 3440 6.6885 -
12.8156 3450 6.9693 2.8462
12.8526 3460 6.8242 -
12.8897 3470 6.6899 -
12.9268 3480 6.9113 -
12.9639 3490 7.1903 -
13.0 3500 7.3286 -
13.0371 3510 6.5465 -
13.0741 3520 5.6804 -
13.1112 3530 5.6412 -
13.1483 3540 6.6161 -
13.1854 3550 5.761 -
13.2224 3560 5.5669 -
13.2595 3570 5.6184 -
13.2966 3580 6.2996 -
13.3336 3590 4.99 -
13.3707 3600 5.9974 3.2358
13.4078 3610 5.6962 -
13.4449 3620 6.3662 -
13.4819 3630 7.0398 -
13.5190 3640 7.7358 -
13.5561 3650 7.9063 -
13.5931 3660 5.7823 -
13.6302 3670 6.9861 -
13.6673 3680 7.2855 -
13.7044 3690 5.6785 -
13.7414 3700 6.4071 -
13.7785 3710 6.4294 -
13.8156 3720 6.0842 -
13.8526 3730 5.9422 -
13.8897 3740 7.0778 -
13.9268 3750 8.1597 3.0093
13.9639 3760 6.3154 -
14.0 3770 6.2416 -
14.0371 3780 5.9958 -
14.0741 3790 5.7032 -
14.1112 3800 4.9524 -
14.1483 3810 5.386 -
14.1854 3820 5.6353 -
14.2224 3830 5.0873 -
14.2595 3840 4.9255 -
14.2966 3850 5.1423 -
14.3336 3860 6.0775 -
14.3707 3870 4.5073 -
14.4078 3880 6.8347 -
14.4449 3890 6.5397 -
14.4819 3900 7.2143 3.3080
14.5190 3910 6.1123 -
14.5561 3920 6.6048 -
14.5931 3930 6.3464 -
14.6302 3940 6.3618 -
14.6673 3950 6.5718 -
14.7044 3960 5.9785 -
14.7414 3970 6.5758 -
14.7785 3980 6.4308 -
14.8156 3990 6.0208 -
14.8526 4000 6.0303 -
14.8897 4010 6.6396 -
14.9268 4020 6.0184 -
14.9639 4030 6.6248 -
15.0 4040 6.4538 -
15.0371 4050 6.4742 3.1761
15.0741 4060 5.5295 -
15.1112 4070 6.8753 -
15.1483 4080 5.639 -
15.1854 4090 5.6232 -
15.2224 4100 6.3026 -
15.2595 4110 6.1182 -
15.2966 4120 5.4736 -
15.3336 4130 6.2961 -
15.3707 4140 5.4742 -
15.4078 4150 5.4707 -
15.4449 4160 4.7272 -
15.4819 4170 6.1026 -
15.5190 4180 5.0468 -
15.5561 4190 5.5796 -
15.5931 4200 6.9046 3.1433
15.6302 4210 5.6123 -
15.6673 4220 6.7246 -
15.7044 4230 5.7076 -
15.7414 4240 6.6772 -
15.7785 4250 5.6038 -
15.8156 4260 4.9544 -
15.8526 4270 5.0661 -
15.8897 4280 5.291 -
15.9268 4290 6.6652 -
15.9639 4300 5.6797 -
16.0 4310 5.1129 -
16.0371 4320 5.4445 -
16.0741 4330 4.8946 -
16.1112 4340 6.3929 -
16.1483 4350 6.0633 3.1426
16.1854 4360 5.522 -
16.2224 4370 4.7067 -
16.2595 4380 5.4688 -
16.2966 4390 5.6009 -
16.3336 4400 5.1376 -
16.3707 4410 4.5196 -
16.4078 4420 5.5109 -
16.4449 4430 5.1888 -
16.4819 4440 6.0305 -
16.5190 4450 5.2791 -
16.5561 4460 5.4005 -
16.5931 4470 5.255 -
16.6302 4480 6.2026 -
16.6673 4490 6.6388 -
16.7044 4500 5.6138 3.2812
16.7414 4510 4.7913 -
16.7785 4520 5.6675 -
16.8156 4530 5.8975 -
16.8526 4540 5.4597 -
16.8897 4550 5.137 -
16.9268 4560 4.5395 -
16.9639 4570 4.6304 -
17.0 4580 5.8098 -
17.0371 4590 4.0267 -
17.0741 4600 4.9194 -
17.1112 4610 4.1852 -
17.1483 4620 5.129 -
17.1854 4630 4.469 -
17.2224 4640 5.4298 -
17.2595 4650 4.5234 3.3447
17.2966 4660 4.6856 -
17.3336 4670 6.3431 -
17.3707 4680 5.347 -
17.4078 4690 4.9223 -
17.4449 4700 5.4404 -
17.4819 4710 4.916 -
17.5190 4720 6.1744 -
17.5561 4730 4.8039 -
17.5931 4740 5.2276 -
17.6302 4750 4.4189 -
17.6673 4760 4.1434 -
17.7044 4770 4.9443 -
17.7414 4780 5.6975 -
17.7785 4790 4.6667 -
17.8156 4800 4.9876 3.2924
17.8526 4810 4.4342 -
17.8897 4820 5.2595 -
17.9268 4830 5.6566 -
17.9639 4840 5.5452 -
18.0 4850 4.4986 -
18.0371 4860 4.8155 -
18.0741 4870 4.2278 -
18.1112 4880 5.4733 -
18.1483 4890 4.2394 -
18.1854 4900 5.1253 -
18.2224 4910 4.7498 -
18.2595 4920 4.9775 -
18.2966 4930 4.797 -
18.3336 4940 4.5694 -
18.3707 4950 4.6192 3.6615
18.4078 4960 5.8114 -
18.4449 4970 4.8035 -
18.4819 4980 4.6944 -
18.5190 4990 4.8664 -
18.5561 5000 4.6916 -
18.5931 5010 4.3352 -
18.6302 5020 5.9779 -
18.6673 5030 4.7813 -
18.7044 5040 4.632 -
18.7414 5050 4.7411 -
18.7785 5060 3.6489 -
18.8156 5070 4.5373 -
18.8526 5080 5.6129 -
18.8897 5090 4.8933 -
18.9268 5100 4.27 3.6957
18.9639 5110 4.5338 -
19.0 5120 5.5175 -
19.0371 5130 5.0835 -
19.0741 5140 4.6826 -
19.1112 5150 4.5391 -
19.1483 5160 5.3723 -
19.1854 5170 4.8095 -
19.2224 5180 4.7402 -
19.2595 5190 4.0488 -
19.2966 5200 3.6424 -
19.3336 5210 4.2256 -
19.3707 5220 4.4607 -
19.4078 5230 3.5702 -
19.4449 5240 4.3062 -
19.4819 5250 4.2919 3.6594
19.5190 5260 4.6985 -
19.5561 5270 4.6907 -
19.5931 5280 4.3865 -
19.6302 5290 3.9818 -
19.6673 5300 4.3166 -
19.7044 5310 4.9131 -
19.7414 5320 4.7641 -
19.7785 5330 5.419 -
19.8156 5340 4.068 -
19.8526 5350 4.1094 -
19.8897 5360 5.2279 -
19.9268 5370 4.4818 -
19.9639 5380 4.3103 -
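
In the portion of the log shown here, validation loss is lowest at step 3150 (2.7509) and rises to roughly 3.66 by step 5250, while training loss keeps falling. This pattern may indicate overfitting in the later epochs. Since load_best_model_at_end is False, the final weights are not necessarily the ones with the lowest validation loss. The short sketch below picks the best evaluation step from a subset of rows copied from the log.

```python
# Evaluation rows copied from the log above: (step, training loss, validation loss).
eval_rows = [
    (2700, 6.1798, 2.9939),
    (3150, 7.2187, 2.7509),
    (3600, 5.9974, 3.2358),
    (4050, 6.4742, 3.1761),
    (4500, 5.6138, 3.2812),
    (4950, 4.6192, 3.6615),
    (5250, 4.2919, 3.6594),
]

# Select the row with the smallest validation loss and compare it to the last one.
best_step, _, best_val = min(eval_rows, key=lambda row: row[2])
last_step, _, last_val = eval_rows[-1]
print(f"lowest validation loss {best_val} at step {best_step}")
print(f"validation loss at step {last_step}: {last_val}")
```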

Framework Versions

  • Python: 3.11.0
  • Sentence Transformers: 3.4.0
  • Transformers: 4.48.1
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.3.0
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0
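
To compare a local environment against the versions listed above, a small check along these lines may help. The mapping from the names in the list to PyPI distribution names (for example, PyTorch to torch) is my assumption, and the helper name compare_versions is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed above; the PyPI distribution names are an assumed mapping, not from the card.
EXPECTED = {
    "sentence-transformers": "3.4.0",
    "transformers": "4.48.1",
    "torch": "2.5.1+cu124",
    "accelerate": "1.3.0",
    "datasets": "3.2.0",
    "tokenizers": "0.21.0",
}

def compare_versions(expected):
    """Map each package to (expected, installed); installed is None if missing."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report

for name, (wanted, installed) in compare_versions(EXPECTED).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: expected {wanted}, installed {installed} ({status})")
```

A differing version does not necessarily break inference; the list records what was used for training.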

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}