BGE base PatentMatch Matryoshka
This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5 on the bhlim/patentmatch_for_finetuning dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-base-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset: bhlim/patentmatch_for_finetuning
- Language: en
- License: apache-2.0
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
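The architecture above takes the embedding of the first ([CLS]) token as the sentence vector (`pooling_mode_cls_token: True`) and then L2-normalizes it. A minimal sketch of those two steps on toy data follows; the helper names `cls_pool` and `l2_normalize` are illustrative and are not part of the library.

```python
import math

def cls_pool(token_embeddings):
    """Return the embedding of the first ([CLS]) token of one sequence."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

# Toy "transformer output": 3 tokens with 4 dimensions each.
# The real model produces 768 dimensions per token.
token_embeddings = [
    [3.0, 0.0, 4.0, 0.0],  # [CLS] token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 2.0, 0.0, 1.0],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)
```

Because the output is unit length, the cosine similarity of two embeddings equals their dot product.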
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("bhlim/bge-base-patentmatch")
# Run inference
sentences = [
'Referring to FIG.32 a a sink device 3200 is designed to display thumbnail images in the metadata of contents received from source devices connected via an integrated wire interface.As mentioned in the foregoing description if a remote controller 3250 capable of outputting a pointing signal is situated within a region of a specific thumbnail image 3260 side information e.g.Amanda 1st album singer.Song etc.is displayed together.',
'The method of any one of claims 8 to 12 wherein the requesting for the broadcast channel information comprises transmitting to the server image data obtained by capturing the content being reproduced by the display apparatus or audio data obtained by recording the content for a certain time.',
'The electrode assembly of any one of the preceding claims wherein the first electrode comprises a substrate 113 wherein the first active material layer comprises active material layers 112 on both surfaces of the substrate and the ceramic layer comprises ceramic material layers 50 on both surfaces of the substrate.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
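Because the model was trained with MatryoshkaLoss at 768, 512, 256, 128 and 64 dimensions, embeddings can be shortened to one of those sizes by keeping the leading values and re-normalizing. The sketch below shows the idea on toy vectors in plain Python; `truncate_and_normalize` and `cosine` are hypothetical helpers, not library functions. Recent Sentence Transformers releases also appear to accept a `truncate_dim` argument on `SentenceTransformer`; check the documentation of the installed version before relying on it.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values, then rescale to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0

# Toy 8-dimensional vectors standing in for 768-dimensional model outputs.
query = [0.5, 0.4, 0.3, 0.3, 0.1, 0.0, -0.1, 0.2]
doc = [0.4, 0.5, 0.2, 0.3, 0.0, 0.1, 0.1, -0.2]

# Compare the similarity at the full size and at two truncated sizes.
for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    d = truncate_and_normalize(doc, dim)
    print(dim, round(cosine(q, d), 4))
```

Shorter vectors reduce storage and search cost; the Evaluation tables show how much retrieval quality this model gives up at each size.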
Evaluation
Metrics
Information Retrieval
- Dataset: dim_768
- Evaluated with InformationRetrievalEvaluator
Metric | Value |
---|---|
cosine_accuracy@1 | 0.0426 |
cosine_accuracy@3 | 0.1014 |
cosine_accuracy@5 | 0.1448 |
cosine_accuracy@10 | 0.232 |
cosine_precision@1 | 0.0426 |
cosine_precision@3 | 0.0338 |
cosine_precision@5 | 0.029 |
cosine_precision@10 | 0.0232 |
cosine_recall@1 | 0.0426 |
cosine_recall@3 | 0.1014 |
cosine_recall@5 | 0.1448 |
cosine_recall@10 | 0.232 |
cosine_ndcg@10 | 0.1217 |
cosine_mrr@10 | 0.0884 |
cosine_map@100 | 0.1014 |
Information Retrieval
- Dataset: dim_512
- Evaluated with InformationRetrievalEvaluator
Metric | Value |
---|---|
cosine_accuracy@1 | 0.0422 |
cosine_accuracy@3 | 0.0935 |
cosine_accuracy@5 | 0.1429 |
cosine_accuracy@10 | 0.2245 |
cosine_precision@1 | 0.0422 |
cosine_precision@3 | 0.0312 |
cosine_precision@5 | 0.0286 |
cosine_precision@10 | 0.0225 |
cosine_recall@1 | 0.0422 |
cosine_recall@3 | 0.0935 |
cosine_recall@5 | 0.1429 |
cosine_recall@10 | 0.2245 |
cosine_ndcg@10 | 0.1182 |
cosine_mrr@10 | 0.0861 |
cosine_map@100 | 0.0996 |
Information Retrieval
- Dataset: dim_256
- Evaluated with InformationRetrievalEvaluator
Metric | Value |
---|---|
cosine_accuracy@1 | 0.0403 |
cosine_accuracy@3 | 0.0916 |
cosine_accuracy@5 | 0.1397 |
cosine_accuracy@10 | 0.2198 |
cosine_precision@1 | 0.0403 |
cosine_precision@3 | 0.0305 |
cosine_precision@5 | 0.0279 |
cosine_precision@10 | 0.022 |
cosine_recall@1 | 0.0403 |
cosine_recall@3 | 0.0916 |
cosine_recall@5 | 0.1397 |
cosine_recall@10 | 0.2198 |
cosine_ndcg@10 | 0.1151 |
cosine_mrr@10 | 0.0835 |
cosine_map@100 | 0.0963 |
Information Retrieval
- Dataset: dim_128
- Evaluated with InformationRetrievalEvaluator
Metric | Value |
---|---|
cosine_accuracy@1 | 0.0379 |
cosine_accuracy@3 | 0.086 |
cosine_accuracy@5 | 0.1318 |
cosine_accuracy@10 | 0.208 |
cosine_precision@1 | 0.0379 |
cosine_precision@3 | 0.0287 |
cosine_precision@5 | 0.0264 |
cosine_precision@10 | 0.0208 |
cosine_recall@1 | 0.0379 |
cosine_recall@3 | 0.086 |
cosine_recall@5 | 0.1318 |
cosine_recall@10 | 0.208 |
cosine_ndcg@10 | 0.1089 |
cosine_mrr@10 | 0.0791 |
cosine_map@100 | 0.0909 |
Information Retrieval
- Dataset: dim_64
- Evaluated with InformationRetrievalEvaluator
Metric | Value |
---|---|
cosine_accuracy@1 | 0.0328 |
cosine_accuracy@3 | 0.0742 |
cosine_accuracy@5 | 0.1144 |
cosine_accuracy@10 | 0.1847 |
cosine_precision@1 | 0.0328 |
cosine_precision@3 | 0.0247 |
cosine_precision@5 | 0.0229 |
cosine_precision@10 | 0.0185 |
cosine_recall@1 | 0.0328 |
cosine_recall@3 | 0.0742 |
cosine_recall@5 | 0.1144 |
cosine_recall@10 | 0.1847 |
cosine_ndcg@10 | 0.096 |
cosine_mrr@10 | 0.0692 |
cosine_map@100 | 0.0802 |
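In every table above, accuracy@k equals recall@k and precision@k equals recall@k divided by k (for example 0.1014 / 3 ≈ 0.0338 at 768 dimensions). That pattern is what one would expect if each query has exactly one relevant document, although the card does not state this. Under that assumption, the per-query metrics can be sketched as follows; `ir_metrics_at_k` is an illustrative helper, not the evaluator's actual implementation.

```python
import math

def ir_metrics_at_k(ranked_ids, relevant_id, k):
    """Per-query retrieval metrics, assuming exactly one relevant document."""
    top_k = ranked_ids[:k]
    hit = relevant_id in top_k
    rank = top_k.index(relevant_id) + 1 if hit else None  # 1-based position
    return {
        "accuracy": 1.0 if hit else 0.0,    # was the relevant doc retrieved at all?
        "precision": (1.0 if hit else 0.0) / k,  # relevant docs among the k returned
        "recall": 1.0 if hit else 0.0,      # identical to accuracy with one relevant doc
        "mrr": 1.0 / rank if hit else 0.0,  # reciprocal rank of the relevant doc
        # With a single relevant document the ideal DCG is 1, so NDCG equals DCG.
        "ndcg": 1.0 / math.log2(rank + 1) if hit else 0.0,
    }

# Hypothetical ranking for one query: the relevant document "d9" is at rank 3.
ranking = ["d7", "d2", "d9", "d4"]
for k in (1, 3):
    print(k, ir_metrics_at_k(ranking, "d9", k))
```

The reported values are these per-query numbers averaged over all evaluation queries.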
Training Details
Training Dataset
bhlim/patentmatch_for_finetuning
- Dataset: bhlim/patentmatch_for_finetuning at 8d60f21
- Size: 10,136 training samples
- Columns: positive and anchor
- Approximate statistics based on the first 1000 samples:
Statistic | positive | anchor |
---|---|---|
type | string | string |
min | 5 tokens | 12 tokens |
mean | 136.61 tokens | 76.35 tokens |
max | 512 tokens | 512 tokens |
- Samples:
positive | anchor |
---|---|
Furthermore according to this liquid consuming apparatus if the decompression level acting on the liquid sensing chamber 21 of the liquid container 1 i.e.the pressure loss arising in the connecting passage between the liquid storage portion 7 and the liquid sensing chamber 21 due to the flow rate outflowing from the liquid storage portion 7 because of distension of the diaphragm pump through application of the external force when external force is applied in the direction of expansion of volume of the diaphragm pump 42 asdepicted in FIG.6 has been set to a low level if sufficient liquid is present in the liquid container 1 the liquid sensing chamber 21 will experience substantially no change in volume. | The liquid cartridge according to any of claims 4 to 5 further comprising a ground terminal 175c 176c 177c positioned in the second line. |
It is highly desirable for tires to have good wet skid resistance low rolling resistance and good wear characteristics.It has traditionally been very difficult to improve a tires wear characteristics without sacrificing its wet skid resistance and traction characteristics.These properties depend to a great extent on the dynamic viscoelastic properties of the rubbers utilized in making the tire. | The pneumatic tire of at least one of the previous claims wherein the rubber composition comprises from 5 to 20 phr of the oil and from 45 to 70 phr of the terpene phenol resin. |
Before setting the environment of the mobile communication terminal a user stores a multimedia message composed of different kinds of contents i.e.images sounds and texts.For example reference block 201 indicates a multimedia message composed of several images sounds and texts.The user can select an image A a sound A and a text A for environment setting elements of the mobile communication terminal from the contents of the multimedia message and construct a theme like in block 203 using the selected image A sound A and text A.The MPU 101 maps the contents of the theme to environment setting elements of the mobile communication terminal i.e.a background screen a ringtone and a user name like in block 205.The MPU 101 then sets the environment of the mobile communication terminal using the mapped elements like in block 207 thereby automatically and collectively changing the environment of the mobile communication terminal.Mapping information about mapping between the selected contents of the multimediamessage and the environment setting elements of the mobile communication terminal is stored in the flash RAM 107. | A terminal for processing data comprising an output unit configured to output a chatting service window a receiving unit configured to receive a request for executing a chatting service and a first download request for downloading first data through the chatting service from a user and a controller configured to control to output the first data downloaded in response to the received first download request to a background screen of the chatting service window. |
- Loss: MatryoshkaLoss with these parameters:
{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [768, 512, 256, 128, 64], "matryoshka_weights": [1, 1, 1, 1, 1], "n_dims_per_step": -1 }
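To illustrate what these parameters mean, the sketch below computes a simplified Matryoshka objective in plain Python: the inner loss is evaluated on embeddings truncated to each listed dimension, and the weighted results are summed. The inner loss is modeled as in-batch softmax cross-entropy over scaled cosine similarities, which is how MultipleNegativesRankingLoss is usually described. The scale of 20 is assumed to be the library default, and the function names and toy data are hypothetical. This is not the library's implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch ranking loss: anchor i should score highest against positive i.

    Every other positive in the batch acts as a negative for anchor i.
    scale=20.0 is an assumed default, not taken from this card.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * cosine(anchor, p) for p in positives]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the inner loss over embeddings truncated to each dim."""
    return sum(
        w * mnr_loss([a[:d] for a in anchors], [p[:d] for p in positives])
        for d, w in zip(dims, weights)
    )

# Toy batch of two (anchor, positive) pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4]]
print(matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1]))
```

With `n_dims_per_step` set to -1, all listed dimensions contribute at every training step, which is what the sum over `dims` models here.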
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: epoch
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 16
- gradient_accumulation_steps: 16
- learning_rate: 2e-05
- num_train_epochs: 4
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- bf16: True
- tf32: True
- load_best_model_at_end: True
- optim: adamw_torch_fused
- batch_sampler: no_duplicates
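Two consequences of these settings can be worked out by hand. Assuming a single device, each optimizer step sees 32 × 16 = 512 samples. The learning rate then follows a linear warmup and a cosine decay. The sketch below models that schedule with the commonly used half-cosine formula. The total of 76 steps is taken from the Training Logs table, and rounding the warmup length up is an assumption about how the trainer derives it from `warmup_ratio`.

```python
import math

PER_DEVICE_BATCH = 32
GRAD_ACCUM_STEPS = 16
NUM_DEVICES = 1  # assumption: the card does not state the number of devices

effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES

def lr_at_step(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Linear warmup followed by half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)  # assumed rounding
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

TOTAL_STEPS = 76  # last step shown in the Training Logs table
print("effective batch size:", effective_batch)
for step in (0, 4, 8, 42, 76):
    print(step, lr_at_step(step, TOTAL_STEPS))
```

Under these assumptions the rate peaks at 2e-05 once warmup ends at step 8 and decays to zero by step 76.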
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 16
- eval_accumulation_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 4
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: True
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Training Loss | dim_128_cosine_map@100 | dim_256_cosine_map@100 | dim_512_cosine_map@100 | dim_64_cosine_map@100 | dim_768_cosine_map@100 |
---|---|---|---|---|---|---|---|
0.5047 | 10 | 10.0459 | - | - | - | - | - |
0.9590 | 19 | - | 0.0849 | 0.0915 | 0.0939 | 0.0778 | 0.0966 |
1.0095 | 20 | 7.1373 | - | - | - | - | - |
1.5142 | 30 | 5.9969 | - | - | - | - | - |
1.9685 | 39 | - | 0.0890 | 0.0965 | 0.1007 | 0.0795 | 0.1012 |
2.0189 | 40 | 5.2984 | - | - | - | - | - |
2.5237 | 50 | 4.884 | - | - | - | - | - |
2.9779 | 59 | - | 0.091 | 0.0967 | 0.099 | 0.0801 | 0.1013 |
3.0284 | 60 | 4.6633 | - | - | - | - | - |
3.5331 | 70 | 4.5226 | - | - | - | - | - |
3.8360 | 76 | - | 0.0909 | 0.0963 | 0.0996 | 0.0802 | 0.1014 |
- The saved checkpoint corresponds to the final row (step 76), whose scores match the metrics reported in the Evaluation section.
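The fractional epoch values in the table can be reproduced from the dataset size and batch settings reported in this card. The sketch below assumes a single device, that the last partial batch is kept (`dataloader_drop_last` is False), and that the `no_duplicates` sampler did not change the number of batches. Under those assumptions there are 317 batches per epoch and 19 optimizer steps per epoch.

```python
import math

TRAIN_SAMPLES = 10_136   # training set size from the card
PER_DEVICE_BATCH = 32
GRAD_ACCUM_STEPS = 16
NUM_EPOCHS = 4

# Batches per epoch, keeping the final partial batch.
batches_per_epoch = math.ceil(TRAIN_SAMPLES / PER_DEVICE_BATCH)
# One optimizer step per GRAD_ACCUM_STEPS batches; leftover batches are not counted.
steps_per_epoch = batches_per_epoch // GRAD_ACCUM_STEPS
total_steps = steps_per_epoch * NUM_EPOCHS

def epoch_at_step(step):
    """Fraction of epochs completed after `step` optimizer steps."""
    return step * GRAD_ACCUM_STEPS / batches_per_epoch

print(batches_per_epoch, steps_per_epoch, total_steps)
for step in (10, 19, 76):
    print(step, f"{epoch_at_step(step):.4f}")
```

The values printed for steps 10, 19 and 76 agree with the Epoch column (0.5047, 0.9590 and 3.8360), which also explains why evaluation rows fall slightly short of whole epochs.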
Framework Versions
- Python: 3.10.12
- Sentence Transformers: 3.0.1
- Transformers: 4.41.2
- PyTorch: 2.1.2+cu121
- Accelerate: 0.32.1
- Datasets: 2.19.1
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MatryoshkaLoss
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}