How to use WorkStation0/clip-finetuned-satellite with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("WorkStation0/clip-finetuned-satellite")
sentences = [
"Some buildings and many green trees are located in an average residential area.some buildings and many green trees are located in an average residential area.some buildings and many green trees are located in an average residential area.some buildings and many green trees are in an average residential area.some buildings and many green trees are in a medium residential area .",
"Seawater in the wind triggered layers of white spray.a pedestrian are on the shore of the sea .a lot of people on the beach .water is dark blue light blue .sea water in the wind set off layers of white spray .",
"the brown roof stage is located in the middle of the road.Many large trees were planted around the stadium.many tall trees were planted around the stadium .the brown roof stadium is located in the middle of the road.the brown roof stadium is located in the middle of the road .",
"Seawater in the wind triggered layers of white spray.a pedestrian are on the shore of the sea .a lot of people on the beach .water is dark blue light blue .sea water in the wind set off layers of white spray ."
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]

This is a sentence-transformers model finetuned from sentence-transformers/clip-ViT-B-32. It maps sentences and paragraphs to a dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
  (0): CLIPModel()
)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("WorkStation0/clip-finetuned-satellite")
# Run inference
sentences = [
"It's a mountainous area.it is white, brown and green .It's a piece of green mountains.it is a piece of green mountains .this is mountainous region .",
'Many aircraft are parked next to the terminals near the runways of an airport.Many aircraft are parked next to terminals near airstrips at an airport.Many planes are parked next to the terminals near the runways in an airport.many planes are parked next to terminals near runways at an airport.many planes are parked next to terminals near runways in an airport .',
'yellow ribbon beach is between green trees and dark green ocean with white waves.yellow ribbon beach lies between green trees and dark green ocean with white waves.Yellow ribbon beach is between green trees and dark green ocean with white waves.Yellow ribbon beach is between green trees and dark green ocean with white waves.yellow ribbon beach is between green trees and dark green ocean with white waves .',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
Datasets: clip-valid-triplet, clip-train-triplet and clip-test-triplet
Evaluated with TripletEvaluator

| Metric | clip-valid-triplet | clip-train-triplet | clip-test-triplet |
|---|---|---|---|
| cosine_accuracy | 1.0 | 0.9995 | 0.9992 |
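
The cosine_accuracy values above can be read as the fraction of triplets in which the anchor is more similar, by cosine similarity, to its positive than to its negative. The following is a minimal sketch of that computation in plain Python. The toy vectors stand in for real embeddings, and the helper names `cosine_similarity` and `triplet_cosine_accuracy` are hypothetical; this is not the library's TripletEvaluator implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_cosine_accuracy(anchors, positives, negatives):
    """Fraction of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    correct = sum(
        cosine_similarity(a, p) > cosine_similarity(a, n)
        for a, p, n in zip(anchors, positives, negatives)
    )
    return correct / len(anchors)


# Toy 2-D "embeddings" (illustrative values, not model outputs).
anchors = [[1.0, 0.0], [0.0, 1.0]]
positives = [[0.9, 0.1], [1.0, 0.2]]  # the second positive is deliberately far from its anchor
negatives = [[0.0, 1.0], [0.1, 0.9]]

accuracy = triplet_cosine_accuracy(anchors, positives, negatives)
print(accuracy)
```

Only the first toy triplet is ranked correctly, so the sketch reports an accuracy of one half.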
Columns: anchor, positive, and negative

| | anchor | positive | negative |
|---|---|---|---|
| type | PIL.PngImagePlugin.PngImageFile | string | string |

Samples:

| anchor | positive | negative |
|---|---|---|
| [image] | Many buildings and some green trees are located in an industrial area.there are many parking cars on the area next to the buildings in the industrial region.there are many cars parking on the region beside the buildings in the industrial region .many buildings and some green trees are located in an industrial area.many buildings and some green trees are in an industrial area . | It's a piece of white snow mountain.It's a piece of snow white mountain.It's a piece of white snow mountain.It's a piece of white snow mountain.it is a piece of white snow mountain . |
| [image] | Many buildings are located in a commercial area.Many buildings are located in a commercial area.Many buildings are located in a commercial area.Many buildings are in a commercial area.many buildings are in a commercial area . | The mountain of yellow and green has a vein texture.the mountain of yellow and green has a vein texture.the mountain of yellow and green has a texture of vein .mountains with long and narrow ridges traverse in this range .it is a piece of irregular green mountains . |
| [image] | There's a strong bridge over the river.There are many houses on both sides of the river.there are many houses on both sides of the river .There's a strong bridge over the river.there is a strong bridge over the river . | many small green spots are scattered in a piece of kaki nueland.Many small green spots are scattered in a piece of naked khaki.Many small green spots are scattered in a piece of khaki stripe.many small green spots are scattered in a piece of khaki bareland.many small green spots are scattered in a piece of khaki bareland . |

Loss: MultipleNegativesRankingLoss with these parameters:
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
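
To make the two parameters concrete: with "similarity_fct": "cos_sim" and "scale": 20.0, each anchor is scored against every candidate in the batch by cosine similarity, the scores are multiplied by 20, and a cross-entropy loss rewards ranking the anchor's own positive first. The sketch below works through that idea in plain Python on toy vectors. It assumes the candidates are all positives in the batch followed by all explicit negatives, and the function names are hypothetical; treat it as an illustration rather than the library's code.

```python
import math


def cos_sim(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def multiple_negatives_ranking_loss(anchors, positives, negatives, scale=20.0):
    """Mean cross-entropy over anchors; the target for anchor i is positive i."""
    # Assumption: candidates are every positive in the batch, then every negative.
    candidates = positives + negatives
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [scale * cos_sim(anchor, c) for c in candidates]
        # Numerically stable log-sum-exp.
        max_logit = max(logits)
        log_sum_exp = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


# Toy 2-D "embeddings" (illustrative values, not model outputs).
anchors = [[1.0, 0.0], [0.0, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

# Positives aligned with their anchors: the loss is close to zero.
good_loss = multiple_negatives_ranking_loss(anchors, [[1.0, 0.0], [0.0, 1.0]], negatives)
# Positives swapped between anchors: the loss is large.
bad_loss = multiple_negatives_ranking_loss(anchors, [[0.0, 1.0], [1.0, 0.0]], negatives)
print(good_loss, bad_loss)
```

Because every other item in the batch acts as a negative, a larger batch gives each anchor more candidates to rank against.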
Columns: anchor, positive, and negative

| | anchor | positive | negative |
|---|---|---|---|
| type | PIL.PngImagePlugin.PngImageFile | string | string |

Samples:

| anchor | positive | negative |
|---|---|---|
| [image] | Many trees are planted around the playground.There's a green football pitch on the red track.there's a green football field on the red track .a lot of trees are planted around the playground.a lot of trees are planted around the playground . | It's a piece of uneven kaki nueland.It's a piece of uneven naked khaki.It's a piece of irregular Kaki stripping.It's a piece of unequally brazen khaki.it is a piece of uneven khaki bareland . |
| [image] | It's a big bridge and a few buildings with some grass.the roads are grey and the ground is brown .two parallel bridges on a black river are close to many green plants and several buildings.two parallel bridges on a black river are near many green plants and several buildings .this is a big bridge and some buildings with a little grass . | Cylinder storage tanks are built on two square concrete fields near some trees and a parking lot.Cylinder storage tanks are built on two square concrete grounds near some trees and a parking lot.cylinder storage tanks are built on two square concrete ground near some trees and a parking lot .these storage tanks are painted in different colors located next to the wood .many storage tanks are near some green trees . |
| [image] | the lake with borders is surrounded by roads a parking lot and rows of houses.the lake with edges is surrounded by roads a parking lot and rows of houses.the lake with bylands is surrounded by roads a parking lot and rows of houses .green ponds sit in this resort surrounded by rows of red houses .some buildings and green trees are in a resort with several green ponds . | many storage tanks of different sizes are in a factory.Many storage tanks in different sizes are in a factory.Many storage tanks in different sizes are in a factory.many storage tanks in different sizes are in a factory.many storage tanks in different sizes are in a factory . |

Loss: MultipleNegativesRankingLoss with these parameters:
{
"scale": 20.0,
"similarity_fct": "cos_sim"
}
Non-default hyperparameters:

- eval_strategy: epoch
- per_device_train_batch_size: 2
- per_device_eval_batch_size: 2
- gradient_accumulation_steps: 4
- num_train_epochs: 1
- fp16: True
- dataloader_num_workers: 2
- load_best_model_at_end: True

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 2
- per_device_eval_batch_size: 2
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 4
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 1
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 2
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- tp_size: 0
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional

Training logs:

| Epoch | Step | Training Loss | Validation Loss | clip-valid-triplet_cosine_accuracy | clip-train-triplet_cosine_accuracy | clip-test-triplet_cosine_accuracy |
|---|---|---|---|---|---|---|
| -1 | -1 | - | - | 0.9863 | - | - |
| 0.0994 | 76 | 1.3967 | - | - | - | - |
| 0.1989 | 152 | 0.7383 | - | - | - | - |
| 0.2983 | 228 | 0.5708 | - | - | - | - |
| 0.3978 | 304 | 0.4172 | - | - | - | - |
| 0.4972 | 380 | 0.4475 | - | - | - | - |
| 0.5967 | 456 | 0.4363 | - | - | - | - |
| 0.6961 | 532 | 0.4044 | - | - | - | - |
| 0.7956 | 608 | 0.3529 | - | - | - | - |
| 0.8950 | 684 | 0.3021 | - | - | - | - |
| 0.9944 | 760 | 0.2855 | - | - | - | - |
| 0.9997 | 764 | - | 0.0805 | 1.0 | - | - |
| -1 | -1 | - | - | 1.0 | 0.9995 | 0.9992 |
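
The step counts in the log can be related to the batch settings with a little arithmetic. With per_device_train_batch_size: 2 and gradient_accumulation_steps: 4, each optimizer step covers 8 triplets per device, so the 764 logged steps correspond to roughly 6,100 training triplets. The number of devices is not stated in this card, so `num_devices = 1` below is an assumption, and the result is an estimate rather than a reported dataset size.

```python
# Values taken from the hyperparameter list and the training log above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
last_logged_step = 764

# Assumption: training ran on a single device (not stated in the card).
num_devices = 1

# Triplets consumed per optimizer step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Approximate number of triplets seen by the last logged step.
approx_triplets_seen = last_logged_step * effective_batch_size

print(effective_batch_size, approx_triplets_seen)
```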
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
Base model: sentence-transformers/clip-ViT-B-32