SentenceTransformer based on thenlper/gte-small
This is a sentence-transformers model finetuned from thenlper/gte-small. It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: thenlper/gte-small
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 384 dimensions
- Similarity Function: Cosine Similarity
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
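The same stack (a BERT backbone with mean pooling followed by L2 normalization) can be assembled by hand with the standard Sentence Transformers modules API. A minimal sketch mirroring the configuration above:
from sentence_transformers import SentenceTransformer, models
# Transformer backbone (BertModel) with the card's maximum sequence length
word_embedding = models.Transformer("thenlper/gte-small", max_seq_length=512)
# Mean pooling over token embeddings (pooling_mode_mean_tokens=True above)
pooling = models.Pooling(word_embedding.get_word_embedding_dimension(), pooling_mode="mean")
# L2-normalize the sentence embeddings, matching the Normalize() module
normalize = models.Normalize()
model = SentenceTransformer(modules=[word_embedding, pooling, normalize])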
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Hemg/gte-small-llama")
# Run inference
sentences = [
'**Question 1**',
'89Fisika SMA/MA XII\n2. Dua muatan listrik q1 = +8 x 10-9 C dan q2 = + 16 x 10-9 C\nterpisah pada jarak 12 cm. Tentukan di mana muatan q3\nharus diletakkan agar gaya Coulomb pada muatan q3\nsama dengan nol!\n3. Sebuah segitiga sama sisi ABC mempunyai panjang sisi\n6 cm. Apabila pada masing-masing titik sudut segitiga\nberturut-turut terdapat muatan listrik sebesar qA = +8 C,\nqB = -9 C, dan qC = +3 C, tentukan besarnya gaya Cou-\nlomb pada titik sudut C!\n(a) Muatan negatif (b) Muatan positif\nGambar 3.3 Garis-garis gaya listrik(c) Antara muatan positif dan muatan negatifB. Medan Listrik dan Kuat Medan Listrik\nMedan listrik didefinisikan sebagai ruangan di sekitar\nbenda bermuatan listrik, di mana jika sebuah bendabermuatan listrik berada di dalam ruangan tersebut akanmendapat gaya listrik (gaya Coulomb). Medan listrik termasukmedan vektor, sehingga untuk menyatakan arah medan listrikdinyatakan sama dengan arah gaya yang dialami oleh muatanpositif jika berada dalam sembarang tempat di dalam medantersebut. Arah medan listrik yang ditimbulkan oleh bendabermuatan positif dinyatakan keluar dari benda, sedangkanarah medan listrik yang ditimbulkan oleh benda bermuatannegatif dinyatakan masuk ke benda.\nUntuk menggambarkan medan listrik digunakan garis-\ngaris gaya listrik . Garis-garis gaya listrik yaitu garis lengkung\nyang dibayangkan merupakan lintasan yang ditempuh olehmuatan positif yang bergerak dalam medan listrik. Garis gayalistrik tidak mungkin akan berpotongan, sebab garis gayalistrik merupakan garis khayal yang berawal dari bendabermuatan positif dan akan berakhir di benda yang bermuatannegatif. Gambar (3.3) menggambarkan garis-garis gaya listrik\ndi sekitar benda bermuatan listrik.',
'Fisika SMA/MA XII 230yang diusulkan oleh Einstein. Percobaan Compton cukup\nsederhana yaitu sinar X monokromatik (sinar X yang memilikipanjang gelombang tunggal) dikenakan pada keping tipisberilium sebagai sasarannya. Kemudian untuk mengamatifoton dari sinar X dan elektron yang terhambur dipasangdetektor. Sinar X yang telah menumbuk elektron akankehilangan sebagian energinya yang kemudian terhamburdengan sudut hamburan sebesar T terhadap arah semula.\nBerdasarkan hasil pengamatan ternyata sinar X yang ter-hambur memiliki panjang gelombang yang lebih besar daripanjang gelombang sinar X semula. Hal ini dikarenakansebagian energinya terserap oleh elektron. Jika energi fotonsinar X mula-mula hf dan energi foton sinar X yang terhambur\nmenjadi ( hf – hf ’) dalam hal ini f > f’, sedangkan panjang\ngelombang yang terhambur menjadi tambah besar yaitu O > Oc.\nDengan menggunakan hukum ke-\nkekalan momentum dan kekekalan energi\nCompton berhasil menunjukkan bahwaperubahan panjang gelombang foton\nterhambur dengan panjang gelombang\nsemula, yang memenuhi persamaan :\n .... (7.6)\ndengan\nO = panjang gelombang sinar X sebelum tumbukan (m)\nOc = panjang gelombang sinar X setelah tumbukan (m)\nh = konstanta Planck (6,625 × 10-34 Js)\nmo= massa diam elektron (9,1 × 10-31 kg)\nc = kecepatan cahaya (3 × 108 ms-1)\nT = sudut hamburan sinar X terhadap arah semula (derajat\natau radian)\nBesaran \n sering disebut dengan panjang gelombang\nCompton . Jadi jelaslah sudah bahwa dengan hasil pengamatan\nCompton tentang hamburan foton dari sinar X menunjukkan\nbahwa foton dapat dipandang sebagai partikel, sehingga mem-perkuat teori kuantum yang mengatakan bahwa cahaya mem-punyai dua sifat, yaitu cahaya dapat sebagai gelombang dan\ncahaya dapat bersifat sebagai partikel yang sering disebut\nsebagai dualime gelombang cahaya .\nTFoton\nterhambur\nElektron terhamburFoton datang\nElektrondiamhfc\nhf\nGambar 7.7 Skema percobaan Compton untuk\nmenyelidiki tumbukan foton dan elektron',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
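Since the model is evaluated on information retrieval, a typical pattern is semantic search over a passage corpus. A minimal sketch using the library's semantic_search utility; the corpus passages and query are illustrative placeholders, not published training data:
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("Hemg/gte-small-llama")
corpus = [
    "Medan listrik adalah ruang di sekitar benda bermuatan listrik.",  # placeholder passage
    "Percobaan Compton menyelidiki tumbukan foton dan elektron.",  # placeholder passage
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode("Apa itu medan listrik?", convert_to_tensor=True)
# Returns one ranked hit list per query; each hit is {'corpus_id': ..., 'score': ...}
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)
print(hits[0])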
Evaluation
Metrics
Information Retrieval
- Evaluated with InformationRetrievalEvaluator
Metric | Value |
---|---|
cosine_accuracy@1 | 0.0878 |
cosine_accuracy@3 | 0.1271 |
cosine_accuracy@5 | 0.1524 |
cosine_accuracy@10 | 0.1903 |
cosine_precision@1 | 0.0878 |
cosine_precision@3 | 0.0424 |
cosine_precision@5 | 0.0305 |
cosine_precision@10 | 0.019 |
cosine_recall@1 | 0.0878 |
cosine_recall@3 | 0.1271 |
cosine_recall@5 | 0.1524 |
cosine_recall@10 | 0.1903 |
cosine_ndcg@10 | 0.1332 |
cosine_mrr@10 | 0.1156 |
cosine_map@100 | 0.1252 |
dot_accuracy@1 | 0.0881 |
dot_accuracy@3 | 0.1274 |
dot_accuracy@5 | 0.1515 |
dot_accuracy@10 | 0.1894 |
dot_precision@1 | 0.0881 |
dot_precision@3 | 0.0425 |
dot_precision@5 | 0.0303 |
dot_precision@10 | 0.0189 |
dot_recall@1 | 0.0881 |
dot_recall@3 | 0.1274 |
dot_recall@5 | 0.1515 |
dot_recall@10 | 0.1894 |
dot_ndcg@10 | 0.133 |
dot_mrr@10 | 0.1157 |
dot_map@100 | 0.1253 |
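These scores come from an InformationRetrievalEvaluator run; the underlying query/corpus split is not published with the card. A minimal sketch of running the same evaluator on hypothetical placeholder data:
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("Hemg/gte-small-llama")
queries = {"q1": "Apa itu medan listrik?"}  # query_id -> query text (placeholder)
corpus = {  # doc_id -> passage text (placeholders)
    "d1": "Medan listrik adalah ruang di sekitar benda bermuatan listrik.",
    "d2": "Percobaan Compton menyelidiki tumbukan foton dan elektron.",
}
relevant_docs = {"q1": {"d1"}}  # query_id -> set of relevant doc_ids
evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="example")
results = evaluator(model)  # dict of metrics such as cosine_ndcg@10 and cosine_map@100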
Training Details
Training Dataset
Unnamed Dataset
- Size: 3,327 training samples
- Columns: sentence_0 and sentence_1
- Approximate statistics based on the first 1000 samples:
 | sentence_0 | sentence_1 |
---|---|---|
type | string | string |
details | min: 7 tokens, mean: 12.11 tokens, max: 61 tokens | min: 2 tokens, mean: 437.69 tokens, max: 512 tokens |
- Samples:
sentence_0 | sentence_1 |
---|---|
Here are two questions based on the context: | Pusat Perbukuan Departemen Pendidikan Nasional |
What type of institution is Pusat Perbukuan? | Pusat Perbukuan Departemen Pendidikan Nasional |
I don't see any context information provided. It seems there's nothing above the horizontal lines. | |
- Loss: MultipleNegativesRankingLoss with these parameters: { "scale": 20.0, "similarity_fct": "cos_sim" }
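A minimal sketch of instantiating this loss with the listed parameters (util.cos_sim is the library's cosine similarity function):
from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("thenlper/gte-small")
# scale=20.0 and cosine similarity match the parameters listed above
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)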
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 10
- per_device_eval_batch_size: 10
- num_train_epochs: 16
- multi_dataset_batch_sampler: round_robin
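A minimal training sketch wiring these non-default values into the Sentence Transformers 3.x trainer. The dataset rows and output directory are hypothetical placeholders, and the train set doubles as a tiny eval split only so that eval_strategy="steps" has something to evaluate:
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

model = SentenceTransformer("thenlper/gte-small")
# Hypothetical rows matching the sentence_0 / sentence_1 schema of the training set
train_dataset = Dataset.from_dict({
    "sentence_0": ["Apa itu medan listrik?"],
    "sentence_1": ["Medan listrik adalah ruang di sekitar benda bermuatan listrik."],
})
loss = losses.MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="gte-small-finetuned",  # placeholder path
    num_train_epochs=16,
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
    eval_strategy="steps",
    multi_dataset_batch_sampler="round_robin",
)
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,  # placeholder eval split
    loss=loss,
)
trainer.train()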
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 10
- per_device_eval_batch_size: 10
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1
- num_train_epochs: 16
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.0
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: round_robin
Training Logs
Epoch | Step | Training Loss | dot_map@100 |
---|---|---|---|
0.1502 | 50 | - | 0.0576 |
0.3003 | 100 | - | 0.0621 |
0.4505 | 150 | - | 0.0640 |
0.6006 | 200 | - | 0.0683 |
0.7508 | 250 | - | 0.0671 |
0.9009 | 300 | - | 0.0702 |
1.0 | 333 | - | 0.0724 |
1.0511 | 350 | - | 0.0726 |
1.2012 | 400 | - | 0.0751 |
1.3514 | 450 | - | 0.0774 |
1.5015 | 500 | 2.1413 | 0.0815 |
1.6517 | 550 | - | 0.0851 |
1.8018 | 600 | - | 0.0831 |
1.9520 | 650 | - | 0.0852 |
2.0 | 666 | - | 0.0859 |
2.1021 | 700 | - | 0.0890 |
2.2523 | 750 | - | 0.0872 |
2.4024 | 800 | - | 0.0897 |
2.5526 | 850 | - | 0.0929 |
2.7027 | 900 | - | 0.0924 |
2.8529 | 950 | - | 0.0948 |
3.0 | 999 | - | 0.0975 |
3.0030 | 1000 | 1.9512 | 0.0977 |
3.1532 | 1050 | - | 0.0993 |
3.3033 | 1100 | - | 0.0996 |
3.4535 | 1150 | - | 0.1033 |
3.6036 | 1200 | - | 0.1037 |
3.7538 | 1250 | - | 0.1055 |
3.9039 | 1300 | - | 0.1039 |
4.0 | 1332 | - | 0.1045 |
4.0541 | 1350 | - | 0.1064 |
4.2042 | 1400 | - | 0.1068 |
4.3544 | 1450 | - | 0.1074 |
4.5045 | 1500 | 1.8016 | 0.1090 |
4.6547 | 1550 | - | 0.1107 |
4.8048 | 1600 | - | 0.1111 |
4.9550 | 1650 | - | 0.1112 |
5.0 | 1665 | - | 0.1115 |
5.1051 | 1700 | - | 0.1117 |
5.2553 | 1750 | - | 0.1135 |
5.4054 | 1800 | - | 0.1124 |
5.5556 | 1850 | - | 0.1138 |
5.7057 | 1900 | - | 0.1167 |
5.8559 | 1950 | - | 0.1150 |
6.0 | 1998 | - | 0.1157 |
6.0060 | 2000 | 1.6129 | 0.1164 |
6.1562 | 2050 | - | 0.1185 |
6.3063 | 2100 | - | 0.1166 |
6.4565 | 2150 | - | 0.1152 |
6.6066 | 2200 | - | 0.1173 |
6.7568 | 2250 | - | 0.1185 |
6.9069 | 2300 | - | 0.1141 |
7.0 | 2331 | - | 0.1153 |
7.0571 | 2350 | - | 0.1154 |
7.2072 | 2400 | - | 0.1186 |
7.3574 | 2450 | - | 0.1163 |
7.5075 | 2500 | 1.4573 | 0.1171 |
7.6577 | 2550 | - | 0.1200 |
7.8078 | 2600 | - | 0.1190 |
7.9580 | 2650 | - | 0.1182 |
8.0 | 2664 | - | 0.1197 |
8.1081 | 2700 | - | 0.1195 |
8.2583 | 2750 | - | 0.1210 |
8.4084 | 2800 | - | 0.1199 |
8.5586 | 2850 | - | 0.1188 |
8.7087 | 2900 | - | 0.1207 |
8.8589 | 2950 | - | 0.1178 |
9.0 | 2997 | - | 0.1178 |
9.0090 | 3000 | 1.2947 | 0.1185 |
9.1592 | 3050 | - | 0.1207 |
9.3093 | 3100 | - | 0.1193 |
9.4595 | 3150 | - | 0.1203 |
9.6096 | 3200 | - | 0.1206 |
9.7598 | 3250 | - | 0.1233 |
9.9099 | 3300 | - | 0.1180 |
10.0 | 3330 | - | 0.1205 |
10.0601 | 3350 | - | 0.1208 |
10.2102 | 3400 | - | 0.1206 |
10.3604 | 3450 | - | 0.1184 |
10.5105 | 3500 | 1.2041 | 0.1212 |
10.6607 | 3550 | - | 0.1192 |
10.8108 | 3600 | - | 0.1219 |
10.9610 | 3650 | - | 0.1197 |
11.0 | 3663 | - | 0.1196 |
11.1111 | 3700 | - | 0.1219 |
11.2613 | 3750 | - | 0.1224 |
11.4114 | 3800 | - | 0.1199 |
11.5616 | 3850 | - | 0.1209 |
11.7117 | 3900 | - | 0.1211 |
11.8619 | 3950 | - | 0.1241 |
12.0 | 3996 | - | 0.1209 |
12.0120 | 4000 | 1.106 | 0.1217 |
12.1622 | 4050 | - | 0.1223 |
12.3123 | 4100 | - | 0.1222 |
12.4625 | 4150 | - | 0.1208 |
12.6126 | 4200 | - | 0.1213 |
12.7628 | 4250 | - | 0.1220 |
12.9129 | 4300 | - | 0.1231 |
13.0 | 4329 | - | 0.1220 |
13.0631 | 4350 | - | 0.1229 |
13.2132 | 4400 | - | 0.1236 |
13.3634 | 4450 | - | 0.1217 |
13.5135 | 4500 | 1.0598 | 0.1235 |
13.6637 | 4550 | - | 0.1235 |
13.8138 | 4600 | - | 0.1235 |
13.9640 | 4650 | - | 0.1225 |
14.0 | 4662 | - | 0.1234 |
14.1141 | 4700 | - | 0.1246 |
14.2643 | 4750 | - | 0.1244 |
14.4144 | 4800 | - | 0.1237 |
14.5646 | 4850 | - | 0.1244 |
14.7147 | 4900 | - | 0.1253 |
14.8649 | 4950 | - | 0.1244 |
15.0 | 4995 | - | 0.1253 |
15.0150 | 5000 | 1.0104 | 0.1248 |
15.1652 | 5050 | - | 0.1253 |
Framework Versions
- Python: 3.11.0
- Sentence Transformers: 3.0.1
- Transformers: 4.44.0
- PyTorch: 2.4.0+cu124
- Accelerate: 0.33.0
- Datasets: 2.21.0
- Tokenizers: 0.19.1
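To approximate this environment, the listed versions can be pinned directly (a sketch; the +cu124 PyTorch build additionally requires the matching CUDA wheel index):
pip install sentence-transformers==3.0.1 transformers==4.44.0 torch==2.4.0 accelerate==0.33.0 datasets==2.21.0 tokenizers==0.19.1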
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}