BGE-base-en-v1.5-Hotpotqa

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5 on the sentence-transformers/hotpotqa dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: sentence-transformers/hotpotqa
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
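
The architecture printout above says the sentence embedding is taken from the [CLS] token (pooling_mode_cls_token: True) and then passed through Normalize(). The following is a minimal sketch of those two steps on toy data, not the library's implementation; the helper names cls_pool and l2_normalize are hypothetical, and it assumes the [CLS] vector is the first row of the token embeddings.

```python
import math


def cls_pool(token_embeddings):
    """Return the first token's vector (the [CLS] position for BERT-style inputs)."""
    return list(token_embeddings[0])


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


# Toy input: 3 tokens with 4 dimensions each (the real model uses 768 dimensions).
token_embeddings = [
    [3.0, 4.0, 0.0, 0.0],  # [CLS]
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 0.2, 0.1, 0.9],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)
```

Because the output has unit length, cosine similarity between two embeddings reduces to their dot product.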

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("anindya-hf-2002/bge-base-finetuned-hotpotqa")
# Run inference
sentences = [
    'James D. Farley, Jr. had an early interest in automobiles because of his grandfather who worked for what company?',
    "Jim Farley (businessman) James D. Farley, Jr. (born June 1962) is an American automobile executive that currently serves as Ford Motor Company's Executive Vice President and president, Global Markets since June 2017. From 2015 to 2017, he was CEO and Chairman of Ford Europe. He had an early interest in automobiles, primarily spurred from his grandfather who worked at Henry Ford's River Rouge Plant starting in 1914.",
    'Continental Motors Company Continental Motors Company was an American manufacturer of internal combustion engines. The company produced engines as a supplier to many independent manufacturers of automobiles, tractors, trucks, and stationary equipment (such as pumps, generators, and industrial machinery drives) from the 1900s through the 1960s. Continental Motors also produced Continental-branded automobiles in 1932–1933. The Continental Aircraft Engine Company was formed in 1929 to develop and produce its aircraft engines, and would become the core business of Continental Motors, Inc.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
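
Because this model was trained with MatryoshkaLoss at 768, 512, 256, 128, and 64 dimensions, the leading dimensions of each embedding are intended to remain useful on their own. The sketch below shows the usual truncate-then-renormalize step in plain Python on toy vectors; the helper names and the 8-dimensional inputs are illustrative assumptions, not model outputs.

```python
import math


def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values and rescale the result to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 8-dimensional vectors standing in for 768-dimensional embeddings.
query = [0.9, 0.1, 0.3, 0.2, 0.05, 0.4, 0.1, 0.3]
document = [0.8, 0.2, 0.25, 0.1, 0.1, 0.35, 0.2, 0.25]

for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    d = truncate_and_normalize(document, dim)
    print(dim, round(cosine_similarity(q, d), 4))
```

With real embeddings, the same slicing would be applied to the output of model.encode. Recent Sentence Transformers releases also accept a truncate_dim argument when constructing SentenceTransformer; check that your installed version supports it before relying on it.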

Evaluation

Metrics

Triplet

The five tables below correspond to the Matryoshka dimensions 768, 512, 256, 128, and 64, in that order. Each table's cosine_accuracy matches the final row of the corresponding dim_*_cosine_accuracy column in the Training Logs.

Triplet (dim_768)

Metric              Value
cosine_accuracy     0.9069
dot_accuracy        0.0931
manhattan_accuracy  0.9066
euclidean_accuracy  0.9069
max_accuracy        0.9069

Triplet (dim_512)

Metric              Value
cosine_accuracy     0.9075
dot_accuracy        0.0931
manhattan_accuracy  0.9056
euclidean_accuracy  0.9064
max_accuracy        0.9075

Triplet (dim_256)

Metric              Value
cosine_accuracy     0.9074
dot_accuracy        0.0931
manhattan_accuracy  0.9063
euclidean_accuracy  0.9063
max_accuracy        0.9074

Triplet (dim_128)

Metric              Value
cosine_accuracy     0.9061
dot_accuracy        0.0949
manhattan_accuracy  0.9014
euclidean_accuracy  0.9036
max_accuracy        0.9061

Triplet (dim_64)

Metric              Value
cosine_accuracy     0.9055
dot_accuracy        0.0981
manhattan_accuracy  0.8984
euclidean_accuracy  0.9013
max_accuracy        0.9055
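
For readers unfamiliar with the metric, cosine_accuracy on triplets is commonly defined as the fraction of (anchor, positive, negative) triplets in which the anchor is more similar to the positive than to the negative. The sketch below computes that quantity on hand-made 2-dimensional vectors; it is an illustration of the definition under that assumption, not the evaluator that produced the numbers above, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_cosine_accuracy(triplets):
    """Fraction of triplets where sim(anchor, positive) > sim(anchor, negative)."""
    correct = sum(
        1
        for anchor, positive, negative in triplets
        if cosine_similarity(anchor, positive) > cosine_similarity(anchor, negative)
    )
    return correct / len(triplets)


# Toy triplets: (anchor, positive, negative).
triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),   # positive is closer: correct
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),   # positive is closer: correct
    ([1.0, 1.0], [1.0, -1.0], [1.0, 0.9]),  # negative is closer: wrong
    ([0.5, 0.5], [0.4, 0.6], [-1.0, 0.0]),  # positive is closer: correct
]

accuracy = triplet_cosine_accuracy(triplets)
print(accuracy)  # 0.75
```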

Training Details

Training Dataset

sentence-transformers/hotpotqa

  • Dataset: sentence-transformers/hotpotqa at f07d3cd
  • Size: 76,064 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
              anchor          positive        negative
    type      string          string          string
    min       8 tokens        21 tokens       14 tokens
    mean      24.49 tokens    101.27 tokens   87.44 tokens
    max       108 tokens      512 tokens      407 tokens
  • Samples:
    Sample 1
      anchor: What historical geographic region in Central-Eastern Europe was the birthplace of a soldier of the Austro-Hungarian Army?
      positive: Bruno Olbrycht Bruno Olbrycht (nom de guerre: Olza; 6 October 1895 – 23 March 1951) was a soldier of the Austro-Hungarian Army and officer (later general) of the Polish Army both in the Second Polish Republic and postwar Poland. Born on 6 October 1895 in Sanok, Austrian Galicia, Olbrycht fought in Polish Legions in World War I, Polish–Ukrainian War, Polish–Soviet War and the Invasion of Poland. He died on 23 March 1951 in Kraków.
      negative: Padáň The village was first recorded in 1254 as "Padan", an old Pecheneg settlement. On the territory of the village, there used to be "Petény" village as well, which was mentioned in 1298 as the appurtenance of Pressburg Castle. Until the end of World War I, it was part of Hungary and fell within the Dunaszerdahely district of Pozsony County. After the Austro-Hungarian army disintegrated in November 1918, Czechoslovakian troops occupied the area. After the Treaty of Trianon of 1920, the village became officially part of Czechoslovakia. In November 1938, the First Vienna Award granted the area to Hungary and it was held by Hungary until 1945. After Soviet occupation in 1945, Czechoslovakian administration returned and the village became officially part of Czechoslovakia in 1947.
    Sample 2
      anchor: Full Scale Assault is the fourth studio album by Dutch punk hardcore band Vitamin X, the album was recorded at Electrical Audio in Chicago by Steve Albini who previously recorded The Stooges, also known as Iggy and the Stooges, were an American rock band formed in Ann Arbor, Michigan in what year?
      positive: Full Scale Assault Full Scale Assault is the fourth studio album by Dutch punk hardcore band Vitamin X. Released through Tankcrimes on October 10, 2008 in the US, and Agipunk in Europe. The album was recorded at Electrical Audio in Chicago by Steve Albini who previously recorded Nirvana, Neurosis, PJ Harvey, High on Fire, Iggy Pop & The Stooges. It features guest vocals from Negative Approach's singer John Brannon. Art is by John Dyer Baizley.
      negative: The Dogs (US punk band) The Dogs are a three-piece proto-punk band formed in Lansing, Michigan, United States in 1969. They are noted for presaging the energy and sound of the later punk and hardcore genres.
    Sample 3
      anchor: Which popular music style was a modification of the marches from "The March King" with heavy influences from African American communities?
      positive: Ragtime Ragtime – also spelled rag-time or rag time – is a musical style that enjoyed its peak popularity between 1895 and 1918. Its cardinal trait is its syncopated, or "ragged", rhythm. The style has its origins in African-American communities in cities such as St. Louis. Ernest Hogan (1865–1909) was a pioneer of ragtime and was the first composer to have his ragtime pieces (or "rags") published as sheet music, beginning with the song "LA Pas Ma LA," published in 1895. Hogan has also been credited for coining the term "ragtime". The term is actually derived from his hometown "Shake Rag" in Bowling Green, Kentucky. Ben Harney, another Kentucky native, has often been credited for introducing the music to the mainstream public. His first ragtime composition, "You've Been a Good Old Wagon But You Done Broke", helped popularize the style. The composition was published in 1895, a few months after Ernest Hogan's "LA Pas Ma LA." Ragtime was also a modification of the march style popularized by John Philip Sousa, with additional polyrhythms coming from African music. Ragtime composer Scott Joplin ("ca." 1868–1917) became famous through the publication of the "Maple Leaf Rag" (1899) and a string of ragtime hits such as "The Entertainer" (1902), although he was later forgotten by all but a small, dedicated community of ragtime aficionados until the major ragtime revival in the early 1970s. For at least 12 years after its publication, "Maple Leaf Rag" heavily influenced subsequent ragtime composers with its melody lines, harmonic progressions or metric patterns.
      negative: Joropo The Joropo is a musical style resembling the fandango, and an accompanying dance. It has African, Native South American and European influences and originated in the plains called "Los Llanos" of what is now Colombia and Venezuela. It is a fundamental genre of "música criolla" (creole music). It is also the most popular "folk rhythm": the well-known song "Alma Llanera" is a joropo, considered the unofficial national anthem of Venezuela.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "TripletLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
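
The parameters above describe a TripletLoss evaluated at five embedding sizes with equal weights. The following is a minimal sketch of that idea in plain Python: the embedding is truncated to each size, a triplet loss is computed on the truncated vectors, and the weighted terms are summed. The margin of 5.0, the Euclidean distance, and the re-normalization after truncation are assumptions made for illustration; the card does not state them, and this is not the library's implementation.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=5.0):
    """max(d(a, p) - d(a, n) + margin, 0). The margin value is an assumption."""
    gap = euclidean_distance(anchor, positive) - euclidean_distance(anchor, negative)
    return max(gap + margin, 0.0)


def matryoshka_triplet_loss(anchor, positive, negative, dims, weights, margin=5.0):
    """Weighted sum of triplet losses over truncated, re-normalized embeddings."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        a = l2_normalize(anchor[:dim])
        p = l2_normalize(positive[:dim])
        n = l2_normalize(negative[:dim])
        total += weight * triplet_loss(a, p, n, margin)
    return total


# Toy 4-dimensional embeddings; the card uses dims 768, 512, 256, 128, 64.
anchor = [0.9, 0.1, 0.3, 0.2]
positive = [0.8, 0.2, 0.25, 0.1]
negative = [-0.5, 0.7, 0.1, 0.9]

loss = matryoshka_triplet_loss(anchor, positive, negative, dims=(4, 2), weights=(1, 1))
print(loss)
```

Summing the per-dimension terms is what pushes the leading dimensions of the embedding to carry most of the ranking signal.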
    

Evaluation Dataset

sentence-transformers/hotpotqa

  • Dataset: sentence-transformers/hotpotqa at f07d3cd
  • Size: 8,452 evaluation samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
              anchor          positive        negative
    type      string          string          string
    min       8 tokens        16 tokens       12 tokens
    mean      23.94 tokens    101.15 tokens   86.87 tokens
    max       87 tokens       447 tokens      407 tokens
  • Samples:
    Sample 1
      anchor: What is the birthdate of this American dancer and choreographer of modern dance, who helped found the Joseph Campbell Foundation with Robert Walter?
      positive: Robert Walter (editor) Robert Walter is an editor and an executive with several not-for-profit organizations. Most notably, he is the executive director and board president of the Joseph Campbell Foundation (JCF), an organization that he helped found in 1990 with choreographer Jean Erdman, Joseph Campbell's widow.
      negative: Miguel Terekhov Miguel Terekhov (August 22, 1928 – January 3, 2012) was a Uruguayan-born American ballet dancer and ballet instructor. Terekhov and his wife, Yvonne Chouteau, one of the Five Moons, a group of Native American ballet dancers, founded the School of Dance at the University of Oklahoma in 1961.
    Sample 2
      anchor: What is the difference between Konstantin Orbelyan and Haig P. Manoogian
      positive: Konstantin Orbelyan Konstantin Aghaparoni Orbelyan (Armenian: Կոնստանտին Աղապարոնի Օրբելյան ; Russian: Константин Агапаронович Орбелян , July 29, 1928 – April 24, 2014) was an Armenian pianist, composer, head of the State Estrada Orchestra of Armenia.
      negative: Mitrofan Lodyzhensky Mitrofan Vasilyevich Lodyzhensky (Russian: Митрофа́н Васи́льевич Лоды́женский , in some sources Лады́женский (Ladyzhensky ); February 27 [O.S. February 15] 1852 – May 31 [O.S. May 18] 1917 ) was a Russian religious philosopher, playwright, and statesman, best known for his "Mystical Trilogy" comprising "Super-consciousness and the Ways to Achieve It", "Light Invisible", and "Dark Force".
    Sample 3
      anchor: Which movie has more producers, Laura's Star or 9?
      positive: Laura's Star Laura's Star (German: Lauras Stern ) is a 2004 German animated feature film produced and directed by Thilo Rothkirch. It is based on the children's book "Lauras Stern" by Klaus Baumgart. It was released by Warner Bros. Family Entertainment.
      negative: Laura Mañá Laura Mañá (born January 12, 1968 in Barcelona, Catalonia, Spain) is an actress, film director and screenwriter.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "TripletLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 5
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • resume_from_checkpoint: bge-base-hotpotwa-matryoshka
  • batch_sampler: no_duplicates
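
As a worked example of how these settings interact, the arithmetic below derives the effective batch size and relates optimizer steps to epochs, using the 76,064 training samples listed in this card. It assumes a single training device, which the card does not state; under that assumption the computed epoch fractions agree with the Epoch column of the Training Logs (for example, step 50 at epoch 0.3366).

```python
# Values taken from this card.
train_samples = 76_064
per_device_train_batch_size = 32
gradient_accumulation_steps = 16

# Assumption: one device. The card does not say how many were used.
num_devices = 1

# Samples consumed per optimizer step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Optimizer steps needed to see the training set once (fractional).
steps_per_epoch = train_samples / effective_batch_size


def epoch_at_step(step):
    """Approximate epoch reached after `step` optimizer steps."""
    return step / steps_per_epoch


print(effective_batch_size)       # 512
print(round(steps_per_epoch, 4))  # 148.5625
for step in (50, 100, 500):
    print(step, round(epoch_at_step(step), 4))  # 0.3366, 0.6731, 3.3656
```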

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: bge-base-hotpotwa-matryoshka
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
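
The combination of lr_scheduler_type: cosine, warmup_ratio: 0.1, and learning_rate: 2e-05 describes a learning rate that rises linearly during the first 10% of steps and then decays along a half cosine. The sketch below illustrates that shape in plain Python. The function name and the total step count of 1000 are hypothetical, chosen only for illustration; the card does not report the total number of optimizer steps, and this is not the trainer's own scheduler code.

```python
import math


def cosine_lr_with_warmup(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Linear warmup to base_lr, then half-cosine decay towards zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Hypothetical total step count, for illustration only.
total_steps = 1000

for step in (0, 50, 100, 550, 1000):
    print(step, cosine_lr_with_warmup(step, total_steps))
```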

Training Logs

Epoch   Step  Training Loss  loss     dim_128_cosine_accuracy  dim_256_cosine_accuracy  dim_512_cosine_accuracy  dim_64_cosine_accuracy  dim_768_cosine_accuracy
0.3366  50    23.6925        21.8521  0.9285                   0.9288                   0.9334                   0.9226                  0.9365
0.6731  100   22.4254        20.8726  0.9102                   0.9110                   0.9156                   0.9063                  0.9168
1.0097  150   22.046         20.7027  0.9142                   0.9162                   0.9188                   0.9098                  0.9200
1.3462  200   21.871         20.6600  0.9227                   0.9198                   0.9233                   0.9159                  0.9232
1.6828  250   21.7           20.6425  0.9193                   0.9192                   0.9203                   0.9148                  0.9217
2.0194  300   21.5785        20.6416  0.9113                   0.9133                   0.9149                   0.9082                  0.9142
2.3559  350   21.4963        20.5366  0.9141                   0.9139                   0.9162                   0.9107                  0.9177
2.6925  400   21.4012        20.5315  0.9103                   0.9114                   0.9135                   0.9081                  0.9136
3.0290  450   21.3447        20.5096  0.9093                   0.9089                   0.9102                   0.9057                  0.9106
3.3656  500   21.3029        20.5548  0.9061                   0.9074                   0.9075                   0.9055                  0.9069

Framework Versions

  • Python: 3.10.10
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.1.2+cu121
  • Accelerate: 0.31.0
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification}, 
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Model Repository

  • Repository: anindya-hf-2002/bge-base-finetuned-hotpotqa
  • Model size: 109M params
  • Tensor type: F32 (Safetensors)