
BGE base Movie Matryoshka

This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5 on the q_asimple_for_bge_241019 dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: q_asimple_for_bge_241019
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
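
The pooling and normalization stages listed above can be illustrated with a minimal sketch. It assumes toy 4-dimensional token vectors in place of the real 768-dimensional BertModel outputs, and the function names `cls_pool` and `l2_normalize` are hypothetical helpers, not part of the library.

```python
import math

def cls_pool(token_embeddings):
    """Return the first token's vector, the [CLS] position in BERT-style models."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit length, mirroring what a Normalize() stage does."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]

# Toy stand-in for the Transformer output: 3 tokens, 4 dimensions each.
token_embeddings = [
    [3.0, 0.0, 4.0, 0.0],  # [CLS] token
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 2.0, 0.0, 1.0],
]

sentence_embedding = l2_normalize(cls_pool(token_embeddings))
print(sentence_embedding)
```

Because the output has unit length, the cosine similarity between two such embeddings reduces to their dot product.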

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("YxBxRyXJx/bge-base-movie-matryoshka")
# Run inference
sentences = [
    '11\tdocumentary film so unpleasant when most had sat through horror pictures that were appreciably more violent and bloody.  The answer that McCauley came up with was that the fictional nature of horror films affords viewers a sense of control by placing psychological distance between them and the violent acts they have witnessed. Most people who view horror movies understand that the filmed events are unreal, which furnishes them with psychological distance from the horror portrayed in the film. In fact, there is evidence that young viewers who perceive greater realism in horror films are more negatively affected by their exposure to horror films than viewers who perceive the film as unreal (Hoekstra, Harris, & Helmick, 1999). Four Viewing Motivations for Graphic Horror   According to Dr. Deirdre Johnston (1995) study Adolescents’ Motivations for Viewing Graphic Horror of Human Communication Research there are four different main reasons for viewing graphic horror. From the study of a small sample of 220 American adolescents who like watching horror movies, Dr. Johnston reported that: The four viewing motivations are found to be related to viewers’ cognitive and affective responses to horror films, as well as viewers’ tendency to identify with either the killers or victims in these films." Dr. Johnson notes that:  1) gore watchers typically had low empathy, high sensation seeking, and (among  males only) a strong identification with the killer, 2) thrill watchers typically had  both high empathy and sensation seeking, identified themselves more with the  victims, and liked the suspense of the film, 3) independent watchers typically had  a  high empathy for the victim along with a high positive effect for overcoming  fear, and 4) problem watchers typically had high empathy for the victim but were',
    'What is the primary reason why viewers who perceive greater realism in horror films are more negatively affected by their exposure to horror films than viewers who perceive the film as unreal?',
    'What shift in the cultural, political, and social contexts of the 1980s and 1990s may have led to the deconstruction of the hard body characters portrayed by actors such as Stallone and Schwarzenegger in more recent movies?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
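
Since the model was trained with MatryoshkaLoss at 768, 512, 256, 128, and 64 dimensions, the leading dimensions of an embedding can be used on their own. The sketch below shows the idea in plain Python: keep the first `dim` values, renormalize, and compare. The 8-dimensional vectors are made-up stand-ins for real model outputs, and the helper names are hypothetical. In Sentence Transformers itself, the `truncate_dim` argument of `SentenceTransformer(...)` should give the same truncation at load time.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` values and rescale the result to unit length."""
    head = embedding[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else list(head)

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 8-dimensional vectors standing in for 768-dimensional model outputs.
query = [0.5, 0.4, 0.3, 0.2, 0.1, 0.1, 0.0, 0.1]
passage = [0.4, 0.5, 0.2, 0.3, 0.0, 0.1, 0.1, 0.0]

# Compare the same pair at progressively smaller dimensionalities.
for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    p = truncate_and_normalize(passage, dim)
    print(dim, round(cosine_similarity(q, p), 4))
```

Smaller dimensions reduce storage and search cost; the Evaluation section reports how retrieval quality changes at each size.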

Evaluation

Metrics

Information Retrieval

Metric dim_768 dim_512 dim_256 dim_128 dim_64
cosine_accuracy@1 0.8205 0.8462 0.8462 0.7692 0.5641
cosine_accuracy@3 0.9744 0.9231 0.9231 0.8974 0.8718
cosine_accuracy@5 1.0 1.0 0.9487 0.9487 0.9231
cosine_accuracy@10 1.0 1.0 1.0 0.9487 0.9487
cosine_precision@1 0.8205 0.8462 0.8462 0.7692 0.5641
cosine_precision@3 0.3248 0.3077 0.3077 0.2991 0.2906
cosine_precision@5 0.2 0.2 0.1897 0.1897 0.1846
cosine_precision@10 0.1 0.1 0.1 0.0949 0.0949
cosine_recall@1 0.8205 0.8462 0.8462 0.7692 0.5641
cosine_recall@3 0.9744 0.9231 0.9231 0.8974 0.8718
cosine_recall@5 1.0 1.0 0.9487 0.9487 0.9231
cosine_recall@10 1.0 1.0 1.0 0.9487 0.9487
cosine_ndcg@10 0.9208 0.9233 0.9234 0.8688 0.7682
cosine_mrr@10 0.894 0.8983 0.899 0.8419 0.7081
cosine_map@100 0.894 0.8983 0.899 0.8444 0.7089
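
In the table above, recall@k equals accuracy@k, and precision@k equals accuracy@k divided by k (for example, 0.9744 / 3 ≈ 0.3248). That pattern is what one would expect when every query has exactly one relevant passage. The following sketch computes the metrics under that assumption; the ranks are hypothetical and the function names are illustrative, not the evaluator's API.

```python
def accuracy_at_k(ranks, k):
    """Fraction of queries whose relevant passage appears in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def precision_at_k(ranks, k):
    """With one relevant passage per query, each hit contributes 1/k."""
    return accuracy_at_k(ranks, k) / k

def mrr_at_k(ranks, k):
    """Mean reciprocal rank, counting 0 when the passage is outside the top k."""
    return sum(1.0 / r if r <= k else 0.0 for r in ranks) / len(ranks)

# Hypothetical 1-based ranks of the single relevant passage for five queries.
ranks = [1, 1, 2, 4, 12]

print("accuracy@1:", accuracy_at_k(ranks, 1))
print("accuracy@3:", accuracy_at_k(ranks, 3))
print("precision@3:", precision_at_k(ranks, 3))
print("mrr@10:", mrr_at_k(ranks, 10))
```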

Training Details

Training Dataset

q_asimple_for_bge_241019

  • Dataset: q_asimple_for_bge_241019 at 66635cd
  • Size: 183 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 183 samples:
    • positive (string): min 191 tokens, mean 356.1 tokens, max 512 tokens
    • anchor (string): min 16 tokens, mean 36.04 tokens, max 66 tokens
  • Samples:
    Sample 1
    positive: 1 Introduction Why do we watch horror films? What makes horror films so exciting to watch? Why do our bodies sweat and muscles tense when we are scared? How do filmmakers, producers, sound engineers, and cinematographers specifically design a horror film? Can horror movies cause negative, lasting effects on the audience? These are some of the questions that are answered by exploring the aesthetics of horror films and the psychology behind horror movies. Chapter 1, The Allure of Horror Film, illustrates why we are drawn to scary films by studying different psychological theories and factors. Ideas include: catharsis, subconscious mind, curiosity, thrill, escape from reality, relevance, unrealism, and imagination. Also, this chapter demonstrates why people would rather watch fiction films than documentaries and the motivations for viewing graphic horror. Chapter 2, Mise-en-scène in Horror Movies, includes purposeful arrangement of scenery and stage properties of horror movie. Also...
    anchor: What is the name of the emerging field of scientists and filmmakers that uses fMRI and EEG to read people's brain activity while watching movie scenes?

    Sample 2
    positive: 3 Chapter 1: The Allure of Horror Film Overview Although watching horror films can make us feel anxious and uneasy, we still continue to watch other horror films one after another. It is ironic how we hate the feeling of being scared, but we still enjoy the thrill. So why do we pay money to watch something to be scared? Eight Theories on why we watch Horror Films From research by philosophers, psychoanalysts, and psychologists there are theories that can explain why we are drawn to watching horror films. The first theory, psychoanalyst, Sigmund Freud portrays that horror comes from the “uncanny” emergence of images and thoughts of the primitive id. The purpose of horror films is to highlight unconscious fears, desire, urges, and primeval archetypes that are buried deep in our collective subconscious images of mothers and shadows play important roles because they are common to us all. For example, in Alfred Hitchcock's Psycho, a mother plays the role of evil in the main character...
    anchor: What process, introduced by the Greek Philosopher Aristotle, involves the release of negative emotions through the observation of violent or scary events, resulting in a purging of aggressive emotions?

    Sample 3
    positive: 5 principle unknowable (Jancovich, 2002, p. 35). This meaning, the audience already knows that the plot and the characters are already disgusting, but the surprises in the horror narrative through the discovery of curiosity should give satisfaction. Marvin Zuckerman (1979) proposed that people who scored high in sensation seeking scale often reported a greater interest in exciting things like rollercoasters, bungee jumping and horror films. He argued more individuals who are attracted to horror movies desire the sensation of experience. However, researchers did not find the correlation to thrill-seeking activities and enjoyment of watching horror films always significant. The Gender Socialization theory (1986) by Zillman, Weaver, Mundorf and Aust exposed 36 male and 36 female undergraduates to a horror movie with the same age, opposite-gender companion of low or high initial appeal who expressed mastery, affective indifference, or distress. They reported that young men enjoyed the fi...
    anchor: What is the proposed theory by Marvin Zuckerman (1979) regarding the relationship between sensation seeking and interest in exciting activities, including horror films?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
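
The loss configuration above can be read as: compute MultipleNegativesRankingLoss on the embeddings truncated to each listed dimension, then add the results using the listed weights. Below is a minimal plain-Python sketch of that idea, not the library's implementation. It assumes cosine similarity scaled by 20.0 (believed to match the library default) and uses a toy batch of two pairs with 4-dimensional embeddings; all function names are hypothetical.

```python
import math

def _normalize(v):
    """Rescale a vector to unit length so dot products become cosine scores."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch-negatives ranking loss: cross-entropy over scaled cosine scores.

    Row i compares anchor i with every positive in the batch; positive i is
    the correct "class" and the other positives act as negatives.
    """
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        scores = [scale * sum(x * y for x, y in zip(anchor, pos)) for pos in p]
        max_score = max(scores)  # subtract the max for numerical stability
        log_sum_exp = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
        total += log_sum_exp - scores[i]
    return total / len(a)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss over embeddings truncated to each dim."""
    return sum(
        w * mnr_loss([v[:d] for v in anchors], [v[:d] for v in positives])
        for d, w in zip(dims, weights)
    )

# Toy batch: 2 (anchor, positive) pairs with 4-dimensional embeddings.
anchors = [[1.0, 0.2, 0.0, 0.1], [0.0, 1.0, 0.3, 0.0]]
positives = [[0.9, 0.1, 0.1, 0.0], [0.1, 0.8, 0.2, 0.1]]

print(matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1]))
```

With equal weights, as in this card, every truncation level contributes equally to the gradient, which encourages the leading dimensions to carry most of the ranking signal.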
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 5
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
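
As a worked example of how these settings interact, the sketch below computes the effective batch size per optimizer update and a cosine learning-rate schedule with linear warmup. It is a simplified illustration of the usual warmup-plus-cosine formula, not the trainer's code. `total_steps = 5` is an assumption for illustration only, since the card does not state the total number of optimizer steps.

```python
import math

per_device_train_batch_size = 32
gradient_accumulation_steps = 16
learning_rate = 2e-05
warmup_ratio = 0.1

# Samples contributing to one optimizer update on a single device.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps

def cosine_lr(step, total_steps, base_lr=learning_rate, ratio=warmup_ratio):
    """Learning rate after `step` updates: linear warmup, then cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total_steps = 5  # assumption for illustration; not stated in the card

print("effective batch size:", effective_batch_size)
for step in range(total_steps + 1):
    print(step, cosine_lr(step, total_steps))
```

Note that the effective batch size of 512 is larger than the 183-sample training set, so a single epoch cannot fill a whole accumulation window.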

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
1.0 1 0.8987 0.8983 0.8835 0.8419 0.7773
2.0 2 0.9218 0.9141 0.9075 0.8721 0.8124
1.0 1 0.9218 0.9141 0.9075 0.8721 0.8124
2.0 2 0.9356 0.9302 0.9118 0.8750 0.8057
3.0 4 0.9302 0.9233 0.9234 0.8783 0.7759
4.0 5 0.9208 0.9233 0.9234 0.8688 0.7682
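  • The bold row denotes the saved checkpoint. The final row (epoch 4.0, step 5) matches the cosine_ndcg@10 values reported in the Evaluation section.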

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.46.3
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.1.1
  • Datasets: 3.1.0
  • Tokenizers: 0.20.3

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

Model size: 109M params (Safetensors, tensor type F32)