---
base_model: google-bert/bert-base-multilingual-cased
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:1890
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: >-
      32์„ธ ์—ฌ์ž๊ฐ€ ๋ชฉ์„ ๋งค๋‹ค๊ฐ€ ๊ฐ€์กฑ์—๊ฒŒ ๋ฐœ๊ฒฌ๋˜์–ด ๋ณ‘์›์— ์™”๋‹ค. ์ž„์‹  16์ฃผ์˜€์œผ๋ฉฐ 1๊ฐœ์›” ์ „๋ถ€ํ„ฐ ์‹์‚ฌ๋ฅผ ํ•˜์ง€ ์•Š๊ณ  ๋ˆ„์›Œ๋งŒ ์ง€๋ƒˆ๋‹ค๊ณ  ํ•œ๋‹ค.
      ๊ธฐ๋ถ„์ด ์šฐ์šธํ•˜๊ณ  ์•„๋ฌด๊ฒƒ๋„ ํ•˜๊ธฐ๊ฐ€ ์‹ซ๋‹ค๊ณ  ํ•œ๋‹ค. ์•„์ด๋ฅผ ์ž˜ ํ‚ค์šธ ์ž์‹ ๋„ ์—†๊ณ  ์‚ด๊ณ  ์‹ถ์ง€ ์•Š์œผ๋‹ˆ ์ฃฝ๊ฒŒ ๋‚ด๋ฒ„๋ ค ๋‘๋ผ๊ณ  ํ•œ๋‹ค. ์น˜๋ฃŒ๋Š”?
    sentences:
      - ์ „๊ธฐ๊ฒฝ๋ จ์š”๋ฒ•
      - ํ•ญ์‘๊ณ ์ œ
      - ๊ดœ์ฐฎ๋‹ค๊ณ  ์•ˆ์‹ฌ์‹œํ‚ด
  - source_sentence: >-
      59์„ธ ์—ฌ์ž๊ฐ€ ์งˆ๋ถ„๋น„๋ฌผ์ด ์žˆ๊ณ  ์™ธ์Œ๋ถ€๊ฐ€ ๊ฑด์กฐํ•˜๊ณ  ๋”ฐ๊ฐ€์›Œ ๋ณ‘์›์— ์™”๋‹ค. ๋ณด์Šต์ œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ๋„ ์ฆ์ƒ์ด ์ง€์†๋˜์—ˆ๋‹ค. 40์„ธ์— ์ž๊ถ๊ทผ์ข…์œผ๋กœ
      ์ž๊ถ์ ˆ์ œ์ˆ ์„ ๋ฐ›์•˜๊ณ  ์™ผ์ชฝ ๋‹ค๋ฆฌ์˜ ๊นŠ์€์ •๋งฅํ˜ˆ์ „์ฆ์œผ๋กœ ์•ฝ๋ฌผ์„ ๋ณต์šฉ ์ค‘์ด๋‹ค. ์•ˆ๋ฉดํ™์กฐ์™€ ๋ถˆ๋ฉด์ฆ์ด 50๋Œ€ ์ดˆ๋ฐ˜์— ์žˆ์—ˆ๋‹ค๊ฐ€ ํ˜„์žฌ๋Š” ์—†๊ณ 
      ์„ฑ๊ตํ†ต์ด ์žˆ๋‹ค. ๊ณจ๋ฐ˜๊ฒ€์‚ฌ์—์„œ ์™ธ์Œ๋ถ€ ์œ„์ถ•์ด ๊ด€์ฐฐ๋˜์—ˆ๊ณ  ์งˆ๋ถ„๋น„๋ฌผ์˜ ์ –์€ํŽด๋ฐ”๋ฅธํ‘œ๋ณธ๊ฒ€์‚ฌ์—์„œ๋Š” ์ด์ƒ์ด ์—†๋‹ค. ์ฒ˜์น˜๋Š”?
    sentences:
      - ์‹œ์ƒํ•˜๋ถ€๊ธฐ๋Šฅ์ด์ƒ
      - ๊ฒฝ์งˆ ์—์ŠคํŠธ๋กœ๊ฒ
      - ๋ฉดํ—ˆ ์ทจ์†Œ์ผ๋ถ€ํ„ฐ 3๋…„ ๊ฒฝ๊ณผ
  - source_sentence: >-
      15์„ธ ์—ฌ์ž๊ฐ€ 5์ผ ์ „๋ถ€ํ„ฐ ์—ด์ด ๋‚˜๊ณ  ์˜คํ•œ์ด ๋“ ๋‹ค๋ฉฐ ๋ณ‘์›์— ์™”๋‹ค. ์Œ์‹์„ ์‚ผํ‚ฌ ๋•Œ ๋ชฉ์ด ์•„ํ”„๋‹ค๊ณ  ํ•œ๋‹ค. ํ˜ˆ์•• 100/60 mmHg,
      ๋งฅ๋ฐ• 75ํšŒ/๋ถ„, ํ˜ธํก 18ํšŒ/๋ถ„, ์ฒด์˜จ 38.0โ„ƒ์ด๋‹ค. ๋ชฉ์˜ ์–‘์ชฝ ์—ฌ๋Ÿฌ ๊ตฐ๋ฐ์—์„œ 1 cm ์ดํ•˜ ํฌ๊ธฐ์˜ ๋ฆผํ”„์ ˆ์ด ๋งŒ์ ธ์ง„๋‹ค. ๋ฆผํ”„์ ˆ์€
      ์••ํ†ต์ด ์žˆ์œผ๋‚˜ ์ฃผ์œ„ ์กฐ์ง์— ๊ณ ์ •๋˜์–ด ์žˆ์ง€ ์•Š๋‹ค. ๋ชธ์—์„œ ๋ฐœ์ง„์€ ๋ณด์ด์ง€ ์•Š๋Š”๋‹ค. ํ˜ˆ์•ก๊ฒ€์‚ฌ ๊ฒฐ๊ณผ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค. ๋‹ค์Œ ๊ฒ€์‚ฌ๋Š”?๋ฐฑํ˜ˆ๊ตฌ
      13,780/mm^3 (์ค‘์„ฑ๊ตฌ 25%, ๋ฆผํ”„๊ตฌ 64%) ํ˜ˆ์ƒ‰์†Œ 13.3 g/dL, ํ˜ˆ์†ŒํŒ 209,000/mm^3 ํ˜ˆ์•ก์š”์†Œ์งˆ์†Œ 7
      mg/dL,  ํฌ๋ ˆ์•„ํ‹ฐ๋‹Œ 0.5 mg/dL, ์•„์ŠคํŒŒ๋ฅดํ…Œ์ดํŠธ์•„๋ฏธ๋…ธ์ „๋‹ฌํšจ์†Œ 266 U/L ์•Œ๋ผ๋‹Œ์•„๋ฏธ๋…ธ์ „๋‹ฌํšจ์†Œ 298 U/L  ์ด๋นŒ๋ฆฌ๋ฃจ๋นˆ
      0.7 mg/dL, ์•Œ์นผ๋ฆฌ์ธ์‚ฐ๋ถ„ํ•ดํšจ์†Œ 146 U/L (์ฐธ๊ณ ์น˜, 33๏ฝž96) C-๋ฐ˜์‘๋‹จ๋ฐฑ์งˆ 13 mg/L (์ฐธ๊ณ ์น˜, <10) 
    sentences:
      - ํ˜ˆ์ฒญ ๋ฐ”์ด๋Ÿฌ์Šค์บก์‹œ๋“œํ•ญ์›(VCA) IgM ํ•ญ์ฒด
      - ์ธก์ • ๋ฐ”์ด์–ด์Šค
      - ๋‚ ํŠธ๋ ‰์†
  - source_sentence: >-
      ์ž„์‹ ๋‚˜์ด 27์ฃผ, ์ถœ์ƒ์ฒด์ค‘ 750 g์œผ๋กœ ํƒœ์–ด๋‚œ ์‹ ์ƒ์•„๊ฐ€ ์ƒํ›„ 5์ผ์งธ ๊ฐ‘์ž๊ธฐ ์ฒญ์ƒ‰์ฆ์ด ๋ฐœ์ƒํ•˜์˜€๋‹ค. ์ถœ์ƒ ์งํ›„ ํํ‘œ๋ฉดํ™œ์„ฑ์ œ๋ฅผ
      ํˆฌ์—ฌ๋ฐ›์•˜๊ณ , ์ดํ›„ ๊ธฐ๊ณ„ํ™˜๊ธฐ์น˜๋ฃŒ ์ค‘์ด๋‹ค. ์‹ฌ๋ฐ• 170ํšŒ/๋ถ„, ํ˜ธํก 80ํšŒ/๋ถ„, ๊ฒฝํ”ผ์‚ฐ์†Œํฌํ™”๋„๋Š” ์˜ค๋ฅธ์†๊ณผ ์™ผ๋ฐœ์—์„œ ๋ชจ๋‘ 60% ์ด๋‹ค.
      ์•ž๊ฐ€์Šด์ด ํŒฝ์ฐฝ๋˜๊ณ , ์˜ค๋ฅธ์ชฝ ๊ฐ€์Šด ์ฒญ์ง„์—์„œ ํ˜ธํก์Œ์ด ์ž˜ ๋“ค๋ฆฌ์ง€ ์•Š๋Š”๋‹ค. ๊ฒ€์‚ฌ๋Š”?
    sentences:
      - ์š”์ฒญ์— ์‘ํ•จ
      - ๋น„์ „ํ˜•์  ์–‘์ƒ ๋™๋ฐ˜ ์ฃผ์š”์šฐ์šธ์žฅ์• 
      - ๊ฐ€์Šด X์„ ์‚ฌ์ง„
  - source_sentence: >-
      58์„ธ ๋‚จ์ž๊ฐ€ 7์‹œ๊ฐ„ ์ „๋ถ€ํ„ฐ ์œ—๋ฐฐ๊ฐ€ ์•„ํŒŒ์„œ ๋ณ‘์›์— ์™”๋‹ค. ํ‰์†Œ์— ์•Œ์ฝ”์˜ฌ๊ฐ„๊ฒฝํ™”๋กœ ์น˜๋ฃŒ๋ฅผ ๋ฐ›๊ณ  ์žˆ์œผ๋ฉฐ ์†Œํ™”๊ถค์–‘์— ์˜ํ•œ ์ฒœ๊ณต์œผ๋กœ ์ˆ˜์ˆ ์„
      ๋ฐ›์„ ์˜ˆ์ •์ด๋‹ค. ํ˜ˆ์•• 130/90 mmHg, ๋งฅ๋ฐ• 95ํšŒ/๋ถ„, ํ˜ธํก 22ํšŒ/๋ถ„, ์ฒด์˜จ 37.5โ„ƒ์ด๋‹ค. ๋ฐฐ ์ „์ฒด๊ฐ€ ๋”ฑ๋”ฑํ•˜๊ณ  ๋ฐฐ์— ์••ํ†ต๊ณผ
      ๋ฐ˜๋™์••ํ†ต์ด ์žˆ๋‹ค. ํ˜ˆ์•ก๊ฒ€์‚ฌ ๊ฒฐ๊ณผ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค. ์ˆ˜์ˆ  ์ „ ํˆฌ์—ฌํ•ด์•ผ ํ•  ์ œ์ œ๋Š”?ํ˜ˆ์ƒ‰์†Œ 10.3 g/dL, ๋ฐฑํ˜ˆ๊ตฌ 22,000/mm^3,
      ํ˜ˆ์†ŒํŒ 120,000/mm^3 ํ”„๋กœํŠธ๋กฌ๋นˆ์‹œ๊ฐ„ 20์ดˆ(์ฐธ๊ณ ์น˜, 12.7๏ฝž15.4) ํ™œ์„ฑํ™”๋ถ€๋ถ„ํŠธ๋กฌ๋ณดํ”Œ๋ผ์Šคํ‹ด์‹œ๊ฐ„ 30์ดˆ(์ฐธ๊ณ ์น˜,
      26.3๏ฝž39.4) ์ด๋‹จ๋ฐฑ์งˆ 6.5 g/dL, ์•Œ๋ถ€๋ฏผ 3.0 g/dL,์ด๋นŒ๋ฆฌ๋ฃจ๋นˆ 3.5 mg/dL, 
    sentences:
      - โ€œ์ „ํŒŒ ๊ฐ€๋Šฅ์„ฑ์ด ์ด๋ ‡๊ฒŒ ๋†’์€๋ฐ๋„ ๋‹ค๋ฅธ ์‚ฌ๋žŒ์—๊ฒŒ ์ „ํŒŒ๋ฅผ ๋งค๊ฐœํ•˜๋Š” ํ–‰์œ„๋ฅผ ํ•˜๋ฉด ํ˜•์‚ฌ์ฒ˜๋ฒŒ์„ ๋ฐ›์„ ์ˆ˜๋„ ์žˆ์Šต๋‹ˆ๋‹ค.โ€
      - ์‹ ์„ ๋™๊ฒฐํ˜ˆ์žฅ
      - ๋ฉดํ—ˆ์ž๊ฒฉ ์ •์ง€

SentenceTransformer based on google-bert/bert-base-multilingual-cased

This is a sentence-transformers model finetuned from google-bert/bert-base-multilingual-cased. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: google-bert/bert-base-multilingual-cased
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
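
The Pooling module above uses mean pooling (pooling_mode_mean_tokens: True): the token embeddings produced by BertModel are averaged, with padding masked out, into a single 768-dimensional sentence vector. A minimal sketch of that operation in plain PyTorch (function name and shapes are illustrative, not the library's internals):

import torch

def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 768) from BertModel; attention_mask: (batch, seq_len)
    mask = attention_mask.unsqueeze(-1).float()     # (batch, seq_len, 1); 0 for padding positions
    summed = (token_embeddings * mask).sum(dim=1)   # sum embeddings of real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)        # number of real tokens per sentence
    return summed / counts                          # (batch, 768) sentence embeddings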

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the ๐Ÿค— Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    '58์„ธ ๋‚จ์ž๊ฐ€ 7์‹œ๊ฐ„ ์ „๋ถ€ํ„ฐ ์œ—๋ฐฐ๊ฐ€ ์•„ํŒŒ์„œ ๋ณ‘์›์— ์™”๋‹ค. ํ‰์†Œ์— ์•Œ์ฝ”์˜ฌ๊ฐ„๊ฒฝํ™”๋กœ ์น˜๋ฃŒ๋ฅผ ๋ฐ›๊ณ  ์žˆ์œผ๋ฉฐ ์†Œํ™”๊ถค์–‘์— ์˜ํ•œ ์ฒœ๊ณต์œผ๋กœ ์ˆ˜์ˆ ์„ ๋ฐ›์„ ์˜ˆ์ •์ด๋‹ค. ํ˜ˆ์•• 130/90 mmHg, ๋งฅ๋ฐ• 95ํšŒ/๋ถ„, ํ˜ธํก 22ํšŒ/๋ถ„, ์ฒด์˜จ 37.5โ„ƒ์ด๋‹ค. ๋ฐฐ ์ „์ฒด๊ฐ€ ๋”ฑ๋”ฑํ•˜๊ณ  ๋ฐฐ์— ์••ํ†ต๊ณผ ๋ฐ˜๋™์••ํ†ต์ด ์žˆ๋‹ค. ํ˜ˆ์•ก๊ฒ€์‚ฌ ๊ฒฐ๊ณผ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค. ์ˆ˜์ˆ  ์ „ ํˆฌ์—ฌํ•ด์•ผ ํ•  ์ œ์ œ๋Š”?ํ˜ˆ์ƒ‰์†Œ 10.3 g/dL, ๋ฐฑํ˜ˆ๊ตฌ 22,000/mm^3, ํ˜ˆ์†ŒํŒ 120,000/mm^3 ํ”„๋กœํŠธ๋กฌ๋นˆ์‹œ๊ฐ„ 20์ดˆ(์ฐธ๊ณ ์น˜, 12.7๏ฝž15.4) ํ™œ์„ฑํ™”๋ถ€๋ถ„ํŠธ๋กฌ๋ณดํ”Œ๋ผ์Šคํ‹ด์‹œ๊ฐ„ 30์ดˆ(์ฐธ๊ณ ์น˜, 26.3๏ฝž39.4) ์ด๋‹จ๋ฐฑ์งˆ 6.5 g/dL, ์•Œ๋ถ€๋ฏผ 3.0 g/dL,์ด๋นŒ๋ฆฌ๋ฃจ๋นˆ 3.5 mg/dL, ',
    '์‹ ์„ ๋™๊ฒฐํ˜ˆ์žฅ',
    '๋ฉดํ—ˆ์ž๊ฒฉ ์ •์ง€',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
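
Since the model was trained on (query, answer) pairs, a natural use is ranking candidate answers for a clinical question. A small sketch reusing the placeholder model id and a sample from the training data (the candidate list is illustrative):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence_transformers_model_id")

# Rank candidate answers for one query
query_embedding = model.encode(["ํ•ญ๋ฌธ์•• ์ธก์ • ๊ฒ€์‚ฌ์—์„œ ํ•ญ๋ฌธ ์••๋ ฅ์ด ์ฆ๊ฐ€ํ•˜๋Š” ๊ฒฝ์šฐ๋Š”?"])
candidates = ["ํ•ญ๋ฌธ์—ด์ฐฝ(anal fissure)", "์‹ ์„ ๋™๊ฒฐํ˜ˆ์žฅ", "๊ฐ€์Šด X์„ ์‚ฌ์ง„"]
candidate_embeddings = model.encode(candidates)

scores = model.similarity(query_embedding, candidate_embeddings)  # cosine similarity, shape [1, 3]
print(candidates[scores.argmax().item()])  # highest-scoring candidate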

Training Details

Training Dataset

Unnamed Dataset

  • Size: 1,890 training samples
  • Columns: query and answer
  • Approximate statistics based on the first 1000 samples:
      query:  string, min 11 tokens, mean 112.75 tokens, max 316 tokens
      answer: string, min 3 tokens, mean 8.62 tokens, max 33 tokens
  • Samples:
      query:  ํ•ญ๋ฌธ์•• ์ธก์ • ๊ฒ€์‚ฌ์—์„œ ํ•ญ๋ฌธ ์••๋ ฅ์ด ์ฆ๊ฐ€ํ•˜๋Š” ๊ฒฝ์šฐ๋Š”?
      answer: ํ•ญ๋ฌธ์—ด์ฐฝ(anal fissure)

      query:  ๋ณต๋ถ€๋Œ€๋™๋งฅ(abdominal aorta)์—์„œ ์ฒ˜์Œ ๋ถ„์ง€(first branch)๋˜๋Š” ๋™๋งฅ์€?
      answer: ๋Œ์ž˜๋ก์ฐฝ์ž๋™๋งฅ(ileocolic artery)

      query:  58์„ธ ๋‚จ์ž๊ฐ€ ๋Œ€๋Ÿ‰ ์žฅ์ ˆ์ œ ํ›„ ์งง์€์ฐฝ์ž์ฆํ›„๊ตฐ(short bowel syndrome)์œผ๋กœ 4๊ฐœ์›”๊ฐ„ ์™„์ „๋น„๊ฒฝ๊ตฌ์˜์–‘์š”๋ฒ•์„ ๋ฐ›๊ณ  ์žˆ๋Š” ์ค‘์ด๋‹ค. ์ฑ„ํ˜ˆ ํ›„ ํ”ผ๊ฐ€ ์ž˜ ๋ฉŽ์ง€ ์•Š์•˜๋‹ค. ํ˜ˆ์•ก๊ฒ€์‚ฌ ๊ฒฐ๊ณผ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค. ๊ฒฐํ•์ด ์˜์‹ฌ๋˜๋Š” ๊ฒƒ์€? ํ˜ˆ์ƒ‰์†Œ 13.5 g/dL, ๋ฐฑํ˜ˆ๊ตฌ 4,500/mm^3, ํ˜ˆ์†ŒํŒ 220,000/mm^3, ์•Œ๋ถ€๋ฏผ 3.7 g/dL, ์ด๋นŒ๋ฆฌ๋ฃจ๋นˆ 1.0 mg/dL, ์•Œ์นผ๋ฆฌ์ธ์‚ฐ๋ถ„ํ•ดํšจ์†Œ(ALP) 90 U/L, ์•„์ŠคํŒŒ๋ฅดํ…Œ์ดํŠธ์•„๋ฏธ๋…ธ์ „๋‹ฌํšจ์†Œ(AST) 22 U/L, ์•Œ๋ผ๋‹Œ์•„๋ฏธ๋…ธ์ „๋‹ฌํšจ์†Œ(ALT) 16 U/L, ํ”„๋กœํŠธ๋กฌ๋นˆ์‹œ๊ฐ„ 30.5์ดˆ (์ฐธ๊ณ ์น˜, 12.7๏ฝž15.4), ํ™œ์„ฑํ™”๋ถ€๋ถ„ํŠธ๋กฌ๋ณดํ”Œ๋ผ์Šคํ‹ด์‹œ๊ฐ„ 34.5์ดˆ (์ฐธ๊ณ ์น˜, 26.3๏ฝž39.4)
      answer: ํŠธ๋กฌ๋นˆ
  • Loss: MultipleNegativesRankingLoss (sketched after this list) with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
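
With these parameters, the loss treats each query's paired answer as the positive and every other answer in the same batch as an in-batch negative: cosine similarities are scaled by 20 and scored with softmax cross-entropy. A minimal sketch of the computation (illustrative; not the library's exact implementation):

import torch
import torch.nn.functional as F

def multiple_negatives_ranking_loss(query_embs, answer_embs, scale=20.0):
    # Scaled cosine similarity between every query and every answer in the batch
    q = F.normalize(query_embs, dim=-1)
    a = F.normalize(answer_embs, dim=-1)
    scores = q @ a.T * scale                   # (batch, batch)
    # The i-th query's positive is the i-th answer; all other answers are negatives
    labels = torch.arange(scores.size(0))
    return F.cross_entropy(scores, labels)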
    

Evaluation Dataset

Unnamed Dataset

  • Size: 164 evaluation samples
  • Columns: query and answer
  • Approximate statistics based on the first 1000 samples:
      query:  string, min 18 tokens, mean 153.24 tokens, max 369 tokens
      answer: string, min 3 tokens, mean 9.71 tokens, max 40 tokens
  • Samples:
      query:  ๊ด‘์—ญ์‹œ ์†Œ์žฌ ๋Œ€ํ•™๋ณ‘์›์— ์†Œ์†๋œ ๋‚ด๊ณผ ์ „๋ฌธ์˜ A๊ฐ€ ์ฝœ๋ ˆ๋ผ ํ™˜์ž๋ฅผ ์ง„๋‹จํ–ˆ๋‹ค. A๊ฐ€ ํ•  ์กฐ์น˜๋Š”?
      answer: ๋ณ‘์›์žฅ์—๊ฒŒ ๋ณด๊ณ 

      query:  A๋Š” ์ œ1๊ธ‰ ๊ฐ์—ผ๋ณ‘์œผ๋กœ ์ง„๋‹จ์„ ๋ฐ›์•˜๋‹ค. B๋Š” ๋งˆ์Šคํฌ๋ฅผ ์ฐฉ์šฉํ•˜์ง€ ์•Š์€ ์ฑ„ A์™€ ๋ฐ€์ ‘ํ•˜๊ฒŒ ์ ‘์ด‰ํ–ˆ๋‹ค. B๋Š” ์ฆ์ƒ์ด ์—†๋‹ค. ์—ญํ•™์กฐ์‚ฌ๊ด€์€ ์ด ๋‹จ๊ณ„์—์„œ B๋ฅผ ๋ฌด์—‡์œผ๋กœ ๋ถ„๋ฅ˜ํ•˜๋Š”๊ฐ€?
      answer: ๊ฐ์—ผ๋ณ‘ ์˜์‹ฌ์ž

      query:  ๊ฒ€์—ญ์†Œ ๋‚ด ๊ฒฉ๋ฆฌ๋ณ‘๋™์— ๊ฒฉ๋ฆฌ๋˜์–ด ์žˆ๋˜ ์ฝœ๋ ˆ๋ผ ํ™˜์ž A์˜ ๊ฐ์—ผ๋ ฅ์ด ์—†์–ด์ง„ ๊ฒƒ์ด ํ™•์ธ๋˜์—ˆ๋‹ค. A์— ๋Œ€ํ•œ ์กฐ์น˜๋Š”?
      answer: ๊ฒฉ๋ฆฌ ํ•ด์ œ
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_eval_batch_size: 16
  • learning_rate: 3e-05
  • warmup_ratio: 0.1
  • fp16: True
  • load_best_model_at_end: True
  • ddp_find_unused_parameters: False

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • learning_rate: 3e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: False
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
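
The non-default values above map directly onto SentenceTransformerTrainingArguments. A hedged end-to-end fine-tuning sketch using the Sentence Transformers 3.x trainer API (the inline two-example dataset, output directory, and save_strategy are illustrative assumptions; the actual run used the 1,890-pair dataset described above):

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)

model = SentenceTransformer("google-bert/bert-base-multilingual-cased")

# Illustrative (query, answer) pairs; the actual dataset is not published with this card
train_dataset = Dataset.from_dict({
    "query": ["ํ•ญ๋ฌธ์•• ์ธก์ • ๊ฒ€์‚ฌ์—์„œ ํ•ญ๋ฌธ ์••๋ ฅ์ด ์ฆ๊ฐ€ํ•˜๋Š” ๊ฒฝ์šฐ๋Š”?",
              "๋ณต๋ถ€๋Œ€๋™๋งฅ(abdominal aorta)์—์„œ ์ฒ˜์Œ ๋ถ„์ง€(first branch)๋˜๋Š” ๋™๋งฅ์€?"],
    "answer": ["ํ•ญ๋ฌธ์—ด์ฐฝ(anal fissure)", "๋Œ์ž˜๋ก์ฐฝ์ž๋™๋งฅ(ileocolic artery)"],
})
eval_dataset = train_dataset  # placeholder; the card reports 164 held-out pairs

args = SentenceTransformerTrainingArguments(
    output_dir="output",                 # assumed; not recorded in this card
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    learning_rate=3e-5,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="epoch",
    save_strategy="epoch",               # assumed; must match eval_strategy for load_best_model_at_end
    load_best_model_at_end=True,
    ddp_find_unused_parameters=False,
)

loss = losses.MultipleNegativesRankingLoss(model)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()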

Training Logs

Epoch    Step   Training Loss   Validation Loss
0.1055     25   2.4397          -
0.2110     50   1.986           -
0.3165     75   1.881           -
0.4219    100   1.8105          -
0.5274    125   1.7378          -
0.6329    150   1.5942          -
0.7384    175   1.4586          -
0.8439    200   1.3904          -
0.9494    225   1.4707          -
1.0       237   -               1.3109
1.0549    250   1.234           -
1.1603    275   1.1867          -
1.2658    300   1.0103          -
1.3713    325   1.088           -
1.4768    350   1.1066          -
1.5823    375   1.049           -
1.6878    400   1.0639          -
1.7932    425   1.1133          -
1.8987    450   0.9188          -
2.0       474   -               1.0434

Framework Versions

  • Python: 3.10.14
  • Sentence Transformers: 3.0.1
  • Transformers: 4.42.2
  • PyTorch: 2.3.0
  • Accelerate: 0.31.0
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1
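
To reproduce this environment, the listed versions can be pinned at install time; a sketch (platform-specific PyTorch/CUDA wheels may differ):

pip install sentence-transformers==3.0.1 transformers==4.42.2 torch==2.3.0 accelerate==0.31.0 datasets==2.19.1 tokenizers==0.19.1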

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}