---
license: apache-2.0
base_model: sentence-transformers/all-MiniLM-L6-v2
library_name: sentence-transformers
pipeline_tag: sentence-similarity
---
# HAI - HelpingAI Semantic Similarity Model

This is a **custom Sentence Transformer model** fine-tuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). Designed as part of the **HelpingAI ecosystem**, it enhances **semantic similarity and contextual understanding**, with an emphasis on **emotionally intelligent responses**.

## Model Highlights

- **Base Model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
- **Embedding Dimension:** 384
- **Max Sequence Length:** 256 tokens
- **Training Objective:** Cosine similarity loss on 75,897 labeled sentence pairs

## Model Details

### Features:
- **Maximum Sequence Length:** Up to 256 tokens per input; longer inputs are truncated.
- **Output Dimensionality:** 384-dimensional dense embeddings, L2-normalized by the final `Normalize()` module.

### Full Architecture
```python
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) 
  (1): Pooling({'pooling_mode_mean_tokens': True})
  (2): Normalize()
)
```
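
These properties can be verified at runtime. A minimal sanity-check sketch (assuming the model is published as `HelpingAI/HAI`, as in the Quick Start below):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("HelpingAI/HAI")

# Maximum tokens per input before truncation (256 for this model)
print(model.max_seq_length)

# Dimensionality of the output embeddings (384 for this model)
print(model.get_sentence_embedding_dimension())
```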


## Training Overview

### Dataset:
- **Size:** 75,897 samples
- **Structure:** `<sentence_0, sentence_1, similarity_score>`
- **Labels:** Float values between 0 (no similarity) and 1 (high similarity).

### Training Method:
- **Loss Function:** Cosine Similarity Loss
- **Batch Size:** 16
- **Epochs:** 20
- **Optimization:** AdamW optimizer with a learning rate of `5e-5`.
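
The training script itself is not included in this card, but a minimal sketch of the setup described above, using the classic `sentence-transformers` fit API, might look as follows. The two `InputExample` pairs are hypothetical stand-ins for the real 75,897-sample `<sentence_0, sentence_1, similarity_score>` dataset:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Start from the base model named in the card
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Hypothetical stand-ins for the real <sentence_0, sentence_1, similarity_score> rows
train_examples = [
    InputExample(texts=["A woman is slicing a pepper.",
                        "A woman is cutting a pepper."], label=0.9),
    InputExample(texts=["A woman is slicing a pepper.",
                        "The sun is shining brightly today."], label=0.1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)

# Cosine similarity loss over the pair embeddings, as stated above
train_loss = losses.CosineSimilarityLoss(model)

# 20 epochs with AdamW (the library default) at a learning rate of 5e-5
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=20,
    optimizer_params={"lr": 5e-5},
)
```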

## Getting Started

### Installation
Ensure you have the `sentence-transformers` library installed:
```bash
pip install -U sentence-transformers
```

### Quick Start
Load and use the model in your Python environment:
```python
from sentence_transformers import SentenceTransformer

# Load the HelpingAI semantic similarity model
model = SentenceTransformer("HelpingAI/HAI")

# Encode sentences
sentences = [
    "A woman is slicing a pepper.",
    "A girl is styling her hair.",
    "The sun is shining brightly today."
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # Output: (3, 384)

# Calculate similarity
from sklearn.metrics.pairwise import cosine_similarity
similarity_scores = cosine_similarity([embeddings[0]], embeddings[1:])
print(similarity_scores)
```
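
The same scores can be computed without scikit-learn via the library's own helper:

```python
from sentence_transformers import util

# cos_sim accepts NumPy arrays and returns a torch tensor of pairwise scores
scores = util.cos_sim(embeddings[0], embeddings[1:])
print(scores)
```

Because the final `Normalize()` module L2-normalizes the embeddings, cosine similarity here is equivalent to a plain dot product.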

## Evaluation

The model achieves high accuracy in sentiment-informed response tests.

## Citation

If you use the HAI model, please cite the original Sentence-BERT paper:

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```