---
license: apache-2.0
base_model: sentence-transformers/all-MiniLM-L6-v2
library_name: sentence-transformers
pipeline_tag: sentence-similarity
---

# HAI - HelpingAI Semantic Similarity Model

This is a **custom Sentence Transformer model** fine-tuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). Designed as part of the **HelpingAI ecosystem**, it enhances **semantic similarity and contextual understanding**, with an emphasis on **emotionally intelligent responses**.

## Model Highlights

- **Base Model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
- **Embeddings:** 384-dimensional, L2-normalized dense vectors
- **Training Data:** 75,897 sentence pairs labeled with similarity scores

## Model Details

### Features:

- **Maximum Sequence Length:** Up to 256 tokens per input; longer inputs are truncated.
- **Output Dimensionality:** 384-dimensional dense embeddings.

### Full Architecture

```python
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False})
  (1): Pooling({'pooling_mode_mean_tokens': True})
  (2): Normalize()
)
```
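
The sequence length and embedding size can be verified at runtime through the standard `sentence-transformers` accessors (a quick sanity check, assuming the model is published as `HelpingAI/HAI`):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("HelpingAI/HAI")

# Maximum number of tokens per input; longer texts are truncated
print(model.get_max_seq_length())                # 256

# Size of the dense output vectors
print(model.get_sentence_embedding_dimension())  # 384
```

Because the final `Normalize()` module L2-normalizes every embedding, the cosine similarity of two embeddings reduces to their dot product.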

## Training Overview

### Dataset:

- **Size:** 75,897 samples
- **Structure:** `<sentence_0, sentence_1, similarity_score>`
- **Labels:** Float values between 0 (no similarity) and 1 (high similarity); see the snippet below for how a pair is represented.
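
In `sentence-transformers`, one such training triple maps onto an `InputExample` (an illustrative sketch; the sentences and score below are placeholders, not rows from the actual dataset):

```python
from sentence_transformers import InputExample

# One sample: a sentence pair plus a similarity label in [0, 1]
sample = InputExample(
    texts=["A woman is slicing a pepper.", "A woman is cutting a pepper."],
    label=0.9,  # placeholder score for illustration
)
```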

### Training Method:

- **Loss Function:** Cosine Similarity Loss
- **Batch Size:** 16
- **Epochs:** 20
- **Optimization:** AdamW optimizer with a learning rate of `5e-5` (a minimal training sketch follows this list).
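
Put together, a fine-tuning run with these hyperparameters could look roughly like the following, using the library's classic `fit` API (the training examples are placeholders; this is a sketch, not the actual training script):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Placeholder pairs; the real run used 75,897 labeled sentence pairs
train_examples = [
    InputExample(texts=["A woman is slicing a pepper.",
                        "A woman is cutting a pepper."], label=0.9),
    InputExample(texts=["A woman is slicing a pepper.",
                        "The sun is shining brightly today."], label=0.1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=16)
train_loss = losses.CosineSimilarityLoss(model)

# AdamW is the default optimizer; only the learning rate is overridden
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=20,
    optimizer_params={"lr": 5e-5},
)
```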

## Getting Started

### Installation

Ensure you have the `sentence-transformers` library installed:

```bash
pip install -U sentence-transformers
```

### Quick Start

Load and use the model in your Python environment:

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Load the HelpingAI semantic similarity model
model = SentenceTransformer("HelpingAI/HAI")

# Encode sentences into 384-dimensional embeddings
sentences = [
    "A woman is slicing a pepper.",
    "A girl is styling her hair.",
    "The sun is shining brightly today.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # Output: (3, 384)

# Compare the first sentence against the other two
similarity_scores = cosine_similarity([embeddings[0]], embeddings[1:])
print(similarity_scores)
```
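
If you are on `sentence-transformers` 3.0 or later, the same comparison works without `scikit-learn` via the built-in similarity helper (shown here as an optional alternative):

```python
# Uses the model's configured similarity function (cosine by default)
scores = model.similarity(embeddings[0], embeddings[1:])
print(scores)  # tensor of shape (1, 2)
```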

In HelpingAI's internal testing, the model has shown high accuracy in sentiment-informed response tests.

## Citation

If you use the HAI model, please cite the original Sentence-BERT paper:

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```