---
license: apache-2.0
base_model: sentence-transformers/all-MiniLM-L6-v2
library_name: sentence-transformers
pipeline_tag: sentence-similarity
---
# HAI - HelpingAI Semantic Similarity Model
This is a **custom Sentence Transformer model** fine-tuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). Designed as part of the **HelpingAI ecosystem**, it enhances **semantic similarity and contextual understanding**, with an emphasis on **emotionally intelligent responses**.
## Model Highlights
- **Base Model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)
## Model Details
### Features:
- **Maximum Sequence Length:** up to 256 tokens per input (longer inputs are truncated).
- **Output Dimensionality:** 384-dimensional dense embeddings.
### Full Architecture
```python
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False})
  (1): Pooling({'pooling_mode_mean_tokens': True})
  (2): Normalize()
)
```
## Training Overview
### Dataset:
- **Size:** 75,897 sentence pairs
- **Structure:** `<sentence_0, sentence_1, similarity_score>`
- **Labels:** Float values between 0 (no similarity) and 1 (high similarity).
### Training Method:
- **Loss Function:** Cosine Similarity Loss
- **Batch Size:** 16
- **Epochs:** 20
- **Optimization:** AdamW optimizer with a learning rate of `5e-5`.
## Getting Started
### Installation
Ensure you have the `sentence-transformers` library installed:
```bash
pip install -U sentence-transformers
```
### Quick Start
Load and use the model in your Python environment:
```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

# Load the HelpingAI semantic similarity model
model = SentenceTransformer("HelpingAI/HAI")

# Encode sentences into 384-dimensional embeddings
sentences = [
    "A woman is slicing a pepper.",
    "A girl is styling her hair.",
    "The sun is shining brightly today.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # Output: (3, 384)

# Compare the first sentence against the other two
similarity_scores = cosine_similarity([embeddings[0]], embeddings[1:])
print(similarity_scores)
```
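Because the model ends with a `Normalize()` layer, every embedding has unit length, so cosine similarity reduces to a plain dot product. A library-free NumPy sketch of what `cosine_similarity` computes:

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity: dot product divided by the product of vector norms."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos_sim([1.0, 0.0], [2.0, 0.0]))  # 1.0 (same direction)
print(cos_sim([1.0, 0.0], [0.0, 1.0]))  # 0.0 (orthogonal)
```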
The model has shown high accuracy in sentiment-informed response tests.
## Citation
If you use the HAI model, please cite the original Sentence-BERT paper:
```bibtex
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
```