---
license: apache-2.0
base_model: sentence-transformers/all-MiniLM-L6-v2
library_name: sentence-transformers
pipeline_tag: sentence-similarity
---

# HAI - HelpingAI Semantic Similarity Model

This is a **custom Sentence Transformer model** fine-tuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). Designed as part of the **HelpingAI ecosystem**, it enhances **semantic similarity and contextual understanding**, with an emphasis on **emotionally intelligent responses**.

## Model Highlights

- **Base Model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)

## Model Details

### Features

- **Input Dimensionality:** Handles up to 256 tokens per input.
- **Output Dimensionality:** 384-dimensional dense embeddings.

### Full Architecture

```python
SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False})
  (1): Pooling({'pooling_mode_mean_tokens': True})
  (2): Normalize()
)
```

## Training Overview

### Dataset

- **Size:** 75,897 samples
- **Structure:** ``
- **Labels:** Float values between 0 (no similarity) and 1 (high similarity).

### Training Method

- **Loss Function:** Cosine Similarity Loss
- **Batch Size:** 16
- **Epochs:** 20
- **Optimization:** AdamW optimizer with a learning rate of `5e-5`.

## Getting Started

### Installation

Ensure you have the `sentence-transformers` library installed:

```bash
pip install -U sentence-transformers
```

### Quick Start

Load and use the model in your Python environment:

```python
from sentence_transformers import SentenceTransformer

# Load the HelpingAI semantic similarity model
model = SentenceTransformer("HelpingAI/HAI")

# Encode sentences
sentences = [
    "A woman is slicing a pepper.",
    "A girl is styling her hair.",
    "The sun is shining brightly today.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # Output: (3, 384)

# Calculate similarity between the first sentence and the rest
from sklearn.metrics.pairwise import cosine_similarity

similarity_scores = cosine_similarity([embeddings[0]], embeddings[1:])
print(similarity_scores)
```

## Evaluation

The model achieves high accuracy in sentiment-informed response tests.

## Citation

If you use the HAI model, please cite the original Sentence-BERT paper:

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```