---
base_model: Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2
datasets:
- Omartificial-Intelligence-Space/Arabic-stsb
- Omartificial-Intelligence-Space/Arabic-NLi-Pair-Class
language:
- ar
library_name: sentence-transformers
metrics:
- pearson_cosine
- spearman_cosine
- pearson_manhattan
- spearman_manhattan
- pearson_euclidean
- spearman_euclidean
- pearson_dot
- spearman_dot
- pearson_max
- spearman_max
- mteb
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:947818
- loss:SoftmaxLoss
- loss:CosineSimilarityLoss
- transformers
widget:
- source_sentence: امرأة تكتب شيئاً
  sentences:
  - مراهق يتحدث إلى فتاة عبر كاميرا الإنترنت
  - امرأة تقطع البصل الأخضر.
  - مجموعة من كبار السن يتظاهرون حول طاولة الطعام.
- source_sentence: تتشكل النجوم في مناطق تكوين النجوم، والتي تنشأ نفسها من السحب الجزيئية.
  sentences:
  - لاعب كرة السلة على وشك تسجيل نقاط لفريقه.
  - المقال التالي مأخوذ من نسختي من "أطلس البطريق الجديد للتاريخ الوسطى"
  - قد يكون من الممكن أن يوجد نظام شمسي مثل نظامنا خارج المجرة
- source_sentence: >-
    تحت السماء الزرقاء مع الغيوم البيضاء، يصل طفل لمس مروحة طائرة واقفة على حقل من العشب.
  sentences:
  - امرأة تحمل كأساً
  - طفل يحاول لمس مروحة طائرة
  - اثنان من عازبين عن الشرب يستعدون للعشاء
- source_sentence: رجل في منتصف العمر يحلق لحيته في غرفة ذات جدران بيضاء والتي لا تبدو كحمام
  sentences:
  - فتى يخطط اسمه على مكتبه
  - رجل ينام
  - المرأة وحدها وهي نائمة في غرفة نومها
- source_sentence: الكلب البني مستلقي على جانبه على سجادة بيج، مع جسم أخضر في المقدمة.
  sentences:
  - شخص طويل القامة
  - المرأة تنظر من النافذة.
  - لقد مات الكلب
model-index:
- name: >-
    SentenceTransformer based on Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2
  results:
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts dev
      type: sts-dev
    metrics:
    - type: pearson_cosine
      value: 0.8390853221830158
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8410008255002589
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.8276538954353795
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.8360889200075982
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.8274021671008013
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.8357887501417183
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.8154259766643255
      name: Pearson Dot
    - type: spearman_dot
      value: 0.81802827956939
      name: Spearman Dot
    - type: pearson_max
      value: 0.8390853221830158
      name: Pearson Max
    - type: spearman_max
      value: 0.8410008255002589
      name: Spearman Max
  - task:
      type: semantic-similarity
      name: Semantic Similarity
    dataset:
      name: sts test
      type: sts-test
    metrics:
    - type: pearson_cosine
      value: 0.8130046542366043
      name: Pearson Cosine
    - type: spearman_cosine
      value: 0.8172511596569861
      name: Spearman Cosine
    - type: pearson_manhattan
      value: 0.8113865863454744
      name: Pearson Manhattan
    - type: spearman_manhattan
      value: 0.8164081961542164
      name: Spearman Manhattan
    - type: pearson_euclidean
      value: 0.810311097439534
      name: Pearson Euclidean
    - type: spearman_euclidean
      value: 0.8157654465052717
      name: Spearman Euclidean
    - type: pearson_dot
      value: 0.7907732563794702
      name: Pearson Dot
    - type: spearman_dot
      value: 0.7886749863194292
      name: Spearman Dot
    - type: pearson_max
      value: 0.8130046542366043
      name: Pearson Max
    - type: spearman_max
      value: 0.8172511596569861
      name: Spearman Max
license: apache-2.0
---

# GATE-AraBert-v1

This is a General Arabic Text Embedding (GATE) model trained with Sentence Transformers in a multi-task setup. It is trained on two datasets: AllNLI and STS.

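The card's tags list two training losses, `SoftmaxLoss` and `CosineSimilarityLoss`. As a rough illustration of what those two objectives optimize, here is a minimal NumPy sketch. It assumes the classification loss is paired with the NLI data and the regression loss with the STS data, which the card does not state explicitly. The function names, toy sizes, and random embeddings are hypothetical stand-ins, not the library's implementation or real model outputs.

```python
import numpy as np


def cosine_similarity(u, v):
    """Row-wise cosine similarity between two (batch, dim) arrays."""
    num = np.sum(u * v, axis=1)
    den = np.linalg.norm(u, axis=1) * np.linalg.norm(v, axis=1)
    return num / den


def cosine_similarity_loss(u, v, gold_scores):
    """Regression objective: mean squared error between cosine similarity and gold score."""
    return float(np.mean((cosine_similarity(u, v) - gold_scores) ** 2))


def softmax_loss(u, v, labels, weight, bias):
    """Classification objective: cross-entropy over logits computed from [u, v, |u - v|]."""
    features = np.concatenate([u, v, np.abs(u - v)], axis=1)  # (batch, 3 * dim)
    logits = features @ weight + bias                         # (batch, num_labels)
    logits = logits - logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(labels)), labels]))


# Toy data (assumption): random vectors stand in for sentence embeddings.
rng = np.random.default_rng(0)
dim, num_labels, batch = 8, 3, 4
u = rng.normal(size=(batch, dim))
v = rng.normal(size=(batch, dim))
labels = np.array([0, 1, 2, 1])

# An untrained classifier head with zero weights predicts a uniform distribution.
weight = np.zeros((3 * dim, num_labels))
bias = np.zeros(num_labels)

nli_loss = softmax_loss(u, v, labels, weight, bias)
# Comparing each embedding with itself against a gold score of 1.0 gives zero error.
sts_loss = cosine_similarity_loss(u, u, np.ones(batch))

print(nli_loss, sts_loss)
```

In a multi-task run, batches from each dataset would be fed to their own objective while updating a shared encoder. The scheduling of those batches is not described in this card.
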
## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2](https://huggingface.co/Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2)
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Datasets:**
    - [all-nli](https://huggingface.co/datasets/Omartificial-Intelligence-Space/Arabic-NLi-Pair-Class)
    - [sts](https://huggingface.co/datasets/Omartificial-Intelligence-Space/arabic-stsb)
- **Language:** ar

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Omartificial-Intelligence-Space/GATE-AraBert-v1")

# Run inference
sentences = [
    'الكلب البني مستلقي على جانبه على سجادة بيج، مع جسم أخضر في المقدمة.',
    'لقد مات الكلب',
    'شخص طويل القامة',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the pairwise similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```

## Evaluation

### Metrics

#### Semantic Similarity
* Dataset: `sts-dev`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.8391     |
| **spearman_cosine** | **0.8410** |
| pearson_manhattan   | 0.8277     |
| spearman_manhattan  | 0.8361     |
| pearson_euclidean   | 0.8274     |
| spearman_euclidean  | 0.8358     |
| pearson_dot         | 0.8154     |
| spearman_dot        | 0.8180     |
| pearson_max         | 0.8391     |
| spearman_max        | 0.8410     |

#### Semantic Similarity
* Dataset: `sts-test`
* Evaluated with [EmbeddingSimilarityEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

| Metric              | Value      |
|:--------------------|:-----------|
| pearson_cosine      | 0.8130     |
| **spearman_cosine** | **0.8173** |
| pearson_manhattan   | 0.8114     |
| spearman_manhattan  | 0.8164     |
| pearson_euclidean   | 0.8103     |
| spearman_euclidean  | 0.8158     |
| pearson_dot         | 0.7908     |
| spearman_dot        | 0.7887     |
| pearson_max         | 0.8130     |
| spearman_max        | 0.8173     |

## Citation

### BibTeX

#### Sentence Transformers and SoftmaxLoss
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```
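
## Appendix: Metric Sketch

The `pearson_cosine` and `spearman_cosine` figures in the Evaluation section are correlations between cosine similarities of sentence pairs and human-annotated gold scores. The sketch below shows the standard definitions on toy data. It is an illustration only and does not reproduce the internals of `EmbeddingSimilarityEvaluator`. The vectors and gold scores are made up, and the rank helper assumes there are no tied values.

```python
import numpy as np


def cosine_scores(a, b):
    """Cosine similarity for each row pair of two (n, dim) arrays."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / den


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def spearman(x, y):
    """Spearman correlation: Pearson on ranks (assumes no ties in x or y)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return pearson(rank_x, rank_y)


# Toy "embeddings" for four sentence pairs (hypothetical values).
left = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
right = np.array([[0.0, 1.0], [1.0, 2.0], [1.0, 1.0], [1.0, 0.0]])
# Toy gold similarity scores on a 0 to 5 scale (hypothetical values).
gold = np.array([0.0, 1.5, 3.0, 5.0])

predicted = cosine_scores(left, right)
pearson_cosine = pearson(predicted, gold)
spearman_cosine = spearman(predicted, gold)

print(f"pearson_cosine={pearson_cosine:.4f} spearman_cosine={spearman_cosine:.4f}")
```

Here the predicted similarities rise in the same order as the gold scores, so the rank correlation is perfect even though the linear correlation is not. This is why Spearman is often preferred for comparing embedding models.
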