LLMXperts/Arabic-Triplet-Matryoshka-V2

@@ -19,11 +19,6 @@ tags:
 license: apache-2.0
 ---
-# Arabic Triplet Matryoshka V2 Model [ATM2]
-![image/png](https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/FrLQzFUJ3grEUOdONWGME.png)
 ## Model Description
 Arabic-Triplet-Matryoshka-V2-Model is a state-of-the-art Arabic language embedding model based on the [sentence-transformers](https://www.SBERT.net) framework. It is fine-tuned from [aubmindlab/bert-base-arabertv02](https://huggingface.co/aubmindlab/bert-base-arabertv02) and specifically designed to capture the rich semantic nuances of Arabic text.
@@ -37,67 +32,6 @@ This model maps sentences and paragraphs to a 768-dimensional dense vector space
 - Information retrieval
 - Question answering
-## Key Features
-- **State-of-the-Art Performance**: Achieved 0.85 on STS17 and 0.64 on STS22.v2 with an average score of 74.5, making it the leading Arabic embedding model currently available.
-- **MatryoshkaLoss Training**: Utilizes nested embedding learning techniques to create hierarchical embeddings at multiple resolutions.
-- **Optimization**: Trained for 3 epochs with a final training loss of 0.718.
-- **Full Arabic Language Support**: Designed specifically to handle the complexity and morphological richness of Arabic language.
-## Training Details
-The model was trained using a combination of two loss functions:
-- **MatryoshkaLoss**: Enables the creation of nested embeddings at multiple resolutions, allowing for efficient and adaptable representations.
-- **MultipleNegativesRankingLoss**: Enhances the model's ability to discriminate between semantically similar and dissimilar text pairs.
-Training parameters:
-- **Base model**: aubmindlab/bert-base-arabertv02
-- **Dataset**: akhooli/arabic-triplets-1m-curated-sims-len (1M samples)
-- **Epochs**: 3
-- **Final Loss**: 0.718
-- **Embedding Dimension**: 768
-## Performance
-The model demonstrates exceptional performance on standard Arabic semantic textual similarity benchmarks:
-- **STS17**: 0.85
-- **STS22.v2**: 0.64
-- **Average Performance**: 74.5
-This represents the current state-of-the-art for Arabic embedding models, outperforming previous approaches by a significant margin.
-## Use Cases
-This model is particularly well-suited for:
-- **Information Retrieval**: Enhancing search capabilities for Arabic content.
-- **Document Similarity**: Identifying similar documents or text passages.
-- **Text Classification**: Powering classification systems for Arabic content.
-- **Question Answering**: Supporting Arabic QA systems with improved semantic understanding.
-- **Semantic Clustering**: Organizing Arabic text data based on meaning.
-- **Cross-lingual Applications**: When combined with other language models for multilingual applications.
-## Usage Examples
-```python
-from sentence_transformers import SentenceTransformer
-# Download from the 🤗 Hub
-model = SentenceTransformer("Omartificial-Intelligence-Space/Arabic-Triplet-Matryoshka-V2")
-# Run inference
-sentences = [
-    'SENTENCE 1',
-    'SENTENCE 2',
-    'SENTENCE 3',
-]
-embeddings = model.encode(sentences)
-print(embeddings.shape)
-# [3, 768]
-# Get the similarity scores for the embeddings
-similarities = model.similarity(embeddings, embeddings)
-print(similarities.shape)
-# [3, 3]
-```
 ## Limitations
@@ -110,20 +44,3 @@ Despite its strong performance, users should be aware of the following limitatio
 This model is intended for research and applications that benefit Arabic language processing. Users should be mindful of potential biases that may exist in the training data and the resulting embeddings. We encourage responsible use of this technology and welcome feedback on ways to improve fairness and representation.
-## Citation
-If you use the Arabic Matryoshka Embeddings Model in your research or applications, please cite it as follows:
-```bibtex
-@article{nacar2024enhancing,
-  title={Enhancing Semantic Similarity Understanding in Arabic NLP with Nested Embedding Learning},
-  author={Nacar, Omer and Koubaa, Anis},
-  journal={arXiv preprint arXiv:2407.21139},
-  year={2024}
-}
-```
-## Acknowledgements
-We would like to acknowledge [AraBERT](https://github.com/aub-mind/arabert) for the base model and [akhooli](https://huggingface.co/akhooli) for the valuable dataset that made this work possible.
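
Reviewer note on the removed "Training Details" text: it describes MatryoshkaLoss as producing nested embeddings at multiple resolutions, which in practice means a prefix of the 768-dimensional vector can be used as a smaller embedding. The sketch below illustrates only that truncate-and-renormalize step. It is a minimal illustration under stated assumptions, not the authors' code: the helper names (`l2_normalize`, `truncate_embedding`, `cosine_similarity`) and the dimensions 256 and 64 are hypothetical, and the vectors are random stand-ins rather than real model outputs, so the printed similarities say nothing about the model's quality.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (an all-zero vector is returned as a copy)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def truncate_embedding(vec, dim):
    """Keep the first `dim` coordinates, then re-normalize to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    return l2_normalize(vec[:dim])


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


# Random stand-ins for two 768-dimensional sentence embeddings.
# Real inputs would come from model.encode(...); these are NOT model outputs.
rng = random.Random(0)
full_a = [rng.gauss(0.0, 1.0) for _ in range(768)]
full_b = [x + 0.5 * rng.gauss(0.0, 1.0) for x in full_a]

# Compare the pair at the full size and at two smaller (assumed) sizes.
for dim in (768, 256, 64):
    a = truncate_embedding(full_a, dim)
    b = truncate_embedding(full_b, dim)
    print(dim, round(cosine_similarity(a, b), 3))
```

With a model actually trained with MatryoshkaLoss, the leading coordinates are optimized to stay useful on their own, which is what makes this kind of truncation worthwhile. Recent sentence-transformers releases also accept a `truncate_dim` argument on `SentenceTransformer`, which may be more convenient than slicing by hand.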
