---
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- mteb
datasets:
- ms_marco
model-index:
- name: E:\HuggingFaceDataDownloader\results\finetuned_models\2000\2000_finetune
  results:
  - task:
      type: Classification
    dataset:
      type: DDSC/angry-tweets
      name: MTEB AngryTweetsClassification
      config: default
      split: test
      revision: 20b0e6081892e78179356fada741b7afa381443d
    metrics:
    - type: accuracy
      value: 56.084049665711554
    - type: f1
      value: 55.198013156852625
  - task:
      type: BitextMining
    dataset:
      type: strombergnlp/bornholmsk_parallel
      name: MTEB BornholmBitextMining
      config: default
      split: test
      revision: 3bc5cfb4ec514264fe2db5615fac9016f7251552
    metrics:
    - type: accuracy
      value: 47
    - type: f1
      value: 37.97365079365079
    - type: precision
      value: 34.48333333333334
    - type: recall
      value: 47
  - task:
      type: Classification
    dataset:
      type: danish_political_comments
      name: MTEB DanishPoliticalCommentsClassification
      config: default
      split: train
      revision: edbb03726c04a0efab14fc8c3b8b79e4d420e5a1
    metrics:
    - type: accuracy
      value: 40.88398556758257
    - type: f1
      value: 37.604524785367076
  - task:
      type: Classification
    dataset:
      type: DDSC/lcc
      name: MTEB LccSentimentClassification
      config: default
      split: test
      revision: de7ba3406ee55ea2cc52a0a41408fa6aede6d3c6
    metrics:
    - type: accuracy
      value: 59.599999999999994
    - type: f1
      value: 59.0619246469949
  - task:
      type: Classification
    dataset:
      type: strombergnlp/nordic_langid
      name: MTEB NordicLangClassification
      config: default
      split: test
      revision: e254179d18ab0165fdb6dbef91178266222bee2a
    metrics:
    - type: accuracy
      value: 61.00333333333333
    - type: f1
      value: 60.45633325804296
  - task:
      type: Classification
    dataset:
      type: ScandEval/scala-da
      name: MTEB ScalaDaClassification
      config: default
      split: test
      revision: 1de08520a7b361e92ffa2a2201ebd41942c54675
    metrics:
    - type: accuracy
      value: 50.43457031250001
    - type: ap
      value: 50.22017546538257
    - type: f1
      value: 50.03426509926491
---

# e5-dansk-test

This is a
[sentence-transformers](https://www.SBERT.net) model. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for tasks like clustering or semantic search.

The model was trained on the English MS MARCO dataset, machine-translated into Danish, to test whether machine-translating high-quality datasets into another language produces good results.

## Usage (Sentence-Transformers)

Using this model is easy once you have [sentence-transformers](https://www.SBERT.net) installed:

```
pip install -U sentence-transformers
```

Then you can use the model like this:

```python
from sentence_transformers import SentenceTransformer

# Two Danish example sentences to embed
sentences = ["Dette er en dansk sætning", "Dette er også en dansk sætning"]

# Load the model and compute one 1024-dimensional embedding per sentence
model = SentenceTransformer('Jechto/e5-dansk-test-0.1')
embeddings = model.encode(sentences)
print(embeddings)
```

## Training

The model was trained with the parameters:

**DataLoader**:

`sentence_transformers.datasets.NoDuplicatesDataLoader.NoDuplicatesDataLoader` of length 10327 with parameters:

```
{'batch_size': 16}
```

**Loss**:

`sentence_transformers.losses.MultipleNegativesRankingLoss.MultipleNegativesRankingLoss` with parameters:

```
{'scale': 20.0, 'similarity_fct': 'cos_sim'}
```

Parameters of the fit() method:

```
{
    "epochs": 1,
    "evaluation_steps": 2000,
    "evaluator": "sentence_transformers.evaluation.BinaryClassificationEvaluator.BinaryClassificationEvaluator",
    "max_grad_norm": 1,
    "optimizer_class": "",
    "optimizer_params": {
        "lr": 1e-05
    },
    "scheduler": "warmupconstant",
    "steps_per_epoch": null,
    "warmup_steps": 10000,
    "weight_decay": 0.01
}
```

## Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False})
  (2): Normalize()
)
```

## Citing & Authors
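
## Pooling and Normalization Example

The last two modules listed under Full Model Architecture average the token embeddings (mean pooling) and scale the result to unit length. The following is a minimal pure-Python sketch of that idea, not the library's implementation. It uses toy 2-dimensional vectors instead of the model's 1024 dimensions, and the helper names `mean_pool` and `l2_normalize` are illustrative assumptions rather than part of the sentence-transformers API.

```python
import math


def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


# Toy token embeddings (hypothetical values); the last position is padding.
tokens = [[3.0, 0.0], [1.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)            # mean of the two unmasked tokens
sentence_embedding = l2_normalize(pooled)   # unit-length sentence vector
print(sentence_embedding)
```

Because the output has unit length, the dot product of two such embeddings equals their cosine similarity, which is convenient for semantic search.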