license: eupl-1.1

👷‍♂️ Work in progress

EUBERT Embedding v1

Based on the masked language model EUBERT, this sentence transformer computes embeddings for various EU documents in 24 languages.

  • Number of dimensions: 768
  • Pre-trained model: EUBERT
  • Fine-tuning dataset: AllNLI

Usage:

from sentence_transformers import SentenceTransformer

# Load the model from the Hugging Face Hub
model = SentenceTransformer('EuropeanParliament/eubert_embedding_v1')

# Encode one sentence into a 768-dimensional vector
vector = model.encode("Based on the masked language model EUBERT this sentence transformer will allow to compute embeddings on various EU documents in 24 languages.")
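Embeddings such as the one computed above are typically compared with cosine similarity. The following is a minimal sketch in plain Python, not part of the model's own API: the function name `cosine_similarity` and the toy 3-dimensional vectors are illustrative assumptions standing in for real 768-dimensional outputs of `model.encode`.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors; real embeddings from this model have 768 dimensions.
doc_a = [0.2, 0.1, 0.7]
doc_b = [0.4, 0.2, 1.4]    # same direction as doc_a, twice the length
doc_c = [0.7, -0.1, -0.2]  # points in a clearly different direction

print(cosine_similarity(doc_a, doc_b))
print(cosine_similarity(doc_a, doc_c))
```

With the real model, the same function should accept the arrays returned by `model.encode(...)` for two sentences. The sentence-transformers library also provides `util.cos_sim` for this purpose.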

Evaluation and benchmarking contributions are welcome. MTEB results so far:

| task | dataset | name | config | split | revision | accuracy | ap | f1 | v_measure |
|---|---|---|---|---|---|---|---|---|---|
| Classification | mteb/amazon_counterfactual | MTEB AmazonCounterfactualClassification (en) | en | test | e8379541af4e31359cca9fbcf4b00f2671dba205 | 65.46268656716417 | 28.448646125211685 | 59.381505835828655 | |
| Classification | mteb/amazon_polarity | MTEB AmazonPolarityClassification | default | test | e2d317d38cd51312af73b3d32a06d1a08b442046 | 66.46035 | 61.29404861567824 | 66.33660156778977 | |
| Classification | mteb/amazon_reviews_multi | MTEB AmazonReviewsClassification (en) | en | test | 1399c76144fd37290681b995c656ef9b2e06e26d | 33.002 | | 32.703439998458286 | |
| Clustering | mteb/arxiv-clustering-p2p | MTEB ArxivClusteringP2P | default | test | a122ad7f3f0291bf49cc6f4d32aa80929df69d5d | | | | 26.726296122407874 |
| Classification | mteb/banking77 | MTEB Banking77Classification | default | test | 0fd18e25b25c072e09e0d92ab615fda904d66300 | 72.07792207792207 | | 72.00698905672714 | |
| Classification | mteb/emotion | MTEB EmotionClassification | default | test | 4f58c6b202a23cf9a4da393831edf4f9183cad37 | 25.45 | | 22.489051015009604 | |
| Classification | mteb/imdb | MTEB ImdbClassification | default | test | 3d86128a09e091d6018b6d26cad27f2739fc2db7 | 61.0288 | 56.84210754735158 | 60.72244426285243 | |
| Classification | mteb/mtop_domain | MTEB MTOPDomainClassification (en) | en | test | d80d48c1eb48d3562165c59d59d0034df9fff0bf | 78.63657090743274 | | 77.33756273016937 | |
| Classification | mteb/mtop_domain | MTEB MTOPDomainClassification (de) | de | test | d80d48c1eb48d3562165c59d59d0034df9fff0bf | 67.63313609467455 | | 65.31424834681424 | |
| Classification | mteb/mtop_domain | MTEB MTOPDomainClassification (es) | es | test | d80d48c1eb48d3562165c59d59d0034df9fff0bf | 72.03468979319545 | | 70.33858350063844 | |
| Classification | mteb/mtop_domain | MTEB MTOPDomainClassification (fr) | fr | test | d80d48c1eb48d3562165c59d59d0034df9fff0bf | 69.33604760413404 | | 67.2763398514464 | |
| Classification | mteb/mtop_domain | MTEB MTOPDomainClassification (hi) | hi | test | d80d48c1eb48d3562165c59d59d0034df9fff0bf | 19.336679813553243 | | 17.640206592911305 | |
| Classification | mteb/mtop_domain | MTEB MTOPDomainClassification (th) | th | test | d80d48c1eb48d3562165c59d59d0034df9fff0bf | 14.958408679927668 | | 12.200892995648038 | |
| Classification | mteb/mtop_intent | MTEB MTOPIntentClassification (en) | en | test | ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba | 53.504331965344285 | | 37.650916452762054 | |
| Classification | mteb/mtop_intent | MTEB MTOPIntentClassification (de) | de | test | ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba | 52.8007889546351 | | 35.18483837593346 | |
| Classification | mteb/mtop_intent | MTEB MTOPIntentClassification (es) | es | test | ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba | 53.268845897264846 | | 37.54041476398511 | |
| Classification | mteb/mtop_intent | MTEB MTOPIntentClassification (fr) | fr | test | ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba | 47.59160663952396 | | 33.779636915265606 | |
| Classification | mteb/mtop_intent | MTEB MTOPIntentClassification (hi) | hi | test | ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba | 4.180709931875224 | | 2.240473672484894 | |
| Classification | mteb/mtop_intent | MTEB MTOPIntentClassification (th) | th | test | ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba | 4.1482820976491865 | | 2.2953415174353546 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (af) | af | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 43.843308675184936 | | 42.83274171307546 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (am) | am | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 8.459986550100874 | | 8.56499841559428 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (ar) | ar | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 24.37457969065232 | | 23.648464353469087 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (az) | az | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 43.61129791526564 | | 43.02872726206446 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (bn) | bn | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 3.127101546738399 | | 1.7632874555194573 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (cy) | cy | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 39.882313382649635 | | 39.09054995553107 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (da) | da | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 49.05514458641561 | | 47.97657474719148 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (de) | de | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 47.723604572965705 | | 46.266605736862424 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (el) | el | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 49.2871553463349 | | 49.110660419740945 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (en) | en | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 54.80833893745797 | | 53.43307984316261 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (es) | es | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 48.73234700739745 | | 48.290537885757345 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (fa) | fa | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 34.60322797579018 | | 33.21866171174647 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (fi) | fi | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 47.09818426361803 | | 46.24034140543536 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (fr) | fr | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 47.92871553463349 | | 47.2879827826325 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (he) | he | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 23.429724277067923 | | 22.973698726459283 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (hi) | hi | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 3.1909885675857437 | | 2.343483452751791 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (hu) | hu | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 46.529926025554815 | | 45.585210075220026 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (hy) | hy | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 33.00605245460659 | | 32.53906554922222 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (id) | id | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 44.70073974445191 | | 44.63772874280639 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (is) | is | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 42.56556825823806 | | 42.09519069412614 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (it) | it | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 49.45191661062542 | | 49.73648735452711 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (ja) | ja | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 36.03227975790181 | | 34.81337003018146 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (jv) | jv | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 39.85205110961668 | | 39.16645932365053 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (ka) | ka | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 29.84532616005381 | | 30.048107009813975 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (km) | km | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 5.4942837928715536 | | 3.9402294020821236 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (kn) | kn | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 3.5541358439811694 | | 2.3408708229868385 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (ko) | ko | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 31.055817081371888 | | 30.54791134524761 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (lv) | lv | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 48.44989912575656 | | 47.46077758238515 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (ml) | ml | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 2.89172831203766 | | 1.1484871860887453 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (mn) | mn | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 38.924008069939475 | | 38.953938082398274 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (ms) | ms | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 43.25151311365165 | | 42.31124560201582 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (my) | my | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 3.5137861466039007 | | 1.7087643302156377 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (nb) | nb | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 45.34633490248823 | | 44.7188441016561 | |
| Classification | mteb/amazon_massive_intent | MTEB MassiveIntentClassification (nl) | nl | test | 31efe3c427b0bae9c22cbb560b8f15491cc6bed7 | 47.25285810356422 | | 45.442034061197944 | |
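
The per-language rows above can be summarised by averaging a metric over a subset of languages. The sketch below does this for the MTOPDomainClassification accuracy values, copied from the results above. It rests on two assumptions: the helper `mean_accuracy` is a hypothetical name, and the "24 languages" are taken to mean the official EU languages, which this README does not state explicitly.

```python
from statistics import mean
from typing import Dict, Iterable, Optional

# Accuracy values copied from the MTOPDomainClassification rows above.
mtop_domain_accuracy: Dict[str, float] = {
    "en": 78.63657090743274,
    "de": 67.63313609467455,
    "es": 72.03468979319545,
    "fr": 69.33604760413404,
    "hi": 19.336679813553243,
    "th": 14.958408679927668,
}

# Assumption: of these six languages, only these four are official EU languages.
eu_languages = {"en", "de", "es", "fr"}


def mean_accuracy(scores: Dict[str, float],
                  languages: Optional[Iterable[str]] = None) -> float:
    """Average the scores, optionally restricted to the given language codes."""
    selected = set(scores) if languages is None else set(languages) & set(scores)
    if not selected:
        raise ValueError("no matching languages to average over")
    return mean(scores[lang] for lang in selected)


eu_mean = mean_accuracy(mtop_domain_accuracy, eu_languages)
other_mean = mean_accuracy(mtop_domain_accuracy,
                           set(mtop_domain_accuracy) - eu_languages)

print(f"EU languages:    {eu_mean:.2f}")
print(f"Other languages: {other_mean:.2f}")
```

Splitting the average this way separates the languages the model is presumably meant to cover from those it is not, which a single overall mean would hide.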

Author: sebastien.campion@europarl.europa.eu

Contributor(s):

  • Dominik Skotarczak (benchmark)