BPE-based tokenizer used for the MEHDIE project and for training a bilingual BERT model.
Vocabulary size: 52000
Trained on:
- Arabic dataset: https://huggingface.co/datasets/bigscience-data/roots_ar_openiti_proc
- Hebrew/English dataset: https://huggingface.co/datasets/mehdie/sefaria
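For readers unfamiliar with byte-pair encoding (BPE), the idea can be sketched in a few lines of plain Python. This is a toy illustration only, not the training code used for this tokenizer: the function names `learn_bpe`, `merge_pair`, and `encode` and the tiny corpus are hypothetical, and a real tokenizer would be trained with a dedicated library on the datasets listed above. BPE starts from single characters and repeatedly merges the most frequent adjacent pair; the final vocabulary is roughly the base symbols plus one entry per learned merge (plus any special tokens), which is what a target size such as 52000 controls.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the joined symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    # Each distinct word becomes a tuple of characters, weighted by its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the whole corpus.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties go to the lexicographically smallest pair.
        best = max(sorted(pairs), key=pairs.get)
        merged = Counter()
        for symbols, freq in vocab.items():
            merged[merge_pair(symbols, best)] += freq
        vocab = merged
        merges.append(best)
    return merges


def encode(word, merges):
    """Split `word` into subword symbols by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus, unrelated to the actual training data.
corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
merges = learn_bpe(corpus, num_merges=4)
print(merges)
print(encode("lowest", merges))
```

With this corpus the word "lowest", which never appears in the training list, is still split into known pieces, which is the property that lets a fixed-size BPE vocabulary cover unseen words.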
Examples:
Hebrew:
- "ืื ืืกืคืจ ืืืืืจ ืืืืจืื ืฉืกืคืจ ืืืฉ ืืื ืืืจืฅ ื ืืืจื ืฉืฉืื ืจืื ืื ืืืื ืืจ ืืื ื ืืืืืืื. ืืืื ืืืื ืืืื ืืืจืฆืืช ืจืืืช ืืจืืืงืืช ืืืฉืจ ืืชืคืจืฉ ืืืืจืื ืืื ืืืื ืืงืื ืฉืื ืื ืืชื ืื ืืืืจืื ืฉืจืื ืื ืฉืฉืืข ืืคื ืื ืฉื ืืืช ืืฉืจ ื ืฉืืขื ืืืจืฅ ืกืคืจื: ืืื ืืื ืืืืจ ืืงืฆืช ืืืืืืื ืืื ืฉืืืื ืฉืืืงืฆืช ืืงืืืืช ืืืฉืื ืืืื ืืืจืื ืืื ืขืื ืืืจืฅ ืงืฉืืืืื ืืฉื ืช ืชืชืงื"
- {'input_ids': [1060, 15784, 20958, 31767, 476, 4398, 3294, 1812, 19949, 42648, 455, 38010, 2069, 23008, 978, 11894, 3509, 8222, 973, 26, 23816, 8043, 461, 19170, 2998, 6517, 4245, 960, 5536, 928, 4122, 1008, 2643, 16456, 2702, 10350, 1796, 3044, 1333, 1488, 1019, 5501, 15530, 1109, 26822, 8473, 11437, 5419, 1919, 467, 13163, 6566, 4398, 454, 38, 7922, 1203, 41248, 9907, 21722, 1001, 16464, 931, 1123, 9907, 9647, 1053, 3044, 4553, 3573, 2851, 4088, 9330, 3492, 18352, 1057, 23994, 32635, 463], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
Arabic:
- "ุณูุณูุฉ ุงูุฃุฌุฒุงุก ูุงููุชุจ ุงูุญุฏูุซูุฉ ุงูููุงุฆุฏ ูุงูุฃุฎุจุงุฑ ูุงูุญูุงูุงุช ุนู ุงูุดุงูุนู ูุญุงุชู ุงูุฃุตู ูู ุนุฑูู ุงููุฑุฎู ูุบูุฑูู ููู ุญุฏุซ ุงููููู ุฃุจู ุนูู ุงูุญุณู ุจู ุงูุญุณูู ุจู ุญู ูุงู ุงููู ุฐุงูู ุงูุดุงูุนู ุฏุฑุงุณุฉ ูุชุญููู ูุชุนููู ุงูุทุจุนุฉ ุงูุฃููู ุงูุฌุฒุก ุงูุฃูู ู ู ุงูููุงุฆุฏ ูุงูุฃุฎุจุงุฑ ูุงูุญูุงูุงุช ุนู ุงูุดุงูุนู ูุญุงุชู ุงูุฃุตู ูู ุนุฑูู ุงููุฑุฎู ูุบูุฑูู ุฑุถู ุงููู ุนููู ุฑูุงูุฉ"
- {'input_ids': [27193, 15595, 34780, 1361, 949, 13852, 21459, 2169, 30440, 896, 2040, 41252, 9723, 50442, 16317, 3057, 1675, 1216, 3320, 958, 910, 1260, 888, 1532, 888, 912, 935, 13333, 2040, 36093, 22637, 49937, 16554, 2254, 4572, 1576, 890, 13852, 21459, 2169, 30440, 896, 2040, 41252, 9723, 50442, 16317, 3057, 1432, 904, 2710, 1933], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
English:
- "The medieval Arabic name of the northernmost of the three provinces of the Jazira, the other two being Diyar Mudar and Diyar Rabi'a"
- {'input_ids': [2034, 16522, 4490, 1270, 22040, 1837, 2340, 7960, 1183, 989, 10048, 2068, 90, 13377, 1183, 989, 8235, 14261, 1021, 7322, 1183, 989, 54, 18017, 17311, 24, 989, 3249, 5269, 8500, 48, 17821, 1294, 57, 3307, 1294, 1261, 48, 17821, 1294, 26438, 85, 19, 77], 'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
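Each output above is a dictionary of three parallel lists: `input_ids`, `token_type_ids`, and `attention_mask`. The following is a minimal sketch of a sanity check on that structure, using the English `input_ids` shown above. The helper name `check_encoding` is hypothetical, and the all-zeros and all-ones patterns are assumed to hold only for a single, unpadded sequence as in these examples.

```python
VOCAB_SIZE = 52000  # vocabulary size stated in this card


def check_encoding(enc, vocab_size=VOCAB_SIZE):
    """Return True if an encoding dict is consistent for a single unpadded sequence."""
    ids = enc["input_ids"]
    same_length = len(ids) == len(enc["token_type_ids"]) == len(enc["attention_mask"])
    # Every token ID must index into the vocabulary.
    ids_in_range = all(0 <= i < vocab_size for i in ids)
    # One segment only, so every token type is 0 (assumption for single-sentence input).
    single_segment = all(t == 0 for t in enc["token_type_ids"])
    # No padding, so every position is attended to (assumption for unpadded input).
    no_padding = all(m == 1 for m in enc["attention_mask"])
    return same_length and ids_in_range and single_segment and no_padding


# input_ids copied from the English example; the other two fields are rebuilt
# to the same length, matching the all-0 / all-1 lists printed in the card.
english_ids = [2034, 16522, 4490, 1270, 22040, 1837, 2340, 7960, 1183, 989,
               10048, 2068, 90, 13377, 1183, 989, 8235, 14261, 1021, 7322,
               1183, 989, 54, 18017, 17311, 24, 989, 3249, 5269, 8500, 48,
               17821, 1294, 57, 3307, 1294, 1261, 48, 17821, 1294, 26438,
               85, 19, 77]
english = {
    "input_ids": english_ids,
    "token_type_ids": [0] * len(english_ids),
    "attention_mask": [1] * len(english_ids),
}

print(len(english_ids), check_encoding(english))
```

When batches are padded to a common length, `attention_mask` would contain zeros at the padded positions, so the last check would need to be relaxed.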