akhooli/arabic-colbertv2-250k-norm
This Arabic ColBERT model is reasonably well trained, though not to convergence, on 250k normalized queries sampled from the Arabic mMARCO dataset.
Training parameters are in the metadata file.
See https://www.linkedin.com/posts/akhooli_arabic-bert-tokenizers-you-may-need-to-normalize-activity-7225747473523216384-D1oH
Please note that there is another model, trained (partially) on the normalized 711k dataset: akhooli/arabic-colbertv2-711k-norm.
This model should be suitable for ranking and retrieval, but not for critical tasks. For a demo that uses it, see the Quran Semantic Search. If you downloaded the model before Aug. 6, 2024, you are advised to refresh your copy.
You should normalize both your queries and your documents (with Unicode NFKC) for better results:

```python
from unicodedata import normalize

query_n = normalize('NFKC', query)
```
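As a minimal, self-contained illustration of why NFKC normalization matters for Arabic text (the sample strings below are hypothetical examples, not from the training data):

```python
from unicodedata import normalize

def nfkc(text: str) -> str:
    # NFKC folds compatibility characters, such as Arabic
    # presentation forms, into their canonical letters.
    return normalize("NFKC", text)

# U+FE8D (ARABIC LETTER ALEF ISOLATED FORM) folds to U+0627 (ALEF),
# so a query typed with presentation forms matches normalized documents.
query = "\uFE8D"                 # presentation-form alef
assert nfkc(query) == "\u0627"   # canonical alef

# Normalize the document collection the same way before indexing.
docs = ["\uFE8D", "\u0627"]
docs_n = [nfkc(d) for d in docs]
assert docs_n[0] == docs_n[1]    # the two forms now compare equal
```

Applying the same normalization on both sides keeps the tokenizer's view of queries and documents consistent.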