---
datasets:
- armvectores/hy_wikipedia_2023
pipeline_tag: feature-extraction
language:
- hy
library_name: fasttext
---

Training corpus: 414M tokens

1) 73M from Armenian (hy) Wikipedia
2) 341M from the ARLIS database

Vocabulary: 74951 unique words

Model: skipgram

Character n-grams: 3-5

Window length: 5

Embedding dimension: 300

Minimum word count: 150

Training: 100 epochs, starting learning rate 0.05

Training time: 26 hours on 20 Xeon Gold cores

How to use

1) Install fastText

```
pip install fasttext-wheel
```

2) Import fastText in Python and load the model

```
import fasttext
from huggingface_hub import hf_hub_download

# Download model.bin from the Hugging Face Hub into the current directory
model_path = hf_hub_download(
    local_dir=".",
    repo_id="armvectores/wikipedia_arlis_tokens_fasttextskipgram_300_5",
    filename="model.bin",
)
model = fasttext.load_model(model_path)
```

3) Usage examples

```
word = 'զենքեր'

# Nearest neighbors of the word, as (similarity, word) pairs
print(model.get_nearest_neighbors(word))

# 300-dimensional vector for the input text
print(model.get_sentence_vector(word))
```
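
For readers who want to train a comparable model on their own corpus, the settings listed above can be mapped onto the arguments of fastText's `train_unsupervised`. This is only a sketch, not the authors' actual training script: the argument names are fastText's, but the mapping is an assumption (in particular, that "minimum number of words 150" means `minCount` and that the 20 cores correspond to `thread=20`). The `train` helper, `corpus_path`, and `output_path` are hypothetical names introduced for illustration.

```python
# Assumed mapping of the listed settings onto fastText's Python API.
# This is an illustration, not the authors' published configuration.
TRAIN_PARAMS = {
    "model": "skipgram",
    "dim": 300,        # embedding dimension
    "ws": 5,           # context window length
    "minn": 3,         # shortest character n-gram
    "maxn": 5,         # longest character n-gram
    "minCount": 150,   # assumed meaning of "minimum number of words 150"
    "epoch": 100,
    "lr": 0.05,        # starting learning rate
    "thread": 20,      # assumed: one thread per CPU core
}


def train(corpus_path: str, output_path: str = "model.bin"):
    """Train and save a model from a plain-text corpus (hypothetical paths)."""
    # Imported lazily so the settings can be inspected without fastText installed.
    import fasttext

    model = fasttext.train_unsupervised(input=corpus_path, **TRAIN_PARAMS)
    model.save_model(output_path)
    return model
```

Calling `train("corpus.txt")` would expect a UTF-8 text file with whitespace-separated tokens; the file name here is a placeholder.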