Word2vec
/

polyglot_words_embeddings_sah

Model card Files Files and versions Community

polyglot_words_embeddings_sah / README.md

lbourdois's picture

Create README.md

1508ca5 about 1 year ago

|

history blame contribute delete

1.3 kB

	---
	tags:
	- word2vec
	language: sah
	license: gpl-3.0
	---

	## Description
	Word embedding model trained by Al-Rfou et al.


	## How to use?

	```
	import pickle
	from numpy import dot
	from numpy.linalg import norm
	from huggingface_hub import hf_hub_download
	words, embeddings = pickle.load(open(hf_hub_download(repo_id="Word2vec/polyglot_words_embeddings_en", filename="words_embeddings_en.pkl"), 'rb'),encoding="latin1")

	word = "Irish"
	a = embeddings[words.index(word)]
	most_similar = []
	for i in range(len(embeddings)):
	if i != words.index(word):
	b = embeddings[i]
	cos_sim = dot(a, b)/(norm(a)*norm(b))
	most_similar.append(cos_sim)
	else:
	most_similar.append(0)

	words[most_similar.index(max(most_similar))]
	```

	## Citation

	```
	@InProceedings{polyglot:2013:ACL-CoNLL,
	author = {Al-Rfou, Rami and Perozzi, Bryan and Skiena, Steven},
	title = {Polyglot: Distributed Word Representations for Multilingual NLP},
	booktitle = {Proceedings of the Seventeenth Conference on Computational Natural Language Learning},
	month = {August},
	year = {2013},
	address = {Sofia, Bulgaria},
	publisher = {Association for Computational Linguistics},
	pages = {183--192},
	url = {http://www.aclweb.org/anthology/W13-3520}
	}
	```