---
language: en
license: apache-2.0
library_name: sentence-transformers
tags:
- sentence-transformers
- feature-extraction
- sentence-similarity
- transformers
pipeline_tag: sentence-similarity
---

DistilBERT encoder model trained on the Amazon product-to-product recommendation dataset (LF-AmazonTitles-1.3M) using the [DEXML](https://github.com/nilesh2797/DEXML) ([Dual Encoders for eXtreme Multi-Label classification, ICLR'24](https://arxiv.org/pdf/2310.10636v2.pdf)) method.

## Inference Usage (Sentence-Transformers)
With `sentence-transformers` installed, you can use this model as follows:
```python
from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence", "Each sentence is converted"]

model = SentenceTransformer('quicktensor/dexml_lf-amazontitles-1.3m')
embeddings = model.encode(sentences)
print(embeddings)
```

## Usage (HuggingFace Transformers)
With HuggingFace `transformers` you only need to be careful about how you pool the transformer output to get the embedding: this model uses the hidden state of the CLS (first) token. You can use it as follows:
```python
import torch
from transformers import AutoTokenizer, AutoModel

# Pool by taking the hidden state of the CLS (first) token
pooler = lambda last_hidden_state: last_hidden_state[:, 0, :]

sentences = ["This is an example sentence", "Each sentence is converted"]

tokenizer = AutoTokenizer.from_pretrained('quicktensor/dexml_lf-amazontitles-1.3m')
model = AutoModel.from_pretrained('quicktensor/dexml_lf-amazontitles-1.3m')

encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    embeddings = pooler(model(**encoded_input).last_hidden_state)
print(embeddings)
```

## Cite
If you found this model helpful, please cite our work as:
```bib
@InProceedings{DEXML,
  author    = "Gupta, N. and Khatri, D. and Rawat, A. S. and Bhojanapalli, S. and Jain, P. and Dhillon, I.",
  title     = "Dual-encoders for Extreme Multi-label Classification",
  booktitle = "International Conference on Learning Representations",
  month     = "May",
  year      = "2024"
}
```
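
## Retrieval Example
Since the model was trained for product-to-product recommendation, the embeddings are typically used to rank candidate product titles against a query title. Below is a minimal sketch of that workflow; the catalog titles and the query are made-up examples, and scoring by cosine similarity is an assumption about how the embeddings are best compared, not a documented property of this checkpoint.
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('quicktensor/dexml_lf-amazontitles-1.3m')

# Hypothetical candidate catalog; in practice these would be the label (product) titles
candidates = [
    "USB-C to USB-A adapter (2-pack)",
    "Stainless steel water bottle, 32 oz",
    "Wireless ergonomic mouse",
]
query = "USB C hub with HDMI and ethernet"

# Encode query and candidates, then rank candidates by cosine similarity
query_emb = model.encode(query, convert_to_tensor=True)
cand_embs = model.encode(candidates, convert_to_tensor=True)
scores = util.cos_sim(query_emb, cand_embs)[0]

for idx in scores.argsort(descending=True):
    print(f"{scores[idx]:.3f}  {candidates[idx]}")
```
For a real catalog of millions of titles you would precompute and index the candidate embeddings (e.g., with an approximate nearest-neighbor library) rather than encoding them per query.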