FinE5: Finance-Adapted Text Embedding Model
FinE5 is a financial text embedding model fine-tuned on a synthesized finance corpus, following the training pipeline of e5-mistral-7b-instruct (Wang et al., 2023). As of Feb 16, 2025, it ranks first on FinMTEB, and its training data has no overlap with the benchmark test set.
The training data and pipeline are detailed in the FinMTEB paper (see Citation).
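To illustrate how embeddings from a model of this kind are typically used for retrieval, here is a minimal sketch. It makes two assumptions that this card does not confirm: that FinE5 keeps the `Instruct: ...\nQuery: ...` query template of e5-mistral-7b-instruct, and that relevance is scored with cosine similarity. The helper names (`format_query`, `cosine`, `rank`), the task description, and the three-dimensional vectors are hypothetical placeholders, not real model output or part of any published API.

```python
import math


def format_query(task: str, query: str) -> str:
    # Query template used by e5-mistral-7b-instruct.
    # ASSUMPTION: FinE5 inherits this template; check the official tutorials.
    return f"Instruct: {task}\nQuery: {query}"


def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs):
    # Return (document index, score) pairs, most similar first.
    scores = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Hypothetical task description and query text.
prompt = format_query(
    "Given a financial question, retrieve passages that answer it",
    "What drove the change in net interest margin?",
)

# Placeholder vectors standing in for embeddings returned by the model.
query_vec = [0.9, 0.1, 0.0]
doc_vecs = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
]

ranking = rank(query_vec, doc_vecs)
print([i for i, _ in ranking])  # document indices, best match first
```

In practice the placeholder vectors would be replaced by embeddings of the formatted query and of each candidate passage; only the query side carries the instruction prefix in the e5-mistral-7b-instruct convention.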
‼️ Request Access
Ready to dive in? Explore FinE5 on AbaciNLP, our newly launched platform for financial embeddings. We invite you to try Abaci Embeddings firsthand, and your first 10 million tokens are free. Sign up today and start embedding your financial text data.
💻 Website: https://abacinlp.com/
💡 API Tutorials: https://abacinlp.com/tutorial
🔎 Blogs: https://abacinlp.com/blog
Citation
If you find our work helpful, please cite:
@misc{tang2025finmtebfinancemassivetext,
title={FinMTEB: Finance Massive Text Embedding Benchmark},
author={Yixuan Tang and Yi Yang},
year={2025},
eprint={2502.10990},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.10990},
}
@misc{tang2024needdomainspecificembeddingmodels,
title={Do We Need Domain-Specific Embedding Models? An Empirical Investigation},
author={Yixuan Tang and Yi Yang},
year={2024},
eprint={2409.18511},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2409.18511},
}