louisbrulenaudet committed 7fa08f1 • 1 parent: 6ab9645 • Update README.md

README.md CHANGED
@@ -358,6 +358,10 @@ datasets:
 
 # Lemone-Embed: A Series of Fine-Tuned Embedding Models for French Taxation
 
+<div class="not-prose bg-gradient-to-r from-gray-50-to-white text-gray-900 border" style="border-radius: 8px; padding: 0.5rem 1rem;">
+  <p>This series is made up of 7 models: 3 base models of different sizes trained for 1 epoch, 3 models trained for 2 epochs making up the Boost series, and a Pro model with a non-RoBERTa architecture.</p>
+</div>
+
 This sentence transformers model, specifically designed for French taxation, has been fine-tuned on a dataset comprising 43 million tokens, integrating a blend of semi-synthetic and fully synthetic data generated by GPT-4 Turbo and Llama 3.1 70B, which have been further refined through evol-instruction tuning and manual curation.
 
 The model is tailored to meet the specific demands of information retrieval across large-scale tax-related corpora, supporting the implementation of production-ready Retrieval-Augmented Generation (RAG) applications. Its primary purpose is to enhance the efficiency and accuracy of legal processes in the taxation domain, with an emphasis on delivering consistent performance in real-world settings, while also contributing to advancements in legal natural language processing research.
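For reference, here is a minimal retrieval sketch with the `sentence-transformers` library, illustrating the kind of RAG-oriented usage the description above refers to. The checkpoint id `louisbrulenaudet/lemone-embed-pro` and the French example texts are illustrative assumptions and are not part of this commit; any checkpoint from the Lemone-Embed series can be substituted.

```python
# Minimal retrieval sketch (assumption: the checkpoint id below is illustrative;
# swap in whichever Lemone-Embed model you want to evaluate).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("louisbrulenaudet/lemone-embed-pro")

# A French tax question and two candidate passages (illustrative examples).
query = "Quel est le taux normal de la TVA en France ?"
passages = [
    "Le taux normal de la taxe sur la valeur ajoutée est fixé à 20 %.",
    "Les plus-values immobilières relèvent d'un régime d'imposition distinct.",
]

# Encode the query and the passages, then rank passages by cosine similarity.
query_embedding = model.encode(query)
passage_embeddings = model.encode(passages)
scores = util.cos_sim(query_embedding, passage_embeddings)

for passage, score in zip(passages, scores[0].tolist()):
    print(f"{score:.3f}  {passage}")
```

In a RAG pipeline, scores like these would be used to select the top-ranked passages from a tax corpus before passing them to a generator.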