Commit 2c3d7aa by anastasiastasenko (parent: 1bbbedf): Update README.md
---

**Pleias-360m-RAG 0.1** is a specialized language model designed by PleIAs for Retrieval-Augmented Generation.

Similarly to its base model, Pleias-360m, Pleias-360m-RAG 0.1 aims to be a fully open model (weights, code, data), trained only on content under permissive licenses and fully compliant with the European AI Act.
## Description

PleIAs-360m-RAG is a continued pretraining of Pleias-360m on a new dataset of 45,088,768,000 tokens modeling common retrieval tasks. All the content of the dataset ultimately comes from Common Corpus.

Pleias-360m-RAG includes the main features of the original base model:
* Extremely low level of toxicity and problematic content.
Pleias-360m-RAG supports retrieval-augmented generation with enhanced verifiability, source analysis, and grounding on submitted sources. This includes:
* A standardized structure with special tokens marking queries, sources, and references.
* Anticipation of various query forms in multiple languages, from fully drafted questions to unstructured keyword searches.
* Source analysis/criticism, which also acts as an integrated reranking step.
* Generation of grounded answers with references and excerpts linked to the original sources.
Initial tests have shown that the RAG design significantly improves the factuality and verifiability of the model. Even when the grounding does not work perfectly, the information remains much closer to the original sources.

As a result, Pleias-360m-RAG 0.1 has already been tested and integrated into multiple applied RAG projects, including Pleias's flagship application Scholasticai.
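One simple way to check this kind of grounding externally is to verify that excerpts quoted in an answer occur verbatim in the submitted sources. The helper below is a minimal sketch of such a check, written for illustration only; it is not part of the model or its tooling.

```python
# Minimal grounding check: does a quoted excerpt appear verbatim
# (up to whitespace and case) in any of the submitted sources?
def excerpt_is_grounded(excerpt: str, sources: list[str]) -> bool:
    def norm(s: str) -> str:
        # Collapse whitespace and lowercase before comparing.
        return " ".join(s.split()).lower()

    needle = norm(excerpt)
    return any(needle in norm(source) for source in sources)

sources = ["Jean Zay is a French supercomputer hosted by IDRIS."]
print(excerpt_is_grounded("a French supercomputer", sources))    # True
print(excerpt_is_grounded("an American supercomputer", sources)) # False
```

A verbatim-substring test is deliberately strict: it catches paraphrased or invented "quotes", which is exactly the failure mode a verifiability-oriented RAG model is meant to avoid.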
## Training

PleIAs-360m-RAG was trained on Jean-Zay with 16 H100 GPUs, using Nanotron, the pretraining library from Hugging Face. We provide the complete settings as a YAML file as part of our release.