Pclanglais committed · Commit 773fd94 · Parent(s): 06fee8d
Update README.md

README.md CHANGED
@@ -20,10 +20,10 @@ language:
 
 **Pleias-Pico** is a 353-million-parameter specialized language model designed by PleIAs for Retrieval-Augmented Generation.
 
-Similarly to its base model, Pleias-350m, Pleias-
+Similarly to its base model, Pleias-350m, Pleias-Pico aims to be a fully open model (weights, code, data), only trained on content under a permissive license and fully compliant with the European AI Act.
 
 ## Description
-Pleias-Pico is continuous pretrain of Pleias-
+Pleias-Pico is a continuous pretrain of Pleias-350m on a new dataset of 45,088,768,000 tokens modeling common retrieval tasks. All the content of the dataset ultimately comes from [Common Corpus](https://huggingface.co/datasets/PleIAs/common_corpus).
 
 Pleias-Pico includes the main features of the original base model:
 * Only trained on open data under a permissive license and in compliance with the European AI Act. By design, all Pleias models are unable to output copyrighted content.
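
For context, a minimal usage sketch with the `transformers` library, assuming the checkpoint is published on the Hugging Face Hub under the PleIAs organization (the repository id below is an assumption, not something confirmed by this diff):

```python
# Minimal sketch: load Pleias-Pico with transformers and generate text.
# "PleIAs/Pleias-Pico" is an assumed Hub id based on the organization
# name in this README; replace it with the actual repository id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "PleIAs/Pleias-Pico"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The European AI Act requires", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```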
@@ -38,7 +38,7 @@ Pleias-Pico supports retrieval-augmented generation with enhanced verifiability,
 
 Initial tests have shown that the RAG design has significantly improved the factuality and verifiability of the model. Even when the grounding does not work perfectly, the information remains much closer to the original sources.
 
-As a result, Pleias-
+As a result, Pleias-Pico has already been tested and integrated into multiple applied RAG projects, including PleIAs's flagship application Scholasticai.
 
 ## Training
 Pleias-Pico was trained at Jean-Zay on 16 H100s with Nanotron, the pretraining library from HuggingFace. We provide the complete settings as a yaml file as part of our release.
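
The hunk above describes RAG with enhanced verifiability but does not show the prompt template the model expects. As a generic illustration only (the numbered-source layout below is our assumption, not the documented Pleias-Pico format), grounding a query in retrieved sources looks like this:

```python
# Generic RAG prompt assembly: number each retrieved source so the answer
# can cite it. Illustrative layout only; check the model card for the
# template Pleias-Pico was actually trained on.
def build_rag_prompt(query: str, sources: list[str]) -> str:
    numbered = "\n\n".join(
        f"[Source {i}]\n{text}" for i, text in enumerate(sources, start=1)
    )
    return (
        f"Query: {query}\n\n{numbered}\n\n"
        "Answer using only the sources above, citing them as [Source N].\n"
        "Answer:"
    )

prompt = build_rag_prompt(
    "What is Common Corpus?",
    ["Common Corpus is an open training corpus released by PleIAs."],
)
```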
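
The Training section says the complete Nanotron settings ship as a yaml file. As a rough, hypothetical sketch of what such a config covers (field names follow Nanotron's public example configs; every value below is a placeholder, not the released settings):

```yaml
# Hypothetical Nanotron-style config excerpt; placeholders only.
# The authoritative settings are in the yaml file shipped with the release.
general:
  project: pleias-pico   # placeholder run metadata
  seed: 42
parallelism:
  dp: 16                 # e.g. data parallelism across the 16 H100s
  tp: 1
  pp: 1
tokens:
  sequence_length: 2048  # placeholder
  micro_batch_size: 8    # placeholder
  train_steps: 100000    # placeholder
```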