mmarimon committed
Commit
7fc3836
1 Parent(s): 678012b

Update README.md

Files changed (1): README.md (+1 −1)
README.md CHANGED

@@ -21,7 +21,7 @@ license: apache-2.0
 
 ## Model description
 
-**FLOR-6.3B-Instructed** is a 6.3B-parameter transformer-based causal language model for Catalan, Spanish, and English, trained on a combined dataset from [InstruCAT](https://huggingface.co/datasets/BSC-LT/InstruCat), a Catalan language set of instruction generated automatically from prject-aina task orientated dataset, a subset of the [Dolly](https://huggingface.co/datasets/databricks/databricks-dolly-15k) dataset for English, and [MENTOR_ES](https://huggingface.co/datasets/projecte-aina/MENTOR_ES) and [MENTOR_CA](https://huggingface.co/datasets/projecte-aina/MENTOR_CA), a Spanish and Catalan sets of instructions commisioned by the BSC Language Technologies Unit.
+**FLOR-6.3B-Instructed** is a 6.3B-parameter transformer-based causal language model for Catalan, Spanish, and English, trained on a combined dataset drawn from [InstruCat](https://huggingface.co/datasets/BSC-LT/InstruCat), a set of Catalan instructions generated automatically from projecte-aina task-oriented datasets; a subset of the [Dolly](https://huggingface.co/datasets/databricks/databricks-dolly-15k) dataset for English; and [MENTOR_ES](https://huggingface.co/datasets/projecte-aina/MENTOR_ES) and [MENTOR_CA](https://huggingface.co/datasets/projecte-aina/MENTOR_CA), Spanish and Catalan sets of instructions commissioned by the BSC Language Technologies Unit.
 It is the result of a language adaptation technique performed on [BLOOM-7.1B](https://huggingface.co/bigscience/bloom-7b1),
 which involves modifying the model's vocabulary and embedding layer, and continuously pre-training the model with 140B tokens in our target languages.
 Blog post describing the base model: [flor-6-3b, a chinchilla compliant model](https://medium.com/@mpamies247/flor-6-3b-a-chinchilla-compliant-model-for-catalan-spanish-and-english-7cdb389a9aac)