
BLOOM-CLP German (6.4B parameters)

This is a monolingual German language model trained using the CLP-Transfer method based on BLOOM-7b1.

You can try out the model at European Language Grid.

UPDATE: We recently released an instruction-tuned version of this model: malteos/bloom-6b4-clp-german-oasst-v0.1.

How to use

You can use this model directly with a pipeline for text generation. Because generation involves random sampling, we set a seed for reproducibility:

>>> from transformers import pipeline, set_seed
>>> generator = pipeline('text-generation', model='malteos/bloom-6b4-clp-german')
>>> set_seed(42)
>>> generator("Hello, I'm a language model,", max_length=30, num_return_sequences=3)

[{'generated_text': "Hello, I'm a language model, a language for thinking, a language for expressing thoughts."},
 {'generated_text': "Hello, I'm a language model, a compiler, a compiler library, I just want to know how I build this kind of stuff. I don"},
 {'generated_text': "Hello, I'm a language model, and also have more than a few of your own, but I understand that they're going to need some help"},]
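Since the model is monolingual German, a German prompt is the more natural input. The following is a minimal sketch of loading the checkpoint explicitly instead of going through `pipeline`. The helper names `generation_kwargs` and `generate_german`, the example prompt, and the choice of half precision on GPU with a full-precision fallback on CPU are illustrative assumptions, not settings documented for this model.

```python
def generation_kwargs(max_new_tokens=30, num_return_sequences=3):
    """Sampling settings for generate(); the defaults mirror the pipeline example."""
    return {
        "max_new_tokens": max_new_tokens,
        "num_return_sequences": num_return_sequences,
        "do_sample": True,  # sampling is required to get several distinct sequences
    }


def generate_german(prompt, model_id="malteos/bloom-6b4-clp-german", seed=42, **overrides):
    """Hypothetical helper: load the model and sample continuations for a prompt."""
    # Imported lazily so the settings helper above works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

    set_seed(seed)  # fix the random state so repeated runs give the same samples

    # Assumption: float16 on GPU to save memory, float32 on CPU for compatibility.
    use_cuda = torch.cuda.is_available()
    device = "cuda" if use_cuda else "cpu"
    dtype = torch.float16 if use_cuda else torch.float32

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=dtype).to(device)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    kwargs = {**generation_kwargs(), **overrides}
    with torch.no_grad():
        output_ids = model.generate(**inputs, **kwargs)
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)


# Example call (downloads the full checkpoint on first use):
# for text in generate_german("Die Hauptstadt von Deutschland ist"):
#     print(text)
```

Loading the model and tokenizer directly gives control over dtype and device placement, which matters for a checkpoint of this size. The generated text depends on the seed, the library versions, and the hardware.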

Training dataset

Code

Hardware

Evaluation

Validation perplexity (PPL) compared to from-scratch training (lower is better):

[Figure: Tokens vs PPL]

Additional evaluations can be found in our paper.
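For readers unfamiliar with the metric, perplexity is commonly defined as the exponential of the mean per-token negative log-likelihood. The sketch below illustrates that standard definition with a hypothetical `perplexity` helper. It assumes natural-log token probabilities as input and is not the evaluation code used for the paper.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from natural-log probabilities assigned to each target token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Mean negative log-likelihood per token.
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that gives every token probability 0.5 has a perplexity of about 2:
# it is as uncertain as a fair choice between two options at each step.
uniform_two_way = [math.log(0.5)] * 4
print(perplexity(uniform_two_way))

# Higher probabilities on the correct tokens give a lower perplexity.
confident = [math.log(0.9)] * 4
print(perplexity(confident))
```

Because perplexity depends on the tokenizer, values are only comparable between models that share the same vocabulary and validation data.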

How to cite

If you are using our code or models, please cite our paper:

@misc{Ostendorff2023clp,
  doi = {10.48550/ARXIV.2301.09626},
  author = {Ostendorff, Malte and Rehm, Georg},
  title = {Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning},
  publisher = {arXiv},
  year = {2023}
}

License

BigScience BLOOM RAIL 1.0

Safetensors model size: 6.25B params (tensor type: FP16)
