---
title: README
emoji: 🐢
colorFrom: green
colorTo: yellow
sdk: static
pinned: false
---

# BigScience Large Language Model Training

Training a multilingual 176-billion-parameter model in the open

BigScience is an open and collaborative workshop on the study and creation of very large language models, gathering more than 1,000 researchers around the world. You can find more information on the main website at https://bigscience.huggingface.co.

The training of BigScience's main model started on March 11, 2022 at 11:42am PST and will last 3-4 months on the 416 A100 GPUs of the Jean Zay public supercomputer.

You can follow the training at https://twitter.com/BigScienceLLM.

Summary of the model, dataset, hardware, training, and environmental considerations:

- The model
- The dataset
- The engineering side
- Environmental considerations:
  - Jean Zay, the supercomputer we are using for model training, is mostly powered by nuclear energy, a low-carbon energy source.
  - Significant efforts were made to make the computing infrastructure as efficient as possible; the heat generated by the hardware is even used to heat buildings on campus.
  - We are currently working on a precise estimate of the carbon emitted during all steps of model training, including intermediate experiments as well as inference. More soon!