Text Generation · Transformers · PyTorch · Spanish · gptj · causal-lm · Inference Endpoints
versae committed · Commit cd92248 · 1 Parent(s): 98e3973

Update README.md

Files changed (1): README.md (+2 -1)
README.md CHANGED
@@ -10,6 +10,7 @@ datasets:
 
 ---
 
+- [Version ✨v1✨](https://huggingface.co/bertin-project/bertin-gpt-j-6B/tree/v1): August 25th, 2022 (*[full](https://huggingface.co/bertin-project/bertin-gpt-j-6B/tree/v1) and [half-precision weights](https://huggingface.co/bertin-project/bertin-gpt-j-6B/tree/v1-half)*, at step 1M)
 - [Version v1beta3](https://huggingface.co/bertin-project/bertin-gpt-j-6B/tree/v1beta3): July 22nd, 2022 (*[full](https://huggingface.co/bertin-project/bertin-gpt-j-6B/tree/v1beta3) and [half-precision weights](https://huggingface.co/bertin-project/bertin-gpt-j-6B/tree/v1beta3-half)*, at step 850k)
 - [Version v1beta2](https://huggingface.co/bertin-project/bertin-gpt-j-6B/tree/v1beta2): June 6th, 2022 (*[full](https://huggingface.co/bertin-project/bertin-gpt-j-6B/tree/v1beta2) and [half-precision weights](https://huggingface.co/bertin-project/bertin-gpt-j-6B/tree/v1beta2-half)*, at step 616k)
 - [Version v1beta1](https://huggingface.co/bertin-project/bertin-gpt-j-6B/tree/v1beta1-half): April 28th, 2022 (*half-precision weights only*, at step 408k)
@@ -63,7 +64,7 @@ BERTIN-GPT-J-6B was finetuned on [mC4-es-sampled (gaussian)](https://huggingface
 
 ## Training procedure
 
-This model was finetuned for 40 billion tokens (40,384,790,528) over 616,000 steps on a single TPU v3-8 VM. It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token correctly.
+This model was finetuned for ~65 billion tokens (65,536,000,000) over 1,000,000 steps on a single TPU v3-8 VM. It was trained as an autoregressive language model, using cross-entropy loss to maximize the likelihood of predicting the next token correctly. Training took roughly 6 months.
 
 ## Intended Use and Limitations
 
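Each release listed in the first hunk above lives on its own branch of the `bertin-project/bertin-gpt-j-6B` repository, so a specific version can be pinned with the `revision` argument of `from_pretrained`. A minimal sketch, assuming a recent `transformers`/`torch` install; the chosen branch, prompt, and generation settings are illustrative choices, not part of the model card:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Each version in the README maps to a branch ("revision") of the repo,
# e.g. "v1", "v1-half", "v1beta3", "v1beta3-half", "v1beta2", "v1beta1-half".
model_id = "bertin-project/bertin-gpt-j-6B"
revision = "v1"  # illustrative: pin the August 25th, 2022 release

tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    revision=revision,
    torch_dtype=torch.float16,  # the *-half branches already store fp16 weights
)

prompt = "El sentido de la vida es"  # illustrative Spanish prompt
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=True, top_p=0.95)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```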
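The updated training-procedure line describes the standard autoregressive objective: cross-entropy on the next token. The actual run happened on a TPU v3-8, but the objective itself can be illustrated with a toy PyTorch computation (random tensors; GPT-J's 50,400-entry vocabulary assumed):

```python
import torch
import torch.nn.functional as F

# Next-token prediction: logits at position t are scored against the token
# at position t+1, so logits and labels are shifted by one before the loss.
vocab_size = 50400             # GPT-J vocabulary size
batch, seq_len = 2, 8          # toy shapes, not the real training batch

logits = torch.randn(batch, seq_len, vocab_size)             # stand-in for model output
input_ids = torch.randint(0, vocab_size, (batch, seq_len))   # stand-in for a tokenized batch

shift_logits = logits[:, :-1, :]   # predictions for positions 0 .. T-2
shift_labels = input_ids[:, 1:]    # targets are the next tokens 1 .. T-1

loss = F.cross_entropy(
    shift_logits.reshape(-1, vocab_size),
    shift_labels.reshape(-1),
)
print(loss)  # average negative log-likelihood of the next token
```

Minimizing this loss over ~65 billion tokens is what the paragraph above calls maximizing the likelihood of predicting the next token correctly.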