---
license: apache-2.0
datasets:
  - mlfoundations/dclm-baseline-1.0-parquet
---

# Covenant72B

Covenant72B is, at 72 billion parameters, the largest permissionless, collaboratively trained language model built entirely from scratch.

It is being trained by 20+ globally distributed participants, coordinated through decentralized infrastructure on the Bittensor blockchain.

Checkpoint-Two marks the second release, corresponding to 420 billion tokens processed. Model files are available in the `Checkpoint-Two` branch of this repository; future checkpoints will be published here.
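
A minimal loading sketch follows, assuming the checkpoint is LLaMA-style and compatible with Hugging Face `transformers`; the repository id is a placeholder to replace with this repo's actual path:

```python
# Minimal loading sketch. Assumptions: the checkpoint is transformers-compatible
# and lives in the "Checkpoint-Two" branch; "<org>/Covenant72B" is a placeholder
# for this repository's actual id.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "<org>/Covenant72B"  # placeholder: substitute the real repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id, revision="Checkpoint-Two")
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    revision="Checkpoint-Two",  # model files live in this branch
    torch_dtype="auto",
    device_map="auto",  # a 72B model needs multiple GPUs or CPU offload
)

inputs = tokenizer("Distributed training works by", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```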

*Figure: Checkpoint Two*

## Training Details

| Property | Value |
|---|---|
| Model size | 72B |
| Architecture | LLaMA-style |
| Target token budget | 1.2T (420B for current checkpoint) |
| Compute participants | 20+ |
| Minimum compute per participant | 8×B200 or equivalent |
| Dataset | DCLM-baseline |
| Optimizer | SparseLoCo (communication-efficient optimizer) |
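
Communication-efficient optimizers in the DiLoCo/SparseLoCo family cut inter-node traffic by taking many local optimizer steps and then exchanging only a compressed (e.g., top-k sparsified) pseudo-gradient. The sketch below illustrates that general pattern only; it is not this project's actual implementation, and names such as `sync_every` and `topk_frac` are assumptions for illustration:

```python
# Illustrative sketch of the local-update + sparse-sync pattern behind
# communication-efficient optimizers (DiLoCo/SparseLoCo family).
# NOT the project's actual SparseLoCo implementation.
import torch


def topk_sparsify(delta: torch.Tensor, topk_frac: float = 0.01) -> torch.Tensor:
    """Keep only the largest-magnitude entries of a pseudo-gradient tensor."""
    flat = delta.flatten()
    k = max(1, int(flat.numel() * topk_frac))
    idx = flat.abs().topk(k).indices
    out = torch.zeros_like(flat)
    out[idx] = flat[idx]
    return out.view_as(delta)


def local_steps_then_sync(model, inner_opt, batches, sync_every: int = 100):
    """Run `sync_every` local steps, then return a sparse pseudo-gradient to share."""
    # Snapshot parameters at the start of the round.
    start = {n: p.detach().clone() for n, p in model.named_parameters()}
    for step, (x, y) in enumerate(batches):
        loss = torch.nn.functional.cross_entropy(model(x), y)
        inner_opt.zero_grad()
        loss.backward()
        inner_opt.step()
        if step + 1 >= sync_every:
            break
    # Pseudo-gradient = parameter drift since the last sync; only its top-k
    # entries would be communicated between participants.
    return {
        n: topk_sparsify(start[n] - p.detach())
        for n, p in model.named_parameters()
    }
```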

## Performance on Benchmarks

All results are 0-shot acc-norm (%) unless noted.

| Model | Compute Environment / Permissions | Size | Tokens | ARC-C | ARC-E | PIQA | OpenBookQA | HellaSwag | Winogrande (acc) | MMLU (acc) |
|---|---|---|---|---|---|---|---|---|---|---|
| Intellect-1 | Internet / Whitelist | 10B | 1T | 44.8 | 71.6 | 77.7 | 43.6 | 70.5 | 63.1 | 32.7 |
| Psyche Consilience-7Y9 | Internet / Whitelist | 40B | 1.2T | 31.1 | 55.8 | 76.1 | 34.8 | 63.7 | 57.0 | 24.2 |
| **Covenant72B (Checkpoint-Two)** | Internet / Permissionless | 72B | 420B | 53.84 | 77.74 | 80.58 | 44.60 | 77.08 | 71.43 | 47.49 |
| LLM360 K2 ckpt_108 | Centralized Cluster | 65B | 420B | 45.73 | 70.54 | 80.90 | 43.20 | 78.23 | 71.90 | 50.01 |
| LLM360 K2 Stage 1 | Centralized Cluster | 65B | 1.4T | 53.84 | 75.93 | 82.48 | 48.00 | 82.81 | 76.64 | 63.90 |
| LLaMA-2-7B | Centralized Cluster | 7B | 2T | 45.90 | 74.58 | 75.92 | 44.20 | 75.92 | 68.90 | 40.86 |
| LLaMA-2-70B | Centralized Cluster | 70B | 2T | 57.59 | 80.77 | 82.92 | 48.60 | 83.86 | 77.58 | 65.56 |
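
Scores of this kind could in principle be reproduced with EleutherAI's lm-evaluation-harness. The sketch below assumes its Python API (`lm_eval.simple_evaluate`) and reuses the placeholder repo id from the loading example; it is not the evaluation setup the authors necessarily used:

```python
# Evaluation sketch using EleutherAI's lm-evaluation-harness (pip install lm-eval).
# Assumptions: "<org>/Covenant72B" is a placeholder repo id, and the task names
# correspond to the benchmarks reported above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=<org>/Covenant72B,revision=Checkpoint-Two,dtype=auto",
    tasks=["arc_challenge", "arc_easy", "piqa", "openbookqa",
           "hellaswag", "winogrande", "mmlu"],
    num_fewshot=0,  # all reported results are 0-shot
    batch_size="auto",
)
for task, metrics in results["results"].items():
    # Report acc-norm where available, plain accuracy otherwise
    # (Winogrande and MMLU are reported as acc above).
    print(task, metrics.get("acc_norm,none", metrics.get("acc,none")))
```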

For more details, refer to Checkpoint One on Templar Research.