
RWKV-3 169M

Model Description

RWKV-3 169M is an L12-D768 (12 layers, embedding dimension 768) causal language model trained on the Pile. See https://github.com/BlinkDL/RWKV-LM for details.

At the moment, you need to use my GitHub code (https://github.com/BlinkDL/RWKV-v2-RNN-Pile) to run it.

ctx_len = 768
n_layer = 12
n_embd = 768
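The hyperparameters above could be held in a small config object when writing your own scripts around the checkpoint. The sketch below is a hypothetical illustration: `RWKV3Config` and `split_into_windows` are names invented here, not part of the RWKV-v2-RNN-Pile repository. It assumes you want each evaluated window of token ids to stay within the 768-token training context.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RWKV3Config:
    # Values copied from this card. The class itself is hypothetical and is
    # NOT the config object used by the RWKV-v2-RNN-Pile code.
    ctx_len: int = 768   # training context length in tokens
    n_layer: int = 12    # number of layers
    n_embd: int = 768    # embedding dimension


def split_into_windows(token_ids: List[int], cfg: RWKV3Config) -> List[List[int]]:
    """Split token ids into consecutive windows of at most cfg.ctx_len tokens."""
    return [token_ids[i:i + cfg.ctx_len] for i in range(0, len(token_ids), cfg.ctx_len)]


if __name__ == "__main__":
    cfg = RWKV3Config()
    windows = split_into_windows(list(range(1600)), cfg)
    print([len(w) for w in windows])
```

With 1600 tokens this yields two full 768-token windows followed by a 64-token remainder.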

Final checkpoint: RWKV-3-Pile-20220720-10704.pth, trained on the Pile for 328B tokens.

  • Pile loss 2.5596
  • LAMBADA ppl 28.82, acc 32.33%
  • PIQA acc 64.15%
  • SC2016 acc 57.88%
  • Hellaswag acc_norm 32.45%

Preview checkpoint: 20220703-1652.pth, trained on the Pile for 50B tokens. Pile loss 2.6375, LAMBADA ppl 33.30, acc 31.24%.
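To compare the two checkpoints on a perplexity scale, the reported Pile losses can be converted with ppl = exp(loss). This is a minimal sketch that assumes the Pile loss is the mean per-token cross-entropy in nats. The card does not state the unit, and the resulting Pile perplexities are derived here, not reported. Note that this is distinct from the LAMBADA ppl figures, which are measured on a different dataset.

```python
import math

# Pile losses as reported on this card.
# Assumption: mean per-token cross-entropy measured in nats.
REPORTED_PILE_LOSS = {
    "RWKV-3-Pile-20220720-10704.pth": 2.5596,  # final, 328B tokens
    "20220703-1652.pth": 2.6375,               # preview, 50B tokens
}


def loss_to_perplexity(loss_nats: float) -> float:
    """Perplexity equals exp(mean cross-entropy) when the loss is in nats."""
    return math.exp(loss_nats)


if __name__ == "__main__":
    for name, loss in REPORTED_PILE_LOSS.items():
        print(f"{name}: loss {loss:.4f} -> ppl {loss_to_perplexity(loss):.2f}")
```

Under that assumption, the final checkpoint corresponds to a Pile perplexity of about 12.93 and the preview checkpoint to about 13.98.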
