A 670-million-parameter language model trained on the pretokenized lunaris-ultrafineweb-20b-tokenized dataset with a custom BPE tokenizer. Training ran for roughly 60-70 hours on A100 hardware and covered roughly 3-4B tokens. The model is written in PyTorch with PyTorch Lightning, uses Weights & Biases (wandb) for logging, and was trained as a GCP Vertex AI CustomJob. Built as a Year 12 Software Engineering major project.
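
The figures above imply a few back-of-envelope numbers, sketched here. The inputs are the approximate ranges quoted in this card plus the F32 tensor type listed for the checkpoint; the function names are illustrative only, and the results are rough estimates (about 2.68 GB of weights, roughly 12k-18.5k tokens per second in aggregate, and about 4.5-6 tokens seen per parameter), not measurements.

```python
# Back-of-envelope estimates from the approximate figures quoted in this card.
PARAMS = 670e6               # parameter count stated in the card
BYTES_PER_PARAM_F32 = 4      # F32 = 4 bytes per parameter
TOKENS_SEEN = (3e9, 4e9)     # "3-4B tokens" (low, high)
HOURS = (60, 70)             # "60-70 hours" (low, high)


def weights_size_gb(params=PARAMS, bytes_per_param=BYTES_PER_PARAM_F32):
    """Raw weight size in decimal gigabytes, ignoring file-format overhead."""
    return params * bytes_per_param / 1e9


def throughput_range(tokens=TOKENS_SEEN, hours=HOURS):
    """(slowest, fastest) aggregate tokens per second consistent with the ranges."""
    slowest = tokens[0] / (hours[1] * 3600)  # fewest tokens over the longest run
    fastest = tokens[1] / (hours[0] * 3600)  # most tokens over the shortest run
    return slowest, fastest


def tokens_per_param_range(tokens=TOKENS_SEEN, params=PARAMS):
    """(low, high) ratio of training tokens seen to model parameters."""
    return tokens[0] / params, tokens[1] / params


if __name__ == "__main__":
    print(f"weights: {weights_size_gb():.2f} GB")
    print("tokens/s: {:.0f} to {:.0f}".format(*throughput_range()))
    print("tokens/param: {:.1f} to {:.1f}".format(*tokens_per_param_range()))
```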

Try with Google Colab: https://colab.research.google.com/drive/1wjKTknn3bSSYD7wO7oKXSDytSDQp3FwG?usp=sharing

This model has been pushed to the Hub using the PyTorchModelHubMixin integration from the huggingface_hub library:

  • Code: [More Information Needed]
  • Paper: [More Information Needed]
  • Docs: [More Information Needed]
Safetensors · Model size: 0.7B params · Tensor type: F32

Dataset used to train Jumpr/apollousaLM-670M