A 670-million-parameter language model trained on the pretokenized lunaris-ultrafineweb-20b-tokenized dataset, using a custom BPE tokenizer. Training ran for roughly 60-70 hours on A100 hardware and covered roughly 3-4B tokens. The model is written in PyTorch with PyTorch Lightning, uses Weights & Biases (wandb) for logging, and was trained as a GCP Vertex AI CustomJob. Built as a Year 12 Software Engineering major project.
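For a rough sense of scale, the figures above can be turned into an approximate training throughput and a tokens-per-parameter ratio. This is only back-of-the-envelope arithmetic on the quoted ranges (60-70 hours, 3-4B tokens, 670M parameters); it assumes the hours are wall-clock training time, and none of the results are measured values.

```python
# Back-of-the-envelope estimates from the ranges quoted in this card.
# Assumption: the 60-70 hours are wall-clock training time.
TOKENS_LOW, TOKENS_HIGH = 3e9, 4e9   # tokens seen (approximate)
HOURS_LOW, HOURS_HIGH = 60, 70       # training time in hours (approximate)
PARAMS = 670e6                       # model parameters


def tokens_per_second(tokens: float, hours: float) -> float:
    """Average throughput for a given token count and duration."""
    return tokens / (hours * 3600)


# Slowest case: fewest tokens over the longest run; fastest case: the reverse.
slowest = tokens_per_second(TOKENS_LOW, HOURS_HIGH)
fastest = tokens_per_second(TOKENS_HIGH, HOURS_LOW)

# Tokens seen per model parameter.
ratio_low = TOKENS_LOW / PARAMS
ratio_high = TOKENS_HIGH / PARAMS

print(f"throughput: {slowest:,.0f} to {fastest:,.0f} tokens/s")
print(f"tokens per parameter: {ratio_low:.1f} to {ratio_high:.1f}")
```

So the quoted numbers imply somewhere around 12k to 19k tokens per second on average, and roughly 4.5 to 6 tokens seen per parameter.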

Try with Google Colab: https://colab.research.google.com/drive/1wjKTknn3bSSYD7wO7oKXSDytSDQp3FwG?usp=sharing

This model has been pushed to the Hub using the PyTorchModelHubMixin integration:

Code: [More Information Needed]
Paper: [More Information Needed]
Docs: [More Information Needed]

Safetensors: model size 0.7B params, tensor type F32
