A 670-million-parameter language model trained on the pretokenized lunaris-ultrafineweb-20b-tokenized dataset, using the custom BPE tokenizer. It was trained for roughly 60-70 hours on an A100 and saw roughly 3-4B tokens. The model is written in PyTorch with PyTorch Lightning, uses Weights & Biases (wandb) for logging, and was trained as a GCP Vertex AI CustomJob. Built for a Year 12 Software Engineering major project.
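To put the figures above in perspective, here is a rough back-of-the-envelope sketch of the implied throughput and the tokens-per-parameter ratio. It uses only the approximate ranges quoted above (670M parameters, 3-4B tokens, 60-70 hours); the helper name `tokens_per_second` is hypothetical, and the results are estimates, not measured values from the training logs.

```python
# Rough estimates derived from the approximate figures in this model card.
# These are not measurements; they only bound what the stated ranges imply.

PARAMS = 670_000_000                                   # model size
TOKENS_LOW, TOKENS_HIGH = 3_000_000_000, 4_000_000_000  # tokens seen
HOURS_LOW, HOURS_HIGH = 60, 70                          # wall-clock training time


def tokens_per_second(tokens: int, hours: float) -> float:
    """Average throughput implied by a token count and a duration in hours."""
    return tokens / (hours * 3600)


# Slowest case: fewest tokens over the longest run; fastest case: the reverse.
slowest = tokens_per_second(TOKENS_LOW, HOURS_HIGH)
fastest = tokens_per_second(TOKENS_HIGH, HOURS_LOW)

# Tokens seen per model parameter.
ratio_low = TOKENS_LOW / PARAMS
ratio_high = TOKENS_HIGH / PARAMS

print(f"Throughput: {slowest:,.0f} to {fastest:,.0f} tokens/s")
print(f"Tokens per parameter: {ratio_low:.1f} to {ratio_high:.1f}")
```

Under these assumptions the run averaged somewhere around 12,000 to 18,500 tokens per second and saw about 4.5 to 6 tokens per parameter.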
Try with Google Colab: https://colab.research.google.com/drive/1wjKTknn3bSSYD7wO7oKXSDytSDQp3FwG?usp=sharing
This model has been pushed to the Hub using the PyTorchModelHubMixin integration: