---
license: mit
language:
- en
tags:
- babylm
---

# Lil-Bevo-X

Lil-Bevo-X is UT Austin's submission to the BabyLM challenge, specifically the *strict* track.

[Link to GitHub Repo](https://github.com/venkatasg/Lil-Bevo)

## TLDR:

- Unigram tokenizer trained on 10M BabyLM tokens plus the MAESTRO dataset, with a vocabulary size of 32k.
- `deberta-base-v3` trained on a mixture of MAESTRO and 100M tokens for 3 epochs.
- The model continues training for 100,000 steps with a sequence length of 128.
- The model then continues training for 65,000 steps with a sequence length of 512.
- The model is trained with targeted linguistic masking for 1 epoch.
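
The stages listed above can be summarized as a small data structure. This is a minimal sketch for orientation, not the training code from the repository: the `Stage` class, its field names, `SCHEDULE`, and `describe` are hypothetical names introduced here, and only values stated in the list are filled in. Anything the list does not state is left as `None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    """One training stage; fields left as None are not stated in this README."""
    name: str
    data: Optional[str] = None
    epochs: Optional[int] = None
    steps: Optional[int] = None
    seq_length: Optional[int] = None


# Hypothetical summary of the TLDR list, in the order the bullets appear.
SCHEDULE = [
    Stage("initial training", data="MAESTRO + 100M tokens", epochs=3),
    Stage("continued training", steps=100_000, seq_length=128),
    Stage("continued training", steps=65_000, seq_length=512),
    Stage("targeted linguistic masking", epochs=1),
]


def describe(stage: Stage) -> str:
    """Render a stage as a one-line summary, skipping unstated fields."""
    parts = [stage.name]
    if stage.data is not None:
        parts.append(f"data={stage.data}")
    if stage.epochs is not None:
        parts.append(f"epochs={stage.epochs}")
    if stage.steps is not None:
        parts.append(f"steps={stage.steps}")
    if stage.seq_length is not None:
        parts.append(f"seq_length={stage.seq_length}")
    return ", ".join(parts)


if __name__ == "__main__":
    for i, stage in enumerate(SCHEDULE, start=1):
        print(f"{i}. {describe(stage)}")
```

Keeping unstated values as `None` rather than guessing makes it clear which hyperparameters this README documents and which still need to be looked up in the GitHub repository.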

This README will be updated with more details soon.