---
{}
---

# Model Card for *NondeterministicShuffle* GPT-2 (without Positional Encodings)

This is one model in a collection of models trained on the impossible
languages of [Kallini et al. 2024](https://arxiv.org/abs/2401.06416).

This model is a GPT-2 Small model trained *without positional encodings*
from scratch on the ***NondeterministicShuffle***
language. We include a total of 30 checkpoints over the course of
model training, from step 100 to 3000 in increments of 100 steps.
The main branch contains the final checkpoint (3000), and the other
checkpoints are accessible as revisions.

![languages.png](https://cdn-uploads.huggingface.co/production/uploads/6268bc06adb1c6525b3d5157/pBt38YYQL1gj8DqjyorWS.png)

## Model Details

- **Developed by:** Julie Kallini, Isabel Papadimitriou, Richard Futrell, Kyle Mahowald, Christopher Potts
- **Model type:** Causal Language Model
- **Language(s) (NLP):** English
- **GitHub Repository:** https://github.com/jkallini/mission-impossible-language-models
- **Paper:** https://arxiv.org/pdf/2401.06416

## Uses

This artefact is solely intended for the study of language learning
and acquisition in computational models. It should not be
used in any production setting.

## How to Get Started with the Model

Use the code below to get started with the model.

**Important:** Loading the model with `trust_remote_code=True` downloads
our modified GPT-2 code, which does not have absolute positional encodings.
If you use this model in the same environment as another GPT-2 model that
does have positional encodings, load that second model explicitly as a
`GPT2Model`.

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_id = "mission-impossible-lms/nondeterministic-shuffle-gpt2-no-pos"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Set up the prompt and encode it
prompt = "He clean"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate text
output = model.generate(inputs.input_ids, max_length=20)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
```

By default, the `main` branch of this model repo loads the
last model checkpoint (3000). To access the other checkpoints,
use the `revision` argument:

```python
model = AutoModelForCausalLM.from_pretrained(model_id, revision="checkpoint-500", trust_remote_code=True)
```

This loads the model at checkpoint 500.

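To compare checkpoints across training, it can help to enumerate the revision names programmatically. The sketch below assumes that every checkpoint follows the `checkpoint-<step>` naming shown for step 500, including a `checkpoint-3000` revision alongside `main`; this naming has not been checked against the repository. `checkpoint_revisions` and `load_checkpoint` are hypothetical helper names, not part of the repository.

```python
MODEL_ID = "mission-impossible-lms/nondeterministic-shuffle-gpt2-no-pos"


def checkpoint_revisions(first=100, last=3000, step=100):
    """Return revision names, assuming the `checkpoint-<step>` pattern."""
    return [f"checkpoint-{s}" for s in range(first, last + 1, step)]


def load_checkpoint(revision):
    """Load one checkpoint; needs network access and `transformers`."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        MODEL_ID, revision=revision, trust_remote_code=True
    )


revisions = checkpoint_revisions()
print(len(revisions), revisions[0], revisions[-1])
```

Calling `load_checkpoint(revisions[4])` would then correspond to the `checkpoint-500` example above, provided the naming assumption holds.
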
## Training Details

### Training Data

This model was trained on the [100M-word BabyLM dataset](https://babylm.github.io/).
Before training, we first transform the dataset into
the corresponding impossible language, as described in
our paper.

### Training Procedure

This model was trained for 3,000 gradient steps with
a batch size of 2^19 (524,288) tokens. The learning rate
warms up linearly from 0 to 6e-4 over the first 300 steps.

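As a quick sanity check on these numbers, the sketch below computes the total number of tokens processed and the learning rate at a given warmup step. The function name `warmup_lr` is hypothetical, and holding the rate at its peak after warmup is an assumption made only so the function is defined for every step; the schedule after step 300 is not described here.

```python
TOTAL_STEPS = 3_000
BATCH_TOKENS = 2 ** 19      # 524,288 tokens per gradient step
PEAK_LR = 6e-4
WARMUP_STEPS = 300


def warmup_lr(step):
    """Linear warmup from 0 to PEAK_LR over WARMUP_STEPS steps."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    # Assumption: the post-warmup schedule is not stated, so hold the peak.
    return PEAK_LR


total_tokens = TOTAL_STEPS * BATCH_TOKENS
print(f"tokens processed: {total_tokens:,}")
print(f"lr at step 150: {warmup_lr(150):.1e}")
```

Under these figures the run processes about 1.57 billion tokens in total, which implies multiple passes over a 100M-word corpus.
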
## Environmental Impact

- **Hardware Type:** NVIDIA RTX 3090 (24GB) + NVIDIA RTX A6000 (48GB) GPUs.
- **Hours used:** ~24 hours.

## Citation

```bibtex
@inproceedings{kallini-etal-2024-mission,
    title = "Mission: Impossible Language Models",
    author = "Kallini, Julie  and
      Papadimitriou, Isabel  and
      Futrell, Richard  and
      Mahowald, Kyle  and
      Potts, Christopher",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.787",
    doi = "10.18653/v1/2024.acl-long.787",
    pages = "14691--14714",
}
```

## Model Card Authors

Julie Kallini

## Model Card Contact

kallini@stanford.edu