juliekallini committed on
Commit
1fe3864
1 Parent(s): 2bf0079

Create README.md

  1. README.md +126 -0
README.md ADDED
@@ -0,0 +1,126 @@
---
# For reference on model card metadata, see the spec: https://github.com/huggingface/hub-docs/blob/main/modelcard.md?plain=1
# Doc / guide: https://huggingface.co/docs/hub/model-cards
{}
---

# Model Card for *NondeterministicShuffle* GPT-2 (without Positional Encodings)

<!-- Provide a quick summary of what the model is/does. -->

This is one model in a collection of models trained on the impossible
languages of [Kallini et al. 2024](https://arxiv.org/abs/2401.06416).

This model is a GPT-2 Small model trained *without positional encodings*
from scratch on the ***NondeterministicShuffle***
language. We include a total of 30 checkpoints over the course of
model training, from step 100 to 3000 in increments of 100 steps.
The main branch contains the final checkpoint (3000), and the other
checkpoints are accessible as revisions.

![languages.png](https://cdn-uploads.huggingface.co/production/uploads/6268bc06adb1c6525b3d5157/pBt38YYQL1gj8DqjyorWS.png)

## Model Details

- **Developed by:** Julie Kallini, Isabel Papadimitriou, Richard Futrell, Kyle Mahowald, Christopher Potts
- **Model type:** Causal Language Model
- **Language(s) (NLP):** English
- **GitHub Repository:** https://github.com/jkallini/mission-impossible-language-models
- **Paper:** https://arxiv.org/pdf/2401.06416

## Uses

This artefact is solely intended for the study of language learning
and acquisition in computational models. It should not be
used in any production setting.

## How to Get Started with the Model

Use the code below to get started with the model.

**Important:** Loading with `trust_remote_code=True` downloads our
modified GPT-2 code, which does not have absolute positional encodings.
If you use this model in the same environment as another GPT-2 model
with positional encodings, load the second model as a `GPT2Model` explicitly.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_id = "mission-impossible-lms/nondeterministic-shuffle-gpt2-no-pos"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Set up the prompt and encode it
prompt = "He clean"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate text
output = model.generate(inputs.input_ids, max_length=20)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
```

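If a standard GPT-2 is needed in the same environment, the note above could be followed roughly as sketched here. The `gpt2` checkpoint name and the helper `load_both` are illustrative assumptions, not part of this repository. The sketch uses `GPT2LMHeadModel`, the language-modeling member of the `GPT2Model` family in `transformers`, so that the stock implementation with positional encodings is requested by class name rather than through the `Auto*` classes.

```python
# Hypothetical identifiers for illustration; only NO_POS_ID comes from this model card.
NO_POS_ID = "mission-impossible-lms/nondeterministic-shuffle-gpt2-no-pos"
STANDARD_ID = "gpt2"  # assumption: any standard GPT-2 checkpoint would do


def load_both(no_pos_id=NO_POS_ID, standard_id=STANDARD_ID):
    """Load the no-positional-encoding model and a stock GPT-2 side by side."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoModelForCausalLM, GPT2LMHeadModel

    # The custom architecture ships with the repo, hence trust_remote_code=True.
    no_pos_model = AutoModelForCausalLM.from_pretrained(
        no_pos_id, trust_remote_code=True
    )
    # Name the stock class explicitly so the built-in GPT-2 code is used.
    standard_model = GPT2LMHeadModel.from_pretrained(standard_id)
    return no_pos_model, standard_model
```

Calling `load_both()` downloads both models, so it needs network access and enough memory for two GPT-2 Small models.
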
By default, the `main` branch of this model repo loads the
last model checkpoint (3000). To access the other checkpoints,
use the `revision` argument:

```python
model = AutoModelForCausalLM.from_pretrained(
    model_id, revision="checkpoint-500", trust_remote_code=True
)
```

This loads the model at checkpoint 500.

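To iterate over all saved checkpoints, the revision names can be generated rather than typed by hand. The sketch below assumes that every revision follows the `checkpoint-<step>` pattern of the `checkpoint-500` example and that the final step is also available under that name; the helpers `checkpoint_revisions` and `load_checkpoint` are hypothetical names introduced here.

```python
def checkpoint_revisions(first=100, last=3000, step=100):
    """Return assumed revision names, e.g. 'checkpoint-500', for each saved step."""
    return [f"checkpoint-{s}" for s in range(first, last + 1, step)]


def load_checkpoint(model_id, revision):
    """Load one checkpoint of the no-positional-encoding model by revision name."""
    # Imported lazily so that listing revisions does not require transformers.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_id, revision=revision, trust_remote_code=True
    )


revisions = checkpoint_revisions()
print(len(revisions), revisions[0], revisions[-1])
```

A loop such as `for rev in revisions: model = load_checkpoint(model_id, rev)` would then visit the checkpoints in training order, downloading each one on first use.
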
## Training Details

### Training Data

This model was trained on the [100M-word BabyLM dataset](https://babylm.github.io/).
Before training, we transformed the dataset into
the corresponding impossible language, as described in
our paper.

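As a rough illustration only, and assuming (as the language name suggests) that the transformation randomly permutes the tokens of each sentence with no fixed seed, the idea could be sketched as follows. The function `nondeterministic_shuffle` is a hypothetical stand-in, not the authors' preprocessing code; the paper and the GitHub repository remain the authoritative description.

```python
import random


def nondeterministic_shuffle(tokens, rng=None):
    """Return a randomly permuted copy of a token sequence (illustrative only)."""
    # Assumption: a fresh, unseeded generator, so repeated calls may differ.
    rng = rng or random.Random()
    shuffled = list(tokens)  # copy so the input sequence is left untouched
    rng.shuffle(shuffled)
    return shuffled


sentence = ["He", "cleans", "his", "very", "messy", "bookshelf", "."]
print(nondeterministic_shuffle(sentence))
```

The example uses whole words for readability; a real pipeline would more plausibly operate on tokenizer output.
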
### Training Procedure

This model was trained for 3,000 gradient steps with
a batch size of 2^19 tokens. The learning rate warmed up
linearly from 0 to 6e-4 over the first 300 steps.

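The numbers above can be made concrete with a short calculation. The sketch below computes the batch size in tokens, the total number of tokens processed, and the learning rate during warmup. The behavior after step 300 is not stated in this card, so holding the rate at its peak is an assumption made only to keep the function total; `warmup_lr` is a hypothetical helper name.

```python
PEAK_LR = 6e-4
WARMUP_STEPS = 300
TOTAL_STEPS = 3_000
TOKENS_PER_BATCH = 2 ** 19


def warmup_lr(step):
    """Learning rate at a given step under linear warmup from 0 to PEAK_LR."""
    if step >= WARMUP_STEPS:
        # Assumption: the post-warmup schedule is not described here,
        # so this sketch simply holds the peak value.
        return PEAK_LR
    return PEAK_LR * step / WARMUP_STEPS


total_tokens = TOTAL_STEPS * TOKENS_PER_BATCH
print(TOKENS_PER_BATCH)  # 524288
print(total_tokens)      # 1572864000
```

In other words, each gradient step covers 524,288 tokens, and the full run processes roughly 1.57 billion tokens.
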
## Environmental Impact

- **Hardware Type:** NVIDIA RTX 3090 (24GB) + NVIDIA RTX A6000 (48GB) GPUs.
- **Hours used:** ~24 hours.

## Citation

```bibtex
@inproceedings{kallini-etal-2024-mission,
    title = "Mission: Impossible Language Models",
    author = "Kallini, Julie and
      Papadimitriou, Isabel and
      Futrell, Richard and
      Mahowald, Kyle and
      Potts, Christopher",
    editor = "Ku, Lun-Wei and
      Martins, Andre and
      Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.787",
    doi = "10.18653/v1/2024.acl-long.787",
    pages = "14691--14714",
}
```

## Model Card Authors

Julie Kallini

## Model Card Contact

kallini@stanford.edu