fraserlove committed
Commit 3d38a23 · verified · 1 Parent(s): eec1971

Update README.md

Files changed (1)
  1. README.md +33 -14
README.md CHANGED
@@ -4,6 +4,7 @@ datasets:
 - HuggingFaceFW/fineweb-edu
 language:
 - en
+pipeline_tag: text-generation
 ---
 # GPT 124M
 A pretrained GPT model with 124M parameters trained on 40B tokens of educational content. The full implementation of the model can be found on GitHub [here](https://github.com/fraserlove/gpt). The model was trained for 4 epochs on the 10B-token subset of [fineweb-edu](https://arxiv.org/pdf/2406.17557), a large-scale dataset of educational content. The model surpassed GPT-3 124M on [HellaSwag](https://arxiv.org/pdf/1905.07830) after just 38B tokens, a 7.8x improvement in token efficiency over GPT-3, which was trained on 300B tokens. The final model at 40B tokens achieved a HellaSwag score of 0.339.
@@ -20,23 +21,41 @@ India’s story begins with a very ancient Vedic religion. They were the ancient
 Once upon a time, the King of Italy, who was to govern what would become the world, thought that it would be a great and noble undertaking to introduce the Roman Senate into the country in order to defend Rome — to defend her own capital in a very civilized manner, to promote the arts and promote the Roman religion. Accordingly, Rome,
 ```
 
-### Inference
-The GPT model can be used for inference using the `inference.py` script. The script generates completions given a context. The completions are generated using the top-k sampling strategy. The maximum length of the completions, temperature and k value can be set in the script. The model can be loaded from a PyTorch checkpoint `torch.load('cache/logs/124M.pt', map_location=device)` or from a cached Hugging Face model `GPT.from_pretrained('cache/models')` after training. The model can then be used for inference as follows:
-
+## Inference
+The model can be used directly with a pipeline for text generation:
 ```python
-import torch
-from gpt import GPT
-from transformers import AutoTokenizer
+>>> from transformers import pipeline, set_seed
+>>> generator = pipeline('text-generation', model='fraserlove/gpt-124m')
+>>> set_seed(0)
+>>> generator('Once upon a time,', max_length=30, num_return_sequences=5, do_sample=True)
+
+[{'generated_text': 'Once upon a time, my father had some way that would help him win his first war. There was a man named John. He was the husband'},
+ {'generated_text': 'Once upon a time, this particular breed would be considered a “chicken fan”; today, the breed is classified as a chicken.'},
+ {'generated_text': 'Once upon a time, there was a famous English nobleman named King Arthur (in the Middle Ages, it was called ‘the Arthur’'},
+ {'generated_text': "Once upon a time, the Christian God created the world in the manner which, under different circumstances, was true of the world's existence. The universe"},
+ {'generated_text': 'Once upon a time, I wrote all of the letters of an alphabets in a single document. Then I read each letter of that alphabet'}]
+```
 
-device = 'cuda' if torch.cuda.is_available() else 'cpu'
+The model can also be used directly for inference:
 
-# Load the tokeniser and model
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
 tokeniser = AutoTokenizer.from_pretrained('fraserlove/gpt-124m')
-model = GPT.from_pretrained('fraserlove/gpt-124m').to(device)
-
+model = AutoModelForCausalLM.from_pretrained('fraserlove/gpt-124m')
 context = 'Once upon a time,'
-context = torch.tensor(tokeniser.encode(context), dtype=torch.long).to(device)
-samples = model.generate(context, n_samples=2, max_tokens=64)
-samples = [samples[j, :].tolist() for j in range(samples.size(0))]
-print('\n'.join(tokeniser.decode(sample).split('<|endoftext|>')[0] for sample in samples))
+context = tokeniser.encode(context, return_tensors='pt')
+samples = model.generate(context, max_new_tokens=64, do_sample=True, num_return_sequences=2)
+decoded = tokeniser.batch_decode(samples)
+print('\n'.join(decoded))
+```
+
+To get the features of a given text:
+
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+tokeniser = AutoTokenizer.from_pretrained('fraserlove/gpt-124m')
+model = AutoModelForCausalLM.from_pretrained('fraserlove/gpt-124m')
+text = 'Once upon a time,'
+encoded_input = tokeniser(text, return_tensors='pt')
+output = model(**encoded_input)
 ```
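The last example runs a plain forward pass, so `output` holds the language-modelling logits rather than feature vectors. If per-token features are wanted, the hidden states can be requested explicitly. A minimal sketch, assuming the checkpoint loads as a standard GPT-2-style causal LM so the generic Transformers `output_hidden_states` flag applies:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokeniser = AutoTokenizer.from_pretrained('fraserlove/gpt-124m')
model = AutoModelForCausalLM.from_pretrained('fraserlove/gpt-124m')

encoded_input = tokeniser('Once upon a time,', return_tensors='pt')
# Request per-layer hidden states alongside the logits.
output = model(**encoded_input, output_hidden_states=True)

logits = output.logits               # (batch, sequence, vocab): next-token scores
features = output.hidden_states[-1]  # (batch, sequence, hidden): final-layer features
```

`output.hidden_states` is a tuple with one tensor per layer plus the embedding output, so intermediate layers can be probed in the same way.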