---
license: apache-2.0
base_model: EleutherAI/pythia-160m-deduped
tags:
- generated_from_trainer
datasets:
- FineWebSentences
metrics:
- accuracy
model-index:
- name: pythia-finewebedu
  results:
  - task:
      name: Causal Language Modeling
      type: text-generation
    dataset:
      name: FineWebSentences
      type: FineWebSentences
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.24020533058796614
---

# pythia-finewebedu

- Generates half-intelligible English sentences using a small GPT-like model.
- Outputs one sentence at a time.

This model is a fine-tuned version of [EleutherAI/pythia-160m-deduped](https://huggingface.co/EleutherAI/pythia-160m-deduped) on the FineWebSentences dataset.
It achieves the following results on the evaluation set:
- Loss: 4.7702
- Accuracy: 0.2402
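
Since the loss above is the natural-log cross-entropy reported by the `transformers` Trainer, it corresponds to a perplexity of roughly exp(4.7702) ≈ 118.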
## Model description

To generate 10 random sentences starting from an empty string on a CUDA device:

```python
from transformers import pipeline, set_seed

# Load the fine-tuned model as a text-generation pipeline on the GPU
generator = pipeline('text-generation', model='agentlans/pythia-finewebedu', device='cuda')

set_seed(1234)
# Sample 10 sequences of up to 100 tokens each from an empty prompt
results = generator("", max_length=100, num_return_sequences=10, do_sample=True)

for x in results:
    print(x['generated_text'])
```

Output:

```text
They are also, you need to get great results at her school.
According to him the term of the Newer, as an entity of the country.
- To provide less information to help prevent and respond appropriately, it also seems to take action.
He was an important historical project that he was going to have a history, but the fact that he lived in the US and then he can move back to where he left.
By the use of the ESLP and INGELTS OF THE TRAIL ORD and REPORTANCE OR:
However, the system and the Internet have not been built.
To bridge your teeth with your teeth of the plaque build up with the new teeth and tartar attachments to the tissues, as those without an orthoker.
This is more difficult than other to learn the basics of the workbooks, where a few thousand notes the same idea that the author can be seen on the work of the project.)
This study was that by one of the six states, in the middle of a union that he had to marry or union union.
- A-Pangana and Pitta, P.A. L. T.C.
```
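
The model can also be prompted with the start of a sentence; a short sketch (the prompt, seed, and sampling settings here are arbitrary examples, not values from this card):

```python
from transformers import pipeline, set_seed

generator = pipeline('text-generation', model='agentlans/pythia-finewebedu', device='cuda')

set_seed(42)
# Complete a given prefix instead of an empty prompt; temperature and top_p
# are forwarded to model.generate() to control sampling randomness
results = generator("The history of mathematics", max_length=50,
                    num_return_sequences=3, do_sample=True,
                    temperature=0.8, top_p=0.95)

for x in results:
    print(x['generated_text'])
```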
## Intended uses & limitations |
|
|
|
- For generating short lines of English text |
|
- Could be useful for |
|
- data augmentation |
|
- creative inspiration |
|
- entertainment |
|
- CAPTCHA |
|
- Can be further finetuned on other data such as: |
|
- prompts |
|
- famous quotes |
|
- news headlines |
|
- blog post titles |
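
For example, further fine-tuning on a plain-text file with one quote (or headline, or prompt) per line might look roughly like the following sketch; the file name, output directory, and hyperparameters are placeholders, and appending EOS per line is an assumption about the desired one-sentence-per-sample behaviour:

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "agentlans/pythia-finewebedu"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # Pythia's tokenizer has no pad token by default

# One training example per line, e.g. quotes.txt (placeholder file name)
dataset = load_dataset("text", data_files={"train": "quotes.txt"})

def tokenize(batch):
    # Append EOS so the model learns where each line ends
    texts = [t + tokenizer.eos_token for t in batch["text"]]
    return tokenizer(texts, truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="pythia-quotes",  # placeholder
                           num_train_epochs=3,
                           per_device_train_batch_size=8,
                           learning_rate=5e-5),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```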
Limitations include:

- Not guaranteed to produce sensible, coherent, or grammatically correct sentences
- No regard for accuracy or truthfulness whatsoever
- It's a bunch of words from a probability model; what do you expect?
## Training and evaluation data

Sentences from [HuggingFaceFW/fineweb-edu](https://huggingface.co/datasets/HuggingFaceFW/fineweb-edu).
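
The exact preprocessing is not documented here; the sketch below shows one way sentence-level examples could be derived from the corpus. The `sample-10BT` subset, NLTK sentence tokenizer, sample size, and length filter are all illustrative assumptions:

```python
import nltk
from datasets import load_dataset

nltk.download("punkt")

# Stream a small fineweb-edu subset and split each document into sentences
docs = load_dataset("HuggingFaceFW/fineweb-edu", name="sample-10BT",
                    split="train", streaming=True)

sentences = []
for doc in docs.take(1000):                # small illustrative sample
    for sent in nltk.sent_tokenize(doc["text"]):
        if 20 <= len(sent) <= 200:         # keep reasonably sized sentences
            sentences.append(sent)

with open("sentences.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(sentences))
```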
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (mirrored in the sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3.0
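
In `transformers`, these settings correspond roughly to the following `TrainingArguments`; the output directory is a placeholder and this is a sketch, not necessarily the exact invocation used:

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; the Adam betas/epsilon and the
# linear schedule shown here are also the TrainingArguments defaults.
args = TrainingArguments(
    output_dir="pythia-finewebedu",   # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=3.0,
)
```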
### Training results

No overfitting was observed. As expected, the loss is lower with Pythia-160m than with Pythia-70m.
### Framework versions

- Transformers 4.39.3
- Pytorch 2.3.0+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
|