---
license: mit
---

# GPT-2 alpaca-clean

This repository contains a finetuned version of the GPT-2 language model, trained on the [alpaca-cleaned](https://huggingface.co/datasets/yahma/alpaca-cleaned) dataset, a cleaned and filtered version of the Stanford Alpaca dataset.

## Model Details

GPT-2 was finetuned on alpaca-cleaned using the Hugging Face Transformers library. The resulting model can be used for instruction-style natural language processing tasks such as text generation, summarization, and question answering.

## Usage

Load the finetuned model with the Hugging Face Transformers library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Rjonah321/gpt2-alpaca-clean"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Encode the prompt and sample a completion. pad_token_id is set
# explicitly because GPT-2 has no dedicated pad token.
input_text = "Write a short story about a dog."
input_ids = tokenizer.encode(input_text, return_tensors="pt")
output = model.generate(input_ids, max_length=200, do_sample=True, top_k=50,
                        top_p=0.95, num_return_sequences=1,
                        pad_token_id=tokenizer.eos_token_id)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
```

If plain prompts give weak results, see the prompt-format sketch at the end of this card.

## License

This project is licensed under the MIT License; see the [LICENSE](LICENSE) file for details.

## Acknowledgements

```
@misc{alpaca_cleaned_2023,
  author  = {Yahma},
  title   = {Alpaca-cleaned Dataset},
  year    = 2023,
  url     = {https://huggingface.co/datasets/yahma/alpaca-cleaned},
  note    = {Accessed: 2024-06-19},
  license = {CC-BY-4.0}
}
```
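
## Prompt format

Because the model was finetuned on alpaca-cleaned, it likely expects prompts in the Alpaca instruction template. The template below is the one used by the upstream Stanford Alpaca project; that this checkpoint was trained with exactly this formatting is an assumption, so compare outputs with and without it.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Rjonah321/gpt2-alpaca-clean"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Standard Alpaca template for instructions without additional input.
# Assumption: this checkpoint saw the same formatting during finetuning.
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nWrite a short story about a dog.\n\n"
    "### Response:\n"
)
input_ids = tokenizer.encode(prompt, return_tensors="pt")
output = model.generate(input_ids, max_length=200, do_sample=True, top_k=50,
                        top_p=0.95, pad_token_id=tokenizer.eos_token_id)
# Decode only the newly generated tokens so the template is not echoed back.
response = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)
```

If the formatted prompt produces noticeably more coherent responses than the raw prompt in the Usage section, that is good evidence the template matches the training data.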