---
language:
- en
license: apache-2.0
library_name: transformers
base_model:
- EleutherAI/gpt-neo-1.3B
datasets:
- legacy-datasets/wikipedia
metrics:
- perplexity
- accuracy
new_version: Kimargin/GPT-NEO-1.3B-wiki
model-index:
- name: GPT-NEO-1.3B-wiki
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 19.21
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Kimargin/GPT-NEO-1.3B-wiki
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 3.42
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Kimargin/GPT-NEO-1.3B-wiki
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 0.83
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Kimargin/GPT-NEO-1.3B-wiki
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 0.0
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Kimargin/GPT-NEO-1.3B-wiki
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 6.93
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Kimargin/GPT-NEO-1.3B-wiki
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 1.1
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Kimargin/GPT-NEO-1.3B-wiki
      name: Open LLM Leaderboard
---

# Model Card for GPT-NEO-1.3B-wiki

## Model Details

### Model Description

This model is based on [EleutherAI/gpt-neo-1.3B](https://huggingface.co/EleutherAI/gpt-neo-1.3B) and has been fine-tuned on the Wikipedia dataset. It is designed for English text generation tasks such as summarization, question answering, and text completion, with fine-tuning aimed at improving the fluency and factual accuracy of generated content.

- **Developed by:** Kimargin
- **Model type:** Causal language model (fine-tuned)
- **Language(s):** English
- **License:** Apache 2.0
- **Finetuned from model:** [EleutherAI/gpt-neo-1.3B](https://huggingface.co/EleutherAI/gpt-neo-1.3B)

### Model Sources

- **Repository:** [Kimargin/GPT-NEO-1.3B-wiki](https://huggingface.co/Kimargin/GPT-NEO-1.3B-wiki)

## Uses

### Direct Use

The model can be used for text generation, summarization, and question answering, producing coherent, generally factual English text from a prompt; a quick way to try it is sketched below.
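
A minimal sketch using the `transformers` pipeline API (the prompt and sampling settings here are illustrative, not from the card):

```python
from transformers import pipeline

# Load the fine-tuned model into a text-generation pipeline.
generator = pipeline("text-generation", model="Kimargin/GPT-NEO-1.3B-wiki")

# Illustrative prompt and sampling settings.
result = generator(
    "The history of Wikipedia begins with",
    max_new_tokens=50,
    do_sample=True,
    top_p=0.9,
)
print(result[0]["generated_text"])
```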

### Downstream Use

The model can be further fine-tuned for domain-specific applications such as legal or medical text generation, specialized question-answering systems, or structured content generation from prompts.

### Out-of-Scope Use

The model should not be used in critical applications (e.g., legal, medical, or financial advice), as it may generate biased, inaccurate, or misleading information. It is also not suited for real-time decision-making.

## Bias, Risks, and Limitations

Because the model was trained on Wikipedia data, it may inherit biases present in that dataset. Users should be cautious when generating sensitive or potentially biased content, and the model may produce inaccurate or misleading text when given ambiguous or misleading prompts.

### Recommendations

Users should verify the model's outputs, especially in critical use cases, and should not rely on the model alone for factual accuracy without human review.

## How to Get Started with the Model

Load the model and tokenizer with the `transformers` library:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Kimargin/GPT-NEO-1.3B-wiki")
model = AutoModelForCausalLM.from_pretrained("Kimargin/GPT-NEO-1.3B-wiki")

# Encode a prompt and generate a continuation of up to 100 tokens total.
input_text = "What happened during World War II?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(inputs["input_ids"], max_length=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Details

### Training Data

The model was fine-tuned on a subset of the Wikipedia dataset, which covers a broad range of general-knowledge topics. The dataset was chosen to improve the model's ability to generate accurate, general-domain text.

### Training Procedure

The model was fine-tuned in mixed precision (float16) on multiple GPUs for three epochs, with the objective of minimizing perplexity and improving the fluency of generated text.

### Training Hyperparameters

- **Learning rate:** 5e-5
- **Batch size:** 16
- **Epochs:** 3
- **Precision:** float16 (mixed precision)

A sketch of how these settings could be wired together follows this list.
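
The card does not state which training framework was used; the following is a minimal sketch assuming the Hugging Face `Trainer` API, with the dataset slice, tokenization, and output directory as illustrative assumptions:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base = "EleutherAI/gpt-neo-1.3B"
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # GPT-Neo defines no pad token
model = AutoModelForCausalLM.from_pretrained(base)

# Illustrative: a small slice of English Wikipedia; the card does not
# specify the exact subset or preprocessing (recent `datasets` versions
# may also require trust_remote_code=True for this script-based dataset).
raw = load_dataset("legacy-datasets/wikipedia", "20220301.en", split="train[:1000]")
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=raw.column_names,
)

training_args = TrainingArguments(
    output_dir="gpt-neo-1.3b-wiki",  # illustrative
    learning_rate=5e-5,              # from the card
    per_device_train_batch_size=16,  # from the card
    num_train_epochs=3,              # from the card
    fp16=True,                       # mixed precision, from the card
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```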

## Evaluation

### Testing Data

The model was evaluated on a held-out validation subset of the Wikipedia dataset to measure its performance on general text generation tasks.

### Metrics

- **Perplexity:** The model achieved a perplexity of 25.3 on the validation set (see the sketch after this list for how such a score is computed).
- **Accuracy:** The factual accuracy of generated answers was assessed qualitatively.
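
Perplexity is the exponential of the average next-token cross-entropy loss on held-out text. A minimal sketch of the computation (the sample text is illustrative; the card does not include the evaluation script):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Kimargin/GPT-NEO-1.3B-wiki")
model = AutoModelForCausalLM.from_pretrained("Kimargin/GPT-NEO-1.3B-wiki")
model.eval()

text = "Wikipedia is a free online encyclopedia."  # illustrative sample
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the mean cross-entropy loss
    # over next-token predictions (labels are shifted internally).
    loss = model(**inputs, labels=inputs["input_ids"]).loss

print(f"perplexity = {torch.exp(loss).item():.2f}")
```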

### Results

The model generates coherent and contextually relevant text, but it may still struggle with niche or specialized topics that are underrepresented in its training data.

## Environmental Impact

Training large models like GPT-Neo has a significant carbon footprint due to the computational resources required. The estimated environmental impact of fine-tuning this model is as follows:

- **Hardware Type:** NVIDIA A100 GPUs
- **Hours used:** 20 hours
- **Cloud Provider:** Google Cloud
- **Compute Region:** US-Central
- **Carbon Emitted:** ~50 kg CO2 (estimated using the [ML Impact calculator](https://mlco2.github.io/impact#compute))
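
As a rough sanity check, the calculator's arithmetic is essentially energy used times grid carbon intensity. Every input below except the 20 hours is an assumption (the card does not state GPU count, power draw, PUE, or grid intensity), so the result is order-of-magnitude only:

```python
# Back-of-the-envelope carbon estimate, mirroring the ML Impact
# calculator's approach. All values except `hours` are assumptions.
num_gpus = 8                 # assumed GPU count
gpu_power_kw = 0.4           # approximate A100 board power (400 W)
hours = 20                   # from the card
pue = 1.1                    # assumed datacenter power usage effectiveness
intensity_kg_per_kwh = 0.5   # assumed grid carbon intensity

energy_kwh = num_gpus * gpu_power_kw * hours * pue
emissions_kg = energy_kwh * intensity_kg_per_kwh
print(f"{energy_kwh:.0f} kWh -> ~{emissions_kg:.0f} kg CO2")  # ~35 kg here
```

Under these particular assumptions the estimate lands within a factor of two of the card's ~50 kg figure; different GPU counts or grid intensities shift it proportionally.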

## Technical Specifications

### Model Architecture and Objective

The model is a 1.3-billion-parameter causal language model based on the GPT-Neo architecture. It generates text by predicting the next token in a sequence, making it suitable for text completion and generation tasks.
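
To make the objective concrete, here is a minimal sketch of a single next-token prediction step (the prompt is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Kimargin/GPT-NEO-1.3B-wiki")
model = AutoModelForCausalLM.from_pretrained("Kimargin/GPT-NEO-1.3B-wiki")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch, seq_len, vocab_size)

# The distribution over the next token comes from the final position.
next_token_id = int(logits[0, -1].argmax())
print(tokenizer.decode(next_token_id))
```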

### Compute Infrastructure

The model was trained on NVIDIA A100 GPUs using Google Cloud infrastructure.

## Citation

If you use this model, please cite the original GPT-Neo release:

```bibtex
@software{gpt-neo,
  author = {Black, Sid and Gao, Leo and Wang, Phil and Leahy, Connor and Biderman, Stella},
  title = {GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-Tensorflow},
  year = {2021},
  url = {https://github.com/EleutherAI/gpt-neo}
}
```

## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)

Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Kimargin__GPT-NEO-1.3B-wiki).

| Metric              | Value |
|---------------------|------:|
| Avg.                |  5.25 |
| IFEval (0-Shot)     | 19.21 |
| BBH (3-Shot)        |  3.42 |
| MATH Lvl 5 (4-Shot) |  0.83 |
| GPQA (0-shot)       |  0.00 |
| MuSR (0-shot)       |  6.93 |
| MMLU-PRO (5-shot)   |  1.10 |