taskGPT2-xl v0.2a

---
datasets:
- cot
- cos_e
- math_qa
- CShorten/ML-ArXiv-Papers
- gsm8k
inference:
  parameters:
    max_new_tokens: 32
    temperature: 1
    top_k: 1
license: apache-2.0
language:
- en
pipeline_tag: text-generation
widget:
- text: "Please answer the following question. Who is going to be the next Ballon d'or?"
  example_title: "Question Answering"
- text: "Q: Can Geoffrey Hinton have a conversation with George Washington? Give the rationale before answering."
  example_title: "Logical reasoning"
- text: "Please answer the following question. What is the boiling point of Nitrogen?"
  example_title: "Scientific knowledge"
- text: "Answer the following yes/no question. Can you write a whole Haiku in a single tweet?"
  example_title: "Yes/no question"
- text: "Answer the following yes/no question by reasoning step-by-step. Can you write a whole Haiku in a single tweet?"
  example_title: "Reasoning task"
- text: "Q: ( False or not False or False ) is? A: Let's think step by step"
  example_title: "Boolean Expressions"
- text: "The square root of x is the cube root of y. What is y to the power of 2, if x = 4?"
  example_title: "Math reasoning"
- text: "Premise:  At my age you will probably have learnt one lesson. Hypothesis:  It's not certain how many lessons you'll learn by your thirties. Does the premise entail the hypothesis?"
  example_title: "Premise and hypothesis"

library_name: transformers
tags:
- finance
- code
---

<h1 style="font-size: 42px">taskGPT2-xl v0.2a</h1>


# Model Summary

> I fine-tuned GPT2 on text2code, CoT, math, and FLAN tasks; on some tasks it performs better than GPT-JT.

I combined a collection of open techniques and datasets to build taskGPT2-xl:
- The model was trained on a large collection of diverse data, including [Chain-of-Thought (CoT)](https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html) (not yet), the [FLAN dataset](https://github.com/google-research/FLAN), and the [Natural-Instructions (NI) dataset](https://github.com/allenai/natural-instructions).


# Quick Start

```python
from transformers import pipeline
pipe = pipeline(model='AlexWortega/taskGPT2-xl')
pipe('''"I love this!" Is it positive? A:''')
```
or
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
# Use the same Hub repository ID as in the pipeline example above.
tokenizer = AutoTokenizer.from_pretrained("AlexWortega/taskGPT2-xl")
model = AutoModelForCausalLM.from_pretrained("AlexWortega/taskGPT2-xl")
```
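
The metadata of this card sets `max_new_tokens: 32`, `temperature: 1`, and `top_k: 1` for the hosted widget. The sketch below shows one way to pass the same values to a local pipeline. The helper names `build_generation_kwargs` and `answer` are hypothetical, and enabling `do_sample` is an assumption made so that `top_k` and `temperature` take effect; with `top_k` set to 1 the output is effectively greedy.

```python
# Values copied from the `inference.parameters` block in this card's metadata.
WIDGET_PARAMETERS = {"max_new_tokens": 32, "temperature": 1.0, "top_k": 1}


def build_generation_kwargs(overrides=None):
    """Merge the widget defaults with optional caller overrides (hypothetical helper)."""
    kwargs = dict(WIDGET_PARAMETERS)
    # Assumption: sampling is switched on so that top_k and temperature are applied.
    kwargs["do_sample"] = True
    kwargs.update(overrides or {})
    return kwargs


def answer(prompt, model_id="AlexWortega/taskGPT2-xl", **overrides):
    """Generate a completion for `prompt` (hypothetical helper).

    transformers is imported lazily so the helper above stays dependency-free.
    """
    from transformers import pipeline

    pipe = pipeline("text-generation", model=model_id)
    outputs = pipe(prompt, **build_generation_kwargs(overrides))
    return outputs[0]["generated_text"]
```

Calling `answer("Q: ( False or not False or False ) is? A: Let's think step by step")` downloads the weights on first use; pass, for example, `max_new_tokens=64` to allow longer rationales.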

# License

The weights of taskGPT2-xl are licensed under version 2.0 of the Apache License.

# Training Details
I used datasets from Hugging Face:
 - strategyqa_train
 - aqua_train
 - qed_train


## Hyperparameters

I used the Novograd optimizer with a learning rate of 2e-5 and a global batch size of 6 (3 per data-parallel worker).
Training used both data parallelism and pipeline parallelism.
Input sequences were truncated to 512 tokens; sequences shorter than 512 tokens were concatenated into one longer sequence to improve data efficiency.
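
The truncate-and-concatenate step can be illustrated with a small greedy packer. This is a minimal sketch, not the training code used for this model: the function name `pack_sequences`, the greedy strategy, and the optional `sep_id` separator are all assumptions, since the card does not say how sequences were joined.

```python
from typing import Iterable, List, Optional

MAX_LEN = 512  # sequence length stated in the hyperparameters above


def pack_sequences(
    token_seqs: Iterable[List[int]],
    max_len: int = MAX_LEN,
    sep_id: Optional[int] = None,
) -> List[List[int]]:
    """Truncate each sequence to max_len, then greedily concatenate short ones.

    sep_id is a hypothetical separator token placed between joined sequences;
    leave it as None to concatenate without a separator.
    """
    packed: List[List[int]] = []
    current: List[int] = []
    for seq in token_seqs:
        seq = list(seq)[:max_len]  # truncation to the maximum length
        # Room needed: the sequence plus a separator if something is already buffered.
        needed = len(seq) + (1 if sep_id is not None and current else 0)
        if current and len(current) + needed > max_len:
            packed.append(current)  # buffer is full, start a new packed sequence
            current = []
        if current and sep_id is not None:
            current.append(sep_id)
        current.extend(seq)
    if current:
        packed.append(current)
    return packed
```

Each packed sequence is at most `max_len` tokens long, so a batch built from the output wastes fewer positions on padding than one built from the raw short sequences.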


# Metrics

Coming soon.

# References

## BibTeX entry and citation info

```bibtex
@article{taskgpt2xl2023,
  title={GPT2xl is underrated task solver},
  author={Nickolich Aleksandr and Karina Romanova and Arseniy Shahmatov and Maksim Gersimenko},
  year={2023}
}
```