metadata

datasets:
  - cot
  - cos_e
  - math_qa
  - CShorten/ML-ArXiv-Papers
  - gsm8k
inference:
  parameters:
    max_new_tokens: 32
    temperature: 1
    top_k: 1
license: apache-2.0
language:
  - en
pipeline_tag: text-generation
widget:
  - text: >-
      Please answer to the following question. Who is going to be the next
      Ballon d'or?
    example_title: Question Answering
  - text: >-
      Q: Can Geoffrey Hinton have a conversation with George Washington? Give
      the rationale before answering.
    example_title: Logical reasoning
  - text: >-
      Please answer the following question. What is the boiling point of
      Nitrogen?
    example_title: Scientific knowledge
  - text: >-
      Answer the following yes/no question. Can you write a whole Haiku in a
      single tweet?
    example_title: Yes/no question
  - text: >-
      Answer the following yes/no question by reasoning step-by-step. Can you
      write a whole Haiku in a single tweet?
    example_title: Reasoning task
  - text: 'Q: ( False or not False or False ) is? A: Let''s think step by step'
    example_title: Boolean Expressions
  - text: >-
      The square root of x is the cube root of y. What is y to the power of 2,
      if x = 4?
    example_title: Math reasoning
  - text: >-
      Premise:  At my age you will probably have learnt one lesson. Hypothesis: 
      It's not certain how many lessons you'll learn by your thirties. Does the
      premise entail the hypothesis?
    example_title: Premise and hypothesis
library_name: transformers
tags:
  - finance
  - code

taskGPT2-xl v0.2a

Model Summary

I fine-tuned GPT2 on text2code, CoT, math, and FLAN tasks. On some tasks it performs better than GPT-JT.

I combined a collection of open techniques and datasets to build taskGPT2-xl:

The model was trained on a large, diverse collection of data, including Chain-of-Thought (CoT, not yet), the FLAN dataset, and the Natural-Instructions (NI) dataset.

Quick Start

# Option 1: high-level pipeline API
from transformers import pipeline
pipe = pipeline(model='AlexWortega/taskGPT2-xl')
pipe('''"I love this!" Is it positive? A:''')

# Option 2: load the tokenizer and model directly (same Hub ID as above)
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("AlexWortega/taskGPT2-xl")
model = AutoModelForCausalLM.from_pretrained("AlexWortega/taskGPT2-xl")
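
The metadata at the top of this card lists the decoding parameters used by the hosted widget (max_new_tokens 32, temperature 1, top_k 1). Below is a minimal sketch of how the same settings might be passed to generate() locally. The helper name answer, the do_sample flag, and the pad_token_id choice are assumptions added for illustration and are not part of the original card.

```python
# Decoding settings copied from the card's `inference.parameters` metadata.
GENERATION_KWARGS = {
    "max_new_tokens": 32,
    "temperature": 1.0,
    "top_k": 1,
    # Assumption: sampling is enabled so that top_k is applied.
    # With top_k=1 this behaves like greedy decoding.
    "do_sample": True,
}


def answer(prompt, model_id="AlexWortega/taskGPT2-xl"):
    """Hypothetical helper: generate a short continuation for `prompt`."""
    # Imported lazily so the settings above can be inspected
    # without transformers or torch installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        # Assumption: reuse EOS as the padding token for generation.
        pad_token_id=tokenizer.eos_token_id,
        **GENERATION_KWARGS,
    )
    # Keep only the newly generated tokens, not the echoed prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Calling answer('"I love this!" Is it positive? A:') downloads the weights on first use and returns only the generated continuation.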

License

The weights of taskGPT2-xl are licensed under version 2.0 of the Apache License.

Training Details

I used datasets from Hugging Face:

strategyqa_train
aqua_train
qed_train

Hyperparameters

I used NovoGrad with a learning rate of 2e-5 and a global batch size of 6 (3 per data-parallel worker). Training used both data parallelism and pipeline parallelism. Input sequences were truncated to 512 tokens, and sequences shorter than 512 tokens were concatenated into one long sequence to improve data efficiency.
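
The truncate-and-concatenate step described above can be sketched as follows. This is only an illustration under stated assumptions, not the training code used for this model: the function name pack_sequences, the greedy packing order, and the use of the GPT-2 end-of-text id (50256) as a separator are all hypothetical.

```python
MAX_LEN = 512   # sequence length stated in this card
EOS_ID = 50256  # assumption: GPT-2 end-of-text token used as a separator


def pack_sequences(token_seqs, max_len=MAX_LEN, eos_id=EOS_ID):
    """Truncate long sequences and greedily concatenate short ones.

    Each input is a list of token ids. Every output list has at most
    `max_len` ids. No padding is added in this sketch.
    """
    packed = []
    current = []
    for seq in token_seqs:
        # Append a separator, then truncate so one piece never exceeds max_len.
        piece = (list(seq) + [eos_id])[:max_len]
        # Start a new packed example if this piece does not fit.
        if current and len(current) + len(piece) > max_len:
            packed.append(current)
            current = []
        current = current + piece
    if current:
        packed.append(current)
    return packed
```

For example, with max_len=8 and eos_id=0, the inputs [1]*10, [2, 2], [3, 3, 3], [4]*5 become three packed examples: the first is the truncated long sequence, the second joins the two short sequences, and the third holds the last one.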

References

Metrics

Coming soon.

BibTeX entry and citation info

@article{
  title={GPT2xl is underrated task solver},
  author={Nickolich Aleksandr and Karina Romanova and Arseniy Shahmatov and Maksim Gersimenko},
  year={2023}
}