---
datasets:
- cot
- cos_e
- math_qa
- CShorten/ML-ArXiv-Papers
- gsm8k
- code_x_glue_tc_text_to_code
- Muennighoff/P3
- HuggingFaceH4/self-instruct-seed
- truthful_qa
- empathetic_dialogues
inference:
  parameters:
    max_new_tokens: 32
    temperature: 1
    top_k: 1
license: apache-2.0
language:
- en
pipeline_tag: text-generation
widget:
- example_title: QA
  text: What is a BERT?
- example_title: Open domain QA
  text: Please answer the following question. What is the boiling point of Nitrogen?
- example_title: Themed text generation
  text: Generate text about BERT
---

<h1 style="font-size: 42px">taskGPT2-xl v0.2a</h1>

# Model Summary

> I finetuned GPT2-xl on text2code, chain-of-thought, math, and FLAN tasks; on some tasks it performs better than GPT-JT.

I used a collection of open techniques and datasets to build taskGPT2-xl:

- The model was trained on a large collection of diverse data, including [Chain-of-Thought (CoT)](https://ai.googleblog.com/2022/05/language-models-perform-reasoning-via.html), the [FLAN dataset](https://github.com/google-research/FLAN), and the [Natural-Instructions (NI) dataset](https://github.com/allenai/natural-instructions).

# Quick Start

```python
from transformers import pipeline

# The pipeline picks up the text-generation task from the model card metadata.
pipe = pipeline(model='AlexWortega/taskGPT2-xl')
pipe('''"I love this!" Is it positive? A:''')
```
or load the tokenizer and model directly:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AlexWortega/taskGPT2-xl")
model = AutoModelForCausalLM.from_pretrained("AlexWortega/taskGPT2-xl")
```
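
With the tokenizer and model loaded, text can be generated directly. Below is a minimal sketch (the prompt is taken from the widget examples above) that reuses the generation settings advertised in the metadata (`max_new_tokens: 32`, `temperature: 1`, `top_k: 1`):

```python
# Generation with the settings from the card's `inference.parameters`.
prompt = "Please answer the following question. What is the boiling point of Nitrogen?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=32,                    # as in the metadata above
    do_sample=True,
    temperature=1.0,
    top_k=1,                              # top_k=1 makes sampling effectively greedy
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token by default
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```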

# License

The weights of taskGPT2-xl are licensed under version 2.0 of the Apache License.

# Training Details

I used the following datasets from the Hugging Face Hub (a sketch of loading a few of the Hub datasets named in this card follows the list):
- strategyqa_train
- aqua_train
- qed_train
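
As an illustration only (the exact preprocessing pipeline is not published in this card), the Hub datasets named in the metadata above can be pulled with the `datasets` library; the `*_train` names in the list are assumed to refer to the train splits of the corresponding source datasets:

```python
from datasets import load_dataset

# Illustrative loading of a few Hub datasets named in this card's metadata.
gsm8k = load_dataset("gsm8k", "main", split="train")                      # grade-school math word problems
math_qa = load_dataset("math_qa", split="train")                          # multiple-choice math QA
truthful = load_dataset("truthful_qa", "generation", split="validation")  # truthful_qa ships only a validation split

print(gsm8k[0]["question"])
```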

## Hyperparameters

I used Novograd with a learning rate of 2e-5 and a global batch size of 6 (3 per data-parallel worker).
Training used both data parallelism and pipeline parallelism.
During training, input sequences were truncated to 512 tokens; sequences shorter than 512 tokens were concatenated into one long sequence to improve data efficiency (see the packing sketch below).
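
A minimal sketch of the truncation-and-packing scheme described above, assuming a standard GPT-2 tokenizer; `pack_examples` is a hypothetical helper, not the actual training code:

```python
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2-xl")
BLOCK_SIZE = 512  # training context length described above

def pack_examples(texts, block_size=BLOCK_SIZE):
    """Hypothetical packing helper: tokenize each text, concatenate the token
    streams with an EOS separator, and slice the result into fixed-length
    blocks (the trailing remainder shorter than `block_size` is dropped)."""
    stream = []
    for text in texts:
        stream.extend(tokenizer(text)["input_ids"] + [tokenizer.eos_token_id])
    return [stream[i:i + block_size]
            for i in range(0, len(stream) - block_size + 1, block_size)]

# Example: many short texts are packed into a handful of 512-token blocks.
blocks = pack_examples(["Explain BERT in one sentence."] * 200)
```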

# Metrics

Coming soon.

# References

## BibTeX entry and citation info

```bibtex
@misc{taskgpt2xl,
  title={GPT2xl is underrated task solver},
  author={Nickolich Aleksandr and Karina Romanova and Arseniy Shahmatov and Maksim Gersimenko},
  year={2023}
}
```