---
inference: false
tags:
- text-generation
- opt
license: other
commercial: false
---
# OPT-IML

## Model Description

[OPT-IML (OPT + Instruction Meta-Learning)](https://arxiv.org/abs/2212.12017) is a set of instruction-tuned versions of OPT, fine-tuned on a collection of ~2000 NLP tasks gathered from 8 NLP benchmarks, collectively called OPT-IML Bench.

We provide two model versions:
* OPT-IML, trained on 1500 tasks with several tasks held out for downstream evaluation, and
* OPT-IML-Max, trained on all ~2000 tasks.

### How to use

You can use this model directly with a pipeline for text generation.

```python
>>> from transformers import pipeline

>>> generator = pipeline("text-generation", model="facebook/opt-iml-1.3b")

>>> generator("What is the capital of USA?")
```
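For instruction-style use, it can help to assemble the instruction and any accompanying input into a single prompt string before passing it to the pipeline. The template below is a hypothetical sketch for illustration, not a format prescribed by the OPT-IML paper:

```python
# Hypothetical prompt template for instruction-style queries; the exact
# format is an illustrative assumption, not one defined by OPT-IML.

def build_prompt(instruction: str, input_text: str = "") -> str:
    """Join an instruction and an optional input into one prompt string."""
    if input_text:
        return f"{instruction}\n\nInput: {input_text}\nOutput:"
    return instruction

prompts = [
    build_prompt("What is the capital of USA?"),
    build_prompt("Summarize the following text.",
                 "OPT-IML is an instruction-tuned version of OPT."),
]
# The resulting strings can be passed to the pipeline, e.g.
# generator(prompts, max_new_tokens=32)
```

Passing a list of prompts lets the pipeline generate for several instructions in one call.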

### Limitations and bias

While OPT-IML models outperform baseline OPT on an extensive set of evaluations, they remain susceptible to the risks associated with large language models, including factual errors, generation of toxic language, and reinforcement of stereotypes. We release the OPT-IML models to support future work on instruction-tuning and to improve the availability of large instruction-tuned causal LMs, but their use should be accompanied by responsible best practices.

## Training data

OPT-IML models are trained on OPT-IML Bench, a large benchmark for Instruction Meta-Learning (IML) consolidating ~2000 NLP tasks into task categories from 8 existing benchmarks, including Super-NaturalInstructions, FLAN, and PromptSource.

## Training procedure

The texts are tokenized using the GPT2 byte-level version of Byte Pair Encoding (BPE) (for unicode characters), with a vocabulary size of 50272. The inputs are sequences of 2048 consecutive tokens.
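The fixed-length input format above can be sketched in a few lines: a tokenized corpus is treated as one flat stream of token ids and split into consecutive 2048-token chunks. Dummy ids stand in for real BPE output here, so this is an illustration of the packing step only:

```python
# Illustrative sketch of packing a token stream into fixed 2048-token
# training sequences. The integer ids below are dummies; a real run
# would use ids from the model's BPE tokenizer (vocab size 50272).

SEQ_LEN = 2048

def pack_sequences(token_ids, seq_len=SEQ_LEN):
    """Split a flat list of token ids into consecutive fixed-length
    chunks, dropping any trailing remainder shorter than seq_len."""
    n_full = len(token_ids) // seq_len
    return [token_ids[i * seq_len:(i + 1) * seq_len] for i in range(n_full)]

stream = list(range(5000))   # stand-in for a tokenized corpus
chunks = pack_sequences(stream)
# 5000 tokens -> two full 2048-token sequences; the 904-token tail is dropped
```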

The 30B model was fine-tuned on 64 40GB A100 GPUs. During fine-tuning, the models saw approximately 2 billion tokens, which is only 0.6% of the pre-training budget of OPT.

### BibTeX entry and citation info

```bibtex
@misc{iyer2022opt,
  title={OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization},
  author={Iyer, Srinivasan and Lin, Xi Victoria and Pasunuru, Ramakanth and Mihaylov, Todor and Simig, D{\'a}niel and Yu, Ping and Shuster, Kurt and Wang, Tianlu and Liu, Qing and Koura, Punit Singh and others},
  year={2022},
  eprint={2212.12017},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```