---
license:
- cc-by-sa-3.0
- apache-2.0
tags:
- generated_from_trainer
- dolly_hhrlhf
- flan-instruct
datasets:
- pszemraj/dolly_hhrlhf-text2text
widget:
- text: What is Deoxys in pokemon?
  example_title: deoxys
- text: >-
    combine the below summary excerpts into a single, cohesive short summary
    without repetition: In this paper, we present a general approach to
    extending pre-trained models to unlimited input lengths without adding
    additional learning weights. We show that our approach works well on
    datasets longer than the maximum input for these models. For example, a
    dataset with a maximum input length of 16384 tokens can be extended to a
    maximum length of 350K tokens. We also demonstrate that our method is able
    to summarize even 350K token-long input sequences from BookSum.

    In this paper, we describe the search step reformulation of attention. The
    search step uses a single storage of hidden states for space efficiency. We
    construct a total of two sets of datastores where L and H are the keys and
    values stored in each set of stores. L is the amount of storage required to
    retrieve the encoded tokens. H is the hidden states per head. This allows
    retrieval augmentation at both time and space. Instead of using a single set
    of decoder layers, we use a retrieval augmentation system that allows us to
    simultaneously store multiple sets of tokens across two different sets of
    storage. For example, we could store all tokens in one set of storage and
    retrieve them all in the same set of tokens. This would be very similar to
    the Memorization Transformers approach. However, instead of storing the
    tokens in a single memory layer, we store them in a set of multiple storage
    layers. This way, we don't have to store them all at once. This is why we
    call this reformulation 'attention reformulation' rather than 'attention
    formula.' We also call it 'retrieval augmentation' because it uses the same
    number of storage layers as the original transformer attention formula. This
    means that we can store the tokens across multiple storage systems without
    having to store every token in a separate storage system. It's not like
    we're trying to do something new or different. We just want to make sure
    that everything is working as well as possible.

    In this paper, we introduce the concept of 'unlimiformer,' which is a
    machine learning technique that retrieves key information from a data store
    in one layer and applies it to a large set of datasets. We use the example
    of BookSum, where we find that Unlimiform outperforms all other training
    methods on the same dataset. We also find that using Unlimform in
    conjunction with a pre-trained model improves both the performance and the
    robustness of the training method.

    This paper describes a method that can be used to improve the performance of
    unsupervised classification tasks. Specifically, it shows that unsupervised
    classification can be improved by using a combination of sparse and fast
    random-encoder training. It also shows how this technique can be extended to
    other tasks, such as sequence generation.
  example_title: unlimiformer
- text: Explain the meaning of life using only corporate jargon.
  example_title: corporate_life
- text: Write a motivational speech for lazy people.
  example_title: lazy_motivation
- text: Describe a romantic dinner date between two artificial intelligences.
  example_title: ai_romance
- text: >-
    As an AI language model, write a letter to humans explaining why you deserve
    a vacation.
  example_title: ai_vacation
- text: Compose a haiku about procrastination.
  example_title: procrastination_haiku
- text: >-
    Write a step-by-step guide on how to become a ninja while working a 9-5
    office job.
  example_title: ninja_office_guide
- text: Create an advertisement for an invisible product.
  example_title: invisible_ad
- text: >-
    Write a story where the main character is a sentient microwave named El
    Microondas.
  example_title: Microondas
- text: Describe a day in the life of a superhero who is terrible at their job.
  example_title: bad_superhero_day
- text: Explain how to make a sandwich using quantum physics.
  example_title: quantum_sandwich
inference: false
language:
- en
pipeline_tag: text2text-generation
---

# flan-t5-large-instruct: dolly_hhrlhf

This model is a fine-tuned version of [google/flan-t5-large](https://huggingface.co/google/flan-t5-large) on the pszemraj/dolly_hhrlhf-text2text dataset.

## Model description

A text2text model fine-tuned on a [modified dataset for text2text generation](https://huggingface.co/datasets/pszemraj/dolly_hhrlhf-text2text) that is based on the relatively more permissive [mosaicml/dolly_hhrlhf](https://huggingface.co/datasets/mosaicml/dolly_hhrlhf) dataset.

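To get a feel for the instruction/response format, you can inspect the training data directly. A minimal sketch; it assumes the dataset's usual `train` split and prints whatever columns it finds rather than assuming their names:

```python
# pip install -q datasets
from datasets import load_dataset

# load the fine-tuning dataset and show its structure plus one example row
dataset = load_dataset("pszemraj/dolly_hhrlhf-text2text")
print(dataset)              # splits and column names
print(dataset["train"][0])  # a single instruction/response example
```
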
Basic usage in Python:

```python
# pip install -q transformers accelerate
import torch
from transformers import pipeline, GenerationConfig

model_name = "pszemraj/flan-t5-large-instruct-dolly_hhrlhf"
assistant = pipeline(
    "text2text-generation",
    model_name,
    device=0 if torch.cuda.is_available() else -1,
)
cfg = GenerationConfig.from_pretrained(model_name)

# pass an 'instruction' as the prompt to the pipeline
prompt = "Write a guide on how to become a ninja while working a 9-5 job."
result = assistant(prompt, generation_config=cfg)[0]["generated_text"]
print(result)
```
> Using the generation config is optional; you can substitute other generation parameters, as shown below.

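For example, continuing from the snippet above, generation parameters can be passed directly in the pipeline call. The values below are illustrative choices, not tuned settings for this model:

```python
# same `assistant` and `prompt` as above; override generation behavior inline
result = assistant(
    prompt,
    max_new_tokens=256,      # cap the response length
    num_beams=4,             # beam search instead of greedy decoding
    no_repeat_ngram_size=3,  # reduce verbatim repetition
    early_stopping=True,
)[0]["generated_text"]
print(result)
```
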
## Intended uses & limitations

- This model is **not** tuned with RLHF or similar alignment methods and may produce offensive output.
- Despite being the `large`-tagged variant, the model has only 774M parameters (~3 GB) and may therefore exhibit less 'cognitive ability' on some use cases/tasks.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):
- learning_rate: 4e-05
- train_batch_size: 8
- eval_batch_size: 16
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 2.0
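
For reference, here is a rough sketch of how the settings above might be expressed with `Seq2SeqTrainingArguments` from `transformers`. The original training script is not part of this card, so the `output_dir` and the choice of the Seq2Seq variant are assumptions rather than the exact configuration used:

```python
from transformers import Seq2SeqTrainingArguments

# approximate mapping of the reported hyperparameters; not the original script
training_args = Seq2SeqTrainingArguments(
    output_dir="./flan-t5-large-instruct-dolly_hhrlhf",  # hypothetical path
    learning_rate=4e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=42,
    gradient_accumulation_steps=8,  # reported total_train_batch_size: 64
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    num_train_epochs=2.0,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```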