---
license: apache-2.0
datasets:
- databricks/databricks-dolly-15k
---
![Banner](tinyChat.jpeg)

# tinyChat: Instruction-Based LLM, <1% the size of GPT-3

Introducing tinyChat, an instruction-based Large Language Model (LLM) that is less than 1% the size of GPT-3. tinyChat is an open-source model released under the Apache 2.0 license and based on Google's Flan-T5-Large, a 770M-parameter model. After fine-tuning on the databricks-dolly-15k dataset, tinyChat produces better outputs than Flan-T5 on a range of tasks. Although it is not as performant as larger models, tinyChat can handle a variety of NLP tasks, such as summarization, question answering, and sentiment analysis, using instruction prompts.
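
The size comparison can be checked with simple arithmetic. This is a minimal sketch that assumes the 175-billion-parameter figure published for GPT-3, which this README does not state; the variable names are illustrative.

```python
# 770M is the Flan-T5-Large parameter count quoted in this README.
TINYCHAT_PARAMS = 770_000_000
# Assumption: the published GPT-3 size of 175B parameters (not stated in this README).
GPT3_PARAMS = 175_000_000_000

ratio = TINYCHAT_PARAMS / GPT3_PARAMS
print(f"tinyChat is {ratio:.2%} the size of GPT-3")  # prints 0.44%
```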

tinyChat is available on the Hugging Face model hub, and the code repository is on [GitHub](https://github.com/Leadmatic/tinyChat).


## Dataset

[databricks-dolly-15k](https://huggingface.co/datasets/databricks/databricks-dolly-15k) is an open-source dataset of instruction-following records generated by thousands of Databricks employees. It covers several of the behavioral categories outlined in the InstructGPT paper: brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization.
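
Each record in databricks-dolly-15k carries `instruction`, `context`, `response`, and `category` fields. The sketch below shows one way such a record could be turned into an input/target pair for a seq2seq model. The `build_example` helper, the prompt layout, and the sample record are hypothetical and are not taken from the tinyChat training code.

```python
from typing import Tuple


def build_example(record: dict) -> Tuple[str, str]:
    """Turn one dolly-style record into (input_text, target_text).

    Hypothetical helper: the prompt layout is an assumption for illustration.
    """
    instruction = record["instruction"].strip()
    context = record.get("context", "").strip()
    # Append the context only when the record has one (e.g. closed QA).
    if context:
        source = f"{instruction}\n\nContext: {context}"
    else:
        source = instruction
    return source, record["response"].strip()


# Made-up record in the dataset's field layout, for illustration only.
sample = {
    "instruction": "Summarize the passage in one sentence.",
    "context": "tinyChat is a fine-tuned Flan-T5-Large model.",
    "response": "tinyChat is a fine-tuned version of Flan-T5-Large.",
    "category": "summarization",
}

source_text, target_text = build_example(sample)
print(source_text)
print(target_text)
```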


## Benchmark

The following results come from running different models on the EleutherAI LM Evaluation Harness. tinyChat scores below the larger models on every task except boolq. Compared with its base model, Flan-T5-Large, it is slightly better on openbookqa and worse on the other tasks. However, tinyChat produces better outputs than its base model when given creative prompts; see the [blog post](https://leadmaticv3.webflow.io/blog/tinychat) for examples. This gap points to the limitations of these benchmarks for evaluating generative models.

| model                    | openbookqa | arc_easy | winogrande | hellaswag | arc_challenge | piqa    | boolq   |
|--------------------------|-----------|----------|------------|-----------|---------------|---------|---------|
| cerebras/Cerebras-GPT-13B | 0.36      | 0.598906 | 0.607735   | 0.593109  | 0.325939      | 0.749728 | 0.611621|
| EleutherAI/gpt-j-6B       | 0.382     | 0.621633 | 0.651144   | 0.662617  | 0.363481      | 0.761153 | 0.655963|
| dolly-v1-6b (1 epoch)     | 0.428     | 0.608586 | 0.633781   | 0.650568  | 0.377133      | 0.761697 | 0.69633 |
| dolly-v1-6b (10 epochs)   | 0.41      | 0.62963  | 0.643252   | 0.676758  | 0.384812      | 0.773667 | 0.687768|
| EleutherAI/gpt-neox-20b   | 0.402     | 0.683923 | 0.656669   | 0.7142    | 0.408703      | 0.784004 | 0.695413|
| google/flan-t5-large      | 0.3120    | 0.5724   | 0.5991     | 0.4871    | 0.3072        | 0.7220   | 0.8645  |
| leadmatic/tinyChat        | 0.3320    | 0.4811   | 0.5825     | 0.4519    | 0.2961        | 0.7073   | 0.8358  |
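
The comparison between tinyChat and its base model can be read directly off the last two rows of the table. As a small sketch, the snippet below recomputes the per-task differences from those rows; the variable names are illustrative.

```python
# Scores copied from the google/flan-t5-large and leadmatic/tinyChat rows above.
tasks = ["openbookqa", "arc_easy", "winogrande", "hellaswag",
         "arc_challenge", "piqa", "boolq"]
flan_t5_large = [0.3120, 0.5724, 0.5991, 0.4871, 0.3072, 0.7220, 0.8645]
tiny_chat = [0.3320, 0.4811, 0.5825, 0.4519, 0.2961, 0.7073, 0.8358]

# Positive delta means tinyChat scored higher than its base model.
deltas = {
    task: round(tiny - base, 4)
    for task, base, tiny in zip(tasks, flan_t5_large, tiny_chat)
}
improved = [task for task, delta in deltas.items() if delta > 0]

for task, delta in deltas.items():
    print(f"{task:<14}{delta:+.4f}")
print("improved:", improved)  # only openbookqa
```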


## Limitations

tinyChat is prone to hallucination and displays model bias. It is under active development and is currently intended for research purposes only.

## Running the Code

```python
import transformers
from peft import PeftModel  # PeftModel comes from the peft package, not transformers

model_name = "google/flan-t5-large"
peft_model_id = "Leadmatic/tinyChat"

# Load the base Flan-T5-Large tokenizer and model.
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)
base_model = transformers.AutoModelForSeq2SeqLM.from_pretrained(model_name)
# Apply the tinyChat adapter weights on top of the base model.
peft_model = PeftModel.from_pretrained(base_model, peft_model_id)

# Tokenize the instruction, sample a response, and decode it.
inputs = tokenizer("""[INSERT INSTRUCTION HERE]""", return_tensors="pt")
outputs = peft_model.generate(**inputs, max_length=300, do_sample=True)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```