flexudy committed on
Commit da43d4f • 1 Parent(s): 446e7f9
Files changed (1)
  1. README.md +54 -0
README.md ADDED
@@ -0,0 +1,54 @@
+ # Cheapity3 🐷
+ GPT3-like T5 model trained to generate text in multiple languages.
+
+ ## Motivation
+ - GPT models are expensive to run.
+ - GPT models are monolingual.
+
+ ## Solution
+ - Maybe, Small Models aren't Terrible (*SMarT*)
+ - Plus, they are cheaper to run.
+
+ I fine-tuned T5 on multiple languages (🇬🇧 English, 🇩🇪 German, 🇫🇷 French) and on academic text snippets from various
+ domains such as tech, law, finance, and science to generate text, just like GPT models do.
+
+ ## Usage
+ - Provide some text, e.g. `"Italy, officially the Italian Republic is a country consisting of"`
+ - Tell Cheapity3 how many words you want to generate, e.g. `15` -- 😃 Yes, you can control the length.
+ - Cheapity3 reads your text and generates a continuation containing approximately 15 words.
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
+
+ tokenizer = AutoTokenizer.from_pretrained("flexudy/cheapity3")
+
+ # T5 is an encoder-decoder model, so load it with the seq2seq auto class
+ # (AutoModelWithLMHead is deprecated).
+ model = AutoModelForSeq2SeqLM.from_pretrained("flexudy/cheapity3")
+
+ # Prompt format: "guess:" prefix, your text, then one "_" per word to generate.
+ input_text = "guess: Italy, officially the Italian Republic is a country consisting of { _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ }" # 15 words
+
+ # Call the tokenizer directly so the result holds both input_ids and attention_mask
+ # (tokenizer.encode returns only the token ids).
+ inputs = tokenizer(input_text, return_tensors="pt", truncation=True, max_length=512)
+
+ input_ids = inputs["input_ids"]
+
+ attention_mask = inputs["attention_mask"]
+
+ outputs = model.generate(
+     input_ids=input_ids,
+     attention_mask=attention_mask,
+     max_length=128,
+     do_sample=True,
+     early_stopping=True,
+     num_return_sequences=4,
+     repetition_penalty=2.5
+ )
+
+ # Print each of the 4 sampled continuations
+ for i in range(4):
+     print(tokenizer.decode(outputs[i], skip_special_tokens=True, clean_up_tokenization_spaces=True))
+
+ # >
+ # >
+ # >
+ # >
+ ```
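
The prompt in the usage example appears to follow a fixed pattern: a `guess:` prefix, the text to continue, and a pair of braces holding one `_` per requested word. Assuming that pattern holds in general (it is inferred from the single example in this README, not from any documented spec), a small helper can build the prompt for any length. The name `build_prompt` is hypothetical and not part of the model or of `transformers`.

```python
def build_prompt(text: str, n_words: int) -> str:
    """Build a Cheapity3-style prompt asking for roughly n_words more words.

    Assumption: the format "guess: <text> { _ _ ... }" is inferred from the
    README example; the model card does not document it formally.
    """
    if n_words < 1:
        raise ValueError("n_words must be at least 1")
    blanks = " ".join(["_"] * n_words)  # one underscore per requested word
    # Doubled braces in an f-string emit literal "{" and "}"
    return f"guess: {text.strip()} {{ {blanks} }}"


prompt = build_prompt(
    "Italy, officially the Italian Republic is a country consisting of", 15
)
print(prompt)
```

The returned string can be passed to the tokenizer in place of the hard-coded `input_text`.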
+
+ #