---
license: apache-2.0
datasets:
- Open-Orca/OpenOrca
- OpenAssistant/oasst_top1_2023-08-25
language:
- bg
- ca
- cs
- da
- de
- en
- es
- fr
- hr
- hu
- it
- nl
- pl
- pt
- ro
- ru
- sl
- sr
- sv
- uk
library_name: transformers
---

```
reference-data-model:

  datasets:
    - OpenAssistant/oasst_top1_2023-08-25:
        lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk"
        link: https://huggingface.co/datasets/OpenAssistant/oasst_top1_2023-08-25

  model:
    - Open-Orca/Mistral-7B-OpenOrca:
        link: https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca

  100 generation examples:
    link: https://docs.google.com/spreadsheets/d/1_4rqFnhgvjA7trwAaEidaRWczAMzuKpw/edit?usp=sharing&ouid=116592149115238887304&rtpof=true&sd=true
```

## Version
```py
import torch, transformers, torchvision
torch.__version__, transformers.__version__, torchvision.__version__
# OUTPUTS: ('2.0.1+cu118', '4.34.0.dev0', '0.15.2+cu118')
```
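
The version tuple above can also be checked programmatically. A minimal sketch (the helper name and minimum versions are my own choices, not from the model card) that parses the `major.minor.patch` core out of strings like `'2.0.1+cu118'`:

```python
def parse_version(v):
    """Parse 'major.minor.patch' from version strings like '2.0.1+cu118' or '4.34.0.dev0'."""
    core = v.split("+")[0]                       # drop local build tags such as '+cu118'
    return tuple(int(p) for p in core.split(".")[:3])

# The versions reported in the snippet above.
assert parse_version("2.0.1+cu118") >= (2, 0, 0)     # torch
assert parse_version("4.34.0.dev0") >= (4, 33, 0)    # transformers
assert parse_version("0.15.2+cu118") >= (0, 15, 0)   # torchvision
```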
+
56
+ ## How to use
57
+ ```py
58
+
59
+ from transformers import (
60
+ AutoModelForCausalLM,
61
+ AutoTokenizer,
62
+ BitsAndBytesConfig,
63
+ HfArgumentParser,
64
+ TrainingArguments,
65
+ pipeline,
66
+ logging,
67
+ GenerationConfig,
68
+ TextIteratorStreamer,
69
+ )
70
+ import torch
71
+
72
+ # model_id = 'Open-Orca/Mistral-7B-OpenOrca'
73
+ model_id='NickyNicky/Mistral-7B-OpenOrca-oasst_top1_2023-08-25-v2'
74
+
75
+ model = AutoModelForCausalLM.from_pretrained(model_id,
76
+ device_map="auto",
77
+ trust_remote_code=True,
78
+ torch_dtype=torch.bfloat16,
79
+ load_in_4bit=True,
80
+ low_cpu_mem_usage= True,
81
+ )
82
+
83
+ max_length=2048
84
+ print("max_length",max_length)
85
+
86
+
87
+ tokenizer = AutoTokenizer.from_pretrained(model_id,
88
+ # use_fast = False,
89
+ max_length=max_length,)
90
+
91
+ tokenizer.pad_token = tokenizer.eos_token
92
+ tokenizer.padding_side = 'right'
93
+
94
+ #EXAMPLE #1
95
+ txt="""<|im_start|>user
96
+ I'm looking for an efficient Python script to output prime numbers. Can you help me out? I'm interested in a script that can handle large numbers and output them quickly. Also, it would be great if the script could take a range of numbers as input and output all the prime numbers within that range. Can you generate a script that fits these requirements? Thanks!<|im_end|>
97
+ <|im_start|>assistant
98
+ """
99
+
100
+ #EXAMPLE #2
101
+ txt="""<|im_start|>user
102
+ Estoy desarrollando una REST API con Nodejs, y estoy tratando de aplicar algún sistema de seguridad, ya sea con tokens o algo similar, me puedes ayudar?<|im_end|>
103
+ <|im_start|>assistant
104
+ """
105
+
106
+ inputs = tokenizer.encode(txt, return_tensors="pt").to("cuda")
107
+
108
+ generation_config = GenerationConfig(
109
+ max_new_tokens=max_new_tokens,
110
+ temperature=0.7,
111
+ top_p=0.9,
112
+ top_k=len_tokens,
113
+ repetition_penalty=1.11,
114
+ do_sample=True,
115
+ # pad_token_id=tokenizer.eos_token_id,
116
+ # eos_token_id=tokenizer.eos_token_id,
117
+ # use_cache=True,
118
+ # stopping_criteria= StoppingCriteriaList([stopping_criteria]),
119
+ )
120
+ outputs = model.generate(generation_config=generation_config,
121
+ input_ids=inputs,)
122
+ return tokenizer.decode(outputs[0], skip_special_tokens=False) #True
123
+ ```