---
license: apache-2.0
datasets:
- togethercomputer/RedPajama-Data-1T
language:
- en
pipeline_tag: text-generation
tags:
- sharded
- bf16
- instruct
---

# togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1

This is the `togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1` model, with the model file(s) sharded to ~2 GB each so that it can be loaded on low-RAM runtimes (like Colab).
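
For reference, here is a minimal sketch of how shards like these can be produced with `save_pretrained(max_shard_size=...)`. This is an illustration, not necessarily the exact process used for this repo, and `out_dir` is a hypothetical output path:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1"
out_dir = "./RedPajama-INCITE-Instruct-7B-v0.1-sharded-bf16"  # hypothetical path

# load the original weights in bf16, then re-save with a ~2 GB shard cap;
# this writes multiple weight shards plus an index file
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.bfloat16)
model.save_pretrained(out_dir, max_shard_size="2GB")

# save the tokenizer alongside the shards
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.save_pretrained(out_dir)
```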

Please refer to the [original model card](https://huggingface.co/togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1) for all details/issues w.r.t. this model. Below is an adapted version of the inference code, provided as a reference.

## basic inference

See the original model card for more options, etc.

Install the required packages:

```bash
pip install -U transformers accelerate
```

Inference (this will use a GPU if available):

```python
import torch
import transformers
from packaging import version
from transformers import AutoTokenizer, AutoModelForCausalLM

MIN_TRANSFORMERS_VERSION = "4.25.1"

# check the transformers version (compare parsed versions, not raw strings)
assert version.parse(transformers.__version__) >= version.parse(
    MIN_TRANSFORMERS_VERSION
), f"Please upgrade transformers to version {MIN_TRANSFORMERS_VERSION} or higher."

model_name = "ethzanalytics/RedPajama-INCITE-Instruct-7B-v0.1-sharded-bf16"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

# run inference
prompt = "Q: The capital of France is?\nA:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
input_length = inputs.input_ids.shape[1]
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.7,
    top_p=0.7,
    top_k=50,
    return_dict_in_generate=True,
)
# decode only the newly generated tokens (skip the prompt)
tokens = outputs.sequences[0, input_length:]
output_str = tokenizer.decode(tokens)
print(output_str)
"""
Paris
"""
```
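
If even the bf16 shards don't fit in memory, a further option is 8-bit quantized loading. A minimal sketch, assuming a CUDA GPU and `bitsandbytes` installed (this is not covered by the original model card):

```python
# sketch: 8-bit loading for very low-RAM setups; requires a CUDA GPU
# and `pip install bitsandbytes` (assumption: not part of the original card)
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ethzanalytics/RedPajama-INCITE-Instruct-7B-v0.1-sharded-bf16"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, device_map="auto", load_in_8bit=True
)
```

Note that int8 loading roughly halves the memory footprint relative to bf16, at some cost in generation quality and speed.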