Tags: Transformers · English · ctranslate2 · int8 · float16 · Inference Endpoints
michaelfeil committed · commit 48bd069 · 1 parent: 3b26078

Upload togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1 ctranslate fp16 weights

Files changed (1): README.md (+5 −5)
README.md CHANGED
@@ -34,18 +34,18 @@ inference:
   max_new_tokens: 128
 ---
 # # Fast-Inference with Ctranslate2
-Speedup inference by 2x-8x using int8 inference in C++
+Speedup inference while reducing memory by 2x-4x using int8 inference in C++ on CPU or GPU.
 
 quantized version of [togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1](https://huggingface.co/togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1)
 ```bash
-pip install hf-hub-ctranslate2>=2.0.6 ctranslate2>=3.13.0
+pip install hf-hub-ctranslate2>=2.0.6
 ```
 Converted on 2023-05-19 using
 ```
-ct2-transformers-converter --model togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1 --output_dir /home/michael/tmp-ct2fast-RedPajama-INCITE-Instruct-7B-v0.1 --force --copy_files tokenizer.json README.md tokenizer_config.json generation_config.json special_tokens_map.json .gitattributes --quantization float16
+ct2-transformers-converter --model togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1 --output_dir /home/feil_m/tmp-ct2fast-RedPajama-INCITE-Instruct-7B-v0.1 --force --copy_files tokenizer.json README.md tokenizer_config.json generation_config.json special_tokens_map.json .gitattributes --quantization float16
 ```
 
-Checkpoint compatible to [ctranslate2](https://github.com/OpenNMT/CTranslate2) and [hf-hub-ctranslate2](https://github.com/michaelfeil/hf-hub-ctranslate2)
+Checkpoint compatible to [ctranslate2>=3.13.0](https://github.com/OpenNMT/CTranslate2) and [hf-hub-ctranslate2>=2.0.6](https://github.com/michaelfeil/hf-hub-ctranslate2)
 - `compute_type=int8_float16` for `device="cuda"`
 - `compute_type=int8` for `device="cpu"`
 
@@ -63,7 +63,7 @@ model = GeneratorCT2fromHfHub(
     tokenizer=AutoTokenizer.from_pretrained("togethercomputer/RedPajama-INCITE-Instruct-7B-v0.1")
 )
 outputs = model.generate(
-    text=["How do you call a fast Flan-ingo?", "User: How are you doing?"],
+    text=["How do you call a fast Flan-ingo?", "User: How are you doing? Bot:"],
 )
 print(outputs)
 ```
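The card pairs each device with a recommended `compute_type` (`int8_float16` on CUDA, `int8` on CPU). A minimal sketch of that rule as a helper — this function is hypothetical and not part of ctranslate2 or hf-hub-ctranslate2:

```python
def pick_compute_type(device: str) -> str:
    """Return the compute_type the model card recommends for a ctranslate2 device."""
    if device == "cuda":
        return "int8_float16"  # int8 weights with float16 activations on GPU
    if device == "cpu":
        return "int8"  # pure int8 inference on CPU
    raise ValueError(f"no recommended compute_type for device {device!r}")
```

The returned string can be passed straight to the `compute_type=` argument shown in the usage snippet above.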
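The diff also appends a trailing `Bot:` to the chat-style prompt, so the base model continues the turn as the assistant rather than extending the user's message. A hypothetical one-line helper capturing that `User:`/`Bot:` convention:

```python
def chat_prompt(user_message: str) -> str:
    """Format a single-turn prompt in the User:/Bot: style used in the example."""
    return f"User: {user_message} Bot:"

# Reproduces the prompt list from the updated example.
prompts = ["How do you call a fast Flan-ingo?", chat_prompt("How are you doing?")]
```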