michaelfeil committed
Commit 2b639aa
Parent: deab3ca

Upload EleutherAI/pythia-160m ctranslate fp16 weights

Files changed (1): README.md (+5 -5)
README.md CHANGED
@@ -13,18 +13,18 @@ datasets:
 - the_pile
 ---
 # # Fast-Inference with Ctranslate2
-Speedup inference by 2x-8x using int8 inference in C++
+Speedup inference while reducing memory by 2x-4x using int8 inference in C++ on CPU or GPU.
 
 quantized version of [EleutherAI/pythia-160m](https://huggingface.co/EleutherAI/pythia-160m)
 ```bash
-pip install hf-hub-ctranslate2>=2.0.6 ctranslate2>=3.13.0
+pip install hf-hub-ctranslate2>=2.0.6
 ```
 Converted on 2023-05-19 using
 ```
-ct2-transformers-converter --model EleutherAI/pythia-160m --output_dir /home/michael/tmp-ct2fast-pythia-160m --force --copy_files tokenizer.json README.md tokenizer_config.json special_tokens_map.json .gitattributes --quantization float16
+ct2-transformers-converter --model EleutherAI/pythia-160m --output_dir /home/feil_m/tmp-ct2fast-pythia-160m --force --copy_files tokenizer.json README.md tokenizer_config.json special_tokens_map.json .gitattributes --quantization float16
 ```
 
-Checkpoint compatible to [ctranslate2](https://github.com/OpenNMT/CTranslate2) and [hf-hub-ctranslate2](https://github.com/michaelfeil/hf-hub-ctranslate2)
+Checkpoint compatible to [ctranslate2>=3.13.0](https://github.com/OpenNMT/CTranslate2) and [hf-hub-ctranslate2>=2.0.6](https://github.com/michaelfeil/hf-hub-ctranslate2)
 - `compute_type=int8_float16` for `device="cuda"`
 - `compute_type=int8` for `device="cpu"`
 
@@ -42,7 +42,7 @@ model = GeneratorCT2fromHfHub(
     tokenizer=AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
 )
 outputs = model.generate(
-    text=["How do you call a fast Flan-ingo?", "User: How are you doing?"],
+    text=["How do you call a fast Flan-ingo?", "User: How are you doing? Bot:"],
 )
 print(outputs)
 ```
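The hunks above only show the tail of the README's Python snippet. As a minimal end-to-end sketch, assuming the documented `GeneratorCT2fromHfHub` API of hf-hub-ctranslate2 and an assumed Hub repo id for the converted weights (the id below is not part of this commit):

```python
# Minimal usage sketch, not part of this commit.
# Assumptions: hf-hub-ctranslate2>=2.0.6 exposes GeneratorCT2fromHfHub
# as documented, and the converted weights live in the (assumed) repo
# "michaelfeil/ct2fast-pythia-160m".
from hf_hub_ctranslate2 import GeneratorCT2fromHfHub
from transformers import AutoTokenizer

model = GeneratorCT2fromHfHub(
    model_name_or_path="michaelfeil/ct2fast-pythia-160m",  # assumed repo id
    device="cuda",                # GPU; see the CPU variant below
    compute_type="int8_float16",  # per the README bullet for device="cuda"
    tokenizer=AutoTokenizer.from_pretrained("EleutherAI/pythia-160m"),
)
outputs = model.generate(
    text=["How do you call a fast Flan-ingo?", "User: How are you doing? Bot:"],
)
print(outputs)
```

Although the conversion command saved the weights as float16, CTranslate2 converts them to the requested `compute_type` when the model is loaded, which is why the same checkpoint can serve both the `int8_float16` and `int8` bullets.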
 
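For CPU-only machines, the README's second bullet suggests `int8` compute; a minimal variant of the sketch above, under the same assumptions:

```python
# CPU variant (same assumed API and repo id as the sketch above).
from hf_hub_ctranslate2 import GeneratorCT2fromHfHub
from transformers import AutoTokenizer

model = GeneratorCT2fromHfHub(
    model_name_or_path="michaelfeil/ct2fast-pythia-160m",  # assumed repo id
    device="cpu",
    compute_type="int8",  # per the README bullet for device="cpu"
    tokenizer=AutoTokenizer.from_pretrained("EleutherAI/pythia-160m"),
)
print(model.generate(text=["How do you call a fast Flan-ingo?"]))
```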