Commit 09704ca (parent df39314) by michaelfeil: Create README.md

# Fast-Inference with Ctranslate2
Speed up inference by 2x-8x using int8 inference in C++.

```bash
pip install "hf_hub_ctranslate2>=1.0.0" "ctranslate2>=3.13.0"
```

Checkpoint compatible with [ctranslate2](https://github.com/OpenNMT/CTranslate2) and [hf-hub-ctranslate2](https://github.com/michaelfeil/hf-hub-ctranslate2):
- `compute_type=int8_float16` for `device="cuda"`
- `compute_type=int8` for `device="cpu"`

```python
from hf_hub_ctranslate2 import TranslatorCT2fromHfHub, GeneratorCT2fromHfHub

model_name = "michaelfeil/ct2fast-flan-alpaca-base"
# flan-alpaca-base is an encoder-decoder (flan-t5-based) model, so use the
# Translator wrapper; load in int8_float16 on CUDA
model = TranslatorCT2fromHfHub(
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16"
)
outputs = model.generate(
    text=["How do you call a fast Flan-ingo?", "Translate to german: How are you doing?"],
    min_decoding_length=24,
    max_decoding_length=32,
    max_input_length=512,
    beam_size=5
)
print(outputs)