gaudi committed on
Commit 177012d · 1 Parent(s): 22c52df

README.md Update

  1. README.md +23 -14
README.md CHANGED
@@ -1,4 +1,3 @@
-
  ---
  tags:
  - ctranslate2
@@ -16,12 +15,13 @@ license: apache-2.0

  CTranslate2 implements a custom runtime that applies many performance optimization techniques such as weights quantization, layers fusion, batch reordering, etc., to accelerate and reduce the memory usage of Transformer models on CPU and GPU.

- CTranslate2 is SOTA and is one of the most performant ways of hosting translation models at scale. Current supported models include:
  - Encoder-decoder models: Transformer base/big, M2M-100, NLLB, BART, mBART, Pegasus, T5, Whisper
  - Decoder-only models: GPT-2, GPT-J, GPT-NeoX, OPT, BLOOM, MPT, Llama, Mistral, Gemma, CodeGen, GPTBigCode, Falcon
  - Encoder-only models: BERT, DistilBERT, XLM-RoBERTa

- Speed up inference times by about **2x-8x** using **int8** inference in C++. CTranslate2 is SOTA for hosting translation models at scale.
  # CTranslate2 Benchmarks
  Please note that the results presented below are only valid for the configuration used during this benchmark: absolute and relative performance may change with different settings. Tested against `newstest2014` (En -> De) dataset.

@@ -51,12 +51,16 @@ Please note that the results presented below are only valid for the configuratio
  **Source to benchmark information can be found [here](https://github.com/OpenNMT/CTranslate2).**<br />
  **Original model BLEU scores can be found [here](https://huggingface.co/Helsinki-NLP/opus-mt-de-ln).**

  # CTranslate2 Installation
  ```bash
  pip install "hf-hub-ctranslate2>=1.0.0" "ctranslate2>=3.13.0"
  ```
  ### ct2-transformers-converter Command Used:
- ```
  ct2-transformers-converter --model Helsinki-NLP/opus-mt-de-ln --output_dir ./ctranslate2/opus-mt-de-ln-ctranslate2 --force --copy_files README.md generation_config.json tokenizer_config.json vocab.json source.spm .gitattributes target.spm --quantization float16
  ```
  # CTranslate2 Converted Checkpoint Information:
@@ -65,24 +69,29 @@ ct2-transformers-converter --model Helsinki-NLP/opus-mt-de-ln --output_dir ./ctr
  - [hf-hub-ctranslate2](https://github.com/michaelfeil/hf-hub-ctranslate2)

  **Compute Type:**
- - `compute_type=int8_float16` for `device="cuda"`
  - `compute_type=int8` for `device="cpu"`

  # Sample Code - ctranslate2
  ```python
  from ctranslate2 import Translator
  import transformers

- model_name = "gaudi/opus-mt-de-ln-ctranslate2"
  translator = Translator(
-     model_path=model_name,
-     device="cuda",
-     inter_threads=1,
-     intra_threads=4,
-     compute_type="int8_float16",
  )

- tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

  source = tokenizer.convert_ids_to_tokens(tokenizer.encode("XXXXXX, XXX XX XXXXXX."))
  results = translator.translate_batch([source])
@@ -98,9 +107,9 @@ from transformers import AutoTokenizer

  model_name = "gaudi/opus-mt-de-ln-ctranslate2"
  model = TranslatorCT2fromHfHub(
-     model_name_or_path=model_name,
      device="cuda",
-     compute_type="int8_float16" # load in int8 on CUDA,
      tokenizer=AutoTokenizer.from_pretrained(model_name)
  )
  outputs = model.generate(
 
 
@@ -1,4 +1,3 @@
  ---
  tags:
  - ctranslate2
 
@@ -16,12 +15,13 @@ license: apache-2.0

  CTranslate2 implements a custom runtime that applies many performance optimization techniques such as weights quantization, layers fusion, batch reordering, etc., to accelerate and reduce the memory usage of Transformer models on CPU and GPU.

+ CTranslate2 is one of the most performant ways of hosting translation models at scale. Current supported models include:
  - Encoder-decoder models: Transformer base/big, M2M-100, NLLB, BART, mBART, Pegasus, T5, Whisper
  - Decoder-only models: GPT-2, GPT-J, GPT-NeoX, OPT, BLOOM, MPT, Llama, Mistral, Gemma, CodeGen, GPTBigCode, Falcon
  - Encoder-only models: BERT, DistilBERT, XLM-RoBERTa

+ The project is production-oriented and comes with backward compatibility guarantees, but it also includes experimental features related to model compression and inference acceleration.
+
  # CTranslate2 Benchmarks
  Please note that the results presented below are only valid for the configuration used during this benchmark: absolute and relative performance may change with different settings. Tested against `newstest2014` (En -> De) dataset.

 
@@ -51,12 +51,16 @@ Please note that the results presented below are only valid for the configuratio
  **Source to benchmark information can be found [here](https://github.com/OpenNMT/CTranslate2).**<br />
  **Original model BLEU scores can be found [here](https://huggingface.co/Helsinki-NLP/opus-mt-de-ln).**

+ ## Internal Benchmarks
+ Internal testing on our end showed **inference times reduced by 6x-10x** on average compared to the vanilla checkpoints using the *transformers* library. A **slight reduction in BLEU scores (~5%)** was also identified in comparison to the vanilla checkpoints, with a few exceptions. This is likely due to several factors, one being the quantization applied. Further testing is needed from our end to better assess the reduction in translation quality. The command used to compile the vanilla checkpoint into a CTranslate2 model can be found below. Modifying this command can yield differing balances between inference performance and translation quality.
+
+
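The speedup and BLEU figures in the Internal Benchmarks paragraph come down to two ratios. A minimal sketch of how they could be measured, assuming any two zero-argument callables that translate the same batch. `median_seconds`, `speedup`, and `relative_drop` are hypothetical helper names, not part of CTranslate2 or *transformers*, and this is not necessarily how the figures in the README were produced.

```python
import statistics
import time
from typing import Callable


def median_seconds(fn: Callable[[], object], repeats: int = 5) -> float:
    """Run fn several times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(baseline_seconds: float, optimized_seconds: float) -> float:
    """How many times faster the optimized run is than the baseline."""
    return baseline_seconds / optimized_seconds


def relative_drop(baseline_score: float, optimized_score: float) -> float:
    """Fraction of the baseline score that was lost (0.05 means a 5% drop)."""
    return (baseline_score - optimized_score) / baseline_score
```

To use it, wrap each model's translate call in a lambda (for example `lambda: translator.translate_batch(batch)`), pass each one to `median_seconds`, and compare the two medians with `speedup`. The median is used rather than the mean so that a single slow warm-up run does not dominate the result.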
  # CTranslate2 Installation
  ```bash
  pip install "hf-hub-ctranslate2>=1.0.0" "ctranslate2>=3.13.0"
  ```
  ### ct2-transformers-converter Command Used:
+ ```bash
  ct2-transformers-converter --model Helsinki-NLP/opus-mt-de-ln --output_dir ./ctranslate2/opus-mt-de-ln-ctranslate2 --force --copy_files README.md generation_config.json tokenizer_config.json vocab.json source.spm .gitattributes target.spm --quantization float16
  ```
  # CTranslate2 Converted Checkpoint Information:
 
@@ -65,24 +69,29 @@ ct2-transformers-converter --model Helsinki-NLP/opus-mt-de-ln --output_dir ./ctr
  - [hf-hub-ctranslate2](https://github.com/michaelfeil/hf-hub-ctranslate2)

  **Compute Type:**
+ - `compute_type=int8_float16` for `device="cuda"`
  - `compute_type=int8` for `device="cpu"`
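The device-to-compute-type pairing listed under Compute Type can be selected at runtime rather than hard-coded. A minimal sketch, assuming `ctranslate2.get_cuda_device_count()` is available in the installed version. `pick_compute_type` and `detect_device` are hypothetical helper names introduced here for illustration.

```python
def pick_compute_type(device: str) -> str:
    """Return the compute type this README pairs with each device."""
    compute_types = {"cuda": "int8_float16", "cpu": "int8"}
    if device not in compute_types:
        raise ValueError(f"unsupported device: {device!r}")
    return compute_types[device]


def detect_device() -> str:
    """Prefer CUDA when ctranslate2 reports at least one GPU, else use CPU."""
    try:
        import ctranslate2
    except ImportError:
        # Assumption: without ctranslate2 installed we cannot probe for a GPU,
        # so the sketch falls back to CPU instead of failing.
        return "cpu"
    return "cuda" if ctranslate2.get_cuda_device_count() > 0 else "cpu"
```

The result can then be passed to `Translator` as `device=detect_device()` and `compute_type=pick_compute_type(device)`.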

  # Sample Code - ctranslate2
+ #### Clone the repository to the working directory or wherever you wish to store the model artifacts. ####
+ ```bash
+ git clone https://huggingface.co/gaudi/opus-mt-de-ln-ctranslate2
+ ```
+ #### Take the Python code below and update the 'model_dir' variable to the location of the cloned repository. ####
  ```python
  from ctranslate2 import Translator
  import transformers

+ model_dir = "./opus-mt-de-ln-ctranslate2"  # Path to model directory.
  translator = Translator(
+     model_path=model_dir,
+     device="cuda",  # cpu, cuda, or auto.
+     inter_threads=1,  # Maximum number of parallel translations.
+     intra_threads=4,  # Number of OpenMP threads per translator.
+     compute_type="int8_float16",  # int8 for cpu or int8_float16 for cuda.
  )

+ tokenizer = transformers.AutoTokenizer.from_pretrained(model_dir)

  source = tokenizer.convert_ids_to_tokens(tokenizer.encode("XXXXXX, XXX XX XXXXXX."))
  results = translator.translate_batch([source])
 
@@ -98,9 +107,9 @@ from transformers import AutoTokenizer

  model_name = "gaudi/opus-mt-de-ln-ctranslate2"
  model = TranslatorCT2fromHfHub(
+     model_name_or_path=model_name,
      device="cuda",
+     compute_type="int8_float16",
      tokenizer=AutoTokenizer.from_pretrained(model_name)
  )
  outputs = model.generate(