michaelfeil committed
Commit 48221dd
Parent: 0e006ee

Upload bigcode/starcoder ctranslate fp16 weights

Files changed (1):
README.md +5 -4
README.md CHANGED
@@ -264,9 +264,9 @@ Speedup inference while reducing memory by 2x-4x using int8 inference in C++ on
 
 quantized version of [bigcode/starcoder](https://huggingface.co/bigcode/starcoder)
 ```bash
-pip install hf-hub-ctranslate2>=2.0.8
+pip install hf-hub-ctranslate2>=2.0.8 ctranslate2>=3.14.0
 ```
-Converted on 2023-05-30 using
+Converted on 2023-05-31 using
 ```
 ct2-transformers-converter --model bigcode/starcoder --output_dir /home/michael/tmp-ct2fast-starcoder --force --copy_files merges.txt tokenizer.json README.md tokenizer_config.json vocab.json generation_config.json special_tokens_map.json .gitattributes --quantization float16 --trust_remote_code
 ```
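For orientation, here is a minimal sketch of how weights converted this way are typically loaded with hf-hub-ctranslate2. The repo id `michaelfeil/ct2fast-starcoder` is assumed from the commit title, not stated in the diff, and the keyword arguments follow the library's documented pattern:

```python
# Minimal loading sketch. The repo id "michaelfeil/ct2fast-starcoder" is
# inferred from the commit title and may differ from the actual repo.
from hf_hub_ctranslate2 import GeneratorCT2fromHfHub
from transformers import AutoTokenizer

model = GeneratorCT2fromHfHub(
    model_name_or_path="michaelfeil/ct2fast-starcoder",  # assumed repo id
    device="cuda",
    compute_type="float16",  # matches --quantization float16 above
    tokenizer=AutoTokenizer.from_pretrained("bigcode/starcoder"),
)
```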
@@ -290,7 +290,8 @@ model = GeneratorCT2fromHfHub(
 )
 outputs = model.generate(
     text=["How do you call a fast Flan-ingo?", "User: How are you doing? Bot:"],
-    max_length=64
+    max_length=64,
+    include_prompt_in_result=False
 )
 print(outputs)
 ```
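The added `include_prompt_in_result=False` makes `generate` return only the completion rather than prompt plus completion, which is presumably why the first hunk also pins `ctranslate2>=3.14.0`, where that option is exposed. A rough equivalent against the raw CTranslate2 API, assuming a local directory of converted weights (the path below is a placeholder):

```python
# Rough raw-CTranslate2 equivalent of the wrapper call above; the local
# directory "ct2fast-starcoder" is a placeholder, not taken from the diff.
import ctranslate2
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder")
generator = ctranslate2.Generator("ct2fast-starcoder", device="cuda")

# CTranslate2 generators consume token strings, not ids.
prompt_tokens = tokenizer.convert_ids_to_tokens(
    tokenizer.encode("User: How are you doing? Bot:")
)
results = generator.generate_batch(
    [prompt_tokens],
    max_length=64,
    include_prompt_in_result=False,  # completion only, no prompt echo
)
print(tokenizer.decode(results[0].sequences_ids[0]))
```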
@@ -356,7 +357,7 @@ print(tokenizer.decode(outputs[0]))
 Fill-in-the-middle uses special tokens to identify the prefix/middle/suffix part of the input and output:
 
 ```python
-input_text = "<fim-prefix>def print_hello_world():\n <fim-suffix>\n print('Hello world!')<fim-middle>"
+input_text = "<fim_prefix>def print_hello_world():\n <fim_suffix>\n print('Hello world!')<fim_middle>"
 inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
 outputs = model.generate(inputs)
 print(tokenizer.decode(outputs[0]))
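This hunk swaps the hyphenated sentinel names for the underscored ones (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`) that StarCoder's tokenizer defines; the hyphenated spellings would be tokenized as plain text rather than as special tokens. A small illustrative helper for assembling such prompts (`fim_prompt` is a hypothetical name, not from this repo):

```python
# Hypothetical convenience helper; fim_prompt() is illustrative only.
def fim_prompt(prefix: str, suffix: str) -> str:
    # StarCoder generates the missing middle after <fim_middle>.
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

input_text = fim_prompt(
    prefix="def print_hello_world():\n    ",
    suffix="\n    print('Hello world!')",
)
```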
 