michaelfeil committed on May 23, 2023

Commit

fe53256


1 Parent(s): 87f3421

Upload HuggingFaceH4/starchat-alpha ctranslate fp16 weights


Files changed (19)

README.md +132 -0
TRAINER_README.md +63 -0
added_tokens.json +6 -0
all_results.json +14 -0
config.json +5 -0
dialogue_template.json +8 -0
eval_results.json +9 -0
generation_config.json +6 -0
handler.py +51 -0
merges.txt +0 -0
model.bin +3 -0
requirements.txt +5 -0
special_tokens_map.json +11 -0
tokenizer.json +0 -0
tokenizer_config.json +31 -0
train_results.json +8 -0
trainer_state.json +127 -0
training_args.bin +3 -0
vocabulary.txt +0 -0

README.md ADDED

	@@ -0,0 +1,132 @@

+---
+license: bigcode-openrail-m
+datasets:
+- OpenAssistant/oasst1
+- databricks/databricks-dolly-15k
+language:
+- en
+library_name: transformers
+tags:
+- ctranslate2
+- int8
+- float16
+- code
+---
+# Fast-Inference with CTranslate2
+Speed up inference while reducing memory by 2x-4x using int8 inference in C++ on CPU or GPU.
+Quantized version of [HuggingFaceH4/starchat-alpha](https://huggingface.co/HuggingFaceH4/starchat-alpha)
+```bash
+pip install "hf-hub-ctranslate2>=2.0.8"
+```
+Converted on 2023-05-23 using
+```
+ct2-transformers-converter --model HuggingFaceH4/starchat-alpha --output_dir /home/michael/tmp-ct2fast-starchat-alpha --force --copy_files merges.txt all_results.json training_args.bin tokenizer.json README.md dialogue_template.json tokenizer_config.json eval_results.json TRAINER_README.md train_results.json generation_config.json trainer_state.json special_tokens_map.json added_tokens.json handler.py requirements.txt .gitattributes --quantization float16
+```
+Checkpoint compatible with [ctranslate2>=3.13.0](https://github.com/OpenNMT/CTranslate2) and [hf-hub-ctranslate2>=2.0.6](https://github.com/michaelfeil/hf-hub-ctranslate2)
+- `compute_type=int8_float16` for `device="cuda"`
+- `compute_type=int8` for `device="cpu"`
+```python
+from hf_hub_ctranslate2 import TranslatorCT2fromHfHub, GeneratorCT2fromHfHub
+from transformers import AutoTokenizer
+model_name = "michaelfeil/ct2fast-starchat-alpha"
+# StarChat is decoder-only, so use GeneratorCT2fromHfHub (TranslatorCT2fromHfHub is for encoder-decoder models).
+model = GeneratorCT2fromHfHub(
+        # load in int8 on CUDA
+        model_name_or_path=model_name,
+        device="cuda",
+        compute_type="int8_float16",
+        # tokenizer=AutoTokenizer.from_pretrained("HuggingFaceH4/starchat-alpha")
+)
+outputs = model.generate(
+    text=["How do you call a fast Flan-ingo?", "User: How are you doing? Bot:"],
+    max_length=64
+)
+print(outputs)
+```
+# License and other remarks:
+This is just a quantized version. License conditions are intended to be identical to the original Hugging Face repo.
+# Original description
+# Model Card for StarChat Alpha
+<!-- Provide a quick summary of what the model is/does. -->
+StarChat is a series of language models that are fine-tuned from StarCoder to act as helpful coding assistants. StarChat Alpha is the first of these models, and as an alpha release it is only intended for educational or research purposes. In particular, the model has not been aligned to human preferences with techniques like RLHF, so it may generate problematic content (especially when prompted to do so).
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Model type:** A 16B-parameter GPT-like model fine-tuned on a blend of the [`oasst1`](https://huggingface.co/datasets/OpenAssistant/oasst1) and [`databricks-dolly-15k`](https://huggingface.co/datasets/databricks/databricks-dolly-15k) datasets.
+- **Language(s) (NLP):** English
+- **License:** BigCode Open RAIL-M v1
+- **Finetuned from model:** [bigcode/starcoderbase](https://huggingface.co/bigcode/starcoderbase)
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** https://github.com/bigcode-project/starcoder
+- **Demo:** https://huggingface.co/spaces/HuggingFaceH4/starchat-playground
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+StarChat Alpha is intended for educational and/or research purposes and in that respect can be used to probe the programming capabilities of open-source language models.
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+StarChat Alpha has not been aligned to human preferences with techniques like RLHF or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so).
+Models trained primarily on code data will also have a more skewed demographic bias commensurate with the demographics of the GitHub community. For more on this, see the [StarCoder dataset](https://huggingface.co/datasets/bigcode/starcoderdata), which is derived from The Stack.
+Since the base model was pretrained on a large corpus of code, it may produce code snippets that are syntactically valid but semantically incorrect.
+For example, it may produce code that does not compile or that produces incorrect results.
+It may also produce code that is vulnerable to security exploits.
+We have also observed that the model tends to produce false URLs, which should be carefully inspected before clicking.
+StarChat Alpha was fine-tuned from the base model [StarCoder Base](https://huggingface.co/bigcode/starcoderbase); please refer to its model card's [Limitations Section](https://huggingface.co/bigcode/starcoderbase#limitations) for relevant information.
+In particular, the model was evaluated on some categories of gender biases, propensity for toxicity, and risk of suggesting code completions with known security flaws; these evaluations are reported in its [technical report](https://drive.google.com/file/d/1cN-b9GnWtHzQRoE7M7gAEyivY0kl4BYs/view).
+## How to Get Started with the Model
+Use the code below to get started with the model.
+```python
+from transformers import pipeline
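+pipe = pipeline("text-generation", model="HuggingFaceH4/starchat-alpha")
+# Inputs use chat tokens
+inputs = "<|system|>\n<|end|>\n<|user|>How can I sort a list in Python?<|end|>\n<|assistant|>"
+# <|end|> (token id 49155) closes each turn, so stop generating there
+outputs = pipe(inputs, max_new_tokens=256, eos_token_id=49155)
+print(outputs[0]["generated_text"])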
+```
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+```
+@article{Tunstall2023starchat-alpha,
+  author = {Tunstall, Lewis and Lambert, Nathan and Rajani, Nazneen and Beeching, Edward and Le Scao, Teven and von Werra, Leandro and Han, Sheon and Schmid, Philipp and Rush, Alexander},
+  title = {Creating a Coding Assistant with StarCoder},
+  journal = {Hugging Face Blog},
+  year = {2023},
+  note = {https://huggingface.co/blog/starchat},
+}
+```
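The 2x-4x memory figure in the README follows from the bytes stored per weight. A rough weight-only sketch, assuming the "16B parameter" count from the model card and ignoring activations, the KV cache, and runtime overhead (with `compute_type=int8_float16`, CTranslate2 keeps the weights in int8 while computing in float16):

```python
# Bytes needed to store one weight in each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Approximate weight-only memory footprint in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


params = 16e9  # "16B parameter" figure from the StarChat Alpha model card
for dtype in ("float32", "float16", "int8"):
    print(f"{dtype:>8}: {weight_memory_gb(params, dtype):.0f} GB")
```

int8 halves the float16 footprint and quarters the float32 one, which is where the 2x and 4x come from.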

TRAINER_README.md ADDED

	@@ -0,0 +1,63 @@

+---
+tags:
+- generated_from_trainer
+model-index:
+- name: starcoder-ift
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# starcoder-ift
+This model is a fine-tuned version of [bigcode/large-model](https://huggingface.co/bigcode/large-model) on the None dataset.
+It achieves the following results on the evaluation set:
+- Loss: 1.4943
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 2e-05
+- train_batch_size: 2
+- eval_batch_size: 2
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 8
+- gradient_accumulation_steps: 8
+- total_train_batch_size: 128
+- total_eval_batch_size: 16
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.03
+- num_epochs: 3
+### Training results
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| 1.6668        | 0.99  | 65   | 1.6167          |
+| 1.3584        | 2.0   | 131  | 1.5126          |
+| 1.0949        | 2.98  | 195  | 1.4943          |
+### Framework versions
+- Transformers 4.28.1
+- Pytorch 1.13.1+cu117
+- Datasets 2.12.0
+- Tokenizers 0.13.3
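The hyperparameters above fit together as follows. A sketch, assuming the 8372 training samples (from `train_results.json`) are split evenly across the 8 devices with padding, that an incomplete gradient-accumulation window at the end of an epoch is dropped, and that warmup is `ceil(warmup_ratio * max_steps)`. Under those assumptions it reproduces the total batch size of 128, the 195 `max_steps`, and the fractional final epoch recorded in `trainer_state.json`:

```python
import math


def trainer_schedule(train_samples, per_device_batch, num_devices, grad_accum, num_epochs, warmup_ratio):
    # Samples consumed per optimizer update across all devices.
    total_batch = per_device_batch * num_devices * grad_accum
    # Assumption: the distributed sampler pads so each device sees ceil(N / devices) samples.
    samples_per_device = math.ceil(train_samples / num_devices)
    batches_per_device = math.ceil(samples_per_device / per_device_batch)
    # Optimizer updates per epoch; a trailing partial accumulation window is dropped.
    updates_per_epoch = batches_per_device // grad_accum
    max_steps = math.ceil(num_epochs * updates_per_epoch)
    warmup_steps = math.ceil(warmup_ratio * max_steps)
    # Epoch counter after the last update, measured in per-device micro-batches.
    final_epoch = max_steps * grad_accum / batches_per_device
    return total_batch, updates_per_epoch, max_steps, warmup_steps, final_epoch


print(trainer_schedule(8372, 2, 8, 8, 3, 0.03))
```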

added_tokens.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "<|assistant|>": 49154,
+  "<|end|>": 49155,
+  "<|system|>": 49152,
+  "<|user|>": 49153
+}
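The four chat tokens occupy ids 49152-49155, directly after the 49152-entry base vocabulary declared in `tokenizer_config.json`, so the embedding matrix needs at least 49156 rows to cover them. A small consistency check over the two files, assuming no other tokens are added on top of the base vocabulary (`check_added_tokens` is a hypothetical helper):

```python
# Contents of added_tokens.json and the base vocab_size from tokenizer_config.json.
ADDED_TOKENS = {"<|assistant|>": 49154, "<|end|>": 49155, "<|system|>": 49152, "<|user|>": 49153}
BASE_VOCAB_SIZE = 49152


def check_added_tokens(added: dict, base_vocab_size: int) -> int:
    """Return the vocabulary size after appending `added`, checking the ids are contiguous."""
    ids = sorted(added.values())
    expected = list(range(base_vocab_size, base_vocab_size + len(added)))
    if ids != expected:
        raise ValueError(f"added token ids {ids} are not contiguous from {base_vocab_size}")
    return base_vocab_size + len(added)


print(check_added_tokens(ADDED_TOKENS, BASE_VOCAB_SIZE))
```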

all_results.json ADDED

	@@ -0,0 +1,14 @@

+{
+    "epoch": 2.98,
+    "eval_loss": 1.4942539930343628,
+    "eval_runtime": 22.3723,
+    "eval_samples": 879,
+    "eval_samples_per_second": 39.29,
+    "eval_steps_per_second": 2.458,
+    "perplexity": 4.456011097278015,
+    "train_loss": 1.4244933299529248,
+    "train_runtime": 3089.4842,
+    "train_samples": 8372,
+    "train_samples_per_second": 8.13,
+    "train_steps_per_second": 0.063
+}
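The derived metrics in `all_results.json` can be recomputed from the raw values. A sketch, assuming perplexity is `exp(eval_loss)` and that training throughput counts `train_samples` once per epoch over the 3 epochs listed in `TRAINER_README.md`:

```python
import math

# Values copied from all_results.json.
results = {
    "eval_loss": 1.4942539930343628,
    "eval_runtime": 22.3723,
    "eval_samples": 879,
    "perplexity": 4.456011097278015,
    "train_runtime": 3089.4842,
    "train_samples": 8372,
}
num_epochs = 3      # from TRAINER_README.md
global_steps = 195  # from trainer_state.json

# Perplexity is the exponential of the mean cross-entropy loss.
ppl = math.exp(results["eval_loss"])
# Throughput: work done divided by wall-clock seconds.
train_sps = results["train_samples"] * num_epochs / results["train_runtime"]
train_steps_ps = global_steps / results["train_runtime"]
eval_sps = results["eval_samples"] / results["eval_runtime"]

print(f"perplexity={ppl:.4f} train_sps={train_sps:.2f} "
      f"train_steps/s={train_steps_ps:.3f} eval_sps={eval_sps:.2f}")
```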

config.json ADDED

	@@ -0,0 +1,5 @@

+{
+  "bos_token": "<|endoftext|>",
+  "eos_token": "<|endoftext|>",
+  "unk_token": "<|endoftext|>"
+}

dialogue_template.json ADDED

	@@ -0,0 +1,8 @@

+{
+  "system": "",
+  "messages": null,
+  "system_token": "<|system|>",
+  "user_token": "<|user|>",
+  "assistant_token": "<|assistant|>",
+  "end_token": "<|end|>"
+}
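`dialogue_template.json` only lists the role and end tokens; the README's example prompt shows how they are laid out. A hypothetical `build_prompt` helper that reproduces that example string, assuming each turn is written as role token, text, `<|end|>`, newline (this layout is inferred from the README example, not copied from the original training code):

```python
# Fields from dialogue_template.json ("messages" omitted).
TEMPLATE = {
    "system": "",
    "system_token": "<|system|>",
    "user_token": "<|user|>",
    "assistant_token": "<|assistant|>",
    "end_token": "<|end|>",
}


def build_prompt(messages, template=TEMPLATE):
    """Format a chat history into a StarChat-style prompt (layout is an assumption)."""
    end = template["end_token"]
    prompt = f"{template['system_token']}\n{template['system']}{end}\n"
    role_tokens = {"user": template["user_token"], "assistant": template["assistant_token"]}
    for msg in messages:
        prompt += f"{role_tokens[msg['role']]}{msg['content']}{end}\n"
    # Leave the assistant turn open so the model continues from here.
    return prompt + template["assistant_token"]


print(repr(build_prompt([{"role": "user", "content": "How can I sort a list in Python?"}])))
```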

eval_results.json ADDED

	@@ -0,0 +1,9 @@

+{
+    "epoch": 2.98,
+    "eval_loss": 1.4942539930343628,
+    "eval_runtime": 22.3723,
+    "eval_samples": 879,
+    "eval_samples_per_second": 39.29,
+    "eval_steps_per_second": 2.458,
+    "perplexity": 4.456011097278015
+}

generation_config.json ADDED

	@@ -0,0 +1,6 @@

+{
+  "_from_model_config": true,
+  "bos_token_id": 0,
+  "eos_token_id": 0,
+  "transformers_version": "4.28.1"
+}
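`generation_config.json` registers only `<|endoftext|>` (id 0) as end-of-sequence, while chat turns end with `<|end|>` (id 49155 in `added_tokens.json`), so generation that stops only on id 0 can run past the assistant's turn. Passing `eos_token_id=49155` to `generate` is one option; a post-hoc alternative is to cut the generated ids at whichever stop token appears first. A sketch using the ids from these files (`truncate_at_stop` is a hypothetical helper):

```python
ENDOFTEXT_ID = 0        # eos_token_id in generation_config.json ("<|endoftext|>")
END_OF_TURN_ID = 49155  # "<|end|>" in added_tokens.json


def truncate_at_stop(token_ids, stop_ids=(ENDOFTEXT_ID, END_OF_TURN_ID)):
    """Return the ids before the first stop token (the stop token itself is dropped)."""
    for i, tok in enumerate(token_ids):
        if tok in stop_ids:
            return list(token_ids[:i])
    return list(token_ids)


print(truncate_at_stop([1, 2, 3, END_OF_TURN_ID, 49153, 4]))
```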

handler.py ADDED

	@@ -0,0 +1,51 @@

+from typing import Any, Dict, List
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer
+from peft import PeftConfig, PeftModel
+class EndpointHandler:
+    def __init__(self, path=""):
+        # load model and processor from path
+        self.tokenizer = AutoTokenizer.from_pretrained(path)
+        try:
+            # If `path` holds a PEFT adapter, load its base model in 8-bit and attach the adapter
+            config = PeftConfig.from_pretrained(path)
+            model = AutoModelForCausalLM.from_pretrained(
+                config.base_model_name_or_path,
+                return_dict=True,
+                load_in_8bit=True,
+                device_map="auto",
+                torch_dtype=torch.float16,
+            )
+            # make room for the added chat tokens (<|system|>, <|user|>, ...)
+            model.resize_token_embeddings(len(self.tokenizer))
+            model = PeftModel.from_pretrained(model, path)
+        except Exception:
+            # adapter loading failed (e.g. no adapter config): treat `path` as a full checkpoint
+            model = AutoModelForCausalLM.from_pretrained(
+                path,
+                device_map="auto",
+                load_in_8bit=True,
+                torch_dtype=torch.float16,
+            )
+        self.model = model
+        self.device = "cuda" if torch.cuda.is_available() else "cpu"
+    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, str]]:
+        # process input: {"inputs": ..., "parameters": {...}}
+        inputs = data.pop("inputs", data)
+        parameters = data.pop("parameters", None)
+        # preprocess
+        inputs = self.tokenizer(inputs, return_tensors="pt").to(self.device)
+        # forward any generation parameters to generate()
+        if parameters is not None:
+            outputs = self.model.generate(**inputs, **parameters)
+        else:
+            outputs = self.model.generate(**inputs)
+        # postprocess the prediction
+        prediction = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
+        return [{"generated_text": prediction}]
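`EndpointHandler.__call__` expects a JSON body with an `inputs` string and an optional `parameters` dict that is forwarded unchanged to `generate()`. The sketch below copies its parsing logic into a standalone function (`split_payload` is a hypothetical name) to show the request shape and one quirk: without an `inputs` key, whatever remains of the request dict becomes the input, which the tokenizer is not designed to handle.

```python
from typing import Any, Dict, Optional, Tuple


def split_payload(data: Dict[str, Any]) -> Tuple[Any, Optional[Dict[str, Any]]]:
    """Mirror the handler: pull out "inputs" and the optional "parameters"."""
    inputs = data.pop("inputs", data)  # falls back to the whole dict if "inputs" is missing
    parameters = data.pop("parameters", None)
    return inputs, parameters


request = {
    "inputs": "<|system|>\n<|end|>\n<|user|>How can I sort a list in Python?<|end|>\n<|assistant|>",
    "parameters": {"max_new_tokens": 128, "eos_token_id": 49155},
}
inputs, parameters = split_payload(dict(request))
print(parameters)
```

Because the fallback returns the same dict object, popping `parameters` afterwards also removes it from the "inputs", so requests should always send the prompt under `inputs`.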

merges.txt ADDED

The diff for this file is too large to render.

model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:51cb45ed493b6884702ab15c1580a6269dbb57571f9ddf473bbe063ec842c38d
+size 36949920075
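`model.bin` is stored as a Git LFS pointer: `oid` is the SHA-256 digest of the real (~37 GB) file and `size` is its length in bytes. A sketch for checking a local download against the pointer with only the standard library (the local path in the commented call is an assumption):

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its key/value fields."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo, "oid": digest, "size": int(fields["size"])}


def verify_download(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Check a downloaded file's size and SHA-256 digest against its LFS pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream in 1 MiB chunks so a multi-GB file is never fully loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest() == pointer["oid"]


MODEL_BIN_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:51cb45ed493b6884702ab15c1580a6269dbb57571f9ddf473bbe063ec842c38d
size 36949920075"""

pointer = parse_lfs_pointer(MODEL_BIN_POINTER)
# verify_download("model.bin", pointer)  # path to your local copy (assumed)
```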

requirements.txt ADDED

	@@ -0,0 +1,5 @@

+transformers==4.28.1
+accelerate>=0.16.0
+bitsandbytes
+sentencepiece
+git+https://github.com/huggingface/peft.git@632997d1fb776c3cf05d8c2537ac9a98a7ce9435

special_tokens_map.json ADDED

	@@ -0,0 +1,11 @@

+{
+  "additional_special_tokens": [
+    "<|system|>",
+    "<|user|>",
+    "<|assistant|>",
+    "<|end|>"
+  ],
+  "bos_token": "<|endoftext|>",
+  "eos_token": "<|endoftext|>",
+  "unk_token": "<|endoftext|>"
+}

tokenizer.json ADDED

The diff for this file is too large to render.

tokenizer_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "add_prefix_space": false,
+  "additional_special_tokens": [
+    "<|endoftext|>",
+    "<fim_prefix>",
+    "<fim_middle>",
+    "<fim_suffix>",
+    "<fim_pad>",
+    "<filename>",
+    "<gh_stars>",
+    "<issue_start>",
+    "<issue_comment>",
+    "<issue_closed>",
+    "<jupyter_start>",
+    "<jupyter_text>",
+    "<jupyter_code>",
+    "<jupyter_output>",
+    "<empty_output>",
+    "<commit_before>",
+    "<commit_msg>",
+    "<commit_after>",
+    "<reponame>"
+  ],
+  "bos_token": "<|endoftext|>",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "<|endoftext|>",
+  "model_max_length": 1000000000000000019884624838656,
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>",
+  "vocab_size": 49152
+}
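`model_max_length` here is not a real limit: 1000000000000000019884624838656 equals `int(1e30)`, the placeholder transformers stores when no maximum length is configured. Code that truncates inputs should substitute its own limit. A sketch, where the fallback of 8192 tokens (StarCoder's context window) is an assumption you may want to change and `effective_max_length` is a hypothetical helper:

```python
# Placeholder used when no model_max_length is configured.
VERY_LARGE_INTEGER = int(1e30)


def effective_max_length(model_max_length: int, fallback: int) -> int:
    """Return a usable context length, replacing the placeholder with `fallback`."""
    return fallback if model_max_length >= VERY_LARGE_INTEGER else model_max_length


# Value from tokenizer_config.json.
print(effective_max_length(1000000000000000019884624838656, fallback=8192))
```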

train_results.json ADDED

	@@ -0,0 +1,8 @@

+{
+    "epoch": 2.98,
+    "train_loss": 1.4244933299529248,
+    "train_runtime": 3089.4842,
+    "train_samples": 8372,
+    "train_samples_per_second": 8.13,
+    "train_steps_per_second": 0.063
+}

trainer_state.json ADDED

	@@ -0,0 +1,127 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 2.9770992366412212,
+  "global_step": 195,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.02,
+      "learning_rate": 0.0,
+      "loss": 2.2517,
+      "step": 1
+    },
+    {
+      "epoch": 0.24,
+      "learning_rate": 2e-05,
+      "loss": 1.9184,
+      "step": 16
+    },
+    {
+      "epoch": 0.49,
+      "learning_rate": 2e-05,
+      "loss": 1.7725,
+      "step": 32
+    },
+    {
+      "epoch": 0.73,
+      "learning_rate": 2e-05,
+      "loss": 1.7075,
+      "step": 48
+    },
+    {
+      "epoch": 0.98,
+      "learning_rate": 2e-05,
+      "loss": 1.6668,
+      "step": 64
+    },
+    {
+      "epoch": 0.99,
+      "eval_loss": 1.6166764497756958,
+      "eval_runtime": 22.5473,
+      "eval_samples_per_second": 38.985,
+      "eval_steps_per_second": 2.439,
+      "step": 65
+    },
+    {
+      "epoch": 1.22,
+      "learning_rate": 2e-05,
+      "loss": 1.4567,
+      "step": 80
+    },
+    {
+      "epoch": 1.47,
+      "learning_rate": 2e-05,
+      "loss": 1.4109,
+      "step": 96
+    },
+    {
+      "epoch": 1.71,
+      "learning_rate": 2e-05,
+      "loss": 1.3875,
+      "step": 112
+    },
+    {
+      "epoch": 1.95,
+      "learning_rate": 2e-05,
+      "loss": 1.3584,
+      "step": 128
+    },
+    {
+      "epoch": 2.0,
+      "eval_loss": 1.512587547302246,
+      "eval_runtime": 22.5311,
+      "eval_samples_per_second": 39.013,
+      "eval_steps_per_second": 2.441,
+      "step": 131
+    },
+    {
+      "epoch": 2.2,
+      "learning_rate": 2e-05,
+      "loss": 1.1651,
+      "step": 144
+    },
+    {
+      "epoch": 2.44,
+      "learning_rate": 2e-05,
+      "loss": 1.1092,
+      "step": 160
+    },
+    {
+      "epoch": 2.69,
+      "learning_rate": 2e-05,
+      "loss": 1.0948,
+      "step": 176
+    },
+    {
+      "epoch": 2.93,
+      "learning_rate": 2e-05,
+      "loss": 1.0949,
+      "step": 192
+    },
+    {
+      "epoch": 2.98,
+      "eval_loss": 1.4942539930343628,
+      "eval_runtime": 22.489,
+      "eval_samples_per_second": 39.086,
+      "eval_steps_per_second": 2.446,
+      "step": 195
+    },
+    {
+      "epoch": 2.98,
+      "step": 195,
+      "total_flos": 417490551177216.0,
+      "train_loss": 1.4244933299529248,
+      "train_runtime": 3089.4842,
+      "train_samples_per_second": 8.13,
+      "train_steps_per_second": 0.063
+    }
+  ],
+  "max_steps": 195,
+  "num_train_epochs": 3,
+  "total_flos": 417490551177216.0,
+  "trial_name": null,
+  "trial_params": null
+}
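The "Training results" table in `TRAINER_README.md` can be read off this log: each evaluation row pairs the eval loss with the last training loss logged before it. A sketch over a trimmed copy of `log_history`; `loss_table` is a hypothetical helper, not the Trainer's own model-card code:

```python
# Trimmed copy of trainer_state.json["log_history"] (values from the file above).
LOG_HISTORY = [
    {"epoch": 0.98, "loss": 1.6668, "step": 64},
    {"epoch": 0.99, "eval_loss": 1.6166764497756958, "step": 65},
    {"epoch": 1.95, "loss": 1.3584, "step": 128},
    {"epoch": 2.0, "eval_loss": 1.512587547302246, "step": 131},
    {"epoch": 2.93, "loss": 1.0949, "step": 192},
    {"epoch": 2.98, "eval_loss": 1.4942539930343628, "step": 195},
    {"epoch": 2.98, "step": 195, "train_loss": 1.4244933299529248},  # final summary, skipped
]


def loss_table(log_history):
    """Pair each evaluation entry with the most recent logged training loss."""
    rows, last_train_loss = [], None
    for entry in log_history:
        if "loss" in entry:
            last_train_loss = entry["loss"]
        elif "eval_loss" in entry:
            rows.append((last_train_loss, entry["epoch"], entry["step"], round(entry["eval_loss"], 4)))
    return rows


for row in loss_table(LOG_HISTORY):
    print("| {} | {} | {} | {} |".format(*row))
```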

training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1d8f2bb4f7ded48c54feaaf417fab68eb49470cdcc11bde02d559bf5b1e45582
+size 4987

vocabulary.txt ADDED

The diff for this file is too large to render.