|
--- |
|
tags: |
|
- text-generation |
|
- 8bit |
|
- 8-bit |
|
- quantization |
|
- compression |
|
- chatbot |
|
- dialogue |
|
- conversation |
|
datasets: |
|
- daily_dialog |
|
inference: False |
|
license: apache-2.0 |
|
--- |
|
|
|
# ethzanalytics/gpt-j-8bit-KILT_WoW_10k_steps |
|
|
|
|
|
<a href="https://colab.research.google.com/gist/pszemraj/e49c60aafe04acc52fcfdd1baefe12e4/-ai-msgbot-gpt-j-6b-8bit-with-hub.ipynb"> |
|
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> |
|
</a> |
|
|
|
This is a version of `hivemind/gpt-j-6B-8bit` fine-tuned on the [Wizard of Wikipedia](https://arxiv.org/abs/1811.01241) dataset for 10k steps (_just under one epoch_) on an A100. It can be used as a chatbot, and is designed to be used with [ai-msgbot](https://github.com/pszemraj/ai-msgbot) to take advantage of its prompt engineering.
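As a rough illustration of the prompt engineering mentioned above, a chatbot prompt is assembled as a two-speaker script that the model continues. The sketch below is a minimal example of that idea; the exact speaker labels (`person alpha:` / `person beta:`) are assumptions here, so check the ai-msgbot repo for the format the model was actually trained on.

```python
def build_prompt(
    history,
    user_msg,
    user_label="person alpha:",  # assumed speaker label; verify against ai-msgbot
    bot_label="person beta:",    # assumed speaker label; verify against ai-msgbot
):
    """Assemble a dialogue prompt as a two-speaker script.

    history: list of (user_turn, bot_turn) pairs from earlier in the chat.
    The prompt ends with the bot's label so the model generates its reply.
    """
    lines = []
    for user_turn, bot_turn in history:
        lines.append(f"{user_label} {user_turn}")
        lines.append(f"{bot_label} {bot_turn}")
    lines.append(f"{user_label} {user_msg}")
    lines.append(bot_label)  # model continues from here
    return "\n".join(lines)


prompt = build_prompt(
    [("hi there!", "hello! how can I help?")],
    "tell me about wikis",
)
```

Generation is then stopped when the model starts producing the next `person alpha:` turn.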
|
|
|
## Usage |
|
|
|
_**NOTE: this model must be loaded via the special patching technique** outlined in the [hivemind model card](https://huggingface.co/hivemind/gpt-j-6B-8bit) (as with all 8-bit models)._
|
|
|
Examples of how to load the model correctly are included in the notebook linked above. A `.py` export of that notebook was also uploaded to this repo for reference - [link here](https://huggingface.co/ethzanalytics/gpt-j-8bit-KILT_WoW_10k_steps/blob/main/ai_msgbot_gpt_j_6b_8bit_with_hub.py)
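For intuition on what the 8-bit weights are, here is a minimal, framework-free sketch of blockwise absmax quantization, the basic idea behind storing the linear-layer weights in `int8`. This is a conceptual illustration only, not the actual hivemind/bitsandbytes implementation used by this model.

```python
import numpy as np


def quantize_blockwise(w, block=64):
    """Quantize a float32 matrix to int8 with one absmax scale per block."""
    flat = w.reshape(-1, block)
    scale = np.abs(flat).max(axis=1, keepdims=True) / 127.0
    q = np.round(flat / scale).astype(np.int8)
    return q, scale


def dequantize_blockwise(q, scale, shape):
    """Recover an approximate float32 matrix from int8 values and scales."""
    return (q.astype(np.float32) * scale).reshape(shape)


rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 256)).astype(np.float32)

q, scale = quantize_blockwise(weights)
recon = dequantize_blockwise(q, scale, weights.shape)

max_err = float(np.abs(weights - recon).max())
bytes_saved = weights.nbytes - q.nbytes  # int8 storage is 4x smaller
```

The per-block scales keep the rounding error small while the weights themselves occupy one byte each, which is what lets a 6B-parameter model fit on a single consumer GPU.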
|
|
|
## Training |
|
|
|
For details, please see [this wandb report](https://wandb.ai/pszemraj/conversational-6B-train-vanilla/reports/Training-6B-GPT-J-8bit-for-Dialogue--VmlldzoyNTg3MzE0) for both the daily-dialogues version and the WoW version. |
|
|
|
|
|
--- |
|
|