--- datasets: - anon8231489123/ShareGPT_Vicuna_unfiltered - ehartford/wizard_vicuna_70k_unfiltered - ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered - QingyiSi/Alpaca-CoT - teknium/GPT4-LLM-Cleaned - teknium/GPTeacher-General-Instruct - metaeval/ScienceQA_text_only - hellaswag - openai/summarize_from_feedback - riddle_sense - gsm8k - ewof/code-alpaca-instruct-unfiltered language: - en library_name: transformers pipeline_tag: text-generation --- # Manticore 30B Chat (ALPHA) - Alpha release of checkpoint before train and eval loss spikes. Additionally, there seems to be some alignment which is easily jailbroken. Manticore 30B Chat builds on Manticore v1 with new datasets, including a de-duped subset of the Pygmalion dataset. It also removes all Alpaca style prompts using `###` in favor of chat only style prompts using `USER:`,`ASSISTANT:` as well as [pygmalion/metharme prompting](https://huggingface.co/PygmalionAI/metharme-7b#prompting) using `<|system|>, <|user|> and <|model|>` tokens. Questions, comments, feedback, looking to donate, or want to help? Reach out on our [Discord](https://discord.gg/EqrvvehG) or email [wing@openaccessaicollective.org](mailto:wing@openaccessaicollective.org) # Training Datasets Manticore 30B Chat is a Llama 30B model fine-tuned on the following datasets along with the datasets from the original Manticore 30B. **Manticore 30B Chat was trained on effectively 40% of the datasets below due to only training for 0.4 epochs. - de-duped pygmalion dataset, filtered down to RP data - [riddle_sense](https://huggingface.co/datasets/riddle_sense) - instruct augmented - hellaswag, updated for detailed explanations w 30K+ rows - [gsm8k](https://huggingface.co/datasets/gsm8k) - instruct augmented - [ewof/code-alpaca-instruct-unfiltered](https://huggingface.co/datasets/ewof/code-alpaca-instruct-unfiltered) Manticore 30B - [ShareGPT](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered) - based on a cleaned and de-suped subset - [WizardLM](https://huggingface.co/datasets/ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered) - [Wizard-Vicuna](https://huggingface.co/datasets/ehartford/wizard_vicuna_70k_unfiltered) - [subset of QingyiSi/Alpaca-CoT for roleplay and CoT](https://huggingface.co/QingyiSi/Alpaca-CoT) - [GPT4-LLM-Cleaned](https://huggingface.co/datasets/teknium/GPT4-LLM-Cleaned) - [GPTeacher-General-Instruct](https://huggingface.co/datasets/teknium/GPTeacher-General-Instruct) - ARC-Easy & ARC-Challenge - instruct augmented for detailed responses, derived from the `train` split - [hellaswag](https://huggingface.co/datasets/hellaswag) - 5K row subset of instruct augmented for concise responses, derived from the `train` split - [metaeval/ScienceQA_text_only](https://huggingface.co/datasets/metaeval/ScienceQA_text_only) - instruct for concise responses - [openai/summarize_from_feedback](https://huggingface.co/datasets/openai/summarize_from_feedback) - instruct augmented tl;dr summarization Not added from Manticore 13B: - mmlu - mmlu datasets were not added to this model as the `test` split is used for benchmarks # Shoutouts Special thanks to Nanobit for helping with Axolotl, TheBloke for quantizing these models are more accessible to all, ehartford for cleaned datasets, and 0x000011b for the RP dataset. # Demo Try out the model in HF Spaces. The demo uses a quantized GGML version of the model to quickly return predictions on smaller GPUs (and even CPUs). Quantized GGML may have some minimal loss of model quality. - https://huggingface.co/spaces/openaccess-ai-collective/manticore-13b-chat-pyg ## Release Notes - https://wandb.ai/wing-lian/manticore-13b-v2/runs/ij10c6m3 ## Build Manticore was built with [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) on 8xA100 80GB - 0.4 epochs taking approximately 14 hours. No further epochs will be released for the alpha. - The configuration to duplicate this build is provided in this repo's [/config folder](https://huggingface.co/openaccess-ai-collective/manticore-30b-chat-pyg-alpha/tree/main/configs). ## Bias, Risks, and Limitations Manticore has not been aligned to human preferences with techniques like RLHF or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). Manticore was fine-tuned from the base model LlaMa 13B, please refer to its model card's Limitations Section for relevant information. ## Examples TBD