---
license: apache-2.0
language:
- en
tags:
- llama
- InstructGPT
- hf
---

# Camel 🐪 5B

|[![Model architecture](https://img.shields.io/badge/Model%20Arch-Transformer%20Decoder-green)](#model-architecture)|[![Model size](https://img.shields.io/badge/Params-5B-green)](#model-architecture)|[![Language](https://img.shields.io/badge/Language-en--US-lightgrey#model-badge)](#datasets)|

## Model Description

Camel-5B is an instruction-following large language model. It is based on [Palmyra-Base](https://huggingface.co/Writer/palmyra-base) and fine-tuned on ~70k instruction-and-response records generated by the Writer team, covering the task categories described in the InstructGPT paper: brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization.

## Usage

```python
import os

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Set HF_TOKEN in your shell first, e.g. export HF_TOKEN=hf_***
auth_token = os.environ.get("HF_TOKEN", True)

model_name = "Writer/camel-5b"

tokenizer = AutoTokenizer.from_pretrained(model_name, use_auth_token=auth_token)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,
    use_auth_token=auth_token,
)

instruction = "Describe a futuristic device that revolutionizes space travel."
input_context = ""  # optional extra context for the instruction

PROMPT_DICT = {
    "prompt_input": (
        "Below is an instruction that describes a task, paired with an input that provides further context. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"
    ),
    "prompt_no_input": (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        "### Instruction:\n{instruction}\n\n### Response:"
    ),
}

# Use the "input" template only when extra context is provided.
text = (
    PROMPT_DICT["prompt_input"].format(instruction=instruction, input=input_context)
    if input_context
    else PROMPT_DICT["prompt_no_input"].format(instruction=instruction)
)

model_inputs = tokenizer(text, return_tensors="pt").to("cuda")
output_ids = model.generate(**model_inputs, max_length=100)

output_text = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
clean_output = output_text.split("### Response:")[1].strip()

print(clean_output)
```

### Limitations and Biases

Camel's core functionality is to take a string of text and predict the next token. While language models are widely used for other tasks, there are many unknowns in this work. When prompting Camel, keep in mind that the next statistically likely token is not always the token that produces the most "accurate" text. Never rely on Camel to produce factually correct output.

Camel was trained on Writer's custom data. As with all language models, it is difficult to predict how Camel will respond to a given prompt, and offensive content may appear unexpectedly. We recommend that outputs be curated or filtered by humans before release, both to censor undesirable content and to improve the quality of the results.

## Evaluation results

Evaluation of the Camel-5B model on standard benchmarks is coming soon.

## Citation and Related Information

To cite this model:

```
@misc{Camel,
  author = {Writer Engineering team},
  title = {{Camel-5B InstructGPT}},
  howpublished = {\url{https://dev.writer.com}},
  year = 2023,
  month = April
}
```