---
license: apache-2.0
language:
- en
- ar
library_name: transformers
tags:
- English
- Arabic
- Decoder
- Causal-lm
- LLM
- 4-bit
---

# Jais-7b-chat (a double-quantized version)

This model is a double-quantized version of `jais-13b-chat` by Core42. The aim is to make the model runnable on GPU-poor machines. For high-quality output, it is better to use the original, non-quantized 13B model.

Model creator: [Core42](https://huggingface.co/core42)

Original model: jais-13b-chat

# How To Run

Run it as a `text-generation` pipeline task.

# System Requirements

It has been tested successfully on a Google Colab Pro `T4` instance.

# How To Run

1. First, install the libraries:

```sh
pip install -Uq huggingface_hub transformers bitsandbytes xformers accelerate
```

2. Create the pipeline:

```py
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, TextStreamer

tokenizer = AutoTokenizer.from_pretrained("erfanvaredi/jais-7b-chat")
model = AutoModelForCausalLM.from_pretrained(
    "erfanvaredi/jais-7b-chat",
    trust_remote_code=True,
    device_map='auto',
)

# Create a pipeline
pipe = pipeline(model=model, tokenizer=tokenizer, task='text-generation')
```

3. Create the prompt:

```py
chat = [
    {"role": "user", "content": 'Tell me a funny joke about Large Language Models.'},
]
prompt = pipe.tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
```

4. Create a streamer (optional: use it if you want the generated text streamed as it is produced; otherwise skip this step):

```py
streamer = TextStreamer(
    tokenizer,
    skip_prompt=True,
)
```

5. Ask the model:

```py
pipe(
    prompt,
    streamer=streamer,
    max_new_tokens=256,
    do_sample=False,
)
```

:)
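# Arabic Prompt (Optional)

Jais is bilingual, so the same chat template works for Arabic input. A minimal sketch reusing the pipeline from step 2; the prompt wording below is only illustrative:

```py
# An Arabic chat turn; the content is an example prompt
# ("Write me a short joke about large language models.")
chat_ar = [
    {"role": "user", "content": "اكتب لي نكتة قصيرة عن نماذج اللغة الكبيرة."},
]
prompt_ar = pipe.tokenizer.apply_chat_template(chat_ar, tokenize=False, add_generation_prompt=True)

# Reuse the pipeline from step 2
pipe(prompt_ar, max_new_tokens=256, do_sample=False)
```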
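# Using `generate` Directly (Optional)

If you prefer not to use the `pipeline` wrapper, the same model can be driven with `model.generate`. A minimal sketch, assuming the `tokenizer`, `model`, and `prompt` from the steps above are already in scope:

```py
import torch

# Tokenize the chat-templated prompt and move it to the model's device
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# Greedy decoding; adjust max_new_tokens as needed
with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)

# Decode only the newly generated part of the sequence
new_tokens = output_ids[0][input_ids.shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```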