---
license: apache-2.0
language:
- en
- fr
library_name: transformers
pipeline_tag: text-generation
tags:
- mistral
- mergekit
- merge
---

## Mistral-Depth-UP-Scaled-9B

An auto-regressive causal language model created by merging two fine-tuned Mistral 7B models into a single ~9B-parameter model via depth up-scaling.

## Benchmarks

Coming soon.

## Usage

```python
# Load the model in 4-bit NF4 precision via bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "ayoubkirouane/Mistral-Depth-UP-Scaled-9B",
    device_map="auto",
    quantization_config=nf4_config,
    use_cache=False,
)

tokenizer = AutoTokenizer.from_pretrained("ayoubkirouane/Mistral-Depth-UP-Scaled-9B")
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"


def generate_response(prompt, model, max_new_tokens):
    encoded_input = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)
    model_inputs = encoded_input.to("cuda")
    generated_ids = model.generate(
        **model_inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Skip special tokens so the prompt can be stripped cleanly from the output
    decoded_output = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
    return decoded_output[0].replace(prompt, "")


print(generate_response(prompt="What are GANs?", model=model, max_new_tokens=100))
```
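For interactive use, responses can also be streamed token by token instead of decoded once generation finishes. A minimal sketch using transformers' `TextStreamer`, assuming the `model` and `tokenizer` loaded above:

```python
from transformers import TextStreamer

# Print tokens to stdout as they are generated, skipping the echoed prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

inputs = tokenizer("What are GANs?", return_tensors="pt").to("cuda")
_ = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
    streamer=streamer,
)
```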
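## Merge Configuration

The exact merge recipe is not published in this card. As an illustration only: a depth up-scaled merge of two Mistral 7B fine-tunes is typically expressed in mergekit as a `passthrough` merge that stacks overlapping layer ranges. The model names and layer ranges below are hypothetical placeholders, not the actual configuration used for this model.

```yaml
# Hypothetical depth up-scaling sketch; source models and layer ranges are placeholders
slices:
  - sources:
      - model: finetuned-mistral-7b-A   # placeholder
        layer_range: [0, 20]
  - sources:
      - model: finetuned-mistral-7b-B   # placeholder
        layer_range: [12, 32]
merge_method: passthrough
dtype: bfloat16
```

A stack like this yields 40 decoder layers versus the base model's 32, which is roughly what takes a 7B Mistral to ~9B parameters.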