---
license: cc-by-nc-4.0
language:
- en
tags:
- text-generation
datasets:
- stanford_alpaca
pipeline_tag: text-generation
---

Finetuner helps you create experiments to improve embeddings for search tasks. It supports you in the last mile of performance tuning for neural search applications.

LLM Generation models trained by Jina AI, Finetuner team.

This repo contains the full weights (16-bit) for Falcon-7b fine-tuned on the [Code Alpaca](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k) dataset.

## Reproduction

This version of the weights was trained with the following hyperparameters:

- Epochs: 6
- Batch size: 128
- Micro batch size: 8
- Learning rate: 3e-4
- Lora _r_: 8
- Lora target modules: query_key_value

You can reproduce the training with this repository:

https://github.com/jina-ai/jerboa

Make sure you install the requirements, then fine-tune with the following command:

```
python finetune.py \
--base-model tiiuae/falcon-7b --lora-target-modules query_key_value \
--data-path sahil2801/CodeAlpaca-20k --output-dir ./lora-alpaca-code \
--batch-size 128 --micro-batch-size 8 --eval-limit 45 \
--eval-file code_eval.jsonl --wandb-project jerboa --wandb-log-model \
--wandb-watch gradients --num-epochs 6
```

## Inference:

```Python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

TOKENIZER_SOURCE = 'tiiuae/falcon-7b'
BASE_MODEL = 'jinaai/falcon-7b-code-alpaca'
DEVICE = "cuda"

# Alpaca-style prompt; the input section is left empty for this example.
PROMPT = """
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Write a for loop in python

### Input:

### Response:
"""

# Load the fine-tuned weights in half precision.
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=BASE_MODEL,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    device_map='auto',
)
model.eval()

# The tokenizer comes from the base Falcon-7b model.
tokenizer = AutoTokenizer.from_pretrained(
    TOKENIZER_SOURCE,
    trust_remote_code=True,
    padding_side='left',
)
# Reuse the end-of-sequence token for padding.
tokenizer.pad_token = tokenizer.eos_token

inputs = tokenizer(PROMPT, return_tensors="pt")
input_ids = inputs["input_ids"].to(DEVICE)
input_attention_mask = inputs["attention_mask"].to(DEVICE)

with torch.no_grad():
    generation_output = model.generate(
        input_ids=input_ids,
        attention_mask=input_attention_mask,
        return_dict_in_generate=True,
        max_new_tokens=32,
        eos_token_id=tokenizer.eos_token_id,
    )

# The returned sequence contains the prompt followed by the generated tokens.
generation_output = generation_output.sequences[0]
output = tokenizer.decode(generation_output, skip_special_tokens=True)
print(output)
```
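
The inference prompt follows the Alpaca instruction template, and the decoded output includes the prompt itself. A minimal sketch of helpers for building such prompts and keeping only the generated answer is shown below. `build_prompt` and `extract_response` are hypothetical names introduced for illustration, not part of jerboa or transformers, and the exact whitespace of the template is an assumption.

```python
# Hypothetical helpers for Alpaca-style prompts (not part of jerboa or transformers).
# The blank lines between sections are an assumption about the template layout.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes "
    "the request.\n\n"
    "### Instruction:\n{instruction}\n\n"
    "### Input:\n{input}\n\n"
    "### Response:\n"
)

RESPONSE_MARKER = "### Response:"


def build_prompt(instruction: str, input_text: str = "") -> str:
    """Fill the template with an instruction and an optional input."""
    return ALPACA_TEMPLATE.format(instruction=instruction, input=input_text)


def extract_response(decoded: str) -> str:
    """Return the text after the first response marker, or the whole text if absent."""
    if RESPONSE_MARKER not in decoded:
        return decoded.strip()
    return decoded.split(RESPONSE_MARKER, 1)[1].strip()
```

With these helpers, `build_prompt("Write a for loop in python")` could be passed to the tokenizer in place of a hand-written prompt, and `extract_response(output)` applied to the decoded string before printing.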