jinaai
/

falcon-7b-code-alpaca

Text Generation

RefinedWebModel

feature-extraction

text-generation-inference

Inference Endpoints

Model card Files Files and versions Community

falcon-7b-code-alpaca / README.md

sebaweis's picture

Update README.md

14b60eb over 1 year ago

|

2.81 kB

	---
	license: cc-by-nc-4.0
	language:
	- en
	tags:
	- text-generation
	datasets:
	- stanford_alpaca
	pipeline_tag: text-generation
	---

	<br><br>

	<p align="center">
	<img src="https://github.com/jina-ai/finetuner/blob/main/docs/_static/finetuner-logo-ani.svg?raw=true" alt="Finetuner logo: Finetuner helps you to create experiments in order to improve embeddings on search tasks. It accompanies you to deliver the last mile of performance-tuning for neural search applications." width="150px">
	</p>


	<p align="center">
	<b>LLM Generation models trained by Jina AI, Finetuner team.</b>
	</p>

	This repo contains the full weights (16bit) for Falcon-7b
	fit on the [Code Alpaca](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k) dataset.

	## Reproduction
	This version of the weights was trained with the following hyperparameters:

	- Epochs: 6
	- Batch size: 128
	- Micro batch size: 8
	- Learning rate: 3e-4
	- Lora _r_: 8
	- Lora target modules: query_key_value

	You can reproduce using this repository:

	https://github.com/jina-ai/jerboa

	Make sure you install requirements and finetune using this command using the following command:

	```
	python finetune.py \
	--base-model tiiuae/falcon-7b --lora-target-modules query_key_value \
	--data-path sahil2801/CodeAlpaca-20k --output-dir ./lora-alpaca-code \
	--batch-size 128 --micro-batch-size 8 --eval-limit 45 \
	--eval-file code_eval.jsonl --wandb-project jerboa --wandb-log-model \
	--wandb-watch gradients --num-epochs 6
	```

	## Inference:

	```Python
	import torch
	from transformers import AutoTokenizer, AutoModelForCausalLM


	TOKENIZER_SOURCE = 'tiiuae/falcon-7b'
	BASE_MODEL = 'jinaai/falcon-7b-code-alpaca'
	DEVICE = "cuda"

	PROMPT = """
	Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

	### Instruction:
	Write a for loop in python

	### Input:

	### Response:
	"""
	model = AutoModelForCausalLM.from_pretrained(
	pretrained_model_name_or_path=BASE_MODEL,
	torch_dtype=torch.float16,
	trust_remote_code=True,
	device_map='auto',
	)

	model.eval()

	tokenizer = AutoTokenizer.from_pretrained(
	TOKENIZER_SOURCE,
	trust_remote_code=True,
	padding_side='left',
	)
	tokenizer.pad_token = tokenizer.eos_token

	inputs = tokenizer(PROMPT, return_tensors="pt")
	input_ids = inputs["input_ids"].to(DEVICE)
	input_attention_mask = inputs["attention_mask"].to(DEVICE)

	with torch.no_grad():
	generation_output = model.generate(
	input_ids=input_ids,
	attention_mask=input_attention_mask,
	return_dict_in_generate=True,
	max_new_tokens=32,
	eos_token_id=tokenizer.eos_token_id,
	)
	generation_output = generation_output.sequences[0]
	output = tokenizer.decode(generation_output, skip_special_tokens=True)

	print(output)

	```