Mrw33554432 / bitLinear-phi-1.5

	---
	license: mit
	datasets:
	- wikipedia
	---
	# BitLinear-phi-1.5

	BitLinear-phi-1.5 is a model trained partially using the method described in [The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits](https://arxiv.org/abs/2402.17764).

	Our BitLinear layer applies 1-bit quantization to the weights only; all other computations described in the paper are discarded.
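
As a rough illustration of weight-only quantization, the sketch below binarizes the weights of a linear layer in the forward pass and leaves activations untouched. It is not the BitLinear implementation used for this model: the names `binarize_weight` and `BitLinearSketch`, the per-tensor mean-absolute-value scale, and the straight-through estimator are all assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def binarize_weight(w: torch.Tensor) -> torch.Tensor:
    """Map every weight to +scale or -scale (hypothetical helper)."""
    scale = w.abs().mean()  # assumed per-tensor scale
    signs = torch.where(w >= 0, torch.ones_like(w), -torch.ones_like(w))
    return signs * scale


class BitLinearSketch(nn.Linear):
    """Illustrative weight-only 1-bit linear layer, not the author's code."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        w_bin = binarize_weight(w)
        # Straight-through estimator (an assumption): the forward pass uses
        # the binarized weights, the backward pass treats quantization as
        # the identity so the full-precision weights still receive gradients.
        w_q = w + (w_bin - w).detach()
        return F.linear(x, w_q, self.bias)


if __name__ == "__main__":
    layer = BitLinearSketch(8, 4)
    out = layer(torch.randn(2, 8))
    out.sum().backward()
    print(out.shape, layer.weight.grad is not None)
```

The full-precision weights remain the trainable parameters here; only the values used in the matrix multiplication are quantized.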

	The model structure is from [phi-1.5](https://huggingface.co/microsoft/phi-1_5), with all linear layers except lm_head replaced with our custom BitLinear layer.
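
The replacement described above can be sketched as a recursive walk over the module tree that skips `lm_head`. This is only an approximation of the idea: `swap_linear_layers`, `PlaceholderBitLinear`, and the `copy_weights` flag are hypothetical names, and the `replace_linear_in_hf` helper used in the sample inference code may behave differently.

```python
from torch import nn


class PlaceholderBitLinear(nn.Linear):
    """Stand-in for a custom BitLinear layer (hypothetical)."""


def swap_linear_layers(module: nn.Module,
                       skip_names=("lm_head",),
                       copy_weights: bool = True) -> nn.Module:
    """Replace every nn.Linear below `module`, except children named in skip_names."""
    for name, child in list(module.named_children()):
        if name in skip_names:
            continue  # leave the output head in full precision
        if isinstance(child, nn.Linear) and not isinstance(child, PlaceholderBitLinear):
            new_layer = PlaceholderBitLinear(
                child.in_features,
                child.out_features,
                bias=child.bias is not None,
                device=child.weight.device,
                dtype=child.weight.dtype,
            )
            if copy_weights:
                new_layer.load_state_dict(child.state_dict())
            setattr(module, name, new_layer)
        else:
            # Recurse into containers and other composite modules
            swap_linear_layers(child, skip_names, copy_weights)
    return module


if __name__ == "__main__":
    toy = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
    print(swap_linear_layers(toy))
```

Matching on the child name keeps the sketch short; a real model may need a more specific rule if several submodules share a name.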

	It was trained on a small subset of the [wikipedia dataset](https://huggingface.co/datasets/wikipedia), for research validation purposes only.

	Please note that the kernel is not yet optimized for 1-bit matrices.

	```python
	from datasets import load_dataset
	dataset = load_dataset("wikipedia", "20220301.en")
	dataset = dataset['train'].select(range(int(1e5)))  # first 100,000 articles
	```
	The model was trained on an RTX 3090 (24 GB) for 16 hours.

	### For training code, check --placeholder--.

	The training code should be compatible with most LLMs on Hugging Face, but you have to train from scratch.

	Starting from pretrained model weights will not work because of gradient explosion.

	## Sample inference code


	```python
	import torch
	from replace_hf import replace_linear_in_hf
	from transformers import AutoModelForCausalLM, AutoTokenizer


	def quick_test(model, tokenizer, prompt: str):
	    # Encode the inputs
	    inputs = tokenizer.encode(prompt, return_tensors="pt")

	    # Generate outputs
	    outputs = model.generate(inputs, max_length=64)

	    # Decode and print the outputs
	    print(tokenizer.decode(outputs[0]))


	torch.set_default_device("cuda")

	tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained("Mrw33554432/bitLinear-phi-1.5", trust_remote_code=True)

	print(model)
	# Replace Linear layers with BitLinear
	replace_linear_in_hf(model, keep_param=True)
	print(model)

	quick_test(model, tokenizer, prompt="Tom is the")
	```