---
license: creativeml-openrail-m
---
---
<h1 align='center' style='font-size: 36px; font-weight: bold;'>Sparrow</h1>
<h3 align='center' style='font-size: 24px;'>Blazing-Fast Tiny Vision Language Model</h3>


<p align="center">
<img src="https://cdn-uploads.huggingface.co/production/uploads/650c7fbb8ffe1f53bdbe1aec/DTjDSq2yG-5Cqnk6giPFq.jpeg" width="50%" height="auto"/>
</p>

<p align='center' style='font-size: 16px;'>A custom 3B-parameter model tuned for educational contexts. Sparrow is trained on slide-text pairs from machine learning classes. It connects a frozen pre-trained vision encoder (SigLIP) to a frozen language model (Phi-2) through a projector, and it uses attention and a language-modeling loss to understand and generate content tailored to machine learning education. Built by <a href="https://www.linkedin.com/in/manishkumarthota/">@Manish</a>. The model is released for research purposes only; commercial use is not allowed.</p>
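
The wiring described above (frozen vision encoder, projector, frozen language model) can be sketched with toy PyTorch modules. This is a minimal illustration under stated assumptions, not Sparrow's actual implementation: the `nn.Linear`/`nn.Embedding` stand-ins, the two-layer MLP projector, the tiny dimensions, the placeholder token id `99`, and computing the loss only on answer tokens (the usual LLaVA convention) are all assumptions. The transformer blocks, where attention actually happens, are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyVisionLanguageModel(nn.Module):
    """Hypothetical toy wiring: frozen vision encoder -> projector -> frozen language model."""

    def __init__(self, patch_dim=48, vision_dim=16, text_dim=32, vocab_size=100):
        super().__init__()
        # Stand-in for the frozen SigLIP encoder (the real one is a full ViT).
        self.vision_encoder = nn.Linear(patch_dim, vision_dim)
        # Stand-ins for the frozen Phi-2 embedding table and output head
        # (the real model has transformer blocks between these two).
        self.embed_tokens = nn.Embedding(vocab_size, text_dim)
        self.lm_head = nn.Linear(text_dim, vocab_size)
        # Assumed two-layer MLP projector; the card does not document its design.
        self.projector = nn.Sequential(
            nn.Linear(vision_dim, text_dim), nn.GELU(), nn.Linear(text_dim, text_dim)
        )
        # Freeze everything except the projector.
        for module in (self.vision_encoder, self.embed_tokens, self.lm_head):
            module.requires_grad_(False)

    def forward(self, input_ids, patches, image_token_id):
        """Splice projected image features in place of the single image placeholder token."""
        image_embeds = self.projector(self.vision_encoder(patches))  # (num_patches, text_dim)
        text_embeds = self.embed_tokens(input_ids)  # (seq_len, text_dim)
        pos = int((input_ids == image_token_id).nonzero()[0])
        embeds = torch.cat([text_embeds[:pos], image_embeds, text_embeds[pos + 1:]], dim=0)
        return self.lm_head(embeds)  # (seq_len - 1 + num_patches, vocab_size)


def next_token_loss(logits, labels):
    """Causal LM loss: position t predicts label t+1; labels equal to -100 are ignored."""
    return F.cross_entropy(logits[:-1], labels[1:], ignore_index=-100)


# Example: 5 text tokens, one of which (99) stands in for <image>, plus 9 image patches.
model = ToyVisionLanguageModel()
input_ids = torch.tensor([1, 2, 99, 3, 4])
logits = model(input_ids, torch.randn(9, 48), image_token_id=99)
print(logits.shape)  # torch.Size([13, 100])

# Mask every position except the last two (the "answer") with -100.
labels = torch.full((logits.shape[0],), -100)
labels[-2:] = torch.tensor([3, 4])
loss = next_token_loss(logits, labels)
loss.backward()  # gradients reach only the projector's parameters
```

Because only the projector keeps `requires_grad=True`, a training step in this sketch updates the bridge between the two pretrained models while the encoder and language model stay fixed.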

## How to use


Install the dependencies:
```bash
pip install transformers  # the latest version should work, but v4.31.0 is recommended
pip install -q torch pillow accelerate einops
```

You can use the following code for model inference. The prompt format follows the chat template used by [LLaVA](https://github.com/haotian-liu/LLaVA).
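
The template can be isolated in a small helper so it is easy to reuse and inspect. This is a sketch: `build_prompt`, `SYSTEM_PROMPT`, and `IMAGE_PLACEHOLDER` are hypothetical names, and it assumes a single image and a single user turn, with the same system message and `USER:`/`ASSISTANT:` markers as the inference code.

```python
# Hypothetical helper mirroring the single-turn, LLaVA-style prompt used for inference.
SYSTEM_PROMPT = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)
IMAGE_PLACEHOLDER = "<image>"  # marks where the projected image features are inserted


def build_prompt(question: str) -> str:
    """Return the full prompt for one image and one user question."""
    question = question.strip()
    return f"{SYSTEM_PROMPT} USER: {IMAGE_PLACEHOLDER}\n{question} ASSISTANT:"


print(build_prompt("What does this slide say about overfitting?"))
```

The prompt ends with `ASSISTANT:` so that generation continues with the model's answer.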

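```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

# Allocate new tensors on the GPU by default (a CUDA-capable GPU is assumed)
torch.set_default_device("cuda")

# Load the model and tokenizer; trust_remote_code=True is required because the
# model class and its image_preprocess method come from the repository's custom code
model = AutoModelForCausalLM.from_pretrained(
    "ManishThota/Sparrow",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("ManishThota/Sparrow", trust_remote_code=True)


def predict(question, image_path, max_new_tokens=25):
    """Answer a question about a single image, e.g. a lecture slide."""
    # LLaVA-style prompt; <image> marks where the image features are inserted.
    # Include the question's own punctuation (e.g. a trailing "?").
    text = (
        "A chat between a curious user and an artificial intelligence assistant. "
        "The assistant gives helpful, detailed, and polite answers to the user's questions. "
        f"USER: <image>\n{question} ASSISTANT:"
    )
    # Convert to RGB so PNG slides with an alpha channel are handled as well
    image = Image.open(image_path).convert("RGB")

    input_ids = tokenizer(text, return_tensors="pt").input_ids.to("cuda")
    image_tensor = model.image_preprocess(image)

    # Generate the answer; raise max_new_tokens for longer responses
    output_ids = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        images=image_tensor,
        use_cache=True,
    )[0]

    # Decode only the newly generated tokens, skipping the echoed prompt
    return tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip()


# Example call; "slide.png" is a placeholder path
print(predict("What is this slide about?", "slide.png"))
```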
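
To query a whole lecture deck, `predict` can be called once per slide image. The helper below is a hypothetical sketch: `answer_slides` is not part of the model's code, it assumes the slides were exported as image files (PNG by default) into a single folder, and it takes the prediction function as an argument so it can be exercised without a GPU.

```python
from pathlib import Path
from typing import Callable, Dict


def answer_slides(
    slide_dir: str,
    question: str,
    predict_fn: Callable[[str, str], str],
    pattern: str = "*.png",
) -> Dict[str, str]:
    """Ask the same question about every matching slide image, in filename order."""
    results: Dict[str, str] = {}
    for path in sorted(Path(slide_dir).glob(pattern)):
        # predict_fn takes (question, image_path), matching the predict function's signature
        results[path.name] = predict_fn(question, str(path))
    return results
```

With the model loaded, a call such as `answer_slides("lecture_slides", "What is this slide about?", predict)` returns a dictionary mapping each slide's filename to the model's answer; the folder name here is a placeholder.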