	---
	library_name: pruna-engine
	thumbnail: "https://assets-global.website-files.com/646b351987a8d8ce158d1940/64ec9e96b4334c0e1ac41504_Logo%20with%20white%20text.svg"
	metrics:
	- memory_disk
	- memory_inference
	- inference_latency
	- inference_throughput
	- inference_CO2_emissions
	- inference_energy_consumption
	---
	<!-- header start -->
	<!-- 200823 -->
	<div style="width: auto; margin-left: auto; margin-right: auto">
	<a href="https://www.pruna.ai/" target="_blank" rel="noopener noreferrer">
	<img src="https://i.imgur.com/eDAlcgk.png" alt="PrunaAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
	</a>
	</div>
	<!-- header end -->

	[![Twitter](https://img.shields.io/twitter/follow/PrunaAI?style=social)](https://twitter.com/PrunaAI)
	[![GitHub](https://img.shields.io/github/followers/PrunaAI?label=Follow%20%40PrunaAI&style=social)](https://github.com/PrunaAI)
	[![LinkedIn](https://img.shields.io/badge/LinkedIn-Connect-blue)](https://www.linkedin.com/company/93832878/admin/feed/posts/?feedType=following)
	[![Discord](https://img.shields.io/badge/Discord-Join%20Us-blue?style=social&logo=discord)](https://discord.gg/rskEr4BZJx)

	# Simply make AI models cheaper, smaller, faster, and greener!

	- Give a thumbs up if you like this model!
	- Contact us and tell us which model to compress next [here](https://www.pruna.ai/contact).
	- Request access to easily compress your own AI models [here](https://z0halsaff74.typeform.com/pruna-access?typeform-source=www.pruna.ai).
	- Read the documentation to learn more [here](https://pruna-ai-pruna.readthedocs-hosted.com/en/latest/).
	- Join the Pruna AI community on Discord [here](https://discord.com/invite/vb6SmA3hxu) to share feedback and suggestions or to get help.

	## Frequently Asked Questions
	- *How does the compression work?* The model is compressed using bitsandbytes 4-bit quantization.
	- *How does the model quality change?* The quality of the model output degrades slightly.
	- *What is the model format?* We use the standard safetensors format.
	- *How can I compress my own models?* You can request premium access to more compression methods and tech support for your specific use cases [here](https://z0halsaff74.typeform.com/pruna-access?typeform-source=www.pruna.ai).
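
As a rough illustration of why 4-bit quantization shrinks a model, the sketch below estimates weight storage from a parameter count and a bit width. The function name `weight_memory_gb` and the 1-billion-parameter figure are hypothetical, chosen only to make the arithmetic easy to follow. Real bitsandbytes checkpoints also store quantization metadata and may keep some layers in higher precision, so actual sizes will differ.

```python
def weight_memory_gb(num_params: int, bits_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Hypothetical 1-billion-parameter model, for illustration only.
params = 1_000_000_000
fp16_gb = weight_memory_gb(params, 16)  # 16 bits per weight
int4_gb = weight_memory_gb(params, 4)   # 4 bits per weight

print(f"16-bit: {fp16_gb:.2f} GB, 4-bit: {int4_gb:.2f} GB")
```

Under these assumptions the 4-bit weights take about a quarter of the space of the 16-bit weights.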

	## Usage
	### Prerequisites
	Jamba requires `transformers` version 4.39.0 or higher:
	```bash
	pip install "transformers>=4.39.0"
	```

	To run the optimized Mamba implementations, you first need to install `mamba-ssm` and `causal-conv1d`:
	```bash
	pip install mamba-ssm "causal-conv1d>=1.2.0"
	```
	The model must also be on a CUDA device.
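
The prerequisites above can be checked from Python before loading the model. This is a minimal sketch using only the standard library. The helpers `parse_version` and `meets_minimum` are hypothetical names introduced here, and the comparison deliberately ignores pre-release suffixes such as `.dev0`, so treat the result as a quick sanity check rather than a full version resolver.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple:
    """Turn '4.39.0' or '1.2.0.post2' into a tuple of its leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'post2'
        parts.append(int(digits))
    return tuple(parts)


def pad(parts: tuple, width: int) -> tuple:
    """Right-pad a version tuple with zeros so '4.39' compares equal to '4.39.0'."""
    return parts + (0,) * (width - len(parts))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if the installed version is at least the minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    return pad(a, width) >= pad(b, width)


# Minimum versions stated in this README; none is stated for mamba-ssm.
REQUIREMENTS = {"transformers": "4.39.0", "causal-conv1d": "1.2.0", "mamba-ssm": "0"}

for package, minimum in REQUIREMENTS.items():
    try:
        installed = version(package)
    except PackageNotFoundError:
        print(f"{package}: not installed")
        continue
    status = "ok" if meets_minimum(installed, minimum) else f"needs >= {minimum}"
    print(f"{package} {installed}: {status}")

# The optimized kernels also need a CUDA device.
try:
    import torch
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    print("torch: not installed")
```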

	You can run the model without the optimized Mamba kernels, but this is not recommended because it results in significantly higher latency. To do so, specify `use_mamba_kernels=False` when loading the model.
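
A minimal sketch of loading with the kernels disabled might look like the following. The helper name `load_without_mamba_kernels` and the `LOAD_KWARGS` dictionary are hypothetical conveniences. The model ID and `trust_remote_code=True` are taken from the loading call in this README, and `use_mamba_kernels=False` is the flag described above.

```python
# Keyword arguments passed to from_pretrained when the optimized kernels are unavailable.
LOAD_KWARGS = {
    "trust_remote_code": True,   # same setting as the loading call in this README
    "use_mamba_kernels": False,  # fall back to the slower, non-optimized path
}


def load_without_mamba_kernels(model_id: str = "PrunaAI/Jamba-v0.1-bnb-4bit"):
    """Load the model with the optimized Mamba kernels disabled."""
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(model_id, **LOAD_KWARGS)
```

Calling `load_without_mamba_kernels()` downloads and loads the checkpoint, so expect it to take a while on first use.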

	### Run the model
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	model = AutoModelForCausalLM.from_pretrained("PrunaAI/Jamba-v0.1-bnb-4bit",
	trust_remote_code=True)
	tokenizer = AutoTokenizer.from_pretrained("PrunaAI/Jamba-v0.1-bnb-4bit")

	input_ids = tokenizer("In the recent Super Bowl LVIII,", return_tensors='pt').to(model.device)["input_ids"]

	outputs = model.generate(input_ids, max_new_tokens=216)

	print(tokenizer.batch_decode(outputs))
	# ["<|startoftext|>In the recent Super Bowl LVIII, the Kansas City Chiefs emerged victorious, defeating the San Francisco 49ers in a thrilling overtime showdown. The game was a nail-biter, with both teams showcasing their skills and determination.\n\nThe Chiefs, led by their star quarterback Patrick Mahomes, displayed their offensive prowess, while the 49ers, led by their strong defense, put up a tough fight. The game went into overtime, with the Chiefs ultimately securing the win with a touchdown.\n\nThe victory marked the Chiefs' second Super Bowl win in four years, solidifying their status as one of the top teams in the NFL. The game was a testament to the skill and talent of both teams, and a thrilling end to the NFL season.\n\nThe Super Bowl is not just about the game itself, but also about the halftime show and the commercials. This year's halftime show featured a star-studded lineup, including Usher, Alicia Keys, and Lil Jon. The show was a spectacle of music and dance, with the performers delivering an energetic and entertaining performance.\n"]

	```

	## Credits & License

	The license of the smashed model follows the license of the original model. Before using this model, please check the license of ai21labs/Jamba-v0.1, which provided the base model. The license of `pruna-engine` is available [here](https://pypi.org/project/pruna-engine/) on PyPI.

	## Want to compress other models?

	- Contact us and tell us which model to compress next [here](https://www.pruna.ai/contact).
	- Request access to easily compress your own AI models [here](https://z0halsaff74.typeform.com/pruna-access?typeform-source=www.pruna.ai).