|
--- |
|
license: apache-2.0 |
|
library_name: transformers |
|
pipeline_tag: text-generation |
|
tags: |
|
- 8bit |
|
- sharded |
|
- open_llama |
|
inference: false
|
--- |
|
|
|
# open_llama_13b-sharded-8bit |
|
|
|
This is [open_llama_13b](https://huggingface.co/openlm-research/open_llama_13b), sharded into ~2 GB shards and quantized to 8-bit precision with `bitsandbytes==0.38.0`. Please refer to the original model card for details.
|
|
|
<a href="https://colab.research.google.com/gist/pszemraj/166ad661c6af1e024d4e2897621fc886/open_llama_13b-sharded-8bit-example.ipynb"> |
|
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/> |
|
</a> |
|
|
|
|
|
## Loading
|
|
|
```sh
pip install -U -q sentencepiece transformers accelerate bitsandbytes
```
|
|
|
Then, load the model and tokenizer:
|
|
|
```python
from transformers import LlamaTokenizer, LlamaForCausalLM

model_name = "ethzanalytics/open_llama_13b-sharded-8bit"

# use the slow tokenizer; the original open_llama model card advises against
# the auto-converted fast tokenizer, which can produce incorrect tokenizations
tokenizer = LlamaTokenizer.from_pretrained(model_name, use_fast=False)

# load_in_8bit requires bitsandbytes; device_map="auto" lets accelerate
# place the shards across the available GPU(s) and CPU
model = LlamaForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,
    device_map="auto",
)
```
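
Once loaded, the model can be used for text generation like any causal LM. Below is a minimal sketch; the prompt and decoding parameters (`max_new_tokens`, `temperature`, etc.) are illustrative choices, not recommendations from the original model card:

```python
prompt = "Q: What is the tallest mountain on Earth?\nA:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# sampling settings here are arbitrary examples; tune them for your use case
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```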