|
--- |
|
license: apache-2.0 |
|
pipeline_tag: text-generation |
|
--- |
|
|
|
<p align="center" style="font-size:34px;"><b>Buddhi 7B</b></p> |
|
|
|
# Buddhi-7B vLLM Inference: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/11_8W8FpKK-856QdRVJLyzbu9g-DMxNfg?usp=sharing) |
|
|
|
## Model Description |
|
|
|
|
|
|
Buddhi is a general-purpose chat model fine-tuned from Mistral 7B Instruct and optimised to handle an extended context length of up to 128,000 tokens using the [YaRN (Yet another RoPE extensioN)](https://arxiv.org/abs/2309.00071) technique. The extended window lets Buddhi retain context across long documents and conversations, making it well suited to tasks such as long-document summarization, detailed narrative generation, and question answering over large inputs.
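
You can verify the extended window locally by inspecting the model configuration. A minimal sketch, assuming the published config exposes the standard `max_position_embeddings` and `rope_scaling` fields (field names may differ for custom configs):

```python
from transformers import AutoConfig

# Load only the configuration; no weights are downloaded.
config = AutoConfig.from_pretrained("aiplanet/Buddhi-128K-Chat", trust_remote_code=True)

# 131072 (= 128K) is expected if the YaRN-extended window is in place.
print("max_position_embeddings:", getattr(config, "max_position_embeddings", None))
print("rope_scaling:", getattr(config, "rope_scaling", None))
```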
|
|
|
## Dataset Creation |
|
|
|
## Architecture |
|
|
|
### Hardware Requirements

> For 128K context length
> - 80 GB VRAM (A100 preferred)

> For 32K context length
> - 40 GB VRAM (A100 preferred)
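
These figures are dominated by the KV cache, which grows linearly with context length. A rough back-of-envelope sketch, assuming a Mistral-7B-style architecture (32 layers, 8 KV heads, head dimension 128) and an fp16 KV cache; the exact budget also depends on weights, activations, and the serving framework's reservations:

```python
# Back-of-envelope KV-cache size for a Mistral-7B-style model (assumed values).
num_layers = 32          # transformer blocks
num_kv_heads = 8         # grouped-query attention KV heads
head_dim = 128           # per-head dimension
bytes_per_value = 2      # fp16

def kv_cache_gib(context_length: int) -> float:
    # 2x for keys and values, per layer, per KV head, per head dimension.
    total_bytes = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * context_length
    return total_bytes / 1024**3

for ctx in (32_768, 131_072):
    print(f"{ctx:>7} tokens -> ~{kv_cache_gib(ctx):.1f} GiB KV cache")
```

On top of the cache sit roughly 14 GB of fp16 weights plus activation and framework overhead, which is why the A100-class figures above leave generous headroom.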
|
|
|
### vLLM - For Faster Inference |
|
|
|
#### Installation |
|
|
|
```
!pip install vllm
!pip install flash_attn  # if Flash Attention 2 is supported by your system
```
|
Please check out the [Flash Attention 2](https://github.com/Dao-AILab/flash-attention) GitHub repository for more detailed installation instructions.
|
|
|
**Implementation**: |
|
|
|
```python
from vllm import LLM, SamplingParams

# Load the model; max_model_len=131072 enables the full 128K context window.
llm = LLM(
    model='aiplanet/Buddhi-128K-Chat',
    gpu_memory_utilization=0.99,
    max_model_len=131072
)

# Prompts follow the Mistral instruction format (see the prompt template below).
prompts = [
    """<s> [INST] Please tell me a joke. [/INST] """,
    """<s> [INST] What is Machine Learning? [/INST] """
]

sampling_params = SamplingParams(
    temperature=0.8,
    top_p=0.95,
    max_tokens=1000
)

outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(generated_text)
    print("\n\n")
```
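
To actually exercise the long window, the same `generate` call can take a prompt that wraps a long document. A minimal sketch reusing the `llm` and `sampling_params` objects above; the file path and question are placeholders, not part of the original example:

```python
# Hypothetical example: summarise a long document within the 128K window.
with open("long_report.txt", "r", encoding="utf-8") as f:  # placeholder path
    document = f.read()

long_prompt = f"<s> [INST] Summarise the following report in five bullet points:\n\n{document} [/INST] "

long_outputs = llm.generate([long_prompt], sampling_params)
print(long_outputs[0].outputs[0].text)
```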
|
|
|
### Transformers - Basic Implementation |
|
|
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization so the model fits in a smaller VRAM budget.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model_name = "aiplanet/Buddhi-128K-Chat"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="sequential",
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)

prompt = "<s> [INST] Please tell me a small joke. [/INST] "

tokens = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(
    **tokens,
    max_new_tokens=100,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
)

# Strip the prompt so only the model's reply is printed.
decoded_output = tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True)[0]
print(f"Output:\n{decoded_output[len(prompt):]}")
```
|
|
|
Output |
|
|
|
``` |
|
Output: |
|
Why don't scientists trust atoms? |
|
|
|
Because they make up everything. |
|
``` |
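
For interactive use, the same model can stream tokens as they are generated instead of returning only the final string. A small optional sketch using `transformers.TextStreamer`, reusing the `model` and `tokenizer` objects above (an illustration, not part of the original card):

```python
from transformers import TextStreamer

# Print tokens as they are generated; skip the prompt and special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

tokens = tokenizer("<s> [INST] Explain YaRN in one paragraph. [/INST] ", return_tensors="pt").to("cuda")
model.generate(**tokens, max_new_tokens=200, do_sample=True, top_p=0.95, temperature=0.8, streamer=streamer)
```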
|
|
|
## Evaluation |
|
|
|
| Model                             | HellaSwag | ARC-Challenge | MMLU  | TruthfulQA | Winogrande |
|-----------------------------------|-----------|---------------|-------|------------|------------|
| Buddhi-128K-Chat                  | 82.78     | 57.51         | 57.39 | 55.44      | 78.37      |
| NousResearch/Yarn-Mistral-7b-128k | 80.58     | 58.87         | 60.64 | 42.46      | 72.85      |
|
|
|
|
|
## Prompt Template for Buddhi-128K-Chat
|
|
|
To leverage the instruction fine-tuning, wrap each user turn in `[INST]` and `[/INST]` tokens. Only the very first instruction should start with the begin-of-sentence token (`<s>`); subsequent instructions should not. The assistant's generation ends with the end-of-sentence token (`</s>`).
|
|
|
```
"<s>[INST] What is your favourite condiment? [/INST]"
"Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> "
"[INST] Do you have mayonnaise recipes? [/INST]"
```
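
If the tokenizer bundles a Mistral-style chat template (an assumption, not verified here), the same format can also be produced programmatically instead of by hand:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("aiplanet/Buddhi-128K-Chat", trust_remote_code=True)

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice."},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]

# Renders the conversation with [INST]/[/INST] markers and BOS/EOS placement,
# assuming the model's tokenizer defines a chat template.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```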
|
|
|
## Get in Touch |
|
|
|
You can schedule a 1:1 meeting with our DevRel & Community Team to get started with AI Planet Open Source LLMs and GenAI Stack. Schedule the call here: [https://calendly.com/jaintarun](https://calendly.com/jaintarun) |
|
|
|
Stay tuned for more updates and be a part of the coding evolution. Join us on this exciting journey as we make AI accessible to all at AI Planet! |
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.39.2 |
|
- PyTorch 2.2.1+cu121
|
- Datasets 2.18.0 |
|
- Accelerate 0.27.2 |
|
- flash_attn 2.5.6 |
|
|
|
### Citation |
|
|
|
``` |
|
@misc{Chaitanya890_lucifertrj,
  author    = {Chaitanya Singhal and Tarun Jain},
  title     = {Buddhi-128k-Chat by AI Planet},
  year      = {2024},
  url       = {https://huggingface.co/aiplanet/Buddhi-128K-Chat},
  publisher = {Hugging Face}
}
|
``` |