---
license: apache-2.0
pipeline_tag: text-generation
---

<p align="center" style="font-size:34px;"><b>Buddhi 7B</b></p>

# Buddhi-7B vLLM Inference: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/11_8W8FpKK-856QdRVJLyzbu9g-DMxNfg?usp=sharing)

## Model Description

Buddhi is a general-purpose chat model fine-tuned from Mistral 7B Instruct and optimised to handle a context length of up to 128,000 tokens using the [YaRN (Yet another RoPE extensioN)](https://arxiv.org/abs/2309.00071) technique. The longer context window lets Buddhi retain more of a long document or conversation, which suits tasks such as document summarization, narrative generation, and question answering over long inputs.

## Dataset Creation

## Architecture

### Hardware requirements:
> For 128k Context Length
> - 80GB VRAM - A100 Preferred

> For 32k Context Length
> - 40GB VRAM - A100 Preferred
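
As a rough sanity check on the figures above, the key-value cache for a single sequence can be estimated from the model shape. The sketch below assumes the standard Mistral 7B configuration (32 layers, 8 key-value heads, head dimension 128) and 16-bit cache entries. These values are assumptions rather than figures from this model card, and `kv_cache_gib` is a hypothetical helper. The estimate excludes model weights, activations, and framework overhead, so treat it as a lower bound rather than a sizing guide.

```python
def kv_cache_gib(
    context_tokens: int,
    num_layers: int = 32,      # assumed: Mistral 7B layer count
    num_kv_heads: int = 8,     # assumed: grouped-query attention KV heads
    head_dim: int = 128,       # assumed: per-head dimension
    bytes_per_value: int = 2,  # assumed: fp16/bf16 cache entries
) -> float:
    """Estimate KV-cache size in GiB for one sequence of `context_tokens`."""
    # Each token stores one key and one value vector per layer and KV head.
    bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return context_tokens * bytes_per_token / 1024**3


for context in (32_768, 131_072):
    print(f"{context:>7} tokens: ~{kv_cache_gib(context):.0f} GiB KV cache")
```

Under these assumptions the cache alone comes to about 4 GiB at 32k tokens and 16 GiB at 128k tokens, before weights and runtime overhead are added.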

### vLLM - For Faster Inference

#### Installation

```
!pip install vllm
!pip install flash_attn # optional: only if your system supports Flash Attention 2
```
See the [Flash Attention 2](https://github.com/Dao-AILab/flash-attention) GitHub repository for installation instructions.

#### Implementation

```python
from vllm import LLM, SamplingParams

llm = LLM(
  model='aiplanet/Buddhi-128K-Chat',
  gpu_memory_utilization=0.99,
  max_model_len=131072
)

prompts = [
  """<s> [INST] Please tell me a joke. [/INST] """,
  """<s> [INST] What is Machine Learning? [/INST] """
]

sampling_params = SamplingParams(
  temperature=0.8,
  top_p=0.95,
  max_tokens=1000
)

outputs = llm.generate(prompts, sampling_params)

# Each result holds the input prompt (output.prompt) and its completions.
for output in outputs:
    generated_text = output.outputs[0].text
    print(generated_text)
    print("\n\n")
```

### Transformers - Basic Implementation

```python
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model_name = "aiplanet/Buddhi-128K-Chat"

model = AutoModelForCausalLM.from_pretrained(
  model_name,
  quantization_config=bnb_config,
  device_map="sequential",
  trust_remote_code=True
)

# Load the tokenizer from the model name, not from the model object.
tokenizer = AutoTokenizer.from_pretrained(
  model_name,
  trust_remote_code=True
)

prompt = "<s> [INST] Please tell me a small joke. [/INST] "

tokens = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(
  **tokens,
  max_new_tokens=100,
  do_sample=True,
  top_p=0.95,
  temperature=0.8,
)

# Decode only the newly generated tokens so the echoed prompt is not printed.
prompt_length = tokens["input_ids"].shape[1]
decoded_output = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)
print(f"Output:\n{decoded_output}")
```

Output

```
Output:
Why don't scientists trust atoms?

Because they make up everything.
```

## Evaluation

| Model                                | HellaSWAG | ARC-Challenge | MMLU  | TruthfulQA | Winogrande |
|--------------------------------------|-----------|---------------|-------|------------|------------|
| Buddhi-128K-Chat                     | 82.78     | 57.51         | 57.39 | 55.44      | 78.37      |
| NousResearch/Yarn-Mistral-7b-128k    | 80.58     | 58.87         | 60.64 | 42.46      | 72.85      |


## Prompt Template for Buddhi-128K-Chat

To make use of the instruction fine-tuning, wrap each prompt in `[INST]` and `[/INST]` tokens. Only the first instruction begins with the beginning-of-sequence token (`<s>`); subsequent instructions do not. Each assistant reply ends with the end-of-sequence token (`</s>`).

```
"<s>[INST] What is your favourite condiment? [/INST]"
"Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> "
"[INST] Do you have mayonnaise recipes? [/INST]"
```
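
The template above can be assembled programmatically. The following is a minimal sketch, not an official utility: `build_prompt` is a hypothetical helper that concatenates the turns exactly as in the example, placing `<s>` once at the start and `</s>` followed by a space after each assistant reply.

```python
from typing import List, Tuple

BOS = "<s>"   # beginning-of-sequence token, used once at the very start
EOS = "</s>"  # end-of-sequence token, closes each assistant reply


def build_prompt(history: List[Tuple[str, str]], next_user_message: str) -> str:
    """Build a multi-turn prompt from (user, assistant) pairs plus a new user turn."""
    parts = [BOS]
    for user_message, assistant_reply in history:
        parts.append(f"[INST] {user_message} [/INST]")
        parts.append(f"{assistant_reply}{EOS} ")
    # The final instruction is left open for the model to answer.
    parts.append(f"[INST] {next_user_message} [/INST]")
    return "".join(parts)


history = [(
    "What is your favourite condiment?",
    "Well, I'm quite partial to a good squeeze of fresh lemon juice.",
)]
prompt = build_prompt(history, "Do you have mayonnaise recipes?")
print(prompt)
```

If your tokenizer already prepends a beginning-of-sequence token on its own, you may want to drop the literal `<s>` from the string so that it is not added twice.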

## Get in Touch

 You can schedule a 1:1 meeting with our DevRel & Community Team to get started with AI Planet Open Source LLMs and GenAI Stack. Schedule the call here: [https://calendly.com/jaintarun](https://calendly.com/jaintarun)

 Stay tuned for more updates and be a part of the coding evolution. Join us on this exciting journey as we make AI accessible to all at AI Planet!


### Framework versions

- Transformers 4.39.2
- PyTorch 2.2.1+cu121
- Datasets 2.18.0
- Accelerate 0.27.2
- flash_attn 2.5.6

### Citation

```
@misc{Chaitanya890_lucifertrj,
	author       = { Chaitanya Singhal and Tarun Jain },
	title        = { Buddhi-128k-Chat by AI Planet },
	year         = 2024,
	url          = { https://huggingface.co/aiplanet/Buddhi-128K-Chat },
	publisher    = { Hugging Face }
}
```