---
license: apache-2.0
pipeline_tag: text-generation
---

Buddhi 7B

# Buddhi-7B vLLM Inference: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/11_8W8FpKK-856QdRVJLyzbu9g-DMxNfg?usp=sharing)

## Model Description

Buddhi is a general-purpose chat model fine-tuned from Mistral 7B Instruct. It has been extended to handle a context length of up to 128K (131,072) tokens using the YaRN ([Yet another RoPE extensioN](https://arxiv.org/abs/2309.00071)) technique. The longer context window lets Buddhi keep track of long documents and conversations, which makes it well suited to tasks that require extensive context retention, such as comprehensive document summarization, detailed narrative generation, and intricate question answering.

## Dataset Creation

## Architecture

### Hardware requirements:

> For 128K context length:
> - 80 GB VRAM (A100 preferred)
>
> For 32K context length:
> - 40 GB VRAM (A100 preferred)

### vLLM - For Faster Inference

#### Installation

```
pip install vllm
pip install flash_attn  # only if Flash Attention 2 is supported by your system
```

In a Jupyter or Colab notebook, prefix each command with `!`. See the [Flash Attention 2](https://github.com/Dao-AILab/flash-attention) GitHub repository for detailed installation instructions.

**Implementation**:

```python
from vllm import LLM, SamplingParams

# Load the model with the full 128K-token context window
llm = LLM(
    model="aiplanet/Buddhi-128K-Chat",
    gpu_memory_utilization=0.99,
    max_model_len=131072,
)

# Each instruction is wrapped in [INST] ... [/INST]
prompts = [
    "[INST] Please tell me a joke. [/INST]",
    "[INST] What is Machine Learning? [/INST]",
]

sampling_params = SamplingParams(
    temperature=0.8,
    top_p=0.95,
    max_tokens=1000,
)

outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt}")
    print(generated_text)
    print("\n\n")
```

### Transformers - Basic Implementation

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization to reduce GPU memory usage
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_name = "aiplanet/Buddhi-128K-Chat"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="sequential",
    trust_remote_code=True,
)

# Load the tokenizer from the model name, not from the model object
tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True,
)

prompt = "[INST] Please tell me a small joke. [/INST]"
tokens = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **tokens,
    max_new_tokens=100,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
)

# Decode only the newly generated tokens, skipping the prompt tokens
new_tokens = outputs[0][tokens["input_ids"].shape[1]:]
decoded_output = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(f"Output:\n{decoded_output}")
```

**Output**:

```
Output:
Why don't scientists trust atoms?
Because they make up everything.
```

## Evaluation

| Model                                | HellaSwag | ARC-Challenge | MMLU  | TruthfulQA | Winogrande |
|--------------------------------------|-----------|---------------|-------|------------|------------|
| Buddhi-128K-Chat                     | 82.78     | 57.51         | 57.39 | 55.44      | 78.37      |
| NousResearch/Yarn-Mistral-7b-128k    | 80.58     | 58.87         | 60.64 | 42.46      | 72.85      |

## Prompt Template for Buddhi-128K-Chat

To leverage the instruction fine-tuning, surround each prompt with `[INST]` and `[/INST]` tokens. Only the very first instruction should begin with the begin-of-sentence (BOS) token; subsequent instructions should not. Each assistant response is terminated by the end-of-sentence (EOS) token.

```
"[INST] What is your favourite condiment? [/INST]"
"Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen! "
"[INST] Do you have mayonnaise recipes? [/INST]"
```

## Get in Touch

You can schedule a 1:1 meeting with our DevRel & Community Team to get started with AI Planet Open Source LLMs and GenAI Stack. Schedule the call here: [https://calendly.com/jaintarun](https://calendly.com/jaintarun)

Stay tuned for more updates and be a part of the coding evolution. Join us on this exciting journey as we make AI accessible to all at AI Planet!

### Framework versions

- Transformers 4.39.2
- PyTorch 2.2.1+cu121
- Datasets 2.18.0
- Accelerate 0.27.2
- flash_attn 2.5.6

### Citation

```
@misc{Chaitanya890_lucifertrj,
  author    = {Chaitanya Singhal and Tarun Jain},
  title     = {Buddhi-128K-Chat by AI Planet},
  year      = {2024},
  url       = {https://huggingface.co/aiplanet/Buddhi-128K-Chat},
  publisher = {Hugging Face}
}
```
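As a supplement to the Prompt Template section, the sketch below assembles a multi-turn conversation string. It is a hypothetical helper (`build_prompt` is not part of any released package) and it assumes that `<s>` and `</s>` are the BOS and EOS strings, as they are for Mistral 7B Instruct; confirm this against `tokenizer.bos_token` and `tokenizer.eos_token` before relying on it.

```python
from typing import List, Tuple

# Assumption: Mistral 7B Instruct special-token strings; confirm with
# tokenizer.bos_token / tokenizer.eos_token for aiplanet/Buddhi-128K-Chat.
BOS = "<s>"
EOS = "</s>"


def build_prompt(
    history: List[Tuple[str, str]],
    instruction: str,
    add_bos: bool = True,
) -> str:
    """Hypothetical helper: build a multi-turn [INST] prompt.

    history: completed (user instruction, assistant answer) pairs, oldest first.
    instruction: the new user message the model should answer next.
    add_bos: prepend BOS once, before the very first instruction only.
    """
    parts = [BOS] if add_bos else []
    for user_msg, answer in history:
        parts.append(f"[INST] {user_msg} [/INST]")
        parts.append(f"{answer}{EOS}")  # each finished answer ends with EOS
    parts.append(f"[INST] {instruction} [/INST]")  # left open for generation
    return "".join(parts)


prompt = build_prompt(
    history=[(
        "What is your favourite condiment?",
        "Well, I'm quite partial to a good squeeze of fresh lemon juice.",
    )],
    instruction="Do you have mayonnaise recipes?",
)
print(prompt)
```

The Mistral tokenizer normally prepends BOS itself when encoding text, so when the string is passed straight to the tokenizer (as in the vLLM and Transformers examples), `add_bos=False` is likely the safer choice to avoid a duplicated BOS token. If the repository's tokenizer ships a chat template, `tokenizer.apply_chat_template` is an alternative to hand-building the string.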