aiplanet/buddhi-128k-chat-7b

lucifertrj

Chaitanya890 committed on Apr 3

Commit ca09642 • 1 Parent(s): 866e00c

Update README.md (#3)

- Update README.md (d0d452eb0b473aee0368e5e04b9c9ff250101c57)

Co-authored-by: Chaitanya Singhal <Chaitanya890@users.noreply.huggingface.co>

Files changed (1)

README.md +160 -0

README.md CHANGED

@@ -1,3 +1,163 @@
 ---
 license: apache-2.0
 ---

+<p align="center" style="font-size:34px;"><b>Buddhi 7B</b></p>
+# Buddhi-7B vLLM Inference: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/11_8W8FpKK-856QdRVJLyzbu9g-DMxNfg?usp=sharing)
+# Model Description
+<!-- Provide a quick summary of what the model is/does. -->
+Buddhi is a general-purpose chat model fine-tuned from Mistral 7B Instruct and optimised to handle an extended context length of up to 128K tokens using the YaRN [(Yet another RoPE extensioN)](https://arxiv.org/abs/2309.00071) technique. The longer context lets Buddhi keep track of information across long documents or conversations, which makes it well suited to tasks that require extensive context retention, such as comprehensive document summarization, detailed narrative generation, and intricate question answering.
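To make the YaRN idea concrete, the paper's "NTK-by-parts" rule can be sketched in a few lines. RoPE dimensions whose wavelength is short relative to the original context keep their frequency. Long-wavelength dimensions are interpolated by the scale factor s, and a linear ramp blends the two regimes. Attention logits are additionally rescaled by a temperature. This is a minimal sketch of the formulas in the paper, not Buddhi's actual implementation (which ships as custom code). The function names, the RoPE base of 10,000, the 32K original context, and the ramp bounds α = 1, β = 32 are assumptions for illustration.

```python
import math


def yarn_frequencies(dim: int, base: float, scale: float, orig_ctx: int,
                     alpha: float = 1.0, beta: float = 32.0) -> list:
    """Hypothetical helper: RoPE frequencies after YaRN's NTK-by-parts rule."""
    freqs = []
    for i in range(0, dim, 2):
        theta = base ** (-i / dim)        # original RoPE frequency for this pair
        wavelength = 2 * math.pi / theta  # tokens per full rotation
        r = orig_ctx / wavelength         # rotations within the original context
        # Ramp: 0 -> fully interpolate (theta / scale), 1 -> keep theta unchanged
        if r < alpha:
            gamma = 0.0
        elif r > beta:
            gamma = 1.0
        else:
            gamma = (r - alpha) / (beta - alpha)
        freqs.append((1 - gamma) * theta / scale + gamma * theta)
    return freqs


def yarn_attention_scale(scale: float) -> float:
    """sqrt(1/t) = 0.1 * ln(s) + 1, applied to both queries and keys."""
    return 0.1 * math.log(scale) + 1.0


# Assumed values: head dim 128, base 10,000, 32K -> 128K (scale factor 4)
freqs = yarn_frequencies(dim=128, base=10000.0, scale=4.0, orig_ctx=32768)
print(freqs[0], freqs[-1], yarn_attention_scale(4.0))
```

The highest-frequency pair is left untouched, while the lowest-frequency pair is divided by the full scale factor. This split is what lets the model keep local positional detail while stretching to longer contexts.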
+## Architecture
+### Hardware requirements
+> For 128K context length:
+> - 80 GB VRAM (A100 preferred)
+>
+> For 32K context length:
+> - 40 GB VRAM (A100 preferred)
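Much of the difference between the 32K and 128K requirements comes from the key/value cache, which grows linearly with context length. The arithmetic below is a rough sketch. It assumes the published Mistral 7B shape (32 layers, 8 key/value heads, head dimension 128) and a 16-bit cache. It ignores weights, activations, and framework overhead, so it estimates the cache alone and does not reproduce the figures above.

```python
def kv_cache_bytes(tokens: int, layers: int = 32, kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for `tokens` positions (one sequence)."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # K and V
    return tokens * per_token


GIB = 1024 ** 3
for ctx in (32_768, 131_072):
    print(f"{ctx:>7} tokens -> {kv_cache_bytes(ctx) / GIB:.1f} GiB of KV cache")
```

Under these assumptions, each token costs 128 KiB of cache, so a full 128K-token sequence needs about 16 GiB before the 16-bit model weights (roughly 2 bytes per parameter) are counted.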
+### vLLM - For Faster Inference
+#### Installation
+```
+!pip install vllm
+!pip install flash-attn --no-build-isolation # only if your GPU supports FlashAttention-2
+```
+See the [Flash Attention 2](https://github.com/Dao-AILab/flash-attention) GitHub repository for detailed installation instructions.
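Whether FlashAttention-2 is supported mostly depends on the GPU generation. The FlashAttention-2 README lists Ampere, Ada, and Hopper GPUs (for example A100, RTX 4090, H100), which corresponds to CUDA compute capability 8.0 or higher. Below is a minimal check under that assumption, using `torch.cuda.get_device_capability()`. The helper names are hypothetical, and the threshold may change in future FlashAttention releases.

```python
from typing import Optional, Tuple


def supports_flash_attention_2(capability: Tuple[int, int]) -> bool:
    """Assumed rule: FlashAttention-2 needs compute capability >= 8.0 (Ampere or newer)."""
    major, _minor = capability
    return major >= 8


def local_gpu_supports_fa2() -> Optional[bool]:
    """Return None if PyTorch or a CUDA GPU is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return supports_flash_attention_2(torch.cuda.get_device_capability())


print("FlashAttention-2 supported:", local_gpu_supports_fa2())
```

If this prints `False` (for example on a T4, capability 7.5), skip the `flash-attn` install. vLLM will fall back to its other attention backends.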
+**Implementation**:
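+```python
+from vllm import LLM, SamplingParams
+
+# Load the model with the full 128K (131,072-token) context window
+llm = LLM(
+    model='aiplanet/Buddhi-128K-Chat',
+    gpu_memory_utilization=0.99,  # fraction of GPU memory vLLM may reserve
+    max_model_len=131072,
+)
+
+# Each prompt follows the [INST] ... [/INST] instruction template
+prompts = [
+    """<s> [INST] Please tell me a joke. [/INST] """,
+    """<s> [INST] What is Machine Learning? [/INST] """,
+]
+sampling_params = SamplingParams(
+    temperature=0.8,
+    top_p=0.95,
+    max_tokens=1000,
+)
+
+# Generate all prompts in one batch and print each completion
+outputs = llm.generate(prompts, sampling_params)
+for output in outputs:
+    prompt = output.prompt
+    generated_text = output.outputs[0].text
+    print(f"Prompt: {prompt!r}")
+    print(generated_text)
+    print("\n\n")
+```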
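With `max_model_len=131072`, the prompt and the generated tokens share one 131,072-token window, so a long document plus `max_tokens=1000` can overflow it. The sketch below shows a small, hypothetical helper (not part of vLLM) that clamps the generation budget given the prompt's token count. The count itself could come from the tokenizer returned by `llm.get_tokenizer()`.

```python
def clamp_max_tokens(prompt_tokens: int, requested: int,
                     max_model_len: int = 131_072) -> int:
    """Largest generation length that still fits in the context window."""
    remaining = max_model_len - prompt_tokens
    if remaining <= 0:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens; the limit is {max_model_len}"
        )
    return min(requested, remaining)


print(clamp_max_tokens(prompt_tokens=2_000, requested=1000))    # short prompt: 1000
print(clamp_max_tokens(prompt_tokens=130_500, requested=1000))  # near the limit: 572
```

The returned value can then be passed as `SamplingParams(max_tokens=...)` for that prompt.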
+### Transformers - Basic Implementation
+```python
+import torch
+from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
+
+# 4-bit NF4 quantization with double quantization; compute in bfloat16
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_use_double_quant=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.bfloat16,
+)
+model_name = "aiplanet/Buddhi-128K-Chat"
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    quantization_config=bnb_config,
+    device_map="sequential",
+    trust_remote_code=True,
+)
+# Load the tokenizer from the model ID (not from the model object)
+tokenizer = AutoTokenizer.from_pretrained(
+    model_name,
+    trust_remote_code=True,
+)
+prompt = "<s> [INST] Please tell me a small joke. [/INST] "
+tokens = tokenizer(prompt, return_tensors="pt").to(model.device)
+outputs = model.generate(
+    **tokens,
+    max_new_tokens=100,
+    do_sample=True,
+    top_p=0.95,
+    temperature=0.8,
+)
+# Decode only the newly generated tokens, skipping the prompt
+prompt_len = tokens["input_ids"].shape[1]
+decoded_output = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
+print(f"Output:\n{decoded_output}")
+```
+Example output (sampling is enabled, so your output will differ):
+```
+Output:
+Why don't scientists trust atoms?
+Because they make up everything.
+```
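The `BitsAndBytesConfig` above stores weights in 4-bit NF4. With `bnb_4bit_use_double_quant=True`, it also quantizes the per-block scaling constants. The QLoRA paper puts the constant overhead at 32/64 = 0.5 bits per parameter with 64-weight blocks. Double quantization reduces this to 8/64 + 32/(64·256) ≈ 0.127 bits. The back-of-the-envelope estimate below uses those assumptions. The 7-billion parameter count is illustrative, and layers that bitsandbytes leaves unquantized (such as embeddings) are ignored.

```python
def nf4_bits_per_param(double_quant: bool = True, block: int = 64,
                       outer_block: int = 256) -> float:
    """Storage bits per weight: 4-bit code plus per-block scaling constants."""
    if double_quant:
        # 8-bit block constants, plus one fp32 constant per `outer_block` blocks
        overhead = 8 / block + 32 / (block * outer_block)
    else:
        overhead = 32 / block  # one fp32 absmax constant per block
    return 4 + overhead


def nf4_weight_gib(num_params: float, double_quant: bool = True) -> float:
    """Approximate GiB occupied by `num_params` quantized weights."""
    return num_params * nf4_bits_per_param(double_quant) / 8 / 1024 ** 3


print(f"{nf4_bits_per_param():.3f} bits/param")
print(f"~{nf4_weight_gib(7e9):.2f} GiB for 7B quantized weights")
```

This is why the 4-bit Transformers path fits on much smaller GPUs than the full-precision vLLM setup. Long prompts still need room for the key/value cache on top of the weights.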
+## Prompt Template for Buddhi 7B
+To take advantage of the instruction fine-tuning, wrap each prompt in `[INST]` and `[/INST]` tokens. Only the very first instruction should begin with the beginning-of-sentence (BOS) token `<s>`; later instructions should not. Each assistant response is terminated by the end-of-sentence (EOS) token `</s>`.
+```
+"<s>[INST] What is your favourite condiment? [/INST]"
+"Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s> "
+"[INST] Do you have mayonnaise recipes? [/INST]"
+```
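The template rules above can be captured in a small helper that turns a conversation history into a single prompt string. It places `<s>` only at the very start, wraps each user turn in `[INST] ... [/INST]`, and closes each completed assistant reply with `</s>`. This is a hypothetical convenience function written against the example above, not an API shipped with the model. It produces the same concatenation as the template.

```python
def build_prompt(history: list, next_user: str) -> str:
    """Build an [INST]-style prompt from (user, assistant) turns plus a new user turn."""
    parts = ["<s>"]  # BOS only before the very first instruction
    for user, assistant in history:
        parts.append(f"[INST] {user} [/INST]")
        parts.append(f"{assistant}</s> ")  # EOS closes each finished reply
    parts.append(f"[INST] {next_user} [/INST]")
    return "".join(parts)


prompt = build_prompt(
    [("What is your favourite condiment?",
      "Well, I'm quite partial to a good squeeze of fresh lemon juice.")],
    "Do you have mayonnaise recipes?",
)
print(prompt)
```

The resulting string can be passed directly to `llm.generate` or to the tokenizer in the examples above.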
+## 🔗 Key Features:
+- 🎯 **Precision and Efficiency:** The model is tailored for accuracy, ensuring your code is not just functional but also efficient.
+- ✨ **Unleash Creativity:** Whether you're a novice or an expert coder, Panda-Coder is here to support your coding journey, offering creative solutions to your programming challenges.
+- 📚 **Evol Instruct Code:** It's built on the robust Evol Instruct Code 80k-v1 dataset, guaranteeing top-notch code generation.
+- 📢 **What's Next?** We believe in continuous improvement and are excited to announce that in our next release, Panda-Coder will be enhanced with a custom dataset. This dataset will not only expand the language support but also include hardware programming languages like MATLAB, Embedded C, and Verilog. 🧰💡
+ ## Get in Touch
+ You can schedule a 1:1 meeting with our DevRel & Community Team to get started with AI Planet Open Source LLMs and GenAI Stack. Schedule the call here: [https://calendly.com/jaintarun](https://calendly.com/jaintarun)
+ Stay tuned for more updates and be a part of the coding evolution. Join us on this exciting journey as we make AI accessible to all at AI Planet!
+ ### Framework versions
+- Transformers 4.39.2
+- Pytorch 2.2.1+cu121
+- Datasets 2.18.0
+- Accelerate 0.27.2
+- flash_attn 2.5.6
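To compare a local environment with the framework versions listed above, the standard library's `importlib.metadata` can report installed package versions. The sketch below is minimal. The pin table restates the list above, using `torch` as PyTorch's distribution name. The lookup function is injectable, so the comparison can be tested without the packages installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# Versions copied from the "Framework versions" list
PINNED = {
    "transformers": "4.39.2",
    "torch": "2.2.1+cu121",
    "datasets": "2.18.0",
    "accelerate": "0.27.2",
    "flash_attn": "2.5.6",
}


def find_mismatches(pins: Dict[str, str],
                    get_version: Callable[[str], str] = version
                    ) -> Dict[str, Optional[str]]:
    """Map each package whose installed version differs to that version (None if missing)."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


for pkg, found in find_mismatches(PINNED).items():
    print(f"{pkg}: expected {PINNED[pkg]}, found {found or 'not installed'}")
```

An empty result means the environment matches the versions the model card was tested with. Newer versions may still work, but they are a likely first suspect if loading fails.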
+ ### Citation
+ ```
+ @misc {Chaitanya890,
+	author       = { {Chaitanya Singhal} },
+	title        = { Buddhi-128k-Chat by AI Planet},
+	year         = 2024,
+	url          = { https://huggingface.co/aiplanet/Buddhi-128K-Chat },
+	publisher    = { Hugging Face }
+}
+ ```