afrideva committed
Commit 556664d
1 Parent(s): 732ec04

Upload README.md with huggingface_hub

Files changed (1): README.md (+193 lines)
---
base_model: freecs/phine-2-v0
datasets:
- vicgalle/alpaca-gpt4
inference: false
license: unknown
model_creator: freecs
model_name: phine-2-v0
pipeline_tag: text-generation
quantized_by: afrideva
tags:
- gguf
- ggml
- quantized
- q2_k
- q3_k_m
- q4_k_m
- q5_k_m
- q6_k
- q8_0
---
# freecs/phine-2-v0-GGUF

Quantized GGUF model files for [phine-2-v0](https://huggingface.co/freecs/phine-2-v0) from [freecs](https://huggingface.co/freecs).

| Name | Quant method | Size |
| ---- | ---- | ---- |
| [phine-2-v0.fp16.gguf](https://huggingface.co/afrideva/phine-2-v0-GGUF/resolve/main/phine-2-v0.fp16.gguf) | fp16 | 5.56 GB |
| [phine-2-v0.q2_k.gguf](https://huggingface.co/afrideva/phine-2-v0-GGUF/resolve/main/phine-2-v0.q2_k.gguf) | q2_k | 1.17 GB |
| [phine-2-v0.q3_k_m.gguf](https://huggingface.co/afrideva/phine-2-v0-GGUF/resolve/main/phine-2-v0.q3_k_m.gguf) | q3_k_m | 1.48 GB |
| [phine-2-v0.q4_k_m.gguf](https://huggingface.co/afrideva/phine-2-v0-GGUF/resolve/main/phine-2-v0.q4_k_m.gguf) | q4_k_m | 1.79 GB |
| [phine-2-v0.q5_k_m.gguf](https://huggingface.co/afrideva/phine-2-v0-GGUF/resolve/main/phine-2-v0.q5_k_m.gguf) | q5_k_m | 2.07 GB |
| [phine-2-v0.q6_k.gguf](https://huggingface.co/afrideva/phine-2-v0-GGUF/resolve/main/phine-2-v0.q6_k.gguf) | q6_k | 2.29 GB |
| [phine-2-v0.q8_0.gguf](https://huggingface.co/afrideva/phine-2-v0-GGUF/resolve/main/phine-2-v0.q8_0.gguf) | q8_0 | 2.96 GB |
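
To run one of these files locally, here is a minimal sketch (not from the original card) using `huggingface_hub` and `llama-cpp-python`; any other GGUF-compatible runtime (e.g. llama.cpp) works the same way. The prompt format mirrors the `<|system|>` / `<|prompt|>` / `<|response|>` tags used in the inference code further below.

```python
# Minimal sketch, assuming `pip install huggingface_hub llama-cpp-python`.
from huggingface_hub import hf_hub_download
from llama_cpp import Llama

# Download one of the quant files listed in the table above.
model_file = hf_hub_download(
    repo_id="afrideva/phine-2-v0-GGUF",
    filename="phine-2-v0.q4_k_m.gguf",
)

llm = Llama(model_path=model_file, n_ctx=2048)

# Prompt template taken from the original card's inference code below.
prompt = (
    "\n<|system|>You are an AI assistant named Phine developed by FreeCS.org. "
    "You are polite and smart."
    "\n<|prompt|>What is a GGUF file?<|endoftext|>\n<|response|>"
)
out = llm(prompt, max_tokens=256, stop=["<|endoftext|>"])
print(out["choices"][0]["text"].strip())
```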

## Original Model Card:

---

# Model Card: Phine-2-v0

## Overview

- **Model Name:** Phine-2
- **Base Model:** Phi-2 (Microsoft model)
- **Created By:** [GR](https://twitter.com/gr_username)
- **Donations Link:** [Click Me](https://www.buymeacoffee.com/gr.0)

## Code Usage

To try Phine, use the following Python code snippet:

```python
#######################
'''
Name: Phine Inference
License: MIT
'''
#######################


##### Dependencies

""" IMPORTANT: Uncomment the following line if you are in a Colab/Notebook environment """

#!pip install gradio einops accelerate bitsandbytes transformers

#####

import gradio as gr
import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer


def cut_text_after_last_token(text, token):
    """Return the text after the last occurrence of `token`, or None if absent."""
    last_occurrence = text.rfind(token)
    if last_occurrence != -1:
        return text[last_occurrence + len(token):].strip()
    return None


class _SentinelTokenStoppingCriteria(transformers.StoppingCriteria):
    """Stop generation once a sentinel token sequence appears after the prompt."""

    def __init__(self, sentinel_token_ids: torch.LongTensor, starting_idx: int):
        super().__init__()
        self.sentinel_token_ids = sentinel_token_ids
        self.starting_idx = starting_idx

    def __call__(self, input_ids: torch.LongTensor,
                 _scores: torch.FloatTensor) -> bool:
        for sample in input_ids:
            # Look only at tokens generated after the prompt.
            trimmed_sample = sample[self.starting_idx:]
            if trimmed_sample.shape[-1] < self.sentinel_token_ids.shape[-1]:
                continue
            # Slide a window of sentinel length over the generated tokens.
            for window in trimmed_sample.unfold(
                    0, self.sentinel_token_ids.shape[-1], 1):
                if torch.all(torch.eq(self.sentinel_token_ids, window)):
                    return True
        return False
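
# How the sentinel match works: `unfold(0, n, 1)` enumerates every contiguous
# window of n generated ids, so a multi-token sentinel such as "<|endoftext|>"
# is detected even when the tokenizer splits it across several ids.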


model_path = 'freecs/phine-2-v0'

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    load_in_4bit=False,
    torch_dtype=torch.float16,
).to(device)  # remove the .to(device) call if load_in_4bit/load_in_8bit is True

sys_message = "You are an AI assistant named Phine developed by FreeCS.org. You are polite and smart."  # system message
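
# Optional sketch (not from the original card): to load in 4-bit instead,
# assuming `bitsandbytes` is installed, pass load_in_4bit=True to
# from_pretrained above and drop the .to(device) call; transformers will
# place the quantized weights automatically.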

def phine(message, history, temperature, top_p, top_k, repetition_penalty):
    # Rebuild the conversation context from Gradio's (user, bot) history pairs.
    n = 0
    context = ""
    if history and len(history) > 0:
        for x in history:
            for h in x:
                if n % 2 == 0:
                    context += f"\n<|prompt|>{h}\n"
                else:
                    context += f"<|response|>{h}"
                n += 1

    prompt = f"\n<|system|>{sys_message}" + context + "\n<|prompt|>" + message + "<|endoftext|>\n<|response|>"
    tokenized = tokenizer(prompt, return_tensors="pt").to(device)

    # Stop as soon as the model emits "<|endoftext|>" after the prompt.
    stopping_criteria_list = transformers.StoppingCriteriaList([
        _SentinelTokenStoppingCriteria(
            sentinel_token_ids=tokenizer(
                "<|endoftext|>",
                add_special_tokens=False,
                return_tensors="pt",
            ).input_ids.to(device),
            starting_idx=tokenized.input_ids.shape[-1])
    ])

    output_ids = model.generate(
        **tokenized,
        stopping_criteria=stopping_criteria_list,
        do_sample=True,
        max_length=2048,
        temperature=temperature,
        top_p=top_p,
        top_k=top_k,
        repetition_penalty=repetition_penalty,
    )

    completion = tokenizer.decode(output_ids[0], skip_special_tokens=False)
    # Keep only the text after the last "<|response|>" marker.
    res = cut_text_after_last_token(completion, "<|response|>")
    return res.replace('<|endoftext|>', '')


demo = gr.ChatInterface(
    phine,
    additional_inputs=[
        gr.Slider(0.1, 2.0, label="Temperature", value=0.5),
        gr.Slider(0.1, 2.0, label="Top P", value=0.9),
        gr.Slider(1, 500, label="Top K", value=50),
        gr.Slider(0.1, 2.0, label="Repetition Penalty", value=1.15),
    ],
)

if __name__ == "__main__":
    demo.queue().launch(share=True, debug=True)  # if debug=True causes problems, set it to False
```
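
For reference, the snippet above serializes each exchange with a simple tag template (one turn shown below; earlier turns from `history` are concatenated the same way, and generation stops at the first `<|endoftext|>` emitted after the prompt):

```
<|system|>{system message}
<|prompt|>{user message}<|endoftext|>
<|response|>{model reply}
```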

## Contact

For inquiries, collaboration opportunities, or additional information, reach out to me on Twitter: [gr](https://twitter.com/gr_username).

## Disclaimer

As of now, I have not applied Reinforcement Learning from Human Feedback (RLHF). Due to this, the model may generate unexpected or potentially unethical outputs.

---