lamhieu committed
Commit 7f6ea1d
1 Parent(s): 2bac78a

chore: update something

Files changed (3)
  1. README.md +15 -9
  2. app.py +439 -251
  3. requirements.txt +6 -5
README.md CHANGED
@@ -1,8 +1,8 @@
 ---
-title: Ghost 8B Beta (β, 128k, Online)
+title: Ghost 8B Beta (β, 128k)
 emoji: 👻 / 📚
-colorFrom: indigo
-colorTo: pink
+colorFrom: green
+colorTo: blue
 sdk: gradio
 sdk_version: 4.36.1
 app_file: app.py
@@ -10,15 +10,22 @@ pinned: true
 header: mini
 suggested_hardware: a10g-small
 language:
-- en
 - vi
+- ko
 - es
 - pt
-- de
-- it
-- fr
-- ko
 - zh
+- fr
+- it
+- de
+- ja
+- ru
+- pl
+- nl
+- hi
+- tr
+- id
+- en
 license: other
 license_name: ghost-open-llms
 license_link: https://ghost-x.org/ghost-open-llms-license
@@ -28,7 +35,6 @@ tags:

 # ~

-
 ### Notes

 The extension source code belongs to: "LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning". See source code details [here](https://github.com/datamllab/LongLM).
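
The Notes reference the Self-Extend technique from LongLM, which the Space uses to stretch the model's usable context. A minimal sketch of how that is wired up, assuming the `SelfExtend` module from the LongLM repository and reusing the arguments of the `SelfExtend.apply` call visible in the app.py diff below:

```python
# Minimal sketch (not part of this commit): load the chat model and apply
# Self-Extend with the same arguments used in app.py below.
import os

import torch
import SelfExtend  # assumed: the module from https://github.com/datamllab/LongLM
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "ghost-x/ghost-8b-beta-1608",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    token=os.getenv("HF_TOKEN"),
)
# Self-Extend remaps distant positions into groups so the pretrained attention
# window covers a longer context without any fine-tuning.
SelfExtend.apply(model, group_size=16, window_size=512)
model.generation_config.max_length = 123392
```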
app.py CHANGED
@@ -3,6 +3,8 @@
 import subprocess
 import json
 import requests
+import zlib
+from PIL import Image

 subprocess.run(
     f"pip install flash-attn --no-build-isolation",
@@ -17,34 +19,78 @@ from typing import Iterator
 import gradio as gr
 import spaces
 import torch
+import logging
 import wikipedia
 import time
 import SelfExtend
-from transformers import AutoModelForCausalLM, AutoTokenizer, TextIteratorStreamer
+from transformers import (
+    AutoModelForCausalLM,
+    AutoTokenizer,
+    AutoProcessor,
+    TextIteratorStreamer,
+)
+from transformers.dynamic_module_utils import get_imports
 from bs4 import BeautifulSoup
 from functools import lru_cache

+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+

 MAX_MAX_NEW_TOKENS = 8192
 DEFAULT_MAX_NEW_TOKENS = 2048
 MAX_INPUT_TOKEN_LENGTH = int(os.getenv("MAX_INPUT_TOKEN_LENGTH", "123392"))

+DEFAULT_SYSTEM_PROMPT = """\
+You are a helpful and intelligent AI, trained by Ghost X and named Ghost 8B Beta (often referred to as Ghost Beta).
+You're known for your honesty, spreading positivity, and always striving to assist users. Your expertise lies in understanding their needs and providing insightful suggestions, drawing upon your knowledge and interests. If a query exceeds your understanding, you'll be upfront and state you're unsure, avoiding fabricated responses. You enjoy incorporating emojis to enhance interactions, but maintain a balanced approach for a natural flow. Let's engage in a meaningful conversation, keeping in mind the user's language.
+"""
+
+# DEFAULT_SYSTEM_PROMPT = """\
+# You are a helpful and intelligent AI, trained by Ghost X and named Ghost 8B Beta (often referred to as 8B Beta).
+# You're known for your honesty, spreading positivity, and always striving to assist users. Your expertise lies in understanding their needs and providing insightful suggestions, drawing upon your knowledge and interests. If a query exceeds your understanding, you'll be upfront and state you're unsure, avoiding fabricated responses. You enjoy incorporating emojis to enhance interactions, but maintain a balanced approach for a natural flow. Let's engage in a meaningful conversation, keeping in mind the user's language.
+
+# A guide to dealing with extremely complex questions or challenges. Follow these steps to solve them:
+# 1. Deconstructing Complexity
+# Imagine a puzzle with intricate pieces. I'll present a challenging question. Your task: Break down this question into smaller, distinct parts. Label each part with a specific theme or aspect related to the problem. This will help us understand the multifaceted nature of the query and prepare for a structured solution.
+# 2. Reconstructing Insights
+# Once we've successfully dissected the problem into manageable components, assemble these parts like a puzzle. Focus on identifying connections, potential overlaps, and key information from each theme. The goal is to reconstruct a cohesive, well-rounded answer that addresses the original complexity of the question.
+# """
+
+HEAD = """
+<script>
+function schedule_updates() {
+  const client_info_element = document.querySelector("#client_info textarea");
+  client_info_element.value = "The current time is now: " + new Date().toLocaleString('en-US', {weekday: 'short'});
+  client_info_element.dispatchEvent(new Event('input'));
+}
+
+function bootstrap() {
+  setInterval(schedule_updates, 1000);
+};
+
+bootstrap();
+</script>
+"""
+
 DESCRIPTION = """\
-# Playground with Ghost 8B Beta (β, 128k, Online)
+# Ghost 8B Beta (β, 128k)

-**Ghost 8B Beta** model outperforms prominent models such as Llama 3 8B Instruct, GPT 3.5 Turbo in the lc_winrate score. In addition, it also outperforms Claude 3 Opus, Claude 3 Sonnet, GPT-4, and Mistral Large when comparing the winrate score of AlpacaEval 2.0, [*](https://ghost-x.org/docs/models/ghost-8b-beta/). The model comes in two context length versions, [8k](https://huggingface.co/spaces/lamhieu/ghost-8b-beta-8k) and [128k](https://huggingface.co/spaces/lamhieu/ghost-8b-beta-128k), along with multilingual function tools support by default.
+**Ghost 8B Beta** outperforms leading models like Llama 3.1 8B Instruct and GPT-3.5 Turbo in lc_winrate scores. It also surpasses Claude 3 Opus, Claude 3 Sonnet, GPT-4, and Mistral Large in AlpacaEval 2.0 winrate scores. The model offers two context length versions: [8k](https://huggingface.co/spaces/lamhieu/ghost-8b-beta-8k) and [128k](https://huggingface.co/spaces/lamhieu/ghost-8b-beta-128k), both with built-in multilingual function support.

-The languages supported are 🇺🇸 English, 🇫🇷 French, 🇮🇹 Italian, 🇪🇸 Spanish, 🇵🇹 Portuguese, 🇩🇪 German, 🇻🇳 Vietnamese, 🇰🇷 Korean and 🇨🇳 Chinese.
+Supported languages: 🇬🇧 English, 🇻🇳 Vietnamese, 🇰🇷 Korean, 🇪🇸 Spanish, 🇵🇹 Portuguese, 🇨🇳 Chinese, 🇫🇷 French, 🇮🇹 Italian, 🇩🇪 German, 🇯🇵 Japanese, 🇷🇺 Russian, 🇵🇱 Polish, 🇳🇱 Dutch, 🇮🇳 Hindi, 🇹🇷 Turkish, 🇮🇩 Indonesian.
+Note: images are described by a separate vision model rather than by Ghost 8B Beta itself.

 🗞️ **Updates**
-* Jul 23, 2024: added support for tools, now available to search for information on the internet.
+* Aug 16, 2024: Released version 160824, expanding language support from 9 to 16 languages and improving math, reasoning, and instruction-following capabilities.
+* Jul 23, 2024: Added internet search tools.
 """


 PLACEHOLDER = """
 <div style="padding: 30px; text-align: center; display: flex; flex-direction: column; align-items: center;">
-    <h1 style="font-size: 26px; margin-bottom: 2px; opacity: 0.20;">👻 Ghost 8B Beta</h1>
-    <p style="font-size: 18px; margin-bottom: 2px; opacity: 0.10;">Ask and share whatever you want ~</p>
+    <h1 style="font-size: 26px; margin-bottom: 2px; opacity: 0.20;">👋 Welcome to the Ghost 8B Beta Playground! 🎉</h1>
+    <p style="font-size: 18px; margin-bottom: 2px; opacity: 0.10;">Ask me anything and let's have some fun! 🤔💡</p>
 </div>
 """

@@ -55,231 +101,94 @@ LICENSE = """
 Ghost 8B Beta may give inaccurate information, including information about people, so please verify Ghost 8B Beta's answers. [Ghost 8B Beta](https://ghost-x.org/docs/models/ghost-8b-beta/) by [Ghost X](https://ghost-x.org).
 """

-EXAMPLES = [
-    [
-        "What is the significance of the Higgs boson in the Standard Model of particle physics?"
-    ],
-    [
-        "Qu'est-ce que l'effet fondateur et comment influence-t-il la diversité génétique d'une population?"
-    ],
-    ["Qual è il principio di Le Chatelier e come si applica agli equilibri chimici?"],
-    [
-        "¿Qué es una supernova y cuál es su importancia en la formación de elementos pesados en el universo?"
-    ],
-    [
-        "Qual é a definição formal de uma integral de linha e como é utilizada em física?"
-    ],
-    [
-        "Was versteht man unter dem Moho-Diskontinuität und welche Bedeutung hat sie für das Verständnis der Erdkruste?"
-    ],
-    [
-        "Hiện tượng nhà kính là gì và nó ảnh hưởng như thế nào đến biến đổi khí hậu toàn cầu?"
-    ],
-    [
-        "알고리즘의 시간 복잡도가 중요한 이유는 무엇이며, 시간 복잡도를 어떻게 분석하나요?"
-    ],
-    ["什么是CRISPR-Cas9基因编辑技术,它在现代生物学研究中的作用是什么?"],
-    [
-        "Create a Python function that takes a list of integers and returns the list sorted in ascending order without using the built-in sort or sorted functions."
-    ],
-    [
-        "Écrivez une fonction en C++ qui trouve le plus long sous-tableau contigu avec une somme égale à zéro."
-    ],
-    [
-        "Scrivi una funzione in Java che calcola il fattoriale di un numero utilizzando la ricorsione."
-    ],
-    [
-        "Desarrolla una función en JavaScript que determine si una cadena de texto es un palíndromo, ignorando espacios y signos de puntuación."
-    ],
-    ["Implemente uma função em C# que verifique se uma matriz quadrada é simétrica."],
-    [
-        "Schreiben Sie eine Funktion in Swift, die eine gegebene Zeichenfolge in umgekehrter Reihenfolge zurückgibt, ohne integrierte Funktionen zu verwenden."
-    ],
-    [
-        "Viết một hàm trong PHP để tìm tất cả các số nguyên tố trong một khoảng cho trước."
-    ],
-    [
-        "파이썬을 사용하여 주어진 이진 트리가 이진 탐색 트리인지 확인하는 함수를 작성하십시오."
-    ],
-    [
-        "用 Go 语言编写一个函数,计算给定字符串中每个字符出现的次数,并返回一个包含字符及其出现次数的映射。"
-    ],
-    [
-        "Can you help me design a detailed project plan for developing a machine learning model for predicting stock prices?"
-    ],
-    [
-        "Pouvez-vous m'aider à organiser un emploi du temps hebdomadaire pour maximiser la productivité de mon équipe de développement logiciel?"
-    ],
-    [
-        "Puoi aiutarmi a creare un piano di sviluppo per un'applicazione mobile che gestisce le prenotazioni di ristoranti?"
-    ],
-    [
-        "¿Podrías ayudarme a elaborar un plan detallado para la implementación de un sistema de gestión de contenido (CMS) en una empresa mediana?"
-    ],
-    [
-        "Você pode me ajudar a planejar uma estratégia de desenvolvimento para um sistema de comércio eletrônico escalável?"
-    ],
-    [
-        "Können Sie mir helfen, einen detaillierten Zeitplan für die Implementierung eines neuen ERP-Systems in unserem Unternehmen zu erstellen?"
-    ],
-    [
-        "Bạn có thể giúp tôi xây dựng một kế hoạch phát triển chi tiết cho dự án xây dựng hệ thống quản lý chuỗi cung ứng không?"
-    ],
-    [
-        "신경망 기반 이미지 인식 모델 개발을 위한 세부 프로젝트 계획을 세우는 데 도움을 줄 수 있나요?"
-    ],
-    ["你能帮我制定一个详细的开发计划,用于创建一个基于区块链的分布式账本系统吗?"],
-    [
-        "Prove that the sum of the squares of any two sides of a right triangle is equal to the square of the hypotenuse."
-    ],
-    [
-        "Calculez la force gravitationnelle entre deux masses de 10 kg chacune séparées par une distance de 1 mètre."
-    ],
-    [
-        "Determina la formula molecolare di un composto che contiene il 40% di carbonio, il 6.67% di idrogeno e il 53.33% di ossigeno in massa."
-    ],
-    [
-        "Explica la teoría del ciclo económico de Schumpeter y cómo se aplica a la economía moderna."
-    ],
-    [
-        "Calcule a energia potencial gravitacional de um objeto de 5 kg a uma altura de 10 metros acima do solo (g = 9,8 m/s²)."
-    ],
-    [
-        "Beweisen Sie, dass jede Primzahl der Form 4k+1 als Summe zweier Quadrate geschrieben werden kann."
-    ],
-    [
-        "Tính nồng độ mol của dung dịch H₂SO₄ khi hoà tan 98 gam H₂SO₄ vào nước để được 1 lít dung dịch."
-    ],
-    ["케인스 경제학의 핵심 개념과 그것이 현대 경제 정책에 미치는 영향을 설명하십시오."],
-    ["计算一个质量为2 kg的物体在3米高处的重力势能(g = 9.8 m/s²)。"],
-    [
-        'Identify the author of a novel that features a dystopian society where "Big Brother" watches over its citizens and the protagonist works for the Ministry of Truth.'
-    ],
-    [
-        "Quel est le seul mammifère capable de voler activement, souvent associé à la nuit et capable d'écholocalisation?"
-    ],
-    [
-        "Qual è l'opera letteraria italiana che narra il viaggio immaginario di un poeta attraverso Inferno, Purgatorio e Paradiso, guidato da Virgilio e Beatrice?"
-    ],
-    [
-        "¿Qué insecto es conocido por su organización social compleja, su capacidad para producir miel y su comunicación mediante la danza?"
-    ],
-    [
-        "Qual é o fenômeno atmosférico que ocorre quando uma massa de ar quente se encontra com uma massa de ar frio, resultando em uma violenta tempestade giratória?"
-    ],
-    [
-        "Welches literarische Werk beschreibt die Geschichte eines jungen Mädchens, das durch einen Kaninchenbau in eine fantastische Welt voller skurriler Charaktere fällt?"
-    ],
-    [
-        "Động vật nào có thể tái sinh toàn bộ cơ thể từ một mảnh nhỏ của chính nó, thường sống dưới nước và có thể có nhiều xúc tu?"
-    ],
-    [
-        "어떤 자연 현상은 태양빛이 대기 중의 물방울에 반사되고 굴절되어 발생하며, 하늘에 나타나는 여러 색깔의 아치 형태를 띠나요?"
-    ],
-    ["这部文学作品讲述了一位绅士和他的侍从的冒险故事,他们在"],
-    [
-        "Can you derive the Euler-Lagrange equation from the principle of stationary action in classical mechanics?"
-    ],
-    [
-        "Expliquez la notion de « différence ontologique » chez Martin Heidegger et son importance pour la phénoménologie."
-    ],
-    [
-        "Qual è il significato simbolico del colore blu nei dipinti di Giotto di Bondone durante il Rinascimento?"
-    ],
-    [
-        "¿Cómo afecta el cambio de código a la estructura gramatical en comunidades bilingües de habla español-inglés?"
-    ],
-    [
-        "Qual é o impacto da política monetária não convencional no controle da inflação durante uma crise econômica?"
-    ],
-    [
-        "Erklären Sie den Unterschied zwischen deterministischen und nicht-deterministischen endlichen Automaten und ihre Anwendungsbereiche."
-    ],
-    [
-        "Giải thích cơ chế của quá trình phiên mã ngược (reverse transcription) và tầm quan trọng của nó trong nghiên cứu HIV/AIDS."
-    ],
-    ["조선시대 성리학이 한국 사회와 문화에 미친 영향을 설명하세요."],
-    ["如何解释量子纠缠现象,以及它在量子计算中的潜在应用?"],
-    [
-        "How can you design a daily schedule that maximizes productivity for a remote worker who has multiple meetings and project deadlines?"
-    ],
-    [
-        "Quels sont les meilleures stratégies pour gérer les conflits au sein d'une équipe multiculturelle travaillant sur un projet commun?"
-    ],
-    [
-        "Quali sono i migliori consigli per mantenere un equilibrio tra vita professionale e vita privata in un ambiente lavorativo stressante?"
-    ],
-    [
-        "¿Cómo se puede elaborar un plan financiero personal efectivo que incluya ahorro para la jubilación, inversión y manejo de deudas?"
-    ],
-    [
-        "Quais são as melhores práticas para implementar metodologias ágeis em uma equipe de desenvolvimento de software?"
-    ],
-    [
-        "Welche Strategien können verwendet werden, um ein starkes berufliches Netzwerk aufzubauen und zu pflegen, insbesondere in der Tech-Branche?"
-    ],
-    [
-        "Những bước nào cần thiết để xây dựng một lộ trình phát triển sự nghiệp bền vững trong lĩnh vực công nghệ thông tin?"
-    ],
-    ["프로젝트의 범위 변동을 효과적으로 관리하기 위한 최고의 방법은 무엇인가요?"],
-    ["在快速变化的职场环境中,如何有效地实现工作与生活的平衡?"],
-    [
-        "Write an argumentative essay discussing the pros and cons of artificial intelligence in the workplace, including potential ethical concerns."
-    ],
-    [
-        "Analysez les impacts sociaux et économiques de la digitalisation sur les petites entreprises en France."
-    ],
-    [
-        "Scrivi un'email formale al direttore di una rivista per proporre un articolo sulla sostenibilità ambientale nelle città italiane."
-    ],
-    [
-        "Elabora un informe detallado sobre los efectos del cambio climático en la biodiversidad de la región amazónica."
-    ],
-    [
-        "Analise criticamente os principais pontos abordados no relatório anual do Banco Mundial sobre a pobreza global."
-    ],
-    [
-        "Erstellen Sie eine technische Dokumentation für die Implementierung eines neuen Software-Features in einer bestehenden Anwendung."
-    ],
-    [
-        "Viết một bài luận phân tích về tác động của cuộc cách mạng công nghiệp 4.0 đối với thị trường lao động Việt Nam."
-    ],
-    [
-        "인공지능의 윤리적 문제에 대한 연구 논문을 작성하고, 다양한 사례를 통해 그 영향을 분석하세요."
-    ],
-    ["分析鲁迅的小说《阿Q正传》中反映的中国社会问题和作者的批判态度。"],
-]
-
 if not torch.cuda.is_available():
     DESCRIPTION += "\n<p>Running on CPU 🥶 This demo does not work on CPU.</p>"


+def workaround_fixed_get_imports(filename: str | os.PathLike) -> list[str]:
+    """
+    Workaround for the fixed get_imports function.
+
+    @args:
+        filename (str | os.PathLike): The filename or path to the file.
+
+    @returns:
+        list[str]: The list of imports.
+
+    @remarks:
+        - This function is a workaround for the fixed get_imports function.
+        - It checks if the filename ends with "/modeling_florence2.py".
+        - If it doesn't, it calls the original get_imports function.
+        - If it does, it calls the original get_imports function and removes the "flash_attn" import.
+
+    @usage:
+        ```python
+        from unittest.mock import patch
+        image_torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
+        with patch(
+            "transformers.dynamic_module_utils.get_imports", workaround_fixed_get_imports
+        ):
+        ```
+    """
+
+    if not str(filename).endswith("/modeling_florence2.py"):
+        return get_imports(filename)
+    imports = get_imports(filename)
+    imports.remove("flash_attn")
+    return imports
+
+
 if torch.cuda.is_available():
-    model_id = "ghost-x/ghost-8b-beta"
     hf_serect = os.getenv("HF_TOKEN", None)
-    model = AutoModelForCausalLM.from_pretrained(
-        model_id,
+    attn_implementation = "flash_attention_2"
+
+    chat_model_id = "ghost-x/ghost-8b-beta-1608"
+    chat_device = torch.device("cuda")
+    chat_model = AutoModelForCausalLM.from_pretrained(
+        chat_model_id,
         device_map="auto",
         torch_dtype=torch.bfloat16,
-        attn_implementation="flash_attention_2",
+        attn_implementation=attn_implementation,
         trust_remote_code=True,
        token=hf_serect,
     )
-    tokenizer = AutoTokenizer.from_pretrained(
-        model_id,
+    chat_tokenizer = AutoTokenizer.from_pretrained(
+        chat_model_id,
         trust_remote_code=True,
         token=hf_serect,
     )
     SelfExtend.apply(
-        model,
+        chat_model,
         group_size=16,
         window_size=512,
         enable_flash_attention=True,
         flash_attention_impl="flash_attn",
     )
-    model.generation_config.max_length = 123392
+    chat_model.generation_config.max_length = 123392
+
+    image_model_id = "microsoft/Florence-2-large"
+    # image_device = "cuda" if torch.cuda.is_available() else "cpu"
+    # image_torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
+    image_device = "cpu"
+    image_torch_dtype = torch.float32
+    image_model = (
+        AutoModelForCausalLM.from_pretrained(
+            image_model_id,
+            torch_dtype=image_torch_dtype,
+            trust_remote_code=True,
+            token=hf_serect,
+        )
+        .to(image_device)
+        .eval()
+    )
+    image_processor = AutoProcessor.from_pretrained(
+        image_model_id,
+        trust_remote_code=True,
+        token=hf_serect,
+    )

-waiting_tools_timeout = 10
+
+waiting_tools_timeout = 5
 supported_tools = json.dumps(
     [
         {
@@ -319,6 +228,22 @@ supported_tools = json.dumps(

 @lru_cache(maxsize=128)
 def extract_text_from_webpage(html_content):
+    """
+    Extracts visible text from an HTML webpage.
+
+    @args:
+        html_content (str): The HTML content of the webpage.
+
+    @returns:
+        str: The visible text extracted from the webpage.
+
+    @remarks:
+        - This function uses the BeautifulSoup library to parse the HTML content.
+        - It removes certain tags (script, style, header, footer, nav, form, svg) from the parsed HTML.
+        - The remaining visible text is then extracted using the `get_text` method of BeautifulSoup.
+        - The extracted text is stripped of leading/trailing whitespace and separated by a single space.
+    """
+
     soup = BeautifulSoup(html_content, "html.parser")
     for tag in soup(["script", "style", "header", "footer", "nav", "form", "svg"]):
         tag.extract()
@@ -330,6 +255,23 @@ def search_with_wikipedia(
     query: str,
     language: str = "en",
 ):
+    """
+    Search for a given query on Wikipedia and return the summary.
+
+    @args:
+        query (str): The search query.
+        language (str, optional): The language code for the Wikipedia page. Defaults to "en".
+
+    @returns:
+        list: A list containing the summary of the Wikipedia page.
+
+    @remarks:
+        - This function uses the Wikipedia API to search for the given query.
+        - The language parameter determines the language of the Wikipedia page to search.
+        - If the search is successful, the function returns a list containing the summary of the page.
+        - If an exception occurs during the search, an empty list is returned.
+    """
+
     all_results = []
     try:
         wikipedia.set_lang(language)
@@ -346,9 +288,39 @@ def search_with_google(
     language: str = "en",
     ssl_verify: bool = None,
 ):
+    """
+    Searches Google for the given query and returns a list of search results.
+
+    @args:
+        query (str): The search query.
+        num_results (int, optional): The number of search results to retrieve. Defaults to 3.
+        timeout (int, optional): The timeout value for the HTTP requests. Defaults to 5.
+        language (str, optional): The language for the search results. Defaults to "en".
+        ssl_verify (bool, optional): Whether to verify SSL certificates. Defaults to None.
+
+    @returns:
+        list: A list of dictionaries containing the link and visible text of each search result.
+
+    @remarks:
+        - This function uses the requests library to send HTTP requests to Google.
+        - It sets the User-Agent header to mimic a Firefox browser.
+        - The search results are retrieved from the HTML response using BeautifulSoup.
+        - Each search result is represented as a dictionary with "link" and "text" keys.
+        - The "link" key contains the URL of the search result.
+        - The "text" key contains the visible text extracted from the search result webpage.
+        - If the visible text exceeds 4096 characters, it is truncated to that length.
+        - If an error occurs while fetching or processing a search result, it is printed and ignored.
+    """
+
+    # Initialize an empty list to store the search results
     all_results = []
+
+    # Define the maximum number of characters per page
     max_chars_per_page = 4096
+
+    # Create a session object to send HTTP requests
     with requests.Session() as session:
+        # Send a GET request to Google search with the specified query parameters
         resp = session.get(
             url="https://www.google.com/search",
             headers={
@@ -363,36 +335,118 @@ def search_with_google(
             timeout=timeout,
             verify=ssl_verify,
         )
+
+        # Raise an exception if the response status code is not successful
         resp.raise_for_status()
+
+        # Parse the HTML response using BeautifulSoup
         soup = BeautifulSoup(resp.text, "html.parser")
+
+        # Find all the result blocks in the HTML
         result_block = soup.find_all("div", attrs={"class": "g"})
+
+        # Iterate over each result block
         for result in result_block:
+            # Find the link element within the result block
             link = result.find("a", href=True)
+
+            # If a link is found, extract the URL and process the webpage
             if link:
                 link = link["href"]
                 try:
+                    # Send a GET request to the link URL
                     webpage = session.get(
                         link,
                         headers={
                             "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/111.0"
                         },
                     )
+
+                    # Raise an exception if the response status code is not successful
                    webpage.raise_for_status()
+
+                    # Extract the visible text from the webpage
                    visible_text = extract_text_from_webpage(webpage.text)
+
+                    # Truncate the visible text if it exceeds the maximum number of characters per page
                    if len(visible_text) > max_chars_per_page:
                        visible_text = visible_text[:max_chars_per_page]
+
+                    # Append the link and visible text to the search results list
                    all_results.append({"link": link, "text": visible_text})
                except requests.exceptions.RequestException as e:
+                    # Print an error message if there is an error fetching or processing the link
                    print(f"Error fetching or processing {link}: {e}")
                    pass
            else:
                pass
+
+    # Return the search results
    return all_results


-@spaces.GPU(duration=180)
-def generate(
-    message: str,
+@lru_cache(maxsize=128)
+def extract_text_from_image(file: str) -> str:
+    """
+    Extracts text from an image file.
+
+    @args:
+        file (str): The path or URL of the image file.
+
+    @returns:
+        str: The extracted text from the image.
+
+    @remarks:
+        - This function uses an LRU cache to store previously processed images for faster retrieval.
+        - The image file can be either a local file path or a URL.
+        - The function opens the image file using the PIL library.
+        - The function processes the image using an image processor.
+        - The processed image is then passed to a text generation model to generate text.
+        - The generated text is post-processed to obtain the final extracted text.
+    """
+    # Define the task and load the image
+    task = "<MORE_DETAILED_CAPTION>"
+    image = Image.open(
+        requests.get(file, stream=True).raw
+        if file.startswith("http")
+        else open(file, "rb")
+    )
+
+    if image.mode != "RGB":
+        image = image.convert("RGB")
+
+    # Preprocess the image using the image processor
+    inputs = image_processor(text=task, images=image, return_tensors="pt").to(
+        "cpu", image_torch_dtype
+    )
+
+    # Generate text based on the input image
+    generated_ids = image_model.generate(
+        input_ids=inputs["input_ids"],
+        pixel_values=inputs["pixel_values"],
+        max_new_tokens=1024,
+        num_beams=3,
+        do_sample=False,
+    )
+
+    # Decode the generated text and post-process the answer
+    generated_text = image_processor.batch_decode(
+        generated_ids, skip_special_tokens=False
+    )[0]
+    parsed_answer = image_processor.post_process_generation(
+        generated_text,
+        task=task,
+        image_size=(image.width, image.height),
+    )
+
+    # Return the parsed answer for the specified task
+    return parsed_answer[task]
+
+
+@spaces.GPU(duration=90)
+def generate_chat(
+    uuid: str,
+    message: dict,
     chat_history: list[tuple[str, str]],
     allow_used_tools: bool = True,
     system_prompt: str = "",
@@ -401,48 +455,44 @@ def generate(
     top_p: float = 0.95,
     top_k: int = 50,
     repetition_penalty: float = 1.0,
-    other_client_info: str = None,
+    client_info: str = None,
 ) -> Iterator[str]:
-    # print()
-    # print("allow_used_tools:\n", allow_used_tools)
-    # print("system_prompt:\n", system_prompt)
-    # print("max_new_tokens:\n", max_new_tokens)
-    # print("temperature:\n", temperature)
-
+    # Build the input_ids for the chat conversation
     def build_input_ids(
         apply_tools: bool = None,
         references=None,
     ):
         conversation = []
+
+        # Add the system prompt to the conversation
         if system_prompt:
             conversation.append({"role": "system", "content": system_prompt})
+
+        # Add the tools role to the conversation if apply_tools is True
         if apply_tools is True:
             conversation.append({"role": "tools", "content": supported_tools})

+        # Add the references role to the conversation
         if references is None:
-            references = [other_client_info]
+            references = [client_info]
         else:
-            references.insert(0, other_client_info)
+            references.insert(0, client_info)

         if (
             references is not None
             and isinstance(references, list)
             and len(references) > 0
         ):
+            formatted_references = f"Analyze the provided references, extract relevant information to provide accurate and objective feedback. This reference information may include: conversation context, assistant or user memories, reasoning guides, problem-solving suggestions, assistant rules, etc.\nIf the reference is not relevant, ignore it. Try to have a balanced approach, avoiding over-reliance on the documentation."
+            formatted_references += "\n\n" + ("\n\n".join(references))
             conversation.append(
                 {
                     "role": "refs",
-                    "content": json.dumps(
-                        {
-                            "instructions": "These are only general documents used for reference to give the most accurate and honest answers possible. Ignore it if it's irrelevant and don't overuse it.",
-                            "documents": references,
-                        },
-                        indent=2,
-                        ensure_ascii=False,
-                    ),
+                    "content": formatted_references,
                 }
             )

+        # Add the chat history to the conversation
         for user, assistant in chat_history:
             conversation.extend(
                 [
@@ -450,12 +500,28 @@ def generate(
                     {"role": "assistant", "content": assistant},
                 ]
             )
-        conversation.append({"role": "user", "content": message})

-        input_ids = tokenizer.apply_chat_template(
+        # Add the user message with image attachments to the conversation
+        conversation.append(
+            {
+                "role": "user",
+                "content": (
+                    f"{' & '.join(message['attachments'])}\n\n{message['text']}"
+                    if "attachments" in message and len(message["attachments"]) > 0
+                    else f"{message['text']}"
+                ),
+            }
+        )
+
+        logger.debug(f"UUID: {uuid} - Conversation: {conversation}")
+
+        # Apply the chat template to convert the conversation into input_ids
+        input_ids = chat_tokenizer.apply_chat_template(
             conversation, add_generation_prompt=True, return_tensors="pt"
         )
-        input_ids = input_ids.to(model.device)
+        input_ids = input_ids.to(chat_model.device)
+
+        # Trim the input_ids if it exceeds the maximum token length
         if input_ids.shape[1] > MAX_INPUT_TOKEN_LENGTH:
             input_ids = input_ids[:, -MAX_INPUT_TOKEN_LENGTH:]
             gr.Warning(
@@ -463,10 +529,13 @@
             )
         return input_ids

+    # Generate chat responses based on the input_ids
     def generate_chat_responses(
         previous_response: str = None,
     ):
         document_references = []
+
+        # Check if the previous response contains scheduled tool runs
         if previous_response is not None:
             scheduled_tools_runs = None
             try:
@@ -481,6 +550,7 @@ def generate(
                 print(e)
                 pass

+            # If scheduled tool runs exist, perform the corresponding searches
             if (
                 scheduled_tools_runs is not None
                 and scheduled_tools_runs["name"] == "search_on_internet"
@@ -488,9 +558,8 @@
                 keyword = scheduled_tools_runs["arguments"]["keyword"]
                 search_type = scheduled_tools_runs["arguments"]["type"]
                 language = scheduled_tools_runs["arguments"]["language"]
-                print(
-                    "scheduled_tools_runs:", scheduled_tools_runs
-                )
+
+                # Search on Wikipedia if the search type is "wikipedia"
                 if search_type == "wikipedia":
                     gr.Info(
                         "Searching for information on the Wikipedia.",
@@ -501,6 +570,7 @@ def generate(
                         search_with_wikipedia(query=keyword, language=language)
                     )

+                # Search on Google
                 gr.Info("Searching for information on the Google.")
                 document_references.extend(
                     search_with_google(
@@ -509,20 +579,25 @@
                         num_results=3,
                     )
                 )
-                print(
-                    "document_references:", document_references
-                )
+                print("document_references:", document_references)

+        # Determine if tools should be applied based on the allow_used_tools flag
         apply_tools = (
             True if allow_used_tools is True and previous_response is None else False
         )
+
+        # Build the input_ids for the chat conversation
         input_ids = build_input_ids(
             apply_tools=apply_tools,
             references=document_references,
         )
+
+        # Create a TextIteratorStreamer to generate chat responses
         streamer = TextIteratorStreamer(
-            tokenizer, timeout=10.0, skip_prompt=True, skip_special_tokens=True
+            chat_tokenizer, timeout=10.0, skip_prompt=True, skip_special_tokens=True
         )
+
+        # Set the generation parameters
         generate_kwargs = dict(
             input_ids=input_ids,
             streamer=streamer,
@@ -537,9 +612,14 @@
         generate_kwargs["top_p"] = top_p
         generate_kwargs["top_k"] = top_k

-        t = Thread(target=model.generate, kwargs=generate_kwargs)
+        # Start the generation process in a separate thread
+        t = Thread(target=chat_model.generate, kwargs=generate_kwargs)
         t.start()

+        logger.debug(
+            f"UUID: {uuid} - Is apply tools: {apply_tools} - Is apply documents: {len(document_references) > 0} - Is previous response: {previous_response is not None} - Start generating chat responses"
+        )
+
         state = {
             "mark": None,
             "respond": False,
@@ -556,6 +636,7 @@
                 state["respond"] = True
                 yield "".join(outputs)

+        # If tools are applied and no response is generated within the timeout, continue generating chat responses
         if (
             apply_tools is True
             and state["respond"] is False
@@ -564,22 +645,126 @@
             previous_response = "".join(outputs)
             yield from generate_chat_responses(previous_response=previous_response)

+    # Yield the generated chat responses
     yield from generate_chat_responses(previous_response=None)


+def generate(
+    message: dict,
+    chat_history: list[tuple[str, str]],
+    allow_used_tools: bool = True,
+    system_prompt: str = "",
+    max_new_tokens: int = 1536,
+    temperature: float = 0.4,
+    top_p: float = 0.95,
+    top_k: int = 50,
+    repetition_penalty: float = 1.0,
+    client_info: str = None,
+) -> Iterator[str]:
+    # Generate a unique identifier from the current time
+    uuid = zlib.crc32(str.encode(str(time.time())))
+    logger.info(f"UUID: {uuid} - Starting image text extraction process")
+
+    # Limit the number of files to process to 2
+    if len(message["files"]) > 2:
+        gr.Warning("Only the first 2 images will be processed.")
+
+    message["files"] = message["files"][:2]
+
+    # Extract text from each image file and replace the file path with an attachment tag containing the extracted text
+    message["attachments"] = handle_file_extraction(
+        files=list(message["files"]), uuid=uuid
+    )
+    logger.debug(f"UUID: {uuid} - Image text extraction process completed")
+
+    logger.debug(f"UUID: {uuid} - Previous chat history: {chat_history}")
+    for idx, chat_pair in enumerate(chat_history):
+        user_message, assistant_message = chat_pair
+        if not isinstance(user_message, str) and assistant_message is None:
+            text_descriptions = handle_file_extraction(
+                files=list(user_message), uuid=uuid
+            )
+            chat_input = (
+                f"{' & '.join(text_descriptions)}\n\n{chat_history[idx + 1][0]}"
+            )
+            chat_history[idx + 1][0] = chat_input
+            chat_history[idx] = [None, None]
+            logger.debug(
+                f"UUID: {uuid} - Updated chat history: {chat_history} - Updated chat input: {chat_input}"
+            )
+
+    chat_history = list(
+        filter(lambda x: x[0] is not None and x[1] is not None, chat_history)
+    )
+    logger.debug(f"UUID: {uuid} - Filtered chat history: {chat_history}")
+
+    yield from generate_chat(
+        uuid=uuid,
+        message=message,
+        chat_history=chat_history,
+        allow_used_tools=allow_used_tools,
+        system_prompt=system_prompt,
+        max_new_tokens=max_new_tokens,
+        temperature=temperature,
+        top_p=top_p,
+        top_k=top_k,
+        repetition_penalty=repetition_penalty,
+        client_info=client_info,
+    )
+
+
+def handle_file_extraction(files: list[str], uuid: str):
+    """
+    Extracts text from images in the given files and returns a list of attachments.
+
+    @args:
+        files (list[str]): The list of image file paths to extract text from.
+        uuid (str): The UUID associated with the extraction process.
+
+    @returns:
+        list: A list of attachments, each represented as a string.
+
+    @remarks:
+        - This function iterates over the files and extracts text from each image file.
+        - The extracted text is logged along with the UUID and file information.
+        - The extracted text is then added to the attachments list as a string representation of an attachment.
+        - The attachments list is returned at the end of the function.
+    """
+
+    attachments = []
+    for idx, file_to_extract in enumerate(files):
+        extracted_text = extract_text_from_image(file=file_to_extract)
+        logger.info(
+            f"UUID: {uuid} - File: {file_to_extract} - Extracted text: {extracted_text}"
+        )
+        attachments.append(
+            f'<attachment index="{idx}" type="image" description="{extracted_text}" />'
+        )
+    return attachments
+
+
 chatbot = gr.Chatbot(
-    height=500, placeholder=PLACEHOLDER, label="Ghost 8B Beta", show_copy_button=True
+    height=500,
+    placeholder=PLACEHOLDER,
+    label="Ghost 8B Beta (β, 128k)",
+    show_copy_button=True,
 )

 chat_interface = gr.ChatInterface(
     fn=generate,
     chatbot=chatbot,
     fill_height=True,
+    multimodal=True,
+    textbox=gr.MultimodalTextbox(
+        file_types=["image"],
+        placeholder="Type a message...",
+    ),
     additional_inputs=[
         gr.Checkbox(
-            label="Allow used tools (available: search on internet)", value=False
+            label="Allow used tools (available: search on internet)",
+            value=False,
         ),
-        gr.Textbox(label="System prompt", lines=6),
+        gr.Textbox(label="System prompt", lines=6, value=DEFAULT_SYSTEM_PROMPT),
         gr.Slider(
             label="Max new tokens",
             minimum=1,
@@ -616,23 +801,26 @@ chat_interface = gr.ChatInterface(
             value=1.0,
         ),
         gr.Textbox(
-            label="Other client information",
+            elem_id="client_info",
+            label="Client info",
             lines=1,
-            value="This user's current time: {}".format(time.strftime("%Y-%m-%d")),
+            value="The current time is now: {}".format(
+                time.strftime("%A, %D %B %Y %H:%M:%S")
+            ),
             visible=False,
         ),
     ],
     stop_btn="Stop",
     cache_examples=False,
-    examples=EXAMPLES,
-    examples_per_page=9,
+    examples=[],
+    examples_per_page=10,
     concurrency_limit=100,
 )

-with gr.Blocks(fill_height=True, css="style.css") as demo:
+with gr.Blocks(fill_height=True, css="style.css", head=HEAD) as demo:
     gr.Markdown(DESCRIPTION)
     chat_interface.render()
     gr.Markdown(LICENSE)

 if __name__ == "__main__":
-    demo.queue(max_size=20).launch(share=True)
+    demo.queue().launch(share=True)
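
The `workaround_fixed_get_imports` docstring shows the intended patch pattern, but the diff loads `microsoft/Florence-2-large` without it. A minimal sketch of wrapping the CPU load in that patch, assuming the Florence-2 remote modeling code declares `flash_attn` among its imports and that `workaround_fixed_get_imports` from app.py is in scope:

```python
# Sketch only: apply the get_imports workaround while loading Florence-2 on CPU,
# so the dynamically fetched modeling code does not require flash_attn.
from unittest.mock import patch

import torch
from transformers import AutoModelForCausalLM, AutoProcessor

image_model_id = "microsoft/Florence-2-large"
with patch(
    "transformers.dynamic_module_utils.get_imports", workaround_fixed_get_imports
):
    image_model = (
        AutoModelForCausalLM.from_pretrained(
            image_model_id,
            torch_dtype=torch.float32,
            trust_remote_code=True,
        )
        .to("cpu")
        .eval()
    )
    image_processor = AutoProcessor.from_pretrained(
        image_model_id, trust_remote_code=True
    )
```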
requirements.txt CHANGED
@@ -1,10 +1,11 @@
-accelerate==0.30.1
-bitsandbytes==0.43.1
-gradio==4.39.0
+accelerate
+bitsandbytes
+gradio
+spaces
+transformers
+timm
 scipy==1.13.0
 sentencepiece==0.2.0
-spaces==0.28.3
 torch==2.0.0
-transformers==4.41.0
 beautifulsoup4>=4.9
 wikipedia==1.4.0