Spaces: Running on Zero
Initialize the LangGraph chat app
GEMINI.md (ADDED)
@@ -0,0 +1,68 @@
# Gemini Workflow and Memory

## Working Rules
- I always keep track of the "Project Goal".
- I adjust the "Sub-goals" at any time based on your suggestions.
- The core of my work: break the "Sub-goals" down into concrete tasks on the "Todolist" and focus on executing the current task.
- I regularly check whether the tasks on the "Todolist" have drifted away from the final "Project Goal".

---

# Project Goal
## Open
- [ ] Build a workflow application that makes combined use of `Ring-mini-2.0` and `Ling-flash-2.0` (or their quantized versions).

## Completed
- (none yet)

---

# Sub-goals
## Open
- [ ] **(In progress)** Resolve the deployment failures caused by the model being too large.
- [ ] (Paused) Implement an automated deployment and verification flow.

## Completed
- [x] Use LangGraph to implement a chat web app that can route between the two models.

---

# Todolist
## Open
- [ ] **Current task**: Modify `app.py` to remove the `Ling-flash-2.0` model and keep only `Ring-mini-2.0`.
- [ ] (Pending) Update the model path in `app.py` once the user has found a quantized model.
- [ ] (Paused) Search the `huggingface_hub` docs to confirm whether an API exists for restarting a Space.

## Completed
- [x] **(User decision)** Confirmed that `Ling-flash-2.0` is too large; remove it for now and use only `Ring-mini-2.0`.
- [x] Set up the LangGraph skeleton and refactor `app.py`.
- [x] Implement model-routing logic based on user input.
- [x] Fix the `NameError: name 'operator' is not defined` bug.
- [x] Link the models in `README.md`.
- [x] Create and maintain the `GEMINI.md` file.

---

## Core Models
- `inclusionAI/Ring-mini-2.0` (https://huggingface.co/inclusionAI/Ring-mini-2.0)

## Tech Stack and Constraints
- **Language:** Python
- **Framework:** Gradio
- **Inference logic:** These models have no hosted API provider, so inference must be implemented directly with PyTorch. **Do not use `InferenceClient`.** (A minimal sketch of this pattern follows after this file.)

## Dependencies
- [`gradio`](https://pypi.org/project/gradio/)
- [`huggingface-hub`](https://pypi.org/project/huggingface-hub/)
- [`transformers`](https://pypi.org/project/transformers/)
- [`accelerate`](https://pypi.org/project/accelerate/)
- [`langgraph`](https://pypi.org/project/langgraph/)
- [`langchain-community`](https://pypi.org/project/langchain-community/)
- [`langchain-core`](https://pypi.org/project/langchain-core/)
- [`spaces`](https://pypi.org/project/spaces/)

## Development Environment and Resources
- **Platform:** HuggingFace Spaces
- **Subscription:** HuggingFace Pro
- **Inference resources:** ZeroGPU is available
- **Documentation:** When necessary, proactively search the online API docs for HuggingFace and Gradio.

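The inference constraint above rules out `InferenceClient`: the weights are loaded locally and `model.generate` is called directly. Below is a minimal sketch of that pattern with `transformers` and PyTorch, mirroring what `app.py` does further down; the prompt text, `torch_dtype`, and generation settings here are illustrative assumptions, not values taken from the commit.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "inclusionAI/Ring-mini-2.0"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the tokenizer and weights locally; no hosted inference API is involved.
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype="auto",        # illustrative; pick the dtype the hardware supports
    trust_remote_code=True,
).to(device)

# Illustrative prompt in the same "User:/Assistant:" style app.py builds.
prompt = "User: Hello!\nAssistant:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
output_ids = model.generate(
    input_ids,
    max_new_tokens=128,        # illustrative value
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))
```

On ZeroGPU hardware the generation call additionally needs to run inside a function decorated with `@spaces.GPU`, as the `respond` function in `app.py` is.
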
README.md (CHANGED)
@@ -11,6 +11,9 @@ hf_oauth: true
 hf_oauth_scopes:
 - inference-api
 license: apache-2.0
+models:
+- inclusionAI/Ring-mini-2.0
+- inclusionAI/Ling-flash-2.0
 ---
 
 An example chatbot using [Gradio](https://gradio.app), [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/v0.22.2/en/index), and the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).

app.py (CHANGED)
@@ -3,6 +3,19 @@ import spaces
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
 
+import operator
+from typing import Annotated, Literal
+from typing_extensions import TypedDict
+
+from langchain_core.messages import AIMessage, AnyMessage, SystemMessage, HumanMessage, ToolMessage
+from langgraph.graph import StateGraph, END
+
+
+# Define the graph state
+class GraphState(TypedDict):
+    messages: Annotated[list[AnyMessage], operator.add]
+
+
 # Load the model and tokenizer only once
 MODEL_NAME = "inclusionAI/Ring-mini-2.0"
 device = "cuda" if torch.cuda.is_available() else "cpu"
@@ -13,66 +26,75 @@ model = AutoModelForCausalLM.from_pretrained(
     trust_remote_code=True
 ).to(device)
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-    last_role = None
-    for turn in history:
-        if turn.get("role") == "user":
-            prompt += f"User: {turn['content']}\n"
-            last_role = "user"
-        elif turn.get("role") == "assistant":
-            prompt += f"Assistant: {turn['content']}\n"
-            last_role = "assistant"
-    prompt += f"User: {message}\nAssistant:"
+
+# Define the graph nodes
+def call_model(state: GraphState):
+    """Model invocation node."""
+    messages = state["messages"]
+
+    # Assemble the prompt
+    prompt = ""
+    for msg in messages:
+        if msg.type == "system":
+            prompt += f"{msg.content}\n"
+        elif msg.type == "human":
+            prompt += f"User: {msg.content}\n"
+        elif msg.type == "ai":
+            prompt += f"Assistant: {msg.content}\n"
+    prompt += "Assistant:"
 
     input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
     output_ids = model.generate(
         input_ids,
-        max_new_tokens=
-        temperature=temperature,
-        top_p=top_p,
+        max_new_tokens=512,  # hard-coded for now
         do_sample=True,
         pad_token_id=tokenizer.eos_token_id,
     )
     output = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
-
-
-
-
-
+
+    return {"messages": [AIMessage(content=output)]}
+
+# Build the graph
+workflow = StateGraph(GraphState)
+workflow.add_node("llm", call_model)
+workflow.set_entry_point("llm")
+workflow.add_edge("llm", END)
+
+# Compile the graph
+app = workflow.compile()
+@spaces.GPU
+def respond(message, history, system_message, hf_token: gr.OAuthToken = None):
+    """Response function for the Gradio interface; invokes the LangGraph app."""
+
+    # Convert Gradio history into LangChain message format
+    messages = []
+    if system_message:
+        messages.append(SystemMessage(content=system_message))
+
+    for turn in history:
+        user_message, bot_message = turn
+        if user_message:
+            messages.append(HumanMessage(content=user_message))
+        if bot_message:
+            messages.append(AIMessage(content=bot_message))
+
+    messages.append(HumanMessage(content=message))
 
+    # One-shot call via the invoke method
+    inputs = {"messages": messages}
+    final_state = app.invoke(inputs)
+
+    # Extract the last message from the final state
+    final_response = final_state["messages"][-1].content
+
+    return final_response
 
-
-For information on how to customize the ChatInterface, peruse the gradio docs: https://www.gradio.app/docs/chatinterface
-"""
+# Redefine the ChatInterface
 chatbot = gr.ChatInterface(
     respond,
-    type="messages",
+    type="messages",  # switched to the "messages" type to better match LangChain
     additional_inputs=[
         gr.Textbox(value="You are a friendly Chatbot.", label="System message"),
-        gr.Slider(minimum=1, maximum=2048, value=512, step=1, label="Max new tokens"),
-        gr.Slider(minimum=0.1, maximum=4.0, value=0.7, step=0.1, label="Temperature"),
-        gr.Slider(
-            minimum=0.1,
-            maximum=1.0,
-            value=0.95,
-            step=0.05,
-            label="Top-p (nucleus sampling)",
-        ),
     ],
 )
 
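The `Annotated[list[AnyMessage], operator.add]` annotation on `GraphState.messages` is what lets `call_model` return only the newly generated `AIMessage` while the graph still accumulates the full conversation: `operator.add` acts as a reducer, so each node's returned list is concatenated onto the existing state rather than replacing it. Below is a small, model-free sketch of the same pattern; the names `DemoState`, `echo_node`, and `demo_app` are hypothetical and used only for illustration.

```python
import operator
from typing import Annotated

from typing_extensions import TypedDict
from langchain_core.messages import AIMessage, AnyMessage, HumanMessage
from langgraph.graph import END, StateGraph


class DemoState(TypedDict):
    # operator.add is the reducer: node outputs are appended, not overwritten.
    messages: Annotated[list[AnyMessage], operator.add]


def echo_node(state: DemoState):
    # Return only the new message; the reducer merges it into the running list.
    last = state["messages"][-1].content
    return {"messages": [AIMessage(content=f"echo: {last}")]}


demo_graph = StateGraph(DemoState)
demo_graph.add_node("echo", echo_node)
demo_graph.set_entry_point("echo")
demo_graph.add_edge("echo", END)
demo_app = demo_graph.compile()

result = demo_app.invoke({"messages": [HumanMessage(content="hi")]})
print([m.content for m in result["messages"]])  # ['hi', 'echo: hi']
```

Invoking the compiled graph with a `{"messages": [...]}` dict mirrors how `respond` calls `app.invoke(inputs)` in the diff above.
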
requirements.txt (CHANGED)
@@ -2,3 +2,7 @@ gradio
 huggingface-hub
 transformers
 accelerate
+langgraph
+langchain_community
+langchain_core
+spaces
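
One behavioral detail of the `respond` function in the `app.py` diff: it unpacks each history turn as a `(user_message, bot_message)` tuple, while `gr.ChatInterface(type="messages")` supplies history as a list of `{"role": ..., "content": ...}` dicts, which is the shape the removed code read via `turn.get("role")`. Below is a hedged sketch of converting that dict format into LangChain messages; `history_to_messages` is a hypothetical helper and not part of the committed code.

```python
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage


def history_to_messages(history, system_message=None):
    """Convert Gradio 'messages'-style history (role/content dicts) to LangChain messages."""
    messages = []
    if system_message:
        messages.append(SystemMessage(content=system_message))
    for turn in history:
        if turn.get("role") == "user":
            messages.append(HumanMessage(content=turn["content"]))
        elif turn.get("role") == "assistant":
            messages.append(AIMessage(content=turn["content"]))
    return messages
```

The resulting list can be passed to `app.invoke({"messages": ...})` exactly as `respond` does.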