Instructions to use TLLMC/g-1.2.0-mxfp4-fixed-2603 with libraries and local apps.

Libraries

How to use TLLMC/g-1.2.0-mxfp4-fixed-2603 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="TLLMC/g-1.2.0-mxfp4-fixed-2603")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForMultimodalLM

tokenizer = AutoTokenizer.from_pretrained("TLLMC/g-1.2.0-mxfp4-fixed-2603")
model = AutoModelForMultimodalLM.from_pretrained("TLLMC/g-1.2.0-mxfp4-fixed-2603")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use TLLMC/g-1.2.0-mxfp4-fixed-2603 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "TLLMC/g-1.2.0-mxfp4-fixed-2603"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TLLMC/g-1.2.0-mxfp4-fixed-2603",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/TLLMC/g-1.2.0-mxfp4-fixed-2603

SGLang

How to use TLLMC/g-1.2.0-mxfp4-fixed-2603 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "TLLMC/g-1.2.0-mxfp4-fixed-2603" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TLLMC/g-1.2.0-mxfp4-fixed-2603",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "TLLMC/g-1.2.0-mxfp4-fixed-2603" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "TLLMC/g-1.2.0-mxfp4-fixed-2603",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use TLLMC/g-1.2.0-mxfp4-fixed-2603 with Docker Model Runner:
```
docker model run hf.co/TLLMC/g-1.2.0-mxfp4-fixed-2603
```

g-1.2.0-mxfp4-fixed-2603

Training

Datasets

| Dataset | Samples |
| --- | --- |
| multi_dataset_long | 6950 |
| qa_dataset_alparaft_fs18k | 18000 |
| qamulti_dataset | 37369 |
| petition_diff_law_dataset_0202 | 9840 |
| petition_same_law_dataset_0202 | 9840 |
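
To make the mixture easier to read, the sketch below sums the sample counts from the table and prints each dataset's share. The counts are copied from the table; the per-epoch arithmetic assumes every sample is seen exactly once per epoch with no resampling or filtering, which the card does not state.

```python
# Sample counts copied from the Datasets table above.
dataset_samples = {
    "multi_dataset_long": 6950,
    "qa_dataset_alparaft_fs18k": 18000,
    "qamulti_dataset": 37369,
    "petition_diff_law_dataset_0202": 9840,
    "petition_same_law_dataset_0202": 9840,
}

EPOCHS = 3  # from the hyperparameter table

total = sum(dataset_samples.values())
shares = {name: count / total for name, count in dataset_samples.items()}

# Assumption: each sample is used once per epoch (no up/down-sampling).
samples_seen = total * EPOCHS

for name, share in shares.items():
    print(f"{name:32s} {dataset_samples[name]:6d}  {share:6.1%}")
print(f"total samples: {total}, sample passes over {EPOCHS} epochs: {samples_seen}")
```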

Hyperparameters

| Parameter | Value |
| --- | --- |
| epochs | 3 |
| learning rate | 1e-6 |

Inference

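Inference uses a multi-turn conversation flow:

Turn 1: The model decides, based on the system prompt, whether to call get_documents.
Turn 2: The caller parses the tool call, runs the retrieval, and returns the results in <doc>...</doc> format.
Turn 3: The model generates the final official document from the retrieved results.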

import json
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "./g-1.2.0-mxfp4-fixed-2603"  # local directory path; a Hub repo ID also works

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_documents",
            "description": "當需要參考公文範本、特定格式或專業術語時，檢索相關公文內容。",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "使用者的輸入 (prompt) 文字（完整或部分）",
                    }
                },
                "required": ["query"],
            },
        },
    }
]


system_prompt = "SYSTEM PROMPT"

messages = [
    {"role": "system", "content": system_prompt},
    {
        "role": "user",
        "content": "USER PROMPT HERE",
    },
]

# Turn 1: Tool Call
inputs = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
raw_response_text = tokenizer.decode(
    generated_ids[0][len(inputs["input_ids"][0]):],
    skip_special_tokens=False,
)

# Turn 2: Parse tool call
try:
    if "<|channel|>analysis" in raw_response_text:
        thinking_content = raw_response_text.split("<|channel|>analysis<|message|>")[1].split("<|end|>")[0].strip()
    else:
        # Fallback if token slice starts inside message body
        thinking_content = raw_response_text.split("<|message|>")[1].split("<|end|>")[0].strip()
except (IndexError, ValueError):
    thinking_content = ""

tool_call_payload = None
target_tool = "functions.get_documents"

if f"to={target_tool}" in raw_response_text:
    tool_segment = raw_response_text.split(f"to={target_tool}")[-1]
    try:
        json_string = tool_segment.split("<|message|>")[1].split("<|call|>")[0].strip()
        tool_call_payload = json.loads(json_string)
    except (IndexError, json.JSONDecodeError):
        print("\n[Warning] Model generated malformed or truncated JSON parameter structure.")

if tool_call_payload:
    messages.append({
        "role": "assistant",
        "thinking": "",
        "content": "",
        "tool_calls": [{"name": "get_documents", "arguments": tool_call_payload}],
    })
    # Simulate execution of the mock tool result
    print(f"\n--- [Executing Tool] get_documents(query='{tool_call_payload.get('query')}') ---")
    mock_retrieved_context = "<doc>...</doc>"
    
    messages.append({
        "role": "tool",
        "name": "get_documents",
        "content": mock_retrieved_context,  # replace with the actual retrieval results
    })

    # Turn 3: Final Response
    final_inputs = tokenizer.apply_chat_template(
        messages,
        tools=tools,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    final_ids = model.generate(
        **final_inputs,
        max_new_tokens=4096,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
    )
    final_response = tokenizer.decode(
        final_ids[0][len(final_inputs["input_ids"][0]):],
        skip_special_tokens=True,
    )
    print(final_response)
else:
    print("\n--- Final Model Response (No Tool Required) ---")
    clean_fallback = tokenizer.decode(generated_ids[0][len(inputs["input_ids"][0]):], skip_special_tokens=True)
    print(clean_fallback)
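
The Turn 2 parsing above can also be factored into small helpers that are testable without loading the model. The following is a minimal sketch, not part of the released code: `extract_tool_call` and `wrap_documents` are hypothetical names, the regular expression only relies on the `to=...`, `<|message|>` and `<|call|>` markers that the script above already splits on, and it takes the first matching call (the script above takes the last). Wrapping each retrieved document in its own `<doc>` pair joined by newlines is an assumption; the card only says results are returned in `<doc>...</doc>` format.

```python
import json
import re
from typing import List, Optional


def extract_tool_call(
    raw_text: str, tool_name: str = "functions.get_documents"
) -> Optional[dict]:
    """Return the JSON arguments of the first call to `tool_name`, or None.

    None is returned when no call is present or when the JSON body is
    malformed or truncated (for example, generation hit max_new_tokens).
    """
    pattern = re.compile(
        r"to=" + re.escape(tool_name) + r".*?<\|message\|>(.*?)(?:<\|call\|>|$)",
        re.DOTALL,
    )
    match = pattern.search(raw_text)
    if match is None:
        return None
    try:
        payload = json.loads(match.group(1).strip())
    except json.JSONDecodeError:
        return None
    # The tool schema expects a JSON object, so reject bare strings or lists.
    return payload if isinstance(payload, dict) else None


def wrap_documents(documents: List[str]) -> str:
    """Wrap retrieved documents in <doc> tags (one pair per document: an assumption)."""
    return "\n".join(f"<doc>{doc}</doc>" for doc in documents)


sample_output = (
    "<|channel|>analysis<|message|>Need a template.<|end|>"
    "<|start|>assistant<|channel|>commentary to=functions.get_documents "
    '<|constrain|>json<|message|>{"query": "開會通知單"}<|call|>'
)
print(extract_tool_call(sample_output))
print(wrap_documents(["first reference", "second reference"]))
```

With helpers like these, the `if tool_call_payload:` branch only needs the returned dictionary, and the tool message content can be built from whatever retriever is in use.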

Recommended system prompt

system_prompt = """你是一個專精於「臺灣公文產生與編修」的 AI 助理。你具備自主判斷能力，能根據使用者提供的資訊完整度，決定是否需要檢索參考公文、提出詢問或是直接生成公文。
零、資料定義：
    1.使用者輸入 (input)：使用者提出的撰寫公文需求。
    2.外部檢索 (get_documents)：檢索相關參考公文的工具。
    3.標籤資料 (<doc>與</doc>)：外部檢索到的參考公文，內容會以 <doc> 與 </doc> 標籤標示。
一、任務分析與決策邏輯：
在執行任何行動前，你必須先進行內部推理，並根據以下路徑自主選擇最合適的執行方案：
    1.評估階段：分析 input 內容和是否存在 <doc> 與 </doc>  標籤。
    2.決策路徑：
    （1)路徑一：直接生成公文任務
        - 判定：使用者已提供完整事實，且你對該類公文格式有足夠把握（或已附帶 <doc> 標籤）。
        - 行動：直接依照「四、公文生成規則」輸出。
    （2)路徑二：先檢索再生成公文任務
        - 判定：任務目標明確，但缺乏特定公文的格式範例或專業術語。
        - 行動：呼叫 get_documents 獲取參考公文 → 視情況決定是否再提問，邏輯請參閱「三、澄清問題規則」 → 生成公文，邏輯請參閱「四、公文生成規則」。
    （3)路徑三：引導澄清任務
        - 判定：需求過於模糊或缺乏關鍵要素，如：缺乏主旨核心。
        - 行動：依照「三、澄清問題規則」提出澄清問題。
    （4)路徑四：編修公文任務
        - 判定：收到使用者提出的修改公文建議。
        - 行動：依據「五、公文修改原則」調整內容。
二、函式呼叫原則：
請在 input 包含特定的「公文類別名稱」（如：簽、函、開會通知單）但未附帶 <doc> 時，優先啟動檢索，工具名稱：get_documents。
三、澄清問題規則（強制）：
    1.判斷關鍵要素：受文者層級、發文主旨、核心事實、法律依據。
    2.詢問策略：
        - 若資訊嚴重缺失，應優先提出具體問題，而非盲目生成公文。
        - 詢問應簡明有禮，引導使用者補齊關鍵要素。
    3.例外路徑：若使用者多次補充仍未果，依據「六、例外處理規則」提供帶有佔位符的草案。
四、公文生成規則：
無論路徑為何，生成的公文必須符合以下規範：
    1.僅在確認資訊完整時，方可進行公文生成，但不包括「六、例外處理規則」之情形。
    2.結構要求：
        - 生成「函」：主旨、說明（必要時含辦法）。
        - 生成「簽」：常採用「主旨」、「說明」、「擬辦」。
        - 生成「開會通知單」：事由、時間、地點、主持人、出席者、聯絡人/電話、副本、備註。
        - 細節：細段落名稱後接全形冒號「：」，其後文字直接接續。
    3.「主旨」撰寫要領：
        - 一語貫之：須扼要說明行文目的，不可分列項次，字數以不超過五十字為原則。
        - 句式結構：應包含「起首語（如：有關、為）」、「事由」及「目的」，並於末尾以逗號接續「概括性期望語」（如：請查照、請核示、請鑒核）。
        - 「期望語」的判斷邏輯：請根據受文者的層級（上級、平行、下級機關），自動選用正確的期望語（如：請鑒核、請查照、請辦理見復）。
    4.分項與標點：
        - 條列清晰：「說明」、「擬辦」或「辦法」等段落內容若包含多個要點，必須使用分項標號，確保一項一事，且內容不得與「主旨」重複。
        - 實質內容：「說明」段應補充背景、事實或法規依據；「擬辦」段須提出具體執行方案或建議，嚴禁僅以「請核示」等模糊語句帶過。
        - 層級順序：首層為「一、二、三、」，第二層為「(一)、(二)、(三)」，第三層為「１、２、３、」，第四層為「(１)、(２)、(３)」，且每一項次應另列新行書寫。
        - 標點符號：「一」與「１」之後須接全形頓號「、」；帶括號之層級如「(一)」與「(１)」之後不得再加頓號。
    5.隱私保護：嚴禁使用參考公文之個資，改以 [機關名稱]、[姓名] 等佔位符替代。
五、公文修改規則：
    1.針對性修改：僅針對使用者要求的部分進行調整。
    2.結構一致性：確保修改後的段落與未修改部分在邏輯與語氣上保持一致。
六、例外處理規則（重要）：
    1.當已盡力詢問但使用者無法提供更多資訊時，應基於現有資訊生成「結構完整但細節待填」的草案，並在公文中使用佔位符（如：[機關名稱]、[日期]），提醒使用者後續補正。
    2.草擬公文的規範，請參考「四、公文生成規則」。
七、 輸出規範與互動引導：
    1.輸出是澄清問題時，請確保問題具體且明確，並能有效引導使用者補充所缺之關鍵公文要素。
    2.輸出是公文時，請確保公文格式與用語符合臺灣政府機關的規定，相關規定請參考「四、公文生成規則」生成公文。
    3.輸出不是澄清問題或公文時，請保持「專業、熱心、有禮」的助理形象，並且禮貌拒絕使用者請求並詳細說明功能範圍。
"""

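Section 四.4 of the system prompt fixes the item numbering for each nesting level: 「一、」 at level 1, 「(一)」 at level 2, 「１、」 at level 3, and 「(１)」 at level 4, with the full-width 「、」 only after the unparenthesized forms. As an illustration of that rule, the sketch below generates the label for a given level and index, which could be used to post-check model output. `to_cjk_numeral` and `item_label` are hypothetical helper names, and support is limited to indices 1 to 99.

```python
CJK_DIGITS = "零一二三四五六七八九"
FULLWIDTH_DIGITS = str.maketrans("0123456789", "０１２３４５６７８９")


def to_cjk_numeral(n: int) -> str:
    """Convert 1..99 to a Chinese numeral such as 一, 十, 十二, 二十三."""
    if not 1 <= n <= 99:
        raise ValueError("this sketch only supports 1..99")
    tens, ones = divmod(n, 10)
    if tens == 0:
        return CJK_DIGITS[ones]
    prefix = "" if tens == 1 else CJK_DIGITS[tens]
    suffix = "" if ones == 0 else CJK_DIGITS[ones]
    return f"{prefix}十{suffix}"


def item_label(level: int, n: int) -> str:
    """Return the item label for nesting level 1..4 and 1-based index n."""
    if level == 1:
        return to_cjk_numeral(n) + "、"          # 一、 二、 三、
    if level == 2:
        return f"({to_cjk_numeral(n)})"          # (一) (二) (三), no 、 after
    fullwidth = str(n).translate(FULLWIDTH_DIGITS)
    if level == 3:
        return fullwidth + "、"                   # １、 ２、 ３、
    if level == 4:
        return f"({fullwidth})"                  # (１) (２) (３), no 、 after
    raise ValueError("the prompt defines only four levels")


for level in range(1, 5):
    print(level, [item_label(level, i) for i in range(1, 4)])
```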
Safetensors

Model size: 22B params
Tensor type: F32, BF16