Neon-AI committed on Jan 15

Commit

1d76220

verified

1 Parent(s): 8f296f3

Update Niche model with greetings fine-tune

Files changed (25)

README.md +203 -207
adapter_config.json +46 -0
adapter_model.safetensors +3 -0
added_tokens.json +28 -0
chat_template.jinja +61 -0
checkpoint-10/README.md +207 -0
checkpoint-10/adapter_config.json +46 -0
checkpoint-10/adapter_model.safetensors +3 -0
checkpoint-10/optimizer.pt +3 -0
checkpoint-10/rng_state.pth +3 -0
checkpoint-10/scheduler.pt +3 -0
checkpoint-10/trainer_state.json +104 -0
checkpoint-10/training_args.bin +3 -0
checkpoint-20/README.md +207 -0
checkpoint-20/adapter_config.json +46 -0
checkpoint-20/adapter_model.safetensors +3 -0
checkpoint-20/optimizer.pt +3 -0
checkpoint-20/rng_state.pth +3 -0
checkpoint-20/scheduler.pt +3 -0
checkpoint-20/trainer_state.json +174 -0
checkpoint-20/training_args.bin +3 -0
merges.txt +1 -0
special_tokens_map.json +31 -0
tokenizer.json +2 -2
tokenizer_config.json +238 -238

README.md CHANGED

@@ -1,211 +1,207 @@
 ---
-library_name: transformers
-license: apache-2.0
-license_link: https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507/blob/main/LICENSE
 pipeline_tag: text-generation
 ---
-# Qwen3-4B-Instruct-2507
-<a href="https://chat.qwen.ai" target="_blank" style="margin: 2px;">
-    <img alt="Chat" src="https://img.shields.io/badge/%F0%9F%92%9C%EF%B8%8F%20Qwen%20Chat%20-536af5" style="display: inline-block; vertical-align: middle;"/>
-</a>
-## Highlights
-We introduce the updated version of the **Qwen3-4B non-thinking mode**, named **Qwen3-4B-Instruct-2507**, featuring the following key enhancements:
-- **Significant improvements** in general capabilities, including **instruction following, logical reasoning, text comprehension, mathematics, science, coding and tool usage**.
-- **Substantial gains** in long-tail knowledge coverage across **multiple languages**.
-- **Markedly better alignment** with user preferences in **subjective and open-ended tasks**, enabling more helpful responses and higher-quality text generation.
-- **Enhanced capabilities** in **256K long-context understanding**.
-![image/jpeg](https://qianwen-res.oss-accelerate.aliyuncs.com/Qwen3-2507/Qwen3-4B-Instruct.001.jpeg)
-## Model Overview
-**Qwen3-4B-Instruct-2507** has the following features:
-- Type: Causal Language Models
-- Training Stage: Pretraining & Post-training
-- Number of Parameters: 4.0B
-- Number of Paramaters (Non-Embedding): 3.6B
-- Number of Layers: 36
-- Number of Attention Heads (GQA): 32 for Q and 8 for KV
-- Context Length: **262,144 natively**.
-**NOTE: This model supports only non-thinking mode and does not generate ``<think></think>`` blocks in its output. Meanwhile, specifying `enable_thinking=False` is no longer required.**
-For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [blog](https://qwenlm.github.io/blog/qwen3/), [GitHub](https://github.com/QwenLM/Qwen3), and [Documentation](https://qwen.readthedocs.io/en/latest/).
-## Performance
-|  | GPT-4.1-nano-2025-04-14 | Qwen3-30B-A3B Non-Thinking | Qwen3-4B Non-Thinking | Qwen3-4B-Instruct-2507 |
-|--- | --- | --- | --- | --- |
-| **Knowledge** | | | |
-| MMLU-Pro | 62.8 | 69.1 | 58.0 | **69.6** |
-| MMLU-Redux | 80.2 | 84.1 | 77.3 | **84.2** |
-| GPQA | 50.3 | 54.8 | 41.7 | **62.0** |
-| SuperGPQA | 32.2 | 42.2 | 32.0 | **42.8** |
-| **Reasoning** | | | |
-| AIME25 | 22.7 | 21.6 | 19.1 | **47.4** |
-| HMMT25 | 9.7 | 12.0 | 12.1 | **31.0** |
-| ZebraLogic | 14.8 | 33.2 | 35.2 | **80.2** |
-| LiveBench 20241125 | 41.5 | 59.4 | 48.4 | **63.0** |
-| **Coding** | | | |
-| LiveCodeBench v6 (25.02-25.05) | 31.5 | 29.0 | 26.4 | **35.1** |
-| MultiPL-E | 76.3 | 74.6 | 66.6 | **76.8** |
-| Aider-Polyglot |  9.8 | **24.4** | 13.8 | 12.9 |
-| **Alignment** | | | |
-| IFEval | 74.5 | **83.7** | 81.2 | 83.4 |
-| Arena-Hard v2* | 15.9 | 24.8 | 9.5 | **43.4** |
-| Creative Writing v3 | 72.7 | 68.1 | 53.6 | **83.5** |
-| WritingBench | 66.9 | 72.2 | 68.5 | **83.4** |
-| **Agent** | | | |
-| BFCL-v3 | 53.0 | 58.6 | 57.6 | **61.9** |
-| TAU1-Retail | 23.5 | 38.3 | 24.3 | **48.7** |
-| TAU1-Airline | 14.0 | 18.0 | 16.0 | **32.0** |
-| TAU2-Retail | - | 31.6 | 28.1 | **40.4** |
-| TAU2-Airline | - | 18.0 | 12.0 | **24.0** |
-| TAU2-Telecom | - | **18.4** | 17.5 | 13.2 |
-| **Multilingualism** | | | |
-| MultiIF | 60.7 | **70.8** | 61.3 | 69.0 |
-| MMLU-ProX | 56.2 | **65.1** | 49.6 | 61.6 |
-| INCLUDE | 58.6 | **67.8** | 53.8 | 60.1 |
-| PolyMATH | 15.6 | 23.3 | 16.6 | **31.1** |
-*: For reproducibility, we report the win rates evaluated by GPT-4.1.
-## Quickstart
-The code of Qwen3 has been in the latest Hugging Face `transformers` and we advise you to use the latest version of `transformers`.
-With `transformers<4.51.0`, you will encounter the following error:
-```
-KeyError: 'qwen3'
-```
-The following contains a code snippet illustrating how to use the model generate content based on given inputs.
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-model_name = "Qwen/Qwen3-4B-Instruct-2507"
-# load the tokenizer and the model
-tokenizer = AutoTokenizer.from_pretrained(model_name)
-model = AutoModelForCausalLM.from_pretrained(
-    model_name,
-    torch_dtype="auto",
-    device_map="auto"
-)
-# prepare the model input
-prompt = "Give me a short introduction to large language model."
-messages = [
-    {"role": "user", "content": prompt}
-]
-text = tokenizer.apply_chat_template(
-    messages,
-    tokenize=False,
-    add_generation_prompt=True,
-)
-model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
-# conduct text completion
-generated_ids = model.generate(
-    **model_inputs,
-    max_new_tokens=16384
-)
-output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
-content = tokenizer.decode(output_ids, skip_special_tokens=True)
-print("content:", content)
-```
-For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.5` or to create an OpenAI-compatible API endpoint:
-- SGLang:
-    ```shell
-    python -m sglang.launch_server --model-path Qwen/Qwen3-4B-Instruct-2507 --context-length 262144
-    ```
-- vLLM:
-    ```shell
-    vllm serve Qwen/Qwen3-4B-Instruct-2507 --max-model-len 262144
-    ```
-**Note: If you encounter out-of-memory (OOM) issues, consider reducing the context length to a shorter value, such as `32,768`.**
-For local use, applications such as Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers have also supported Qwen3.
-## Agentic Use
-Qwen3 excels in tool calling capabilities. We recommend using [Qwen-Agent](https://github.com/QwenLM/Qwen-Agent) to make the best use of agentic ability of Qwen3. Qwen-Agent encapsulates tool-calling templates and tool-calling parsers internally, greatly reducing coding complexity.
-To define the available tools, you can use the MCP configuration file, use the integrated tool of Qwen-Agent, or integrate other tools by yourself.
-```python
-from qwen_agent.agents import Assistant
-# Define LLM
-llm_cfg = {
-    'model': 'Qwen3-4B-Instruct-2507',
-    # Use a custom endpoint compatible with OpenAI API:
-    'model_server': 'http://localhost:8000/v1',  # api_base
-    'api_key': 'EMPTY',
-}
-# Define Tools
-tools = [
-    {'mcpServers': {  # You can specify the MCP configuration file
-            'time': {
-                'command': 'uvx',
-                'args': ['mcp-server-time', '--local-timezone=Asia/Shanghai']
-            },
-            "fetch": {
-                "command": "uvx",
-                "args": ["mcp-server-fetch"]
-            }
-        }
-    },
-  'code_interpreter',  # Built-in tools
-]
-# Define Agent
-bot = Assistant(llm=llm_cfg, function_list=tools)
-# Streaming generation
-messages = [{'role': 'user', 'content': 'https://qwenlm.github.io/blog/ Introduce the latest developments of Qwen'}]
-for responses in bot.run(messages=messages):
-    pass
-print(responses)
-```
-## Best Practices
-To achieve optimal performance, we recommend the following settings:
-1. **Sampling Parameters**:
-   - We suggest using `Temperature=0.7`, `TopP=0.8`, `TopK=20`, and `MinP=0`.
-   - For supported frameworks, you can adjust the `presence_penalty` parameter between 0 and 2 to reduce endless repetitions. However, using a higher value may occasionally result in language mixing and a slight decrease in model performance.
-2. **Adequate Output Length**: We recommend using an output length of 16,384 tokens for most queries, which is adequate for instruct models.
-3. **Standardize Output Format**: We recommend using prompts to standardize model outputs when benchmarking.
-   - **Math Problems**: Include "Please reason step by step, and put your final answer within \boxed{}." in the prompt.
-   - **Multiple-Choice Questions**: Add the following JSON structure to the prompt to standardize responses: "Please show your choice in the `answer` field with only the choice letter, e.g., `"answer": "C"`."
-### Citation
-If you find our work helpful, feel free to give us a cite.
-```
-@misc{qwen3technicalreport,
-      title={Qwen3 Technical Report},
-      author={Qwen Team},
-      year={2025},
-      eprint={2505.09388},
-      archivePrefix={arXiv},
-      primaryClass={cs.CL},
-      url={https://arxiv.org/abs/2505.09388},
-}
-```

 ---
+base_model: Neon-AI/Niche
+library_name: peft
 pipeline_tag: text-generation
+tags:
+- base_model:adapter:Neon-AI/Niche
+- lora
+- transformers
 ---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.0

adapter_config.json ADDED

@@ -0,0 +1,46 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "Neon-AI/Niche",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.0",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "up_proj",
+    "k_proj",
+    "o_proj",
+    "v_proj",
+    "down_proj",
+    "gate_proj",
+    "q_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
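
The `r`, `lora_alpha`, and `use_rslora` values above determine how strongly the low-rank update is applied. As a rough sketch: with `use_rslora` set to false, PEFT scales the LoRA update by `lora_alpha / r`; with rank-stabilized LoRA it uses `lora_alpha / sqrt(r)`. The helper names and the layer shape below are hypothetical and only illustrate the arithmetic; they are not part of this commit.

```python
import math

# Values copied from adapter_config.json in this commit.
config = {"r": 16, "lora_alpha": 32, "use_rslora": False}


def lora_scaling(cfg: dict) -> float:
    """Scaling factor applied to the low-rank update B @ A."""
    if cfg["use_rslora"]:
        return cfg["lora_alpha"] / math.sqrt(cfg["r"])
    return cfg["lora_alpha"] / cfg["r"]


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters LoRA adds to one linear layer: A is (r, d_in), B is (d_out, r)."""
    return r * (d_in + d_out)


scaling = lora_scaling(config)
# Hypothetical layer shape, chosen only for illustration.
extra = lora_param_count(d_in=1024, d_out=1024, r=config["r"])
print(scaling, extra)
```

Because all seven attention and MLP projections are listed in `target_modules`, each of them receives its own pair of low-rank matrices, so the per-layer count above applies once per targeted projection.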

adapter_model.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:14d4c19443276334127b419261632c0abdc7276ba8364d61a8646750a09af6bb
+size 132187888

added_tokens.json ADDED

@@ -0,0 +1,28 @@

+{
+  "</think>": 151668,
+  "</tool_call>": 151658,
+  "</tool_response>": 151666,
+  "<think>": 151667,
+  "<tool_call>": 151657,
+  "<tool_response>": 151665,
+  "<|box_end|>": 151649,
+  "<|box_start|>": 151648,
+  "<|endoftext|>": 151643,
+  "<|file_sep|>": 151664,
+  "<|fim_middle|>": 151660,
+  "<|fim_pad|>": 151662,
+  "<|fim_prefix|>": 151659,
+  "<|fim_suffix|>": 151661,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644,
+  "<|image_pad|>": 151655,
+  "<|object_ref_end|>": 151647,
+  "<|object_ref_start|>": 151646,
+  "<|quad_end|>": 151651,
+  "<|quad_start|>": 151650,
+  "<|repo_name|>": 151663,
+  "<|video_pad|>": 151656,
+  "<|vision_end|>": 151653,
+  "<|vision_pad|>": 151654,
+  "<|vision_start|>": 151652
+}
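
A quick sanity check on the mapping above, offered as a sketch: the 26 IDs appear to form one contiguous block with no gaps or duplicates. The dictionary is copied from the hunk; the variable names are illustrative.

```python
# Token-to-ID mapping copied from added_tokens.json in this commit.
added_tokens = {
    "</think>": 151668, "</tool_call>": 151658, "</tool_response>": 151666,
    "<think>": 151667, "<tool_call>": 151657, "<tool_response>": 151665,
    "<|box_end|>": 151649, "<|box_start|>": 151648, "<|endoftext|>": 151643,
    "<|file_sep|>": 151664, "<|fim_middle|>": 151660, "<|fim_pad|>": 151662,
    "<|fim_prefix|>": 151659, "<|fim_suffix|>": 151661, "<|im_end|>": 151645,
    "<|im_start|>": 151644, "<|image_pad|>": 151655,
    "<|object_ref_end|>": 151647, "<|object_ref_start|>": 151646,
    "<|quad_end|>": 151651, "<|quad_start|>": 151650, "<|repo_name|>": 151663,
    "<|video_pad|>": 151656, "<|vision_end|>": 151653,
    "<|vision_pad|>": 151654, "<|vision_start|>": 151652,
}

ids = sorted(added_tokens.values())
# Contiguous means every integer between the smallest and largest ID is used once.
is_contiguous = ids == list(range(ids[0], ids[-1] + 1))
print(len(ids), ids[0], ids[-1], is_contiguous)
```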

chat_template.jinja ADDED

@@ -0,0 +1,61 @@

+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0].role == 'system' %}
+        {{- messages[0].content + '\n\n' }}
+    {%- endif %}
+    {{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0].role == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- for message in messages %}
+    {%- if message.content is string %}
+        {%- set content = message.content %}
+    {%- else %}
+        {%- set content = '' %}
+    {%- endif %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
+        {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {{- '<|im_start|>' + message.role + '\n' + content }}
+        {%- if message.tool_calls %}
+            {%- for tool_call in message.tool_calls %}
+                {%- if (loop.first and content) or (not loop.first) %}
+                    {{- '\n' }}
+                {%- endif %}
+                {%- if tool_call.function %}
+                    {%- set tool_call = tool_call.function %}
+                {%- endif %}
+                {{- '<tool_call>\n{"name": "' }}
+                {{- tool_call.name }}
+                {{- '", "arguments": ' }}
+                {%- if tool_call.arguments is string %}
+                    {{- tool_call.arguments }}
+                {%- else %}
+                    {{- tool_call.arguments | tojson }}
+                {%- endif %}
+                {{- '}\n</tool_call>' }}
+            {%- endfor %}
+        {%- endif %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+{%- endif %}
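
To make the template's output concrete, here is a minimal pure-Python sketch of what it renders in the simplest case: no `tools`, string-only content, and only `system`, `user`, and `assistant` roles. The function name `render_chat` is hypothetical, and the sketch deliberately ignores tool calls and `tool` messages, which the real template handles.

```python
def render_chat(messages, add_generation_prompt=False):
    """Approximate the template for plain text system/user/assistant messages.

    Without tools, each of these roles is framed the same way:
    <|im_start|>{role}\n{content}<|im_end|>\n
    """
    parts = []
    for message in messages:
        role, content = message["role"], message["content"]
        if role in ("system", "user", "assistant"):
            parts.append(f"<|im_start|>{role}\n{content}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = render_chat(
    [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "Hello"},
    ],
    add_generation_prompt=True,
)
print(prompt)
```

In practice the template would be applied through the tokenizer rather than reimplemented; the sketch is only meant to show the ChatML-style framing that the `<|im_start|>` and `<|im_end|>` tokens produce.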

checkpoint-10/README.md ADDED

@@ -0,0 +1,207 @@

+---
+base_model: Neon-AI/Niche
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:Neon-AI/Niche
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.0

checkpoint-10/adapter_config.json ADDED

@@ -0,0 +1,46 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "Neon-AI/Niche",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.0",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "up_proj",
+    "k_proj",
+    "o_proj",
+    "v_proj",
+    "down_proj",
+    "gate_proj",
+    "q_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

checkpoint-10/adapter_model.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a2b41ed61ea2d4b288773fd3f63816d7ffe17aa5214daa67d0283d8dac0fde5c
+size 132187888

checkpoint-10/optimizer.pt ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1972d426f77cef87dbd1f8d83029c2dc350737162bd666056890e9a83736341f
+size 264673227

checkpoint-10/rng_state.pth ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0cdc5bcf9cd9583032cb743ed885311dab8fb413e47d5654e4037fcbbda58dbb
+size 14645

checkpoint-10/scheduler.pt ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:56d310dc7744488422d2e08282f176a0d7bd1c485c1f31624b33358a92664bf0
+size 1465

checkpoint-10/trainer_state.json ADDED

@@ -0,0 +1,104 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.0,
+  "eval_steps": 500,
+  "global_step": 10,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.1,
+      "grad_norm": 34.125633239746094,
+      "learning_rate": 2e-05,
+      "loss": 11.2316,
+      "step": 1
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 34.276371002197266,
+      "learning_rate": 1.9e-05,
+      "loss": 9.836,
+      "step": 2
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 47.07984161376953,
+      "learning_rate": 1.8e-05,
+      "loss": 10.0343,
+      "step": 3
+    },
+    {
+      "epoch": 0.4,
+      "grad_norm": 31.75088119506836,
+      "learning_rate": 1.7e-05,
+      "loss": 8.055,
+      "step": 4
+    },
+    {
+      "epoch": 0.5,
+      "grad_norm": 31.298851013183594,
+      "learning_rate": 1.6000000000000003e-05,
+      "loss": 8.1529,
+      "step": 5
+    },
+    {
+      "epoch": 0.6,
+      "grad_norm": 30.328794479370117,
+      "learning_rate": 1.5000000000000002e-05,
+      "loss": 7.529,
+      "step": 6
+    },
+    {
+      "epoch": 0.7,
+      "grad_norm": 28.938907623291016,
+      "learning_rate": 1.4e-05,
+      "loss": 7.5455,
+      "step": 7
+    },
+    {
+      "epoch": 0.8,
+      "grad_norm": 33.366058349609375,
+      "learning_rate": 1.3000000000000001e-05,
+      "loss": 6.1616,
+      "step": 8
+    },
+    {
+      "epoch": 0.9,
+      "grad_norm": 30.850677490234375,
+      "learning_rate": 1.2e-05,
+      "loss": 6.135,
+      "step": 9
+    },
+    {
+      "epoch": 1.0,
+      "grad_norm": 31.53157615661621,
+      "learning_rate": 1.1000000000000001e-05,
+      "loss": 5.5152,
+      "step": 10
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 20,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 28159043174400.0,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
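
The logged learning rates in this state file fit a simple pattern, sketched below: a linear decay from 2e-05 toward zero over `max_steps` = 20 with no warmup. The scheduler type is not stated anywhere in this commit, so treat the formula as an observation about the logged numbers, not as the confirmed training configuration. The function and variable names are illustrative.

```python
# Values copied from log_history in checkpoint-10/trainer_state.json.
logged_lr = [
    2e-05, 1.9e-05, 1.8e-05, 1.7e-05, 1.6000000000000003e-05,
    1.5000000000000002e-05, 1.4e-05, 1.3000000000000001e-05,
    1.2e-05, 1.1000000000000001e-05,
]
logged_loss = [
    11.2316, 9.836, 10.0343, 8.055, 8.1529,
    7.529, 7.5455, 6.1616, 6.135, 5.5152,
]

PEAK_LR = 2e-5   # learning rate logged at step 1
MAX_STEPS = 20   # "max_steps" in the state file


def linear_decay_lr(step: int) -> float:
    """Assumed schedule: linear decay to zero over MAX_STEPS, no warmup (1-indexed step)."""
    return PEAK_LR * (MAX_STEPS - (step - 1)) / MAX_STEPS


# Largest gap between the assumed schedule and the logged values.
max_abs_error = max(
    abs(linear_decay_lr(step) - lr) for step, lr in enumerate(logged_lr, start=1)
)
# How far the training loss fell over the first epoch.
loss_drop = logged_loss[0] - logged_loss[-1]
print(max_abs_error, loss_drop)
```

Under that assumption, the checkpoint at step 10 sits at the halfway point of the run, which matches `epoch` 1.0 of `num_train_epochs` 2.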

checkpoint-10/training_args.bin ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2bc9520b9f3b60d7c5071648180b9eb602a0403c736a570fed89ae92ba4318bf
+size 5777

checkpoint-20/README.md ADDED

@@ -0,0 +1,207 @@

+---
+base_model: Neon-AI/Niche
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:Neon-AI/Niche
+- lora
+- transformers
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.18.0

checkpoint-20/adapter_config.json ADDED

	@@ -0,0 +1,46 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "Neon-AI/Niche",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_bias": false,
+  "lora_dropout": 0.1,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.18.0",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "up_proj",
+    "k_proj",
+    "o_proj",
+    "v_proj",
+    "down_proj",
+    "gate_proj",
+    "q_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
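
To make the `r`, `lora_alpha`, and `target_modules` fields above concrete, here is a minimal sketch of the LoRA scaling factor and the number of trainable parameters they imply. The layer shapes are assumptions, not taken from this diff: they are the dimensions commonly reported for Qwen3-4B (hidden size 2560, MLP size 9728, 32 query heads and 8 key/value heads of dimension 128, 36 layers). `lora_scaling` and `lora_param_count` are hypothetical helper names.

```python
# ASSUMED layer shapes (not stated in the diff): Qwen3-4B-style dimensions.
HIDDEN = 2560
INTERMEDIATE = 9728
Q_OUT = 32 * 128   # query heads * head_dim
KV_OUT = 8 * 128   # key/value heads * head_dim
NUM_LAYERS = 36

# Values taken from adapter_config.json.
R = 16
LORA_ALPHA = 32

# (in_features, out_features) for each module listed in target_modules.
SHAPES = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_scaling(alpha, r, use_rslora=False):
    """Factor applied to the low-rank update B @ A."""
    return alpha / (r ** 0.5) if use_rslora else alpha / r


def lora_param_count(shapes, r, num_layers):
    """Each adapted linear layer adds A (r x d_in) and B (d_out x r)."""
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


print(lora_scaling(LORA_ALPHA, R))            # 2.0
print(lora_param_count(SHAPES, R, NUM_LAYERS))  # 33030144
```

If the assumed shapes hold, 33,030,144 parameters at 4 bytes each come to 132,120,576 bytes, which is within about 67 KB of the 132,187,888-byte `adapter_model.safetensors` recorded in this commit. That would be consistent with fp32 adapter weights plus a file header, though the diff itself does not confirm the dtype.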

checkpoint-20/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:14d4c19443276334127b419261632c0abdc7276ba8364d61a8646750a09af6bb
+size 132187888
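
The three added lines above are a Git LFS pointer rather than the weights themselves: a spec version, a content hash, and the size in bytes of the real file. As a rough sketch, a pointer like this can be parsed with plain string handling; `parse_lfs_pointer` is a hypothetical helper, and stripping the leading `+` is only there to cope with the diff markers shown on this page.

```python
POINTER = """\
+version https://git-lfs.github.com/spec/v1
+oid sha256:14d4c19443276334127b419261632c0abdc7276ba8364d61a8646750a09af6bb
+size 132187888
"""


def parse_lfs_pointer(text):
    """Parse 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.lstrip("+").strip()  # drop the diff marker, if present
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER)
print(info["hash_algo"], info["size"])
```

The same parser applies to the other pointer files in this commit (`optimizer.pt`, `rng_state.pth`, `scheduler.pt`, `training_args.bin`, `tokenizer.json`), since they share the three-field layout.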

checkpoint-20/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8bbfbe5e0f75792512ed9cdd0bc709fb70b65573ff028e8bacf51708a1705d63
+size 264673227

checkpoint-20/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0d54da60f78689c90caf5b819152e9fed757ed4cd6b31c338b5c3fae97ed0a40
+size 14645

checkpoint-20/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:eb5420d573129b182fabd13c1b58758649a399d991b05570cc2c5d7e8dfff195
+size 1465

checkpoint-20/trainer_state.json ADDED

	@@ -0,0 +1,174 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 2.0,
+  "eval_steps": 500,
+  "global_step": 20,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.1,
+      "grad_norm": 34.125633239746094,
+      "learning_rate": 2e-05,
+      "loss": 11.2316,
+      "step": 1
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 34.276371002197266,
+      "learning_rate": 1.9e-05,
+      "loss": 9.836,
+      "step": 2
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 47.07984161376953,
+      "learning_rate": 1.8e-05,
+      "loss": 10.0343,
+      "step": 3
+    },
+    {
+      "epoch": 0.4,
+      "grad_norm": 31.75088119506836,
+      "learning_rate": 1.7e-05,
+      "loss": 8.055,
+      "step": 4
+    },
+    {
+      "epoch": 0.5,
+      "grad_norm": 31.298851013183594,
+      "learning_rate": 1.6000000000000003e-05,
+      "loss": 8.1529,
+      "step": 5
+    },
+    {
+      "epoch": 0.6,
+      "grad_norm": 30.328794479370117,
+      "learning_rate": 1.5000000000000002e-05,
+      "loss": 7.529,
+      "step": 6
+    },
+    {
+      "epoch": 0.7,
+      "grad_norm": 28.938907623291016,
+      "learning_rate": 1.4e-05,
+      "loss": 7.5455,
+      "step": 7
+    },
+    {
+      "epoch": 0.8,
+      "grad_norm": 33.366058349609375,
+      "learning_rate": 1.3000000000000001e-05,
+      "loss": 6.1616,
+      "step": 8
+    },
+    {
+      "epoch": 0.9,
+      "grad_norm": 30.850677490234375,
+      "learning_rate": 1.2e-05,
+      "loss": 6.135,
+      "step": 9
+    },
+    {
+      "epoch": 1.0,
+      "grad_norm": 31.53157615661621,
+      "learning_rate": 1.1000000000000001e-05,
+      "loss": 5.5152,
+      "step": 10
+    },
+    {
+      "epoch": 1.1,
+      "grad_norm": 31.38231086730957,
+      "learning_rate": 1e-05,
+      "loss": 4.8192,
+      "step": 11
+    },
+    {
+      "epoch": 1.2,
+      "grad_norm": 28.06603240966797,
+      "learning_rate": 9e-06,
+      "loss": 4.9797,
+      "step": 12
+    },
+    {
+      "epoch": 1.3,
+      "grad_norm": 31.240188598632812,
+      "learning_rate": 8.000000000000001e-06,
+      "loss": 4.3032,
+      "step": 13
+    },
+    {
+      "epoch": 1.4,
+      "grad_norm": 34.96710205078125,
+      "learning_rate": 7e-06,
+      "loss": 4.708,
+      "step": 14
+    },
+    {
+      "epoch": 1.5,
+      "grad_norm": 29.90130615234375,
+      "learning_rate": 6e-06,
+      "loss": 4.4012,
+      "step": 15
+    },
+    {
+      "epoch": 1.6,
+      "grad_norm": 33.620018005371094,
+      "learning_rate": 5e-06,
+      "loss": 4.2453,
+      "step": 16
+    },
+    {
+      "epoch": 1.7,
+      "grad_norm": 30.06698226928711,
+      "learning_rate": 4.000000000000001e-06,
+      "loss": 3.9319,
+      "step": 17
+    },
+    {
+      "epoch": 1.8,
+      "grad_norm": 38.18107223510742,
+      "learning_rate": 3e-06,
+      "loss": 3.6368,
+      "step": 18
+    },
+    {
+      "epoch": 1.9,
+      "grad_norm": 32.306827545166016,
+      "learning_rate": 2.0000000000000003e-06,
+      "loss": 3.7206,
+      "step": 19
+    },
+    {
+      "epoch": 2.0,
+      "grad_norm": 28.944194793701172,
+      "learning_rate": 1.0000000000000002e-06,
+      "loss": 3.6091,
+      "step": 20
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 20,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 56318086348800.0,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
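
The `learning_rate` values in `log_history` above fall by 1e-06 per step, from 2e-05 at step 1 to 1e-06 at step 20. That pattern matches a linear decay to zero over `max_steps` with no warmup. The scheduler type is not recorded in this file, so treat it as an inference; `linear_lr` is a hypothetical helper that merely reproduces the logged numbers.

```python
def linear_lr(step, base_lr=2e-5, max_steps=20):
    """Learning rate logged at 1-indexed `step`, ASSUMING linear decay
    to zero over `max_steps` with no warmup (inferred from the log)."""
    return base_lr * (max_steps - (step - 1)) / max_steps


# A few (step, learning_rate) pairs copied from log_history.
LOGGED = {1: 2e-05, 2: 1.9e-05, 5: 1.6e-05, 10: 1.1e-05, 20: 1e-06}

for step, logged in LOGGED.items():
    predicted = linear_lr(step)
    print(step, predicted, abs(predicted - logged) < 1e-12)
```

Over the same 20 steps the logged loss falls from 11.2316 to 3.6091, while `grad_norm` stays roughly between 28 and 47 throughout.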

checkpoint-20/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:2bc9520b9f3b60d7c5071648180b9eb602a0403c736a570fed89ae92ba4318bf
+size 5777

merges.txt CHANGED

@@ -1,3 +1,4 @@
 Ġ Ġ
 ĠĠ ĠĠ
 i n

+#version: 0.2
 Ġ Ġ
 ĠĠ ĠĠ
 i n

special_tokens_map.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
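
The map above sets `<|im_end|>` as the end-of-sequence token and `<|endoftext|>` as the padding token. In the chat template that this commit removes from `tokenizer_config.json`, every turn is wrapped as `<|im_start|>{role}\n{content}<|im_end|>\n`, so generation stops when the model closes its own turn. The sketch below is a simplified illustration of that layout for plain text messages only. It ignores the tool-calling branches of the real template, and `format_chatml` is a hypothetical helper, not part of any library.

```python
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"  # also the eos_token in special_tokens_map.json


def format_chatml(messages, add_generation_prompt=True):
    """Render plain text messages in the <|im_start|>/<|im_end|> layout.

    Simplified: no tool definitions, tool calls, or tool responses.
    """
    parts = [
        f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages
    ]
    if add_generation_prompt:
        # Leave the assistant turn open so the model writes the reply.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


prompt = format_chatml([{"role": "user", "content": "Hi"}])
print(repr(prompt))
```

For real use, the tokenizer's own `apply_chat_template` method is the safer choice, since it renders the full template shipped with the model (here moved to `chat_template.jinja`).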

tokenizer.json CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:aeb13307a71acd8fe81861d94ad54ab689df773318809eed3cbe794b4492dae4
-size 11422654

 version https://git-lfs.github.com/spec/v1
+oid sha256:7aa8e51ee5b61c63907b4f663fb1da4fde8d7f17b5db6d3fc9bd9c25e068d44a
+size 11422932

tokenizer_config.json CHANGED

@@ -1,239 +1,239 @@
 {
-    "add_prefix_space": false,
-    "added_tokens_decoder": {
-        "151643": {
-            "content": "<|endoftext|>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": true
-        },
-        "151644": {
-            "content": "<|im_start|>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": true
-        },
-        "151645": {
-            "content": "<|im_end|>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": true
-        },
-        "151646": {
-            "content": "<|object_ref_start|>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": true
-        },
-        "151647": {
-            "content": "<|object_ref_end|>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": true
-        },
-        "151648": {
-            "content": "<|box_start|>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": true
-        },
-        "151649": {
-            "content": "<|box_end|>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": true
-        },
-        "151650": {
-            "content": "<|quad_start|>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": true
-        },
-        "151651": {
-            "content": "<|quad_end|>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": true
-        },
-        "151652": {
-            "content": "<|vision_start|>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": true
-        },
-        "151653": {
-            "content": "<|vision_end|>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": true
-        },
-        "151654": {
-            "content": "<|vision_pad|>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": true
-        },
-        "151655": {
-            "content": "<|image_pad|>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": true
-        },
-        "151656": {
-            "content": "<|video_pad|>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": true
-        },
-        "151657": {
-            "content": "<tool_call>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": false
-        },
-        "151658": {
-            "content": "</tool_call>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": false
-        },
-        "151659": {
-            "content": "<|fim_prefix|>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": false
-        },
-        "151660": {
-            "content": "<|fim_middle|>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": false
-        },
-        "151661": {
-            "content": "<|fim_suffix|>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": false
-        },
-        "151662": {
-            "content": "<|fim_pad|>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": false
-        },
-        "151663": {
-            "content": "<|repo_name|>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": false
-        },
-        "151664": {
-            "content": "<|file_sep|>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": false
-        },
-        "151665": {
-            "content": "<tool_response>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": false
-        },
-        "151666": {
-            "content": "</tool_response>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": false
-        },
-        "151667": {
-            "content": "<think>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": false
-        },
-        "151668": {
-            "content": "</think>",
-            "lstrip": false,
-            "normalized": false,
-            "rstrip": false,
-            "single_word": false,
-            "special": false
-        }
-    },
-    "additional_special_tokens": [
-        "<|im_start|>",
-        "<|im_end|>",
-        "<|object_ref_start|>",
-        "<|object_ref_end|>",
-        "<|box_start|>",
-        "<|box_end|>",
-        "<|quad_start|>",
-        "<|quad_end|>",
-        "<|vision_start|>",
-        "<|vision_end|>",
-        "<|vision_pad|>",
-        "<|image_pad|>",
-        "<|video_pad|>"
-    ],
-    "bos_token": null,
-    "chat_template": "{%- if tools %}\n    {{- '<|im_start|>system\\n' }}\n    {%- if messages[0].role == 'system' %}\n        {{- messages[0].content + '\\n\\n' }}\n    {%- endif %}\n    {{- \"# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n    {%- for tool in tools %}\n        {{- \"\\n\" }}\n        {{- tool | tojson }}\n    {%- endfor %}\n    {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n    {%- if messages[0].role == 'system' %}\n        {{- '<|im_start|>system\\n' + messages[0].content + '<|im_end|>\\n' }}\n    {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n    {%- if message.content is string %}\n        {%- set content = message.content %}\n    {%- else %}\n        {%- set content = '' %}\n    {%- endif %}\n    {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) %}\n        {{- '<|im_start|>' + message.role + '\\n' + content + '<|im_end|>' + '\\n' }}\n    {%- elif message.role == \"assistant\" %}\n        {{- '<|im_start|>' + message.role + '\\n' + content }}\n        {%- if message.tool_calls %}\n            {%- for tool_call in message.tool_calls %}\n                {%- if (loop.first and content) or (not loop.first) %}\n                    {{- '\\n' }}\n                {%- endif %}\n                {%- if tool_call.function %}\n                    {%- set tool_call = tool_call.function %}\n                {%- endif %}\n                {{- '<tool_call>\\n{\"name\": \"' }}\n                {{- tool_call.name }}\n                {{- '\", \"arguments\": ' }}\n                {%- if tool_call.arguments is string %}\n                    {{- tool_call.arguments }}\n                {%- else %}\n                    {{- tool_call.arguments | tojson }}\n                {%- endif %}\n                {{- '}\\n</tool_call>' }}\n            {%- endfor %}\n        {%- endif %}\n        {{- '<|im_end|>\\n' }}\n    {%- elif message.role == \"tool\" %}\n        {%- if loop.first or (messages[loop.index0 - 1].role != \"tool\") %}\n            {{- '<|im_start|>user' }}\n        {%- endif %}\n        {{- '\\n<tool_response>\\n' }}\n        {{- content }}\n        {{- '\\n</tool_response>' }}\n        {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n            {{- '<|im_end|>\\n' }}\n        {%- endif %}\n    {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n    {{- '<|im_start|>assistant\\n' }}\n{%- endif %}",
-    "clean_up_tokenization_spaces": false,
-    "eos_token": "<|im_end|>",
-    "errors": "replace",
-    "model_max_length": 1010000,
-    "pad_token": "<|endoftext|>",
-    "split_special_tokens": false,
-    "tokenizer_class": "Qwen2Tokenizer",
-    "unk_token": null,
-    "add_bos_token": false
-}

 {
+  "add_bos_token": false,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151646": {
+      "content": "<|object_ref_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151647": {
+      "content": "<|object_ref_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151648": {
+      "content": "<|box_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151649": {
+      "content": "<|box_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151650": {
+      "content": "<|quad_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151651": {
+      "content": "<|quad_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151652": {
+      "content": "<|vision_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151653": {
+      "content": "<|vision_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151654": {
+      "content": "<|vision_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151655": {
+      "content": "<|image_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151656": {
+      "content": "<|video_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151657": {
+      "content": "<tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151658": {
+      "content": "</tool_call>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151659": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151660": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151661": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151662": {
+      "content": "<|fim_pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151663": {
+      "content": "<|repo_name|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151664": {
+      "content": "<|file_sep|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151665": {
+      "content": "<tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151666": {
+      "content": "</tool_response>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151667": {
+      "content": "<think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    },
+    "151668": {
+      "content": "</think>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": false
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>",
+    "<|object_ref_start|>",
+    "<|object_ref_end|>",
+    "<|box_start|>",
+    "<|box_end|>",
+    "<|quad_start|>",
+    "<|quad_end|>",
+    "<|vision_start|>",
+    "<|vision_end|>",
+    "<|vision_pad|>",
+    "<|image_pad|>",
+    "<|video_pad|>"
+  ],
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "extra_special_tokens": {},
+  "model_max_length": 1010000,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}