JusteLeo/Qwen3-0.6B-T5-xxl-split

@@ -1,94 +1,94 @@
----
-license: apache-2.0
-language:
-  - en
-base_model: JusteLeo/Qwen3-0.6B-T5-xxl
-tags:
-  - split
-  - encoder
-  - embedding
-  - Text Generation
----
-# Qwen3-0.6B-T5-xxl-split
-## Model Description
-This repository provides the components of the `Qwen3-0.6B-T5-xxl` model, split into two parts. This is intended for advanced users who wish to perform custom operations, such as GGUF conversion or other model architecture modifications.
-Both components are provided in **float32** format to ensure maximum precision for downstream tasks like quantization.
-## Repository Contents
-- **/qwen_body/**: Contains the fine-tuned `Qwen3-0.6B` model body. This is a standard Hugging Face model directory. The model weights are in `float32`.
-- **/projection_head/**: Contains the fine-tuned projection head as a single `projection_head.pth` file. This is a PyTorch state dictionary.
-## How to Use
-To use these components, you need to load them separately and then combine them in a two-step inference process.
-```python
-import torch
-from torch import nn
-from transformers import AutoTokenizer, AutoModel
-import numpy as np
-# --- 1. Load Components ---
-device = "cuda"
-# Load the model body
-body_model = AutoModel.from_pretrained("./qwen_body").to(device)
-tokenizer = AutoTokenizer.from_pretrained("./qwen_body")
-# Load the projection head
-# First, re-create the architecture
-input_dim = body_model.config.hidden_size # 1024
-hidden_dim = 2048
-output_dim = 4096
-head_model = nn.Sequential(
-    nn.Linear(input_dim, hidden_dim),
-    nn.GELU(),
-    nn.Dropout(0.1),
-    nn.Linear(hidden_dim, output_dim)
-).to(device)
-# Then, load the saved weights
-head_model.load_state_dict(torch.load("./projection_head/projection_head.pth"))
-body_model.eval()
-head_model.eval()
-# --- 2. Create a unified inference function ---
-def get_final_embedding(text: str):
-    # a) Tokenize the input text
-    inputs = tokenizer(text, return_tensors="pt").to(device)
-    # b) Get the base embedding from the body model
-    with torch.no_grad():
-        outputs_body = body_model(**inputs)
-        last_hidden_state = outputs_body.last_hidden_state
-    # c) Perform mean pooling
-    attention_mask = inputs['attention_mask']
-    mask_expanded = attention_mask.unsqueeze(-1).expand(last_hidden_state.size()).float()
-    sum_embeddings = torch.sum(last_hidden_state * mask_expanded, 1)
-    sum_mask = torch.clamp(mask_expanded.sum(1), min=1e-9)
-    pooled_embedding = sum_embeddings / sum_mask
-    # d) Pass the pooled embedding through the projection head
-    with torch.no_grad():
-        final_embedding = head_model(pooled_embedding)
-    return final_embedding
-# --- 3. Test the pipeline ---
-prompt = "A high-tech laboratory with glowing vials and holographic displays."
-embedding = get_final_embedding(prompt)
-print("Inference successful!")
-print(f"Output shape: {embedding.shape}")
-# Expected output shape: (1, 4096)
-```
-## License
-This repository is licensed under the **MIT License**.

+---
+license: apache-2.0
+language:
+  - en
+base_model: JusteLeo/Qwen3-0.6B-T5-xxl
+tags:
+  - split
+  - encoder
+  - embedding
+  - Text Generation
+---
+# Qwen3-0.6B-T5-xxl-split
+## Model Description
+This repository provides the components of the `Qwen3-0.6B-T5-xxl` model, split into two parts. This is intended for advanced users who wish to perform custom operations, such as GGUF conversion or other model architecture modifications.
+Both components are provided in **float32** format to ensure maximum precision for downstream tasks like quantization.
+## Repository Contents
+- **/qwen_body/**: Contains the fine-tuned `Qwen3-0.6B` model body. This is a standard Hugging Face model directory. The model weights are in `float32`.
+- **/projection_head/**: Contains the fine-tuned projection head as a single `projection_head.pth` file. This is a PyTorch state dictionary.
+## How to Use
+To use these components, you need to load them separately and then combine them in a two-step inference process.
+```python
+import torch
+from torch import nn
+from transformers import AutoTokenizer, AutoModel
+# --- 1. Load Components ---
+device = "cuda" if torch.cuda.is_available() else "cpu"  # fall back to CPU when no GPU is present
+# Load the model body
+body_model = AutoModel.from_pretrained("./qwen_body").to(device)
+tokenizer = AutoTokenizer.from_pretrained("./qwen_body")
+# Load the projection head
+# First, re-create the architecture
+input_dim = body_model.config.hidden_size # 1024
+hidden_dim = 2048
+output_dim = 4096
+head_model = nn.Sequential(
+    nn.Linear(input_dim, hidden_dim),
+    nn.GELU(),
+    nn.Dropout(0.1),
+    nn.Linear(hidden_dim, output_dim)
+).to(device)
+# Then, load the saved weights
+head_model.load_state_dict(torch.load("./projection_head/projection_head.pth", map_location=device))
+body_model.eval()
+head_model.eval()
+# --- 2. Create a unified inference function ---
+def get_final_embedding(text: str):
+    # a) Tokenize the input text
+    inputs = tokenizer(text, return_tensors="pt").to(device)
+    # b) Get the base embedding from the body model
+    with torch.no_grad():
+        outputs_body = body_model(**inputs)
+        last_hidden_state = outputs_body.last_hidden_state
+    # c) Perform mean pooling
+    attention_mask = inputs['attention_mask']
+    mask_expanded = attention_mask.unsqueeze(-1).expand(last_hidden_state.size()).float()
+    sum_embeddings = torch.sum(last_hidden_state * mask_expanded, 1)
+    sum_mask = torch.clamp(mask_expanded.sum(1), min=1e-9)
+    pooled_embedding = sum_embeddings / sum_mask
+    # d) Pass the pooled embedding through the projection head
+    with torch.no_grad():
+        final_embedding = head_model(pooled_embedding)
+    return final_embedding
+# --- 3. Test the pipeline ---
+prompt = "A high-tech laboratory with glowing vials and holographic displays."
+embedding = get_final_embedding(prompt)
+print("Inference successful!")
+print(f"Output shape: {embedding.shape}")
+# Expected output shape: (1, 4096)
+```
+## License
+This repository is licensed under the **Apache License 2.0**.
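
The mean-pooling step inside `get_final_embedding` can be checked in isolation on toy data. The following is a minimal sketch and not part of this commit: it assumes NumPy arrays as stand-ins for the `last_hidden_state` and `attention_mask` tensors, and the helper name `masked_mean_pool` is hypothetical. It shows that padded positions (mask value 0) do not contribute to the pooled vector.

```python
import numpy as np

def masked_mean_pool(hidden, mask):
    """Average token vectors over the sequence axis, ignoring padded positions.

    hidden: array of shape (batch, seq_len, dim), stand-in for last_hidden_state
    mask:   array of shape (batch, seq_len), 1 for real tokens and 0 for padding
    """
    mask = mask[..., None].astype(hidden.dtype)      # (batch, seq_len, 1)
    summed = (hidden * mask).sum(axis=1)             # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)   # (batch, 1), avoids division by zero
    return summed / counts

# Toy batch: the first sequence has one padded position holding a large value.
hidden = np.array([
    [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]],
    [[2.0, 0.0], [4.0, 2.0], [6.0, 4.0]],
])
mask = np.array([
    [1, 1, 0],
    [1, 1, 1],
])

pooled = masked_mean_pool(hidden, mask)
print(pooled)  # the padded [100.0, 100.0] vector is excluded from the first row's mean
```

The same arithmetic is what the `torch.sum` and `torch.clamp` lines in the README perform on the model output before the projection head is applied.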