Upload folder using huggingface_hub
Files changed:
- README.md +4 -2
- handler.py +118 -0
README.md CHANGED

@@ -3,6 +3,8 @@
 
 This is a TRM model trained using the provided datasets.
 
-## How to use
+## How to use for Inference
 
-
+You can use this model for inference via the Hugging Face Inference API or with the `transformers` library.
+
+Make sure you have the `modelling_trm.py` file in the same directory as the model files if using the `transformers` library locally.
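For the local `transformers` route, a minimal sketch of what usage might look like (the model path is a placeholder for a local clone of this repo, and the `.logits` access mirrors the assumption made in `handler.py` below; adapt both to the actual TRM interface):

import torch
from transformers import AutoTokenizer
from modelling_trm import TRM, TRMConfig  # custom classes shipped with the model files

model_path = "."  # placeholder: a local clone of this repo
config = TRMConfig.from_pretrained(model_path)
model = TRM.from_pretrained(model_path, config=config).eval()
tokenizer = AutoTokenizer.from_pretrained(model_path)

encoded = tokenizer("This is a test input", return_tensors="pt")
with torch.no_grad():
    outputs = model(input_ids=encoded["input_ids"])
# Assumes the model returns logits, as handler.py below also does
logits = outputs.logits if hasattr(outputs, "logits") else outputs["logits"]
print(logits)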
handler.py ADDED

@@ -0,0 +1,118 @@
import sys

import torch
from transformers import AutoTokenizer

# Import the custom model classes. modelling_trm.py is expected to live in
# the root of the repo, next to this handler.
sys.path.insert(0, ".")
from modelling_trm import TRM, TRMConfig
sys.path.pop(0)  # Remove the path again after the import


class InferenceHandler:
    def __init__(self):
        self.model = None
        self.tokenizer = None
        self.device = None

    def load(self, model_path="."):
        # Load model and tokenizer
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"Loading model on device: {self.device}")

        # Load the config
        config = TRMConfig.from_pretrained(model_path)

        # Load the model
        self.model = TRM.from_pretrained(model_path, config=config)
        self.model.to(self.device)
        self.model.eval()  # Set model to evaluation mode

        # Load the tokenizer. Adapt this to your actual tokenizer if needed.
        try:
            self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        except Exception:
            # Fall back to a basic tokenizer if loading from the model path fails
            self.tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
            print("Loaded a placeholder tokenizer (bert-base-uncased) for inference.")

    def preprocess(self, inputs):
        # Preprocess inputs for the model. 'inputs' is the data received by
        # the inference endpoint; for text generation it can be a string or a
        # list of strings. Adapt this to the expected input format.
        if isinstance(inputs, str):
            inputs = [inputs]
        elif not isinstance(inputs, list):
            raise ValueError("Input must be a string or a list of strings.")

        # Tokenize the input, handling padding and truncation
        tokenized_inputs = self.tokenizer(
            inputs,
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=self.model.config.seq_len,
        )

        # Move the tokenized inputs to the model's device
        tokenized_inputs = {k: v.to(self.device) for k, v in tokenized_inputs.items()}

        # Return only the inputs expected by the TRM model; based on training,
        # TRM seems to take only 'input_ids'
        return {"input_ids": tokenized_inputs["input_ids"]}

    def inference(self, inputs):
        # Perform inference; 'inputs' is the output of the preprocess method
        with torch.no_grad():
            # Forward pass, assuming the model only takes input_ids
            outputs = self.model(**inputs)

        # The output structure might differ; this assumes the model returns
        # logits. For text generation you might use model.generate() instead
        # of a plain forward pass.
        logits = outputs.logits if hasattr(outputs, "logits") else outputs["logits"]

        return logits  # Or process the logits further for text generation

    def postprocess(self, outputs):
        # Postprocess the model outputs ('outputs' is the logits returned by
        # inference()). This is a placeholder step that returns the raw logits
        # as a list; for text generation you would typically decode the
        # generated token IDs instead, e.g. when using model.generate():
        #     generated_ids = outputs[0]
        #     generated_text = self.tokenizer.decode(generated_ids, skip_special_tokens=True)
        #     return generated_text

        # For this basic handler, just move the logits to the CPU and convert
        # them to a nested list
        return outputs.cpu().tolist()

    def handle(self, data):
        # Main inference handler function; 'data' is the input received by
        # the inference endpoint.

        # 1. Preprocess
        model_input = self.preprocess(data)

        # 2. Inference
        model_output = self.inference(model_input)

        # 3. Postprocess
        response = self.postprocess(model_output)

        return response


# Example usage (for testing locally):
# if __name__ == "__main__":
#     handler = InferenceHandler()
#     handler.load()
#     test_input = "This is a test input"
#     output = handler.handle(test_input)
#     print("Inference output:", output)
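Note that Hugging Face Inference Endpoints expect a handler.py exposing a class named EndpointHandler, whose __call__ receives the request payload as a dict with an "inputs" key. If this handler is meant to serve that API, a thin wrapper over the class above might look like the following sketch (an assumption about the deployment target, not part of the committed file):

from typing import Any, Dict


class EndpointHandler:
    def __init__(self, path: str = "."):
        # Reuse the InferenceHandler defined above to load the model and tokenizer
        self.handler = InferenceHandler()
        self.handler.load(model_path=path)

    def __call__(self, data: Dict[str, Any]):
        # Inference Endpoints deliver the request payload as a dict with an
        # "inputs" key; fall back to the raw data for local testing
        inputs = data.get("inputs", data) if isinstance(data, dict) else data
        return self.handler.handle(inputs)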