Q-bert/Mamba-790M

@@ -35,6 +35,30 @@ print(generated_text)
 ```
 > Hi, I'm looking for a new job. I've been working at a company for about a year now.
+# For Training:
+```python
+from transformers import Trainer, TrainingArguments
+import torch
+
+class MambaTrainer(Trainer):
+    # **kwargs absorbs extra arguments (e.g. num_items_in_batch) that newer Trainer versions pass.
+    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
+        input_ids = inputs.pop("input_ids")
+        outputs = model(input_ids)
+        lm_logits = outputs[0]
+        labels = input_ids.to(lm_logits.device)
+        # Next-token prediction: logits at position t are scored against the token at t + 1.
+        shift_logits = lm_logits[:, :-1, :].contiguous()
+        labels = labels[:, 1:].contiguous()
+        loss_fct = torch.nn.CrossEntropyLoss()
+        lm_loss = loss_fct(shift_logits.view(-1, shift_logits.size(-1)), labels.view(-1))
+        return (lm_loss, outputs) if return_outputs else lm_loss
+```
+Use `MambaTrainer` in place of the stock `Trainer` for training, and keep `fp16` set to **False** in `TrainingArguments`.
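To make the shift inside `compute_loss` concrete, here is a minimal dependency-free sketch of the same idea: the scores at position `t` are compared against the token at position `t + 1`, and the per-position cross-entropy is averaged. The names `next_token_loss`, `toy_ids`, and `uniform_logits` are hypothetical and used only for illustration. This is plain Python standing in for the tensor code, not the model's actual implementation.

```python
import math

def next_token_loss(logits, token_ids):
    """Mean cross-entropy of logits[t] against token_ids[t + 1].

    logits: list of per-position score lists (one score per vocabulary entry).
    token_ids: list of integer token ids, same length as logits.
    """
    # Drop the last position's logits and the first token, mirroring the shift in compute_loss.
    shifted_logits = logits[:-1]
    targets = token_ids[1:]
    total = 0.0
    for scores, target in zip(shifted_logits, targets):
        # Numerically stable log-sum-exp: subtract the max before exponentiating.
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        total += log_norm - scores[target]
    return total / len(targets)

# Toy sequence over a 3-token vocabulary; uniform scores give a loss of ln(3).
toy_ids = [0, 2, 1]
uniform_logits = [[0.0, 0.0, 0.0] for _ in toy_ids]
loss = next_token_loss(uniform_logits, toy_ids)
print(round(loss, 4))  # 1.0986
```

A sequence of length `n` contributes only `n - 1` terms to the loss, because the final position has no next token to predict.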
 # Credits:
 https://huggingface.co/state-spaces