aari1995 committed
Commit 38e1149
1 Parent(s): 5555ddd

Update README.md

Files changed (1):
  1. README.md +37 -0
README.md CHANGED
@@ -160,6 +160,43 @@ prompt = "Schreibe eine Stellenanzeige für Data Scientist bei AXA!"
  final_prompt = prompt_template.format(prompt=prompt)
  ```
 
+#### Limit the model to reply-only output
+To achieve this, implement a custom stopping criterion:
+
+```python
+from transformers import StoppingCriteria
+
+class GermeoStoppingCriteria(StoppingCriteria):
+    def __init__(self, target_sequence, prompt):
+        self.target_sequence = target_sequence
+        self.prompt = prompt
+
+    def __call__(self, input_ids, scores, **kwargs):
+        # Decode the tokens generated so far and strip the prompt
+        generated_text = tokenizer.decode(input_ids[0])
+        generated_text = generated_text.replace(self.prompt, '')
+        # Check whether the target sequence appears in the generated text
+        if self.target_sequence in generated_text:
+            return True  # Stop generation
+        return False     # Continue generation
+
+    # __len__ and __iter__ let a single instance be used like a
+    # StoppingCriteriaList
+    def __len__(self):
+        return 1
+
+    def __iter__(self):
+        yield self
+```
+The criterion takes the target sequence to stop on, here the `<|im_end|>` token, and your input prompt (formatted exactly as it is passed to the model). Simply add it to the generation call:
+
+```python
+generation_output = model.generate(
+    tokens,
+    streamer=streamer,
+    max_new_tokens=1012,
+    stopping_criteria=GermeoStoppingCriteria("<|im_end|>", prompt_template.format(prompt=prompt)),
+)
+```
+
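The reason the prompt is stripped before the check: a ChatML-style prompt already contains `<|im_end|>` after the user turn, so matching against the full decoded text would trigger an immediate false stop. A minimal stdlib-only sketch of that check (the `should_stop` helper and the sample strings are illustrative, not part of the model card):

```python
TARGET = "<|im_end|>"

# A ChatML-style prompt already contains the target sequence after the
# user turn (example prompt for illustration only):
prompt = "<|im_start|>user\nSchreibe eine Stellenanzeige!<|im_end|>\n<|im_start|>assistant\n"

def should_stop(decoded_so_far: str, prompt: str, target: str) -> bool:
    # Strip the prompt first; otherwise the <|im_end|> inside the
    # prompt itself would stop generation immediately.
    reply_only = decoded_so_far.replace(prompt, "")
    return target in reply_only

# Right after generation starts, only the prompt has been decoded:
assert should_stop(prompt, prompt, TARGET) is False

# Once the model emits its own <|im_end|>, generation stops:
assert should_stop(prompt + "Hier ist die Anzeige ...<|im_end|>", prompt, TARGET) is True
```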
 ### German benchmarks
 
 | **German tasks:** | **MMLU-DE** | **Hellaswag-DE** | **ARC-DE** | **Average** |