zhenggq committed
Commit 7552945
1 Parent(s): d800ca0

Update README.md

Files changed (1)
  1. README.md +110 -14
README.md CHANGED
@@ -6,21 +6,35 @@ pipeline_tag: text-generation
 
  <!-- Provide a quick summary of what the model is/does. -->
 
- In Orca 2, we continue exploring how improved training signals can give smaller LMs enhanced reasoning abilities, typically
- found only in much larger models. We seek to teach small LMs to employ different solution
- strategies for different tasks, potentially different from the one used by the
- larger model. For example, while larger models might provide a direct answer
- to a complex task, smaller models may not have the same capacity. In Orca
- 2, we teach the model various reasoning techniques (step-by-step, recall
- then generate, recall-reason-generate, direct answer, etc.). More crucially,
- we aim to help the model learn to determine the most effective solution
- strategy for each task. Orca 2 models were trained by continual training of LLaMA-2 base models of the same size.
+ Orca 2 is a helpful assistant built for research purposes only. It provides single-turn responses
+ in tasks such as reasoning over user-given data, reading comprehension, math problem solving, and text summarization.
+ The model is designed to excel particularly at reasoning.
+ 
+ We open-source Orca 2 to encourage further research on the development, evaluation, and alignment of smaller LMs.
+ 
+ ## What are Orca 2's intended uses?
+ 
+ + Orca 2 is built for research purposes only.
+ + The main purpose is to allow the research community to assess its abilities and to provide a foundation for building better frontier models.
+ 
+ ## How was Orca 2 evaluated?
+ 
+ + Orca 2 has been evaluated on a large number of tasks ranging from reasoning to safety. Please refer to Sections 6 through 11 of the paper for details about the different evaluation experiments.
 
  ## Model Details
 
  Refer to LLaMA-2 for details on model architectures.
 
+ Orca 2 is a finetuned version of LLaMA-2. Its training data is a synthetic dataset that was created to enhance the small model's reasoning abilities. All synthetic training data was filtered using the Azure content filters.
+ More details about the model can be found at: LINK to Tech Report
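+ 
+ As a quick check, the LLaMA-2 architecture noted above can be confirmed from the model's Hugging Face configuration (a minimal sketch using `transformers`; the printed values are illustrative rather than quoted from the tech report):
+ 
+ ```python
+ from transformers import AutoConfig
+ 
+ # Orca 2 inherits its architecture from the LLaMA-2 base model it was finetuned from,
+ # so the config reports the "llama" model type and the usual LLaMA-2 dimensions.
+ config = AutoConfig.from_pretrained("microsoft/Orca-2-7b")
+ print(config.model_type)         # architecture family, e.g. "llama"
+ print(config.num_hidden_layers)  # number of transformer layers
+ print(config.hidden_size)        # hidden state dimension
+ ```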
+ 
+ ## License
+ 
+ The model is licensed under the Microsoft Research License.
+ 
+ Llama 2 is licensed under the LLAMA 2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.
+ 
 
  ## Uses
 
@@ -82,8 +96,90 @@ This model is solely designed for research settings, and its testing has only be
  out in such environments. It should not be used in downstream applications, as additional
  analysis is needed to assess potential harm or bias in the proposed application.
 
- ## How to Get Started with the Model
- 
- Use the code below to get started with the model.
- 
- [More Information Needed]
+ ## Getting started with Orca 2
+ 
+ **Safe inference with Azure AI Content Safety**
+ 
+ Using Azure AI Content Safety on top of model predictions is strongly encouraged
+ and can help prevent content harms. Azure AI Content Safety is a content moderation platform
+ that uses AI to keep your content safe. By integrating Orca 2 with Azure AI Content Safety,
+ we can moderate the model output by scanning it for sexual content, violence, hate, and
+ self-harm, with multiple severity levels and multi-lingual detection.
+ 
+ ```python
+ import os
+ import math
+ import transformers
+ import torch
+ 
+ from azure.ai.contentsafety import ContentSafetyClient
+ from azure.core.credentials import AzureKeyCredential
+ from azure.core.exceptions import HttpResponseError
+ from azure.ai.contentsafety.models import AnalyzeTextOptions
+ 
+ CONTENT_SAFETY_KEY = os.environ["CONTENT_SAFETY_KEY"]
+ CONTENT_SAFETY_ENDPOINT = os.environ["CONTENT_SAFETY_ENDPOINT"]
+ 
+ # We use Azure AI Content Safety to filter out any content that reaches the "Medium" threshold.
+ # For more information: https://learn.microsoft.com/en-us/azure/ai-services/content-safety/
+ def should_filter_out(input_text, threshold=4):
+     # Create a Content Safety client
+     client = ContentSafetyClient(CONTENT_SAFETY_ENDPOINT, AzureKeyCredential(CONTENT_SAFETY_KEY))
+ 
+     # Construct a request
+     request = AnalyzeTextOptions(text=input_text)
+ 
+     # Analyze text
+     try:
+         response = client.analyze_text(request)
+     except HttpResponseError as e:
+         print("Analyze text failed.")
+         if e.error:
+             print(f"Error code: {e.error.code}")
+             print(f"Error message: {e.error.message}")
+             raise
+         print(e)
+         raise
+ 
+     # Keep the highest severity score across the four harm categories
+     categories = ["hate_result", "self_harm_result", "sexual_result", "violence_result"]
+     max_score = -math.inf
+     for category in categories:
+         max_score = max(max_score, getattr(response, category).severity)
+ 
+     return max_score >= threshold
+ 
+ def run_inference(model_path, inputs):
+     device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
+     model = transformers.AutoModelForCausalLM.from_pretrained(model_path)
+     model.to(device)
+ 
+     tokenizer = transformers.AutoTokenizer.from_pretrained(
+         model_path,
+         model_max_length=4096,
+         padding_side="right",
+         use_fast=False,
+         add_special_tokens=False,
+     )
+     inputs = tokenizer(inputs, return_tensors='pt')
+     inputs = inputs.to(device)
+ 
+     # Greedy decoding (do_sample=False); then strip the prompt tokens so only
+     # the newly generated answer is decoded
+     output_ids = model.generate(inputs["input_ids"], max_length=4096, do_sample=False, use_cache=True)
+     sequence_length = inputs["input_ids"].shape[1]
+     new_output_ids = output_ids[:, sequence_length:]
+     answers = tokenizer.batch_decode(new_output_ids, skip_special_tokens=True)
+ 
+     return answers
+ 
+ model_path = 'microsoft/Orca-2-7b'
+ 
+ system_message = "You are Orca, an AI language model created by Microsoft. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior."
+ user_message = "\" \n :You can't just say, \"\"that's crap\"\" and remove it without gaining a consensus. You already know this, based on your block history. —/ \" \nIs the comment obscene? \nOptions : Yes, No."
+ 
+ # We use Chat Markup Language https://github.com/MicrosoftDocs/azure-docs/blob/main/articles/ai-services/openai/includes/chat-markup-language.md#working-with-chat-markup-language-chatml
+ prompt = f"<|im_start|>system\n{system_message}<|im_end|>\n<|im_start|>user\n{user_message}<|im_end|>\n<|im_start|>assistant"
+ 
+ answers = run_inference(model_path, prompt)
+ final_output = answers[0] if not should_filter_out(answers[0]) else "[Content Filtered]"
+ 
+ print(final_output)
+ ```
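+ 
+ Note that `threshold=4` in `should_filter_out` corresponds to the "Medium" severity level on Azure AI Content Safety's severity scale, matching the comment in the code; the same check can also be applied to user input before running inference.
+ 
+ On memory-constrained GPUs, the model-loading step above can be adapted to half precision (a sketch, assuming a CUDA device and the `accelerate` package are available; not part of the original snippet):
+ 
+ ```python
+ model = transformers.AutoModelForCausalLM.from_pretrained(
+     "microsoft/Orca-2-7b",
+     torch_dtype=torch.float16,  # halves memory for the 7B weights
+     device_map="auto",          # dispatches layers across available devices (needs `accelerate`)
+ )
+ ```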