---
pipeline_tag: text-generation
---

# Orca 2

<!-- Provide a quick summary of what the model is/does. -->

In Orca 2, we continue exploring how improved training signals can give smaller LMs the enhanced reasoning abilities typically found only in much larger models. We seek to teach small LMs to employ different solution strategies for different tasks, potentially different from the one used by the larger model. For example, while larger models might provide a direct answer to a complex task, smaller models may not have the same capacity. In Orca 2, we teach the model various reasoning techniques (step-by-step, recall then generate, recall-reason-generate, direct answer, etc.). More crucially, we aim to help the model learn to determine the most effective solution strategy for each task. Orca 2 models were created by continually training LLaMA-2 base models of the same size.
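
To make this concrete, the sketch below shows how a system message might steer the model toward a particular solution strategy at inference time. The ChatML-style tags and the wording of the instructions are illustrative assumptions, not a format specified in this card.

```python
# Illustrative sketch: the ChatML-style tags below are an assumed prompt
# format, not one specified in this card. The system message is what nudges
# the model toward a particular solution strategy (here, step-by-step
# reasoning rather than a direct answer).
system_message = (
    "You are Orca, an AI language model. Reason through the problem "
    "step by step before giving your final answer."
)
user_message = "If a train travels 120 km in 1.5 hours, what is its average speed?"

prompt = (
    f"<|im_start|>system\n{system_message}<|im_end|>\n"
    f"<|im_start|>user\n{user_message}<|im_end|>\n"
    f"<|im_start|>assistant"
)
```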

## Model Details

Refer to the LLaMA-2 technical report for details on the model architecture.

## Uses

## Bias, Risks, and Limitations

Orca 2, built upon the LLaMA 2 model family, retains many of its limitations, as well as the common limitations of other large language models and limitations arising from its training process, including:

**Data Biases**: Large language models, trained on extensive data, can inadvertently carry biases present in the source data. Consequently, the models may generate outputs that are potentially biased or unfair.

**Lack of Contextual Understanding**: Despite their impressive capabilities in language understanding and generation, these models exhibit limited real-world understanding, resulting in potential inaccuracies or nonsensical responses.

**Lack of Transparency**: Due to their complexity and size, large language models can act as “black boxes”, making it difficult to comprehend the rationale behind specific outputs or decisions. We recommend reviewing the transparency notes from Azure for more information.

**Content Harms**: There are various types of content harms that large language models can cause. It is important to be aware of them when using these models and to take action to prevent them. We recommend leveraging the content moderation services provided by various companies and institutions. On an important note, we hope for better regulations and standards from governments and technology leaders around content harms for AI technologies in the future. We value and acknowledge the important role that the research and open-source communities can play in this direction.

**Hallucination**: It is important to be cautious and avoid relying entirely on a language model for critical decisions or information that might have a deep impact, as it is not obvious how to prevent these models from fabricating content. Moreover, it is not clear whether smaller models are more susceptible to hallucination in ungrounded generation use cases due to their smaller size and hence reduced memorization capacity. This is an active research topic, and we hope for more rigorous measurement, understanding, and mitigation of it.

**Potential for Misuse**: Without suitable safeguards, there is a risk that these models could be maliciously used for generating disinformation or harmful content.

**Data Distribution**: Orca 2’s performance is likely to correlate strongly with the distribution of the tuning data. This correlation might limit its accuracy in areas underrepresented in the training dataset, such as math, coding, and reasoning.

**System messages**: Orca 2’s performance varies with the system instructions. Additionally, the stochasticity introduced by the model size may lead to non-deterministic responses to different system instructions.

**Zero-Shot Settings**: Orca 2 was trained on data that mostly simulates zero-shot settings. While the model demonstrates very strong performance in zero-shot settings, it does not show the same gains from few-shot learning as other, especially larger, models.

**Synthetic data**: As Orca 2 is trained on synthetic data, it could inherit both the advantages and shortcomings of the models and methods used for data generation. We posit that Orca 2 benefits from the safety measures incorporated during training and the safety guardrails (e.g., content filters) within the Azure OpenAI API. However, detailed studies are required to better quantify such risks.

This model is solely designed for research settings, and its testing has only been carried out in such environments. It should not be used in downstream applications, as additional analysis is needed to assess potential harm or bias in the proposed application.

## How to Get Started with the Model

Use the code below to get started with the model.
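
The following is a minimal sketch using the Hugging Face `transformers` library. It assumes the weights are hosted on the Hugging Face Hub; the `microsoft/Orca-2-13b` model ID and the ChatML-style prompt format are assumptions based on publicly released Orca 2 checkpoints, so substitute the ID of the checkpoint you actually use.

```python
import torch
import transformers

# Assumed Hub ID; replace with the Orca 2 checkpoint you are using.
model_id = "microsoft/Orca-2-13b"

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # place layers on available GPU(s)/CPU
    torch_dtype=torch.float16,  # half precision to reduce memory use
)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id, use_fast=False)

system_message = (
    "You are Orca, an AI language model created by Microsoft. "
    "You are a cautious assistant and you carefully follow instructions."
)
user_message = "How can I determine whether a news article is trustworthy?"

# Assumed ChatML-style prompt format (see the sketch above).
prompt = (
    f"<|im_start|>system\n{system_message}<|im_end|>\n"
    f"<|im_start|>user\n{user_message}<|im_end|>\n"
    f"<|im_start|>assistant"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(inputs["input_ids"], max_new_tokens=256)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```

As noted above, this model is intended for research settings only; any downstream use would require additional safety and bias analysis.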