gugarosa committed
Commit 25a6b01
Parent: ca30dd4

Update README.md

Files changed (1):
  1. README.md +24 -5
README.md CHANGED
@@ -17,7 +17,6 @@ Phi-2 is a Transformer with **2.7 billion** parameters. It was trained using the
 
 Our model hasn't been fine-tuned through reinforcement learning from human feedback. The intention behind crafting this open-source model is to provide the research community with a non-restricted small model to explore vital safety challenges, such as reducing toxicity, understanding societal biases, enhancing controllability, and more.
 
-
 ## Intended Uses
 
 Phi-2 is intended for research purposes only. Given the nature of the training data, the Phi-2 model is best suited for prompts using the QA format, the chat format, and the code format.
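
For readers of this hunk in isolation, the three prompt formats mentioned above look roughly as follows. The QA and chat strings below are illustrative assumptions, not part of this commit; the code prompt is the `print_prime` example used later in this README:

```python
# QA format: a single instruction; the model completes after "Output:".
qa_prompt = "Instruct: Write a short explanation of prime numbers.\nOutput:"

# Chat format: alternating speaker turns; the model continues the last turn.
chat_prompt = "Alice: I'm struggling to stay focused while studying. Any suggestions?\nBob:"

# Code format: a function header plus docstring; the model writes the body.
code_prompt = '''def print_prime(n):
    """
    Print all primes between 1 and n
    """'''
```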
@@ -69,13 +68,34 @@ def print_prime(n):
 ```
 where the model generates the text after the comments.
 
-**Notes**
+**Notes:**
 * Phi-2 is intended for research purposes. The model-generated text/code should be treated as a starting point rather than a definitive solution for potential use cases. Users should be cautious when employing these models in their applications.
 * Direct adoption for production tasks is out of the scope of this research project. As a result, the Phi-2 model has not been tested to ensure that it performs adequately for any production-level application. Please refer to the limitation sections of this document for more details.
 * If you are using `transformers>=4.36.0`, always load the model with `trust_remote_code=True` to prevent side-effects.
 
 ## Sample Code
 
+There are four execution modes:
+
+1. FP16 / Flash-Attention / CUDA:
+```python
+model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", flash_attn=True, flash_rotary=True, fused_dense=True, device_map="cuda", trust_remote_code=True)
+```
+2. FP16 / CUDA:
+```python
+model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", device_map="cuda", trust_remote_code=True)
+```
+3. FP32 / CUDA:
+```python
+model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype=torch.float32, device_map="cuda", trust_remote_code=True)
+```
+4. FP32 / CPU:
+```python
+model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype=torch.float32, device_map="cpu", trust_remote_code=True)
+```
+
+To ensure maximum compatibility, we recommend using the second execution mode (FP16 / CUDA), as follows:
+
 ```python
 import torch
 from transformers import AutoModelForCausalLM, AutoTokenizer
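
As a complement to the four loader calls added above, here is a minimal end-to-end sketch of mode 4 (FP32 / CPU), the fallback when no CUDA GPU is available. The loader arguments come from the list above; the prompt string and `max_length` are illustrative assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Mode 4 (FP32 / CPU): the slowest option, but it requires no GPU.
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    torch_dtype=torch.float32,
    device_map="cpu",
    trust_remote_code=True,  # required for transformers>=4.36.0 (see Notes above)
)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

# Illustrative QA-format prompt (an assumed example, not from the model card).
inputs = tokenizer("Instruct: Explain why the sky is blue.\nOutput:",
                   return_tensors="pt", return_attention_mask=False)
outputs = model.generate(**inputs, max_length=120)
print(tokenizer.batch_decode(outputs)[0])
```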
@@ -85,8 +105,7 @@ torch.set_default_device("cuda")
 model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True)
 tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)
 
-inputs = tokenizer('''```python
-def print_prime(n):
+inputs = tokenizer('''def print_prime(n):
     """
     Print all primes between 1 and n
     """''', return_tensors="pt", return_attention_mask=False)
@@ -96,7 +115,7 @@ text = tokenizer.batch_decode(outputs)[0]
 print(text)
 ```
 
-**Remark.** In the generation function, our model currently does not support beam search (`num_beams > 1`).
+**Remark:** In the generation function, our model currently does not support beam search (`num_beams > 1`).
 Furthermore, in the forward pass of the model, we currently do not support outputting hidden states or attention values, or using custom input embeddings.
 
 ## Limitations of Phi-2
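
Putting the hunks together, the Sample Code section after this commit reads as a single runnable script. The generation call below is not shown in this diff (it sits in an unchanged part of the README), so its exact arguments are an assumption here; per the remark above, `num_beams` stays at its default of 1:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

torch.set_default_device("cuda")

model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2", trust_remote_code=True)

inputs = tokenizer('''def print_prime(n):
    """
    Print all primes between 1 and n
    """''', return_tensors="pt", return_attention_mask=False)

# Assumed generation call (not part of the diff). Beam search (num_beams > 1)
# is unsupported, so this relies on the default greedy decoding.
outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)
```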
 