gugarosa committed on
Commit
654b690
1 Parent(s): eac5218

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -11,7 +11,7 @@ tags:
---
## Model Summary

- The language model Phi-1 is a Transformer with 1.3 billion parameters, specialized for basic Python coding. Its training involved a variety of data sources, including subsets of Python codes from [The Stack v1.2](https://huggingface.co/datasets/bigcode/the-stack), Q&A content from [StackOverflow](https://archive.org/download/stackexchange), competition code from [code_contests](https://github.com/deepmind/code_contests), and synthetic Python textbooks and exercises generated by [gpt-3.5-turbo-0301](https://platform.openai.com/docs/models/gpt-3-5). Even though the model and the datasets are relatively small compared to contemporary Large Language Models (LLMs), phi-1 has demonstrated an impressive accuracy rate exceeding 50% on the simple Python coding benchmark, HumanEval.
+ The language model Phi-1 is a Transformer with 1.3 billion parameters, specialized for basic Python coding. Its training involved a variety of data sources, including subsets of Python codes from [The Stack v1.2](https://huggingface.co/datasets/bigcode/the-stack), Q&A content from [StackOverflow](https://archive.org/download/stackexchange), competition code from [code_contests](https://github.com/deepmind/code_contests), and synthetic Python textbooks and exercises generated by [gpt-3.5-turbo-0301](https://platform.openai.com/docs/models/gpt-3-5). Even though the model and the datasets are relatively small compared to contemporary Large Language Models (LLMs), Phi-1 has demonstrated an impressive accuracy rate exceeding 50% on the simple Python coding benchmark, HumanEval.

## Intended Uses
Given the nature of the training data, Phi-1 is best suited for prompts using the code format:
@@ -37,6 +37,7 @@ where the model generates the code after the comments. (Note: This is a legitima
* If you are using `transformers>=4.36.0`, always load the model with `trust_remote_code=True` to prevent side-effects.

## Sample Code
+
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -56,10 +57,9 @@ text = tokenizer.batch_decode(outputs)[0]
print(text)
```

- **Remark.** In the generation function, our model currently does not support beam search (`num_beams` >1).
+ **Remark.** In the generation function, our model currently does not support beam search (`num_beams > 1`).
Furthermore, in the forward pass of the model, we currently do not support outputting hidden states or attention values, or using custom input embeddings.

-
## Limitations of Phi-1

* Limited Scope: 99.8% of the Python scripts in our fine-tuning dataset use only the packages "typing, math, random, collections, datetime, itertools". If the model generates Python scripts that utilize other packages, we strongly recommend users manually verify all API uses.
@@ -93,7 +93,7 @@ Given these potential pitfalls, and others not explicitly mentioned, it's essent
### Software
* [PyTorch](https://github.com/pytorch/pytorch)
* [DeepSpeed](https://github.com/microsoft/DeepSpeed)
- * [flash-attention](https://github.com/HazyResearch/flash-attention)
+ * [Flash-Attention](https://github.com/HazyResearch/flash-attention)

### License
The model is licensed under the [Research License](https://huggingface.co/microsoft/phi-1/resolve/main/Research%20License.docx).
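The "Intended Uses" context in the first hunk refers to a code-format prompt, but the card's own example falls outside the lines shown here. As a rough, illustrative sketch (the function name and docstring below are stand-ins, not taken from the card), such a prompt is simply a partial Python definition whose docstring states the task, which the model is then expected to complete:

```python
# Hypothetical code-format prompt: a signature plus a docstring describing the task.
# Phi-1 would be asked to generate the implementation that follows this stub.
def print_prime(n):
    """
    Print all primes between 1 and n
    """
```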
 
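The Sample Code hunks only show the first and last lines of the card's snippet. A minimal end-to-end sketch consistent with those edges, the `trust_remote_code=True` note, and the beam-search remark might look like the following; the prompt text, dtype, and `max_new_tokens` value are assumptions, not taken from the card:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint id; loaded with trust_remote_code=True as the card advises
# for transformers>=4.36.0. The float32 dtype is an assumption for CPU-friendly use.
model_id = "microsoft/phi-1"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float32, trust_remote_code=True
)

# Code-format prompt: a stub the model should complete (illustrative, as sketched above).
prompt = 'def print_prime(n):\n    """\n    Print all primes between 1 and n\n    """\n'
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding only: per the remark, beam search (num_beams > 1) is not supported.
outputs = model.generate(**inputs, max_new_tokens=200)

text = tokenizer.batch_decode(outputs)[0]
print(text)
```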