suriyagunasekar committed
Commit 9c0466b
1 Parent(s): 9691b01

Update README.md

Files changed (1)
  1. README.md +8 -16
README.md CHANGED
@@ -8,10 +8,10 @@ tags:
8
  ---
9
  ## Model Summary
10
 
11
- The phi-1 family encompasses three distinct models: phi-1, phi-1-base, and phi-1-small, each specialized for basic Python coding. Their training involved a variety of data sources, including subsets of Python codes from [The Stack v1.2](https://huggingface.co/datasets/bigcode/the-stack), Q&A content from [StackOverflow](https://archive.org/download/stackexchange), competition code from [code_contests](https://github.com/deepmind/code_contests), and synthetic Python textbooks and exercises generated by [gpt-3.5-turbo-0301](https://platform.openai.com/docs/models/gpt-3-5). Even though the model and the datasets are relatively small compared to contemporary Large Language Models (LLMs), both phi-1 and phi-1-small have demonstrated an impressive accuracy rate exceeding 45% on the simple Python coding benchmark, HumanEval.
12
 
13
  ## Intended Uses
14
- Given the nature of the training data, the phi-1 and phi-1-small models are best suited for prompts using the code format:
15
 
16
  #### code format:
17
  ```python
@@ -29,8 +29,8 @@ def print_prime(n):
29
  where the model generates the code after the comments. (Note: This is a legitimate and correct use of the else statement in Python loops.)
30
 
31
  **Notes**
32
- * The phi-1 family models are intended for research purposes. The model-generated code should be treated as a starting point rather than a definitive solution for potential use cases. Users should be cautious when employing these models in their applications.
33
- * Direct adoption for production coding tasks is out of the scope of this research project. As a result, the phi-1 family models have not been tested to ensure that they perform adequately for production-level code. Please refer to the limitation sections of this document for more details.
34
 
35
  ## Limitations of phi-1
36
 
@@ -42,7 +42,7 @@ where the model generates the code after the comments. (Note: This is a legitima
42
  * Potential Biases: The phi-1 family models, like other AI models, are trained on web and synthetic data. This data can contain biases and errors that might affect the AI's performance. Biases could stem from various sources like unbalanced representation, stereotypes, or controversial opinions present in the training data. As a result, the AI model might sometimes generate responses that reflect these biases or errors.
43
 
44
  ## Warning about Security Risks
45
- When leveraging the phi-1 family models, it's paramount to be vigilant. These models, though powerful, can inadvertently introduce security vulnerabilities in the generated code. Examples include, but are not limited to:
46
 
47
  * Directory Traversal: The code might fail to implement safe checks against directory traversal attacks, potentially allowing unauthorized access to sensitive files on your system.
48
  * Injection Attacks: There could be lapses in escaping strings properly, making the application susceptible to SQL, OS commands, or other injection attacks.
@@ -54,21 +54,13 @@ When leveraging the phi-1 family models, it's paramount to be vigilant. These mo
54
  Given these potential pitfalls, and others not explicitly mentioned, it's essential to thoroughly review, test, and verify the generated code before deploying it in any application, especially those that are security-sensitive. Always consult with security experts or perform rigorous penetration testing when in doubt.
55
 
56
  ## Training
57
- ### Model (phi-1-base)
58
- * Architecture: a Transformer-based model with next-word prediction objective
59
- * Pretraining steps: 24000 step
60
- * Pretraining tokens: 51B tokens
61
- * Precision: fp16
62
- * GPUs: 8 A100
63
- * Training time: 4 days
64
-
65
  ### Model (phi-1)
66
  * Architecture: a Transformer-based model with next-word prediction objective
67
- * Fine-tuning steps: 6000 step
68
- * Fine-tuning tokens: 3B tokens
69
  * Precision: fp16
70
  * GPUs: 8 A100
71
- * Training time: 7 hours
72
 
73
  ### Software
74
  * [PyTorch](https://github.com/pytorch/pytorch)
 
8
  ---
9
  ## Model Summary
10
 
11
+ phi-1 is a language model with 1.3 billion parameters, specialized for basic Python coding. Its training involved a variety of data sources, including subsets of Python code from [The Stack v1.2](https://huggingface.co/datasets/bigcode/the-stack), Q&A content from [StackOverflow](https://archive.org/download/stackexchange), competition code from [code_contests](https://github.com/deepmind/code_contests), and synthetic Python textbooks and exercises generated by [gpt-3.5-turbo-0301](https://platform.openai.com/docs/models/gpt-3-5). Even though the model and the datasets are relatively small compared to contemporary Large Language Models (LLMs), phi-1 has demonstrated an accuracy rate exceeding 45% on the simple Python coding benchmark, HumanEval.
12
 
13
  ## Intended Uses
14
+ Given the nature of the training data, the phi-1 model is best suited for prompts using the code format:
15
 
16
  #### code format:
17
  ```python
 
29
  where the model generates the code after the comments. (Note: This is a legitimate and correct use of the else statement in Python loops.)
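
The note above refers to Python's loop `else` clause, which runs only when the loop finishes without hitting `break`. The README's own example is truncated in this diff view, so the following is an independent minimal sketch; the function name `primes_up_to` is hypothetical and not taken from the README.

```python
def primes_up_to(n):
    """Return all primes in [2, n] using trial division and a loop `else`."""
    primes = []
    for candidate in range(2, n + 1):
        for divisor in range(2, int(candidate ** 0.5) + 1):
            if candidate % divisor == 0:
                break  # a divisor was found, so the `else` clause is skipped
        else:
            # Reached only when the inner loop ended without `break`,
            # i.e. no divisor was found and `candidate` is prime.
            primes.append(candidate)
    return primes


print(primes_up_to(20))  # [2, 3, 5, 7, 11, 13, 17, 19]
```

Without the `else` clause, the same logic would need an explicit flag variable to record whether a divisor had been found.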
30
 
31
  **Notes**
32
+ * The phi-1 model is intended for research purposes. The model-generated code should be treated as a starting point rather than a definitive solution for potential use cases. Users should be cautious when employing this model in their applications.
33
+ * Direct adoption for production coding tasks is out of the scope of this research project. As a result, the phi-1 model has not been tested to ensure that it performs adequately for production-level code. Please refer to the limitations section of this document for more details.
34
 
35
  ## Limitations of phi-1
36
 
 
42
  * Potential Biases: The phi-1 family models, like other AI models, are trained on web and synthetic data. This data can contain biases and errors that might affect the AI's performance. Biases could stem from various sources like unbalanced representation, stereotypes, or controversial opinions present in the training data. As a result, the AI model might sometimes generate responses that reflect these biases or errors.
43
 
44
  ## Warning about Security Risks
45
+ When leveraging the phi-1 model, it's paramount to be vigilant. The model, though powerful, can inadvertently introduce security vulnerabilities in the generated code. Examples include, but are not limited to:
46
 
47
  * Directory Traversal: The code might fail to implement safe checks against directory traversal attacks, potentially allowing unauthorized access to sensitive files on your system.
48
  * Injection Attacks: There could be lapses in escaping strings properly, making the application susceptible to SQL, OS commands, or other injection attacks.
 
54
  Given these potential pitfalls, and others not explicitly mentioned, it's essential to thoroughly review, test, and verify the generated code before deploying it in any application, especially those that are security-sensitive. Always consult with security experts or perform rigorous penetration testing when in doubt.
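
As an illustration of the kind of checks a reviewer might look for in generated code, here is a minimal sketch of two common mitigations: confining a user-supplied path to a base directory, and binding SQL parameters instead of formatting them into the query string. The helper names `resolve_inside` and `find_user`, and the `users` table, are assumptions made for this example only; they are not part of phi-1 or its README, and the sketch is not a complete defense.

```python
import sqlite3
from pathlib import Path


def resolve_inside(base_dir, user_path):
    """Resolve user_path under base_dir, rejecting paths that escape it."""
    base = Path(base_dir).resolve()
    target = (base / user_path).resolve()
    try:
        # relative_to raises ValueError when target is not under base,
        # which covers both "../" sequences and absolute paths.
        target.relative_to(base)
    except ValueError:
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return target


def find_user(conn, name):
    """Look up a user by name with a bound parameter (no string formatting)."""
    # The "?" placeholder makes the driver treat `name` as data, not SQL text.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
    print(find_user(conn, "alice"))
    print(find_user(conn, "alice' OR '1'='1"))  # treated as a literal name
```

Generated code that builds file paths or queries by plain string concatenation is worth flagging during review.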
55
 
56
  ## Training
 
 
 
 
 
 
 
 
57
  ### Model (phi-1)
58
  * Architecture: a Transformer-based model with next-word prediction objective
59
+ * Training steps: ~24000 steps
60
+ * Training tokens: ~51B tokens
61
  * Precision: fp16
62
  * GPUs: 8 A100
63
+ * Training time: 4 days
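
The figures listed above allow a rough back-of-envelope estimate of tokens per step and total GPU-hours. The sketch below only rearranges the approximate numbers given in the list; the README does not state batch size, sequence length, or measured throughput, so the derived values are estimates rather than reported results.

```python
# Approximate values taken from the training list above.
TRAINING_STEPS = 24_000
TRAINING_TOKENS = 51e9
NUM_GPUS = 8
TRAINING_DAYS = 4

# Average number of tokens consumed per optimizer step.
tokens_per_step = TRAINING_TOKENS / TRAINING_STEPS

# Total GPU-hours, assuming all 8 GPUs ran for the full 4 days.
gpu_hours = NUM_GPUS * TRAINING_DAYS * 24

# Implied average throughput per GPU under the same assumption.
tokens_per_gpu_second = TRAINING_TOKENS / (gpu_hours * 3600)

print(f"tokens per step: {tokens_per_step:,.0f}")  # 2,125,000
print(f"GPU-hours: {gpu_hours}")                    # 768
print(f"tokens per GPU per second: {tokens_per_gpu_second:,.0f}")
```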
64
 
65
  ### Software
66
  * [PyTorch](https://github.com/pytorch/pytorch)