Tawkat committed on
Commit
ed46aad
1 Parent(s): 667a76d

Update README.md

  1. README.md +6 -6
README.md CHANGED
@@ -199,11 +199,11 @@ The intended use of GreenLLaMA is for the detoxification tasks. We aim to help r
 
  ### Limitations
 
- * Data Generation Process
+ * **Data Generation Process:**
  This work uses ChatGPT, a gpt-3.5-turbo version from June, 2023. Since the model can be updated on a regular interval, the data generation process should be treated accordingly.
- * Data Quality
+ * **Data Quality:**
  GreenLLaMA proposes an automated data generation pipeline to create a pseudo-parallel cross-platform corpus. The synthetic data generation process involves multi-stage data processing without the necessity of direct human inspection. Although this automated pipeline makes the overall data generation process scalable, it comes at the risk of allowing low-quality data in our cross-platform corpus. Hence, human inspection is recommended to remove any sort of potential vulnerability and maintain a standard quality of the corpus.
- * Model Responses
+ * **Model Responses:**
  Although GreenLLaMA exhibits impressive ability in generating detoxified responses, we believe there is still room for improvement for the model in terms of producing meaning-preserved detoxified outcomes. Moreover, the models can sometimes be vulnerable to implicit, adversarial tokens and continue to produce toxic content. Therefore, we recommend that GreenLLaMA should be couched with caution before deployment.
 
  ### Ethical Considerations and Risks
@@ -211,9 +211,9 @@ The intended use of GreenLLaMA is for the detoxification tasks. We aim to help r
  The development of large language models (LLMs) raises several ethical concerns.
  In creating an open model, we have carefully considered the following:
 
- * Data Collection and Release
- We compile datasets from a wide range of platforms. To ensure proper credit assignment, we refer users to the original publications in our paper. We create the cross-platform detoxification corpus for academic research purposes. We intend to share the corpus. We would also like to mention that some content are generated using GPT-4 for illustration purposes.
+ * **Data Collection and Release:**
+ We compile datasets from a wide range of platforms. To ensure proper credit assignment, we refer users to the original publications in our paper. We create the cross-platform detoxification corpus for academic research purposes. We intend to share the corpus in the future. We would also like to mention that some content are generated using GPT-4 for illustration purposes.
- * Potential Misuse and Bias
+ * **Potential Misuse and Bias:**
  GreenLLaMA can potentially be misused to generate toxic and biased content. For these reasons, we recommend that GreenLLaMA not be used in applications without careful prior consideration of potential misuse and bias.
 
  ## Citation