nicholasKluge committed
Commit 3f02a48
1 Parent(s): 121149e

Upload README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -38,7 +38,8 @@ inference:
 
 `Aira-Instruct-PT-124M` is an instruction-tuned GPT-style model based on [GPT-2](https://huggingface.co/pierreguillou/gpt2-small-portuguese). The model was trained on a dataset of `prompt`, `completions` pairs generated via the [Self-Instruct](https://github.com/yizhongw/self-instruct) framework. `Aira-Instruct-PT-124M` instruction-tuning was achieved via conditional text generation.
 
-The dataset used to train this model combines two main sources of data: the [`synthetic-instruct-gptj-pairwise`](https://huggingface.co/datasets/Dahoas/synthetic-instruct-gptj-pairwise) dataset and a subset of [Aira's](https://github.com/Nkluge-correa/Aira-EXPERT) fine-tuning dataset focused on Ethics, AI, AI safety, and related topics. The dataset is available in both Portuguese and English.
+The dataset used to train this model combines the following sources of data: the [`synthetic-instruct-gptj-pairwise`](https://huggingface.co/datasets/Dahoas/synthetic-instruct-gptj-pairwise) dataset, the [`databricks_dolly_15k`](https://huggingface.co/datasets/HuggingFaceH4/databricks_dolly_15k) dataset, the [`instruction-dataset`](https://huggingface.co/datasets/HuggingFaceH4/instruction-dataset) dataset, and a subset of [Aira's](https://github.com/Nkluge-correa/Aira-EXPERT) fine-tuning dataset, focused on Q&A related to Ethics, AI, AI safety, and other related topics. The dataset is available in both Portuguese and English.
+
 
 Check our gradio-demo in [Spaces](https://huggingface.co/spaces/nicholasKluge/Aira-Demo).
 
@@ -100,7 +101,6 @@ responses = aira.generate(**inputs,
 print(f"Question: 👤 {question}\n")
 
 for i, response in enumerate(responses):
-  # print only the response and remove the question
   print(f'Response {i+1}: 🤖 {tokenizer.decode(response, skip_special_tokens=True).replace(question, "")}')
 ```
 
@@ -137,4 +137,4 @@ The model will output something like:
 
 ## License
 
-The `Aira-Instruct-PT-124M` is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for more details.
+The `Aira-Instruct-PT-124M` is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for more details.
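For context, the loop in the second hunk presupposes a `tokenizer` and a model object named `aira` that are created earlier in the README (outside this diff). Below is a minimal, self-contained sketch of how those pieces plausibly fit together with 🤗 Transformers. The repo id `nicholasKluge/Aira-Instruct-PT-124M`, the prompt text, and the sampling parameters are assumptions for illustration, not settings confirmed by this commit:

```python
# Minimal usage sketch. Only `aira.generate(**inputs, ...)` and the decode
# loop appear in the diff above; everything else here is an assumption.
from transformers import AutoTokenizer, AutoModelForCausalLM

# Assumed repo id, inferred from the model-card name and the committer.
model_id = "nicholasKluge/Aira-Instruct-PT-124M"
tokenizer = AutoTokenizer.from_pretrained(model_id)
aira = AutoModelForCausalLM.from_pretrained(model_id)

question = "O que é aprendizado de máquina?"  # illustrative Portuguese prompt
inputs = tokenizer(question, return_tensors="pt")

# Illustrative sampling settings; the README's actual values may differ.
responses = aira.generate(**inputs,
                          do_sample=True,
                          top_k=50,
                          top_p=0.95,
                          max_new_tokens=200,
                          num_return_sequences=2)

print(f"Question: 👤 {question}\n")

for i, response in enumerate(responses):
    # Decode each sequence and strip the echoed question so only the
    # completion is printed, mirroring the loop shown in the README hunk.
    print(f'Response {i+1}: 🤖 {tokenizer.decode(response, skip_special_tokens=True).replace(question, "")}')
```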