Text Generation
Transformers
PyTorch
English
llama
sft
Inference Endpoints
text-generation-inference
andreaskoepf committed
Commit 41d97e2 (1 parent: bd00b25)

move dataset composition info

Files changed (1)
  1. README.md +12 -9
README.md CHANGED
@@ -68,6 +68,18 @@ This model was trained on:
 - [shahules786/orca-chat](https://huggingface.co/datasets/shahules786/orca-chat)
 - [togethercomputer/RedPajama-Data-1T-Sample](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T)
 - [atom-in-the-universe/fanfics-10k-50k](https://huggingface.co/datasets/atom-in-the-universe/fanfics-10k-50k)
+
+```
+Dataset Composition:
+  Train (sampled):
+    orca-chat: 188842 (100%)
+    fanfics: 47760 (100%)
+    red_pajama: 188262 (25%)
+  Valid:
+    orca-chat: 5000
+    fanfics: 1000
+    red_pajama: 1000
+```
 
 The dataset [shahules786/orca-chat](https://huggingface.co/datasets/shahules786/orca-chat) combines similar examples of the GPT-4 subset of [ehartford/dolphin](https://huggingface.co/datasets/ehartford/dolphin) to form longer conversations
 to improve long-context training.
@@ -105,15 +117,6 @@ llama2_13b_orca_8k:
     type: linear
     scale: 2
   datasets:
-    # Dataset Composition:
-    #   Tain (sampled):
-    #     orca-chat: 100.00% (188842)
-    #     fanfics: 100.00% (47760)
-    #     red_pajama: 25.00% (188262)
-    #   Valid:
-    #     orca-chat: 5000 (71.43%)
-    #     fanfics: 1000 (14.29%)
-    #     red_pajama: 1000 (14.29%)
   - orca-chat:
       max_val_set: 5000
   - fanfics:
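
The `type: linear` / `scale: 2` settings in the config context above most likely refer to linear RoPE position scaling, which is presumably how this 8k-context model (`llama2_13b_orca_8k`) extends the 4k-pretrained Llama-2 base. A minimal sketch of the idea, assuming that interpretation (function names here are illustrative, not the actual training code's API):

```python
import math

def rope_angle(position, pair_index, dim=128, base=10000.0, scale=1.0):
    """Rotation angle for one (cos, sin) pair of a rotary embedding.

    Linear scaling divides positions by `scale`, so with scale=2 an
    8192-token window is interpolated into the 0..4096 position range
    the base model was pretrained on.
    """
    inv_freq = base ** (-2.0 * pair_index / dim)  # standard RoPE frequency
    return (position / scale) * inv_freq

# Token 8191 at scale=2 lands on the same angle as position 4095.5 unscaled:
assert math.isclose(rope_angle(8191, 0, scale=2.0), rope_angle(4095.5, 0))
```

The attention pattern learned at 4k positions is thus reused rather than extrapolated, at the cost of slightly compressed positional resolution.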