reaperdoesntknow committed
Commit f47e179 · verified · 1 parent: 66e9d41

Update README.md

Files changed (1): README.md (+2, −0)
README.md CHANGED

@@ -6,6 +6,7 @@ datasets:
 - QingyiSi/Alpaca-CoT
 - HuggingFaceH4/MATH-500
 - zai-org/LongWriter-6k
+- m-a-p/DeepWriting-20K
 language:
 - en
 pipeline_tag: text-generation
@@ -66,6 +67,7 @@ Not intended: safety-critical use, heavy factual QA at web scale, or domains req
 - QingyiSi/Alpaca-CoT ~128K Tokens [2, 1024], [1, 2048] [4, 512]
 - HuggingFaceH4/MATH-500 ~256k Tokens, [8, 256] [4, 512]
 - zai-org/LongWriter-6k ~128k Tokens [2, 1024] [1, 2048]
+- SFT: prithivMLmods/Deepthink-Reasoning [8, 256] ~ Final Loss 0.3200 / Total Tokens 128512.0

 Training used modest token budgets (hundreds of thousands). Reported training logs showed healthy loss descent on both 512 and 1024 sequence lengths on CPU runs. Exact metrics will vary with tokenizer, preprocessing, and optimizer settings.
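The bracketed pairs in the dataset list appear to read as [batch size, sequence length]; a minimal sketch under that assumption (the function names are illustrative, not from the model card) of how those pairs relate to the stated token budgets:

```python
def tokens_per_step(batch_size: int, seq_len: int) -> int:
    # Each optimizer step consumes batch_size sequences of seq_len tokens.
    return batch_size * seq_len

def steps_for_budget(total_tokens: float, batch_size: int, seq_len: int) -> float:
    # Approximate number of steps needed to consume a given token budget.
    return total_tokens / tokens_per_step(batch_size, seq_len)

# [8, 256] -> 2048 tokens per step; the reported ~128,512-token SFT run
# would then correspond to roughly 63 steps.
print(tokens_per_step(8, 256))                      # 2048
print(round(steps_for_budget(128512.0, 8, 256)))    # 63
```

Read this way, the "hundreds of thousands" of tokens mentioned in the training note corresponds to on the order of a few hundred steps at these batch/sequence settings.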