loubnabnl HF staff committed on
Commit
7678306
1 Parent(s): f81388d
Files changed (1)
  1. datasets/codegen.txt +1 -1
datasets/codegen.txt CHANGED
@@ -11,7 +11,7 @@ The second and third datasets used the following preprocessing:
  - Exact match deduplication
  - Average line length < 100 tokens
  - Maximum line length < 1000 MB
- - >90% of the characters being decimal or hexadecimal digits
+ - Characters being decimal or hexadecimal digits >90%
 
  **Remark**:
  The reported data sizes are after preprocessing.
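
A minimal Python sketch of how the four listed preprocessing steps could be applied to a list of file contents. The source does not show the actual implementation, so everything here is an assumption: the function names `digit_fraction`, `keep_file`, and `preprocess` are hypothetical, line lengths are measured in characters (the listed units of "tokens" and "MB" are not reproduced), and the digit criterion is read as "drop files in which more than 90% of the non-whitespace characters are decimal or hexadecimal digits".

```python
import hashlib

# Characters counted as decimal or hexadecimal digits (assumption: both cases).
HEX_DIGITS = set("0123456789abcdefABCDEF")


def digit_fraction(text: str) -> float:
    """Fraction of non-whitespace characters that are decimal/hex digits."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c in HEX_DIGITS for c in chars) / len(chars)


def keep_file(
    text: str,
    max_avg_line_len: int = 100,      # assumed unit: characters
    max_line_len: int = 1000,         # assumed unit: characters
    max_digit_fraction: float = 0.9,  # assumed: files above this are dropped
) -> bool:
    """Return True if the file passes all three line/digit filters."""
    lines = text.splitlines()
    if not lines:
        return False
    lengths = [len(line) for line in lines]
    if sum(lengths) / len(lengths) >= max_avg_line_len:
        return False
    if max(lengths) >= max_line_len:
        return False
    if digit_fraction(text) > max_digit_fraction:
        return False
    return True


def preprocess(files: list[str]) -> list[str]:
    """Exact-match deduplication followed by the per-file filters."""
    seen: set[str] = set()
    kept: list[str] = []
    for text in files:
        # Hash the full content so only byte-identical files are duplicates.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        if keep_file(text):
            kept.append(text)
    return kept
```

Hashing the full content keeps the deduplication strictly exact-match; near-duplicate detection would need a different technique and is not part of the list above.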