m3hrdadfi committed on
Commit
a90e731
1 Parent(s): 74e88fc

Add normalization steps

  1. README.md +6 -2
README.md CHANGED
@@ -34,7 +34,11 @@ python create_config.py --name_or_path gpt2-medium --params '{"vocab_size": 4200
  Steps:
 
  - [ ] Remove stretched words such as ســــــــــلام
+
  - [ ] Remove links, user-mentioning (such as @jane_doe)
- - [ ] Remove Telegram, Instagram advertisements, or posts (whole record)
+
+ - [ ] Remove Telegram, Instagram advertisements, or posts (a whole record)
+
  - [ ] Remove advertisement records
- - [ ] Remove separated words (or the whole record) which are showed up as an individual record, while they are just the tags at the end of the post (such as بلاب ... بلاب ... ورزشی، خبری، سیاسی، اجتماعی، خانوده)
+
+ - [ ] Remove separated words (or the whole record) which are showing up as an individual record, while they are just the tags at the end of the post (such as بلاب ... بلاب ... ورزشی، خبری، سیاسی، اجتماعی، خانوده)
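
The first three checklist items could be sketched roughly as below. This is a minimal illustration, not the repository's implementation. The function name `normalize_record`, the regular expressions, and the list of advertisement markers are all assumptions. It also reads "stretched words" as words elongated with the tatweel character (U+0640) and un-stretches them rather than deleting them. The tag-only record step is not covered.

```python
import re

# Tatweel (kashida), the character commonly used to stretch Persian/Arabic words.
TATWEEL = "\u0640"

# Assumed patterns; a real pipeline would tune these against the corpus.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
# Hypothetical markers for Telegram/Instagram promotion links.
AD_RE = re.compile(r"(?:\bt\.me|telegram\.me|instagram\.com)/", re.IGNORECASE)


def normalize_record(text):
    """Return the cleaned text, or None if the whole record should be dropped."""
    # Drop the whole record when it looks like a Telegram/Instagram advertisement.
    if AD_RE.search(text):
        return None
    # Un-stretch words by removing tatweel characters.
    text = text.replace(TATWEEL, "")
    # Remove links and user mentions.
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Collapse the whitespace left behind by the removals.
    text = re.sub(r"\s+", " ", text).strip()
    # An empty result carries no content, so treat it as a dropped record.
    return text or None
```

Returning `None` for dropped records lets a caller filter a dataset in one pass, for example `[r for r in map(normalize_record, records) if r is not None]`.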