hooking-dev committed on
Commit
95cb6ff
1 Parent(s): ff85c3e

Update README.md

Files changed (1)
  1. README.md +19 -1
README.md CHANGED
@@ -73,7 +73,7 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
  
  ### Training Data
  
- The model was trained on the OSCAR Hebrew dataset, a large-scale, open corpus consisting of diverse text collected from the web, reflecting common usage of Hebrew in various contexts.
+ The model was trained on the OSCAR Hebrew dataset, a large-scale, open corpus consisting of diverse text collected from the web, reflecting common usage of Hebrew in various contexts. For more details on the dataset, see the citations related to OSCAR below.
  
  ### Training Procedure
  
@@ -116,3 +116,21 @@ If you use this model in your research, please cite it as follows:
  year={2024},
  url={https://huggingface.co/hooking-dev/Hebrew_v1.0}
  }
+
+ @article{2022arXiv221210440J,
+ author = {{Jansen}, Tim and {Tong}, Yangling and {Zevallos}, Victoria and {Ortiz Suarez}, Pedro},
+ title = "{Perplexed by Quality: A Perplexity-based Method for Adult and Harmful Content Detection in Multilingual Heterogeneous Web Data}",
+ journal = {arXiv e-prints},
+ year = 2022,
+ month = dec,
+ eid = {arXiv:2212.10440},
+ pages = {arXiv:2212.10440},
+ doi = {10.48550/arXiv.2212.10440},
+ archivePrefix = {arXiv},
+ eprint = {2212.10440},
+ primaryClass = {cs.CL},
+ adsurl = {https://ui.adsabs.harvard.edu/abs/2022arXiv221210440J},
+ adsnote = {Provided by the SAO/NASA Astrophysics Data System}
+ }
+
+ }