Pedro Ortiz Suarez
pjox
AI & ML interests
Language modeling, parsing, sequence tagging, NER, historical languages.
pjox's activity
Set `sep="\s+"` for the duplicates file
2
#1 opened 7 months ago by lhoestq
Porn-related strings in the datasets (zh)
2
#8 opened about 1 year ago by kiwakwok
colab crashed after trying to load the dataset
1
#4 opened over 1 year ago by MhondGhod
Change foldernames
4
#3 opened over 1 year ago by hac541309
Unsafe Files
20
#12 opened almost 2 years ago by GetzPro
About the number of documents
6
#6 opened over 1 year ago by lixin4ever
Upload the rest of the data for 05-06-23
#1 opened over 1 year ago by pjox
Changing into Parquet
2
#5 opened over 1 year ago by hac541309
the link to RoBERTa base model directs us to bert-base-uncased
1
#1 opened almost 2 years ago by hurrial
Deduplicated English Corpus
2
#3 opened almost 2 years ago by conceptofmind
Data hosting on Huggingface
1
#2 opened almost 2 years ago by hieuhocnlp
How to download only one language?
2
#1 opened almost 2 years ago by musabg
full of sexy content and doesn't have 200G in zh corpus
1
#10 opened almost 2 years ago by Hzhiqiang