Pedro Ortiz Suarez

pjox

AI & ML interests

Language modeling, parsing, sequence tagging, NER, historical languages.

pjox's activity

New activity in commoncrawl/statistics 28 days ago
New activity in oscar-corpus/OSCAR-2301 7 months ago
New activity in oscar-corpus/colossal-oscar-1.0 11 months ago

Change foldernames

4
#3 opened 12 months ago by hac541309

New activity in oscar-corpus/OSCAR-2201 12 months ago

Unsafe Files

20
#12 opened about 1 year ago by GetzPro

New activity in oscar-corpus/OSCAR-2301 12 months ago

About the number of documents

6
#6 opened 12 months ago by lixin4ever

New activity in oscar-corpus/colossal-oscar-1.0 12 months ago
New activity in oscar-corpus/OSCAR-2301 about 1 year ago

Changing into Parquet

2
#5 opened about 1 year ago by hac541309

New activity in pjox/dalembert about 1 year ago
New activity in oscar-corpus/OSCAR-2301 over 1 year ago

Deduplicated English Corpus

2
#3 opened over 1 year ago by conceptofmind

Data hosting on Huggingface

1
#2 opened over 1 year ago by hieuhocnlp

How to download only one language?

2
#1 opened over 1 year ago by musabg

New activity in oscar-corpus/OSCAR-2201 over 1 year ago