Pedro Ortiz Suarez

pjox

AI & ML interests

Language modeling, parsing, sequence tagging, NER, historical languages.

Organizations

ALMAnaCH (Inria)
BigScience Workshop
SciCorpus
OSCAR
BigScience Catalogue Data
Scilons Project
BigScience Data
Web Data Commons
Speech and Language Technology, DFKI
Just some testing..
Common Crawl Foundation
Occiglot

pjox's activity

New activity in commoncrawl/statistics 6 months ago
New activity in oscar-corpus/OSCAR-2301 11 months ago
New activity in oscar-corpus/colossal-oscar-1.0 about 1 year ago
New activity in oscar-corpus/colossal-oscar-1.0 over 1 year ago

Change foldernames

4
#3 opened over 1 year ago by hac541309
New activity in oscar-corpus/OSCAR-2201 over 1 year ago

Unsafe Files

20
#12 opened over 1 year ago by GetzPro
New activity in oscar-corpus/OSCAR-2301 over 1 year ago

About the number of documents

6
#6 opened over 1 year ago by lixin4ever
New activity in oscar-corpus/colossal-oscar-1.0 over 1 year ago
New activity in oscar-corpus/OSCAR-2301 over 1 year ago

Changing into Parquet

2
#5 opened over 1 year ago by hac541309
New activity in pjox/dalembert over 1 year ago
New activity in oscar-corpus/OSCAR-2301 over 1 year ago

Deduplicated English Corpus

2
#3 opened over 1 year ago by conceptofmind

Data hosting on Huggingface

1
#2 opened over 1 year ago by hieuhocnlp

How to download only one language?

2
#1 opened over 1 year ago by musabg
New activity in oscar-corpus/OSCAR-2201 over 1 year ago