Pedro Ortiz Suarez

pjox

AI & ML interests

Language modeling, parsing, sequence tagging, NER, historical languages.

Recent Activity

published a dataset 8 days ago
scilons/texts-full
updated a Space 15 days ago
oscar-corpus/README
liked a dataset 7 months ago
oscar-corpus/community-oscar

Organizations

ALMAnaCH (Inria)
BigScience Workshop
OSCAR
BigScience Catalogue Data
Scilons Project
BigScience Data
Web Data Commons
Speech and Language Technology, DFKI
Just some testing..
Common Crawl Foundation
Occiglot

pjox's activity

New activity in commoncrawl/statistics 9 months ago
New activity in oscar-corpus/OSCAR-2301 about 1 year ago
New activity in oscar-corpus/colossal-oscar-1.0 over 1 year ago
New activity in oscar-corpus/OSCAR-2201 over 1 year ago

Unsafe Files

20
#12 opened almost 2 years ago by GetzPro
New activity in oscar-corpus/OSCAR-2301 over 1 year ago
New activity in oscar-corpus/colossal-oscar-1.0 over 1 year ago
New activity in oscar-corpus/OSCAR-2301 over 1 year ago

Changing into Parquet

2
#5 opened over 1 year ago by hac541309
New activity in pjox/dalembert almost 2 years ago
New activity in oscar-corpus/OSCAR-2301 almost 2 years ago

Deduplicated English Corpus

2
#3 opened almost 2 years ago by conceptofmind

Data hosting on Huggingface

1
#2 opened about 2 years ago by hieuhocnlp
New activity in oscar-corpus/OSCAR-2301 about 2 years ago

How to download only one language?

2
#1 opened about 2 years ago by musabg
New activity in oscar-corpus/OSCAR-2201 about 2 years ago