Pedro Ortiz Suarez

pjox

AI & ML interests

Language modeling, parsing, sequence tagging, NER, historical languages.

Recent Activity

Published a dataset about 1 month ago: scilons/texts-full
Updated a Space about 2 months ago: oscar-corpus/README
Liked a dataset 8 months ago: oscar-corpus/community-oscar

Organizations

ALMAnaCH (Inria)
BigScience Workshop
OSCAR
BigScience Catalogue Data
Scilons Project
BigScience Data
Web Data Commons
Speech and Language Technology, DFKI
Just some testing..
Common Crawl Foundation
Occiglot

pjox's activity

New activity in commoncrawl/statistics 10 months ago
New activity in oscar-corpus/OSCAR-2301 over 1 year ago
New activity in oscar-corpus/colossal-oscar-1.0 over 1 year ago
New activity in oscar-corpus/OSCAR-2201 over 1 year ago

Unsafe Files

#12 opened almost 2 years ago by GetzPro
New activity in oscar-corpus/OSCAR-2301 over 1 year ago

About the number of documents

#6 opened almost 2 years ago by lixin4ever
New activity in oscar-corpus/colossal-oscar-1.0 over 1 year ago
New activity in oscar-corpus/OSCAR-2301 almost 2 years ago

Changing into Parquet

#5 opened almost 2 years ago by hac541309
New activity in pjox/dalembert almost 2 years ago
New activity in oscar-corpus/OSCAR-2301 about 2 years ago

Data hosting on Huggingface

#2 opened about 2 years ago by hieuhocnlp

How to download only one language?

#1 opened about 2 years ago by musabg
New activity in oscar-corpus/OSCAR-2201 about 2 years ago