Commit History

Merge branch 'main' of https://huggingface.co/spaces/huggingface/text-data-filtering
f6058aa

HugoLaurencon committed on

back to before portuguese
091dbe4

HugoLaurencon committed on

update visu for Portuguese
2b811ac

HugoLaurencon committed on

add register information
061d2e4

HugoLaurencon committed on

new filter on word repetition ratio
4809033

HugoLaurencon committed on

visualization: small step for the slider on flagged words ratio
fa81556

HugoLaurencon committed on

visualization: choose between several languages
0610f9d

HugoLaurencon committed on

distributions for the filters on words and discarded words by filter
da13b29

HugoLaurencon committed on

visualization: upload our own stop words and flagged words list
5d56c36

HugoLaurencon committed on

everything in expanders
2c2527f

HugoLaurencon committed on

display distributions in sidebar and filtering parameters in expanders
5d485e5

HugoLaurencon committed on

rename badwords to flagged words + new flagged words list of 68 words
f217a73

HugoLaurencon committed on

button to download parameters
bfbcd60

HugoLaurencon committed on

fix division by 0 in compute_special_characters_ratio
b607b76

HugoLaurencon committed on

new tool to analyse our own doc
6f25c5c

HugoLaurencon committed on

filter on repetition removal
693f997

HugoLaurencon committed on

Delete en_examples_with_stats_no_small_docs.json
58d483d

HugoLaurencon committed on

Delete en_examples_with_stats_ldnoob.json
b190ef8

HugoLaurencon committed on

Delete en_examples_with_stats.json
0376199

HugoLaurencon committed on

remove zipf's law and update of the doc
3fd19c1

HugoLaurencon committed on

visu with discarded documents by filter
14574d7

HugoLaurencon committed on

faster visu (less documents)
07c617e

HugoLaurencon committed on

Create LICENSE
eabb5f9

Teven Le Scao committed on

typo
5d392e6

teven committed on

better description, flagged words
c8f45af

teven committed on

adding counts for docs
96e0b3b

teven committed on

removed prints
5628a45

teven committed on

add ldnoob
ddfdd4f

teven committed on

update app
a446a8b

teven committed on