2023-04-11 03:13:38,181:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-11 03:13:38,182:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:270 Working with dataset: 2023-04-11 03:13:38,182:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:271 Dataset({ features: ['text', 'user_id', 'subforum_id', 'num_contexts', 'label'], num_rows: 10944 }) 2023-04-11 03:13:38,329:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:266 Saving dataset to disk 2023-04-11 03:13:38,333:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-11 03:13:38,339:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-11 03:13:38,339:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-11 03:13:39,164:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:477 tokenized df is 2023-04-11 03:13:39,164:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:478 text tokenized_text 0 As of March 13th , 2014 , the booklet had been... [as, of, march, 13th, 2014, the, booklet, had,... 1 In order to help increase the booklets downloa... 
[in, order, to, help, increase, the, booklets,... 2 ( Simply copy and paste the following text int... [simply, copy, and, paste, the, following, tex... 3 Click below for a FREE download of a colorfull... [click, below, for, a, free, download, of, a, ... 4 Click on the `` DOWNLOAD ( 7.42 MB ) '' green ... [click, on, the, download, 7, 42, mb, green, b... ... ... ... 10939 Billy - `` That guy would n't leave me alone ,... [billy, that, guy, would, n, t, leave, me, alo... 10940 Wish we at least had a Marine Le Pen to vote f... [wish, we, at, least, had, a, marine, le, pen,... 10941 Its like the choices are white genocide candid... [its, like, the, choices, are, white, genocide... 10942 Why White people used to say that sex was a si... [why, white, people, used, to, say, that, sex,... 10943 Now I get it ! [now, i, get, it] [10944 rows x 2 columns] 2023-04-11 03:13:39,184:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:480 Saving tokenized dataset to disk 2023-04-11 03:13:39,315:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:372 Calculating vocab afresh 2023-04-11 03:13:39,315:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:540 Fitting dummy tokenization to make matrix using the previous tokenization 2023-04-11 03:13:39,490:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 0 of 2000 vocab batches 2023-04-11 03:13:39,510:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 100 of 2000 vocab batches 2023-04-11 
03:13:39,528:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 200 of 2000 vocab batches
2023-04-11 03:13:39,547:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 300 of 2000 vocab batches
2023-04-11 03:13:39,567:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 400 of 2000 vocab batches
2023-04-11 03:13:39,583:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 500 of 2000 vocab batches
2023-04-11 03:13:39,603:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 600 of 2000 vocab batches
2023-04-11 03:13:39,623:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 700 of 2000 vocab batches
2023-04-11 03:13:39,645:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 800 of 2000 vocab batches
2023-04-11 03:13:39,670:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 900 of 2000 vocab batches
2023-04-11 03:13:39,691:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1000 of 2000 vocab batches
2023-04-11 03:13:39,712:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1100 of 2000 vocab batches
2023-04-11 03:13:39,730:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1200 of 2000 vocab batches
2023-04-11 03:13:39,748:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1300 of 2000 vocab batches
2023-04-11 03:13:39,764:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1400 of 2000 vocab batches
2023-04-11 03:13:39,780:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1500 of 2000 vocab batches
2023-04-11 03:13:39,798:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1600 of 2000 vocab batches
2023-04-11 03:13:39,816:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1700 of 2000 vocab batches
2023-04-11 03:13:39,833:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1800 of 2000 vocab batches
2023-04-11 03:13:39,849:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1900 of 2000 vocab batches
2023-04-11 03:13:40,556:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:374 Making dfs with proportion.
2023-04-11 03:13:40,568:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:378 Writing out.
2023-04-11 03:13:40,698:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-04-11 03:13:40,698:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab word the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-04-11 03:13:40,707:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-04-11 03:13:40,708:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab word white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-11 03:13:42,720:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 Duplicates results: 2023-04-11 03:13:42,720:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:305 {'duplicate_fraction': 0.016812865497076057} 2023-04-11 03:13:42,720:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:316 Preparing general stats 2023-04-11 03:13:45,032:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:348 No label field. Not computing label statistics. 2023-04-11 03:17:44,529:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-11 03:17:44,532:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-11 03:17:44,532:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-11 03:17:44,532:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-04-11 03:17:44,534:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-11 03:17:44,534:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-11 03:17:44,582:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-04-11 03:17:44,629:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-04-11 03:17:44,629:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-04-11 03:17:44,639:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-04-11 03:17:44,639:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-11 03:17:44,666:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 Duplicates results: 2023-04-11 03:17:44,666:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:305 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 
'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY 
EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-04-11 03:17:44,668:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:313 Loading cached general stats 2023-04-11 03:17:45,253:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:348 No label field. Not computing label statistics. 2023-04-13 19:15:47,526:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-13 19:15:47,530:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-13 19:15:47,530:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-13 19:15:47,530:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-04-13 19:15:47,532:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-13 19:15:47,532:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-13 19:15:47,585:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-04-13 19:15:47,653:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-04-13 19:15:47,653:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-04-13 19:15:47,667:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-04-13 19:15:47,667:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-13 19:15:47,693:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 Duplicates results: 2023-04-13 19:15:47,693:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:305 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 
'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY 
EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-04-13 19:15:47,695:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:313 Loading cached general stats 2023-04-13 19:15:48,316:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:348 No label field. Not computing label statistics. 2023-04-13 19:16:56,531:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-13 19:16:56,537:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-13 19:16:56,538:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-13 19:16:56,541:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-04-13 19:16:56,542:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-13 19:16:56,542:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-13 19:16:56,624:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-04-13 19:16:56,673:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-04-13 19:16:56,673:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-04-13 19:16:56,686:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-04-13 19:16:56,687:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-13 19:16:56,713:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 Duplicates results: 2023-04-13 19:16:56,713:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:305 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 
'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY 
EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-04-13 19:16:56,715:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:313 Loading cached general stats 2023-04-13 19:16:57,275:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:348 No label field. Not computing label statistics. 2023-04-13 19:18:10,514:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-13 19:18:10,527:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-13 19:18:10,527:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-13 19:18:10,531:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-04-13 19:18:10,532:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 19:18:10,532:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 19:18:10,601:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache
2023-04-13 19:18:10,667:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab
2023-04-13 19:18:10,667:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381
              count  proportion         vocab
the            6770       0.037           the
to             4703       0.026            to
i              4577       0.025             i
and            4317       0.024           and
a              4006       0.022             a
...             ...         ...           ...
hose              1       0.000          hose
hospitalised      1       0.000  hospitalised
hospitality       1       0.000   hospitality
hostages          1       0.000      hostages
采用左眼专利技术      1       0.000      采用左眼专利技术
[16372 rows x 3 columns]
2023-04-13 19:18:10,677:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab
2023-04-13 19:18:10,677:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383
              count  proportion         vocab
white          1273       0.014         white
like            793       0.009          like
people          617       0.007        people
one             521       0.006           one
youtube         516       0.006       youtube
...             ...         ...           ...
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-13 19:18:10,700:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 Duplicates results: 2023-04-13 19:18:10,700:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:305 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 
'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY 
EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}}
2023-04-13 19:18:10,700:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:313 Loading cached general stats
2023-04-13 19:18:11,260:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:348 No label field. Not computing label statistics.
2023-04-13 19:19:16,383:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 19:19:16,388:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 19:19:16,388:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 19:19:16,389:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 19:19:16,391:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 19:19:16,395:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 19:19:16,449:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache
2023-04-13 19:19:16,494:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab
2023-04-13 19:19:16,494:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381
              count  proportion         vocab
the            6770       0.037           the
to             4703       0.026            to
i              4577       0.025             i
and            4317       0.024           and
a              4006       0.022             a
...             ...         ...           ...
hose              1       0.000          hose
hospitalised      1       0.000  hospitalised
hospitality       1       0.000   hospitality
hostages          1       0.000      hostages
采用左眼专利技术      1       0.000      采用左眼专利技术
[16372 rows x 3 columns]
2023-04-13 19:19:16,502:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab
2023-04-13 19:19:16,503:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383
              count  proportion         vocab
white          1273       0.014         white
like            793       0.009          like
people          617       0.007        people
one             521       0.006           one
youtube         516       0.006       youtube
...             ...         ...           ...
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-13 19:19:16,524:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 Duplicates results: 2023-04-13 19:19:16,525:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:305 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 
'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY 
EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}}
2023-04-13 19:19:16,525:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:313 Loading cached general stats
2023-04-13 19:19:17,096:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:348 No label field. Not computing label statistics.
2023-04-13 19:20:03,970:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 19:20:03,975:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 19:20:03,975:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 19:20:03,978:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 19:20:03,980:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 19:20:03,980:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 19:20:04,049:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache
2023-04-13 19:20:04,144:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab
2023-04-13 19:20:04,144:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381
              count  proportion         vocab
the            6770       0.037           the
to             4703       0.026            to
i              4577       0.025             i
and            4317       0.024           and
a              4006       0.022             a
...             ...         ...           ...
hose              1       0.000          hose
hospitalised      1       0.000  hospitalised
hospitality       1       0.000   hospitality
hostages          1       0.000      hostages
采用左眼专利技术      1       0.000      采用左眼专利技术
[16372 rows x 3 columns]
2023-04-13 19:20:04,158:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab
2023-04-13 19:20:04,158:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383
              count  proportion         vocab
white          1273       0.014         white
like            793       0.009          like
people          617       0.007        people
one             521       0.006           one
youtube         516       0.006       youtube
...             ...         ...           ...
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-13 19:20:04,250:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 Duplicates results: 2023-04-13 19:20:04,250:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:305 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 
'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY 
EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}}
2023-04-13 19:20:04,250:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:313 Loading cached general stats
2023-04-13 19:20:04,869:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:348 No label field. Not computing label statistics.
2023-04-13 19:21:16,863:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 19:21:16,868:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 19:21:16,868:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 19:21:16,869:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 19:21:16,872:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 19:21:16,872:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 19:21:16,941:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache
2023-04-13 19:21:16,985:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab
2023-04-13 19:21:16,985:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381
              count  proportion         vocab
the            6770       0.037           the
to             4703       0.026            to
i              4577       0.025             i
and            4317       0.024           and
a              4006       0.022             a
...             ...         ...           ...
hose              1       0.000          hose
hospitalised      1       0.000  hospitalised
hospitality       1       0.000   hospitality
hostages          1       0.000      hostages
采用左眼专利技术      1       0.000      采用左眼专利技术
[16372 rows x 3 columns]
2023-04-13 19:21:16,994:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab
2023-04-13 19:21:16,995:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383
              count  proportion         vocab
white          1273       0.014         white
like            793       0.009          like
people          617       0.007        people
one             521       0.006           one
youtube         516       0.006       youtube
...             ...         ...           ...
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-13 19:21:17,014:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 Duplicates results: 2023-04-13 19:21:17,014:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:305 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 
'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY 
EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}}
2023-04-13 19:21:17,015:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:313 Loading cached general stats
2023-04-13 19:21:17,561:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:348 No label field. Not computing label statistics.
2023-04-13 19:30:13,607:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 19:30:13,612:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 19:30:13,613:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 19:30:13,615:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 19:30:13,617:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 19:30:13,617:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 19:30:13,670:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache
2023-04-13 19:30:13,713:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab
2023-04-13 19:30:13,714:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381
              count  proportion         vocab
the            6770       0.037           the
to             4703       0.026            to
i              4577       0.025             i
and            4317       0.024           and
a              4006       0.022             a
...             ...         ...           ...
hose              1       0.000          hose
hospitalised      1       0.000  hospitalised
hospitality       1       0.000   hospitality
hostages          1       0.000      hostages
采用左眼专利技术      1       0.000      采用左眼专利技术
[16372 rows x 3 columns]
2023-04-13 19:30:13,722:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab
2023-04-13 19:30:13,722:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383
              count  proportion         vocab
white          1273       0.014         white
like            793       0.009          like
people          617       0.007        people
one             521       0.006           one
youtube         516       0.006       youtube
...             ...         ...           ...
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-13 19:30:13,742:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 Duplicates results: 2023-04-13 19:30:13,742:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:305 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 
'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY 
EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-04-13 19:30:13,742:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:313 Loading cached general stats 2023-04-13 19:30:14,333:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:348 No label field. Not computing label statistics. 2023-04-13 19:55:58,016:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-13 19:55:58,021:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-13 19:55:58,021:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-13 19:55:58,141:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-04-13 19:55:58,142:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-13 19:55:58,142:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-13 19:55:58,186:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-04-13 19:55:58,227:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-04-13 19:55:58,227:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-04-13 19:55:58,236:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-04-13 19:55:58,236:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-13 19:55:58,256:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 Duplicates results: 2023-04-13 19:55:58,256:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:305 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 
'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY 
EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-04-13 19:55:58,257:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:313 Loading cached general stats 2023-04-13 19:55:58,820:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:348 No label field. Not computing label statistics. 2023-04-13 20:12:33,604:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-13 20:12:33,609:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-13 20:12:33,609:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-13 20:13:27,174:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-04-13 20:13:27,179:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 20:13:27,179:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 20:15:37,511:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 20:15:37,517:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 20:15:37,517:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 20:30:43,823:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 20:30:43,827:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 20:30:43,827:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 20:32:05,456:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 20:32:05,464:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 20:32:05,464:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 20:34:19,725:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 20:34:19,730:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 20:34:19,731:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 20:34:19,734:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 20:34:19,738:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 20:34:19,738:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 20:34:19,792:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache
2023-04-13 20:34:19,838:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab
2023-04-13 20:34:19,838:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381
              count  proportion  vocab
the            6770       0.037  the
to             4703       0.026  to
i              4577       0.025  i
and            4317       0.024  and
a              4006       0.022  a
...             ...         ...  ...
hose              1       0.000  hose
hospitalised      1       0.000  hospitalised
hospitality       1       0.000  hospitality
hostages          1       0.000  hostages
采用左眼专利技术      1       0.000  采用左眼专利技术
[16372 rows x 3 columns]
2023-04-13 20:34:19,850:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab
2023-04-13 20:34:19,850:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383
              count  proportion  vocab
white          1273       0.014  white
like            793       0.009  like
people          617       0.007  people
one             521       0.006  one
youtube         516       0.006  youtube
...             ...         ...  ...
hose              1       0.000  hose
hospitalised      1       0.000  hospitalised
hospitality       1       0.000  hospitality
hostages          1       0.000  hostages
采用左眼专利技术      1       0.000  采用左眼专利技术
[16136 rows x 3 columns]
2023-04-13 20:47:39,289:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 20:47:39,313:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 20:47:39,314:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 20:47:39,319:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 20:47:39,325:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 20:47:39,325:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 20:47:39,505:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache
2023-04-13 20:47:39,601:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab
2023-04-13 20:47:39,601:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381
              count  proportion  vocab
the            6770       0.037  the
to             4703       0.026  to
i              4577       0.025  i
and            4317       0.024  and
a              4006       0.022  a
...             ...         ...  ...
hose              1       0.000  hose
hospitalised      1       0.000  hospitalised
hospitality       1       0.000  hospitality
hostages          1       0.000  hostages
采用左眼专利技术      1       0.000  采用左眼专利技术
[16372 rows x 3 columns]
2023-04-13 20:47:39,621:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab
2023-04-13 20:47:39,621:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383
              count  proportion  vocab
white          1273       0.014  white
like            793       0.009  like
people          617       0.007  people
one             521       0.006  one
youtube         516       0.006  youtube
...             ...         ...  ...
hose              1       0.000  hose
hospitalised      1       0.000  hospitalised
hospitality       1       0.000  hospitality
hostages          1       0.000  hostages
采用左眼专利技术      1       0.000  采用左眼专利技术
[16136 rows x 3 columns]
2023-04-13 20:50:43,811:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 20:50:43,815:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 20:50:43,816:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 20:50:43,817:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 20:50:43,818:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 20:50:43,819:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 20:50:43,900:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache
2023-04-13 20:50:44,005:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab
2023-04-13 20:50:44,006:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381
              count  proportion  vocab
the            6770       0.037  the
to             4703       0.026  to
i              4577       0.025  i
and            4317       0.024  and
a              4006       0.022  a
...             ...         ...  ...
hose              1       0.000  hose
hospitalised      1       0.000  hospitalised
hospitality       1       0.000  hospitality
hostages          1       0.000  hostages
采用左眼专利技术      1       0.000  采用左眼专利技术
[16372 rows x 3 columns]
2023-04-13 20:50:44,017:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab
2023-04-13 20:50:44,018:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383
              count  proportion  vocab
white          1273       0.014  white
like            793       0.009  like
people          617       0.007  people
one             521       0.006  one
youtube         516       0.006  youtube
...             ...         ...  ...
hose              1       0.000  hose
hospitalised      1       0.000  hospitalised
hospitality       1       0.000  hospitality
hostages          1       0.000  hostages
采用左眼专利技术      1       0.000  采用左眼专利技术
[16136 rows x 3 columns]
2023-04-13 20:51:35,705:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 20:51:35,712:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 20:51:35,712:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 20:51:35,713:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 20:51:35,715:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 20:51:35,715:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 20:51:35,770:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache
2023-04-13 20:51:35,815:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab
2023-04-13 20:51:35,816:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381
              count  proportion  vocab
the            6770       0.037  the
to             4703       0.026  to
i              4577       0.025  i
and            4317       0.024  and
a              4006       0.022  a
...             ...         ...  ...
hose              1       0.000  hose
hospitalised      1       0.000  hospitalised
hospitality       1       0.000  hospitality
hostages          1       0.000  hostages
采用左眼专利技术      1       0.000  采用左眼专利技术
[16372 rows x 3 columns]
2023-04-13 20:51:35,823:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab
2023-04-13 20:51:35,824:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383
              count  proportion  vocab
white          1273       0.014  white
like            793       0.009  like
people          617       0.007  people
one             521       0.006  one
youtube         516       0.006  youtube
...             ...         ...  ...
hose              1       0.000  hose
hospitalised      1       0.000  hospitalised
hospitality       1       0.000  hospitality
hostages          1       0.000  hostages
采用左眼专利技术      1       0.000  采用左眼专利技术
[16136 rows x 3 columns]
2023-04-13 20:53:09,603:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 20:53:09,607:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 20:53:09,607:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 20:53:09,608:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 20:53:09,609:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 20:53:09,610:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 20:53:09,689:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache
2023-04-13 20:53:09,739:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab
2023-04-13 20:53:09,739:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381
              count  proportion  vocab
the            6770       0.037  the
to             4703       0.026  to
i              4577       0.025  i
and            4317       0.024  and
a              4006       0.022  a
...             ...         ...  ...
hose              1       0.000  hose
hospitalised      1       0.000  hospitalised
hospitality       1       0.000  hospitality
hostages          1       0.000  hostages
采用左眼专利技术      1       0.000  采用左眼专利技术
[16372 rows x 3 columns]
2023-04-13 20:53:09,749:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab
2023-04-13 20:53:09,749:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383
              count  proportion  vocab
white          1273       0.014  white
like            793       0.009  like
people          617       0.007  people
one             521       0.006  one
youtube         516       0.006  youtube
...             ...         ...  ...
hose              1       0.000  hose
hospitalised      1       0.000  hospitalised
hospitality       1       0.000  hospitality
hostages          1       0.000  hostages
采用左眼专利技术      1       0.000  采用左眼专利技术
[16136 rows x 3 columns]
2023-04-13 20:56:50,420:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 20:56:50,424:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 20:56:50,424:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 20:56:50,426:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 20:56:50,428:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 20:56:50,428:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 20:56:50,479:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache
2023-04-13 20:56:50,563:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab
2023-04-13 20:56:50,563:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381
              count  proportion  vocab
the            6770       0.037  the
to             4703       0.026  to
i              4577       0.025  i
and            4317       0.024  and
a              4006       0.022  a
...             ...         ...  ...
hose              1       0.000  hose
hospitalised      1       0.000  hospitalised
hospitality       1       0.000  hospitality
hostages          1       0.000  hostages
采用左眼专利技术      1       0.000  采用左眼专利技术
[16372 rows x 3 columns]
2023-04-13 20:56:50,572:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab
2023-04-13 20:56:50,572:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383
              count  proportion  vocab
white          1273       0.014  white
like            793       0.009  like
people          617       0.007  people
one             521       0.006  one
youtube         516       0.006  youtube
...             ...         ...  ...
hose              1       0.000  hose
hospitalised      1       0.000  hospitalised
hospitality       1       0.000  hospitality
hostages          1       0.000  hostages
采用左眼专利技术      1       0.000  采用左眼专利技术
[16136 rows x 3 columns]
2023-04-13 20:58:39,497:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 20:58:39,501:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 20:58:39,501:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 20:58:39,502:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 20:58:39,504:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 20:58:39,504:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 20:58:39,561:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache
2023-04-13 20:58:39,612:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab
2023-04-13 20:58:39,612:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381
              count  proportion  vocab
the            6770       0.037  the
to             4703       0.026  to
i              4577       0.025  i
and            4317       0.024  and
a              4006       0.022  a
...             ...         ...  ...
hose              1       0.000  hose
hospitalised      1       0.000  hospitalised
hospitality       1       0.000  hospitality
hostages          1       0.000  hostages
采用左眼专利技术      1       0.000  采用左眼专利技术
[16372 rows x 3 columns]
2023-04-13 20:58:39,623:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab
2023-04-13 20:58:39,623:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383
              count  proportion  vocab
white          1273       0.014  white
like            793       0.009  like
people          617       0.007  people
one             521       0.006  one
youtube         516       0.006  youtube
...             ...         ...  ...
hose              1       0.000  hose
hospitalised      1       0.000  hospitalised
hospitality       1       0.000  hospitality
hostages          1       0.000  hostages
采用左眼专利技术      1       0.000  采用左眼专利技术
[16136 rows x 3 columns]
2023-04-13 21:00:19,786:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 21:00:19,791:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 21:00:19,791:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 21:00:19,792:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 21:00:19,794:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 21:00:19,794:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 21:00:19,850:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache
2023-04-13 21:00:19,902:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab
2023-04-13 21:00:19,903:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381
              count  proportion  vocab
the            6770       0.037  the
to             4703       0.026  to
i              4577       0.025  i
and            4317       0.024  and
a              4006       0.022  a
...             ...         ...  ...
hose              1       0.000  hose
hospitalised      1       0.000  hospitalised
hospitality       1       0.000  hospitality
hostages          1       0.000  hostages
采用左眼专利技术      1       0.000  采用左眼专利技术
[16372 rows x 3 columns]
2023-04-13 21:00:19,922:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab
2023-04-13 21:00:19,922:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383
              count  proportion  vocab
white          1273       0.014  white
like            793       0.009  like
people          617       0.007  people
one             521       0.006  one
youtube         516       0.006  youtube
...             ...         ...  ...
hose              1       0.000  hose
hospitalised      1       0.000  hospitalised
hospitality       1       0.000  hospitality
hostages          1       0.000  hostages
采用左眼专利技术      1       0.000  采用左眼专利技术
[16136 rows x 3 columns]
2023-04-13 21:02:07,228:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 21:02:07,235:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-04-13 21:02:07,235:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-04-13 21:02:07,236:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-04-13 21:02:07,237:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-13 21:02:07,237:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-13 21:02:07,313:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-04-13 21:02:07,369:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-04-13 21:02:07,370:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-04-13 21:02:07,378:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-04-13 21:02:07,378:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-13 21:07:28,513:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-13 21:07:28,517:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-13 21:07:28,517:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-13 21:07:28,518:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-13 21:07:28,519:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-13 21:07:28,519:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-13 21:07:28,589:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-04-13 21:07:28,634:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-04-13 21:07:28,635:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-04-13 21:07:28,645:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-04-13 21:07:28,645:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-13 21:26:03,041:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-13 21:26:03,047:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-13 21:26:03,047:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-13 21:26:03,048:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-04-13 21:26:03,050:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-13 21:26:03,050:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-13 21:26:03,102:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-04-13 21:26:03,149:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-04-13 21:26:03,149:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-04-13 21:26:03,161:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-04-13 21:26:03,161:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-13 21:26:04,025:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-13 21:26:04,026:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-13 21:26:04,026:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-13 21:26:04,027:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-13 21:26:04,031:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-13 21:26:04,031:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-13 21:26:04,090:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-04-13 21:26:04,136:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-04-13 21:26:04,136:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-04-13 21:26:04,146:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-04-13 21:26:04,146:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-13 21:48:07,869:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-13 21:48:07,876:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-13 21:48:07,876:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-13 21:48:07,879:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-04-13 21:48:07,881:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-13 21:48:07,881:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-13 21:48:07,953:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-04-13 21:48:08,003:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-04-13 21:48:08,003:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-04-13 21:48:08,014:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-04-13 21:48:08,014:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-13 21:56:44,532:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-13 21:56:44,536:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-13 21:56:44,536:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-13 21:56:46,103:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-13 21:56:46,105:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-13 21:56:46,105:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-13 22:04:06,280:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-04-13 22:04:06,285:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-13 22:04:06,285:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-13 22:04:06,286:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-13 22:04:06,288:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-13 22:04:06,288:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-13 22:04:06,510:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-04-13 22:04:06,554:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-04-13 22:04:06,555:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-04-13 22:04:06,565:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-04-13 22:04:06,565:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-25 01:09:03,965:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-25 01:09:03,974:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-25 01:09:03,974:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-25 01:09:03,975:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-04-25 01:09:03,977:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-25 01:09:03,977:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-25 01:09:04,080:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-04-25 01:09:04,152:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-04-25 01:09:04,152:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-04-25 01:09:04,169:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-04-25 01:09:04,170:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-25 04:18:25,479:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-25 04:18:25,483:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-25 04:18:25,483:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-25 04:18:25,485:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-25 04:18:25,491:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-25 04:18:25,491:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-25 04:18:25,667:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-04-25 04:18:25,718:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-04-25 04:18:25,718:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-04-25 04:18:25,732:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-04-25 04:18:25,733:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-25 15:37:58,866:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-25 15:37:58,870:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-25 15:37:58,870:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-25 15:41:24,898:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-04-25 15:41:24,903:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-25 15:41:24,904:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-25 15:41:24,905:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-25 15:41:24,907:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-25 15:41:24,907:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-25 15:41:24,963:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-04-25 15:41:25,015:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-04-25 15:41:25,015:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-04-25 15:41:25,031:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-04-25 15:41:25,031:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-25 15:49:25,283:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-25 15:49:25,288:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-25 15:49:25,288:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-25 15:49:25,289:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-04-25 15:49:25,291:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-25 15:49:25,292:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-25 15:49:25,355:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-04-25 15:49:25,394:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-04-25 15:49:25,394:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-04-25 15:49:25,403:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-04-25 15:49:25,403:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-25 16:17:26,562:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-25 16:17:26,567:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-25 16:17:26,567:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-25 16:17:26,568:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-25 16:17:26,570:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-25 16:17:26,570:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-25 16:17:26,630:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-04-25 16:17:26,690:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-04-25 16:17:26,690:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-04-25 16:17:26,702:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-04-25 16:17:26,702:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-25 16:24:14,173:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-25 16:24:14,184:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-25 16:24:14,185:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-25 16:24:14,187:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-04-25 16:24:14,188:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-25 16:24:14,188:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-25 16:24:14,243:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-04-25 16:24:14,291:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-04-25 16:24:14,291:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-04-25 16:24:14,303:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-04-25 16:24:14,303:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-25 16:26:05,175:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-25 16:26:05,178:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-25 16:26:05,179:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-25 16:26:05,179:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-25 16:26:05,181:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-25 16:26:05,181:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-25 16:26:05,226:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-04-25 16:26:05,268:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-04-25 16:26:05,268:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-04-25 16:26:05,276:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-04-25 16:26:05,277:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-25 16:30:26,142:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-25 16:30:26,143:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-25 16:30:26,143:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-25 16:30:26,144:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-04-25 16:30:26,145:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-25 16:30:26,145:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-25 16:30:26,201:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-04-25 16:30:26,247:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-04-25 16:30:26,247:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-04-25 16:30:26,257:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-04-25 16:30:26,257:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-28 11:39:29,990:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-28 11:39:29,995:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-28 11:39:29,995:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-28 11:39:29,997:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-28 11:39:29,999:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-28 11:39:29,999:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-28 11:39:30,066:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-04-28 11:39:30,126:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-04-28 11:39:30,126:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-04-28 11:39:30,140:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-04-28 11:39:30,140:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-29 00:12:49,160:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-29 00:12:49,164:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-29 00:12:49,164:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-29 00:12:49,165:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-04-29 00:12:49,167:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-29 00:12:49,167:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-29 00:12:49,278:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-04-29 00:12:49,355:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-04-29 00:12:49,356:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-04-29 00:12:49,371:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-04-29 00:12:49,371:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-29 01:12:49,923:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-29 01:12:49,926:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-29 01:12:49,927:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-29 01:12:49,929:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-29 01:12:49,931:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-29 01:12:49,931:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-29 01:12:50,071:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-04-29 01:12:50,169:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-04-29 01:12:50,169:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-04-29 01:12:50,198:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-04-29 01:12:50,198:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-04-29 01:31:20,244:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-04-29 01:31:20,247:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-29 01:31:20,247:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-29 01:31:20,249:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-04-29 01:31:20,251:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-04-29 01:31:20,251:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-04-29 01:31:20,386:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-04-29 01:31:20,464:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-04-29 01:31:20,464:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-04-29 01:31:20,494:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-04-29 01:31:20,494:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 01:29:33,816:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 01:29:33,820:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 01:29:33,821:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 01:29:33,821:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 01:29:33,823:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 01:29:33,823:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 01:29:33,894:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 01:29:33,965:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 01:29:33,965:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 01:29:33,980:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 01:29:33,980:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 01:29:34,017:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 Duplicates results: 2023-05-09 01:29:34,017:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:305 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely 
devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was 
very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-05-09 01:29:34,021:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:313 Loading cached general stats 2023-05-09 01:29:34,778:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:348 No label field. Not computing label statistics. 2023-05-09 01:32:23,842:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-05-09 01:32:23,850:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 01:32:23,850:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 01:32:23,851:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 01:32:23,852:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 01:32:23,852:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 01:32:23,909:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 01:32:23,958:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 01:32:23,958:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 01:32:23,975:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 01:32:23,975:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 01:36:13,176:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 01:36:13,180:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 01:36:13,180:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 01:36:13,181:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-05-09 01:36:13,183:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 01:36:13,183:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 01:36:13,229:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 01:36:13,274:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 01:36:13,274:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 01:36:13,282:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 01:36:13,282:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 01:38:17,981:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 01:38:17,984:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 01:38:17,984:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 01:38:17,985:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 01:38:17,987:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 01:38:17,987:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 01:38:18,030:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 01:38:18,075:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 01:38:18,076:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 01:38:18,084:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 01:38:18,084:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 01:46:26,876:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 01:46:26,884:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 01:46:26,885:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 01:46:26,885:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-05-09 01:46:26,887:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 01:46:26,887:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 01:46:26,932:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 01:46:26,980:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 01:46:26,981:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 01:46:26,989:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 01:46:26,989:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 01:49:12,813:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 01:49:12,816:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 01:49:12,816:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 01:49:12,817:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 01:49:12,818:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 01:49:12,818:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 01:49:12,855:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 01:49:12,898:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 01:49:12,899:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 01:49:12,909:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 01:49:12,909:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 01:52:10,786:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 01:52:10,789:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 01:52:10,789:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 01:52:10,789:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-05-09 01:52:10,791:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 01:52:10,791:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 01:52:10,831:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 01:52:10,875:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 01:52:10,875:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 01:52:10,884:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 01:52:10,884:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 02:02:27,114:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 02:02:27,118:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:02:27,118:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:02:27,119:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 02:02:27,121:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:02:27,121:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:02:27,168:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 02:02:27,218:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 02:02:27,218:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 02:02:27,228:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 02:02:27,228:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 02:05:47,424:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 02:05:47,427:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:05:47,427:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:05:47,428:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-05-09 02:05:47,429:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:05:47,429:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:05:47,491:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 02:05:47,535:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 02:05:47,536:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 02:05:47,556:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 02:05:47,556:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 02:08:13,808:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 02:08:13,812:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:08:13,812:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:08:13,813:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 02:08:13,814:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:08:13,814:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:08:13,855:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 02:08:13,899:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 02:08:13,899:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 02:08:13,908:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 02:08:13,908:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 02:10:21,138:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 02:10:21,142:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:10:21,142:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:10:21,143:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-05-09 02:10:21,144:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:10:21,144:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:10:21,186:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 02:10:21,231:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 02:10:21,231:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 02:10:21,239:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 02:10:21,239:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 02:15:54,391:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 02:15:54,396:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:15:54,396:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:15:54,397:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 02:15:54,399:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:15:54,399:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:15:54,444:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 02:15:54,488:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 02:15:54,489:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 02:15:54,499:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 02:15:54,499:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 02:17:54,385:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 02:17:54,389:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:17:54,389:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:17:54,390:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-05-09 02:17:54,391:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:17:54,391:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:17:54,437:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 02:17:54,481:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 02:17:54,481:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 02:17:54,488:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 02:17:54,488:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 02:19:45,409:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 02:19:45,413:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:19:45,413:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:19:45,414:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 02:19:45,416:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:19:45,416:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:19:45,461:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 02:19:45,503:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 02:19:45,503:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 02:19:45,511:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 02:19:45,511:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 02:22:12,589:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 02:22:12,597:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:22:12,597:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:22:12,598:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-05-09 02:22:12,600:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:22:12,601:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:22:12,646:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 02:22:12,693:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 02:22:12,693:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 02:22:12,702:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 02:22:12,702:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 02:25:49,948:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 02:25:49,953:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:25:49,954:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:25:49,955:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 02:25:49,959:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:25:49,959:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:25:50,013:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 02:25:50,058:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 02:25:50,058:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 02:25:50,068:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 02:25:50,068:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 02:36:11,924:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 02:36:11,928:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:36:11,928:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:36:11,930:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-05-09 02:36:11,931:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:36:11,931:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:36:11,981:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 02:36:12,032:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 02:36:12,032:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 02:36:12,043:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 02:36:12,044:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 02:40:12,900:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 02:40:12,905:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:40:12,905:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:40:12,906:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 02:40:12,908:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:40:12,908:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:40:12,959:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 02:40:13,003:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 02:40:13,003:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 02:40:13,011:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 02:40:13,011:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 02:42:32,526:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 02:42:32,531:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:42:32,531:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:42:32,532:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-05-09 02:42:32,538:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:42:32,539:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:42:32,600:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 02:42:32,649:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 02:42:32,650:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 02:42:32,658:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 02:42:32,659:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 02:44:32,994:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 02:44:32,998:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:44:32,998:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:44:32,999:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 02:44:33,000:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:44:33,001:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:44:33,043:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 02:44:33,086:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 02:44:33,087:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 02:44:33,096:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 02:44:33,097:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 02:47:52,376:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 02:47:52,380:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:47:52,380:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:47:52,381:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-05-09 02:47:52,383:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:47:52,383:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:47:52,432:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 02:47:52,476:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 02:47:52,476:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 02:47:52,487:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 02:47:52,487:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 02:56:36,021:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 02:56:36,025:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:56:36,025:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:56:36,027:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 02:56:36,028:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 02:56:36,028:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 02:56:36,085:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 02:56:36,138:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 02:56:36,139:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 02:56:36,151:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 02:56:36,151:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 03:02:54,453:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 03:02:54,458:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 03:02:54,459:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 03:02:54,460:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-05-09 03:02:54,462:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 03:02:54,462:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 03:02:54,513:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 03:02:54,559:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 03:02:54,559:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 03:02:54,568:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 03:02:54,568:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 03:04:39,530:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 03:04:39,534:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 03:04:39,534:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 03:04:39,535:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 03:04:39,537:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 03:04:39,537:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 03:04:39,578:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 03:04:39,628:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 03:04:39,629:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 03:04:39,637:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 03:04:39,637:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 03:06:15,107:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 03:06:15,112:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 03:06:15,112:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 03:06:15,114:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-05-09 03:06:15,116:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 03:06:15,116:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 03:06:15,163:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 03:06:15,211:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 03:06:15,211:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 03:06:15,220:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 03:06:15,220:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 03:07:51,747:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 03:07:51,752:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 03:07:51,752:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 03:07:51,753:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 03:07:51,755:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 03:07:51,755:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 03:07:51,798:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 03:07:51,842:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 03:07:51,842:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 03:07:51,850:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 03:07:51,850:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 03:09:56,856:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 03:09:56,860:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 03:09:56,860:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 03:09:56,861:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-05-09 03:09:56,863:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 03:09:56,863:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 03:09:56,910:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 03:09:56,958:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 03:09:56,958:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 03:09:56,967:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 03:09:56,967:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 03:11:52,422:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 03:11:52,427:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 03:11:52,427:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 03:11:52,428:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 03:11:52,430:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 03:11:52,430:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 03:11:52,478:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 03:11:52,521:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 03:11:52,534:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 03:11:52,556:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 03:11:52,572:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 03:12:42,342:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-05-09 03:12:42,346:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 03:12:42,346:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 03:12:42,347:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-05-09 03:12:42,349:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-05-09 03:12:42,349:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 03:12:42,391:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-05-09 03:12:42,435:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-05-09 03:12:42,435:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 03:12:42,443:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-05-09 03:12:42,443:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 03:35:18,408:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-05-09 03:35:18,412:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-05-09 03:35:18,412:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 03:35:18,412:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-05-09 03:35:18,414:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-05-09 03:35:18,414:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 03:35:18,636:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-05-09 03:35:18,684:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-05-09 03:35:18,684:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 03:35:18,698:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-05-09 03:35:18,699:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 03:35:18,729:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:302 Duplicates results: 2023-05-09 03:35:18,729:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely 
devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was 
very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-05-09 03:35:18,729:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:311 Loading cached general stats 2023-05-09 03:43:53,075:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-05-09 03:43:53,079:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-05-09 03:43:53,079:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 03:43:53,080:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-05-09 03:43:53,082:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-05-09 03:43:53,082:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 03:43:53,243:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-05-09 03:43:53,287:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-05-09 03:43:53,287:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 03:43:53,297:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-05-09 03:43:53,297:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 03:43:53,319:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:302 Duplicates results: 2023-05-09 03:43:53,319:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-05-09 03:43:53,319:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:311 Loading cached general stats 2023-05-09 03:48:39,078:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-05-09 03:48:39,080:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-05-09 03:48:39,080:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 03:48:39,081:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-05-09 03:48:39,082:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-05-09 03:48:39,082:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 03:48:39,158:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-05-09 03:48:39,210:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-05-09 03:48:39,210:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 03:48:39,223:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-05-09 03:48:39,223:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 03:48:39,247:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:302 Duplicates results: 2023-05-09 03:48:39,247:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-05-09 03:48:39,249:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:311 Loading cached general stats 2023-05-09 03:51:15,632:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-05-09 03:51:15,639:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-05-09 03:51:15,639:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 03:51:15,640:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-05-09 03:51:15,642:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-05-09 03:51:15,642:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 03:51:15,995:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-05-09 03:51:16,039:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-05-09 03:51:16,039:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 03:51:16,049:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-05-09 03:51:16,049:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 03:54:18,032:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-05-09 03:54:18,037:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-05-09 03:54:18,037:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 03:54:18,038:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-05-09 03:54:18,047:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-05-09 03:54:18,047:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 03:54:18,342:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-05-09 03:54:18,387:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-05-09 03:54:18,388:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 03:54:18,398:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-05-09 03:54:18,398:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 04:24:12,477:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-05-09 04:24:12,480:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-05-09 04:24:12,480:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 04:24:12,481:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-05-09 04:24:12,482:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-05-09 04:24:12,482:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 04:24:12,574:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-05-09 04:24:12,625:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-05-09 04:24:12,625:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 04:24:12,635:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-05-09 04:24:12,635:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 04:24:21,869:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-05-09 04:24:21,872:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-05-09 04:24:21,872:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 04:24:21,873:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-05-09 04:24:21,875:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-05-09 04:24:21,875:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 04:24:21,928:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-05-09 04:24:21,971:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-05-09 04:24:21,971:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 04:24:21,979:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-05-09 04:24:21,979:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 04:29:50,247:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-05-09 04:29:50,250:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-05-09 04:29:50,250:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 04:29:50,250:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-05-09 04:29:50,251:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-05-09 04:29:50,251:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 04:29:50,565:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-05-09 04:29:50,607:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-05-09 04:29:50,608:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 04:29:50,615:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-05-09 04:29:50,615:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-09 04:43:39,502:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-05-09 04:43:39,506:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-05-09 04:43:39,506:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 04:43:39,507:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-05-09 04:43:39,510:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-05-09 04:43:39,510:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-09 04:43:39,682:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-05-09 04:43:39,727:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-05-09 04:43:39,728:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-09 04:43:39,737:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-05-09 04:43:39,737:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-21 21:27:38,848:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-05-21 21:27:38,851:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-05-21 21:27:38,851:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-21 21:27:38,852:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-05-21 21:27:38,854:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-05-21 21:27:38,854:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-21 21:27:39,017:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-05-21 21:27:39,064:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-05-21 21:27:39,064:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-21 21:27:39,075:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-05-21 21:27:39,075:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-21 21:27:39,100:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:302 Duplicates results: 2023-05-21 21:27:39,100:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely 
devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was 
very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-05-21 21:27:39,101:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:311 Loading cached general stats 2023-05-21 21:30:03,039:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-05-21 21:30:03,047:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-05-21 21:30:03,048:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-21 21:30:03,048:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-05-21 21:30:03,054:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-05-21 21:30:03,055:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-21 21:30:03,455:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-05-21 21:30:03,498:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-05-21 21:30:03,498:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-21 21:30:03,507:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-05-21 21:30:03,507:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-05-21 21:40:50,193:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-05-21 21:40:50,197:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-05-21 21:40:50,197:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-21 21:40:50,197:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-05-21 21:40:50,198:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-05-21 21:40:50,199:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-05-21 21:40:50,269:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-05-21 21:40:50,326:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-05-21 21:40:50,327:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-05-21 21:40:50,342:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-05-21 21:40:50,342:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-07-31 02:50:40,350:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-07-31 02:50:40,354:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-07-31 02:50:40,355:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-07-31 02:50:40,355:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-07-31 02:50:40,361:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-07-31 02:50:40,361:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-07-31 02:50:40,410:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-07-31 02:50:40,464:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-07-31 02:50:40,464:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-07-31 02:50:40,476:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-07-31 02:50:40,476:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-07-31 02:50:40,505:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:302 Duplicates results: 2023-07-31 02:50:40,505:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely 
devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was 
very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-07-31 02:50:40,506:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:311 Loading cached general stats 2023-07-31 02:53:29,818:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-07-31 02:53:29,821:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-07-31 02:53:29,822:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-07-31 02:53:29,822:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-07-31 02:53:29,824:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-07-31 02:53:29,824:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-07-31 02:53:29,875:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-07-31 02:53:29,924:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-07-31 02:53:29,924:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-07-31 02:53:29,933:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-07-31 02:53:29,933:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-07-31 02:53:29,954:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:302 Duplicates results: 2023-07-31 02:53:29,954:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-07-31 02:53:29,956:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:311 Loading cached general stats 2023-07-31 02:53:31,855:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-07-31 02:53:31,856:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-07-31 02:53:31,856:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-07-31 02:53:31,856:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-07-31 02:53:31,857:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-07-31 02:53:31,857:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-07-31 02:53:31,895:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-07-31 02:53:31,941:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-07-31 02:53:31,941:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-07-31 02:53:31,947:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-07-31 02:53:31,948:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-07-31 02:53:31,967:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:302 Duplicates results: 2023-07-31 02:53:31,967:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-07-31 02:53:31,967:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:311 Loading cached general stats 2023-07-31 02:56:56,027:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-07-31 02:56:56,032:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-07-31 02:56:56,032:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-07-31 02:56:56,033:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-07-31 02:56:56,038:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-07-31 02:56:56,038:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-07-31 02:56:56,087:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-07-31 02:56:56,144:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-07-31 02:56:56,144:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-07-31 02:56:56,159:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-07-31 02:56:56,159:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-07-31 02:56:56,184:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:302 Duplicates results: 2023-07-31 02:56:56,185:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-07-31 02:56:56,185:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:311 Loading cached general stats 2023-07-31 03:02:52,394:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-07-31 03:02:52,399:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-07-31 03:02:52,399:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-07-31 03:02:52,400:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-07-31 03:02:52,402:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-07-31 03:02:52,402:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-07-31 03:02:52,445:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-07-31 03:02:52,485:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-07-31 03:02:52,485:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-07-31 03:02:52,494:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-07-31 03:02:52,494:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-07-31 03:02:52,512:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:302 Duplicates results: 2023-07-31 03:02:52,512:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-07-31 03:02:52,513:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:311 Loading cached general stats 2023-07-31 03:09:58,696:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-07-31 03:09:58,703:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk
2023-07-31 03:09:58,703:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 })
2023-07-31 03:09:58,703:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset.
2023-07-31 03:09:58,705:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk
2023-07-31 03:09:58,705:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 })
2023-07-31 03:09:58,755:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache
2023-07-31 03:09:58,809:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab
2023-07-31 03:09:58,809:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380
                 count  proportion            vocab
the               6770       0.037              the
to                4703       0.026               to
i                 4577       0.025                i
and               4317       0.024              and
a                 4006       0.022                a
...                ...         ...              ...
hose                 1       0.000             hose
hospitalised         1       0.000     hospitalised
hospitality          1       0.000      hospitality
hostages             1       0.000         hostages
采用左眼专利技术             1       0.000         采用左眼专利技术
[16372 rows x 3 columns]
2023-07-31 03:09:58,828:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab
2023-07-31 03:09:58,829:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382
                 count  proportion            vocab
white             1273       0.014            white
like               793       0.009             like
people             617       0.007           people
one                521       0.006              one
youtube            516       0.006          youtube
...                ...         ...              ...
hose                 1       0.000             hose
hospitalised         1       0.000     hospitalised
hospitality          1       0.000      hospitality
hostages             1       0.000         hostages
采用左眼专利技术             1       0.000         采用左眼专利技术
[16136 rows x 3 columns]
2023-07-31 03:23:17,557:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset.
2023-07-31 03:23:17,561:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk
2023-07-31 03:23:17,561:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 })
2023-07-31 03:23:17,562:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset.
2023-07-31 03:23:17,564:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-07-31 03:23:17,564:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-07-31 03:23:17,618:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-07-31 03:23:17,682:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-07-31 03:23:17,682:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-07-31 03:23:17,707:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-07-31 03:23:17,707:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-07-31 03:23:56,273:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-07-31 03:23:56,286:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-07-31 03:23:56,286:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-07-31 03:23:56,287:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-07-31 03:23:56,289:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-07-31 03:23:56,290:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-07-31 03:23:56,325:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-07-31 03:23:56,362:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-07-31 03:23:56,363:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-07-31 03:23:56,369:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-07-31 03:23:56,369:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-07-31 03:24:00,100:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-07-31 03:24:00,105:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-07-31 03:24:00,105:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-07-31 03:24:00,105:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-07-31 03:24:00,108:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-07-31 03:24:00,108:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-07-31 03:24:00,143:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-07-31 03:24:00,181:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-07-31 03:24:00,181:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-07-31 03:24:00,187:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-07-31 03:24:00,187:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-14 17:27:12,305:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-14 17:27:12,323:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-14 17:27:12,323:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-14 17:27:12,324:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-14 17:27:12,331:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-14 17:27:12,331:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-14 17:27:12,410:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-08-14 17:27:12,467:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-08-14 17:27:12,467:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-14 17:27:12,484:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-08-14 17:27:12,484:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-14 17:27:33,556:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-14 17:27:33,558:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-14 17:27:33,558:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-14 17:27:33,559:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-14 17:27:33,560:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-14 17:27:33,560:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-14 17:27:33,618:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-08-14 17:27:33,660:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-08-14 17:27:33,660:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-14 17:27:33,670:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-08-14 17:27:33,670:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-14 17:28:52,710:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-14 17:28:52,713:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-14 17:28:52,713:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-14 17:28:52,713:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-14 17:28:52,714:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-14 17:28:52,715:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-14 17:28:52,763:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-08-14 17:28:52,812:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-08-14 17:28:52,812:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 
0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-14 17:28:52,826:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-08-14 17:28:52,826:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-14 17:28:52,867:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:302 Duplicates results: 2023-08-14 17:28:52,867:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely 
devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was 
very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-14 17:28:52,870:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:311 Loading cached general stats 2023-08-14 17:33:31,972:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-14 17:33:31,974:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-14 17:33:31,974:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-14 17:33:31,974:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-14 17:33:31,976:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-14 17:33:31,976:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-14 17:33:32,197:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-08-14 17:33:32,246:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-08-14 17:33:32,246:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-14 17:33:32,258:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-08-14 17:33:32,258:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-14 17:33:32,282:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:302 Duplicates results: 2023-08-14 17:33:32,282:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-14 17:33:32,282:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:311 Loading cached general stats 2023-08-14 17:33:40,373:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-14 17:33:40,377:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-14 17:33:40,380:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-14 17:33:40,386:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-14 17:33:40,388:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-14 17:33:40,388:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-14 17:33:40,445:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-08-14 17:33:40,495:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-08-14 17:33:40,495:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-14 17:33:40,505:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-08-14 17:33:40,505:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-14 17:33:40,536:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:302 Duplicates results: 2023-08-14 17:33:40,536:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-14 17:33:40,537:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:311 Loading cached general stats 2023-08-15 14:29:34,529:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-15 14:29:34,533:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-15 14:29:34,533:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-15 14:29:34,533:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-15 14:29:34,535:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-15 14:29:34,535:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-15 14:29:34,590:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-08-15 14:29:34,640:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-08-15 14:29:34,640:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-15 14:29:34,659:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-08-15 14:29:34,659:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-15 14:29:34,683:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:302 Duplicates results: 2023-08-15 14:29:34,683:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-15 14:29:34,683:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:311 Loading cached general stats 2023-08-15 17:34:45,035:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-15 17:34:45,039:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk
2023-08-15 17:34:45,039:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 })
2023-08-15 17:34:45,039:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset.
2023-08-15 17:34:45,041:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk
2023-08-15 17:34:45,041:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 })
2023-08-15 17:34:45,094:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache
2023-08-15 17:34:45,143:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab
2023-08-15 17:34:45,143:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ...
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-15 17:34:45,159:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-08-15 17:34:45,159:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-15 17:34:45,181:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:302 Duplicates results: 2023-08-15 17:34:45,181:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-15 17:34:45,182:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:311 Loading cached general stats 2023-08-15 17:36:45,519:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-15 17:36:45,524:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk
2023-08-15 17:36:45,524:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 })
2023-08-15 17:36:45,525:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset.
2023-08-15 17:36:45,528:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk
2023-08-15 17:36:45,529:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 })
2023-08-15 17:36:45,601:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache
2023-08-15 17:36:45,654:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab
2023-08-15 17:36:45,654:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ...
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-15 17:36:45,667:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-08-15 17:36:45,667:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-15 17:36:45,687:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:302 Duplicates results: 2023-08-15 17:36:45,688:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-15 17:36:45,688:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:311 Loading cached general stats 2023-08-21 19:27:41,058:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-08-21 19:27:41,061:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-08-21 19:27:41,061:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-08-21 19:27:41,061:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-08-21 19:27:41,063:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-08-21 19:27:41,063:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-08-21 19:27:41,136:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache
2023-08-21 19:27:41,186:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab
2023-08-21 19:27:41,186:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ...
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-21 19:27:41,199:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-21 19:27:41,200:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-21 19:27:41,228:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 Duplicates results: 2023-08-21 19:27:41,228:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:305 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-21 19:27:41,229:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:313 Loading cached general stats 2023-08-21 19:27:41,796:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:348 No label field. Not computing label statistics. 2023-08-21 19:28:32,224:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-08-21 19:28:32,227:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-08-21 19:28:32,227:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-08-21 19:28:32,227:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset.
2023-08-21 19:28:32,229:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk
2023-08-21 19:28:32,229:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 })
2023-08-21 19:28:32,277:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache
2023-08-21 19:28:32,324:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab
2023-08-21 19:28:32,324:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ...
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-21 19:28:32,334:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-21 19:28:32,335:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-21 19:28:32,361:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 Duplicates results: 2023-08-21 19:28:32,361:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:305 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-21 19:28:32,361:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:313 Loading cached general stats 2023-08-21 19:28:32,913:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:348 No label field. Not computing label statistics. 2023-08-21 19:31:57,548:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-08-21 19:31:57,552:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-08-21 19:31:57,552:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-21 19:31:57,552:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-08-21 19:31:57,554:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-08-21 19:31:57,554:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-21 19:31:57,615:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-21 19:31:57,663:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-21 19:31:57,663:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-21 19:31:57,676:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-21 19:31:57,676:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-21 19:31:57,701:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 Duplicates results: 2023-08-21 19:31:57,701:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:305 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-21 19:31:57,702:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:313 Loading cached general stats 2023-08-21 19:31:58,272:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:348 No label field. Not computing label statistics. 2023-08-21 19:34:59,322:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-08-21 19:34:59,328:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-08-21 19:34:59,328:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-21 19:34:59,330:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-08-21 19:34:59,332:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-08-21 19:34:59,332:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-21 19:34:59,390:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-21 19:34:59,449:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-21 19:34:59,449:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-21 19:34:59,462:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-21 19:34:59,462:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-21 19:34:59,487:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 Duplicates results: 2023-08-21 19:34:59,488:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:305 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-21 19:34:59,488:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:313 Loading cached general stats 2023-08-21 19:35:00,040:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:348 No label field. Not computing label statistics. 2023-08-21 19:36:13,831:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-08-21 19:36:13,836:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-08-21 19:36:13,836:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-21 19:36:13,837:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-08-21 19:36:13,840:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-08-21 19:36:13,840:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-21 19:36:13,938:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-21 19:36:14,620:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-21 19:36:14,620:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-21 19:36:14,699:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-21 19:36:14,699:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-21 19:36:14,798:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 Duplicates results: 2023-08-21 19:36:14,799:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:305 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-21 19:36:14,806:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:313 Loading cached general stats 2023-08-21 19:36:15,761:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:348 No label field. Not computing label statistics. 2023-08-21 19:37:01,705:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-08-21 19:37:01,708:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-08-21 19:37:01,708:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-21 19:37:01,709:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-08-21 19:37:01,711:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-08-21 19:37:01,711:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-21 19:37:01,764:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-21 19:37:01,807:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-21 19:37:01,807:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-21 19:37:01,816:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-21 19:37:01,816:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-21 19:37:01,833:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 Duplicates results: 2023-08-21 19:37:01,833:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:305 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-21 19:37:01,834:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:313 Loading cached general stats 2023-08-21 19:37:02,385:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:348 No label field. Not computing label statistics. 2023-08-21 19:39:30,070:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-08-21 19:39:30,073:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-08-21 19:39:30,073:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-21 19:39:30,075:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-08-21 19:39:30,076:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-08-21 19:39:30,076:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-21 19:39:30,331:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-21 19:39:30,423:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-21 19:39:30,423:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-21 19:39:30,447:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-21 19:39:30,448:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-21 19:39:30,502:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 Duplicates results: 2023-08-21 19:39:30,502:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:305 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-21 19:39:30,503:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:313 Loading cached general stats 2023-08-21 19:39:31,593:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:348 No label field. Not computing label statistics. 2023-08-21 19:39:32,861:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-08-21 19:39:32,863:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-08-21 19:39:32,863:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-21 19:39:32,865:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-08-21 19:39:32,873:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-08-21 19:39:32,873:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-21 19:39:32,961:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-21 19:39:33,032:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-21 19:39:33,033:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-21 19:39:33,045:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-21 19:39:33,046:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-21 19:39:33,077:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 Duplicates results: 2023-08-21 19:39:33,079:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:305 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-21 19:39:33,108:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:313 Loading cached general stats 2023-08-21 19:39:33,806:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:348 No label field. Not computing label statistics. 2023-08-21 19:47:56,802:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 
2023-08-21 19:47:56,805:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-08-21 19:47:56,805:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-21 19:47:56,807:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:254 Doing text dset. 2023-08-21 19:47:56,810:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:258 Loaded dataset from disk 2023-08-21 19:47:56,811:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:259 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-21 19:47:56,879:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-21 19:47:56,938:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-21 19:47:56,938:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-21 19:47:56,951:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-21 19:47:56,951:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-21 19:47:56,977:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 Duplicates results: 2023-08-21 19:47:56,977:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:305 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-21 19:47:56,978:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:313 Loading cached general stats 2023-08-21 19:47:57,544:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:348 No label field. Not computing label statistics. 2023-08-21 19:49:22,164:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-21 19:49:22,169:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-21 19:49:22,169:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-21 19:49:22,170:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-21 19:49:22,172:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-21 19:49:22,172:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-21 19:49:22,245:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-08-21 19:49:22,292:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-08-21 19:49:22,292:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-21 19:49:22,305:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-08-21 19:49:22,305:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-21 19:49:22,330:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:302 Duplicates results: 2023-08-21 19:49:22,331:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-21 19:49:22,331:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:311 Loading cached general stats 2023-08-21 19:53:59,231:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-21 19:53:59,234:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-21 19:53:59,234:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-21 19:53:59,235:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-21 19:53:59,237:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-21 19:53:59,237:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-21 19:53:59,322:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-08-21 19:53:59,371:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-08-21 19:53:59,371:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-21 19:53:59,381:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-08-21 19:53:59,381:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-21 19:53:59,408:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:302 Duplicates results: 2023-08-21 19:53:59,408:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-21 19:53:59,413:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:311 Loading cached general stats 2023-08-21 19:58:45,178:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-21 19:58:45,182:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-21 19:58:45,182:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-21 19:58:45,183:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-21 19:58:45,184:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-21 19:58:45,185:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-21 19:58:45,234:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-08-21 19:58:45,277:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-08-21 19:58:45,277:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-21 19:58:45,291:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-08-21 19:58:45,291:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-21 19:58:45,311:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:302 Duplicates results: 2023-08-21 19:58:45,311:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-21 19:58:45,311:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:311 Loading cached general stats 2023-08-21 20:03:41,124:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-21 20:03:41,133:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-21 20:03:41,133:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-21 20:03:41,134:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-21 20:03:41,136:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-21 20:03:41,136:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-21 20:03:41,188:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:364 Reading vocab from cache 2023-08-21 20:03:41,238:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:379 unfiltered vocab 2023-08-21 20:03:41,238:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-21 20:03:41,251:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 filtered vocab 2023-08-21 20:03:41,251:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-21 20:03:41,273:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:302 Duplicates results: 2023-08-21 20:03:41,273:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-21 20:03:41,274:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:311 Loading cached general stats 2023-08-22 00:27:30,025:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 00:27:30,026:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:268 Working with dataset:
2023-08-22 00:27:30,026:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:269 Dataset({ features: ['images', 'metadata', 'general_metadata', 'texts'], num_rows: 10 })
2023-08-22 00:33:14,888:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset.
2023-08-22 00:33:14,889:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:268 Working with dataset:
2023-08-22 00:33:14,889:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:269 Dataset({ features: ['images', 'metadata', 'general_metadata', 'texts'], num_rows: 100 })
2023-08-22 00:41:40,030:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset.
2023-08-22 00:41:40,030:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:268 Working with dataset:
2023-08-22 00:41:40,030:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:269 Dataset({ features: ['images', 'metadata', 'general_metadata', 'texts'], num_rows: 10000 })
2023-08-22 02:25:51,665:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset.
2023-08-22 02:25:51,666:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:268 Working with dataset:
2023-08-22 02:25:51,666:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:269 Dataset({ features: ['images', 'metadata', 'general_metadata', 'texts'], num_rows: 100 })
2023-08-22 02:28:44,343:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset.
2023-08-22 02:28:44,343:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:268 Working with dataset:
2023-08-22 02:28:44,343:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:269 Dataset({ features: ['images', 'metadata', 'general_metadata', 'texts'], num_rows: 100 })
2023-08-22 02:28:44,373:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:264 Saving dataset to disk
2023-08-22 02:28:44,477:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:477 tokenized df is
2023-08-22 02:28:44,477:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:478
                                                  text                                     tokenized_text
0    One of the essential, if often unstated, job r...  [one, of, the, essential, if, often, unstated,...
1    The buyer would get everything, including Lamb...  [the, buyer, would, get, everything, including...
2    The agriculture sector as the main source of i...  [the, agriculture, sector, as, the, main, sour...
3    management factors influencing open innovation...  [management, factors, influencing, open, innov...
4    Henry Repeating Arms is proud to announce this...  [henry, repeating, arms, is, proud, to, announ...
..                                                 ...                                                ...
193  The ‘Theory of Everything’ actor admitted he’s...  [the, theory, of, everything, actor, admitted,...
194  Mia was always my favorite. Oh, I know parents...  [mia, was, always, my, favorite, oh, i, know, ...
195  Hacienda, or the Spanish taxman to you and m...    [hacienda, or, the, spanish, taxman, to, you, ...
196  As mentioned in the report, ESG Kullen offered...  [as, mentioned, in, the, report, esg, kullen, ...
197  A professional botanist and biologist with an ...  [a, professional, botanist, and, biologist, wi...
[198 rows x 2 columns]
2023-08-22 02:28:44,500:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:480 Saving tokenized dataset to disk
2023-08-22 02:28:44,521:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:372 Calculating vocab afresh
2023-08-22 02:28:44,521:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:540 Fitting dummy tokenization to make matrix using the previous tokenization
2023-08-22 02:28:44,566:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 0 of 2000 vocab batches
2023-08-22 02:28:44,579:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 100 of 2000 vocab batches
2023-08-22 02:28:44,589:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 200 of 2000 vocab batches
2023-08-22 02:28:44,597:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 300 of 2000 vocab batches
2023-08-22 02:28:44,607:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 400 of 2000 vocab batches
2023-08-22 02:28:44,616:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 500 of 2000 vocab batches
2023-08-22 02:28:44,625:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 600 of 2000 vocab batches
2023-08-22 02:28:44,635:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 700 of 2000 vocab batches
2023-08-22 02:28:44,645:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 800 of 2000 vocab batches
2023-08-22 02:28:44,655:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 900 of 2000 vocab batches
2023-08-22 02:28:44,665:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1000 of 2000 vocab batches
2023-08-22 02:28:44,674:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1100 of 2000 vocab batches
2023-08-22 02:28:44,684:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1200 of 2000 vocab batches
2023-08-22 02:28:44,693:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1300 of 2000 vocab batches
2023-08-22 02:28:44,703:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1400 of 2000 vocab batches
2023-08-22 02:28:44,712:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1500 of 2000 vocab batches
2023-08-22 02:28:44,722:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1600 of 2000 vocab batches
2023-08-22 02:28:44,731:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1700 of 2000 vocab batches
2023-08-22 02:28:44,740:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1800 of 2000 vocab batches
2023-08-22 02:28:44,749:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1900 of 2000 vocab batches
2023-08-22 02:31:19,131:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset.
2023-08-22 02:31:19,134:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:268 Working with dataset:
2023-08-22 02:31:19,134:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:269 Dataset({ features: ['images', 'metadata', 'general_metadata', 'texts'], num_rows: 100 })
2023-08-22 02:31:19,149:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:264 Saving dataset to disk
2023-08-22 02:31:19,257:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:477 tokenized df is
2023-08-22 02:31:19,257:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:478
                                                  text                                     tokenized_text
0    One of the essential, if often unstated, job r...  [one, of, the, essential, if, often, unstated,...
1    The buyer would get everything, including Lamb...  [the, buyer, would, get, everything, including...
2    The agriculture sector as the main source of i...  [the, agriculture, sector, as, the, main, sour...
3    management factors influencing open innovation...  [management, factors, influencing, open, innov...
4    Henry Repeating Arms is proud to announce this...  [henry, repeating, arms, is, proud, to, announ...
..                                                 ...                                                ...
193  The ‘Theory of Everything’ actor admitted he’s...  [the, theory, of, everything, actor, admitted,...
194  Mia was always my favorite. Oh, I know parents...  [mia, was, always, my, favorite, oh, i, know, ...
195  Hacienda, or the Spanish taxman to you and m...    [hacienda, or, the, spanish, taxman, to, you, ...
196  As mentioned in the report, ESG Kullen offered...  [as, mentioned, in, the, report, esg, kullen, ...
197  A professional botanist and biologist with an ...  [a, professional, botanist, and, biologist, wi...
[198 rows x 2 columns]
2023-08-22 02:31:19,280:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:480 Saving tokenized dataset to disk
2023-08-22 02:31:19,299:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:372 Calculating vocab afresh
2023-08-22 02:31:19,299:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:540 Fitting dummy tokenization to make matrix using the previous tokenization
2023-08-22 02:31:19,340:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 0 of 2000 vocab batches
2023-08-22 02:31:19,348:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 100 of 2000 vocab batches
2023-08-22 02:31:19,356:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 200 of 2000 vocab batches
2023-08-22 02:31:19,363:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 300 of 2000 vocab batches
2023-08-22 02:31:19,371:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 400 of 2000 vocab batches
2023-08-22 02:31:19,379:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 500 of 2000 vocab batches
2023-08-22 02:31:19,388:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 600 of 2000 vocab batches
2023-08-22 02:31:19,396:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 700 of 2000 vocab batches
2023-08-22 02:31:19,405:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 800 of 2000 vocab batches
2023-08-22 02:31:19,413:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 900 of 2000 vocab batches
2023-08-22 02:31:19,421:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1000 of 2000 vocab batches
2023-08-22 02:31:19,430:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1100 of 2000 vocab batches
2023-08-22 02:31:19,438:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1200 of 2000 vocab batches
2023-08-22 02:31:19,447:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1300 of 2000 vocab batches
2023-08-22 02:31:19,455:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1400 of 2000 vocab batches
2023-08-22 02:31:19,463:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1500 of 2000 vocab batches
2023-08-22 02:31:19,471:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1600 of 2000 vocab batches
2023-08-22 02:31:19,480:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1700 of 2000 vocab batches
2023-08-22 02:31:19,488:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1800 of 2000 vocab batches
2023-08-22 02:31:19,499:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1900 of 2000 vocab batches
2023-08-22 03:07:41,747:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset.
2023-08-22 03:07:41,764:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk
2023-08-22 03:07:41,767:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 })
2023-08-22 03:07:41,768:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset.
2023-08-22 03:07:41,788:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:07:41,788:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-22 03:07:41,853:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 03:07:41,902:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 03:07:41,908:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-22 03:07:41,933:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 03:07:41,952:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-22 03:07:41,974:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 03:07:41,974:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 
'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY 
EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-22 03:07:41,974:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 03:14:59,488:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 03:14:59,493:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:14:59,493:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-22 03:14:59,494:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 03:14:59,496:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:14:59,496:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-22 03:14:59,542:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 03:14:59,594:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 03:14:59,594:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-22 03:14:59,608:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 03:14:59,608:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-22 03:14:59,686:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 03:14:59,686:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 
'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY 
EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-22 03:14:59,688:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 03:22:20,665:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 03:22:20,667:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:22:20,668:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-22 03:22:20,669:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 03:22:20,671:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:22:20,672:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-22 03:22:20,778:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 03:22:20,836:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 03:22:20,836:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-22 03:22:20,844:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 03:22:20,844:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-22 03:22:20,862:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 03:22:20,862:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 
'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY 
EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-22 03:22:20,867:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 03:22:41,808:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 03:22:41,812:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:22:41,812:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-22 03:22:41,813:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 03:22:41,815:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:22:41,815:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-22 03:22:41,926:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 03:22:41,972:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 03:22:41,972:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-22 03:22:41,980:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 03:22:41,980:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-22 03:22:42,002:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 03:22:42,002:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 
'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY 
EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-22 03:22:42,007:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 03:23:39,801:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 03:23:39,802:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:268 Working with dataset: 2023-08-22 03:23:39,802:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:269 Dataset({ features: ['images', 'metadata', 'general_metadata', 'texts'], num_rows: 100 }) 2023-08-22 03:23:39,830:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:264 Saving dataset to disk 2023-08-22 03:23:39,839:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 03:23:39,840:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk
2023-08-22 03:23:39,840:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 197 })
2023-08-22 03:23:39,922:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:477 tokenized df is
2023-08-22 03:23:39,923:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:478
     text                                                tokenized_text
0    One of the essential, if often unstated, job r...  [one, of, the, essential, if, often, unstated,...
1    The buyer would get everything, including Lamb...  [the, buyer, would, get, everything, including...
2    The agriculture sector as the main source of i...  [the, agriculture, sector, as, the, main, sour...
3    management factors influencing open innovation...  [management, factors, influencing, open, innov...
4    Henry Repeating Arms is proud to announce this...  [henry, repeating, arms, is, proud, to, announ...
..   ...                                                 ...
192  The long-lost wolf OR3, once thought dead, has...  [the, long, lost, wolf, or3, once, thought, de...
193  The wolf OR3 has been spotted in the Southern ...  [the, wolf, or3, has, been, spotted, in, the, ...
194  Wolf OR7 has pups in Siskiyou National Forest\...  [wolf, or7, has, pups, in, siskiyou, national,...
195  When the wolves return to Western Oregon\n\nUn...  [when, the, wolves, return, to, western, orego...
196  Boeing has rolled out its 10,000th 737 aircraf...  [boeing, has, rolled, out, its, 10, 000th, 737...
[197 rows x 2 columns]
2023-08-22 03:23:39,937:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:480 Saving tokenized dataset to disk
2023-08-22 03:23:39,956:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:372 Calculating vocab afresh
2023-08-22 03:23:39,956:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:540 Fitting dummy tokenization to make matrix using the previous tokenization
2023-08-22 03:23:39,998:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 0 of 2000 vocab batches
2023-08-22 03:23:40,007:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 100 of 2000 vocab batches
2023-08-22 03:23:40,014:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 200 of 2000 vocab batches
2023-08-22 03:23:40,021:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 300 of 2000 vocab batches
2023-08-22 03:23:40,027:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 400 of 2000 vocab batches
2023-08-22 03:23:40,036:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 500 of 2000 vocab batches
2023-08-22 03:23:40,045:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 600 of 2000 vocab batches
2023-08-22 03:23:40,053:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 700 of 2000 vocab batches
2023-08-22 03:23:40,060:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 800 of 2000 vocab batches
2023-08-22 03:23:40,068:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 900 of 2000 vocab batches
2023-08-22 03:23:40,075:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1000 of 2000 vocab batches
2023-08-22 03:23:40,085:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1100 of 2000 vocab batches
2023-08-22 03:23:40,092:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1200 of 2000 vocab batches
2023-08-22 03:23:40,104:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1300 of 2000 vocab batches
2023-08-22 03:23:40,112:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1400 of 2000 vocab batches
2023-08-22 03:23:40,118:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1500 of 2000 vocab batches
2023-08-22 03:23:40,126:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1600 of 2000 vocab batches
2023-08-22 03:23:40,134:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1700 of 2000 vocab batches
2023-08-22 03:23:40,141:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1800 of 2000 vocab batches
2023-08-22 03:23:40,149:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1900 of 2000 vocab batches
2023-08-22 03:23:40,339:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:374 Making dfs with proportion.
2023-08-22 03:23:40,344:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:378 Writing out.
2023-08-22 03:23:40,402:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab
2023-08-22 03:23:40,402:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381
            count  proportion  vocab
word
the          3279       0.061  the
of           1582       0.029  of
and          1524       0.028  and
to           1439       0.027  to
a            1189       0.022  a
...           ...         ...  ...
heartening      1       0.000  heartening
healthcare      1       0.000  healthcare
healing         1       0.000  healing
heady           1       0.000  heady
豆漿            1       0.000  豆漿
[9145 rows x 3 columns]
2023-08-22 03:23:40,407:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab
2023-08-22 03:23:40,407:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383
            count  proportion  vocab
word
one           139       0.005  one
people         99       0.003  people
said           92       0.003  said
new            90       0.003  new
also           86       0.003  also
...           ...         ...  ...
heartening      1       0.000  heartening
healthcare      1       0.000  healthcare
healing         1       0.000  healing
heady           1       0.000  heady
豆漿            1       0.000  豆漿
[8923 rows x 3 columns]
2023-08-22 03:23:42,378:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results:
2023-08-22 03:23:42,378:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.0}
2023-08-22 03:23:42,378:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:315 Preparing general stats
2023-08-22 03:25:10,822:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset.
2023-08-22 03:25:10,822:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:268 Working with dataset: 2023-08-22 03:25:10,822:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:269 Dataset({ features: ['images', 'metadata', 'general_metadata', 'texts'], num_rows: 100 }) 2023-08-22 03:25:10,838:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:264 Saving dataset to disk 2023-08-22 03:25:10,842:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 03:25:10,843:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:25:10,844:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 248 }) 2023-08-22 03:25:10,866:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:477 tokenized df is 2023-08-22 03:25:10,866:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:478 text tokenized_text 0 https://eppc.org/wp-content/uploads/2017/07/sh... [https, eppc, org, wp, content, uploads, 2017,... 1 https://cdn.motor1.com/images/mgl/oRKO0/s1/lam... [https, cdn, motor1, com, images, mgl, orko0, ... 2 https://slidelegend.com/img/60x80/management-f... [https, slidelegend, com, img, 60x80, manageme... 3 https://www.henryusa.com/wp-content/uploads/20... 
[https, www, henryusa, com, wp, content, uploa... 4 https://www.henryusa.com/wp-content/uploads/20... [https, www, henryusa, com, wp, content, uploa... .. ... ... 243 http://www.gannett-cdn.com/-mm-/9065941e142eb7... [http, www, gannett, cdn, com, mm, 9065941e142... 244 https://cdn.businesstraveller.com/wp-content/u... [https, cdn, businesstraveller, com, wp, conte... 245 https://cdn.businesstraveller.com/wp-content/u... [https, cdn, businesstraveller, com, wp, conte... 246 https://cdn.businesstraveller.com/wp-content/u... [https, cdn, businesstraveller, com, wp, conte... 247 https://cdn.businesstraveller.com/wp-content/u... [https, cdn, businesstraveller, com, wp, conte... [248 rows x 2 columns] 2023-08-22 03:25:10,876:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:480 Saving tokenized dataset to disk 2023-08-22 03:25:10,879:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:372 Calculating vocab afresh 2023-08-22 03:25:10,880:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:540 Fitting dummy tokenization to make matrix using the previous tokenization 2023-08-22 03:25:10,885:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 0 of 2000 vocab batches 2023-08-22 03:25:10,890:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 100 of 2000 vocab batches 2023-08-22 03:25:10,896:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 200 of 2000 vocab batches 2023-08-22 
03:25:10,900:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 300 of 2000 vocab batches 2023-08-22 03:25:10,905:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 400 of 2000 vocab batches 2023-08-22 03:25:10,910:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 500 of 2000 vocab batches 2023-08-22 03:25:10,915:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 600 of 2000 vocab batches 2023-08-22 03:25:10,920:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 700 of 2000 vocab batches 2023-08-22 03:25:10,926:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 800 of 2000 vocab batches 2023-08-22 03:25:10,930:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 900 of 2000 vocab batches 2023-08-22 03:25:10,936:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1000 of 2000 vocab batches 2023-08-22 03:25:10,942:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1100 of 2000 vocab batches 2023-08-22 03:25:10,947:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1200 of 2000 vocab batches 2023-08-22 
03:25:10,952:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1300 of 2000 vocab batches 2023-08-22 03:25:10,959:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1400 of 2000 vocab batches 2023-08-22 03:25:10,964:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1500 of 2000 vocab batches 2023-08-22 03:25:10,970:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1600 of 2000 vocab batches 2023-08-22 03:25:10,976:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1700 of 2000 vocab batches 2023-08-22 03:25:10,981:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1800 of 2000 vocab batches 2023-08-22 03:25:10,988:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1900 of 2000 vocab batches 2023-08-22 03:25:11,021:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:374 Making dfs with proportion. 2023-08-22 03:25:11,025:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:378 Writing out. 
2023-08-22 03:25:11,035:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 03:25:11,035:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab word com 228 0.069 com https 217 0.066 https jpg 192 0.058 jpg content 87 0.026 content files 82 0.025 files ... ... ... ... filters 1 0.000 filters fire 1 0.000 fire first 1 0.000 first 360x202 1 0.000 360x202 zwillgen 1 0.000 zwillgen [984 rows x 3 columns] 2023-08-22 03:25:11,042:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 03:25:11,042:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab word com 228 0.074 com https 217 0.070 https jpg 192 0.062 jpg content 87 0.028 content files 82 0.027 files ... ... ... ... 
filters 1 0.000 filters fire 1 0.000 fire first 1 0.000 first 360x202 1 0.000 360x202 zwillgen 1 0.000 zwillgen [926 rows x 3 columns] 2023-08-22 03:25:13,186:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 03:25:13,186:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.0} 2023-08-22 03:25:13,186:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:315 Preparing general stats 2023-08-22 03:25:56,596:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 03:25:56,599:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:25:56,599:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-22 03:25:56,599:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 03:25:56,601:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:25:56,602:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-22 03:25:57,094:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 03:25:57,136:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 03:25:57,136:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-22 03:25:57,142:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 03:25:57,142:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-22 03:25:57,161:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 03:25:57,161:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 
'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY 
EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-22 03:25:57,165:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 03:26:11,107:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 03:26:11,122:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:26:11,122:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 198 }) 2023-08-22 03:26:11,126:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 03:26:11,128:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:26:11,129:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 198 }) 2023-08-22 03:26:11,179:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 03:26:11,219:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 03:26:11,219:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 3243 0.061 the of 1571 0.030 of and 1519 0.029 and to 1419 0.027 to a 1187 0.022 a ... ... ... ... headache 1 0.000 headache hd 1 0.000 hd hazelnut 1 0.000 hazelnut hazards 1 0.000 hazards 豆漿 1 0.000 豆漿 [9083 rows x 3 columns] 2023-08-22 03:26:11,227:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 03:26:11,227:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab one 139 0.005 one new 97 0.003 new people 95 0.003 people said 91 0.003 said also 87 0.003 also ... ... ... ... 
headache 1 0.000 headache hd 1 0.000 hd hazelnut 1 0.000 hazelnut hazards 1 0.000 hazards 豆漿 1 0.000 豆漿 [8862 rows x 3 columns] 2023-08-22 03:26:11,250:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 03:26:11,250:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.0, 'duplicates_dict': {}} 2023-08-22 03:26:11,251:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 03:26:14,412:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 03:26:14,415:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:26:14,415:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 248 }) 2023-08-22 03:26:14,416:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 03:26:14,418:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:26:14,418:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 248 }) 2023-08-22 03:26:14,424:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 03:26:14,428:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 03:26:14,428:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab com 228 0.069 com https 217 0.066 https jpg 192 0.058 jpg content 87 0.026 content files 82 0.025 files ... ... ... ... filters 1 0.000 filters fire 1 0.000 fire first 1 0.000 first 360x202 1 0.000 360x202 zwillgen 1 0.000 zwillgen [984 rows x 3 columns] 2023-08-22 03:26:14,434:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 03:26:14,435:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab com 228 0.074 com https 217 0.070 https jpg 192 0.062 jpg content 87 0.028 content files 82 0.027 files ... ... ... ... 
filters 1 0.000 filters fire 1 0.000 fire first 1 0.000 first 360x202 1 0.000 360x202 zwillgen 1 0.000 zwillgen [926 rows x 3 columns] 2023-08-22 03:26:14,441:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 03:26:14,441:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.0, 'duplicates_dict': {}} 2023-08-22 03:26:14,441:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 03:26:26,649:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 03:26:26,652:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:26:26,652:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 197 }) 2023-08-22 03:26:26,652:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 03:26:26,654:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:26:26,654:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 197 }) 2023-08-22 03:26:26,662:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 03:26:26,684:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 03:26:26,684:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 3279 0.061 the of 1582 0.029 of and 1524 0.028 and to 1439 0.027 to a 1189 0.022 a ... ... ... ... heartening 1 0.000 heartening healthcare 1 0.000 healthcare healing 1 0.000 healing heady 1 0.000 heady 豆漿 1 0.000 豆漿 [9145 rows x 3 columns] 2023-08-22 03:26:26,689:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 03:26:26,689:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab one 139 0.005 one people 99 0.003 people said 92 0.003 said new 90 0.003 new also 86 0.003 also ... ... ... ... 
heartening 1 0.000 heartening healthcare 1 0.000 healthcare healing 1 0.000 healing heady 1 0.000 heady 豆漿 1 0.000 豆漿 [8923 rows x 3 columns] 2023-08-22 03:26:26,695:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 03:26:26,695:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.0, 'duplicates_dict': {}} 2023-08-22 03:26:26,696:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 03:26:59,926:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 03:26:59,927:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:268 Working with dataset: 2023-08-22 03:26:59,927:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:269 Dataset({ features: ['images', 'metadata', 'general_metadata', 'texts'], num_rows: 100 }) 2023-08-22 03:26:59,940:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:264 Saving dataset to disk 2023-08-22 03:26:59,944:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 03:26:59,945:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:26:59,946:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 100 }) 2023-08-22 03:26:59,964:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:477 tokenized df is 2023-08-22 03:26:59,964:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:478 text tokenized_text 0 {"url": "https://eppc.org/publication/declarat... [url, https, eppc, org, publication, declarati... 1 {"url": "https://lamborghinichat.com/forum/new... [url, https, lamborghinichat, com, forum, news... 2 {"url": "https://slidelegend.com/the-influenci... [url, https, slidelegend, com, the, influencin... 3 {"url": "https://www.henryusa.com/flexforkal/"... [url, https, www, henryusa, com, flexforkal, w... 4 {"url": "https://www.arout.net/hotel-occupancy... [url, https, www, arout, net, hotel, occupancy... .. ... ... 95 {"url": "https://feifa.eu/spectacular-tax-savi... [url, https, feifa, eu, spectacular, tax, savi... 96 {"url": "http://blogs.reading.ac.uk/crg/advent... [url, http, blogs, reading, ac, uk, crg, adven... 97 {"url": "https://www.wired.com/2009/11/alt-tex... [url, https, www, wired, com, 2009, 11, alt, t... 98 {"url": "https://www.statesmanjournal.com/stor... [url, https, www, statesmanjournal, com, story... 99 {"url": "https://www.businesstraveller.com/bus... [url, https, www, businesstraveller, com, busi... 
[100 rows x 2 columns] 2023-08-22 03:26:59,975:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:480 Saving tokenized dataset to disk 2023-08-22 03:26:59,977:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:372 Calculating vocab afresh 2023-08-22 03:26:59,977:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:540 Fitting dummy tokenization to make matrix using the previous tokenization 2023-08-22 03:26:59,983:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 0 of 2000 vocab batches 2023-08-22 03:26:59,988:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 100 of 2000 vocab batches 2023-08-22 03:26:59,993:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 200 of 2000 vocab batches 2023-08-22 03:26:59,999:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 300 of 2000 vocab batches 2023-08-22 03:27:00,003:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 400 of 2000 vocab batches 2023-08-22 03:27:00,009:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 500 of 2000 vocab batches 2023-08-22 03:27:00,015:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 600 of 
2000 vocab batches 2023-08-22 03:27:00,023:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 700 of 2000 vocab batches 2023-08-22 03:27:00,030:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 800 of 2000 vocab batches 2023-08-22 03:27:00,035:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 900 of 2000 vocab batches 2023-08-22 03:27:00,042:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1000 of 2000 vocab batches 2023-08-22 03:27:00,048:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1100 of 2000 vocab batches 2023-08-22 03:27:00,053:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1200 of 2000 vocab batches 2023-08-22 03:27:00,059:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1300 of 2000 vocab batches 2023-08-22 03:27:00,065:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1400 of 2000 vocab batches 2023-08-22 03:27:00,073:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1500 of 2000 vocab batches 2023-08-22 03:27:00,079:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1600 of 2000 vocab batches 2023-08-22 
03:27:00,084:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1700 of 2000 vocab batches 2023-08-22 03:27:00,090:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1800 of 2000 vocab batches 2023-08-22 03:27:00,095:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1900 of 2000 vocab batches 2023-08-22 03:27:00,133:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:374 Making dfs with proportion. 2023-08-22 03:27:00,137:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:378 Writing out. 2023-08-22 03:27:00,146:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 03:27:00,146:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab word main 200 0.055 main warc 200 0.055 warc cc 200 0.055 cc data 100 0.028 data warc_filename 100 0.028 warc_filename ... ... ... ... 
20230209014102 1 0.000 20230209014102 20230208122053 1 0.000 20230208122053 20230208092053 1 0.000 20230208092053 20230208090523 1 0.000 20230208090523 zwillgen 1 0.000 zwillgen [1465 rows x 3 columns] 2023-08-22 03:27:00,152:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 03:27:00,152:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab word main 200 0.061 main warc 200 0.061 warc cc 200 0.061 cc data 100 0.031 data warc_filename 100 0.031 warc_filename ... ... ... ... 20230209014102 1 0.000 20230209014102 20230208122053 1 0.000 20230208122053 20230208092053 1 0.000 20230208092053 20230208090523 1 0.000 20230208090523 zwillgen 1 0.000 zwillgen [1344 rows x 3 columns] 2023-08-22 03:27:03,671:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 03:27:03,671:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.0} 2023-08-22 03:27:03,671:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:315 Preparing general stats 2023-08-22 03:27:39,794:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 03:27:39,824:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:268 Working with dataset: 2023-08-22 03:27:39,824:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:269 Dataset({ features: ['images', 'metadata', 'general_metadata', 'texts'], num_rows: 100 }) 2023-08-22 03:27:39,875:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:264 Saving dataset to disk 2023-08-22 03:27:39,889:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 03:27:39,891:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:27:39,891:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 247 }) 2023-08-22 03:27:39,923:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:477 tokenized df is 2023-08-22 03:27:39,923:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:478 text tokenized_text 0 https://eppc.org/wp-content/uploads/2017/07/sh... [https, eppc, org, wp, content, uploads, 2017,... 1 https://cdn.motor1.com/images/mgl/oRKO0/s1/lam... [https, cdn, motor1, com, images, mgl, orko0, ... 2 https://slidelegend.com/img/60x80/management-f... [https, slidelegend, com, img, 60x80, manageme... 3 https://www.henryusa.com/wp-content/uploads/20... 
[https, www, henryusa, com, wp, content, uploa... 4 https://www.henryusa.com/wp-content/uploads/20... [https, www, henryusa, com, wp, content, uploa... .. ... ... 242 https://www.saturdayeveningpost.com/wp-content... [https, www, saturdayeveningpost, com, wp, con... 243 http://feifa.eu/wp-content/uploads/2017/11/pig... [http, feifa, eu, wp, content, uploads, 2017, ... 244 https://www.chicagocondofinder.com/uploads/age... [https, www, chicagocondofinder, com, uploads,... 245 http://blogs.reading.ac.uk/crg/files/2015/12/M... [http, blogs, reading, ac, uk, crg, files, 201... 246 http://blogs.reading.ac.uk/crg/files/2015/12/M... [http, blogs, reading, ac, uk, crg, files, 201... [247 rows x 2 columns] 2023-08-22 03:27:39,935:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:480 Saving tokenized dataset to disk 2023-08-22 03:27:39,939:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:372 Calculating vocab afresh 2023-08-22 03:27:39,939:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:540 Fitting dummy tokenization to make matrix using the previous tokenization 2023-08-22 03:27:39,945:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 0 of 2000 vocab batches 2023-08-22 03:27:39,950:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 100 of 2000 vocab batches 2023-08-22 03:27:39,956:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 200 of 2000 vocab batches 2023-08-22 
03:27:39,962:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 300 of 2000 vocab batches 2023-08-22 03:27:39,968:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 400 of 2000 vocab batches 2023-08-22 03:27:39,973:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 500 of 2000 vocab batches 2023-08-22 03:27:39,978:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 600 of 2000 vocab batches 2023-08-22 03:27:39,985:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 700 of 2000 vocab batches 2023-08-22 03:27:39,990:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 800 of 2000 vocab batches 2023-08-22 03:27:39,995:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 900 of 2000 vocab batches 2023-08-22 03:27:40,001:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1000 of 2000 vocab batches 2023-08-22 03:27:40,007:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1100 of 2000 vocab batches 2023-08-22 03:27:40,012:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1200 of 2000 vocab batches 2023-08-22 
03:27:40,018:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1300 of 2000 vocab batches 2023-08-22 03:27:40,024:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1400 of 2000 vocab batches 2023-08-22 03:27:40,031:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1500 of 2000 vocab batches 2023-08-22 03:27:40,041:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1600 of 2000 vocab batches 2023-08-22 03:27:40,048:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1700 of 2000 vocab batches 2023-08-22 03:27:40,057:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1800 of 2000 vocab batches 2023-08-22 03:27:40,065:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1900 of 2000 vocab batches 2023-08-22 03:27:40,090:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:374 Making dfs with proportion. 2023-08-22 03:27:40,093:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:378 Writing out. 
2023-08-22 03:27:40,099:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 03:27:40,099:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab word https 218 0.068 https com 217 0.068 com jpg 194 0.061 jpg content 86 0.027 content files 82 0.026 files ... ... ... ... 001 1 0.000 001 gaga 1 0.000 gaga gamecocksonline 1 0.000 gamecocksonline 41 1 0.000 41 zwillgen 1 0.000 zwillgen [949 rows x 3 columns] 2023-08-22 03:27:40,105:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 03:27:40,105:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab word https 218 0.073 https com 217 0.073 com jpg 194 0.065 jpg content 86 0.029 content files 82 0.027 files ... ... ... ... 
ftrct1nv 1 0.000 ftrct1nv 001 1 0.000 001 gaga 1 0.000 gaga gamecocksonline 1 0.000 gamecocksonline zwillgen 1 0.000 zwillgen [894 rows x 3 columns] 2023-08-22 03:27:42,117:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 03:27:42,117:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.0} 2023-08-22 03:27:42,117:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:315 Preparing general stats 2023-08-22 03:28:00,868:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 03:28:00,876:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:28:00,877:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 248 }) 2023-08-22 03:28:00,879:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 03:28:00,881:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:28:00,882:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 248 }) 2023-08-22 03:28:00,890:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 03:28:00,898:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 03:28:00,899:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab com 228 0.069 com https 217 0.066 https jpg 192 0.058 jpg content 87 0.026 content files 82 0.025 files ... ... ... ... filters 1 0.000 filters fire 1 0.000 fire first 1 0.000 first 360x202 1 0.000 360x202 zwillgen 1 0.000 zwillgen [984 rows x 3 columns] 2023-08-22 03:28:00,906:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 03:28:00,906:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab com 228 0.074 com https 217 0.070 https jpg 192 0.062 jpg content 87 0.028 content files 82 0.027 files ... ... ... ... 
filters 1 0.000 filters fire 1 0.000 fire first 1 0.000 first 360x202 1 0.000 360x202 zwillgen 1 0.000 zwillgen [926 rows x 3 columns] 2023-08-22 03:28:00,920:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 03:28:00,920:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.0, 'duplicates_dict': {}} 2023-08-22 03:28:00,920:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 03:28:57,738:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 03:28:57,753:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:28:57,753:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-22 03:28:57,754:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 03:28:57,756:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:28:57,756:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-22 03:28:57,822:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 03:28:57,869:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 03:28:57,869:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-22 03:28:57,876:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 03:28:57,876:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-22 03:28:57,901:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 03:28:57,901:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 
'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY 
EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-22 03:28:57,901:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 03:29:07,736:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 03:29:07,738:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:29:07,738:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 248 }) 2023-08-22 03:29:07,738:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 03:29:07,740:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:29:07,740:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 248 }) 2023-08-22 03:29:07,743:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 03:29:07,747:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 03:29:07,747:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab com 228 0.069 com https 217 0.066 https jpg 192 0.058 jpg content 87 0.026 content files 82 0.025 files ... ... ... ... filters 1 0.000 filters fire 1 0.000 fire first 1 0.000 first 360x202 1 0.000 360x202 zwillgen 1 0.000 zwillgen [984 rows x 3 columns] 2023-08-22 03:29:07,753:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 03:29:07,753:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab com 228 0.074 com https 217 0.070 https jpg 192 0.062 jpg content 87 0.028 content files 82 0.027 files ... ... ... ... 
filters 1 0.000 filters fire 1 0.000 fire first 1 0.000 first 360x202 1 0.000 360x202 zwillgen 1 0.000 zwillgen [926 rows x 3 columns] 2023-08-22 03:29:07,759:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 03:29:07,759:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.0, 'duplicates_dict': {}} 2023-08-22 03:29:07,759:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 03:29:13,372:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 03:29:13,381:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:29:13,381:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 197 }) 2023-08-22 03:29:13,381:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 03:29:13,386:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:29:13,386:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 197 }) 2023-08-22 03:29:13,394:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 03:29:13,417:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 03:29:13,417:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 3279 0.061 the of 1582 0.029 of and 1524 0.028 and to 1439 0.027 to a 1189 0.022 a ... ... ... ... heartening 1 0.000 heartening healthcare 1 0.000 healthcare healing 1 0.000 healing heady 1 0.000 heady 豆漿 1 0.000 豆漿 [9145 rows x 3 columns] 2023-08-22 03:29:13,423:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 03:29:13,423:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab one 139 0.005 one people 99 0.003 people said 92 0.003 said new 90 0.003 new also 86 0.003 also ... ... ... ... 
heartening 1 0.000 heartening healthcare 1 0.000 healthcare healing 1 0.000 healing heady 1 0.000 heady 豆漿 1 0.000 豆漿 [8923 rows x 3 columns] 2023-08-22 03:29:13,431:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 03:29:13,431:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.0, 'duplicates_dict': {}} 2023-08-22 03:29:13,431:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 03:31:36,867:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 03:31:36,872:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:31:36,872:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-22 03:31:36,873:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 03:31:36,877:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:31:36,877:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-22 03:31:37,050:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 03:31:37,125:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 03:31:37,125:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-22 03:31:37,146:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 03:31:37,146:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-22 03:31:37,167:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 03:31:37,167:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 
'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY 
EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-22 03:31:37,168:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 03:31:44,733:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 03:31:44,735:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:31:44,735:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 247 }) 2023-08-22 03:31:44,736:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 03:31:44,740:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:31:44,740:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 247 }) 2023-08-22 03:31:44,750:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 03:31:44,755:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 03:31:44,755:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab https 218 0.068 https com 217 0.068 com jpg 194 0.061 jpg content 86 0.027 content files 82 0.026 files ... ... ... ... 001 1 0.000 001 gaga 1 0.000 gaga gamecocksonline 1 0.000 gamecocksonline 41 1 0.000 41 zwillgen 1 0.000 zwillgen [949 rows x 3 columns] 2023-08-22 03:31:44,760:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 03:31:44,760:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab https 218 0.073 https com 217 0.073 com jpg 194 0.065 jpg content 86 0.029 content files 82 0.027 files ... ... ... ... 
ftrct1nv 1 0.000 ftrct1nv 001 1 0.000 001 gaga 1 0.000 gaga gamecocksonline 1 0.000 gamecocksonline zwillgen 1 0.000 zwillgen [894 rows x 3 columns] 2023-08-22 03:31:44,768:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 03:31:44,768:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.0, 'duplicates_dict': {}} 2023-08-22 03:31:44,768:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 03:31:50,744:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 03:31:50,747:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:31:50,747:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 198 }) 2023-08-22 03:31:50,748:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 03:31:50,752:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:31:50,752:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 198 }) 2023-08-22 03:31:50,775:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 03:31:50,813:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 03:31:50,813:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 3243 0.061 the of 1571 0.030 of and 1519 0.029 and to 1419 0.027 to a 1187 0.022 a ... ... ... ... headache 1 0.000 headache hd 1 0.000 hd hazelnut 1 0.000 hazelnut hazards 1 0.000 hazards 豆漿 1 0.000 豆漿 [9083 rows x 3 columns] 2023-08-22 03:31:50,819:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 03:31:50,819:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab one 139 0.005 one new 97 0.003 new people 95 0.003 people said 91 0.003 said also 87 0.003 also ... ... ... ... 
headache 1 0.000 headache hd 1 0.000 hd hazelnut 1 0.000 hazelnut hazards 1 0.000 hazards 豆漿 1 0.000 豆漿 [8862 rows x 3 columns] 2023-08-22 03:31:50,827:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 03:31:50,827:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.0, 'duplicates_dict': {}} 2023-08-22 03:31:50,827:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 03:31:55,979:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 03:31:55,981:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:31:55,981:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 248 }) 2023-08-22 03:31:55,981:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 03:31:55,983:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:31:55,983:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 248 }) 2023-08-22 03:31:55,996:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 03:31:56,002:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 03:31:56,002:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab com 228 0.069 com https 217 0.066 https jpg 192 0.058 jpg content 87 0.026 content files 82 0.025 files ... ... ... ... filters 1 0.000 filters fire 1 0.000 fire first 1 0.000 first 360x202 1 0.000 360x202 zwillgen 1 0.000 zwillgen [984 rows x 3 columns] 2023-08-22 03:31:56,008:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 03:31:56,008:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab com 228 0.074 com https 217 0.070 https jpg 192 0.062 jpg content 87 0.028 content files 82 0.027 files ... ... ... ... 
filters 1 0.000 filters fire 1 0.000 fire first 1 0.000 first 360x202 1 0.000 360x202 zwillgen 1 0.000 zwillgen [926 rows x 3 columns] 2023-08-22 03:31:56,018:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 03:31:56,018:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.0, 'duplicates_dict': {}} 2023-08-22 03:31:56,018:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 03:32:00,195:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 03:32:00,197:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:32:00,197:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 247 }) 2023-08-22 03:32:00,197:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 03:32:00,199:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 03:32:00,199:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 247 }) 2023-08-22 03:32:00,203:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 03:32:00,207:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 03:32:00,207:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab https 218 0.068 https com 217 0.068 com jpg 194 0.061 jpg content 86 0.027 content files 82 0.026 files ... ... ... ... 001 1 0.000 001 gaga 1 0.000 gaga gamecocksonline 1 0.000 gamecocksonline 41 1 0.000 41 zwillgen 1 0.000 zwillgen [949 rows x 3 columns] 2023-08-22 03:32:00,212:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 03:32:00,212:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab https 218 0.073 https com 217 0.073 com jpg 194 0.065 jpg content 86 0.029 content files 82 0.027 files ... ... ... ... 
ftrct1nv 1 0.000 ftrct1nv 001 1 0.000 001 gaga 1 0.000 gaga gamecocksonline 1 0.000 gamecocksonline zwillgen 1 0.000 zwillgen [894 rows x 3 columns] 2023-08-22 03:32:00,219:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 03:32:00,219:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.0, 'duplicates_dict': {}} 2023-08-22 03:32:00,219:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 03:46:12,512:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 03:46:12,513:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:268 Working with dataset: 2023-08-22 03:46:12,513:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:269 Dataset({ features: ['images', 'metadata', 'general_metadata', 'texts'], num_rows: 100000 }) 2023-08-22 03:46:14,648:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:264 Saving dataset to disk 2023-08-22 03:47:42,881:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:477 tokenized df is 2023-08-22 03:47:42,883:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:478 text tokenized_text 0 One of the essential, if often unstated, job 
r... [one, of, the, essential, if, often, unstated,... 1 The buyer would get everything, including Lamb... [the, buyer, would, get, everything, including... 2 The agriculture sector as the main source of i... [the, agriculture, sector, as, the, main, sour... 3 management factors influencing open innovation... [management, factors, influencing, open, innov... 4 Henry Repeating Arms is proud to announce this... [henry, repeating, arms, is, proud, to, announ... ... ... ... 213248 The Patriot batteries currently based in Turke... [the, patriot, batteries, currently, based, in... 213249 I came across a video on YouTube titled "Steve... [i, came, across, a, video, on, youtube, title... 213250 23, 1997." I believe any company that has hit ... [23, 1997, i, believe, any, company, that, has... 213251 Context. Supernova remnants are known to accel... [context, supernova, remnants, are, known, to,... 213252 “I love @webmasterjoe because he understood ho... [i, love, webmasterjoe, because, he, understoo... 
[213253 rows x 2 columns]
2023-08-22 03:47:42,914:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:480 Saving tokenized dataset to disk
2023-08-22 03:48:12,832:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:372 Calculating vocab afresh
2023-08-22 03:48:12,833:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:540 Fitting dummy tokenization to make matrix using the previous tokenization
2023-08-22 03:48:48,989:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 0 of 2000 vocab batches
2023-08-22 03:48:53,239:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 100 of 2000 vocab batches
2023-08-22 03:48:56,456:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 200 of 2000 vocab batches
2023-08-22 03:48:59,612:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 300 of 2000 vocab batches
2023-08-22 03:49:02,710:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 400 of 2000 vocab batches
2023-08-22 03:49:05,864:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 500 of 2000 vocab batches
2023-08-22 03:49:08,888:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 600 of 2000 vocab batches
2023-08-22 03:49:11,908:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 700 of 2000 vocab batches
2023-08-22 03:49:14,966:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 800 of 2000 vocab batches
2023-08-22 03:49:17,991:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 900 of 2000 vocab batches
2023-08-22 03:49:21,019:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1000 of 2000 vocab batches
2023-08-22 03:49:24,090:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1100 of 2000 vocab batches
2023-08-22 03:49:27,145:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1200 of 2000 vocab batches
2023-08-22 03:49:30,235:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1300 of 2000 vocab batches
2023-08-22 03:49:33,291:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1400 of 2000 vocab batches
2023-08-22 03:49:36,310:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1500 of 2000 vocab batches
2023-08-22 03:49:39,328:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1600 of 2000 vocab batches
2023-08-22 03:49:42,355:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1700 of 2000 vocab batches
2023-08-22 03:49:45,413:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1800 of 2000 vocab batches
2023-08-22 03:49:48,480:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1900 of 2000 vocab batches
2023-08-22 03:50:12,056:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:374 Making dfs with proportion.
2023-08-22 03:50:12,436:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:378 Writing out.
2023-08-22 03:50:15,360:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab
2023-08-22 03:50:15,360:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381
        count    proportion vocab
word
the   3411314  5.941011e-02   the
and   1579221  2.750310e-02   and
of    1547877  2.695722e-02    of
to    1545271  2.691184e-02    to
a     1334475  2.324070e-02     a
... ... ... ...
janneck 1 1.741561e-08 janneck jannazzo 1 1.741561e-08 jannazzo jannayak 1 1.741561e-08 jannayak jannatul 1 1.741561e-08 jannatul 𝟮𝟱 1 1.741561e-08 𝟮𝟱 [458951 rows x 3 columns] 2023-08-22 03:50:15,368:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 03:50:15,368:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab word one 162630 5.280176e-03 one also 118340 3.842194e-03 also said 113455 3.683591e-03 said new 107660 3.495442e-03 new time 107410 3.487325e-03 time ... ... ... ... janneck 1 3.246742e-08 janneck jannazzo 1 3.246742e-08 jannazzo jannayak 1 3.246742e-08 jannayak jannatul 1 3.246742e-08 jannatul 𝟮𝟱 1 3.246742e-08 𝟮𝟱 [458695 rows x 3 columns] 2023-08-22 03:50:23,308:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 03:50:23,308:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.0011676271846117192} 2023-08-22 03:50:23,308:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:315 Preparing general stats 2023-08-22 03:52:07,968:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:477 tokenized df is 2023-08-22 03:52:07,970:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:478 text tokenized_text 0 One of the essential, if often unstated, job r... [one, of, the, essential, if, often, unstated,... 
1 The buyer would get everything, including Lamb... [the, buyer, would, get, everything, including... 2 The agriculture sector as the main source of i... [the, agriculture, sector, as, the, main, sour... 3 management factors influencing open innovation... [management, factors, influencing, open, innov... 4 Henry Repeating Arms is proud to announce this... [henry, repeating, arms, is, proud, to, announ... ... ... ... 213248 The Patriot batteries currently based in Turke... [the, patriot, batteries, currently, based, in... 213249 I came across a video on YouTube titled "Steve... [i, came, across, a, video, on, youtube, title... 213250 23, 1997." I believe any company that has hit ... [23, 1997, i, believe, any, company, that, has... 213251 Context. Supernova remnants are known to accel... [context, supernova, remnants, are, known, to,... 213252 “I love @webmasterjoe because he understood ho... [i, love, webmasterjoe, because, he, understoo... [213253 rows x 2 columns] 2023-08-22 03:52:08,005:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:480 Saving tokenized dataset to disk 2023-08-22 05:02:03,279:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 05:02:03,283:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 05:02:03,283:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-22 05:02:03,283:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 05:02:03,285:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 05:02:03,285:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-22 05:02:03,328:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 05:02:03,372:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 05:02:03,372:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-22 05:02:03,383:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 05:02:03,383:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-22 05:02:03,404:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 05:02:03,404:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 
'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY 
EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-22 05:02:03,404:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 05:02:13,802:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 05:02:13,806:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 05:02:13,807:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 248 }) 2023-08-22 05:02:13,807:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 05:02:13,810:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 05:02:13,810:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 248 }) 2023-08-22 05:02:13,817:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 05:02:13,820:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 05:02:13,820:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab com 228 0.069 com https 217 0.066 https jpg 192 0.058 jpg content 87 0.026 content files 82 0.025 files ... ... ... ... filters 1 0.000 filters fire 1 0.000 fire first 1 0.000 first 360x202 1 0.000 360x202 zwillgen 1 0.000 zwillgen [984 rows x 3 columns] 2023-08-22 05:02:13,824:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 05:02:13,825:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab com 228 0.074 com https 217 0.070 https jpg 192 0.062 jpg content 87 0.028 content files 82 0.027 files ... ... ... ... 
filters 1 0.000 filters fire 1 0.000 fire first 1 0.000 first 360x202 1 0.000 360x202 zwillgen 1 0.000 zwillgen [926 rows x 3 columns] 2023-08-22 05:02:13,830:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 05:02:13,830:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.0, 'duplicates_dict': {}} 2023-08-22 05:02:13,830:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 05:02:20,019:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 05:02:20,022:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 05:02:20,022:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 197 }) 2023-08-22 05:02:20,022:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 05:02:20,023:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 05:02:20,024:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 197 }) 2023-08-22 05:02:20,031:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 05:02:20,051:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 05:02:20,051:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 3279 0.061 the of 1582 0.029 of and 1524 0.028 and to 1439 0.027 to a 1189 0.022 a ... ... ... ... heartening 1 0.000 heartening healthcare 1 0.000 healthcare healing 1 0.000 healing heady 1 0.000 heady 豆漿 1 0.000 豆漿 [9145 rows x 3 columns] 2023-08-22 05:02:20,057:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 05:02:20,057:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab one 139 0.005 one people 99 0.003 people said 92 0.003 said new 90 0.003 new also 86 0.003 also ... ... ... ... 
heartening 1 0.000 heartening healthcare 1 0.000 healthcare healing 1 0.000 healing heady 1 0.000 heady 豆漿 1 0.000 豆漿 [8923 rows x 3 columns] 2023-08-22 05:02:20,063:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 05:02:20,063:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.0, 'duplicates_dict': {}} 2023-08-22 05:02:20,063:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 05:10:16,705:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 05:10:16,705:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:268 Working with dataset: 2023-08-22 05:10:16,705:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:269 Dataset({ features: ['images', 'metadata', 'general_metadata', 'texts'], num_rows: 50000 }) 2023-08-22 05:10:17,725:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:264 Saving dataset to disk 2023-08-22 05:10:54,078:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:477 tokenized df is 2023-08-22 05:10:54,078:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:478 text tokenized_text 0 One of the essential, if often unstated, job r... 
[one, of, the, essential, if, often, unstated,... 1 The buyer would get everything, including Lamb... [the, buyer, would, get, everything, including... 2 The agriculture sector as the main source of i... [the, agriculture, sector, as, the, main, sour... 3 management factors influencing open innovation... [management, factors, influencing, open, innov... 4 Henry Repeating Arms is proud to announce this... [henry, repeating, arms, is, proud, to, announ... ... ... ... 106540 ...and thus releases an energy wave that disin... [and, thus, releases, an, energy, wave, that, ... 106541 I’ve officially kicked off a new project from ... [i, ve, officially, kicked, off, a, new, proje... 106542 Alex Espinoza is a software developer in Calif... [alex, espinoza, is, a, software, developer, i... 106543 Louise Hansen is a Danish retired association ... [louise, hansen, is, a, danish, retired, assoc... 106544 What Do You Think How Much Louise Hansen Has E... [what, do, you, think, how, much, louise, hans... 
[106545 rows x 2 columns] 2023-08-22 05:10:54,097:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:480 Saving tokenized dataset to disk 2023-08-22 05:11:03,365:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:372 Calculating vocab afresh 2023-08-22 05:11:03,365:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:540 Fitting dummy tokenization to make matrix using the previous tokenization 2023-08-22 05:11:17,404:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 0 of 2000 vocab batches 2023-08-22 05:11:18,533:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 100 of 2000 vocab batches 2023-08-22 05:11:19,557:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 200 of 2000 vocab batches 2023-08-22 05:11:20,557:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 300 of 2000 vocab batches 2023-08-22 05:11:21,553:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 400 of 2000 vocab batches 2023-08-22 05:11:22,558:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 500 of 2000 vocab batches 2023-08-22 05:11:23,590:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 600 of 
2000 vocab batches 2023-08-22 05:11:24,614:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 700 of 2000 vocab batches 2023-08-22 05:11:25,627:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 800 of 2000 vocab batches 2023-08-22 05:11:26,665:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 900 of 2000 vocab batches 2023-08-22 05:11:27,695:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1000 of 2000 vocab batches 2023-08-22 05:11:28,729:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1100 of 2000 vocab batches 2023-08-22 05:11:29,762:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1200 of 2000 vocab batches 2023-08-22 05:11:30,785:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1300 of 2000 vocab batches 2023-08-22 05:11:31,818:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1400 of 2000 vocab batches 2023-08-22 05:11:32,844:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1500 of 2000 vocab batches 2023-08-22 05:11:33,878:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1600 of 2000 vocab batches 2023-08-22 
05:11:34,902:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1700 of 2000 vocab batches 2023-08-22 05:11:35,930:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1800 of 2000 vocab batches 2023-08-22 05:11:36,958:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:551 1900 of 2000 vocab batches 2023-08-22 05:11:49,185:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:374 Making dfs with proportion. 2023-08-22 05:11:49,420:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:378 Writing out. 2023-08-22 05:11:51,231:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 05:11:51,231:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab word the 1706746 5.944568e-02 the and 789170 2.748666e-02 and of 773350 2.693565e-02 of to 772115 2.689264e-02 to a 667183 2.323787e-02 a ... ... ... ... 
jhonny 1 3.482983e-08 jhonny jhonnylever 1 3.482983e-08 jhonnylever jhoola 1 3.482983e-08 jhoola jhoolas 1 3.482983e-08 jhoolas 𝟓 1 3.482983e-08 𝟓 [301235 rows x 3 columns] 2023-08-22 05:11:51,238:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 05:11:51,238:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab word one 81127 5.268675e-03 one also 59695 3.876805e-03 also said 56007 3.637293e-03 said new 53867 3.498314e-03 new time 53532 3.476558e-03 time ... ... ... ... jhonny 1 6.494355e-08 jhonny jhonnylever 1 6.494355e-08 jhonnylever jhoola 1 6.494355e-08 jhoola jhoolas 1 6.494355e-08 jhoolas 𝟓 1 6.494355e-08 𝟓 [300979 rows x 3 columns] 2023-08-22 05:11:56,602:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 05:11:56,602:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.0010699704350274342} 2023-08-22 05:11:56,602:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:315 Preparing general stats 2023-08-22 05:12:35,290:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:477 tokenized df is 2023-08-22 05:12:35,290:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:478 text tokenized_text 0 One of the essential, if often unstated, job r... [one, of, the, essential, if, often, unstated,... 
1 The buyer would get everything, including Lamb... [the, buyer, would, get, everything, including... 2 The agriculture sector as the main source of i... [the, agriculture, sector, as, the, main, sour... 3 management factors influencing open innovation... [management, factors, influencing, open, innov... 4 Henry Repeating Arms is proud to announce this... [henry, repeating, arms, is, proud, to, announ... ... ... ... 106540 ...and thus releases an energy wave that disin... [and, thus, releases, an, energy, wave, that, ... 106541 I’ve officially kicked off a new project from ... [i, ve, officially, kicked, off, a, new, proje... 106542 Alex Espinoza is a software developer in Calif... [alex, espinoza, is, a, software, developer, i... 106543 Louise Hansen is a Danish retired association ... [louise, hansen, is, a, danish, retired, assoc... 106544 What Do You Think How Much Louise Hansen Has E... [what, do, you, think, how, much, louise, hans... [106545 rows x 2 columns] 2023-08-22 05:12:35,301:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:480 Saving tokenized dataset to disk 2023-08-22 05:35:22,195:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 05:35:22,199:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 05:35:22,199:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-22 05:35:22,199:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 05:35:22,201:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 05:35:22,201:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-22 05:35:22,238:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 05:35:22,276:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 05:35:22,277:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-22 05:35:22,285:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 05:35:22,285:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-22 05:35:22,302:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 05:35:22,302:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 
'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY 
EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-22 05:35:22,303:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 05:35:32,614:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 05:35:32,639:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 05:35:32,639:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 106545 }) 2023-08-22 05:35:32,639:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 05:35:32,648:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 05:35:32,648:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 106545 }) 2023-08-22 05:35:36,343:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 05:35:37,318:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 05:35:37,319:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 1706746 0.059 the and 789170 0.027 and of 773350 0.027 of to 772115 0.027 to a 667183 0.023 a ... ... ... ... jhonny 1 0.000 jhonny jhonnylever 1 0.000 jhonnylever jhoola 1 0.000 jhoola jhoolas 1 0.000 jhoolas 𝟓 1 0.000 𝟓 [301235 rows x 3 columns] 2023-08-22 05:35:37,324:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 05:35:37,324:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab one 81127 0.005 one also 59695 0.004 also said 56007 0.004 said new 53867 0.003 new time 53532 0.003 time ... ... ... ... 
jhonny 1 0.000 jhonny jhonnylever 1 0.000 jhonnylever jhoola 1 0.000 jhoola jhoolas 1 0.000 jhoolas 𝟓 1 0.000 𝟓 [300979 rows x 3 columns] 2023-08-22 05:35:38,103:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 05:35:38,103:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.0010699704350274342, 'duplicates_dict': {"Get fresh music recommendations delivered to your inbox every Friday.\nWe've updated our Terms of Use. You can review the changes here.": 3, 'The Batman – watch the Bat and the Cat trailer': 2, 'END_OF_DOCUMENT_TOKEN_TO_BE_REPLACED': 69, 'A €500m aid package for EU farmers, a derogation from greening obligations and supports for feed and fertiliser are being considered by the European Commission.': 2, 'An 11-Year-Old Girl Advises Her Teacher On Punishment Methods – And...': 2, 'Molly grew up in California but now lives in the oh-so-amazing state of Texas with her husband, daughter, and fur babies. When she’s not diving into the world of her characters, some of her hobbies include hiking, snowboarding, traveling, and long walks on the beach … which roughly translates to being a homebody with her hubby and dishing out movie quotes. She has a weakness for crude-humored movies and fried pickles, and loves curling up in a fluffy comforter during a thunderstorm … or under one in a bathtub if there are tornados. That way she can pretend they aren’t really happening.': 2, 'The 9-year-old got into character, pairing her leather jacket and pants with Jackson’s own “Smooth Criminal” hat.': 2, "Highland's Maddie Dortch runs at the start of the race during the Triad Invitational on Wednesday, September 30, 2020 at Triad High School in Troy, Ill. 
Paul Halfacre, STLhighschoolsports.com": 2, 'After excellent first-cut silage crops, it is a case of keeping the shoulder to the wheel to ensure fodder reserves are met for the coming winter. Declan Marren reports.': 2, 'Scroll back to top': 2, "Already got the injury now what ☺️\n\nSuffer till it's better jk lol": 2, 'We will write the formula as below:': 2, 'There was an error retrieving images from Instagram. An attempt will be remade in a few minutes.': 3, 'In the meantime, learn about Mobile Workers Compensation below through our articles and write-up!': 2, "Lowe's in south Fort Myers is one of several area stores that have restocked on essentials to include water, gas containers and generators in preparation for Hurricane Dorian. A manager at the Lowe's said, if needed, they will ship supplies to stores in areas hardest hit by Hurricane Dorian. Kinfay Moroti/The News-Press USA Today Network-Florida\nFullscreen": 2, '80 Hindu couples tie the knot at mass wedding in Karachi': 2, 'This site uses Akismet to reduce spam. Learn how your comment data is processed.': 4, 'SEE ALL OF VELOCITY’S SUPERCARS AT PUKEKOHE HERE': 2, 'skip to main | skip to sidebar': 2, 'Posted 3 years ago by Yahoo': 2, 'Not since van Gogh lopped off his ear has an artist’s knife been put to such good use.—Tessa Laird\n\nNew Zealand collage artist Peter Madden draws much of his imagery from old issues of National Geographic. He plunders and reworks the magazine’s discredited ’empire of signs’ to forge his own. His surrealistic pictures, objects, and installations—with their watchmaker detail and intensity—have been described as ‘microcosms’ and ‘intricate kingdoms of flying forms’ Madden has one foot in the vanitas still-life tradition and the other in new-age thinking. On the one hand, he is death obsessed: a master of morbid decoupage. (Moths and butterflies—symbols of transient life—abound. His assemblages in bell jars suggest some Victorian taxidermist killing time in his parlour.) 
On the other hand, with his flocks, schools, and swarms of quivering animal energy, he revels in biodiversity and magic. Madden’s works manage to be at once morbid and abundant, rotting and blooming, creepy and fey. This book serveys Madden’s work of the last ten years': 2, 'Fallout 4: How to Get Vertibird Support': 2, 'For Fallout 4 on the PlayStation 4, a GameFAQs message board topic titled "Vertibirds going down constantly?".': 2, 'I am a committed Piano tutor and composer with over 15 years experience teaching a wide range of pupils from children to...': 2, 'We use cookies on our website to give you the most relevant experience by remembering your preferences and repeat visits. By clicking “Accept All”, you consent to the use of ALL the cookies. However, you may visit "Cookie Settings" to provide a controlled consent.\nCookie SettingsAccept All\nManage consent\n\nThis website uses cookies to improve your experience while you navigate through the website. Out of these, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may affect your browsing experience.\nNecessary Always Enabled\nNecessary cookies are absolutely essential for the website to function properly. 
These cookies ensure basic functionalities and security features of the website, anonymously.\nFunctional\nFunctional cookies help to perform certain functionalities like sharing the content of the website on social media platforms, collect feedbacks, and other third-party features.\nPerformance\nPerformance cookies are used to understand and analyze the key performance indexes of the website which helps in delivering a better user experience for the visitors.\nAnalytics\nAnalytical cookies are used to understand how visitors interact with the website. These cookies help provide information on metrics the number of visitors, bounce rate, traffic source, etc.\nAdvertisement\nAdvertisement cookies are used to provide visitors with relevant ads and marketing campaigns. These cookies track visitors across websites and collect information to provide customized ads.\nOthers\nOther uncategorized cookies are those that are being analyzed and have not been classified into a category as yet.\nSAVE & ACCEPT': 2, 'Serbia signs Memorandum of Understanding with USAID on energy efficiency\n\nKeep up with the latest trends and news of the CEE energy market! Sign up for our newsletters to receive curated news across the energy agenda in 20+ countries in Central and South-eastern Europe.': 2, 'Concerns over effect of Rotorua plan': 2, 'Jet skier in our wake': 2, 'After breaking the partition, a sturdy metal frame in placed to ensure the upper part of the wall is safely supported and to facilitate access to the roof.': 2, 'From the window situated over the release module and behind glass we can watch the chicks without them seeing us.': 2, 'During the release process a young one-year old male from the wild population, visited the release module, attracted by the Colony Environment effect. 
It is probable that it is an individual from the urban centre of San Vicente where at least two pairs of lesser kestrel breed.': 2, 'I’ve had a long love of books, and some of my most prized books are art books. This is a review of books from my collection that can be found on shelves in my studio. I will provide links when possible.': 2, 'The Fairy Tales of Oscar Wilde': 2, "The West Side Lofts, a mixed-use development in the heart of Red Bank's antique district, brought a fresh infusion of downtown residents when it opened about four years ago. Tanya Breen\nFullscreen": 2, 'Interior of one of the apartments during the opening of Element, a new high-end 35 unit apartment complex along the Navesink River in Red Bank, NJ Wednesday May 29, 2019. Tanya Breen\nFullscreen': 2, 'How To Responsibly Donate To Ukrainian Causes': 2, 'The Subtle Violence Of So...': 2, 'Corona-virus: Fun things to do while social distancing': 2, 'Barcelona try to make up for Messi’s lost time': 2, 'The Milton and Tamar Maltz Performing Arts Center, located on East 105th Street and Ansel Road in Cleveland. Prior to being used by Case Western Reserve University, the building was The Temple-Tifereth Israel’s home until the 1970s.': 2, 'It was all over before I knew it and I just could not believe I could see almost perfectly straight after the surgery. Read more...': 2, 'Watch music on TV: AXS TV programming highlights for the week of April 15-21': 2, 'The BL King’s Topographical Collection: "THE NORTH-EAST VIEW OF SCALEBY-CASTLE, IN THE COUNTY OF CUMBERLAND. "': 2}} 2023-08-22 05:35:38,103:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 05:50:51,698:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 05:50:51,701:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk
2023-08-22 05:50:51,701:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 })
2023-08-22 05:50:51,701:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset.
2023-08-22 05:50:51,703:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk
2023-08-22 05:50:51,703:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 })
2023-08-22 05:50:51,740:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache
2023-08-22 05:50:51,776:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab
2023-08-22 05:50:51,776:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab
the 6770 0.037 the
to 4703 0.026 to
i 4577 0.025 i
and 4317 0.024 and
a 4006 0.022 a
... ... ... ...
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-22 05:50:51,784:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 05:50:51,784:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-22 05:50:51,799:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 05:50:51,799:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the 
Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's 
writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-22 05:50:51,799:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 05:51:00,417:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 05:51:00,432:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk
2023-08-22 05:51:00,432:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 106545 })
2023-08-22 05:51:00,432:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset.
2023-08-22 05:51:00,440:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk
2023-08-22 05:51:00,440:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 106545 })
2023-08-22 05:51:04,105:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache
2023-08-22 05:51:05,108:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab
2023-08-22 05:51:05,108:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab
the 1706746 0.059 the
and 789170 0.027 and
of 773350 0.027 of
to 772115 0.027 to
a 667183 0.023 a
... ... ... ...
jhonny 1 0.000 jhonny jhonnylever 1 0.000 jhonnylever jhoola 1 0.000 jhoola jhoolas 1 0.000 jhoolas 𝟓 1 0.000 𝟓 [301235 rows x 3 columns] 2023-08-22 05:51:05,113:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 05:51:05,113:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab one 81127 0.005 one also 59695 0.004 also said 56007 0.004 said new 53867 0.003 new time 53532 0.003 time ... ... ... ... jhonny 1 0.000 jhonny jhonnylever 1 0.000 jhonnylever jhoola 1 0.000 jhoola jhoolas 1 0.000 jhoolas 𝟓 1 0.000 𝟓 [300979 rows x 3 columns] 2023-08-22 05:51:05,435:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 05:51:05,435:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.0010699704350274342, 'duplicates_dict': {"Get fresh music recommendations delivered to your inbox every Friday.\nWe've updated our Terms of Use. You can review the changes here.": 3, 'The Batman – watch the Bat and the Cat trailer': 2, 'END_OF_DOCUMENT_TOKEN_TO_BE_REPLACED': 69, 'A €500m aid package for EU farmers, a derogation from greening obligations and supports for feed and fertiliser are being considered by the European Commission.': 2, 'An 11-Year-Old Girl Advises Her Teacher On Punishment Methods – And...': 2, 'Molly grew up in California but now lives in the oh-so-amazing state of Texas with her husband, daughter, and fur babies. 
When she’s not diving into the world of her characters, some of her hobbies include hiking, snowboarding, traveling, and long walks on the beach … which roughly translates to being a homebody with her hubby and dishing out movie quotes. She has a weakness for crude-humored movies and fried pickles, and loves curling up in a fluffy comforter during a thunderstorm … or under one in a bathtub if there are tornados. That way she can pretend they aren’t really happening.': 2, 'The 9-year-old got into character, pairing her leather jacket and pants with Jackson’s own “Smooth Criminal” hat.': 2, "Highland's Maddie Dortch runs at the start of the race during the Triad Invitational on Wednesday, September 30, 2020 at Triad High School in Troy, Ill. Paul Halfacre, STLhighschoolsports.com": 2, 'After excellent first-cut silage crops, it is a case of keeping the shoulder to the wheel to ensure fodder reserves are met for the coming winter. Declan Marren reports.': 2, 'Scroll back to top': 2, "Already got the injury now what ☺️\n\nSuffer till it's better jk lol": 2, 'We will write the formula as below:': 2, 'There was an error retrieving images from Instagram. An attempt will be remade in a few minutes.': 3, 'In the meantime, learn about Mobile Workers Compensation below through our articles and write-up!': 2, "Lowe's in south Fort Myers is one of several area stores that have restocked on essentials to include water, gas containers and generators in preparation for Hurricane Dorian. A manager at the Lowe's said, if needed, they will ship supplies to stores in areas hardest hit by Hurricane Dorian. Kinfay Moroti/The News-Press USA Today Network-Florida\nFullscreen": 2, '80 Hindu couples tie the knot at mass wedding in Karachi': 2, 'This site uses Akismet to reduce spam. 
Learn how your comment data is processed.': 4, 'SEE ALL OF VELOCITY’S SUPERCARS AT PUKEKOHE HERE': 2, 'skip to main | skip to sidebar': 2, 'Posted 3 years ago by Yahoo': 2, 'Not since van Gogh lopped off his ear has an artist’s knife been put to such good use.—Tessa Laird\n\nNew Zealand collage artist Peter Madden draws much of his imagery from old issues of National Geographic. He plunders and reworks the magazine’s discredited ’empire of signs’ to forge his own. His surrealistic pictures, objects, and installations—with their watchmaker detail and intensity—have been described as ‘microcosms’ and ‘intricate kingdoms of flying forms’ Madden has one foot in the vanitas still-life tradition and the other in new-age thinking. On the one hand, he is death obsessed: a master of morbid decoupage. (Moths and butterflies—symbols of transient life—abound. His assemblages in bell jars suggest some Victorian taxidermist killing time in his parlour.) On the other hand, with his flocks, schools, and swarms of quivering animal energy, he revels in biodiversity and magic. Madden’s works manage to be at once morbid and abundant, rotting and blooming, creepy and fey. This book serveys Madden’s work of the last ten years': 2, 'Fallout 4: How to Get Vertibird Support': 2, 'For Fallout 4 on the PlayStation 4, a GameFAQs message board topic titled "Vertibirds going down constantly?".': 2, 'I am a committed Piano tutor and composer with over 15 years experience teaching a wide range of pupils from children to...': 2, 'We use cookies on our website to give you the most relevant experience by remembering your preferences and repeat visits. By clicking “Accept All”, you consent to the use of ALL the cookies. However, you may visit "Cookie Settings" to provide a controlled consent.\nCookie SettingsAccept All\nManage consent\n\nThis website uses cookies to improve your experience while you navigate through the website. 
Out of these, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may affect your browsing experience.\nNecessary Always Enabled\nNecessary cookies are absolutely essential for the website to function properly. These cookies ensure basic functionalities and security features of the website, anonymously.\nFunctional\nFunctional cookies help to perform certain functionalities like sharing the content of the website on social media platforms, collect feedbacks, and other third-party features.\nPerformance\nPerformance cookies are used to understand and analyze the key performance indexes of the website which helps in delivering a better user experience for the visitors.\nAnalytics\nAnalytical cookies are used to understand how visitors interact with the website. These cookies help provide information on metrics the number of visitors, bounce rate, traffic source, etc.\nAdvertisement\nAdvertisement cookies are used to provide visitors with relevant ads and marketing campaigns. These cookies track visitors across websites and collect information to provide customized ads.\nOthers\nOther uncategorized cookies are those that are being analyzed and have not been classified into a category as yet.\nSAVE & ACCEPT': 2, 'Serbia signs Memorandum of Understanding with USAID on energy efficiency\n\nKeep up with the latest trends and news of the CEE energy market! 
Sign up for our newsletters to receive curated news across the energy agenda in 20+ countries in Central and South-eastern Europe.': 2, 'Concerns over effect of Rotorua plan': 2, 'Jet skier in our wake': 2, 'After breaking the partition, a sturdy metal frame in placed to ensure the upper part of the wall is safely supported and to facilitate access to the roof.': 2, 'From the window situated over the release module and behind glass we can watch the chicks without them seeing us.': 2, 'During the release process a young one-year old male from the wild population, visited the release module, attracted by the Colony Environment effect. It is probable that it is an individual from the urban centre of San Vicente where at least two pairs of lesser kestrel breed.': 2, 'I’ve had a long love of books, and some of my most prized books are art books. This is a review of books from my collection that can be found on shelves in my studio. I will provide links when possible.': 2, 'The Fairy Tales of Oscar Wilde': 2, "The West Side Lofts, a mixed-use development in the heart of Red Bank's antique district, brought a fresh infusion of downtown residents when it opened about four years ago. Tanya Breen\nFullscreen": 2, 'Interior of one of the apartments during the opening of Element, a new high-end 35 unit apartment complex along the Navesink River in Red Bank, NJ Wednesday May 29, 2019. Tanya Breen\nFullscreen': 2, 'How To Responsibly Donate To Ukrainian Causes': 2, 'The Subtle Violence Of So...': 2, 'Corona-virus: Fun things to do while social distancing': 2, 'Barcelona try to make up for Messi’s lost time': 2, 'The Milton and Tamar Maltz Performing Arts Center, located on East 105th Street and Ansel Road in Cleveland. Prior to being used by Case Western Reserve University, the building was The Temple-Tifereth Israel’s home until the 1970s.': 2, 'It was all over before I knew it and I just could not believe I could see almost perfectly straight after the surgery. 
Read more...': 2, 'Watch music on TV: AXS TV programming highlights for the week of April 15-21': 2, 'The BL King’s Topographical Collection: "THE NORTH-EAST VIEW OF SCALEBY-CASTLE, IN THE COUNTY OF CUMBERLAND. "': 2}} 2023-08-22 05:51:05,436:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 06:07:27,906:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 06:07:27,908:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 06:07:27,908:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-22 06:07:27,908:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 06:07:27,909:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk
2023-08-22 06:07:27,909:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 })
2023-08-22 06:07:27,946:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache
2023-08-22 06:07:27,981:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab
2023-08-22 06:07:27,981:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab
the 6770 0.037 the
to 4703 0.026 to
i 4577 0.025 i
and 4317 0.024 and
a 4006 0.022 a
... ... ... ...
hose 1 0.000 hose
hospitalised 1 0.000 hospitalised
hospitality 1 0.000 hospitality
hostages 1 0.000 hostages
采用左眼专利技术 1 0.000 采用左眼专利技术
[16372 rows x 3 columns]
2023-08-22 06:07:27,989:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab
2023-08-22 06:07:27,989:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab
white 1273 0.014 white
like 793 0.009 like
people 617 0.007 people
one 521 0.006 one
youtube 516 0.006 youtube
... ... ... ...
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-22 06:07:28,005:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 06:07:28,005:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 278': 3, 'Well done .': 3, 
'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 'NUCLEAR PLANT HAS FULLY 
EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-22 06:07:28,005:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 06:07:45,778:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 06:07:45,783:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 06:07:45,783:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 197 }) 2023-08-22 06:07:45,783:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 06:07:45,787:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 06:07:45,787:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 197 }) 2023-08-22 06:07:45,795:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 06:07:45,815:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 06:07:45,815:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 3279 0.061 the of 1582 0.029 of and 1524 0.028 and to 1439 0.027 to a 1189 0.022 a ... ... ... ... heartening 1 0.000 heartening healthcare 1 0.000 healthcare healing 1 0.000 healing heady 1 0.000 heady 豆漿 1 0.000 豆漿 [9145 rows x 3 columns] 2023-08-22 06:07:45,820:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 06:07:45,820:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab one 139 0.005 one people 99 0.003 people said 92 0.003 said new 90 0.003 new also 86 0.003 also ... ... ... ... 
heartening 1 0.000 heartening healthcare 1 0.000 healthcare healing 1 0.000 healing heady 1 0.000 heady 豆漿 1 0.000 豆漿 [8923 rows x 3 columns] 2023-08-22 06:07:45,827:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 06:07:45,827:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.0, 'duplicates_dict': {}} 2023-08-22 06:07:45,827:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/data-measurements-tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 06:31:54,134:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 06:31:54,144:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 06:31:54,144:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-22 06:31:54,149:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 06:31:54,151:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 06:31:54,151:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 10944 }) 2023-08-22 06:31:54,186:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 06:31:54,222:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 06:31:54,222:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 6770 0.037 the to 4703 0.026 to i 4577 0.025 i and 4317 0.024 and a 4006 0.022 a ... ... ... ... hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16372 rows x 3 columns] 2023-08-22 06:31:54,229:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 06:31:54,229:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab white 1273 0.014 white like 793 0.009 like people 617 0.007 people one 521 0.006 one youtube 516 0.006 youtube ... ... ... ... 
hose 1 0.000 hose hospitalised 1 0.000 hospitalised hospitality 1 0.000 hospitality hostages 1 0.000 hostages 采用左眼专利技术 1 0.000 采用左眼专利技术 [16136 rows x 3 columns] 2023-08-22 06:31:54,246:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 06:31:54,246:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.016812865497076057, 'duplicates_dict': {'In order to help increase the booklets downloads , it would be great if all Stormfronters who had YouTube accounts , could display the following text in the description boxes of their uploaded YouTube videos .': 2, 'Click below for a FREE download of a colorfully illustrated 132 page e-book on the Zionist-engineered INTENTIONAL destruction of Western civilization .': 2, 'Then why not simply copy this text ( & links ) and paste it into the description box of your YouTube videos ?': 2, "Y' all have a nice day .": 2, 'Good.': 2, 'Thanks.': 3, 'LOL': 2, 'Interesting .': 2, 'Aragorn': 9, 'Cheers': 2, '16': 4, 'Indeed.': 4, 'No .': 3, '88': 3, "This is from the back-file at Stormfront 's Advanced Scout forum which is solely devoted to promoting the Pioneer Litte Europe strategy .": 2, "This is from Stormfront 's Advanced Scout forum which is solely devoted to promoting PLE - and its local militant front Legionism .": 2, 'This is my next read .': 2, "I 'm going to carefully study it so hopefully I 'll have some decent input to contribute .": 2, 'Nope.': 2, "Guess who picked Canada 's first batch of refugees from Turkey ?": 2, '- YouTube': 12, 'Same here .': 3, 'Really ?': 2, 'Yeah.': 3, '14/88': 7, 'Agreed.': 6, 'Absolutely.': 4, '2508': 21, 'Thank you .': 4, 'Karina Sorensen is a member of the Danish Peoples Party , a pro-Danish heritage party .': 2, 'Susan': 2, 'Wolf 
278': 3, 'Well done .': 3, 'Really?': 3, 'Thank you': 2, 'Exactly.': 8, 'Thanks for posting .': 2, 'Source': 2, 'No.': 2, 'Sad but true .': 2, 'God Bless': 3, 'To learn more , click here : www.jonasridgeway.com/tech2.html www.spiritual.com.au/astral.html www.astralweb.org www.near-death.com/experiences/cayce01.html www.astralvoyage.com/projection/index.html It would be pretty cool for teachers to tell White kids they go out-of-body every night .': 2, 'Nice .': 2, '88 !': 2, 'Good luck !': 3, "Hell I 'd rather clean a dozen tiolets a day then have to walk into a bathroom and see some mongrel standing there with a mop giving me a glassy-eyed stare .": 3, '1 .': 2, '2 .': 3, '-Yankee Jim': 3, 'CF': 3, 'Thanks !': 4, 'David ( SS )': 2, 'Period .': 2, 'YouTube - Broadcast Yourself .': 15, 'I never attacked you and i am not your son plus i dont care what you done.P M if you have a problem': 2, 'Hails': 2, 'Greetings all .': 2, 'Thanks .': 4, 'Whites shoudl clean their own mess !': 2, 'Getting someone else to do our dirty work got us into the mess we are in today .': 2, 'It is a noble idea but I would rather stay here in mostly white Missouri then move to 35 % black South Carolina .': 3, 'This post by Jack boot and the piece by Marc Moran , are very inspiring .': 2, 'They make you want to go out and reach people .': 2, "I was very inspired by Moran 's writing .": 2, "Until we have a dedicated cable television station , there 's always the opportunity to use the free cable access airwaves as per this thread : Make a Cable Access TV Program !": 2, 'I have never seen an Asian woman dating an Asian man.They like the white man they re-force wiggers and our detrimental towards our cause.Cpamikei': 2, 'Thank you for posting this .': 2, 'Peter the Great .': 3, '?': 2, 'Wow!': 5, 'Camie': 2, 'I clean my own toilet as for public toilets and such there is no dirty work there it is an Economical contribution 2508': 2, 'Welcome to Stormfront .': 2, 'Japan REACTOR - RODS MELT !': 2, 
'NUCLEAR PLANT HAS FULLY EXPLODED !': 2, '11': 2, 'I agree .': 2, "I do n't think so .": 2, 'Just saying .': 2, 'Hello.': 2, 'Tereasa': 2, 'Thank you !': 2, 'Amen .': 2, '-Zoë': 2, 'I usually know when friends or relatives are distressed through illness or accident/incident.I then phone them and find out what has happened .': 2, 'But what do you say to the ivy who wants to become a tree ??': 2, 'Or for that matter thinks he is a tree ?': 2, '32': 2, 'Erik': 2}} 2023-08-22 06:31:54,247:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats 2023-08-22 06:32:07,238:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 2023-08-22 06:32:07,244:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 06:32:07,244:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 197 }) 2023-08-22 06:32:07,245:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:252 Doing text dset. 
2023-08-22 06:32:07,249:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:256 Loaded dataset from disk 2023-08-22 06:32:07,249:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:257 Dataset({ features: ['text'], num_rows: 197 }) 2023-08-22 06:32:07,256:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:365 Reading vocab from cache 2023-08-22 06:32:07,277:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:380 unfiltered vocab 2023-08-22 06:32:07,277:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:381 count proportion vocab the 3279 0.061 the of 1582 0.029 of and 1524 0.028 and to 1439 0.027 to a 1189 0.022 a ... ... ... ... heartening 1 0.000 heartening healthcare 1 0.000 healthcare healing 1 0.000 healing heady 1 0.000 heady 豆漿 1 0.000 豆漿 [9145 rows x 3 columns] 2023-08-22 06:32:07,282:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:382 filtered vocab 2023-08-22 06:32:07,282:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:383 count proportion vocab one 139 0.005 one people 99 0.003 people said 92 0.003 said new 90 0.003 new also 86 0.003 also ... ... ... ... 
heartening 1 0.000 heartening healthcare 1 0.000 healthcare healing 1 0.000 healing heady 1 0.000 heady 豆漿 1 0.000 豆漿 [8923 rows x 3 columns] 2023-08-22 06:32:07,289:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:303 Duplicates results: 2023-08-22 06:32:07,289:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:304 {'duplicate_fraction': 0.0, 'duplicates_dict': {}} 2023-08-22 06:32:07,289:/Users/ezi/Desktop/HF/DMT/data-measurements-tool/DMT2023/launch/IDEFICS_Data_Measurement_Tool/data_measurements/dataset_statistics.py, dataset_statistics:312 Loading cached general stats